Re: Kernel patch cases qemu live migration failed.

2020-10-19 Thread Dave Martin
On Mon, Oct 19, 2020 at 05:23:11PM +0200, Andrew Jones wrote:
> On Mon, Oct 19, 2020 at 03:58:40PM +0100, Dave Martin wrote:
> > On Mon, Oct 19, 2020 at 03:18:11PM +0100, Peter Maydell wrote:
> > > On Mon, 19 Oct 2020 at 14:40, Andrew Jones  wrote:
> > > >
> > > > On Mon, Oct 19, 2020 at 12:43:33PM +0100, Peter Maydell wrote:
> > > > > Well, ID regs are special in the architecture -- they always exist
> > > > > and must RAZ/WI, even if they're not actually given any fields yet.
> > > > > This is different from other "unused" parts of the system register
> > > > > encoding space, which UNDEF.
> > > >
> > > > Table D12-2 confirms the register should be RAZ, as it says the register
> > > > is "RO, but RAZ if SVE is not implemented". Does "RO" imply "WI", 
> > > > though?
> > > > For the guest we inject an exception on writes, and for userspace we
> > > > require the value to be preserved on write.
> > > 
> > > Sorry, I mis-spoke. They're RAZ, but not WI, just RO (which is to say
> > > they'll UNDEF if you try to write to them).
> > > 
> > > > I think we should follow the spec, even for userspace access, and be RAZ
> > > > for when the feature isn't implemented. As for writes, assuming the
> > > > exception injection is what we want for the guest (not WI), then that's
> > > > correct. For userspace, I think we should continue forcing preservation
> > > > (which will force preservation of zero when it's RAZ).
> > > 
> > > Yes, that sounds right.
> > 
> > [...]
> > 
> > > > > The problem is that you've actually removed registers from
> > > > > the list that were previously in it (because pre-SVE
> > > > > kernels put this ID register in the list as a RAZ/WI register,
> > > > > and now it's not in the list if SVE isn't supported).
> > 
> > Define "previously", though.  IIUC, the full enumeration was added in
> > v4.15 (with ID_AA64ZFR0_EL1 still not supported at all):
> > 
> > v4.15-rc1~110^2~27
> > 93390c0a1b20 ("arm64: KVM: Hide unsupported AArch64 CPU features from 
> > guests")
> > 
> > 
> > And then ID_AA64FZR0_EL1 was removed from the enumeration, also in
> > v4.15:
> > 
> > v4.15-rc1~110^2~5
> > 07d79fe7c223 ("arm64/sve: KVM: Hide SVE from CPU features exposed to 
> > guests")
> > 
> > 
> > So, are there really two upstram kernel tags that are mismatched on
> > this, or is this just a bisectability issue in v4.14..v4.15?
> > 
> > It's a while since I looked at this, and I may have misunderstood the
> > timeline.
> > 
> > 
> > > > > > So, I think that instead of changing the ID_AA64ZFR0_EL1 behaviour,
> > > > > > parhaps we should move all ID_UNALLOCATED() regs (and possibly
> > > > > > ID_HIDDEN(), not sure about that) to have REG_HIDDEN_USER 
> > > > > > visibility.
> > > > >
> > > > > What does this do as far as the user-facing list-of-registers
> > > > > is concerned? All these registers need to remain in the
> > > > > KVM_GET_REG_LIST list, or you break migration from an old
> > > > > kernel to a new one.
> > 
> > OK, I think I see where you are coming from, now.
> > 
> > It may make sense to get rid of the REG_HIDDEN_GUEST / REG_HIDDEN_USER
> > distinction, and provide the same visibility for userspace as for MSR/
> > MRS all the time.  This would restore ID_AA64ZFR0_EL1 into the userspace
> > view, and may also allow a bit of simplification in the code.
> > 
> > Won't this will still break migration from the resulting kernel to a
> > current kernel that hides ID_AA64ZFR0_EL1?  Or have I misunderstood
> > something.
> >
> 
> Yes, but, while neither direction old -> new nor new -> old is actually
> something that people should do when using host cpu passthrough (they
> should only ever migrate between identical hosts, both hardware and
> host kernel version), migrating from old -> new makes more sense, as
> that's the upgrade path, and it's more supportable - we can workaround
> things on the new side. So, long story short, new -> old will fail due
> to making this change, but it's still probably the right thing to do,
> as we'll be defining a better pattern for ID registers going forward,
> and we never claimed new -> old migrations would work anyway with host
> passthrough.
> 
> Thanks,
> drew

Ack, just wanted to make sure I understood the implications correctly.

I'm still not sure I fully understand why we hit this problem (i.e.,
ZFR0 enumeration mismatch between host and guest) in the first place,
unless I've misunderstood which patches make these changes, or unless
RHEL has cherry-picked odd patches that weren't intended to be applied
separately...

Cheers
---Dave



Re: Kernel patch cases qemu live migration failed.

2020-10-19 Thread Dave Martin
On Mon, Oct 19, 2020 at 03:18:11PM +0100, Peter Maydell wrote:
> On Mon, 19 Oct 2020 at 14:40, Andrew Jones  wrote:
> >
> > On Mon, Oct 19, 2020 at 12:43:33PM +0100, Peter Maydell wrote:
> > > Well, ID regs are special in the architecture -- they always exist
> > > and must RAZ/WI, even if they're not actually given any fields yet.
> > > This is different from other "unused" parts of the system register
> > > encoding space, which UNDEF.
> >
> > Table D12-2 confirms the register should be RAZ, as it says the register
> > is "RO, but RAZ if SVE is not implemented". Does "RO" imply "WI", though?
> > For the guest we inject an exception on writes, and for userspace we
> > require the value to be preserved on write.
> 
> Sorry, I mis-spoke. They're RAZ, but not WI, just RO (which is to say
> they'll UNDEF if you try to write to them).
> 
> > I think we should follow the spec, even for userspace access, and be RAZ
> > for when the feature isn't implemented. As for writes, assuming the
> > exception injection is what we want for the guest (not WI), then that's
> > correct. For userspace, I think we should continue forcing preservation
> > (which will force preservation of zero when it's RAZ).
> 
> Yes, that sounds right.

[...]

> > > The problem is that you've actually removed registers from
> > > the list that were previously in it (because pre-SVE
> > > kernels put this ID register in the list as a RAZ/WI register,
> > > and now it's not in the list if SVE isn't supported).

Define "previously", though.  IIUC, the full enumeration was added in
v4.15 (with ID_AA64ZFR0_EL1 still not supported at all):

v4.15-rc1~110^2~27
93390c0a1b20 ("arm64: KVM: Hide unsupported AArch64 CPU features from guests")


And then ID_AA64FZR0_EL1 was removed from the enumeration, also in
v4.15:

v4.15-rc1~110^2~5
07d79fe7c223 ("arm64/sve: KVM: Hide SVE from CPU features exposed to guests")


So, are there really two upstram kernel tags that are mismatched on
this, or is this just a bisectability issue in v4.14..v4.15?

It's a while since I looked at this, and I may have misunderstood the
timeline.


> > > > So, I think that instead of changing the ID_AA64ZFR0_EL1 behaviour,
> > > > parhaps we should move all ID_UNALLOCATED() regs (and possibly
> > > > ID_HIDDEN(), not sure about that) to have REG_HIDDEN_USER visibility.
> > >
> > > What does this do as far as the user-facing list-of-registers
> > > is concerned? All these registers need to remain in the
> > > KVM_GET_REG_LIST list, or you break migration from an old
> > > kernel to a new one.

OK, I think I see where you are coming from, now.

It may make sense to get rid of the REG_HIDDEN_GUEST / REG_HIDDEN_USER
distinction, and provide the same visibility for userspace as for MSR/
MRS all the time.  This would restore ID_AA64ZFR0_EL1 into the userspace
view, and may also allow a bit of simplification in the code.

Won't this will still break migration from the resulting kernel to a
current kernel that hides ID_AA64ZFR0_EL1?  Or have I misunderstood
something.

Cheers
---Dave



Re: Kernel patch cases qemu live migration failed.

2020-10-19 Thread Dave Martin
On Mon, Oct 19, 2020 at 11:25:25AM +0200, Andrew Jones wrote:
> On Thu, Oct 15, 2020 at 03:57:02PM +0100, Peter Maydell wrote:
> > On Thu, 15 Oct 2020 at 15:41, Andrew Jones  wrote:
> > > The reporter states neither the source nor destination hardware supports
> > > SVE. My guess is that what's happening is the reserved ID register
> > > ID_UNALLOCATED(4,4) was showing up in the KVM_GET_REG_LIST count on
> > > the old kernel, but the new kernel filters it out. Maybe it is a
> > > bug to filter it out of the count, as it's a reserved ID register and
> > > I suppose the other reserved ID registers are still showing up?
> > 
> > Yeah, RES0 ID registers should show up in the list, because otherwise
> > userspace has to annoyingly special case them when the architecture
> > eventually defines behaviour for them.
> > 
> > Dave's comment in the kernel commit message
> > # ID_AA64ZFR0_EL1 is RO-RAZ for MRS/MSR when SVE is disabled for the
> > # guest, but for compatibility with non-SVE aware KVM implementations
> > # the register should not be enumerated at all for KVM_GET_REG_LIST
> > # in this case.
> > seems wrong to me -- for compatibility the register should remain
> > present and behave as RAZ/WI if SVE is disabled in the guest,
> > the same way it was before the kernel/KVM knew about SVE at all.
> 
> Yup, I agree with you and I'll try writing a patch for this.
> 
> Thanks,
> drew

I'm not quite sure about Peter's assessment here.

I agree with the inconsistency identified here: we always enumerate all
unallocated ID regs, but we enumerate ID_AA64ZFR0_EL1 conditionally.
This doesn't feel right: on a non-SVE guest, ID_AA64ZFR0_EL1 should
behave exactly as an unallocated ID register.

I'm not sure about the proposed fix.

For one thing, I'm not sure that old hosts will accept writing of 0 to
arbitrary ID regs.  This may require some digging, but commit
93390c0a1b20 ("arm64: KVM: Hide unsupported AArch64 CPU features from guests")
may be the place to start.

My original idea was that at the source end we should be conservative:
enumerate and dump the minimum set of registers relevant to the
target -- for compatibility with old hosts that don't handle the
unallocated ID regs at all.  At the destination end, modern hosts
should be permissive, i.e., allow any ID reg to be set to 0, but don't
require the setting of any reg that older source hosts might not send.

So, I think that instead of changing the ID_AA64ZFR0_EL1 behaviour,
parhaps we should move all ID_UNALLOCATED() regs (and possibly
ID_HIDDEN(), not sure about that) to have REG_HIDDEN_USER visibility.

Thoughts?

---Dave



Re: [PATCH v3 RESEND] fcntl: Add 32bit filesystem mode

2020-10-13 Thread Dave Martin
On Tue, Oct 13, 2020 at 12:06:20AM +0200, Linus Walleij wrote:
> It was brought to my attention that this bug from 2018 was
> still unresolved: 32 bit emulators like QEMU were given
> 64 bit hashes when running 32 bit emulation on 64 bit systems.
> 
> This adds a flag to the fcntl() F_GETFD and F_SETFD operations
> to set the underlying filesystem into 32bit mode even if the
> file handle was opened using 64bit mode without the compat
> syscalls.
> 
> Programs that need the 32 bit file system behavior need to
> issue a fcntl() system call such as in this example:
> 
>   #define FD_32BIT_MODE 2
> 
>   int main(int argc, char** argv) {
> DIR* dir;
> int err;
> int fd;
> 
> dir = opendir("/boot");
> fd = dirfd(dir);
> err = fcntl(fd, F_SETFD, FD_32BIT_MODE);
> if (err) {
>   printf("fcntl() failed! err=%d\n", err);
>   return 1;
> }
> printf("dir=%p\n", dir);
> printf("readdir(dir)=%p\n", readdir(dir));
> printf("errno=%d: %s\n", errno, strerror(errno));
> return 0;
>   }
> 
> This can be pretty hard to test since C libraries and linux
> userspace security extensions aggressively filter the parameters
> that are passed down and allowed to commit into actual system
> calls.
> 
> Cc: Florian Weimer 
> Cc: Peter Maydell 
> Cc: Andy Lutomirski 
> Suggested-by: Theodore Ts'o 
> Link: https://bugs.launchpad.net/qemu/+bug/1805913
> Link: https://lore.kernel.org/lkml/87bm56vqg4@mid.deneb.enyo.de/
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205957
> Signed-off-by: Linus Walleij 
> ---
> ChangeLog v3->v3 RESEND 1:
> - Resending during the v5.10 merge window to get attention.
> ChangeLog v2->v3:
> - Realized that I also have to clear the flag correspondingly
>   if someone ask for !FD_32BIT_MODE after setting it the
>   first time.
> ChangeLog v1->v2:
> - Use a new flag FD_32BIT_MODE to F_GETFD and F_SETFD
>   instead of a new fcntl operation, there is already a fcntl
>   operation to set random flags.
> - Sorry for taking forever to respin this patch :(
> ---
>  fs/fcntl.c   | 7 +++
>  include/uapi/asm-generic/fcntl.h | 8 
>  2 files changed, 15 insertions(+)
> 
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 19ac5baad50f..6c32edc4099a 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -335,10 +335,17 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned 
> long arg,
>   break;
>   case F_GETFD:
>   err = get_close_on_exec(fd) ? FD_CLOEXEC : 0;
> + /* Report 32bit file system mode */
> + if (filp->f_mode & FMODE_32BITHASH)
> + err |= FD_32BIT_MODE;
>   break;
>   case F_SETFD:
>   err = 0;
>   set_close_on_exec(fd, arg & FD_CLOEXEC);
> + if (arg & FD_32BIT_MODE)
> + filp->f_mode |= FMODE_32BITHASH;
> + else
> + filp->f_mode &= ~FMODE_32BITHASH;

This seems inconsistent?  F_SETFD is for setting flags on a file
descriptor.  Won't setting a flag on filp here instead cause the
behaviour to change for all file descriptors across the system that are
open on this struct file?  Compare set_close_on_exec().

I don't see any discussion on whether this should be an F_SETFL or an
F_SETFD, though I see F_SETFD was Ted's suggestion originally.

[...]

Cheers
---Dave



Re: [Qemu-devel] [PATCH v7 4/6] include/elf: Add defines related to GNU property notes for AArch64

2019-08-20 Thread Dave Martin
On Tue, Aug 20, 2019 at 04:59:50PM +0100, Richard Henderson wrote:
> On 8/20/19 8:39 AM, Peter Maydell wrote:
> > On Sat, 3 Aug 2019 at 22:08, Richard Henderson
> >  wrote:
> >>
> >> These are all of the defines required to parse
> >> GNU_PROPERTY_AARCH64_FEATURE_1_AND, copied from binutils.
> >> Other missing defines related to other GNU program headers
> >> and notes are elided for now.
> >>
> >> Signed-off-by: Richard Henderson 
> > 
> > What's the authoritative source for these definitions? I
> > tried looking in glibc, binutils and the kernel without
> > any luck.
> 
> Presumably the true "authoritative" source is an ARM document, but I don't 
> have
> that handy.
> 
> For binutils, the defines are in include/elf/common.h:
> 
> https://sourceware.org/git/?p=binutils-gdb.git;a=blob_plain;f=include/elf/common.h;hb=HEAD
> 
> The upstream kernel is also lacking the defines, as they're part of the ARM
> patch set that is still in flight.  The defines are still not present in glibc
> as of today.


The AArch64 spec is here:

https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2019q2-documentation

Cheers
---Dave



Re: [Qemu-devel] [RFC] Add virtual SDEI support in qemu

2019-07-16 Thread Dave Martin
On Mon, Jul 15, 2019 at 03:44:46PM +0100, Mark Rutland wrote:
> On Mon, Jul 15, 2019 at 03:26:39PM +0100, James Morse wrote:
> > On 15/07/2019 14:48, Mark Rutland wrote:
> > > On Mon, Jul 15, 2019 at 02:41:00PM +0100, Dave Martin wrote:
> > >> One option (suggested to me by James Morse) would be to allow userspace
> > >> to disable in the in-kernel PSCI implementation and provide its own
> > >> PSCI to the guest via SMC -- in which case userspace that wants to
> > >> implement SDEI would have to implement PSCI as well.
> > > 
> > > I think this would be the best approach, since it puts userspace in
> > > charge of everything.
> > > 
> > > However, this interacts poorly with FW-based mitigations that we
> > > implement in hyp. I suspect we'd probably need a mechanism to delegate
> > > that responsibility back to the kernel, and figure out if that has any
> > > interaction with thigns that got punted to userspace...
> > 
> > This has come up before:
> > https://lore.kernel.org/r/59c139d0.3040...@arm.com
> > 
> > I agree Qemu should opt-in to this, it needs to be a feature that is 
> > enabled.
> > 
> > I had an early version of something like this for testing SDEI before
> > there was firmware available. The review feedback from Christoffer was
> > that it should include HVC and SMC, their immediates, and shouldn't be
> > tied to SMC-CC ranges.
> > 
> > I think this should be a catch-all as Heyi describes to deliver
> > 'unhandled SMC/HVC' to user-space as hypercall exits. We should
> > include the immediate in the struct.
> > 
> > We can allow Qemu to disable the in-kernel PSCI implementation, which
> > would let it be done in user-space via this catch-all mechanism. (PSCI
> > in user-space has come up on another thread recently). The in-kernel
> > PSCI needs to be default-on for backwards compatibility.
> > 
> > As Mark points out, the piece that's left is the 'arch workaround'
> > stuff. We always need to handle these in the kernel. I don't think
> > these should be routed-back, they should be un-obtainable by
> > user-space.
> 
> Sure; I meant that those should be handled in the kernel rather than
> going to host userspace and back.
> 
> I was suggesting was that userspace would opt into taking ownership of
> all HVC calls, then explicitly opt-in to the kernel handling specific
> (sets of) calls.

The most logical thing to do would be to have userspace handle all
calls, but add an ioctl to forward a call to KVM.  This puts userspace
in charge of the SMCCC interface, with KVM handling only those things
that userspace can't do for itself, on request.

If the performance overhead is unacceptable for certain calls, we could
have a way to delegate specific function IDs to KVM.  I suspect that
will be the exception rather than the rule.

> There are probably issues with that, but I suspect defining "all
> undandled calls" will be problematic otherwise.

Agreed: the set of calls not handled by KVM will mutate over time.

Cheers
---Dave



Re: [Qemu-devel] [RFC] Add virtual SDEI support in qemu

2019-07-16 Thread Dave Martin
On Mon, Jul 15, 2019 at 02:48:49PM +0100, Mark Rutland wrote:
> On Mon, Jul 15, 2019 at 02:41:00PM +0100, Dave Martin wrote:

[...]

> > So long as KVM_EXIT_HYPERCALL reports sufficient information so that
> > userspace can identify the cause as an SMC and retrieve the SMC
> > immediate field, this seems feasible.
> > 
> > For its own SMCCC APIs, KVM exclusively uses HVC, so rerouting SMC to
> > userspace shouldn't conflict.
> 
> Be _very_ careful here! In systems without EL3 (and without NV), SMC
> UNDEFs rather than trapping to EL2. Given that, we shouldn't build a
> hypervisor ABI that depends on SMC.

Good point.  I was hoping that was all ancient history by now, but if
not...

[...]

Cheers
---Dave



Re: [Qemu-devel] [RFC] Add virtual SDEI support in qemu

2019-07-15 Thread Dave Martin
On Sat, Jul 13, 2019 at 05:53:57PM +0800, Guoheyi wrote:
> Hi folks,
> 
> Do it make sense to implement virtual SDEI in qemu? So that we can have the
> standard way for guest to handle NMI watchdog, RAS events and something else
> which involves SDEI in a physical ARM64 machine.
> 
> My basic idea is like below:
> 
> 1. Change a few lines of code in kvm to allow unhandled SMC invocations
> (like SDEI) to be sent to qemu, with exit reason of KVM_EXIT_HYPERCALL, so
> we don't need to add new API.

So long as KVM_EXIT_HYPERCALL reports sufficient information so that
userspace can identify the cause as an SMC and retrieve the SMC
immediate field, this seems feasible.

For its own SMCCC APIs, KVM exclusively uses HVC, so rerouting SMC to
userspace shouldn't conflict.

This bouncing of SMCs to userspace would need to be opt-in, otherwise
old userspace would see exits that it doesn't know what to do with.

> 2. qemu handles supported SDEI calls just as the spec says for what a
> hypervisor should do for a guest OS.
> 
> 3. For interrupts bound to hypervisor, qemu should stop injecting the IRQ to
> guest through KVM, but jump to the registered event handler directly,
> including context saving and restoring. Some interrupts like virtual timer
> are handled by kvm directly, so we may refuse to bind such interrupts to
> SDEI events.

Something like that.

Interactions between SDEI and PSCI would need some thought: for example,
after PSCI_CPU_ON, the newly online cpu needs to have SDEs masked.

One option (suggested to me by James Morse) would be to allow userspace
to disable in the in-kernel PSCI implementation and provide its own
PSCI to the guest via SMC -- in which case userspace that wants to
implement SDEI would have to implement PSCI as well.

There may be reasons why this wouldn't work ... I haven't thought about
it in depth.

Cheers
---Dave



Re: [Qemu-devel] [PATCH v2 07/14] target/arm/cpu64: max cpu: Introduce sve properties

2019-06-27 Thread Dave Martin
On Thu, Jun 27, 2019 at 12:47:01PM +0100, Andrew Jones wrote:
> On Thu, Jun 27, 2019 at 01:00:27PM +0200, Auger Eric wrote:
> > Hi,
> > 
> > On 6/27/19 12:46 PM, Andrew Jones wrote:
> > > On Wed, Jun 26, 2019 at 06:56:54PM +0200, Auger Eric wrote:
> > >>> diff --git a/target/arm/helper.c b/target/arm/helper.c
> > >>> index f500ccb6d31b..b7b719dba57f 100644
> > >>> --- a/target/arm/helper.c
> > >>> +++ b/target/arm/helper.c
> > >>> @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const 
> > >>> ARMCPRegInfo *ri,
> > >>>  
> > >>>  /* Bits other than [3:0] are RAZ/WI.  */
> > >>>  QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> > >>> -raw_write(env, ri, value & 0xf);
> > >>> +value &= 0xf;
> > >>> +
> > >>> +if (value) {
> > >>> +/* get next vq that is smaller than or equal to value's vq */
> > >>> +uint32_t vq = value + 1;
> > >>> +vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
> > >>> +value = vq - 1;
> > >> spec says:
> > >>
> > >> "if an unsupported vector length is requested in ZCR_ELx, the
> > >> implementation is required to select the largest
> > >> supported vector length that is less than the requested length. This
> > >> does not alter the value of ZCR_ELx.LEN.
> > >> "
> > >>
> > >> So I understand the value written in the reg should not be unmodified.
> > >>
> > > 
> > > Sorry, I can't parse what you're trying to tell me here. Here we have
> > > to write 'value', because that's what the guest is trying to do. As the
> > > spec says in your quote, we have to pick the length the guest wants, or
> > > the next smaller valid one, so that's what the code above does. So are
> > > you just stating that you agree with this hunk of the code?
> > What we are writing into the reg is arm_cpu_vq_map_next_smaller(cpu, vq
> > + 1) -1. Maybe I misunderstand the whole wording but I would have
> > expected the original unmodified value to be written in the reg instead?
> 
> Hmm... So maybe we need more changes to the emulation in order for it to
> have an acting value and a register value? Maybe Richard knows what we
> should do here.

The "effective" value of an individual ZCR_ELx.LEN field is a bit
nonsensical: the effective vector length comes from the minimum
LEN value at any relevant EL (depending on which ELs are implemented,
which security state you're in, VHE controls etc.).

This is tedious to compute, so I'd expect you want to cache it,
recomputing it only when a ZCR_ELx.LEN changes or when you switch to a
different EL.


The architecture says:

"For all purposes other than returning the result of a direct read of
ZCR_EL1 then this field behaves as if it is set to the minimum of the
stored value and the constrained length inherited from more privileged
Exception levels in the current Security state, rounded down to the
nearest implemented vector length."

I think the behaviour of a direct read is implied: the LEN bits yielded
by an MRS should contain exactly what was last written to them via MSR.

Cheers
---Dave



Re: [Qemu-devel] [PATCH v2 10/14] target/arm/kvm64: Add kvm_arch_get/put_sve

2019-06-27 Thread Dave Martin
On Thu, Jun 27, 2019 at 12:26:06PM +0100, Richard Henderson wrote:
> On 6/27/19 12:59 PM, Dave Martin wrote:
> >> It's a shame that these slices exist at all.  It seems like the kernel 
> >> could
> >> use the negotiated max sve size to grab the data all at once.
> > 
> > The aim here was to be forwards compatible while fitting within the
> > existing ABI.
> > 
> > The ABI doesn't allow variable-sized registers, and if the vq can
> > someday grow above 16 then the individual registers could become pretty
> > big.
> 
> The ABI doesn't appear to have fixed sized data blocks.  Since that's
> the case, it didn't seem to me that variable sized blocks was so
> different, given that the actual size is constrained by the hardware
> on which we're running.

I'm not sure what you mean here.

For KVM_GET_ONE_REG, the size is determined by the reg size field in
the register ID, so size is deemed to be a fixed property of a
particular register.

Having the register IDs vary according to the vector length seemed a
step too far.

> And if VQ does grow large, then do we really want oodles of syscalls in order
> to transfer the data for each register?  With the 9 bits reserved for this
> field, we could require a maximum of 1024 syscalls to transfer the whole
> register set.

A save/restore requires oodles of syscalls in any case, and for SVE
there is a rapid dropoff of probabilities: VQ < 16 is much likelier than
VQ == 32 is likelier than VQ == 64 etc.

The reg access API has some shortcomings, and we might find at some
point that the whole thing needs redesigning.

I suppose we could have taken the view that the KVM ABI would not even
try to support VQ > 16 in a forwards compatible way.  In the end we
decided to at least have things workable.

Either way, it's entirely reasonable for userspace not to try to support
additional slices for now.  We'll have plenty of time to plan away
across that bridge when we spot it on the horizon...

> > It's for QEMU to choose, but does it actually matter what byteorder you
> > store a Z-reg or P-reg in?  Maybe the byteswap isn't really needed.
> 
> I think the only sensible order for the kernel is that in which LDR/STR itself
> saves the data.  Which is a byte stream.

We have a choice of STRs though.  Anyway, yes, it is the way it is, now.

> Within QEMU, it has so far made sense to keep the data in 64-bit hunks in
> host-endian order.  That's how the AdvSIMD code was written originally, and it
> turned out to be easy enough to continue that for SVE.

Fair enough.  It's entirely up to QEMU to decide -- I just wanted to
check that there was no misunderstanding about this issue in the ABI.

> Which does mean that if we want to pass data to the kernel as the
> aforementioned byte stream that a big-endian host does need to bswap the data
> in 64-bit hunks.
> 
> > I don't know how this works when migrating from a little-endian to a
> > big-endian host or vice-versa (or if that is even supported...)
> 
> The data is stored canonically within the vmsave, so such migrations are
> supposed to work.

Right, I was wondering about that.  Could be fun to test :)

Cheers
---Dave



Re: [Qemu-devel] [PATCH v2 10/14] target/arm/kvm64: Add kvm_arch_get/put_sve

2019-06-27 Thread Dave Martin
On Wed, Jun 26, 2019 at 04:22:34PM +0100, Richard Henderson wrote:
> On 6/21/19 6:34 PM, Andrew Jones wrote:
> > +/*
> > + * If ARM_MAX_VQ is increased to be greater than 16, then we can no
> > + * longer hard code slices to 1 in kvm_arch_put/get_sve().
> > + */
> > +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> 
> This seems easy to fix, or simply drop the slices entirely for now, as
> otherwise they are a teeny bit confusing.
> 
> It's a shame that these slices exist at all.  It seems like the kernel could
> use the negotiated max sve size to grab the data all at once.

The aim here was to be forwards compatible while fitting within the
existing ABI.

The ABI doesn't allow variable-sized registers, and if the vq can
someday grow above 16 then the individual registers could become pretty
big.

Inside the kernel, we took the view that if that ever happens, it's
sufficiently far out that we can skip implementing the support today,
providing that the ABI can accommodate the change.

For qemu, if you don't actually care what's in the regs, you could just
enumerate then using KVM_GET_REG_LIST instead of manufacturing the IDs
by hand.  That way, you don't have to care what slices exist.  For
save/restore/migrate purposes, the regs are just data, so that's
probably enough.

Debug, and exchanging vector registers between the guest and, say, and
SMC trapped to userspace, would still need to examine specific regs.
I don't know what QEMU does in this area though.

> > +for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) {
> > +uint64_t *q = aa64_vfp_qreg(env, n);
> > +#ifdef HOST_WORDS_BIGENDIAN
> > +uint64_t d[ARM_MAX_VQ * 2];
> > +int j;
> > +for (j = 0; j < cpu->sve_max_vq * 2; j++) {
> > +d[j] = bswap64(q[j]);
> > +}
> > +reg.addr = (uintptr_t)d;
> > +#else
> > +reg.addr = (uintptr_t)q;
> > +#endif
> > +reg.id = KVM_REG_ARM64_SVE_ZREG(n, i);
> > +ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, );
> 
> It might be worth splitting this...
> 
> > +for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; n++) {
> > +uint64_t *q = >vfp.pregs[n].p[0];
> > +#ifdef HOST_WORDS_BIGENDIAN
> > +uint64_t d[ARM_MAX_VQ * 2 / 8];
> > +int j;
> > +for (j = 0; j < cpu->sve_max_vq * 2 / 8; j++) {
> > +d[j] = bswap64(q[j]);
> > +}
> > +reg.addr = (uintptr_t)d;
> > +#else
> > +reg.addr = (uintptr_t)q;
> > +#endif
> > +reg.id = KVM_REG_ARM64_SVE_PREG(n, i);
> > +ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, );

It's for QEMU to choose, but does it actually matter what byteorder you
store a Z-reg or P-reg in?  Maybe the byteswap isn't really needed.

I don't know how this works when migrating from a little-endian to a
big-endian host or vice-versa (or if that is even supported...)

[...]

Cheers
---Dave



Re: [Qemu-devel] [PATCH v2 12/14] target/arm/kvm: scratch vcpu: Preserve input kvm_vcpu_init features

2019-06-27 Thread Dave Martin
On Thu, Jun 27, 2019 at 08:30:57AM +0100, Auger Eric wrote:
> Hi Drew,
> 
> On 6/21/19 6:34 PM, Andrew Jones wrote:
> > kvm_arm_create_scratch_host_vcpu() takes a struct kvm_vcpu_init
> > parameter. Rather than just using it as an output parameter to
> > pass back the preferred target, use it also as an input parameter,
> > allowing a caller to pass a selected target if they wish and to
> > also pass cpu features. If the caller doesn't want to select a
> > target they can pass -1 for the target which indicates they want
> > to use the preferred target and have it passed back like before.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  target/arm/kvm.c   | 20 +++-
> >  target/arm/kvm32.c |  6 +-
> >  target/arm/kvm64.c |  6 +-
> >  3 files changed, 25 insertions(+), 7 deletions(-)
> > 
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > index 60645a196d3d..66c0c198604a 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -64,7 +64,7 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
> > *cpus_to_try,
> >int *fdarray,
> >struct kvm_vcpu_init *init)
> >  {
> > -int ret, kvmfd = -1, vmfd = -1, cpufd = -1;
> > +int ret = 0, kvmfd = -1, vmfd = -1, cpufd = -1;
> >  
> >  kvmfd = qemu_open("/dev/kvm", O_RDWR);
> >  if (kvmfd < 0) {
> > @@ -84,7 +84,14 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
> > *cpus_to_try,
> >  goto finish;
> >  }
> >  
> > -ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, init);
> > +if (init->target == -1) {
> > +struct kvm_vcpu_init preferred;
> > +
> > +ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, );
> > +if (!ret) {
> > +init->target = preferred.target;
> wouldn't it be safe to copy the whole struct. Kernel code says:
> /*
>  * For now, we don't return any features.
>  * In future, we might use features to return target
>  * specific features available for the preferred
>  * target type.
>  */

Marc or Christoffer should preferably comment on this.

I think the spirit of the ABI is that can use the whole return of
KVM_ARM_PREFERRED_TARGET as a reasonable template for KVM_VCPU_INIT
without it blowing up in your face.

I initially tried to report SVE as available through this route,
but we decided against it precisely because userspace might be doing
the above.

[...]

Cheers
---Dave



Re: [Qemu-devel] [PATCH v2 10/14] target/arm/kvm64: Add kvm_arch_get/put_sve

2019-06-24 Thread Dave Martin
On Mon, Jun 24, 2019 at 12:55:53PM +0100, Andrew Jones wrote:
> On Mon, Jun 24, 2019 at 12:05:35PM +0100, Dave Martin wrote:
> > On Fri, Jun 21, 2019 at 05:34:18PM +0100, Andrew Jones wrote:
> > > These are the SVE equivalents to kvm_arch_get/put_fpsimd. Note, the
> > > swabbing is different than it is for fpsmid because the vector format
> > > is a little-endian stream of words.
> > 
> > Note, on big-endian hosts the FPSIMD view Vn and the SVE view Zn[127:0]
> > of the FPSIMD/SVE common register bits has the opposite endianness for
> > SVE_{GET,SET}_ONE_REG.
> > 
> > This only matters if mixing the two views: just from this patch I don't
> > know whether this is an issue for QEMU or not.
> 
> I don't know either. My experience with the emulation side of QEMU is
> mostly the zcr_write tweak in this series. And, TBH, I didn't put too
> much thought into the endianness stuff, nor test this series with big
> endian.

Neither did I (at least beyond the "does it boot" level) -- hence the
bug ;)

And of course, few people are using big-endian, so nobody complained.

Just flagging it up so it doesn't get missed!

> Hopefully Richard can chime in on this.

It would be interesting to know whether you do hit this issue
somewhere, or whether you have a strong view about the clarification to
the KVM ABI.

Cheers
---Dave



Re: [Qemu-devel] [PATCH v2 07/14] target/arm/cpu64: max cpu: Introduce sve properties

2019-06-24 Thread Dave Martin
On Mon, Jun 24, 2019 at 01:10:41PM +0100, Andrew Jones wrote:
> On Mon, Jun 24, 2019 at 01:49:11PM +0200, Andrew Jones wrote:
> > On Mon, Jun 24, 2019 at 12:05:26PM +0100, Dave Martin wrote:
> > > On Fri, Jun 21, 2019 at 05:34:15PM +0100, Andrew Jones wrote:
> > > > Introduce cpu properties to give fine control over SVE vector lengths.
> > > > We introduce a property for each valid length up to the current
> > > > maximum supported, which is 2048-bits. The properties are named, e.g.
> > > > sve128, sve256, sve512, ..., where the number is the number of bits.
> > > > 
> > > > It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
> > > > to provide a guest vector lengths 128, 256, and 512 bits. The resulting
> > > > set must conform to the architectural constraint of having all 
> > > > power-of-2
> > > > lengths smaller than the maximum length present. It's also possible to
> > > > only provide sve properties, e.g. -cpu max,sve512=on. That
> > > > example provides the machine with 128, 256, and 512 bit vector lengths.
> > > > It doesn't hurt to explicitly ask for all expected vector lengths,
> > > > which is what, for example, libvirt should do.
> > > > 
> > > > Note1, it is not possible to use sve properties before
> > > > sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
> > > > that overly complicates the user input validation.
> > > > 
> > > > Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
> > > > same as -cpu max,sve512=on, they are not. The former enables all vector
> > > > lengths 512 bits and smaller, while the latter only sets the 512-bit
> > > > length and its smaller power-of-2 lengths. It's probably best not to use
> > > > sve-max-vq with sve properties, but it can't be completely
> > > > forbidden as we want qmp_query_cpu_model_expansion to work with guests
> > > > launched with e.g. -cpu max,sve-max-vq=8 on their command line.
> > > 
> > > Supporting both options together in a non-idempotent way seems to
> > > complicate things.
> > > 
> > > Would it be simpler to allow sve-max-vq only when there are no sve
> > > options?
> > 
> > Not really. Since we can't simply remove sve-max-vq from the 'max' cpu
> > type, then we'd still need conditions to check when we can use it. The
> > restriction that it has to come first reduces the complexity
> > substantially, and I think restricting to not being allowed
> > at all, when sve are used, would only give us a net check
> > reduction of one or two.
> > 
> > > 
> > > Alternatively, the two options would be defined so that their meanings
> > > are independent of parse order.
> > > 
> > > So, way sve-max-vq= means that:
> > > 
> > >  * all VQs up to max for which there is no corresponding sve=off,
> > >are enabled; and
> > > 
> > >  * VQ max is enabled; and
> > > 
> > >  * all VQs exceeding max are disabled.
> > > 
> > > While sve=(on|off) means
> > > 
> > >  * the VQ coresponding to  is enabled (for on) or disabled (for
> > >off).
> > >  
> > > After parsing all the options, you check that the sve-max-vq and
> > > sve optinos are consistent.  If you disallow duplicate sve-max-vq
> > > or sve options, then there is no possibility of ambiguity and the
> > > order or options doesn't matter.
> > 
> > I don't want to put any final checks at the end of parsing all options,
> > because that won't work with the QMP query.

Ack, I'm not familiar with the QMP side of things.

> Actually, I think I can allow sve-max-vq to come after sve
> without adding much code, and without requiring a final check. I
> could try that if you'd like, but I'm not sure it's worth it. Also, I
> feel like I tried this once already and rejected it for some reason,
> but atm I can't remember why.

Probably not.  I was aiming to simplify things, but if my suggestions
don't make anything simpler, then it's obviously not worth it :)

> > > (There may be constraints on the way qemu options parsing works that
> > > make this hard, though...)
> > 
> > I can't think of any issue with the parsing, but the QMP query only using
> > the property get accessors, without any finalizing, does put constraints
> > on what we can do.
> > 
> > > Having sve-max-vq in 128-bit units while sve are named in terms of
&g

Re: [Qemu-devel] [PATCH v2 05/14] target/arm/helper: zcr: Add build bug next to value range assumption

2019-06-24 Thread Dave Martin
On Mon, Jun 24, 2019 at 12:30:37PM +0100, Andrew Jones wrote:
> On Mon, Jun 24, 2019 at 12:05:07PM +0100, Dave Martin wrote:
> > On Fri, Jun 21, 2019 at 05:34:13PM +0100, Andrew Jones wrote:
> > 
> > The purpose of this check should probably at least be described in a
> > comment -- i.e., what actually depends on this?
> 
> I was thinking the already present "Bits other than [3:0] are RAZ/WI."
> explained that, but how about this for an improvement?
> 
> /*
>  * Only the lowest 4 bits of ZCR_ELx may be used to constrain the vector
>  * length, the rest of the bits are RAZ/WI. Since the vector length of
>  * 128-bits (1 in quadwords) is represented as zero in ZCR_ELx, and all
>  * vector lengths are represented as their length in quadwords minus 1,
>  * then four bits allow up to quadword 16 to be selected.
>  */

No, maybe the existing comment is enough.

I thought there might be more code elsewhere that assumes that checks
sve_max_vq <= ARM_MAX_VQ then then assumes that sve_max_vq <= 16.  But
if not, we probably don't need an additional comment here.

I haven't tried to understand all the code in the series beyond the
user/kernel interactions, so maybe I was just paranoid.

[...]

Cheers
---Dave



Re: [Qemu-devel] [PATCH v2 07/14] target/arm/cpu64: max cpu: Introduce sve properties

2019-06-24 Thread Dave Martin
On Fri, Jun 21, 2019 at 05:34:15PM +0100, Andrew Jones wrote:
> Introduce cpu properties to give fine control over SVE vector lengths.
> We introduce a property for each valid length up to the current
> maximum supported, which is 2048-bits. The properties are named, e.g.
> sve128, sve256, sve512, ..., where the number is the number of bits.
> 
> It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
> to provide a guest vector lengths 128, 256, and 512 bits. The resulting
> set must conform to the architectural constraint of having all power-of-2
> lengths smaller than the maximum length present. It's also possible to
> only provide sve properties, e.g. -cpu max,sve512=on. That
> example provides the machine with 128, 256, and 512 bit vector lengths.
> It doesn't hurt to explicitly ask for all expected vector lengths,
> which is what, for example, libvirt should do.
> 
> Note1, it is not possible to use sve properties before
> sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
> that overly complicates the user input validation.
> 
> Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
> same as -cpu max,sve512=on, they are not. The former enables all vector
> lengths 512 bits and smaller, while the latter only sets the 512-bit
> length and its smaller power-of-2 lengths. It's probably best not to use
> sve-max-vq with sve properties, but it can't be completely
> forbidden as we want qmp_query_cpu_model_expansion to work with guests
> launched with e.g. -cpu max,sve-max-vq=8 on their command line.

Supporting both options together in a non-idempotent way seems to
complicate things.

Would it be simpler to allow sve-max-vq only when there are no sve
options?

Alternatively, the two options would be defined so that their meanings
are independent of parse order.

So, way sve-max-vq= means that:

 * all VQs up to max for which there is no corresponding sve=off,
   are enabled; and

 * VQ max is enabled; and

 * all VQs exceeding max are disabled.

While sve=(on|off) means

 * the VQ coresponding to  is enabled (for on) or disabled (for
   off).
 
After parsing all the options, you check that the sve-max-vq and
sve optinos are consistent.  If you disallow duplicate sve-max-vq
or sve options, then there is no possibility of ambiguity and the
order or options doesn't matter.

(There may be constraints on the way qemu options parsing works that
make this hard, though...)

Having sve-max-vq in 128-bit units while sve are named in terms of
bit counts also feels a bit odd now.

[...]

Cheers
---Dave



Re: [Qemu-devel] [PATCH v2 05/14] target/arm/helper: zcr: Add build bug next to value range assumption

2019-06-24 Thread Dave Martin
On Fri, Jun 21, 2019 at 05:34:13PM +0100, Andrew Jones wrote:

The purpose of this check should probably at least be described in a
comment -- i.e., what actually depends on this?

Cheers
---Dave

> Suggested-by: Dave Martin 
> Signed-off-by: Andrew Jones 
> ---
>  target/arm/helper.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index df4276f5f6ca..edba94004e0b 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -5319,6 +5319,7 @@ static void zcr_write(CPUARMState *env, const 
> ARMCPRegInfo *ri,
>  int new_len;
>  
>  /* Bits other than [3:0] are RAZ/WI.  */
> +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
>  raw_write(env, ri, value & 0xf);
>  
>  /*
> -- 
> 2.20.1
> 



Re: [Qemu-devel] [PATCH v2 10/14] target/arm/kvm64: Add kvm_arch_get/put_sve

2019-06-24 Thread Dave Martin
On Fri, Jun 21, 2019 at 05:34:18PM +0100, Andrew Jones wrote:
> These are the SVE equivalents to kvm_arch_get/put_fpsimd. Note, the
> swabbing is different than it is for fpsmid because the vector format
> is a little-endian stream of words.

Note, on big-endian hosts the FPSIMD view Vn and the SVE view Zn[127:0]
of the FPSIMD/SVE common register bits has the opposite endianness for
SVE_{GET,SET}_ONE_REG.

This only matters if mixing the two views: just from this patch I don't
know whether this is an issue for QEMU or not.

The kernel and gdb were recently found to be broken in this regard for
userspace [1] but the KVM interface should be unaffected.

Cheers
---Dave

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64/kernel?id=41040cf7c5f0f26c368bc5d3016fed3a9ca6dba4



Re: [Qemu-devel] [PATCH v6 6/6] tests/tcg/aarch64: Add bti smoke test

2019-06-06 Thread Dave Martin
On Wed, Jun 05, 2019 at 09:57:06PM +0100, Richard Henderson wrote:
> This will build with older toolchains, without the upstream support
> for -mbranch-protection.  Such a toolchain will produce a warning
> in such cases,
> 
> ld: warning: /tmp/ccyZt0kq.o: unsupported GNU_PROPERTY_TYPE (5) \
> type: 0xc000
> 
> but the still places the note at the correct location in the binary
> for processing by the runtime loader.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tests/tcg/aarch64/bti-1.c | 77 +++
>  tests/tcg/aarch64/bti-crt.inc.c   | 69 +++
>  tests/tcg/aarch64/Makefile.target |  3 ++
>  3 files changed, 149 insertions(+)
>  create mode 100644 tests/tcg/aarch64/bti-1.c
>  create mode 100644 tests/tcg/aarch64/bti-crt.inc.c
> 
> diff --git a/tests/tcg/aarch64/bti-1.c b/tests/tcg/aarch64/bti-1.c
> new file mode 100644
> index 00..2aee57ea7a
> --- /dev/null
> +++ b/tests/tcg/aarch64/bti-1.c
> @@ -0,0 +1,77 @@
> +/*
> + * Branch target identification, basic notskip cases.
> + */
> +
> +#include "bti-crt.inc.c"
> +
> +/*
> + * Work around lack of -mbranch-protection=standard in older toolchains.
> + * The signal handler is invoked by the kernel with PSTATE.BTYPE=2, which
> + * means that the handler must begin with a marker like BTI_C.
> + */
> +asm("skip2_sigill1:\n\
> + hint#34\n\
> + b   skip2_sigill2\n\
> +.type skip2_sigill1,%function\n\
> +.size skip2_sigill1,8");
> +
> +extern void skip2_sigill1(int sig, siginfo_t *info, ucontext_t *uc)
> +__attribute__((visibility("hidden")));
> +
> +static void __attribute__((used))
> +skip2_sigill2(int sig, siginfo_t *info, ucontext_t *uc)
> +{
> +uc->uc_mcontext.pc += 8;
> +uc->uc_mcontext.pstate = 1;
> +}
> +
> +#define NOP   "nop"
> +#define BTI_N "hint #32"
> +#define BTI_C "hint #34"
> +#define BTI_J "hint #36"
> +#define BTI_JC"hint #38"
> +
> +#define BTYPE_1(DEST) \
> +asm("mov %0,#1; adr x16, 1f; br x16; 1: " DEST "; mov %0,#0" \
> +: "=r"(skipped) : : "x16")
> +
> +#define BTYPE_2(DEST) \
> +asm("mov %0,#1; adr x16, 1f; blr x16; 1: " DEST "; mov %0,#0" \
> +: "=r"(skipped) : : "x16", "x30")
> +
> +#define BTYPE_3(DEST) \
> +asm("mov %0,#1; adr x15, 1f; br x15; 1: " DEST "; mov %0,#0" \
> +: "=r"(skipped) : : "x15")
> +
> +#define TEST(WHICH, DEST, EXPECT) \
> +do { WHICH(DEST); fail += skipped ^ EXPECT; } while (0)
> +
> +
> +int main()
> +{
> +int fail = 0;
> +int skipped;
> +
> +/* Signal-like with SA_SIGINFO.  */
> +signal_info(SIGILL, skip2_sigill1);
> +
> +TEST(BTYPE_1, NOP, 1);
> +TEST(BTYPE_1, BTI_N, 1);
> +TEST(BTYPE_1, BTI_C, 0);
> +TEST(BTYPE_1, BTI_J, 0);
> +TEST(BTYPE_1, BTI_JC, 0);
> +
> +TEST(BTYPE_2, NOP, 1);
> +TEST(BTYPE_2, BTI_N, 1);
> +TEST(BTYPE_2, BTI_C, 0);
> +TEST(BTYPE_2, BTI_J, 1);
> +TEST(BTYPE_2, BTI_JC, 0);
> +
> +TEST(BTYPE_3, NOP, 1);
> +TEST(BTYPE_3, BTI_N, 1);
> +TEST(BTYPE_3, BTI_C, 1);
> +TEST(BTYPE_3, BTI_J, 0);
> +TEST(BTYPE_3, BTI_JC, 0);
> +
> +return fail;
> +}
> diff --git a/tests/tcg/aarch64/bti-crt.inc.c b/tests/tcg/aarch64/bti-crt.inc.c
> new file mode 100644
> index 00..bb363853de
> --- /dev/null
> +++ b/tests/tcg/aarch64/bti-crt.inc.c
> @@ -0,0 +1,69 @@
> +/*
> + * Minimal user-environment for testing BTI.
> + *
> + * Normal libc is not built with BTI support enabled, and so could
> + * generate a BTI TRAP before ever reaching main.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +int main(void);
> +
> +void _start(void)
> +{
> +exit(main());
> +}
> +
> +void exit(int ret)
> +{
> +register int x0 __asm__("x0") = ret;
> +register int x8 __asm__("x8") = __NR_exit;
> +
> +asm volatile("svc #0" : : "r"(x0), "r"(x8));
> +__builtin_unreachable();
> +}
> +
> +/*
> + * Irritatingly, the user API struct sigaction does not match the
> + * kernel API struct sigaction.  So for simplicity, isolate the
> + * kernel ABI here, and make this act like signal.
> + */
> +void signal_info(int sig, void (*fn)(int, siginfo_t *, ucontext_t *))
> +{
> +struct kernel_sigaction {
> +void (*handler)(int, siginfo_t *, ucontext_t *);
> +unsigned long flags;
> +unsigned long restorer;
> +unsigned long mask;
> +} sa = { fn, SA_SIGINFO, 0, 0 };
> +
> +register int x0 __asm__("x0") = sig;
> +register void *x1 __asm__("x1") = 
> +register void *x2 __asm__("x2") = 0;
> +register int x3 __asm__("x3") = sizeof(unsigned long);
> +register int x8 __asm__("x8") = __NR_rt_sigaction;
> +
> +asm volatile("svc #0"
> + : : "r"(x0), "r"(x1), "r"(x2), "r"(x3), "r"(x8) : "memory");
> +}
> +
> +/*
> + * Create the PT_NOTE that will enable BTI in the page tables.
> + * This will be created by the compiler with -mbranch-protection=standard,
> + * but as of 2019-03-29, this is has not been committed 

Re: [Qemu-devel] [PATCH v6 5/6] linux-user: Parse NT_GNU_PROPERTY_TYPE_0 notes

2019-06-06 Thread Dave Martin
On Wed, Jun 05, 2019 at 09:57:05PM +0100, Richard Henderson wrote:
> For aarch64, this includes the GNU_PROPERTY_AARCH64_FEATURE_1_BTI bit,
> which indicates that the image should be mapped with guarded pages.

Heads-up: for arm64 I plan to move to making PT_GNU_PROPERTY
authoritiative for ELF on linux: if this is present, we use it to find
the note directly and ignore any other notes; if there is no
PT_GNU_PROPERTY entry then we assume there is no NT_GNU_PROPERTY_TYPE_0
note.

This is not quite decided yet, but to avoid fragmentation I'd prefer
qemu and Linux apply the same policy -- I'll keep you in the loop about
the decision.

I think you can reasonable upstream this patch in qemu and then
subsequently make it stricter without an ABI break.  Upstream GNU ld
generates this entry today.

> 
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/elfload.c | 83 +++-
>  1 file changed, 75 insertions(+), 8 deletions(-)
> 
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index a57b7049dd..1a12c60a33 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -2253,7 +2253,7 @@ static void load_elf_image(const char *image_name, int 
> image_fd,
>  struct elfhdr *ehdr = (struct elfhdr *)bprm_buf;
>  struct elf_phdr *phdr;
>  abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
> -int i, retval;
> +int i, retval, prot_exec = PROT_EXEC;
>  const char *errmsg;
>  
>  /* First of all, some simple consistency checks */
> @@ -2288,17 +2288,78 @@ static void load_elf_image(const char *image_name, 
> int image_fd,
>  loaddr = -1, hiaddr = 0;
>  info->alignment = 0;
>  for (i = 0; i < ehdr->e_phnum; ++i) {
> -if (phdr[i].p_type == PT_LOAD) {
> -abi_ulong a = phdr[i].p_vaddr - phdr[i].p_offset;
> +struct elf_phdr *eppnt = phdr + i;
> +
> +if (eppnt->p_type == PT_LOAD) {
> +abi_ulong a = eppnt->p_vaddr - eppnt->p_offset;
>  if (a < loaddr) {
>  loaddr = a;
>  }
> -a = phdr[i].p_vaddr + phdr[i].p_memsz;
> +a = eppnt->p_vaddr + eppnt->p_memsz;
>  if (a > hiaddr) {
>  hiaddr = a;
>  }
>  ++info->nsegs;
> -info->alignment |= phdr[i].p_align;
> +info->alignment |= eppnt->p_align;
> +} else if (eppnt->p_type == PT_NOTE) {
> +#ifdef TARGET_AARCH64
> +/*
> + * Process NT_GNU_PROPERTY_TYPE_0.
> + *
> + * TODO: The only item that is AArch64 specific is the
> + * GNU_PROPERTY_AARCH64_FEATURE_1_AND processing at the end.
> + * If we were to ever process GNU_PROPERTY_X86_*, all of the
> + * code through checking the gnu0 magic number is sharable.
> + * But for now, since this *is* only used by AArch64, don't
> + * process the note elsewhere.
> + */
> +const uint32_t gnu0_magic = const_le32('G' | 'N' << 8 | 'U' << 
> 16);
> +uint32_t note[7];
> +
> +/*
> + * The note contents are 7 words, but depending on LP64 vs ILP32
> + * there may be an 8th padding word at the end.  Check for and
> + * read the minimum size.  Further checks below will validate
> + * that the sizes of everything involved are as we expect.
> + */
> +if (eppnt->p_filesz < sizeof(note)) {
> +continue;
> +}
> +if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) {
> +memcpy(note, bprm_buf + eppnt->p_offset, sizeof(note));
> +} else {
> +retval = pread(image_fd, note, sizeof(note), 
> eppnt->p_offset);
> +if (retval != sizeof(note)) {
> +goto exit_perror;
> +}
> +}

Can we police that the segment alignment matches the ELF class (i.e., 8
for 64-bit, 4 for 32-bit)?

hjl specifies this, but it's controversial and sometimes missed --
there's a bug right now in GNU ld where --force-bti generates a note
with alignment 1.

Due to the high chance of screwing this up, it'd be good to police it
wherever appropriate.

> +#ifdef BSWAP_NEEDED
> +for (i = 0; i < ARRAY_SIZE(note); ++i) {
> +bswap32s(note + i);
> +}
> +#endif
> +/*
> + * Check that this is a NT_GNU_PROPERTY_TYPE_0 note.
> + * Again, descsz includes padding.  Full size validation
> + * awaits checking the final payload.
> + */
> +if (note[0] != 4 ||   /* namesz */
> +note[1] < 12 ||   /* descsz */
> +note[2] != NT_GNU_PROPERTY_TYPE_0 ||  /* type */
> +note[3] != gnu0_magic) {  /* name */
> +continue;
> +}
> +  

Re: [Qemu-devel] [PATCH v6 1/6] linux-user/aarch64: Reset btype for syscalls and signalsy

2019-06-06 Thread Dave Martin
On Wed, Jun 05, 2019 at 09:57:01PM +0100, Richard Henderson wrote:
> The value of btype for syscalls is CONSTRAINED UNPREDICTABLE,
> so we need to make sure that the value is 0 before clone,
> fork, or syscall return.
> 
> The kernel sets btype for the signal handler as if for a call.
> 
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/aarch64/cpu_loop.c |  7 +++
>  linux-user/aarch64/signal.c   | 10 --
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-user/aarch64/cpu_loop.c b/linux-user/aarch64/cpu_loop.c
> index 2f2f63e3e8..1f68b13168 100644
> --- a/linux-user/aarch64/cpu_loop.c
> +++ b/linux-user/aarch64/cpu_loop.c
> @@ -86,6 +86,13 @@ void cpu_loop(CPUARMState *env)
>  
>  switch (trapnr) {
>  case EXCP_SWI:
> +/*
> + * The state of BTYPE on syscall entry is CONSTRAINED
> + * UNPREDICTABLE.  The real kernel will need to tidy this up
> + * as well.  Do this before syscalls so that the value is
> + * correct on return from syscall (especially clone & fork).
> + */
> +env->btype = 0;

Note, after discussion with the architects, I think Linux won't bother
sanitising this field in my next spin of the patches.

If the SVC (or HVC or SMC) sits in a PROT_BTI page, then it won't
execute anyway unless BTYPE is already 0: otherwise, a branch target
exception would be taken instead.

If the insn isn't in a PROT_BTI page, then BTYPE could be nonzero when
the SVC (etc.) exception is taken, but it also won't matter in that case
what BTYPE is on return, since branch target exceptions are never
generated on insns in non-guarded pages.

For this to make a difference:

 a) the SVC be in a non-PROT_BTI page, just before a page boundary,
where the next page is a PROT_BTI page, so that the exception return
goes to the next page.

 b) the SVC must be an mprotect() call or similar that enabled PROT_BTI
for the patch containing the SVC itself.

These are both silly things to do, and we probably don't care what
happens in such cases.

(Question: does qemu ever mprotect() the page containing PC?  I'd hope
not...)

>  ret = do_syscall(env,
>   env->xregs[8],
>   env->xregs[0],
> diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
> index f84a9cf28a..5605d404b3 100644
> --- a/linux-user/aarch64/signal.c
> +++ b/linux-user/aarch64/signal.c
> @@ -506,10 +506,16 @@ static void target_setup_frame(int usig, struct 
> target_sigaction *ka,
>  + offsetof(struct target_rt_frame_record, tramp);
>  }
>  env->xregs[0] = usig;
> -env->xregs[31] = frame_addr;
>  env->xregs[29] = frame_addr + fr_ofs;
> -env->pc = ka->_sa_handler;
>  env->xregs[30] = return_addr;
> +env->xregs[31] = frame_addr;
> +env->pc = ka->_sa_handler;
> +
> +/* Invoke the signal handler as if by indirect call.  */
> +if (cpu_isar_feature(aa64_bti, arm_env_get_cpu(env))) {
> +env->btype = 2;
> +}
> +

Ack.  I had a simple test for this for native userspace, and it seems to
work as desired -- so I don't expect to change it.

Cheers
---Dave



Re: [Qemu-devel] How do we do user input bitmap properties?

2019-05-15 Thread Dave Martin
On Wed, May 15, 2019 at 12:09:04PM +0100, Dr. David Alan Gilbert wrote:
> * Dave Martin (dave.mar...@arm.com) wrote:
> > On Wed, May 15, 2019 at 09:18:54AM +0100, Andrew Jones wrote:
> > > On Tue, May 14, 2019 at 04:48:38PM +0200, Igor Mammedov wrote:
> > > > On Tue, 14 May 2019 11:02:25 +0200
> > > > Andrew Jones  wrote:
> > > > > My thought is primarily machines. If a human wants to use the command
> > > > > line and SVE, then I'm assuming they'll be happy with sve-max-vq or
> > > > > figuring out a map they like once and then sticking to it.
> > > > 
> > > > maybe naive question, but why not use a property/bit as user facing 
> > > > interface,
> > > > in line with what we do with CPUID bits. (that's assuming that bits have
> > > > fixed meaning).
> > > > Yes, it's verbose but follows current practice and works fine with -cpu 
> > > > and
> > > > -device.
> > > > (I really hate custom preprocessing of -cpu and we were working hard to 
> > > > remove
> > > > that in favor of canonical properties at the expense of more verbose 
> > > > CLI).
> > > >
> > > 
> > > Are you asking if we should do something like the following?
> > > 
> > >   -cpu host,sve1=on,sve=2=on,sve3=off,sve4=on
> > 
> > Note, there is nothing SVE-specific about this.
> > 
> > Either enabling features on a per-vcpu basis is justified, or it isn't:
> > if it's justified, then it would be better to have a general way of
> > specifying per-vcpu properties, rather than it being reinvented per
> > feature.
> 
> SVE *is* a bit unusual.  In most CPU features they're actually features,
> they're on or off, so we have a big list of features that are
> enabled/disabled.  We've had that type of thing (at least on x86) for
> years and it's OK.
> We've got one or two things where they're numerical
> (e.g. host-physbits) and we struggle a bit with how to handle them.
> 
> SVE is somewhere in between - it's a list of numbers, apparently a
> fairly large arbitrarily set of numbers that could be chosen so you'd
> need lots of feature flags (sve1...sve64 say or more); so that doesn't
> fit the existing things we've had that have worked.

I see what you mean now.  SVE configuration is indeed more than just a
boolean flag.

Cheers
---Dave



Re: [Qemu-devel] How do we do user input bitmap properties?

2019-05-15 Thread Dave Martin
On Wed, May 15, 2019 at 12:42:44PM +0100, Andrew Jones wrote:
> On Wed, May 15, 2019 at 12:00:45PM +0100, Dave Martin wrote:
> > On Wed, May 15, 2019 at 09:18:54AM +0100, Andrew Jones wrote:
> > > On Tue, May 14, 2019 at 04:48:38PM +0200, Igor Mammedov wrote:
> > > > On Tue, 14 May 2019 11:02:25 +0200
> > > > Andrew Jones  wrote:
> > > > > My thought is primarily machines. If a human wants to use the command
> > > > > line and SVE, then I'm assuming they'll be happy with sve-max-vq or
> > > > > figuring out a map they like once and then sticking to it.
> > > > 
> > > > maybe naive question, but why not use a property/bit as user facing 
> > > > interface,
> > > > in line with what we do with CPUID bits. (that's assuming that bits have
> > > > fixed meaning).
> > > > Yes, it's verbose but follows current practice and works fine with -cpu 
> > > > and
> > > > -device.
> > > > (I really hate custom preprocessing of -cpu and we were working hard to 
> > > > remove
> > > > that in favor of canonical properties at the expense of more verbose 
> > > > CLI).
> > > >
> > > 
> > > Are you asking if we should do something like the following?
> > > 
> > >   -cpu host,sve1=on,sve=2=on,sve3=off,sve4=on
> > 
> > Note, there is nothing SVE-specific about this.
> 
> In the above example there is some specific SVE stuff there. If the
> command line has sve4=on, then it must also have sve1=on and sve2=on,
> per the architecture requiring all smaller power-of-2 vector lengths.
> Only sve3 is optional, but because it's optional we have to explicitly
> state when it's on or off in order to ensure we can cleanly fail a
> migration to a host that doesn't support that option.
> 
> > 
> > Either enabling features on a per-vcpu basis is justified, or it isn't:
> > if it's justified, then it would be better to have a general way of
> > specifying per-vcpu properties, rather than it being reinvented per
> > feature.
> > 
> > Creating mismatched configurations is allowed by the architecture and so
> > it's useful for testing the kernel, but probably less useful for real-
> > world use cases today.
> > 
> > So it may be a good idea to get the symmetric support sorted out first
> > before thinking about whether and how to specify asymmetric
> > configurations.
> 
> These properties are per-vcpu for KVM only. QEMU doesn't have a way
> to allow per-vcpu features to be described on the command line yet.
> With '-cpu host,...' The '...' applies to all vcpus. So we are "just"
> working on the symmetric support now.

OK, I think I misunderstood what was being proposed here.

Until/unless someone comes up with a compelling use case, I think it's
entirely reasonable for QEMU not to support asymmetry of this sort.

Cheers
---Dave



Re: [Qemu-devel] [PATCH 00/13] target/arm/kvm: enable SVE in guests

2019-05-15 Thread Dave Martin
On Wed, May 15, 2019 at 12:28:20PM +0100, Andrea Bolognani wrote:
> On Wed, 2019-05-15 at 12:14 +0100, Dave Martin wrote:
> > On Wed, May 15, 2019 at 09:03:58AM +0100, Andrea Bolognani wrote:
> > > On Tue, 2019-05-14 at 13:14 -0700, Richard Henderson wrote:
> > > > Why is =4 less user-friendly than =512?
> > > > 
> > > > I don't actually see "total bits in vector" as more user-friendly than 
> > > > "number
> > > > of quadwords" when it comes to non-powers-of-2 like =7 vs =896 or =13 
> > > > vs =1664.
> > > 
> > > I would wager most people are intimately familiar with bits, bytes
> > > and multiples due to having to work with them daily. Quadwords, not
> > > so much.
> > 
> > Generally I tend to agree.  For kvmtool I leaned torward quadwords
> > purely because
> > 
> > 16,32,48,64,80,96,112,128,144,160,176,192,208
> > 
> > is a big pain to type compared with
> > 
> > 1,2,3,4,5,6,7,8,9,10,11,12,13
> > 
> > Even though I prefer to specify vector lengths in bytes everywhere else
> > in the Linux user API (precisely to avoid the confusion you object to).
> > 
> > This isn't finalised yet for kvmtool -- I need to rework the patches
> > and may not include it at all initially: kvmtool doesn't support
> > migration, which is the main usecase for being able to specify an exact
> > set of vector lengths AFAICT.
> > 
> > Since this is otherwise only useful for migration, experimentation or
> > machine-driven configuration, a bitmask
> > 
> > 0x1fff
> > 
> > as some have suggested may well be a pragmatic alternative for kvmtool.
> 
> Just to be clear, I have suggested using bits (or bytes or megabytes
> depending on the exact value) only for the command-line-user-oriented
> sve-vl-max option, which would take a single value.
> 
> For interoperation with the management layer, on the other hand,
> using a bitmap is perfectly fine, and whether the values encoded
> within are expressed in quadwords or whatever other format is largely
> irrelevant, so long as it it's properly documented of course.

Seems fair.

Cheers
---Dave



Re: [Qemu-devel] [PATCH 00/13] target/arm/kvm: enable SVE in guests

2019-05-15 Thread Dave Martin
On Wed, May 15, 2019 at 09:03:58AM +0100, Andrea Bolognani wrote:
> On Tue, 2019-05-14 at 13:14 -0700, Richard Henderson wrote:
> > On 5/14/19 9:03 AM, Andrea Bolognani wrote:
> > > On Tue, 2019-05-14 at 14:53 +0200, Andrew Jones wrote:
> > > > We already have sve-max-vq, so I'm not sure we want to rename it.
> > > 
> > > Oh, I didn't realize that was the case. And of course it already
> > > takes a number of quadwords as argument, I suppose? That's pretty
> > > unfortunate :(
> > > 
> > > Perhaps we could consider deprecating it in favor of a user-friendly
> > > variant that's actually suitable for regular humans, like the one I
> > > suggest above?
> > 
> > Why is =4 less user-friendly than =512?
> > 
> > I don't actually see "total bits in vector" as more user-friendly than 
> > "number
> > of quadwords" when it comes to non-powers-of-2 like =7 vs =896 or =13 vs 
> > =1664.
> 
> I would wager most people are intimately familiar with bits, bytes
> and multiples due to having to work with them daily. Quadwords, not
> so much.

Generally I tend to agree.  For kvmtool I leaned torward quadwords
purely because

16,32,48,64,80,96,112,128,144,160,176,192,208

is a big pain to type compared with

1,2,3,4,5,6,7,8,9,10,11,12,13

Even though I prefer to specify vector lengths in bytes everywhere else
in the Linux user API (precisely to avoid the confusion you object to).

This isn't finalised yet for kvmtool -- I need to rework the patches
and may not include it at all initially: kvmtool doesn't support
migration, which is the main usecase for being able to specify an exact
set of vector lengths AFAICT.

Since this is otherwise only useful for migration, experimentation or
machine-driven configuration, a bitmask

0x1fff

as some have suggested may well be a pragmatic alternative for kvmtool.

Cheers
---Dave



Re: [Qemu-devel] How do we do user input bitmap properties?

2019-05-15 Thread Dave Martin
On Wed, May 15, 2019 at 09:18:54AM +0100, Andrew Jones wrote:
> On Tue, May 14, 2019 at 04:48:38PM +0200, Igor Mammedov wrote:
> > On Tue, 14 May 2019 11:02:25 +0200
> > Andrew Jones  wrote:
> > > My thought is primarily machines. If a human wants to use the command
> > > line and SVE, then I'm assuming they'll be happy with sve-max-vq or
> > > figuring out a map they like once and then sticking to it.
> > 
> > maybe naive question, but why not use a property/bit as user facing 
> > interface,
> > in line with what we do with CPUID bits. (that's assuming that bits have
> > fixed meaning).
> > Yes, it's verbose but follows current practice and works fine with -cpu and
> > -device.
> > (I really hate custom preprocessing of -cpu and we were working hard to 
> > remove
> > that in favor of canonical properties at the expense of more verbose CLI).
> >
> 
> Are you asking if we should do something like the following?
> 
>   -cpu host,sve1=on,sve=2=on,sve3=off,sve4=on

Note, there is nothing SVE-specific about this.

Either enabling features on a per-vcpu basis is justified, or it isn't:
if it's justified, then it would be better to have a general way of
specifying per-vcpu properties, rather than it being reinvented per
feature.

Creating mismatched configurations is allowed by the architecture and so
it's useful for testing the kernel, but probably less useful for real-
world use cases today.

So it may be a good idea to get the symmetric support sorted out first
before thinking about whether and how to specify asymmetric
configurations.

Cheers
---Dave



Re: [Qemu-devel] How do we do user input bitmap properties?

2019-05-15 Thread Dave Martin
On Wed, May 15, 2019 at 09:15:20AM +0100, Andrew Jones wrote:
> On Tue, May 14, 2019 at 03:32:13PM +0200, Markus Armbruster wrote:
> > Syntax that can support such growth would be nice.
> > 
> > To grow a single unsigned number, we can make it wider (but we don't
> > have infrastructure for numbers wider than 64 bits), or we can add more
> > numbers (but under what name?).
> > 
> > Dotted keys syntax could grow more easily, but it's rather awkward.
> > 
> > Looking more closely at your "[PATCH 11/13] target/arm/cpu64: max cpu:
> > Introduce sve-vls-map"... your syntax reflects your data structure:
> > property "sve-vls-map" is of type uint64_t, and interpreted as bit set.
> > This data type would have to grow, too.
> > 
> > We could make widen the integer property (but we don't have
> > infrastructure for integer properties wider than 64 bits), or we can
> > turn it into an array of integers (compatibility?), or we can add more
> > properties to hold the additional integers (yet another silly way to
> > represent a list/array of integers).
> > 
> > I'm not asking you to complicate things just to future-proof this.  Just
> > pause and think whether you can pick a data type that's similarly
> > convenient now, and easier to grow.
> > 
> > Then pick an external syntax for this data type.  You may have to pick a
> > reasonable compromise between ease of implementation and ease of use.
> 
> Widening the integer property sounds good to me. I just hadn't thought of
> it (implementation tunnel vision affecting my user interface design).
> Andrea also mentioned that as a possibility in a reply to the series. I
> think we can leave the property as a uint64_t right now and then, when/if
> it needs to expand past 64 bits we can change the property to a string
> and start parsing arbitrarily large integers from it. The internal state,
> 'uint64_t sve_vls_map' can easily be changed to a 'uint64_t sve_vls_map[]'
> at that point too. How's that sound?

Having an arbitrary-width integer should work.

It will suck a bit for the common case of sparse vector length support

0x800080008000808b

(= [ 1, 2, 4, 8, 16, 32, 64, 128 ] quadwords)

Since lengths above 16 quadwords remain theoretical for now though it's
probably OK as a compromise, though.

The most human-compatible approach would be some kind of list
comprehension syntax, but it's hard to justify that adding a whole new
syntax is justified at this point.

[...]

Cheers
---Dave



Re: [Qemu-devel] How do we do user input bitmap properties?

2019-05-14 Thread Dave Martin
On Tue, May 14, 2019 at 05:54:03AM +0100, Markus Armbruster wrote:
> Andrew Jones  writes:
> 
> > On Thu, Apr 18, 2019 at 07:48:09PM +0200, Markus Armbruster wrote:
> >> Daniel P. Berrangé  writes:
> >> 
> >> > On Thu, Apr 18, 2019 at 11:28:41AM +0200, Andrew Jones wrote:
> >> >> Hi all,
> >> >> 
> >> >> First some background:
> >> >> 
> >> >> For the userspace side of AArch64 guest SVE support we need to
> >> >> expose KVM's allowed vector lengths bitmap to the user and allow
> >> >> the user to choose a subset of that bitmap. Since bitmaps are a
> >> >> bit awkward to work with then we'll likely want to expose it as
> >> >> an array of vector lengths instead. Also, assuming we want to
> >> >> expose the lengths as number-of-quadwords (quadword == 128 bits
> >> >> for AArch64 and vector lengths must be multiples of quadwords)
> >> >> rather than number-of-bits, then an example array (which will
> >> >> always be a sequence) might be
> >> >> 
> >> >>  [ 8, 16, 32 ]
> >> >> 
> >> >> The user may choose a subsequence, but only through truncation,
> >> >> i.e. [ 8, 32 ] is not valid, but [ 8, 16 ] is.
> >> >> 
> >> >> Furthermore, different hosts may support different sequences
> >> >> which have the same maximum. For example, if the above sequence
> >> >> is for Host_A, then Host_B could be
> >> >> 
> >> >>  [ 8, 16, 24, 32 ]
> >> >> 
> >> >> The host must support all lengths in the sequence, which means
> >> >> that while Host_A supports 32, since it doesn't support 24 and
> >> >> we can only truncate sequences, we must use either [ 8 ] or
> >> >> [ 8, 16 ] for a compatible sequence if we intend to migrate
> >> >> between the hosts.
> >> >> 
> >> >> Now to the $SUBJECT question:
> >> >> 
> >> >> My feeling is that we should require the sequence to be
> >> >> provided on the command line as a cpu property. Something
> >> >> like
> >> >> 
> >> >>   -cpu host,sve-vl-list=8:16
> >> >> 
> >> >> (I chose ':' for the delimiter because ',' can't work, but
> >> >> if there's a better choice, then that's fine by me.)
> >> >> 
> >> >> Afaict a property list like this will require a new parser,
> >> 
> >> We had 20+ of those when I last counted.  Among the more annoying
> >> reasons CLI QAPIfication is hard[1].
> >> 
> >> >> which feels a bit funny since it seems we should already
> >> >> have support for this type of thing somewhere in QEMU. So,
> >> >> the question is: do we? I see we have array properties, but
> >> >> I don't believe that works with the command line. Should we
> >> >> only use QMP for this? We already want some QMP in order to
> >> >> query the supported vector lengths. Maybe we should use QMP
> >> >> to set the selection too? But then what about command line
> >> >> support for developers? And if the property is on the command
> >> >> line then we don't have to add it to the migration stream.
> >> >
> >> > You should be able to use arrays from the CLI with QemuOpts by repeating
> >> > the same option name many times, though I can't say it is a very
> >> > nice approach if you have many values to list as it gets very repetative.
> >> 
> >> Yes, this is one of the ways the current CLI does lists.  It's also one
> >> of the more annoying reasons CLI QAPIfication is hard[2].
> >> 
> >> QemuOpts let the last param=value win the stupidest way that could
> >> possibly work (I respect that): add to the front of the list, search it
> >> front to back.
> >> 
> >> Then somebody discovered that if you search the list manually, you can
> >> see them all, and abuse that to get a list-valued param.  I'm sure that
> >> felt clever at the time.
> >> 
> >> Another way to do lists the funky list feature of string input and opts
> >> visitor.  Yet another annoying reason CLI QAPIfication is hard[3].
> >> 
> >> We use the opts visitor's list feature for -numa node,cpus=...  Hmm,
> >> looks like we even combine it with the "multiple param=value build up a
> >> list" technique: -smp node,cpus=0-1,cpus=4-5 denotes [0,1,4,5].
> >> 
> >> > That's the curse of not having a good CLI syntax for non-scalar data in
> >> > QemuOpts & why Markus believes we should switch to JSON for the CLI too
> >> >
> >> >  -cpu host,sve-vl=8,sve-vl=16
> >> 
> >> We actually have CLI syntax for non-scalar data: dotted keys.  Dotted
> >> keys are syntactic sugar for JSON.  It looks friendlier than JSON for
> >> simple cases, then gets uglier as things get more complex, and then it
> >> falls apart: it can't quite express all of JSON.
> >> 
> >> Example: sve-vl.0=8,sve-vl.1=16
> >> gets desugared into {"sve": [8, 16]}
> >> if the QAPI schema has 'sve': ['int'].
> >> 
> >> The comment at the beginning of util/keyval.c explains it in more
> >> detail.
> >> 
> >> It powers -blockdev and -display.  Both options accept either JSON or
> >> dotted keys.  If the option argument starts with '{', it's JSON.
> >> Management applications should stick to JSON.
> >> 
> >> 
> >> [1] Towards a more expressive and introspectable QEMU command line
> >> 

Re: [Qemu-devel] [PATCH 05/13] target/arm/kvm: Add kvm_arch_get/put_sve

2019-05-14 Thread Dave Martin
On Mon, May 13, 2019 at 05:58:59PM +0100, Richard Henderson wrote:
> On 5/13/19 7:39 AM, Dave Martin wrote:
> > On that point, could TCG easily be made to expose a larger vector length
> > to the kernel?  I'd be interested to see what happened.
> 
> It would be easy enough to extend the maximum vector length within TCG.
> 
> Increase ARM_MAX_VQ.  Alter the couple of places where we manipulate ZCR.LEN 
> to
> extend the current 4-bit mask.
> 
> How large do you need the max to be, for testing?

Anything upwards of 256 bytes is interesting.

The architecture reserves space for it to grow up to 9 bits, though it's
unlikely it would ever get that large in reality.

So if you wanted to go crazy, you might be able to go up to 8192 bytes.

This is just for fun, since it goes outside the architecture and Linux
officially doesn't support it today in any case.  So definitely not a
priority!

Cheers
---Dave



Re: [Qemu-devel] [PATCH 05/13] target/arm/kvm: Add kvm_arch_get/put_sve

2019-05-13 Thread Dave Martin
On Mon, May 13, 2019 at 04:40:52PM +0100, Peter Maydell wrote:
> On Mon, 13 May 2019 at 16:31, Dave Martin  wrote:
> >
> > On Mon, May 13, 2019 at 02:55:01PM +0100, Andrew Jones wrote:
> > > QEMU keeps its 128-bit and larger words in the same order (least
> > > significant word first) for both host endian types. We need to
> > > do word swapping every time we set/get them to/from KVM.
> >
> > I'm not sure whether this is appropriate here, though it depends on
> > what QEMU does with the data.
> 
> The layout is optimised for TCG emulation to be able
> to work with it, I think (rth would have the definite
> reason, though).

So long as we are agreed about the ABI, this is none of my concern :)

> > Something non-obvious to be aware of:
> >
> > As exposed through the signal frame and the KVM ABI, the memory
> > representation of an SVE reg is invariant with respect to the
> > endianness.
> 
> Yes; we handle this conversion as we write out the signal frame:
> https://github.com/qemu/qemu/blob/master/linux-user/aarch64/signal.c#L184

Right.  I hadn't focused consciously on this, since the architecture
does the work for us in the kernel (mostly).

I will check the documentation to make sure the behaviour is clearly
described, anyhow.

Cheers
---Dave



Re: [Qemu-devel] [PATCH 05/13] target/arm/kvm: Add kvm_arch_get/put_sve

2019-05-13 Thread Dave Martin
On Mon, May 13, 2019 at 02:55:01PM +0100, Andrew Jones wrote:
> On Mon, May 13, 2019 at 01:31:11PM +0100, Dave Martin wrote:
> > On Sun, May 12, 2019 at 09:36:16AM +0100, Andrew Jones wrote:
> > > These are the SVE equivalents to kvm_arch_get/put_fpsimd.
> > > 
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  target/arm/kvm64.c | 127 +++--
> > >  1 file changed, 123 insertions(+), 4 deletions(-)
> > 
> > [...]
> > 
> > > +static int kvm_arch_put_sve(CPUState *cs)
> > > +{
> > > +ARMCPU *cpu = ARM_CPU(cs);
> > > +CPUARMState *env = >env;
> > > +struct kvm_one_reg reg;
> > > +int n, ret;
> > > +
> > > +for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) {
> > > +uint64_t *q = aa64_vfp_qreg(env, n);
> > > +#ifdef HOST_WORDS_BIGENDIAN
> > > +uint64_t d[ARM_MAX_VQ * 2];
> > > +int i;
> > > +for (i = 0; i < cpu->sve_max_vq * 2; i++) {
> > > +d[i] = q[cpu->sve_max_vq * 2 - 1 - i];
> > > +}
> > 
> > Out of interest, why do all this swabbing?  It seems expensive.
> >
> 
> QEMU keeps its 128-bit and larger words in the same order (least
> significant word first) for both host endian types. We need to
> do word swapping every time we set/get them to/from KVM.

I'm not sure whether this is appropriate here, though it depends on
what QEMU does with the data.

Something non-obvious to be aware of:

As exposed through the signal frame and the KVM ABI, the memory
representation of an SVE reg is invariant with respect to the
endianness.

IIUC, the byte order seen for a V-reg in KVM_REG_ARM_CORE and for
the equivalent Z-reg in KVM_REG_ARM64_SVE would be the opposite of each
other on BE, but the same on LE.

This is a feature of the archtiecture: a V-reg can be stored as a single
value, but Z-regs are in general too big to be treated as a single
value: they are always treated as a sequence of elements, and the
largest element size supported is 64 bits, not 128.  IIUC, there is no
direct native way to store with 128-bit swabbing: some explicit data
processing operation would also be needed to swap adjacent 64-bit
elements in the vector around the store/load.

This is not specified in the ABI documentation -- I should address that.

If this is infeasible for KVM to work with, we could perhaps change it,
but I'm not too keen on that at this stage.  KVM_REG_ARM64_SVE_VLS
has a similar behaviour: it's a vector of 64-bit possibly-swabbed words,
not a single possibly-swabbed 512-bit word.


Looking at the kernel, I may have screwed up in places where the
two representations interact, like fpsimd_to_sve().  I should take a
look at that...  This doesn't affect the KVM ABI though.

Cheers
---Dave




Re: [Qemu-devel] [PATCH 05/13] target/arm/kvm: Add kvm_arch_get/put_sve

2019-05-13 Thread Dave Martin
On Mon, May 13, 2019 at 03:07:26PM +0100, Andrew Jones wrote:
> On Mon, May 13, 2019 at 01:43:56PM +0100, Dave Martin wrote:
> > On Sun, May 12, 2019 at 09:36:16AM +0100, Andrew Jones wrote:
> > > These are the SVE equivalents to kvm_arch_get/put_fpsimd.
> > > 
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  target/arm/kvm64.c | 127 +++--
> > >  1 file changed, 123 insertions(+), 4 deletions(-)
> > 
> > [...]
> > 
> > > +static int kvm_arch_put_sve(CPUState *cs)
> > > +{
> > > +ARMCPU *cpu = ARM_CPU(cs);
> > > +CPUARMState *env = >env;
> > > +struct kvm_one_reg reg;
> > > +int n, ret;
> > > +
> > > +for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) {
> > > +uint64_t *q = aa64_vfp_qreg(env, n);
> > > +#ifdef HOST_WORDS_BIGENDIAN
> > > +uint64_t d[ARM_MAX_VQ * 2];
> > > +int i;
> > > +for (i = 0; i < cpu->sve_max_vq * 2; i++) {
> > > +d[i] = q[cpu->sve_max_vq * 2 - 1 - i];
> > > +}
> > > +reg.addr = (uintptr_t)d;
> > > +#else
> > > +reg.addr = (uintptr_t)q;
> > > +#endif
> > > +reg.id = KVM_REG_ARM64_SVE_ZREG(n, 0);
> > 
> > Will this silently go wrong if more than one slice is required (i.e.,
> > the register size grows beyond 8192 bits?)
> 
> Yeah, I could probably implement the slice loop now and add a
> function that returns 1 (for now) like your vcpu_sve_slices()
> function in KVM. I'll do that for v2.

Or just add a sanity check that the vector length is <= 2048 bits.

Support for larger vectors is untestable for now, since the kernel
doesn't support that and would never expose it.


On that point, could TCG easily be made to expose a larger vector length
to the kernel?  I'd be interested to see what happened.

The kernel should warn and hide the larger vector lengths from KVM,
but I've not been able to test that.

It's only worth trying this out if the hacks to QEMU to enable testing
this were pretty trivial, though.

Cheers
---Dave



Re: [Qemu-devel] [PATCH 11/13] target/arm/cpu64: max cpu: Introduce sve-vls-map

2019-05-13 Thread Dave Martin
On Mon, May 13, 2019 at 02:45:12PM +0100, Andrew Jones wrote:
> On Mon, May 13, 2019 at 02:12:48PM +0100, Dave Martin wrote:
> > On Mon, May 13, 2019 at 01:57:40PM +0100, Andrew Jones wrote:
> > > On Mon, May 13, 2019 at 01:41:26PM +0100, Dave Martin wrote:
> > > > On Mon, May 13, 2019 at 01:30:23PM +0100, Andrew Jones wrote:
> > > > > On Mon, May 13, 2019 at 12:26:36PM +0100, Dave Martin wrote:
> > > > > > On Sun, May 12, 2019 at 09:36:22AM +0100, Andrew Jones wrote:
> > > > > > > Introduce another cpu property to control SVE vector lengths,
> > > > > > > sve-vls-map, which allows the user to explicitly select the
> > > > > > > set of vector lengths the guest can use. The map must conform
> > > > > > > to QEMU's limits and architectural constraints, checked when
> > > > > > > the property is set. Inconsistencies with sve-max-vq are also
> > > > > > > checked. The bit number of a set bit in the map represents the
> > > > > > > allowed vector length in number of quadwords.
> > > > > > > 
> > > > > > > Note, as the map is implemented with a single 64-bit word we
> > > > > > > currently only support up to 8192-bit vectors. As QEMU and
> > > > > > > KVM only support up to 2048-bit vectors then this sufficient
> > > > > > > now, and probably for some time. Extending the bitmap beyond
> > > > > > > a single word will likely require changing the property to
> > > > > > > a string and adding yet another parser to QEMU.
> > > > > > > 
> > > > > > > Signed-off-by: Andrew Jones 
> > > > > > > ---
> > > > > > >  target/arm/cpu.c |  4 +++
> > > > > > >  target/arm/cpu.h |  3 ++
> > > > > > >  target/arm/cpu64.c   | 70 
> > > > > > > +---
> > > > > > >  target/arm/helper.c  | 10 ++-
> > > > > > >  target/arm/monitor.c |  9 --
> > > > > > >  5 files changed, 88 insertions(+), 8 deletions(-)
> > > > > > > 
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > > > > > > index 8292d547e8f9..f0d0ce759ba8 100644
> > > > > > > --- a/target/arm/cpu.h
> > > > > > > +++ b/target/arm/cpu.h
> > > > > > > @@ -920,6 +920,9 @@ struct ARMCPU {
> > > > > > >  
> > > > > > >  /* Used to set the maximum vector length the cpu will 
> > > > > > > support.  */
> > > > > > >  uint32_t sve_max_vq;
> > > > > > > +
> > > > > > > +/* Each bit represents a supported vector length of (bitnum 
> > > > > > > * 16) bytes */
> > > > > > > +uint64_t sve_vls_map;
> > > > > > 
> > > > > > Just to be clear, the representation here is different from the
> > > > > > representation in KVM_REG_ARM64_SVE_VLS?
> > > > > > 
> > > > > > In the latter, bit n represents vector length ((n + 1) * 16) bytes.
> > > > > 
> > > > > KVM also uses bitnum * 16. bitnum is the the bit number, not the 
> > > > > shift.
> > > > 
> > > > Can you point to the relevant kernel code?  This isn't what I thought I
> > > > wrote...
> > > 
> > > The Documentation/virtual/kvm/api.txt documentation has
> > > 
> > >  > if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX &&
> > >  >((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >>
> > >  >((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1))
> > >  >/* Vector length vq * 16 bytes supported */
> > >  > else
> > >  >/* Vector length vq * 16 bytes not supported */
> > > 
> > > Taking vq=1, we check (vector_lengths[0] >> 0) to see if we have the
> > > length (1 * 16) bytes. Since bitnum 1 is (1 << 0) that means the shift
> > > is zero and (bitnum * 16) is the same as (vq * 16). With ((n + 1) * 16),
> > > n would have to be zero, which is not a valid bitnum, but is a valid
> > > bit shift.
> > 
> > OK, so it sounds like the interpretation of the KVM ABI is what I intended.
> > 
> > What is "bitnum" though?  Most of the time, people seem to number bits
> > starting from 0 -- though of course that's just a convention.
> 
> Eh, I see the problem. I used the word 'number', when I meant 'position'.
> Indeed the bit number of bit position 1 is zero. Sorry about the confusion.
> For v2 I can try to ensure I don't use the wrong terminology.

I'm not sure there's an actual problem here.  I found the terminology
unfamiliar, that's all.

My main concern was that people might be confused about the kernel ABI
here, but it sounds like your interpretation is as I intended.

But by all means clarify the comments if you think there's a risk of
QEMU folks being confused.

[...]

Cheers
---Dave



Re: [Qemu-devel] [PATCH 11/13] target/arm/cpu64: max cpu: Introduce sve-vls-map

2019-05-13 Thread Dave Martin
On Mon, May 13, 2019 at 01:57:40PM +0100, Andrew Jones wrote:
> On Mon, May 13, 2019 at 01:41:26PM +0100, Dave Martin wrote:
> > On Mon, May 13, 2019 at 01:30:23PM +0100, Andrew Jones wrote:
> > > On Mon, May 13, 2019 at 12:26:36PM +0100, Dave Martin wrote:
> > > > On Sun, May 12, 2019 at 09:36:22AM +0100, Andrew Jones wrote:
> > > > > Introduce another cpu property to control SVE vector lengths,
> > > > > sve-vls-map, which allows the user to explicitly select the
> > > > > set of vector lengths the guest can use. The map must conform
> > > > > to QEMU's limits and architectural constraints, checked when
> > > > > the property is set. Inconsistencies with sve-max-vq are also
> > > > > checked. The bit number of a set bit in the map represents the
> > > > > allowed vector length in number of quadwords.
> > > > > 
> > > > > Note, as the map is implemented with a single 64-bit word we
> > > > > currently only support up to 8192-bit vectors. As QEMU and
> > > > > KVM only support up to 2048-bit vectors then this sufficient
> > > > > now, and probably for some time. Extending the bitmap beyond
> > > > > a single word will likely require changing the property to
> > > > > a string and adding yet another parser to QEMU.
> > > > > 
> > > > > Signed-off-by: Andrew Jones 
> > > > > ---
> > > > >  target/arm/cpu.c |  4 +++
> > > > >  target/arm/cpu.h |  3 ++
> > > > >  target/arm/cpu64.c   | 70 
> > > > > +---
> > > > >  target/arm/helper.c  | 10 ++-
> > > > >  target/arm/monitor.c |  9 --
> > > > >  5 files changed, 88 insertions(+), 8 deletions(-)
> > > > > 
> > > > 
> > > > [...]
> > > > 
> > > > > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > > > > index 8292d547e8f9..f0d0ce759ba8 100644
> > > > > --- a/target/arm/cpu.h
> > > > > +++ b/target/arm/cpu.h
> > > > > @@ -920,6 +920,9 @@ struct ARMCPU {
> > > > >  
> > > > >  /* Used to set the maximum vector length the cpu will support.  
> > > > > */
> > > > >  uint32_t sve_max_vq;
> > > > > +
> > > > > +/* Each bit represents a supported vector length of (bitnum * 
> > > > > 16) bytes */
> > > > > +uint64_t sve_vls_map;
> > > > 
> > > > Just to be clear, the representation here is different from the
> > > > representation in KVM_REG_ARM64_SVE_VLS?
> > > > 
> > > > In the latter, bit n represents vector length ((n + 1) * 16) bytes.
> > > 
> > > KVM also uses bitnum * 16. bitnum is the the bit number, not the shift.
> > 
> > Can you point to the relevant kernel code?  This isn't what I thought I
> > wrote...
> 
> The Documentation/virtual/kvm/api.txt documentation has
> 
>  > if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX &&
>  >((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >>
>  >((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1))
>  >/* Vector length vq * 16 bytes supported */
>  > else
>  >/* Vector length vq * 16 bytes not supported */
> 
> Taking vq=1, we check (vector_lengths[0] >> 0) to see if we have the
> length (1 * 16) bytes. Since bitnum 1 is (1 << 0) that means the shift
> is zero and (bitnum * 16) is the same as (vq * 16). With ((n + 1) * 16),
> n would have to be zero, which is not a valid bitnum, but is a valid
> bit shift.

OK, so it sounds like the interpretation of the KVM ABI is what I intended.

What is "bitnum" though?  Most of the time, people seem to number bits
starting from 0 -- though of course that's just a convention.

> > > So you can't have bitnum=0. 'n' must be a shift, as it would need to
> > > allow zero in order to represent KVM_ARM64_SVE_VQ_MIN.

[...]

> > > > > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > > > > index 1e6eb0d0f360..bedec1ea0b27 100644
> > > > > --- a/target/arm/helper.c
> > > > > +++ b/target/arm/helper.c
> > > > > @@ -5254,12 +5254,20 @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, 
> > > > > int el)
> > > > >  static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
> > > > >uint64_t value

Re: [Qemu-devel] [PATCH 00/13] target/arm/kvm: enable SVE in guests

2019-05-13 Thread Dave Martin
On Mon, May 13, 2019 at 01:38:57PM +0100, Andrew Jones wrote:
> On Mon, May 13, 2019 at 12:15:56PM +0100, Dave Martin wrote:
> > On Mon, May 13, 2019 at 10:32:46AM +0100, Andrea Bolognani wrote:
> > > On Sun, 2019-05-12 at 10:36 +0200, Andrew Jones wrote:

[...]

> > > > The QMP query returns a list of valid vq lists. For example, if
> > > > a guest can use vqs 1, 2, 3, and 4, then the following list will
> > > > be returned
> > > > 
> > > >  [ [ 1 ], [ 1, 2 ], [ 1, 2, 3 ], [ 1, 2, 3, 4 ] ]
> > > > 
> > > > Another example might be 1, 2, 4, as the architecture states 3
> > > > is optional. In that case the list would be
> > > > 
> > > >  [ [ 1 ], [ 1, 2 ], [ 1, 2, 4 ] ]
> > > 
> > > I think the proposed QMP command is problematic, because it reports
> > > the valid vector lengths for either KVM or TCG based on which
> > > accelerator is currently enabled: it should report all information
> > > at all times instead, similarly to how query-gic-capabilities does
> > > it.
> > 
> > I wonder if this is premature flexibility?
> > 
> > The size of these lists is going to get cumbersome if the architecture
> > is ever extended.  Even today, we might need over 100 items in this
> > (nested) list.  If this is to be presented to the user this will be
> > far from friendly, it could get much worse if the architecutre changes
> > in future to allow larger vectors or more flexible virtualisation.
> > 
> > Could we just have a list of supported vector lengths and a possibly
> > empty list of additional capabilities that describe what kinds of
> > flexibility are allowed?
> > 
> > So, for example, we might support vector lengths of 1, 2, 4 and 8
> > quadwords, with the the ability to clamp the max vector length the
> > guest sees: the kernel ABI guarantees that you can do this, even
> > if you can't disable/enable each individual vector length independently.
> > 
> > So, [ 1, 2, 4, 8 ] seems sufficient to describe this in a forwards
> > compatible way.
> > 
> > Some day, we might report { "independent", [ 1, 2, 4, 8, 16, 32, ... ] }
> > 
> > I'm guessing about the data representation here.
> > 
> 
> I think that could work, and something along those lines even crossed my
> mind. Let's see what libvirt folk say. I'm not overly concerned about
> user friendless here though, as users aren't running QMP commands and
> parsing json by hand too much.

One concern I have is that if the architecture ever supports masking out
vector lengths individually, the size of this description becomes
something like O(2^n) in the maximum vector length, instead of O(n^2).

That could be very painful to deal with...

Cheers
---Dave



Re: [Qemu-devel] [PATCH 11/13] target/arm/cpu64: max cpu: Introduce sve-vls-map

2019-05-13 Thread Dave Martin
On Mon, May 13, 2019 at 01:30:23PM +0100, Andrew Jones wrote:
> On Mon, May 13, 2019 at 12:26:36PM +0100, Dave Martin wrote:
> > On Sun, May 12, 2019 at 09:36:22AM +0100, Andrew Jones wrote:
> > > Introduce another cpu property to control SVE vector lengths,
> > > sve-vls-map, which allows the user to explicitly select the
> > > set of vector lengths the guest can use. The map must conform
> > > to QEMU's limits and architectural constraints, checked when
> > > the property is set. Inconsistencies with sve-max-vq are also
> > > checked. The bit number of a set bit in the map represents the
> > > allowed vector length in number of quadwords.
> > > 
> > > Note, as the map is implemented with a single 64-bit word we
> > > currently only support up to 8192-bit vectors. As QEMU and
> > > KVM only support up to 2048-bit vectors then this sufficient
> > > now, and probably for some time. Extending the bitmap beyond
> > > a single word will likely require changing the property to
> > > a string and adding yet another parser to QEMU.
> > > 
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  target/arm/cpu.c |  4 +++
> > >  target/arm/cpu.h |  3 ++
> > >  target/arm/cpu64.c   | 70 +---
> > >  target/arm/helper.c  | 10 ++-
> > >  target/arm/monitor.c |  9 --
> > >  5 files changed, 88 insertions(+), 8 deletions(-)
> > > 
> > 
> > [...]
> > 
> > > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > > index 8292d547e8f9..f0d0ce759ba8 100644
> > > --- a/target/arm/cpu.h
> > > +++ b/target/arm/cpu.h
> > > @@ -920,6 +920,9 @@ struct ARMCPU {
> > >  
> > >  /* Used to set the maximum vector length the cpu will support.  */
> > >  uint32_t sve_max_vq;
> > > +
> > > +/* Each bit represents a supported vector length of (bitnum * 16) 
> > > bytes */
> > > +uint64_t sve_vls_map;
> > 
> > Just to be clear, the representation here is different from the
> > representation in KVM_REG_ARM64_SVE_VLS?
> > 
> > In the latter, bit n represents vector length ((n + 1) * 16) bytes.
> 
> KVM also uses bitnum * 16. bitnum is the the bit number, not the shift.

Can you point to the relevant kernel code?  This isn't what I thought I
wrote...

> So you can't have bitnum=0. 'n' must be a shift, as it would need to
> allow zero in order to represent KVM_ARM64_SVE_VQ_MIN.

[...]

> > > +static void cpu_set_sve_vls_map(Object *obj, Visitor *v, const char 
> > > *name,
> > > +void *opaque, Error **errp)
> > > +{
> > > +ARMCPU *cpu = ARM_CPU(obj);
> > > +Error *err = NULL;
> > > +uint64_t mask = ~(BIT_MASK(ARM_MAX_VQ - 1) - 1);
> > > +int i;
> > > +
> > > +visit_type_uint64(v, name, >sve_vls_map, errp);
> > > +
> > > +if (!err) {
> > > +if (cpu->sve_vls_map == 0) {
> > > +error_setg(, "SVE vector length map cannot be zero");
> > 
> > Maybe say "empty" here, since the map represents a set?
> 
> Empty does sound better when considering the map represents a set, but
> the user will be inputting a number for the map, so zero will be what
> they attempted to input.

If the user is inputting the encoded number directly, I guess it makes
sense to describe the empty set as "zero".

> > (But it this is just for debug rather than reporting errors to the user,
> > it probably doesn't matter much.)
> 
> I'm also not too worried about which term we use, so I'm happy to change
> it if you'd like.

I guess I don't have a strong opinion.

[...]

> > > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > > index 1e6eb0d0f360..bedec1ea0b27 100644
> > > --- a/target/arm/helper.c
> > > +++ b/target/arm/helper.c
> > > @@ -5254,12 +5254,20 @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int 
> > > el)
> > >  static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
> > >uint64_t value)
> > >  {
> > > +ARMCPU *cpu = arm_env_get_cpu(env);
> > >  int cur_el = arm_current_el(env);
> > >  int old_len = sve_zcr_len_for_el(env, cur_el);
> > >  int new_len;
> > >  
> > >  /* Bits other than [3:0] are RAZ/WI.  */
> > > -raw_write(env, ri, value & 0xf);
> > > +value &= 0xf;
> > 
> > You might want to sanity-check that the max vq you configured for the
> > vcpu is <= 16 here.
> 
> It's already checked by the cpu property set function at input time.

The check is in one place, but the assumptions based on it are
potentially all over the place -- it may be tricky to track them
all down if you ever add support for >8192-bit vectors.

(Not a big deal from my point of view, but I'd migration to larger
vectors to be as smooth as reasonably possible if we ever find we have
to do it.)
 
Cheers
---Dave



Re: [Qemu-devel] [PATCH 05/13] target/arm/kvm: Add kvm_arch_get/put_sve

2019-05-13 Thread Dave Martin
On Sun, May 12, 2019 at 09:36:16AM +0100, Andrew Jones wrote:
> These are the SVE equivalents to kvm_arch_get/put_fpsimd.
> 
> Signed-off-by: Andrew Jones 
> ---
>  target/arm/kvm64.c | 127 +++--
>  1 file changed, 123 insertions(+), 4 deletions(-)

[...]

> +static int kvm_arch_put_sve(CPUState *cs)
> +{
> +ARMCPU *cpu = ARM_CPU(cs);
> +CPUARMState *env = >env;
> +struct kvm_one_reg reg;
> +int n, ret;
> +
> +for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) {
> +uint64_t *q = aa64_vfp_qreg(env, n);
> +#ifdef HOST_WORDS_BIGENDIAN
> +uint64_t d[ARM_MAX_VQ * 2];
> +int i;
> +for (i = 0; i < cpu->sve_max_vq * 2; i++) {
> +d[i] = q[cpu->sve_max_vq * 2 - 1 - i];
> +}
> +reg.addr = (uintptr_t)d;
> +#else
> +reg.addr = (uintptr_t)q;
> +#endif
> +reg.id = KVM_REG_ARM64_SVE_ZREG(n, 0);

Will this silently go wrong if more than one slice is required (i.e.,
the register size grows beyond 8192 bits?)

[...]

Cheers
---Dave



Re: [Qemu-devel] [PATCH 05/13] target/arm/kvm: Add kvm_arch_get/put_sve

2019-05-13 Thread Dave Martin
On Sun, May 12, 2019 at 09:36:16AM +0100, Andrew Jones wrote:
> These are the SVE equivalents to kvm_arch_get/put_fpsimd.
> 
> Signed-off-by: Andrew Jones 
> ---
>  target/arm/kvm64.c | 127 +++--
>  1 file changed, 123 insertions(+), 4 deletions(-)

[...]

> +static int kvm_arch_put_sve(CPUState *cs)
> +{
> +ARMCPU *cpu = ARM_CPU(cs);
> +CPUARMState *env = >env;
> +struct kvm_one_reg reg;
> +int n, ret;
> +
> +for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) {
> +uint64_t *q = aa64_vfp_qreg(env, n);
> +#ifdef HOST_WORDS_BIGENDIAN
> +uint64_t d[ARM_MAX_VQ * 2];
> +int i;
> +for (i = 0; i < cpu->sve_max_vq * 2; i++) {
> +d[i] = q[cpu->sve_max_vq * 2 - 1 - i];
> +}

Out of interest, why do all this swabbing?  It seems expensive.

Cheers
---Dave



Re: [Qemu-devel] [PATCH 00/13] target/arm/kvm: enable SVE in guests

2019-05-13 Thread Dave Martin
On Mon, May 13, 2019 at 10:32:46AM +0100, Andrea Bolognani wrote:
> On Sun, 2019-05-12 at 10:36 +0200, Andrew Jones wrote:
> [...]
> >CPU type | accel | sve-max-vq | sve-vls-map
> >---
> >  1) max | tcg   |  $MAX_VQ   |  $VLS_MAP
> >  2) max | kvm   |  $MAX_VQ   |  $VLS_MAP
> >  3)host | kvm   |  N/A   |  $VLS_MAP
> > 
> > Where for (1) $MAX_VQ sets the maximum vq and smaller vqs are
> > all supported when $VLS_MAP is zero, or when the vqs are selected
> > in $VLS_MAP.
> 
> I'm a bit confused by the nomenclature here. VL clearly stands for
> Vector Length, but what does VQ stand for? You seem to be using the
> two terms pretty much interchangeably throughout the cover letter.

>From the Linux end, "vector length" or VL refers to the size of a vector
register, either in no particular unit or in bytes.

"VQ" refers specifically to the vector length in 128-bit quadwords.

In some situations, neither terminology is obviously better than the
other, such as in the way KVM_REG_ARM64_SVE_VLS is encoded.

> [...]
> > There is never any need to provide both properties, but if both
> > are provided then they are checked for consistency.
> 
> I would personally just error out when both are provided.
> 
> > The QMP query returns a list of valid vq lists. For example, if
> > a guest can use vqs 1, 2, 3, and 4, then the following list will
> > be returned
> > 
> >  [ [ 1 ], [ 1, 2 ], [ 1, 2, 3 ], [ 1, 2, 3, 4 ] ]
> > 
> > Another example might be 1, 2, 4, as the architecture states 3
> > is optional. In that case the list would be
> > 
> >  [ [ 1 ], [ 1, 2 ], [ 1, 2, 4 ] ]
> 
> I think the proposed QMP command is problematic, because it reports
> the valid vector lengths for either KVM or TCG based on which
> accelerator is currently enabled: it should report all information
> at all times instead, similarly to how query-gic-capabilities does
> it.

I wonder if this is premature flexibility?

The size of these lists is going to get cumbersome if the architecture
is ever extended.  Even today, we might need over 100 items in this
(nested) list.  If this is to be presented to the user this will be
far from friendly, it could get much worse if the architecutre changes
in future to allow larger vectors or more flexible virtualisation.

Could we just have a list of supported vector lengths and a possibly
empty list of additional capabilities that describe what kinds of
flexibility are allowed?

So, for example, we might support vector lengths of 1, 2, 4 and 8
quadwords, with the the ability to clamp the max vector length the
guest sees: the kernel ABI guarantees that you can do this, even
if you can't disable/enable each individual vector length independently.

So, [ 1, 2, 4, 8 ] seems sufficient to describe this in a forwards
compatible way.

Some day, we might report { "independent", [ 1, 2, 4, 8, 16, 32, ... ] }

I'm guessing about the data representation here.

[...]

Cheers
---Dave



Re: [Qemu-devel] [PATCH 11/13] target/arm/cpu64: max cpu: Introduce sve-vls-map

2019-05-13 Thread Dave Martin
On Sun, May 12, 2019 at 09:36:22AM +0100, Andrew Jones wrote:
> Introduce another cpu property to control SVE vector lengths,
> sve-vls-map, which allows the user to explicitly select the
> set of vector lengths the guest can use. The map must conform
> to QEMU's limits and architectural constraints, checked when
> the property is set. Inconsistencies with sve-max-vq are also
> checked. The bit number of a set bit in the map represents the
> allowed vector length in number of quadwords.
> 
> Note, as the map is implemented with a single 64-bit word we
> currently only support up to 8192-bit vectors. As QEMU and
> KVM only support up to 2048-bit vectors then this sufficient
> now, and probably for some time. Extending the bitmap beyond
> a single word will likely require changing the property to
> a string and adding yet another parser to QEMU.
> 
> Signed-off-by: Andrew Jones 
> ---
>  target/arm/cpu.c |  4 +++
>  target/arm/cpu.h |  3 ++
>  target/arm/cpu64.c   | 70 +---
>  target/arm/helper.c  | 10 ++-
>  target/arm/monitor.c |  9 --
>  5 files changed, 88 insertions(+), 8 deletions(-)
> 

[...]

> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 8292d547e8f9..f0d0ce759ba8 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -920,6 +920,9 @@ struct ARMCPU {
>  
>  /* Used to set the maximum vector length the cpu will support.  */
>  uint32_t sve_max_vq;
> +
> +/* Each bit represents a supported vector length of (bitnum * 16) bytes 
> */
> +uint64_t sve_vls_map;

Just to be clear, the representation here is different from the
representation in KVM_REG_ARM64_SVE_VLS?

In the latter, bit n represents vector length ((n + 1) * 16) bytes.

(QEMU is free to choose its own internal representation, naturally.)

[...]

> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c

[...]

> +static void cpu_set_sve_vls_map(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +ARMCPU *cpu = ARM_CPU(obj);
> +Error *err = NULL;
> +uint64_t mask = ~(BIT_MASK(ARM_MAX_VQ - 1) - 1);
> +int i;
> +
> +visit_type_uint64(v, name, >sve_vls_map, errp);
> +
> +if (!err) {
> +if (cpu->sve_vls_map == 0) {
> +error_setg(, "SVE vector length map cannot be zero");

Maybe say "empty" here, since the map represents a set?

(But it this is just for debug rather than reporting errors to the user,
it probably doesn't matter much.)

[...]

> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 1e6eb0d0f360..bedec1ea0b27 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -5254,12 +5254,20 @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
>  static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
>uint64_t value)
>  {
> +ARMCPU *cpu = arm_env_get_cpu(env);
>  int cur_el = arm_current_el(env);
>  int old_len = sve_zcr_len_for_el(env, cur_el);
>  int new_len;
>  
>  /* Bits other than [3:0] are RAZ/WI.  */
> -raw_write(env, ri, value & 0xf);
> +value &= 0xf;

You might want to sanity-check that the max vq you configured for the
vcpu is <= 16 here.

> +
> +if (value && !(BIT_MASK(value) & cpu->sve_vls_map)) {
> +uint64_t map = cpu->sve_vls_map & (BIT_MASK(value) - 1);
> +value = arm_cpu_fls64(map) - 1;
> +}
> +
> +raw_write(env, ri, value);

[...]

Cheers
---Dave



Re: [Qemu-devel] How do we do user input bitmap properties?

2019-04-18 Thread Dave Martin
On Thu, Apr 18, 2019 at 03:43:06PM +0100, Andrew Jones wrote:
> On Thu, Apr 18, 2019 at 03:03:02PM +0100, Dave Martin wrote:

[...]

> > It's worth nothing that there are two problems to be solved here: one is
> > to specify an exact set unambiguously, which is important for migration
> > scenarios.
> > 
> > The other is to be able to clamp the vector length for user convenience,
> > but without particularly considering migration.  There's no direct way
> > for the user to know the set of vector lengths supported by KVM today,
> > so cooking up the right magic to go on the command line is non-trivial:
> > you have to create a dummy vcpu and read KVM_REG_ARM64_SVE_VLS to find
> > the host's supported set.  Or you just have to know.
> 
> Right, we'll have to query first to know what's available. libvirt will
> do that and tools built on libvirt should help provide the user with
> sane defaults or a relatively painless selection process. The QEMU command
> line is primarily for developers, so they'll likely just know what they
> want.
> 
> > 
> > A separate max-vl option (or whatever) might be a useful alternative.
> 
> Yeah, we could offer this to be nicer to QEMU command line users that
> don't intend to migrate, or don't care if a migration can fail. This
> would be analogous to '-cpu max' which enables all available cpu
> features. Bad for migration, but a nice shorthand if you want your
> guest to have everything.

Sounds reasonable.

Cheers
---Dave



Re: [Qemu-devel] How do we do user input bitmap properties?

2019-04-18 Thread Dave Martin
On Thu, Apr 18, 2019 at 12:28:47PM +0100, Andrew Jones wrote:
> On Thu, Apr 18, 2019 at 11:52:04AM +0100, Dave Martin wrote:
> > On Thu, Apr 18, 2019 at 10:46:34AM +0100, Andrew Jones wrote:
> > > On Thu, Apr 18, 2019 at 11:28:41AM +0200, Andrew Jones wrote:
> > > > Hi all,
> > > > 
> > > > First some background:
> > > > 
> > > > For the userspace side of AArch64 guest SVE support we need to
> > > > expose KVM's allowed vector lengths bitmap to the user and allow
> > > > the user to choose a subset of that bitmap. Since bitmaps are a
> > > > bit awkward to work with then we'll likely want to expose it as
> > > > an array of vector lengths instead. Also, assuming we want to
> > > > expose the lengths as number-of-quadwords (quadword == 128 bits
> > > > for AArch64 and vector lengths must be multiples of quadwords)
> > > > rather than number-of-bits, then an example array (which will
> > > > always be a sequence) might be
> > > > 
> > > >  [ 8, 16, 32 ]
> > > > 
> > > > The user may choose a subsequence, but only through truncation,
> > > > i.e. [ 8, 32 ] is not valid, but [ 8, 16 ] is.
> > > > 
> > > > Furthermore, different hosts may support different sequences
> > > > which have the same maximum. For example, if the above sequence
> > > > is for Host_A, then Host_B could be
> > > > 
> > > >  [ 8, 16, 24, 32 ]
> > > 
> > > It doesn't really matter for this discussion, but I just realized
> > > that I picked bogus numbers for these examples. 32 would be too
> > > big. The largest supported is 16. I probably should have just used
> > > the simple [ 1, 2, 4 ] and [ 1, 2, 3, 4 ] arrays, corresponding to
> > > vector lengths (in bits) 128, 256, 512 and 128, 256, 384, 512.
> > 
> > I'd argue differently from the ABI perspective.
> > 
> > If the host supports vq = [ 1,2,3,4 ], then it is entirely valid to ask
> > for [ 1,2,4 ].  KVM will fail the KVM_REG_ARM64_SVE_VLS write with
> > EINVAL, but I don't distinguish between a set that is not satisfiable on
> > this hardware and a set that is not valid at all.  The kernel basically
> > doesn't check for the latter: an architecturally invalid set will always
> > not be satisfiable on the hardware.
> 
> I agree that 1,2,4 should be a valid set, but how do we avoid the user
> attempting to select it on a platform that supports 1,2,3,4 now? Until
> the hardware and KVM will allow it, then I don't think we should try
> to "support" it in any way. Maybe we should name the property such that
> it's clear we need to use truncation when constructing a subset now.
> This would allow us to add another, more generally named, property later
> when we can select nearly arbitrary subsets.
> 
> > 
> > Ideally, userspace should not preempt this decision, in case the
> > architecture becomes more flexible someday in what it can virtualise.
> > 
> > > > The host must support all lengths in the sequence, which means
> > > > that while Host_A supports 32, since it doesn't support 24 and
> > > > we can only truncate sequences, we must use either [ 8 ] or
> > > > [ 8, 16 ] for a compatible sequence if we intend to migrate
> > > > between the hosts.
> > > > 
> > > > Now to the $SUBJECT question:
> > > > 
> > > > My feeling is that we should require the sequence to be
> > > > provided on the command line as a cpu property. Something
> > > > like
> > > > 
> > > >   -cpu host,sve-vl-list=8:16
> > 
> > I can't decide whether it's more or less user-friendly to denote VL in
> > terms of bytes in this kind of context.
> 
> I'm not sure either. Switching to my second take at quadwords this should
> have been '1:2'. In bytes [ 1, 2 ] would be '16:32'.
> 
> > 
> > Usually when I say "VL" in the kernel ABI I try to mean this.
> > (KVM_REG_ARM64_SVE_VLS is a special case: here the lengths are fancily
> > encoded rather than being passed as integers, so the precise unit used
> > is a more abstract concept here ... or anyway, that's my excuse.
> > _SVE_VQS felt odd here.)
> 
> My thinking is that using the number of quadwords keeps the numbers
> smaller. But, if people are mostly going to want to think about vector
> lengths in terms of bits, like the specification does, then maybe we
> should just let the numbers be bigger: [ 128, 256, 512 ]
> 
> > 
> > For kvmtool I we

Re: [Qemu-devel] How do we do user input bitmap properties?

2019-04-18 Thread Dave Martin
On Thu, Apr 18, 2019 at 10:46:34AM +0100, Andrew Jones wrote:
> On Thu, Apr 18, 2019 at 11:28:41AM +0200, Andrew Jones wrote:
> > Hi all,
> > 
> > First some background:
> > 
> > For the userspace side of AArch64 guest SVE support we need to
> > expose KVM's allowed vector lengths bitmap to the user and allow
> > the user to choose a subset of that bitmap. Since bitmaps are a
> > bit awkward to work with then we'll likely want to expose it as
> > an array of vector lengths instead. Also, assuming we want to
> > expose the lengths as number-of-quadwords (quadword == 128 bits
> > for AArch64 and vector lengths must be multiples of quadwords)
> > rather than number-of-bits, then an example array (which will
> > always be a sequence) might be
> > 
> >  [ 8, 16, 32 ]
> > 
> > The user may choose a subsequence, but only through truncation,
> > i.e. [ 8, 32 ] is not valid, but [ 8, 16 ] is.
> > 
> > Furthermore, different hosts may support different sequences
> > which have the same maximum. For example, if the above sequence
> > is for Host_A, then Host_B could be
> > 
> >  [ 8, 16, 24, 32 ]
> 
> It doesn't really matter for this discussion, but I just realized
> that I picked bogus numbers for these examples. 32 would be too
> big. The largest supported is 16. I probably should have just used
> the simple [ 1, 2, 4 ] and [ 1, 2, 3, 4 ] arrays, corresponding to
> vector lengths (in bits) 128, 256, 512 and 128, 256, 384, 512.

I'd argue differently from the ABI perspective.

If the host supports vq = [ 1,2,3,4 ], then it is entirely valid to ask
for [ 1,2,4 ].  KVM will fail the KVM_REG_ARM64_SVE_VLS write with
EINVAL, but I don't distinguish between a set that is not satisfiable on
this hardware and a set that is not valid at all.  The kernel basically
doesn't check for the latter: an architecturally invalid set will always
not be satisfiable on the hardware.

Ideally, userspace should not preempt this decision, in case the
architecture becomes more flexible someday in what it can virtualise.

> > The host must support all lengths in the sequence, which means
> > that while Host_A supports 32, since it doesn't support 24 and
> > we can only truncate sequences, we must use either [ 8 ] or
> > [ 8, 16 ] for a compatible sequence if we intend to migrate
> > between the hosts.
> > 
> > Now to the $SUBJECT question:
> > 
> > My feeling is that we should require the sequence to be
> > provided on the command line as a cpu property. Something
> > like
> > 
> >   -cpu host,sve-vl-list=8:16

I can't decide whether it's more or less user-friendly to denote VL in
terms of bytes in this kind of context.

Usually when I say "VL" in the kernel ABI I try to mean this.
(KVM_REG_ARM64_SVE_VLS is a special case: here the lengths are fancily
encoded rather than being passed as integers, so the precise unit used
is a more abstract concept here ... or anyway, that's my excuse.
_SVE_VQS felt odd here.)

For kvmtool I went with quadwords on the command line, but only as a
quick hack to simplify the parser slightly.

OTOH, specifying 1,2,4 on the command line is clearly less annoying and
error-prone than 16,32,64.

Naming the option with "vq" instead of "vl" is another option, though
"vq" is Linux terminology not endorsed by the architecture.

For kvmtool I've considered a range / filtering syntax, but I didn't
implement it yet.

[...]

Cheers
---Dave



Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-11 Thread Dave Martin
On Wed, Jul 11, 2018 at 10:05:50AM +0100, Suzuki K Poulose wrote:
> On 10/07/18 18:03, Dave Martin wrote:
> >On Tue, Jul 10, 2018 at 05:38:39PM +0100, Suzuki K Poulose wrote:
> >>On 09/07/18 14:37, Dave Martin wrote:
> >>>On Mon, Jul 09, 2018 at 01:29:42PM +0100, Marc Zyngier wrote:
> >>>>On 09/07/18 12:23, Dave Martin wrote:
> >
> >[...]
> >
> >>>>>Wedging arguments into a few bits in the type argument feels awkward,
> >>>>>and may be regretted later if we run out of bits, or something can't be
> >>>>>represented in the chosen encoding.
> >>>>
> >>>>I think that's a pretty convincing argument for a "better" CREATE_VM,
> >>>>one that would have a clearly defined, structured (and potentially
> >>>>extensible) argument.
> >>>>
> >>>>I've quickly hacked the following:
> >>>>
> >>>>diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>>>index b6270a3b38e9..3e76214034c2 100644
> >>>>--- a/include/uapi/linux/kvm.h
> >>>>+++ b/include/uapi/linux/kvm.h
> >>>>@@ -735,6 +735,20 @@ struct kvm_ppc_resize_hpt {
> >>>>  __u32 pad;
> >>>>  };
> >>>>
> >>>>+struct kvm_create_vm2 {
> >>>>+ __u64   version;/* Or maybe not */
> >>>>+ union {
> >>>>+ struct {
> >>>>+#define KVM_ARM_SVE_CAPABLE  (1 << 0)
> >>>>+#define KVM_ARM_SELECT_IPA   {1 << 1)
> >>>>+ __u64   capabilities;
> >>>>+ __u16   sve_vlen;
> >>>>+ __u8ipa_size;
> >>>>+ } arm64;
> >>>>+ __u64   dummy[15];
> >>>>+ };
> >>>>+};
> >>>>+
> >>>>  #define KVMIO 0xAE
> >>>>
> >>>>  /* machine type bits, to be used as argument to KVM_CREATE_VM */
> >>>>
> >>>>Other architectures could fill in their own bits if they need to.
> >>>>
> >>>>Thoughts?
> >>>
> >>>This kind of thing should work, but it may still get messy when we
> >>>add additional fields.
> >>
> >>
> >>Marc, Dave,
> >>
> >>I like Dave's approach. Some comments below.
> >>
> >>>
> >>>It we want this to work cross-arch, would it make sense to go
> >>>for a more generic approach, say
> >>>
> >>>struct kvm_create_vm_attr_any {
> >>> __u32   type;
> >>>};
> >>>
> >>>#define KVM_CREATE_VM_ATTR_ARCH_CAPABILITIES 1
> >>>struct kvm_create_vm_attr_arch_capabilities {
> >>> __u32   type;
> >>> __u16   size; /* support future expansion of capabilities[] */
> >>> __u16   reserved;
> >>> __u64   capabilities[1];
> >>>};
> >>
> >>We also need to advertise which attributes are supported by the host,
> >>so that the user can tune the available ones. That would make a bit mask
> >>like the above trickier, unless we return the supported values back
> >>in the argument ptr for the "probe" call. And this scheme in general
> >>can be useful for passing back a non-boolean result specific to the
> >>attribute, without having a per-attribute ioctl. (e.g, maximum limit
> >>for IPA).
> >
> >Maybe, but this could quickly become bloated.  (My approach already
> >feels a bit bloated...)
> >
> >I'm not sure that arbitrarily complex negotiation will really be
> >needed, but userspace might want to change its mind if setting a
> >particular propertiy fails.
> >
> >An alternative might be to have a bunch of per-VM ioctls to configure
> >different things, like x86 has.  There's at least precedent for that.
> >For arm, we currently only have a few.  That allows for easy extension,
> >at the cost of adding ioctls.
> 
> As you know, one of the major problems with the per-VM ioctls is
> the ordering of different operations and tracking to make sure that
> the userspace follows the expected order. e.g, the first approach for
> IPA series was based on this and it made things complex enough to drop
> it.

I'm aware of that, but if we are adding a new KVM_CREATE_VM, we could
perhaps give it different semantics: i.e., we create a half-created VM
that only accepts configuration ioctls and a "finish creation" ioctl
that finalises everything before you're allowed to create devices,
vcpus etc.

This is the sort of thing I was moving torwards for SVE (but for
vcpus there).

I'm not saying we should drop the existing KVM_CREATE_VM2 ideas,
but that we should take a step back if it starts to accrue complexity.

> >
> >There may be some ioctls we can reuse, like KVM_ENABLE_CAP for per-
> >vm capability flags.
> 
> May be we could switch to KVM_VM_CAPS and pass a list of capabilities
> to be enabled at creation time ? The kvm_enable_cap can pass in additional
> arguments for each cap. That way we don't have to rely on a new set of
> attributes and probing becomes straight forward.

That's a possibility.  I guess we'd need to understand how exactly x86
uses this.

Cheers
---Dave



Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-10 Thread Dave Martin
On Tue, Jul 10, 2018 at 05:38:39PM +0100, Suzuki K Poulose wrote:
> On 09/07/18 14:37, Dave Martin wrote:
> >On Mon, Jul 09, 2018 at 01:29:42PM +0100, Marc Zyngier wrote:
> >>On 09/07/18 12:23, Dave Martin wrote:

[...]

> >>>Wedging arguments into a few bits in the type argument feels awkward,
> >>>and may be regretted later if we run out of bits, or something can't be
> >>>represented in the chosen encoding.
> >>
> >>I think that's a pretty convincing argument for a "better" CREATE_VM,
> >>one that would have a clearly defined, structured (and potentially
> >>extensible) argument.
> >>
> >>I've quickly hacked the following:
> >>
> >>diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>index b6270a3b38e9..3e76214034c2 100644
> >>--- a/include/uapi/linux/kvm.h
> >>+++ b/include/uapi/linux/kvm.h
> >>@@ -735,6 +735,20 @@ struct kvm_ppc_resize_hpt {
> >>__u32 pad;
> >>  };
> >>
> >>+struct kvm_create_vm2 {
> >>+   __u64   version;/* Or maybe not */
> >>+   union {
> >>+   struct {
> >>+#define KVM_ARM_SVE_CAPABLE(1 << 0)
> >>+#define KVM_ARM_SELECT_IPA {1 << 1)
> >>+   __u64   capabilities;
> >>+   __u16   sve_vlen;
> >>+   __u8ipa_size;
> >>+   } arm64;
> >>+   __u64   dummy[15];
> >>+   };
> >>+};
> >>+
> >>  #define KVMIO 0xAE
> >>
> >>  /* machine type bits, to be used as argument to KVM_CREATE_VM */
> >>
> >>Other architectures could fill in their own bits if they need to.
> >>
> >>Thoughts?
> >
> >This kind of thing should work, but it may still get messy when we
> >add additional fields.
> 
> 
> Marc, Dave,
> 
> I like Dave's approach. Some comments below.
> 
> >
> >It we want this to work cross-arch, would it make sense to go
> >for a more generic approach, say
> >
> >struct kvm_create_vm_attr_any {
> > __u32   type;
> >};
> >
> >#define KVM_CREATE_VM_ATTR_ARCH_CAPABILITIES 1
> >struct kvm_create_vm_attr_arch_capabilities {
> > __u32   type;
> > __u16   size; /* support future expansion of capabilities[] */
> > __u16   reserved;
> > __u64   capabilities[1];
> >};
> 
> We also need to advertise which attributes are supported by the host,
> so that the user can tune the available ones. That would make a bit mask
> like the above trickier, unless we return the supported values back
> in the argument ptr for the "probe" call. And this scheme in general
> can be useful for passing back a non-boolean result specific to the
> attribute, without having a per-attribute ioctl. (e.g, maximum limit
> for IPA).

Maybe, but this could quickly become bloated.  (My approach already
feels a bit bloated...)

I'm not sure that arbitrarily complex negotiation will really be
needed, but userspace might want to change its mind if setting a
particular propertiy fails.

An alternative might be to have a bunch of per-VM ioctls to configure
different things, like x86 has.  There's at least precedent for that.
For arm, we currently only have a few.  That allows for easy extension,
at the cost of adding ioctls.

There may be some ioctls we can reuse, like KVM_ENABLE_CAP for per-
vm capability flags.


[...]

> >union kvm_create_vm_attr {
> > struct kvm_create_vm_attr_any;
> > struct kvm_create_vm_attr_arch_capabilities;
> > struct kvm_create_vm_attr_arm64_physaddr_size;
> > /* ... */
> >};
> 
> nit: Could we simply do s/kvm_create_vm_attr/kvm_vm_attr/ everywhere ?
> While I agree that the kvm_create_vm_attr makes it implicit that the 
> attributes
> are valid only "create" ioctl, the lack of an ioctl to set the VM attribute
> should be sufficient to indicate the same.

I just randomly came up with some names.  The precise naming scheme
isn't that important, so long as it unlikely to result in name
collisions and so long as it's reasonablu clear (or compiler-checkable,
or preferably both) which things can be used where.

I wouldn't have a problem with something a bit terser.

> 
> >
> >struct kvm_create_vm2 {
> > __u32   version;/* harmless, even if not useful */
> > __u16   nr_attrs;   /* or could just terminate attrs with a
> >NULL entry */
> > union kvm_create_vm_attr __user *__user *attrs;
> >};
> >
> >
> >This is quite flexible, but obviously a bit heavy.
> >
> >However, if we're adding a new interface due to lack of extensibility,
> >it may be worth going for something that's freely extensible.
> 
> True. I could hack something up along the lines above and send it here.

Sure, but best to keep it fairly rough for now.

Cheers
---Dave



Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-09 Thread Dave Martin
On Mon, Jul 09, 2018 at 01:29:42PM +0100, Marc Zyngier wrote:
> On 09/07/18 12:23, Dave Martin wrote:
> > On Fri, Jul 06, 2018 at 05:39:00PM +0100, Suzuki K Poulose wrote:
> >> On 07/06/2018 04:09 PM, Marc Zyngier wrote:
> >>> On 06/07/18 14:49, Suzuki K Poulose wrote:
> >>>> On 04/07/18 23:03, Suzuki K Poulose wrote:
> >>>>> On 07/04/2018 04:51 PM, Will Deacon wrote:
> >>>>>> Hi Suzuki,
> >>>>>>
> >>>>>> On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote:
> >>>>>>> Allow specifying the physical address size for a new VM via
> >>>>>>> the kvm_type argument for KVM_CREATE_VM ioctl. This allows
> >>>>>>> us to finalise the stage2 page table format as early as possible
> >>>>>>> and hence perform the right checks on the memory slots without
> >>>>>>> complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
> >>>>>>> of the type field and can encode more information in the future if
> >>>>>>> required. The IPA size is still capped at 40bits.
> >>>>>>>
> >>>>>>> Cc: Marc Zyngier 
> >>>>>>> Cc: Christoffer Dall 
> >>>>>>> Cc: Peter Maydel 
> >>>>>>> Cc: Paolo Bonzini 
> >>>>>>> Cc: Radim Krčmář 
> >>>>>>> Signed-off-by: Suzuki K Poulose 
> >>>>>>> ---
> >>>>>>>   arch/arm/include/asm/kvm_mmu.h   |  2 ++
> >>>>>>>   arch/arm64/include/asm/kvm_arm.h | 10 +++---
> >>>>>>>   arch/arm64/include/asm/kvm_mmu.h |  2 ++
> >>>>>>>   include/uapi/linux/kvm.h | 10 ++
> >>>>>>>   virt/kvm/arm/arm.c   | 24 ++--
> >>>>>>>   5 files changed, 39 insertions(+), 9 deletions(-)
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>>>>>> index 4df9bb6..fa4cab0 100644
> >>>>>>> --- a/include/uapi/linux/kvm.h
> >>>>>>> +++ b/include/uapi/linux/kvm.h
> >>>>>>> @@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
> >>>>>>>   #define KVM_S390_SIE_PAGE_OFFSET 1
> >>>>>>>   /*
> >>>>>>> + * On arm/arm64, machine type can be used to request the physical
> >>>>>>> + * address size for the VM. Bits [7-0] have been reserved for the
> >>>>>>> + * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
> >>>>>>> + * value 0 implies the default IPA size, which is 40bits.
> >>>>>>> + */
> >>>>>>> +#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK    0xff
> >>>>>>> +#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)    \
> >>>>>>> +    ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)
> >>>>>>
> >>>>>> This seems like you're allocating quite a lot of bits in a 
> >>>>>> non-extensible
> >>>>>> interface to a fairly esoteric parameter. Would it be better to add 
> >>>>>> another
> >>>>>> ioctl, or condense the number of sizes you support instead?
> >>>>>
> >>>>> As I explained in the other thread, we need the size as soon as the VM
> >>>>> is created. The major challenge is keeping the backward compatibility by
> >>>>> mapping 0 to 40bits. I will give it a thought.
> >>>>
> >>>> Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, 
> >>>> which
> >>>> occupies 3 bits and has the following definitions. 
> >>>> (ID_AA64MMFR0_EL1:PARange
> >>>> also has the field definitions, except that the field is 4bits wide, but
> >>>> only 3bits are used)
> >>>>
> >>>> 000 32 bits, 4GB.
> >>>> 001 36 bits, 64GB.
> >>>> 010 40 bits, 1TB.
> >>>> 011 42 bits, 4TB.
> >>>> 100 44 bits, 16TB.
> >>>> 101 48 bits, 256TB.
> >>>> 110 52 bits, 4PB
> >>>>
> >>>> But we need to map 0 => 40bits IPA to make our ABI backward compatible. 
> >>>> So
&

Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-09 Thread Dave Martin
On Fri, Jul 06, 2018 at 05:39:00PM +0100, Suzuki K Poulose wrote:
> On 07/06/2018 04:09 PM, Marc Zyngier wrote:
> >On 06/07/18 14:49, Suzuki K Poulose wrote:
> >>On 04/07/18 23:03, Suzuki K Poulose wrote:
> >>>On 07/04/2018 04:51 PM, Will Deacon wrote:
> Hi Suzuki,
> 
> On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote:
> >Allow specifying the physical address size for a new VM via
> >the kvm_type argument for KVM_CREATE_VM ioctl. This allows
> >us to finalise the stage2 page table format as early as possible
> >and hence perform the right checks on the memory slots without
> >complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
> >of the type field and can encode more information in the future if
> >required. The IPA size is still capped at 40bits.
> >
> >Cc: Marc Zyngier 
> >Cc: Christoffer Dall 
> >Cc: Peter Maydel 
> >Cc: Paolo Bonzini 
> >Cc: Radim Krčmář 
> >Signed-off-by: Suzuki K Poulose 
> >---
> >   arch/arm/include/asm/kvm_mmu.h   |  2 ++
> >   arch/arm64/include/asm/kvm_arm.h | 10 +++---
> >   arch/arm64/include/asm/kvm_mmu.h |  2 ++
> >   include/uapi/linux/kvm.h | 10 ++
> >   virt/kvm/arm/arm.c   | 24 ++--
> >   5 files changed, 39 insertions(+), 9 deletions(-)
> 
> [...]
> 
> >diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >index 4df9bb6..fa4cab0 100644
> >--- a/include/uapi/linux/kvm.h
> >+++ b/include/uapi/linux/kvm.h
> >@@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
> >   #define KVM_S390_SIE_PAGE_OFFSET 1
> >   /*
> >+ * On arm/arm64, machine type can be used to request the physical
> >+ * address size for the VM. Bits [7-0] have been reserved for the
> >+ * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
> >+ * value 0 implies the default IPA size, which is 40bits.
> >+ */
> >+#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK    0xff
> >+#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)    \
> >+    ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)
> 
> This seems like you're allocating quite a lot of bits in a non-extensible
> interface to a fairly esoteric parameter. Would it be better to add 
> another
> ioctl, or condense the number of sizes you support instead?
> >>>
> >>>As I explained in the other thread, we need the size as soon as the VM
> >>>is created. The major challenge is keeping the backward compatibility by
> >>>mapping 0 to 40bits. I will give it a thought.
> >>
> >>Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, which
> >>occupies 3 bits and has the following definitions. (ID_AA64MMFR0_EL1:PARange
> >>also has the field definitions, except that the field is 4bits wide, but
> >>only 3bits are used)
> >>
> >>000 32 bits, 4GB.
> >>001 36 bits, 64GB.
> >>010 40 bits, 1TB.
> >>011 42 bits, 4TB.
> >>100 44 bits, 16TB.
> >>101 48 bits, 256TB.
> >>110 52 bits, 4PB
> >>
> >>But we need to map 0 => 40bits IPA to make our ABI backward compatible. So
> >>we could use the additional one bit to indicate that IPA size is requested
> >>in the 3 bits.
> >>
> >>i.e,
> >>
> >>machine_type:
> >>
> >>Bit [2:0]   - Requested IPA size. Values follow VTCR_EL2.PS format.
> >>
> >>Bit [3] - 1 => IPA Size bits (Bits[2:0]) requested.
> >>0 => Not requested
> >>
> >>The only minor down side is restricting to the predefined values above,
> >>which is not a real issue for a VM.
> >>
> >>Thoughts ?
> >
> >I'd be very wary of using that 4th bit to do something that is not in
> >the architecture. We have only a single value left to be used (0b111),
> >and then your scheme clashes with the architecture definition.
> 
> I agree. However, if we ever go beyond the 3bits in PARange, we have an
> issue with {V}TCR counter part. But lets not take that chance.
> 
> >
> >I'd rather encode things in a way that is independent from the
> >architecture, and be done with it. You can map 0 to 40bits, and we have
> >the ability to express all values the architecture has (just in a
> >different order).
> 
> The other option I can think of is encoding a signed number which is the
> difference of the IPA from 40. But that would need 5 bits if we were to
> encode it as it is. And if we want to squeeze it in 4bit, we could store
> half the difference (limiting the IPA limit to even numbers).
> 
> i.e IPA = 40 + 2 * sign_extend(bits[3:0);

I came across similar issues when trying to work out how to enable
SVE for KVM.  In the end I reduced this to a per-vcpu feature, but
it means that there is no global opt-in for the SVE-specific KVM
API extensions:

That's a bit gross, because SVE may require a change to the way
vcpus are initialised.  The set of supported SVE vector lengths needs
to be set somehow before the vcpu is set running, but it's tricky do
do that without a new ioctl -- which would mean that 

Re: [Qemu-devel] [RISU PATCH 07/10] risugen: add --sve support

2017-11-09 Thread Dave Martin
On Tue, Nov 07, 2017 at 03:05:55PM +, Alex Bennée wrote:
> This is similar to the approach used by the FP/simd data in so far as
> we generate a block of random data and then load into it. As there are
> no post-index SVE operations we need to emit an additional incp
> instruction to generate our offset into the array.
> 
> Signed-off-by: Alex Bennée 
> ---
>  risugen|  3 +++
>  risugen_arm.pm | 57 ++---
>  2 files changed, 53 insertions(+), 7 deletions(-)
> 
> diff --git a/risugen b/risugen
> index aba4bb7..0ac8e86 100755
> --- a/risugen
> +++ b/risugen
> @@ -317,6 +317,7 @@ sub main()
>  my $condprob = 0;
>  my $fpscr = 0;
>  my $fp_enabled = 1;
> +my $sve_enabled = 1;
>  my $big_endian = 0;
>  my ($infile, $outfile);
>  
> @@ -334,6 +335,7 @@ sub main()
>  },
>  "be" => sub { $big_endian = 1; },
>  "no-fp" => sub { $fp_enabled = 0; },
> +"sve" => sub { $sve_enabled = 1; },
>  ) or return 1;
>  # allow "--pattern re,re" and "--pattern re --pattern re"
>  @pattern_re = split(/,/,join(',',@pattern_re));
> @@ -361,6 +363,7 @@ sub main()
>  'fpscr' => $fpscr,
>  'numinsns' => $numinsns,
>  'fp_enabled' => $fp_enabled,
> +'sve_enabled' => $sve_enabled,
>  'outfile' => $outfile,
>  'details' => \%insn_details,
>  'keys' => \@insn_keys,
> diff --git a/risugen_arm.pm b/risugen_arm.pm
> index 2f10d58..8d1e1fd 100644
> --- a/risugen_arm.pm
> +++ b/risugen_arm.pm
> @@ -472,9 +472,47 @@ sub write_random_aarch64_fpdata()
>  }
>  }
>  
> -sub write_random_aarch64_regdata($)
> +sub write_random_aarch64_svedata()
>  {
> -my ($fp_enabled) = @_;
> +# Load SVE registers
> +my $align = 16;
> +my $vl = 16; # number of vqs

Would this be better phrased

my $vq = 16;# quadwords per vector

> +my $datalen = (32 * $vl * 16) + $align;
> +
> +write_pc_adr(0, (3 * 4) + ($align - 1)); # insn 1
> +write_align_reg(0, $align);  # insn 2
> +write_jump_fwd($datalen);# insn 3
> +
> +# align safety
> +for (my $i = 0; $i < ($align / 4); $i++) {
> +# align with nops
> +insn32(0xd503201f);
> +};
> +
> +for (my $rt = 0; $rt <= 31; $rt++) {
> +for (my $q = 0; $q < $vl; $q++) {
> +write_random_fpreg_var(4); # quad
> +}
> +}
> +
> +# Reset all the predicate registers to all true
> +for (my $p = 0; $p < 16; $p++) {
> +insn32(0x2518e3e0 | $p);
> +}
> +
> +# there is no post index load so we do this by hand
> +write_mov_ri(1, 0);
> +for (my $rt = 0; $rt <= 31; $rt++) {
> +# ld1dz0.d, p0/z, [x0, x1, lsl #3]
> +insn32(0xa5e14000 | $rt);
> +# incpx1, p0.d
> +insn32(0x25ec8801);

You could avoid this with the unpredicated form LDR (vector).
(LD1x scalar+immediate doesn't provide enough immediate range).

# ldr   z$rt, [x0, #$rt, mul vl]
insn32(0x85804000 + $rt + (($rt & 7) << 10) + (($rt & 0x18) << 13));

which is what the kernel does.

No harm in exercising different instructions though!  The kernel uses
embarrassingly few.


Does it matter that the stride will depend on the actual current VL?
If x0 just points to a block of random data, I guess it doesn't matter:
some trailing data remains unused, but that doesn't make the used data
any less random.

[...]

Cheers
---Dave



Re: [Qemu-devel] [RISU PATCH 00/10] Initial support for SVE

2017-11-08 Thread Dave Martin
On Wed, Nov 08, 2017 at 11:02:15AM +, Alex Bennée wrote:
> 
> Dave Martin <dave.mar...@arm.com> writes:
> 
> > On Tue, Nov 07, 2017 at 03:05:48PM +, Alex Bennée wrote:
> >> Hi,
> >>
> >> These patches apply on-top of the last clean-up series:
> >>
> >>   Subject: [RISU PATCH 0/7] Add @Group support and some aarch64.risu 
> >> cleanups
> >>   Date: Tue, 31 Oct 2017 14:54:37 +
> >>   Message-Id: <20171031145444.13766-1-alex.ben...@linaro.org>
> >>
> >> This series adds support for SVE to RISU. Most of the initial patches
> >> are plumbing changes to better support arch specific option flags
> >> (cleaning up a TODO in the process). I also needed to ensure configure
> >> actually honoured CPPFLAGS so it could be passed yet to be released
> >> headers.
> >
> > Should there be a getauxval(AT_HWCAP) & HWCAP_SVE check in this series
> > somewhere?
> >
> > I don't know enough about how RISU is structured to know whether/where
> > this is needed.
> 
> That would be a saner runtime check to do but it's a balance as RISU is
> a fairly specialist tool which kind of assumes people know what they are
> doing.
> 
> The current check is on SVE_MAGIC in the header files which does mean a
> binary compiled on an SVE headered system is now carrying about a much
> larger register dump even when run without the --test-sve flag.
> 
> Whether it makes sense to be more flexible is a call I'll leave up to
> Peter.

Fair enough.  If there is anywhere useful to put it, it would serve as
a useful example -- but as you point out, this is not a typical piece
of userspace software...

Cheers
---Dave



Re: [Qemu-devel] [RISU PATCH 09/10] risu_reginfo_aarch64: add reginfo_copy_sve

2017-11-08 Thread Dave Martin
On Tue, Nov 07, 2017 at 03:05:57PM +, Alex Bennée wrote:
> Add the ability to save SVE registers from the signal context. This is
> controlled with an optional flag --test-sve. The whole thing is
> conditionally compiled when SVE support is in the sigcontext headers.
> 
> Technically SVE registers could be beyond an EXTRA_MAGIC section. I've
> not seen this on the model so currently we abort() if we encounter the
> EXTRA_MAGIC section.
> 
> Signed-off-by: Alex Bennée 
> ---
>  risu_reginfo_aarch64.c | 74 
> ++
>  risu_reginfo_aarch64.h |  8 ++
>  2 files changed, 82 insertions(+)
> 
> diff --git a/risu_reginfo_aarch64.c b/risu_reginfo_aarch64.c
> index 38ad338..7c97790 100644
> --- a/risu_reginfo_aarch64.c
> +++ b/risu_reginfo_aarch64.c
> @@ -13,12 +13,78 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  
>  #include "risu.h"
>  #include "risu_reginfo_aarch64.h"
>  
> +#ifndef SVE_MAGIC
>  void *arch_long_opts;
>  char *arch_extra_help;
> +#else
> +/* Should we test SVE register state */
> +static int test_sve;
> +static struct option extra_opts[] = {
> +{"test-sve", no_argument, _sve, 1},
> +{0, 0, 0, 0}
> +};
> +
> +void *arch_long_opts = _opts[0];
> +char *arch_extra_help = "  --test-sveCompare SVE registers\n";
> +
> +/* Extra SVE copy function, only called with --test-sve */
> +static void reginfo_copy_sve(struct reginfo *ri, struct _aarch64_ctx *ctx)
> +{
> +struct sve_context *sve;
> +int r, vq;
> +bool found = false;
> +
> +while (!found) {
> +   switch (ctx->magic)
> +   {
> +  case SVE_MAGIC:
> + found = true;
> + break;
> +  case EXTRA_MAGIC:
> + fprintf(stderr, "%s: found EXTRA_MAGIC\n", __func__);
> + abort();
> +  case 0:
> + /* We might not have an SVE context */
> + fprintf(stderr, "%s: reached end of ctx, no joy (%d)\n", 
> __func__, ctx->size);
> + return;
> +  default:
> + ctx = (struct _aarch64_ctx *)((void *)ctx + ctx->size);
> + break;
> +   }
> +
> +}
> +
> +sve = (struct sve_context *) ctx;
> +ri->vl = sve->vl;
> +vq = sve_vq_from_vl(sve->vl); /* number of quads for whole vl */
> +
> +/* Copy ZREG's one at a time */
> +for (r = 0; r < SVE_NUM_ZREGS; r++) {
> +memcpy(>zregs[r],
> +   (char *)sve + SVE_SIG_ZREG_OFFSET(vq, r),
> +   SVE_SIG_ZREG_SIZE(vq));
> +}
> +
> +/* Copy PREG's one at a time */
> +for (r = 0; r < SVE_NUM_PREGS; r++) {
> +memcpy(>pregs[r],
> +   (char *)sve + SVE_SIG_PREG_OFFSET(vq, r),
> +   SVE_SIG_PREG_SIZE(vq));
> +}
> +
> +/* Finally the FFR */
> +memcpy(>ffr,(char *)sve + SVE_SIG_FFR_OFFSET(vq),
> +   SVE_SIG_FFR_SIZE(vq));
> +
> +}
> +#endif
>  
>  /* reginfo_init: initialize with a ucontext */
>  void reginfo_init(struct reginfo *ri, ucontext_t *uc)
> @@ -26,6 +92,7 @@ void reginfo_init(struct reginfo *ri, ucontext_t *uc)
>  int i;
>  struct _aarch64_ctx *ctx;
>  struct fpsimd_context *fp;
> +
>  /* necessary to be able to compare with memcmp later */
>  memset(ri, 0, sizeof(*ri));
>  
> @@ -59,6 +126,13 @@ void reginfo_init(struct reginfo *ri, ucontext_t *uc)
>  for (i = 0; i < 32; i++) {
>  ri->vregs[i] = fp->vregs[i];
>  }
> +

Where do you get uc from?

If it comes straight out of a signal frame that's OK, but if it's copied
around as a ucontext_t or manipulated by getcontext() etc this is going
to go wrong when extra_context is present.

This code may get used by people as a reference, so at least adding a
comment to the effect that this only works on the signal frame would be
a good idea.


When you add support for extra_context, you should probably add a
sanity-check to ensure that the datap pointer really does point to the
right place -- see .

Even though the kernel _should_ never violate this, it can be violated
by ucontext_t manipulation in userspace, so any function that doesn't
know where its ucontext_t came from should do this check.

[...]

Cheers
---Dave



Re: [Qemu-devel] [RISU PATCH 00/10] Initial support for SVE

2017-11-08 Thread Dave Martin
On Tue, Nov 07, 2017 at 03:05:48PM +, Alex Bennée wrote:
> Hi,
> 
> These patches apply on-top of the last clean-up series:
> 
>   Subject: [RISU PATCH 0/7] Add @Group support and some aarch64.risu cleanups
>   Date: Tue, 31 Oct 2017 14:54:37 +
>   Message-Id: <20171031145444.13766-1-alex.ben...@linaro.org>
> 
> This series adds support for SVE to RISU. Most of the initial patches
> are plumbing changes to better support arch specific option flags
> (cleaning up a TODO in the process). I also needed to ensure configure
> actually honoured CPPFLAGS so it could be passed yet to be released
> headers.

Should there be a getauxval(AT_HWCAP) & HWCAP_SVE check in this series
somewhere?

I don't know enough about how RISU is structured to know whether/where
this is needed.

[...]

Cheers
---Dave



Re: [Qemu-devel] [RISU PATCH 06/10] configure: support CPPFLAGS

2017-11-08 Thread Dave Martin
On Tue, Nov 07, 2017 at 03:05:54PM +, Alex Bennée wrote:
> Useful for accessing API's that are still brewing, e.g:
> 
>   CROSS_PREFIX=aarch64-linux-gnu- \
> CPPFLAGS=-I/home/alex/lsrc/qemu/risu.git/sve-headers/include \
> ../configure
> 
> Signed-off-by: Alex Bennée 
> ---
>  README| 6 ++
>  configure | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/README b/README
> index 9946e6e..fbe408d 100644
> --- a/README
> +++ b/README
> @@ -26,6 +26,12 @@ Most useful is
> need this if you're not building on the target system
> (Example: CROSS_PREFIX=arm-linux-gnueabihf- )
>  
> +Another useful flag is
> + CPPFLAGS= which specified pre-processor flags, usually -I statements
> +   for specifying extra include paths. Use this is you need something

s/is/if/

> +   from new kernel headers not installed on your system.
> +   (Example: CPPFLAGS=-I/path/to/sve-kernel-headers/include)
> +

You should probably point out that the path should refer to headers
installed by running 'make headers_install' in the Linux source.  This
means we really test the uapi headers properly, and I don't get spurious
bug reports about bad headers ;)

[...]

Cheers
---Dave



Re: [Qemu-devel] [RISU PATCH 10/10] risu_reginfo_aarch64: add SVE support to reginfo_dump_mismatch

2017-11-08 Thread Dave Martin
On Tue, Nov 07, 2017 at 03:05:58PM +, Alex Bennée wrote:
> Signed-off-by: Alex Bennée 
> ---
>  risu_reginfo_aarch64.c | 49 +
>  1 file changed, 49 insertions(+)
> 
> diff --git a/risu_reginfo_aarch64.c b/risu_reginfo_aarch64.c
> index 7c97790..8aba192 100644
> --- a/risu_reginfo_aarch64.c
> +++ b/risu_reginfo_aarch64.c
> @@ -141,6 +141,18 @@ int reginfo_is_eq(struct reginfo *r1, struct reginfo *r2)
>  return memcmp(r1, r2, sizeof(*r1)) == 0;
>  }
>  
> +#ifdef SVE_MAGIC
> +static int sve_zreg_is_eq(struct reginfo *r1, struct reginfo *r2, int z)
> +{
> +return memcmp(r1->zregs[z], r2->zregs[z], sizeof(*r1->zregs[z])) == 0;
> +}
> +
> +static int sve_preg_is_eq(struct reginfo *r1, struct reginfo *r2, int p)
> +{
> +return memcmp(r1->pregs[p], r2->pregs[p], sizeof(*r1->pregs[p])) == 0;
> +}
> +#endif
> +
>  /* reginfo_dump: print state to a stream, returns nonzero on success */
>  int reginfo_dump(struct reginfo *ri, FILE * f)
>  {
> @@ -216,5 +228,42 @@ int reginfo_dump_mismatch(struct reginfo *m, struct 
> reginfo *a, FILE * f)
>  }
>  }
>  
> +#ifdef SVE_MAGIC
> +if (test_sve) {
> +if (m->vl != a->vl) {
> +fprintf(f, "  SVE VL  : %d vs %d\n", m->vl, a->vl);
> +}
> +for (i = 0; i < SVE_NUM_PREGS; i++) {
> +   if (!sve_preg_is_eq(m, a, i)) {
> +  int q;
> +  fprintf(f, "  P%2d   : ", i);
> +  for (q = 0; q < sve_vq_from_vl(m->vl); q++) {
> + fprintf(f, "%04x", m->pregs[i][q]);
> +  }
> +  fprintf(f, " vs ");
> +  for (q = 0; q < sve_vq_from_vl(m->vl); q++) {
> + fprintf(f, "%04x", a->pregs[i][q]);
> +  }
> +  fprintf(f, "\n");
> +}
> +}
> +for (i = 0; i < SVE_NUM_ZREGS; i++) {
> +   if (!sve_zreg_is_eq(m, a, i)) {
> +  int q;
> +  char *pad="";
> +  fprintf(f, "  Z%2d   : ", i);
> +  for (q = 0; q < sve_vq_from_vl(m->vl); q++) {
> + if (m->zregs[i][q] != a->zregs[i][q]) {
> +fprintf(f, "%sq%02d: %016" PRIx64 "%016" PRIx64 " vs 
> %016" PRIx64 "%016" PRIx64"\n", pad, q,
> +(uint64_t) (m->zregs[i][q] >> 64), (uint64_t) 
> m->zregs[i][q],
> +(uint64_t) (a->zregs[i][q] >> 64), (uint64_t) 
> a->zregs[i][q]);
> +pad = "  ";
> + }
> +  }
> +   }
> +}

No FFR?

Perhaps I should have explicitly encoded FFR as "P16" -- that sort of
works and saves some open-coding of the extra special case, but it feels
less correct.

You could do

static int sve_preg_is_eq(uint16_t const (*p1)[SVE_VQ_MAX],
uint16_t const (*p2)[SVE_VQ_MAX])
{
return memcmp(p1, p2, sizeof *p1) == 0;
}

/* ... */

sve_preg_is_eq(>pregs[p], >pregs[p])

/* ... */

sve_preg_is_eq(>ffr, >ffr)

(or some variation on this theme).  ffr is a specialised predicate
register, so I think you can assume that it really does have the same
type as pregs[p].  Coding the above way will give a typcheck error if
not.

Cheers
---Dave