Re: [PATCH 1/2] Add a rudimentary manpage
Hi Andre, of Phoronix fame, On Tue, Dec 22, 2015 at 02:00:46PM +, Andre Przywara wrote: > The kvmtool documentation is somewhat lacking, also it is not easily > accessible when living in the source tree only. > Add a good ol' manpage to document at least the basic commands and > their options. > This level of documentation matches the one that is already there in > the Documentation directory and should be subject to extension. > > Signed-off-by: Andre Przywara> --- > Documentation/kvmtool.1 | 222 > > 1 file changed, 222 insertions(+) > create mode 100644 Documentation/kvmtool.1 > > diff --git a/Documentation/kvmtool.1 b/Documentation/kvmtool.1 > new file mode 100644 > index 000..aecb2dc > --- /dev/null > +++ b/Documentation/kvmtool.1 > @@ -0,0 +1,222 @@ > +.\" Manpage for kvmtool > +.\" Copyright (C) 2015 by Andre Przywara > +.TH kvmtool 1 "11 Nov 2015" "0.1" "kvmtool man page" > +.SH NAME > +kvmtool \- running KVM guests > +.SH SYNOPSIS > +lkvm COMMAND [ARGS] > +.SH DESCRIPTION > +kvmtool is a userland tool for creating and controlling KVM guests. > +.SH "KVMTOOL COMMANDS" > +.sp > +.PP > +.B run -k ... You seem to be inconsistent with your synopses for each command. That is, here you just have -k ... , but later you have things like > +.B debug --all|--name [--dump] [--nmi ] [--sysrq ] which describes all of the possible options to debug. I think I prefer this latter way, so could you make all of the commands look like that, please? Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 02/21] KVM: ARM64: Define PMU data structure for each vcpu
On Tue, Dec 08, 2015 at 01:37:14PM +, Marc Zyngier wrote: > On 08/12/15 12:47, Shannon Zhao wrote: > > From: Shannon Zhao> > > > Here we plan to support virtual PMU for guest by full software > > emulation, so define some basic structs and functions preparing for > > futher steps. Define struct kvm_pmc for performance monitor counter and > > struct kvm_pmu for performance monitor unit for each vcpu. According to > > ARMv8 spec, the PMU contains at most 32(ARMV8_MAX_COUNTERS) counters. > > > > Since this only supports ARM64 (or PMUv3), add a separate config symbol > > for it. > > > > Signed-off-by: Shannon Zhao > > --- > > arch/arm64/include/asm/kvm_host.h | 2 ++ > > arch/arm64/kvm/Kconfig| 8 > > include/kvm/arm_pmu.h | 40 > > +++ > > 3 files changed, 50 insertions(+) > > create mode 100644 include/kvm/arm_pmu.h [...] > > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig > > index a5272c0..66da9a2 100644 > > --- a/arch/arm64/kvm/Kconfig > > +++ b/arch/arm64/kvm/Kconfig > > @@ -36,6 +36,7 @@ config KVM > > select HAVE_KVM_EVENTFD > > select HAVE_KVM_IRQFD > > select KVM_ARM_VGIC_V3 > > + select KVM_ARM_PMU > > What if HW_PERF_EVENTS is not selected? Also, selecting HW_PERF_EVENTS > is not enough, and you probably need PERF_EVENTS as well, So this should > probably read: > > select KVM_ARM_PMU if (HW_PERF_EVENTS && PERF_EVENTS) HW_PERF_EVENTS depends on ARM_PMU which in turn depends on PERF_EVENTS. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvmtool: virtio-net: fix VIRTIO_NET_F_MRG_RXBUF usage in rx thread
When merging virtio-net buffers using the VIRTIO_NET_F_MRG_RXBUF feature, the first buffer added to the used ring should indicate the total number of buffers used to hold the packet. Unfortunately, kvmtool has a number of issues when constructing these merged buffers: - Commit 5131332e3f1a ("kvmtool: convert net backend to support bi-endianness") introduced a strange loop counter, which resulted in hdr->num_buffers being set redundantly the first time round - When adding the buffers to the ring, we actually add them one-by-one, allowing the guest to see the header before we've inserted the rest of the data buffers... - ... which is made worse because we non-atomically increment the num_buffers count in the header each time we insert a new data buffer Consequently, the guest quickly becomes confused in its net rx code and the whole thing grinds to a halt. This is easily exemplified by trying to boot a root filesystem over NFS, which seldom succeeds. This patch resolves the issues by allowing us to insert items into the used ring without updating the index. Once the full payload has been added and num_buffers corresponds to the total size, we *then* publish the buffers to the guest. Cc: Marc Zyngier <marc.zyng...@arm.com> Cc: Sasha Levin <sasha.le...@oracle.com> Signed-off-by: Will Deacon <will.dea...@arm.com> --- Without this patch, I can't boot to a prompt using NFS over virtio-net. include/kvm/virtio.h | 2 ++ virtio/core.c| 32 +--- virtio/net.c | 31 +++ 3 files changed, 42 insertions(+), 23 deletions(-) diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h index 768ee9668d44..8324ba7d38be 100644 --- a/include/kvm/virtio.h +++ b/include/kvm/virtio.h @@ -112,6 +112,8 @@ static inline bool virt_queue__available(struct virt_queue *vq) return virtio_guest_to_host_u16(vq, vq->vring.avail->idx) != vq->last_avail_idx; } +void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump); +struct vring_used_elem * virt_queue__set_used_elem_no_update(struct virt_queue *queue, u32 head, u32 len, u16 offset); struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len); bool virtio_queue__should_signal(struct virt_queue *vq); diff --git a/virtio/core.c b/virtio/core.c index 3b6e4d7cd045..d6ac289d450e 100644 --- a/virtio/core.c +++ b/virtio/core.c @@ -21,22 +21,17 @@ const char* virtio_trans_name(enum virtio_trans trans) return "unknown"; } -struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len) +void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump) { - struct vring_used_elem *used_elem; u16 idx = virtio_guest_to_host_u16(queue, queue->vring.used->idx); - used_elem = >vring.used->ring[idx % queue->vring.num]; - used_elem->id = virtio_host_to_guest_u32(queue, head); - used_elem->len = virtio_host_to_guest_u32(queue, len); - /* * Use wmb to assure that used elem was updated with head and len. * We need a wmb here since we can't advance idx unless we're ready * to pass the used element to the guest. */ wmb(); - idx++; + idx += jump; queue->vring.used->idx = virtio_host_to_guest_u16(queue, idx); /* @@ -45,6 +40,29 @@ struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 * an updated idx. */ wmb(); +} + +struct vring_used_elem * +virt_queue__set_used_elem_no_update(struct virt_queue *queue, u32 head, + u32 len, u16 offset) +{ + struct vring_used_elem *used_elem; + u16 idx = virtio_guest_to_host_u16(queue, queue->vring.used->idx); + + idx += offset; + used_elem = >vring.used->ring[idx % queue->vring.num]; + used_elem->id = virtio_host_to_guest_u32(queue, head); + used_elem->len = virtio_host_to_guest_u32(queue, len); + + return used_elem; +} + +struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len) +{ + struct vring_used_elem *used_elem; + + used_elem = virt_queue__set_used_elem_no_update(queue, head, len, 0); + virt_queue__used_idx_advance(queue, 1); return used_elem; } diff --git a/virtio/net.c b/virtio/net.c index 6d1be657735e..ffa83ab3b549 100644 --- a/virtio/net.c +++ b/virtio/net.c @@ -80,14 +80,12 @@ static void virtio_net_fix_tx_hdr(struct virtio_net_hdr *hdr, struct net_dev *nd hdr->csum_offset= virtio_guest_to_host_u16(>vdev, hdr->csum_offset); } -static void virtio_net_fix_rx_hdr(struct virtio_net_hdr_mrg_rxbuf *hdr, struct net_dev *ndev) +static void virtio_net_fix_rx_hdr(struct virtio_net_hdr *hdr, struct net_dev
Re: [PATCH 0/7] kvmtool: Cleanup kernel loading
On Wed, Nov 18, 2015 at 10:29:30AM +, Andre Przywara wrote: > On 02/11/15 14:58, Will Deacon wrote: > > On Fri, Oct 30, 2015 at 06:26:53PM +, Andre Przywara wrote: > >> this series cleans up kvmtool's kernel loading functionality a bit. > >> It has been broken out of a previous series I sent [1] and contains > >> just the cleanup and bug fix parts, which should be less controversial > >> and thus easier to merge ;-) > >> I will resend the pipe loading part later on as a separate series. > >> > >> The first patch properly abstracts kernel loading to move > >> responsibility into each architecture's code. It removes quite some > >> ugly code from the generic kvm.c file. > >> The later patches address the naive usage of read(2) to, well, read > >> data from files. Doing this without coping with the subtleties of > >> the UNIX read semantics (returning with less or none data read is not > >> an error) can provoke hard to debug failures. > >> So these patches make use of the existing and one new wrapper function > >> to make sure we read everything we actually wanted to. > >> The last patch moves the ARM kernel loading code into the proper > >> location to be in line with the other architectures. > >> > >> Please have a look and give some comments! > > > > Looks good to me, but I'd like to see some comments from some mips/ppc/x86 > > people on the changes you're making over there. > > Sounds reasonable, but no answers yet. > > Can you take at least patch 1 and 2 meanwhile, preferably 6 and 7 (the > ARM parts) also if you are OK with it? > I have other patches that depend on 1/7 and 2/7, so having them upstream > would help me and reduce further dependency churn. > I am happy to resend the remaining patches for further discussion later. We let them sit on the list for a while with no comments, so I just pushed out your series. If a bug report shows up, then we can always revert the offending patch if there's no quick fix. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] kvmtool: Cleanup kernel loading
On Wed, Nov 18, 2015 at 10:29:30AM +, Andre Przywara wrote: > On 02/11/15 14:58, Will Deacon wrote: > > On Fri, Oct 30, 2015 at 06:26:53PM +, Andre Przywara wrote: > >> this series cleans up kvmtool's kernel loading functionality a bit. > >> It has been broken out of a previous series I sent [1] and contains > >> just the cleanup and bug fix parts, which should be less controversial > >> and thus easier to merge ;-) > >> I will resend the pipe loading part later on as a separate series. > >> > >> The first patch properly abstracts kernel loading to move > >> responsibility into each architecture's code. It removes quite some > >> ugly code from the generic kvm.c file. > >> The later patches address the naive usage of read(2) to, well, read > >> data from files. Doing this without coping with the subtleties of > >> the UNIX read semantics (returning with less or none data read is not > >> an error) can provoke hard to debug failures. > >> So these patches make use of the existing and one new wrapper function > >> to make sure we read everything we actually wanted to. > >> The last patch moves the ARM kernel loading code into the proper > >> location to be in line with the other architectures. > >> > >> Please have a look and give some comments! > > > > Looks good to me, but I'd like to see some comments from some mips/ppc/x86 > > people on the changes you're making over there. > > Sounds reasonable, but no answers yet. > > Can you take at least patch 1 and 2 meanwhile, preferably 6 and 7 (the > ARM parts) also if you are OK with it? > I have other patches that depend on 1/7 and 2/7, so having them upstream > would help me and reduce further dependency churn. > I am happy to resend the remaining patches for further discussion later. We let them sit on the list for a while with no comments, so I just pushed out your series. If a bug report shows up, then we can always revert the offending patch if there's no quick fix. Will -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] arm64: KVM: Add workaround for Cortex-A57 erratum 834220
Hi Marc, On Mon, Nov 16, 2015 at 10:28:18AM +, Marc Zyngier wrote: > Cortex-A57 parts up to r1p2 can misreport Stage 2 translation faults > when a Stage 1 permission fault or device alignment fault should > have been reported. > > This patch implements the workaround (which is to validate that the > Stage-1 translation actually succeeds) by using code patching. > > Signed-off-by: Marc Zyngier <marc.zyng...@arm.com> > --- > arch/arm64/Kconfig | 21 + > arch/arm64/include/asm/cpufeature.h | 3 ++- > arch/arm64/kernel/cpu_errata.c | 9 + > arch/arm64/kvm/hyp.S| 6 ++ > 4 files changed, 38 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 9ac16a4..746d985 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -316,6 +316,27 @@ config ARM64_ERRATUM_832075 > > If unsure, say Y. > > +config ARM64_ERRATUM_834220 > + bool "Cortex-A57: 834220: Stage 2 translation fault might be > incorrectly reported in presence of a Stage 1 fault" > + depends on KVM > + default y > + help > + This option adds an alternative code sequence to work around ARM > + erratum 834220 on Cortex-A57 parts up to r1p2. > + > + Affected Cortex-A57 parts might report a Stage 2 translation > + fault as a the result of a Stage 1 fault for load crossing a s/as a the/as the/ s/for load/for a load/ > + page boundary when there is a permission or device memory > + alignment fault at Stage 1 and a translation fault at Stage 2. > + > + The workaround is to verify that the Stage-1 translation Consistency between "Stage 1" and "Stage-1". > + doesn't generate a fault before handling the Stage-2 fault. Same here. > + Please note that this does not necessarily enable the workaround, > + as it depends on the alternative framework, which will only patch > + the kernel if an affected CPU is detected. > + > + If unsure, say Y. > + > config ARM64_ERRATUM_845719 > bool "Cortex-A53: 845719: a load might read incorrect data" > depends on COMPAT > diff --git a/arch/arm64/include/asm/cpufeature.h > b/arch/arm64/include/asm/cpufeature.h > index 11d5bb0f..52722ee 100644 > --- a/arch/arm64/include/asm/cpufeature.h > +++ b/arch/arm64/include/asm/cpufeature.h > @@ -29,8 +29,9 @@ > #define ARM64_HAS_PAN4 > #define ARM64_HAS_LSE_ATOMICS5 > #define ARM64_WORKAROUND_CAVIUM_231546 > +#define ARM64_WORKAROUND_834220 7 > > -#define ARM64_NCAPS 7 > +#define ARM64_NCAPS 8 > > #ifndef __ASSEMBLY__ > > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c > index 24926f2..feb6b4e 100644 > --- a/arch/arm64/kernel/cpu_errata.c > +++ b/arch/arm64/kernel/cpu_errata.c > @@ -75,6 +75,15 @@ const struct arm64_cpu_capabilities arm64_errata[] = { > (1 << MIDR_VARIANT_SHIFT) | 2), > }, > #endif > +#ifdef CONFIG_ARM64_ERRATUM_834220 > + { > + /* Cortex-A57 r0p0 - r1p2 */ > + .desc = "ARM erratum 834220", > + .capability = ARM64_WORKAROUND_834220, > + MIDR_RANGE(MIDR_CORTEX_A57, 0x00, > +(1 << MIDR_VARIANT_SHIFT) | 2), > + }, > +#endif > #ifdef CONFIG_ARM64_ERRATUM_845719 > { > /* Cortex-A53 r0p[01234] */ > diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S > index 1599701..ff2e038 100644 > --- a/arch/arm64/kvm/hyp.S > +++ b/arch/arm64/kvm/hyp.S > @@ -1015,9 +1015,15 @@ el1_trap: > b.ne1f // Not an abort we care about > > /* This is an abort. Check for permission fault */ > +alternative_if_not ARM64_WORKAROUND_834220 > and x2, x1, #ESR_ELx_FSC_TYPE > cmp x2, #FSC_PERM > b.ne1f // Not a permission fault > +alternative_else > + nop // Use the permission fault path to > + nop // check for a valid S1 translation, > + nop // regardless of the ESR value. > +alternative_endif With the cosmetic changes: Reviewed-by: Will Deacon <will.dea...@arm.com> Can you cc stable as well, please? Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvmtool: assume dead vcpus are paused too
On Wed, Nov 04, 2015 at 06:51:12PM -0500, Sasha Levin wrote: > On 11/04/2015 06:51 AM, Will Deacon wrote: > > + mutex_lock(_lock); > > + > > + /* The kvm->cpus array contains a null pointer in the last location */ > > + for (i = 0; ; i++) { > > + if (kvm->cpus[i]) > > + pthread_kill(kvm->cpus[i]->thread, SIGKVMEXIT); > > + else > > + break; > > + } > > + > > + kvm__continue(kvm); > > In this scenario: if we grabbed pause_lock, signaled vcpu0 to exit, and it did > before we called kvm__continue(), we won't end up releasing pause_lock, which > might cause a lockup later, no? Hmm, yeah, maybe that should be an explicit mutex_unlock rather than a call to kvm__continue. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvmtool: assume dead vcpus are paused too
Hi Sasha, [adding the kvm list; not sure why it was dropped] On Wed, Oct 28, 2015 at 01:34:25PM +, Will Deacon wrote: > On Wed, Oct 28, 2015 at 09:00:07AM -0400, Sasha Levin wrote: > > On 10/28/2015 08:27 AM, Will Deacon wrote: > > > On Tue, Oct 27, 2015 at 10:08:44PM -0400, Sasha Levin wrote: > > >> > On 10/27/2015 12:08 PM, Will Deacon wrote: > > >>>> > >> for (i = 0; i < kvm->nrcpus; i++) > > >>>>> > >> > - pthread_kill(kvm->cpus[i]->thread, SIGKVMPAUSE); > > >>>>> > >> > + if (kvm->cpus[i]->is_running) > > >>>>> > >> > + pthread_kill(kvm->cpus[i]->thread, > > >>>>> > >> > SIGKVMPAUSE); > > >>>>> > >> > + else > > >>>>> > >> > + paused_vcpus++; > > >>> > > What serialises the test of kvm->cpus[i]->is_running against a > > >>> > > concurrent > > >>> > > SIGKVMEXIT? > > >> > > > >> > If there's a pause signal pending we'd still end up marking it as > > >> > paused > > >> > and do the whole process even though the vcpu is actually dead, so > > >> > while > > >> > we can race with SIGKVMEXIT, it'll just mean we'd go through the whole > > >> > pausing procedure even though the vcpu is dead. > > >> > > > >> > Or did you mean something else? > > > I couldn't convince myself there was an issue here (hence the question ;), > > > but I was wondering about something like: > > > > > > 1. The VCPUn thread takes SIGKVMEXIT (e.g. due to kvm_cpu__reboot) > > > 2. The IPC thread handles a pause event and reads > > > kvm->cpus[n]->is_running > > > as true > > > 3. VCPUn sets kvm->cpus[n]->is_running to false > > > 4. VCPUn exits > > > 5. The IPC thread trues to pthread_kill on an exited thread > > > > > > am I missing some obvious synchronisation here? > > > > Can we take two signals in parallel? (I'm really not sure). If yes, than > > what you've described is a problem (and has been for a while). If not, > > then no :) > > Regardless, isn't the pause event coming from polling the IPC file > descriptor as opposed to a signal? It looks like lkvm is currently SEGVing on exit due to the original issue. Digging deeper, it looks like we should be taking the pause_lock for the SIGKVMEXIT case too, so that we can serialise access to is_running. What do you think about the patch below? Will --->8 diff --git a/hw/i8042.c b/hw/i8042.c index 3801e20a675d..288b7d1108ac 100644 --- a/hw/i8042.c +++ b/hw/i8042.c @@ -163,7 +163,7 @@ static void kbd_write_command(struct kvm *kvm, u8 val) state.mode &= ~MODE_DISABLE_AUX; break; case I8042_CMD_SYSTEM_RESET: - kvm_cpu__reboot(kvm); + kvm__reboot(kvm); break; default: break; diff --git a/include/kvm/kvm-cpu.h b/include/kvm/kvm-cpu.h index aa0cb543f8fb..c4c9cca449eb 100644 --- a/include/kvm/kvm-cpu.h +++ b/include/kvm/kvm-cpu.h @@ -12,7 +12,6 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu); void kvm_cpu__setup_cpuid(struct kvm_cpu *vcpu); void kvm_cpu__enable_singlestep(struct kvm_cpu *vcpu); void kvm_cpu__run(struct kvm_cpu *vcpu); -void kvm_cpu__reboot(struct kvm *kvm); int kvm_cpu__start(struct kvm_cpu *cpu); bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu); int kvm_cpu__get_endianness(struct kvm_cpu *vcpu); diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h index 37155dbc132b..a474dae6c2cf 100644 --- a/include/kvm/kvm.h +++ b/include/kvm/kvm.h @@ -93,6 +93,7 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr), void *ptr); bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr); +void kvm__reboot(struct kvm *kvm); void kvm__pause(struct kvm *kvm); void kvm__continue(struct kvm *kvm); void kvm__notify_paused(void); diff --git a/kvm-cpu.c b/kvm-cpu.c index ad4441b1d96c..3e2037e3ccb3 100644 --- a/kvm-cpu.c +++ b/kvm-cpu.c @@ -45,10 +45,8 @@ void kvm_cpu__run(struct kvm_cpu *vcpu) static void kvm_cpu_signal_handler(int signum) { if (signum == SIGKVMEXIT) { - if (current_kvm_cpu && current_kvm_cpu->is_running) { + if (current_kvm_cpu && current_kvm_cpu->is_running) curre
Re: [PATCH 1/2] Makefile: allow overriding CFLAGS on the command line
On Wed, Nov 04, 2015 at 12:02:23PM +0200, Riku Voipio wrote: > On 30 October 2015 at 19:20, Andre Przywarawrote: > > When a Makefile variable is set on the make command line, all > > Makefile-internal assignments to that very variable are _ignored_. > > Since we add quite some essential values to CFLAGS internally, > > specifying some CFLAGS on the command line will usually break the > > build (and not fix any include file problems you hoped to overcome > > with that). > > Somewhat against intuition GNU make provides the "override" directive > > to change this behavior; with that assignments in the Makefile get > > _appended_ to the value given on the command line. [1] > > > > Change any internal assignments to use that directive, so that a user > > can use: > > $ make CFLAGS=/path/to/my/include/dir > > to teach kvmtool about non-standard header file locations (helpful > > for cross-compilation) or to tweak other compiler options. > > > > Signed-off-by: Andre Przywara > > > > [1] > > https://www.gnu.org/software/make/manual/html_node/Override-Directive.html > > --- > > Makefile | 15 +++ > > 1 file changed, 7 insertions(+), 8 deletions(-) > > > > diff --git a/Makefile b/Makefile > > index f8f7cc4..77a7c9f 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -15,9 +15,7 @@ include config/utilities.mak > > include config/feature-tests.mak > > > > CC := $(CROSS_COMPILE)gcc > > -CFLAGS := > > LD := $(CROSS_COMPILE)ld > > -LDFLAGS:= > > This breaks builds of debian packages as dpkg-buildpackage sets LDFLAGS > to something unsuitable for guest init. Thanks for the report, Riku. I'll revert this patch while we rethink how to support user-supplied toolchain flags. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms
On Tue, Nov 03, 2015 at 09:44:52AM -0800, Mario Smarduch wrote: > On 11/3/2015 8:33 AM, Christopher Covington wrote: > > On 11/02/2015 06:51 PM, Mario Smarduch wrote: > >>this is a re-post from couple weeks ago, please take time to review > >> this > >> simple patch which simplifies DEBUG_LL and prevents kernel crash on > >> virtual > >> platforms. > >> > >> Before this patch DEBUG_LL for 'dummy virtual machine': > >> > >> ( ) Kernel low-level debugging via EmbeddedICE DCC channel > >> ( ) Kernel low-level debug output via semihosting I/O > >> ( ) Kernel low-level debugging via 8250 UART > >> ( ) Kernel low-level debugging via ARM Ltd PL01x Primecell > >> > >> In summary if debug uart is not emulated kernel crashes. > >> And once you pass that hurdle, uart physical/virtual addresses are unknown. > >> DEBUG_LL comes in handy on many occasions and should be somewhat > >> intuitive to use like it is for physical platforms. For virtual platforms > >> user may start daubting the host and get into a bigger mess. > >> > >> After this patch is applied user gets: > >> > >> (X) Kernel low-level debugging on QEMU Virtual Platform > >> ( ) Kernel low-level debugging on Kvmtool Virtual Platform > >>. above repeated > >> > >> The virtual addresses selected follow arm reference models, high in > >> vmalloc > >> section with high mem enabled and guest running with >= 1GB of memory. The > >> offset is leftover from arm reference models. > > > > Which model? It doesn't appear to match the vexpress AEM/RTSM/FVP/whatever > > which used 0x1c09 for UART0. > > I recall QEMU virt model had it's own physical address map, for sure I saw the > virtio-mmio regions assigned in some ARM document. Peter would you know? > > As far as kvmtool I'm not sure, currently PC1 COM1 port is used? Andre will > that > stay fixed? We make absolutely no guarantees about the memory map provided by kvmtool. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] kvmtool: Cleanup kernel loading
On Fri, Oct 30, 2015 at 06:26:53PM +, Andre Przywara wrote: > Hi, Hello Andre, > this series cleans up kvmtool's kernel loading functionality a bit. > It has been broken out of a previous series I sent [1] and contains > just the cleanup and bug fix parts, which should be less controversial > and thus easier to merge ;-) > I will resend the pipe loading part later on as a separate series. > > The first patch properly abstracts kernel loading to move > responsibility into each architecture's code. It removes quite some > ugly code from the generic kvm.c file. > The later patches address the naive usage of read(2) to, well, read > data from files. Doing this without coping with the subtleties of > the UNIX read semantics (returning with less or none data read is not > an error) can provoke hard to debug failures. > So these patches make use of the existing and one new wrapper function > to make sure we read everything we actually wanted to. > The last patch moves the ARM kernel loading code into the proper > location to be in line with the other architectures. > > Please have a look and give some comments! Looks good to me, but I'd like to see some comments from some mips/ppc/x86 people on the changes you're making over there. Will -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] kvmtool: Cleanup kernel loading
On Fri, Oct 30, 2015 at 06:26:53PM +, Andre Przywara wrote: > Hi, Hello Andre, > this series cleans up kvmtool's kernel loading functionality a bit. > It has been broken out of a previous series I sent [1] and contains > just the cleanup and bug fix parts, which should be less controversial > and thus easier to merge ;-) > I will resend the pipe loading part later on as a separate series. > > The first patch properly abstracts kernel loading to move > responsibility into each architecture's code. It removes quite some > ugly code from the generic kvm.c file. > The later patches address the naive usage of read(2) to, well, read > data from files. Doing this without coping with the subtleties of > the UNIX read semantics (returning with less or none data read is not > an error) can provoke hard to debug failures. > So these patches make use of the existing and one new wrapper function > to make sure we read everything we actually wanted to. > The last patch moves the ARM kernel loading code into the proper > location to be in line with the other architectures. > > Please have a look and give some comments! Looks good to me, but I'd like to see some comments from some mips/ppc/x86 people on the changes you're making over there. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] vfio/type1: Do not support IOMMUs that allow bypass
On Tue, Oct 27, 2015 at 10:00:11AM -0600, Alex Williamson wrote: > On Tue, 2015-10-27 at 15:40 +, Will Deacon wrote: > > On Fri, Oct 16, 2015 at 09:51:22AM -0600, Alex Williamson wrote: > > > Would it be possible to add iommu_domain_geometry support to arm-smmu.c? > > > In addition to this test to verify that DMA cannot bypass the IOMMU, I'd > > > eventually like to pass the aperture information out through the VFIO > > > API. Thanks, > > > > The slight snag here is that we don't know which SMMU in the system the > > domain is attached to at the point when the geometry is queried, so I > > can't give you an upper bound on the aperture. For example, if there is > > an SMMU with a 32-bit input address and another with a 48-bit input > > address. > > > > We could play the same horrible games that we do with the pgsize bitmap, > > and truncate some global aperture everytime we probe an SMMU device, but > > I'd really like to have fewer hacks like that if possible. The root of > > the problem is still that domains are allocated for a bus, rather than > > an IOMMU instance. > > Yes, Intel VT-d has this issue as well. In theory we can have > heterogeneous IOMMU hardware units (DRHDs) in a system and the upper > bound of the geometry could be diminished if we add a less capable DRHD > into the domain. I suspect this is more a theoretical problem than a > practical one though as we're typically mixing similar DRHDs and I think > we're still capable of 39-bit addressing in the least capable version > per the spec. > > In any case, I really want to start testing geometry.force_aperture, > even if we're not yet comfortable to expose the IOMMU limits to the > user. The vfio type1 shouldn't be enabled at all for underlying > hardware that allows DMA bypass. Thanks, Ok, I'll put it on my list of things to look at under the assumption that the actual aperture limits don't need to be accurate as long as DMA to an arbitrary unmapped address always faults. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] vfio/type1: Do not support IOMMUs that allow bypass
On Thu, Oct 29, 2015 at 06:42:10PM +, Robin Murphy wrote: > On 29/10/15 18:28, Will Deacon wrote: > >On Tue, Oct 27, 2015 at 10:00:11AM -0600, Alex Williamson wrote: > >>On Tue, 2015-10-27 at 15:40 +, Will Deacon wrote: > >>>On Fri, Oct 16, 2015 at 09:51:22AM -0600, Alex Williamson wrote: > >>>>Would it be possible to add iommu_domain_geometry support to arm-smmu.c? > >>>>In addition to this test to verify that DMA cannot bypass the IOMMU, I'd > >>>>eventually like to pass the aperture information out through the VFIO > >>>>API. Thanks, > >>> > >>>The slight snag here is that we don't know which SMMU in the system the > >>>domain is attached to at the point when the geometry is queried, so I > >>>can't give you an upper bound on the aperture. For example, if there is > >>>an SMMU with a 32-bit input address and another with a 48-bit input > >>>address. > >>> > >>>We could play the same horrible games that we do with the pgsize bitmap, > >>>and truncate some global aperture everytime we probe an SMMU device, but > >>>I'd really like to have fewer hacks like that if possible. The root of > >>>the problem is still that domains are allocated for a bus, rather than > >>>an IOMMU instance. > >> > >>Yes, Intel VT-d has this issue as well. In theory we can have > >>heterogeneous IOMMU hardware units (DRHDs) in a system and the upper > >>bound of the geometry could be diminished if we add a less capable DRHD > >>into the domain. I suspect this is more a theoretical problem than a > >>practical one though as we're typically mixing similar DRHDs and I think > >>we're still capable of 39-bit addressing in the least capable version > >>per the spec. > >> > >>In any case, I really want to start testing geometry.force_aperture, > >>even if we're not yet comfortable to expose the IOMMU limits to the > >>user. The vfio type1 shouldn't be enabled at all for underlying > >>hardware that allows DMA bypass. Thanks, > > > >Ok, I'll put it on my list of things to look at under the assumption that > >the actual aperture limits don't need to be accurate as long as DMA to > >an arbitrary unmapped address always faults. > > I'm pretty sure we'd only ever set the aperture to the full input address > range anyway (since we're not a GART-type thing), in which case we should > only need to worry about unmatched streams that don't hit in a domain at > all. Doesn't the disable_bypass option already cover that? (FWIW I hacked it > up for v2 a while back, too[0]). Well, the "full input address range" is tricky when you have multiple SMMU instances with different input address sizes. I can do something similar to the pgsize_bitmap. I also don't think the disable_bypass option is what we're after -- this is about devices attached to a VFIO domain that can still mysteriously bypass the SMMU for some ranges AFAIU (and shouldn't be an issue for ARM). Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
On Thu, Oct 29, 2015 at 01:59:45PM +, Eric Auger wrote: > Current vfio_pgsize_bitmap code hides the supported IOMMU page > sizes smaller than PAGE_SIZE. As a result, in case the IOMMU > does not support PAGE_SIZE page, the alignment check on map/unmap > is done with larger page sizes, if any. This can fail although > mapping could be done with pages smaller than PAGE_SIZE. > > This patch modifies vfio_pgsize_bitmap implementation so that, > in case the IOMMU supports page sizes smaller than PAGE_HOST > we pretend PAGE_HOST is supported and hide sub-PAGE_HOST sizes. > That way the user will be able to map/unmap buffers whose size/ > start address is aligned with PAGE_HOST. Pinning code uses that > granularity while iommu driver can use the sub-PAGE_HOST size > to map the buffer. > > Signed-off-by: Eric Auger <eric.au...@linaro.org> > Signed-off-by: Alex Williamson <alex.william...@redhat.com> > > --- > > This was tested on AMD Seattle with 64kB page host. ARM MMU 401 > currently expose 4kB, 2MB and 1GB page support. With a 64kB page host, > the map/unmap check is done against 2MB. Some alignment check fail > so VFIO_IOMMU_MAP_DMA fail while we could map using 4kB IOMMU page > size. > > RFC -> PATCH v1: > - move all modifications in vfio_pgsize_bitmap following Alex' > suggestion to expose a fake PAGE_HOST support > - restore WARN_ON's > --- > drivers/vfio/vfio_iommu_type1.c | 15 ++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 57d8c37..cee504a 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -403,13 +403,26 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, > struct vfio_dma *dma) > static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) > { > struct vfio_domain *domain; > - unsigned long bitmap = PAGE_MASK; > + unsigned long bitmap = ULONG_MAX; > > mutex_lock(>lock); > list_for_each_entry(domain, >domain_list, next) > bitmap &= domain->domain->ops->pgsize_bitmap; > mutex_unlock(>lock); > > + /* > + * In case the IOMMU supports page sizes smaller than PAGE_HOST > + * we pretend PAGE_HOST is supported and hide sub-PAGE_HOST sizes. > + * That way the user will be able to map/unmap buffers whose size/ > + * start address is aligned with PAGE_HOST. Pinning code uses that > + * granularity while iommu driver can use the sub-PAGE_HOST size > + * to map the buffer. > + */ > + if (bitmap & ~PAGE_MASK) { > + bitmap &= PAGE_MASK; > + bitmap |= PAGE_SIZE; > + } > + s/PAGE_HOST/PAGE_SIZE/g (in the cover-letter too) and then I think this looks good: Acked-by: Will Deacon <will.dea...@arm.com> Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
On Wed, Oct 28, 2015 at 01:12:45PM +, Eric Auger wrote: > Current vfio_pgsize_bitmap code hides the supported IOMMU page > sizes smaller than PAGE_SIZE. As a result, in case the IOMMU > does not support PAGE_SIZE page, the alignment check on map/unmap > is done with larger page sizes, if any. This can fail although > mapping could be done with pages smaller than PAGE_SIZE. > > vfio_pgsize_bitmap is modified to expose the IOMMU page sizes, > supported by all domains, even those smaller than PAGE_SIZE. The > alignment check on map is performed against PAGE_SIZE if the minimum > IOMMU size is less than PAGE_SIZE or against the min page size greater > than PAGE_SIZE. > > Signed-off-by: Eric Auger> > --- > > This was tested on AMD Seattle with 64kB page host. ARM MMU 401 > currently expose 4kB, 2MB and 1GB page support. With a 64kB page host, > the map/unmap check is done against 2MB. Some alignment check fail > so VFIO_IOMMU_MAP_DMA fail while we could map using 4kB IOMMU page > size. > --- > drivers/vfio/vfio_iommu_type1.c | 25 +++-- > 1 file changed, 11 insertions(+), 14 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 57d8c37..13fb974 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, > struct vfio_dma *dma) > static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) > { > struct vfio_domain *domain; > - unsigned long bitmap = PAGE_MASK; > + unsigned long bitmap = ULONG_MAX; > > mutex_lock(>lock); > list_for_each_entry(domain, >domain_list, next) > @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct > vfio_iommu *iommu) > static int vfio_dma_do_unmap(struct vfio_iommu *iommu, >struct vfio_iommu_type1_dma_unmap *unmap) > { > - uint64_t mask; > struct vfio_dma *dma; > size_t unmapped = 0; > int ret = 0; > + unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu)); > + unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ? > + PAGE_SIZE : min_pagesz; max_t ? > > - mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1; > - > - if (unmap->iova & mask) > + if (!IS_ALIGNED(unmap->iova, requested_alignment)) > return -EINVAL; > - if (!unmap->size || unmap->size & mask) > + if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment)) > return -EINVAL; > > - WARN_ON(mask & PAGE_MASK); > - > mutex_lock(>lock); > > /* > @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > size_t size = map->size; > long npage; > int ret = 0, prot = 0; > - uint64_t mask; > struct vfio_dma *dma; > unsigned long pfn; > + unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu)); > + unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ? > + PAGE_SIZE : min_pagesz; Same code here. Perhaps you need another function to get the alignment? Otherwise, this looks pretty straightforward to me; iommu_map will take care of splitting up the requests to the IOMMU driver so they are in the right chunks. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
On Wed, Oct 28, 2015 at 10:27:28AM -0600, Alex Williamson wrote: > On Wed, 2015-10-28 at 13:12 +, Eric Auger wrote: > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > b/drivers/vfio/vfio_iommu_type1.c > > index 57d8c37..13fb974 100644 > > --- a/drivers/vfio/vfio_iommu_type1.c > > +++ b/drivers/vfio/vfio_iommu_type1.c > > @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, > > struct vfio_dma *dma) > > static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) > > { > > struct vfio_domain *domain; > > - unsigned long bitmap = PAGE_MASK; > > + unsigned long bitmap = ULONG_MAX; > > Isn't this and removing the WARN_ON()s the only real change in this > patch? The rest looks like conversion to use IS_ALIGNED and the > following test, that I don't really understand... > > > > > mutex_lock(>lock); > > list_for_each_entry(domain, >domain_list, next) > > @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct > > vfio_iommu *iommu) > > static int vfio_dma_do_unmap(struct vfio_iommu *iommu, > > struct vfio_iommu_type1_dma_unmap *unmap) > > { > > - uint64_t mask; > > struct vfio_dma *dma; > > size_t unmapped = 0; > > int ret = 0; > > + unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu)); > > + unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ? > > + PAGE_SIZE : min_pagesz; > > This one. If we're going to support sub-PAGE_SIZE mappings, why do we > care to cap alignment at PAGE_SIZE? Eric can clarify, but I think the intention here is to have VFIO continue doing things in PAGE_SIZE chunks precisely so that we don't have to rework all of the pinning code etc. The IOMMU API can then deal with the smaller page size. > > - mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1; > > - > > - if (unmap->iova & mask) > > + if (!IS_ALIGNED(unmap->iova, requested_alignment)) > > return -EINVAL; > > - if (!unmap->size || unmap->size & mask) > > + if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment)) > > return -EINVAL; > > > > - WARN_ON(mask & PAGE_MASK); > > - > > mutex_lock(>lock); > > > > /* > > @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > > size_t size = map->size; > > long npage; > > int ret = 0, prot = 0; > > - uint64_t mask; > > struct vfio_dma *dma; > > unsigned long pfn; > > + unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu)); > > + unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ? > > + PAGE_SIZE : min_pagesz; > > > > /* Verify that none of our __u64 fields overflow */ > > if (map->size != size || map->vaddr != vaddr || map->iova != iova) > > return -EINVAL; > > > > - mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1; > > - > > - WARN_ON(mask & PAGE_MASK); > > - > > /* READ/WRITE from device perspective */ > > if (map->flags & VFIO_DMA_MAP_FLAG_WRITE) > > prot |= IOMMU_WRITE; > > if (map->flags & VFIO_DMA_MAP_FLAG_READ) > > prot |= IOMMU_READ; > > > > - if (!prot || !size || (size | iova | vaddr) & mask) > > + if (!prot || !size || > > + !IS_ALIGNED(size | iova | vaddr, requested_alignment)) > > return -EINVAL; > > > > /* Don't allow IOVA or virtual address wrap */ > > This is mostly ignoring the problems with sub-PAGE_SIZE mappings. For > instance, we can only pin on PAGE_SIZE and therefore we only do > accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K > page, that page gets pinned and accounted 16 times. Are we going to > tell users that their locked memory limit needs to be 16x now? The rest > of the code would need an audit as well to see what other sub-page bugs > might be hiding. Thanks, I don't see that. The pinning all happens the same in VFIO, which can then happily pass a 64k region to iommu_map. iommu_map will then call ->map in 4k chunks on the IOMMU driver ops. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
On Wed, Oct 28, 2015 at 06:48:41PM +0100, Eric Auger wrote: > On 10/28/2015 06:37 PM, Alex Williamson wrote: > > Ok, so with hopefully correcting my understand of what this does, isn't > > this effectively the same: > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > b/drivers/vfio/vfio_iommu_type1.c > > index 57d8c37..7db4f5a 100644 > > --- a/drivers/vfio/vfio_iommu_type1.c > > +++ b/drivers/vfio/vfio_iommu_type1.c > > @@ -403,13 +403,19 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, > > stru > > static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) > > { > > struct vfio_domain *domain; > > - unsigned long bitmap = PAGE_MASK; > > + unsigned long bitmap = ULONG_MAX; > > > > mutex_lock(>lock); > > list_for_each_entry(domain, >domain_list, next) > > bitmap &= domain->domain->ops->pgsize_bitmap; > > mutex_unlock(>lock); > > > > + /* Some comment about how the IOMMU API splits requests */ > > + if (bitmap & ~PAGE_MASK) { > > + bitmap &= PAGE_MASK; > > + bitmap |= PAGE_SIZE; > > + } > > + > > return bitmap; > > } > Yes, to me it is indeed the same > > > > This would also expose to the user that we're accepting PAGE_SIZE, which > > we weren't before, so it was not quite right to just let them do it > > anyway. I don't think we even need to get rid of the WARN_ONs, do we? > > Thanks, > > The end-user might be afraid of those latter. Personally I would get rid > of them but that's definitively up to you. I think Alex's point is that the WARN_ON's won't trigger with this patch, because he clears those lower bits in the bitmap. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network hangs when communicating with host
[apologies for the delay -- I've been off for a week and am catching up on email] On Tue, Oct 20, 2015 at 09:58:51AM -0400, Sasha Levin wrote: > On 10/20/2015 09:42 AM, Dmitry Vyukov wrote: > > I now have another issue. My binary fails to mmap a file within lkvm > > sandbox. The same binary works fine on host and in qemu. I've added > > strace into sandbox script, and here is the output: > > > > [pid 837] openat(AT_FDCWD, "syzkaller-shm048878722", O_RDWR|O_CLOEXEC) = 5 > > [pid 837] mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 5, > > 0) = -1 EINVAL (Invalid argument) > > > > I don't see anything that can potentially cause EINVAL here. Is it > > possible that lkvm somehow affects kernel behavior here? > > > > I run lkvm as: > > > > $ taskset 1 /kvmtool/lkvm sandbox --disk syz-0 --mem=2048 --cpus=2 > > --kernel /arch/x86/boot/bzImage --network mode=user --sandbox > > /workdir/kvm/syz-0.sh > > It's possible that something in the virtio-9p layer is broken. I'll give > it a look in the evening. I ended up with the patch below, but it's really ugly and I didn't get round to posting it. Will --->8 >From 7cbcdfef1b9f094db4bf75676f22339f3164e103 Mon Sep 17 00:00:00 2001 From: Will Deacon <will.dea...@arm.com> Date: Fri, 17 Apr 2015 17:31:36 +0100 Subject: [PATCH] kvmtool: virtio-net: fix VIRTIO_NET_F_MRG_RXBUF usage in rx thread When merging virtio-net buffers using the VIRTIO_NET_F_MRG_RXBUF feature, the first buffer added to the used ring should indicate the total number of buffers used to hold the packet. Unfortunately, kvmtool has a number of issues when constructing these merged buffers: - Commit 5131332e3f1a ("kvmtool: convert net backend to support bi-endianness") introduced a strange loop counter, which resulted in hdr->num_buffers being set redundantly the first time round - When adding the buffers to the ring, we actually add them one-by-one, allowing the guest to see the header before we've inserted the rest of the data buffers... - ... which is made worse because we non-atomically increment the num_buffers count in the header each time we insert a new data buffer Consequently, the guest quickly becomes confused in its net rx code and the whole thing grinds to a halt. This is easily exemplified by trying to boot a root filesystem over NFS, which seldom succeeds. This patch resolves the issues by allowing us to insert items into the used ring without updating the index. Once the full payload has been added and num_buffers corresponds to the total size, we *then* publish the buffers to the guest. Cc: Marc Zyngier <marc.zyng...@arm.com> Cc: Sasha Levin <sasha.le...@oracle.com> Signed-off-by: Will Deacon <will.dea...@arm.com> --- include/kvm/virtio.h | 2 ++ virtio/core.c| 32 +--- virtio/net.c | 31 +++ 3 files changed, 42 insertions(+), 23 deletions(-) diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h index 768ee9668d44..8324ba7d38be 100644 --- a/include/kvm/virtio.h +++ b/include/kvm/virtio.h @@ -112,6 +112,8 @@ static inline bool virt_queue__available(struct virt_queue *vq) return virtio_guest_to_host_u16(vq, vq->vring.avail->idx) != vq->last_avail_idx; } +void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump); +struct vring_used_elem * virt_queue__set_used_elem_no_update(struct virt_queue *queue, u32 head, u32 len, u16 offset); struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len); bool virtio_queue__should_signal(struct virt_queue *vq); diff --git a/virtio/core.c b/virtio/core.c index 3b6e4d7cd045..d6ac289d450e 100644 --- a/virtio/core.c +++ b/virtio/core.c @@ -21,22 +21,17 @@ const char* virtio_trans_name(enum virtio_trans trans) return "unknown"; } -struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len) +void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump) { - struct vring_used_elem *used_elem; u16 idx = virtio_guest_to_host_u16(queue, queue->vring.used->idx); - used_elem = >vring.used->ring[idx % queue->vring.num]; - used_elem->id = virtio_host_to_guest_u32(queue, head); - used_elem->len = virtio_host_to_guest_u32(queue, len); - /* * Use wmb to assure that used elem was updated with head and len. * We need a wmb here since we can't advance idx unless we're ready * to pass the used element to the guest. */ wmb(); - idx++; + idx += jump; queue->vring.used->idx = virtio_host_to_guest_u16(queue, idx); /* @@ -45,6 +40,29 @@ struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue
Re: [RFC PATCH] vfio/type1: Do not support IOMMUs that allow bypass
On Fri, Oct 16, 2015 at 09:51:22AM -0600, Alex Williamson wrote: > On Fri, 2015-10-16 at 16:03 +0200, Eric Auger wrote: > > Hi Alex, > > On 10/15/2015 10:52 PM, Alex Williamson wrote: > > > We can only provide isolation if DMA is forced through the IOMMU > > > aperture. Don't allow type1 to be used if this is not the case. > > > > > > Signed-off-by: Alex Williamson> > > --- > > > > > > Eric, I see a number of IOMMU drivers enable this, do the ones you > > > care about for ARM set geometry.force_aperture? Thanks, > > I am currently using arm-smmu.c. I don't see force_aperture being set. > > Hi Will, Hi Alex, > Would it be possible to add iommu_domain_geometry support to arm-smmu.c? > In addition to this test to verify that DMA cannot bypass the IOMMU, I'd > eventually like to pass the aperture information out through the VFIO > API. Thanks, The slight snag here is that we don't know which SMMU in the system the domain is attached to at the point when the geometry is queried, so I can't give you an upper bound on the aperture. For example, if there is an SMMU with a 32-bit input address and another with a 48-bit input address. We could play the same horrible games that we do with the pgsize bitmap, and truncate some global aperture everytime we probe an SMMU device, but I'd really like to have fewer hacks like that if possible. The root of the problem is still that domains are allocated for a bus, rather than an IOMMU instance. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] kvmtool: don't overwrite /virt/init
Hi Oleg, On Thu, Oct 22, 2015 at 05:59:21PM +0200, Oleg Nesterov wrote: > On top of "[PATCH 0/3] kvmtool: tiny init fox x86_64" I sent. > > I simply can't understand why kvmtool always overwrites /virt/init. > This doesn't look right, especially because you can't pass "init=" > kernel argument without another patch "[PATCH] kvmtool/run: append > cfg.kernel_cmdline at the end of real_cmdline" I sent. And so far > it is not clear to me if you are going to accept that patch or not. Sorry, I was off last week and didn't have email access. I'm going through my inbox now and your patches look pretty good to me. I'll apply/test and then push out if I don't see any issues. Thanks! Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvmtool] Add a link to the lwn.net article
Hi Sven, On Sat, Oct 03, 2015 at 10:42:55PM +1000, Sven Dowideit wrote: > Signed-off-by: Sven Dowideit> --- > README | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/README b/README > index 6667f23..5501f05 100644 > --- a/README > +++ b/README > @@ -97,6 +97,9 @@ project: > > http://thread.gmane.org/gmane.linux.kernel/962051/focus=962620 > > +Another detailed example can be found in the lwn.net article: > + > +http://lwn.net/Articles/658511/ Currently, this link requires an lwn subscription but I'm happy to take this patch once it becomes freely available later this week. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvmtool] Skip a few messages by default: command line args; flat binary; earlyprintk.
On Wed, Sep 30, 2015 at 05:11:15PM +0100, Andre Przywara wrote: > On 29/09/15 17:59, Dimitri John Ledkov wrote: > > The partial command line args & earlyprintk=serial are still enabled > > in the debug mode. Warning that a flat binary kernel image is attemped > > to be loaded is completely dropped. These are not that informative, > > once one is past intial debugging, and only polute the console. > > > > Signed-off-by: Dimitri John Ledkov> > --- > > builtin-run.c | 10 ++ > > kvm.c | 1 - > > x86/kvm.c | 8 ++-- > > 3 files changed, 12 insertions(+), 7 deletions(-) > > > > diff --git a/builtin-run.c b/builtin-run.c > > index e0c8732..8edbf88 100644 > > --- a/builtin-run.c > > +++ b/builtin-run.c > > @@ -613,10 +613,12 @@ static struct kvm *kvm_cmd_run_init(int argc, const > > char **argv) > > > > kvm->cfg.real_cmdline = real_cmdline; > > > > - printf(" # %s run -k %s -m %Lu -c %d --name %s\n", KVM_BINARY_NAME, > > - kvm->cfg.kernel_filename, > > - (unsigned long long)kvm->cfg.ram_size / 1024 / 1024, > > - kvm->cfg.nrcpus, kvm->cfg.guest_name); > > + if (do_debug_print) { > > + printf(" # %s run -k %s -m %Lu -c %d --name %s\n", > > KVM_BINARY_NAME, > > + kvm->cfg.kernel_filename, > > + (unsigned long long)kvm->cfg.ram_size / 1024 / 1024, > > + kvm->cfg.nrcpus, kvm->cfg.guest_name); > > + } > > I like the general idea. In fact I have this very patch (among others) > in my tree too. I applied similar guarding to other messages as well > (mostly those that only show up on ARM, but also the "ended normally" > message). Like any good UNIX tool kvmtool should keep quiet if it has > nothing worthwhile to say ;-) > But looking at it more closely, I see that there is pr_debug() defined > doing that "if (do_debug_print)" already. The only issue is that is > prints source line information, which is not really useful here. But > then again there does not seem to be any user of it? > > So what about the following: > - We avoid printing pr_info() messages in the default case. Looking at > its current users in the tree this information is not really useful for > normal users. We enable pr_info() output only if do_debug_print is > enabled or introduce another command line flag (--verbose?) for that. > - We check each user of pr_info() to see whether this information is > actually "informational" or whether it should be converted to pr_warn. > - We change the above line to use pr_info instead of printf. > - We fix the EOL mayhem we have atm while at it. > > If you don't mind I will give this a try later this week. Sounds good to me. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool: don't rely on $HOME
On Fri, Sep 18, 2015 at 11:51:37AM +0100, Riku Voipio wrote: > On 17 September 2015 at 18:53, Will Deacon <will.dea...@arm.com> wrote: > > On Thu, Sep 17, 2015 at 03:03:15PM +0100, Alban Crequy wrote: > >> kvm__set_dir() called in main() and kvm__get_dir() rely on $HOME. But in > >> some environments (such as starting lkvm through systemd-run), $HOME is > >> undefined. This causes bind() to use a socket path containing "(null)" > >> and this fails. The current code does not check errors returned by > >> realpath(). > >> > >> Symptoms: > >> > >> | bind: No such file or directory > >> | Error: Failed adding socket to epoll > >> | Warning: Failed init: kvm_ipc__init > >> | > >> | Fatal: Initialisation failed > >> > >> This bug was first reported on https://github.com/coreos/rkt/issues/1393 > >> > >> Instead of using "$HOME/.lkvm/" (i.e. "/root/.lkvm/"), this patch uses > >> "/var/lib/lkvm/". This also improve the error reporting by printing the > >> socket filename. > > > > Hmm, but that requires lkvm to be run with sufficient privileges to > > write to /var/lib, which I don't think is generally the case. I think we > > have a few options: > > > > (1) Try /var/lib/lkvm if $HOME is NULL > > (2) Use an alternative environment variable for the pid prefix > > (3) Add a --pid command line option for the pidfile > > (4) ??? > > > > Any preferences? What do other projects do? > > The right place to put a pid file would be $XDG_RUNTIME_DIR aka > /run/user/$UID/. System services write their pidfiles and sockets to > plain /run Ok, that certainly sounds like the right things to do for temporary structures then. Thanks. > The place where to put the rootfs structure created in builtin-setup.c > is more complicated. Is the created rootfs epheremal? Then sticking > under /run(user/UID) is fine. Currently, the filesystem persists across multiple invocations of lkvm, so I can imagine somebody being surprised if what they thought was persistent was lost over something like a reboot. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool: don't rely on $HOME
On Thu, Sep 17, 2015 at 03:03:15PM +0100, Alban Crequy wrote: > kvm__set_dir() called in main() and kvm__get_dir() rely on $HOME. But in > some environments (such as starting lkvm through systemd-run), $HOME is > undefined. This causes bind() to use a socket path containing "(null)" > and this fails. The current code does not check errors returned by > realpath(). > > Symptoms: > > | bind: No such file or directory > | Error: Failed adding socket to epoll > | Warning: Failed init: kvm_ipc__init > | > | Fatal: Initialisation failed > > This bug was first reported on https://github.com/coreos/rkt/issues/1393 > > Instead of using "$HOME/.lkvm/" (i.e. "/root/.lkvm/"), this patch uses > "/var/lib/lkvm/". This also improve the error reporting by printing the > socket filename. Hmm, but that requires lkvm to be run with sufficient privileges to write to /var/lib, which I don't think is generally the case. I think we have a few options: (1) Try /var/lib/lkvm if $HOME is NULL (2) Use an alternative environment variable for the pid prefix (3) Add a --pid command line option for the pidfile (4) ??? Any preferences? What do other projects do? Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 kvmtool] Make static libc and guest-init functionality optional.
Hi Dmitri, On Fri, Sep 11, 2015 at 03:40:00PM +0100, Dimitri John Ledkov wrote: > If one typically only boots full disk-images, one wouldn't necessaraly > want to statically link glibc, for the guest-init feature of the > kvmtool. As statically linked glibc triggers haevy security > maintainance. > > Signed-off-by: Dimitri John Ledkov <dimitri.j.led...@intel.com> > --- > Changes since v1: > - rename CONFIG_HAS_LIBC to CONFIG_GUEST_INIT for clarity > - use more ifdefs, instead of runtime check of _binary_guest_init_size==0 The idea looks good, but I think we can tidy some of this up at the same time by moving all the guest_init code in builtin_setup.c. How about the patch below? Will --->8 >From cdce942c1a3a04635065a7972ca4e21386664756 Mon Sep 17 00:00:00 2001 From: Dimitri John Ledkov <dimitri.j.led...@intel.com> Date: Fri, 11 Sep 2015 15:40:00 +0100 Subject: [PATCH] Make static libc and guest-init functionality optional. If one typically only boots full disk-images, one wouldn't necessaraly want to statically link glibc, for the guest-init feature of the kvmtool. As statically linked glibc triggers haevy security maintainance. Signed-off-by: Dimitri John Ledkov <dimitri.j.led...@intel.com> [will: moved all the guest_init handling into builtin_setup.c] Signed-off-by: Will Deacon <will.dea...@arm.com> --- Makefile| 12 +++- builtin-run.c | 29 + builtin-setup.c | 19 ++- include/kvm/builtin-setup.h | 1 + 4 files changed, 23 insertions(+), 38 deletions(-) diff --git a/Makefile b/Makefile index 7b17d529d13b..f1701aa7b8ec 100644 --- a/Makefile +++ b/Makefile @@ -34,8 +34,6 @@ bindir_SQ = $(subst ','\'',$(bindir)) PROGRAM:= lkvm PROGRAM_ALIAS := vm -GUEST_INIT := guest/init - OBJS += builtin-balloon.o OBJS += builtin-debug.o OBJS += builtin-help.o @@ -279,8 +277,13 @@ ifeq ($(LTO),1) endif endif -ifneq ($(call try-build,$(SOURCE_STATIC),,-static),y) -$(error No static libc found. Please install glibc-static package.) +ifeq ($(call try-build,$(SOURCE_STATIC),,-static),y) + CFLAGS += -DCONFIG_GUEST_INIT + GUEST_INIT := guest/init + GUEST_OBJS = guest/guest_init.o +else + $(warning No static libc found. Skipping guest init) + NOTFOUND+= static-libc endif ifeq (y,$(ARCH_WANT_LIBFDT)) @@ -356,7 +359,6 @@ c_flags = -Wp,-MD,$(depfile) $(CFLAGS) # $(OTHEROBJS) are things that do not get substituted like this. # STATIC_OBJS = $(patsubst %.o,%.static.o,$(OBJS) $(OBJS_STATOPT)) -GUEST_OBJS = guest/guest_init.o $(PROGRAM)-static: $(STATIC_OBJS) $(OTHEROBJS) $(GUEST_INIT) $(E) " LINK" $@ diff --git a/builtin-run.c b/builtin-run.c index 1ee75ad3f010..e0c87329e52b 100644 --- a/builtin-run.c +++ b/builtin-run.c @@ -59,9 +59,6 @@ static int kvm_run_wrapper; bool do_debug_print = false; -extern char _binary_guest_init_start; -extern char _binary_guest_init_size; - static const char * const run_usage[] = { "lkvm run [] []", NULL @@ -345,30 +342,6 @@ void kvm_run_help(void) usage_with_options(run_usage, options); } -static int kvm_setup_guest_init(struct kvm *kvm) -{ - const char *rootfs = kvm->cfg.custom_rootfs_name; - char tmp[PATH_MAX]; - size_t size; - int fd, ret; - char *data; - - /* Setup /virt/init */ - size = (size_t)&_binary_guest_init_size; - data = (char *)&_binary_guest_init_start; - snprintf(tmp, PATH_MAX, "%s%s/virt/init", kvm__get_dir(), rootfs); - remove(tmp); - fd = open(tmp, O_CREAT | O_WRONLY, 0755); - if (fd < 0) - die("Fail to setup %s", tmp); - ret = xwrite(fd, data, size); - if (ret < 0) - die("Fail to setup %s", tmp); - close(fd); - - return 0; -} - static int kvm_run_set_sandbox(struct kvm *kvm) { const char *guestfs_name = kvm->cfg.custom_rootfs_name; @@ -631,7 +604,7 @@ static struct kvm *kvm_cmd_run_init(int argc, const char **argv) if (!kvm->cfg.no_dhcp) strcat(real_cmdline, " ip=dhcp"); - if (kvm_setup_guest_init(kvm)) + if (kvm_setup_guest_init(kvm->cfg.custom_rootfs_name)) die("Failed to setup init for guest."); } } else if (!strstr(real_cmdline, "root=")) { diff --git a/builtin-setup.c b/builtin-setup.c index 8b45c5645ad4..40fef15dbbe4 100644 --- a/builtin-setup.c +++ b/builtin-setup.c @@ -16,9 +16,6 @@ #include #include -extern char _binary_guest_init_start; -extern char _binary_guest_init_size; - static const char *instance_name; static const char * const set
[PATCH] KVM: arm64: remove all traces of the ThumbEE registers
Although the ThumbEE registers and traps were present in earlier versions of the v8 architecture, it was retrospectively removed and so we can do the same. Cc: Marc Zyngier <marc.zyng...@arm.com> Signed-off-by: Will Deacon <will.dea...@arm.com> --- arch/arm64/include/asm/kvm_arm.h | 1 - arch/arm64/include/asm/kvm_asm.h | 4 +--- arch/arm64/kvm/hyp.S | 22 -- arch/arm64/kvm/sys_regs.c| 7 --- 4 files changed, 5 insertions(+), 29 deletions(-) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 7605e095217f..cdc9888134e5 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -168,7 +168,6 @@ #define VTTBR_VMID_MASK (UL(0xFF) << VTTBR_VMID_SHIFT) /* Hyp System Trap Register */ -#define HSTR_EL2_TTEE (1 << 16) #define HSTR_EL2_T(x) (1 << x) /* Hyp Coproccessor Trap Register Shifts */ diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 67fa0de3d483..5e377101f919 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -53,9 +53,7 @@ #defineIFSR32_EL2 25 /* Instruction Fault Status Register */ #defineFPEXC32_EL2 26 /* Floating-Point Exception Control Register */ #defineDBGVCR32_EL227 /* Debug Vector Catch Register */ -#defineTEECR32_EL1 28 /* ThumbEE Configuration Register */ -#defineTEEHBR32_EL129 /* ThumbEE Handler Base Register */ -#defineNR_SYS_REGS 30 +#defineNR_SYS_REGS 28 /* 32bit mapping */ #define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */ diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S index 6addf97aadb3..c4016d411f4a 100644 --- a/arch/arm64/kvm/hyp.S +++ b/arch/arm64/kvm/hyp.S @@ -433,20 +433,13 @@ mrs x5, ifsr32_el2 stp x4, x5, [x3] - skip_fpsimd_state x8, 3f + skip_fpsimd_state x8, 2f mrs x6, fpexc32_el2 str x6, [x3, #16] -3: - skip_debug_state x8, 2f +2: + skip_debug_state x8, 1f mrs x7, dbgvcr32_el2 str x7, [x3, #24] -2: - skip_tee_state x8, 1f - - add x3, x2, #CPU_SYSREG_OFFSET(TEECR32_EL1) - mrs x4, teecr32_el1 - mrs x5, teehbr32_el1 - stp x4, x5, [x3] 1: .endm @@ -466,16 +459,9 @@ msr dacr32_el2, x4 msr ifsr32_el2, x5 - skip_debug_state x8, 2f + skip_debug_state x8, 1f ldr x7, [x3, #24] msr dbgvcr32_el2, x7 -2: - skip_tee_state x8, 1f - - add x3, x2, #CPU_SYSREG_OFFSET(TEECR32_EL1) - ldp x4, x5, [x3] - msr teecr32_el1, x4 - msr teehbr32_el1, x5 1: .endm diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index b41607d270ac..6c35e49757d8 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -539,13 +539,6 @@ static const struct sys_reg_desc sys_reg_descs[] = { { Op0(0b10), Op1(0b000), CRn(0b0111), CRm(0b1110), Op2(0b110), trap_dbgauthstatus_el1 }, - /* TEECR32_EL1 */ - { Op0(0b10), Op1(0b010), CRn(0b), CRm(0b), Op2(0b000), - NULL, reset_val, TEECR32_EL1, 0 }, - /* TEEHBR32_EL1 */ - { Op0(0b10), Op1(0b010), CRn(0b0001), CRm(0b), Op2(0b000), - NULL, reset_val, TEEHBR32_EL1, 0 }, - /* MDCCSR_EL1 */ { Op0(0b10), Op1(0b011), CRn(0b), CRm(0b0001), Op2(0b000), trap_raz_wi }, -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: arm64: add workaround for Cortex-A57 erratum #852523
When restoring the system register state for an AArch32 guest at EL2, writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57, which can lead to the guest effectively running with junk in the DACR and running into unexpected domain faults. This patch works around the issue by re-ordering our restoration of the AArch32 register aliases so that they happen before the AArch64 system registers. Ensuring that the registers are restored in this order guarantees that they will be correctly synchronised by the core. Cc: <sta...@vger.kernel.org> Cc: Marc Zyngier <marc.zyng...@arm.com> Signed-off-by: Will Deacon <will.dea...@arm.com> --- arch/arm64/kvm/hyp.S | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S index 3c4f641451bb..c4016d411f4a 100644 --- a/arch/arm64/kvm/hyp.S +++ b/arch/arm64/kvm/hyp.S @@ -739,6 +739,9 @@ ENTRY(__kvm_vcpu_run) // Guest context add x2, x0, #VCPU_CONTEXT + // We must restore the 32-bit state before the sysregs, thanks + // to Cortex-A57 erratum #852523. + restore_guest_32bit_state bl __restore_sysregs skip_debug_state x3, 1f @@ -746,7 +749,6 @@ ENTRY(__kvm_vcpu_run) kern_hyp_va x3 bl __restore_debug 1: - restore_guest_32bit_state restore_guest_regs // That's it, no more messing around. -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: arm64: add workaround for Cortex-A57 erratum #852523
On Mon, Sep 14, 2015 at 04:46:28PM +0100, Marc Zyngier wrote: > On 14/09/15 16:06, Will Deacon wrote: > > When restoring the system register state for an AArch32 guest at EL2, > > writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57, > > which can lead to the guest effectively running with junk in the DACR > > and running into unexpected domain faults. > > > > This patch works around the issue by re-ordering our restoration of the > > AArch32 register aliases so that they happen before the AArch64 system > > registers. Ensuring that the registers are restored in this order > > guarantees that they will be correctly synchronised by the core. > > > > Cc: <sta...@vger.kernel.org> > > Cc: Marc Zyngier <marc.zyng...@arm.com> > > Signed-off-by: Will Deacon <will.dea...@arm.com> > > Reviewed-by: Marc Zyngier <marc.zyng...@arm.com> > > I'll queue that together with the next batch of fixes. Great, thanks Marc. Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool Makefile: relax arm test
On Thu, Sep 10, 2015 at 06:45:59AM +0100, Riku Voipio wrote: > On 4 September 2015 at 14:06, Andre Przywarawrote: > > On 04/09/15 11:52, Riku Voipio wrote: > >> On 4 September 2015 at 13:10, Andre Przywara > >> wrote: > >>> On 03/09/15 12:20, riku.voi...@linaro.org wrote: > From: Riku Voipio > > Currently Makefile accepts only armv7l.* When building kvmtool under > 32bit > personality on Aarch64 machines, uname -m reports "armv8l", so build > fails. > We expect doing 32bit arm builds in Aarch64 to become standard the same > way > people do i386 builds on x86_64 machines. > > Make the sed test a little more greedy so armv8l becomes acceptable. > > Signed-off-by: Riku Voipio > >>> > >>> The patch looks OK to me, I just wonder how you do the actual build > >>> within the linux32 environment? > >>> Do you have an arm cross compiler installed and set CROSS_COMPILE? Or is > >>> there a magic compiler (driver) which uses uname -m as well? > >>> And what would be the difference to setting ARCH=arm as well? Just > >>> convenience? > >> > >> It's just an arm32 chroot, with an native arm32 compiler. The chroot > >> is on an arm64 machine since these tend to be much faster than arm32 > >> hardware. > > > > Oh right, a chroot, didn't think about the obvious ;-) > > Also it applies to 64-bit kernels with 32-bit root filesystems, I think. > > So: > > > > Acked-by: Andre Przywara > > Ping? Applied and pushed, thanks. In future, it's best to Cc me if you want to make sure stuff doesn't get missed :) Cheers, Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] Late arm64 KVM fix for 4.2
Hi Linus, I appreciate that it's extremely late in the cycle, but we've uncovered a nasty bug in the arm64 KVM code which allows a badly behaved 32-bit guest to bring down the host. The fix is simple (it's what I believe we call a brown paper bag bug) and I don't think it makes sense to sit on this, particularly as Russell ended up triggering this rather than just somebody noticing a potential problem by inspection. Usually arm64 KVM changes would go via Paolo's tree, but he's on holiday at the moment and the deal is that anything urgent gets shuffled via the arch trees, so here it is. Please pull. Cheers, Will ---8 The following changes since commit c13dcf9f2d6f5f06ef1bf79ec456df614c5e058b: Linux 4.2-rc8 (2015-08-23 20:52:59 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-fixes for you to fetch changes up to 126c69a0bd0e441bf6766a5d9bf20de011be9f68: arm64: KVM: Fix host crash when injecting a fault into a 32bit guest (2015-08-27 16:16:55 +0100) Urgent arm64 KVM fix for 4.2: Fix arm64 KVM issue when injecting an abort into a 32-bit guest, which would lead to an illegal exception return at EL2 and a subsequent host crash. Marc Zyngier (1): arm64: KVM: Fix host crash when injecting a fault into a 32bit guest arch/arm64/kvm/inject_fault.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvmtool] kvm__emulate_io: Don't call br_read_unlock() twice on IO error
On Thu, Aug 06, 2015 at 07:39:44PM +0100, Josh Triplett wrote: The IO error path in kvm__emulate_io would call br_read_unlock(), then goto error, which would call br_read_unlock() again. Refactor the control flow to have only one exit path and one call to br_read_unlock(). Thanks, Josh. I applied all three patches! Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv3 1/2] kvmtool: Introduce downscript option for virtio-net
On Tue, Jul 21, 2015 at 10:44:31AM +0100, Andre Przywara wrote: thanks for the rework, that looks good to me, some minor nits below. Not sure if that requires a new version, maybe Will can fix that up when he applies it. Nah, please send a new version with these points addressed. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/12] kvmtool: Improve portability
On Fri, Jul 17, 2015 at 05:02:06PM +0100, Andre Przywara wrote: Hi, Hi Andre, this is a collection of patches to bring kvmtool closer to standards compliance (with standards not necessarily meaning GNU only). With all those patches applied, you can compile kvmtool with newer C standards, with clang and against musl-libc. Most of this looks good to me, thanks! I'll leave it to stew for a few days on the list, but I'm planning to merge it all apart from the two *printf clang hacks and the PAGE_SIZE patch which Szabolcs commented on. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/12] kvm__set_dir(): avoid variable arguments call
On Fri, Jul 17, 2015 at 05:02:12PM +0100, Andre Przywara wrote: The clang compiler by default dislikes non-literal format strings in *printf functions, so it complains about kvm__set_dir() in kvm.c. Instead of suppressing this warning, lets change the code to avoid that unneeded var_args detour and use a literal format string. Why does clang moan about this? The current code looks perfectly alright to me. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/12] util/util.c: avoid clang error on vsnprintf
On Fri, Jul 17, 2015 at 05:02:13PM +0100, Andre Przywara wrote: clang by default doesn't seem to like printf calls with non-literal format strings. Add the proper pragma to disable this warning in the report function to make kvmtool compile with clang. Despite its GCC name, clang also accepts this. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- util/util.c | 1 + 1 file changed, 1 insertion(+) diff --git a/util/util.c b/util/util.c index 1877105..825da3f 100644 --- a/util/util.c +++ b/util/util.c @@ -10,6 +10,7 @@ #include sys/stat.h #include sys/statfs.h +#pragma GCC diagnostic ignored -Wformat-nonliteral Urgh! I think we need to figure out a better way to keep clang happy in this regard, if we decide that we care about building with it. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/13] arm64: Add support for running Linux in EL2 mode
On Wed, Jul 08, 2015 at 05:19:13PM +0100, Marc Zyngier wrote: With the ARMv8.1 VHE, the architecture is able to (almost) transparently run the kernel at EL2, despite being written for EL1. This patch takes care of the almost part, mostly preventing the kernel from dropping from EL2 to EL1, and setting up the HYP configuration. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- arch/arm64/kernel/head.S | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index c0ff3ce..a179747 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -29,6 +29,7 @@ #include asm/asm-offsets.h #include asm/cache.h #include asm/cputype.h +#include asm/kvm_mmu.h #include asm/memory.h #include asm/thread_info.h #include asm/pgtable-hwdef.h @@ -481,8 +482,16 @@ CPU_LE( bic x0, x0, #(3 24) ) // Clear the EE and E0E bits for EL1 isb ret + /* Check for VHE being present */ +2: mrs x2, id_aa64mmfr1_el1 + ubfxx2, x2, #8, #4 + /* Hyp configuration. */ -2: mov x0, #(1 31) // 64-bit EL1 + mov x0, #HCR_RW // 64-bit EL1 + cbz x2, set_hcr + orr x0, x0, #HCR_TGE// Enable Host Extensions + orr x0, x0, #HCR_E2H +set_hcr: msr hcr_el2, x0 /* Generic timers. */ @@ -522,6 +531,9 @@ CPU_LE( movkx0, #0x30d0, lsl #16) // Clear EE and E0E on LE systems /* Coprocessor traps. */ mov x0, #0x33ff These bits are RES0 with VHE enabled, afaict. + cbz x2, set_cptr + orr x0, x0, #(3 20) // Don't trap FP +set_cptr: msr cptr_el2, x0// Disable copro. traps to EL2 #ifdef CONFIG_COMPAT @@ -531,6 +543,15 @@ CPU_LE( movkx0, #0x30d0, lsl #16) // Clear EE and E0E on LE systems /* Stage-2 translation */ msr vttbr_el2, xzr + cbz x2, install_el2_stub + + setup_vtcr x4, x5 + + mov w20, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2 You do this in install_el2_stub as well -- can you move it above the cbz? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/13] arm64: Panic when VHE and non VHE CPUs coexist
On Wed, Jul 08, 2015 at 05:19:14PM +0100, Marc Zyngier wrote: Having both VHE and non-VHE capable CPUs in the same system is likely to be a recipe for disaster. If the boot CPU has VHE, but a secondary is not, we won't be able to downgrade and run the kernel at EL1. Add CPU hotplug to the mix, and this produces a terrifying mess. Let's solve the problem once and for all. If you mix VHE and non-VHE CPUs in the same system, you deserve to loose, and this patch makes sure you don't get a chance. This is implemented by storing the kernel execution level in a global variable. Secondaries will park themselves in a WFI loop if they observe a mismatch. Also, the primary CPU will detect that the secondary CPU has died on a mismatched execution level. Panic will follow. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- arch/arm64/include/asm/virt.h | 15 +++ arch/arm64/kernel/head.S | 15 +++ arch/arm64/kernel/smp.c | 4 3 files changed, 34 insertions(+) diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h index 9f22dd6..8e246f7 100644 --- a/arch/arm64/include/asm/virt.h +++ b/arch/arm64/include/asm/virt.h @@ -36,6 +36,11 @@ */ extern u32 __boot_cpu_mode[2]; +/* + * __run_cpu_mode records the mode the boot CPU uses for the kernel. + */ +extern u32 __run_cpu_mode; + void __hyp_set_vectors(phys_addr_t phys_vector_base); phys_addr_t __hyp_get_vectors(void); @@ -60,6 +65,16 @@ static inline bool is_kernel_in_hyp_mode(void) return el == CurrentEL_EL2; } +static inline bool is_kernel_mode_mismatched(void) +{ + u64 el; + u32 mode; + + asm(mrs %0, CurrentEL : =r (el)); + mode = ACCESS_ONCE(__run_cpu_mode); + return el != mode; Why the temporary 'mode' variable? +} + /* The section containing the hypervisor text */ extern char __hyp_text_start[]; extern char __hyp_text_end[]; diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index a179747..318e69f 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -578,7 +578,20 @@ ENTRY(set_cpu_boot_mode_flag) 1: str w20, [x1] // This CPU has booted in EL1 dmb sy dc ivac, x1// Invalidate potentially stale cache line + adr_l x1, __run_cpu_mode + ldr w0, [x1] + mrs x20, CurrentEL Why x20? + str w20, [x1] + dmb sy + dc ivac, x1// Invalidate potentially stale cache line Can we stick __run_cpu_mode and __boot_cpu_mode into a struct in the same cacheline and avoid the extra maintenance? + cbz x0, skip_el_check w0? + cmp x0, x20 w0, w20? + bne mismatched_el +skip_el_check: ret +mismatched_el: + wfi + b mismatched_el ENDPROC(set_cpu_boot_mode_flag) /* @@ -593,6 +606,8 @@ ENDPROC(set_cpu_boot_mode_flag) ENTRY(__boot_cpu_mode) .long BOOT_CPU_MODE_EL2 .long BOOT_CPU_MODE_EL1 +ENTRY(__run_cpu_mode) + .long 0 .popsection #ifdef CONFIG_SMP diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 695801a..b467f51 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -52,6 +52,7 @@ #include asm/sections.h #include asm/tlbflush.h #include asm/ptrace.h +#include asm/virt.h #define CREATE_TRACE_POINTS #include trace/events/ipi.h @@ -112,6 +113,9 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) pr_crit(CPU%u: failed to come online\n, cpu); ret = -EIO; } + + if (is_kernel_mode_mismatched()) + panic(CPU%u: incompatible execution level, cpu); Might be useful to print the incompatible values, if possible. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/13] arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature
On Wed, Jul 08, 2015 at 05:19:06PM +0100, Marc Zyngier wrote: Add a new ARM64_HAS_VIRT_HOST_EXTN features to indicate that the CPU has the ARMv8,1 VHE capability. This will be used to trigger kernel patching in KVM. Signed-off-by: Marc Zyngier marc.zyng...@arm.com Acked-by: Will Deacon will.dea...@arm.com Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/13] arm/arm64: Add new is_kernel_in_hyp_mode predicate
On Wed, Jul 08, 2015 at 05:19:04PM +0100, Marc Zyngier wrote: With ARMv8.1 VHE extension, it will be possible to run the kernel at EL2 (aka HYP mode). In order for the kernel to easily find out where it is running, add a new predicate that returns whether or not the kernel is in HYP mode. For completeness, the 32bit code also get such a predicate (always returning false) so that code common to both architecture (timers, KVM) can use it transparently. Signed-off-by: Marc Zyngier marc.zyng...@arm.com --- arch/arm/include/asm/virt.h | 5 + arch/arm64/include/asm/virt.h | 10 ++ 2 files changed, 15 insertions(+) [...] +static inline bool is_kernel_in_hyp_mode(void) +{ + u64 el; + + asm(mrs %0, CurrentEL : =r (el)); + return el == CurrentEL_EL2; Missing mask? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/18] ARM64: Move PMU register related defines to asm/pmu.h
On Mon, Jul 06, 2015 at 03:17:31AM +0100, shannon.z...@linaro.org wrote: From: Shannon Zhao shannon.z...@linaro.org To use the ARMv8 PMU related register defines from the KVM code, we move the relevant definitions to asm/pmu.h header file. Signed-off-by: Anup Patel anup.pa...@linaro.org Signed-off-by: Shannon Zhao shannon.z...@linaro.org --- arch/arm64/include/asm/pmu.h | 45 ++ arch/arm64/kernel/perf_event.c | 35 2 files changed, 45 insertions(+), 35 deletions(-) Whilst I'm ok with the idea, we're currently in the process of moving the ARM and arm64 PMU drivers out into drivers/perf/, so this is likely to conflict horribly with that work. Christoffer/Marc: if you end up wanting to queue this for 4.3, give me a shout and I'll try to create a stable branch you can use as a base (but it's likely to contain a whole heap of stuff you don't want...). Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 08/11] KVM: arm: implement dirty bit mechanism for debug registers
On Wed, Jul 08, 2015 at 11:50:22AM +0100, Zhichao Huang wrote: Are you happy with this?: You miss the reserved breakpoint, I think. I also still don't understand why this is preferable to trapping. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 08/11] KVM: arm: implement dirty bit mechanism for debug registers
On Tue, Jul 07, 2015 at 11:06:57AM +0100, Zhichao Huang wrote: Chazy and me are talking about how to reduce the saving/restoring overhead for debug registers. We want to add a state in hw_breakpoint.c to indicate whether the host enable any hwbrpts or not (might export a fuction that kvm can call), then we can read this state from memory instead of reading from real hardware registers, and to decide whether we need a world switch or not. Does it acceptable? Maybe, hard to tell without the code. There are obvious races to deal with if you use variables to indicate whether resources are in use -- why not just trap debug access from the host as well? Then you could keep track of the owner in kvm and trap accesses from everybody else. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 09/11] KVM: arm64: guest debug, HW assisted debug support
On Fri, Jul 03, 2015 at 05:07:41PM +0100, Alex Bennée wrote: Will Deacon will.dea...@arm.com writes: On Thu, Jul 02, 2015 at 02:50:33PM +0100, Alex Bennée wrote: Are you happy with this?: [...] +/** + * kvm_arch_dev_ioctl_check_extension + * + * We currently assume that the number of HW registers is uniform + * across all CPUs (see cpuinfo_sanity_check). + */ int kvm_arch_dev_ioctl_check_extension(long ext) { int r; @@ -64,6 +71,12 @@ int kvm_arch_dev_ioctl_check_extension(long ext) case KVM_CAP_ARM_EL1_32BIT: r = cpu_has_32bit_el1(); break; + case KVM_CAP_GUEST_DEBUG_HW_BPS: + r = hw_breakpoint_slots(TYPE_INST); + break; + case KVM_CAP_GUEST_DEBUG_HW_WPS: + r = hw_breakpoint_slots(TYPE_DATA); + break; Whilst I much prefer this code, it actually adds an unwanted dependency on PERF_EVENTS that I didn't think about to start with. Sorry to keep messing you about -- I guess your original patch is the best thing after all. Everything looks to be in hw_breakpoint.[ch] which does depend on CONFIG_HAVE_HW_BREAKPOINT which depends on PERF_EVENTS to be built. However the previous code depended on this behaviour as well. I think your original approach (of sticking stuff in the header file) works regardless of the CONFIG option, no? It would seem weird to enable guest debug using HW debug registers to debug the guest yet not allowing the host kernel to use them? Of course this is the only code they would share as all the magic of guest debugging is already mostly there for dirty guest handling. I'm not familiar with Kconfig but it looks like this is all part of arm64 defconfig. Are people really going to want to disable PERF_EVENTS but still debug their guests with HW support? Then it's your call. I just find the host dependency on perf a bit weird to get guest debug working (especially as the dependency is completely fake because we don't use any perf infrastructure at all). Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 09/11] KVM: arm64: guest debug, HW assisted debug support
Hi Alex, On Thu, Jul 02, 2015 at 02:50:33PM +0100, Alex Bennée wrote: Are you happy with this?: [...] +/** + * kvm_arch_dev_ioctl_check_extension + * + * We currently assume that the number of HW registers is uniform + * across all CPUs (see cpuinfo_sanity_check). + */ int kvm_arch_dev_ioctl_check_extension(long ext) { int r; @@ -64,6 +71,12 @@ int kvm_arch_dev_ioctl_check_extension(long ext) case KVM_CAP_ARM_EL1_32BIT: r = cpu_has_32bit_el1(); break; + case KVM_CAP_GUEST_DEBUG_HW_BPS: + r = hw_breakpoint_slots(TYPE_INST); + break; + case KVM_CAP_GUEST_DEBUG_HW_WPS: + r = hw_breakpoint_slots(TYPE_DATA); + break; Whilst I much prefer this code, it actually adds an unwanted dependency on PERF_EVENTS that I didn't think about to start with. Sorry to keep messing you about -- I guess your original patch is the best thing after all. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 09/11] KVM: arm64: guest debug, HW assisted debug support
Hi Alex, On Wed, Jul 01, 2015 at 07:29:01PM +0100, Alex Bennée wrote: This adds support for userspace to control the HW debug registers for guest debug. In the debug ioctl we copy an IMPDEF registers into a new register set called host_debug_state. We use the recently introduced vcpu parameter debug_ptr to select which register set is copied into the real registers when world switch occurs. I've made some helper functions from hw_breakpoint.c more widely available for re-use. As with single step we need to tweak the guest registers to enable the exceptions so we need to save and restore those bits. Two new capabilities have been added to the KVM_EXTENSION ioctl to allow userspace to query the number of hardware break and watch points available on the host hardware. Signed-off-by: Alex Bennée alex.ben...@linaro.org Reviewed-by: Christoffer Dall christoffer.d...@linaro.org [...] diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index e7d934d..ac07f2a 100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c @@ -50,13 +50,13 @@ static int core_num_brps; static int core_num_wrps; /* Determine number of BRP registers available. */ -static int get_num_brps(void) +int get_num_brps(void) { return ((read_cpuid(ID_AA64DFR0_EL1) 12) 0xf) + 1; } /* Determine number of WRP registers available. */ -static int get_num_wrps(void) +int get_num_wrps(void) { return ((read_cpuid(ID_AA64DFR0_EL1) 20) 0xf) + 1; } Sorry, just noticed this, but we already have a public interface for figuring these numbers out as required by perf. Can't you use hw_breakpoint_slots(...) instead of exposing these internal helpers? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvmtool: Add parameter to specifiy number of threads in thread_pool
Hi Andreas, On Mon, Jun 29, 2015 at 12:43:33PM +0100, Andreas Herrmann wrote: With current code the number of threads added to the thread_pool equals number of online CPUs. So on cn78xx we usually have 48 threads per guest just for the thread_pool. IMHO this is overkill for guests that just have a few vCPUs and/or if a guest is pinned to a subset of host CPUs. E.g. # numactl -C 4,5,7,8 ./lkvm run -c 2 -m 256 -k paravirt -d rootfs.ext3 ... # ps -La | grep threadpool-work | wc -l 48 Don't change default behaviour (for sake of compatibility) but introduce a new parameter (-t or --threads) that allows to specify number of threads to be created for the thread_pool: # numactl -C 4,5,7,8 ./lkvm run -c 2 -m 256 --threads 4 -k paravirt -d ... # ps -La | grep threadpool-work | wc -l 4 Signed-off-by: Andreas Herrmann andreas.herrm...@caviumnetworks.com --- builtin-run.c|6 ++ include/kvm/kvm-config.h |1 + util/threadpool.c|2 +- 3 files changed, 8 insertions(+), 1 deletion(-) New in v2: paramter must be in range [1, number of online cpus] otherwise the default (number of online cpus) will be used. I thought some more about this and started to wonder whether we can make the threadpool self-balancing. Given the fairly restricted nature of kvmtool's threading model, could we not start with a small fixed number of threads (but no more than the number of physical CPUs) and then create new threads on demand when we detect that there is backlog in the job queue and there are spare CPUs? I don't think it should be too hard, especially if we ignore the problem of destroying idle threads and it removes the need for a magic tunable on the cmdline. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 10/10] arm: use new irqchip parameter to create different vGIC types
On Fri, Jun 26, 2015 at 02:16:18PM +0100, Andre Przywara wrote: Currently we unconditionally create a virtual GICv2 in the guest. Add a --irqchip= parameter to let the user specify a different GIC type for the guest, when omitting this parameter it still defaults to --irqchip=gicv2. For now the only other supported type is --irqchip=gicv3 Signed-off-by: Andre Przywara andre.przyw...@arm.com --- arm/aarch64/arm-cpu.c| 2 +- arm/gic.c| 17 + arm/include/arm-common/kvm-config-arch.h | 9 - arm/kvm.c| 2 +- 4 files changed, 27 insertions(+), 3 deletions(-) diff --git a/arm/aarch64/arm-cpu.c b/arm/aarch64/arm-cpu.c index f702b9e..3dc8ea3 100644 --- a/arm/aarch64/arm-cpu.c +++ b/arm/aarch64/arm-cpu.c @@ -12,7 +12,7 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle) { int timer_interrupts[4] = {13, 14, 11, 10}; - gic__generate_fdt_nodes(fdt, gic_phandle, IRQCHIP_GICV2); + gic__generate_fdt_nodes(fdt, gic_phandle, kvm-cfg.arch.irqchip); timer__generate_fdt_nodes(fdt, kvm, timer_interrupts); } diff --git a/arm/gic.c b/arm/gic.c index efe4b42..ff56de7 100644 --- a/arm/gic.c +++ b/arm/gic.c @@ -22,6 +22,23 @@ static int gic_fd = -1; static u64 gic_redists_base; static u64 gic_redists_size; +int irqchip_parser(const struct option *opt, const char *arg, int unset) +{ + enum irqchip_type *type = opt-value; + + *type = IRQCHIP_GICV2; Pointless assignment? + if (!strcmp(arg, gicv2)) { + *type = IRQCHIP_GICV2; + } else if (!strcmp(arg, gicv3)) { + *type = IRQCHIP_GICV3; + } else { + fprintf(stderr, irqchip: unknown type \%s\\n, arg); + return -1; + } + + return 0; +} + static int gic__create_device(struct kvm *kvm, enum irqchip_type type) { int err; diff --git a/arm/include/arm-common/kvm-config-arch.h b/arm/include/arm-common/kvm-config-arch.h index a8ebd94..9529881 100644 --- a/arm/include/arm-common/kvm-config-arch.h +++ b/arm/include/arm-common/kvm-config-arch.h @@ -8,8 +8,11 @@ struct kvm_config_arch { unsigned intforce_cntfrq; boolvirtio_trans_pci; boolaarch32_guest; + enum irqchip_type irqchip; }; +int irqchip_parser(const struct option *opt, const char *arg, int unset); + #define OPT_ARCH_RUN(pfx, cfg) \ pfx, \ ARM_OPT_ARCH_RUN(cfg) \ @@ -21,6 +24,10 @@ struct kvm_config_arch { updated to program CNTFRQ correctly*), \ OPT_BOOLEAN('\0', force-pci, (cfg)-virtio_trans_pci, \ Force virtio devices to use PCI as their default \ - transport), + transport), \ +OPT_CALLBACK('\0', irqchip, (cfg)-irqchip, \ + [gicv2|gicv3], \ + type of interrupt controller to emulate in the guest, \ + irqchip_parser, NULL), What happens if I don't pass this option at all? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 07/10] limit number of VCPUs on demand
On Fri, Jun 26, 2015 at 02:16:15PM +0100, Andre Przywara wrote: Currently the ARM GIC checks the number of VCPUs against a fixed limit, which is GICv2 specific. Don't pretend we know better than the kernel and let's get rid of that explicit check. Instead be more relaxed about KVM_CREATE_VCPU failing with EINVAL, which is the way the kernel communicates having reached a VCPU limit. If we see this and have at least brought up one VCPU already successfully, then don't panic, but limit the number of VCPUs instead. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- arm/gic.c | 6 -- arm/kvm-cpu.c | 7 ++- kvm-cpu.c | 7 +++ 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/arm/gic.c b/arm/gic.c index 99f0d2b..05f85a2 100644 --- a/arm/gic.c +++ b/arm/gic.c @@ -84,12 +84,6 @@ int gic__create(struct kvm *kvm) { int err; - if (kvm-nrcpus GIC_MAX_CPUS) { - pr_warning(%d CPUS greater than maximum of %d -- truncating\n, - kvm-nrcpus, GIC_MAX_CPUS); - kvm-nrcpus = GIC_MAX_CPUS; - } - /* Try the new way first, and fallback on legacy method otherwise */ err = gic__create_device(kvm); if (err) diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c index 7780251..b2fd6ed 100644 --- a/arm/kvm-cpu.c +++ b/arm/kvm-cpu.c @@ -51,8 +51,13 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id) return NULL; vcpu-vcpu_fd = ioctl(kvm-vm_fd, KVM_CREATE_VCPU, cpu_id); - if (vcpu-vcpu_fd 0) + if (vcpu-vcpu_fd 0) { + if (errno == EINVAL) { + free(vcpu); + return NULL; + } Hmm, but EINVAL can mean all sorts of other failures too, surely? I'm not against removing the nrcpus check, but I think we should die if ioctls start failing rather than silently ignore them. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool: Add parameter to specifiy number of threads in thread_pool
On Mon, Jun 29, 2015 at 08:45:44AM +0100, Andreas Herrmann wrote: With current code the number of threads added to the thread_pool equals number of online CPUs. Thus on an OcteonIII cn78xx system we usually have 48 threads per guest just for the thread_pool. IMHO this is overkill for guests that just have a few vCPUs and/or if a guest is pinned to a subset of host CPUs. E.g. # numactl -C 4,5,7,8 ./lkvm run -c 2 -m 256 -k paravirt -d rootfs.ext3 ... # ps -La | grep threadpool-work | wc -l 48 Don't change default behaviour (for sake of compatibility) but introduce a new parameter (-t or --threads) that allows to specify number of threads to be created for the thread_pool: # numactl -C 4,5,7,8 ./lkvm run -c 2 -m 256 --threads 4 -k paravirt -d ... # ps -La | grep threadpool-work | wc -l 4 We should probably bound this on some minimum value. I assume things go pear-shaped if you pass --threads 1 (or 0, or -1)? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool: don't use PCI config space IRQ line field
Hi Andre, On Thu, Jun 18, 2015 at 06:19:53PM +0100, Andre Przywara wrote: I am tempted to remove shmem, since it's broken: a) there is no upstream driver, only some out-of-tree uio driver module in some Github repo Right, but that's the same for qemu and we've already made the jump of merging the driver, so I don't think that's a good argument for throwing it out of the tree. b) the PCI device BARs do not match what QEMU implements and what the uio driver expects (IO BAR vs. MMIO BAR) In what way? A quick look suggests that kvmtool is at least aligned with said github repo. c) there is (at least one) bug in kvmtool (easily fixed, though) Care to elaborate? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 06/10] arm: simplify MMIO dispatching
On Wed, Jun 24, 2015 at 02:30:05PM +0100, Andre Przywara wrote: do you want me to respin the whole series to address the remaining minor comments in the last four patches or do you want to take patch 01-06 already (which I think Marc has already agreed upon)? Then I would just send an updated version of the remaining patches. Yes, please. It's easier to keep track if you just repost the series with the changes and acks. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] powerpc: Define the hcall opcodes return values we need
On Fri, Jun 19, 2015 at 08:21:00AM +0100, Michael Ellerman wrote: Now that we don't have the kernel header on hand, just define the minimum set of hcall opcodes and return values we need in order to build. Thanks Michael! I pushed both of these out. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] powerpc: Define the hcall opcodes return values we need
On Fri, Jun 19, 2015 at 08:21:00AM +0100, Michael Ellerman wrote: Now that we don't have the kernel header on hand, just define the minimum set of hcall opcodes and return values we need in order to build. Thanks Michael! I pushed both of these out. Will -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] powerpc: implement barrier primitives
On Thu, Jun 18, 2015 at 10:11:58AM +0100, Michael Ellerman wrote: On Wed, 2015-06-17 at 11:15 +0100, Will Deacon wrote: On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote: Instead of referring to the Linux header including the barrier macros, copy over the rather simple implementation for the PowerPC barrier instructions kvmtool uses. This fixes build for powerpc. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- Hi, I just took what kvmtool seems to have used before, I actually have no idea if sync is the right instruction or lwsync would do. Would be nice if some people with PowerPC knowledge could comment. I *think* we can use lwsync for rmb and wmb, but would want confirmation from a ppc guy before making that change! Ugh, memory barriers :) I prefer to call them Job Security :) You probably can use lwsync, assuming you're only ordering cacheable vs cacheable. But, lwsync has given us pain in the past[1], so I'd be happier if you just used sync. No probs. I pushed Andre's original patch. Will -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] powerpc: implement barrier primitives
On Thu, Jun 18, 2015 at 10:11:58AM +0100, Michael Ellerman wrote: On Wed, 2015-06-17 at 11:15 +0100, Will Deacon wrote: On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote: Instead of referring to the Linux header including the barrier macros, copy over the rather simple implementation for the PowerPC barrier instructions kvmtool uses. This fixes build for powerpc. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- Hi, I just took what kvmtool seems to have used before, I actually have no idea if sync is the right instruction or lwsync would do. Would be nice if some people with PowerPC knowledge could comment. I *think* we can use lwsync for rmb and wmb, but would want confirmation from a ppc guy before making that change! Ugh, memory barriers :) I prefer to call them Job Security :) You probably can use lwsync, assuming you're only ordering cacheable vs cacheable. But, lwsync has given us pain in the past[1], so I'd be happier if you just used sync. No probs. I pushed Andre's original patch. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] kvmtool: Save datamatch as little endian in {add,del}_event
On Wed, Jun 17, 2015 at 08:17:49AM +0100, Andreas Herrmann wrote: On Tue, Jun 16, 2015 at 06:17:14PM +0100, Will Deacon wrote: On Mon, Jun 15, 2015 at 12:49:45PM +0100, Andreas Herrmann wrote: W/o dedicated endianess it's impossible to find reliably a match e.g. in kernel/virt/kvm/eventfd.c ioeventfd_in_range. Hmm, but shouldn't this be the endianness of the guest, rather than just forcing things to little-endian? With my patch and following adaption to ioeventfd_in_range (in virt/kvm/eventfd.c): [...] But now I see, w/o a correponding kernel change the patch shouldn't be merged. Digging a bit deeper, I think it's up to the architecture KVM backend (in the kernel) to present the mmio buffer to core kvm in the host endianness. For example, on ARM, we honour the endianness of the vcpu in vcpu_data_guest_to_host when we populate the buffer for kvm_io_bus_write (which is what ends up in the ioeventfd code). I couldn't find equivalent code for MIPs, but I may have been looking in the wrong place. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] powerpc: add hvcall.h header from Linux
On Wed, Jun 17, 2015 at 10:43:50AM +0100, Andre Przywara wrote: The powerpc code uses some PAPR hypercalls, of which we need the hypercall number. Copy the macro definition parts from the kernel's (private) hvcall.h file and remove the extra tricks formerly used to be able to include this header file directly. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- Hi, I copied most of the Linux header, without removing definitions that kvmtool doesn't use. That should make updates easier. If people would prefer a bespoke header, let me know. I'd rather just #define the stuff we need now that we're outside of the kernel source tree. Will -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] powerpc: add hvcall.h header from Linux
On Wed, Jun 17, 2015 at 10:43:50AM +0100, Andre Przywara wrote: The powerpc code uses some PAPR hypercalls, of which we need the hypercall number. Copy the macro definition parts from the kernel's (private) hvcall.h file and remove the extra tricks formerly used to be able to include this header file directly. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- Hi, I copied most of the Linux header, without removing definitions that kvmtool doesn't use. That should make updates easier. If people would prefer a bespoke header, let me know. I'd rather just #define the stuff we need now that we're outside of the kernel source tree. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] powerpc: implement barrier primitives
On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote: Instead of referring to the Linux header including the barrier macros, copy over the rather simple implementation for the PowerPC barrier instructions kvmtool uses. This fixes build for powerpc. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- Hi, I just took what kvmtool seems to have used before, I actually have no idea if sync is the right instruction or lwsync would do. Would be nice if some people with PowerPC knowledge could comment. I *think* we can use lwsync for rmb and wmb, but would want confirmation from a ppc guy before making that change! Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] kvmtool: Save datamatch as little endian in {add,del}_event
On Mon, Jun 15, 2015 at 12:49:45PM +0100, Andreas Herrmann wrote: W/o dedicated endianess it's impossible to find reliably a match e.g. in kernel/virt/kvm/eventfd.c ioeventfd_in_range. Hmm, but shouldn't this be the endianness of the guest, rather than just forcing things to little-endian? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 01/11] KVM: arm: plug guest debug exploit
On Sun, Jun 14, 2015 at 05:13:05PM +0100, zichao wrote: I and marc are talking about how to plug the guest debug exploit in an easier way. I remembered that you mentioned disabling monitor mode had proven to be extremely fragile in practice on 32-bit ARM SoCs, what if I save/restore the debug monitor mode on each switch between the guest and the host, would it be acceptable? If you're just referring to DBGDSCRext, then you could give it a go, but you'll certainly want to predicate any writes to that register on whether or not hw_breakpoint managed to reset the debug regs on the host. Like I said, accessing these registers always worries me, so I'd really avoid it in KVM if you can. If not, you'll need to do extensive testing on a bunch of platforms with and without the presence of external debug. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] kvmtool: Misc fixes
On Mon, Jun 15, 2015 at 12:49:41PM +0100, Andreas Herrmann wrote: Following some patches to fix misc issues found when testing the standalone kvmtool version. Please apply. All applied, apart from the ioeventfd patch which I'm not sure about. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool: don't use PCI config space IRQ line field
On Mon, Jun 15, 2015 at 11:45:38AM +0100, Andre Przywara wrote: On 06/05/2015 05:41 PM, Will Deacon wrote: On Thu, Jun 04, 2015 at 04:20:45PM +0100, Andre Przywara wrote: In PCI config space there is an interrupt line field (offset 0x3f), which is used to initially communicate the IRQ line number from firmware to the OS. _Hardware_ should never use this information, as the OS is free to write any information in there. But kvmtool uses this number when it triggers IRQs in the guest, which fails starting with Linux 3.19-rc1, where the PCI layer starts writing the virtual IRQ number in there. Fix that by storing the IRQ number in a separate field in struct virtio_pci, which is independent from the PCI config space and cannot be influenced by the guest. This fixes ARM/ARM64 guests using PCI with newer kernels. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- include/kvm/virtio-pci.h | 8 virtio/pci.c | 9 ++--- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h index c795ce7..b70cadd 100644 --- a/include/kvm/virtio-pci.h +++ b/include/kvm/virtio-pci.h @@ -30,6 +30,14 @@ struct virtio_pci { u8 isr; u32 features; + /* + * We cannot rely on the INTERRUPT_LINE byte in the config space once + * we have run guest code, as the OS is allowed to use that field + * as a scratch pad to communicate between driver and PCI layer. + * So store our legacy interrupt line number in here for internal use. + */ + u8 legacy_irq_line; + /* MSI-X */ u16 config_vector; u32 config_gsi; diff --git a/virtio/pci.c b/virtio/pci.c index 7556239..e17e5a9 100644 --- a/virtio/pci.c +++ b/virtio/pci.c @@ -141,7 +141,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p break; case VIRTIO_PCI_ISR: ioport__write8(data, vpci-isr); - kvm__irq_line(kvm, vpci-pci_hdr.irq_line, VIRTIO_IRQ_LOW); + kvm__irq_line(kvm, vpci-legacy_irq_line, VIRTIO_IRQ_LOW); vpci-isr = VIRTIO_IRQ_LOW; break; default: @@ -299,7 +299,7 @@ int virtio_pci__signal_vq(struct kvm *kvm, struct virtio_device *vdev, u32 vq) kvm__irq_trigger(kvm, vpci-gsis[vq]); } else { vpci-isr = VIRTIO_IRQ_HIGH; - kvm__irq_trigger(kvm, vpci-pci_hdr.irq_line); + kvm__irq_trigger(kvm, vpci-legacy_irq_line); } return 0; } @@ -323,7 +323,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct virtio_device *vdev) kvm__irq_trigger(kvm, vpci-config_gsi); } else { vpci-isr = VIRTIO_PCI_ISR_CONFIG; - kvm__irq_trigger(kvm, vpci-pci_hdr.irq_line); + kvm__irq_trigger(kvm, vpci-legacy_irq_line); } return 0; @@ -422,6 +422,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, if (r 0) goto free_msix_mmio; + /* save the IRQ that device__register() has allocated */ + vpci-legacy_irq_line = vpci-pci_hdr.irq_line; I'd rather we used the container_of trick that we do for virtio-mmio devices when assigning the irq in device__register. Then we can avoid this line completely. Not completely sure I get what you mean, I take it you want to assign legacy_irq_line in pci__assign_irq() directly (where the IRQ number is allocated). But this function is PCI generic code and is used by the VESA framebuffer and the shmem device on x86 as well. For those devices dev_hdr is not part of a struct virtio_pci, so we can't do container_of to assign the legacy_irq_line here directly. Admittedly this fix should apply to the other two users as well, but VESA does not use interrupts and pci-shmem is completely broken anyway, so I didn't bother to fix it in this regard. Would it be justified to provide an IRQ number field in struct device_header to address all users? Or what am I missing here? If VESA and shmem are broken, they should either be fixed or removed. If you fix them, then we could have separate virtual buses for virtio-pci and emulated-pci (or whatever you want to call it). We could also have a separate bus for passthrough-devices too. However, that's quite a lot of work for a bug-fix, so I guess the easiest thing is to extend your current hack to cover VESA and shmem too. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool: don't use PCI config space IRQ line field
On Thu, Jun 04, 2015 at 04:20:45PM +0100, Andre Przywara wrote: In PCI config space there is an interrupt line field (offset 0x3f), which is used to initially communicate the IRQ line number from firmware to the OS. _Hardware_ should never use this information, as the OS is free to write any information in there. But kvmtool uses this number when it triggers IRQs in the guest, which fails starting with Linux 3.19-rc1, where the PCI layer starts writing the virtual IRQ number in there. Fix that by storing the IRQ number in a separate field in struct virtio_pci, which is independent from the PCI config space and cannot be influenced by the guest. This fixes ARM/ARM64 guests using PCI with newer kernels. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- include/kvm/virtio-pci.h | 8 virtio/pci.c | 9 ++--- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h index c795ce7..b70cadd 100644 --- a/include/kvm/virtio-pci.h +++ b/include/kvm/virtio-pci.h @@ -30,6 +30,14 @@ struct virtio_pci { u8 isr; u32 features; + /* + * We cannot rely on the INTERRUPT_LINE byte in the config space once + * we have run guest code, as the OS is allowed to use that field + * as a scratch pad to communicate between driver and PCI layer. + * So store our legacy interrupt line number in here for internal use. + */ + u8 legacy_irq_line; + /* MSI-X */ u16 config_vector; u32 config_gsi; diff --git a/virtio/pci.c b/virtio/pci.c index 7556239..e17e5a9 100644 --- a/virtio/pci.c +++ b/virtio/pci.c @@ -141,7 +141,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p break; case VIRTIO_PCI_ISR: ioport__write8(data, vpci-isr); - kvm__irq_line(kvm, vpci-pci_hdr.irq_line, VIRTIO_IRQ_LOW); + kvm__irq_line(kvm, vpci-legacy_irq_line, VIRTIO_IRQ_LOW); vpci-isr = VIRTIO_IRQ_LOW; break; default: @@ -299,7 +299,7 @@ int virtio_pci__signal_vq(struct kvm *kvm, struct virtio_device *vdev, u32 vq) kvm__irq_trigger(kvm, vpci-gsis[vq]); } else { vpci-isr = VIRTIO_IRQ_HIGH; - kvm__irq_trigger(kvm, vpci-pci_hdr.irq_line); + kvm__irq_trigger(kvm, vpci-legacy_irq_line); } return 0; } @@ -323,7 +323,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct virtio_device *vdev) kvm__irq_trigger(kvm, vpci-config_gsi); } else { vpci-isr = VIRTIO_PCI_ISR_CONFIG; - kvm__irq_trigger(kvm, vpci-pci_hdr.irq_line); + kvm__irq_trigger(kvm, vpci-legacy_irq_line); } return 0; @@ -422,6 +422,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, if (r 0) goto free_msix_mmio; + /* save the IRQ that device__register() has allocated */ + vpci-legacy_irq_line = vpci-pci_hdr.irq_line; I'd rather we used the container_of trick that we do for virtio-mmio devices when assigning the irq in device__register. Then we can avoid this line completely. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: stand-alone kvmtool
On Mon, Feb 23, 2015 at 05:23:58PM +, Pekka Enberg wrote: On 2/18/15 5:50 PM, Will Deacon wrote: Thanks for doing this. Since it looks unlikely that kvmtool will ever be merged back into the kernel tree, it makes sense to cut the dependency in my opinion. I am certainly OK with a standalone repository which preserves the history. Will, would you like to take over the proposed new repository and put it somewhere on git.kernel.org, perhaps? Finally got around to this with Andre's help, so I'll post a separate mail advertising it. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ANNOUNCE] Stand-alone kvmtool repository
Hi all, Thanks to a tonne of help (and friendly nagging!) from Andre, I'm pleased to announce a stand-alone repository for kvmtool: git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git This is a follow-up from a previous thread[1] where we concluded that this was the right way to go. As requested, the original git history is preserved but is *not* bisectable since we don't include the kernel tree in the history. For a bisectable history, the original github repo[2] should be used instead. For now, it's living in my area of kernel.org since (a) I already had a fork there and (b) the patch traffic is pretty low for the project. If there is sufficient development interest, I can move it somewhere more appropriate for shared maintenance (e.g. under virt/kvm/). Anyway, please give it a spin and send patches when it falls over ;) Will [1] https://lkml.org/lkml/2015/2/23/393 [2] git://github.com/penberg/linux-kvm.git -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 09/11] KVM: arm: disable debug mode if we don't actually need it.
On Sun, May 31, 2015 at 05:27:10AM +0100, Zhichao Huang wrote: Until now we enable debug mode all the time even if we don't actually need it. Inspired by the implementation in arm64, disable debug mode if we don't need it. And then we are able to reduce unnecessary registers saving/restoring when the debug mode is disabled. I'm terrified about this patch. Enabling monitor mode has proven to be *extremely* fragile in practice on 32-bit ARM SoCs, so trying to do this morei often makes me very nervous. Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org --- arch/arm/kernel/hw_breakpoint.c | 55 ++--- 1 file changed, 46 insertions(+), 9 deletions(-) diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c index dc7d0a9..1d27563 100644 --- a/arch/arm/kernel/hw_breakpoint.c +++ b/arch/arm/kernel/hw_breakpoint.c @@ -266,8 +266,7 @@ static int enable_monitor_mode(void) } /* Check that the write made it through. */ - ARM_DBG_READ(c0, c1, 0, dscr); - if (!(dscr ARM_DSCR_MDBGEN)) { + if (!monitor_mode_enabled()) { pr_warn_once(Failed to enable monitor mode on CPU %d.\n, smp_processor_id()); return -EPERM; Ok, this hunk is harmless :) @@ -277,6 +276,43 @@ out: return 0; } +static int disable_monitor_mode(void) +{ + u32 dscr; + + ARM_DBG_READ(c0, c1, 0, dscr); + + /* If monitor mode is already disabled, just return. */ + if (!(dscr ARM_DSCR_MDBGEN)) + goto out; + + /* Write to the corresponding DSCR. */ + switch (get_debug_arch()) { + case ARM_DEBUG_ARCH_V6: + case ARM_DEBUG_ARCH_V6_1: + ARM_DBG_WRITE(c0, c1, 0, (dscr ~ARM_DSCR_MDBGEN)); + break; + case ARM_DEBUG_ARCH_V7_ECP14: + case ARM_DEBUG_ARCH_V7_1: + case ARM_DEBUG_ARCH_V8: + ARM_DBG_WRITE(c0, c2, 2, (dscr ~ARM_DSCR_MDBGEN)); + isb(); + break; + default: + return -ENODEV; + } + + /* Check that the write made it through. */ + if (monitor_mode_enabled()) { + pr_warn_once(Failed to disable monitor mode on CPU %d.\n, + smp_processor_id()); + return -EPERM; + } + +out: + return 0; +} I'm not comfortable with this. enable_monitor_mode has precisly one caller [reset_ctrl_regs] which goes to some lengths to get the system into a well-defined state. On top of that, the whole thing is run with an undef hook registered because there isn't an architected way to discover whether or not DBGSWENABLE is driven low. Why exactly do you need this? Can you not trap guest debug accesses some other way? int hw_breakpoint_slots(int type) { if (!debug_arch_supported()) @@ -338,6 +374,8 @@ int arch_install_hw_breakpoint(struct perf_event *bp) int i, max_slots, ctrl_base, val_base; u32 addr, ctrl; + enable_monitor_mode(); + addr = info-address; ctrl = encode_ctrl_reg(info-ctrl) | 0x1; @@ -430,6 +468,8 @@ void arch_uninstall_hw_breakpoint(struct perf_event *bp) /* Reset the control register. */ write_wb_reg(base + i, 0); + + disable_monitor_mode(); My previous concerns notwithstanding, shouldn't this be refcounted? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 10/12] KVM: arm64: guest debug, HW assisted debug support
Hi Alex, On Fri, May 29, 2015 at 10:30:26AM +0100, Alex Bennée wrote: This adds support for userspace to control the HW debug registers for guest debug. In the debug ioctl we copy the IMPDEF defined number of registers into a new register set called host_debug_state. There is now a new vcpu parameter called debug_ptr which selects which register set is to copied into the real registers when world switch occurs. I've moved some helper functions into the hw_breakpoint.h header for re-use. As with single step we need to tweak the guest registers to enable the exceptions so we need to save and restore those bits. Two new capabilities have been added to the KVM_EXTENSION ioctl to allow userspace to query the number of hardware break and watch points available on the host hardware. Signed-off-by: Alex Bennée alex.ben...@linaro.org [...] diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 33c8143..ada57df 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2668,7 +2668,7 @@ The top 16 bits of the control field are architecture specific control flags which can include the following: - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64] - - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390] + - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64] - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86] - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86] - KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390] @@ -2683,6 +2683,11 @@ updated to the correct (supplied) values. The second part of the structure is architecture specific and typically contains a set of debug registers. +For arm64 the number of debug registers is implementation defined and +can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and +KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number +indicating the number of supported registers. + When debug events exit the main run loop with the reason KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run structure containing architecture specific debug information. diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 0d17c7b..6df47c1 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -307,6 +307,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE |\ KVM_GUESTDBG_USE_SW_BP | \ + KVM_GUESTDBG_USE_HW_BP | \ KVM_GUESTDBG_SINGLESTEP) /** @@ -327,6 +328,12 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, if (dbg-control KVM_GUESTDBG_ENABLE) { vcpu-guest_debug = dbg-control; + + /* Hardware assisted Break and Watch points */ + if (vcpu-guest_debug KVM_GUESTDBG_USE_HW_BP) { + vcpu-arch.external_debug_state = dbg-arch; + } + } else { /* If not enabled clear all flags */ vcpu-guest_debug = 0; diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 52b484b..c450552 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h @@ -130,6 +130,18 @@ static inline void ptrace_hw_copy_thread(struct task_struct *task) } #endif +/* Determine number of BRP registers available. */ +static inline int get_num_brps(void) +{ + return ((read_cpuid(ID_AA64DFR0_EL1) 12) 0xf) + 1; +} + +/* Determine number of WRP registers available. */ +static inline int get_num_wrps(void) +{ + return ((read_cpuid(ID_AA64DFR0_EL1) 20) 0xf) + 1; +} I'm fine with moving these into the header file, but we should probably revisit this at some point so that we use a shadow value to indicate the minimum number of registers across the system for e.g. big.LITTLE platforms that don't have uniform debug resources. extern struct pmu perf_ops_bp; #endif /* __KERNEL__ */ diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index e5040b6..498d4f7 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -113,12 +113,13 @@ struct kvm_vcpu_arch { /* * For debugging the guest we need to keep a set of debug - * registers which can override the guests own debug state + * registers which can override the guest's own debug state * while being used. These are set via the KVM_SET_GUEST_DEBUG * ioctl. */ struct kvm_guest_debug_arch *debug_ptr; struct kvm_guest_debug_arch vcpu_debug_state; + struct kvm_guest_debug_arch external_debug_state; /* Pointer to host CPU context */ kvm_cpu_context_t *host_cpu_context; diff
Re: stand-alone kvmtool
On Mon, Feb 23, 2015 at 05:23:58PM +, Pekka Enberg wrote: Hi, Hi Pekka, Sorry for the delay, I've been away from email for a few days. On 2/18/15 5:50 PM, Will Deacon wrote: Thanks for doing this. Since it looks unlikely that kvmtool will ever be merged back into the kernel tree, it makes sense to cut the dependency in my opinion. I am certainly OK with a standalone repository which preserves the history. Will, would you like to take over the proposed new repository and put it somewhere on git.kernel.org, perhaps? Sure thing. I'll sync-up with Andre and reply back here when we've got something sensible. One extra point that I don't think has been mentioned in this thread so far is that separating kvmtool from the kernel sources is likely a prerequisite for distribution packaging. Once we've got something sorted out, I'll poke some friendly local debian developers and see if they can't package it up for us. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: stand-alone kvmtool
Hi Andre, Thanks for doing this. Since it looks unlikely that kvmtool will ever be merged back into the kernel tree, it makes sense to cut the dependency in my opinion. On Fri, Feb 13, 2015 at 10:39:33AM +, Andre Przywara wrote: as I found it increasingly inconvenient to use kvmtool[1] as part of a Linux repository, I decided to give it a go and make it a stand-alone project. So I filtered all the respective commits, adjusted the paths in there (while keeping authorship and commit date, of course) and then added the missing bits to let it compile without a kernel tree nearby. The result is now available on: git://linux-arm.org/kvmtool.git http://linux-arm.org/kvmtool.git You can simply check it out, type make and use ./lkvm run for a quick test. So far I briefly tested x86-64, arm and arm64, the later two were also cross-compiled. For sure there are rough edges in there (for instance copying a few non-uapi header files into), but I deem it worthy enough to get some public comments. For me that also fixed some nasty warnings about libfdt, which now are gone due it using your system library version of it. I also managed to get rid of the libc-i386-dev dependency when compiling for x86-64, but that still needs to be cleaned up and thus is not in the current HEAD. I haven't got around to compile-test the other supported architectures, but supporting them should be as easy as copying over the uapi kvm.h header file (see the respective ARM commit). Contributions (and tests!) are welcome. Please give it a go and tell me what you think. I don't want to fork the project, so I am happy if someone official picks it up. In which case, it's probably best to post the patches for review rather than just point me at your git repo! Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3 0/6] rcu,nohz,kvm: use RCU extended quiescent state when running KVM guest
Hi Rik, On Mon, Feb 09, 2015 at 04:04:38PM +, r...@redhat.com wrote: Apologies to Catalin and Will for not fixing up ARM. I am not familiar with ARM assembly, and not sure how to pass a constant argument to a function from assembly code on ARM :) It's a bit of a faff getting enum values into asm -- we actually have to duplicate the definitions using #defines to get at the constants. Perhaps it would be cleaner to leave context_tracking_user_{enter,exit} intact as C wrappers around context_tracking_{enter,exit} passing the appropriate constant? That way we don't actually need to change the arch code at all. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool: don't use PCI config space IRQ line field
On Fri, Feb 06, 2015 at 07:02:25PM +, Peter Maydell wrote: On 6 February 2015 at 18:55, Will Deacon will.dea...@arm.com wrote: On Wed, Feb 04, 2015 at 03:39:50PM +, Andre Przywara wrote: In PCI config space there is an interrupt line field (offset 0x3f), which is used to initially communicate the IRQ line number from firmware to the OS. _Hardware_ should never use this information, as the OS is free to write any information in there. Is this true even with probe-only? I appreciate that this isn't a BAR, but it still feels odd for Linux to write this in that case. The hardware (model) shouldn't be doing anything with the value in this register anyway, so I think this change to kvmtool is correct regardless of Linux's behaviour. Well, kvmtool is also pretending to be firmware in this case, which is why it passes things like probe-only and PSCI nodes. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool: don't use PCI config space IRQ line field
On Wed, Feb 04, 2015 at 03:39:50PM +, Andre Przywara wrote: In PCI config space there is an interrupt line field (offset 0x3f), which is used to initially communicate the IRQ line number from firmware to the OS. _Hardware_ should never use this information, as the OS is free to write any information in there. Is this true even with probe-only? I appreciate that this isn't a BAR, but it still feels odd for Linux to write this in that case. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/11] kvmtool: AArch{32,64}: use KVM_CREATE_DEVICE co to instanciate the GIC
On Fri, Jan 23, 2015 at 04:35:02PM +, Andre Przywara wrote: From: Marc Zyngier marc.zyng...@arm.com As of 3.14, KVM/arm supports the creation/configuration of the GIC through a more generic device API, which is now the preferred way to do so. Plumb the new API in, and allow the old code to be used as a fallback. [Andre: Rename some functions on the way to differentiate between creation and initialisation more clearly.] Signed-off-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Andre Przywara andre.przyw...@arm.com --- tools/kvm/arm/gic.c| 60 tools/kvm/arm/include/arm-common/gic.h |2 +- tools/kvm/arm/kvm.c|6 ++-- 3 files changed, 57 insertions(+), 11 deletions(-) diff --git a/tools/kvm/arm/gic.c b/tools/kvm/arm/gic.c index 5d8cbe6..ce5f7fa 100644 --- a/tools/kvm/arm/gic.c +++ b/tools/kvm/arm/gic.c @@ -7,7 +7,41 @@ #include linux/byteorder.h #include linux/kvm.h -int gic__init_irqchip(struct kvm *kvm) +static int gic_fd = -1; + +static int gic__create_device(struct kvm *kvm) +{ + int err; + u64 cpu_if_addr = ARM_GIC_CPUI_BASE; + u64 dist_addr = ARM_GIC_DIST_BASE; + struct kvm_create_device gic_device = { + .type = KVM_DEV_TYPE_ARM_VGIC_V2, + }; + struct kvm_device_attr cpu_if_attr = { + .group = KVM_DEV_ARM_VGIC_GRP_ADDR, + .attr = KVM_VGIC_V2_ADDR_TYPE_CPU, + .addr = (u64)(unsigned long)cpu_if_addr, + }; + struct kvm_device_attr dist_attr = { + .group = KVM_DEV_ARM_VGIC_GRP_ADDR, + .attr = KVM_VGIC_V2_ADDR_TYPE_DIST, + .addr = (u64)(unsigned long)dist_addr, + }; + + err = ioctl(kvm-vm_fd, KVM_CREATE_DEVICE, gic_device); + if (err) + return err; + + gic_fd = gic_device.fd; + + err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, cpu_if_attr); + if (err) + return err; + + return ioctl(gic_fd, KVM_SET_DEVICE_ATTR, dist_attr); +} + +static int gic__create_irqchip(struct kvm *kvm) { int err; struct kvm_arm_device_addr gic_addr[] = { @@ -23,12 +57,6 @@ int gic__init_irqchip(struct kvm *kvm) } }; - if (kvm-nrcpus GIC_MAX_CPUS) { - pr_warning(%d CPUS greater than maximum of %d -- truncating\n, - kvm-nrcpus, GIC_MAX_CPUS); - kvm-nrcpus = GIC_MAX_CPUS; - } - err = ioctl(kvm-vm_fd, KVM_CREATE_IRQCHIP); if (err) return err; @@ -41,6 +69,24 @@ int gic__init_irqchip(struct kvm *kvm) return err; } +int gic__create(struct kvm *kvm) +{ + int err; + + if (kvm-nrcpus GIC_MAX_CPUS) { + pr_warning(%d CPUS greater than maximum of %d -- truncating\n, + kvm-nrcpus, GIC_MAX_CPUS); + kvm-nrcpus = GIC_MAX_CPUS; + } + + /* Try the new way first, and fallback on legacy method otherwise */ + err = gic__create_device(kvm); + if (err) + err = gic__create_irqchip(kvm); This fallback doesn't look safe to me: - gic_fd might remain initialised - What does the kernel vgic driver do if you've already done a successful KVM_CREATE_DEVICE and then try to use the legacy method? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/11] kvmtool: add command line parameter to instantiate a vGICv3
On Fri, Jan 23, 2015 at 04:35:10PM +, Andre Przywara wrote: Add the command line parameter --gicv3 to request GICv3 emulation in the kernel. Connect that to the already existing GICv3 code. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- tools/kvm/arm/aarch64/arm-cpu.c|5 - .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h |4 +++- tools/kvm/arm/gic.c| 14 ++ tools/kvm/arm/include/arm-common/kvm-config-arch.h |1 + tools/kvm/arm/kvm-cpu.c|2 +- tools/kvm/arm/kvm.c|3 ++- 6 files changed, 25 insertions(+), 4 deletions(-) diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c index a70d6bb..46d6d6a 100644 --- a/tools/kvm/arm/aarch64/arm-cpu.c +++ b/tools/kvm/arm/aarch64/arm-cpu.c @@ -12,7 +12,10 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle) { int timer_interrupts[4] = {13, 14, 11, 10}; - gic__generate_fdt_nodes(fdt, gic_phandle, KVM_DEV_TYPE_ARM_VGIC_V2); + gic__generate_fdt_nodes(fdt, gic_phandle, + kvm-cfg.arch.gicv3 ? + KVM_DEV_TYPE_ARM_VGIC_V3 : + KVM_DEV_TYPE_ARM_VGIC_V2); timer__generate_fdt_nodes(fdt, kvm, timer_interrupts); } diff --git a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h index 89860ae..106e52f 100644 --- a/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h +++ b/tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h @@ -3,7 +3,9 @@ #define ARM_OPT_ARCH_RUN(cfg) \ OPT_BOOLEAN('\0', aarch32, (cfg)-aarch32_guest, \ - Run AArch32 guest), + Run AArch32 guest), \ + OPT_BOOLEAN('\0', gicv3, (cfg)-gicv3, \ + Use a GICv3 interrupt controller in the guest), On a GICv3-capable system, why would I *not* want to enable this option? In other words, could we make this the default behaviour on systems that support it, and if you need an override then it should be something like --force-gicv2. Or am I missing a key piece of the puzzle? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmtool: virtio-9p: Convert EMFILE error at the server to ENFILE for the guest
On Fri, Jan 16, 2015 at 11:30:10AM +, Suzuki K. Poulose wrote: From: Suzuki K. Poulose suzuki.poul...@arm.com If an open at the 9p server(host) fails with EMFILE (Too many open files for the process), we should return ENFILE(too many open files in the system) to the guest to indicate the actual status within the guest. This was uncovered during LTP, where getdtablesize01 fails to open the maximum number-open-files. getdtablesize010 TINFO : Maximum number of files a process can have opened is 1024 getdtablesize010 TINFO : Checking with the value returned by getrlimit...RLIMIT_NOFILE getdtablesize011 TPASS : got correct dtablesize, value is 1024 getdtablesize010 TINFO : Checking Max num of files that can be opened by a process.Should be: RLIMIT_NOFILE - 1 getdtablesize012 TFAIL : getdtablesize01.c:102: 974 != 1023 For a more practial impact: # ./getdtablesize01 [1] 1834 getdtablesize010 TINFO : Maximum number of files a process can have opened is 1024 getdtablesize010 TINFO : Checking with the value returned by getrlimit...RLIMIT_NOFILE getdtablesize011 TPASS : got correct dtablesize, value is 1024 getdtablesize010 TINFO : Checking Max num of files that can be opened by a process.Should be: RLIMIT_NOFILE - 1 getdtablesize012 TFAIL : getdtablesize01.c:102: 974 != 1023 [--- Modified to sleep indefinitely, without closing the files --- ] # ls bash: /bin/ls: Too many open files That gives a wrong error message for the bash, when getdtablesize01 has exhausted the system wide limits, giving false indicators. With the fix, we get : # ls bash: /bin/ls: Too many open files in system I doubt this is affecting anybody in practice, but: Acked-by: Will Deacon will.dea...@arm.com Will Signed-off-by: Suzuki K. Poulose suzuki.poul...@arm.com --- tools/kvm/virtio/9p.c |4 1 file changed, 4 insertions(+) diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c index 9073a1e..b24c0f2 100644 --- a/tools/kvm/virtio/9p.c +++ b/tools/kvm/virtio/9p.c @@ -152,6 +152,10 @@ static void virtio_p9_error_reply(struct p9_dev *p9dev, { u16 tag; + /* EMFILE at server implies ENFILE for the VM */ + if (err == EMFILE) + err = ENFILE; + pdu-write_offset = VIRTIO_9P_HDR_LEN; virtio_p9_pdu_writef(pdu, d, err); *outlen = pdu-write_offset; -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] kvmtool: remove 8250 IRQ line reset on device_init
On Fri, Jan 09, 2015 at 03:58:19PM +, Andre Przywara wrote: Currently we reset the KVM interrupt line on initializing the 8250 serial device emulation. For ARM this creates a problem where we use the in-kernel IRQ chip before having fully initialized it. But with the new kernel interface we cannot finish the GIC initialization before we know the number of used IRQs, so we have to wait until all devices have been created and initialized. Since the in-kernel GIC emulation resets the IRQ line anyway and also QEMU gets away without resetting it, the easiest solution is to drop the IRQ line reset. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- Hi Pekka, this patch is an easy fix for our problems with Marc's kvmtool tree and 3.19-rc (setting number of IRQs after the GIC has been used already). I see that this basically reverts: commit 2ca9e37193ca5f5df5726e0061dc749829295435 Author: Pekka Enberg penb...@kernel.org Date: Sun Jan 23 11:49:39 2011 +0200 kvm,8250: Fix device initial state This patch fixes 8250 device initial state for registers and IRQ based on what Qemu does. Do you (or does anyone) know of the issue that this patch fixed? This is four years old and from what I see QEMU does no longer the mentioned IRQ reset(?). Reworking kvmtool to avoid this issue sound rather painful. I'm fine with this from an ARM point-of-view. Acked-by: Will Deacon will.dea...@arm.com Have you tested this on x86? Will tools/kvm/hw/serial.c |1 - 1 file changed, 1 deletion(-) diff --git a/tools/kvm/hw/serial.c b/tools/kvm/hw/serial.c index 270e6182..2f19ba8 100644 --- a/tools/kvm/hw/serial.c +++ b/tools/kvm/hw/serial.c @@ -406,7 +406,6 @@ static int serial8250__device_init(struct kvm *kvm, struct serial8250_device *de ioport__map_irq(dev-irq); r = ioport__register(kvm, dev-iobase, serial8250_ops, 8, dev); - kvm__irq_line(kvm, dev-irq, 0); return r; } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/5] kvmtool: replace GIC specific IRQ type #defines
On Wed, Dec 17, 2014 at 11:14:44AM +, Andre Przywara wrote: We had GIC specific defines for the IRQ type identifiers in kvmtool. But In fact the specification of being a level or edge interrupt is quite generic, with the GIC binding using the generic Linux defines. So lets replace the GIC specific names in favour of the general defines used in Linux. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- tools/kvm/arm/fdt.c|2 +- tools/kvm/arm/include/arm-common/gic.h |5 - tools/kvm/arm/pci.c|2 +- tools/kvm/arm/timer.c |8 tools/kvm/include/kvm/fdt.h|9 + 5 files changed, 15 insertions(+), 11 deletions(-) diff --git a/tools/kvm/arm/fdt.c b/tools/kvm/arm/fdt.c index 4a33846..24f030f 100644 --- a/tools/kvm/arm/fdt.c +++ b/tools/kvm/arm/fdt.c @@ -79,7 +79,7 @@ static void generate_irq_prop(void *fdt, u8 irq) u32 irq_prop[] = { cpu_to_fdt32(GIC_FDT_IRQ_TYPE_SPI), cpu_to_fdt32(irq - GIC_SPI_IRQ_BASE), - cpu_to_fdt32(GIC_FDT_IRQ_FLAGS_EDGE_LO_HI), + cpu_to_fdt32(IRQ_TYPE_EDGE_RISING), }; _FDT(fdt_property(fdt, interrupts, irq_prop, sizeof(irq_prop))); diff --git a/tools/kvm/arm/include/arm-common/gic.h b/tools/kvm/arm/include/arm-common/gic.h index 850edc7..5a36f2c 100644 --- a/tools/kvm/arm/include/arm-common/gic.h +++ b/tools/kvm/arm/include/arm-common/gic.h @@ -10,11 +10,6 @@ #define GIC_FDT_IRQ_TYPE_SPI 0 #define GIC_FDT_IRQ_TYPE_PPI 1 -#define GIC_FDT_IRQ_FLAGS_EDGE_LO_HI 1 -#define GIC_FDT_IRQ_FLAGS_EDGE_HI_LO 2 -#define GIC_FDT_IRQ_FLAGS_LEVEL_HI 4 -#define GIC_FDT_IRQ_FLAGS_LEVEL_LO 8 - #define GIC_FDT_IRQ_PPI_CPU_SHIFT8 #define GIC_FDT_IRQ_PPI_CPU_MASK (0xff GIC_FDT_IRQ_PPI_CPU_SHIFT) diff --git a/tools/kvm/arm/pci.c b/tools/kvm/arm/pci.c index 9f4dabc..99a8130 100644 --- a/tools/kvm/arm/pci.c +++ b/tools/kvm/arm/pci.c @@ -87,7 +87,7 @@ void pci__generate_fdt_nodes(void *fdt, u32 gic_phandle) .gic_irq = { .type = cpu_to_fdt32(GIC_FDT_IRQ_TYPE_SPI), .num= cpu_to_fdt32(irq - GIC_SPI_IRQ_BASE), - .flags = cpu_to_fdt32(GIC_FDT_IRQ_FLAGS_EDGE_LO_HI), + .flags = cpu_to_fdt32(IRQ_TYPE_EDGE_RISING), }, }; diff --git a/tools/kvm/arm/timer.c b/tools/kvm/arm/timer.c index 209251e..29991da 100644 --- a/tools/kvm/arm/timer.c +++ b/tools/kvm/arm/timer.c @@ -15,19 +15,19 @@ void timer__generate_fdt_nodes(void *fdt, struct kvm *kvm, int *irqs) u32 irq_prop[] = { cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI), cpu_to_fdt32(irqs[0]), - cpu_to_fdt32(cpu_mask | GIC_FDT_IRQ_FLAGS_EDGE_LO_HI), + cpu_to_fdt32(cpu_mask | IRQ_TYPE_EDGE_RISING), cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI), cpu_to_fdt32(irqs[1]), - cpu_to_fdt32(cpu_mask | GIC_FDT_IRQ_FLAGS_EDGE_LO_HI), + cpu_to_fdt32(cpu_mask | IRQ_TYPE_EDGE_RISING), cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI), cpu_to_fdt32(irqs[2]), - cpu_to_fdt32(cpu_mask | GIC_FDT_IRQ_FLAGS_EDGE_LO_HI), + cpu_to_fdt32(cpu_mask | IRQ_TYPE_EDGE_RISING), cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI), cpu_to_fdt32(irqs[3]), - cpu_to_fdt32(cpu_mask | GIC_FDT_IRQ_FLAGS_EDGE_LO_HI), + cpu_to_fdt32(cpu_mask | IRQ_TYPE_EDGE_RISING), }; _FDT(fdt_begin_node(fdt, timer)); diff --git a/tools/kvm/include/kvm/fdt.h b/tools/kvm/include/kvm/fdt.h index 19f95ac..dee9a71 100644 --- a/tools/kvm/include/kvm/fdt.h +++ b/tools/kvm/include/kvm/fdt.h @@ -7,6 +7,15 @@ #define FDT_MAX_SIZE 0x1 +/* Those definitions are generic FDT values for specifying IRQ + * types and are used in the Linux kernel internally as well as in + * the dts files and their documentation. + */ +#define IRQ_TYPE_EDGE_RISING 1 +#define IRQ_TYPE_EDGE_FALLING2 +#define IRQ_TYPE_LEVEL_HIGH 4 +#define IRQ_TYPE_LEVEL_LOW 8 Any chance we can keep this as an enum, please? That matches that the kernel uses internally, and allows you to take a specific type instead of a u32 for the device-tree generating functions. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 5/5] kvmtool: remove warning about bzImage on non-x86 architectures
On Wed, Dec 17, 2014 at 11:14:47AM +, Andre Przywara wrote: Among the architectures supported by kvmtool, only x86 defines a bzImage format. So we shouldn't bother users of other architectures with a message about something that cannot work. Make the bzImage check dependent on compiling for x86. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- tools/kvm/kvm.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c index e1b9f6c..21817c3 100644 --- a/tools/kvm/kvm.c +++ b/tools/kvm/kvm.c @@ -380,12 +380,15 @@ bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename, die(%s is not an initrd, initrd_filename); } +#ifdef CONFIG_X86 ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline); if (ret) goto found_kernel; - pr_warning(%s is not a bzImage. Trying to load it as a flat binary..., kernel_filename); + pr_warning(%s is not a bzImage. Trying to load it as an ELF or flat binary..., +kernel_filename); +#endif I don't think you need to change the string (at least, that seems to be an unrelated change). Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] kvmtool: ARM: allow level interrupts in device tree
On Thu, Dec 11, 2014 at 04:30:33PM +, Andre Przywara wrote: Currently we describe every interrupt for each device in the FDT as being edge triggered. Add a parameter to the irq property generation to allow devices to specify their interrupts as level triggered if needed. Signed-off-by: Andre Przywara andre.przyw...@arm.com --- tools/kvm/arm/fdt.c|6 +++--- tools/kvm/hw/serial.c |5 +++-- tools/kvm/include/kvm/ioport.h |4 +++- tools/kvm/ioport.c |6 -- tools/kvm/virtio/mmio.c|5 +++-- 5 files changed, 16 insertions(+), 10 deletions(-) [...] @@ -71,7 +72,8 @@ static void generate_ioport_fdt_node(void *fdt, static void generate_ioport_fdt_node(void *fdt, struct device_header *dev_hdr, void (*generate_irq_prop)(void *fdt, -u8 irq)) +u8 irq, +u32 irq_type)) { die(Unable to generate device tree nodes without libfdt\n); } diff --git a/tools/kvm/virtio/mmio.c b/tools/kvm/virtio/mmio.c index 3a2bd62..28b0651 100644 --- a/tools/kvm/virtio/mmio.c +++ b/tools/kvm/virtio/mmio.c @@ -233,7 +233,8 @@ static void virtio_mmio_mmio_callback(struct kvm_cpu *vcpu, static void generate_virtio_mmio_fdt_node(void *fdt, struct device_header *dev_hdr, void (*generate_irq_prop)(void *fdt, - u8 irq)) + u8 irq, + u32 type)) { char dev_name[DEVICE_NAME_MAX_LEN]; struct virtio_mmio *vmmio = container_of(dev_hdr, @@ -250,7 +251,7 @@ static void generate_virtio_mmio_fdt_node(void *fdt, _FDT(fdt_begin_node(fdt, dev_name)); _FDT(fdt_property_string(fdt, compatible, virtio,mmio)); _FDT(fdt_property(fdt, reg, reg_prop, sizeof(reg_prop))); - generate_irq_prop(fdt, vmmio-irq); + generate_irq_prop(fdt, vmmio-irq, GIC_FDT_IRQ_FLAGS_EDGE_LO_HI); This is a GIC-specific #define in arch-agnostic code. I think we should have a new enum type for describing edge and level interrupts, then just use that instead. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 1/3] KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIP
On Sun, Nov 23, 2014 at 05:56:58PM +, Eric Auger wrote: CONFIG_HAVE_KVM_IRQCHIP is needed to support IRQ routing (along with irq_comm.c and irqchip.c usage). This is not the case for arm/arm64 currently. This patch unsets the flag for both arm and arm64. Signed-off-by: Eric Auger eric.au...@linaro.org --- The arm64 change is fine by me. I assume this will go via the kvm tree eventually? Acked-by: Will Deacon will.dea...@arm.com Will arch/arm/kvm/Kconfig | 2 -- arch/arm64/kvm/Kconfig | 1 - 2 files changed, 3 deletions(-) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 466bd29..9f581b1 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -55,7 +55,6 @@ config KVM_ARM_MAX_VCPUS config KVM_ARM_VGIC bool KVM support for Virtual GIC depends on KVM_ARM_HOST OF - select HAVE_KVM_IRQCHIP default y ---help--- Adds support for a hardware assisted, in-kernel GIC emulation. @@ -63,7 +62,6 @@ config KVM_ARM_VGIC config KVM_ARM_TIMER bool KVM support for Architected Timers depends on KVM_ARM_VGIC ARM_ARCH_TIMER - select HAVE_KVM_IRQCHIP default y ---help--- Adds support for the Architected Timers in virtual machines diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index 8ba85e9..279e1a0 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -50,7 +50,6 @@ config KVM_ARM_MAX_VCPUS config KVM_ARM_VGIC bool depends on KVM_ARM_HOST OF - select HAVE_KVM_IRQCHIP ---help--- Adds support for a hardware assisted, in-kernel GIC emulation. -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 3/4] kvmtool: Handle exit reason KVM_EXIT_SYSTEM_EVENT
On Wed, Oct 01, 2014 at 11:34:54AM +0100, Anup Patel wrote: The KVM_EXIT_SYSTEM_EVENT exit reason was added to define architecture independent system-wide events for a Guest. Currently, it is used by in-kernel PSCI-0.2 emulation of KVM ARM/ARM64 to inform user space about PSCI SYSTEM_OFF or PSCI SYSTEM_RESET request. For now, we simply treat all system-wide guest events as shutdown request in KVMTOOL. Signed-off-by: Pranavkumar Sawargaonkar pranavku...@linaro.org Signed-off-by: Anup Patel anup.pa...@linaro.org Reviewed-by: Andre Przywara andre.przyw...@arm.com --- tools/kvm/kvm-cpu.c | 21 + 1 file changed, 21 insertions(+) diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c index ee0a8ec..5180039 100644 --- a/tools/kvm/kvm-cpu.c +++ b/tools/kvm/kvm-cpu.c @@ -160,6 +160,27 @@ int kvm_cpu__start(struct kvm_cpu *cpu) goto exit_kvm; case KVM_EXIT_SHUTDOWN: goto exit_kvm; + case KVM_EXIT_SYSTEM_EVENT: + /* + * Print the type of system event and + * treat all system events as shutdown request. + */ + switch (cpu-kvm_run-system_event.type) { + case KVM_SYSTEM_EVENT_SHUTDOWN: + printf( # Info: shutdown system event\n); + goto exit_kvm; + case KVM_SYSTEM_EVENT_RESET: + printf( # Info: reset system event\n); + printf( # Info: KVMTOOL does not support VM reset\n); + printf( # Info: please re-launch the VM manually\n); Can you use pr_info here instead? + goto exit_kvm; + default: + printf( # Warning: unknown system event type=%d\n, +cpu-kvm_run-system_event.type); and pr_warning here? (be sure to drop the '\n's). Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 0/4] kvmtool: ARM/ARM64: Misc updates
On Wed, Oct 01, 2014 at 11:34:51AM +0100, Anup Patel wrote: This patchset updates KVMTOOL to use some of the features supported by Linux-3.16 KVM ARM/ARM64, such as: 1. Target CPU == Host using KVM_ARM_PREFERRED_TARGET vm ioctl 2. Target CPU type Potenza for using KVMTOOL on X-Gene 3. PSCI v0.2 support for Aarch32 and Aarch64 guest 4. System event exit reason Changes since v4: - Avoid using magic '0' target for kvm arm generic target - Added comment for why we need Potenza target in KVMTOOL Please can you send a v5 addressing my minor comment and adding Andre's reviewed-by tags? Thanks, Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm-vfio: do not use module_init
On Wed, Sep 24, 2014 at 12:29:09PM +0100, Paolo Bonzini wrote: /me got confused between the kernel and QEMU. In the kernel, you can only have one module_init function, and it will prevent unloading the module unless you also have the corresponding module_exit function. Happy for you to take the blame, but I think this one's my fault! So, commit 80ce1639727e (KVM: VFIO: register kvm_device_ops dynamically, 2014-09-02) broke unloading of the kvm module, by adding a module_init function and no module_exit. I forget kvm builds as a module for other architectures (ie. not arm/arm64). Repair it by making kvm_vfio_ops_init weak, and checking it in kvm_init. Hehe, if only there was a kconfig option for kvm-vfio.c... Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm-vfio: do not use module_init
On Wed, Sep 24, 2014 at 12:45:53PM +0100, Paolo Bonzini wrote: Il 24/09/2014 13:44, Will Deacon ha scritto: On Wed, Sep 24, 2014 at 12:29:09PM +0100, Paolo Bonzini wrote: /me got confused between the kernel and QEMU. In the kernel, you can only have one module_init function, and it will prevent unloading the module unless you also have the corresponding module_exit function. Happy for you to take the blame, but I think this one's my fault! That's why you were CCed! ;) So, commit 80ce1639727e (KVM: VFIO: register kvm_device_ops dynamically, 2014-09-02) broke unloading of the kvm module, by adding a module_init function and no module_exit. I forget kvm builds as a module for other architectures (ie. not arm/arm64). Repair it by making kvm_vfio_ops_init weak, and checking it in kvm_init. Hehe, if only there was a kconfig option for kvm-vfio.c... Yeah, I was tempted to put it back. What do you think? I think it's nicer than the __weak symbol and I don't see a downside to having the option (the reason for removing it was lack of users, which lasted a couple of hours). Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvm-vfio: do not use module_init
On Wed, Sep 24, 2014 at 12:56:20PM +0100, Paolo Bonzini wrote: /me got confused between the kernel and QEMU. In the kernel, you can only have one module_init function, and it will prevent unloading the module unless you also have the corresponding module_exit function. So, commit 80ce1639727e (KVM: VFIO: register kvm_device_ops dynamically, 2014-09-02) broke unloading of the kvm module, by adding a module_init function and no module_exit. Repair it by making kvm_vfio_ops_init weak, and checking it in kvm_init. Cc: Will Deacon will.dea...@arm.com Cc: Gleb Natapov g...@kernel.org Cc: Alex Williamson alex.william...@redhat.com Fixes: 80ce1639727e9d38729c34f162378508c307ca25 Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- virt/kvm/kvm_main.c | 9 + virt/kvm/vfio.c | 4 ++-- virt/kvm/vfio.h | 6 ++ 3 files changed, 17 insertions(+), 2 deletions(-) create mode 100644 virt/kvm/vfio.h diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index db57363cc287..ad3439835adb 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -58,6 +58,10 @@ #include coalesced_mmio.h #include async_pf.h +#ifdef CONFIG_KVM_VFIO +#include vfio.h +#endif Can't you include this unconditionally, then have the #ifdef in the header and a static inline kvm_vfio_ops_init that returns 0? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] kvm-vfio: do not use module_init
On Wed, Sep 24, 2014 at 01:09:28PM +0100, Paolo Bonzini wrote: /me got confused between the kernel and QEMU. In the kernel, you can only have one module_init function, and it will prevent unloading the module unless you also have the corresponding module_exit function. So, commit 80ce1639727e (KVM: VFIO: register kvm_device_ops dynamically, 2014-09-02) broke unloading of the kvm module, by adding a module_init function and no module_exit. Repair it by making kvm_vfio_ops_init weak, and checking it in kvm_init. Cc: Will Deacon will.dea...@arm.com Cc: Gleb Natapov g...@kernel.org Cc: Alex Williamson alex.william...@redhat.com Fixes: 80ce1639727e9d38729c34f162378508c307ca25 Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- virt/kvm/kvm_main.c | 4 virt/kvm/vfio.c | 4 ++-- virt/kvm/vfio.h | 13 + 3 files changed, 19 insertions(+), 2 deletions(-) create mode 100644 virt/kvm/vfio.h Looks good to me, thanks for fixing this: Acked-by: Will Deacon will.dea...@arm.com Will diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index db57363cc287..499db0977f3c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -57,6 +57,7 @@ #include coalesced_mmio.h #include async_pf.h +#include vfio.h #define CREATE_TRACE_POINTS #include trace/events/kvm.h @@ -3226,6 +3227,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, goto out_undebugfs; } + r = kvm_vfio_ops_init(); + WARN_ON(r); + return 0; out_undebugfs: diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index bb11b36ee8a2..281e7cf2b8e5 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -18,6 +18,7 @@ #include linux/slab.h #include linux/uaccess.h #include linux/vfio.h +#include vfio.h struct kvm_vfio_group { struct list_head node; @@ -278,8 +279,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type) return 0; } -static int __init kvm_vfio_ops_init(void) +int kvm_vfio_ops_init(void) { return kvm_register_device_ops(kvm_vfio_ops, KVM_DEV_TYPE_VFIO); } -module_init(kvm_vfio_ops_init); diff --git a/virt/kvm/vfio.h b/virt/kvm/vfio.h new file mode 100644 index ..92eac75d6b62 --- /dev/null +++ b/virt/kvm/vfio.h @@ -0,0 +1,13 @@ +#ifndef __KVM_VFIO_H +#define __KVM_VFIO_H + +#ifdef CONFIG_KVM_VFIO +int kvm_vfio_ops_init(void); +#else +static inline int kvm_vfio_ops_init(void) +{ + return 0; +} +#endif + +#endif -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 01/26] iommu/arm-smmu: change IOMMU_EXEC to IOMMU_NOEXEC
Hi Antonios, On Tue, Sep 23, 2014 at 03:46:00PM +0100, Antonios Motakis wrote: Exposing the XN flag of the SMMU driver as IOMMU_NOEXEC instead of IOMMU_EXEC makes it enforceable, since for IOMMUs that don't support the XN flag pages will always be executable. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/iommu/arm-smmu.c | 9 + include/linux/iommu.h| 2 +- 2 files changed, 6 insertions(+), 5 deletions(-) [...] diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 20f9a52..e1a644c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -27,7 +27,7 @@ #define IOMMU_READ (1 0) #define IOMMU_WRITE (1 1) #define IOMMU_CACHE (1 2) /* DMA cache coherency */ -#define IOMMU_EXEC (1 3) +#define IOMMU_NOEXEC (1 3) This hunk needs to be a separate patch merged by Joerg before I can take the arm-smmu part (which looks fine). Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/4] Make kvm_device_ops registration dynamic
Hi Paolo, On Tue, Sep 02, 2014 at 10:27:32AM +0100, Will Deacon wrote: Hi all, This is version 3 of the patches originally posted here: v1: http://www.spinics.net/lists/kvm-arm/msg10219.html v2: http://www.spinics.net/lists/kvm/msg105197.html Changes since v2 include: - Rebased onto 3.17-rc* (the vgic code changed a lot!) - Added relevant acks The mpic, flic and xics are still not ported over, as I don't want to risk breaking those devices (it's not clear at which point they need to be registered). Any further comments on this lot? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm tools: arm: remove register accessor macros now that they are in uapi
On Fri, Aug 29, 2014 at 02:00:24PM +0100, Will Deacon wrote: The kernel now exposes register accessor macros in the uapi/ headers for arm and arm64, so use those instead (and avoid the compile failure from the duplicate definitions). Signed-off-by: Will Deacon will.dea...@arm.com --- Pekka -- please take this as a fix, since merging the 3.16 sources has caused some breakage for ARM. Cheers! Ping? It would be great to get the build failure fixed on master. Cheers, Will tools/kvm/arm/aarch32/kvm-cpu.c | 15 +-- tools/kvm/arm/aarch64/kvm-cpu.c | 15 --- 2 files changed, 1 insertion(+), 29 deletions(-) diff --git a/tools/kvm/arm/aarch32/kvm-cpu.c b/tools/kvm/arm/aarch32/kvm-cpu.c index 464b473dc936..95fb1da5ba3d 100644 --- a/tools/kvm/arm/aarch32/kvm-cpu.c +++ b/tools/kvm/arm/aarch32/kvm-cpu.c @@ -7,25 +7,12 @@ #define ARM_CORE_REG(x) (KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE | \ KVM_REG_ARM_CORE_REG(x)) -#define ARM_CP15_REG_SHIFT_MASK(x,n) \ - (((x) KVM_REG_ARM_ ## n ## _SHIFT) KVM_REG_ARM_ ## n ## _MASK) - -#define __ARM_CP15_REG(op1,crn,crm,op2) \ - (KVM_REG_ARM | KVM_REG_SIZE_U32 | \ - (15 KVM_REG_ARM_COPROC_SHIFT) | \ - ARM_CP15_REG_SHIFT_MASK(op1, OPC1) | \ - ARM_CP15_REG_SHIFT_MASK(crn, 32_CRN) | \ - ARM_CP15_REG_SHIFT_MASK(crm, CRM) | \ - ARM_CP15_REG_SHIFT_MASK(op2, 32_OPC2)) - -#define ARM_CP15_REG(...)__ARM_CP15_REG(__VA_ARGS__) - unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu) { struct kvm_one_reg reg; u32 mpidr; - reg.id = ARM_CP15_REG(ARM_CPU_ID, ARM_CPU_ID_MPIDR); + reg.id = ARM_CP15_REG32(ARM_CPU_ID, ARM_CPU_ID_MPIDR); reg.addr = (u64)(unsigned long)mpidr; if (ioctl(vcpu-vcpu_fd, KVM_GET_ONE_REG, reg) 0) die(KVM_GET_ONE_REG failed (get_mpidr vcpu%ld, vcpu-cpu_id); diff --git a/tools/kvm/arm/aarch64/kvm-cpu.c b/tools/kvm/arm/aarch64/kvm-cpu.c index 71a2a3a7789d..1b293748efd6 100644 --- a/tools/kvm/arm/aarch64/kvm-cpu.c +++ b/tools/kvm/arm/aarch64/kvm-cpu.c @@ -15,21 +15,6 @@ #define ARM64_CORE_REG(x)(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \ KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x)) -#define ARM64_SYS_REG_SHIFT_MASK(x,n)\ - (((x) KVM_REG_ARM64_SYSREG_ ## n ## _SHIFT) \ - KVM_REG_ARM64_SYSREG_ ## n ## _MASK) - -#define __ARM64_SYS_REG(op0,op1,crn,crm,op2) \ - (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \ - KVM_REG_ARM64_SYSREG | \ - ARM64_SYS_REG_SHIFT_MASK(op0, OP0) | \ - ARM64_SYS_REG_SHIFT_MASK(op1, OP1) | \ - ARM64_SYS_REG_SHIFT_MASK(crn, CRN) | \ - ARM64_SYS_REG_SHIFT_MASK(crm, CRM) | \ - ARM64_SYS_REG_SHIFT_MASK(op2, OP2)) - -#define ARM64_SYS_REG(...) __ARM64_SYS_REG(__VA_ARGS__) - unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu) { struct kvm_one_reg reg; -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/4] Make kvm_device_ops registration dynamic
Hi all, This is version 3 of the patches originally posted here: v1: http://www.spinics.net/lists/kvm-arm/msg10219.html v2: http://www.spinics.net/lists/kvm/msg105197.html Changes since v2 include: - Rebased onto 3.17-rc* (the vgic code changed a lot!) - Added relevant acks The mpic, flic and xics are still not ported over, as I don't want to risk breaking those devices (it's not clear at which point they need to be registered). Thanks, Will ---8 Cornelia Huck (1): KVM: s390: register flic ops dynamically Will Deacon (3): KVM: device: add simple registration mechanism for kvm_device_ops KVM: ARM: vgic: register kvm_device_ops dynamically KVM: VFIO: register kvm_device_ops dynamically arch/s390/kvm/kvm-s390.c | 3 +- arch/s390/kvm/kvm-s390.h | 1 + include/linux/kvm_host.h | 4 +- include/uapi/linux/kvm.h | 22 +-- virt/kvm/arm/vgic.c | 157 --- virt/kvm/kvm_main.c | 57 + virt/kvm/vfio.c | 22 --- 7 files changed, 142 insertions(+), 124 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 3/4] KVM: s390: register flic ops dynamically
From: Cornelia Huck cornelia.h...@de.ibm.com Using the new kvm_register_device_ops() interface makes us get rid of an #ifdef in common code. Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com Signed-off-by: Will Deacon will.dea...@arm.com --- arch/s390/kvm/kvm-s390.c | 3 ++- arch/s390/kvm/kvm-s390.h | 1 + include/linux/kvm_host.h | 1 - virt/kvm/kvm_main.c | 4 4 files changed, 3 insertions(+), 6 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index ce81eb2ab76a..63793641393c 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -130,7 +130,8 @@ void kvm_arch_check_processor_compat(void *rtn) int kvm_arch_init(void *opaque) { - return 0; + /* Register floating interrupt controller interface. */ + return kvm_register_device_ops(kvm_flic_ops, KVM_DEV_TYPE_FLIC); } void kvm_arch_exit(void) diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index 3862fa2cefe0..04e90538c37e 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -228,6 +228,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); int psw_extint_disabled(struct kvm_vcpu *vcpu); void kvm_s390_destroy_adapters(struct kvm *kvm); int kvm_s390_si_ext_call_pending(struct kvm_vcpu *vcpu); +extern struct kvm_device_ops kvm_flic_ops; /* implemented in guestdbg.c */ void kvm_s390_backup_guest_per_regs(struct kvm_vcpu *vcpu); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index e9a069ddcb2b..73f796ad0d0b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1073,7 +1073,6 @@ int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type); extern struct kvm_device_ops kvm_mpic_ops; extern struct kvm_device_ops kvm_xics_ops; extern struct kvm_device_ops kvm_vfio_ops; -extern struct kvm_device_ops kvm_flic_ops; #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6aaa44c1506e..78b25a44db53 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2272,10 +2272,6 @@ static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = { #ifdef CONFIG_KVM_VFIO [KVM_DEV_TYPE_VFIO] = kvm_vfio_ops, #endif - -#ifdef CONFIG_S390 - [KVM_DEV_TYPE_FLIC] = kvm_flic_ops, -#endif }; int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/4] KVM: device: add simple registration mechanism for kvm_device_ops
kvm_ioctl_create_device currently has knowledge of all the device types and their associated ops. This is fairly inflexible when adding support for new in-kernel device emulations, so move what we currently have out into a table, which can support dynamic registration of ops by new drivers for virtual hardware. Cc: Alex Williamson alex.william...@redhat.com Cc: Alex Graf ag...@suse.de Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Marc Zyngier marc.zyng...@arm.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org Signed-off-by: Will Deacon will.dea...@arm.com --- include/linux/kvm_host.h | 1 + include/uapi/linux/kvm.h | 22 +++- virt/kvm/kvm_main.c | 65 3 files changed, 55 insertions(+), 33 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a4c33b34fe3f..ca105f919f0b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1068,6 +1068,7 @@ struct kvm_device_ops { void kvm_device_get(struct kvm_device *dev); void kvm_device_put(struct kvm_device *dev); struct kvm_device *kvm_device_from_filp(struct file *filp); +int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type); extern struct kvm_device_ops kvm_mpic_ops; extern struct kvm_device_ops kvm_xics_ops; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index cf3a2ff440e4..5541024dd5d0 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -947,15 +947,25 @@ struct kvm_device_attr { __u64 addr; /* userspace address of attr data */ }; -#define KVM_DEV_TYPE_FSL_MPIC_20 1 -#define KVM_DEV_TYPE_FSL_MPIC_42 2 -#define KVM_DEV_TYPE_XICS 3 -#define KVM_DEV_TYPE_VFIO 4 #define KVM_DEV_VFIO_GROUP1 #define KVM_DEV_VFIO_GROUP_ADD 1 #define KVM_DEV_VFIO_GROUP_DEL 2 -#define KVM_DEV_TYPE_ARM_VGIC_V2 5 -#define KVM_DEV_TYPE_FLIC 6 + +enum kvm_device_type { + KVM_DEV_TYPE_FSL_MPIC_20= 1, +#define KVM_DEV_TYPE_FSL_MPIC_20 KVM_DEV_TYPE_FSL_MPIC_20 + KVM_DEV_TYPE_FSL_MPIC_42, +#define KVM_DEV_TYPE_FSL_MPIC_42 KVM_DEV_TYPE_FSL_MPIC_42 + KVM_DEV_TYPE_XICS, +#define KVM_DEV_TYPE_XICS KVM_DEV_TYPE_XICS + KVM_DEV_TYPE_VFIO, +#define KVM_DEV_TYPE_VFIO KVM_DEV_TYPE_VFIO + KVM_DEV_TYPE_ARM_VGIC_V2, +#define KVM_DEV_TYPE_ARM_VGIC_V2 KVM_DEV_TYPE_ARM_VGIC_V2 + KVM_DEV_TYPE_FLIC, +#define KVM_DEV_TYPE_FLIC KVM_DEV_TYPE_FLIC + KVM_DEV_TYPE_MAX, +}; /* * ioctls for VM fds diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 33712fb26eb1..81de6768d2c2 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2259,44 +2259,55 @@ struct kvm_device *kvm_device_from_filp(struct file *filp) return filp-private_data; } -static int kvm_ioctl_create_device(struct kvm *kvm, - struct kvm_create_device *cd) -{ - struct kvm_device_ops *ops = NULL; - struct kvm_device *dev; - bool test = cd-flags KVM_CREATE_DEVICE_TEST; - int ret; - - switch (cd-type) { +static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = { #ifdef CONFIG_KVM_MPIC - case KVM_DEV_TYPE_FSL_MPIC_20: - case KVM_DEV_TYPE_FSL_MPIC_42: - ops = kvm_mpic_ops; - break; + [KVM_DEV_TYPE_FSL_MPIC_20] = kvm_mpic_ops, + [KVM_DEV_TYPE_FSL_MPIC_42] = kvm_mpic_ops, #endif + #ifdef CONFIG_KVM_XICS - case KVM_DEV_TYPE_XICS: - ops = kvm_xics_ops; - break; + [KVM_DEV_TYPE_XICS] = kvm_xics_ops, #endif + #ifdef CONFIG_KVM_VFIO - case KVM_DEV_TYPE_VFIO: - ops = kvm_vfio_ops; - break; + [KVM_DEV_TYPE_VFIO] = kvm_vfio_ops, #endif + #ifdef CONFIG_KVM_ARM_VGIC - case KVM_DEV_TYPE_ARM_VGIC_V2: - ops = kvm_arm_vgic_v2_ops; - break; + [KVM_DEV_TYPE_ARM_VGIC_V2] = kvm_arm_vgic_v2_ops, #endif + #ifdef CONFIG_S390 - case KVM_DEV_TYPE_FLIC: - ops = kvm_flic_ops; - break; + [KVM_DEV_TYPE_FLIC] = kvm_flic_ops, #endif - default: +}; + +int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type) +{ + if (type = ARRAY_SIZE(kvm_device_ops_table)) + return -ENOSPC; + + if (kvm_device_ops_table[type] != NULL) + return -EEXIST; + + kvm_device_ops_table[type] = ops; + return 0; +} + +static int kvm_ioctl_create_device(struct kvm *kvm, + struct kvm_create_device *cd) +{ + struct kvm_device_ops *ops = NULL; + struct kvm_device *dev; + bool test = cd-flags KVM_CREATE_DEVICE_TEST
[PATCH v3 2/4] KVM: ARM: vgic: register kvm_device_ops dynamically
Now that we have a dynamic means to register kvm_device_ops, use that for the ARM VGIC, instead of relying on the static table. Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Acked-by: Marc Zyngier marc.zyng...@arm.com Reviewed-by: Christoffer Dall christoffer.d...@linaro.org Signed-off-by: Will Deacon will.dea...@arm.com --- include/linux/kvm_host.h | 1 - virt/kvm/arm/vgic.c | 157 --- virt/kvm/kvm_main.c | 4 -- 3 files changed, 79 insertions(+), 83 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ca105f919f0b..e9a069ddcb2b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1073,7 +1073,6 @@ int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type); extern struct kvm_device_ops kvm_mpic_ops; extern struct kvm_device_ops kvm_xics_ops; extern struct kvm_device_ops kvm_vfio_ops; -extern struct kvm_device_ops kvm_arm_vgic_v2_ops; extern struct kvm_device_ops kvm_flic_ops; #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 73eba793b17f..3ee3ce06bbec 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1522,83 +1522,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) return 0; } -static void vgic_init_maintenance_interrupt(void *info) -{ - enable_percpu_irq(vgic-maint_irq, 0); -} - -static int vgic_cpu_notify(struct notifier_block *self, - unsigned long action, void *cpu) -{ - switch (action) { - case CPU_STARTING: - case CPU_STARTING_FROZEN: - vgic_init_maintenance_interrupt(NULL); - break; - case CPU_DYING: - case CPU_DYING_FROZEN: - disable_percpu_irq(vgic-maint_irq); - break; - } - - return NOTIFY_OK; -} - -static struct notifier_block vgic_cpu_nb = { - .notifier_call = vgic_cpu_notify, -}; - -static const struct of_device_id vgic_ids[] = { - { .compatible = arm,cortex-a15-gic, .data = vgic_v2_probe, }, - { .compatible = arm,gic-v3, .data = vgic_v3_probe, }, - {}, -}; - -int kvm_vgic_hyp_init(void) -{ - const struct of_device_id *matched_id; - int (*vgic_probe)(struct device_node *,const struct vgic_ops **, - const struct vgic_params **); - struct device_node *vgic_node; - int ret; - - vgic_node = of_find_matching_node_and_match(NULL, - vgic_ids, matched_id); - if (!vgic_node) { - kvm_err(error: no compatible GIC node found\n); - return -ENODEV; - } - - vgic_probe = matched_id-data; - ret = vgic_probe(vgic_node, vgic_ops, vgic); - if (ret) - return ret; - - ret = request_percpu_irq(vgic-maint_irq, vgic_maintenance_handler, -vgic, kvm_get_running_vcpus()); - if (ret) { - kvm_err(Cannot register interrupt %d\n, vgic-maint_irq); - return ret; - } - - ret = __register_cpu_notifier(vgic_cpu_nb); - if (ret) { - kvm_err(Cannot register vgic CPU notifier\n); - goto out_free_irq; - } - - /* Callback into for arch code for setup */ - vgic_arch_setup(vgic); - - on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1); - - return 0; - -out_free_irq: - free_percpu_irq(vgic-maint_irq, kvm_get_running_vcpus()); - return ret; -} - /** * kvm_vgic_init - Initialize global VGIC state before running any VCPUs * @kvm: pointer to the kvm struct @@ -2062,7 +1985,7 @@ static int vgic_create(struct kvm_device *dev, u32 type) return kvm_vgic_create(dev-kvm); } -struct kvm_device_ops kvm_arm_vgic_v2_ops = { +static struct kvm_device_ops kvm_arm_vgic_v2_ops = { .name = kvm-arm-vgic, .create = vgic_create, .destroy = vgic_destroy, @@ -2070,3 +1993,81 @@ struct kvm_device_ops kvm_arm_vgic_v2_ops = { .get_attr = vgic_get_attr, .has_attr = vgic_has_attr, }; + +static void vgic_init_maintenance_interrupt(void *info) +{ + enable_percpu_irq(vgic-maint_irq, 0); +} + +static int vgic_cpu_notify(struct notifier_block *self, + unsigned long action, void *cpu) +{ + switch (action) { + case CPU_STARTING: + case CPU_STARTING_FROZEN: + vgic_init_maintenance_interrupt(NULL); + break; + case CPU_DYING: + case CPU_DYING_FROZEN: + disable_percpu_irq(vgic-maint_irq); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block vgic_cpu_nb = { + .notifier_call = vgic_cpu_notify, +}; + +static const struct of_device_id vgic_ids[] = { + { .compatible = arm,cortex-a15-gic, .data = vgic_v2_probe, }, + { .compatible = arm,gic-v3, .data
[PATCH v3 4/4] KVM: VFIO: register kvm_device_ops dynamically
Now that we have a dynamic means to register kvm_device_ops, use that for the VFIO kvm device, instead of relying on the static table. This is achieved by a module_init call to register the ops with KVM. Cc: Gleb Natapov g...@kernel.org Cc: Paolo Bonzini pbonz...@redhat.com Acked-by: Alex Williamson alex.william...@redhat.com Signed-off-by: Will Deacon will.dea...@arm.com --- include/linux/kvm_host.h | 1 - virt/kvm/kvm_main.c | 4 virt/kvm/vfio.c | 22 +++--- 3 files changed, 15 insertions(+), 12 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 73f796ad0d0b..55a1618a19dc 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1072,7 +1072,6 @@ int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type); extern struct kvm_device_ops kvm_mpic_ops; extern struct kvm_device_ops kvm_xics_ops; -extern struct kvm_device_ops kvm_vfio_ops; #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 78b25a44db53..e9a703eca8c1 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2268,10 +2268,6 @@ static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = { #ifdef CONFIG_KVM_XICS [KVM_DEV_TYPE_XICS] = kvm_xics_ops, #endif - -#ifdef CONFIG_KVM_VFIO - [KVM_DEV_TYPE_VFIO] = kvm_vfio_ops, -#endif }; int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type) diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c index ba1a93f935c7..bb11b36ee8a2 100644 --- a/virt/kvm/vfio.c +++ b/virt/kvm/vfio.c @@ -246,6 +246,16 @@ static void kvm_vfio_destroy(struct kvm_device *dev) kfree(dev); /* alloc by kvm_ioctl_create_device, free by .destroy */ } +static int kvm_vfio_create(struct kvm_device *dev, u32 type); + +static struct kvm_device_ops kvm_vfio_ops = { + .name = kvm-vfio, + .create = kvm_vfio_create, + .destroy = kvm_vfio_destroy, + .set_attr = kvm_vfio_set_attr, + .has_attr = kvm_vfio_has_attr, +}; + static int kvm_vfio_create(struct kvm_device *dev, u32 type) { struct kvm_device *tmp; @@ -268,10 +278,8 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type) return 0; } -struct kvm_device_ops kvm_vfio_ops = { - .name = kvm-vfio, - .create = kvm_vfio_create, - .destroy = kvm_vfio_destroy, - .set_attr = kvm_vfio_set_attr, - .has_attr = kvm_vfio_has_attr, -}; +static int __init kvm_vfio_ops_init(void) +{ + return kvm_register_device_ops(kvm_vfio_ops, KVM_DEV_TYPE_VFIO); +} +module_init(kvm_vfio_ops_init); -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm tools: arm: remove register accessor macros now that they are in uapi
The kernel now exposes register accessor macros in the uapi/ headers for arm and arm64, so use those instead (and avoid the compile failure from the duplicate definitions). Signed-off-by: Will Deacon will.dea...@arm.com --- Pekka -- please take this as a fix, since merging the 3.16 sources has caused some breakage for ARM. Cheers! tools/kvm/arm/aarch32/kvm-cpu.c | 15 +-- tools/kvm/arm/aarch64/kvm-cpu.c | 15 --- 2 files changed, 1 insertion(+), 29 deletions(-) diff --git a/tools/kvm/arm/aarch32/kvm-cpu.c b/tools/kvm/arm/aarch32/kvm-cpu.c index 464b473dc936..95fb1da5ba3d 100644 --- a/tools/kvm/arm/aarch32/kvm-cpu.c +++ b/tools/kvm/arm/aarch32/kvm-cpu.c @@ -7,25 +7,12 @@ #define ARM_CORE_REG(x)(KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_CORE | \ KVM_REG_ARM_CORE_REG(x)) -#define ARM_CP15_REG_SHIFT_MASK(x,n) \ - (((x) KVM_REG_ARM_ ## n ## _SHIFT) KVM_REG_ARM_ ## n ## _MASK) - -#define __ARM_CP15_REG(op1,crn,crm,op2)\ - (KVM_REG_ARM | KVM_REG_SIZE_U32 | \ -(15 KVM_REG_ARM_COPROC_SHIFT) | \ -ARM_CP15_REG_SHIFT_MASK(op1, OPC1) | \ -ARM_CP15_REG_SHIFT_MASK(crn, 32_CRN) | \ -ARM_CP15_REG_SHIFT_MASK(crm, CRM) | \ -ARM_CP15_REG_SHIFT_MASK(op2, 32_OPC2)) - -#define ARM_CP15_REG(...) __ARM_CP15_REG(__VA_ARGS__) - unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu) { struct kvm_one_reg reg; u32 mpidr; - reg.id = ARM_CP15_REG(ARM_CPU_ID, ARM_CPU_ID_MPIDR); + reg.id = ARM_CP15_REG32(ARM_CPU_ID, ARM_CPU_ID_MPIDR); reg.addr = (u64)(unsigned long)mpidr; if (ioctl(vcpu-vcpu_fd, KVM_GET_ONE_REG, reg) 0) die(KVM_GET_ONE_REG failed (get_mpidr vcpu%ld, vcpu-cpu_id); diff --git a/tools/kvm/arm/aarch64/kvm-cpu.c b/tools/kvm/arm/aarch64/kvm-cpu.c index 71a2a3a7789d..1b293748efd6 100644 --- a/tools/kvm/arm/aarch64/kvm-cpu.c +++ b/tools/kvm/arm/aarch64/kvm-cpu.c @@ -15,21 +15,6 @@ #define ARM64_CORE_REG(x) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \ KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x)) -#define ARM64_SYS_REG_SHIFT_MASK(x,n) \ - (((x) KVM_REG_ARM64_SYSREG_ ## n ## _SHIFT) \ -KVM_REG_ARM64_SYSREG_ ## n ## _MASK) - -#define __ARM64_SYS_REG(op0,op1,crn,crm,op2) \ - (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \ -KVM_REG_ARM64_SYSREG | \ -ARM64_SYS_REG_SHIFT_MASK(op0, OP0) | \ -ARM64_SYS_REG_SHIFT_MASK(op1, OP1) | \ -ARM64_SYS_REG_SHIFT_MASK(crn, CRN) | \ -ARM64_SYS_REG_SHIFT_MASK(crm, CRM) | \ -ARM64_SYS_REG_SHIFT_MASK(op2, OP2)) - -#define ARM64_SYS_REG(...) __ARM64_SYS_REG(__VA_ARGS__) - unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu) { struct kvm_one_reg reg; -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/4] kvmtool: ARM: Use KVM_ARM_PREFERRED_TARGET vm ioctl to determine target cpu
On Fri, Aug 29, 2014 at 10:10:52AM +0100, Andre Przywara wrote: (resent, that was the wrong account before ...) Aha, and now your true identity has been revealed to all! Nice try Andre... or should I say, Rienhard? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] kvmtool: ARM64: Fix compile error for aarch64
On Thu, Aug 28, 2014 at 10:56:29AM +0100, Pekka Enberg wrote: On 08/07/2014 12:12 PM, Will Deacon wrote: Ok. Pekka, could you merge in 3.16 to the kvmtool master branch please? You'll need my patch below to resolve some ARM build fallout. Done. Thanks, Pekka! We'll give it a spin. Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 4/5] vfio: amba: VFIO support for AMBA devices
Hi Antonios, On Fri, Aug 22, 2014 at 10:01:27AM +0100, Antonios Motakis wrote: Add support for discovering AMBA devices with VFIO and handle them similarly to Linux platform devices. [...] +static struct amba_id pl330_ids[] = { + { 0, 0 }, +}; + +MODULE_DEVICE_TABLE(amba, pl330_ids); + +static struct amba_driver vfio_amba_driver = { + .probe = vfio_amba_probe, + .remove = vfio_amba_remove, + .id_table = pl330_ids, + .drv = { + .name = vfio-amba, + .owner = THIS_MODULE, + }, +}; I don't understand what you're doing with the IDs here. What's the point in the empty list? This also raises a larger question about whether or not it's safe to allow device passthrough of arbitrary platform devices with VFIO. In the absence of a bus/device standard like PCI, I really think this should be in opt-in decision, where certain platform drivers can declare that their device can be safely used with passthrough. Thoughts? Will -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html