[PATCH 0/2] kvmtool: replace documentation stubs with manpage

2015-12-22 Thread Andre Przywara
Hi,

I got annoyed with the availability and quality of the
documentation and have always wanted to write a manpage, so I took this
first step and replaced the stub text files in the Documentation
directory with a manpage.
This is clearly only the beginning: there is more functionality which
is currently not documented at all (networking comes to mind).

Cheers,
Andre.

Andre Przywara (2):
  Add a rudimentary manpage
  Documentation: remove documentation stubs and common-cmds.h generation

 .gitignore                    |   1 -
 Documentation/kvm-balloon.txt |  24 -
 Documentation/kvm-debug.txt   |  16 ---
 Documentation/kvm-list.txt    |  16 ---
 Documentation/kvm-pause.txt   |  16 ---
 Documentation/kvm-resume.txt  |  16 ---
 Documentation/kvm-run.txt     |  62
 Documentation/kvm-sandbox.txt |  16 ---
 Documentation/kvm-setup.txt   |  15 ---
 Documentation/kvm-stat.txt    |  19
 Documentation/kvm-stop.txt    |  16 ---
 Documentation/kvm-version.txt |  21
 Documentation/kvmtool.1       | 222 ++
 Makefile                      |  10 --
 command-list.txt              |  15 ---
 include/common-cmds.h         |  19
 16 files changed, 241 insertions(+), 263 deletions(-)
 delete mode 100644 Documentation/kvm-balloon.txt
 delete mode 100644 Documentation/kvm-debug.txt
 delete mode 100644 Documentation/kvm-list.txt
 delete mode 100644 Documentation/kvm-pause.txt
 delete mode 100644 Documentation/kvm-resume.txt
 delete mode 100644 Documentation/kvm-run.txt
 delete mode 100644 Documentation/kvm-sandbox.txt
 delete mode 100644 Documentation/kvm-setup.txt
 delete mode 100644 Documentation/kvm-stat.txt
 delete mode 100644 Documentation/kvm-stop.txt
 delete mode 100644 Documentation/kvm-version.txt
 create mode 100644 Documentation/kvmtool.1
 delete mode 100644 command-list.txt
 create mode 100644 include/common-cmds.h

-- 
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Documentation: remove documentation stubs and common-cmds.h generation

2015-12-22 Thread Andre Przywara
Now that we have a manpage in place, we can get rid of the
manpage-style text files in the Documentation directory.
This also allows us to get rid of the crude common-cmds.h generation,
which relied on these files and on a command-list.txt file.
Instead, include the version of that header file generated with the
current HEAD in the source tree.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 .gitignore                    |  1 -
 Documentation/kvm-balloon.txt | 24 -
 Documentation/kvm-debug.txt   | 16 ---
 Documentation/kvm-list.txt    | 16 ---
 Documentation/kvm-pause.txt   | 16 ---
 Documentation/kvm-resume.txt  | 16 ---
 Documentation/kvm-run.txt     | 62 ---
 Documentation/kvm-sandbox.txt | 16 ---
 Documentation/kvm-setup.txt   | 15 ---
 Documentation/kvm-stat.txt    | 19 -
 Documentation/kvm-stop.txt    | 16 ---
 Documentation/kvm-version.txt | 21 ---
 Makefile                      | 10 ---
 command-list.txt              | 15 ---
 include/common-cmds.h         | 19 +
 15 files changed, 19 insertions(+), 263 deletions(-)
 delete mode 100644 Documentation/kvm-balloon.txt
 delete mode 100644 Documentation/kvm-debug.txt
 delete mode 100644 Documentation/kvm-list.txt
 delete mode 100644 Documentation/kvm-pause.txt
 delete mode 100644 Documentation/kvm-resume.txt
 delete mode 100644 Documentation/kvm-run.txt
 delete mode 100644 Documentation/kvm-sandbox.txt
 delete mode 100644 Documentation/kvm-setup.txt
 delete mode 100644 Documentation/kvm-stat.txt
 delete mode 100644 Documentation/kvm-stop.txt
 delete mode 100644 Documentation/kvm-version.txt
 delete mode 100644 command-list.txt
 create mode 100644 include/common-cmds.h

diff --git a/.gitignore b/.gitignore
index a16a97f..f21a0bd 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,7 +6,6 @@
 *.swp
 .cscope
 tags
-include/common-cmds.h
 tests/boot/boot_test.iso
 tests/boot/rootfs/
 guest/init
diff --git a/Documentation/kvm-balloon.txt b/Documentation/kvm-balloon.txt
deleted file mode 100644
index efc0a87..0000000
--- a/Documentation/kvm-balloon.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-lkvm-balloon(1)
-
-
-NAME
-
-lkvm-balloon - Inflate or deflate the virtio balloon
-
-SYNOPSIS
-
-[verse]
-'lkvm balloon [command] [size] [instance]'
-
-DESCRIPTION

-The command inflates or deflates the virtio balloon located in the
-specified instance.
-For a list of running instances see 'lkvm list'.
-
-Command can be either 'inflate' or 'deflate'. Inflate increases the
-size of the balloon, thus decreasing the amount of virtual RAM available
-for the guest. Deflation returns previously inflated memory back to the
-guest.
-
-size is specified in Mb.
diff --git a/Documentation/kvm-debug.txt b/Documentation/kvm-debug.txt
deleted file mode 100644
index a8eb2c0..0000000
--- a/Documentation/kvm-debug.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-lkvm-debug(1)
-
-
-NAME
-
-lkvm-debug - Print debug information from a running instance
-
-SYNOPSIS
-
-[verse]
-'lkvm debug [instance]'
-
-DESCRIPTION

-The command prints debug information from a running instance.
-For a list of running instances see 'lkvm list'.
diff --git a/Documentation/kvm-list.txt b/Documentation/kvm-list.txt
deleted file mode 100644
index a245607..0000000
--- a/Documentation/kvm-list.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-lkvm-list(1)
-
-
-NAME
-
-lkvm-list - Print a list of running instances on the host.
-
-SYNOPSIS
-
-[verse]
-'lkvm list'
-
-DESCRIPTION

-This command prints a list of running instances on the host which
-belong to the user who currently ran 'lkvm list'.
diff --git a/Documentation/kvm-pause.txt b/Documentation/kvm-pause.txt
deleted file mode 100644
index 1ea2a23..0000000
--- a/Documentation/kvm-pause.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-lkvm-pause(1)
-
-
-NAME
-
-lkvm-pause - Pause the virtual machine
-
-SYNOPSIS
-
-[verse]
-'lkvm pause [instance]'
-
-DESCRIPTION

-The command pauses a virtual machine.
-For a list of running instances see 'lkvm list'.
diff --git a/Documentation/kvm-resume.txt b/Documentation/kvm-resume.txt
deleted file mode 100644
index a36c4df..0000000
--- a/Documentation/kvm-resume.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-lkvm-resume(1)
-
-
-NAME
-
-lkvm-resume - Resume the virtual machine
-
-SYNOPSIS
-
-[verse]
-'lkvm resume [instance]'
-
-DESCRIPTION

-The command resumes a virtual machine.
-For a list of running instances see 'lkvm list'.
diff --git a/Documentation/kvm-run.txt b/Documentation/kvm-run.txt
deleted file mode 100644
index 8ddf470..0000000
--- a/Documentation/kvm-run.txt
+++ /dev/null
@@ -1,62 +0,0 @@
-lkvm-run(1)
-
-
-NAME
-
-lkvm-run - Start the virtual machine
-
-SYNOPSIS
-
-[verse]
-'lk

[PATCH 1/2] Add a rudimentary manpage

2015-12-22 Thread Andre Przywara
The kvmtool documentation is somewhat lacking, and it is not easily
accessible while it lives in the source tree only.
Add a good ol' manpage to document at least the basic commands and
their options.
This level of documentation matches what is already there in
the Documentation directory and should be extended over time.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 Documentation/kvmtool.1 | 222 
 1 file changed, 222 insertions(+)
 create mode 100644 Documentation/kvmtool.1

diff --git a/Documentation/kvmtool.1 b/Documentation/kvmtool.1
new file mode 100644
index 0000000..aecb2dc
--- /dev/null
+++ b/Documentation/kvmtool.1
@@ -0,0 +1,222 @@
+.\" Manpage for kvmtool
+.\" Copyright (C) 2015 by Andre Przywara <andre.przyw...@arm.com>
+.TH kvmtool 1 "11 Nov 2015" "0.1" "kvmtool man page"
+.SH NAME
+kvmtool \- running KVM guests
+.SH SYNOPSIS
+lkvm COMMAND [ARGS]
+.SH DESCRIPTION
+kvmtool is a userland tool for creating and controlling KVM guests.
+.SH "KVMTOOL COMMANDS"
+.sp
+.PP
+.B run -k  ...
+.RS 4
+Run a guest.
+.sp
+.B \-k, \-\-kernel 
+.RS 4
+The virtual machine kernel.
+.RE
+.sp
+.B \-c, \-\-cpus 
+.RS 4
+The number of virtual CPUs to run.
+.RE
+.sp
+.B \-m, \-\-mem 
+.RS 4
+Virtual machine memory size in MiB.
+.RE
+.sp
+.B \-p, \-\-params 
+.RS 4
+Additional kernel command line arguments.
+.RE
+.sp
+.B \-i, \-\-initrd 
+.RS 4
+Initial RAM disk image.
+.RE
+.sp
+.B \-d, \-\-disk 
+.RS 4
+A disk image file or a rootfs directory.
+.RE
+.sp
+.B \-\-console serial|virtio|hv
+.RS 4
+Console to use.
+.RE
+.sp
+.B \-\-dev 
+.RS 4
+KVM device file (instead of the default /dev/kvm).
+.RE
+.sp
+.B \-\-debug
+.RS 4
+Enable debug messages.
+.RE
+.sp
+.B \-\-debug-single-step
+.RS 4
+Enable single stepping.
+.RE
+.sp
+.B \-\-debug-ioport
+.RS 4
+Enable ioport debugging.
+.RE
+.RE
+.PP
+.B setup 
+.RS 4
+Set up a new virtual machine. This creates a new rootfs in the .lkvm
+folder of your home directory.
+.RE
+.PP
+.B pause \-\-all|\-\-name 
+.RS 4
+Pause a virtual machine.
+.sp
+.B \-a, \-\-all
+.RS 4
+Pause all running instances.
+.RE
+.sp
+.B \-n, \-\-name 
+.RS 4
+Pause the specified instance. For a list of running instances, see \fI lkvm list\fR.
+.RE
+.RE
+.PP
+.B resume --all|--name 
+.RS 4
+Resume a previously paused virtual machine.
+.sp
+.B \-a, \-\-all
+.RS 4
+Resume all running instances.
+.RE
+.sp
+.B \-n, \-\-name 
+.RS 4
+Resume the specified instance. For a list of running instances, see \fI lkvm list\fR.
+.RE
+.RE
+.PP
+.B list [\-i] [\-r]
+.RS 4
+Print a list of running instances on the host. This is restricted to instances
+started by the current user, as it looks in the .lkvm folder in your home
+directory to find the socket files.
+.sp
+.B \-i, \-\-run
+.RS 4
+List all running instances.
+.RE
+.sp
+.B \-r, \-\-rootfs
+.RS 4
+List rootfs instances.
+.RE
+.RE
+.PP
+.B debug --all|--name  [--dump] [--nmi ] [--sysrq ]
+.RS 4
+Print debug information from a running VM instance.
+.sp
+.B \-a, \-\-all
+.RS 4
+Debug all running instances.
+.RE
+.PP
+.B \-n, \-\-name 
+.RS 4
+Debug the specified instance.
+.RE
+.sp
+.B \-d, \-\-dump
+.RS 4
+Generate a debug dump from the guest.
+.RE
+.PP
+.B \-m, \-\-nmi 
+.RS 4
+Generate an NMI on the specified virtual CPU.
+.RE
+.PP
+.B \-s, \-\-sysrq 
+.RS 4
+Inject a Linux sysrq into the guest.
+.RE
+.RE
+.PP
+.B balloon \-\-name  \-\-inflate|\-\-deflate 
+.RS 4
+This command inflates or deflates the virtio balloon located in the
+specified instance.
+\-\-inflate increases the size of the balloon, thus \fIdecreasing\fR the
+amount of virtual RAM available for the guest. \-\-deflate returns previously
+inflated memory back to the guest.
+.sp
+.B \-n, \-\-name 
+.RS 4
+Balloon the specified instance. For a list of all instances, see \fI"lkvm list"\fR.
+.RE
+.PP
+.B \-i, \-\-inflate 
+.RS 4
+Inflates the balloon by the specified number of Megabytes. This decreases the
+amount of usable memory in the guest.
+.RE
+.PP
+.B \-d, \-\-deflate 
+.RS 4
+Deflates the balloon by the specified number of Megabytes. This increases the
+amount of usable memory in the guest.
+.RE
+.RE
+.PP
+.B stop --all|--name 
+.RS 4
+Stop a running instance.
+.sp
+.B \-a, \-\-all
+.RS 4
+Stop all running instances.
+.RE
+.sp
+.B \-n, \-\-name 
+.RS 4
+Stop the specified instance. For a list of running instances, see \fI lkvm list\fR.
+.RE
+.RE
+.PP
+.B stat \-\-all|\-\-name  [\-m]
+.RS 4
+Print statistics about a running instance.
+.sp
+.B \-m, \-\-memory
+.RS 4
+Display memory statistics.
+.RE
+.RE
+.PP
+.B sandbox (\fIlkvm run arguments\fR) \-\- [sandboxed command]
+.RS 4
+Run a command in a sandboxed guest. Kvmtool will inject a special init
+binary which will do an initial setup of the guest Linux and then
+launch a shell script with the specified command. Upon this command ending,
+the guest will be shut down.
+.RE
+.SH EXAMPLES
+.RS 4
+\fB$\fR lkvm 

Re: [PATCH 0/7] kvmtool: Cleanup kernel loading

2015-11-18 Thread Andre Przywara
Hi Will,

On 02/11/15 14:58, Will Deacon wrote:
> On Fri, Oct 30, 2015 at 06:26:53PM +0000, Andre Przywara wrote:
>> Hi,
> 
> Hello Andre,
> 
>> this series cleans up kvmtool's kernel loading functionality a bit.
>> It has been broken out of a previous series I sent [1] and contains
>> just the cleanup and bug fix parts, which should be less controversial
>> and thus easier to merge ;-)
>> I will resend the pipe loading part later on as a separate series.
>>
>> The first patch properly abstracts kernel loading to move
>> responsibility into each architecture's code. It removes quite some
>> ugly code from the generic kvm.c file.
>> The later patches address the naive usage of read(2) to, well, read
>> data from files. Doing this without coping with the subtleties of
>> the UNIX read semantics (returning with less or none data read is not
>> an error) can provoke hard to debug failures.
>> So these patches make use of the existing and one new wrapper function
>> to make sure we read everything we actually wanted to.
>> The last patch moves the ARM kernel loading code into the proper
>> location to be in line with the other architectures.
>>
>> Please have a look and give some comments!
> 
> Looks good to me, but I'd like to see some comments from some mips/ppc/x86
> people on the changes you're making over there.

Sounds reasonable, but no answers yet.

Can you take at least patches 1 and 2 in the meantime, and preferably
also 6 and 7 (the ARM parts), if you are OK with that?
I have other patches that depend on 1/7 and 2/7, so having them upstream
would help me and reduce further dependency churn.
I am happy to resend the remaining patches for further discussion later.

Cheers,
Andre.


[PATCH] kvmtool: Makefile: remove LDFLAGS from guest_init linking

2015-11-08 Thread Andre Przywara
Looking back at the HEAD from a few commits ago, it's obvious that
using the LDFLAGS variable for linking the guest_init binary was
rather pointless, as it was zeroed in the beginning and then never
set.

As guest_init is a rather special binary that does not cope well with
arbitrary linker flags, let's restore the previous state by
removing the LDFLAGS variable from those linking steps. This allows
LDFLAGS to be used for linking the actual kvmtool binary only and
helps to re-merge commit d0e2772b93a ("Makefile: allow overriding
CFLAGS on the command line").

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Hi,

Riku: can you confirm that this patch together with the one that was
reverted (d0e2772b93a) still works in your environment?

Will, if that works for Riku and you acknowledge this patch, can you
also re-merge the reverted patch mentioned above? Turns out that the
revert also made my other patch useless ;-)

Cheers,
Andre.

 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 9138942..8095d59 100644
--- a/Makefile
+++ b/Makefile
@@ -385,13 +385,13 @@ ifneq ($(ARCH_PRE_INIT),)
 $(GUEST_PRE_INIT): $(ARCH_PRE_INIT)
$(E) "  LINK" $@
$(Q) $(CC) -s -nostdlib $(ARCH_PRE_INIT) -o $@
-   $(Q) $(LD) $(LDFLAGS) -r -b binary -o guest/guest_pre_init.o $(GUEST_PRE_INIT)
+   $(Q) $(LD) -r -b binary -o guest/guest_pre_init.o $(GUEST_PRE_INIT)
 endif
 
 $(GUEST_INIT): guest/init.c
$(E) "  LINK" $@
$(Q) $(CC) $(GUEST_INIT_FLAGS) guest/init.c -o $@
-   $(Q) $(LD) $(LDFLAGS) -r -b binary -o guest/guest_init.o $(GUEST_INIT)
+   $(Q) $(LD) -r -b binary -o guest/guest_init.o $(GUEST_INIT)
 
 %.s: %.c
$(Q) $(CC) -o $@ -S $(CFLAGS) -fverbose-asm $<
-- 
1.8.4



Re: [PATCH 1/2] Makefile: allow overriding CFLAGS on the command line

2015-11-04 Thread Andre Przywara
Hi Riku,

On 04/11/15 10:02, Riku Voipio wrote:
> On 30 October 2015 at 19:20, Andre Przywara <andre.przyw...@arm.com> wrote:
>> When a Makefile variable is set on the make command line, all
>> Makefile-internal assignments to that very variable are _ignored_.
>> Since we add quite some essential values to CFLAGS internally,
>> specifying some CFLAGS on the command line will usually break the
>> build (and not fix any include file problems you hoped to overcome
>> with that).
>> Somewhat against intuition GNU make provides the "override" directive
>> to change this behavior; with that assignments in the Makefile get
>> _appended_ to the value given on the command line. [1]
>>
>> Change any internal assignments to use that directive, so that a user
>> can use:
>> $ make CFLAGS=-I/path/to/my/include/dir
>> to teach kvmtool about non-standard header file locations (helpful
>> for cross-compilation) or to tweak other compiler options.
>>
>> Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
>>
>> [1] https://www.gnu.org/software/make/manual/html_node/Override-Directive.html
>> ---
>>  Makefile | 15 +++
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/Makefile b/Makefile
>> index f8f7cc4..77a7c9f 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -15,9 +15,7 @@ include config/utilities.mak
>>  include config/feature-tests.mak
>>
>>  CC := $(CROSS_COMPILE)gcc
>> -CFLAGS :=
>>  LD := $(CROSS_COMPILE)ld
>> -LDFLAGS:=
> 
> This breaks builds of debian packages as dpkg-buildpackage sets LDFLAGS
> to something unsuitable for guest init.
> 
> Looks like this has been an issue before:

> 
> commit 57fa349a9792a629e4ed2d89e1309cc96dcc39af
> Author: Will Deacon <will.dea...@arm.com>
> Date:   Thu Jun 4 16:25:36 2015 +0100
> 
> Don't inherit CFLAGS and LDFLAGS from the environment
> 
> kvmtool doesn't build with arbitrary flags, so let's clear CFLAGS and
> LDFLAGS by default at the top of the Makefile, allowing people to add
> additional options there if they really want to.
> 
> Reported by Dave Jones, who ended up passing -std=gnu99 by mistake.

Well, I fixed this issue later by making kvmtool compilation more
robust when using modern compiler standards.
That's why I wanted this kludge to go away.

> I think it's better to have EXTRA_CFLAGS and EXTRA_LDFLAGS like the kernel
> has.

Mmmh, I'd rather see guest_init creation use its own flags,
since it is so special and actually independent of the target userland.
Let me check this out and, while I am at it, send out the guest_init
Makefile fix I have lying around here.

What LDFLAGS are actually causing your issues?

Cheers,
Andre.

> 
>>  FIND   := find
>>  CSCOPE := cscope
>> @@ -162,7 +160,7 @@ ifeq ($(ARCH), arm)
>> OBJS+= arm/aarch32/kvm-cpu.o
>> ARCH_INCLUDE:= $(HDRS_ARM_COMMON)
>> ARCH_INCLUDE+= -Iarm/aarch32/include
>> -   CFLAGS  += -march=armv7-a
>> +   override CFLAGS += -march=armv7-a
>>
>> ARCH_WANT_LIBFDT := y
>>  endif
>> @@ -274,12 +272,12 @@ endif
>>  ifeq ($(LTO),1)
>> FLAGS_LTO := -flto
>> ifeq ($(call try-build,$(SOURCE_HELLO),$(CFLAGS),$(FLAGS_LTO)),y)
>> -   CFLAGS  += $(FLAGS_LTO)
>> +   override CFLAGS += $(FLAGS_LTO)
>> endif
>>  endif
>>
>>  ifeq ($(call try-build,$(SOURCE_STATIC),,-static),y)
>> -   CFLAGS  += -DCONFIG_GUEST_INIT
>> +   override CFLAGS += -DCONFIG_GUEST_INIT
>> GUEST_INIT  := guest/init
>> GUEST_OBJS  = guest/guest_init.o
>> ifeq ($(ARCH_PRE_INIT),)
>> @@ -331,7 +329,8 @@ DEFINES += -DKVMTOOLS_VERSION='"$(KVMTOOLS_VERSION)"'
>>  DEFINES+= -DBUILD_ARCH='"$(ARCH)"'
>>
>>  KVM_INCLUDE := include
>> -CFLAGS += $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -O2 -fno-strict-aliasing -g
>> +override CFLAGS+= $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE)
>> +override CFLAGS+= -O2 -fno-strict-aliasing -g
>>
>>  WARNINGS += -Wall
>>  WARNINGS += -Wformat=2
>> @@ -349,10 +348,10 @@ WARNINGS += -Wvolatile-register-var
>>  WARNINGS += -Wwrite-strings
>>  WARNINGS += -Wno-format-nonliteral
>>
>> -CFLAGS += $(WARNINGS)
>> +override CFLAGS+= $(WARNINGS)
>>
>>  ifneq ($(WERROR),0)
>> -   CFLAGS += -Werror
>> +   override CFLAGS += -Werror
>>  endif
>>
>>  all: $(PROGRAM) $(PROGRAM_ALIAS) $(GUEST_INIT) $(GUEST_PRE_INIT)
>> --
>> 2.5.1
>>
> 


Re: [PATCH 0/7] kvmtool: Cleanup kernel loading

2015-11-02 Thread Andre Przywara
Hi Dimitri,

On 02/11/15 15:17, Dimitri John Ledkov wrote:
> On 2 November 2015 at 14:58, Will Deacon <will.dea...@arm.com> wrote:
>> On Fri, Oct 30, 2015 at 06:26:53PM +0000, Andre Przywara wrote:
>>> Hi,
>>
>> Hello Andre,
>>
>>> this series cleans up kvmtool's kernel loading functionality a bit.
>>> It has been broken out of a previous series I sent [1] and contains
>>> just the cleanup and bug fix parts, which should be less controversial
>>> and thus easier to merge ;-)
>>> I will resend the pipe loading part later on as a separate series.
>>>
>>> The first patch properly abstracts kernel loading to move
>>> responsibility into each architecture's code. It removes quite some
>>> ugly code from the generic kvm.c file.
>>> The later patches address the naive usage of read(2) to, well, read
>>> data from files. Doing this without coping with the subtleties of
>>> the UNIX read semantics (returning with less or none data read is not
>>> an error) can provoke hard to debug failures.
>>> So these patches make use of the existing and one new wrapper function
>>> to make sure we read everything we actually wanted to.
>>> The last patch moves the ARM kernel loading code into the proper
>>> location to be in line with the other architectures.
>>>
>>> Please have a look and give some comments!
>>
>> Looks good to me, but I'd like to see some comments from some mips/ppc/x86
>> people on the changes you're making over there.
> 
> Looks mostly good to me, as one of the kvmtool down streams. Over at
> https://github.com/clearlinux/kvmtool we have some patches to tweak
> the x86 boot flow (which will need rebasing/retweaking), specifically
> this commit here -
> https://github.com/clearlinux/kvmtool/commit/a8dee709f85735d16739d0eda0cc00d3c1b17477

Awesome - I was actually thinking about coding something like this!
In the last week I moved the MIPS ELF loading out of mips/ into /elf.c to
be able to load kvm-unit-tests' tests (which are Multiboot/ELF). As
multiboot requires entering in protected mode, I was thinking about
changing kvmtool to support entering a guest directly in protected mode
- seems like this is mostly what you've done here already.

Looks like we should both post our patches to merge them somehow ;-)

Cheers,
Andre.


[PATCH 2/2] Makefile: consider LDFLAGS on feature tests and when linking executables

2015-10-30 Thread Andre Przywara
While we have an LDFLAGS variable in kvmtool's Makefile, it is not
actually used, either in the feature tests or when finally linking
the lkvm executable.
Add that variable to all the linking steps to allow the user to
specify custom library directories or linker options on the command
line.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
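Illustration only, not part of the patch: a minimal standalone Makefile,
assuming GNU make and a C compiler, that shows a link-time feature test
honoring LDFLAGS from the command line. The try-link helper, TMPOUT and
the file name feature-demo.mk are made up for this sketch; kvmtool's real
try-build helper is not shown in this patch and may differ.

```makefile
# feature-demo.mk: hypothetical standalone example, not part of kvmtool.
# Run as: make -f feature-demo.mk LDFLAGS=-L/opt/lib

# Stand-in for a try-build style helper.
# $(1): C source, $(2): compiler flags, $(3): linker flags and libraries
TMPOUT := $(shell mktemp)
try-link = $(shell printf '%s\n' '$(1)' | $(CC) -x c - $(2) $(3) -o $(TMPOUT) >/dev/null 2>&1 && echo y)

SOURCE_HELLO := int main(void) { return 0; }

# LDFLAGS given on the command line now takes part in the link test.
ifeq ($(call try-link,$(SOURCE_HELLO),$(CFLAGS),$(LDFLAGS) -lz),y)
$(info zlib: found)
else
$(info zlib: not found)
endif

# Remove the scratch output file again.
$(shell rm -f $(TMPOUT))

# Empty recipe so that make has a target to build.
all: ; @:
```

If zlib lives in a non-standard directory, the test in this sketch should
only succeed once the matching -L option is passed via LDFLAGS.
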
 Makefile | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/Makefile b/Makefile
index 77a7c9f..eac1220 100644
--- a/Makefile
+++ b/Makefile
@@ -196,12 +196,12 @@ endif
 # On a given system, some libs may link statically, some may not; so, check
 # both and only build those that link!
 
-ifeq ($(call try-build,$(SOURCE_STRLCPY),$(CFLAGS),),y)
+ifeq ($(call try-build,$(SOURCE_STRLCPY),$(CFLAGS),$(LDFLAGS)),y)
CFLAGS_DYNOPT   += -DHAVE_STRLCPY
CFLAGS_STATOPT  += -DHAVE_STRLCPY
 endif
 
-ifeq ($(call try-build,$(SOURCE_BFD),$(CFLAGS),-lbfd -static),y)
+ifeq ($(call try-build,$(SOURCE_BFD),$(CFLAGS),$(LDFLAGS) -lbfd -static),y)
CFLAGS_STATOPT  += -DCONFIG_HAS_BFD
OBJS_STATOPT+= symbol.o
LIBS_STATOPT+= -lbfd
@@ -212,7 +212,7 @@ endif
 ifeq (y,$(ARCH_HAS_FRAMEBUFFER))
CFLAGS_GTK3 := $(shell pkg-config --cflags gtk+-3.0 2>/dev/null)
LDFLAGS_GTK3 := $(shell pkg-config --libs gtk+-3.0 2>/dev/null)
-   ifeq ($(call try-build,$(SOURCE_GTK3),$(CFLAGS) $(CFLAGS_GTK3),$(LDFLAGS_GTK3)),y)
+   ifeq ($(call try-build,$(SOURCE_GTK3),$(CFLAGS) $(CFLAGS_GTK3),$(LDFLAGS) $(LDFLAGS_GTK3)),y)
OBJS_DYNOPT += ui/gtk3.o
CFLAGS_DYNOPT   += -DCONFIG_HAS_GTK3 $(CFLAGS_GTK3)
LIBS_DYNOPT += $(LDFLAGS_GTK3)
@@ -220,63 +220,63 @@ ifeq (y,$(ARCH_HAS_FRAMEBUFFER))
NOTFOUND+= GTK3
endif
 
-   ifeq ($(call try-build,$(SOURCE_VNCSERVER),$(CFLAGS),-lvncserver),y)
+   ifeq ($(call try-build,$(SOURCE_VNCSERVER),$(CFLAGS),$(LDFLAGS) -lvncserver),y)
OBJS_DYNOPT += ui/vnc.o
CFLAGS_DYNOPT   += -DCONFIG_HAS_VNCSERVER
LIBS_DYNOPT += -lvncserver
else
NOTFOUND+= vncserver
endif
-   ifeq ($(call try-build,$(SOURCE_VNCSERVER),$(CFLAGS),-lvncserver -static),y)
+   ifeq ($(call try-build,$(SOURCE_VNCSERVER),$(CFLAGS),$(LDFLAGS) -lvncserver -static),y)
OBJS_STATOPT+= ui/vnc.o
CFLAGS_STATOPT  += -DCONFIG_HAS_VNCSERVER
LIBS_STATOPT+= -lvncserver
endif
 
-   ifeq ($(call try-build,$(SOURCE_SDL),$(CFLAGS),-lSDL),y)
+   ifeq ($(call try-build,$(SOURCE_SDL),$(CFLAGS),$(LDFLAGS) -lSDL),y)
OBJS_DYNOPT += ui/sdl.o
CFLAGS_DYNOPT   += -DCONFIG_HAS_SDL
LIBS_DYNOPT += -lSDL
else
NOTFOUND+= SDL
endif
-   ifeq ($(call try-build,$(SOURCE_SDL),$(CFLAGS),-lSDL -static), y)
+   ifeq ($(call try-build,$(SOURCE_SDL),$(CFLAGS),$(LDFLAGS) -lSDL -static), y)
OBJS_STATOPT+= ui/sdl.o
CFLAGS_STATOPT  += -DCONFIG_HAS_SDL
LIBS_STATOPT+= -lSDL
endif
 endif
 
-ifeq ($(call try-build,$(SOURCE_ZLIB),$(CFLAGS),-lz),y)
+ifeq ($(call try-build,$(SOURCE_ZLIB),$(CFLAGS),$(LDFLAGS) -lz),y)
CFLAGS_DYNOPT   += -DCONFIG_HAS_ZLIB
LIBS_DYNOPT += -lz
 else
NOTFOUND+= zlib
 endif
-ifeq ($(call try-build,$(SOURCE_ZLIB),$(CFLAGS),-lz -static),y)
+ifeq ($(call try-build,$(SOURCE_ZLIB),$(CFLAGS),$(LDFLAGS) -lz -static),y)
CFLAGS_STATOPT  += -DCONFIG_HAS_ZLIB
LIBS_STATOPT+= -lz
 endif
 
-ifeq ($(call try-build,$(SOURCE_AIO),$(CFLAGS),-laio),y)
+ifeq ($(call try-build,$(SOURCE_AIO),$(CFLAGS),$(LDFLAGS) -laio),y)
CFLAGS_DYNOPT   += -DCONFIG_HAS_AIO
LIBS_DYNOPT += -laio
 else
NOTFOUND+= aio
 endif
-ifeq ($(call try-build,$(SOURCE_AIO),$(CFLAGS),-laio -static),y)
+ifeq ($(call try-build,$(SOURCE_AIO),$(CFLAGS),$(LDFLAGS) -laio -static),y)
CFLAGS_STATOPT  += -DCONFIG_HAS_AIO
LIBS_STATOPT+= -laio
 endif
 
 ifeq ($(LTO),1)
FLAGS_LTO := -flto
-   ifeq ($(call try-build,$(SOURCE_HELLO),$(CFLAGS),$(FLAGS_LTO)),y)
+   ifeq ($(call try-build,$(SOURCE_HELLO),$(CFLAGS),$(LDFLAGS) $(FLAGS_LTO)),y)
override CFLAGS += $(FLAGS_LTO)
endif
 endif
 
-ifeq ($(call try-build,$(SOURCE_STATIC),,-static),y)
+ifeq ($(call try-build,$(SOURCE_STATIC),$(CFLAGS),$(LDFLAGS) -static),y)
override CFLAGS += -DCONFIG_GUEST_INIT
GUEST_INIT  := guest/init
GUEST_OBJS  = guest/guest_init.o
@@ -370,11 +370,11 @@ STATIC_OBJS = $(patsubst %.o,%.static.o,$(OBJS) $(OBJS_STATOPT))
 
 $(PROGRAM)-static:  $(STATIC_OBJS) $(OTHEROBJS) $(GUEST_INIT) $(GUEST_PRE_INIT)
$(E) "  LINK" $@
-   $(Q) $(CC) -static $(CFLAGS) $(ST

[PATCH 1/2] Makefile: allow overriding CFLAGS on the command line

2015-10-30 Thread Andre Przywara
When a Makefile variable is set on the make command line, all
Makefile-internal assignments to that variable are _ignored_.
Since we add several essential values to CFLAGS internally,
specifying CFLAGS on the command line will usually break the
build (and will not fix the include file problems you hoped to
overcome with it).
Somewhat counter-intuitively, GNU make provides the "override"
directive to change this behavior; with it, "+=" assignments in the
Makefile get _appended_ to the value given on the command line. [1]

Change any internal assignments to use that directive, so that a user
can use:
$ make CFLAGS=-I/path/to/my/include/dir
to teach kvmtool about non-standard header file locations (helpful
for cross-compilation) or to tweak other compiler options.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>

[1] https://www.gnu.org/software/make/manual/html_node/Override-Directive.html
---
 Makefile | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/Makefile b/Makefile
index f8f7cc4..77a7c9f 100644
--- a/Makefile
+++ b/Makefile
@@ -15,9 +15,7 @@ include config/utilities.mak
 include config/feature-tests.mak
 
 CC := $(CROSS_COMPILE)gcc
-CFLAGS :=
 LD := $(CROSS_COMPILE)ld
-LDFLAGS:=
 
 FIND   := find
 CSCOPE := cscope
@@ -162,7 +160,7 @@ ifeq ($(ARCH), arm)
OBJS+= arm/aarch32/kvm-cpu.o
ARCH_INCLUDE:= $(HDRS_ARM_COMMON)
ARCH_INCLUDE+= -Iarm/aarch32/include
-   CFLAGS  += -march=armv7-a
+   override CFLAGS += -march=armv7-a
 
ARCH_WANT_LIBFDT := y
 endif
@@ -274,12 +272,12 @@ endif
 ifeq ($(LTO),1)
FLAGS_LTO := -flto
ifeq ($(call try-build,$(SOURCE_HELLO),$(CFLAGS),$(FLAGS_LTO)),y)
-   CFLAGS  += $(FLAGS_LTO)
+   override CFLAGS += $(FLAGS_LTO)
endif
 endif
 
 ifeq ($(call try-build,$(SOURCE_STATIC),,-static),y)
-   CFLAGS  += -DCONFIG_GUEST_INIT
+   override CFLAGS += -DCONFIG_GUEST_INIT
GUEST_INIT  := guest/init
GUEST_OBJS  = guest/guest_init.o
ifeq ($(ARCH_PRE_INIT),)
@@ -331,7 +329,8 @@ DEFINES += -DKVMTOOLS_VERSION='"$(KVMTOOLS_VERSION)"'
 DEFINES+= -DBUILD_ARCH='"$(ARCH)"'
 
 KVM_INCLUDE := include
-CFLAGS += $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE) -O2 -fno-strict-aliasing -g
+override CFLAGS+= $(CPPFLAGS) $(DEFINES) -I$(KVM_INCLUDE) -I$(ARCH_INCLUDE)
+override CFLAGS+= -O2 -fno-strict-aliasing -g
 
 WARNINGS += -Wall
 WARNINGS += -Wformat=2
@@ -349,10 +348,10 @@ WARNINGS += -Wvolatile-register-var
 WARNINGS += -Wwrite-strings
 WARNINGS += -Wno-format-nonliteral
 
-CFLAGS += $(WARNINGS)
+override CFLAGS+= $(WARNINGS)
 
 ifneq ($(WERROR),0)
-   CFLAGS += -Werror
+   override CFLAGS += -Werror
 endif
 
 all: $(PROGRAM) $(PROGRAM_ALIAS) $(GUEST_INIT) $(GUEST_PRE_INIT)
-- 
2.5.1



[PATCH 0/2] kvmtool: allow CFLAGS and LDFLAGS override

2015-10-30 Thread Andre Przywara
In the past there have been some complaints from cross-compilation
users about missing libraries and include files. Fixing those issues
by pointing make to the proper directories via CFLAGS and LDFLAGS
proved to be complicated.
This series fixes this, so custom CFLAGS and LDFLAGS can be given on
the make command line and those will not overwrite kvmtool's vital
internal assignments.

Cheers,
Andre.

Andre Przywara (2):
  Makefile: allow overriding CFLAGS on the command line
  Makefile: consider LDFLAGS on feature tests and when linking executables

 Makefile | 45 ++---
 1 file changed, 22 insertions(+), 23 deletions(-)

-- 
2.5.1



[PATCH 5/7] x86: use read wrappers in kernel loading

2015-10-30 Thread Andre Przywara
Replace the unsafe read-loops in the x86 kernel image loading
functions with our safe read_file() and read_in_full() wrappers.
This should fix random failures in kernel image loading, especially
from pipes and sockets.
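
For context, read() on a pipe or socket may legitimately return fewer
bytes than requested, so a single "read(...) != size" check can fail
even though the rest of the data is still on its way. The following is
only a minimal sketch of what a read_in_full()-style helper has to do;
full_read_sketch() and demo_pipe_read() are hypothetical names made up
for this illustration and are not the kvmtool implementation.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a read_in_full()-style helper: keep calling
 * read() until count bytes have arrived, EOF is hit, or a real error occurs.
 * Returns the number of bytes read (possibly short at EOF) or -1 on error.
 */
static ssize_t full_read_sketch(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t total = 0;

	while (total < count) {
		ssize_t nr = read(fd, p + total, count - total);

		if (nr < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, just retry */
			return -1;
		}
		if (nr == 0)
			break;			/* EOF: report a short count */
		total += nr;
	}

	return total;
}

/* Demo: two separate writes into a pipe are collected by one helper call. */
static int demo_pipe_read(void)
{
	char buf[8];
	int fds[2];
	int ok;

	if (pipe(fds) < 0)
		return -1;
	if (write(fds[1], "abcd", 4) != 4 || write(fds[1], "efgh", 4) != 4) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	close(fds[1]);

	ok = full_read_sketch(fds[0], buf, sizeof(buf)) == (ssize_t)sizeof(buf) &&
	     memcmp(buf, "abcdefgh", sizeof(buf)) == 0;
	close(fds[0]);
	return ok ? 0 : -1;
}
```

demo_pipe_read() returns 0 when all eight bytes were collected. The
point of the loop is that the caller only ever has to compare the
result against the requested size once.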

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 x86/kvm.c | 35 ++-
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/x86/kvm.c b/x86/kvm.c
index a0204b8..ae430a0 100644
--- a/x86/kvm.c
+++ b/x86/kvm.c
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -209,15 +210,14 @@ static inline void *guest_real_to_host(struct kvm *kvm, u16 selector, u16 offset
 static bool load_flat_binary(struct kvm *kvm, int fd_kernel)
 {
void *p;
-   int nr;
 
if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
 
p = guest_real_to_host(kvm, BOOT_LOADER_SELECTOR, BOOT_LOADER_IP);
 
-   while ((nr = read(fd_kernel, p, 65536)) > 0)
-   p += nr;
+   if (read_file(fd_kernel, p, kvm->cfg.ram_size) < 0)
+   die_perror("read");
 
kvm->arch.boot_selector = BOOT_LOADER_SELECTOR;
kvm->arch.boot_ip   = BOOT_LOADER_IP;
@@ -232,12 +232,10 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
 const char *kernel_cmdline)
 {
struct boot_params *kern_boot;
-   unsigned long setup_sects;
struct boot_params boot;
size_t cmdline_size;
-   ssize_t setup_size;
+   ssize_t file_size;
void *p;
-   int nr;
u16 vidmode;
 
/*
@@ -248,7 +246,7 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
 
-   if (read(fd_kernel, &boot, sizeof(boot)) != sizeof(boot))
+   if (read_in_full(fd_kernel, &boot, sizeof(boot)) != sizeof(boot))
return false;
 
if (memcmp(, BZIMAGE_MAGIC, strlen(BZIMAGE_MAGIC)))
@@ -262,20 +260,17 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
 
if (!boot.hdr.setup_sects)
boot.hdr.setup_sects = BZ_DEFAULT_SETUP_SECTS;
-   setup_sects = boot.hdr.setup_sects + 1;
-
-   setup_size = setup_sects << 9;
+   file_size = (boot.hdr.setup_sects + 1) << 9;
p = guest_real_to_host(kvm, BOOT_LOADER_SELECTOR, BOOT_LOADER_IP);
+   if (read_in_full(fd_kernel, p, file_size) != file_size)
+   die_perror("kernel setup read");
 
-   /* copy setup.bin to mem*/
-   if (read(fd_kernel, p, setup_size) != setup_size)
-   die_perror("read");
-
-   /* copy vmlinux.bin to BZ_KERNEL_START*/
+   /* read actual kernel image (vmlinux.bin) to BZ_KERNEL_START */
p = guest_flat_to_host(kvm, BZ_KERNEL_START);
-
-   while ((nr = read(fd_kernel, p, 65536)) > 0)
-   p += nr;
+   file_size = read_file(fd_kernel, p,
+ kvm->cfg.ram_size - BZ_KERNEL_START);
+   if (file_size < 0)
+   die_perror("kernel read");
 
p = guest_flat_to_host(kvm, BOOT_CMDLINE_OFFSET);
if (kernel_cmdline) {
@@ -287,7 +282,6 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
memcpy(p, kernel_cmdline, cmdline_size - 1);
}
 
-
/* vidmode should be either specified or set by default */
if (kvm->cfg.vnc || kvm->cfg.sdl || kvm->cfg.gtk) {
if (!kvm->cfg.arch.vidmode)
@@ -326,8 +320,7 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
}
 
p = guest_flat_to_host(kvm, addr);
-   nr = read(fd_initrd, p, initrd_stat.st_size);
-   if (nr != initrd_stat.st_size)
+   if (read_in_full(fd_initrd, p, initrd_stat.st_size) < 0)
die("Failed to read initrd");
 
kern_boot->hdr.ramdisk_image= addr;
-- 
2.5.1



[PATCH 6/7] arm/arm64: use read_file() in kernel and initrd loading

2015-10-30 Thread Andre Przywara
Use the new read_file() wrapper in our arm/arm64 kernel image loading
function instead of the private implementation.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 arm/fdt.c | 40 ++--
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index ec7453f..19d7ed9 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -224,19 +224,6 @@ static int setup_fdt(struct kvm *kvm)
 }
 late_init(setup_fdt);
 
-static int read_image(int fd, void **pos, void *limit)
-{
-   int count;
-
-   while (((count = xread(fd, *pos, SZ_64K)) > 0) && *pos <= limit)
-   *pos += count;
-
-   if (pos < 0)
-   die_perror("xread");
-
-   return *pos < limit ? 0 : -ENOMEM;
-}
-
 #define FDT_ALIGN  SZ_2M
 #define INITRD_ALIGN   4
 bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
@@ -244,6 +231,7 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 {
void *pos, *kernel_end, *limit;
unsigned long guest_addr;
+   ssize_t file_size;
 
if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
@@ -256,13 +244,16 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 
pos = kvm->ram_start + ARM_KERN_OFFSET(kvm);
kvm->arch.kern_guest_start = host_to_guest_flat(kvm, pos);
-   if (read_image(fd_kernel, &pos, limit) == -ENOMEM)
-   die("kernel image too big to contain in guest memory.");
+   file_size = read_file(fd_kernel, pos, limit - pos);
+   if (file_size < 0) {
+   if (errno == ENOMEM)
+   die("kernel image too big to contain in guest memory.");
 
-   kernel_end = pos;
-   pr_info("Loaded kernel to 0x%llx (%llu bytes)",
-   kvm->arch.kern_guest_start,
-   host_to_guest_flat(kvm, pos) - kvm->arch.kern_guest_start);
+   die_perror("kernel read");
+   }
+   kernel_end = pos + file_size;
+   pr_info("Loaded kernel to 0x%llx (%zd bytes)",
+   kvm->arch.kern_guest_start, file_size);
 
/*
 * Now load backwards from the end of memory so the kernel
@@ -300,11 +291,16 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
die("initrd overlaps with kernel image.");
 
initrd_start = guest_addr;
-   if (read_image(fd_initrd, &pos, limit) == -ENOMEM)
-   die("initrd too big to contain in guest memory.");
+   file_size = read_file(fd_initrd, pos, limit - pos);
+   if (file_size == -1) {
+   if (errno == ENOMEM)
+   die("initrd too big to contain in guest memory.");
+
+   die_perror("initrd read");
+   }
 
kvm->arch.initrd_guest_start = initrd_start;
-   kvm->arch.initrd_size = host_to_guest_flat(kvm, pos) - initrd_start;
+   kvm->arch.initrd_size = file_size;
pr_info("Loaded initrd to 0x%llx (%llu bytes)",
kvm->arch.initrd_guest_start,
kvm->arch.initrd_size);
-- 
2.5.1



[PATCH 7/7] arm: move kernel loading into arm/kvm.c

2015-10-30 Thread Andre Przywara
For some reason (probably to have easy access to the command line)
the kernel loading for arm and arm64 was located in arm/fdt.c.
Move the routines to kvm.c (where other architectures put it) to
only have real device tree code in fdt.c. We use the pointer in
struct kvm to access the command line string.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 arm/fdt.c | 95 +--
 arm/kvm.c | 88 ++
 2 files changed, 89 insertions(+), 94 deletions(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index 19d7ed9..381d48f 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -9,14 +9,11 @@
 
 #include 
 
-#include 
 #include 
 #include 
 #include 
 #include 
 
-static char kern_cmdline[COMMAND_LINE_SIZE];
-
 bool kvm__load_firmware(struct kvm *kvm, const char *firmware_filename)
 {
return false;
@@ -145,7 +142,7 @@ static int setup_fdt(struct kvm *kvm)
/* /chosen */
_FDT(fdt_begin_node(fdt, "chosen"));
_FDT(fdt_property_cell(fdt, "linux,pci-probe-only", 1));
-   _FDT(fdt_property_string(fdt, "bootargs", kern_cmdline));
+   _FDT(fdt_property_string(fdt, "bootargs", kvm->cfg.real_cmdline));
 
/* Initrd */
if (kvm->arch.initrd_size != 0) {
@@ -223,93 +220,3 @@ static int setup_fdt(struct kvm *kvm)
return 0;
 }
 late_init(setup_fdt);
-
-#define FDT_ALIGN  SZ_2M
-#define INITRD_ALIGN   4
-bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
-const char *kernel_cmdline)
-{
-   void *pos, *kernel_end, *limit;
-   unsigned long guest_addr;
-   ssize_t file_size;
-
-   if (lseek(fd_kernel, 0, SEEK_SET) < 0)
-   die_perror("lseek");
-
-   /*
-* Linux requires the initrd and dtb to be mapped inside lowmem,
-* so we can't just place them at the top of memory.
-*/
-   limit = kvm->ram_start + min(kvm->ram_size, (u64)SZ_256M) - 1;
-
-   pos = kvm->ram_start + ARM_KERN_OFFSET(kvm);
-   kvm->arch.kern_guest_start = host_to_guest_flat(kvm, pos);
-   file_size = read_file(fd_kernel, pos, limit - pos);
-   if (file_size < 0) {
-   if (errno == ENOMEM)
-   die("kernel image too big to contain in guest memory.");
-
-   die_perror("kernel read");
-   }
-   kernel_end = pos + file_size;
-   pr_info("Loaded kernel to 0x%llx (%zd bytes)",
-   kvm->arch.kern_guest_start, file_size);
-
-   /*
-* Now load backwards from the end of memory so the kernel
-* decompressor has plenty of space to work with. First up is
-* the device tree blob...
-*/
-   pos = limit;
-   pos -= (FDT_MAX_SIZE + FDT_ALIGN);
-   guest_addr = ALIGN(host_to_guest_flat(kvm, pos), FDT_ALIGN);
-   pos = guest_flat_to_host(kvm, guest_addr);
-   if (pos < kernel_end)
-   die("fdt overlaps with kernel image.");
-
-   kvm->arch.dtb_guest_start = guest_addr;
-   pr_info("Placing fdt at 0x%llx - 0x%llx",
-   kvm->arch.dtb_guest_start,
-   host_to_guest_flat(kvm, limit));
-   limit = pos;
-
-   /* ... and finally the initrd, if we have one. */
-   if (fd_initrd != -1) {
-   struct stat sb;
-   unsigned long initrd_start;
-
-   if (lseek(fd_initrd, 0, SEEK_SET) < 0)
-   die_perror("lseek");
-
-   if (fstat(fd_initrd, &sb))
-   die_perror("fstat");
-
-   pos -= (sb.st_size + INITRD_ALIGN);
-   guest_addr = ALIGN(host_to_guest_flat(kvm, pos), INITRD_ALIGN);
-   pos = guest_flat_to_host(kvm, guest_addr);
-   if (pos < kernel_end)
-   die("initrd overlaps with kernel image.");
-
-   initrd_start = guest_addr;
-   file_size = read_file(fd_initrd, pos, limit - pos);
-   if (file_size == -1) {
-   if (errno == ENOMEM)
-   die("initrd too big to contain in guest memory.");
-
-   die_perror("initrd read");
-   }
-
-   kvm->arch.initrd_guest_start = initrd_start;
-   kvm->arch.initrd_size = file_size;
-   pr_info("Loaded initrd to 0x%llx (%llu bytes)",
-   kvm->arch.initrd_guest_start,
-   kvm->arch.initrd_size);
-   } else {
-   kvm->arch.initrd_size = 0;
-   }
-
-   strncpy(kern_cmdline, kernel_cmdline, COMMAND_LINE_SIZE);
-   kern_cmdline[COMMAND_LINE_SIZE - 1] = '\0';
-
-   return true;
-}
diff --git a/arm/kvm.c b/ar

[PATCH 1/7] Refactor kernel image loading

2015-10-30 Thread Andre Przywara
Let's face it: Kernel loading is quite architecture specific. Don't
claim otherwise and move the loading routines into each
architecture's responsibility.
This introduces kvm__arch_load_kernel_image(), which each architecture
can implement accordingly.
Provide bzImage loading for x86 and ELF loading for MIPS as special
cases for those architectures (removing the arch specific code from
the generic kvm.c file on the way) and rename the existing "flat binary"
loader functions for the other architectures to the new name.
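
To illustrate the resulting split of responsibilities, here is a
simplified sketch: the generic code makes a single call into an
architecture hook, and only that hook knows which image formats exist
and in which order to try them. All names, the struct and the "DEMO"
magic below are hypothetical and exist only for this illustration; the
real functions take a struct kvm and file descriptors instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an opened kernel file. */
struct image_sketch {
	const char *data;
	size_t len;
};

/* Format-specific loader: accepts only images starting with a made-up magic. */
static bool load_magic_image_sketch(const struct image_sketch *img)
{
	return img->len >= 4 && !memcmp(img->data, "DEMO", 4);
}

/* Fallback loader: treats any non-empty image as a flat binary. */
static bool load_flat_binary_sketch(const struct image_sketch *img)
{
	return img->len > 0;
}

/*
 * The per-architecture hook: it alone decides which formats to try and
 * in which order. Returns the name of the loader that accepted the
 * image, or NULL if none did.
 */
static const char *arch_load_kernel_image_sketch(const struct image_sketch *img)
{
	if (load_magic_image_sketch(img))
		return "magic";
	if (load_flat_binary_sketch(img))
		return "flat";
	return NULL;
}

/* Generic code: no format knowledge left, just one call and one check. */
static bool generic_load_kernel_sketch(const struct image_sketch *img)
{
	return arch_load_kernel_image_sketch(img) != NULL;
}
```

An architecture with only one image format would implement the hook
with a single loader call, and the generic caller would not change.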

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 arm/fdt.c |  4 ++--
 include/kvm/kvm.h |  5 ++---
 kvm.c | 42 --
 mips/kvm.c| 23 +++
 powerpc/kvm.c |  3 ++-
 x86/kvm.c | 27 +--
 6 files changed, 46 insertions(+), 58 deletions(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index 3657108..ec7453f 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -239,8 +239,8 @@ static int read_image(int fd, void **pos, void *limit)
 
 #define FDT_ALIGN  SZ_2M
 #define INITRD_ALIGN   4
-int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd,
-const char *kernel_cmdline)
+bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
+const char *kernel_cmdline)
 {
void *pos, *kernel_end, *limit;
unsigned long guest_addr;
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 37155db..055a7a2 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -111,9 +111,8 @@ void kvm__arch_read_term(struct kvm *kvm);
 void *guest_flat_to_host(struct kvm *kvm, u64 offset);
 u64 host_to_guest_flat(struct kvm *kvm, void *ptr);
 
-int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
-int load_elf_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
-bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
+bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
+const char *kernel_cmdline);
 
 /*
  * Debugging
diff --git a/kvm.c b/kvm.c
index 10ed230..ca7dfee 100644
--- a/kvm.c
+++ b/kvm.c
@@ -341,18 +341,6 @@ static bool initrd_check(int fd)
!memcmp(id, CPIO_MAGIC, 4);
 }
 
-int __attribute__((__weak__)) load_elf_binary(struct kvm *kvm, int fd_kernel,
-   int fd_initrd, const char *kernel_cmdline)
-{
-   return false;
-}
-
-bool __attribute__((__weak__)) load_bzimage(struct kvm *kvm, int fd_kernel,
-   int fd_initrd, const char *kernel_cmdline)
-{
-   return false;
-}
-
 bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
const char *initrd_filename, const char *kernel_cmdline)
 {
@@ -372,40 +360,18 @@ bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
die("%s is not an initrd", initrd_filename);
}
 
-#ifdef CONFIG_X86
-   ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline);
-
-   if (ret)
-   goto found_kernel;
-
-   pr_warning("%s is not a bzImage. Trying to load it as a flat binary...", kernel_filename);
-#endif
-
-   ret = load_elf_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
-
-   if (ret)
-   goto found_kernel;
-
-   ret = load_flat_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
-
-   if (ret)
-   goto found_kernel;
+   ret = kvm__arch_load_kernel_image(kvm, fd_kernel, fd_initrd,
+ kernel_cmdline);
 
if (initrd_filename)
close(fd_initrd);
close(fd_kernel);
 
-   die("%s is not a valid bzImage or flat binary", kernel_filename);
-
-found_kernel:
-   if (initrd_filename)
-   close(fd_initrd);
-   close(fd_kernel);
-
+   if (!ret)
+   die("%s is not a valid kernel image", kernel_filename);
return ret;
 }
 
-
 void kvm__dump_mem(struct kvm *kvm, unsigned long addr, unsigned long size, int debug_fd)
 {
unsigned char *p;
diff --git a/mips/kvm.c b/mips/kvm.c
index 1925f38..c1c596c 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -163,7 +163,8 @@ static void kvm__mips_install_cmdline(struct kvm *kvm)
 
 /* Load at the 1M point. */
 #define KERNEL_LOAD_ADDR 0x100
-int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
+
+static bool load_flat_binary(struct kvm *kvm, int fd_kernel)
 {
void *p;
void *k_start;
@@ -281,7 +282,7 @@ static bool kvm__arch_get_elf_32_info(Elf32_Ehdr *ehdr, int fd_kernel,
return true;
 }
 
-int load_elf_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
+static bool load_elf

[PATCH 3/7] powerpc: use read_file() in kernel and initrd loading

2015-10-30 Thread Andre Przywara
Replace the unsafe read-loops in the powerpc kernel image loading
function with our new and safe read_file() wrapper.
This should fix random failures in kernel image loading, especially
from pipes and sockets.
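
The error handling in this patch relies on read_file() returning a
negative value with errno set to ENOMEM when the file does not fit
into the given buffer. As a rough sketch of a helper with that
contract (an assumption-laden illustration, not the actual kvmtool
code; read_whole_file_sketch() and demo_load() are hypothetical
names):

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read until EOF, but never store more than
 * max_size bytes. Returns the file size, or -1 with errno = ENOMEM if
 * data is left over once the buffer is full.
 */
static ssize_t read_whole_file_sketch(int fd, void *buf, size_t max_size)
{
	char *p = buf;
	size_t total = 0;
	char probe;
	ssize_t nr;

	while (total < max_size) {
		nr = read(fd, p + total, max_size - total);
		if (nr < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, just retry */
			return -1;
		}
		if (nr == 0)
			return total;		/* EOF reached, everything fit */
		total += nr;
	}

	/* Buffer is full: one more byte tells us whether data is left over. */
	do {
		nr = read(fd, &probe, 1);
	} while (nr < 0 && errno == EINTR);

	if (nr == 0)
		return total;
	if (nr > 0)
		errno = ENOMEM;
	return -1;
}

/* Demo: feed len zero bytes through a pipe and load them into cap bytes. */
static ssize_t demo_load(size_t len, size_t cap)
{
	char src[64] = { 0 }, dst[64];
	int fds[2];
	ssize_t ret;
	int saved;

	if (len > sizeof(src) || cap > sizeof(dst) || pipe(fds) < 0)
		return -1;
	if (write(fds[1], src, len) != (ssize_t)len) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	close(fds[1]);		/* the reader sees EOF after len bytes */

	ret = read_whole_file_sketch(fds[0], dst, cap);
	saved = errno;		/* close() must not clobber the result */
	close(fds[0]);
	errno = saved;
	return ret;
}
```

With this contract a caller can distinguish "image too big" from a
genuine I/O error by checking errno, which is what the ENOMEM branches
in the diff below do.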

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 powerpc/kvm.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/powerpc/kvm.c b/powerpc/kvm.c
index 13bba30..2b0bddd 100644
--- a/powerpc/kvm.c
+++ b/powerpc/kvm.c
@@ -162,19 +162,22 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 {
void *p;
void *k_start;
-   void *i_start;
-   int nr;
+   ssize_t filesize;
 
if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
 
p = k_start = guest_flat_to_host(kvm, KERNEL_LOAD_ADDR);
 
-   while ((nr = read(fd_kernel, p, 65536)) > 0)
-   p += nr;
-
-   pr_info("Loaded kernel to 0x%x (%ld bytes)", KERNEL_LOAD_ADDR, (long)(p-k_start));
+   filesize = read_file(fd_kernel, p, INITRD_LOAD_ADDR - KERNEL_LOAD_ADDR);
+   if (filesize < 0) {
+   if (errno == ENOMEM)
+   die("Kernel overlaps initrd!");
 
+   die_perror("kernel read");
+   }
+   pr_info("Loaded kernel to 0x%x (%ld bytes)", KERNEL_LOAD_ADDR,
+   filesize);
if (fd_initrd != -1) {
if (lseek(fd_initrd, 0, SEEK_SET) < 0)
die_perror("lseek");
@@ -183,19 +186,20 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
die("Kernel overlaps initrd!");
 
/* Round up kernel size to 8byte alignment, and load initrd right after. */
-   i_start = p = guest_flat_to_host(kvm, INITRD_LOAD_ADDR);
-
-   while (((nr = read(fd_initrd, p, 65536)) > 0) &&
-  p < (kvm->ram_start + kvm->ram_size))
-   p += nr;
-
-   if (p >= (kvm->ram_start + kvm->ram_size))
-   die("initrd too big to contain in guest RAM.\n");
+   p = guest_flat_to_host(kvm, INITRD_LOAD_ADDR);
+
+   filesize = read_file(fd_initrd, p,
+  (kvm->ram_start + kvm->ram_size) - p);
+   if (filesize < 0) {
+   if (errno == ENOMEM)
+   die("initrd too big to contain in guest RAM.\n");
+   die_perror("initrd read");
+   }
 
pr_info("Loaded initrd to 0x%x (%ld bytes)",
-   INITRD_LOAD_ADDR, (long)(p-i_start));
+   INITRD_LOAD_ADDR, filesize);
kvm->arch.initrd_gra = INITRD_LOAD_ADDR;
-   kvm->arch.initrd_size = p-i_start;
+   kvm->arch.initrd_size = filesize;
} else {
kvm->arch.initrd_size = 0;
}
-- 
2.5.1



[PATCH 4/7] MIPS: use read wrappers in kernel loading

2015-10-30 Thread Andre Przywara
Replace the unsafe read-loops used in the MIPS kernel image loading
with our safe read_file() and read_in_full() wrappers.
This should fix random failures in kernel image loading, especially
from pipes and sockets.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 mips/kvm.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/mips/kvm.c b/mips/kvm.c
index c1c596c..8fbf8de 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -168,20 +168,27 @@ static bool load_flat_binary(struct kvm *kvm, int fd_kernel)
 {
void *p;
void *k_start;
-   int nr;
+   ssize_t kernel_size;
 
if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
 
p = k_start = guest_flat_to_host(kvm, KERNEL_LOAD_ADDR);
 
-   while ((nr = read(fd_kernel, p, 65536)) > 0)
-   p += nr;
+   kernel_size = read_file(fd_kernel, p,
+   kvm->cfg.ram_size - KERNEL_LOAD_ADDR);
+   if (kernel_size == -1) {
+   if (errno == ENOMEM)
+   die("kernel too big for guest memory");
+   else
+   die_perror("kernel read");
+   }
 
kvm->arch.is64bit = true;
kvm->arch.entry_point = 0x8100ull;
 
-   pr_info("Loaded kernel to 0x%x (%ld bytes)", KERNEL_LOAD_ADDR, (long int)(p - k_start));
+   pr_info("Loaded kernel to 0x%x (%zd bytes)", KERNEL_LOAD_ADDR,
+   kernel_size);
 
return true;
 }
@@ -197,7 +204,6 @@ static bool kvm__arch_get_elf_64_info(Elf64_Ehdr *ehdr, int fd_kernel,
  struct kvm__arch_elf_info *ei)
 {
int i;
-   size_t nr;
Elf64_Phdr phdr;
 
if (ehdr->e_phentsize != sizeof(phdr)) {
@@ -212,8 +218,7 @@ static bool kvm__arch_get_elf_64_info(Elf64_Ehdr *ehdr, int fd_kernel,
 
phdr.p_type = PT_NULL;
for (i = 0; i < ehdr->e_phnum; i++) {
-   nr = read(fd_kernel, &phdr, sizeof(phdr));
-   if (nr != sizeof(phdr)) {
+   if (read_in_full(fd_kernel, &phdr, sizeof(phdr)) != sizeof(phdr)) {
pr_info("Couldn't read %d bytes for ELF PHDR.", (int)sizeof(phdr));
return false;
}
@@ -243,7 +248,6 @@ static bool kvm__arch_get_elf_32_info(Elf32_Ehdr *ehdr, int fd_kernel,
  struct kvm__arch_elf_info *ei)
 {
int i;
-   size_t nr;
Elf32_Phdr phdr;
 
if (ehdr->e_phentsize != sizeof(phdr)) {
@@ -258,8 +262,7 @@ static bool kvm__arch_get_elf_32_info(Elf32_Ehdr *ehdr, int fd_kernel,
 
phdr.p_type = PT_NULL;
for (i = 0; i < ehdr->e_phnum; i++) {
-   nr = read(fd_kernel, &phdr, sizeof(phdr));
-   if (nr != sizeof(phdr)) {
+   if (read_in_full(fd_kernel, &phdr, sizeof(phdr)) != sizeof(phdr)) {
pr_info("Couldn't read %d bytes for ELF PHDR.", (int)sizeof(phdr));
return false;
}
@@ -334,14 +337,11 @@ static bool load_elf_binary(struct kvm *kvm, int fd_kernel)
p = guest_flat_to_host(kvm, ei.load_addr);
 
pr_info("ELF Loading 0x%lx bytes from 0x%llx to 0x%llx",
-   (unsigned long)ei.len, (unsigned long long)ei.offset, (unsigned long long)ei.load_addr);
-   do {
-   nr = read(fd_kernel, p, ei.len);
-   if (nr < 0)
-   die_perror("read");
-   p += nr;
-   ei.len -= nr;
-   } while (ei.len);
+   (unsigned long)ei.len, (unsigned long long)ei.offset,
+   (unsigned long long)ei.load_addr);
+
+   if (read_in_full(fd_kernel, p, ei.len) != (ssize_t)ei.len)
+   die_perror("read");
 
return true;
 }
-- 
2.5.1



[PATCH 3/7] powerpc: use read_file() in kernel and initrd loading

2015-10-30 Thread Andre Przywara
Replace the unsafe read-loops in the powerpc kernel image loading
function with our new and safe read_file() wrapper.
This should fix random failures in kernel image loading, especially
from pipes and sockets.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 powerpc/kvm.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/powerpc/kvm.c b/powerpc/kvm.c
index 13bba30..2b0bddd 100644
--- a/powerpc/kvm.c
+++ b/powerpc/kvm.c
@@ -162,19 +162,22 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 {
void *p;
void *k_start;
-   void *i_start;
-   int nr;
+   ssize_t filesize;
 
if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
 
p = k_start = guest_flat_to_host(kvm, KERNEL_LOAD_ADDR);
 
-   while ((nr = read(fd_kernel, p, 65536)) > 0)
-   p += nr;
-
-   pr_info("Loaded kernel to 0x%x (%ld bytes)", KERNEL_LOAD_ADDR, (long)(p-k_start));
+   filesize = read_file(fd_kernel, p, INITRD_LOAD_ADDR - KERNEL_LOAD_ADDR);
+   if (filesize < 0) {
+   if (errno == ENOMEM)
+   die("Kernel overlaps initrd!");
 
+   die_perror("kernel read");
+   }
+   pr_info("Loaded kernel to 0x%x (%ld bytes)", KERNEL_LOAD_ADDR,
+   filesize);
if (fd_initrd != -1) {
if (lseek(fd_initrd, 0, SEEK_SET) < 0)
die_perror("lseek");
@@ -183,19 +186,20 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
die("Kernel overlaps initrd!");
 
/* Round up kernel size to 8byte alignment, and load initrd right after. */
-   i_start = p = guest_flat_to_host(kvm, INITRD_LOAD_ADDR);
-
-   while (((nr = read(fd_initrd, p, 65536)) > 0) &&
-  p < (kvm->ram_start + kvm->ram_size))
-   p += nr;
-
-   if (p >= (kvm->ram_start + kvm->ram_size))
-   die("initrd too big to contain in guest RAM.\n");
+   p = guest_flat_to_host(kvm, INITRD_LOAD_ADDR);
+
+   filesize = read_file(fd_initrd, p,
+  (kvm->ram_start + kvm->ram_size) - p);
+   if (filesize < 0) {
+   if (errno == ENOMEM)
+   die("initrd too big to contain in guest RAM.\n");
+   die_perror("initrd read");
+   }
 
pr_info("Loaded initrd to 0x%x (%ld bytes)",
-   INITRD_LOAD_ADDR, (long)(p-i_start));
+   INITRD_LOAD_ADDR, filesize);
kvm->arch.initrd_gra = INITRD_LOAD_ADDR;
-   kvm->arch.initrd_size = p-i_start;
+   kvm->arch.initrd_size = filesize;
} else {
kvm->arch.initrd_size = 0;
}
-- 
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] x86: use read wrappers in kernel loading

2015-10-30 Thread Andre Przywara
Replace the unsafe read-loops in the x86 kernel image loading
functions with our safe read_file() and read_in_full() wrappers.
This should fix random failures in kernel image loading, especially
from pipes and sockets.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 x86/kvm.c | 35 ++-
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/x86/kvm.c b/x86/kvm.c
index a0204b8..ae430a0 100644
--- a/x86/kvm.c
+++ b/x86/kvm.c
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -209,15 +210,14 @@ static inline void *guest_real_to_host(struct kvm *kvm, u16 selector, u16 offset
 static bool load_flat_binary(struct kvm *kvm, int fd_kernel)
 {
void *p;
-   int nr;
 
if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
 
p = guest_real_to_host(kvm, BOOT_LOADER_SELECTOR, BOOT_LOADER_IP);
 
-   while ((nr = read(fd_kernel, p, 65536)) > 0)
-   p += nr;
+   if (read_file(fd_kernel, p, kvm->cfg.ram_size) < 0)
+   die_perror("read");
 
kvm->arch.boot_selector = BOOT_LOADER_SELECTOR;
kvm->arch.boot_ip   = BOOT_LOADER_IP;
@@ -232,12 +232,10 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
 const char *kernel_cmdline)
 {
struct boot_params *kern_boot;
-   unsigned long setup_sects;
struct boot_params boot;
size_t cmdline_size;
-   ssize_t setup_size;
+   ssize_t file_size;
void *p;
-   int nr;
u16 vidmode;
 
/*
@@ -248,7 +246,7 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
 
-   if (read(fd_kernel, &boot, sizeof(boot)) != sizeof(boot))
+   if (read_in_full(fd_kernel, &boot, sizeof(boot)) != sizeof(boot))
return false;
 
if (memcmp(, BZIMAGE_MAGIC, strlen(BZIMAGE_MAGIC)))
@@ -262,20 +260,17 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
 
if (!boot.hdr.setup_sects)
boot.hdr.setup_sects = BZ_DEFAULT_SETUP_SECTS;
-   setup_sects = boot.hdr.setup_sects + 1;
-
-   setup_size = setup_sects << 9;
+   file_size = (boot.hdr.setup_sects + 1) << 9;
p = guest_real_to_host(kvm, BOOT_LOADER_SELECTOR, BOOT_LOADER_IP);
+   if (read_in_full(fd_kernel, p, file_size) != file_size)
+   die_perror("kernel setup read");
 
-   /* copy setup.bin to mem*/
-   if (read(fd_kernel, p, setup_size) != setup_size)
-   die_perror("read");
-
-   /* copy vmlinux.bin to BZ_KERNEL_START*/
+   /* read actual kernel image (vmlinux.bin) to BZ_KERNEL_START */
p = guest_flat_to_host(kvm, BZ_KERNEL_START);
-
-   while ((nr = read(fd_kernel, p, 65536)) > 0)
-   p += nr;
+   file_size = read_file(fd_kernel, p,
+ kvm->cfg.ram_size - BZ_KERNEL_START);
+   if (file_size < 0)
+   die_perror("kernel read");
 
p = guest_flat_to_host(kvm, BOOT_CMDLINE_OFFSET);
if (kernel_cmdline) {
@@ -287,7 +282,6 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
memcpy(p, kernel_cmdline, cmdline_size - 1);
}
 
-
/* vidmode should be either specified or set by default */
if (kvm->cfg.vnc || kvm->cfg.sdl || kvm->cfg.gtk) {
if (!kvm->cfg.arch.vidmode)
@@ -326,8 +320,7 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
}
 
p = guest_flat_to_host(kvm, addr);
-   nr = read(fd_initrd, p, initrd_stat.st_size);
-   if (nr != initrd_stat.st_size)
+   if (read_in_full(fd_initrd, p, initrd_stat.st_size) < 0)
die("Failed to read initrd");
 
kern_boot->hdr.ramdisk_image= addr;
-- 
2.5.1
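
The setup size computed in the patch above follows the x86 boot
protocol: the header stores the number of 512-byte setup sectors, a
stored value of 0 stands for the default, and one extra sector holds the
boot sector itself. A minimal sketch of that arithmetic, assuming the
default of 4 sectors (taken to match BZ_DEFAULT_SETUP_SECTS);
bzimage_setup_size() and the macro names are hypothetical and not part
of kvmtool:

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_SETUP_SECTS	4	/* assumed value of BZ_DEFAULT_SETUP_SECTS */
#define SECTOR_SHIFT		9	/* 512-byte sectors */

/*
 * Hypothetical helper: number of bytes occupied by the boot sector plus
 * the real-mode setup code, given the setup_sects header field.
 */
size_t bzimage_setup_size(unsigned int setup_sects)
{
	assert(setup_sects <= 0xff);	/* the header field is a single byte */

	if (setup_sects == 0)
		setup_sects = DEFAULT_SETUP_SECTS;

	return ((size_t)setup_sects + 1) << SECTOR_SHIFT;
}
```

For example, a header with setup_sects of 0 or 4 gives 2560 bytes, which
is the amount the loader has to read in full before the protected-mode
kernel image starts.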



[PATCH 0/7] kvmtool: Cleanup kernel loading

2015-10-30 Thread Andre Przywara
Hi,

this series cleans up kvmtool's kernel loading functionality a bit.
It has been broken out of a previous series I sent [1] and contains
just the cleanup and bug fix parts, which should be less controversial
and thus easier to merge ;-)
I will resend the pipe loading part later on as a separate series.

The first patch properly abstracts kernel loading to move
responsibility into each architecture's code. It removes quite a bit of
ugly code from the generic kvm.c file.
The later patches address the naive usage of read(2) to, well, read
data from files. Doing this without coping with the subtleties of
the UNIX read semantics (returning with less data than requested, or
none at all, is not an error) can provoke hard-to-debug failures.
So these patches make use of the existing and one new wrapper function
to make sure we read everything we actually wanted to.
The last patch moves the ARM kernel loading code into the proper
location to be in line with the other architectures.
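
To illustrate the read(2) point: a loop along the following lines copes
with short reads and with EINTR. It is only a sketch of the idea behind
such a wrapper, not kvmtool's read_in_full(); the name read_all() and
its exact return convention are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not kvmtool's implementation: keep calling read(2)
 * until `count` bytes have arrived, EOF is hit, or a real error occurs.
 * Returns the number of bytes read (short only at EOF) or -1 with errno set.
 */
ssize_t read_all(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t total = 0;

	assert(buf != NULL || count == 0);

	while (total < count) {
		ssize_t nr = read(fd, p + total, count - total);

		if (nr < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;		/* genuine I/O error */
		}
		if (nr == 0)
			break;			/* EOF */
		total += (size_t)nr;		/* short read: keep going */
	}

	return (ssize_t)total;
}
```

A single read() on a pipe or socket may legitimately return fewer bytes
than requested, which is why comparing one read() result against the
expected size fails intermittently.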

Please have a look and give some comments!

Find the branch on my kvmtool git tree on:
git://linux-arm.org/kvmtool.git (kern_load-v2 branch)
http://www.linux-arm.org/git?p=kvmtool.git;a=shortlog;h=refs/heads/kern_load-v2

Cheers,
Andre.

[1] http://marc.info/?l=kvm&m=143825354808135&w=2

Andre Przywara (7):
  Refactor kernel image loading
  provide generic read_file() implementation
  powerpc: use read_file() in kernel and initrd loading
  MIPS: use read wrappers in kernel loading
  x86: use read wrappers in kernel loading
  arm/arm64: use read_file() in kernel and initrd loading
  arm: move kernel loading into arm/kvm.c

 arm/fdt.c| 99 +---
 arm/kvm.c| 88 ++
 include/kvm/kvm.h|  5 +--
 include/kvm/read-write.h |  2 +
 kvm.c| 42 ++--
 mips/kvm.c   | 57 ++--
 powerpc/kvm.c| 39 ++-
 util/read-write.c| 21 ++
 x86/kvm.c| 62 +++---
 9 files changed, 207 insertions(+), 208 deletions(-)

-- 
2.5.1



[PATCH 1/7] Refactor kernel image loading

2015-10-30 Thread Andre Przywara
Let's face it: Kernel loading is quite architecture specific. Don't
claim otherwise and move the loading routines into each
architecture's responsibility.
This introduces kvm__arch_load_kernel(), which each architecture can
implement accordingly.
Provide bzImage loading for x86 and ELF loading for MIPS as special
cases for those architectures (removing the arch specific code from
the generic kvm.c file on the way) and rename the existing "flat binary"
loader functions for the other architectures to the new name.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 arm/fdt.c |  4 ++--
 include/kvm/kvm.h |  5 ++---
 kvm.c | 42 --
 mips/kvm.c| 23 +++
 powerpc/kvm.c |  3 ++-
 x86/kvm.c | 27 +--
 6 files changed, 46 insertions(+), 58 deletions(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index 3657108..ec7453f 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -239,8 +239,8 @@ static int read_image(int fd, void **pos, void *limit)
 
 #define FDT_ALIGN  SZ_2M
 #define INITRD_ALIGN   4
-int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd,
-const char *kernel_cmdline)
+bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
+const char *kernel_cmdline)
 {
void *pos, *kernel_end, *limit;
unsigned long guest_addr;
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 37155db..055a7a2 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -111,9 +111,8 @@ void kvm__arch_read_term(struct kvm *kvm);
 void *guest_flat_to_host(struct kvm *kvm, u64 offset);
 u64 host_to_guest_flat(struct kvm *kvm, void *ptr);
 
-int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
-int load_elf_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
-bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
+bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
+const char *kernel_cmdline);
 
 /*
  * Debugging
diff --git a/kvm.c b/kvm.c
index 10ed230..ca7dfee 100644
--- a/kvm.c
+++ b/kvm.c
@@ -341,18 +341,6 @@ static bool initrd_check(int fd)
!memcmp(id, CPIO_MAGIC, 4);
 }
 
-int __attribute__((__weak__)) load_elf_binary(struct kvm *kvm, int fd_kernel,
-   int fd_initrd, const char *kernel_cmdline)
-{
-   return false;
-}
-
-bool __attribute__((__weak__)) load_bzimage(struct kvm *kvm, int fd_kernel,
-   int fd_initrd, const char *kernel_cmdline)
-{
-   return false;
-}
-
 bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
const char *initrd_filename, const char *kernel_cmdline)
 {
@@ -372,40 +360,18 @@ bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
die("%s is not an initrd", initrd_filename);
}
 
-#ifdef CONFIG_X86
-   ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline);
-
-   if (ret)
-   goto found_kernel;
-
-   pr_warning("%s is not a bzImage. Trying to load it as a flat binary...", kernel_filename);
-#endif
-
-   ret = load_elf_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
-
-   if (ret)
-   goto found_kernel;
-
-   ret = load_flat_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
-
-   if (ret)
-   goto found_kernel;
+   ret = kvm__arch_load_kernel_image(kvm, fd_kernel, fd_initrd,
+ kernel_cmdline);
 
if (initrd_filename)
close(fd_initrd);
close(fd_kernel);
 
-   die("%s is not a valid bzImage or flat binary", kernel_filename);
-
-found_kernel:
-   if (initrd_filename)
-   close(fd_initrd);
-   close(fd_kernel);
-
+   if (!ret)
+   die("%s is not a valid kernel image", kernel_filename);
return ret;
 }
 
-
 void kvm__dump_mem(struct kvm *kvm, unsigned long addr, unsigned long size, int debug_fd)
 {
unsigned char *p;
diff --git a/mips/kvm.c b/mips/kvm.c
index 1925f38..c1c596c 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -163,7 +163,8 @@ static void kvm__mips_install_cmdline(struct kvm *kvm)
 
 /* Load at the 1M point. */
 #define KERNEL_LOAD_ADDR 0x100
-int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
+
+static bool load_flat_binary(struct kvm *kvm, int fd_kernel)
 {
void *p;
void *k_start;
@@ -281,7 +282,7 @@ static bool kvm__arch_get_elf_32_info(Elf32_Ehdr *ehdr, int fd_kernel,
return true;
 }
 
-int load_elf_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
+static bool load_elf
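
After this refactoring the generic code calls one per-architecture hook,
and the hook decides which image formats to try (bzImage then flat
binary on x86, ELF on MIPS). A minimal sketch of that "try each loader
in order" pattern follows. All names here (image_loader_fn,
try_loaders() and the two stand-in loaders) are hypothetical and not
part of kvmtool; a real loader would also rewind the file descriptor
before probing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A loader returns true if it recognised and loaded the image. */
typedef bool (*image_loader_fn)(int fd_kernel);

/*
 * Hypothetical arch hook body: try the given loaders in order and stop
 * at the first one that accepts the image.
 */
bool try_loaders(const image_loader_fn *loaders, size_t n, int fd_kernel)
{
	size_t i;

	assert(loaders != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (loaders[i](fd_kernel))
			return true;
	}

	return false;	/* the caller reports "not a valid kernel image" */
}

/* Stand-ins for format-specific loaders, for demonstration only. */
bool reject_image(int fd_kernel)
{
	(void)fd_kernel;
	return false;
}

bool accept_image(int fd_kernel)
{
	(void)fd_kernel;
	return true;
}
```

With the format knowledge kept inside the hook, the generic caller only
has to check one boolean result and close the file descriptors once.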

[PATCH 4/7] MIPS: use read wrappers in kernel loading

2015-10-30 Thread Andre Przywara
Replace the unsafe read-loops used in the MIPS kernel image loading
with our safe read_file() and read_in_full() wrappers.
This should fix random failures in kernel image loading, especially
from pipes and sockets.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 mips/kvm.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/mips/kvm.c b/mips/kvm.c
index c1c596c..8fbf8de 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -168,20 +168,27 @@ static bool load_flat_binary(struct kvm *kvm, int fd_kernel)
 {
void *p;
void *k_start;
-   int nr;
+   ssize_t kernel_size;
 
if (lseek(fd_kernel, 0, SEEK_SET) < 0)
die_perror("lseek");
 
p = k_start = guest_flat_to_host(kvm, KERNEL_LOAD_ADDR);
 
-   while ((nr = read(fd_kernel, p, 65536)) > 0)
-   p += nr;
+   kernel_size = read_file(fd_kernel, p,
+   kvm->cfg.ram_size - KERNEL_LOAD_ADDR);
+   if (kernel_size == -1) {
+   if (errno == ENOMEM)
+   die("kernel too big for guest memory");
+   else
+   die_perror("kernel read");
+   }
 
kvm->arch.is64bit = true;
kvm->arch.entry_point = 0x8100ull;
 
-   pr_info("Loaded kernel to 0x%x (%ld bytes)", KERNEL_LOAD_ADDR, (long int)(p - k_start));
+   pr_info("Loaded kernel to 0x%x (%zd bytes)", KERNEL_LOAD_ADDR,
+   kernel_size);
 
return true;
 }
@@ -197,7 +204,6 @@ static bool kvm__arch_get_elf_64_info(Elf64_Ehdr *ehdr, int fd_kernel,
  struct kvm__arch_elf_info *ei)
 {
int i;
-   size_t nr;
Elf64_Phdr phdr;
 
if (ehdr->e_phentsize != sizeof(phdr)) {
@@ -212,8 +218,7 @@ static bool kvm__arch_get_elf_64_info(Elf64_Ehdr *ehdr, int fd_kernel,
 
phdr.p_type = PT_NULL;
for (i = 0; i < ehdr->e_phnum; i++) {
-   nr = read(fd_kernel, &phdr, sizeof(phdr));
-   if (nr != sizeof(phdr)) {
+   if (read_in_full(fd_kernel, &phdr, sizeof(phdr)) != sizeof(phdr)) {
pr_info("Couldn't read %d bytes for ELF PHDR.", (int)sizeof(phdr));
return false;
}
@@ -243,7 +248,6 @@ static bool kvm__arch_get_elf_32_info(Elf32_Ehdr *ehdr, int fd_kernel,
  struct kvm__arch_elf_info *ei)
 {
int i;
-   size_t nr;
Elf32_Phdr phdr;
 
if (ehdr->e_phentsize != sizeof(phdr)) {
@@ -258,8 +262,7 @@ static bool kvm__arch_get_elf_32_info(Elf32_Ehdr *ehdr, int fd_kernel,
 
phdr.p_type = PT_NULL;
for (i = 0; i < ehdr->e_phnum; i++) {
-   nr = read(fd_kernel, &phdr, sizeof(phdr));
-   if (nr != sizeof(phdr)) {
+   if (read_in_full(fd_kernel, &phdr, sizeof(phdr)) != sizeof(phdr)) {
pr_info("Couldn't read %d bytes for ELF PHDR.", (int)sizeof(phdr));
return false;
}
@@ -334,14 +337,11 @@ static bool load_elf_binary(struct kvm *kvm, int fd_kernel)
p = guest_flat_to_host(kvm, ei.load_addr);
 
pr_info("ELF Loading 0x%lx bytes from 0x%llx to 0x%llx",
-   (unsigned long)ei.len, (unsigned long long)ei.offset, (unsigned long long)ei.load_addr);
-   do {
-   nr = read(fd_kernel, p, ei.len);
-   if (nr < 0)
-   die_perror("read");
-   p += nr;
-   ei.len -= nr;
-   } while (ei.len);
+   (unsigned long)ei.len, (unsigned long long)ei.offset,
+   (unsigned long long)ei.load_addr);
+
+   if (read_in_full(fd_kernel, p, ei.len) != (ssize_t)ei.len)
+   die_perror("read");
 
return true;
 }
-- 
2.5.1



[PATCH 2/7] provide generic read_file() implementation

2015-10-30 Thread Andre Przywara
In various parts of kvmtool we simply try to read files into memory,
but fail to do so in a safe way. The read(2) syscall can return early
having only parts of the file read, or it may return -1 due to being
interrupted by a signal (in which case we should simply retry).
The ARM code seems to provide the only safe implementation, so take
that as an inspiration to provide a generic read_file() function
usable by every part of kvmtool.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 include/kvm/read-write.h |  2 ++
 util/read-write.c| 21 +
 2 files changed, 23 insertions(+)

diff --git a/include/kvm/read-write.h b/include/kvm/read-write.h
index 67571f9..acbd6f0 100644
--- a/include/kvm/read-write.h
+++ b/include/kvm/read-write.h
@@ -12,6 +12,8 @@
 ssize_t xread(int fd, void *buf, size_t count);
 ssize_t xwrite(int fd, const void *buf, size_t count);
 
+ssize_t read_file(int fd, char *buf, size_t max_size);
+
 ssize_t read_in_full(int fd, void *buf, size_t count);
 ssize_t write_in_full(int fd, const void *buf, size_t count);
 
diff --git a/util/read-write.c b/util/read-write.c
index 44709df..bf6fb2f 100644
--- a/util/read-write.c
+++ b/util/read-write.c
@@ -32,6 +32,27 @@ restart:
return nr;
 }
 
+/*
+ * Read in the whole file while not exceeding max_size bytes of the buffer.
+ * Returns -1 (with errno set) in case of an error (ENOMEM if buffer was
+ * too small) or the filesize if the whole file could be read.
+ */
+ssize_t read_file(int fd, char *buf, size_t max_size)
+{
+   ssize_t ret;
+   char dummy;
+
+   errno = 0;
+   ret = read_in_full(fd, buf, max_size);
+
+   /* Probe whether we reached EOF. */
+   if (xread(fd, &dummy, 1) == 0)
+   return ret;
+
+   errno = ENOMEM;
+   return -1;
+}
+
 ssize_t read_in_full(int fd, void *buf, size_t count)
 {
ssize_t total = 0;
-- 
2.5.1
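
The contract described in the comment above (file size on success, -1
with ENOMEM when the file does not fit into max_size bytes) can be
exercised with a standalone sketch. This is not the patch's code:
read_whole_file() is a hypothetical name, it uses plain read(2) instead
of the kvmtool wrappers, and unlike the patch it also passes genuine
read errors through with their original errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for read_file(): read until EOF into buf, using
 * at most max_size bytes. Returns the file size, or -1 with errno set
 * (ENOMEM if data is left over after the buffer has been filled).
 */
ssize_t read_whole_file(int fd, char *buf, size_t max_size)
{
	size_t total = 0;
	ssize_t nr;
	char probe;

	assert(buf != NULL || max_size == 0);

	while (total < max_size) {
		nr = read(fd, buf + total, max_size - total);
		if (nr < 0) {
			if (errno == EINTR)
				continue;
			return -1;		/* real I/O error, errno kept */
		}
		if (nr == 0)
			return (ssize_t)total;	/* EOF: the whole file fit */
		total += (size_t)nr;
	}

	/* Buffer is full: probe one more byte to see whether we hit EOF. */
	do {
		nr = read(fd, &probe, 1);
	} while (nr < 0 && errno == EINTR);

	if (nr == 0)
		return (ssize_t)total;		/* exact fit */
	if (nr > 0)
		errno = ENOMEM;			/* more data than buffer space */
	return -1;
}
```

Callers can then tell "image too big for the reserved region" apart from
other failures by checking errno for ENOMEM, which is how the
architecture patches in this series use the function.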



Re: Network hangs when communicating with host

2015-10-19 Thread Andre Przywara
Hi Dmitry,

On 19/10/15 10:05, Dmitry Vyukov wrote:
> On Fri, Oct 16, 2015 at 7:25 PM, Sasha Levin  wrote:
>> On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
>>> Hello,
>>>
>>> I am trying to run a program in lkvm sandbox so that it communicates
>>> with a program on host. I run lkvm as:
>>>
>>> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
>>> /arch/x86/boot/bzImage --network mode=user -- /my_prog
>>>
>>> /my_prog then connects to a program on host over a tcp socket.
>>> I see that host receives some data, sends some data back, but then
>>> my_prog hangs on network read.
>>>
>>> To localize this I wrote 2 programs (attached). ping is run on host
>>> and pong is run from lkvm sandbox. They successfully establish tcp
>>> connection, but after some iterations both hang on read.
>>>
>>> Networking code in Go runtime has been there for more than 3 years, is
>>> widely used in production and does not have any known bugs. However, it uses
>>> epoll edge-triggered readiness notifications that are known to be tricky.
>>> Is it possible that lkvm contains some networking bug? Can it be
>>> related to the data races in lkvm I reported earlier today?

Just to let you know:
I think we have seen networking issues in the past - root over NFS had
issues IIRC. Will spent some time debugging this and it looked like a
race condition in kvmtool's virtio implementation. I think pinning
kvmtool's virtio threads to one host core made this go away. However,
although he tried hard (even by Will's standards!), he couldn't find
the real root cause or a fix at the time he looked at it, and we found
other ways to work around the issues (using virtio-blk or initrds).

So it's quite possible that there are issues. I haven't had time yet to
look at your sanitizer reports, but it looks like a promising approach
to find the root cause.

Cheers,
Andre.

>>>
>>> I am on commit 3695adeb227813d96d9c41850703fb53a23845eb.
>>
>> Hey Dmitry,
>>
>> How long does it take to reproduce? I've been running ping/pong as you've
>> described and it looks like it doesn't get stuck (read/writes keep going
>> on both sides).
>>
>> Can you share your guest kernel config maybe?
> 
> 
> Hmm, in my setup it happens within a minute or so.
> 
> My kernel is not completely standard, but it works with qemu without
> any problems.
> It is not trivial to reproduce, but FWIW I am on commit
> f9fbf6b72ffaaca8612979116c872c9d5d9cc1f5 of
> https://github.com/dvyukov/linux/commits/coverage branch. Config file
> is attached. Then, I build it with custom gcc: revision 228818 +
> https://codereview.appspot.com/267910043 patch. This is all per
> https://github.com/google/syzkaller instructions.
> 
> I run lkvm as:
> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
> /arch/x86/boot/bzImage --network mode=user -- /pong
> 
> kvmtool is on 3695adeb227813d96d9c41850703fb53a23845eb.
> 
> Just tried to do the same with qemu, it does not hang.
> 


Re: [PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation

2015-10-12 Thread Andre Przywara
Hej,

On 10/10/15 16:37, Christoffer Dall wrote:
> Hi Andre,
> 
> 
> On Wed, Oct 07, 2015 at 03:55:10PM +0100, Andre Przywara wrote:
>> Hi,
>>
>> another respin and rebase of the ITS emulation series.
>> Major changes compared to v2 (beside some minor things like added
>> comments and function renames) are the rebasing and adaption to 4.3-rc
>> and Christoffer's timer rework series. Also the locking has been
>> reworked to cope with the dependencies of the its and the dist lock
>> in connection with the PROPBASER/PENDBASER and the command handling.
>> For a more detailed changelog see below or look at the respective
>> commit messages.
>>
>> This should address most of the comments I got on the list.
>> Many thanks to the diligent reviewers!
>> I didn't bother to fine-tune patch 01/16 too much, as I guess there
>> will be more discussion around this based on Pavel's latest post.
>>
>> These patches go on top of Christoffer's timer rework series [1],
>> which itself is on top of 4.3-rc2.
>> You can find all of this code in the its-emul/v3 branch of my
>> repository [2].
> 
> Thanks for rebasing the series!
> 
> Just a heads up that I may not be able to review this series for the
> next 1-2 weeks, so I'm afraid it's not going to make it in for v4.4,
> sorry.
> 
> Please let me know if this breaks expectations from everyone.

No worries, I wasn't expecting this for 4.4 anyway.
I'd rather see the prerequisites like your timer series going upstream
first, I will then rebase it on top of 4.4-rc1 (with fixes from newer
review comments incorporated).
Maybe we can take Pavel's cleanup (replacing my 1/16 and 2/16) for 4.4
already? (I will reply on those soon)
Also what is the status of Eric's IRQ routing support? Should this go in
first now?

Cheers,
Andre.

> Otherwise, I will try to review it with due diligence so it makes it in for
> v4.5.
> 
> Best,
> -Christoffer


Re: [PATCH v3 12/16] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-10-12 Thread Andre Przywara
Hi Pavel,

On 12/10/15 08:40, Pavel Fedin wrote:
>  Hello!
> 
>> -Original Message-
>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Andre Przywara
>> Sent: Wednesday, October 07, 2015 5:55 PM
>> To: marc.zyng...@arm.com; christoffer.d...@linaro.org
>> Cc: eric.au...@linaro.org; p.fe...@samsung.com; kvm...@lists.cs.columbia.edu; linux-arm-ker...@lists.infradead.org; kvm@vger.kernel.org
>> Subject: [PATCH v3 12/16] KVM: arm64: handle pending bit for LPIs in ITS emulation
>>
>> As the actual LPI number in a guest can be quite high, but is mostly
>> assigned using a very sparse allocation scheme, bitmaps and arrays
>> for storing the virtual interrupt status are a waste of memory.
>> We use our equivalent of the "Interrupt Translation Table Entry"
>> (ITTE) to hold this extra status information for a virtual LPI.
>> As the normal VGIC code cannot use its fancy bitmaps to manage
>> pending interrupts, we provide a hook in the VGIC code to let the
>> ITS emulation handle the list register queueing itself.
>> LPIs are located in a separate number range (>=8192), so
>> distinguishing them is easy. With LPIs being only edge-triggered, we
>> get away with a less complex IRQ handling.
>> We extend the number of bits for storing the IRQ number in our
>> LR struct to 16 to cover the LPI numbers we support as well.
>>
>> Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
>> ---
>> Changelog v2..v3:
>> - extend LR data structure to hold 16-bit wide IRQ IDs
>> - only clear pending bit if IRQ could be queued
>> - adapt __kvm_vgic_sync_hwstate() to upstream changes
>>
>>  include/kvm/arm_vgic.h  |  4 +-
>>  virt/kvm/arm/its-emul.c | 75 
>>  virt/kvm/arm/its-emul.h |  3 ++
>>  virt/kvm/arm/vgic-v3-emul.c |  2 +
>>  virt/kvm/arm/vgic.c | 93 +++--
>>  5 files changed, 148 insertions(+), 29 deletions(-)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index c3eb414..035911f 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -95,7 +95,7 @@ enum vgic_type {
>>  #define LR_HW   (1 << 3)
>>
>>  struct vgic_lr {
>> -unsigned irq:10;
>> +unsigned irq:16;
>>  union {
>>  unsigned hwirq:10;
>>  unsigned source:3;
>> @@ -147,6 +147,8 @@ struct vgic_vm_ops {
>>  int (*init_model)(struct kvm *);
>>  void(*destroy_model)(struct kvm *);
>>  int (*map_resources)(struct kvm *, const struct vgic_params *);
>> +bool(*queue_lpis)(struct kvm_vcpu *);
>> +void(*unqueue_lpi)(struct kvm_vcpu *, int irq);
>>  };
>>
>>  struct vgic_io_device {
>> diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
>> index bab8033..8349970 100644
>> --- a/virt/kvm/arm/its-emul.c
>> +++ b/virt/kvm/arm/its-emul.c
>> @@ -59,8 +59,27 @@ struct its_itte {
>>  struct its_collection *collection;
>>  u32 lpi;
>>  u32 event_id;
>> +bool enabled;
>> +unsigned long *pending;
>>  };
>>
>> +/* To be used as an iterator this macro misses the enclosing parentheses */
>> +#define for_each_lpi(dev, itte, kvm) \
>> +list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
>> +list_for_each_entry(itte, &(dev)->itt, itte_list)
>> +
>> +static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
>> +{
>> +struct its_device *device;
>> +struct its_itte *itte;
>> +
>> +for_each_lpi(device, itte, kvm) {
>> +if (itte->lpi == lpi)
>> +return itte;
>> +}
>> +return NULL;
>> +}
>> +
>>  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
>>
>>  /* The distributor lock is held by the VGIC MMIO handler. */
>> @@ -154,9 +173,65 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
>>  return false;
>>  }
>>
>> +/*
>> + * Find all enabled and pending LPIs and queue them into the list
>> + * registers.
>> + * The dist lock is held by the caller.
>> + */
>> +bool vits_queue_lpis(struct kvm_vcpu *vcpu)
>> +{
>> +struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
>> +struct its_device *device;
>> +struct its_itte *itte;
>> +bool ret = true;
>&g
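
The quoted commit message argues that LPIs live in a sparse range
starting at 8192, so their state is kept per translation entry and found
by walking a list instead of indexing a bitmap. A userspace sketch of
that lookup, under stated assumptions: struct itte_sketch, is_lpi() and
find_by_lpi() are hypothetical names, and the two nested kernel lists
(devices, then ITTEs per device) are flattened into one singly linked
list for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIRST_LPI	8192	/* from the commit message: LPIs are >= 8192 */

/* Simplified stand-in for a translation entry carrying per-LPI state. */
struct itte_sketch {
	uint32_t lpi;
	bool enabled;
	bool pending;
	struct itte_sketch *next;
};

/* LPIs are recognised purely by their interrupt number. */
bool is_lpi(uint32_t irq)
{
	return irq >= FIRST_LPI;
}

/* Linear search: memory grows with mapped LPIs, not with the ID space. */
struct itte_sketch *find_by_lpi(struct itte_sketch *head, uint32_t lpi)
{
	assert(is_lpi(lpi));

	for (; head != NULL; head = head->next) {
		if (head->lpi == lpi)
			return head;
	}

	return NULL;
}
```

The trade-off is a linear lookup in exchange for not allocating status
bits for thousands of interrupt numbers that are never mapped.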

Re: [PATCH 1/2] KVM: arm/arm64: Optimize away redundant LR tracking

2015-10-12 Thread Andre Przywara
Hi,

On 02/10/15 15:44, Pavel Fedin wrote:
> Currently we use vgic_irq_lr_map in order to track which LRs hold which
> IRQs, and lr_used bitmap in order to track which LRs are used or free.
> 
> vgic_irq_lr_map is actually used only for piggy-back optimization, and
> can be easily replaced by iteration over lr_used. This is good because in
> future, when LPI support is introduced, number of IRQs will grow up to at
> least 16384, while numbers from 1024 to 8192 are never going to be used.
> This would be a huge memory waste.
> 
> In its turn, lr_used is also completely redundant since
> ae705930fca6322600690df9dc1c7d0516145a93 ("arm/arm64: KVM: Keep elrsr/aisr
> in sync with software model"), because together with lr_used we also update
> elrsr. This allows to easily replace lr_used with elrsr, inverting all
> conditions (because in elrsr '1' means 'free').

So this looks pretty good to me, probably a better (because less
intrusive) solution than my first two patches of the ITS emulation,
which have a very similar scope.
I will give this some testing on my boxes here to spot any regressions,
but I guess I will use these two patches as the base for my next version
of the ITS emulation series.

Christoffer, Marc, do you consider these for 4.4 (since they are an
independent cleanup) or do you want them to be part of the ITS emulation
series since they make more sense in there?

Cheers,
Andre.

> 
> Signed-off-by: Pavel Fedin 
> ---
>  include/kvm/arm_vgic.h |  6 
>  virt/kvm/arm/vgic.c| 74 
> +++---
>  2 files changed, 28 insertions(+), 52 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 4e14dac..d908028 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -296,9 +296,6 @@ struct vgic_v3_cpu_if {
>  };
>  
>  struct vgic_cpu {
> - /* per IRQ to LR mapping */
> - u8  *vgic_irq_lr_map;
> -
>   /* Pending/active/both interrupts on this VCPU */
>   DECLARE_BITMAP( pending_percpu, VGIC_NR_PRIVATE_IRQS);
>   DECLARE_BITMAP( active_percpu, VGIC_NR_PRIVATE_IRQS);
> @@ -309,9 +306,6 @@ struct vgic_cpu {
>   unsigned long   *active_shared;
>   unsigned long   *pend_act_shared;
>  
> - /* Bitmap of used/free list registers */
> - DECLARE_BITMAP( lr_used, VGIC_V2_MAX_LRS);
> -
>   /* Number of list registers on this CPU */
>   int nr_lr;
>  
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 6bd1c9b..2f4d25a 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -102,9 +102,10 @@
>  #include "vgic.h"
>  
>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
> -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
> +static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu);
>  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
>  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr 
> lr_desc);
> +static u64 vgic_get_elrsr(struct kvm_vcpu *vcpu);
>  static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>   int virt_irq);
>  
> @@ -683,9 +684,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio 
> *mmio,
>  void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
>  {
>   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> + u64 elrsr = vgic_get_elrsr(vcpu);
> + unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
>   int i;
>  
> - for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
> + for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) {
>   struct vgic_lr lr = vgic_get_lr(vcpu, i);
>  
>   /*
> @@ -728,7 +731,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
>* Mark the LR as free for other use.
>*/
>   BUG_ON(lr.state & LR_STATE_MASK);
> - vgic_retire_lr(i, lr.irq, vcpu);
> + vgic_retire_lr(i, vcpu);
>   vgic_irq_clear_queued(vcpu, lr.irq);
>  
>   /* Finally update the VGIC state. */
> @@ -1087,15 +1090,12 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu)
>   vgic_ops->enable(vcpu);
>  }
>  
> -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
> +static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu)
>  {
> - struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>   struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
>  
>   vlr.state = 0;
>   vgic_set_lr(vcpu, lr_nr, vlr);
> - clear_bit(lr_nr, vgic_cpu->lr_used);
> - vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY;
>   vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
>  }
>  
> @@ -1110,14 +1110,15 @@ static void vgic_retire_lr(int lr_nr, int irq, struct 
> kvm_vcpu *vcpu)
>   */
>  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
>  {
> - struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> + u64 elrsr = vgic_get_elrsr(vcpu);
> + unsigned 

Re: [PATCH 0/2] KVM: arm/arm64: Clean up some obsolete code

2015-10-08 Thread Andre Przywara
Hi,

On 08/10/15 11:56, Marc Zyngier wrote:
> On 08/10/15 11:14, Christoffer Dall wrote:
>> Hi Pavel,
>>
>> On Fri, Oct 02, 2015 at 05:44:27PM +0300, Pavel Fedin wrote:
>>> Current KVM code has lots of old redundancies, which can be cleaned up.
>>> This patchset is actually a better alternative to
>>> http://www.spinics.net/lists/arm-kernel/msg430726.html, which allows to
>>> keep piggy-backed LRs. The idea is based on the fact that our code also
>>> maintains LR state in elrsr, and this information is enough to track LR
>>> usage.
>>>
>>> This patchset is made against linux-next of 02.10.2015. Thanks to Andre
>>> for pointing out some 4.3 specifics.
>>>
>> I'm not opposed to these changes, they clean up the data structures
>> which is definitely a good thing.
>>
>> I am a bit worried about how/if this is going to conflict with the ITS
>> series and other patches in flight touching the vgic.
>>
>> Marc/Andre, any thoughts on this?
> 
> I don't mind the simplification (Andre was already removing the
> piggybacking stuff as part of his ITS series). I'm a bit more cautious
> about the sync_elrsr stuff, but that's mostly because I've only read the
> patch in a superficial way.
> 
> But yes, this is probably going to clash, unless we make this part of an
> existing series (/me looks at André... ;-)

Yes, I am looking at merging this. From the discussion with Pavel I
remember some things that I disagreed with, so I may propose a follow-up
patch. I will give this a try tomorrow.

Cheers,
Andre.


[PATCH v3 13/16] KVM: arm64: sync LPI configuration and pending tables

2015-10-07 Thread Andre Przywara
The LPI configuration and pending tables of the GICv3 LPIs are held
in tables in (guest) memory. To achieve reasonable performance, we
cache this data in our own data structures, so we need to sync those
two views from time to time. This behaviour is well described in the
GICv3 spec and is also exercised by hardware, so the sync points are
well known.

Provide functions that read the guest memory and store the
information from the configuration and pending tables in the kernel.
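
As an illustration of what gets cached per LPI, here is a minimal
userspace sketch of decoding one byte of the configuration table. It
assumes bit 0 is the enable bit and bits [7:2] hold the priority, as the
LPI_PROP_* macros in this patch suggest; struct lpi_config and
decode_lpi_prop() are hypothetical names, not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of one LPI configuration byte. */
#define LPI_PROP_ENABLED   0x01u  /* assumption: bit 0 enables the LPI */
#define LPI_PROP_PRIO_MASK 0xfcu  /* bits [7:2] carry the priority */

/* Hypothetical cache entry, standing in for the fields kept in the ITTE. */
struct lpi_config {
    uint8_t priority;
    bool enabled;
};

/* Split one configuration byte into its priority and enable bit. */
static struct lpi_config decode_lpi_prop(uint8_t prop)
{
    struct lpi_config cfg = {
        .priority = (uint8_t)(prop & LPI_PROP_PRIO_MASK),
        .enabled  = (prop & LPI_PROP_ENABLED) != 0,
    };
    return cfg;
}
```

A byte of 0xa1 would decode to priority 0xa0 with the LPI enabled.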

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- rework functions to avoid propbaser/pendbaser accesses inside lock

 include/kvm/arm_vgic.h  |   2 +
 virt/kvm/arm/its-emul.c | 133 
 virt/kvm/arm/its-emul.h |   3 ++
 3 files changed, 138 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 035911f..4ea023c 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -179,6 +179,8 @@ struct vgic_its {
int cwriter;
struct list_headdevice_list;
struct list_headcollection_list;
+   /* memory used for buffering guest's memory */
+   void*buffer_page;
 };
 
 struct vgic_dist {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 8349970..7a8c5db 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -59,6 +59,7 @@ struct its_itte {
struct its_collection *collection;
u32 lpi;
u32 event_id;
+   u8 priority;
bool enabled;
unsigned long *pending;
 };
@@ -80,8 +81,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, 
int lpi)
return NULL;
 }
 
+#define LPI_PROP_ENABLE_BIT(p) ((p) & LPI_PROP_ENABLED)
+#define LPI_PROP_PRIORITY(p)   ((p) & 0xfc)
+
+/* stores the priority and enable bit for a given LPI */
+static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 prop)
+{
+   itte->priority = LPI_PROP_PRIORITY(prop);
+   itte->enabled  = LPI_PROP_ENABLE_BIT(prop);
+}
+
+#define GIC_LPI_OFFSET 8192
+
+/* We scan the table in chunks the size of the smallest page size */
+#define CHUNK_SIZE 4096U
+
 #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
 
+static int nr_idbits_propbase(u64 propbaser)
+{
+   int nr_idbits = (1U << (propbaser & 0x1f)) + 1;
+
+   return max(nr_idbits, INTERRUPT_ID_BITS_ITS);
+}
+
+/*
+ * Scan the whole LPI configuration table and put the LPI configuration
+ * data in our own data structures. This relies on the LPI being
+ * mapped before.
+ */
+static bool its_update_lpis_configuration(struct kvm *kvm, u64 prop_base_reg)
+{
+   struct vgic_dist *dist = &kvm->arch.vgic;
+   u8 *prop = dist->its.buffer_page;
+   u32 tsize;
+   gpa_t propbase;
+   int lpi = GIC_LPI_OFFSET;
+   struct its_itte *itte;
+   struct its_device *device;
+   int ret;
+
+   propbase = BASER_BASE_ADDRESS(prop_base_reg);
+   tsize = nr_idbits_propbase(prop_base_reg);
+
+   while (tsize > 0) {
+   int chunksize = min(tsize, CHUNK_SIZE);
+
+   ret = kvm_read_guest(kvm, propbase, prop, chunksize);
+   if (ret)
+   return false;
+
+   spin_lock(&dist->its.lock);
+   /*
+* Updating the status for all allocated LPIs. We catch
+* those LPIs that get disabled. We really don't care
+* about unmapped LPIs, as they need to be updated
+* later manually anyway once they get mapped.
+*/
+   for_each_lpi(device, itte, kvm) {
+   if (itte->lpi < lpi || itte->lpi >= lpi + chunksize)
+   continue;
+
+   update_lpi_config(kvm, itte, prop[itte->lpi - lpi]);
+   }
+   spin_unlock(&dist->its.lock);
+   tsize -= chunksize;
+   lpi += chunksize;
+   propbase += chunksize;
+   }
+
+   return true;
+}
+
+/*
+ * Scan the whole LPI pending table and sync the pending bit in there
+ * with our own data structures. This relies on the LPI being
+ * mapped before.
+ */
+static bool its_sync_lpi_pending_table(struct kvm_vcpu *vcpu, u64 
base_addr_reg)
+{
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+   unsigned long *pendmask = dist->its.buffer_page;
+   u32 nr_lpis = VITS_NR_LPIS;
+   gpa_t pendbase;
+   int lpi = 0;
+   struct its_itte *itte;
+   struct its_device *device;
+   int ret;
+   int lpi_bit, nr_bits;
+
+   pendbase = BASER_BASE_ADDRESS(base_addr_reg);
+
+   while (nr_lpis > 0) {
+   nr_bits = min(nr_lpis, CHUNK_SIZE * 8);
+
+   ret = kvm_read_guest(vcpu->kvm, pendbase, pendmask,
+nr_bits / 8);
+   if (r

[PATCH v3 05/16] KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities

2015-10-07 Thread Andre Przywara
KVM capabilities can be a per-VM property, though ARM/ARM64 currently
does not pass on the VM pointer to the architecture specific
capability handlers.
Add a "struct kvm*" parameter to those functions to later allow proper
per-VM capability reporting.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- none

 arch/arm/include/asm/kvm_host.h   | 2 +-
 arch/arm/kvm/arm.c| 2 +-
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/kvm/reset.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3df1e97..88e84db 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -210,7 +210,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t 
boot_pgd_ptr,
kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
 }
 
-static inline int kvm_arch_dev_ioctl_check_extension(long ext)
+static inline int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 {
return 0;
 }
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6c7f4520..bdbefcd 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -197,7 +197,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = KVM_MAX_VCPUS;
break;
default:
-   r = kvm_arch_dev_ioctl_check_extension(ext);
+   r = kvm_arch_dev_ioctl_check_extension(kvm, ext);
break;
}
return r;
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 4562459..c41e613 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -43,7 +43,7 @@
 
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
-int kvm_arch_dev_ioctl_check_extension(long ext);
+int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext);
 
 struct kvm_arch {
/* The VMID generation used for the virt. memory system */
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 91cf535..4d7f78b4 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -63,7 +63,7 @@ static bool cpu_has_32bit_el1(void)
  * We currently assume that the number of HW registers is uniform
  * across all CPUs (see cpuinfo_sanity_check).
  */
-int kvm_arch_dev_ioctl_check_extension(long ext)
+int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 {
int r;
 
-- 
2.5.1



[PATCH v3 14/16] KVM: arm64: implement ITS command queue command handlers

2015-10-07 Thread Andre Przywara
The connection between a device, an event ID, the LPI number and the
allocated CPU is stored in in-memory tables in a GICv3, but their
format is not specified by the spec. Instead software uses a command
queue in a ring buffer to let the ITS implementation use its own
format.
Implement handlers for the various ITS commands and let them store
the requested relation into our own data structures.
To avoid kmallocs inside the ITS spinlock, we preallocate possibly
needed memory outside of the lock and free that if it turns out to
be not needed (mostly error handling).
Error handling is very basic at this point, as we don't have a good
way of communicating errors to the guest (usually a SError).
The INT command handler is missing at this point, as we gain the
capability of actually injecting MSIs into the guest only later on.
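
To illustrate how a handler pulls its arguments out of a command, here
is a minimal standalone sketch of the field extraction. It assumes each
command is four 64-bit words already in host byte order (the patch
converts with le64_to_cpu()); cmd_field() and the cmd_*() accessors are
hypothetical names modelled on its_cmd_mask_field() in the diff below.

```c
#include <stdint.h>

/*
 * Extract `size` bits starting at bit `shift` from 64-bit word `word`
 * of an ITS command. Assumes host byte order.
 */
static uint64_t cmd_field(const uint64_t *cmd, int word, int shift, int size)
{
    uint64_t mask = (size >= 64) ? ~0ULL : ((1ULL << size) - 1);

    return (cmd[word] >> shift) & mask;
}

/* Accessors using the same word/shift/size triples as the patch. */
static uint8_t cmd_opcode(const uint64_t *cmd)
{
    return (uint8_t)cmd_field(cmd, 0, 0, 8);
}

static uint32_t cmd_device_id(const uint64_t *cmd)
{
    return (uint32_t)cmd_field(cmd, 0, 32, 32);
}

static uint32_t cmd_event_id(const uint64_t *cmd)
{
    return (uint32_t)cmd_field(cmd, 1, 0, 32);
}

static uint16_t cmd_collection(const uint64_t *cmd)
{
    return (uint16_t)cmd_field(cmd, 2, 0, 16);
}
```

A command whose first word is (0x1234ULL << 32) | 0x0a would yield
opcode 0x0a and device ID 0x1234.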

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- adjust handlers to new pendbaser/propbaser locking scheme
- properly free ITTEs (including pending bitmap)
- fix handling of unmapped collections

 include/linux/irqchip/arm-gic-v3.h |   5 +-
 virt/kvm/arm/its-emul.c| 502 -
 virt/kvm/arm/its-emul.h|  11 +
 3 files changed, 516 insertions(+), 2 deletions(-)

diff --git a/include/linux/irqchip/arm-gic-v3.h 
b/include/linux/irqchip/arm-gic-v3.h
index ef274a9..27c0e75 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -255,7 +255,10 @@
  */
 #define GITS_CMD_MAPD  0x08
 #define GITS_CMD_MAPC  0x09
-#define GITS_CMD_MAPVI 0x0a
+#define GITS_CMD_MAPTI 0x0a
+/* older GIC documentation used MAPVI for this command */
+#define GITS_CMD_MAPVI GITS_CMD_MAPTI
+#define GITS_CMD_MAPI  0x0b
 #define GITS_CMD_MOVI  0x01
 #define GITS_CMD_DISCARD   0x0f
 #define GITS_CMD_INV   0x0c
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 7a8c5db..642effb 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -64,6 +65,34 @@ struct its_itte {
unsigned long *pending;
 };
 
+static struct its_device *find_its_device(struct kvm *kvm, u32 device_id)
+{
+   struct vgic_its *its = &kvm->arch.vgic.its;
+   struct its_device *device;
+
+   list_for_each_entry(device, &its->device_list, dev_list)
+   if (device_id == device->device_id)
+   return device;
+
+   return NULL;
+}
+
+static struct its_itte *find_itte(struct kvm *kvm, u32 device_id, u32 event_id)
+{
+   struct its_device *device;
+   struct its_itte *itte;
+
+   device = find_its_device(kvm, device_id);
+   if (device == NULL)
+   return NULL;
+
+   list_for_each_entry(itte, &device->itt, itte_list)
+   if (itte->event_id == event_id)
+   return itte;
+
+   return NULL;
+}
+
 /* To be used as an iterator this macro misses the enclosing parentheses */
 #define for_each_lpi(dev, itte, kvm) \
list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
@@ -81,6 +110,19 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, 
int lpi)
return NULL;
 }
 
+static struct its_collection *find_collection(struct kvm *kvm, int coll_id)
+{
+   struct its_collection *collection;
+
+   list_for_each_entry(collection, &kvm->arch.vgic.its.collection_list,
+   coll_list) {
+   if (coll_id == collection->collection_id)
+   return collection;
+   }
+
+   return NULL;
+}
+
 #define LPI_PROP_ENABLE_BIT(p) ((p) & LPI_PROP_ENABLED)
 #define LPI_PROP_PRIORITY(p)   ((p) & 0xfc)
 
@@ -352,13 +394,471 @@ static void its_free_itte(struct its_itte *itte)
kfree(itte);
 }
 
+static u64 its_cmd_mask_field(u64 *its_cmd, int word, int shift, int size)
+{
+   return (le64_to_cpu(its_cmd[word]) >> shift) & (BIT_ULL(size) - 1);
+}
+
+#define its_cmd_get_command(cmd)   its_cmd_mask_field(cmd, 0,  0,  8)
+#define its_cmd_get_deviceid(cmd)  its_cmd_mask_field(cmd, 0, 32, 32)
+#define its_cmd_get_id(cmd)its_cmd_mask_field(cmd, 1,  0, 32)
+#define its_cmd_get_physical_id(cmd)   its_cmd_mask_field(cmd, 1, 32, 32)
+#define its_cmd_get_collection(cmd)its_cmd_mask_field(cmd, 2,  0, 16)
+#define its_cmd_get_target_addr(cmd)   its_cmd_mask_field(cmd, 2, 16, 32)
+#define its_cmd_get_validbit(cmd)  its_cmd_mask_field(cmd, 2, 63,  1)
+
+/* The DISCARD command frees an Interrupt Translation Table Entry (ITTE). */
+static int vits_cmd_handle_discard(struct kvm *kvm, u64 *its_cmd)
+{
+   struct vgic_its *its = &kvm->arch.vgic.its;
+   u32 device_id;
+   u32 event_id;
+   struct its_itte *itte;
+   int ret 

[PATCH v3 15/16] KVM: arm64: implement MSI injection in ITS emulation

2015-10-07 Thread Andre Przywara
When userland wants to inject an MSI into the guest, we have to use
our data structures to find the LPI number and the VCPU to receive
the interrupt.
Use the wrapper functions to iterate the linked lists and find the
proper Interrupt Translation Table Entry. Then set the pending bit
in this ITTE to be later picked up by the LR handling code. Kick
the VCPU which is meant to handle this interrupt.
We provide a VGIC emulation model specific routine for the actual
MSI injection. The wrapper functions return an error for models not
(yet) implementing MSIs (like the GICv2 emulation).
We also provide the handler for the ITS "INT" command, which allows a
guest to trigger an MSI via the ITS command queue.
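
The lookup described above can be sketched in plain C as follows. This
is a simplified model, not the kernel code: it uses hand-rolled singly
linked lists instead of list_head, a single pending flag instead of the
per-VCPU bitmap, and no locking. All struct and function names here are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct collection {          /* hypothetical: maps to one target VCPU */
    int target_cpu;
    bool mapped;
};

struct itte {                /* simplified translation table entry */
    uint32_t event_id;
    uint32_t lpi;
    bool enabled;
    bool pending;
    struct collection *coll;
    struct itte *next;
};

struct device {
    uint32_t device_id;
    struct itte *itt;        /* per-device list of ITTEs */
    struct device *next;
};

/* Walk the device list, then that device's ITT, to find the entry. */
static struct itte *find_itte(struct device *devs, uint32_t device_id,
                              uint32_t event_id)
{
    for (struct device *d = devs; d; d = d->next) {
        if (d->device_id != device_id)
            continue;
        for (struct itte *i = d->itt; i; i = i->next)
            if (i->event_id == event_id)
                return i;
        return NULL;
    }
    return NULL;
}

/*
 * Mark the LPI pending. Returns the CPU to kick, or -1 if nothing has
 * to be delivered right now (unmapped or currently disabled).
 */
static int inject_msi(struct device *devs, uint32_t device_id,
                      uint32_t event_id)
{
    struct itte *itte = find_itte(devs, device_id, event_id);

    if (!itte || !itte->coll || !itte->coll->mapped)
        return -1;           /* unmapped IRQs are silently dropped */

    itte->pending = true;
    return itte->enabled ? itte->coll->target_cpu : -1;
}
```

Note that a disabled LPI still has its pending state recorded; it just
does not cause a kick.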

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- proper checking for unmapped collections

 include/kvm/arm_vgic.h  |  1 +
 virt/kvm/arm/its-emul.c | 65 +
 virt/kvm/arm/its-emul.h |  2 ++
 virt/kvm/arm/vgic-v3-emul.c |  1 +
 4 files changed, 69 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4ea023c..7911059 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -149,6 +149,7 @@ struct vgic_vm_ops {
int (*map_resources)(struct kvm *, const struct vgic_params *);
bool(*queue_lpis)(struct kvm_vcpu *);
void(*unqueue_lpi)(struct kvm_vcpu *, int irq);
+   int (*inject_msi)(struct kvm *, struct kvm_msi *);
 };
 
 struct vgic_io_device {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 642effb..cd8526a 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -333,6 +333,55 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
 }
 
 /*
+ * Translates an incoming MSI request into the redistributor (=VCPU) and
+ * the associated LPI number. Sets the LPI pending bit and also marks the
+ * VCPU as having a pending interrupt.
+ */
+int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
+{
+   struct vgic_dist *dist = &kvm->arch.vgic;
+   struct vgic_its *its = &dist->its;
+   struct its_itte *itte;
+   int cpuid;
+   bool inject = false;
+   int ret = 0;
+
+   if (!vgic_has_its(kvm))
+   return -ENODEV;
+
+   if (!(msi->flags & KVM_MSI_VALID_DEVID))
+   return -EINVAL;
+
+   spin_lock(&its->lock);
+
+   if (!its->enabled || !dist->lpis_enabled) {
+   ret = -EAGAIN;
+   goto out_unlock;
+   }
+
+   itte = find_itte(kvm, msi->devid, msi->data);
+   /* Triggering an unmapped IRQ gets silently dropped. */
+   if (!itte || !its_is_collection_mapped(itte->collection))
+   goto out_unlock;
+
+   cpuid = itte->collection->target_addr;
+   __set_bit(cpuid, itte->pending);
+   inject = itte->enabled;
+
+out_unlock:
+   spin_unlock(&its->lock);
+
+   if (inject) {
+   spin_lock(&dist->lock);
+   __set_bit(cpuid, dist->irq_pending_on_cpu);
+   spin_unlock(&dist->lock);
+   kvm_vcpu_kick(kvm_get_vcpu(kvm, cpuid));
+   }
+
+   return ret;
+}
+
+/*
  * Find all enabled and pending LPIs and queue them into the list
  * registers.
  * The dist lock is held by the caller.
@@ -812,6 +861,19 @@ static int vits_cmd_handle_movall(struct kvm *kvm, u64 
*its_cmd)
return 0;
 }
 
+/* The INT command injects the LPI associated with that DevID/EvID pair. */
+static int vits_cmd_handle_int(struct kvm *kvm, u64 *its_cmd)
+{
+   struct kvm_msi msi = {
+   .data = its_cmd_get_id(its_cmd),
+   .devid = its_cmd_get_deviceid(its_cmd),
+   .flags = KVM_MSI_VALID_DEVID,
+   };
+
+   vits_inject_msi(kvm, &msi);
+   return 0;
+}
+
 /*
  * This function is called with both the ITS and the distributor lock dropped,
  * so the actual command handlers must take the respective locks when needed.
@@ -846,6 +908,9 @@ static int vits_handle_command(struct kvm_vcpu *vcpu, u64 
*its_cmd)
case GITS_CMD_MOVALL:
ret = vits_cmd_handle_movall(vcpu->kvm, its_cmd);
break;
+   case GITS_CMD_INT:
+   ret = vits_cmd_handle_int(vcpu->kvm, its_cmd);
+   break;
case GITS_CMD_INV:
ret = vits_cmd_handle_inv(vcpu->kvm, its_cmd);
break;
diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h
index 830524a..95e56a7 100644
--- a/virt/kvm/arm/its-emul.h
+++ b/virt/kvm/arm/its-emul.h
@@ -36,6 +36,8 @@ void vgic_enable_lpis(struct kvm_vcpu *vcpu);
 int vits_init(struct kvm *kvm);
 void vits_destroy(struct kvm *kvm);
 
+int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi);
+
 bool vits_queue_lpis(struct kvm_vcpu *vcpu);
 void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int irq);
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index f482e34..90

[PATCH v3 12/16] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-10-07 Thread Andre Przywara
As the actual LPI number in a guest can be quite high, but is mostly
assigned using a very sparse allocation scheme, bitmaps and arrays
for storing the virtual interrupt status are a waste of memory.
We use our equivalent of the "Interrupt Translation Table Entry"
(ITTE) to hold this extra status information for a virtual LPI.
As the normal VGIC code cannot use its fancy bitmaps to manage
pending interrupts, we provide a hook in the VGIC code to let the
ITS emulation handle the list register queueing itself.
LPIs are located in a separate number range (>=8192), so
distinguishing them is easy. With LPIs being only edge-triggered, we
get away with a less complex IRQ handling.
We extend the number of bits for storing the IRQ number in our
LR struct to 16 to cover the LPI numbers we support as well.
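
A small standalone sketch of why the wider field is needed: LPI numbers
start at 8192, which does not fit into 10 bits. The struct and function
names are hypothetical; only the 10 and 16 bit widths and the 8192
offset come from the patch.

```c
#include <stdbool.h>

#define GIC_LPI_OFFSET 8192        /* first LPI number */

struct lr_narrow { unsigned irq:10; };  /* old field width */
struct lr_wide   { unsigned irq:16; };  /* field width after this patch */

/* LPIs live in their own number range, so one comparison is enough. */
static bool irq_is_lpi(unsigned irq)
{
    return irq >= GIC_LPI_OFFSET;
}

/* Store an IRQ number in the 10-bit field and read it back. */
static unsigned store_in_10_bits(unsigned irq)
{
    struct lr_narrow lr;

    lr.irq = irq;                  /* reduced modulo 2^10 */
    return lr.irq;
}

/* Store an IRQ number in the 16-bit field and read it back. */
static unsigned store_in_16_bits(unsigned irq)
{
    struct lr_wide lr;

    lr.irq = irq;                  /* reduced modulo 2^16 */
    return lr.irq;
}
```

With the 10-bit field, LPI 8200 would read back as 8 and so alias an
unrelated interrupt; the 16-bit field keeps it intact.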

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- extend LR data structure to hold 16-bit wide IRQ IDs
- only clear pending bit if IRQ could be queued
- adapt __kvm_vgic_sync_hwstate() to upstream changes

 include/kvm/arm_vgic.h  |  4 +-
 virt/kvm/arm/its-emul.c | 75 
 virt/kvm/arm/its-emul.h |  3 ++
 virt/kvm/arm/vgic-v3-emul.c |  2 +
 virt/kvm/arm/vgic.c | 93 +++--
 5 files changed, 148 insertions(+), 29 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index c3eb414..035911f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -95,7 +95,7 @@ enum vgic_type {
 #define LR_HW  (1 << 3)
 
 struct vgic_lr {
-   unsigned irq:10;
+   unsigned irq:16;
union {
unsigned hwirq:10;
unsigned source:3;
@@ -147,6 +147,8 @@ struct vgic_vm_ops {
int (*init_model)(struct kvm *);
void(*destroy_model)(struct kvm *);
int (*map_resources)(struct kvm *, const struct vgic_params *);
+   bool(*queue_lpis)(struct kvm_vcpu *);
+   void(*unqueue_lpi)(struct kvm_vcpu *, int irq);
 };
 
 struct vgic_io_device {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index bab8033..8349970 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -59,8 +59,27 @@ struct its_itte {
struct its_collection *collection;
u32 lpi;
u32 event_id;
+   bool enabled;
+   unsigned long *pending;
 };
 
+/* To be used as an iterator this macro misses the enclosing parentheses */
+#define for_each_lpi(dev, itte, kvm) \
+   list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
+   list_for_each_entry(itte, &(dev)->itt, itte_list)
+
+static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
+{
+   struct its_device *device;
+   struct its_itte *itte;
+
+   for_each_lpi(device, itte, kvm) {
+   if (itte->lpi == lpi)
+   return itte;
+   }
+   return NULL;
+}
+
 #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
 
 /* The distributor lock is held by the VGIC MMIO handler. */
@@ -154,9 +173,65 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
return false;
 }
 
+/*
+ * Find all enabled and pending LPIs and queue them into the list
+ * registers.
+ * The dist lock is held by the caller.
+ */
+bool vits_queue_lpis(struct kvm_vcpu *vcpu)
+{
+   struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
+   struct its_device *device;
+   struct its_itte *itte;
+   bool ret = true;
+
+   if (!vgic_has_its(vcpu->kvm))
+   return true;
+   if (!its->enabled || !vcpu->kvm->arch.vgic.lpis_enabled)
+   return true;
+
+   spin_lock(&its->lock);
+   for_each_lpi(device, itte, vcpu->kvm) {
+   if (!itte->enabled || !test_bit(vcpu->vcpu_id, itte->pending))
+   continue;
+
+   if (!itte->collection)
+   continue;
+
+   if (itte->collection->target_addr != vcpu->vcpu_id)
+   continue;
+
+
+   if (vgic_queue_irq(vcpu, 0, itte->lpi))
+   __clear_bit(vcpu->vcpu_id, itte->pending);
+   else
+   ret = false;
+   }
+
+   spin_unlock(&its->lock);
+   return ret;
+}
+
+/* Called with the distributor lock held by the caller. */
+void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int lpi)
+{
+   struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
+   struct its_itte *itte;
+
+   spin_lock(&its->lock);
+
+   /* Find the right ITTE and put the pending state back in there */
+   itte = find_itte_by_lpi(vcpu->kvm, lpi);
+   if (itte)
+   __set_bit(vcpu->vcpu_id, itte->pending);
+
+   spin_unlock(&its->lock);
+}
+
 static void its_free_itte(struct its_itte *itte)
 {
list_del(&itte->itte_list);
+

[PATCH v3 16/16] KVM: arm64: enable ITS emulation as a virtual MSI controller

2015-10-07 Thread Andre Przywara
If userspace has provided a base address for the ITS register frame,
we enable the bits that advertise LPIs in the GICv3.
When the guest has enabled LPIs and the ITS, we enable the emulation
part by initializing the ITS data structures and trapping on ITS
register frame accesses by the guest.
Also we enable the KVM_SIGNAL_MSI feature to allow userland to inject
MSIs into the guest. If the ITS emulation has not been enabled, trying
to inject an MSI returns -ENODEV.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- replace kmalloc with kcalloc
- adjust number of supported LPIs in comment

 Documentation/virtual/kvm/api.txt |  2 +-
 arch/arm64/kvm/Kconfig|  1 +
 arch/arm64/kvm/reset.c|  6 ++
 include/kvm/arm_vgic.h|  6 ++
 virt/kvm/arm/its-emul.c   | 10 +-
 virt/kvm/arm/vgic-v3-emul.c   | 20 ++--
 virt/kvm/arm/vgic.c   |  8 
 7 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index a302e0a..047e4e7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2134,7 +2134,7 @@ after pausing the vcpu, but before it is resumed.
 4.71 KVM_SIGNAL_MSI
 
 Capability: KVM_CAP_SIGNAL_MSI
-Architectures: x86
+Architectures: x86 arm64
 Type: vm ioctl
 Parameters: struct kvm_msi (in)
 Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 5c7e920..e8d77f4 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select HAVE_KVM_MSI
---help---
  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 4d7f78b4..a490f67 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -80,6 +80,12 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long 
ext)
case KVM_CAP_SET_GUEST_DEBUG:
r = 1;
break;
+   case KVM_CAP_MSI_DEVID:
+   if (!kvm)
+   r = -EINVAL;
+   else
+   r = kvm->arch.vgic.msis_require_devid;
+   break;
default:
r = 0;
}
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 7911059..35657f9 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -174,6 +174,7 @@ struct irq_phys_map_entry {
 
 struct vgic_its {
boolenabled;
+   struct vgic_io_device   iodev;
spinlock_t  lock;
u64 cbaser;
int creadr;
@@ -192,6 +193,9 @@ struct vgic_dist {
/* vGIC model the kernel emulates for the guest (GICv2 or GICv3) */
u32 vgic_model;
 
+   /* Do injected MSIs require an additional device ID? */
+   boolmsis_require_devid;
+
int nr_cpus;
int nr_irqs;
 
@@ -397,4 +401,6 @@ static inline int vgic_v3_probe(struct device_node 
*vgic_node,
 }
 #endif
 
+int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
+
 #endif
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index cd8526a..b40a7fc 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -1117,6 +1117,7 @@ int vits_init(struct kvm *kvm)
 {
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_its *its = &dist->its;
+   int ret;
 
dist->pendbaser = kcalloc(dist->nr_cpus, sizeof(u64), GFP_KERNEL);
if (!dist->pendbaser)
@@ -1131,9 +1132,16 @@ int vits_init(struct kvm *kvm)
INIT_LIST_HEAD(&its->device_list);
INIT_LIST_HEAD(&its->collection_list);
 
+   ret = vgic_register_kvm_io_dev(kvm, dist->vgic_its_base,
+  KVM_VGIC_V3_ITS_SIZE, vgicv3_its_ranges,
+  -1, &its->iodev);
+   if (ret)
+   return ret;
+
its->enabled = false;
+   dist->msis_require_devid = true;
 
-   return -ENXIO;
+   return 0;
 }
 
 void vits_destroy(struct kvm *kvm)
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 90f3628..311b3ea 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -8,7 +8,6 @@
  *
  * Limitations of the emulation:
  * (RAZ/WI: read as zero, write ignore, RAO/WI: read as one, write ignore)
- * - We do not support LPIs (yet). TYPER.LPIS is reported as 0 and is RAZ/WI.
  * - We do not support the message based interrupts (MBIs) triggered by
  *   writes to the GICD_{SET,CLR}SPI_* registers. TYPER.MBIS is reported as 0.
  * - We do not support the (optional) backwards compatibi

[PATCH v3 03/16] KVM: extend struct kvm_msi to hold a 32-bit device ID

2015-10-07 Thread Andre Przywara
The ARM GICv3 ITS MSI controller requires a device ID to be able to
assign the proper interrupt vector. On real hardware, this ID is
sampled from the bus. To be able to emulate an ITS controller, extend
the KVM MSI interface to let userspace provide such a device ID. For
PCI devices, the device ID is simply the 16-bit bus-device-function
triplet, which should be easily available to the userland tool.

Also there is a new KVM capability which advertises whether the
current VM requires a device ID to be set along with the MSI data.
This flag is still reported as not available everywhere; later we will
enable it when ITS emulation is used.
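
For illustration, a userland tool could fill the extended structure
roughly like this. The sketch uses a local mirror of the struct so it
builds without kernel headers; struct msi_request, pci_devid() and
make_msi() are hypothetical names, and the bus/device/function packing
is the usual PCI encoding rather than something this patch defines.

```c
#include <stdint.h>
#include <string.h>

#define MSI_VALID_DEVID (1U << 0)  /* mirrors KVM_MSI_VALID_DEVID */

/* Local mirror of the extended struct kvm_msi; size stays 32 bytes. */
struct msi_request {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
    uint32_t flags;
    uint32_t devid;
    uint8_t  pad[12];
};

/* Pack PCI bus (8 bits), device (5 bits), function (3 bits) into 16 bits. */
static uint32_t pci_devid(unsigned bus, unsigned dev, unsigned fn)
{
    return ((bus & 0xffu) << 8) | ((dev & 0x1fu) << 3) | (fn & 0x7u);
}

/* Build a request carrying a device ID alongside the MSI payload. */
static struct msi_request make_msi(uint64_t doorbell, uint32_t event_id,
                                   uint32_t devid)
{
    struct msi_request msi;

    memset(&msi, 0, sizeof(msi));  /* padding must stay zero */
    msi.address_lo = (uint32_t)doorbell;
    msi.address_hi = (uint32_t)(doorbell >> 32);
    msi.data = event_id;
    msi.flags = MSI_VALID_DEVID;
    msi.devid = devid;
    return msi;
}
```

Because devid takes four bytes out of the former padding, the overall
structure size is unchanged.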

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
Reviewed-by: Eric Auger <eric.au...@linaro.org>
---
Changelog v2..v3:
- adjust KVM_CAP number to not clash with upstream

 Documentation/virtual/kvm/api.txt | 12 ++--
 include/uapi/linux/kvm.h  |  5 -
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d9eccee..a302e0a 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2147,10 +2147,18 @@ struct kvm_msi {
__u32 address_hi;
__u32 data;
__u32 flags;
-   __u8  pad[16];
+   __u32 devid;
+   __u8  pad[12];
 };
 
-No flags are defined so far. The corresponding field must be 0.
+flags: KVM_MSI_VALID_DEVID: devid contains a valid value
+devid: If KVM_MSI_VALID_DEVID is set, contains a unique device identifier
+   for the device that wrote the MSI message.
+   For PCI, this is usually a BDF identifier in the lower 16 bits.
+
+The per-VM KVM_CAP_MSI_DEVID capability advertises the need to provide
+the device ID. If this capability is not set, userland cannot rely on
+the kernel to allow the KVM_MSI_VALID_DEVID flag being set.
 
 
 4.71 KVM_CREATE_PIT2
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a9256f0..eae9ba1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -824,6 +824,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_MULTI_ADDRESS_SPACE 118
 #define KVM_CAP_GUEST_DEBUG_HW_BPS 119
 #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
+#define KVM_CAP_MSI_DEVID 121
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -975,12 +976,14 @@ struct kvm_one_reg {
__u64 addr;
 };
 
+#define KVM_MSI_VALID_DEVID (1U << 0)
 struct kvm_msi {
__u32 address_lo;
__u32 address_hi;
__u32 data;
__u32 flags;
-   __u8  pad[16];
+   __u32 devid;
+   __u8  pad[12];
 };
 
 struct kvm_arm_device_addr {
-- 
2.5.1



[PATCH v3 11/16] KVM: arm64: add data structures to model ITS interrupt translation

2015-10-07 Thread Andre Przywara
The GICv3 Interrupt Translation Service (ITS) uses tables in memory
to allow a sophisticated interrupt routing. It features device tables,
an interrupt table per device and a table connecting "collections" to
actual CPUs (aka. redistributors in the GICv3 lingo).
Since the interrupt numbers for the LPIs are allocated quite sparsely
and the range can be quite huge (8192 LPIs being the minimum), using
bitmaps or arrays for storing information is a waste of memory.
We use linked lists instead, which we iterate linearly. This works
very well with the actual number of LPIs/MSIs in the guest being
quite low. Should the number of LPIs exceed the number where iterating
through lists seems acceptable, we can later revisit this and use more
efficient data structures.
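
To make the trade-off concrete, here is a small userspace sketch of the
list-based approach: memory scales with the number of mapped devices rather
than with the size of the ID space, at the price of a linear lookup. All
ex_ names are made up for this example, and it uses a plain singly linked
list instead of the kernel's struct list_head.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One node per mapped device; unmapped IDs cost no memory at all. */
struct ex_device {
	struct ex_device *next;
	uint32_t device_id;
};

/* Linear search, acceptable while the number of devices stays small. */
static struct ex_device *ex_find_device(struct ex_device *head, uint32_t id)
{
	for (struct ex_device *d = head; d; d = d->next)
		if (d->device_id == id)
			return d;
	return NULL;
}

/* Return the existing node for this ID or allocate and link a new one. */
static struct ex_device *ex_add_device(struct ex_device **head, uint32_t id)
{
	struct ex_device *d = ex_find_device(*head, id);

	if (d)
		return d;
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->device_id = id;
	d->next = *head;
	*head = d;
	return d;
}

/* Free every node and leave the list empty. */
static void ex_free_devices(struct ex_device **head)
{
	while (*head) {
		struct ex_device *d = *head;

		*head = d->next;
		free(d);
	}
}
```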

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- add a comment

 include/kvm/arm_vgic.h  |  3 +++
 virt/kvm/arm/its-emul.c | 66 +
 2 files changed, 69 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 9ac850d..c3eb414 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define VGIC_NR_IRQS_LEGACY 256
 #define VGIC_NR_SGIS   16
@@ -174,6 +175,8 @@ struct vgic_its {
u64 cbaser;
int creadr;
int cwriter;
+   struct list_head device_list;
+   struct list_head collection_list;
 };
 
 struct vgic_dist {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 9bbed86..bab8033 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -32,6 +33,34 @@
 #include "vgic.h"
 #include "its-emul.h"
 
+struct its_device {
+   struct list_head dev_list;
+
+   /* the head for the list of ITTEs */
+   struct list_head itt;
+   u32 device_id;
+};
+
+#define COLLECTION_NOT_MAPPED ((u32)-1)
+
+struct its_collection {
+   struct list_head coll_list;
+
+   u32 collection_id;
+   u32 target_addr;
+};
+
+#define its_is_collection_mapped(coll) ((coll) && \
+   ((coll)->target_addr != COLLECTION_NOT_MAPPED))
+
+struct its_itte {
+   struct list_head itte_list;
+
+   struct its_collection *collection;
+   u32 lpi;
+   u32 event_id;
+};
+
 #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
 
 /* The distributor lock is held by the VGIC MMIO handler. */
@@ -125,6 +154,12 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
return false;
 }
 
+static void its_free_itte(struct its_itte *itte)
+{
+   list_del(&itte->itte_list);
+   kfree(itte);
+}
+
 /*
  * This function is called with both the ITS and the distributor lock dropped,
  * so the actual command handlers must take the respective locks when needed.
@@ -321,6 +356,9 @@ int vits_init(struct kvm *kvm)
 
spin_lock_init(&its->lock);
 
+   INIT_LIST_HEAD(&its->device_list);
+   INIT_LIST_HEAD(&its->collection_list);
+
its->enabled = false;
 
return -ENXIO;
@@ -330,11 +368,39 @@ void vits_destroy(struct kvm *kvm)
 {
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_its *its = &dist->its;
+   struct its_device *dev;
+   struct its_itte *itte;
+   struct list_head *dev_cur, *dev_temp;
+   struct list_head *cur, *temp;
 
if (!vgic_has_its(kvm))
return;
 
+   /*
+* We may end up here without the lists ever having been initialized.
+* Check this and bail out early to avoid dereferencing a NULL pointer.
+*/
+   if (!its->device_list.next)
+   return;
+
+   spin_lock(&its->lock);
+   list_for_each_safe(dev_cur, dev_temp, &its->device_list) {
+   dev = container_of(dev_cur, struct its_device, dev_list);
+   list_for_each_safe(cur, temp, &dev->itt) {
+   itte = (container_of(cur, struct its_itte, itte_list));
+   its_free_itte(itte);
+   }
+   list_del(dev_cur);
+   kfree(dev);
+   }
+
+   list_for_each_safe(cur, temp, &its->collection_list) {
+   list_del(cur);
+   kfree(container_of(cur, struct its_collection, coll_list));
+   }
+
kfree(dist->pendbaser);
 
its->enabled = false;
+   spin_unlock(&its->lock);
 }
-- 
2.5.1



[PATCH v3 07/16] KVM: arm64: Introduce new MMIO region for the ITS base address

2015-10-07 Thread Andre Przywara
The ARM GICv3 ITS controller requires a separate register frame to
cover ITS specific registers. Add a new VGIC address type and store
the address in a field in the vgic_dist structure.
Provide a function to check whether userland has provided the address,
so ITS functionality can be guarded by that check.
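
A minimal sketch of that guard, assuming the "not set" state is represented
by an all-ones sentinel address (the ex_ names and the sentinel value are
assumptions for this example, standing in for VGIC_ADDR_UNDEF and friends):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel for "userland has not provided an address yet". */
#define EX_ADDR_UNDEF ((uint64_t)-1)
#define EX_SZ_64K     0x10000ULL

/* ITS functionality is only available once an address has been set. */
static bool ex_has_its(uint64_t its_base)
{
	return its_base != EX_ADDR_UNDEF;
}

/* The ITS control register frame must be set and 64K aligned. */
static bool ex_its_base_ok(uint64_t its_base)
{
	return ex_has_its(its_base) && (its_base & (EX_SZ_64K - 1)) == 0;
}
```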

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
Reviewed-by: Eric Auger <eric.au...@linaro.org>
---
Changelog v2..v3:
- none

 Documentation/virtual/kvm/devices/arm-vgic.txt |  9 +
 arch/arm64/include/uapi/asm/kvm.h  |  2 ++
 include/kvm/arm_vgic.h |  3 +++
 virt/kvm/arm/vgic-v3-emul.c|  2 ++
 virt/kvm/arm/vgic.c| 16 
 virt/kvm/arm/vgic.h|  1 +
 6 files changed, 33 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt b/Documentation/virtual/kvm/devices/arm-vgic.txt
index 3fb9054..ec715f9e 100644
--- a/Documentation/virtual/kvm/devices/arm-vgic.txt
+++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
@@ -39,6 +39,15 @@ Groups:
   Only valid for KVM_DEV_TYPE_ARM_VGIC_V3.
   This address needs to be 64K aligned.
 
+KVM_VGIC_V3_ADDR_TYPE_ITS (rw, 64-bit)
+  Base address in the guest physical address space of the GICv3 ITS
+  control register frame. The ITS allows MSI(-X) interrupts to be
+  injected into guests. This extension is optional, if the kernel
+  does not support the ITS, the call returns -ENODEV.
+  This memory is solely for the guest to access the ITS control
+  registers and does not cover the ITS translation register.
+  Only valid for KVM_DEV_TYPE_ARM_VGIC_V3.
+  This address needs to be 64K aligned and the region covers 64 KByte.
 
   KVM_DEV_ARM_VGIC_GRP_DIST_REGS
   Attributes:
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 0cd7b59..99e4006 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -87,9 +87,11 @@ struct kvm_regs {
 /* Supported VGICv3 address types  */
 #define KVM_VGIC_V3_ADDR_TYPE_DIST 2
 #define KVM_VGIC_V3_ADDR_TYPE_REDIST   3
+#define KVM_VGIC_V3_ADDR_TYPE_ITS  4
 
 #define KVM_VGIC_V3_DIST_SIZE  SZ_64K
 #define KVM_VGIC_V3_REDIST_SIZE (2 * SZ_64K)
+#define KVM_VGIC_V3_ITS_SIZE   SZ_64K
 
 #define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */
 #define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 2c10082..067ad09 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -190,6 +190,9 @@ struct vgic_dist {
phys_addr_t vgic_redist_base;
};
 
+   /* The base address of the ITS control register frame */
+   phys_addr_t vgic_its_base;
+
/* Distributor enabled */
u32 enabled;
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 1f42348..a8cf669 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -887,6 +887,7 @@ void vgic_v3_init_emulation(struct kvm *kvm)
 
dist->vgic_dist_base = VGIC_ADDR_UNDEF;
dist->vgic_redist_base = VGIC_ADDR_UNDEF;
+   dist->vgic_its_base = VGIC_ADDR_UNDEF;
 
kvm->arch.max_vcpus = KVM_MAX_VCPUS;
 }
@@ -1059,6 +1060,7 @@ static int vgic_v3_has_attr(struct kvm_device *dev,
return -ENXIO;
case KVM_VGIC_V3_ADDR_TYPE_DIST:
case KVM_VGIC_V3_ADDR_TYPE_REDIST:
+   case KVM_VGIC_V3_ADDR_TYPE_ITS:
return 0;
}
break;
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 1dd79e1..4219f22 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -953,6 +953,16 @@ int vgic_register_kvm_io_dev(struct kvm *kvm, gpa_t base, int len,
return ret;
 }
 
+bool vgic_has_its(struct kvm *kvm)
+{
+   struct vgic_dist *dist = &kvm->arch.vgic;
+
+   if (dist->vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
+   return false;
+
+   return !IS_VGIC_ADDR_UNDEF(dist->vgic_its_base);
+}
+
 static int vgic_nr_shared_irqs(struct vgic_dist *dist)
 {
return dist->nr_irqs - VGIC_NR_PRIVATE_IRQS;
@@ -2257,6 +2267,12 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write)
block_size = KVM_VGIC_V3_REDIST_SIZE;
alignment = SZ_64K;
break;
+   case KVM_VGIC_V3_ADDR_TYPE_ITS:
+   type_needed = KVM_DEV_TYPE_ARM_VGIC_V3;
+   addr_ptr = &vgic->vgic_its_base;
+   block_size = KVM_VGIC_V3_ITS_SIZE;
+   alignment = SZ_64K;
+   break;
 #endif
default:
r = -ENODEV;
diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h
index 0df74cb..a093f5c 100644
-

[PATCH v3 10/16] KVM: arm64: implement basic ITS register handlers

2015-10-07 Thread Andre Przywara
Add emulation for some basic MMIO registers used in the ITS emulation.
This includes:
- GITS_{CTLR,TYPER,IIDR}
- ID registers
- GITS_{CBASER,CREADR,CWRITER}
  those implement the ITS command buffer handling

Most of the handlers are pretty straightforward, but CWRITER goes
some extra miles to allow fine-grained locking. The idea here
is to let only the first instance iterate through the command ring
buffer; CWRITER accesses on other VCPUs meanwhile will be picked up
by that first instance and handled as well. The ITS lock is thus only
held for very short periods of time and is dropped before the actual
command handler is called.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- use new renamed vgic_reg64_access() function
- rework locking in CWRITER handling
- use kcalloc instead of kmalloc

 include/kvm/arm_vgic.h |   3 +
 include/linux/irqchip/arm-gic-v3.h |   8 ++
 virt/kvm/arm/its-emul.c| 215 +
 virt/kvm/arm/its-emul.h|   1 +
 virt/kvm/arm/vgic-v3-emul.c|   2 +
 5 files changed, 229 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index c8c48e3..9ac850d 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -171,6 +171,9 @@ struct irq_phys_map_entry {
 struct vgic_its {
bool enabled;
spinlock_t  lock;
+   u64 cbaser;
+   int creadr;
+   int cwriter;
 };
 
 struct vgic_dist {
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 70e9539..ef274a9 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -181,15 +181,23 @@
 #define GITS_BASER 0x0100
 #define GITS_IDREGS_BASE   0xffd0
 #define GITS_PIDR2 GICR_PIDR2
+#define GITS_PIDR4 0xffd0
+#define GITS_CIDR0 0xfff0
+#define GITS_CIDR1 0xfff4
+#define GITS_CIDR2 0xfff8
+#define GITS_CIDR3 0xfffc
 
 #define GITS_TRANSLATER 0x10040
 
 #define GITS_CTLR_ENABLE   (1U << 0)
 #define GITS_CTLR_QUIESCENT (1U << 31)
 
+#define GITS_TYPER_PLPIS   (1UL << 0)
+#define GITS_TYPER_IDBITS_SHIFT 8
 #define GITS_TYPER_DEVBITS_SHIFT   13
 #define GITS_TYPER_DEVBITS(r)  ((((r) >> GITS_TYPER_DEVBITS_SHIFT) & 0x1f) + 1)
 #define GITS_TYPER_PTA (1UL << 19)
+#define GITS_TYPER_HWCOLLCNT_SHIFT 24
 
 #define GITS_CBASER_VALID  (1UL << 63)
 #define GITS_CBASER_nCnB   (0UL << 59)
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 659dd39..9bbed86 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -32,10 +32,62 @@
 #include "vgic.h"
 #include "its-emul.h"
 
+#define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
+
+/* The distributor lock is held by the VGIC MMIO handler. */
 static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu,
  struct kvm_exit_mmio *mmio,
  phys_addr_t offset)
 {
+   struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
+   u32 reg;
+   bool was_enabled;
+
+   switch (offset & ~3) {
+   case 0x00:  /* GITS_CTLR */
+   /* We never defer any command execution. */
+   reg = GITS_CTLR_QUIESCENT;
+   if (its->enabled)
+   reg |= GITS_CTLR_ENABLE;
+   was_enabled = its->enabled;
+   vgic_reg_access(mmio, &reg, offset & 3,
+   ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
+   its->enabled = !!(reg & GITS_CTLR_ENABLE);
+   return !was_enabled && its->enabled;
+   case 0x04:  /* GITS_IIDR */
+   reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
+   vgic_reg_access(mmio, &reg, offset & 3,
+   ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+   break;
+   case 0x08:  /* GITS_TYPER */
+   /*
+* We use linear CPU numbers for redistributor addressing,
+* so GITS_TYPER.PTA is 0.
+* To avoid memory waste on the guest side, we keep the
+* number of IDBits and DevBits low for the time being.
+* This could later be made configurable by userland.
+* Since we have all collections in linked list, we claim
+* that we can hold all of the collection tables in our
+* own memory and that the ITT entry size is 1 byte (the
+* smallest possible one).
+   

[PATCH v3 06/16] KVM: arm/arm64: make GIC frame address initialization model specific

2015-10-07 Thread Andre Przywara
Currently we initialize all the possible GIC frame addresses in one
function, without looking at the specific GIC model we instantiate
for the guest.
As this gets confusing when adding another VGIC model later, let's
move these initializations into the respective model's emulation
init functions.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- none

 virt/kvm/arm/vgic-v2-emul.c | 3 +++
 virt/kvm/arm/vgic-v3-emul.c | 3 +++
 virt/kvm/arm/vgic.c | 3 ---
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c
index 1390797..8faa28c 100644
--- a/virt/kvm/arm/vgic-v2-emul.c
+++ b/virt/kvm/arm/vgic-v2-emul.c
@@ -567,6 +567,9 @@ void vgic_v2_init_emulation(struct kvm *kvm)
dist->vm_ops.init_model = vgic_v2_init_model;
dist->vm_ops.map_resources = vgic_v2_map_resources;
 
+   dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
+   dist->vgic_dist_base = VGIC_ADDR_UNDEF;
+
kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS;
 }
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index d2eeb20..1f42348 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -885,6 +885,9 @@ void vgic_v3_init_emulation(struct kvm *kvm)
dist->vm_ops.destroy_model = vgic_v3_destroy_model;
dist->vm_ops.map_resources = vgic_v3_map_resources;
 
+   dist->vgic_dist_base = VGIC_ADDR_UNDEF;
+   dist->vgic_redist_base = VGIC_ADDR_UNDEF;
+
kvm->arch.max_vcpus = KVM_MAX_VCPUS;
 }
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index b71f627..1dd79e1 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2160,9 +2160,6 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
kvm->arch.vgic.in_kernel = true;
kvm->arch.vgic.vgic_model = type;
kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
-   kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
-   kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
-   kvm->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
 
 out_unlock:
for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
-- 
2.5.1



[PATCH v3 01/16] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-10-07 Thread Andre Przywara
Currently we track which IRQ has been mapped to which VGIC list
register and also have to synchronize both. We used to do this
to hold some extra state (for instance the active bit).
It turns out that this extra state in the LRs is no longer needed and
this extra tracking causes some pain later.
Remove the tracking feature (lr_map and lr_used) and get rid of
quite some code on the way.
In places where we scan LRs we now use our shadow copy of the ELRSR
register directly.
This code change means we lose the "piggy-back" optimization, which
would re-use an active-only LR to inject the pending state on top of
it. Tracing with various workloads shows that this actually occurred
very rarely, the ballpark figure is about once every 10,000 exits
in a disk I/O heavy workload. Also the list registers don't seem to
be as scarce as assumed, with all 4 LRs on the popular implementations
used less than once every 100,000 exits.
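
For readers unfamiliar with ELRSR: a set bit means the corresponding list
register is empty, so the LRs in use are exactly the clear bits below
nr_lr. A hedged userspace sketch of that scan follows (ex_used_lrs is a
made-up helper, not a function from this series; the kernel code uses
for_each_clear_bit on a bitmap instead):

```c
#include <stdint.h>

/*
 * Collect the indices of all list registers that are in use, i.e. whose
 * ELRSR bit is clear. out must have room for nr_lr entries. Returns the
 * number of indices written.
 */
static int ex_used_lrs(uint64_t elrsr, int nr_lr, int *out)
{
	int n = 0;

	for (int i = 0; i < nr_lr && i < 64; i++)
		if (!(elrsr & (1ULL << i)))
			out[n++] = i;
	return n;
}
```

This also shows why the shadow ELRSR copy gets initialized to ~0 in the
enable functions: with every bit set, no LR is considered in use.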

This has been briefly tested on Midway, Juno and the model (the latter
both with GICv2 and GICv3 guests).

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- adapt to 4.3-rc
- keep, but change retire_lr to drop now unused parameter

 include/kvm/arm_vgic.h |   6 ---
 virt/kvm/arm/vgic-v2.c |   1 +
 virt/kvm/arm/vgic-v3.c |   1 +
 virt/kvm/arm/vgic.c| 137 +
 4 files changed, 61 insertions(+), 84 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 7bc5d02..926d67c 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -295,9 +295,6 @@ struct vgic_v3_cpu_if {
 };
 
 struct vgic_cpu {
-   /* per IRQ to LR mapping */
-   u8  *vgic_irq_lr_map;
-
/* Pending/active/both interrupts on this VCPU */
DECLARE_BITMAP( pending_percpu, VGIC_NR_PRIVATE_IRQS);
DECLARE_BITMAP( active_percpu, VGIC_NR_PRIVATE_IRQS);
@@ -308,9 +305,6 @@ struct vgic_cpu {
unsigned long   *active_shared;
unsigned long   *pend_act_shared;
 
-   /* Bitmap of used/free list registers */
-   DECLARE_BITMAP( lr_used, VGIC_V2_MAX_LRS);
-
/* Number of list registers on this CPU */
int nr_lr;
 
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index 8d7b04d..c0f5d7f 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -158,6 +158,7 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
 * anyway.
 */
vcpu->arch.vgic_cpu.vgic_v2.vgic_vmcr = 0;
+   vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;
 
/* Get the show on the road... */
vcpu->arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN;
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index 7dd5d62..92003cb 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -193,6 +193,7 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 * anyway.
 */
vgic_v3->vgic_vmcr = 0;
+   vgic_v3->vgic_elrsr = ~0;
 
/*
 * If we are emulating a GICv3, we do it in an non-GICv2-compatible
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index f3e76e5..da0a866 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -102,7 +102,7 @@
 #include "vgic.h"
 
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
+static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
@@ -672,6 +672,17 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
return false;
 }
 
+static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
+  struct vgic_lr vlr)
+{
+   vgic_ops->sync_lr_elrsr(vcpu, lr, vlr);
+}
+
+static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
+{
+   return vgic_ops->get_elrsr(vcpu);
+}
+
 /**
  * vgic_unqueue_irqs - move pending/active IRQs from LRs to the distributor
  * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs
@@ -683,9 +694,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
 void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 {
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+   u64 elrsr = vgic_get_elrsr(vcpu);
+   unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
int i;
 
-   for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
+   for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) {
struct vgic_lr lr = vgic_get_lr(vcpu, i);
 
/*
@@ -728,7 +741,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 * Mark the LR as free for other use.
 */
BUG_ON(lr.state & LR_STATE_MASK);
-   vgic_ret

[PATCH v3 08/16] KVM: arm64: handle ITS related GICv3 redistributor registers

2015-10-07 Thread Andre Przywara
In the GICv3 redistributor there are the PENDBASER and PROPBASER
registers which we did not emulate so far, as they only make sense
when having an ITS. In preparation for that, emulate those MMIO
accesses by storing the 64-bit data written into them in a variable
which we later read in the ITS emulation.
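
Since the guest may access such a 64-bit register as two 32-bit halves, the
stored value has to be updated one half at a time. The sketch below shows
the idea in plain C; the ex_ helpers are illustrative only and leave out
the access-mode handling that the real helper needs.

```c
#include <stdint.h>

/* Read the 32-bit half selected by a byte offset of 0 (low) or 4 (high). */
static uint32_t ex_read_half(uint64_t reg, unsigned offset)
{
	if ((offset & ~3u) == 0)
		return (uint32_t)reg;
	return (uint32_t)(reg >> 32);
}

/* Replace one 32-bit half and keep the other half untouched. */
static uint64_t ex_write_half(uint64_t reg, unsigned offset, uint32_t val)
{
	if ((offset & ~3u) == 0)
		return (reg & 0xffffffff00000000ULL) | val;
	return (reg & 0x00000000ffffffffULL) | ((uint64_t)val << 32);
}
```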

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- rename vgic_handle_base_register to vgic_reg64_access()

 include/kvm/arm_vgic.h  |  8 
 virt/kvm/arm/vgic-v3-emul.c | 44 
 virt/kvm/arm/vgic.c | 31 +++
 virt/kvm/arm/vgic.h |  2 ++
 4 files changed, 85 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 067ad09..06c33bc 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -272,6 +272,14 @@ struct vgic_dist {
/* Virtual irq to hwirq mapping */
spinlock_t  irq_phys_map_lock;
struct list_head irq_phys_map_list;
+
+   /* Address of LPI configuration table shared by all redistributors */
+   u64 propbaser;
+
+   /* Addresses of LPI pending tables per redistributor */
+   u64 *pendbaser;
+
+   bool lpis_enabled;
 };
 
 struct vgic_v2_cpu_if {
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index a8cf669..6939f7c 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -651,6 +651,38 @@ static bool handle_mmio_cfg_reg_redist(struct kvm_vcpu *vcpu,
return vgic_handle_cfg_reg(reg, mmio, offset);
 }
 
+/* We don't trigger any actions here, just store the register value */
+static bool handle_mmio_propbaser_redist(struct kvm_vcpu *vcpu,
+struct kvm_exit_mmio *mmio,
+phys_addr_t offset)
+{
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+   int mode = ACCESS_READ_VALUE;
+
+   /* Storing a value with LPIs already enabled is undefined */
+   mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
+   vgic_reg64_access(mmio, offset, &dist->propbaser, mode);
+
+   return false;
+}
+
+/* We don't trigger any actions here, just store the register value */
+static bool handle_mmio_pendbaser_redist(struct kvm_vcpu *vcpu,
+struct kvm_exit_mmio *mmio,
+phys_addr_t offset)
+{
+   struct kvm_vcpu *rdvcpu = mmio->private;
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+   int mode = ACCESS_READ_VALUE;
+
+   /* Storing a value with LPIs already enabled is undefined */
+   mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
+   vgic_reg64_access(mmio, offset,
+ &dist->pendbaser[rdvcpu->vcpu_id], mode);
+
+   return false;
+}
+
 #define SGI_base(x) ((x) + SZ_64K)
 
 static const struct vgic_io_range vgic_redist_ranges[] = {
@@ -679,6 +711,18 @@ static const struct vgic_io_range vgic_redist_ranges[] = {
.handle_mmio= handle_mmio_raz_wi,
},
{
+   .base   = GICR_PENDBASER,
+   .len= 0x08,
+   .bits_per_irq   = 0,
+   .handle_mmio= handle_mmio_pendbaser_redist,
+   },
+   {
+   .base   = GICR_PROPBASER,
+   .len= 0x08,
+   .bits_per_irq   = 0,
+   .handle_mmio= handle_mmio_propbaser_redist,
+   },
+   {
.base   = GICR_IDREGS,
.len= 0x30,
.bits_per_irq   = 0,
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 4219f22..11bf692 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -471,6 +471,37 @@ void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
}
 }
 
+/* handle a 64-bit register access */
+void vgic_reg64_access(struct kvm_exit_mmio *mmio, phys_addr_t offset,
+  u64 *basereg, int mode)
+{
+   u32 reg;
+   u64 breg;
+
+   switch (offset & ~3) {
+   case 0x00:
+   breg = *basereg;
+   reg = lower_32_bits(breg);
+   vgic_reg_access(mmio, &reg, offset & 3, mode);
+   if (mmio->is_write && (mode & ACCESS_WRITE_VALUE)) {
+   breg &= GENMASK_ULL(63, 32);
+   breg |= reg;
+   *basereg = breg;
+   }
+   break;
+   case 0x04:
+   breg = *basereg;
+   reg = upper_32_bits(breg);
+   vgic_reg_access(mmio, &reg, offset & 3, mode);
+   if (mmio->is_write && (mode & ACCESS_WRITE_VALUE)) {
+   breg  = lower_32_bits(breg);
+

[PATCH v3 04/16] KVM: arm/arm64: add emulation model specific destroy function

2015-10-07 Thread Andre Przywara
Currently we destroy the VGIC emulation in one function that cares for
all emulated models. To be on par with init_model (which is model
specific), let's introduce a per-emulation-model destroy method, too.
Use it for a tiny piece of GICv3-specific code already; later it will be
handy for the ITS emulation.
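
The pattern is an optional callback in the per-model ops structure, which
the common code only invokes if the model provides one. A stripped-down
userspace sketch, with all ex_ names invented for this example:

```c
#include <stddef.h>
#include <stdlib.h>

struct ex_vm;

/* Per-model operations; destroy_model may be left NULL. */
struct ex_vm_ops {
	int  (*init_model)(struct ex_vm *vm);
	void (*destroy_model)(struct ex_vm *vm);
};

struct ex_vm {
	struct ex_vm_ops ops;
	int *model_state;	/* stands in for model-specific allocations */
	int destroy_calls;	/* only here to make the example observable */
};

/* Common code: call the model's destroy hook if there is one. */
static void ex_destroy_model(struct ex_vm *vm)
{
	if (vm->ops.destroy_model)
		vm->ops.destroy_model(vm);
}

/* A model-specific hook that frees what only this model allocated. */
static void ex_v3_destroy(struct ex_vm *vm)
{
	free(vm->model_state);
	vm->model_state = NULL;
	vm->destroy_calls++;
}
```

Resetting the pointer to NULL after the free keeps a second destroy call
harmless.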

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
Reviewed-by: Eric Auger <eric.au...@linaro.org>
---
Changelog v2..v3:
- none

 include/kvm/arm_vgic.h  |  1 +
 virt/kvm/arm/vgic-v3-emul.c |  9 +
 virt/kvm/arm/vgic.c | 11 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 926d67c..2c10082 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -144,6 +144,7 @@ struct vgic_vm_ops {
bool (*queue_sgi)(struct kvm_vcpu *, int irq);
void (*add_sgi_source)(struct kvm_vcpu *, int irq, int source);
int (*init_model)(struct kvm *);
+   void (*destroy_model)(struct kvm *);
int (*map_resources)(struct kvm *, const struct vgic_params *);
 };
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index e661e7f..d2eeb20 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -862,6 +862,14 @@ static int vgic_v3_init_model(struct kvm *kvm)
return 0;
 }
 
+static void vgic_v3_destroy_model(struct kvm *kvm)
+{
+   struct vgic_dist *dist = &kvm->arch.vgic;
+
+   kfree(dist->irq_spi_mpidr);
+   dist->irq_spi_mpidr = NULL;
+}
+
 /* GICv3 does not keep track of SGI sources anymore. */
 static void vgic_v3_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
 {
@@ -874,6 +882,7 @@ void vgic_v3_init_emulation(struct kvm *kvm)
dist->vm_ops.queue_sgi = vgic_v3_queue_sgi;
dist->vm_ops.add_sgi_source = vgic_v3_add_sgi_source;
dist->vm_ops.init_model = vgic_v3_init_model;
+   dist->vm_ops.destroy_model = vgic_v3_destroy_model;
dist->vm_ops.map_resources = vgic_v3_map_resources;
 
kvm->arch.max_vcpus = KVM_MAX_VCPUS;
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index a5360b7..b71f627 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -125,6 +125,14 @@ int kvm_vgic_map_resources(struct kvm *kvm)
return kvm->arch.vgic.vm_ops.map_resources(kvm, vgic);
 }
 
+static void vgic_destroy_model(struct kvm *kvm)
+{
+   struct vgic_vm_ops *vm_ops = &kvm->arch.vgic.vm_ops;
+
+   if (vm_ops->destroy_model)
+   vm_ops->destroy_model(kvm);
+}
+
 /*
  * struct vgic_bitmap contains a bitmap made of unsigned longs, but
  * extracts u32s out of them.
@@ -1941,6 +1949,8 @@ void kvm_vgic_destroy(struct kvm *kvm)
struct kvm_vcpu *vcpu;
int i;
 
+   vgic_destroy_model(kvm);
+
kvm_for_each_vcpu(i, vcpu, kvm)
kvm_vgic_vcpu_destroy(vcpu);
 
@@ -1957,7 +1967,6 @@ void kvm_vgic_destroy(struct kvm *kvm)
}
kfree(dist->irq_sgi_sources);
kfree(dist->irq_spi_cpu);
-   kfree(dist->irq_spi_mpidr);
kfree(dist->irq_spi_target);
kfree(dist->irq_pending_on_cpu);
kfree(dist->irq_active_on_cpu);
-- 
2.5.1



[PATCH v3 00/16] KVM: arm64: GICv3 ITS emulation

2015-10-07 Thread Andre Przywara
t://linux-arm.org/linux-ap.git
 http://www.linux-arm.org/git?p=linux-ap.git;a=log;h=refs/heads/its-emul/v3
[3]: git://linux-arm.org/kvmtool.git
 http://www.linux-arm.org/git?p=kvmtool.git;a=log;h=refs/heads/its
[4]: 
http://arminfo.emea.arm.com/help/topic/com.arm.doc.ihi0069a/IHI0069A_gic_architecture_specification.pdf

Andre Przywara (16):
  KVM: arm/arm64: VGIC: don't track used LRs in the distributor
  KVM: arm/arm64: remove now unused code after stay-in-LR rework
  KVM: extend struct kvm_msi to hold a 32-bit device ID
  KVM: arm/arm64: add emulation model specific destroy function
  KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities
  KVM: arm/arm64: make GIC frame address initialization model specific
  KVM: arm64: Introduce new MMIO region for the ITS base address
  KVM: arm64: handle ITS related GICv3 redistributor registers
  KVM: arm64: introduce ITS emulation file with stub functions
  KVM: arm64: implement basic ITS register handlers
  KVM: arm64: add data structures to model ITS interrupt translation
  KVM: arm64: handle pending bit for LPIs in ITS emulation
  KVM: arm64: sync LPI configuration and pending tables
  KVM: arm64: implement ITS command queue command handlers
  KVM: arm64: implement MSI injection in ITS emulation
  KVM: arm64: enable ITS emulation as a virtual MSI controller

 Documentation/virtual/kvm/api.txt  |   14 +-
 Documentation/virtual/kvm/devices/arm-vgic.txt |9 +
 arch/arm/include/asm/kvm_host.h|2 +-
 arch/arm/kvm/arm.c |2 +-
 arch/arm64/include/asm/kvm_host.h  |2 +-
 arch/arm64/include/uapi/asm/kvm.h  |2 +
 arch/arm64/kvm/Kconfig |1 +
 arch/arm64/kvm/Makefile|1 +
 arch/arm64/kvm/reset.c |8 +-
 include/kvm/arm_vgic.h |   43 +-
 include/linux/irqchip/arm-gic-v3.h |   14 +-
 include/uapi/linux/kvm.h   |5 +-
 virt/kvm/arm/its-emul.c| 1187 
 virt/kvm/arm/its-emul.h|   55 ++
 virt/kvm/arm/vgic-v2-emul.c|3 +
 virt/kvm/arm/vgic-v2.c |1 +
 virt/kvm/arm/vgic-v3-emul.c|  101 +-
 virt/kvm/arm/vgic-v3.c |1 +
 virt/kvm/arm/vgic.c|  292 +++---
 virt/kvm/arm/vgic.h|3 +
 20 files changed, 1601 insertions(+), 145 deletions(-)
 create mode 100644 virt/kvm/arm/its-emul.c
 create mode 100644 virt/kvm/arm/its-emul.h

-- 
2.5.1



[PATCH v3 02/16] KVM: arm/arm64: remove now unused code after stay-in-LR rework

2015-10-07 Thread Andre Przywara
Now that we synchronize the LR state into our emulation upon guest
exit, there is no need for taking extra care of disabled IRQs.
Remove that code.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
Changelog v2..v3:
- new patch

 virt/kvm/arm/vgic.c | 29 -
 1 file changed, 29 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index da0a866..a5360b7 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -101,7 +101,6 @@
 
 #include "vgic.h"
 
-static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
@@ -477,7 +476,6 @@ bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
 {
u32 *reg;
int mode = ACCESS_READ_VALUE | access;
-   struct kvm_vcpu *target_vcpu = kvm_get_vcpu(kvm, vcpu_id);
 
reg = vgic_bitmap_get_reg(>arch.vgic.irq_enabled, vcpu_id, offset);
vgic_reg_access(mmio, reg, offset, mode);
@@ -485,7 +483,6 @@ bool vgic_handle_enable_reg(struct kvm *kvm, struct kvm_exit_mmio *mmio,
if (access & ACCESS_WRITE_CLEARBIT) {
if (offset < 4) /* Force SGI enabled */
*reg |= 0x;
-   vgic_retire_disabled_irqs(target_vcpu);
}
vgic_update_state(kvm);
return true;
@@ -1099,32 +1096,6 @@ static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu)
vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 }
 
-/*
- * An interrupt may have been disabled after being made pending on the
- * CPU interface (the classic case is a timer running while we're
- * rebooting the guest - the interrupt would kick as soon as the CPU
- * interface gets enabled, with deadly consequences).
- *
- * The solution is to examine already active LRs, and check the
- * interrupt is still enabled. If not, just retire it.
- */
-static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
-{
-   u64 elrsr = vgic_get_elrsr(vcpu);
-   unsigned long *elrsr_ptr = u64_to_bitmask();
-   int lr;
-
-   for_each_clear_bit(lr, elrsr_ptr, vgic->nr_lr) {
-   struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
-
-   if (!vgic_irq_is_enabled(vcpu, vlr.irq)) {
-   vgic_retire_lr(lr, vcpu);
-   if (vgic_irq_is_queued(vcpu, vlr.irq))
-   vgic_irq_clear_queued(vcpu, vlr.irq);
-   }
-   }
-}
-
 static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
 int lr_nr, struct vgic_lr vlr)
 {
-- 
2.5.1



[PATCH v3 09/16] KVM: arm64: introduce ITS emulation file with stub functions

2015-10-07 Thread Andre Przywara
The ARM GICv3 ITS emulation code goes into a separate file, but
needs to be connected to the GICv3 emulation, of which it is an
option.
Introduce the skeleton with function stubs to be filled later.
Introduce the basic ITS data structure and initialize it, but don't
return any success yet, as we are not yet ready for the show.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
Reviewed-by: Eric Auger <eric.au...@linaro.org>
---
Changelog v2..v3:
- drop ITS check before doing GICR_CTLR access

 arch/arm64/kvm/Makefile            |   1 +
 include/kvm/arm_vgic.h             |   6 ++
 include/linux/irqchip/arm-gic-v3.h |   1 +
 virt/kvm/arm/its-emul.c            | 125 +
 virt/kvm/arm/its-emul.h            |  35 +++
 virt/kvm/arm/vgic-v3-emul.c        |  20 +-
 6 files changed, 185 insertions(+), 3 deletions(-)
 create mode 100644 virt/kvm/arm/its-emul.c
 create mode 100644 virt/kvm/arm/its-emul.h

diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 1949fe5..75069a9 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,5 +25,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v2-switch.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/its-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v3-switch.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 06c33bc..c8c48e3 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -168,6 +168,11 @@ struct irq_phys_map_entry {
struct irq_phys_map map;
 };
 
+struct vgic_its {
+   boolenabled;
+   spinlock_t  lock;
+};
+
 struct vgic_dist {
spinlock_t  lock;
boolin_kernel;
@@ -280,6 +285,7 @@ struct vgic_dist {
u64 *pendbaser;
 
boollpis_enabled;
+   struct vgic_its its;
 };
 
 struct vgic_v2_cpu_if {
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 9eeeb95..70e9539 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -179,6 +179,7 @@
 #define GITS_CWRITER   0x0088
 #define GITS_CREADR0x0090
 #define GITS_BASER 0x0100
+#define GITS_IDREGS_BASE   0xffd0
 #define GITS_PIDR2 GICR_PIDR2
 
 #define GITS_TRANSLATER0x10040
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
new file mode 100644
index 000..659dd39
--- /dev/null
+++ b/virt/kvm/arm/its-emul.c
@@ -0,0 +1,125 @@
+/*
+ * GICv3 ITS emulation
+ *
+ * Copyright (C) 2015 ARM Ltd.
+ * Author: Andre Przywara <andre.przyw...@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "vgic.h"
+#include "its-emul.h"
+
+static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu,
+ struct kvm_exit_mmio *mmio,
+ phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
+   struct kvm_exit_mmio *mmio,
+   phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_cbaser(struct kvm_vcpu *vcpu,
+   struct kvm_exit_mmio *mmio,
+   phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_cwriter(struct kvm_vcpu *vcpu,
+struct kvm_exit_mmio *mmio,
+phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_creadr(struct kvm_vcpu *vcpu,
+   struct kvm_exit_mmio *mmio,
+   phys_addr_t offset)
+{
+   return false;
+}
+
+static const struct vgic_io_range vgicv3_its_ranges[] = {
+   {
+   .base   = GITS_CTLR,
+   .len= 0x10,
+   .bits_per_irq   = 0,
+  

Re: [PATCH v2 13/15] KVM: arm64: implement ITS command queue command handlers

2015-10-07 Thread Andre Przywara
Hi Eric,

thanks a lot for your comments. I just found this email, which seemed to
have been crushed between my holidays and KVM forum.
I tried to address your concerns in my new revision, some have become
obsolete due to the reworked locking.
So I refrain from answering in detail here, since some code has changed
in v3 and I don't really want to talk about the old version anymore ;-)

If you care, I can try to address your concerns in the new v3 context,
but you may want to take a look at the new patch anyway.

Cheers,
Andre.

On 17/08/15 14:33, Eric Auger wrote:
> On 07/10/2015 04:21 PM, Andre Przywara wrote:
>> The connection between a device, an event ID, the LPI number and the
>> allocated CPU is stored in in-memory tables in a GICv3, but their
>> format is not specified by the spec. Instead software uses a command
>> queue in a ring buffer to let the ITS implementation use their own
>> format.
>> Implement handlers for the various ITS commands and let them store
>> the requested relation into our own data structures.
>> To avoid kmallocs inside the ITS spinlock, we preallocate possibly
>> needed memory outside of the lock and free that if it turns out to
>> be not needed (mostly error handling).
> still dist lock ...?
>> Error handling is very basic at this point, as we don't have a good
>> way of communicating errors to the guest (usually a SError).
>> The INT command handler is missing at this point, as we gain the
>> capability of actually injecting MSIs into the guest only later on.
>>
>> Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
>> ---
>>  include/linux/irqchip/arm-gic-v3.h |   5 +-
>>  virt/kvm/arm/its-emul.c            | 497 -
>>  virt/kvm/arm/its-emul.h            |  11 +
>>  3 files changed, 511 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
>> index 0b450c7..80db4f6 100644
>> --- a/include/linux/irqchip/arm-gic-v3.h
>> +++ b/include/linux/irqchip/arm-gic-v3.h
>> @@ -253,7 +253,10 @@
>>   */
>>  #define GITS_CMD_MAPD   0x08
>>  #define GITS_CMD_MAPC   0x09
>> -#define GITS_CMD_MAPVI  0x0a
>> +#define GITS_CMD_MAPTI  0x0a
>> +/* older GIC documentation used MAPVI for this command */
>> +#define GITS_CMD_MAPVI  GITS_CMD_MAPTI
>> +#define GITS_CMD_MAPI   0x0b
>>  #define GITS_CMD_MOVI   0x01
>>  #define GITS_CMD_DISCARD0x0f
>>  #define GITS_CMD_INV0x0c
>> diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
>> index 05245cb..89534c6 100644
>> --- a/virt/kvm/arm/its-emul.c
>> +++ b/virt/kvm/arm/its-emul.c
>> @@ -22,6 +22,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include 
>>  #include 
>> @@ -55,6 +56,34 @@ struct its_itte {
>>  unsigned long *pending;
>>  };
>>  
>> +static struct its_device *find_its_device(struct kvm *kvm, u32 device_id)
>> +{
>> +struct vgic_its *its = >arch.vgic.its;
>> +struct its_device *device;
>> +
>> +list_for_each_entry(device, >device_list, dev_list)
>> +if (device_id == device->device_id)
>> +return device;
>> +
>> +return NULL;
>> +}
>> +
>> +static struct its_itte *find_itte(struct kvm *kvm, u32 device_id, u32 event_id)
>> +{
>> +struct its_device *device;
>> +struct its_itte *itte;
>> +
>> +device = find_its_device(kvm, device_id);
>> +if (device == NULL)
>> +return NULL;
>> +
>> +list_for_each_entry(itte, >itt, itte_list)
>> +if (itte->event_id == event_id)
>> +return itte;
>> +
>> +return NULL;
>> +}
>> +
>>  #define for_each_lpi(dev, itte, kvm) \
>>  list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
>>  list_for_each_entry(itte, &(dev)->itt, itte_list)
>> @@ -71,6 +100,19 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
>>  return NULL;
>>  }
>>  
>> +static struct its_collection *find_collection(struct kvm *kvm, int coll_id)
>> +{
>> +struct its_collection *collection;
>> +
>> +list_for_each_entry(collection, >arch.vgic.its.collection_list,
>> +coll_list) {
>> 

Re: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-10-02 Thread Andre Przywara
Hi Pavel,

On 02/10/15 13:39, Pavel Fedin wrote:
>  Hello!
> 
>> Can't you use the ELRSR bitmap instead? The idea of lr_used sounds like
>> a moot optimization to me.
> 
>  This perfectly works on 4.2, but will break HW interrupt forwarding on 4.3.
> If you look at 4.3 __kvm_vgic_sync_hwstate(), you'll notice that for HW
> interrupts lr_used and elrsr_ptr will diverge at this point, and this
> function actually brings them into sync. And it relies on lr_used for the
> loop to operate correctly (no idea why we use "for" loop here with extra
> check instead of "for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr)",
> looks stupid to me).

I know, because I have reworked my patch lately to work on top of 4.3-rc
and Christoffer's timer rework series. I have something which "works
now"(TM), but wanted to wait for Christoffer's respin before sending it out.
I will send you this new version as a sneak preview in private,
maybe that helps.

Cheers,
Andre.


Re: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-10-02 Thread Andre Przywara
Hi Pavel,

On 02/10/15 10:55, Pavel Fedin wrote:
>  Hello! One more concern.
> 
>> Currently we track which IRQ has been mapped to which VGIC list
>> register and also have to synchronize both. We used to do this
>> to hold some extra state (for instance the active bit).
>> It turns out that this extra state in the LRs is no longer needed and
>> this extra tracking causes some pain later.
> 
>  Returning to the beginning, can you explain, what kind of pain exactly does
> it bring?
>  For some reasons here i had to keep all this tracking mechanism, and
> modified your patchset. I see no significant problems, except memory usage.
> I have to allocate vgic_irq_lr_map large enough to hold indexes up to 16384,
> and indexes from dist->nr_irqs to 8192 appear to be not used.

Yes, this was the main problem I was concerned about. Actually not so
much about memory usage really, but more about the idea of pushing the
concept of bitmaps beyond the point where it is a reasonable data
structure to use.
I briefly thought about extending the bitmaps, but that sounded like a
hack to me.
For instance, how did you come up with that 16384? LPIs could actually be
much bigger (in fact the emulation currently supports up to 64k).
In my opinion removing that tracking mechanism is actually a good idea.
Most implementations actually have only _four_ list registers and since
we keep them shadowed in our KVM data structure, iterating over all of
them is not really painful.

>  Since the map itself is actually used only for piggy-back optimization, it
> is possible to easily get rid of it using
> for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr) iteration instead.

Can't you use the ELRSR bitmap instead? The idea of lr_used sounds like
a moot optimization to me.
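
A minimal user-space sketch of the ELRSR-based approach, not the kernel code: it relies on the convention used elsewhere in this series that a set ELRSR bit marks an empty list register, so the LRs in use are the clear bits below nr_lr. The helper name collect_used_lrs() is made up for the example, and the plain loop stands in for the kernel's for_each_clear_bit().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Collect the indices of all list registers that are in use, i.e. whose
 * ELRSR bit is clear. Returns the number of indices written to 'used'.
 * Hypothetical helper; the caller must provide room for nr_lr entries.
 */
int collect_used_lrs(uint64_t elrsr, int nr_lr, int *used)
{
	int lr, count = 0;

	assert(nr_lr >= 0 && nr_lr <= 64);

	for (lr = 0; lr < nr_lr; lr++) {
		/* A clear bit means the LR still holds an interrupt. */
		if (!(elrsr & (UINT64_C(1) << lr)))
			used[count++] = lr;
	}

	return count;
}
```

With only four list registers the loop body runs at most four times, so a separate lr_used bitmap would save very little.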

Cheers,
Andre.

> The rest of mechanism will work as it is, there's no need to remove the
> state tracking bitmap and optimization itself.
>  I am currently testing this approach and preparing my alternative patch for
> upstreaming.


Re: [PATCH v3 5/8] arm/arm64: KVM: Use appropriate define in VGIC reset code

2015-10-02 Thread Andre Przywara
Hi Christoffer,

On 29/09/15 15:49, Christoffer Dall wrote:
> We currently initialize the SGIs to be enabled in the VGIC code, but we
> use the VGIC_NR_PPIS define for this purpose, instead of the more
> natural VGIC_NR_SGIS.  Change this slightly confusing use of the
> defines.
> 
> Note: This should have no functional change, as both names are defined
> to the number 16.
> 
> Acked-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 
> ---
>  virt/kvm/arm/vgic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index e606f78..9ed8d53 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -2109,7 +2109,7 @@ int vgic_init(struct kvm *kvm)
>   }
>  
>   for (i = 0; i < dist->nr_irqs; i++) {
> - if (i < VGIC_NR_PPIS)
> + if (i < VGIC_NR_SGIS)
>   vgic_bitmap_set_irq_val(>irq_enabled,
>   vcpu->vcpu_id, i, 1);
>   if (i < VGIC_NR_PRIVATE_IRQS)
> 

While the patch itself is a good catch, I wonder why we iterate over all
IRQs here if we only do something for private IRQs? Can you fix that on
the way as well?
Oh, and while you are at it: ;-)
A comment like: "Set all private IRQs to be edge-triggered and enable
all SGIs." sounds useful to me.
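
A stand-alone sketch of what such a restructured loop could look like, not the actual vgic_init() code: the struct and function names are invented, plain bool arrays stand in for the VGIC bitmaps, and the value 32 for the number of private IRQs is an assumption (16 SGIs plus 16 PPIs, matching the "both names are defined to the number 16" note above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SGIS		16	/* value stated in the patch description */
#define NR_PRIVATE_IRQS	32	/* assumption: 16 SGIs + 16 PPIs */

/* Hypothetical stand-in for the per-VCPU parts of the VGIC bitmaps. */
struct private_irq_state {
	bool enabled[NR_PRIVATE_IRQS];
	bool edge[NR_PRIVATE_IRQS];
};

/* Set all private IRQs to be edge-triggered and enable all SGIs. */
void reset_private_irqs(struct private_irq_state *s)
{
	int i;

	assert(s != NULL);

	/* Only private IRQs are touched, so there is no need to walk nr_irqs. */
	for (i = 0; i < NR_PRIVATE_IRQS; i++) {
		if (i < NR_SGIS)
			s->enabled[i] = true;
		s->edge[i] = true;
	}
}
```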

Cheers,
Andre.


Re: [PATCH v3 3/8] arm/arm64: KVM: vgic: Factor out level irq processing on guest exit

2015-10-02 Thread Andre Przywara
Hi Christoffer,

On 29/09/15 15:49, Christoffer Dall wrote:
> Currently vgic_process_maintenance() deals with a completed
> level-triggered interrupt directly, but we are soon going to reuse this
> logic for level-triggered mapped interrupts with the HW bit set, so
> move this logic into a separate static function.
> 
> Probably the most scary part of this commit is convincing yourself that
> the current flow is safe compared to the old one.  In the following I
> try to list the changes and why they are harmless:
> 
>   Move vgic_irq_clear_queued after kvm_notify_acked_irq:
> Harmless because the only potential effect of clearing the queued
> flag wrt.  kvm_set_irq is that vgic_update_irq_pending does not set
> the pending bit on the emulated CPU interface or in the
> pending_on_cpu bitmask if the function is called with level=1.
> However, the point of kvm_notify_acked_irq is to call kvm_set_irq
> with level=0, and we set the queued flag again in
> __kvm_vgic_sync_hwstate later on if the level is still high.
> 
>   Move vgic_set_lr before kvm_notify_acked_irq:
> Also, harmless because the LR are cpu-local operations and
> kvm_notify_acked only affects the dist
> 
>   Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
> Also harmless because it's just a bit which is cleared and altering
> the line state does not affect this bit.

Mmmh, kvm_set_irq(level=0) will eventually execute (in
vgic_update_irq_pending()):

vgic_dist_irq_clear_level(vcpu, irq_num);
if (!vgic_dist_irq_soft_pend(vcpu, irq_num))
vgic_dist_irq_clear_pending(vcpu, irq_num);

So with the former code we would clear the (dist) pending bit if
soft_pend was set before, while with the newer code we wouldn't.
Is this just still working because Linux guests will never set the
soft_pend bit? Or is this safe because we will always clear the pending bit
anyway later on? (my brain is too much jellyfish by now to still work
this dependency out)
Or what am I missing here?
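
To make the ordering question concrete, here is a tiny user-space model, not the kernel code. It covers only the two steps discussed above (clearing soft_pend and lowering the line) and deliberately leaves out the later explicit vgic_dist_irq_clear_pending() in process_level_irq(), which is exactly the part the question is about. All names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-IRQ distributor bits. */
struct irq_model {
	bool level;
	bool soft_pend;
	bool pending;
};

/* Mirrors the snippet above: the effect of kvm_set_irq() with level=0. */
void line_goes_low(struct irq_model *irq)
{
	assert(irq != NULL);

	irq->level = false;
	if (!irq->soft_pend)
		irq->pending = false;
}

/* Former order: soft_pend is cleared before the line is lowered. */
bool pending_after_old_order(struct irq_model irq)
{
	irq.soft_pend = false;
	line_goes_low(&irq);
	return irq.pending;
}

/* New order: the line is lowered first, soft_pend is cleared afterwards. */
bool pending_after_new_order(struct irq_model irq)
{
	line_goes_low(&irq);
	irq.soft_pend = false;
	return irq.pending;
}
```

In this model the two orders differ only when soft_pend was set beforehand: the old order leaves the pending bit cleared, the new one leaves it set.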

> 
> Reviewed-by: Eric Auger 
> Reviewed-by: Marc Zyngier 
> Signed-off-by: Christoffer Dall 
> ---
>  virt/kvm/arm/vgic.c | 88 ++---
>  1 file changed, 50 insertions(+), 38 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 6bd1c9b..fe0e5db 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1322,12 +1322,56 @@ epilog:
>   }
>  }
>  
> +static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
> +{
> + int level_pending = 0;

Why is this an int and not a bool? Also see below ...

> +
> + vlr.state = 0;
> + vlr.hwirq = 0;
> + vgic_set_lr(vcpu, lr, vlr);
> +
> + /*
> +  * If the IRQ was EOIed (called from vgic_process_maintenance) or it
> +  * went from active to non-active (called from vgic_sync_hwirq) it was
> +  * also ACKed and we therefore assume we can clear the soft pending
> +  * state (should it had been set) for this interrupt.
> +  *
> +  * Note: if the IRQ soft pending state was set after the IRQ was
> +  * acked, it actually shouldn't be cleared, but we have no way of
> +  * knowing that unless we start trapping ACKs when the soft-pending
> +  * state is set.
> +  */
> + vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
> +
> + /*
> +  * Tell the gic to start sampling the line of this interrupt again.
> +  */
> + vgic_irq_clear_queued(vcpu, vlr.irq);
> +
> + /* Any additional pending interrupt? */
> + if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
> + vgic_cpu_irq_set(vcpu, vlr.irq);
> + level_pending = 1;
> + } else {
> + vgic_dist_irq_clear_pending(vcpu, vlr.irq);
> + vgic_cpu_irq_clear(vcpu, vlr.irq);
> + }
> +
> + /*
> +  * Despite being EOIed, the LR may not have
> +  * been marked as empty.
> +  */
> + vgic_sync_lr_elrsr(vcpu, lr, vlr);
> +
> + return level_pending;
> +}
> +
>  static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  {
>   u32 status = vgic_get_interrupt_status(vcpu);
>   struct vgic_dist *dist = >kvm->arch.vgic;
> - bool level_pending = false;
>   struct kvm *kvm = vcpu->kvm;
> + int level_pending = 0;

Why this change here? Even after 8/8 I don't see any use of values
outside of true/false.
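
For illustration only, a user-space sketch of the same accumulation pattern with bool throughout. The names process_one() and any_level_pending() are made up and are not functions from the patch; the point is merely that "|=" on a bool keeps working when the helper returns bool as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-LR helper: reports whether the line is still high. */
bool process_one(bool line_level)
{
	return line_level;
}

/* Accumulate over all LRs; the result never leaves true/false. */
bool any_level_pending(const bool *levels, size_t n)
{
	bool level_pending = false;
	size_t i;

	assert(levels != NULL || n == 0);

	for (i = 0; i < n; i++)
		level_pending |= process_one(levels[i]);

	return level_pending;
}
```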

Cheers,
Andre.

>  
>   kvm_debug("STATUS = %08x\n", status);
>  
> @@ -1342,54 +1386,22 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  
>   for_each_set_bit(lr, eisr_ptr, vgic->nr_lr) {
>   struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
> - WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
>  
> - spin_lock(>lock);
> - vgic_irq_clear_queued(vcpu, vlr.irq);
> +   

Re: [PATCH v3 4/8] arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs

2015-10-02 Thread Andre Przywara
Hi Christoffer,

On 29/09/15 15:49, Christoffer Dall wrote:
> The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only.
> We currently simulate this behavior by writing a hardcoded value to the
> register for the SGIs and PPIs on every write of these bits to the
> register (ignoring what the guest actually wrote), and by writing the
> same value as the reset value to the register.
> 
> This is a bit counter-intuitive, as the register is RO for these bits,
> and we can just implement it that way, allowing us to control the value
> of the bits purely in the reset code.
> 
> Reviewed-by: Marc Zyngier <marc.zyng...@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
> ---
>  virt/kvm/arm/vgic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index fe0e5db..e606f78 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -655,7 +655,7 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
>   ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
>   if (mmio->is_write) {
>   if (offset < 8) {
> - *reg = ~0U; /* Force PPIs/SGIs to 1 */
> + /* Ignore writes to read-only SGI and PPI bits */
>   return false;
>   }

Nit: Isn't this now violating kernel coding style because of a single
statement not needing braces? Maybe move the comment in front of the
if-statement to make this more obvious?
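
A stand-alone sketch of that layout, not the real vgic_handle_cfg_reg(): the function cfg_reg_write() and its parameters are invented for the example. The comment sits in front of the if-statement, so the single-statement body needs no braces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, stripped-down write path for a configuration register.
 * Returns true if the write updated the register.
 */
bool cfg_reg_write(uint32_t *reg, unsigned int offset, uint32_t val)
{
	assert(reg != NULL);

	/* Ignore writes to read-only SGI and PPI bits */
	if (offset < 8)
		return false;

	*reg = val;
	return true;
}
```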

Other than that:

Reviewed-by: Andre Przywara <andre.przyw...@arm.com>

Cheers,
Andre.


Re: [PATCH v3 8/8] arm/arm64: KVM: Support edge-triggered forwarded interrupts

2015-10-02 Thread Andre Przywara
On 29/09/15 15:49, Christoffer Dall wrote:
> We mark edge-triggered interrupts with the HW bit set as queued to
> prevent the VGIC code from injecting LRs with both the Active and
> Pending bits set at the same time while also setting the HW bit,
> because the hardware does not support this.
> 
> However, this means that we must also clear the queued flag when we sync
> back a LR where the state on the physical distributor went from active
> to inactive because the guest deactivated the interrupt.  At this point
> we must also check if the interrupt is pending on the distributor, and
> tell the VGIC to queue it again if it is.
> 
> Since these actions on the sync path are extremely close to those for
> level-triggered interrupts, rename process_level_irq to
> process_queued_irq, allowing it to cater for both cases.
> 
> Signed-off-by: Christoffer Dall 


> ---
>  virt/kvm/arm/vgic.c | 40 ++--
>  1 file changed, 22 insertions(+), 18 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 53548f1..f3e76e5 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1322,13 +1322,10 @@ epilog:
>   }
>  }
>  
> -static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
> +static int process_queued_irq(struct kvm_vcpu *vcpu,
> +int lr, struct vgic_lr vlr)
>  {
> - int level_pending = 0;
> -
> - vlr.state = 0;
> - vlr.hwirq = 0;
> - vgic_set_lr(vcpu, lr, vlr);
> + int pending = 0;

As I mentioned in my reply to 3/8 already: shouldn't this be "bool"?

>  
>   /*
>* If the IRQ was EOIed (called from vgic_process_maintenance) or it
> @@ -1344,26 +1341,35 @@ static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>   vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
>  
>   /*
> -  * Tell the gic to start sampling the line of this interrupt again.
> +  * Tell the gic to start sampling this interrupt again.
>*/
>   vgic_irq_clear_queued(vcpu, vlr.irq);
>  
>   /* Any additional pending interrupt? */
> - if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
> - vgic_cpu_irq_set(vcpu, vlr.irq);
> - level_pending = 1;
> + if (vgic_irq_is_edge(vcpu, vlr.irq)) {
> + BUG_ON(!(vlr.state & LR_HW));

Is that really needed here? I don't see how this function would fail if
called on a non-mapped IRQ. Also the two current callers would always
fulfil this requirement:
- vgic_process_maintenance() already has a WARN_ON(vgic_irq_is_edge)
- vgic_sync_irq() returns early if it's not a mapped IRQ

Removing this would also allow passing "int irq" instead of "struct
vgic_lr vlr".

Just an idea, though and not a show-stopper.

Other than that it looks good to me.

Cheers,
Andre.

> + pending = vgic_dist_irq_is_pending(vcpu, vlr.irq);
>   } else {
> - vgic_dist_irq_clear_pending(vcpu, vlr.irq);
> - vgic_cpu_irq_clear(vcpu, vlr.irq);
> + if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
> + vgic_cpu_irq_set(vcpu, vlr.irq);
> + pending = 1;
> + } else {
> + vgic_dist_irq_clear_pending(vcpu, vlr.irq);
> + vgic_cpu_irq_clear(vcpu, vlr.irq);
> + }
>   }
>  
>   /*
>* Despite being EOIed, the LR may not have
>* been marked as empty.
>*/
> + vlr.state = 0;
> + vlr.hwirq = 0;
> + vgic_set_lr(vcpu, lr, vlr);
> +
>   vgic_sync_lr_elrsr(vcpu, lr, vlr);
>  
> - return level_pending;
> + return pending;
>  }
>  
>  static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> @@ -1400,7 +1406,7 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>vlr.irq - VGIC_NR_PRIVATE_IRQS);
>  
>   spin_lock(>lock);
> - level_pending |= process_level_irq(vcpu, lr, vlr);
> + level_pending |= process_queued_irq(vcpu, lr, vlr);
>   spin_unlock(>lock);
>   }
>   }
> @@ -1422,7 +1428,7 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  /*
>   * Save the physical active state, and reset it to inactive.
>   *
> - * Return true if there's a pending level triggered interrupt line to queue.
> + * Return true if there's a pending forwarded interrupt to queue.
>   */
>  static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>  {
> @@ -1458,10 +1464,8 @@ static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>   return false;
>   }
>  
> - /* Mapped edge-triggered interrupts not yet supported. */
> - WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
>   spin_lock(>lock);
> - level_pending = process_level_irq(vcpu, lr, vlr);
> + level_pending = 

Re: [PATCH v4 07/11] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest

2015-09-30 Thread Andre Przywara
Hi Christoffer,

On 29/09/15 14:44, Christoffer Dall wrote:
> On Wed, Sep 23, 2015 at 06:55:04PM +0100, Andre Przywara wrote:
>> Salut Marc,
>>
>> I know that this patch is already merged, but 
>>
>> On 07/08/15 16:45, Marc Zyngier wrote:
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index 51c9900..9d009d2 100644
>> ...
>>> @@ -1364,6 +1397,39 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>> return level_pending;
>>>  }
>>>  
>>> +/*
>>> + * Save the physical active state, and reset it to inactive.
>>> + *
>>> + * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
>>> + */
>>> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>> +{
>>> +   struct irq_phys_map *map;
>>> +   int ret;
>>> +
>>> +   if (!(vlr.state & LR_HW))
>>> +   return 0;
>>> +
>>> +   map = vgic_irq_map_search(vcpu, vlr.irq);
>>> +   BUG_ON(!map || !map->active);
>>> +
>>> +   ret = irq_get_irqchip_state(map->irq,
>>> +   IRQCHIP_STATE_ACTIVE,
>>> +   >active);
>>> +
>>> +   WARN_ON(ret);
>>> +
>>> +   if (map->active) {
>>> +   ret = irq_set_irqchip_state(map->irq,
>>> +   IRQCHIP_STATE_ACTIVE,
>>> +   false);
>>> +   WARN_ON(ret);
>>> +   return 0;
>>> +   }
>>> +
>>> +   return 1;
>>> +}
>>> +
>>>  /* Sync back the VGIC state after a guest run */
>>>  static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>>>  {
>>> @@ -1378,14 +1444,31 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>>> elrsr = vgic_get_elrsr(vcpu);
>>> elrsr_ptr = u64_to_bitmask();
>>>  
>>> -   /* Clear mappings for empty LRs */
>>> -   for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) {
>>> +   /* Deal with HW interrupts, and clear mappings for empty LRs */
>>> +   for (lr = 0; lr < vgic->nr_lr; lr++) {
>>> struct vgic_lr vlr;
>>>  
>>> -   if (!test_and_clear_bit(lr, vgic_cpu->lr_used))
>>> +   if (!test_bit(lr, vgic_cpu->lr_used))
>>> continue;
>>>  
>>> vlr = vgic_get_lr(vcpu, lr);
>>> +   if (vgic_sync_hwirq(vcpu, vlr)) {
>>> +   /*
>>> +* So this is a HW interrupt that the guest
>>> +* EOI-ed. Clean the LR state and allow the
>>> +* interrupt to be sampled again.
>>> +*/
>>> +   vlr.state = 0;
>>> +   vlr.hwirq = 0;
>>> +   vgic_set_lr(vcpu, lr, vlr);
>>> +   vgic_irq_clear_queued(vcpu, vlr.irq);
>>
>> Isn't this line altering common VGIC state without holding the lock?
>> Eric removed the coarse dist->lock around the whole
>> __kvm_vgic_sync_hwstate() function, we take it now in
>> vgic_process_maintenance(), but don't hold it here AFAICT.
>> As long as we are only dealing with private timer IRQs this is probably
>> not a problem, but the IRQ number could be a SPI as well, right?
>>
> I don't see a problematic race with this though, as all we're doing is
> to clear a bit in a bitmap, which is always checked atomically, so
> adding a lock around this really doesn't change anything as far as I can
> tell.

Indeed I found a similar comment in some older revisions of the code.

But isn't it that other code holding the lock (thinking about
kvm_vgic_flush_hwstate() in particular) assumes that no-one else tinkers
with the VGIC state while it holds the lock?
So couldn't we (potentially) run into inconsistent state because we
cleared the queued bit while the flushing code runs over all interrupts?
Maybe not in this particular case, but in general?

Haven't looked at your new series yet, but will do this ASAP.

Cheers,
Andre.

> 
> Nevertheless, my rework series also addresses this.
> 
> -Christoffer
> 


Re: [PATCH kvmtool] Skip a few messages by default: command line args; flat binary; earlyprintk.

2015-09-30 Thread Andre Przywara
Hi Dimitri,

thanks for sharing your patches.

On 29/09/15 17:59, Dimitri John Ledkov wrote:
> The partial command line args & earlyprintk=serial are still enabled
> in the debug mode. Warning that a flat binary kernel image is attempted
> to be loaded is completely dropped. These are not that informative,
> once one is past initial debugging, and only pollute the console.
> 
> Signed-off-by: Dimitri John Ledkov 
> ---
>  builtin-run.c | 10 ++
>  kvm.c |  1 -
>  x86/kvm.c |  8 ++--
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/builtin-run.c b/builtin-run.c
> index e0c8732..8edbf88 100644
> --- a/builtin-run.c
> +++ b/builtin-run.c
> @@ -613,10 +613,12 @@ static struct kvm *kvm_cmd_run_init(int argc, const char **argv)
>  
>   kvm->cfg.real_cmdline = real_cmdline;
>  
> - printf("  # %s run -k %s -m %Lu -c %d --name %s\n", KVM_BINARY_NAME,
> - kvm->cfg.kernel_filename,
> - (unsigned long long)kvm->cfg.ram_size / 1024 / 1024,
> - kvm->cfg.nrcpus, kvm->cfg.guest_name);
> + if (do_debug_print) {
> + printf("  # %s run -k %s -m %Lu -c %d --name %s\n", KVM_BINARY_NAME,
> +kvm->cfg.kernel_filename,
> +(unsigned long long)kvm->cfg.ram_size / 1024 / 1024,
> +kvm->cfg.nrcpus, kvm->cfg.guest_name);
> + }

I like the general idea. In fact I have this very patch (among others)
in my tree too. I applied similar guarding to other messages as well
(mostly those that only show up on ARM, but also the "ended normally"
message). Like any good UNIX tool kvmtool should keep quiet if it has
nothing worthwhile to say ;-)
But looking at it more closely, I see that there is pr_debug() defined
doing that "if (do_debug_print)" already. The only issue is that it
prints source line information, which is not really useful here. But
then again there does not seem to be any user of it?

So what about the following:
- We avoid printing pr_info() messages in the default case. Looking at
its current users in the tree this information is not really useful for
normal users. We enable pr_info() output only if do_debug_print is
enabled or introduce another command line flag (--verbose?) for that.
- We check each user of pr_info() to see whether this information is
actually "informational" or whether it should be converted to pr_warn.
- We change the above line to use pr_info instead of printf.
- We fix the EOL mayhem we have atm while at it.

If you don't mind I will give this a try later this week.
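
A minimal sketch of what such gating could look like, assuming a single
verbosity level that a --verbose or --debug option would raise. The enum,
the current_loglevel variable and the log_at() helper are hypothetical and
not taken from the kvmtool tree; only the pr_info/pr_warning names mirror
the existing macros:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical verbosity levels; kvmtool itself only has do_debug_print. */
enum loglevel { LOGLEVEL_WARNING, LOGLEVEL_INFO, LOGLEVEL_DEBUG };

/* Default: stay quiet unless there is something worth warning about. */
static enum loglevel current_loglevel = LOGLEVEL_WARNING;

/*
 * Print one message, terminated by exactly one newline, if its level is
 * enabled. Returns the number of characters written, or 0 if suppressed.
 */
static int log_at(enum loglevel level, FILE *out, const char *prefix,
		  const char *fmt, ...)
{
	va_list ap;
	int n;

	if (level > current_loglevel)
		return 0;

	n = fprintf(out, "  %s: ", prefix);
	va_start(ap, fmt);
	n += vfprintf(out, fmt, ap);
	va_end(ap);
	n += fprintf(out, "\n");
	return n;
}

#define pr_warning(...) log_at(LOGLEVEL_WARNING, stderr, "Warning", __VA_ARGS__)
#define pr_info(...)    log_at(LOGLEVEL_INFO, stdout, "Info", __VA_ARGS__)
```

Appending the newline in one place is one way to address the EOL
inconsistency: callers would then never pass a trailing "\n" themselves.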

>  
>   if (init_list__init(kvm) < 0)
>   die ("Initialisation failed");
> diff --git a/kvm.c b/kvm.c
> index 10ed230..1081072 100644
> --- a/kvm.c
> +++ b/kvm.c
> @@ -378,7 +378,6 @@ bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
>   if (ret)
>   goto found_kernel;
>  
> - pr_warning("%s is not a bzImage. Trying to load it as a flat binary...", kernel_filename);

I think on x86 this message is useful to have: to point people to the
fact that they are trying to load a kernel which most probably isn't one.
Do you actually load a "flat binary", so not a Linux bzImage? If yes,
what is it? Does this work for you? I didn't have the impression that
this code was actually used at all.
If you do use it, could you please give my kernel loading series [1] a
try? I touch flat binary loading there, but had no chance to test it.

>  #endif
>  
>   ret = load_elf_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
> diff --git a/x86/kvm.c b/x86/kvm.c
> index 512ad67..4a5fa41 100644
> --- a/x86/kvm.c
> +++ b/x86/kvm.c
> @@ -124,8 +124,12 @@ void kvm__arch_set_cmdline(char *cmdline, bool video)
>   "i8042.dumbkbd=1 i8042.nopnp=1");
>   if (video)
>   strcat(cmdline, " video=vesafb console=tty0");
> - else
> - strcat(cmdline, " console=ttyS0 earlyprintk=serial i8042.noaux=1");
> + else {
> + strcat(cmdline, " console=ttyS0 i8042.noaux=1");
> + if (do_debug_print) {
> + strcat(cmdline, " earlyprintk=serial");
> + }
> + }

I am not completely convinced of this one. The do_debug_print is meant
to affect kvmtool's own debug output only and should really have no
effect on the guest's kernel output, shouldn't it?
Maybe we should clarify the semantics in the documentation?
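
One possible way to keep the two concerns apart, as a sketch only: the
guest's earlyprintk gets its own switch instead of piggybacking on
do_debug_print. The function name set_guest_cmdline() and the
guest_earlyprintk parameter are made up for illustration, and the string
is written fresh rather than appended as kvm__arch_set_cmdline() does:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical, simplified variant of kvm__arch_set_cmdline(): whether the
 * guest kernel gets earlyprintk=serial is decided by a dedicated flag and
 * not by kvmtool's own debug switch.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int set_guest_cmdline(char *cmdline, size_t len, bool video,
			     bool guest_earlyprintk)
{
	int n;

	if (video)
		n = snprintf(cmdline, len, "video=vesafb console=tty0");
	else
		n = snprintf(cmdline, len, "console=ttyS0 i8042.noaux=1%s",
			     guest_earlyprintk ? " earlyprintk=serial" : "");

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```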

Cheers,
Andre.

[1] http://marc.info/?l=kvm&m=143825354808135

>  }
>  
>  /* Architecture-specific KVM init */
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvmtool] kvmtool: expose the TSC Deadline Timer feature to the guest

2015-09-30 Thread Andre Przywara
Hi Dimitri,

On 29/09/15 18:00, Dimitri John Ledkov wrote:
> From: Arjan van de Ven 
> 
> with the TSC deadline timer feature, we don't need to calibrate the apic
> timers anymore, which saves more than 100 milliseconds of boot time.
> 
> Signed-off-by: Arjan van de Ven 
> Signed-off-by: Dimitri John Ledkov 
> ---
>  x86/cpuid.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/x86/cpuid.c b/x86/cpuid.c
> index c3b67d9..1d8bd23 100644
> --- a/x86/cpuid.c
> +++ b/x86/cpuid.c
> @@ -31,6 +31,9 @@ static void filter_cpuid(struct kvm_cpuid2 *kvm_cpuid)
>   /* Set X86_FEATURE_HYPERVISOR */
>   if (entry->index == 0)
>   entry->ecx |= (1 << 31);
> +/* Set CPUID_EXT_TSC_DEADLINE_TIMER*/
> + if (entry->index == 0)
> + entry->ecx |= (1 << 24);

This can only be enabled if the kernel supports emulation of that
feature (reported via KVM_CAP_TSC_DEADLINE_TIMER); cf.
Documentation/virtual/kvm/api.txt and the respective QEMU code in
target-i386/kvm.c.
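
A sketch of the guarded version, under the assumption that the caller has
already asked the kernel via the KVM_CHECK_EXTENSION ioctl whether
KVM_CAP_TSC_DEADLINE_TIMER is available and passes the answer in as a
boolean. The helper name and the macro names are invented for this
example; only the bit positions come from the quoted patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit positions, as used in the quoted patch. */
#define CPUID_1_ECX_TSC_DEADLINE	(1u << 24)
#define CPUID_1_ECX_HYPERVISOR		(1u << 31)

/*
 * Hypothetical helper: compute the ECX value of CPUID leaf 1 for the guest.
 * host_has_tsc_deadline is expected to reflect the result of
 * ioctl(sys_fd, KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER) > 0.
 */
static uint32_t filter_leaf1_ecx(uint32_t ecx, bool host_has_tsc_deadline)
{
	ecx |= CPUID_1_ECX_HYPERVISOR;

	/* Only advertise the deadline timer if KVM can emulate it. */
	if (host_has_tsc_deadline)
		ecx |= CPUID_1_ECX_TSC_DEADLINE;
	else
		ecx &= ~CPUID_1_ECX_TSC_DEADLINE;

	return ecx;
}
```

Keeping the ioctl outside the helper makes the bit logic easy to check on
its own; whether the bit should be cleared explicitly in the unsupported
case is a design choice of this sketch.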

Cheers,
Andre.

>   break;
>   case 6:
>   /* Clear X86_FEATURE_EPB */
> 



Re: [PATCH v3 5/5] KVM: arm64: Implement vGICv3 CPU interface access

2015-09-25 Thread Andre Przywara
On 24/09/15 13:08, Pavel Fedin wrote:
>  Hello!
> 
>> The only thing that is pure 64-bit is the MRS/MSR _instruction_ in
>> Aarch64, which always takes a x register.
>> So can you model the register size according to the spec and allow
>> 32-bit accesses from userland?
> 
>  I would like to complete the rework and respin v4, but this is, I guess,
> the only major issue left. Additionally, it impacts the API. So...
>  In order to allow 32-bit accesses we would have to drop using
> ARM64_SYS_REG() for building the attribute ID and introduce something of
> our own, like KVM_DEV_ARM_VGIC_REG(). It will have a different bit layout
> (actually it will be missing the 'arch' and 'size' fields, and instead I
> will use the KVM_DEV_ARM_VGIC_64BIT flag for length specification, the
> same as for the redistributor).
>  Will this be OK ?

No, instead you should go with your original approach ;-)
Thinking about that again I see that this interface is of course modeled
after the architected GICv3 system registers, where AArch32 has its
own, separate encoding. So it's perfectly fine to use that 64-bit
interface between userland and KVM now. If we later get AArch32 support
for the GICv3, we can add the appropriate AArch32 sysregs to that
interface and have a natural match.

So: sorry for the noise, you can just go ahead with that native 64-bit
sysregs encoding for [SG]ET_ONE_REG as you had before.

Cheers,
Andre.



Re: [PATCH v2 00/15] KVM: arm64: GICv3 ITS emulation

2015-09-24 Thread Andre Przywara
Hi Pavel,

On 24/09/15 12:18, Pavel Fedin wrote:
>  Hello Andre and others!
> 
>  How are things going? I see the last message in thread something like 1 
> month old, then silence...
>  Our project relies on this feature, any assistance needed?

I am about to make it work on top of Christoffer's latest arch timer
rework patches, which means I need to rewrite most of patch 1. Currently
that boots, but hangs as soon as I put some load on it. Finding the
reason for this is a bit tedious at the moment.
I have addressed the comments from the list on the other patches, so
ideally I can send a new revision as soon as I have fixed that bug in
the first patch.

Cheers,
Andre.


Re: [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics

2015-09-23 Thread Andre Przywara
Hi Christoffer,

> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 9ed8d53..f4ea950 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  /*
>   * Save the physical active state, and reset it to inactive.
>   *
> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + * Return true if there's a pending level triggered interrupt line to queue.
>   */
> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>  {
>   struct irq_phys_map *map;
> + bool phys_active;
>   int ret;
>  
>   if (!(vlr.state & LR_HW))
>   return 0;
>  
>   map = vgic_irq_map_search(vcpu, vlr.irq);
> - BUG_ON(!map || !map->active);
> + BUG_ON(!map);
>  
>   ret = irq_get_irqchip_state(map->irq,
>   IRQCHIP_STATE_ACTIVE,
> - &map->active);
> + &phys_active);
>  
>   WARN_ON(ret);
>  
> - if (map->active) {
> + if (phys_active) {
> + /*
> +  * Interrupt still marked as active on the physical
> +  * distributor, so guest did not EOI it yet.  Reset to
> +  * non-active so that other VMs can see interrupts from this
> +  * device.
> +  */
>   ret = irq_set_irqchip_state(map->irq,
>   IRQCHIP_STATE_ACTIVE,
>   false);
>   WARN_ON(ret);
> - return 0;
> + return false;
>   }
>  
> - return 1;
> + /* Mapped edge-triggered interrupts not yet supported. */
> + WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> + return process_level_irq(vcpu, lr, vlr);

Don't you miss the dist->lock here? The other call to
process_level_irq() certainly does it, and Eric recently removed the
coarse grained lock around the whole __kvm_vgic_sync_hwstate() function.
So we don't hold the lock here, but we change quite some common VGIC
state in there.

Cheers,
Andre.

>  }
>  
>  /* Sync back the VGIC state after a guest run */
> @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>   continue;
>  
>   vlr = vgic_get_lr(vcpu, lr);
> - if (vgic_sync_hwirq(vcpu, vlr)) {
> - /*
> -  * So this is a HW interrupt that the guest
> -  * EOI-ed. Clean the LR state and allow the
> -  * interrupt to be sampled again.
> -  */
> - vlr.state = 0;
> - vlr.hwirq = 0;
> - vgic_set_lr(vcpu, lr, vlr);
> - vgic_irq_clear_queued(vcpu, vlr.irq);
> - set_bit(lr, elrsr_ptr);
> - }
> + if (vgic_sync_hwirq(vcpu, lr, vlr))
> + level_pending = true;
>  
>   if (!test_bit(lr, elrsr_ptr))
>   continue;
> @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
>  }
>  
>  /**
> - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
> - *
> - * Return the logical active state of a mapped interrupt. This doesn't
> - * necessarily reflects the current HW state.
> - */
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
> -{
> - BUG_ON(!map);
> - return map->active;
> -}
> -
> -/**
> - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
> - *
> - * Set the logical active state of a mapped interrupt. This doesn't
> - * immediately affects the HW state.
> - */
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> -{
> - BUG_ON(!map);
> - map->active = active;
> -}
> -
> -/**
>   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
>   * @vcpu: The VCPU pointer
>   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
> @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
>   if (i < VGIC_NR_SGIS)
>   vgic_bitmap_set_irq_val(&dist->irq_enabled,
>   vcpu->vcpu_id, i, 1);
> - if (i < VGIC_NR_PRIVATE_IRQS)
> + if (i < VGIC_NR_SGIS)
>   vgic_bitmap_set_irq_val(&dist->irq_cfg,
>   vcpu->vcpu_id, i,
>   VGIC_CFG_EDGE);
> + else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
> + vgic_bitmap_set_irq_val(&dist->irq_cfg,
> + vcpu->vcpu_id, i,
> + 

Re: [PATCH v4 07/11] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest

2015-09-23 Thread Andre Przywara
Salut Marc,

I know that this patch is already merged, but 

On 07/08/15 16:45, Marc Zyngier wrote:
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 51c9900..9d009d2 100644
...
> @@ -1364,6 +1397,39 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>   return level_pending;
>  }
>  
> +/*
> + * Save the physical active state, and reset it to inactive.
> + *
> + * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + */
> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +{
> + struct irq_phys_map *map;
> + int ret;
> +
> + if (!(vlr.state & LR_HW))
> + return 0;
> +
> + map = vgic_irq_map_search(vcpu, vlr.irq);
> + BUG_ON(!map || !map->active);
> +
> + ret = irq_get_irqchip_state(map->irq,
> + IRQCHIP_STATE_ACTIVE,
> + &map->active);
> +
> + WARN_ON(ret);
> +
> + if (map->active) {
> + ret = irq_set_irqchip_state(map->irq,
> + IRQCHIP_STATE_ACTIVE,
> + false);
> + WARN_ON(ret);
> + return 0;
> + }
> +
> + return 1;
> +}
> +
>  /* Sync back the VGIC state after a guest run */
>  static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  {
> @@ -1378,14 +1444,31 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>   elrsr = vgic_get_elrsr(vcpu);
>   elrsr_ptr = u64_to_bitmask(&elrsr);
>  
> - /* Clear mappings for empty LRs */
> - for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) {
> + /* Deal with HW interrupts, and clear mappings for empty LRs */
> + for (lr = 0; lr < vgic->nr_lr; lr++) {
>   struct vgic_lr vlr;
>  
> - if (!test_and_clear_bit(lr, vgic_cpu->lr_used))
> + if (!test_bit(lr, vgic_cpu->lr_used))
>   continue;
>  
>   vlr = vgic_get_lr(vcpu, lr);
> + if (vgic_sync_hwirq(vcpu, vlr)) {
> + /*
> +  * So this is a HW interrupt that the guest
> +  * EOI-ed. Clean the LR state and allow the
> +  * interrupt to be sampled again.
> +  */
> + vlr.state = 0;
> + vlr.hwirq = 0;
> + vgic_set_lr(vcpu, lr, vlr);
> + vgic_irq_clear_queued(vcpu, vlr.irq);

Isn't this line altering common VGIC state without holding the lock?
Eric removed the coarse dist->lock around the whole
__kvm_vgic_sync_hwstate() function, we take it now in
vgic_process_maintenance(), but don't hold it here AFAICT.
As long as we are only dealing with private timer IRQs this is probably
not a problem, but the IRQ number could be a SPI as well, right?

Cheers,
Andre.

> + set_bit(lr, elrsr_ptr);
> + }
> +
> + if (!test_bit(lr, elrsr_ptr))
> + continue;
> +
> + clear_bit(lr, vgic_cpu->lr_used);
>  
>   BUG_ON(vlr.irq >= dist->nr_irqs);
>   vgic_cpu->vgic_irq_lr_map[vlr.irq] = LR_EMPTY;
> 


Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation

2015-09-15 Thread Andre Przywara
Hi Christoffer,

On 14/09/15 12:42, Christoffer Dall wrote:

>>>> Where is this done? I see that the physical dist state is altered on the
>>>> actual IRQ forwarding, but not on later exits/entries? Do you mean
>>>> kvm_vgic_flush_hwstate() with "flush"?
>>>
>>> this is a bug and should be fixed in the 'fixes' patches I sent last
>>> week.  We should set active state on every entry to the guest for IRQs
>>> with the HW bit set in either pending or active state.
>>
>> OK, sorry, I missed that one patch, I was looking at what should become
>> -rc1 soon (because that's what I want to rebase my ITS emulation patches
>> on). That patch wasn't in queue at the time I started looking at it.
>>
>> So I updated to the latest queue containing those two fixes and also
>> applied your v2 series. Indeed this series addresses some of the things
>> I was wondering about the last time, but the main thing still persists:
>> - Every time the physical dist state is active we have the virtual state
>> still at pending or active.
> 
> For the arch timer, yes.
> 
> For a passthrough device, there should be a situation where the physical
> dist state is active but we didn't see the virtual state updated at the
> vgic yet (after physical IRQ fires and before the VFIO ISR calls
> kvm_set_irq).

But then we wouldn't get into vgic_sync_hwirq(), because we wouldn't
inject a mapped IRQ before kvm_set_irq() is called, would we?

>> - If the physical dist state is non-active, the virtual state is
>> inactive (LR.state==8: HW bit) as well. The associated ELRSR bit is 1
>> (LR empty).
>> (I was tracing every HW mapped LR in vgic_sync_hwirq() for this)
>>
>> So that contradicts:
>>
>> +  - On guest EOI, the *physical distributor* active bit gets cleared,
>> +but the LR.Active is left untouched (set).
>>
>> This is the main point I was actually wondering about: I cannot confirm
>> this statement. In my tests the LR state and the physical dist state
>> always correspond, as expected by reading the spec.
>>
>> I reckon that these observations are mostly independent from the actual
>> KVM code, as I try to observe hardware state (physical distributor and
>> LRs) before KVM tinkers with them.
> 
> ok, I got this paragraph from Marc, so we really need to ask him?  Which
> hardware are you seeing this behavior on?  Perhaps implementations vary
> on this point?

I checked this on Midway and Juno. Both have a GIC-400, but I don't have
access to any other GIC implementations.
I added the two BUG_ONs shown below to prove that assumption.

Eric, I've been told you observed the behaviour with the GIC not syncing
LR and phys state for a mapped HWIRQ which was not the timer.
Can you reproduce this? Does it complain with the patch below?

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 5942ce9..7fac16e 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1459,9 +1459,12 @@ static bool vgic_sync_hwirq(struct kvm_vcpu
 IRQCHIP_STATE_ACTIVE,
 false);
WARN_ON(ret);
+   BUG_ON(!(vlr.state & 3));
return false;
}

+   BUG_ON(vlr.state & 3);
+
return process_queued_irq(vcpu, lr, vlr);
 }
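
The assumption the two BUG_ONs encode can be written down as a small
predicate. This is only an illustration: the function name is made up, and
the LR_* values are the ones I recall from include/kvm/arm_vgic.h (they
match the "& 3" mask and the "LR.state==8: HW bit" observation above):

```c
#include <stdbool.h>
#include <stdint.h>

/* LR state bits, assumed to match include/kvm/arm_vgic.h of that time. */
#define LR_STATE_PENDING	(1 << 0)
#define LR_STATE_ACTIVE		(1 << 1)
#define LR_STATE_MASK		(3 << 0)
#define LR_HW			(1 << 3)

/*
 * Hypothetical check: for a HW-mapped LR, "pending or active in the LR"
 * is expected to coincide with "active on the physical distributor".
 */
static bool lr_consistent_with_phys(uint8_t lr_state, bool phys_active)
{
	bool lr_in_flight = (lr_state & LR_STATE_MASK) != 0;

	return lr_in_flight == phys_active;
}
```

The first BUG_ON fires when the physical state is active but the LR is
neither pending nor active; the second fires in the opposite case.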

> 
> I have no objections removing this point from the doc though, I'm just
> relaying information on this one.

I see, I talked with Marc and I am about to gather more data with the
above patch to prove that this never happens.

>>
>> ...
>>
>>>
>>>> Is this an observation, an implementation bug or is this mentioned in
>>>> the spec? Needing to spoon-feed the VGIC by doing its job sounds a bit
>>>> awkward to me.
>>>
>>> What do you mean?  How are we spoon-feeding the VGIC?
>>
>> By looking at the physical dist state and all LRs and clearing the LR we
>> do what the GIC is actually supposed to do for us - and what it actually
>> does according to my observations.
>>
>> The point is that patch 1 in my ITS emulation series is reworking the LR
>> handling and this patch was based on assumptions that seem to be no
>> longer true (i.e. we don't care about inactive LRs except for our LR
>> mapping code). So I want to be sure that I fully get what is going on
>> here and I struggle at this at the moment due to the above statement.
>>
>> What are the plans regarding your "v2: Rework architected timer..."
>> series? Will this be queued for 4.4? I want to do the
>> rebasing^Wrewriting of my series only once if possible ;-)
>>
> I think we should settle on this series ASAP and base your ITS stuff on
> top of it.  What do you think?

Yeah, that's what I was thinking too. So I will be working against
4.3-rc1 with your timer-rework-v2 branch plus the other fixes from the
kvm-arm queue merged.

Cheers,
Andre.


Re: [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics

2015-09-14 Thread Andre Przywara
Hi Christoffer,

just one small nit I stumbled upon:

On 04/09/15 20:40, Christoffer Dall wrote:
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 9ed8d53..f4ea950 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  /*
>   * Save the physical active state, and reset it to inactive.
>   *
> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + * Return true if there's a pending level triggered interrupt line to queue.
>   */
> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>  {
>   struct irq_phys_map *map;
> + bool phys_active;
>   int ret;
>  
>   if (!(vlr.state & LR_HW))
>   return 0;

This should read "return false;" now.

Cheers,
Andre.

>  
>   map = vgic_irq_map_search(vcpu, vlr.irq);
> - BUG_ON(!map || !map->active);
> + BUG_ON(!map);
>  
>   ret = irq_get_irqchip_state(map->irq,
>   IRQCHIP_STATE_ACTIVE,
> - &map->active);
> + &phys_active);
>  
>   WARN_ON(ret);
>  
> - if (map->active) {
> + if (phys_active) {
> + /*
> +  * Interrupt still marked as active on the physical
> +  * distributor, so guest did not EOI it yet.  Reset to
> +  * non-active so that other VMs can see interrupts from this
> +  * device.
> +  */
>   ret = irq_set_irqchip_state(map->irq,
>   IRQCHIP_STATE_ACTIVE,
>   false);
>   WARN_ON(ret);
> - return 0;
> + return false;
>   }
>  
> - return 1;
> + /* Mapped edge-triggered interrupts not yet supported. */
> + WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> + return process_level_irq(vcpu, lr, vlr);
>  }
>  
>  /* Sync back the VGIC state after a guest run */


Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation

2015-09-11 Thread Andre Przywara
Hi Christoffer,

(actually you are not supposed to reply during your holidays!)

On 09/09/15 09:49, Christoffer Dall wrote:
> On Tue, Sep 8, 2015 at 6:57 PM, Andre Przywara <andre.przyw...@arm.com> wrote:
>> Hi Eric,
>>
>> thanks for your answer.
>>
>> On 08/09/15 09:43, Eric Auger wrote:
>>> Hi Andre,
>>> On 09/07/2015 01:25 PM, Andre Przywara wrote:
>>>> Hi,
>>>>
>>>> firstly: this text is really great, thanks for coming up with that.
>>>> See below for some information I got from tracing the host which I
>>>> cannot make sense of
>>>>
>>>>
>>>> On 04/09/15 20:40, Christoffer Dall wrote:
>>>>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>>>>> way we deal with them is not apparently easy to understand by reading
>>>>> various specs.
>>>>>
>>>>> Therefore, add a proper documentation file explaining the flow and
>>>>> rationale of the behavior of the vgic.
>>>>>
>>>>> Some of this text was contributed by Marc Zyngier and edited by me.
>>>>> Omissions and errors are all mine.
>>>>>
>>>>> Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
>>>>> ---
>>>>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 
>>>>> +
>>>>>  1 file changed, 181 insertions(+)
>>>>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>>
>>>>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>> new file mode 100644
>>>>> index 000..24b6f28
>>>>> --- /dev/null
>>>>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>> @@ -0,0 +1,181 @@
>>>>> +KVM/ARM VGIC Forwarded Physical Interrupts
>>>>> +==========================================
>>>>> +
>>>>> +The KVM/ARM code implements software support for the ARM Generic
>>>>> +Interrupt Controller's (GIC's) hardware support for virtualization by
>>>>> +allowing software to inject virtual interrupts to a VM, which the guest
>>>>> +OS sees as regular interrupts.  The code is famously known as the VGIC.
>>>>> +
>>>>> +Some of these virtual interrupts, however, correspond to physical
>>>>> +interrupts from real physical devices.  One example could be the
>>>>> +architected timer, which itself supports virtualization, and therefore
>>>>> +lets a guest OS program the hardware device directly to raise an
>>>>> +interrupt at some point in time.  When such an interrupt is raised, the
>>>>> +host OS initially handles the interrupt and must somehow signal this
>>>>> +event as a virtual interrupt to the guest.  Another example could be a
>>>>> +passthrough device, where the physical interrupts are initially handled
>>>>> +by the host, but the device driver for the device lives in the guest OS
>>>>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
>>>>> +the physical one to the guest OS.
>>>>> +
>>>>> +These virtual interrupts corresponding to a physical interrupt on the
>>>>> +host are called forwarded physical interrupts, but are also sometimes
>>>>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
>>>>> +
>>>>> +Forwarded physical interrupts are handled slightly differently compared
>>>>> +to virtual interrupts generated purely by a software emulated device.
>>>>> +
>>>>> +
>>>>> +The HW bit
>>>>> +----------
>>>>> +Virtual interrupts are signalled to the guest by programming the List
>>>>> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
>>>>> +with the virtual IRQ number and the state of the interrupt (Pending,
>>>>> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
>>>>> +interrupt, the LR state moves from Pending to Active, and finally to
>>>>> +inactive.
>>>>> +
>>>>> +The LRs include an extra bit, called the HW bit.  When this bit is set,
>>>>> +KVM must also program an additional field in the LR, the physical IRQ
>>>>>

Re: [PATCH kvmtool] Make static libc and guest-init functionality optional.

2015-09-11 Thread Andre Przywara
Hi Dimitri,

thanks for sharing this patch and sorry for the delay.

(CC:ing Will)

On 04/09/15 13:04, Dimitri John Ledkov wrote:
> If one typically only boots full disk-images, one wouldn't necessarily
> want to statically link glibc, for the guest-init feature of the
> kvmtool. As statically linked glibc triggers heavy security
> maintenance.

I like the idea of making guest-init optional, and actually was bitten
by this annoying static libc requirement once before.
Some comments below:

> 
> Signed-off-by: Dimitri John Ledkov 
> ---
>  Makefile| 11 ++-
>  builtin-run.c   |  7 +++
>  builtin-setup.c |  7 +++
>  3 files changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 1534e6f..42a629a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -34,8 +34,6 @@ bindir_SQ = $(subst ','\'',$(bindir))
>  PROGRAM  := lkvm
>  PROGRAM_ALIAS := vm
>  
> -GUEST_INIT := guest/init
> -
>  OBJS += builtin-balloon.o
>  OBJS += builtin-debug.o
>  OBJS += builtin-help.o
> @@ -279,8 +277,12 @@ ifeq ($(LTO),1)
>   endif
>  endif
>  
> -ifneq ($(call try-build,$(SOURCE_STATIC),,-static),y)
> -$(error No static libc found. Please install glibc-static package.)
> +ifeq ($(call try-build,$(SOURCE_STATIC),,-static),y)
> + CFLAGS  += -DCONFIG_HAS_LIBC

The name CONFIG_HAS_LIBC seems a bit misleading to me, so at least this
symbol should read CONFIG_HAS_STATIC_LIBC. But I'd prefer to have it
named after its user instead: CONFIG_GUEST_INIT (or the like), since
this is what it protects in the code.

> + GUEST_INIT := guest/init
> + GUEST_OBJS = guest/guest_init.o
> +else
> + NOTFOUND+= static-libc
>  endif
>  
>  ifeq (y,$(ARCH_WANT_LIBFDT))
> @@ -356,7 +358,6 @@ c_flags   = -Wp,-MD,$(depfile) $(CFLAGS)
>  # $(OTHEROBJS) are things that do not get substituted like this.
>  #
>  STATIC_OBJS = $(patsubst %.o,%.static.o,$(OBJS) $(OBJS_STATOPT))
> -GUEST_OBJS = guest/guest_init.o
>  
>  $(PROGRAM)-static:  $(STATIC_OBJS) $(OTHEROBJS) $(GUEST_INIT)
>   $(E) "  LINK" $@
> diff --git a/builtin-run.c b/builtin-run.c
> index 1ee75ad..0f67471 100644
> --- a/builtin-run.c
> +++ b/builtin-run.c
> @@ -59,8 +59,13 @@ static int  kvm_run_wrapper;
>  
>  bool do_debug_print = false;
>  
> +#ifdef CONFIG_HAS_LIBC
>  extern char _binary_guest_init_start;
>  extern char _binary_guest_init_size;
> +#else
> +static char _binary_guest_init_start=0;
> +static char _binary_guest_init_size=0;
> +#endif
>  
>  static const char * const run_usage[] = {
>   "lkvm run [<options>] [<kernel image>]",
> @@ -354,6 +359,8 @@ static int kvm_setup_guest_init(struct kvm *kvm)
>   char *data;
>  
>   /* Setup /virt/init */
> + if (!_binary_guest_init_size)
> + die("Guest init not compiled");

I wonder if comparing with 0 is safe in every case. I appreciate not
spoiling the code with #ifdefs, but putting one around here seems
cleaner to me (especially if you look at the error message).
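
A sketch of the #ifdef variant, assuming the Makefile defines
CONFIG_GUEST_INIT (the name suggested above) when a static libc is found.
The helper guest_init_blob() is hypothetical. As far as I can tell, the
reason the plain comparison is fragile is that the size is encoded in the
*address* of _binary_guest_init_size, so reading the symbol's value as a
char is not the same as looking at the size:

```c
#include <stddef.h>

/* CONFIG_GUEST_INIT is an assumed name, set by the build system. */
#ifdef CONFIG_GUEST_INIT
extern char _binary_guest_init_start;
extern char _binary_guest_init_size;
#endif

/*
 * Hypothetical helper: hand out the embedded guest init binary.
 * Returns 0 on success, -1 if guest init support was not compiled in.
 */
static int guest_init_blob(const char **data, size_t *size)
{
#ifdef CONFIG_GUEST_INIT
	/* The linker encodes the blob size as the symbol's address. */
	*data = &_binary_guest_init_start;
	*size = (size_t)&_binary_guest_init_size;
	return 0;
#else
	*data = NULL;
	*size = 0;
	return -1;
#endif
}
```

Both kvm_setup_guest_init() and copy_init() could then share one check and
one error message, and no dummy symbols would be needed in the disabled
case.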

>   size = (size_t)&_binary_guest_init_size;
>   data = (char *)&_binary_guest_init_start;
>   snprintf(tmp, PATH_MAX, "%s%s/virt/init", kvm__get_dir(), rootfs);
> diff --git a/builtin-setup.c b/builtin-setup.c
> index 8b45c56..d77e5e0 100644
> --- a/builtin-setup.c
> +++ b/builtin-setup.c
> @@ -16,8 +16,13 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_HAS_LIBC
>  extern char _binary_guest_init_start;
>  extern char _binary_guest_init_size;
> +#else
> +static char _binary_guest_init_start=0;
> +static char _binary_guest_init_size=0;
> +#endif
>  
>  static const char *instance_name;
>  
> @@ -131,6 +136,8 @@ static int copy_init(const char *guestfs_name)
>   int fd, ret;
>   char *data;
>  
> + if (!_binary_guest_init_size)
> + die("Guest init not compiled");

Same as above.

Cheers,
Andre.

>   size = (size_t)&_binary_guest_init_size;
>   data = (char *)&_binary_guest_init_start;
>   snprintf(path, PATH_MAX, "%s%s/virt/init", kvm__get_dir(), guestfs_name);
> 


Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation

2015-09-08 Thread Andre Przywara
Hi Eric,

thanks for your answer.

On 08/09/15 09:43, Eric Auger wrote:
> Hi Andre,
> On 09/07/2015 01:25 PM, Andre Przywara wrote:
>> Hi,
>>
>> firstly: this text is really great, thanks for coming up with that.
>> See below for some information I got from tracing the host which I
>> cannot make sense of
>>
>>
>> On 04/09/15 20:40, Christoffer Dall wrote:
>>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>>> way we deal with them is not apparently easy to understand by reading
>>> various specs.
>>>
>>> Therefore, add a proper documentation file explaining the flow and
>>> rationale of the behavior of the vgic.
>>>
>>> Some of this text was contributed by Marc Zyngier and edited by me.
>>> Omissions and errors are all mine.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
>>> ---
>>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 
>>> +
>>>  1 file changed, 181 insertions(+)
>>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>
>>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>> new file mode 100644
>>> index 000..24b6f28
>>> --- /dev/null
>>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>> @@ -0,0 +1,181 @@
>>> +KVM/ARM VGIC Forwarded Physical Interrupts
>>> +==========================================
>>> +
>>> +The KVM/ARM code implements software support for the ARM Generic
>>> +Interrupt Controller's (GIC's) hardware support for virtualization by
>>> +allowing software to inject virtual interrupts to a VM, which the guest
>>> +OS sees as regular interrupts.  The code is famously known as the VGIC.
>>> +
>>> +Some of these virtual interrupts, however, correspond to physical
>>> +interrupts from real physical devices.  One example could be the
>>> +architected timer, which itself supports virtualization, and therefore
>>> +lets a guest OS program the hardware device directly to raise an
>>> +interrupt at some point in time.  When such an interrupt is raised, the
>>> +host OS initially handles the interrupt and must somehow signal this
>>> +event as a virtual interrupt to the guest.  Another example could be a
>>> +passthrough device, where the physical interrupts are initially handled
>>> +by the host, but the device driver for the device lives in the guest OS
>>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
>>> +the physical one to the guest OS.
>>> +
>>> +These virtual interrupts corresponding to a physical interrupt on the
>>> +host are called forwarded physical interrupts, but are also sometimes
>>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
>>> +
>>> +Forwarded physical interrupts are handled slightly differently compared
>>> +to virtual interrupts generated purely by a software emulated device.
>>> +
>>> +
>>> +The HW bit
>>> +----------
>>> +Virtual interrupts are signalled to the guest by programming the List
>>> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
>>> +with the virtual IRQ number and the state of the interrupt (Pending,
>>> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
>>> +interrupt, the LR state moves from Pending to Active, and finally to
>>> +inactive.
>>> +
>>> +The LRs include an extra bit, called the HW bit.  When this bit is set,
>>> +KVM must also program an additional field in the LR, the physical IRQ
>>> +number, to link the virtual with the physical IRQ.
>>> +
>>> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
>>> +bit, never both at the same time.
>>> +
>>> +Setting the HW bit causes the hardware to deactivate the physical
>>> +interrupt on the physical distributor when the guest deactivates the
>>> +corresponding virtual interrupt.
>>> +
>>> +
>>> +Forwarded Physical Interrupts Life Cycle
>>> +----------------------------------------
>>> +
>>> +The state of forwarded physical interrupts is managed in the following way:
>>> +
>>> +  - The physical interrupt is acked by the host, and becomes active on
>>> +the physical distributor (*).
>>> +  - KVM sets the LR.Pen

Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation

2015-09-07 Thread Andre Przywara
Hi,

firstly: this text is really great, thanks for coming up with that.
See below for some information I got from tracing the host which I
cannot make sense of


On 04/09/15 20:40, Christoffer Dall wrote:
> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
> way we deal with them is not apparently easy to understand by reading
> various specs.
> 
> Therefore, add a proper documentation file explaining the flow and
> rationale of the behavior of the vgic.
> 
> Some of this text was contributed by Marc Zyngier and edited by me.
> Omissions and errors are all mine.
> 
> Signed-off-by: Christoffer Dall 
> ---
>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 
> +
>  1 file changed, 181 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> 
> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt 
> b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> new file mode 100644
> index 0000000..24b6f28
> --- /dev/null
> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> @@ -0,0 +1,181 @@
> +KVM/ARM VGIC Forwarded Physical Interrupts
> +==========================================
> +
> +The KVM/ARM code implements software support for the ARM Generic
> +Interrupt Controller's (GIC's) hardware support for virtualization by
> +allowing software to inject virtual interrupts to a VM, which the guest
> +OS sees as regular interrupts.  The code is famously known as the VGIC.
> +
> +Some of these virtual interrupts, however, correspond to physical
> +interrupts from real physical devices.  One example could be the
> +architected timer, which itself supports virtualization, and therefore
> +lets a guest OS program the hardware device directly to raise an
> +interrupt at some point in time.  When such an interrupt is raised, the
> +host OS initially handles the interrupt and must somehow signal this
> +event as a virtual interrupt to the guest.  Another example could be a
> +passthrough device, where the physical interrupts are initially handled
> +by the host, but the device driver for the device lives in the guest OS
> +and KVM must therefore somehow inject a virtual interrupt on behalf of
> +the physical one to the guest OS.
> +
> +These virtual interrupts corresponding to a physical interrupt on the
> +host are called forwarded physical interrupts, but are also sometimes
> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
> +
> +Forwarded physical interrupts are handled slightly differently compared
> +to virtual interrupts generated purely by a software emulated device.
> +
> +
> +The HW bit
> +----------
> +Virtual interrupts are signalled to the guest by programming the List
> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
> +with the virtual IRQ number and the state of the interrupt (Pending,
> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
> +interrupt, the LR state moves from Pending to Active, and finally to
> +inactive.
> +
> +The LRs include an extra bit, called the HW bit.  When this bit is set,
> +KVM must also program an additional field in the LR, the physical IRQ
> +number, to link the virtual with the physical IRQ.
> +
> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
> +bit, never both at the same time.
> +
> +Setting the HW bit causes the hardware to deactivate the physical
> +interrupt on the physical distributor when the guest deactivates the
> +corresponding virtual interrupt.
> +
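
To make the rule above concrete, here is a minimal sketch of how an LR
value for a forwarded interrupt could be composed. It assumes the GICv2
GICH_LR layout (VirtualID in bits [9:0], PhysicalID in bits [19:10],
Pending/Active in bits 28/29, HW in bit 31); the macro and function names
are made up for illustration and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv2 GICH_LR field positions (illustrative names). */
#define LR_VIRTUALID_MASK	0x3ffu			/* bits [9:0]   */
#define LR_PHYSID_SHIFT		10			/* bits [19:10] */
#define LR_PHYSID_MASK		(0x3ffu << LR_PHYSID_SHIFT)
#define LR_STATE_PENDING	(1u << 28)
#define LR_STATE_ACTIVE		(1u << 29)
#define LR_HW			(1u << 31)

/* Hypothetical helper: build an LR value with the HW bit set. */
static uint32_t lr_encode_forwarded(uint32_t virq, uint32_t hwirq,
				    bool pending, bool active)
{
	uint32_t lr;

	/* With HW=1, set EITHER Pending OR Active, never both. */
	assert(pending != active);

	lr = LR_HW | (virq & LR_VIRTUALID_MASK);
	/* Link the virtual IRQ to the physical IRQ number. */
	lr |= (hwirq << LR_PHYSID_SHIFT) & LR_PHYSID_MASK;
	lr |= pending ? LR_STATE_PENDING : LR_STATE_ACTIVE;

	return lr;
}
```

Passing both states (or neither) trips the assertion, which mirrors the
EITHER/OR rule for HW=1 described in the text.
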
> +
> +Forwarded Physical Interrupts Life Cycle
> +
> +
> +The state of forwarded physical interrupts is managed in the following way:
> +
> +  - The physical interrupt is acked by the host, and becomes active on
> +the physical distributor (*).
> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
> +interface is going to present it to the guest.
> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
> +expected.
> +  - On guest EOI, the *physical distributor* active bit gets cleared,
> +but the LR.Active is left untouched (set).

I tried hard in the last week, but couldn't confirm this. Tracing shows
the following pattern over and over (case 1):
(This is the kvm/kvm.git:queue branch from last week, so including the
mapped timer IRQ code. Tests were done on Juno and Midway)

...
229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffc98a64
229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffc0001c63a0
229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffc0004089d8
229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8, ELRSR: 1, dist active: 0, log. active: 1


My hunch is that the following happens (please correct me if needed!):
First there is an unrelated trap 

Re: [PATCH] kvmtool Makefile: relax arm test

2015-09-04 Thread Andre Przywara
Hi Riku,

On 04/09/15 11:52, Riku Voipio wrote:
> On 4 September 2015 at 13:10, Andre Przywara <andre.przyw...@arm.com> wrote:
>> Hi Riku,
>>
>> On 03/09/15 12:20, riku.voi...@linaro.org wrote:
>>> From: Riku Voipio <riku.voi...@linaro.org>
>>>
>>> Currently Makefile accepts only armv7l.* When building kvmtool under 32bit
>>> personality on Aarch64 machines, uname -m reports "armv8l", so build fails.
>>> We expect doing 32bit arm builds in Aarch64 to become standard the same way
>>> people do i386 builds on x86_64 machines.
>>>
>>> Make the sed test a little more greedy so armv8l becomes acceptable.
>>>
>>> Signed-off-by: Riku Voipio <riku.voi...@linaro.org>
>>
>> The patch looks OK to me, I just wonder how you do the actual build
>> within the linux32 environment?
>> Do you have an arm cross compiler installed and set CROSS_COMPILE? Or is
>> there a magic compiler (driver) which uses uname -m as well?
>> And what would be the difference to setting ARCH=arm as well? Just
>> convenience?
> 
> It's just an arm32 chroot, with an native arm32 compiler. The chroot
> is on an arm64 machine since these tend to be much faster than arm32
> hardware.

Oh right, a chroot, didn't think about the obvious ;-)
Also it applies to 64-bit kernels with 32-bit root filesystems, I think.
So:

Acked-by: Andre Przywara <andre.przyw...@arm.com>

Cheers,
Andre.

> 
> It would of course be possible to set ARCH=arm, but that would mean
> some ifdefs in the Debian packaging, since the same build rule should
> work for all architectures.
> 
> Riku
> 


Re: [PATCH] kvmtool Makefile: relax arm test

2015-09-04 Thread Andre Przywara
Hi Riku,

On 03/09/15 12:20, riku.voi...@linaro.org wrote:
> From: Riku Voipio 
> 
> Currently Makefile accepts only armv7l.* When building kvmtool under 32bit
> personality on Aarch64 machines, uname -m reports "armv8l", so build fails.
> We expect doing 32bit arm builds in Aarch64 to become standard the same way
> people do i386 builds on x86_64 machines.
> 
> Make the sed test a little more greedy so armv8l becomes acceptable.
> 
> Signed-off-by: Riku Voipio 

The patch looks OK to me, I just wonder how you do the actual build
within the linux32 environment?
Do you have an arm cross compiler installed and set CROSS_COMPILE? Or is
there a magic compiler (driver) which uses uname -m as well?
And what would be the difference to setting ARCH=arm as well? Just
convenience?

Cheers,
Andre.

> ---
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index 1534e6f..7b17d52 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -103,7 +103,7 @@ OBJS  += hw/i8042.o
>  
>  # Translate uname -m into ARCH string
>  ARCH ?= $(shell uname -m | sed -e s/i.86/i386/ -e s/ppc.*/powerpc/ \
> -   -e s/armv7.*/arm/ -e s/aarch64.*/arm64/ -e s/mips64/mips/)
> +   -e s/armv.*/arm/ -e s/aarch64.*/arm64/ -e s/mips64/mips/)
>  
>  ifeq ($(ARCH),i386)
>   ARCH := x86
> 


Re: [PATCH v2 1/5] KVM: arm/arm64: Refactor vGIC attributes handling code

2015-09-04 Thread Andre Przywara
Hi Pavel,

On 02/09/15 09:09, Pavel Fedin wrote:
> Separate all implementation-independent code in vgic_attr_regs_access()
> and move it to vgic.c. This will allow to reuse this code for vGICv3
> implementation.
> 
> Signed-off-by: Pavel Fedin 
> ---
>  virt/kvm/arm/vgic-v2-emul.c | 126 
> +---
>  virt/kvm/arm/vgic.c |  77 +++
>  virt/kvm/arm/vgic.h |   4 ++
>  3 files changed, 107 insertions(+), 100 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c
> index 1390797..557c5a6 100644
> --- a/virt/kvm/arm/vgic-v2-emul.c
> +++ b/virt/kvm/arm/vgic-v2-emul.c
> @@ -661,97 +661,38 @@ static const struct vgic_io_range vgic_cpu_ranges[] = {
>   },
>  };
>  
> -static int vgic_attr_regs_access(struct kvm_device *dev,
> -  struct kvm_device_attr *attr,
> -  u32 *reg, bool is_write)
> +static int vgic_v2_attr_regs_access(struct kvm_device *dev,
> + struct kvm_device_attr *attr,
> + __le32 *data, bool is_write)
>  {
> - const struct vgic_io_range *r = NULL, *ranges;
> + const struct vgic_io_range *ranges;
>   phys_addr_t offset;
> - int ret, cpuid, c;
> - struct kvm_vcpu *vcpu, *tmp_vcpu;
> - struct vgic_dist *vgic;
> + int cpuid;
> + struct vgic_dist *vgic = &dev->kvm->arch.vgic;
>   struct kvm_exit_mmio mmio;
> - u32 data;
>  
>   offset = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
>   cpuid = (attr->attr & KVM_DEV_ARM_VGIC_CPUID_MASK) >>
>   KVM_DEV_ARM_VGIC_CPUID_SHIFT;
>  
> - mutex_lock(&dev->kvm->lock);
> -
> - ret = vgic_init(dev->kvm);
> - if (ret)
> - goto out;
> -
> - if (cpuid >= atomic_read(&dev->kvm->online_vcpus)) {
> - ret = -EINVAL;
> - goto out;
> - }
> -
> - vcpu = kvm_get_vcpu(dev->kvm, cpuid);
> - vgic = &dev->kvm->arch.vgic;
> -
> - mmio.len = 4;
> - mmio.is_write = is_write;
> - mmio.data = &data;
> - if (is_write)
> - mmio_data_write(&mmio, ~0, *reg);
>   switch (attr->group) {
>   case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
> - mmio.phys_addr = vgic->vgic_dist_base + offset;
> + mmio.phys_addr = vgic->vgic_dist_base;
>   ranges = vgic_dist_ranges;
>   break;
>   case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
> - mmio.phys_addr = vgic->vgic_cpu_base + offset;
> + mmio.phys_addr = vgic->vgic_cpu_base;
>   ranges = vgic_cpu_ranges;
>   break;
>   default:
> - BUG();
> + return -ENXIO;
>   }
> - r = vgic_find_range(ranges, 4, offset);
>  
> - if (unlikely(!r || !r->handle_mmio)) {
> - ret = -ENXIO;
> - goto out;
> - }
> -
> -
> - spin_lock(&vgic->lock);
> -
> - /*
> -  * Ensure that no other VCPU is running by checking the vcpu->cpu
> -  * field.  If no other VPCUs are running we can safely access the VGIC
> -  * state, because even if another VPU is run after this point, that
> -  * VCPU will not touch the vgic state, because it will block on
> -  * getting the vgic->lock in kvm_vgic_sync_hwstate().
> -  */
> - kvm_for_each_vcpu(c, tmp_vcpu, dev->kvm) {
> - if (unlikely(tmp_vcpu->cpu != -1)) {
> - ret = -EBUSY;
> - goto out_vgic_unlock;
> - }
> - }
> -
> - /*
> -  * Move all pending IRQs from the LRs on all VCPUs so the pending
> -  * state can be properly represented in the register state accessible
> -  * through this API.
> -  */
> - kvm_for_each_vcpu(c, tmp_vcpu, dev->kvm)
> - vgic_unqueue_irqs(tmp_vcpu);
> -
> - offset -= r->base;
> - r->handle_mmio(vcpu, &mmio, offset);
> -
> - if (!is_write)
> - *reg = mmio_data_read(&mmio, ~0);
> + mmio.is_write = is_write;
> + mmio.data = data;
>  
> - ret = 0;
> -out_vgic_unlock:
> - spin_unlock(&vgic->lock);
> -out:
> - mutex_unlock(&dev->kvm->lock);
> - return ret;
> + return vgic_attr_regs_access(dev, ranges, &mmio, offset, sizeof(data),
> +  cpuid);

Isn't the len parameter redundant here? I see that you don't initialize
mmio.len (which is a bit scary, btw), so can't you just use that field?

>  }
>  
>  static int vgic_v2_create(struct kvm_device *dev, u32 type)
> @@ -767,53 +708,38 @@ static void vgic_v2_destroy(struct kvm_device *dev)
>  static int vgic_v2_set_attr(struct kvm_device *dev,
>   struct kvm_device_attr *attr)
>  {
> + u32 __user *uaddr = (u32 __user *)(long)attr->addr;
> + u32 reg;
> + __le32 data;

That (and other parts of this patch) sneak in some endianness handling,
which I'd like to be mentioned in the commit message, but preferably be
in a separate patch. The commit message 

Re: [PATCH v3 2/5] KVM: arm64: Implement vGICv3 distributor and redistributor access from userspace

2015-09-04 Thread Andre Przywara
Hi Pavel,

On 04/09/15 13:40, Pavel Fedin wrote:
> The access is done similar to vGICv2, using KVM_DEV_ARM_VGIC_GRP_DIST_REGS
> and KVM_DEV_ARM_VGIC_GRP_REDIST_REGS with KVM_SET_DEVICE_ATTR and
> KVM_GET_DEVICE_ATTR ioctls. Since GICv3 can handle large number of CPUs,
> KVM_DEV_ARM_VGIC_CPUID_MASK has been extended to 20 bits. This is enough
> for 1048576 CPUs.

I guess the 20 bits come from 8 bits each for Aff2 and Aff1 plus 4 bits for
Aff0? If so, please mention this. But I am not sure we should limit the
cpu index in this public API to something as low as 20 bits. Since this
mask is GIC specific, we could push the size bit into offset and use the
full upper 32 bits for cpuid, or at least 28 bits plus 4 reserved.

> 
> Some registers are 64-bit wide according to the specification.
> KVM_DEV_ARM_VGIC_64BIT flag is introduced, allowing to perform full 64-bit
> accesses.
> 
> Signed-off-by: Pavel Fedin 
> ---
>  Documentation/virtual/kvm/devices/arm-vgic.txt | 35 --
>  arch/arm64/include/uapi/asm/kvm.h  |  4 +-
>  virt/kvm/arm/vgic-v3-emul.c| 95 
> ++
>  virt/kvm/arm/vgic.c|  1 +
>  4 files changed, 116 insertions(+), 19 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt 
> b/Documentation/virtual/kvm/devices/arm-vgic.txt
> index 3fb9054..03f640f 100644
> --- a/Documentation/virtual/kvm/devices/arm-vgic.txt
> +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
> @@ -43,10 +43,13 @@ Groups:
>KVM_DEV_ARM_VGIC_GRP_DIST_REGS
>Attributes:
>  The attr field of kvm_device_attr encodes two values:
> -    bits:     | 63  ....  40 | 39 ..  32  |  31  ....  0 |
> -    values:   |   reserved   |   cpu id   |    offset    |
> +    bits:     |  63  | 62 .. 52 | 51 ..  32  |  31  ....  0 |
> +    values:   | size | reserved |   cpu id   |    offset    |
>  
>  All distributor regs are (rw, 32-bit)
> +For GICv3 some regs are also (rw, 64-bit) according to the specification.

That sounds contradictory to me. What about:
All registers can be accessed by using 32-bit accesses, some registers
also by 64-bit reads/writes according to the specification.

> +In order to perform full 64-bit access 'size' bit should be set to 1.
> +KVM_DEV_ARM_VGIC_64BIT flag value is provided for this purpose.
>  
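
For reference, a small sketch of how userland might pack and unpack the
attr value with the layout proposed in this hunk (size in bit 63, cpu id
in bits 51..32, offset in bits 31..0). The ATTR_* names and the helpers
are hypothetical and are not the uapi definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants following the quoted bit layout. */
#define ATTR_OFFSET_MASK	0xffffffffULL			/* bits 31..0  */
#define ATTR_CPUID_SHIFT	32
#define ATTR_CPUID_MASK		(0xfffffULL << ATTR_CPUID_SHIFT) /* bits 51..32 */
#define ATTR_64BIT		(1ULL << 63)			/* 'size' bit  */

/* Compose an attr value; cpuid bits above bit 19 are silently dropped. */
static uint64_t attr_pack(uint32_t cpuid, uint32_t offset, bool is_64bit)
{
	uint64_t attr = ((uint64_t)cpuid << ATTR_CPUID_SHIFT) & ATTR_CPUID_MASK;

	attr |= offset;
	if (is_64bit)
		attr |= ATTR_64BIT;

	return attr;
}

/* Extract the cpu id field, as the kernel side would with mask and shift. */
static uint32_t attr_cpuid(uint64_t attr)
{
	return (uint32_t)((attr & ATTR_CPUID_MASK) >> ATTR_CPUID_SHIFT);
}

/* Extract the register offset field. */
static uint32_t attr_offset(uint64_t attr)
{
	return (uint32_t)(attr & ATTR_OFFSET_MASK);
}
```
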
>  The offset is relative to the "Distributor base address" as defined in 
> the
>  GICv2 specs.  Getting or setting such a register has the same effect as
> @@ -54,9 +57,33 @@ Groups:
>  specified with cpu id field.  Note that most distributor fields are not
>  banked, but return the same value regardless of the cpu id used to access
>  the register.
> +
> +  Limitations:
> +- Priorities are not implemented, and registers are RAZ/WI
> +  Errors:
> +-ENODEV: Getting or setting this register is not yet supported

Isn't that actually -ENXIO in the code? I see that this is just copy &
paste, but it should be fixed in either case.

> +-EBUSY: One or more VCPUs are running
> +
> +  KVM_DEV_ARM_VGIC_GRP_REDIST_REGS
> +  Attributes:
> +The attr field of kvm_device_attr encodes two values:
> +    bits:     |  63  | 62 .. 52 | 51 ..  32  |  31  ....  0 |
> +    values:   | size | reserved |   cpu id   |    offset    |
> +
> +All redistributor regs are (rw, 32-bit)
> +For GICv3 some regs are also (rw, 64-bit) according to the specification.
> +In order to perform full 64-bit access 'size' bit should be set to 1.
> +KVM_DEV_ARM_VGIC_64BIT flag value is provided for this purpose.
> +
> +The offset is relative to the "Redistributor base address" as defined in
> +the GICv3 specs.  Getting or setting such a register has the same effect 
> as
> +reading or writing the register on the actual hardware from the cpu
> +specified with cpu id field.  Note that most distributor fields are not
> +banked, but return the same value regardless of the cpu id used to access
> +the register.
> +
>Limitations:
>  - Priorities are not implemented, and registers are RAZ/WI
> -- Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2.
>Errors:
>  -ENODEV: Getting or setting this register is not yet supported
>  -EBUSY: One or more VCPUs are running
> @@ -64,7 +91,7 @@ Groups:
>KVM_DEV_ARM_VGIC_GRP_CPU_REGS
>Attributes:
>  The attr field of kvm_device_attr encodes two values:
> -    bits:     | 63  ....  40 | 39 ..  32  |  31  ....  0 |
> +    bits:     | 63  ....  52 | 51 ..  32  |  31  ....  0 |
>      values:   |   reserved   |   cpu id   |    offset    |
>  
>  All CPU interface regs are (rw, 32-bit)
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 0cd7b59..249954f 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -196,13 +196,15 @@ struct kvm_arch_memory_slot {
>  #define 

Re: [PATCH v3 3/5] KVM: arm64: Refactor system register handlers

2015-09-04 Thread Andre Przywara
Hi Pavel,

On 04/09/15 13:40, Pavel Fedin wrote:
> Replace Rt with data pointer in struct sys_reg_params. This will allow to
> reuse system register handling code in implementation of vGICv3 CPU
> interface access API. Additionally, got rid of "massive hack"
> in kvm_handle_cp_64().
> 
> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm64/kvm/sys_regs.c| 61 
> +---
>  arch/arm64/kvm/sys_regs.h|  4 +--
>  arch/arm64/kvm/sys_regs_generic_v8.c |  2 +-
>  3 files changed, 32 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index b41607d..fe6b517 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -102,7 +102,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  
>   BUG_ON(!p->is_write);
>  
> - val = *vcpu_reg(vcpu, p->Rt);
> + val = *p->val;
>   if (!p->is_aarch32) {
>   vcpu_sys_reg(vcpu, r->reg) = val;
>   } else {
> @@ -125,13 +125,10 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
>  const struct sys_reg_params *p,
>  const struct sys_reg_desc *r)
>  {
> - u64 val;
> -
>   if (!p->is_write)
>   return read_from_write_only(vcpu, p);
>  
> - val = *vcpu_reg(vcpu, p->Rt);
> - vgic_v3_dispatch_sgi(vcpu, val);
> + vgic_v3_dispatch_sgi(vcpu, *p->val);
>  
>   return true;
>  }
> @@ -153,7 +150,7 @@ static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
>   if (p->is_write) {
>   return ignore_write(vcpu, p);
>   } else {
> - *vcpu_reg(vcpu, p->Rt) = (1 << 3);
> + *p->val = (1 << 3);
>   return true;
>   }
>  }
> @@ -167,7 +164,7 @@ static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu,
>   } else {
>   u32 val;
>   asm volatile("mrs %0, dbgauthstatus_el1" : "=r" (val));
> - *vcpu_reg(vcpu, p->Rt) = val;
> + *p->val = val;
>   return true;
>   }
>  }
> @@ -204,13 +201,13 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>   const struct sys_reg_desc *r)
>  {
>   if (p->is_write) {
> - vcpu_sys_reg(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
> + vcpu_sys_reg(vcpu, r->reg) = *p->val;
>   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>   } else {
> - *vcpu_reg(vcpu, p->Rt) = vcpu_sys_reg(vcpu, r->reg);
> + *p->val = vcpu_sys_reg(vcpu, r->reg);
>   }
>  
> - trace_trap_reg(__func__, r->reg, p->is_write, *vcpu_reg(vcpu, p->Rt));
> + trace_trap_reg(__func__, r->reg, p->is_write, *p->val);
>  
>   return true;
>  }
> @@ -228,7 +225,7 @@ static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
> const struct sys_reg_params *p,
> u64 *dbg_reg)
>  {
> - u64 val = *vcpu_reg(vcpu, p->Rt);
> + u64 val = *p->val;
>  
>   if (p->is_32bit) {
>   val &= 0xffffffffUL;
> @@ -248,7 +245,7 @@ static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
>   if (p->is_32bit)
>   val &= 0xffffffffUL;
>  
> - *vcpu_reg(vcpu, p->Rt) = val;
> + *p->val = val;
>  }
>  
>  static inline bool trap_bvr(struct kvm_vcpu *vcpu,
> @@ -704,10 +701,10 @@ static bool trap_dbgidr(struct kvm_vcpu *vcpu,
>   u64 pfr = read_cpuid(ID_AA64PFR0_EL1);
>   u32 el3 = !!((pfr >> 12) & 0xf);
>  
> - *vcpu_reg(vcpu, p->Rt) = ((((dfr >> 20) & 0xf) << 28) |
> -   (((dfr >> 12) & 0xf) << 24) |
> -   (((dfr >> 28) & 0xf) << 20) |
> -   (6 << 16) | (el3 << 14) | (el3 << 12));
> + *p->val = ((((dfr >> 20) & 0xf) << 28) |
> +(((dfr >> 12) & 0xf) << 24) |
> +(((dfr >> 28) & 0xf) << 20) |
> +(6 << 16) | (el3 << 14) | (el3 << 12));
>   return true;
>   }
>  }
> @@ -717,10 +714,10 @@ static bool trap_debug32(struct kvm_vcpu *vcpu,
>const struct sys_reg_desc *r)
>  {
>   if (p->is_write) {
> - vcpu_cp14(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
> + vcpu_cp14(vcpu, r->reg) = *p->val;
>   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>   } else {
> - *vcpu_reg(vcpu, p->Rt) = vcpu_cp14(vcpu, r->reg);
> + *p->val = vcpu_cp14(vcpu, r->reg);
>   }
>  
>   return true;
> @@ -747,12 +744,12 @@ static inline bool trap_xvr(struct kvm_vcpu *vcpu,
>   u64 val = *dbg_reg;
>  
>   val &= 0xffffffffUL;
> - val |= *vcpu_reg(vcpu, p->Rt) << 32;
> + val |= *p->val << 32;
>   *dbg_reg = val;
>  
>   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>   } else {
> - 

Re: [PATCH v3 4/5] KVM: arm64: Introduce find_reg_by_id()

2015-09-04 Thread Andre Przywara
On 04/09/15 13:40, Pavel Fedin wrote:
> In order to implement vGICv3 CPU interface access, we will need to perform
> table lookup of system registers. We would need both index_to_params() and
> find_reg() exported for that purpose, but instead we export a single
> function which combines them both.
> 
> Signed-off-by: Pavel Fedin <p.fe...@samsung.com>

Reviewed-by: Andre Przywara <andre.przyw...@arm.com>

> ---
>  arch/arm64/kvm/sys_regs.c | 22 +++---
>  arch/arm64/kvm/sys_regs.h |  4 
>  2 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index fe6b517..21403fa 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1283,6 +1283,17 @@ static bool index_to_params(u64 id, struct 
> sys_reg_params *params)
>   }
>  }
>  
> +const struct sys_reg_desc *find_reg_by_id(u64 id,
> +   struct sys_reg_params *params,
> +   const struct sys_reg_desc table[],
> +   unsigned int num)
> +{
> + if (!index_to_params(id, params))
> + return NULL;
> +
> + return find_reg(params, table, num);
> +}
> +
>  /* Decode an index value, and find the sys_reg_desc entry. */
>  static const struct sys_reg_desc *index_to_sys_reg_desc(struct kvm_vcpu 
> *vcpu,
>   u64 id)
> @@ -1410,10 +1421,8 @@ static int get_invariant_sys_reg(u64 id, void __user 
> *uaddr)
>   struct sys_reg_params params;
>   const struct sys_reg_desc *r;
>  
> - if (!index_to_params(id, &params))
> - return -ENOENT;
> -
> - r = find_reg(&params, invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs));
> + r = find_reg_by_id(id, &params, invariant_sys_regs,
> +ARRAY_SIZE(invariant_sys_regs));
>   if (!r)
>   return -ENOENT;
>  
> @@ -1427,9 +1436,8 @@ static int set_invariant_sys_reg(u64 id, void __user 
> *uaddr)
>   int err;
>   u64 val = 0; /* Make sure high bits are 0 for 32-bit regs */
>  
> - if (!index_to_params(id, &params))
> - return -ENOENT;
> - r = find_reg(&params, invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs));
> + r = find_reg_by_id(id, &params, invariant_sys_regs,
> +ARRAY_SIZE(invariant_sys_regs));
>   if (!r)
>   return -ENOENT;
>  
> diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
> index 3267518..0646108 100644
> --- a/arch/arm64/kvm/sys_regs.h
> +++ b/arch/arm64/kvm/sys_regs.h
> @@ -136,6 +136,10 @@ static inline int cmp_sys_reg(const struct sys_reg_desc 
> *i1,
>   return i1->Op2 - i2->Op2;
>  }
>  
> +const struct sys_reg_desc *find_reg_by_id(u64 id,
> +   struct sys_reg_params *params,
> +   const struct sys_reg_desc table[],
> +   unsigned int num);
>  
>  #define Op0(_x)  .Op0 = _x
>  #define Op1(_x)  .Op1 = _x
> 


Re: [PATCH v3 5/5] KVM: arm64: Implement vGICv3 CPU interface access

2015-09-04 Thread Andre Przywara
Hi Pavel,

On 04/09/15 13:40, Pavel Fedin wrote:
> The access is done similar to GICv2, using KVM_DEV_ARM_VGIC_GRP_CPU_REGS
> group, however attribute ID encodes corresponding system register. Access
> size is always 64 bits.

Why is this? Actually all registers in the CPU interface (except the
write-only SGI registers) are 32 bits wide, and the pending 32-bit GICv3
support series[1] exploits this by using MRC/MCR accesses.
The only thing that is purely 64-bit is the MRS/MSR _instruction_ in
AArch64, which always takes an X register.
So can you model the register size according to the spec and allow
32-bit accesses from userland?

> Since CPU interface state actually affects only a single vCPU, no vGIC
> locking is done. Just made sure that the vCPU is not running.
> 
> Signed-off-by: Pavel Fedin 
> ---
>  Documentation/virtual/kvm/devices/arm-vgic.txt |  38 +++-
>  arch/arm64/include/uapi/asm/kvm.h  |   7 +
>  include/linux/irqchip/arm-gic-v3.h |  18 +-
>  virt/kvm/arm/vgic-v3-emul.c| 244 
> +
>  4 files changed, 303 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt 
> b/Documentation/virtual/kvm/devices/arm-vgic.txt
> index 03f640f..518b634 100644
> --- a/Documentation/virtual/kvm/devices/arm-vgic.txt
> +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
> @@ -88,7 +88,7 @@ Groups:
>  -ENODEV: Getting or setting this register is not yet supported
>  -EBUSY: One or more VCPUs are running
> 
> -  KVM_DEV_ARM_VGIC_GRP_CPU_REGS
> +  KVM_DEV_ARM_VGIC_GRP_CPU_REGS (vGICv2)
>Attributes:
>  The attr field of kvm_device_attr encodes two values:
>      bits:     | 63  ....  52 | 51 ..  32  |  31  ....  0 |
> @@ -116,11 +116,45 @@ Groups:
> 
>Limitations:
>  - Priorities are not implemented, and registers are RAZ/WI
> -- Currently only implemented for KVM_DEV_TYPE_ARM_VGIC_V2.
>Errors:
>  -ENODEV: Getting or setting this register is not yet supported
>  -EBUSY: One or more VCPUs are running
> 
> +  KVM_DEV_ARM_VGIC_GRP_CPU_REGS (vGICv3)
> +  Attributes:
> +The attr field of kvm_device_attr encodes the following values:
> +bits:   | 63 .. 56 | 55 .. 48 | 47 ... 40 | 39 .. 32 | 31 .. 0 |
> +values: |   arch   |   size   | reserved  |  cpu id  |  reg id |
> +
> +All CPU interface regs are (rw, 64-bit). The only supported size value is
> +KVM_REG_SIZE_U64.
> +
> +Arch, size and reg id fields actually encode system register to be
> +accessed. Normally these values are obtained using  ARM64_SYS_REG() 
> macro.
> +Getting or setting such a register has the same effect as reading or
> +writing the register on the actual hardware.
> +
> +The Active Priorities Registers AP0Rn and AP1Rn are implementation 
> defined,
> +so we set a fixed format for our implementation that fits with the model 
> of
> +a "GICv3 implementation without the security extensions" which we present
> +to the guest. This interface always exposes four register APR[0-3]
> +describing the maximum possible 128 preemption levels. The semantics of 
> the
> +register indicates if any interrupts in a given preemption level are in 
> the
> +active state by setting the corresponding bit.
> +
> +Thus, preemption level X has one or more active interrupts if and only 
> if:
> +
> +  APRn[X mod 32] == 0b1,  where n = X / 32
> +
> +Bits for undefined preemption levels are RAZ/WI.
> +
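
The formula above amounts to a simple bit test. A minimal sketch, assuming
the four APR values have already been read into an array; the function
name is hypothetical and not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_APR_REGS		4
#define NR_PREEMPTION_LEVELS	(NR_APR_REGS * 32)	/* 128 levels */

/*
 * Return true if preemption level X has one or more active interrupts,
 * i.e. APRn[X mod 32] == 1 with n = X / 32.
 */
static bool preemption_level_active(const uint32_t apr[NR_APR_REGS],
				    unsigned int level)
{
	/* Levels beyond the maximum are treated as never active here. */
	if (level >= NR_PREEMPTION_LEVELS)
		return false;

	return (apr[level / 32] >> (level % 32)) & 1u;
}
```

For example, level 37 lives in APR1 at bit 5, since 37 / 32 = 1 and
37 mod 32 = 5.
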
> +  Limitations:
> +- Priorities are not implemented, and registers are RAZ/WI
> +  Errors:
> +-ENODEV: Getting or setting this register is not yet supported

The code uses -ENXIO.

> +-EBUSY: One or more VCPUs are running
> +
> +
>KVM_DEV_ARM_VGIC_GRP_NR_IRQS
>Attributes:
>  A value describing the number of interrupts (SGI, PPI and SPI) for
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 249954f..7d37ccd 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -201,6 +201,13 @@ struct kvm_arch_memory_slot {
>  #define   KVM_DEV_ARM_VGIC_CPUID_MASK  (0xfffffULL << KVM_DEV_ARM_VGIC_CPUID_SHIFT)
>  #define   KVM_DEV_ARM_VGIC_OFFSET_SHIFT 0
>  #define   KVM_DEV_ARM_VGIC_OFFSET_MASK (0xffffffffULL << KVM_DEV_ARM_VGIC_OFFSET_SHIFT)
> +#define   KVM_DEV_ARM_VGIC_REG_MASK(KVM_REG_SIZE_MASK | \
> +KVM_REG_ARM64_SYSREG_OP0_MASK | \
> +KVM_REG_ARM64_SYSREG_OP1_MASK | \
> +KVM_REG_ARM64_SYSREG_CRN_MASK | \
> +KVM_REG_ARM64_SYSREG_CRM_MASK | \
> +KVM_REG_ARM64_SYSREG_OP2_MASK)
> +
>  #define KVM_DEV_ARM_VGIC_GRP_NR_IRQS   3
>  #define KVM_DEV_ARM_VGIC_GRP_CTRL  4
>  #define   KVM_DEV_ARM_VGIC_CTRL_INIT   0
> diff --git 

Re: [PATCH v2 1/5] KVM: arm/arm64: Refactor vGIC attributes handling code

2015-09-04 Thread Andre Przywara
Hi,

On 04/09/15 16:11, Pavel Fedin wrote:
>  Hello!
> 
>> Isn't the len parameter redundant here? I see that you don't initialize
>> mmio.len (which is a bit scary, btw), so can't you just use that field?
> 
>  This was because of split below. I did not know about call_range_handler(),
> and now i will redo this.
> 
>> That (and other parts of this patch) sneak in some endianness handling,
>> which I'd like to be mentioned in the commit message, but preferably be
>> in a separate patch. The commit message here talks only about refactoring.
> 
>  These come from mmio_data_read() and mmio_data_write() in original
> vgic_attr_regs_access(). These inlines cannot be used with arbitrary data
> length, so i opened them up (they contain endianness conversion plus masking
> which isn't used in our case) and moved endianness conversion to load/store
> part.
>  If i make this a separate patch, it will be two lines patch. Does it worth
> that? In the next respin i'd better add this explanation to commit message.
> Would it be OK?

From a review (and later bisecting) point of view separate patches would
be better. Ideally the refactoring does not introduce any change except
code moving around.

Cheers,
Andre.


Re: [PATCH 2/2] arm/arm64: KVM: Improve kvm_exit tracepoint

2015-09-03 Thread Andre Przywara
Hi Christoffer,

On 30/08/15 14:55, Christoffer Dall wrote:
> The ARM architecture only saves the exit class to the HSR (ESR_EL2 for
> arm64) on synchronous exceptions, not on asynchronous exceptions like an
> IRQ.  However, we only report the exception class on kvm_exit, which is
> confusing because an IRQ looks like it exited at some PC with the same
> reason as the previous exit.  Add a lookup table for the exception index
> and prepend the kvm_exit tracepoint text with the exception type to
> clarify this situation.
> 
> Also resolve the exception class (EC) to a human-friendly text version
> so the trace output becomes immediately usable for debugging this code.

That patch just proved very useful for me, especially since the encoding
of .EC is different between ARM & ARM64, so thanks for that!

But HSR.EC is still reported for asynchronous exceptions, which is
confusing, so I wonder if it would be worth having two tracepoints: one
reporting just the PC for async exits, and one reporting .EC and PC for
traps? I guess it would be neater to have this differentiation in the
TRACE_EVENT macro invocation, but I reckon it is not powerful enough?

Also this patch is independent of both the first one and the reworked
arch timer series. I see your intention of pushing your arch timer
series through ;-), but I suggest making this patch separate and adding
1/2 to the arch timer series.

Cheers,
Andre.

> Signed-off-by: Christoffer Dall 
> ---
>  arch/arm/include/asm/kvm_arm.h   | 20 
>  arch/arm/kvm/arm.c   |  2 +-
>  arch/arm/kvm/trace.h | 10 +++---
>  arch/arm64/include/asm/kvm_arm.h | 16 
>  4 files changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index d995821..dc641dd 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -218,4 +218,24 @@
>  #define HSR_DABT_CM  (1U << 8)
>  #define HSR_DABT_EA  (1U << 9)
>  
> +#define kvm_arm_exception_type   \
> + {0, "RESET" },  \
> + {1, "UNDEFINED" },  \
> + {2, "SOFTWARE" },   \
> + {3, "PREF_ABORT" }, \
> + {4, "DATA_ABORT" }, \
> + {5, "IRQ" },\
> + {6, "FIQ" },\
> + {7, "HVC" }
> +
> +#define HSRECN(x) { HSR_EC_##x, #x }
> +
> +#define kvm_arm_exception_class \
> + HSRECN(UNKNOWN), HSRECN(WFI), HSRECN(CP15_32), HSRECN(CP15_64), \
> + HSRECN(CP14_MR), HSRECN(CP14_LS), HSRECN(CP_0_13), HSRECN(CP10_ID), \
> + HSRECN(JAZELLE), HSRECN(BXJ), HSRECN(CP14_64), HSRECN(SVC_HYP), \
> + HSRECN(HVC), HSRECN(SMC), HSRECN(IABT), HSRECN(IABT_HYP), \
> + HSRECN(DABT), HSRECN(DABT_HYP)
> +
> +
>  #endif /* __ARM_KVM_ARM_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 102a4aa..ffec2f2 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -606,7 +606,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>* guest time.
>*/
>   kvm_guest_exit();
> - trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> + trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), 
> *vcpu_pc(vcpu));
>  
>   /*
>* We must sync the timer state before the vgic state so that
> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index 0ec3539..c25a885 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -25,21 +25,25 @@ TRACE_EVENT(kvm_entry,
>  );
>  
>  TRACE_EVENT(kvm_exit,
> - TP_PROTO(unsigned int exit_reason, unsigned long vcpu_pc),
> - TP_ARGS(exit_reason, vcpu_pc),
> + TP_PROTO(int idx, unsigned int exit_reason, unsigned long vcpu_pc),
> + TP_ARGS(idx, exit_reason, vcpu_pc),
>  
>   TP_STRUCT__entry(
> + __field(int,idx )
>   __field(unsigned int,   exit_reason )
>   __field(unsigned long,  vcpu_pc )
>   ),
>  
>   TP_fast_assign(
> + __entry->idx= idx;
>   __entry->exit_reason= exit_reason;
>   __entry->vcpu_pc= vcpu_pc;
>   ),
>  
> - TP_printk("HSR_EC: 0x%04x, PC: 0x%08lx",
> + TP_printk("%s: HSR_EC: 0x%04x (%s), PC: 0x%08lx",
> +   __print_symbolic(__entry->idx, kvm_arm_exception_type),
> __entry->exit_reason,
> +   __print_symbolic(__entry->exit_reason, 
> kvm_arm_exception_class),
> __entry->vcpu_pc)
>  );
>  
> diff --git a/arch/arm64/include/asm/kvm_arm.h 
> b/arch/arm64/include/asm/kvm_arm.h
> index 7605e09..ffb86bf 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -197,4 +197,20 @@
>  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
>  #define HPFAR_MASK   (~UL(0xf))
>  
> +#define 

Re: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-09-02 Thread Andre Przywara
On 31/08/15 09:42, Eric Auger wrote:
> On 08/24/2015 06:33 PM, Andre Przywara wrote:

Salut Eric,

...

>>>> @@ -1126,9 +1124,9 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu 
>>>> *vcpu, int irq,
>>>>   */
>>>>  bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
>>>>  {
>>>> -   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>>> struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>>>> -   struct vgic_lr vlr;
>>>> +   u64 elrsr = vgic_get_elrsr(vcpu);
>>>> +   unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
>>>> int lr;
>>>>
>>>> /* Sanitize the input... */
>>>> @@ -1138,42 +1136,20 @@ bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 
>>>> sgi_source_id, int irq)
>>>>
>>>> kvm_debug("Queue IRQ%d\n", irq);
>>>>
>>>> -   lr = vgic_cpu->vgic_irq_lr_map[irq];
>>>> -
>>>> -   /* Do we have an active interrupt for the same CPUID? */
>>>> -   if (lr != LR_EMPTY) {
>>>> -   vlr = vgic_get_lr(vcpu, lr);
>>>> -   if (vlr.source == sgi_source_id) {
>>>> -   kvm_debug("LR%d piggyback for IRQ%d\n", lr, vlr.irq);
>>>> -   BUG_ON(!test_bit(lr, vgic_cpu->lr_used));
>>>> -   vgic_queue_irq_to_lr(vcpu, irq, lr, vlr);
>>>> -   return true;
>>>> -   }
>>>> -   }
>>>> +   lr = find_first_bit(elrsr_ptr, vgic->nr_lr);
>>>>
>>>> -   /* Try to use another LR for this interrupt */
>>>> -   lr = find_first_zero_bit((unsigned long *)vgic_cpu->lr_used,
>>>> -  vgic->nr_lr);
>>>> if (lr >= vgic->nr_lr)
>>>> return false;
>>>>
>>>> kvm_debug("LR%d allocated for IRQ%d %x\n", lr, irq, sgi_source_id);
>>>> -   vgic_cpu->vgic_irq_lr_map[irq] = lr;
>>>> -   set_bit(lr, vgic_cpu->lr_used);
>>>>
>>>> -   vlr.irq = irq;
>>>> -   vlr.source = sgi_source_id;
>>>> -   vlr.state = 0;
>>>> -   vgic_queue_irq_to_lr(vcpu, irq, lr, vlr);
>>>> +   vgic_queue_irq_to_lr(vcpu, irq, lr, sgi_source_id);
>>>>
>>>> return true;
>>>>  }
>>>>
>>>>  static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq)
>>>>  {
>>>> -   if (!vgic_can_sample_irq(vcpu, irq))
>>>> -   return true; /* level interrupt, already queued */
>>>> -
>>> I think that change needs to be introduced in a separate patch as the
>>> other one mentioned above and justified since it affects the state machine.
>>
>> But this change is dependent on this patch: it will not work without
>> this patch and this patch will not work without that change.
>> So the idea is that on returning from the guest we now harvest all
>> _used_ LRs by copying their state back into the distributor. The
>> previous behaviour was to just check _unused_ LRs for completed IRQs.
>> So now all IRQs need to be re-inserted into the LRs before the next
>> guest run, that's why we have to remove the test which skipped this for
>> IRQs where the code knew that they were still in the LRs.
>> Does that make sense?
> In level sensitive case, what if the IRQ'LR state was active. LR was
> kept intact. IRQ is queued. With previous code we wouldn't inject a new
> IRQ. Here aren't we going to inject it?

Effectively we only inject _pending_ IRQs: I went back and forth
through the current code and couldn't find a place where we actually
make use of any active-only interrupt. The only exception is
migration, where we explicitly iterate through all LRs again and pick up
active-only IRQs as well. We never set the active state except there.

So I decided to keep active-only IRQs in the LRs and not propagate
them back into the emulated distributor state. One reason is that we
don't use that state, which makes the code look silly; the other is that
we avoid the issue you described above.

> In the sync when you move the IRQ from the LR reg to the state variable,
> shouldn't you reset the queued state for level sensitive case? In such a
> case I think you could keep that check.

We keep them as "queued" in our emulation until they become inactive and
we clear the queued bit in the EOI handler.

Admittedly this whole approach is not obvious and it's perfectly
possible that I missed something. So please can you check and confirm my
assumptions above?
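
To make the ELRSR-based allocation from the quoted hunk easier to check,
here is a minimal user-space sketch. It assumes that a set bit in the
ELRSR value marks an empty list register, and the name first_empty_lr()
is hypothetical: it only mimics what find_first_bit() does on the bitmask
in the patch and is not kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the index of the first empty list register,
 * or nr_lr if no LR is free. Assumption: a set bit in elrsr means
 * "this LR is empty", and nr_lr is at most 64.
 */
int first_empty_lr(uint64_t elrsr, int nr_lr)
{
	for (int lr = 0; lr < nr_lr; lr++) {
		if (elrsr & (UINT64_C(1) << lr))
			return lr;
	}
	return nr_lr;	/* same "not found" convention as find_first_bit() */
}
```

A caller would treat a return value >= nr_lr as "no LR available", which
matches the "if (lr >= vgic->nr_lr) return false;" check in the hunk.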

Merci,
André



Re: [PATCH 1/3] KVM: arm64: Implement vGICv3 distributor and redistributor access from userspace

2015-09-01 Thread Andre Przywara
Hi Pavel,

...

>> diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
>> index e661e7f..b3847e1 100644
>> --- a/virt/kvm/arm/vgic-v3-emul.c
>> +++ b/virt/kvm/arm/vgic-v3-emul.c
...
>> @@ -1000,40 +1102,95 @@ static void vgic_v3_destroy(struct kvm_device *dev)
>>  kfree(dev);
>>  }
>>  
>> +static u32 vgic_v3_get_reg_size(struct kvm_device_attr *attr)
>> +{
>> +u32 offset;
>> +
>> +switch (attr->group) {
>> +case KVM_DEV_ARM_VGIC_GRP_DIST_REGS:
>> +offset = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
>> +if (offset >= GICD_IROUTER && offset <= 0x7FD8)
> 
> eh, 0x7FD8 ?
> 
>> +return 8;
>> +else
>> +return 4;
>> +break;
>> +
>> +case KVM_DEV_ARM_VGIC_GRP_REDIST_REGS:
>> +offset = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
>> +if ((offset == GICR_TYPER) ||
>> +(offset >= GICR_SETLPIR && offset <= GICR_INVALLR))
>> +return 8;
>> +else
>> +return 4;
>> +break;
>> +
>> +default:
>> +return -ENXIO;
>> +}
>> +}
> 
> this feels wrong.

I agree on this; actually I consider this dangerous. Currently the
memory behind addr in QEMU (hw/intc/arm_gic_kvm.c:kvm_arm_gic_get() for
instance) is only uint32_t, so you have to take care to provide uint64_t
backing for those registers. That means the register size the kernel
knows must match the size userland assumes. So I'd rather see the access
size controlled by userland, probably using Christoffer's suggestion
below.

Also the GIC specification says that everything must be accessible with
32-bit accesses. Correct me if I am wrong on this, but vCPUs are not
supposed to run while you are getting/setting VGIC registers, right? So
there shouldn't be any issues with non-atomic accesses to 64-bit
registers, which means you could just go ahead and do everything in
32-bit only. This would also help with supporting 32-bit userland and/or
kernel later.
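
As a rough illustration of the "32-bit only" idea, a 64-bit register can
be exposed as two 32-bit halves. This is only a sketch under stated
assumptions: the helper names are hypothetical, and it assumes the lower
word sits at the lower offset (bit 2 of the offset selects the upper
half). It is not the interface of the patch under review.

```c
#include <stdint.h>

/* Read one 32-bit half of a 64-bit register; offset bit 2 selects the upper half. */
uint32_t reg64_read32(uint64_t reg, unsigned int offset)
{
	return (offset & 4) ? (uint32_t)(reg >> 32) : (uint32_t)reg;
}

/* Write one 32-bit half of a 64-bit register, leaving the other half untouched. */
void reg64_write32(uint64_t *reg, unsigned int offset, uint32_t val)
{
	if (offset & 4)
		*reg = (*reg & UINT64_C(0x00000000ffffffff)) | ((uint64_t)val << 32);
	else
		*reg = (*reg & UINT64_C(0xffffffff00000000)) | val;
}
```

Two such accesses are not atomic with respect to each other, which is
only acceptable under the assumption made above that no VCPU is running
while userland gets or sets the registers.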

Cheers,
Andre.


Re: [PATCH v2 12/15] KVM: arm64: sync LPI configuration and pending tables

2015-08-25 Thread Andre Przywara
Hi Eric,

On 14/08/15 12:58, Eric Auger wrote:
> On 07/10/2015 04:21 PM, Andre Przywara wrote:
>> The LPI configuration and pending tables of the GICv3 LPIs are held
>> in tables in (guest) memory. To achieve reasonable performance, we
>> cache this data in our own data structures, so we need to sync those
>> two views from time to time. This behaviour is well described in the
>> GICv3 spec and is also exercised by hardware, so the sync points are
>> well known.
>>
>> Provide functions that read the guest memory and store the
>> information from the configuration and pending tables in the kernel.
>>
>> Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
>> ---
> would help to have change log between v1 -> v2 (valid for the whole series)
>>  include/kvm/arm_vgic.h  |   2 +
>>  virt/kvm/arm/its-emul.c | 124
>>  virt/kvm/arm/its-emul.h |   3 ++
>>  3 files changed, 129 insertions(+)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 2a67a10..323c33a 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -167,6 +167,8 @@ struct vgic_its {
>>  	int cwriter;
>>  	struct list_head	device_list;
>>  	struct list_head	collection_list;
>> +	/* memory used for buffering guest's memory */
>> +	void		*buffer_page;
>>  };
>>
>>  struct vgic_dist {
>> diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
>> index b9c40d7..05245cb 100644
>> --- a/virt/kvm/arm/its-emul.c
>> +++ b/virt/kvm/arm/its-emul.c
>> @@ -50,6 +50,7 @@ struct its_itte {
>>  	struct its_collection *collection;
>>  	u32 lpi;
>>  	u32 event_id;
>> +	u8 priority;
>>  	bool enabled;
>>  	unsigned long *pending;
>>  };
>> @@ -70,8 +71,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
>>  	return NULL;
>>  }
>>
>> +#define LPI_PROP_ENABLE_BIT(p)	((p) & LPI_PROP_ENABLED)
>> +#define LPI_PROP_PRIORITY(p)	((p) & 0xfc)
>> +
>> +/* stores the priority and enable bit for a given LPI */
>> +static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 prop)
>> +{
>> +	itte->priority = LPI_PROP_PRIORITY(prop);
>> +	itte->enabled  = LPI_PROP_ENABLE_BIT(prop);
>> +}
>> +
>> +#define GIC_LPI_OFFSET 8192
>> +
>> +/* We scan the table in chunks the size of the smallest page size */
> 4kB chunks?

Marc was complaining about this wording, I think. The rationale was that
4K is already in the code and thus does not need to be repeated in the
comment, whereas the comment should explain the meaning of the value.

>> +#define CHUNK_SIZE 4096U
>> +
>>  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
>>
>> +static int nr_idbits_propbase(u64 propbaser)
>> +{
>> +	int nr_idbits = (1U << (propbaser & 0x1f)) + 1;
>> +
>> +	return max(nr_idbits, INTERRUPT_ID_BITS_ITS);
>> +}
>> +
>> +/*
>> + * Scan the whole LPI configuration table and put the LPI configuration
>> + * data in our own data structures. This relies on the LPI being
>> + * mapped before.
>> + */
>> +static bool its_update_lpis_configuration(struct kvm *kvm)
>> +{
>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>> +	u8 *prop = dist->its.buffer_page;
>> +	u32 tsize;
>> +	gpa_t propbase;
>> +	int lpi = GIC_LPI_OFFSET;
>> +	struct its_itte *itte;
>> +	struct its_device *device;
>> +	int ret;
>> +
>> +	propbase = BASER_BASE_ADDRESS(dist->propbaser);
>> +	tsize = nr_idbits_propbase(dist->propbaser);
>> +
>> +	while (tsize > 0) {
>> +		int chunksize = min(tsize, CHUNK_SIZE);
>> +
>> +		ret = kvm_read_guest(kvm, propbase, prop, chunksize);
> I think you still have the spin_lock issue  since if my understanding is
> correct this is called from
> vgic_handle_mmio_access/vcall_range_handler/gic_enable_lpis
> where vgic_handle_mmio_access. Or does it take another path?

Well, it's (also) called on handling the INVALL command, but you are
right that on that enable path the dist lock is held. I reckon that this
init part isn't racy so that shouldn't be a problem (famous last words ;-).
Let me see whether I can find a way to just drop the lock around the
while loop.

Cheers,
Andre.

 
> Shouldn't we create a new kvm_io_device to avoid holding the dist lock?
>
> Eric


Re: [PATCH v2 11/15] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-08-25 Thread Andre Przywara
Hi Eric,

On 14/08/15 12:58, Eric Auger wrote:
> On 07/10/2015 04:21 PM, Andre Przywara wrote:
>> As the actual LPI number in a guest can be quite high, but is mostly
>> assigned using a very sparse allocation scheme, bitmaps and arrays
>> for storing the virtual interrupt status are a waste of memory.
>> We use our equivalent of the Interrupt Translation Table Entry
>> (ITTE) to hold this extra status information for a virtual LPI.
>> As the normal VGIC code cannot use it's fancy bitmaps to manage
>> pending interrupts, we provide a hook in the VGIC code to let the
>> ITS emulation handle the list register queueing itself.
>> LPIs are located in a separate number range (>=8192), so
>> distinguishing them is easy. With LPIs being only edge-triggered, we
>> get away with a less complex IRQ handling.
>>
>> Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
>> ---
>>  include/kvm/arm_vgic.h      |  2 ++
>>  virt/kvm/arm/its-emul.c     | 71
>>  virt/kvm/arm/its-emul.h     |  3 ++
>>  virt/kvm/arm/vgic-v3-emul.c |  2 ++
>>  virt/kvm/arm/vgic.c         | 72 ++---
>>  5 files changed, 133 insertions(+), 17 deletions(-)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 1648668..2a67a10 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -147,6 +147,8 @@ struct vgic_vm_ops {
>>  	int	(*init_model)(struct kvm *);
>>  	void	(*destroy_model)(struct kvm *);
>>  	int	(*map_resources)(struct kvm *, const struct vgic_params *);
>> +	bool	(*queue_lpis)(struct kvm_vcpu *);
>> +	void	(*unqueue_lpi)(struct kvm_vcpu *, int irq);
>>  };
>>
>>  struct vgic_io_device {
>> diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
>> index 7f217fa..b9c40d7 100644
>> --- a/virt/kvm/arm/its-emul.c
>> +++ b/virt/kvm/arm/its-emul.c
>> @@ -50,8 +50,26 @@ struct its_itte {
>>  	struct its_collection *collection;
>>  	u32 lpi;
>>  	u32 event_id;
>> +	bool enabled;
>> +	unsigned long *pending;
>>  };
>>
>> +#define for_each_lpi(dev, itte, kvm) \
>> +	list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
>> +		list_for_each_entry(itte, &(dev)->itt, itte_list)
>> +
> You have a checkpatch error here:
>
> ERROR: Macros with complex values should be enclosed in parentheses
> #52: FILE: virt/kvm/arm/its-emul.c:57:
> +#define for_each_lpi(dev, itte, kvm) \
> +   list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
> +   list_for_each_entry(itte, &(dev)->itt, itte_list)
I know about that one. The problem is that if I add the parentheses it
breaks the usage below due to the curly brackets. But the definition
above is just so convenient and I couldn't find another neat solution so
far. If you are concerned about that I can give it another try,
otherwise I tend to just ignore checkpatch here.
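
The macro problem described above can be reproduced outside the kernel.
The sketch below uses made-up types (struct dev, struct item) and a
hypothetical for_each_item() macro over singly linked lists; it is not
the ITS code, only the same nested-loop shape.

```c
#include <stddef.h>

struct item { int lpi; struct item *next; };
struct dev  { struct item *items; struct dev *next; };

/*
 * Two nested for-loop headers without a body: the statement (or braced
 * block) the caller writes after the macro becomes the body of the inner
 * loop. Enclosing the expansion in parentheses or do { } while (0) would
 * detach that body, so the usage below would no longer compile.
 */
#define for_each_item(d, it, head) \
	for ((d) = (head); (d); (d) = (d)->next) \
		for ((it) = (d)->items; (it); (it) = (it)->next)

/* Return the first item with a matching lpi across all devices, or NULL. */
struct item *find_item(struct dev *head, int lpi)
{
	struct dev *d;
	struct item *it;

	for_each_item(d, it, head) {
		if (it->lpi == lpi)
			return it;
	}
	return NULL;
}
```

One known caveat of this shape: a "break" in the caller's body only
leaves the inner loop, which is harmless for a lookup that returns
directly, as above.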

>> +static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
>> +{
> can't we have the same LPI present in different interrupt translation
> tables? I don't know it is a sensible setting but I did not succeed in
> finding it was not possible.

Thanks to Marc I am happy (and relieved!) to point you to 6.1.1 LPI INTIDs:
The behavior of the GIC is UNPREDICTABLE if software:
- Maps multiple EventID/DeviceID combinations to the same physical LPI
INTID.

So I exercise the freedom of UNPREDICTABLE here ;-)

>> +	struct its_device *device;
>> +	struct its_itte *itte;
>> +
>> +	for_each_lpi(device, itte, kvm) {
>> +		if (itte->lpi == lpi)
>> +			return itte;
>> +	}
>> +	return NULL;
>> +}
>> +
>>  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
>>
>>  /* The distributor lock is held by the VGIC MMIO handler. */
>> @@ -145,6 +163,59 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
>>  	return false;
>>  }
>>
>> +/*
>> + * Find all enabled and pending LPIs and queue them into the list
>> + * registers.
>> + * The dist lock is held by the caller.
>> + */
>> +bool vits_queue_lpis(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
>> +	struct its_device *device;
>> +	struct its_itte *itte;
>> +	bool ret = true;
>> +
>> +	if (!vgic_has_its(vcpu->kvm))
>> +		return true;
>> +	if (!its->enabled || !vcpu->kvm->arch.vgic.lpis_enabled)
>> +		return true;
>> +
>> +	spin_lock(&its->lock);
>> +	for_each_lpi(device, itte, vcpu->kvm) {
>> +		if (!itte->enabled || !test_bit(vcpu->vcpu_id, itte->pending))
>> +			continue;
>> +
>> +		if (!itte->collection)
>> +			continue;
>> +
>> +		if (itte->collection->target_addr != vcpu->vcpu_id)
>> +			continue;
>> +
>> +		__clear_bit(vcpu->vcpu_id, itte->pending);
>> +
>> +		ret = vgic_queue_irq(vcpu, 0, itte->lpi);
> what if the vgic_queue_irq fails since no LR can be found, the
> itte->pending was cleared so we forget that LPI? shouldn't we restore
> the pending state in ITT? in vgic_queue_hwirq the state change only is
> performed

Re: [PATCH v2 12/15] KVM: arm64: sync LPI configuration and pending tables

2015-08-25 Thread Andre Przywara
Hi Eric,

On 14/08/15 13:35, Eric Auger wrote:
> On 08/14/2015 01:58 PM, Eric Auger wrote:
>> On 07/10/2015 04:21 PM, Andre Przywara wrote:
>>> The LPI configuration and pending tables of the GICv3 LPIs are held
>>> in tables in (guest) memory. To achieve reasonable performance, we
>>> cache this data in our own data structures, so we need to sync those
>>> two views from time to time. This behaviour is well described in the
>>> GICv3 spec and is also exercised by hardware, so the sync points are
>>> well known.
>>>
>>> Provide functions that read the guest memory and store the
>>> information from the configuration and pending tables in the kernel.
>>>
>>> Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
>>> ---
>> would help to have change log between v1 -> v2 (valid for the whole series)
>>>  include/kvm/arm_vgic.h  |   2 +
>>>  virt/kvm/arm/its-emul.c | 124
>>>  virt/kvm/arm/its-emul.h |   3 ++
>>>  3 files changed, 129 insertions(+)
>>>
>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>> index 2a67a10..323c33a 100644
>>> --- a/include/kvm/arm_vgic.h
>>> +++ b/include/kvm/arm_vgic.h
>>> @@ -167,6 +167,8 @@ struct vgic_its {
>>>  	int cwriter;
>>>  	struct list_head	device_list;
>>>  	struct list_head	collection_list;
>>> +	/* memory used for buffering guest's memory */
>>> +	void		*buffer_page;
>>>  };
>>>
>>>  struct vgic_dist {
>>> diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
>>> index b9c40d7..05245cb 100644
>>> --- a/virt/kvm/arm/its-emul.c
>>> +++ b/virt/kvm/arm/its-emul.c
>>> @@ -50,6 +50,7 @@ struct its_itte {
>>>  	struct its_collection *collection;
>>>  	u32 lpi;
>>>  	u32 event_id;
>>> +	u8 priority;
>>>  	bool enabled;
>>>  	unsigned long *pending;
>>>  };
>>> @@ -70,8 +71,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
>>>  	return NULL;
>>>  }
>>>
>>> +#define LPI_PROP_ENABLE_BIT(p)	((p) & LPI_PROP_ENABLED)
>>> +#define LPI_PROP_PRIORITY(p)	((p) & 0xfc)
>>> +
>>> +/* stores the priority and enable bit for a given LPI */
>>> +static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 prop)
>>> +{
>>> +	itte->priority = LPI_PROP_PRIORITY(prop);
>>> +	itte->enabled  = LPI_PROP_ENABLE_BIT(prop);
>>> +}
>>> +
>>> +#define GIC_LPI_OFFSET 8192
>>> +
>>> +/* We scan the table in chunks the size of the smallest page size */
>> 4kB chunks?
>>> +#define CHUNK_SIZE 4096U
>>> +
>>>  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
>>>
>>> +static int nr_idbits_propbase(u64 propbaser)
>>> +{
>>> +	int nr_idbits = (1U << (propbaser & 0x1f)) + 1;
>>> +
>>> +	return max(nr_idbits, INTERRUPT_ID_BITS_ITS);
>>> +}
>>> +
>>> +/*
>>> + * Scan the whole LPI configuration table and put the LPI configuration
>>> + * data in our own data structures. This relies on the LPI being
>>> + * mapped before.
>>> + */
>>> +static bool its_update_lpis_configuration(struct kvm *kvm)
>>> +{
>>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>>> +	u8 *prop = dist->its.buffer_page;
>>> +	u32 tsize;
>>> +	gpa_t propbase;
>>> +	int lpi = GIC_LPI_OFFSET;
>>> +	struct its_itte *itte;
>>> +	struct its_device *device;
>>> +	int ret;
>>> +
>>> +	propbase = BASER_BASE_ADDRESS(dist->propbaser);
>>> +	tsize = nr_idbits_propbase(dist->propbaser);
>>> +
>>> +	while (tsize > 0) {
>>> +		int chunksize = min(tsize, CHUNK_SIZE);
>>> +
>>> +		ret = kvm_read_guest(kvm, propbase, prop, chunksize);
>> I think you still have the spin_lock issue  since if my understanding is
>> correct this is called from
>> vgic_handle_mmio_access/vcall_range_handler/gic_enable_lpis
>> where vgic_handle_mmio_access. Or does it take another path?
>>
>> Shouldn't we create a new kvm_io_device to avoid holding the dist lock?
>
> Sorry I forgot it was the case already. But currently we always register
> the same io ops (registration entry point being
> vgic_register_kvm_io_dev) and maybe we should have separate dispatcher
> function for dist, redit and its?

What would be the idea behind it? To have separate locks for each? I
don't think that will work, as some ITS functions are called from GICv3
register handler functions which manipulate members of the distributor
structure. So I am more in favour of dropping the dist lock in these
cases before handing off execution to ITS specific functions.

Cheers,
Andre.


Re: [PATCH v2 08/15] KVM: arm64: introduce ITS emulation file with stub functions

2015-08-25 Thread Andre Przywara
Salut Eric,



>> diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
>> index 5269ad1..f5865e7 100644
>> --- a/virt/kvm/arm/vgic-v3-emul.c
>> +++ b/virt/kvm/arm/vgic-v3-emul.c
>> @@ -48,6 +48,7 @@
>>  #include <asm/kvm_mmu.h>
>>
>>  #include "vgic.h"
>> +#include "its-emul.h"
>>
>>  static bool handle_mmio_rao_wi(struct kvm_vcpu *vcpu,
>>  			       struct kvm_exit_mmio *mmio, phys_addr_t offset)
>> @@ -530,9 +531,20 @@ static bool handle_mmio_ctlr_redist(struct kvm_vcpu *vcpu,
>>  				    struct kvm_exit_mmio *mmio,
>>  				    phys_addr_t offset)
>>  {
>> -	/* since we don't support LPIs, this register is zero for now */
>> -	vgic_reg_access(mmio, NULL, offset,
>> -			ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	u32 reg;
>> +
>> +	if (!vgic_has_its(vcpu->kvm)) {
>> +		vgic_reg_access(mmio, NULL, offset,
>> +				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +		return false;
>> +	}
> can't we remove above block and ...
>> +	reg = dist->lpis_enabled ? GICR_CTLR_ENABLE_LPIS : 0;
>> +	vgic_reg_access(mmio, &reg, offset,
>> +			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
>> +	if (!dist->lpis_enabled && (reg & GICR_CTLR_ENABLE_LPIS
> add vgic_has_its(vcpu->kvm) && above?

Yeah, makes some sense. Changed that.
 
> Besides Reviewed-by: Eric Auger <eric.au...@linaro.org>

Merci!

André

 
> Eric
>> )) {
>> +		/* Eventually do something */
>> +	}
>>  	return false;
>>  }
>>
>> @@ -861,6 +873,12 @@ static int vgic_v3_map_resources(struct kvm *kvm,
>>  		rdbase += GIC_V3_REDIST_SIZE;
>>  	}
>>
>> +	if (vgic_has_its(kvm)) {
>> +		ret = vits_init(kvm);
>> +		if (ret)
>> +			goto out_unregister;
>> +	}
>> +
>>  	dist->redist_iodevs = iodevs;
>>  	dist->ready = true;
>>  	goto out;
 


Re: [PATCH v2 09/15] KVM: arm64: implement basic ITS register handlers

2015-08-25 Thread Andre Przywara
Hi Eric,



>> diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
>> index 659dd39..b498f06 100644
>> --- a/virt/kvm/arm/its-emul.c
>> +++ b/virt/kvm/arm/its-emul.c
>> @@ -32,10 +32,62 @@
>>  #include "vgic.h"
>>  #include "its-emul.h"
>>
>> +#define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
>> +
>> +/* The distributor lock is held by the VGIC MMIO handler. */
>>  static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu,
>>  				  struct kvm_exit_mmio *mmio,
>>  				  phys_addr_t offset)
>>  {
>> +	struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
>> +	u32 reg;
>> +	bool was_enabled;
>> +
>> +	switch (offset & ~3) {
>> +	case 0x00:	/* GITS_CTLR */
>> +		/* We never defer any command execution. */
>> +		reg = GITS_CTLR_QUIESCENT;
>> +		if (its->enabled)
>> +			reg |= GITS_CTLR_ENABLE;
>> +		was_enabled = its->enabled;
>> +		vgic_reg_access(mmio, &reg, offset & 3,
>> +				ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
>> +		its->enabled = !!(reg & GITS_CTLR_ENABLE);
>> +		return !was_enabled && its->enabled;
>> +	case 0x04:	/* GITS_IIDR */
>> +		reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
>> +		vgic_reg_access(mmio, &reg, offset & 3,
>> +				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
>> +		break;
>> +	case 0x08:	/* GITS_TYPER */
>> +		/*
>> +		 * We use linear CPU numbers for redistributor addressing,
>> +		 * so GITS_TYPER.PTA is 0.
>> +		 * To avoid memory waste on the guest side, we keep the
>> +		 * number of IDBits and DevBits low for the time being.
>> +		 * This could later be made configurable by userland.
>> +		 * Since we have all collections in linked list, we claim
>> +		 * that we can hold all of the collection tables in our
>> +		 * own memory and that the ITT entry size is 1 byte (the
>> +		 * smallest possible one).
>> +		 */
>> +		reg = GITS_TYPER_PLPIS;
>> +		reg |= 0xff << GITS_TYPER_HWCOLLCNT_SHIFT;
>> +		reg |= 0x0f << GITS_TYPER_DEVBITS_SHIFT;
>> +		reg |= 0x0f << GITS_TYPER_IDBITS_SHIFT;
>> +		vgic_reg_access(mmio, &reg, offset & 3,
>> +				ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
>> +		break;
>> +	case 0x0c:
>> +		/* The upper 32bits of TYPER are all 0 for the time being.
>> +		 * Should we need more than 256 collections, we can enable
>> +		 * some bits in here.
>> +		 */
>> +		vgic_reg_access(mmio, NULL, offset & 3,
>> +				ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
>> +		break;
>> +	}
>> +
>>  	return false;
>>  }
>>
>> @@ -43,20 +95,142 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
>>  				    struct kvm_exit_mmio *mmio,
>>  				    phys_addr_t offset)
>>  {
>> +	u32 reg = 0;
>> +	int idreg = (offset & ~3) + GITS_IDREGS_BASE;
>> +
>> +	switch (idreg) {
>> +	case GITS_PIDR2:
>> +		reg = GIC_PIDR2_ARCH_GICv3;
>> +		break;
>> +	case GITS_PIDR4:
>> +		/* This is a 64K software visible page */
>> +		reg = 0x40;
>> +		break;
>> +	/* Those are the ID registers for (any) GIC. */
>> +	case GITS_CIDR0:
>> +		reg = 0x0d;
>> +		break;
>> +	case GITS_CIDR1:
>> +		reg = 0xf0;
>> +		break;
>> +	case GITS_CIDR2:
>> +		reg = 0x05;
>> +		break;
>> +	case GITS_CIDR3:
>> +		reg = 0xb1;
>> +		break;
>> +	}
>> +	vgic_reg_access(mmio, &reg, offset & 3,
>> +			ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
>>  	return false;
>>  }
>>
>> +static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>  static bool handle_mmio_gits_cbaser(struct kvm_vcpu *vcpu,
>>  				    struct kvm_exit_mmio *mmio,
>>  				    phys_addr_t offset)
>>  {
>> +	struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
>> +	int mode = ACCESS_READ_VALUE;
>> +
>> +	mode |= its->enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
>> +
>> +	vgic_handle_base_register(vcpu, mmio, offset, &its->cbaser, mode);
>> +
>> +	/* Writing CBASER resets the read pointer. */
>> +	if (mmio->is_write)
>> +		its->creadr = 0;
>> +
>>  	return false;
>>  }
>>
>> +static int its_cmd_buffer_size(struct kvm *kvm)
>> +{
>> +	struct vgic_its *its = &kvm->arch.vgic.its;
>> +
>> +	return ((its->cbaser & 0xff) + 1) << 12;
>> +}
>> +
>> +static gpa_t its_cmd_buffer_base(struct kvm *kvm)
>> +{
>> +	struct vgic_its *its = &kvm->arch.vgic.its;
>> +
>> +	return BASER_BASE_ADDRESS(its->cbaser);
>> +}
>> +
>> +/*
>> + * By writing to CWRITER the guest announces new commands to be processed.
>> + * Since we cannot read from guest memory inside the ITS

Re: [PATCH v2 10/15] KVM: arm64: add data structures to model ITS interrupt translation

2015-08-25 Thread Andre Przywara
Hi Eric,

On 13/08/15 16:46, Eric Auger wrote:
 
 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 The GICv3 Interrupt Translation Service (ITS) uses tables in memory
 to allow a sophisticated interrupt routing. It features device tables,
 an interrupt table per device and a table connecting collections to
 actual CPUs (aka. redistributors in the GICv3 lingo).
 Since the interrupt numbers for the LPIs are allocated quite sparsely
 and the range can be quite huge (8192 LPIs being the minimum), using
 bitmaps or arrays for storing information is a waste of memory.
 We use linked lists instead, which we iterate linearily. This works
 very well with the actual number of LPIs/MSIs in the guest being
 quite low. Should the number of LPIs exceed the number where iterating
 through lists seems acceptable, we can later revisit this and use more
 efficient data structures.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  include/kvm/arm_vgic.h  |  3 +++
  virt/kvm/arm/its-emul.c | 48 
 
  2 files changed, 51 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index b432055..1648668 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -25,6 +25,7 @@
  #include linux/spinlock.h
  #include linux/types.h
  #include kvm/iodev.h
 +#include linux/list.h
  
  #define VGIC_NR_IRQS_LEGACY 256
  #define VGIC_NR_SGIS16
 @@ -162,6 +163,8 @@ struct vgic_its {
  u64 cbaser;
  int creadr;
  int cwriter;
 +struct list_headdevice_list;
 +struct list_headcollection_list;
  };
  
  struct vgic_dist {
 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index b498f06..7f217fa 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -21,6 +21,7 @@
  #include <linux/kvm.h>
  #include <linux/kvm_host.h>
  #include <linux/interrupt.h>
 +#include <linux/list.h>
  
  #include <linux/irqchip/arm-gic-v3.h>
  #include <kvm/arm_vgic.h>
 @@ -32,6 +33,25 @@
  #include "vgic.h"
  #include "its-emul.h"
  
 +struct its_device {
 +	struct list_head dev_list;
 +	struct list_head itt;
 +	u32 device_id;
 +};
 +
 +struct its_collection {
 +	struct list_head coll_list;
 +	u32 collection_id;
 +	u32 target_addr;
 +};
 +
 +struct its_itte {
 +	struct list_head itte_list;
 +	struct its_collection *collection;
 +	u32 lpi;
 +	u32 event_id;
 +};
 +
  #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
  
  /* The distributor lock is held by the VGIC MMIO handler. */
 @@ -311,6 +331,9 @@ int vits_init(struct kvm *kvm)
  
  	spin_lock_init(&its->lock);
  
 +	INIT_LIST_HEAD(&its->device_list);
 +	INIT_LIST_HEAD(&its->collection_list);
 +
  	its->enabled = false;
  
  	return -ENXIO;
 @@ -320,11 +343,36 @@ void vits_destroy(struct kvm *kvm)
  {
  	struct vgic_dist *dist = &kvm->arch.vgic;
  	struct vgic_its *its = &dist->its;
 +	struct its_device *dev;
 +	struct its_itte *itte;
 +	struct list_head *dev_cur, *dev_temp;
 +	struct list_head *cur, *temp;
  
  	if (!vgic_has_its(kvm))
  		return;
  
 +	if (!its->device_list.next)
 Why not use list_empty()? But I think I would simply remove this, since
 the empty case is handled below...

list_empty() requires the list to be initialized before. This check here
is to detect that map_resources was never called (this is only done on
the first VCPU run) and thus device_list is basically still all zeroes.
If we abort the guest without ever running a VCPU (for instance because
some initialization failed), we call vits_destroy() anyway (because this
is called when tearing down the VGIC device).
So the check is here to detect early that vits_destroy() has been called
without the ITS ever having been fully initialized. This fixed a real bug when
the guest start was aborted before the ITS was ever used.
I will add a comment to make this clear.
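
To illustrate the point about zeroed list heads, here is a small user-space sketch. It uses a minimal stand-in for the kernel's struct list_head rather than the kernel headers, and list_never_initialized() is a made-up helper name, not something from the patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same idea as the kernel's INIT_LIST_HEAD(): the head points at itself. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Same idea as the kernel's list_empty(): only meaningful after init. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical helper: true if the head is still all zeroes. */
static bool list_never_initialized(const struct list_head *head)
{
	return head->next == NULL;
}
```

On a head that is still all zeroes, list_empty() reports "not empty", so a following list walk would dereference a NULL next pointer. Checking the next pointer directly catches that state.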

 +		return;
 +
 +	spin_lock(&its->lock);
 +	list_for_each_safe(dev_cur, dev_temp, &its->device_list) {
 +		dev = container_of(dev_cur, struct its_device, dev_list);
 Wouldn't list_for_each_entry_safe be more concise here?

If I got this correctly, we need the _safe variant if we want to remove
the list item within the loop. Or am I missing something here?

Cheers,
Andre.
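
For reference, list_for_each_entry_safe() is itself a _safe variant: it tolerates removal of the current item and folds the container_of() step into the iterator. Below is a user-space sketch with simplified re-implementations of the kernel macros (it relies on the GNU __typeof__ extension). struct item, list_init() and free_all() are made-up names for illustration only:

```c
#include <stddef.h>
#include <stdlib.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified version of the kernel macro; 'n' always holds the next entry. */
#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = container_of((head)->next, __typeof__(*pos), member),	\
	     n = container_of(pos->member.next, __typeof__(*pos), member);	\
	     &pos->member != (head);						\
	     pos = n, n = container_of(n->member.next, __typeof__(*n), member))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

struct item {
	struct list_head link;
	int id;
};

/* Remove and free every item; returns the number of items freed. */
static int free_all(struct list_head *head)
{
	struct item *it, *tmp;
	int freed = 0;

	list_for_each_entry_safe(it, tmp, head, link) {
		list_del(&it->link);
		free(it);
		freed++;
	}
	return freed;
}
```

Compared to list_for_each_safe() plus an explicit container_of(), this needs only the two entry pointers and no separate struct list_head cursors.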


 +		list_for_each_safe(cur, temp, &dev->itt) {
 +			itte = (container_of(cur, struct its_itte, itte_list));
 same
 
 Eric
 +			list_del(cur);
 +			kfree(itte);
 +		}
 +		list_del(dev_cur);
 +		kfree(dev);
 +	}
 +
 +	list_for_each_safe(cur, temp, &its->collection_list) {
 +		list_del(cur);
 +		kfree(container_of(cur, struct its_collection, coll_list));
 +	}
 +
  	kfree(dist->pendbaser);
  
  	its->enabled = false;
 +	spin_unlock(&its->lock);
  }

 

Re: [PATCH v2 14/15] KVM: arm64: implement MSI injection in ITS emulation

2015-08-24 Thread Andre Przywara
Hi,

On 03/08/15 18:06, Marc Zyngier wrote:
 On 03/08/15 16:37, Eric Auger wrote:
 Andre, Pavel,
 On 08/03/2015 11:16 AM, Pavel Fedin wrote:
  Hello!

 Again the case that leaves me uncomfortable is the one where the
 userspace does not provide the devid whereas it must (GICv3 ITS case).

  Hypothetical broken userland which does not exist for now ?
 Yes, but the rule is not to trust *any* userspace, isn't it?

Well, that's only regarding safety, not regarding functionality, right?
So if we could break the kernel by not providing the flag and/or devid,
this needs to be fixed. But if it just doesn't work, that's OK.


 As of now I prefer keeping the flags at uapi level and propagating them
 down to the kernel, as long as I don't have any answer for the unset
 devid discrimination question. Please excuse my stubbornness ;-)
 
 I think this flag should be kept, as it really indicates what is valid
 in the MSI structure. It also has other benefits such as making obvious
 what userspace expects, which can then be checked against the kernel's
 own expectations.

I agree on this. Usually this kind of redundancy leads to strange code,
but this does not seem to apply here, since we can at least still guard
the assignments to demonstrate that the devid field needs to go along
with the flag.

Cheers,
Andre.
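
What "guarding the assignment" could look like, as a rough user-space sketch. The struct layouts are hypothetical and only loosely modelled on struct kvm_msi and a kernel routing entry; MSI_VALID_DEVID stands in for KVM_MSI_VALID_DEVID and msi_from_route() is a made-up name:

```c
#include <stdint.h>

#define MSI_VALID_DEVID	(1U << 0)	/* stand-in for KVM_MSI_VALID_DEVID */

/* Hypothetical request, loosely modelled on struct kvm_msi. */
struct msi_request {
	uint32_t address_lo, address_hi, data, flags, devid;
};

/* Hypothetical kernel-side routing entry carrying the same fields. */
struct routing_entry {
	uint32_t address_lo, address_hi, data, flags, devid;
};

/* Copy a routing entry into an MSI request; devid only travels with the flag. */
static void msi_from_route(struct msi_request *msi,
			   const struct routing_entry *e)
{
	msi->address_lo = e->address_lo;
	msi->address_hi = e->address_hi;
	msi->data = e->data;
	msi->flags = e->flags;
	msi->devid = 0;
	if (e->flags & MSI_VALID_DEVID)
		msi->devid = e->devid;
}
```

The guard documents that the devid field is only meaningful together with the flag, even though an unconditional copy of a u32 would be harmless.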


Re: [PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-08-24 Thread Andre Przywara
Hi Eric,

On 12/08/15 10:01, Eric Auger wrote:

 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index bc40137..394622c 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -79,7 +79,6 @@
  #include "vgic.h"
  
  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
  static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
  static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
  
 @@ -647,6 +646,17 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
  	return false;
  }
  
 +static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
 +			       struct vgic_lr vlr)
 +{
 +	vgic_ops->sync_lr_elrsr(vcpu, lr, vlr);
 +}
 Why not rename this to vgic_set_elrsr? That would be consistent
 with the other virtual interface control register setters.

But that would involve renaming the vgic_ops members as well to be
consistent, right? As there is no change in the behaviour, a naming
change sounds unmotivated to me. And _set_ wouldn't be exact, as this
function deals with only one bit at a time and allows clearing it
as well.

 +
 +static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
 +{
 +	return vgic_ops->get_elrsr(vcpu);
 +}
 If I am not wrong, each time you manipulate the elrsr you handle it as a
 bitmap. Why not directly return an unsigned long * then (elrsr_ptr)?

Because the pointer needs to point somewhere, and that storage is
currently located on the caller's stack. Directly returning a pointer
would require the caller to provide some memory for the u64, which does
not save you much in terms of LOC:

-	u64 elrsr = vgic_get_elrsr(vcpu);
-	unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
+	u64 elrsr;
+	unsigned long *elrsr_ptr = vgic_get_elrsr_bm(vcpu, &elrsr);

Also we need u64_to_bitmask() in one case when converting the EISR
value, so we cannot get rid of that function.
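
As a side note, what for_each_clear_bit() over the ELRSR value amounts to can be sketched in user space as below. A set bit in the ELRSR marks an empty list register, so the clear bits are the LRs in use. used_lrs() is a made-up name, and the sketch assumes nr_lr is at most 64 and that out[] has room for nr_lr entries:

```c
#include <stdint.h>

/*
 * Collect the indices of all clear bits among the lowest nr_lr bits of
 * elrsr, i.e. the list registers that currently hold an interrupt.
 * Returns the number of indices written to out[].
 */
static int used_lrs(uint64_t elrsr, int nr_lr, int *out)
{
	int n = 0;

	for (int lr = 0; lr < nr_lr; lr++) {
		if (!(elrsr & (UINT64_C(1) << lr)))
			out[n++] = lr;
	}
	return n;
}
```

This replaces the separately tracked lr_used bitmap: the hardware status register already says which LRs are occupied.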

 +
  /**
   * vgic_unqueue_irqs - move pending/active IRQs from LRs to the distributor
   * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs
 @@ -658,9 +668,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
  void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
  {
  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 +	u64 elrsr = vgic_get_elrsr(vcpu);
 +	unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
  	int i;
  
 -	for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
 +	for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) {
  		struct vgic_lr lr = vgic_get_lr(vcpu, i);
  
  /*
 @@ -703,7 +715,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
   * Mark the LR as free for other use.
   */
  BUG_ON(lr.state & LR_STATE_MASK);
 -vgic_retire_lr(i, lr.irq, vcpu);
 +vgic_sync_lr_elrsr(vcpu, i, lr);
  vgic_irq_clear_queued(vcpu, lr.irq);
  
  /* Finally update the VGIC state. */
 @@ -1011,17 +1023,6 @@ static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr,
  	vgic_ops->set_lr(vcpu, lr, vlr);
  }
  
 -static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
 -			       struct vgic_lr vlr)
 -{
 -	vgic_ops->sync_lr_elrsr(vcpu, lr, vlr);
 -}
 -
 -static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
 -{
 -	return vgic_ops->get_elrsr(vcpu);
 -}
 -
  static inline u64 vgic_get_eisr(struct kvm_vcpu *vcpu)
  {
  	return vgic_ops->get_eisr(vcpu);
 @@ -1062,18 +1063,6 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu)
  	vgic_ops->enable(vcpu);
  }
  
 -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
 -{
 -	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 -	struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
 -
 -	vlr.state = 0;
 -	vgic_set_lr(vcpu, lr_nr, vlr);
 -	clear_bit(lr_nr, vgic_cpu->lr_used);
 -	vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY;
 -	vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 -}
 -
  /*
   * An interrupt may have been disabled after being made pending on the
   * CPU interface (the classic case is a timer running while we're
 @@ -1085,23 +1074,32 @@ static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
   */
  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
  {
 -	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 +	u64 elrsr = vgic_get_elrsr(vcpu);
 +	unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
 If you agree with the above modification, I would simply rename elrsr_ptr to elrsr.
  	int lr;
 +	struct vgic_lr vlr;
 Why move this declaration here? I think it can remain in the block.

Possibly. Don't remember the reason of this move, I think it was due to
some other changes I later removed. I will revert it.

  
 -	for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr) {
 -		struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
 +	for_each_clear_bit(lr, elrsr_ptr, 

Re: [PATCH v2 05/15] KVM: arm/arm64: make GIC frame address initialization model specific

2015-08-24 Thread Andre Przywara
Hi,

On 12/08/15 14:02, Eric Auger wrote:
 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 Currently we initialize all the possible GIC frame addresses in one
 function, without looking at the specific GIC model we instantiate
 for the guest.
 As this gets confusing when adding another VGIC model later, let's
 move these initializations into the respective model's init 
 nit: to be more precise, the init emulation function (not the
 vgic_v2/v3_init_model model's init function). pfouh?! ;-)
 functions.

OK, will try to find a wording that is not completely confusing.


 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  virt/kvm/arm/vgic-v2-emul.c | 3 +++
  virt/kvm/arm/vgic-v3-emul.c | 3 +++
  virt/kvm/arm/vgic.c | 3 ---
  3 files changed, 6 insertions(+), 3 deletions(-)

 diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c
 index 1390797..8faa28c 100644
 --- a/virt/kvm/arm/vgic-v2-emul.c
 +++ b/virt/kvm/arm/vgic-v2-emul.c
 @@ -567,6 +567,9 @@ void vgic_v2_init_emulation(struct kvm *kvm)
  	dist->vm_ops.init_model = vgic_v2_init_model;
  	dist->vm_ops.map_resources = vgic_v2_map_resources;
  
 +	dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
 +	dist->vgic_dist_base = VGIC_ADDR_UNDEF;
 It looks strange to see the common dist_base here. Why don't you leave it
 in the common part, kvm_vgic_create, all the more so since you left
 kvm->arch.vgic.vctrl_base = vgic->vctrl_base in kvm_vgic_create?

The idea behind this is that dist_base refers to similar, but not
identical distributors (v2 vs. v3), so I found it a good idea to
initialize it in here. Also vctrl_base is host facing and not set by
userland, so this doesn't really compare here.

Cheers,
Andre.

 +
  	kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS;
  }
  
 diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
 index d2eeb20..1f42348 100644
 --- a/virt/kvm/arm/vgic-v3-emul.c
 +++ b/virt/kvm/arm/vgic-v3-emul.c
 @@ -885,6 +885,9 @@ void vgic_v3_init_emulation(struct kvm *kvm)
  	dist->vm_ops.destroy_model = vgic_v3_destroy_model;
  	dist->vm_ops.map_resources = vgic_v3_map_resources;
  
 +	dist->vgic_dist_base = VGIC_ADDR_UNDEF;
 +	dist->vgic_redist_base = VGIC_ADDR_UNDEF;
 +
  	kvm->arch.max_vcpus = KVM_MAX_VCPUS;
  }
  
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index cc8f5ed..59f1801 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -1830,9 +1830,6 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
  	kvm->arch.vgic.in_kernel = true;
  	kvm->arch.vgic.vgic_model = type;
  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
 -	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
 -	kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
 -	kvm->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
  
  out_unlock:
  	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {

 


Re: [PATCH v2 07/15] KVM: arm64: handle ITS related GICv3 redistributor registers

2015-08-24 Thread Andre Przywara
Hi Eric,

On 13/08/15 13:17, Eric Auger wrote:
 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 In the GICv3 redistributor there are the PENDBASER and PROPBASER
 registers which we did not emulate so far, as they only make sense
 when having an ITS. In preparation for that emulate those MMIO
 accesses by storing the 64-bit data written into it into a variable
 which we later read in the ITS emulation.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  include/kvm/arm_vgic.h  |  8 
  virt/kvm/arm/vgic-v3-emul.c | 44 
 
  virt/kvm/arm/vgic.c | 35 +++
  virt/kvm/arm/vgic.h |  4 
  4 files changed, 91 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 3ee063b..8c6cb0e 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -256,6 +256,14 @@ struct vgic_dist {
  	struct vgic_vm_ops	vm_ops;
  	struct vgic_io_device	dist_iodev;
  	struct vgic_io_device	*redist_iodevs;
 +
 +	/* Address of LPI configuration table shared by all redistributors */
 +	u64			propbaser;
 +
 +	/* Addresses of LPI pending tables per redistributor */
 +	u64			*pendbaser;
 +
 +	bool			lpis_enabled;
  };
  
  struct vgic_v2_cpu_if {
 diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
 index a8cf669..5269ad1 100644
 --- a/virt/kvm/arm/vgic-v3-emul.c
 +++ b/virt/kvm/arm/vgic-v3-emul.c
 @@ -651,6 +651,38 @@ static bool handle_mmio_cfg_reg_redist(struct kvm_vcpu *vcpu,
  	return vgic_handle_cfg_reg(reg, mmio, offset);
  }
  
 +/* We don't trigger any actions here, just store the register value */
 +static bool handle_mmio_propbaser_redist(struct kvm_vcpu *vcpu,
 +					 struct kvm_exit_mmio *mmio,
 +					 phys_addr_t offset)
 +{
 +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 +	int mode = ACCESS_READ_VALUE;
 +
 +	/* Storing a value with LPIs already enabled is undefined */
 +	mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
 +	vgic_handle_base_register(vcpu, mmio, offset, &dist->propbaser, mode);
 +
 +	return false;
 +}
 +
 +/* We don't trigger any actions here, just store the register value */
 +static bool handle_mmio_pendbaser_redist(struct kvm_vcpu *vcpu,
 +					 struct kvm_exit_mmio *mmio,
 +					 phys_addr_t offset)
 +{
 +	struct kvm_vcpu *rdvcpu = mmio->private;
 +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 +	int mode = ACCESS_READ_VALUE;
 +
 +	/* Storing a value with LPIs already enabled is undefined */
 +	mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
 +	vgic_handle_base_register(vcpu, mmio, offset,
 +				  &dist->pendbaser[rdvcpu->vcpu_id], mode);
 +
 +	return false;
 +}
 +
  #define SGI_base(x) ((x) + SZ_64K)
  
  static const struct vgic_io_range vgic_redist_ranges[] = {
 @@ -679,6 +711,18 @@ static const struct vgic_io_range vgic_redist_ranges[] = {
  		.handle_mmio	= handle_mmio_raz_wi,
  	},
  	{
 +		.base		= GICR_PENDBASER,
 +		.len		= 0x08,
 +		.bits_per_irq	= 0,
 +		.handle_mmio	= handle_mmio_pendbaser_redist,
 +	},
 +	{
 +		.base		= GICR_PROPBASER,
 +		.len		= 0x08,
 +		.bits_per_irq	= 0,
 +		.handle_mmio	= handle_mmio_propbaser_redist,
 +	},
 +	{
  		.base		= GICR_IDREGS,
  		.len		= 0x30,
  		.bits_per_irq	= 0,
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 15e447f..49ee92b 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -446,6 +446,41 @@ void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
  	}
  }
  
 +/* handle a 64-bit register access */
 +void vgic_handle_base_register(struct kvm_vcpu *vcpu,
 +			       struct kvm_exit_mmio *mmio,
 +			       phys_addr_t offset, u64 *basereg,
 +			       int mode)
 +{
 Why do we have vcpu in the prototype? I don't see it used. Also, if it were,
 couldn't we fetch it from mmio->private?
 
 Why not rename this to something like vgic_reg64_access, in line with the
 32-bit vgic_reg_access above? vgic_handle_* is usually the name of a
 region handler returning bool.

Makes sense, I both renamed the function and removed the vcpu parameter.
I need to check whether we need the vcpu to do some endianness checks in
the future, though. Using mmio->private would be a hack, then.

Cheers,
Andre.
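
The basic idea of such a 64-bit register helper, serving 32-bit accesses to either half, can be sketched in user space as follows. This is not the kernel function: reg64_read32() and reg64_write32() are made-up names, the offset is assumed to be relative to the start of the register, and only aligned 32-bit accesses with a little-endian register layout are covered:

```c
#include <stdint.h>

/* Read the 32-bit half of a 64-bit register selected by the byte offset. */
static uint32_t reg64_read32(uint64_t reg, unsigned int offset)
{
	return (offset & 4) ? (uint32_t)(reg >> 32) : (uint32_t)reg;
}

/* Update only the 32-bit half selected by the byte offset. */
static void reg64_write32(uint64_t *reg, unsigned int offset, uint32_t val)
{
	if (offset & 4)
		*reg = (*reg & UINT64_C(0x00000000ffffffff)) |
		       ((uint64_t)val << 32);
	else
		*reg = (*reg & UINT64_C(0xffffffff00000000)) | val;
}
```

A guest writing PROPBASER with two 32-bit stores then ends up with the same stored value as a single 64-bit store would give.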

 
 +	u32 reg;
 +	u64 breg;
 +
 +	switch (offset & ~3) {
 +	case 0x00:
 +		breg = *basereg;
 +		reg = lower_32_bits(breg);
 +		vgic_reg_access(mmio, &reg, offset & 3, mode

Re: [PATCH v2 14/15] KVM: arm64: implement MSI injection in ITS emulation

2015-08-02 Thread Andre Przywara
On 31/07/15 14:22, Eric Auger wrote:

Salut Eric,

 On 07/10/2015 04:21 PM, Andre Przywara wrote:
 When userland wants to inject a MSI into the guest, we have to use
 our data structures to find the LPI number and the VCPU to receive
 the interrupt.
 Use the wrapper functions to iterate the linked lists and find the
 proper Interrupt Translation Table Entry. Then set the pending bit
 in this ITTE to be later picked up by the LR handling code. Kick
 the VCPU which is meant to handle this interrupt.
 We provide a VGIC emulation model specific routine for the actual
 MSI injection. The wrapper functions return an error for models not
 (yet) implementing MSIs (like the GICv2 emulation).
 We also provide the handler for the ITS INT command, which allows a
 guest to trigger an MSI via the ITS command queue.

 Signed-off-by: Andre Przywara andre.przyw...@arm.com
 ---
  include/kvm/arm_vgic.h  |  1 +
  virt/kvm/arm/its-emul.c | 65 
 +
  virt/kvm/arm/its-emul.h |  2 ++
  virt/kvm/arm/vgic-v3-emul.c |  1 +
  4 files changed, 69 insertions(+)

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 323c33a..9e1abf9 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -149,6 +149,7 @@ struct vgic_vm_ops {
  	int	(*map_resources)(struct kvm *, const struct vgic_params *);
  	bool	(*queue_lpis)(struct kvm_vcpu *);
  	void	(*unqueue_lpi)(struct kvm_vcpu *, int irq);
 +	int	(*inject_msi)(struct kvm *, struct kvm_msi *);
  };
  
  struct vgic_io_device {
 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
 index 89534c6..a1c12bb 100644
 --- a/virt/kvm/arm/its-emul.c
 +++ b/virt/kvm/arm/its-emul.c
 @@ -323,6 +323,55 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
  }
  
  /*
 + * Translates an incoming MSI request into the redistributor (=VCPU) and
 + * the associated LPI number. Sets the LPI pending bit and also marks the
 + * VCPU as having a pending interrupt.
 + */
 +int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
 +{
 +	struct vgic_dist *dist = &kvm->arch.vgic;
 +	struct vgic_its *its = &dist->its;
 +	struct its_itte *itte;
 +	int cpuid;
 +	bool inject = false;
 +	int ret = 0;
 +
 +	if (!vgic_has_its(kvm))
 +		return -ENODEV;
 +
 +	if (!(msi->flags & KVM_MSI_VALID_DEVID))
 +		return -EINVAL;
 I am currently reworking the GSI routing series according to the latest
 comments (flag usage on the user side and removal of the EXTENDED_MSI type
 on the kernel side, as you suggested). Given the data path,
 
 kvm_send_userspace_msi (kvm_msi*)
 |_ kvm_set_msi (kvm_kernel_irq_routing_entry *)
   |_ kvm->arch.vgic.vm_ops.inject_msi (kvm_msi *)
 
 the above check is useless I think, since in kvm_set_msi I need to
 populate a kvm_msi struct from a kernel routing entry struct. The kernel
 routing entry struct has no info about the validity of devid, so I
 systematically set the flag in kvm_msi.
 
 I am still dubious about not storing the KVM_MSI_VALID_DEVID info
 somewhere in the kernel routing entry struct.

When I reworked our code to only use a flag and not a separate routing
type I ended up with the flag only guarding assignments, which wouldn't
hurt if done unconditionally (since they are all u32's). So the whole
usage of the flag is somewhat in jeopardy now.
Either the eventual MSI consumer requires a DevID (ITS emulation, which
will not work without it) or the consumer does not care at all and can
totally ignore it (GICv2m). So I think we can always pass on the DevID
field and let the final function decide whether to use it or not. But
somehow this doesn't sound right to me, so maybe I am missing something
here?

Cheers,
Andre.




Re: [PATCH v2 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping

2015-08-02 Thread Andre Przywara
On 31/07/15 13:59, Eric Auger wrote:
 Hi Andre,
 On 07/11/2015 01:17 AM, Andre Przywara wrote:
 On 09/07/15 09:22, Eric Auger wrote:
 If the ITS modality is not available, let's simply support MSI
 injection by transforming the MSI.data into an SPI ID.

 This makes it possible to use the KVM_SIGNAL_MSI ioctl on arm too.

 Signed-off-by: Eric Auger eric.au...@linaro.org

 ---

 v1 -> v2:
 - introduce vgic_v2m_inject_msi in vgic-v2-emul.c following Andre's
   advice
 ---
  arch/arm/kvm/Kconfig|  1 +
  virt/kvm/arm/vgic-v2-emul.c | 12 
  2 files changed, 13 insertions(+)

 diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
 index 151e710..0f58baf 100644
 --- a/arch/arm/kvm/Kconfig
 +++ b/arch/arm/kvm/Kconfig
 @@ -31,6 +31,7 @@ config KVM
 select KVM_VFIO
 select HAVE_KVM_EVENTFD
 select HAVE_KVM_IRQFD
 +   select HAVE_KVM_MSI

 I wonder if this requires some more code to only advertise
 KVM_CAP_SIGNAL_MSI if userland actually sets up a GICv2M?
 Otherwise userland could get the idea of being able to inject MSIs
 without the guest actually being prepared for that (because the GICv2M
 driver did not initialize).
 Not sure I get what you mean here. By directly transforming the
 user-provided MSI msg into an SPI ID, do we really have to care about GICv2M?

By user provided message you mean from user space? So this is an
emulated device, which the guest programs with an MSI payload and a
doorbell address at least? So how would a guest know these things
without having a MSI capable interrupt controller?

Or are we talking about different things here?

Cheers,
Andre.

 
 Best Regards
 
 Eric

 Cheers,
 Andre.

 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING
 	depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
 diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c
 index 1390797..43013cc 100644
 --- a/virt/kvm/arm/vgic-v2-emul.c
 +++ b/virt/kvm/arm/vgic-v2-emul.c
 @@ -478,6 +478,17 @@ static bool vgic_v2_queue_sgi(struct kvm_vcpu *vcpu, int irq)
  }
  
  /**
 + * Emulates GICv2M MSI injection by injecting the SPI ID matching
 + * the msi data
 + * @kvm: pointer to the kvm struct
 + * @msi: the msi struct handle
 + */
 +static int vgic_v2m_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
 +{
 +	return kvm_vgic_inject_irq(kvm, 0, msi->data, 1);
 +}
 +
 +/**
   * kvm_vgic_map_resources - Configure global VGIC state before running any VCPUs
   * @kvm: pointer to the kvm struct
   *
 @@ -566,6 +577,7 @@ void vgic_v2_init_emulation(struct kvm *kvm)
 	dist->vm_ops.add_sgi_source = vgic_v2_add_sgi_source;
 	dist->vm_ops.init_model = vgic_v2_init_model;
 	dist->vm_ops.map_resources = vgic_v2_map_resources;
 +	dist->vm_ops.inject_msi = vgic_v2m_inject_msi;
  
 	kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS;
  }


 



[PATCH 06/14] MIPS: use pseek() in ELF kernel image loading

2015-07-30 Thread Andre Przywara
Use the newly introduced pseek() function when skipping to the start
offset in the ELF file.
The layout of an ELF file should satisfy the constraints of pseek, so
that we should be able to use a pipe file descriptor as well.

Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 mips/kvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mips/kvm.c b/mips/kvm.c
index c1c596c..4d08b20 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -328,8 +328,8 @@ static bool load_elf_binary(struct kvm *kvm, int fd_kernel)
 
 	kvm->arch.entry_point = ei.entry_point;
 
-	if (lseek(fd_kernel, ei.offset, SEEK_SET) < 0)
-		die_perror("lseek");
+	if (!pseek(fd_kernel, ei.offset - sizeof(union ElfHeaders)))
+		die_perror("seek");
 
 	p = guest_flat_to_host(kvm, ei.load_addr);
 
-- 
2.3.5
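
For readers without the earlier patch of this series at hand: pseek() itself is not shown here. A forward seek that also works on pipes could look roughly like the sketch below. It is an illustration under assumptions, not the series' implementation: skip_forward() is a made-up name, the offset is taken relative to the current position, and on non-seekable descriptors the bytes are simply read and discarded:

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Advance fd by 'offset' bytes; falls back to reading on pipes/sockets. */
static bool skip_forward(int fd, off_t offset)
{
	char buf[4096];

	/* Regular files: a relative lseek() does the job. */
	if (lseek(fd, offset, SEEK_CUR) != (off_t)-1)
		return true;
	if (errno != ESPIPE || offset < 0)
		return false;

	/* Not seekable: consume and discard the bytes instead. */
	while (offset > 0) {
		size_t chunk = offset < (off_t)sizeof(buf) ?
			       (size_t)offset : sizeof(buf);
		ssize_t nr = read(fd, buf, chunk);

		if (nr < 0) {
			if (errno == EINTR)
				continue;
			return false;
		}
		if (nr == 0)
			return false;	/* EOF before reaching the target */
		offset -= nr;
	}
	return true;
}
```

This only works because an ELF loader moves strictly forward through the file, which is the constraint the commit message refers to.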



[PATCH 12/14] powerpc: use read_file() in kernel and initrd loading

2015-07-30 Thread Andre Przywara
Replace the unsafe read-loops in the powerpc kernel image loading
function with our new and safe read_file() wrapper.
This should fix random failures in kernel image loading, especially
from pipes and sockets.

Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 powerpc/kvm.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/powerpc/kvm.c b/powerpc/kvm.c
index 87d0f9e..9888bf1 100644
--- a/powerpc/kvm.c
+++ b/powerpc/kvm.c
@@ -162,16 +162,19 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 {
 	void *p;
 	void *k_start;
-	void *i_start;
-	int nr;
+	ssize_t filesize;
 
 	p = k_start = guest_flat_to_host(kvm, KERNEL_LOAD_ADDR);
 
-	while ((nr = read(fd_kernel, p, 65536)) > 0)
-		p += nr;
-
-	pr_info("Loaded kernel to 0x%x (%ld bytes)", KERNEL_LOAD_ADDR, (long)(p-k_start));
+	filesize = read_file(fd_kernel, p, INITRD_LOAD_ADDR - KERNEL_LOAD_ADDR);
+	if (filesize < 0) {
+		if (errno == ENOMEM)
+			die("Kernel overlaps initrd!");
 
+		die_perror("kernel read");
+	}
+	pr_info("Loaded kernel to 0x%x (%ld bytes)", KERNEL_LOAD_ADDR,
+		filesize);
 	if (fd_initrd != -1) {
 		if (lseek(fd_initrd, 0, SEEK_SET) < 0)
 			die_perror("lseek");
@@ -180,19 +183,20 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 			die("Kernel overlaps initrd!");
 
 		/* Round up kernel size to 8byte alignment, and load initrd right after. */
-		i_start = p = guest_flat_to_host(kvm, INITRD_LOAD_ADDR);
-
-		while (((nr = read(fd_initrd, p, 65536)) > 0) &&
-		       p < (kvm->ram_start + kvm->ram_size))
-			p += nr;
-
-		if (p >= (kvm->ram_start + kvm->ram_size))
-			die("initrd too big to contain in guest RAM.\n");
+		p = guest_flat_to_host(kvm, INITRD_LOAD_ADDR);
+
+		filesize = read_file(fd_initrd, p,
+				     (kvm->ram_start + kvm->ram_size) - p);
+		if (filesize < 0) {
+			if (errno == ENOMEM)
+				die("initrd too big to contain in guest RAM.\n");
+			die_perror("initrd read");
+		}
 
 		pr_info("Loaded initrd to 0x%x (%ld bytes)",
-			INITRD_LOAD_ADDR, (long)(p-i_start));
+			INITRD_LOAD_ADDR, filesize);
 		kvm->arch.initrd_gra = INITRD_LOAD_ADDR;
-		kvm->arch.initrd_size = p-i_start;
+		kvm->arch.initrd_size = filesize;
 	} else {
 		kvm->arch.initrd_size = 0;
 	}
-- 
2.3.5
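
The callers above rely on read_file() returning the number of bytes read, or a negative value with errno set to ENOMEM when the file does not fit. The wrapper itself comes from an earlier patch of the series and is not shown here. A user-space sketch with that contract might look like this (read_whole_file() is a made-up name, and the one-byte EOF probe is an assumption about how "too big" can be detected):

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read from fd until EOF into buf, which holds at most max_size bytes.
 * Returns the number of bytes read, or -1 with errno set. errno is
 * ENOMEM if there is more data than fits into the buffer.
 */
static ssize_t read_whole_file(int fd, char *buf, size_t max_size)
{
	size_t total = 0;
	char probe;
	ssize_t nr;

	while (total < max_size) {
		nr = read(fd, buf + total, max_size - total);
		if (nr < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (nr == 0)
			return (ssize_t)total;	/* EOF: the file fits */
		total += (size_t)nr;
	}

	/* Buffer is full: probe one more byte to see whether EOF follows. */
	do {
		nr = read(fd, &probe, 1);
	} while (nr < 0 && errno == EINTR);

	if (nr < 0)
		return -1;
	if (nr == 0)
		return (ssize_t)total;

	errno = ENOMEM;
	return -1;
}
```

Unlike the old open-coded loops, a short read from a pipe or socket just leads to another iteration instead of ending the load early.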



[PATCH 11/14] arm/arm64: use read_file() in kernel and initrd loading

2015-07-30 Thread Andre Przywara
Use the new read_file() wrapper in our arm/arm64 kernel image loading
function instead of the private implementation.

Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 arm/kvm.c | 42 --
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/arm/kvm.c b/arm/kvm.c
index 6e3f80e..277d9e6 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -87,19 +87,6 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 		die("Failed to create virtual GIC");
 }
 
-static int read_image(int fd, void **pos, void *limit)
-{
-	int count;
-
-	while (((count = xread(fd, *pos, SZ_64K)) > 0) && *pos <= limit)
-		*pos += count;
-
-	if (pos < 0)
-		die_perror("xread");
-
-	return *pos < limit ? 0 : -ENOMEM;
-}
-
 #define FDT_ALIGN	SZ_2M
 #define INITRD_ALIGN	4
 bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
@@ -107,6 +94,7 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 {
 	void *pos, *kernel_end, *limit;
 	unsigned long guest_addr;
+	ssize_t file_size;
 
 	/*
 	 * Linux requires the initrd and dtb to be mapped inside lowmem,
@@ -116,13 +104,17 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 
 	pos = kvm->ram_start + ARM_KERN_OFFSET(kvm);
 	kvm->arch.kern_guest_start = host_to_guest_flat(kvm, pos);
-	if (read_image(fd_kernel, &pos, limit) == -ENOMEM)
-		die("kernel image too big to contain in guest memory.");
+	file_size = read_file(fd_kernel, pos, limit - pos);
+	if (file_size < 0) {
+		if (errno == ENOMEM)
+			die("kernel image too big to contain in guest memory.");
+
+		die_perror("kernel read");
+	}
+	kernel_end = pos + file_size;
 
-	kernel_end = pos;
-	pr_info("Loaded kernel to 0x%llx (%llu bytes)",
-		kvm->arch.kern_guest_start,
-		host_to_guest_flat(kvm, pos) - kvm->arch.kern_guest_start);
+	pr_info("Loaded kernel to 0x%llx (%zd bytes)",
+		kvm->arch.kern_guest_start, file_size);
 
 	/*
 	 * Now load backwards from the end of memory so the kernel
@@ -160,11 +152,17 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 			die("initrd overlaps with kernel image.");
 
 		initrd_start = guest_addr;
-		if (read_image(fd_initrd, &pos, limit) == -ENOMEM)
-			die("initrd too big to contain in guest memory.");
+
+		file_size = read_file(fd_initrd, pos, limit - pos);
+		if (file_size == -1) {
+			if (errno == ENOMEM)
+				die("initrd too big to contain in guest memory.");
+
+			die_perror("initrd read");
+		}
 
 		kvm->arch.initrd_guest_start = initrd_start;
-		kvm->arch.initrd_size = host_to_guest_flat(kvm, pos) - initrd_start;
+		kvm->arch.initrd_size = file_size;
 		pr_info("Loaded initrd to 0x%llx (%llu bytes)",
 			kvm->arch.initrd_guest_start,
 			kvm->arch.initrd_size);
-- 
2.3.5



[PATCH 13/14] MIPS: use read wrappers in kernel loading

2015-07-30 Thread Andre Przywara
Replace the unsafe read-loops used in the MIPS kernel image loading
with our safe read_file() and read_in_full() wrappers.
This should fix random failures in kernel image loading, especially
from pipes and sockets.

Signed-off-by: Andre Przywara andre.przyw...@arm.com
---
 mips/kvm.c | 35 ---
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/mips/kvm.c b/mips/kvm.c
index d970ee0..2f0d61b 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -169,21 +169,27 @@ static bool load_flat_binary(struct kvm *kvm, int fd_kernel, const void *buf,
 {
 	void *p;
 	void *k_start;
-	int nr;
+	ssize_t kernel_size;
 
 	p = k_start = guest_flat_to_host(kvm, KERNEL_LOAD_ADDR);
 
 	memcpy(p, buf, buflen);
 	p += buflen;
 
-	while ((nr = read(fd_kernel, p, 65536)) > 0)
-		p += nr;
+	kernel_size = read_file(fd_kernel, p,
+				kvm->cfg.ram_size - KERNEL_LOAD_ADDR);
+	if (kernel_size == -1) {
+		if (errno == ENOMEM)
+			die("kernel too big for guest memory");
+		else
+			die_perror("kernel read");
+	}
 
 	kvm->arch.is64bit = true;
 	kvm->arch.entry_point = 0x8100ull;
 
-	pr_info("Loaded kernel to 0x%x (%ld bytes)", KERNEL_LOAD_ADDR,
-		(long int)(p - k_start));
+	pr_info("Loaded kernel to 0x%x (%zd bytes)", KERNEL_LOAD_ADDR,
+		kernel_size);
 
 	return true;
 }
@@ -199,7 +205,6 @@ static bool kvm__arch_get_elf_64_info(Elf64_Ehdr *ehdr, int fd_kernel,
 				      struct kvm__arch_elf_info *ei)
 {
 	int i;
-	size_t nr;
 	Elf64_Phdr phdr;
 
 	if (ehdr->e_phentsize != sizeof(phdr)) {
@@ -214,8 +219,7 @@ static bool kvm__arch_get_elf_64_info(Elf64_Ehdr *ehdr, int fd_kernel,
 
 	phdr.p_type = PT_NULL;
 	for (i = 0; i < ehdr->e_phnum; i++) {
-		nr = read(fd_kernel, &phdr, sizeof(phdr));
-		if (nr != sizeof(phdr)) {
+		if (read_in_full(fd_kernel, &phdr, sizeof(phdr)) != sizeof(phdr)) {
 			pr_info("Couldn't read %d bytes for ELF PHDR.", (int)sizeof(phdr));
 			return false;
 		}
@@ -245,7 +249,6 @@ static bool kvm__arch_get_elf_32_info(Elf32_Ehdr *ehdr, int fd_kernel,
 				      struct kvm__arch_elf_info *ei)
 {
 	int i;
-	size_t nr;
 	Elf32_Phdr phdr;
 
 	if (ehdr->e_phentsize != sizeof(phdr)) {
@@ -260,8 +263,7 @@ static bool kvm__arch_get_elf_32_info(Elf32_Ehdr *ehdr, int fd_kernel,
 
 	phdr.p_type = PT_NULL;
 	for (i = 0; i < ehdr->e_phnum; i++) {
-		nr = read(fd_kernel, &phdr, sizeof(phdr));
-		if (nr != sizeof(phdr)) {
+		if (read_in_full(fd_kernel, &phdr, sizeof(phdr)) != sizeof(phdr)) {
 			pr_info("Couldn't read %d bytes for ELF PHDR.", (int)sizeof(phdr));
 			return false;
 		}
@@ -292,7 +294,6 @@ union ElfHeaders {
 static bool load_elf_binary(struct kvm *kvm, int fd_kernel,
 			    union ElfHeaders *eh)
 {
-	size_t nr;
 	char *p;
 	struct kvm__arch_elf_info ei;
 
@@ -331,13 +332,9 @@ static bool load_elf_binary(struct kvm *kvm, int fd_kernel,
 	pr_info("ELF Loading 0x%lx bytes from 0x%llx to 0x%llx",
 		(unsigned long)ei.len, (unsigned long long)ei.offset,
 		(unsigned long long)ei.load_addr);
-	do {
-		nr = read(fd_kernel, p, ei.len);
-		if (nr < 0)
-			die_perror("read");
-		p += nr;
-		ei.len -= nr;
-	} while (ei.len);
+
+	if (read_in_full(fd_kernel, p, ei.len) != (ssize_t)ei.len)
+		die_perror("read");
 
 	return true;
 }
-- 
2.3.5



[PATCH 04/14] x86: support loading flat binary kernel images from a pipe

2015-07-30 Thread Andre Przywara
With the latest patches we allow loading bzImage kernels from a pipe,
but we still fail on flat binary images.
Rework the loading routines to take memory buffers for the beginning
of the file, so we don't need to rewind the image.
This allows falling back to flat binary loading, without a seek, if
bzImage loading fails, so kvmtool will happily accept any file
descriptor (including pipes) for the image file.
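
A minimal sketch of the buffer-first idea described above, not the code from this patch: the header is read from the descriptor exactly once, the format check looks only at that buffer, and the fallback path copies the buffered bytes instead of rewinding. All names (sketch_read_n, sketch_has_magic, sketch_load_image, SKETCH_HDR_LEN) and the 8-byte header size are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_HDR_LEN 8	/* hypothetical header size, not kvmtool's */

/* Read up to len bytes, tolerating short reads; returns bytes read or -1. */
static ssize_t sketch_read_n(int fd, void *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t nr = read(fd, (char *)buf + done, len - done);

		if (nr < 0)
			return -1;
		if (nr == 0)
			break;		/* EOF before len bytes arrived */
		done += nr;
	}
	return done;
}

/* Hypothetical format check: looks only at the buffer, never at the fd. */
static bool sketch_has_magic(const void *hdr, size_t len)
{
	return len >= 4 && memcmp(hdr, "HdrS", 4) == 0;
}

/*
 * Load an image of unknown type from a possibly non-seekable fd.
 * The header is consumed once; both paths reuse those bytes via
 * memcpy() instead of rewinding with lseek().
 * Returns the number of bytes placed in dst, or -1 on error.
 */
static ssize_t sketch_load_image(int fd, char *dst, size_t cap, bool *magic)
{
	char hdr[SKETCH_HDR_LEN];
	ssize_t hdr_len, rest;

	assert(dst && magic);

	hdr_len = sketch_read_n(fd, hdr, sizeof(hdr));
	if (hdr_len < 0 || (size_t)hdr_len > cap)
		return -1;

	*magic = sketch_has_magic(hdr, hdr_len);

	/* The already-consumed bytes come from the buffer, not the fd. */
	memcpy(dst, hdr, hdr_len);
	rest = sketch_read_n(fd, dst + hdr_len, cap - hdr_len);
	if (rest < 0)
		return -1;
	return hdr_len + rest;
}
```

Because nothing in this sketch calls lseek(), it behaves the same for a regular file, a pipe or a socket.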

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 x86/kvm.c | 48 +---
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/x86/kvm.c b/x86/kvm.c
index 8fe5585..9817953 100644
--- a/x86/kvm.c
+++ b/x86/kvm.c
@@ -206,16 +206,16 @@ static inline void *guest_real_to_host(struct kvm *kvm, 
u16 selector, u16 offset
return guest_flat_to_host(kvm, flat);
 }
 
-static bool load_flat_binary(struct kvm *kvm, int fd_kernel)
+static bool load_flat_binary(struct kvm *kvm, int fd_kernel, void *buf, int len)
 {
 	void *p;
 	int nr;
 
-	if (lseek(fd_kernel, 0, SEEK_SET) < 0)
-		die_perror("lseek");
-
 	p = guest_real_to_host(kvm, BOOT_LOADER_SELECTOR, BOOT_LOADER_IP);
 
+	memcpy(p, buf, len);
+	p += len;
+
 	while ((nr = read(fd_kernel, p, 65536)) > 0)
 		p += nr;
 
@@ -229,11 +229,10 @@ static bool load_flat_binary(struct kvm *kvm, int fd_kernel)
 static const char *BZIMAGE_MAGIC = "HdrS";
 
 static bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
-const char *kernel_cmdline)
+const char *kernel_cmdline, struct boot_params *boot)
 {
struct boot_params *kern_boot;
unsigned long setup_sects;
-   struct boot_params boot;
size_t cmdline_size;
ssize_t setup_size;
void *p;
@@ -245,26 +244,23 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, 
int fd_initrd,
 * memory layout.
 */
 
-	if (read(fd_kernel, &boot, sizeof(boot)) != sizeof(boot))
-		return false;
-
-	if (memcmp(&boot.hdr.header, BZIMAGE_MAGIC, strlen(BZIMAGE_MAGIC)))
+	if (memcmp(&boot->hdr.header, BZIMAGE_MAGIC, strlen(BZIMAGE_MAGIC)))
 		return false;
 
-	if (boot.hdr.version < BOOT_PROTOCOL_REQUIRED)
+	if (boot->hdr.version < BOOT_PROTOCOL_REQUIRED)
 		die("Too old kernel");
 
-	if (!boot.hdr.setup_sects)
-		boot.hdr.setup_sects = BZ_DEFAULT_SETUP_SECTS;
-	setup_sects = boot.hdr.setup_sects + 1;
+	if (!boot->hdr.setup_sects)
+		boot->hdr.setup_sects = BZ_DEFAULT_SETUP_SECTS;
+	setup_sects = boot->hdr.setup_sects + 1;
 
 	setup_size = setup_sects << 9;
 	p = guest_real_to_host(kvm, BOOT_LOADER_SELECTOR, BOOT_LOADER_IP);
 
 	/* copy setup.bin to mem */
-	memcpy(p, &boot, sizeof(boot));
-	p += sizeof(boot);
-	setup_size -= sizeof(boot);
+	memcpy(p, boot, sizeof(struct boot_params));
+	p += sizeof(struct boot_params);
+	setup_size -= sizeof(struct boot_params);
 	if (read(fd_kernel, p, setup_size) != setup_size)
 		die_perror("read");
 
@@ -277,10 +273,10 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, 
int fd_initrd,
 	p = guest_flat_to_host(kvm, BOOT_CMDLINE_OFFSET);
 	if (kernel_cmdline) {
 		cmdline_size = strlen(kernel_cmdline) + 1;
-		if (cmdline_size > boot.hdr.cmdline_size)
-			cmdline_size = boot.hdr.cmdline_size;
+		if (cmdline_size > boot->hdr.cmdline_size)
+			cmdline_size = boot->hdr.cmdline_size;
 
-		memset(p, 0, boot.hdr.cmdline_size);
+		memset(p, 0, boot->hdr.cmdline_size);
 		memcpy(p, kernel_cmdline, cmdline_size - 1);
 	}
 
@@ -313,7 +309,7 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, 
int fd_initrd,
 		if (fstat(fd_initrd, &initrd_stat))
 			die_perror("fstat");
 
-		addr = boot.hdr.initrd_addr_max & ~0xf;
+		addr = boot->hdr.initrd_addr_max & ~0xf;
 		for (;;) {
 			if (addr < BZ_KERNEL_START)
 				die("Not enough memory for initrd");
@@ -345,15 +341,21 @@ static bool load_bzimage(struct kvm *kvm, int fd_kernel, 
int fd_initrd,
 bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 const char *kernel_cmdline)
 {
-	if (load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline))
+	struct boot_params boot;
+
+	if (read(fd_kernel, &boot, sizeof(boot)) != sizeof(boot))
+		return false;
+
+	if (load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline, &boot))
 		return true;
+
 	pr_warning("Kernel image is not a bzImage.");
 	pr_warning("Trying to load it as a flat binary (no cmdline support)");
 
if (fd_initrd != -1

[PATCH 01/14] Refactor kernel image loading

2015-07-30 Thread Andre Przywara
Let's face it: Kernel loading is quite architecture specific. Don't
claim otherwise and move the loading routines into each
architecture's responsibility.
This introduces kvm__arch_load_kernel_image(), which each architecture can
implement accordingly.
Provide bzImage loading for x86 and ELF loading for MIPS as special
cases for those architectures and rename the existing flat binary
loader functions for the other architectures to the new name.

Signed-off-by: Andre Przywara <andre.przyw...@arm.com>
---
 arm/fdt.c |  4 ++--
 include/kvm/kvm.h |  5 ++---
 kvm.c | 42 --
 mips/kvm.c| 23 +++
 powerpc/kvm.c |  3 ++-
 x86/kvm.c | 27 +--
 6 files changed, 46 insertions(+), 58 deletions(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index 3657108..ec7453f 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -239,8 +239,8 @@ static int read_image(int fd, void **pos, void *limit)
 
 #define FDT_ALIGN  SZ_2M
 #define INITRD_ALIGN   4
-int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd,
-const char *kernel_cmdline)
+bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
+const char *kernel_cmdline)
 {
void *pos, *kernel_end, *limit;
unsigned long guest_addr;
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 37155db..055a7a2 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -111,9 +111,8 @@ void kvm__arch_read_term(struct kvm *kvm);
 void *guest_flat_to_host(struct kvm *kvm, u64 offset);
 u64 host_to_guest_flat(struct kvm *kvm, void *ptr);
 
-int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
-int load_elf_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
-bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline);
+bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
+const char *kernel_cmdline);
 
 /*
  * Debugging
diff --git a/kvm.c b/kvm.c
index 10ed230..ca7dfee 100644
--- a/kvm.c
+++ b/kvm.c
@@ -341,18 +341,6 @@ static bool initrd_check(int fd)
!memcmp(id, CPIO_MAGIC, 4);
 }
 
-int __attribute__((__weak__)) load_elf_binary(struct kvm *kvm, int fd_kernel,
-   int fd_initrd, const char *kernel_cmdline)
-{
-   return false;
-}
-
-bool __attribute__((__weak__)) load_bzimage(struct kvm *kvm, int fd_kernel,
-   int fd_initrd, const char *kernel_cmdline)
-{
-   return false;
-}
-
 bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
const char *initrd_filename, const char *kernel_cmdline)
 {
@@ -372,40 +360,18 @@ bool kvm__load_kernel(struct kvm *kvm, const char *kernel_filename,
 			die("%s is not an initrd", initrd_filename);
 	}
 
-#ifdef CONFIG_X86
-	ret = load_bzimage(kvm, fd_kernel, fd_initrd, kernel_cmdline);
-
-	if (ret)
-		goto found_kernel;
-
-	pr_warning("%s is not a bzImage. Trying to load it as a flat binary...", kernel_filename);
-#endif
-
-	ret = load_elf_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
-
-	if (ret)
-		goto found_kernel;
-
-	ret = load_flat_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
-
-	if (ret)
-		goto found_kernel;
+	ret = kvm__arch_load_kernel_image(kvm, fd_kernel, fd_initrd,
+					  kernel_cmdline);
 
 	if (initrd_filename)
 		close(fd_initrd);
 	close(fd_kernel);
 
-	die("%s is not a valid bzImage or flat binary", kernel_filename);
-
-found_kernel:
-	if (initrd_filename)
-		close(fd_initrd);
-	close(fd_kernel);
-
+	if (!ret)
+		die("%s is not a valid kernel image", kernel_filename);
 	return ret;
 }
 
-
 void kvm__dump_mem(struct kvm *kvm, unsigned long addr, unsigned long size, 
int debug_fd)
 {
unsigned char *p;
diff --git a/mips/kvm.c b/mips/kvm.c
index 1925f38..c1c596c 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -163,7 +163,8 @@ static void kvm__mips_install_cmdline(struct kvm *kvm)
 
 /* Load at the 1M point. */
 #define KERNEL_LOAD_ADDR 0x100
-int load_flat_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
+
+static bool load_flat_binary(struct kvm *kvm, int fd_kernel)
 {
void *p;
void *k_start;
@@ -281,7 +282,7 @@ static bool kvm__arch_get_elf_32_info(Elf32_Ehdr *ehdr, int 
fd_kernel,
return true;
 }
 
-int load_elf_binary(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *kernel_cmdline)
+static bool load_elf_binary(struct kvm *kvm, int fd_kernel)
 {
union {
Elf64_Ehdr ehdr;
@@ -342,11 +343,25 @@ int load_elf_binary(struct kvm *kvm, int fd_kernel

[PATCH 00/14] kvmtool: Refactor kernel image loading to allow pipes

2015-07-30 Thread Andre Przywara
Currently kvmtool uses rewinds (lseeks to position 0) on kernel image
files. This prevents non-regular files (for instance pipes in the -k
parameter) from being used as a kernel image file.

This series reworks the kernel loading to avoid any seeks and allows
piping in kernel images on the command line. This basically gives us
decompression and direct kernel downloading for free in a neat UNIX
way:
$ lkvm run -k <(zcat zImage.gz) ...
$ lkvm run -k <(dd if=uImage bs=64 skip=1) ...
$ lkvm run -k <(wget -O - http://foo.com/guest.zImage) ...
$ lkvm run -k <(curl -s tftp://server/guest.zImage) ...

The first patch refactors the kernel image loading, which currently
tries to be architecture agnostic in a very intricate way. As the
actual implementations are very much architecture specific, make this
clear in the code by introducing separate functions for each arch.

Allowing pipes is quite easy for arm/arm64 and powerpc, since they
only do a (now pointless) rewind in the beginning, which we simply
drop in patch 2.
x86 requires some more love; patches 3 and 4 take care of that for
bzImage and flat binaries, respectively.
Since the MIPS ELF loader contains an actual (non-rewinding) seek,
patch 5 adds a pipe-aware wrapper around lseek to still do some
(forward) seeking despite the file descriptor being non-seekable.
Patches 6, 7 and 8 then use this to eventually rework the kernel image
loading for MIPS to be pipe-safe, too.
Patch 9 moves the ARM kernel loading from arm/fdt.c into arm/kvm.c,
to be in line with the other architectures.
Now that we may read from pipes or sockets, simply using the read(2)
syscall breaks occasionally.
Patch 10 introduces a safe wrapper for reading whole files (inspired
by the ARM implementation), whereas patches 11-14 move the kernel
loading in each architecture over to using the safe read wrappers.
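
The two helpers this paragraph relies on can be sketched as follows. This is an illustration under stated assumptions, not the code from patches 5 and 10: sketch_read_full() keeps calling read(2) until the requested length or EOF is reached, and sketch_skip_to() emulates a forward seek on a non-seekable descriptor by reading and discarding bytes. Both names, their signatures and the caller-tracked offset argument are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read until len bytes have arrived or EOF is hit, retrying short and
 * interrupted reads. Returns the number of bytes read, or -1 on error.
 */
static ssize_t sketch_read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t left = len;

	while (left > 0) {
		ssize_t nr = read(fd, p, left);

		if (nr < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		if (nr == 0)
			break;			/* EOF */
		p += nr;
		left -= nr;
	}
	return len - left;
}

/*
 * Forward-only seek. Try lseek() first; if the fd is a pipe, FIFO or
 * socket (ESPIPE), consume and discard bytes instead. "cur" is the
 * offset tracked by the caller, since a pipe has no file position.
 * Returns 0 on success, -1 on error or on a backward request.
 */
static int sketch_skip_to(int fd, off_t cur, off_t target)
{
	char scratch[4096];

	if (target < cur)
		return -1;
	if (lseek(fd, target, SEEK_SET) != (off_t)-1)
		return 0;
	if (errno != ESPIPE)
		return -1;

	while (cur < target) {
		size_t chunk = sizeof(scratch);
		ssize_t nr;

		if ((off_t)chunk > target - cur)
			chunk = target - cur;
		nr = sketch_read_full(fd, scratch, chunk);
		if (nr <= 0)
			return -1;	/* error or EOF before target */
		cur += nr;
	}
	return 0;
}
```

A plain read(2) on a pipe may return fewer bytes than requested even when more data will follow, which is why comparing a single read's return value against the expected size is unreliable there.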

These patches apply on top of the latest kvmtool master branch.
So far I could test arm, arm64 and x86, with MIPS and PowerPC
being at least compile-tested.

Cheers,
Andre.

Andre Przywara (14):
  Refactor kernel image loading
  arm/powerpc: remove unneeded seeks in kernel loading
  x86: allow pipes for bzImage kernel images
  x86: support loading flat binary kernel images from a pipe
  kvmtool: introduce pseek
  MIPS: use pseek() in ELF kernel image loading
  MIPS: move ELF headers loading outside of load_elf_binary()
  MIPS: remove seeks from load_flat_binary()
  arm: move kernel loading into arm/kvm.c
  provide generic read_file() implementation
  arm/arm64: use read_file() in kernel and initrd loading
  powerpc: use read_file() in kernel and initrd loading
  MIPS: use read wrappers in kernel loading
  x86: use read wrappers in kernel loading

 arm/fdt.c|  99 +---
 arm/kvm.c|  87 
 include/kvm/kvm.h|   5 +--
 include/kvm/read-write.h |   4 ++
 kvm.c|  42 ++---
 mips/kvm.c   | 114 ++-
 powerpc/kvm.c|  42 -
 util/read-write.c|  61 +
 x86/kvm.c| 102 ++
 9 files changed, 297 insertions(+), 259 deletions(-)

-- 
2.3.5
