Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-12-23 Thread Radim Krčmář
2015-12-16 09:37+0800, Feng Wu:
> Use vector-hashing to deliver lowest-priority interrupts for
> VT-d posted-interrupts.
> 
> Signed-off-by: Feng Wu 
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm *kvm, 
> unsigned int host_irq,
>*/
>  
>   kvm_set_msi_irq(e, );
> - if (!kvm_intr_is_single_vcpu(kvm, , ))
> - continue;
> +
> + if (!kvm_intr_is_single_vcpu(kvm, , )) {
> + if (!kvm_vector_hashing_enabled() ||
> + irq.delivery_mode != APIC_DM_LOWEST)

Hm, wouldn't it actually be easier to extend kvm_intr_is_single_vcpu()
with vector hashing? :)

> + continue;
> +
> + vcpu = kvm_intr_vector_hashing_dest(kvm, );
> + if (!vcpu)
> + continue;
> + }
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-23 Thread Radim Krčmář
2015-12-16 09:37+0800, Feng Wu:
> Use vector-hashing to deliver lowest-priority interrupts, As an
> example, modern Intel CPUs in server platform use this method to
> handle lowest-priority interrupts.
> 
> Signed-off-by: Feng Wu 
> ---
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> @@ -78,13 +83,25 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
> kvm_lapic *src,
>   r = 0;
>   r += kvm_apic_set_irq(vcpu, irq, dest_map);
>   } else if (kvm_lapic_enabled(vcpu)) {
> - if (!lowest)
> - lowest = vcpu;
> - else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
> - lowest = vcpu;
> + if (!kvm_vector_hashing_enabled()) {
> + if (!lowest)
> + lowest = vcpu;
> + else if (kvm_apic_compare_prio(vcpu, lowest) < 
> 0)
> + lowest = vcpu;
> + } else {
> + __set_bit(vcpu->vcpu_id, dest_vcpu_bitmap);
> + dest_vcpus++;
> + }
>   }
>   }
>  
> + if (dest_vcpus != 0) {
> + idx = kvm_vector_2_index(irq->vector, dest_vcpus,
> +  dest_vcpu_bitmap, KVM_MAX_VCPUS);
> +
> + lowest = kvm_get_vcpu(kvm, idx - 1);

This is going to fail with sparse topologies (e.g. 3 cores per socket).
vcpu_id = initial APIC ID and kvm_get_vcpu() uses a compressed array
that has kvm->online_vcpus elements, so we could overflow.

The 'i' in kvm_for_each_vcpu() could be used for the bitmap.
(kvm_get_vcpu_by_id() instead of kvm_get_vcpu() is slightly worse.)

> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> @@ -678,6 +678,22 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct 
> kvm_lapic *source,
>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>   struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
>  {
> @@ -731,17 +747,38 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
> struct kvm_lapic *src,
> + if (!kvm_vector_hashing_enabled()) {
| [...]
> + } else {
> + int idx = 0;
> + unsigned int dest_vcpus = 0;

Now that we don't need to check for present/enabled LAPICs, I think it
would be better to solve this by assuming that all selected LAPICs are
enabled, so the n-th target is decided only based on vector and
destination.

> + for_each_set_bit(i, , 16) {
> + if (!dst[i] && 
> !kvm_lapic_enabled(dst[i]->vcpu)) {
> + __clear_bit(i, );
> + continue;
> + }
> + }

=> we could skip this loop.

> +
> + dest_vcpus = hweight16(bitmap);
> +
> + if (dest_vcpus != 0) {
> + idx = kvm_vector_2_index(irq->vector,
> + dest_vcpus, , 16);
> +
> + bitmap = 0;
> + __set_bit(idx-1, );

And set just this bit.

The drawback is that buggy software that included hardware disabled
APICs to lowest priority destinations could stop working ...
Do you think it's too risky?

> + }
>   }

(This is basically the same as converting the message to a fixed delivery
 to n-th bit beforehand, so it might be reasonable to to apply something
 similar to simplify the slow path as well.  Mixed flat/cluster/x2APIC
 mode makes me suspect that it won't be reasonable.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 3/3] add timeout support

2015-12-22 Thread Radim Krčmář
2015-12-21 13:45-0600, Andrew Jones:
> On Mon, Dec 21, 2015 at 06:04:20PM +0100, Radim Krčmář wrote:
>> 2015-12-17 14:10-0600, Andrew Jones:
>> > diff --git a/run_tests.sh b/run_tests.sh
>> > @@ -21,6 +21,7 @@ function run()
>> > +local timeout="${9:-$TIMEOUT}"
>> > diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
>> > @@ -97,8 +98,12 @@ if [ "\$QEMU" ]; then
>> > +if [ "$timeout" ]; then
>> > +  timeout_cmd='timeout --foreground $timeout'
>> 
>> Both would be nicer if they took the TIMEOUT variable as an override.
> 
> Everything already takes TIMEOUT as an override, i.e.
> 
> TIMEOUT=3 ./run_tests.sh
> 
> and
> 
> TIMEOUT=3 arm/run arm/test.flat
> 
> will both already set a timeout for any test that didn't have a timeout
> set in unittests.cfg, or wasn't run with run()/unittests.cfg.

Tests made with mkstandalone.sh ignore the TIMEOUT variable ...

>   Or, did
> you mean that you'd prefer TIMEOUT to override the timeout in
> unittests.cfg?

... and yes, I think that we could have a
- global timeout for all tests.  Something far longer than any tests
  should take (2 minutes?).  To automatically handle random hangs.

- per-test timeout in unittests.cfg.  When the test is known to timeout
  often and the usual time to fail is much shorter than the global
  default.  (Shouldn't be used much.)

- TIMEOUT variable.  It has to override the global timeout and I think
  that if we are ever going to use it, it's because we want something
  weird.  Like using `TIMEOUT=0 ./run_tests.sh` to disable all
  timeouts, prolonging/shortening timeouts because of a special
  configuration, ...
  Because we should improve our defaults otherwise.

  (I'd probably allow something as evil as `eval`ing the TIMEOUT, for
   unlikely stuff like TIMEOUT='$(( ${timeout:-10} / 2 ))')

>That does make some sense, in the case the one in the
> config is longer than desired, however it also makes sense the way I
> have it now when the one in the config is shorter than TIMEOUT (the
> fallback default). I think I like it this way better.

Ok, the difference was negligible to begin with.

>> We already don't do that for accel and the patch seems ok in other
>> regards,
> 
> Hmm, for accel I see a need for a patch allowing us to do
> 
> ACCEL=?? ./run_tests.sh

Btw. why do we have ACCEL when the project is *kvm*_unit_tests?

> as I already have for TIMEOUT. Also, for both I should add a
> mkstandalone patch allowing
> 
> TIMEOUT=? ACCEL=? make standalone

I'd also handle TIMEOUT/ACCEL in resulting standalone tests.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 1/3] run_tests.sh: reduce return code ambiguity

2015-12-22 Thread Radim Krčmář
2015-12-21 13:35-0600, Andrew Jones:
> On Mon, Dec 21, 2015 at 05:31:24PM +0100, Radim Krčmář wrote:
> > 2015-12-17 14:10-0600, Andrew Jones:
>>  > 128 = exited because of signal $? - 128
>>  * = unit-test failed
>> 
>> (Signal 0 is not used, so we could map 128 to mean "debug-exit probably
>>  wasn't called", but others might not understand our signal convention.
> 
> I think we want 128 to be the beginning of signal space, which goes all
> the way up to 255, in order to allow exit code masking to work.
> 
>>  Anyway, it'd be best for us to start at 200, for `case $? in 2??)` ...)
> 
> Start what at 200?

Signals, signal = $? - 200.  Shell default to decimal representation of
numbers, so using binary steps doesn't give an advantage and 55 is still
a plenty of space.  (I deplore elif cascade on the same variable, but we
can always convert the $? to binary/octal/hex, for `case` decoding. :])

>I think we have everything covered above. The mapping
> looks like this
> 
> 0 = success
> 1-63  = unit test failure code
> 64-127= test suite failure code

77 is not that easy to categorize -- we want to return it from both.

> 128-255   = signal
> 
> which sounds good to me.

To me as well.

>> > Signed-off-by: Andrew Jones <drjo...@redhat.com>
>> > ---
>> > diff --git a/run_tests.sh b/run_tests.sh
>> > @@ -54,10 +55,32 @@ function run()
>> >  
>> >  # extra_params in the config file may contain backticks that need to 
>> > be
>> >  # expanded, so use eval to start qemu
>> > -eval $cmdline >> test.log
>> > +errlog=$(mktemp)
>> > +eval $cmdline >> test.log 2> $errlog
>> | [...]
>> |  cat $errlog >> test.log
>> 
>> This assumes that stderr is always after stdout,
> 
> True. I'm not sure that matters when the unit test, which only uses stdout
> will always output stuff serially with qemu, which could output a mix. But
> your version below is fine by me if we want to pick up the need for the
> pipe and tee.

Yeah, I assume that QEMU can warn during the test, or interact with its
own stdout in an ordered manner.  I don't think it matter much, but
there isn't a significant draw-back.

>>   eval $cmdline 2>&1 >> test.log | tee $errlog >> test.log
>> 
>> has a chance to print lines in wrong order too, but I think it's going
>> to be closer to the original.
> 
> I'll play with it and send a v2 soon.

Thanks, though I am quite distracted during the end of the year, so
"soon" won't be truly appreciated. :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 1/3] run_tests.sh: reduce return code ambiguity

2015-12-21 Thread Radim Krčmář
2015-12-17 14:10-0600, Andrew Jones:
> qemu/unittest exit codes are convoluted, causing codes 0 and 1
> to be ambiguous. Here are the possible meanings
> 
>  .-.
>  || 0 | 1  |
>  |-|
>  | QEMU   | did something successfully,   | FAILURE|
>  || but probably didn't run the   ||
>  || unittest, OR caught SIGINT,   ||
>  || SIGHUP, or SIGTERM||
>  |-|
>  | unittest   | for some reason exited using  | SUCCESS|
>  || ACPI/PSCI, not with debug-exit||
>  .-.
> 
> As we can see above, an exit code of zero is even ambiguous for each
> row, i.e. QEMU could exit with zero because it successfully completed,
> or because it caught a signal. unittest could exit with zero because
> it successfully powered-off, or because for some odd reason it powered-
> off instead of calling debug-exit.
> 
> And, the most fun is that exit-code == 1 means QEMU failed, but the
> unittest succeeded.
> 
> This patch attempts to reduce that ambiguity, by also looking at stderr.

Nice.

> With it, we have
> 
>  0  - unexpected exit from qemu, or the unittest not using debug-exit.
>   Consider it a FAILURE
>  1  - unittest SUCCESS
>  < 128  - something failed (could be the unittest, qemu, or a run script)
>   Check the logs.
>  >= 128 - signal (signum = code - 128)

I think this heuristic should be applied to {arm,x86}/run.
run_tests.sh would inherit it and we would finally get reasonable exit
values everywhere.

The resulting table would look like this:

 0 = unit-test passed
 77= unit-test skipped (not implemented yet)
 124   = unit-test timeouted (implemented in [3/3])
 127   = qemu returned 0 (debug-exit probably wasn't called)
 > 128 = exited because of signal $? - 128
 * = unit-test failed

(Signal 0 is not used, so we could map 128 to mean "debug-exit probably
 wasn't called", but others might not understand our signal convention.
 Anyway, it'd be best for us to start at 200, for `case $? in 2??)` ...)

> Signed-off-by: Andrew Jones 
> ---
> diff --git a/run_tests.sh b/run_tests.sh
> @@ -54,10 +55,32 @@ function run()
>  
>  # extra_params in the config file may contain backticks that need to be
>  # expanded, so use eval to start qemu
> -eval $cmdline >> test.log
> +errlog=$(mktemp)
> +eval $cmdline >> test.log 2> $errlog
| [...]
|  cat $errlog >> test.log

This assumes that stderr is always after stdout,

  eval $cmdline 2>&1 >> test.log | tee $errlog >> test.log

has a chance to print lines in wrong order too, but I think it's going
to be closer to the original.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 3/3] add timeout support

2015-12-21 Thread Radim Krčmář
2015-12-17 14:10-0600, Andrew Jones:
> Signed-off-by: Andrew Jones <drjo...@redhat.com>
> ---
> diff --git a/arm/run b/arm/run
> @@ -75,10 +75,14 @@ chr_testdev+=' -device virtconsole,chardev=ctd -chardev 
> testdev,id=ctd'
>  M+=",accel=$ACCEL"
>  command="$qemu $M -cpu $processor $chr_testdev"
>  command+=" -display none -serial stdio -kernel"
> -echo $command "$@"
> +
> +if [ "$TIMEOUT" ]; then
> + timeout_cmd="timeout --foreground $TIMEOUT"

(command="timeout --foreground $TIMEOUT $command" # to keep the clutter
 down.)

> +fi

(We paste it three times, so I'd rather see this abstracted in a
 "scripts/" library.)

> diff --git a/run_tests.sh b/run_tests.sh
> @@ -21,6 +21,7 @@ function run()
> +local timeout="${9:-$TIMEOUT}"
> diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
> @@ -97,8 +98,12 @@ if [ "\$QEMU" ]; then
> +if [ "$timeout" ]; then
> + timeout_cmd='timeout --foreground $timeout'

Both would be nicer if they took the TIMEOUT variable as an override.
We already don't do that for accel and the patch seems ok in other
regards,

Reviewed-by: Radim Krčmář <rkrc...@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 01/12] run_tests: move run() to scripts/

2015-12-18 Thread Radim Krčmář
2015-12-17 12:45-0600, Andrew Jones:
> On Thu, Dec 17, 2015 at 06:53:32PM +0100, Radim Krčmář wrote:
>> We'll be using it from scripts/mkstandalone later.
>> 
>> Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
>> ---
>>  v2: new
>>  
>>  run_tests.sh | 53 +
>>  scripts/run.bash | 51 +++
> 
> Could probably just put run() in scripts/functions.bash

Definitely.  The drawback is that for_each_unittest() and any future
helper would be pasted in each unit test, which would complicate
conversion back to shell and uselessly bloat tests.  Even though I like
programming in shell far more than in C, I would not do it if we decide
to POSIX our standalone tests.  (Bloating is not an issue for me.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 02/12] run_tests: prepare for changes in scripts/mkstandalone

2015-12-18 Thread Radim Krčmář
2015-12-17 12:53-0600, Andrew Jones:
> On Thu, Dec 17, 2015 at 06:53:33PM +0100, Radim Krčmář wrote:
>> mkstandalone has a different mechanism for running tests as well as a
>> different handling of output and return codes.
>>  - create two shell function to capture test execution and logging
>>  - return the return value of unit-test
>>  - cope with empty $verbose in `run`
>> 
>> Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
>> ---
>>  v2: new (reused the bitshift and comment from v1)
>>  
>> diff --git a/run_tests.sh b/run_tests.sh
>> -cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp 
>> $smp $opts"
>> -if [ $verbose != 0 ]; then
>> +cmdline="TESTNAME=$testname ACCEL=$accel __run $kernel -smp $smp $opts"
>> +if [ "$verbose" -a "$verbose" != 0 ]; then
> 
> For bash bools I prefer just doing 'if [ "$verbose" = "yes" ]', allowing it
> to be empty.

Yeah, I guess it's a bit better.
(I prefer just [ "$verbose" ].)

>>  # extra_params in the config file may contain backticks that need to be
>>  # expanded, so use eval to start qemu
>> -eval $cmdline >> test.log
>> +__eval_log "$cmdline"
>> +# The first bit of return value is too hard to use, just skip it.
>> +# Unit-tests' return value is shifted by one.
>> +ret=$(($? >> 1))
> 
> I just wrote a patch, inspired by reviewing your v1 of this series, that
> tackles the ambiguous exit code problem. I'll post it now, but obviously
> we'll need to rebase one or the other of our run_tests.sh series'.

Nice, I'll review it later today (I'll travel for most of the afternoon).
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 03/12] scripts/mkstandalone: use common run function

2015-12-18 Thread Radim Krčmář
2015-12-17 13:09-0600, Andrew Jones:
> On Thu, Dec 17, 2015 at 06:53:34PM +0100, Radim Krčmář wrote:
>> The biggest change is dependency on bash.  An alternative would be to
>> rewrite `run` in POSIX shell, but I think it's ok to presume that KVM
>> unit tests will run on a system where installing bash isn't a problem.
> 
> Hmm... as hard as I worked to avoid the dependency on bash for the
> standalone tests, then I'm reluctant to give up on that. I do agree
> that having the dependency for the printf-%q trick helps a ton in
> making the code more maintainable though.

And parts of run() would need to be rewritten as well ... I am not sure
it's worth to invest the time.

>> (We already depend on QEMU ...)
> 
> Dependency on qemu doesn't imply a dependency on bash. The idea behind
> the standalone version of kvm-unit-tests tests is that you can receive
> one in your email and launch it.

No, but its quite likely that the host has bash when something as big as
QEMU is already there, or at least that it shouldn't be a huge problem
to install it.

>> Apart from that, there are changes in output and exit codes.
>>  - summary doesn't go to stderr
> 
> I wanted the summary on stderr so when you redirect the output of the
> test to a file the output would directly diff with the corresponding
> output in test.log from a system where run_tests.sh was used.

Ok, stderr from qemu needs to be redirected too then.
I will add an extra param to set file for __*log helpers or helpers that
redirect to stderr.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 05/12] lib/report: allow test skipping

2015-12-18 Thread Radim Krčmář
2015-12-17 13:37-0600, Andrew Jones:
> On Thu, Dec 17, 2015 at 01:30:23PM -0600, Andrew Jones wrote:
>> On Thu, Dec 17, 2015 at 06:53:36PM +0100, Radim Krčmář wrote:
>> > We can now explicitly mark a unit-test as skipped.
>> > If all unit-tests were skipped, the whole test is reported as skipped as
>> > well.  This also includes the case where no tests were run, but still
>> > ended with report_summary().
>> > 
>> > When the whole test is skipped, ./run_tests.sh prints yellow "SKIP"
>> > instead of green "PASS".
>> > 
>> > Return value of 77 is used to please Autotools.  I also renamed few
>> > things in reporting code and chose to refactor a logic while at it.
>> > 
>> > Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
>> > ---
>> > diff --git a/lib/report.c b/lib/report.c
>> > @@ -43,25 +43,28 @@ void report_prefix_pop(void)
>> > -void va_report_xfail(const char *msg_fmt, bool xfail, bool cond, va_list 
>> > va)
>> > +static void va_report(const char *msg_fmt, bool pass, bool xfail, bool 
>> > skip,
>> > +  va_list va)
>> 
>> Making this static disallows unit test writers to create their own
>> variable arg report() wrapper functions. Perhaps to determine whether
>> or not a skip is in order, e.g.
>> 
>>  xyz_report(msg, pass, ...)
>>  {
>> va_list va;
>> va_start(va, pass);
>> if (xyz)
>>va_report(msg, pass, false, false, va);
>> else
>>va_report(msg, false, false, true, va);
>> va_end(va);
>>  }
> 
> Hmm, while I still think we should avoid using static, to allow new wrappers,
> the wrapper I wrote here as an example wouldn't be necessary if report_skip's
> inputs were instead 

That breaks encapsulation -- if we ever want to change va_report(),
we've just made our lives harder.

> void report_skip(const char *msg_fmt, bool pass, bool skip, ...)
> 
> Why not do that?

Yeah, some cases want to unconditionally skip, so we'd want to have
both.  I'll think of naming during lunch :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 10/12] run_tests: print summary

2015-12-18 Thread Radim Krčmář
2015-12-17 13:55-0600, Andrew Jones:
> On Thu, Dec 17, 2015 at 06:53:41PM +0100, Radim Krčmář wrote:
>> Might be interesting and hopefully won't break too many scripts.
>> 
>> Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
>> ---
>>  v2:
>>  - don't print "0 unexpected failures" in run_tests' summary. [Drew]
>>(This could have been done in lib/report, but I'm not sure why we want
>> to always print it in the summary, so I've kept it there.)
>>  - use $testname
>>  - don't buffer the output (a serious bug in v1)
>>  - worse performance (reads the output of all tests)
>>  
>> diff --git a/run_tests.sh b/run_tests.sh
>> +__echo_last_log | sed 'x;$s/^SUMMARY: //;ta;$s/.*//p;d;
>> + :a s/, 0 unexpected failures//;s/^/ (/;s/$/)/'
> 
> This sed is way beyond my skillz. I would have just done the following
> at the expense of an extra pipe :-)
> 
> tac test.log | grep -m1 '^SUMMARY:' | sed 's/.*: \(.*\),.*/\1/'

This won't print a newline in case there is no summary, but yeah, I will
limit myself to the well known 's' command.

(tail won't buffer the whole file and `head` instead of -m is POSIX.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 10/12] run_tests: print summary

2015-12-18 Thread Radim Krčmář
2015-12-17 14:06-0600, Andrew Jones:
> On Thu, Dec 17, 2015 at 06:53:41PM +0100, Radim Krčmář wrote:
>> Might be interesting and hopefully won't break too many scripts.
>> 
>> Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
>> ---
>>  v2:
>>  - don't print "0 unexpected failures" in run_tests' summary. [Drew]
>>(This could have been done in lib/report, but I'm not sure why we want
>> to always print it in the summary, so I've kept it there.)
> 
> I vote we kill it in lib/report. It's usually just noise.

I support this.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 09/12] x86/hyperv_synic: check for support before testing

2015-12-18 Thread Radim Krčmář
2015-12-17 13:42-0600, Andrew Jones:
> On Thu, Dec 17, 2015 at 06:53:40PM +0100, Radim Krčmář wrote:
>> It's not easy to distinguish successful unit-test from failed QEMU, so
>> we check for presence of the needed feature before hand.
>> 
>> Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
>> ---
>>  v2: remove "> /dev/null" as check doesn't print the output anymore
>> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
>> @@ -182,3 +182,4 @@ arch = x86_64
>>  file = hyperv_synic.flat
>>  smp = 2
>>  extra_params = -cpu kvm64,hv_synic -device hyperv-testdev
>> +check = echo quit | $qemu -cpu kvm64,hv_synic -device hyperv-testdev 
>> -monitor stdio
> 
> Let's make sure $QEMU==$qemu in contexts where unittests.cfg is used, and
> then document (in the unittests.cfg header) that $QEMU may be used in the
> check lines.

I'd just get rid of $qemu, we don't gain anything by having them both.
Documenting it is a good idea.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 00/12] Improve the output of test runners

2015-12-18 Thread Radim Krčmář
2015-12-17 14:04-0600, Andrew Jones:
> On Thu, Dec 17, 2015 at 06:53:31PM +0100, Radim Krčmář wrote:
>> v1: http://www.spinics.net/lists/kvm/msg125202.html
>> 
>> Drew brought up the existence of scripts/mkstandalone.sh, which
>> significantly expanded v2 (and my set of curses) ...
>> I didn't want to do the same twice, so first part of this series,
>> [1-4/12], reuses run() from run_tests.sh and does some non-conservative
>> changes to scripts/mkstandalone.sh.  scripts/mkstandalone.sh is lacking
>> behind run_tests.sh, but should be good enough to fulfill its purpose.
>> 
>> The output of run_tests.sh has also changed a bit and now looks like
>> this (you'll again need to imagine colours):
>> 
>> > PASS apic (14 tests)
>> > PASS ioapic (19 tests)
>> > PASS smptest (1 tests)
>> > PASS smptest3 (1 tests)
>> > PASS vmexit_cpuid
>> > PASS vmexit_vmcall
> 
> Why do some tests, which have only 1 test, say (1 tests), but other
> tests don't say anything?

Some tests don't use lib/report, hence don't print "^SUMMARY:", so we
don't really want to know what happens there.

> Some nice improvements with this series. I'm not sure I like depending
> on bash in standalone tests, but then that could just be cause I worked
> pretty hard at avoiding the dependency, and thus I'll have to cry at the
> loss of it...

Knowing the percentage of KVM+QEMU installations without bash would
help.  I expect it to be very close to zero, which makes compassion hard
to find ... sorry.

> Please review the series I'll send in about 2 minutes, so we can discuss
> how to integrate them.

Will do, thanks.  (Please excuse the delay.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 00/12] Improve the output of test runners

2015-12-17 Thread Radim Krčmář
v1: http://www.spinics.net/lists/kvm/msg125202.html

Drew brought up the existence of scripts/mkstandalone.sh, which
significantly expanded v2 (and my set of curses) ...
I didn't want to do the same twice, so first part of this series,
[1-4/12], reuses run() from run_tests.sh and does some non-conservative
changes to scripts/mkstandalone.sh.  scripts/mkstandalone.sh is lacking
behind run_tests.sh, but should be good enough to fulfill its purpose.

The output of run_tests.sh has also changed a bit and now looks like
this (you'll again need to imagine colours):

> PASS apic (14 tests)
> PASS ioapic (19 tests)
> PASS smptest (1 tests)
> PASS smptest3 (1 tests)
> PASS vmexit_cpuid
> PASS vmexit_vmcall
> PASS vmexit_mov_from_cr8
> PASS vmexit_mov_to_cr8
> PASS vmexit_inl_pmtimer
> PASS vmexit_ipi
> PASS vmexit_ipi_halt
> PASS vmexit_ple_round_robin
> PASS access
> SKIP smap (0 tests)
> SKIP pku (0 tests)
> PASS emulator (132 tests, 1 skipped)
> PASS eventinj (13 tests)
> PASS hypercall (2 tests)
> PASS idt_test (4 tests)
> PASS msr (13 tests)
> PASS pmu (67 tests, 1 expected failures)
> PASS port80
> PASS realmode
> PASS s3
> PASS sieve
> PASS tsc (3 tests)
> PASS tsc_adjust (5 tests)
> PASS xsave (17 tests)
> PASS rmap_chain
> SKIP svm (0 tests)
> SKIP svm-disabled (0 tests)
> SKIP taskswitch (i386 only)
> SKIP taskswitch2 (i386 only)
> PASS kvmclock_test
> PASS pcid (3 tests)
> SKIP vmx (0 tests)
> PASS debug (7 tests)
> SKIP hyperv_synic (failed check)


Radim Krčmář (12):
  run_tests: move run() to scripts/
  run_tests: prepare for changes in scripts/mkstandalone
  scripts/mkstandalone: use common run function
  scripts/mkstandalone: improve exit paths
  lib/report: allow test skipping
  x86/*: report skipped tests
  x86/pmu: expect failure with nmi_watchdog
  scripts/run: generalize check
  x86/hyperv_synic: check for support before testing
  run_tests: print summary
  wrappers: consolidate skip output
  run_tests: suppress stderr

 lib/libcflat.h  |  1 +
 lib/report.c| 44 +++---
 run_tests.sh| 58 +---
 scripts/mkstandalone.sh | 64 +
 scripts/run.bash| 62 +++
 x86/apic.c  |  7 +++---
 x86/emulator.c  |  2 +-
 x86/hyperv_synic.c  |  2 +-
 x86/pku.c   |  2 +-
 x86/pmu.c   | 11 +++--
 x86/smap.c  |  2 +-
 x86/svm.c   |  2 +-
 x86/tsc.c   |  2 +-
 x86/unittests.cfg   |  4 ++--
 14 files changed, 146 insertions(+), 117 deletions(-)
 create mode 100644 scripts/run.bash

-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 01/12] run_tests: move run() to scripts/

2015-12-17 Thread Radim Krčmář
We'll be using it from scripts/mkstandalone later.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2: new
 
 run_tests.sh | 53 +
 scripts/run.bash | 51 +++
 2 files changed, 52 insertions(+), 52 deletions(-)
 create mode 100644 scripts/run.bash

diff --git a/run_tests.sh b/run_tests.sh
index fad22a935b00..58949e39c38c 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -6,63 +6,12 @@ if [ ! -f config.mak ]; then
 fi
 source config.mak
 source scripts/functions.bash
+source scripts/run.bash
 
 config=$TEST_DIR/unittests.cfg
 qemu=${QEMU:-qemu-system-$ARCH}
 verbose=0
 
-function run()
-{
-local testname="$1"
-local groups="$2"
-local smp="$3"
-local kernel="$4"
-local opts="$5"
-local arch="$6"
-local check="$7"
-local accel="$8"
-
-if [ -z "$testname" ]; then
-return
-fi
-
-if [ -n "$only_group" ] && ! grep -q "$only_group" <<<$groups; then
-return
-fi
-
-if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
-echo "skip $1 ($arch only)"
-return
-fi
-
-# check a file for a particular value before running a test
-# the check line can contain multiple files to check separated by a space
-# but each check parameter needs to be of the form =
-for check_param in ${check[@]}; do
-path=${check_param%%=*}
-value=${check_param#*=}
-if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then
-echo "skip $1 ($path not equal to $value)"
-return
-fi
-done
-
-cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp $smp 
$opts"
-if [ $verbose != 0 ]; then
-echo $cmdline
-fi
-
-# extra_params in the config file may contain backticks that need to be
-# expanded, so use eval to start qemu
-eval $cmdline >> test.log
-
-if [ $? -le 1 ]; then
-echo -e "\e[32mPASS\e[0m $1"
-else
-echo -e "\e[31mFAIL\e[0m $1"
-fi
-}
-
 function usage()
 {
 cat <=
+for check_param in ${check[@]}; do
+path=${check_param%%=*}
+value=${check_param#*=}
+if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then
+echo "skip $1 ($path not equal to $value)"
+return
+fi
+done
+
+cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp $smp 
$opts"
+if [ $verbose != 0 ]; then
+echo $cmdline
+fi
+
+# extra_params in the config file may contain backticks that need to be
+# expanded, so use eval to start qemu
+eval $cmdline >> test.log
+
+if [ $? -le 1 ]; then
+echo -e "\e[32mPASS\e[0m $1"
+else
+echo -e "\e[31mFAIL\e[0m $1"
+fi
+}
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 12/12] run_tests: suppress stderr

2015-12-17 Thread Radim Krčmář
log it instead to keep the output clear.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2: new (v1 did this by catching all output in a variable)

 run_tests.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/run_tests.sh b/run_tests.sh
index 62c106a0b693..4c8c20a16eac 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -29,7 +29,7 @@ EOF
 }
 
 __run() { ./$TEST_DIR-run "${@}"; }
-__eval_log() { eval "${@}" >> test.log; }
+__eval_log() { eval "${@}" >> test.log 2>&1; }
 __echo_last_log() { cat test.log; } # XXX: ignores the 'last' bit
 
 echo > test.log
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 06/12] x86/*: report skipped tests

2015-12-17 Thread Radim Krčmář
No care to consistency or exhaustivity was given.

(svm-disabled test should be redone and it's weird that x86/hyperv_synic
 is about the only one that does report_skip when unsupported.)

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2: remove double newline in enable_tsc_deadline_timer [Drew]
 
 x86/apic.c | 7 +++
 x86/emulator.c | 2 +-
 x86/hyperv_synic.c | 2 +-
 x86/pku.c  | 2 +-
 x86/pmu.c  | 2 +-
 x86/smap.c | 2 +-
 x86/svm.c  | 2 +-
 x86/tsc.c  | 2 +-
 8 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/x86/apic.c b/x86/apic.c
index d4eec529e535..a97fe5e5a6c5 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -27,7 +27,7 @@ static void tsc_deadline_timer_isr(isr_regs_t *regs)
 ++tdt_count;
 }
 
-static void start_tsc_deadline_timer(void)
+static void __test_tsc_deadline_timer(void)
 {
 handle_irq(TSC_DEADLINE_TIMER_VECTOR, tsc_deadline_timer_isr);
 irq_enable();
@@ -45,7 +45,6 @@ static int enable_tsc_deadline_timer(void)
 if (cpuid(1).c & (1 << 24)) {
 lvtt = TSC_DEADLINE_TIMER_MODE | TSC_DEADLINE_TIMER_VECTOR;
 apic_write(APIC_LVTT, lvtt);
-start_tsc_deadline_timer();
 return 1;
 } else {
 return 0;
@@ -55,9 +54,9 @@ static int enable_tsc_deadline_timer(void)
 static void test_tsc_deadline_timer(void)
 {
 if(enable_tsc_deadline_timer()) {
-printf("tsc deadline timer enabled\n");
+__test_tsc_deadline_timer();
 } else {
-printf("tsc deadline timer not detected\n");
+report_skip("tsc deadline timer not detected");
 }
 }
 
diff --git a/x86/emulator.c b/x86/emulator.c
index e5c1c6b9a2f3..b64a5fe0f3dc 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -1062,7 +1062,7 @@ static void illegal_movbe_handler(struct ex_regs *regs)
 static void test_illegal_movbe(void)
 {
if (!(cpuid(1).c & (1 << 22))) {
-   printf("SKIP: illegal movbe\n");
+   report_skip("illegal movbe");
return;
}
 
diff --git a/x86/hyperv_synic.c b/x86/hyperv_synic.c
index 18d1295bfb37..602b79392bfd 100644
--- a/x86/hyperv_synic.c
+++ b/x86/hyperv_synic.c
@@ -228,7 +228,7 @@ int main(int ac, char **av)
 
 report("Hyper-V SynIC test", ok);
 } else {
-report("Hyper-V SynIC is not supported", true);
+report_skip("Hyper-V SynIC is not supported");
 }
 
 return report_summary();
diff --git a/x86/pku.c b/x86/pku.c
index 0e00b9984d70..58971d21ed05 100644
--- a/x86/pku.c
+++ b/x86/pku.c
@@ -91,7 +91,7 @@ int main(int ac, char **av)
 
 if (!(cpuid_indexed(7, 0).c & (1 << X86_FEATURE_PKU))) {
 printf("PKU not enabled, exiting\n");
-exit(1);
+return report_summary();
 }
 
 setup_vm();
diff --git a/x86/pmu.c b/x86/pmu.c
index 03f80190bb25..c68980044dee 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -387,7 +387,7 @@ int main(int ac, char **av)
 
if (!eax.split.version_id) {
printf("No pmu is detected!\n");
-   return 1;
+   return report_summary();
}
printf("PMU version: %d\n", eax.split.version_id);
printf("GP counters: %d\n", eax.split.num_counters);
diff --git a/x86/smap.c b/x86/smap.c
index d8a7ae82dc00..0aa44054bd48 100644
--- a/x86/smap.c
+++ b/x86/smap.c
@@ -93,7 +93,7 @@ int main(int ac, char **av)
 
if (!(cpuid_indexed(7, 0).b & (1 << X86_FEATURE_SMAP))) {
printf("SMAP not enabled, exiting\n");
-   exit(1);
+   return report_summary();
}
 
setup_vm();
diff --git a/x86/svm.c b/x86/svm.c
index 1046ddf73732..ff1a0f34b4bf 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -1064,7 +1064,7 @@ int main(int ac, char **av)
 
 if (!(cpuid(0x8001).c & 4)) {
 printf("SVM not availble\n");
-return 0;
+return report_summary();
 }
 
 setup_svm();
diff --git a/x86/tsc.c b/x86/tsc.c
index c71dc2a7abe0..ee247459fb42 100644
--- a/x86/tsc.c
+++ b/x86/tsc.c
@@ -43,5 +43,5 @@ int main()
test_rdtscp(0x100);
} else
printf("rdtscp not supported\n");
-   return 0;
+   return report_summary();
 }
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 03/12] scripts/mkstandalone: use common run function

2015-12-17 Thread Radim Krčmář
The biggest change is dependency on bash.  An alternative would be to
rewrite `run` in POSIX shell, but I think it's ok to presume that KVM
unit tests will run on a system where installing bash isn't a problem.
(We already depend on QEMU ...)

Apart from that, there are changes in output and exit codes.
 - summary doesn't go to stderr
 - PASS/FAIL is colourful
 - FAILed scripts return 1

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2: new (I can fix the stderr in v3)
 
 scripts/mkstandalone.sh | 59 +
 1 file changed, 25 insertions(+), 34 deletions(-)

diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index 3ce244aff67b..cf2182dbd936 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -20,6 +20,13 @@ fi
 unittests=$TEST_DIR/unittests.cfg
 mkdir -p tests
 
+escape ()
+{
+   for arg in "${@}"; do
+   printf "%q " "$arg"; # XXX: trailing whitespace
+   done
+}
+
 function mkstandalone()
 {
local testname="$1"
@@ -49,33 +56,14 @@ function mkstandalone()
cmdline=$(cut -d' ' -f2- <<< "$cmdline")
 
cat < $standalone
-#!/bin/sh
+#!/bin/bash
 
-EOF
-if [ "$arch" ]; then
-   cat <> $standalone
 ARCH=\`uname -m | sed -e s/i.86/i386/ | sed -e 's/arm.*/arm/'\`
-[ "\$ARCH" = "aarch64" ] && ARCH="arm64"
-if [ "\$ARCH" != "$arch" ]; then
-   echo "skip $testname ($arch only)" 1>&2
-   exit 1
-fi
 
 EOF
-fi
-if [ "$check" ]; then
-   cat <> $standalone
-for param in $check; do
-   path=\`echo \$param | cut -d= -f1\`
-   value=\`echo \$param | cut -d= -f2\`
-   if [ -f "\$path" ] && [ "\`cat \$path\`" != "\$value" ]; then
-   echo "skip $testname (\$path not equal to \$value)" 1>&2
-   exit 1
-   fi
-done
 
-EOF
-fi
+cat scripts/run.bash >> $standalone
+
 if [ ! -f $kernel ]; then
cat <> $standalone
 echo "skip $testname (test kernel not present)" 1>&2
@@ -100,9 +88,13 @@ MAX_SMP="MAX_SMP"
 echo \$qemu $cmdline -smp $smp $opts
 
 cmdline="\`echo '$cmdline' | sed s%$kernel%_NO_FILE_4Uhere_%\`"
-if \$qemu \$cmdline 2>&1 | grep 'No accelerator found'; then
-   ret=2
-else
+if \$qemu \$cmdline 2>&1 | grep 'No accelerator found' >/dev/null; then
+   echo "skip $testname (QEMU doesn't support KVM)"
+   exit 1
+fi
+
+__run()
+{
MAX_SMP=\`getconf _NPROCESSORS_CONF\`
while \$qemu \$cmdline -smp \$MAX_SMP 2>&1 | grep 'exceeds max cpus' > 
/dev/null; do
MAX_SMP=\`expr \$MAX_SMP - 1\`
@@ -110,16 +102,15 @@ else
 
cmdline="\`echo '$cmdline' | sed s%$kernel%\$bin%\`"
\$qemu \$cmdline -smp $smp $opts
-   ret=\$?
-fi
-echo Return value from qemu: \$ret
-if [ \$ret -le 1 ]; then
-   echo PASS $testname 1>&2
-else
-   echo FAIL $testname 1>&2
-fi
+}
+
+__eval_log() { eval "\${@}"; }
+
+run `escape "${@}"`
+ret=$?
+
 rm -f \$bin
-exit 0
+exit \$ret
 EOF
 fi
 chmod +x $standalone
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 10/12] run_tests: print summary

2015-12-17 Thread Radim Krčmář
Might be interesting and hopefully won't break too many scripts.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2:
 - don't print "0 unexpected failures" in run_tests' summary. [Drew]
   (This could have been done in lib/report, but I'm not sure why we want
to always print it in the summary, so I've kept it there.)
 - use $testname
 - don't buffer the output (a serious bug in v1)
 - worse performance (reads the output of all tests)
 
 run_tests.sh| 1 +
 scripts/mkstandalone.sh | 2 ++
 scripts/run.bash| 5 -
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/run_tests.sh b/run_tests.sh
index e09d410beaa4..62c106a0b693 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -30,6 +30,7 @@ EOF
 
 __run() { ./$TEST_DIR-run "${@}"; }
 __eval_log() { eval "${@}" >> test.log; }
+__echo_last_log() { cat test.log; } # XXX: ignores the 'last' bit
 
 echo > test.log
 
diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index 778383077769..c37f694398b8 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -104,7 +104,9 @@ __run()
\$qemu \$cmdline -smp $smp $opts
 }
 
+# log goes to stdout and nothing is remembered
 __eval_log() { eval "\${@}"; }
+__echo_last_log() { echo; }
 
 run `escape "${@}"`
 exit \$?
diff --git a/scripts/run.bash b/scripts/run.bash
index f532cb9e8b1c..d3e8d37d315d 100644
--- a/scripts/run.bash
+++ b/scripts/run.bash
@@ -46,7 +46,10 @@ function run()
 *)  echo -ne "\e[31mFAIL\e[0m"
 esac
 
-echo " $1"
+echo -n " $testname"
+
+__echo_last_log | sed 'x;$s/^SUMMARY: //;ta;$s/.*//p;d;
+ :a s/, 0 unexpected failures//;s/^/ (/;s/$/)/'
 
 return $ret
 }
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 04/12] scripts/mkstandalone: improve exit paths

2015-12-17 Thread Radim Krčmář
trap can be called on EXIT, which covers most exits.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2: new
 
 scripts/mkstandalone.sh | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index cf2182dbd936..778383077769 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -71,7 +71,7 @@ exit 1
 EOF
 else
cat <> $standalone
-trap 'rm -f \$bin; exit 1' HUP INT TERM
+trap 'rm -f \$bin' EXIT
 bin=\`mktemp\`
 base64 -d << 'BIN_EOF' | zcat > \$bin &&
 EOF
@@ -107,10 +107,7 @@ __run()
 __eval_log() { eval "\${@}"; }
 
 run `escape "${@}"`
-ret=$?
-
-rm -f \$bin
-exit \$ret
+exit \$?
 EOF
 fi
 chmod +x $standalone
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 09/12] x86/hyperv_synic: check for support before testing

2015-12-17 Thread Radim Krčmář
It's not easy to distinguish successful unit-test from failed QEMU, so
we check for presence of the needed feature before hand.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2: remove "> /dev/null" as check doesn't print the output anymore
 
 x86/unittests.cfg | 1 +
 1 file changed, 1 insertion(+)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 6b94ad93dcf0..25779993cc27 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -182,3 +182,4 @@ arch = x86_64
 file = hyperv_synic.flat
 smp = 2
 extra_params = -cpu kvm64,hv_synic -device hyperv-testdev
+check = echo quit | $qemu -cpu kvm64,hv_synic -device hyperv-testdev -monitor 
stdio
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 07/12] x86/pmu: expect failure with nmi_watchdog

2015-12-17 Thread Radim Krčmář
Host's nmi_watchdog takes one slot, making the "all counters" unit-test
fail.  We know exactly what happens, mark it as expected failure.

PMU test is now executed regardless of host_nmi_watchdog.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2:
 - host_nmi_watchdog made static
 
 x86/pmu.c | 9 -
 x86/unittests.cfg | 3 +--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index c68980044dee..70e9b3a41e96 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -92,6 +92,7 @@ struct pmu_event {
 };
 
 static int num_counters;
+static bool host_nmi_watchdog;
 
 char *buf;
 
@@ -291,7 +292,7 @@ static void check_counters_many(void)
if (!verify_counter([i]))
break;
 
-   report("all counters", i == n);
+   report_xfail("all counters", host_nmi_watchdog, i == n);
 }
 
 static void check_counter_overflow(void)
@@ -374,6 +375,7 @@ static void check_rdpmc(void)
 
 int main(int ac, char **av)
 {
+   int i;
struct cpuid id = cpuid(10);
 
setup_vm();
@@ -385,6 +387,11 @@ int main(int ac, char **av)
ebx.full = id.b;
edx.full = id.d;
 
+   /* XXX: horrible command line parsing */
+   for (i = 1; i < ac; i++)
+   if (!strcmp(av[i], "host_nmi_watchdog=1"))
+   host_nmi_watchdog = true;
+
if (!eax.split.version_id) {
printf("No pmu is detected!\n");
return report_summary();
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index c15c86df..6b94ad93dcf0 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -106,8 +106,7 @@ file = msr.flat
 
 [pmu]
 file = pmu.flat
-extra_params = -cpu host
-check = /proc/sys/kernel/nmi_watchdog=0
+extra_params = -cpu host -append "host_nmi_watchdog=`cat 
/proc/sys/kernel/nmi_watchdog`"
 
 [port80]
 file = port80.flat
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 02/12] run_tests: prepare for changes in scripts/mkstandalone

2015-12-17 Thread Radim Krčmář
mkstandalone has a different mechanism for running tests as well as a
different handling of output and return codes.
 - create two shell function to capture test execution and logging
 - return the return value of unit-test
 - cope with empty $verbose in `run`

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2: new (reused the bitshift and comment from v1)
 
 run_tests.sh |  4 
 scripts/run.bash | 13 +
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/run_tests.sh b/run_tests.sh
index 58949e39c38c..e09d410beaa4 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -28,7 +28,11 @@ specify the appropriate qemu binary for ARCH-run.
 EOF
 }
 
+__run() { ./$TEST_DIR-run "${@}"; }
+__eval_log() { eval "${@}" >> test.log; }
+
 echo > test.log
+
 while getopts "g:hv" opt; do
 case $opt in
 g)
diff --git a/scripts/run.bash b/scripts/run.bash
index 0c5a2569d80e..243586c6d2fc 100644
--- a/scripts/run.bash
+++ b/scripts/run.bash
@@ -34,18 +34,23 @@ function run()
 fi
 done
 
-cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp $smp 
$opts"
-if [ $verbose != 0 ]; then
+cmdline="TESTNAME=$testname ACCEL=$accel __run $kernel -smp $smp $opts"
+if [ "$verbose" -a "$verbose" != 0 ]; then
 echo $cmdline
 fi
 
 # extra_params in the config file may contain backticks that need to be
 # expanded, so use eval to start qemu
-eval $cmdline >> test.log
+__eval_log "$cmdline"
+# The first bit of return value is too hard to use, just skip it.
+# Unit-tests' return value is shifted by one.
+ret=$(($? >> 1))
 
-if [ $? -le 1 ]; then
+if [ $ret -eq 0 ]; then
 echo -e "\e[32mPASS\e[0m $1"
 else
 echo -e "\e[31mFAIL\e[0m $1"
 fi
+
+return $ret
 }
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 05/12] lib/report: allow test skipping

2015-12-17 Thread Radim Krčmář
We can now explicitly mark a unit-test as skipped.
If all unit-tests were skipped, the whole test is reported as skipped as
well.  This also includes the case where no tests were run, but still
ended with report_summary().

When the whole test is skipped, ./run_tests.sh prints yellow "SKIP"
instead of green "PASS".

Return value of 77 is used to please Autotools.  I also renamed few
things in reporting code and chose to refactor a logic while at it.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2:
 - turned skip into yellow SKIP [Drew]
 - wrapped line at 80 characters [Drew]
 - added static to va_report
 
 lib/libcflat.h   |  1 +
 lib/report.c | 44 ++--
 scripts/run.bash | 12 +++-
 3 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/lib/libcflat.h b/lib/libcflat.h
index 9747ccdbc9f1..070818354ee1 100644
--- a/lib/libcflat.h
+++ b/lib/libcflat.h
@@ -61,6 +61,7 @@ void report_prefix_push(const char *prefix);
 void report_prefix_pop(void);
 void report(const char *msg_fmt, bool pass, ...);
 void report_xfail(const char *msg_fmt, bool xfail, bool pass, ...);
+void report_skip(const char *msg_fmt, ...);
 int report_summary(void);
 
 #define ARRAY_SIZE(_a) (sizeof(_a)/sizeof((_a)[0]))
diff --git a/lib/report.c b/lib/report.c
index 35e664108a92..a47f2e00b529 100644
--- a/lib/report.c
+++ b/lib/report.c
@@ -13,7 +13,7 @@
 #include "libcflat.h"
 #include "asm/spinlock.h"
 
-static unsigned int tests, failures, xfailures;
+static unsigned int tests, failures, xfailures, skipped;
 static char prefixes[256];
 static struct spinlock lock;
 
@@ -43,25 +43,28 @@ void report_prefix_pop(void)
spin_unlock();
 }
 
-void va_report_xfail(const char *msg_fmt, bool xfail, bool cond, va_list va)
+static void va_report(const char *msg_fmt, bool pass, bool xfail, bool skip,
+   va_list va)
 {
-   char *pass = xfail ? "XPASS" : "PASS";
-   char *fail = xfail ? "XFAIL" : "FAIL";
char buf[2000];
+   char *prefix = skip ? "SKIP"
+   : xfail ? (pass ? "XPASS" : "XFAIL")
+   : (pass ? "PASS"  : "FAIL");
 
spin_lock();
 
tests++;
-   printf("%s: ", cond ? pass : fail);
+   printf("%s: ", prefix);
puts(prefixes);
vsnprintf(buf, sizeof(buf), msg_fmt, va);
puts(buf);
puts("\n");
-   if (xfail && cond)
-   failures++;
-   else if (xfail)
+
+   if (skip)
+   skipped++;
+   else if (xfail && !pass)
xfailures++;
-   else if (!cond)
+   else if (xfail || !pass)
failures++;
 
spin_unlock();
@@ -71,7 +74,7 @@ void report(const char *msg_fmt, bool pass, ...)
 {
va_list va;
va_start(va, pass);
-   va_report_xfail(msg_fmt, false, pass, va);
+   va_report(msg_fmt, pass, false, false, va);
va_end(va);
 }
 
@@ -79,7 +82,15 @@ void report_xfail(const char *msg_fmt, bool xfail, bool 
pass, ...)
 {
va_list va;
va_start(va, pass);
-   va_report_xfail(msg_fmt, xfail, pass, va);
+   va_report(msg_fmt, pass, xfail, false, va);
+   va_end(va);
+}
+
+void report_skip(const char *msg_fmt, ...)
+{
+   va_list va;
+   va_start(va, msg_fmt);
+   va_report(msg_fmt, false, false, true, va);
va_end(va);
 }
 
@@ -89,9 +100,14 @@ int report_summary(void)
 
printf("\nSUMMARY: %d tests, %d unexpected failures", tests, failures);
if (xfailures)
-   printf(", %d expected failures\n", xfailures);
-   else
-   printf("\n");
+   printf(", %d expected failures", xfailures);
+   if (skipped)
+   printf(", %d skipped", skipped);
+   printf("\n");
+
+   if (tests == skipped)
+   return 77; /* blame AUTOTOOLS */
+
return failures > 0 ? 1 : 0;
 
spin_unlock();
diff --git a/scripts/run.bash b/scripts/run.bash
index 243586c6d2fc..b92611c29fbb 100644
--- a/scripts/run.bash
+++ b/scripts/run.bash
@@ -46,11 +46,13 @@ function run()
 # Unit-tests' return value is shifted by one.
 ret=$(($? >> 1))
 
-if [ $ret -eq 0 ]; then
-echo -e "\e[32mPASS\e[0m $1"
-else
-echo -e "\e[31mFAIL\e[0m $1"
-fi
+case $ret in
+0)  echo -ne "\e[32mPASS\e[0m" ;;
+77) echo -ne "\e[33mSKIP\e[0m" ;;
+*)  echo -ne "\e[31mFAIL\e[0m"
+esac
+
+echo " $1"
 
 return $ret
 }
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 08/12] scripts/run: generalize check

2015-12-17 Thread Radim Krčmář
config attribute "check" is currently unused.
Provide a simple implementation instead of removing it.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2:
 - update scripts/mkstandalone.sh [Drew]
 - don't print too much [Drew]
 - log the output of check command
 - use $testname
 
 scripts/run.bash | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/scripts/run.bash b/scripts/run.bash
index b92611c29fbb..f532cb9e8b1c 100644
--- a/scripts/run.bash
+++ b/scripts/run.bash
@@ -22,17 +22,11 @@ function run()
 return
 fi
 
-# check a file for a particular value before running a test
-# the check line can contain multiple files to check separated by a space
-# but each check parameter needs to be of the form =
-for check_param in ${check[@]}; do
-path=${check_param%%=*}
-value=${check_param#*=}
-if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then
-echo "skip $1 ($path not equal to $value)"
-return
-fi
-done
+__eval_log "$check" || {
+__eval_log 'echo "skipped $testname (check returned $?)"'
+echo "skip $testname (failed check)"
+return
+}
 
 cmdline="TESTNAME=$testname ACCEL=$accel __run $kernel -smp $smp $opts"
 if [ "$verbose" -a "$verbose" != 0 ]; then
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 11/12] wrappers: consolidate skip output

2015-12-17 Thread Radim Krčmář
Ugly helpers will get us yellow "SKIP" to stdout and 77 on exit.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2: new
 
 scripts/mkstandalone.sh |  6 +++---
 scripts/run.bash| 25 -
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index c37f694398b8..adb11abf650c 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -66,7 +66,7 @@ cat scripts/run.bash >> $standalone
 
 if [ ! -f $kernel ]; then
cat <> $standalone
-echo "skip $testname (test kernel not present)" 1>&2
+echo "\$(SKIP) $testname (test kernel not present)"
 exit 1
 EOF
 else
@@ -89,8 +89,8 @@ echo \$qemu $cmdline -smp $smp $opts
 
 cmdline="\`echo '$cmdline' | sed s%$kernel%_NO_FILE_4Uhere_%\`"
 if \$qemu \$cmdline 2>&1 | grep 'No accelerator found' >/dev/null; then
-   echo "skip $testname (QEMU doesn't support KVM)"
-   exit 1
+   echo "\$(SKIP) $testname (QEMU doesn't support KVM)"
+   exit \$EXIT_SKIP
 fi
 
 __run()
diff --git a/scripts/run.bash b/scripts/run.bash
index d3e8d37d315d..06a13b9aaf1a 100644
--- a/scripts/run.bash
+++ b/scripts/run.bash
@@ -1,3 +1,10 @@
+PASS() { echo -ne "\e[32mPASS\e[0m"; }
+SKIP() { echo -ne "\e[33mSKIP\e[0m"; }
+FAIL() { echo -ne "\e[31mFAIL\e[0m"; }
+
+EXIT_SUCCESS=0
+EXIT_SKIP=77
+
 function run()
 {
 local testname="$1"
@@ -10,22 +17,22 @@ function run()
 local accel="$8"
 
 if [ -z "$testname" ]; then
-return
+return $EXIT_SKIP
 fi
 
 if [ -n "$only_group" ] && ! grep -q "$only_group" <<<$groups; then
-return
+return $EXIT_SKIP
 fi
 
 if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
-echo "skip $1 ($arch only)"
-return
+echo "`SKIP` $testname ($arch only)"
+return $EXIT_SKIP
 fi
 
 __eval_log "$check" || {
 __eval_log 'echo "skipped $testname (check returned $?)"'
-echo "skip $testname (failed check)"
-return
+echo "`SKIP` $testname (failed check)"
+return $EXIT_SKIP
 }
 
 cmdline="TESTNAME=$testname ACCEL=$accel __run $kernel -smp $smp $opts"
@@ -41,9 +48,9 @@ function run()
 ret=$(($? >> 1))
 
 case $ret in
-0)  echo -ne "\e[32mPASS\e[0m" ;;
-77) echo -ne "\e[33mSKIP\e[0m" ;;
-*)  echo -ne "\e[31mFAIL\e[0m"
+$EXIT_SUCCESS) PASS ;;
+$EXIT_SKIP)SKIP ;;
+*) FAIL
 esac
 
 echo -n " $testname"
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 1/6] lib/report: allow test skipping

2015-12-15 Thread Radim Krčmář
2015-12-14 16:12-0600, Andrew Jones:
> On Mon, Dec 14, 2015 at 04:00:19PM -0600, Andrew Jones wrote:
>> On Mon, Dec 14, 2015 at 10:24:16PM +0100, Radim Krčmář wrote:
>> > ---
>> > diff --git a/lib/libcflat.h b/lib/libcflat.h
>> > -void va_report_xfail(const char *msg_fmt, bool xfail, bool cond, va_list 
>> > va)
>> > +void va_report(const char *msg_fmt, bool pass, bool xfail, bool skip, 
>> > va_list va)
>> 
>> Line greater than 80 char here. Yes, that was supposed to induce an eye
>> roll. But... this file doesn't have any "long" lines yet, so we could
>> continue avoiding them.

Ah, yeah, young'uns these days ...

>> > diff --git a/run_tests.sh b/run_tests.sh
>> > index fad22a935b00..4d813b9a7084 100755
>> > --- a/run_tests.sh
>> > +++ b/run_tests.sh
>> > @@ -55,12 +55,15 @@ function run()
>> >  # extra_params in the config file may contain backticks that need to 
>> > be
>> >  # expanded, so use eval to start qemu
>> >  eval $cmdline >> test.log
>> > +# The first bit of return value is too hard to use, just skip it.
>> > +# Unit-tests' return value is shifted by one.
>> > +case $(($? >> 1)) in
>> > +0)  echo -ne "\e[32mPASS\e[0m" ;;
>> > +77) echo -ne "skip" ;;
>> 
>> Why not "\e[31mSKIP\e[0m"? (and without those escape sequences echo doesn't
>> need -e)
> 
> oops, copy+paste error, I meant to put use color code 33 (yellow).

Will do.  (I started with yellow SKIP, but then kept white skip as I
didn't research the reason why it was chosen.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 2/6] x86/*: report skipped tests

2015-12-15 Thread Radim Krčmář
2015-12-14 16:07-0600, Andrew Jones:
> On Mon, Dec 14, 2015 at 10:24:17PM +0100, Radim Krčmář wrote:
>> No care to consistency or exhaustivity was given.
>> 
>> (svm-disabled test should be redone and it's weird that x86/hyperv_synic
>>  is about the only one that does report_skip when unsupported.)
>> 
>> Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
>> ---
>> diff --git a/x86/apic.c b/x86/apic.c
>> @@ -55,9 +54,9 @@ static int enable_tsc_deadline_timer(void)
>>  static void test_tsc_deadline_timer(void)
>>  {
>>  if(enable_tsc_deadline_timer()) {
>> -printf("tsc deadline timer enabled\n");
>> +__test_tsc_deadline_timer();
>>  } else {
>> -printf("tsc deadline timer not detected\n");
>> +report_skip("tsc deadline timer not detected\n");
> 
> You probably don't want this '\n' anymore.

Thanks, I should have written that no care at all was given :(
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 0/6] Improve the output of test runners

2015-12-15 Thread Radim Krčmář
2015-12-15 11:19+0100, Paolo Bonzini:
> On 14/12/2015 22:24, Radim Krčmář wrote:
>> This series is a mix of patches that change the output of run_tests.sh
>> and x86-run.  The output of ./run_tests.sh now looks like this:
> 
> I like the idea, thanks!  I agree with Andrew about pretty much
> everything, except that I like having the summary close to PASS/FAIL.

I'm planning a summary without useless informaton in v2, which might be
acceptable for everyone :)

>>> qemu-kvm: Property '.hv-synic' not found
>>> skip hyperv_synic (failed $(echo quit | $qemu -enable-kvm -cpu 
>>> kvm64,hv_synic -device hyperv-testdev -monitor stdio > /dev/null))
> 
> Here I'd just print "failed check".

Ok, thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 3/6] x86/pmu: expect failure with nmi_watchdog

2015-12-15 Thread Radim Krčmář
2015-12-14 16:05-0600, Andrew Jones:
> On Mon, Dec 14, 2015 at 10:24:18PM +0100, Radim Krčmář wrote:
>> diff --git a/x86/pmu.c b/x86/pmu.c
>> @@ -291,7 +292,7 @@ static void check_counters_many(void)
>>  if (!verify_counter([i]))
>>  break;
>>  
>> -report("all counters", i == n);
>> +report_xfail("all counters", host_nmi_watchdog, i == n);
> 
> How about outputting "host_nmi_watchdog=%d" as well?

It's already implied in the output.  Prefix will be XPASS/XFAIL if
host_nmi_watchdog=1 and PASS/FAIL otherwise.

Should it still be explicitly printed?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 4/6] run_tests: generalize check

2015-12-15 Thread Radim Krčmář
2015-12-14 16:11-0600, Andrew Jones:
> On Mon, Dec 14, 2015 at 10:24:19PM +0100, Radim Krčmář wrote:
> > ---
>> diff --git a/run_tests.sh b/run_tests.sh
>> @@ -35,17 +35,10 @@ function run()
>>  return
>>  fi
>>  
>> -# check a file for a particular value before running a test
>> -# the check line can contain multiple files to check separated by a 
>> space
>> -# but each check parameter needs to be of the form =
>> -for check_param in ${check[@]}; do
>> -path=${check_param%%=*}
>> -value=${check_param#*=}
>> -if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then
>> -echo "skip $1 ($path not equal to $value)"
>> -return
>> -fi
>> -done
>> +eval $check || {
>> +echo "skip $1 (failed \$($check))"
>> +return
>> +}
> 
> I think we should use "\e[33mSKIP\e[0m" for skip. Maybe we should create
> pass(),fail(),skip() functions in order to make sure all callers use the
> same prefix with the same color.

Sounds good.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 0/6] Improve the output of test runners

2015-12-15 Thread Radim Krčmář
2015-12-14 16:20-0600, Andrew Jones:
> On Mon, Dec 14, 2015 at 10:24:15PM +0100, Radim Krčmář wrote:
>> > skip vmx (0 tests, 0 unexpected failures)
>> > PASS debug (7 tests, 0 unexpected failures)
>> > qemu-kvm: Property '.hv-synic' not found
>> > skip hyperv_synic (failed $(echo quit | $qemu -enable-kvm -cpu 
>> > kvm64,hv_synic -device hyperv-testdev -monitor stdio > /dev/null))
> 
> I'm not sure I like the (summary) addition. A summary of why we skip
> would be useful, like the (i386 only) stuff, but otherwise it doesn't
> seem necessary, and the "(failed $(echo quit | $qemu -enable-kvm -cpu
> kvm64,hv_synic -device hyperv-testdev -monitor stdio > /dev/null))"
> summary is a bit ugly, wrapping on many terminals.

The summary should help to make callers more certain of what happened
without spending time with tests.log.
I'd like the summary more if we didn't print "0 unexpected failures" in
passing/skipped cases, so maybe we can agree on that?

I'll shorten the failed case.

> Another comment that is series wide is that all these changes need to be
> tested with 'make standalone' (and many of your patches will require
> changes to scripts/mkstandalone.sh, for which you will never forgive
> me for having written :-)

First time I heard about it. :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 0/6] Improve the output of test runners

2015-12-14 Thread Radim Krčmář
This series is a mix of patches that change the output of run_tests.sh
and x86-run.  The output of ./run_tests.sh now looks like this:

> PASS apic (14 tests, 0 unexpected failures)
> PASS ioapic (19 tests, 0 unexpected failures)
> PASS smptest (1 tests, 0 unexpected failures)
> PASS smptest3 (1 tests, 0 unexpected failures)
> PASS vmexit_cpuid 
> PASS vmexit_vmcall 
> PASS vmexit_mov_from_cr8 
> PASS vmexit_mov_to_cr8 
> PASS vmexit_inl_pmtimer 
> PASS vmexit_ipi 
> PASS vmexit_ipi_halt 
> PASS vmexit_ple_round_robin 
> PASS access 
> skip smap (0 tests, 0 unexpected failures)
> skip pku (0 tests, 0 unexpected failures)
> PASS emulator (132 tests, 0 unexpected failures, 1 skipped)
> PASS eventinj (13 tests, 0 unexpected failures)
> PASS hypercall (2 tests, 0 unexpected failures)
> PASS idt_test (4 tests, 0 unexpected failures)
> PASS msr (13 tests, 0 unexpected failures)
> PASS pmu (67 tests, 0 unexpected failures, 1 expected failures)
> PASS port80 
> PASS realmode 
> PASS s3 
> PASS sieve 
> PASS tsc (3 tests, 0 unexpected failures)
> PASS tsc_adjust (5 tests, 0 unexpected failures)
> PASS xsave (17 tests, 0 unexpected failures)
> PASS rmap_chain 
> skip svm (0 tests, 0 unexpected failures)
> skip svm-disabled (0 tests, 0 unexpected failures)
> skip taskswitch (i386 only)
> skip taskswitch2 (i386 only)
> PASS kvmclock_test 
> PASS pcid (3 tests, 0 unexpected failures)
> skip vmx (0 tests, 0 unexpected failures)
> PASS debug (7 tests, 0 unexpected failures)
> qemu-kvm: Property '.hv-synic' not found
> skip hyperv_synic (failed $(echo quit | $qemu -enable-kvm -cpu kvm64,hv_synic 
> -device hyperv-testdev -monitor stdio > /dev/null))

I don't like it too much, but it still seems like an improvement.

The main difference is in extra information for tests using
report_summary(), that smap and pku tests don't fail, pmu test isn't
completely skipped, and svm, vmx, and hyperv_synic are skipped.

The old one looked like this:

> PASS apic
> PASS ioapic
> PASS smptest
> PASS smptest3
> PASS vmexit_cpuid
> PASS vmexit_vmcall
> PASS vmexit_mov_from_cr8
> PASS vmexit_mov_to_cr8
> PASS vmexit_inl_pmtimer
> PASS vmexit_ipi
> PASS vmexit_ipi_halt
> PASS vmexit_ple_round_robin
> PASS access
> FAIL smap
> FAIL pku
> PASS emulator
> PASS eventinj
> PASS hypercall
> PASS idt_test
> PASS msr
> skip pmu (/proc/sys/kernel/nmi_watchdog not equal to 0)
> PASS port80
> PASS realmode
> PASS s3
> PASS sieve
> PASS tsc
> PASS tsc_adjust
> PASS xsave
> PASS rmap_chain
> PASS svm
> PASS svm-disabled
> skip taskswitch (i386 only)
> skip taskswitch2 (i386 only)
> PASS kvmclock_test
> PASS pcid
> PASS vmx
> PASS debug
> qemu-kvm: Property '.hv-synic' not found
> PASS hyperv_synic


Radim Krčmář (6):
  lib/report: allow test skipping
  x86/*: report that the test was skipped
  x86/pmu: expect failure with nmi_watchdog
  run_tests: generalize check
  x86/hyperv_synic: check for support before testing
  run_tests: print summary

 lib/libcflat.h |  1 +
 lib/report.c   | 43 +--
 run_tests.sh   | 35 ++-
 x86/apic.c |  7 +++
 x86/emulator.c |  2 +-
 x86/hyperv_synic.c |  2 +-
 x86/pku.c  |  2 +-
 x86/pmu.c  | 11 +--
 x86/smap.c |  2 +-
 x86/svm.c  |  2 +-
 x86/tsc.c  |  2 +-
 x86/unittests.cfg  |  4 ++--
 12 files changed, 68 insertions(+), 45 deletions(-)

-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 1/6] lib/report: allow test skipping

2015-12-14 Thread Radim Krčmář
This patch allows us to explicitly mark a unit-test as skipped.
If all unit-tests were skipped, the whole test is reported as skipped as
well.  This also includes the case where no report()s were done, but
the test still ended with report_summary().

When the whole test is skipped, ./run_tests.sh prints "skip" instead of
green "PASS".

Return value of 77 is used to please Autotools.  I also renamed few
things in reporting code and chose to refactor a logic while at it.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 lib/libcflat.h |  1 +
 lib/report.c   | 43 +--
 run_tests.sh   | 13 -
 3 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/lib/libcflat.h b/lib/libcflat.h
index 9747ccdbc9f1..070818354ee1 100644
--- a/lib/libcflat.h
+++ b/lib/libcflat.h
@@ -61,6 +61,7 @@ void report_prefix_push(const char *prefix);
 void report_prefix_pop(void);
 void report(const char *msg_fmt, bool pass, ...);
 void report_xfail(const char *msg_fmt, bool xfail, bool pass, ...);
+void report_skip(const char *msg_fmt, ...);
 int report_summary(void);
 
 #define ARRAY_SIZE(_a) (sizeof(_a)/sizeof((_a)[0]))
diff --git a/lib/report.c b/lib/report.c
index 35e664108a92..e07baa347298 100644
--- a/lib/report.c
+++ b/lib/report.c
@@ -13,7 +13,7 @@
 #include "libcflat.h"
 #include "asm/spinlock.h"
 
-static unsigned int tests, failures, xfailures;
+static unsigned int tests, failures, xfailures, skipped;
 static char prefixes[256];
 static struct spinlock lock;
 
@@ -43,25 +43,27 @@ void report_prefix_pop(void)
spin_unlock();
 }
 
-void va_report_xfail(const char *msg_fmt, bool xfail, bool cond, va_list va)
+void va_report(const char *msg_fmt, bool pass, bool xfail, bool skip, va_list 
va)
 {
-   char *pass = xfail ? "XPASS" : "PASS";
-   char *fail = xfail ? "XFAIL" : "FAIL";
char buf[2000];
+   char *prefix = skip ? "SKIP"
+   : xfail ? (pass ? "XPASS" : "XFAIL")
+   : (pass ? "PASS"  : "FAIL");
 
spin_lock();
 
tests++;
-   printf("%s: ", cond ? pass : fail);
+   printf("%s: ", prefix);
puts(prefixes);
vsnprintf(buf, sizeof(buf), msg_fmt, va);
puts(buf);
puts("\n");
-   if (xfail && cond)
-   failures++;
-   else if (xfail)
+
+   if (skip)
+   skipped++;
+   else if (xfail && !pass)
xfailures++;
-   else if (!cond)
+   else if (xfail || !pass)
failures++;
 
spin_unlock();
@@ -71,7 +73,7 @@ void report(const char *msg_fmt, bool pass, ...)
 {
va_list va;
va_start(va, pass);
-   va_report_xfail(msg_fmt, false, pass, va);
+   va_report(msg_fmt, pass, false, false, va);
va_end(va);
 }
 
@@ -79,7 +81,15 @@ void report_xfail(const char *msg_fmt, bool xfail, bool 
pass, ...)
 {
va_list va;
va_start(va, pass);
-   va_report_xfail(msg_fmt, xfail, pass, va);
+   va_report(msg_fmt, pass, xfail, false, va);
+   va_end(va);
+}
+
+void report_skip(const char *msg_fmt, ...)
+{
+   va_list va;
+   va_start(va, msg_fmt);
+   va_report(msg_fmt, false, false, true, va);
va_end(va);
 }
 
@@ -89,9 +99,14 @@ int report_summary(void)
 
printf("\nSUMMARY: %d tests, %d unexpected failures", tests, failures);
if (xfailures)
-   printf(", %d expected failures\n", xfailures);
-   else
-   printf("\n");
+   printf(", %d expected failures", xfailures);
+   if (skipped)
+   printf(", %d skipped", skipped);
+   printf("\n");
+
+   if (tests == skipped)
+   return 77; /* blame AUTOTOOLS */
+
return failures > 0 ? 1 : 0;
 
spin_unlock();
diff --git a/run_tests.sh b/run_tests.sh
index fad22a935b00..4d813b9a7084 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -55,12 +55,15 @@ function run()
 # extra_params in the config file may contain backticks that need to be
 # expanded, so use eval to start qemu
 eval $cmdline >> test.log
+# The first bit of return value is too hard to use, just skip it.
+# Unit-tests' return value is shifted by one.
+case $(($? >> 1)) in
+0)  echo -ne "\e[32mPASS\e[0m" ;;
+77) echo -ne "skip" ;;
+*)  echo -ne "\e[31mFAIL\e[0m"
+esac
 
-if [ $? -le 1 ]; then
-echo -e "\e[32mPASS\e[0m $1"
-else
-echo -e "\e[31mFAIL\e[0m $1"
-fi
+echo " $1"
 }
 
 function usage()
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 2/6] x86/*: report skipped tests

2015-12-14 Thread Radim Krčmář
No care to consistency or exhaustivity was given.

(svm-disabled test should be redone and it's weird that x86/hyperv_synic
 is about the only one that does report_skip when unsupported.)

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 x86/apic.c | 7 +++
 x86/emulator.c | 2 +-
 x86/hyperv_synic.c | 2 +-
 x86/pku.c  | 2 +-
 x86/pmu.c  | 2 +-
 x86/smap.c | 2 +-
 x86/svm.c  | 2 +-
 x86/tsc.c  | 2 +-
 8 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/x86/apic.c b/x86/apic.c
index d4eec529e535..57af86de8f8c 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -27,7 +27,7 @@ static void tsc_deadline_timer_isr(isr_regs_t *regs)
 ++tdt_count;
 }
 
-static void start_tsc_deadline_timer(void)
+static void __test_tsc_deadline_timer(void)
 {
 handle_irq(TSC_DEADLINE_TIMER_VECTOR, tsc_deadline_timer_isr);
 irq_enable();
@@ -45,7 +45,6 @@ static int enable_tsc_deadline_timer(void)
 if (cpuid(1).c & (1 << 24)) {
 lvtt = TSC_DEADLINE_TIMER_MODE | TSC_DEADLINE_TIMER_VECTOR;
 apic_write(APIC_LVTT, lvtt);
-start_tsc_deadline_timer();
 return 1;
 } else {
 return 0;
@@ -55,9 +54,9 @@ static int enable_tsc_deadline_timer(void)
 static void test_tsc_deadline_timer(void)
 {
 if(enable_tsc_deadline_timer()) {
-printf("tsc deadline timer enabled\n");
+__test_tsc_deadline_timer();
 } else {
-printf("tsc deadline timer not detected\n");
+report_skip("tsc deadline timer not detected\n");
 }
 }
 
diff --git a/x86/emulator.c b/x86/emulator.c
index e5c1c6b9a2f3..b64a5fe0f3dc 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -1062,7 +1062,7 @@ static void illegal_movbe_handler(struct ex_regs *regs)
 static void test_illegal_movbe(void)
 {
if (!(cpuid(1).c & (1 << 22))) {
-   printf("SKIP: illegal movbe\n");
+   report_skip("illegal movbe");
return;
}
 
diff --git a/x86/hyperv_synic.c b/x86/hyperv_synic.c
index 18d1295bfb37..602b79392bfd 100644
--- a/x86/hyperv_synic.c
+++ b/x86/hyperv_synic.c
@@ -228,7 +228,7 @@ int main(int ac, char **av)
 
 report("Hyper-V SynIC test", ok);
 } else {
-report("Hyper-V SynIC is not supported", true);
+report_skip("Hyper-V SynIC is not supported");
 }
 
 return report_summary();
diff --git a/x86/pku.c b/x86/pku.c
index 0e00b9984d70..58971d21ed05 100644
--- a/x86/pku.c
+++ b/x86/pku.c
@@ -91,7 +91,7 @@ int main(int ac, char **av)
 
 if (!(cpuid_indexed(7, 0).c & (1 << X86_FEATURE_PKU))) {
 printf("PKU not enabled, exiting\n");
-exit(1);
+return report_summary();
 }
 
 setup_vm();
diff --git a/x86/pmu.c b/x86/pmu.c
index 03f80190bb25..c68980044dee 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -387,7 +387,7 @@ int main(int ac, char **av)
 
if (!eax.split.version_id) {
printf("No pmu is detected!\n");
-   return 1;
+   return report_summary();
}
printf("PMU version: %d\n", eax.split.version_id);
printf("GP counters: %d\n", eax.split.num_counters);
diff --git a/x86/smap.c b/x86/smap.c
index d8a7ae82dc00..0aa44054bd48 100644
--- a/x86/smap.c
+++ b/x86/smap.c
@@ -93,7 +93,7 @@ int main(int ac, char **av)
 
if (!(cpuid_indexed(7, 0).b & (1 << X86_FEATURE_SMAP))) {
printf("SMAP not enabled, exiting\n");
-   exit(1);
+   return report_summary();
}
 
setup_vm();
diff --git a/x86/svm.c b/x86/svm.c
index 1046ddf73732..ff1a0f34b4bf 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -1064,7 +1064,7 @@ int main(int ac, char **av)
 
 if (!(cpuid(0x8001).c & 4)) {
 printf("SVM not availble\n");
-return 0;
+return report_summary();
 }
 
 setup_svm();
diff --git a/x86/tsc.c b/x86/tsc.c
index c71dc2a7abe0..ee247459fb42 100644
--- a/x86/tsc.c
+++ b/x86/tsc.c
@@ -43,5 +43,5 @@ int main()
test_rdtscp(0x100);
} else
printf("rdtscp not supported\n");
-   return 0;
+   return report_summary();
 }
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 5/6] x86/hyperv_synic: check for support before testing

2015-12-14 Thread Radim Krčmář
It's not easy to distinguish successful unit-test from failed QEMU, so
we check for presence of the needed feature before hand.

x86/hyperv_synic is "skip" instead of "PASS", without KVM/QEMU support.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 Probably not the nicest solution around ...
 "dry_run_qemu_and_skip_if_it_fails=true" parameter in unittest.cfg
 might be better.

 x86/unittests.cfg | 1 +
 1 file changed, 1 insertion(+)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 6b94ad93dcf0..32fc5f7037e1 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -182,3 +182,4 @@ arch = x86_64
 file = hyperv_synic.flat
 smp = 2
 extra_params = -cpu kvm64,hv_synic -device hyperv-testdev
+check = echo quit | $qemu -enable-kvm -cpu kvm64,hv_synic -device 
hyperv-testdev -monitor stdio > /dev/null
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 4/6] run_tests: generalize check

2015-12-14 Thread Radim Krčmář
config attribute "check" is currently unused.
Provide a simple implementation instead of removing it.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 run_tests.sh | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/run_tests.sh b/run_tests.sh
index 4d813b9a7084..b0b064f2e341 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -35,17 +35,10 @@ function run()
 return
 fi
 
-# check a file for a particular value before running a test
-# the check line can contain multiple files to check separated by a space
-# but each check parameter needs to be of the form =
-for check_param in ${check[@]}; do
-path=${check_param%%=*}
-value=${check_param#*=}
-if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then
-echo "skip $1 ($path not equal to $value)"
-return
-fi
-done
+eval $check || {
+echo "skip $1 (failed \$($check))"
+return
+}
 
 cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp $smp 
$opts"
 if [ $verbose != 0 ]; then
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 3/6] x86/pmu: expect failure with nmi_watchdog

2015-12-14 Thread Radim Krčmář
Host's nmi_watchdog takes one slot, making the "all counters" unit-test
fail.  We know exactly what happens, mark it as expected failure.

PMU test is now executed regardless of host_nmi_watchdog.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 x86/pmu.c | 9 -
 x86/unittests.cfg | 3 +--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index c68980044dee..4ca93235b977 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -92,6 +92,7 @@ struct pmu_event {
 };
 
 static int num_counters;
+bool host_nmi_watchdog;
 
 char *buf;
 
@@ -291,7 +292,7 @@ static void check_counters_many(void)
if (!verify_counter([i]))
break;
 
-   report("all counters", i == n);
+   report_xfail("all counters", host_nmi_watchdog, i == n);
 }
 
 static void check_counter_overflow(void)
@@ -374,6 +375,7 @@ static void check_rdpmc(void)
 
 int main(int ac, char **av)
 {
+   int i;
struct cpuid id = cpuid(10);
 
setup_vm();
@@ -385,6 +387,11 @@ int main(int ac, char **av)
ebx.full = id.b;
edx.full = id.d;
 
+   /* XXX: horrible command line parsing */
+   for (i = 1; i < ac; i++)
+   if (!strcmp(av[i], "host_nmi_watchdog=1"))
+   host_nmi_watchdog = true;
+
if (!eax.split.version_id) {
printf("No pmu is detected!\n");
return report_summary();
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index c15c86df..6b94ad93dcf0 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -106,8 +106,7 @@ file = msr.flat
 
 [pmu]
 file = pmu.flat
-extra_params = -cpu host
-check = /proc/sys/kernel/nmi_watchdog=0
+extra_params = -cpu host -append "host_nmi_watchdog=`cat 
/proc/sys/kernel/nmi_watchdog`"
 
 [port80]
 file = port80.flat
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 6/6] run_tests: print summary

2015-12-14 Thread Radim Krčmář
Add shortened SUMMARY line into the output (in parentheses).

Might be interesting and hopefully won't break too many scripts.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 run_tests.sh | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/run_tests.sh b/run_tests.sh
index b0b064f2e341..d0b282f7b079 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -21,6 +21,7 @@ function run()
 local arch="$6"
 local check="$7"
 local accel="$8"
+local output
 
 if [ -z "$testname" ]; then
 return
@@ -47,7 +48,7 @@ function run()
 
 # extra_params in the config file may contain backticks that need to be
 # expanded, so use eval to start qemu
-eval $cmdline >> test.log
+output=`eval $cmdline`
 # The first bit of return value is too hard to use, just skip it.
 # Unit-tests' return value is shifted by one.
 case $(($? >> 1)) in
@@ -56,7 +57,11 @@ function run()
 *)  echo -ne "\e[31mFAIL\e[0m"
 esac
 
-echo " $1"
+echo -n " $1 "
+
+printf "%s\n" "$output" |
+  tee -a test.log |
+  sed 'x;$s/^SUMMARY: //;ta;$s/.*//p;d;:a s/^/(/;s/$/)/'
 }
 
 function usage()
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-12-09 Thread Radim Krčmář
2015-12-09 08:19+, Wu, Feng:
>> -Original Message-
>> From: Radim Krčmář [mailto:rkrc...@redhat.com]
>> Sent: Tuesday, November 17, 2015 3:03 AM
>> To: Wu, Feng <feng...@intel.com>
>> Cc: pbonz...@redhat.com; kvm@vger.kernel.org; linux-ker...@vger.kernel.org
>> Subject: Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-
>> interrupts
>> 
>> 2015-11-09 10:46+0800, Feng Wu:
>> > +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
>> > +struct kvm_lapic_irq *irq)
>> > +
>> > +{
>> > +  unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
>> > +  unsigned int dest_vcpus = 0;
>> > +  struct kvm_vcpu *vcpu;
>> > +  unsigned int i, mod, idx = 0;
>> > +
>> > +  vcpu = kvm_intr_vector_hashing_dest_fast(kvm, irq);
>> > +  if (vcpu)
>> > +  return vcpu;
>> 
>> I think the rest of this function shouldn't be implemented:
>>  - Shorthands are only for IPIs and hence don't need to be handled,
>>  - Lowest priority physical broadcast is not supported,
>>  - Lowest priority cluster logical broadcast is not supported,
>>  - No point in optimizing mixed xAPIC and x2APIC mode,
> 
> I read your comments again, and don't quite understand why we
> don't need PI optimization for mixed xAPIC and x2APIC mode.

There shouldn't be a non-hobbyist operating system that uses mixed mode,
so the optimization would practically be dead code as all other cases
are handled by kvm_intr_vector_hashing_dest_fast().

I think that having extra code would bring problems in the future -- we
need to take care of it when refactoring KVM's APIC and we should also
write a unit-test for this otherwise dead path.  I don't think that the
benefit for guests would ever balance those efforts.

(Physical xAPIC+x2APIC mode is still somewhat reasonable and xAPIC CPUs
 start with LDR=0, which means that operating system doesn't need to
 utilize mixed mode, as defined by KVM, when switching to x2APIC.)

> BTW, can we have mixed flat and cluster mode?

Yes, KVM recognizes that mixed mode, but luckily, there are severe
limitations.

Notes below SDM section 10.6.2.2:
  All processors that have their APIC software enabled (using the
  spurious vector enable/disable bit) must have their DFRs (Destination
  Format Registers) programmed identically.

I hope there isn't a human that would use it in good faith.

(Only NMI/SMI/INIT/SIPI are delivered in software disabled mode and if
 the system uses cluster xAPIC, OS should set DFR before LDR, which
 doesn't trigger mixed mode either.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-25 Thread Radim Krčmář
2015-11-25 15:38+0100, Paolo Bonzini:
> On 25/11/2015 15:12, Radim Krcmár wrote:
>> I think it's ok to pick any algorithm we like.  It's unlikely that
>> software would recognize and take advantage of the hardware algorithm
>> without adding a special treatment for KVM.
>> (I'd vote for the simple pick-first-APIC lowest priority algorithm ...
>>  I don't see much point in complicating lowest priority when it doesn't
>>  deliver to lowest priority CPU anyway.)
> 
> Vector hashing is an improvement for the common case where all vectors
> are set to all CPUs.  Sure you can get an unlucky assignment, but it's
> still better than pick-first-APIC.

Yeah, hashing has a valid use case, but a subtle weighting of drawbacks
led me to prefer pick-first-APIC ...

(I'd prefer to have simple code in KVM and depend on static IRQ balancing
 in a guest to handle the distribution.
 The guest could get the unlucky assignment anyway, so it should be
 prepared;  and hashing just made KVM worse in that case.  Guests might
 also configure physical x(2)APIC, where is no lowest priority.
 And if the guest doesn't do anything with IRQs, then it might not even
 care about the impact that our choice has.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-24 Thread Radim Krčmář
2015-11-24 01:26+, Wu, Feng:
> "I don't think we do any vector hashing on our client parts.  This may be why 
> the customer is not able to detect this on Skylake client silicon.
> The vector hashing is micro-architectural and something we had done on server 
> parts.
> 
> If you look at the haswell server CPU spec 
> (https://www-ssl.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf)
> In section 4.1.2, you will see an IntControl register (this is a register 
> controlled/configured by BIOS) - see below.

Thank you!

> If you look at bits 6:4 in that register, you see the option we offer in 
> hardware for what kind of redirection is applied to lowest priority 
> interrupts.
> There are three options:
> 1.Fixed priority  
> 2.Redirect last 
> 3.Hash Vector
> 
> If picking vector hash, then bits 10:8 specifies the APIC-ID bits used for 
> the hashing."

The hash function just interprets a subset of vector's bits as a number
and uses that as a starting offset in a search for an enabled APIC
within the destination set?

For example:
The x2APIC destination is 0x0055 (= first four even APICs in cluster
0), the vector is 0b1110, and bits 10:8 of IntControl are 000.

000 means that bits 7:4 of vector are selected, thus the vector hash is
0b1110 = 14, so the round-robin effectively does 14 % 4 (because we only
have 4 destinations) and delivers to the 3rd possible APIC (= ID 6)?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-24 Thread Radim Krčmář
2015-11-24 15:31+0100, Radim Krčmář:
> 000 means that bits 7:4 of vector are selected, thus the vector hash is
> 0b1110 = 14, so the round-robin effectively does 14 % 4 (because we only
> have 4 destinations) and delivers to the 3rd possible APIC (= ID 6)?

Ah, 3rd APIC in the set has ID 4, of course :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

2015-11-16 Thread Radim Krčmář
2015-11-09 10:46+0800, Feng Wu:
> Use vector-hashing to handle lowest-priority interrupts for
> posted-interrupts. As an example, modern Intel CPUs use this
> method to handle lowest-priority interrupts.

(I don't think it's a good idea that the algorithm differs from non-PI
 lowest priority delivery.  I'd make them both vector-hashing, which
 would be "fun" to explain to people expecting round robin ...)

> Signed-off-by: Feng Wu 
> ---
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> +/*
> + * This routine handles lowest-priority interrupts using vector-hashing
> + * mechanism. As an example, modern Intel CPUs use this method to handle
> + * lowest-priority interrupts.
> + *
> + * Here is the details about the vector-hashing mechanism:
> + * 1. For lowest-priority interrupts, store all the possible destination
> + *vCPUs in an array.
> + * 2. Use "guest vector % max number of destination vCPUs" to find the right
> + *destination vCPU in the array for the lowest-priority interrupt.
> + */

(Is Skylake i7-6700 a modern Intel CPU?
 I didn't manage to get hashing ... all interrupts always went to the
 lowest APIC ID in the set :/
 Is there a simple way to verify the algorithm?)

> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
> +   struct kvm_lapic_irq *irq)
> +
> +{
> + unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
> + unsigned int dest_vcpus = 0;
> + struct kvm_vcpu *vcpu;
> + unsigned int i, mod, idx = 0;
> +
> + vcpu = kvm_intr_vector_hashing_dest_fast(kvm, irq);
> + if (vcpu)
> + return vcpu;

I think the rest of this function shouldn't be implemented:
 - Shorthands are only for IPIs and hence don't need to be handled,
 - Lowest priority physical broadcast is not supported,
 - Lowest priority cluster logical broadcast is not supported,
 - No point in optimizing mixed xAPIC and x2APIC mode,
 - The rest is handled by kvm_intr_vector_hashing_dest_fast().
   (Even lowest priority flat logical "broadcast".)
 - We do the work twice when vcpu == NULL means that there is no
   matching destination.

Is there a valid case that can be resolved by going through all vcpus?

> +
> + memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
> +
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + if (!kvm_apic_present(vcpu))
> + continue;
> +
> + if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
> + irq->dest_id, irq->dest_mode))
> + continue;
> +
> + __set_bit(vcpu->vcpu_id, dest_vcpu_bitmap);
> + dest_vcpus++;
> + }
> +
> + if (dest_vcpus == 0)
> + return NULL;
> +
> + mod = irq->vector % dest_vcpus;
> +
> + for (i = 0; i <= mod; i++) {
> + idx = find_next_bit(dest_vcpu_bitmap, KVM_MAX_VCPUS, idx) + 1;
> + BUG_ON(idx >= KVM_MAX_VCPUS);
> + }
> +
> + return kvm_get_vcpu(kvm, idx - 1);
> +}
> +EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest);
> +
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> @@ -816,6 +816,63 @@ out:
> +struct kvm_vcpu *kvm_intr_vector_hashing_dest_fast(struct kvm *kvm,
> +struct kvm_lapic_irq *irq)

We now have three very similar functions :(

  kvm_irq_delivery_to_apic_fast
  kvm_intr_is_single_vcpu_fast
  kvm_intr_vector_hashing_dest_fast

By utilizing the gcc optimizer, they can be merged without introducing
many instructions to the hot path, kvm_irq_delivery_to_apic_fast.
(I would eventually do it, so you can save time by ignoring this.)

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: fix SMEP and SMAP without EPT

2015-11-02 Thread Radim Krčmář
The comment in code had it mostly right, but we enable paging for
emulated real mode regardless of EPT.

Without EPT (which implies emulated real mode), secondary VCPUs won't
start unless we disable SM[AE]P when the guest doesn't use paging.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 arch/x86/kvm/vmx.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b680c2e0e8a3..ab598558a7a4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3788,20 +3788,21 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned 
long cr4)
if (!is_paging(vcpu)) {
hw_cr4 &= ~X86_CR4_PAE;
hw_cr4 |= X86_CR4_PSE;
-   /*
-* SMEP/SMAP is disabled if CPU is in non-paging mode
-* in hardware. However KVM always uses paging mode to
-* emulate guest non-paging mode with TDP.
-* To emulate this behavior, SMEP/SMAP needs to be
-* manually disabled when guest switches to non-paging
-* mode.
-*/
-   hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
} else if (!(cr4 & X86_CR4_PAE)) {
hw_cr4 &= ~X86_CR4_PAE;
}
}
 
+   if (!enable_unrestricted_guest && !is_paging(vcpu))
+   /*
+* SMEP/SMAP is disabled if CPU is in non-paging mode in
+* hardware.  However KVM always uses paging mode without
+* unrestricted guest.
+* To emulate this behavior, SMEP/SMAP needs to be manually
+* disabled when guest switches to non-paging mode.
+*/
+   hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP);
+
vmcs_writel(CR4_READ_SHADOW, cr4);
vmcs_writel(GUEST_CR4, hw_cr4);
return 0;
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: zero apic_arb_prio on reset

2015-10-30 Thread Radim Krčmář
BSP doesn't get INIT so its apic_arb_prio isn't zeroed after reboot.
BSP won't get lowest priority interrupts until other VCPUs get enough
interrupts to match their pre-reboot apic_arb_prio.

That behavior doesn't fit into KVM's round-robin-like interpretation of
lowest priority delivery ... userspace should KVM_SET_LAPIC on reset, so
just zero apic_arb_prio there.

Reported-by: Yuki Shibuya <shibuya...@ncos.nec.co.jp>
Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 arch/x86/kvm/lapic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1b02c44c7b8b..08655020417d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1924,6 +1924,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
kvm_make_request(KVM_REQ_EVENT, vcpu);
if (ioapic_in_kernel(vcpu->kvm))
kvm_rtc_eoi_tracking_restore_one(vcpu);
+
+   vcpu->arch.apic_arb_prio = 0;
 }
 
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix RSM into 64-bit protected mode, round 2

2015-10-30 Thread Radim Krčmář
2015-10-26 17:32+0100, Paolo Bonzini:
> On 26/10/2015 16:43, Laszlo Ersek wrote:
>>> The code would be cleaner if we had a different approach, but this works
>>> too and is safer for stable. In case you prefer to leave the rewrite for
>>> a future victim,
>> 
>> It's hard to express how much I prefer that.
> 
> Radim, if you want to have a try go ahead since I cannot apply the patch
> until next Monday.

The future I originally had in mind was more hoverboardy, but a series
just landed, "KVM: x86: simplify RSM into 64-bit protected mode".

Laszlo, I'd be grateful if you could check that it works.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] KVM: x86: add read_phys to x86_emulate_ops

2015-10-30 Thread Radim Krčmář
We want to read the physical memory when emulating RSM.

X86EMUL_IO_NEEDED is returned on all errors for consistency with other
helpers.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h | 10 ++
 arch/x86/kvm/x86.c | 10 ++
 2 files changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index e16466ec473c..96f1d1c5e6cb 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -112,6 +112,16 @@ struct x86_emulate_ops {
struct x86_exception *fault);
 
/*
+* read_phys: Read bytes of standard (non-emulated/special) memory.
+*Used for descriptor reading.
+*  @addr:  [IN ] Physical address from which to read.
+*  @val:   [OUT] Value read from memory.
+*  @bytes: [IN ] Number of bytes to read from memory.
+*/
+   int (*read_phys)(struct x86_emulate_ctxt *ctxt, unsigned long addr,
+   void *val, unsigned int bytes);
+
+   /*
 * write_std: Write bytes of standard (non-emulated/special) memory.
 *Used for descriptor writing.
 *  @addr:  [IN ] Linear address to which to write.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 441cb9d4ec8a..ae5af651af89 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4097,6 +4097,15 @@ static int kvm_read_guest_virt_system(struct 
x86_emulate_ctxt *ctxt,
return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, exception);
 }
 
+static int kvm_read_guest_phys_system(struct x86_emulate_ctxt *ctxt,
+   unsigned long addr, void *val, unsigned int bytes)
+{
+   struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+   int r = kvm_vcpu_read_guest(vcpu, addr, val, bytes);
+
+   return r < 0 ? X86EMUL_IO_NEEDED : X86EMUL_CONTINUE;
+}
+
 int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
   gva_t addr, void *val,
   unsigned int bytes,
@@ -4832,6 +4841,7 @@ static const struct x86_emulate_ops emulate_ops = {
.write_gpr   = emulator_write_gpr,
.read_std= kvm_read_guest_virt_system,
.write_std   = kvm_write_guest_virt_system,
+   .read_phys   = kvm_read_guest_phys_system,
.fetch   = kvm_fetch_guest_virt,
.read_emulated   = emulator_read_emulated,
.write_emulated  = emulator_write_emulated,
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] KVM: x86: simplify RSM into 64-bit protected mode

2015-10-30 Thread Radim Krčmář
This series bases on "KVM: x86: fix RSM into 64-bit protected mode,
round 2" and reverts it in [3/3].  To avoid regressions after doing so,
[1/2] introduces a helper that is used in [2/2] to hopefully get the
same behavior.

I'll set up test environment next week, unless a random act of kindness
allows me not to.


Radim Krčmář (3):
  KVM: x86: add read_phys to x86_emulate_ops
  KVM: x86: handle SMBASE as physical address in RSM
  KVM: x86: simplify RSM into 64-bit protected mode

 arch/x86/include/asm/kvm_emulate.h | 10 +
 arch/x86/kvm/emulate.c | 44 +-
 arch/x86/kvm/x86.c | 10 +
 3 files changed, 30 insertions(+), 34 deletions(-)

-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] KVM: x86: handle SMBASE as physical address in RSM

2015-10-30 Thread Radim Krčmář
GET_SMSTATE depends on real mode to ensure that smbase+offset is treated
as a physical address, which has already caused a bug after shuffling
the code.  Enforce physical addressing.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 arch/x86/kvm/emulate.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 25e16b6f6ffa..59e80e0de865 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2272,8 +2272,8 @@ static int emulator_has_longmode(struct x86_emulate_ctxt 
*ctxt)
 #define GET_SMSTATE(type, smbase, offset)\
({\
 type __val;  \
-int r = ctxt->ops->read_std(ctxt, smbase + offset, &__val,   \
-sizeof(__val), NULL);\
+int r = ctxt->ops->read_phys(ctxt, smbase + offset, &__val,  \
+ sizeof(__val)); \
 if (r != X86EMUL_CONTINUE)   \
 return X86EMUL_UNHANDLEABLE; \
 __val;   \
@@ -2507,8 +2507,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
 
/*
 * Get back to real mode, to prepare a safe state in which to load
-* CR0/CR3/CR4/EFER.  Also this will ensure that addresses passed
-* to read_std/write_std are not virtual.
+* CR0/CR3/CR4/EFER.
 *
 * CR4.PCIDE must be zero, because it is a 64-bit mode only feature.
 */
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] KVM: x86: simplify RSM into 64-bit protected mode

2015-10-30 Thread Radim Krčmář
This reverts 0123456789abc ("KVM: x86: fix RSM into 64-bit protected
mode, round 2").  We've achieved the same by treating SMBASE as a
physical address in the previous patch.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 arch/x86/kvm/emulate.c | 37 +++--
 1 file changed, 7 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 59e80e0de865..b60fed56671b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2311,16 +2311,7 @@ static int rsm_load_seg_32(struct x86_emulate_ctxt 
*ctxt, u64 smbase, int n)
return X86EMUL_CONTINUE;
 }
 
-struct rsm_stashed_seg_64 {
-   u16 selector;
-   struct desc_struct desc;
-   u32 base3;
-};
-
-static int rsm_stash_seg_64(struct x86_emulate_ctxt *ctxt,
-   struct rsm_stashed_seg_64 *stash,
-   u64 smbase,
-   int n)
+static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, u64 smbase, int n)
 {
struct desc_struct desc;
int offset;
@@ -2335,20 +2326,10 @@ static int rsm_stash_seg_64(struct x86_emulate_ctxt 
*ctxt,
set_desc_base(,  GET_SMSTATE(u32, smbase, offset + 8));
base3 =   GET_SMSTATE(u32, smbase, offset + 12);
 
-   stash[n].selector = selector;
-   stash[n].desc = desc;
-   stash[n].base3 = base3;
+   ctxt->ops->set_segment(ctxt, selector, , base3, n);
return X86EMUL_CONTINUE;
 }
 
-static inline void rsm_load_seg_64(struct x86_emulate_ctxt *ctxt,
-  struct rsm_stashed_seg_64 *stash,
-  int n)
-{
-   ctxt->ops->set_segment(ctxt, stash[n].selector, [n].desc,
-  stash[n].base3, n);
-}
-
 static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
 u64 cr0, u64 cr4)
 {
@@ -2438,7 +2419,6 @@ static int rsm_load_state_64(struct x86_emulate_ctxt 
*ctxt, u64 smbase)
u32 base3;
u16 selector;
int i, r;
-   struct rsm_stashed_seg_64 stash[6];
 
for (i = 0; i < 16; i++)
*reg_write(ctxt, i) = GET_SMSTATE(u64, smbase, 0x7ff8 - i * 8);
@@ -2480,18 +2460,15 @@ static int rsm_load_state_64(struct x86_emulate_ctxt 
*ctxt, u64 smbase)
dt.address =GET_SMSTATE(u64, smbase, 0x7e68);
ctxt->ops->set_gdt(ctxt, );
 
-   for (i = 0; i < ARRAY_SIZE(stash); i++) {
-   r = rsm_stash_seg_64(ctxt, stash, smbase, i);
-   if (r != X86EMUL_CONTINUE)
-   return r;
-   }
-
r = rsm_enter_protected_mode(ctxt, cr0, cr4);
if (r != X86EMUL_CONTINUE)
return r;
 
-   for (i = 0; i < ARRAY_SIZE(stash); i++)
-   rsm_load_seg_64(ctxt, stash, i);
+   for (i = 0; i < 6; i++) {
+   r = rsm_load_seg_64(ctxt, smbase, i);
+   if (r != X86EMUL_CONTINUE)
+   return r;
+   }
 
return X86EMUL_CONTINUE;
 }
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix RSM into 64-bit protected mode, round 2

2015-10-26 Thread Radim Krčmář
2015-10-23 23:43+0200, Laszlo Ersek:
> Commit b10d92a54dac ("KVM: x86: fix RSM into 64-bit protected mode")
> reordered the rsm_load_seg_64() and rsm_enter_protected_mode() calls,
> relative to each other. The argument that said commit made was correct,
> however putting rsm_enter_protected_mode() first whole-sale violated the
> following (correct) invariant from em_rsm():
> 
>  * Get back to real mode, to prepare a safe state in which to load
>  * CR0/CR3/CR4/EFER.  Also this will ensure that addresses passed
>  * to read_std/write_std are not virtual.

Nice catch.

> Namely, rsm_enter_protected_mode() may re-enable paging, *after* which
> 
>   rsm_load_seg_64()
> GET_SMSTATE()
>   read_std()
> 
> will try to interpret the (smbase + offset) address as a virtual one. This
> will result in unexpected page faults being injected to the guest in
> response to the RSM instruction.

I think this is a good time to introduce the read_phys helper, which we
wanted to avoid with that assumption.

> Split rsm_load_seg_64() in two parts:
> 
> - The first part, rsm_stash_seg_64(), shall call GET_SMSTATE() while in
>   real mode, and save the relevant state off SMRAM into an array local to
>   rsm_load_state_64().
> 
> - The second part, rsm_load_seg_64(), shall occur after entering protected
>   mode, but the segment details shall come from the local array, not the
>   guest's SMRAM.
> 
> Fixes: b10d92a54dac25a6152f1aa1ffc95c12908035ce
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Radim Krčmář <rkrc...@redhat.com>
> Cc: Jordan Justen <jordan.l.jus...@intel.com>
> Cc: Michael Kinney <michael.d.kin...@intel.com>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Laszlo Ersek <ler...@redhat.com>
> ---

The code would be cleaner if we had a different approach, but this works
too and is safer for stable. In case you prefer to leave the rewrite for
a future victim,

Reviewed-by: Radim Krčmář <rkrc...@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm: x86: zero EFER on INIT

2015-10-19 Thread Radim Krčmář
2015-10-19 11:35+0200, Paolo Bonzini:
> Not zeroing EFER means that a 32-bit firmware cannot enter paging mode
> without clearing EFER.LME first (which it should not know about).
> Yang Zhang from Intel confirmed that the manual is wrong and EFER is
> cleared to zero on INIT.

Also in APM v2, "Table 14-1. Initial Processor State.".

> 
> Fixes: d28bc9dd25ce023270d2e039e7c98d38ecbf7758
> Cc: sta...@vger.kernel.org
> Cc: Yang Z Zhang <yang.z.zh...@intel.com>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---

Reviewed-by: Radim Krčmář <rkrc...@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region

2015-10-13 Thread Radim Krčmář
2015-10-12 14:09+0200, Paolo Bonzini:
> The next patch will make x86_set_memory_region fill the
> userspace_addr.  Since the struct is not used untouched
> anymore, it makes sense to build it in x86_set_memory_region
> directly; it also simplifies the callers.
> 
> Reported-by: Alexandre DERUMIER <aderum...@odiso.com>
> Cc: sta...@vger.kernel.org
> Fixes: 9da0e4d5ac969909f6b435ce28ea28135a9cbd69
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> @@ -1199,9 +1199,7 @@ void kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int 
> err);
> -int __x86_set_memory_region(struct kvm *kvm,
> - const struct kvm_userspace_memory_region *mem);
> -int x86_set_memory_region(struct kvm *kvm,
> -   const struct kvm_userspace_memory_region *mem);
> +int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size);
> +int x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size);

kvm_userspace_memory_region has u64 size, but we only use this for few
pages anyway and will hopefully get a warning from GCC if that changes.

Patch makes the code much better,

Reviewed-by: Radim Krčmář <rkrc...@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: map/unmap private slots in __x86_set_memory_region

2015-10-13 Thread Radim Krčmář
2015-10-12 14:09+0200, Paolo Bonzini:
> Otherwise, two copies (one of them never used and thus bogus) are
> allocated for the regular and SMM address spaces.  This breaks
> SMM with EPT but without unrestricted guest support, because the
> SMM copy of the identity page map is all zeros.

(Have you found out why EPT+unrestricted didn't use the alternative SMM
 mapping as well?)

> By moving the allocation to the caller we also remove the last
> vestiges of kernel-allocated memory regions (not accessible anymore
> in userspace since commit b74a07beed0e, "KVM: Remove kernel-allocated
> memory regions", 2010-06-21); that is a nice bonus.
> 
> Reported-by: Alexandre DERUMIER <aderum...@odiso.com>
> Cc: sta...@vger.kernel.org
> Fixes: 9da0e4d5ac969909f6b435ce28ea28135a9cbd69
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---

vm_mmap() leaks if __kvm_set_memory_region() fails.  It's nothing new
and following process termination should take care of it,

Reviewed-by: Radim Krčmář <rkrc...@redhat.com>

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -7717,23 +7717,53 @@ void kvm_arch_sync_events(struct kvm *kvm)
>  int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
>  {
>   int i, r;
> + u64 hva;
> + struct kvm_memslots *slots = kvm_memslots(kvm);
> + struct kvm_memory_slot *slot, old;
| [...]
> + slot = >memslots[slots->id_to_index[id]];

This seems better written as

slot = id_to_memslot(slots, id);

(Made me remember that I want to refactor the memslot API ...)

| [...]
> + } else {
> + if (!slot->npages)
> + return 0;
> +
> + hva = 0;
> + }
> +
> + old = *slot;

(Assignment could be in the 'else' == !size branch, GCC would have fun.)

| [...]
> + if (!size) {
> + r = vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE);
> + WARN_ON(r < 0);
> + }
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: x86: clean up kvm_arch_vcpu_runnable

2015-10-13 Thread Radim Krčmář
2015-10-13 12:34+0200, Paolo Bonzini:
> Split the huge conditional in two functions.
> 
> Fixes: 64d6067057d9658acb8675afcfba549abdb7fc16
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---

Thanks!

Reviewed-by: Radim Krčmář <rkrc...@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: fix SMI to halted VCPU

2015-10-13 Thread Radim Krčmář
2015-10-13 12:34+0200, Paolo Bonzini:
> An SMI to a halted VCPU must wake it up, hence a VCPU with a pending
> SMI must be considered runnable.
> 
> Fixes: 64d6067057d9658acb8675afcfba549abdb7fc16
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---

Reviewed-by: Radim Krčmář <rkrc...@redhat.com>

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -7781,6 +7781,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu 
> *vcpu)
> + if (test_bit(KVM_REQ_SMI, >requests))

(Ah, and refactoring of this API is also on my new TODO.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: map/unmap private slots in __x86_set_memory_region

2015-10-13 Thread Radim Krčmář
2015-10-13 18:28+0200, Paolo Bonzini:
> On 13/10/2015 17:39, Radim Krčmář wrote:
>> 2015-10-12 14:09+0200, Paolo Bonzini:
>>> Otherwise, two copies (one of them never used and thus bogus) are
>>> allocated for the regular and SMM address spaces.  This breaks
>>> SMM with EPT but without unrestricted guest support, because the
>>> SMM copy of the identity page map is all zeros.
>> 
>> (Have you found out why EPT+unrestricted didn't use the alternative SMM
>>  mapping as well?)
> 
> Yes, that I already knew; EPT+unrestricted uses CR0.PG=0 directly so
> it doesn't use the identity page at all.  (CR0.PG=0 w/o unrestricted
> instead runs with CR0.PG=1.  CR3 load and store exits are enabled,
> and the guest CR3 always points to the identity page map while the
> guest runs).

Thank you.

>>> +   } else {
>>> +   if (!slot->npages)
>>> +   return 0;
>>> +
>>> +   hva = 0;
>>> +   }
>>> +
>>> +   old = *slot;
>> 
>> (Assignment could be in the 'else' == !size branch, GCC would have fun.)
> 
> It would have fun _and_ warn, which is why it's not in the else branch. :)

I wondered if its "used uninitialized" analyzer got any better :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-08 Thread Radim Krčmář
KVM uses eoi_exit_bitmap to track vectors that need an action on EOI.
The problem is that IOAPIC can be reconfigured while an interrupt with
old configuration is pending and eoi_exit_bitmap only remembers the
newest configuration;  thus EOI from the pending interrupt is not
recognized.

(Reconfiguration is not a problem for level interrupts, because IOAPIC
 sends interrupt with the new configuration.)

For an edge interrupt with ACK notifiers, like i8254 timer; things can
happen in this order
 1) IOAPIC inject a vector from i8254
 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
on original VCPU gets cleared
 3) guest's handler for the vector does EOI
 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
not in that VCPU's eoi_exit_bitmap
 5) i8254 stops working

A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
vector is in PIR/IRR/ISR.

This creates an unwanted situation if the vector is reused by a
non-IOAPIC source, but I think it is so rare that we don't want to make
the solution more sophisticated.  The simple solution also doesn't work
if we are reconfiguring the vector.  (Shouldn't happen in the wild and
I'd rather fix users of ACK notifiers instead of working around that.)

The are no races because ioapic injection and reconfig are locked.

Fixes: b053b2aef25d ("KVM: x86: Add EOI exit bitmap inference")
[Before b053b2aef25d, this bug happened only with APICv.]
Fixes: c7c9c56ca26f ("x86, apicv: add virtual interrupt delivery support")
Cc: <sta...@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2: moved sync_pir_to_irr out of kvm_ioapic_scan_entry [Paolo]

 arch/x86/kvm/ioapic.c | 4 +++-
 arch/x86/kvm/x86.c| 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 2dcda0f188ba..88d0a92d3f94 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -246,7 +246,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap)
kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, 
index) ||
index == RTC_GSI) {
if (kvm_apic_match_dest(vcpu, NULL, 0,
-   e->fields.dest_id, e->fields.dest_mode))
+e->fields.dest_id, e->fields.dest_mode) ||
+   (e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
+kvm_apic_pending_eoi(vcpu, e->fields.vector)))
__set_bit(e->fields.vector,
(unsigned long *)eoi_exit_bitmap);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2d2c9bb0d6d6..7ed88020d414 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6201,8 +6201,10 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 
if (irqchip_split(vcpu->kvm))
kvm_scan_ioapic_routes(vcpu, vcpu->arch.eoi_exit_bitmap);
-   else
+   else {
+   kvm_x86_ops->sync_pir_to_irr(vcpu);
kvm_ioapic_scan_entry(vcpu, vcpu->arch.eoi_exit_bitmap);
+   }
kvm_x86_ops->load_eoi_exitmap(vcpu);
 }
 
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: don't notify userspace IOAPIC on edge EOI

2015-10-08 Thread Radim Krčmář
2015-10-08 20:30+0200, Radim Krčmář:
> On real hardware, edge-triggered interrupts don't set a bit in TMR,
> which means that IOAPIC isn't notified on EOI.  Do the same here.
> 
> Staying in guest/kernel mode after edge EOI is what we want for most
> devices.  If some bugs could be nicely worked around with edge EOI
> notifications, we should invest in a better interface.
> 
> Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
> ---
>  Completely untested.
>  
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> @@ -389,13 +389,15 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 
> *eoi_exit_bitmap)
>   for (i = 0; i < nr_ioapic_pins; ++i) {
>   hlist_for_each_entry(entry, >map[i], link) {
>   u32 dest_id, dest_mode;
> + bool level;
>  
>   if (entry->type != KVM_IRQ_ROUTING_MSI)
>   continue;
>   dest_id = (entry->msi.address_lo >> 12) & 0xff;
>   dest_mode = (entry->msi.address_lo >> 2) & 0x1;
> - if (kvm_apic_match_dest(vcpu, NULL, 0, dest_id,
> - dest_mode)) {
> + level = entry->msi.data & MSI_DATA_TRIGGER_LEVEL;
> + if (level && kvm_apic_match_dest(vcpu, NULL, 0,

Describing that result is an overkill -- I'll send v2 if the idea is
accepted.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-08 Thread Radim Krčmář
v2:
 * rewritten [1/2] and
 * refactored [2/2], all thanks to Paolo's comments

This problem is not fixed for split userspace part as I think that it
would be better to solve that by excluding edge interrupts from
eoi_exit_bitmap (see the next patch in kvm-list for discussion).


Radim Krčmář (2):
  kvm: x86: set KVM_REQ_EVENT when updating IRR
  KVM: x86: fix edge EOI and IOAPIC reconfig race

 arch/x86/kvm/ioapic.c | 4 +++-
 arch/x86/kvm/lapic.c  | 2 ++
 arch/x86/kvm/x86.c| 4 +++-
 3 files changed, 8 insertions(+), 2 deletions(-)

-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] kvm: x86: set KVM_REQ_EVENT when updating IRR

2015-10-08 Thread Radim Krčmář
After moving PIR to IRR, the interrupt needs to be delivered manually.

Reported-by: Paolo Bonzini <pbonz...@redhat.com>
Cc: <sta...@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v2: completely rewritten

 arch/x86/kvm/lapic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 944b38a56929..168b8759bd73 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -348,6 +348,8 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
struct kvm_lapic *apic = vcpu->arch.apic;
 
__kvm_apic_update_irr(pir, apic->regs);
+
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: don't notify userspace IOAPIC on edge EOI

2015-10-08 Thread Radim Krčmář
On real hardware, edge-triggered interrupts don't set a bit in TMR,
which means that IOAPIC isn't notified on EOI.  Do the same here.

Staying in guest/kernel mode after edge EOI is what we want for most
devices.  If some bugs could be nicely worked around with edge EOI
notifications, we should invest in a better interface.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 Completely untested.
 
 arch/x86/kvm/irq_comm.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index c89228980230..8f4499c7ffc1 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -389,13 +389,15 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap)
for (i = 0; i < nr_ioapic_pins; ++i) {
hlist_for_each_entry(entry, >map[i], link) {
u32 dest_id, dest_mode;
+   bool level;
 
if (entry->type != KVM_IRQ_ROUTING_MSI)
continue;
dest_id = (entry->msi.address_lo >> 12) & 0xff;
dest_mode = (entry->msi.address_lo >> 2) & 0x1;
-   if (kvm_apic_match_dest(vcpu, NULL, 0, dest_id,
-   dest_mode)) {
+   level = entry->msi.data & MSI_DATA_TRIGGER_LEVEL;
+   if (level && kvm_apic_match_dest(vcpu, NULL, 0,
+   dest_id, dest_mode)) {
u32 vector = entry->msi.data & 0xff;
 
__set_bit(vector,
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-07 Thread Radim Krčmář
2015-10-07 11:29+0200, Paolo Bonzini:
> On 13/08/2015 15:46, Radim Krčmář wrote:
>> +if (kvm_x86_ops->sync_pir_to_irr(vcpu))
>> +kvm_make_request(KVM_REQ_EVENT, vcpu);
>> +
> 
> The call to sync_pir_to_irr belongs more in vcpu_scan_ioapic, I think.

Will do so in v2.

> More importantly, I think that KVM_REQ_EVENT is a latent bug for
> kvm_vcpu_ioctl_get_lapic as well, so the call to kvm_make_request should
> go in vmx_sync_pir_to_irr or in a new kvm_sync_pir_to_irr wrapper.

True, thanks.  I'll make the request in kvm_apic_update_irr (unless
you'd prefer to have it in new kvm_sync_pir_to_irr).

>> +(e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
>> + kvm_apic_pending_eoi(vcpu, e->fields.vector)))
> 
> Should we test again here that kvm_irq_has_notifier(ioapic->kvm,
> KVM_IRQCHIP_IOAPIC, index), to avoid unnecessarily marking other
> edge-triggered interrupts?

Other edge-triggered interrupts are skipped by a previous condition:

  if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
  kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
  index == RTC_GSI)
 [we're here]

I think it is ok to ignore level-triggered RTC, but we do want to
include edge-triggered.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-07 Thread Radim Krčmář
2015-10-07 11:31+0200, Paolo Bonzini:
> What's the issue with
> handle_irq?

I get #PF instead of callback after redirecting to VCPU 1.
No idea what causes it, yet -- seeing handle_irq's iplementation made me
postpone debugging :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-06 Thread Radim Krčmář
2015-08-15 02:00+0200, Paolo Bonzini:
>On 14/08/2015 10:38, Radim Krčmář wrote:
>>> How do you reproduce the bug?
>> I run rhel4 (2.6.9) kernel on 2 VCPUs and frequently alternate
>> smp_affinity of "timer".  The bug is hit within seconds.
> 
> Nice, I'll try to make a unit test for it on the plane. :)

(Time on planes is best spent by sleeping ;])

What do you think about the series?

I made a prototype kvm-unit-test ...
(Reproduces with APICv or split irqchip builds.)
---8<---
x86: test IO-APIC dest_id modification before

Regression test for kvm commit $TODO.
Run with '-smp 2'.

I had problems with handle_irq so this version uses asm workaround;
will fix that in final version.  Initialization also presumes that
everything will work as it did on my machines :)
---
 x86/ioapic.c | 47 +--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/x86/ioapic.c b/x86/ioapic.c
index 1fe1ccc9eadd..55f8eea03496 100644
--- a/x86/ioapic.c
+++ b/x86/ioapic.c
@@ -4,21 +4,27 @@
 #include "smp.h"
 #include "desc.h"
 #include "isr.h"
+#include "io.h"
 
 #define EDGE_TRIGGERED 0
 #define LEVEL_TRIGGERED 1
 
-static void set_ioapic_redir(unsigned line, unsigned vec, unsigned trig_mode)
+static void __set_ioapic_redir(unsigned line, u8 vec, bool trig_mode, u8 
dest_id)
 {
ioapic_redir_entry_t e = {
.vector = vec,
-   .delivery_mode = 0,
.trig_mode = trig_mode,
+   .dest_id = dest_id,
};
 
ioapic_write_redir(line, e);
 }
 
+static void set_ioapic_redir(unsigned line, unsigned vec, unsigned trig_mode)
+{
+   __set_ioapic_redir(line, vec, trig_mode, 0);
+}
+
 static void set_irq_line(unsigned line, int val)
 {
asm volatile("out %0, %1" : : "a"((u8)val), "d"((u16)(0x2000 + line)));
@@ -298,6 +304,41 @@ static void test_ioapic_level_retrigger_mask(void)
set_mask(0x0e, false);
 }
 
+static volatile int pit_working = -1;
+
+static void __attribute__((used)) pit_isr_fn(void)
+{
+   if (!pit_working++);
+   __set_ioapic_redir(0x2, 0x84, EDGE_TRIGGERED, 1);
+   eoi();
+}
+
+void pit_isr(void);
+asm (
+ "pit_isr: \n"
+ "   call pit_isr_fn \n"
+#ifndef __x86_64__
+ "   iret"
+#else
+ "   iretq"
+#endif
+ );
+
+static void test_ioapic_eoi_bug(void)
+{
+   if (cpu_count() < 2)
+   return;
+
+   set_idt_entry(0x84, pit_isr, 0);
+   __set_ioapic_redir(0x2, 0x84, EDGE_TRIGGERED, 0);
+
+   outb(0x34, 0x43);
+
+   for (unsigned loops = 1; loops && pit_working < 1; loops--)
+   asm volatile ("nop");
+
+   report("dest_id reconfiguration before EOI", pit_working >= 1);
+}
 
 int main(void)
 {
@@ -325,5 +366,7 @@ int main(void)
test_ioapic_level_mask();
test_ioapic_level_retrigger_mask();
 
+   test_ioapic_eoi_bug();
+
return report_summary();
 }
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/12] KVM: x86: Replace call-back set_tsc_khz() with a common function

2015-10-05 Thread Radim Krčmář
2015-09-28 13:38+0800, Haozhong Zhang:
> Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
> patch removes the call-back set_tsc_khz() and replaces it with a common
> function.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> +static void set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
> +{
> + u64 ratio, khz;
| [...]
> + khz = user_tsc_khz;

I'd use "user_tsc_khz" directly.

> + /* TSC scaling required  - calculate ratio */
> + shift = (kvm_tsc_scaling_ratio_frac_bits <= 32) ?
> + kvm_tsc_scaling_ratio_frac_bits : 32;
> + ratio = khz << shift;
> + do_div(ratio, tsc_khz);
> + ratio <<= (kvm_tsc_scaling_ratio_frac_bits - shift);

VMX is losing 16 bits by this operation;  normal fixed point division
could get us a smaller drift (and an one-liner here) ...
at 4.3 GHz, 32 instead of 48 bits after decimal point translate to one
"lost" TSC tick per second, in the worst case.

Please mention that we are truncating on purpose :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/12] KVM: x86: Use the correct vcpu's TSC rate to compute time scale

2015-10-05 Thread Radim Krčmář
2015-09-28 13:38+0800, Haozhong Zhang:
> This patch makes KVM use virtual_tsc_khz rather than the host TSC rate
> as vcpu's TSC rate to compute the time scale if TSC scaling is enabled.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -1782,7 +1782,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>   return 0;
>  
>   if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
> - kvm_get_time_scale(NSEC_PER_SEC / 1000, this_tsc_khz,
> + tgt_tsc_khz = kvm_has_tsc_control ?
> + vcpu->virtual_tsc_khz : this_tsc_khz;
> + kvm_get_time_scale(NSEC_PER_SEC / 1000, tgt_tsc_khz,
>  >hv_clock.tsc_shift,
>  >hv_clock.tsc_to_system_mul);

Good catch, it seems that SVM didn't scale kvmclock correctly ...
I think we'll want this patch in stable.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/12] KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch

2015-10-05 Thread Radim Krčmář
2015-09-28 13:38+0800, Haozhong Zhang:
> This patch moves the field of TSC scaling ratio from the architecture
> struct vcpu_svm to the common struct kvm_vcpu_arch.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -7080,6 +7080,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
>  
>   vcpu = kvm_x86_ops->vcpu_create(kvm, id);
>  
> + if (!IS_ERR(vcpu))
> + vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;

This shouldn't be necessary, (and we can definitely do it without error
checking later)

 kvm_arch_vcpu_create
   (vmx|svm)_create_vcpu
 kvm_vcpu_init
   kvm_arch_vcpu_init
 kvm_set_tsc_khz

sets vcpu->arch.tsc_scaling_ratio to something reasonable and SVM didn't
overwrite that value.  (kvm_set_tsc_khz() only doesn't set the ration if
this_tsc_khz == 0, which we could extend to be extra safe.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1] Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"

2015-09-22 Thread Radim Krčmář
PVCLOCK_COUNTS_FROM_ZERO broke ABI and (at least) three things with it.
All problems stem from repeated writes to MSR_KVM_SYSTEM_TIME(_NEW).
The reverted patch treated the MSR write as a one-shot initializer:
any write from VCPU 0 would reset system_time.

And this is what broke for Linux guests:
* Onlining/hotplugging of VCPU 0

  VCPU has to assign an address to KVM clock before use, which is done
  with MSR_KVM_SYSTEM_TIME_NEW.  Linux has an idea that time should not
  jump backward, so any `sleep` won't return before |system_time at the
  point of offline| elapses since the online.  Be sure to run ntp.

* S3 and S4 resume

  If you don't have PVCLOCK_TSC_STABLE_BIT in Linux, resume will freeze
  for |system_time at the point of suspend|, because pvclock ensures
  monoticity and kvmclock did not think about it.

  If you have stable clock, execution will resume immediately, but
  restoring KVM clock writes to the MSR and dmesg starts to count from
  zero.  It's better than the onlining, but not what we want either.

* Boot of SLES 10 guest

  SLES 10 has a custom implementation of kvm clock that calls
  MSR_KVM_SYSTEM_TIME before every read to enhance precision ...
  Two things are happening at the same time:
  1) The guest periodically receives an interrupt that is handled by
 main_timer_handler():
 a) get time using the kvm clock:
1) write the address to MSR_KVM_SYSTEM_TIME
2) read tsc and pvclock (tsc_offset, system_time)
3) time = tsc - tsc_offset + system_time
 b) compute time since the last main_timer_handler()
 c) bump jiffies if enough time has elapsed
  2) the guest wants to calibrate loops per jiffy [1]:
 a) read tsc
 b) loop till jiffies increase
 c) compute lpj

  Because (1a1) always resets the system_time to 0, we read the same
  value over and over so the condition for (1c) is never true and
  jiffies remain constant.  A hang happens in (2b) as it is the first
  place that depends on jiffies.


We could make hypervisor workaround for this, but that is just asking
for more trouble.  Luckily, reverting does not break to guests that
learned about PVCLOCK_COUNTS_FROM_ZERO, in new ways.
Only 4.2+ guests with NOHZ_FULL wanted PVCLOCK_COUNTS_FROM_ZERO, which
is a good trade-off for not regressing.

This reverts commit b7e60c5aedd2b63f16ef06fde4f81ca032211bc5.
And adds a note to the definition of PVCLOCK_COUNTS_FROM_ZERO.

Cc: sta...@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 v1: Extended commit message based on a discussion with Marcelo

 arch/x86/include/asm/pvclock-abi.h | 1 +
 arch/x86/kvm/x86.c | 4 
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h 
b/arch/x86/include/asm/pvclock-abi.h
index 655e07a48f6c..67f08230103a 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -41,6 +41,7 @@ struct pvclock_wall_clock {
 
 #define PVCLOCK_TSC_STABLE_BIT (1 << 0)
 #define PVCLOCK_GUEST_STOPPED  (1 << 1)
+/* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */
 #define PVCLOCK_COUNTS_FROM_ZERO (1 << 2)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4bca39f0fdb3..71731994d897 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1711,8 +1711,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
vcpu->pvclock_set_guest_stopped_request = false;
}
 
-   pvclock_flags |= PVCLOCK_COUNTS_FROM_ZERO;
-
/* If the host uses TSC clocksource, then it is stable */
if (use_master_clock)
pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
@@ -2010,8 +2008,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
>requests);
 
ka->boot_vcpu_runs_old_kvmclock = tmp;
-
-   ka->kvmclock_offset = -get_kernel_ns();
}
 
vcpu->arch.time = data;
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] kvmclock: fix ABI breakage from PVCLOCK_COUNTS_FROM_ZERO.

2015-09-22 Thread Radim Krčmář
2015-09-21 19:37-0300, Marcelo Tosatti:
> On Tue, Sep 22, 2015 at 12:00:39AM +0200, Radim Krčmář wrote:
>> 2015-09-21 17:53-0300, Marcelo Tosatti:
>> > On Mon, Sep 21, 2015 at 10:00:27PM +0200, Radim Krčmář wrote:
>> >> so I thought about other problems with
>> >> PVCLOCK_COUNTS_FROM_ZERO ... have you tried to hot-replug VCPU 0?
>> >> It doesn't work well ;)
>> > 
>> > Can you hot-unplug vcpu 0? 
>> 
>> I can, I did, hot-plug bugged.
> 
> Due to PVCLOCK_COUNTS_FROM_ZERO patch?  

Yes, VCPU 0 writes MSR_KVM_SYSTEM_TIME_NEW to access the kvm clock, but
that very operation resets system_time.  The result is a long sleep,
because at least Linux doesn't handle that shift.
(Just in case, works fine after reverting the one host patch.)

With more thinking, PVCLOCK_COUNTS_FROM_ZERO also breaks suspend and
resume as we write the MSR there.  Testing shows that is isn't as bad as
the hotplug, because `sleep 1` returns in a second and `date` is fine,
but having multiple 0 points in `dmesg` isn't very nice either.

There are too many problems with PVCLOCK_COUNTS_FROM_ZERO, I'll send the
revert with cc:stable soon.  (Without any guest changes to sched_clock.)

>> >> > The problem is, "selecting one read as the initial point" is inherently
>> >> > racy: that delta is relative to one moment (kvmclock read) at one vcpu,
>> >> > but must be applied to all vcpus.
>> >> 
>> >> I don't think that is a problem.
>> >> 
>> >> Kvmclock has a notion of a global system_time in nanoseconds (one value
>> >> that defines the time, assigned with VM ioctl KVM_SET_CLOCK) and tries
>> >> to propagate system_time into guest VCPUs as precisely as possible with
>> >> the help of TSC.
>> >
>> >Different pairs of values (system_time, tsc reads) are fundamentally
>> >broken. This is why 
>> >
>> >commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
>> >x86, paravirt: Add a global synchronization point for pvclock
>> >
>> >Exists.
>> >
>> >Then to guarantee monotonicity you need to stop those updates
>> >(or perform the pair update in lock step):
>> >
>> >commit d828199e84447795c6669ff0e6c6d55eb9beeff6
>> >KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag
>> 
>> (Thanks for the commits, split values are the core design that avoids
>>  modifying rdtsc, 
> 
> Which is broken (thats the point).

Yes, but I don't think we could use tsc any better without having a
correct guest's time in the host.

We chose to provide kvmclock without a host_tsc -> host_ns function in
the host.  It is just impossible to give the guest a precise (tsc, ns)
tuple if the host is not using TSC for its clock.  (We don't have
control over the machine and using two reads of time can't be precise.)

>> > 3  3  02-1
>>   ^^^
>> -1 is the overflow.  Very unlikely to ever happen in Linux, as we now
>> boot other VCPUs much later than the KVM clock configuration and it can
>> only happen if VCPU1 is running as the reconfiguration takes place, but
>> a simple
>> 
>>   if (vcpu[x].sched_clock <= global_sched_clock_offset)
>>  return 0;
>> 
>> would take care of it.  The time would stand still for a while, which is
>> not a huge problem for boot-only scenario.  
> 
> Look at 
> 
> commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
> x86, paravirt: Add a global synchronization point for pvclock

This patch shows the place where hotplug likely fails -- system_time
goes back to 0 so time is frozen until system_time reaches the original
value, again.
It also makes the result of kvm_clock_read monotonic across all vcpus,
so [1/2] doesn't need to add code to handle initial overflow.

> And think of what an overflow does.

If the clock was slightly negative (unsigned) and then overflowed,
sched_clock would freeze because it would never reach that high value
again.  Offset in [1/2] is applied later and can't reach negative.

> Note: i tried your solution before (to add offset) and saw this issue
> in practice.

Great, thanks, I will thoroughly test it to see where it fails.

>> Other VCPUs couldn't be
>> reading KVM sched before it was configured, so only operations that
>> happen before vcpu1sched_clock goes positive are affected.
>> (We have other problems if the window can be long.)
> 
> The point where vcpu1sched_clock is negative is after kvmclock is
> initialized.

Existing code takes care of that -- vcpu0 reads a value before a

Re: [RFC PATCH 0/2] kvmclock: fix ABI breakage from PVCLOCK_COUNTS_FROM_ZERO.

2015-09-22 Thread Radim Krčmář
[My mailbox filled between yours two emails, so I have pasted the latter
 one from archives and replied to a wrong one.]

2015-09-21 21:42-0300, Marcelo Tosatti:
> 2015-09-21 19:37-0300, Marcelo Tosatti:
>> Enable counts-from-zero on MSR_KVM_SYSTEM_TIME_NEW:
>> -> Figure out whats wrong with different kvmclock values on hotplug, 
>> and fix it.
> 
> Find data which allows you to differentiate between hotplug of pCPU-0
> and system initialization.

This kind of programminng is exactly what I don't like and won't be
doing.  We should at least be removing previous mistakes when adding new
ones.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] kvmclock: fix ABI breakage from PVCLOCK_COUNTS_FROM_ZERO.

2015-09-21 Thread Radim Krčmář
2015-09-21 17:53-0300, Marcelo Tosatti:
> On Mon, Sep 21, 2015 at 10:00:27PM +0200, Radim Krčmář wrote:
>> 2015-09-21 12:52-0300, Marcelo Tosatti:
>> > On Mon, Sep 21, 2015 at 05:12:10PM +0200, Radim Krčmář wrote:
>> >> 2015-09-20 19:57-0300, Marcelo Tosatti:
>> >>> Is it counting from zero that breaks SLES10?
>> >> 
>> >> Not by itself, treating MSR_KVM_SYSTEM_TIME as one-shot initializer did.
>> >> The guest wants to write to MSR_KVM_SYSTEM_TIME as much as it likes to,
>> >> while still keeping system time;  we used to allow that, which means an
>> >> ABI breakage.  (And we can't even say that guest's behaviour is against
>> >> the spec ...)
>> > 
>> > Because this behaviour was not defined.
>> 
>> It is defined by implementation.
>> 
>> > Can't you just condition PVCLOCK_COUNTS_FROM_ZERO behaviour on
>> > boot_vcpu_runs_old_kvmclock == false? 
>> > The patch would be much simpler.
>> 
>> If you mean the hunk in cover letter, I don't like it because we presume
>> that no other guests were broken.
> 
> Yes patch 1.

(I'd call them: a hunk in cover letter [0/2], patch 1 = [1/2], and
 patch 2 = [2/2].)

> Do you have an example of another guest that is broken? 

Yes, the hot-plug in any relatively recent Linux guest.

> Really, assuming things about the value of the msr read when you
> write to the MSR is very awkward. That SuSE code is incorrect, it fails
> on other occasions as well (see
> 54750f2cf042c42b4223d67b1bd20138464bde0e).

Yeah, I even read the SUSE implementation, sad times.

>> I really don't like it 
> 
> Because it changes behaviour that was previously unspecified?

No, because it adds another exemption to already ugly code.
(Talking about the hunk in [0/2].)

>> so I thought about other problems with
>> PVCLOCK_COUNTS_FROM_ZERO ... have you tried to hot-replug VCPU 0?
> 
> Can't unplug VCPU 0 i think.

You can.

>> It doesn't work well ;)
> 
> Can you hot-unplug vcpu 0? 

I can, I did, hot-plug bugged.

>> > The problem is, "selecting one read as the initial point" is inherently
>> > racy: that delta is relative to one moment (kvmclock read) at one vcpu,
>> > but must be applied to all vcpus.
>> 
>> I don't think that is a problem.
>> 
>> Kvmclock has a notion of a global system_time in nanoseconds (one value
>> that defines the time, assigned with VM ioctl KVM_SET_CLOCK) and tries
>> to propagate system_time into guest VCPUs as precisely as possible with
>> the help of TSC.
>
>Different pairs of values (system_time, tsc reads) are fundamentally
>broken. This is why 
>
>commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
>x86, paravirt: Add a global synchronization point for pvclock
>
>Exists.
>
>Then to guarantee monotonicity you need to stop those updates
>(or perform the pair update in lock step):
>
>commit d828199e84447795c6669ff0e6c6d55eb9beeff6
>KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag

(Thanks for the commits, split values are the core design that avoids
 modifying rdtsc, so I'll be grad to read its details.)

>> >2) You rely on monotonicity across vcpus to perform 
>> >   the 'minus delta that was read on vcpu0' calculation, but 
>> >   monotonicity across vcpus can fail during runtime
>> >(say host clocksource goes tsc->hpet due to tsc instability).
>> 
>> That could be a problem, but I'm adding a VCPU independent constant to
>> all reads -- does the new code rely on monoticity in places where the
>> old one didn't?
> 
> The problem is overflow:
> kvmclock non-monotonic across vcpus AND delta subtraction:
> 
> ptime | vcpu0kvmclock time | vcpu0sched_clock | vcpu1kvmclock time | 
> vcpu1sched_clock
> 1  1  1 
> 2  2  21

KVM sched clock not used before this point (I even moved the code to
make sure), so there is no problem with monotonicity.

> 3  3  02-1
  ^^^
-1 is the overflow.  Very unlikely to ever happen in Linux, as we now
boot other VCPUs much later than the KVM clock configuration and it can
only happen if VCPU1 is running as the reconfiguration takes place, but
a simple

  if (vcpu[x].sched_clock <= global_sched_clock_offset)
return 0;

would take care of it.  The time would stand still for a while, which is
not a huge problem for boot-only scenario.  Other VCPUs couldn't be
reading KVM sched befor

Re: [RFC PATCH 0/2] kvmclock: fix ABI breakage from PVCLOCK_COUNTS_FROM_ZERO.

2015-09-21 Thread Radim Krčmář
2015-09-21 17:12+0200, Radim Krčmář:
> 2015-09-20 19:57-0300, Marcelo Tosatti:
> > On Fri, Sep 18, 2015 at 05:54:28PM +0200, Radim Krčmář wrote:
>>> This patch series will be disabling PVCLOCK_COUNTS_FROM_ZERO flag and is
>>> RFC because I haven't explored many potential problems or tested it.
>> 
>> The justification to disable PVCLOCK_COUNTS_FROM_ZERO is because you
>> haven't explored potential problems or tested it? Sorry can't parse it.

I missed that one, sorry.
PVCLOCK_COUNTS_FROM_ZERO is disabled because it breaks ABI.
Not testing nor ruling out all problems is a justification for RFC.

(This patch series
 - will be disabling PVCLOCK_COUNTS_FROM_ZERO flag and
 - is RFC because I haven't explored many potential problems or tested
   it [this patch series])
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] kvmclock: fix ABI breakage from PVCLOCK_COUNTS_FROM_ZERO.

2015-09-21 Thread Radim Krčmář
2015-09-20 19:57-0300, Marcelo Tosatti:
> On Fri, Sep 18, 2015 at 05:54:28PM +0200, Radim Krčmář wrote:
>> This patch series will be disabling PVCLOCK_COUNTS_FROM_ZERO flag and is
>> RFC because I haven't explored many potential problems or tested it.
> 
> The justification to disable PVCLOCK_COUNTS_FROM_ZERO is because you
> haven't explored potential problems or tested it? Sorry can't parse it.
> 
>> 
>> [1/2] uses a different algorithm in the guest to start counting from 0.
>> [2/2] stops exposing PVCLOCK_COUNTS_FROM_ZERO in the hypervisor.
>> 
>> A viable alternative would be to implement opt-in features in kvm clock.
>> 
>> And because we probably only broke one old user (the infamous SLES 10), a
>> workaround like this is also possible: (but I'd rather not do that)
> 
> Please describe why SLES 10 breaks in detail: the state of the guest and
> the host before the patch, the state of the guest and host after the
> patch.

1) The guest periodically receives an interrupt that is handled by
   main_timer_handler():
   a) get time using the kvm clock:
  1) write the address to MSR_KVM_SYSTEM_TIME
  2) read tsc and pvclock (tsc_offset, system_time)
  3) time = tsc - tsc_offset + system_time
   b) compute time since the last main_timer_handler()
   c) bump jiffies if enough time has elapsed
2) the guest wants to calibrate loops per jiffy [1]:
   a) read tsc
   b) loop till jiffies increase
   c) compute lpj

Because (1a1) always resets the system_time to 0, we read the same value
over and over so the condition for (1c) is never true and jiffies remain
constant.  This is the problem.  A hang happens in (2b) as it is the
first place that depends on jiffies.

> What does SLES10 expect?

That a write to MSR_KVM_SYSTEM_TIME does not reset the system time.

> Is it counting from zero that breaks SLES10?

Not by itself, treating MSR_KVM_SYSTEM_TIME as one-shot initializer did.
The guest wants to write to MSR_KVM_SYSTEM_TIME as much as it likes to,
while still keeping system time;  we used to allow that, which means an
ABI breakage.  (And we can't even say that guest's behaviour is against
the spec ...)


---
1: I also did diassembly, but the reproducer is easier to paste
   (couldn't find debuginfo)
   # qemu-kvm -nographic -kernel vmlinuz-2.6.16.60-0.85.1-default \
-serial stdio -monitor /dev/null -append 'console=ttyS0'
  
  and you can get a bit further when setting loops per jiffy manually,
-serial stdio -monitor /dev/null -append 'console=ttyS0 lpj=12345678'

  The dmesg for failing run is
Initializing CPU#0
PID hash table entries: 512 (order: 9, 16384 bytes)
kvm-clock: cpu 0, msr 0:3f6041, boot clock
kvm_get_tsc_khz: cpu 0, msr 0:e001
time.c: Using tsc for timekeeping HZ 250
time.c: Using 100.00 MHz WALL KVM GTOD KVM timer.
time.c: Detected 2591.580 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
Checking aperture...
Nosave address range: 0009f000 - 000a
Nosave address range: 000a - 000f
Nosave address range: 000f - 0010
Memory: 124884k/130944k available (1856k kernel code, 5544k reserved, 812k 
data, 188k init)
[Infinitely querying kvm clock here ...]

  With '-cpu kvm64,-kvmclock', the next line is
Calibrating delay using timer specific routine.. 5199.75 BogoMIPS 
(lpj=10399519)

  With 'lpj=10399519',
Calibrating delay loop (skipped)... 5199.75 BogoMIPS preset
[Manages to get stuck later, in default_idle.]
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] kvmclock: fix ABI breakage from PVCLOCK_COUNTS_FROM_ZERO.

2015-09-21 Thread Radim Krčmář
2015-09-21 12:52-0300, Marcelo Tosatti:
> On Mon, Sep 21, 2015 at 05:12:10PM +0200, Radim Krčmář wrote:
>> 2015-09-20 19:57-0300, Marcelo Tosatti:
>>> Is it counting from zero that breaks SLES10?
>> 
>> Not by itself, treating MSR_KVM_SYSTEM_TIME as one-shot initializer did.
>> The guest wants to write to MSR_KVM_SYSTEM_TIME as much as it likes to,
>> while still keeping system time;  we used to allow that, which means an
>> ABI breakage.  (And we can't even say that guest's behaviour is against
>> the spec ...)
> 
> Because this behaviour was not defined.

It is defined by implementation.

> Can't you just condition PVCLOCK_COUNTS_FROM_ZERO behaviour on
> boot_vcpu_runs_old_kvmclock == false? 
> The patch would be much simpler.


If you mean the hunk in cover letter, I don't like it because we presume
that no other guests were broken.

I really don't like it so I thought about other problems with
PVCLOCK_COUNTS_FROM_ZERO ... have you tried to hot-replug VCPU 0?
It doesn't work well ;)

We don't want to guess what the guest wants so I'd go for the opt-in
paravirt feature unless counting from zero can be done in guest alone.

> The problem is, "selecting one read as the initial point" is inherently
> racy: that delta is relative to one moment (kvmclock read) at one vcpu,
> but must be applied to all vcpus.

I don't think that is a problem.

Kvmclock has a notion of a global system_time in nanoseconds (one value
that defines the time, assigned with VM ioctl KVM_SET_CLOCK) and tries
to propagate system_time into guest VCPUs as precisely as possible with
the help of TSC.

sched_clock uses kvmclock to get nanoseconds since the system was
brought up and [1/2] only works with this abstracted ns count.
A poorly synchronized initial read is irrelevant because all VCPUs will
be using the same constant offset.
(We can never know the precise time anyway.)

> Besides:
> 
>   1) Stable sched clock in guest does not depend on
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT.

Yes, thanks, I will remove that requirement in v1.  (We'd need to
improve a loss of PVCLOCK_TSC_STABLE_BIT otherwise anyway.)

Because the clutchy dependency on PVCLOCK_TSC_STABLE_BIT is going away,
there is now one possible unsigned overflow: in case the clock was very
imprecise and VCPU1 managed to get smaller system_time than VCPU0 at the
time of initialization.  It's very unlikely that we'll ever reach legal
overflow so I can add a condition there.

>   2) You rely on monotonicity across vcpus to perform 
>  the 'minus delta that was read on vcpu0' calculation, but 
>  monotonicity across vcpus can fail during runtime
>(say host clocksource goes tsc->hpet due to tsc instability).

That could be a problem, but I'm adding a VCPU independent constant to
all reads -- does the new code rely on monoticity in places where the
old one didn't?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()

2015-09-18 Thread Radim Krčmář
2015-09-17 23:18+, Wu, Feng:
>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>> On 17/09/2015 17:58, Radim Krčmář wrote:
>>> xAPIC address are only 8 bit long so they always get delivered to x2APIC
>>> cluster 0, where first 16 bits work like xAPIC flat logical mode.
>> 
>> Ok, I was wondering whether this was the correct interpretation.  Thanks!
> 
> Paolo, I don't think Radim clarify your concern, right? Since mda is 8-bit, it
> is wrong with mda >> 16, this is your concern, right?

In case it was:  mda is u32 so the bitshift is defined by C.
(xAPIC destinations in KVM's x2APIC mode are stored in lowest 8 bits of
 mda, hence the cluster is always 0.)

Or am I still missing the point?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"

2015-09-18 Thread Radim Krčmář
Shifting pvclock_vcpu_time_info.system_time on write to KVM system time
MSR is a change of ABI.  Probably only 2.6.16 based SLES 10 breaks due
to its custom enhancements to kvmclock, but KVM never declared the MSR
only for one-shot initialization.  (Doc says that only one write is
needed.)

This reverts commit b7e60c5aedd2b63f16ef06fde4f81ca032211bc5.
And adds a note to the definition of PVCLOCK_COUNTS_FROM_ZERO.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 arch/x86/include/asm/pvclock-abi.h | 1 +
 arch/x86/kvm/x86.c | 4 
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h 
b/arch/x86/include/asm/pvclock-abi.h
index 655e07a48f6c..67f08230103a 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -41,6 +41,7 @@ struct pvclock_wall_clock {
 
 #define PVCLOCK_TSC_STABLE_BIT (1 << 0)
 #define PVCLOCK_GUEST_STOPPED  (1 << 1)
+/* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */
 #define PVCLOCK_COUNTS_FROM_ZERO (1 << 2)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 18d59b584dee..34d33f4757d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1707,8 +1707,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
vcpu->pvclock_set_guest_stopped_request = false;
}
 
-   pvclock_flags |= PVCLOCK_COUNTS_FROM_ZERO;
-
/* If the host uses TSC clocksource, then it is stable */
if (use_master_clock)
pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
@@ -2006,8 +2004,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
>requests);
 
ka->boot_vcpu_runs_old_kvmclock = tmp;
-
-   ka->kvmclock_offset = -get_kernel_ns();
}
 
vcpu->arch.time = data;
-- 
2.5.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] x86: kvmclock: abolish PVCLOCK_COUNTS_FROM_ZERO

2015-09-18 Thread Radim Krčmář
Newer KVM won't be exposing PVCLOCK_COUNTS_FROM_ZERO anymore.
The purpose of that flags was to start counting system time from 0 when
the KVM clock has been initialized.
We can achieve the same by selecting one read as the initial point.

A simple subtraction will work unless the KVM clock count overflows
earlier (has smaller width) than scheduler's cycle count.  We should be
safe till x86_128.

Because PVCLOCK_COUNTS_FROM_ZERO was enabled only on new hypervisors,
setting sched clock as stable based on PVCLOCK_TSC_STABLE_BIT might
regress on older ones.

I presume we don't need to change kvm_clock_read instead of introducing
kvm_sched_clock_read.  A problem could arise in case sched_clock is
expected to return the same value as get_cycles, but we should have
merged those clocks in that case.

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 arch/x86/kernel/kvmclock.c | 46 +++---
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 2c7aafa70702..ef5b3d2cecce 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -32,6 +32,7 @@
 static int kvmclock = 1;
 static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
 static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
+static cycle_t kvm_sched_clock_offset;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -92,6 +93,29 @@ static cycle_t kvm_clock_get_cycles(struct clocksource *cs)
return kvm_clock_read();
 }
 
+static cycle_t kvm_sched_clock_read(void)
+{
+   return kvm_clock_read() - kvm_sched_clock_offset;
+}
+
+static inline void kvm_sched_clock_init(bool stable)
+{
+   if (!stable) {
+   pv_time_ops.sched_clock = kvm_clock_read;
+   return;
+   }
+
+   kvm_sched_clock_offset = kvm_clock_read();
+   pv_time_ops.sched_clock = kvm_sched_clock_read;
+   set_sched_clock_stable();
+
+   printk("kvm-clock: using sched offset of %llu cycles\n",
+   kvm_sched_clock_offset);
+
+   BUILD_BUG_ON(sizeof(kvm_sched_clock_offset) >
+sizeof(((struct pvclock_vcpu_time_info *)NULL)->system_time));
+}
+
 /*
  * If we don't do that, there is the possibility that the guest
  * will calibrate under heavy load - thus, getting a lower lpj -
@@ -248,7 +272,17 @@ void __init kvmclock_init(void)
memblock_free(mem, size);
return;
}
-   pv_time_ops.sched_clock = kvm_clock_read;
+
+   if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
+   pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+
+   cpu = get_cpu();
+   vcpu_time = _clock[cpu].pvti;
+   flags = pvclock_read_flags(vcpu_time);
+
+   kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT);
+   put_cpu();
+
x86_platform.calibrate_tsc = kvm_get_tsc_khz;
x86_platform.get_wallclock = kvm_get_wallclock;
x86_platform.set_wallclock = kvm_set_wallclock;
@@ -265,16 +299,6 @@ void __init kvmclock_init(void)
kvm_get_preset_lpj();
clocksource_register_hz(_clock, NSEC_PER_SEC);
pv_info.name = "KVM";
-
-   if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
-   pvclock_set_flags(~0);
-
-   cpu = get_cpu();
-   vcpu_time = _clock[cpu].pvti;
-   flags = pvclock_read_flags(vcpu_time);
-   if (flags & PVCLOCK_COUNTS_FROM_ZERO)
-   set_sched_clock_stable();
-   put_cpu();
 }
 
 int __init kvm_setup_vsyscall_timeinfo(void)
-- 
2.5.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 0/2] kvmclock: fix ABI breakage from PVCLOCK_COUNTS_FROM_ZERO.

2015-09-18 Thread Radim Krčmář
This patch series will be disabling PVCLOCK_COUNTS_FROM_ZERO flag and is
RFC because I haven't explored many potential problems or tested it.

[1/2] uses a different algorithm in the guest to start counting from 0.
[2/2] stops exposing PVCLOCK_COUNTS_FROM_ZERO in the hypervisor.

A viable alternative would be to implement opt-in features in kvm clock.

And because we probably only broke one old user (the infamous SLES 10), a
workaround like this is also possible: (but I'd rather not do that)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a60bdbccff51..ae9049248aaf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2007,7 +2007,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
 
ka->boot_vcpu_runs_old_kvmclock = tmp;
 
-   ka->kvmclock_offset = -get_kernel_ns();
+   if (!ka->boot_vcpu_runs_old_kvmclock)
+   ka->kvmclock_offset = -get_kernel_ns();
}
 
vcpu->arch.time = data;


Radim Krčmář (2):
  x86: kvmclock: abolish PVCLOCK_COUNTS_FROM_ZERO
  Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock
system MSR"

 arch/x86/include/asm/pvclock-abi.h |  1 +
 arch/x86/kernel/kvmclock.c | 46 +-
 arch/x86/kvm/x86.c |  4 
 3 files changed, 36 insertions(+), 15 deletions(-)

-- 
2.5.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()

2015-09-17 Thread Radim Krčmář
2015-09-17 16:24+0200, Paolo Bonzini:
> I think kvm_apic_match_logical_addr for MSI and IOAPIC interrupts is
> buggy in x2apic mode.
> 
> It does:
> 
> if (apic_x2apic_mode(apic))
> return ((logical_id >> 16) == (mda >> 16))
>&& (logical_id & mda & 0x) != 0;
> 
> But mda is only 8-bits for MSI and IOAPIC interrupts.
> 
> Radim, should kvm_apic_mda also handle the !ipi && x2apic_mda && dest_id
> != APIC_BROADCAST case?  It never triggers with Linux because it uses
> only the physical mode (that's not super-easy to see;
> ioapic_set_affinity looks for the RTEs in irq_data->chip_data and that
> is allocated with kzalloc).

KVM handles that case, it's just convoluted.
(I wish we scrapped the IR-less x2APIC mode.)

For interrupts from MSI and IOxAPIC:
- Flat logical interrupts are delivered as if we had natural
  (CPU0<->bit0, CPU1<->bit1, ...) flat logical xAPIC for first 8 VCPUs.
- Cluster logical doesn't work much, it's interpreted like flat logical.
  I didn't care about xAPIC cluster because Linux, the sole user of our
  paravirtualized x2APIC, doesn't configure it.

I'll paste kvm_apic_mda() source for better explanation:

  static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
struct kvm_lapic *target)
  {
bool ipi = source != NULL;
bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);
  
if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
return X2APIC_BROADCAST;
  
return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
  }

MSI/IOxAPIC interrupt means that source is NULL and if the target is in
x2APIC mode, the original 'dest_id' is returned as mda => a flat logical
xAPIC to 0x0f will get interpreted as (cluster) logical x2APIC 0xf in
kvm_apic_match_logical_addr().
xAPIC address are only 8 bit long so they always get delivered to x2APIC
cluster 0, where first 16 bits work like xAPIC flat logical mode.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IRQ affinity on Linux guest

2015-08-20 Thread Radim Krčmář
2015-08-20 17:16+0300, Mihai Neagu:
 Here is how IRQ affinity is configured on guest at startup, in an init.d
 script:
 
 echo 1  /proc/irq/default_smp_affinity
 for x in /proc/irq/*/smp_affinity;
 do
   echo 1  $x
 done 2 /dev/null
 
 The command line for starting the hardware accelerated VM:
 qemu-system-x86-64 -enable-kvm -kernel bzImage -hda rootfs.ext2 -append\
 root=/dev/sda console=ttyS0 rw -nographic -cpu qemu64 -snapshot -smp 2   \
 -m 2048
 
 On the hardware accelerated guest, 'cat /proc/interrupts' shows:
CPU0   CPU1
   0: 26  0   IO-APIC-edge  timer
   1:  7  4   IO-APIC-edge  i8042
   4:   1137523   IO-APIC-edge  serial
   8:  0  1   IO-APIC-edge  rtc0
   9:  0  0   IO-APIC-fasteoi   acpi
  11:   4971  4   IO-APIC-fasteoi   eth0
  12: 66 64   IO-APIC-edge  i8042
  14:   1958714   IO-APIC-edge  ata_piix
  15:   4512 63   IO-APIC-edge  ata_piix
 ...
 Interrupts are serviced on both cores, even though affinity is set to 1.

KVM's APIC balances interrupts -- until you set the affinity (probably
near the end of boot process), both CPUs are going to receive roughly
the same amount but after directing subsequent interrupts to CPU0, CPU1
shouldn't receive more.

Please verify that CPU0 is not receiving all interrupts by doing a
difference between two `cat /proc/interrupts` after the affinity was
set.  (CPU0 has higher numbers in your excerpt, which makes me suspect
that it works as expected.)

Thanks.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Help debugging a regression in KVM Module

2015-08-18 Thread Radim Krčmář
2015-08-18 16:54+0200, Peter Lieven:
 After some experiments I was able to find out the bad commit that introduced 
 the regression:
 
 commit f30ebc312ca9def25650b4e1d01cdb425c310dca
 Author: Radim Krčmář rkrc...@redhat.com
 Date:   Thu Oct 30 15:06:47 2014 +0100
 
 It seems that this optimisation is not working reliabliy after live 
 migration. I can't reproduce if
 I take a 3.19 kernel and revert this single commit.

Hello, this bug has gone unnoticed for a long time so it is fixed only
since v4.1 (and v3.19.stable was dead at that point).

commit b6ac069532218027f2991cba01d7a72a200688b0
Author: Radim Krčmář rkrc...@redhat.com
Date:   Fri Jun 5 20:57:41 2015 +0200

KVM: x86: fix lapic.timer_mode on restore

lapic.timer_mode was not properly initialized after migration, which
broke few useful things, like login, by making every sleep eternal.

Fix this by calling apic_update_lvtt in kvm_apic_post_state_restore.

There are other slowpaths that update lvtt, so this patch makes sure
something similar doesn't happen again by calling apic_update_lvtt
after every modification.

Cc: sta...@vger.kernel.org
Fixes: f30ebc312ca9 (KVM: x86: optimize some accesses to LVTT and SPIV)
Signed-off-by: Radim Krčmář rkrc...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-08-14 Thread Radim Krčmář
2015-08-13 16:53+0200, Paolo Bonzini:
 On 13/08/2015 15:46, Radim Krčmář wrote:
  1) IOAPIC inject a vector from i8254
  2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
 on original VCPU gets cleared
  3) guest's handler for the vector does EOI
  4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
 not in that VCPU's eoi_exit_bitmap
  5) i8254 stops working
 
 This creates an unwanted situation if the vector is reused by a
 non-IOAPIC source, but I think it is so rare that we don't want to make
 the solution more sophisticated. 
 
 What happens if the vector is changed in step 2?
 __kvm_ioapic_update_eoi won't match the redirection table entry.

Yes, the EOI is going to be ignored.  (With APICv, VMX won't even exit.)
In the patch, I dissmissed it as shouldn't happen in the wild because
we've always had the vector-change bug :) (Unlike the destination-change
one, which was APICv-only before recent changes.)

A simple solution to the vector-change would have a list of one-time
fixups (vector, *ioapic) and hooks in ioapic reconfig, scan and EOI.

A complex solution would replace ioapic scanning with an array of list
of ioapics (it needs to be a list or small array because vectors can be
shared).
An ioapic would be added to list[vector] on reconfig and removed on
reconfig unless an edge fixup was needed, then it would last til next
EOI  (I guess we won't need to consider vector in IRR and ISR).
Callbacks would update the eoi_exit_bitmap on relevant changes.

I considered doing the complex one, but then it occured to me that we
want the destination-change fixed in stable as APICv machines are
starting to get used and people might migrate old guests on them.

 How do you reproduce the bug?

I run rhel4 (2.6.9) kernel on 2 VCPUs and frequently alternate
smp_affinity of timer.  The bug is hit within seconds.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/5] KVM: add kvm_has_request wrapper

2015-08-14 Thread Radim Krčmář
2015-08-13 12:03+0200, Christian Borntraeger:
 Am 13.08.2015 um 11:29 schrieb Paolo Bonzini:
 On 13/08/2015 11:11, Radim Krčmář wrote:
 for the new interface. maybe we can rename kvm_check_request in a 
 separate patch somewhen.
 I wonder why haven't we copied the naming convention from bit operations
| [...]
 
 Yes, that would be much better.
 
 +1

I'll send patches later.  Hope you won't mind keeping the doomed
kvm_has_request() in v3.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 4/5] KVM: add KVM_USER_EXIT vcpu ioctl for userspace exit

2015-08-14 Thread Radim Krčmář
The guest can use KVM_USER_EXIT instead of a signal-based exiting to
userspace.  Availability depends on KVM_CAP_USER_EXIT.
Only x86 is implemented so far.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 v3:
  * use ioctl argument directly (unsigned long as flags) [Paolo]
 v2:
  * use vcpu ioctl instead of vm one [Paolo]
  * shrink kvm_user_exit from 64 to 32 bytes

 Documentation/virtual/kvm/api.txt | 25 +
 arch/x86/kvm/x86.c| 15 +++
 include/uapi/linux/kvm.h  |  3 +++
 virt/kvm/kvm_main.c   |  5 +++--
 4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 3c714d43a717..df087ff3c5b6 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3020,6 +3020,31 @@ Returns: 0 on success, -1 on error
 
 Queues an SMI on the thread's vcpu.
 
+
+4.97 KVM_USER_EXIT
+
+Capability: KVM_CAP_USER_EXIT
+Architectures: x86
+Type: vcpu ioctl
+Parameters: unsigned long flags (in)
+Returns: 0 on success,
+ -EINVAL if flags is not 0
+
+The ioctl is asynchronous to VCPU execution and can be issued from all threads.
+
+Make vcpu_id exit to userspace as soon as possible.  If the VCPU is not running
+in kernel at the time, it will exit early on the next call to KVM_RUN.
+If the VCPU was going to exit because of other reasons when KVM_USER_EXIT was
+issued, it will keep the original exit reason without exiting early on next
+KVM_RUN.
+If VCPU exited because of KVM_USER_EXIT, the exit reason is KVM_EXIT_REQUEST.
+
+This ioctl has very similar effect (same sans some races on userspace exit) as
+sending a signal (that is blocked in userspace and set in KVM_SET_SIGNAL_MASK)
+to the VCPU thread.
+
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 37db1b32684a..d985806b17b1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2467,6 +2467,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
ext)
case KVM_CAP_ASSIGN_DEV_IRQ:
case KVM_CAP_PCI_2_3:
 #endif
+   case KVM_CAP_USER_EXIT:
r = 1;
break;
case KVM_CAP_X86_SMM:
@@ -3078,6 +3079,17 @@ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
return 0;
 }
 
+static int kvm_vcpu_ioctl_user_exit(struct kvm_vcpu *vcpu, unsigned long flags)
+{
+   if (flags != 0)
+   return -EINVAL;
+
+   kvm_make_request(KVM_REQ_EXIT, vcpu);
+   kvm_vcpu_kick(vcpu);
+
+   return 0;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 unsigned int ioctl, unsigned long arg)
 {
@@ -3342,6 +3354,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = kvm_set_guest_paused(vcpu);
goto out;
}
+   case KVM_USER_EXIT:
+   r = kvm_vcpu_ioctl_user_exit(vcpu, arg);
+   break;
default:
r = -EINVAL;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d996a7cdb4d2..58b3a07adc81 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -826,6 +826,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_X86_SMM 117
 #define KVM_CAP_MULTI_ADDRESS_SPACE 118
 #define KVM_CAP_SPLIT_IRQCHIP 119
+#define KVM_CAP_USER_EXIT 120
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1213,6 +1214,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_S390_GET_IRQ_STATE   _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
 /* Available with KVM_CAP_X86_SMM */
 #define KVM_SMI   _IO(KVMIO,   0xb7)
+/* Available with KVM_CAP_USER_EXIT */
+#define KVM_USER_EXIT _IO(KVMIO,   0xb8)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1  0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1  1)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 347899966178..dfa2d5f27713 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2251,15 +2251,16 @@ static long kvm_vcpu_ioctl(struct file *filp,
if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
return -EINVAL;
 
-#if defined(CONFIG_S390) || defined(CONFIG_PPC) || defined(CONFIG_MIPS)
/*
 * Special cases: vcpu ioctls that are asynchronous to vcpu execution,
 * so vcpu_load() would break it.
 */
+#if defined(CONFIG_S390) || defined(CONFIG_PPC) || defined(CONFIG_MIPS)
if (ioctl == KVM_S390_INTERRUPT || ioctl == KVM_S390_IRQ || ioctl == 
KVM_INTERRUPT)
return kvm_arch_vcpu_ioctl(filp, ioctl, arg);
 #endif
-
+   if (ioctl == KVM_USER_EXIT)
+   return kvm_arch_vcpu_ioctl(filp, ioctl, arg);
 
r = vcpu_load(vcpu);
if (r)
-- 
2.5.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/5] KVM: add KVM_REQ_EXIT request for userspace exit

2015-08-14 Thread Radim Krčmář
When userspace wants KVM to exit to userspace, it sends a signal.
This has a disadvantage of requiring a change to the signal mask because
the signal needs to be blocked in userspace to stay pending when sending
to self.

Using a request flag allows us to shave 200-300 cycles from every
userspace exit and the speedup grows with NUMA because unblocking
touches shared spinlock.

The disadvantage is that it adds an overhead of one bit check for all
kernel exits.  A quick tracing shows that the ratio of userspace exits
after boot is about 1/5 and in subsequent run of nmap and kernel compile
has about 1/60, so the check should not regress global performance.

All signal_pending() calls are userspace exit requests, so we add a
check for KVM_REQ_EXIT there.  There is one omitted call in kvm_vcpu_run
because KVM_REQ_EXIT is implied in earlier check for requests.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/kvm/vmx.c   | 2 +-
 arch/x86/kvm/x86.c   | 6 ++
 include/linux/kvm_host.h | 8 +++-
 include/uapi/linux/kvm.h | 1 +
 virt/kvm/kvm_main.c  | 2 +-
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 40c6180a0ecb..2b789a869ef5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5833,7 +5833,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu 
*vcpu)
goto out;
}
 
-   if (signal_pending(current))
+   if (kvm_need_exit(vcpu))
goto out;
if (need_resched())
schedule();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5850076bf7b..c3df7733af09 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6548,6 +6548,11 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
++vcpu-stat.signal_exits;
break;
}
+   if (unlikely(kvm_has_request(KVM_REQ_EXIT, vcpu))) {
+   r = 0;
+   vcpu-run-exit_reason = KVM_EXIT_REQUEST;
+   break;
+   }
if (need_resched()) {
srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
cond_resched();
@@ -6684,6 +6689,7 @@ out:
post_kvm_run_save(vcpu);
if (vcpu-sigset_active)
sigprocmask(SIG_SETMASK, sigsaved, NULL);
+   clear_bit(KVM_REQ_EXIT, vcpu-requests);
 
return r;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 52e388367a26..dcc57171e3ec 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -121,7 +121,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_UNHALT 6
 #define KVM_REQ_MMU_SYNC   7
 #define KVM_REQ_CLOCK_UPDATE   8
-#define KVM_REQ_KICK   9
+#define KVM_REQ_EXIT   9
 #define KVM_REQ_DEACTIVATE_FPU10
 #define KVM_REQ_EVENT 11
 #define KVM_REQ_APF_HALT  12
@@ -1104,6 +1104,12 @@ static inline bool kvm_check_request(int req, struct 
kvm_vcpu *vcpu)
}
 }
 
+static inline bool kvm_need_exit(struct kvm_vcpu *vcpu)
+{
+   return signal_pending(current) ||
+  kvm_has_request(KVM_REQ_EXIT, vcpu);
+}
+
 extern bool kvm_rebooting;
 
 struct kvm_device {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 26daafbba9ec..d996a7cdb4d2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -184,6 +184,7 @@ struct kvm_s390_skeys {
 #define KVM_EXIT_SYSTEM_EVENT 24
 #define KVM_EXIT_S390_STSI25
 #define KVM_EXIT_IOAPIC_EOI   26
+#define KVM_EXIT_REQUEST  27
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d8db2f8fce9c..347899966178 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1914,7 +1914,7 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
}
if (kvm_cpu_has_pending_timer(vcpu))
return -EINTR;
-   if (signal_pending(current))
+   if (kvm_need_exit(vcpu))
return -EINTR;
 
return 0;
-- 
2.5.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/5] KVM: optimize userspace exits with a new ioctl

2015-08-14 Thread Radim Krčmář
v3:
 * acked by Christian [1/5]
 * use ioctl argument directly (unsigned long as flags) [4/5]
 * precisely #ifdef arch-specific ioctls [5/5]
v2:
 * move request_exits debug counter patch right after introduction of
   KVM_REQ_EXIT [3/5]
 * use vcpu ioctl instead of vm one [4/5]
 * shrink kvm_user_exit from 64 to 32 bytes [4/5]
 * new [5/5]

QEMU uses SIGUSR1 to force a userspace exit and also to queue an early
exit before calling VCPU_RUN -- the signal is blocked in user space and
temporarily unblocked in VCPU_RUN.
The temporal unblocking by sigprocmask() in kvm_arch_vcpu_ioctl_run()
takes a shared siglock, which leads to cacheline bouncing in NUMA
systems.

This series allows the same with a new request bit and VM IOCTL that
marks and kicks target VCPU, hence no need to unblock.

inl_from_{pmtimer,qemu} vmexit benchmark from kvm-unit-tests shows ~5%
speedup for 1-4 VCPUs (300-2000 saved cycles) without noticeably
regressing kernel VM exits.
(Paolo did a quick run of older version of this series on a NUMA system
 and the speedup was around 35% when utilizing more nodes.)

Radim Krčmář (5):
  KVM: add kvm_has_request wrapper
  KVM: add KVM_REQ_EXIT request for userspace exit
  KVM: x86: add request_exits debug counter
  KVM: add KVM_USER_EXIT vcpu ioctl for userspace exit
  KVM: refactor asynchronous vcpu ioctl dispatch

 Documentation/virtual/kvm/api.txt | 25 +
 arch/x86/include/asm/kvm_host.h   |  1 +
 arch/x86/kvm/vmx.c|  4 ++--
 arch/x86/kvm/x86.c| 23 +++
 include/linux/kvm_host.h  | 15 +--
 include/uapi/linux/kvm.h  |  4 
 virt/kvm/kvm_main.c   | 15 ++-
 7 files changed, 78 insertions(+), 9 deletions(-)

-- 
2.5.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/5] KVM: add kvm_has_request wrapper

2015-08-14 Thread Radim Krčmář
We want to have requests abstracted from bit operations.

Acked-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 v3: acked by Christian

 arch/x86/kvm/vmx.c   | 2 +-
 include/linux/kvm_host.h | 7 ++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4cf25b90dbe0..40c6180a0ecb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5809,7 +5809,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu 
*vcpu)
if (intr_window_requested  vmx_interrupt_allowed(vcpu))
return handle_interrupt_window(vmx-vcpu);
 
-   if (test_bit(KVM_REQ_EVENT, vcpu-requests))
+   if (kvm_has_request(KVM_REQ_EVENT, vcpu))
return 1;
 
err = emulate_instruction(vcpu, EMULTYPE_NO_REEXECUTE);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 27ccdf91a465..52e388367a26 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1089,9 +1089,14 @@ static inline void kvm_make_request(int req, struct 
kvm_vcpu *vcpu)
set_bit(req, vcpu-requests);
 }
 
+static inline bool kvm_has_request(int req, struct kvm_vcpu *vcpu)
+{
+   return test_bit(req, vcpu-requests);
+}
+
 static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
 {
-   if (test_bit(req, vcpu-requests)) {
+   if (kvm_has_request(req, vcpu)) {
clear_bit(req, vcpu-requests);
return true;
} else {
-- 
2.5.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/5] KVM: x86: add request_exits debug counter

2015-08-14 Thread Radim Krčmář
We are still interested in the amount of exits userspace requested and
signal_exits doesn't cover that anymore.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 v2: move request_exits debug counter patch right after introduction of
 KVM_REQ_EXIT

 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c  | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09acaa64ef8e..95c05a3d02d4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -729,6 +729,7 @@ struct kvm_vcpu_stat {
u32 hypercalls;
u32 irq_injections;
u32 nmi_injections;
+   u32 request_exits;
 };
 
 struct x86_instruction_info;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c3df7733af09..37db1b32684a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -145,6 +145,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ io_exits, VCPU_STAT(io_exits) },
{ mmio_exits, VCPU_STAT(mmio_exits) },
{ signal_exits, VCPU_STAT(signal_exits) },
+   { request_exits, VCPU_STAT(request_exits) },
{ irq_window, VCPU_STAT(irq_window_exits) },
{ nmi_window, VCPU_STAT(nmi_window_exits) },
{ halt_exits, VCPU_STAT(halt_exits) },
@@ -6551,6 +6552,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
if (unlikely(kvm_has_request(KVM_REQ_EXIT, vcpu))) {
r = 0;
vcpu-run-exit_reason = KVM_EXIT_REQUEST;
+   ++vcpu-stat.request_exits;
break;
}
if (need_resched()) {
-- 
2.5.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 5/5] KVM: refactor asynchronous vcpu ioctl dispatch

2015-08-14 Thread Radim Krčmář
I find the switch easier to read and modify.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 v3: precisely #ifdef arch-specific ioctls [Christian]
 v2: new

 virt/kvm/kvm_main.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dfa2d5f27713..c059c01161fe 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2255,12 +2255,16 @@ static long kvm_vcpu_ioctl(struct file *filp,
 * Special cases: vcpu ioctls that are asynchronous to vcpu execution,
 * so vcpu_load() would break it.
 */
-#if defined(CONFIG_S390) || defined(CONFIG_PPC) || defined(CONFIG_MIPS)
-   if (ioctl == KVM_S390_INTERRUPT || ioctl == KVM_S390_IRQ || ioctl == 
KVM_INTERRUPT)
-   return kvm_arch_vcpu_ioctl(filp, ioctl, arg);
+   switch (ioctl) {
+#if defined(CONFIG_S390)
+   case KVM_S390_INTERRUPT:
+   case KVM_S390_IRQ:
+#elif defined(CONFIG_PPC) || defined(CONFIG_MIPS)
+   case KVM_INTERRUPT:
 #endif
-   if (ioctl == KVM_USER_EXIT)
+   case KVM_USER_EXIT:
return kvm_arch_vcpu_ioctl(filp, ioctl, arg);
+   }
 
r = vcpu_load(vcpu);
if (r)
-- 
2.5.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/5] KVM: add kvm_has_request wrapper

2015-08-13 Thread Radim Krčmář
2015-08-12 21:57+0200, Christian Borntraeger:
 kvm_check_request is now somewhat a misnomer (what is the difference between 
 test and check?)

kvm_check_request has always been poetic;  it uses two meanings of
check, examine and tick off, at the same time.

We also want something that clears the request, so kvm_drop_request was
my best candidate so far.

 for the new interface. maybe we can rename kvm_check_request in a separate 
 patch somewhen.

I wonder why haven't we copied the naming convention from bit operations
(or if programming would be better if German was its language),

  kvm_test_request
  kvm_set_request
  kvm_clear_request
  kvm_test_and_clear_request

The only disadvantage is that
  kvm_test_and_clear_request
is longer than
  kvm_check_request
   123456789
by whooping 9 characters.

I could live with that.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/5] KVM: refactor asynchronous vcpu ioctl dispatch

2015-08-13 Thread Radim Krčmář
2015-08-12 22:03+0200, Christian Borntraeger:
 Am 05.08.2015 um 18:33 schrieb Radim Krčmář:
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 @@ -2252,12 +2252,15 @@ static long kvm_vcpu_ioctl(struct file *filp,
   * Special cases: vcpu ioctls that are asynchronous to vcpu execution,
   * so vcpu_load() would break it.
   */
 +switch (ioctl) {
  #if defined(CONFIG_S390) || defined(CONFIG_PPC) || defined(CONFIG_MIPS)
 -if (ioctl == KVM_S390_INTERRUPT || ioctl == KVM_S390_IRQ || ioctl == 
 KVM_INTERRUPT)
 -return kvm_arch_vcpu_ioctl(filp, ioctl, arg);
 +case KVM_S390_INTERRUPT:
 +case KVM_S390_IRQ:
 +case KVM_INTERRUPT:
 
 When you are it you might want to put the KVM_S390* withing CONFIG_S390 and
 KVM_INTERRUPT within CONFIG_PPC || CONFIG_MIPS

Sure, thanks.

 This might speed up the switch statement for s390/ppc/mips a tiny bit. It 
 will add
 another ifdef, though. Paolo?

For v3, I will name the decision as an inline function, which should
make the #ifing more acceptable (at the cost of not having ioctls #defs
in the body of kvm_vcpu_ioctl).  Something like this,

static inline bool kvm_asynchronous_ioctl(unsigned ioctl)
{
switch (ioctl) {
#if defined(CONFIG_S390)
case KVM_S390_INTERRUPT:
case KVM_S390_IRQ:
#endif
#if defined(CONFIG_MIPS)
case KVM_INTERRUPT:
#endif
case KVM_USER_EXIT:
return true;
}
return false;
}

[...]
if (kvm_asynchronous_ioctl(ioctl))
return kvm_arch_vcpu_ioctl(filp, ioctl, arg);
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: x86: return bool from x86_ops.sync_pir_to_irr

2015-08-13 Thread Radim Krčmář
True means that we have added PIR to IRR.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c  |  4 ++--
 arch/x86/kvm/vmx.c  | 12 +++-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09acaa64ef8e..b73696b59d77 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -831,7 +831,7 @@ struct kvm_x86_ops {
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
-   void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
+   bool (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 189e46479dd5..cd4ad20951c4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3758,9 +3758,9 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu)
return;
 }
 
-static void svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+static bool svm_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 {
-   return;
+   return false;
 }
 
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4cf25b90dbe0..e3ae8c236cca 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -817,7 +817,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
struct kvm_segment *var, int seg);
 static bool guest_state_valid(struct kvm_vcpu *vcpu);
 static u32 vmx_segment_access_rights(struct kvm_segment *var);
-static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
+static bool vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
 static int alloc_identity_pagetable(struct kvm *kvm);
@@ -4430,19 +4430,21 @@ static void vmx_deliver_posted_interrupt(struct 
kvm_vcpu *vcpu, int vector)
kvm_vcpu_kick(vcpu);
 }
 
-static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+static bool vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
if (!pi_test_and_clear_on(vmx-pi_desc))
-   return;
+   return false;
 
kvm_apic_update_irr(vcpu, vmx-pi_desc.pir);
+
+   return true;
 }
 
-static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
+static bool vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
 {
-   return;
+   return false;
 }
 
 /*
-- 
2.5.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-08-13 Thread Radim Krčmář
(The main problem is that we care about EOI of edge interrupts, but our
 house of cards started a long time ago, so overturning that decision is
 not ideal for a stable fix.)

KVM uses eoi_exit_bitmap to track vectors that need an action on EOI.
The problem is that IOAPIC can be reconfigured while an interrupt with
old configuration is pending and eoi_exit_bitmap only remembers the
newest configuration so EOI from the pending interrupt is not
recognized.

This is not a problem for level interrupts, because IOAPIC sends
interrupt with the new configuration.
And then there are edge interrupts with ACK notifiers, like i8254 timer;
things can happen in this order
 1) IOAPIC inject a vector from i8254
 2) guest reconfigures that vector's VCPU and therefore eoi_exit_bitmap
on original VCPU gets cleared
 3) guest's handler for the vector does EOI
 4) KVM's EOI handler doesn't pass that vector to IOAPIC because it is
not in that VCPU's eoi_exit_bitmap
 5) i8254 stops working

A simple solution is to set the IOAPIC vector in eoi_exit_bitmap if the
vector is in PIR/IRR/ISR.

This creates an unwanted situation if the vector is reused by a
non-IOAPIC source, but I think it is so rare that we don't want to make
the solution more sophisticated.  The simple solution also doesn't work
if we are reconfiguring the vector.  (Shouldn't happen in the wild and
I'd rather fix users of ACK notifiers instead of working around that.)

The are no races because ioapic injection and reconfig are locked.

Fixes: 638e7c03efea (KVM: x86: Add EOI exit bitmap inference)
[Before 638e7c03efea, this bug happened only with APICv.]
Fixes: c7c9c56ca26f (x86, apicv: add virtual interrupt delivery support)
Cc: sta...@vger.kernel.org
Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/kvm/ioapic.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 2dcda0f188ba..85d25fe25e39 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -239,6 +239,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap)
union kvm_ioapic_redirect_entry *e;
int index;
 
+   if (kvm_x86_ops-sync_pir_to_irr(vcpu))
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
+
spin_lock(ioapic-lock);
for (index = 0; index  IOAPIC_NUM_PINS; index++) {
e = ioapic-redirtbl[index];
@@ -246,7 +249,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap)
kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC, 
index) ||
index == RTC_GSI) {
if (kvm_apic_match_dest(vcpu, NULL, 0,
-   e-fields.dest_id, e-fields.dest_mode))
+e-fields.dest_id, e-fields.dest_mode) ||
+   (e-fields.trig_mode == IOAPIC_EDGE_TRIG 
+kvm_apic_pending_eoi(vcpu, e-fields.vector)))
__set_bit(e-fields.vector,
(unsigned long *)eoi_exit_bitmap);
}
-- 
2.5.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-08-13 Thread Radim Krčmář
[1/2] changes sync_pir_to_irr interface to avoid a definitely
superfluous KVM_REQ_EVENT in [2/2].  It's possible that we don't need
KVM_REQ_EVENT, but a slight slowdown in execution is worth the time
needed for a proof.  (A quick test showed that KVM_REQ_EVENT is always
already set if sync_pir_to_irr does something.)

[2/2] is applicable to userspace split IOAPIC as well, but I hope that
we want it to work like normal IOAPIC (no ACK for edge irq), so I
haven't fixed it.


Radim Krčmář (2):
  KVM: x86: return bool from x86_ops.sync_pir_to_irr
  KVM: x86: fix edge EOI and IOAPIC reconfig race

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/ioapic.c   |  7 ++-
 arch/x86/kvm/svm.c  |  4 ++--
 arch/x86/kvm/vmx.c  | 12 +++-
 4 files changed, 16 insertions(+), 9 deletions(-)

-- 
2.5.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: zero IDT limit on entry to SMM

2015-08-10 Thread Radim Krčmář
2015-08-07 12:54+0200, Paolo Bonzini:
 The recent BlackHat 2015 presentation The Memory Sinkhole
 mentions that the IDT limit is zeroed on entry to SMM.

Slide 64 of
https://www.blackhat.com/docs/us-15/materials/us-15-Domas-The-Memory-Sinkhole-Unleashing-An-x86-Design-Flaw-Allowing-Universal-Privilege-Escalation.pdf

 This is not documented, and must have changed some time after 2010
 (see http://www.ssi.gouv.fr/uploads/IMG/pdf/IT_Defense_2010_final.pdf).
 KVM was not doing it, but the fix is easy.

This patch also clears the IDT base.  Fetching original IDT is better
done from SMM saved state (and an anti-exploit based on comparing those
two seems unlikely) so it should be fine,

Reviewed-by: Radim Krčmář rkrc...@redhat.com

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---

That takes care of Attack 1.
KVM is likely not vulnerable to attack 2 and 3 because of an emergent
security feature.  (A simple modification of kvm-unit-tests show that
mapping APIC base on top of real code/data makes the APIC page hidden
and I expect SMM memslot to behave similarly.)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/5] KVM: add KVM_USER_EXIT vcpu ioctl for userspace exit

2015-08-06 Thread Radim Krčmář
2015-08-06 15:52+0200, Paolo Bonzini:
 On 06/08/2015 15:44, Radim Krčmář wrote:
 The two obvious extensions are flags to skip kvm_make_request() or
 kvm_vcpu_kick(), both of dubious use.
 
 Skipping kvm_make_request() would make some sense if you can set
 vcpu-run-request_interrupt_window asynchronously.  So you could do
 
   vcpu-run-request_interrupt_window = 1;
   ioctl(vcpu_fd, KVM_USER_EXIT, KVM_USER_EXIT_LAZY);
 
 and only cause a lightweight vmexit if the interrupt window is currently
 closed.  I haven't thought of any races that could happen, but it looks
 like it could work.

Seems doable, kvm_run should have been better protected :)

 Skipping kvm_vcpu_kick() makes much less sense.

Could save some cycles when queuing an early exit from the VCPU thread.

 Another possibility is setting up
 conditional exits, but that would be better as a separate control, like
 most other sophisticated extensions.
 
 I think that u32 flags would be sufficient -- is casting the 'unsigned
 long arg' (data pointer) to a value still an accepted solution?
 
 Yeah, that would work for me as well.  Also because, for now, you'd
 return EINVAL if the unsigned long is not zero, which boils down to
 return EINVAL if the parameter is not NULL. :)

Exactly, only the ioctl number will change.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/5] KVM: add KVM_USER_EXIT vcpu ioctl for userspace exit

2015-08-06 Thread Radim Krčmář
2015-08-05 18:36+0200, Paolo Bonzini:
 On 05/08/2015 18:33, Radim Krčmář wrote:
 +4.97 KVM_USER_EXIT
 +
 +Capability: KVM_CAP_USER_EXIT
 +Architectures: x86
 +Type: vcpu ioctl
 +Parameters: struct kvm_user_exit (in)
 +Returns: 0 on success,
 + -EFAULT if the parameter couldn't be read,
 + -EINVAL if 'reserved' is not zeroed,
 +
 +struct kvm_user_exit {
 +__u8 reserved[32];
 +};
 
 Can we just return EINVAL if the parameter is not NULL?

It complicates handling if we extend the ioctl, but removes the useless
clearing/copying/checking now ...

The two obvious extensions are flags to skip kvm_make_request() or
kvm_vcpu_kick(), both of dubious use.  Another possibility is setting up
conditional exits, but that would be better as a separate control, like
most other sophisticated extensions.

I think that u32 flags would be sufficient -- is casting the 'unsigned
long arg' (data pointer) to a value still an accepted solution?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   5   6   >