Re: [question] updating the base image for all clones which have been running for months

2014-11-03 Thread Kevin Wolf
Am 03.11.2014 um 13:04 hat Zhang Haoyu geschrieben:
 Hi, all
 
 I used base image A to clone so many vm, 
 after running for months, each vm has its own private applications and data,
 which maybe different from each other.
 Now, I want to install some applications for all of the clones,
 what should I do?

Install the applications on each clone separately, or use some other
method to make it available (like installing on a shared network
resource).

 Can I rebase image A to B which have the applications to be installed,
 then change the base image to B for all clones?

The problem is that rebase works on the block level, not on the file
system level. Changing the backing file won't produce a correctly
working guest, it causes file system corruption.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qcow2: fix double-free of Qcow2DiscardRegion in qcow2_process_discards

2014-10-12 Thread Kevin Wolf
Am 11.10.2014 um 09:14 hat Zhang Haoyu geschrieben:
 In qcow2_update_snapshot_refcount - qcow2_process_discards() - 
 bdrv_discard()
 may free the Qcow2DiscardRegion which is referenced by next pointer in
 qcow2_process_discards() now, in next iteration, d = next, so g_free(d)
 will double-free this Qcow2DiscardRegion.
 
 qcow2_snapshot_delete
 |- qcow2_update_snapshot_refcount
 |-- qcow2_process_discards
 |--- bdrv_discard
 | aio_poll
 |- aio_dispatch
 |-- bdrv_co_io_em_complete
 |--- qemu_coroutine_enter(co-coroutine, NULL); === coroutine entry is 
 bdrv_co_do_rw
 |--- g_free(d) == free first Qcow2DiscardRegion is okay
 |--- d = next;  == this set is done in QTAILQ_FOREACH_SAFE() macro.
 |--- g_free(d);  == double-free will happen if during previous iteration, 
 bdrv_discard had free this object.

Do you have a reproducer for this or did code review lead you to this?

At the moment I can't see how bdrv_discard(bs-file) could ever free a
Qcow2DiscardRegion of bs, as it's working on a completely different
BlockDriverState (which usually won't even be a qcow2 one).

 bdrv_co_do_rw
 |- bdrv_co_do_writev
 |-- bdrv_co_do_pwritev
 |--- bdrv_aligned_pwritev
 | qcow2_co_writev
 |- qcow2_alloc_cluster_link_l2
 |-- qcow2_free_any_clusters
 |--- qcow2_free_clusters
 | update_refcount
 |- qcow2_process_discards
 |-- g_free(d)  == In next iteration, this Qcow2DiscardRegion will be 
 double-free.

This shouldn't happen in a nested call either, as s-lock can't be taken
recursively.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC]VM live snapshot proposal

2014-03-03 Thread Kevin Wolf
Am 03.03.2014 um 13:32 hat Stefan Hajnoczi geschrieben:
 On Mon, Mar 03, 2014 at 01:13:41AM +, Huangpeng (Peter) wrote:
 
 Just to summarize the idea of live savevm for people joining the
 discussion:
 
 It should be possible to save a snapshot of the guest (including memory,
 devices, and disk) without noticable downtime.
 
 The 'savevm' command pauses the guest until the snapshot has been
 completed and therefore doesn't meet the requirements.
 
  Here I have another proposal, based on the live-migration scheme, add 
  consistent 
  memory state tracking and saving.
  The idea is simple:
  1.First round use live-migration to save all memory to a snapshot file.
  2.intercept the action of memory-modify, save old pages to a temporary file 
  and mark dirty-bits,
  3.Merge temporary file to the original snapshot file

Why do you need a temporary file for this? Couldn't you directly store
the memory to its final destination in the snapshot file?

  Detailed process:
  (1)Pause VM
  (2) Save the device status to a temporary file (live-migration already 
  supported )
  (3) Make disk snapshot
  (4) Enable page dirty log and old dirty pages save function(which we need 
  to add)
  (5) Resume VM
  (6) Begin the first round of iteration, we save the entire contents of the 
  VM memory pages
  to the snapshot file
  (7) In the second round of iteration , we save the old page to the snapshot 
  file
  (8) Merge data of device status which is pre-saved in temporary files to 
  the snapshot file
  (8) End ram snapshot and some cleanup work
  
  Due to memory-modifications may happen in kvm, qemu, or vhost, the key-part 
  is how we
  can provide common page-modify-tracking-and-saving api, we completed a 
  prototype by 
  simply add modified-page tracking/saving function in qemu, and it seems 
  worked fine.
 
 Yes, this is the tricky part.  To be honest, I think this is the reason
 no one has submitted patches - it's a hard task and the win isn't that
 great (you can already migrate to file).

So why don't we simply reuse the existing migration code?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC]VM live snapshot proposal

2014-03-03 Thread Kevin Wolf
Am 03.03.2014 um 14:19 hat Paolo Bonzini geschrieben:
 Il 03/03/2014 13:55, Kevin Wolf ha scritto:
   Due to memory-modifications may happen in kvm, qemu, or vhost, the 
   key-part is how we
   can provide common page-modify-tracking-and-saving api, we completed a 
   prototype by
   simply add modified-page tracking/saving function in qemu, and it 
   seems worked fine.
 
  Yes, this is the tricky part.  To be honest, I think this is the reason
  no one has submitted patches - it's a hard task and the win isn't that
  great (you can already migrate to file).
 So why don't we simply reuse the existing migration code?
 
 I think this is different in the same way that block-backup and
 block-mirror are different.  Huangpeng's proposal would let you make
 a consistent snapshot of disks and RAM.

Right. Though the point isn't about consistency (doing the disk snapshot
when memory has converged would be consistent as well), but about
having the snapshot semantically right at the time when the monitor
command is issued instead of only starting it then and being consistent
at the point of completion.

This is indeed like pre/post-copy live migration, and probably both
options have their uses. I would suggest starting with the easy one, and
adding the post-copy feature on top.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC]VM live snapshot proposal

2014-03-03 Thread Kevin Wolf
Am 03.03.2014 um 14:47 hat Paolo Bonzini geschrieben:
 Il 03/03/2014 14:30, Kevin Wolf ha scritto:
   So why don't we simply reuse the existing migration code?
  I think this is different in the same way that block-backup and
  block-mirror are different.  Huangpeng's proposal would let you make
  a consistent snapshot of disks and RAM.
 Right. Though the point isn't about consistency (doing the disk snapshot
 when memory has converged would be consistent as well), but about
 having the snapshot semantically right at the time when the monitor
 command is issued instead of only starting it then and being consistent
 at the point of completion.
 
 Right---though it's not entirely true that migration only affects
 the point in time where you have consistency.  For example, with
 migration you cannot use the guest agent for freeze/thaw and, even
 if we changed the code to allow that, the pause would be much longer
 than for live snapshots or block-backup.
 
 This is indeed like pre/post-copy live migration, and probably both
 options have their uses. I would suggest starting with the easy one, and
 adding the post-copy feature on top.
 
 The feature matrix for migration and snapshot
 
   disk   RAMinternal snapshot
 non-live  yes (0)yes (0)yes
 live, disk only   yes (1)N/Ayes (2)
 live, pre-copyyes (3)yesno
 live, post-copy   yes (4)no no
 live, point-in-time   yes (5)no no
 
 (0) just stop VM while doing normal pre-copy migration
 (1) blockdev-snapshot-sync
 (2) blockdev-snapshot-internal-sync
 (3) block-stream
 (4) drive-mirror
 (5) drive-backup
 
 By the easy one you mean live savevm with snapshot at the end of
 RAM migration, I guess.  But the functionality is already available
 using migration, while point-in-time snapshots actually add new
 functionality.  I'm not sure what's the status of the kernel
 infrastructure for post-copy.  Andrea?

Yes, it's available, but not with internal snapshots, but only with
RAM snapshots stored in an external file.

An incremental next step would be to avoid writing dirtied memory to two
places, because internal snapshots aren't a streaming, but a random
access interface, so you can overwrite the original place instead of
appending the new copy. That would already be a small advantage.

Once you have this infrastructure, it's probably also a bit easier to
plug in any post-copy/point-in-time features that the migration code can
(be improved to) provide.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for 2013-12-10

2013-12-10 Thread Kevin Wolf
Am 10.12.2013 um 16:05 hat Juan Quintela geschrieben:
 Anthony Liguori anth...@codemonkey.ws wrote:
  On Tue, Dec 10, 2013 at 4:37 AM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 10/12/2013 12:42, Juan Quintela ha scritto:
 
  Hi
 
  Please, send any topic that you are interested in covering.
 
  May not need a phone call, but I'll drop it here:
 
  Could we move the time on this phone call?  7am conflicts with my
  daily commute.  I could do 6am or 9am.  I think it would be very
  useful to be able to attend this call.
 
 No problem with me.  Both of them are ok for me.
 Votes?

It would have been useful to have a time zone specification because
otherwise the message is pretty meaningless.

Juan said on IRC that that would be 15:00 CET or 18:00 CET. So if his
interpretation is right, I'd vote for 15:00 (i.e. one hour earlier than
it used to be, 14:00 UTC).

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] RFC: kvm call reschedule

2013-12-10 Thread Kevin Wolf
Am 10.12.2013 um 16:11 hat Juan Quintela geschrieben:
 Anthony can't assist this call,  just in the middle of his commute.  As
 it looks like a good idea that he can assit,  can we move the call?
 
 Options so far are (his local time):
 - Current time is 7am
 
 His suggestions:
 - 6:00am (15pm CEST)
 - 9:00am (18pm CEST)

We don't have summer time currently, so s/CEST/CET/

That is, the options are 14:00 UTC and 17:00 UTC.

 I can do any of them.  Votes for the people that attend?
 As I guess he commutes everyday,  moving the day will not help :p

I prefer 15:00 CET (14:00 UTC).

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unittests v2 0/4] Add run_tests.sh script and update unittests.cfg

2013-04-15 Thread Kevin Wolf
Am 14.04.2013 um 01:26 hat Cole Robinson geschrieben:
 On 04/12/2013 07:27 AM, Kevin Wolf wrote:
  This adds a small script that allows to conveniently run all test cases and
  that reports back one PASS or FAIL line for each test case; it also 
  creates
  a test.log file with the full output.
  
  It parses the unittests.cfg file used by autotest, so I'm also making some
  updates to this file and hopefully make some improvements that autotest can
  make use of as well.
  
  I checked with Lucas that adding new keys isn't a problem. One thing that is
  likely to cause failures for autotext, though, is that I'm including 
  i386-only
  test cases in unittests.cfg. It should probably parse the 'arch' key in the
  future and skip them (and possibly introduce an i386 run besides the x86_64 
  one
  so that they actually get tested).
  
 
 This has some overlap with a series I sent a few weeks back, enhancing x86-run
 to read unittests.cfg, and rewriting it in python for native cfg parsing and,
 cli handling, etc:
 
 http://www.spinics.net/lists/kvm/msg88030.html
 http://www.spinics.net/lists/kvm/msg88033.html
 
 Thread went silent though...

I don't really mind which patch is applied as long as I have an easy way
to run my tests (with my scripts, ./run_tests.sh -g tasks) and get
simple results (one line per testcase, indicating PASS or FAIL) as long
as I'm not debugging.

The problem that I see with your patch (5/5 is the interesting one,
right?) is that you mix up the kernel file name and the test name; I
don't think you can use it with something like the vmexit tests for
which several unittests.cfg sections with different options exist. And
of course, it's missing some features in comparison.

Is there some feature your Python code has that I don't have yet? I can
try to implement it then.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 0/4] Have x86-run parse unittests.cfg

2013-04-15 Thread Kevin Wolf
Am 14.04.2013 um 20:18 hat Cole Robinson geschrieben:
 First two patches are trivial bits. Rest rewrites x86-run in python,
 which then makes it easy to parse unittests.cfg. This makes it
 simpler to invoke individual unittests the same way autotest does.
 
 Kevin has a similar series[1], but I'm reposting this for completeness.
 
 [1] http://www.spinics.net/lists/kvm/msg89471.html

As long as it doesn't take away functionality, it wouldn't even conflict
with my series. Unfortunately, it seems it does: The old x86-run
forwarded any additional options to the qemu binary, now there isn't any
way to specify options on the command line, but it always does its magic.

The other problem I mentioned in the other thread is that you assume
that the kernel file name and the configuration section are the same.
The test cases that exist in the configuration file with several
different argument strings are a strong hint that this assumption is
invalid.

Maybe it's really best to keep x86-run the thin wrapper that is today
and add a second script like my series does for a higher level tool.
(Though I must admit that this wasn't by design, my script was written
before x86-run existed and used to be standalone. I just called into
x86-run with this rebase to avoid duplicating things. But now it
actually seems to be the right choice, even if accidental.)

Also, Signed-off-by is missing throughout the series?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unittests v2] x86/taskswitch2: Task switches into/out of VM86

2013-04-15 Thread Kevin Wolf
Am 14.04.2013 um 14:42 hat Gleb Natapov geschrieben:
 On Fri, Apr 12, 2013 at 01:14:47PM +0200, Kevin Wolf wrote:
  This adds a test case that jumps into VM86 by iret-ing to a TSS and back
  to Protected Mode using a task gate in the IDT.
  
  Signed-off-by: Kevin Wolf kw...@redhat.com
 Applied, thanks. Found a bug with it and emulate_invalid_guest_state=1
 which is default. Are you running with emulate_invalid_guest_state=0?

Not knowingly at least, I didn't specify any module options. I
guess I just have enable_unrestricted_guest == true, which makes
guest_state_valid() return true immediately.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unittests v2] x86/taskswitch2: Task switches into/out of VM86

2013-04-15 Thread Kevin Wolf
Am 15.04.2013 um 17:38 hat Gleb Natapov geschrieben:
 On Mon, Apr 15, 2013 at 10:56:42AM +0200, Kevin Wolf wrote:
  Am 14.04.2013 um 14:42 hat Gleb Natapov geschrieben:
   On Fri, Apr 12, 2013 at 01:14:47PM +0200, Kevin Wolf wrote:
This adds a test case that jumps into VM86 by iret-ing to a TSS and back
to Protected Mode using a task gate in the IDT.

Signed-off-by: Kevin Wolf kw...@redhat.com
   Applied, thanks. Found a bug with it and emulate_invalid_guest_state=1
   which is default. Are you running with emulate_invalid_guest_state=0?
  
  Not knowingly at least, I didn't specify any module options. I
  guess I just have enable_unrestricted_guest == true, which makes
  guest_state_valid() return true immediately.
  
 Can you check in
 /sys/module/kvm_intel/parameters/emulate_invalid_guest_state  and
 /sys/module/kvm_intel/parameters/unrestricted_guest? You shouldn't have
 failed entry problem with enable_unrestricted_guest == true either.

$ cat /sys/module/kvm_intel/parameters/emulate_invalid_guest_state
Y
$ cat /sys/module/kvm_intel/parameters/unrestricted_guest
Y

Why do you think that I wouldn't have the failed entry check with
unrestricted_guest == true? The task switch is always done in software
and if it leads to an invalid segment descriptor in the VMCS, then a
failed VM entry looks quite expected to me.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unittests v2] x86/taskswitch2: Task switches into/out of VM86

2013-04-12 Thread Kevin Wolf
This adds a test case that jumps into VM86 by iret-ing to a TSS and back
to Protected Mode using a task gate in the IDT.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 lib/x86/desc.c| 41 --
 lib/x86/desc.h| 36 ++
 lib/x86/vm.c  |  4 ++--
 lib/x86/vm.h  |  1 +
 x86/taskswitch2.c | 67 ++-
 5 files changed, 104 insertions(+), 45 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index 770c250..7c5c721 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -27,41 +27,6 @@ typedef struct {
u8 base_high;
 } gdt_entry_t;
 
-typedef struct {
-   u16 prev;
-   u16 res1;
-   u32 esp0;
-   u16 ss0;
-   u16 res2;
-   u32 esp1;
-   u16 ss1;
-   u16 res3;
-   u32 esp2;
-   u16 ss2;
-   u16 res4;
-   u32 cr3;
-   u32 eip;
-   u32 eflags;
-   u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
-   u16 es;
-   u16 res5;
-   u16 cs;
-   u16 res6;
-   u16 ss;
-   u16 res7;
-   u16 ds;
-   u16 res8;
-   u16 fs;
-   u16 res9;
-   u16 gs;
-   u16 res10;
-   u16 ldt;
-   u16 res11;
-   u16 t:1;
-   u16 res12:15;
-   u16 iomap_base;
-} tss32_t;
-
 extern idt_entry_t boot_idt[256];
 
 void set_idt_entry(int vec, void *addr, int dpl)
@@ -268,9 +233,11 @@ unsigned exception_error_code(void)
  * 0x18 - Not presend code segment
  * 0x20 - Primery task
  * 0x28 - Interrupt task
+ *
+ * 0x30 to 0x48 - Free to use for test cases
  */
 
-static gdt_entry_t gdt[6];
+static gdt_entry_t gdt[10];
 #define TSS_GDT_OFFSET 4
 
 void set_gdt_entry(int num, u32 base,  u32 limit, u8 access, u8 gran)
@@ -327,7 +294,7 @@ void setup_gdt(void)
  .Lflush2: ::r(0x10));
 }
 
-static void set_idt_task_gate(int vec, u16 sel)
+void set_idt_task_gate(int vec, u16 sel)
 {
 idt_entry_t *e = boot_idt[vec];
 
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index 0b4897c..f819452 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -24,6 +24,41 @@ struct ex_regs {
 unsigned long rflags;
 };
 
+typedef struct {
+   u16 prev;
+   u16 res1;
+   u32 esp0;
+   u16 ss0;
+   u16 res2;
+   u32 esp1;
+   u16 ss1;
+   u16 res3;
+   u32 esp2;
+   u16 ss2;
+   u16 res4;
+   u32 cr3;
+   u32 eip;
+   u32 eflags;
+   u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
+   u16 es;
+   u16 res5;
+   u16 cs;
+   u16 res6;
+   u16 ss;
+   u16 res7;
+   u16 ds;
+   u16 res8;
+   u16 fs;
+   u16 res9;
+   u16 gs;
+   u16 res10;
+   u16 ldt;
+   u16 res11;
+   u16 t:1;
+   u16 res12:15;
+   u16 iomap_base;
+} tss32_t;
+
 #define ASM_TRY(catch)  \
 movl $0, %%gs:4 \n\t  \
 .pushsection .data.ex \n\t\
@@ -44,6 +79,7 @@ unsigned exception_error_code(void);
 void set_idt_entry(int vec, void *addr, int dpl);
 void set_idt_sel(int vec, u16 sel);
 void set_gdt_entry(int num, u32 base,  u32 limit, u8 access, u8 gran);
+void set_idt_task_gate(int vec, u16 sel);
 void set_intr_task_gate(int e, void *fn);
 void print_current_tss_info(void);
 void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
diff --git a/lib/x86/vm.c b/lib/x86/vm.c
index 2852c6c..260ec45 100644
--- a/lib/x86/vm.c
+++ b/lib/x86/vm.c
@@ -109,14 +109,14 @@ void install_large_page(unsigned long *cr3,
   unsigned long phys,
   void *virt)
 {
-install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_PSE, 0);
+install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER | 
PTE_PSE, 0);
 }
 
 void install_page(unsigned long *cr3,
   unsigned long phys,
   void *virt)
 {
-install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE, 0);
+install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER, 0);
 }
 
 
diff --git a/lib/x86/vm.h b/lib/x86/vm.h
index 3473f8d..0b5b5c7 100644
--- a/lib/x86/vm.h
+++ b/lib/x86/vm.h
@@ -13,6 +13,7 @@
 #define PTE_PRESENT (1ull  0)
 #define PTE_PSE (1ull  7)
 #define PTE_WRITE   (1ull  1)
+#define PTE_USER(1ull  2)
 #define PTE_ADDR(0xff000ull)
 
 void setup_vm();
diff --git a/x86/taskswitch2.c b/x86/taskswitch2.c
index 683834e..6573696 100644
--- a/x86/taskswitch2.c
+++ b/x86/taskswitch2.c
@@ -5,6 +5,10 @@
 #include processor.h
 #include vm.h
 
+#define FREE_GDT_INDEX 6
+#define MAIN_TSS_INDEX (FREE_GDT_INDEX + 0)
+#define VM86_TSS_INDEX (FREE_GDT_INDEX + 1)
+
 #define xstr(s) str(s)
 #define str(s) #s
 
@@ -113,15 +117,10 @@ start:
goto start;
 }
 
-int main()
+void test_kernel_mode_int()
 {
unsigned int res;
 
-   setup_vm();
-   setup_idt();
-   setup_gdt();
-   setup_tss32();
-
/* test that int $2 triggers task gate

[PATCH kvm-unittests v2 0/4] Add run_tests.sh script and update unittests.cfg

2013-04-12 Thread Kevin Wolf
This adds a small script that allows to conveniently run all test cases and
that reports back one PASS or FAIL line for each test case; it also creates
a test.log file with the full output.

It parses the unittests.cfg file used by autotest, so I'm also making some
updates to this file and hopefully make some improvements that autotest can
make use of as well.

I checked with Lucas that adding new keys isn't a problem. One thing that is
likely to cause failures for autotext, though, is that I'm including i386-only
test cases in unittests.cfg. It should probably parse the 'arch' key in the
future and skip them (and possibly introduce an i386 run besides the x86_64 one
so that they actually get tested).

Kevin Wolf (4):
  Add run_tests.sh
  x86/unittests.cfg: Add arch for x86_64-only tests
  x86/unittests.cfg: Add missing test cases
  x86/unittests.cfg: Create test case groups

 run_tests.sh  | 123 ++
 x86-run   |   9 +++-
 x86/unittests.cfg |  49 +-
 3 files changed, 178 insertions(+), 3 deletions(-)
 create mode 100755 run_tests.sh

-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unittests v2 1/4] Add run_tests.sh

2013-04-12 Thread Kevin Wolf
This adds a convenient way to run all tests without having to set up
Autotest.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 run_tests.sh  | 123 ++
 x86-run   |   9 +++-
 x86/unittests.cfg |   2 +
 3 files changed, 132 insertions(+), 2 deletions(-)
 create mode 100755 run_tests.sh

diff --git a/run_tests.sh b/run_tests.sh
new file mode 100755
index 000..55ecac5
--- /dev/null
+++ b/run_tests.sh
@@ -0,0 +1,123 @@
+#!/bin/bash
+
+testroot=x86
+config=$testroot/unittests.cfg
+qemu=${qemu:-qemu-system-x86_64}
+verbose=0
+
+function run()
+{
+local testname=$1
+local groups=$2
+local smp=$3
+local kernel=$4
+local opts=$5
+local arch=$6
+
+if [ -z $testname ]; then
+return
+fi
+
+if [ -n $only_group ]  ! grep -q $only_group $groups; then
+return
+fi
+
+if [ -n $arch ]  [ $arch != $ARCH ]; then
+echo skip $1 ($arch only)
+return
+fi
+
+   cmdline=./x86-run $kernel -smp $smp -display none $opts
+if [ $verbose != 0 ]; then
+echo $cmdline
+fi
+
+# extra_params in the config file may contain backticks that need to be
+# expanded, so use eval to start qemu
+eval $cmdline  test.log
+
+if [ $? -le 1 ]; then
+echo -e \e[32mPASS\e[0m $1
+else
+echo -e \e[31mFAIL\e[0m $1
+fi
+}
+
+function run_all()
+{
+local config=$1
+local testname
+local smp
+local kernel
+local opts
+local groups
+local arch
+
+exec {config_fd}$config
+
+while read -u $config_fd line; do
+if [[ $line =~ ^\[(.*)\]$ ]]; then
+run $testname $groups $smp $kernel $opts $arch
+testname=${BASH_REMATCH[1]}
+smp=1
+kernel=
+opts=
+groups=
+arch=
+elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then
+kernel=$testroot/${BASH_REMATCH[1]}
+elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
+smp=${BASH_REMATCH[1]}
+elif [[ $line =~ ^extra_params\ *=\ *(.*)$ ]]; then
+opts=${BASH_REMATCH[1]}
+elif [[ $line =~ ^groups\ *=\ *(.*)$ ]]; then
+groups=${BASH_REMATCH[1]}
+elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then
+arch=${BASH_REMATCH[1]}
+fi
+done
+
+run $testname $groups $smp $kernel $opts $arch
+
+exec {config_fd}-
+}
+
+function usage()
+{
+cat EOF
+
+Usage: $0 [-g group] [-h] [-v]
+
+-g: Only execute tests in the given group
+-h: Output this help text
+-v: Enables verbose mode
+
+Set the environment variable QEMU=/path/to/qemu-system-x86_64 to allow the
+internally used x86-run to pick up the right qemu binary.
+
+EOF
+}
+
+# As it happens, config.mak is valid shell script code, too :-)
+source config.mak
+
+echo  test.log
+while getopts g:hv opt; do
+case $opt in
+g)
+only_group=$OPTARG
+;;
+h)
+usage
+exit
+;;
+v)
+verbose=1
+;;
+*)
+exit
+;;
+esac
+done
+
+run_all $config
diff --git a/x86-run b/x86-run
index e395a70..19e5dec 100755
--- a/x86-run
+++ b/x86-run
@@ -13,7 +13,7 @@ else
qemu=${qemusystem}
else
echo QEMU binary ${QEMU} has no support for test device. 
Exiting.
-   exit 1
+   exit 2
fi
 fi
 
@@ -24,4 +24,9 @@ then
 else
command=${qemu} -device testdev,chardev=testlog -chardev 
file,id=testlog,path=msr.out -serial stdio -kernel
 fi
-exec ${command} $@
+
+echo ${command} $@
+${command} $@
+ret=$?
+echo Return value from qemu: $ret
+exit $ret
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 5e08c55..7d0fa73 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -3,6 +3,8 @@
 # file = foo.flat # Name of the flat file to be used
 # smp = 2 # Number of processors the VM will use during this test
 # extra_params = -cpu qemu64,+x2apic # Additional parameters used
+# arch = i386/x86_64 # Only if the test case works only on one of them
+# groups = group1 group2 # Used to identify test cases with run_tests -g ...
 
 [apic]
 file = apic.flat
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unittests v2 2/4] x86/unittests.cfg: Add arch for x86_64-only tests

2013-04-12 Thread Kevin Wolf
Their kernel binaries would be missing when the tests are built for
i386.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 x86/unittests.cfg | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 7d0fa73..f2336bb 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -10,6 +10,7 @@
 file = apic.flat
 smp = 2
 extra_params = -cpu qemu64,+x2apic
+arch = x86_64
 
 [smptest]
 file = smptest.flat
@@ -55,15 +56,18 @@ extra_params = -append 'ple_round_robin'
 
 [access]
 file = access.flat
+arch = x86_64
 
 [emulator]
 file = emulator.flat
+arch = x86_64
 
 [hypercall]
 file = hypercall.flat
 
 [idt_test]
 file = idt_test.flat
+arch = x86_64
 
 [msr]
 file = msr.flat
@@ -82,19 +86,23 @@ file = tsc.flat
 
 [xsave]
 file = xsave.flat
+arch = x86_64
 
 [rmap_chain]
 file = rmap_chain.flat
+arch = x86_64
 
 [svm]
 file = svm.flat
 smp = 2
 extra_params = -cpu qemu64,+svm
+arch = x86_64
 
 [svm-disabled]
 file = svm.flat
 smp = 2
 extra_params = -cpu qemu64,-svm
+arch = x86_64
 
 [kvmclock_test]
 file = kvmclock_test.flat
@@ -103,4 +111,5 @@ extra_params = --append 1000 `date +%s`
 
 [pcid]
 file = pcid.flat
-extra_params = -cpu qemu64,+pcid
\ No newline at end of file
+extra_params = -cpu qemu64,+pcid
+arch = x86_64
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unittests v2 3/4] x86/unittests.cfg: Add missing test cases

2013-04-12 Thread Kevin Wolf
Some test cases seem to have been added without updating the
configuration file. This adds them, and leaves cases commented out that
don't seem to complete.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 x86/unittests.cfg | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index f2336bb..11e8077 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -58,10 +58,16 @@ extra_params = -append 'ple_round_robin'
 file = access.flat
 arch = x86_64
 
+#[asyncpf]
+#file = asyncpf.flat
+
 [emulator]
 file = emulator.flat
 arch = x86_64
 
+[eventinj]
+file = eventinj.flat
+
 [hypercall]
 file = hypercall.flat
 
@@ -69,21 +75,33 @@ file = hypercall.flat
 file = idt_test.flat
 arch = x86_64
 
+#[init]
+#file = init.flat
+
 [msr]
 file = msr.flat
 
+[pmu]
+file = pmu.flat
+
 [port80]
 file = port80.flat
 
 [realmode]
 file = realmode.flat
 
+[s3]
+file = s3.flat
+
 [sieve]
 file = sieve.flat
 
 [tsc]
 file = tsc.flat
 
+[tsc_adjust]
+file = tsc_adjust.flat
+
 [xsave]
 file = xsave.flat
 arch = x86_64
@@ -104,6 +122,14 @@ smp = 2
 extra_params = -cpu qemu64,-svm
 arch = x86_64
 
+[taskswitch]
+file = taskswitch.flat
+arch = i386
+
+[taskswitch2]
+file = taskswitch2.flat
+arch = i386
+
 [kvmclock_test]
 file = kvmclock_test.flat
 smp = 2
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unittests v2 4/4] x86/unittests.cfg: Create test case groups

2013-04-12 Thread Kevin Wolf
Put all vmexit test cases and all task switch test cases into a group,
so that you can use something like ./run_tests -g tasks

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 x86/unittests.cfg | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 11e8077..bc9643e 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -23,36 +23,44 @@ smp = 3
 [vmexit_cpuid]
 file = vmexit.flat
 extra_params = -append 'cpuid'
+groups = vmexit
 
 [vmexit_vmcall]
 file = vmexit.flat
 extra_params = -append 'vmcall'
+groups = vmexit
 
 [vmexit_mov_from_cr8]
 file = vmexit.flat
 extra_params = -append 'mov_from_cr8'
+groups = vmexit
 
 [vmexit_mov_to_cr8]
 file = vmexit.flat
 extra_params = -append 'mov_to_cr8'
+groups = vmexit
 
 [vmexit_inl_pmtimer]
 file = vmexit.flat
 extra_params = -append 'inl_from_pmtimer'
+groups = vmexit
 
 [vmexit_ipi]
 file = vmexit.flat
 smp = 2
 extra_params = -append 'ipi'
+groups = vmexit
 
 [vmexit_ipi_halt]
 file = vmexit.flat
 smp = 2
 extra_params = -append 'ipi_halt'
+groups = vmexit
 
 [vmexit_ple_round_robin]
 file = vmexit.flat
 extra_params = -append 'ple_round_robin'
+groups = vmexit
 
 [access]
 file = access.flat
@@ -125,10 +133,12 @@ arch = x86_64
 [taskswitch]
 file = taskswitch.flat
 arch = i386
+groups = tasks
 
 [taskswitch2]
 file = taskswitch2.flat
 arch = i386
+groups = tasks
 
 [kvmclock_test]
 file = kvmclock_test.flat
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86 emulator: Fix segment loading in VM86

2013-04-11 Thread Kevin Wolf
This fixes a regression introduced in commit 03ebebeb1 (KVM: x86
emulator: Leave segment limit and attributs alone in real mode).

The mentioned commit changed the segment descriptors for both real mode
and VM86 to only update the segment base instead of creating a
completely new descriptor with limit 0x so that unreal mode keeps
working across a segment register reload.

This leads to an invalid segment descriptor in the eyes of VMX, which
seems to be okay for real mode because KVM will fix it up before the
next VM entry or emulate the state, but it doesn't do this if the guest
is in VM86, so we end up with:

  KVM: entry failed, hardware error 0x8021

Fix this by effectively reverting commit 03ebebeb1 for VM86 and leaving
it only in place for real mode, which is where it's really needed.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/kvm/emulate.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a335cc6..069d799 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1578,12 +1578,21 @@ static int load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
 
memset(seg_desc, 0, sizeof seg_desc);
 
-   if ((seg = VCPU_SREG_GS  ctxt-mode == X86EMUL_MODE_VM86)
-   || ctxt-mode == X86EMUL_MODE_REAL) {
-   /* set real mode segment descriptor */
+   if (ctxt-mode == X86EMUL_MODE_REAL) {
+   /* set real mode segment descriptor (keep limit etc. for
+* unreal mode) */
ctxt-ops-get_segment(ctxt, dummy, seg_desc, NULL, seg);
set_desc_base(seg_desc, selector  4);
goto load;
+   } else if (seg = VCPU_SREG_GS  ctxt-mode == X86EMUL_MODE_VM86) {
+   /* VM86 needs a clean new segment descriptor */
+   set_desc_base(seg_desc, selector  4);
+   set_desc_limit(seg_desc, 0x);
+   seg_desc.type = 3;
+   seg_desc.p = 1;
+   seg_desc.s = 1;
+   seg_desc.dpl = 3;
+   goto load;
}
 
rpl = selector  3;
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for 2012-12-11

2012-12-11 Thread Kevin Wolf
Am 11.12.2012 15:45, schrieb Juan Quintela:
 Anthony Liguori aligu...@us.ibm.com wrote:
 Kevin Wolf kw...@redhat.com writes:

 Am 10.12.2012 14:59, schrieb Juan Quintela:

 Hi

 Please send in any agenda topics you are interested in.

 Can probably be answered on the list, but what is the status of
 libqos?

 Still on my TODO list.
 
 Kevin, Anthony, this is the only topic, should we have the call or
 cancel it?

No need to have a call. Though I had hoped for a not so terse answer...

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-25 Thread Kevin Wolf
Am 19.09.2012 07:49, schrieb Peter Lieven:
 On 09/18/12 12:31, Kevin Wolf wrote:
 Am 18.09.2012 12:28, schrieb Peter Lieven:
 On 09/17/12 22:12, Peter Lieven wrote:
 On 09/17/12 10:41, Kevin Wolf wrote:
 Am 16.09.2012 12:13, schrieb Peter Lieven:
 Hi,

 when trying to block migrate a VM from one node to another, the source
 VM crashed with the following assertion:
 block.c:3829: bdrv_set_in_use: Assertion `bs-in_use != in_use' failed.

 Is this sth already addresses/known?
 Not that I'm aware of, at least.

 Block migration doesn't seem to check whether the device is already in
 use, maybe this is the problem. Not sure why it would be in use, though,
 and in my quick test it didn't crash.

 So we need some more information: What's you command line, did you do
 anything specific in the monitor with block devices, what does the
 stacktrace look like, etc.?
 kevin, it seems that i can very easily force a crash if I cancel a
 running block migration.
 if I understand correctly what happens there are aio callbacks coming in
 after
 blk_mig_cleanup() has been called.

 what is the proper way to detect this in blk_mig_read_cb()?
 You could try this, it doesn't detect the situation in
 blk_mig_read_cb(), but ensures that all callbacks happen before we do
 the actual cleanup (completely untested):
 after testing it for half an hour i can say, it seems to fix the problem.
 no segfaults and also no other assertions.
 
 while searching I have seen that the queses blk_list and bmds_list are 
 initialized at
 qemu startup. wouldn't it be better to initialize them at init_blk_migration
 or at least check that they are really empty? i have also seen that 
 prev_time_offset
 is not initialized.

Probably. If you sent this as a proper patch with a SoB, I wouldn't
reject it, but considering that block migration is deprecated anyway, I
won't bother myself as long as there's no real bug.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for September 25th

2012-09-25 Thread Kevin Wolf
Am 25.09.2012 14:57, schrieb Anthony Liguori:
 Paolo Bonzini pbonz...@redhat.com writes:
 
 Il 24/09/2012 13:28, Juan Quintela ha scritto:
 Hi

 Please send in any agenda items you are interested in covering.

 URI parsing library for glusterfs: libxml2 vs. in-tree fork of the
 same code.
 
 The call is a bit late for Bharata but I think copying is the way to go.
 
 Something I've been thinking about since this discussion started
 though.  Maybe we could standardize on using URIs as short-hand syntax
 for backends.

Compared with QemuOpts, it's not really short-hand or even convenient
for manual use. For management tools it might be nice because URIs have
a well-known syntax, can escape anything and implementations exist. But
I think we must still maintain an easy to use syntax for human users.

 For example:
 
 qemu -hda file:///foo.img
 
 Or:
 
 qemu -device virtio-net-pci,netdev=tap:///vnet0?script=/etc/qemu-ifup
 
 Or:
 
 qemu -device \
   isa-serial,index=0,chr=tcp://localhost:1025/?server=onwait=off

Your examples kind of prove this: They aren't much shorter than what
exists today, but they contain ? and , which are nasty characters on
the command line.

 This works particularly well with a treat unknown options as -device
 mechanism so that we could do:
 
 qemu -isa-serial chr=tcp://localhost:1025/?server=onwait=off
 
 We could even introduce a secondary implied option to shorten this
 further to:
 
 qemu -isa-serial tcp://localhost:1025/?server=onwait=off

This is something that I was thinking of in the context of -blockdev a
while ago (without URLs): Define the block device inside of -device
specifications. The problem of nesting an option string inside another
one is solved in theory by URLs because they allow (nested) escaping,
but in practice we'll need to use some kind of brackets instead if we
want it to be usable.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-18 Thread Kevin Wolf
Am 18.09.2012 12:28, schrieb Peter Lieven:
 On 09/17/12 22:12, Peter Lieven wrote:
 On 09/17/12 10:41, Kevin Wolf wrote:
 Am 16.09.2012 12:13, schrieb Peter Lieven:
 Hi,

 when trying to block migrate a VM from one node to another, the source
 VM crashed with the following assertion:
 block.c:3829: bdrv_set_in_use: Assertion `bs-in_use != in_use' failed.

 Is this sth already addresses/known?
 Not that I'm aware of, at least.

 Block migration doesn't seem to check whether the device is already in
 use, maybe this is the problem. Not sure why it would be in use, though,
 and in my quick test it didn't crash.

 So we need some more information: What's you command line, did you do
 anything specific in the monitor with block devices, what does the
 stacktrace look like, etc.?
 kevin, it seems that i can very easily force a crash if I cancel a 
 running block migration.
 if I understand correctly what happens there are aio callbacks coming in 
 after
 blk_mig_cleanup() has been called.
 
 what is the proper way to detect this in blk_mig_read_cb()?

You could try this, it doesn't detect the situation in
blk_mig_read_cb(), but ensures that all callbacks happen before we do
the actual cleanup (completely untested):

diff --git a/block-migration.c b/block-migration.c
index 7def8ab..ed93301 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -519,6 +519,8 @@ static void blk_mig_cleanup(void)
 BlkMigDevState *bmds;
 BlkMigBlock *blk;

+bdrv_drain_all();
+
 set_dirty_tracking(0);

 while ((bmds = QSIMPLEQ_FIRST(block_mig_state.bmds_list)) != NULL) {
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Block Migration Assertion in qemu-kvm 1.2.0

2012-09-17 Thread Kevin Wolf
Am 16.09.2012 12:13, schrieb Peter Lieven:
 Hi,
 
 when trying to block migrate a VM from one node to another, the source 
 VM crashed with the following assertion:
 block.c:3829: bdrv_set_in_use: Assertion `bs-in_use != in_use' failed.
 
 Is this sth already addresses/known?

Not that I'm aware of, at least.

Block migration doesn't seem to check whether the device is already in
use, maybe this is the problem. Not sure why it would be in use, though,
and in my quick test it didn't crash.

So we need some more information: What's you command line, did you do
anything specific in the monitor with block devices, what does the
stacktrace look like, etc.?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [QEMU PATCH 0/2] virtio-blk: writeback cache enable improvements

2012-08-01 Thread Kevin Wolf
Am 01.08.2012 17:52, schrieb Paolo Bonzini:
 Il 23/07/2012 18:32, Paolo Bonzini ha scritto:
 Il 03/07/2012 15:20, Paolo Bonzini ha scritto:
 These patches let virtio-blk use the new support for toggling the cache
 mode between writethrough and writeback.

 The first patch introduces a new feature bit and configuration field to
 do this.  The second patch disables writeback caching for guests that do
 not negotiate VIRTIO_BLK_F_WCACHE (meaning that they cannot send flush
 requests), so that limited or older guests are now safe wrt power losses.
 VIRTIO_BLK_F_FLUSH has been introduced in Linux 2.6.32 (in 2009) and was
 backported to RHEL/CentOS 5.6 (in 2010).

 The Windows drivers (which work by emulating SCSI on top of virtio-blk)
 have bugs in this area, which I reported on the Red Hat Bugzilla as
 bugs 837321 and 837324.  With these patches they will suffer a
 performance hit but gain correctness.

 Paolo Bonzini (2):
   virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE
   virtio-blk: disable write cache if not negotiated
 Ping - Anthony, mst?
 
 Ping^2, so we can switch writethrough-writeback in 1.2.

Speak now or forever hold your peace. If there are no more comments
until Friday, I'll just apply it.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/5] scsi-disk: report resized disk via sense codes

2012-07-17 Thread Kevin Wolf
Am 16.07.2012 16:25, schrieb Paolo Bonzini:
 Linux will not use these, but a very similar mechanism will be used to
 report the condition via virtio-scsi events.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  hw/scsi-bus.c  |5 +
  hw/scsi-disk.c |   15 +++
  hw/scsi.h  |2 ++
  3 files changed, 18 insertions(+), 4 deletions(-)
 
 diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
 index 8b961f2..2547c50 100644
 --- a/hw/scsi-bus.c
 +++ b/hw/scsi-bus.c
 @@ -1161,6 +1161,11 @@ const struct SCSISense sense_code_LUN_FAILURE = {
  .key = ABORTED_COMMAND, .asc = 0x3e, .ascq = 0x01
  };
  
 +/* Unit attention, Capacity data has changed */
 +const struct SCSISense sense_code_CAPACITY_CHANGED = {
 +.key = UNIT_ATTENTION, .asc = 0x2a, .ascq = 0x09
 +};
 +
  /* Unit attention, Power on, reset or bus device reset occurred */
  const struct SCSISense sense_code_RESET = {
  .key = UNIT_ATTENTION, .asc = 0x29, .ascq = 0x00
 diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
 index 42bae3b..0905446 100644
 --- a/hw/scsi-disk.c
 +++ b/hw/scsi-disk.c
 @@ -1863,6 +1863,13 @@ static void scsi_destroy(SCSIDevice *dev)
  blockdev_mark_auto_del(s-qdev.conf.bs);
  }
  
 +static void scsi_disk_resize_cb(void *opaque)
 +{
 +SCSIDiskState *s = opaque;
 +
 +scsi_device_set_ua(s-qdev, SENSE_CODE(CAPACITY_CHANGED));
 +}
 +
  static void scsi_cd_change_media_cb(void *opaque, bool load)
  {
  SCSIDiskState *s = opaque;
 @@ -1904,11 +1911,13 @@ static bool scsi_cd_is_medium_locked(void *opaque)
  return ((SCSIDiskState *)opaque)-tray_locked;
  }
  
 -static const BlockDevOps scsi_cd_block_ops = {
 +static const BlockDevOps scsi_disk_block_ops = {
  .change_media_cb = scsi_cd_change_media_cb,
  .eject_request_cb = scsi_cd_eject_request_cb,
  .is_tray_open = scsi_cd_is_tray_open,
  .is_medium_locked = scsi_cd_is_medium_locked,
 +
 +.resize_cb = scsi_disk_resize_cb,
  };
  
  static void scsi_disk_unit_attention_reported(SCSIDevice *dev)
 @@ -1956,9 +1965,7 @@ static int scsi_initfn(SCSIDevice *dev)
  return -1;
  }
  
 -if (s-features  (1  SCSI_DISK_F_REMOVABLE)) {
 -bdrv_set_dev_ops(s-qdev.conf.bs, scsi_cd_block_ops, s);
 -}
 +bdrv_set_dev_ops(s-qdev.conf.bs, scsi_disk_block_ops, s);

Are you aware of this code?

bool bdrv_dev_has_removable_media(BlockDriverState *bs)
{
return !bs-dev || (bs-dev_ops  bs-dev_ops-change_media_cb);
}

This means that now all SCSI disks have removable media from the
monitor's point of view.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] virtio-blk: disable write cache if not negotiated

2012-07-04 Thread Kevin Wolf
Am 03.07.2012 15:51, schrieb Paolo Bonzini:
 Il 03/07/2012 15:49, Kevin Wolf ha scritto:
 If the guest does not support flushes, we should run in writethrough mode.
 The setting is temporary until the next reset, so that for example the
 BIOS will run in writethrough mode while Linux will run with a writeback
 cache.

 VIRTIO_BLK_F_FLUSH has been introduced in Linux 2.6.32 (in 2009) and
 was backported to RHEL/CentOS 5.6 (in 2010).  The Windows drivers have
 two bugs, which I reported on the Red Hat Bugzilla as bugs 837321 and
 837324.  With these patches they will suffer a performance hit but
 gain correctness.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 I generally like the idea for a default, but doesn't this override even
 an explicit cache=writeback?
 
 Yes.  It doesn't override cache=unsafe though.

When the guest doesn't support flushes, cache=writeback is equivalent to
cache=unsafe, so if you want the old behaviour back you can switch to
cache=unsafe without additional risks.

We don't have a cache=directunsafe, though, so if you want to get the
old behaviour of cache=none back, you're out of luck. Not sure how
acceptable this is.

Irrespective of this concern I've come to the conclusion that I agree
and we actually must enforce this for non-unsafe mode, and not doing it
is a bug.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 2/2] virtio-blk: disable write cache if not negotiated

2012-07-04 Thread Kevin Wolf
Am 04.07.2012 14:21, schrieb Paolo Bonzini:
 Il 04/07/2012 12:16, Kevin Wolf ha scritto:
 Yes.  It doesn't override cache=unsafe though.
 When the guest doesn't support flushes, cache=writeback is equivalent to
 cache=unsafe, so if you want the old behaviour back you can switch to
 cache=unsafe without additional risks.

 We don't have a cache=directunsafe, though, so if you want to get the
 old behaviour of cache=none back, you're out of luck. Not sure how
 acceptable this is.
 
 If we want to fix this, let's take the occasion to split the parameters
 into cache=on/off (well, we have that already), flush=on/off, and a
 device-side wce=on/off.

You're volunteering? Great! ;-)

 Irrespective of this concern I've come to the conclusion that I agree
 and we actually must enforce this for non-unsafe mode, and not doing it
 is a bug.
 
 Thanks!  Is that an Acked-by/Reviewed-by? :)

Before merging the patches (or actually this patch, I think patch 1 is
fairly independent), I'd like to hear more opinions on whether we need
the cache parameter split first. But as far as the hardware is
concerned, sure, take it as an Acked-by and go forward with the spec and
kernel side of things.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for Tuesday, July 3rd

2012-07-03 Thread Kevin Wolf
Am 02.07.2012 19:33, schrieb Eric Blake:
 On 07/02/2012 04:16 AM, Juan Quintela wrote:

 Hi

 Please send in any agenda items you are interested in covering.
 
 Can we discuss the future of 'getfd', the possibility of 'pass-fd', or
 even the enhancement of all existing monitor commands to take an
 optional 'nfds' JSON argument for atomic management of fd passing?
 Which commands need to reopen a file with different access, and do we
 bite the bullet to special case all of those commands to allow fd
 passing or can we make qemu_open() coupled with high-level fd passing
 generic enough to satisfy all of our reopen needs?

Sure we can, at least if Corey will attend the call. Otherwise I guess
it's better to keep the discussion on the mailing list.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE

2012-07-03 Thread Kevin Wolf
Am 03.07.2012 15:20, schrieb Paolo Bonzini:
 Introduce a new feature bit and configuration field that provide
 support for toggling the cache mode between writethrough and writeback.
 
 Also rename VIRTIO_BLK_F_WCACHE to VIRTIO_BLK_F_WCE for consistency with
 the spec.

My spec (and my kernel as well) call it VIRTIO_BLK_F_FLUSH.

What's the status of the kernel and spec side of the change?

 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  hw/virtio-blk.c |   16 ++--
  hw/virtio-blk.h |4 +++-
  2 files changed, 17 insertions(+), 3 deletions(-)
 
 diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
 index fe07746..280f96d 100644
 --- a/hw/virtio-blk.c
 +++ b/hw/virtio-blk.c
 @@ -510,9 +510,19 @@ static void virtio_blk_update_config(VirtIODevice *vdev, 
 uint8_t *config)
  blkcfg.size_max = 0;
  blkcfg.physical_block_exp = get_physical_block_exp(s-conf);
  blkcfg.alignment_offset = 0;
 +blkcfg.wce = bdrv_enable_write_cache(s-bs);
  memcpy(config, blkcfg, sizeof(struct virtio_blk_config));
  }
  
 +static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
 +{
 +VirtIOBlock *s = to_virtio_blk(vdev);
 +struct virtio_blk_config blkcfg;
 +
 +memcpy(blkcfg, config, sizeof(blkcfg));
 +bdrv_set_enable_write_cache(s-bs, blkcfg.wce != 0);
 +}

We need to call bdrv_flush() here when turning WCE off. And it seems we
don't have a way to signal failure, or may we just leave the bit unchanged?

 @@ -49,6 +50,7 @@ struct virtio_blk_config
  uint8_t alignment_offset;
  uint16_t min_io_size;
  uint32_t opt_io_size;
 +uint8_t wce;
  } QEMU_PACKED;

If the spec isn't set in stone yet, we could make it a flags field
instead of using a whole byte for a single flag.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] virtio-blk: disable write cache if not negotiated

2012-07-03 Thread Kevin Wolf
Am 03.07.2012 15:20, schrieb Paolo Bonzini:
 If the guest does not support flushes, we should run in writethrough mode.
 The setting is temporary until the next reset, so that for example the
 BIOS will run in writethrough mode while Linux will run with a writeback
 cache.
 
 VIRTIO_BLK_F_FLUSH has been introduced in Linux 2.6.32 (in 2009) and
 was backported to RHEL/CentOS 5.6 (in 2010).  The Windows drivers have
 two bugs, which I reported on the Red Hat Bugzilla as bugs 837321 and
 837324.  With these patches they will suffer a performance hit but
 gain correctness.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

I generally like the idea for a default, but doesn't this override even
an explicit cache=writeback? Are we sure that we want this?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/21] KVM: x86 emulator: allow loading null SS in long mode

2012-06-18 Thread Kevin Wolf
Am 12.06.2012 19:34, schrieb Avi Kivity:
 Null SS is valid in long mode; allow loading it.
 
 Signed-off-by: Avi Kivity a...@redhat.com

The documentation suggests that trying to load it in CPL 3 should still
fail in long mode.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Biweekly KVM Test report, kernel 51bfd299... qemu a1fce560...

2012-06-13 Thread Kevin Wolf
Am 13.06.2012 12:28, schrieb Ren, Yongjie:
 -Original Message-
 From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
 Sent: Wednesday, June 13, 2012 7:25 AM
 To: Kevin Wolf
 Cc: Stefan Hajnoczi; Ren, Yongjie; Avi Kivity; kvm@vger.kernel.org; Liu,
 RongrongX; Anthony Liguori
 Subject: Re: Biweekly KVM Test report, kernel 51bfd299... qemu
 a1fce560...

 On Tue, Jun 12, 2012 at 09:45:16AM +0200, Kevin Wolf wrote:
 Am 12.06.2012 03:52, schrieb Marcelo Tosatti:
 On Thu, Jun 07, 2012 at 01:13:50PM +0100, Stefan Hajnoczi wrote:
 The 1st bad commit in your attached list is abc551bd
 More detailed info:
 171d2f2249a360d7d623130d3aa991418c53716d   good
 fd453a24166e36a3d376c9bc221e520e3ee425afgood
 abc551bd456cf0407fa798395d83dc5aa35f6dbb bad
 823ccf41509baa197dd6a3bef63837a6cf101ad8 bad

 Thanks, this points to the qcow2 v3 changes. Let's try to find the
 exact
 culprit. I have rebased the qcow2 patches on top of that good
 merge
 (fd453a24). Please apply the attached mbox on top of this merge:

 git checkout fd453a24
 git am qcow2v3.mbox

 You can then start a git bisect between your new HEAD and
 fd453a24.

 Another thing you could try is if reverting e82dabd and bef0fd59
 fixes
 the problem for you.

 After reverting bef0fd59, it can work fine.

 I'm unable to reproduce this with qemu-kvm.git/master
 (18b012756fd041556764dfa2bb1f307b6923fb71) or the commit you
 3fd9fedb9fae4517d93d76e93a2924559cacf2f6 reported as bad.  The
 RHEL 6
 guest boots without any issues.  My host kernel is Linux 3.2 (Debian
 3.2.0-2-amd64).  The guest is Linux 2.6.32-71.el6.

 Since the guest sees a disk read error, it may be useful to add
 printfs to hw/ide/core.c:ide_sector_read_cb().  Let's find out why
 the
 disk read is failing.

 Any news on this problem?

 We're still waiting for the test results with Jan's timer fix.

 OK, Ren, can you please retest with qemu-kvm.git master? TIA.

 Yes, I re-tested with latest qemu-kvm.git master branch (commit: 0a948cbb).
 I found the qcow2 image 'disk error' issue can't be reproduced in latest tree.
 And, I can still reproduce it against commit e54f008e (May 9) as I reported 
 for this bug.
 Any recent patch has been checked in to fix this issue?
 Maybe, Jan's kvm: i8254: Fix conversion of in-kernel to userspace state?

Yes, this is what we expected to fix the problem. If you like to verify,
you can try applying only this patch on top of the broken commit e54f008e.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Biweekly KVM Test report, kernel 51bfd299... qemu a1fce560...

2012-06-12 Thread Kevin Wolf
Am 12.06.2012 03:52, schrieb Marcelo Tosatti:
 On Thu, Jun 07, 2012 at 01:13:50PM +0100, Stefan Hajnoczi wrote:
 The 1st bad commit in your attached list is abc551bd
 More detailed info:
 171d2f2249a360d7d623130d3aa991418c53716d   good
 fd453a24166e36a3d376c9bc221e520e3ee425afgood
 abc551bd456cf0407fa798395d83dc5aa35f6dbb bad
 823ccf41509baa197dd6a3bef63837a6cf101ad8 bad

 Thanks, this points to the qcow2 v3 changes. Let's try to find the exact
 culprit. I have rebased the qcow2 patches on top of that good merge
 (fd453a24). Please apply the attached mbox on top of this merge:

 git checkout fd453a24
 git am qcow2v3.mbox

 You can then start a git bisect between your new HEAD and fd453a24.

 Another thing you could try is if reverting e82dabd and bef0fd59 fixes
 the problem for you.

 After reverting bef0fd59, it can work fine.

 I'm unable to reproduce this with qemu-kvm.git/master
 (18b012756fd041556764dfa2bb1f307b6923fb71) or the commit you
 3fd9fedb9fae4517d93d76e93a2924559cacf2f6 reported as bad.  The RHEL 6
 guest boots without any issues.  My host kernel is Linux 3.2 (Debian
 3.2.0-2-amd64).  The guest is Linux 2.6.32-71.el6.

 Since the guest sees a disk read error, it may be useful to add
 printfs to hw/ide/core.c:ide_sector_read_cb().  Let's find out why the
 disk read is failing.
 
 Any news on this problem? 

We're still waiting for the test results with Jan's timer fix.

But, Marcelo, you mentioned that you can reproduce it, too. Maybe you
can give it a try?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Swap virtio images

2012-06-06 Thread Kevin Wolf
Am 06.06.2012 15:35, schrieb Iain J. Watson:
 Hello,
 
 I am trying to find a way to use backing files to make snapshots of
 running VMs. The idea is I pause the VM, use qemu-img to create a new
 disk file with the currently used file as a backing file, swap the
 images then unpause the VM.
 
 Commands would be something like:
 
 --
   (qemu) stop
   (qemu) drive_del virtio0
 
   $ qemu-img create -f qcow2 -b os-disk-0.qcow2 os-disk-1.qcow2
 
   (qemu) drive_add 0 file=os-disk-1.qcow2   # I'm not 100% sure on the
 drive_add syntax
   (qemu) c
 --
 
 If I try and do this just now I get the error Can't hot-add drive to
 type 7 when running the drive_add command.
 
 Is there another way to achieve this? Is this something that could
 potentially be added or would architecture not support what I'm trying
 to do?

You're looking for the 'snapshot_blkdev' monitor command:

(qemu) snapshot_blkdev virtio0 os-disk1.qcow2 qcow2

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Biweekly KVM Test report, kernel 51bfd299... qemu a1fce560...

2012-06-04 Thread Kevin Wolf
Am 01.06.2012 10:31, schrieb Kevin Wolf:
 Am 01.06.2012 09:57, schrieb Ren, Yongjie:
 -Original Message-
 From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
 Sent: Thursday, May 31, 2012 4:28 AM
 To: Ren, Yongjie
 Cc: Kevin Wolf; Avi Kivity; kvm@vger.kernel.org; Liu, RongrongX
 Subject: Re: Biweekly KVM Test report, kernel 51bfd299... qemu
 a1fce560...

 On Tue, May 22, 2012 at 07:40:29AM +, Ren, Yongjie wrote:
 -Original Message-
 From: Kevin Wolf [mailto:kw...@redhat.com]
 Sent: Monday, May 21, 2012 11:30 PM
 To: Ren, Yongjie
 Cc: Avi Kivity; kvm@vger.kernel.org; Liu, RongrongX
 Subject: Re: Biweekly KVM Test report, kernel 51bfd299... qemu
 a1fce560...

 Am 21.05.2012 11:45, schrieb Ren, Yongjie:
 -Original Message-
 From: Kevin Wolf [mailto:kw...@redhat.com]
 Sent: Monday, May 21, 2012 5:05 PM
 To: Avi Kivity
 Cc: Ren, Yongjie; kvm@vger.kernel.org
 Subject: Re: Biweekly KVM Test report, kernel 51bfd299... qemu
 a1fce560...

 Am 21.05.2012 10:27, schrieb Avi Kivity:
 On 05/21/2012 06:34 AM, Ren, Yongjie wrote:
 Hi All,

 This is KVM upstream test result against kvm.git
 51bfd2998113e1f8ce8dcf853407b76a04b5f2a0 based on kernel
 3.4.0-rc7,
 and qemu-kvm.git a1fce560c0e5f287ed65d2aaadb3e59578aaa983.

 We found 1 new bug and 1 bug got fixed in the past two weeks.

 New issue (1):
 1. disk error when guest boot up via qcow2 image
   https://bugs.launchpad.net/qemu/+bug/1002121
   -- Should be a regression on qemu-kvm.


 Kevin, is this the known regression in qcow2 or something new?

 If the commit ID is right, it must be something new. The regression
 that
 Marcelo found was fixed in 54e68143.

 Yes, it's right. This should be a new regression.
 I looked at the comment of 54e68143, and found it was not related
 the
 issue I reported.

 The Launchpad bug refers to commit e54f008ef, which doesn't
 include
 this
 fix indeed. So was the test repeated with a more current qemu-kvm
 version after filing the bug in Launchpad, or is the commit ID in this
 mail wrong?

 Latest commit 3fd9fedb in qemu-kvm master tree still has this issue.
 And, the commit ID provided in Launchpad is correct.

 Can you please check if the bug exists in upstream qemu.git as well?

 This bug doesn't exist on upstream qemu.git with latest commit:
 fd4567d9.
 So, it should only exists on qemu-kvm tree.

 Please bisect manually (not using git bisect), with the attached list of
 commits. These are the qemu - qemu-kvm merge commits in the range
 described as bad/good.

 The 1st bad commit in your attached list is abc551bd
 More detailed info:
 171d2f2249a360d7d623130d3aa991418c53716d   good
 fd453a24166e36a3d376c9bc221e520e3ee425afgood
 abc551bd456cf0407fa798395d83dc5aa35f6dbb bad
 823ccf41509baa197dd6a3bef63837a6cf101ad8 bad
 
 Thanks, this points to the qcow2 v3 changes. Let's try to find the exact
 culprit. I have rebased the qcow2 patches on top of that good merge
 (fd453a24). Please apply the attached mbox on top of this merge:
 
 git checkout fd453a24
 git am qcow2v3.mbox
 
 You can then start a git bisect between your new HEAD and fd453a24.

Another thing you could try is if reverting e82dabd and bef0fd59 fixes
the problem for you.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Biweekly KVM Test report, kernel 51bfd299... qemu a1fce560...

2012-05-21 Thread Kevin Wolf
Am 21.05.2012 10:27, schrieb Avi Kivity:
 On 05/21/2012 06:34 AM, Ren, Yongjie wrote:
 Hi All,

 This is KVM upstream test result against kvm.git 
 51bfd2998113e1f8ce8dcf853407b76a04b5f2a0 based on kernel 3.4.0-rc7, and 
 qemu-kvm.git a1fce560c0e5f287ed65d2aaadb3e59578aaa983.

 We found 1 new bug and 1 bug got fixed in the past two weeks. 

 New issue (1):
 1. disk error when guest boot up via qcow2 image
   https://bugs.launchpad.net/qemu/+bug/1002121
   -- Should be a regression on qemu-kvm.

 
 Kevin, is this the known regression in qcow2 or something new?

If the commit ID is right, it must be something new. The regression that
Marcelo found was fixed in 54e68143.

The Launchpad bug refers to commit e54f008ef, which doesn't include this
fix indeed. So was the test repeated with a more current qemu-kvm
version after filing the bug in Launchpad, or is the commit ID in this
mail wrong?

If the bug does still exist in current master, it would be helpful to
bisect it.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Biweekly KVM Test report, kernel 51bfd299... qemu a1fce560...

2012-05-21 Thread Kevin Wolf
Am 21.05.2012 11:45, schrieb Ren, Yongjie:
 -Original Message-
 From: Kevin Wolf [mailto:kw...@redhat.com]
 Sent: Monday, May 21, 2012 5:05 PM
 To: Avi Kivity
 Cc: Ren, Yongjie; kvm@vger.kernel.org
 Subject: Re: Biweekly KVM Test report, kernel 51bfd299... qemu
 a1fce560...

 Am 21.05.2012 10:27, schrieb Avi Kivity:
 On 05/21/2012 06:34 AM, Ren, Yongjie wrote:
 Hi All,

 This is KVM upstream test result against kvm.git
 51bfd2998113e1f8ce8dcf853407b76a04b5f2a0 based on kernel 3.4.0-rc7,
 and qemu-kvm.git a1fce560c0e5f287ed65d2aaadb3e59578aaa983.

 We found 1 new bug and 1 bug got fixed in the past two weeks.

 New issue (1):
 1. disk error when guest boot up via qcow2 image
   https://bugs.launchpad.net/qemu/+bug/1002121
   -- Should be a regression on qemu-kvm.


 Kevin, is this the known regression in qcow2 or something new?

 If the commit ID is right, it must be something new. The regression that
 Marcelo found was fixed in 54e68143.

 Yes, it's right. This should be a new regression.
 I looked at the comment of 54e68143, and found it was not related the issue I 
 reported.
 
 The Launchpad bug refers to commit e54f008ef, which doesn't include this
 fix indeed. So was the test repeated with a more current qemu-kvm
 version after filing the bug in Launchpad, or is the commit ID in this
 mail wrong?

 Latest commit 3fd9fedb in qemu-kvm master tree still has this issue.
 And, the commit ID provided in Launchpad is correct.

Can you please check if the bug exists in upstream qemu.git as well?

In any case, it would be helpful to bisect the problem.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for June, Tuesday 15th

2012-05-15 Thread Kevin Wolf
Am 14.05.2012 17:42, schrieb Anthony Liguori:
 On 05/14/2012 07:41 AM, Juan Quintela wrote:

 Hi

 Please send in any agenda items you are interested in covering.
 
 -open-fd-hook proposal
 
 The discussion seems to have tapered out without a consensus.

I asked for really good reasons to justify it, and the only response you
gave was along the lines of because it's easy to implement.

Actually, I think it's not very easy at all when you start considering
cases like libvirt crashing (or restarting during an upgrade, as Daniel
mentioned). Currently we have a very simple unidirectional structure:
qemu is a standalone program that keeps running on its own. libvirt is
the user of qemu. Often enough it's already hard to get things working
correctly in error cases with this simple structure - do you really want
to have qemu depend on an RPC to libvirt?

You're right that the proper fix would be in the kernel, but in qemu a
much better solution that RPCs to libvirt is allowing all QMP commands
that open new files to pass a block device description that can contain
a fd. This would much better than first getting an open command via QMP
and then using an RPC to ask back what we're really meant to open.

To the full extent we're going to get this with blockdev-add (which is
what we should really start working on now rather than on hacks like
-open-fd-hook), but if you like hacks, much (if not all) of it is
already possible today with the 'existing' mode of live snapshots.

So if you want -open-fd-hook, please give us use cases that can't be
solved by passing the file descriptor directly in the QMP command (or on
the command line). I believe you might find some obscure ones that don't
work today, but I don't really expect that it enables anything that
-blockdev wouldn't enable.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 1/9] net: introduce tcp_server_start()

2012-03-14 Thread Kevin Wolf
Am 14.03.2012 08:51, schrieb Amos Kong:
 On 14/03/12 15:27, Paolo Bonzini wrote:

 
 Hi Paolo,
 
 Il 14/03/2012 08:14, Orit Wasserman ha scritto:
 if (bind(*fd, (struct sockaddr *)saddr, sizeof(saddr))  0)
 {   
  closesocket(*fd);
  return -socket_error();
 }
 return 0;

 and than you will not need ret

 But closesocket could clobber socket_error(), no?
 
 Yes, it will effect socket_error()
 
 How about this fix ?
 
  ret = bind(*fd, (struct sockaddr *)saddr, sizeof(saddr));
  if (ret  0) {
  ret = -socket_error();
  closesocket(*fd);
  }
  return ret;
 }
 

But it's still moved (or in this patch copied) code, right?

If so, please move it in one patch, and then fix it in another one on top.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/4] net: split hostname and service by last colon

2012-03-05 Thread Kevin Wolf
Am 02.03.2012 20:54, schrieb Laine Stump:
 On 03/02/2012 05:35 AM, Kevin Wolf wrote:
 Am 02.03.2012 10:58, schrieb Amos Kong:
 On 02/03/12 11:38, Amos Kong wrote:
 --- a/net.c
 +++ b/net.c
 @@ -84,7 +84,7 @@ static int get_str_sep(char *buf, int buf_size,
 const char **pp, int sep)
   const char *p, *p1;
   int len;
   p = *pp;
 -p1 = strchr(p, sep);
 +p1 = strrchr(p, sep);
   if (!p1)
   return -1;
   len = p1 - p;
 And what if the port isn't specified? I think you would erroneously
 interpret the last part of the IP address as port.
 Hi Kevin, port must be specified in '-incoming' parameters and migrate 
 monitor cmd.

   qemu-kvm ... -incoming tcp:$host:$port
   (qemu) migrate -d tcp:$host:$port


 If use boot up guest by wrong cmdline, qemu will report an error msg.

 # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -boot n -incoming 
 tcp:2312::8272 -monitor stdio
 qemu-system-x86_64: qemu: getaddrinfo: Name or service not known
 tcp_server_start: Invalid argument
 Migration failed. Exit code tcp:2312::8272(-22), exiting.
 Which is because 2312: isn't a valid IP address, right? But what if you
 have something like 2312::1234:8272? If you misinterpret the 8272 as a
 port number, the remaining address is still a valid IPv6 address.
 
 This is made irrelevant by PATCH 4/4, which allows for the IP address to
 be placed inside brackets:
 
[2312::8272]:port
 
 (at least it's irrelevant if your documentation *requires* brackets for
 all numeric ipv6-address:port pairs, which is strongly recommended by
 RFC 5952). It really is impossible to disambiguate the meaning of the
 final : unless you require these brackets (or 1) require full
 specification of all potential colons in the IPv6 address or require
 that the port *always* be specified, neither of which seem acceptable to
 me).

Here you're actually explaining why it's not irrelevant. You don't want
to enforce port numbers, so 2312::1234:8272 must be interpreted as an
IPv6 address without a port. This code however would take 8727 as the
port and 2312::1234 as the IPv6 address, which is not what you expected
(even after brackets are allowed - they don't make a difference because
the example doesn't use brackets).

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/4] net: split hostname and service by last colon

2012-03-05 Thread Kevin Wolf
Am 05.03.2012 09:59, schrieb Amos Kong:
 - Original Message -
 Am 02.03.2012 20:54, schrieb Laine Stump:
 On 03/02/2012 05:35 AM, Kevin Wolf wrote:
 Am 02.03.2012 10:58, schrieb Amos Kong:
 On 02/03/12 11:38, Amos Kong wrote:
 --- a/net.c
 +++ b/net.c
 @@ -84,7 +84,7 @@ static int get_str_sep(char *buf, int
 buf_size,
 const char **pp, int sep)
   const char *p, *p1;
   int len;
   p = *pp;
 -p1 = strchr(p, sep);
 +p1 = strrchr(p, sep);
   if (!p1)
   return -1;
   len = p1 - p;
 And what if the port isn't specified? I think you would
 erroneously
 interpret the last part of the IP address as port.
 Hi Kevin, port must be specified in '-incoming' parameters and
 migrate
 monitor cmd.

   qemu-kvm ... -incoming tcp:$host:$port
   (qemu) migrate -d tcp:$host:$port


 If use boot up guest by wrong cmdline, qemu will report an error
 msg.

 # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -boot n
 -incoming
 tcp:2312::8272 -monitor stdio
 qemu-system-x86_64: qemu: getaddrinfo: Name or service not known
 tcp_server_start: Invalid argument
 Migration failed. Exit code tcp:2312::8272(-22), exiting.
 Which is because 2312: isn't a valid IP address, right? But what
 if you
 have something like 2312::1234:8272? If you misinterpret the 8272
 as a
 port number, the remaining address is still a valid IPv6 address.

 This is made irrelevant by PATCH 4/4, which allows for the IP
 address to
 be placed inside brackets:

[2312::8272]:port

 (at least it's irrelevant if your documentation *requires* brackets
 for
 all numeric ipv6-address:port pairs, which is strongly recommended
 by
 RFC 5952). It really is impossible to disambiguate the meaning of
 the
 final : unless you require these brackets (or 1) require full
 specification of all potential colons in the IPv6 address or
 require
 that the port *always* be specified, neither of which seem
 acceptable to
 me).

 Here you're actually explaining why it's not irrelevant. You don't
 want
 to enforce port numbers, so 2312::1234:8272 must be interpreted as an
 IPv6 address without a port. This code however would take 8727 as the
 port and 2312::1234 as the IPv6 address, which is not what you
 expected
 (even after brackets are allowed - they don't make a difference
 because
 the example doesn't use brackets).
 
 In the migration context, host/port are all necessary, so it's right to parse 
 8272 to a port.
 However, for IPv6 brackets must be mandatory if you require a port.

Makes sense.

 BTW, the DNS delay issue existed in the past (gethostbyname()), it should be 
 fixed by another patchset.
 I will post my V2 (without fix of DNS delay) later.

Yes, I agree.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 1/4] Use getaddrinfo for migration

2012-03-02 Thread Kevin Wolf
Am 02.03.2012 03:50, schrieb Amos Kong:
 On 24/02/12 17:34, Kevin Wolf wrote:
 Am 10.02.2012 07:27, schrieb Amos Kong:
 This allows us to use ipv4/ipv6 for migration addresses.
 Once there, it also uses /etc/services names (it came free).

 Signed-off-by: Juan Quintelaquint...@redhat.com
 Signed-off-by: Amos Kongak...@redhat.com
 ---
   migration-tcp.c |   60 ---
   net.c   |  108 
 +++
   qemu_socket.h   |3 ++
   3 files changed, 127 insertions(+), 44 deletions(-)

 @@ -157,28 +141,16 @@ out2:

   int tcp_start_incoming_migration(const char *host_port)
   {
 -struct sockaddr_in addr;
 -int val;
 +int ret;
   int s;

   DPRINTF(Attempting to start an incoming migration\n);

 -if (parse_host_port(addr, host_port)  0) {
 -fprintf(stderr, invalid host/port combination: %s\n, host_port);
 -return -EINVAL;
 -}

 Oh, and this case doesn't print an error message any more now.
 
 The check work is done in tcp_start_common()
 
 tcp_start_incoming_migration()
   - tcp_client_start()
   - tcp_start_common()

Yes, but it only return -EINVAL without printing an error message, so
the failure case is silent now.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 1/4] Use getaddrinfo for migration

2012-03-02 Thread Kevin Wolf
Am 02.03.2012 04:33, schrieb Amos Kong:
 On 24/02/12 17:08, Kevin Wolf wrote:
 Am 10.02.2012 07:27, schrieb Amos Kong:
 This allows us to use ipv4/ipv6 for migration addresses.
 Once there, it also uses /etc/services names (it came free).

 Signed-off-by: Juan Quintelaquint...@redhat.com
 Signed-off-by: Amos Kongak...@redhat.com
 ---
   migration-tcp.c |   60 ---
   net.c   |  108 
 +++
   qemu_socket.h   |3 ++
   3 files changed, 127 insertions(+), 44 deletions(-)

 This patch is making too many changes at once:

 1. Move code around
 2. Remove duplicated code between client and server
 3. Replace parse_host_port() with an open-coded version
 
 4. Modify the open-coded parse_host_port() to use getaddrinfo instead of
 inet_aton/gethostbyname (Why can't we just change parse_host_port? Are
 there valid reasons to use the old implementation? Which?)
 
 tcp_common_start() needs to use the address list which is got by 
 getaddrinfo(),
 
 But I agree with verifying host/port by getaddrinfo() in parse_host_port.

The new parse_host_port() could return an address list. If it's the
right thing to use it here, it's probably right to use it in all other
places as well.

 This makes it rather hard to review.

 diff --git a/migration-tcp.c b/migration-tcp.c
 index 35a5781..644bb8f 100644
 --- a/migration-tcp.c
 +++ b/migration-tcp.c
 @@ -81,43 +81,27 @@ static void tcp_wait_for_connect(void *opaque)

   int tcp_start_outgoing_migration(MigrationState *s, const char *host_port)
   {
 -struct sockaddr_in addr;
   int ret;
 -
 -ret = parse_host_port(addr, host_port);
 -if (ret  0) {
 -return ret;
 -}
 +int fd;

   s-get_error = socket_errno;
   s-write = socket_write;
   s-close = tcp_close;

 -s-fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 -if (s-fd == -1) {
 -DPRINTF(Unable to open socket);
 -return -socket_error();
 -}
 -
 -socket_set_nonblock(s-fd);
 -
 -do {
 -ret = connect(s-fd, (struct sockaddr *)addr, sizeof(addr));
 -if (ret == -1) {
 -ret = -socket_error();
 -}
 -if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
 -qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, 
 s);
 -return 0;
 -}
 -} while (ret == -EINTR);
 -
 -if (ret  0) {
 +ret = tcp_client_start(host_port,fd);
 +s-fd = fd;
 +if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
 +DPRINTF(connect in progress);
 +qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s);
 +} else if (ret  0) {
   DPRINTF(connect failed\n);
 -migrate_fd_error(s);
 +if (ret != -EINVAL) {
 +migrate_fd_error(s);
 +}

 This looks odd. It is probably needed to keep the old behaviour where a
 failed parse_host_port() wouldn't call migrate_fd_error(). Is this
 really required? Maybe I'm missing something, but the caller can't know
 which failure case we had and if migrate_fd_error() has been called.
 
 getaddrinfo() is used in tcp_common_start(), -EINVAL will be returned
 when getaddrinfo() doesn't return 0.

This is an implementation detail of tcp_common_start(). What does
-EINVAL mean at the tcp_client_start() level? Should it be documented
with a comment?

Why must migrate_fd_error() not be called when getaddrinfo() failed?

   return ret;
 +} else {
 +migrate_fd_connect(s);
   }
 -migrate_fd_connect(s);
   return 0;
   }

 @@ -157,28 +141,16 @@ out2:

   int tcp_start_incoming_migration(const char *host_port)
   {
 -struct sockaddr_in addr;
 -int val;
 +int ret;
   int s;

   DPRINTF(Attempting to start an incoming migration\n);

 -if (parse_host_port(addr, host_port)  0) {
 -fprintf(stderr, invalid host/port combination: %s\n, host_port);
 -return -EINVAL;
 -}
 -
 -s = qemu_socket(PF_INET, SOCK_STREAM, 0);
 -if (s == -1) {
 -return -socket_error();
 +ret = tcp_server_start(host_port,s);
 +if (ret  0) {
 +return ret;
   }

 -val = 1;
 -setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)val, 
 sizeof(val));
 -
 -if (bind(s, (struct sockaddr *)addr, sizeof(addr)) == -1) {
 -goto err;
 -}
   if (listen(s, 1) == -1) {
   goto err;
   }
 diff --git a/net.c b/net.c
 index c34474f..f63014c 100644
 --- a/net.c
 +++ b/net.c
 @@ -99,6 +99,114 @@ static int get_str_sep(char *buf, int buf_size, const 
 char **pp, int sep)
   return 0;
   }

 +static int tcp_server_bind(int fd, struct addrinfo *rp)
 +{
 +int ret;
 +int val = 1;
 +
 +/* allow fast reuse */
 +setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)val, 
 sizeof(val));
 +
 +ret = bind(fd, rp-ai_addr, rp-ai_addrlen);
 +
 +if (ret == -1) {
 +ret = -socket_error();
 +}
 +return ret

Re: [Qemu-devel] [PATCH 3/4] net: split hostname and service by last colon

2012-03-02 Thread Kevin Wolf
Am 02.03.2012 10:58, schrieb Amos Kong:
 On 02/03/12 11:38, Amos Kong wrote:
 --- a/net.c
 +++ b/net.c
 @@ -84,7 +84,7 @@ static int get_str_sep(char *buf, int buf_size,
 const char **pp, int sep)
   const char *p, *p1;
   int len;
   p = *pp;
 -p1 = strchr(p, sep);
 +p1 = strrchr(p, sep);
   if (!p1)
   return -1;
   len = p1 - p;

 And what if the port isn't specified? I think you would erroneously
 interpret the last part of the IP address as port.
 
 Hi Kevin, port must be specified in '-incoming' parameters and migrate 
 monitor cmd.
 
   qemu-kvm ... -incoming tcp:$host:$port
   (qemu) migrate -d tcp:$host:$port
 
 
 If use boot up guest by wrong cmdline, qemu will report an error msg.
 
 # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -boot n -incoming 
 tcp:2312::8272 -monitor stdio
 qemu-system-x86_64: qemu: getaddrinfo: Name or service not known
 tcp_server_start: Invalid argument
 Migration failed. Exit code tcp:2312::8272(-22), exiting.

Which is because 2312: isn't a valid IP address, right? But what if you
have something like 2312::1234:8272? If you misinterpret the 8272 as a
port number, the remaining address is still a valid IPv6 address.

 parse_host_port() are used in four functions in net/socket.c:
   tcp_start_outgoing_migration()
   tcp_start_incoming_migration()
   net_socket_mcast_init()
   net_socket_udp_init()
 
 The argument type of parse_host_port() needs to be changed
 if we replace inet_aton/gethostbyname by getaddrinfo.
 
 So I will not change parse_host_port(), and verify Ipv6 addr
 in tcp_start_common without parse_host_port.

Why not change the prototype of parse_host_port() and modify all callers?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] blockdev operations [was: KVM call agenda for Tuesday 28th]

2012-02-29 Thread Kevin Wolf
Am 28.02.2012 17:07, schrieb Eric Blake:
 On 02/28/2012 07:58 AM, Stefan Hajnoczi wrote:
 On Tue, Feb 28, 2012 at 2:47 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 28/02/2012 15:39, Stefan Hajnoczi ha scritto:
 I'm not a fan of transactions or freeze/thaw (if used to atomically
 perform other commands).

 We should not export low-level block device operations so that
 external software can micromanage via QMP.  I don't think this is a
 good idea because it takes the block device offline and possibly
 blocks the VM.  We're reaching a level comparable to an HTTP interface
 for acquiring pthread mutex, doing some operations, and then another
 HTTP request to unlock it.  This is micromanagement it will create
 more problems because we will have to support lots of little API
 functions.

 So you're for extending Jeff's patches to group mirroring etc.?

 That's also my favorite one, assuming we can do it in time for 1.1.

 Yes, that's the approach I like the most.  It's relatively clean and
 leaves us space to develop -blockdev.
 
 Here's the idea I was forming based on today's call:
 
 Jeff's idea of a group operation can be extended to allow multiple
 operations while reusing the framework.  For oVirt, we need the ability
 to open a mirror (by passing the mirror file alongside the name of the
 new external snapshot), as well as reopening a blockdev (to pivot to the
 other side of an already-open mirror).
 
 Is there a way to express a designated union in QMP?  I'm thinking
 something along the lines of having the overall group command take a
 list of operations, where each operation can either be 'create a
 snapshot', 'create a snapshot and mirror', or 'reopen a mirror'.
 
 I'm thinking it might look something like:
 
 { 'enum': 'BlockdevOp',
   'data': [ 'snapshot', 'snapshot-mirror', 'reopen' ] }
 { 'type': 'BlockdevAction',
   'data': {'device': 'str', 'op': 'BlockdevOp',
'file': 'str', '*format': 'str', '*reuse': 'bool',
'*mirror': 'str', '*mirror-format': 'str' } }
 { 'command': 'blkdev-group-action-sync',
   'data': { 'actionlist': [ 'BlockdevAction' ] } }

I think the general approach is good.

Your implementation in QAPI is kind of ugly because you mix the
arguments of all three commands in BlockdevAction (how about
extensibility? And the optional flags aren't the full truth either,
we'd have to add checks in the handlers.). We really want to have some
real union support in QAPI that creates three different C structs.

So something like (I'll reintroduce the bad word transaction, because
it's really what we're doing here, just everything in one command):

{ 'type': 'BlockdevSnapshot',
  'data': {'device': 'str', 'snapshot-file': 'str', '*format': 'str' } }
{ 'union': 'BlockdevAction',
  'types': {
   'snapshot': 'BlockdevSnapshot',
   'mirror': 'BlockdevMirror', ...
   } }
{ 'command': 'blockdev-transaction',
  'data': { 'actionlist': [ 'BlockdevAction' ] } }

On the wire in QMP this would look like:

{ 'execute': 'blockdev-transaction',
  'arguments': {
'actionlist': [
  { 'type': 'snapshot',
'data': { 'device': 'ide-hd0', ... } },
  { 'type': 'mirror',
'data': { ... }
]
  }
}

Now if you look closely, BlockdevSnapshot is exactly what Jeff called
SnapshotDev in his series, which in turn is the definition of the
blockdev-snapshot-sync command. We can reuse all of this even in the API
of a new more generic command.

So my conclusion for now is that we can merge the group snapshots right
now instead of waiting for the mirror design to be completed, and we can
reuse everything in a more complex transaction command later.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 1/4] Use getaddrinfo for migration

2012-02-24 Thread Kevin Wolf
Am 10.02.2012 07:27, schrieb Amos Kong:
 This allows us to use ipv4/ipv6 for migration addresses.
 Once there, it also uses /etc/services names (it came free).
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  migration-tcp.c |   60 ---
  net.c   |  108 
 +++
  qemu_socket.h   |3 ++
  3 files changed, 127 insertions(+), 44 deletions(-)

This patch is making too many changes at once:

1. Move code around
2. Remove duplicated code between client and server
3. Replace parse_host_port() with an open-coded version
4. Modify the open-coded parse_host_port() to use getaddrinfo instead of
inet_aton/gethostbyname (Why can't we just change parse_host_port? Are
there valid reasons to use the old implementation? Which?)

This makes it rather hard to review.

 diff --git a/migration-tcp.c b/migration-tcp.c
 index 35a5781..644bb8f 100644
 --- a/migration-tcp.c
 +++ b/migration-tcp.c
 @@ -81,43 +81,27 @@ static void tcp_wait_for_connect(void *opaque)
  
  int tcp_start_outgoing_migration(MigrationState *s, const char *host_port)
  {
 -struct sockaddr_in addr;
  int ret;
 -
 -ret = parse_host_port(addr, host_port);
 -if (ret  0) {
 -return ret;
 -}
 +int fd;
  
  s-get_error = socket_errno;
  s-write = socket_write;
  s-close = tcp_close;
  
 -s-fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 -if (s-fd == -1) {
 -DPRINTF(Unable to open socket);
 -return -socket_error();
 -}
 -
 -socket_set_nonblock(s-fd);
 -
 -do {
 -ret = connect(s-fd, (struct sockaddr *)addr, sizeof(addr));
 -if (ret == -1) {
 -ret = -socket_error();
 -}
 -if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
 -qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s);
 -return 0;
 -}
 -} while (ret == -EINTR);
 -
 -if (ret  0) {
 +ret = tcp_client_start(host_port, fd);
 +s-fd = fd;
 +if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
 +DPRINTF(connect in progress);
 +qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s);
 +} else if (ret  0) {
  DPRINTF(connect failed\n);
 -migrate_fd_error(s);
 +if (ret != -EINVAL) {
 +migrate_fd_error(s);
 +}

This looks odd. It is probably needed to keep the old behaviour where a
failed parse_host_port() wouldn't call migrate_fd_error(). Is this
really required? Maybe I'm missing something, but the caller can't know
which failure case we had and if migrate_fd_error() has been called.

  return ret;
 +} else {
 +migrate_fd_connect(s);
  }
 -migrate_fd_connect(s);
  return 0;
  }
  
 @@ -157,28 +141,16 @@ out2:
  
  int tcp_start_incoming_migration(const char *host_port)
  {
 -struct sockaddr_in addr;
 -int val;
 +int ret;
  int s;
  
  DPRINTF(Attempting to start an incoming migration\n);
  
 -if (parse_host_port(addr, host_port)  0) {
 -fprintf(stderr, invalid host/port combination: %s\n, host_port);
 -return -EINVAL;
 -}
 -
 -s = qemu_socket(PF_INET, SOCK_STREAM, 0);
 -if (s == -1) {
 -return -socket_error();
 +ret = tcp_server_start(host_port, s);
 +if (ret  0) {
 +return ret;
  }
  
 -val = 1;
 -setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)val, sizeof(val));
 -
 -if (bind(s, (struct sockaddr *)addr, sizeof(addr)) == -1) {
 -goto err;
 -}
  if (listen(s, 1) == -1) {
  goto err;
  }
 diff --git a/net.c b/net.c
 index c34474f..f63014c 100644
 --- a/net.c
 +++ b/net.c
 @@ -99,6 +99,114 @@ static int get_str_sep(char *buf, int buf_size, const 
 char **pp, int sep)
  return 0;
  }
  
 +static int tcp_server_bind(int fd, struct addrinfo *rp)
 +{
 +int ret;
 +int val = 1;
 +
 +/* allow fast reuse */
 +setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)val, 
 sizeof(val));
 +
 +ret = bind(fd, rp-ai_addr, rp-ai_addrlen);
 +
 +if (ret == -1) {
 +ret = -socket_error();
 +}
 +return ret;
 +
 +}
 +
 +static int tcp_client_connect(int fd, struct addrinfo *rp)
 +{
 +int ret;
 +
 +do {
 +ret = connect(fd, rp-ai_addr, rp-ai_addrlen);
 +if (ret == -1) {
 +ret = -socket_error();
 +}
 +} while (ret == -EINTR);
 +
 +return ret;
 +}
 +
 +
 +static int tcp_start_common(const char *str, int *fd, bool server)
 +{
 +char hostname[512];
 +const char *service;
 +const char *name;
 +struct addrinfo hints;
 +struct addrinfo *result, *rp;
 +int s;
 +int sfd;
 +int ret = -EINVAL;
 +
 +*fd = -1;
 +service = str;
 +if (get_str_sep(hostname, sizeof(hostname), service, ':')  0) {
 +return -EINVAL;
 +}

Separating host name and port at the first colon 

Re: [Qemu-devel] [PATCH 2/4] net/socket: allow ipv6 for net_socket_listen_init and socket_connect_init

2012-02-24 Thread Kevin Wolf
Am 10.02.2012 07:27, schrieb Amos Kong:
 Remove use of parse_host_port.
 More SO_SOCKADDR changes.
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  net/socket.c |   60 
 +++---
  1 files changed, 11 insertions(+), 49 deletions(-)
 
 diff --git a/net/socket.c b/net/socket.c
 index d4c2002..439a69c 100644
 --- a/net/socket.c
 +++ b/net/socket.c
 @@ -403,29 +403,13 @@ static int net_socket_listen_init(VLANState *vlan,
const char *host_str)
  {
  NetSocketListenState *s;
 -int fd, val, ret;
 -struct sockaddr_in saddr;
 -
 -if (parse_host_port(saddr, host_str)  0)
 -return -1;
 +int fd, ret;
  
  s = g_malloc0(sizeof(NetSocketListenState));
  
 -fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 -if (fd  0) {
 -perror(socket);
 -g_free(s);
 -return -1;
 -}
 -socket_set_nonblock(fd);
 -
 -/* allow fast reuse */
 -val = 1;
 -setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (const char *)val, 
 sizeof(val));
 -
 -ret = bind(fd, (struct sockaddr *)saddr, sizeof(saddr));
 +ret = tcp_server_start(host_str, fd);
  if (ret  0) {
 -perror(bind);
 +fprintf(stderr, tcp_server_start: %s\n, strerror(ret));

error_report, and it should be strerror(-ret).

  g_free(s);
  closesocket(fd);
  return -1;
 @@ -451,41 +435,19 @@ static int net_socket_connect_init(VLANState *vlan,
 const char *host_str)
  {
  NetSocketState *s;
 -int fd, connected, ret, err;
 +int fd, connected, ret;
  struct sockaddr_in saddr;
  
 -if (parse_host_port(saddr, host_str)  0)
 -return -1;
 -
 -fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 -if (fd  0) {
 -perror(socket);
 +ret = tcp_client_start(host_str, fd);
 +if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) {
 +connected = 0;
 +} else if (ret  0) {
 +closesocket(fd);

There is no open fd in this case. The only time that fd != -1 and ret 
0 is for -EINPROGRESS and -EWOULDBLOCK.

  return -1;
 +} else {
 +connected = 1;
  }
 -socket_set_nonblock(fd);
  
 -connected = 0;
 -for(;;) {
 -ret = connect(fd, (struct sockaddr *)saddr, sizeof(saddr));
 -if (ret  0) {
 -err = socket_error();
 -if (err == EINTR || err == EWOULDBLOCK) {
 -} else if (err == EINPROGRESS) {
 -break;
 -#ifdef _WIN32
 -} else if (err == WSAEALREADY || err == WSAEINVAL) {
 -break;
 -#endif

This part is lost without replacement?

 -} else {
 -perror(connect);
 -closesocket(fd);

tcp_client_start uses close() instead of closesocket() in error case,
should probably be changed.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/4] net: split hostname and service by last colon

2012-02-24 Thread Kevin Wolf
Am 10.02.2012 07:27, schrieb Amos Kong:
 IPv6 address contains colons, parse will be wrong.
 
 [2312::8274]:5200
 
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  net.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/net.c b/net.c
 index f63014c..9e1ef9e 100644
 --- a/net.c
 +++ b/net.c
 @@ -84,7 +84,7 @@ static int get_str_sep(char *buf, int buf_size, const char 
 **pp, int sep)
  const char *p, *p1;
  int len;
  p = *pp;
 -p1 = strchr(p, sep);
 +p1 = strrchr(p, sep);
  if (!p1)
  return -1;
  len = p1 - p;

And what if the port isn't specified? I think you would erroneously
interpret the last part of the IP address as port.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 1/4] Use getaddrinfo for migration

2012-02-24 Thread Kevin Wolf
Am 10.02.2012 07:27, schrieb Amos Kong:
 This allows us to use ipv4/ipv6 for migration addresses.
 Once there, it also uses /etc/services names (it came free).
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  migration-tcp.c |   60 ---
  net.c   |  108 
 +++
  qemu_socket.h   |3 ++
  3 files changed, 127 insertions(+), 44 deletions(-)

 @@ -157,28 +141,16 @@ out2:
  
  int tcp_start_incoming_migration(const char *host_port)
  {
 -struct sockaddr_in addr;
 -int val;
 +int ret;
  int s;
  
  DPRINTF(Attempting to start an incoming migration\n);
  
 -if (parse_host_port(addr, host_port)  0) {
 -fprintf(stderr, invalid host/port combination: %s\n, host_port);
 -return -EINVAL;
 -}

Oh, and this case doesn't print an error message any more now.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 0/4] support to migrate with IPv6 address

2012-02-24 Thread Kevin Wolf
Am 10.02.2012 07:26, schrieb Amos Kong:
 Those four patches make migration of IPv6 address work.
 Use getaddrinfo() to socket addresses infomation.
 
 ---
 
 Amos Kong (4):
   Use getaddrinfo for migration
   net/socket: allow ipv6 for net_socket_listen_init and 
 socket_connect_init
   net: split hostname and service by last colon
   net: support to include ipv6 address by brackets

I think it would be better to split the patches differently:

1. Create tcp_server_start by moving code. Keep using parse_host_port
and the old loop around connect.
2. Unify all TCP servers to use this code
3. Introduce the client code and the convert all clients. Up to here,
this should be pure refactoring work.
4. Only now start changing the logic. First thing is getaddrinfo. Don't
change from IPv4 to IPv6 yet.
5. Make all changes required for IPv6 (getaddrinfo arguments, address
parsing, etc.)

Having step-by-step conversions with an explanation of each step in the
commit message makes it so much easier to understand the commits.

For the comments on details see the replies to the individual patches.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for Tuesday 21th

2012-02-21 Thread Kevin Wolf
Am 20.02.2012 11:13, schrieb Juan Quintela:
 
 Hi
 
 Please send in any agenda items you are interested in covering.

What's the status with qtest? (Though probably a one-line email would
already answer this)

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 1/4] KVM: x86 emulator: Fix task switch privilege checks

2012-02-08 Thread Kevin Wolf
Currently, all task switches check privileges against the DPL of the
TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
the DPL of this take gate is used for the check instead. Exceptions,
external interrupts and iret shouldn't perform any check.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |2 +-
 arch/x86/include/asm/kvm_host.h|4 +-
 arch/x86/kvm/emulate.c |   53 +++-
 arch/x86/kvm/svm.c |5 +++-
 arch/x86/kvm/vmx.c |8 +++--
 arch/x86/kvm/x86.c |6 ++--
 6 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index ab4092e..c8a9cf3 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -372,7 +372,7 @@ bool x86_page_table_writing_insn(struct x86_emulate_ctxt 
*ctxt);
 #define EMULATION_INTERCEPTED 2
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
-u16 tss_selector, int reason,
+u16 tss_selector, int idt_index, int reason,
 bool has_error_code, u32 error_code);
 int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 52d6640..0533fc4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -741,8 +741,8 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
 
-int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
-   bool has_error_code, u32 error_code);
+int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
+   int reason, bool has_error_code, u32 error_code);
 
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..7097ca9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1151,6 +1151,22 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
return 1;
 }
 
+static int read_interrupt_descriptor(struct x86_emulate_ctxt *ctxt,
+u16 index, struct kvm_desc_struct *desc)
+{
+   struct kvm_desc_ptr dt;
+   ulong addr;
+
+   ctxt-ops-get_idt(ctxt, dt);
+
+   if (dt.size  index * 8 + 7)
+   return emulate_gp(ctxt, index  3 | 0x2);
+
+   addr = dt.address + index * 8;
+   return ctxt-ops-read_std(ctxt, addr, desc, sizeof *desc,
+  ctxt-exception);
+}
+
 static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
 u16 selector, struct desc_ptr *dt)
 {
@@ -2350,7 +2366,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 }
 
 static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
-  u16 tss_selector, int reason,
+  u16 tss_selector, int idt_index, int reason,
   bool has_error_code, u32 error_code)
 {
struct x86_emulate_ops *ops = ctxt-ops;
@@ -2372,12 +2388,35 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
 
/* FIXME: check that next_tss_desc is tss */
 
-   if (reason != TASK_SWITCH_IRET) {
-   if ((tss_selector  3)  next_tss_desc.dpl ||
-   ops-cpl(ctxt)  next_tss_desc.dpl)
-   return emulate_gp(ctxt, 0);
+   /*
+* Check privileges. The three cases are task switch caused by...
+*
+* 1. jmp/call/int to task gate: Check against DPL of the task gate
+* 2. Exception/IRQ/iret: No check is performed
+* 3. jmp/call to TSS: Check agains DPL of the TSS
+*/
+   if (reason == TASK_SWITCH_GATE) {
+   if (idt_index != -1) {
+   /* Software interrupts */
+   struct kvm_desc_struct task_gate_desc;
+   int dpl;
+
+   ret = read_interrupt_descriptor(ctxt, idt_index,
+   task_gate_desc);
+   if (ret != X86EMUL_CONTINUE)
+   return ret;
+
+   dpl = task_gate_desc.dpl;
+   if ((tss_selector  3)  dpl || ops-cpl(ctxt)  dpl)
+   return emulate_gp(ctxt, (idt_index  3) | 0x2);
+   }
+   } else if (reason != TASK_SWITCH_IRET) {
+   int dpl = next_tss_desc.dpl

[PATCH v4 2/4] KVM: x86 emulator: VM86 segments must have DPL 3

2012-02-08 Thread Kevin Wolf
Setting the segment DPL to 0 for at least the VM86 code segment makes
the VM entry fail on VMX.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/kvm/emulate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7097ca9..144a203 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1243,6 +1243,8 @@ static int load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
seg_desc.type = 3;
seg_desc.p = 1;
seg_desc.s = 1;
+   if (ctxt-mode == X86EMUL_MODE_VM86)
+   seg_desc.dpl = 3;
goto load;
}
 
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 3/4] KVM: SVM: Fix CPL updates

2012-02-08 Thread Kevin Wolf
Keep CPL at 0 in real mode and at 3 in VM86. In protected/long mode, use
RPL rather than DPL of the code segment.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/kvm/svm.c |   19 ---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6a977c1..4124a7e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1263,6 +1263,21 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
wrmsrl(host_save_user_msrs[i], svm-host_user_msrs[i]);
 }
 
+static void svm_update_cpl(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+   int cpl;
+
+   if (!is_protmode(vcpu))
+   cpl = 0;
+   else if (svm-vmcb-save.rflags  X86_EFLAGS_VM)
+   cpl = 3;
+   else
+   cpl = svm-vmcb-save.cs.selector  0x3;
+
+   svm-vmcb-save.cpl = cpl;
+}
+
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 {
return to_svm(vcpu)-vmcb-save.rflags;
@@ -1538,9 +1553,7 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
s-attrib |= (var-g  1)  SVM_SELECTOR_G_SHIFT;
}
if (seg == VCPU_SREG_CS)
-   svm-vmcb-save.cpl
-   = (svm-vmcb-save.cs.attrib
-   SVM_SELECTOR_DPL_SHIFT)  3;
+   svm_update_cpl(vcpu);
 
mark_dirty(svm-vmcb, VMCB_SEG);
 }
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 0/4] Fix task switches into/out of VM86

2012-02-08 Thread Kevin Wolf
Kevin Wolf (4):
  KVM: x86 emulator: Fix task switch privilege checks
  KVM: x86 emulator: VM86 segments must have DPL 3
  KVM: SVM: Fix CPL updates
  KVM: x86 emulator: Allow PM/VM86 switch during task switch

 arch/x86/include/asm/kvm_emulate.h |3 +-
 arch/x86/include/asm/kvm_host.h|4 +-
 arch/x86/kvm/emulate.c |   75 ---
 arch/x86/kvm/svm.c |   28 +++--
 arch/x86/kvm/vmx.c |8 ++-
 arch/x86/kvm/x86.c |   12 -
 6 files changed, 110 insertions(+), 20 deletions(-)

-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 4/4] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-02-08 Thread Kevin Wolf
Task switches can switch between Protected Mode and VM86. The current
mode must be updated during the task switch emulation so that the new
segment selectors are interpreted correctly.

In order to let privilege checks succeed, rflags needs to be updated in
the vcpu struct as this causes a CPL update.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   20 
 arch/x86/kvm/svm.c |4 
 arch/x86/kvm/x86.c |6 ++
 4 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index c8a9cf3..4a21c7d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -176,6 +176,7 @@ struct x86_emulate_ops {
void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
+   void (*set_rflags)(struct x86_emulate_ctxt *ctxt, ulong val);
int (*cpl)(struct x86_emulate_ctxt *ctxt);
int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 144a203..a9fc21d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2273,6 +2273,8 @@ static int load_state_from_tss32(struct x86_emulate_ctxt 
*ctxt,
return emulate_gp(ctxt, 0);
ctxt-_eip = tss-eip;
ctxt-eflags = tss-eflags | 2;
+
+   /* General purpose registers */
ctxt-regs[VCPU_REGS_RAX] = tss-eax;
ctxt-regs[VCPU_REGS_RCX] = tss-ecx;
ctxt-regs[VCPU_REGS_RDX] = tss-edx;
@@ -2295,6 +2297,24 @@ static int load_state_from_tss32(struct x86_emulate_ctxt 
*ctxt,
set_segment_selector(ctxt, tss-gs, VCPU_SREG_GS);
 
/*
+* If we're switching between Protected Mode and VM86, we need to make
+* sure to update the mode before loading the segment descriptors so
+* that the selectors are interpreted correctly.
+*
+* Need to get rflags to the vcpu struct immediately because it
+* influences the CPL which is checked at least when loading the segment
+* descriptors and when pushing an error code to the new kernel stack.
+*
+* TODO Introduce a separate ctxt-ops-set_cpl callback
+*/
+   if (ctxt-eflags  X86_EFLAGS_VM)
+   ctxt-mode = X86EMUL_MODE_VM86;
+   else
+   ctxt-mode = X86EMUL_MODE_PROT32;
+
+   ctxt-ops-set_rflags(ctxt, ctxt-eflags);
+
+   /*
 * Now load segment descriptors. If fault happenes at this stage
 * it is handled in a context of new task
 */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4124a7e..1559b3b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1285,7 +1285,11 @@ static unsigned long svm_get_rflags(struct kvm_vcpu 
*vcpu)
 
 static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
+   unsigned long old_rflags = to_svm(vcpu)-vmcb-save.rflags;
+
to_svm(vcpu)-vmcb-save.rflags = rflags;
+   if ((old_rflags ^ rflags)  X86_EFLAGS_VM)
+   svm_update_cpl(vcpu);
 }
 
 static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc3e945..502b5c3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4040,6 +4040,11 @@ static int emulator_set_cr(struct x86_emulate_ctxt 
*ctxt, int cr, ulong val)
return res;
 }
 
+static void emulator_set_rflags(struct x86_emulate_ctxt *ctxt, ulong val)
+{
+   kvm_set_rflags(emul_to_vcpu(ctxt), val);
+}
+
 static int emulator_get_cpl(struct x86_emulate_ctxt *ctxt)
 {
return kvm_x86_ops-get_cpl(emul_to_vcpu(ctxt));
@@ -4199,6 +4204,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_idt = emulator_set_idt,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
+   .set_rflags  = emulator_set_rflags,
.cpl = emulator_get_cpl,
.get_dr  = emulator_get_dr,
.set_dr  = emulator_set_dr,
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/4] KVM: SVM: Fix CPL updates

2012-02-06 Thread Kevin Wolf
Am 05.02.2012 12:16, schrieb Gleb Natapov:
 On Fri, Feb 03, 2012 at 07:29:24PM +0100, Kevin Wolf wrote:
 Keep CPL at 0 in real mode and at 3 in VM86. In protected/long mode, use
 RPL rather than DPL of the code segment.

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  arch/x86/kvm/svm.c |   19 ---
  1 files changed, 16 insertions(+), 3 deletions(-)

 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 6a977c1..4124a7e 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1263,6 +1263,21 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
  wrmsrl(host_save_user_msrs[i], svm-host_user_msrs[i]);
  }
  
 +static void svm_update_cpl(struct kvm_vcpu *vcpu)
 +{
 +struct vcpu_svm *svm = to_svm(vcpu);
 +int cpl;
 +
 +if (!is_protmode(vcpu))
 +cpl = 0;
 +else if (svm-vmcb-save.rflags  X86_EFLAGS_VM)
 +cpl = 3;
 +else
 +cpl = svm-vmcb-save.cs.selector  0x3;
 +
 +svm-vmcb-save.cpl = cpl;
 +}
 +
 As you probably know already I think cpl should be updated in
 svm_get_rflags() too. With current patch restoring CS segment
 register before rflags register after migration may cause cpl
 to get wrong value for instance.

I think you mean set_rflags rather than get_rflags?

Patch 4 adds this with the set_rflags emulator callback.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/4] KVM: SVM: Fix CPL updates

2012-02-06 Thread Kevin Wolf
Am 06.02.2012 10:57, schrieb Gleb Natapov:
 On Mon, Feb 06, 2012 at 10:18:35AM +0100, Kevin Wolf wrote:
 Am 05.02.2012 12:16, schrieb Gleb Natapov:
 On Fri, Feb 03, 2012 at 07:29:24PM +0100, Kevin Wolf wrote:
 Keep CPL at 0 in real mode and at 3 in VM86. In protected/long mode, use
 RPL rather than DPL of the code segment.

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  arch/x86/kvm/svm.c |   19 ---
  1 files changed, 16 insertions(+), 3 deletions(-)

 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 6a977c1..4124a7e 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1263,6 +1263,21 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
wrmsrl(host_save_user_msrs[i], svm-host_user_msrs[i]);
  }
  
 +static void svm_update_cpl(struct kvm_vcpu *vcpu)
 +{
 +  struct vcpu_svm *svm = to_svm(vcpu);
 +  int cpl;
 +
 +  if (!is_protmode(vcpu))
 +  cpl = 0;
 +  else if (svm-vmcb-save.rflags  X86_EFLAGS_VM)
 +  cpl = 3;
 +  else
 +  cpl = svm-vmcb-save.cs.selector  0x3;
 +
 +  svm-vmcb-save.cpl = cpl;
 +}
 +
 As you probably know already I think cpl should be updated in
 svm_get_rflags() too. With current patch restoring CS segment
 register before rflags register after migration may cause cpl
 to get wrong value for instance.

 I think you mean set_rflags rather than get_rflags?

 Uh, yes.
 
 Patch 4 adds this with the set_rflags emulator callback.

 Sorry, missed that, I expected it to be in this patch for some reason.
 
 I think calling svm_update_cpl() from set_rflags() is incorrect though.
 svm_update_cpl() checks cr0 so if guest does mov 1, %cr0; popf and
 popf happens to be emulated it will change cpl to cs3 which is
 incorrect.

Good point. The easy, but IMHO somewhat hackish way to fix it is to only
call svm_update_cpl() if the VM flag has changed.

For the real thing, we'd have to know whether we are in the middle of a
mode switch, i.e. whether the segment selector is a protected mode or
real mode selector. I don't think we have this information, do we?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 4/4] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-02-06 Thread Kevin Wolf
Am 06.02.2012 11:32, schrieb Avi Kivity:
 On 02/03/2012 08:29 PM, Kevin Wolf wrote:
 Task switches can switch between Protected Mode and VM86. The current
 mode must be updated during the task switch emulation so that the new
 segment selectors are interpreted correctly.

 In order to let privilege checks succeed, rflags needs to be updated in
 the vcpu struct as this causes a CPL update.

 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 4124a7e..38d5ef3 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1286,6 +1286,7 @@ static unsigned long svm_get_rflags(struct kvm_vcpu 
 *vcpu)
  static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
  {
  to_svm(vcpu)-vmcb-save.rflags = rflags;
 +svm_update_cpl(vcpu);
  }
 
 This can trigger the (cs3)!=0  (cr01) != 0 problem, right after we
 enter protected mode.
 
 What this code does is slave the cpl to other x86 state (copying vmx),
 whereas in reality it is separate state (with independent existence from
 the mov cr0 instruction until the next far jump...).

Yes, as discussed in v2, eventually we'll want to call set_cpl() rather
than set_rflags() during the task switch (I left a TODO comment there).
Then the not quite correct update in svm_set_rflags can be removed again.

 We can fix this with
 
 if ((save.rflags ^ rflags)  EFLAGS_VM)
 svm_update_cpl(vcpu)
 
 but that leaves us with an unupdated cpl after live migration (well, it
 will update after loading segment registers).  Both are bad, but I think
 my version is less bad - live migration is broken anyway in this window,
 since we don't save the cpl.

Right, it just leaves broken what is already broken. If the rest of the
series is okay now, I'll resend with a check if rflags.VM has changed.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/4] Fix task switches into/out of VM86

2012-02-03 Thread Kevin Wolf
Kevin Wolf (4):
  KVM: x86 emulator: Fix task switch privilege checks
  KVM: x86 emulator: VM86 segments must have DPL 3
  KVM: SVM: Fix CPL updates
  KVM: x86 emulator: Allow PM/VM86 switch during task switch

 arch/x86/include/asm/kvm_emulate.h |3 +-
 arch/x86/include/asm/kvm_host.h|4 +-
 arch/x86/kvm/emulate.c |   75 ---
 arch/x86/kvm/svm.c |   25 ++--
 arch/x86/kvm/vmx.c |8 ++-
 arch/x86/kvm/x86.c |   12 -
 6 files changed, 107 insertions(+), 20 deletions(-)

-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/4] KVM: x86 emulator: Fix task switch privilege checks

2012-02-03 Thread Kevin Wolf
Currently, all task switches check privileges against the DPL of the
TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
the DPL of this take gate is used for the check instead. Exceptions,
external interrupts and iret shouldn't perform any check.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |2 +-
 arch/x86/include/asm/kvm_host.h|4 +-
 arch/x86/kvm/emulate.c |   53 +++-
 arch/x86/kvm/svm.c |5 +++-
 arch/x86/kvm/vmx.c |8 +++--
 arch/x86/kvm/x86.c |6 ++--
 6 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index ab4092e..c8a9cf3 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -372,7 +372,7 @@ bool x86_page_table_writing_insn(struct x86_emulate_ctxt 
*ctxt);
 #define EMULATION_INTERCEPTED 2
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
-u16 tss_selector, int reason,
+u16 tss_selector, int idt_index, int reason,
 bool has_error_code, u32 error_code);
 int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 52d6640..0533fc4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -741,8 +741,8 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
 
-int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
-   bool has_error_code, u32 error_code);
+int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
+   int reason, bool has_error_code, u32 error_code);
 
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..7097ca9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1151,6 +1151,22 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
return 1;
 }
 
+static int read_interrupt_descriptor(struct x86_emulate_ctxt *ctxt,
+u16 index, struct kvm_desc_struct *desc)
+{
+   struct kvm_desc_ptr dt;
+   ulong addr;
+
+   ctxt-ops-get_idt(ctxt, dt);
+
+   if (dt.size  index * 8 + 7)
+   return emulate_gp(ctxt, index  3 | 0x2);
+
+   addr = dt.address + index * 8;
+   return ctxt-ops-read_std(ctxt, addr, desc, sizeof *desc,
+  ctxt-exception);
+}
+
 static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
 u16 selector, struct desc_ptr *dt)
 {
@@ -2350,7 +2366,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 }
 
 static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
-  u16 tss_selector, int reason,
+  u16 tss_selector, int idt_index, int reason,
   bool has_error_code, u32 error_code)
 {
struct x86_emulate_ops *ops = ctxt-ops;
@@ -2372,12 +2388,35 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
 
/* FIXME: check that next_tss_desc is tss */
 
-   if (reason != TASK_SWITCH_IRET) {
-   if ((tss_selector  3)  next_tss_desc.dpl ||
-   ops-cpl(ctxt)  next_tss_desc.dpl)
-   return emulate_gp(ctxt, 0);
+   /*
+* Check privileges. The three cases are task switch caused by...
+*
+* 1. jmp/call/int to task gate: Check against DPL of the task gate
+* 2. Exception/IRQ/iret: No check is performed
+* 3. jmp/call to TSS: Check agains DPL of the TSS
+*/
+   if (reason == TASK_SWITCH_GATE) {
+   if (idt_index != -1) {
+   /* Software interrupts */
+   struct kvm_desc_struct task_gate_desc;
+   int dpl;
+
+   ret = read_interrupt_descriptor(ctxt, idt_index,
+   task_gate_desc);
+   if (ret != X86EMUL_CONTINUE)
+   return ret;
+
+   dpl = task_gate_desc.dpl;
+   if ((tss_selector  3)  dpl || ops-cpl(ctxt)  dpl)
+   return emulate_gp(ctxt, (idt_index  3) | 0x2);
+   }
+   } else if (reason != TASK_SWITCH_IRET) {
+   int dpl = next_tss_desc.dpl

[PATCH v3 2/4] KVM: x86 emulator: VM86 segments must have DPL 3

2012-02-03 Thread Kevin Wolf
Setting the segment DPL to 0 for at least the VM86 code segment makes
the VM entry fail on VMX.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/kvm/emulate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7097ca9..144a203 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1243,6 +1243,8 @@ static int load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
seg_desc.type = 3;
seg_desc.p = 1;
seg_desc.s = 1;
+   if (ctxt-mode == X86EMUL_MODE_VM86)
+   seg_desc.dpl = 3;
goto load;
}
 
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/4] KVM: SVM: Fix CPL updates

2012-02-03 Thread Kevin Wolf
Keep CPL at 0 in real mode and at 3 in VM86. In protected/long mode, use
RPL rather than DPL of the code segment.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/kvm/svm.c |   19 ---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6a977c1..4124a7e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1263,6 +1263,21 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
wrmsrl(host_save_user_msrs[i], svm-host_user_msrs[i]);
 }
 
+static void svm_update_cpl(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+   int cpl;
+
+   if (!is_protmode(vcpu))
+   cpl = 0;
+   else if (svm-vmcb-save.rflags  X86_EFLAGS_VM)
+   cpl = 3;
+   else
+   cpl = svm-vmcb-save.cs.selector  0x3;
+
+   svm-vmcb-save.cpl = cpl;
+}
+
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 {
return to_svm(vcpu)-vmcb-save.rflags;
@@ -1538,9 +1553,7 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
s-attrib |= (var-g  1)  SVM_SELECTOR_G_SHIFT;
}
if (seg == VCPU_SREG_CS)
-   svm-vmcb-save.cpl
-   = (svm-vmcb-save.cs.attrib
-   SVM_SELECTOR_DPL_SHIFT)  3;
+   svm_update_cpl(vcpu);
 
mark_dirty(svm-vmcb, VMCB_SEG);
 }
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 4/4] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-02-03 Thread Kevin Wolf
Task switches can switch between Protected Mode and VM86. The current
mode must be updated during the task switch emulation so that the new
segment selectors are interpreted correctly.

In order to let privilege checks succeed, rflags needs to be updated in
the vcpu struct as this causes a CPL update.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   20 
 arch/x86/kvm/svm.c |1 +
 arch/x86/kvm/x86.c |6 ++
 4 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index c8a9cf3..4a21c7d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -176,6 +176,7 @@ struct x86_emulate_ops {
void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
+   void (*set_rflags)(struct x86_emulate_ctxt *ctxt, ulong val);
int (*cpl)(struct x86_emulate_ctxt *ctxt);
int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 144a203..a9fc21d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2273,6 +2273,8 @@ static int load_state_from_tss32(struct x86_emulate_ctxt 
*ctxt,
return emulate_gp(ctxt, 0);
ctxt-_eip = tss-eip;
ctxt-eflags = tss-eflags | 2;
+
+   /* General purpose registers */
ctxt-regs[VCPU_REGS_RAX] = tss-eax;
ctxt-regs[VCPU_REGS_RCX] = tss-ecx;
ctxt-regs[VCPU_REGS_RDX] = tss-edx;
@@ -2295,6 +2297,24 @@ static int load_state_from_tss32(struct x86_emulate_ctxt 
*ctxt,
set_segment_selector(ctxt, tss-gs, VCPU_SREG_GS);
 
/*
+* If we're switching between Protected Mode and VM86, we need to make
+* sure to update the mode before loading the segment descriptors so
+* that the selectors are interpreted correctly.
+*
+* Need to get rflags to the vcpu struct immediately because it
+* influences the CPL which is checked at least when loading the segment
+* descriptors and when pushing an error code to the new kernel stack.
+*
+* TODO Introduce a separate ctxt-ops-set_cpl callback
+*/
+   if (ctxt-eflags  X86_EFLAGS_VM)
+   ctxt-mode = X86EMUL_MODE_VM86;
+   else
+   ctxt-mode = X86EMUL_MODE_PROT32;
+
+   ctxt-ops-set_rflags(ctxt, ctxt-eflags);
+
+   /*
 * Now load segment descriptors. If fault happenes at this stage
 * it is handled in a context of new task
 */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4124a7e..38d5ef3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1286,6 +1286,7 @@ static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
to_svm(vcpu)-vmcb-save.rflags = rflags;
+   svm_update_cpl(vcpu);
 }
 
 static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc3e945..502b5c3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4040,6 +4040,11 @@ static int emulator_set_cr(struct x86_emulate_ctxt 
*ctxt, int cr, ulong val)
return res;
 }
 
+static void emulator_set_rflags(struct x86_emulate_ctxt *ctxt, ulong val)
+{
+   kvm_set_rflags(emul_to_vcpu(ctxt), val);
+}
+
 static int emulator_get_cpl(struct x86_emulate_ctxt *ctxt)
 {
return kvm_x86_ops-get_cpl(emul_to_vcpu(ctxt));
@@ -4199,6 +4204,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_idt = emulator_set_idt,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
+   .set_rflags  = emulator_set_rflags,
.cpl = emulator_get_cpl,
.get_dr  = emulator_get_dr,
.set_dr  = emulator_set_dr,
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] Fix task switches into/out of VM86

2012-01-30 Thread Kevin Wolf
Am 27.01.2012 20:52, schrieb Gleb Natapov:
 On Fri, Jan 27, 2012 at 08:23:33PM +0100, Kevin Wolf wrote:
 I believe this should work with both VMX and SVM now. Gleb, Jörg, can one of
 you test this with SVM? I did some testing on my buggy processor and it looks
 as good as it gets, but it would be better if you could confirm.

 You forgot to set cpl to 3 in vmcb in svm_set_rflags() when vm86 is enabled, 
 no?

SVM updates the CPL when the segment selector for CS is loaded. From a
svm.c POV, segment selectors are updated immediately after set_rflags,
so it wouldn't really make a difference to do it twice.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] Fix task switches into/out of VM86

2012-01-30 Thread Kevin Wolf
Am 30.01.2012 09:55, schrieb Gleb Natapov:
 On Mon, Jan 30, 2012 at 09:48:33AM +0100, Kevin Wolf wrote:
 Am 27.01.2012 20:52, schrieb Gleb Natapov:
 On Fri, Jan 27, 2012 at 08:23:33PM +0100, Kevin Wolf wrote:
 I believe this should work with both VMX and SVM now. Gleb, Jörg, can one 
 of
 you test this with SVM? I did some testing on my buggy processor and it 
 looks
 as good as it gets, but it would be better if you could confirm.

 You forgot to set cpl to 3 in vmcb in svm_set_rflags() when vm86 is 
 enabled, no?

 SVM updates the CPL when the segment selector for CS is loaded. From a
 svm.c POV, segment selectors are updated immediately after set_rflags,
 so it wouldn't really make a difference to do it twice.

 It is too subtle to rely on that. The fact is that checking cpl after
 set_rflags provides incorrect value. This better be fixed.

Depends on what value you consider to be correct between reloading
eflags and reloading cs. I think it's logical and more consistent to say
that CPL only changes when cs is reloaded, but you could argue that it's
effective with the reload of rflags. It doesn't make a difference to
guests, so we can decide to choose whatever we like.

Depending on what we decide on (Gleb and I disagree on this, so more
input would be helpful), either VMX or SVM need a cleanup. I think it
can be done independent from and on top of this fix.

 BTW does load_state_from_tss16() need the same fix?

The manual says Do not use a 16-bit TSS to implement a virtual-8086
task. Actually, I don't think you could do that, even if you wanted,
with a 16-bit flags field.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-01-30 Thread Kevin Wolf
Am 30.01.2012 11:24, schrieb Avi Kivity:
 On 01/27/2012 09:23 PM, Kevin Wolf wrote:
 Task switches can switch between Protected Mode and VM86. The current
 mode must be updated during the task switch emulation so that the new
 segment selectors are interpreted correctly and privilege checks
 succeed.

 VMX code calculates the CPL from the code segment selector and rflags,
 so it needs rflags to be updated in the vcpu struct. SVM stores the DPL
 of the code segment instead, so we must be sure to give the right one
 when updating the selector.
 
 svm has vmcb_save_area::cpl; it's independent of CS.RPL.

Right, but cpl in the VMCB is updated when you reload a segment (and I
think only then), so it's closely related.

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  arch/x86/include/asm/kvm_emulate.h |1 +
  arch/x86/kvm/emulate.c |   26 ++
  arch/x86/kvm/x86.c |6 ++
  3 files changed, 33 insertions(+), 0 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_emulate.h 
 b/arch/x86/include/asm/kvm_emulate.h
 index c8a9cf3..4a21c7d 100644
 --- a/arch/x86/include/asm/kvm_emulate.h
 +++ b/arch/x86/include/asm/kvm_emulate.h
 @@ -176,6 +176,7 @@ struct x86_emulate_ops {
  void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
  ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
  int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
 +void (*set_rflags)(struct x86_emulate_ctxt *ctxt, ulong val);
  int (*cpl)(struct x86_emulate_ctxt *ctxt);
  int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
  int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 833969e..143ce8e 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -552,6 +552,14 @@ static void set_segment_selector(struct 
 x86_emulate_ctxt *ctxt, u16 selector,
  struct desc_struct desc;
  
  ctxt-ops-get_segment(ctxt, dummy, desc, base3, seg);
 +
 +if (ctxt-mode == X86EMUL_MODE_REAL)
 +desc.dpl = 0;
 
 Can't happen.

set_segment_selector is only called during task switches right now, so
yes, this is impossible. Want me to drop the check? Or BUG()?

 +else if (ctxt-mode == X86EMUL_MODE_VM86)
 +desc.dpl = 3;
 +else
 +desc.dpl = selector  0x3;
 
 I don't think this is right.  The SDM explicitly says that just the
 selectors are loaded, yet you're modifying the DPL here too.  Unless
 there's an abort, we'll be loading the descriptors later anyway (with
 their DPL).  If there's an abort, I think we should continue with the
 mismatched DPL.
 
 +
  ctxt-ops-set_segment(ctxt, selector, desc, base3, seg);
  }
  
 @@ -2273,6 +2281,24 @@ static int load_state_from_tss32(struct 
 x86_emulate_ctxt *ctxt,
  return emulate_gp(ctxt, 0);
  ctxt-_eip = tss-eip;
  ctxt-eflags = tss-eflags | 2;
 +
 +/*
 + * If we're switching between Protected Mode and VM86, we need to make
 + * sure to update the mode before loading the segment descriptors so
 + * that the selectors are interpreted correctly.
 + *
 + * Need to get it to the vcpu struct immediately because it influences
 + * the CPL which is checked at least when loading the segment
 + * descriptors and when pushing an error code to the new kernel stack.
 + */
 +if (ctxt-eflags  X86_EFLAGS_VM)
 +ctxt-mode = X86EMUL_MODE_VM86;
 +else
 +ctxt-mode = X86EMUL_MODE_PROT32;
 +
 
 Shouldn't this be done after the set_segment_selector() block?  My
 interpretation of the SDM is that if a fault happens while loading
 descriptors the fault happens with old segment cache, that is, it needs
 to be interpreted according to the old mode.

This is closely related to my argument with Gleb whether CPL changes
when rflags is reloaded or if it only happens when cs is reloaded. I
argued that it makes more sense to couple it with cs, so I guess I have
to agree with you mostly.

The SDM says that any faults during loading the segment descriptors
happen in the context of the new task, and the task switch is completed
before an exception is generated. So it shouldn't make any difference to
the guest what the exact point is at which we change CPL, it's an KVM
internal decision.

So what about this order:

1. Reload general purpose registers and eflags without updating mode
   or writing back rflags to the vcpu struct.

2. Load segment selectors without changing the DPL yet.

3. Load segment descriptors, and disable the privilege checks in
   load_segment_descriptor using a new flag. For SVM, this updates
   the CPL when cs is reloaded.

4. Call ctxt-ops.set_rflag so that VMX updates the CPL. Should be
   cleaned up in a follow-up patch, so that VMX uses set_segment
   to update the CPL, like SVM does.

Would this match your interpretation?

 +ctxt-ops-set_rflags(ctxt, ctxt

Re: [PATCH v2 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-30 Thread Kevin Wolf
Am 30.01.2012 11:39, schrieb Avi Kivity:
 On 01/27/2012 09:23 PM, Kevin Wolf wrote:
 Currently, all task switches check privileges against the DPL of the
 TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
 the DPL of this take gate is used for the check instead. Exceptions,
 external interrupts and iret shouldn't perform any check.

  
 @@ -2372,12 +2389,32 @@ static int emulator_do_task_switch(struct 
 x86_emulate_ctxt *ctxt,
  
  /* FIXME: check that next_tss_desc is tss */
  
 -if (reason != TASK_SWITCH_IRET) {
 -if ((tss_selector  3)  next_tss_desc.dpl ||
 -ops-cpl(ctxt)  next_tss_desc.dpl)
 -return emulate_gp(ctxt, 0);
 +/*
 + * Check privileges. The three cases are task switch caused by...
 + *
 + * 1. Software interrupt: Check against DPL of the task gate
 + * 2. Exception/IRQ/iret: No check is performed
 + * 3. jmp/call: Check agains DPL of the TSS
 + */
 +dpl = -1;
 +if (reason == TASK_SWITCH_GATE) {
 +if (idt_index != -1) {
 +struct kvm_desc_struct task_gate_desc;
 +
 +ret = read_interrupt_descriptor(ctxt, idt_index,
 +task_gate_desc);
 +if (ret != X86EMUL_CONTINUE)
 +return ret;
 +
 +dpl = task_gate_desc.dpl;
 +}
 +} else if (reason != TASK_SWITCH_IRET) {
 +dpl = next_tss_desc.dpl;
  }
  
 +if (dpl != -1  ((tss_selector  3)  dpl || ops-cpl(ctxt)  dpl))
 +return emulate_gp(ctxt, 0);
 
 Should be #GP(selector), no?

This line is just moved code, but without looking it up I would expect
the selector, yes. Would be an independent bug. I can add a patch to the
series if you like.

 The TASK-GATE: branch of the CALL instruction documentation lists many
 other conditions which we don't check, so I'll accept this, otherwise
 we'll never make any progress.

Hm, I didn't really look at call, just iret and exceptions.

Seems there are many test cases left to write.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-01-30 Thread Kevin Wolf
Am 30.01.2012 14:23, schrieb Avi Kivity:
 On 01/30/2012 01:05 PM, Kevin Wolf wrote:
 Am 30.01.2012 11:24, schrieb Avi Kivity:
 On 01/27/2012 09:23 PM, Kevin Wolf wrote:
 Task switches can switch between Protected Mode and VM86. The current
 mode must be updated during the task switch emulation so that the new
 segment selectors are interpreted correctly and privilege checks
 succeed.

 VMX code calculates the CPL from the code segment selector and rflags,
 so it needs rflags to be updated in the vcpu struct. SVM stores the DPL
 of the code segment instead, so we must be sure to give the right one
 when updating the selector.

 svm has vmcb_save_area::cpl; it's independent of CS.RPL.

 Right, but cpl in the VMCB is updated when you reload a segment (and I
 think only then), 
 
 Gah, it's broken.  It should be qualified by more - real mode should
 keep cpl at zero, vm86 at 3.  And setting rflags.vm86 should update cpl
 as well.
 
 For example, live migration while in real mode with cs3!=0 or in vm86
 mode with cs3==0 would set the wrong cpl.

Ah, right, I didn't think of this case.

Not sure if I should fix it in this series. If we do the right fixes to
the task switch, it won't be strictly needed for the bug I'm fixing.

 so it's closely related.
   
ctxt-ops-get_segment(ctxt, dummy, desc, base3, seg);
 +
 +  if (ctxt-mode == X86EMUL_MODE_REAL)
 +  desc.dpl = 0;

 Can't happen.

 set_segment_selector is only called during task switches right now, so
 yes, this is impossible. Want me to drop the check? Or BUG()?
 
 BUG()s are dangerous in rarely used code since they can be exploited as
 a DoS.  So maybe a WARN_ON_ONCE().

Ok.

 
 @@ -2273,6 +2281,24 @@ static int load_state_from_tss32(struct 
 x86_emulate_ctxt *ctxt,
return emulate_gp(ctxt, 0);
ctxt-_eip = tss-eip;
ctxt-eflags = tss-eflags | 2;
 +
 +  /*
 +   * If we're switching between Protected Mode and VM86, we need to make
 +   * sure to update the mode before loading the segment descriptors so
 +   * that the selectors are interpreted correctly.
 +   *
 +   * Need to get it to the vcpu struct immediately because it influences
 +   * the CPL which is checked at least when loading the segment
 +   * descriptors and when pushing an error code to the new kernel stack.
 +   */
 +  if (ctxt-eflags  X86_EFLAGS_VM)
 +  ctxt-mode = X86EMUL_MODE_VM86;
 +  else
 +  ctxt-mode = X86EMUL_MODE_PROT32;
 +

 Shouldn't this be done after the set_segment_selector() block?  My
 interpretation of the SDM is that if a fault happens while loading
 descriptors the fault happens with old segment cache, that is, it needs
 to be interpreted according to the old mode.

 This is closely related to my argument with Gleb whether CPL changes
 when rflags is reloaded or if it only happens when cs is reloaded. I
 argued that it makes more sense to couple it with cs, so I guess I have
 to agree with you mostly.

 The SDM says that any faults during loading the segment descriptors
 happen in the context of the new task, and the task switch is completed
 before an exception is generated. So it shouldn't make any difference to
 the guest what the exact point is at which we change CPL, it's an KVM
 internal decision.

 So what about this order:

 1. Reload general purpose registers and eflags without updating mode
or writing back rflags to the vcpu struct.

 2. Load segment selectors without changing the DPL yet.
 
 (or changing anything in the segment cache)
 

 3. Load segment descriptors, and disable the privilege checks in
load_segment_descriptor using a new flag. 
 
 I don't like doing things that aren't directly traceable to the SDM. 
 Perhaps we should pass the cpl as a parameter to
 load_segment_descriptor().  Or we should -set_cpl() just before.

I like the idea of having a -set_cpl(). For SVM it should be trivial to
implement.

For VMX, I think instead of clearing VCPU_EXREG_CPL, vcpu_vmx-cpl
should directly be updated in set_rflags and set_segment (and in the new
set_cpl, obviously).

Would that be enough or would we have to avoid clearing it in all other
places as well? Where would it be initialised if it's not enough?

 For SVM, this updates
the CPL when cs is reloaded.
 
 
 

 4. Call ctxt-ops.set_rflag so that VMX updates the CPL. Should be
cleaned up in a follow-up patch, so that VMX uses set_segment
to update the CPL, like SVM does.

 Would this match your interpretation?
 
 Not exactly.  I claim that the cpl is a separate state from
 cs.cpl/ss.rpl/cr0.pe/eflags.vm - it cannot be derived just from those. 
 On the transition from real mode to protected mode, you can have (say)
 cs=0x13 and cpl=0.

But even with cs=0x13 cs.dpl would still be 0 (in the segment cache)
before cs is reloaded and you switch to a real protected mode segment,
right?

 Outside of mode switches, it's good to have set_segment() update the cpl
 automatically.  But inside mode switches it can screw up further checks
 or aborts.

Can you

Re: [PATCH v2 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-01-30 Thread Kevin Wolf
Am 30.01.2012 15:32, schrieb Avi Kivity:
 On 01/30/2012 04:01 PM, Kevin Wolf wrote:
 Am 30.01.2012 14:23, schrieb Avi Kivity:
 On 01/30/2012 01:05 PM, Kevin Wolf wrote:
 Am 30.01.2012 11:24, schrieb Avi Kivity:
 On 01/27/2012 09:23 PM, Kevin Wolf wrote:
 Task switches can switch between Protected Mode and VM86. The current
 mode must be updated during the task switch emulation so that the new
 segment selectors are interpreted correctly and privilege checks
 succeed.

 VMX code calculates the CPL from the code segment selector and rflags,
 so it needs rflags to be updated in the vcpu struct. SVM stores the DPL
 of the code segment instead, so we must be sure to give the right one
 when updating the selector.

 svm has vmcb_save_area::cpl; it's independent of CS.RPL.

 Right, but cpl in the VMCB is updated when you reload a segment (and I
 think only then), 

 Gah, it's broken.  It should be qualified by more - real mode should
 keep cpl at zero, vm86 at 3.  And setting rflags.vm86 should update cpl
 as well.

 For example, live migration while in real mode with cs3!=0 or in vm86
 mode with cs3==0 would set the wrong cpl.

 Ah, right, I didn't think of this case.

 Not sure if I should fix it in this series. 
 
 No need.  But we need to have a clear picture of where we're going so we
 move in the right direction.  And that direction is cpl decoupled from
 cs.rpl/ss.rpl/cr0.pe/eflags.vm (but changes in response to them, except
 cr0.pe).
 
 Note that we don't expose cpl as separate state for save/restore, so the
 kvm ABI follows vmx instead of svm (unfortunately).

Which means that restoring a VM that was in the middle of a RM - PM
mode switch will result in corrupted state. Probably not a huge problem
in practice, but can we fail the save?

 If we do the right fixes to
 the task switch, it won't be strictly needed for the bug I'm fixing.
 
 Right.
 

 3. Load segment descriptors, and disable the privilege checks in
load_segment_descriptor using a new flag. 

 I don't like doing things that aren't directly traceable to the SDM. 
 Perhaps we should pass the cpl as a parameter to
 load_segment_descriptor().  Or we should -set_cpl() just before.

 I like the idea of having a -set_cpl(). For SVM it should be trivial to
 implement.

 For VMX, I think instead of clearing VCPU_EXREG_CPL, vcpu_vmx-cpl
 should directly be updated in set_rflags and set_segment (and in the new
 set_cpl, obviously).
 
 vcpu_vmx::cpl is currently just a cache, when valid it reflects the
 state of cr0.pe/eflags.vm/cs.rpl.  It's not separate state as the VMCS
 has nowhere to hold it.  vmx cannot virtualize the sequence 'mov $3,
 %ss; mov $1, %eax; mov %eax, %cr0; nop' - we have
 emulate_invalid_guest_state for that, but it's not on by default.
 
 What you propose is converting vcpu_vmx::cpl from a cache to separate
 state, but that state can change after a vmexit.  It's not entirely
 clear when it's valid and when it isn't.

It's the only way I can imagine to get a set_cpl() callback. Do you have
another one?

That it's not entirely clear where it's valid and where it needs to be
set is exactly the problem I have with it. Probably even more than you
as I'm not very familiar with the KVM code.

 Would that be enough or would we have to avoid clearing it in all other
 places as well? Where would it be initialised if it's not enough?
 
 Maybe vmx_vcpu_reset().

Do all CPL changes go through set_cr0/segment/rflags/cpl? I guess yes,
so initialising on reset and keeping it valid all the time should be
possible indeed.

 For SVM, this updates
the CPL when cs is reloaded.




 4. Call ctxt-ops.set_rflag so that VMX updates the CPL. Should be
cleaned up in a follow-up patch, so that VMX uses set_segment
to update the CPL, like SVM does.

 Would this match your interpretation?

 Not exactly.  I claim that the cpl is a separate state from
 cs.cpl/ss.rpl/cr0.pe/eflags.vm - it cannot be derived just from those. 
 On the transition from real mode to protected mode, you can have (say)
 cs=0x13 and cpl=0.

 But even with cs=0x13 cs.dpl would still be 0 (in the segment cache)
 before cs is reloaded and you switch to a real protected mode segment,
 right?
 
 Right.  But cpl is defined as cs.rpl, not cs.dpl, and I think there are
 cases where cs.rpl != cs.dpl.

Sounds like conforming code segments? A part of protected mode magic
that I've never touched and never intended to... But using the RPL
instead makes sense in that context.

So SVM isn't only wrong in not considering real mode and VM86, but also
in calculating CPL from vmcb-save.cs.attrib rather than from the RPL in
the selector?

 Outside of mode switches, it's good to have set_segment() update the cpl
 automatically.  But inside mode switches it can screw up further checks
 or aborts.

 Can you give a specific example of how it screws things up?
 
 KVM_SET_REGS/KVM_SET_SREGS happen non-atomically, so you might have
 KVM_SET_SREGS raising the CPL to 3 only to lower it afterwards

Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Kevin Wolf
Am 25.01.2012 17:00, schrieb Joerg Roedel:
 On Tue, Jan 24, 2012 at 06:23:50PM +0200, Gleb Natapov wrote:
 On Tue, Jan 24, 2012 at 03:24:50PM +0100, Kevin Wolf wrote:
 
 However, task_switch_interception() itself does some more based on the
 value of reason, for example it decides whether or not to call
 skip_emulated_instruction().

 Joerg need to help us here. If intercept of task switch happens before
 rip is advanced past instruction that cause it we have to know somehow
 that task switch was caused by instruction. It is not enough that HW
 checks permission, we still lack essential info.
 
 Hmm, the RIP in the VMCB points to the instruction causing the task
 switch. This is also true for lcall and ljmp. But in my experiments I
 have seen exit_int_info.valid = 1 for task-switches that went through
 the IDT. But I havn't tested the VM86 case, though.
 
 Kevin, can you please re-verify that exit_int_info.valid is always 0 in
 your experiment? On what hardware have you tested this?

I just retried. I use kvm-kmod and kvm.git with HEAD at ff92e9b5 plus
the tree patches of this series plus a printk to output exit_int_info in
task_switch_intercept(). I ran taskswitch2 from kvm-unittests and got
two failures and my VM86 unit test which hung when trying to return from
VM86. I also ran the kernel that made me aware of the issue initially.
All debug messages show exit_int_info = 0.

This is the /proc/cpuinfo snippet for the first core:

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 107
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
stepping: 2
cpu MHz : 1800.000
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl extd_apicid pni cx16
lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv
bogomips: 3592.64
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Kevin Wolf
Am 27.01.2012 14:34, schrieb Joerg Roedel:
 On Fri, Jan 27, 2012 at 01:58:38PM +0100, Kevin Wolf wrote:
 Am 25.01.2012 17:00, schrieb Joerg Roedel:
 
 I just retried. I use kvm-kmod and kvm.git with HEAD at ff92e9b5 plus
 the tree patches of this series plus a printk to output exit_int_info in
 task_switch_intercept(). I ran taskswitch2 from kvm-unittests and got
 two failures and my VM86 unit test which hung when trying to return from
 VM86. I also ran the kernel that made me aware of the issue initially.
 All debug messages show exit_int_info = 0.
 
 Okay, you are testing on a K8 which has exactly this bug. As I just
 found out it is documented as erratum 701. The good news is that this
 only happens on K8 and Fam11h, any later AMD processor doesn't have this
 bug.

Meh. Unless you give me a newer processor, this doesn't really help
me... Doesn't look like there's any way to get a workaround, is there? I
guess I'll have to hack it locally and possibly break other guests with
the hacked module.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Kevin Wolf
Am 27.01.2012 15:17, schrieb Joerg Roedel:
 On Fri, Jan 27, 2012 at 02:55:12PM +0100, Kevin Wolf wrote:
 Am 27.01.2012 14:34, schrieb Joerg Roedel:
 On Fri, Jan 27, 2012 at 01:58:38PM +0100, Kevin Wolf wrote:
 Am 25.01.2012 17:00, schrieb Joerg Roedel:

 I just retried. I use kvm-kmod and kvm.git with HEAD at ff92e9b5 plus
 the tree patches of this series plus a printk to output exit_int_info in
 task_switch_intercept(). I ran taskswitch2 from kvm-unittests and got
 two failures and my VM86 unit test which hung when trying to return from
 VM86. I also ran the kernel that made me aware of the issue initially.
 All debug messages show exit_int_info = 0.

 Okay, you are testing on a K8 which has exactly this bug. As I just
 found out it is documented as erratum 701. The good news is that this
 only happens on K8 and Fam11h, any later AMD processor doesn't have this
 bug.

 Meh. Unless you give me a newer processor, this doesn't really help
 me... Doesn't look like there's any way to get a workaround, is there? I
 guess I'll have to hack it locally and possibly break other guests with
 the hacked module.
 
 No, unfortunatly there is no workaround for this problem. How do you
 plan to hack around it?

I know that my guest only uses iret and exceptions for task switches, so
I think in my case I can assume that any TASK_SWITCH_CALL is really a
TASK_SWITCH_GATE and I don't have to skip an instruction.

Not quite upstreamable, obviously.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/3] Fix task switches into/out of VM86

2012-01-27 Thread Kevin Wolf
I believe this should work with both VMX and SVM now. Gleb, Jörg, can one of
you test this with SVM? I did some testing on my buggy processor and it looks
as good as it gets, but it would be better if you could confirm.

Kevin Wolf (3):
  KVM: x86 emulator: Fix task switch privilege checks
  KVM: x86 emulator: VM86 segments must have DPL 3
  KVM: x86 emulator: Allow PM/VM86 switch during task switch

 arch/x86/include/asm/kvm_emulate.h |3 +-
 arch/x86/include/asm/kvm_host.h|4 +-
 arch/x86/kvm/emulate.c |   79 ---
 arch/x86/kvm/svm.c |5 ++-
 arch/x86/kvm/vmx.c |8 ++-
 arch/x86/kvm/x86.c |   12 -
 6 files changed, 94 insertions(+), 17 deletions(-)

-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-27 Thread Kevin Wolf
Currently, all task switches check privileges against the DPL of the
TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
the DPL of this take gate is used for the check instead. Exceptions,
external interrupts and iret shouldn't perform any check.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |2 +-
 arch/x86/include/asm/kvm_host.h|4 +-
 arch/x86/kvm/emulate.c |   51 +++-
 arch/x86/kvm/svm.c |5 +++-
 arch/x86/kvm/vmx.c |8 +++--
 arch/x86/kvm/x86.c |6 ++--
 6 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index ab4092e..c8a9cf3 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -372,7 +372,7 @@ bool x86_page_table_writing_insn(struct x86_emulate_ctxt 
*ctxt);
 #define EMULATION_INTERCEPTED 2
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
-u16 tss_selector, int reason,
+u16 tss_selector, int idt_index, int reason,
 bool has_error_code, u32 error_code);
 int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 52d6640..0533fc4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -741,8 +741,8 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
 
-int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
-   bool has_error_code, u32 error_code);
+int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
+   int reason, bool has_error_code, u32 error_code);
 
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..1b98a2c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1151,6 +1151,22 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
return 1;
 }
 
+static int read_interrupt_descriptor(struct x86_emulate_ctxt *ctxt,
+u16 index, struct kvm_desc_struct *desc)
+{
+   struct kvm_desc_ptr dt;
+   ulong addr;
+
+   ctxt-ops-get_idt(ctxt, dt);
+
+   if (dt.size  index * 8 + 7)
+   return emulate_gp(ctxt, index  3 | 0x2);
+
+   addr = dt.address + index * 8;
+   return ctxt-ops-read_std(ctxt, addr, desc, sizeof *desc,
+  ctxt-exception);
+}
+
 static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
 u16 selector, struct desc_ptr *dt)
 {
@@ -2350,7 +2366,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 }
 
 static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
-  u16 tss_selector, int reason,
+  u16 tss_selector, int idt_index, int reason,
   bool has_error_code, u32 error_code)
 {
struct x86_emulate_ops *ops = ctxt-ops;
@@ -2360,6 +2376,7 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
ulong old_tss_base =
ops-get_cached_segment_base(ctxt, VCPU_SREG_TR);
u32 desc_limit;
+   int dpl;
 
/* FIXME: old_tss_base == ~0 ? */
 
@@ -2372,12 +2389,32 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
 
/* FIXME: check that next_tss_desc is tss */
 
-   if (reason != TASK_SWITCH_IRET) {
-   if ((tss_selector  3)  next_tss_desc.dpl ||
-   ops-cpl(ctxt)  next_tss_desc.dpl)
-   return emulate_gp(ctxt, 0);
+   /*
+* Check privileges. The three cases are task switch caused by...
+*
+* 1. Software interrupt: Check against DPL of the task gate
+* 2. Exception/IRQ/iret: No check is performed
+* 3. jmp/call: Check agains DPL of the TSS
+*/
+   dpl = -1;
+   if (reason == TASK_SWITCH_GATE) {
+   if (idt_index != -1) {
+   struct kvm_desc_struct task_gate_desc;
+
+   ret = read_interrupt_descriptor(ctxt, idt_index,
+   task_gate_desc);
+   if (ret != X86EMUL_CONTINUE)
+   return ret;
+
+   dpl = task_gate_desc.dpl;
+   }
+   } else if (reason != TASK_SWITCH_IRET) {
+   dpl

[PATCH v2 2/3] KVM: x86 emulator: VM86 segments must have DPL 3

2012-01-27 Thread Kevin Wolf
Setting the segment DPL to 0 for at least the VM86 code segment makes
the VM entry fail on VMX.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/kvm/emulate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1b98a2c..833969e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1243,6 +1243,8 @@ static int load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
seg_desc.type = 3;
seg_desc.p = 1;
seg_desc.s = 1;
+   if (ctxt-mode == X86EMUL_MODE_VM86)
+   seg_desc.dpl = 3;
goto load;
}
 
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-01-27 Thread Kevin Wolf
Task switches can switch between Protected Mode and VM86. The current
mode must be updated during the task switch emulation so that the new
segment selectors are interpreted correctly and privilege checks
succeed.

VMX code calculates the CPL from the code segment selector and rflags,
so it needs rflags to be updated in the vcpu struct. SVM stores the DPL
of the code segment instead, so we must be sure to give the right one
when updating the selector.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   26 ++
 arch/x86/kvm/x86.c |6 ++
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index c8a9cf3..4a21c7d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -176,6 +176,7 @@ struct x86_emulate_ops {
void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
+   void (*set_rflags)(struct x86_emulate_ctxt *ctxt, ulong val);
int (*cpl)(struct x86_emulate_ctxt *ctxt);
int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 833969e..143ce8e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -552,6 +552,14 @@ static void set_segment_selector(struct x86_emulate_ctxt 
*ctxt, u16 selector,
struct desc_struct desc;
 
ctxt-ops-get_segment(ctxt, dummy, desc, base3, seg);
+
+   if (ctxt-mode == X86EMUL_MODE_REAL)
+   desc.dpl = 0;
+   else if (ctxt-mode == X86EMUL_MODE_VM86)
+   desc.dpl = 3;
+   else
+   desc.dpl = selector  0x3;
+
ctxt-ops-set_segment(ctxt, selector, desc, base3, seg);
 }
 
@@ -2273,6 +2281,24 @@ static int load_state_from_tss32(struct x86_emulate_ctxt 
*ctxt,
return emulate_gp(ctxt, 0);
ctxt-_eip = tss-eip;
ctxt-eflags = tss-eflags | 2;
+
+   /*
+* If we're switching between Protected Mode and VM86, we need to make
+* sure to update the mode before loading the segment descriptors so
+* that the selectors are interpreted correctly.
+*
+* Need to get it to the vcpu struct immediately because it influences
+* the CPL which is checked at least when loading the segment
+* descriptors and when pushing an error code to the new kernel stack.
+*/
+   if (ctxt-eflags  X86_EFLAGS_VM)
+   ctxt-mode = X86EMUL_MODE_VM86;
+   else
+   ctxt-mode = X86EMUL_MODE_PROT32;
+
+   ctxt-ops-set_rflags(ctxt, ctxt-eflags);
+
+   /* General purpose registers */
ctxt-regs[VCPU_REGS_RAX] = tss-eax;
ctxt-regs[VCPU_REGS_RCX] = tss-ecx;
ctxt-regs[VCPU_REGS_RDX] = tss-edx;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc3e945..502b5c3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4040,6 +4040,11 @@ static int emulator_set_cr(struct x86_emulate_ctxt 
*ctxt, int cr, ulong val)
return res;
 }
 
+static void emulator_set_rflags(struct x86_emulate_ctxt *ctxt, ulong val)
+{
+   kvm_set_rflags(emul_to_vcpu(ctxt), val);
+}
+
 static int emulator_get_cpl(struct x86_emulate_ctxt *ctxt)
 {
return kvm_x86_ops-get_cpl(emul_to_vcpu(ctxt));
@@ -4199,6 +4204,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_idt = emulator_set_idt,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
+   .set_rflags  = emulator_set_rflags,
.cpl = emulator_get_cpl,
.get_dr  = emulator_get_dr,
.set_dr  = emulator_set_dr,
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 3/4] Fix i386 build

2012-01-24 Thread Kevin Wolf
Am 24.01.2012 10:51, schrieb Takuya Yoshikawa:
 (2012/01/24 1:07), Kevin Wolf wrote:
 Commit 1d946e07 removed idt, but left a reference to idt in i386-only
 code.

 
 This is already fixed by
 
   commit b319491d278d4e85de7ea967982f7d416f4a44e4
   desc: fix build for i386

Whoops, my .git/config was still referring to the github URL. Thanks!

The other three patches seem to apply to master anyway.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-24 Thread Kevin Wolf
Am 24.01.2012 10:52, schrieb Gleb Natapov:
 On Mon, Jan 23, 2012 at 05:10:46PM +0100, Kevin Wolf wrote:
 Currently, all task switches check privileges against the DPL of the
 TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
 the DPL of this take gate is used for the check instead. Exceptions,
 external interrupts and iret shouldn't perform any check.

 This patch fixes the problem for VMX. For SVM, the logic used to
 determine the source of the task switch is buggy, so we can't pass
 useful information to the emulator there and just disable the check in
 all cases.

 Is this accurate description? Since on SVM you never get
 TASK_SWITCH_GATE reason the check is always done using TSS dpl, not
 disabled in all cases, correct?

Hm, right, they turn out as TASK_SWITCH_CALL, so we check against the
TSS. I'll fix the commit message if I need to do a v2.

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  arch/x86/include/asm/kvm_emulate.h |2 +-
  arch/x86/include/asm/kvm_host.h|4 +-
  arch/x86/kvm/emulate.c |   51 
 +++-
  arch/x86/kvm/svm.c |3 +-
  arch/x86/kvm/vmx.c |5 ++-
  arch/x86/kvm/x86.c |6 ++--
  6 files changed, 55 insertions(+), 16 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_emulate.h 
 b/arch/x86/include/asm/kvm_emulate.h
 index ab4092e..c8a9cf3 100644
 --- a/arch/x86/include/asm/kvm_emulate.h
 +++ b/arch/x86/include/asm/kvm_emulate.h
 @@ -372,7 +372,7 @@ bool x86_page_table_writing_insn(struct x86_emulate_ctxt 
 *ctxt);
  #define EMULATION_INTERCEPTED 2
  int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
  int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 - u16 tss_selector, int reason,
 + u16 tss_selector, int idt_index, int reason,
   bool has_error_code, u32 error_code);
  int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
  #endif /* _ASM_X86_KVM_X86_EMULATE_H */
 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index 52d6640..0533fc4 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -741,8 +741,8 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
  void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int 
 seg);
  int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int 
 seg);
  
 -int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
 -bool has_error_code, u32 error_code);
 +int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
 +int reason, bool has_error_code, u32 error_code);
  
  int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 05a562b..1b98a2c 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -1151,6 +1151,22 @@ static int pio_in_emulated(struct x86_emulate_ctxt 
 *ctxt,
  return 1;
  }
  
 +static int read_interrupt_descriptor(struct x86_emulate_ctxt *ctxt,
 + u16 index, struct kvm_desc_struct *desc)
 +{
 +struct kvm_desc_ptr dt;
 +ulong addr;
 +
 +ctxt-ops-get_idt(ctxt, dt);
 +
 +if (dt.size  index * 8 + 7)
 +return emulate_gp(ctxt, index  3 | 0x2);
 +
 +addr = dt.address + index * 8;
 +return ctxt-ops-read_std(ctxt, addr, desc, sizeof *desc,
 +   ctxt-exception);
 +}
 +
  static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
   u16 selector, struct desc_ptr *dt)
  {
 @@ -2350,7 +2366,7 @@ static int task_switch_32(struct x86_emulate_ctxt 
 *ctxt,
  }
  
  static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
 -   u16 tss_selector, int reason,
 +   u16 tss_selector, int idt_index, int reason,
 bool has_error_code, u32 error_code)
  {
  struct x86_emulate_ops *ops = ctxt-ops;
 @@ -2360,6 +2376,7 @@ static int emulator_do_task_switch(struct 
 x86_emulate_ctxt *ctxt,
  ulong old_tss_base =
  ops-get_cached_segment_base(ctxt, VCPU_SREG_TR);
  u32 desc_limit;
 +int dpl;
  
  /* FIXME: old_tss_base == ~0 ? */
  
 @@ -2372,12 +2389,32 @@ static int emulator_do_task_switch(struct 
 x86_emulate_ctxt *ctxt,
  
  /* FIXME: check that next_tss_desc is tss */
  
 -if (reason != TASK_SWITCH_IRET) {
 -if ((tss_selector  3)  next_tss_desc.dpl ||
 -ops-cpl(ctxt)  next_tss_desc.dpl)
 -return emulate_gp(ctxt, 0);
 +/*
 + * Check privileges. The three cases are task switch caused by...
 + *
 + * 1. Software interrupt: Check against DPL of the task gate
 + * 2. Exception/IRQ/iret: No check is performed

Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86

2012-01-24 Thread Kevin Wolf
Am 23.01.2012 17:42, schrieb Gleb Natapov:
 On Mon, Jan 23, 2012 at 05:32:59PM +0100, Kevin Wolf wrote:
 Am 23.01.2012 17:22, schrieb Gleb Natapov:
 On Mon, Jan 23, 2012 at 05:20:22PM +0100, Kevin Wolf wrote:
 Am 23.01.2012 17:10, schrieb Gleb Natapov:
 On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
 This adds a test case that jumps into VM86 by iret-ing to a TSS and back
 to Protected Mode using a task gate in the IDT.

 Can you add the test case to taskswitch2.c?

 Running one test to check all aspects of taskswitch emulation.

 (We all know that top-posting is disliked, but middle-posting looks even
 crazier!)

 Inserting replies 

Very true!

 at random places is a new cool thing!
 
 Does having one test provide any value in and of itself? It's just an
 implementation detail of the test suite. When testing the KVM patches I
 ran all three test cases with './run_tests.sh -g task', which is
 hopefully easy enough.

 I think it does. I do not have to use external script to combine tests
 on the same topic or even remember that such script exists. We do not
 create separate tests to test each instruction emulation either. And I
 usually run qemu not on the same machine I compile it on, so I need
 special tricks to make those test script work. Of course if putting this
 code into existing test file is hard separate test is OK, but is this
 really the case here?

I haven't really checked whether they interfere. I guess I would have to
move the GDT indexes for my manually created TSSes and I would have to
hope that nobody else needs the memory I'm overwriting with the real
mode code (there doesn't seem to be memory management for  1 MB).

Should taskswitch.c and taskswitch2.c be merged as well then? Or is
there a reason why they must stay separate? One file or three files
makes sense to me for three tests, but two not so much.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-24 Thread Kevin Wolf
Am 24.01.2012 11:17, schrieb Gleb Natapov:
 On Tue, Jan 24, 2012 at 11:09:09AM +0100, Kevin Wolf wrote:
 +  } else if (reason != TASK_SWITCH_IRET) {
 +  dpl = next_tss_desc.dpl;
}
 No need parentheses around one statement.

 Documentation/CodingStyle says:

 This does not apply if only one branch of a conditional statement is a
 single statement; in the latter case use braces in both branches:

 Then you need to put parentheses around if (reason != TASK_SWITCH_IRET)
 if you want to follow the letter of the CodingStyle :)

Not sure what you mean. If it is 'else { if (...) { ... then no, the
document isn't crazy like that.

 But I do not see this coding stile part widely used in core kernel code:
 $ git grep } else$ kernel | wc -l
 122
 
 Can't think of re to check when the rule is followed :(

Seem to be at least 77 occurences (git grep -A 2 } else { into a file
as git grep doesn't seem to do multi-line expressions and then on that
file } else {\n.*\n.*}$)

But anyway, I don't really want to discuss the right coding style here
and I'll apply whatever is considered right. Though if you think that
checkpatch.pl and Documentation/CodingStyle are both wrong, please get
them fixed. People might take them serious.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-24 Thread Kevin Wolf
Am 24.01.2012 11:52, schrieb Gleb Natapov:
 On Tue, Jan 24, 2012 at 11:38:24AM +0100, Kevin Wolf wrote:
 Am 24.01.2012 11:17, schrieb Gleb Natapov:
 On Tue, Jan 24, 2012 at 11:09:09AM +0100, Kevin Wolf wrote:
 +} else if (reason != TASK_SWITCH_IRET) {
 +dpl = next_tss_desc.dpl;
  }
 No need parentheses around one statement.

 Documentation/CodingStyle says:

 This does not apply if only one branch of a conditional statement is a
 single statement; in the latter case use braces in both branches:

 Then you need to put parentheses around if (reason != TASK_SWITCH_IRET)
 if you want to follow the letter of the CodingStyle :)

 Not sure what you mean. If it is 'else { if (...) { ... then no, the
 document isn't crazy like that.

 The document says:
 
 if (condition) {
 do_this();
 do_that();
 } else {
 otherwise();
 }
 
 So I do not see how you can interpret it otherwise.

Well, I just read more than only one paragraph. It further says:

Note that the closing brace is empty on a line of its own, _except_ in
the cases where it is followed by a continuation of the same statement,
ie a while in a do-statement or an else in an if-statement, like
this:

if (x == y) {
..
} else if (x  y) {
...
} else {

}


 
 But I do not see this coding stile part widely used in core kernel code:
 $ git grep } else$ kernel | wc -l
 122

 Can't think of re to check when the rule is followed :(

 Seem to be at least 77 occurences (git grep -A 2 } else { into a file
 as git grep doesn't seem to do multi-line expressions and then on that
 file } else {\n.*\n.*}$)

 But anyway, I don't really want to discuss the right coding style here
 and I'll apply whatever is considered right. Though if you think that
 checkpatch.pl and Documentation/CodingStyle are both wrong, please get
 them fixed. People might take them serious.

 The code looked strange for kernel code. If checkpatch.pl complains about
 missing parentheses then they should of course stay. Our interpretation
 of CodingStyle is different.

checkpatch.pl accepts both ways, and as the counts above show they are
both used in core kernel code. If Avi or Marcelo want to have it changed
anyway, I'll resend.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-01-24 Thread Kevin Wolf
Am 24.01.2012 11:57, schrieb Gleb Natapov:
 On Mon, Jan 23, 2012 at 05:10:48PM +0100, Kevin Wolf wrote:
 Task switches can switch between Protected Mode and VM86. The current
 mode must be updated during the task switch emulation so that the new
 segment selectors are interpreted correctly and privilege checks
 succeed.

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  arch/x86/include/asm/kvm_emulate.h |1 +
  arch/x86/kvm/emulate.c |   17 +
  arch/x86/kvm/x86.c |6 ++
  3 files changed, 24 insertions(+), 0 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_emulate.h 
 b/arch/x86/include/asm/kvm_emulate.h
 index c8a9cf3..4a21c7d 100644
 --- a/arch/x86/include/asm/kvm_emulate.h
 +++ b/arch/x86/include/asm/kvm_emulate.h
 @@ -176,6 +176,7 @@ struct x86_emulate_ops {
  void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
  ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
  int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
 +void (*set_rflags)(struct x86_emulate_ctxt *ctxt, ulong val);
  int (*cpl)(struct x86_emulate_ctxt *ctxt);
  int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
  int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 833969e..52fce89 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -2273,6 +2273,23 @@ static int load_state_from_tss32(struct 
 x86_emulate_ctxt *ctxt,
  return emulate_gp(ctxt, 0);
  ctxt-_eip = tss-eip;
  ctxt-eflags = tss-eflags | 2;
 +
 +/*
 + * If we're switching between Protected Mode and VM86, we need to make
 + * sure to update the mode before loading the segment descriptors so
 + * that the selectors are interpreted correctly.
 + *
 + * Need to get it to the vcpu struct immediately because it influences
 + * the CPL which is checked when loading the segment descriptors.
 This is true only for VMX. SVM does not look at rflags. May be instead
 of adding new x86_emulate_ops callback we need to get rid of get_cpl()
 one and implement CPL checking using emulate.c:get_segment_selector().

The selector isn't enough for VM86. In most cases
ctxt-ops-get_segment() would work (assuming that the dpl field is
valid there), but it doesn't in task switches before switching the code
segment, which already needs to have the new CPL applied to succeed.

So the only other way I could think of is to pass a flag to
load_segment_descriptor() which says that it shouldn't check any privileges.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-01-24 Thread Kevin Wolf
Am 24.01.2012 12:37, schrieb Gleb Natapov:
 On Tue, Jan 24, 2012 at 12:31:48PM +0100, Kevin Wolf wrote:
 Am 24.01.2012 11:57, schrieb Gleb Natapov:
 On Mon, Jan 23, 2012 at 05:10:48PM +0100, Kevin Wolf wrote:
 Task switches can switch between Protected Mode and VM86. The current
 mode must be updated during the task switch emulation so that the new
 segment selectors are interpreted correctly and privilege checks
 succeed.

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  arch/x86/include/asm/kvm_emulate.h |1 +
  arch/x86/kvm/emulate.c |   17 +
  arch/x86/kvm/x86.c |6 ++
  3 files changed, 24 insertions(+), 0 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_emulate.h 
 b/arch/x86/include/asm/kvm_emulate.h
 index c8a9cf3..4a21c7d 100644
 --- a/arch/x86/include/asm/kvm_emulate.h
 +++ b/arch/x86/include/asm/kvm_emulate.h
 @@ -176,6 +176,7 @@ struct x86_emulate_ops {
void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
 +  void (*set_rflags)(struct x86_emulate_ctxt *ctxt, ulong val);
int (*cpl)(struct x86_emulate_ctxt *ctxt);
int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 833969e..52fce89 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -2273,6 +2273,23 @@ static int load_state_from_tss32(struct 
 x86_emulate_ctxt *ctxt,
return emulate_gp(ctxt, 0);
ctxt-_eip = tss-eip;
ctxt-eflags = tss-eflags | 2;
 +
 +  /*
 +   * If we're switching between Protected Mode and VM86, we need to make
 +   * sure to update the mode before loading the segment descriptors so
 +   * that the selectors are interpreted correctly.
 +   *
 +   * Need to get it to the vcpu struct immediately because it influences
 +   * the CPL which is checked when loading the segment descriptors.
 This is true only for VMX. SVM does not look at rflags. May be instead
 of adding new x86_emulate_ops callback we need to get rid of get_cpl()
 one and implement CPL checking using emulate.c:get_segment_selector().

 The selector isn't enough for VM86. In most cases
 ctxt-ops-get_segment() would work (assuming that the dpl field is
 valid there), but it doesn't in task switches before switching the code
 segment, which already needs to have the new CPL applied to succeed.

 So the only other way I could think of is to pass a flag to
 load_segment_descriptor() which says that it shouldn't check any privileges.

 What I mean it to have similar logic to what we have in
 __vmx_get_cpl(). Something like this:
 
 static int get_cpl(struct x86_emulate_ctxt *ctxt)
 {
   if (ctxt-mode == X86EMUL_MODE_REAL)
 return 0;
 
 if (ctxt-mode != X86EMUL_MODE_PROT64
  (ctx-eflags  EFLG_VM)) /* if virtual 8086 */
 return 3;
 
 return get_segment_selector(ctx, VCPU_SREG_CS)  3;
 }

Ah, you mean just using ctxt-rflags instead of calling out into x86.c
to update it before the call. Yes, I think that would work.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-24 Thread Kevin Wolf
Am 24.01.2012 15:03, schrieb Joerg Roedel:
 On Mon, Jan 23, 2012 at 05:10:46PM +0100, Kevin Wolf wrote:
 This patch fixes the problem for VMX. For SVM, the logic used to
 determine the source of the task switch is buggy, so we can't pass
 useful information to the emulator there and just disable the check in
 all cases.
 
 Actually, SVM isn't buggy :) For SVM you do not need to do any
 priviledge checks in software because the hardware already takes care of
 that.
 In other words, KVM only gets a task-switch intercept if the priviledges
 are all checked and correct.

Okay, that's good to hear. The current code is still buggy because as
Gleb noted it checks against the TSS DPL. We need to disable that check
for SVM then. Also all checks for TASK_SWITCH_GATE indicate that
something is wrong because it will never happen.

Are you going to rewrite task_switch_interception() on top of this series?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-24 Thread Kevin Wolf
Am 24.01.2012 15:16, schrieb Gleb Natapov:
 On Tue, Jan 24, 2012 at 03:15:13PM +0100, Kevin Wolf wrote:
 Am 24.01.2012 15:03, schrieb Joerg Roedel:
 On Mon, Jan 23, 2012 at 05:10:46PM +0100, Kevin Wolf wrote:
 This patch fixes the problem for VMX. For SVM, the logic used to
 determine the source of the task switch is buggy, so we can't pass
 useful information to the emulator there and just disable the check in
 all cases.

 Actually, SVM isn't buggy :) For SVM you do not need to do any
 priviledge checks in software because the hardware already takes care of
 that.
 In other words, KVM only gets a task-switch intercept if the priviledges
 are all checked and correct.

 Okay, that's good to hear. The current code is still buggy because as
 Gleb noted it checks against the TSS DPL. We need to disable that check
 for SVM then. Also all checks for TASK_SWITCH_GATE indicate that
 something is wrong because it will never happen.

 Not necessary. Currently all checks for TASK_SWITCH_GATE also check for
 TASK_SWITCH_CALL, so I think you can fix SVM case in your patch by
 passing TASK_SWITCH_GATE instead of TASK_SWITCH_CALL to
 kvm_task_switch().

Yes, the emulator itself would be fixed by passing TASK_SWITCH_GATE and
idt_index = -1 (although it looks a bit brittle).

However, task_switch_interception() itself does some more based on the
value of reason, for example it decides whether or not to call
skip_emulated_instruction().

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for Tuesday 24

2012-01-24 Thread Kevin Wolf
Am 24.01.2012 15:08, schrieb Anthony Liguori:
 On 01/24/2012 08:03 AM, Paolo Bonzini wrote:
 On 01/24/2012 02:57 PM, Anthony Liguori wrote:
 Please send in any agenda items you are interested in covering.

 I don't have anything pressing. I vote to cancel the call.

 Nothing that cannot be discussed by email, but anyway here are a couple of 
 topics:
 
 Ok.
 

 * qtest/libos: Python or C?
 
 Both.

More importantly: When? :-)

Are there still any problems that must be fixed before it can be merged?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 0/4] VM86 testcase and run_tests.sh

2012-01-23 Thread Kevin Wolf
This adds a test case for taskswitches into/out of VM86. This test case
currently fails on KVM, it passes with TCG. I'll send out KVM fixes together
with this series.

I also included a small shell script that just runs tests and prints a
PASS/FAIL message for each. I've been using this script locally for a while,
but maybe someone else finds it handy, too.

Kevin Wolf (4):
  Add run_tests.sh
  Add taskswitch testcases to unittest.cfg
  Fix i386 build
  x86/taskswitch_vm86: Task switches into/out of VM86

 config-i386.mak   |3 +-
 lib/x86/desc.c|   39 +-
 lib/x86/desc.h|   36 
 lib/x86/vm.c  |4 +-
 lib/x86/vm.h  |1 +
 run_tests.sh  |  107 +
 x86/taskswitch_vm86.c |   59 +++
 x86/unittests.cfg |   18 
 8 files changed, 227 insertions(+), 40 deletions(-)
 create mode 100755 run_tests.sh
 create mode 100644 x86/taskswitch_vm86.c

-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 1/4] Add run_tests.sh

2012-01-23 Thread Kevin Wolf
This adds a convenient way to run all tests without having to set up
Autotest.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 run_tests.sh |  107 ++
 1 files changed, 107 insertions(+), 0 deletions(-)
 create mode 100755 run_tests.sh

diff --git a/run_tests.sh b/run_tests.sh
new file mode 100755
index 000..8d152b0
--- /dev/null
+++ b/run_tests.sh
@@ -0,0 +1,107 @@
+#!/bin/bash
+
+testroot=x86
+config=$testroot/unittests.cfg
+qemu=${qemu:-qemu-system-x86_64}
+verbose=0
+
+function run()
+{
+local testname=$1
+local groups=$2
+local smp=$3
+local kernel=$4
+local opts=$5
+
+if [ -z $testname ]; then
+return
+fi
+
+if [ -n $only_group ]  ! grep -q $only_group $groups; then
+return
+fi
+
+cmdline=$qemu -display none -enable-kvm -device testdev,chardev=testlog 
-chardev stdio,id=testlog -kernel $kernel -smp $smp $opts
+if [ $verbose != 0 ]; then
+echo $cmdline
+fi
+
+# extra_params in the config file may contain backticks that need to be
+# expanded, so use eval to start qemu
+eval $cmdline  test.log
+
+if [ $? == 0 ]; then
+echo PASS $1
+else
+echo FAIL $1
+fi
+}
+
+function run_all()
+{
+local config=$1
+local testname
+local smp
+local kernel
+local opts
+local groups
+
+exec {config_fd}$config
+
+while read -u $config_fd line; do
+if [[ $line =~ ^\[(.*)\]$ ]]; then
+run $testname $groups $smp $kernel $opts
+testname=${BASH_REMATCH[1]}
+smp=1
+kernel=
+opts=
+groups=
+elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then
+kernel=$testroot/${BASH_REMATCH[1]}
+elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
+smp=${BASH_REMATCH[1]}
+elif [[ $line =~ ^extra_params\ *=\ *(.*)$ ]]; then
+opts=${BASH_REMATCH[1]}
+elif [[ $line =~ ^groups\ *=\ *(.*)$ ]]; then
+groups=${BASH_REMATCH[1]}
+fi
+done
+
+run $testname $groups $smp $kernel $opts
+
+exec {config_fd}-
+}
+
+function usage()
+{
+cat EOF
+
+Usage: $0 [-g group] [-h] [-v]
+
+-g: Only execute tests in the given group
+-h: Output this help text
+-v: Enables verbose mode
+
+EOF
+}
+
+echo  test.log
+while getopts g:hv opt; do
+case $opt in
+g)
+only_group=$OPTARG
+;;
+h)
+usage
+exit
+;;
+v)
+verbose=1
+;;
+*)
+exit
+;;
+esac
+done
+
+run_all $config
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 2/4] Add taskswitch testcases to unittest.cfg

2012-01-23 Thread Kevin Wolf
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 x86/unittests.cfg |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 065020a..dac7d44 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -64,6 +64,18 @@ file = svm.flat
 smp = 2
 extra_params = -cpu qemu64,-svm
 
+[taskswitch]
+file = taskswitch.flat
+smp = 2
+extra_params = -cpu qemu64,-svm
+groups = task
+
+[taskswitch2]
+file = taskswitch2.flat
+smp = 2
+extra_params = -cpu qemu64,-svm
+groups = task
+
 [kvmclock_test]
 file = kvmclock_test.flat
 smp = 2
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 3/4] Fix i386 build

2012-01-23 Thread Kevin Wolf
Commit 1d946e07 removed idt, but left a reference to idt in i386-only
code.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 lib/x86/desc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index c268955..770c250 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -329,7 +329,7 @@ void setup_gdt(void)
 
 static void set_idt_task_gate(int vec, u16 sel)
 {
-idt_entry_t *e = idt[vec];
+idt_entry_t *e = boot_idt[vec];
 
 memset(e, 0, sizeof *e);
 
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86

2012-01-23 Thread Kevin Wolf
This adds a test case that jumps into VM86 by iret-ing to a TSS and back
to Protected Mode using a task gate in the IDT.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 config-i386.mak   |3 +-
 lib/x86/desc.c|   37 +-
 lib/x86/desc.h|   36 +
 lib/x86/vm.c  |4 +-
 lib/x86/vm.h  |1 +
 x86/taskswitch_vm86.c |   59 +
 x86/unittests.cfg |6 +
 7 files changed, 107 insertions(+), 39 deletions(-)
 create mode 100644 x86/taskswitch_vm86.c

diff --git a/config-i386.mak b/config-i386.mak
index de52f3d..b5c3b9c 100644
--- a/config-i386.mak
+++ b/config-i386.mak
@@ -5,9 +5,10 @@ ldarch = elf32-i386
 CFLAGS += -D__i386__
 CFLAGS += -I $(KERNELDIR)/include
 
-tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat
+tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat 
$(TEST_DIR)/taskswitch_vm86.flat
 
 include config-x86-common.mak
 
 $(TEST_DIR)/taskswitch.elf: $(cstart.o) $(TEST_DIR)/taskswitch.o
 $(TEST_DIR)/taskswitch2.elf: $(cstart.o) $(TEST_DIR)/taskswitch2.o
+$(TEST_DIR)/taskswitch_vm86.elf: $(cstart.o) $(TEST_DIR)/taskswitch_vm86.o
diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index 770c250..c4a3607 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -27,41 +27,6 @@ typedef struct {
u8 base_high;
 } gdt_entry_t;
 
-typedef struct {
-   u16 prev;
-   u16 res1;
-   u32 esp0;
-   u16 ss0;
-   u16 res2;
-   u32 esp1;
-   u16 ss1;
-   u16 res3;
-   u32 esp2;
-   u16 ss2;
-   u16 res4;
-   u32 cr3;
-   u32 eip;
-   u32 eflags;
-   u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
-   u16 es;
-   u16 res5;
-   u16 cs;
-   u16 res6;
-   u16 ss;
-   u16 res7;
-   u16 ds;
-   u16 res8;
-   u16 fs;
-   u16 res9;
-   u16 gs;
-   u16 res10;
-   u16 ldt;
-   u16 res11;
-   u16 t:1;
-   u16 res12:15;
-   u16 iomap_base;
-} tss32_t;
-
 extern idt_entry_t boot_idt[256];
 
 void set_idt_entry(int vec, void *addr, int dpl)
@@ -327,7 +292,7 @@ void setup_gdt(void)
  .Lflush2: ::r(0x10));
 }
 
-static void set_idt_task_gate(int vec, u16 sel)
+void set_idt_task_gate(int vec, u16 sel)
 {
 idt_entry_t *e = boot_idt[vec];
 
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index 0b4897c..f819452 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -24,6 +24,41 @@ struct ex_regs {
 unsigned long rflags;
 };
 
+typedef struct {
+   u16 prev;
+   u16 res1;
+   u32 esp0;
+   u16 ss0;
+   u16 res2;
+   u32 esp1;
+   u16 ss1;
+   u16 res3;
+   u32 esp2;
+   u16 ss2;
+   u16 res4;
+   u32 cr3;
+   u32 eip;
+   u32 eflags;
+   u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
+   u16 es;
+   u16 res5;
+   u16 cs;
+   u16 res6;
+   u16 ss;
+   u16 res7;
+   u16 ds;
+   u16 res8;
+   u16 fs;
+   u16 res9;
+   u16 gs;
+   u16 res10;
+   u16 ldt;
+   u16 res11;
+   u16 t:1;
+   u16 res12:15;
+   u16 iomap_base;
+} tss32_t;
+
 #define ASM_TRY(catch)  \
 movl $0, %%gs:4 \n\t  \
 .pushsection .data.ex \n\t\
@@ -44,6 +79,7 @@ unsigned exception_error_code(void);
 void set_idt_entry(int vec, void *addr, int dpl);
 void set_idt_sel(int vec, u16 sel);
 void set_gdt_entry(int num, u32 base,  u32 limit, u8 access, u8 gran);
+void set_idt_task_gate(int vec, u16 sel);
 void set_intr_task_gate(int e, void *fn);
 void print_current_tss_info(void);
 void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
diff --git a/lib/x86/vm.c b/lib/x86/vm.c
index abbb0c9..aae044a 100644
--- a/lib/x86/vm.c
+++ b/lib/x86/vm.c
@@ -108,14 +108,14 @@ void install_large_page(unsigned long *cr3,
   unsigned long phys,
   void *virt)
 {
-install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_PSE, 0);
+install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER | 
PTE_PSE, 0);
 }
 
 void install_page(unsigned long *cr3,
   unsigned long phys,
   void *virt)
 {
-install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE, 0);
+install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER, 0);
 }
 
 
diff --git a/lib/x86/vm.h b/lib/x86/vm.h
index bf8fd52..aebc5c3 100644
--- a/lib/x86/vm.h
+++ b/lib/x86/vm.h
@@ -13,6 +13,7 @@
 #define PTE_PRESENT (1ull  0)
 #define PTE_PSE (1ull  7)
 #define PTE_WRITE   (1ull  1)
+#define PTE_USER(1ull  2)
 #define PTE_ADDR(0xff000ull)
 
 void setup_vm();
diff --git a/x86/taskswitch_vm86.c b/x86/taskswitch_vm86.c
new file mode 100644
index 000..363cb00
--- /dev/null
+++ b/x86/taskswitch_vm86.c
@@ -0,0 +1,59 @@
+#include libcflat.h
+#include desc.h
+#include

[PATCH 1/3] KVM: x86 emulator: Fix task switch privilege checks

2012-01-23 Thread Kevin Wolf
Currently, all task switches check privileges against the DPL of the
TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
the DPL of this take gate is used for the check instead. Exceptions,
external interrupts and iret shouldn't perform any check.

This patch fixes the problem for VMX. For SVM, the logic used to
determine the source of the task switch is buggy, so we can't pass
useful information to the emulator there and just disable the check in
all cases.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |2 +-
 arch/x86/include/asm/kvm_host.h|4 +-
 arch/x86/kvm/emulate.c |   51 +++-
 arch/x86/kvm/svm.c |3 +-
 arch/x86/kvm/vmx.c |5 ++-
 arch/x86/kvm/x86.c |6 ++--
 6 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index ab4092e..c8a9cf3 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -372,7 +372,7 @@ bool x86_page_table_writing_insn(struct x86_emulate_ctxt 
*ctxt);
 #define EMULATION_INTERCEPTED 2
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
-u16 tss_selector, int reason,
+u16 tss_selector, int idt_index, int reason,
 bool has_error_code, u32 error_code);
 int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 52d6640..0533fc4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -741,8 +741,8 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
 
-int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
-   bool has_error_code, u32 error_code);
+int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
+   int reason, bool has_error_code, u32 error_code);
 
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 05a562b..1b98a2c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1151,6 +1151,22 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
return 1;
 }
 
+static int read_interrupt_descriptor(struct x86_emulate_ctxt *ctxt,
+u16 index, struct kvm_desc_struct *desc)
+{
+   struct kvm_desc_ptr dt;
+   ulong addr;
+
+   ctxt-ops-get_idt(ctxt, dt);
+
+   if (dt.size  index * 8 + 7)
+   return emulate_gp(ctxt, index  3 | 0x2);
+
+   addr = dt.address + index * 8;
+   return ctxt-ops-read_std(ctxt, addr, desc, sizeof *desc,
+  ctxt-exception);
+}
+
 static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
 u16 selector, struct desc_ptr *dt)
 {
@@ -2350,7 +2366,7 @@ static int task_switch_32(struct x86_emulate_ctxt *ctxt,
 }
 
 static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
-  u16 tss_selector, int reason,
+  u16 tss_selector, int idt_index, int reason,
   bool has_error_code, u32 error_code)
 {
struct x86_emulate_ops *ops = ctxt-ops;
@@ -2360,6 +2376,7 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
ulong old_tss_base =
ops-get_cached_segment_base(ctxt, VCPU_SREG_TR);
u32 desc_limit;
+   int dpl;
 
/* FIXME: old_tss_base == ~0 ? */
 
@@ -2372,12 +2389,32 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
 
/* FIXME: check that next_tss_desc is tss */
 
-   if (reason != TASK_SWITCH_IRET) {
-   if ((tss_selector  3)  next_tss_desc.dpl ||
-   ops-cpl(ctxt)  next_tss_desc.dpl)
-   return emulate_gp(ctxt, 0);
+   /*
+* Check privileges. The three cases are task switch caused by...
+*
+* 1. Software interrupt: Check against DPL of the task gate
+* 2. Exception/IRQ/iret: No check is performed
+* 3. jmp/call: Check agains DPL of the TSS
+*/
+   dpl = -1;
+   if (reason == TASK_SWITCH_GATE) {
+   if (idt_index != -1) {
+   struct kvm_desc_struct task_gate_desc;
+
+   ret = read_interrupt_descriptor(ctxt, idt_index,
+   task_gate_desc

[PATCH 2/3] KVM: x86 emulator: VM86 segments must have DPL 3

2012-01-23 Thread Kevin Wolf
Setting the segment DPL to 0 for at least the VM86 code segment makes
the VM entry fail on VMX.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/kvm/emulate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1b98a2c..833969e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1243,6 +1243,8 @@ static int load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
seg_desc.type = 3;
seg_desc.p = 1;
seg_desc.s = 1;
+   if (ctxt-mode == X86EMUL_MODE_VM86)
+   seg_desc.dpl = 3;
goto load;
}
 
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch

2012-01-23 Thread Kevin Wolf
Task switches can switch between Protected Mode and VM86. The current
mode must be updated during the task switch emulation so that the new
segment selectors are interpreted correctly and privilege checks
succeed.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   17 +
 arch/x86/kvm/x86.c |6 ++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index c8a9cf3..4a21c7d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -176,6 +176,7 @@ struct x86_emulate_ops {
void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
+   void (*set_rflags)(struct x86_emulate_ctxt *ctxt, ulong val);
int (*cpl)(struct x86_emulate_ctxt *ctxt);
int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 833969e..52fce89 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2273,6 +2273,23 @@ static int load_state_from_tss32(struct x86_emulate_ctxt 
*ctxt,
return emulate_gp(ctxt, 0);
ctxt-_eip = tss-eip;
ctxt-eflags = tss-eflags | 2;
+
+   /*
+* If we're switching between Protected Mode and VM86, we need to make
+* sure to update the mode before loading the segment descriptors so
+* that the selectors are interpreted correctly.
+*
+* Need to get it to the vcpu struct immediately because it influences
+* the CPL which is checked when loading the segment descriptors.
+*/
+   if (ctxt-eflags  X86_EFLAGS_VM)
+   ctxt-mode = X86EMUL_MODE_VM86;
+   else
+   ctxt-mode = X86EMUL_MODE_PROT32;
+
+   ctxt-ops-set_rflags(ctxt, ctxt-eflags);
+
+   /* General purpose registers */
ctxt-regs[VCPU_REGS_RAX] = tss-eax;
ctxt-regs[VCPU_REGS_RCX] = tss-ecx;
ctxt-regs[VCPU_REGS_RDX] = tss-edx;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc3e945..502b5c3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4040,6 +4040,11 @@ static int emulator_set_cr(struct x86_emulate_ctxt 
*ctxt, int cr, ulong val)
return res;
 }
 
+static void emulator_set_rflags(struct x86_emulate_ctxt *ctxt, ulong val)
+{
+   kvm_set_rflags(emul_to_vcpu(ctxt), val);
+}
+
 static int emulator_get_cpl(struct x86_emulate_ctxt *ctxt)
 {
return kvm_x86_ops-get_cpl(emul_to_vcpu(ctxt));
@@ -4199,6 +4204,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_idt = emulator_set_idt,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
+   .set_rflags  = emulator_set_rflags,
.cpl = emulator_get_cpl,
.get_dr  = emulator_get_dr,
.set_dr  = emulator_set_dr,
-- 
1.7.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86

2012-01-23 Thread Kevin Wolf
Am 23.01.2012 17:10, schrieb Gleb Natapov:
 On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
 This adds a test case that jumps into VM86 by iret-ing to a TSS and back
 to Protected Mode using a task gate in the IDT.

 Can you add the test case to taskswitch2.c?

That's actually what I intended to do at first, but there's nothing to
share and having a clean environment that can't interfere with other
tests feels nicer.

What would we gain from merging the files?

Kevin


 
 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  config-i386.mak   |3 +-
  lib/x86/desc.c|   37 +-
  lib/x86/desc.h|   36 +
  lib/x86/vm.c  |4 +-
  lib/x86/vm.h  |1 +
  x86/taskswitch_vm86.c |   59 
 +
  x86/unittests.cfg |6 +
  7 files changed, 107 insertions(+), 39 deletions(-)
  create mode 100644 x86/taskswitch_vm86.c

 diff --git a/config-i386.mak b/config-i386.mak
 index de52f3d..b5c3b9c 100644
 --- a/config-i386.mak
 +++ b/config-i386.mak
 @@ -5,9 +5,10 @@ ldarch = elf32-i386
  CFLAGS += -D__i386__
  CFLAGS += -I $(KERNELDIR)/include
  
 -tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat
 +tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat 
 $(TEST_DIR)/taskswitch_vm86.flat
  
  include config-x86-common.mak
  
  $(TEST_DIR)/taskswitch.elf: $(cstart.o) $(TEST_DIR)/taskswitch.o
  $(TEST_DIR)/taskswitch2.elf: $(cstart.o) $(TEST_DIR)/taskswitch2.o
 +$(TEST_DIR)/taskswitch_vm86.elf: $(cstart.o) $(TEST_DIR)/taskswitch_vm86.o
 diff --git a/lib/x86/desc.c b/lib/x86/desc.c
 index 770c250..c4a3607 100644
 --- a/lib/x86/desc.c
 +++ b/lib/x86/desc.c
 @@ -27,41 +27,6 @@ typedef struct {
  u8 base_high;
  } gdt_entry_t;
  
 -typedef struct {
 -u16 prev;
 -u16 res1;
 -u32 esp0;
 -u16 ss0;
 -u16 res2;
 -u32 esp1;
 -u16 ss1;
 -u16 res3;
 -u32 esp2;
 -u16 ss2;
 -u16 res4;
 -u32 cr3;
 -u32 eip;
 -u32 eflags;
 -u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
 -u16 es;
 -u16 res5;
 -u16 cs;
 -u16 res6;
 -u16 ss;
 -u16 res7;
 -u16 ds;
 -u16 res8;
 -u16 fs;
 -u16 res9;
 -u16 gs;
 -u16 res10;
 -u16 ldt;
 -u16 res11;
 -u16 t:1;
 -u16 res12:15;
 -u16 iomap_base;
 -} tss32_t;
 -
  extern idt_entry_t boot_idt[256];
  
  void set_idt_entry(int vec, void *addr, int dpl)
 @@ -327,7 +292,7 @@ void setup_gdt(void)
.Lflush2: ::r(0x10));
  }
  
 -static void set_idt_task_gate(int vec, u16 sel)
 +void set_idt_task_gate(int vec, u16 sel)
  {
  idt_entry_t *e = boot_idt[vec];
  
 diff --git a/lib/x86/desc.h b/lib/x86/desc.h
 index 0b4897c..f819452 100644
 --- a/lib/x86/desc.h
 +++ b/lib/x86/desc.h
 @@ -24,6 +24,41 @@ struct ex_regs {
  unsigned long rflags;
  };
  
 +typedef struct {
 +u16 prev;
 +u16 res1;
 +u32 esp0;
 +u16 ss0;
 +u16 res2;
 +u32 esp1;
 +u16 ss1;
 +u16 res3;
 +u32 esp2;
 +u16 ss2;
 +u16 res4;
 +u32 cr3;
 +u32 eip;
 +u32 eflags;
 +u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
 +u16 es;
 +u16 res5;
 +u16 cs;
 +u16 res6;
 +u16 ss;
 +u16 res7;
 +u16 ds;
 +u16 res8;
 +u16 fs;
 +u16 res9;
 +u16 gs;
 +u16 res10;
 +u16 ldt;
 +u16 res11;
 +u16 t:1;
 +u16 res12:15;
 +u16 iomap_base;
 +} tss32_t;
 +
  #define ASM_TRY(catch)  \
  movl $0, %%gs:4 \n\t  \
  .pushsection .data.ex \n\t\
 @@ -44,6 +79,7 @@ unsigned exception_error_code(void);
  void set_idt_entry(int vec, void *addr, int dpl);
  void set_idt_sel(int vec, u16 sel);
  void set_gdt_entry(int num, u32 base,  u32 limit, u8 access, u8 gran);
 +void set_idt_task_gate(int vec, u16 sel);
  void set_intr_task_gate(int e, void *fn);
  void print_current_tss_info(void);
  void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
 diff --git a/lib/x86/vm.c b/lib/x86/vm.c
 index abbb0c9..aae044a 100644
 --- a/lib/x86/vm.c
 +++ b/lib/x86/vm.c
 @@ -108,14 +108,14 @@ void install_large_page(unsigned long *cr3,
unsigned long phys,
void *virt)
  {
 -install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_PSE, 0);
 +install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER | 
 PTE_PSE, 0);
  }
  
  void install_page(unsigned long *cr3,
unsigned long phys,
void *virt)
  {
 -install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE, 0);
 +install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER, 0);
  }
  
  
 diff --git a/lib/x86/vm.h b/lib/x86/vm.h
 index bf8fd52..aebc5c3 100644
 --- a/lib/x86/vm.h
 +++ b/lib/x86/vm.h
 @@ -13,6 +13,7 @@
  #define PTE_PRESENT (1ull  0)
  #define PTE_PSE (1ull  7

Re: [PATCH kvm-unit-tests 4/4] x86/taskswitch_vm86: Task switches into/out of VM86

2012-01-23 Thread Kevin Wolf
Am 23.01.2012 17:22, schrieb Gleb Natapov:
 On Mon, Jan 23, 2012 at 05:20:22PM +0100, Kevin Wolf wrote:
 Am 23.01.2012 17:10, schrieb Gleb Natapov:
 On Mon, Jan 23, 2012 at 05:07:13PM +0100, Kevin Wolf wrote:
 This adds a test case that jumps into VM86 by iret-ing to a TSS and back
 to Protected Mode using a task gate in the IDT.

 Can you add the test case to taskswitch2.c?

 Running one test to check all aspects of taskswitch emulation.

(We all know that top-posting is disliked, but middle-posting looks even
crazier!)

Does having one test provide any value in and of itself? It's just an
implementation detail of the test suite. When testing the KVM patches I
ran all three test cases with './run_tests.sh -g task', which is
hopefully easy enough.

Kevin

 
 That's actually what I intended to do at first, but there's nothing to
 share and having a clean environment that can't interfere with other
 tests feels nicer.

 What would we gain from merging the files?

 Kevin



 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  config-i386.mak   |3 +-
  lib/x86/desc.c|   37 +-
  lib/x86/desc.h|   36 +
  lib/x86/vm.c  |4 +-
  lib/x86/vm.h  |1 +
  x86/taskswitch_vm86.c |   59 
 +
  x86/unittests.cfg |6 +
  7 files changed, 107 insertions(+), 39 deletions(-)
  create mode 100644 x86/taskswitch_vm86.c

 diff --git a/config-i386.mak b/config-i386.mak
 index de52f3d..b5c3b9c 100644
 --- a/config-i386.mak
 +++ b/config-i386.mak
 @@ -5,9 +5,10 @@ ldarch = elf32-i386
  CFLAGS += -D__i386__
  CFLAGS += -I $(KERNELDIR)/include
  
 -tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat
 +tests = $(TEST_DIR)/taskswitch.flat $(TEST_DIR)/taskswitch2.flat 
 $(TEST_DIR)/taskswitch_vm86.flat
  
  include config-x86-common.mak
  
  $(TEST_DIR)/taskswitch.elf: $(cstart.o) $(TEST_DIR)/taskswitch.o
  $(TEST_DIR)/taskswitch2.elf: $(cstart.o) $(TEST_DIR)/taskswitch2.o
 +$(TEST_DIR)/taskswitch_vm86.elf: $(cstart.o) $(TEST_DIR)/taskswitch_vm86.o
 diff --git a/lib/x86/desc.c b/lib/x86/desc.c
 index 770c250..c4a3607 100644
 --- a/lib/x86/desc.c
 +++ b/lib/x86/desc.c
 @@ -27,41 +27,6 @@ typedef struct {
u8 base_high;
  } gdt_entry_t;
  
 -typedef struct {
 -  u16 prev;
 -  u16 res1;
 -  u32 esp0;
 -  u16 ss0;
 -  u16 res2;
 -  u32 esp1;
 -  u16 ss1;
 -  u16 res3;
 -  u32 esp2;
 -  u16 ss2;
 -  u16 res4;
 -  u32 cr3;
 -  u32 eip;
 -  u32 eflags;
 -  u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
 -  u16 es;
 -  u16 res5;
 -  u16 cs;
 -  u16 res6;
 -  u16 ss;
 -  u16 res7;
 -  u16 ds;
 -  u16 res8;
 -  u16 fs;
 -  u16 res9;
 -  u16 gs;
 -  u16 res10;
 -  u16 ldt;
 -  u16 res11;
 -  u16 t:1;
 -  u16 res12:15;
 -  u16 iomap_base;
 -} tss32_t;
 -
  extern idt_entry_t boot_idt[256];
  
  void set_idt_entry(int vec, void *addr, int dpl)
 @@ -327,7 +292,7 @@ void setup_gdt(void)
  .Lflush2: ::r(0x10));
  }
  
 -static void set_idt_task_gate(int vec, u16 sel)
 +void set_idt_task_gate(int vec, u16 sel)
  {
  idt_entry_t *e = boot_idt[vec];
  
 diff --git a/lib/x86/desc.h b/lib/x86/desc.h
 index 0b4897c..f819452 100644
 --- a/lib/x86/desc.h
 +++ b/lib/x86/desc.h
 @@ -24,6 +24,41 @@ struct ex_regs {
  unsigned long rflags;
  };
  
 +typedef struct {
 +  u16 prev;
 +  u16 res1;
 +  u32 esp0;
 +  u16 ss0;
 +  u16 res2;
 +  u32 esp1;
 +  u16 ss1;
 +  u16 res3;
 +  u32 esp2;
 +  u16 ss2;
 +  u16 res4;
 +  u32 cr3;
 +  u32 eip;
 +  u32 eflags;
 +  u32 eax, ecx, edx, ebx, esp, ebp, esi, edi;
 +  u16 es;
 +  u16 res5;
 +  u16 cs;
 +  u16 res6;
 +  u16 ss;
 +  u16 res7;
 +  u16 ds;
 +  u16 res8;
 +  u16 fs;
 +  u16 res9;
 +  u16 gs;
 +  u16 res10;
 +  u16 ldt;
 +  u16 res11;
 +  u16 t:1;
 +  u16 res12:15;
 +  u16 iomap_base;
 +} tss32_t;
 +
  #define ASM_TRY(catch)  \
  movl $0, %%gs:4 \n\t  \
  .pushsection .data.ex \n\t\
 @@ -44,6 +79,7 @@ unsigned exception_error_code(void);
  void set_idt_entry(int vec, void *addr, int dpl);
  void set_idt_sel(int vec, u16 sel);
  void set_gdt_entry(int num, u32 base,  u32 limit, u8 access, u8 gran);
 +void set_idt_task_gate(int vec, u16 sel);
  void set_intr_task_gate(int e, void *fn);
  void print_current_tss_info(void);
  void handle_exception(u8 v, void (*func)(struct ex_regs *regs));
 diff --git a/lib/x86/vm.c b/lib/x86/vm.c
 index abbb0c9..aae044a 100644
 --- a/lib/x86/vm.c
 +++ b/lib/x86/vm.c
 @@ -108,14 +108,14 @@ void install_large_page(unsigned long *cr3,
unsigned long phys,
void *virt)
  {
 -install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_PSE, 
 0);
 +install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_USER | 
 PTE_PSE, 0);
  }
  
  void install_page(unsigned long *cr3,
unsigned long phys

Re: [RFC PATCH] emulator: Fix task switch into/out of VM86

2012-01-16 Thread Kevin Wolf
Am 10.01.2012 18:51, schrieb Joerg Roedel:
 On Tue, Jan 10, 2012 at 01:30:47PM +0200, Gleb Natapov wrote:
 On Tue, Jan 10, 2012 at 12:25:18PM +0100, Kevin Wolf wrote:
 Did that now, and it looks like exit_int_info is always 0 during the
 task switch intercept for a task gate in the IDT. So special-casing VM86
 won't help, I'm afraid.

 Joerg, do you know how can we check that task switch was cause by an
 exception triggering task gate if exit_int_info is not provided during
 intercept (short of decoding instruction that caused intercept to see if
 it's a call to a gate, which is racy) ?
 
 Hmm, havn't found a solution yet. But I will check further and let you
 know if I find out something.

Jörg, any news on this?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] emulator: Fix task switch into/out of VM86

2012-01-10 Thread Kevin Wolf
Am 10.01.2012 10:01, schrieb Gleb Natapov:
 On Mon, Jan 09, 2012 at 09:10:10PM +0100, Kevin Wolf wrote:
 * This works with VMX, but with SVM I have an additional problem: When
   trying to exit VM86 (usually by an exception) through a task gate in
   the IDT, the code runs into the reason = TASK_SWITCH_CALL path. I
   searched a bit in the documentation, but didn't find any obvious way
   to fix this.

 Hmm, so exit_int_info is invalid during task switch exit even though
 task switch was caused by an exception. I wonder is this the case when
 vcpu is not in vm86 mode too?

No idea, I would have to try it out.

 For vm86 we can change:
 
 else
 reason = TASK_SWITCH_CALL;
 
 to
   else if (vcpu in vm86 mode)
   reason = TASK_SWITCH_GATE;
   else
   reason = TASK_SWITCH_CALL;
 
 IIRC you can't change tasks by call in vm86 mode.

Didn't check it in the manual, but you'll have a hard time accessing a
protected mode segment in VM86, so I guess you're right. And in the VM86
branch we can probably fake the rest of the interrupt information so
that we can pass the checks in the emulator (basically saying not a
software interrupt should be enough).

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] emulator: Fix task switch into/out of VM86

2012-01-10 Thread Kevin Wolf
Am 10.01.2012 10:28, schrieb Kevin Wolf:
 Am 10.01.2012 10:01, schrieb Gleb Natapov:
 On Mon, Jan 09, 2012 at 09:10:10PM +0100, Kevin Wolf wrote:
 * This works with VMX, but with SVM I have an additional problem: When
   trying to exit VM86 (usually by an exception) through a task gate in
   the IDT, the code runs into the reason = TASK_SWITCH_CALL path. I
   searched a bit in the documentation, but didn't find any obvious way
   to fix this.

 Hmm, so exit_int_info is invalid during task switch exit even though
 task switch was caused by an exception. I wonder is this the case when
 vcpu is not in vm86 mode too?
 
 No idea, I would have to try it out.

Did that now, and it looks like exit_int_info is always 0 during the
task switch intercept for a task gate in the IDT. So special-casing VM86
won't help, I'm afraid.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   >