Re: [PATCH v2 1/1] virtio: fix the condition for iommu_platform not supported

2022-01-27 Thread Brijesh Singh




On 1/27/22 7:28 AM, Halil Pasic wrote:

ping^2

Also adding Brijesh and Daniel, as I believe you guys should be
interested in this, and I'm yet to receive review.

@Brijesh, Daniel: Can you confirm that AMD (SEV) and Power are affected
too, and that the fix works for your platforms as well?



Thanks for looping me in, I can confirm that SEV virtio-fs device 
support was *broken* on the latest qemu, and your patch fixes it.



Tested-by: Brijesh Singh 


Regards,
Halil

On Tue, 25 Jan 2022 11:21:12 +0100
Halil Pasic  wrote:


ping

On Mon, 17 Jan 2022 13:02:38 +0100
Halil Pasic  wrote:


The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
unsupported") claims to fail the device hotplug when iommu_platform
is requested, but not supported by the (vhost) device. On the first
glance the condition for detecting that situation looks perfect, but
because a certain peculiarity of virtio_platform it ain't.

In fact the aforementioned commit introduces a regression. It breaks
virtio-fs support for Secure Execution, and most likely also for AMD SEV
or any other confidential guest scenario that relies encrypted guest
memory.  The same also applies to any other vhost device that does not
support _F_ACCESS_PLATFORM.

The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates
"device can not access all of the guest RAM" and "iova != gpa, thus
device needs to translate iova".

Confidential guest technologies currently rely on the device/hypervisor
offering _F_ACCESS_PLATFORM, so that, after the feature has been
negotiated, the guest  grants access to the portions of memory the
device needs to see. So in for confidential guests, generally,
_F_ACCESS_PLATFORM is about the restricted access to memory, but not
about the addresses used being something else than guest physical
addresses.

This is the very reason for which commit f7ef7e6e3b ("vhost: correctly
turn on VIRTIO_F_IOMMU_PLATFORM") for, which fences _F_ACCESS_PLATFORM
form the vhost device that does not need it, because on the vhost
interface it only means "I/O address translation is needed".

This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the
situation when _F_ACCESS_PLATFORM is requested, but no I/O translation
by the device, and thus no device capability is needed. In this
situation claiming that the device does not support iommu_plattform=on
is counter-productive. So let us stop doing that!

Signed-off-by: Halil Pasic 
Reported-by: Jakob Naucke 
Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
unsupported")
Cc: Kevin Wolf 
Cc: qemu-sta...@nongnu.org

---

v1->v2:
* Commit message tweaks. Most notably fixed commit SHA (Michael)

---
  hw/virtio/virtio-bus.c | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index d23db98c56..c1578f3de2 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -69,11 +69,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error 
**errp)
  return;
  }
  
-if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {

-error_setg(errp, "iommu_platform=true is not supported by the device");
-return;
-}
-
  if (klass->device_plugged != NULL) {
  klass->device_plugged(qbus->parent, _err);
  }
@@ -88,6 +83,12 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error 
**errp)
  } else {
  vdev->dma_as = _space_memory;
  }
+
+if (has_iommu && vdev->dma_as != _space_memory
+  && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
+error_setg(errp, "iommu_platform=true is not supported by the device");
+return;
+}
  }
  
  /* Reset the virtio_bus */


base-commit: 6621441db50d5bae7e34dbd04bf3c57a27a71b32








Re: [PATCH 0/2] Improved support for AMD SEV firmware loading

2022-01-17 Thread Brijesh Singh


On 1/17/22 1:34 AM, Dov Murik wrote:
> [+cc Tom, Brijesh, Ashish - see SEV-related changes in this series]
>
>
> On 13/01/2022 18:55, Daniel P. Berrangé wrote:
>> The AMD SEV build of EDK2 only emits a single file, intended to be
>>
>> mapped readonly. There is explicitly no separate writable VARS
>>
>> store for persisting non-volatile firmware variables.
>>
>>
>>
>> This can be used with QEMU's traditional pflash configuration
>>
>> mechanism by only populating pflash0, leaving pflash1 unconfigured.
>>
>> Conceptually, however, it is odd to be using pflash at all when we
>>
>> have no intention of supporting any writable variables. The -bios
>>
>> option should be sufficient for any firmware that is exclusively
>>
>> readonly code.
>>
>>
>>
>>
>>
>> A second issue is that the firmware descriptor schema does not allow
>>
>> for describing a firmware that uses pflash, without any associated
>>
>> non-volatile storage.
>>
>>
>>
>> In docs/interop/firmware.json
>>
>>
>>
>>  'struct' : 'FirmwareMappingFlash',
>>
>>   'data'   : { 'executable' : 'FirmwareFlashFile',
>>
>>'nvram-template' : 'FirmwareFlashFile' } }
>>
>>
>>
>> Notice that nvram-template is mandatory, and when consuming these
>>
>> files libvirt will thus complain if the nvram-template field is
>>
>> missing.
>>
>>
>>
>> We could in theory make nvram-template optional in the schema and
>>
>> then update libvirt to take account of it, but this feels dubious
>>
>> when we have a perfectly good way of describing a firmware without
>>
>> NVRAM, using 'FirmwareMappingMemory' which is intended to be used
>>
>> with QEMU's -bios option.
>>
>>
>>
>>
>>
>> A third issue is in libvirt, where again the code handling the
>>
>> configuration of pflash supports two scenarios
>>
>>
>>
>>  - A single pflash image, which is writable
>>
>>  - A pair of pflash images, one writable one readonly
>>
>>
>>
>> There is no support for a single read-only pflash image in libvirt
>>
>> today.
>>
>>
>>
>>
>>
>> This all points towards the fact that we should be using -bios
>>
>> to load the AMD SEV firmware build of EDK.
>>
>>
>>
>> The only thing preventing us doing that is that QEMU does not
>>
>> initialize the SEV firmware when using -bios. That is fairly
>>
>> easily solved, as done in this patch series.
>>
>>
>>
>> For testing I've launched QEMU in in these scenarios
>>
>>
>>
>>   - SEV guest using -bios and boot from HD
>>
>>   - SEV guest using pflash and boot from HD
>>
>>   - SEV-ES guest using -bios and direct kernel boot
>>
>>   - SEV-ES guest using pflash and direct kernel boot
>>
>>
>>
>> In all these cases I was able to validate the reported SEV
>>
>> guest measurement.
>>
>>
>
> I'm having trouble testing this series (applied on top of master commit 
> 69353c332c):
> it hangs with -bios but works OK with pflash:
>
> Here's with -bios:
>
> $ sudo /home/dmurik/git/qemu/build/qemu-system-x86_64 -enable-kvm \
>-cpu host -machine q35 -smp 4 -m 2G \
>-machine confidential-guest-support=sev0 \
>-object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x0 \
>-bios /home/dmurik/git/edk2/Build/AmdSev/DEBUG_GCC5/FV/OVMF.fd \
>-nographic \
>-global isa-debugcon.iobase=0x402 -debugcon file:ovmf-1.log \
>-monitor pty -trace 'enable=kvm_sev_*'
>
> char device redirected to /dev/pts/14 (label compat_monitor0)
> kvm_sev_init
> kvm_sev_launch_start policy 0x0 session (nil) pdh (nil)
> kvm_sev_change_state uninit -> launch-update
> kvm_sev_launch_update_data addr 0x7f42e9bff010 len 0x40
> kvm_sev_change_state launch-update -> launch-secret
> kvm_sev_launch_measurement data 
> PF6n7+Vujx5sW8PC6iMRtHXfpXdJ4osbcfYvoknu7gg4ypMqs727NTzG86Ft8Llu
> kvm_sev_launch_finish
> kvm_sev_change_state launch-secret -> running
>
>
> Here it hangs. The ovmf-1.log file is empty.
>
> Notice that kvm_sev_launch_update_data is called, so the new
> -bios behaviour is triggered correctly.  But the guest doesn't
> start running.

I have not looked at the patch detail yet but address looks wrong, it
looks like the hva 0x7f42e9bff010 end of the ROM. We need to encrypt the
entire ROM to boot, so I was hoping that hva will be 2MB aligned or a
page-aligned. You can enable the KVM trace to see if we are able to
enter and execute anything from guest.


> Here is the guest's state:
>
> (qemu) info registers
> EAX=606b EBX=1268 ECX=440c EDX=008328d2
> ESI=91e2 EDI=e9e3 EBP=a451 ESP=9af0
> EIP=3612 EFL=0082 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =   9300
> CS =a76e 000a76e0  9b00
> SS =   9300
> DS =   9300
> FS =   9300
> GS =   9300
> LDT=   8200
> TR =   8b00
> GDT=  
> IDT=  
> CR0=6010 CR2= CR3= CR4=
> DR0= 

Re: SEV guest attestation

2021-11-29 Thread Brijesh Singh




On 11/29/21 8:29 AM, Brijesh Singh wrote:



On 11/25/21 7:59 AM, Dov Murik wrote:

[+cc Tom, Brijesh]

On 25/11/2021 15:42, Daniel P. Berrangé wrote:

On Thu, Nov 25, 2021 at 02:44:51PM +0200, Dov Murik wrote:

[+cc jejb, tobin, jim, hubertus]


On 25/11/2021 9:14, Sergio Lopez wrote:
On Wed, Nov 24, 2021 at 06:29:07PM +, Dr. David Alan Gilbert 
wrote:

* Daniel P. Berrangé (berra...@redhat.com) wrote:

On Wed, Nov 24, 2021 at 11:34:16AM -0500, Tyler Fanelli wrote:

Hi,

We recently discussed a way for remote SEV guest attestation 
through QEMU.
My initial approach was to get data needed for attestation 
through different
QMP commands (all of which are already available, so no changes 
required
there), deriving hashes and certificate data; and collecting all 
of this
into a new QMP struct (SevLaunchStart, which would include the 
VM's policy,
secret, and GPA) which would need to be upstreamed into QEMU. 
Once this is
provided, QEMU would then need to have support for attestation 
before a VM
is started. Upon speaking to Dave about this proposal, he 
mentioned that
this may not be the best approach, as some situations would 
render the
attestation unavailable, such as the instance where a VM is 
running in a
cloud, and a guest owner would like to perform attestation via 
QMP (a likely
scenario), yet a cloud provider cannot simply let anyone pass 
arbitrary QMP

commands, as this could be an issue.


As a general point, QMP is a low level QEMU implementation detail,
which is generally expected to be consumed exclusively on the host
by a privileged mgmt layer, which will in turn expose its own higher
level APIs to users or other apps. I would not expect to see QMP
exposed to anything outside of the privileged host layer.

We also use the QAPI protocol for QEMU guest agent commmunication,
however, that is a distinct service from QMP on the host. It shares
most infra with QMP but has a completely diffent command set. On the
host it is not consumed inside QEMU, but instead consumed by a
mgmt app like libvirt.

So I ask, does anyone involved in QEMU's SEV implementation have 
any input
on a quality way to perform guest attestation? If so, I'd be 
interested.


I think what's missing is some clearer illustrations of how this
feature is expected to be consumed in some real world application
and the use cases we're trying to solve.

I'd like to understand how it should fit in with common libvirt
applications across the different virtualization management
scenarios - eg virsh (command line),  virt-manger (local desktop
GUI), cockpit (single host web mgmt), OpenStack (cloud mgmt), etc.
And of course any non-traditional virt use cases that might be
relevant such as Kata.


That's still not that clear; I know Alice and Sergio have some ideas
(cc'd).
There's also some standardisation efforts (e.g. 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.potaroo.net%2Fietf%2Fhtml%2Fids-wg-rats.htmldata=04%7C01%7Cbrijesh.singh%40amd.com%7C3c94b09f0cd5450460a808d9b01be1f8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637734456065941078%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=E%2FeaI6JNF2ckosTeAbFRaCZUJOZ3zG0GNfKP8082INQ%3Dreserved=0 

and 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ietf.org%2Farchive%2Fid%2Fdraft-ietf-rats-architecture-00.htmldata=04%7C01%7Cbrijesh.singh%40amd.com%7C3c94b09f0cd5450460a808d9b01be1f8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637734456065951077%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=WEkMIZZp3O5Gyay5jZT8KSUH9fyarNfXy5O0Z%2FpHdnQ%3Dreserved=0 


) - that I can't claim to fully understand.
However, there are some themes that are emerging:

   a) One use is to only allow a VM to access some private data 
once we

prove it's the VM we expect running in a secure/confidential system
   b) (a) normally involves requesting some proof from the VM and 
then

providing it some confidential data/a key if it's OK
   c) RATs splits the problem up:
 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ietf.org%2Farchive%2Fid%2Fdraft-ietf-rats-architecture-00.html%23name-architectural-overviewdata=04%7C01%7Cbrijesh.singh%40amd.com%7C3c94b09f0cd5450460a808d9b01be1f8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637734456065951077%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=%2FwNFMGAfojFZyGIj79D5%2BW%2BRPPuwumJiqIrf5UVrkPU%3Dreserved=0 

 I don't fully understand the split yet, but in principal 
there are

at least a few different things:

   d) The comms layer
   e) Something that validates the attestation message (i.e. the
signatures are valid, the hashes all add up etc)
   f) Something that knows what hashes to expect (i.e. oh that's a 
RHEL

8.4 kernel, or that's a valid kernel command line)
   g) Something that holds some secrets that can be handed

Re: SEV guest attestation

2021-11-29 Thread Brijesh Singh




On 11/25/21 7:59 AM, Dov Murik wrote:

[+cc Tom, Brijesh]

On 25/11/2021 15:42, Daniel P. Berrangé wrote:

On Thu, Nov 25, 2021 at 02:44:51PM +0200, Dov Murik wrote:

[+cc jejb, tobin, jim, hubertus]


On 25/11/2021 9:14, Sergio Lopez wrote:

On Wed, Nov 24, 2021 at 06:29:07PM +, Dr. David Alan Gilbert wrote:

* Daniel P. Berrangé (berra...@redhat.com) wrote:

On Wed, Nov 24, 2021 at 11:34:16AM -0500, Tyler Fanelli wrote:

Hi,

We recently discussed a way for remote SEV guest attestation through QEMU.
My initial approach was to get data needed for attestation through different
QMP commands (all of which are already available, so no changes required
there), deriving hashes and certificate data; and collecting all of this
into a new QMP struct (SevLaunchStart, which would include the VM's policy,
secret, and GPA) which would need to be upstreamed into QEMU. Once this is
provided, QEMU would then need to have support for attestation before a VM
is started. Upon speaking to Dave about this proposal, he mentioned that
this may not be the best approach, as some situations would render the
attestation unavailable, such as the instance where a VM is running in a
cloud, and a guest owner would like to perform attestation via QMP (a likely
scenario), yet a cloud provider cannot simply let anyone pass arbitrary QMP
commands, as this could be an issue.


As a general point, QMP is a low level QEMU implementation detail,
which is generally expected to be consumed exclusively on the host
by a privileged mgmt layer, which will in turn expose its own higher
level APIs to users or other apps. I would not expect to see QMP
exposed to anything outside of the privileged host layer.

We also use the QAPI protocol for QEMU guest agent commmunication,
however, that is a distinct service from QMP on the host. It shares
most infra with QMP but has a completely diffent command set. On the
host it is not consumed inside QEMU, but instead consumed by a
mgmt app like libvirt.


So I ask, does anyone involved in QEMU's SEV implementation have any input
on a quality way to perform guest attestation? If so, I'd be interested.


I think what's missing is some clearer illustrations of how this
feature is expected to be consumed in some real world application
and the use cases we're trying to solve.

I'd like to understand how it should fit in with common libvirt
applications across the different virtualization management
scenarios - eg virsh (command line),  virt-manger (local desktop
GUI), cockpit (single host web mgmt), OpenStack (cloud mgmt), etc.
And of course any non-traditional virt use cases that might be
relevant such as Kata.


That's still not that clear; I know Alice and Sergio have some ideas
(cc'd).
There's also some standardisation efforts (e.g. 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.potaroo.net%2Fietf%2Fhtml%2Fids-wg-rats.htmldata=04%7C01%7Cbrijesh.singh%40amd.com%7C3c94b09f0cd5450460a808d9b01be1f8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637734456065941078%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=E%2FeaI6JNF2ckosTeAbFRaCZUJOZ3zG0GNfKP8082INQ%3Dreserved=0
and 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ietf.org%2Farchive%2Fid%2Fdraft-ietf-rats-architecture-00.htmldata=04%7C01%7Cbrijesh.singh%40amd.com%7C3c94b09f0cd5450460a808d9b01be1f8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637734456065951077%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=WEkMIZZp3O5Gyay5jZT8KSUH9fyarNfXy5O0Z%2FpHdnQ%3Dreserved=0
) - that I can't claim to fully understand.
However, there are some themes that are emerging:

   a) One use is to only allow a VM to access some private data once we
prove it's the VM we expect running in a secure/confidential system
   b) (a) normally involves requesting some proof from the VM and then
providing it some confidential data/a key if it's OK
   c) RATs splits the problem up:
 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ietf.org%2Farchive%2Fid%2Fdraft-ietf-rats-architecture-00.html%23name-architectural-overviewdata=04%7C01%7Cbrijesh.singh%40amd.com%7C3c94b09f0cd5450460a808d9b01be1f8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637734456065951077%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=%2FwNFMGAfojFZyGIj79D5%2BW%2BRPPuwumJiqIrf5UVrkPU%3Dreserved=0
 I don't fully understand the split yet, but in principal there are
at least a few different things:

   d) The comms layer
   e) Something that validates the attestation message (i.e. the
signatures are valid, the hashes all add up etc)
   f) Something that knows what hashes to expect (i.e. oh that's a RHEL
8.4 kernel, or that's a valid kernel command line)
   g) Something that holds some secrets that can be handed out if e & f
are happy.

   There have also been proposals (e.g. Intel 

Re: [RFC PATCH v2 00/12] Add AMD Secure Nested Paging (SEV-SNP) support

2021-11-16 Thread Brijesh Singh


On 11/16/21 3:23 AM, Daniel P. Berrangé wrote:
> On Thu, Aug 26, 2021 at 05:26:15PM -0500, Michael Roth wrote:
>> These patches implement SEV-SNP along with CPUID enforcement support for 
>> QEMU,
>> and are also available at:
>>
>>   
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmdroth%2Fqemu%2Fcommits%2Fsnp-rfc-v2-upstreamdata=04%7C01%7Cbrijesh.singh%40amd.com%7C3506c40b7121401945b108d9a8e2c8d0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637726514264887241%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=HXdG4TmNY157Gz6qLXhAL8FufCTxe9VzSiTaQICGawo%3Dreserved=0
>>
>> They are based on the initial RFC submitted by Brijesh:
>>
>>   
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fqemu-devel%2F20210722000259.ykepl7t6ptua7im5%40amd.com%2FT%2Fdata=04%7C01%7Cbrijesh.singh%40amd.com%7C3506c40b7121401945b108d9a8e2c8d0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637726514264887241%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=AhOI%2FoQFq4k%2B6uOqYQqos6FlxE4AD1FFYfIPPiSHioI%3Dreserved=0
> What's the status of these patches ?  Is there going to be any non-RFC
> version posted in the near future ?


I am waiting for the KVM interface to be finalized before spinning the
qemu patch. With the recent discussion on KVM patch we may see some
change in the interfaces. I am hoping to post updated series after
posting the newer KVM series.

thanls


>
> Regards,
> Daniel



Re: [PATCH v2 0/6] SEV: add kernel-hashes=on for measured -kernel launch

2021-11-10 Thread Brijesh Singh




On 11/8/21 7:48 AM, Dov Murik wrote:

Tom Lendacky and Brijesh Singh reported two issues with launching SEV
guests with the -kernel QEMU option when an old [1] or wrongly configured [2]
OVMF images are used.

To fix these issues, these series "hides" the whole kernel hashes
additions behind a kernel-hashes=on option (with default value of
"off").  This allows existing scenarios to work without change, and
explicitly forces kernel hashes additions for guests that require that.

Patch 1 introduces a new boolean option "kernel-hashes" on the sev-guest
object, and patch 2 causes QEMU to add kernel hashes only if its
explicitly set to "on".  This will mitigate both experienced issues
because the default of the new setting is off, and therefore is backward
compatible with older OVMF images (which don't have a designated hashes
table area) or with guests that don't wish to measure the kernel/initrd.

Patch 3 fixes the wording on the error message displayed when no hashes
table is found in the guest firmware.

Patch 4 detects incorrect address and length of the guest firmware
hashes table area and fails the boot.

Patch 5 is a refactoring of parts of the same function
sev_add_kernel_loader_hashes() to calculate all padding sizes at
compile-time.  Patch 6 also changes the same function and replaces the
call to qemu_map_ram_ptr() with address_space_map() to allow for error
detection.  Patches 5-6 are not required to fix the issues above, but
are suggested as an improvement (no functional change intended).

To enable addition of kernel/initrd/cmdline hashes into the SEV guest at
launch time, specify:

 qemu-system-x86_64 ... -object sev-guest,...,kernel-hashes=on


[1] 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fqemu-devel%2F3b9d10d9-5d9c-da52-f18c-cd93c1931706%40amd.com%2Fdata=04%7C01%7Cbrijesh.singh%40amd.com%7C908b739400a747e1b22308d9a2be7e07%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637719761315906327%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=cMoOlNU2faGwRk6dXVmOE1SuNrg3VvySAC1Ds8fcaFQ%3Dreserved=0
[2] 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fqemu-devel%2F001dd81a-282d-c307-a657-e228480d4af3%40amd.com%2Fdata=04%7C01%7Cbrijesh.singh%40amd.com%7C908b739400a747e1b22308d9a2be7e07%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637719761315916323%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=7IZ1%2B%2Fh%2B88xWDlHd%2FMKPN0fJfI6dmSX%2F1TbK8aL8bAs%3Dreserved=0



Changes in v2:
  - Instead of trying to figure out whether to add hashes or not,
explicity declare an option (kernel-hashes=on) for that.  When that
option is turned on, fail if the hashes cannot be added.
  - Rephrase error message when no hashes table GUID is found.
  - Replace qemu_map_ram_ptr with address_space_map

v1: 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fqemu-devel%2F20211101102136.1706421-1-dovmurik%40linux.ibm.com%2Fdata=04%7C01%7Cbrijesh.singh%40amd.com%7C908b739400a747e1b22308d9a2be7e07%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637719761315916323%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=SrE9kYP0Qdhx0WqIXbnwHgeX%2BjBVT9BsK6I0OLU3naI%3Dreserved=0


Dov Murik (6):
   qapi/qom,target/i386: sev-guest: Introduce kernel-hashes=on|off option
   target/i386/sev: Add kernel hashes only if sev-guest.kernel-hashes=on
   target/i386/sev: Rephrase error message when no hashes table in guest
 firmware
   target/i386/sev: Fail when invalid hashes table area detected
   target/i386/sev: Perform padding calculations at compile-time
   target/i386/sev: Replace qemu_map_ram_ptr with address_space_map

  qapi/qom.json |  7 -
  target/i386/sev.c | 77 +++
  qemu-options.hx   |  6 +++-
  3 files changed, 75 insertions(+), 15 deletions(-)




Thanks for the fixing it Dov.

Acked-by: Brijesh Singh 

thanks



Re: [PATCH 0/3] SEV: fixes for -kernel launch with incompatible OVMF

2021-11-08 Thread Brijesh Singh




On 11/5/21 1:32 PM, Dov Murik wrote:



On 02/11/2021 16:48, Brijesh Singh wrote:



On 11/2/21 8:22 AM, Dov Murik wrote:



On 02/11/2021 12:52, Brijesh Singh wrote:

Hi Dov,

Overall the patch looks good, only question I have is that now we are
enforce qemu to hash the kernel, initrd and cmdline unconditionally for
any of the SEV guest launches. This requires anyone wanting to
calculating the expected measurement need to account for it. Should we
make the hash page build optional ?



The problem with adding a -enable-add-kernel-hashes QEMU option (or
suboption) is yet another complexity for the user.  I'd also argue that
adding these hashes can lead to a more secure VM boot process, so it
makes sense for it to be the default (and maybe introduce a
-allow-insecure-unmeasured-kernel-via-fw-cfg option to prevent the
measurement from changing due to addition of hashes?).

Maybe, on the other hand, OVMF should "report" whether it supports
hashes verification. If it does, it should have the GUID in the table
(near the reset vector), like the current OvmfPkg/AmdSev edk2 build. If
it doesn't support that, then the entry should not appear at all, and
then QEMU won't add the hashes (with patch 1 from this series).  This
means that in edk2 we need to remove the SEV Hash Table block from the
ResetVectorVtf0.asm for OvmfPkg, but include it in the AmdSev build.



By leaving it ON is conveying a wrong message to the user. The library
used for verifying the hash is a NULL library for all the builds of Ovmf
except the AmdSev package. In the NULL library case, OVMF does not
perform any checks and hash table is useless. I will raise this on
concern on your Ovmf patch series.

IMHO, if you want to turn it ON by default then make sure all the OVMF
package builds supports validating the hash.



But the problem with this approach is that it prevents the future
unification of AmdSev and OvmfPkg, which is a possibility we discussed
(at least with Dave Gilbert), though not sure it's a good/feasible goal.




This is my exact concern, we are auto enabling the features in Qemu that
is supported by AmdSev package only.





I am thinking this more for the SEV-SNP guest. As you may be aware that
with SEV-SNP the attestation is performed by the guest, and its possible
for the launch flow to pass 512-bits of host_data that gets included in
the report. If a user wants to do the hash'e checks for the SNP then
they can pass a hash of kernel, initrd and cmdline through a
launch_finish.ID_BLOCK.host_data and does not require a special hash
page. This it will simplify the expected hash calculation.


That is a new measured boot "protocol" that we can discuss, and see
whether it's better/easier than the existing one at hand that works on
SEV and SEV-ES.

What I don't understand in your suggestion is who performs a SHA256 of
the fw_cfg blobs (kernel/initrd/cmdline) so they can later be verified
(though ideally earlier is better).  Can you describe the details
(step-by-step) of an SNP VM boot with -kernel/-initrd/-append and how
the measurement/attestation is performed?




There are a multiple ways on how you can do a measured boot with the SNP.

1) VMPL0 (SVSM) can provide a complete vTPM (see the MSFT proposal on
SNP mailing list).

2) Use your existing hashing approach with some changes to provide a bit
more flexibility.

3) Use your existing hashing approach but zero out the hash page when
-kernel is not used.

Let me expand #2.

While launching the SNP guest, a guest owner can provide a ID block that
KVM will pass to the PSP during the guest launch flow. In the ID block
there is a field called "host_data". A guest owner can do a hash of
kernel/initrd/cmdline and include it in the "host_data" field. During
the hash verification, the OVMF can call the SNP_GET_REPORT. The PSP
will includes the "host_data" passed in the launch process in the report
and OVMF can use it for the verification. Unlike the current
implementation, this enables a guest owner to provides the hash without
requiring any changes in the Qemu and thus affecting the measurement.



Is there a way (in the current NP patches for OVMF) for OVMF to call
SNP_GET_REPORT? Or is this something we need to add support for? Will it
mess up the sequence numbers that are later going to be used by the
kernel as well when managing SNP guest requests?




The current OVMF patches does not add a library to query the attestation 
report yet. If required it should be possible to add such a libraries. 
The VMGEXIT is available to both Guest OS and Guest BIOS. The sequence 
number should not be an issue. As per the GHCB spec, the guest BIOS will 
save the sequence number in the secrets page reserved area and guest 
kernel can picked the next number from that region (its same as the 
kexec approach).





One thing to note that both #2 and #3 requires ovmf to connect to guest
owner to validate the report before using the "host

Re: [PATCH 0/3] SEV: fixes for -kernel launch with incompatible OVMF

2021-11-03 Thread Brijesh Singh




On 11/3/21 9:08 AM, Dr. David Alan Gilbert wrote:

* Brijesh Singh (brijesh.si...@amd.com) wrote:



On 11/2/21 8:22 AM, Dov Murik wrote:



On 02/11/2021 12:52, Brijesh Singh wrote:

Hi Dov,

Overall the patch looks good, only question I have is that now we are
enforce qemu to hash the kernel, initrd and cmdline unconditionally for
any of the SEV guest launches. This requires anyone wanting to
calculating the expected measurement need to account for it. Should we
make the hash page build optional ?



The problem with adding a -enable-add-kernel-hashes QEMU option (or
suboption) is yet another complexity for the user.  I'd also argue that
adding these hashes can lead to a more secure VM boot process, so it
makes sense for it to be the default (and maybe introduce a
-allow-insecure-unmeasured-kernel-via-fw-cfg option to prevent the
measurement from changing due to addition of hashes?).

Maybe, on the other hand, OVMF should "report" whether it supports
hashes verification. If it does, it should have the GUID in the table
(near the reset vector), like the current OvmfPkg/AmdSev edk2 build. If
it doesn't support that, then the entry should not appear at all, and
then QEMU won't add the hashes (with patch 1 from this series).  This
means that in edk2 we need to remove the SEV Hash Table block from the
ResetVectorVtf0.asm for OvmfPkg, but include it in the AmdSev build.



By leaving it ON is conveying a wrong message to the user. The library used
for verifying the hash is a NULL library for all the builds of Ovmf except
the AmdSev package. In the NULL library case, OVMF does not perform any
checks and hash table is useless. I will raise this on concern on your Ovmf
patch series.

IMHO, if you want to turn it ON by default then make sure all the OVMF
package builds supports validating the hash.



But the problem with this approach is that it prevents the future
unification of AmdSev and OvmfPkg, which is a possibility we discussed
(at least with Dave Gilbert), though not sure it's a good/feasible goal.




This is my exact concern, we are auto enabling the features in Qemu that is
supported by AmdSev package only.


I'm confused; wouldn't the trick be to only define the GUIDs for the
builds that support the validation?



The GUID is hardcoded in the OVMF reset vector asm file, and the file 
gets included for all the flavor of OVMF builds. In its current form, 
GUID is defined for all the package.


thanks



Re: [PATCH 0/3] SEV: fixes for -kernel launch with incompatible OVMF

2021-11-02 Thread Brijesh Singh




On 11/2/21 8:22 AM, Dov Murik wrote:



On 02/11/2021 12:52, Brijesh Singh wrote:

Hi Dov,

Overall the patch looks good, only question I have is that now we are
enforce qemu to hash the kernel, initrd and cmdline unconditionally for
any of the SEV guest launches. This requires anyone wanting to
calculating the expected measurement need to account for it. Should we
make the hash page build optional ?



The problem with adding a -enable-add-kernel-hashes QEMU option (or
suboption) is yet another complexity for the user.  I'd also argue that
adding these hashes can lead to a more secure VM boot process, so it
makes sense for it to be the default (and maybe introduce a
-allow-insecure-unmeasured-kernel-via-fw-cfg option to prevent the
measurement from changing due to addition of hashes?).

Maybe, on the other hand, OVMF should "report" whether it supports
hashes verification. If it does, it should have the GUID in the table
(near the reset vector), like the current OvmfPkg/AmdSev edk2 build. If
it doesn't support that, then the entry should not appear at all, and
then QEMU won't add the hashes (with patch 1 from this series).  This
means that in edk2 we need to remove the SEV Hash Table block from the
ResetVectorVtf0.asm for OvmfPkg, but include it in the AmdSev build.



By leaving it ON is conveying a wrong message to the user. The library 
used for verifying the hash is a NULL library for all the builds of Ovmf 
except the AmdSev package. In the NULL library case, OVMF does not 
perform any checks and hash table is useless. I will raise this on 
concern on your Ovmf patch series.


IMHO, if you want to turn it ON by default then make sure all the OVMF 
package builds supports validating the hash.




But the problem with this approach is that it prevents the future
unification of AmdSev and OvmfPkg, which is a possibility we discussed
(at least with Dave Gilbert), though not sure it's a good/feasible goal.




This is my exact concern, we are auto enabling the features in Qemu that 
is supported by AmdSev package only.






I am thinking this more for the SEV-SNP guest. As you may be aware that
with SEV-SNP the attestation is performed by the guest, and its possible
for the launch flow to pass 512-bits of host_data that gets included in
the report. If a user wants to do the hash'e checks for the SNP then
they can pass a hash of kernel, initrd and cmdline through a
launch_finish.ID_BLOCK.host_data and does not require a special hash
page. This it will simplify the expected hash calculation.


That is a new measured boot "protocol" that we can discuss, and see
whether it's better/easier than the existing one at hand that works on
SEV and SEV-ES.

What I don't understand in your suggestion is who performs a SHA256 of
the fw_cfg blobs (kernel/initrd/cmdline) so they can later be verified
(though ideally earlier is better).  Can you describe the details
(step-by-step) of an SNP VM boot with -kernel/-initrd/-append and how
the measurement/attestation is performed?




There are a multiple ways on how you can do a measured boot with the SNP.

1) VMPL0 (SVSM) can provide a complete vTPM (see the MSFT proposal on 
SNP mailing list).


2) Use your existing hashing approach with some changes to provide a bit 
more flexibility.


3) Use your existing hashing approach but zero out the hash page when 
-kernel is not used.


Let me expand #2.

While launching the SNP guest, a guest owner can provide a ID block that 
KVM will pass to the PSP during the guest launch flow. In the ID block 
there is a field called "host_data". A guest owner can do a hash of 
kernel/initrd/cmdline and include it in the "host_data" field. During 
the hash verification, the OVMF can call the SNP_GET_REPORT. The PSP 
will includes the "host_data" passed in the launch process in the report 
and OVMF can use it for the verification. Unlike the current 
implementation, this enables a guest owner to provides the hash without 
requiring any changes in the Qemu and thus affecting the measurement.


One thing to note that both #2 and #3 requires ovmf to connect to guest 
owner to validate the report before using the "host_data" or "hash page".



thanks




Adding a
special page requires a validation of that page. All the prevalidated
page need to be excluded by guest BIOS page validation flow to avoid the
double validation. The hash page is populated only when we pass -kernel
and it will be tricky to communicate this information to the guest BIOS
so that it can skip the validation.


So that again comes back to the earlier question of whether we should
always fill the hashes page or only sometimes, and how can OVMF tell.

How about: QEMU always prevalidates this page (either fills it with
zeros or with the hashes table), and the BIOS always excludes it?

-Dov




Thoughts ?

thanks

On 11/1/21 5:21 AM, Dov Murik wrote:

Tom Lendacky and Brijesh Singh reported two issues

Re: [PATCH 0/3] SEV: fixes for -kernel launch with incompatible OVMF

2021-11-02 Thread Brijesh Singh
Hi Dov,

Overall the patch looks good, only question I have is that now we are
enforce qemu to hash the kernel, initrd and cmdline unconditionally for
any of the SEV guest launches. This requires anyone wanting to
calculating the expected measurement need to account for it. Should we
make the hash page build optional ?

I am thinking this more for the SEV-SNP guest. As you may be aware that
with SEV-SNP the attestation is performed by the guest, and its possible
for the launch flow to pass 512-bits of host_data that gets included in
the report. If a user wants to do the hash'e checks for the SNP then
they can pass a hash of kernel, initrd and cmdline through a
launch_finish.ID_BLOCK.host_data and does not require a special hash
page. This it will simplify the expected hash calculation. Adding a
special page requires a validation of that page. All the prevalidated
page need to be excluded by guest BIOS page validation flow to avoid the
double validation. The hash page is populated only when we pass -kernel
and it will be tricky to communicate this information to the guest BIOS
so that it can skip the validation.

Thoughts ?

thanks

On 11/1/21 5:21 AM, Dov Murik wrote:
> Tom Lendacky and Brijesh Singh reported two issues with launching SEV
> guests with the -kernel QEMU option when an old [1] or wrongly configured [2]
> OVMF images are used.
>
> The fixes in patches 1 and 2 allow such guests to boot by skipping the
> kernel/initrd/cmdline hashes addition to the initial guest memory (and
> warning the user).
>
> Patch 3 is a refactoring of parts of the same function
> sev_add_kernel_loader_hashes() to calculate all padding sizes at
> compile-time.  This patch is not required to fix the issues above, but
> is suggested as an improvement (no functional change intended).
>
> Note that launch measurement security is not harmed by these fixes: a
> Guest Owner that wants to use measured Linux boot with -kernel, must use
> (and measure) an OVMF image that designates a proper hashes table area,
> and that verifies those hashes when loading the binaries from QEMU via
> fw_cfg.
>
> The old OVMFs which don't publish the hashes table GUID or don't reserve
> a valid area for it in MEMFD cannot support these hashes verification in
> any case (for measured boot with -kernel).
>
>
> [1] 
> https://lore.kernel.org/qemu-devel/3b9d10d9-5d9c-da52-f18c-cd93c1931...@amd.com/
> [2] 
> https://lore.kernel.org/qemu-devel/001dd81a-282d-c307-a657-e228480d4...@amd.com/
>
> Dov Murik (3):
>   sev/i386: Allow launching with -kernel if no OVMF hashes table found
>   sev/i386: Warn if using -kernel with invalid OVMF hashes table area
>   sev/i386: Perform padding calculations at compile-time
>
>  target/i386/sev.c | 34 +++---
>  1 file changed, 23 insertions(+), 11 deletions(-)
>
>
> base-commit: af531756d25541a1b3b3d9a14e72e7fedd941a2e



Re: [PATCH v4 1/2] sev/i386: Introduce sev_add_kernel_loader_hashes for measured linux boot

2021-10-27 Thread Brijesh Singh

Hi Dov,

Sorry for coming a bit late on it but I am seeing another issue with 
this patch. The hash build logic looks for a SEV_HASH_TABLE_RV_GUID in 
the GUID list. If found, it uses the base address to store the hash'es. 
Looking at the OVMF, it seems that base address for this GUID is zero. 
It seems that by default the Base Address is non-zero for the AmdSev 
Package build only.


Can we add a check in the sev_add_kernel_loader_hashes() to verify that 
base address is non-zero and at the same time improve OVMF to update 
*.fdf to reserve this page in the MEMFD ?


Thanks
Brijesh

On 10/20/21 10:26 AM, Tom Lendacky wrote:

On 10/19/21 1:18 AM, Dov Murik wrote:

On 18/10/2021 21:02, Tom Lendacky wrote:

On 9/30/21 12:49 AM, Dov Murik wrote:

...


+/*
+ * Add the hashes of the linux kernel/initrd/cmdline to an encrypted
guest page
+ * which is included in SEV's initial memory measurement.
+ */
+bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error
**errp)
+{
+    uint8_t *data;
+    SevHashTableDescriptor *area;
+    SevHashTable *ht;
+    uint8_t cmdline_hash[HASH_SIZE];
+    uint8_t initrd_hash[HASH_SIZE];
+    uint8_t kernel_hash[HASH_SIZE];
+    uint8_t *hashp;
+    size_t hash_len = HASH_SIZE;
+    int aligned_len;
+
+    if (!pc_system_ovmf_table_find(SEV_HASH_TABLE_RV_GUID, ,
NULL)) {
+    error_setg(errp, "SEV: kernel specified but OVMF has no hash
table guid");
+    return false;
+    }


This breaks backwards compatibility with an older OVMF image. Any older
OVMF image with SEV support that doesn't have the hash table GUID will
now fail to boot using -kernel/-initrd/-append, where it used to be able
to boot before.




Thanks Tom for noticing this.

Just so we're on the same page: this patch is already merged.


Right, just not in a release, yet.




We're dealing with a scenario of launching a guest with SEV enabled and
with -kernel.  The behaviours are:


A. With current QEMU:

A1. New AmdSev OVMF build: OVMF will verify the hashes and boot 
correctly.
A2. New Generic OvmfPkgX64 build: No verification but will boot 
correctly.


A3. Old AmdSev OVMF build: QEMU aborts the launch because there's no
hash table GUID.
A4. Old Generic OvmfPkgX64 build: QEMU aborts the launch because there's
no hash table GUID.


B. With older QEMU (before this patch was merged):

B1. New AmdSev OVMF build: OVMF will try to verify the hashes but they
are not populated; boot aborted.
B2. New Generic OvmfPkgX64 build: No verification but will boot 
correctly.


B3. Old AmdSev OVMF build: OVMF aborts the launch because -kernel is not
supported at all.
B4. Old Generic OvmfPkgX64 build: No verification but will boot 
correctly.



So the problem you are raising is scenario A4 (as opposed to previous
behaviour B4).


Correct, scenario A4.






Is that anything we need to be concerned about?



Possible solutions:

1. Do nothing. For users that encounter this: tell them to upgrade OVMF.
2. Modify the code: remove the line: error_setg(errp, "SEV: kernel
specified but OVMF has no hash table guid")

I think that option 2 will not degrade security *if* the Guest Owner
verifies the measurement (which is mandatory anyway; otherwise the
untrusted host can replace OVMF with a "malicious" version that doesn't
verify the hashes). Skipping silently might make debugging a bit harder.
Maybe we can print a warning and return, and then the guest launch will
continue?


That sounds like it might be the best approach if there are no security 
concerns. I agree with printing a message, either informational or 
warning is ok by me.


Lets see if anyone else has some thoughts/ideas.

Thanks,
Tom



Other ideas?


-Dov





Re: [PATCH v3 13/22] target/i386/sev: Remove stubs by using code elision

2021-10-08 Thread Brijesh Singh


On 10/6/21 11:55 AM, Philippe Mathieu-Daudé wrote:
> On 10/4/21 10:19, Paolo Bonzini wrote:
>> On 02/10/21 14:53, Philippe Mathieu-Daudé wrote:
>>> Only declare sev_enabled() and sev_es_enabled() when CONFIG_SEV is
>>> set, to allow the compiler to elide unused code. Remove unnecessary
>>> stubs.
>>>
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>> ---
>>>   include/sysemu/sev.h    | 14 +-
>>>   target/i386/sev_i386.h  |  3 ---
>>>   target/i386/cpu.c   | 16 +---
>>>   target/i386/sev-stub.c  | 36 
>>>   target/i386/meson.build |  2 +-
>>>   5 files changed, 23 insertions(+), 48 deletions(-)
>>>   delete mode 100644 target/i386/sev-stub.c
>>>
>>> diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
>>> index a329ed75c1c..f5c625bb3b3 100644
>>> --- a/include/sysemu/sev.h
>>> +++ b/include/sysemu/sev.h
>>> @@ -14,9 +14,21 @@
>>>   #ifndef QEMU_SEV_H
>>>   #define QEMU_SEV_H
>>>   -#include "sysemu/kvm.h"
>>> +#ifndef CONFIG_USER_ONLY
>>> +#include CONFIG_DEVICES /* CONFIG_SEV */
>>> +#endif
>>>   +#ifdef CONFIG_SEV
>>>   bool sev_enabled(void);
>>> +bool sev_es_enabled(void);
>>> +#else
>>> +#define sev_enabled() 0
>>> +#define sev_es_enabled() 0
>>> +#endif
>> This means that sev.h can only be included from target-specific files.
>>
>> An alternative could be:
>>
>> #ifdef NEED_CPU_H
>> # include CONFIG_DEVICES
> : fatal error: x86_64-linux-user-config-devices.h: No such
> file or directory
>
>> #endif
>>
>> #if defined NEED_CPU_H && !defined CONFIG_SEV
>> # define sev_enabled() 0
>> # define sev_es_enabled() 0
>> #else
>> bool sev_enabled(void);
>> bool sev_es_enabled(void);
>> #endif
>>
>> ... but in fact sysemu/sev.h _is_ only used from x86-specific files. So
>> should it be moved to include/hw/i386, and even merged with
>> target/i386/sev_i386.h?  Do we need two files?
> No clue, I don't think we need. Brijesh?


Sorry for the late reply, we do not need two files and it can be easily
merged.

thanks



Re: [RFC PATCH v2 04/12] i386/sev: initialize SNP context

2021-09-05 Thread Brijesh Singh


On 9/5/21 4:19 AM, Dov Murik wrote:
>
> On 27/08/2021 1:26, Michael Roth wrote:
>> From: Brijesh Singh 
>>
>> When SEV-SNP is enabled, the KVM_SNP_INIT command is used to initialize
>> the platform. The command checks whether SNP is enabled in the KVM, if
>> enabled then it allocates a new ASID from the SNP pool and calls the
>> firmware to initialize the all the resources.
>>
>
> From the KVM code ("[PATCH Part2 v5 24/45] KVM: SVM: Add
> KVM_SEV_SNP_LAUNCH_START command") it seems that KVM_SNP_INIT does *not*
> allocate the ASID; actually this is done in KVM_SEV_SNP_LAUNCH_START.

Actually, the KVM_SNP_INIT does allocate the ASID. If you look at the
driver code then in switch state, the SNP_INIT fallthrough to SEV_INIT
which will call sev_guest_init(). The sev_guest_init() allocates a new
ASID.
https://github.com/AMDESE/linux/blob/bb9ba49cd9b749d5551aae295c091d8757153dd7/arch/x86/kvm/svm/sev.c#L255

The LAUNCH_START simply binds the ASID to a guest.

thanks



Re: [RFC PATCH v2 04/12] i386/sev: initialize SNP context

2021-09-05 Thread Brijesh Singh
Hi Dov,

On 9/5/21 2:07 AM, Dov Murik wrote:
...
>
>>  
>>  uint64_t
>> @@ -1074,6 +1083,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
>> **errp)
>>  uint32_t ebx;
>>  uint32_t host_cbitpos;
>>  struct sev_user_data_status status = {};
>> +void *init_args = NULL;
>>  
>>  if (!sev_common) {
>>  return 0;
>> @@ -1126,7 +1136,18 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
>> **errp)
>>  sev_common->api_major = status.api_major;
>>  sev_common->api_minor = status.api_minor;
> Not visible here in the context: the code here is using the
> SEV_PLATFORM_STATUS command to get the build_id, api_major, and api_minor.
>
> I see that SNP has a new command SNP_PLATFORM_STATUS, which fills a
> struct sev_data_snp_platform_status (hmmm, I can't find the struct's
> definition; I assume it should look like Table 38 in 8.3.2 in SNP FW ABI
> document).

The API version can be queries either through the SNP_PLATFORM_STATUS or
SEV_PLATFORM_STATUS and they both report the same info. As the
definition of the sev_data_platform_status is concerned it should be
defined in the kernel include/linux/psp-sev.h.


> My questions are:
>
> 1. Is it OK to call the "legacy" SEV_PLATFORM_STATUS when about to init
> an SNP guest?

Yes, the legacy platform status command can be called on the SNP
initialized host.

I choose not to new command because we only care about the verison
string and that is available through either of these commands (SNP or
SEV platform status).

> 2. Do we want to save some info like installed TCB version and reported
> TCB version, and maybe other fields from SNP platform status?

If we decide to add a new QMP (query-sev-snp) then it makes sense to
export those fields so that a hypervisor console can give additional
information; But note that for the guest, all these are available in the
attestation report.


> 3. Should we check the state field in the platform status?
>
>
Good point, we could use the SNP platform status. I don't expect the
state to be different between the SNP platform_status and SEV
platform_status.


>>  
>> -if (sev_es_enabled()) {
>> +if (sev_snp_enabled()) {
>> +SevSnpGuestState *sev_snp_guest = SEV_SNP_GUEST(sev_common);
>> +if (!kvm_kernel_irqchip_allowed()) {
>> +error_report("%s: SEV-SNP guests require in-kernel irqchip 
>> support",
>> + __func__);
> Most errors in this function use error_setg(errp, ...).  This should follow.
>
>
>> +goto err;
>> +}
>> +
>> +cmd = KVM_SEV_SNP_INIT;
>> +init_args = (void *)_snp_guest->kvm_init_conf;
>> +
>> +} else if (sev_es_enabled()) {
>>  if (!kvm_kernel_irqchip_allowed()) {
>>  error_report("%s: SEV-ES guests require in-kernel irqchip 
>> support",
>>   __func__);
> Not part of this patch, but this error_report (and another one in the
> SEV-ES case) should be converted to error_setg similarly.  Maybe add a
> separate patch for fixing this for SEV-ES.
>
>
>
>> @@ -1145,7 +1166,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
>> **errp)
>>  }
>>  
>>  trace_kvm_sev_init();
> Suggestions:
>
> 1. log the guest type (SEV / SEV-ES / SEV-SNP)
> 2. log the SNP init flags value when initializing an SNP guest

Noted.

thanks



Re: [RFC PATCH 4/6] i386/sev: add the SNP launch start context

2021-07-19 Thread Brijesh Singh




On 7/19/21 7:34 AM, Dov Murik wrote:

Hi Brijesh,

On 10/07/2021 0:55, Brijesh Singh wrote:

The SNP_LAUNCH_START is called first to create a cryptographic launch
context within the firmware.

Signed-off-by: Brijesh Singh 
---
  target/i386/sev.c| 30 +-
  target/i386/trace-events |  1 +
  2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 84ae244af0..259408a8f1 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -812,6 +812,29 @@ sev_read_file_base64(const char *filename, guchar **data, 
gsize *len)
  return 0;
  }
  
+static int

+sev_snp_launch_start(SevGuestState *sev)
+{
+int ret = 1;
+int fw_error, rc;
+struct kvm_sev_snp_launch_start *start = >snp_config.start;
+
+trace_kvm_sev_snp_launch_start(start->policy);
+
+rc = sev_ioctl(sev->sev_fd, KVM_SEV_SNP_LAUNCH_START, start, _error);
+if (rc < 0) {
+error_report("%s: SNP_LAUNCH_START ret=%d fw_error=%d '%s'",
+__func__, ret, fw_error, fw_error_to_str(fw_error));


Did you mean to report the value of ret or rc?


Ah, I was meaning the rc.






+goto out;


Suggestion:

Remove the `ret` variable.
Here: simply `return 1`.
At the end: remove the `out:` label; simply `return 0`.



Noted.

thanks



Re: [RFC PATCH 6/6] i386/sev: populate secrets and cpuid page and finalize the SNP launch

2021-07-19 Thread Brijesh Singh

Hi Dov,

On 7/19/21 6:24 AM, Dov Murik wrote:


s/LAUNCH_UPDATE/SNP_LAUNCH_UPDATE/
(to show it's the same command you refer to above)



Noted.

  
+static int

+sev_snp_launch_update_gpa(uint32_t hwaddr, uint32_t size, uint8_t type)


hwaddr is a confusing name here because it is also a typedef (which is
most likely uint64_t...).  Maybe call this argument `gpa` ?



Noted, 'gpa' sounds much better.


+static bool
+detectoverlap(uint32_t start, uint32_t end,
+  struct snp_pre_validated_range *overlap)


naming conventions dictate: detect_overlap



Noted.


+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(pre_validated); i++) {
+if (pre_validated[i].start < end && start < pre_validated[i].end) {
+memcpy(overlap, _validated[i], sizeof(*overlap));


Maybe simpler than memcpy:

 *overlap = pre_validated[i];



Noted.


+
+trace_kvm_sev_snp_launch_finish();


Maybe the trace should show some info about the snp_config.finish fields?



I did thought about it, but one of the field in the snp_config.finish is 
4K in size and may fill the trace buffer quickly.



+kvm_sev_snp_ovmf_boot_block_info(uint32_t secrets_gpa, uint32_t slen, uint32_t 
cpuid_gpa, uint32_t clen, uint32_t s, uint32_t e) "secrets 0x%x+0x%x cpuid 0x%x+0x%x 
pre-validate 0x%x+0x%x"


The last argument is an end-addr (not a length), so maybe the format
string should end with:

" pre-validate 0x%x - 0x%x"

Also I'd prefer to log the SevSnpBootInfoBlock fields in the order they
appear in the struct.




Noted.

thanks



Re: [RFC PATCH 1/6] linux-header: add the SNP specific command

2021-07-19 Thread Brijesh Singh

Hi Dov,


On 7/19/21 6:35 AM, Dov Murik wrote:

Hi Brijesh,

On 10/07/2021 0:55, Brijesh Singh wrote:

Sync the kvm.h with the kernel to include the SNP specific commands.

Signed-off-by: Brijesh Singh 
---
  linux-headers/linux/kvm.h | 47 +++



What about psp-sev.h ? I see that kernel patch "[PATCH Part2 RFC v4
11/40] crypto:ccp: Define the SEV-SNP commands" adds some new PSP return
codes.

The QEMU user-friendly string list sev_fw_errlist (in sev.c) should be
updated accordingly.



thanks for reminding me, I will sync the psp-sev.h and include the new 
error code as well in the sev.c.





Re: [RFC PATCH 3/6] i386/sev: initialize SNP context

2021-07-15 Thread Brijesh Singh




On 7/15/21 4:32 AM, Dov Murik wrote:


Just making sure I understand:

* sev_enabled() returns true for SEV or newer (SEV or SEV-ES or
   SEV-SNP).
* sev_es_enabled() returns true for SEV-ES or newer (SEV-ES or SEV-SNP).
* sev_snp_enabled() returns true for SEV-SNP or newer (currently only
   SEV-SNP).

Is that indeed the intention?



Yes. The SEV-SNP support requires the SEV and SEV-ES to be enabled. See 
the text from the APM vol2 section 15.36.


The SEV-SNP features enable additional protection for encrypted
VMs designed to achieve stronger isolation from the hypervisor.
SEV-SNP is used with the SEV and SEV-ES features described in
Section 15.34 and Section 15.35 respectively and requires the
enablement and use of these features.

thanks



Re: [RFC PATCH 6/6] i386/sev: populate secrets and cpuid page and finalize the SNP launch

2021-07-14 Thread Brijesh Singh




On 7/14/21 12:29 PM, Dr. David Alan Gilbert wrote:>> +struct 
snp_pre_validated_range {

+uint32_t start;
+uint32_t end;
+};


Just a thought, but maybe use a 'Range' from include/qemu/range.h ?



I will look into it.

thanks



Re: [RFC PATCH 5/6] i386/sev: add support to encrypt BIOS when SEV-SNP is enabled

2021-07-14 Thread Brijesh Singh




On 7/14/21 12:08 PM, Connor Kuehl wrote:

On 7/9/21 3:55 PM, Brijesh Singh wrote:

The KVM_SEV_SNP_LAUNCH_UPDATE command is used for encrypting the bios
image used for booting the SEV-SNP guest.

Signed-off-by: Brijesh Singh 
---
  target/i386/sev.c| 33 -
  target/i386/trace-events |  1 +
  2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 259408a8f1..41dcb084d1 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -883,6 +883,30 @@ out:
  return ret;
  }
  
+static int

+sev_snp_launch_update(SevGuestState *sev, uint8_t *addr, uint64_t len, int 
type)
+{
+int ret, fw_error;
+struct kvm_sev_snp_launch_update update = {};
+
+if (!addr || !len) {
+return 1;


Should this be a -1? It looks like the caller checks if this function
returns < 0, but doesn't check for res == 1.


Ah, it should be -1.



Alternatively, invoking error_report might provide more useful
information that the preconditions to this function were violated.



Sure, I will add error_report.

thanks



Re: [RFC PATCH 0/6] Add AMD Secure Nested Paging (SEV-SNP) support

2021-07-14 Thread Brijesh Singh


On 7/14/21 4:52 AM, Dr. David Alan Gilbert wrote:
> * Brijesh Singh (brijesh.si...@amd.com) wrote:
>>
>> On 7/13/21 3:05 AM, Dov Murik wrote:>
>>> Particularly confusing is the `policy` attribute which is only relevant
>>> for SEV / SEV-ES, while there's a new `snp.policy` attribute for SNP...
>>> Maybe the irrelevant attributes should not be added to the tree when not
>>> in SNP.
>> The policy fields are also applicable to the SNP. The main difference are:
>>
>> - in SEV/SEV-ES the policy is 32-bit compare to 64-bit value in SEV-SNP.
>> However, for SEV-SNP spec uses lower 32-bit value and higher bits are marked
>> reserved.
>>
>> - the bit field meaning are different
> Ah, I see that from the SNP ABI spec (section 4.3).
>
> That's a bit subtle; in that at the moment we select SEV or SEV-ES based
> on the existing guest policy flags; I think you're saying that SEV-SNP
> is enabled by the user explicitly.

Correct. This is one of the reason that I added the "snp" property.


>
>> Based on this, we can introduce a new filed 'snp-policy'.
> Yes, people are bound to confuse them if they're not clearly separated;
> although I guess whatever comes after SNP will probably share that
> longer field?


I am keeping my finger crossed on it. I hope that in future they will
share it.

-Brijesh




Re: [RFC PATCH 2/6] i386/sev: extend sev-guest property to include SEV-SNP

2021-07-14 Thread Brijesh Singh


On 7/13/21 8:46 AM, Markus Armbruster wrote:
> Brijesh Singh  writes:
>
>> To launch the SEV-SNP guest, a user can specify up to 8 parameters.
>> Passing all parameters through command line can be difficult. To simplify
>> the launch parameter passing, introduce a .ini-like config file that can be
>> used for passing the parameters to the launch flow.
>>
>> The contents of the config file will look like this:
>>
>> $ cat snp-launch.init
>>
>> # SNP launch parameters
>> [SEV-SNP]
>> init_flags = 0
>> policy = 0x1000
>> id_block = "YWFhYWFhYWFhYWFhYWFhCg=="
>>
>>
>> Add 'snp' property that can be used to indicate that SEV guest launch
>> should enable the SNP support.
>>
>> SEV-SNP guest launch examples:
>>
>> 1) launch without additional parameters
>>
>>   $(QEMU_CLI) \
>> -object sev-guest,id=sev0,snp=on
>>
>> 2) launch with optional parameters
>>   $(QEMU_CLI) \
>> -object sev-guest,id=sev0,snp=on,launch-config=
>>
>> Signed-off-by: Brijesh Singh 
> I acknowledge doing complex configuration on the command line can be
> awkward.  But if we added a separate configuration file for every
> configurable thing where that's the case, we'd have too many already,
> and we'd constantly grow more.  I don't think this is a viable solution.
>
> In my opinion, much of what we do on the command line should be done in
> configuration files instead.  Not in several different configuration
> languages, mind, but using one common language for all our configuration
> needs.
>
> Some of us argue this language already exists: QMP.  It can't do
> everything the command line can do, but that's a matter of putting in
> the work.  However, JSON isn't a good configuration language[1].  To get
> a decent one, we'd have to to extend JSON[2], or wrap another concrete
> syntax around QMP's abstract syntax.
>
> But this doesn't help you at all *now*.
>
> I recommend to do exactly what we've done before for complex
> configuration: define it in the QAPI schema, so we can use both dotted
> keys and JSON on the command line, and can have QMP, too.  Examples:
> -blockdev, -display, -compat.
>
> Questions?


I will take a look at the blockdev and try modeling after that. if I run
into any questions then I will ask. thanks for the pointer Markus.

-Brijesh



Re: [RFC PATCH 0/6] Add AMD Secure Nested Paging (SEV-SNP) support

2021-07-13 Thread Brijesh Singh




On 7/13/21 3:31 AM, Dr. David Alan Gilbert wrote:

adding it to QMP as well (unles sit's purely for debug and may change).


We have query-sev QMP, I will extend to add a new 'snp: bool' field.

thanks



Re: [RFC PATCH 0/6] Add AMD Secure Nested Paging (SEV-SNP) support

2021-07-13 Thread Brijesh Singh




On 7/13/21 3:05 AM, Dov Murik wrote:>

Particularly confusing is the `policy` attribute which is only relevant
for SEV / SEV-ES, while there's a new `snp.policy` attribute for SNP...
Maybe the irrelevant attributes should not be added to the tree when not
in SNP.


The policy fields are also applicable to the SNP. The main difference are:

- in SEV/SEV-ES the policy is 32-bit compare to 64-bit value in SEV-SNP. 
However, for SEV-SNP spec uses lower 32-bit value and higher bits are 
marked reserved.


- the bit field meaning are different

Based on this, we can introduce a new filed 'snp-policy'.

-Brijesh



Re: [RFC PATCH 2/6] i386/sev: extend sev-guest property to include SEV-SNP

2021-07-13 Thread Brijesh Singh




On 7/12/21 11:24 AM, Daniel P. Berrangé wrote:>>

policy: 8 bytes
flags: 8 bytes
id_block: 96 bytes
id_auth: 4096 bytes
host_data: 32 bytes
gosvw: 16 bytes


Only the id_auth parameter is really considered large here.

When you say "up to a page size", that implies that the size is
actually variable.  Is that correct, and if so, what is a real
world common size going to be ? Is the common size much smaller
than this upper limit ? If so I'd just ignore the issue entirely.


Looking at the recent spec, it appears that id_auth is fixed to 4K.



If not, then, 4k on the command line is certainly ugly, but isn't
technically impossible. It would be similarly ugly to have this
value stuffed into a libvirt XML configuration for that matter.

One option is to supply only that one parameter via an external
file, with the file being an opaque blob whose context is the
parameter value thus avoiding inventing a custom file format
parser.

When "id_auth" is described as "authentication data", are there
any secrecy requirements around this ?



Yes this sounds much better, we have been using the similar approach for 
the SEV in which we pass the PDH and session blob through the file.




QEMU does have the '-object secret' framework for passing blobs
of sensitive data to QEMU and can allow passing via a file:

   
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fqemu-project.gitlab.io%2Fqemu%2Fsystem%2Fsecrets.htmldata=04%7C01%7Cbrijesh.singh%40amd.com%7C891fdc1ab0d8483aecb808d945519054%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637617038899405482%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=8AHUC3DeyauxT4Pd2ZkUJkDyu9XHtexybM0BgdRlego%3Dreserved=0

Even if this doesn't actually need to be kept private, we
could decide to simply (ab)use the 'secret' object anyway
as a way to let it be passed in out of band via a file.



The content of the field does not need to be protected. It's a public 
information, so I am not sure the secrets object fits here.


thanks



Re: [RFC PATCH 2/6] i386/sev: extend sev-guest property to include SEV-SNP

2021-07-12 Thread Brijesh Singh




On 7/12/21 9:34 AM, Dr. David Alan Gilbert wrote:


$ cat snp-launch.init

# SNP launch parameters
[SEV-SNP]
init_flags = 0
policy = 0x1000
id_block = "YWFhYWFhYWFhYWFhYWFhCg=="


Wouldn't the 'gosvw' and 'hostdata' also be in there?



I did not included all the 8 parameters in the commit messages, mainly 
because some of them are big. I just picked 3 smaller ones.


-Brijesh



Re: [RFC PATCH 2/6] i386/sev: extend sev-guest property to include SEV-SNP

2021-07-12 Thread Brijesh Singh




On 7/12/21 9:43 AM, Daniel P. Berrangé wrote:

On Fri, Jul 09, 2021 at 04:55:46PM -0500, Brijesh Singh wrote:

To launch the SEV-SNP guest, a user can specify up to 8 parameters.
Passing all parameters through command line can be difficult.


This sentence applies to pretty much everything in QEMU and the
SEV-SNP example is nowhere near an extreme example IMHO.


  To simplify
the launch parameter passing, introduce a .ini-like config file that can be
used for passing the parameters to the launch flow.


Inventing a new config file format for usage by just one specific
niche feature in QEMU is something I'd say we do not want.

Our long term goal in QEMU is to move to a world where 100% of
QEMU configuration is provided in JSON format, using the QAPI
schema to define the accepted input set.



I am open to all suggestions. I was trying to avoid passing all these 
parameters through the command line because some of them can be huge (up 
to a page size)





The contents of the config file will look like this:

$ cat snp-launch.init

# SNP launch parameters
[SEV-SNP]
init_flags = 0
policy = 0x1000
id_block = "YWFhYWFhYWFhYWFhYWFhCg=="


These parameters are really tiny and trivial to provide on the command
line, so I'm not finding this config file compelling.



I have only included 3 small parameters. Other parameters can be up to a 
page size. The breakdown looks like this:


policy: 8 bytes
flags: 8 bytes
id_block: 96 bytes
id_auth: 4096 bytes
host_data: 32 bytes
gosvw: 16 bytes






Add 'snp' property that can be used to indicate that SEV guest launch
should enable the SNP support.

SEV-SNP guest launch examples:

1) launch without additional parameters

   $(QEMU_CLI) \
 -object sev-guest,id=sev0,snp=on

2) launch with optional parameters
   $(QEMU_CLI) \
 -object sev-guest,id=sev0,snp=on,launch-config=

Signed-off-by: Brijesh Singh 
---
  docs/amd-memory-encryption.txt |  81 +++-
  qapi/qom.json  |   6 +
  target/i386/sev.c  | 227 +
  3 files changed, 312 insertions(+), 2 deletions(-)


Regards,
Daniel





Re: [RFC PATCH 1/6] linux-header: add the SNP specific command

2021-07-12 Thread Brijesh Singh




On 7/10/21 3:32 PM, Michael S. Tsirkin wrote:

On Fri, Jul 09, 2021 at 04:55:45PM -0500, Brijesh Singh wrote:

Sync the kvm.h with the kernel to include the SNP specific commands.

Signed-off-by: Brijesh Singh 


Pls specify which kernel version you used for the sync.



This sync is based on the my guest kernel rfc patches (5.13-rc6). After 
the guest patches are accepted then will include the exact linux kernel 
version.




[RFC PATCH 0/6] Add AMD Secure Nested Paging (SEV-SNP) support

2021-07-09 Thread Brijesh Singh
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like data
replay, memory re-mapping and more in order to create an isolated memory
encryption environment.

The patches to support the SEV-SNP in Linux kernel and OVMF are available:
https://lore.kernel.org/kvm/20210707181506.30489-1-brijesh.si...@amd.com/
https://lore.kernel.org/kvm/20210707183616.5620-1-brijesh.si...@amd.com/
https://edk2.groups.io/g/devel/message/77335?p=,,,20,0,0,0::Created,,posterid%3A5969970,20,2,20,83891508

The Qemu patches uses the command id added by the SEV-SNP hypervisor
patches to bootstrap the SEV-SNP VMs.

TODO:
 * Add support to filter CPUID values through the PSP.

Additional resources
-
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf

APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf (section 15.36)

GHCB spec:
https://developer.amd.com/wp-content/resources/56421.pdf

SEV-SNP firmware specification:
https://www.amd.com/system/files/TechDocs/56860.pdf

Brijesh Singh (6):
  linux-header: add the SNP specific command
  i386/sev: extend sev-guest property to include SEV-SNP
  i386/sev: initialize SNP context
  i386/sev: add the SNP launch start context
  i386/sev: add support to encrypt BIOS when SEV-SNP is enabled
  i386/sev: populate secrets and cpuid page and finalize the SNP launch

 docs/amd-memory-encryption.txt |  81 +-
 linux-headers/linux/kvm.h  |  47 
 qapi/qom.json  |   6 +
 target/i386/sev.c  | 498 -
 target/i386/sev_i386.h |   1 +
 target/i386/trace-events   |   4 +
 6 files changed, 628 insertions(+), 9 deletions(-)

-- 
2.17.1




[RFC PATCH 5/6] i386/sev: add support to encrypt BIOS when SEV-SNP is enabled

2021-07-09 Thread Brijesh Singh
The KVM_SEV_SNP_LAUNCH_UPDATE command is used for encrypting the bios
image used for booting the SEV-SNP guest.

Signed-off-by: Brijesh Singh 
---
 target/i386/sev.c| 33 -
 target/i386/trace-events |  1 +
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 259408a8f1..41dcb084d1 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -883,6 +883,30 @@ out:
 return ret;
 }
 
+static int
+sev_snp_launch_update(SevGuestState *sev, uint8_t *addr, uint64_t len, int 
type)
+{
+int ret, fw_error;
+struct kvm_sev_snp_launch_update update = {};
+
+if (!addr || !len) {
+return 1;
+}
+
+update.uaddr = (__u64)(unsigned long)addr;
+update.len = len;
+update.page_type = type;
+trace_kvm_sev_snp_launch_update(addr, len, type);
+ret = sev_ioctl(sev->sev_fd, KVM_SEV_SNP_LAUNCH_UPDATE,
+, _error);
+if (ret) {
+error_report("%s: SNP_LAUNCH_UPDATE ret=%d fw_error=%d '%s'",
+__func__, ret, fw_error, fw_error_to_str(fw_error));
+}
+
+return ret;
+}
+
 static int
 sev_launch_update_data(SevGuestState *sev, uint8_t *addr, uint64_t len)
 {
@@ -1161,7 +1185,14 @@ sev_encrypt_flash(uint8_t *ptr, uint64_t len, Error 
**errp)
 
 /* if SEV is in update state then encrypt the data else do nothing */
 if (sev_check_state(sev_guest, SEV_STATE_LAUNCH_UPDATE)) {
-int ret = sev_launch_update_data(sev_guest, ptr, len);
+int ret;
+
+if (sev_snp_enabled()) {
+ret = sev_snp_launch_update(sev_guest, ptr, len,
+KVM_SEV_SNP_PAGE_TYPE_NORMAL);
+} else {
+ret = sev_launch_update_data(sev_guest, ptr, len);
+}
 if (ret < 0) {
 error_setg(errp, "failed to encrypt pflash rom");
 return ret;
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 18cc14b956..0c2d250206 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -12,3 +12,4 @@ kvm_sev_launch_finish(void) ""
 kvm_sev_launch_secret(uint64_t hpa, uint64_t hva, uint64_t secret, int len) 
"hpa 0x%" PRIx64 " hva 0x%" PRIx64 " data 0x%" PRIx64 " len %d"
 kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s 
data %s"
 kvm_sev_snp_launch_start(uint64_t policy) "policy 0x%" PRIx64
+kvm_sev_snp_launch_update(void *addr, uint64_t len, int type) "addr %p len 
0x%" PRIx64 " type %d"
-- 
2.17.1




[RFC PATCH 6/6] i386/sev: populate secrets and cpuid page and finalize the SNP launch

2021-07-09 Thread Brijesh Singh
During the SNP guest launch sequence, a special secrets and cpuid page
needs to be populated by the SEV-SNP firmware. The secrets page contains
the VM Platform Communication Key (VMPCKs) used by the guest to send and
receive secure messages to the PSP. And CPUID page will contain the CPUID
value filtered through the PSP.

The guest BIOS (OVMF) reserves these pages in MEMFD and location of it
is available through the SNP boot block GUID. While finalizing the guest
boot flow, lookup for the boot block and call the SNP_LAUNCH_UPDATE
command to populate secrets and cpuid pages.

In order to support early boot code, the OVMF may ask hypervisor to
request the pre-validation of certain memory range. If such range is
present the call LAUNCH_UPDATE command to validate those address range
without affecting the measurement. See the SEV-SNP specification for
further details.

Finally, call the SNP_LAUNCH_FINISH to finalize the guest boot.

Signed-off-by: Brijesh Singh 
---
 target/i386/sev.c| 184 ++-
 target/i386/trace-events |   2 +
 2 files changed, 184 insertions(+), 2 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 41dcb084d1..f438e09d33 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -93,6 +93,19 @@ typedef struct __attribute__((__packed__)) SevInfoBlock {
 uint32_t reset_addr;
 } SevInfoBlock;
 
+#define SEV_SNP_BOOT_BLOCK_GUID "bd39c0c2-2f8e-4243-83e8-1b74cebcb7d9"
+typedef struct __attribute__((__packed__)) SevSnpBootInfoBlock {
+/* Prevalidate range address */
+uint32_t pre_validated_start;
+uint32_t pre_validated_end;
+/* Secrets page address */
+uint32_t secrets_addr;
+uint32_t secrets_len;
+/* CPUID page address */
+uint32_t cpuid_addr;
+uint32_t cpuid_len;
+} SevSnpBootInfoBlock;
+
 static SevGuestState *sev_guest;
 static Error *sev_mig_blocker;
 
@@ -1014,6 +1027,158 @@ static Notifier sev_machine_done_notify = {
 .notify = sev_launch_get_measure,
 };
 
+static int
+sev_snp_launch_update_gpa(uint32_t hwaddr, uint32_t size, uint8_t type)
+{
+void *hva;
+MemoryRegion *mr = NULL;
+
+hva = gpa2hva(, hwaddr, size, NULL);
+if (!hva) {
+error_report("SEV-SNP failed to get HVA for GPA 0x%x", hwaddr);
+return 1;
+}
+
+return sev_snp_launch_update(sev_guest, hva, size, type);
+}
+
+struct snp_pre_validated_range {
+uint32_t start;
+uint32_t end;
+};
+
+static struct snp_pre_validated_range pre_validated[2];
+
+static bool
+detectoverlap(uint32_t start, uint32_t end,
+  struct snp_pre_validated_range *overlap)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(pre_validated); i++) {
+if (pre_validated[i].start < end && start < pre_validated[i].end) {
+memcpy(overlap, _validated[i], sizeof(*overlap));
+return true;
+}
+}
+
+return false;
+}
+
+static void snp_ovmf_boot_block_setup(void)
+{
+struct snp_pre_validated_range overlap;
+SevSnpBootInfoBlock *info;
+uint32_t start, end, sz;
+int ret;
+
+/*
+ * Extract the SNP boot block for the SEV-SNP guests by locating the
+ * SNP_BOOT GUID. The boot block contains the information such as location
+ * of secrets and CPUID page, additionaly it may contain the range of
+ * memory that need to be pre-validated for the boot.
+ */
+if (!pc_system_ovmf_table_find(SEV_SNP_BOOT_BLOCK_GUID,
+(uint8_t **), NULL)) {
+error_report("SEV-SNP: failed to find the SNP boot block");
+exit(1);
+}
+
+trace_kvm_sev_snp_ovmf_boot_block_info(info->secrets_addr,
+   info->secrets_len, info->cpuid_addr,
+   info->cpuid_len,
+   info->pre_validated_start,
+   info->pre_validated_end);
+
+/* Populate the secrets page */
+ret = sev_snp_launch_update_gpa(info->secrets_addr, info->secrets_len,
+KVM_SEV_SNP_PAGE_TYPE_SECRETS);
+if (ret) {
+error_report("SEV-SNP: failed to insert secret page GPA 0x%x",
+ info->secrets_addr);
+exit(1);
+}
+
+/* Populate the cpuid page */
+ret = sev_snp_launch_update_gpa(info->cpuid_addr, info->cpuid_len,
+KVM_SEV_SNP_PAGE_TYPE_CPUID);
+if (ret) {
+error_report("SEV-SNP: failed to insert cpuid page GPA 0x%x",
+ info->cpuid_addr);
+exit(1);
+}
+
+/*
+ * Pre-validate the range using the LAUNCH_UPDATE_DATA, if the
+ * pre-validation range contains the CPUID and Secret page GPA then skip
+ * it. This is because SEV-SNP firmware pre-validates those pages as part
+ * of adding secrets and cpuid LAUNCH_UPDATE type.
+ */

[RFC PATCH 4/6] i386/sev: add the SNP launch start context

2021-07-09 Thread Brijesh Singh
The SNP_LAUNCH_START is called first to create a cryptographic launch
context within the firmware.

Signed-off-by: Brijesh Singh 
---
 target/i386/sev.c| 30 +-
 target/i386/trace-events |  1 +
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 84ae244af0..259408a8f1 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -812,6 +812,29 @@ sev_read_file_base64(const char *filename, guchar **data, 
gsize *len)
 return 0;
 }
 
+static int
+sev_snp_launch_start(SevGuestState *sev)
+{
+int ret = 1;
+int fw_error, rc;
+struct kvm_sev_snp_launch_start *start = >snp_config.start;
+
+trace_kvm_sev_snp_launch_start(start->policy);
+
+rc = sev_ioctl(sev->sev_fd, KVM_SEV_SNP_LAUNCH_START, start, _error);
+if (rc < 0) {
+error_report("%s: SNP_LAUNCH_START ret=%d fw_error=%d '%s'",
+__func__, ret, fw_error, fw_error_to_str(fw_error));
+goto out;
+}
+
+sev_set_guest_state(sev, SEV_STATE_LAUNCH_UPDATE);
+ret = 0;
+
+out:
+return ret;
+}
+
 static int
 sev_launch_start(SevGuestState *sev)
 {
@@ -1105,7 +1128,12 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
 goto err;
 }
 
-ret = sev_launch_start(sev);
+if (sev_snp_enabled()) {
+ret = sev_snp_launch_start(sev);
+} else {
+ret = sev_launch_start(sev);
+}
+
 if (ret) {
 error_setg(errp, "%s: failed to create encryption context", __func__);
 goto err;
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 2cd8726eeb..18cc14b956 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -11,3 +11,4 @@ kvm_sev_launch_measurement(const char *value) "data %s"
 kvm_sev_launch_finish(void) ""
 kvm_sev_launch_secret(uint64_t hpa, uint64_t hva, uint64_t secret, int len) 
"hpa 0x%" PRIx64 " hva 0x%" PRIx64 " data 0x%" PRIx64 " len %d"
 kvm_sev_attestation_report(const char *mnonce, const char *data) "mnonce %s 
data %s"
+kvm_sev_snp_launch_start(uint64_t policy) "policy 0x%" PRIx64
-- 
2.17.1




[RFC PATCH 3/6] i386/sev: initialize SNP context

2021-07-09 Thread Brijesh Singh
When SEV-SNP is enabled, the KVM_SNP_INIT command is used to initialize
the platform. The command checks whether SNP is enabled in the KVM, if
enabled then it allocate a new ASID from the SNP pool and calls the
firmware to initialize the all the resources.

Signed-off-by: Brijesh Singh 
---
 target/i386/sev.c  | 24 +---
 target/i386/sev_i386.h |  1 +
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 6b238ef969..84ae244af0 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -583,10 +583,17 @@ sev_enabled(void)
 return !!sev_guest;
 }
 
+bool
+sev_snp_enabled(void)
+{
+return sev_guest->snp;
+}
+
 bool
 sev_es_enabled(void)
 {
-return sev_enabled() && (sev_guest->policy & SEV_POLICY_ES);
+return sev_snp_enabled() ||
+   (sev_enabled() && (sev_guest->policy & SEV_POLICY_ES));
 }
 
 uint64_t
@@ -1008,6 +1015,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
 uint32_t ebx;
 uint32_t host_cbitpos;
 struct sev_user_data_status status = {};
+void *init_args = NULL;
 
 if (!sev) {
 return 0;
@@ -1061,7 +1069,17 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
 sev->api_major = status.api_major;
 sev->api_minor = status.api_minor;
 
-if (sev_es_enabled()) {
+if (sev_snp_enabled()) {
+if (!kvm_kernel_irqchip_allowed()) {
+error_report("%s: SEV-SNP guests require in-kernel irqchip 
support",
+ __func__);
+goto err;
+}
+
+cmd = KVM_SEV_SNP_INIT;
+init_args = (void *)>snp_config.init;
+
+} else if (sev_es_enabled()) {
 if (!kvm_kernel_irqchip_allowed()) {
 error_report("%s: SEV-ES guests require in-kernel irqchip support",
  __func__);
@@ -1080,7 +1098,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
 }
 
 trace_kvm_sev_init();
-ret = sev_ioctl(sev->sev_fd, cmd, NULL, _error);
+ret = sev_ioctl(sev->sev_fd, cmd, init_args, _error);
 if (ret) {
 error_setg(errp, "%s: failed to initialize ret=%d fw_error=%d '%s'",
__func__, ret, fw_error, fw_error_to_str(fw_error));
diff --git a/target/i386/sev_i386.h b/target/i386/sev_i386.h
index ae6d840478..e0e1a599be 100644
--- a/target/i386/sev_i386.h
+++ b/target/i386/sev_i386.h
@@ -29,6 +29,7 @@
 #define SEV_POLICY_SEV  0x20
 
 extern bool sev_es_enabled(void);
+extern bool sev_snp_enabled(void);
 extern uint64_t sev_get_me_mask(void);
 extern SevInfo *sev_get_info(void);
 extern uint32_t sev_get_cbit_position(void);
-- 
2.17.1




[RFC PATCH 2/6] i386/sev: extend sev-guest property to include SEV-SNP

2021-07-09 Thread Brijesh Singh
To launch the SEV-SNP guest, a user can specify up to 8 parameters.
Passing all parameters through command line can be difficult. To simplify
the launch parameter passing, introduce a .ini-like config file that can be
used for passing the parameters to the launch flow.

The contents of the config file will look like this:

$ cat snp-launch.init

# SNP launch parameters
[SEV-SNP]
init_flags = 0
policy = 0x1000
id_block = "YWFhYWFhYWFhYWFhYWFhCg=="


Add 'snp' property that can be used to indicate that SEV guest launch
should enable the SNP support.

SEV-SNP guest launch examples:

1) launch without additional parameters

  $(QEMU_CLI) \
-object sev-guest,id=sev0,snp=on

2) launch with optional parameters
  $(QEMU_CLI) \
-object sev-guest,id=sev0,snp=on,launch-config=

Signed-off-by: Brijesh Singh 
---
 docs/amd-memory-encryption.txt |  81 +++-
 qapi/qom.json  |   6 +
 target/i386/sev.c  | 227 +
 3 files changed, 312 insertions(+), 2 deletions(-)

diff --git a/docs/amd-memory-encryption.txt b/docs/amd-memory-encryption.txt
index ffca382b5f..322bf38f68 100644
--- a/docs/amd-memory-encryption.txt
+++ b/docs/amd-memory-encryption.txt
@@ -22,8 +22,8 @@ support for notifying a guest's operating system when certain 
types of VMEXITs
 are about to occur. This allows the guest to selectively share information with
 the hypervisor to satisfy the requested function.
 
-Launching
--
+Launching (SEV and SEV-ES)
+--
 Boot images (such as bios) must be encrypted before a guest can be booted. The
 MEMORY_ENCRYPT_OP ioctl provides commands to encrypt the images: LAUNCH_START,
 LAUNCH_UPDATE_DATA, LAUNCH_MEASURE and LAUNCH_FINISH. These four commands
@@ -113,6 +113,83 @@ a SEV-ES guest:
  - Requires in-kernel irqchip - the burden is placed on the hypervisor to
manage booting APs.
 
+Launching (SEV-SNP)
+---
+Boot images (such as bios) must be encrypted before a guest can be booted. The
+MEMORY_ENCRYPT_OP ioctl provides commands to encrypt the images:
+KVM_SNP_INIT, SNP_LAUNCH_START, SNP_LAUNCH_UPDATE, and SNP_LAUNCH_FINISH. These
+four commands together generate a fresh memory encryption key for the VM,
+encrypt the boot images for a successful launch.
+
+KVM_SNP_INIT is called first to initialize the SEV-SNP firmware and SNP
+features in the KVM. The feature flags value can be provided through the
+launch-config file.
+
+++---+--+-+
+| key| type  | default  | meaning |
+++---+--+-+
+| init_flags | hex   | 0| SNP feature flags   |
++-+
+
+Note: currently the init_flags must be zero.
+
+SNP_LAUNCH_START is called first to create a cryptographic launch context
+within the firmware. To create this context, guest owner must provide a guest
+policy and other parameters as described in the SEV-SNP firmware
+specification. The launch parameters should be specified in the launch-config
+ini file and should be treated as a binary blob and must be passed as-is to
+the SEV-SNP firmware.
+
+The SNP_LAUNCH_START uses the following parameters from the launch-config
+file. See the SEV-SNP specification for more details.
+
+++---+--+--+
+| key| type  | default  | meaning  |
+++---+--+--+
+| policy | hex   | 0x3  | a 64-bit guest policy|
+| imi_en | bool  | 0| 1 when IMI is enabled|
+| ma_end | bool  | 0| 1 when migration agent is used   |
+| gosvw  | string| 0| 16-byte base64 encoded string for the guest  |
+||   |  | OS visible workaround.   |
+++---+--+--+
+
+SNP_LAUNCH_UPDATE encrypts the memory region using the cryptographic context
+created via the SNP_LAUNCH_START command. If required, this command can be 
called
+multiple times to encrypt different memory regions. The command also calculates
+the measurement of the memory contents as it encrypts.
+
+SNP_LAUNCH_FINISH finalizes the guest launch flow. Optionally, while finalizing
+the launch the firmware can perform checks on the launch digest computing
+through the SNP_LAUNCH_UPDATE. To perform the check the user must supply
+the id block, authentication blob and host data that should be included in the
+attestation report. See the SEV-SNP spec for further details.
+
+The SNP_LAUNCH_FINISH uses the following parameters from the launch-config 
file.
+
+++---+--+--+
+| key| type  | default 

[RFC PATCH 1/6] linux-header: add the SNP specific command

2021-07-09 Thread Brijesh Singh
Sync the kvm.h with the kernel to include the SNP specific commands.

Signed-off-by: Brijesh Singh 
---
 linux-headers/linux/kvm.h | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 20d6a263bb..c17ace1ece 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1679,6 +1679,12 @@ enum sev_cmd_id {
/* Guest Migration Extension */
KVM_SEV_SEND_CANCEL,
 
+   /* SNP specific commands */
+   KVM_SEV_SNP_INIT = 256,
+   KVM_SEV_SNP_LAUNCH_START,
+   KVM_SEV_SNP_LAUNCH_UPDATE,
+   KVM_SEV_SNP_LAUNCH_FINISH,
+
KVM_SEV_NR_MAX,
 };
 
@@ -1775,6 +1781,47 @@ struct kvm_sev_receive_update_data {
__u32 trans_len;
 };
 
+struct kvm_snp_init {
+   __u64 flags;
+};
+
+struct kvm_sev_snp_launch_start {
+   __u64 policy;
+   __u64 ma_uaddr;
+   __u8 ma_en;
+   __u8 imi_en;
+   __u8 gosvw[16];
+};
+
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL   0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA 0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED   0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS  0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID0x6
+
+struct kvm_sev_snp_launch_update {
+   __u64 uaddr;
+   __u32 len;
+   __u8 imi_page;
+   __u8 page_type;
+   __u8 vmpl3_perms;
+   __u8 vmpl2_perms;
+   __u8 vmpl1_perms;
+};
+
+#define KVM_SEV_SNP_ID_BLOCK_SIZE  96
+#define KVM_SEV_SNP_ID_AUTH_SIZE   4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE   32
+
+struct kvm_sev_snp_launch_finish {
+   __u64 id_block_uaddr;
+   __u64 id_auth_uaddr;
+   __u8 id_block_en;
+   __u8 auth_key_en;
+   __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX   (1 << 2)
-- 
2.17.1




Re: [PATCH] sev: sev_get_attestation_report use g_autofree

2021-06-03 Thread Brijesh Singh


On 6/3/21 6:30 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
>
> Removes a whole bunch of g_free's and a goto.
>
> Signed-off-by: Dr. David Alan Gilbert 


Reviewed-by: Brijesh Singh 

thanks

> ---
>  target/i386/sev.c | 11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/target/i386/sev.c b/target/i386/sev.c
> index 83df8c09f6..0bd976b4d0 100644
> --- a/target/i386/sev.c
> +++ b/target/i386/sev.c
> @@ -500,8 +500,8 @@ sev_get_attestation_report(const char *mnonce, Error 
> **errp)
>  struct kvm_sev_attestation_report input = {};
>  SevAttestationReport *report = NULL;
>  SevGuestState *sev = sev_guest;
> -guchar *data;
> -guchar *buf;
> +g_autofree guchar *data = NULL;
> +g_autofree guchar *buf = NULL;
>  gsize len;
>  int err = 0, ret;
>  
> @@ -521,7 +521,6 @@ sev_get_attestation_report(const char *mnonce, Error 
> **errp)
>  if (len != sizeof(input.mnonce)) {
>  error_setg(errp, "SEV: mnonce must be %zu bytes (got %" 
> G_GSIZE_FORMAT ")",
>  sizeof(input.mnonce), len);
> -g_free(buf);
>  return NULL;
>  }
>  
> @@ -532,7 +531,6 @@ sev_get_attestation_report(const char *mnonce, Error 
> **errp)
>  if (err != SEV_RET_INVALID_LEN) {
>  error_setg(errp, "failed to query the attestation report length "
>  "ret=%d fw_err=%d (%s)", ret, err, fw_error_to_str(err));
> -g_free(buf);
>  return NULL;
>  }
>  }
> @@ -547,7 +545,7 @@ sev_get_attestation_report(const char *mnonce, Error 
> **errp)
>  if (ret) {
>  error_setg_errno(errp, errno, "Failed to get attestation report"
>  " ret=%d fw_err=%d (%s)", ret, err, fw_error_to_str(err));
> -goto e_free_data;
> +return NULL;
>  }
>  
>  report = g_new0(SevAttestationReport, 1);
> @@ -555,9 +553,6 @@ sev_get_attestation_report(const char *mnonce, Error 
> **errp)
>  
>  trace_kvm_sev_attestation_report(mnonce, report->data);
>  
> -e_free_data:
> -g_free(data);
> -g_free(buf);
>  return report;
>  }
>  



Re: [PATCH v3] target/i386/sev: add support to query the attestation report

2021-05-21 Thread Brijesh Singh
Hi,

Ping. Please let me know if you have any feedback on this patch.

Thanks

On 4/29/21 12:07 PM, Brijesh Singh wrote:
> The SEV FW >= 0.23 added a new command that can be used to query the
> attestation report containing the SHA-256 digest of the guest memory
> and VMSA encrypted with the LAUNCH_UPDATE and sign it with the PEK.
>
> Note, we already have a command (LAUNCH_MEASURE) that can be used to
> query the SHA-256 digest of the guest memory encrypted through the
> LAUNCH_UPDATE. The main difference between previous and this command
> is that the report is signed with the PEK and unlike the LAUNCH_MEASURE
> command the ATTESATION_REPORT command can be called while the guest
> is running.
>
> Add a QMP interface "query-sev-attestation-report" that can be used
> to get the report encoded in base64.
>
> Cc: James Bottomley 
> Cc: Tom Lendacky 
> Cc: Eric Blake 
> Cc: Paolo Bonzini 
> Cc: k...@vger.kernel.org
> Reviewed-by: James Bottomley 
> Tested-by: James Bottomley 
> Signed-off-by: Brijesh Singh 
> ---
> v3:
>   * free the buffer in error path.
>
> v2:
>   * add trace event.
>   * fix the goto to return NULL on failure.
>   * make the mnonce as a base64 encoded string
>
>  linux-headers/linux/kvm.h |  8 +
>  qapi/misc-target.json | 38 ++
>  target/i386/monitor.c |  6 
>  target/i386/sev-stub.c|  7 
>  target/i386/sev.c | 67 +++
>  target/i386/sev_i386.h|  2 ++
>  target/i386/trace-events  |  1 +
>  7 files changed, 129 insertions(+)
>
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index 020b62a619..897f831374 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -1591,6 +1591,8 @@ enum sev_cmd_id {
>   KVM_SEV_DBG_ENCRYPT,
>   /* Guest certificates commands */
>   KVM_SEV_CERT_EXPORT,
> + /* Attestation report */
> + KVM_SEV_GET_ATTESTATION_REPORT,
>  
>   KVM_SEV_NR_MAX,
>  };
> @@ -1643,6 +1645,12 @@ struct kvm_sev_dbg {
>   __u32 len;
>  };
>  
> +struct kvm_sev_attestation_report {
> + __u8 mnonce[16];
> + __u64 uaddr;
> + __u32 len;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU  (1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3   (1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
> diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> index 0c7491cd82..4b62f0ac05 100644
> --- a/qapi/misc-target.json
> +++ b/qapi/misc-target.json
> @@ -285,3 +285,41 @@
>  ##
>  { 'command': 'query-gic-capabilities', 'returns': ['GICCapability'],
>'if': 'defined(TARGET_ARM)' }
> +
> +
> +##
> +# @SevAttestationReport:
> +#
> +# The struct describes attestation report for a Secure Encrypted 
> Virtualization
> +# feature.
> +#
> +# @data:  guest attestation report (base64 encoded)
> +#
> +#
> +# Since: 6.1
> +##
> +{ 'struct': 'SevAttestationReport',
> +  'data': { 'data': 'str'},
> +  'if': 'defined(TARGET_I386)' }
> +
> +##
> +# @query-sev-attestation-report:
> +#
> +# This command is used to get the SEV attestation report, and is supported 
> on AMD
> +# X86 platforms only.
> +#
> +# @mnonce: a random 16 bytes value encoded in base64 (it will be included in 
> report)
> +#
> +# Returns: SevAttestationReport objects.
> +#
> +# Since: 6.1
> +#
> +# Example:
> +#
> +# -> { "execute" : "query-sev-attestation-report", "arguments": { "mnonce": 
> "aaa" } }
> +# <- { "return" : { "data": "bbbd"} }
> +#
> +##
> +{ 'command': 'query-sev-attestation-report', 'data': { 'mnonce': 'str' },
> +  'returns': 'SevAttestationReport',
> +  'if': 'defined(TARGET_I386)' }
> diff --git a/target/i386/monitor.c b/target/i386/monitor.c
> index 5994408bee..119211f0b0 100644
> --- a/target/i386/monitor.c
> +++ b/target/i386/monitor.c
> @@ -757,3 +757,9 @@ void qmp_sev_inject_launch_secret(const char *packet_hdr,
>  
>  sev_inject_launch_secret(packet_hdr, secret, gpa, errp);
>  }
> +
> +SevAttestationReport *
> +qmp_query_sev_attestation_report(const char *mnonce, Error **errp)
> +{
> +return sev_get_attestation_report(mnonce, errp);
> +}
> diff --git a/target/i386/sev-stub.c b/target/i386/sev-stub.c
> index 0207f1c5aa..0227cb5177 100644
> --- a/target/i386/sev-stub.c
> +++ b/target/i386/sev-stub.c
> @@ -74,3 +74,10 @@ int sev_es_save_reset_vector(void *flash_ptr, uint64_t 
> flash_size)
>  {
>  abort();
>  }
> +
> +SevAttestationReport *
> +sev_get_attestation_report

[PATCH v3] target/i386/sev: add support to query the attestation report

2021-04-29 Thread Brijesh Singh
The SEV FW >= 0.23 added a new command that can be used to query the
attestation report containing the SHA-256 digest of the guest memory
and VMSA encrypted with the LAUNCH_UPDATE and sign it with the PEK.

Note, we already have a command (LAUNCH_MEASURE) that can be used to
query the SHA-256 digest of the guest memory encrypted through the
LAUNCH_UPDATE. The main difference between previous and this command
is that the report is signed with the PEK and unlike the LAUNCH_MEASURE
command the ATTESATION_REPORT command can be called while the guest
is running.

Add a QMP interface "query-sev-attestation-report" that can be used
to get the report encoded in base64.

Cc: James Bottomley 
Cc: Tom Lendacky 
Cc: Eric Blake 
Cc: Paolo Bonzini 
Cc: k...@vger.kernel.org
Reviewed-by: James Bottomley 
Tested-by: James Bottomley 
Signed-off-by: Brijesh Singh 
---
v3:
  * free the buffer in error path.

v2:
  * add trace event.
  * fix the goto to return NULL on failure.
  * make the mnonce as a base64 encoded string

 linux-headers/linux/kvm.h |  8 +
 qapi/misc-target.json | 38 ++
 target/i386/monitor.c |  6 
 target/i386/sev-stub.c|  7 
 target/i386/sev.c | 67 +++
 target/i386/sev_i386.h|  2 ++
 target/i386/trace-events  |  1 +
 7 files changed, 129 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 020b62a619..897f831374 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1591,6 +1591,8 @@ enum sev_cmd_id {
KVM_SEV_DBG_ENCRYPT,
/* Guest certificates commands */
KVM_SEV_CERT_EXPORT,
+   /* Attestation report */
+   KVM_SEV_GET_ATTESTATION_REPORT,
 
KVM_SEV_NR_MAX,
 };
@@ -1643,6 +1645,12 @@ struct kvm_sev_dbg {
__u32 len;
 };
 
+struct kvm_sev_attestation_report {
+   __u8 mnonce[16];
+   __u64 uaddr;
+   __u32 len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX   (1 << 2)
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 0c7491cd82..4b62f0ac05 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -285,3 +285,41 @@
 ##
 { 'command': 'query-gic-capabilities', 'returns': ['GICCapability'],
   'if': 'defined(TARGET_ARM)' }
+
+
+##
+# @SevAttestationReport:
+#
+# The struct describes attestation report for a Secure Encrypted Virtualization
+# feature.
+#
+# @data:  guest attestation report (base64 encoded)
+#
+#
+# Since: 6.1
+##
+{ 'struct': 'SevAttestationReport',
+  'data': { 'data': 'str'},
+  'if': 'defined(TARGET_I386)' }
+
+##
+# @query-sev-attestation-report:
+#
+# This command is used to get the SEV attestation report, and is supported on 
AMD
+# X86 platforms only.
+#
+# @mnonce: a random 16 bytes value encoded in base64 (it will be included in 
report)
+#
+# Returns: SevAttestationReport objects.
+#
+# Since: 6.1
+#
+# Example:
+#
+# -> { "execute" : "query-sev-attestation-report", "arguments": { "mnonce": 
"aaa" } }
+# <- { "return" : { "data": "bbbd"} }
+#
+##
+{ 'command': 'query-sev-attestation-report', 'data': { 'mnonce': 'str' },
+  'returns': 'SevAttestationReport',
+  'if': 'defined(TARGET_I386)' }
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 5994408bee..119211f0b0 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -757,3 +757,9 @@ void qmp_sev_inject_launch_secret(const char *packet_hdr,
 
 sev_inject_launch_secret(packet_hdr, secret, gpa, errp);
 }
+
+SevAttestationReport *
+qmp_query_sev_attestation_report(const char *mnonce, Error **errp)
+{
+return sev_get_attestation_report(mnonce, errp);
+}
diff --git a/target/i386/sev-stub.c b/target/i386/sev-stub.c
index 0207f1c5aa..0227cb5177 100644
--- a/target/i386/sev-stub.c
+++ b/target/i386/sev-stub.c
@@ -74,3 +74,10 @@ int sev_es_save_reset_vector(void *flash_ptr, uint64_t 
flash_size)
 {
 abort();
 }
+
+SevAttestationReport *
+sev_get_attestation_report(const char *mnonce, Error **errp)
+{
+error_setg(errp, "SEV is not available in this QEMU");
+return NULL;
+}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 72b9e2ab40..4b9d7d3bb9 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -491,6 +491,73 @@ out:
 return cap;
 }
 
+SevAttestationReport *
+sev_get_attestation_report(const char *mnonce, Error **errp)
+{
+struct kvm_sev_attestation_report input = {};
+SevAttestationReport *report = NULL;
+SevGuestState *sev = sev_guest;
+guchar *data;
+guchar *buf;
+gsize len;
+int err = 0, ret;
+
+if (!sev_enabled()) {
+error_setg(errp, "SEV is not enabled");
+return NULL;
+}
+
+/* lets decode the mnonce string */
+buf = g_base64_decode(mnonce, );
+if

Fail to create sev-guest object on 6.0.0-rc0

2021-03-25 Thread Brijesh Singh
Hi All,

It seems creating the sev-guest object is broken rc0 tag. The following
command is no longer able to create the sev-guest object

$QEMU \

 -machine ...,confidential-guest-support=sev0 \

 -object sev-guest,id=sev0,policy=0x1 \

It fails with "-object sev-guest,id=sev0: Invalid parameter
'sev-guest'". I will try to bisect the broken commit but if someone has
already looked into it then let me know.


Thanks

Brijesh




Re: [PATCH] target/i386/sev: Ensure sev_fw_errlist is sync with update-linux-headers

2021-03-18 Thread Brijesh Singh


On 3/18/21 10:38 AM, Philippe Mathieu-Daudé wrote:
> ping^2?
>
> On 3/8/21 11:21 AM, Philippe Mathieu-Daudé wrote:
>> ping?
>>
>> On 2/19/21 7:01 PM, Philippe Mathieu-Daudé wrote:
>>> Ensure sev_fw_errlist[] is updated after running
>>> the update-linux-headers.sh script.
>>>
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>> ---
>>> Based-on: <20210218151633.215374-1-cku...@redhat.com>

I am in the favor to keep list in sync with header updates. thanks

Acked-by: Brijesh Singh 

>>> ---
>>>  target/i386/sev.c | 5 -
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/target/i386/sev.c b/target/i386/sev.c
>>> index 37690ae809c..92c69a23769 100644
>>> --- a/target/i386/sev.c
>>> +++ b/target/i386/sev.c
>>> @@ -87,7 +87,7 @@ typedef struct __attribute__((__packed__)) SevInfoBlock {
>>>  static SevGuestState *sev_guest;
>>>  static Error *sev_mig_blocker;
>>>  
>>> -static const char *const sev_fw_errlist[] = {
>>> +static const char *const sev_fw_errlist[SEV_RET_MAX] = {
>>>  [SEV_RET_SUCCESS]= "",
>>>  [SEV_RET_INVALID_PLATFORM_STATE] = "Platform state is invalid",
>>>  [SEV_RET_INVALID_GUEST_STATE]= "Guest state is invalid",
>>> @@ -114,6 +114,8 @@ static const char *const sev_fw_errlist[] = {
>>>  [SEV_RET_RESOURCE_LIMIT] = "Required firmware resource 
>>> depleted",
>>>  [SEV_RET_SECURE_DATA_INVALID]= "Part-specific integrity check 
>>> failure",
>>>  };
>>> +/* Ensure sev_fw_errlist[] is updated after running 
>>> update-linux-headers.sh */
>>> +QEMU_BUILD_BUG_ON(SEV_RET_SECURE_DATA_INVALID + 1 != SEV_RET_MAX);
>>>  
>>>  #define SEV_FW_MAX_ERROR  ARRAY_SIZE(sev_fw_errlist)
>>>  
>>> @@ -160,6 +162,7 @@ fw_error_to_str(int code)
>>>  if (code < 0 || code >= SEV_FW_MAX_ERROR) {
>>>  return "unknown error";
>>>  }
>>> +assert(sev_fw_errlist[code]);
>>>  
>>>  return sev_fw_errlist[code];
>>>  }
>>>



[PATCH v2] target/i386/sev: add support to query the attestation report

2021-01-05 Thread Brijesh Singh
The SEV FW >= 0.23 added a new command that can be used to query the
attestation report containing the SHA-256 digest of the guest memory
and VMSA encrypted with the LAUNCH_UPDATE and sign it with the PEK.

Note, we already have a command (LAUNCH_MEASURE) that can be used to
query the SHA-256 digest of the guest memory encrypted through the
LAUNCH_UPDATE. The main difference between previous and this command
is that the report is signed with the PEK and unlike the LAUNCH_MEASURE
command the ATTESATION_REPORT command can be called while the guest
is running.

Add a QMP interface "query-sev-attestation-report" that can be used
to get the report encoded in base64.

Cc: James Bottomley 
Cc: Tom Lendacky 
Cc: Eric Blake 
Cc: Paolo Bonzini 
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
v2:
  * add trace event.
  * fix the goto to return NULL on failure.
  * make the mnonce as a base64 encoded string

 linux-headers/linux/kvm.h |  8 +
 qapi/misc-target.json | 38 ++
 target/i386/monitor.c |  6 
 target/i386/sev-stub.c|  7 +
 target/i386/sev.c | 66 +++
 target/i386/sev_i386.h|  2 ++
 target/i386/trace-events  |  1 +
 7 files changed, 128 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 56ce14ad20..6d0f8101ba 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1585,6 +1585,8 @@ enum sev_cmd_id {
KVM_SEV_DBG_ENCRYPT,
/* Guest certificates commands */
KVM_SEV_CERT_EXPORT,
+   /* Attestation report */
+   KVM_SEV_GET_ATTESTATION_REPORT,
 
KVM_SEV_NR_MAX,
 };
@@ -1637,6 +1639,12 @@ struct kvm_sev_dbg {
__u32 len;
 };
 
+struct kvm_sev_attestation_report {
+   __u8 mnonce[16];
+   __u64 uaddr;
+   __u32 len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX   (1 << 2)
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 06ef8757f0..5907a2dfaa 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -285,3 +285,41 @@
 ##
 { 'command': 'query-gic-capabilities', 'returns': ['GICCapability'],
   'if': 'defined(TARGET_ARM)' }
+
+
+##
+# @SevAttestationReport:
+#
+# The struct describes attestation report for a Secure Encrypted Virtualization
+# feature.
+#
+# @data:  guest attestation report (base64 encoded)
+#
+#
+# Since: 5.2
+##
+{ 'struct': 'SevAttestationReport',
+  'data': { 'data': 'str'},
+  'if': 'defined(TARGET_I386)' }
+
+##
+# @query-sev-attestation-report:
+#
+# This command is used to get the SEV attestation report, and is supported on 
AMD
+# X86 platforms only.
+#
+# @mnonce: a random 16 bytes value encoded in base64 (it will be included in 
report)
+#
+# Returns: SevAttestationReport objects.
+#
+# Since: 5.3
+#
+# Example:
+#
+# -> { "execute" : "query-sev-attestation-report", "arguments": { "mnonce": 
"aaa" } }
+# <- { "return" : { "data": "bbbd"} }
+#
+##
+{ 'command': 'query-sev-attestation-report', 'data': { 'mnonce': 'str' },
+  'returns': 'SevAttestationReport',
+  'if': 'defined(TARGET_I386)' }
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 1bc91442b1..0c8377f900 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -736,3 +736,9 @@ void qmp_sev_inject_launch_secret(const char *packet_hdr,
 {
 sev_inject_launch_secret(packet_hdr, secret, gpa, errp);
 }
+
+SevAttestationReport *
+qmp_query_sev_attestation_report(const char *mnonce, Error **errp)
+{
+return sev_get_attestation_report(mnonce, errp);
+}
diff --git a/target/i386/sev-stub.c b/target/i386/sev-stub.c
index c1fecc2101..cdc9a014ee 100644
--- a/target/i386/sev-stub.c
+++ b/target/i386/sev-stub.c
@@ -54,3 +54,10 @@ int sev_inject_launch_secret(const char *hdr, const char 
*secret,
 {
 return 1;
 }
+
+SevAttestationReport *
+sev_get_attestation_report(const char *mnonce, Error **errp)
+{
+error_setg(errp, "SEV is not available in this QEMU");
+return NULL;
+}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 1546606811..d1f90a1d8a 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -492,6 +492,72 @@ out:
 return cap;
 }
 
+SevAttestationReport *
+sev_get_attestation_report(const char *mnonce, Error **errp)
+{
+struct kvm_sev_attestation_report input = {};
+SevAttestationReport *report = NULL;
+SevGuestState *sev = sev_guest;
+guchar *data;
+guchar *buf;
+gsize len;
+int err = 0, ret;
+
+if (!sev_enabled()) {
+error_setg(errp, "SEV is not enabled");
+return NULL;
+}
+
+/* lets decode the mnonce string */
+buf = g_base64_decode(mnonce, );
+if (!buf) {
+error_setg(errp, "SEV: failed to decode mnonce input");
+

Re: [PATCH] target/i386/sev: add the support to query the attestation report

2020-12-11 Thread Brijesh Singh


On 12/10/20 10:13 AM, James Bottomley wrote:
> On Fri, 2020-12-04 at 15:31 -0600, Brijesh Singh wrote:
>> The SEV FW >= 0.23 added a new command that can be used to query the
>> attestation report containing the SHA-256 digest of the guest memory
>> and VMSA encrypted with the LAUNCH_UPDATE and sign it with the PEK.
>>
>> Note, we already have a command (LAUNCH_MEASURE) that can be used to
>> query the SHA-256 digest of the guest memory encrypted through the
>> LAUNCH_UPDATE. The main difference between previous and this command
>> is that the report is signed with the PEK and unlike the
>> LAUNCH_MEASURE
>> command the ATTESATION_REPORT command can be called while the guest
>> is running.
>>
>> Add a QMP interface "query-sev-attestation-report" that can be used
>> to get the report encoded in base64.
>>
>> Cc: James Bottomley 
>> Cc: Tom Lendacky 
>> Cc: Eric Blake 
>> Cc: Paolo Bonzini 
>> Cc: k...@vger.kernel.org
>> Signed-off-by: Brijesh Singh 
>> ---
>>  linux-headers/linux/kvm.h |  8 ++
>>  qapi/misc-target.json | 38 +++
>>  target/i386/monitor.c |  6 +
>>  target/i386/sev-stub.c|  7 +
>>  target/i386/sev.c | 54
>> +++
>>  target/i386/sev_i386.h|  2 ++
>>  6 files changed, 115 insertions(+)
>>
>> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
>> index 56ce14ad20..6d0f8101ba 100644
>> --- a/linux-headers/linux/kvm.h
>> +++ b/linux-headers/linux/kvm.h
>> @@ -1585,6 +1585,8 @@ enum sev_cmd_id {
>>  KVM_SEV_DBG_ENCRYPT,
>>  /* Guest certificates commands */
>>  KVM_SEV_CERT_EXPORT,
>> +/* Attestation report */
>> +KVM_SEV_GET_ATTESTATION_REPORT,
>>  
>>  KVM_SEV_NR_MAX,
>>  };
>> @@ -1637,6 +1639,12 @@ struct kvm_sev_dbg {
>>  __u32 len;
>>  };
>>  
>> +struct kvm_sev_attestation_report {
>> +__u8 mnonce[16];
>> +__u64 uaddr;
>> +__u32 len;
>> +};
>> +
>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>>  #define KVM_DEV_ASSIGN_PCI_2_3  (1 << 1)
>>  #define KVM_DEV_ASSIGN_MASK_INTX(1 << 2)
>> diff --git a/qapi/misc-target.json b/qapi/misc-target.json
>> index 1e561fa97b..ec6565e6ef 100644
>> --- a/qapi/misc-target.json
>> +++ b/qapi/misc-target.json
>> @@ -267,3 +267,41 @@
>>  ##
>>  { 'command': 'query-gic-capabilities', 'returns': ['GICCapability'],
>>'if': 'defined(TARGET_ARM)' }
>> +
>> +
>> +##
>> +# @SevAttestationReport:
>> +#
>> +# The struct describes attestation report for a Secure Encrypted
>> Virtualization
>> +# feature.
>> +#
>> +# @data:  guest attestation report (base64 encoded)
>> +#
>> +#
>> +# Since: 5.2
>> +##
>> +{ 'struct': 'SevAttestationReport',
>> +  'data': { 'data': 'str'},
>> +  'if': 'defined(TARGET_I386)' }
>> +
>> +##
>> +# @query-sev-attestation-report:
>> +#
>> +# This command is used to get the SEV attestation report, and is
>> supported on AMD
>> +# X86 platforms only.
>> +#
>> +# @mnonce: a random 16 bytes of data (it will be included in report)
>> +#
>> +# Returns: SevAttestationReport objects.
>> +#
>> +# Since: 5.2
>> +#
>> +# Example:
>> +#
>> +# -> { "execute" : "query-sev-attestation-report", "arguments": {
>> "mnonce": "aaa" } }
>> +# <- { "return" : { "data": "bbbd"} }
> It would be nice here, rather than returning a binary blob to break it
> up into the actual returned components like query-sev does.

In past, I have seen that the fields defined in blobs have changed based
on the API versions. So, I tried to stay away from expanding the blob
unless its absolutely required. I would  prefer to stick to that approach.


>
>> +##
>> +{ 'command': 'query-sev-attestation-report', 'data': { 'mnonce':
>> 'str' },
>> +  'returns': 'SevAttestationReport',
>> +  'if': 'defined(TARGET_I386)' }
> [...]
>> diff --git a/target/i386/sev.c b/target/i386/sev.c
>> index 93c4d60b82..28958fb71b 100644
>> --- a/target/i386/sev.c
>> +++ b/target/i386/sev.c
>> @@ -68,6 +68,7 @@ struct SevGuestState {
>>  
>>  #define DEFAULT_GUEST_POLICY0x1 /* disable debug */
>>  #define DEFAULT_SEV_DEVICE  "/dev/sev"
>> +#define DEFAULT_ATTESATION_REPORT_BUF_SIZE  4096
>>

[PATCH] target/i386/sev: add the support to query the attestation report

2020-12-04 Thread Brijesh Singh
The SEV FW >= 0.23 added a new command that can be used to query the
attestation report containing the SHA-256 digest of the guest memory
and VMSA encrypted with the LAUNCH_UPDATE and sign it with the PEK.

Note, we already have a command (LAUNCH_MEASURE) that can be used to
query the SHA-256 digest of the guest memory encrypted through the
LAUNCH_UPDATE. The main difference between previous and this command
is that the report is signed with the PEK and unlike the LAUNCH_MEASURE
command the ATTESATION_REPORT command can be called while the guest
is running.

Add a QMP interface "query-sev-attestation-report" that can be used
to get the report encoded in base64.

Cc: James Bottomley 
Cc: Tom Lendacky 
Cc: Eric Blake 
Cc: Paolo Bonzini 
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 linux-headers/linux/kvm.h |  8 ++
 qapi/misc-target.json | 38 +++
 target/i386/monitor.c |  6 +
 target/i386/sev-stub.c|  7 +
 target/i386/sev.c | 54 +++
 target/i386/sev_i386.h|  2 ++
 6 files changed, 115 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 56ce14ad20..6d0f8101ba 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1585,6 +1585,8 @@ enum sev_cmd_id {
KVM_SEV_DBG_ENCRYPT,
/* Guest certificates commands */
KVM_SEV_CERT_EXPORT,
+   /* Attestation report */
+   KVM_SEV_GET_ATTESTATION_REPORT,
 
KVM_SEV_NR_MAX,
 };
@@ -1637,6 +1639,12 @@ struct kvm_sev_dbg {
__u32 len;
 };
 
+struct kvm_sev_attestation_report {
+   __u8 mnonce[16];
+   __u64 uaddr;
+   __u32 len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX   (1 << 2)
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 1e561fa97b..ec6565e6ef 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -267,3 +267,41 @@
 ##
 { 'command': 'query-gic-capabilities', 'returns': ['GICCapability'],
   'if': 'defined(TARGET_ARM)' }
+
+
+##
+# @SevAttestationReport:
+#
+# The struct describes attestation report for a Secure Encrypted Virtualization
+# feature.
+#
+# @data:  guest attestation report (base64 encoded)
+#
+#
+# Since: 5.2
+##
+{ 'struct': 'SevAttestationReport',
+  'data': { 'data': 'str'},
+  'if': 'defined(TARGET_I386)' }
+
+##
+# @query-sev-attestation-report:
+#
+# This command is used to get the SEV attestation report, and is supported on 
AMD
+# X86 platforms only.
+#
+# @mnonce: a random 16 bytes of data (it will be included in report)
+#
+# Returns: SevAttestationReport objects.
+#
+# Since: 5.2
+#
+# Example:
+#
+# -> { "execute" : "query-sev-attestation-report", "arguments": { "mnonce": 
"aaa" } }
+# <- { "return" : { "data": "bbbd"} }
+#
+##
+{ 'command': 'query-sev-attestation-report', 'data': { 'mnonce': 'str' },
+  'returns': 'SevAttestationReport',
+  'if': 'defined(TARGET_I386)' }
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 9f9e1c42f4..a4b65f330c 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -729,3 +729,9 @@ SevCapability *qmp_query_sev_capabilities(Error **errp)
 {
 return sev_get_capabilities(errp);
 }
+
+SevAttestationReport *
+qmp_query_sev_attestation_report(const char *mnonce, Error **errp)
+{
+return sev_get_attestation_report(mnonce, errp);
+}
diff --git a/target/i386/sev-stub.c b/target/i386/sev-stub.c
index 88e3f39a1e..66d16f53d8 100644
--- a/target/i386/sev-stub.c
+++ b/target/i386/sev-stub.c
@@ -49,3 +49,10 @@ SevCapability *sev_get_capabilities(Error **errp)
 error_setg(errp, "SEV is not available in this QEMU");
 return NULL;
 }
+
+SevAttestationReport *
+sev_get_attestation_report(const char *mnonce, Error **errp)
+{
+error_setg(errp, "SEV is not available in this QEMU");
+return NULL;
+}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 93c4d60b82..28958fb71b 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -68,6 +68,7 @@ struct SevGuestState {
 
 #define DEFAULT_GUEST_POLICY0x1 /* disable debug */
 #define DEFAULT_SEV_DEVICE  "/dev/sev"
+#define DEFAULT_ATTESATION_REPORT_BUF_SIZE  4096
 
 static SevGuestState *sev_guest;
 static Error *sev_mig_blocker;
@@ -490,6 +491,59 @@ out:
 return cap;
 }
 
+SevAttestationReport *
+sev_get_attestation_report(const char *mnonce, Error **errp)
+{
+struct kvm_sev_attestation_report input = {};
+SevGuestState *sev = sev_guest;
+SevAttestationReport *report;
+guchar *data;
+int err = 0, ret;
+
+if (!sev_enabled()) {
+error_setg(errp, "SEV is not enabled");
+return NULL;
+}
+
+/* Verify that user provided random data length */
+if (str

Re: [PATCH v5] sev: add sev-inject-launch-secret

2020-10-15 Thread Brijesh Singh


On 10/15/20 9:37 AM, to...@linux.ibm.com wrote:
> From: Tobin Feldman-Fitzthum 
>
> AMD SEV allows a guest owner to inject a secret blob
> into the memory of a virtual machine. The secret is
> encrypted with the SEV Transport Encryption Key and
> integrity is guaranteed with the Transport Integrity
> Key. Although QEMU facilitates the injection of the
> launch secret, it cannot access the secret.
>
> Signed-off-by: Tobin Feldman-Fitzthum 
> Reviewed-by: Daniel P. Berrangé 


Reviewed-by: Brijesh Singh 

thanks


> ---
>  include/monitor/monitor.h |  3 ++
>  include/sysemu/sev.h  |  2 ++
>  monitor/misc.c|  8 ++---
>  qapi/misc-target.json | 18 +++
>  target/i386/monitor.c |  7 +
>  target/i386/sev-stub.c|  5 +++
>  target/i386/sev.c | 65 +++
>  target/i386/trace-events  |  1 +
>  8 files changed, 105 insertions(+), 4 deletions(-)
>
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 348bfad3d5..af3887bb71 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -4,6 +4,7 @@
>  #include "block/block.h"
>  #include "qapi/qapi-types-misc.h"
>  #include "qemu/readline.h"
> +#include "include/exec/hwaddr.h"
>  
>  typedef struct MonitorHMP MonitorHMP;
>  typedef struct MonitorOptions MonitorOptions;
> @@ -37,6 +38,8 @@ void monitor_flush(Monitor *mon);
>  int monitor_set_cpu(Monitor *mon, int cpu_index);
>  int monitor_get_cpu_index(Monitor *mon);
>  
> +void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, uint64_t size, Error **errp);
> +
>  void monitor_read_command(MonitorHMP *mon, int show_prompt);
>  int monitor_read_password(MonitorHMP *mon, ReadLineFunc *readline_func,
>void *opaque);
> diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
> index 98c1ec8d38..7ab6e3e31d 100644
> --- a/include/sysemu/sev.h
> +++ b/include/sysemu/sev.h
> @@ -18,4 +18,6 @@
>  
>  void *sev_guest_init(const char *id);
>  int sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len);
> +int sev_inject_launch_secret(const char *hdr, const char *secret,
> + uint64_t gpa, Error **errp);
>  #endif
> diff --git a/monitor/misc.c b/monitor/misc.c
> index 4a859fb24a..f1ade245d5 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -667,10 +667,10 @@ static void hmp_physical_memory_dump(Monitor *mon, 
> const QDict *qdict)
>  memory_dump(mon, count, format, size, addr, 1);
>  }
>  
> -static void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, Error **errp)
> +void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, uint64_t size, Error **errp)
>  {
>  MemoryRegionSection mrs = memory_region_find(get_system_memory(),
> - addr, 1);
> + addr, size);
>  
>  if (!mrs.mr) {
>  error_setg(errp, "No memory is mapped at address 0x%" HWADDR_PRIx, 
> addr);
> @@ -694,7 +694,7 @@ static void hmp_gpa2hva(Monitor *mon, const QDict *qdict)
>  MemoryRegion *mr = NULL;
>  void *ptr;
>  
> -ptr = gpa2hva(, addr, _err);
> +ptr = gpa2hva(, addr, 1, _err);
>  if (local_err) {
>  error_report_err(local_err);
>  return;
> @@ -770,7 +770,7 @@ static void hmp_gpa2hpa(Monitor *mon, const QDict *qdict)
>  void *ptr;
>  uint64_t physaddr;
>  
> -ptr = gpa2hva(, addr, _err);
> +ptr = gpa2hva(, addr, 1, _err);
>  if (local_err) {
>  error_report_err(local_err);
>  return;
> diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> index 1e561fa97b..4486a543ae 100644
> --- a/qapi/misc-target.json
> +++ b/qapi/misc-target.json
> @@ -201,6 +201,24 @@
>  { 'command': 'query-sev-capabilities', 'returns': 'SevCapability',
>'if': 'defined(TARGET_I386)' }
>  
> +##
> +# @sev-inject-launch-secret:
> +#
> +# This command injects a secret blob into memory of SEV guest.
> +#
> +# @packet-header: the launch secret packet header encoded in base64
> +#
> +# @secret: the launch secret data to be injected encoded in base64
> +#
> +# @gpa: the guest physical address where secret will be injected.
> +#
> +# Since: 5.2
> +#
> +##
> +{ 'command': 'sev-inject-launch-secret',
> +  'data': { 'packet-header': 'str', 'secret': 'str', 'gpa': 'uint64' },
> +  'if': 'defined(TARGET_I386)' }
> +
>  ##
>  # @dump-skeys:
>  #
> diff --git a/target/i386/monitor.c b/target/i386/monitor.c
> index 7abae3c8df..f9d4951465 100644
> --- a/target/i386/monitor.c
> +++ b/target/i386/monitor.c
> @@ -728,3 +728,10 @@

Re: [PATCH v4] sev: add sev-inject-launch-secret

2020-10-14 Thread Brijesh Singh


On 10/14/20 10:17 AM, to...@linux.ibm.com wrote:
> From: Tobin Feldman-Fitzthum 
>
> AMD SEV allows a guest owner to inject a secret blob
> into the memory of a virtual machine. The secret is
> encrypted with the SEV Transport Encryption Key and
> integrity is guaranteed with the Transport Integrity
> Key. Although QEMU facilitates the injection of the
> launch secret, it cannot access the secret.
>
> Signed-off-by: Tobin Feldman-Fitzthum 
> ---
>  include/monitor/monitor.h |  3 ++
>  include/sysemu/sev.h  |  2 ++
>  monitor/misc.c|  8 +++---
>  qapi/misc-target.json | 18 
>  target/i386/monitor.c |  7 +
>  target/i386/sev-stub.c|  5 
>  target/i386/sev.c | 60 +++
>  target/i386/trace-events  |  1 +
>  8 files changed, 100 insertions(+), 4 deletions(-)
>
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 348bfad3d5..af3887bb71 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -4,6 +4,7 @@
>  #include "block/block.h"
>  #include "qapi/qapi-types-misc.h"
>  #include "qemu/readline.h"
> +#include "include/exec/hwaddr.h"
>  
>  typedef struct MonitorHMP MonitorHMP;
>  typedef struct MonitorOptions MonitorOptions;
> @@ -37,6 +38,8 @@ void monitor_flush(Monitor *mon);
>  int monitor_set_cpu(Monitor *mon, int cpu_index);
>  int monitor_get_cpu_index(Monitor *mon);
>  
> +void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, uint64_t size, Error **errp);
> +
>  void monitor_read_command(MonitorHMP *mon, int show_prompt);
>  int monitor_read_password(MonitorHMP *mon, ReadLineFunc *readline_func,
>void *opaque);
> diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
> index 98c1ec8d38..7ab6e3e31d 100644
> --- a/include/sysemu/sev.h
> +++ b/include/sysemu/sev.h
> @@ -18,4 +18,6 @@
>  
>  void *sev_guest_init(const char *id);
>  int sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len);
> +int sev_inject_launch_secret(const char *hdr, const char *secret,
> + uint64_t gpa, Error **errp);
>  #endif
> diff --git a/monitor/misc.c b/monitor/misc.c
> index 4a859fb24a..f1ade245d5 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -667,10 +667,10 @@ static void hmp_physical_memory_dump(Monitor *mon, 
> const QDict *qdict)
>  memory_dump(mon, count, format, size, addr, 1);
>  }
>  
> -static void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, Error **errp)
> +void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, uint64_t size, Error **errp)
>  {
>  MemoryRegionSection mrs = memory_region_find(get_system_memory(),
> - addr, 1);
> + addr, size);
>  
>  if (!mrs.mr) {
>  error_setg(errp, "No memory is mapped at address 0x%" HWADDR_PRIx, 
> addr);
> @@ -694,7 +694,7 @@ static void hmp_gpa2hva(Monitor *mon, const QDict *qdict)
>  MemoryRegion *mr = NULL;
>  void *ptr;
>  
> -ptr = gpa2hva(, addr, _err);
> +ptr = gpa2hva(, addr, 1, _err);
>  if (local_err) {
>  error_report_err(local_err);
>  return;
> @@ -770,7 +770,7 @@ static void hmp_gpa2hpa(Monitor *mon, const QDict *qdict)
>  void *ptr;
>  uint64_t physaddr;
>  
> -ptr = gpa2hva(, addr, _err);
> +ptr = gpa2hva(, addr, 1, _err);
>  if (local_err) {
>  error_report_err(local_err);
>  return;
> diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> index 1e561fa97b..4486a543ae 100644
> --- a/qapi/misc-target.json
> +++ b/qapi/misc-target.json
> @@ -201,6 +201,24 @@
>  { 'command': 'query-sev-capabilities', 'returns': 'SevCapability',
>'if': 'defined(TARGET_I386)' }
>  
> +##
> +# @sev-inject-launch-secret:
> +#
> +# This command injects a secret blob into memory of SEV guest.
> +#
> +# @packet-header: the launch secret packet header encoded in base64
> +#
> +# @secret: the launch secret data to be injected encoded in base64
> +#
> +# @gpa: the guest physical address where secret will be injected.
> +#
> +# Since: 5.2
> +#
> +##
> +{ 'command': 'sev-inject-launch-secret',
> +  'data': { 'packet-header': 'str', 'secret': 'str', 'gpa': 'uint64' },
> +  'if': 'defined(TARGET_I386)' }
> +
>  ##
>  # @dump-skeys:
>  #
> diff --git a/target/i386/monitor.c b/target/i386/monitor.c
> index 7abae3c8df..f9d4951465 100644
> --- a/target/i386/monitor.c
> +++ b/target/i386/monitor.c
> @@ -728,3 +728,10 @@ SevCapability *qmp_query_sev_capabilities(Error **errp)
>  {
>  return sev_get_capabilities(errp);
>  }
> +
> +void qmp_sev_inject_launch_secret(const char *packet_hdr,
> +  const char *secret, uint64_t gpa,
> +  Error **errp)
> +{
> +sev_inject_launch_secret(packet_hdr, secret, gpa, errp);
> +}
> diff --git a/target/i386/sev-stub.c b/target/i386/sev-stub.c
> index 88e3f39a1e..2d2ee54cc6 100644
> --- a/target/i386/sev-stub.c

Re: SEV guest debugging support for Qemu

2020-09-24 Thread Brijesh Singh


On 9/24/20 2:06 PM, Ashish Kalra wrote:
> Hello Dave,
>
> Thanks for your response, please see my replies inline :
>
> On Thu, Sep 24, 2020 at 02:53:42PM +0100, Dr. David Alan Gilbert wrote:
>> * Ashish Kalra (ashish.ka...@amd.com) wrote:
>>> Hello Alan, Paolo,
>>>
>>> I am following up on Brijesh’s patches for SEV guest debugging support for 
>>> Qemu using gdb and/or qemu monitor.
>>> I believe that last time, Qemu SEV debug patches were not applied and have 
>>> attached the link to the email thread and Paolo’s feedback below for 
>>> reference [1].
>>> I wanted to re-start a discussion on the same here with the Qemu community 
>>> and seek the feedback on the approaches which we are considering :
>>> Looking at Qemu code, I see the following interface is defined, for virtual 
>>> memory access for debug : cpu_memory_rw_debug(). 
>>> Both gdbstub (target_memory_rw_debug() ) and QMP/HMP (monitor/misc.c : 
>>> memory_dump() ) use this standard and well-defined interface to access 
>>> guest memory for debugging purposes. 
>>>
>>> This internally invokes the address_space_rw() accessor functions which we 
>>> had  "fixed" internally (as part of the earlier patch) to invoke memory 
>>> region specific debug ops. 
>>> In our earlier approach we were adding debug ops/callbacks to memory 
>>> regions and as per comments on our earlier patches, Paolo was not happy 
>>> with this debug API for
>>> MemoryRegions and hence the SEV support for Qemu was merged without the 
>>> debug support.
>>>
>>> Now, we want to reuse this cpu_memory_rw_debug() interface or alternatively 
>>> introduce a new generic debug interface/object in the Qemu. This 
>>> debug interface should be controlled through the global machine policy.
>> Let me leave the question of how the memory_rw_debug interface should
>> work to Paolo.
>>
>>> For e.g., 
>>> # $QEMU -machine -debug=
>>> or
>>> # $QEMU -machine -debug=sev-guest-debug
>>>
>>> The QMP and GDB access will be updated to use the generic debug  interface. 
>>> The generic debug interface or the cpu_memory_rw_debug() interace will 
>>> introduce hooks to call a 
>>> vendor specific debug object to delegate accessing the data. The vendor 
>>> specific debug object may do a further checks before and after accessing 
>>> the memory.
>> I'm not sure that needs a commandline switch for it; since you can
>> already get it from the guest policy in the sev object and I can't think
>> of any other cases that would need something similar.
> Yes, i agree with that, so i am now considering abstracting this vendor
> specific debug interface via CPUClass object instead of doing it via
> MemoryRegions. 
>
>>> Now, looking specifically at cpu_memory_rw_debug() interface, this 
>>> interface is invoked for all guest memory accesses for debugging purposes 
>>> and it also does 
>>> guest VA to GPA translation via cpu_get_phys_page_attrs_debug(), so we can 
>>> again add a vendor specific callback here to do guest VA to GPA 
>>> translations specific
>>> to SEV as SEV guest debugging will also require accessing guest page table 
>>> entries and decrypting them via the SEV DBG_DECRYPT APIs and additionally 
>>> clearing
>>> the C-bit on page table entries (PxEs) before using them further for page 
>>> table walks.
>>>
>>> There is still an issue with the generic cpu_memory_rw_debug() interface, 
>>> though it is used for all guest memory accesses for debugging and we can 
>>> also handle
>>> guest page table walks via it (as mentioned above), there are still other 
>>> gdb/monitor commands such as tlb_info_xx() and mem_info_xx() which also do 
>>> guest page
>>> table walks, but they don’t go through any generic guest memory 
>>> access/debug interface, so these commands will need to be handled 
>>> additionally for SEV.
>> If some of those should be using the debug interface and aren't then
>> please fix them anyway.
>>
>>> The vendor specific debug object (added as a hook to generic debug object 
>>> or the generic cpu_memory_rw_debug() interface) will do further checks 
>>> before and after accessing the memory.
>>>
>>> e.g., in the case of SEV,
>>>
>>> 1. Check the guest policy, if guest policy does not allow debug then return 
>>> an error.
>>>
>>> 2. If its an MMIO region then access the data.
>>>
>>> 3. If its RAM region then call the PSP commands to decrypt the data.
>>>
>>> 4. If caller asked to read the PTE entry then probably clear the C-bits 
>>> after reading the PTE entry.
>> Does that work if the guest is currently running?
>>
> I assume you are asking that is this done when guest is being debugged,
> the above steps are only done when the guest is paused and being debugged.


I don't why we need to pause the guest. Ideally we should be able to
connect to Qemu monitor and run the "x" command to dump memory. IIRC, if
paging is enabled then monitor will walk the guest page table to reach
to gpa. Something like this in the Qemu monitor console should work:

x /10i $eip


> Thanks,
> Ashish
>

Re: [PATCH v2] SEV: QMP support for Inject-Launch-Secret

2020-07-03 Thread Brijesh Singh


On 7/3/20 6:11 AM, Dr. David Alan Gilbert wrote:
> * Tobin Feldman-Fitzthum (to...@linux.vnet.ibm.com) wrote:
>> From: Tobin Feldman-Fitzthum 
>>
>> AMD SEV allows a guest owner to inject a secret blob
>> into the memory of a virtual machine. The secret is
>> encrypted with the SEV Transport Encryption Key and
>> integrity is guaranteed with the Transport Integrity
>> Key. Although QEMU faciliates the injection of the
>> launch secret, it cannot access the secret.
>>
>> Signed-off-by: Tobin Feldman-Fitzthum 
>> ---
>>  include/monitor/monitor.h |  3 ++
>>  include/sysemu/sev.h  |  2 ++
>>  monitor/misc.c|  8 ++---
>>  qapi/misc-target.json | 18 +++
>>  target/i386/monitor.c |  9 ++
>>  target/i386/sev-stub.c|  5 +++
>>  target/i386/sev.c | 66 +++
>>  target/i386/sev_i386.h|  3 ++
>>  target/i386/trace-events  |  1 +
>>  9 files changed, 111 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
>> index 1018d754a6..bf049c5b00 100644
>> --- a/include/monitor/monitor.h
>> +++ b/include/monitor/monitor.h
>> @@ -4,6 +4,7 @@
>>  #include "block/block.h"
>>  #include "qapi/qapi-types-misc.h"
>>  #include "qemu/readline.h"
>> +#include "include/exec/hwaddr.h"
>>  
>>  extern __thread Monitor *cur_mon;
>>  typedef struct MonitorHMP MonitorHMP;
>> @@ -36,6 +37,8 @@ void monitor_flush(Monitor *mon);
>>  int monitor_set_cpu(int cpu_index);
>>  int monitor_get_cpu_index(void);
>>  
>> +void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, uint64_t size, Error 
>> **errp);
>> +
>>  void monitor_read_command(MonitorHMP *mon, int show_prompt);
>>  int monitor_read_password(MonitorHMP *mon, ReadLineFunc *readline_func,
>>void *opaque);
>> diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
>> index 98c1ec8d38..b279b293e8 100644
>> --- a/include/sysemu/sev.h
>> +++ b/include/sysemu/sev.h
>> @@ -18,4 +18,6 @@
>>  
>>  void *sev_guest_init(const char *id);
>>  int sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len);
>> +int sev_inject_launch_secret(const char *hdr, const char *secret,
>> + uint64_t gpa);
>>  #endif
>> diff --git a/monitor/misc.c b/monitor/misc.c
>> index 89bb970b00..b9ec8ba410 100644
>> --- a/monitor/misc.c
>> +++ b/monitor/misc.c
>> @@ -674,10 +674,10 @@ static void hmp_physical_memory_dump(Monitor *mon, 
>> const QDict *qdict)
>>  memory_dump(mon, count, format, size, addr, 1);
>>  }
>>  
>> -static void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, Error **errp)
>> +void *gpa2hva(MemoryRegion **p_mr, hwaddr addr, uint64_t size, Error **errp)
>>  {
>>  MemoryRegionSection mrs = memory_region_find(get_system_memory(),
>> - addr, 1);
>> + addr, size);
>>  
>>  if (!mrs.mr) {
>>  error_setg(errp, "No memory is mapped at address 0x%" HWADDR_PRIx, 
>> addr);
>> @@ -701,7 +701,7 @@ static void hmp_gpa2hva(Monitor *mon, const QDict *qdict)
>>  MemoryRegion *mr = NULL;
>>  void *ptr;
>>  
>> -ptr = gpa2hva(, addr, _err);
>> +ptr = gpa2hva(, addr, 1, _err);
>>  if (local_err) {
>>  error_report_err(local_err);
>>  return;
>> @@ -777,7 +777,7 @@ static void hmp_gpa2hpa(Monitor *mon, const QDict *qdict)
>>  void *ptr;
>>  uint64_t physaddr;
>>  
>> -ptr = gpa2hva(, addr, _err);
>> +ptr = gpa2hva(, addr, 1, _err);
>>  if (local_err) {
>>  error_report_err(local_err);
>>  return;
>> diff --git a/qapi/misc-target.json b/qapi/misc-target.json
>> index dee3b45930..d145f916b3 100644
>> --- a/qapi/misc-target.json
>> +++ b/qapi/misc-target.json
>> @@ -200,6 +200,24 @@
>>  { 'command': 'query-sev-capabilities', 'returns': 'SevCapability',
>>'if': 'defined(TARGET_I386)' }
>>  
>> +##
>> +# @sev-inject-launch-secret:
>> +#
>> +# This command injects a secret blob into memory of SEV guest.
>> +#
>> +# @packet-header: the launch secret packet header encoded in base64
>> +#
>> +# @secret: the launch secret data to be injected encoded in base64
>> +#
>> +# @gpa: the guest physical address where secret will be injected.
>> +#
>> +# Since: 5.1
>> +#
>> +##
>> +{ 'command': 'sev-inject-launch-secret',
>> +  'data': { 'packet-header': 'str', 'secret': 'str', 'gpa': 'uint64' },
>> +  'if': 'defined(TARGET_I386)' }
>> +
>>  ##
>>  # @dump-skeys:
>>  #
>> diff --git a/target/i386/monitor.c b/target/i386/monitor.c
>> index 27ebfa3ad2..42bcfe6dc0 100644
>> --- a/target/i386/monitor.c
>> +++ b/target/i386/monitor.c
>> @@ -736,3 +736,12 @@ SevCapability *qmp_query_sev_capabilities(Error **errp)
>>  
>>  return data;
>>  }
>> +
>> +void qmp_sev_inject_launch_secret(const char *packet_hdr,
>> +  const char *secret, uint64_t gpa,
>> +  Error **errp)
>> +{
>> +if 

Re: [PATCH v5 07/18] s390x: protvirt: Inhibit balloon when switching to protected mode

2020-03-24 Thread Brijesh Singh


On 3/20/20 1:43 PM, Halil Pasic wrote:
> On Thu, 19 Mar 2020 18:31:11 +0100
> David Hildenbrand  wrote:
>
>> [...]
>>
 I asked this question already to Michael (cc) via a different
 channel, but hare is it again:

 Why does the balloon driver not support VIRTIO_F_IOMMU_PLATFORM? It
 is absolutely not clear to me. The introducing commit mentioned
 that it "bypasses DMA". I fail to see that.

 At least the communication via the SG mechanism should work
 perfectly fine with an IOMMU enabled. So I assume it boils down to
 the pages that we inflate/deflate not being referenced via IOVA?
>>> AFAIU the IOVA/GPA stuff is not the problem here. You have said it
>>> yourself, the SG mechanism would work for balloon out of the box, as
>>> it does for the other virtio devices. 
>>>
>>> But VIRTIO_F_ACCESS_PLATFORM (aka VIRTIO_F_IOMMU_PLATFORM)  not
>>> presented means according to Michael that the device has full access
>>> to the entire guest RAM. If VIRTIO_F_ACCESS_PLATFORM is negotiated
>>> this may or may not be the case.
>> So you say
>>
>> "The virtio specification tells that the device is to present
>> VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
>> device "can only access certain memory addresses with said access
>> specified and/or granted by the platform"."
>>
>> So, AFAIU, *any* virtio device (hypervisor side) has to present this
>> flag when PV is enabled. 
> Yes, and balloon says bye bye when running in PV mode is only a secondary
> objective. I've compiled some references:
>
> "To summarize, the necessary conditions for a hack along these lines
> (using DMA API without VIRTIO_F_ACCESS_PLATFORM) are that we detect that:
>
>   - secure guest mode is enabled - so we know that since we don't share
> most memory regular virtio code won't
> work, even though the buggy hypervisor didn't set 
> VIRTIO_F_ACCESS_PLATFORM" 
> (Michael Tsirkin, 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2020%2F2%2F20%2F1021data=02%7C01%7Cbrijesh.singh%40amd.com%7C52b79b5c9e894dd968c508d7ccfe9479%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637203266090844487sdata=aNS%2FW2nL27mPSl1Xz3iXUY31qtrzmVHYhzVHEILAaQQ%3Dreserved=0)
> I.e.: PV but !VIRTIO_F_ACCESS_PLATFORM \implies bugy hypervisor
>
>
> "If VIRTIO_F_ACCESS_PLATFORM is set then things just work.  If
> VIRTIO_F_ACCESS_PLATFORM is clear device is supposed to have access to
> all of memory.  You can argue in various ways but it's easier to just
> declare a behaviour that violates this a bug."
> (Michael Tsirkin, 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2020%2F2%2F21%2F1626data=02%7C01%7Cbrijesh.singh%40amd.com%7C52b79b5c9e894dd968c508d7ccfe9479%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637203266090854439sdata=d3knybBUZ5NL0Lv1C2JS040A3toiCxXVYLkBlzXSrqc%3Dreserved=0)
> This one is about all memory guest, and not just the buffers transfered
> via the virtqueue, which surprised me a bit at the beginning. But balloon
> actually needs this.
>
> "A device SHOULD offer VIRTIO_F_ACCESS_PLATFORM if its access to memory
> is through bus addresses distinct from and translated by the platform to
> physical addresses used by the driver, and/or if it can only access
> certain memory addresses with said access specified and/or granted by
> the platform. A device MAY fail to operate further if
> VIRTIO_F_ACCESS_PLATFORM is not accepted. "
> (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.oasis-open.org%2Fvirtio%2Fvirtio%2Fv1.1%2Fcs01%2Fvirtio-v1.1-cs01.html%23x1-4120002data=02%7C01%7Cbrijesh.singh%40amd.com%7C52b79b5c9e894dd968c508d7ccfe9479%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637203266090854439sdata=RBx8cBr8I%2FWFChtVFTjBygRiHIXMmsjT8W%2BwLaTNQ24%3Dreserved=0)
>
>
>> In that regard, your patch makes perfect sense
>> (although I am not sure it's a good idea to overwrite these feature
>> bits
>> - maybe they should be activated on the cmdline permanently instead
>> when PV is to be used? (or enable )).
> I didn't understand the last part. I believe conserving the user
> specified value when not running in PV mode is better than the hard
> overwrite I did here. I wanted a discussion starter.
>
> I think the other option (with respect to let QEMU manage this for user,
> i.e. what I try to do here) is to fence the conversion if virtio devices
> that do not offer VIRTIO_F_ACCESS_PLATFORM are attached; and disallow
> hotplug of such devices at some point during the conversion.
>
> I believe that alternative is even uglier.
>
> IMHO we don't want the end user to fiddle with iommu_platform, because
> all the 'benefit' he gets from that is possibility to make a mistake.
> For example, I got an internal bug report saying virtio is broken with
> PV, which boiled down to an overlooked auto generated NIC, which of
> course had iommu_platform (VIRTIO_F_ACCESS_PLATFORM) not set.
>
>>> The actual problem is that the 

Re: [PATCH V2] vhost: correctly turn on VIRTIO_F_IOMMU_PLATFORM

2020-03-13 Thread Brijesh Singh


On 3/13/20 7:44 AM, Halil Pasic wrote:
> [..]
>>> CCing Tom. @Tom does vhost-vsock work for you with SEV and current qemu?
>>>
>>> Also, one can specify iommu_platform=on on a device that ain't a part of
>>> a secure-capable VM, just for the fun of it. And that breaks
>>> vhost-vsock. Or is setting iommu_platform=on only valid if
>>> qemu-system-s390x is protected virtualization capable?
>>>
>>> BTW, I don't have a strong opinion on the fixes tag. We currently do not
>>> recommend setting iommu_platform, and thus I don't think we care too
>>> much about past qemus having problems with it.
>>>
>>> Regards,
>>> Halil
>>
>> Let's just say if we do have a Fixes: tag we want to set it correctly to
>> the commit that needs this fix.
>>
> I finally did some digging regarding the performance degradation. For
> s390x the performance degradation on vhost-net was introduced by commit
> 076a93d797 ("exec: simplify address_space_get_iotlb_entry"). Before
> IOMMUTLBEntry.addr_mask used to be based on plen, which in turn was
> calculated as the rest of the memory regions size (from address), and
> covered most of the guest address space. That is we didn't have a whole
> lot of IOTLB API overhead.
>
> With commit 076a93d797 I see IOMMUTLBEntry.addr_mask == 0xfff which comes
> as ~TARGET_PAGE_MASK from flatview_do_translate(). To have things working
> properly I applied 75e5b70e6, b021d1c044, and d542800d1e on the level of
> 076a93d797 and 076a93d797~1.
>
> Regarding vhost-vsock. It does not work with iommu_platform=on since the
> very beginning (i.e. 8607f5c307 ("virtio: convert to use DMA api")). Not
> sure if that is a good or a bad thing. (If the vhost driver in the kernel
> would actually have to do the IOTLB translation, then failing in case
> where it does not support it seems sane. The problem is that
> ACCESS_PLATFORM is used for more than one thing (needs translation, and
> restricted memory access).)
>
> I don't think I've heard back from AMD whether vsock works with SEV or
> not... I don't have access to HW to test it myself.


I just tried vhost-vsock on AMD SEV machine and it does not work. I am
using FC31 (qemu 4.1.1.1.fc31).


> We (s390) don't require this being backported to the stable qemus,
> because for us iommu_platform=on becomes relevant with protected
> virtualization, and those qemu versions don't support it.
>
> Cheers,
> Halil
>



Re: [Qemu-devel] [PATCH v2 4/8] x86_iommu/amd: Prepare for interrupt remap support

2018-09-17 Thread Brijesh Singh




On 09/17/2018 01:06 PM, Eduardo Habkost wrote:
...#define TYPE_AMD_IOMMU_DEVICE "amd-iommu"

   #define AMD_IOMMU_DEVICE(obj)\
@@ -278,6 +288,9 @@ typedef struct AMDVIState {
   /* IOTLB */
   GHashTable *iotlb;
+
+/* Interrupt remapping */
+bool intr_enabled;


Why do you need this field if the same info is already available
at AMDVIState::iommu::intr_supported?



Again this is to be consistent with intel-iommu structure which has this
fields. Having said that I should be able to access the
AMDVIState::iommu::intr_supported and remove this new field.


intel-iommu seems to need the field because intr_enabled can
change at runtime at vtd_handle_gcmd_ire().  Does amd-iommu have
anything equivalent?




Actually there is no MMIO control register writes to detect whether
the guest OS enabled the interrupt remap feature. I believe in case of
intel we are able to intercept some register writes which gives info
whether the IR is enabled by guest. In amd-iommu, only way to check if
whether guest OS has enabled the IR is by looking at DTE IV bit - this
lookup is done when we get request to remap the interrupt.

In summary, intr_enabled = iommu->intr_supported hence I also do
not see much value for having this field in AMDVIState.

-Brijesh




Re: [Qemu-devel] [PATCH v2 4/8] x86_iommu/amd: Prepare for interrupt remap support

2018-09-17 Thread Brijesh Singh




On 09/17/2018 08:49 AM, Eduardo Habkost wrote:

Hi,

I couldn't review the whole patch yet, but I have some comments
below:

On Fri, Sep 14, 2018 at 01:26:59PM -0500, Brijesh Singh wrote:

Register the interrupt remapping callback and read/write ops for the
amd-iommu-ir memory region.

amd-iommu-ir is set to higher priority to ensure that this region won't
be masked out by other memory regions.

While at it, add a overlapping amd-iommu region with higher priority
and update address space name to include the devfn.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
  hw/i386/amd_iommu.c  | 140 ---
  hw/i386/amd_iommu.h  |  17 ++-
  hw/i386/trace-events |   5 ++
  3 files changed, 154 insertions(+), 8 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 225825e..b15962b 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -26,6 +26,7 @@
  #include "amd_iommu.h"
  #include "qapi/error.h"
  #include "qemu/error-report.h"
+#include "hw/i386/apic_internal.h"
  #include "trace.h"
  
  /* used AMD-Vi MMIO registers */

@@ -56,6 +57,7 @@ struct AMDVIAddressSpace {
  uint8_t devfn;  /* device function  */
  AMDVIState *iommu_state;/* AMDVI - one per machine  */
  IOMMUMemoryRegion iommu;/* Device's address translation region  */
+MemoryRegion root;  /* AMDVI Root memory map region */
  MemoryRegion iommu_ir;  /* Device's interrupt remapping region  */
  AddressSpace as;/* device's corresponding address space */
  };
@@ -1027,10 +1029,104 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion 
*iommu, hwaddr addr,
  return ret;
  }
  
+/* Interrupt remapping for MSI/MSI-X entry */

+static int amdvi_int_remap_msi(AMDVIState *iommu,
+   MSIMessage *origin,
+   MSIMessage *translated,
+   uint16_t sid)
+{
+assert(origin && translated);
+
+trace_amdvi_ir_remap_msi_req(origin->address, origin->data, sid);
+
+if (!iommu || !iommu->intr_enabled) {
+memcpy(translated, origin, sizeof(*origin));
+goto out;
+}
+
+if (origin->address & AMDVI_MSI_ADDR_HI_MASK) {
+trace_amdvi_err("MSI address high 32 bits non-zero when "
+"Interrupt Remapping enabled.");
+return -AMDVI_IR_ERR;
+}
+
+if ((origin->address & AMDVI_MSI_ADDR_LO_MASK) != APIC_DEFAULT_ADDRESS) {
+trace_amdvi_err("MSI is not from IOAPIC.");
+return -AMDVI_IR_ERR;
+}
+
+out:
+trace_amdvi_ir_remap_msi(origin->address, origin->data,
+ translated->address, translated->data);
+return 0;
+}
+
+static int amdvi_int_remap(X86IOMMUState *iommu,
+   MSIMessage *origin,
+   MSIMessage *translated,
+   uint16_t sid)
+{
+return amdvi_int_remap_msi(AMD_IOMMU_DEVICE(iommu), origin,
+   translated, sid);
+}
+
+static MemTxResult amdvi_mem_ir_write(void *opaque, hwaddr addr,
+  uint64_t value, unsigned size,
+  MemTxAttrs attrs)
+{
+int ret;
+MSIMessage from = { 0, 0 }, to = { 0, 0 };
+uint16_t sid = AMDVI_IOAPIC_SB_DEVID;
+
+from.address = (uint64_t) addr + AMDVI_INT_ADDR_FIRST;
+from.data = (uint32_t) value;
+
+trace_amdvi_mem_ir_write_req(addr, value, size);
+
+if (!attrs.unspecified) {
+/* We have explicit Source ID */
+sid = attrs.requester_id;
+}
+
+ret = amdvi_int_remap_msi(opaque, , , sid);
+if (ret < 0) {
+/* TODO: report error */


How do you plan to address this TODO item?


+/* Drop the interrupt */


What does this comment mean?  Is this also a TODO item?



As per the specs, if we are not able to remap the interrupts then we
should be log the events so that if needed guest OS can access the log
events and make some decisions. I have not implemented this yet.
I still need to understand how all these things works before
attempting to emulate this part of code.

I have to see what can be done in addition to log to handle the
cases where we failed to remap. For now, I just added a comment so that
it reminds us to revisit it.





+return MEMTX_ERROR;
+}
+
+apic_get_class()->send_msi();
+
+trace_amdvi_mem_ir_write(to.address, to.data);
+return MEMTX_OK;
+}
+
+static MemTxResult amdvi_mem_ir_read(void *opaque, hwaddr addr,
+ uint64_t *data, unsigned size,
+  

Re: [Qemu-devel] [PATCH v2 5/8] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled

2018-09-17 Thread Brijesh Singh




On 09/17/2018 12:52 AM, Peter Xu wrote:

On Fri, Sep 14, 2018 at 01:27:00PM -0500, Brijesh Singh wrote:

Emulate the interrupt remapping support when guest virtual APIC is
not enabled.

For more info Refer: AMD IOMMU spec Rev 3.0 - section 2.2.5.1

When VAPIC is not enabled, it uses interrupt remapping as defined in
Table 20 and Figure 15 from IOMMU spec.


(feel free to cc me in your next post)



Will do so






Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
  hw/i386/amd_iommu.c  | 189 ++-
  hw/i386/amd_iommu.h  |  46 -
  hw/i386/trace-events |   7 ++
  3 files changed, 240 insertions(+), 2 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index b15962b..9c8e4de 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -28,6 +28,8 @@
  #include "qemu/error-report.h"
  #include "hw/i386/apic_internal.h"
  #include "trace.h"
+#include "cpu.h"
+#include "hw/i386/apic-msidef.h"
  
  /* used AMD-Vi MMIO registers */

  const char *amdvi_mmio_low[] = {
@@ -1029,17 +1031,144 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion 
*iommu, hwaddr addr,
  return ret;
  }
  
+static int amdvi_get_irte(AMDVIState *s, MSIMessage *origin, uint64_t *dte,

+  union irte *irte, uint16_t devid)
+{
+uint64_t irte_root, offset;
+
+irte_root = dte[2] & AMDVI_IR_PHYS_ADDR_MASK;


(I'll have similar endianess question like previous patch, but I'll
  stop looking at those since I'll need to know whether you plan to
  support that first...)



I am not sure if we really need to be concern about big-endian
support in this series. If we need big-endian support then lets
work on new series to fix the endianess.





+offset = (origin->data & AMDVI_IRTE_OFFSET) << 2;
+
+trace_amdvi_ir_irte(irte_root, offset);
+
+if (dma_memory_read(_space_memory, irte_root + offset,
+irte, sizeof(*irte))) {
+trace_amdvi_ir_err("failed to get irte");
+return -AMDVI_IR_GET_IRTE;
+}
+
+trace_amdvi_ir_irte_val(irte->val);
+
+return 0;
+}
+
+static int amdvi_int_remap_legacy(AMDVIState *iommu,
+  MSIMessage *origin,
+  MSIMessage *translated,
+  uint64_t *dte,
+  X86IOMMUIrq *irq,
+  uint16_t sid)
+{
+int ret;
+union irte irte;
+
+/* get interrupt remapping table */
+ret = amdvi_get_irte(iommu, origin, dte, , sid);
+if (ret < 0) {
+return ret;
+}
+
+if (!irte.fields.valid) {
+trace_amdvi_ir_target_abort("RemapEn is disabled");


Note that recently QEMU introduced error_report_once().  Feel free to
switch to that if you want.  Many of the VT-d emulation code switched
to that to make sure errors like this will be dumped at least once and
we're also free from Dos attack.  This should at least apply to all of
your below trace_amdvi_ir_target_abort() calls, even some other traces.




IMHO we should not be using error_report_once() here. It's possible that
guest OS have DTE[IV]=1 but has not programmed the interrupt
remapping entries or have deactivated the remapping. I see that Linux
OS does it all the time and in those cases we will be printing out
error messages - which will give a wrong information.




+return -AMDVI_IR_TARGET_ABORT;
+}
+
+if (irte.fields.guest_mode) {
+trace_amdvi_ir_target_abort("guest mode is not zero");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+if (irte.fields.int_type > AMDVI_IOAPIC_INT_TYPE_ARBITRATED) {
+trace_amdvi_ir_target_abort("reserved int_type");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+irq->delivery_mode = irte.fields.int_type;
+irq->vector = irte.fields.vector;
+irq->dest_mode = irte.fields.dm;
+irq->redir_hint = irte.fields.rq_eoi;
+irq->dest = irte.fields.destination;
+
+return 0;
+}
+
+static int __amdvi_int_remap_msi(AMDVIState *iommu,
+ MSIMessage *origin,
+ MSIMessage *translated,
+ uint64_t *dte,
+ X86IOMMUIrq *irq,
+ uint16_t sid)
+{
+uint8_t int_ctl;
+
+int_ctl = (dte[2] >> AMDVI_IR_INTCTL_SHIFT) & 3;
+trace_amdvi_ir_intctl(int_ctl);
+
+switch (int_ctl) {
+case AMDVI_IR_INTCTL_PASS:
+memcpy(translated, origin, sizeof(*origin));
+return 0;
+case AMDVI_IR_INTCTL_REMAP:
+break;
+case AMDVI_IR_INTCTL_ABORT:
+trace_amdvi_ir_target_

Re: [Qemu-devel] [PATCH v2 3/8] x86_iommu/amd: remove V=1 check from amdvi_validate_dte()

2018-09-17 Thread Brijesh Singh




On 09/17/2018 07:56 AM, Eduardo Habkost wrote:

On Fri, Sep 14, 2018 at 01:26:58PM -0500, Brijesh Singh wrote:

Currently, the amdvi_validate_dte() assumes that a valid DTE will
always have V=1. This is not true. The V=1 means that bit[127:1] are
valid. A valid DTE can have IV=1 and V=0 (i.e pt=off, intremap=on).

Remove the V=1 check from amdvi_validate_dte(), make the caller
responsible to check for V or IV bits.

Signed-off-by: Brijesh Singh 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
  hw/i386/amd_iommu.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 1fd669f..225825e 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -807,7 +807,7 @@ static inline uint64_t amdvi_get_perms(uint64_t entry)
 AMDVI_DEV_PERM_SHIFT;
  }
  
-/* a valid entry should have V = 1 and reserved bits honoured */

+/* validate that reserved bits are honoured */
  static bool amdvi_validate_dte(AMDVIState *s, uint16_t devid,
 uint64_t *dte)
  {
@@ -820,7 +820,7 @@ static bool amdvi_validate_dte(AMDVIState *s, uint16_t 
devid,
  return false;
  }
  
-return dte[0] & AMDVI_DEV_VALID;


 [1]


+return true;
  }


For reference, this is the only caller of amdvi_validate_dte():

   /* get a device table entry given the devid */
   static bool amdvi_get_dte(AMDVIState *s, int devid, uint64_t *entry)
   {
   uint32_t offset = devid * AMDVI_DEVTAB_ENTRY_SIZE;
   
   if (dma_memory_read(_space_memory, s->devtab + offset, entry,

   AMDVI_DEVTAB_ENTRY_SIZE)) {
   trace_amdvi_dte_get_fail(s->devtab, offset);
   /* log error accessing dte */
   amdvi_log_devtab_error(s, devid, s->devtab + offset, 0);
   return false;
   }
   
   *entry = le64_to_cpu(*entry);

   if (!amdvi_validate_dte(s, devid, entry)) { /* <--- [2] */
   trace_amdvi_invalid_dte(entry[0]);
   return false;
   }
   
   return true;

   }

and the only caller of amdvi_get_dte() is below:

  
  /* get a device table entry given the devid */

@@ -967,7 +967,8 @@ static void amdvi_do_translate(AMDVIAddressSpace *as, 
hwaddr addr,
  }
  
  /* devices with V = 0 are not translated */

-if (!amdvi_get_dte(s, devid, entry)) {
+if (!amdvi_get_dte(s, devid, entry) &&
+!(entry[0] & AMDVI_DEV_VALID)) {

^ [3]


  goto out;
  }


This means `dte` at [1] == `entry` at [2] == `entry` at [3].

However, if amdvi_get_dte() returned false, `entry[0]` might be
uninitialized.  We should check (entry[0] & AMDVI_DEV_VALID) only
if amdvi_get_dte() returned true.  I assume you meant the
following:

 if (!amdvi_get_dte(s, devid, entry) ||
 !(entry[0] & AMDVI_DEV_VALID)) {
 goto out;
 }



Ah good catch. Yes we should check the valid bit only if we are
able to get a valid dte. thanks



Re: [Qemu-devel] [PATCH v2 3/8] x86_iommu/amd: remove V=1 check from amdvi_validate_dte()

2018-09-17 Thread Brijesh Singh



On 9/16/18 11:33 PM, Peter Xu wrote:
> On Fri, Sep 14, 2018 at 01:26:58PM -0500, Brijesh Singh wrote:
>> Currently, the amdvi_validate_dte() assumes that a valid DTE will
>> always have V=1. This is not true. The V=1 means that bit[127:1] are
>> valid. A valid DTE can have IV=1 and V=0 (i.e pt=off, intremap=on).
> "pt" might be a bit confusing here.  Now "intel-iommu" device has the
> "pt" parameter to specify IOMMU DMAR passthrough support.  Also the
> corresponding guest kernel parameter "iommu_pt".  So I would suggest
> to use "page translation" (is this really the term that AMD spec is
> used after all?) or directly DMAR (DMA remapping).

I will use "page or address translation" instead of pt to avoid confusions.

>
>> Remove the V=1 check from amdvi_validate_dte(), make the caller
>> responsible to check for V or IV bits.
>>
>> Signed-off-by: Brijesh Singh 
>> Cc: "Michael S. Tsirkin" 
>> Cc: Paolo Bonzini 
>> Cc: Richard Henderson 
>> Cc: Eduardo Habkost 
>> Cc: Marcel Apfelbaum 
>> Cc: Tom Lendacky 
>> Cc: Suravee Suthikulpanit 
>> ---
>>  hw/i386/amd_iommu.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index 1fd669f..225825e 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -807,7 +807,7 @@ static inline uint64_t amdvi_get_perms(uint64_t entry)
>> AMDVI_DEV_PERM_SHIFT;
>>  }
>>  
>> -/* a valid entry should have V = 1 and reserved bits honoured */
>> +/* validate that reserved bits are honoured */
>>  static bool amdvi_validate_dte(AMDVIState *s, uint16_t devid,
>> uint64_t *dte)
>>  {
>> @@ -820,7 +820,7 @@ static bool amdvi_validate_dte(AMDVIState *s, uint16_t 
>> devid,
>>  return false;
>>  }
>>  
>> -return dte[0] & AMDVI_DEV_VALID;
>> +return true;
>>  }
>>  
>>  /* get a device table entry given the devid */
>> @@ -967,7 +967,8 @@ static void amdvi_do_translate(AMDVIAddressSpace *as, 
>> hwaddr addr,
>>  }
>>  
>>  /* devices with V = 0 are not translated */
>> -if (!amdvi_get_dte(s, devid, entry)) {
>> +if (!amdvi_get_dte(s, devid, entry) &&
>> +!(entry[0] & AMDVI_DEV_VALID)) {
> Here I'm not sure whether you're considering endianess.  I think
> amdvi_get_dte() tried to fix the endianess somehow but I'm not sure
> it's complete (so entry[0] is special here...):
>
> static bool amdvi_get_dte(AMDVIState *s, int devid, uint64_t *entry)
> {
> uint32_t offset = devid * AMDVI_DEVTAB_ENTRY_SIZE;
>
> if (dma_memory_read(_space_memory, s->devtab + offset, entry,
> AMDVI_DEVTAB_ENTRY_SIZE)) {
> trace_amdvi_dte_get_fail(s->devtab, offset);
> /* log error accessing dte */
> amdvi_log_devtab_error(s, devid, s->devtab + offset, 0);
> return false;
> }
>
> *entry = le64_to_cpu(*entry);  <--- [1]
> if (!amdvi_validate_dte(s, devid, entry)) {
> trace_amdvi_invalid_dte(entry[0]);
> return false;
> }
>
> return true;
> }
>
> At [1] only one 64bits entry is swapped correctly to cpu endianess,
> IMHO the rest of the three uint64_t is still using LE.
>
> I'm not really sure whether there would be anyone that wants to run
> the AMD IOMMU on big endian hosts, but I just want to know the goal of
> this series - do you want to support this scenario?  If so, you might
> need to fixup the places too AFAIU.

I had similar question in my mind when I looked at amd-iommu the very
first time. I am not sure if anyone is using this device on big endian
platform. This series focuses on interrupt remap support only. If needed
we can work on different series to add big endian hosts.

>
>>  goto out;
>>  }
>>  
>> -- 
>> 2.7.4
>>
>>
> Regards,
>




[Qemu-devel] [PATCH v2 6/8] i386: acpi: add IVHD device entry for IOAPIC

2018-09-14 Thread Brijesh Singh
When interrupt remapping is enabled, add a special IVHD device
(type IOAPIC).

Signed-off-by: Brijesh Singh 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/acpi-build.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index e1ee8ae..0a19e25 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2516,9 +2516,12 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
  *   IVRS table as specified in AMD IOMMU Specification v2.62, Section 5.2
  *   accessible here http://support.amd.com/TechDocs/48882_IOMMU.pdf
  */
+#define IOAPIC_SB_DEVID   (uint64_t)PCI_BUILD_BDF(0, PCI_DEVFN(0x14, 0))
+
 static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker)
 {
+int ivhd_table_len = 28;
 int iommu_start = table_data->len;
 AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
 
@@ -2540,8 +2543,16 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
  (1UL << 6) | /* PrefSup  */
  (1UL << 7),  /* PPRSup   */
  1);
+
+/*
+ * When interrupt remapping is enabled, we add a special IVHD device
+ * for type IO-APIC.
+ */
+if (s->intr_enabled) {
+ivhd_table_len += 8;
+}
 /* IVHD length */
-build_append_int_noprefix(table_data, 28, 2);
+build_append_int_noprefix(table_data, ivhd_table_len, 2);
 /* DeviceID */
 build_append_int_noprefix(table_data, s->devid, 2);
 /* Capability offset */
@@ -2565,6 +2576,22 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
  */
 build_append_int_noprefix(table_data, 0x001, 4);
 
+/*
+ * Add a special IVHD device type.
+ * Refer to spec - Table 95: IVHD device entry type codes
+ *
+ * When interrupt remapping is enabled Linux IOMMU driver checks for
+ * the special IVHD device (type IO-APIC).
+ * See Linux kernel commit 'c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059'
+ */
+if (s->intr_enabled) {
+build_append_int_noprefix(table_data,
+ (0x1ull << 56) |   /* type IOAPIC */
+ (IOAPIC_SB_DEVID << 40) |  /* IOAPIC devid */
+ 0x48,  /* special device 
*/
+ 8);
+}
+
 build_header(linker, table_data, (void *)(table_data->data + iommu_start),
  "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
 }
-- 
2.7.4




[Qemu-devel] [PATCH v2 5/8] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled

2018-09-14 Thread Brijesh Singh
Emulate the interrupt remapping support when guest virtual APIC is
not enabled.

For more info Refer: AMD IOMMU spec Rev 3.0 - section 2.2.5.1

When VAPIC is not enabled, it uses interrupt remapping as defined in
Table 20 and Figure 15 from IOMMU spec.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
 hw/i386/amd_iommu.c  | 189 ++-
 hw/i386/amd_iommu.h  |  46 -
 hw/i386/trace-events |   7 ++
 3 files changed, 240 insertions(+), 2 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index b15962b..9c8e4de 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -28,6 +28,8 @@
 #include "qemu/error-report.h"
 #include "hw/i386/apic_internal.h"
 #include "trace.h"
+#include "cpu.h"
+#include "hw/i386/apic-msidef.h"
 
 /* used AMD-Vi MMIO registers */
 const char *amdvi_mmio_low[] = {
@@ -1029,17 +1031,144 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion 
*iommu, hwaddr addr,
 return ret;
 }
 
+static int amdvi_get_irte(AMDVIState *s, MSIMessage *origin, uint64_t *dte,
+  union irte *irte, uint16_t devid)
+{
+uint64_t irte_root, offset;
+
+irte_root = dte[2] & AMDVI_IR_PHYS_ADDR_MASK;
+offset = (origin->data & AMDVI_IRTE_OFFSET) << 2;
+
+trace_amdvi_ir_irte(irte_root, offset);
+
+if (dma_memory_read(_space_memory, irte_root + offset,
+irte, sizeof(*irte))) {
+trace_amdvi_ir_err("failed to get irte");
+return -AMDVI_IR_GET_IRTE;
+}
+
+trace_amdvi_ir_irte_val(irte->val);
+
+return 0;
+}
+
+static int amdvi_int_remap_legacy(AMDVIState *iommu,
+  MSIMessage *origin,
+  MSIMessage *translated,
+  uint64_t *dte,
+  X86IOMMUIrq *irq,
+  uint16_t sid)
+{
+int ret;
+union irte irte;
+
+/* get interrupt remapping table */
+ret = amdvi_get_irte(iommu, origin, dte, , sid);
+if (ret < 0) {
+return ret;
+}
+
+if (!irte.fields.valid) {
+trace_amdvi_ir_target_abort("RemapEn is disabled");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+if (irte.fields.guest_mode) {
+trace_amdvi_ir_target_abort("guest mode is not zero");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+if (irte.fields.int_type > AMDVI_IOAPIC_INT_TYPE_ARBITRATED) {
+trace_amdvi_ir_target_abort("reserved int_type");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+irq->delivery_mode = irte.fields.int_type;
+irq->vector = irte.fields.vector;
+irq->dest_mode = irte.fields.dm;
+irq->redir_hint = irte.fields.rq_eoi;
+irq->dest = irte.fields.destination;
+
+return 0;
+}
+
+static int __amdvi_int_remap_msi(AMDVIState *iommu,
+ MSIMessage *origin,
+ MSIMessage *translated,
+ uint64_t *dte,
+ X86IOMMUIrq *irq,
+ uint16_t sid)
+{
+uint8_t int_ctl;
+
+int_ctl = (dte[2] >> AMDVI_IR_INTCTL_SHIFT) & 3;
+trace_amdvi_ir_intctl(int_ctl);
+
+switch (int_ctl) {
+case AMDVI_IR_INTCTL_PASS:
+memcpy(translated, origin, sizeof(*origin));
+return 0;
+case AMDVI_IR_INTCTL_REMAP:
+break;
+case AMDVI_IR_INTCTL_ABORT:
+trace_amdvi_ir_target_abort("int_ctl abort");
+return -AMDVI_IR_TARGET_ABORT;
+default:
+trace_amdvi_ir_target_abort("int_ctl reserved");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+return amdvi_int_remap_legacy(iommu, origin, translated, dte, irq, sid);
+}
+
+static bool amdvi_validate_int_reamp(AMDVIState *s, uint64_t *dte)
+{
+/* Check if IR is enabled in DTE */
+if (!(dte[2] & AMDVI_IR_REMAP_ENABLE)) {
+return false;
+}
+
+/* validate that we are configure with intremap=on */
+if (!s->intr_enabled) {
+error_report("Interrupt remapping is enabled in the guest but "
+ "not in the host. Use intremap=on to enable interrupt "
+ "remapping in amd-iommu.");
+exit(1);
+}
+
+return true;
+}
+
 /* Interrupt remapping for MSI/MSI-X entry */
 static int amdvi_int_remap_msi(AMDVIState *iommu,
MSIMessage *origin,
MSIMessage *translated,
uint16_t sid)
 {
+int ret = 0;
+uint64_t pass = 0;
+uint64_t dte[4] = { 0 };
+X86IOMMUIrq irq = { 0 };
+uint8_t dest

[Qemu-devel] [PATCH v2 3/8] x86_iommu/amd: remove V=1 check from amdvi_validate_dte()

2018-09-14 Thread Brijesh Singh
Currently, the amdvi_validate_dte() assumes that a valid DTE will
always have V=1. This is not true. The V=1 means that bit[127:1] are
valid. A valid DTE can have IV=1 and V=0 (i.e pt=off, intremap=on).

Remove the V=1 check from amdvi_validate_dte(), make the caller
responsible to check for V or IV bits.

Signed-off-by: Brijesh Singh 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/amd_iommu.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 1fd669f..225825e 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -807,7 +807,7 @@ static inline uint64_t amdvi_get_perms(uint64_t entry)
AMDVI_DEV_PERM_SHIFT;
 }
 
-/* a valid entry should have V = 1 and reserved bits honoured */
+/* validate that reserved bits are honoured */
 static bool amdvi_validate_dte(AMDVIState *s, uint16_t devid,
uint64_t *dte)
 {
@@ -820,7 +820,7 @@ static bool amdvi_validate_dte(AMDVIState *s, uint16_t 
devid,
 return false;
 }
 
-return dte[0] & AMDVI_DEV_VALID;
+return true;
 }
 
 /* get a device table entry given the devid */
@@ -967,7 +967,8 @@ static void amdvi_do_translate(AMDVIAddressSpace *as, 
hwaddr addr,
 }
 
 /* devices with V = 0 are not translated */
-if (!amdvi_get_dte(s, devid, entry)) {
+if (!amdvi_get_dte(s, devid, entry) &&
+!(entry[0] & AMDVI_DEV_VALID)) {
 goto out;
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH v2 4/8] x86_iommu/amd: Prepare for interrupt remap support

2018-09-14 Thread Brijesh Singh
Register the interrupt remapping callback and read/write ops for the
amd-iommu-ir memory region.

amd-iommu-ir is set to higher priority to ensure that this region won't
be masked out by other memory regions.

While at it, add a overlapping amd-iommu region with higher priority
and update address space name to include the devfn.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
 hw/i386/amd_iommu.c  | 140 ---
 hw/i386/amd_iommu.h  |  17 ++-
 hw/i386/trace-events |   5 ++
 3 files changed, 154 insertions(+), 8 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 225825e..b15962b 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -26,6 +26,7 @@
 #include "amd_iommu.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "hw/i386/apic_internal.h"
 #include "trace.h"
 
 /* used AMD-Vi MMIO registers */
@@ -56,6 +57,7 @@ struct AMDVIAddressSpace {
 uint8_t devfn;  /* device function  */
 AMDVIState *iommu_state;/* AMDVI - one per machine  */
 IOMMUMemoryRegion iommu;/* Device's address translation region  */
+MemoryRegion root;  /* AMDVI Root memory map region */
 MemoryRegion iommu_ir;  /* Device's interrupt remapping region  */
 AddressSpace as;/* device's corresponding address space */
 };
@@ -1027,10 +1029,104 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion 
*iommu, hwaddr addr,
 return ret;
 }
 
+/* Interrupt remapping for MSI/MSI-X entry */
+static int amdvi_int_remap_msi(AMDVIState *iommu,
+   MSIMessage *origin,
+   MSIMessage *translated,
+   uint16_t sid)
+{
+assert(origin && translated);
+
+trace_amdvi_ir_remap_msi_req(origin->address, origin->data, sid);
+
+if (!iommu || !iommu->intr_enabled) {
+memcpy(translated, origin, sizeof(*origin));
+goto out;
+}
+
+if (origin->address & AMDVI_MSI_ADDR_HI_MASK) {
+trace_amdvi_err("MSI address high 32 bits non-zero when "
+"Interrupt Remapping enabled.");
+return -AMDVI_IR_ERR;
+}
+
+if ((origin->address & AMDVI_MSI_ADDR_LO_MASK) != APIC_DEFAULT_ADDRESS) {
+trace_amdvi_err("MSI is not from IOAPIC.");
+return -AMDVI_IR_ERR;
+}
+
+out:
+trace_amdvi_ir_remap_msi(origin->address, origin->data,
+ translated->address, translated->data);
+return 0;
+}
+
+static int amdvi_int_remap(X86IOMMUState *iommu,
+   MSIMessage *origin,
+   MSIMessage *translated,
+   uint16_t sid)
+{
+return amdvi_int_remap_msi(AMD_IOMMU_DEVICE(iommu), origin,
+   translated, sid);
+}
+
+static MemTxResult amdvi_mem_ir_write(void *opaque, hwaddr addr,
+  uint64_t value, unsigned size,
+  MemTxAttrs attrs)
+{
+int ret;
+MSIMessage from = { 0, 0 }, to = { 0, 0 };
+uint16_t sid = AMDVI_IOAPIC_SB_DEVID;
+
+from.address = (uint64_t) addr + AMDVI_INT_ADDR_FIRST;
+from.data = (uint32_t) value;
+
+trace_amdvi_mem_ir_write_req(addr, value, size);
+
+if (!attrs.unspecified) {
+/* We have explicit Source ID */
+sid = attrs.requester_id;
+}
+
+ret = amdvi_int_remap_msi(opaque, , , sid);
+if (ret < 0) {
+/* TODO: report error */
+/* Drop the interrupt */
+return MEMTX_ERROR;
+}
+
+apic_get_class()->send_msi();
+
+trace_amdvi_mem_ir_write(to.address, to.data);
+return MEMTX_OK;
+}
+
+static MemTxResult amdvi_mem_ir_read(void *opaque, hwaddr addr,
+ uint64_t *data, unsigned size,
+ MemTxAttrs attrs)
+{
+return MEMTX_OK;
+}
+
+static const MemoryRegionOps amdvi_ir_ops = {
+.read_with_attrs = amdvi_mem_ir_read,
+.write_with_attrs = amdvi_mem_ir_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.impl = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+}
+};
+
 static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 {
+char name[128];
 AMDVIState *s = opaque;
-AMDVIAddressSpace **iommu_as;
+AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
 int bus_num = pci_bus_num(bus);
 
 iommu_as = s->address_spaces[bus_num];
@@ -1043,19 +1139,46 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, 
void *opaqu

[Qemu-devel] [PATCH v2 0/8] x86_iommu/amd: add interrupt remap support

2018-09-14 Thread Brijesh Singh
This series adds the interrupt remapping support for amd-iommu device.

IOMMU spec is available at: https://support.amd.com/TechDocs/48882_IOMMU.pdf

To enable the interrupt remap use below qemu cli
# $QEMU \
  -device amd-iommu,intremap=on

I have tested FC-28 and Ubuntu 18.04 guest. 

Linux guest bootup log shows the interrupt remap supports:

[root@localhost ~]# dmesg | grep -i AMD-Vi
[0.001761] AMD-Vi: Using IVHD type 0x10
[0.003051] AMD-Vi: device: 00:03.0 cap: 0040 seg: 0 flags: d1 info 
[0.004007] AMD-Vi:mmio-addr: fed8
[0.004874] AMD-Vi:   DEV_ALLflags: 00
[0.006236] AMD-Vi:   DEV_SPECIAL(IOAPIC[0]) devid: 00:14.0
[0.667943] AMD-Vi: Found IOMMU at :00:03.0 cap 0x40
[0.668727] AMD-Vi: Extended features (0x29d3):
[0.669874] AMD-Vi: Interrupt remapping enabled
[0.671074] AMD-Vi: Lazy IO/TLB flushing enabled

cat /proc/interrupts confirms that its using IR

[root@localhost ~]# cat /proc/interrupts 
CPU0   
 0: 40  IR-IO-APIC2-edge  timer
 1:  9  IR-IO-APIC1-edge  i8042
 4:   1770  IR-IO-APIC4-edge  ttyS0
 7:  0  IR-IO-APIC7-edge  parport0
 8:  1  IR-IO-APIC8-edge  rtc0
 9:  0  IR-IO-APIC9-fasteoi   acpi
12: 15  IR-IO-APIC   12-edge  i8042
16:  0  IR-IO-APIC   16-fasteoi   i801_smbus
24:  0   PCI-MSI 49152-edge  AMD-Vi
25:  13070  IR-PCI-MSI 512000-edge  ahci[:00:1f.2]
26: 86  IR-PCI-MSI 32768-edge  enp0s2-rx-0
27:139  IR-PCI-MSI 32769-edge  enp0s2-tx-0
28:  1  IR-PCI-MSI 32770-edge  enp0s2
NMI:  0   Non-maskable interrupts
LOC:  26686   Local timer interrupts
SPU:  0   Spurious interrupts
...
...

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 

Changes since v1:
 - move vtd_generate_msi_message to common code
 - fix the dest_mode bit extraction
 - add more comments explaining why we add the special device
 - some minor cleanups based on Peter's feedbacks

Brijesh Singh (8):
  x86_iommu: move the kernel-irqchip check in common code
  x86_iommu: move vtd_generate_msi_message in common file
  x86_iommu/amd: remove V=1 check from amdvi_validate_dte()
  x86_iommu/amd: Prepare for interrupt remap support
  x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
  i386: acpi: add IVHD device entry for IOAPIC
  x86_iommu/amd: Add interrupt remap support when VAPIC is enabled
  x86_iommu/amd: Enable Guest virtual APIC support

 hw/i386/acpi-build.c  |  32 +++-
 hw/i386/amd_iommu.c   | 402 +-
 hw/i386/amd_iommu.h   | 101 ++-
 hw/i386/intel_iommu.c |  39 +---
 hw/i386/trace-events  |  14 ++
 hw/i386/x86-iommu.c   |  33 
 include/hw/i386/intel_iommu.h |  59 ---
 include/hw/i386/x86-iommu.h   |  66 +++
 8 files changed, 638 insertions(+), 108 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH v2 8/8] x86_iommu/amd: Enable Guest virtual APIC support

2018-09-14 Thread Brijesh Singh
Now that amd-iommu support interrupt remapping, enable the GASup in IVRS
table and GASup in extended feature register to indicate that IOMMU
support guest virtual APIC mode. GASup provides option to guest OS to
make use of 128-bit IRTE.

Note that the GAMSup is set to zero to indicate that amd-iommu does not
support guest virtual APIC mode (aka AVIC) which would be used for the
nested VMs.

See Table 21 from IOMMU spec for interrupt virtualization controls

Signed-off-by: Brijesh Singh 
Reviewed-by: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/acpi-build.c | 3 ++-
 hw/i386/amd_iommu.h  | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 0a19e25..41f523a 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2567,7 +2567,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
 build_append_int_noprefix(table_data,
  (48UL << 30) | /* HATS   */
  (48UL << 28) | /* GATS   */
- (1UL << 2),/* GTSup  */
+ (1UL << 2)   | /* GTSup  */
+ (1UL << 6),/* GASup  */
  4);
 /*
  *   Type 1 device entry reporting all devices
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 6579469..1ac920e 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -177,7 +177,7 @@
 /* extended feature support */
 #define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
 AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
-AMDVI_GATS_MODE | AMDVI_HATS_MODE)
+AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
 
 /* capabilities header */
 #define AMDVI_CAPAB_FEATURES (AMDVI_CAPAB_FLAT_EXT | \
-- 
2.7.4




[Qemu-devel] [PATCH v2 2/8] x86_iommu: move vtd_generate_msi_message in common file

2018-09-14 Thread Brijesh Singh
The vtd_generate_msi_message() in intel-iommu is used to construct a MSI
Message from IRQ. A similar function will be needed when we add interrupt
remapping support in amd-iommu. Moving the function in common file to
avoid the code duplication. Rename it to x86_iommu_irq_to_msi_message().
There is no logic changes in the code flow.

Signed-off-by: Brijesh Singh 
Suggested-by: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/intel_iommu.c | 32 +++--
 hw/i386/x86-iommu.c   | 24 
 include/hw/i386/intel_iommu.h | 59 --
 include/hw/i386/x86-iommu.h   | 66 +++
 4 files changed, 94 insertions(+), 87 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 84dbc20..014418b 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2701,7 +2701,7 @@ static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t 
index,
 
 /* Fetch IRQ information of specific IR index */
 static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index,
- VTDIrq *irq, uint16_t sid)
+ X86IOMMUIrq *irq, uint16_t sid)
 {
 VTD_IR_TableEntry irte = {};
 int ret = 0;
@@ -2730,30 +2730,6 @@ static int vtd_remap_irq_get(IntelIOMMUState *iommu, 
uint16_t index,
 return 0;
 }
 
-/* Generate one MSI message from VTDIrq info */
-static void vtd_generate_msi_message(VTDIrq *irq, MSIMessage *msg_out)
-{
-VTD_MSIMessage msg = {};
-
-/* Generate address bits */
-msg.dest_mode = irq->dest_mode;
-msg.redir_hint = irq->redir_hint;
-msg.dest = irq->dest;
-msg.__addr_hi = irq->dest & 0xff00;
-msg.__addr_head = cpu_to_le32(0xfee);
-/* Keep this from original MSI address bits */
-msg.__not_used = irq->msi_addr_last_bits;
-
-/* Generate data bits */
-msg.vector = irq->vector;
-msg.delivery_mode = irq->delivery_mode;
-msg.level = 1;
-msg.trigger_mode = irq->trigger_mode;
-
-msg_out->address = msg.msi_addr;
-msg_out->data = msg.msi_data;
-}
-
 /* Interrupt remapping for MSI/MSI-X entry */
 static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
MSIMessage *origin,
@@ -2763,7 +2739,7 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
 int ret = 0;
 VTD_IR_MSIAddress addr;
 uint16_t index;
-VTDIrq irq = {};
+X86IOMMUIrq irq = {};
 
 assert(origin && translated);
 
@@ -2842,8 +2818,8 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
  */
 irq.msi_addr_last_bits = addr.addr.__not_care;
 
-/* Translate VTDIrq to MSI message */
-vtd_generate_msi_message(, translated);
+/* Translate X86IOMMUIrq to MSI message */
+x86_iommu_irq_to_msi_message(, translated);
 
 out:
 trace_vtd_ir_remap_msi(origin->address, origin->data,
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 7440cb8..abc3c03 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -53,6 +53,30 @@ void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool 
global,
 }
 }
 
+/* Generate one MSI message from VTDIrq info */
+void x86_iommu_irq_to_msi_message(X86IOMMUIrq *irq, MSIMessage *msg_out)
+{
+X86IOMMU_MSIMessage msg = {};
+
+/* Generate address bits */
+msg.dest_mode = irq->dest_mode;
+msg.redir_hint = irq->redir_hint;
+msg.dest = irq->dest;
+msg.__addr_hi = irq->dest & 0xff00;
+msg.__addr_head = cpu_to_le32(0xfee);
+/* Keep this from original MSI address bits */
+msg.__not_used = irq->msi_addr_last_bits;
+
+/* Generate data bits */
+msg.vector = irq->vector;
+msg.delivery_mode = irq->delivery_mode;
+msg.level = 1;
+msg.trigger_mode = irq->trigger_mode;
+
+msg_out->address = msg.msi_addr;
+msg_out->data = msg.msi_data;
+}
+
 /* Default X86 IOMMU device */
 static X86IOMMUState *x86_iommu_default = NULL;
 
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index fbfedcb..ed4e758 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -66,8 +66,6 @@ typedef struct VTDIOTLBEntry VTDIOTLBEntry;
 typedef struct VTDBus VTDBus;
 typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
-typedef struct VTDIrq VTDIrq;
-typedef struct VTD_MSIMessage VTD_MSIMessage;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -197,63 +195,6 @@ union VTD_IR_MSIAddress {
 uint32_t data;
 };
 
-/* Generic IRQ entry information */
-struct VTDIrq {
-/* Used by both IOAPIC/MSI interrupt remapping */
-uint8_t trigger_mode;
-uint8_t vector;
-uint8_t delivery_mode;
-uint32_t dest;
-uint8_t dest_mode;
-
-/* only

[Qemu-devel] [PATCH v2 1/8] x86_iommu: move the kernel-irqchip check in common code

2018-09-14 Thread Brijesh Singh
Interrupt remapping needs kernel-irqchip={off|split} on both Intel and AMD
platforms. Move the check in common place.

Signed-off-by: Brijesh Singh 
Reviewed-by: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/intel_iommu.c | 7 ---
 hw/i386/x86-iommu.c   | 9 +
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3dfada1..84dbc20 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3248,13 +3248,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error 
**errp)
 {
 X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
 
-/* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
-if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
-!kvm_irqchip_is_split()) {
-error_setg(errp, "Intel Interrupt Remapping cannot work with "
- "kernel-irqchip=on, please use 'split|off'.");
-return false;
-}
 if (s->intr_eim == ON_OFF_AUTO_ON && !x86_iommu->intr_supported) {
 error_setg(errp, "eim=on cannot be selected without intremap=on");
 return false;
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 8a01a2d..7440cb8 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -25,6 +25,7 @@
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "trace.h"
+#include "sysemu/kvm.h"
 
 void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
  iec_notify_fn fn, void *data)
@@ -94,6 +95,14 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+/* Both Intel and AMD IOMMU IR only support "kernel-irqchip={off|split}" */
+if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
+!kvm_irqchip_is_split()) {
+error_setg(errp, "Interrupt Remapping cannot work with "
+ "kernel-irqchip=on, please use 'split|off'.");
+return;
+}
+
 if (x86_class->realize) {
 x86_class->realize(dev, errp);
 }
-- 
2.7.4




Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC

2018-09-13 Thread Brijesh Singh




On 09/13/2018 01:18 PM, Michael S. Tsirkin wrote:
...>>

0x01 00a0 00 00  48

Byte 0: 0x48 (special device)
Byte 1 & 2: must be zero
Byte 3: 0 (dte setting)
Byte 4: 0 (handle)
Byte 5 & 6: IOAPIC devfn (14:0.0)


Do you mean *bus* devfn? devfn is 0.0.



Sorry my bad, I was meaning to write devid and not devfn.

See, 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/amd_iommu_init.c#n2343


/* SB IOAPIC is always on this device in AMD systems */
#define IOAPIC_SB_DEVID ((0x00 << 8) | PCI_DEVFN(0x14, 0))



Byte 7: 0x1 (IOAPIC) - See Table 97 in spec



Above should go into code comment, along with
first (oldest) version of spec that has this table.
Additionally the number is IMHO more readable as:
(0x1ull << 56) | (PCI_BUILD_BDF(14, 0) << 40) | 0x48

(assuming I got what it should be).





Re: [Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support

2018-09-12 Thread Brijesh Singh




On 09/11/2018 11:52 PM, Peter Xu wrote:
...



diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 5c2c638..1cbc8ba 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2565,7 +2565,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
  build_append_int_noprefix(table_data,
   (48UL << 30) | /* HATS   */
   (48UL << 28) | /* GATS   */
- (1UL << 2),/* GTSup  */
+ (1UL << 2)   | /* GTSup  */
+ (1UL << 6),/* GASup  */


Sorry if I misunderstood - is this for nested?

I'm a bit confused here... IIUC in your previous patches you didn't
really implement guest_mode==1 case in IRTEs.  So if you have this set
then the guest should be able to setup IRTEs with guest_mode==1?  How
did it work?



Yes, sometime spec can be confusing ;) let me see if I can explain it.

Suravee, please correct me if I misunderstood something

- A legacy interrupt remap support is available on all IOMMU versions.

  Guest OS makes the decision whether to use the feature. Guest OS
  sets the IV bit in DTE to indicate when there is a valid interrupt
  map (See Table 7 DTE definition).

  In this mode, IRTE is 32-bit. The field details are available
  in Table 20 - Section 2.2.5.2. The third patch in this series
  implements the support for this case.

  (NOTE: As I explained [1], the Linux guest enables intr remap only
   when it sees a special IVHD IOAPIC type device is present in IVRS)

  [1] https://marc.info/?l=qemu-devel=153677956506196=2

- When AVIC is used in the guest, the intr remap logic is different.

  Based on the guest_mode, IOMMU and AVIC may need to work together
  to remap the interrupts (IOMMU spec refers this as Guest Virtual APIC
  Enabled -- Section 2.2.5.3).

  The GASup bit in extended feature register and IVHD IOMMU feature
  reporting field in IVRS are used to tell whether the IOMMU supports
  intr remap when AVIC is enabled in the guest.

  In this mode, the IRTE is 128-bit.

  To make things interesting, there is a GAMSup bit in extended
  feature register. It is used by IOMMU to tell the AVIC supported
  modes:

  a) 0 = intr_remap only (i.e IOMMU can remap the interrupt on its own)
  b) 1 = Both guest AVIC and IOMMU will work together to remap the intr

  The patch 5 in the series implements the intr_remap (#a)

  I have not implemented the mode=1. I am not sure if its worth
  implementing this mode for the emulated IOMMU case.

  See Table 21 for control knobs from IOMMU. This patch series
  implements the first three rows.

I hope my explanation does not add more confusions ;)

-Brijesh



Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC

2018-09-12 Thread Brijesh Singh




On 09/12/2018 11:35 AM, Igor Mammedov wrote:
...

  
+/*

+ * When interrupt remapping is enabled, Linux IOMMU driver also checks
+ * for special IVHD device (type IO-APIC), which is typically presented
+ * as PCI device 14:00.0.

Probably it shouldn't be a 'typically' device from somewhere but rather address
fetched from corresponding device model QEMU implements.



IOAPIC is not presented as a true PCI device to guest OS. When IOMMU is
enabled a pseudo address space to added under root PCI bus. PCI 14:0.0
presents to this pseudo device.




+ */
+if (s->intr_enabled) {
+build_append_int_noprefix(table_data, 0x0100a048, 8);

  ^^ this is incomprehensible,
where does this magic number comes from and how was it calculated?



In order to provide interrupt remap support, a special IVHD device need
to be added,  the magic number uses the format defined in Table 95 (IVHD
device entry type codes).

0x01 00a0 00 00  48

Byte 0: 0x48 (special device)
Byte 1 & 2: must be zero
Byte 3: 0 (dte setting)
Byte 4: 0 (handle)
Byte 5 & 6: IOAPIC devfn (14:0.0)
Byte 7: 0x1 (IOAPIC) - See Table 97 in spec



+}
+
  build_header(linker, table_data, (void *)(table_data->data + iommu_start),
   "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
  }






Re: [Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC

2018-09-12 Thread Brijesh Singh




On 09/11/2018 11:35 PM, Peter Xu wrote:

On Tue, Sep 11, 2018 at 11:49:47AM -0500, Brijesh Singh wrote:

When interrupt remapping is enabled, add a special IVHD device
(type IOAPIC) -- which is typically PCI device 14:0.0. Linux IOMMU driver
checks for this special device.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
  hw/i386/acpi-build.c | 20 +++-
  1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index e1ee8ae..5c2c638 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2519,6 +2519,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
  static void
  build_amd_iommu(GArray *table_data, BIOSLinker *linker)
  {
+int ivhd_table_len = 28;
  int iommu_start = table_data->len;
  AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
  
@@ -2540,8 +2541,16 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)

   (1UL << 6) | /* PrefSup  */
   (1UL << 7),  /* PPRSup   */
   1);
+
+/*
+ * When interrupt remapping is enabled, we add a special IVHD device
+ * for type IO-APIC.
+ */
+if (s->intr_enabled) {
+ivhd_table_len += 8;
+}
  /* IVHD length */
-build_append_int_noprefix(table_data, 28, 2);
+build_append_int_noprefix(table_data, ivhd_table_len, 2);
  /* DeviceID */
  build_append_int_noprefix(table_data, s->devid, 2);
  /* Capability offset */
@@ -2565,6 +2574,15 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
   */
  build_append_int_noprefix(table_data, 0x001, 4);
  
+/*

+ * When interrupt remapping is enabled, Linux IOMMU driver also checks
+ * for special IVHD device (type IO-APIC), which is typically presented
+ * as PCI device 14:00.0.
+ */
+if (s->intr_enabled) {
+build_append_int_noprefix(table_data, 0x0100a048, 8);


Some comments on the bit definition would be nicer, or "please refer
to Table 95 of AMD-Vi spec".

Could I ask how come the 14:00.0?  Is that in the spec somewhere?

And since you explicitly mentioned Linux, then... would it work for
Windows too?



The PCI 14:00.0 is SouthBridge IOAPIC device. On bare metal the timer
subsystem is connected to the SB IOAPIC. The IVRS table must contains
the entry of SB IOAPIC otherwise Linux will not enable the IR mapping
while parsing the IVRS.

On bare meta system, IVRS will always have entry for SB IOAPIC. As per
Windows is concerned, I am not sure if Windows support interrupt remap.
If it does, adding the SB IOAPIC devid should not cause any problem
to it because its always available on bare metal system.

Here is linux commit

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059




Re: [Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support

2018-09-12 Thread Brijesh Singh




On 09/11/2018 10:52 PM, Peter Xu wrote:

On Tue, Sep 11, 2018 at 11:49:45AM -0500, Brijesh Singh wrote:

  static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int 
devfn)
  {
  AMDVIState *s = opaque;
@@ -1055,6 +1151,12 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, 
void *opaque, int devfn)
  address_space_init(_as[devfn]->as,
 MEMORY_REGION(_as[devfn]->iommu),
 "amd-iommu");
+memory_region_init_io(_as[devfn]->iommu_ir, OBJECT(s),
+  _ir_ops, s, "amd-iommu-ir",
+  AMDVI_INT_ADDR_SIZE);
+memory_region_add_subregion(MEMORY_REGION(_as[devfn]->iommu),
+  AMDVI_INT_ADDR_FIRST,
+  _as[devfn]->iommu_ir);


A pure question: just to make sure this IR region won't be masked out
by other memory regions.  Asked since VT-d is explicitly setting a
higher priority of the memory region for interrupts with
memory_region_add_subregion_overlap().



Hmm, I was hoping that this IR region will not be masked out by other
regions but if it does then we will have trouble. thanks for pointing
this out, I think we can do similar thing as VT-d and make the region
as high priority so that we get memops invoked.



  }
  return _as[devfn]->as;
  }
@@ -1172,6 +1274,10 @@ static void amdvi_realize(DeviceState *dev, Error **err)
  return;
  }
  
+/* Pseudo address space under root PCI bus. */

+pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_SB_IOAPIC_ID);
+s->intr_enabled = x86_iommu->intr_supported;


So does this mean that AMD IR cannot be disabled if declared support?
For VT-d, IR needs to be explicitly enabled otherwise disabled (even
supported).




Yes, once its declared as supported then it can not disabled. Its
upto the guest OS to decide whether it want to use the intr remapping
feature by parsing the IVRS. This also brings question, should we
just enable it by default because its guest OS decision whether it
wants to use it or not.




Re: [Qemu-devel] [PATCH 3/6] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled

2018-09-12 Thread Brijesh Singh



Thanks for the quick review feedback.

On 09/11/2018 10:37 PM, Peter Xu wrote:

On Tue, Sep 11, 2018 at 11:49:46AM -0500, Brijesh Singh wrote:

Emulate the interrupt remapping support when guest virtual APIC is
not enabled.

See IOMMU spec: https://support.amd.com/TechDocs/48882_IOMMU.pdf
(section 2.2.5.1) for details information.

When VAPIC is not enabled, it uses interrupt remapping as defined in
Table 20 and Figure 15 from IOMMU spec.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
  hw/i386/amd_iommu.c  | 187 +++
  hw/i386/amd_iommu.h  |  60 -
  hw/i386/trace-events |   7 ++
  3 files changed, 253 insertions(+), 1 deletion(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 572ba0a..5ac19df 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -28,6 +28,8 @@
  #include "qemu/error-report.h"
  #include "hw/i386/apic_internal.h"
  #include "trace.h"
+#include "cpu.h"
+#include "hw/i386/apic-msidef.h"
  
  /* used AMD-Vi MMIO registers */

  const char *amdvi_mmio_low[] = {
@@ -1027,6 +1029,119 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion 
*iommu, hwaddr addr,
  return ret;
  }
  
+static int amdvi_get_irte(AMDVIState *s, MSIMessage *origin, uint64_t *dte,

+  union irte *irte, uint16_t devid)
+{
+uint64_t irte_root, offset;
+
+irte_root = dte[2] & AMDVI_IR_PHYS_ADDR_MASK;
+offset = (origin->data & AMDVI_IRTE_OFFSET) << 2;
+
+trace_amdvi_ir_irte(irte_root, offset);
+
+if (dma_memory_read(_space_memory, irte_root + offset,
+irte, sizeof(*irte))) {
+trace_amdvi_ir_err("failed to get irte");
+return -AMDVI_IR_GET_IRTE;
+}
+
+trace_amdvi_ir_irte_val(irte->val);
+
+return 0;
+}
+
+static void amdvi_generate_msi_message(struct AMDVIIrq *irq, MSIMessage *out)
+{
+out->address = APIC_DEFAULT_ADDRESS | \
+(irq->dest_mode << MSI_ADDR_DEST_MODE_SHIFT) | \
+(irq->redir_hint << MSI_ADDR_REDIRECTION_SHIFT) | \
+(irq->dest << MSI_ADDR_DEST_ID_SHIFT);
+
+out->data = (irq->vector << MSI_DATA_VECTOR_SHIFT) | \
+(irq->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
+
+trace_amdvi_ir_generate_msi_message(irq->vector, irq->delivery_mode,
+irq->dest_mode, irq->dest, irq->redir_hint);
+}
+
+static int amdvi_int_remap_legacy(AMDVIState *iommu,
+  MSIMessage *origin,
+  MSIMessage *translated,
+  uint64_t *dte,
+  struct AMDVIIrq *irq,
+  uint16_t sid)
+{
+int ret;
+union irte irte;
+
+/* get interrupt remapping table */


... get interrupt remapping table "entry"? :)

I see similar wordings in your spec, e.g., Table 20 is named as
"Interrupt Remapping Table Fields - Basic Format", but actually AFAICT
it's for the entry fields.  I'm confused a bit with them.



I was too much in spec hence used the same wording as spec. But, I agree
with you that we should use "... interrupt remapping table entry".



+ret = amdvi_get_irte(iommu, origin, dte, , sid);
+if (ret < 0) {
+return ret;
+}
+
+if (!irte.fields.valid) {
+trace_amdvi_ir_target_abort("RemapEn is disabled");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+if (irte.fields.guest_mode) {
+trace_amdvi_ir_target_abort("guest mode is not zero");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+if (irte.fields.int_type > AMDVI_IOAPIC_INT_TYPE_ARBITRATED) {
+trace_amdvi_ir_target_abort("reserved int_type");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+irq->delivery_mode = irte.fields.int_type;
+irq->vector = irte.fields.vector;
+irq->dest_mode = irte.fields.dm;
+irq->redir_hint = irte.fields.rq_eoi;
+irq->dest = irte.fields.destination;
+
+return 0;
+}
+
+static int __amdvi_int_remap_msi(AMDVIState *iommu,
+ MSIMessage *origin,
+ MSIMessage *translated,
+ uint64_t *dte,
+ uint16_t sid)
+{
+int ret;
+uint8_t int_ctl;
+struct AMDVIIrq irq = { 0 };
+
+int_ctl = (dte[2] >> AMDVI_IR_INTCTL_SHIFT) & 3;
+trace_amdvi_ir_intctl(int_ctl);
+
+switch (int_ctl) {
+case AMDVI_IR_INTCTL_PASS:
+memcpy(translated, origin, sizeof(*origin));
+return 0;
+case AMDVI_IR_INTCTL_REMAP:
+break;
+case AMDVI_IR_INTCTL_ABORT:
+   

[Qemu-devel] [PATCH 4/6] i386: acpi: add IVHD device entry for IOAPIC

2018-09-11 Thread Brijesh Singh
When interrupt remapping is enabled, add a special IVHD device
(type IOAPIC) -- which is typically PCI device 14:0.0. Linux IOMMU driver
checks for this special device.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
 hw/i386/acpi-build.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index e1ee8ae..5c2c638 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2519,6 +2519,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
 static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker)
 {
+int ivhd_table_len = 28;
 int iommu_start = table_data->len;
 AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
 
@@ -2540,8 +2541,16 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
  (1UL << 6) | /* PrefSup  */
  (1UL << 7),  /* PPRSup   */
  1);
+
+/*
+ * When interrupt remapping is enabled, we add a special IVHD device
+ * for type IO-APIC.
+ */
+if (s->intr_enabled) {
+ivhd_table_len += 8;
+}
 /* IVHD length */
-build_append_int_noprefix(table_data, 28, 2);
+build_append_int_noprefix(table_data, ivhd_table_len, 2);
 /* DeviceID */
 build_append_int_noprefix(table_data, s->devid, 2);
 /* Capability offset */
@@ -2565,6 +2574,15 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
  */
 build_append_int_noprefix(table_data, 0x001, 4);
 
+/*
+ * When interrupt remapping is enabled, Linux IOMMU driver also checks
+ * for special IVHD device (type IO-APIC), which is typically presented
+ * as PCI device 14:00.0.
+ */
+if (s->intr_enabled) {
+build_append_int_noprefix(table_data, 0x0100a048, 8);
+}
+
 build_header(linker, table_data, (void *)(table_data->data + iommu_start),
  "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
 }
-- 
2.7.4




[Qemu-devel] [PATCH 2/6] x86_iommu/amd: Prepare for interrupt remap support

2018-09-11 Thread Brijesh Singh
Register the interrupt remapping callback and read/write ops for the
amd-iommu-ir memory region.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
 hw/i386/amd_iommu.c  | 107 +++
 hw/i386/amd_iommu.h  |  17 +++-
 hw/i386/trace-events |   5 +++
 3 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 1fd669f..572ba0a 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -26,6 +26,7 @@
 #include "amd_iommu.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "hw/i386/apic_internal.h"
 #include "trace.h"
 
 /* used AMD-Vi MMIO registers */
@@ -1026,6 +1027,101 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion 
*iommu, hwaddr addr,
 return ret;
 }
 
+/* Interrupt remapping for MSI/MSI-X entry */
+static int amdvi_int_remap_msi(AMDVIState *iommu,
+   MSIMessage *origin,
+   MSIMessage *translated,
+   uint16_t sid)
+{
+int ret;
+
+assert(origin && translated);
+
+trace_amdvi_ir_remap_msi_req(origin->address, origin->data, sid);
+
+if (!iommu || !iommu->intr_enabled) {
+memcpy(translated, origin, sizeof(*origin));
+goto out;
+}
+
+if (origin->address & AMDVI_MSI_ADDR_HI_MASK) {
+trace_amdvi_err("MSI address high 32 bits non-zero when "
+"Interrupt Remapping enabled.");
+return -AMDVI_IR_ERR;
+}
+
+if ((origin->address & AMDVI_MSI_ADDR_LO_MASK) != APIC_DEFAULT_ADDRESS) {
+trace_amdvi_err("MSI is not from IOAPIC.");
+return -AMDVI_IR_ERR;
+}
+
+out:
+trace_amdvi_ir_remap_msi(origin->address, origin->data,
+ translated->address, translated->data);
+return 0;
+}
+
+static int amdvi_int_remap(X86IOMMUState *iommu,
+   MSIMessage *origin,
+   MSIMessage *translated,
+   uint16_t sid)
+{
+return amdvi_int_remap_msi(AMD_IOMMU_DEVICE(iommu), origin,
+   translated, sid);
+}
+
+static MemTxResult amdvi_mem_ir_write(void *opaque, hwaddr addr,
+  uint64_t value, unsigned size,
+  MemTxAttrs attrs)
+{
+int ret;
+MSIMessage from = { 0, 0 }, to = { 0, 0 };
+uint16_t sid = AMDVI_SB_IOAPIC_ID;
+
+from.address = (uint64_t) addr + AMDVI_INT_ADDR_FIRST;
+from.data = (uint32_t) value;
+
+trace_amdvi_mem_ir_write_req(addr, value, size);
+
+if (!attrs.unspecified) {
+/* We have explicit Source ID */
+sid = attrs.requester_id;
+}
+
+ret = amdvi_int_remap_msi(opaque, , , sid);
+if (ret < 0) {
+/* TODO: report error */
+/* Drop the interrupt */
+return MEMTX_ERROR;
+}
+
+apic_get_class()->send_msi();
+
+trace_amdvi_mem_ir_write(to.address, to.data);
+return MEMTX_OK;
+}
+
+static MemTxResult amdvi_mem_ir_read(void *opaque, hwaddr addr,
+ uint64_t *data, unsigned size,
+ MemTxAttrs attrs)
+{
+return MEMTX_OK;
+}
+
+static const MemoryRegionOps amdvi_ir_ops = {
+.read_with_attrs = amdvi_mem_ir_read,
+.write_with_attrs = amdvi_mem_ir_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.impl = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+}
+};
+
 static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 {
 AMDVIState *s = opaque;
@@ -1055,6 +1151,12 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, 
void *opaque, int devfn)
 address_space_init(_as[devfn]->as,
MEMORY_REGION(_as[devfn]->iommu),
"amd-iommu");
+memory_region_init_io(_as[devfn]->iommu_ir, OBJECT(s),
+  _ir_ops, s, "amd-iommu-ir",
+  AMDVI_INT_ADDR_SIZE);
+memory_region_add_subregion(MEMORY_REGION(_as[devfn]->iommu),
+  AMDVI_INT_ADDR_FIRST,
+  _as[devfn]->iommu_ir);
 }
 return _as[devfn]->as;
 }
@@ -1172,6 +1274,10 @@ static void amdvi_realize(DeviceState *dev, Error **err)
 return;
 }
 
+/* Pseudo address space under root PCI bus. */
+pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_SB_IOAPIC_ID);
+s->intr_enabled = x86_iommu->intr_supported;
+
 /* set up MMIO */
 m

[Qemu-devel] [PATCH 1/6] x86_iommu: move the kernel-irqchip check in common code

2018-09-11 Thread Brijesh Singh
Interrupt remapping needs kernel-irqchip={off|split} on both Intel and AMD
platforms. Move the check in common place.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
 hw/i386/intel_iommu.c | 7 ---
 hw/i386/x86-iommu.c   | 9 +
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3dfada1..84dbc20 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3248,13 +3248,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error 
**errp)
 {
 X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
 
-/* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
-if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
-!kvm_irqchip_is_split()) {
-error_setg(errp, "Intel Interrupt Remapping cannot work with "
- "kernel-irqchip=on, please use 'split|off'.");
-return false;
-}
 if (s->intr_eim == ON_OFF_AUTO_ON && !x86_iommu->intr_supported) {
 error_setg(errp, "eim=on cannot be selected without intremap=on");
 return false;
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 8a01a2d..7440cb8 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -25,6 +25,7 @@
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "trace.h"
+#include "sysemu/kvm.h"
 
 void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
  iec_notify_fn fn, void *data)
@@ -94,6 +95,14 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+/* Both Intel and AMD IOMMU IR only support "kernel-irqchip={off|split}" */
+if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
+!kvm_irqchip_is_split()) {
+error_setg(errp, "Interrupt Remapping cannot work with "
+ "kernel-irqchip=on, please use 'split|off'.");
+return;
+}
+
 if (x86_class->realize) {
 x86_class->realize(dev, errp);
 }
-- 
2.7.4




[Qemu-devel] [PATCH 3/6] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled

2018-09-11 Thread Brijesh Singh
Emulate the interrupt remapping support when guest virtual APIC is
not enabled.

See IOMMU spec: https://support.amd.com/TechDocs/48882_IOMMU.pdf
(section 2.2.5.1) for details information.

When VAPIC is not enabled, it uses interrupt remapping as defined in
Table 20 and Figure 15 from IOMMU spec.

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
 hw/i386/amd_iommu.c  | 187 +++
 hw/i386/amd_iommu.h  |  60 -
 hw/i386/trace-events |   7 ++
 3 files changed, 253 insertions(+), 1 deletion(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 572ba0a..5ac19df 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -28,6 +28,8 @@
 #include "qemu/error-report.h"
 #include "hw/i386/apic_internal.h"
 #include "trace.h"
+#include "cpu.h"
+#include "hw/i386/apic-msidef.h"
 
 /* used AMD-Vi MMIO registers */
 const char *amdvi_mmio_low[] = {
@@ -1027,6 +1029,119 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion 
*iommu, hwaddr addr,
 return ret;
 }
 
+static int amdvi_get_irte(AMDVIState *s, MSIMessage *origin, uint64_t *dte,
+  union irte *irte, uint16_t devid)
+{
+uint64_t irte_root, offset;
+
+irte_root = dte[2] & AMDVI_IR_PHYS_ADDR_MASK;
+offset = (origin->data & AMDVI_IRTE_OFFSET) << 2;
+
+trace_amdvi_ir_irte(irte_root, offset);
+
+if (dma_memory_read(_space_memory, irte_root + offset,
+irte, sizeof(*irte))) {
+trace_amdvi_ir_err("failed to get irte");
+return -AMDVI_IR_GET_IRTE;
+}
+
+trace_amdvi_ir_irte_val(irte->val);
+
+return 0;
+}
+
+static void amdvi_generate_msi_message(struct AMDVIIrq *irq, MSIMessage *out)
+{
+out->address = APIC_DEFAULT_ADDRESS | \
+(irq->dest_mode << MSI_ADDR_DEST_MODE_SHIFT) | \
+(irq->redir_hint << MSI_ADDR_REDIRECTION_SHIFT) | \
+(irq->dest << MSI_ADDR_DEST_ID_SHIFT);
+
+out->data = (irq->vector << MSI_DATA_VECTOR_SHIFT) | \
+(irq->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
+
+trace_amdvi_ir_generate_msi_message(irq->vector, irq->delivery_mode,
+irq->dest_mode, irq->dest, irq->redir_hint);
+}
+
+static int amdvi_int_remap_legacy(AMDVIState *iommu,
+  MSIMessage *origin,
+  MSIMessage *translated,
+  uint64_t *dte,
+  struct AMDVIIrq *irq,
+  uint16_t sid)
+{
+int ret;
+union irte irte;
+
+/* get interrupt remapping table */
+ret = amdvi_get_irte(iommu, origin, dte, , sid);
+if (ret < 0) {
+return ret;
+}
+
+if (!irte.fields.valid) {
+trace_amdvi_ir_target_abort("RemapEn is disabled");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+if (irte.fields.guest_mode) {
+trace_amdvi_ir_target_abort("guest mode is not zero");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+if (irte.fields.int_type > AMDVI_IOAPIC_INT_TYPE_ARBITRATED) {
+trace_amdvi_ir_target_abort("reserved int_type");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+irq->delivery_mode = irte.fields.int_type;
+irq->vector = irte.fields.vector;
+irq->dest_mode = irte.fields.dm;
+irq->redir_hint = irte.fields.rq_eoi;
+irq->dest = irte.fields.destination;
+
+return 0;
+}
+
+static int __amdvi_int_remap_msi(AMDVIState *iommu,
+ MSIMessage *origin,
+ MSIMessage *translated,
+ uint64_t *dte,
+ uint16_t sid)
+{
+int ret;
+uint8_t int_ctl;
+struct AMDVIIrq irq = { 0 };
+
+int_ctl = (dte[2] >> AMDVI_IR_INTCTL_SHIFT) & 3;
+trace_amdvi_ir_intctl(int_ctl);
+
+switch (int_ctl) {
+case AMDVI_IR_INTCTL_PASS:
+memcpy(translated, origin, sizeof(*origin));
+return 0;
+case AMDVI_IR_INTCTL_REMAP:
+break;
+case AMDVI_IR_INTCTL_ABORT:
+trace_amdvi_ir_target_abort("int_ctl abort");
+return -AMDVI_IR_TARGET_ABORT;
+default:
+trace_amdvi_ir_target_abort("int_ctl reserved");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+ret = amdvi_int_remap_legacy(iommu, origin, translated, dte, , sid);
+if (ret < 0) {
+return ret;
+}
+
+/* Translate AMDVIIrq to MSI message */
+amdvi_generate_msi_message(, translated);
+
+return 0;
+}
+
 /* Interrupt remapping for MSI/MSI-X entry */
 static int amdvi_int_remap_msi(AMDVIState *iommu,
 

[Qemu-devel] [PATCH 0/6] x86_iommu/amd: add interrupt remap support

2018-09-11 Thread Brijesh Singh
This series adds the interrupt remapping support for amd-iommu device.

IOMMU spec is available at: https://support.amd.com/TechDocs/48882_IOMMU.pdf

To enable the interrupt remap use below qemu cli
# $QEMU \
  -device amd-iommu,intremap=on

I have tested FC-28 and Ubuntu 18.04 guest. 

Linux guest bootup log shows the interrupt remap supports:

[root@localhost ~]# dmesg | grep -i AMD-Vi
[0.001761] AMD-Vi: Using IVHD type 0x10
[0.003051] AMD-Vi: device: 00:03.0 cap: 0040 seg: 0 flags: d1 info 
[0.004007] AMD-Vi:mmio-addr: fed8
[0.004874] AMD-Vi:   DEV_ALLflags: 00
[0.006236] AMD-Vi:   DEV_SPECIAL(IOAPIC[0]) devid: 00:14.0
[0.667943] AMD-Vi: Found IOMMU at :00:03.0 cap 0x40
[0.668727] AMD-Vi: Extended features (0x29d3):
[0.669874] AMD-Vi: Interrupt remapping enabled
[0.671074] AMD-Vi: Lazy IO/TLB flushing enabled

cat /proc/interrupts confirms that its using IR

[root@localhost ~]# cat /proc/interrupts 
CPU0   
 0: 40  IR-IO-APIC2-edge  timer
 1:  9  IR-IO-APIC1-edge  i8042
 4:   1770  IR-IO-APIC4-edge  ttyS0
 7:  0  IR-IO-APIC7-edge  parport0
 8:  1  IR-IO-APIC8-edge  rtc0
 9:  0  IR-IO-APIC9-fasteoi   acpi
12: 15  IR-IO-APIC   12-edge  i8042
16:  0  IR-IO-APIC   16-fasteoi   i801_smbus
24:  0   PCI-MSI 49152-edge  AMD-Vi
25:  13070  IR-PCI-MSI 512000-edge  ahci[:00:1f.2]
26: 86  IR-PCI-MSI 32768-edge  enp0s2-rx-0
27:139  IR-PCI-MSI 32769-edge  enp0s2-tx-0
28:  1  IR-PCI-MSI 32770-edge  enp0s2
NMI:  0   Non-maskable interrupts
LOC:  26686   Local timer interrupts
SPU:  0   Spurious interrupts
...
...

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 

Brijesh Singh (6):
  x86_iommu: move the kernel-irqchip check in common code
  x86_iommu/amd: Prepare for interrupt remap support
  x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
  i386: acpi: add IVHD device entry for IOAPIC
  x86_iommu/amd: Add interrupt remap support when VAPIC is enabled
  x86_iommu/amd: Enable Guest virtual APIC support

 hw/i386/acpi-build.c  |  23 +++-
 hw/i386/amd_iommu.c   | 360 ++
 hw/i386/amd_iommu.h   | 115 +++-
 hw/i386/intel_iommu.c |   7 -
 hw/i386/trace-events  |  14 ++
 hw/i386/x86-iommu.c   |   9 ++
 6 files changed, 516 insertions(+), 12 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH 6/6] x86_iommu/amd: Enable Guest virtual APIC support

2018-09-11 Thread Brijesh Singh
Now that amd-iommu support interrupt remapping, enable the GASup in IVRS
table and GASup in extended feature register to indicate that IOMMU
support guest virtual APIC mode.

Note that the GAMSup is set to zero to indicate that  Guest Virtual
APIC does not support advanced interrupt features (i.e virtualized
interrupts using the guest virtual APIC).

See Table 21 from IOMMU spec for interrupt virtualization controls

IOMMU spec: https://support.amd.com/TechDocs/48882_IOMMU.pdf

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Signed-off-by: Brijesh Singh 
---
 hw/i386/acpi-build.c | 3 ++-
 hw/i386/amd_iommu.h  | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 5c2c638..1cbc8ba 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2565,7 +2565,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
 build_append_int_noprefix(table_data,
  (48UL << 30) | /* HATS   */
  (48UL << 28) | /* GATS   */
- (1UL << 2),/* GTSup  */
+ (1UL << 2)   | /* GTSup  */
+ (1UL << 6),/* GASup  */
  4);
 /*
  *   Type 1 device entry reporting all devices
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 1dab974..5defaac 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -177,7 +177,7 @@
 /* extended feature support */
 #define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
 AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
-AMDVI_GATS_MODE | AMDVI_HATS_MODE)
+AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
 
 /* capabilities header */
 #define AMDVI_CAPAB_FEATURES (AMDVI_CAPAB_FLAT_EXT | \
-- 
2.7.4




Re: [Qemu-devel] [libvirt] CPU Support

2018-07-18 Thread Brijesh Singh



On 7/18/18 8:49 AM, Eduardo Habkost wrote:
> CCing the AMD people who worked on this.
>
> On Wed, Jul 18, 2018 at 12:18:45PM +0200, Pavel Hrdina wrote:
>> On Wed, Jul 18, 2018 at 10:50:34AM +0100, Daniel P. Berrangé wrote:
>>> On Wed, Jul 18, 2018 at 12:41:48PM +0300, Hetz Ben Hamo wrote:
 Hi,

 I've been looking at the CPU list and although I see lots of CPU's, I
 cannot find 2 CPU families:

 * AMD Ryzen
 * AMD Threadripper

 Although EPYC has been added recently.

 Are there any missing details which preventing adding those CPU's to the
 list?
>>> Libvirt adds CPU models based on what QEMU supports. So from libvirt side 
>>> the
>>> answer is simply that QEMU doesn't expose any models for Ryzen/Threadripper,
>>> but I'm not clear why it doesn't...

EPYC model should work just fine on Ryzen/Threadripper. Are we seeing
some issues?

>>> For a while I thought Ryzen/Threadripper would have same feature set as
>>> EPYC, but I've seen bugs recently suggesting that is not in fact the
>>> case. So it does look like having those models exposed by QEMU might
>>> be useful.
>>>
>>> Copy'ing QEMU devel & the CPU model maintainers for opinions.
>> I think that QEMU should figure out some pattern for naming CPU models
>> because it's one big mess.  EPYC and Ryzen are bad names for QEMU as
>> Core/Xeon would be for Intel CPUs.  It's the name of a model families
>> and it will probably remain the same but with different
>> microarchitecture.
>> Better name would be similarly like for the latest Inter CPUs,
>> Skylake-Client and Skylake-Server.  Currently AMD has already two
>> microarchitectures, Zen and Zen+ and there is third one Zen 2 planned.
>>
>> Zen has AMD Ryzen, AMD Ryzen Threadripper and AMD Epyc.
>> Zen+ has AMD Ryzen, AMD Ryzen Threadripper
>>
>> And I bet that Zen 2 will follow the same model families.


My guess is same as your :) I hope sales/marketing does not come up with
different names for Soc's based Zen 2 core.


>> We probably cannot rename EPYC now, but before we introduce Ryzen and
>> Threadripper let's thing about it and come up with better names, for
>> example Zen-Client/Zen-Server Zen+-Client or something like that.

Zen-Client/Zen-Server naming convention looks better.




Re: [Qemu-devel] MSRC001_102C on EPYC (was Re: [PATCH v3] target-i386/cpu: Add new EPYC CPU model)

2018-06-27 Thread Brijesh Singh

Hi Eduardo,


On 06/27/2018 09:48 AM, Eduardo Habkost wrote:

Hi,

On Tue, Aug 15, 2017 at 12:00:51PM -0500, Brijesh Singh wrote:

Add a new base CPU model called 'EPYC' to model processors from AMD EPYC
family (which includes EPYC 76xx,75xx,74xx, 73xx and 72xx).

The following features bits have been added/removed compare to Opteron_G5

Added: monitor, movbe, rdrand, mmxext, ffxsr, rdtscp, cr8legacy, osvw,
fsgsbase, bmi1, avx2, smep, bmi2, rdseed, adx, smap, clfshopt, sha
xsaveopt, xsavec, xgetbv1, arat

Removed: xop, fma4, tbm


[...]

+{
+.name = "EPYC",
+.level = 0xd,
+.vendor = CPUID_VENDOR_AMD,
+.family = 23,
+.model = 1,
+.stepping = 2,


These f/m/s values trigger model-specific code in Windows 10
guests[1], and I couldn't find any public information that allow
us to fix the problem.
 > Windows 10 tries to set bit 15 of MSRC001_102C, in code that
looks like workarounds for CPU Erratas.

I found a Revision Guide for family 17h[2], but it has no mention
of MSRC001_102C at all.

Can AMD help us fix this?




IIRC, someone at AMD was looking into this. I will ask around and
update you.



If we are unable to fix it, I plan to work around it by changing
EPYC's family/model/stepping to the values in Opteron_G5 on QEMU
3.0.



I hope we find the correct definition of MSR and update the
document so that we don't need to workaround it in KVM/Qemu.




[1] Details can be seen at:
 https://bugzilla.redhat.com/show_bug.cgi?id=1592276
 https://bugzilla.redhat.com/show_bug.cgi?id=1593190#c12
[2] https://developer.amd.com/wp-content/resources/55449_1.12.pdf





Re: [Qemu-devel] [PATCH v2 for-2.12] tap: set vhostfd passed from qemu cli to non-blocking

2018-05-17 Thread Brijesh Singh

Hi Michael and Jason,

Looks like this patch was not included in any of your pull request hence 
it didn't made into 2.12. Can you please include this patch in your next 
pull request?


~ Brijesh

On 04/07/2018 10:53 PM, Michael S. Tsirkin wrote:

On Fri, Apr 06, 2018 at 01:51:25PM -0500, Brijesh Singh wrote:

A guest boot hangs while probing the network interface when
iommu_platform=on is used.

The following qemu cli hangs without this patch:

# $QEMU \
   -netdev tap,fd=3,id=hostnet0,vhost=on,vhostfd=4 3<>/dev/tap67 
4<>/dev/host-net \
   -device 
virtio-net-pci,netdev=hostnet0,id=net0,iommu_platform=on,disable-legacy=on \
   ...

Commit: c471ad0e9bd46 (vhost_net: device IOTLB support) took care of
setting vhostfd to non-blocking when QEMU opens /dev/host-net but if
the fd is passed from qemu cli then we need to ensure that fd is set
to non-blocking.

Fixes: c471ad0e9bd46 "vhost_net: device IOTLB support"
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>


Reviewed-by: Michael S. Tsirkin <m...@redhat.com>


---

Changes since v1:
  - use qemu_set_nonblock() instead of fcntl(..)

  net/tap.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/net/tap.c b/net/tap.c
index 2b3a36f9b50d..89c4e19162a2 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -40,6 +40,7 @@
  #include "qemu-common.h"
  #include "qemu/cutils.h"
  #include "qemu/error-report.h"
+#include "qemu/sockets.h"
  
  #include "net/tap.h"
  
@@ -693,6 +694,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,

  }
  return;
  }
+qemu_set_nonblock(vhostfd);
  } else {
  vhostfd = open("/dev/vhost-net", O_RDWR);
  if (vhostfd < 0) {
--
2.14.3




[Qemu-devel] [PATCH v2 for-2.12] tap: set vhostfd passed from qemu cli to non-blocking

2018-04-06 Thread Brijesh Singh
A guest boot hangs while probing the network interface when
iommu_platform=on is used.

The following qemu cli hangs without this patch:

# $QEMU \
  -netdev tap,fd=3,id=hostnet0,vhost=on,vhostfd=4 3<>/dev/tap67 
4<>/dev/host-net \
  -device 
virtio-net-pci,netdev=hostnet0,id=net0,iommu_platform=on,disable-legacy=on \
  ...

Commit: c471ad0e9bd46 (vhost_net: device IOTLB support) took care of
setting vhostfd to non-blocking when QEMU opens /dev/host-net but if
the fd is passed from qemu cli then we need to ensure that fd is set
to non-blocking.

Fixes: c471ad0e9bd46 "vhost_net: device IOTLB support"
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---

Changes since v1:
 - use qemu_set_nonblock() instead of fcntl(..)

 net/tap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/tap.c b/net/tap.c
index 2b3a36f9b50d..89c4e19162a2 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -40,6 +40,7 @@
 #include "qemu-common.h"
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
+#include "qemu/sockets.h"
 
 #include "net/tap.h"
 
@@ -693,6 +694,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
 }
 return;
 }
+qemu_set_nonblock(vhostfd);
 } else {
 vhostfd = open("/dev/vhost-net", O_RDWR);
 if (vhostfd < 0) {
-- 
2.14.3




Re: [Qemu-devel] [PATCH for-2.12] tap: set vhostfd passed from qemu cli to non-blocking

2018-04-06 Thread Brijesh Singh



On 04/06/2018 10:44 AM, Eric Blake wrote:

On 04/06/2018 07:03 AM, Brijesh Singh wrote:

A guest boot hangs while probing the network interface when
iommu_platform=on is used.

The following qemu cli hangs without this patch:

# $QEMU \
   -netdev tap,fd=3,id=hostnet0,vhost=on,vhostfd=4 3<>/dev/tap67 
4<>/dev/host-net \
   -device 
virtio-net-pci,netdev=hostnet0,id=net0,iommu_platform=on,disable-legacy=on \
   ...

Commit: c471ad0e9bd46 (vhost_net: device IOTLB support) took care of
setting vhostfd to non-blocking when QEMU opens /dev/host-net but if
the fd is passed from qemu cli then we need to ensure that fd is set
to non-blocking.

Fixes: c471ad0e9bd46 "vhost_net: device IOTLB support"
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
  net/tap.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/net/tap.c b/net/tap.c
index 2b3a36f9b50d..8c026fbf95cd 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -693,6 +693,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
  }
  return;
  }
+fcntl(vhostfd, F_SETFL, O_NONBLOCK);


Please use qemu_set_nonblock() instead.




Sure will do.




[Qemu-devel] [PATCH for-2.12] tap: set vhostfd passed from qemu cli to non-blocking

2018-04-06 Thread Brijesh Singh
A guest boot hangs while probing the network interface when
iommu_platform=on is used.

The following qemu cli hangs without this patch:

# $QEMU \
  -netdev tap,fd=3,id=hostnet0,vhost=on,vhostfd=4 3<>/dev/tap67 
4<>/dev/host-net \
  -device 
virtio-net-pci,netdev=hostnet0,id=net0,iommu_platform=on,disable-legacy=on \
  ...

Commit: c471ad0e9bd46 (vhost_net: device IOTLB support) took care of
setting vhostfd to non-blocking when QEMU opens /dev/host-net but if
the fd is passed from qemu cli then we need to ensure that fd is set
to non-blocking.

Fixes: c471ad0e9bd46 "vhost_net: device IOTLB support"
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Jason Wang <jasow...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 net/tap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tap.c b/net/tap.c
index 2b3a36f9b50d..8c026fbf95cd 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -693,6 +693,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
 }
 return;
 }
+fcntl(vhostfd, F_SETFL, O_NONBLOCK);
 } else {
 vhostfd = open("/dev/vhost-net", O_RDWR);
 if (vhostfd < 0) {
-- 
2.14.3




Re: [Qemu-devel] [PATCH v12 24/28] sev/i386: add migration blocker

2018-03-13 Thread Brijesh Singh


On 3/13/18 4:33 AM, Paolo Bonzini wrote:
> On 08/03/2018 13:48, Brijesh Singh wrote:
>>  sev_set_guest_state(SEV_STATE_RUNNING);
>> +
>> +/* add migration blocker */
>> +error_setg(_mig_blocker,
>> +   "SEV: Migration is not implemented");
>> +ret = migrate_add_blocker(sev_mig_blocker, _err);
>> +if (local_err) {
>> +error_report_err(local_err);
>> +error_free(sev_mig_blocker);
>> +exit(1);
>> +}
>>  }
> I think this should be in sev_guest_init instead?  Does migration
> transfer the measurement, or is it lost forever?  Not a blocker though.

The launch measurement does not get transferred during the migration.
During migration we get totally different measurement which is wrapped
with transport key etc and that need to send to destination. IIRC, in my
first attempt I was adding this blocker in sev_guest_init() but
migration_add_blocker() was failing because  sev_guest_init() is called
before the migration_object_init().

>
> Paolo




Re: [Qemu-devel] [PATCH v12 28/28] tests/qmp-test: blacklist sev specific qmp commands

2018-03-13 Thread Brijesh Singh


On 3/13/18 4:07 AM, Paolo Bonzini wrote:
> On 09/03/2018 11:12, Dr. David Alan Gilbert wrote:
>> * Eduardo Habkost (ehabk...@redhat.com) wrote:
>>> On Thu, Mar 08, 2018 at 02:18:55PM -0600, Brijesh Singh wrote:
>>>>
>>>> On 3/8/18 11:08 AM, Daniel P. Berrangé wrote:
>>>>> On Thu, Mar 08, 2018 at 06:49:01AM -0600, Brijesh Singh wrote:
>>>>>> Blacklist the following commands to fix the 'make check' failure.
>>>>>>
>>>>>> query-sev-launch-measure: it returns meaninful data only when we launch
>>>>>> SEV guest otherwise the command returns an error.
>>>>>>
>>>>>> query-sev: it return an error when SEV is not available on host (e.g non
>>>>>> X86 platform or KVM is disabled at the build time)
>>>>>>
>>>>>> query-sev-capabilities: it returns an error when SEV feature is not
>>>>>> available on host machine.
>>>>> We generally expect 'make check' to succeed on every single patch
>>>>> in a series, so that 'git bisect' doesn't break.
>>>>>
>>>>> So you should add each command to the blacklist in the same commit
>>>>> that introduced the failure in the first place.
>>>>
>>>> Sure, I can quickly send the updated patch series to address your this
>>>> concern, but before spamming everyone's inbox I was wondering if I can
>>>> get some indication whether this series will make into 2.12 merge.
>>>>
>>>> Paolo, Eduardo and Richard,
>>>>
>>>> Most of the changes are in x86 directory hence any thought if you are
>>>> considering this series for 2.12 ? I have been testing the series with
>>>> and without SEV support and so far have not ran into any issue. if you
>>>> are not planning to pull this series in 2.12 then I will wait a bit
>>>> longer to get more feedback before sending the updates to address
>>>> Daniel's comment. thanks
>>> Trying to merge it before 2.12 soft freeze (next Tuesday) still
>>> looks like a reasonable goal to me.  What do others think?
>> I've only looked at a few general comments and things but it looks like
>> it's getting there;  I don't think it's had many comments from the KVM
>> side yet.
> The KVM side is a pretty linear use of the kernel API.  I'm not very
> happy with the debug API for MemoryRegions (but it's not really
> Brijesh's fault), so my plan would be to merge it without debug support.

Thanks Paolo, I am working to drop the complete debug support and will
send series very soon. I need to run some quick smoke test to make sure
I don't break SEV guest support.




Re: [Qemu-devel] [PATCH v12 26/28] qmp: add query-sev-capabilities command

2018-03-08 Thread Brijesh Singh


On 3/8/18 11:05 AM, Daniel P. Berrangé wrote:
> On Thu, Mar 08, 2018 at 06:48:59AM -0600, Brijesh Singh wrote:
>> The command can be used by libvirt to query the SEV capabilities.
>>
>> Cc: "Daniel P. Berrangé" <berra...@redhat.com>
>> Cc: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>> Cc: Markus Armbruster <arm...@redhat.com>
>> Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
>> ---
>>  monitor.c |  7 +++
>>  qapi/misc.json| 42 ++
>>  target/i386/monitor.c |  6 ++
>>  3 files changed, 55 insertions(+)
>>
>> diff --git a/monitor.c b/monitor.c
>> index d53ecc5ddab3..29ce695a80d5 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -985,6 +985,7 @@ static void qmp_unregister_commands_hack(void)
>>  qmp_unregister_command(_commands, "rtc-reset-reinjection");
>>  qmp_unregister_command(_commands, "query-sev");
>>  qmp_unregister_command(_commands, "query-sev-launch-measure");
>> +qmp_unregister_command(_commands, "query-sev-capabilities");
>>  #endif
>>  #ifndef TARGET_S390X
>>  qmp_unregister_command(_commands, "dump-skeys");
>> @@ -4117,6 +4118,12 @@ SevLaunchMeasureInfo 
>> *qmp_query_sev_launch_measure(Error **errp)
>>  error_setg(errp, QERR_FEATURE_DISABLED, "query-sev-launch-measure");
>>  return NULL;
>>  }
>> +
>> +SevCapability *qmp_query_sev_capabilities(Error **errp)
>> +{
>> +error_setg(errp, QERR_FEATURE_DISABLED, "query-sev-capabilities");
>> +return NULL;
>> +}
>>  #endif
>>  
>>  #ifndef TARGET_S390X
>> diff --git a/qapi/misc.json b/qapi/misc.json
>> index a39c43aa64b1..37c89663d8f4 100644
>> --- a/qapi/misc.json
>> +++ b/qapi/misc.json
>> @@ -3306,3 +3306,45 @@
>>  #
>>  ##
>>  { 'command': 'query-sev-launch-measure', 'returns': 'SevLaunchMeasureInfo' }
>> +
>> +##
>> +# @SevCapability:
>> +#
>> +# The struct describes capability for a Secure Encrypted Virtualization
>> +# feature.
>> +#
>> +# @pdh:  Platform Diffie-Hellman key
>> +#
>> +# @cert-chain:  PDH certificate chain
> Are either of these base64 encoded ? If so nice to document that.

Yep, they are base64 encoded, I will update the doc.


>
>> +#
>> +# @cbitpos: C-bit location in page table entry
>> +#
>> +# @reduced-phys-bits: Number of physical Address bit reduction when SEV is
>> +# enabled
>> +#
>> +# Since: 2.12
>> +##
>> +{ 'struct': 'SevCapability',
>> +  'data': { 'pdh': 'str',
>> +'cert-chain': 'str',
>> +'cbitpos': 'int',
>> +'reduced-phys-bits': 'int'} }
> Regardless of answer to above Q, 
>
>   Reviewed-by: Daniel P. Berrangé <berra...@redhat.com>
>
>
> Regards,
> Daniel




Re: [Qemu-devel] [PATCH v12 08/28] target/i386: add Secure Encrypted Virtulization (SEV) object

2018-03-08 Thread Brijesh Singh


On 3/8/18 10:49 AM, Daniel P. Berrangé wrote:
> On Thu, Mar 08, 2018 at 06:48:41AM -0600, Brijesh Singh wrote:
>> Add a new memory encryption object 'sev-guest'. The object will be used
>> to create enrypted VMs on AMD EPYC CPU. The object provides the properties
>> to pass guest owner's public Diffie-hellman key, guest policy and session
>> information required to create the memory encryption context within the
>> SEV firmware.
>>
>> e.g to launch SEV guest
>>  # $QEMU \
>> -object sev-guest,id=sev0 \
>> -machine ,memory-encryption=sev0
>>
>> Cc: Paolo Bonzini <pbonz...@redhat.com>
>> Cc: Richard Henderson <r...@twiddle.net>
>> Cc: Eduardo Habkost <ehabk...@redhat.com>
>> Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
>
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 4c280142c52c..6113bce08a8c 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -4353,6 +4353,50 @@ contents of @code{iv.b64} to the second secret
>>   data=$SECRET,iv=$(<iv.b64)
>>  @end example
>>  
>> +@item -object 
>> sev-guest,id=@var{id},cbitpos=@var{cbitpos},reduced-phys-bits=@var{val},[sev-device=@var{string},policy=@var{policy},handle=@var{handle},dh-cert-file=@var{file},session-file=@var{file}]
>> +
>> +Create a Secure Encrypted Virtualization (SEV) guest object, which can be 
>> used
>> +to provide the guest memory encryption support on AMD processors.
>> +
>> +When memory encryption is enabled, one of the physical address bit (aka the
>> +C-bit) is utilized to mark if a memory page is protected. The 
>> @option{cbitpos}
>> +is used to provide the C-bit position. The C-bit position is Host family 
>> dependent
>> +hence user must provide this value. On EPYC, the value should be 47.
>> +
>> +When memory encryption is enabled, we loose certain bits in physical 
>> address space.
>> +The @option{reduced-phys-bits} is used to provide the number of bits we 
>> loose in
>> +physical address space. Similar to C-bit, the value is Host family 
>> dependent.
>> +On EPYC, the value should be 5.
> Is it valid to specify a different value for either of these properties ?
> eg what happens if I pass cbitpos=45 instead of 47 on an EPYC host ?

On EPYC, passing anything other than 47 will trigger error during SEV
guest initialization. The value of Cbit position is host dependent, the
value is readonly and can be obtained through the host CPUID.  The
cbitpos must be same between guest and host. Please note that the pte's
in guest page table will need to use the cbitpos  information to mark
the pages as encrypted. If cbit position given to the guest is different
from the host then guest will fail to execute.

>
> In particular I thinking about possible migration scenario, where EPYC
> uses 47 by default but some $NEXT AMD CPU uses 48 by default. In that
> case we might want to use '47' on both CPUs if we need ability to live
> migrate between different host CPU generations. Would that be valid ?

We will not be able to migrate SEV guests if cbit position does not
match between the source and destination hosts. Since during migration,
the destination guest is launched with same QEMU cli as source hence
cbitpos check in QEMU will catch it and fail the new launch. Optionally,
user can call query-sev-capabilities on both source and destination to
see if cbitpos is compatible before attempting to migrate the guest.

> On the flip side, if the value really it strictly tied to the host
> CPU family and no deviation is permitted, could the kernel not just
> pick the right value automatically avoiding the config option ?
>

I think doing so will be an issue for the migration. Consider your above
use case, a SEV guest is running on EPYC with cbitpos=47 and if we
migrate to some $NEXT AMD CPU which uses need to use cbitpos=48 and we
will fail to resume the guest on destination after migrating.

>
> Regards,
> Daniel




Re: [Qemu-devel] [PATCH v12 28/28] tests/qmp-test: blacklist sev specific qmp commands

2018-03-08 Thread Brijesh Singh


On 3/8/18 11:08 AM, Daniel P. Berrangé wrote:
> On Thu, Mar 08, 2018 at 06:49:01AM -0600, Brijesh Singh wrote:
>> Blacklist the following commands to fix the 'make check' failure.
>>
>> query-sev-launch-measure: it returns meaninful data only when we launch
>> SEV guest otherwise the command returns an error.
>>
>> query-sev: it return an error when SEV is not available on host (e.g non
>> X86 platform or KVM is disabled at the build time)
>>
>> query-sev-capabilities: it returns an error when SEV feature is not
>> available on host machine.
> We generally expect 'make check' to succeed on every single patch
> in a series, so that 'git bisect' doesn't break.
>
> So you should add each command to the blacklist in the same commit
> that introduced the failure in the first place.


Sure, I can quickly send the updated patch series to address your this
concern, but before spamming everyone's inbox I was wondering if I can
get some indication whether this series will make into 2.12 merge.

Paolo, Eduardo and Richard,

Most of the changes are in x86 directory hence any thought if you are
considering this series for 2.12 ? I have been testing the series with
and without SEV support and so far have not ran into any issue. if you
are not planning to pull this series in 2.12 then I will wait a bit
longer to get more feedback before sending the updates to address
Daniel's comment. thanks


 
>> Cc: "Daniel P. Berrangé" <berra...@redhat.com>
>> Cc: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>> Cc: Markus Armbruster <arm...@redhat.com>
>> Reviewed-by: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>> Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
>> ---
>>  tests/qmp-test.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/tests/qmp-test.c b/tests/qmp-test.c
>> index 22445d9ec258..7470c6b754bc 100644
>> --- a/tests/qmp-test.c
>> +++ b/tests/qmp-test.c
>> @@ -204,6 +204,11 @@ static bool query_is_blacklisted(const char *cmd)
>>  "query-gic-capabilities", /* arm */
>>  /* Success depends on target-specific build configuration: */
>>  "query-pci",  /* CONFIG_PCI */
>> +/* Success depends on launching SEV guest */
>> +"query-sev-launch-measure",
>> +/* Success depends on Host or Hypervisor SEV support */
>> +"query-sev",
>> +"query-sev-capabilities",
>>  NULL
>>  };
>>  int i;
>> -- 
>> 2.14.3
>>
> Regards,
> Daniel




[Qemu-devel] [PATCH v12 27/28] sev/i386: add sev_get_capabilities()

2018-03-08 Thread Brijesh Singh
The function can be used to get the current SEV capabilities.
The capabilities include platform diffie-hellman key (pdh) and certificate
chain. The key can be provided to the external entities which wants to
establish a trusted channel between SEV firmware and guest owner.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 target/i386/monitor.c  | 11 +--
 target/i386/sev-stub.c |  5 +++
 target/i386/sev.c  | 83 ++
 target/i386/sev_i386.h |  1 +
 4 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 33e6bade693b..79fa9bd7a3e3 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -740,6 +740,13 @@ SevLaunchMeasureInfo *qmp_query_sev_launch_measure(Error 
**errp)
 
 SevCapability *qmp_query_sev_capabilities(Error **errp)
 {
-error_setg(errp, "SEV feature is not available");
-return NULL;
+SevCapability *data;
+
+data = sev_get_capabilities();
+if (!data) {
+error_setg(errp, "SEV feature is not available");
+return NULL;
+}
+
+return data;
 }
diff --git a/target/i386/sev-stub.c b/target/i386/sev-stub.c
index 2f61c32ec975..59a003a4ebe6 100644
--- a/target/i386/sev-stub.c
+++ b/target/i386/sev-stub.c
@@ -44,3 +44,8 @@ char *sev_get_launch_measurement(void)
 {
 return NULL;
 }
+
+SevCapability *sev_get_capabilities(void)
+{
+return NULL;
+}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index b9bfce95246a..1d0cb8435e0f 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -427,6 +427,89 @@ sev_get_info(void)
 return info;
 }
 
+static int
+sev_get_pdh_info(int fd, guchar **pdh, size_t *pdh_len, guchar **cert_chain,
+ size_t *cert_chain_len)
+{
+guchar *pdh_data, *cert_chain_data;
+struct sev_user_data_pdh_cert_export export = {};
+int err, r;
+
+/* query the certificate length */
+r = sev_platform_ioctl(fd, SEV_PDH_CERT_EXPORT, , );
+if (r < 0) {
+if (err != SEV_RET_INVALID_LEN) {
+error_report("failed to export PDH cert ret=%d fw_err=%d (%s)",
+ r, err, fw_error_to_str(err));
+return 1;
+}
+}
+
+pdh_data = g_new(guchar, export.pdh_cert_len);
+cert_chain_data = g_new(guchar, export.cert_chain_len);
+export.pdh_cert_address = (unsigned long)pdh_data;
+export.cert_chain_address = (unsigned long)cert_chain_data;
+
+r = sev_platform_ioctl(fd, SEV_PDH_CERT_EXPORT, , );
+if (r < 0) {
+error_report("failed to export PDH cert ret=%d fw_err=%d (%s)",
+ r, err, fw_error_to_str(err));
+goto e_free;
+}
+
+*pdh = pdh_data;
+*pdh_len = export.pdh_cert_len;
+*cert_chain = cert_chain_data;
+*cert_chain_len = export.cert_chain_len;
+return 0;
+
+e_free:
+g_free(pdh_data);
+g_free(cert_chain_data);
+return 1;
+}
+
+SevCapability *
+sev_get_capabilities(void)
+{
+SevCapability *cap;
+guchar *pdh_data, *cert_chain_data;
+size_t pdh_len = 0, cert_chain_len = 0;
+uint32_t ebx;
+int fd;
+
+fd = open(DEFAULT_SEV_DEVICE, O_RDWR);
+if (fd < 0) {
+error_report("%s: Failed to open %s '%s'", __func__,
+ DEFAULT_SEV_DEVICE, strerror(errno));
+return NULL;
+}
+
+if (sev_get_pdh_info(fd, _data, _len,
+ _chain_data, _chain_len)) {
+return NULL;
+}
+
+cap = g_new0(SevCapability, 1);
+cap->pdh = g_base64_encode(pdh_data, pdh_len);
+cap->cert_chain = g_base64_encode(cert_chain_data, cert_chain_len);
+
+host_cpuid(0x801F, 0, NULL, , NULL, NULL);
+cap->cbitpos = ebx & 0x3f;
+
+/*
+ * When SEV feature is enabled, we loose one bit in guest physical
+ * addressing.
+ */
+cap->reduced_phys_bits = 1;
+
+g_free(pdh_data);
+g_free(cert_chain_data);
+
+close(fd);
+return cap;
+}
+
 static int
 sev_read_file_base64(const char *filename, guchar **data, gsize *len)
 {
diff --git a/target/i386/sev_i386.h b/target/i386/sev_i386.h
index 6e370775770e..b8622dfb1e49 100644
--- a/target/i386/sev_i386.h
+++ b/target/i386/sev_i386.h
@@ -38,6 +38,7 @@ extern SevInfo *sev_get_info(void);
 extern uint32_t sev_get_cbit_position(void);
 extern uint32_t sev_get_reduced_phys_bits(void);
 extern char *sev_get_launch_measurement(void);
+extern SevCapability *sev_get_capabilities(void);
 
 typedef struct QSevGuestInfo QSevGuestInfo;
 typedef struct QSevGuestInfoClass QSevGuestInfoClass;
-- 
2.14.3




[Qemu-devel] [PATCH v12 23/28] qmp: add query-sev-launch-measure command

2018-03-08 Thread Brijesh Singh
The command can be used by libvirt to retrieve the measurement of SEV guest.
This measurement is a signature of the memory contents that was encrypted
through the LAUNCH_UPDATE_DATA.

Cc: "Daniel P. Berrangé" <berra...@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
Cc: Markus Armbruster <arm...@redhat.com>
Reviewed-by: Eric Blake <ebl...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 monitor.c |  7 +++
 qapi/misc.json| 29 +
 target/i386/monitor.c | 17 +
 3 files changed, 53 insertions(+)

diff --git a/monitor.c b/monitor.c
index 2225cf5030dc..d53ecc5ddab3 100644
--- a/monitor.c
+++ b/monitor.c
@@ -984,6 +984,7 @@ static void qmp_unregister_commands_hack(void)
 #ifndef TARGET_I386
 qmp_unregister_command(_commands, "rtc-reset-reinjection");
 qmp_unregister_command(_commands, "query-sev");
+qmp_unregister_command(_commands, "query-sev-launch-measure");
 #endif
 #ifndef TARGET_S390X
 qmp_unregister_command(_commands, "dump-skeys");
@@ -4110,6 +4111,12 @@ SevInfo *qmp_query_sev(Error **errp)
 error_setg(errp, QERR_FEATURE_DISABLED, "query-sev");
 return NULL;
 }
+
+SevLaunchMeasureInfo *qmp_query_sev_launch_measure(Error **errp)
+{
+error_setg(errp, QERR_FEATURE_DISABLED, "query-sev-launch-measure");
+return NULL;
+}
 #endif
 
 #ifndef TARGET_S390X
diff --git a/qapi/misc.json b/qapi/misc.json
index 14681729f8fc..a39c43aa64b1 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -3277,3 +3277,32 @@
 #
 ##
 { 'command': 'query-sev', 'returns': 'SevInfo' }
+
+##
+# @SevLaunchMeasureInfo:
+#
+# SEV Guest Launch measurement information
+#
+# @data: the measurement value encoded in base64
+#
+# Since: 2.12
+#
+##
+{ 'struct': 'SevLaunchMeasureInfo', 'data': {'data': 'str'} }
+
+##
+# @query-sev-launch-measure:
+#
+# Query the SEV guest launch information.
+#
+# Returns: The @SevLaunchMeasureInfo for the guest
+#
+# Since: 2.12
+#
+# Example:
+#
+# -> { "execute": "query-sev-launch-measure" }
+# <- { "return": { "data": "4l8LXeNlSPUDlXPJG5966/8%YZ" } }
+#
+##
+{ 'command': 'query-sev-launch-measure', 'returns': 'SevLaunchMeasureInfo' }
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 7df31c3cdf1b..e5596bbc0fc2 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -720,3 +720,20 @@ void hmp_info_sev(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "SEV is not enabled\n");
 }
 }
+
+SevLaunchMeasureInfo *qmp_query_sev_launch_measure(Error **errp)
+{
+char *data;
+SevLaunchMeasureInfo *info;
+
+data = sev_get_launch_measurement();
+if (!data) {
+error_setg(errp, "Measurement is not available");
+return NULL;
+}
+
+info = g_malloc0(sizeof(*info));
+info->data = data;
+
+return info;
+}
-- 
2.14.3




[Qemu-devel] [PATCH v12 15/28] sev/i386: add command to create launch memory encryption context

2018-03-08 Thread Brijesh Singh
The KVM_SEV_LAUNCH_START command creates a new VM encryption key (VEK).
The encryption key created with the command will be used for encrypting
the bootstrap images (such as guest bios).

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 target/i386/sev.c| 86 
 target/i386/trace-events |  2 ++
 2 files changed, 88 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 4f85035d5203..eee693745103 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -105,6 +105,17 @@ fw_error_to_str(int code)
 return sev_fw_errlist[code];
 }
 
+static void
+sev_set_guest_state(SevState new_state)
+{
+assert(new_state < SEV_STATE__MAX);
+assert(sev_state);
+
+trace_kvm_sev_change_state(SevState_str(sev_state->state),
+   SevState_str(new_state));
+sev_state->state = new_state;
+}
+
 static void
 sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
 {
@@ -406,6 +417,75 @@ sev_get_info(void)
 return info;
 }
 
+static int
+sev_read_file_base64(const char *filename, guchar **data, gsize *len)
+{
+gsize sz;
+gchar *base64;
+GError *error = NULL;
+
+if (!g_file_get_contents(filename, , , )) {
+error_report("failed to read '%s' (%s)", filename, error->message);
+return -1;
+}
+
+*data = g_base64_decode(base64, len);
+return 0;
+}
+
+static int
+sev_launch_start(SEVState *s)
+{
+gsize sz;
+int ret = 1;
+int fw_error;
+QSevGuestInfo *sev = s->sev_info;
+struct kvm_sev_launch_start *start;
+guchar *session = NULL, *dh_cert = NULL;
+
+start = g_new0(struct kvm_sev_launch_start, 1);
+
+start->handle = object_property_get_int(OBJECT(sev), "handle",
+_abort);
+start->policy = object_property_get_int(OBJECT(sev), "policy",
+_abort);
+if (sev->session_file) {
+if (sev_read_file_base64(sev->session_file, , ) < 0) {
+return 1;
+}
+start->session_uaddr = (unsigned long)session;
+start->session_len = sz;
+}
+
+if (sev->dh_cert_file) {
+if (sev_read_file_base64(sev->dh_cert_file, _cert, ) < 0) {
+return 1;
+}
+start->dh_uaddr = (unsigned long)dh_cert;
+start->dh_len = sz;
+}
+
+trace_kvm_sev_launch_start(start->policy, session, dh_cert);
+ret = sev_ioctl(s->sev_fd, KVM_SEV_LAUNCH_START, start, _error);
+if (ret < 0) {
+error_report("%s: LAUNCH_START ret=%d fw_error=%d '%s'",
+__func__, ret, fw_error, fw_error_to_str(fw_error));
+return 1;
+}
+
+object_property_set_int(OBJECT(sev), start->handle, "handle",
+_abort);
+sev_set_guest_state(SEV_STATE_LUPDATE);
+s->handle = start->handle;
+s->policy = start->policy;
+
+g_free(start);
+g_free(session);
+g_free(dh_cert);
+
+return 0;
+}
+
 void *
 sev_guest_init(const char *id)
 {
@@ -476,6 +556,12 @@ sev_guest_init(const char *id)
 goto err;
 }
 
+ret = sev_launch_start(s);
+if (ret) {
+error_report("%s: failed to create encryption context", __func__);
+goto err;
+}
+
 ram_block_notifier_add(_ram_notifier);
 
 return s;
diff --git a/target/i386/trace-events b/target/i386/trace-events
index ffa3d2250425..9402251e9991 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -10,3 +10,5 @@ kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 kvm_sev_init(void) ""
 kvm_memcrypt_register_region(void *addr, size_t len) "addr %p len 0x%lu"
 kvm_memcrypt_unregister_region(void *addr, size_t len) "addr %p len 0x%lu"
+kvm_sev_change_state(const char *old, const char *new) "%s -> %s"
+kvm_sev_launch_start(int policy, void *session, void *pdh) "policy 0x%x 
session %p pdh %p"
-- 
2.14.3




[Qemu-devel] [PATCH v12 13/28] kvm: introduce memory encryption APIs

2018-03-08 Thread Brijesh Singh
Inorder to integerate the Secure Encryption Virtualization (SEV) support
add few high-level memory encryption APIs which can be used for encrypting
the guest memory region.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 accel/kvm/kvm-all.c| 30 ++
 accel/stubs/kvm-stub.c | 14 ++
 include/sysemu/kvm.h   | 25 +
 3 files changed, 69 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a6473522be11..975ba3845234 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -107,6 +107,8 @@ struct KVMState
 
 /* memory encryption */
 void *memcrypt_handle;
+int (*memcrypt_encrypt_data)(void *handle, uint8_t *ptr, uint64_t len);
+void (*memcrypt_debug_ops)(void *handle, MemoryRegion *mr);
 };
 
 KVMState *kvm_state;
@@ -142,6 +144,34 @@ int kvm_get_max_memslots(void)
 return s->nr_slots;
 }
 
+bool kvm_memcrypt_enabled(void)
+{
+if (kvm_state && kvm_state->memcrypt_handle) {
+return true;
+}
+
+return false;
+}
+
+int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
+{
+if (kvm_state->memcrypt_handle &&
+kvm_state->memcrypt_encrypt_data) {
+return kvm_state->memcrypt_encrypt_data(kvm_state->memcrypt_handle,
+  ptr, len);
+}
+
+return 1;
+}
+
+void kvm_memcrypt_set_debug_ops(MemoryRegion *mr)
+{
+if (kvm_state->memcrypt_handle &&
+kvm_state->memcrypt_debug_ops) {
+kvm_state->memcrypt_debug_ops(kvm_state->memcrypt_handle, mr);
+}
+}
+
 static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
 {
 KVMState *s = kvm_state;
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index c964af3e1c97..5739712a67e3 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -105,6 +105,20 @@ int kvm_on_sigbus(int code, void *addr)
 return 1;
 }
 
+bool kvm_memcrypt_enabled(void)
+{
+return false;
+}
+
+int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
+{
+  return 1;
+}
+
+void kvm_memcrypt_set_debug_ops(MemoryRegion *mr)
+{
+}
+
 #ifndef CONFIG_USER_ONLY
 int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
 {
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 85002ac49a54..d69bd1ff2b07 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -231,6 +231,31 @@ int kvm_destroy_vcpu(CPUState *cpu);
  */
 bool kvm_arm_supports_user_irq(void);
 
+/**
+ * kvm_memcrypt_enabled - return boolean indicating whether memory encryption
+ *is enabled
+ * Returns: 1 memory encryption is enabled
+ *  0 memory encryption is disabled
+ */
+bool kvm_memcrypt_enabled(void);
+
+/**
+ * kvm_memcrypt_encrypt_data: encrypt the memory range
+ *
+ * Return: 1 failed to encrypt the range
+ * 0 succesfully encrypted memory region
+ */
+int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len);
+
+/**
+ * kvm_memcrypt_set_debug_ram_ops: set debug_ram_ops callback
+ *
+ * When debug_ram_ops is set, debug access to this memory region will use
+ * memory encryption APIs.
+ */
+void kvm_memcrypt_set_debug_ops(MemoryRegion *mr);
+
+
 #ifdef NEED_CPU_H
 #include "cpu.h"
 
-- 
2.14.3




[Qemu-devel] [PATCH v12 11/28] sev/i386: add command to initialize the memory encryption context

2018-03-08 Thread Brijesh Singh
When memory encryption is enabled, KVM_SEV_INIT command is used to
initialize the platform. The command loads the SEV related persistent
data from non-volatile storage and initializes the platform context.
This command should be first issued before invoking any other guest
commands provided by the SEV firmware.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 accel/kvm/kvm-all.c   |  16 
 include/sysemu/sev.h  |  22 +
 stubs/Makefile.objs   |   1 +
 stubs/sev.c   |  21 +
 target/i386/Makefile.objs |   2 +-
 target/i386/monitor.c |  11 ++-
 target/i386/sev-stub.c|  41 +
 target/i386/sev.c | 224 ++
 target/i386/sev_i386.h|  24 +
 target/i386/trace-events  |   3 +
 10 files changed, 362 insertions(+), 3 deletions(-)
 create mode 100644 include/sysemu/sev.h
 create mode 100644 stubs/sev.c
 create mode 100644 target/i386/sev-stub.c

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index b91fcb7160d3..a6473522be11 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -38,6 +38,7 @@
 #include "qemu/event_notifier.h"
 #include "trace.h"
 #include "hw/irq.h"
+#include "sysemu/sev.h"
 
 #include "hw/boards.h"
 
@@ -103,6 +104,9 @@ struct KVMState
 #endif
 KVMMemoryListener memory_listener;
 QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus;
+
+/* memory encryption */
+void *memcrypt_handle;
 };
 
 KVMState *kvm_state;
@@ -1636,6 +1640,18 @@ static int kvm_init(MachineState *ms)
 
 kvm_state = s;
 
+/*
+ * if memory encryption object is specified then initialize the memory
+ * encryption context.
+ */
+if (ms->memory_encryption) {
+kvm_state->memcrypt_handle = sev_guest_init(ms->memory_encryption);
+if (!kvm_state->memcrypt_handle) {
+ret = -1;
+goto err;
+}
+}
+
 ret = kvm_arch_init(ms, s);
 if (ret < 0) {
 goto err;
diff --git a/include/sysemu/sev.h b/include/sysemu/sev.h
new file mode 100644
index ..3f6a26e92789
--- /dev/null
+++ b/include/sysemu/sev.h
@@ -0,0 +1,22 @@
+/*
+ * QEMU Secure Encrypted Virutualization (SEV) support
+ *
+ * Copyright: Advanced Micro Devices, 2016-2018
+ *
+ * Authors:
+ *  Brijesh Singh <brijesh.si...@amd.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_SEV_H
+#define QEMU_SEV_H
+
+#include "sysemu/kvm.h"
+
+void *sev_guest_init(const char *id);
+int sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len);
+void sev_set_debug_ops(void *handle, MemoryRegion *mr);
+#endif
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 2d59d8409162..31b36fdfdb88 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -43,3 +43,4 @@ stub-obj-y += xen-common.o
 stub-obj-y += xen-hvm.o
 stub-obj-y += pci-host-piix.o
 stub-obj-y += ram-block.o
+stub-obj-y += sev.o
diff --git a/stubs/sev.c b/stubs/sev.c
new file mode 100644
index ..4a5cc5569e5f
--- /dev/null
+++ b/stubs/sev.c
@@ -0,0 +1,21 @@
+/*
+ * QEMU SEV stub
+ *
+ * Copyright Advanced Micro Devices 2018
+ *
+ * Authors:
+ *  Brijesh Singh <brijesh.si...@amd.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "sysemu/sev.h"
+
+void *sev_guest_init(const char *id)
+{
+return NULL;
+}
diff --git a/target/i386/Makefile.objs b/target/i386/Makefile.objs
index 76aeaeae2750..741cb080eb17 100644
--- a/target/i386/Makefile.objs
+++ b/target/i386/Makefile.objs
@@ -5,7 +5,7 @@ obj-$(CONFIG_TCG) += int_helper.o mem_helper.o misc_helper.o 
mpx_helper.o
 obj-$(CONFIG_TCG) += seg_helper.o smm_helper.o svm_helper.o
 obj-$(CONFIG_SOFTMMU) += machine.o arch_memory_mapping.o arch_dump.o monitor.o
 obj-$(CONFIG_KVM) += kvm.o hyperv.o sev.o
-obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
+obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o sev-stub.o
 # HAX support
 ifdef CONFIG_WIN32
 obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-windows.o
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 27b99adf395b..29de61996371 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -670,6 +670,13 @@ void hmp_info_io_apic(Monitor *mon, const QDict *qdict)
 
 SevInfo *qmp_query_sev(Error **errp)
 {
-error_setg(errp, "SEV feature is not available");
-return NULL;
+SevInfo *info;
+
+info = sev_get_info();
+if (!info) {
+error_setg(errp, "SEV feature is not available");
+return NULL;
+}
+
+retur

[Qemu-devel] [PATCH v12 28/28] tests/qmp-test: blacklist sev specific qmp commands

2018-03-08 Thread Brijesh Singh
Blacklist the following commands to fix the 'make check' failure.

query-sev-launch-measure: it returns meaninful data only when we launch
SEV guest otherwise the command returns an error.

query-sev: it return an error when SEV is not available on host (e.g non
X86 platform or KVM is disabled at the build time)

query-sev-capabilities: it returns an error when SEV feature is not
available on host machine.

Cc: "Daniel P. Berrangé" <berra...@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
Cc: Markus Armbruster <arm...@redhat.com>
Reviewed-by: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 tests/qmp-test.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 22445d9ec258..7470c6b754bc 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -204,6 +204,11 @@ static bool query_is_blacklisted(const char *cmd)
 "query-gic-capabilities", /* arm */
 /* Success depends on target-specific build configuration: */
 "query-pci",  /* CONFIG_PCI */
+/* Success depends on launching SEV guest */
+"query-sev-launch-measure",
+/* Success depends on Host or Hypervisor SEV support */
+"query-sev",
+"query-sev-capabilities",
 NULL
 };
 int i;
-- 
2.14.3




[Qemu-devel] [PATCH v12 12/28] sev/i386: register the guest memory range which may contain encrypted data

2018-03-08 Thread Brijesh Singh
When SEV is enabled, the hardware encryption engine uses a tweak such
that the two identical plaintext at different location will have a
different ciphertexts. So swapping or moving a ciphertexts of two guest
pages will not result in plaintexts being swapped. Hence relocating
a physical backing pages of the SEV guest will require some additional
steps in KVM driver. The KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl can be
used to register/unregister the guest memory region which may contain the
encrypted data. KVM driver will internally handle the relocating physical
backing pages of registered memory regions.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 target/i386/sev.c| 42 ++
 target/i386/trace-events |  2 ++
 2 files changed, 44 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 288612e1aa46..4f85035d5203 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -105,6 +105,46 @@ fw_error_to_str(int code)
 return sev_fw_errlist[code];
 }
 
+static void
+sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
+{
+int r;
+struct kvm_enc_region range;
+
+range.addr = (__u64)host;
+range.size = size;
+
+trace_kvm_memcrypt_register_region(host, size);
+r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_REG_REGION, );
+if (r) {
+error_report("%s: failed to register region (%p+%#lx) error '%s'",
+ __func__, host, size, strerror(errno));
+exit(1);
+}
+}
+
+static void
+sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size)
+{
+int r;
+struct kvm_enc_region range;
+
+range.addr = (__u64)host;
+range.size = size;
+
+trace_kvm_memcrypt_unregister_region(host, size);
+r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_UNREG_REGION, );
+if (r) {
+error_report("%s: failed to unregister region (%p+%#lx)",
+ __func__, host, size);
+}
+}
+
+static struct RAMBlockNotifier sev_ram_notifier = {
+.ram_block_added = sev_ram_block_added,
+.ram_block_removed = sev_ram_block_removed,
+};
+
 static void
 qsev_guest_finalize(Object *obj)
 {
@@ -436,6 +476,8 @@ sev_guest_init(const char *id)
 goto err;
 }
 
+ram_block_notifier_add(_ram_notifier);
+
 return s;
 err:
 g_free(sev_state);
diff --git a/target/i386/trace-events b/target/i386/trace-events
index 797b716751b7..ffa3d2250425 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -8,3 +8,5 @@ kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 
 # target/i386/sev.c
 kvm_sev_init(void) ""
+kvm_memcrypt_register_region(void *addr, size_t len) "addr %p len 0x%lu"
+kvm_memcrypt_unregister_region(void *addr, size_t len) "addr %p len 0x%lu"
-- 
2.14.3




[Qemu-devel] [PATCH v12 26/28] qmp: add query-sev-capabilities command

2018-03-08 Thread Brijesh Singh
The command can be used by libvirt to query the SEV capabilities.

Cc: "Daniel P. Berrangé" <berra...@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
Cc: Markus Armbruster <arm...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 monitor.c |  7 +++
 qapi/misc.json| 42 ++
 target/i386/monitor.c |  6 ++
 3 files changed, 55 insertions(+)

diff --git a/monitor.c b/monitor.c
index d53ecc5ddab3..29ce695a80d5 100644
--- a/monitor.c
+++ b/monitor.c
@@ -985,6 +985,7 @@ static void qmp_unregister_commands_hack(void)
 qmp_unregister_command(_commands, "rtc-reset-reinjection");
 qmp_unregister_command(_commands, "query-sev");
 qmp_unregister_command(_commands, "query-sev-launch-measure");
+qmp_unregister_command(_commands, "query-sev-capabilities");
 #endif
 #ifndef TARGET_S390X
 qmp_unregister_command(_commands, "dump-skeys");
@@ -4117,6 +4118,12 @@ SevLaunchMeasureInfo *qmp_query_sev_launch_measure(Error 
**errp)
 error_setg(errp, QERR_FEATURE_DISABLED, "query-sev-launch-measure");
 return NULL;
 }
+
+SevCapability *qmp_query_sev_capabilities(Error **errp)
+{
+error_setg(errp, QERR_FEATURE_DISABLED, "query-sev-capabilities");
+return NULL;
+}
 #endif
 
 #ifndef TARGET_S390X
diff --git a/qapi/misc.json b/qapi/misc.json
index a39c43aa64b1..37c89663d8f4 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -3306,3 +3306,45 @@
 #
 ##
 { 'command': 'query-sev-launch-measure', 'returns': 'SevLaunchMeasureInfo' }
+
+##
+# @SevCapability:
+#
+# The struct describes capability for a Secure Encrypted Virtualization
+# feature.
+#
+# @pdh:  Platform Diffie-Hellman key
+#
+# @cert-chain:  PDH certificate chain
+#
+# @cbitpos: C-bit location in page table entry
+#
+# @reduced-phys-bits: Number of physical Address bit reduction when SEV is
+# enabled
+#
+# Since: 2.12
+##
+{ 'struct': 'SevCapability',
+  'data': { 'pdh': 'str',
+'cert-chain': 'str',
+'cbitpos': 'int',
+'reduced-phys-bits': 'int'} }
+
+##
+# @query-sev-capabilities:
+#
+# This command is used to get the SEV capabilities, and is supported on AMD
+# X86 platforms only.
+#
+# Returns: SevCapability objects.
+#
+# Since: 2.12
+#
+# Example:
+#
+# -> { "execute": "query-sev-capabilities" }
+# <- { "return": { "pdh": "8CCDD8DDD", "cert-chain": "888CCCDDDEE",
+#  "cbitpos": 47, "reduced-phys-bits": 5}}
+#
+##
+{ 'command': 'query-sev-capabilities', 'returns': 'SevCapability' }
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index e5596bbc0fc2..33e6bade693b 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -737,3 +737,9 @@ SevLaunchMeasureInfo *qmp_query_sev_launch_measure(Error 
**errp)
 
 return info;
 }
+
+SevCapability *qmp_query_sev_capabilities(Error **errp)
+{
+error_setg(errp, "SEV feature is not available");
+return NULL;
+}
-- 
2.14.3




[Qemu-devel] [PATCH v12 21/28] sev/i386: add debug encrypt and decrypt commands

2018-03-08 Thread Brijesh Singh
KVM_SEV_DBG_DECRYPT and KVM_SEV_DBG_ENCRYPT commands are used for
decrypting and encrypting guest memory region. The command works only if
the guest policy allows the debugging.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 accel/kvm/kvm-all.c  |  1 +
 stubs/sev.c  |  4 
 target/i386/sev.c| 57 
 target/i386/trace-events |  1 +
 4 files changed, 63 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 411aa87719e6..8089173491dd 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1682,6 +1682,7 @@ static int kvm_init(MachineState *ms)
 }
 
 kvm_state->memcrypt_encrypt_data = sev_encrypt_data;
+kvm_state->memcrypt_debug_ops = sev_set_debug_ops;
 }
 
 ret = kvm_arch_init(ms, s);
diff --git a/stubs/sev.c b/stubs/sev.c
index 2e20f3b73a5b..73f5c7f93a67 100644
--- a/stubs/sev.c
+++ b/stubs/sev.c
@@ -15,6 +15,10 @@
 #include "qemu-common.h"
 #include "sysemu/sev.h"
 
+void sev_set_debug_ops(void *handle, MemoryRegion *mr)
+{
+}
+
 int sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len)
 {
 return 1;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index ce199d259f7a..f687e9e40e32 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -29,6 +29,7 @@
 #define DEFAULT_SEV_DEVICE  "/dev/sev"
 
 static SEVState *sev_state;
+static MemoryRegionRAMReadWriteOps  sev_ops;
 
 static const char *const sev_fw_errlist[] = {
 "",
@@ -606,6 +607,46 @@ sev_vm_state_change(void *opaque, int running, RunState 
state)
 }
 }
 
+static int
+sev_dbg_enc_dec(uint8_t *dst, const uint8_t *src, uint32_t len, bool write)
+{
+int ret, error;
+struct kvm_sev_dbg dbg;
+
+dbg.src_uaddr = (unsigned long)src;
+dbg.dst_uaddr = (unsigned long)dst;
+dbg.len = len;
+
+trace_kvm_sev_debug(write ? "encrypt" : "decrypt", src, dst, len);
+ret = sev_ioctl(sev_state->sev_fd,
+write ? KVM_SEV_DBG_ENCRYPT : KVM_SEV_DBG_DECRYPT,
+, );
+if (ret) {
+error_report("%s (%s) %#llx->%#llx+%#x ret=%d fw_error=%d '%s'",
+ __func__, write ? "write" : "read", dbg.src_uaddr,
+ dbg.dst_uaddr, dbg.len, ret, error,
+ fw_error_to_str(error));
+}
+
+return ret;
+}
+
+static int
+sev_mem_read(uint8_t *dst, const uint8_t *src, uint32_t len, MemTxAttrs attrs)
+{
+assert(attrs.debug);
+
+return sev_dbg_enc_dec(dst, src, len, false);
+}
+
+static int
+sev_mem_write(uint8_t *dst, const uint8_t *src, uint32_t len, MemTxAttrs attrs)
+{
+assert(attrs.debug);
+
+return sev_dbg_enc_dec(dst, src, len, true);
+}
+
 void *
 sev_guest_init(const char *id)
 {
@@ -706,6 +747,22 @@ sev_encrypt_data(void *handle, uint8_t *ptr, uint64_t len)
 return 0;
 }
 
+void
+sev_set_debug_ops(void *handle, MemoryRegion *mr)
+{
+SEVState *s = (SEVState *)handle;
+
+/* If policy does not allow debug then no need to register ops */
+if (s->policy & SEV_POLICY_NODBG) {
+return;
+}
+
+sev_ops.read = sev_mem_read;
+sev_ops.write = sev_mem_write;
+
+memory_region_set_ram_debug_ops(mr, _ops);
+}
+
 static void
 sev_register_types(void)
 {
diff --git a/target/i386/trace-events b/target/i386/trace-events
index b1fbde6e40fe..00aa6e98d810 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -15,3 +15,4 @@ kvm_sev_launch_start(int policy, void *session, void *pdh) 
"policy 0x%x session
 kvm_sev_launch_update_data(void *addr, uint64_t len) "addr %p len 0x%" PRIu64
 kvm_sev_launch_measurement(const char *value) "data %s"
 kvm_sev_launch_finish(void) ""
+kvm_sev_debug(const char *op, const uint8_t *src, uint8_t *dst, int len) "(%s) 
src %p dst %p len %d"
-- 
2.14.3




[Qemu-devel] [PATCH v12 10/28] include: add psp-sev.h header file

2018-03-08 Thread Brijesh Singh
The header file provide the ioctl command and structure to communicate
with /dev/sev device.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 linux-headers/linux/psp-sev.h | 142 ++
 1 file changed, 142 insertions(+)
 create mode 100644 linux-headers/linux/psp-sev.h

diff --git a/linux-headers/linux/psp-sev.h b/linux-headers/linux/psp-sev.h
new file mode 100644
index ..33e247471ae0
--- /dev/null
+++ b/linux-headers/linux/psp-sev.h
@@ -0,0 +1,142 @@
+/*
+ * Userspace interface for AMD Secure Encrypted Virtualization (SEV)
+ * platform management commands.
+ *
+ * Copyright (C) 2016-2017 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <brijesh.si...@amd.com>
+ *
+ * SEV spec 0.14 is available at:
+ * http://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __PSP_SEV_USER_H__
+#define __PSP_SEV_USER_H__
+
+#include 
+
+/**
+ * SEV platform commands
+ */
+enum {
+   SEV_FACTORY_RESET = 0,
+   SEV_PLATFORM_STATUS,
+   SEV_PEK_GEN,
+   SEV_PEK_CSR,
+   SEV_PDH_GEN,
+   SEV_PDH_CERT_EXPORT,
+   SEV_PEK_CERT_IMPORT,
+
+   SEV_MAX,
+};
+
+/**
+ * SEV Firmware status code
+ */
+typedef enum {
+   SEV_RET_SUCCESS = 0,
+   SEV_RET_INVALID_PLATFORM_STATE,
+   SEV_RET_INVALID_GUEST_STATE,
+   SEV_RET_INAVLID_CONFIG,
+   SEV_RET_INVALID_LEN,
+   SEV_RET_ALREADY_OWNED,
+   SEV_RET_INVALID_CERTIFICATE,
+   SEV_RET_POLICY_FAILURE,
+   SEV_RET_INACTIVE,
+   SEV_RET_INVALID_ADDRESS,
+   SEV_RET_BAD_SIGNATURE,
+   SEV_RET_BAD_MEASUREMENT,
+   SEV_RET_ASID_OWNED,
+   SEV_RET_INVALID_ASID,
+   SEV_RET_WBINVD_REQUIRED,
+   SEV_RET_DFFLUSH_REQUIRED,
+   SEV_RET_INVALID_GUEST,
+   SEV_RET_INVALID_COMMAND,
+   SEV_RET_ACTIVE,
+   SEV_RET_HWSEV_RET_PLATFORM,
+   SEV_RET_HWSEV_RET_UNSAFE,
+   SEV_RET_UNSUPPORTED,
+   SEV_RET_MAX,
+} sev_ret_code;
+
+/**
+ * struct sev_user_data_status - PLATFORM_STATUS command parameters
+ *
+ * @major: major API version
+ * @minor: minor API version
+ * @state: platform state
+ * @flags: platform config flags
+ * @build: firmware build id for API version
+ * @guest_count: number of active guests
+ */
+struct sev_user_data_status {
+   __u8 api_major; /* Out */
+   __u8 api_minor; /* Out */
+   __u8 state; /* Out */
+   __u32 flags;/* Out */
+   __u8 build; /* Out */
+   __u32 guest_count;  /* Out */
+} __attribute__((packed));
+
+/**
+ * struct sev_user_data_pek_csr - PEK_CSR command parameters
+ *
+ * @address: PEK certificate chain
+ * @length: length of certificate
+ */
+struct sev_user_data_pek_csr {
+   __u64 address;  /* In */
+   __u32 length;   /* In/Out */
+} __attribute__((packed));
+
+/**
+ * struct sev_user_data_cert_import - PEK_CERT_IMPORT command parameters
+ *
+ * @pek_address: PEK certificate chain
+ * @pek_len: length of PEK certificate
+ * @oca_address: OCA certificate chain
+ * @oca_len: length of OCA certificate
+ */
+struct sev_user_data_pek_cert_import {
+   __u64 pek_cert_address; /* In */
+   __u32 pek_cert_len; /* In */
+   __u64 oca_cert_address; /* In */
+   __u32 oca_cert_len; /* In */
+} __attribute__((packed));
+
+/**
+ * struct sev_user_data_pdh_cert_export - PDH_CERT_EXPORT command parameters
+ *
+ * @pdh_address: PDH certificate address
+ * @pdh_len: length of PDH certificate
+ * @cert_chain_address: PDH certificate chain
+ * @cert_chain_len: length of PDH certificate chain
+ */
+struct sev_user_data_pdh_cert_export {
+   __u64 pdh_cert_address; /* In */
+   __u32 pdh_cert_len; /* In/Out */
+   __u64 cert_chain_address;   /* In */
+   __u32 cert_chain_len;   /* In/Out */
+} __attribute__((packed));
+
+/**
+ * struct sev_issue_cmd - SEV ioctl parameters
+ *
+ * @cmd: SEV commands to execute
+ * @opaque: pointer to the command structure
+ * @error: SEV FW return code on failure
+ */
+struct sev_issue_cmd {
+   __u32 cmd;  /* In */
+   __u64 data; /* In */
+   __u32 error;/* Out */
+} __attribute__((packed));
+
+#define SEV_IOC_TYPE   'S'
+#define SEV_ISSUE_CMD  _IOWR(SEV_IOC_TYPE, 0x0, struct sev_issue_cmd)

[Qemu-devel] [PATCH v12 22/28] target/i386: clear C-bit when walking SEV guest page table

2018-03-08 Thread Brijesh Singh
In SEV-enabled guest the pte entry will have C-bit set, we need to
clear the C-bit when walking the page table.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 target/i386/helper.c  | 31 +--
 target/i386/monitor.c | 68 +--
 2 files changed, 62 insertions(+), 37 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 58fb6eec562a..dc5c7005cf13 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -21,6 +21,7 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "sysemu/kvm.h"
+#include "sev_i386.h"
 #include "kvm_i386.h"
 #ifndef CONFIG_USER_ONLY
 #include "sysemu/sysemu.h"
@@ -732,6 +733,9 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 int32_t a20_mask;
 uint32_t page_offset;
 int page_size;
+uint64_t me_mask;
+
+me_mask = sev_get_me_mask();
 
 a20_mask = x86_get_a20_mask(env);
 if (!(env->cr[0] & CR0_PG_MASK)) {
@@ -755,25 +759,25 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr 
addr)
 }
 
 if (la57) {
-pml5e_addr = ((env->cr[3] & ~0xfff) +
+pml5e_addr = ((env->cr[3] & ~0xfff & me_mask) +
 (((addr >> 48) & 0x1ff) << 3)) & a20_mask;
-pml5e = ldq_phys_debug(cs, pml5e_addr);
+pml5e = ldq_phys_debug(cs, pml5e_addr) & me_mask;
 if (!(pml5e & PG_PRESENT_MASK)) {
 return -1;
 }
 } else {
-pml5e = env->cr[3];
+pml5e = env->cr[3] & me_mask;
 }
 
 pml4e_addr = ((pml5e & PG_ADDRESS_MASK) +
 (((addr >> 39) & 0x1ff) << 3)) & a20_mask;
-pml4e = ldq_phys_debug(cs, pml4e_addr);
+pml4e = ldq_phys_debug(cs, pml4e_addr) & me_mask;
 if (!(pml4e & PG_PRESENT_MASK)) {
 return -1;
 }
 pdpe_addr = ((pml4e & PG_ADDRESS_MASK) +
  (((addr >> 30) & 0x1ff) << 3)) & a20_mask;
-pdpe = x86_ldq_phys(cs, pdpe_addr);
+pdpe = ldq_phys_debug(cs, pdpe_addr) & me_mask;
 if (!(pdpe & PG_PRESENT_MASK)) {
 return -1;
 }
@@ -786,16 +790,16 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr 
addr)
 } else
 #endif
 {
-pdpe_addr = ((env->cr[3] & ~0x1f) + ((addr >> 27) & 0x18)) &
-a20_mask;
-pdpe = ldq_phys_debug(cs, pdpe_addr);
+pdpe_addr = ((env->cr[3] & ~0x1f & me_mask) + ((addr >> 27) & 
0x18))
+  & a20_mask;
+pdpe = ldq_phys_debug(cs, pdpe_addr) & me_mask;
 if (!(pdpe & PG_PRESENT_MASK))
 return -1;
 }
 
 pde_addr = ((pdpe & PG_ADDRESS_MASK) +
 (((addr >> 21) & 0x1ff) << 3)) & a20_mask;
-pde = ldq_phys_debug(cs, pde_addr);
+pde = ldq_phys_debug(cs, pde_addr) & me_mask;
 if (!(pde & PG_PRESENT_MASK)) {
 return -1;
 }
@@ -808,7 +812,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 pte_addr = ((pde & PG_ADDRESS_MASK) +
 (((addr >> 12) & 0x1ff) << 3)) & a20_mask;
 page_size = 4096;
-pte = ldq_phys_debug(cs, pte_addr);
+pte = ldq_phys_debug(cs, pte_addr) & me_mask;
 }
 if (!(pte & PG_PRESENT_MASK)) {
 return -1;
@@ -817,8 +821,9 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 uint32_t pde;
 
 /* page directory entry */
-pde_addr = ((env->cr[3] & ~0xfff) + ((addr >> 20) & 0xffc)) & a20_mask;
-pde = ldl_phys_debug(cs, pde_addr);
+pde_addr = ((env->cr[3] & ~0xfff & me_mask) + ((addr >> 20) & 0xffc))
+ & a20_mask;
+pde = ldl_phys_debug(cs, pde_addr) & me_mask;
 if (!(pde & PG_PRESENT_MASK))
 return -1;
 if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_PSE_MASK)) {
@@ -827,7 +832,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 } else {
 /* page directory entry */
 pte_addr = ((pde & ~0xfff) + ((addr >> 10) & 0xffc)) & a20_mask;
-pte = ldl_phys_debug(cs, pte_addr);
+pte = ldl_phys_debug(cs, pte_addr) & me_mask;
 if (!(pte

[Qemu-devel] [PATCH v12 25/28] cpu/i386: populate CPUID 0x8000_001F when SEV is active

2018-03-08 Thread Brijesh Singh
When SEV is enabled, CPUID 0x8000_001F should provide additional
information regarding the feature (such as which page table bit is used
to mark the pages as encrypted etc).

The details for memory encryption CPUID is available in AMD APM
(https://support.amd.com/TechDocs/24594.pdf) Section E.4.17

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Reviewed-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 target/i386/cpu.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2c04645ceac9..647f792ba123 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -26,6 +26,7 @@
 #include "sysemu/hvf.h"
 #include "sysemu/cpus.h"
 #include "kvm_i386.h"
+#include "sev_i386.h"
 
 #include "qemu/error-report.h"
 #include "qemu/option.h"
@@ -3612,6 +3613,13 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *ecx = 0;
 *edx = 0;
 break;
+case 0x801F:
+*eax = sev_enabled() ? 0x2 : 0;
+*ebx = sev_get_cbit_position();
+*ebx |= sev_get_reduced_phys_bits() << 6;
+*ecx = 0;
+*edx = 0;
+break;
 default:
 /* reserved values: zero */
 *eax = 0;
@@ -4041,6 +4049,11 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error 
**errp)
 if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_SVM) {
 x86_cpu_adjust_level(cpu, >cpuid_min_xlevel, 0x800A);
 }
+
+/* SEV requires CPUID[0x801F] */
+if (sev_enabled()) {
+x86_cpu_adjust_level(cpu, >cpuid_min_xlevel, 0x801F);
+}
 }
 
 /* Set cpuid_*level* based on cpuid_min_*level, if not explicitly set */
-- 
2.14.3




[Qemu-devel] [PATCH v12 05/28] machine: add -memory-encryption property

2018-03-08 Thread Brijesh Singh
When CPU supports memory encryption feature, the property can be used to
specify the encryption object to use when launching an encrypted guest.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@gmail.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 hw/core/machine.c   | 22 ++
 include/hw/boards.h |  1 +
 qemu-options.hx |  5 -
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 5e2bbcdacedb..2040177664d5 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -334,6 +334,22 @@ static bool machine_get_enforce_config_section(Object 
*obj, Error **errp)
 return ms->enforce_config_section;
 }
 
+static char *machine_get_memory_encryption(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return g_strdup(ms->memory_encryption);
+}
+
+static void machine_set_memory_encryption(Object *obj, const char *value,
+Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+g_free(ms->memory_encryption);
+ms->memory_encryption = g_strdup(value);
+}
+
 void machine_class_allow_dynamic_sysbus_dev(MachineClass *mc, const char *type)
 {
 strList *item = g_new0(strList, 1);
@@ -612,6 +628,12 @@ static void machine_class_init(ObjectClass *oc, void *data)
 _abort);
 object_class_property_set_description(oc, "enforce-config-section",
 "Set on to enforce configuration section migration", _abort);
+
+object_class_property_add_str(oc, "memory-encryption",
+machine_get_memory_encryption, machine_set_memory_encryption,
+_abort);
+object_class_property_set_description(oc, "memory-encryption",
+"Set memory encyption object to use", _abort);
 }
 
 static void machine_class_base_init(ObjectClass *oc, void *data)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index efb0a9edfdf1..8ce9a7a21d3d 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -243,6 +243,7 @@ struct MachineState {
 bool suppress_vmdesc;
 bool enforce_config_section;
 bool enable_graphics;
+char *memory_encryption;
 
 ram_addr_t ram_size;
 ram_addr_t maxram_size;
diff --git a/qemu-options.hx b/qemu-options.hx
index 6585058c6cde..4c280142c52c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -43,7 +43,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "suppress-vmdesc=on|off disables self-describing migration 
(default=off)\n"
 "nvdimm=on|off controls NVDIMM support (default=off)\n"
 "enforce-config-section=on|off enforce configuration 
section migration (default=off)\n"
-"s390-squash-mcss=on|off (deprecated) controls support for 
squashing into default css (default=off)\n",
+"s390-squash-mcss=on|off (deprecated) controls support for 
squashing into default css (default=off)\n"
+"memory-encryption=@var{} memory encryption object to use 
(default=none)\n",
 QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -110,6 +111,8 @@ code to send configuration section even if the machine-type 
sets the
 @option{migration.send-configuration} property to @var{off}.
 NOTE: this parameter is deprecated. Please use @option{-global}
 @option{migration.send-configuration}=@var{on|off} instead.
+@item memory-encryption=@var{}
+Memory encryption object to use. The default is none.
 @end table
 ETEXI
 
-- 
2.14.3




[Qemu-devel] [PATCH v12 20/28] hw/i386: set ram_debug_ops when memory encryption is enabled

2018-03-08 Thread Brijesh Singh
When memory encryption is enabled, the guest RAM and boot flash ROM will
contain the encrypted data. By setting the debug ops allow us to invoke
encryption APIs when accessing the memory for the debug purposes.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 hw/i386/pc.c   | 9 +
 hw/i386/pc_sysfw.c | 6 ++
 2 files changed, 15 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 35fcb6efdfb9..69364b6856b5 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1360,6 +1360,15 @@ void pc_memory_init(PCMachineState *pcms,
 e820_add_entry(0x1ULL, pcms->above_4g_mem_size, E820_RAM);
 }
 
+/*
+ * When memory encryption is enabled, the guest RAM will be encrypted with
+ * a guest unique key. Set the debug ops so that any debug access to the
+ * guest RAM will go through the memory encryption APIs.
+ */
+if (kvm_memcrypt_enabled()) {
+kvm_memcrypt_set_debug_ops(ram);
+}
+
 if (!pcmc->has_reserved_memory &&
 (machine->ram_slots ||
  (machine->maxram_size > machine->ram_size))) {
diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index 73ac783f2055..845240f97293 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -181,6 +181,12 @@ static void pc_system_flash_init(MemoryRegion *rom_memory)
 error_report("failed to encrypt pflash rom");
 exit(1);
 }
+
+/*
+ * The pflash ROM is encrypted, set the debug ops so that any
+ * debug accesses will use memory encryption APIs.
+ */
+kvm_memcrypt_set_debug_ops(flash_mem);
 }
 }
 }
-- 
2.14.3




[Qemu-devel] [PATCH v12 19/28] sev/i386: finalize the SEV guest launch flow

2018-03-08 Thread Brijesh Singh
SEV launch flow requires us to issue LAUNCH_FINISH command before guest
is ready to run.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.si...@amd.com>
---
 target/i386/sev.c| 29 +
 target/i386/trace-events |  1 +
 2 files changed, 30 insertions(+)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 7ad7eaf600a7..ce199d259f7a 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -578,6 +578,34 @@ static Notifier sev_machine_done_notify = {
 .notify = sev_launch_get_measure,
 };
 
+static void
+sev_launch_finish(SEVState *s)
+{
+int ret, error;
+
+trace_kvm_sev_launch_finish();
+ret = sev_ioctl(sev_state->sev_fd, KVM_SEV_LAUNCH_FINISH, 0, );
+if (ret) {
+error_report("%s: LAUNCH_FINISH ret=%d fw_error=%d '%s'",
+ __func__, ret, error, fw_error_to_str(error));
+exit(1);
+}
+
+sev_set_guest_state(SEV_STATE_RUNNING);
+}
+
+static void
+sev_vm_state_change(void *opaque, int running, RunState state)
+{
+SEVState *s = opaque;
+
+if (running) {
+if (!sev_check_state(SEV_STATE_RUNNING)) {
+sev_launch_finish(s);
+}
+}
+}
+
 void *
 sev_guest_init(const char *id)
 {
@@ -656,6 +684,7 @@ sev_guest_init(const char *id)
 
 ram_block_notifier_add(_ram_notifier);
 qemu_add_machine_init_done_notifier(_machine_done_notify);
+qemu_add_vm_change_state_handler(sev_vm_state_change, s);
 
 return s;
 err:
diff --git a/target/i386/trace-events b/target/i386/trace-events
index f7a1a1e6b85c..b1fbde6e40fe 100644
--- a/target/i386/trace-events
+++ b/target/i386/trace-events
@@ -14,3 +14,4 @@ kvm_sev_change_state(const char *old, const char *new) "%s -> 
%s"
 kvm_sev_launch_start(int policy, void *session, void *pdh) "policy 0x%x 
session %p pdh %p"
 kvm_sev_launch_update_data(void *addr, uint64_t len) "addr %p len 0x%" PRIu64
 kvm_sev_launch_measurement(const char *value) "data %s"
+kvm_sev_launch_finish(void) ""
-- 
2.14.3




  1   2   3   4   5   >