Re: [Xen-devel] [PATCH V3 2/29] VIOMMU: Add vIOMMU helper functions to create, destroy vIOMMU instance

2017-10-29 Thread Lan Tianyu
On 2017-10-18 22:05, Roger Pau Monné wrote:
>> +int viommu_register_type(uint64_t type, struct viommu_ops *ops)
>> > +{
>> > +struct viommu_type *viommu_type = NULL;
>> > +
>> > +if ( !viommu_enabled() )
>> > +return -ENODEV;
>> > +
>> > +if ( viommu_get_type(type) )
>> > +return -EEXIST;
>> > +
>> > +viommu_type = xzalloc(struct viommu_type);
>> > +if ( !viommu_type )
>> > +return -ENOMEM;
>> > +
>> > +viommu_type->type = type;
>> > +viommu_type->ops = ops;
>> > +
>> > +spin_lock(_list_lock);
>> > +list_add_tail(_type->node, _list);
>> > +spin_unlock(_list_lock);
>> > +
>> > +return 0;
>> > +}
> As mentioned above, I think this viommu_register_type helper could be
> avoided. I would rather use a macro similar to REGISTER_SCHEDULER in
> order to populate an array at link time, and then just iterate over
> it.
> 

Hi Jan:
Could you help check whether REGISTER_SCHEDULER is the right direction
for vIOMMU? It requires changing the Xen lds layout. In my view, a list to
manage vIOMMU device model types would be easier, and this may be a
common solution.
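
For readers unfamiliar with the link-time approach Roger refers to, here is
a minimal sketch in C. As I understand it, REGISTER_SCHEDULER places a
pointer to each scheduler in a dedicated section and the linker script
supplies the start/end symbols, which is why the lds layout would change.
The sketch is not Xen code: REGISTER_VIOMMU_TYPE, the viommu_types section
name and the demo_* objects are hypothetical, the structures are simplified,
and it assumes a hosted ELF build where GNU ld generates the
__start_/__stop_ symbols on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the types in the patch. */
struct viommu_ops {
    int (*create)(void);
    int (*destroy)(void);
};

struct viommu_type {
    uint64_t type;
    const struct viommu_ops *ops;
};

/*
 * Hypothetical macro modelled on REGISTER_SCHEDULER: it stores a pointer
 * to the descriptor in a dedicated section instead of calling a
 * registration function at runtime.
 */
#define REGISTER_VIOMMU_TYPE(x)                                   \
    static const struct viommu_type *const x##_entry              \
        __attribute__((used, section("viommu_types"))) = &(x)

/*
 * Assumption: GNU ld defines these bounds automatically because the
 * section name is a valid C identifier. Xen would instead define the
 * bounds in its linker script.
 */
extern const struct viommu_type *const __start_viommu_types[];
extern const struct viommu_type *const __stop_viommu_types[];

/* Walk the link-time array; no list and no lock are needed. */
const struct viommu_type *viommu_get_type(uint64_t type)
{
    const struct viommu_type *const *p;

    for ( p = __start_viommu_types; p < __stop_viommu_types; p++ )
        if ( (*p)->type == type )
            return *p;

    return NULL;
}

/* A demo device model registering itself at link time. */
static int demo_create(void)  { return 0; }
static int demo_destroy(void) { return 0; }

static const struct viommu_ops demo_ops = {
    .create  = demo_create,
    .destroy = demo_destroy,
};

static const struct viommu_type demo_type = {
    .type = 0, /* stands in for VIOMMU_TYPE_INTEL_VTD */
    .ops  = &demo_ops,
};
REGISTER_VIOMMU_TYPE(demo_type);
```

Since the table is fixed when the image is linked, lookup needs neither an
allocation nor a spinlock. The cost is the extra section in the linker
script, which is the trade-off raised above.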

-- 
Best regards
Tianyu Lan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 2/29] VIOMMU: Add vIOMMU helper functions to create, destroy vIOMMU instance

2017-10-29 Thread Lan Tianyu
On 2017-10-25 09:43, Lan Tianyu wrote:
>> For all platforms supporting HVM, for PV I don't think it makes sense.
>> > Since AFAIK ARM guest type is also HVM I would rather introduce this
>> > field in the hvm_domain structure rather than the generic domain
>> > structure.
>> > 
> This sounds reasonable.
> 
>> > You might want to wait for feedback from others regarding this issue.
>> > 
> I discussed this with Julien before. He would prefer not to add viommu code
> for ARM at first. So struct hvm_domain seems to be a better place, since it's
> an arch-specific definition and struct viommu is only added to the x86
> struct hvm_domain.

Hi Roger:
If PV guests need PV IOMMU support, struct iommu should be put into
struct domain so that it can be reused by both full-virtualization and PV
IOMMU. Malcolm Crossley sent out an RFC patch for PV IOMMU before, and I
found that it also needs to change struct domain.

https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg01441.html


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 8/29] tools/libxl: create vIOMMU during domain construction

2017-10-26 Thread Lan Tianyu
On 2017-10-26 20:05, Wei Liu wrote:
> On Thu, Oct 19, 2017 at 11:13:57AM +0100, Roger Pau Monné wrote:
>>> +
>>> +if (viommu->type == LIBXL_VIOMMU_TYPE_INTEL_VTD) {
>>> +ret = xc_viommu_create(ctx->xch, domid, 
>>> VIOMMU_TYPE_INTEL_VTD,
>>> +   viommu->base_addr, viommu->cap, 
>>> );
>>
>> As said in another patch: this will break compilation because
>> xc_viommu_create is introduced in patch 9.
>>
>> Please organize the patches in a way that the code always compiles and
>> works fine. Keep in mind that the Xen tree should be bisectable
>> always.
>>
> 
> +10 to this.
> 
> We rely heavily on our test system's bisector to tell us what is wrong.
> The bisector works on patch level. Please make sure every patch builds,
> otherwise the test system will just give up.
> 
> If triaging can be done automatically by computers, maintainers can
> spend less time doing tedious work and more time reviewing patches
> (yours included).

Sure. Will pay more attention to this.

> 
> Normally I use git-rebase to build every commit, but I figured that's a
> bit too dangerous so I wrote a script.
> 
> Please check out:
> 
>   [PATCH v3 for-4.10] scripts: introduce a script for build test
> 
> It is still under review, but you can fish out some of the runes to do
> build tests.

This is very helpful. Thanks.
-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [Qemu-devel] [PATCH] x86: Skip check apic_id_limit for Xen

2017-10-26 Thread Lan Tianyu
On 2017-10-26 22:27, Michael S. Tsirkin wrote:
> On Thu, Oct 26, 2017 at 02:19:43PM +0200, Eduardo Habkost wrote:
>> On Mon, Aug 21, 2017 at 10:22:15AM +0800, Lan Tianyu wrote:
>>> On 2017-08-19 00:38, Eduardo Habkost wrote:
>>>> On Thu, Aug 17, 2017 at 09:37:10AM +0800, Lan Tianyu wrote:
>>>>> On 2017-08-16 19:21, Paolo Bonzini wrote:
>>>>>> On 16/08/2017 02:22, Lan Tianyu wrote:
>>>>>>> Xen vIOMMU device model will be in Xen hypervisor. Skip vIOMMU
>>>>>>> check for Xen here when vcpu number is more than 255.
>>>>>>
>>>>>> I think you still need to do a check for vIOMMU being enabled.
>>>>>
>>>>> Yes, this will be done in the Xen tool stack and Qemu doesn't have such
>>>>> knowledge. Operations of create, destroy Xen vIOMMU will be done in the
>>>>> Xen tool stack.
>>>>
>>>> Shouldn't we make QEMU have knowledge of the vIOMMU device, then?
>>>> Won't QEMU need to know about it eventually?
>>>>
>>>
>>> Hi Eduardo:
>>>  Thanks for your review.
>>>  Xen has some guest modes which don't use Qemu and we tried to
>>> make Xen vIOMMU framework compatible with all guest modes. So far, we
>>> are adding interrupt remapping function for Xen vIOMMU and find qemu
>>> doesn't need to know Xen vIOMMU. The check of vcpu number > 255 here
>>> will be done in Xen side and so skip the check in Qemu to avoid blocking
>>> Xen creating >255 vcpus.
>>>  We may make Qemu have knowledge of the vIOMMU device if it's
>>> necessary when adding new function.
>>
>> I was expecting it to go through the PC tree, but I will queue it
>> on x86-next instead.
> 
> I was waiting for an ack from you or Paolo as you participated in the
> discussion. But sure, go ahead
> 
> Acked-by: Michael S. Tsirkin <m...@redhat.com>
> 

Great. Thanks.
-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 20/29] VIOMMU: Add get irq info callback to convert irq remapping request

2017-10-25 Thread Lan Tianyu
On 2017-10-25 15:43, Roger Pau Monné wrote:
> On Wed, Oct 25, 2017 at 03:30:39PM +0800, Lan Tianyu wrote:
>> On 2017-10-19 23:42, Roger Pau Monné wrote:
>>> On Thu, Sep 21, 2017 at 11:02:01PM -0400, Lan Tianyu wrote:
>>>
>>>>  
>>>>  struct viommu_ops {
>>>> @@ -28,6 +29,9 @@ struct viommu_ops {
>>>>  int (*destroy)(struct viommu *viommu);
>>>>  int (*handle_irq_request)(struct domain *d,
>>>>struct arch_irq_remapping_request *request);
>>>> +int (*get_irq_info)(struct domain *d,
>>>> +struct arch_irq_remapping_request *request,
>>>
>>> AFAICT d and request should be constified.
>>
>> Did you mean to keep d and request on the same line? That would exceed 80
>> chars.
> 
> No, I meant that the parameters of the function should be "const struct
> domain *d" and "const struct arch_irq_remapping_request *request".
> AFAICT you should never modify them inside of get_irq_info.
> 

OK, I got it. This makes sense; I will update.
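
To make the requested change concrete, here is a small sketch of the
callback and its wrapper with the two input parameters const-qualified. It
is only an illustration under assumptions: the structure contents
(source_id, the reduced irq_info fields) and demo_get_irq_info are
hypothetical, and a plain NULL check stands in for the ASSERT() used in the
patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced versions of the structures in the series. */
struct arch_irq_remapping_request { uint32_t source_id; };
struct arch_irq_remapping_info { uint32_t dest; uint8_t vector; };

struct domain;

struct viommu_ops {
    /* d and request are inputs only, so both are const. */
    int (*get_irq_info)(const struct domain *d,
                        const struct arch_irq_remapping_request *request,
                        struct arch_irq_remapping_info *irq_info);
};

struct viommu { const struct viommu_ops *ops; };
struct domain { struct viommu *viommu; };

/* Returns -EINVAL when the domain has no vIOMMU or no such callback. */
int viommu_get_irq_info(const struct domain *d,
                        const struct arch_irq_remapping_request *request,
                        struct arch_irq_remapping_info *irq_info)
{
    const struct viommu *viommu = d->viommu;

    if ( !viommu || !viommu->ops || !viommu->ops->get_irq_info )
        return -EINVAL;

    return viommu->ops->get_irq_info(d, request, irq_info);
}

/* Demo callback: it can read its inputs but the compiler rejects writes. */
int demo_get_irq_info(const struct domain *d,
                      const struct arch_irq_remapping_request *request,
                      struct arch_irq_remapping_info *irq_info)
{
    (void)d;
    irq_info->vector = 0x30;            /* arbitrary demo value */
    irq_info->dest = request->source_id;
    return 0;
}
```

With this signature, an implementation that tried to modify *request or *d
would fail to compile, which documents the contract at no runtime cost.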

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 20/29] VIOMMU: Add get irq info callback to convert irq remapping request

2017-10-25 Thread Lan Tianyu
On 2017-10-19 23:42, Roger Pau Monné wrote:
> On Thu, Sep 21, 2017 at 11:02:01PM -0400, Lan Tianyu wrote:
>> This patch is to add get_irq_info callback for platform implementation
>> to convert irq remapping request to irq info (e.g. vector, dest, dest_mode
>> and so on).
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/common/viommu.c  | 16 
>>  xen/include/asm-x86/viommu.h |  8 
>>  xen/include/xen/viommu.h | 14 ++
>>  3 files changed, 38 insertions(+)
>>
>> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
>> index b517158..0708e43 100644
>> --- a/xen/common/viommu.c
>> +++ b/xen/common/viommu.c
>> @@ -178,6 +178,22 @@ int viommu_handle_irq_request(struct domain *d,
>>  return viommu->ops->handle_irq_request(d, request);
>>  }
>>  
>> +int viommu_get_irq_info(struct domain *d,
>> +struct arch_irq_remapping_request *request,
>> +struct arch_irq_remapping_info *irq_info)
>> +{
>> +struct viommu *viommu = d->viommu;
>> +
>> +if ( !viommu )
>> +return -EINVAL;
> 
> OK, here there's a check for !viommu. Can we please have this written
> down in the header? (ie: which functions are safe/expected to be
> called without a viommu)

Sure. I will add some comments.

> 
>> +
>> +ASSERT(viommu->ops);
>> +if ( !viommu->ops->get_irq_info )
>> +return -EINVAL;
>> +
>> +return viommu->ops->get_irq_info(d, request, irq_info);
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
>> index 366fbb6..586b6bd 100644
>> --- a/xen/include/asm-x86/viommu.h
>> +++ b/xen/include/asm-x86/viommu.h
>> @@ -24,6 +24,14 @@
>>  #define VIOMMU_REQUEST_IRQ_MSI  0
>>  #define VIOMMU_REQUEST_IRQ_APIC 1
>>  
>> +struct arch_irq_remapping_info
>> +{
>> +uint8_t  vector;
>> +uint32_t dest;
>> +uint32_t dest_mode:1;
>> +uint32_t delivery_mode:3;
> 
> Why uint32_t for these last two fields? Also please sort them so that
> the padding is limited at the end of the structure.

Yes, this makes sense.

> 
>> +};
>> +
>>  struct arch_irq_remapping_request
>>  {
>>  union {
>> diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
>> index 230f6b1..beb40cd 100644
>> --- a/xen/include/xen/viommu.h
>> +++ b/xen/include/xen/viommu.h
>> @@ -21,6 +21,7 @@
>>  #define __XEN_VIOMMU_H__
>>  
>>  struct viommu;
>> +struct arch_irq_remapping_info;
>>  struct arch_irq_remapping_request;
> 
> If you include asm/viommu.h in viommu.h you don't need the forward
> declarations.

Will update.

> 
>>  
>>  struct viommu_ops {
>> @@ -28,6 +29,9 @@ struct viommu_ops {
>>  int (*destroy)(struct viommu *viommu);
>>  int (*handle_irq_request)(struct domain *d,
>>struct arch_irq_remapping_request *request);
>> +int (*get_irq_info)(struct domain *d,
>> +struct arch_irq_remapping_request *request,
> 
> AFAICT d and request should be constified.

Did you mean to keep d and request on the same line? That would exceed 80
chars.
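
On the earlier comment about sorting the fields of struct
arch_irq_remapping_info, the effect of member order can be checked with a
short sketch. The first structure copies the layout from the patch; the
second is one possible reordering, not necessarily what the next version
will use. Both struct names are made up for the comparison, and uint8_t
bit-fields are a compiler extension accepted by GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as posted: the leading 1-byte member forces padding before dest. */
struct irq_info_as_posted {
    uint8_t  vector;
    uint32_t dest;
    uint32_t dest_mode:1;
    uint32_t delivery_mode:3;
};

/*
 * One possible reordering: widest member first, then the narrow members,
 * so that any padding ends up at the tail of the structure.
 */
struct irq_info_reordered {
    uint32_t dest;
    uint8_t  vector;
    uint8_t  dest_mode:1;
    uint8_t  delivery_mode:3;
};

/* Helpers so the two layouts can be compared at runtime. */
size_t irq_info_as_posted_size(void)
{
    return sizeof(struct irq_info_as_posted);
}

size_t irq_info_reordered_size(void)
{
    return sizeof(struct irq_info_reordered);
}
```

On common ABIs with a 4-byte aligned uint32_t, dest lands at offset 4 in the
posted layout and the reordered structure comes out smaller. The exact sizes
are ABI dependent, so the sketch deliberately does not hard-code them.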


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 5/29] tools/libacpi: Add new fields in acpi_config for DMAR table

2017-10-25 Thread Lan Tianyu
On 2017-10-19 16:40, Roger Pau Monné wrote:
> On Thu, Oct 19, 2017 at 04:09:02PM +0800, Lan Tianyu wrote:
>> On 2017-10-18 23:12, Roger Pau Monné wrote:
>>>> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
>>>> index a2efd23..fdd6a78 100644
>>>> --- a/tools/libacpi/libacpi.h
>>>> +++ b/tools/libacpi/libacpi.h
>>>> @@ -20,6 +20,8 @@
>>>>  #ifndef __LIBACPI_H__
>>>>  #define __LIBACPI_H__
>>>>  
>>>> +#include <stdbool.h>
>>>
>>> I'm quite sure you shouldn't add this here, see how headers are added
>>> using LIBACPI_STDUTILS.
>>>
>>
>> We may replace bool with uint8_t xxx:1 to avoid introducing a new header file.
> 
> Did you check whether including stdbool is actually required? AFAICT
> hvmloader util.h already includes it, and you would only have to
> introduce it in libxl if it's not there yet.
> 

Yes, you are right. stdbool.h is already included by both libxl (libxl.h)
and hvmloader (util.h). We just need to adjust the include order.

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 2/29] VIOMMU: Add vIOMMU helper functions to create, destroy vIOMMU instance

2017-10-24 Thread Lan Tianyu
On 2017-10-19 16:47, Roger Pau Monné wrote:
> For all platforms supporting HVM, for PV I don't think it makes sense.
> Since AFAIK ARM guest type is also HVM I would rather introduce this
> field in the hvm_domain structure rather than the generic domain
> structure.
> 

This sounds reasonable.

> You might want to wait for feedback from others regarding this issue.
> 

I discussed this with Julien before. He would prefer not to add viommu code
for ARM at first. So struct hvm_domain seems to be a better place, since it's
an arch-specific definition and struct viommu is only added to the x86
struct hvm_domain.

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 1/29] Xen/doc: Add Xen virtual IOMMU doc

2017-10-24 Thread Lan Tianyu
On 2017-10-19 19:28, Jan Beulich wrote:
>>>> On 19.10.17 at 10:49, <roger@citrix.com> wrote:
>> On Thu, Oct 19, 2017 at 10:26:36AM +0800, Lan Tianyu wrote:
>>> Hi Roger:
>>>  Thanks for review.
>>>
>>> On 2017-10-18 21:26, Roger Pau Monné wrote:
>>>> On Thu, Sep 21, 2017 at 11:01:42PM -0400, Lan Tianyu wrote:
>>>>> +Xen hypervisor vIOMMU command
>>>>> +=
>>>>> +Introduce vIOMMU command "viommu=1" to enable vIOMMU function in hypervisor.
>>>>> +It's default disabled.
>>>>
>>>> Hm, I'm not sure we really need this. At the end viommu will be
>>>> disabled by default for guests, unless explicitly enabled in the
>>>> config file.
>>>
>>> This is according to Jan's early comments on RFC patch
>>> https://patchwork.kernel.org/patch/9733869/.
>>>
>>> "It's actually a question whether in our current scheme a Kconfig
>>> option is appropriate here in the first place. I'd rather see this be
>>> an always built feature which needs enabling on the command line
>>> for the time being."
>>
>> So if I read this correctly Jan wanted you to ditch the Kconfig option
>> and instead rely on the command line option to enable/disable it.
> 
> Yes.
> 
> Jan
> 

OK. I will remove the command in the next version. Thanks for the clarification.

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 00/29]

2017-10-22 Thread Lan Tianyu
On 2017-10-20 19:36, Roger Pau Monné wrote:
> On Thu, Sep 21, 2017 at 11:01:41PM -0400, Lan Tianyu wrote:
>> Change since v2:
>>1) Remove vIOMMU hypercall of query capabilities and introduce when 
>> necessary.
>>2) Remove length field of vIOMMU create parameter of vIOMMU hypercall
>>3) Introduce irq remapping mode callback to vIOMMU framework and 
>> vIOMMU device models
>> can check irq remapping mode by vendor specific ways.
>>4) Update vIOMMU docs.
>>5) Other changes please see patches' change logs.
>>
>> Change since v1:
>>1) Fix coding style issues
>>2) Add definitions for vIOMMU type and capabilities
>>3) Change vIOMMU kconfig and select vIOMMU default on x86
>>4) Put vIOMMU creation in libxl__arch_domain_create()
>>5) Make vIOMMU structure of tool stack more general for both PV and 
>> HVM.
>>
>> Change since RFC v2:
>>1) Move vvtd.c to drivers/passthrough/vtd directory.
>>2) Make vIOMMU always built in on x86
>>3) Add new boot cmd "viommu" to enable viommu function
>>4) Fix some coding style issues.
>>
>> Change since RFC v1:
>>1) Add Xen virtual IOMMU doc docs/misc/viommu.txt
>>2) Move the vIOMMU create/destroy and query-capabilities hypercalls
>> from dmop to domctl, as suggested by Paul Durrant, because these
>> hypercalls can be issued by the tool stack and more VM modes (e.g. PVH
>> or other modes that don't use Qemu) can benefit.
>>3) Add check of input MMIO address and length.
>>4) Add iommu_type in vIOMMU hypercall parameter to specify the
>> vendor vIOMMU device model (e.g. Intel VTD, AMD or ARM IOMMU; so far
>> only Intel VTD is supported).
>>5) Add save and restore support for vvtd
>>
>>
>> This patchset is to introduce vIOMMU framework and add virtual VTD's
>> interrupt remapping support according "Xen virtual IOMMU high level
>> design doc V3"(https://lists.xenproject.org/archives/html/xen-devel/
>> 2016-11/msg01391.html).
>>
>> - vIOMMU framework
>> New framework provides viommu_ops and helper functions to abstract
>> vIOMMU operations (e.g. create, destroy, handle irq remapping request
>> and so on). Vendors (Intel, ARM, AMD and so on) can implement their
>> vIOMMU callbacks.
>>
>> - Virtual VTD
>> We enable the irq remapping function, covering both MSI and IOAPIC
>> interrupts. Posted interrupt mode emulation, and posted interrupt mode
>> enabled on the host together with virtual VTD, are not supported yet and
>> will be added later.
> 
> Hello,
> 
> Just a couple of generic comments on the whole series:
> 
>  - Please make sure that the result after each patch is buildable. It
>    is of extreme importance that the Xen tree is bisectable at all
>    points.
> 
>  - Regarding the organization of the series, I would rather prefer
>    that you place the design document at the beginning (like it's done
>    now), then the hypervisor changes (possibly the generic framework
>    first, then the vvtd functionality and finally all the hooks into
>    common code) and the toolstack side at the end. This might be just
>    my personal taste, but I think it's clearer to review/understand
>    rather than mixed as it is now.
> 
>  - Finally, please try to make sure that each patch introduces the
>    helpers or structures that it needs. For example don't place all
>    the "static inline" helpers together with a bunch of structures in
>    an isolated patch, and then a bunch of patches that start making
>    use of them. Instead introduce the structures or helpers in the
>    context when they are used. An exception of this might be for very
>    big or generic structures.
> 

Sure. We will follow your guidance. Thanks.

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 11/29] x86/hvm: Introduce a emulated VTD for HVM

2017-10-20 Thread Lan Tianyu
On 2017-10-20 14:56, Jan Beulich wrote:
>>>> On 20.10.17 at 04:46, <chao@intel.com> wrote:
>> On Thu, Oct 19, 2017 at 12:20:35PM +0100, Roger Pau Monné wrote:
>>> On Thu, Sep 21, 2017 at 11:01:52PM -0400, Lan Tianyu wrote:
>>>> From: Chao Gao <chao@intel.com>
>>>>
>>>> This patch adds create/destroy function for the emulated VTD
>>>> and adapts it to the common VIOMMU abstraction.
>>>>
>>>> Signed-off-by: Chao Gao <chao@intel.com>
>>>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>>>> ---
>>>>  
>>>> -obj-y += iommu.o
>>>>  obj-y += dmar.o
>>>> -obj-y += utils.o
>>>> -obj-y += qinval.o
>>>>  obj-y += intremap.o
>>>> +obj-y += iommu.o
>>>> +obj-y += qinval.o
>>>>  obj-y += quirks.o
>>>> +obj-y += utils.o
>>>
>>> Why do you need to shuffle the list above?
>>
>> I placed them in alphabetic order.
> 
> Which is appreciated. But this being non-essential for the patch, it
> would avoid (valid) reviewer questions if you said in the description
> this is an intended but non-essential change.
> 
>>> Also I'm not sure the Intel vIOMMU implementation should live here. As
>>> you can see the path is:
>>>
>>> xen/drivers/passthrough/vtd/
>>>
>>> The vIOMMU is not tied to passthrough at all, so I would rather place
>>> it in:
> 
> Hmm, is vIOMMU usable without an actual backing IOMMU?

For interrupt remapping support, we can emulate it without a physical IOMMU.

> 
>>> xen/drivers/vvtd/
>>>
>>> Or maybe you can create something like:
>>>
>>> xen/drivers/viommu/
>>>
>>> So that all vIOMMU implementations can share some code.
>>>
>>
>> vvtd and vtd use the same header files (e.g. vtd.h). That is why we put
>> it there. If we moved it, we should also move the related header files to
>> a public directory.
> 
> And AMD (long ago) had placed their (still incomplete) virtual
> implementation into the same directory as well. I.e. at this point
> I'm not really opposed to the proposed placement here, albeit
> I can see the point of Roger's argument.
> 
> Jan
> 




Re: [Xen-devel] [PATCH V3 5/29] tools/libacpi: Add new fields in acpi_config for DMAR table

2017-10-19 Thread Lan Tianyu
On 2017-10-18 23:12, Roger Pau Monné wrote:
> On Thu, Sep 21, 2017 at 11:01:46PM -0400, Lan Tianyu wrote:
>> From: Chao Gao <chao@intel.com>
>>
>> The BIOS reports the remapping hardware units in a platform to system 
>> software
>> through the DMA Remapping Reporting (DMAR) ACPI table.
>> New fields are introduces for DMAR table. These new fields are set by
>  ^ introduced
>> toolstack through parsing guest's config file. construct_dmar() is added to
>> build DMAR table according to the new fields.
>>
>> Signed-off-by: Chao Gao <chao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>> v3:
>>  - Remove chip-set specific IOAPIC BDF. Instead, let IOAPIC-related
>>  info be passed by struct acpi_config.
>>
>> ---
>>  tools/libacpi/build.c   | 53 
>> +
>>  tools/libacpi/libacpi.h | 12 +++
>>  2 files changed, 65 insertions(+)
>>
>> diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
>> index f9881c9..5ee8fcd 100644
>> --- a/tools/libacpi/build.c
>> +++ b/tools/libacpi/build.c
>> @@ -303,6 +303,59 @@ static struct acpi_20_slit *construct_slit(struct 
>> acpi_ctxt *ctxt,
>>  return slit;
>>  }
>>  
>> +/*
>> + * Only one DMA remapping hardware unit is exposed and all devices
>> + * are under the remapping hardware unit. I/O APIC should be explicitly
>> + * enumerated.
>> + */
>> +struct acpi_dmar *construct_dmar(struct acpi_ctxt *ctxt,
>> + const struct acpi_config *config)
>> +{
>> +struct acpi_dmar *dmar;
>> +struct acpi_dmar_hardware_unit *drhd;
>> +struct dmar_device_scope *scope;
>> +unsigned int size;
>> +unsigned int ioapic_scope_size = sizeof(*scope) + 
>> sizeof(scope->path[0]);
> 
> I'm not sure I follow why you need to add the size of a uint16_t here.
> 
>> +
>> +size = sizeof(*dmar) + sizeof(*drhd) + ioapic_scope_size;
> 
> size can be initialized at declaration time.
> 
>> +
>> +dmar = ctxt->mem_ops.alloc(ctxt, size, 16);
> 
> Even dmar can be initialized at declaration time.
> 

OK. Will update.

>> +if ( !dmar )
>> +return NULL;
>> +
>> +memset(dmar, 0, size);
>> +dmar->header.signature = ACPI_2_0_DMAR_SIGNATURE;
>> +dmar->header.revision = ACPI_2_0_DMAR_REVISION;
>> +dmar->header.length = size;
>> +fixed_strcpy(dmar->header.oem_id, ACPI_OEM_ID);
>> +fixed_strcpy(dmar->header.oem_table_id, ACPI_OEM_TABLE_ID);
>> +dmar->header.oem_revision = ACPI_OEM_REVISION;
>> +dmar->header.creator_id   = ACPI_CREATOR_ID;
>> +dmar->header.creator_revision = ACPI_CREATOR_REVISION;
>> +dmar->host_address_width = config->host_addr_width - 1;
>> +if ( config->iommu_intremap_supported )
>> +dmar->flags |= ACPI_DMAR_INTR_REMAP;
>> +if ( !config->iommu_x2apic_supported )
>> +dmar->flags |= ACPI_DMAR_X2APIC_OPT_OUT;
> 
> Is there any reason why we would want to create a guest with a vIOMMU
> but not x2APIC support?

Will remove this.

> 
>> +
>> +drhd = (struct acpi_dmar_hardware_unit *)((void*)dmar + sizeof(*dmar));
>   ^ space
>> +drhd->type = ACPI_DMAR_TYPE_HARDWARE_UNIT;
>> +drhd->length = sizeof(*drhd) + ioapic_scope_size;
>> +drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
>> +drhd->pci_segment = 0;
>> +drhd->base_address = config->iommu_base_addr;
>> +
>> +scope = >scope[0];
>> +scope->type = ACPI_DMAR_DEVICE_SCOPE_IOAPIC;
>> +scope->length = ioapic_scope_size;
>> +scope->enumeration_id = config->ioapic_id;
>> +scope->bus = config->ioapic_bus;
>> +scope->path[0] = config->ioapic_devfn;
>> +
>> +set_checksum(dmar, offsetof(struct acpi_header, checksum), size);
>> +return dmar;
>> +}
>> +
>>  static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
>>  unsigned long *table_ptrs,
>>  int nr_tables,
>> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
>> index a2efd23..fdd6a78 100644
>> --- a/tools/libacpi/libacpi.h
>> +++ b/tools/libacpi/libacpi.h
>> @@ -20,6 +20,8 @@
>>  #ifndef __LIBACPI_H__
>>  #define __LIBACPI_H__
>>  
>> +#include <stdbool.h>
> 
> I'm quite sure you shouldn't add this here, see how headers are added
> using LIBACPI_STDUTILS.
> 

We may replace bool with uint8_t xxx:1 to avoid introducing a new header file.
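
On the question above about adding sizeof(scope->path[0]): the device scope
structure ends in a variable-length path array, and sizeof() does not count
it, so the single path entry for the I/O APIC has to be added by hand. The
sketch below works through the arithmetic. It assumes byte-packed
structures, the standard 36-byte ACPI table header, and a DRHD layout taken
from the VT-d specification. struct acpi_dmar_hardware_unit is not shown in
this thread, so its definition here is an assumption (without the trailing
scope array), as are the two helper function names.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
/* Standard ACPI table header (36 bytes). */
struct acpi_header {
    uint32_t signature;
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    uint32_t creator_id;
    uint32_t creator_revision;
};

/* As in the patch: header plus 12 bytes of DMAR-specific fields. */
struct acpi_dmar {
    struct acpi_header header;
    uint8_t host_address_width;
    uint8_t flags;
    uint8_t reserved[10];
};

/* Assumed DRHD layout (fixed part only), following the VT-d spec. */
struct acpi_dmar_hardware_unit {
    uint16_t type;
    uint16_t length;
    uint8_t  flags;
    uint8_t  reserved;
    uint16_t pci_segment;
    uint64_t base_address;
};

/* As in the patch, with a C99 flexible array instead of path[0]. */
struct dmar_device_scope {
    uint8_t  type;
    uint8_t  length;
    uint8_t  reserved[2];
    uint8_t  enumeration_id;
    uint8_t  bus;
    uint16_t path[];    /* one 2-byte entry per hop */
};
#pragma pack(pop)

/* Fixed part of the scope plus exactly one path entry for the I/O APIC. */
size_t ioapic_scope_size(void)
{
    return sizeof(struct dmar_device_scope) + sizeof(uint16_t);
}

/* Whole table: DMAR header, one DRHD, one I/O APIC device scope. */
size_t dmar_table_size(void)
{
    return sizeof(struct acpi_dmar) +
           sizeof(struct acpi_dmar_hardware_unit) +
           ioapic_scope_size();
}
```

Under these assumptions the scope entry is 6 + 2 bytes, and the same value
is what construct_dmar() stores in scope->length and adds to drhd->length.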

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 4/29] tools/libacpi: Add DMA remapping reporting (DMAR) ACPI table structures

2017-10-19 Thread Lan Tianyu
On 2017-10-18 22:36, Roger Pau Monné wrote:
> On Thu, Sep 21, 2017 at 11:01:45PM -0400, Lan Tianyu wrote:
>> From: Chao Gao <chao@intel.com>
>>
>> Add dmar table structure according Chapter 8 "BIOS Considerations" of
>> VTd spec Rev. 2.4.
>>
>> VTd 
>> spec:http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
>>
>> Signed-off-by: Chao Gao <chao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  tools/libacpi/acpi2_0.h | 61 
>> +
>>  1 file changed, 61 insertions(+)
>>
>> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
>> index 2619ba3..758a823 100644
>> --- a/tools/libacpi/acpi2_0.h
>> +++ b/tools/libacpi/acpi2_0.h
>> @@ -422,6 +422,65 @@ struct acpi_20_slit {
>>  };
>>  
>>  /*
>> + * DMA Remapping Table header definition (DMAR)
>> + */
>> +
>> +/*
>> + * DMAR Flags.
>> + */
>> +#define ACPI_DMAR_INTR_REMAP        (1 << 0)
>> +#define ACPI_DMAR_X2APIC_OPT_OUT    (1 << 1)
>> +
>> +struct acpi_dmar {
>> +struct acpi_header header;
>> +uint8_t host_address_width;
>> +uint8_t flags;
>> +uint8_t reserved[10];
>> +};
>> +
>> +/*
>> + * Device Scope Types
>> + */
>> +#define ACPI_DMAR_DEVICE_SCOPE_PCI_ENDPOINT             0x01
>> +#define ACPI_DMAR_DEVICE_SCOPE_PCI_SUB_HIERARACHY       0x01
>                                                           ^0x02
>> +#define ACPI_DMAR_DEVICE_SCOPE_IOAPIC                   0x03
>> +#define ACPI_DMAR_DEVICE_SCOPE_HPET                     0x04
>> +#define ACPI_DMAR_DEVICE_SCOPE_ACPI_NAMESPACE_DEVICE    0x05
> 
> Maybe you could try to reduce the length of the defines?

Sure. Will update.

> 
>> +
>> +struct dmar_device_scope {
>> +uint8_t type;
>> +uint8_t length;
>> +uint8_t reserved[2];
>> +uint8_t enumeration_id;
>> +uint8_t bus;
>> +uint16_t path[0];
>> +};
>> +
>> +/*
>> + * DMA Remapping Hardware Unit Types
>> + */
>> +#define ACPI_DMAR_TYPE_HARDWARE_UNIT        0x00
>> +#define ACPI_DMAR_TYPE_RESERVED_MEMORY      0x01
>> +#define ACPI_DMAR_TYPE_ATSR                 0x02
>> +#define ACPI_DMAR_TYPE_HARDWARE_AFFINITY    0x03
>> +#define ACPI_DMAR_TYPE_ANDD                 0x04
> 
> I think you either use acronyms for all of them (like ATSR and ANDD)
> or not. But mixing acronyms with full names is confusing.

OK. Will update.

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 3/29] DOMCTL: Introduce new DOMCTL commands for vIOMMU support

2017-10-19 Thread Lan Tianyu
On 2017-10-18 22:18, Roger Pau Monné wrote:
> On Thu, Sep 21, 2017 at 11:01:44PM -0400, Lan Tianyu wrote:
>> This patch is to introduce create, destroy and query capabilities
>> command for vIOMMU. vIOMMU layer will deal with requests and call
>> arch vIOMMU ops.
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/common/domctl.c |  6 ++
>>  xen/common/viommu.c | 30 ++
>>  xen/include/public/domctl.h | 42 ++
>>  xen/include/xen/viommu.h|  2 ++
>>  4 files changed, 80 insertions(+)
>>
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index 42658e5..7e28237 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -1149,6 +1149,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
>> u_domctl)
>>  copyback = 1;
>>  break;
>>  
>> +#ifdef CONFIG_VIOMMU
>> +case XEN_DOMCTL_viommu_op:
>> +ret = viommu_domctl(d, >u.viommu_op, );
> 
> IMHO, I'm not really sure if it's worth to pass the copyback parameter
> around. Can you just do the copy if !ret?

Yes, will update.

> 
>> +break;
>> +#endif
> 
> Instead of guarding every call to a viommu related function with
> CONFIG_VIOMMU I would rather add dummy replacements for them in the
> !CONFIG_VIOMMU case in the viommu.h header.


OK.

> 
>> +
>>  default:
>>  ret = arch_do_domctl(op, d, u_domctl);
>>  break;
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index 50ff58f..68854b6 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -1163,6 +1163,46 @@ struct xen_domctl_psr_cat_op {
>>  typedef struct xen_domctl_psr_cat_op xen_domctl_psr_cat_op_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_psr_cat_op_t);
>>  
>> +/*  vIOMMU helper
>> + *
>> + *  vIOMMU interface can be used to create/destroy vIOMMU and
>> + *  query vIOMMU capabilities.
>> + */
>> +
>> +/* vIOMMU type - specify vendor vIOMMU device model */
>> +#define VIOMMU_TYPE_INTEL_VTD   0
>> +
>> +/* vIOMMU capabilities */
>> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
> 
> Please put those two defines next to the fields they belong to.

OK.
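
The two suggestions made for domctl.c (dummy replacements in the header,
and copying back only when the call succeeds) could look roughly like the
sketch below. It is an illustration under assumptions, not the updated
patch: the reduced viommu_domctl() signature, the -ENODEV return value of
the stub, DEMO_DOMCTL_VIOMMU_OP and handle_domctl() are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>

struct domain;
struct xen_domctl_viommu_op;

/* Header side: one declaration when built in, one stub otherwise. */
#ifdef CONFIG_VIOMMU
int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op);
#else
/* Dummy replacement so callers need no #ifdef of their own. */
static inline int viommu_domctl(struct domain *d,
                                struct xen_domctl_viommu_op *op)
{
    (void)d;
    (void)op;
    return -ENODEV; /* assumed error code for "no vIOMMU support" */
}
#endif

/* Placeholder command number, not the real XEN_DOMCTL_viommu_op value. */
#define DEMO_DOMCTL_VIOMMU_OP 1

/* Caller side, shaped like the switch in do_domctl() without the #ifdef. */
int handle_domctl(unsigned int cmd, struct domain *d,
                  struct xen_domctl_viommu_op *op, int *copyback)
{
    int ret;

    switch ( cmd )
    {
    case DEMO_DOMCTL_VIOMMU_OP:
        ret = viommu_domctl(d, op);
        if ( !ret )
            *copyback = 1;  /* copy the result back only on success */
        break;

    default:
        ret = -ENOSYS;
        break;
    }

    return ret;
}
```

Built without CONFIG_VIOMMU, the case label still compiles and reports an
error to the caller, so the #ifdef lives in one header instead of at every
call site.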




-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V3 2/29] VIOMMU: Add vIOMMU helper functions to create, destroy vIOMMU instance

2017-10-19 Thread Lan Tianyu
On 2017-10-18 22:05, Roger Pau Monné wrote:
> On Thu, Sep 21, 2017 at 11:01:43PM -0400, Lan Tianyu wrote:
>> +int viommu_destroy_domain(struct domain *d)
>> +{
>> +int ret;
>> +
>> +if ( !d->viommu )
>> +return -EINVAL;
> 
> ENODEV would be better.

OK. Will update.

> 
>> +
>> +ret = d->viommu->ops->destroy(d->viommu);
>> +if ( ret < 0 )
>> +return ret;
>> +
>> +xfree(d->viommu);
>> +d->viommu = NULL;
> 
> Newline preferably.

OK.

> 
>> +return 0;
>> +}
>> +
>> +static struct viommu_type *viommu_get_type(uint64_t type)
>> +{
>> +struct viommu_type *viommu_type = NULL;
>> +
>> +spin_lock(_list_lock);
>> +list_for_each_entry( viommu_type, _list, node )
>> +{
>> +if ( viommu_type->type == type )
>> +{
>> +spin_unlock(_list_lock);
>> +return viommu_type;
>> +}
>> +}
>> +spin_unlock(_list_lock);
> 
> Why do you need a lock here, and a list at all?
> 
> AFAICT vIOMMU types will never be added at runtime.

Yes, will remove it.

> 
>> +
>> +return NULL;
>> +}
>> +
>> +int viommu_register_type(uint64_t type, struct viommu_ops *ops)
>> +{
>> +struct viommu_type *viommu_type = NULL;
>> +
>> +if ( !viommu_enabled() )
>> +return -ENODEV;
>> +
>> +if ( viommu_get_type(type) )
>> +return -EEXIST;
>> +
>> +viommu_type = xzalloc(struct viommu_type);
>> +if ( !viommu_type )
>> +return -ENOMEM;
>> +
>> +viommu_type->type = type;
>> +viommu_type->ops = ops;
>> +
>> +spin_lock(_list_lock);
>> +list_add_tail(_type->node, _list);
>> +spin_unlock(_list_lock);
>> +
>> +return 0;
>> +}
> 
> As mentioned above, I think this viommu_register_type helper could be
> avoided. I would rather use a macro similar to REGISTER_SCHEDULER in
> order to populate an array at link time, and then just iterate over
> it.
> 
>> +
>> +static int viommu_create(struct domain *d, uint64_t type,
>> + uint64_t base_address, uint64_t caps,
>> + uint32_t *viommu_id)
> 
> I'm quite sure this doesn't compile, you are adding a static function
> here that's not used at all in this patch. Please be careful and don't
> introduce patches that will break the build.

This function will be used in the next patch, "DOMCTL: Introduce new
DOMCTL commands for vIOMMU support", so this doesn't break the patchset
build. I will combine these two patches to avoid such an issue.


> 
>> +{
>> +struct viommu *viommu;
>> +struct viommu_type *viommu_type = NULL;
>> +int rc;
>> +
>> +/* Only support one vIOMMU per domain. */
>> +if ( d->viommu )
>> +return -E2BIG;
>> +
>> +viommu_type = viommu_get_type(type);
>> +if ( !viommu_type )
>> +return -EINVAL;
>> +
>> +if ( !viommu_type->ops || !viommu_type->ops->create )
>> +return -EINVAL;
> 
> Can this really happen? What's the point in having a iommu_type
> without ops or without the create op? I think this should be an ASSERT
> instead.

How about adding ASSERT(viommu_type->ops->create) here?

> 
>> +
>> +viommu = xzalloc(struct viommu);
>> +if ( !viommu )
>> +return -ENOMEM;
>> +
>> +viommu->base_address = base_address;
>> +viommu->caps = caps;
>> +viommu->ops = viommu_type->ops;
>> +
>> +rc = viommu->ops->create(d, viommu);
>> +if ( rc < 0 )
>> +{
>> +xfree(viommu);
>> +return rc;
>> +}
>> +
>> +d->viommu = viommu;
>> +
>> +/* Only support one vIOMMU per domain. */
>> +*viommu_id = 0;
>> +return 0;
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index 5b8f8c6..750f235 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -33,6 +33,10 @@
>>  DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t);
>>  #endif
>>  
>> +#ifdef CONFIG_VIOMMU
>> +#include 
>> +#endif
> 
> I would sug

Re: [Xen-devel] [PATCH V3 1/29] Xen/doc: Add Xen virtual IOMMU doc

2017-10-18 Thread Lan Tianyu
Hi Roger:
Thanks for the review.

On 2017年10月18日 21:26, Roger Pau Monné wrote:
> On Thu, Sep 21, 2017 at 11:01:42PM -0400, Lan Tianyu wrote:
>> This patch is to add Xen virtual IOMMU doc to introduce motivation,
>> framework, vIOMMU hypercall and xl configuration.
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  docs/misc/viommu.txt | 136 
>> +++
>>  1 file changed, 136 insertions(+)
>>  create mode 100644 docs/misc/viommu.txt
>>
>> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
>> new file mode 100644
>> index 000..348e8c4
>> --- /dev/null
>> +++ b/docs/misc/viommu.txt
>> @@ -0,0 +1,136 @@
>> +Xen virtual IOMMU
>> +
>> +Motivation
>> +==
>> +Enable more than 128 vcpu support
>> +
>> +The current requirements of HPC cloud service requires VM with a high
>> +number of CPUs in order to achieve high performance in parallel
>> +computing.
>> +
>> +To support >128 vcpus, X2APIC mode in guest is necessary because legacy
>> +APIC(XAPIC) just supports 8-bit APIC ID. The APIC ID used by Xen is
>> +CPU ID * 2 (ie: CPU 127 has APIC ID 254, which is the last one available
>> +in xAPIC mode) and so it only can support 128 vcpus at most. x2APIC mode
>> +supports 32-bit APIC ID and it requires the interrupt remapping functionality
>> +of a vIOMMU if the guest wishes to route interrupts to all available vCPUs
>> +
>> +The reason for this is that there is no modification for existing PCI MSI
>> +and IOAPIC when introduce X2APIC.
> 
> I'm not sure the above sentence makes much sense. IMHO I would just
> remove it.

OK. Will remove.

> 
>> PCI MSI/IOAPIC can only send interrupt
>> +message containing 8-bit APIC ID, which cannot address cpus with >254
>> +APIC ID. Interrupt remapping supports 32-bit APIC ID and so it's necessary
>> +for >128 vcpus support.
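The arithmetic behind the 128-vCPU limit can be checked with a small sketch. It only encodes what the text above states (APIC ID = vCPU ID * 2, with 254 as the last usable xAPIC ID). The `demo_` names are made up for this example and are not Xen functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest APIC ID usable in xAPIC mode per the text above. */
#define DEMO_XAPIC_MAX_ID 254u

/* vCPU-to-APIC-ID mapping as described above: APIC ID = vCPU ID * 2. */
static uint32_t demo_vcpu_to_apic_id(uint32_t vcpu_id)
{
    return vcpu_id * 2;
}

/* True if the vCPU is addressable without x2APIC and interrupt remapping. */
static bool demo_vcpu_fits_xapic(uint32_t vcpu_id)
{
    return demo_vcpu_to_apic_id(vcpu_id) <= DEMO_XAPIC_MAX_ID;
}
```

vCPU 127 maps to APIC ID 254 and still fits. vCPU 128 maps to 256, which no longer fits in 8 bits.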
>> +
>> +
>> +vIOMMU Architecture
>> +===
>> +vIOMMU device model is inside Xen hypervisor for following factors
>> +1) Avoid round trips between Qemu and Xen hypervisor
>> +2) Ease of integration with the rest of hypervisor
>> +3) HVMlite/PVH doesn't use Qemu
> 
> Just use PVH here, HVMlite == PVH now.

OK.

> 
>> +
>> +* Interrupt remapping overview.
>> +Interrupts from virtual devices and physical devices are delivered
>> +to vLAPIC from vIOAPIC and vMSI. vIOMMU needs to remap interrupt during
>> +this procedure.
>> +
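>> ++---------------------------------------------------+
>> +|Qemu                       |VM                     |
>> +|                           | +----------------+    |
>> +|                           | |  Device driver |    |
>> +|                           | +--------+-------+    |
>> +|                           |          ^            |
>> +|       +----------------+  | +--------+-------+    |
>> +|       | Virtual device |  | |  IRQ subsystem |    |
>> +|       +-------+--------+  | +--------+-------+    |
>> +|               |           |          ^            |
>> +|               |           |          |            |
>> ++---------------------------+-----------------------+
>> +|hypervisor     |                      | VIRQ       |
>> +|               |            +---------+--------+   |
>> +|               |            |      vLAPIC      |   |
>> +|               |VIRQ        +---------+--------+   |
>> +|               |                      ^            |
>> +|               |                      |            |
>> +|               |            +---------+--------+   |
>> +|               |            |      vIOMMU      |   |
>> +|               |            +---------+--------+   |
>> +|               |                      ^            |
>> +|               |                      |            |
>> +|               |            +---------+--------+   |
>> +|               |            |   vIOAPIC/vMSI   |   |
>> +|               |            +----+----+--------+   |
>> +|               |                 ^    ^            |
>> +|               +-----------------+    |            |
>> +|                                      |            |
>> ++---------------------------------------------------+
>> +HW                                     |IRQ
>> +                                +-------------------+
>> +                                |   PCI Device      |
>> +                                +-------------------+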
>> +
>> +
>> +vIOMMU h

[Xen-devel] [PATCH V3 20/29] VIOMMU: Add get irq info callback to convert irq remapping request

2017-09-22 Thread Lan Tianyu
This patch adds a get_irq_info callback that lets the platform
implementation convert an irq remapping request into irq info (e.g.
vector, dest, dest_mode and so on).
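As a rough sketch of the pattern (not the actual Xen code), the common layer can treat the callback as optional and fail with -EINVAL when no vIOMMU or no callback is present. The `demo_` types and the toy backend below are placeholders invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_irq_request {      /* hypothetical remapping request */
    uint32_t index;
};

struct demo_irq_info {         /* mirrors the fields named in the patch */
    uint8_t  vector;
    uint32_t dest;
    uint32_t dest_mode:1;
    uint32_t delivery_mode:3;
};

struct demo_viommu_ops {
    /* Optional: a backend may leave this NULL. */
    int (*get_irq_info)(const struct demo_irq_request *req,
                        struct demo_irq_info *info);
};

struct demo_viommu {
    const struct demo_viommu_ops *ops;
};

/* Common-layer wrapper: reject a missing vIOMMU or a missing callback. */
static int demo_get_irq_info(const struct demo_viommu *viommu,
                             const struct demo_irq_request *req,
                             struct demo_irq_info *info)
{
    if ( !viommu || !viommu->ops || !viommu->ops->get_irq_info )
        return -EINVAL;

    return viommu->ops->get_irq_info(req, info);
}

/* Toy backend: derives a vector from the request index. */
static int demo_backend_get_irq_info(const struct demo_irq_request *req,
                                     struct demo_irq_info *info)
{
    info->vector = (uint8_t)(0x20 + req->index);
    info->dest = 0;
    info->dest_mode = 0;
    info->delivery_mode = 0;
    return 0;
}
```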

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/common/viommu.c  | 16 
 xen/include/asm-x86/viommu.h |  8 
 xen/include/xen/viommu.h | 14 ++
 3 files changed, 38 insertions(+)

diff --git a/xen/common/viommu.c b/xen/common/viommu.c
index b517158..0708e43 100644
--- a/xen/common/viommu.c
+++ b/xen/common/viommu.c
@@ -178,6 +178,22 @@ int viommu_handle_irq_request(struct domain *d,
 return viommu->ops->handle_irq_request(d, request);
 }
 
+int viommu_get_irq_info(struct domain *d,
+struct arch_irq_remapping_request *request,
+struct arch_irq_remapping_info *irq_info)
+{
+struct viommu *viommu = d->viommu;
+
+if ( !viommu )
+return -EINVAL;
+
+ASSERT(viommu->ops);
+if ( !viommu->ops->get_irq_info )
+return -EINVAL;
+
+return viommu->ops->get_irq_info(d, request, irq_info);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
index 366fbb6..586b6bd 100644
--- a/xen/include/asm-x86/viommu.h
+++ b/xen/include/asm-x86/viommu.h
@@ -24,6 +24,14 @@
 #define VIOMMU_REQUEST_IRQ_MSI  0
 #define VIOMMU_REQUEST_IRQ_APIC 1
 
+struct arch_irq_remapping_info
+{
+uint8_t  vector;
+uint32_t dest;
+uint32_t dest_mode:1;
+uint32_t delivery_mode:3;
+};
+
 struct arch_irq_remapping_request
 {
 union {
diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
index 230f6b1..beb40cd 100644
--- a/xen/include/xen/viommu.h
+++ b/xen/include/xen/viommu.h
@@ -21,6 +21,7 @@
 #define __XEN_VIOMMU_H__
 
 struct viommu;
+struct arch_irq_remapping_info;
 struct arch_irq_remapping_request;
 
 struct viommu_ops {
@@ -28,6 +29,9 @@ struct viommu_ops {
 int (*destroy)(struct viommu *viommu);
 int (*handle_irq_request)(struct domain *d,
   struct arch_irq_remapping_request *request);
+int (*get_irq_info)(struct domain *d,
+struct arch_irq_remapping_request *request,
+struct arch_irq_remapping_info *info);
 };
 
 struct viommu {
@@ -50,6 +54,9 @@ int viommu_domctl(struct domain *d, struct 
xen_domctl_viommu_op *op,
   bool_t *need_copy);
 int viommu_handle_irq_request(struct domain *d,
   struct arch_irq_remapping_request *request);
+int viommu_get_irq_info(struct domain *d,
+struct arch_irq_remapping_request *request,
+struct arch_irq_remapping_info *irq_info);
 #else
 static inline int viommu_register_type(uint64_t type, struct viommu_ops *ops)
 {
@@ -61,6 +68,13 @@ viommu_handle_irq_request(struct domain *d,
 {
 return -EINVAL;
 }
+static inline int
+viommu_get_irq_info(struct domain *d,
+struct arch_irq_remapping_request *request,
+struct arch_irq_remapping_info *irq_info)
+{
+return -EINVAL;
+}
 #endif
 
 #endif /* __XEN_VIOMMU_H__ */
-- 
1.8.3.1




[Xen-devel] [PATCH V3 24/29] tools/libxc: Add a new interface to bind remapping format msi with pirq

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

When a vIOMMU (vvtd) is exposed to the guest, the guest can configure MSIs
in remapping format. For a pass-through device, the physical interrupt can
now be bound to a remapping format MSI. This patch introduces a flag,
HVM_IRQ_DPCI_GUEST_REMAPPED, which indicates that a physical interrupt is
bound to a remapping format guest interrupt. Thus, we can use
(HVM_IRQ_DPCI_GUEST_REMAPPED | HVM_IRQ_DPCI_GUEST_MSI) to denote the new
binding type. Also provide a new interface to manage the new binding.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>

---
v3:
 - Introduce a new flag HVM_IRQ_DPCI_GUEST_REMAPPED
 - Remove the flag HVM_IRQ_DPCI_GUEST_MSI_IR
---
 tools/libxc/include/xenctrl.h |  17 +
 tools/libxc/xc_domain.c   |  53 +++
 xen/drivers/passthrough/io.c  | 155 +++---
 xen/include/asm-x86/hvm/irq.h |   7 ++
 xen/include/public/domctl.h   |   7 ++
 5 files changed, 216 insertions(+), 23 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index bedca1f..1a17974 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1720,6 +1720,15 @@ int xc_domain_ioport_mapping(xc_interface *xch,
  uint32_t nr_ports,
  uint32_t add_mapping);
 
+int xc_domain_update_msi_irq_remapping(
+xc_interface *xch,
+uint32_t domid,
+uint32_t pirq,
+uint32_t source_id,
+uint32_t data,
+uint64_t addr,
+uint64_t gtable);
+
 int xc_domain_update_msi_irq(
 xc_interface *xch,
 uint32_t domid,
@@ -1734,6 +1743,14 @@ int xc_domain_unbind_msi_irq(xc_interface *xch,
  uint32_t pirq,
  uint32_t gflags);
 
+int xc_domain_unbind_msi_irq_remapping(
+xc_interface *xch,
+uint32_t domid,
+uint32_t pirq,
+uint32_t source_id,
+uint32_t data,
+uint64_t addr);
+
 int xc_domain_bind_pt_irq(xc_interface *xch,
   uint32_t domid,
   uint8_t machine_irq,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 3bab4e8..4b6a510 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1702,8 +1702,34 @@ int xc_deassign_dt_device(
 return rc;
 }
 
+int xc_domain_update_msi_irq_remapping(
+xc_interface *xch,
+uint32_t domid,
+uint32_t pirq,
+uint32_t source_id,
+uint32_t data,
+uint64_t addr,
+uint64_t gtable)
+{
+int rc;
+xen_domctl_bind_pt_irq_t *bind;
+
+DECLARE_DOMCTL;
 
+domctl.cmd = XEN_DOMCTL_bind_pt_irq;
+domctl.domain = (domid_t)domid;
 
+bind = &(domctl.u.bind_pt_irq);
+bind->irq_type = PT_IRQ_TYPE_MSI_IR;
+bind->machine_irq = pirq;
+bind->u.msi_ir.source_id = source_id;
+bind->u.msi_ir.data = data;
+bind->u.msi_ir.addr = addr;
+bind->u.msi_ir.gtable = gtable;
+
+rc = do_domctl(xch, &domctl);
+return rc;
+}
 
 int xc_domain_update_msi_irq(
 xc_interface *xch,
@@ -1732,6 +1758,33 @@ int xc_domain_update_msi_irq(
 return rc;
 }
 
+int xc_domain_unbind_msi_irq_remapping(
+xc_interface *xch,
+uint32_t domid,
+uint32_t pirq,
+uint32_t source_id,
+uint32_t data,
+uint64_t addr)
+{
+int rc;
+xen_domctl_bind_pt_irq_t *bind;
+
+DECLARE_DOMCTL;
+
+domctl.cmd = XEN_DOMCTL_unbind_pt_irq;
+domctl.domain = (domid_t)domid;
+
+bind = &(domctl.u.bind_pt_irq);
+bind->irq_type = PT_IRQ_TYPE_MSI_IR;
+bind->machine_irq = pirq;
+bind->u.msi_ir.source_id = source_id;
+bind->u.msi_ir.data = data;
+bind->u.msi_ir.addr = addr;
+
+rc = do_domctl(xch, &domctl);
+return rc;
+}
+
 int xc_domain_unbind_msi_irq(
 xc_interface *xch,
 uint32_t domid,
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index fb44223..6196334 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -21,9 +21,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
@@ -275,6 +277,106 @@ static struct vcpu *vector_hashing_dest(const struct 
domain *d,
 return dest;
 }
 
+static void set_hvm_gmsi_info(struct hvm_gmsi_info *msi,
+  xen_domctl_bind_pt_irq_t *pt_irq_bind)
+{
+switch (pt_irq_bind->irq_type)
+{
+case PT_IRQ_TYPE_MSI:
+msi->legacy.gvec = pt_irq_bind->u.msi.gvec;
+msi->legacy.gflags = pt_irq_bind->u.msi.gflags;
+break;
+
+case PT_IRQ_TYPE_MSI_IR:
+msi->intremap.source_id = pt_irq_bind->u.msi_ir.source_id;
+msi->intremap.data = pt_irq_bind->u.msi_ir.data;
+msi->intremap.addr = pt_irq_bind->u.msi_ir.addr;
+break;
+
+default:
+ASSERT_UNREACHABLE();
+}
+}
+
+stati

[Xen-devel] [PATCH V3 28/29] x86/vvtd: Add queued invalidation (QI) support

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

Queued Invalidation Interface is an expanded invalidation interface with
extended capabilities. Hardware implementations report support for queued
invalidation interface through the Extended Capability Register. The queued
invalidation interface uses an Invalidation Queue (IQ), which is a circular
buffer in system memory. Software submits commands by writing Invalidation
Descriptors to the IQ.

In this patch, a new function viommu_process_iq() is used for emulating how
hardware handles invalidation requests through QI.
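The queue walk can be pictured with a minimal sketch: consume descriptors from the head index up to, but not including, the tail index, wrapping at the end of the circular buffer. This is an illustration under assumptions, not the patch's viommu_process_iq(). The `demo_` names are hypothetical, and the queue is fixed at one 4K page of 16-byte descriptors (QS = 0).

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_QS         0u                      /* IQA_REG.QS: 2^QS pages */
#define DEMO_NR_ENTRIES (1u << (DEMO_QS + 8))   /* 256 descriptors per 4K page */

struct demo_iq {
    uint32_t head;          /* next descriptor to fetch (IQH index) */
    uint32_t tail;          /* next slot software will write (IQT index) */
    unsigned int handled;   /* descriptors processed so far */
};

/* Stand-in for decoding one descriptor; returns 0 on success. */
static int demo_process_iqe(struct demo_iq *iq, uint32_t idx)
{
    (void)idx;
    iq->handled++;
    return 0;
}

/* Consume descriptors from head up to (not including) tail, wrapping. */
static int demo_process_iq(struct demo_iq *iq)
{
    while ( iq->head != iq->tail )
    {
        int rc = demo_process_iqe(iq, iq->head);

        if ( rc )
            return rc;      /* leave head at the failing descriptor */

        iq->head = (iq->head + 1) % DEMO_NR_ENTRIES;
    }

    return 0;
}
```

With head at 254 and tail at 2, the loop handles entries 254, 255, 0 and 1, then stops with head equal to tail.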

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.h |  19 ++-
 xen/drivers/passthrough/vtd/vvtd.c  | 232 
 2 files changed, 250 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h 
b/xen/drivers/passthrough/vtd/iommu.h
index c69cd21..c2b83f1 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -177,6 +177,21 @@
 #define DMA_IRTA_S(val) (val & 0xf)
 #define DMA_IRTA_SIZE(val)  (1UL << (DMA_IRTA_S(val) + 1))
 
+/* IQA_REG */
+#define DMA_IQA_ADDR(val)   (val & ~0xfffULL)
+#define DMA_IQA_QS(val) (val & 0x7)
+#define DMA_IQA_RSVD0xff8ULL
+
+/* IECTL_REG */
+#define DMA_IECTL_IM_SHIFT 31
+#define DMA_IECTL_IM(1 << DMA_IECTL_IM_SHIFT)
+#define DMA_IECTL_IP_SHIFT 30
+#define DMA_IECTL_IP(1 << DMA_IECTL_IP_SHIFT)
+
+/* ICS_REG */
+#define DMA_ICS_IWC_SHIFT   0
+#define DMA_ICS_IWC (1 << DMA_ICS_IWC_SHIFT)
+
 /* PMEN_REG */
 #define DMA_PMEN_EPM(((u32)1) << 31)
 #define DMA_PMEN_PRS(((u32)1) << 0)
@@ -211,7 +226,8 @@
 #define DMA_FSTS_PPF (1U << DMA_FSTS_PPF_SHIFT)
 #define DMA_FSTS_AFO (1U << 2)
 #define DMA_FSTS_APF (1U << 3)
-#define DMA_FSTS_IQE (1U << 4)
+#define DMA_FSTS_IQE_SHIFT 4
+#define DMA_FSTS_IQE (1U << DMA_FSTS_IQE_SHIFT)
 #define DMA_FSTS_ICE (1U << 5)
 #define DMA_FSTS_ITE (1U << 6)
 #define DMA_FSTS_PRO_SHIFT 7
@@ -562,6 +578,7 @@ struct qinval_entry {
 
 /* Queue invalidation head/tail shift */
 #define QINVAL_INDEX_SHIFT 4
+#define QINVAL_INDEX_MASK  0x7fff0ULL
 
 #define qinval_present(v) ((v).lo & 1)
 #define qinval_fault_disable(v) (((v).lo >> 1) & 1)
diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
b/xen/drivers/passthrough/vtd/vvtd.c
index 55f7a46..668d0c9 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -419,6 +420,177 @@ static int vvtd_record_fault(struct vvtd *vvtd,
 return X86EMUL_OKAY;
 }
 
+/*
+ * Process an invalidation descriptor. Currently, only two types of
+ * descriptors, Interrupt Entry Cache Invalidation Descriptor and
+ * Invalidation Wait Descriptor, are handled.
+ * @vvtd: the virtual vtd instance
+ * @i: the index of the invalidation descriptor to be processed
+ *
+ * Returns 0 on success, non-zero on failure.
+ */
+static int process_iqe(struct vvtd *vvtd, int i)
+{
+uint64_t iqa;
+struct qinval_entry *qinval_page;
+int ret = 0;
+
+iqa = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
+qinval_page = map_guest_page(vvtd->domain, DMA_IQA_ADDR(iqa)>>PAGE_SHIFT);
+if ( IS_ERR(qinval_page) )
+{
+gdprintk(XENLOG_ERR, "Can't map guest IRT (rc %ld)",
+ PTR_ERR(qinval_page));
+return PTR_ERR(qinval_page);
+}
+
+switch ( qinval_page[i].q.inv_wait_dsc.lo.type )
+{
+case TYPE_INVAL_WAIT:
+if ( qinval_page[i].q.inv_wait_dsc.lo.sw )
+{
+uint32_t data = qinval_page[i].q.inv_wait_dsc.lo.sdata;
+uint64_t addr = (qinval_page[i].q.inv_wait_dsc.hi.saddr << 2);
+
+ret = hvm_copy_to_guest_phys(addr, &data, sizeof(data), current);
+if ( ret )
+vvtd_info("Failed to write status address");
+}
+
+/*
+ * The following code generates an invalidation completion event
+ * indicating the invalidation wait descriptor completion. Note that
+ * the following code fragment is not tested properly.
+ */
+if ( qinval_page[i].q.inv_wait_dsc.lo.iflag )
+{
+uint32_t ie_data, ie_addr;
+if ( !vvtd_test_and_set_bit(vvtd, DMAR_ICS_REG, DMA_ICS_IWC_SHIFT) 
)
+{
+vvtd_set_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IP_SHIFT);
+if ( !vvtd_test_bit(vvtd, DMAR_IECTL_REG, DMA_IECTL_IM_SHIFT) )
+{
+ie_data = vvtd_get_reg(vvtd, DMAR_IEDATA_REG);
+ie_addr = vvtd_get_reg(vvtd, DMAR_IEADDR_REG);
+vvtd_generate_interrupt(vvtd, ie_addr, ie_data);
+

[Xen-devel] [PATCH V3 29/29] x86/vvtd: save and restore emulated VT-d

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

Provide a save-restore pair to save/restore registers and non-register
status.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
v3:
 - use one entry to save both vvtd registers and other intermediate
 state
---
 xen/drivers/passthrough/vtd/vvtd.c | 66 ++
 xen/include/public/arch-x86/hvm/save.h | 25 -
 2 files changed, 76 insertions(+), 15 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
b/xen/drivers/passthrough/vtd/vvtd.c
index 668d0c9..2aecd93 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -28,11 +28,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include "iommu.h"
 #include "vtd.h"
@@ -40,20 +42,6 @@
 /* Supported capabilities by vvtd */
 unsigned int vvtd_caps = VIOMMU_CAP_IRQ_REMAPPING;
 
-struct hvm_hw_vvtd_status {
-uint32_t eim_enabled : 1,
- intremap_enabled : 1;
-uint32_t fault_index;
-uint32_t irt_max_entry;
-/* Interrupt remapping table base gfn */
-uint64_t irt;
-};
-
-union hvm_hw_vvtd_regs {
-uint32_t data32[256];
-uint64_t data64[128];
-};
-
 struct vvtd {
 /* Address range of remapping hardware register-set */
 uint64_t base_addr;
@@ -1057,6 +1045,56 @@ static bool vvtd_is_remapping(struct domain *d,
 return 0;
 }
 
+static int vvtd_load(struct domain *d, hvm_domain_context_t *h)
+{
+struct hvm_hw_vvtd *hw_vvtd;
+
+if ( !domain_vvtd(d) )
+return -ENODEV;
+
+hw_vvtd = xmalloc(struct hvm_hw_vvtd);
+if ( !hw_vvtd )
+return -ENOMEM;
+
+if ( hvm_load_entry(VVTD, h, hw_vvtd) )
+{
+xfree(hw_vvtd);
+return -EINVAL;
+}
+
+memcpy(&domain_vvtd(d)->status, &hw_vvtd->status,
+   sizeof(struct hvm_hw_vvtd_status));
+memcpy(domain_vvtd(d)->regs, &hw_vvtd->regs,
+   sizeof(union hvm_hw_vvtd_regs));
+xfree(hw_vvtd);
+
+return 0;
+}
+
+static int vvtd_save(struct domain *d, hvm_domain_context_t *h)
+{
+struct hvm_hw_vvtd *hw_vvtd;
+int ret;
+
+if ( !domain_vvtd(d) )
+return 0;
+
+hw_vvtd = xmalloc(struct hvm_hw_vvtd);
+if ( !hw_vvtd )
+return -ENOMEM;
+
+memcpy(&hw_vvtd->status, &domain_vvtd(d)->status,
+   sizeof(struct hvm_hw_vvtd_status));
+memcpy(&hw_vvtd->regs, domain_vvtd(d)->regs,
+   sizeof(union hvm_hw_vvtd_regs));
+ret = hvm_save_entry(VVTD, 0, h, hw_vvtd);
+xfree(hw_vvtd);
+
+return ret;
+}
+
+HVM_REGISTER_SAVE_RESTORE(VVTD, vvtd_save, vvtd_load, 1, HVMSR_PER_DOM);
+
 static void vvtd_reset(struct vvtd *vvtd, uint64_t capability)
 {
 uint64_t cap = cap_set_num_fault_regs(1ULL) |
diff --git a/xen/include/public/arch-x86/hvm/save.h 
b/xen/include/public/arch-x86/hvm/save.h
index fd7bf3f..181abb2 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -639,10 +639,33 @@ struct hvm_msr {
 
 #define CPU_MSR_CODE  20
 
+union hvm_hw_vvtd_regs {
+uint32_t data32[256];
+uint64_t data64[128];
+};
+
+struct hvm_hw_vvtd_status
+{
+uint32_t eim_enabled : 1,
+ intremap_enabled : 1;
+uint32_t fault_index;
+uint32_t irt_max_entry;
+/* Interrupt remapping table base gfn */
+uint64_t irt;
+};
+
+struct hvm_hw_vvtd
+{
+union hvm_hw_vvtd_regs regs;
+struct hvm_hw_vvtd_status status;
+};
+
+DECLARE_HVM_SAVE_TYPE(VVTD, 21, struct hvm_hw_vvtd);
+
 /* 
  * Largest type-code in use
  */
-#define HVM_SAVE_CODE_MAX 20
+#define HVM_SAVE_CODE_MAX 21
 
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
 
-- 
1.8.3.1




[Xen-devel] [PATCH V3 26/29] x86/vvtd: Handle interrupt translation faults

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

Interrupt translation faults are non-recoverable faults. When a fault
is triggered, the fault info must be recorded in the Fault Recording
Registers and a vIOMMU MSI interrupt injected to notify the guest IOMMU
driver to deal with the fault.

This patch emulates the hardware's handling of interrupt translation
faults (more information about the process can be found in the VT-d spec,
chapter "Translation Faults", section "Non-Recoverable Fault
Reporting" and section "Non-Recoverable Logging").
Specifically, viommu_record_fault() records the fault information and
viommu_report_non_recoverable_fault() reports faults to software.
Currently, only Primary Fault Logging is supported and the Number of
Fault-recording Registers is 1.
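A simplified model of primary fault logging with a single fault recording register might look like the sketch below. It is an approximation written for illustration, not the code in this patch. The bit positions follow the DMA_FSTS_* and DMA_FECTL_* definitions in the diff, while the `demo_` names and the way interrupt pending is handled are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FSTS_PFO (1u << 0)    /* Primary Fault Overflow */
#define DEMO_FSTS_PPF (1u << 1)    /* Primary Pending Fault */
#define DEMO_FECTL_IP (1u << 30)   /* Interrupt Pending */
#define DEMO_FECTL_IM (1u << 31)   /* Interrupt Mask */

struct demo_vtd {
    uint32_t fsts;
    uint32_t fectl;
    bool     frcd_f;         /* F bit of the single fault recording register */
    uint8_t  frcd_reason;    /* recorded fault reason */
    unsigned int irqs;       /* fault-event interrupts injected so far */
};

/* Record one non-recoverable fault; returns true if it was logged. */
static bool demo_record_fault(struct demo_vtd *v, uint8_t reason)
{
    if ( v->fsts & DEMO_FSTS_PFO )
        return false;                /* already overflowed: drop the fault */

    if ( v->frcd_f )
    {
        v->fsts |= DEMO_FSTS_PFO;    /* register still in use: overflow */
        return false;
    }

    v->frcd_reason = reason;
    v->frcd_f = true;
    v->fsts |= DEMO_FSTS_PPF;

    if ( v->fectl & DEMO_FECTL_IM )
        v->fectl |= DEMO_FECTL_IP;   /* masked: keep the event pending */
    else
        v->irqs++;                   /* unmasked: notify the guest driver */

    return true;
}
```

With only one recording register, a second fault that arrives before software clears the F bit sets the overflow bit and is lost.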

Signed-off-by: Chao Gao <chao....@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.h |  60 +++--
 xen/drivers/passthrough/vtd/vvtd.c  | 252 +++-
 2 files changed, 301 insertions(+), 11 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h 
b/xen/drivers/passthrough/vtd/iommu.h
index 790384f..e19b045 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -198,26 +198,66 @@
 #define DMA_CCMD_CAIG_MASK(x) (((u64)x) & ((u64) 0x3 << 59))
 
 /* FECTL_REG */
-#define DMA_FECTL_IM (((u64)1) << 31)
+#define DMA_FECTL_IM_SHIFT 31
+#define DMA_FECTL_IM (1U << DMA_FECTL_IM_SHIFT)
+#define DMA_FECTL_IP_SHIFT 30
+#define DMA_FECTL_IP (1U << DMA_FECTL_IP_SHIFT)
 
 /* FSTS_REG */
-#define DMA_FSTS_PFO ((u64)1 << 0)
-#define DMA_FSTS_PPF ((u64)1 << 1)
-#define DMA_FSTS_AFO ((u64)1 << 2)
-#define DMA_FSTS_APF ((u64)1 << 3)
-#define DMA_FSTS_IQE ((u64)1 << 4)
-#define DMA_FSTS_ICE ((u64)1 << 5)
-#define DMA_FSTS_ITE ((u64)1 << 6)
-#define DMA_FSTS_FAULTSDMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | 
DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE
+#define DMA_FSTS_PFO_SHIFT 0
+#define DMA_FSTS_PFO (1U << DMA_FSTS_PFO_SHIFT)
+#define DMA_FSTS_PPF_SHIFT 1
+#define DMA_FSTS_PPF (1U << DMA_FSTS_PPF_SHIFT)
+#define DMA_FSTS_AFO (1U << 2)
+#define DMA_FSTS_APF (1U << 3)
+#define DMA_FSTS_IQE (1U << 4)
+#define DMA_FSTS_ICE (1U << 5)
+#define DMA_FSTS_ITE (1U << 6)
+#define DMA_FSTS_PRO_SHIFT 7
+#define DMA_FSTS_PRO (1U << DMA_FSTS_PRO_SHIFT)
+#define DMA_FSTS_FAULTS(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | \
+DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | \
+DMA_FSTS_ITE | DMA_FSTS_PRO)
+#define DMA_FSTS_RW1CS (DMA_FSTS_PFO | DMA_FSTS_AFO | DMA_FSTS_APF | \
+DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE | \
+DMA_FSTS_PRO)
 #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
 
 /* FRCD_REG, 32 bits access */
-#define DMA_FRCD_F (((u64)1) << 31)
+#define DMA_FRCD_LEN0x10
+#define DMA_FRCD2_OFFSET0x8
+#define DMA_FRCD3_OFFSET0xc
+#define DMA_FRCD_F_SHIFT31
+#define DMA_FRCD_F ((u64)1 << DMA_FRCD_F_SHIFT)
 #define dma_frcd_type(d) ((d >> 30) & 1)
 #define dma_frcd_fault_reason(c) (c & 0xff)
 #define dma_frcd_source_id(c) (c & 0x)
 #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
 
+struct vtd_fault_record_register
+{
+union {
+struct {
+uint64_t lo;
+uint64_t hi;
+} bits;
+struct {
+uint64_t rsvd0  :12,
+ fault_info :52;
+uint64_t source_id  :16,
+ rsvd1  :9,
+ pmr:1,  /* Privilege Mode Requested */
+ exe:1,  /* Execute Permission Requested */
+ pasid_p:1,  /* PASID Present */
+ fault_reason   :8,  /* Fault Reason */
+ pasid_val  :20, /* PASID Value */
+ addr_type  :2,  /* Address Type */
+ type   :1,  /* Type. (0) Write (1) Read/AtomicOp 
*/
+ fault  :1;  /* Fault */
+} fields;
+};
+};
+
 enum VTD_FAULT_TYPE
 {
 /* Interrupt remapping transition faults */
diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
b/xen/drivers/passthrough/vtd/vvtd.c
index bd1cadd..745941c 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -19,6 +19,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -41,6 +42,7 @@ unsigned int vvtd_caps = VIOMMU_CAP_IRQ_REMAPPING;
 struct hvm_hw_vvtd_status {
 uint32_t eim_enabled : 1,
  intremap_enabled : 1;
+uint32_t fault_index;
 uint32_t irt_max_entry;
 /* Interrupt remapping table base gfn */
 

[Xen-devel] [PATCH V3 27/29] x86/vvtd: Enable Queued Invalidation through GCMD

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

Software writes to QIE field of GCMD to enable or disable queued
invalidations. This patch emulates QIE field of GCMD.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.h |  3 ++-
 xen/drivers/passthrough/vtd/vvtd.c  | 17 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h 
b/xen/drivers/passthrough/vtd/iommu.h
index e19b045..c69cd21 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -162,7 +162,8 @@
 #define DMA_GSTS_FLS(((u64)1) << 29)
 #define DMA_GSTS_AFLS   (((u64)1) << 28)
 #define DMA_GSTS_WBFS   (((u64)1) << 27)
-#define DMA_GSTS_QIES   (((u64)1) <<26)
+#define DMA_GSTS_QIES_SHIFT 26
+#define DMA_GSTS_QIES   (((u64)1) << DMA_GSTS_QIES_SHIFT)
 #define DMA_GSTS_IRES_SHIFT 25
 #define DMA_GSTS_IRES   (((u64)1) << DMA_GSTS_IRES_SHIFT)
 #define DMA_GSTS_SIRTPS_SHIFT   24
diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
b/xen/drivers/passthrough/vtd/vvtd.c
index 745941c..55f7a46 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -496,6 +496,19 @@ static void vvtd_handle_gcmd_ire(struct vvtd *vvtd, 
uint32_t val)
 }
 }
 
+static void vvtd_handle_gcmd_qie(struct vvtd *vvtd, uint32_t val)
+{
+vvtd_info("%sable Queue Invalidation", (val & DMA_GCMD_QIE) ? "En" : 
"Dis");
+
+if ( val & DMA_GCMD_QIE )
+vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_QIES_SHIFT);
+else
+{
+vvtd_set_reg_quad(vvtd, DMAR_IQH_REG, 0);
+vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_QIES_SHIFT);
+}
+}
+
 static void vvtd_handle_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
 {
 uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
@@ -535,6 +548,10 @@ static int vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
 vvtd_handle_gcmd_sirtp(vvtd, val);
 if ( changed & DMA_GCMD_IRE )
 vvtd_handle_gcmd_ire(vvtd, val);
+if ( changed & DMA_GCMD_QIE )
+vvtd_handle_gcmd_qie(vvtd, val);
+if ( changed & ~(DMA_GCMD_SIRTP | DMA_GCMD_IRE | DMA_GCMD_QIE) )
+vvtd_info("Only SIRTP, IRE, QIE in GCMD are handled");
 
 return X86EMUL_OKAY;
 }
-- 
1.8.3.1




[Xen-devel] [PATCH V3 22/29] x86/vioapic: extend vioapic_get_vector() to support remapping format RTE

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

When an IOAPIC RTE is in remapping format, it doesn't contain the vector of
the interrupt. In this case, the RTE contains an index into the interrupt
remapping table, where the vector of the interrupt is stored. This patch gets
the vector through a vIOMMU interface.
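To make the two RTE formats concrete, here is a small sketch of the lookup. The bit positions are taken from my reading of the VT-d spec's remappable IOAPIC RTE layout (bit 48 as the format bit, bits 63:49 and bit 11 as the interrupt index). The `demo_` helpers and the four-entry toy table are invented for this example and stand in for the real vIOMMU interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 48 of the RTE selects the remappable format. */
static bool demo_rte_is_remapped(uint64_t rte)
{
    return (rte >> 48) & 1;
}

/* Interrupt index: bits 63:49 hold index[14:0], bit 11 holds index[15]. */
static uint32_t demo_rte_irt_index(uint64_t rte)
{
    return (uint32_t)((rte >> 49) & 0x7fff) |
           (uint32_t)(((rte >> 11) & 1) << 15);
}

/* Toy interrupt remapping table mapping an index to a vector. */
static const uint8_t demo_irt_vector[4] = { 0x30, 0x31, 0x32, 0x33 };

/* Return the vector for an RTE, or -1 if the index is out of range. */
static int demo_get_vector(uint64_t rte)
{
    uint32_t idx;

    if ( !demo_rte_is_remapped(rte) )
        return (int)(rte & 0xff);    /* legacy format: vector in bits 7:0 */

    idx = demo_rte_irt_index(rte);

    return idx < 4 ? demo_irt_vector[idx] : -1;
}
```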

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/arch/x86/hvm/vioapic.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 5d0d1cd..9e47ef4 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -561,11 +561,25 @@ int vioapic_get_vector(const struct domain *d, unsigned 
int gsi)
 {
 unsigned int pin;
 const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);
+struct arch_irq_remapping_request request;
 
 if ( !vioapic )
 return -EINVAL;
 
-return vioapic->redirtbl[pin].fields.vector;
+irq_request_ioapic_fill(&request, vioapic->id, vioapic->redirtbl[pin].bits);
+if ( viommu_check_irq_remapping(vioapic->domain, &request) )
+{
+int err;
+struct arch_irq_remapping_info info;
+
+err = viommu_get_irq_info(vioapic->domain, &request, &info);
+return !err ? info.vector : err;
+}
+else
+{
+return vioapic->redirtbl[pin].fields.vector;
+}
+
 }
 
 int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
-- 
1.8.3.1




[Xen-devel] [PATCH V3 15/29] x86/vvtd: Process interrupt remapping request

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

When a remapping interrupt request arrives, the remapping hardware computes
the interrupt_index per the algorithm described in the VT-d spec section
"Interrupt Remapping Table", interprets the IRTE and generates a remapped
interrupt request.

This patch introduces viommu_handle_irq_request() to emulate how the
remapping hardware handles a remapping interrupt request.
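For the MSI case, the interrupt_index computation can be sketched as below. The field layout is my reading of the VT-d spec's remappable MSI format (address bits 19:5 and bit 2 form the handle, bit 3 is SHV, bit 4 is the format bit, and data bits 15:0 are the subhandle). The `demo_` names are hypothetical and this is not the vvtd code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address bit 4: interrupt format (1 = remappable). */
static bool demo_msi_is_remapped(uint32_t addr)
{
    return (addr >> 4) & 1;
}

/* Address bit 3: SubHandle Valid (SHV). */
static bool demo_msi_shv(uint32_t addr)
{
    return (addr >> 3) & 1;
}

/* handle = address bits 19:5 (index[14:0]) plus bit 2 (index[15]). */
static uint32_t demo_msi_handle(uint32_t addr)
{
    return ((addr >> 5) & 0x7fff) | (((addr >> 2) & 1) << 15);
}

/* interrupt_index = handle, plus data[15:0] as subhandle when SHV is set. */
static uint32_t demo_msi_irt_index(uint32_t addr, uint32_t data)
{
    uint32_t index = demo_msi_handle(addr);

    if ( demo_msi_shv(addr) )
        index += data & 0xffff;

    return index;
}
```

The resulting index is then checked against the table size and used to fetch the IRTE.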

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>

---
v3:
 - Encode map_guest_page()'s error into void* to avoid using another parameter
---
 xen/drivers/passthrough/vtd/iommu.h |  21 +++
 xen/drivers/passthrough/vtd/vvtd.c  | 264 +++-
 2 files changed, 284 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h 
b/xen/drivers/passthrough/vtd/iommu.h
index 703726f..790384f 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -218,6 +218,21 @@
 #define dma_frcd_source_id(c) (c & 0x)
 #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
 
+enum VTD_FAULT_TYPE
+{
+/* Interrupt remapping transition faults */
+VTD_FR_IR_REQ_RSVD  = 0x20, /* One or more IR request reserved
+ * fields set */
+VTD_FR_IR_INDEX_OVER= 0x21, /* Index value greater than max */
+VTD_FR_IR_ENTRY_P   = 0x22, /* Present (P) not set in IRTE */
+VTD_FR_IR_ROOT_INVAL= 0x23, /* IR Root table invalid */
+VTD_FR_IR_IRTE_RSVD = 0x24, /* IRTE Rsvd field non-zero with
+ * Present flag set */
+VTD_FR_IR_REQ_COMPAT= 0x25, /* Encountered compatible IR
+ * request while disabled */
+VTD_FR_IR_SID_ERR   = 0x26, /* Invalid Source-ID */
+};
+
 /*
  * 0: Present
  * 1-11: Reserved
@@ -358,6 +373,12 @@ struct iremap_entry {
 };
 
 /*
+ * When VT-d doesn't enable Extended Interrupt Mode, hardware interprets
+ * only 8 bits ([15:8]) of the Destination-ID field in the IRTEs.
+ */
+#define IRTE_xAPIC_DEST_MASK 0xff00
+
+/*
  * Posted-interrupt descriptor address is 64 bits with 64-byte aligned, only
  * the upper 26 bits of lest significiant 32 bits is available.
  */
diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
b/xen/drivers/passthrough/vtd/vvtd.c
index a0f63e9..90c00f5 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -23,11 +23,17 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
+#include 
 #include 
+#include 
+#include 
 
 #include "iommu.h"
+#include "vtd.h"
 
 /* Supported capabilities by vvtd */
 unsigned int vvtd_caps = VIOMMU_CAP_IRQ_REMAPPING;
@@ -111,6 +117,132 @@ static inline uint64_t vvtd_get_reg_quad(struct vvtd 
*vtd, uint32_t reg)
 return vtd->regs->data64[reg/sizeof(uint64_t)];
 }
 
+static void* map_guest_page(struct domain *d, uint64_t gfn)
+{
+struct page_info *p;
+void *ret;
+
+p = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+if ( !p )
+return ERR_PTR(-EINVAL);
+
+if ( !get_page_type(p, PGT_writable_page) )
+{
+put_page(p);
+return ERR_PTR(-EINVAL);
+}
+
+ret = __map_domain_page_global(p);
+if ( !ret )
+{
+put_page_and_type(p);
+return ERR_PTR(-ENOMEM);
+}
+
+return ret;
+}
+
+static void unmap_guest_page(void *virt)
+{
+struct page_info *page;
+
+ASSERT((unsigned long)virt & PAGE_MASK);
+page = mfn_to_page(domain_page_map_to_mfn(virt));
+
+unmap_domain_page_global(virt);
+put_page_and_type(page);
+}
+
+static void vvtd_inj_irq(struct vlapic *target, uint8_t vector,
+ uint8_t trig_mode, uint8_t delivery_mode)
+{
+vvtd_debug("dest=v%d, delivery_mode=%x vector=%d trig_mode=%d\n",
+   vlapic_vcpu(target)->vcpu_id, delivery_mode, vector, trig_mode);
+
+ASSERT((delivery_mode == dest_Fixed) ||
+   (delivery_mode == dest_LowestPrio));
+
+vlapic_set_irq(target, vector, trig_mode);
+}
+
+static int vvtd_delivery(struct domain *d, uint8_t vector,
+ uint32_t dest, uint8_t dest_mode,
+ uint8_t delivery_mode, uint8_t trig_mode)
+{
+struct vlapic *target;
+struct vcpu *v;
+
+switch ( delivery_mode )
+{
+case dest_LowestPrio:
+target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
+if ( target != NULL )
+{
+vvtd_inj_irq(target, vector, trig_mode, delivery_mode);
+break;
+}
+vvtd_debug("null round robin: vector=%02x\n", vector);
+break;
+
+case dest_Fixed:
+for_each_vcpu ( d, v )
+if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
+vvtd_inj_irq(vcpu_vlapic(v), vector, trig_mode, delivery_mode);
+b

[Xen-devel] [PATCH V3 19/29] x86/vioapic: Hook interrupt delivery of vIOAPIC

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

When irq remapping is enabled, an IOAPIC Redirection Entry may be in
remapping format. If so, generate an irq_remapping_request and call the
common VIOMMU abstraction's callback to handle this interrupt request. The
device model is responsible for checking the request's validity.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>

---
v3:
 - use the new interface to check remapping format.
---
 xen/arch/x86/hvm/vioapic.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 72cae93..5d0d1cd 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -38,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* HACK: Route IRQ0 only to VCPU0 to prevent time jumps. */
 #define IRQ0_SPECIAL_ROUTING 1
@@ -387,9 +389,17 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, 
unsigned int pin)
 struct vlapic *target;
 struct vcpu *v;
 unsigned int irq = vioapic->base_gsi + pin;
+struct arch_irq_remapping_request request;
 
 ASSERT(spin_is_locked(>arch.hvm_domain.irq_lock));
 
+irq_request_ioapic_fill(, vioapic->id, 
vioapic->redirtbl[pin].bits);
+if ( viommu_check_irq_remapping(d, ) )
+{
+viommu_handle_irq_request(d, );
+return;
+}
+
 HVM_DBG_LOG(DBG_LEVEL_IOAPIC,
 "dest=%x dest_mode=%x delivery_mode=%x "
 "vector=%x trig_mode=%x",
-- 
1.8.3.1




[Xen-devel] [PATCH V3 25/29] x86/vmsi: Hook delivering remapping format msi to guest

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

The hypervisor delivers an MSI to an HVM guest in two situations. One is
when qemu sends a request to the hypervisor through XEN_DMOP_inject_msi.
The other is when a physical interrupt arrives that has been bound to a
guest MSI.

For the former, the MSI is routed to the common vIOMMU layer if it is in
remapping format. For the latter, if the pt irq is bound to a guest
remapping MSI, a new remapping MSI is constructed from the binding
information and routed to the common vIOMMU layer.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/arch/x86/hvm/irq.c   |  7 +++
 xen/arch/x86/hvm/vmsi.c  | 14 +-
 xen/drivers/passthrough/io.c | 21 ++---
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index e425df9..e99ba7d 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -23,9 +23,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 /* Must be called with hvm_domain->irq_lock hold */
 static void assert_gsi(struct domain *d, unsigned ioapic_gsi)
@@ -339,6 +341,11 @@ int hvm_inject_msi(struct domain *d, uint64_t addr, 
uint32_t data)
 uint8_t trig_mode = (data & MSI_DATA_TRIGGER_MASK)
 >> MSI_DATA_TRIGGER_SHIFT;
 uint8_t vector = data & MSI_DATA_VECTOR_MASK;
+struct arch_irq_remapping_request request;
+
+irq_request_msi_fill(, 0, addr, data);
+if ( viommu_check_irq_remapping(d, ) )
+return viommu_handle_irq_request(d, );
 
 if ( !vector )
 {
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 7f21853..1244df1 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -39,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static void vmsi_inj_irq(
 struct vlapic *target,
@@ -115,7 +117,17 @@ void vmsi_deliver_pirq(struct domain *d, const struct 
hvm_pirq_dpci *pirq_dpci)
 
 ASSERT(pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI);
 
-vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
+if ( pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_REMAPPED )
+{
+struct arch_irq_remapping_request request;
+
+irq_request_msi_fill(, pirq_dpci->gmsi.intremap.source_id,
+ pirq_dpci->gmsi.intremap.addr,
+ pirq_dpci->gmsi.intremap.data);
+viommu_handle_irq_request(d, );
+}
+else
+vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
 }
 
 /* Return value, -1 : multi-dests, non-negative value: dest_vcpu_id */
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 6196334..349a8cf 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -942,21 +942,20 @@ static void __msi_pirq_eoi(struct hvm_pirq_dpci 
*pirq_dpci)
 static int _hvm_dpci_msi_eoi(struct domain *d,
  struct hvm_pirq_dpci *pirq_dpci, void *arg)
 {
-int vector = (long)arg;
+uint8_t vector, dlm, vector_target = (long)arg;
+uint32_t dest;
+bool dm;
 
-if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
- (pirq_dpci->gmsi.legacy.gvec == vector) )
+if ( pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI )
 {
-unsigned int dest = MASK_EXTR(pirq_dpci->gmsi.legacy.gflags,
-  XEN_DOMCTL_VMSI_X86_DEST_ID_MASK);
-bool dest_mode = pirq_dpci->gmsi.legacy.gflags &
- XEN_DOMCTL_VMSI_X86_DM_MASK;
+if ( pirq_dpci_2_msi_attr(d, pirq_dpci, , , , ) )
+return 0;
 
-if ( vlapic_match_dest(vcpu_vlapic(current), NULL, 0, dest,
-   dest_mode) )
+if ( vector == vector_target &&
+ vlapic_match_dest(vcpu_vlapic(current), NULL, 0, dest, dm) )
 {
-__msi_pirq_eoi(pirq_dpci);
-return 1;
+__msi_pirq_eoi(pirq_dpci);
+return 1;
 }
 }
 
-- 
1.8.3.1




[Xen-devel] [PATCH V3 16/29] x86/vvtd: decode interrupt attribute from IRTE

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

Without interrupt remapping, interrupt attributes can be extracted from
the MSI message or the IOAPIC RTE. With interrupt remapping enabled,
however, the attributes are held in the associated IRTE. This callback is
for cases in which the caller needs to acquire interrupt attributes, for
example:
1. vioapic_get_vector(). With vIOMMU, the RTE may not contain the vector.
2. Performing EOI, which is always based on the interrupt vector.
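
As a rough illustration of what decoding attributes from an IRTE involves,
the sketch below extracts the same four attributes from the low 64 bits of
a remappable-format IRTE. The bit positions are an assumption based on my
reading of the VT-d specification (DM at bit 2, DLM at bits 7:5, vector at
bits 23:16, destination at bits 63:32, with the xAPIC destination ID in
bits 15:8 of that field). The struct and function names are hypothetical
and are not part of this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct arch_irq_remapping_info. */
struct irq_attr_sketch {
    uint8_t vector;
    uint32_t dest;
    uint8_t dest_mode;
    uint8_t delivery_mode;
};

/*
 * Decode the low 64 bits of a remappable-format IRTE.
 * Assumed layout: DM bit 2, DLM bits 7:5, vector bits 23:16,
 * destination bits 63:32.
 */
static struct irq_attr_sketch decode_irte_lo(uint64_t lo, bool eim)
{
    struct irq_attr_sketch a;
    uint32_t dst = (uint32_t)(lo >> 32);

    a.dest_mode = (lo >> 2) & 0x1;
    a.delivery_mode = (lo >> 5) & 0x7;
    a.vector = (lo >> 16) & 0xff;
    /* With EIM the full 32-bit ID is used; otherwise only bits 15:8. */
    a.dest = eim ? dst : ((dst >> 8) & 0xff);

    return a;
}
```

The eim flag plays the role that irte_dest() plays in the patch: the same
destination field is interpreted differently depending on whether extended
interrupt mode is enabled.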

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
v3:
 - add example cases in which we will use this function.
---
 xen/drivers/passthrough/vtd/vvtd.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
b/xen/drivers/passthrough/vtd/vvtd.c
index 90c00f5..5e22ace 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -516,6 +516,26 @@ static int vvtd_handle_irq_request(struct domain *d,
  irte.remap.tm);
 }
 
+static int vvtd_get_irq_info(struct domain *d,
+ struct arch_irq_remapping_request *irq,
+ struct arch_irq_remapping_info *info)
+{
+int ret;
+struct iremap_entry irte;
+struct vvtd *vvtd = domain_vvtd(d);
+
+ret = vvtd_get_entry(vvtd, irq, , false);
+if ( ret )
+return ret;
+
+info->vector = irte.remap.vector;
+info->dest = irte_dest(vvtd, irte.remap.dst);
+info->dest_mode = irte.remap.dm;
+info->delivery_mode = irte.remap.dlm;
+
+return 0;
+}
+
 static void vvtd_reset(struct vvtd *vvtd, uint64_t capability)
 {
 uint64_t cap = cap_set_num_fault_regs(1ULL) |
@@ -586,7 +606,8 @@ static int vvtd_destroy(struct viommu *viommu)
 struct viommu_ops vvtd_hvm_vmx_ops = {
 .create = vvtd_create,
 .destroy = vvtd_destroy,
-.handle_irq_request = vvtd_handle_irq_request
+.handle_irq_request = vvtd_handle_irq_request,
+.get_irq_info = vvtd_get_irq_info
 };
 
 static int vvtd_register(void)
-- 
1.8.3.1




[Xen-devel] [PATCH V3 21/29] VIOMMU: Introduce callback of checking irq remapping mode

2017-09-22 Thread Lan Tianyu
This patch adds a callback that lets vIOAPIC and vMSI check whether
interrupt remapping is enabled.

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/common/viommu.c  | 15 +++
 xen/include/xen/viommu.h | 10 ++
 2 files changed, 25 insertions(+)

diff --git a/xen/common/viommu.c b/xen/common/viommu.c
index 0708e43..ff95465 100644
--- a/xen/common/viommu.c
+++ b/xen/common/viommu.c
@@ -194,6 +194,21 @@ int viommu_get_irq_info(struct domain *d,
 return viommu->ops->get_irq_info(d, request, irq_info);
 }
 
+bool viommu_check_irq_remapping(struct domain *d,
+struct arch_irq_remapping_request *request)
+{
+struct viommu *viommu = d->viommu;
+
+if ( !viommu )
+return false;
+
+ASSERT(viommu->ops);
+if ( !viommu->ops->check_irq_remapping )
+return false;
+
+return viommu->ops->check_irq_remapping(d, request);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
index beb40cd..b5ac1e6 100644
--- a/xen/include/xen/viommu.h
+++ b/xen/include/xen/viommu.h
@@ -26,6 +26,8 @@ struct arch_irq_remapping_request;
 
 struct viommu_ops {
 int (*create)(struct domain *d, struct viommu *viommu);
+bool (*check_irq_remapping)(struct domain *d,
+struct arch_irq_remapping_request *request);
 int (*destroy)(struct viommu *viommu);
 int (*handle_irq_request)(struct domain *d,
   struct arch_irq_remapping_request *request);
@@ -57,6 +59,8 @@ int viommu_handle_irq_request(struct domain *d,
 int viommu_get_irq_info(struct domain *d,
 struct arch_irq_remapping_request *request,
 struct arch_irq_remapping_info *irq_info);
+bool viommu_check_irq_remapping(struct domain *d,
+struct arch_irq_remapping_request *request);
 #else
 static inline int viommu_register_type(uint64_t type, struct viommu_ops *ops)
 {
@@ -75,6 +79,12 @@ viommu_get_irq_info(struct domain *d,
 {
 return -EINVAL;
 }
+static inline bool
+viommu_check_irq_remapping(struct domain *d,
+   struct arch_irq_remapping_request *request)
+{
+return false;
+}
 #endif
 
 #endif /* __XEN_VIOMMU_H__ */
-- 
1.8.3.1




[Xen-devel] [PATCH V3 23/29] passthrough: move some fields of hvm_gmsi_info to a sub-structure

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

No functional change. This prepares for introducing new fields in
hvm_gmsi_info to manage a remapping format MSI bound to a physical MSI.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/arch/x86/hvm/vmsi.c   |  4 ++--
 xen/drivers/passthrough/io.c  | 34 ++
 xen/include/asm-x86/hvm/irq.h |  8 ++--
 3 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 9b35e9b..7f21853 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -101,8 +101,8 @@ int vmsi_deliver(
 
 void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 {
-uint32_t flags = pirq_dpci->gmsi.gflags;
-int vector = pirq_dpci->gmsi.gvec;
+uint32_t flags = pirq_dpci->gmsi.legacy.gflags;
+int vector = pirq_dpci->gmsi.legacy.gvec;
 uint8_t dest = (uint8_t)flags;
 bool dest_mode = flags & XEN_DOMCTL_VMSI_X86_DM_MASK;
 uint8_t delivery_mode = MASK_EXTR(flags, XEN_DOMCTL_VMSI_X86_DELIV_MASK);
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index ec9f41a..fb44223 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -350,8 +350,8 @@ int pt_irq_create_bind(
 {
 pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED | HVM_IRQ_DPCI_MACH_MSI |
HVM_IRQ_DPCI_GUEST_MSI;
-pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec;
-pirq_dpci->gmsi.gflags = gflags;
+pirq_dpci->gmsi.legacy.gvec = pt_irq_bind->u.msi.gvec;
+pirq_dpci->gmsi.legacy.gflags = gflags;
 /*
  * 'pt_irq_create_bind' can be called after 'pt_irq_destroy_bind'.
  * The 'pirq_cleanup_check' which would free the structure is only
@@ -383,8 +383,8 @@ int pt_irq_create_bind(
 }
 if ( unlikely(rc) )
 {
-pirq_dpci->gmsi.gflags = 0;
-pirq_dpci->gmsi.gvec = 0;
+pirq_dpci->gmsi.legacy.gflags = 0;
+pirq_dpci->gmsi.legacy.gvec = 0;
 pirq_dpci->dom = NULL;
 pirq_dpci->flags = 0;
 pirq_cleanup_check(info, d);
@@ -403,21 +403,22 @@ int pt_irq_create_bind(
 }
 
 /* If pirq is already mapped as vmsi, update guest data/addr. */
-if ( pirq_dpci->gmsi.gvec != pt_irq_bind->u.msi.gvec ||
- pirq_dpci->gmsi.gflags != gflags )
+if ( pirq_dpci->gmsi.legacy.gvec != pt_irq_bind->u.msi.gvec ||
+ pirq_dpci->gmsi.legacy.gflags != gflags )
 {
 /* Directly clear pending EOIs before enabling new MSI info. */
 pirq_guest_eoi(info);
 
-pirq_dpci->gmsi.gvec = pt_irq_bind->u.msi.gvec;
-pirq_dpci->gmsi.gflags = gflags;
+}
+pirq_dpci->gmsi.legacy.gvec = pt_irq_bind->u.msi.gvec;
+pirq_dpci->gmsi.legacy.gflags = gflags;
 }
 }
 /* Calculate dest_vcpu_id for MSI-type pirq migration. */
-dest = MASK_EXTR(pirq_dpci->gmsi.gflags,
+dest = MASK_EXTR(pirq_dpci->gmsi.legacy.gflags,
  XEN_DOMCTL_VMSI_X86_DEST_ID_MASK);
-dest_mode = pirq_dpci->gmsi.gflags & XEN_DOMCTL_VMSI_X86_DM_MASK;
-delivery_mode = MASK_EXTR(pirq_dpci->gmsi.gflags,
+dest_mode = pirq_dpci->gmsi.legacy.gflags & 
XEN_DOMCTL_VMSI_X86_DM_MASK;
+delivery_mode = MASK_EXTR(pirq_dpci->gmsi.legacy.gflags,
   XEN_DOMCTL_VMSI_X86_DELIV_MASK);
 
 dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
@@ -430,7 +431,7 @@ int pt_irq_create_bind(
 {
 if ( delivery_mode == dest_LowestPrio )
 vcpu = vector_hashing_dest(d, dest, dest_mode,
-   pirq_dpci->gmsi.gvec);
+   pirq_dpci->gmsi.legacy.gvec);
 if ( vcpu )
 pirq_dpci->gmsi.posted = true;
 }
@@ -440,7 +441,7 @@ int pt_irq_create_bind(
 /* Use interrupt posting if it is supported. */
 if ( iommu_intpost )
 pi_update_irte(vcpu ? >arch.hvm_vmx.pi_desc : NULL,
-   info, pirq_dpci->gmsi.gvec);
+   info, pirq_dpci->gmsi.legacy.gvec);
 
 if ( pt_irq_bind->u.msi.gflags & XEN_DOMCTL_VMSI_X86_UNMASKED )
 {
@@ -835,11 +836,12 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
 int vector = (long)arg;
 
 if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
- (pirq_dpci->gmsi.gvec == vector) )
+ (p

[Xen-devel] [PATCH V3 13/29] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

Software sets this field to set/update the interrupt remapping table pointer
used by hardware. The interrupt remapping table pointer is specified through
the Interrupt Remapping Table Address (IRTA_REG) register.

This patch emulates this operation and adds some new fields in VVTD to track
info (e.g. the table's gfn and max supported entries) of interrupt remapping
table.
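
To make the IRTA decoding concrete, here is a minimal stand-alone sketch of
the fields the patch extracts when SIRTP is written. The address and size
handling mirrors the DMA_IRTA_ADDR and DMA_IRTA_SIZE macros from this patch.
The EIME bit position (bit 11) is an assumption taken from the VT-d
specification, and struct irt_info and decode_irta() are hypothetical names
used only for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the values tracked in hvm_hw_vvtd_status. */
struct irt_info {
    uint64_t base_gfn;     /* table base as a guest frame number */
    uint32_t max_entries;  /* 2^(S+1), where S is the IRTA size field */
    int eim;               /* extended interrupt mode requested */
};

static struct irt_info decode_irta(uint64_t irta)
{
    struct irt_info info;

    /* The table is 4KB aligned, so the low 12 bits are not address bits. */
    info.base_gfn = (irta & ~0xfffULL) >> 12;
    /* S is the low 4 bits; the table holds 2^(S+1) entries. */
    info.max_entries = UINT32_C(1) << ((irta & 0xf) + 1);
    /* Assumed EIME position: bit 11. */
    info.eim = !!(irta & (UINT64_C(1) << 11));

    return info;
}
```

For example, an IRTA value of 0x12345807 would give a base gfn of 0x12345,
256 entries and EIM enabled.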

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>

---
v3:
 - ignore unaligned r/w of vt-d hardware registers and return X86EMUL_OK
---
 xen/drivers/passthrough/vtd/iommu.h | 12 ++-
 xen/drivers/passthrough/vtd/vvtd.c  | 69 +
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h 
b/xen/drivers/passthrough/vtd/iommu.h
index ef038c9..a0d5ec8 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -153,6 +153,8 @@
 #define DMA_GCMD_IRE    (((u64)1) << 25)
 #define DMA_GCMD_SIRTP  (((u64)1) << 24)
 #define DMA_GCMD_CFI    (((u64)1) << 23)
+/* mask of one-shot bits */
+#define DMA_GCMD_ONE_SHOT_MASK 0x96ff 
 
 /* GSTS_REG */
 #define DMA_GSTS_TES    (((u64)1) << 31)
@@ -162,9 +164,17 @@
 #define DMA_GSTS_WBFS   (((u64)1) << 27)
 #define DMA_GSTS_QIES   (((u64)1) <<26)
 #define DMA_GSTS_IRES   (((u64)1) <<25)
-#define DMA_GSTS_SIRTPS (((u64)1) << 24)
+#define DMA_GSTS_SIRTPS_SHIFT   24
+#define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
 #define DMA_GSTS_CFIS   (((u64)1) <<23)
 
+/* IRTA_REG */
+/* The base of 4KB aligned interrupt remapping table */
+#define DMA_IRTA_ADDR(val)  ((val) & ~0xfffULL)
+/* The size of remapping table is 2^(x+1), where x is the size field in IRTA */
+#define DMA_IRTA_S(val) (val & 0xf)
+#define DMA_IRTA_SIZE(val)  (1UL << (DMA_IRTA_S(val) + 1))
+
 /* PMEN_REG */
 #define DMA_PMEN_EPM    (((u32)1) << 31)
 #define DMA_PMEN_PRS    (((u32)1) << 0)
diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
b/xen/drivers/passthrough/vtd/vvtd.c
index a3002c3..6736956 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -32,6 +32,13 @@
 /* Supported capabilities by vvtd */
 unsigned int vvtd_caps = VIOMMU_CAP_IRQ_REMAPPING;
 
+struct hvm_hw_vvtd_status {
+uint32_t eim_enabled : 1;
+uint32_t irt_max_entry;
+/* Interrupt remapping table base gfn */
+uint64_t irt;
+};
+
 union hvm_hw_vvtd_regs {
 uint32_t data32[256];
 uint64_t data64[128];
@@ -43,6 +50,8 @@ struct vvtd {
 uint64_t length;
 /* Point back to the owner domain */
 struct domain *domain;
+
+struct hvm_hw_vvtd_status status;
 union hvm_hw_vvtd_regs *regs;
 struct page_info *regs_page;
 };
@@ -70,6 +79,11 @@ struct vvtd *domain_vvtd(struct domain *d)
 return (d->viommu) ? d->viommu->priv : NULL;
 }
 
+static inline void vvtd_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
+{
+__set_bit(nr, >regs->data32[reg/sizeof(uint32_t)]);
+}
+
 static inline void vvtd_set_reg(struct vvtd *vtd, uint32_t reg, uint32_t value)
 {
 vtd->regs->data32[reg/sizeof(uint32_t)] = value;
@@ -91,6 +105,44 @@ static inline uint64_t vvtd_get_reg_quad(struct vvtd *vtd, 
uint32_t reg)
 return vtd->regs->data64[reg/sizeof(uint64_t)];
 }
 
+static void vvtd_handle_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
+{
+uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
+
+if ( !(val & DMA_GCMD_SIRTP) )
+return;
+
+vvtd->status.irt = DMA_IRTA_ADDR(irta) >> PAGE_SHIFT;
+vvtd->status.irt_max_entry = DMA_IRTA_SIZE(irta);
+vvtd->status.eim_enabled = !!(irta & IRTA_EIME);
+vvtd_info("Update IR info (addr=%lx eim=%d size=%d).",
+  vvtd->status.irt, vvtd->status.eim_enabled,
+  vvtd->status.irt_max_entry);
+vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
+}
+
+static int vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
+{
+uint32_t orig = vvtd_get_reg(vvtd, DMAR_GSTS_REG);
+uint32_t changed;
+
+orig = orig & DMA_GCMD_ONE_SHOT_MASK;   /* reset the one-shot bits */
+changed = orig ^ val;
+
+if ( !changed )
+return X86EMUL_OKAY;
+
+if ( changed & (changed - 1) )
+vvtd_info("Guest attempts to write %x to GCMD (current GSTS is %x)," 
+  "it would lead to update multiple fields",
+  val, orig);
+
+if ( changed & DMA_GCMD_SIRTP )
+vvtd_handle_gcmd_sirtp(vvtd, val);
+
+return X86EMUL_OKAY;
+}
+
 static int vvtd_in_range(struct vcpu *v, unsigned long addr)
 {
 struct vvtd *vvtd = domain_vvtd(v->domain);
@@ -135,12 +187,17 @@ static int vvtd_write(struct vcpu *v, unsigned long addr,
 {
 switch ( offset )
 

[Xen-devel] [PATCH V3 14/29] x86/vvtd: Enable Interrupt Remapping through GCMD

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

Software writes this field to enable/disable interrupt remapping. This patch
emulates the IRE field of GCMD and reflects it in the IRES bit of GSTS.
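
The enable/disable handling can be sketched in isolation as below. The bit
position (25) matches DMA_GCMD_IRE and DMA_GSTS_IRES_SHIFT in this patch.
The struct vvtd_sketch type and the plain gsts field are hypothetical
simplifications of the real register page.

```c
#include <stdbool.h>
#include <stdint.h>

#define GCMD_IRE_SHIFT 25
#define GCMD_IRE       (UINT32_C(1) << GCMD_IRE_SHIFT)

/* Hypothetical, simplified model of the vvtd state touched by IRE. */
struct vvtd_sketch {
    uint32_t gsts;           /* emulated Global Status register */
    bool intremap_enabled;   /* cached enable state */
};

/* Mirror the IRE command bit into the cached state and the IRES bit. */
static void handle_gcmd_ire(struct vvtd_sketch *s, uint32_t val)
{
    if (val & GCMD_IRE) {
        s->intremap_enabled = true;
        s->gsts |= GCMD_IRE;
    } else {
        s->intremap_enabled = false;
        s->gsts &= ~GCMD_IRE;
    }
}
```

Keeping both the cached flag and the status bit in step means later code
can test intremap_enabled without re-reading the emulated register.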

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.h |  3 ++-
 xen/drivers/passthrough/vtd/vvtd.c  | 30 +-
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h 
b/xen/drivers/passthrough/vtd/iommu.h
index a0d5ec8..703726f 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -163,7 +163,8 @@
 #define DMA_GSTS_AFLS   (((u64)1) << 28)
 #define DMA_GSTS_WBFS   (((u64)1) << 27)
 #define DMA_GSTS_QIES   (((u64)1) <<26)
-#define DMA_GSTS_IRES   (((u64)1) <<25)
+#define DMA_GSTS_IRES_SHIFT 25
+#define DMA_GSTS_IRES   (((u64)1) << DMA_GSTS_IRES_SHIFT)
 #define DMA_GSTS_SIRTPS_SHIFT   24
 #define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
 #define DMA_GSTS_CFIS   (((u64)1) <<23)
diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
b/xen/drivers/passthrough/vtd/vvtd.c
index 6736956..a0f63e9 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -33,7 +33,8 @@
 unsigned int vvtd_caps = VIOMMU_CAP_IRQ_REMAPPING;
 
 struct hvm_hw_vvtd_status {
-uint32_t eim_enabled : 1;
+uint32_t eim_enabled : 1,
+ intremap_enabled : 1;
 uint32_t irt_max_entry;
 /* Interrupt remapping table base gfn */
 uint64_t irt;
@@ -84,6 +85,11 @@ static inline void vvtd_set_bit(struct vvtd *vvtd, uint32_t 
reg, int nr)
 __set_bit(nr, >regs->data32[reg/sizeof(uint32_t)]);
 }
 
+static inline void vvtd_clear_bit(struct vvtd *vvtd, uint32_t reg, int nr)
+{
+__clear_bit(nr, >regs->data32[reg/sizeof(uint32_t)]);
+}
+
 static inline void vvtd_set_reg(struct vvtd *vtd, uint32_t reg, uint32_t value)
 {
 vtd->regs->data32[reg/sizeof(uint32_t)] = value;
@@ -105,6 +111,23 @@ static inline uint64_t vvtd_get_reg_quad(struct vvtd *vtd, 
uint32_t reg)
 return vtd->regs->data64[reg/sizeof(uint64_t)];
 }
 
+static void vvtd_handle_gcmd_ire(struct vvtd *vvtd, uint32_t val)
+{
+vvtd_info("%sable Interrupt Remapping",
+  (val & DMA_GCMD_IRE) ? "En" : "Dis");
+
+if ( val & DMA_GCMD_IRE )
+{
+vvtd->status.intremap_enabled = true;
+vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
+}
+else
+{
+vvtd->status.intremap_enabled = false;
+vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
+}
+}
+
 static void vvtd_handle_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
 {
 uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
@@ -112,6 +135,9 @@ static void vvtd_handle_gcmd_sirtp(struct vvtd *vvtd, 
uint32_t val)
 if ( !(val & DMA_GCMD_SIRTP) )
 return;
 
+if ( vvtd->status.intremap_enabled )
+vvtd_info("Update Interrupt Remapping Table when active\n");
+
 vvtd->status.irt = DMA_IRTA_ADDR(irta) >> PAGE_SHIFT;
 vvtd->status.irt_max_entry = DMA_IRTA_SIZE(irta);
 vvtd->status.eim_enabled = !!(irta & IRTA_EIME);
@@ -139,6 +165,8 @@ static int vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
 
 if ( changed & DMA_GCMD_SIRTP )
 vvtd_handle_gcmd_sirtp(vvtd, val);
+if ( changed & DMA_GCMD_IRE )
+vvtd_handle_gcmd_ire(vvtd, val);
 
 return X86EMUL_OKAY;
 }
-- 
1.8.3.1




[Xen-devel] [PATCH V3 18/29] VIOMMU: Add irq request callback to deal with irq remapping

2017-09-22 Thread Lan Tianyu
This patch adds an irq request callback so that the platform implementation
can deal with irq remapping requests.
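
The dispatch pattern used by the common layer can be sketched outside Xen
as follows. All type and function names here are hypothetical stand-ins.
Unlike the patch, which uses ASSERT() on a missing ops table, this sketch
treats it as an ordinary error so that it can run without Xen's headers.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the request, ops and viommu structures. */
struct request_sketch {
    int type;
};

struct ops_sketch {
    int (*handle_irq_request)(struct request_sketch *req);
};

struct viommu_sketch {
    const struct ops_sketch *ops;
};

/* Example backend callback that accepts every request. */
static int accept_all(struct request_sketch *req)
{
    (void)req;
    return 0;
}

/* Forward the request to the backend, or fail if none is available. */
static int dispatch_irq_request(struct viommu_sketch *v,
                                struct request_sketch *req)
{
    if (!v || !v->ops || !v->ops->handle_irq_request)
        return -EINVAL;

    return v->ops->handle_irq_request(req);
}
```

A domain without a vIOMMU, or a backend that leaves the callback unset,
therefore yields -EINVAL rather than a NULL dereference.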

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/common/viommu.c  | 15 +
 xen/include/asm-x86/viommu.h | 72 
 xen/include/xen/viommu.h | 11 +++
 3 files changed, 98 insertions(+)
 create mode 100644 xen/include/asm-x86/viommu.h

diff --git a/xen/common/viommu.c b/xen/common/viommu.c
index 55feb5d..b517158 100644
--- a/xen/common/viommu.c
+++ b/xen/common/viommu.c
@@ -163,6 +163,21 @@ int viommu_domctl(struct domain *d, struct 
xen_domctl_viommu_op *op,
 return rc;
 }
 
+int viommu_handle_irq_request(struct domain *d,
+  struct arch_irq_remapping_request *request)
+{
+struct viommu *viommu = d->viommu;
+
+if ( !viommu )
+return -EINVAL;
+
+ASSERT(viommu->ops);
+if ( !viommu->ops->handle_irq_request )
+return -EINVAL;
+
+return viommu->ops->handle_irq_request(d, request);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
new file mode 100644
index 000..366fbb6
--- /dev/null
+++ b/xen/include/asm-x86/viommu.h
@@ -0,0 +1,72 @@
+/*
+ * include/asm-x86/viommu.h
+ *
+ * Copyright (c) 2017 Intel Corporation.
+ * Author: Lan Tianyu <tianyu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+#ifndef __ARCH_X86_VIOMMU_H__
+#define __ARCH_X86_VIOMMU_H__
+
+/* IRQ request type */
+#define VIOMMU_REQUEST_IRQ_MSI  0
+#define VIOMMU_REQUEST_IRQ_APIC 1
+
+struct arch_irq_remapping_request
+{
+union {
+/* MSI */
+struct {
+uint64_t addr;
+uint32_t data;
+} msi;
+/* Redirection Entry in IOAPIC */
+uint64_t rte;
+} msg;
+uint16_t source_id;
+uint8_t type;
+};
+
+static inline void irq_request_ioapic_fill(struct arch_irq_remapping_request 
*req,
+   uint32_t ioapic_id, uint64_t rte)
+{
+ASSERT(req);
+req->type = VIOMMU_REQUEST_IRQ_APIC;
+req->source_id = ioapic_id;
+req->msg.rte = rte;
+}
+
+static inline void irq_request_msi_fill(struct arch_irq_remapping_request *req,
+uint32_t source_id, uint64_t addr,
+uint32_t data)
+{
+ASSERT(req);
+req->type = VIOMMU_REQUEST_IRQ_MSI;
+req->source_id = source_id;
+req->msg.msi.addr = addr;
+req->msg.msi.data = data;
+}
+
+#endif /* __ARCH_X86_VIOMMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
index baa8ab7..230f6b1 100644
--- a/xen/include/xen/viommu.h
+++ b/xen/include/xen/viommu.h
@@ -21,10 +21,13 @@
 #define __XEN_VIOMMU_H__
 
 struct viommu;
+struct arch_irq_remapping_request;
 
 struct viommu_ops {
 int (*create)(struct domain *d, struct viommu *viommu);
 int (*destroy)(struct viommu *viommu);
+int (*handle_irq_request)(struct domain *d,
+  struct arch_irq_remapping_request *request);
 };
 
 struct viommu {
@@ -45,11 +48,19 @@ int viommu_register_type(uint64_t type, struct viommu_ops 
*ops);
 int viommu_destroy_domain(struct domain *d);
 int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op,
   bool_t *need_copy);
+int viommu_handle_irq_request(struct domain *d,
+  struct arch_irq_remapping_request *request);
 #else
 static inline int viommu_register_type(uint64_t type, struct viommu_ops *ops)
 {
 return -EINVAL;
 }
+static inline int
+viommu_handle_irq_request(struct domain *d,
+  struct arch_irq_remapping_request *request)
+{
+return -EINVAL;
+}
 #endif
 
 #endif /* __XEN_VIOMMU_H__ */
-- 
1.8.3.1




[Xen-devel] [PATCH V3 12/29] x86/vvtd: Add MMIO handler for VVTD

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

This patch adds a VVTD MMIO handler to deal with MMIO accesses.
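
The access filter applied by both the read and write handlers can be
expressed as a small predicate. The sketch below restates the condition
from the patch; vvtd_access_ok() is a hypothetical name, since the patch
open-codes the check.

```c
#include <stdbool.h>

/*
 * An access is accepted only if it is 4 or 8 bytes wide and naturally
 * aligned. Anything else is silently ignored by the handlers.
 */
static bool vvtd_access_ok(unsigned int offset, unsigned int len)
{
    return (len == 4 || len == 8) && !(offset & (len - 1));
}
```

In the patch, a rejected access still returns X86EMUL_OKAY, so the guest
sees the access complete without any register being read or written.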

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/drivers/passthrough/vtd/vvtd.c | 91 ++
 1 file changed, 91 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
b/xen/drivers/passthrough/vtd/vvtd.c
index c851ec7..a3002c3 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -47,6 +47,29 @@ struct vvtd {
 struct page_info *regs_page;
 };
 
+/* Setting viommu_verbose enables debugging messages of vIOMMU */
+bool __read_mostly viommu_verbose;
+boolean_runtime_param("viommu_verbose", viommu_verbose);
+
+#ifndef NDEBUG
+#define vvtd_info(fmt...) do {\
+if ( viommu_verbose ) \
+gprintk(XENLOG_G_INFO, ## fmt);   \
+} while(0)
+#define vvtd_debug(fmt...) do {   \
+if ( viommu_verbose && printk_ratelimit() )   \
+printk(XENLOG_G_DEBUG fmt);   \
+} while(0)
+#else
+#define vvtd_info(fmt...) do {} while(0)
+#define vvtd_debug(fmt...) do {} while(0)
+#endif
+
+struct vvtd *domain_vvtd(struct domain *d)
+{
+return (d->viommu) ? d->viommu->priv : NULL;
+}
+
 static inline void vvtd_set_reg(struct vvtd *vtd, uint32_t reg, uint32_t value)
 {
 vtd->regs->data32[reg/sizeof(uint32_t)] = value;
@@ -68,6 +91,73 @@ static inline uint64_t vvtd_get_reg_quad(struct vvtd *vtd, 
uint32_t reg)
 return vtd->regs->data64[reg/sizeof(uint64_t)];
 }
 
+static int vvtd_in_range(struct vcpu *v, unsigned long addr)
+{
+struct vvtd *vvtd = domain_vvtd(v->domain);
+
+if ( vvtd )
+return (addr >= vvtd->base_addr) &&
+   (addr < vvtd->base_addr + PAGE_SIZE);
+return 0;
+}
+
+static int vvtd_read(struct vcpu *v, unsigned long addr,
+ unsigned int len, unsigned long *pval)
+{
+struct vvtd *vvtd = domain_vvtd(v->domain);
+unsigned int offset = addr - vvtd->base_addr;
+
+vvtd_info("Read offset %x len %d\n", offset, len);
+
+if ( (len != 4 && len != 8) || (offset & (len - 1)) )
+return X86EMUL_OKAY;
+
+if ( len == 4 )
+*pval = vvtd_get_reg(vvtd, offset);
+else
+*pval = vvtd_get_reg_quad(vvtd, offset);
+
+return X86EMUL_OKAY;
+}
+
+static int vvtd_write(struct vcpu *v, unsigned long addr,
+  unsigned int len, unsigned long val)
+{
+struct vvtd *vvtd = domain_vvtd(v->domain);
+unsigned int offset = addr - vvtd->base_addr;
+
+vvtd_info("Write offset %x len %d val %lx\n", offset, len, val);
+
+if ( (len != 4 && len != 8) || (offset & (len - 1)) )
+return X86EMUL_OKAY;
+
+if ( len == 4 )
+{
+switch ( offset )
+{
+case DMAR_IEDATA_REG:
+case DMAR_IEADDR_REG:
+case DMAR_IEUADDR_REG:
+case DMAR_FEDATA_REG:
+case DMAR_FEADDR_REG:
+case DMAR_FEUADDR_REG:
+vvtd_set_reg(vvtd, offset, val);
+break;
+
+default:
+break;
+}
+}
+
+return X86EMUL_OKAY;
+}
+
+static const struct hvm_mmio_ops vvtd_mmio_ops = {
+.check = vvtd_in_range,
+.read = vvtd_read,
+.write = vvtd_write
+};
+
 static void vvtd_reset(struct vvtd *vvtd, uint64_t capability)
 {
 uint64_t cap = cap_set_num_fault_regs(1ULL) |
@@ -109,6 +199,7 @@ static int vvtd_create(struct domain *d, struct viommu 
*viommu)
 vvtd_reset(vvtd, viommu->caps);
 vvtd->base_addr = viommu->base_address;
 vvtd->domain = d;
+register_mmio_handler(d, _mmio_ops);
 
 viommu->priv = vvtd;
 
-- 
1.8.3.1




[Xen-devel] [PATCH V3 10/29] vtd: add and align register definitions

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

No functional changes.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>

---
 xen/drivers/passthrough/vtd/iommu.h | 54 +
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.h 
b/xen/drivers/passthrough/vtd/iommu.h
index 72c1a2e..d7e433e 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -23,31 +23,39 @@
 #include 
 
 /*
- * Intel IOMMU register specification per version 1.0 public spec.
+ * Intel IOMMU register specification per version 2.4 public spec.
  */
 
-#define DMAR_VER_REG 0x0 /* Arch version supported by this IOMMU */
-#define DMAR_CAP_REG 0x8 /* Hardware supported capabilities */
-#define DMAR_ECAP_REG 0x10 /* Extended capabilities supported */
-#define DMAR_GCMD_REG 0x18 /* Global command register */
-#define DMAR_GSTS_REG 0x1c /* Global status register */
-#define DMAR_RTADDR_REG 0x20 /* Root entry table */
-#define DMAR_CCMD_REG 0x28 /* Context command reg */
-#define DMAR_FSTS_REG 0x34 /* Fault Status register */
-#define DMAR_FECTL_REG 0x38 /* Fault control register */
-#define DMAR_FEDATA_REG 0x3c /* Fault event interrupt data register */
-#define DMAR_FEADDR_REG 0x40 /* Fault event interrupt addr register */
-#define DMAR_FEUADDR_REG 0x44 /* Upper address register */
-#define DMAR_AFLOG_REG 0x58 /* Advanced Fault control */
-#define DMAR_PMEN_REG 0x64 /* Enable Protected Memory Region */
-#define DMAR_PLMBASE_REG 0x68 /* PMRR Low addr */
-#define DMAR_PLMLIMIT_REG 0x6c /* PMRR low limit */
-#define DMAR_PHMBASE_REG 0x70 /* pmrr high base addr */
-#define DMAR_PHMLIMIT_REG 0x78 /* pmrr high limit */
-#define DMAR_IQH_REG 0x80 /* invalidation queue head */
-#define DMAR_IQT_REG 0x88 /* invalidation queue tail */
-#define DMAR_IQA_REG 0x90 /* invalidation queue addr */
-#define DMAR_IRTA_REG 0xB8 /* intr remap */
+#define DMAR_VER_REG        0x0  /* Arch version supported by this IOMMU */
+#define DMAR_CAP_REG        0x8  /* Hardware supported capabilities */
+#define DMAR_ECAP_REG       0x10 /* Extended capabilities supported */
+#define DMAR_GCMD_REG       0x18 /* Global command register */
+#define DMAR_GSTS_REG       0x1c /* Global status register */
+#define DMAR_RTADDR_REG     0x20 /* Root entry table */
+#define DMAR_CCMD_REG       0x28 /* Context command reg */
+#define DMAR_FSTS_REG       0x34 /* Fault Status register */
+#define DMAR_FECTL_REG      0x38 /* Fault control register */
+#define DMAR_FEDATA_REG     0x3c /* Fault event interrupt data register */
+#define DMAR_FEADDR_REG     0x40 /* Fault event interrupt addr register */
+#define DMAR_FEUADDR_REG    0x44 /* Upper address register */
+#define DMAR_AFLOG_REG      0x58 /* Advanced Fault control */
+#define DMAR_PMEN_REG       0x64 /* Enable Protected Memory Region */
+#define DMAR_PLMBASE_REG    0x68 /* PMRR Low addr */
+#define DMAR_PLMLIMIT_REG   0x6c /* PMRR low limit */
+#define DMAR_PHMBASE_REG    0x70 /* pmrr high base addr */
+#define DMAR_PHMLIMIT_REG   0x78 /* pmrr high limit */
+#define DMAR_IQH_REG        0x80 /* invalidation queue head */
+#define DMAR_IQT_REG        0x88 /* invalidation queue tail */
+#define DMAR_IQT_REG_HI     0x8c
+#define DMAR_IQA_REG        0x90 /* invalidation queue addr */
+#define DMAR_IQA_REG_HI     0x94
+#define DMAR_ICS_REG        0x9c /* Invalidation complete status */
+#define DMAR_IECTL_REG      0xa0 /* Invalidation event control */
+#define DMAR_IEDATA_REG     0xa4 /* Invalidation event data */
+#define DMAR_IEADDR_REG     0xa8 /* Invalidation event address */
+#define DMAR_IEUADDR_REG    0xac /* Invalidation event address */
+#define DMAR_IRTA_REG       0xb8 /* Interrupt remapping table addr */
+#define DMAR_IRTA_REG_HI    0xbc
 
 #define OFFSET_STRIDE        (9)
 #define dmar_readl(dmar, reg) readl((dmar) + (reg))
-- 
1.8.3.1




[Xen-devel] [PATCH V3 17/29] x86/vvtd: add a helper function to decide the interrupt format

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

Different platforms may use different methods to distinguish
remapping-format interrupts from normal-format interrupts.

Intel uses one bit in the IOAPIC RTE or the MSI address register to
indicate that the interrupt is in remapping format. vvtd will handle
all interrupts for which .check_irq_remapping() returns true.
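
As a rough illustration of the Intel check described above, the sketch below
tests the Interrupt Format bit directly instead of going through the bitfield
structures used in the patch. It assumes the VT-d remappable layouts (bit 48
of the IOAPIC RTE, bit 4 of the MSI address); the sketch_* names are invented
for this example and are not part of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed positions of the VT-d "Interrupt Format" flag. */
#define SKETCH_RTE_FORMAT_BIT 48 /* 64-bit IOAPIC redirection table entry */
#define SKETCH_MSI_FORMAT_BIT 4  /* 32-bit MSI address register */

/* Hypothetical helper: true if the IOAPIC RTE is in remapping format. */
bool sketch_rte_is_remapping(uint64_t rte)
{
    return (rte >> SKETCH_RTE_FORMAT_BIT) & 1;
}

/* Hypothetical helper: true if the MSI address is in remapping format. */
bool sketch_msi_is_remapping(uint32_t msi_addr)
{
    return (msi_addr >> SKETCH_MSI_FORMAT_BIT) & 1;
}
```

For example, sketch_msi_is_remapping(0xfee00010) is true, whereas the
compatibility-format address 0xfee00000 yields false.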

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/drivers/passthrough/vtd/vvtd.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
index 5e22ace..bd1cadd 100644
--- a/xen/drivers/passthrough/vtd/vvtd.c
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -536,6 +536,28 @@ static int vvtd_get_irq_info(struct domain *d,
 return 0;
 }
 
+/* Probe whether the interrupt request is in remapping format */
+static bool vvtd_is_remapping(struct domain *d,
+  struct arch_irq_remapping_request *irq)
+{
+if ( irq->type == VIOMMU_REQUEST_IRQ_APIC )
+{
+struct IO_APIC_route_remap_entry rte = { .val = irq->msg.rte };
+
+return rte.format;
+}
+else if ( irq->type == VIOMMU_REQUEST_IRQ_MSI )
+{
+struct msi_msg_remap_entry msi_msg =
+{ .address_lo = { .val = irq->msg.msi.addr } };
+
+return msi_msg.address_lo.format;
+}
+ASSERT_UNREACHABLE();
+
+return 0;
+}
+
 static void vvtd_reset(struct vvtd *vvtd, uint64_t capability)
 {
 uint64_t cap = cap_set_num_fault_regs(1ULL) |
@@ -607,7 +629,8 @@ struct viommu_ops vvtd_hvm_vmx_ops = {
 .create = vvtd_create,
 .destroy = vvtd_destroy,
 .handle_irq_request = vvtd_handle_irq_request,
-.get_irq_info = vvtd_get_irq_info
+.get_irq_info = vvtd_get_irq_info,
+.check_irq_remapping = vvtd_is_remapping
 };
 
 static int vvtd_register(void)
-- 
1.8.3.1




[Xen-devel] [PATCH V3 9/29] tools/libxc: Add viommu operations in libxc

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

This patch adds the XEN_DOMCTL_viommu_op hypercall. This hypercall
comprises two sub-commands:
- create(): create a vIOMMU in Xen, given the vIOMMU type, register-set
location and capabilities
- destroy(): destroy the vIOMMU specified by viommu_id
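
The two sub-commands can be pictured as one command word plus a union of
per-command arguments. The sketch below is a stand-alone mock of that shape,
not the actual domctl structure: every sketch_* name, the single-instance
state and the chosen error codes are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum sketch_viommu_cmd { SKETCH_CREATE_VIOMMU, SKETCH_DESTROY_VIOMMU };

/* Hypothetical argument block: one command word, per-command payloads. */
struct sketch_viommu_op {
    uint32_t cmd;
    union {
        struct {
            uint64_t viommu_type;  /* IN */
            uint64_t base_address; /* IN: register-set location */
            uint64_t capabilities; /* IN */
            uint32_t viommu_id;    /* OUT */
        } create;
        struct {
            uint32_t viommu_id;    /* IN */
        } destroy;
    } u;
};

/* Mock state: at most one vIOMMU exists, and its id is always 0. */
static int sketch_have_viommu;

int sketch_viommu_op(struct sketch_viommu_op *op)
{
    switch ( op->cmd )
    {
    case SKETCH_CREATE_VIOMMU:
        if ( sketch_have_viommu )
            return -EEXIST;
        sketch_have_viommu = 1;
        op->u.create.viommu_id = 0; /* hand the new id back to the caller */
        return 0;

    case SKETCH_DESTROY_VIOMMU:
        if ( !sketch_have_viommu || op->u.destroy.viommu_id != 0 )
            return -EINVAL;
        sketch_have_viommu = 0;
        return 0;

    default:
        return -ENOSYS;
    }
}
```

A caller fills in cmd and the matching union member, invokes the handler, and
on a successful create reads viommu_id back from the same structure.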

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
v3:
 - Remove API for querying viommu capabilities
 - Remove pointless cast
 - Polish commit message
 - Coding style
---
 tools/libxc/Makefile  |  1 +
 tools/libxc/include/xenctrl.h |  4 +++
 tools/libxc/xc_viommu.c   | 64 +++
 3 files changed, 69 insertions(+)
 create mode 100644 tools/libxc/xc_viommu.c

diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
index 9a019e8..7d8c4b4 100644
--- a/tools/libxc/Makefile
+++ b/tools/libxc/Makefile
@@ -51,6 +51,7 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c
 CTRL_SRCS-y   += xc_evtchn_compat.c
 CTRL_SRCS-y   += xc_gnttab_compat.c
 CTRL_SRCS-y   += xc_devicemodel_compat.c
+CTRL_SRCS-y   += xc_viommu.c
 
 GUEST_SRCS-y :=
 GUEST_SRCS-y += xg_private.c xc_suspend.c
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 43151cb..bedca1f 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2501,6 +2501,10 @@ enum xc_static_cpu_featuremask {
 const uint32_t *xc_get_static_cpu_featuremask(enum xc_static_cpu_featuremask);
 const uint32_t *xc_get_feature_deep_deps(uint32_t feature);
 
+int xc_viommu_create(xc_interface *xch, domid_t dom, uint64_t type,
+ uint64_t base_addr, uint64_t cap, uint32_t *viommu_id);
+int xc_viommu_destroy(xc_interface *xch, domid_t dom, uint32_t viommu_id);
+
 #endif
 
 int xc_livepatch_upload(xc_interface *xch,
diff --git a/tools/libxc/xc_viommu.c b/tools/libxc/xc_viommu.c
new file mode 100644
index 000..17507c5
--- /dev/null
+++ b/tools/libxc/xc_viommu.c
@@ -0,0 +1,64 @@
+/*
+ * xc_viommu.c
+ *
+ * viommu related API functions.
+ *
+ * Copyright (C) 2017 Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License, version 2.1, as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "xc_private.h"
+
+int xc_viommu_create(xc_interface *xch, domid_t dom, uint64_t type,
+                     uint64_t base_addr, uint64_t cap, uint32_t *viommu_id)
+{
+    int rc;
+
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_viommu_op;
+    domctl.domain = dom;
+    domctl.u.viommu_op.cmd = XEN_DOMCTL_create_viommu;
+    domctl.u.viommu_op.u.create.viommu_type = type;
+    domctl.u.viommu_op.u.create.base_address = base_addr;
+    domctl.u.viommu_op.u.create.capabilities = cap;
+
+    rc = do_domctl(xch, &domctl);
+    if ( !rc )
+        *viommu_id = domctl.u.viommu_op.u.create.viommu_id;
+
+    return rc;
+}
+
+int xc_viommu_destroy(xc_interface *xch, domid_t dom, uint32_t viommu_id)
+{
+    DECLARE_DOMCTL;
+
+    domctl.cmd = XEN_DOMCTL_viommu_op;
+    domctl.domain = dom;
+    domctl.u.viommu_op.cmd = XEN_DOMCTL_destroy_viommu;
+    domctl.u.viommu_op.u.destroy.viommu_id = viommu_id;
+
+    return do_domctl(xch, &domctl);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.8.3.1




[Xen-devel] [PATCH V3 5/29] tools/libacpi: Add new fields in acpi_config for DMAR table

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

The BIOS reports the remapping hardware units in a platform to system software
through the DMA Remapping Reporting (DMAR) ACPI table.
New fields are introduced in struct acpi_config for the DMAR table. The
toolstack sets these fields by parsing the guest's config file.
construct_dmar() is added to build the DMAR table from the new fields.
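
To make the resulting layout concrete, here is a stand-alone sketch of the
same table shape: a DMAR header, one hardware unit (DRHD) and one IOAPIC
device scope with a single path entry, followed by the ACPI checksum. The
sketch_* structures mirror the ones in this series but are redefined locally
so that the example compiles on its own; the function name and its parameters
are assumptions and not the libacpi API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
struct sketch_acpi_header {          /* standard 36-byte ACPI table header */
    char     signature[4];
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    uint32_t creator_id;
    uint32_t creator_revision;
};
struct sketch_dmar {
    struct sketch_acpi_header header;
    uint8_t host_address_width;
    uint8_t flags;
    uint8_t reserved[10];
};
struct sketch_drhd {
    uint16_t type;
    uint16_t length;
    uint8_t  flags;
    uint8_t  reserved;
    uint16_t pci_segment;
    uint64_t base_address;
};
struct sketch_scope {
    uint8_t  type;
    uint8_t  length;
    uint8_t  reserved[2];
    uint8_t  enumeration_id;
    uint8_t  bus;
    uint16_t path[1];                /* exactly one path entry here */
};
#pragma pack(pop)

#define SKETCH_DMAR_SIZE (sizeof(struct sketch_dmar) + \
                          sizeof(struct sketch_drhd) + \
                          sizeof(struct sketch_scope))

/* Fill buf (at least SKETCH_DMAR_SIZE bytes); returns the table length. */
size_t sketch_build_dmar(uint8_t *buf, uint64_t iommu_base,
                         uint8_t addr_width, uint8_t ioapic_id)
{
    struct sketch_dmar dmar;
    struct sketch_drhd drhd;
    struct sketch_scope scope;
    size_t size = SKETCH_DMAR_SIZE, i;
    uint8_t sum = 0;

    memset(&dmar, 0, sizeof(dmar));
    memset(&drhd, 0, sizeof(drhd));
    memset(&scope, 0, sizeof(scope));

    memcpy(dmar.header.signature, "DMAR", 4);
    dmar.header.length = (uint32_t)size;
    dmar.header.revision = 1;
    dmar.host_address_width = addr_width - 1; /* field stores width - 1 */

    drhd.type = 0;                            /* hardware unit (DRHD) */
    drhd.length = (uint16_t)(sizeof(drhd) + sizeof(scope));
    drhd.flags = 1;                           /* INCLUDE_PCI_ALL */
    drhd.base_address = iommu_base;

    scope.type = 3;                           /* IOAPIC device scope */
    scope.length = (uint8_t)sizeof(scope);
    scope.enumeration_id = ioapic_id;

    memcpy(buf, &dmar, sizeof(dmar));
    memcpy(buf + sizeof(dmar), &drhd, sizeof(drhd));
    memcpy(buf + sizeof(dmar) + sizeof(drhd), &scope, sizeof(scope));

    /* ACPI checksum: all bytes of the table must sum to 0 modulo 256. */
    for ( i = 0; i < size; i++ )
        sum += buf[i];
    buf[offsetof(struct sketch_acpi_header, checksum)] = (uint8_t)(0 - sum);

    return size;
}
```

With these definitions the table is 48 + 16 + 8 = 72 bytes long, and a
39-bit address width is stored as 38.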

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
v3:
 - Remove chip-set specific IOAPIC BDF. Instead, let IOAPIC-related
 info be passed by struct acpi_config.

---
 tools/libacpi/build.c   | 53 +
 tools/libacpi/libacpi.h | 12 +++
 2 files changed, 65 insertions(+)

diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f9881c9..5ee8fcd 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -303,6 +303,59 @@ static struct acpi_20_slit *construct_slit(struct acpi_ctxt *ctxt,
 return slit;
 }
 
+/*
+ * Only one DMA remapping hardware unit is exposed and all devices
+ * are under the remapping hardware unit. I/O APIC should be explicitly
+ * enumerated.
+ */
+struct acpi_dmar *construct_dmar(struct acpi_ctxt *ctxt,
+ const struct acpi_config *config)
+{
+struct acpi_dmar *dmar;
+struct acpi_dmar_hardware_unit *drhd;
+struct dmar_device_scope *scope;
+unsigned int size;
+unsigned int ioapic_scope_size = sizeof(*scope) + sizeof(scope->path[0]);
+
+size = sizeof(*dmar) + sizeof(*drhd) + ioapic_scope_size;
+
+dmar = ctxt->mem_ops.alloc(ctxt, size, 16);
+if ( !dmar )
+return NULL;
+
+memset(dmar, 0, size);
+dmar->header.signature = ACPI_2_0_DMAR_SIGNATURE;
+dmar->header.revision = ACPI_2_0_DMAR_REVISION;
+dmar->header.length = size;
+fixed_strcpy(dmar->header.oem_id, ACPI_OEM_ID);
+fixed_strcpy(dmar->header.oem_table_id, ACPI_OEM_TABLE_ID);
+dmar->header.oem_revision = ACPI_OEM_REVISION;
+dmar->header.creator_id   = ACPI_CREATOR_ID;
+dmar->header.creator_revision = ACPI_CREATOR_REVISION;
+dmar->host_address_width = config->host_addr_width - 1;
+if ( config->iommu_intremap_supported )
+dmar->flags |= ACPI_DMAR_INTR_REMAP;
+if ( !config->iommu_x2apic_supported )
+dmar->flags |= ACPI_DMAR_X2APIC_OPT_OUT;
+
+drhd = (struct acpi_dmar_hardware_unit *)((void*)dmar + sizeof(*dmar));
+drhd->type = ACPI_DMAR_TYPE_HARDWARE_UNIT;
+drhd->length = sizeof(*drhd) + ioapic_scope_size;
+drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
+drhd->pci_segment = 0;
+drhd->base_address = config->iommu_base_addr;
+
+scope = &drhd->scope[0];
+scope->type = ACPI_DMAR_DEVICE_SCOPE_IOAPIC;
+scope->length = ioapic_scope_size;
+scope->enumeration_id = config->ioapic_id;
+scope->bus = config->ioapic_bus;
+scope->path[0] = config->ioapic_devfn;
+
+set_checksum(dmar, offsetof(struct acpi_header, checksum), size);
+return dmar;
+}
+
 static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
 unsigned long *table_ptrs,
 int nr_tables,
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index a2efd23..fdd6a78 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -20,6 +20,8 @@
 #ifndef __LIBACPI_H__
 #define __LIBACPI_H__
 
+#include 
+
 #define ACPI_HAS_COM1  (1<<0)
 #define ACPI_HAS_COM2  (1<<1)
 #define ACPI_HAS_LPT1  (1<<2)
@@ -96,8 +98,18 @@ struct acpi_config {
 uint32_t ioapic_base_address;
 uint16_t pci_isa_irq_mask;
 uint8_t ioapic_id;
+
+/* Emulated IOMMU features, location and IOAPIC under the scope of IOMMU */
+bool iommu_intremap_supported;
+bool iommu_x2apic_supported;
+uint8_t host_addr_width;
+uint8_t ioapic_bus;
+uint16_t ioapic_devfn;
+uint64_t iommu_base_addr;
 };
 
+struct acpi_dmar *construct_dmar(struct acpi_ctxt *ctxt,
+ const struct acpi_config *config);
 int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config);
 
 #endif /* __LIBACPI_H__ */
-- 
1.8.3.1




[Xen-devel] [PATCH V3 11/29] x86/hvm: Introduce a emulated VTD for HVM

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

This patch adds create/destroy functions for the emulated VTD
and adapts them to the common VIOMMU abstraction.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/drivers/passthrough/vtd/Makefile |   7 +-
 xen/drivers/passthrough/vtd/iommu.h  |  23 +-
 xen/drivers/passthrough/vtd/vvtd.c   | 147 +++
 3 files changed, 170 insertions(+), 7 deletions(-)
 create mode 100644 xen/drivers/passthrough/vtd/vvtd.c

diff --git a/xen/drivers/passthrough/vtd/Makefile b/xen/drivers/passthrough/vtd/Makefile
index f302653..163c7fe 100644
--- a/xen/drivers/passthrough/vtd/Makefile
+++ b/xen/drivers/passthrough/vtd/Makefile
@@ -1,8 +1,9 @@
 subdir-$(CONFIG_X86) += x86
 
-obj-y += iommu.o
 obj-y += dmar.o
-obj-y += utils.o
-obj-y += qinval.o
 obj-y += intremap.o
+obj-y += iommu.o
+obj-y += qinval.o
 obj-y += quirks.o
+obj-y += utils.o
+obj-$(CONFIG_VIOMMU) += vvtd.o
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index d7e433e..ef038c9 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -66,6 +66,12 @@
 #define VER_MAJOR(v)        (((v) & 0xf0) >> 4)
 #define VER_MINOR(v)        ((v) & 0x0f)
 
+/* Supported Adjusted Guest Address Widths */
+#define DMA_CAP_SAGAW_SHIFT 8
+/* 39-bit AGAW, 3-level page-table */
+#define DMA_CAP_SAGAW_39bit (0x2ULL << DMA_CAP_SAGAW_SHIFT)
+#define DMA_CAP_ND_64K      6ULL
+
 /*
  * Decoding Capability Register
  */
@@ -74,6 +80,7 @@
 #define cap_write_drain(c) (((c) >> 54) & 1)
 #define cap_max_amask_val(c)   (((c) >> 48) & 0x3f)
 #define cap_num_fault_regs(c)      ((((c) >> 40) & 0xff) + 1)
+#define cap_set_num_fault_regs(c)  ((((c) - 1) & 0xff) << 40)
 #define cap_pgsel_inv(c)   (((c) >> 39) & 1)
 
 #define cap_super_page_val(c)  (((c) >> 34) & 0xf)
@@ -85,11 +92,13 @@
 #define cap_sps_1tb(c) ((c >> 37) & 1)
 
 #define cap_fault_reg_offset(c)     ((((c) >> 24) & 0x3ff) * 16)
+#define cap_set_fault_reg_offset(c) ((((c) / 16) & 0x3ff) << 24 )
 
 #define cap_isoch(c)        (((c) >> 23) & 1)
 #define cap_qos(c)          (((c) >> 22) & 1)
 #define cap_mgaw(c)         ((((c) >> 16) & 0x3f) + 1)
-#define cap_sagaw(c)        (((c) >> 8) & 0x1f)
+#define cap_set_mgaw(c)     ((((c) - 1) & 0x3f) << 16)
+#define cap_sagaw(c)        (((c) >> DMA_CAP_SAGAW_SHIFT) & 0x1f)
 #define cap_caching_mode(c) (((c) >> 7) & 1)
 #define cap_phmr(c)         (((c) >> 6) & 1)
 #define cap_plmr(c)         (((c) >> 5) & 1)
@@ -104,10 +113,16 @@
 #define ecap_niotlb_iunits(e)    ((((e) >> 24) & 0xff) + 1)
 #define ecap_iotlb_offset(e)     ((((e) >> 8) & 0x3ff) * 16)
 #define ecap_coherent(e)         ((e >> 0) & 0x1)
-#define ecap_queued_inval(e)     ((e >> 1) & 0x1)
+#define DMA_ECAP_QI_SHIFT        1
+#define DMA_ECAP_QI              (1ULL << DMA_ECAP_QI_SHIFT)
+#define ecap_queued_inval(e)     ((e >> DMA_ECAP_QI_SHIFT) & 0x1)
 #define ecap_dev_iotlb(e)        ((e >> 2) & 0x1)
-#define ecap_intr_remap(e)       ((e >> 3) & 0x1)
-#define ecap_eim(e)              ((e >> 4) & 0x1)
+#define DMA_ECAP_IR_SHIFT        3
+#define DMA_ECAP_IR              (1ULL << DMA_ECAP_IR_SHIFT)
+#define ecap_intr_remap(e)       ((e >> DMA_ECAP_IR_SHIFT) & 0x1)
+#define DMA_ECAP_EIM_SHIFT       4
+#define DMA_ECAP_EIM             (1ULL << DMA_ECAP_EIM_SHIFT)
+#define ecap_eim(e)              ((e >> DMA_ECAP_EIM_SHIFT) & 0x1)
 #define ecap_cache_hints(e)      ((e >> 5) & 0x1)
 #define ecap_pass_thru(e)        ((e >> 6) & 0x1)
 #define ecap_snp_ctl(e)          ((e >> 7) & 0x1)
diff --git a/xen/drivers/passthrough/vtd/vvtd.c b/xen/drivers/passthrough/vtd/vvtd.c
new file mode 100644
index 000..c851ec7
--- /dev/null
+++ b/xen/drivers/passthrough/vtd/vvtd.c
@@ -0,0 +1,147 @@
+/*
+ * vvtd.c
+ *
+ * virtualize VTD for HVM.
+ *
+ * Copyright (C) 2017 Chao Gao, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#i

[Xen-devel] [PATCH V3 8/29] tools/libxl: create vIOMMU during domain construction

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

If the guest is configured to have a vIOMMU, create it during domain construction.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>

---
v3:
 - Remove the process of querying capabilities.
---
 tools/libxl/libxl_x86.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 23c9a55..25cae5f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -341,8 +341,25 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
 if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM) {
 unsigned long shadow = DIV_ROUNDUP(d_config->b_info.shadow_memkb,
1024);
+int i;
+
 xc_shadow_control(ctx->xch, domid, XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
   NULL, 0, &shadow, 0, NULL);
+
+for (i = 0; i < d_config->b_info.num_viommus; i++) {
+uint32_t id;
+libxl_viommu_info *viommu = d_config->b_info.viommu + i;
+
+if (viommu->type == LIBXL_VIOMMU_TYPE_INTEL_VTD) {
+ret = xc_viommu_create(ctx->xch, domid, VIOMMU_TYPE_INTEL_VTD,
+   viommu->base_addr, viommu->cap, &id);
+if (ret) {
+LOGED(ERROR, domid, "create vIOMMU fail");
+ret = ERROR_FAIL;
+goto out;
+}
+}
+}
 }
 
 if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV &&
-- 
1.8.3.1




[Xen-devel] [PATCH V3 7/29] tools/libxl: build DMAR table for a guest with one virtual VTD

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

New logic is added to build the ACPI DMAR table in the tool stack for a guest
with one virtual VTD and to pass it through to the guest via the existing
mechanism. If there are already ACPI tables to pass through, the tables are joined.
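
The joining step is plain concatenation of two table blobs. A minimal sketch,
assuming heap buffers obtained with malloc() rather than libxl's gc-managed
allocations; sketch_join_tables() is a hypothetical helper, not a function in
this series:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated buffer holding the existing
 * tables followed by the extra table, or NULL on allocation failure. The
 * caller owns the result and keeps ownership of both inputs.
 */
void *sketch_join_tables(const void *old, uint32_t old_len,
                         const void *extra, uint32_t extra_len,
                         uint32_t *out_len)
{
    uint8_t *joined = malloc((size_t)old_len + extra_len);

    if ( !joined )
        return NULL;

    if ( old_len )
        memcpy(joined, old, old_len);
    if ( extra_len )
        memcpy(joined + old_len, extra, extra_len);

    *out_len = old_len + extra_len;
    return joined;
}
```

Since each ACPI table carries its own length in its header, the consumer of
the joined blob can walk it table by table.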

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>

---
v3:
 - build dmar and initialize related acpi_modules struct in
 libxl_x86_acpi.c, keeping in accordance with pvh.

---
 tools/libxl/libxl_x86.c  |  3 +-
 tools/libxl/libxl_x86_acpi.c | 98 ++--
 2 files changed, 96 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 455f6f0..23c9a55 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -381,8 +381,7 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
 {
 int rc = 0;
 
-if ((info->type == LIBXL_DOMAIN_TYPE_HVM) &&
-(info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_NONE)) {
+if (info->type == LIBXL_DOMAIN_TYPE_HVM) {
 rc = libxl__dom_load_acpi(gc, info, dom);
 if (rc != 0)
 LOGE(ERROR, "libxl_dom_load_acpi failed");
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 1761756..adf02f4 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -16,6 +16,7 @@
 #include "libxl_arch.h"
 #include 
 #include 
+#include "libacpi/acpi2_0.h"
 #include "libacpi/libacpi.h"
 
 #include 
@@ -161,9 +162,9 @@ out:
 return rc;
 }
 
-int libxl__dom_load_acpi(libxl__gc *gc,
- const libxl_domain_build_info *b_info,
- struct xc_dom_image *dom)
+static int libxl__dom_load_acpi_pvh(libxl__gc *gc,
+const libxl_domain_build_info *b_info,
+struct xc_dom_image *dom)
 {
 struct acpi_config config = {0};
 struct libxl_acpi_ctxt libxl_ctxt;
@@ -236,6 +237,97 @@ out:
 return rc;
 }
 
+static void *acpi_memalign(struct acpi_ctxt *ctxt, uint32_t size,
+   uint32_t align)
+{
+int ret;
+void *ptr;
+
+ret = posix_memalign(&ptr, align, size);
+if (ret != 0 || !ptr)
+return NULL;
+
+return ptr;
+}
+
+/*
+ * For HVM, we don't need to build ACPI tables in libxl; they are built in
+ * hvmloader. But if an HVM guest has virtual VTD(s), we build a DMAR table
+ * for it and join this table with the existing content in acpi_modules, in
+ * order to use the HVM firmware pass-through mechanism to pass the DMAR
+ * table through.
+ */
+static int libxl__dom_load_acpi_hvm(libxl__gc *gc,
+const libxl_domain_build_info *b_info,
+struct xc_dom_image *dom)
+{
+struct acpi_config config = { 0 };
+struct acpi_ctxt ctxt;
+void *table;
+uint32_t len;
+
+if ((b_info->type != LIBXL_DOMAIN_TYPE_HVM) ||
+(b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_NONE) ||
+(b_info->num_viommus != 1) ||
+(b_info->viommu[0].type != LIBXL_VIOMMU_TYPE_INTEL_VTD))
+return 0;
+
+ctxt.mem_ops.alloc = acpi_memalign;
+ctxt.mem_ops.v2p = virt_to_phys;
+ctxt.mem_ops.free = acpi_mem_free;
+
+if (libxl_defbool_val(b_info->viommu[0].intremap))
+config.iommu_intremap_supported = true;
+/* x2APIC is always reported as supported: no case requires disabling it */
+config.iommu_x2apic_supported = true;
+config.iommu_base_addr = b_info->viommu[0].base_addr;
+
+/* IOAPIC id and PSEUDO BDF */
+config.ioapic_id = 1;
+config.ioapic_bus = 0xff;
+config.ioapic_devfn = 0x0;
+
+config.host_addr_width = 39;
+
+table = construct_dmar(&ctxt, &config);
+if ( !table )
+return ERROR_NOMEM;
+len = ((struct acpi_header *)table)->length;
+
+if (len) {
+libxl__ptr_add(gc, table);
+if (!dom->acpi_modules[0].data) {
+dom->acpi_modules[0].data = table;
+dom->acpi_modules[0].length = len;
+} else {
+/* join tables */
+void *newdata;
+
+newdata = libxl__malloc(gc, len + dom->acpi_modules[0].length);
+memcpy(newdata, dom->acpi_modules[0].data,
+   dom->acpi_modules[0].length);
+memcpy(newdata + dom->acpi_modules[0].length, table, len);
+
+free(dom->acpi_modules[0].data);
+dom->acpi_modules[0].data = newdata;
+dom->acpi_modules[0].length += len;
+}
+}
+return 0;
+}
+
+int libxl__dom_load_acpi(libxl__gc *gc,
+ const libxl_domain_build_info *b_info,
+ struct xc_dom_image *dom)
+{
+
+if (b_info->type != LIBXL_DOMAIN_TYPE_HVM)
+return 0;
+
+if (b_info->device_model_version == LIBXL_DEVI

[Xen-devel] [PATCH V3 6/29] tools/libxl: Add a user configurable parameter to control vIOMMU attributes

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

A field, viommu_info, is added to struct libxl_domain_build_info. Several
attributes of the virtual IOMMU can be specified in the guest config file.
These attributes are used for DMAR construction and vIOMMU creation.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>

---
v3:
 - allow an array of viommus, rather than only one viommu, to be presented
 to the guest. During domain building, an error is raised for the
 multiple-viommu case since it is not implemented yet.
 - provide a libxl__viommu_set_default() for viommu

---
 docs/man/xl.cfg.pod.5.in| 27 +++
 tools/libxl/libxl_create.c  | 52 +
 tools/libxl/libxl_types.idl | 12 +++
 tools/xl/xl_parse.c | 52 -
 4 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index 79cb2ea..9cd7dd7 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -1547,6 +1547,33 @@ L<http://www.microsoft.com/en-us/download/details.aspx?id=30707>
 
 =back 
 
+=item B

[Xen-devel] [PATCH V3 00/29]

2017-09-22 Thread Lan Tianyu
Changes since v2:
   1) Remove the vIOMMU hypercall for querying capabilities; it will be
introduced when necessary.
   2) Remove the length field from the create parameter of the vIOMMU hypercall.
   3) Introduce an irq remapping mode callback in the vIOMMU framework, so that
vIOMMU device models can check the irq remapping mode in vendor-specific ways.
   4) Update vIOMMU docs.
   5) For other changes, please see the change logs of the individual patches.

Changes since v1:
   1) Fix coding style issues
   2) Add definitions for vIOMMU type and capabilities
   3) Change vIOMMU kconfig and select vIOMMU default on x86
   4) Put vIOMMU creation in libxl__arch_domain_create()
   5) Make vIOMMU structure of tool stack more general for both PV and HVM.

Changes since RFC v2:
   1) Move vvtd.c to the drivers/passthrough/vtd directory.
   2) Make vIOMMU always built in on x86
   3) Add new boot cmd "viommu" to enable the viommu function
   4) Fix some coding style issues.

Changes since RFC v1:
   1) Add Xen virtual IOMMU doc docs/misc/viommu.txt
   2) Move the vIOMMU hypercalls for creating/destroying a vIOMMU and querying
capabilities from dmop to domctl, as suggested by Paul Durrant, because
these hypercalls can be issued from the tool stack and more VM modes (e.g. PVH
or other modes that don't use Qemu) can benefit.
   3) Add checks of the input MMIO address and length.
   4) Add iommu_type to the vIOMMU hypercall parameter to specify the
vendor vIOMMU device model (e.g. Intel VTD, AMD or ARM IOMMU; so far
only Intel VTD is supported).
   5) Add save and restore support for vvtd


This patchset introduces the vIOMMU framework and adds interrupt remapping
support for the virtual VTD, according to the "Xen virtual IOMMU high level
design doc V3"
(https://lists.xenproject.org/archives/html/xen-devel/2016-11/msg01391.html).

- vIOMMU framework
The new framework provides viommu_ops and helper functions to abstract
vIOMMU operations (e.g. create, destroy, handle irq remapping requests
and so on). Vendors (Intel, ARM, AMD and so on) can implement their
vIOMMU callbacks.

- Virtual VTD
Interrupt remapping is enabled and covers both MSI and IOAPIC interrupts.
Posted-interrupt mode emulation, and posted-interrupt mode enabled on the
host together with a virtual VTD, are not supported yet; they will be added
later.

Repo:
https://github.com/lantianyu/Xen/tree/xen_viommu_v3


Chao Gao (23):
  tools/libacpi: Add DMA remapping reporting (DMAR) ACPI table
structures
  tools/libacpi: Add new fields in acpi_config for DMAR table
  tools/libxl: Add a user configurable parameter to control vIOMMU
attributes
  tools/libxl: build DMAR table for a guest with one virtual VTD
  tools/libxl: create vIOMMU during domain construction
  tools/libxc: Add viommu operations in libxc
  vtd: add and align register definitions
  x86/hvm: Introduce a emulated VTD for HVM
  x86/vvtd: Add MMIO handler for VVTD
  x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD
  x86/vvtd: Enable Interrupt Remapping through GCMD
  x86/vvtd: Process interrupt remapping request
  x86/vvtd: decode interrupt attribute from IRTE
  x86/vvtd: add a helper function to decide the interrupt format
  x86/vioapic: Hook interrupt delivery of vIOAPIC
  x86/vioapic: extend vioapic_get_vector() to support remapping format
RTE
  passthrough: move some fields of hvm_gmsi_info to a sub-structure
  tools/libxc: Add a new interface to bind remapping format msi with
pirq
  x86/vmsi: Hook delivering remapping format msi to guest
  x86/vvtd: Handle interrupt translation faults
  x86/vvtd: Enable Queued Invalidation through GCMD
  x86/vvtd: Add queued invalidation (QI) support
  x86/vvtd: save and restore emulated VT-d

Lan Tianyu (6):
  Xen/doc: Add Xen virtual IOMMU doc
  VIOMMU: Add vIOMMU helper functions to create, destroy vIOMMU instance
  DOMCTL: Introduce new DOMCTL commands for vIOMMU support
  VIOMMU: Add irq request callback to deal with irq remapping
  VIOMMU: Add get irq info callback to convert irq remapping request
  VIOMMU: Introduce callback of checking irq remapping mode

 docs/man/xl.cfg.pod.5.in   |   27 +
 docs/misc/viommu.txt   |  136 
 docs/misc/xen-command-line.markdown|7 +
 tools/libacpi/acpi2_0.h|   61 ++
 tools/libacpi/build.c  |   53 ++
 tools/libacpi/libacpi.h|   12 +
 tools/libxc/Makefile   |1 +
 tools/libxc/include/xenctrl.h  |   21 +
 tools/libxc/xc_domain.c|   53 ++
 tools/libxc/xc_viommu.c|   64 ++
 tools/libxl/libxl_create.c |   52 ++
 tools/libxl/libxl_types.idl|   12 +
 tools/libxl/libxl_x86.c|   20 +-
 tools/libxl/libxl_x86_acpi.c   |   98 ++-
 tools/xl/xl_parse.c|   52 +-
 xen/arch/x86/Kconfig   |1 +
 xen/arch/x86/hvm/irq.c |7 +
 xen/arch/x86/hvm/vioapic.c |   26 +-
 xen/arch/x86/hvm/vmsi.c|   18 +-
 xen/common/Kconfig  

[Xen-devel] [PATCH V3 4/29] tools/libacpi: Add DMA remapping reporting (DMAR) ACPI table structures

2017-09-22 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

Add DMAR table structures according to Chapter 8 "BIOS Considerations" of
the VT-d spec Rev. 2.4.

VT-d spec: http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 tools/libacpi/acpi2_0.h | 61 +
 1 file changed, 61 insertions(+)

diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 2619ba3..758a823 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -422,6 +422,65 @@ struct acpi_20_slit {
 };
 
 /*
+ * DMA Remapping Table header definition (DMAR)
+ */
+
+/*
+ * DMAR Flags.
+ */
+#define ACPI_DMAR_INTR_REMAP        (1 << 0)
+#define ACPI_DMAR_X2APIC_OPT_OUT    (1 << 1)
+
+struct acpi_dmar {
+struct acpi_header header;
+uint8_t host_address_width;
+uint8_t flags;
+uint8_t reserved[10];
+};
+
+/*
+ * Device Scope Types
+ */
+#define ACPI_DMAR_DEVICE_SCOPE_PCI_ENDPOINT             0x01
+#define ACPI_DMAR_DEVICE_SCOPE_PCI_SUB_HIERARACHY       0x02
+#define ACPI_DMAR_DEVICE_SCOPE_IOAPIC                   0x03
+#define ACPI_DMAR_DEVICE_SCOPE_HPET                     0x04
+#define ACPI_DMAR_DEVICE_SCOPE_ACPI_NAMESPACE_DEVICE    0x05
+
+struct dmar_device_scope {
+uint8_t type;
+uint8_t length;
+uint8_t reserved[2];
+uint8_t enumeration_id;
+uint8_t bus;
+uint16_t path[0];
+};
+
+/*
+ * DMA Remapping Hardware Unit Types
+ */
+#define ACPI_DMAR_TYPE_HARDWARE_UNIT        0x00
+#define ACPI_DMAR_TYPE_RESERVED_MEMORY      0x01
+#define ACPI_DMAR_TYPE_ATSR                 0x02
+#define ACPI_DMAR_TYPE_HARDWARE_AFFINITY    0x03
+#define ACPI_DMAR_TYPE_ANDD                 0x04
+
+/*
+ * DMA Remapping Hardware Unit Flags. All other bits are reserved and must be 0.
+ */
+#define ACPI_DMAR_INCLUDE_PCI_ALL   (1 << 0)
+
+struct acpi_dmar_hardware_unit {
+uint16_t type;
+uint16_t length;
+uint8_t flags;
+uint8_t reserved;
+uint16_t pci_segment;
+uint64_t base_address;
+struct dmar_device_scope scope[0];
+};
+
+/*
  * Table Signatures.
  */
 #define ACPI_2_0_RSDP_SIGNATURE ASCII64('R','S','D',' ','P','T','R',' ')
@@ -435,6 +494,7 @@ struct acpi_20_slit {
 #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
 #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
 #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
+#define ACPI_2_0_DMAR_SIGNATURE ASCII32('D','M','A','R')
 
 /*
  * Table revision numbers.
@@ -449,6 +509,7 @@ struct acpi_20_slit {
 #define ACPI_1_0_FADT_REVISION 0x01
 #define ACPI_2_0_SRAT_REVISION 0x01
 #define ACPI_2_0_SLIT_REVISION 0x01
+#define ACPI_2_0_DMAR_REVISION 0x01
 
 #pragma pack ()
 
-- 
1.8.3.1




[Xen-devel] [PATCH V3 2/29] VIOMMU: Add vIOMMU helper functions to create, destroy vIOMMU instance

2017-09-22 Thread Lan Tianyu
This patch introduces an abstraction layer that lets arch vIOMMU
implementations handle requests from dom0. Arch vIOMMU code needs to provide
callbacks for the create and destroy operations.
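
The shape of such a layer can be sketched as a table of callbacks keyed by
vIOMMU type. The example below is a simplified stand-alone model: it uses a
singly linked list and no locking, whereas the patch uses Xen's list and
spinlock primitives, and all sketch_* names are assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_viommu;                      /* opaque per-domain instance */

struct sketch_viommu_ops {
    int (*create)(struct sketch_viommu *v);
    int (*destroy)(struct sketch_viommu *v);
};

struct sketch_viommu_type {
    uint64_t type;
    const struct sketch_viommu_ops *ops;
    struct sketch_viommu_type *next;
};

static struct sketch_viommu_type *sketch_types; /* registered types */

/* Look up the callbacks registered for a type; NULL if none. */
const struct sketch_viommu_ops *sketch_get_ops(uint64_t type)
{
    const struct sketch_viommu_type *t;

    for ( t = sketch_types; t; t = t->next )
        if ( t->type == type )
            return t->ops;

    return NULL;
}

/* Register callbacks for a type; each type may be registered only once. */
int sketch_register_type(uint64_t type, const struct sketch_viommu_ops *ops)
{
    struct sketch_viommu_type *t;

    if ( !ops )
        return -EINVAL;
    if ( sketch_get_ops(type) )
        return -EEXIST;

    t = calloc(1, sizeof(*t));
    if ( !t )
        return -ENOMEM;

    t->type = type;
    t->ops = ops;
    t->next = sketch_types;
    sketch_types = t;

    return 0;
}
```

A create request would then look up the ops for the requested type and call
its create() callback, so the common code stays vendor neutral.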

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 docs/misc/xen-command-line.markdown |   7 ++
 xen/arch/x86/Kconfig|   1 +
 xen/common/Kconfig  |   3 +
 xen/common/Makefile |   1 +
 xen/common/domain.c |   4 +
 xen/common/viommu.c | 144 
 xen/include/xen/sched.h |   8 ++
 xen/include/xen/viommu.h|  63 
 8 files changed, 231 insertions(+)
 create mode 100644 xen/common/viommu.c
 create mode 100644 xen/include/xen/viommu.h

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 9797c8d..dfd1db5 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1825,3 +1825,10 @@ mode.
 > Default: `true`
 
 Permit use of the `xsave/xrstor` instructions.
+
+### viommu
+> `= <boolean>`
+
+> Default: `false`
+
+Permit use of the viommu interface to create and destroy a viommu device model.
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 30c2769..1f1de96 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -23,6 +23,7 @@ config X86
select HAS_PDX
select NUMA
select VGA
+   select VIOMMU
 
 config ARCH_DEFCONFIG
string
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index dc8e876..2ad2c8d 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -49,6 +49,9 @@ config HAS_CHECKPOLICY
string
option env="XEN_HAS_CHECKPOLICY"
 
+config VIOMMU
+   bool
+
 config KEXEC
bool "kexec support"
default y
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 39e2614..da32f71 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -56,6 +56,7 @@ obj-y += time.o
 obj-y += timer.o
 obj-y += trace.o
 obj-y += version.o
+obj-$(CONFIG_VIOMMU) += viommu.o
 obj-y += virtual_region.o
 obj-y += vm_event.o
 obj-y += vmap.o
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 5aebcf2..cdb1c9d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -814,6 +814,10 @@ static void complete_domain_destroy(struct rcu_head *head)
 
 sched_destroy_domain(d);
 
+#ifdef CONFIG_VIOMMU
+viommu_destroy_domain(d);
+#endif
+
 /* Free page used by xen oprofile buffer. */
 #ifdef CONFIG_XENOPROF
 free_xenoprof_pages(d);
diff --git a/xen/common/viommu.c b/xen/common/viommu.c
new file mode 100644
index 000..64d91e6
--- /dev/null
+++ b/xen/common/viommu.c
@@ -0,0 +1,144 @@
+/*
+ * common/viommu.c
+ *
+ * Copyright (c) 2017 Intel Corporation
+ * Author: Lan Tianyu <tianyu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+bool __read_mostly opt_viommu;
+boolean_param("viommu", opt_viommu);
+
+static DEFINE_SPINLOCK(type_list_lock);
+static LIST_HEAD(type_list);
+
+struct viommu_type {
+uint64_t type;
+struct viommu_ops *ops;
+struct list_head node;
+};
+
+int viommu_destroy_domain(struct domain *d)
+{
+int ret;
+
+if ( !d->viommu )
+return -EINVAL;
+
+ret = d->viommu->ops->destroy(d->viommu);
+if ( ret < 0 )
+return ret;
+
+xfree(d->viommu);
+d->viommu = NULL;
+return 0;
+}
+
+static struct viommu_type *viommu_get_type(uint64_t type)
+{
+struct viommu_type *viommu_type = NULL;
+
+spin_lock(&type_list_lock);
+list_for_each_entry( viommu_type, &type_list, node )
+{
+if ( viommu_type->type == type )
+{
+spin_unlock(&type_list_lock);
+return viommu_type;
+}
+}
+spin_unlock(&type_list_lock);
+
+return NULL;
+}
+
+int viommu_register_type(uint64_t type, struct viommu_ops *ops)
+{
+struct viommu_type *viommu_type = NULL;
+
+if ( !viommu_enabled() )
+return -ENODEV;
+
+if ( viommu_get_type(type) )
+return -EEXIST;
+
+viommu_type = xzalloc(struct viommu_type);
+if ( !viommu_type )
+return -ENOMEM;
+
+viommu_type->type = type;
+viommu_type->ops = ops;
+
+spin_lock(&type_list_lock);
+list_add_tail(&viommu_type->node, &type_list);
+spin_unlock
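
Editorial sketch: the register-once behaviour of viommu_register_type() in the patch above can be tried outside Xen with a small stand-alone model. This is only an illustration under stated assumptions: a singly linked list replaces Xen's list_head, calloc() replaces xzalloc(), the ops structure is reduced to one callback, and locking and the viommu_enabled() check are left out.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for Xen's struct viommu_ops; only one callback is modelled. */
struct viommu_ops {
    int (*destroy)(void *viommu);
};

/* One registered device-model type, kept on a singly linked list. */
struct viommu_type {
    uint64_t type;
    const struct viommu_ops *ops;
    struct viommu_type *next;
};

/* Head of the registry (the patch protects its list with a spinlock). */
static struct viommu_type *type_list;

/* Look up a type; returns NULL when it was never registered. */
static struct viommu_type *viommu_get_type(uint64_t type)
{
    struct viommu_type *t;

    for ( t = type_list; t != NULL; t = t->next )
        if ( t->type == type )
            return t;

    return NULL;
}

/* Register a type once; registering the same type again fails. */
static int viommu_register_type(uint64_t type, const struct viommu_ops *ops)
{
    struct viommu_type *t;

    if ( viommu_get_type(type) )
        return -EEXIST;

    t = calloc(1, sizeof(*t));
    if ( !t )
        return -ENOMEM;

    t->type = type;
    t->ops = ops;
    t->next = type_list;
    type_list = t;

    return 0;
}
```

A second registration of the same type value returns -EEXIST, which matches the error path in the patch.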

[Xen-devel] [PATCH V3 1/29] Xen/doc: Add Xen virtual IOMMU doc

2017-09-22 Thread Lan Tianyu
This patch adds the Xen virtual IOMMU document, which introduces the
motivation, framework, vIOMMU hypercall and xl configuration.

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 docs/misc/viommu.txt | 136 +++
 1 file changed, 136 insertions(+)
 create mode 100644 docs/misc/viommu.txt

diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
new file mode 100644
index 000..348e8c4
--- /dev/null
+++ b/docs/misc/viommu.txt
@@ -0,0 +1,136 @@
+Xen virtual IOMMU
+
+Motivation
+==========
+Enable support for more than 128 vCPUs
+
+HPC cloud services currently require VMs with a high number of CPUs in
+order to achieve high performance in parallel computing.
+
+To support >128 vCPUs, x2APIC mode in the guest is necessary because legacy
+APIC (xAPIC) only supports 8-bit APIC IDs. The APIC ID used by Xen is
+CPU ID * 2 (i.e. CPU 127 has APIC ID 254, which is the last one available
+in xAPIC mode), so xAPIC can support at most 128 vCPUs. x2APIC mode
+supports 32-bit APIC IDs, and it requires the interrupt remapping
+functionality of a vIOMMU if the guest wishes to route interrupts to all
+available vCPUs.
+
+The reason is that PCI MSI and the IOAPIC were not modified when x2APIC was
+introduced. PCI MSI/IOAPIC can only send interrupt messages containing an
+8-bit APIC ID, which cannot address CPUs with an APIC ID >254. Interrupt
+remapping supports 32-bit APIC IDs, so it is necessary for >128 vCPU
+support.
+
+
+vIOMMU Architecture
+===================
+The vIOMMU device model is inside the Xen hypervisor for the following reasons:
+1) Avoid round trips between Qemu and Xen hypervisor
+2) Ease of integration with the rest of hypervisor
+3) HVMlite/PVH doesn't use Qemu
+
+* Interrupt remapping overview.
+Interrupts from virtual devices and physical devices are delivered
+to the vLAPIC from the vIOAPIC and vMSI. The vIOMMU needs to remap
+interrupts during this procedure.
+
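++---------------------------------------------------+
+|Qemu                       |VM                     |
+|                           | +----------------+    |
+|                           | |  Device driver |    |
+|                           | +--------+-------+    |
+|                           |          ^            |
+|       +----------------+  | +--------+-------+    |
+|       | Virtual device |  | |  IRQ subsystem |    |
+|       +-------+--------+  | +--------+-------+    |
+|               |           |          ^            |
+|               |           |          |            |
++---------------------------+-----------------------+
+|hypervisor     |                      | VIRQ       |
+|               |            +---------+--------+   |
+|               |            |      vLAPIC      |   |
+|               |VIRQ        +---------+--------+   |
+|               |                      ^            |
+|               |                      |            |
+|               |            +---------+--------+   |
+|               |            |      vIOMMU      |   |
+|               |            +---------+--------+   |
+|               |                      ^            |
+|               |                      |            |
+|               |            +---------+--------+   |
+|               |            |   vIOAPIC/vMSI   |   |
+|               |            +----+----+--------+   |
+|               |                 ^    ^            |
+|               +-----------------+    |            |
+|                                      |            |
++---------------------------------------------------+
+HW                                     |IRQ
+                             +-------------------+
+                             |   PCI Device      |
+                             +-------------------+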
+
+
+vIOMMU hypercall
+================
+Introduce a new domctl hypercall "xen_domctl_viommu_op" to create/destroy
+vIOMMUs.
+
+* vIOMMU hypercall parameter structure
+
+/* vIOMMU type - specify vendor vIOMMU device model */
+#define VIOMMU_TYPE_INTEL_VTD 0
+
+/* vIOMMU capabilities */
+#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
+
+struct xen_domctl_viommu_op {
+uint32_t cmd;
+#define XEN_DOMCTL_create_viommu  0
+#define XEN_DOMCTL_destroy_viommu 1
+union {
+struct {
+/* IN - vIOMMU type  */
+uint64_t viommu_type;
+/* IN - MMIO base address of vIOMMU. */
+uint64_t base_address;
+/* IN - Capabilities with which we want to create */
+uint64_t capabilities;
+/* OUT - vIOMMU identity */
+uint32_t viommu_id;
+} create_viommu;
+
+struct {
+/* IN - vIOMMU identity */
+uint32_t viommu_id;
+} destroy_viommu;
+} u;
+};
+
+- XEN_DOMCTL_create_viommu
+Create vIOMMU device with vIOMMU_type, capabilities and MMIO base
+address. Hypervisor allocates viommu_id for
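
Editorial sketch: the 128-vCPU limit described in the Motivation section of the document above follows from simple arithmetic, which can be checked with a few lines of C. The helper names are hypothetical, and treating 0xFF as the xAPIC broadcast ID is an assumption stated here for illustration; the document itself only says that APIC ID 254 is the last one available in xAPIC mode.

```c
#include <stdbool.h>

/* Assumed: 0xFF is reserved (broadcast) in xAPIC mode, so 254 is the last ID. */
#define XAPIC_BROADCAST_ID 0xFFu

/* APIC ID assignment described in the text: APIC ID = vCPU ID * 2. */
static unsigned int vcpu_to_apic_id(unsigned int vcpu_id)
{
    return vcpu_id * 2;
}

/* An 8-bit destination field can only name APIC IDs below the broadcast ID. */
static bool reachable_in_xapic(unsigned int vcpu_id)
{
    return vcpu_to_apic_id(vcpu_id) < XAPIC_BROADCAST_ID;
}

/* Count how many vCPUs, starting from vCPU 0, are addressable in xAPIC mode. */
static unsigned int max_xapic_vcpus(void)
{
    unsigned int n = 0;

    while ( reachable_in_xapic(n) )
        n++;

    return n;
}
```

vCPU 127 maps to APIC ID 254 and is still reachable; vCPU 128 maps to 256, which no longer fits in 8 bits, so interrupt remapping is needed from that point on.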

[Xen-devel] [PATCH V3 3/29] DOMCTL: Introduce new DOMCTL commands for vIOMMU support

2017-09-22 Thread Lan Tianyu
This patch introduces the create, destroy and query-capabilities
commands for vIOMMU. The vIOMMU layer handles the requests and calls
the arch vIOMMU ops.

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/common/domctl.c |  6 ++
 xen/common/viommu.c | 30 ++
 xen/include/public/domctl.h | 42 ++
 xen/include/xen/viommu.h|  2 ++
 4 files changed, 80 insertions(+)

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 42658e5..7e28237 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1149,6 +1149,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 copyback = 1;
 break;
 
+#ifdef CONFIG_VIOMMU
+case XEN_DOMCTL_viommu_op:
+ret = viommu_domctl(d, &op->u.viommu_op, &copyback);
+break;
+#endif
+
 default:
 ret = arch_do_domctl(op, d, u_domctl);
 break;
diff --git a/xen/common/viommu.c b/xen/common/viommu.c
index 64d91e6..55feb5d 100644
--- a/xen/common/viommu.c
+++ b/xen/common/viommu.c
@@ -133,6 +133,36 @@ static int viommu_create(struct domain *d, uint64_t type,
 return 0;
 }
 
+int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op,
+  bool *need_copy)
+{
+int rc = -EINVAL;
+
+if ( !viommu_enabled() )
+return -ENODEV;
+
+switch ( op->cmd )
+{
+case XEN_DOMCTL_create_viommu:
+rc = viommu_create(d, op->u.create.viommu_type,
+   op->u.create.base_address,
+   op->u.create.capabilities,
+   &op->u.create.viommu_id);
+if ( !rc )
+*need_copy = true;
+break;
+
+case XEN_DOMCTL_destroy_viommu:
+rc = viommu_destroy_domain(d);
+break;
+
+default:
+return -ENOSYS;
+}
+
+return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 50ff58f..68854b6 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -1163,6 +1163,46 @@ struct xen_domctl_psr_cat_op {
 typedef struct xen_domctl_psr_cat_op xen_domctl_psr_cat_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_psr_cat_op_t);
 
+/*  vIOMMU helper
+ *
+ *  vIOMMU interface can be used to create/destroy vIOMMU and
+ *  query vIOMMU capabilities.
+ */
+
+/* vIOMMU type - specify vendor vIOMMU device model */
+#define VIOMMU_TYPE_INTEL_VTD   0
+
+/* vIOMMU capabilities */
+#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
+
+struct xen_domctl_viommu_op {
+uint32_t cmd;
+#define XEN_DOMCTL_create_viommu  0
+#define XEN_DOMCTL_destroy_viommu 1
+union {
+struct {
+/* IN - vIOMMU type */
+uint64_t viommu_type;
+/*
+ * IN - MMIO base address of vIOMMU. vIOMMU device models
+ * are responsible for checking base_address.
+ */
+uint64_t base_address;
+/* IN - Capabilities with which we want to create */
+uint64_t capabilities;
+/* OUT - vIOMMU identity */
+uint32_t viommu_id;
+} create;
+
+struct {
+/* IN - vIOMMU identity */
+uint32_t viommu_id;
+} destroy;
+} u;
+};
+typedef struct xen_domctl_viommu_op xen_domctl_viommu_op;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_viommu_op);
+
 struct xen_domctl {
 uint32_t cmd;
 #define XEN_DOMCTL_createdomain   1
@@ -1240,6 +1280,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_monitor_op77
 #define XEN_DOMCTL_psr_cat_op78
 #define XEN_DOMCTL_soft_reset79
+#define XEN_DOMCTL_viommu_op 80
 #define XEN_DOMCTL_gdbsx_guestmemio1000
 #define XEN_DOMCTL_gdbsx_pausevcpu 1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu   1002
@@ -1302,6 +1343,7 @@ struct xen_domctl {
 struct xen_domctl_psr_cmt_oppsr_cmt_op;
 struct xen_domctl_monitor_opmonitor_op;
 struct xen_domctl_psr_cat_oppsr_cat_op;
+struct xen_domctl_viommu_op viommu_op;
 uint8_t pad[128];
 } u;
 };
diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
index 636a2a3..baa8ab7 100644
--- a/xen/include/xen/viommu.h
+++ b/xen/include/xen/viommu.h
@@ -43,6 +43,8 @@ static inline bool viommu_enabled(void)
 
 int viommu_register_type(uint64_t type, struct viommu_ops *ops);
 int viommu_destroy_domain(struct domain *d);
+int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op,
+  bool_t *need_copy);
 #else
 static inline int viommu_register_type(uint64_t type, struct viommu_ops *ops)
 {
-- 
1.8.3.1
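
Editorial sketch: the dispatch pattern of viommu_domctl() in the patch above (switch on the sub-command, request a copy-back only when create succeeds) can be exercised in a stand-alone model. The structure layout and command values are copied from the patch; struct demo_domain and the demo_* helpers are hypothetical stand-ins, and returning -EEXIST for a second create is an assumption made for this example only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define XEN_DOMCTL_create_viommu  0
#define XEN_DOMCTL_destroy_viommu 1

/* Layout copied from the patch's public header. */
struct xen_domctl_viommu_op {
    uint32_t cmd;
    union {
        struct {
            uint64_t viommu_type;   /* IN */
            uint64_t base_address;  /* IN */
            uint64_t capabilities;  /* IN */
            uint32_t viommu_id;     /* OUT */
        } create;
        struct {
            uint32_t viommu_id;     /* IN */
        } destroy;
    } u;
};

/* Hypothetical stand-in for the per-domain state kept in struct domain. */
struct demo_domain {
    bool has_viommu;
};

/* Stub create: one vIOMMU per domain (assumption), always given ID 0. */
static int demo_viommu_create(struct demo_domain *d, uint32_t *viommu_id)
{
    if ( d->has_viommu )
        return -EEXIST;

    d->has_viommu = true;
    *viommu_id = 0;
    return 0;
}

/* Stub destroy: fails with -EINVAL when nothing was created. */
static int demo_viommu_destroy(struct demo_domain *d)
{
    if ( !d->has_viommu )
        return -EINVAL;

    d->has_viommu = false;
    return 0;
}

/* Dispatch on the sub-command; only a successful create has OUT data. */
static int demo_viommu_domctl(struct demo_domain *d,
                              struct xen_domctl_viommu_op *op,
                              bool *need_copy)
{
    int rc;

    switch ( op->cmd )
    {
    case XEN_DOMCTL_create_viommu:
        rc = demo_viommu_create(d, &op->u.create.viommu_id);
        if ( !rc )
            *need_copy = true;  /* caller copies the OUT field back */
        break;

    case XEN_DOMCTL_destroy_viommu:
        rc = demo_viommu_destroy(d);
        break;

    default:
        rc = -ENOSYS;
        break;
    }

    return rc;
}
```

The caller is expected to clear the copy-back flag before each call, since the dispatcher only ever sets it.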




Re: [Xen-devel] [RFC PATCH V3 1/3] Xen: Increase hap/shadow page pool size to support more vcpus support

2017-09-21 Thread Lan Tianyu
On 2017-09-20 23:13, Wei Liu wrote:
> On Tue, Sep 19, 2017 at 11:06:26AM +0800, Lan Tianyu wrote:
>> Hi Wei:
>>
>> On 2017-09-18 21:06, Wei Liu wrote:
>>> On Wed, Sep 13, 2017 at 12:52:47AM -0400, Lan Tianyu wrote:
>>>> This patch is to increase page pool size when max vcpu number is larger
>>>> than 128.
>>>>
>>>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>>>> ---
>>>>  xen/arch/arm/domain.c|  5 +
>>>>  xen/arch/x86/domain.c| 25 +
>>>>  xen/common/domctl.c  |  3 +++
>>>>  xen/include/xen/domain.h |  2 ++
>>>>  4 files changed, 35 insertions(+)
>>>>
>>>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>>>> index 6512f01..94cf70b 100644
>>>> --- a/xen/arch/arm/domain.c
>>>> +++ b/xen/arch/arm/domain.c
>>>> @@ -824,6 +824,11 @@ int arch_vcpu_reset(struct vcpu *v)
>>>>  return 0;
>>>>  }
>>>>  
>>>> +int arch_domain_set_max_vcpus(struct domain *d)
>>>> +{
>>>> +return 0;
>>>> +}
>>>> +
>>>>  static int relinquish_memory(struct domain *d, struct page_list_head 
>>>> *list)
>>>>  {
>>>>  struct page_info *page, *tmp;
>>>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>>>> index dbddc53..0e230f9 100644
>>>> --- a/xen/arch/x86/domain.c
>>>> +++ b/xen/arch/x86/domain.c
>>>> @@ -1161,6 +1161,31 @@ int arch_vcpu_reset(struct vcpu *v)
>>>>  return 0;
>>>>  }
>>>>  
>>>> +int arch_domain_set_max_vcpus(struct domain *d)
>>>
>>> The name doesn't match what the function does.
>>>
>>
>> I originally hoped to introduce a hook that each arch can use when max
>> vcpus is set. Each arch function can do its own customized work, hence the
>> name "arch_domain_set_max_vcpus".
>>
>> How about "arch_domain_setup_vcpus_resource"?
> 
> Before you go away and do a lot of work, please let us think about if
> this is the right approach first.

Sure. The idea of increasing the page pool when max vcpus is set came from Jan.
Jan, could you help check whether the current patch is the right approach?
Thanks.

> 
> We are close to freeze, with the amount of patches we receive everyday
> RFC patch like this one is low on my (can't speak for others) priority
> list. I am not sure when I will be able to get back to this, but do ping
> us if you want to know where things stand.
> 


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [RFC PATCH V3 1/3] Xen: Increase hap/shadow page pool size to support more vcpus support

2017-09-18 Thread Lan Tianyu
Hi Wei:

On 2017-09-18 21:06, Wei Liu wrote:
> On Wed, Sep 13, 2017 at 12:52:47AM -0400, Lan Tianyu wrote:
>> This patch is to increase page pool size when max vcpu number is larger
>> than 128.
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/arch/arm/domain.c|  5 +
>>  xen/arch/x86/domain.c| 25 +
>>  xen/common/domctl.c  |  3 +++
>>  xen/include/xen/domain.h |  2 ++
>>  4 files changed, 35 insertions(+)
>>
>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index 6512f01..94cf70b 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -824,6 +824,11 @@ int arch_vcpu_reset(struct vcpu *v)
>>  return 0;
>>  }
>>  
>> +int arch_domain_set_max_vcpus(struct domain *d)
>> +{
>> +return 0;
>> +}
>> +
>>  static int relinquish_memory(struct domain *d, struct page_list_head *list)
>>  {
>>  struct page_info *page, *tmp;
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index dbddc53..0e230f9 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1161,6 +1161,31 @@ int arch_vcpu_reset(struct vcpu *v)
>>  return 0;
>>  }
>>  
>> +int arch_domain_set_max_vcpus(struct domain *d)
> 
> The name doesn't match what the function does.
> 

I originally hoped to introduce a hook that each arch can use when max
vcpus is set. Each arch function can do its own customized work, hence the
name "arch_domain_set_max_vcpus".

How about "arch_domain_setup_vcpus_resource"?


>> +{
>> +int ret;
>> +
>> +/* Increase page pool in order to support more vcpus. */
>> +if ( d->max_vcpus > 128 )
>> +{
>> +unsigned long nr_pages;
>> +
>> +if (hap_enabled(d))
> 
> Coding style.

Will update. Thanks.

> 
>> +nr_pages = 1024;
>> +else
>> +nr_pages = 4096;
>> +
>> +ret = paging_set_allocation(d, nr_pages, NULL);
> 
> Does this work on PV guests?


Sorry, this code should not run for PV guests. I will add a domain type
check here.

> 
>> +if ( ret != 0 )
>> +{
>> +paging_set_allocation(d, 0, NULL);
>> +return ret;
>> +}
>> +}
>> +
>> +return 0;
>> +}
>> +
>>  long
>>  arch_do_vcpu_op(
>>  int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index 42658e5..64357a3 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -631,6 +631,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
>> u_domctl)
>>  d->max_vcpus = max;
>>  }
>>  
>> +if ( arch_domain_set_max_vcpus(d) < 0)
> 
> != 0 please.
> 

Sure.

>> +goto maxvcpu_out;
>> +
>>  for ( i = 0; i < max; i++ )
>>  {
>>  if ( d->vcpu[i] != NULL )
>> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
>> index 347f264..e1ece3a 100644
>> --- a/xen/include/xen/domain.h
>> +++ b/xen/include/xen/domain.h
>> @@ -81,6 +81,8 @@ void arch_dump_domain_info(struct domain *d);
>>  
>>  int arch_vcpu_reset(struct vcpu *);
>>  
>> +int arch_domain_set_max_vcpus(struct domain *d);
>> +
>>  extern spinlock_t vcpu_alloc_lock;
>>  bool_t domctl_lock_acquire(void);
>>  void domctl_lock_release(void);
>> -- 
>> 1.8.3.1
>>


-- 
Best regards
Tianyu Lan
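
Editorial sketch: the review points agreed in this reply (skip PV guests, test the return value with != 0) could be folded into the hook roughly as shown below. This is not the follow-up patch. struct demo_domain, its limit field and demo_set_allocation() are hypothetical stand-ins for struct domain and paging_set_allocation(); the function name is the one proposed in the thread, and the page counts and 128-vCPU threshold are the ones from the quoted patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields of struct domain used here. */
struct demo_domain {
    bool is_hvm;               /* false models a PV guest */
    bool hap;                  /* hardware-assisted paging in use */
    unsigned int max_vcpus;
    unsigned long pool_pages;  /* current paging pool size */
    unsigned long limit;       /* largest pool the stub will accept */
};

/* Stub for paging_set_allocation(): fails when the request is too large. */
static int demo_set_allocation(struct demo_domain *d, unsigned long pages)
{
    if ( pages > d->limit )
        return -ENOMEM;

    d->pool_pages = pages;
    return 0;
}

/* Grow the paging pool for large HVM guests; leave everything else alone. */
static int demo_arch_domain_setup_vcpus_resource(struct demo_domain *d)
{
    unsigned long nr_pages;
    int ret;

    /* Domain type check requested in review, plus the 128-vCPU threshold. */
    if ( !d->is_hvm || d->max_vcpus <= 128 )
        return 0;

    nr_pages = d->hap ? 1024 : 4096;

    ret = demo_set_allocation(d, nr_pages);
    if ( ret != 0 )
    {
        /* Same rollback as the quoted patch: drop the pool to zero. */
        demo_set_allocation(d, 0);
        return ret;
    }

    return 0;
}
```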



[Xen-devel] [RFC PATCH V3 2/3] Tool/ACPI: DSDT extension to support more vcpus

2017-09-13 Thread Lan Tianyu
This patch changes the processor objects in the DSDT table to support >128
vcpus, according to ACPI spec section 8.4, Declaring Processors.

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 tools/libacpi/mk_dsdt.c | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 2daf32c..09c1529 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -24,6 +24,8 @@
 #include 
 #endif
 
+#define CPU_NAME_FMT  "P%.03X"
+
 static unsigned int indent_level;
 static bool debug = false;
 
@@ -196,10 +198,27 @@ int main(int argc, char **argv)
 /* Define processor objects and control methods. */
 for ( cpu = 0; cpu < max_cpus; cpu++)
 {
-push_block("Processor", "PR%02X, %d, 0xb010, 0x06", cpu, cpu);
 
-stmt("Name", "_HID, \"ACPI0007\"");
+#ifdef CONFIG_X86
+unsigned int apic_id = cpu * 2;
+
+if ( apic_id > 254 )
+{
+push_block("Device", CPU_NAME_FMT, cpu);
+}
+else
+#endif
+{
+if (cpu > 255)
+{
+fprintf(stderr, "Exceed the range of processor ID \n");
+return -1;
+}
+push_block("Processor", CPU_NAME_FMT ", %d,0xb010, 0x06",
+   cpu, cpu);
+}
 
+stmt("Name", "_HID, \"ACPI0007\"");
 stmt("Name", "_UID, %d", cpu);
 #ifdef CONFIG_ARM_64
 pop_block();
@@ -268,15 +287,15 @@ int main(int argc, char **argv)
 /* Extract current CPU's status: 0=offline; 1=online. */
 stmt("And", "Local1, 1, Local2");
 /* Check if status is up-to-date in the relevant MADT LAPIC entry... */
-push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu);
+push_block("If", "LNotEqual(Local2, \\_SB." CPU_NAME_FMT ".FLG)", cpu);
 /* ...If not, update it and the MADT checksum, and notify OSPM. */
-stmt("Store", "Local2, \\_SB.PR%02X.FLG", cpu);
+stmt("Store", "Local2, \\_SB." CPU_NAME_FMT ".FLG", cpu);
 push_block("If", "LEqual(Local2, 1)");
-stmt("Notify", "PR%02X, 1", cpu); /* Notify: Device Check */
+stmt("Notify", CPU_NAME_FMT ", 1", cpu); /* Notify: Device Check */
 stmt("Subtract", "\\_SB.MSU, 1, \\_SB.MSU"); /* Adjust MADT csum */
 pop_block();
 push_block("Else", NULL);
-stmt("Notify", "PR%02X, 3", cpu); /* Notify: Eject Request */
+stmt("Notify", CPU_NAME_FMT ", 3", cpu); /* Notify: Eject Request */
 stmt("Add", "\\_SB.MSU, 1, \\_SB.MSU"); /* Adjust MADT csum */
 pop_block();
 pop_block();
-- 
1.8.3.1
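
Editorial sketch: the naming and declaration choice made by the patch above can be tried in isolation. The format string and the APIC ID > 254 test are taken from the patch; the helper names are hypothetical, and the sketch does not reproduce what push_block() actually emits.

```c
#include <stdbool.h>
#include <stdio.h>

#define CPU_NAME_FMT "P%.03X"   /* same format string as the patch */

/* Mapping used by the patch: APIC ID = CPU index * 2. */
static unsigned int cpu_to_apic_id(unsigned int cpu)
{
    return cpu * 2;
}

/* True when the CPU must be declared with Device() instead of Processor(). */
static bool uses_device_syntax(unsigned int cpu)
{
    return cpu_to_apic_id(cpu) > 254;
}

/* Write the ACPI object name for a CPU; returns snprintf()'s result. */
static int cpu_name(char *buf, size_t len, unsigned int cpu)
{
    return snprintf(buf, len, CPU_NAME_FMT, cpu);
}
```

With three hexadecimal digits after the leading "P", the name stays at four characters for CPU indexes up to 0xFFF, whereas the old "PR%02X" format only has room for indexes up to 0xFF.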




[Xen-devel] [RFC PATCH V3 3/3] hvmload: Add x2apic entry support in the MADT build

2017-09-13 Thread Lan Tianyu
This patch is to add x2apic entry support for ACPI MADT table
according to ACPI spec 5.2.12.12 Processor Local x2APIC Structure.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 tools/libacpi/acpi2_0.h | 10 +
 tools/libacpi/build.c   | 56 -
 2 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 2619ba3..ada5131 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -322,6 +322,7 @@ struct acpi_20_waet {
 #define ACPI_IO_SAPIC   0x06
 #define ACPI_PROCESSOR_LOCAL_SAPIC  0x07
 #define ACPI_PLATFORM_INTERRUPT_SOURCES 0x08
+#define ACPI_PROCESSOR_LOCAL_X2APIC 0x09
 
 /*
  * APIC Structure Definitions.
@@ -338,6 +339,15 @@ struct acpi_20_madt_lapic {
 uint32_t flags;
 };
 
+struct acpi_20_madt_x2apic {
+uint8_t  type;
+uint8_t  length;
+uint16_t reserved;  /* reserved - must be zero */
+uint32_t apic_id;   /* Processor x2APIC ID  */
+uint32_t flags;
+uint32_t acpi_processor_id; /* ACPI processor UID */
+};
+
 /*
  * Local APIC Flags.  All other bits are reserved and must be 0.
  */
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index f9881c9..4830339 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -78,9 +78,9 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt 
*ctxt,
 struct acpi_20_madt   *madt;
 struct acpi_20_madt_intsrcovr *intsrcovr;
 struct acpi_20_madt_ioapic*io_apic;
-struct acpi_20_madt_lapic *lapic;
 const struct hvm_info_table   *hvminfo = config->hvminfo;
 int i, sz;
+void *end;
 
 if ( config->lapic_id == NULL )
 return NULL;
@@ -88,7 +88,14 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt 
*ctxt,
 sz  = sizeof(struct acpi_20_madt);
 sz += sizeof(struct acpi_20_madt_intsrcovr) * 16;
 sz += sizeof(struct acpi_20_madt_ioapic);
-sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
+
+for ( i = 0; i < hvminfo->nr_vcpus; i++ )
+{
+if ( config->lapic_id(i) > 254)
+sz += sizeof(struct acpi_20_madt_x2apic);
+else
+sz += sizeof(struct acpi_20_madt_lapic);
+}
 
 madt = ctxt->mem_ops.alloc(ctxt, sz, 16);
 if (!madt) return NULL;
@@ -142,27 +149,46 @@ static struct acpi_20_madt *construct_madt(struct 
acpi_ctxt *ctxt,
 io_apic->ioapic_id   = config->ioapic_id;
 io_apic->ioapic_addr = config->ioapic_base_address;
 
-lapic = (struct acpi_20_madt_lapic *)(io_apic + 1);
+end = (struct acpi_20_madt_lapic *)(io_apic + 1);
 }
 else
-lapic = (struct acpi_20_madt_lapic *)(madt + 1);
+end = (struct acpi_20_madt_lapic *)(madt + 1);
 
 info->nr_cpus = hvminfo->nr_vcpus;
-info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
+info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, end);
+
 for ( i = 0; i < hvminfo->nr_vcpus; i++ )
 {
-memset(lapic, 0, sizeof(*lapic));
-lapic->type= ACPI_PROCESSOR_LOCAL_APIC;
-lapic->length  = sizeof(*lapic);
-/* Processor ID must match processor-object IDs in the DSDT. */
-lapic->acpi_processor_id = i;
-lapic->apic_id = config->lapic_id(i);
-lapic->flags = (test_bit(i, hvminfo->vcpu_online)
-? ACPI_LOCAL_APIC_ENABLED : 0);
-lapic++;
+unsigned int apic_id = config->lapic_id(i);
+
+if ( apic_id < 255 ) {
+struct acpi_20_madt_lapic *lapic = end;
+
+memset(lapic, 0, sizeof(*lapic));
+lapic->type= ACPI_PROCESSOR_LOCAL_APIC;
+lapic->length  = sizeof(*lapic);
+/* Processor ID must match processor-object IDs in the DSDT. */
+lapic->acpi_processor_id = i;
+lapic->apic_id = apic_id;
+lapic->flags = test_bit(i, hvminfo->vcpu_online)
+? ACPI_LOCAL_APIC_ENABLED : 0;
+end = ++lapic;
+} else {
+struct acpi_20_madt_x2apic *lapic = end;
+
+memset(lapic, 0, sizeof(*lapic));
+lapic->type= ACPI_PROCESSOR_LOCAL_X2APIC;
+lapic->length  = sizeof(*lapic);
+/* Processor ID must match processor-object IDs in the DSDT. */
+lapic->acpi_processor_id = i;
+lapic->apic_id = apic_id;
+lapic->flags =  test_bit(i, hvminfo->vcpu_online)
+? ACPI_LOCAL_APIC_ENABLED : 0;
+end = ++lapic;
+}
 }
 
-madt->header.length = (unsigned char *)lapic - (unsigned char *)madt;
+madt->header.length = (unsigned char *)end - (unsigned char *)madt;
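
Editorial sketch: the per-vCPU sizing loop added to construct_madt() above can be checked with a stand-alone calculation. The x2APIC entry layout is the one from the patch; the 8-byte local APIC entry layout follows the ACPI MADT definition and is written out here as an assumption, since the thread does not show it in full. demo_lapic_id() is a hypothetical callback using the APIC ID = vCPU * 2 rule from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Processor Local APIC entry (8 bytes); layout assumed from the ACPI spec. */
struct madt_lapic {
    uint8_t  type;
    uint8_t  length;
    uint8_t  acpi_processor_id;
    uint8_t  apic_id;
    uint32_t flags;
};

/* Processor Local x2APIC entry (16 bytes); layout as in the patch. */
struct madt_x2apic {
    uint8_t  type;
    uint8_t  length;
    uint16_t reserved;
    uint32_t apic_id;
    uint32_t flags;
    uint32_t acpi_processor_id;
};

/* Hypothetical callback: APIC ID = vCPU index * 2. */
static uint32_t demo_lapic_id(unsigned int vcpu)
{
    return vcpu * 2;
}

/* Sum the size of the per-vCPU entries, choosing the type by APIC ID. */
static size_t madt_cpu_entries_size(unsigned int nr_vcpus,
                                    uint32_t (*lapic_id)(unsigned int))
{
    size_t sz = 0;
    unsigned int i;

    for ( i = 0; i < nr_vcpus; i++ )
    {
        if ( lapic_id(i) > 254 )
            sz += sizeof(struct madt_x2apic);
        else
            sz += sizeof(struct madt_lapic);
    }

    return sz;
}
```

For 200 vCPUs this gives 128 local APIC entries and 72 x2APIC entries, i.e. 128 * 8 + 72 * 16 = 2176 bytes for the per-vCPU part of the table.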
  

[Xen-devel] [RFC PATCH V3 0/3] Extend resources to support more vcpus in single VM.

2017-09-13 Thread Lan Tianyu
Change since v2:
   1) Increase page pool size when setting max vcpus
   2) Allocate the MADT table size according to the APIC ID of each vcpu
   3) Fix some code style issues.

Change since v1:
   1) Increase hap page pool according to vcpu number
   2) Use "Processor" syntax to define vcpus with APIC ID < 255
and "Device" syntax for other vcpus in the ACPI DSDT table.
   3) Use the xAPIC structure for vcpus with APIC ID < 255
and the x2APIC structure for other vcpus in the ACPI MADT table.

This patchset extends some resources (e.g. event channel,
hap and so on) to support more vcpus in a single VM.


Lan Tianyu (3):
  Xen: Increase hap/shadow page pool size to support more vcpus support
  Tool/ACPI: DSDT extension to support more vcpus
  hvmload: Add x2apic entry support in the MADT build

 tools/libacpi/acpi2_0.h  | 10 +
 tools/libacpi/build.c| 56 +++-
 tools/libacpi/mk_dsdt.c  | 31 +--
 xen/arch/arm/domain.c|  5 +
 xen/arch/x86/domain.c| 25 +
 xen/common/domctl.c  |  3 +++
 xen/include/xen/domain.h |  2 ++
 7 files changed, 111 insertions(+), 21 deletions(-)

-- 
1.8.3.1




[Xen-devel] [RFC PATCH V3 1/3] Xen: Increase hap/shadow page pool size to support more vcpus support

2017-09-13 Thread Lan Tianyu
This patch is to increase page pool size when max vcpu number is larger
than 128.

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/arch/arm/domain.c|  5 +
 xen/arch/x86/domain.c| 25 +
 xen/common/domctl.c  |  3 +++
 xen/include/xen/domain.h |  2 ++
 4 files changed, 35 insertions(+)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 6512f01..94cf70b 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -824,6 +824,11 @@ int arch_vcpu_reset(struct vcpu *v)
 return 0;
 }
 
+int arch_domain_set_max_vcpus(struct domain *d)
+{
+return 0;
+}
+
 static int relinquish_memory(struct domain *d, struct page_list_head *list)
 {
 struct page_info *page, *tmp;
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dbddc53..0e230f9 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1161,6 +1161,31 @@ int arch_vcpu_reset(struct vcpu *v)
 return 0;
 }
 
+int arch_domain_set_max_vcpus(struct domain *d)
+{
+int ret;
+
+/* Increase page pool in order to support more vcpus. */
+if ( d->max_vcpus > 128 )
+{
+unsigned long nr_pages;
+
+if (hap_enabled(d))
+nr_pages = 1024;
+else
+nr_pages = 4096;
+
+ret = paging_set_allocation(d, nr_pages, NULL);
+if ( ret != 0 )
+{
+paging_set_allocation(d, 0, NULL);
+return ret;
+}
+}
+
+return 0;
+}
+
 long
 arch_do_vcpu_op(
 int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 42658e5..64357a3 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -631,6 +631,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
u_domctl)
 d->max_vcpus = max;
 }
 
+if ( arch_domain_set_max_vcpus(d) < 0)
+goto maxvcpu_out;
+
 for ( i = 0; i < max; i++ )
 {
 if ( d->vcpu[i] != NULL )
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 347f264..e1ece3a 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -81,6 +81,8 @@ void arch_dump_domain_info(struct domain *d);
 
 int arch_vcpu_reset(struct vcpu *);
 
+int arch_domain_set_max_vcpus(struct domain *d);
+
 extern spinlock_t vcpu_alloc_lock;
 bool_t domctl_lock_acquire(void);
 void domctl_lock_release(void);
-- 
1.8.3.1




Re: [Xen-devel] [RFC PATCH V2 2/4] Tool/ACPI: DSDT extension to support more vcpus

2017-09-04 Thread Lan Tianyu
On 2017-09-04 17:05, Roger Pau Monné wrote:
> On Mon, Sep 04, 2017 at 11:07:14AM +0800, Lan Tianyu wrote:
>> On 2017-09-01 17:41, Roger Pau Monné wrote:
>>> On Fri, Sep 01, 2017 at 10:54:02AM +0800, Lan Tianyu wrote:
>>>> On 2017-08-31 23:38, Roger Pau Monné wrote:
>>>>> On Thu, Aug 31, 2017 at 01:01:47AM -0400, Lan Tianyu wrote:
>>>>>> This patch is to change DSDT table for processor object to support >128
>>>>>> vcpus according to ACPI spec 8.4 Declaring Processors
>>>>>>
>>>>>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>>>>>> ---
>>>>>>  tools/libacpi/mk_dsdt.c | 18 --
>>>>>>  1 file changed, 12 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
>>>>>> index 2daf32c..6c4c325 100644
>>>>>> --- a/tools/libacpi/mk_dsdt.c
>>>>>> +++ b/tools/libacpi/mk_dsdt.c
>>>>>> @@ -24,6 +24,8 @@
>>>>>>  #include 
>>>>>>  #endif
>>>>>>  
>>>>>> +#define CPU_NAME_FMT  "P%.03X"
>>>>>> +
>>>>>>  static unsigned int indent_level;
>>>>>>  static bool debug = false;
>>>>>>  
>>>>>> @@ -196,10 +198,14 @@ int main(int argc, char **argv)
>>>>>>  /* Define processor objects and control methods. */
>>>>>>  for ( cpu = 0; cpu < max_cpus; cpu++)
>>>>>>  {
>>>>>> -push_block("Processor", "PR%02X, %d, 0xb010, 0x06", cpu, cpu);
>>>>>> +unsigned int apic_id = cpu * 2;
>>>>>
>>>>> This is fragile, ideally there should be a single point where the APIC
>>>>> ID is calculated. Although there are already two places where the APIC
>>>>> ID is calculated, in hvmloader and libxl.
>>>>>
>>>>> And I'm not sure how to use any of those here in order to avoid
>>>>> introducing a third one.
>>>>
>>>> mk_dsdt is an independent tool for building the DSDT table. It isn't linked
>>>> with libxl or hvmloader, so we can't reuse the existing functions.
>>>>
>>>> But I think we could introduce a new LAPIC_ID(vcpu) in the arch header
>>>> file (i.e., #include ) and replace the old ones.
>>>
>>> There's already a LAPIC_ID macro in hvmloader headers which should be
>>> placed somewhere suitable.
>>
>> Yes, this is what I mentioned.
> 
> Jan has expressed some concerns with removing the hook, see:
> 
> <59a94e32027800176...@prv-mh.provo.novell.com>

So we still need to introduce LAPIC_ID() here, right?

> 
>>> What about removing the lapic_id hook from
>>> acpi_config and placing the LAPIC_ID macro in the libacpi.h header?
>>
>> I think this should be arch specific. I am not sure whether ARM follows
>> the "apic_id = vcpu_id * 2" rule.
>>
>> Julien, could you give some input? Thanks.
> 
> AFAIK ARM doesn't have a local APIC, so there are no xAPIC/x2APIC
> entries in the ARM MADT.
> 




-- 
Best regards
Tianyu Lan
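
Editorial sketch: the concern in this exchange is that the vCPU-to-APIC-ID mapping would be computed in a third place. One way to picture the "single definition" idea is a macro that both the DSDT generator and the MADT builder consult. The macro name LAPIC_ID comes from the thread; its placement in a shared header, its exact definition here, and the two consumer helpers are assumptions for illustration only.

```c
#include <stdbool.h>

/* Hypothetical shared-header content: one definition of the mapping. */
#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)

/* Consumer 1: the DSDT generator picks Device() for IDs beyond 254. */
static bool dsdt_needs_device_syntax(unsigned int vcpu_id)
{
    return LAPIC_ID(vcpu_id) > 254;
}

/* Consumer 2: the MADT builder picks an x2APIC entry for the same IDs. */
static bool madt_needs_x2apic_entry(unsigned int vcpu_id)
{
    return LAPIC_ID(vcpu_id) > 254;
}

/* Check that both consumers agree for every vCPU below nr_vcpus. */
static bool tables_agree(unsigned int nr_vcpus)
{
    unsigned int i;

    for ( i = 0; i < nr_vcpus; i++ )
        if ( dsdt_needs_device_syntax(i) != madt_needs_x2apic_entry(i) )
            return false;

    return true;
}
```

Because both helpers derive their answer from the same macro, a change to the mapping cannot leave the DSDT and the MADT disagreeing about which vCPUs need the extended form.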



Re: [Xen-devel] [RFC PATCH V2 3/4] hvmload: Add x2apic entry support in the MADT build

2017-09-04 Thread Lan Tianyu
On 2017-09-01 17:57, Roger Pau Monné wrote:
> On Thu, Aug 31, 2017 at 01:01:48AM -0400, Lan Tianyu wrote:
>> This patch is to add x2apic entry support for ACPI MADT table
>> according to ACPI spec 5.2.12.12 Processor Local x2APIC Structure
>>
>> Signed-off-by: Chao Gao <chao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  tools/libacpi/acpi2_0.h | 10 +
>>  tools/libacpi/build.c   | 59 
>> +++--
>>  2 files changed, 52 insertions(+), 17 deletions(-)
>>
>> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
>> index 758a823..caa3682 100644
>> --- a/tools/libacpi/acpi2_0.h
>> +++ b/tools/libacpi/acpi2_0.h
>> @@ -322,6 +322,7 @@ struct acpi_20_waet {
>>  #define ACPI_IO_SAPIC   0x06
>>  #define ACPI_PROCESSOR_LOCAL_SAPIC  0x07
>>  #define ACPI_PLATFORM_INTERRUPT_SOURCES 0x08
>> +#define ACPI_PROCESSOR_LOCAL_X2APIC 0x09
>>  
>>  /*
>>   * APIC Structure Definitions.
>> @@ -338,6 +339,15 @@ struct acpi_20_madt_lapic {
>>  uint32_t flags;
>>  };
>>  
>> +struct acpi_20_madt_x2apic {
>> +uint8_t  type;
>> +uint8_t  length;
>> +uint16_t reserved;  /* reserved - must be zero */
>> +uint32_t apic_id;   /* Processor x2APIC ID  */
>> +uint32_t flags;
>> +uint32_t acpi_processor_id; /* ACPI processor UID */
>> +};
>> +
>>  /*
>>   * Local APIC Flags.  All other bits are reserved and must be 0.
>>   */
>> diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
>> index c7cc784..0c95850 100644
>> --- a/tools/libacpi/build.c
>> +++ b/tools/libacpi/build.c
>> @@ -82,9 +82,9 @@ static struct acpi_20_madt *construct_madt(struct 
>> acpi_ctxt *ctxt,
>>  struct acpi_20_madt   *madt;
>>  struct acpi_20_madt_intsrcovr *intsrcovr;
>>  struct acpi_20_madt_ioapic*io_apic;
>> -struct acpi_20_madt_lapic *lapic;
>>  const struct hvm_info_table   *hvminfo = config->hvminfo;
>>  int i, sz;
>> +void *end;
>>  
>>  if ( config->lapic_id == NULL )
>>  return NULL;
>> @@ -92,7 +92,12 @@ static struct acpi_20_madt *construct_madt(struct 
>> acpi_ctxt *ctxt,
>>  sz  = sizeof(struct acpi_20_madt);
>>  sz += sizeof(struct acpi_20_madt_intsrcovr) * 16;
>>  sz += sizeof(struct acpi_20_madt_ioapic);
>> -sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
>> +
>> +if (hvminfo->nr_vcpus < 128)
> 
> This should be done based on APIC ID.

The problem is then how to get the max APIC ID. Should we use the max
vcpu index to compute it?

> 
>> +sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
>> +else
>> +sz += sizeof(struct acpi_20_madt_lapic) * 128 +
>> +  sizeof(struct acpi_20_madt_x2apic) * (hvminfo->nr_vcpus - 128);
>>  
>>  madt = ctxt->mem_ops.alloc(ctxt, sz, 16);
>>  if (!madt) return NULL;
>> @@ -109,7 +114,7 @@ static struct acpi_20_madt *construct_madt(struct 
>> acpi_ctxt *ctxt,
>>  madt->flags  = ACPI_PCAT_COMPAT;
>>  
>>  if ( config->table_flags & ACPI_HAS_IOAPIC )
>> -{ 
>> +{
> 
> Spurious cleanup?
> 
>>  intsrcovr = (struct acpi_20_madt_intsrcovr *)(madt + 1);
>>  for ( i = 0; i < 16; i++ )
>>  {
>> @@ -146,27 +151,47 @@ static struct acpi_20_madt *construct_madt(struct 
>> acpi_ctxt *ctxt,
>>  io_apic->ioapic_id   = config->ioapic_id;
>>  io_apic->ioapic_addr = config->ioapic_base_address;
>>  
>> -lapic = (struct acpi_20_madt_lapic *)(io_apic + 1);
>> +end = (struct acpi_20_madt_lapic *)(io_apic + 1);
>>  }
>>  else
>> -lapic = (struct acpi_20_madt_lapic *)(madt + 1);
>> +end = (struct acpi_20_madt_lapic *)(madt + 1);
>> +
>> +info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, end);
>>  
>> -info->nr_cpus = hvminfo->nr_vcpus;
> 
> Why are you moving this? AFAICT the value of nr_vcpus is not changed,
> so you might as well leave it as-is.

OK.

> 
>> -info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
>>  for ( i = 0; i < hvminfo->nr_vcpus; i++ )
>>  {
>> -memset(lapic, 0, sizeof(*lapic));
>> -lapic->type= ACPI_PROCESSOR_LOCAL_APIC;
>> -la

Re: [Xen-devel] [RFC PATCH V2 2/4] Tool/ACPI: DSDT extension to support more vcpus

2017-09-03 Thread Lan Tianyu
On 2017年09月01日 17:41, Roger Pau Monné wrote:
> On Fri, Sep 01, 2017 at 10:54:02AM +0800, Lan Tianyu wrote:
>> On 2017年08月31日 23:38, Roger Pau Monné wrote:
>>> On Thu, Aug 31, 2017 at 01:01:47AM -0400, Lan Tianyu wrote:
>>>> This patch is to change DSDT table for processor object to support >128 
>>>> vcpus
>>>> according to ACPI spec 8.4 Declaring Processors
>>>>
>>>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>>>> ---
>>>>  tools/libacpi/mk_dsdt.c | 18 --
>>>>  1 file changed, 12 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
>>>> index 2daf32c..6c4c325 100644
>>>> --- a/tools/libacpi/mk_dsdt.c
>>>> +++ b/tools/libacpi/mk_dsdt.c
>>>> @@ -24,6 +24,8 @@
>>>>  #include 
>>>>  #endif
>>>>  
>>>> +#define CPU_NAME_FMT  "P%.03X"
>>>> +
>>>>  static unsigned int indent_level;
>>>>  static bool debug = false;
>>>>  
>>>> @@ -196,10 +198,14 @@ int main(int argc, char **argv)
>>>>  /* Define processor objects and control methods. */
>>>>  for ( cpu = 0; cpu < max_cpus; cpu++)
>>>>  {
>>>> -push_block("Processor", "PR%02X, %d, 0xb010, 0x06", cpu, cpu);
>>>> +unsigned int apic_id = cpu * 2;
>>>
>>> This is fragile, ideally there should be a single point where the APIC
>>> ID is calculated. Although there are already two places where the APIC
>>> ID is calculated, in hvmloader and libxl.
>>>
>>> And I'm not sure how to use any of those here in order to avoid
>>> introducing a third one.
>>
>> mk_dsdt is an independent tool that builds the DSDT table. It isn't linked
>> with libxl or hvmloader, so we can't reuse the existing functions there.
>>
>> But I think we could introduce a new LAPIC_ID(vcpu) in the arch header
>> file (i.e. #include ) and replace the old ones.
> 
> There's already a LAPIC_ID macro in hvmloader headers which should be
> placed somewhere suitable.

Yes, this is what I mentioned.

> What about removing the lapic_id hook from
> acpi_config and placing the LAPIC_ID macro in the libacpi.h header?

I think this should be arch-specific. I am not sure whether ARM follows
the rule "apic_id = vcpu_id * 2".

Julien, could you give some input? Thanks.



> 
> I'm not sure why lapic_id needs to be a hook in any case, both its
> callers use the exact same formula (cpu_id * 2).
> 
> Thanks, Roger.
> 


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [RFC PATCH V2 1/4] xen/hap: Increase hap page pool size for more vcpus support

2017-09-01 Thread Lan Tianyu
On 2017年09月01日 16:34, Jan Beulich wrote:
>>>> On 01.09.17 at 10:19, <tianyu@intel.com> wrote:
>> On 2017年08月31日 21:56, Andrew Cooper wrote:
>>> On 31/08/17 06:01, Lan Tianyu wrote:
>>>> This patch is to increase hap page pool size to support more vcpus in 
>>>> single 
>> VM.
>>>>
>>>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>>>> ---
>>>>  xen/arch/x86/mm/hap/hap.c | 10 +-
>>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>>>> index cdc77a9..96a7ed0 100644
>>>> --- a/xen/arch/x86/mm/hap/hap.c
>>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>>> @@ -464,6 +464,7 @@ void hap_domain_init(struct domain *d)
>>>>  int hap_enable(struct domain *d, u32 mode)
>>>>  {
>>>>  unsigned int old_pages;
>>>> +unsigned int pages;
>>>>  unsigned int i;
>>>>  int rv = 0;
>>>>  
>>>> @@ -473,7 +474,14 @@ int hap_enable(struct domain *d, u32 mode)
>>>>  if ( old_pages == 0 )
>>>>  {
>>>>  paging_lock(d);
>>>> -rv = hap_set_allocation(d, 256, NULL);
>>>> +
>>>> +/* Increase hap page pool with max vcpu number. */
>>>> +if ( d->max_vcpus > 128 )
>>>> +pages = 256;
>>>> +else
>>>> +pages = 512;
>>>> +
>>>> +rv = hap_set_allocation(d, pages, NULL);
>>>
>>> What effect is this intended to have?  hap_enable() is always called
>>> when d->max_vcpus is 0.
>>
>> Sorry, I didn't notice that max_vcpus isn't set at that point. I hoped to
>> allocate HAP pages according to the vCPU count. So we don't know how many
>> vCPUs will be used when the HAP pages are allocated during domain creation,
>> right? If so, we have to increase the page count unconditionally.
> 
> But that you were already told isn't really acceptable. Did you
> consider calling hap_set_allocation() another time once vCPU
> count was set?
> 

That sounds feasible. I will try it. Thanks.
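
A rough sketch of the idea: derive the pool size from the vCPU count in one
helper, so the same rule can be applied again once the count is known. The
helper names are hypothetical and the 256/512 page figures are only the ones
from the patch quoted above; in Xen the result would still be applied via
hap_set_allocation() under the paging lock. Note that the quoted hunk assigns
256 when max_vcpus > 128 and 512 otherwise, which looks inverted relative to
the commit message; the sketch gives the larger pool to the larger guest.

```c
#include <assert.h>

#define HAP_POOL_PAGES_DEFAULT 256u  /* value hap_enable() passed before the patch */
#define HAP_POOL_PAGES_LARGE   512u  /* larger value taken from the patch */

/* Hypothetical helper: pages wanted for a domain with max_vcpus vCPUs.
 * max_vcpus is 0 while hap_enable() runs, which yields the default. */
static unsigned int hap_pool_pages(unsigned int max_vcpus)
{
    return (max_vcpus > 128) ? HAP_POOL_PAGES_LARGE : HAP_POOL_PAGES_DEFAULT;
}

/* Extra pages needed once the vCPU count is set; the pool never shrinks. */
static unsigned int hap_pool_extra_pages(unsigned int cur_pages,
                                         unsigned int max_vcpus)
{
    unsigned int want = hap_pool_pages(max_vcpus);

    return (want > cur_pages) ? want - cur_pages : 0;
}
```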


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [RFC PATCH V2 1/4] xen/hap: Increase hap page pool size for more vcpus support

2017-09-01 Thread Lan Tianyu
On 2017年08月31日 21:56, Andrew Cooper wrote:
> On 31/08/17 06:01, Lan Tianyu wrote:
>> This patch is to increase hap page pool size to support more vcpus in single 
>> VM.
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/arch/x86/mm/hap/hap.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>> index cdc77a9..96a7ed0 100644
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -464,6 +464,7 @@ void hap_domain_init(struct domain *d)
>>  int hap_enable(struct domain *d, u32 mode)
>>  {
>>  unsigned int old_pages;
>> +unsigned int pages;
>>  unsigned int i;
>>  int rv = 0;
>>  
>> @@ -473,7 +474,14 @@ int hap_enable(struct domain *d, u32 mode)
>>  if ( old_pages == 0 )
>>  {
>>  paging_lock(d);
>> -rv = hap_set_allocation(d, 256, NULL);
>> +
>> +/* Increase hap page pool with max vcpu number. */
>> +if ( d->max_vcpus > 128 )
>> +pages = 256;
>> +else
>> +pages = 512;
>> +
>> +rv = hap_set_allocation(d, pages, NULL);
> 
> What effect is this intended to have?  hap_enable() is always called
> when d->max_vcpus is 0.

Sorry, I didn't notice that max_vcpus isn't set at that point. I hoped to
allocate HAP pages according to the vCPU count. So we don't know how many
vCPUs will be used when the HAP pages are allocated during domain creation,
right? If so, we have to increase the page count unconditionally.

> 
> d->max_vcpus isn't chosen until a subsequent hypercall.  (This is one of
> many unexpected surprises from multi-vcpu support having been hacked on
> the side of existing Xen support, rather than being built in to the
> createdomain hypercall).
> 
> ~Andrew
> 


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [RFC PATCH V2 2/4] Tool/ACPI: DSDT extension to support more vcpus

2017-08-31 Thread Lan Tianyu
On 2017年08月31日 23:38, Roger Pau Monné wrote:
> On Thu, Aug 31, 2017 at 01:01:47AM -0400, Lan Tianyu wrote:
>> This patch is to change DSDT table for processor object to support >128 vcpus
>> according to ACPI spec 8.4 Declaring Processors
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  tools/libacpi/mk_dsdt.c | 18 --
>>  1 file changed, 12 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
>> index 2daf32c..6c4c325 100644
>> --- a/tools/libacpi/mk_dsdt.c
>> +++ b/tools/libacpi/mk_dsdt.c
>> @@ -24,6 +24,8 @@
>>  #include 
>>  #endif
>>  
>> +#define CPU_NAME_FMT  "P%.03X"
>> +
>>  static unsigned int indent_level;
>>  static bool debug = false;
>>  
>> @@ -196,10 +198,14 @@ int main(int argc, char **argv)
>>  /* Define processor objects and control methods. */
>>  for ( cpu = 0; cpu < max_cpus; cpu++)
>>  {
>> -push_block("Processor", "PR%02X, %d, 0xb010, 0x06", cpu, cpu);
>> +unsigned int apic_id = cpu * 2;
> 
> This is fragile, ideally there should be a single point where the APIC
> ID is calculated. Although there are already two places where the APIC
> ID is calculated, in hvmloader and libxl.
> 
> And I'm not sure how to use any of those here in order to avoid
> introducing a third one.

mk_dsdt is an independent tool that builds the DSDT table. It isn't linked
with libxl or hvmloader, so we can't reuse the existing functions there.

But I think we could introduce a new LAPIC_ID(vcpu) in the arch header
file (i.e. #include ) and replace the old ones.
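
Roughly what I have in mind, as a sketch only. It assumes a header that
hvmloader, libxl and mk_dsdt could all include; where that header would live
is not decided here, and the "* 2" mapping is simply the x86 formula both
existing callers use:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared definition: APIC ID for a given vCPU index.
 * The "* 2" mapping is the x86 formula quoted in this thread. */
#define LAPIC_ID(vcpu) ((uint32_t)(vcpu) * 2)

/* The lapic_id hook then becomes a thin wrapper over the shared macro,
 * identical in every caller. */
static uint32_t acpi_lapic_id(unsigned int cpu)
{
    return LAPIC_ID(cpu);
}
```

mk_dsdt could then compute apic_id with the same macro instead of
open-coding "cpu * 2" a third time.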

> 
>>  
>> -stmt("Name", "_HID, \"ACPI0007\"");
>> +if ( apic_id > 255 )
> 
> We need to be careful with this. This is not a problem ATM because the
> ACPI ID is the CPU ID, but care should be taken to not create a
> Processor object with ACPI ID 255, because that's the broadcast ACPI
> ID...

Yes.

> 
>> +push_block("Device", CPU_NAME_FMT, cpu);
>> +else
> 
> ... IMHO an assert(cpu < 255); should be added here.

OK.

> 
>> +push_block("Processor", CPU_NAME_FMT", %d, 0xb010, 0x06", 
>> cpu, cpu);
>^ space (here and below)
> 
> Please leave a space between the string literals and the defines, it
> makes it easier to read. And this line needs to be split.
> 

OK. Will update.
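
For reference, the selection logic with the review comments applied could
look roughly like this. It is a standalone sketch: cpu_decl() is a
hypothetical helper that formats into a buffer rather than calling
push_block(), the exact ASL text layout is illustrative, and the
"cpu * 2" mapping is the one assumed throughout this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CPU_NAME_FMT "P%.03X"

/* Hypothetical helper: write the ASL opening for one CPU into buf and
 * return 1 for a Device() declaration, 0 for Processor(). */
static int cpu_decl(char *buf, size_t len, unsigned int cpu)
{
    unsigned int apic_id = cpu * 2;   /* mapping assumed from this thread */

    if ( apic_id > 254 )
    {
        snprintf(buf, len, "Device ( " CPU_NAME_FMT " )", cpu);
        return 1;
    }

    /* Processor() carries the ACPI ID, which must never be 255
     * (the broadcast ID mentioned in the review). */
    assert(cpu < 255);
    snprintf(buf, len, "Processor ( " CPU_NAME_FMT ", %u, 0xb010, 0x06 )",
             cpu, cpu);
    return 0;
}
```

Note the spaces between the string literals and CPU_NAME_FMT, as requested.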


-- 
Best regards
Tianyu Lan



[Xen-devel] [RFC PATCH V2 4/4] xl/libacpi: extend lapic_id() to uint32_t

2017-08-31 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

This patch is to extend lapic_id() to support more vcpus.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 tools/firmware/hvmloader/util.c | 2 +-
 tools/libacpi/libacpi.h | 2 +-
 tools/libxl/libxl_x86_acpi.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index db5f240..814ac2e 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -883,7 +883,7 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
 /* ACPI builder currently doesn't free memory so this is just a stub */
 }
 
-static uint8_t acpi_lapic_id(unsigned cpu)
+static uint32_t acpi_lapic_id(unsigned cpu)
 {
 return LAPIC_ID(cpu);
 }
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 74778a5..0b04cbc 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -93,7 +93,7 @@ struct acpi_config {
 unsigned long rsdp;
 
 /* x86-specific parameters */
-uint8_t (*lapic_id)(unsigned cpu);
+uint32_t (*lapic_id)(unsigned cpu);
 uint32_t lapic_base_address;
 uint32_t ioapic_base_address;
 uint16_t pci_isa_irq_mask;
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 1fa97ff..8fe084d 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -85,7 +85,7 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
 {
 }
 
-static uint8_t acpi_lapic_id(unsigned cpu)
+static uint32_t acpi_lapic_id(unsigned cpu)
 {
 return cpu * 2;
 }
-- 
1.8.3.1




[Xen-devel] [RFC PATCH V2 0/4] Extend resources to support more vcpus in single VM

2017-08-31 Thread Lan Tianyu
Changes since v1:
1) Increase the HAP page pool according to the vCPU count.
2) In the ACPI DSDT table, use the "Processor" syntax to define vCPUs with
APIC ID < 255 and the "Device" syntax for the other vCPUs.
3) In the ACPI MADT table, use the xAPIC structure for vCPUs with APIC ID < 255
and the x2APIC structure for the other vCPUs.

This patchset extends some resources (e.g. event channels, HAP and so on)
to support more vCPUs in a single VM.


Chao Gao (1):
  xl/libacpi: extend lapic_id() to uint32_t

Lan Tianyu (3):
  xen/hap: Increase hap size for more vcpus support
  Tool/ACPI: DSDT extension to support more vcpus
  hvmload: Add x2apic entry support in the MADT build

 tools/firmware/hvmloader/util.c |  2 +-
 tools/libacpi/acpi2_0.h | 10 +++
 tools/libacpi/build.c   | 59 +
 tools/libacpi/libacpi.h |  2 +-
 tools/libacpi/mk_dsdt.c | 18 -
 tools/libxl/libxl_x86_acpi.c|  2 +-
 xen/arch/x86/mm/hap/hap.c   | 10 ++-
 7 files changed, 76 insertions(+), 27 deletions(-)

-- 
1.8.3.1




[Xen-devel] [RFC PATCH V2 1/4] xen/hap: Increase hap page pool size for more vcpus support

2017-08-31 Thread Lan Tianyu
This patch is to increase hap page pool size to support more vcpus in single VM.

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/arch/x86/mm/hap/hap.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index cdc77a9..96a7ed0 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -464,6 +464,7 @@ void hap_domain_init(struct domain *d)
 int hap_enable(struct domain *d, u32 mode)
 {
 unsigned int old_pages;
+unsigned int pages;
 unsigned int i;
 int rv = 0;
 
@@ -473,7 +474,14 @@ int hap_enable(struct domain *d, u32 mode)
 if ( old_pages == 0 )
 {
 paging_lock(d);
-rv = hap_set_allocation(d, 256, NULL);
+
+/* Increase hap page pool with max vcpu number. */
+if ( d->max_vcpus > 128 )
+pages = 256;
+else
+pages = 512;
+
+rv = hap_set_allocation(d, pages, NULL);
 if ( rv != 0 )
 {
 hap_set_allocation(d, 0, NULL);
-- 
1.8.3.1




[Xen-devel] [RFC PATCH V2 3/4] hvmload: Add x2apic entry support in the MADT build

2017-08-31 Thread Lan Tianyu
This patch is to add x2apic entry support for ACPI MADT table
according to ACPI spec 5.2.12.12 Processor Local x2APIC Structure

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 tools/libacpi/acpi2_0.h | 10 +
 tools/libacpi/build.c   | 59 +++--
 2 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 758a823..caa3682 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -322,6 +322,7 @@ struct acpi_20_waet {
 #define ACPI_IO_SAPIC   0x06
 #define ACPI_PROCESSOR_LOCAL_SAPIC  0x07
 #define ACPI_PLATFORM_INTERRUPT_SOURCES 0x08
+#define ACPI_PROCESSOR_LOCAL_X2APIC 0x09
 
 /*
  * APIC Structure Definitions.
@@ -338,6 +339,15 @@ struct acpi_20_madt_lapic {
 uint32_t flags;
 };
 
+struct acpi_20_madt_x2apic {
+uint8_t  type;
+uint8_t  length;
+uint16_t reserved;  /* reserved - must be zero */
+uint32_t apic_id;   /* Processor x2APIC ID  */
+uint32_t flags;
+uint32_t acpi_processor_id; /* ACPI processor UID */
+};
+
 /*
  * Local APIC Flags.  All other bits are reserved and must be 0.
  */
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index c7cc784..0c95850 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -82,9 +82,9 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
 struct acpi_20_madt   *madt;
 struct acpi_20_madt_intsrcovr *intsrcovr;
 struct acpi_20_madt_ioapic*io_apic;
-struct acpi_20_madt_lapic *lapic;
 const struct hvm_info_table   *hvminfo = config->hvminfo;
 int i, sz;
+void *end;
 
 if ( config->lapic_id == NULL )
 return NULL;
@@ -92,7 +92,12 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
 sz  = sizeof(struct acpi_20_madt);
 sz += sizeof(struct acpi_20_madt_intsrcovr) * 16;
 sz += sizeof(struct acpi_20_madt_ioapic);
-sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
+
+if (hvminfo->nr_vcpus < 128)
+sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
+else
+sz += sizeof(struct acpi_20_madt_lapic) * 128 +
+  sizeof(struct acpi_20_madt_x2apic) * (hvminfo->nr_vcpus - 128);
 
 madt = ctxt->mem_ops.alloc(ctxt, sz, 16);
 if (!madt) return NULL;
@@ -109,7 +114,7 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
 madt->flags  = ACPI_PCAT_COMPAT;
 
 if ( config->table_flags & ACPI_HAS_IOAPIC )
-{ 
+{
 intsrcovr = (struct acpi_20_madt_intsrcovr *)(madt + 1);
 for ( i = 0; i < 16; i++ )
 {
@@ -146,27 +151,47 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
 io_apic->ioapic_id   = config->ioapic_id;
 io_apic->ioapic_addr = config->ioapic_base_address;
 
-lapic = (struct acpi_20_madt_lapic *)(io_apic + 1);
+end = (struct acpi_20_madt_lapic *)(io_apic + 1);
 }
 else
-lapic = (struct acpi_20_madt_lapic *)(madt + 1);
+end = (struct acpi_20_madt_lapic *)(madt + 1);
+
+info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, end);
 
-info->nr_cpus = hvminfo->nr_vcpus;
-info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
 for ( i = 0; i < hvminfo->nr_vcpus; i++ )
 {
-memset(lapic, 0, sizeof(*lapic));
-lapic->type= ACPI_PROCESSOR_LOCAL_APIC;
-lapic->length  = sizeof(*lapic);
-/* Processor ID must match processor-object IDs in the DSDT. */
-lapic->acpi_processor_id = i;
-lapic->apic_id = config->lapic_id(i);
-lapic->flags = (test_bit(i, hvminfo->vcpu_online)
-? ACPI_LOCAL_APIC_ENABLED : 0);
-lapic++;
+unsigned int apic_id = config->lapic_id(i);
+
+if ( apic_id < 255 ) {
+struct acpi_20_madt_lapic *lapic = end;
+
+memset(lapic, 0, sizeof(*lapic));
+lapic->type= ACPI_PROCESSOR_LOCAL_APIC;
+lapic->length  = sizeof(*lapic);
+/* Processor ID must match processor-object IDs in the DSDT. */
+lapic->acpi_processor_id = i;
+lapic->apic_id = config->lapic_id(i);
+lapic->flags = ((i < hvminfo->nr_vcpus) &&
+test_bit(i, hvminfo->vcpu_online)
+? ACPI_LOCAL_APIC_ENABLED : 0);
+end = ++lapic;
+} else {
+struct acpi_20_madt_x2apic *lapic = end;
+
+memset(lapic, 0, sizeof(*lapic));
+lapic->type= ACPI_PROCESSOR_LOCAL_X2APIC;
+lapic->length  = sizeof(*lapic);
+/* Processor ID must match processor-object IDs in th

[Xen-devel] [RFC PATCH V2 2/4] Tool/ACPI: DSDT extension to support more vcpus

2017-08-31 Thread Lan Tianyu
This patch is to change DSDT table for processor object to support >128 vcpus
according to ACPI spec 8.4 Declaring Processors

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 tools/libacpi/mk_dsdt.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 2daf32c..6c4c325 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -24,6 +24,8 @@
 #include 
 #endif
 
+#define CPU_NAME_FMT  "P%.03X"
+
 static unsigned int indent_level;
 static bool debug = false;
 
@@ -196,10 +198,14 @@ int main(int argc, char **argv)
 /* Define processor objects and control methods. */
 for ( cpu = 0; cpu < max_cpus; cpu++)
 {
-push_block("Processor", "PR%02X, %d, 0xb010, 0x06", cpu, cpu);
+unsigned int apic_id = cpu * 2;
 
-stmt("Name", "_HID, \"ACPI0007\"");
+if ( apic_id > 255 )
+push_block("Device", CPU_NAME_FMT, cpu);
+else
+push_block("Processor", CPU_NAME_FMT", %d, 0xb010, 0x06", cpu, cpu);
 
+stmt("Name", "_HID, \"ACPI0007\"");
 stmt("Name", "_UID, %d", cpu);
 #ifdef CONFIG_ARM_64
 pop_block();
@@ -268,15 +274,15 @@ int main(int argc, char **argv)
 /* Extract current CPU's status: 0=offline; 1=online. */
 stmt("And", "Local1, 1, Local2");
 /* Check if status is up-to-date in the relevant MADT LAPIC entry... */
-push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu);
+push_block("If", "LNotEqual(Local2, \\_SB."CPU_NAME_FMT ".FLG)", cpu);
 /* ...If not, update it and the MADT checksum, and notify OSPM. */
-stmt("Store", "Local2, \\_SB.PR%02X.FLG", cpu);
+stmt("Store", "Local2, \\_SB."CPU_NAME_FMT".FLG", cpu);
 push_block("If", "LEqual(Local2, 1)");
-stmt("Notify", "PR%02X, 1", cpu); /* Notify: Device Check */
+stmt("Notify", CPU_NAME_FMT", 1", cpu); /* Notify: Device Check */
 stmt("Subtract", "\\_SB.MSU, 1, \\_SB.MSU"); /* Adjust MADT csum */
 pop_block();
 push_block("Else", NULL);
-stmt("Notify", "PR%02X, 3", cpu); /* Notify: Eject Request */
+stmt("Notify", CPU_NAME_FMT", 3", cpu); /* Notify: Eject Request */
 stmt("Add", "\\_SB.MSU, 1, \\_SB.MSU"); /* Adjust MADT csum */
 pop_block();
 pop_block();
-- 
1.8.3.1




Re: [Xen-devel] [RFC PATCH 0/5] Extend resources to support more vcpus in single VM

2017-08-29 Thread Lan Tianyu
On 2017年08月29日 16:49, Jan Beulich wrote:
 On 29.08.17 at 06:38,  wrote:
>> On 2017年08月25日 22:10, Meng Xu wrote:
>>> How many VCPUs for a single VM do you want to support with this patch set?
>>
>> Hi Meng:
>>  Sorry for the late response. We hope to increase the max vCPU number to 512.
>> This also depends on other work (i.e. CPU topology, multi-page
>> support for the ioreq server and virtual IOMMU).
> 
> I'm sorry for repeating this, but your first and foremost goal ought
> to be to address the known issues with VMs having up to 128
> vCPU-s; Andrew has been pointing this out in the past. I see no
> point in pushing up the limit if even the current limit doesn't work
> reliably in all cases.
> 

Hi Jan & Andrew:
We ran some HPC benchmarks (e.g. HP Linpack, dgemm, sgemm, igemm and so
on) in a huge VM with 128 vCPUs (and even >255 vCPUs with non-upstreamed
patches) and did not hit any reliability issues. These benchmarks run heavy
workloads in the VM, and some of them last several hours.

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [RFC PATCH 3/5] Tool/ACPI: DSDT extension to support more vcpus

2017-08-28 Thread Lan Tianyu
On 2017年08月25日 18:36, Roger Pau Monné wrote:
> On Thu, Aug 24, 2017 at 10:52:18PM -0400, Lan Tianyu wrote:
>> This patch is to change DSDT table for processor object to support >255 
>> vcpus.
> 
> The note in ACPI 6.1A spec section 5.2.12.12 contains the following:
> 
> [Compatibility note] On some legacy OSes, Logical processors with APIC
> ID values less than 255 (whether in XAPIC or X2APIC mode) must use the
> Processor Local APIC structure to convey their APIC information to
> OSPM, and those processors must be declared in the DSDT using the
> Processor() keyword. Logical processors with APIC ID values 255 and
> greater must use the Processor Local x2APIC structure and be declared
> using the Device() keyword. See Section 19.6.102 "Processor (Declare
> Processor)" for more information.
> 
> So you cannot unconditionally switch to using the Device for all
> processors.
> 
> vCPUs <= 128 need to use the Processor keyword, while vCPUs > 128 need
> to use the Device keyword.

Yes, that's right. I will fix it.
-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [RFC PATCH 3/5] Tool/ACPI: DSDT extension to support more vcpus

2017-08-28 Thread Lan Tianyu
On 2017年08月25日 20:01, Jan Beulich wrote:
>>>> On 25.08.17 at 12:36, <roger@citrix.com> wrote:
>> On Thu, Aug 24, 2017 at 10:52:18PM -0400, Lan Tianyu wrote:
>>> This patch is to change DSDT table for processor object to support >255 
>> vcpus.
>>
>> The note in ACPI 6.1A spec section 5.2.12.12 contains the following:
>>
>> [Compatibility note] On some legacy OSes, Logical processors with APIC
>> ID values less than 255 (whether in XAPIC or X2APIC mode) must use the
>> Processor Local APIC structure to convey their APIC information to
>> OSPM, and those processors must be declared in the DSDT using the
>> Processor() keyword. Logical processors with APIC ID values 255 and
>> greater must use the Processor Local x2APIC structure and be declared
>> using the Device() keyword. See Section 19.6.102 "Processor (Declare
>> Processor)" for more information.
>>
>> So you cannot unconditionally switch to using the Device for all
>> processors.
>>
>> vCPUs <= 128 need to use the Processor keyword, while vCPUs > 128 need
>> to use the Device keyword.
> 
> While changing this code, may I suggest to stop referring to the
> 128 vCPU boundary? The decision should be solely based on
> LAPIC ID, such that the only place to change later on will end up
> being the one where it gets set to double the vCPU number.
> 

OK. Got it.

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [RFC PATCH 0/5] Extend resources to support more vcpus in single VM

2017-08-28 Thread Lan Tianyu
On 2017年08月25日 22:10, Meng Xu wrote:
> Hi Tianyu,
> 
> On Thu, Aug 24, 2017 at 10:52 PM, Lan Tianyu <tianyu@intel.com> wrote:
>>
>> This patchset is to extend some resources(i.e, event channel,
>> hap and so) to support more vcpus for single VM.
>>
>>
>> Chao Gao (1):
>>   xl/libacpi: extend lapic_id() to uint32_t
>>
>> Lan Tianyu (4):
>>   xen/hap: Increase hap size for more vcpus support
>>   XL: Increase event channels to support more vcpus
>>   Tool/ACPI: DSDT extension to support more vcpus
>>   hvmload: Add x2apic entry support in the MADT build
>>
>>  tools/firmware/hvmloader/util.c |  2 +-
>>  tools/libacpi/acpi2_0.h | 10 +++
>>  tools/libacpi/build.c   | 61 
>> +
>>  tools/libacpi/libacpi.h |  2 +-
>>  tools/libacpi/mk_dsdt.c | 11 
>>  tools/libxl/libxl_create.c  |  2 +-
>>  tools/libxl/libxl_x86_acpi.c|  2 +-
>>  xen/arch/x86/mm/hap/hap.c   |  2 +-
>>  8 files changed, 63 insertions(+), 29 deletions(-)
> 
> 
> How many VCPUs for a single VM do you want to support with this patch set?

Hi Meng:
Sorry for the late response. We hope to increase the max vCPU number to 512.
This also depends on other work (i.e. CPU topology, multi-page
support for the ioreq server and virtual IOMMU).

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [RFC PATCH 4/5] hvmload: Add x2apic entry support in the MADT build

2017-08-28 Thread Lan Tianyu
On 2017年08月25日 18:11, Roger Pau Monné wrote:
> On Thu, Aug 24, 2017 at 10:52:19PM -0400, Lan Tianyu wrote:
>> This patch is to add x2apic entry support for ACPI MADT table.
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> Signed-off-by: Chao Gao <chao@intel.com>
>> ---
>>  tools/libacpi/acpi2_0.h | 10 
>>  tools/libacpi/build.c   | 61 
>> ++---
>>  2 files changed, 53 insertions(+), 18 deletions(-)
>>
>> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
>> index 758a823..ff18b3e 100644
>> --- a/tools/libacpi/acpi2_0.h
>> +++ b/tools/libacpi/acpi2_0.h
>> @@ -322,6 +322,7 @@ struct acpi_20_waet {
>>  #define ACPI_IO_SAPIC   0x06
>>  #define ACPI_PROCESSOR_LOCAL_SAPIC  0x07
>>  #define ACPI_PLATFORM_INTERRUPT_SOURCES 0x08
>> +#define ACPI_PROCESSOR_LOCAL_X2APIC 0x09
>>  
>>  /*
>>   * APIC Structure Definitions.
>> @@ -338,6 +339,15 @@ struct acpi_20_madt_lapic {
>>  uint32_t flags;
>>  };
>>  
>> +struct acpi_20_madt_x2apic {
>> +uint8_t  type;
>> +uint8_t  length;
>> +uint16_t reserved;  /* reserved - must be zero */
>> +uint32_t apic_id;   /* Processor x2APIC ID  */
>> +uint32_t flags;
>> +uint32_t acpi_processor_id; /* ACPI processor UID */
> 
> There's a mix of tabs and spaces above.
> 
>> +};
>> +
>>  /*
>>   * Local APIC Flags.  All other bits are reserved and must be 0.
>>   */
>> diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
>> index c7cc784..36e582a 100644
>> --- a/tools/libacpi/build.c
>> +++ b/tools/libacpi/build.c
>> @@ -82,9 +82,9 @@ static struct acpi_20_madt *construct_madt(struct 
>> acpi_ctxt *ctxt,
>>  struct acpi_20_madt   *madt;
>>  struct acpi_20_madt_intsrcovr *intsrcovr;
>>  struct acpi_20_madt_ioapic*io_apic;
>> -struct acpi_20_madt_lapic *lapic;
>>  const struct hvm_info_table   *hvminfo = config->hvminfo;
>>  int i, sz;
>> +void *end;
>>  
>>  if ( config->lapic_id == NULL )
>>  return NULL;
>> @@ -92,7 +92,11 @@ static struct acpi_20_madt *construct_madt(struct 
>> acpi_ctxt *ctxt,
>>  sz  = sizeof(struct acpi_20_madt);
>>  sz += sizeof(struct acpi_20_madt_intsrcovr) * 16;
>>  sz += sizeof(struct acpi_20_madt_ioapic);
>> -sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
>> +
>> +if (hvminfo->nr_vcpus < 256)
>> +sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
>> +else
>> +sz += sizeof(struct acpi_20_madt_x2apic) * hvminfo->nr_vcpus;
> 
> This is wrong, APIC ID is cpu id * 2, so the limit here needs to be
> 128, not 256. Also this should be set as a constant somewhere.

Sorry. In our internal repo we made the APIC ID equal to the vCPU ID, but
that change was never sent out. I will change this in the next version.

> 
> Apart from that, although this is technically correct, I would rather
> prefer the first 128 vCPUs to have xAPIC entries, and APIC IDs > 254
> to use x2APIC entries. This will allow a guest without x2APIC support
> to still boot on VMs > 128 vCPUs, although they won't be able to use
> the extra CPUs. IIRC this is in line with what bare metal does.

OK. Will update.
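
For the x2APIC entries themselves, filling one would look roughly like the
following. The struct is the one added by this patch; fill_x2apic() is a
hypothetical helper for illustration, and the enabled flag is assumed to be
bit 0 as for the local APIC entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ACPI_PROCESSOR_LOCAL_X2APIC 0x09
#define ACPI_LOCAL_APIC_ENABLED     0x1   /* assumed: bit 0 of the flags field */

struct acpi_20_madt_x2apic {
    uint8_t  type;
    uint8_t  length;
    uint16_t reserved;           /* reserved - must be zero */
    uint32_t apic_id;            /* Processor x2APIC ID */
    uint32_t flags;
    uint32_t acpi_processor_id;  /* ACPI processor UID */
};

/* Hypothetical helper: populate one Processor Local x2APIC entry. */
static void fill_x2apic(struct acpi_20_madt_x2apic *e, unsigned int vcpu,
                        uint32_t apic_id, int online)
{
    memset(e, 0, sizeof(*e));
    e->type = ACPI_PROCESSOR_LOCAL_X2APIC;
    e->length = sizeof(*e);
    e->apic_id = apic_id;
    /* Must match the _UID of the corresponding Device() in the DSDT. */
    e->acpi_processor_id = vcpu;
    e->flags = online ? ACPI_LOCAL_APIC_ENABLED : 0;
}
```

Only vCPUs whose APIC ID is above 254 would get such an entry; the rest keep
the 8-byte local APIC entry so that guests without x2APIC support still boot.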

> 
> Roger.
> 


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [RFC PATCH 3/5] Tool/ACPI: DSDT extension to support more vcpus

2017-08-28 Thread Lan, Tianyu

On 8/25/2017 5:25 PM, Wei Liu wrote:
> On Thu, Aug 24, 2017 at 10:52:18PM -0400, Lan Tianyu wrote:
>> This patch is to change DSDT table for processor object to support >255 vcpus.
>
> Can you provide a link to the spec so people can check if your
> modification is correct?

OK. Will add in the next version.




Re: [Xen-devel] [RFC PATCH 2/5] XL: Increase event channels to support more vcpus

2017-08-28 Thread Lan, Tianyu

On 8/25/2017 6:04 PM, Wei Liu wrote:
> On Fri, Aug 25, 2017 at 10:57:26AM +0100, Roger Pau Monné wrote:
>> On Fri, Aug 25, 2017 at 10:18:24AM +0100, Wei Liu wrote:
>>> On Thu, Aug 24, 2017 at 10:52:17PM -0400, Lan Tianyu wrote:
>>>> This patch is to increase event channels to support more vcpus in single VM.
>>>>
>>>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>>>
>>> There is no need to bump the default. There is already a configuration
>>> option called max_event_channel.
>>
>> Maybe make this somehow based on the number of vCPUs assigned to the
>> domain?
>>
>> It's not very user-friendly to allow the creation of a domain with 256
>> vCPUs for example that would then get stuck during boot.
>>
>> Or at least check max_event_channel and the number of vCPUs and print
>> a warning message to alert the user that things might go wrong with
>> this configuration.
>
> The problem is number of vcpu is only one factor that would affect the
> number of event channels needed.

How about printing a warning that the event channels may not be sufficient
when the vCPU count is >128 and the default max event channel number is
still in use?
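
Something along these lines, as a sketch only. The helper name is
hypothetical, the default of 1023 is my assumption about the current libxl
default, and 128 is just the threshold suggested above; the real check would
live wherever libxl fills in the build info defaults.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed libxl default for the per-domain event channel limit. */
#define DEFAULT_MAX_EVENT_CHANNELS 1023u
/* vCPU count above which the default limit is considered risky. */
#define VCPU_WARN_THRESHOLD 128u

/* Hypothetical check: true when a large guest still relies on the default
 * limit, in which case the toolstack would print a warning. */
static bool evtchn_limit_warning(unsigned int nr_vcpus,
                                 unsigned int max_event_channels)
{
    return nr_vcpus > VCPU_WARN_THRESHOLD &&
           max_event_channels == DEFAULT_MAX_EVENT_CHANNELS;
}
```

This only warns and never changes the limit, so existing configurations keep
their behaviour.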





Re: [Xen-devel] [RFC PATCH 1/5] xen/hap: Increase hap size for more vcpus support

2017-08-28 Thread Lan, Tianyu

On 8/25/2017 5:14 PM, Wei Liu wrote:
> On Thu, Aug 24, 2017 at 10:52:16PM -0400, Lan Tianyu wrote:
>> This patch is to increase hap size to support more vcpus in single VM.
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>
> Can we maybe derive the number of pages needed from the number of vcpus?

Yes, we can add a check of the vCPU number here.

> Bumping this value unconditionally is going to increase memory
> consumption.





[Xen-devel] [RFC PATCH 0/5] Extend resources to support more vcpus in single VM

2017-08-25 Thread Lan Tianyu
This patchset extends some resources (e.g. event channels, HAP and so on)
to support more vCPUs in a single VM.

Chao Gao (1):
  xl/libacpi: extend lapic_id() to uint32_t

Lan Tianyu (4):
  xen/hap: Increase hap size for more vcpus support
  XL: Increase event channels to support more vcpus
  Tool/ACPI: DSDT extension to support more vcpus
  hvmload: Add x2apic entry support in the MADT build

 tools/firmware/hvmloader/util.c |  2 +-
 tools/libacpi/acpi2_0.h | 10 +++
 tools/libacpi/build.c   | 61 +
 tools/libacpi/libacpi.h |  2 +-
 tools/libacpi/mk_dsdt.c | 11 
 tools/libxl/libxl_create.c  |  2 +-
 tools/libxl/libxl_x86_acpi.c|  2 +-
 xen/arch/x86/mm/hap/hap.c   |  2 +-
 8 files changed, 63 insertions(+), 29 deletions(-)

-- 
1.8.3.1




[Xen-devel] [RFC PATCH 3/5] Tool/ACPI: DSDT extension to support more vcpus

2017-08-25 Thread Lan Tianyu
This patch is to change DSDT table for processor object to support >255 vcpus.

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 tools/libacpi/mk_dsdt.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 2daf32c..d37aed6 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -196,8 +196,7 @@ int main(int argc, char **argv)
 /* Define processor objects and control methods. */
 for ( cpu = 0; cpu < max_cpus; cpu++)
 {
-push_block("Processor", "PR%02X, %d, 0xb010, 0x06", cpu, cpu);
-
+push_block("Device", "P%03X", cpu);
 stmt("Name", "_HID, \"ACPI0007\"");
 
 stmt("Name", "_UID, %d", cpu);
@@ -268,15 +267,15 @@ int main(int argc, char **argv)
 /* Extract current CPU's status: 0=offline; 1=online. */
 stmt("And", "Local1, 1, Local2");
 /* Check if status is up-to-date in the relevant MADT LAPIC entry... */
-push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu);
+push_block("If", "LNotEqual(Local2, \\_SB.P%03X.FLG)", cpu);
 /* ...If not, update it and the MADT checksum, and notify OSPM. */
-stmt("Store", "Local2, \\_SB.PR%02X.FLG", cpu);
+stmt("Store", "Local2, \\_SB.P%03X.FLG", cpu);
 push_block("If", "LEqual(Local2, 1)");
-stmt("Notify", "PR%02X, 1", cpu); /* Notify: Device Check */
+stmt("Notify", "P%03X, 1", cpu); /* Notify: Device Check */
 stmt("Subtract", "\\_SB.MSU, 1, \\_SB.MSU"); /* Adjust MADT csum */
 pop_block();
 push_block("Else", NULL);
-stmt("Notify", "PR%02X, 3", cpu); /* Notify: Eject Request */
+stmt("Notify", "P%03X, 3", cpu); /* Notify: Eject Request */
 stmt("Add", "\\_SB.MSU, 1, \\_SB.MSU"); /* Adjust MADT csum */
 pop_block();
 pop_block();
-- 
1.8.3.1
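As a side note on the renaming, here is a small sketch (not part of the patch) of why the name pattern matters: an ACPI name segment is four characters, so "PR%02X" runs out at 255 while "P%03X" reaches 4095. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define ACPI_NAMESEG_LEN 4  /* an ACPI name segment is exactly four characters */

/* Pattern removed by the patch: two hex digits, so PRFF (255) is the last name. */
static int old_cpu_name(unsigned int cpu, char out[ACPI_NAMESEG_LEN + 1])
{
    int n;

    assert(out != NULL);
    n = snprintf(out, ACPI_NAMESEG_LEN + 1, "PR%02X", cpu);
    /* snprintf() reports the untruncated length; anything but 4 does not fit. */
    return n == ACPI_NAMESEG_LEN ? 0 : -1;
}

/* Pattern added by the patch: three hex digits, so PFFF (4095) is the last name. */
static int new_cpu_name(unsigned int cpu, char out[ACPI_NAMESEG_LEN + 1])
{
    int n;

    assert(out != NULL);
    n = snprintf(out, ACPI_NAMESEG_LEN + 1, "P%03X", cpu);
    return n == ACPI_NAMESEG_LEN ? 0 : -1;
}
```

Both helpers return 0 when the generated name fits in one segment and -1 otherwise.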




[Xen-devel] [RFC PATCH 2/5] XL: Increase event channels to support more vcpus

2017-08-25 Thread Lan Tianyu
This patch increases the number of event channels to support more vcpus in a single VM.

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 tools/libxl/libxl_create.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1158303..3937169 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -210,7 +210,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 b_info->iomem[i].gfn = b_info->iomem[i].start;
 
 if (!b_info->event_channels)
-b_info->event_channels = 1023;
+b_info->event_channels = 4095;
 
 libxl__arch_domain_build_info_acpi_setdefault(b_info);
 
-- 
1.8.3.1




[Xen-devel] [RFC PATCH 1/5] xen/hap: Increase hap size for more vcpus support

2017-08-25 Thread Lan Tianyu
This patch increases the HAP size to support more vcpus in a single VM.

Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 xen/arch/x86/mm/hap/hap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index cdc77a9..cb81368 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -473,7 +473,7 @@ int hap_enable(struct domain *d, u32 mode)
 if ( old_pages == 0 )
 {
 paging_lock(d);
-rv = hap_set_allocation(d, 256, NULL);
+rv = hap_set_allocation(d, 512, NULL);
 if ( rv != 0 )
 {
 hap_set_allocation(d, 0, NULL);
-- 
1.8.3.1




[Xen-devel] [RFC PATCH 5/5] xl/libacpi: extend lapic_id() to uint32_t

2017-08-25 Thread Lan Tianyu
From: Chao Gao <chao@intel.com>

This patch is to extend lapic_id() to support more vcpus.

Signed-off-by: Chao Gao <chao@intel.com>
Signed-off-by: Lan Tianyu <tianyu@intel.com>
---
 tools/firmware/hvmloader/util.c | 2 +-
 tools/libacpi/libacpi.h | 2 +-
 tools/libxl/libxl_x86_acpi.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index db5f240..814ac2e 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -883,7 +883,7 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
 /* ACPI builder currently doesn't free memory so this is just a stub */
 }
 
-static uint8_t acpi_lapic_id(unsigned cpu)
+static uint32_t acpi_lapic_id(unsigned cpu)
 {
 return LAPIC_ID(cpu);
 }
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 74778a5..0b04cbc 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -93,7 +93,7 @@ struct acpi_config {
 unsigned long rsdp;
 
 /* x86-specific parameters */
-uint8_t (*lapic_id)(unsigned cpu);
+uint32_t (*lapic_id)(unsigned cpu);
 uint32_t lapic_base_address;
 uint32_t ioapic_base_address;
 uint16_t pci_isa_irq_mask;
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index 1fa97ff..8fe084d 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -85,7 +85,7 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
 {
 }
 
-static uint8_t acpi_lapic_id(unsigned cpu)
+static uint32_t acpi_lapic_id(unsigned cpu)
 {
 return cpu * 2;
 }
-- 
1.8.3.1
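To make the motivation concrete, a small sketch of the truncation that the wider return type avoids. It reuses the cpu * 2 mapping from the libxl_x86_acpi.c hunk; the function names here are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch return type: the result is truncated to 8 bits. */
static uint8_t lapic_id_u8(unsigned int cpu)
{
    return (uint8_t)(cpu * 2);
}

/* Post-patch return type, same cpu * 2 mapping as in the hunk. */
static uint32_t lapic_id_u32(unsigned int cpu)
{
    return cpu * 2;
}

/* First vcpu index whose APIC ID no longer fits in 8 bits. */
static unsigned int first_truncated_cpu(void)
{
    unsigned int cpu = 0;

    while ( lapic_id_u32(cpu) == lapic_id_u8(cpu) )
        cpu++;

    assert(lapic_id_u32(cpu) > UINT8_MAX);
    return cpu;
}
```

With the 8-bit type, vcpu 128 would report the same APIC ID as vcpu 0, which is why the callback has to be widened before more vcpus can be described.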




[Xen-devel] [RFC PATCH 4/5] hvmload: Add x2apic entry support in the MADT build

2017-08-25 Thread Lan Tianyu
This patch is to add x2apic entry support for ACPI MADT table.

Signed-off-by: Lan Tianyu <tianyu@intel.com>
Signed-off-by: Chao Gao <chao@intel.com>
---
 tools/libacpi/acpi2_0.h | 10 
 tools/libacpi/build.c   | 61 ++---
 2 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 758a823..ff18b3e 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -322,6 +322,7 @@ struct acpi_20_waet {
 #define ACPI_IO_SAPIC   0x06
 #define ACPI_PROCESSOR_LOCAL_SAPIC  0x07
 #define ACPI_PLATFORM_INTERRUPT_SOURCES 0x08
+#define ACPI_PROCESSOR_LOCAL_X2APIC 0x09
 
 /*
  * APIC Structure Definitions.
@@ -338,6 +339,15 @@ struct acpi_20_madt_lapic {
 uint32_t flags;
 };
 
+struct acpi_20_madt_x2apic {
+uint8_t  type;
+uint8_t  length;
+uint16_t reserved; /* reserved - must be zero */
+uint32_t apic_id;   /* Processor x2APIC ID  */
+uint32_t flags;
+uint32_t acpi_processor_id;/* ACPI processor UID */
+};
+
 /*
  * Local APIC Flags.  All other bits are reserved and must be 0.
  */
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index c7cc784..36e582a 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -82,9 +82,9 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt 
*ctxt,
 struct acpi_20_madt   *madt;
 struct acpi_20_madt_intsrcovr *intsrcovr;
 struct acpi_20_madt_ioapic*io_apic;
-struct acpi_20_madt_lapic *lapic;
 const struct hvm_info_table   *hvminfo = config->hvminfo;
 int i, sz;
+void *end;
 
 if ( config->lapic_id == NULL )
 return NULL;
@@ -92,7 +92,11 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt 
*ctxt,
 sz  = sizeof(struct acpi_20_madt);
 sz += sizeof(struct acpi_20_madt_intsrcovr) * 16;
 sz += sizeof(struct acpi_20_madt_ioapic);
-sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
+
+if (hvminfo->nr_vcpus < 256)
+sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
+else
+sz += sizeof(struct acpi_20_madt_x2apic) * hvminfo->nr_vcpus;
 
 madt = ctxt->mem_ops.alloc(ctxt, sz, 16);
 if (!madt) return NULL;
@@ -146,27 +150,48 @@ static struct acpi_20_madt *construct_madt(struct 
acpi_ctxt *ctxt,
 io_apic->ioapic_id   = config->ioapic_id;
 io_apic->ioapic_addr = config->ioapic_base_address;
 
-lapic = (struct acpi_20_madt_lapic *)(io_apic + 1);
+end = (struct acpi_20_madt_lapic *)(io_apic + 1);
 }
 else
-lapic = (struct acpi_20_madt_lapic *)(madt + 1);
+end = (struct acpi_20_madt_lapic *)(madt + 1);
 
-info->nr_cpus = hvminfo->nr_vcpus;
-info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
-for ( i = 0; i < hvminfo->nr_vcpus; i++ )
-{
-memset(lapic, 0, sizeof(*lapic));
-lapic->type= ACPI_PROCESSOR_LOCAL_APIC;
-lapic->length  = sizeof(*lapic);
-/* Processor ID must match processor-object IDs in the DSDT. */
-lapic->acpi_processor_id = i;
-lapic->apic_id = config->lapic_id(i);
-lapic->flags = (test_bit(i, hvminfo->vcpu_online)
-? ACPI_LOCAL_APIC_ENABLED : 0);
-lapic++;
+if (hvminfo->nr_vcpus < 256) {
+struct acpi_20_madt_lapic *lapic = (struct acpi_20_madt_lapic *)end;
+info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
+for ( i = 0; i < hvminfo->nr_vcpus; i++ )
+{
+memset(lapic, 0, sizeof(*lapic));
+lapic->type= ACPI_PROCESSOR_LOCAL_APIC;
+lapic->length  = sizeof(*lapic);
+/* Processor ID must match processor-object IDs in the DSDT. */
+lapic->acpi_processor_id = i;
+lapic->apic_id = config->lapic_id(i);
+lapic->flags = ((i < hvminfo->nr_vcpus) &&
+test_bit(i, hvminfo->vcpu_online)
+? ACPI_LOCAL_APIC_ENABLED : 0);
+lapic++;
+}
+end = lapic;
+} else {
+struct acpi_20_madt_x2apic *lapic = (struct acpi_20_madt_x2apic *)end;
+info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
+for ( i = 0; i < hvminfo->nr_vcpus; i++ )
+{
+memset(lapic, 0, sizeof(*lapic));
+lapic->type= ACPI_PROCESSOR_LOCAL_X2APIC;
+lapic->length  = sizeof(*lapic);
+/* Processor ID must match processor-object IDs in the DSDT. */
+lapic->acpi_processor_id = i;
+lapic->apic_id = config->lapic_id(i);
+lapic->flags =  test_bit(i, hvminfo->vcpu_online)
+? ACPI

Re: [Xen-devel] [PATCH V2 9/25] tools/libxl: build DMAR table for a guest with one virtual VTD

2017-08-25 Thread Lan Tianyu
On 2017-08-25 11:19, Lan Tianyu wrote:
> On 2017-08-24 19:08, Wei Liu wrote:
>>>>> If we add a DMAR table for hvmlite, we should combine it with the other
>>>>> ACPI tables and populate them into acpi_modules[2]. This is how hvmlite
>>>>> adds other ACPI tables in libxl__dom_load_acpi().
>>>>>
>>>>
>>>> Sure, that sounds plausible.
>>>>
>>>> What I would like to see is to have one entry point to manipulate ACPI
>>>> tables.
>>>>
>>>> Given the patch volume we're seeing now, we expect contributors to drive
>>>> the discussion forward. If you're not sure, feel free to ask more
>>>> questions.
>>>
>>> I am not sure whether I understood correctly.
>>>
>>> PVHv2 builds all ACPI tables in the toolstack and uses acpi_module[0, 1, 2]
>>> to pass the related table content.
>>>
>>> HVM builds ACPI tables in hvmloader and just uses acpi_module[0] to pass
>>> additional ACPI firmware or tables.
>>>
>>> These two modes use acpi_modules[] in different ways, so I think we
>>> can't combine them, right?
>>>
>> There might be some misunderstanding.  We probably don't want to
>> manipulate the content of the tables in libxl.
>>
>>> To build the DMAR table, we have introduced construct_dmar() under
>>> libacpi, and PVHv2 can also use it in
>>> libxl__dom_load_acpi().
>>>
>> My major complaint is that there are now two functions, in two different
>> locations and in two different phases of domain construction, that would
>> manipulate ACPI tables. I would like to have only one.
>>
>> The function you're currently modifying, libxl__domain_firmware, is not
>> the right place. Its primary function is to load files from disks.
>>
>> You should be able to call the function you introduced in
>> libxl__dom_load_acpi, provided appropriate checks are added.
> 
> But libxl__dom_load_acpi() isn't called on hvm guest code path. It just
> works for PVHv2/HVMlite and have some conflict with hvm guest
> configuration(i.e, acpi_module).
> 
> 
> int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
>                                                libxl_domain_build_info *info,
>                                                struct xc_dom_image *dom)
> {
>     int rc = 0;
>
>     if ((info->type == LIBXL_DOMAIN_TYPE_HVM) &&
>         (info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_NONE)) {
>         rc = libxl__dom_load_acpi(gc, info, dom);
>         if (rc != 0)
>             LOGE(ERROR, "libxl_dom_load_acpi failed");
>     }
>
>     return rc;
> }

We could remove the check and move the newly introduced code into
libxl__dom_load_acpi(), running the new code only for HVM guests. Does
this make sense?

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 9/25] tools/libxl: build DMAR table for a guest with one virtual VTD

2017-08-24 Thread Lan Tianyu
On 2017-08-24 19:08, Wei Liu wrote:
>>>> If we add a DMAR table for hvmlite, we should combine it with the other
>>>> ACPI tables and populate them into acpi_modules[2]. This is how hvmlite
>>>> adds other ACPI tables in libxl__dom_load_acpi().
>>>>
>>>
>>> Sure, that sounds plausible.
>>>
>>> What I would like to see is to have one entry point to manipulate ACPI
>>> tables.
>>>
>>> Given the patch volume we're seeing now, we expect contributors to drive
>>> the discussion forward. If you're not sure, feel free to ask more
>>> questions.
>>
>> I am not sure whether I understood correctly.
>>
>> PVHv2 builds all ACPI tables in the toolstack and uses acpi_module[0, 1, 2]
>> to pass the related table content.
>>
>> HVM builds ACPI tables in hvmloader and just uses acpi_module[0] to pass
>> additional ACPI firmware or tables.
>>
>> These two modes use acpi_modules[] in different ways, so I think we
>> can't combine them, right?
>>
> There might be some misunderstanding.  We probably don't want to
> manipulate the content of the tables in libxl.
>
>> To build the DMAR table, we have introduced construct_dmar() under
>> libacpi, and PVHv2 can also use it in
>> libxl__dom_load_acpi().
>>
> My major complaint is that there are now two functions, in two different
> locations and in two different phases of domain construction, that would
> manipulate ACPI tables. I would like to have only one.
>
> The function you're currently modifying, libxl__domain_firmware, is not
> the right place. Its primary function is to load files from disks.
>
> You should be able to call the function you introduced in
> libxl__dom_load_acpi, provided appropriate checks are added.

But libxl__dom_load_acpi() isn't called on the HVM guest code path. It only
works for PVHv2/HVMlite and has some conflicts with the HVM guest
configuration (i.e. acpi_module).


int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
                                               libxl_domain_build_info *info,
                                               struct xc_dom_image *dom)
{
    int rc = 0;

    if ((info->type == LIBXL_DOMAIN_TYPE_HVM) &&
        (info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_NONE)) {
        rc = libxl__dom_load_acpi(gc, info, dom);
        if (rc != 0)
            LOGE(ERROR, "libxl_dom_load_acpi failed");
    }

    return rc;
}



-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 11/25] x86/hvm: Introduce a emulated VTD for HVM

2017-08-24 Thread Lan Tianyu
On 2017-08-24 16:49, Roger Pau Monné wrote:
> On Thu, Aug 24, 2017 at 10:16:32AM +0800, Lan Tianyu wrote:
>> On 2017-08-23 15:58, Roger Pau Monné wrote:
>>> On Wed, Aug 09, 2017 at 04:34:12PM -0400, Lan Tianyu wrote:
>>>> From: Chao Gao <chao@intel.com>
>>>> +}
>>>> +
>>>> +#define vvtd_get_reg_quad(vvtd, reg, val) do {  \
>>>> +(val) = vvtd_get_reg(vvtd, (reg) + 4 ); \
>>>> +(val) = (val) << 32;\
>>>> +(val) += vvtd_get_reg(vvtd, reg);   \
>>>> +} while(0)
>>>> +#define vvtd_set_reg_quad(vvtd, reg, val) do {  \
>>>> +vvtd_set_reg(vvtd, reg, (val)); \
>>>> +vvtd_set_reg(vvtd, (reg) + 4, (val) >> 32); \
>>>> +} while(0)
>>>
>>> You seem to need to access hvm_hw_vvtd_regs using different sizes, why
>>> not do:
>>>
>>> union hvm_hw_vvtd_regs {
>>> uint8_t  data8[1024];
>>> uint16_t data16[512];
>>> uint32_t data32[256];
>>> uint64_t data64[128];
>>> };
>>>
>>> Then the access is much more straightforward and you don't need the
>>> complicated helpers that you have above.
>>
>> Yes, that will be simpler.
> 
> Keep in mind (as said in another patch) that this approach will only
> work correctly as long as you force accesses to be size aligned, which
> you were not doing now.
> 
> I've looked at the VT-d spec, but I cannot find any section that
> explains the restrictions on access sizes and alignments.
> 

10.2 Software Access to Registers?
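For reference, a minimal sketch of the union-based register file suggested in the review, with accessors that reject accesses that are not size-aligned. The union layout is copied from the quoted suggestion; the accessor names and the -1 error convention are hypothetical, and no claim is made here about what the spec section actually permits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register file layout suggested in the review: 1 KiB viewed at several widths. */
union hvm_hw_vvtd_regs {
    uint8_t  data8[1024];
    uint16_t data16[512];
    uint32_t data32[256];
    uint64_t data64[128];
};

/* Hypothetical check: offset must be a multiple of the access size and in range. */
static int vvtd_access_ok(size_t off, size_t size)
{
    return (off % size) == 0 &&
           off + size <= sizeof(union hvm_hw_vvtd_regs);
}

/* 64-bit write at byte offset 'off'; returns -1 for unaligned or out-of-range. */
static int vvtd_write64(union hvm_hw_vvtd_regs *regs, size_t off, uint64_t val)
{
    assert(regs != NULL);
    if ( !vvtd_access_ok(off, sizeof(val)) )
        return -1;
    regs->data64[off / sizeof(val)] = val;
    return 0;
}

/* 64-bit read at byte offset 'off'; returns -1 for unaligned or out-of-range. */
static int vvtd_read64(const union hvm_hw_vvtd_regs *regs, size_t off,
                       uint64_t *val)
{
    assert(regs != NULL && val != NULL);
    if ( !vvtd_access_ok(off, sizeof(*val)) )
        return -1;
    *val = regs->data64[off / sizeof(*val)];
    return 0;
}
```

The 32-bit variants would follow the same pattern with data32[]; the point is that indexing the union directly only works when the alignment check has passed.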

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 7/25] tools/libacpi: Add new fields in acpi_config for DMAR table

2017-08-24 Thread Lan Tianyu
On 2017-08-24 14:54, Jan Beulich wrote:
>>>> On 24.08.17 at 04:33, <tianyu@intel.com> wrote:
>> On 2017-08-23 16:04, Roger Pau Monné wrote:
>>> On Wed, Aug 23, 2017 at 03:52:01PM +0800, Lan Tianyu wrote:
>>>> On 2017-08-23 00:41, Roger Pau Monné wrote:
>>>>>>> +drhd = (struct acpi_dmar_hardware_unit *)((void*)dmar + 
>>>>>>> sizeof(*dmar));
>>>>>>> +drhd->type = ACPI_DMAR_TYPE_HARDWARE_UNIT;
>>>>>>> +drhd->length = sizeof(*drhd) + ioapic_scope_size;
>>>>>>> +drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
>>>>>>> +drhd->pci_segment = 0;
>>>>>>> +drhd->base_address = config->iommu_base_addr;
>>>>>>> +
>>>>>>> +scope = &drhd->scope[0];
>>>>>>> +scope->type = ACPI_DMAR_DEVICE_SCOPE_IOAPIC;
>>>>>>> +scope->length = ioapic_scope_size;
>>>>>>> +scope->enumeration_id = config->ioapic_id;
>>>>>>> +scope->bus = I440_PSEUDO_BUS_PLATFORM;
>>>>>>> +scope->path[0] = I440_PSEUDO_DEVFN_IOAPIC;
>>>>> I'm not sure whether these constants should instead be fields in the
>>>>> acpi_config struct passed down from libxl. libxc shouldn't really need
>>>>> to know anything about which chipset a VM is using.
>>>>
>>>> How about renaming I440_PSEUDO_XXX to VIOMMU_PSEUDO_XXX?
>>>
>>> I'm not really complaining about the naming, I'm just saying that I'm
>>> not sure whether these constants should live in libxl. It would be
>>> better IMHO if they were defined in some libxl x86 specific header,
>>> and passed to libxc inside of the acpi_config struct.
>>>
>>> At the end it is libxl which decides which chipset the VM is going to
>>> use, not libxc.
>>
>> We can do that, but the BDF is reserved for the IOAPIC and should be the
>> same across chipsets. Do we still need to pass it via acpi_config?
> 
> Well, which value is the right (reserved) one surely can - at least
> in theory - depend on the chipset. Which means that it should
> come from the same place which determines the chipset to be
> emulated for the guest.
> 

OK. Will update.


-- 
Best regards
Tianyu Lan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V2 19/25] x86/vioapic: extend vioapic_get_vector() to support remapping format RTE

2017-08-24 Thread Lan Tianyu
On 2017-08-24 14:59, Jan Beulich wrote:
>>>> On 24.08.17 at 08:11, <tianyu@intel.com> wrote:
>> On 2017-08-23 18:14, Roger Pau Monné wrote:
>>> On Wed, Aug 09, 2017 at 04:34:20PM -0400, Lan Tianyu wrote:
>>>> --- a/xen/arch/x86/hvm/vioapic.c
>>>> +++ b/xen/arch/x86/hvm/vioapic.c
>>>> @@ -565,11 +565,27 @@ int vioapic_get_vector(const struct domain *d, 
>>>> unsigned int gsi)
>>>>  {
>>>>  unsigned int pin;
>>>>  const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);
>>>> +struct IO_APIC_route_remap_entry rte = { { 
>>>> vioapic->redirtbl[pin].bits } };
>>>
>>> Designated initialization and const.
>>>
>>>>  
>>>>  if ( !vioapic )
>>>>  return -EINVAL;
>>>>  
>>>> -return vioapic->redirtbl[pin].fields.vector;
>>>> +if ( rte.format )
>>>> +{
>>>> +int err;
>>>> +struct irq_remapping_request request;
>>>> +struct irq_remapping_info info;
>>>> +
>>>> +irq_request_ioapic_fill(&request, vioapic->id, rte.val);
>>>> +/* Currently, only viommu 0 is supported */
>>>
>>> This seems to be hardcoded in a bunch of places, which makes me wonder
>>> whether having an array of vIOMMUs is the correct choice. I think that
>>> you should remove the array and have a single vIOMMU per domain.
>>
>> The array is reserved for multi-vIOMMU support, but as far as I know there
>> is no such requirement so far. During the design stage, someone commented
>> that we should take multi-vIOMMU support into account.
> 
> It _may_ suffice to do so at the public interface level. I'm not
> against using a single entry array right away, but then the rest
> of the code needs to be written as if the array bound was not
> fixed at 1, i.e. no hard coded uses of zero as the only valid
> array index should occur.

Hi Jan:

I am not sure whether we can hide the single-vIOMMU logic in the device
model. When the vIOMMU instance is created in the device model, it would be
stored in a global variable of the virtual VTD code. We would provide a
callback in vIOMMU ops for getting the vIOMMU instance, plus a helper
function in the vIOMMU abstraction layer that takes the interrupt
information as a parameter. The new callback always returns the stored
vIOMMU instance. This seems to avoid hard-coded uses of zero.

If this can't be accepted, removing the array seems to be the only feasible way.
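To make the proposal easier to discuss, a minimal sketch of the callback described above. All names (get_viommu, vvtd_instance, viommu_lookup) and the struct contents are hypothetical and not taken from the posted series; only irq_remapping_request is a type name used in the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Type name from the series; its contents are not needed for this sketch. */
struct irq_remapping_request;

/* Hypothetical stand-in for a vIOMMU instance. */
struct viommu {
    int id;
};

/* Hypothetical ops table with the extra lookup callback. */
struct viommu_ops {
    struct viommu *(*get_viommu)(const struct irq_remapping_request *req);
};

/* Device-model side: the single instance is kept in a file-scope variable. */
static struct viommu *vvtd_instance;

static struct viommu *vvtd_get_viommu(const struct irq_remapping_request *req)
{
    (void)req;  /* with one vIOMMU the request does not affect the result */
    return vvtd_instance;
}

static const struct viommu_ops vvtd_ops = {
    .get_viommu = vvtd_get_viommu,
};

/* Abstraction-layer helper: callers never index an array with a literal 0. */
static struct viommu *viommu_lookup(const struct viommu_ops *ops,
                                    const struct irq_remapping_request *req)
{
    assert(ops != NULL && ops->get_viommu != NULL);
    return ops->get_viommu(req);
}
```

If multi-vIOMMU support is added later, only the device model's callback would need to start looking at the request (for example its source id).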

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 24/25] x86/vvtd: Add queued invalidation (QI) support

2017-08-24 Thread Lan Tianyu
On 2017-08-23 20:16, Roger Pau Monné wrote:
> On Wed, Aug 09, 2017 at 04:34:25PM -0400, Lan Tianyu wrote:
>> From: Chao Gao <chao@intel.com>
>>
>> Queued Invalidation Interface is an expanded invalidation interface with
>> extended capabilities. Hardware implementations report support for queued
>> invalidation interface through the Extended Capability Register. The queued
>> invalidation interface uses an Invalidation Queue (IQ), which is a circular
>> buffer in system memory. Software submits commands by writing Invalidation
>> Descriptors to the IQ.
>>
>> In this patch, a new function viommu_process_iq() is used for emulating how
>> hardware handles invalidation requests through QI.
> It seems like this is an extended feature, which is not needed for
> basic functionality. Would it be possible to have this series focus on
> the bare-minimum functionality, leaving everything else to a separate
> series?
> 

No. According to the VT-d spec, an IOMMU that supports interrupt remapping
must also support Queued Invalidation (QI).

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 19/25] x86/vioapic: extend vioapic_get_vector() to support remapping format RTE

2017-08-24 Thread Lan Tianyu
On 2017-08-23 18:14, Roger Pau Monné wrote:
> On Wed, Aug 09, 2017 at 04:34:20PM -0400, Lan Tianyu wrote:
>> From: Chao Gao <chao@intel.com>
>>
>> When an IOAPIC RTE is in remapping format, it doesn't contain the
>> interrupt vector. In this case, the RTE contains an index into the
>> interrupt remapping table, where the vector is stored. This patch gets
>> the vector through a vIOMMU interface.
>>
>> Signed-off-by: Chao Gao <chao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/arch/x86/hvm/vioapic.c | 18 +-
>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
>> index 322f33c..ff0742d 100644
>> --- a/xen/arch/x86/hvm/vioapic.c
>> +++ b/xen/arch/x86/hvm/vioapic.c
>> @@ -565,11 +565,27 @@ int vioapic_get_vector(const struct domain *d, 
>> unsigned int gsi)
>>  {
>>  unsigned int pin;
>>  const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);
>> +struct IO_APIC_route_remap_entry rte = { { vioapic->redirtbl[pin].bits 
>> } };
> 
> Designated initialization and const.
> 
>>  
>>  if ( !vioapic )
>>  return -EINVAL;
>>  
>> -return vioapic->redirtbl[pin].fields.vector;
>> +if ( rte.format )
>> +{
>> +int err;
>> +struct irq_remapping_request request;
>> +struct irq_remapping_info info;
>> +
>> +irq_request_ioapic_fill(&request, vioapic->id, rte.val);
>> +/* Currently, only viommu 0 is supported */
> 
> This seems to be hardcoded in a bunch of places, which makes me wonder
> whether having an array of vIOMMUs is the correct choice. I think that
> you should remove the array and have a single vIOMMU per domain.

The array is reserved for multi-vIOMMU support, but as far as I know there
is no such requirement so far. During the design stage, someone commented
that we should take multi-vIOMMU support into account.

We could add a callback to vIOMMU ops for getting the vIOMMU and let the
vIOMMU device model return the associated vIOMMU instance according to the
irq remapping information (e.g. source id). One VM is supposed to have only
one vIOMMU type. When multi-vIOMMU support is added, this logic can still
be applied.

For the current scenario, the device model should return the first vIOMMU
directly.

> 
>> +err = viommu_get_irq_info(vioapic->domain, 0, &request, &info);
>> +return !err ? info.vector : -1;
> 
> maybe:
> 
> return err ?: info.vector;
> 
> ?
> 
>> +}
>> +else
>> +{
>> +return vioapic->redirtbl[pin].fields.vector;
>> +}
>> +
>>  }
>>  
>>  int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
>> -- 
>> 1.8.3.1
>>
>>


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 16/25] x86/vioapic: Hook interrupt delivery of vIOAPIC

2017-08-23 Thread Lan Tianyu
On 2017-08-23 17:59, Roger Pau Monné wrote:
> On Wed, Aug 09, 2017 at 04:34:17PM -0400, Lan Tianyu wrote:
>> From: Chao Gao <chao@intel.com>
>>
>> When irq remapping is enabled, an IOAPIC Redirection Entry may be in
>> remapping format. If so, generate an irq_remapping_request and call the
>> common VIOMMU abstraction's callback to handle this interrupt request. The
>> device model is responsible for checking the request's validity.
>>
>> Signed-off-by: Chao Gao <chao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/arch/x86/hvm/vioapic.c | 14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
>> index 72cae93..322f33c 100644
>> --- a/xen/arch/x86/hvm/vioapic.c
>> +++ b/xen/arch/x86/hvm/vioapic.c
>> @@ -30,6 +30,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -39,6 +40,8 @@
>>  #include 
>>  #include 
>>  
>> +#include "../../../drivers/passthrough/vtd/vtd.h"
> 
> Ouch, that's not very nice. Why do you need this? I thought that you
> introduced an arch-agnostic layer that should be suitable?

Yes, agreed. My current idea is to introduce a callback in viommu ops for
checking the remapping mode, and let the vIOMMU device model check
whether the vioapic is in interrupt remapping mode. The device model can use
the Intel or AMD IOAPIC remapping format to parse the IOAPIC entry.

> 
>>  /* HACK: Route IRQ0 only to VCPU0 to prevent time jumps. */
>>  #define IRQ0_SPECIAL_ROUTING 1
>>  
>> @@ -387,9 +390,20 @@ static void vioapic_deliver(struct hvm_vioapic 
>> *vioapic, unsigned int pin)
>>  struct vlapic *target;
>>  struct vcpu *v;
>>  unsigned int irq = vioapic->base_gsi + pin;
>> +struct IO_APIC_route_remap_entry rte = { { vioapic->redirtbl[pin].bits 
>> } };
> 
> Designated initializers please.
> 
> Roger.
> 


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 9/25] tools/libxl: build DMAR table for a guest with one virtual VTD

2017-08-23 Thread Lan Tianyu
On 2017-08-23 16:34, Wei Liu wrote:
> On Wed, Aug 23, 2017 at 01:35:17PM +0800, Lan Tianyu wrote:
>> On 2017-08-22 21:48, Wei Liu wrote:
>>>>> Hi, Wei
>>>>> Thanks for your comments.
>>>>>
>>>>> iirc, HVM only supports one module; DMAR cannot be a new module. Joining
>>>>> the existing one is the approach we are taking.
>>>>>
>>>>> Which kind of conflicts do you think should be resolved? If you mean I
>>>>> forgot to free the old buf, I will fix this. If you mean the potential
>>>>> overlap between the binary passed by admin and DMAR table built here, I
>>>>> don't have much idea on this. Even without the DMAR table, the binary
>>>>> may contain MADT or other tables and tool stacks don't interpret the
>>>>> binary and check whether there are conflicts, right?
>>>>>
>>> Thinking a bit more about this, when I first said "conflicts" I didn't
>>> mean to parse the content. I was referring to the code in
>>> libxl_x86_acpi.c which also seems to manipulate acpi_modules.
>>
>> Code in libxl_x86_acpi.c works for Hvmlite/PVHv2. The code we added is
>> for hvm guest.
>>
> 
> That's correct for the code as-is but what is preventing the code there
> from working with HVM? Assuming correct checks and branches are added
> to appropriate places?
> 
> I'm against having multiple locations doing things that could
> potentially clash with each other. In the foreseeable future PVH is
> going to need similar functionality.
>
> My expectation is that the existing code needs to be taken into
> consideration and the contributors need to figure out if and how it can
> be modified to suit their needs. If everyone is doing their own thing
> in their own little function Xen will eventually become unmaintainable.
> 
>>>
>>> I would like the code to generate dmar take into consideration
>>> libxl__dom_load_acpi.
>>>
>>
>> If we add a DMAR table for hvmlite, we should combine it with the other
>> ACPI tables and populate them into acpi_modules[2]. This is how hvmlite
>> adds other ACPI tables in libxl__dom_load_acpi().
>>
> 
> Sure, that sounds plausible.
> 
> What I would like to see is to have one entry point to manipulate ACPI
> tables.
> 
> Given the patch volume we're seeing now, we expect contributors to drive
> the discussion forward. If you're not sure, feel free to ask more questions.

I am not sure whether I understood correctly.

PVHv2 builds all ACPI tables in the toolstack and uses acpi_module[0, 1, 2]
to pass the related table content.

HVM builds ACPI tables in hvmloader and just uses acpi_module[0] to pass
additional ACPI firmware or tables.

These two modes use acpi_modules[] in different ways, so I think we
can't combine them, right?

To build the DMAR table, we have introduced construct_dmar() under
libacpi, and PVHv2 can also use it in
libxl__dom_load_acpi().


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 7/25] tools/libacpi: Add new fields in acpi_config for DMAR table

2017-08-23 Thread Lan Tianyu
On 2017-08-23 16:04, Roger Pau Monné wrote:
> On Wed, Aug 23, 2017 at 03:52:01PM +0800, Lan Tianyu wrote:
>> On 2017-08-23 00:41, Roger Pau Monné wrote:
>>>>> +drhd = (struct acpi_dmar_hardware_unit *)((void*)dmar + 
>>>>> sizeof(*dmar));
>>>>> +drhd->type = ACPI_DMAR_TYPE_HARDWARE_UNIT;
>>>>> +drhd->length = sizeof(*drhd) + ioapic_scope_size;
>>>>> +drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
>>>>> +drhd->pci_segment = 0;
>>>>> +drhd->base_address = config->iommu_base_addr;
>>>>> +
>>>>> +scope = &drhd->scope[0];
>>>>> +scope->type = ACPI_DMAR_DEVICE_SCOPE_IOAPIC;
>>>>> +scope->length = ioapic_scope_size;
>>>>> +scope->enumeration_id = config->ioapic_id;
>>>>> +scope->bus = I440_PSEUDO_BUS_PLATFORM;
>>>>> +scope->path[0] = I440_PSEUDO_DEVFN_IOAPIC;
>>> I'm not sure whether these constants should instead be fields in the
>>> acpi_config struct passed down from libxl. libxc shouldn't really need
>>> to know anything about which chipset a VM is using.
>>
>> How about renaming I440_PSEUDO_XXX to VIOMMU_PSEUDO_XXX?
> 
> I'm not really complaining about the naming, I'm just saying that I'm
> not sure whether these constants should live in libxl. It would be
> better IMHO if they were defined in some libxl x86 specific header,
> and passed to libxc inside of the acpi_config struct.
> 
> At the end it is libxl which decides which chipset the VM is going to
> use, not libxc.

We can do that, but the BDF is reserved for the IOAPIC and should be the
same across chipsets. Do we still need to pass it via acpi_config?


> 
> Roger.
> 


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 11/25] x86/hvm: Introduce a emulated VTD for HVM

2017-08-23 Thread Lan Tianyu
On 2017-08-23 15:58, Roger Pau Monné wrote:
> On Wed, Aug 09, 2017 at 04:34:12PM -0400, Lan Tianyu wrote:
>> From: Chao Gao <chao@intel.com>
>>
>> This patch adds create/destroy/query function for the emulated VTD
>> and adapts it to the common VIOMMU abstraction.
>>
>> Signed-off-by: Chao Gao <chao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/drivers/passthrough/vtd/Makefile |   7 +-
>>  xen/drivers/passthrough/vtd/iommu.h  |  99 +-
>>  xen/drivers/passthrough/vtd/vvtd.c   | 158 
>> +++
>>  xen/include/asm-x86/viommu.h |   3 +
>>  4 files changed, 241 insertions(+), 26 deletions(-)
>>  create mode 100644 xen/drivers/passthrough/vtd/vvtd.c
>>
>> diff --git a/xen/drivers/passthrough/vtd/Makefile 
>> b/xen/drivers/passthrough/vtd/Makefile
>> index f302653..163c7fe 100644
>> --- a/xen/drivers/passthrough/vtd/Makefile
>> +++ b/xen/drivers/passthrough/vtd/Makefile
>> @@ -1,8 +1,9 @@
>>  subdir-$(CONFIG_X86) += x86
>>  
>> -obj-y += iommu.o
>>  obj-y += dmar.o
>> -obj-y += utils.o
>> -obj-y += qinval.o
>>  obj-y += intremap.o
>> +obj-y += iommu.o
>> +obj-y += qinval.o
>>  obj-y += quirks.o
>> +obj-y += utils.o
>> +obj-$(CONFIG_VIOMMU) += vvtd.o
>> diff --git a/xen/drivers/passthrough/vtd/iommu.h 
>> b/xen/drivers/passthrough/vtd/iommu.h
>> index 72c1a2e..55f3b6e 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.h
>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>> @@ -23,31 +23,54 @@
>>  #include 
>>  
>>  /*
>> - * Intel IOMMU register specification per version 1.0 public spec.
>> + * Intel IOMMU register specification per version 2.4 public spec.
>>   */
>>  
>> -#define DMAR_VER_REG       0x0  /* Arch version supported by this IOMMU */
>> -#define DMAR_CAP_REG       0x8  /* Hardware supported capabilities */
>> -#define DMAR_ECAP_REG      0x10 /* Extended capabilities supported */
>> -#define DMAR_GCMD_REG      0x18 /* Global command register */
>> -#define DMAR_GSTS_REG      0x1c /* Global status register */
>> -#define DMAR_RTADDR_REG    0x20 /* Root entry table */
>> -#define DMAR_CCMD_REG      0x28 /* Context command reg */
>> -#define DMAR_FSTS_REG      0x34 /* Fault Status register */
>> -#define DMAR_FECTL_REG     0x38 /* Fault control register */
>> -#define DMAR_FEDATA_REG    0x3c /* Fault event interrupt data register */
>> -#define DMAR_FEADDR_REG    0x40 /* Fault event interrupt addr register */
>> -#define DMAR_FEUADDR_REG   0x44 /* Upper address register */
>> -#define DMAR_AFLOG_REG     0x58 /* Advanced Fault control */
>> -#define DMAR_PMEN_REG      0x64 /* Enable Protected Memory Region */
>> -#define DMAR_PLMBASE_REG   0x68 /* PMRR Low addr */
>> -#define DMAR_PLMLIMIT_REG  0x6c /* PMRR low limit */
>> -#define DMAR_PHMBASE_REG   0x70 /* pmrr high base addr */
>> -#define DMAR_PHMLIMIT_REG  0x78 /* pmrr high limit */
>> -#define DMAR_IQH_REG       0x80 /* invalidation queue head */
>> -#define DMAR_IQT_REG       0x88 /* invalidation queue tail */
>> -#define DMAR_IQA_REG       0x90 /* invalidation queue addr */
>> -#define DMAR_IRTA_REG      0xB8 /* intr remap */
>> +#define DMAR_VER_REG       0x0  /* Arch version supported by this IOMMU */
>> +#define DMAR_CAP_REG       0x8  /* Hardware supported capabilities */
>> +#define DMAR_ECAP_REG      0x10 /* Extended capabilities supported */
>> +#define DMAR_GCMD_REG      0x18 /* Global command register */
>> +#define DMAR_GSTS_REG      0x1c /* Global status register */
>> +#define DMAR_RTADDR_REG    0x20 /* Root entry table */
>> +#define DMAR_CCMD_REG      0x28 /* Context command reg */
>> +#define DMAR_FSTS_REG      0x34 /* Fault Status register */
>> +#define DMAR_FECTL_REG     0x38 /* Fault control register */
>> +#define DMAR_FEDATA_REG    0x3c /* Fault event interrupt data register */
>> +#define DMAR_FEADDR_REG    0x40 /* Fault event interrupt addr register */
>> +#define DMAR_FEUADDR_REG   0x44 /* Upper address register */
>> +#define DMAR_AFLOG_REG     0x58 /* Advanced Fault control */
>> +#define DMAR_PMEN_REG      0x64 /* Enable Protected Memory Region */
>> +#define DMAR_PLMBASE_REG   0x68 /* PMRR Low addr */
>> +#define DMAR_PLMLIMIT_REG  0x6c /* PMRR low limit */
>> +#define DMAR_PH

Re: [Xen-devel] [PATCH V2 2/25] VIOMMU: Add irq request callback to deal with irq remapping

2017-08-23 Thread Lan Tianyu
On 2017-08-23 17:24, Jan Beulich wrote:
>>>> On 23.08.17 at 09:42, <tianyu@intel.com> wrote:
>> On 2017-08-22 23:32, Roger Pau Monné wrote:
>>> On Wed, Aug 09, 2017 at 04:34:03PM -0400, Lan Tianyu wrote:
>>>> +static inline void irq_request_ioapic_fill(struct irq_remapping_request 
>>>> *req,
>>>> + uint32_t ioapic_id, uint64_t rte)
>>>> +{
>>>> +ASSERT(req);
>>>> +req->type = VIOMMU_REQUEST_IRQ_APIC;
>>>> +req->source_id = ioapic_id;
>>>> +req->msg.rte = rte;
>>>> +}
>>>> +
>>>> +static inline void irq_request_msi_fill(struct irq_remapping_request *req,
>>>> +  uint32_t source_id, uint64_t addr, uint32_t 
>>>> data)
>>>> +{
>>>> +ASSERT(req);
>>>> +req->type = VIOMMU_REQUEST_IRQ_MSI;
>>>> +req->source_id = source_id;
>>>> +req->msg.msi.addr = addr;
>>>> +req->msg.msi.data = data;
>>>> +}
>>>
>>> What's the usage of those two functions? AFAICT they don't have any
>>> callers in this patch.
>>
>> These functions will be called in the following interrupt patch 22
>> "x86/vmsi: Hook delivering remapping format msi to guest" and patch 16
>> "x86/vioapic: Hook interrupt delivery of vIOAPIC"
> 
> That's _far_ away. As implied by Roger's comment, please try to
> avoid introducing dead code, especially when it's dead for an
> extended period of time. Always remember that a series may not
> be committed in one go.
OK. Will change order.

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 1/25] DOMCTL: Introduce new DOMCTL commands for vIOMMU support

2017-08-23 Thread Lan Tianyu
On 2017-08-23 15:22, Roger Pau Monné wrote:
> On Wed, Aug 23, 2017 at 02:06:17PM +0800, Lan Tianyu wrote:
>> Hi Roger:
>>  Thanks for your review.
>>
>> On 2017-08-22 22:32, Roger Pau Monné wrote:
>>> On Wed, Aug 09, 2017 at 04:34:02PM -0400, Lan Tianyu wrote:
>>>> +
>>>> +/* vIOMMU capabilities */
>>>> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
>>>> +
>>>> +struct xen_domctl_viommu_op {
>>>> +uint32_t cmd;
>>>> +#define XEN_DOMCTL_create_viommu  0
>>>> +#define XEN_DOMCTL_destroy_viommu 1
>>>> +#define XEN_DOMCTL_query_viommu_caps  2
>>>> +union {
>>>> +struct {
>>>> +/* IN - vIOMMU type */
>>>> +uint64_t viommu_type;
>>>> +/* 
>>>> + * IN - MMIO base address of vIOMMU. vIOMMU device models
>>>> + * are in charge of checking base_address and length.
>>>> + */
>>>> +uint64_t base_address;
>>>> +/* IN - Length of MMIO region */
>>>> +uint64_t length;
>>>
>>> It seems weird that you can specify the length, is that something
>>> that a user would like to set? Isn't the length of the IOMMU MMIO
>>> region fixed by the hardware spec?
>>
>> Different vendors may have different IOMMU register region sizes (e.g.
>> VT-d uses one page for its register region). The length field is meant to
>> keep the vIOMMU device model from abusing the address space. Some
>> registers' offsets are reported by other registers, and these offsets are
>> emulated by the vIOMMU device model. If the field is not necessary, we can
>> remove it and add it back when there is a real requirement for it.
> 
> So from my understanding the size of the IOMMU MMIO region is implicit
> in the IOMMU type that the user chooses. I don't think this field is
> needed.

OK. Will remove it.
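
A rough sketch of what "the size is implicit in the type" could mean in code: the device model looks the region length up from the vIOMMU type instead of taking it from the caller. The type constant, the size macro and the function name are illustrative assumptions; the 4 KiB value assumes the "one page" mentioned above refers to a 4 KiB page.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers, not taken from the patch series. */
#define VIOMMU_TYPE_INTEL_VTD_SKETCH  1ULL
#define VVTD_REG_REGION_SIZE_SKETCH   0x1000ULL  /* assumed: one 4 KiB page */

/* Stores the MMIO region length for a vIOMMU type in *size.
 * Returns 0 on success, -EINVAL for an unknown type. */
static int viommu_region_size(uint64_t type, uint64_t *size)
{
    assert(size != NULL);

    switch ( type )
    {
    case VIOMMU_TYPE_INTEL_VTD_SKETCH:
        *size = VVTD_REG_REGION_SIZE_SKETCH;
        return 0;

    default:
        return -EINVAL;
    }
}
```

With a lookup like this, the create operation would only need the type and the base address.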

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 10/25] tools/libxl: create vIOMMU during domain construction

2017-08-23 Thread Lan Tianyu
On 2017-08-23 15:45, Roger Pau Monné wrote:
> On Wed, Aug 09, 2017 at 04:34:11PM -0400, Lan Tianyu wrote:
>> From: Chao Gao <chao@intel.com>
>>
>> If guest is configured to have a vIOMMU, create it during domain 
>> construction.
>>
>> Signed-off-by: Chao Gao <chao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  tools/libxl/libxl_x86.c | 28 
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
>> index 455f6f0..ace20e5 100644
>> --- a/tools/libxl/libxl_x86.c
>> +++ b/tools/libxl/libxl_x86.c
>> @@ -341,8 +341,36 @@ int libxl__arch_domain_create(libxl__gc *gc, 
>> libxl_domain_config *d_config,
>>  if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM) {
> 
> I would rather change this check so it's:
> 
> d_config->b_info.type != LIBXL_DOMAIN_TYPE_PV
> 
> Is there any reason why PVH guests shouldn't get a vIOMMU?

No, but we currently only support vIOMMU for HVM guests, and we don't know
how a PVH guest would enumerate the vIOMMU without an ACPI DMAR table.

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 7/25] tools/libacpi: Add new fields in acpi_config for DMAR table

2017-08-23 Thread Lan Tianyu
On 2017-08-23 00:41, Roger Pau Monné wrote:
>> > +drhd = (struct acpi_dmar_hardware_unit *)((void*)dmar + 
>> > sizeof(*dmar));
>> > +drhd->type = ACPI_DMAR_TYPE_HARDWARE_UNIT;
>> > +drhd->length = sizeof(*drhd) + ioapic_scope_size;
>> > +drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
>> > +drhd->pci_segment = 0;
>> > +drhd->base_address = config->iommu_base_addr;
>> > +
>> > +scope = &drhd->scope[0];
>> > +scope->type = ACPI_DMAR_DEVICE_SCOPE_IOAPIC;
>> > +scope->length = ioapic_scope_size;
>> > +scope->enumeration_id = config->ioapic_id;
>> > +scope->bus = I440_PSEUDO_BUS_PLATFORM;
>> > +scope->path[0] = I440_PSEUDO_DEVFN_IOAPIC;
> I'm not sure whether these constants should instead be fields in the
> acpi_config struct passed down from libxl. libxc shouldn't really need
> to know anything about which chipset a VM is using.

How about renaming I440_PSEUDO_XXX to VIOMMU_PSEUDO_XXX?


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 3/25] VIOMMU: Add get irq info callback to convert irq remapping request

2017-08-23 Thread Lan Tianyu
On 2017-08-22 23:38, Roger Pau Monné wrote:
> On Wed, Aug 09, 2017 at 04:34:04PM -0400, Lan Tianyu wrote:
>> This patch is to add get_irq_info callback for platform implementation
>> to convert irq remapping request to irq info (e.g. vector, dest, dest_mode
>> and so on).
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/common/viommu.c  | 16 
>>  xen/include/asm-x86/viommu.h |  8 
>>  xen/include/xen/viommu.h |  9 +
>>  3 files changed, 33 insertions(+)
>>
>> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
>> index f4d34e6..03c879d 100644
>> --- a/xen/common/viommu.c
>> +++ b/xen/common/viommu.c
>> @@ -213,6 +213,22 @@ int viommu_handle_irq_request(struct domain *d, u32 
>> viommu_id,
>>  return info->viommu[viommu_id]->ops->handle_irq_request(d, request);
>>  }
>>  
>> +int viommu_get_irq_info(struct domain *d, u32 viommu_id,
>> +struct irq_remapping_request *request,
>> +struct irq_remapping_info *irq_info)
> 
> The definition of this struct seems to be arch-specific, in which case
> IMHO it should be called arch_irq_remapping_info, in order to denote
> it's arch-specific.

OK. Will update.

> 
>> +{
>> +struct viommu_info *info = &d->viommu;
>> +
>> +if ( viommu_id >= info->nr_viommu
>> + || !info->viommu[viommu_id] )
> 
> Unneeded line break.
> 
> Roger.
> 




Re: [Xen-devel] [PATCH V2 2/25] VIOMMU: Add irq request callback to deal with irq remapping

2017-08-23 Thread Lan Tianyu
On 2017-08-22 23:32, Roger Pau Monné wrote:
> On Wed, Aug 09, 2017 at 04:34:03PM -0400, Lan Tianyu wrote:
>> This patch is to add irq request callback for platform implementation
>> to deal with irq remapping request.
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/common/viommu.c  | 15 +
>>  xen/include/asm-x86/viommu.h | 73 
>> 
>>  xen/include/xen/viommu.h |  9 ++
>>  3 files changed, 97 insertions(+)
>>  create mode 100644 xen/include/asm-x86/viommu.h
>>
>> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
>> index a4d004d..f4d34e6 100644
>> --- a/xen/common/viommu.c
>> +++ b/xen/common/viommu.c
>> @@ -198,6 +198,21 @@ int __init viommu_setup(void)
>>  return 0;
>>  }
>>  
>> +int viommu_handle_irq_request(struct domain *d, u32 viommu_id,
>> +  struct irq_remapping_request *request)
>> +{
>> +struct viommu_info *info = &d->viommu;
>> +
>> +if ( viommu_id >= info->nr_viommu
>> + || !info->viommu[viommu_id] )
> 
> This fits on the same line, no need to split it.
> 
>> +return -EINVAL;
>> +
>> +if ( !info->viommu[viommu_id]->ops->handle_irq_request )
>> +return -EINVAL;
>> +
>> +return info->viommu[viommu_id]->ops->handle_irq_request(d, request);
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
>> new file mode 100644
>> index 000..51bda72
>> --- /dev/null
>> +++ b/xen/include/asm-x86/viommu.h
>> @@ -0,0 +1,73 @@
>> +/*
>> + * include/asm-x86/viommu.h
>> + *
>> + * Copyright (c) 2017 Intel Corporation.
>> + * Author: Lan Tianyu <tianyu@intel.com> 
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along 
>> with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + */
>> +#ifndef __ARCH_X86_VIOMMU_H__
>> +#define __ARCH_X86_VIOMMU_H__
>> +
>> +#include 
>> +#include 
>> +
>> +/* IRQ request type */
>> +#define VIOMMU_REQUEST_IRQ_MSI  0
>> +#define VIOMMU_REQUEST_IRQ_APIC 1
>> +
>> +struct irq_remapping_request
>> +{
>> +union {
>> +/* MSI */
>> +struct {
>> +u64 addr;
>> +u32 data;
>> +} msi;
>> +/* Redirection Entry in IOAPIC */
>> +u64 rte;
>> +} msg;
>> +u16 source_id;
>> +u8 type;
>> +};
>> +
>> +static inline void irq_request_ioapic_fill(struct irq_remapping_request 
>> *req,
>> + uint32_t ioapic_id, uint64_t rte)
>> +{
>> +ASSERT(req);
>> +req->type = VIOMMU_REQUEST_IRQ_APIC;
>> +req->source_id = ioapic_id;
>> +req->msg.rte = rte;
>> +}
>> +
>> +static inline void irq_request_msi_fill(struct irq_remapping_request *req,
>> +  uint32_t source_id, uint64_t addr, uint32_t data)
>> +{
>> +ASSERT(req);
>> +req->type = VIOMMU_REQUEST_IRQ_MSI;
>> +req->source_id = source_id;
>> +req->msg.msi.addr = addr;
>> +req->msg.msi.data = data;
>> +}
> 
> What's the usage of those two functions? AFAICT they don't have any
> callers in this patch.

These functions will be called by later patches in the series: patch 22,
"x86/vmsi: Hook delivering remapping format msi to guest", and patch 16,
"x86/vioapic: Hook interrupt delivery of vIOAPIC".

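As a sketch of how such a caller might use the MSI helper, the snippet below restates the request struct and irq_request_msi_fill() from the quoted patch with standard C types, and adds a hypothetical caller, build_msi_request(), which is not part of the series. assert() stands in for Xen's ASSERT().

```c
#include <assert.h>
#include <stdint.h>

#define VIOMMU_REQUEST_IRQ_MSI  0
#define VIOMMU_REQUEST_IRQ_APIC 1

struct irq_remapping_request {
    union {
        struct {
            uint64_t addr;
            uint32_t data;
        } msi;
        uint64_t rte;   /* redirection entry in the IOAPIC */
    } msg;
    uint16_t source_id;
    uint8_t type;
};

static inline void irq_request_msi_fill(struct irq_remapping_request *req,
                                        uint32_t source_id, uint64_t addr,
                                        uint32_t data)
{
    assert(req);
    req->type = VIOMMU_REQUEST_IRQ_MSI;
    req->source_id = source_id;   /* narrows to 16 bits, as in the patch */
    req->msg.msi.addr = addr;
    req->msg.msi.data = data;
}

/* Hypothetical caller: package a guest MSI before handing it to the vIOMMU. */
static struct irq_remapping_request build_msi_request(uint16_t bdf,
                                                      uint64_t addr,
                                                      uint32_t data)
{
    struct irq_remapping_request req = { .type = 0 };

    irq_request_msi_fill(&req, bdf, addr, data);
    return req;
}
```
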
> 
>> +
>> +#endif /* __ARCH_X86_VIOMMU_H__ */
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * End:
>> + */
>> diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
>> index 527afb1..0be1b3a 100644

Re: [Xen-devel] [PATCH V2 4/25] Xen/doc: Add Xen virtual IOMMU doc

2017-08-23 Thread Lan Tianyu
On 2017-08-22 23:55, Roger Pau Monné wrote:
> On Wed, Aug 09, 2017 at 04:34:05PM -0400, Lan Tianyu wrote:
>> This patch is to add Xen virtual IOMMU doc to introduce motivation,
>> framework, vIOMMU hypercall and xl configuration.
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  docs/misc/viommu.txt | 139 
>> +++
>>  1 file changed, 139 insertions(+)
>>  create mode 100644 docs/misc/viommu.txt
>>
>> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
>> new file mode 100644
>> index 000..39455bb
>> --- /dev/null
>> +++ b/docs/misc/viommu.txt
> 
> IMHO, this should be the first patch in the series.

OK. Will update.

> 
>> @@ -0,0 +1,139 @@
>> +Xen virtual IOMMU
>> +
>> +Motivation
>> +==
>> +*) Enable more than 255 vcpu support
> 
> Seems like the "*)" is some kind of leftover?
> 
>> +HPC cloud service requires VM provides high performance parallel
>> +computing and we hope to create a huge VM with >255 vcpu on one machine
>> +to meet such requirement. Pin each vcpu to separate pcpus.
> 
> I would re-write this as:
> 
> The current requirements of HPC cloud service requires VM with a high
> number of CPUs in order to achieve high performance in parallel
> computing.
> 
> Also, this is needed in order to create VMs with > 128 vCPUs, not 255
> vCPUs. That's because the APIC ID used by Xen is CPU ID * 2 (ie: CPU
> 127 has APIC ID 254, which is the last one available in xAPIC mode).
> You should reword the paragraphs below in order to fix the mention of
> 255 vCPUs.

Thanks for your rewrite.

> 
>> +
>> +To support >255 vcpus, X2APIC mode in guest is necessary because legacy
>> +APIC(XAPIC) just supports 8-bit APIC ID and it only can support 255
>> +vcpus at most. X2APIC mode supports 32-bit APIC ID and it requires
>> +interrupt mapping function of vIOMMU.
> 
> Correct me if I'm wrong, but I don't think x2APIC requires vIOMMU. The
> IOMMU is required so that you can route interrupts to all the possible
> CPUs. One could imagine a setup where only CPUs with APIC IDs < 255 are
> used as targets of external interrupts, and that doesn't require an
> IOMMU.

This is OS behavior. IIRC, Windows strictly requires an IOMMU when x2APIC
mode is enabled, and the Linux kernel only has that requirement when the
number of CPUs is > 255.
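
To make the arithmetic in Roger's comment concrete, here is a small sketch. It assumes the mapping he describes (APIC ID = vCPU ID * 2) and that APIC ID 255 is unusable as a target in xAPIC mode because it is the broadcast ID; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define XAPIC_ID_BITS 8u   /* xAPIC mode carries an 8-bit APIC ID */

/* APIC ID assignment as described in this thread: APIC ID = vCPU ID * 2. */
static uint32_t vcpu_to_apic_id(uint32_t vcpu_id)
{
    assert(vcpu_id <= UINT32_MAX / 2);   /* avoid wrap-around */
    return vcpu_id * 2;
}

/* Non-zero if the vCPU's APIC ID fits in xAPIC mode (IDs 0..254). */
static int addressable_in_xapic(uint32_t vcpu_id)
{
    return vcpu_to_apic_id(vcpu_id) < (1u << XAPIC_ID_BITS) - 1;
}
```

Under these assumptions vCPU 127 maps to APIC ID 254 and is still addressable, while vCPU 128 maps to 256 and is not, which is where the 128 vCPU limit comes from.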


> 
>> +The reason for this is that there is no modification to existing PCI MSI
>> +and IOAPIC with the introduction of X2APIC. PCI MSI/IOAPIC can only send
>> +interrupt message containing 8-bit APIC ID, which cannot address >255
>> +cpus. Interrupt remapping supports 32-bit APIC ID and so it's necessary
>> +to enable >255 cpus with x2apic mode.
>> +
>> +
>> +vIOMMU Architecture
>> +===
>> +vIOMMU device model is inside Xen hypervisor for following factors
>> +1) Avoid round trips between Qemu and Xen hypervisor
>> +2) Ease of integration with the rest of hypervisor
>> +3) HVMlite/PVH doesn't use Qemu
>> +
>> +* Interrupt remapping overview.
>> +Interrupts from virtual devices and physical devices are delivered
>> +to vLAPIC from vIOAPIC and vMSI. vIOMMU needs to remap interrupt during
>> +this procedure.
>> +
>> +[ASCII diagram, garbled and truncated in this archive. Recoverable labels: Qemu, VM, Device driver, Virtual device, IRQ subsystem, hypervisor, VIRQ, vLAPIC, vIOMMU.]

Re: [Xen-devel] [PATCH V2 1/25] VIOMMU: Add vIOMMU helper functions to create, destroy and query capabilities

2017-08-23 Thread Lan Tianyu
On 2017-08-22 23:27, Roger Pau Monné wrote:
> On Thu, Aug 17, 2017 at 08:22:16PM -0400, Lan Tianyu wrote:
>> This patch is to introduce an abstract layer for arch vIOMMU implementation
>> to deal with requests from dom0. Arch vIOMMU code needs to provide callback
>> to perform create, destroy and query capabilities operation.
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/arch/x86/Kconfig |   1 +
>>  xen/arch/x86/setup.c |   1 +
>>  xen/common/Kconfig   |   3 +
>>  xen/common/Makefile  |   1 +
>>  xen/common/domain.c  |   3 +
>>  xen/common/viommu.c  | 165 
>> +++
>>  xen/include/xen/sched.h  |   2 +
>>  xen/include/xen/viommu.h |  71 
>>  8 files changed, 247 insertions(+)
>>  create mode 100644 xen/common/viommu.c
>>  create mode 100644 xen/include/xen/viommu.h
>>
>> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
>> index 30c2769..1f1de96 100644
>> --- a/xen/arch/x86/Kconfig
>> +++ b/xen/arch/x86/Kconfig
>> @@ -23,6 +23,7 @@ config X86
>>  select HAS_PDX
>>  select NUMA
>>  select VGA
>> +select VIOMMU
>>  
>>  config ARCH_DEFCONFIG
>>  string
>> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
>> index db5df69..68f1631 100644
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -1513,6 +1513,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>  early_msi_init();
>>  
>>  iommu_setup();/* setup iommu if available */
>> +viommu_setup();
>>  
>>  smp_prepare_cpus(max_cpus);
>>  
>> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
>> index dc8e876..2ad2c8d 100644
>> --- a/xen/common/Kconfig
>> +++ b/xen/common/Kconfig
>> @@ -49,6 +49,9 @@ config HAS_CHECKPOLICY
>>  string
>>  option env="XEN_HAS_CHECKPOLICY"
>>  
>> +config VIOMMU
>> +bool
>> +
>>  config KEXEC
>>  bool "kexec support"
>>  default y
>> diff --git a/xen/common/Makefile b/xen/common/Makefile
>> index 26c5a64..852553d 100644
>> --- a/xen/common/Makefile
>> +++ b/xen/common/Makefile
>> @@ -56,6 +56,7 @@ obj-y += time.o
>>  obj-y += timer.o
>>  obj-y += trace.o
>>  obj-y += version.o
>> +obj-$(CONFIG_VIOMMU) += viommu.o
>>  obj-y += virtual_region.o
>>  obj-y += vm_event.o
>>  obj-y += vmap.o
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index b22aacc..d1f9b10 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -396,6 +396,9 @@ struct domain *domain_create(domid_t domid, unsigned int 
>> domcr_flags,
>>  spin_unlock(&domlist_update_lock);
>>  }
>>  
>> +if ( (err = viommu_init_domain(d)) != 0 )
>> +goto fail;
>> +
>>  return d;
>>  
>>   fail:
>> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
>> new file mode 100644
>> index 000..6874d9f
>> --- /dev/null
>> +++ b/xen/common/viommu.c
>> @@ -0,0 +1,165 @@
>> +/*
>> + * common/viommu.c
>> + * 
>> + * Copyright (c) 2017 Intel Corporation
>> + * Author: Lan Tianyu <tianyu@intel.com> 
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along 
>> with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +bool __read_mostly opt_viommu;
>> +boolean_param("viommu", opt_viommu);
>> +
>> +static spinlock_t type_list_lock;
> 
> static DEFINE_SPINLOCK(type_list_lock);
> 
>> +static struct list_head type_list;
> 
> static LIST_HEAD(type_list);
> 
>> +
>> +struct viommu_type {
>> +u64 type;
>> +struct viommu_ops *ops;
>> +struct list_head node;
>> +};
>> +
>> +int viommu_init_domain(struct d

Re: [Xen-devel] [PATCH V2 1/25] DOMCTL: Introduce new DOMCTL commands for vIOMMU support

2017-08-23 Thread Lan Tianyu
Hi Roger:
Thanks for your review.

On 2017-08-22 22:32, Roger Pau Monné wrote:
> On Wed, Aug 09, 2017 at 04:34:02PM -0400, Lan Tianyu wrote:
>> This patch is to introduce create, destroy and query capabilities
>> command for vIOMMU. vIOMMU layer will deal with requests and call
>> arch vIOMMU ops.
>>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
>> ---
>>  xen/common/domctl.c |  3 +++
>>  xen/common/viommu.c | 43 +
> 
> I'm confused, I don't see this file in the repo, and the cover letter
> doesn't mention this being based on top of any other series, where
> does this viommu.c file come from?
> 
>>  xen/include/public/domctl.h | 52 
>> +
>>  xen/include/xen/viommu.h|  6 ++
>>  4 files changed, 104 insertions(+)
>>
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index d80488b..01c3024 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -1144,6 +1144,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
>> u_domctl)
>>  if ( !ret )
>>  copyback = 1;
>>  break;
>> +case XEN_DOMCTL_viommu_op:
>> +ret = viommu_domctl(d, &op->u.viommu_op, &copyback);
>> +break;
> 
> Hm, shouldn't this be protected with #ifdef CONFIG_VIOMMU?
> 

A version of viommu_domctl() that always returns -ENODEV has been added for
the case where CONFIG_VIOMMU is unset.
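
The usual shape of such a fallback is sketched below, assuming it lives in a header next to the real declaration. The struct types are left opaque and dispatch_viommu_op() is a hypothetical caller added only to show that the call site needs no #ifdef.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct domain;                 /* opaque in this sketch */
struct xen_domctl_viommu_op;   /* opaque in this sketch */

#ifdef CONFIG_VIOMMU
int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op,
                  bool *need_copy);
#else
/* Fallback used when vIOMMU support is compiled out. */
static inline int viommu_domctl(struct domain *d,
                                struct xen_domctl_viommu_op *op,
                                bool *need_copy)
{
    (void)d;
    (void)op;
    (void)need_copy;
    return -ENODEV;
}
#endif

/* Hypothetical caller: identical in both configurations. */
static int dispatch_viommu_op(struct domain *d,
                              struct xen_domctl_viommu_op *op,
                              bool *copyback)
{
    assert(copyback != NULL);
    return viommu_domctl(d, op, copyback);
}
```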

>>  default:
>>  ret = arch_do_domctl(op, d, u_domctl);
>> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
>> index 6874d9f..a4d004d 100644
>> --- a/xen/common/viommu.c
>> +++ b/xen/common/viommu.c
>> @@ -148,6 +148,49 @@ static u64 viommu_query_caps(struct domain *d, u64 type)
>>  return viommu_type->ops->query_caps(d);
>>  }
>>  
>> +int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op,
>> +  bool *need_copy)
>> +{
>> +int rc = -EINVAL, ret;
> 
> Do you really need both ret and rc?
> 
>> +if ( !viommu_enabled() )
>> +return rc;
> 
> EINVAL? Maybe ENODEV?

OK.

> 
>> +
>> +switch ( op->cmd )
>> +{
>> +case XEN_DOMCTL_create_viommu:
>> +ret = viommu_create(d, op->u.create_viommu.viommu_type,
>> +op->u.create_viommu.base_address,
>> +op->u.create_viommu.length,
>> +op->u.create_viommu.capabilities);
> 
> I would rather prefer for viommu_create to simply return an error or
> 0, and store the viommu_id by passing a pointer parameter to viommu_create, 
> ie:
> 
> rc = viommu_create(d, op->u.create_viommu.viommu_type,
>                    op->u.create_viommu.base_address,
>                    op->u.create_viommu.length,
>                    op->u.create_viommu.capabilities,
>                    &op->u.create_viommu.viommu_id);
> 

Got it. Will update in the next version.
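
A minimal sketch of the calling convention Roger suggests, where the return value carries only 0 or a negative errno and the new ID is written through a pointer. The struct, the per-domain limit and the choice of -ENOSPC are assumptions made for illustration, not the code from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_VIOMMU_SKETCH 1   /* hypothetical per-domain limit */

struct viommu_info_sketch {
    uint32_t nr_viommu;
};

/* Returns 0 or a negative errno value; the new ID goes out via *viommu_id. */
static int viommu_create_sketch(struct viommu_info_sketch *info,
                                uint32_t *viommu_id)
{
    assert(info != NULL && viommu_id != NULL);

    if ( info->nr_viommu >= NR_VIOMMU_SKETCH )
        return -ENOSPC;   /* assumed error code for "no free slot" */

    *viommu_id = info->nr_viommu++;
    return 0;
}
```

With this shape the domctl handler can assign the result directly to rc, so the separate ret variable and the "ret >= 0" test are no longer needed.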

>> +if ( ret >= 0 ) {
>^ coding style
>> +op->u.create_viommu.viommu_id = ret;
>> +*need_copy = true;
>> +rc = 0; /* return 0 if success */
>> +}
>> +break;
>> +
>> +case XEN_DOMCTL_destroy_viommu:
>> +rc = viommu_destroy(d, op->u.destroy_viommu.viommu_id);
>> +break;
>> +
>> +case XEN_DOMCTL_query_viommu_caps:
>> +ret = viommu_query_caps(d, op->u.query_caps.viommu_type);
> 
> Same here, I would rather pass another parameter and use the return
> for error only.
> 
>> +if ( ret >= 0 )
>> +{
>> +op->u.query_caps.capabilities = ret;
>> +rc = 0;
>> +}
>> +*need_copy = true;
>> +break;
>> +
>> +default:
>> +break;
> 
> Here you should return ENOSYS.


OK.

> 
>> +}
>> +
>> +return rc;
>> +}
>> +
>>  int __init viommu_setup(void)
>>  {
>>  INIT_LIST_HEAD(&type_list);
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index ff39762..4b10f26 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -1149,6 +1149,56 @@ struct xen_domctl_psr_cat_op {
>>  typedef struct xen_domctl_psr_cat_op xen_domctl_psr_cat_op_t;
>>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_psr_cat_op_t);
>>  
>> +/*  vIOMMU helper
>> + *
>> + *  vIOMMU interface can be used to create/destroy vIOMMU and
>> + *  query vIOMMU ca

Re: [Xen-devel] [PATCH V2 9/25] tools/libxl: build DMAR table for a guest with one virtual VTD

2017-08-22 Thread Lan Tianyu
On 2017-08-22 21:48, Wei Liu wrote:
>> > Hi, Wei
>> > Thanks for your comments.
>> > 
>> > iirc, HVM only supports one module; DMAR cannot be a new module. Joining to
>> > the existing one is the approach we are taking. 
>> > 
>> > Which kind of conflicts you think should be resolved? If you mean I
>> > forget to free the old buf, I will fix this. If you mean the potential
>> > overlap between the binary passed by admin and DMAR table built here, I
>> > don't have much idea on this. Even without the DMAR table, the binary
>> > may contain MADT or other tables and tool stacks don't interpret the
>> > binary and check whether there are conflicts, right?
>> > 
> Thinking a bit more about this, when I first said "conflicts" I didn't
> mean to parse the content. I was referring to the code in
> libxl_x86_acpi.c which also seems to manipulate acpi_modules.

The code in libxl_x86_acpi.c is for HVMlite/PVHv2. The code we added is
for HVM guests.

> 
> I would like the code to generate dmar take into consideration
> libxl__dom_load_acpi.
> 

If we add a DMAR table for HVMlite, we should combine it with the other
ACPI tables and populate the result into acpi_modules[2]. This is how
HVMlite adds the other ACPI tables in libxl__dom_load_acpi().
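
As a generic illustration of combining two table blobs into one buffer (this is not libxl's code, and append_table() is a hypothetical helper), the sketch below grows the existing buffer with realloc(), so the old buffer is never leaked and is left untouched on failure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Appends extra_len bytes from extra to the buffer *buf of length *len.
 * Returns 0 on success, -1 on failure; on failure *buf and *len are unchanged. */
static int append_table(uint8_t **buf, size_t *len,
                        const uint8_t *extra, size_t extra_len)
{
    uint8_t *merged;

    assert(buf != NULL && len != NULL);

    if ( extra_len == 0 )
        return 0;                      /* nothing to append */

    assert(extra != NULL);

    if ( extra_len > SIZE_MAX - *len )
        return -1;                     /* combined size would overflow */

    merged = realloc(*buf, *len + extra_len);
    if ( merged == NULL )
        return -1;                     /* old buffer is still valid */

    memcpy(merged + *len, extra, extra_len);
    *buf = merged;
    *len += extra_len;
    return 0;
}
```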


-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 6/25] tools/libacpi: Add DMA remapping reporting (DMAR) ACPI table structures

2017-08-22 Thread Lan Tianyu
On 2017-08-22 20:56, Wei Liu wrote:
> On Wed, Aug 09, 2017 at 04:34:07PM -0400, Lan Tianyu wrote:
>> From: Chao Gao <chao@intel.com>
>>
>> Add dmar table structure according to Chapter 8 "BIOS Considerations" of
>> VTd spec Rev. 2.4.
>>
>> VTd spec: http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
>>
>> Signed-off-by: Chao Gao <chao@intel.com>
>> Signed-off-by: Lan Tianyu <tianyu@intel.com>
> 
> I checked the spec against the content; they match.
> 

Thanks.

-- 
Best regards
Tianyu Lan



Re: [Xen-devel] [PATCH V2 8/25] tools/libxl: Add a user configurable parameter to control vIOMMU attributes

2017-08-22 Thread Lan Tianyu
On 2017-08-22 21:19, Wei Liu wrote:
>> +=over 4
>> > +
>> > +=item 
