Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through
> -----Original Message-----
> From: Konrad Rzeszutek Wilk
> Sent: Tuesday, March 21, 2017 08:19 AM
> To: Xuquan (Quan Xu); Venu Busireddy
> Cc: Jan Beulich; anthony.per...@citrix.com; george.dun...@eu.citrix.com;
> ian.jack...@eu.citrix.com; Fanhenglong; Kevin Tian; StefanoStabellini;
> xen-devel@lists.xen.org
> Subject: Re: question: xen/qemu - mmio mapping issues for device
> pass-through
>
> ..snip..
> > support to pass-through large bar (pci-e bar > 4G) device..
>
> Yes it does work.
>
> > > I was assuming large BAR handling to work so far
> > > (Konrad had done some adjustments there quite a while ago, from all I
> > > recall).
> >
> > _iirc_ what Konrad mentioned was using qemu-trad..
>
> Yes but we also did tests on qemu-xen and it worked. CCing Venu.
>
> Venu, does passing in large BARs work with qemu-xen (aka 'xl')?

Sorry, I do not know the answer!

Venu

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through
..snip..
> support to pass-through large bar (pci-e bar > 4G) device..

Yes it does work.

> > I was assuming large BAR handling to work so far
> > (Konrad had done some adjustments there quite a while ago, from all I
> > recall).
>
> _iirc_ what Konrad mentioned was using qemu-trad..

Yes but we also did tests on qemu-xen and it worked. CCing Venu.

Venu, does passing in large BARs work with qemu-xen (aka 'xl')?
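For anyone wanting to reproduce the test: pass-through under qemu-xen is driven by the guest's xl configuration. A minimal sketch (the domain name and BDF below are placeholders, and the device must already have been made assignable, e.g. via `xl pci-assignable-add`):

```
# hypothetical xl guest config (qemu-xen is the default device model)
name    = "hvm-guest"
builder = "hvm"
memory  = 4096
# Host BDF of the device to pass through -- placeholder address:
pci     = [ "0000:04:00.0" ]
```

With a large-BAR device in the `pci` list, the mapping delay discussed in this thread shows up during guest creation, when QEMU's xen_pt code maps the BARs.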
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through
>>> On 21.03.17 at 02:53, wrote:
> On March 20, 2017 3:35 PM, Jan Beulich wrote:
..snip..
>> Is what on my todo list?
>
> support to pass-through large bar (pci-e bar > 4G) device..
>
>> I was assuming large BAR handling to work so far
>> (Konrad had done some adjustments there quite a while ago, from all I
>> recall).
>
> _iirc_ what Konrad mentioned was using qemu-trad..

Quite possible (albeit my memory says hvmloader), but the qemu side (trad
or upstream) isn't my realm anyway.

Jan
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through
On March 20, 2017 3:35 PM, Jan Beulich wrote:
> >>> On 20.03.17 at 02:58, wrote:
..snip..
> > Is it on your TODO list?
>
> Is what on my todo list?

support to pass-through large bar (pci-e bar > 4G) device..

> I was assuming large BAR handling to work so far
> (Konrad had done some adjustments there quite a while ago, from all I
> recall).

_iirc_ what Konrad mentioned was using qemu-trad..

Quan
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through
>>> On 20.03.17 at 02:58, wrote:
..snip..
> I agree.
> So far as I know, xen upstream doesn't support pass-through of large bar
> (pci-e bar > 4G) devices, such as nvidia M60.
> However, cloud providers may want to leverage this feature for machine
> learning etc.
> Is it on your TODO list?

Is what on my todo list? I was assuming large BAR handling to work so far
(Konrad had done some adjustments there quite a while ago, from all I
recall).

Jan
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through
On March 16, 2017 11:32 PM, Jan Beulich wrote:
> >>> On 16.03.17 at 15:21, wrote:
..snip..
> > I am much confused. Why does this mmio mapping take a while?
> > I guessed it takes a lot of time to set up the p2m / iommu entries.
> > That's why I ask "Is it limited by hw performance".
>
> Well, just count the number of page table entries and that of the
> resulting hypercall continuations. It's the sheer amount of work that's
> causing the slowness, together with the need for us to use continuations
> to be on the safe side. There may well be redundant TLB invalidations as
> well. Since we can do better (by using large pages) I wouldn't call this
> "limited by hw performance", but of course one may.

I agree.
So far as I know, xen upstream doesn't support pass-through of large bar
(pci-e bar > 4G) devices, such as nvidia M60.
However, cloud providers may want to leverage this feature for machine
learning etc.
Is it on your TODO list?

Quan
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through
>>> On 16.03.17 at 15:21, wrote:
..snip..
> Sorry, my question is "Is it limited by hw performance"...
>
> I am much confused. Why does this mmio mapping take a while?
> I guessed it takes a lot of time to set up the p2m / iommu entries.
> That's why I ask "Is it limited by hw performance".

Well, just count the number of page table entries and that of the resulting
hypercall continuations. It's the sheer amount of work that's causing the
slowness, together with the need for us to use continuations to be on the
safe side. There may well be redundant TLB invalidations as well. Since we
can do better (by using large pages) I wouldn't call this "limited by hw
performance", but of course one may.

Jan
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through
On March 16, 2017 10:06 PM, Jan Beulich wrote:
> >>> On 16.03.17 at 14:55, wrote:
..snip..
> > my questions:
> > 1. could we make this mmio region mapping quicker?

Thanks for your quick reply.

> Yes, e.g. by using large (2M or 1G) pages. This has been on my todo list
> for quite a while...
>
> > 2. if could not, does it limit by hardware performance?
>
> I'm afraid I don't understand the question. If you mean "Is it limited by
> hw performance", then no, see above. If you mean "Does it limit hw
> performance", then again no, I don't think so (other than the effect of
> having more IOMMU translation levels than really necessary for such large
> a region).

Sorry, my question is "Is it limited by hw performance"...

I am much confused. Why does this mmio mapping take a while?
I guessed it takes a lot of time to set up the p2m / iommu entries.
That's why I ask "Is it limited by hw performance".

Quan
Re: [Xen-devel] question: xen/qemu - mmio mapping issues for device pass-through
>>> On 16.03.17 at 14:55, wrote:
> I try to pass-through a device with an 8G large bar, such as nvidia M60
> (note1, pci-e info as below). It takes about '__15 seconds__' to update
> the 8G large bar in QEMU::xen_pt_region_update()..
> Specifically, it is xc_domain_memory_mapping() in xen_pt_region_update().
>
> Dug into xc_domain_memory_mapping(), I find it mainly calls "do_domctl
> (…case XEN_DOMCTL_memory_mapping…)"
> to map the mmio region.. of course, I find out that this mapping could
> take a while from the code comment below 'case XEN_DOMCTL_memory_mapping'.
>
> my questions:
> 1. could we make this mmio region mapping quicker?

Yes, e.g. by using large (2M or 1G) pages. This has been on my todo list
for quite a while...

> 2. if could not, does it limit by hardware performance?

I'm afraid I don't understand the question. If you mean "Is it limited by
hw performance", then no, see above. If you mean "Does it limit hw
performance", then again no, I don't think so (other than the effect of
having more IOMMU translation levels than really necessary for such large
a region).

Jan