Re: [Intel-gfx] [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm
On Mon, 08 May 2017 13:07:10 +0300 Joonas Lahtinenwrote: > On la, 2017-05-06 at 02:58 +, Zhang, Xiong Y wrote: > > > > > > On ke, 2017-05-03 at 09:22 +, Zhang, Xiong Y wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > + David and Jon > > > > > > > > > > > > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote: > > > > > > > > > > > > The blocking issue I see is that bisecting is still not pointing at > > > > > > relevant commits. Both bisected commits from Bugzilla are not > > > > > > related > > > > > > to changes in stolen memory usage behavior. I'd assume a successful > > > > > > bisect to land at the patches where we start creating kernel > > > > > > internal > > > > > > objects from stolen memory. Otherwise we could be ignoring a bug > > > > > > elsewhere. If it consistently lands on those patches, then there > > > > > > might > > > > > > be something wrong with them, in addition to stolen memory > > > > > > problems. > > > > > [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla > > > > > > > > descripted, > > > > > > > > > > > > > > guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has > > > > > gpu > > > hang > > > > > > > > > > > > > > in guest dmesg. From this point, we could do git bisect. > > > > > But tons of IOMMU DMA R/W exception to stolen memory exist in host > > > dmesg > > > > > > > > > > > > > > when guest kernel is 4.8 and 4.9. This means guest domain iommu table > > > > > doesn't > > > > > have mapping for stolen memory and IGD fail in accessing stolen memory > > > > > from guest kernel 4.8 and 4.9. From this point, this issue isn't a > > > > > regression > > > and > > > > > > > > > > > > > > shouldn't go git bisect. You could check this host error message from > > > > > the > > > > > bugzilla > > > > > attachment. And this should be fixed first. > > > > > Anyway, I will try my best to get the ideal commit through git > > > > > bisect, but > > > I'm > > > > > > > > > > > > > > afraid > > > > > the result is the same as past because we don't have a stable good > > > > > point to > > > > > start git > > > > > bisect. > > > > [Zhang, Xiong Y] hi, Joonas: > > > > As you said, the gpu hang exist because i915 create ring buffer from > > > > stolen > > > memory. > > > > > > > > I did git bisect again, and the following commit is the first bad > > > > commit: > > > > commit c58b735fc762e891481e92af7124b85cb0a51fce > > > > > > > Author: Chris Wilson > > > > Date: Thu Aug 18 17:16:57 2016 +0100 > > > > > > > > drm/i915: Allocate rings from stolen > > > > > > > > If we have stolen available, make use of it for ringbuffer > > > > allocation. > > > > Previously this was restricted to !llc platforms, as writing to > > > > stolen > > > > requires a GGTT mapping - but now that we have partial mappable > > > support, > > > > > > > > the mappable aperture isn't quite so precious so we can use it more > > > > freely and ringbuffers are a good user for the otherwise wasted > > > > stolen. > > > > > > > > After reverting this patch from drm-intel-nightly, I didn't see gpu > > > > hang during > > > guest boot process. > > > > > > > > So what's our next step ? > > > > > > An appropriate next step would be to evaluate how much work it is to > > > support the RMRR passthrough David mentioned about in his commit. > > [Zhang, Xiong Y] As Kevin explained, KVM community found the disadvantage > > Of RMRR and have decided to not support RMRR passthrough, so it is really > > hard > > for us to push such solution and isn't related to the workload. > > Except usb and graphic card, all other devices with RMRR couldn't > > passthrough > > to guest. But the driver of usb and graphic card couldn't access RMRR in > > such > > environment. > > https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf > > Does this patch have the right Cc's from KVM team? I'd like to hear > directly from them that even the usage of RMRRs that follow the > intention of VT-d spec are not going to be supported. That document > predates the patches to add the exclusion for graphics. I'm the QEMU and kernel vfio maintainer and co-author of the above whitepaper. Even the VT-d spec suggests that usage of RMRRs should be limited (rev 2.3, 8.4): Platform designers should avoid or limit use of reserved memory regions since these require system software to create holes in the DMA virtual address range available to system software and its drivers. Also, if you read the entire thread which added the graphics exception for RMRR, you'll see that it went in under some degree of protest and ultimately under the conclusion that we should just ignore the RMRR anyway: https://lists.linuxfoundation.org/pipermail/iommu/2015-April/012790.html At least for the case of IGD RMRRs, we
Re: [Intel-gfx] [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm
On la, 2017-05-06 at 02:58 +, Zhang, Xiong Y wrote: > > > > On ke, 2017-05-03 at 09:22 +, Zhang, Xiong Y wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > + David and Jon > > > > > > > > > > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote: > > > > > > > > > > The blocking issue I see is that bisecting is still not pointing at > > > > > relevant commits. Both bisected commits from Bugzilla are not related > > > > > to changes in stolen memory usage behavior. I'd assume a successful > > > > > bisect to land at the patches where we start creating kernel internal > > > > > objects from stolen memory. Otherwise we could be ignoring a bug > > > > > elsewhere. If it consistently lands on those patches, then there might > > > > > be something wrong with them, in addition to stolen memory problems. > > > > [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla > > descripted, > > > > > > > > > > > guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu > > hang > > > > > > > > > > > in guest dmesg. From this point, we could do git bisect. > > > > But tons of IOMMU DMA R/W exception to stolen memory exist in host > > dmesg > > > > > > > > > > > when guest kernel is 4.8 and 4.9. This means guest domain iommu table > > > > doesn't > > > > have mapping for stolen memory and IGD fail in accessing stolen memory > > > > from guest kernel 4.8 and 4.9. From this point, this issue isn't a > > > > regression > > and > > > > > > > > > > > shouldn't go git bisect. You could check this host error message from > > > > the > > > > bugzilla > > > > attachment. And this should be fixed first. > > > > Anyway, I will try my best to get the ideal commit through git bisect, > > > > but > > I'm > > > > > > > > > > > afraid > > > > the result is the same as past because we don't have a stable good > > > > point to > > > > start git > > > > bisect. > > > [Zhang, Xiong Y] hi, Joonas: > > > As you said, the gpu hang exist because i915 create ring buffer from > > > stolen > > memory. > > > > > > I did git bisect again, and the following commit is the first bad commit: > > > commit c58b735fc762e891481e92af7124b85cb0a51fce > > > > > > Author: Chris Wilson> > > Date: Thu Aug 18 17:16:57 2016 +0100 > > > > > > drm/i915: Allocate rings from stolen > > > > > > If we have stolen available, make use of it for ringbuffer allocation. > > > Previously this was restricted to !llc platforms, as writing to stolen > > > requires a GGTT mapping - but now that we have partial mappable > > support, > > > > > > the mappable aperture isn't quite so precious so we can use it more > > > freely and ringbuffers are a good user for the otherwise wasted > > > stolen. > > > > > > After reverting this patch from drm-intel-nightly, I didn't see gpu hang > > > during > > guest boot process. > > > > > > So what's our next step ? > > > > An appropriate next step would be to evaluate how much work it is to > > support the RMRR passthrough David mentioned about in his commit. > [Zhang, Xiong Y] As Kevin explained, KVM community found the disadvantage > Of RMRR and have decided to not support RMRR passthrough, so it is really hard > for us to push such solution and isn't related to the workload. > Except usb and graphic card, all other devices with RMRR couldn't passthrough > to guest. But the driver of usb and graphic card couldn't access RMRR in such > environment. > https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf Does this patch have the right Cc's from KVM team? I'd like to hear directly from them that even the usage of RMRRs that follow the intention of VT-d spec are not going to be supported. That document predates the patches to add the exclusion for graphics. > > I'd also go talk with the IGD team, why they refuse to load the driver > > when stolen memory is correctly reported as zero, and insist on being > > lied to. > [Zhang, Xiong Y] thanks a lot for doing so. I don't have the contacts, so I assume you to pursue that. Regards, Joonas -- Joonas Lahtinen Open Source Technology Center Intel Corporation ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm
> On ke, 2017-05-03 at 09:22 +, Zhang, Xiong Y wrote: > > > > > > > > > > > + David and Jon > > > > > > > > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote: > > > > > > > > The blocking issue I see is that bisecting is still not pointing at > > > > relevant commits. Both bisected commits from Bugzilla are not related > > > > to changes in stolen memory usage behavior. I'd assume a successful > > > > bisect to land at the patches where we start creating kernel internal > > > > objects from stolen memory. Otherwise we could be ignoring a bug > > > > elsewhere. If it consistently lands on those patches, then there might > > > > be something wrong with them, in addition to stolen memory problems. > > > [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla > descripted, > > > guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu > hang > > > in guest dmesg. From this point, we could do git bisect. > > > But tons of IOMMU DMA R/W exception to stolen memory exist in host > dmesg > > > when guest kernel is 4.8 and 4.9. This means guest domain iommu table > > > doesn't > > > have mapping for stolen memory and IGD fail in accessing stolen memory > > > from guest kernel 4.8 and 4.9. From this point, this issue isn't a > > > regression > and > > > shouldn't go git bisect. You could check this host error message from the > > > bugzilla > > > attachment. And this should be fixed first. > > > Anyway, I will try my best to get the ideal commit through git bisect, but > I'm > > > afraid > > > the result is the same as past because we don't have a stable good point > > > to > > > start git > > > bisect. > > [Zhang, Xiong Y] hi, Joonas: > > As you said, the gpu hang exist because i915 create ring buffer from stolen > memory. > > I did git bisect again, and the following commit is the first bad commit: > > commit c58b735fc762e891481e92af7124b85cb0a51fce > > Author: Chris Wilson> > Date: Thu Aug 18 17:16:57 2016 +0100 > > > > drm/i915: Allocate rings from stolen > > > > If we have stolen available, make use of it for ringbuffer allocation. > > Previously this was restricted to !llc platforms, as writing to stolen > > requires a GGTT mapping - but now that we have partial mappable > support, > > the mappable aperture isn't quite so precious so we can use it more > > freely and ringbuffers are a good user for the otherwise wasted stolen. > > > > After reverting this patch from drm-intel-nightly, I didn't see gpu hang > > during > guest boot process. > > So what's our next step ? > > An appropriate next step would be to evaluate how much work it is to > support the RMRR passthrough David mentioned about in his commit. [Zhang, Xiong Y] As Kevin explained, KVM community found the disadvantage Of RMRR and have decided to not support RMRR passthrough, so it is really hard for us to push such solution and isn't related to the workload. Except usb and graphic card, all other devices with RMRR couldn't passthrough to guest. But the driver of usb and graphic card couldn't access RMRR in such environment. https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf > I'd also go talk with the IGD team, why they refuse to load the driver > when stolen memory is correctly reported as zero, and insist on being > lied to. [Zhang, Xiong Y] thanks a lot for doing so. > > While doing that, please update the freedesktop.org bugs. [Zhang, Xiong Y] sure, I will update bugzilla once we have further finding and make a decision. > > Regards, Joonas > -- > Joonas Lahtinen > Open Source Technology Center > Intel Corporation ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm
On ke, 2017-05-03 at 09:22 +, Zhang, Xiong Y wrote: > > > > > > > > + David and Jon > > > > > > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote: > > > > > > The blocking issue I see is that bisecting is still not pointing at > > > relevant commits. Both bisected commits from Bugzilla are not related > > > to changes in stolen memory usage behavior. I'd assume a successful > > > bisect to land at the patches where we start creating kernel internal > > > objects from stolen memory. Otherwise we could be ignoring a bug > > > elsewhere. If it consistently lands on those patches, then there might > > > be something wrong with them, in addition to stolen memory problems. > > [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla > > descripted, > > guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu > > hang > > in guest dmesg. From this point, we could do git bisect. > > But tons of IOMMU DMA R/W exception to stolen memory exist in host dmesg > > when guest kernel is 4.8 and 4.9. This means guest domain iommu table > > doesn't > > have mapping for stolen memory and IGD fail in accessing stolen memory > > from guest kernel 4.8 and 4.9. From this point, this issue isn't a > > regression and > > shouldn't go git bisect. You could check this host error message from the > > bugzilla > > attachment. And this should be fixed first. > > Anyway, I will try my best to get the ideal commit through git bisect, but > > I'm > > afraid > > the result is the same as past because we don't have a stable good point to > > start git > > bisect. > [Zhang, Xiong Y] hi, Joonas: > As you said, the gpu hang exist because i915 create ring buffer from stolen > memory. > I did git bisect again, and the following commit is the first bad commit: > commit c58b735fc762e891481e92af7124b85cb0a51fce > Author: Chris Wilson> Date: Thu Aug 18 17:16:57 2016 +0100 > > drm/i915: Allocate rings from stolen > > If we have stolen available, make use of it for ringbuffer allocation. > Previously this was restricted to !llc platforms, as writing to stolen > requires a GGTT mapping - but now that we have partial mappable support, > the mappable aperture isn't quite so precious so we can use it more > freely and ringbuffers are a good user for the otherwise wasted stolen. > > After reverting this patch from drm-intel-nightly, I didn't see gpu hang > during guest boot process. > So what's our next step ? An appropriate next step would be to evaluate how much work it is to support the RMRR passthrough David mentioned about in his commit. I'd also go talk with the IGD team, why they refuse to load the driver when stolen memory is correctly reported as zero, and insist on being lied to. While doing that, please update the freedesktop.org bugs. Regards, Joonas -- Joonas Lahtinen Open Source Technology Center Intel Corporation ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm
On to, 2017-04-27 at 05:54 +, Zhang, Xiong Y wrote: > > > > Also, was fixing the IGD driver loading with zero stolen memory > > considered instead? All this information should exist in the commit > > message. > [Zhang, Xiong Y] IGD and i915 driver read pci config register 0x50 to get > the size of stolen memory. When guest read this register, qemu could trap > it and return one value to guest. > So in order to " fixing the IGD driver loading with zero stolen memory ", > We have to modify both Qemu and IGD driver: > 1) QEMU: trap read from pci cfg 0x50 register, then return zero to guest > 2) IGD driver: when IGD driver see zero size of stolen memory, don't exit > loading > and continue. > This doesn't give any benefit to i915, i915 will still disable stolen memory > as i915 > see zero size stolen memory . So I prefer to disable stolen memory in i915 > directly > and keep Qemu and IGD driver unchanged. If due to lack of code in QEMU, RMRR is not available. It'd sound to me they should carry the change to report stolen as zero, because they're in best position to track if that changes in future. Also, if the IGD driver requires a special treatment where stolen is reported to exist but is not functional, that logic should be fixed where the flaw is. i915 driver is not the go-to for fixing any quirks and lack of code related to virtualization. The driver performs perfectly logically, if stolen is reported to exist, it is expected to work, if not, it's not touched. Adding special cases to satisfy conditions outside of i915 will make driver maintenance a burden. And, if all other options fail and we end up having to fix the issue in i915, the fix should be robust, which this currently is not (see my previous messages). Regards, Joonas -- Joonas Lahtinen Open Source Technology Center Intel Corporation ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm
> > + David and Jon > > > > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote: > > > > The blocking issue I see is that bisecting is still not pointing at > > relevant commits. Both bisected commits from Bugzilla are not related > > to changes in stolen memory usage behavior. I'd assume a successful > > bisect to land at the patches where we start creating kernel internal > > objects from stolen memory. Otherwise we could be ignoring a bug > > elsewhere. If it consistently lands on those patches, then there might > > be something wrong with them, in addition to stolen memory problems. > [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla > descripted, > guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu hang > in guest dmesg. From this point, we could do git bisect. > But tons of IOMMU DMA R/W exception to stolen memory exist in host dmesg > when guest kernel is 4.8 and 4.9. This means guest domain iommu table > doesn't > have mapping for stolen memory and IGD fail in accessing stolen memory > from guest kernel 4.8 and 4.9. From this point, this issue isn't a regression > and > shouldn't go git bisect. You could check this host error message from the > bugzilla > attachment. And this should be fixed first. > Anyway, I will try my best to get the ideal commit through git bisect, but I'm > afraid > the result is the same as past because we don't have a stable good point to > start git > bisect. [Zhang, Xiong Y] hi, Joonas: As you said, the gpu hang exist because i915 create ring buffer from stolen memory. I did git bisect again, and the following commit is the first bad commit: commit c58b735fc762e891481e92af7124b85cb0a51fce Author: Chris WilsonDate: Thu Aug 18 17:16:57 2016 +0100 drm/i915: Allocate rings from stolen If we have stolen available, make use of it for ringbuffer allocation. Previously this was restricted to !llc platforms, as writing to stolen requires a GGTT mapping - but now that we have partial mappable support, the mappable aperture isn't quite so precious so we can use it more freely and ringbuffers are a good user for the otherwise wasted stolen. After reverting this patch from drm-intel-nightly, I didn't see gpu hang during guest boot process. So what's our next step ? thanks > > > Disabling power saving makes many bugs go away, but we still don't > > disable power saving as a resolution to such bugs, but instead root > > cause and fix the individual bugs. > [Zhang, Xiong Y] I add i915.enable_rc6=0, i915.enable_dc=0, > i915.enable_fbc=0, > I915.enable_psr=0, i915.disable_power_well=0,i915.enable_ips=0 to grub. > But gpu hang exist in guest and DMA R/W error exist in host. > > > > > Stolen memory isn't a standard pci resource and exists in RMRR which has > > > identity mapping in iommu table when host boot up, so IGD could access > > > stolen memory in host OS. While according to 'commit c875d2c1b808 > > > ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API > > domains")',RMRR > > > isn't supported by kvm, then both EPT and guest iommu domain table lack > > > of maaping for stolen memory in kvm IGD passthrough environment. > > > > Commit message text still fails to address that an exclusion was added > > by commit: > > > > commit 18436afdc11a00ac881990b454cfb2eae81d6003 > > Author: David Woodhouse > > Date: Wed Mar 25 15:05:47 2015 + > > > > iommu/vt-d: Allow RMRR on graphics devices too > > > > Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from > > IOMMU API > > domains") prevents certain options for devices with RMRRs. This even > > prevents those devices from getting a 1:1 mapping with 'iommu=pt', > > because we don't have the code to handle *preserving* the RMRR > > regions > > when moving the device between domains. > > > > > > > > The quoted part of David's commit message leads me to believe it's > > simply lack of some code in kernel for juggling the RMRRs when moving a > > device between domains that is missing. Why is not that considered > > instead? With that implemented, we would have more transparent pass- > > through, which should be good. > [Zhang, Xiong Y] c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from > IOMMU API domains). This patch prevent devices associated with RMRRs from > assigning to a guest, the one of reason is it knows RMRR isn't supported in > guest > domain IOMMU table, If these device's driver still access RMRR from guest, > serious error will happen. > 18436afdc ("iommu/vt-d: Allow RMRR on graphics devices too "), add an > exception > to above commit. So IGD could be assigned to a guest. But this doesn't mean > IGD > 1:1 mapping for RMRR will be support in guest domain iommu table > 'iommu=pt' is to set 1:1 mapping for all pci device in host domain iommu > table. > > When one device is assigned to a guest and this guest boot up, this guest > domain > Iommu table will take
Re: [Intel-gfx] [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm
> + David and Jon > > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote: > > The blocking issue I see is that bisecting is still not pointing at > relevant commits. Both bisected commits from Bugzilla are not related > to changes in stolen memory usage behavior. I'd assume a successful > bisect to land at the patches where we start creating kernel internal > objects from stolen memory. Otherwise we could be ignoring a bug > elsewhere. If it consistently lands on those patches, then there might > be something wrong with them, in addition to stolen memory problems. [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla descripted, guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu hang in guest dmesg. From this point, we could do git bisect. But tons of IOMMU DMA R/W exception to stolen memory exist in host dmesg when guest kernel is 4.8 and 4.9. This means guest domain iommu table doesn't have mapping for stolen memory and IGD fail in accessing stolen memory from guest kernel 4.8 and 4.9. From this point, this issue isn't a regression and shouldn't go git bisect. You could check this host error message from the bugzilla attachment. And this should be fixed first. Anyway, I will try my best to get the ideal commit through git bisect, but I'm afraid the result is the same as past because we don't have a stable good point to start git bisect. > Disabling power saving makes many bugs go away, but we still don't > disable power saving as a resolution to such bugs, but instead root > cause and fix the individual bugs. [Zhang, Xiong Y] I add i915.enable_rc6=0, i915.enable_dc=0, i915.enable_fbc=0, I915.enable_psr=0, i915.disable_power_well=0,i915.enable_ips=0 to grub. But gpu hang exist in guest and DMA R/W error exist in host. > > > Stolen memory isn't a standard pci resource and exists in RMRR which has > > identity mapping in iommu table when host boot up, so IGD could access > > stolen memory in host OS. While according to 'commit c875d2c1b808 > > ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API > domains")',RMRR > > isn't supported by kvm, then both EPT and guest iommu domain table lack > > of maaping for stolen memory in kvm IGD passthrough environment. > > Commit message text still fails to address that an exclusion was added > by commit: > > commit 18436afdc11a00ac881990b454cfb2eae81d6003 > Author: David Woodhouse> Date: Wed Mar 25 15:05:47 2015 + > > iommu/vt-d: Allow RMRR on graphics devices too > > Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from > IOMMU API > domains") prevents certain options for devices with RMRRs. This even > prevents those devices from getting a 1:1 mapping with 'iommu=pt', > because we don't have the code to handle *preserving* the RMRR > regions > when moving the device between domains. > > > > The quoted part of David's commit message leads me to believe it's > simply lack of some code in kernel for juggling the RMRRs when moving a > device between domains that is missing. Why is not that considered > instead? With that implemented, we would have more transparent pass- > through, which should be good. [Zhang, Xiong Y] c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains). This patch prevent devices associated with RMRRs from assigning to a guest, the one of reason is it knows RMRR isn't supported in guest domain IOMMU table, If these device's driver still access RMRR from guest, serious error will happen. 18436afdc ("iommu/vt-d: Allow RMRR on graphics devices too "), add an exception to above commit. So IGD could be assigned to a guest. But this doesn't mean IGD 1:1 mapping for RMRR will be support in guest domain iommu table 'iommu=pt' is to set 1:1 mapping for all pci device in host domain iommu table. When one device is assigned to a guest and this guest boot up, this guest domain Iommu table will take place of host domain iommu table on hardware. Our issue is guest domain iommu table doesn't have 1:1 mapping for RMRR. In order to set up 1:1 mapping for RMRR in guest domain iommu table, we have to modify kvm and qemu and kvm community have declined this. > > Also, was fixing the IGD driver loading with zero stolen memory > considered instead? All this information should exist in the commit > message. [Zhang, Xiong Y] IGD and i915 driver read pci config register 0x50 to get the size of stolen memory. When guest read this register, qemu could trap it and return one value to guest. So in order to " fixing the IGD driver loading with zero stolen memory ", We have to modify both Qemu and IGD driver: 1) QEMU: trap read from pci cfg 0x50 register, then return zero to guest 2) IGD driver: when IGD driver see zero size of stolen memory, don't exit loading and continue. This doesn't give any benefit to i915, i915 will still disable stolen memory as i915 see zero size stolen memory . So I prefer to disable stolen memory
Re: [Intel-gfx] [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm
+ David and Jon On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote: The blocking issue I see is that bisecting is still not pointing at relevant commits. Both bisected commits from Bugzilla are not related to changes in stolen memory usage behavior. I'd assume a successful bisect to land at the patches where we start creating kernel internal objects from stolen memory. Otherwise we could be ignoring a bug elsewhere. If it consistently lands on those patches, then there might be something wrong with them, in addition to stolen memory problems. Disabling power saving makes many bugs go away, but we still don't disable power saving as a resolution to such bugs, but instead root cause and fix the individual bugs. > Stolen memory isn't a standard pci resource and exists in RMRR which has > identity mapping in iommu table when host boot up, so IGD could access > stolen memory in host OS. While according to 'commit c875d2c1b808 > ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains")',RMRR > isn't supported by kvm, then both EPT and guest iommu domain table lack > of maaping for stolen memory in kvm IGD passthrough environment. Commit message text still fails to address that an exclusion was added by commit: commit 18436afdc11a00ac881990b454cfb2eae81d6003 Author: David WoodhouseDate: Wed Mar 25 15:05:47 2015 + iommu/vt-d: Allow RMRR on graphics devices too Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains") prevents certain options for devices with RMRRs. This even prevents those devices from getting a 1:1 mapping with 'iommu=pt', because we don't have the code to handle *preserving* the RMRR regions when moving the device between domains. The quoted part of David's commit message leads me to believe it's simply lack of some code in kernel for juggling the RMRRs when moving a device between domains that is missing. Why is not that considered instead? With that implemented, we would have more transparent pass- through, which should be good. Also, was fixing the IGD driver loading with zero stolen memory considered instead? All this information should exist in the commit message. After the bisecting is properly done, there is an agreement that suggested RMRR preservation is absolutely a no-go, other options are not viable, the commit message should be updated to reflect all that. Then we should look in more detail on how to detect the scenarios when we're running in a virtual machine that doesn't set up the 1:1 mapping for RMRRs. Regards, Joonas -- Joonas Lahtinen Open Source Technology Center Intel Corporation ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx