Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-16 Thread Chen, Tiejun

On 2015/7/16 17:40, George Dunlap wrote:

On Thu, Jul 16, 2015 at 3:05 AM, Chen, Tiejun tiejun.c...@intel.com wrote:

Could you take a look at the original patch #06? Although Jan thought it is
complicated, that is really the one version I can refine in the current time
slot.


When you say original, which version are you talking about?  You
mean the one at the base of this thread (v7)?



Yes, I'm pointing at patch #6 in v7. And sorry for the confusion.

Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-16 Thread George Dunlap
On Thu, Jul 16, 2015 at 3:05 AM, Chen, Tiejun tiejun.c...@intel.com wrote:
 Could you take a look at the original patch #06? Although Jan thought it is
 complicated, that is really the one version I can refine in the current time
 slot.

When you say original, which version are you talking about?  You
mean the one at the base of this thread (v7)?

 -George



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/16 0:14, George Dunlap wrote:

On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap
george.dun...@eu.citrix.com wrote:

Would it be possible, on a collision, to have one last stab at
allocating the BAR somewhere else, without relocating memory (or
relocating any other BARs)?  At very least then an administrator could
work around this kind of thing by setting the mmio_hole larger in the
domain config.


If it's not possible to have this last-ditch relocation effort, then


Could you take a look at the original patch #06? Although Jan thought it is
complicated, that is really the one version I can refine in the current time
slot.



yes, I'd be OK with just disabling the device for the time being.



Just let me send out a new patch series based on this idea. We can continue
discussing it over there, but we also need to further review the other
remaining comments against the new revision.


Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/15 16:34, Jan Beulich wrote:

On 15.07.15 at 06:27, tiejun.c...@intel.com wrote:

Furthermore, could we have this solution as follows?


Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject


What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }
+
     if ( mmio_total > (pci_mem_end - pci_mem_start) )
     {
         printf("Low MMIO hole not large enough for all devices,"

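(For reference, check_overlap() above is the range-intersection helper used
elsewhere in this series; a minimal equivalent of the test it performs,
together with the E820-style entry layout the loop walks, might look like the
sketch below -- illustrative only, not the exact hvmloader code.)

#include <stdint.h>

struct e820entry_sketch {            /* shape of memory_map.map[i] as used above */
    uint64_t addr;
    uint64_t size;
    uint32_t type;                   /* e.g. E820_RAM, E820_RESERVED */
};

/* True when [start, start+size) intersects [rstart, rstart+rsize);
 * assumes non-zero sizes and no address wrap-around. */
static int check_overlap_sketch(uint64_t start, uint64_t size,
                                uint64_t rstart, uint64_t rsize)
{
    return (start < rstart + rsize) && (rstart < start + size);
}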
This is very similar to our current policy for
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6,
since this is also another rare possibility in the real world. I can even do
the same when we handle the conflict with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.


Note it's not necessary to be concerned about high memory, since we already
handle that case in the hv code, and it's also not affected by memory
relocated later, since our previous policy makes sure RAM isn't overlapping
with RDM.


Thanks
Tiejun


to co-maintainers overriding me).

Jan







Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
 On 15.07.15 at 06:27, tiejun.c...@intel.com wrote:
 Furthermore, could we have this solution as follows?

Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject
to co-maintainers overriding me).

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

This is very similar to our current policy for
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6,
since this is also another rare possibility in the real world. I can even do
the same when we handle the conflict with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.


Sorry, here is one typo, s/#6/#5

Thanks
Tiejun



Note it's not necessary to be concerned about high memory, since we already
handle that case in the hv code, and it's also not affected by memory
relocated later, since our previous policy makes sure RAM isn't overlapping
with RDM.

Thanks
Tiejun


to co-maintainers overriding me).

Jan







Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

Certainly appreciate your time.

I didn't mean it's wasting time at this point. I just want to express
that it's hard to implement that solution in one or two weeks to walk
into 4.6 as an exception.

Note I know this feature is still not accepted as an exception for 4.6
right now, so I'm making an assumption.


After all this is a bug fix (and would have been allowed into 4.5 had
it been ready in time), so doesn't necessarily need a freeze
exception (but of course the bar raises the later it gets). Rather


Yes, this is not a bug fix again into 4.6.


than rushing in something that's cumbersome to maintain, I'd much
prefer this to be done properly.



Indeed, we'd like to finalize this properly, as you said. But apparently
time is not sufficient to allow that to happen. So I just suggest we
further seek the best solution in the next phase.


Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
 On 15.07.15 at 02:55, tiejun.c...@intel.com wrote:
  I agree we'd better overhaul this since we already found something
 unreasonable here. But one or two weeks is really not enough to fix this
 with a bitmap framework, and although a bitmap can make mmio allocation
 better, but more complicated if we just want to allocate PCI mmio.

 So could we do this next? I just feel if you'd like to pay more time
 help me refine our current solution, its relatively realistic to this
 case :) And then we can go into bitmap in details or work out a better
 solution in sufficient time slot.

 Looking at how long it took to get here (wasn't this series originally
 even meant to go into 4.5?) and how much time I already spent
 
 Certainly appreciate your time.
 
 I didn't mean it's wasting time at this point. I just want to express
 that it's hard to implement that solution in one or two weeks to walk
 into 4.6 as an exception.

 Note I know this feature is still not accepted as an exception for 4.6
 right now, so I'm making an assumption.

After all this is a bug fix (and would have been allowed into 4.5 had
it been ready in time), so doesn't necessarily need a freeze
exception (but of course the bar raises the later it gets). Rather
than rushing in something that's cumbersome to maintain, I'd much
prefer this to be done properly.

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
 On 15.07.15 at 10:59, tiejun.c...@intel.com wrote:
 On 2015/7/15 16:34, Jan Beulich wrote:
 On 15.07.15 at 06:27, tiejun.c...@intel.com wrote:
 Furthermore, could we have this solution as follows?

 Yet more special casing code you want to add. I said no to this
 model, and unless you can address the issue _without_ adding
 a lot of special casing code, the answer will remain no (subject
 
 What about this?
 
@@ -301,6 +301,19 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }

So what would the cure be if someone ran into this BUG() (other
than removing the device associated with the conflicting RMRR)?
Afaics such a guest would remain permanently unbootable, which
of course is not an option.

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/15 19:05, George Dunlap wrote:

On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich jbeul...@suse.com wrote:

On 15.07.15 at 10:59, tiejun.c...@intel.com wrote:

On 2015/7/15 16:34, Jan Beulich wrote:

On 15.07.15 at 06:27, tiejun.c...@intel.com wrote:

Furthermore, could we have this solution as follows?


Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject


What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }


So what would the cure be if someone ran into this BUG() (other
than removing the device associated with the conflicting RMRR)?
Afaics such a guest would remain permanently unbootable, which
of course is not an option.


Is not booting worse than what we have now -- which is, booting
successfully but (probably) having issues due to MMIO ranges
overlapping RMRRs?


It's really a rare possibility here, since in the real world we didn't
see any RMRR regions >= 0xF0000000 (the default pci memory start). And I
already sent out a slightly better revision in the ensuing email, so
please also take a review if possible :)




This patch series as a whole represents a lot of work and a lot of
tangible improvements to the situation; and (unless the situation has
changed) it's almost entirely acked apart from the MMIO placement
part.  If there is a simple way that we can change hvmloader so that
most (or even many) VM/device combinations work properly for the 4.6
release, then I think it's worth considering.



Current MMIO allocation mechanism is not good. So we really need to 
reshape that, but we'd better do this with some further discussion in 
next release :)


Thanks
Tiejun





Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

Maybe I can move this chunk of code down after the actual allocation,
to check whether RDM conflicts with the final allocation, and then just
disable the associated devices by writing PCI_COMMAND, without BUG(),
like this draft code,


And what would keep the guest from re-enabling the device?



We can't, but IMO:

#1. We're already printing that warning message.

#2. Actually this is the sort of case where, as you know, even on a
native platform some broken devices are disabled by the BIOS, right? So
I think it is the OS's responsibility (or risk) to force-enable such a
broken device.

#3. It's a really rare possibility in the real world.

Thanks
Tiejun
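(For reference, "disabling" the device in this exchange just means clearing
its command-register decode bits, which live in guest-writable config space --
which is exactly why nothing stops the guest from re-enabling it. A sketch,
assuming hvmloader-style pci_readw()/pci_writew() accessors and the standard
PCI command bits:)

static void disable_pci_decode_sketch(uint32_t devfn)
{
    /* Clear I/O and memory decoding; the guest OS can simply set the
     * bits again with its own config-space write. */
    uint16_t cmd = pci_readw(devfn, PCI_COMMAND);
    pci_writew(devfn, PCI_COMMAND, cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));
}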



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/15 19:27, Jan Beulich wrote:

On 15.07.15 at 13:05, george.dun...@eu.citrix.com wrote:

This patch series as a whole represents a lot of work and a lot of
tangible improvements to the situation; and (unless the situation has
changed) it's almost entirely acked apart from the MMIO placement
part.  If there is a simple way that we can change hvmloader so that
most (or even many) VM/device combinations work properly for the 4.6
release, then I think it's worth considering.


And I think the simplest way is to replace the allocation mechanism


This is the best way, not the simplest way.

Making the bitmap approach match all our requirements is not easy or
realistic to design/write/test/review in a short time. And the entire
replacement would also bring more potential risk to the 4.6 release.


Thanks
Tiejun


as outlined rather than trying to make the current model cope with
the new requirement.

Jan







Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich jbeul...@suse.com wrote:
 On 15.07.15 at 10:59, tiejun.c...@intel.com wrote:
 On 2015/7/15 16:34, Jan Beulich wrote:
 On 15.07.15 at 06:27, tiejun.c...@intel.com wrote:
 Furthermore, could we have this solution as follows?

 Yet more special casing code you want to add. I said no to this
 model, and unless you can address the issue _without_ adding
 a lot of special casing code, the answer will remain no (subject

 What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }

 So what would the cure be if someone ran into this BUG() (other
 than removing the device associated with the conflicting RMRR)?
 Afaics such a guest would remain permanently unbootable, which
 of course is not an option.

Is not booting worse than what we have now -- which is, booting
successfully but (probably) having issues due to MMIO ranges
overlapping RMRRs?

This patch series as a whole represents a lot of work and a lot of
tangible improvements to the situation; and (unless the situation has
changed) it's almost entirely acked apart from the MMIO placement
part.  If there is a simple way that we can change hvmloader so that
most (or even many) VM/device combinations work properly for the 4.6
release, then I think it's worth considering.

 -George



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
 On 15.07.15 at 12:34, tiejun.c...@intel.com wrote:
 Yet more special casing code you want to add. I said no to this
 model, and unless you can address the issue _without_ adding
 a lot of special casing code, the answer will remain no (subject

 What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }

 So what would the cure be if someone ran into this BUG() (other
 than removing the device associated with the conflicting RMRR)?
 
 Maybe I can move this chunk of code down after the actual allocation,
 to check whether RDM conflicts with the final allocation, and then just
 disable the associated devices by writing PCI_COMMAND, without BUG(),
 like this draft code,

And what would keep the guest from re-enabling the device?

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
 On 15.07.15 at 13:05, george.dun...@eu.citrix.com wrote:
 This patch series as a whole represents a lot of work and a lot of
 tangible improvements to the situation; and (unless the situation has
 changed) it's almost entirely acked apart from the MMIO placement
 part.  If there is a simple way that we can change hvmloader so that
 most (or even many) VM/device combinations work properly for the 4.6
 release, then I think it's worth considering.

And I think the simplest way is to replace the allocation mechanism
as outlined rather than trying to make the current model cope with
the new requirement.

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On 07/15/2015 12:20 PM, Chen, Tiejun wrote:
 On 2015/7/15 19:05, George Dunlap wrote:
 On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich jbeul...@suse.com wrote:
 On 15.07.15 at 10:59, tiejun.c...@intel.com wrote:
 On 2015/7/15 16:34, Jan Beulich wrote:
 On 15.07.15 at 06:27, tiejun.c...@intel.com wrote:
 Furthermore, could we have this solution as follows?

 Yet more special casing code you want to add. I said no to this
 model, and unless you can address the issue _without_ adding
 a lot of special casing code, the answer will remain no (subject

 What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }

 So what would the cure be if someone ran into this BUG() (other
 than removing the device associated with the conflicting RMRR)?
 Afaics such a guest would remain permanently unbootable, which
 of course is not an option.

 Is not booting worse than what we have now -- which is, booting
 successfully but (probably) having issues due to MMIO ranges
 overlapping RMRRs?
 
It's really a rare possibility here, since in the real world we didn't
see any RMRR regions >= 0xF0000000 (the default pci memory start). And I
already sent out a slightly better revision in the ensuing email, so
please also take a review if possible :)

Do remember the context we're talking about. :-)  Jan said, *if* there
was a device that had an RMRR conflict with the default MMIO
placement, then the guest simply wouldn't boot.  I was saying, in that
case, we move from "silently ignore the conflict, possibly making the
device not work" to "guest refuses to boot".  Which, if it was
guaranteed that a conflict would cause the device to no longer work,
would be an improvement.

 This patch series as a whole represents a lot of work and a lot of
 tangible improvements to the situation; and (unless the situation has
 changed) it's almost entirely acked apart from the MMIO placement
 part.  If there is a simple way that we can change hvmloader so that
 most (or even many) VM/device combinations work properly for the 4.6
 release, then I think it's worth considering.

 
 Current MMIO allocation mechanism is not good. So we really need to
 reshape that, but we'd better do this with some further discussion in
 next release :)

Absolutely; I was saying, if we can put in a good enough measure for
this release, then we can get the rest of the patch series in with our
good enough hvmloader fix, and then work on fixing it properly next
release.

But if you're not aiming for this release anymore, then our aims are
something different.  (See my other e-mail.)

 -George





Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject


What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }


So what would the cure be if someone ran into this BUG() (other
than removing the device associated with the conflicting RMRR)?


Maybe I can move this chunk of code down after the actual allocation,
to check whether RDM conflicts with the final allocation, and then just
disable the associated devices by writing PCI_COMMAND, without BUG(),
like this draft code,


/* If pci bars conflict with RDM we need to disable this pci device. */
for ( devfn = 0; devfn < 256; devfn++ )
{
    bar_sz = pci_readl(devfn, bar_reg);
    bar_data = pci_readl(devfn, bar_reg);
    bar_data_upper = pci_readl(devfn, bar_reg + 4);
    /* Until here we don't conflict high memory. */
    if ( bar_data_upper )
        continue;

    for ( i = 0; i < memory_map.nr_map ; i++ )
    {
        uint64_t reserved_start, reserved_size;
        reserved_start = memory_map.map[i].addr;
        reserved_size = memory_map.map[i].size;
        if ( check_overlap(bar_data & ~(bar_sz - 1), bar_sz,
                           reserved_start, reserved_size) )
        {
            printf("Reserved device memory conflicts with this pci bar,"
                   " so just disable this device.\n");
            /* Now disable this device */
            cmd = pci_readw(devfn, PCI_COMMAND);
            pci_writew(devfn, PCI_COMMAND, ~cmd);
        }
    }
}

If this is still not fine to you, look, I have to raise a request to the
co-maintainers, since it's hard to take the next step in practice.


Hi all, what do you think?

Thanks
Tiejun
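(A note for readers on where bar_sz normally comes from: hvmloader's earlier
sizing pass uses the usual write-all-ones-and-read-back idiom. A sketch of
that idiom for a 32-bit memory BAR, again with hvmloader-style accessors;
illustrative, not a quote of the real loop:)

static uint32_t bar_size_sketch(uint32_t devfn, uint32_t bar_reg)
{
    uint32_t orig = pci_readl(devfn, bar_reg);

    pci_writel(devfn, bar_reg, ~0u);              /* write all ones           */
    uint32_t mask = pci_readl(devfn, bar_reg);    /* read back writable bits  */
    pci_writel(devfn, bar_reg, orig);             /* restore original value   */

    return -(mask & ~0xfu);       /* low 4 bits of a memory BAR are flag bits */
}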



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
 On 15.07.15 at 13:05, george.dun...@eu.citrix.com wrote:
 On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich jbeul...@suse.com wrote:
 On 15.07.15 at 10:59, tiejun.c...@intel.com wrote:
 What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }

 So what would the cure be if someone ran into this BUG() (other
 than removing the device associated with the conflicting RMRR)?
 Afaics such a guest would remain permanently unbootable, which
 of course is not an option.
 
 Is not booting worse than what we have now -- which is, booting
 successfully but (probably) having issues due to MMIO ranges
 overlapping RMRRs?

Again a matter of perspective: For devices (USB!) where the RMRR
exists solely for boot time (or outdated OS) use, this would be a
plain regression. For the graphics device Tiejun needs this for, it of
course would make little difference, I agree.

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On 07/15/2015 12:24 PM, Jan Beulich wrote:
 On 15.07.15 at 13:05, george.dun...@eu.citrix.com wrote:
 On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich jbeul...@suse.com wrote:
 On 15.07.15 at 10:59, tiejun.c...@intel.com wrote:
 What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }

+    for ( i = 0; i < memory_map.nr_map ; i++ )
+    {
+        uint64_t reserved_start, reserved_size;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory.\n");
+            BUG();
+        }
+    }

 So what would the cure be if someone ran into this BUG() (other
 than removing the device associated with the conflicting RMRR)?
 Afaics such a guest would remain permanently unbootable, which
 of course is not an option.

 Is not booting worse than what we have now -- which is, booting
 successfully but (probably) having issues due to MMIO ranges
 overlapping RMRRs?
 
 Again a matter of perspective: For devices (USB!) where the RMRR
 exists solely for boot time (or outdated OS) use, this would be a
 plain regression. For the graphics device Tiejun needs this for, it of
 course would make little difference, I agree.

So it's the case that, for some devices, not only is it functionally OK
to have RAM in the RMRR with no issues, it's also functionally OK to
have PCI BARs in the RMRR with no issues?

If so, then yes, I agree having a device fail to work with no ability to
work around it is an unacceptable regression.

If we're not targeting 4.6 for this, then the situation changes
somewhat.  One thing worth considering (which perhaps may be what tiejun
is looking at) is the cost of keeping a large number of
working-and-already-acked patches out of tree -- both the psychological
cost, the cost of rebasing, and the cost of re-reviewing rebases.  Given
how independent the hvmloader changes are to the rest of the series,
it's probably worth trying to see if we can check in the other patches
as soon as we branch.

If we can check in those patches with hvmloader still ignoring RMRRs,
then there's no problem.  But if we need the issue addressed *somehow*
in hvmloader before checking the rest of the series in, then I think it
makes sense to consider a minimal change that would make the series
somewhat functional, before moving on to overhauling the hvmloader MMIO
placement code.

 -George





Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Jan Beulich
 On 15.07.15 at 15:40, dunl...@umich.edu wrote:
 On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich jbeul...@suse.com wrote:
 Therefore I'll not make any further comments on the rest of the
 patch, but instead outline an allocation model that I think would
 fit our needs: Subject to the constraints mentioned above, set up
 a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
 bits], i.e. reasonably small a memory block). Each bit represents a
 page usable for MMIO: First of all you remove the range from
 PCI_MEM_END upwards. Then remove all RDM pages. Now do a
 first pass over all devices, allocating (in the bitmap) space for only
 the 32-bit MMIO BARs, starting with the biggest one(s), by finding
 a best fit (i.e. preferably a range not usable by any bigger BAR)
 from top down. For example, if you have available

 [f0000000,f8000000)
 [f9000000,f9001000)
 [fa000000,fa003000)
 [fa010000,fa012000)

 and you're looking for a single page slot, you should end up
 picking fa002000.

 After this pass you should be able to do RAM relocation in a
 single attempt just like we do today (you may still grow the MMIO
 window if you know you need to and can fit some of the 64-bit
 BARs in there, subject to said constraints; this is in an attempt
 to help OSes not comfortable with 64-bit resources).

 In a 2nd pass you'd then assign 64-bit resources: If you can fit
 them below 4G (you still have the bitmap left of what you've got
 available), put them there. Allocation strategy could be the same
 as above (biggest first), perhaps allowing for some factoring out
 of logic, but here smallest first probably could work equally well.
 The main thought to decide between the two is whether it is
 better to fit as many (small) or as big (in total) as possible a set
 under 4G. I'd generally expect the former (as many as possible,
 leaving only a few huge ones to go above 4G) to be the better
 approach, but that's more a gut feeling than based on hard data.
 
 I agree that it would be more sensible for hvmloader to make a plan
 first, and then do the memory reallocation (if it's possible) at one
 time, then go through and actually update the device BARs according to
 the plan.
 
 However, I don't really see how having a bitmap really helps in this
 case.  I would think having a list of free ranges (perhaps aligned by
 powers of two?), sorted small-large, makes the most sense.

I view bitmap vs list as just two different representations, and I
picked the bitmap approach as being more compact storage wise
in case there are many regions to deal with. I'd be fine with a list
approach too, provided lookup times don't become prohibitive.

Jan
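(A minimal sketch of the page-granularity bitmap representation being
discussed -- every bit marks a page usable for MMIO, RDM pages get punched
out, and allocation scans for a free run. The fixed 2GiB window, the top-down
first-fit scan and the names are illustrative assumptions; a real allocator
would also honour BAR alignment and prefer best-fit holes.)

#include <stdint.h>
#include <string.h>

#define SK_PAGES  (1u << 19)           /* 2GiB window, 4KiB pages -> 64KiB map */
static uint8_t sk_map[SK_PAGES / 8];

static int  sk_is_free(uint32_t pg) { return (sk_map[pg / 8] >> (pg % 8)) & 1; }
static void sk_clear(uint32_t pg)   { sk_map[pg / 8] &= ~(1u << (pg % 8)); }

static void sk_init(void)              /* initially every page is usable */
{
    memset(sk_map, 0xff, sizeof(sk_map));
}

static void sk_remove(uint32_t pg, uint32_t n)   /* e.g. punch out an RDM */
{
    while ( n-- && pg < SK_PAGES )
        sk_clear(pg++);
}

/* Scan top-down for a run of n free pages; claim it and return its first
 * page index, or -1 if nothing fits. */
static int32_t sk_alloc(uint32_t n)
{
    uint32_t run = 0;

    for ( int32_t pg = SK_PAGES - 1; pg >= 0; pg-- )
    {
        run = sk_is_free(pg) ? run + 1 : 0;
        if ( run == n )
        {
            sk_remove(pg, n);
            return pg;
        }
    }
    return -1;
}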




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap
george.dun...@eu.citrix.com wrote:
 Would it be possible, on a collision, to have one last stab at
 allocating the BAR somewhere else, without relocating memory (or
 relocating any other BARs)?  At very least then an administrator could
 work around this kind of thing by setting the mmio_hole larger in the
 domain config.

If it's not possible to have this last-ditch relocation effort, then
yes, I'd be OK with just disabling the device for the time being.

 -George



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Wei Liu
On Wed, Jul 15, 2015 at 09:32:34AM +0100, Jan Beulich wrote:
  On 15.07.15 at 02:55, tiejun.c...@intel.com wrote:
   I agree we'd better overhaul this since we already found something
  unreasonable here. But one or two weeks is really not enough to fix this
  with a bitmap framework, and although a bitmap can make mmio allocation
  better, but more complicated if we just want to allocate PCI mmio.
 
  So could we do this next? I just feel if you'd like to pay more time
  help me refine our current solution, its relatively realistic to this
  case :) And then we can go into bitmap in details or work out a better
  solution in sufficient time slot.
 
  Looking at how long it took to get here (wasn't this series originally
  even meant to go into 4.5?) and how much time I already spent
  
  Certainly appreciate your time.
  
  I didn't mean it's wasting time at this point. I just want to express
  that it's hard to implement that solution in one or two weeks to walk
  into 4.6 as an exception.

  Note I know this feature is still not accepted as an exception for 4.6
  right now, so I'm making an assumption.
 
 After all this is a bug fix (and would have been allowed into 4.5 had
 it been ready in time), so doesn't necessarily need a freeze
 exception (but of course the bar raises the later it gets). Rather
 than rushing in something that's cumbersome to maintain, I'd much
 prefer this to be done properly.
 

This series is twofold. I consider the tools side change RDM (not
limited to RMRR) a new feature.  It introduces a new feature to fix a
bug. It would still be subject to freeze exception from my point of
view.

Wei.

 Jan



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

Is not booting worse than what we have now -- which is, booting
successfully but (probably) having issues due to MMIO ranges
overlapping RMRRs?


It's really a rare possibility here, since in the real world we didn't
see any RMRR regions >= 0xF0000000 (the default pci memory start). And I
already sent out a slightly better revision in the ensuing email, so
please also take a review if possible :)


Do remember the context we're talking about. :-)  Jan said, *if* there
was a device that had an RMRR conflict with the default MMIO
placement, then the guest simply wouldn't boot.  I was saying, in that


I understand what you guys mean. Yes, BUG is not good in our case :)


case, we move from "silently ignore the conflict, possibly making the
device not work" to "guest refuses to boot".  Which, if it was
guaranteed that a conflict would cause the device to no longer work,
would be an improvement.


This is really what I did, with a slightly better revision, as I
mentioned above. Maybe you missed it, so I'd like to paste it below
(but if you already saw it, please ignore this).


/* If pci bars conflict with RDM we need to disable this pci device. */
for ( devfn = 0; devfn < 256; devfn++ )
{
    bar_sz = pci_readl(devfn, bar_reg);
    bar_data = pci_readl(devfn, bar_reg);
    bar_data_upper = pci_readl(devfn, bar_reg + 4);
    /* Until here we don't conflict high memory. */
    if ( bar_data_upper )
        continue;

    for ( i = 0; i < memory_map.nr_map ; i++ )
    {
        uint64_t reserved_start, reserved_size;
        reserved_start = memory_map.map[i].addr;
        reserved_size = memory_map.map[i].size;
        if ( check_overlap(bar_data & ~(bar_sz - 1), bar_sz,
                           reserved_start, reserved_size) )
        {
            printf("Reserved device memory conflicts with this pci bar,"
                   " so just disable this device.\n");
            /* Now disable this device */
            cmd = pci_readw(devfn, PCI_COMMAND);
            pci_writew(devfn, PCI_COMMAND, ~cmd);
        }
    }
}

Here I don't break that original allocation mechanism. Instead, I just 
check if we really have that conflict case and then disable that 
associated device.





This patch series as a whole represents a lot of work and a lot of
tangible improvements to the situation; and (unless the situation has
changed) it's almost entirely acked apart from the MMIO placement
part.  If there is a simple way that we can change hvmloader so that
most (or even many) VM/device combinations work properly for the 4.6
release, then I think it's worth considering.



Current MMIO allocation mechanism is not good. So we really need to
reshape that, but we'd better do this with some further discussion in
next release :)


Absolutely; I was saying, if we can put in a good enough measure for
this release, then we can get the rest of the patch series in with our
good enough hvmloader fix, and then work on fixing it properly next
release.

But if you're not aiming for this release anymore, then our aims are
something different.  (See my other e-mail.)



I really still try to strive for 4.6 release if possible :)

Thanks
Tiejun




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich jbeul...@suse.com wrote:
 Therefore I'll not make any further comments on the rest of the
 patch, but instead outline an allocation model that I think would
 fit our needs: Subject to the constraints mentioned above, set up
 a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
 bits], i.e. reasonably small a memory block). Each bit represents a
 page usable for MMIO: First of all you remove the range from
 PCI_MEM_END upwards. Then remove all RDM pages. Now do a
 first pass over all devices, allocating (in the bitmap) space for only
 the 32-bit MMIO BARs, starting with the biggest one(s), by finding
 a best fit (i.e. preferably a range not usable by any bigger BAR)
 from top down. For example, if you have available

 [f0000000,f8000000)
 [f9000000,f9001000)
 [fa000000,fa003000)
 [fa010000,fa012000)

 and you're looking for a single page slot, you should end up
 picking fa002000.

 After this pass you should be able to do RAM relocation in a
 single attempt just like we do today (you may still grow the MMIO
 window if you know you need to and can fit some of the 64-bit
 BARs in there, subject to said constraints; this is in an attempt
 to help OSes not comfortable with 64-bit resources).

 In a 2nd pass you'd then assign 64-bit resources: If you can fit
 them below 4G (you still have the bitmap left of what you've got
 available), put them there. Allocation strategy could be the same
 as above (biggest first), perhaps allowing for some factoring out
 of logic, but here smallest first probably could work equally well.
 The main thought to decide between the two is whether it is
 better to fit as many (small) or as big (in total) as possible a set
 under 4G. I'd generally expect the former (as many as possible,
 leaving only a few huge ones to go above 4G) to be the better
 approach, but that's more a gut feeling than based on hard data.

I agree that it would be more sensible for hvmloader to make a plan
first, and then do the memory reallocation (if it's possible) at one
time, then go through and actually update the device BARs according to
the plan.

However, I don't really see how having a bitmap really helps in this
case.  I would think having a list of free ranges (perhaps aligned by
powers of two?), sorted small-large, makes the most sense.

So suppose we had the above example, but with the range
[fa000000,fa005000) instead, and we're looking for a 4-page region.
Then our free list initially would look like this:

[f9000000,f9001000)
[fa010000,fa012000)
[fa000000,fa005000)
[f0000000,f8000000)

After skipping the first two because they aren't big enough, we'd take
0x4000 from the third one, placing the BAR at [fa000000,fa004000), and
putting the remainder [fa004000,fa005000) back on the free list in
order, thus:

[f9000000,f9001000)
[fa004000,fa005000)
[fa010000,fa012000)
[f0000000,f8000000)

If we got to the end and hadn't found a region large enough, *and* we
could still expand the MMIO hole, we could lower pci_mem_start until
it could fit.

What do you think?

 -George
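(A small, self-contained sketch of the free-range-list scheme worked through
above: keep the list sorted smallest-first, take from the first range that is
big enough, and re-insert the remainder. BAR alignment and the
grow-the-MMIO-hole fallback are left out, and the names are made up -- this is
not hvmloader code. Running it on the example ranges reproduces the
before/after lists shown above.)

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

struct sk_range { uint64_t start, len; };

/* Take 'want' bytes from a free list kept sorted by ascending length: use the
 * first (i.e. smallest) range that is big enough, return its old start, and
 * re-insert the remainder so the list stays sorted. Returns 0 on failure. */
static uint64_t sk_take(struct sk_range *list, size_t *nr, uint64_t want)
{
    for ( size_t i = 0; i < *nr; i++ )
    {
        if ( list[i].len < want )
            continue;                          /* too small, try the next one */

        struct sk_range rem = { list[i].start + want, list[i].len - want };
        uint64_t base = list[i].start;

        /* Remove entry i, then re-insert the (possibly empty) remainder. */
        for ( size_t j = i; j + 1 < *nr; j++ )
            list[j] = list[j + 1];
        (*nr)--;
        if ( rem.len )
        {
            size_t j = *nr;
            while ( j > 0 && list[j - 1].len > rem.len )
            {
                list[j] = list[j - 1];
                j--;
            }
            list[j] = rem;
            (*nr)++;
        }
        return base;
    }
    return 0;
}

int main(void)
{
    /* The example above: looking for a 4-page (0x4000) region. */
    struct sk_range list[8] = {
        { 0xf9000000, 0x1000 }, { 0xfa010000, 0x2000 },
        { 0xfa000000, 0x5000 }, { 0xf0000000, 0x8000000 },
    };
    size_t nr = 4;

    /* Prints 0xfa000000, leaving [fa004000,fa005000) on the list. */
    printf("allocated at %#llx\n",
           (unsigned long long)sk_take(list, &nr, 0x4000));
    return 0;
}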



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread George Dunlap
On Wed, Jul 15, 2015 at 3:00 PM, Jan Beulich jbeul...@suse.com wrote:
 On 15.07.15 at 15:40, dunl...@umich.edu wrote:
 On Mon, Jul 13, 2015 at 2:12 PM, Jan Beulich jbeul...@suse.com wrote:
 Therefore I'll not make any further comments on the rest of the
 patch, but instead outline an allocation model that I think would
 fit our needs: Subject to the constraints mentioned above, set up
 a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
 bits], i.e. reasonably small a memory block). Each bit represents a
 page usable for MMIO: First of all you remove the range from
 PCI_MEM_END upwards. Then remove all RDM pages. Now do a
 first pass over all devices, allocating (in the bitmap) space for only
 the 32-bit MMIO BARs, starting with the biggest one(s), by finding
 a best fit (i.e. preferably a range not usable by any bigger BAR)
 from top down. For example, if you have available

 [f0000000,f8000000)
 [f9000000,f9001000)
 [fa000000,fa003000)
 [fa010000,fa012000)

 and you're looking for a single page slot, you should end up
 picking fa002000.

 After this pass you should be able to do RAM relocation in a
 single attempt just like we do today (you may still grow the MMIO
 window if you know you need to and can fit some of the 64-bit
 BARs in there, subject to said constraints; this is in an attempt
 to help OSes not comfortable with 64-bit resources).

 In a 2nd pass you'd then assign 64-bit resources: If you can fit
 them below 4G (you still have the bitmap left of what you've got
 available), put them there. Allocation strategy could be the same
 as above (biggest first), perhaps allowing for some factoring out
 of logic, but here smallest first probably could work equally well.
 The main thought to decide between the two is whether it is
 better to fit as many (small) or as big (in total) as possible a set
 under 4G. I'd generally expect the former (as many as possible,
 leaving only a few huge ones to go above 4G) to be the better
 approach, but that's more a gut feeling than based on hard data.

 I agree that it would be more sensible for hvmloader to make a plan
 first, and then do the memory reallocation (if it's possible) at one
 time, then go through and actually update the device BARs according to
 the plan.

 However, I don't really see how having a bitmap really helps in this
 case.  I would think having a list of free ranges (perhaps aligned by
 powers of two?), sorted small-large, makes the most sense.

 I view bitmap vs list as just two different representations, and I
 picked the bitmap approach as being more compact storage wise
 in case there are many regions to deal with. I'd be fine with a list
 approach too, provided lookup times don't become prohibitive.

Sure, you can obviously translate one into the other.  The main reason
I dislike the idea of a bitmap is having to write code to determine
where the next free region is, and how big that region is, rather than
just going down the next on the list and reading range.start and
range.len.

Also, in your suggestion each bit is a page (4k); so assuming a 64-bit
pointer, a 64-bit starting point, and a 64-bit length (just to make
things simple), a single range takes up enough bitmap to reserve
(64+64+64)*4k = 768k.  So if we make the bitmap big enough for 2GiB,
then the break-even point for storage is 2,730 ranges.  It's even
higher if we have an array instead of a linked list.

I'm pretty sure that having such a large number of ranges will be
vanishingly rare;  I'd expect the number of ranges so the range
representation will not only be easier to code and read, but will in
the common case (I believe) be far more compact.

 -George
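(Spelling out the arithmetic above under the same assumptions -- one bit per
4KiB page for the bitmap, three 64-bit words per list entry:)

#include <assert.h>
#include <stdint.h>

int main(void)
{
    const uint64_t bitmap_bytes = (2ULL << 30) / 4096 / 8;  /* 2GiB hole -> 64KiB   */
    const uint64_t entry_bytes  = (64 + 64 + 64) / 8;       /* ptr + start + length */

    assert(bitmap_bytes == 64 * 1024);
    assert(entry_bytes * 8 * 4096 == 768 * 1024);  /* one entry ~ 768KiB of bitmap */
    assert(bitmap_bytes / entry_bytes == 2730);    /* break-even point in ranges   */
    return 0;
}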



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Jan Beulich
 On 14.07.15 at 08:39, tiejun.c...@intel.com wrote:
  -} *resource, mem_resource, high_mem_resource, io_resource;
 +} *resource, mem_resource, high_mem_resource, io_resource, 
 exp_mem_resource;

 Despite having gone through description and the rest of the patch I
 can't seem to be able to guess what exp_mem stands for.
 Meaningful variable names are quite helpful though, often avoiding
 the need for comments.
 
 exp_mem_resource() is the expanded mem_resource in the case of 
 populating RAM.
 
 Maybe I should use the whole word, expand_mem_resource.

And what does expand here mean then?

 @@ -309,29 +339,31 @@ void pci_setup(void)
     }

     /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
 +    cur_pci_mem_start = pci_mem_start;
     while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
 +        relocate_ram_for_pci_memory(cur_pci_mem_start);

Please be consistent about which variable you want to use in the loop
(pci_mem_start vs cur_pci_mem_start).
 
 Overall I just call relocate_ram_for_pci_memory() twice and each I 
 always pass cur_pci_mem_start. Any inconsistent place?

In the quoted code you use pci_mem_start in the while()
condition and cur_pci_mem_start in that same while()'s body.

 Also, this being the first substantial change to the function makes
 clear that you _still_ leave the sizing loop untouched, and instead
 make the allocation logic below more complicated. I said before a
 
 But this may be more reasonable than it used to do. In my point of view 
 we always need to first allocate 32bit mmio and then allocate 64bit mmio 
 since as you said we don't want to expand high memory if possible.
 
 number of times that I don't think this helps maintainability of this
 already convoluted code. Among other things this manifests itself
 in your second call to relocate_ram_for_pci_memory() in no way
 playing by the constraints explained a few lines up from here in an
 extensive comment.
 
 Can't all variables/comments express what I intend to do here? Except 
 for that exp_mem_resource.

I'm not talking about a lack of comments, but about your added
use of the function not being in line with what is being explained
in an earlier (pre-existing) comment.

 Therefore I'll not make any further comments on the rest of the
 patch, but instead outline an allocation model that I think would
 fit our needs: Subject to the constraints mentioned above, set up
 a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
 bits], i.e. reasonably small a memory block). Each bit represents a
 page usable for MMIO: First of all you remove the range from
 PCI_MEM_END upwards. Then remove all RDM pages. Now do a
 first pass over all devices, allocating (in the bitmap) space for only
 the 32-bit MMIO BARs, starting with the biggest one(s), by finding
 a best fit (i.e. preferably a range not usable by any bigger BAR)
 from top down. For example, if you have available

 [f0000000,f8000000)
 [f9000000,f9001000)
 [fa000000,fa003000)
 [fa010000,fa012000)

 and you're looking for a single page slot, you should end up
 picking fa002000.
 
 Why is this [f9000000,f9001000]? Just one page in this slot.

This was just a simple example I gave. Or maybe I don't understand
your question...

 After this pass you should be able to do RAM relocation in a
 single attempt just like we do today (you may still grow the MMIO
 window if you know you need to and can fit some of the 64-bit
 BARs in there, subject to said constraints; this is in an attempt
 to help OSes not comfortable with 64-bit resources).

 In a 2nd pass you'd then assign 64-bit resources: If you can fit
 them below 4G (you still have the bitmap left of what you've got
 available), put them there. Allocation strategy could be the same
 
 I think basically, your logic is similar to what I did as I described in 
 changelog,

The goal is the same, but the approaches look quite different to
me. In particular my approach avoids calculating mmio_total up
front, then basing RAM relocation on it, only to find subsequently
that more RAM may need to be relocated.

 I think bitmap mechanism is a good idea but honestly, its not easy to 
 cover all requirements here. And just like bootmem on Linux side, so its 
 a little complicated to implement this entirely. So I prefer not to 
 introduce this way in current phase.

I'm afraid it's going to be hard to convince me of any approaches
further complicating the current mechanism instead of overhauling
it.

Jan



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Chen, Tiejun


Note here I don't address your comments above since I think we should 
achieve an agreement firstly.



I think bitmap mechanism is a good idea but honestly, its not easy to
cover all requirements here. And just like bootmem on Linux side, so its
a little complicated to implement this entirely. So I prefer not to
introduce this way in current phase.


I'm afraid it's going to be hard to convince me of any approaches
further complicating the current mechanism instead of overhauling
it.


I agree we'd better overhaul this since we already found something 
unreasonable here. But one or two weeks is really not enough to fix this 
with a bitmap framework, and although a bitmap can make mmio allocation 
better, but more complicated if we just want to allocate PCI mmio.


So could we do this next? I just feel if you'd like to pay more time 
help me refine our current solution, its relatively realistic to this 
case :) And then we can go into bitmap in details or work out a better 
solution in sufficient time slot.


Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Jan Beulich
 On 14.07.15 at 12:54, tiejun.c...@intel.com wrote:
 I think bitmap mechanism is a good idea but honestly, its not easy to
 cover all requirements here. And just like bootmem on Linux side, so its
 a little complicated to implement this entirely. So I prefer not to
 introduce this way in current phase.

 I'm afraid it's going to be hard to convince me of any approaches
 further complicating the current mechanism instead of overhauling
 it.
 
 I agree we'd better overhaul this since we already found something 
 unreasonable here. But one or two weeks is really not enough to fix this 
 with a bitmap framework, and although a bitmap can make mmio allocation 
 better, but more complicated if we just want to allocate PCI mmio.
 
 So could we do this next? I just feel if you'd like to pay more time 
 help me refine our current solution, its relatively realistic to this 
 case :) And then we can go into bitmap in details or work out a better 
 solution in sufficient time slot.

Looking at how long it took to get here (wasn't this series originally
even meant to go into 4.5?) and how much time I already spent
reviewing all the previous versions, I don't see a point in wasting
even more time on working out details of an approach that's getting
too complex/ugly already anyway.

Jan




Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Chen, Tiejun

-} *resource, mem_resource, high_mem_resource, io_resource;
+} *resource, mem_resource, high_mem_resource, io_resource, 
exp_mem_resource;


Despite having gone through description and the rest of the patch I
can't seem to be able to guess what exp_mem stands for.
Meaningful variable names are quite helpful though, often avoiding
the need for comments.


exp_mem_resource() is the expanded mem_resource in the case of 
populating RAM.


Maybe I should use the whole word, expand_mem_resource.




  /* Create a list of device BARs in descending order of size. */


[snip]


@@ -309,29 +339,31 @@ void pci_setup(void)
     }

     /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
 +    cur_pci_mem_start = pci_mem_start;
     while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
 +        relocate_ram_for_pci_memory(cur_pci_mem_start);


Please be consistent about which variable you want to use in the loop
(pci_mem_start vs cur_pci_mem_start).


Overall I just call relocate_ram_for_pci_memory() twice and each I 
always pass cur_pci_mem_start. Any inconsistent place?




Also, this being the first substantial change to the function makes
clear that you _still_ leave the sizing loop untouched, and instead
make the allocation logic below more complicated. I said before a


But this may be more reasonable than it used to do. In my point of view 
we always need to first allocate 32bit mmio and then allocate 64bit mmio 
since as you said we don't want to expand high memory if possible.



number of times that I don't think this helps maintainability of this
already convoluted code. Among other things this manifests itself
in your second call to relocate_ram_for_pci_memory() in no way
playing by the constraints explained a few lines up from here in an
extensive comment.


Can't all variables/comments express what I intend to do here? Except 
for that exp_mem_resource.
        /*
         * We have to populate more RAM to further allocate
         * the remaining 32bars.
         */
        if ( mmio32_unallocated_total )
        {
            cur_pci_mem_start = pci_mem_start - mmio32_unallocated_total;
            relocate_ram_for_pci_memory(cur_pci_mem_start);
            exp_mem_resource.base = cur_pci_mem_start;
            exp_mem_resource.max = pci_mem_start;
        }



Therefore I'll not make any further comments on the rest of the
patch, but instead outline an allocation model that I think would
fit our needs: Subject to the constraints mentioned above, set up
a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
bits], i.e. reasonably small a memory block). Each bit represents a
page usable for MMIO: First of all you remove the range from
PCI_MEM_END upwards. Then remove all RDM pages. Now do a
first pass over all devices, allocating (in the bitmap) space for only
the 32-bit MMIO BARs, starting with the biggest one(s), by finding
a best fit (i.e. preferably a range not usable by any bigger BAR)
from top down. For example, if you have available

[f0000000,f8000000)
[f9000000,f9001000)
[fa000000,fa003000)
[fa010000,fa012000)

and you're looking for a single page slot, you should end up
picking fa002000.


Why is this [f9000000,f9001000]? Just one page in this slot.



After this pass you should be able to do RAM relocation in a
single attempt just like we do today (you may still grow the MMIO
window if you know you need to and can fit some of the 64-bit
BARs in there, subject to said constraints; this is in an attempt
to help OSes not comfortable with 64-bit resources).

In a 2nd pass you'd then assign 64-bit resources: If you can fit
them below 4G (you still have the bitmap left of what you've got
available), put them there. Allocation strategy could be the same


I think basically, your logic is similar to what I did as I described in 
changelog,


  1. The first allocation round just to 32bit-bar

     If we can finish allocating all 32bit-bar, we just go to allocate
     64bit-bar with all remaining resources including low pci memory.

     If not, we need to calculate how much RAM should be populated to
     allocate the remaining 32bit-bars, then populate sufficient RAM as
     exp_mem_resource to go to the second allocation round 2.

  2. The second allocation round to the remaining 32bit-bar

     We should be able to finish allocating all 32bit-bar in theory, then
     go to the third allocation round 3.

  3. The third allocation round to 64bit-bar

     We'll try to first allocate from the remaining low memory resource.
     If that isn't enough, we try to expand highmem to allocate for
     64bit-bar. This process should be the same as the original.

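(In rough pseudocode the three rounds above amount to the following;
place_32bit_bars()/place_64bit_bars() are made-up names standing in for the
per-resource allocation loop, while relocate_ram_for_pci_memory() and
exp_mem_resource are the ones from the patch being discussed. The quoted
review continues below.)

/* Sketch only -- not the actual pci_setup() control flow. */
mmio32_unallocated_total = place_32bit_bars(&mem_resource);        /* round 1 */
if ( mmio32_unallocated_total )
{
    cur_pci_mem_start = pci_mem_start - mmio32_unallocated_total;
    relocate_ram_for_pci_memory(cur_pci_mem_start);      /* widen the low hole */
    exp_mem_resource.base = cur_pci_mem_start;
    exp_mem_resource.max = pci_mem_start;
    place_32bit_bars(&exp_mem_resource);                            /* round 2 */
}
place_64bit_bars(&mem_resource, &high_mem_resource);                /* round 3 */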

as above (biggest first), perhaps allowing for some factoring out
of logic, but here smallest first probably could work equally well.
The main thought to decide between the two is whether it is
better to fit as many (small) or as big (in total) as possible a set
under 4G. I'd generally expect the former (as many as possible,
leaving only a few huge ones to go above 4G) to be the better
approach, but that's more a gut feeling than based on hard data.

Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Chen, Tiejun

I agree we'd better overhaul this since we already found something
unreasonable here. But one or two weeks is really not enough to fix this
with a bitmap framework, and although a bitmap can make mmio allocation
better, but more complicated if we just want to allocate PCI mmio.

So could we do this next? I just feel if you'd like to pay more time
help me refine our current solution, its relatively realistic to this
case :) And then we can go into bitmap in details or work out a better
solution in sufficient time slot.


Looking at how long it took to get here (wasn't this series originally
even meant to go into 4.5?) and how much time I already spent


Certainly appreciate your time.

I didn't mean it's wasting time at this point. I just want to express
that it's hard to implement that solution in one or two weeks to walk
into 4.6 as an exception.


Note I know this feature is still not accepted as an exception for 4.6
right now, so I'm making an assumption.



reviewing all the previous versions, I don't see a point in wasting
even more time on working out details of an approach that's getting
too complex/ugly already anyway.


Here I'm trying to seek such a kind of two-steps approach if possible.

Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-14 Thread Chen, Tiejun

On 2015/7/15 8:55, Chen, Tiejun wrote:

I agree we'd better overhaul this since we already found
something unreasonable here. But one or two weeks is really not
enough to fix this with a bitmap framework, and although a bitmap
can make mmio allocation better, but more complicated if we just
want to allocate PCI mmio.

So could we do this next? I just feel if you'd like to pay more
time help me refine our current solution, its relatively
realistic to this case :) And then we can go into bitmap in
details or work out a better solution in sufficient time slot.


Looking at how long it took to get here (wasn't this series
originally even meant to go into 4.5?) and how much time I already
spent


Certainly appreciate your time.

I didn't mean it's a waste of time at this point. I just want to express
that it's hard to implement that solution in one or two weeks to make it
into 4.6 as an exception.

Note I know this feature is still not accepted as an exception for 4.6
right now, so I'm making an assumption.


reviewing all the previous versions, I don't see a point in
wasting even more time on working out details of an approach that's
getting too complex/ugly already anyway.


Here I'm trying to find some kind of two-step approach, if
possible.



Furthermore, could we have this solution as follows?

@@ -407,8 +408,29 @@ void pci_setup(void)
         }

         base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+ reallocate_bar:
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
+        /*
+         * We need to skip reserved regions like RMRRs, but this probably
+         * leads the remaining allocation to fail.
+         */
+        for ( j = 0; j < memory_map.nr_map ; j++ )
+        {
+            if ( memory_map.map[j].type != E820_RAM )
+            {
+                reserved_end = memory_map.map[j].addr +
+                               memory_map.map[j].size;
+                if ( check_overlap(base, bar_sz,
+                                   memory_map.map[j].addr,
+                                   memory_map.map[j].size) )
+                {
+                    if ( !mmio_conflict )
+                        mmio_conflict = true;
+                    base = (reserved_end + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+                    goto reallocate_bar;
+                }
+            }
+        }
         base += bar_sz;

         if ( (base < resource->base) || (base > resource->max) )
@@ -416,6 +438,11 @@ void pci_setup(void)
             printf("pci dev %02x:%x bar %02x size "PRIllx": no space for "
                    "resource!\n", devfn>>3, devfn&7, bar_reg,
                    PRIllx_arg(bar_sz));
+            if ( mmio_conflict )
+            {
+                printf("MMIO conflicts with RDM.\n");
+                BUG();
+            }
             continue;
         }

Here we still check RDMs and skip them if any conflict exists, but finally,
if 1) there is not enough space to allocate all BARs to this VM and 2) this
shortage is triggered by an RDM, we stop creating this VM.

This may slightly change the current mechanism, but it also guarantees we
don't ignore any conflicts. I admit this is not a good solution because the
alignment issue still exists, but I think it may be acceptable if all the
PCI devices assigned to this VM can work. And in the real world most RDMs
fall somewhere below 0xF0000000 (the current default PCI memory start
address), so it's not a common case to see this kind of conflict.
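
By the way, for reference, a minimal sketch of the overlap test the snippet
above relies on might look like the following (this is only an illustration
of the intended semantics; the real hvmloader helper may differ in detail):

/*
 * Sketch only: two ranges [start, start + size) and
 * [reserved_start, reserved_start + reserved_size) overlap iff each one
 * starts before the other one ends.
 */
static int check_overlap(uint64_t start, uint64_t size,
                         uint64_t reserved_start, uint64_t reserved_size)
{
    return (start + size > reserved_start) &&
           (start < reserved_start + reserved_size);
}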


Thanks
Tiejun



Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-13 Thread Jan Beulich
On 09.07.15 at 07:33, tiejun.c...@intel.com wrote:
 @@ -50,17 +75,22 @@ void pci_setup(void)
  /* Resources assignable to PCI devices via BARs. */
  struct resource {
      uint64_t base, max;
 -} *resource, mem_resource, high_mem_resource, io_resource;
 +} *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;

Despite having gone through the description and the rest of the patch, I
can't seem to guess what exp_mem stands for.
Meaningful variable names are quite helpful though, often avoiding
the need for comments.

  /* Create a list of device BARs in descending order of size. */
  struct bars {
 -uint32_t is_64bar;
 +#define PCI_BAR_IS_64BIT        0x1
 +#define PCI_BAR_IS_ALLOCATED    0x2
 +uint32_t flag;

flags (you already have two)

  uint32_t devfn;
  uint32_t bar_reg;
  uint64_t bar_sz;
  } *bars = (struct bars *)scratch_start;
 -unsigned int i, nr_bars = 0;
 -uint64_t mmio_hole_size = 0;
 +unsigned int i, j, n, nr_bars = 0;
 +uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
 +bool bar32_allocating = 0;
 +uint64_t mmio32_unallocated_total = 0;
 +unsigned long cur_pci_mem_start = 0;
  
  const char *s;
  /*
 @@ -222,7 +252,7 @@ void pci_setup(void)
  if ( i != nr_bars )
  memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
  
 -bars[i].is_64bar = is_64bar;
 +bars[i].flag = is_64bar ? PCI_BAR_IS_64BIT : 0;
  bars[i].devfn   = devfn;
  bars[i].bar_reg = bar_reg;
  bars[i].bar_sz  = bar_sz;
 @@ -309,29 +339,31 @@ void pci_setup(void)
  }
  
  /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
 +cur_pci_mem_start = pci_mem_start;
  while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
 +relocate_ram_for_pci_memory(cur_pci_mem_start);

Please be consistent about which variable you want to use in the loop
(pci_mem_start vs cur_pci_mem_start).

Also, this being the first substantial change to the function makes
clear that you _still_ leave the sizing loop untouched, and instead
make the allocation logic below more complicated. I said before a
number of times that I don't think this helps maintainability of this
already convoluted code. Among other things this manifests itself
in your second call to relocate_ram_for_pci_memory() in no way
playing by the constraints explained a few lines up from here in an
extensive comment.

Therefore I'll not make any further comments on the rest of the
patch, but instead outline an allocation model that I think would
fit our needs: Subject to the constraints mentioned above, set up
a bitmap (maximum size 64k [2Gb = 2^^19 pages needing 2^^19
bits], i.e. reasonably small a memory block). Each bit represents a
page usable for MMIO: First of all you remove the range from
PCI_MEM_END upwards. Then remove all RDM pages. Now do a
first pass over all devices, allocating (in the bitmap) space for only
the 32-bit MMIO BARs, starting with the biggest one(s), by finding
a best fit (i.e. preferably a range not usable by any bigger BAR)
from top down. For example, if you have available

[f0000000,f8000000)
[f9000000,f9001000)
[fa000000,fa003000)
[fa010000,fa012000)

and you're looking for a single page slot, you should end up
picking fa002000.

After this pass you should be able to do RAM relocation in a
single attempt just like we do today (you may still grow the MMIO
window if you know you need to and can fit some of the 64-bit
BARs in there, subject to said constraints; this is in an attempt
to help OSes not comfortable with 64-bit resources).

In a 2nd pass you'd then assign 64-bit resources: If you can fit
them below 4G (you still have the bitmap left of what you've got
available), put them there. Allocation strategy could be the same
as above (biggest first), perhaps allowing for some factoring out
of logic, but here smallest first probably could work equally well.
The main thought to decide between the two is whether it is
better to fit as many (small) or as big (in total) as possible a set
under 4G. I'd generally expect the former (as many as possible,
leaving only a few huge ones to go above 4G) to be the better
approach, but that's more a gut feeling than based on hard data.
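
To make the idea a bit more concrete, below is a very rough sketch of such a
bitmap allocator (all names and sizes here are my own assumptions rather than
code from this series, and it only does a simple top-down walk over naturally
aligned slots instead of the full best-fit heuristic described above):

#define MMIO_PAGES (1u << 19)                /* 2Gb worth of 4k pages */
static uint32_t mmio_free[MMIO_PAGES / 32];  /* 64k bitmap, bit set = usable */

static int page_is_free(unsigned int pg)
{
    return mmio_free[pg / 32] & (1u << (pg % 32));
}

/*
 * Find the highest naturally aligned run of nr_pages free pages (nr_pages
 * assumed to be a power of two).  The bitmap would have been populated
 * beforehand with the low MMIO hole minus all RDM pages.  Returns the page
 * index of the chosen slot, or ~0u if nothing fits.
 */
static unsigned int find_mmio_slot(unsigned int nr_pages)
{
    unsigned int pg = MMIO_PAGES - nr_pages, i;

    for ( ; ; pg -= nr_pages )
    {
        for ( i = 0; i < nr_pages && page_is_free(pg + i); i++ )
            continue;
        if ( i == nr_pages )
            return pg;                       /* slot found */
        if ( pg < nr_pages )                 /* pg == 0 was the last candidate */
            break;
    }
    return ~0u;
}

The second pass for 64-bit BARs could then reuse whatever is left in the same
bitmap before falling back to the high memory window.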

Jan



[Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-08 Thread Tiejun Chen
When allocating MMIO addresses for PCI BARs, we need to make
sure they don't overlap with reserved regions.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v7:

* Nothing is changed.

v5:

* Rename the field is_64bar inside struct bars to flag, and then
  extend it to also indicate whether this BAR is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure the MMIO region fits all PCI devices, as before

  #2.2 Accommodate reserved memory regions that are not aligned

  If I'm missing something, let me know.

  #3. How to

  #3.1 Address #2.1

  We need either to populate more RAM or to expand highmem further. But we
  should note that only 64-bit BARs can work with highmem, and as you mentioned
  we should also avoid expanding highmem where possible. So my implementation
  is to allocate 32-bit BARs and 64-bit BARs in order.

  1. The first allocation round, just for 32-bit BARs

  If we can finish allocating all 32-bit BARs, we just go on to allocate
  64-bit BARs with all remaining resources, including low PCI memory.

  If not, we need to calculate how much RAM should be populated to allocate
  the remaining 32-bit BARs, then populate sufficient RAM as exp_mem_resource
  and go to the second allocation round (2); a small sizing sketch follows
  after step 3.

  2. The second allocation round, for the remaining 32-bit BARs

  We should be able to finish allocating all 32-bit BARs in theory, then go
  to the third allocation round (3).

  3. The third allocation round, for 64-bit BARs

  We'll try to allocate from the remaining low memory resource first. If that
  isn't enough, we try to expand highmem to allocate the 64-bit BARs. This
  process should be the same as the original.
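
  For illustration, the sizing step in round 1 could look roughly like the
  helper below (a sketch using my own names, not the actual code of this
  patch):

  /*
   * Hypothetical helper: given the total size of still-unallocated 32-bit
   * BARs and the space left in the low MMIO hole, work out how many pages
   * of RAM would need to be relocated above 4G so that the rest fits.
   */
  static unsigned int pages_to_relocate(uint64_t unallocated_total,
                                        uint64_t free_in_low_hole)
  {
      uint64_t shortfall;

      if ( unallocated_total <= free_in_low_hole )
          return 0;                  /* everything still fits below 4G */
      shortfall = unallocated_total - free_in_low_hole;
      /* Round up to whole pages (PAGE_SIZE/PAGE_SHIFT as in hvmloader). */
      return (shortfall + PAGE_SIZE - 1) >> PAGE_SHIFT;
  }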

  #3.2 Address #2.2

  I'm trying to accommodate reserved memory regions that are not aligned:

  We should skip all reserved device memory, but we also need to check whether
  other, smaller BARs can be allocated when an MMIO hole exists between
  resource->base and the reserved device memory. If such a hole exists, we
  simply move on and try to allocate the next BAR, since all BARs are in
  descending order of size. If not, we need to move resource->base to
  reserved_end just to reallocate this BAR. (A small sketch of this decision
  follows below.)
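
  The skip-or-advance choice above could be sketched as follows (again just an
  illustration with made-up naming, not code taken from the patch):

  /*
   * Illustrative only: return the lowest address >= base, aligned to bar_sz
   * (a power of two), at which [addr, addr + bar_sz) does not overlap the
   * reserved range [rs, rs + rsz).
   */
  static uint64_t place_bar_around_reserved(uint64_t base, uint64_t bar_sz,
                                            uint64_t rs, uint64_t rsz)
  {
      uint64_t addr = (base + bar_sz - 1) & ~(bar_sz - 1);

      if ( addr + bar_sz <= rs || addr >= rs + rsz )
          return addr;               /* the hole next to the region fits */
      /* Otherwise jump past the reserved region and realign. */
      return (rs + rsz + bar_sz - 1) & ~(bar_sz - 1);
  }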

 tools/firmware/hvmloader/pci.c | 194 ++---
 1 file changed, 164 insertions(+), 30 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..397f3b7 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,31 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+static void relocate_ram_for_pci_memory(unsigned long cur_pci_mem_start)
+{
+    struct xen_add_to_physmap xatp;
+    unsigned int nr_pages = min_t(
+        unsigned int,
+        hvm_info->low_mem_pgend - (cur_pci_mem_start >> PAGE_SHIFT),
+        (1u << 16) - 1);
+    if ( hvm_info->high_mem_pgend == 0 )
+        hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
+    hvm_info->low_mem_pgend -= nr_pages;
+    printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
+           " for lowmem MMIO hole\n",
+           nr_pages,
+           PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
+           PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
+    xatp.domid = DOMID_SELF;
+    xatp.space = XENMAPSPACE_gmfn_range;
+    xatp.idx   = hvm_info->low_mem_pgend;
+    xatp.gpfn  = hvm_info->high_mem_pgend;
+    xatp.size  = nr_pages;
+    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+        BUG();
+    hvm_info->high_mem_pgend += nr_pages;
+}
+
 void pci_setup(void)
 {
 uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -50,17 +75,22 @@ void pci_setup(void)
     /* Resources assignable to PCI devices via BARs. */
     struct resource {
         uint64_t base, max;
-    } *resource, mem_resource, high_mem_resource, io_resource;
+    } *resource, mem_resource, high_mem_resource, io_resource, exp_mem_resource;
 
 /* Create a list of device BARs in descending order of size. */
 struct bars {
-uint32_t is_64bar;
+#define PCI_BAR_IS_64BIT        0x1
+#define PCI_BAR_IS_ALLOCATED    0x2
+uint32_t flag;
 uint32_t devfn;
 uint32_t bar_reg;
 uint64_t bar_sz;
 } *bars = (struct bars *)scratch_start;
-unsigned int i, nr_bars = 0;
-uint64_t mmio_hole_size = 0;
+unsigned int i, j, n, nr_bars = 0;
+uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
+bool bar32_allocating = 0;
+uint64_t mmio32_unallocated_total = 0;
+unsigned long