On Tue, Jun 19, 2018 at 01:06:48AM +0000, Wang, Wei W wrote:
> On Monday, June 18, 2018 10:29 AM, Michael S. Tsirkin wrote:
> > On Sat, Jun 16, 2018 at 01:09:44AM +0000, Wang, Wei W wrote:
> > > Not necessarily, I think. We have min(4m_page_blocks / 512, 1024) above,
> > > so the maximum memory that can be reported is 2TB. For larger guests, e.g.
> > > 4TB, the optimization can still offer 2TB of free memory (better than no
> > > optimization).
> > 
> > Maybe it's better, maybe it isn't. It certainly muddies the waters even
> > more. I'd rather we had a better plan. From that POV I like what Matthew
> > Wilcox suggested for this, which is to steal the necessary # of entries
> > off the list.
> 
> Actually what Matthew suggested doesn't make a difference here. That method
> always steals the first free page blocks, and sure, it can be changed to take
> more. But all of this can be achieved via kmalloc

I'd do get_user_pages really. You don't want pages split, etc.

> by the caller, which is more prudent and makes the code more straightforward.
> I think we don't need to take that risk unless the MM folks strongly endorse
> that approach.
> 
> The max size of the kmalloc-ed memory is 4MB, which gives us the limitation
> that the max free memory to report is 2TB. Back to the motivation of this
> work: the cloud guys want to use this optimization to accelerate their guest
> live migration. 2TB guests are not common in today's clouds. When huge guests
> become common in the future, we can easily tweak this API to fill hints into
> scattered buffers (e.g. several 4MB arrays passed to this API) instead of a
> single one as in this version.
> 
> This limitation doesn't cause any issue from a functionality perspective. For
> the extreme case like a 100TB guest live migration, which is theoretically
> possible today, this optimization helps skip 2TB of its free memory. The
> result is that it may reduce live migration time by only 2%, but that is
> still better than not skipping the 2TB (if not using the feature).

Not clearly better, no, since you are slowing the guest.


> So, for the first release of this feature, I think it is better to have the
> simpler and more straightforward solution as we have now, and clearly
> document why it can report up to 2TB of free memory.

No one has the time to read documentation about how an internal flag
within a device works. Come on, getting two pages isn't much harder
than a single one.

> 
>  
> > If that doesn't fly, we can allocate out of the loop and just retry with
> > more pages.
> > 
> > > On the other hand, large guests are large mostly because they need to use
> > > a lot of memory. In that case, they usually won't have that much free
> > > memory to report.
> > 
> > And following this logic small guests don't have a lot of memory to report
> > at all.
> > Could you remind me why are we considering this optimization then?
> 
> If there is a 3TB guest, it is 3TB not 2TB mostly because it would need to
> use e.g. 2.5TB of memory from time to time. In the worst case, it only has
> 0.5TB of free memory to report, but reporting 0.5TB with this optimization is
> better than no optimization. (And the current 2TB limitation isn't a
> limitation for the 3TB guest in this case.)

I'd rather not spend time writing up random limitations.


> Best,
> Wei
_______________________________________________
Virtualization mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
