Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
On Wed 21-09-16 11:27:48, Dave Hansen wrote:
> On 09/21/2016 11:20 AM, Michal Hocko wrote:
> > I would even question the per page block offlining itself. Why would
> > anybody want to offline few blocks rather than the whole node? What is
> > the usecase here?
>
> The original reason was so that you could remove a DIMM or a riser card
> full of DIMMs, which are certainly a subset of a node.

OK, I see, thanks for the clarification! I was always thinking more in
terms of node rather than physical memory range hot-remove. I do agree
that it makes sense to free the whole gigantic huge page if we encounter
a tail page for the above use case, because losing the gigantic page is
justified when the whole DIMM goes away.

Thanks!
--
Michal Hocko
SUSE Labs
Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
On 09/21/2016 11:20 AM, Michal Hocko wrote:
> I would even question the per page block offlining itself. Why would
> anybody want to offline few blocks rather than the whole node? What is
> the usecase here?

The original reason was so that you could remove a DIMM or a riser card
full of DIMMs, which are certainly a subset of a node.

With virtual machines, perhaps you only want to make a small adjustment
to the memory that a VM has. Or, perhaps the VM only _has_ one node.

Granted, ballooning takes care of a lot of these cases, but memmap[]
starts to get annoying at some point if you balloon too much memory
away.
Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
On Tue 20-09-16 10:37:04, Mike Kravetz wrote:
> On 09/20/2016 08:53 AM, Gerald Schaefer wrote:
> > dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> > list corruption and addressing exception when trying to set a memory
> > block offline that is part (but not the first part) of a gigantic
> > hugetlb page with a size > memory block size.
> >
> > When no other smaller hugepage sizes are present, the VM_BUG_ON() will
> > trigger directly. In the other case we will run into an addressing
> > exception later, because dissolve_free_huge_page() will not use the
> > head page of the compound hugetlb page, which will result in a NULL
> > hstate from page_hstate(). list_del() would also not work well on a
> > tail page.
> >
> > To fix this, first remove the VM_BUG_ON() because it is wrong, and
> > then use the compound head page in dissolve_free_huge_page().
> >
> > However, this all assumes that it is the desired behaviour to remove
> > a (gigantic) unused hugetlb page from the pool, just because a small
> > (in relation to the hugepage size) memory block is going offline. Not
> > sure if this is the right thing, and it doesn't look very consistent
> > given that in this scenario it is _not_ possible to migrate such a
> > (gigantic) hugepage if it is in use. OTOH, has_unmovable_pages() will
> > return false in both cases, i.e. the memory block will be reported as
> > removable, no matter if the hugepage that it is part of is unused or
> > in use.
> >
> > This patch is assuming that it would be OK to remove the hugepage,
> > i.e. memory offline beats pre-allocated unused (gigantic) hugepages.
> >
> > Any thoughts?
>
> Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in
> this thread:
> https://lkml.org/lkml/2016/9/13/146
>
> Their approach (I believe) would be to fail the offline operation in
> this case. However, I could argue that failing the operation, or
> dissolving the unused huge page containing the area to be offlined is
> the right thing to do.

I am sorry I have noticed this thread only now. I was arguing about this
in the original thread. I would be rather reluctant to free a gigantic
page just because somebody wants to offline a small part of it, because
the setup is really expensive and a lost page would be really hard to
get back. I would even question the per page block offlining itself. Why
would anybody want to offline few blocks rather than the whole node?
What is the usecase here?
--
Michal Hocko
SUSE Labs
Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
On Tue, 20 Sep 2016 10:45:23 -0700 Dave Hansen wrote:
> On 09/20/2016 10:37 AM, Mike Kravetz wrote:
> >
> > Their approach (I believe) would be to fail the offline operation in
> > this case. However, I could argue that failing the operation, or
> > dissolving the unused huge page containing the area to be offlined is
> > the right thing to do.
>
> I think the right thing to do is dissolve the whole huge page if even a
> part of it is offlined. The only question is what to do with the
> gigantic remnants.

Hmm, not sure if I got this right, but I thought that by calling
update_and_free_page() on the head page (even if it is not part of the
memory block to be removed) all parts of the gigantic hugepage should be
properly freed and there should not be any remnants left.
Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
On Tue, 20 Sep 2016 10:37:04 -0700 Mike Kravetz wrote:
>
> Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in
> this thread:
> https://lkml.org/lkml/2016/9/13/146

Ah, thanks, I didn't see that.

> Their approach (I believe) would be to fail the offline operation in
> this case. However, I could argue that failing the operation, or
> dissolving the unused huge page containing the area to be offlined is
> the right thing to do.
>
> I never thought too much about the VM_BUG_ON(), but you are correct in
> that it should be removed in either case.
>
> The other thing that needs to be changed is the locking in
> dissolve_free_huge_page(). I believe the lock only needs to be held if
> we are removing the huge page from the pool. It is not a correctness
> but performance issue.
>

Yes, that looks odd, that's why in my patch I moved the PageHuge() check
out from dissolve_free_huge_page(), up into the loop in
dissolve_free_huge_pages(). This way dissolve_free_huge_page() with its
locking should only be called once per memory block, in the case where
this memory block is part of a gigantic hugepage.
Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
On 09/20/2016 07:45 PM, Dave Hansen wrote:
> On 09/20/2016 10:37 AM, Mike Kravetz wrote:
> > Their approach (I believe) would be to fail the offline operation in
> > this case. However, I could argue that failing the operation, or
> > dissolving the unused huge page containing the area to be offlined is
> > the right thing to do.
>
> I think the right thing to do is dissolve the whole huge page if even a
> part of it is offlined. The only question is what to do with the
> gigantic remnants.

Just free them into the buddy system? Or what are the alternatives?
Creating smaller huge pages (if supported)? That doesn't make much
sense. Offline it completely? That's probably not what the user
requested.

Vlastimil
Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
On 09/20/2016 10:37 AM, Mike Kravetz wrote:
>
> Their approach (I believe) would be to fail the offline operation in
> this case. However, I could argue that failing the operation, or
> dissolving the unused huge page containing the area to be offlined is
> the right thing to do.

I think the right thing to do is dissolve the whole huge page if even a
part of it is offlined. The only question is what to do with the
gigantic remnants.
Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
On 09/20/2016 08:53 AM, Gerald Schaefer wrote:
> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> list corruption and addressing exception when trying to set a memory
> block offline that is part (but not the first part) of a gigantic
> hugetlb page with a size > memory block size.
>
> When no other smaller hugepage sizes are present, the VM_BUG_ON() will
> trigger directly. In the other case we will run into an addressing
> exception later, because dissolve_free_huge_page() will not use the
> head page of the compound hugetlb page, which will result in a NULL
> hstate from page_hstate(). list_del() would also not work well on a
> tail page.
>
> To fix this, first remove the VM_BUG_ON() because it is wrong, and
> then use the compound head page in dissolve_free_huge_page().
>
> However, this all assumes that it is the desired behaviour to remove
> a (gigantic) unused hugetlb page from the pool, just because a small
> (in relation to the hugepage size) memory block is going offline. Not
> sure if this is the right thing, and it doesn't look very consistent
> given that in this scenario it is _not_ possible to migrate such a
> (gigantic) hugepage if it is in use. OTOH, has_unmovable_pages() will
> return false in both cases, i.e. the memory block will be reported as
> removable, no matter if the hugepage that it is part of is unused or
> in use.
>
> This patch is assuming that it would be OK to remove the hugepage,
> i.e. memory offline beats pre-allocated unused (gigantic) hugepages.
>
> Any thoughts?

Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in
this thread:
https://lkml.org/lkml/2016/9/13/146

Their approach (I believe) would be to fail the offline operation in
this case. However, I could argue that failing the operation, or
dissolving the unused huge page containing the area to be offlined is
the right thing to do.

I never thought too much about the VM_BUG_ON(), but you are correct in
that it should be removed in either case.

The other thing that needs to be changed is the locking in
dissolve_free_huge_page(). I believe the lock only needs to be held if
we are removing the huge page from the pool. It is not a correctness
issue but a performance issue.

--
Mike Kravetz

> Gerald Schaefer (1):
>   mm/hugetlb: fix memory offline with hugepage size > memory block size
>
>  mm/hugetlb.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)