Re: [External] Re: [PATCH v4 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-19 Thread Muchun Song
On Fri, Nov 20, 2020 at 7:22 AM Mike Kravetz  wrote:
>
> On 11/18/20 10:17 PM, Muchun Song wrote:
> > On Tue, Nov 17, 2020 at 11:06 PM Oscar Salvador  wrote:
> >>
> >> On Fri, Nov 13, 2020 at 06:59:36PM +0800, Muchun Song wrote:
> >>> +#define page_huge_pte(page)  ((page)->pmd_huge_pte)
> >>
> >> Seems you do not need this one anymore.
> >>
> >>> +void vmemmap_pgtable_free(struct page *page)
> >>> +{
> >>> + struct page *pte_page, *t_page;
> >>> +
> >>> + list_for_each_entry_safe(pte_page, t_page, >lru, lru) {
> >>> + list_del(_page->lru);
> >>> + pte_free_kernel(_mm, page_to_virt(pte_page));
> >>> + }
> >>> +}
> >>> +
> >>> +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
> >>> +{
> >>> + unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> >>> +
> >>> + /* Store preallocated pages on huge page lru list */
> >>> + INIT_LIST_HEAD(>lru);
> >>> +
> >>> + while (nr--) {
> >>> + pte_t *pte_p;
> >>> +
> >>> + pte_p = pte_alloc_one_kernel(_mm);
> >>> + if (!pte_p)
> >>> + goto out;
> >>> + list_add(_to_page(pte_p)->lru, >lru);
> >>> + }
> >>
> >> Definetely this looks better and easier to handle.
> >> Btw, did you explore Matthew's hint about instead of allocating a new page,
> >> using one of the ones you are going to free to store the ptes?
> >> I am not sure whether it is feasible at all though.
> >
> > Hi Oscar and Matthew,
> >
> > I have started an investigation about this. Finally, I think that it
> > may not be feasible. If we use a vmemmap page frame as a
> > page table when we split the PMD table firstly, in this stage,
> > we need to set 512 pte entry to the vmemmap page frame. If
> > someone reads the tail struct page struct of the HugeTLB,
> > it can get the arbitrary value (I am not sure it actually exists,
> > maybe the memory compaction module can do this). So on
> > the safe side, I think that allocating a new page is a good
> > choice.
>
> Thanks for looking into this.
>
> If I understand correctly, the issue is that you need the pte page to set
> up the new mappings.  In your current code, this is done before removing
> the pages of struct pages.  This keeps everything 'consistent' as things
> are remapped.
>
> If you want to use one of the 'pages of struct pages' for the new pte
> page, then there will be a period of time when things are inconsistent.
> Before setting up the mapping, some code could potentially access that
> pages of struct pages.

Yeah, you are right.

>
> I tend to agree that allocating allocating a new page is the safest thing
> to do here.  Or, perhaps someone can think of a way make this safe.
> --
> Mike Kravetz



-- 
Yours,
Muchun


Re: [External] Re: [PATCH v4 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-19 Thread Mike Kravetz
On 11/18/20 10:17 PM, Muchun Song wrote:
> On Tue, Nov 17, 2020 at 11:06 PM Oscar Salvador  wrote:
>>
>> On Fri, Nov 13, 2020 at 06:59:36PM +0800, Muchun Song wrote:
>>> +#define page_huge_pte(page)  ((page)->pmd_huge_pte)
>>
>> Seems you do not need this one anymore.
>>
>>> +void vmemmap_pgtable_free(struct page *page)
>>> +{
>>> + struct page *pte_page, *t_page;
>>> +
>>> + list_for_each_entry_safe(pte_page, t_page, >lru, lru) {
>>> + list_del(_page->lru);
>>> + pte_free_kernel(_mm, page_to_virt(pte_page));
>>> + }
>>> +}
>>> +
>>> +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
>>> +{
>>> + unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
>>> +
>>> + /* Store preallocated pages on huge page lru list */
>>> + INIT_LIST_HEAD(>lru);
>>> +
>>> + while (nr--) {
>>> + pte_t *pte_p;
>>> +
>>> + pte_p = pte_alloc_one_kernel(_mm);
>>> + if (!pte_p)
>>> + goto out;
>>> + list_add(_to_page(pte_p)->lru, >lru);
>>> + }
>>
>> Definetely this looks better and easier to handle.
>> Btw, did you explore Matthew's hint about instead of allocating a new page,
>> using one of the ones you are going to free to store the ptes?
>> I am not sure whether it is feasible at all though.
> 
> Hi Oscar and Matthew,
> 
> I have started an investigation about this. Finally, I think that it
> may not be feasible. If we use a vmemmap page frame as a
> page table when we split the PMD table firstly, in this stage,
> we need to set 512 pte entry to the vmemmap page frame. If
> someone reads the tail struct page struct of the HugeTLB,
> it can get the arbitrary value (I am not sure it actually exists,
> maybe the memory compaction module can do this). So on
> the safe side, I think that allocating a new page is a good
> choice.

Thanks for looking into this.

If I understand correctly, the issue is that you need the pte page to set
up the new mappings.  In your current code, this is done before removing
the pages of struct pages.  This keeps everything 'consistent' as things
are remapped.

If you want to use one of the 'pages of struct pages' for the new pte
page, then there will be a period of time when things are inconsistent.
Before setting up the mapping, some code could potentially access that
pages of struct pages.

I tend to agree that allocating allocating a new page is the safest thing
to do here.  Or, perhaps someone can think of a way make this safe.
-- 
Mike Kravetz


Re: [External] Re: [PATCH v4 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-18 Thread Muchun Song
On Tue, Nov 17, 2020 at 11:06 PM Oscar Salvador  wrote:
>
> On Fri, Nov 13, 2020 at 06:59:36PM +0800, Muchun Song wrote:
> > +#define page_huge_pte(page)  ((page)->pmd_huge_pte)
>
> Seems you do not need this one anymore.
>
> > +void vmemmap_pgtable_free(struct page *page)
> > +{
> > + struct page *pte_page, *t_page;
> > +
> > + list_for_each_entry_safe(pte_page, t_page, >lru, lru) {
> > + list_del(_page->lru);
> > + pte_free_kernel(_mm, page_to_virt(pte_page));
> > + }
> > +}
> > +
> > +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
> > +{
> > + unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> > +
> > + /* Store preallocated pages on huge page lru list */
> > + INIT_LIST_HEAD(>lru);
> > +
> > + while (nr--) {
> > + pte_t *pte_p;
> > +
> > + pte_p = pte_alloc_one_kernel(_mm);
> > + if (!pte_p)
> > + goto out;
> > + list_add(_to_page(pte_p)->lru, >lru);
> > + }
>
> Definetely this looks better and easier to handle.
> Btw, did you explore Matthew's hint about instead of allocating a new page,
> using one of the ones you are going to free to store the ptes?
> I am not sure whether it is feasible at all though.

Hi Oscar and Matthew,

I have started an investigation about this. Finally, I think that it
may not be feasible. If we use a vmemmap page frame as a
page table when we split the PMD table firstly, in this stage,
we need to set 512 pte entry to the vmemmap page frame. If
someone reads the tail struct page struct of the HugeTLB,
it can get the arbitrary value (I am not sure it actually exists,
maybe the memory compaction module can do this). So on
the safe side, I think that allocating a new page is a good
choice.

Thanks.

>
>
> > --- a/mm/hugetlb_vmemmap.h
> > +++ b/mm/hugetlb_vmemmap.h
> > @@ -9,12 +9,24 @@
> >  #ifndef _LINUX_HUGETLB_VMEMMAP_H
> >  #define _LINUX_HUGETLB_VMEMMAP_H
> >  #include 
> > +#include 
>
> why do we need this here?
>
> --
> Oscar Salvador
> SUSE L3



-- 
Yours,
Muchun


Re: [External] Re: [PATCH v4 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

2020-11-17 Thread Muchun Song
On Tue, Nov 17, 2020 at 11:06 PM Oscar Salvador  wrote:
>
> On Fri, Nov 13, 2020 at 06:59:36PM +0800, Muchun Song wrote:
> > +#define page_huge_pte(page)  ((page)->pmd_huge_pte)

Yeah, I forgot to remove it. Thanks.

>
> Seems you do not need this one anymore.
>
> > +void vmemmap_pgtable_free(struct page *page)
> > +{
> > + struct page *pte_page, *t_page;
> > +
> > + list_for_each_entry_safe(pte_page, t_page, >lru, lru) {
> > + list_del(_page->lru);
> > + pte_free_kernel(_mm, page_to_virt(pte_page));
> > + }
> > +}
> > +
> > +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
> > +{
> > + unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> > +
> > + /* Store preallocated pages on huge page lru list */
> > + INIT_LIST_HEAD(>lru);
> > +
> > + while (nr--) {
> > + pte_t *pte_p;
> > +
> > + pte_p = pte_alloc_one_kernel(_mm);
> > + if (!pte_p)
> > + goto out;
> > + list_add(_to_page(pte_p)->lru, >lru);
> > + }
>
> Definetely this looks better and easier to handle.
> Btw, did you explore Matthew's hint about instead of allocating a new page,
> using one of the ones you are going to free to store the ptes?

Oh, sorry for missing his reply. It is a good idea. I will start an
investigation.
Thanks for reminding me.

> I am not sure whether it is feasible at all though.
>
>
> > --- a/mm/hugetlb_vmemmap.h
> > +++ b/mm/hugetlb_vmemmap.h
> > @@ -9,12 +9,24 @@
> >  #ifndef _LINUX_HUGETLB_VMEMMAP_H
> >  #define _LINUX_HUGETLB_VMEMMAP_H
> >  #include 
> > +#include 
>
> why do we need this here?

Yeah, also can remove:).


>
> --
> Oscar Salvador
> SUSE L3



--
Yours,
Muchun