On 31.07.2025 17:58, Oleksii Kurochko wrote:
> RISC-V's PTE has only two available bits that can be used to store the P2M
> type. This is insufficient to represent all the current RISC-V P2M types.
> Therefore, some P2M types must be stored outside the PTE bits.
> 
> To address this, a metadata table is introduced to store P2M types that
> cannot fit in the PTE itself. Not all P2M types are stored in the
> metadata table—only those that require it.
> 
> The metadata table is linked to the intermediate page table via the
> `struct page_info`'s list field of the corresponding intermediate page.
> 
> To simplify the allocation and linking of intermediate and metadata page
> tables, `p2m_{alloc,free}_table()` functions are implemented.
> 
> These changes impact `p2m_split_superpage()`, since when a superpage is
> split, it is necessary to update the metadata table of the new
> intermediate page table — if the entry being split has its P2M type set
> to `p2m_ext_storage` in its `P2M_TYPES` bits.

Oh, this was an aspect I didn't realize when commenting on the name of
the enumerator. I think you want to keep the name for the purpose here,
but you'd better not apply relational operators to it (and hence want
a second value to serve that purpose).
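Something along these lines is what I mean (sketch only; apart from
p2m_ext_storage the enumerator names are made up for illustration):

    typedef enum {
        p2m_invalid,
        p2m_ram_rw,
        /* ... other types which fit the two available PTE bits ... */
        p2m_ext_storage,                /* value encoded in the PTE only */
        /* Types from here on only ever live in the metadata page. */
        p2m_grant_map_rw,
        p2m_first_external = p2m_grant_map_rw,
        /* ... */
    } p2m_type_t;

Comparisons would then use e.g. t >= p2m_first_external, leaving
p2m_ext_storage purely as the marker stored in the PTE's type bits.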

> In addition to updating
> the metadata of the new intermediate page table, the corresponding entry
> in the metadata for the original superpage is invalidated.
> 
> Also, update p2m_{get,set}_type to work with P2M types which don't fit
> into PTE bits.
> 
> Signed-off-by: Oleksii Kurochko <oleksii.kuroc...@gmail.com>

No Suggested-by: or anything?

> --- a/xen/arch/riscv/include/asm/mm.h
> +++ b/xen/arch/riscv/include/asm/mm.h
> @@ -150,6 +150,15 @@ struct page_info
>              /* Order-size of the free chunk this page is the head of. */
>              unsigned int order;
>          } free;
> +
> +        /* Page is used to store metadata: p2m type. */

That's not correct. The page thus described is what the pointer below
points to. Here it's more like "Page is used as an intermediate P2M
page table".

> +        struct {
> +            /*
> +             * Pointer to a page which store metadata for an intermediate page
> +             * table.
> +             */
> +            struct page_info *metadata;
> +        } md;

In the description you say you would re-use the list field.

> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -101,7 +101,16 @@ static int p2m_alloc_root_table(struct p2m_domain *p2m)
>  {
>      struct domain *d = p2m->domain;
>      struct page_info *page;
> -    const unsigned int nr_root_pages = P2M_ROOT_PAGES;
> +    /*
> +     * If the root page table starts at Level <= 2, and since only 1GB, 2MB,
> +     * and 4KB mappings are supported (as enforced by the ASSERT() in
> +     * p2m_set_entry()), it is necessary to allocate P2M_ROOT_PAGES for
> +     * the root page table itself, plus an additional P2M_ROOT_PAGES for
> +     * metadata storage. This is because only two free bits are available in
> +     * the PTE, which are not sufficient to represent all possible P2M types.
> +     */
> +    const unsigned int nr_root_pages = P2M_ROOT_PAGES *
> +                                       ((P2M_ROOT_LEVEL <= 2) ? 2 : 1);
>  
>      /*
>       * Return back nr_root_pages to assure the root table memory is also
> @@ -114,6 +123,23 @@ static int p2m_alloc_root_table(struct p2m_domain *p2m)
>      if ( !page )
>          return -ENOMEM;
>  
> +    if ( P2M_ROOT_LEVEL <= 2 )
> +    {
> +        /*
> +         * In the case where P2M_ROOT_LEVEL <= 2, it is necessary to allocate
> +         * a page of the same size as that used for the root page table.
> +         * Therefore, p2m_allocate_root() can be safely reused.
> +         */
> +        struct page_info *metadata = p2m_allocate_root(d);
> +        if ( !metadata )
> +        {
> +            free_domheap_pages(page, P2M_ROOT_ORDER);
> +            return -ENOMEM;
> +        }
> +
> +        page->v.md.metadata = metadata;

Don't you need to install such a link for every one of the 4 pages?
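I.e. something like (sketch only, relying on both allocations being of
order P2M_ROOT_ORDER and hence their page_info entries being contiguous):

    /* Link each root table page to its corresponding metadata page. */
    for ( unsigned int i = 0; i < P2M_ROOT_PAGES; i++ )
        page[i].v.md.metadata = metadata + i;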

> @@ -198,24 +224,25 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
>      return __map_domain_page(p2m->root + root_table_indx);
>  }
>  
> -static int p2m_set_type(pte_t *pte, p2m_type_t t)
> +static void p2m_set_type(pte_t *pte, const p2m_type_t t, const unsigned int i)
>  {
> -    int rc = 0;
> -
>      if ( t > p2m_ext_storage )
> -        panic("unimplemeted\n");
> +    {
> +        ASSERT(pte);
> +
> +        pte[i].pte = t;

What does i identify here?

> +    }
>      else
>          pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
> -
> -    return rc;
>  }
>  
> -static p2m_type_t p2m_get_type(const pte_t pte)
> +static p2m_type_t p2m_get_type(const pte_t pte, const pte_t *metadata,
> +                               const unsigned int i)
>  {
>      p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK);
>  
>      if ( type == p2m_ext_storage )
> -        panic("unimplemented\n");
> +        type = metadata[i].pte;
>  
>      return type;
>  }

Overall this feels pretty fragile, as the caller has to pass several values
which all need to be in sync with one another. If you ...

> @@ -265,7 +292,10 @@ static void p2m_set_permission(pte_t *e, p2m_type_t t)
>      }
>  }
>  
> -static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
> +static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t,
> +                              struct page_info *metadata_pg,
> +                              const unsigned int indx,
> +                              bool is_table)
>  {
>      pte_t e = (pte_t) { PTE_VALID };
>  
> @@ -285,12 +315,21 @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
>  
>      if ( !is_table )
>      {
> +        pte_t *metadata = __map_domain_page(metadata_pg);

... map the page anyway, no matter whether ...

>          p2m_set_permission(&e, t);
>  
> +        metadata[indx].pte = p2m_invalid;
> +
>          if ( t < p2m_ext_storage )
> -            p2m_set_type(&e, t);
> +            p2m_set_type(&e, t, indx);
>          else
> -            panic("unimplemeted\n");
> +        {
> +            e.pte |= MASK_INSR(p2m_ext_storage, P2M_TYPE_PTE_BITS_MASK);
> +            p2m_set_type(metadata, t, indx);
> +        }

... you'll actually use it, maybe best to map both pages at the same point?

And as said elsewhere, no, I don't think you want to use p2m_set_type() for
two entirely different purposes.
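I'd find two distinct helpers, one per storage location, less confusing
(sketch only; p2m_set_ext_type() is a made-up name):

    /* In-PTE encoding, for types which fit the two available bits. */
    static void p2m_set_type(pte_t *pte, p2m_type_t t)
    {
        ASSERT(t < p2m_ext_storage);
        pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
    }

    /* External (metadata page) storage, for everything else. */
    static void p2m_set_ext_type(pte_t *metadata, unsigned int indx,
                                 p2m_type_t t)
    {
        metadata[indx].pte = t;
    }

The caller could then map the metadata page alongside the table page it
already maps, and invoke one helper or the other depending on t.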

> @@ -323,22 +364,71 @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
>      return pg;
>  }
>  
> +static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
> +
> +/*
> + * Allocate a page table with an additional extra page to store
> + * metadata for each entry of the page table.
> + * Link this metadata page to page table page's list field.
> + */
> +static struct page_info * p2m_alloc_table(struct p2m_domain *p2m)

Nit: Stray blank after * again.

> +{
> +    enum table_type
> +    {
> +        INTERMEDIATE_TABLE=0,

If you really think you need the "= 0", then please with blanks around '='.

> +        /*
> +         * At the moment, metadata is going to store P2M type
> +         * for each PTE of page table.
> +         */
> +        METADATA_TABLE,
> +        TABLE_MAX
> +    };
> +
> +    struct page_info *tables[TABLE_MAX];
> +
> +    for ( unsigned int i = 0; i < TABLE_MAX; i++ )
> +    {
> +        tables[i] = p2m_alloc_page(p2m);
> +
> +        if ( !tables[i] )
> +            goto out;
> +
> +        clear_and_clean_page(tables[i]);
> +    }
> +
> +    tables[INTERMEDIATE_TABLE]->v.md.metadata = tables[METADATA_TABLE];
> +
> +    return tables[INTERMEDIATE_TABLE];
> +
> + out:
> +    for ( unsigned int i = 0; i < TABLE_MAX; i++ )
> +        if ( tables[i] )

You didn't clear all of tables[] first, though. This kind of cleanup is
often better done as

    while ( i-- > 0 )
        ...

You don't even need an if() then, as you know allocations succeeded for all
earlier array slots.
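I.e. the allocation and cleanup could read (sketch, re-using your
identifiers):

    unsigned int i;

    for ( i = 0; i < TABLE_MAX; i++ )
    {
        tables[i] = p2m_alloc_page(p2m);
        if ( !tables[i] )
            goto out;
        clear_and_clean_page(tables[i]);
    }

    tables[INTERMEDIATE_TABLE]->v.md.metadata = tables[METADATA_TABLE];

    return tables[INTERMEDIATE_TABLE];

 out:
    while ( i-- > 0 )
        p2m_free_page(p2m, tables[i]);

    return NULL;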

> +            p2m_free_page(p2m, tables[i]);
> +
> +    return NULL;
> +}

I'm also surprised you allocate the metadata table no matter whether you'll
actually need it. That'll double your average paging pool usage, when in a
typical case only very few entries would actually require this extra
storage.
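One way of deferring the allocation (sketch; p2m_map_metadata() is a
made-up helper, and the error handling at its call sites would need
some care):

    /* Map an intermediate table's metadata page, allocating it lazily. */
    static pte_t *p2m_map_metadata(struct p2m_domain *p2m,
                                   struct page_info *table)
    {
        if ( !table->v.md.metadata )
        {
            struct page_info *pg = p2m_alloc_page(p2m);

            if ( !pg )
                return NULL;

            clear_and_clean_page(pg);
            table->v.md.metadata = pg;
        }

        return __map_domain_page(table->v.md.metadata);
    }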

> @@ -453,10 +543,9 @@ static void p2m_put_2m_superpage(mfn_t mfn, p2m_type_t type)
>  }
>  
>  /* Put any references on the page referenced by pte. */
> -static void p2m_put_page(const pte_t pte, unsigned int level)
> +static void p2m_put_page(const pte_t pte, unsigned int level, p2m_type_t p2mt)
>  {
>      mfn_t mfn = pte_get_mfn(pte);
> -    p2m_type_t p2m_type = p2m_get_type(pte);
>  
>      ASSERT(pte_is_valid(pte));
>  
> @@ -470,10 +559,10 @@ static void p2m_put_page(const pte_t pte, unsigned int level)
>      switch ( level )
>      {
>      case 1:
> -        return p2m_put_2m_superpage(mfn, p2m_type);
> +        return p2m_put_2m_superpage(mfn, p2mt);
>  
>      case 0:
> -        return p2m_put_4k_page(mfn, p2m_type);
> +        return p2m_put_4k_page(mfn, p2mt);
>      }
>  }

Might it be better to introduce this function in this shape right away, in
the earlier patch?

> @@ -690,18 +791,23 @@ static int p2m_set_entry(struct p2m_domain *p2m,
>      {
>          /* We need to split the original page. */
>          pte_t split_pte = *entry;
> +        struct page_info *metadata = virt_to_page(table)->v.md.metadata;

This (or along these lines) is how I would have expected things to be done
elsewhere as well, limiting the amount of arguments you need to pass
around.
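E.g. p2m_get_type() could take the (mapped) table plus an index and
derive the metadata page itself (sketch, assuming the md.metadata link
is always populated for intermediate tables):

    static p2m_type_t p2m_get_type(const pte_t *table, unsigned int indx)
    {
        p2m_type_t type = MASK_EXTR(table[indx].pte, P2M_TYPE_PTE_BITS_MASK);

        if ( type == p2m_ext_storage )
        {
            struct page_info *md = virt_to_page(table)->v.md.metadata;
            pte_t *metadata = __map_domain_page(md);

            type = metadata[indx].pte;
            unmap_domain_page(metadata);
        }

        return type;
    }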

Jan
