Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-03-22 Thread Oleksii
Hi Julien,

On Tue, 2023-03-21 at 16:25 +, Julien Grall wrote:
> 
> 
> On 05/03/2023 16:25, Oleksii wrote:
> > Hi Julien,
> 
> Hi,
> 
> Sorry for the late answer. I was away for the past couple of weeks.
> 
> > On Mon, 2023-02-27 at 17:36 +, Julien Grall wrote:
> > > Hi Oleksii,
> > > 
> > > On 27/02/2023 16:52, Oleksii wrote:
> > > > On Sat, 2023-02-25 at 17:53 +, Julien Grall wrote:
> > > > > > +/*
> > > > > > + * WARNING: load_addr() and linker_addr() are to be called
> > > > > > only
> > > > > > when the MMU is
> > > > > > + * disabled and only when executed by the primary CPU. 
> > > > > > They
> > > > > > cannot refer to
> > > > > > + * any global variable or functions.
> > > > > 
> > > > > I find it interesting that you are saying when
> > > > > _setup_initial_pagetables() is
> > > > > called from setup_initial_pagetables(). Would you be able to
> > > > > explain
> > > > > how
> > > > > this is different?
> > > > I am not sure that I understand your question correctly, but
> > > > _setup_initial_pagetables() was introduced to map some addresses
> > > > with a write/read flag. Probably I have to rename it to something
> > > > that is clearer.
> > > 
> > > So the comment suggests that your code cannot refer to global
> > > functions/variables when the MMU is off. So I have multiple
> > > questions:
> > >     * Why only global? IOW, why static would be OK?
> > >     * setup_initial_pagetables() has a call to
> > > _setup_initial_pagetables() (IOW referring to another function).
> > > Why
> > > is
> > > it fine?
> > >     * You have code in the next patch referring to global
> > > variables
> > > (mainly _start and _end). How is this different?
> > > 
> > > > > 
> > > > > > + */
> > > > > > +
> > > > > > +/*
> > > > > > + * Convert an addressed layed out at link time to the
> > > > > > address
> > > > > > where it was loaded
> > > > > 
> > > > > Typo: s/addressed/address/ ?
> > > > Yes, it should be address. and 'layed out' should be changed to
> > > > 'laid
> > > > out'...
> > > > > 
> > > > > > + * by the bootloader.
> > > > > > + */
> > > > > 
> > > > > Looking at the implementation, you seem to consider that any
> > > > > address
> > > > > not
> > > > > in the range [linker_addr_start, linker_addr_end[ will have a
> > > > > 1:1
> > > > > mapping.
> > > > > 
> > > > > I am not sure this is what you want. So I would consider
> > > > > throwing an error if such an address is passed.
> > > > I thought that at this stage, if no relocation was done, it is
> > > > 1:1, except for the case when load_addr_start != linker_addr_start.
> > > 
> > > The problem is that what you try to map one to one may clash with the
> > > linked
> > > region for Xen. So it is not always possible to map the region
> > > 1:1.
> > > 
> > > Therefore, I don't see any use for the else part here.
> > Got it. Thanks.
> > 
> > I am curious, then: what is the correct approach in general to handle
> > this situation?
> There are multiple approaches to handle it and I don't know which one 
> would be best :). Relocation is one...
> 
> > I mean that throwing an error is one option, but what if I would like
> > to handle it w/o throwing an error? Should some relocation be done in
> > that case?
> ... solution. For Arm, I decided to avoid relocation because it requires
> more work in assembly.
> 
> Let me describe what we did and you can decide what you want to do in
> RISC-V.
> 
> For Arm64, as we have plenty of virtual address space, I decided to 
> reshuffle the layout so Xen is running at a very high address (so it is 
> unlikely to clash).
I thought about running Xen at a very high address.
Thanks. I think it is a nice option to do the same for RISC-V64.

> 
> For Arm32, we have a smaller address space (4GB) so instead we are
> going 
> through a temporary area to enable the MMU when the load and runtime 
> region clash. The sequence is:
> 
>    1) Map Xen to a temporary area
>    2) Enable the MMU and jump to the temporary area
>    3) Map Xen to the runtime area
>    4) Jump to the runtime area
>    5) Remove the temporary area
> 
It would be the same for RV32. As we don't support RV32, I will use:
  #error "Add support of MMU for RV32"
> [...]
> 
> > > > > Hmmm... I would actually expect the address to be properly
> > > > > aligned
> > > > > and
> > > > > therefore not require an alignment here.
> > > > > 
> > > > > Otherwise, this raises the question of what happens if you have
> > > > > regions using the same page?
> > > > That map_start &= ZEROETH_MAP_MASK is needed to get the page
> > > > number of the address w/o the page offset.
> > > 
> > > My point is why would the page offset be non-zero?
> > I checked the linker script and the addresses that are passed to
> > setup_initial_mapping(), and they really are always aligned, so there
> > is no sense in additional alignment.
> 
> Ok. I would suggest to add some ASSERT()/BUG_ON() in order to confirm
> this is always the case.
> 
> [...]
> 
> > > > > 
> > 

Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-03-21 Thread Julien Grall




On 05/03/2023 16:25, Oleksii wrote:

Hi Julien,


Hi,

Sorry for the late answer. I was away for the past couple of weeks.


On Mon, 2023-02-27 at 17:36 +, Julien Grall wrote:

Hi Oleksii,

On 27/02/2023 16:52, Oleksii wrote:

On Sat, 2023-02-25 at 17:53 +, Julien Grall wrote:

+/*
+ * WARNING: load_addr() and linker_addr() are to be called
only
when the MMU is
+ * disabled and only when executed by the primary CPU.  They
cannot refer to
+ * any global variable or functions.


I find it interesting that you are saying when
_setup_initial_pagetables() is
called from setup_initial_pagetables(). Would you be able to
explain
how
this is different?

I am not sure that I understand your question correctly, but
_setup_initial_pagetables() was introduced to map some addresses with
a write/read flag. Probably I have to rename it to something that is
clearer.


So the comment suggests that your code cannot refer to global
functions/variables when the MMU is off. So I have multiple
questions:
    * Why only global? IOW, why static would be OK?
    * setup_initial_pagetables() has a call to
_setup_initial_pagetables() (IOW referring to another function). Why
is
it fine?
    * You have code in the next patch referring to global variables
(mainly _start and _end). How is this different?




+ */
+
+/*
+ * Convert an addressed layed out at link time to the address
where it was loaded


Typo: s/addressed/address/ ?

Yes, it should be address. and 'layed out' should be changed to
'laid
out'...



+ * by the bootloader.
+ */


Looking at the implementation, you seem to consider that any
address
not
in the range [linker_addr_start, linker_addr_end[ will have a 1:1
mapping.

I am not sure this is what you want. So I would consider throwing
an error if such an address is passed.

I thought that at this stage, if no relocation was done, it is 1:1,
except for the case when load_addr_start != linker_addr_start.


The problem is that what you try to map one to one may clash with the
linked
region for Xen. So it is not always possible to map the region 1:1.

Therefore, I don't see any use for the else part here.

Got it. Thanks.

I am curious, then: what is the correct approach in general to handle
this situation?
There are multiple approaches to handle it and I don't know which one 
would be best :). Relocation is one...



I mean that throwing an error is one option, but what if I would like to
handle it w/o throwing an error? Should some relocation be done in that
case?
... solution. For Arm, I decided to avoid relocation because it requires
more work in assembly.


Let me describe what we did and you can decide what you want to do in 
RISC-V.


For Arm64, as we have plenty of virtual address space, I decided to 
reshuffle the layout so Xen is running at a very high address (so it is 
unlikely to clash).


For Arm32, we have a smaller address space (4GB) so instead we are going 
through a temporary area to enable the MMU when the load and runtime 
region clash. The sequence is:


  1) Map Xen to a temporary area
  2) Enable the MMU and jump to the temporary area
  3) Map Xen to the runtime area
  4) Jump to the runtime area
  5) Remove the temporary area

[...]


Hmmm... I would actually expect the address to be properly
aligned
and
therefore not require an alignment here.

Otherwise, this raises the question of what happens if you have
regions using the same page?

That map_start &= ZEROETH_MAP_MASK is needed to get the page number of
the address w/o the page offset.


My point is why would the page offset be non-zero?

I checked the linker script and the addresses that are passed to
setup_initial_mapping(), and they really are always aligned, so there is
no sense in additional alignment.


Ok. I would suggest to add some ASSERT()/BUG_ON() in order to confirm 
this is always the case.


[...]




+
+    /*
+ * Create a mapping of the load time address range to... the load time address range.


Same about the line length here.


+ * This mapping is used at boot time only.
+ */
+    _setup_initial_pagetables(second, first, zeroeth,


This can only work if Xen is loaded at its linked address. So you need
a separate set of L0, L1 tables for the identity mapping.

That said, this would not be sufficient because:
     1) Xen may not be loaded at a 2M boundary (you can control with
U-boot, but not with EFI). So this may cross a boundary and therefore
need multiple pages.
     2) The load region may overlap the link address

While I think it would be good to handle those cases from the start, I
would understand why they are not easy to solve. So I think the minimum
is to throw some errors if you are in a case you can't support.

Do you mean to throw some error in load_addr()/linkder_addr()?


In this case, I meant to check if load_addr != linker_addr, then throw
an error.

I am not sure that it is needed now, and it is easier to throw an error,
but does an option exist to handle the situation when load_addr !=
linker_addr other than throwing an error? Relocation?


I believe I answered this above.

Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-03-05 Thread Oleksii
On Mon, 2023-02-27 at 16:19 +0100, Jan Beulich wrote:
> On 27.02.2023 16:12, Jan Beulich wrote:
> > On 24.02.2023 16:06, Oleksii Kurochko wrote:
> > > +static void __attribute__((section(".entry")))
> > > +_setup_initial_pagetables(pte_t *second, pte_t *first, pte_t
> > > *zeroeth,
> > 
> > Why the special section (also again further down)?
> 
> Looking at patch 2 it occurred to me that you probably mean __init
> here.
Yes, you are right.
> 
> Jan




Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-03-05 Thread Oleksii
On Mon, 2023-02-27 at 16:12 +0100, Jan Beulich wrote:
> On 24.02.2023 16:06, Oleksii Kurochko wrote:
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/page.h
> > @@ -0,0 +1,90 @@
> > +#ifndef _ASM_RISCV_PAGE_H
> > +#define _ASM_RISCV_PAGE_H
> > +
> > +#include 
> > +#include 
> > +
> > +#define PAGE_ENTRIES    512
> > +#define VPN_BITS    (9)
> > +#define VPN_MASK    ((unsigned long)((1 << VPN_BITS) -
> > 1))
> > +
> > +#ifdef CONFIG_RISCV_64
> > +/* L3 index Bit[47:39] */
> > +#define THIRD_SHIFT (39)
> > +#define THIRD_MASK  (VPN_MASK << THIRD_SHIFT)
> > +/* L2 index Bit[38:30] */
> > +#define SECOND_SHIFT    (30)
> > +#define SECOND_MASK (VPN_MASK << SECOND_SHIFT)
> > +/* L1 index Bit[29:21] */
> > +#define FIRST_SHIFT (21)
> > +#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
> > +/* L0 index Bit[20:12] */
> > +#define ZEROETH_SHIFT   (12)
> > +#define ZEROETH_MASK    (VPN_MASK << ZEROETH_SHIFT)
> > +
> > +#else // CONFIG_RISCV_32
> > +
> > +/* L1 index Bit[31:22] */
> > +#define FIRST_SHIFT (22)
> > +#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
> > +
> > +/* L0 index Bit[21:12] */
> > +#define ZEROETH_SHIFT   (12)
> > +#define ZEROETH_MASK    (VPN_MASK << ZEROETH_SHIFT)
> > +#endif
> > +
> > +#define THIRD_SIZE  (1 << THIRD_SHIFT)
> > +#define THIRD_MAP_MASK  (~(THIRD_SIZE - 1))
> > +#define SECOND_SIZE (1 << SECOND_SHIFT)
> > +#define SECOND_MAP_MASK (~(SECOND_SIZE - 1))
> > +#define FIRST_SIZE  (1 << FIRST_SHIFT)
> > +#define FIRST_MAP_MASK  (~(FIRST_SIZE - 1))
> > +#define ZEROETH_SIZE    (1 << ZEROETH_SHIFT)
> > +#define ZEROETH_MAP_MASK    (~(ZEROETH_SIZE - 1))
> > +
> > +#define PTE_SHIFT   10
> > +
> > +#define PTE_VALID   BIT(0, UL)
> > +#define PTE_READABLE    BIT(1, UL)
> > +#define PTE_WRITABLE    BIT(2, UL)
> > +#define PTE_EXECUTABLE  BIT(3, UL)
> > +#define PTE_USER    BIT(4, UL)
> > +#define PTE_GLOBAL  BIT(5, UL)
> > +#define PTE_ACCESSED    BIT(6, UL)
> > +#define PTE_DIRTY   BIT(7, UL)
> > +#define PTE_RSW (BIT(8, UL) | BIT(9, UL))
> > +
> > +#define PTE_LEAF_DEFAULT    (PTE_VALID | PTE_READABLE |
> > PTE_WRITABLE | PTE_EXECUTABLE)
> > +#define PTE_TABLE   (PTE_VALID)
> > +
> > +/* Calculate the offsets into the pagetables for a given VA */
> > +#define zeroeth_linear_offset(va)   ((va) >> ZEROETH_SHIFT)
> > +#define first_linear_offset(va) ((va) >> FIRST_SHIFT)
> > +#define second_linear_offset(va)    ((va) >> SECOND_SHIFT)
> > +#define third_linear_offset(va) ((va) >> THIRD_SHIFT)
> > +
> > +#define pagetable_zeroeth_index(va) zeroeth_linear_offset((va) &
> > ZEROETH_MASK)
> > +#define pagetable_first_index(va)   first_linear_offset((va) &
> > FIRST_MASK)
> > +#define pagetable_second_index(va)  second_linear_offset((va) &
> > SECOND_MASK)
> > +#define pagetable_third_index(va)   third_linear_offset((va) &
> > THIRD_MASK)
> > +
> > +/* Page Table entry */
> > +typedef struct {
> > +    uint64_t pte;
> > +} pte_t;
> > +
> > +/* Shift the VPN[x] or PPN[x] fields of a virtual or physical
> > address
> > + * to become the shifted PPN[x] fields of a page table entry */
> > +#define addr_to_ppn(x) (((x) >> PAGE_SHIFT) << PTE_SHIFT)
> > +
> > +static inline pte_t paddr_to_pte(unsigned long paddr)
> > +{
> > +    return (pte_t) { .pte = addr_to_ppn(paddr) };
> > +}
> > +
> > +static inline bool pte_is_valid(pte_t *p)
> 
> Btw - const whenever possible please, especially in such basic
> helpers.
Sure. Thanks.
> 
> > --- /dev/null
> > +++ b/xen/arch/riscv/mm.c
> > @@ -0,0 +1,223 @@
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +/*
> > + * xen_second_pagetable is indexed with the VPN[2] page table
> > entry field
> > + * xen_first_pagetable is accessed from the VPN[1] page table
> > entry field
> > + * xen_zeroeth_pagetable is accessed from the VPN[0] page table
> > entry field
> > + */
> > +pte_t xen_second_pagetable[PAGE_ENTRIES]
> > __attribute__((__aligned__(PAGE_SIZE)));
> 
> static?
It should be static.
Thanks.
> 
> > +static pte_t xen_first_pagetable[PAGE_ENTRIES]
> > +    __attribute__((__aligned__(PAGE_SIZE)));
> > +static pte_t xen_zeroeth_pagetable[PAGE_ENTRIES]
> > +    __attribute__((__aligned__(PAGE_SIZE)));
> 
> Please use __aligned() instead of open-coding it. You also may want
> to
> specify the section here explicitly, as .bss.page_aligned (as we do
> elsewhere).
> 
> > +extern unsigned long _stext;
> > +extern unsigned long _etext;
> > +extern unsigned long __init_begin;
> > +extern unsigned long __init_end;
> > +extern unsigned long _srodata;
> > +extern unsigned long _erodata;
> 
> Please use kernel.h and drop the then-colliding declarations. For what's
> left please use 

Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-03-05 Thread Oleksii
> 
> > 
> > > > 
> > > > > +
> > > > > +    page_addr = map_start;
> > > > > +    while ( page_addr < map_end )
> > > > 
> > > > Looking at the loop, it looks like you are assuming that the
> > > > region
> > > > will
> > > > never cross a boundary of a page-table (either L0, L1, L2). I
> > > > am
> > > > not
> > > > convinced you can make such assumption (see below).
> > > > 
> > > > But if you really want to make such assumption then you should
> > > > add
> > > > some
> > > > guard (either BUILD_BUG_ON(), ASSERT(), proper check) in your
> > > > code to
> > > > avoid any surprise in the future.
> > > I am not sure that I fully understand what the problem is here.
> > > The address is aligned on a (1<<12) boundary and on each iteration
> > > a (1<<12) page is mapped, so all looks fine, or I misunderstood you.
> > 
> > Let's take an example, imagine the region you want to map is 4MB. 
> > AFAICT, you are only passing one L0 page-table. So your code will
> > end up overwriting the previous entries in the zeroeth page-table
> > and then add another link in the L1 page-table.
> Got it. Then it looks like the current approach isn't totally correct...
Or as an option we can add to xen.lds.S something like:

  ASSERT(_end - _start <= MB(L0_ENTRIES*PAGE_SIZE), "Xen too large")

~ Oleksii



Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-03-05 Thread Oleksii
Hi Julien,

On Mon, 2023-02-27 at 17:36 +, Julien Grall wrote:
> Hi Oleksii,
> 
> On 27/02/2023 16:52, Oleksii wrote:
> > On Sat, 2023-02-25 at 17:53 +, Julien Grall wrote:
> > > > +/*
> > > > + * WARNING: load_addr() and linker_addr() are to be called
> > > > only
> > > > when the MMU is
> > > > + * disabled and only when executed by the primary CPU.  They
> > > > cannot refer to
> > > > + * any global variable or functions.
> > > 
> > > I find it interesting that you are saying when
> > > _setup_initial_pagetables() is
> > > called from setup_initial_pagetables(). Would you be able to
> > > explain
> > > how
> > > this is different?
> > I am not sure that I understand your question correctly, but
> > _setup_initial_pagetables() was introduced to map some addresses
> > with a write/read flag. Probably I have to rename it to something
> > that is clearer.
> 
> So the comment suggests that your code cannot refer to global 
> functions/variables when the MMU is off. So I have multiple
> questions:
>    * Why only global? IOW, why static would be OK?
>    * setup_initial_pagetables() has a call to 
> _setup_initial_pagetables() (IOW referring to another function). Why
> is 
> it fine?
>    * You have code in the next patch referring to global variables 
> (mainly _start and _end). How is this different?
> 
> > > 
> > > > + */
> > > > +
> > > > +/*
> > > > + * Convert an addressed layed out at link time to the address
> > > > where it was loaded
> > > 
> > > Typo: s/addressed/address/ ?
> > Yes, it should be address. and 'layed out' should be changed to
> > 'laid
> > out'...
> > > 
> > > > + * by the bootloader.
> > > > + */
> > > 
> > > Looking at the implementation, you seem to consider that any
> > > address
> > > not
> > > in the range [linker_addr_start, linker_addr_end[ will have a 1:1
> > > mapping.
> > > 
> > > I am not sure this is what you want. So I would consider throwing
> > > an error if such an address is passed.
> > I thought that at this stage, if no relocation was done, it is 1:1,
> > except for the case when load_addr_start != linker_addr_start.
> 
> The problem is that what you try to map one to one may clash with the
> linked 
> region for Xen. So it is not always possible to map the region 1:1.
> 
> Therefore, I don't see any use for the else part here.
Got it. Thanks.

I am curious, then: what is the correct approach in general to handle
this situation?
I mean that throwing an error is one option, but what if I would like to
handle it w/o throwing an error? Should some relocation be done in that
case?

> 
> > 
> > 
> > > 
> > > > +#define load_addr(linker_address)                                   \
> > > > +    ({                                                              \
> > > > +    unsigned long __linker_address = (unsigned long)(linker_address); \
> > > > +    if ( linker_addr_start <= __linker_address &&                   \
> > > > +    __linker_address < linker_addr_end )                            \
> > > > +    {                                                               \
> > > > +    __linker_address =                                              \
> > > > +    __linker_address - linker_addr_start + load_addr_start;         \
> > > > +    }                                                               \
> > > > +    __linker_address;                                               \
> > > > +    })
> > > > +
> > > > +/* Convert boot-time Xen address from where it was loaded by the boot loader to the address it was layed out
> > > > + * at link-time.
> > > > + */
> > > 
> > > Coding style: The first line is too long and multi-line comments
> > > look
> > > like:
> > > 
> > > /*
> > >    * Foo
> > >    * Bar
> > >    */
> > > 
> > > > +#define linker_addr(load_address)                                   \
> > > 
> > > Same remark as for load_addr() above.
> > > 
> > > > +    ({                                                              \
> > > > +    unsigned long __load_address = (unsigned long)(load_address);   \
> > > > +    if ( load_addr_start <= __load_address &&                       \
> > > > +    __load_address < load_addr_end )                                \
> > > > +    {                                                               \
> > > > +    __load_address =                                                \
> > > > +    __load_address - load_addr_start + linker_addr_start;           \
> > > > +    }                                                               \
> > > > +    __load_address;                                                 \
> > > > +    })
> > > > +
> > > > +static void __attribute__((section(".entry")))
> > > > +_setup_initial_pagetables(pte_t *second, pte_t *first, pte_t
> > > > *zeroeth,
> > > Can this be named to setup_initial_mapping() so this is clearer
> > > and
> > > avoid the one '_' different with the function below.
> > Sure. It will be better.
> > > 
> > > > + unsigned long map_start,
> > > > + unsigned long map_end,
> > > > + unsigned long pa_start,
> > > > + bool 

Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-02-27 Thread Julien Grall

Hi Oleksii,

On 27/02/2023 16:52, Oleksii wrote:

On Sat, 2023-02-25 at 17:53 +, Julien Grall wrote:

+/*
+ * WARNING: load_addr() and linker_addr() are to be called only
when the MMU is
+ * disabled and only when executed by the primary CPU.  They
cannot refer to
+ * any global variable or functions.


I find it interesting that you are saying when _setup_initial_pagetables() is
called from setup_initial_pagetables(). Would you be able to explain
how
this is different?

I am not sure that I understand your question correctly, but
_setup_initial_pagetables() was introduced to map some addresses with a
write/read flag. Probably I have to rename it to something that is
clearer.


So the comment suggests that your code cannot refer to global 
functions/variables when the MMU is off. So I have multiple questions:

  * Why only global? IOW, why static would be OK?
  * setup_initial_pagetables() has a call to 
_setup_initial_pagetables() (IOW referring to another function). Why is 
it fine?
  * You have code in the next patch referring to global variables 
(mainly _start and _end). How is this different?





+ */
+
+/*
+ * Convert an addressed layed out at link time to the address
where it was loaded


Typo: s/addressed/address/ ?

Yes, it should be address. and 'layed out' should be changed to 'laid
out'...



+ * by the bootloader.
+ */


Looking at the implementation, you seem to consider that any address
not
in the range [linker_addr_start, linker_addr_end[ will have a 1:1
mapping.

I am not sure this is what you want. So I would consider throwing an
error if such an address is passed.

I thought that at this stage, if no relocation was done, it is 1:1,
except for the case when load_addr_start != linker_addr_start.


The problem is that what you try to map one to one may clash with the linked 
region for Xen. So it is not always possible to map the region 1:1.


Therefore, I don't see any use for the else part here.







+#define load_addr(linker_address)                                       \
+    ({                                                                  \
+    unsigned long __linker_address = (unsigned long)(linker_address);   \
+    if ( linker_addr_start <= __linker_address &&                       \
+    __linker_address < linker_addr_end )                                \
+    {                                                                   \
+    __linker_address =                                                  \
+    __linker_address - linker_addr_start + load_addr_start;             \
+    }                                                                   \
+    __linker_address;                                                   \
+    })
+
+/* Convert boot-time Xen address from where it was loaded by the boot loader to the address it was layed out
+ * at link-time.
+ */


Coding style: The first line is too long and multi-line comments look
like:

/*
   * Foo
   * Bar
   */


+#define linker_addr(load_address)                                       \


Same remark as for load_addr() above.


+    ({                                                                  \
+    unsigned long __load_address = (unsigned long)(load_address);       \
+    if ( load_addr_start <= __load_address &&                           \
+    __load_address < load_addr_end )                                    \
+    {                                                                   \
+    __load_address =                                                    \
+    __load_address - load_addr_start + linker_addr_start;               \
+    }                                                                   \
+    __load_address;                                                     \
+    })
+
+static void __attribute__((section(".entry")))
+_setup_initial_pagetables(pte_t *second, pte_t *first, pte_t
*zeroeth,

Can this be named to setup_initial_mapping() so this is clearer and
avoid the one '_' different with the function below.

Sure. It will be better.



+ unsigned long map_start,
+ unsigned long map_end,
+ unsigned long pa_start,
+ bool writable)


What about the executable bit?

It's always executable... But as you mentioned above PTE_LEAF_DEFAULT
should be either RX or RW.
I think it makes sense to add flags instead of writable.



+{
+    unsigned long page_addr;
+    unsigned long index2;
+    unsigned long index1;
+    unsigned long index0;


index* could be defined in the loop below.

It could. But I am curious why that would be better?



+
+    /* align start addresses */
+    map_start &= ZEROETH_MAP_MASK;
+    pa_start &= ZEROETH_MAP_MASK;


Hmmm... I would actually expect the address to be properly aligned
and
therefore not require an alignment here.

Otherwise, this raises the question of what happens if you have regions
using the same page?

That map_start &= ZEROETH_MAP_MASK is needed to get the page number of
the address w/o the page offset.


My point is why would the page offset be non-zero?




+
+    page_addr = map_start;
+    while ( page_addr < map_end )


Looking at the loop, it looks like you are assuming that the region
will
never cross a boundary of a page-table (either L0, L1, L2). I am not
convinced you can make such assumption (see below).

But if you really want to make such assumption then you should add
some
guard (either BUILD_BUG_ON(), ASSERT(), proper check) in your code 

Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-02-27 Thread Oleksii
On Sat, 2023-02-25 at 17:53 +, Julien Grall wrote:
> Hi Oleksii,
> 
> On 24/02/2023 15:06, Oleksii Kurochko wrote:
> > Mostly the code for setup_initial_pages was taken from Bobby's
> > repo except for the following changes:
> > * Use only a minimal part of the code enough to enable MMU
> > * rename {_}setup_initial_pagetables functions
> > * add writable argument for _setup_initial_pagetables to have
> >    an opportunity to make some sections read-only
> > * update setup_initial_pagetables function to make some sections
> >    read-only
> >    * change the order of _setup_initial_pagetables()
> >    in setup_initial_pagetable():
> >    * first it is called for text, init, rodata sections
> >    * after call it for ranges [link_addr_start : link_addr_end] and
> >  [load_addr_start : load_addr_end]
> >    Before, it was done first for the ranges and afterwards for the
> >    sections, but in that case the read-only status would be 'true',
> >    and as the sections' addresses are inside the ranges, the
> >    read-only status wouldn't be updated for them, as it was set up
> >    before.
> > 
> > Origin:
> > https://gitlab.com/xen-on-risc-v/xen/-/tree/riscv-rebase 4af165b468
> > af
> > Signed-off-by: Oleksii Kurochko 
> > ---
> >   xen/arch/riscv/Makefile   |   1 +
> >   xen/arch/riscv/include/asm/mm.h   |   9 ++
> >   xen/arch/riscv/include/asm/page.h |  90 
> >   xen/arch/riscv/mm.c   | 223
> > ++
> >   4 files changed, 323 insertions(+)
> >   create mode 100644 xen/arch/riscv/include/asm/mm.h
> >   create mode 100644 xen/arch/riscv/include/asm/page.h
> >   create mode 100644 xen/arch/riscv/mm.c
> > 
> > diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
> > index 443f6bf15f..956ceb02df 100644
> > --- a/xen/arch/riscv/Makefile
> > +++ b/xen/arch/riscv/Makefile
> > @@ -1,5 +1,6 @@
> >   obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> >   obj-y += entry.o
> > +obj-y += mm.o
> >   obj-$(CONFIG_RISCV_64) += riscv64/
> >   obj-y += sbi.o
> >   obj-y += setup.o
> > diff --git a/xen/arch/riscv/include/asm/mm.h
> > b/xen/arch/riscv/include/asm/mm.h
> > new file mode 100644
> > index 00..fc1866b1d8
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/mm.h
> > @@ -0,0 +1,9 @@
> > +#ifndef _ASM_RISCV_MM_H
> > +#define _ASM_RISCV_MM_H
> > +
> > +void setup_initial_pagetables(unsigned long load_addr_start,
> > +  unsigned long load_addr_end,
> > +  unsigned long linker_addr_start,
> > +  unsigned long linker_addr_end);
> > +
> > +#endif /* _ASM_RISCV_MM_H */
> > diff --git a/xen/arch/riscv/include/asm/page.h
> > b/xen/arch/riscv/include/asm/page.h
> > new file mode 100644
> > index 00..fabbe1305f
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/page.h
> > @@ -0,0 +1,90 @@
> > +#ifndef _ASM_RISCV_PAGE_H
> > +#define _ASM_RISCV_PAGE_H
> > +
> > +#include 
> > +#include 
> > +
> > +#define PAGE_ENTRIES    512
> 
> NIT: AFAIU, the number here is based on ...
> 
> > +#define VPN_BITS    (9)
> 
> ... this. So I would suggest to define PAGE_ENTRIES using VPN_BITS.
Sure. It should be defined using VPN_BITS. Thanks.
> 
> > +#define VPN_MASK    ((unsigned long)((1 << VPN_BITS) -
> > 1))
> NIT: Use 1UL and you can avoid the cast.
Thanks. I'll update that in the next version of patch series.
> 
> > +
> > +#ifdef CONFIG_RISCV_64
> > +/* L3 index Bit[47:39] */
> > +#define THIRD_SHIFT (39)
> > +#define THIRD_MASK  (VPN_MASK << THIRD_SHIFT)
> > +/* L2 index Bit[38:30] */
> > +#define SECOND_SHIFT    (30)
> > +#define SECOND_MASK (VPN_MASK << SECOND_SHIFT)
> > +/* L1 index Bit[29:21] */
> > +#define FIRST_SHIFT (21)
> > +#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
> > +/* L0 index Bit[20:12] */
> > +#define ZEROETH_SHIFT   (12)
> > +#define ZEROETH_MASK    (VPN_MASK << ZEROETH_SHIFT)
> 
> On Arm, we are trying to phase out ZEROETH_* and co because the name
> is 
> too generic. Instead, we now introduce a generic macro that takes a
> level 
> and then computes the mask/shift (see XEN_PT_LEVEL_*).
> 
> You should be able to do in RISC-V and reduce the amount of defines 
> introduced.
Thanks. I'll look at XEN_PT_LEVEL_*. I'll re-read Andrew's comment, but
as far as I understand after a quick reading, we can mostly remove that.
> 
> > +
> > +#else // CONFIG_RISCV_32
> 
> Coding style: comments in Xen are using /* ... */
> 
> > +
> > +/* L1 index Bit[31:22] */
> > +#define FIRST_SHIFT (22)
> > +#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
> > +
> > +/* L0 index Bit[21:12] */
> > +#define ZEROETH_SHIFT   (12)
> > +#define ZEROETH_MASK    (VPN_MASK << ZEROETH_SHIFT)
> > +#endif
> > +
> > +#define THIRD_SIZE  (1 << THIRD_SHIFT)
> > +#define THIRD_MAP_MASK  (~(THIRD_SIZE - 1))
> > +#define SECOND_SIZE   

Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-02-27 Thread Jan Beulich
On 27.02.2023 16:12, Jan Beulich wrote:
> On 24.02.2023 16:06, Oleksii Kurochko wrote:
>> +static void __attribute__((section(".entry")))
>> +_setup_initial_pagetables(pte_t *second, pte_t *first, pte_t *zeroeth,
> 
> Why the special section (also again further down)?

Looking at patch 2 it occurred to me that you probably mean __init here.

Jan



Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-02-27 Thread Jan Beulich
On 24.02.2023 16:06, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/page.h
> @@ -0,0 +1,90 @@
> +#ifndef _ASM_RISCV_PAGE_H
> +#define _ASM_RISCV_PAGE_H
> +
> +#include 
> +#include 
> +
> +#define PAGE_ENTRIES512
> +#define VPN_BITS(9)
> +#define VPN_MASK((unsigned long)((1 << VPN_BITS) - 1))
> +
> +#ifdef CONFIG_RISCV_64
> +/* L3 index Bit[47:39] */
> +#define THIRD_SHIFT (39)
> +#define THIRD_MASK  (VPN_MASK << THIRD_SHIFT)
> +/* L2 index Bit[38:30] */
> +#define SECOND_SHIFT(30)
> +#define SECOND_MASK (VPN_MASK << SECOND_SHIFT)
> +/* L1 index Bit[29:21] */
> +#define FIRST_SHIFT (21)
> +#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
> +/* L0 index Bit[20:12] */
> +#define ZEROETH_SHIFT   (12)
> +#define ZEROETH_MASK(VPN_MASK << ZEROETH_SHIFT)
> +
> +#else // CONFIG_RISCV_32
> +
> +/* L1 index Bit[31:22] */
> +#define FIRST_SHIFT (22)
> +#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
> +
> +/* L0 index Bit[21:12] */
> +#define ZEROETH_SHIFT   (12)
> +#define ZEROETH_MASK(VPN_MASK << ZEROETH_SHIFT)
> +#endif
> +
> +#define THIRD_SIZE  (1 << THIRD_SHIFT)
> +#define THIRD_MAP_MASK  (~(THIRD_SIZE - 1))
> +#define SECOND_SIZE (1 << SECOND_SHIFT)
> +#define SECOND_MAP_MASK (~(SECOND_SIZE - 1))
> +#define FIRST_SIZE  (1 << FIRST_SHIFT)
> +#define FIRST_MAP_MASK  (~(FIRST_SIZE - 1))
> +#define ZEROETH_SIZE(1 << ZEROETH_SHIFT)
> +#define ZEROETH_MAP_MASK(~(ZEROETH_SIZE - 1))
> +
> +#define PTE_SHIFT   10
> +
> +#define PTE_VALID   BIT(0, UL)
> +#define PTE_READABLEBIT(1, UL)
> +#define PTE_WRITABLEBIT(2, UL)
> +#define PTE_EXECUTABLE  BIT(3, UL)
> +#define PTE_USERBIT(4, UL)
> +#define PTE_GLOBAL  BIT(5, UL)
> +#define PTE_ACCESSEDBIT(6, UL)
> +#define PTE_DIRTY   BIT(7, UL)
> +#define PTE_RSW (BIT(8, UL) | BIT(9, UL))
> +
> > +#define PTE_LEAF_DEFAULT(PTE_VALID | PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)
> +#define PTE_TABLE   (PTE_VALID)
> +
> +/* Calculate the offsets into the pagetables for a given VA */
> +#define zeroeth_linear_offset(va)   ((va) >> ZEROETH_SHIFT)
> +#define first_linear_offset(va) ((va) >> FIRST_SHIFT)
> +#define second_linear_offset(va)((va) >> SECOND_SHIFT)
> +#define third_linear_offset(va) ((va) >> THIRD_SHIFT)
> +
> > +#define pagetable_zeroeth_index(va) zeroeth_linear_offset((va) & ZEROETH_MASK)
> +#define pagetable_first_index(va)   first_linear_offset((va) & FIRST_MASK)
> +#define pagetable_second_index(va)  second_linear_offset((va) & SECOND_MASK)
> +#define pagetable_third_index(va)   third_linear_offset((va) & THIRD_MASK)
> +
> +/* Page Table entry */
> +typedef struct {
> +uint64_t pte;
> +} pte_t;
> +
> +/* Shift the VPN[x] or PPN[x] fields of a virtual or physical address
> + * to become the shifted PPN[x] fields of a page table entry */
> +#define addr_to_ppn(x) (((x) >> PAGE_SHIFT) << PTE_SHIFT)
> +
> +static inline pte_t paddr_to_pte(unsigned long paddr)
> +{
> +return (pte_t) { .pte = addr_to_ppn(paddr) };
> +}
> +
> +static inline bool pte_is_valid(pte_t *p)

Btw - const whenever possible please, especially in such basic helpers.

> --- /dev/null
> +++ b/xen/arch/riscv/mm.c
> @@ -0,0 +1,223 @@
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * xen_second_pagetable is indexed with the VPN[2] page table entry field
> + * xen_first_pagetable is accessed from the VPN[1] page table entry field
> + * xen_zeroeth_pagetable is accessed from the VPN[0] page table entry field
> + */
> +pte_t xen_second_pagetable[PAGE_ENTRIES] __attribute__((__aligned__(PAGE_SIZE)));

static?

> +static pte_t xen_first_pagetable[PAGE_ENTRIES]
> +__attribute__((__aligned__(PAGE_SIZE)));
> +static pte_t xen_zeroeth_pagetable[PAGE_ENTRIES]
> +__attribute__((__aligned__(PAGE_SIZE)));

Please use __aligned() instead of open-coding it. You also may want to
specifiy the section here explicitly, as .bss.page_aligned (as we do
elsewhere).

> +extern unsigned long _stext;
> +extern unsigned long _etext;
> +extern unsigned long __init_begin;
> +extern unsigned long __init_end;
> +extern unsigned long _srodata;
> +extern unsigned long _erodata;

Please use kernel.h and drop then colliding declarations. For what's
left please use array types, as suggested elsewhere already.

> +paddr_t phys_offset;
> +
> +#define resolve_early_addr(x)   \
> +({  \
> + unsigned long * __##x;  \
> +if ( load_addr_start <= x && x < load_addr_end )\

Nit: Mismatched 

Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-02-25 Thread Julien Grall

Hi Oleksii,

On 24/02/2023 15:06, Oleksii Kurochko wrote:

Mostly the code for setup_initial_pages was taken from Bobby's
repo except for the following changes:
* Use only a minimal part of the code enough to enable MMU
* rename {_}setup_initial_pagetables functions
* add writable argument for _setup_initial_pagetables to have
   an opportunity to make some sections read-only
* update setup_initial_pagetables function to make some sections
   read-only
* change the order of _setup_inital_pagetables()
   in setup_initial_pagetable():
   * first it is called for text, init, rodata sections
   * after call it for ranges [link_addr_start : link_addr_end] and
 [load_addr_start : load_addr_end]
   Before it was done first for the ranges and after for sections but
   in that case read-only status will be equal to 'true' and
   as sections' addresses  can/are inside the ranges the read-only status
   won't be updated for them as it was set up before.

Origin: https://gitlab.com/xen-on-risc-v/xen/-/tree/riscv-rebase 4af165b468af
Signed-off-by: Oleksii Kurochko 
---
  xen/arch/riscv/Makefile   |   1 +
  xen/arch/riscv/include/asm/mm.h   |   9 ++
  xen/arch/riscv/include/asm/page.h |  90 
  xen/arch/riscv/mm.c   | 223 ++
  4 files changed, 323 insertions(+)
  create mode 100644 xen/arch/riscv/include/asm/mm.h
  create mode 100644 xen/arch/riscv/include/asm/page.h
  create mode 100644 xen/arch/riscv/mm.c

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 443f6bf15f..956ceb02df 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -1,5 +1,6 @@
  obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
  obj-y += entry.o
+obj-y += mm.o
  obj-$(CONFIG_RISCV_64) += riscv64/
  obj-y += sbi.o
  obj-y += setup.o
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
new file mode 100644
index 00..fc1866b1d8
--- /dev/null
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -0,0 +1,9 @@
+#ifndef _ASM_RISCV_MM_H
+#define _ASM_RISCV_MM_H
+
+void setup_initial_pagetables(unsigned long load_addr_start,
+  unsigned long load_addr_end,
+  unsigned long linker_addr_start,
+  unsigned long linker_addr_end);
+
+#endif /* _ASM_RISCV_MM_H */
diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
new file mode 100644
index 00..fabbe1305f
--- /dev/null
+++ b/xen/arch/riscv/include/asm/page.h
@@ -0,0 +1,90 @@
+#ifndef _ASM_RISCV_PAGE_H
+#define _ASM_RISCV_PAGE_H
+
+#include 
+#include 
+
+#define PAGE_ENTRIES512


NIT: AFAIU, the number here is based on ...


+#define VPN_BITS(9)


... this. So I would suggest to define PAGE_ENTRIES using VPN_BITS.


+#define VPN_MASK((unsigned long)((1 << VPN_BITS) - 1))

NIT: Use 1UL and you can avoid the cast.


+
+#ifdef CONFIG_RISCV_64
+/* L3 index Bit[47:39] */
+#define THIRD_SHIFT (39)
+#define THIRD_MASK  (VPN_MASK << THIRD_SHIFT)
+/* L2 index Bit[38:30] */
+#define SECOND_SHIFT(30)
+#define SECOND_MASK (VPN_MASK << SECOND_SHIFT)
+/* L1 index Bit[29:21] */
+#define FIRST_SHIFT (21)
+#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
+/* L0 index Bit[20:12] */
+#define ZEROETH_SHIFT   (12)
+#define ZEROETH_MASK(VPN_MASK << ZEROETH_SHIFT)


On Arm, we are trying to phase out ZEROETH_* and co because the name is 
too generic. Instead, we now introduce a generic macro that take a level 
and then compute the mask/shift (see XEN_PT_LEVEL_*).


You should be able to do in RISC-V and reduce the amount of defines 
introduced.



+
+#else // CONFIG_RISCV_32


Coding style: comments in Xen are using /* ... */


+
+/* L1 index Bit[31:22] */
+#define FIRST_SHIFT (22)
+#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
+
+/* L0 index Bit[21:12] */
+#define ZEROETH_SHIFT   (12)
+#define ZEROETH_MASK(VPN_MASK << ZEROETH_SHIFT)
+#endif
+
+#define THIRD_SIZE  (1 << THIRD_SHIFT)
+#define THIRD_MAP_MASK  (~(THIRD_SIZE - 1))
+#define SECOND_SIZE (1 << SECOND_SHIFT)
+#define SECOND_MAP_MASK (~(SECOND_SIZE - 1))
+#define FIRST_SIZE  (1 << FIRST_SHIFT)
+#define FIRST_MAP_MASK  (~(FIRST_SIZE - 1))
+#define ZEROETH_SIZE(1 << ZEROETH_SHIFT)
+#define ZEROETH_MAP_MASK(~(ZEROETH_SIZE - 1))
+
+#define PTE_SHIFT   10
+
+#define PTE_VALID   BIT(0, UL)
+#define PTE_READABLEBIT(1, UL)
+#define PTE_WRITABLEBIT(2, UL)
+#define PTE_EXECUTABLE  BIT(3, UL)
+#define PTE_USERBIT(4, UL)
+#define PTE_GLOBAL  BIT(5, UL)
+#define PTE_ACCESSEDBIT(6, UL)
+#define PTE_DIRTY   BIT(7, UL)
+#define PTE_RSW (BIT(8, UL) | BIT(9, UL))
+
> +#define PTE_LEAF_DEFAULT(PTE_VALID | PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)

Re: [PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-02-24 Thread Andrew Cooper
On 24/02/2023 3:06 pm, Oleksii Kurochko wrote:
> diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
> new file mode 100644
> index 00..fabbe1305f
> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/page.h
> @@ -0,0 +1,90 @@
> +#ifndef _ASM_RISCV_PAGE_H
> +#define _ASM_RISCV_PAGE_H
> +
> +#include 
> +#include 
> +
> +#define PAGE_ENTRIES512
> +#define VPN_BITS(9)
> +#define VPN_MASK((unsigned long)((1 << VPN_BITS) - 1))
> +
> +#ifdef CONFIG_RISCV_64
> +/* L3 index Bit[47:39] */
> +#define THIRD_SHIFT (39)
> +#define THIRD_MASK  (VPN_MASK << THIRD_SHIFT)
> +/* L2 index Bit[38:30] */
> +#define SECOND_SHIFT(30)
> +#define SECOND_MASK (VPN_MASK << SECOND_SHIFT)
> +/* L1 index Bit[29:21] */
> +#define FIRST_SHIFT (21)
> +#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
> +/* L0 index Bit[20:12] */
> +#define ZEROETH_SHIFT   (12)
> +#define ZEROETH_MASK(VPN_MASK << ZEROETH_SHIFT)

Don't name these with words.  That's an error ultimately inherited from
an architectural mistake on ARM.

These should be named L1 (4k) thru L4 (512T), and don't need separate
separate masks or shifts because it looks like RISC-V designed their
pagetables in a coherent and uniform way.

You'll find the code simplifies substantially if you have
PAGETABLE_ORDER 9 somewhere in here.

The shift is always (PAGE_ORDER + level * PAGETABLE_ORDER), and it's
rare that you need something other than "(addr >> shift) & mask".  About
the only time you need a virtual address masked but unshifted is for
debugging.

~Andrew



[PATCH v1 1/3] xen/riscv: introduce setup_initial_pages

2023-02-24 Thread Oleksii Kurochko
Mostly the code for setup_initial_pages was taken from Bobby's
repo except for the following changes:
* Use only a minimal part of the code enough to enable MMU
* rename {_}setup_initial_pagetables functions
* add writable argument for _setup_initial_pagetables to have
  an opportunity to make some sections read-only
* update setup_initial_pagetables function to make some sections
  read-only
* change the order of _setup_inital_pagetables()
  in setup_initial_pagetable():
  * first it is called for text, init, rodata sections
  * after call it for ranges [link_addr_start : link_addr_end] and
[load_addr_start : load_addr_end]
  Before it was done first for the ranges and after for sections but
  in that case read-only status will be equal to 'true' and
  as sections' addresses  can/are inside the ranges the read-only status
  won't be updated for them as it was set up before.

Origin: https://gitlab.com/xen-on-risc-v/xen/-/tree/riscv-rebase 4af165b468af
Signed-off-by: Oleksii Kurochko 
---
 xen/arch/riscv/Makefile   |   1 +
 xen/arch/riscv/include/asm/mm.h   |   9 ++
 xen/arch/riscv/include/asm/page.h |  90 
 xen/arch/riscv/mm.c   | 223 ++
 4 files changed, 323 insertions(+)
 create mode 100644 xen/arch/riscv/include/asm/mm.h
 create mode 100644 xen/arch/riscv/include/asm/page.h
 create mode 100644 xen/arch/riscv/mm.c

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 443f6bf15f..956ceb02df 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 obj-y += entry.o
+obj-y += mm.o
 obj-$(CONFIG_RISCV_64) += riscv64/
 obj-y += sbi.o
 obj-y += setup.o
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
new file mode 100644
index 00..fc1866b1d8
--- /dev/null
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -0,0 +1,9 @@
+#ifndef _ASM_RISCV_MM_H
+#define _ASM_RISCV_MM_H
+
+void setup_initial_pagetables(unsigned long load_addr_start,
+  unsigned long load_addr_end,
+  unsigned long linker_addr_start,
+  unsigned long linker_addr_end);
+
+#endif /* _ASM_RISCV_MM_H */
diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
new file mode 100644
index 00..fabbe1305f
--- /dev/null
+++ b/xen/arch/riscv/include/asm/page.h
@@ -0,0 +1,90 @@
+#ifndef _ASM_RISCV_PAGE_H
+#define _ASM_RISCV_PAGE_H
+
+#include 
+#include 
+
+#define PAGE_ENTRIES512
+#define VPN_BITS(9)
+#define VPN_MASK((unsigned long)((1 << VPN_BITS) - 1))
+
+#ifdef CONFIG_RISCV_64
+/* L3 index Bit[47:39] */
+#define THIRD_SHIFT (39)
+#define THIRD_MASK  (VPN_MASK << THIRD_SHIFT)
+/* L2 index Bit[38:30] */
+#define SECOND_SHIFT(30)
+#define SECOND_MASK (VPN_MASK << SECOND_SHIFT)
+/* L1 index Bit[29:21] */
+#define FIRST_SHIFT (21)
+#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
+/* L0 index Bit[20:12] */
+#define ZEROETH_SHIFT   (12)
+#define ZEROETH_MASK(VPN_MASK << ZEROETH_SHIFT)
+
+#else // CONFIG_RISCV_32
+
+/* L1 index Bit[31:22] */
+#define FIRST_SHIFT (22)
+#define FIRST_MASK  (VPN_MASK << FIRST_SHIFT)
+
+/* L0 index Bit[21:12] */
+#define ZEROETH_SHIFT   (12)
+#define ZEROETH_MASK(VPN_MASK << ZEROETH_SHIFT)
+#endif
+
+#define THIRD_SIZE  (1 << THIRD_SHIFT)
+#define THIRD_MAP_MASK  (~(THIRD_SIZE - 1))
+#define SECOND_SIZE (1 << SECOND_SHIFT)
+#define SECOND_MAP_MASK (~(SECOND_SIZE - 1))
+#define FIRST_SIZE  (1 << FIRST_SHIFT)
+#define FIRST_MAP_MASK  (~(FIRST_SIZE - 1))
+#define ZEROETH_SIZE(1 << ZEROETH_SHIFT)
+#define ZEROETH_MAP_MASK(~(ZEROETH_SIZE - 1))
+
+#define PTE_SHIFT   10
+
+#define PTE_VALID   BIT(0, UL)
+#define PTE_READABLEBIT(1, UL)
+#define PTE_WRITABLEBIT(2, UL)
+#define PTE_EXECUTABLE  BIT(3, UL)
+#define PTE_USERBIT(4, UL)
+#define PTE_GLOBAL  BIT(5, UL)
+#define PTE_ACCESSEDBIT(6, UL)
+#define PTE_DIRTY   BIT(7, UL)
+#define PTE_RSW (BIT(8, UL) | BIT(9, UL))
+
+#define PTE_LEAF_DEFAULT(PTE_VALID | PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)
+#define PTE_TABLE   (PTE_VALID)
+
+/* Calculate the offsets into the pagetables for a given VA */
+#define zeroeth_linear_offset(va)   ((va) >> ZEROETH_SHIFT)
+#define first_linear_offset(va) ((va) >> FIRST_SHIFT)
+#define second_linear_offset(va)((va) >> SECOND_SHIFT)
+#define third_linear_offset(va) ((va) >> THIRD_SHIFT)
+
+#define pagetable_zeroeth_index(va) zeroeth_linear_offset((va) & ZEROETH_MASK)
+#define pagetable_first_index(va)   first_linear_offset((va) & FIRST_MASK)
+#define