Re: [PATCH 4/4] mm/hmm: change CPU page table snapshot functions to simplify drivers

2018-03-16 Thread Jerome Glisse
On Thu, Mar 15, 2018 at 10:08:21PM -0700, John Hubbard wrote:
> On 03/15/2018 11:37 AM, jgli...@redhat.com wrote:
> > From: Jérôme Glisse 
> > 
> > This change hmm_vma_fault() and hmm_vma_get_pfns() API to allow HMM
> > to directly write entry that can match any device page table entry
> > format. Device driver now provide an array of flags value and we use
> > enum to index this array for each flag.
> > 
> > This also allow the device driver to ask for write fault on a per page
> > basis making API more flexible to service multiple device page faults
> > in one go.
> > 
> 
> Hi Jerome,
> 
> This is a large patch, so I'm going to review it in two passes. The first 
> pass is just an overview plus the hmm.h changes (now), and tomorrow I will
> review the hmm.c, which is where the real changes are.
> 
> Overview: the hmm.c changes are doing several things, and it is difficult to
> review, because refactoring, plus new behavior, makes diffs less useful here.
> It would probably be good to split the hmm.c changes into a few patches, such
> as:
> 
>   -- HMM_PFN_FLAG_* changes, plus function signature changes (mm_range* 
>being passed to functions), and
> -- New behavior in the page handling loops, and 
>   -- Refactoring into new routines (hmm_vma_handle_pte, and others)
> 
> That way, reviewers can see more easily that things are correct. 

So i resent patchset and i splitted this patch in many (11 patches now).
I included your comments so far in the new version so probably better if
you look at new one.

[...[

> > - * HMM_PFN_READ:  CPU page table has read permission set
> 
> So why is it that we don't need the _READ flag anymore? I looked at the 
> corresponding
> hmm.c but still don't quite get it. Is it that we just expect that _READ is
> always set if there is an entry at all? Or something else?

Explained why in the commit message, !READ when WRITE make no sense so
now VALID imply READ as does WRITE (write by itself is meaningless
without valid).

Cheers,
Jérôme


Re: [PATCH 4/4] mm/hmm: change CPU page table snapshot functions to simplify drivers

2018-03-16 Thread Jerome Glisse
On Thu, Mar 15, 2018 at 10:08:21PM -0700, John Hubbard wrote:
> On 03/15/2018 11:37 AM, jgli...@redhat.com wrote:
> > From: Jérôme Glisse 
> > 
> > This change hmm_vma_fault() and hmm_vma_get_pfns() API to allow HMM
> > to directly write entry that can match any device page table entry
> > format. Device driver now provide an array of flags value and we use
> > enum to index this array for each flag.
> > 
> > This also allow the device driver to ask for write fault on a per page
> > basis making API more flexible to service multiple device page faults
> > in one go.
> > 
> 
> Hi Jerome,
> 
> This is a large patch, so I'm going to review it in two passes. The first 
> pass is just an overview plus the hmm.h changes (now), and tomorrow I will
> review the hmm.c, which is where the real changes are.
> 
> Overview: the hmm.c changes are doing several things, and it is difficult to
> review, because refactoring, plus new behavior, makes diffs less useful here.
> It would probably be good to split the hmm.c changes into a few patches, such
> as:
> 
>   -- HMM_PFN_FLAG_* changes, plus function signature changes (mm_range* 
>being passed to functions), and
> -- New behavior in the page handling loops, and 
>   -- Refactoring into new routines (hmm_vma_handle_pte, and others)
> 
> That way, reviewers can see more easily that things are correct. 

So i resent patchset and i splitted this patch in many (11 patches now).
I included your comments so far in the new version so probably better if
you look at new one.

[...[

> > - * HMM_PFN_READ:  CPU page table has read permission set
> 
> So why is it that we don't need the _READ flag anymore? I looked at the 
> corresponding
> hmm.c but still don't quite get it. Is it that we just expect that _READ is
> always set if there is an entry at all? Or something else?

Explained why in the commit message, !READ when WRITE make no sense so
now VALID imply READ as does WRITE (write by itself is meaningless
without valid).

Cheers,
Jérôme


Re: [PATCH 4/4] mm/hmm: change CPU page table snapshot functions to simplify drivers

2018-03-15 Thread John Hubbard
On 03/15/2018 11:37 AM, jgli...@redhat.com wrote:
> From: Jérôme Glisse 
> 
> This change hmm_vma_fault() and hmm_vma_get_pfns() API to allow HMM
> to directly write entry that can match any device page table entry
> format. Device driver now provide an array of flags value and we use
> enum to index this array for each flag.
> 
> This also allow the device driver to ask for write fault on a per page
> basis making API more flexible to service multiple device page faults
> in one go.
> 

Hi Jerome,

This is a large patch, so I'm going to review it in two passes. The first 
pass is just an overview plus the hmm.h changes (now), and tomorrow I will
review the hmm.c, which is where the real changes are.

Overview: the hmm.c changes are doing several things, and it is difficult to
review, because refactoring, plus new behavior, makes diffs less useful here.
It would probably be good to split the hmm.c changes into a few patches, such
as:

-- HMM_PFN_FLAG_* changes, plus function signature changes (mm_range* 
   being passed to functions), and
-- New behavior in the page handling loops, and 
-- Refactoring into new routines (hmm_vma_handle_pte, and others)

That way, reviewers can see more easily that things are correct. 

> Signed-off-by: Jérôme Glisse 
> Cc: Evgeny Baskakov 
> Cc: Ralph Campbell 
> Cc: Mark Hairgrove 
> Cc: John Hubbard 
> ---
>  include/linux/hmm.h | 130 +++--
>  mm/hmm.c| 331 
> +---
>  2 files changed, 249 insertions(+), 212 deletions(-)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 61b0e1c05ee1..34e8a8c65bbd 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -80,11 +80,10 @@
>  struct hmm;
>  
>  /*
> - * hmm_pfn_t - HMM uses its own pfn type to keep several flags per page
> + * uint64_t - HMM uses its own pfn type to keep several flags per page

This line now is a little odd, because it looks like it's trying to document
uint64_t as an HMM pfn type. :) Maybe:

* HMM pfns are of type uint64_t

...or else just delete it, either way.

>   *
>   * Flags:
>   * HMM_PFN_VALID: pfn is valid

All of these are missing a _FLAG_ piece. The above should be HMM_PFN_FLAG_VALID,
to match the enum below.

> - * HMM_PFN_READ:  CPU page table has read permission set

So why is it that we don't need the _READ flag anymore? I looked at the 
corresponding
hmm.c but still don't quite get it. Is it that we just expect that _READ is
always set if there is an entry at all? Or something else?

>   * HMM_PFN_WRITE: CPU page table has write permission set
>   * HMM_PFN_ERROR: corresponding CPU page table entry points to poisoned 
> memory
>   * HMM_PFN_EMPTY: corresponding CPU page table entry is pte_none()
> @@ -92,64 +91,94 @@ struct hmm;
>   *  result of vm_insert_pfn() or vm_insert_page(). Therefore, it should 
> not
>   *  be mirrored by a device, because the entry will never have 
> HMM_PFN_VALID
>   *  set and the pfn value is undefined.
> - * HMM_PFN_DEVICE_UNADDRESSABLE: unaddressable device memory (ZONE_DEVICE)
> + * HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE)
>   */
> -typedef unsigned long hmm_pfn_t;
> +enum hmm_pfn_flag_e {
> + HMM_PFN_FLAG_VALID = 0,
> + HMM_PFN_FLAG_WRITE,
> + HMM_PFN_FLAG_ERROR,
> + HMM_PFN_FLAG_NONE,
> + HMM_PFN_FLAG_SPECIAL,
> + HMM_PFN_FLAG_DEVICE_PRIVATE,
> + HMM_PFN_FLAG_MAX
> +};
> +
> +/*
> + * struct hmm_range - track invalidation lock on virtual address range
> + *
> + * @vma: the vm area struct for the range
> + * @list: all range lock are on a list
> + * @start: range virtual start address (inclusive)
> + * @end: range virtual end address (exclusive)
> + * @pfns: array of pfns (big enough for the range)
> + * @flags: pfn flags to match device driver page table
> + * @valid: pfns array did not change since it has been fill by an HMM 
> function
> + */
> +struct hmm_range {
> + struct vm_area_struct   *vma;
> + struct list_headlist;
> + unsigned long   start;
> + unsigned long   end;
> + uint64_t*pfns;
> + const uint64_t  *flags;
> + uint8_t pfn_shift;
> + boolvalid;
> +};
> +#define HMM_RANGE_PFN_FLAG(f) (range->flags[HMM_PFN_FLAG_##f])

Please please please no. :)  This breaks grep without actually adding any value.
It's not as if you need to build up a whole set of symmetric macros like
the Page* flags do, after all. So we can keep this very simple, instead.

I've looked through the hmm.c and it's always just something like
HMM_RANGE_PFN_FLAG(WRITE), so there really is no need for this macro at all.

Just use HMM_PFN_FLAG_WRITE and friends directly, and enjoy the resulting 
clarity.


>  
> -#define HMM_PFN_VALID (1 << 0)
> 

Re: [PATCH 4/4] mm/hmm: change CPU page table snapshot functions to simplify drivers

2018-03-15 Thread John Hubbard
On 03/15/2018 11:37 AM, jgli...@redhat.com wrote:
> From: Jérôme Glisse 
> 
> This change hmm_vma_fault() and hmm_vma_get_pfns() API to allow HMM
> to directly write entry that can match any device page table entry
> format. Device driver now provide an array of flags value and we use
> enum to index this array for each flag.
> 
> This also allow the device driver to ask for write fault on a per page
> basis making API more flexible to service multiple device page faults
> in one go.
> 

Hi Jerome,

This is a large patch, so I'm going to review it in two passes. The first 
pass is just an overview plus the hmm.h changes (now), and tomorrow I will
review the hmm.c, which is where the real changes are.

Overview: the hmm.c changes are doing several things, and it is difficult to
review, because refactoring, plus new behavior, makes diffs less useful here.
It would probably be good to split the hmm.c changes into a few patches, such
as:

-- HMM_PFN_FLAG_* changes, plus function signature changes (mm_range* 
   being passed to functions), and
-- New behavior in the page handling loops, and 
-- Refactoring into new routines (hmm_vma_handle_pte, and others)

That way, reviewers can see more easily that things are correct. 

> Signed-off-by: Jérôme Glisse 
> Cc: Evgeny Baskakov 
> Cc: Ralph Campbell 
> Cc: Mark Hairgrove 
> Cc: John Hubbard 
> ---
>  include/linux/hmm.h | 130 +++--
>  mm/hmm.c| 331 
> +---
>  2 files changed, 249 insertions(+), 212 deletions(-)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 61b0e1c05ee1..34e8a8c65bbd 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -80,11 +80,10 @@
>  struct hmm;
>  
>  /*
> - * hmm_pfn_t - HMM uses its own pfn type to keep several flags per page
> + * uint64_t - HMM uses its own pfn type to keep several flags per page

This line now is a little odd, because it looks like it's trying to document
uint64_t as an HMM pfn type. :) Maybe:

* HMM pfns are of type uint64_t

...or else just delete it, either way.

>   *
>   * Flags:
>   * HMM_PFN_VALID: pfn is valid

All of these are missing a _FLAG_ piece. The above should be HMM_PFN_FLAG_VALID,
to match the enum below.

> - * HMM_PFN_READ:  CPU page table has read permission set

So why is it that we don't need the _READ flag anymore? I looked at the 
corresponding
hmm.c but still don't quite get it. Is it that we just expect that _READ is
always set if there is an entry at all? Or something else?

>   * HMM_PFN_WRITE: CPU page table has write permission set
>   * HMM_PFN_ERROR: corresponding CPU page table entry points to poisoned 
> memory
>   * HMM_PFN_EMPTY: corresponding CPU page table entry is pte_none()
> @@ -92,64 +91,94 @@ struct hmm;
>   *  result of vm_insert_pfn() or vm_insert_page(). Therefore, it should 
> not
>   *  be mirrored by a device, because the entry will never have 
> HMM_PFN_VALID
>   *  set and the pfn value is undefined.
> - * HMM_PFN_DEVICE_UNADDRESSABLE: unaddressable device memory (ZONE_DEVICE)
> + * HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE)
>   */
> -typedef unsigned long hmm_pfn_t;
> +enum hmm_pfn_flag_e {
> + HMM_PFN_FLAG_VALID = 0,
> + HMM_PFN_FLAG_WRITE,
> + HMM_PFN_FLAG_ERROR,
> + HMM_PFN_FLAG_NONE,
> + HMM_PFN_FLAG_SPECIAL,
> + HMM_PFN_FLAG_DEVICE_PRIVATE,
> + HMM_PFN_FLAG_MAX
> +};
> +
> +/*
> + * struct hmm_range - track invalidation lock on virtual address range
> + *
> + * @vma: the vm area struct for the range
> + * @list: all range lock are on a list
> + * @start: range virtual start address (inclusive)
> + * @end: range virtual end address (exclusive)
> + * @pfns: array of pfns (big enough for the range)
> + * @flags: pfn flags to match device driver page table
> + * @valid: pfns array did not change since it has been fill by an HMM 
> function
> + */
> +struct hmm_range {
> + struct vm_area_struct   *vma;
> + struct list_headlist;
> + unsigned long   start;
> + unsigned long   end;
> + uint64_t*pfns;
> + const uint64_t  *flags;
> + uint8_t pfn_shift;
> + boolvalid;
> +};
> +#define HMM_RANGE_PFN_FLAG(f) (range->flags[HMM_PFN_FLAG_##f])

Please please please no. :)  This breaks grep without actually adding any value.
It's not as if you need to build up a whole set of symmetric macros like
the Page* flags do, after all. So we can keep this very simple, instead.

I've looked through the hmm.c and it's always just something like
HMM_RANGE_PFN_FLAG(WRITE), so there really is no need for this macro at all.

Just use HMM_PFN_FLAG_WRITE and friends directly, and enjoy the resulting 
clarity.


>  
> -#define HMM_PFN_VALID (1 << 0)
> -#define HMM_PFN_READ (1 << 1)
> -#define HMM_PFN_WRITE (1 << 2)
> -#define HMM_PFN_ERROR (1 << 3)
> -#define HMM_PFN_EMPTY (1 << 4)
>