[PATCH 5/5] mm: remove range parameter from follow_invalidate_pte()

2022-01-20 Thread Muchun Song
The only user (DAX) of the range parameter of follow_invalidate_pte()
is gone, so it is safe to remove the range parameter and make the
function static to simplify the code.
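
For reference, a caller that only needs the pte now goes through follow_pte()
directly; a minimal usage sketch (the surrounding caller and its error
handling are hypothetical, not part of this patch):

	pte_t *ptep;
	spinlock_t *ptl;
	unsigned long pfn;

	if (follow_pte(vma->vm_mm, address, &ptep, &ptl))
		return -EINVAL;
	pfn = pte_pfn(*ptep);		/* inspect the mapping under the ptl */
	pte_unmap_unlock(ptep, ptl);	/* caller drops the lock and unmaps */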

Signed-off-by: Muchun Song 
---
 include/linux/mm.h |  3 ---
 mm/memory.c        | 23 +++--------------------
 2 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d211a06784d5..7895b17f6847 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1814,9 +1814,6 @@ void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor, unsigned long ceiling);
 int
 copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma);
-int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
- struct mmu_notifier_range *range, pte_t **ptepp,
- pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pte(struct mm_struct *mm, unsigned long address,
   pte_t **ptepp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
diff --git a/mm/memory.c b/mm/memory.c
index 514a81cdd1ae..e8ce066be5f2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4869,9 +4869,8 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
 }
 #endif /* __PAGETABLE_PMD_FOLDED */
 
-int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
- struct mmu_notifier_range *range, pte_t **ptepp,
- pmd_t **pmdpp, spinlock_t **ptlp)
+static int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
+				 pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp)
 {
pgd_t *pgd;
p4d_t *p4d;
@@ -4898,31 +4897,17 @@ int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
if (!pmdpp)
goto out;
 
-   if (range) {
-   mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, 0,
-   NULL, mm, address & PMD_MASK,
-   (address & PMD_MASK) + PMD_SIZE);
-   mmu_notifier_invalidate_range_start(range);
-   }
*ptlp = pmd_lock(mm, pmd);
if (pmd_huge(*pmd)) {
*pmdpp = pmd;
return 0;
}
spin_unlock(*ptlp);
-   if (range)
-   mmu_notifier_invalidate_range_end(range);
}
 
if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
goto out;
 
-   if (range) {
-   mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
-   address & PAGE_MASK,
-   (address & PAGE_MASK) + PAGE_SIZE);
-   mmu_notifier_invalidate_range_start(range);
-   }
ptep = pte_offset_map_lock(mm, pmd, address, ptlp);
if (!pte_present(*ptep))
goto unlock;
@@ -4930,8 +4915,6 @@ int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
return 0;
 unlock:
pte_unmap_unlock(ptep, *ptlp);
-   if (range)
-   mmu_notifier_invalidate_range_end(range);
 out:
return -EINVAL;
 }
@@ -4960,7 +4943,7 @@ int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
 int follow_pte(struct mm_struct *mm, unsigned long address,
   pte_t **ptepp, spinlock_t **ptlp)
 {
-   return follow_invalidate_pte(mm, address, NULL, ptepp, NULL, ptlp);
+   return follow_invalidate_pte(mm, address, ptepp, NULL, ptlp);
 }
 EXPORT_SYMBOL_GPL(follow_pte);
 
-- 
2.11.0




[PATCH 4/5] dax: fix missing writeprotect the pte entry

2022-01-20 Thread Muchun Song
Currently dax_mapping_entry_mkclean() fails to clean and write protect
the pte entry within a DAX PMD entry during an *sync operation. This
can result in data loss in the following sequence:

  1) process A mmap write to DAX PMD, dirtying PMD radix tree entry and
 making the pmd entry dirty and writeable.
  2) process B mmaps the same file with @offset (e.g. 4K) and @length
     (e.g. 4K) and writes to that mapping, dirtying the PMD radix tree
     entry (already done in 1)) and making the pte entry dirty and writeable.
  3) fsync, flushing out PMD data and cleaning the radix tree entry. We
 currently fail to mark the pte entry as clean and write protected
 since the vma of process B is not covered in dax_entry_mkclean().
  4) process B writes to the pte. These don't cause any page faults since
 the pte entry is dirty and writeable. The radix tree entry remains
 clean.
  5) fsync, which fails to flush the dirty PMD data because the radix tree
 entry was clean.
  6) crash - dirty data that should have been fsync'd as part of 5) could
 still have been in the processor cache, and is lost.

Reuse some of the page_mkclean_one() infrastructure so that DAX can
handle this similar case and fix the issue.
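
For reference, the per-pte clean and write-protect step DAX needs is the
same sequence page_mkclean_one() already performs (a sketch of that existing
sequence, not the exact helper introduced here):

	flush_cache_page(vma, address, pfn);
	pte = ptep_clear_flush(vma, address, ptep);
	pte = pte_wrprotect(pte);
	pte = pte_mkclean(pte);
	set_pte_at(vma->vm_mm, address, ptep, pte);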

Fixes: 4b4bb46d00b3 ("dax: clear dirty entry tags on cache flush")
Signed-off-by: Muchun Song 
---
 fs/dax.c | 78 +---
 include/linux/rmap.h |  9 ++
 mm/internal.h| 27 --
 mm/rmap.c| 69 ++
 4 files changed, 85 insertions(+), 98 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 2955ec65eb65..7d4e3e68b861 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define CREATE_TRACE_POINTS
@@ -801,86 +802,21 @@ static void *dax_insert_entry(struct xa_state *xas,
return entry;
 }
 
-static inline
-unsigned long pgoff_address(pgoff_t pgoff, struct vm_area_struct *vma)
-{
-   unsigned long address;
-
-   address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
-   VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
-   return address;
-}
-
 /* Walk all mappings of a given index of a file and writeprotect them */
-static void dax_entry_mkclean(struct address_space *mapping, pgoff_t index,
-   unsigned long pfn)
+static void dax_entry_mkclean(struct address_space *mapping, unsigned long pfn,
+ unsigned long npfn, pgoff_t pgoff_start)
 {
struct vm_area_struct *vma;
-   pte_t pte, *ptep = NULL;
-   pmd_t *pmdp = NULL;
-   spinlock_t *ptl;
+   pgoff_t pgoff_end = pgoff_start + npfn - 1;
 
i_mmap_lock_read(mapping);
-   vma_interval_tree_foreach(vma, &mapping->i_mmap, index, index) {
-   struct mmu_notifier_range range;
-   unsigned long address;
-
+   vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff_start, pgoff_end) {
cond_resched();
 
if (!(vma->vm_flags & VM_SHARED))
continue;
 
-   address = pgoff_address(index, vma);
-
-   /*
-* follow_invalidate_pte() will use the range to call
-* mmu_notifier_invalidate_range_start() on our behalf before
-* taking any lock.
-*/
-   if (follow_invalidate_pte(vma->vm_mm, address, &range, &ptep,
- &pmdp, &ptl))
-   continue;
-
-   /*
-* No need to call mmu_notifier_invalidate_range() as we are
-* downgrading page table protection not changing it to point
-* to a new page.
-*
-* See Documentation/vm/mmu_notifier.rst
-*/
-   if (pmdp) {
-#ifdef CONFIG_FS_DAX_PMD
-   pmd_t pmd;
-
-   if (pfn != pmd_pfn(*pmdp))
-   goto unlock_pmd;
-   if (!pmd_dirty(*pmdp) && !pmd_write(*pmdp))
-   goto unlock_pmd;
-
-   flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
-   pmd = pmdp_invalidate(vma, address, pmdp);
-   pmd = pmd_wrprotect(pmd);
-   pmd = pmd_mkclean(pmd);
-   set_pmd_at(vma->vm_mm, address, pmdp, pmd);
-unlock_pmd:
-#endif
-   spin_unlock(ptl);
-   } else {
-   if (pfn != pte_pfn(*ptep))
-   goto unlock_pte;
-   if (!pte_dirty(*ptep) && !pte_write(*ptep))
-   goto unlock_pte;
-
-   flush_cache_page(vma, address, pfn);
-   pte = ptep_clear_flush(vma, address, ptep);
-   pte = pte_wrprotect(pte);
-   pte = 

[PATCH 3/5] mm: page_vma_mapped: support checking if a pfn is mapped into a vma

2022-01-20 Thread Muchun Song
page_vma_mapped_walk() is supposed to check if a page is mapped into a vma.
However, not all page frames (e.g. PFN_DEV) have an associated struct page.
There would be duplicate code similar to this function if someone wanted to
check whether a pfn (without a struct page) is mapped into a vma. So add
support for checking if a pfn is mapped into a vma. In the next patch, DAX
will use this new feature.
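
A caller that has no struct page is then expected to fill in the pfn side
of the new union instead of ->page; a usage sketch based on the fields added
below (the surrounding variables are hypothetical and this is not the actual
DAX caller):

	struct page_vma_mapped_walk pvmw = {
		.pfn		= pfn,		/* first pfn of the range */
		.nr		= npfn,		/* number of pfns to check */
		.index		= pgoff,	/* file offset of the range */
		.vma		= vma,
		.address	= address,
		.flags		= PVMW_PFN_WALK,
	};

	while (page_vma_mapped_walk(&pvmw)) {
		/* pvmw.pte (or pvmw.pmd) now points at a mapping of the pfn */
	}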

Signed-off-by: Muchun Song 
---
 include/linux/rmap.h | 13 +--
 mm/internal.h| 25 +---
 mm/page_vma_mapped.c | 65 +---
 3 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 221c3c6438a7..7628474732e7 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -204,9 +204,18 @@ int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
 #define PVMW_SYNC  (1 << 0)
 /* Look for migration entries rather than present PTEs */
 #define PVMW_MIGRATION (1 << 1)
+/* Walk the page table by checking the pfn instead of a struct page */
+#define PVMW_PFN_WALK  (1 << 2)
 
 struct page_vma_mapped_walk {
-   struct page *page;
+   union {
+   struct page *page;
+   struct {
+   unsigned long pfn;
+   unsigned int nr;
+   pgoff_t index;
+   };
+   };
struct vm_area_struct *vma;
unsigned long address;
pmd_t *pmd;
@@ -218,7 +227,7 @@ struct page_vma_mapped_walk {
 static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
 {
 /* HugeTLB pte is set to the relevant page table entry without pte_mapped. */
-   if (pvmw->pte && !PageHuge(pvmw->page))
+   if (pvmw->pte && ((pvmw->flags & PVMW_PFN_WALK) || !PageHuge(pvmw->page)))
pte_unmap(pvmw->pte);
if (pvmw->ptl)
spin_unlock(pvmw->ptl);
diff --git a/mm/internal.h b/mm/internal.h
index deb9bda18e59..d6e3e8e1be2d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -478,25 +478,34 @@ vma_address(struct page *page, struct vm_area_struct *vma)
 }
 
 /*
- * Then at what user virtual address will none of the page be found in vma?
- * Assumes that vma_address() already returned a good starting address.
- * If page is a compound head, the entire compound page is considered.
+ * Return the end of user virtual address at the specific offset within
+ * a vma.
  */
 static inline unsigned long
-vma_address_end(struct page *page, struct vm_area_struct *vma)
+vma_pgoff_address_end(pgoff_t pgoff, unsigned long nr_pages,
+ struct vm_area_struct *vma)
 {
-   pgoff_t pgoff;
unsigned long address;
 
-   VM_BUG_ON_PAGE(PageKsm(page), page);/* KSM page->index unusable */
-   pgoff = page_to_pgoff(page) + compound_nr(page);
-   address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+   address = vma->vm_start + ((pgoff + nr_pages - vma->vm_pgoff) << PAGE_SHIFT);
/* Check for address beyond vma (or wrapped through 0?) */
if (address < vma->vm_start || address > vma->vm_end)
address = vma->vm_end;
return address;
 }
 
+/*
+ * Then at what user virtual address will none of the page be found in vma?
+ * Assumes that vma_address() already returned a good starting address.
+ * If page is a compound head, the entire compound page is considered.
+ */
+static inline unsigned long
+vma_address_end(struct page *page, struct vm_area_struct *vma)
+{
+   VM_BUG_ON_PAGE(PageKsm(page), page);/* KSM page->index unusable */
+   return vma_pgoff_address_end(page_to_pgoff(page), compound_nr(page), vma);
+}
+
 static inline struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf,
struct file *fpin)
 {
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index f7b331081791..c8819770d457 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -53,10 +53,16 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
return true;
 }
 
-static inline bool pfn_is_match(struct page *page, unsigned long pfn)
+static inline bool pfn_is_match(struct page_vma_mapped_walk *pvmw, unsigned long pfn)
 {
-   unsigned long page_pfn = page_to_pfn(page);
+   struct page *page;
+   unsigned long page_pfn;
 
+   if (pvmw->flags & PVMW_PFN_WALK)
+   return pfn >= pvmw->pfn && pfn - pvmw->pfn < pvmw->nr;
+
+   page = pvmw->page;
+   page_pfn = page_to_pfn(page);
/* normal page and hugetlbfs page */
if (!PageTransCompound(page) || PageHuge(page))
return page_pfn == pfn;
@@ -116,7 +122,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
pfn = pte_pfn(*pvmw->pte);
}
 
-   return pfn_is_match(pvmw->page, pfn);
+   return pfn_is_match(pvmw, pfn);
 

[PATCH 2/5] dax: fix cache flush on PMD-mapped pages

2022-01-20 Thread Muchun Song
The flush_cache_page() only removes a PAGE_SIZE sized range from the cache.
However, it does not cover the full pages of a THP except the head page.
Replace it with flush_cache_range() to fix this issue.
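
In short, for a PMD-sized DAX mapping the change amounts to (illustrative
sketch; the actual one-line change is in the hunk below):

	/* before: only one PAGE_SIZE page of the PMD-sized mapping is flushed */
	flush_cache_page(vma, address, pfn);
	/* after: the whole PMD-sized range backing the entry is flushed */
	flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);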

Fixes: f729c8c9b24f ("dax: wrprotect pmd_t in dax_mapping_entry_mkclean")
Signed-off-by: Muchun Song 
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 88be1c02a151..2955ec65eb65 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -857,7 +857,7 @@ static void dax_entry_mkclean(struct address_space *mapping, pgoff_t index,
if (!pmd_dirty(*pmdp) && !pmd_write(*pmdp))
goto unlock_pmd;
 
-   flush_cache_page(vma, address, pfn);
+   flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
pmd = pmdp_invalidate(vma, address, pmdp);
pmd = pmd_wrprotect(pmd);
pmd = pmd_mkclean(pmd);
-- 
2.11.0




[PATCH 1/5] mm: rmap: fix cache flush on THP pages

2022-01-20 Thread Muchun Song
The flush_cache_page() only removes a PAGE_SIZE sized range from the cache.
However, it does not cover the full pages of a THP except the head page.
Replace it with flush_cache_range() to fix this issue. At least, no
problems have been found due to this so far, perhaps because few
architectures have virtually indexed caches.
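
Conceptually, covering the whole THP with per-page calls would require a
loop over all of its base pages, which flush_cache_range() replaces with a
single call over the same span (an illustrative sketch only; the loop and
the variable i are hypothetical):

	/* a THP spans HPAGE_PMD_NR base pages starting at the head page */
	for (i = 0; i < HPAGE_PMD_NR; i++)
		flush_cache_page(vma, address + i * PAGE_SIZE,
				 page_to_pfn(page) + i);
	/* ...whereas one ranged flush covers the whole PMD-sized span */
	flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);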

Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
Signed-off-by: Muchun Song 
---
 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index b0fd9dc19eba..65670cb805d6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -974,7 +974,7 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
continue;
 
-   flush_cache_page(vma, address, page_to_pfn(page));
+   flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
entry = pmdp_invalidate(vma, address, pmd);
entry = pmd_wrprotect(entry);
entry = pmd_mkclean(entry);
-- 
2.11.0




Re: [PATCH v9 02/10] dax: Introduce holder for dax_device

2022-01-20 Thread Christoph Hellwig
On Thu, Jan 20, 2022 at 06:22:00PM -0800, Darrick J. Wong wrote:
> Hm, so that means XFS can only support dax+pmem when there aren't
> partitions in use?  Ew.

Yes.  Or any sensible DAX usage going forward for that matter.

> 
> > >   (2) extend the holder mechanism to cover a range
> 
> I don't think I was around for the part where "hch balked at a notifier
> call chain" -- what were the objections there, specifically?  I would
> hope that pmem problems would be infrequent enough that the locking
> contention (or rcu expiration) wouldn't be an issue...?

notifiers are a nightmare of an untyped API leading to tons of boilerplate
code.  Open coding the notification is almost always a better idea.



Re: [PATCH v9 10/10] fsdax: set a CoW flag when associate reflink mappings

2022-01-20 Thread Christoph Hellwig
On Fri, Jan 21, 2022 at 10:33:58AM +0800, Shiyang Ruan wrote:
> > 
> > But different question, how does this not conflict with:
> > 
> > #define PAGE_MAPPING_ANON   0x1
> > 
> > in page-flags.h?
> 
> Now we are treating dax pages, so I think their flags should be different from
> normal pages.  In other words, PAGE_MAPPING_ANON is a flag of the rmap mechanism
> for normal pages; it doesn't work for dax pages.  And now, we have dax rmap
> for dax pages.  So, I think these two kinds of flags are supposed to be used
> in different mechanisms and won't conflict.

It just needs someone to use folio_test_anon in a place where a DAX
folio can be passed.  This probably should not happen, but we need to
clearly document that.

> > Either way I think this flag should move to page-flags.h and be
> > integrated with the PAGE_MAPPING_FLAGS infrastructure.
> 
> And that's why I keep them in this dax.c file.

But that does not integrate it with the infrastructure.  For people
to debug things it needs to be next to PAGE_MAPPING_ANON and have
documentation explaining why they are exclusive.



Re: [PATCH v9 10/10] fsdax: set a CoW flag when associate reflink mappings

2022-01-20 Thread Shiyang Ruan




On 2022/1/20 16:59, Christoph Hellwig wrote:

On Sun, Dec 26, 2021 at 10:34:39PM +0800, Shiyang Ruan wrote:

+#define FS_DAX_MAPPING_COW 1UL
+
+#define MAPPING_SET_COW(m) (m = (struct address_space *)FS_DAX_MAPPING_COW)
+#define MAPPING_TEST_COW(m)(((unsigned long)m & FS_DAX_MAPPING_COW) == \
+   FS_DAX_MAPPING_COW)


These really should be inline functions and probably use lower case
names.


OK.



But different question, how does this not conflict with:

#define PAGE_MAPPING_ANON   0x1

in page-flags.h?


Now we are treating dax pages, so I think their flags should be different
from normal pages.  In other words, PAGE_MAPPING_ANON is a flag of the rmap
mechanism for normal pages; it doesn't work for dax pages.  And now, we
have dax rmap for dax pages.  So, I think these two kinds of flags are
supposed to be used in different mechanisms and won't conflict.




Either way I think this flag should move to page-flags.h and be
integrated with the PAGE_MAPPING_FLAGS infrastructure.


And that's why I keep them in this dax.c file.


--
Thanks,
Ruan.





Re: [PATCH v9 02/10] dax: Introduce holder for dax_device

2022-01-20 Thread Darrick J. Wong
On Fri, Jan 21, 2022 at 09:26:52AM +0800, Shiyang Ruan wrote:
> 
> 
> On 2022/1/20 16:46, Christoph Hellwig wrote:
> > On Wed, Jan 05, 2022 at 04:12:04PM -0800, Dan Williams wrote:
> > > We ended up with explicit callbacks after hch balked at a notifier
> > > call-chain, but I think we're back to that now. The partition mistake
> > > might be unfixable, but at least bdev_dax_pgoff() is dead. Notifier
> > > call chains have their own locking so, Ruan, this still does not need
> > > to touch dax_read_lock().
> > 
> > I think we have a few options here:
> > 
> >   (1) don't allow error notifications on partitions.  An error return from
> >   the holder registration with proper error handling in the file
> >   system would give us that

Hm, so that means XFS can only support dax+pmem when there aren't
partitions in use?  Ew.

> >   (2) extend the holder mechanism to cover a range

I don't think I was around for the part where "hch balked at a notifier
call chain" -- what were the objections there, specifically?  I would
hope that pmem problems would be infrequent enough that the locking
contention (or rcu expiration) wouldn't be an issue...?

> >   (3) bite the bullet and create a new stacked dax_device for each
> >   partition
> > 
> > I think (1) is the best option for now.  If people really do need
> > partitions we'll have to go for (3)
> 
> Yes, I agree.  I'm doing it the first way right now.
> 
> I think that since we can use namespaces to divide a big NVDIMM into multiple
> pmems, partitioning a pmem seems not so meaningful.

I'll try to find out what will happen if pmem suddenly stops supporting
partitions...

--D

> 
> --
> Thanks,
> Ruan.
> 
> 



Re: [PATCH v9 02/10] dax: Introduce holder for dax_device

2022-01-20 Thread Shiyang Ruan




On 2022/1/20 16:46, Christoph Hellwig wrote:

On Wed, Jan 05, 2022 at 04:12:04PM -0800, Dan Williams wrote:

We ended up with explicit callbacks after hch balked at a notifier
call-chain, but I think we're back to that now. The partition mistake
might be unfixable, but at least bdev_dax_pgoff() is dead. Notifier
call chains have their own locking so, Ruan, this still does not need
to touch dax_read_lock().


I think we have a few options here:

  (1) don't allow error notifications on partitions.  An error return from
  the holder registration with proper error handling in the file
  system would give us that
  (2) extend the holder mechanism to cover a range
  (3) bite the bullet and create a new stacked dax_device for each
  partition

I think (1) is the best option for now.  If people really do need
partitions we'll have to go for (3)


Yes, I agree.  I'm doing it the first way right now.

I think that since we can use namespaces to divide a big NVDIMM into
multiple pmems, partitioning a pmem seems not so meaningful.



--
Thanks,
Ruan.





Re: [PATCH v9 10/10] fsdax: set a CoW flag when associate reflink mappings

2022-01-20 Thread Christoph Hellwig
On Sun, Dec 26, 2021 at 10:34:39PM +0800, Shiyang Ruan wrote:
> +#define FS_DAX_MAPPING_COW   1UL
> +
> +#define MAPPING_SET_COW(m)   (m = (struct address_space *)FS_DAX_MAPPING_COW)
> +#define MAPPING_TEST_COW(m)  (((unsigned long)m & FS_DAX_MAPPING_COW) == \
> + FS_DAX_MAPPING_COW)

These really should be inline functions and probably use lower case
names.
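
For illustration, an inline-function form could look roughly like this (a
sketch only, assuming the same single-bit encoding; the names are made up):

static inline bool dax_mapping_is_cow(struct address_space *mapping)
{
	return ((unsigned long)mapping & FS_DAX_MAPPING_COW) ==
		FS_DAX_MAPPING_COW;
}

/* install the page->mapping sentinel that marks a CoW dax mapping */
static inline void dax_mapping_set_cow(struct page *page)
{
	page->mapping = (struct address_space *)FS_DAX_MAPPING_COW;
}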

But different question, how does this not conflict with:

#define PAGE_MAPPING_ANON   0x1

in page-flags.h?

Either way I think this flag should move to page-flags.h and be
integrated with the PAGE_MAPPING_FLAGS infrastructure.



Re: [PATCH v9 08/10] mm: Introduce mf_dax_kill_procs() for fsdax case

2022-01-20 Thread Christoph Hellwig
Please only build the new DAX code if CONFIG_FS_DAX is set.



Re: [PATCH v9 07/10] mm: move pgoff_address() to vma_pgoff_address()

2022-01-20 Thread Christoph Hellwig
On Sun, Dec 26, 2021 at 10:34:36PM +0800, Shiyang Ruan wrote:
> Since it is not a DAX-specific function, move it into mm and rename it
> to be a generic helper.
> 
> Signed-off-by: Shiyang Ruan 

Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v9 05/10] fsdax: fix function description

2022-01-20 Thread Christoph Hellwig
On Sun, Dec 26, 2021 at 10:34:34PM +0800, Shiyang Ruan wrote:
> The function name has been changed, so the description should be updated
> too.
> 
> Signed-off-by: Shiyang Ruan 

Looks good,

Reviewed-by: Christoph Hellwig 

Dan, can you send this to Linus for 5.17 so that we can get it out of
the way?



Re: [PATCH v9 02/10] dax: Introduce holder for dax_device

2022-01-20 Thread Christoph Hellwig
On Wed, Jan 05, 2022 at 04:12:04PM -0800, Dan Williams wrote:
> We ended up with explicit callbacks after hch balked at a notifier
> call-chain, but I think we're back to that now. The partition mistake
> might be unfixable, but at least bdev_dax_pgoff() is dead. Notifier
> call chains have their own locking so, Ruan, this still does not need
> to touch dax_read_lock().

I think we have a few options here:

 (1) don't allow error notifications on partitions.  An error return from
 the holder registration with proper error handling in the file
 system would give us that
 (2) extend the holder mechanism to cover a range
 (3) bite the bullet and create a new stacked dax_device for each
 partition

I think (1) is the best option for now.  If people really do need
partitions we'll have to go for (3)