Re: [PATCH v1] libnvdimm, dax: Fix a missing check in nd_dax_probe()

2021-04-09 Thread Dan Williams
On Fri, Apr 9, 2021 at 5:33 PM  wrote:
>
> From: Yingjie Wang 
>
> In nd_dax_probe(), nd_dax_alloc() may fail and return NULL.
> Check for NULL before attempting to use nd_dax to avoid a NULL pointer dereference.
>
> Fixes: c5ed9268643c ("libnvdimm, dax: autodetect support")
> Signed-off-by: Yingjie Wang 
> ---
>  drivers/nvdimm/dax_devs.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
> index 99965077bac4..b1426ac03f01 100644
> --- a/drivers/nvdimm/dax_devs.c
> +++ b/drivers/nvdimm/dax_devs.c
> @@ -106,6 +106,8 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
>
> nvdimm_bus_lock(&ndns->dev);

hmmm...

> nd_dax = nd_dax_alloc(nd_region);
> +   if (!nd_dax)
> +   return -ENOMEM;

Can you spot the bug this introduces? See the hint above.

> nd_pfn = &nd_dax->nd_pfn;
> dax_dev = nd_pfn_devinit(nd_pfn, ndns);
> nvdimm_bus_unlock(&ndns->dev);
> --
> 2.7.4
>
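For reference, the hint is the nvdimm_bus_lock() taken just above: the
new early return would leave the bus lock held. A minimal sketch of a
corrected error path, assuming only the locking visible in the quoted
context (this is not the posted patch):

	nd_dax = nd_dax_alloc(nd_region);
	if (!nd_dax) {
		/* drop the bus lock taken before the allocation */
		nvdimm_bus_unlock(&ndns->dev);
		return -ENOMEM;
	}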


Re: [PATCH] libnvdimm/region: Update nvdimm_has_flush() to handle ND_REGION_ASYNC

2021-04-09 Thread Dan Williams
On Mon, Apr 5, 2021 at 4:31 AM Aneesh Kumar K.V
 wrote:
>
> Vaibhav Jain  writes:
>
> > If a platform doesn't provide explicit flush-hints but does provide an
> > explicit flush callback via the ND_REGION_ASYNC region flag, then
> > nvdimm_has_flush() still returns '0', indicating that writes do not
> > require flushing. This happens on PPC64 with the patch at [1] applied,
> > where 'deep_flush' of a region was denied even though an explicit
> > flush function was provided.
> >
> > Fix this by adding a condition to nvdimm_has_flush() to test for the
> > ND_REGION_ASYNC flag on the region and see if a 'region->flush'
> > callback is assigned.
> >
>
> Maybe this should have
> Fixes: c5d4355d10d4 ("libnvdimm: nd_region flush callback support")

Yes, thanks for that.

>
> Without this we will mark the pmem disk as not having FUA support?

Yes.
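
For context, nvdimm_has_flush()'s return value is what pmem_attach_disk()
feeds to the block layer as the FUA flag; a paraphrased sketch from
drivers/nvdimm/pmem.c of this era:

	fua = nvdimm_has_flush(nd_region);
	if (fua < 0) {
		dev_warn(dev, "unable to guarantee persistence of writes\n");
		fua = 0;
	}
	/* ... */
	blk_queue_write_cache(q, true, fua);

With neither flush hints nor the ND_REGION_ASYNC callback recognized,
fua stays 0 and the pmem gendisk is registered without FUA support.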

>
>
> > References:
> > [1] "powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall"
> > https://lore.kernel.org/linux-nvdimm/161703936121.36.7260632399582101498.stgit@e1fbed493c87
> >
> > Reported-by: Shivaprasad G Bhat 
> > Signed-off-by: Vaibhav Jain 
> > ---
> >  drivers/nvdimm/region_devs.c | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> > index ef23119db574..e05cc9f8a9fd 100644
> > --- a/drivers/nvdimm/region_devs.c
> > +++ b/drivers/nvdimm/region_devs.c
> > @@ -1239,6 +1239,11 @@ int nvdimm_has_flush(struct nd_region *nd_region)
> >   || !IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API))
> >   return -ENXIO;
> >
> > + /* Test if an explicit flush function is defined */
> > + if (test_bit(ND_REGION_ASYNC, &nd_region->flags) && nd_region->flush)
> > + return 1;
>
> > +
> > + /* Test if any flush hints for the region are available */
> >   for (i = 0; i < nd_region->ndr_mappings; i++) {
> >   struct nd_mapping *nd_mapping = &nd_region->mapping[i];
> >   struct nvdimm *nvdimm = nd_mapping->nvdimm;
> > @@ -1249,8 +1254,8 @@ int nvdimm_has_flush(struct nd_region *nd_region)
> >   }
> >
> >   /*
> > -  * The platform defines dimm devices without hints, assume
> > -  * platform persistence mechanism like ADR
> > +  * The platform defines dimm devices without hints or an explicit flush,
> > +  * assume platform persistence mechanism like ADR
> >*/
> >   return 0;
> >  }
> > --
> > 2.30.2


Re: [PATCH] libnvdimm/region: Update nvdimm_has_flush() to handle ND_REGION_ASYNC

2021-04-09 Thread Dan Williams
On Fri, Apr 2, 2021 at 2:26 AM Vaibhav Jain  wrote:
>
> If a platform doesn't provide explicit flush-hints but does provide an
> explicit flush callback via the ND_REGION_ASYNC region flag, then
> nvdimm_has_flush() still returns '0', indicating that writes do not
> require flushing. This happens on PPC64 with the patch at [1] applied,
> where 'deep_flush' of a region was denied even though an explicit
> flush function was provided.
>
> Fix this by adding a condition to nvdimm_has_flush() to test for the
> ND_REGION_ASYNC flag on the region and see if a 'region->flush'
> callback is assigned.

Looks good.

>
> References:
> [1] "powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall"
> https://lore.kernel.org/linux-nvdimm/161703936121.36.7260632399582101498.stgit@e1fbed493c87

Looks like a typo happened in that link; I can fix that up. I'll also
change this to the canonical "Link:" tag for references.

>
> Reported-by: Shivaprasad G Bhat 
> Signed-off-by: Vaibhav Jain 
> ---
>  drivers/nvdimm/region_devs.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index ef23119db574..e05cc9f8a9fd 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -1239,6 +1239,11 @@ int nvdimm_has_flush(struct nd_region *nd_region)
> || !IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API))
> return -ENXIO;
>
> +   /* Test if an explicit flush function is defined */
> +   if (test_bit(ND_REGION_ASYNC, &nd_region->flags) && nd_region->flush)
> +   return 1;
> +
> +   /* Test if any flush hints for the region are available */
> for (i = 0; i < nd_region->ndr_mappings; i++) {
> struct nd_mapping *nd_mapping = &nd_region->mapping[i];
> struct nvdimm *nvdimm = nd_mapping->nvdimm;
> @@ -1249,8 +1254,8 @@ int nvdimm_has_flush(struct nd_region *nd_region)
> }
>
> /*
> -* The platform defines dimm devices without hints, assume
> -* platform persistence mechanism like ADR
> +* The platform defines dimm devices without hints or an explicit flush,
> +* assume platform persistence mechanism like ADR
>  */
> return 0;
>  }
> --
> 2.30.2
>


[PATCH v1] libnvdimm, dax: Fix a missing check in nd_dax_probe()

2021-04-09 Thread wangyingjie55
From: Yingjie Wang 

In nd_dax_probe(), nd_dax_alloc() may fail and return NULL.
Check for NULL before attempting to use nd_dax to avoid a NULL pointer dereference.

Fixes: c5ed9268643c ("libnvdimm, dax: autodetect support")
Signed-off-by: Yingjie Wang 
---
 drivers/nvdimm/dax_devs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
index 99965077bac4..b1426ac03f01 100644
--- a/drivers/nvdimm/dax_devs.c
+++ b/drivers/nvdimm/dax_devs.c
@@ -106,6 +106,8 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
 
	nvdimm_bus_lock(&ndns->dev);
nd_dax = nd_dax_alloc(nd_region);
+   if (!nd_dax)
+   return -ENOMEM;
	nd_pfn = &nd_dax->nd_pfn;
dax_dev = nd_pfn_devinit(nd_pfn, ndns);
	nvdimm_bus_unlock(&ndns->dev);
-- 
2.7.4


[PATCH v7 16/16] bcache: more fix for compiling error when BCACHE_NVM_PAGES disabled

2021-04-09 Thread Coly Li
This patch fixes the compiling error when BCACHE_NVM_PAGES is disabled.
The change could be folded into the previous nvm-pages patches, so that
this patch can be dropped in the next nvm-pages series.

Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 drivers/md/bcache/nvm-pages.c | 4 ++--
 drivers/md/bcache/nvm-pages.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 19597ae7ef3e..b32f162bf728 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -7,6 +7,8 @@
  * Copyright (c) 2021, Jianpeng Ma .
  */
 
+#ifdef CONFIG_BCACHE_NVM_PAGES
+
 #include "bcache.h"
 #include "nvm-pages.h"
 
@@ -23,8 +25,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_BCACHE_NVM_PAGES
-
 struct bch_nvm_set *only_set;
 
 static void release_nvm_namespaces(struct bch_nvm_set *nvm_set)
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 1c4cbad0209f..f9e0cd7ca3dd 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -3,8 +3,10 @@
 #ifndef _BCACHE_NVM_PAGES_H
 #define _BCACHE_NVM_PAGES_H
 
+#ifdef CONFIG_BCACHE_NVM_PAGES
 #include 
 #include 
+#endif /* CONFIG_BCACHE_NVM_PAGES */
 
 /*
  * Bcache NVDIMM in memory data structures
-- 
2.26.2


[PATCH v7 15/16] bcache: fix BCACHE_NVM_PAGES' dependences in Kconfig

2021-04-09 Thread Coly Li
This patch fixes the following dependencies for BCACHE_NVM_PAGES in
Kconfig,
- Add "depends on PHYS_ADDR_T_64BIT", which is mandatory for libnvdimm.
- Add "select LIBNVDIMM" and "select DAX" because the nvm-pages code
  needs the libnvdimm and dax drivers.

This patch can be merged into the previous nvm-pages patches, and
dropped in the next version of the series.

Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 drivers/md/bcache/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
index fdec9905ef40..0996e366ad0b 100644
--- a/drivers/md/bcache/Kconfig
+++ b/drivers/md/bcache/Kconfig
@@ -39,5 +39,8 @@ config BCACHE_ASYNC_REGISTRATION
 config BCACHE_NVM_PAGES
bool "NVDIMM support for bcache (EXPERIMENTAL)"
depends on BCACHE
+   depends on PHYS_ADDR_T_64BIT
+   select LIBNVDIMM
+   select DAX
help
nvm pages allocator for bcache.
-- 
2.26.2


[PATCH v7 14/16] bcache: use div_u64() in init_owner_info()

2021-04-09 Thread Coly Li
The kernel test robot reports that the built-in u64/u32 division in
init_owner_info() doesn't link for the m68k arch; the explicit
div_u64() should be used.

This patch explicitly uses div_u64() to do the u64/u32 division on the
32-bit m68k arch, as the sketch below illustrates.
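
A minimal sketch of the pattern (not part of the patch): on 32-bit
targets, an open-coded u64/u32 division makes the compiler emit a
libgcc helper (__udivdi3) that the kernel does not provide, so the
link fails. div_u64() from <linux/math64.h> avoids that:

	#include <linux/math64.h>

	/* instead of: pages = ns->pages_offset / ns->page_size; */
	u64 pages = div_u64(ns->pages_offset, ns->page_size);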

Reported-by: kernel test robot 
Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 drivers/md/bcache/nvm-pages.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index c3ab396a45fa..19597ae7ef3e 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -405,7 +405,7 @@ static int init_owner_info(struct bch_nvm_namespace *ns)
only_set->owner_list_used = owner_list_head->used;
 
/*remove used space*/
-   remove_owner_space(ns, 0, ns->pages_offset/ns->page_size);
+   remove_owner_space(ns, 0, div_u64(ns->pages_offset, ns->page_size));
 
sys_recs = ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET;
// suppose no hole in array
-- 
2.26.2


[PATCH v7 13/16] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device

2021-04-09 Thread Coly Li
This patch adds a sysfs interface register_nvdimm_meta to register an
NVDIMM meta device. The sysfs interface file only shows up when
CONFIG_BCACHE_NVM_PAGES=y. Then an NVDIMM namespace formatted by
bcache-tools can be registered into bcache by, e.g.,
  echo /dev/pmem0 > /sys/fs/bcache/register_nvdimm_meta

Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 drivers/md/bcache/super.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 9640bfb85571..d95a9a3c2041 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2436,10 +2436,18 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 static ssize_t bch_pending_bdevs_cleanup(struct kobject *k,
 struct kobj_attribute *attr,
 const char *buffer, size_t size);
+#ifdef CONFIG_BCACHE_NVM_PAGES
+static ssize_t register_nvdimm_meta(struct kobject *k,
+   struct kobj_attribute *attr,
+   const char *buffer, size_t size);
+#endif
 
 kobj_attribute_write(register, register_bcache);
 kobj_attribute_write(register_quiet,   register_bcache);
 kobj_attribute_write(pendings_cleanup, bch_pending_bdevs_cleanup);
+#ifdef CONFIG_BCACHE_NVM_PAGES
+kobj_attribute_write(register_nvdimm_meta, register_nvdimm_meta);
+#endif
 
 static bool bch_is_open_backing(dev_t dev)
 {
@@ -2553,6 +2561,24 @@ static void register_device_async(struct async_reg_args *args)
	queue_delayed_work(system_wq, &args->reg_work, 10);
 }
 
+#ifdef CONFIG_BCACHE_NVM_PAGES
+static ssize_t register_nvdimm_meta(struct kobject *k, struct kobj_attribute *attr,
+   const char *buffer, size_t size)
+{
+   ssize_t ret = size;
+
+   struct bch_nvm_namespace *ns = bch_register_namespace(buffer);
+
+   if (IS_ERR(ns)) {
+   pr_err("register nvdimm namespace %s for meta device failed.\n",
+   buffer);
+   ret = -EINVAL;
+   }
+
+   return ret;
+}
+#endif
+
 static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
   const char *buffer, size_t size)
 {
@@ -2888,6 +2914,9 @@ static int __init bcache_init(void)
static const struct attribute *files[] = {
	&sysfs_register.attr,
	&sysfs_register_quiet.attr,
+#ifdef CONFIG_BCACHE_NVM_PAGES
+	&sysfs_register_nvdimm_meta.attr,
+#endif
	&sysfs_pendings_cleanup.attr,
NULL
};
-- 
2.26.2


[PATCH v7 12/16] bcache: read jset from NVDIMM pages for journal replay

2021-04-09 Thread Coly Li
This patch implements two methods to read a jset from media for journal
replay,
- __jnl_rd_bkt() for block device
  This is the legacy method to read a jset via the block device
  interface.
- __jnl_rd_nvm_bkt() for NVDIMM
  This is the method to read a jset via the NVDIMM memory interface,
  a.k.a. memcpy() from NVDIMM pages to DRAM pages.

If BCH_FEATURE_INCOMPAT_NVDIMM_META is set in incompat feature set,
during running cache set, journal_read_bucket() will read the journal
content from NVDIMM by __jnl_rd_nvm_bkt(). The linear addresses of
NVDIMM pages to read jset are stored in sb.d[SB_JOURNAL_BUCKETS], which
were initialized and maintained in previous runs of the cache set.

One thing to notice is that when bch_journal_read() is called, the
linear addresses of the NVDIMM pages are not loaded and initialized
yet; it is necessary to call __bch_journal_nvdimm_init() before reading
the jset from NVDIMM pages.

Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 drivers/md/bcache/journal.c | 93 +++--
 1 file changed, 69 insertions(+), 24 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 9a542e6c2152..4f09ad5b994b 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -34,60 +34,96 @@ static void journal_read_endio(struct bio *bio)
closure_put(cl);
 }
 
+static struct jset *__jnl_rd_bkt(struct cache *ca, unsigned int bkt_idx,
+   unsigned int len, unsigned int offset,
+   struct closure *cl)
+{
+   sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bkt_idx]);
+   struct bio *bio = &ca->journal.bio;
+   struct jset *data = ca->set->journal.w[0].data;
+
+   bio_reset(bio);
+   bio->bi_iter.bi_sector  = bucket + offset;
+   bio_set_dev(bio, ca->bdev);
+   bio->bi_iter.bi_size= len << 9;
+   bio->bi_end_io  = journal_read_endio;
+   bio->bi_private = cl;
+   bio_set_op_attrs(bio, REQ_OP_READ, 0);
+   bch_bio_map(bio, data);
+
+   closure_bio_submit(ca->set, bio, cl);
+   closure_sync(cl);
+
+   /* Indeed journal.w[0].data */
+   return data;
+}
+
+#ifdef CONFIG_BCACHE_NVM_PAGES
+
+static struct jset *__jnl_rd_nvm_bkt(struct cache *ca, unsigned int bkt_idx,
+unsigned int len, unsigned int offset)
+{
+   void *jset_addr = (void *)ca->sb.d[bkt_idx] + (offset << 9);
+   struct jset *data = ca->set->journal.w[0].data;
+
+   memcpy(data, jset_addr, len << 9);
+
+   /* Indeed journal.w[0].data */
+   return data;
+}
+
+#else /* CONFIG_BCACHE_NVM_PAGES */
+
+static struct jset *__jnl_rd_nvm_bkt(struct cache *ca, unsigned int bkt_idx,
+unsigned int len, unsigned int offset)
+{
+   return NULL;
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
 static int journal_read_bucket(struct cache *ca, struct list_head *list,
-  unsigned int bucket_index)
+  unsigned int bucket_idx)
 {
	struct journal_device *ja = &ca->journal;
-   struct bio *bio = &ja->bio;
 
struct journal_replay *i;
-   struct jset *j, *data = ca->set->journal.w[0].data;
+   struct jset *j;
struct closure cl;
unsigned int len, left, offset = 0;
int ret = 0;
-   sector_t bucket = bucket_to_sector(ca->set, ca->sb.d[bucket_index]);
 
	closure_init_stack(&cl);
 
-   pr_debug("reading %u\n", bucket_index);
+   pr_debug("reading %u\n", bucket_idx);
 
while (offset < ca->sb.bucket_size) {
 reread:left = ca->sb.bucket_size - offset;
len = min_t(unsigned int, left, PAGE_SECTORS << JSET_BITS);
 
-   bio_reset(bio);
-   bio->bi_iter.bi_sector  = bucket + offset;
-   bio_set_dev(bio, ca->bdev);
-   bio->bi_iter.bi_size= len << 9;
-
-   bio->bi_end_io  = journal_read_endio;
-   bio->bi_private = &cl;
-   bio_set_op_attrs(bio, REQ_OP_READ, 0);
-   bch_bio_map(bio, data);
-
-   closure_bio_submit(ca->set, bio, &cl);
-   closure_sync(&cl);
+   if (!bch_has_feature_nvdimm_meta(&ca->sb))
+   j = __jnl_rd_bkt(ca, bucket_idx, len, offset, &cl);
+   else
+   j = __jnl_rd_nvm_bkt(ca, bucket_idx, len, offset);
 
/* This function could be simpler now since we no longer write
 * journal entries that overlap bucket boundaries; this means
 * the start of a bucket will always have a valid journal entry
 * if it has any journal entries at all.
 */
-
-   j = data;
while (len) {
struct list_head *where;
size_t blocks, bytes = set_bytes(j);
 
	if (j->magic != jset_magic(&ca->sb)) {
-   

[PATCH v7 11/16] bcache: support storing bcache journal into NVDIMM meta device

2021-04-09 Thread Coly Li
This patch implements two methods to store the bcache journal,
1) __journal_write_unlocked() for block interface devices
   The legacy method to compose a bio and issue the jset bio to the
   cache device (e.g. SSD). c->journal.key.ptr[0] indicates the LBA on
   the cache device to store the journal jset.
2) __journal_nvdimm_write_unlocked() for memory interface NVDIMM
   Use the memory interface to access NVDIMM pages and store the jset
   by memcpy_flushcache(). c->journal.key.ptr[0] indicates the linear
   address of the NVDIMM pages to store the journal jset.

For legacy configurations without an NVDIMM meta device, journal I/O is
handled by __journal_write_unlocked() with the existing code logic. If
the NVDIMM meta device is used (by bcache-tools), the journal I/O will
be handled by __journal_nvdimm_write_unlocked() and go into the NVDIMM
pages.

And when the NVDIMM meta device is used, sb.d[] stores linear addresses
of NVDIMM pages (no more bucket indexes); in journal_reclaim() the
journaling location in c->journal.key.ptr[0] should also be updated
with a linear address from the NVDIMM pages (no more LBA combined from
sector offset and bucket index).

Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 drivers/md/bcache/journal.c   | 119 --
 drivers/md/bcache/nvm-pages.h |   1 +
 drivers/md/bcache/super.c |  25 ++-
 3 files changed, 107 insertions(+), 38 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index acbfd4ec88af..9a542e6c2152 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -596,6 +596,8 @@ static void do_journal_discard(struct cache *ca)
return;
}
 
+   BUG_ON(bch_has_feature_nvdimm_meta(&ca->sb));
+
	switch (atomic_read(&ja->discard_in_flight)) {
case DISCARD_IN_FLIGHT:
return;
@@ -661,9 +663,13 @@ static void journal_reclaim(struct cache_set *c)
goto out;
 
ja->cur_idx = next;
-   k->ptr[0] = MAKE_PTR(0,
-bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
-ca->sb.nr_this_dev);
+   if (!bch_has_feature_nvdimm_meta(&ca->sb))
+   k->ptr[0] = MAKE_PTR(0,
+   bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
+   ca->sb.nr_this_dev);
+   else
+   k->ptr[0] = ca->sb.d[ja->cur_idx];
+
	atomic_long_inc(&ca->reclaimed_journal_buckets);
 
bkey_init(k);
@@ -729,46 +735,21 @@ static void journal_write_unlock(struct closure *cl)
	spin_unlock(&c->journal.lock);
 }
 
-static void journal_write_unlocked(struct closure *cl)
+
+static void __journal_write_unlocked(struct cache_set *c)
__releases(c->journal.lock)
 {
-   struct cache_set *c = container_of(cl, struct cache_set, journal.io);
-   struct cache *ca = c->cache;
-   struct journal_write *w = c->journal.cur;
	struct bkey *k = &c->journal.key;
-   unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) *
-   ca->sb.block_size;
-
+   struct journal_write *w = c->journal.cur;
+   struct closure *cl = &c->journal.io;
+   struct cache *ca = c->cache;
struct bio *bio;
struct bio_list list;
+   unsigned int i, sectors = set_blocks(w->data, block_bytes(ca)) *
+   ca->sb.block_size;
 
bio_list_init();
 
-   if (!w->need_write) {
-   closure_return_with_destructor(cl, journal_write_unlock);
-   return;
-   } else if (journal_full(&c->journal)) {
-   journal_reclaim(c);
-   spin_unlock(>journal.lock);
-
-   btree_flush_write(c);
-   continue_at(cl, journal_write, bch_journal_wq);
-   return;
-   }
-
-   c->journal.blocks_free -= set_blocks(w->data, block_bytes(ca));
-
-   w->data->btree_level = c->root->level;
-
-   bkey_copy(&w->data->btree_root, &c->root->key);
-   bkey_copy(&w->data->uuid_bucket, &c->uuid_bucket);
-
-   w->data->prio_bucket[ca->sb.nr_this_dev] = ca->prio_buckets[0];
-   w->data->magic  = jset_magic(&ca->sb);
-   w->data->version= BCACHE_JSET_VERSION;
-   w->data->last_seq   = last_seq(&c->journal);
-   w->data->csum   = csum_set(w->data);
-
for (i = 0; i < KEY_PTRS(k); i++) {
ca = PTR_CACHE(c, k, i);
	bio = &ca->journal.bio;
@@ -793,7 +774,6 @@ static void journal_write_unlocked(struct closure *cl)
 
ca->journal.seq[ca->journal.cur_idx] = w->data->seq;
}
-
/* If KEY_PTRS(k) == 0, this jset gets lost in air */
BUG_ON(i == 0);
 
@@ -805,6 +785,73 @@ static void journal_write_unlocked(struct closure *cl)
 
while ((bio = bio_list_pop()))
closure_bio_submit(c, bio, cl);
+}
+
+#ifdef CONFIG_BCACHE_NVM_PAGES
+
+static void __journal_nvdimm_write_unlocked(struct cache_set *c)
+   __releases(c->journal.lock)
+{
+   struct 
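
The message is truncated in the archive at this point. For orientation
only, a hypothetical sketch of the NVDIMM path the commit message
describes; the names mirror the visible context, but this is not the
actual patch body:

	static void __journal_nvdimm_write_unlocked(struct cache_set *c)
		__releases(c->journal.lock)
	{
		struct journal_write *w = c->journal.cur;
		struct cache *ca = c->cache;
		unsigned int sectors = set_blocks(w->data, block_bytes(ca)) *
			ca->sb.block_size;

		/* c->journal.key.ptr[0] holds a linear NVDIMM address here */
		memcpy_flushcache((void *)c->journal.key.ptr[0],
				  w->data, sectors << 9);

		ca->journal.seq[ca->journal.cur_idx] = w->data->seq;
		spin_unlock(&c->journal.lock);
	}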

[PATCH v7 10/16] bcache: initialize bcache journal for NVDIMM meta device

2021-04-09 Thread Coly Li
The nvm-pages allocator may store and index the NVDIMM pages allocated
for the bcache journal. This patch adds the initialization to store
bcache journal space on NVDIMM pages if the
BCH_FEATURE_INCOMPAT_NVDIMM_META bit is set by bcache-tools.

If BCH_FEATURE_INCOMPAT_NVDIMM_META is set, get_nvdimm_journal_space()
will return the linear address of the NVDIMM pages for the bcache
journal,
- If there is previously allocated space, find it from the nvm-pages
  owner list and return it to bch_journal_init().
- If there is no previously allocated space, request a new NVDIMM range
  from the nvm-pages allocator, and return it to bch_journal_init().

And in bch_journal_init(), the keys in sb.d[] store the corresponding
linear addresses from NVDIMM in sb.d[i].ptr[0], where 'i' is the bucket
index used to iterate all journal buckets.

Later, when the bcache journaling code stores a jset, the target NVDIMM
linear address stored (and updated) in sb.d[i].ptr[0] can be used
directly for the memory copy from DRAM pages into NVDIMM pages.

Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 drivers/md/bcache/journal.c | 105 
 drivers/md/bcache/journal.h |   2 +-
 drivers/md/bcache/super.c   |  16 +++---
 3 files changed, 115 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index c6613e817333..acbfd4ec88af 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -9,6 +9,8 @@
 #include "btree.h"
 #include "debug.h"
 #include "extents.h"
+#include "nvm-pages.h"
+#include "features.h"
 
 #include 
 
@@ -982,3 +984,106 @@ int bch_journal_alloc(struct cache_set *c)
 
return 0;
 }
+
+#ifdef CONFIG_BCACHE_NVM_PAGES
+
+static void *find_journal_nvm_base(struct bch_nvm_pages_owner_head *owner_list,
+  struct cache *ca)
+{
+   unsigned long addr = 0;
+   struct bch_nvm_pgalloc_recs *recs_list = owner_list->recs[0];
+
+   while (recs_list) {
+   struct bch_pgalloc_rec *rec;
+   unsigned long jnl_pgoff;
+   int i;
+
+   jnl_pgoff = ((unsigned long)ca->sb.d[0]) >> PAGE_SHIFT;
+   rec = recs_list->recs;
+   for (i = 0; i < recs_list->used; i++) {
+   if (rec->pgoff == jnl_pgoff)
+   break;
+   rec++;
+   }
+   if (i < recs_list->used) {
+   addr = rec->pgoff << PAGE_SHIFT;
+   break;
+   }
+   recs_list = recs_list->next;
+   }
+   return (void *)addr;
+}
+
+static void *get_nvdimm_journal_space(struct cache *ca)
+{
+   struct bch_nvm_pages_owner_head *owner_list = NULL;
+   void *ret = NULL;
+   int order;
+
+   owner_list = bch_get_allocated_pages(ca->sb.set_uuid);
+   if (owner_list) {
+   ret = find_journal_nvm_base(owner_list, ca);
+   if (ret)
+   goto found;
+   }
+
+   order = ilog2(ca->sb.bucket_size *
+ ca->sb.njournal_buckets / PAGE_SECTORS);
+   ret = bch_nvm_alloc_pages(order, ca->sb.set_uuid);
+   if (ret)
+   memset(ret, 0, (1 << order) * PAGE_SIZE);
+
+found:
+   return ret;
+}
+
+static int __bch_journal_nvdimm_init(struct cache *ca)
+{
+   int i, ret = 0;
+   void *journal_nvm_base = NULL;
+
+   journal_nvm_base = get_nvdimm_journal_space(ca);
+   if (!journal_nvm_base) {
+   pr_err("Failed to get journal space from nvdimm\n");
+   ret = -1;
+   goto out;
+   }
+
+   /* Initialized and reloaded from the on-disk super block already */
+   if (ca->sb.d[0] != 0)
+   goto out;
+
+   for (i = 0; i < ca->sb.keys; i++)
+   ca->sb.d[i] =
+   (u64)(journal_nvm_base + (ca->sb.bucket_size * i));
+
+out:
+   return ret;
+}
+
+#else /* CONFIG_BCACHE_NVM_PAGES */
+
+static int __bch_journal_nvdimm_init(struct cache *ca)
+{
+   return -1;
+}
+
+#endif /* CONFIG_BCACHE_NVM_PAGES */
+
+int bch_journal_init(struct cache_set *c)
+{
+   int i, ret = 0;
+   struct cache *ca = c->cache;
+
+   ca->sb.keys = clamp_t(int, ca->sb.nbuckets >> 7,
+   2, SB_JOURNAL_BUCKETS);
+
+   if (!bch_has_feature_nvdimm_meta(&ca->sb)) {
+   for (i = 0; i < ca->sb.keys; i++)
+   ca->sb.d[i] = ca->sb.first_bucket + i;
+   } else {
+   ret = __bch_journal_nvdimm_init(ca);
+   }
+
+   return ret;
+}
diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h
index f2ea34d5f431..e3a7fa5a8fda 100644
--- a/drivers/md/bcache/journal.h
+++ b/drivers/md/bcache/journal.h
@@ -179,7 +179,7 @@ void bch_journal_mark(struct cache_set *c, struct list_head *list);
 void bch_journal_meta(struct cache_set *c, struct closure *cl);
 int bch_journal_read(struct cache_set *c, 

[PATCH v7 09/16] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set

2021-04-09 Thread Coly Li
This patch adds BCH_FEATURE_INCOMPAT_NVDIMM_META (value 0x0004) into
the incompat feature set. When this bit is set by bcache-tools, it
indicates that bcache meta data should be stored on a specific NVDIMM
meta device.

The bcache meta data mainly includes the journal and btree nodes; when
this bit is set in the incompat feature set, bcache will ask the
nvm-pages allocator for NVDIMM space to store the meta data (see the
usage sketch below).
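
The new feature bit is consumed through the accessor generated by
BCH_FEATURE_INCOMPAT_FUNCS() in the diff below. For example, patch 12
of this series switches the journal read path on it:

	if (!bch_has_feature_nvdimm_meta(&ca->sb))
		j = __jnl_rd_bkt(ca, bucket_idx, len, offset, &cl);
	else
		j = __jnl_rd_nvm_bkt(ca, bucket_idx, len, offset);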

Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 drivers/md/bcache/features.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h
index d1c8fd3977fc..333fb5efb6bd 100644
--- a/drivers/md/bcache/features.h
+++ b/drivers/md/bcache/features.h
@@ -17,11 +17,19 @@
 #define BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET 0x0001
 /* real bucket size is (1 << bucket_size) */
 #define BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE 0x0002
+/* store bcache meta data on nvdimm */
+#define BCH_FEATURE_INCOMPAT_NVDIMM_META   0x0004
 
 #define BCH_FEATURE_COMPAT_SUPP0
 #define BCH_FEATURE_RO_COMPAT_SUPP 0
+#ifdef CONFIG_BCACHE_NVM_PAGES
+#define BCH_FEATURE_INCOMPAT_SUPP	(BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \
+					 BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE| \
+					 BCH_FEATURE_INCOMPAT_NVDIMM_META)
+#else
 #define BCH_FEATURE_INCOMPAT_SUPP	(BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \
 					 BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE)
+#endif
 
 #define BCH_HAS_COMPAT_FEATURE(sb, mask) \
((sb)->feature_compat & (mask))
@@ -89,6 +97,7 @@ static inline void bch_clear_feature_##name(struct cache_sb 
*sb) \
 
 BCH_FEATURE_INCOMPAT_FUNCS(obso_large_bucket, OBSO_LARGE_BUCKET);
 BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LOG_LARGE_BUCKET_SIZE);
+BCH_FEATURE_INCOMPAT_FUNCS(nvdimm_meta, NVDIMM_META);
 
 static inline bool bch_has_unknown_compat_features(struct cache_sb *sb)
 {
-- 
2.26.2


[PATCH v7 08/16] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish()

2021-04-09 Thread Coly Li
Currently the meta data bucket locations on the cache device are still
reserved after the meta data is stored on NVDIMM pages, to temporarily
keep the meta data layout consistent. So these buckets are still marked
as meta data by SET_GC_MARK() in bch_btree_gc_finish().

When BCH_FEATURE_INCOMPAT_NVDIMM_META is set, sb.d[] stores linear
addresses of NVDIMM pages and not bucket indexes anymore. Therefore we
should avoid looking up bucket indexes in sb.d[], and directly use the
bucket range from ca->sb.first_bucket to (ca->sb.first_bucket +
ca->sb.njournal_buckets) for setting the gc mark of the journal
buckets.

Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 drivers/md/bcache/btree.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index fe6dce125aba..28edd884bd5d 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1761,8 +1761,10 @@ static void bch_btree_gc_finish(struct cache_set *c)
ca = c->cache;
ca->invalidate_needs_gc = 0;
 
-   for (k = ca->sb.d; k < ca->sb.d + ca->sb.keys; k++)
-   SET_GC_MARK(ca->buckets + *k, GC_MARK_METADATA);
+   /* Range [first_bucket, first_bucket + keys) is for journal buckets */
+   for (i = ca->sb.first_bucket;
+i < ca->sb.first_bucket + ca->sb.njournal_buckets; i++)
+   SET_GC_MARK(ca->buckets + i, GC_MARK_METADATA);
 
for (k = ca->prio_buckets;
 k < ca->prio_buckets + prio_buckets(ca) * 2; k++)
-- 
2.26.2


[PATCH v7 07/16] bcache: nvm-pages fixes for bcache integration testing

2021-04-09 Thread Coly Li
There are two minor fixes to the nvm-pages code, which can be folded
into the next nvm-pages series. Then I can drop this patch.

Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 drivers/md/bcache/nvm-pages.c | 29 +
 drivers/md/bcache/nvm-pages.h |  1 +
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 2ba02091bccf..c3ab396a45fa 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -73,24 +73,32 @@ static inline void remove_owner_space(struct bch_nvm_namespace *ns,
 static struct bch_nvm_pages_owner_head *find_owner_head(const char *owner_uuid, bool create)
 {
struct bch_owner_list_head *owner_list_head = only_set->owner_list_head;
+   struct bch_nvm_pages_owner_head *owner_head = NULL;
int i;
 
+   if (owner_list_head == NULL)
+   goto out;
+
for (i = 0; i < only_set->owner_list_used; i++) {
-   if (!memcmp(owner_uuid, owner_list_head->heads[i].uuid, 16))
-   return &(owner_list_head->heads[i]);
+   if (!memcmp(owner_uuid, owner_list_head->heads[i].uuid, 16)) {
+   owner_head = &(owner_list_head->heads[i]);
+   break;
+   }
}
 
-   if (create) {
+   if (!owner_head && create) {
int used = only_set->owner_list_used;
 
-   BUG_ON(only_set->owner_list_size == used);
-   memcpy(owner_list_head->heads[used].uuid, owner_uuid, 16);
+   BUG_ON((used > 0) && (only_set->owner_list_size == used));
+   memcpy_flushcache(owner_list_head->heads[used].uuid, owner_uuid, 16);
only_set->owner_list_used++;
 
owner_list_head->used++;
-   return &(owner_list_head->heads[used]);
-   } else
-   return NULL;
+   owner_head = &(owner_list_head->heads[used]);
+   }
+
+out:
+   return owner_head;
 }
 
 static struct bch_nvm_pgalloc_recs *find_empty_pgalloc_recs(void)
@@ -324,6 +332,10 @@ void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
 
	mutex_lock(&only_set->lock);
	owner_head = find_owner_head(owner_uuid, true);
+   if (!owner_head) {
+   pr_err("can't find bch_nvm_pgalloc_recs by(uuid=%s)\n", owner_uuid);
+   goto unlock;
+   }
 
for (j = 0; j < only_set->total_namespaces_nr; j++) {
struct bch_nvm_namespace *ns = only_set->nss[j];
@@ -369,6 +381,7 @@ void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
}
}
 
+unlock:
	mutex_unlock(&only_set->lock);
return kaddr;
 }
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 87b1efc301c8..b8a5cd0890d3 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -66,6 +66,7 @@ struct bch_nvm_pages_owner_head *bch_get_allocated_pages(const char *owner_uuid)
 #else
 
 static inline struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
+{
return NULL;
 }
 static inline int bch_nvm_init(void)
-- 
2.26.2


[PATCH v7 06/16] bcache: get allocated pages from specific owner

2021-04-09 Thread Coly Li
From: Jianpeng Ma 

This patch implements bch_get_allocated_pages() of the buddy, to be
used to get the allocated pages of a specific owner.

Signed-off-by: Jianpeng Ma 
Co-authored-by: Qiaowei Ren 
---
 drivers/md/bcache/nvm-pages.c | 6 ++
 drivers/md/bcache/nvm-pages.h | 7 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index e576b4cb4850..2ba02091bccf 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -374,6 +374,12 @@ void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
 }
 EXPORT_SYMBOL_GPL(bch_nvm_alloc_pages);
 
+struct bch_nvm_pages_owner_head *bch_get_allocated_pages(const char *owner_uuid)
+{
+   return find_owner_head(owner_uuid, false);
+}
+EXPORT_SYMBOL_GPL(bch_get_allocated_pages);
+
 static int init_owner_info(struct bch_nvm_namespace *ns)
 {
struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
diff --git a/drivers/md/bcache/nvm-pages.h b/drivers/md/bcache/nvm-pages.h
index 4ea831894583..87b1efc301c8 100644
--- a/drivers/md/bcache/nvm-pages.h
+++ b/drivers/md/bcache/nvm-pages.h
@@ -62,7 +62,7 @@ int bch_nvm_init(void);
 void bch_nvm_exit(void);
 void *bch_nvm_alloc_pages(int order, const char *owner_uuid);
 void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid);
-
+struct bch_nvm_pages_owner_head *bch_get_allocated_pages(const char *owner_uuid);
 #else
 
 static inline struct bch_nvm_namespace *bch_register_namespace(const char *dev_path)
@@ -81,6 +81,11 @@ static inline void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
 
 static inline void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid) { }
 
+static inline struct bch_nvm_pages_owner_head *bch_get_allocated_pages(const char *owner_uuid)
+{
+   return NULL;
+}
+
 #endif /* CONFIG_BCACHE_NVM_PAGES */
 
 #endif /* _BCACHE_NVM_PAGES_H */
-- 
2.26.2


[PATCH v7 05/16] bcache: bch_nvm_free_pages() of the buddy

2021-04-09 Thread Coly Li
From: Jianpeng Ma 

This patch implements the bch_nvm_free_pages() of the buddy.

Signed-off-by: Jianpeng Ma 
Co-authored-by: Qiaowei Ren 
---
 drivers/md/bcache/nvm-pages.c | 158 +-
 drivers/md/bcache/nvm-pages.h |   3 +
 2 files changed, 157 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 6bdc7d3773de..e576b4cb4850 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -166,6 +166,155 @@ static void add_pgalloc_rec(struct bch_nvm_pgalloc_recs *recs, void *kaddr, int
BUG_ON(i == recs->size);
 }
 
+static inline void *nvm_end_addr(struct bch_nvm_namespace *ns)
+{
+   return ns->kaddr + (ns->pages_total << PAGE_SHIFT);
+}
+
+static inline bool in_nvm_range(struct bch_nvm_namespace *ns,
+   void *start_addr, void *end_addr)
+{
+   return (start_addr >= ns->kaddr) && (end_addr <= nvm_end_addr(ns));
+}
+
+static struct bch_nvm_namespace *find_nvm_by_addr(void *addr, int order)
+{
+   int i;
+   struct bch_nvm_namespace *ns;
+
+   for (i = 0; i < only_set->total_namespaces_nr; i++) {
+   ns = only_set->nss[i];
+   if (ns && in_nvm_range(ns, addr, addr + (1 << order)))
+   return ns;
+   }
+   return NULL;
+}
+
+static int remove_pgalloc_rec(struct bch_nvm_pgalloc_recs *pgalloc_recs, int ns_nr,
+   void *kaddr, int order)
+{
+   struct bch_nvm_pages_owner_head *owner_head = pgalloc_recs->owner;
+   struct bch_nvm_pgalloc_recs *prev_recs, *sys_recs;
+   u64 pgoff = (unsigned long)kaddr >> PAGE_SHIFT;
+   struct bch_nvm_namespace *ns = only_set->nss[0];
+   int i;
+
+   prev_recs = pgalloc_recs;
+   sys_recs = ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET;
+   while (pgalloc_recs) {
+   for (i = 0; i < pgalloc_recs->size; i++) {
+   struct bch_pgalloc_rec *rec = &(pgalloc_recs->recs[i]);
+
+   if (rec->pgoff == pgoff) {
+   WARN_ON(rec->order != order);
+   rec->pgoff = 0;
+   rec->order = 0;
+   pgalloc_recs->used--;
+
+   if (pgalloc_recs->used == 0) {
+   int recs_pos = pgalloc_recs - sys_recs;
+
+   if (pgalloc_recs == prev_recs)
+   owner_head->recs[ns_nr] = pgalloc_recs->next;
+   else
+   prev_recs->next = pgalloc_recs->next;
+
+   pgalloc_recs->next = NULL;
+   pgalloc_recs->owner = NULL;
+
+   bitmap_clear(ns->pgalloc_recs_bitmap, recs_pos, 1);
+   }
+   goto exit;
+   }
+   }
+   prev_recs = pgalloc_recs;
+   pgalloc_recs = pgalloc_recs->next;
+   }
+exit:
+   return pgalloc_recs ? 0 : -ENOENT;
+}
+
+static void __free_space(struct bch_nvm_namespace *ns, void *addr, int order)
+{
+   unsigned int add_pages = (1 << order);
+   pgoff_t pgoff;
+   struct page *page;
+
+   page = nvm_vaddr_to_page(ns, addr);
+   WARN_ON((!page) || (page->private != order));
+   pgoff = page->index;
+
+   while (order < BCH_MAX_ORDER - 1) {
+   struct page *buddy_page;
+
+   pgoff_t buddy_pgoff = pgoff ^ (1 << order);
+   pgoff_t parent_pgoff = pgoff & ~(1 << order);
+
+   if ((parent_pgoff + (1 << (order + 1)) > ns->pages_total))
+   break;
+
+   buddy_page = nvm_vaddr_to_page(ns, nvm_pgoff_to_vaddr(ns, buddy_pgoff));
+   WARN_ON(!buddy_page);
+
+   if (PageBuddy(buddy_page) && (buddy_page->private == order)) {
+   list_del((struct list_head *)&buddy_page->zone_device_data);
+   __ClearPageBuddy(buddy_page);
+   pgoff = parent_pgoff;
+   order++;
+   continue;
+   }
+   break;
+   }
+
+   page = nvm_vaddr_to_page(ns, nvm_pgoff_to_vaddr(ns, pgoff));
+   WARN_ON(!page);
+   list_add((struct list_head *)&page->zone_device_data, &ns->free_area[order]);
+   page->index = pgoff;
+   set_page_private(page, order);
+   __SetPageBuddy(page);
+   ns->free += add_pages;
+}
+
+void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid)
+{
+   struct bch_nvm_namespace *ns;
+   struct bch_nvm_pages_owner_head *owner_head;
+   struct bch_nvm_pgalloc_recs *pgalloc_recs;
+   int r;
+
+   mutex_lock(&only_set->lock);
+
+   ns = find_nvm_by_addr(addr, order);
+

[PATCH v7 04/16] bcache: bch_nvm_alloc_pages() of the buddy

2021-04-09 Thread Coly Li
From: Jianpeng Ma 

This patch implements the bch_nvm_alloc_pages() of the buddy.

Signed-off-by: Jianpeng Ma 
Co-authored-by: Qiaowei Ren 
---
 drivers/md/bcache/nvm-pages.c | 157 ++
 drivers/md/bcache/nvm-pages.h |   6 ++
 2 files changed, 163 insertions(+)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index fef497c7acb3..6bdc7d3773de 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_BCACHE_NVM_PAGES
 
@@ -68,6 +69,162 @@ static inline void remove_owner_space(struct bch_nvm_namespace *ns,
bitmap_set(ns->pages_bitmap, pgoff, nr);
 }
 
+/* If not found, create one if create == true */
+static struct bch_nvm_pages_owner_head *find_owner_head(const char *owner_uuid, bool create)
+{
+   struct bch_owner_list_head *owner_list_head = only_set->owner_list_head;
+   int i;
+
+   for (i = 0; i < only_set->owner_list_used; i++) {
+   if (!memcmp(owner_uuid, owner_list_head->heads[i].uuid, 16))
+   return &(owner_list_head->heads[i]);
+   }
+
+   if (create) {
+   int used = only_set->owner_list_used;
+
+   BUG_ON(only_set->owner_list_size == used);
+   memcpy(owner_list_head->heads[used].uuid, owner_uuid, 16);
+   only_set->owner_list_used++;
+
+   owner_list_head->used++;
+   return &(owner_list_head->heads[used]);
+   } else
+   return NULL;
+}
+
+static struct bch_nvm_pgalloc_recs *find_empty_pgalloc_recs(void)
+{
+   unsigned int start;
+   struct bch_nvm_namespace *ns = only_set->nss[0];
+   struct bch_nvm_pgalloc_recs *recs;
+
+   start = bitmap_find_next_zero_area(ns->pgalloc_recs_bitmap, BCH_MAX_PGALLOC_RECS, 0, 1, 0);
+   if (start > BCH_MAX_PGALLOC_RECS) {
+   pr_info("no free struct bch_nvm_pgalloc_recs\n");
+   return NULL;
+   }
+
+   bitmap_set(ns->pgalloc_recs_bitmap, start, 1);
+   recs = (struct bch_nvm_pgalloc_recs *)(ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET)
+   + start;
+   return recs;
+}
+
+static struct bch_nvm_pgalloc_recs *find_nvm_pgalloc_recs(struct bch_nvm_namespace *ns,
+   struct bch_nvm_pages_owner_head *owner_head, bool create)
+{
+   int ns_nr = ns->sb->this_namespace_nr;
+   struct bch_nvm_pgalloc_recs *prev_recs = NULL, *recs = owner_head->recs[ns_nr];
+
+   // If create=false, we return recs[nr]
+   if (!create)
+   return recs;
+
+   // If create=true, it means we need an empty struct bch_pgalloc_rec,
+   // so we should find a struct bch_nvm_pgalloc_recs with free slots, or
+   // allocate a new struct bch_nvm_pgalloc_recs, and return it
+   while (recs && (recs->used == recs->size)) {
+   prev_recs = recs;
+   recs = recs->next;
+   }
+
+   // Found empty struct bch_nvm_pgalloc_recs
+   if (recs)
+   return recs;
+   // Need to allocate a new struct bch_nvm_pgalloc_recs
+   recs = find_empty_pgalloc_recs();
+   if (recs) {
+   recs->next = NULL;
+   recs->owner = owner_head;
+   strncpy(recs->magic, bch_nvm_pages_pgalloc_magic, 16);
+   strncpy(recs->owner_uuid, owner_head->uuid, 16);
+   recs->size = BCH_MAX_RECS;
+   recs->used = 0;
+
+   if (prev_recs)
+   prev_recs->next = recs;
+   else
+   owner_head->recs[ns_nr] = recs;
+   }
+
+   return recs;
+}
+
+static void add_pgalloc_rec(struct bch_nvm_pgalloc_recs *recs, void *kaddr, int order)
+{
+   int i;
+
+   for (i = 0; i < recs->size; i++) {
+   if (recs->recs[i].pgoff == 0) {
+   recs->recs[i].pgoff = (unsigned long)kaddr >> PAGE_SHIFT;
+   recs->recs[i].order = order;
+   recs->used++;
+   break;
+   }
+   }
+   BUG_ON(i == recs->size);
+}
+
+void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
+{
+   void *kaddr = NULL;
+   struct bch_nvm_pgalloc_recs *pgalloc_recs;
+   struct bch_nvm_pages_owner_head *owner_head;
+   int i, j;
+
+   mutex_lock(&only_set->lock);
+   owner_head = find_owner_head(owner_uuid, true);
+
+   for (j = 0; j < only_set->total_namespaces_nr; j++) {
+   struct bch_nvm_namespace *ns = only_set->nss[j];
+
+   if (!ns || (ns->free < (1 << order)))
+   continue;
+
+   for (i = order; i < BCH_MAX_ORDER; i++) {
+   struct list_head *list;
+   struct page *page, *buddy_page;
+
+   if (list_empty(&ns->free_area[i]))
+   continue;
+
+   list = 

[PATCH v7 03/16] bcache: initialization of the buddy

2021-04-09 Thread Coly Li
From: Jianpeng Ma 

This nvm pages allocator implements a simple buddy allocator to manage
the nvm address space. This patch initializes this buddy for a new
namespace.

The unit of alloc/free in the buddy is a page. DAX devices have their
own struct page (in DRAM or PMEM).

struct {/* ZONE_DEVICE pages */
/** @pgmap: Points to the hosting device page map. */
struct dev_pagemap *pgmap;
void *zone_device_data;
/*
 * ZONE_DEVICE private pages are counted as being
 * mapped so the next 3 words hold the mapping, index,
 * and private fields from the source anonymous or
 * page cache page while the page is migrated to device
 * private memory.
 * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
 * use the mapping, index, and private fields when
 * pmem backed DAX files are mapped.
 */
};

ZONE_DEVICE pages only use pgmap; the other 4 words [16/32 bytes] are
unused. So the second/third words are used as a 'struct list_head' to
link the page into a buddy free list. The fourth word (normally struct
page::index) stores pgoff, the page offset in the dax device. And the
fifth word (normally struct page::private) stores the buddy order.
page_type is used to store the buddy flags (see the sketch below).
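
A condensed illustration of this field reuse, mirroring what the diff
below does on its alloc/free paths (illustrative only, not a new API):

	struct page *page = nvm_vaddr_to_page(ns, addr);

	/* 2nd/3rd words: free-list linkage for this buddy order */
	list_add((struct list_head *)&page->zone_device_data,
		 &ns->free_area[order]);
	page->index = pgoff;		/* 4th word: pgoff in the dax device */
	set_page_private(page, order);	/* 5th word: buddy order */
	__SetPageBuddy(page);		/* page_type: buddy flag */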

Signed-off-by: Jianpeng Ma 
Co-authored-by: Qiaowei Ren 
---
 drivers/md/bcache/nvm-pages.c   | 142 +++-
 drivers/md/bcache/nvm-pages.h   |   6 ++
 include/uapi/linux/bcache-nvm.h |  12 ++-
 3 files changed, 153 insertions(+), 7 deletions(-)

diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
index 101b108b9766..fef497c7acb3 100644
--- a/drivers/md/bcache/nvm-pages.c
+++ b/drivers/md/bcache/nvm-pages.c
@@ -34,6 +34,10 @@ static void release_nvm_namespaces(struct bch_nvm_set *nvm_set)
for (i = 0; i < nvm_set->total_namespaces_nr; i++) {
ns = nvm_set->nss[i];
if (ns) {
+   kvfree(ns->pages_bitmap);
+   if (ns->pgalloc_recs_bitmap)
+   bitmap_free(ns->pgalloc_recs_bitmap);
+
blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
kfree(ns);
}
@@ -48,17 +52,122 @@ static void release_nvm_set(struct bch_nvm_set *nvm_set)
kfree(nvm_set);
 }
 
+static struct page *nvm_vaddr_to_page(struct bch_nvm_namespace *ns, void *addr)
+{
+   return virt_to_page(addr);
+}
+
+static void *nvm_pgoff_to_vaddr(struct bch_nvm_namespace *ns, pgoff_t pgoff)
+{
+   return ns->kaddr + (pgoff << PAGE_SHIFT);
+}
+
+static inline void remove_owner_space(struct bch_nvm_namespace *ns,
+   pgoff_t pgoff, u32 nr)
+{
+   bitmap_set(ns->pages_bitmap, pgoff, nr);
+}
+
 static int init_owner_info(struct bch_nvm_namespace *ns)
 {
struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
+   struct bch_nvm_pgalloc_recs *sys_recs;
+   int i, j, k, rc = 0;
 
	mutex_lock(&only_set->lock);
only_set->owner_list_head = owner_list_head;
only_set->owner_list_size = owner_list_head->size;
only_set->owner_list_used = owner_list_head->used;
+
+   /*remove used space*/
+   remove_owner_space(ns, 0, ns->pages_offset/ns->page_size);
+
+   sys_recs = ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET;
+   // suppose no hole in array
+   for (i = 0; i < owner_list_head->used; i++) {
+   struct bch_nvm_pages_owner_head *head = &owner_list_head->heads[i];
+
+   for (j = 0; j < BCH_NVM_PAGES_NAMESPACES_MAX; j++) {
+   struct bch_nvm_pgalloc_recs *pgalloc_recs = head->recs[j];
+   unsigned long offset = (unsigned long)ns->kaddr >> PAGE_SHIFT;
+   struct page *page;
+
+   while (pgalloc_recs) {
+   u32 pgalloc_recs_pos = (unsigned long)(pgalloc_recs - sys_recs);
+
+   if (memcmp(pgalloc_recs->magic, bch_nvm_pages_pgalloc_magic, 16)) {
+   pr_info("invalid bch_nvm_pages_pgalloc_magic\n");
+   rc = -EINVAL;
+   goto unlock;
+   }
+   if (memcmp(pgalloc_recs->owner_uuid, head->uuid, 16)) {
+   pr_info("invalid owner_uuid in bch_nvm_pgalloc_recs\n");
+   rc = -EINVAL;
+   goto unlock;
+   }
+   if (pgalloc_recs->owner != head) {
+   pr_info("invalid owner in bch_nvm_pgalloc_recs\n");
+   rc = -EINVAL;
+

[PATCH v7 02/16] bcache: initialize the nvm pages allocator

2021-04-09 Thread Coly Li
From: Jianpeng Ma 

This patch defines the prototype data structures in memory and
initializes the nvm pages allocator.

The nvm address space managed by this allocator can consist of many nvm
namespaces, and several namespaces can compose one nvm set, like a
cache set. For this initial implementation, only one set is supported.

The users of this nvm pages allocator need to call register_namespace()
to register an nvdimm device (like /dev/pmemX) into this allocator as
an instance of struct nvm_namespace.

Signed-off-by: Jianpeng Ma 
Co-authored-by: Qiaowei Ren 
---
 drivers/md/bcache/Kconfig |   6 +
 drivers/md/bcache/Makefile|   2 +-
 drivers/md/bcache/nvm-pages.c | 284 ++
 drivers/md/bcache/nvm-pages.h |  71 +
 drivers/md/bcache/super.c |   3 +
 5 files changed, 365 insertions(+), 1 deletion(-)
 create mode 100644 drivers/md/bcache/nvm-pages.c
 create mode 100644 drivers/md/bcache/nvm-pages.h

diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
index d1ca4d059c20..fdec9905ef40 100644
--- a/drivers/md/bcache/Kconfig
+++ b/drivers/md/bcache/Kconfig
@@ -35,3 +35,9 @@ config BCACHE_ASYNC_REGISTRATION
device path into this file will returns immediately and the real
registration work is handled in kernel work queue in asynchronous
way.
+
+config BCACHE_NVM_PAGES
+   bool "NVDIMM support for bcache (EXPERIMENTAL)"
+   depends on BCACHE
+   help
+   nvm pages allocator for bcache.
diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
index 5b87e59676b8..948e5ed2ca66 100644
--- a/drivers/md/bcache/Makefile
+++ b/drivers/md/bcache/Makefile
@@ -4,4 +4,4 @@ obj-$(CONFIG_BCACHE)+= bcache.o
 
 bcache-y   := alloc.o bset.o btree.o closure.o debug.o extents.o\
io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
-   util.o writeback.o features.o
+   util.o writeback.o features.o nvm-pages.o
diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
new file mode 100644
index ..101b108b9766
--- /dev/null
+++ b/drivers/md/bcache/nvm-pages.c
@@ -0,0 +1,284 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Nvdimm page-buddy allocator
+ *
+ * Copyright (c) 2021, Intel Corporation.
+ * Copyright (c) 2021, Qiaowei Ren .
+ * Copyright (c) 2021, Jianpeng Ma .
+ */
+
+#include "bcache.h"
+#include "nvm-pages.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_BCACHE_NVM_PAGES
+
+struct bch_nvm_set *only_set;
+
+static void release_nvm_namespaces(struct bch_nvm_set *nvm_set)
+{
+   int i;
+   struct bch_nvm_namespace *ns;
+
+   for (i = 0; i < nvm_set->total_namespaces_nr; i++) {
+   ns = nvm_set->nss[i];
+   if (ns) {
+   blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
+   kfree(ns);
+   }
+   }
+
+   kfree(nvm_set->nss);
+}
+
+static void release_nvm_set(struct bch_nvm_set *nvm_set)
+{
+   release_nvm_namespaces(nvm_set);
+   kfree(nvm_set);
+}
+
+static int init_owner_info(struct bch_nvm_namespace *ns)
+{
+   struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
+
+   mutex_lock(&only_set->lock);
+   only_set->owner_list_head = owner_list_head;
+   only_set->owner_list_size = owner_list_head->size;
+   only_set->owner_list_used = owner_list_head->used;
+   mutex_unlock(&only_set->lock);
+
+   return 0;
+}
+
+static bool attach_nvm_set(struct bch_nvm_namespace *ns)
+{
+   bool rc = true;
+
+   mutex_lock(&only_set->lock);
+   if (only_set->nss) {
+   if (memcmp(ns->sb->set_uuid, only_set->set_uuid, 16)) {
+   pr_info("namespace id doesn't match nvm set\n");
+   rc = false;
+   goto unlock;
+   }
+
+   if (only_set->nss[ns->sb->this_namespace_nr]) {
+   pr_info("already has the same position(%d) nvm\n",
+   ns->sb->this_namespace_nr);
+   rc = false;
+   goto unlock;
+   }
+   } else {
+   memcpy(only_set->set_uuid, ns->sb->set_uuid, 16);
+   only_set->total_namespaces_nr = ns->sb->total_namespaces_nr;
+   only_set->nss = kcalloc(only_set->total_namespaces_nr,
+   sizeof(struct bch_nvm_namespace *), GFP_KERNEL);
+   if (!only_set->nss) {
+   rc = false;
+   goto unlock;
+   }
+   }
+
+   only_set->nss[ns->sb->this_namespace_nr] = ns;
+
+unlock:
+   mutex_unlock(&only_set->lock);
+   return rc;
+}
+
+static int read_nvdimm_meta_super(struct block_device *bdev,
+ struct bch_nvm_namespace *ns)
+{
+   

[PATCH v7 01/16] bcache: add initial data structures for nvm pages

2021-04-09 Thread Coly Li
This patch initializes the prototype data structures for the nvm pages
allocator,

- struct bch_nvm_pages_sb
This is the super block allocated on each nvdimm namespace. An nvdimm
set may have multiple namespaces, bch_nvm_pages_sb->set_uuid is used
to mark which nvdimm set this namespace belongs to. Normally we will
use the bcache cache set UUID to initialize this uuid, to connect this
nvdimm set to a specified bcache cache set.

- struct bch_owner_list_head
This is a table for all heads of all owner lists. An owner list records
which page(s) are allocated to which owner. After reboot from a power
failure, the owner may find all its requested and allocated pages from
the owner list via a handle derived from its UUID.

- struct bch_nvm_pages_owner_head
This is the head of an owner list. Each owner only has one owner list,
and an nvm page only belongs to a specific owner. uuid[] will be set to
the owner's uuid; for bcache it is the bcache cache set uuid. label is
not mandatory, it is a human-readable string for debug purposes. The
pointer *recs references a separate nvm page which holds the table of
struct bch_nvm_pgalloc_rec.

- struct bch_nvm_pgalloc_recs
This struct occupies a whole page; owner_uuid should match the uuid
in struct bch_nvm_pages_owner_head. recs[] is the real table containing
all allocated records.

- struct bch_nvm_pgalloc_rec
Each structure records a range of allocated nvm pages (see the sketch
after this description).
  - Bits  0 - 51: page offset of the allocated pages.
  - Bits 52 - 57: allocated size as 2^order pages.
  - Bits 58 - 63: reserved.
Since each allocated range of nvm pages is a power of 2 in size, using
6 bits to represent the allocated order allows a maximum value of
(1 << 63) * PAGE_SIZE. That is a 76-bit-wide range size in bytes for a
4KB page size, which is large enough currently.
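
A sketch consistent with the bit layout described above (the actual
definition lives in the new include/uapi/linux/bcache-nvm.h; the field
names here are illustrative):

	struct bch_pgalloc_rec {
		union {
			struct {
				__u64	pgoff:52;	/* bits  0-51: page offset */
				__u64	order:6;	/* bits 52-57: 2^order pages */
				__u64	reserved:6;	/* bits 58-63 */
			};
			__u64	_v;
		};
	};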

Signed-off-by: Coly Li 
Cc: Jianpeng Ma 
Cc: Qiaowei Ren 
---
 include/uapi/linux/bcache-nvm.h | 202 
 1 file changed, 202 insertions(+)
 create mode 100644 include/uapi/linux/bcache-nvm.h

diff --git a/include/uapi/linux/bcache-nvm.h b/include/uapi/linux/bcache-nvm.h
new file mode 100644
index ..3c381c1b32ba
--- /dev/null
+++ b/include/uapi/linux/bcache-nvm.h
@@ -0,0 +1,202 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_BCACHE_NVM_H
+#define _UAPI_BCACHE_NVM_H
+
+/*
+ * Bcache on NVDIMM data structures
+ */
+
+/*
+ * - struct bch_nvm_pages_sb
+ *   This is the super block allocated on each nvdimm namespace. A nvdimm
+ * set may have multiple namespaces, bch_nvm_pages_sb->set_uuid is used to mark
+ * which nvdimm set this name space belongs to. Normally we will use the
+ * bcache's cache set UUID to initialize this uuid, to connect this nvdimm
+ * set to a specified bcache cache set.
+ *
+ * - struct bch_owner_list_head
+ *   This is a table for all heads of all owner lists. An owner list records
+ * which page(s) are allocated to which owner. After a reboot from power
+ * failure, the owner may find all its requested and allocated pages from the
+ * owner list via a handle that is resolved from its UUID.
+ *
+ * - struct bch_nvm_pages_owner_head
+ *   This is the head of an owner list. Each owner only has one owner list,
+ * and an nvm page only belongs to a specific owner. uuid[] will be set to the
+ * owner's uuid; for bcache it is the bcache's cache set uuid. label is not
+ * mandatory; it is a human-readable string for debugging purposes. The pointer
+ * recs references a separate nvm page which holds the table of struct
+ * bch_pgalloc_rec.
+ *
+ * - struct bch_nvm_pgalloc_recs
+ *   This structure occupies a whole page; owner_uuid should match the uuid
+ * in struct bch_nvm_pages_owner_head. recs[] is the real table containing all
+ * allocated records.
+ *
+ * - struct bch_pgalloc_rec
+ *   Each structure records a range of allocated nvm pages. pgoff is the
+ * offset in units of page size of this allocated nvm page range. Adjoining
+ * page ranges of the same owner can be merged into a larger one, therefore
+ * pages_nr is NOT always a power of 2.
+ *
+ *
+ * Memory layout on nvdimm namespace 0
+ *
+ *    0 +------------------------------+
+ *      |                              |
+ *  4KB +------------------------------+
+ *      |       bch_nvm_pages_sb       |
+ *  8KB +------------------------------+ <--- bch_nvm_pages_sb.bch_owner_list_head
+ *      |      bch_owner_list_head     |
+ *      |                              |
+ * 16KB +------------------------------+ <--- bch_owner_list_head.heads[0].recs[0]
+ *      |     bch_nvm_pgalloc_recs     |
+ *      |  (nvm pages internal usage)  |
+ * 24KB +------------------------------+
+ *      |                              |
+ *      |                              |
+ * 16MB +------------------------------+
+ *      |     allocable nvm pages      |
+ *      |     for buddy allocator      |
+ * end  +------------------------------+
+ *
+ *
+ *
+ * Memory layout on nvdimm

[PATCH v7 00/16] bcache: support NVDIMM for journaling

2021-04-09 Thread Coly Li
This is the 7th effort for bcache to support NVDIMM for journaling since
the first nvm-pages series was posted.

This series is a combination of the v7 nvm-pages allocator developed by
Intel developers and related bcache changes from me.

The nvm-pages allocator is a buddy-like allocator, which allocates
power-of-2 sized pages from the NVDIMM namespace. The user space tool
'bcache' has a newly added '-M' option to format a NVDIMM namespace and
register it via the sysfs interface as a bcache meta device. The
nvm-pages kernel code does a DAX mapping to map the whole namespace into
the system's memory address range, and allocates pages to requesters
like a typical buddy allocator does. The major difference is that the
nvm-pages allocator maintains the pages allocated to each requester in
an owner list, which is stored on NVDIMM too. The owner list of each
requester is tracked by a pre-defined UUID; all the pages tracked in all
owner lists are treated as allocated busy pages and won't be released
into the buddy system after a system reboot.
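
As a rough illustration only (the prototypes below are assumptions made
for this sketch, not quoted from the series), a requester such as the
journal code might obtain its pages like this:

/* Assumed prototypes for this sketch; the real ones are defined by the
 * nvm-pages patches in this series. */
void *bch_nvm_alloc_pages(int order, const char *owner_uuid);
void bch_nvm_free_pages(void *addr, int order, const char *owner_uuid);

static void *journal_space_sketch(const char *owner_uuid)
{
	/*
	 * Ask the buddy allocator for 2^16 contiguous pages; with a 4KB
	 * page size that is the 256MB journal range mentioned below.
	 */
	return bch_nvm_alloc_pages(16, owner_uuid);
}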

The bcache journal code may request a block of power-of-2 size pages
from the nvm-pages allocator, normally a continuous range of 256MB or
512MB. During metadata journaling, the in-memory jsets are copied into
the calculated nvdimm page locations by a kernel memcpy routine. So the
journaling I/Os won't go to a block device (e.g. SSD) anymore; the
writes and reads for journal jsets happen on NVDIMM.
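
A minimal sketch of that copy path, assuming memcpy_flushcache() is the
routine used to make the copy durable (whether the series uses exactly
this call is an assumption):

#include <linux/string.h>

/*
 * Sketch only: persist one in-memory jset into its DAX-mapped nvdimm
 * destination. The "I/O" is a CPU copy; no bio is submitted.
 */
static void journal_write_jset_sketch(void *nvdimm_dst, const void *jset,
				      size_t bytes)
{
	memcpy_flushcache(nvdimm_dst, jset, bytes);
}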

The nvm-pages on-NVDIMM data structures are defined as legacy in-memory
objects, because they ARE in-memory objects directly referenced by
linear addresses, both in system DRAM and NVDIMM. They are defined in
the following patch,
- bcache: add initial data structures for nvm pages

Intel developers Jianpeng Ma and Qiaowei Ren composed the initial code of
nvm-pages; the related patches are,
- bcache: initialize the nvm pages allocator
- bcache: initialization of the buddy
- bcache: bch_nvm_alloc_pages() of the buddy
- bcache: bch_nvm_free_pages() of the buddy
- bcache: get allocated pages from specific owner
All the code depends on the Linux libnvdimm and dax drivers; the bcache
nvm-pages allocator can be treated as a user of these two drivers.

I modified the bcache code to recognize the nvm meta device feature,
initialize the journal on NVDIMM, and do journal I/O on NVDIMM in the
following patches,
- bcache: add initial data structures for nvm pages
- bcache: use bucket index to set GC_MARK_METADATA for journal buckets
  in bch_btree_gc_finish()
- bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
- bcache: initialize bcache journal for NVDIMM meta device
- bcache: support storing bcache journal into NVDIMM meta device
- bcache: read jset from NVDIMM pages for journal replay
- bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
  meta device

Also, during code integration and testing, some issues were found and
fixed by the following patches,
- bcache: nvm-pages fixes for bcache integration testing
- bcache: use div_u64() in init_owner_info()
- bcache: fix BCACHE_NVM_PAGES' dependences in Kconfig
- bcache: more fix for compiling error when BCACHE_NVM_PAGES disabled
The above patches can be added or merged into the nvm-pages code, so that
they can be dropped in the next version of this series.

The current series works as expected. Of course it is not perfect, but
the state is fine as a code base for further improvement, for example
power failure tolerance for nvm-pages owner list operations, more error
handling for the journal code, and moving the B+ tree node I/Os into
NVDIMM.

All the code is EXPERIMENTAL; it won't be enabled by default until we
feel the NVDIMM support is complete and stable.
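
As a sketch of how such gating typically looks (CONFIG_BCACHE_NVM_PAGES
is named by the Kconfig patch titles above; the stub behavior here is an
assumption, not quoted from the series):

/* Sketch: nvm-pages entry points compile away when the experimental
 * Kconfig option is disabled; callers then fall back to the SSD journal. */
#ifdef CONFIG_BCACHE_NVM_PAGES
void *bch_nvm_alloc_pages(int order, const char *owner_uuid);
#else
static inline void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
{
	return NULL;
}
#endif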

Any comments and suggestions are warmly welcome :-)

Thank you in advance.

Coly Li

---
Changelog:
v7: Refine nvm-pages allocator code to operate on the owner list directly
in dax-mapped NVDIMM pages, and remove the metadata copy from DRAM.
v6: The series was submitted but not merged in the Linux 5.12 merge window.
v1-v5: RFC patches of bcache nvm-pages.


Coly Li (11):
  bcache: add initial data structures for nvm pages
  bcache: nvm-pages fixes for bcache integration testing
  bcache: use bucket index to set GC_MARK_METADATA for journal buckets
in bch_btree_gc_finish()
  bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
  bcache: initialize bcache journal for NVDIMM meta device
  bcache: support storing bcache journal into NVDIMM meta device
  bcache: read jset from NVDIMM pages for journal replay
  bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
meta device
  bcache: use div_u64() in init_owner_info()
  bcache: fix BCACHE_NVM_PAGES' dependences in Kconfig
  bcache: more fix for compiling error when BCACHE_NVM_PAGES disabled

Jianpeng Ma (5):
  bcache: initialize the nvm pages allocator
  bcache: initialization of the buddy
  bcache: bch_nvm_alloc_pages() of the buddy
  bcache: bch_nvm_free_pages() of the buddy
  bcache: get allocated pages from specific owner

Re: [PATCH v1] libnvdimm, dax: Fix a missing check in nd_dax_probe()

2021-04-09 Thread Ira Weiny
On Thu, Apr 08, 2021 at 06:58:26PM -0700, wangyingji...@126.com wrote:
> From: Yingjie Wang 
> 
> In nd_dax_probe(), 'nd_dax' is allocated by nd_dax_alloc().
> nd_dax_alloc() may fail and return NULL, so we should better check

Avoid the use of 'we'.

> it's return value to avoid a NULL pointer dereference
> a bit later in the code.

How about:

"nd_dax_alloc() may fail and return NULL.  Check for NULL before attempting to
use nd_dax to avoid a NULL pointer dereference."

> 
> Fixes: c5ed9268643c ("libnvdimm, dax: autodetect support")
> Signed-off-by: Yingjie Wang 

The code looks good though.

Ira

> ---
>  drivers/nvdimm/dax_devs.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/nvdimm/dax_devs.c b/drivers/nvdimm/dax_devs.c
> index 99965077bac4..b1426ac03f01 100644
> --- a/drivers/nvdimm/dax_devs.c
> +++ b/drivers/nvdimm/dax_devs.c
> @@ -106,6 +106,8 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
>  
>   nvdimm_bus_lock(&ndns->dev);
>   nd_dax = nd_dax_alloc(nd_region);
> + if (!nd_dax)
> + return -ENOMEM;
>   nd_pfn = &nd_dax->nd_pfn;
>   dax_dev = nd_pfn_devinit(nd_pfn, ndns);
>   nvdimm_bus_unlock(&ndns->dev);
> -- 
> 2.7.4
> 
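
One detail worth double-checking, shown as a hedged sketch rather than a
claim about the final patch: the new early return sits between
nvdimm_bus_lock() and nvdimm_bus_unlock() in the quoted hunk, so the
error path would need to drop the lock first, e.g.:

	nvdimm_bus_lock(&ndns->dev);
	nd_dax = nd_dax_alloc(nd_region);
	if (!nd_dax) {
		nvdimm_bus_unlock(&ndns->dev);
		return -ENOMEM;
	}
	nd_pfn = &nd_dax->nd_pfn;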
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org

