[PATCH v2 1/2] lightnvm: pblk: Do not overwrite ppa list with meta list

2018-12-07 Thread Igor Konopko
Currently when using PBLK with zero-sized metadata, both the ppa list
and the meta list point to the same memory, since pblk_dma_meta_size()
returns 0 in that case.

This commit fixes that issue by ensuring that pblk_dma_meta_size()
always returns space equal to sizeof(struct pblk_sec_meta), so that
the ppa list and the meta list point to different memory addresses.

Even though the drive does not really care about the meta_list pointer
in that case, this is the easiest way to fix the issue without
introducing changes in many places in the code just for the zero-sized
metadata case.

The same approach also needs to be taken for pblk_get_meta(), since we
cannot point to the same memory address in the meta buffer when we are
using it for the pblk recovery process.
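
To illustrate the layout, here is a minimal userspace sketch
(NVM_MAX_VLBA, struct pblk_sec_meta and the max_t()-style clamp mirror
the driver; everything else is demo scaffolding, not kernel code):

	#include <stdio.h>

	#define NVM_MAX_VLBA 64
	struct pblk_sec_meta { unsigned long long lba; unsigned long long reserved; };

	static int dma_meta_size(int oob_meta_size)
	{
		/* Without the clamp, oob_meta_size == 0 yields 0 and the
		 * ppa list starts at the same address as the meta list. */
		int per_sec = oob_meta_size > (int)sizeof(struct pblk_sec_meta) ?
				oob_meta_size : (int)sizeof(struct pblk_sec_meta);
		return per_sec * NVM_MAX_VLBA;
	}

	int main(void)
	{
		char pool[8192];	/* stands in for one DMA pool entry */
		char *meta_list = pool;
		char *ppa_list = meta_list + dma_meta_size(0 /* zero-sized OOB */);

		printf("meta_list=%p ppa_list=%p\n",
		       (void *)meta_list, (void *)ppa_list);
		return 0;
	}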

Reported-by: Hans Holmberg 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index bc40b1381ff6..85e38ed62f85 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -1388,12 +1388,15 @@ static inline unsigned int pblk_get_min_chks(struct pblk *pblk)
 static inline struct pblk_sec_meta *pblk_get_meta(struct pblk *pblk,
 void *meta, int index)
 {
-   return meta + pblk->oob_meta_size * index;
+   return meta +
+  max_t(int, sizeof(struct pblk_sec_meta), pblk->oob_meta_size)
+  * index;
 }
 
 static inline int pblk_dma_meta_size(struct pblk *pblk)
 {
-   return pblk->oob_meta_size * NVM_MAX_VLBA;
+   return max_t(int, sizeof(struct pblk_sec_meta), pblk->oob_meta_size)
+  * NVM_MAX_VLBA;
 }
 
 static inline int pblk_is_oob_meta_supported(struct pblk *pblk)
-- 
2.17.1



[PATCH v2 2/2] lightnvm: pblk: Ensure that bio is not freed on recovery

2018-12-07 Thread Igor Konopko
When we are using PBLK with zero-sized metadata, the recovery process
needs to reference the last page of the bio. Currently KASAN reports a
use-after-free in that case, since the bio is freed on IO completion.

This patch adds an additional bio reference to ensure that we can
still use the bio memory after IO completion. It also ensures that we
are not reusing the same bio on the retry_rq path.
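
As a userspace analogy of the reference counting involved (fake_bio
and its helpers are purely illustrative; in the kernel the extra
reference is taken with bio_get() and released with bio_put()):

	#include <stdio.h>
	#include <stdlib.h>

	struct fake_bio { int refcnt; char *payload; };

	static struct fake_bio *fake_bio_alloc(void)
	{
		struct fake_bio *b = malloc(sizeof(*b));

		b->refcnt = 1;			/* creator owns one reference */
		b->payload = malloc(4096);
		return b;
	}

	static void fake_bio_put(struct fake_bio *b)
	{
		if (--b->refcnt == 0) {		/* last reference frees memory */
			free(b->payload);
			free(b);
		}
	}

	int main(void)
	{
		struct fake_bio *b = fake_bio_alloc();

		b->refcnt++;		/* "bio_get()": pin across completion */
		fake_bio_put(b);	/* IO completion drops its reference */
		printf("payload still valid, refcnt=%d\n", b->refcnt);
		fake_bio_put(b);	/* our put performs the actual free */
		return 0;
	}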

Reported-by: Hans Holmberg 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-recovery.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 009faf5db40f..3fcf062d752c 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -376,12 +376,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
rq_ppas = pblk->min_write_pgs;
rq_len = rq_ppas * geo->csecs;
 
+retry_rq:
bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
if (IS_ERR(bio))
return PTR_ERR(bio);
 
bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(bio, REQ_OP_READ, 0);
+   bio_get(bio);
 
rqd->bio = bio;
rqd->opcode = NVM_OP_PREAD;
@@ -394,7 +396,6 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
if (pblk_io_aligned(pblk, rq_ppas))
rqd->is_seq = 1;
 
-retry_rq:
for (i = 0; i < rqd->nr_ppas; ) {
struct ppa_addr ppa;
int pos;
@@ -417,6 +418,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
if (ret) {
pblk_err(pblk, "I/O submission failed: %d\n", ret);
bio_put(bio);
+   bio_put(bio);
return ret;
}
 
@@ -428,19 +430,25 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 
if (padded) {
pblk_log_read_err(pblk, rqd);
+   bio_put(bio);
return -EINTR;
}
 
pad_distance = pblk_pad_distance(pblk, line);
ret = pblk_recov_pad_line(pblk, line, pad_distance);
-   if (ret)
+   if (ret) {
+   bio_put(bio);
return ret;
+   }
 
padded = true;
+   bio_put(bio);
goto retry_rq;
}
 
pblk_get_packed_meta(pblk, rqd);
+   bio_put(bio);
+
for (i = 0; i < rqd->nr_ppas; i++) {
struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
u64 lba = le64_to_cpu(meta->lba);
-- 
2.17.1



[PATCH 1/2] lightnvm: pblk: Do not overwrite ppa list with meta list

2018-12-06 Thread Igor Konopko
Currently when using PBLK with zero-sized metadata, both the ppa list
and the meta list point to the same memory, since pblk_dma_meta_size()
returns 0 in that case.

This commit fixes that issue by ensuring that pblk_dma_meta_size()
always returns space equal to sizeof(struct pblk_sec_meta), so that
the ppa list and the meta list point to different memory addresses.

Even though the drive does not really care about the meta_list pointer
in that case, this is the easiest way to fix the issue without
introducing changes in many places in the code just for the zero-sized
metadata case.

Reported-by: Hans Holmberg 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index bc40b1381ff6..e5c9ff2bf0da 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -1393,7 +1393,8 @@ static inline struct pblk_sec_meta *pblk_get_meta(struct pblk *pblk,
 
 static inline int pblk_dma_meta_size(struct pblk *pblk)
 {
-   return pblk->oob_meta_size * NVM_MAX_VLBA;
+   return max_t(int, sizeof(struct pblk_sec_meta), pblk->oob_meta_size)
+   * NVM_MAX_VLBA;
 }
 
 static inline int pblk_is_oob_meta_supported(struct pblk *pblk)
-- 
2.17.1



[PATCH 2/2] lightnvm: pblk: Ensure that bio is not freed on recovery

2018-12-06 Thread Igor Konopko
When we are using PBLK with zero-sized metadata, the recovery process
needs to reference the last page of the bio. Currently KASAN reports a
use-after-free in that case, since the bio is freed on IO completion.

This patch adds an additional bio reference to ensure that we can
still use the bio memory after IO completion. It also ensures that we
are not reusing the same bio on the retry_rq path.

Reported-by: Hans Holmberg 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-recovery.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 009faf5db40f..3fcf062d752c 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -376,12 +376,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
rq_ppas = pblk->min_write_pgs;
rq_len = rq_ppas * geo->csecs;
 
+retry_rq:
bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
if (IS_ERR(bio))
return PTR_ERR(bio);
 
bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(bio, REQ_OP_READ, 0);
+   bio_get(bio);
 
rqd->bio = bio;
rqd->opcode = NVM_OP_PREAD;
@@ -394,7 +396,6 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
if (pblk_io_aligned(pblk, rq_ppas))
rqd->is_seq = 1;
 
-retry_rq:
for (i = 0; i < rqd->nr_ppas; ) {
struct ppa_addr ppa;
int pos;
@@ -417,6 +418,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
if (ret) {
pblk_err(pblk, "I/O submission failed: %d\n", ret);
bio_put(bio);
+   bio_put(bio);
return ret;
}
 
@@ -428,19 +430,25 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 
if (padded) {
pblk_log_read_err(pblk, rqd);
+   bio_put(bio);
return -EINTR;
}
 
pad_distance = pblk_pad_distance(pblk, line);
ret = pblk_recov_pad_line(pblk, line, pad_distance);
-   if (ret)
+   if (ret) {
+   bio_put(bio);
return ret;
+   }
 
padded = true;
+   bio_put(bio);
goto retry_rq;
}
 
pblk_get_packed_meta(pblk, rqd);
+   bio_put(bio);
+
for (i = 0; i < rqd->nr_ppas; i++) {
struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
u64 lba = le64_to_cpu(meta->lba);
-- 
2.17.1



[PATCH v5 3/5] lightnvm: Flexible DMA pool entry size

2018-11-30 Thread Igor Konopko
Currently the whole of lightnvm and pblk uses a single DMA pool, for
which the entry size is always equal to PAGE_SIZE. The PPA list always
needs 8B*64, so there is only 56B*64 of space left for OOB meta. Since
NVMe OOB meta can be bigger, such as 128B, this solution is not robust.

This patch adds the possibility to support OOB meta above 56B by
changing the DMA pool entry size based on the OOB meta size.

It also allows pblk to use OOB metadata >=16B.
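
The sizing arithmetic, as a runnable sketch (PAGE_SIZE = 4096 and
NVM_MAX_VLBA = 64 are assumed demo values; the round-up mirrors the
kernel macro):

	#include <stdio.h>

	#define PAGE_SIZE    4096
	#define NVM_MAX_VLBA 64

	static int exp_pool_size(int sos /* OOB bytes per sector */)
	{
		int need = NVM_MAX_VLBA * (int)(sizeof(unsigned long long) + sos);
		int size = need > PAGE_SIZE ? need : PAGE_SIZE;

		return (size + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
	}

	int main(void)
	{
		/* 16B OOB still fits the old single page; 128B needs three. */
		printf("sos=16  -> %d\n", exp_pool_size(16));	/* 4096  */
		printf("sos=128 -> %d\n", exp_pool_size(128));	/* 12288 */
		return 0;
	}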

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/core.c  | 9 +++--
 drivers/lightnvm/pblk-core.c | 8 
 drivers/lightnvm/pblk-init.c | 2 +-
 drivers/lightnvm/pblk-recovery.c | 4 ++--
 drivers/lightnvm/pblk.h  | 6 +-
 drivers/nvme/host/lightnvm.c | 5 +++--
 include/linux/lightnvm.h | 2 +-
 7 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 69b841d682c7..5f82036fe322 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -1140,7 +1140,7 @@ EXPORT_SYMBOL(nvm_alloc_dev);
 
 int nvm_register(struct nvm_dev *dev)
 {
-   int ret;
+   int ret, exp_pool_size;
 
if (!dev->q || !dev->ops)
return -EINVAL;
@@ -1149,7 +1149,12 @@ int nvm_register(struct nvm_dev *dev)
if (ret)
return ret;
 
-   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
+   exp_pool_size = max_t(int, PAGE_SIZE,
+ (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
+   exp_pool_size = round_up(exp_pool_size, PAGE_SIZE);
+
+   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist",
+ exp_pool_size);
if (!dev->dma_pool) {
pr_err("nvm: could not create dma pool\n");
nvm_free(dev);
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index e732b2d12a23..7e3397f8ead1 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -250,8 +250,8 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd)
if (rqd->nr_ppas == 1)
return 0;
 
-   rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
-   rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+   rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
+   rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk);
 
return 0;
 }
@@ -846,8 +846,8 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
if (!meta_list)
return -ENOMEM;
 
-   ppa_list = meta_list + pblk_dma_meta_size;
-   dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+   ppa_list = meta_list + pblk_dma_meta_size(pblk);
+   dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 next_rq:
	memset(&rqd, 0, sizeof(struct nvm_rq));
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 33361bfb85c3..ff6a6df369c3 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -406,7 +406,7 @@ static int pblk_core_init(struct pblk *pblk)
pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
 
pblk->oob_meta_size = geo->sos;
-   if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) {
+   if (pblk->oob_meta_size < sizeof(struct pblk_sec_meta)) {
pblk_err(pblk, "Unsupported metadata size\n");
return -EINVAL;
}
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index e4dd634ba05f..3a775d10f616 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -481,8 +481,8 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
if (!meta_list)
return -ENOMEM;
 
-   ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
-   dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+   ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk);
+   dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
if (!data) {
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 80f356688803..9087d53d5c25 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -104,7 +104,6 @@ enum {
PBLK_RL_LOW = 4
 };
 
-#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
 #define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
 
 /* write buffer completion context */
@@ -1388,4 +1387,9 @@ static inline struct pblk_sec_meta *pblk_get_meta(struct pblk *pblk,
 {
return meta + pblk->oob_meta_size * index;
 }
+
+static inline int pblk_dma_meta_size(struct pblk *pblk)
+{
+   return pblk->oob_meta_size * NVM_MAX_VLBA;
+}
 #endif /* PBLK_H_ */
diff -

[PATCH v5 2/5] lightnvm: pblk: Helpers for OOB metadata

2018-11-30 Thread Igor Konopko
Currently pblk assumes that the size of the OOB metadata on the drive
is always equal to the size of struct pblk_sec_meta. This commit adds
helpers which will allow handling different sizes of OOB metadata on
the drive in the future. Still, after this patch only OOB metadata
equal to 16 bytes is supported.
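
A userspace sketch of the accessor pattern (get_meta() stands in for
pblk_get_meta(); the stride parameter corresponds to
pblk->oob_meta_size):

	#include <stdio.h>
	#include <stddef.h>

	struct pblk_sec_meta { unsigned long long lba; unsigned long long reserved; };

	static struct pblk_sec_meta *get_meta(void *meta, size_t stride, int index)
	{
		/* Entries are laid out back to back with a per-drive stride,
		 * so callers can no longer index a plain C array directly. */
		return (struct pblk_sec_meta *)((char *)meta + stride * index);
	}

	int main(void)
	{
		unsigned long long buf[8] = {0};	/* four 16B entries, aligned */

		get_meta(buf, 16, 2)->lba = 42;		/* instead of meta[2].lba */
		printf("lba[2] = %llu\n", get_meta(buf, 16, 2)->lba);
		return 0;
	}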

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c |  5 +++--
 drivers/lightnvm/pblk-init.c |  6 +
 drivers/lightnvm/pblk-map.c  | 20 +++--
 drivers/lightnvm/pblk-read.c | 48 ++--
 drivers/lightnvm/pblk-recovery.c | 16 +-
 drivers/lightnvm/pblk.h  |  6 +
 6 files changed, 69 insertions(+), 32 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index f1b411e7c7c9..e732b2d12a23 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -796,10 +796,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
rqd.is_seq = 1;
 
for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-   struct pblk_sec_meta *meta_list = rqd.meta_list;
+   struct pblk_sec_meta *meta = pblk_get_meta(pblk,
+  rqd.meta_list, i);
 
rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-   meta_list[i].lba = lba_list[paddr] = addr_empty;
+   meta->lba = lba_list[paddr] = addr_empty;
}
 
	ret = pblk_submit_io_sync_sem(pblk, &rqd);
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 72ad3e70318c..33361bfb85c3 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -405,6 +405,12 @@ static int pblk_core_init(struct pblk *pblk)
queue_max_hw_sectors(dev->q) / (geo->csecs >> SECTOR_SHIFT));
pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
 
+   pblk->oob_meta_size = geo->sos;
+   if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) {
+   pblk_err(pblk, "Unsupported metadata size\n");
+   return -EINVAL;
+   }
+
pblk->pad_dist = kcalloc(pblk->min_write_pgs - 1, sizeof(atomic64_t),
GFP_KERNEL);
if (!pblk->pad_dist)
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 5a3c28cce8ab..81e503ec384e 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -22,7 +22,7 @@
 static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
  struct ppa_addr *ppa_list,
  unsigned long *lun_bitmap,
- struct pblk_sec_meta *meta_list,
+ void *meta_list,
  unsigned int valid_secs)
 {
struct pblk_line *line = pblk_line_get_data(pblk);
@@ -58,6 +58,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
paddr = pblk_alloc_page(pblk, line, nr_secs);
 
for (i = 0; i < nr_secs; i++, paddr++) {
+   struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
 
/* ppa to be sent to the device */
@@ -74,14 +75,15 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
	kref_get(&line->ref);
	w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
w_ctx->ppa = ppa_list[i];
-   meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+   meta->lba = cpu_to_le64(w_ctx->lba);
lba_list[paddr] = cpu_to_le64(w_ctx->lba);
if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
else
	atomic64_inc(&pblk->pad_wa);
} else {
-   lba_list[paddr] = meta_list[i].lba = addr_empty;
+   lba_list[paddr] = addr_empty;
+   meta->lba = addr_empty;
__pblk_map_invalidate(pblk, line, paddr);
}
}
@@ -94,7 +96,8 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 unsigned long *lun_bitmap, unsigned int valid_secs,
 unsigned int off)
 {
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   void *meta_list = rqd->meta_list;
+   void *meta_buffer;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
unsigned int map_secs;
int min = pblk->min_write_pgs;
@@ -103,9 +106,10 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 
for (i = off; i < rqd->nr_ppas; i += min) {
  

[PATCH v5 4/5] lightnvm: Disable interleaved metadata

2018-11-30 Thread Igor Konopko
Currently pblk and lightnvm only check the size of the OOB metadata
and do not care whether this meta is located in a separate buffer or
is interleaved with the data in a single buffer.

In reality only the first scenario is supported, while the second mode
will break pblk functionality during any IO operation.

The goal of this patch is to block the creation of pblk devices in
case of interleaved metadata.
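
A small sketch of the difference (sector and OOB sizes are assumed
demo values): with separate buffers the data offsets are untouched,
while with extended LBAs every sector's offset shifts by the
accumulated OOB bytes, which is what breaks pblk's data-path math:

	#include <stdio.h>

	#define CSECS 4096	/* sector size */
	#define SOS     16	/* OOB bytes per sector */

	int main(void)
	{
		int i;

		for (i = 0; i < 4; i++)
			printf("sector %d: separate offset %6d, interleaved offset %6d\n",
			       i, i * CSECS, i * (CSECS + SOS));
		return 0;
	}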

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-init.c | 6 ++++++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index ff6a6df369c3..e8055b796381 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1175,6 +1175,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
return ERR_PTR(-EINVAL);
}
 
+   if (geo->ext) {
+   pblk_err(pblk, "extended metadata not supported\n");
+   kfree(pblk);
+   return ERR_PTR(-EINVAL);
+   }
+
	spin_lock_init(&pblk->resubmit_lock);
	spin_lock_init(&pblk->trans_lock);
	spin_lock_init(&pblk->lock);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index ba268d7cf141..f145fc0220d6 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -990,6 +990,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
	geo = &dev->geo;
geo->csecs = 1 << ns->lba_shift;
geo->sos = ns->ms;
+   geo->ext = ns->ext;
 
dev->q = q;
memcpy(dev->name, disk_name, DISK_NAME_LEN);
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 7afedaddbd15..5d865a5d5cdc 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -357,6 +357,7 @@ struct nvm_geo {
u32 clba;   /* sectors per chunk */
u16 csecs;  /* sector size */
u16 sos;/* out-of-band area size */
+   boolext;/* metadata in extended data buffer */
 
/* device write constrains */
u32 ws_min; /* minimum write size */
-- 
2.14.5



[PATCH v5 5/5] lightnvm: pblk: Support for packed metadata

2018-11-30 Thread Igor Konopko
In the current pblk implementation, the l2p mapping for not-yet-closed
lines is always stored only in the OOB metadata and recovered from it.

Such a solution does not provide data integrity when the drive does
not have such OOB metadata space.

The goal of this patch is to add support for so-called packed
metadata, which stores the l2p mapping for open lines in the last
sector of every write unit.

After this set of changes, drives with an OOB size <16B will use
packed metadata, while those with >=16B will continue to use OOB
metadata.
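
A sketch of the resulting layout and of the gc accounting shown in the
diff below (a write unit of 8 sectors is an assumed example value for
geo->ws_opt):

	#include <stdio.h>

	#define MIN_WRITE_PGS 8

	int main(void)
	{
		/* Without usable OOB space, the last sector of each write
		 * unit carries the L2P entries of the preceding seven. */
		int min_write_pgs_data = MIN_WRITE_PGS - 1;
		int vsc = 700;	/* example valid-sector count for a line */
		int packed_meta = (vsc / min_write_pgs_data) *
				  (MIN_WRITE_PGS - min_write_pgs_data);

		printf("per unit: %d data sectors, %d packed-meta sector(s)\n",
		       min_write_pgs_data, MIN_WRITE_PGS - min_write_pgs_data);
		printf("vsc=%d -> adjusted vsc=%d\n", vsc, vsc + packed_meta);
		return 0;
	}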

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c | 48 
 drivers/lightnvm/pblk-init.c | 38 ++-
 drivers/lightnvm/pblk-map.c  |  4 ++--
 drivers/lightnvm/pblk-rb.c   |  3 +++
 drivers/lightnvm/pblk-read.c |  6 +
 drivers/lightnvm/pblk-recovery.c |  5 +++--
 drivers/lightnvm/pblk-sysfs.c|  7 ++
 drivers/lightnvm/pblk-write.c|  9 
 drivers/lightnvm/pblk.h  | 10 -
 9 files changed, 112 insertions(+), 18 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 7e3397f8ead1..1ff165351180 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk)
 {
	unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
 
-   if (secs_avail >= pblk->min_write_pgs)
+   if (secs_avail >= pblk->min_write_pgs_data)
pblk_write_kick(pblk);
 }
 
@@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
	struct pblk_line_meta *lm = &pblk->lm;
	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct list_head *move_list = NULL;
-   int vsc = le32_to_cpu(*line->vsc);
+   int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+   * (pblk->min_write_pgs - pblk->min_write_pgs_data);
+   int vsc = le32_to_cpu(*line->vsc) + packed_meta;
 
	lockdep_assert_held(&line->lock);
 
@@ -620,12 +622,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 }
 
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-  unsigned long secs_to_flush)
+  unsigned long secs_to_flush, bool skip_meta)
 {
int max = pblk->sec_per_write;
int min = pblk->min_write_pgs;
int secs_to_sync = 0;
 
+   if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs)
+   min = max = pblk->min_write_pgs_data;
+
if (secs_avail >= max)
secs_to_sync = max;
else if (secs_avail >= min)
@@ -852,7 +857,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 next_rq:
	memset(&rqd, 0, sizeof(struct nvm_rq));
 
-   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
rq_len = rq_ppas * geo->csecs;
 
bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2169,3 +2174,38 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
}
	spin_unlock(&pblk->trans_lock);
 }
+
+void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *buffer;
+
+   if (pblk_is_oob_meta_supported(pblk)) {
+   /* Just use OOB metadata buffer as always */
+   buffer = rqd->meta_list;
+   } else {
+   /* We need to reuse last page of request (packed metadata)
+* in similar way as traditional oob metadata
+*/
+   buffer = page_to_virt(
+   rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   }
+
+   return buffer;
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *meta_list = rqd->meta_list;
+   void *page;
+   int i = 0;
+
+   if (pblk_is_oob_meta_supported(pblk))
+   return;
+
+   page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   /* We need to fill oob meta buffer with data from packed metadata */
+   for (; i < rqd->nr_ppas; i++)
+   memcpy(pblk_get_meta(pblk, meta_list, i),
+   page + (i * sizeof(struct pblk_sec_meta)),
+   sizeof(struct pblk_sec_meta));
+}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index e8055b796381..f9a3e47b6a93 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -399,6 +399,7 @@ static int pblk_core_init(struct pblk *pblk)
pblk->nr_flush_rst = 0;
 
pblk->min_write_pgs = geo->ws_opt;
+   pblk->min_write_pgs_data = pblk->min_write_pgs;
max_write_ppas = pblk->min_write_pgs * geo->all_luns;
pblk->max_write_pgs = min_t(i

[PATCH v5 0/5] lightnvm: Flexible metadata

2018-11-30 Thread Igor Konopko
This series of patches extends the way pblk can store L2P sector
metadata. After this set of changes, any size of NVMe metadata is
supported in pblk. There is also support for the case without NVMe
metadata.

Changes v4 --> v5:
-rebase on top of ocssd/for-4.21/core

Changes v3 --> v4:
-rename nvm_alloc_dma_pool() to nvm_create_dma_pool()
-split pblk_get_meta() calls and lba setting into
two operations for better core readability
-fixing compilation with CONFIG_NVM disabled
-getting rid of unnecessary memcpy for packed metadata
on write path
-support for drives with oob size >0 and <16B in packed
metadata mode
-minor commit message updates

Changes v2 --> v3:
-Rebase on top of ocssd/for-4.21/core
-get/set_meta_lba helpers were removed
-dma reallocation was replaced with single allocation
-oob metadata size was added to pblk structure
-proper checks on pblk creation were added
 
Changes v1 --> v2:
-Revert sector meta size back to 16b for pblk
-Dma pool for larger oob meta are handled in core instead of pblk
-Pblk oob meta helpers use __le64 as input/output instead of u64
-Other minor fixes based on v1 patch review

Igor Konopko (5):
  lightnvm: pblk: Move lba list to partial read context
  lightnvm: pblk: Helpers for OOB metadata
  lightnvm: Flexible DMA pool entry size
  lightnvm: Disable interleaved metadata
  lightnvm: pblk: Support for packed metadata

 drivers/lightnvm/core.c  |  9 --
 drivers/lightnvm/pblk-core.c | 61 +++--
 drivers/lightnvm/pblk-init.c | 44 +--
 drivers/lightnvm/pblk-map.c  | 20 +++-
 drivers/lightnvm/pblk-rb.c   |  3 ++
 drivers/lightnvm/pblk-read.c | 66 +++-
 drivers/lightnvm/pblk-recovery.c | 25 +--
 drivers/lightnvm/pblk-sysfs.c|  7 +
 drivers/lightnvm/pblk-write.c|  9 +++---
 drivers/lightnvm/pblk.h  | 24 +--
 drivers/nvme/host/lightnvm.c |  6 ++--
 include/linux/lightnvm.h |  3 +-
 12 files changed, 209 insertions(+), 68 deletions(-)

-- 
2.14.5



[PATCH v5 1/5] lightnvm: pblk: Move lba list to partial read context

2018-11-30 Thread Igor Konopko
Currently DMA-allocated memory is reused on partial read for the
lba_list_mem and lba_list_media arrays. In preparation for dynamic
DMA pool sizes we need to move these arrays into the pblk_pr_ctx
structure.
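
The memory cost of the move is easy to bound (NVM_MAX_VLBA = 64 as in
the driver):

	#include <stdio.h>

	#define NVM_MAX_VLBA 64

	int main(void)
	{
		/* Two __le64 arrays of NVM_MAX_VLBA entries grow each
		 * pblk_pr_ctx allocation by 1 KiB, which is what lets the
		 * DMA area shrink to metadata only. */
		unsigned long long lba_list_mem[NVM_MAX_VLBA];
		unsigned long long lba_list_media[NVM_MAX_VLBA];

		printf("added per pr_ctx: %zu bytes\n",
		       sizeof(lba_list_mem) + sizeof(lba_list_media));
		return 0;
	}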

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-read.c | 20 +---
 drivers/lightnvm/pblk.h  |  2 ++
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 9fba614adeeb..19917d3c19b3 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -224,7 +224,6 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
unsigned long *read_bitmap = pr_ctx->bitmap;
int nr_secs = pr_ctx->orig_nr_secs;
int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-   __le64 *lba_list_mem, *lba_list_media;
void *src_p, *dst_p;
int hole, i;
 
@@ -237,13 +236,9 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
rqd->ppa_list[0] = ppa;
}
 
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-   lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
-
for (i = 0; i < nr_secs; i++) {
-   lba_list_media[i] = meta_list[i].lba;
-   meta_list[i].lba = lba_list_mem[i];
+   pr_ctx->lba_list_media[i] = meta_list[i].lba;
+   meta_list[i].lba = pr_ctx->lba_list_mem[i];
}
 
/* Fill the holes in the original bio */
@@ -255,7 +250,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
	kref_put(&line->ref, pblk_line_put);
 
-   meta_list[hole].lba = lba_list_media[i];
+   meta_list[hole].lba = pr_ctx->lba_list_media[i];
 
src_bv = new_bio->bi_io_vec[i++];
dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@@ -295,13 +290,9 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
struct pblk_pr_ctx *pr_ctx;
struct bio *new_bio, *bio = r_ctx->private;
-   __le64 *lba_list_mem;
int nr_secs = rqd->nr_ppas;
int i;
 
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-
new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
@@ -312,12 +303,12 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
goto fail_free_pages;
}
 
-   pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
+   pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
if (!pr_ctx)
goto fail_free_pages;
 
for (i = 0; i < nr_secs; i++)
-   lba_list_mem[i] = meta_list[i].lba;
+   pr_ctx->lba_list_mem[i] = meta_list[i].lba;
 
new_bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@@ -325,7 +316,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
rqd->bio = new_bio;
rqd->nr_ppas = nr_holes;
 
-   pr_ctx->ppa_ptr = NULL;
pr_ctx->orig_bio = bio;
bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
pr_ctx->bio_init_idx = bio_init_idx;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index e5b88a25d4d6..0e9d3960ac4c 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -132,6 +132,8 @@ struct pblk_pr_ctx {
unsigned int bio_init_idx;
void *ppa_ptr;
dma_addr_t dma_ppa_list;
+   __le64 lba_list_mem[NVM_MAX_VLBA];
+   __le64 lba_list_media[NVM_MAX_VLBA];
 };
 
 /* Pad context */
-- 
2.14.5



[PATCH v4 5/5] lightnvm: pblk: Support for packed metadata

2018-11-28 Thread Igor Konopko
In the current pblk implementation, the l2p mapping for not-yet-closed
lines is always stored only in the OOB metadata and recovered from it.

Such a solution does not provide data integrity when the drive does
not have such OOB metadata space.

The goal of this patch is to add support for so-called packed
metadata, which stores the l2p mapping for open lines in the last
sector of every write unit.

After this set of changes, drives with an OOB size <16B will use
packed metadata, while those with >=16B will continue to use OOB
metadata.

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c | 48 
 drivers/lightnvm/pblk-init.c | 38 ++-
 drivers/lightnvm/pblk-map.c  |  4 ++--
 drivers/lightnvm/pblk-rb.c   |  3 +++
 drivers/lightnvm/pblk-read.c |  6 +
 drivers/lightnvm/pblk-recovery.c |  5 +++--
 drivers/lightnvm/pblk-sysfs.c|  7 ++
 drivers/lightnvm/pblk-write.c|  9 
 drivers/lightnvm/pblk.h  | 10 -
 9 files changed, 112 insertions(+), 18 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 1347d1a93dd0..a95e18de5beb 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk)
 {
	unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
 
-   if (secs_avail >= pblk->min_write_pgs)
+   if (secs_avail >= pblk->min_write_pgs_data)
pblk_write_kick(pblk);
 }
 
@@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
	struct pblk_line_meta *lm = &pblk->lm;
	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct list_head *move_list = NULL;
-   int vsc = le32_to_cpu(*line->vsc);
+   int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+   * (pblk->min_write_pgs - pblk->min_write_pgs_data);
+   int vsc = le32_to_cpu(*line->vsc) + packed_meta;
 
	lockdep_assert_held(&line->lock);
 
@@ -620,12 +622,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 }
 
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-  unsigned long secs_to_flush)
+  unsigned long secs_to_flush, bool skip_meta)
 {
int max = pblk->sec_per_write;
int min = pblk->min_write_pgs;
int secs_to_sync = 0;
 
+   if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs)
+   min = max = pblk->min_write_pgs_data;
+
if (secs_avail >= max)
secs_to_sync = max;
else if (secs_avail >= min)
@@ -852,7 +857,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 next_rq:
	memset(&rqd, 0, sizeof(struct nvm_rq));
 
-   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
rq_len = rq_ppas * geo->csecs;
 
bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2161,3 +2166,38 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
}
	spin_unlock(&pblk->trans_lock);
 }
+
+void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *buffer;
+
+   if (pblk_is_oob_meta_supported(pblk)) {
+   /* Just use OOB metadata buffer as always */
+   buffer = rqd->meta_list;
+   } else {
+   /* We need to reuse last page of request (packed metadata)
+* in similar way as traditional oob metadata
+*/
+   buffer = page_to_virt(
+   rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   }
+
+   return buffer;
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *meta_list = rqd->meta_list;
+   void *page;
+   int i = 0;
+
+   if (pblk_is_oob_meta_supported(pblk))
+   return;
+
+   page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   /* We need to fill oob meta buffer with data from packed metadata */
+   for (; i < rqd->nr_ppas; i++)
+   memcpy(pblk_get_meta(pblk, meta_list, i),
+   page + (i * sizeof(struct pblk_sec_meta)),
+   sizeof(struct pblk_sec_meta));
+}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index a728f861edd6..830ebe3d098a 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -403,6 +403,7 @@ static int pblk_core_init(struct pblk *pblk)
pblk->nr_flush_rst = 0;
 
pblk->min_write_pgs = geo->ws_opt;
+   pblk->min_write_pgs_data = pblk->min_write_pgs;
max_write_ppas = pblk->min_write_pgs * geo->all_luns;
pblk->max_write_pgs = min_t(int, max_write_ppas

[PATCH v4 4/5] lightnvm: Disable interleaved metadata

2018-11-28 Thread Igor Konopko
Currently pblk and lightnvm only check the size of the OOB metadata
and do not care whether this meta is located in a separate buffer or
is interleaved with the data in a single buffer.

In reality only the first scenario is supported, while the second mode
will break pblk functionality during any IO operation.

The goal of this patch is to block the creation of pblk devices in
case of interleaved metadata.

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-init.c | 6 ++++++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b67bca810eb7..a728f861edd6 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1179,6 +1179,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
return ERR_PTR(-EINVAL);
}
 
+   if (geo->ext) {
+   pblk_err(pblk, "extended metadata not supported\n");
+   kfree(pblk);
+   return ERR_PTR(-EINVAL);
+   }
+
	spin_lock_init(&pblk->resubmit_lock);
	spin_lock_init(&pblk->trans_lock);
	spin_lock_init(&pblk->lock);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 60ac32b03fb6..ceb92610f43b 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -982,6 +982,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
if (geo->version != NVM_OCSSD_SPEC_12) {
geo->csecs = 1 << ns->lba_shift;
geo->sos = ns->ms;
+   geo->ext = ns->ext;
}
 
if (nvm_create_dma_pool(ndev))
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 216b373b7fea..2717e6141b1a 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -357,6 +357,7 @@ struct nvm_geo {
u32 clba;   /* sectors per chunk */
u16 csecs;  /* sector size */
u16 sos;/* out-of-band area size */
+   boolext;/* metadata in extended data buffer */
 
/* device write constrains */
u32 ws_min; /* minimum write size */
-- 
2.14.5



[PATCH v4 3/5] lightnvm: Flexible DMA pool entry size

2018-11-28 Thread Igor Konopko
Currently the whole of lightnvm and pblk uses a single DMA pool, for
which the entry size is always equal to PAGE_SIZE. The PPA list always
needs 8B*64, so there is only 56B*64 of space left for OOB meta. Since
NVMe OOB meta can be bigger, such as 128B, this solution is not robust.

This patch adds the possibility to support OOB meta above 56B by
changing the DMA pool entry size based on the OOB meta size.

It also allows pblk to use OOB metadata >=16B.

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/core.c  | 30 --
 drivers/lightnvm/pblk-core.c |  8 
 drivers/lightnvm/pblk-init.c |  2 +-
 drivers/lightnvm/pblk-recovery.c |  4 ++--
 drivers/lightnvm/pblk.h  |  6 +-
 drivers/nvme/host/lightnvm.c | 15 +--
 include/linux/lightnvm.h |  7 ++-
 7 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 73ab3cf26868..e3a83e506458 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -1145,15 +1145,9 @@ int nvm_register(struct nvm_dev *dev)
if (!dev->q || !dev->ops)
return -EINVAL;
 
-   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
-   if (!dev->dma_pool) {
-   pr_err("nvm: could not create dma pool\n");
-   return -ENOMEM;
-   }
-
ret = nvm_init(dev);
if (ret)
-   goto err_init;
+   return ret;
 
/* register device with a supported media manager */
	down_write(&nvm_lock);
@@ -1161,9 +1155,6 @@ int nvm_register(struct nvm_dev *dev)
	up_write(&nvm_lock);
 
return 0;
-err_init:
-   dev->ops->destroy_dma_pool(dev->dma_pool);
-   return ret;
 }
 EXPORT_SYMBOL(nvm_register);
 
@@ -1187,6 +1178,25 @@ void nvm_unregister(struct nvm_dev *dev)
 }
 EXPORT_SYMBOL(nvm_unregister);
 
+int nvm_create_dma_pool(struct nvm_dev *dev)
+{
+   int exp_pool_size;
+
+   exp_pool_size = max_t(int, PAGE_SIZE,
+ (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
+   exp_pool_size = round_up(exp_pool_size, PAGE_SIZE);
+
+   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist",
+ exp_pool_size);
+   if (!dev->dma_pool) {
+   pr_err("nvm: could not create dma pool\n");
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(nvm_create_dma_pool);
+
 static int __nvm_configure_create(struct nvm_ioctl_create *create)
 {
struct nvm_dev *dev;
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index aeb10bd78c62..1347d1a93dd0 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -250,8 +250,8 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd)
if (rqd->nr_ppas == 1)
return 0;
 
-   rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
-   rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+   rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
+   rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk);
 
return 0;
 }
@@ -846,8 +846,8 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
if (!meta_list)
return -ENOMEM;
 
-   ppa_list = meta_list + pblk_dma_meta_size;
-   dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+   ppa_list = meta_list + pblk_dma_meta_size(pblk);
+   dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 next_rq:
	memset(&rqd, 0, sizeof(struct nvm_rq));
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 6e7a0c6c6655..b67bca810eb7 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -410,7 +410,7 @@ static int pblk_core_init(struct pblk *pblk)
pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
 
pblk->oob_meta_size = geo->sos;
-   if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) {
+   if (pblk->oob_meta_size < sizeof(struct pblk_sec_meta)) {
pblk_err(pblk, "Unsupported metadata size\n");
return -EINVAL;
}
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 6a30b9971283..52cbe06e3ebc 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -478,8 +478,8 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
if (!meta_list)
return -ENOMEM;
 
-   ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
-   dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+   ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk);
+   dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
data

[PATCH v4 1/5] lightnvm: pblk: Move lba list to partial read context

2018-11-28 Thread Igor Konopko
Currently DMA-allocated memory is reused on partial read for the
lba_list_mem and lba_list_media arrays. In preparation for dynamic
DMA pool sizes we need to move these arrays into the pblk_pr_ctx
structure.

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-read.c | 20 +---
 drivers/lightnvm/pblk.h  |  2 ++
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 9fba614adeeb..19917d3c19b3 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -224,7 +224,6 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
unsigned long *read_bitmap = pr_ctx->bitmap;
int nr_secs = pr_ctx->orig_nr_secs;
int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-   __le64 *lba_list_mem, *lba_list_media;
void *src_p, *dst_p;
int hole, i;
 
@@ -237,13 +236,9 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
rqd->ppa_list[0] = ppa;
}
 
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-   lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
-
for (i = 0; i < nr_secs; i++) {
-   lba_list_media[i] = meta_list[i].lba;
-   meta_list[i].lba = lba_list_mem[i];
+   pr_ctx->lba_list_media[i] = meta_list[i].lba;
+   meta_list[i].lba = pr_ctx->lba_list_mem[i];
}
 
/* Fill the holes in the original bio */
@@ -255,7 +250,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
	kref_put(&line->ref, pblk_line_put);
 
-   meta_list[hole].lba = lba_list_media[i];
+   meta_list[hole].lba = pr_ctx->lba_list_media[i];
 
src_bv = new_bio->bi_io_vec[i++];
dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@@ -295,13 +290,9 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
struct pblk_pr_ctx *pr_ctx;
struct bio *new_bio, *bio = r_ctx->private;
-   __le64 *lba_list_mem;
int nr_secs = rqd->nr_ppas;
int i;
 
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-
new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
@@ -312,12 +303,12 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
goto fail_free_pages;
}
 
-   pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
+   pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
if (!pr_ctx)
goto fail_free_pages;
 
for (i = 0; i < nr_secs; i++)
-   lba_list_mem[i] = meta_list[i].lba;
+   pr_ctx->lba_list_mem[i] = meta_list[i].lba;
 
new_bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@@ -325,7 +316,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
rqd->bio = new_bio;
rqd->nr_ppas = nr_holes;
 
-   pr_ctx->ppa_ptr = NULL;
pr_ctx->orig_bio = bio;
bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
pr_ctx->bio_init_idx = bio_init_idx;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index e5b88a25d4d6..0e9d3960ac4c 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -132,6 +132,8 @@ struct pblk_pr_ctx {
unsigned int bio_init_idx;
void *ppa_ptr;
dma_addr_t dma_ppa_list;
+   __le64 lba_list_mem[NVM_MAX_VLBA];
+   __le64 lba_list_media[NVM_MAX_VLBA];
 };
 
 /* Pad context */
-- 
2.14.5



[PATCH v4 0/5] lightnvm: Flexible metadata

2018-11-28 Thread Igor Konopko
This series of patches extends the way pblk can store L2P sector
metadata. After this set of changes, any size of NVMe metadata is
supported in pblk. There is also support for the case without NVMe
metadata.

Changes v3 --> v4:
-rename nvm_alloc_dma_pool() to nvm_create_dma_pool()
-split pblk_get_meta() calls and lba setting into
two operations for better core readability
-fixing compilation with CONFIG_NVM disabled
-getting rid of unnecessary memcpy for packed metadata
on write path
-support for drives with oob size >0 and <16B in packed
metadata mode
-minor commit message updates

Changes v2 --> v3:
-Rebase on top of ocssd/for-4.21/core
-get/set_meta_lba helpers were removed
-dma reallocation was replaced with single allocation
-oob metadata size was added to pblk structure
-proper checks on pblk creation were added
 
Changes v1 --> v2:
-Revert sector meta size back to 16b for pblk
-Dma pool for larger oob meta are handled in core instead of pblk
-Pblk oob meta helpers use __le64 as input/output instead of u64
-Other minor fixes based on v1 patch review

Igor Konopko (5):
  lightnvm: pblk: Move lba list to partial read context
  lightnvm: pblk: Helpers for OOB metadata
  lightnvm: Flexible DMA pool entry size
  lightnvm: Disable interleaved metadata
  lightnvm: pblk: Support for packed metadata

 drivers/lightnvm/core.c  | 30 --
 drivers/lightnvm/pblk-core.c | 61 +++--
 drivers/lightnvm/pblk-init.c | 44 +--
 drivers/lightnvm/pblk-map.c  | 20 +++-
 drivers/lightnvm/pblk-rb.c   |  3 ++
 drivers/lightnvm/pblk-read.c | 66 +++-
 drivers/lightnvm/pblk-recovery.c | 25 +--
 drivers/lightnvm/pblk-sysfs.c|  7 +
 drivers/lightnvm/pblk-write.c|  9 +++---
 drivers/lightnvm/pblk.h  | 24 +--
 drivers/nvme/host/lightnvm.c | 16 ++
 include/linux/lightnvm.h |  8 -
 12 files changed, 233 insertions(+), 80 deletions(-)

-- 
2.14.5



[PATCH v4 2/5] lightnvm: pblk: Helpers for OOB metadata

2018-11-28 Thread Igor Konopko
Currently pblk assumes that the size of the OOB metadata on the drive
is always equal to the size of struct pblk_sec_meta. This commit adds
helpers which will allow handling different sizes of OOB metadata on
the drive in the future. Still, after this patch only OOB metadata
equal to 16 bytes is supported.

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c |  5 +++--
 drivers/lightnvm/pblk-init.c |  6 +
 drivers/lightnvm/pblk-map.c  | 20 +++--
 drivers/lightnvm/pblk-read.c | 48 ++--
 drivers/lightnvm/pblk-recovery.c | 16 +-
 drivers/lightnvm/pblk.h  |  6 +
 6 files changed, 69 insertions(+), 32 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6581c35f51ee..aeb10bd78c62 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -796,10 +796,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
rqd.is_seq = 1;
 
for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-   struct pblk_sec_meta *meta_list = rqd.meta_list;
+   struct pblk_sec_meta *meta = pblk_get_meta(pblk,
+  rqd.meta_list, i);
 
rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-   meta_list[i].lba = lba_list[paddr] = addr_empty;
+   meta->lba = lba_list[paddr] = addr_empty;
}
 
	ret = pblk_submit_io_sync_sem(pblk, &rqd);
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 0e37104de596..6e7a0c6c6655 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -409,6 +409,12 @@ static int pblk_core_init(struct pblk *pblk)
queue_max_hw_sectors(dev->q) / (geo->csecs >> SECTOR_SHIFT));
pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
 
+   pblk->oob_meta_size = geo->sos;
+   if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) {
+   pblk_err(pblk, "Unsupported metadata size\n");
+   return -EINVAL;
+   }
+
pblk->pad_dist = kcalloc(pblk->min_write_pgs - 1, sizeof(atomic64_t),
GFP_KERNEL);
if (!pblk->pad_dist)
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 5a3c28cce8ab..81e503ec384e 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -22,7 +22,7 @@
 static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
  struct ppa_addr *ppa_list,
  unsigned long *lun_bitmap,
- struct pblk_sec_meta *meta_list,
+ void *meta_list,
  unsigned int valid_secs)
 {
struct pblk_line *line = pblk_line_get_data(pblk);
@@ -58,6 +58,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
paddr = pblk_alloc_page(pblk, line, nr_secs);
 
for (i = 0; i < nr_secs; i++, paddr++) {
+   struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
 
/* ppa to be sent to the device */
@@ -74,14 +75,15 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
	kref_get(&line->ref);
	w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
w_ctx->ppa = ppa_list[i];
-   meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+   meta->lba = cpu_to_le64(w_ctx->lba);
lba_list[paddr] = cpu_to_le64(w_ctx->lba);
if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
else
	atomic64_inc(&pblk->pad_wa);
} else {
-   lba_list[paddr] = meta_list[i].lba = addr_empty;
+   lba_list[paddr] = addr_empty;
+   meta->lba = addr_empty;
__pblk_map_invalidate(pblk, line, paddr);
}
}
@@ -94,7 +96,8 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 unsigned long *lun_bitmap, unsigned int valid_secs,
 unsigned int off)
 {
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   void *meta_list = rqd->meta_list;
+   void *meta_buffer;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
unsigned int map_secs;
int min = pblk->min_write_pgs;
@@ -103,9 +106,10 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 
for (i = off; i < rqd->nr_ppas; i += min) {
  

[PATCH] nvme: Fix PCIe surprise removal scenario

2018-11-23 Thread Igor Konopko
This patch fixes a kernel OOPS in the surprise removal scenario for
PCIe-connected NVMe drives.

After the latest changes, when the PCIe device is not present,
nvme_dev_remove_admin() calls blk_cleanup_queue() on the admin queue,
which frees the hctx for that queue. A moment later, on the same path,
nvme_kill_queues() calls blk_mq_unquiesce_queue() on the admin queue
and tries to access its hctx, which leads to the following OOPS
scenario:

Oops:  [#1] SMP PTI
RIP: 0010:sbitmap_any_bit_set+0xb/0x40
Call Trace:
 blk_mq_run_hw_queue+0xd5/0x150
 blk_mq_run_hw_queues+0x3a/0x50
 nvme_kill_queues+0x26/0x50
 nvme_remove_namespaces+0xb2/0xc0
 nvme_remove+0x60/0x140
 pci_device_remove+0x3b/0xb0

Fixes: cb4bfda62afa2 ("nvme-pci: fix hot removal during error handling")
Signed-off-by: Igor Konopko 
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 65c42448e904..5aff95389694 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3601,7 +3601,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
	down_read(&ctrl->namespaces_rwsem);
 
/* Forcibly unquiesce queues to avoid blocking dispatch */
-   if (ctrl->admin_q)
+   if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
blk_mq_unquiesce_queue(ctrl->admin_q);
 
	list_for_each_entry(ns, &ctrl->namespaces, list)
-- 
2.14.5



[PATCH v3 5/5] lightnvm: pblk: Support for packed metadata

2018-11-23 Thread Igor Konopko
In the current pblk implementation, the l2p mapping for not-yet-closed
lines is always stored only in the OOB metadata and recovered from it.

Such a solution does not provide data integrity when the drive does
not have such OOB metadata space.

The goal of this patch is to add support for so-called packed
metadata, which stores the l2p mapping for open lines in the last
sector of every write unit.

After this set of changes, drives with OOB size >0 and <16B are still
not supported.

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c | 53 +---
 drivers/lightnvm/pblk-init.c | 37 +---
 drivers/lightnvm/pblk-rb.c   |  3 +++
 drivers/lightnvm/pblk-read.c |  6 +
 drivers/lightnvm/pblk-recovery.c |  5 ++--
 drivers/lightnvm/pblk-sysfs.c|  7 ++
 drivers/lightnvm/pblk-write.c| 14 ---
 drivers/lightnvm/pblk.h  | 10 +++-
 8 files changed, 121 insertions(+), 14 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 2ebd3b079a96..615817bf97e3 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk)
 {
	unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
 
-   if (secs_avail >= pblk->min_write_pgs)
+   if (secs_avail >= pblk->min_write_pgs_data)
pblk_write_kick(pblk);
 }
 
@@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
	struct pblk_line_meta *lm = &pblk->lm;
	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct list_head *move_list = NULL;
-   int vsc = le32_to_cpu(*line->vsc);
+   int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+   * (pblk->min_write_pgs - pblk->min_write_pgs_data);
+   int vsc = le32_to_cpu(*line->vsc) + packed_meta;
 
	lockdep_assert_held(&line->lock);
 
@@ -620,12 +622,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 }
 
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-  unsigned long secs_to_flush)
+  unsigned long secs_to_flush, bool skip_meta)
 {
int max = pblk->sec_per_write;
int min = pblk->min_write_pgs;
int secs_to_sync = 0;
 
+   if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs)
+   min = max = pblk->min_write_pgs_data;
+
if (secs_avail >= max)
secs_to_sync = max;
else if (secs_avail >= min)
@@ -852,7 +857,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 next_rq:
	memset(&rqd, 0, sizeof(struct nvm_rq));
 
-   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
rq_len = rq_ppas * geo->csecs;
 
bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2161,3 +2166,43 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
}
	spin_unlock(&pblk->trans_lock);
 }
+
+void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *meta_list = rqd->meta_list;
+   void *page;
+   int i = 0;
+
+   if (pblk_is_oob_meta_supported(pblk))
+   return;
+
+   /* We need to zero out metadata corresponding to packed meta page */
+   pblk_get_meta(pblk, meta_list, rqd->nr_ppas - 1)->lba =
+   cpu_to_le64(ADDR_EMPTY);
+
+   page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   /* We need to fill last page of request (packed metadata)
+* with data from oob meta buffer.
+*/
+   for (; i < rqd->nr_ppas; i++)
+   memcpy(page + (i * sizeof(struct pblk_sec_meta)),
+   pblk_get_meta(pblk, meta_list, i),
+   sizeof(struct pblk_sec_meta));
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *meta_list = rqd->meta_list;
+   void *page;
+   int i = 0;
+
+   if (pblk_is_oob_meta_supported(pblk))
+   return;
+
+   page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   /* We need to fill oob meta buffer with data from packed metadata */
+   for (; i < rqd->nr_ppas; i++)
+   memcpy(pblk_get_meta(pblk, meta_list, i),
+   page + (i * sizeof(struct pblk_sec_meta)),
+   sizeof(struct pblk_sec_meta));
+}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index a728f861edd6..5536338eea76 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -403,6 +403,7 @@ static int pblk_core_init(struct pblk *pblk)
pblk->nr_flush_rst = 0;
 
	pblk->min_write_pgs = geo->ws_opt;

[PATCH v3 4/5] lightnvm: Disable interleaved metadata

2018-11-23 Thread Igor Konopko
Currently pblk and lightnvm only check the size of the OOB metadata
and do not care whether this meta is located in a separate buffer or
is interleaved with the data in a single buffer.

In reality only the first scenario is supported, while the second mode
will break pblk functionality during any IO operation.

The goal of this patch is to block the creation of pblk devices in
case of interleaved metadata.

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-init.c | 6 ++++++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b67bca810eb7..a728f861edd6 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1179,6 +1179,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
return ERR_PTR(-EINVAL);
}
 
+   if (geo->ext) {
+   pblk_err(pblk, "extended metadata not supported\n");
+   kfree(pblk);
+   return ERR_PTR(-EINVAL);
+   }
+
	spin_lock_init(&pblk->resubmit_lock);
	spin_lock_init(&pblk->trans_lock);
	spin_lock_init(&pblk->lock);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 55076912a673..049425ad8592 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -982,6 +982,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
if (geo->version != NVM_OCSSD_SPEC_12) {
geo->csecs = 1 << ns->lba_shift;
geo->sos = ns->ms;
+   geo->ext = ns->ext;
}
 
if (nvm_alloc_dma_pool(ndev))
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 8b4564c17656..fd7b519f3ad2 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -357,6 +357,7 @@ struct nvm_geo {
u32 clba;   /* sectors per chunk */
u16 csecs;  /* sector size */
u16 sos;/* out-of-band area size */
+   boolext;/* metadata in extended data buffer */
 
/* device write constrains */
u32 ws_min; /* minimum write size */
-- 
2.14.4



[PATCH v3 0/5] lightnvm: Flexible metadata

2018-11-23 Thread Igor Konopko
This series of patches extends the way pblk can store L2P sector
metadata. After this set of changes, any size of NVMe metadata above
16B is supported in pblk. There is also support for the case without
NVMe metadata.

Changes v2 --> v3:
-Rebase on top of ocssd/for-4.21/core
-get/set_meta_lba helpers were removed
-dma reallocation was replaced with single allocation
-oob metadata size was added to pblk structure
-proper checks on pblk creation were added
 
Changes v1 --> v2:
-Revert sector meta size back to 16b for pblk
-Dma pool for larger oob meta are handled in core instead of pblk
-Pblk oob meta helpers use __le64 as input/output instead of u64
-Other minor fixes based on v1 patch review

Igor Konopko (5):
  lightnvm: pblk: Move lba list to partial read context
  lightnvm: pblk: Helpers for OOB metadata
  lightnvm: Flexible DMA pool entry size
  lightnvm: Disable interleaved metadata
  lightnvm: pblk: Support for packed metadata

 drivers/lightnvm/core.c  | 30 --
 drivers/lightnvm/pblk-core.c | 66 ++--
 drivers/lightnvm/pblk-init.c | 47 ++--
 drivers/lightnvm/pblk-map.c  | 21 -
 drivers/lightnvm/pblk-rb.c   |  3 ++
 drivers/lightnvm/pblk-read.c | 60 
 drivers/lightnvm/pblk-recovery.c | 22 --
 drivers/lightnvm/pblk-sysfs.c|  7 +
 drivers/lightnvm/pblk-write.c| 14 ++---
 drivers/lightnvm/pblk.h  | 24 +--
 drivers/nvme/host/lightnvm.c | 16 ++
 include/linux/lightnvm.h |  4 ++-
 12 files changed, 235 insertions(+), 79 deletions(-)

-- 
2.14.4



[PATCH v3 2/5] lightnvm: pblk: Helpers for OOB metadata

2018-11-23 Thread Igor Konopko
Currently pblk assumes that the size of the OOB metadata on the drive
is always equal to the size of the pblk_sec_meta struct. This commit
adds helpers which will allow handling different sizes of OOB metadata
on the drive in the future. Still, after this patch only OOB metadata
equal to 16b is supported.
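
The pblk.h hunk is truncated in this archive; a minimal sketch of
what the new accessor boils down to, assuming pblk->oob_meta_size
caches geo->sos (validated against sizeof(struct pblk_sec_meta) in
pblk_core_init() below):

static inline struct pblk_sec_meta *pblk_get_meta(struct pblk *pblk,
						  void *meta, int index)
{
	/* per-sector OOB entries are laid out back to back */
	return meta + pblk->oob_meta_size * index;
}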

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c |  5 +++--
 drivers/lightnvm/pblk-init.c |  6 ++
 drivers/lightnvm/pblk-map.c  | 21 +---
 drivers/lightnvm/pblk-read.c | 42 +---
 drivers/lightnvm/pblk-recovery.c | 13 +++--
 drivers/lightnvm/pblk.h  |  6 ++
 6 files changed, 62 insertions(+), 31 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6581c35f51ee..9509d6dbed53 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -796,10 +796,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, 
struct pblk_line *line,
rqd.is_seq = 1;
 
for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-   struct pblk_sec_meta *meta_list = rqd.meta_list;
+   void *meta_list = rqd.meta_list;
 
rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-   meta_list[i].lba = lba_list[paddr] = addr_empty;
+   pblk_get_meta(pblk, meta_list, i)->lba = lba_list[paddr] =
+   addr_empty;
}
 
ret = pblk_submit_io_sync_sem(pblk, &rqd);
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 0e37104de596..6e7a0c6c6655 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -409,6 +409,12 @@ static int pblk_core_init(struct pblk *pblk)
queue_max_hw_sectors(dev->q) / (geo->csecs >> SECTOR_SHIFT));
pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
 
+   pblk->oob_meta_size = geo->sos;
+   if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) {
+   pblk_err(pblk, "Unsupported metadata size\n");
+   return -EINVAL;
+   }
+
pblk->pad_dist = kcalloc(pblk->min_write_pgs - 1, sizeof(atomic64_t),
GFP_KERNEL);
if (!pblk->pad_dist)
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 5a3c28cce8ab..0c6d962bad78 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -22,7 +22,7 @@
 static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
  struct ppa_addr *ppa_list,
  unsigned long *lun_bitmap,
- struct pblk_sec_meta *meta_list,
+ void *meta_list,
  unsigned int valid_secs)
 {
struct pblk_line *line = pblk_line_get_data(pblk);
@@ -74,14 +74,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned 
int sentry,
kref_get(&line->ref);
w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
w_ctx->ppa = ppa_list[i];
-   meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+   pblk_get_meta(pblk, meta_list, i)->lba =
+   cpu_to_le64(w_ctx->lba);
lba_list[paddr] = cpu_to_le64(w_ctx->lba);
if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
else
atomic64_inc(&line->pad_wa);
} else {
-   lba_list[paddr] = meta_list[i].lba = addr_empty;
+   lba_list[paddr] = addr_empty;
+   pblk_get_meta(pblk, meta_list, i)->lba =
+   addr_empty;
__pblk_map_invalidate(pblk, line, paddr);
}
}
@@ -94,7 +97,8 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, 
unsigned int sentry,
 unsigned long *lun_bitmap, unsigned int valid_secs,
 unsigned int off)
 {
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   void *meta_list = rqd->meta_list;
+   void *meta_buffer;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
unsigned int map_secs;
int min = pblk->min_write_pgs;
@@ -103,9 +107,10 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, 
unsigned int sentry,
 
for (i = off; i < rqd->nr_ppas; i += min) {
map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
+   meta_buffer = pblk_get_meta(pblk, meta_list, i);
 
ret = pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
-

[PATCH v3 3/5] lightnvm: Flexible DMA pool entry size

2018-11-23 Thread Igor Konopko
Currently the whole of lightnvm and pblk uses a single DMA pool,
for which the entry size is always equal to PAGE_SIZE. The PPA list
always needs 8b*64, so there is only 56b*64 of space left for OOB
meta. Since NVMe OOB meta can be bigger, such as 128b, this solution
is not robust.

This patch adds the possibility to support OOB meta above 56b by
changing the DMA pool size based on the OOB meta size.

It also allows pblk to use OOB metadata >=16b.
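
A worked example of the new sizing, with illustrative values
(NVM_MAX_VLBA = 64, 8-byte PPA entries, 4 KiB pages):

static size_t exp_pool_size(const struct nvm_geo *geo)
{
	/* mirrors nvm_alloc_dma_pool(): PPA list + OOB meta, per sector */
	size_t sz = NVM_MAX_VLBA * (sizeof(u64) + geo->sos);

	return round_up(max_t(size_t, PAGE_SIZE, sz), PAGE_SIZE);
}

/*
 * geo->sos = 128:  64 * (8 + 128) = 8704 -> rounded up to 12288 (3 pages)
 * geo->sos = 16:   64 * (8 + 16)  = 1536 -> stays at PAGE_SIZE (4096)
 */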

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/core.c  | 30 --
 drivers/lightnvm/pblk-core.c |  8 
 drivers/lightnvm/pblk-init.c |  2 +-
 drivers/lightnvm/pblk-recovery.c |  4 ++--
 drivers/lightnvm/pblk.h  |  6 +-
 drivers/nvme/host/lightnvm.c | 15 +--
 include/linux/lightnvm.h |  3 ++-
 7 files changed, 43 insertions(+), 25 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 73ab3cf26868..c3650b141a30 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -1145,15 +1145,9 @@ int nvm_register(struct nvm_dev *dev)
if (!dev->q || !dev->ops)
return -EINVAL;
 
-   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
-   if (!dev->dma_pool) {
-   pr_err("nvm: could not create dma pool\n");
-   return -ENOMEM;
-   }
-
ret = nvm_init(dev);
if (ret)
-   goto err_init;
+   return ret;
 
/* register device with a supported media manager */
down_write(&nvm_lock);
@@ -1161,9 +1155,6 @@ int nvm_register(struct nvm_dev *dev)
up_write(&nvm_lock);
 
return 0;
-err_init:
-   dev->ops->destroy_dma_pool(dev->dma_pool);
-   return ret;
 }
 EXPORT_SYMBOL(nvm_register);
 
@@ -1187,6 +1178,25 @@ void nvm_unregister(struct nvm_dev *dev)
 }
 EXPORT_SYMBOL(nvm_unregister);
 
+int nvm_alloc_dma_pool(struct nvm_dev *dev)
+{
+   int exp_pool_size;
+
+   exp_pool_size = max_t(int, PAGE_SIZE,
+ (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
+   exp_pool_size = round_up(exp_pool_size, PAGE_SIZE);
+
+   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist",
+ exp_pool_size);
+   if (!dev->dma_pool) {
+   pr_err("nvm: could not create dma pool\n");
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(nvm_alloc_dma_pool);
+
 static int __nvm_configure_create(struct nvm_ioctl_create *create)
 {
struct nvm_dev *dev;
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 9509d6dbed53..2ebd3b079a96 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -250,8 +250,8 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq 
*rqd)
if (rqd->nr_ppas == 1)
return 0;
 
-   rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
-   rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+   rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
+   rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk);
 
return 0;
 }
@@ -846,8 +846,8 @@ int pblk_line_emeta_read(struct pblk *pblk, struct 
pblk_line *line,
if (!meta_list)
return -ENOMEM;
 
-   ppa_list = meta_list + pblk_dma_meta_size;
-   dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+   ppa_list = meta_list + pblk_dma_meta_size(pblk);
+   dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 next_rq:
memset(&rqd, 0, sizeof(struct nvm_rq));
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 6e7a0c6c6655..b67bca810eb7 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -410,7 +410,7 @@ static int pblk_core_init(struct pblk *pblk)
pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
 
pblk->oob_meta_size = geo->sos;
-   if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) {
+   if (pblk->oob_meta_size < sizeof(struct pblk_sec_meta)) {
pblk_err(pblk, "Unsupported metadata size\n");
return -EINVAL;
}
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 902c54ab1318..5bb8a2a4f87b 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -475,8 +475,8 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, 
struct pblk_line *line)
if (!meta_list)
return -ENOMEM;
 
-   ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
-   dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+   ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk);
+   dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
data = kcalloc(pblk->max_write_pgs

[PATCH v3 1/5] lightnvm: pblk: Move lba list to partial read context

2018-11-23 Thread Igor Konopko
Currently DMA-allocated memory is reused on partial read
for the lba_list_mem and lba_list_media arrays. In preparation
for dynamic DMA pool sizes we need to move these arrays
into the pblk_pr_ctx structure.

Reviewed-by: Javier González 
Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-read.c | 20 +---
 drivers/lightnvm/pblk.h  |  2 ++
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 9fba614adeeb..19917d3c19b3 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -224,7 +224,6 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
unsigned long *read_bitmap = pr_ctx->bitmap;
int nr_secs = pr_ctx->orig_nr_secs;
int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-   __le64 *lba_list_mem, *lba_list_media;
void *src_p, *dst_p;
int hole, i;
 
@@ -237,13 +236,9 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
rqd->ppa_list[0] = ppa;
}
 
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-   lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
-
for (i = 0; i < nr_secs; i++) {
-   lba_list_media[i] = meta_list[i].lba;
-   meta_list[i].lba = lba_list_mem[i];
+   pr_ctx->lba_list_media[i] = meta_list[i].lba;
+   meta_list[i].lba = pr_ctx->lba_list_mem[i];
}
 
/* Fill the holes in the original bio */
@@ -255,7 +250,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
kref_put(&line->ref, pblk_line_put);
 
-   meta_list[hole].lba = lba_list_media[i];
+   meta_list[hole].lba = pr_ctx->lba_list_media[i];
 
src_bv = new_bio->bi_io_vec[i++];
dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@@ -295,13 +290,9 @@ static int pblk_setup_partial_read(struct pblk *pblk, 
struct nvm_rq *rqd,
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
struct pblk_pr_ctx *pr_ctx;
struct bio *new_bio, *bio = r_ctx->private;
-   __le64 *lba_list_mem;
int nr_secs = rqd->nr_ppas;
int i;
 
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-
new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
@@ -312,12 +303,12 @@ static int pblk_setup_partial_read(struct pblk *pblk, 
struct nvm_rq *rqd,
goto fail_free_pages;
}
 
-   pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
+   pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
if (!pr_ctx)
goto fail_free_pages;
 
for (i = 0; i < nr_secs; i++)
-   lba_list_mem[i] = meta_list[i].lba;
+   pr_ctx->lba_list_mem[i] = meta_list[i].lba;
 
new_bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@@ -325,7 +316,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, 
struct nvm_rq *rqd,
rqd->bio = new_bio;
rqd->nr_ppas = nr_holes;
 
-   pr_ctx->ppa_ptr = NULL;
pr_ctx->orig_bio = bio;
bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
pr_ctx->bio_init_idx = bio_init_idx;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index e5b88a25d4d6..0e9d3960ac4c 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -132,6 +132,8 @@ struct pblk_pr_ctx {
unsigned int bio_init_idx;
void *ppa_ptr;
dma_addr_t dma_ppa_list;
+   __le64 lba_list_mem[NVM_MAX_VLBA];
+   __le64 lba_list_media[NVM_MAX_VLBA];
 };
 
 /* Pad context */
-- 
2.14.4



[PATCH v2 5/5] lightnvm: pblk: Support for packed metadata

2018-10-22 Thread Igor Konopko
In the current pblk implementation, the l2p mapping for not-closed
lines is always stored only in OOB metadata and recovered from it.

Such a solution does not provide data integrity when drives do
not have such an OOB metadata space.

The goal of this patch is to add support for so-called packed
metadata, which stores the l2p mapping for open lines in the last
sector of every write unit.
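
A quick numeric illustration of the gc accounting this requires
(values assumed for illustration): with min_write_pgs = 8 and
min_write_pgs_data = 7, one sector out of every eight holds packed
metadata rather than user data, so for a line with 70 valid user
sectors:

	packed_meta = (70 / 7) * (8 - 7);	/* = 10 meta sectors  */
	vsc = 70 + packed_meta;			/* = 80 to account on gc */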

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c | 53 +---
 drivers/lightnvm/pblk-init.c | 37 ++--
 drivers/lightnvm/pblk-rb.c   |  3 +++
 drivers/lightnvm/pblk-read.c |  6 +
 drivers/lightnvm/pblk-recovery.c |  5 ++--
 drivers/lightnvm/pblk-sysfs.c|  7 ++
 drivers/lightnvm/pblk-write.c| 14 ---
 drivers/lightnvm/pblk.h  | 13 +-
 8 files changed, 125 insertions(+), 13 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index b1e104765868..245abf29620f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk)
 {
unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
 
-   if (secs_avail >= pblk->min_write_pgs)
+   if (secs_avail >= pblk->min_write_pgs_data)
pblk_write_kick(pblk);
 }
 
@@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, 
struct pblk_line *line)
struct pblk_line_meta *lm = >lm;
struct pblk_line_mgmt *l_mg = >l_mg;
struct list_head *move_list = NULL;
-   int vsc = le32_to_cpu(*line->vsc);
+   int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+   * (pblk->min_write_pgs - pblk->min_write_pgs_data);
+   int vsc = le32_to_cpu(*line->vsc) + packed_meta;
 
lockdep_assert_held(&line->lock);
 
@@ -620,12 +622,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void 
*data,
 }
 
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-  unsigned long secs_to_flush)
+  unsigned long secs_to_flush, bool skip_meta)
 {
int max = pblk->sec_per_write;
int min = pblk->min_write_pgs;
int secs_to_sync = 0;
 
+   if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs)
+   min = max = pblk->min_write_pgs_data;
+
if (secs_avail >= max)
secs_to_sync = max;
else if (secs_avail >= min)
@@ -852,7 +857,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct 
pblk_line *line,
 next_rq:
memset(&rqd, 0, sizeof(struct nvm_rq));
 
-   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
rq_len = rq_ppas * geo->csecs;
 
bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2161,3 +2166,43 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct 
ppa_addr *ppas,
}
spin_unlock(&pblk->trans_lock);
 }
+
+void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *meta_list = rqd->meta_list;
+   void *page;
+   int i = 0;
+
+   if (pblk_is_oob_meta_supported(pblk))
+   return;
+
+   /* We need to zero out metadata corresponding to packed meta page */
+   pblk_set_meta_lba(pblk, meta_list, rqd->nr_ppas - 1,
+ cpu_to_le64(ADDR_EMPTY));
+
+   page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   /* We need to fill last page of request (packed metadata)
+* with data from oob meta buffer.
+*/
+   for (; i < rqd->nr_ppas; i++)
+   memcpy(page + (i * sizeof(struct pblk_sec_meta)),
+   pblk_get_meta(pblk, meta_list, i),
+   sizeof(struct pblk_sec_meta));
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *meta_list = rqd->meta_list;
+   void *page;
+   int i = 0;
+
+   if (pblk_is_oob_meta_supported(pblk))
+   return;
+
+   page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   /* We need to fill oob meta buffer with data from packed metadata */
+   for (; i < rqd->nr_ppas; i++)
+   memcpy(pblk_get_meta(pblk, meta_list, i),
+   page + (i * sizeof(struct pblk_sec_meta)),
+   sizeof(struct pblk_sec_meta));
+}
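
For context, a sketch of how the two helpers pair up; the call sites
below are assumed for illustration, simplified from the rest of the
series:

/*
 * write path:    pblk_set_packed_meta(pblk, rqd) copies the OOB meta
 *                buffer into the last page of the bio before submit;
 * recovery path: pblk_get_packed_meta(pblk, rqd) copies that page back
 *                into the OOB meta buffer once the read completes.
 * Both are no-ops when the drive has real OOB space, i.e. when
 * pblk_is_oob_meta_supported() returns true.
 */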
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index ded0618f6cda..7e09717a93d4 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -406,12 +406,44 @@ static int pblk_core_init(struct pblk *pblk)
pblk->nr_flush_rst = 0;
 
pblk->min_write_pgs = geo->ws_opt;
+   pblk->min_write_pgs_data = pblk->min_write_pgs;
max_write_ppas = pblk->min_write_pgs * geo->all_luns;

[PATCH v2 2/5] lightnvm: pblk: Helpers for OOB metadata

2018-10-22 Thread Igor Konopko
Currently pblk assumes that the size of the OOB metadata on the drive
is always equal to the size of the pblk_sec_meta struct. This commit
adds helpers which will allow handling different sizes of OOB metadata
on the drive.
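
The pblk.h hunk with the helper definitions is truncated in this
archive; a plausible shape of the v2 setter, assuming pblk_get_meta()
as the underlying accessor (both names taken from the hunks below):

static inline void pblk_set_meta_lba(struct pblk *pblk, void *meta,
				     int index, __le64 lba)
{
	struct pblk_sec_meta *m = pblk_get_meta(pblk, meta, index);

	m->lba = lba;
}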

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c |  5 +++--
 drivers/lightnvm/pblk-map.c  | 20 +++---
 drivers/lightnvm/pblk-read.c | 45 +---
 drivers/lightnvm/pblk-recovery.c | 13 ++--
 drivers/lightnvm/pblk.h  | 22 
 5 files changed, 73 insertions(+), 32 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6944aac43b01..0f33055f40eb 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -796,10 +796,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, 
struct pblk_line *line,
rqd.is_seq = 1;
 
for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-   struct pblk_sec_meta *meta_list = rqd.meta_list;
+   void *meta_list = rqd.meta_list;
 
rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-   meta_list[i].lba = lba_list[paddr] = addr_empty;
+   pblk_set_meta_lba(pblk, meta_list, i, addr_empty);
+   lba_list[paddr] = addr_empty;
}
 
ret = pblk_submit_io_sync_sem(pblk, &rqd);
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 6dcbd44e3acb..4bae30129bc9 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -22,7 +22,7 @@
 static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
  struct ppa_addr *ppa_list,
  unsigned long *lun_bitmap,
- struct pblk_sec_meta *meta_list,
+ void *meta_list,
  unsigned int valid_secs)
 {
struct pblk_line *line = pblk_line_get_data(pblk);
@@ -68,14 +68,16 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned 
int sentry,
kref_get(&line->ref);
w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
w_ctx->ppa = ppa_list[i];
-   meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+   pblk_set_meta_lba(pblk, meta_list, i,
+ cpu_to_le64(w_ctx->lba));
lba_list[paddr] = cpu_to_le64(w_ctx->lba);
if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
else
atomic64_inc(&line->pad_wa);
} else {
-   lba_list[paddr] = meta_list[i].lba = addr_empty;
+   lba_list[paddr] = addr_empty;
+   pblk_set_meta_lba(pblk, meta_list, i, addr_empty);
__pblk_map_invalidate(pblk, line, paddr);
}
}
@@ -88,7 +90,8 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, 
unsigned int sentry,
 unsigned long *lun_bitmap, unsigned int valid_secs,
 unsigned int off)
 {
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   void *meta_list = rqd->meta_list;
+   void *meta_buffer;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
unsigned int map_secs;
int min = pblk->min_write_pgs;
@@ -96,8 +99,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, 
unsigned int sentry,
 
for (i = off; i < rqd->nr_ppas; i += min) {
map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
+   meta_buffer = pblk_get_meta(pblk, meta_list, i);
if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
-   lun_bitmap, &meta_list[i], map_secs)) {
+   lun_bitmap, meta_buffer, map_secs)) {
bio_put(rqd->bio);
pblk_free_rqd(pblk, rqd, PBLK_WRITE);
pblk_pipeline_stop(pblk);
@@ -113,7 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq 
*rqd,
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = >geo;
struct pblk_line_meta *lm = >lm;
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   void *meta_list = rqd->meta_list;
+   void *meta_buffer;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
struct pblk_line *e_line, *d_line;
unsigned int map_secs;
@@ -122,8 +127,9 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq 
*rqd,
 
for (i = 0; i < rqd->nr_ppas; i += min) {
map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
meta_buffer = pblk_get_meta(pblk, meta_list, i);

[PATCH v2 3/5] lightnvm: Flexible DMA pool entry size

2018-10-22 Thread Igor Konopko
Currently the whole of lightnvm and pblk uses a single DMA pool,
for which the entry size is always equal to PAGE_SIZE. The PPA list
always needs 8b*64, so there is only 56b*64 of space left for OOB
meta. Since NVMe OOB meta can be bigger, such as 128b, this solution
is not robust.

This patch adds the possibility to support OOB meta above 56b by
changing the DMA pool size based on the OOB meta size.

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/core.c  | 45 ++--
 drivers/lightnvm/pblk-core.c |  8 +++
 drivers/lightnvm/pblk-recovery.c |  4 ++--
 drivers/lightnvm/pblk.h  | 10 -
 drivers/nvme/host/lightnvm.c |  8 +--
 include/linux/lightnvm.h |  4 +++-
 6 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index efb976a863d2..68f0812077d5 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -1145,11 +1145,9 @@ int nvm_register(struct nvm_dev *dev)
if (!dev->q || !dev->ops)
return -EINVAL;
 
-   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
-   if (!dev->dma_pool) {
-   pr_err("nvm: could not create dma pool\n");
-   return -ENOMEM;
-   }
+   ret = nvm_realloc_dma_pool(dev);
+   if (ret)
+   return ret;
 
ret = nvm_init(dev);
if (ret)
@@ -1162,7 +1160,12 @@ int nvm_register(struct nvm_dev *dev)
 
return 0;
 err_init:
-   dev->ops->destroy_dma_pool(dev->dma_pool);
+   if (dev->dma_pool) {
+   dev->ops->destroy_dma_pool(dev->dma_pool);
+   dev->dma_pool = NULL;
+   dev->dma_pool_size = 0;
+   }
+
return ret;
 }
 EXPORT_SYMBOL(nvm_register);
@@ -1187,6 +1190,36 @@ void nvm_unregister(struct nvm_dev *dev)
 }
 EXPORT_SYMBOL(nvm_unregister);
 
+int nvm_realloc_dma_pool(struct nvm_dev *dev)
+{
+   int exp_pool_size;
+
+   exp_pool_size = max_t(int, PAGE_SIZE,
+ (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
+   exp_pool_size = round_up(exp_pool_size, PAGE_SIZE);
+
+   if (dev->dma_pool_size >= exp_pool_size)
+   return 0;
+
+   if (dev->dma_pool) {
+   dev->ops->destroy_dma_pool(dev->dma_pool);
+   dev->dma_pool = NULL;
+   dev->dma_pool_size = 0;
+   }
+
+   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist",
+ exp_pool_size);
+   if (!dev->dma_pool) {
+   dev->dma_pool_size = 0;
+   pr_err("nvm: could not create dma pool\n");
+   return -ENOMEM;
+   }
+   dev->dma_pool_size = exp_pool_size;
+
+   return 0;
+}
+EXPORT_SYMBOL(nvm_realloc_dma_pool);
+
 static int __nvm_configure_create(struct nvm_ioctl_create *create)
 {
struct nvm_dev *dev;
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 0f33055f40eb..b1e104765868 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -250,8 +250,8 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq 
*rqd)
if (rqd->nr_ppas == 1)
return 0;
 
-   rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
-   rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+   rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
+   rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk);
 
return 0;
 }
@@ -846,8 +846,8 @@ int pblk_line_emeta_read(struct pblk *pblk, struct 
pblk_line *line,
if (!meta_list)
return -ENOMEM;
 
-   ppa_list = meta_list + pblk_dma_meta_size;
-   dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+   ppa_list = meta_list + pblk_dma_meta_size(pblk);
+   dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 next_rq:
memset(, 0, sizeof(struct nvm_rq));
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 977b2ca5d849..b5c8a0ed9bb1 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -474,8 +474,8 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, 
struct pblk_line *line)
if (!meta_list)
return -ENOMEM;
 
-   ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
-   dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+   ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk);
+   dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
if (!data) {
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index d09c1b341e07..c03fa037d037 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -10

[PATCH v2 0/5] lightnvm: Flexible metadata

2018-10-22 Thread Igor Konopko
This series of patches extends the way pblk can store
L2P sector metadata. After this set of changes any size
of NVMe metadata (including 0) is supported.

Patches are rebased on top of block/for-next since
there was no ocssd/for-4.21 branch yet.

Changes v1 --> v2:
-Revert sector meta size back to 16b for pblk
-Dma pool for larger oob meta are handled in core instead of pblk
-Pblk oob meta helpers use __le64 as input/output instead of u64
-Other minor fixes based on v1 patch review

Igor Konopko (5):
  lightnvm: pblk: Move lba list to partial read context
  lightnvm: pblk: Helpers for OOB metadata
  lightnvm: Flexible DMA pool entry size
  lightnvm: Disable interleaved metadata
  lightnvm: pblk: Support for packed metadata

 drivers/lightnvm/core.c  | 45 +++
 drivers/lightnvm/pblk-core.c | 66 ++--
 drivers/lightnvm/pblk-init.c | 43 --
 drivers/lightnvm/pblk-map.c  | 20 +++-
 drivers/lightnvm/pblk-rb.c   |  3 ++
 drivers/lightnvm/pblk-read.c | 63 +-
 drivers/lightnvm/pblk-recovery.c | 22 --
 drivers/lightnvm/pblk-sysfs.c|  7 +
 drivers/lightnvm/pblk-write.c| 14 ++---
 drivers/lightnvm/pblk.h  | 47 ++--
 drivers/nvme/host/lightnvm.c |  9 --
 include/linux/lightnvm.h |  5 ++-
 12 files changed, 272 insertions(+), 72 deletions(-)

-- 
2.14.4



[PATCH v2 4/5] lightnvm: Disable interleaved metadata

2018-10-22 Thread Igor Konopko
Currently pblk and lightnvm only check the size of the
OOB metadata and do not care whether this metadata is
located in a separate buffer or is interleaved with the
data in a single buffer.

In reality only the first scenario is supported; the
second mode breaks pblk functionality during any IO
operation.

The goal of this patch is to block creation of pblk
devices in case of interleaved metadata.

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-init.c | 6 ++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 13822594647c..ded0618f6cda 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1154,6 +1154,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct 
gendisk *tdisk,
return ERR_PTR(-EINVAL);
}
 
+   if (geo->ext) {
+   pblk_err(pblk, "extended metadata not supported\n");
+   kfree(pblk);
+   return ERR_PTR(-EINVAL);
+   }
+
spin_lock_init(&pblk->resubmit_lock);
spin_lock_init(&pblk->trans_lock);
spin_lock_init(&pblk->lock);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index d1e47a93bcfd..b71c730a6e32 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -983,6 +983,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
 
geo->csecs = 1 << ns->lba_shift;
geo->sos = ns->ms;
+   geo->ext = ns->ext;
 
if (nvm_realloc_dma_pool(ndev))
nvm_unregister(ndev);
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 9d3b7c627cac..4870022ebff1 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -357,6 +357,7 @@ struct nvm_geo {
u32 clba;   /* sectors per chunk */
u16 csecs;  /* sector size */
u16 sos;/* out-of-band area size */
+   boolext;/* metadata in extended data buffer */
 
/* device write constrains */
u32 ws_min; /* minimum write size */
-- 
2.14.4



[PATCH v2 1/5] lightnvm: pblk: Move lba list to partial read context

2018-10-22 Thread Igor Konopko
Currently DMA-allocated memory is reused on partial read
for the lba_list_mem and lba_list_media arrays. In preparation
for dynamic DMA pool sizes we need to move these arrays
into the pblk_pr_ctx structure.

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-read.c | 20 +---
 drivers/lightnvm/pblk.h  |  2 ++
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 9fba614adeeb..19917d3c19b3 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -224,7 +224,6 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
unsigned long *read_bitmap = pr_ctx->bitmap;
int nr_secs = pr_ctx->orig_nr_secs;
int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-   __le64 *lba_list_mem, *lba_list_media;
void *src_p, *dst_p;
int hole, i;
 
@@ -237,13 +236,9 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
rqd->ppa_list[0] = ppa;
}
 
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-   lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
-
for (i = 0; i < nr_secs; i++) {
-   lba_list_media[i] = meta_list[i].lba;
-   meta_list[i].lba = lba_list_mem[i];
+   pr_ctx->lba_list_media[i] = meta_list[i].lba;
+   meta_list[i].lba = pr_ctx->lba_list_mem[i];
}
 
/* Fill the holes in the original bio */
@@ -255,7 +250,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
kref_put(&line->ref, pblk_line_put);
 
-   meta_list[hole].lba = lba_list_media[i];
+   meta_list[hole].lba = pr_ctx->lba_list_media[i];
 
src_bv = new_bio->bi_io_vec[i++];
dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@@ -295,13 +290,9 @@ static int pblk_setup_partial_read(struct pblk *pblk, 
struct nvm_rq *rqd,
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
struct pblk_pr_ctx *pr_ctx;
struct bio *new_bio, *bio = r_ctx->private;
-   __le64 *lba_list_mem;
int nr_secs = rqd->nr_ppas;
int i;
 
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-
new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
@@ -312,12 +303,12 @@ static int pblk_setup_partial_read(struct pblk *pblk, 
struct nvm_rq *rqd,
goto fail_free_pages;
}
 
-   pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
+   pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
if (!pr_ctx)
goto fail_free_pages;
 
for (i = 0; i < nr_secs; i++)
-   lba_list_mem[i] = meta_list[i].lba;
+   pr_ctx->lba_list_mem[i] = meta_list[i].lba;
 
new_bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@@ -325,7 +316,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, 
struct nvm_rq *rqd,
rqd->bio = new_bio;
rqd->nr_ppas = nr_holes;
 
-   pr_ctx->ppa_ptr = NULL;
pr_ctx->orig_bio = bio;
bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
pr_ctx->bio_init_idx = bio_init_idx;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 02bb2e98f8a9..2aca840c7838 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -132,6 +132,8 @@ struct pblk_pr_ctx {
unsigned int bio_init_idx;
void *ppa_ptr;
dma_addr_t dma_ppa_list;
+   __le64 lba_list_mem[NVM_MAX_VLBA];
+   __le64 lba_list_media[NVM_MAX_VLBA];
 };
 
 /* Pad context */
-- 
2.14.4



Re: [PATCH 3/5] lightnvm: Flexible DMA pool entry size

2018-10-09 Thread Igor Konopko




On 09.10.2018 14:36, Javier Gonzalez wrote:

On 9 Oct 2018, at 21.10, Hans Holmberg  wrote:

On Tue, Oct 9, 2018 at 12:03 PM Igor Konopko  wrote:

On 09.10.2018 11:16, Hans Holmberg wrote:

On Fri, Oct 5, 2018 at 3:38 PM Igor Konopko  wrote:

Currently the whole of lightnvm and pblk uses a single DMA pool,
for which the entry size is always equal to PAGE_SIZE. The PPA list
always needs 8b*64, so there is only 56b*64 of space left for OOB
meta. Since NVMe OOB meta can be bigger, such as 128b, this solution
is not robust.

This patch adds the possibility to support OOB meta above 56b by
creating a separate DMA pool for pblk with an entry size big enough
to store both the PPA list and such OOB metadata.

Signed-off-by: Igor Konopko 
---
  drivers/lightnvm/core.c  | 33 +++-
  drivers/lightnvm/pblk-core.c | 19 +-
  drivers/lightnvm/pblk-init.c | 11 +++
  drivers/lightnvm/pblk-read.c |  3 ++-
  drivers/lightnvm/pblk-recovery.c |  9 +
  drivers/lightnvm/pblk.h  | 11 ++-
  drivers/nvme/host/lightnvm.c |  6 --
  include/linux/lightnvm.h |  8 +---
  8 files changed, 71 insertions(+), 29 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index efb976a863d2..48db7a096257 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -641,20 +641,33 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt)
  }
  EXPORT_SYMBOL(nvm_unregister_tgt_type);

-void *nvm_dev_dma_alloc(struct nvm_dev *dev, gfp_t mem_flags,
-   dma_addr_t *dma_handler)
+void *nvm_dev_dma_alloc(struct nvm_dev *dev, void *pool,
+   gfp_t mem_flags, dma_addr_t *dma_handler)
  {
-   return dev->ops->dev_dma_alloc(dev, dev->dma_pool, mem_flags,
-   dma_handler);
+   return dev->ops->dev_dma_alloc(dev, pool ?: dev->dma_pool,
+   mem_flags, dma_handler);
  }
  EXPORT_SYMBOL(nvm_dev_dma_alloc);

-void nvm_dev_dma_free(struct nvm_dev *dev, void *addr, dma_addr_t dma_handler)
+void nvm_dev_dma_free(struct nvm_dev *dev, void *pool,
+   void *addr, dma_addr_t dma_handler)
  {
-   dev->ops->dev_dma_free(dev->dma_pool, addr, dma_handler);
+   dev->ops->dev_dma_free(pool ?: dev->dma_pool, addr, dma_handler);
  }
  EXPORT_SYMBOL(nvm_dev_dma_free);

+void *nvm_dev_dma_create(struct nvm_dev *dev, int size, char *name)
+{
+   return dev->ops->create_dma_pool(dev, name, size);
+}
+EXPORT_SYMBOL(nvm_dev_dma_create);
+
+void nvm_dev_dma_destroy(struct nvm_dev *dev, void *pool)
+{
+   dev->ops->destroy_dma_pool(pool);
+}
+EXPORT_SYMBOL(nvm_dev_dma_destroy);
+
  static struct nvm_dev *nvm_find_nvm_dev(const char *name)
  {
 struct nvm_dev *dev;
@@ -682,7 +695,8 @@ static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, 
struct nvm_rq *rqd,
 }

 rqd->nr_ppas = nr_ppas;
-   rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, >dma_ppa_list);
+   rqd->ppa_list = nvm_dev_dma_alloc(dev, NULL, GFP_KERNEL,
&rqd->dma_ppa_list);
 if (!rqd->ppa_list) {
 pr_err("nvm: failed to allocate dma memory\n");
 return -ENOMEM;
@@ -708,7 +722,8 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev 
*tgt_dev,
 if (!rqd->ppa_list)
 return;

-   nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
+   nvm_dev_dma_free(tgt_dev->parent, NULL, rqd->ppa_list,
+   rqd->dma_ppa_list);
  }

  static int nvm_set_flags(struct nvm_geo *geo, struct nvm_rq *rqd)
@@ -1145,7 +1160,7 @@ int nvm_register(struct nvm_dev *dev)
 if (!dev->q || !dev->ops)
 return -EINVAL;

-   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
+   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", PAGE_SIZE);
 if (!dev->dma_pool) {
 pr_err("nvm: could not create dma pool\n");
 return -ENOMEM;


Why hack the nvm_dev_ interfaces when you are not using the dev pool anyway?
Wouldn't it be more straightforward to use dma_pool_* instead?


In order to call dma_pool_create() I need NVMe device structure, which
in my understanding is not public, so this is why I decided to reuse
plumbing which was available in nvm_dev_* interfaces.


Hmm, yes, I see now.


If there is some easy way to call dma_pool_create() from pblk module and
I'm missing that - let me know. I can rewrite this part, if there is
some better way to do so.
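
For reference, the core API under discussion; the struct device
argument is exactly what pblk cannot reach from outside the NVMe
driver:

struct dma_pool *dma_pool_create(const char *name, struct device *dev,
				 size_t size, size_t align, size_t boundary);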


Create and destroy needs to go through dev->ops, but once you have
allocated the pool, there is no need for going through t

Re: [PATCH 3/5] lightnvm: Flexible DMA pool entry size

2018-10-09 Thread Igor Konopko




On 09.10.2018 11:16, Hans Holmberg wrote:

On Fri, Oct 5, 2018 at 3:38 PM Igor Konopko  wrote:


Currently the whole of lightnvm and pblk uses a single DMA pool,
for which the entry size is always equal to PAGE_SIZE. The PPA list
always needs 8b*64, so there is only 56b*64 of space left for OOB
meta. Since NVMe OOB meta can be bigger, such as 128b, this solution
is not robust.

This patch adds the possibility to support OOB meta above 56b by
creating a separate DMA pool for pblk with an entry size big enough
to store both the PPA list and such OOB metadata.

Signed-off-by: Igor Konopko 
---
  drivers/lightnvm/core.c  | 33 +++-
  drivers/lightnvm/pblk-core.c | 19 +-
  drivers/lightnvm/pblk-init.c | 11 +++
  drivers/lightnvm/pblk-read.c |  3 ++-
  drivers/lightnvm/pblk-recovery.c |  9 +
  drivers/lightnvm/pblk.h  | 11 ++-
  drivers/nvme/host/lightnvm.c |  6 --
  include/linux/lightnvm.h |  8 +---
  8 files changed, 71 insertions(+), 29 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index efb976a863d2..48db7a096257 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -641,20 +641,33 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt)
  }
  EXPORT_SYMBOL(nvm_unregister_tgt_type);

-void *nvm_dev_dma_alloc(struct nvm_dev *dev, gfp_t mem_flags,
-   dma_addr_t *dma_handler)
+void *nvm_dev_dma_alloc(struct nvm_dev *dev, void *pool,
+   gfp_t mem_flags, dma_addr_t *dma_handler)
  {
-   return dev->ops->dev_dma_alloc(dev, dev->dma_pool, mem_flags,
-   dma_handler);
+   return dev->ops->dev_dma_alloc(dev, pool ?: dev->dma_pool,
+   mem_flags, dma_handler);
  }
  EXPORT_SYMBOL(nvm_dev_dma_alloc);

-void nvm_dev_dma_free(struct nvm_dev *dev, void *addr, dma_addr_t dma_handler)
+void nvm_dev_dma_free(struct nvm_dev *dev, void *pool,
+   void *addr, dma_addr_t dma_handler)
  {
-   dev->ops->dev_dma_free(dev->dma_pool, addr, dma_handler);
+   dev->ops->dev_dma_free(pool ?: dev->dma_pool, addr, dma_handler);
  }
  EXPORT_SYMBOL(nvm_dev_dma_free);

+void *nvm_dev_dma_create(struct nvm_dev *dev, int size, char *name)
+{
+   return dev->ops->create_dma_pool(dev, name, size);
+}
+EXPORT_SYMBOL(nvm_dev_dma_create);
+
+void nvm_dev_dma_destroy(struct nvm_dev *dev, void *pool)
+{
+   dev->ops->destroy_dma_pool(pool);
+}
+EXPORT_SYMBOL(nvm_dev_dma_destroy);
+
  static struct nvm_dev *nvm_find_nvm_dev(const char *name)
  {
 struct nvm_dev *dev;
@@ -682,7 +695,8 @@ static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, 
struct nvm_rq *rqd,
 }

 rqd->nr_ppas = nr_ppas;
-   rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, >dma_ppa_list);
+   rqd->ppa_list = nvm_dev_dma_alloc(dev, NULL, GFP_KERNEL,
&rqd->dma_ppa_list);
 if (!rqd->ppa_list) {
 pr_err("nvm: failed to allocate dma memory\n");
 return -ENOMEM;
@@ -708,7 +722,8 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev 
*tgt_dev,
 if (!rqd->ppa_list)
 return;

-   nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
+   nvm_dev_dma_free(tgt_dev->parent, NULL, rqd->ppa_list,
+   rqd->dma_ppa_list);
  }

  static int nvm_set_flags(struct nvm_geo *geo, struct nvm_rq *rqd)
@@ -1145,7 +1160,7 @@ int nvm_register(struct nvm_dev *dev)
 if (!dev->q || !dev->ops)
 return -EINVAL;

-   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
+   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", PAGE_SIZE);
 if (!dev->dma_pool) {
 pr_err("nvm: could not create dma pool\n");
 return -ENOMEM;


Why hack the nvm_dev_ interfaces when you are not using the dev pool anyway?
Wouldn't it be more straightforward to use dma_pool_* instead?



In order to call dma_pool_create() I need NVMe device structure, which 
in my understanding is not public, so this is why I decided to reuse 
plumbing which was available in nvm_dev_* interfaces.


If there is some easy way to call dma_pool_create() from pblk module and 
I'm missing that - let me know. I can rewrite this part, if there is 
some better way to do so.



diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 7cb39d84c833..131972b13e27 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -242,16 +242,16 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd)

Re: [PATCH 2/5] lightnvm: pblk: Helpers for OOB metadata

2018-10-09 Thread Igor Konopko




On 09.10.2018 11:02, Hans Holmberg wrote:

Hi Igor!

One important thing: this patch breaks the on-disk-storage format so
that needs to be handled(see my comment on this) and  some additional
nitpicks below.

Thanks,
Hans

On Fri, Oct 5, 2018 at 3:38 PM Igor Konopko  wrote:


Currently pblk assumes that the size of the OOB metadata on the drive
is always equal to the size of the pblk_sec_meta struct. This commit
adds helpers which will allow handling different sizes of OOB metadata
on the drive.

Signed-off-by: Igor Konopko 
---
  drivers/lightnvm/pblk-core.c |  6 ++---
  drivers/lightnvm/pblk-map.c  | 21 ++--
  drivers/lightnvm/pblk-read.c | 41 +++-
  drivers/lightnvm/pblk-recovery.c | 14 ++-
  drivers/lightnvm/pblk.h  | 37 +++-
  5 files changed, 86 insertions(+), 33 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6944aac43b01..7cb39d84c833 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -743,7 +743,6 @@ int pblk_line_smeta_read(struct pblk *pblk, struct 
pblk_line *line)
 rqd.opcode = NVM_OP_PREAD;
 rqd.nr_ppas = lm->smeta_sec;
 rqd.is_seq = 1;
-
 for (i = 0; i < lm->smeta_sec; i++, paddr++)
 rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);

@@ -796,10 +795,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, 
struct pblk_line *line,
 rqd.is_seq = 1;

 for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-   struct pblk_sec_meta *meta_list = rqd.meta_list;
+   void *meta_list = rqd.meta_list;

 rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-   meta_list[i].lba = lba_list[paddr] = addr_empty;
+   pblk_set_meta_lba(pblk, meta_list, i, ADDR_EMPTY);
+   lba_list[paddr] = addr_empty;
 }

 ret = pblk_submit_io_sync_sem(pblk, );
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 6dcbd44e3acb..4c7a9909308e 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -22,7 +22,7 @@
  static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
   struct ppa_addr *ppa_list,
   unsigned long *lun_bitmap,
- struct pblk_sec_meta *meta_list,
+ void *meta_list,
   unsigned int valid_secs)
  {
 struct pblk_line *line = pblk_line_get_data(pblk);
@@ -68,14 +68,15 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned 
int sentry,
kref_get(&line->ref);
w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
 w_ctx->ppa = ppa_list[i];
-   meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+   pblk_set_meta_lba(pblk, meta_list, i, w_ctx->lba);
 lba_list[paddr] = cpu_to_le64(w_ctx->lba);
 if (lba_list[paddr] != addr_empty)
 line->nr_valid_lbas++;
 else
atomic64_inc(&line->pad_wa);
 } else {
-   lba_list[paddr] = meta_list[i].lba = addr_empty;
+   lba_list[paddr] = addr_empty;
+   pblk_set_meta_lba(pblk, meta_list, i, ADDR_EMPTY);
 __pblk_map_invalidate(pblk, line, paddr);
 }
 }
@@ -88,7 +89,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, 
unsigned int sentry,
  unsigned long *lun_bitmap, unsigned int valid_secs,
  unsigned int off)
  {
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   void *meta_list = rqd->meta_list;
 struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
 unsigned int map_secs;
 int min = pblk->min_write_pgs;
@@ -97,7 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, 
unsigned int sentry,
 for (i = off; i < rqd->nr_ppas; i += min) {
 map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
-   lun_bitmap, &meta_list[i], map_secs)) {
+   lun_bitmap,
+   pblk_get_meta_buffer(pblk,
+meta_list, i),
+   map_secs)) {
 bio_put(rqd->bio);
 pblk_free_rqd(pblk, rqd, PBLK_WRITE);
 pblk_pipeline_stop(pblk);
@@ -113,7 +117,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq 
*rqd,
 struct nvm_tgt_dev *dev = pblk->dev;
  

[PATCH 0/5] lightnvm: pblk: Flexible metadata

2018-10-05 Thread Igor Konopko
This series of patches extends the way pblk can store
L2P sector metadata. After this set of changes any size
of NVMe metadata (including 0) is supported.

Igor Konopko (5):
  lightnvm: pblk: Do not reuse DMA memory on partial read
  lightnvm: pblk: Helpers for OOB metadata
  lightnvm: Flexible DMA pool entry size
  lightnvm: Disable interleaved metadata
  lightnvm: pblk: Support for packed metadata

 drivers/lightnvm/core.c  | 33 ++
 drivers/lightnvm/pblk-core.c | 77 +---
 drivers/lightnvm/pblk-init.c | 54 +-
 drivers/lightnvm/pblk-map.c  | 21 ++---
 drivers/lightnvm/pblk-rb.c   |  3 ++
 drivers/lightnvm/pblk-read.c | 56 +++
 drivers/lightnvm/pblk-recovery.c | 28 +++-
 drivers/lightnvm/pblk-sysfs.c|  7 +++
 drivers/lightnvm/pblk-write.c| 14 --
 drivers/lightnvm/pblk.h  | 55 +--
 drivers/nvme/host/lightnvm.c |  7 ++-
 include/linux/lightnvm.h |  9 ++--
 12 files changed, 278 insertions(+), 86 deletions(-)

-- 
2.17.1



[PATCH 5/5] lightnvm: pblk: Support for packed metadata

2018-10-05 Thread Igor Konopko
In the current pblk implementation, the l2p mapping for not-closed
lines is always stored only in OOB metadata and recovered from it.

Such a solution does not provide data integrity when drives do
not have such an OOB metadata space.

The goal of this patch is to add support for so-called packed
metadata, which stores the l2p mapping for open lines in the last
sector of every write unit.
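
A sketch of the resulting write-unit layout, assuming ws_opt = 8
sectors (value chosen for illustration only):

/*
 * sector:   0      1      2      3      4      5      6      7
 *         | user | user | user | user | user | user | user | l2p  |
 *         | data | data | data | data | data | data | data | meta |
 *
 * The last sector carries the lba entries for the seven data sectors.
 */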

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c | 52 +---
 drivers/lightnvm/pblk-init.c | 37 +--
 drivers/lightnvm/pblk-rb.c   |  3 ++
 drivers/lightnvm/pblk-recovery.c |  5 +--
 drivers/lightnvm/pblk-sysfs.c|  7 +
 drivers/lightnvm/pblk-write.c| 14 ++---
 drivers/lightnvm/pblk.h  |  5 ++-
 7 files changed, 110 insertions(+), 13 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 131972b13e27..e11a46c05067 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk)
 {
unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
 
-   if (secs_avail >= pblk->min_write_pgs)
+   if (secs_avail >= pblk->min_write_pgs_data)
pblk_write_kick(pblk);
 }
 
@@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, 
struct pblk_line *line)
struct pblk_line_meta *lm = >lm;
struct pblk_line_mgmt *l_mg = >l_mg;
struct list_head *move_list = NULL;
-   int vsc = le32_to_cpu(*line->vsc);
+   int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+   * (pblk->min_write_pgs - pblk->min_write_pgs_data);
+   int vsc = le32_to_cpu(*line->vsc) + packed_meta;
 
lockdep_assert_held(&line->lock);
 
@@ -620,12 +622,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void 
*data,
 }
 
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-  unsigned long secs_to_flush)
+  unsigned long secs_to_flush, bool skip_meta)
 {
int max = pblk->sec_per_write;
int min = pblk->min_write_pgs;
int secs_to_sync = 0;
 
+   if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs)
+   min = max = pblk->min_write_pgs_data;
+
if (secs_avail >= max)
secs_to_sync = max;
else if (secs_avail >= min)
@@ -851,7 +856,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct 
pblk_line *line,
 next_rq:
memset(, 0, sizeof(struct nvm_rq));
 
-   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
rq_len = rq_ppas * geo->csecs;
 
bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2161,3 +2166,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct 
ppa_addr *ppas,
}
spin_unlock(&pblk->trans_lock);
 }
+
+void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *meta_list = rqd->meta_list;
+   void *page;
+   int i = 0;
+
+   if (pblk_is_oob_meta_supported(pblk))
+   return;
+
+   /* We need to zero out metadata corresponding to packed meta page */
+   pblk_set_meta_lba(pblk, meta_list, rqd->nr_ppas - 1, ADDR_EMPTY);
+
+   page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   /* We need to fill last page of request (packed metadata)
+* with data from oob meta buffer.
+*/
+   for (; i < rqd->nr_ppas; i++)
+   memcpy(page + (i * sizeof(struct pblk_sec_meta)),
+   pblk_get_meta_buffer(pblk, meta_list, i),
+   sizeof(struct pblk_sec_meta));
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *meta_list = rqd->meta_list;
+   void *page;
+   int i = 0;
+
+   if (pblk_is_oob_meta_supported(pblk))
+   return;
+
+   page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   /* We need to fill oob meta buffer with data from packed metadata */
+   for (; i < rqd->nr_ppas; i++)
+   memcpy(pblk_get_meta_buffer(pblk, meta_list, i),
+   page + (i * sizeof(struct pblk_sec_meta)),
+   sizeof(struct pblk_sec_meta));
+}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 1529aa37b30f..d2a63494def6 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -407,8 +407,40 @@ static int pblk_core_init(struct pblk *pblk)
pblk->min_write_pgs = geo->ws_opt;
max_write_ppas = pblk->min_write_pgs * geo->all_luns;
pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
+   pblk->min_write_pgs_data = pblk->min_write_pgs;
pblk_set_sec_per_write(pblk, pblk->min_write_pgs);

[PATCH 4/5] lightnvm: Disable interleaved metadata

2018-10-05 Thread Igor Konopko
Currently pblk and lightnvm only check the size of the
OOB metadata and do not care whether this metadata is
located in a separate buffer or is interleaved with the
data in a single buffer.

In reality only the first scenario is supported; the
second mode breaks pblk functionality during any IO
operation.

The goal of this patch is to block creation of pblk
devices in case of interleaved metadata.

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-init.c | 6 ++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b794e279da31..1529aa37b30f 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1152,6 +1152,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct 
gendisk *tdisk,
return ERR_PTR(-EINVAL);
}
 
+   if (geo->ext) {
+   pblk_err(pblk, "extended metadata not supported\n");
+   kfree(pblk);
+   return ERR_PTR(-EINVAL);
+   }
+
spin_lock_init(>resubmit_lock);
spin_lock_init(>trans_lock);
spin_lock_init(>lock);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index e370793f52d5..7020e87bcee4 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -989,6 +989,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
 
geo->csecs = 1 << ns->lba_shift;
geo->sos = ns->ms;
+   geo->ext = ns->ext;
 }
 
 int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index c6c998716ee7..abd29f50f2a1 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -357,6 +357,7 @@ struct nvm_geo {
u32 clba;   /* sectors per chunk */
u16 csecs;  /* sector size */
u16 sos;/* out-of-band area size */
+   u16 ext;/* metadata in extended data buffer */
 
/* device write constrains */
u32 ws_min; /* minimum write size */
-- 
2.17.1



[PATCH 1/5] lightnvm: pblk: Do not reuse DMA memory on partial read

2018-10-05 Thread Igor Konopko
Currently DMA-allocated memory is reused on the partial read
path for some internal pblk structs. In preparation for
dynamic DMA pool sizes we need to change it to kmalloc-allocated
memory.
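
The layout being removed, reconstructed from the offsets in the code
below:

/*
 * The request's DMA region previously served four purposes:
 *
 *	meta_list | ppa_list | lba_list_mem | lba_list_media
 *
 * with lba_list_mem at ppa_list + pblk_dma_ppa_size and lba_list_media
 * at ppa_list + 2 * pblk_dma_ppa_size. A variable pool entry size makes
 * such fixed offsets fragile, hence the move to kmalloc'd pblk_pr_ctx.
 */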

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-read.c | 20 +---
 drivers/lightnvm/pblk.h  |  2 ++
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index d340dece1d00..08f6ebd4bc48 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -224,7 +224,6 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
unsigned long *read_bitmap = pr_ctx->bitmap;
int nr_secs = pr_ctx->orig_nr_secs;
int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-   __le64 *lba_list_mem, *lba_list_media;
void *src_p, *dst_p;
int hole, i;
 
@@ -237,13 +236,9 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
rqd->ppa_list[0] = ppa;
}
 
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-   lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
-
for (i = 0; i < nr_secs; i++) {
-   lba_list_media[i] = meta_list[i].lba;
-   meta_list[i].lba = lba_list_mem[i];
+   pr_ctx->lba_list_media[i] = meta_list[i].lba;
+   meta_list[i].lba = pr_ctx->lba_list_mem[i];
}
 
/* Fill the holes in the original bio */
@@ -255,7 +250,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
kref_put(&line->ref, pblk_line_put);
 
-   meta_list[hole].lba = lba_list_media[i];
+   meta_list[hole].lba = pr_ctx->lba_list_media[i];
 
src_bv = new_bio->bi_io_vec[i++];
dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@@ -295,13 +290,9 @@ static int pblk_setup_partial_read(struct pblk *pblk, 
struct nvm_rq *rqd,
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
struct pblk_pr_ctx *pr_ctx;
struct bio *new_bio, *bio = r_ctx->private;
-   __le64 *lba_list_mem;
int nr_secs = rqd->nr_ppas;
int i;
 
-   /* Re-use allocated memory for intermediate lbas */
-   lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-
new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
@@ -312,12 +303,12 @@ static int pblk_setup_partial_read(struct pblk *pblk, 
struct nvm_rq *rqd,
goto fail_free_pages;
}
 
-   pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
+   pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
if (!pr_ctx)
goto fail_free_pages;
 
for (i = 0; i < nr_secs; i++)
-   lba_list_mem[i] = meta_list[i].lba;
+   pr_ctx->lba_list_mem[i] = meta_list[i].lba;
 
new_bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@@ -325,7 +316,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, 
struct nvm_rq *rqd,
rqd->bio = new_bio;
rqd->nr_ppas = nr_holes;
 
-   pr_ctx->ppa_ptr = NULL;
pr_ctx->orig_bio = bio;
bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
pr_ctx->bio_init_idx = bio_init_idx;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 0f98ea24ee59..aea09879636f 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -132,6 +132,8 @@ struct pblk_pr_ctx {
unsigned int bio_init_idx;
void *ppa_ptr;
dma_addr_t dma_ppa_list;
+   __le64 lba_list_mem[NVM_MAX_VLBA];
+   __le64 lba_list_media[NVM_MAX_VLBA];
 };
 
 /* Pad context */
-- 
2.17.1



[PATCH 2/5] lightnvm: pblk: Helpers for OOB metadata

2018-10-05 Thread Igor Konopko
Currently pblk assumes that the size of the OOB metadata on the drive
is always equal to the size of the pblk_sec_meta struct. This commit
adds helpers which will allow handling different sizes of OOB metadata
on the drive.

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c |  6 ++---
 drivers/lightnvm/pblk-map.c  | 21 ++--
 drivers/lightnvm/pblk-read.c | 41 +++-
 drivers/lightnvm/pblk-recovery.c | 14 ++-
 drivers/lightnvm/pblk.h  | 37 +++-
 5 files changed, 86 insertions(+), 33 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6944aac43b01..7cb39d84c833 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -743,7 +743,6 @@ int pblk_line_smeta_read(struct pblk *pblk, struct 
pblk_line *line)
rqd.opcode = NVM_OP_PREAD;
rqd.nr_ppas = lm->smeta_sec;
rqd.is_seq = 1;
-
for (i = 0; i < lm->smeta_sec; i++, paddr++)
rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
 
@@ -796,10 +795,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, 
struct pblk_line *line,
rqd.is_seq = 1;
 
for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-   struct pblk_sec_meta *meta_list = rqd.meta_list;
+   void *meta_list = rqd.meta_list;
 
rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-   meta_list[i].lba = lba_list[paddr] = addr_empty;
+   pblk_set_meta_lba(pblk, meta_list, i, ADDR_EMPTY);
+   lba_list[paddr] = addr_empty;
}
 
ret = pblk_submit_io_sync_sem(pblk, &rqd);
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 6dcbd44e3acb..4c7a9909308e 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -22,7 +22,7 @@
 static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
  struct ppa_addr *ppa_list,
  unsigned long *lun_bitmap,
- struct pblk_sec_meta *meta_list,
+ void *meta_list,
  unsigned int valid_secs)
 {
struct pblk_line *line = pblk_line_get_data(pblk);
@@ -68,14 +68,15 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
kref_get(&line->ref);
w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
w_ctx->ppa = ppa_list[i];
-   meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+   pblk_set_meta_lba(pblk, meta_list, i, w_ctx->lba);
lba_list[paddr] = cpu_to_le64(w_ctx->lba);
if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
else
atomic64_inc(&pblk->pad_wa);
} else {
-   lba_list[paddr] = meta_list[i].lba = addr_empty;
+   lba_list[paddr] = addr_empty;
+   pblk_set_meta_lba(pblk, meta_list, i, ADDR_EMPTY);
__pblk_map_invalidate(pblk, line, paddr);
}
}
@@ -88,7 +89,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 unsigned long *lun_bitmap, unsigned int valid_secs,
 unsigned int off)
 {
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   void *meta_list = rqd->meta_list;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
unsigned int map_secs;
int min = pblk->min_write_pgs;
@@ -97,7 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
for (i = off; i < rqd->nr_ppas; i += min) {
map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
-   lun_bitmap, _list[i], map_secs)) {
+   lun_bitmap,
+   pblk_get_meta_buffer(pblk,
+meta_list, i),
+   map_secs)) {
bio_put(rqd->bio);
pblk_free_rqd(pblk, rqd, PBLK_WRITE);
pblk_pipeline_stop(pblk);
@@ -113,7 +117,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
struct pblk_line_meta *lm = &pblk->lm;
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   void *meta_list = rqd->meta_list;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
struct pblk_line *e_line, *d_line;
unsigned int map_secs;
@

Re: [PATCH v4] lightnvm: pblk: add asynchronous partial read

2018-07-10 Thread Igor Konopko
nvm_rq *rqd,
+    unsigned int bio_init_idx,
+    unsigned long *read_bitmap,
+    int nr_holes)
+{
+    struct pblk_sec_meta *meta_list = rqd->meta_list;
+    struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
+    struct pblk_pr_ctx *pr_ctx;
+    struct bio *new_bio, *bio = r_ctx->private;
+    __le64 *lba_list_mem;
+    int nr_secs = rqd->nr_ppas;
+    int i;
+
+    /* Re-use allocated memory for intermediate lbas */
+    lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
+
+    new_bio = bio_alloc(GFP_KERNEL, nr_holes);
+
+    if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
+    goto fail_bio_put;
+
+    if (nr_holes != new_bio->bi_vcnt) {
+    WARN_ONCE(1, "pblk: malformed bio\n");
+    goto fail_free_pages;
+    }
+
+    pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
+    if (!pr_ctx)
+    goto fail_free_pages;
+
+    for (i = 0; i < nr_secs; i++)
+    lba_list_mem[i] = meta_list[i].lba;
+
+    new_bio->bi_iter.bi_sector = 0; /* internal bio */
+    bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
+
+    rqd->bio = new_bio;
+    rqd->nr_ppas = nr_holes;
+    rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM);
+
+    pr_ctx->ppa_ptr = NULL;
+    pr_ctx->orig_bio = bio;
+    bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
+    pr_ctx->bio_init_idx = bio_init_idx;
+    pr_ctx->orig_nr_secs = nr_secs;
+    r_ctx->private = pr_ctx;
+
+    if (unlikely(nr_holes == 1)) {
+    pr_ctx->ppa_ptr = rqd->ppa_list;
+    pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
+    rqd->ppa_addr = rqd->ppa_list[0];
+    }
+    return 0;
+
+fail_free_pages:
  pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
-fail_add_pages:
+fail_bio_put:
+    bio_put(new_bio);
+
+    return -ENOMEM;
+}
+
+static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
+ unsigned int bio_init_idx,
+ unsigned long *read_bitmap, int nr_secs)
+{
+    int nr_holes;
+    int ret;
+
+    nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
+
+    if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
+    nr_holes))
+    return NVM_IO_ERR;
+
+    rqd->end_io = pblk_end_partial_read;
+
+    ret = pblk_submit_io(pblk, rqd);
+    if (ret) {
+    bio_put(rqd->bio);
+    pblk_err(pblk, "partial read IO submission failed\n");
+    goto err;
+    }
+
+    return NVM_IO_OK;
+
+err:
  pblk_err(pblk, "failed to perform partial read\n");
+
+    /* Free allocated pages in new bio */
+    pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
  __pblk_end_io_read(pblk, rqd, false);
  return NVM_IO_ERR;
  }
@@ -480,8 +530,15 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
  /* The read bio request could be partially filled by the write buffer,
   * but there are some holes that need to be read from the drive.
   */
-    return pblk_partial_read(pblk, rqd, bio, bio_init_idx, read_bitmap);
+    ret = pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap,
+    nr_secs);
+    if (ret)
+    goto fail_meta_free;
+
+    return NVM_IO_OK;
+fail_meta_free:
+    nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
  fail_rqd_free:
  pblk_free_rqd(pblk, rqd, PBLK_READ);
  return ret;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 5c6904e..4760af7 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -119,6 +119,16 @@ struct pblk_g_ctx {
  u64 lba;
  };
+/* partial read context */
+struct pblk_pr_ctx {
+    struct bio *orig_bio;
+    DECLARE_BITMAP(bitmap, NVM_MAX_VLBA);
+    unsigned int orig_nr_secs;
+    unsigned int bio_init_idx;
+    void *ppa_ptr;
+    dma_addr_t dma_ppa_list;
+};
+
  /* Pad context */
  struct pblk_pad_rq {
  struct pblk *pblk;



Hey Igor,

May I add your reviewed-by before I pick up?


Sure - now it looks fine.

Reviewed-by: Igor Konopko 


Re: [PATCH v3] lightnvm: pblk: add asynchronous partial read

2018-07-06 Thread Igor Konopko




On 06.07.2018 05:18, Matias Bjørling wrote:

On 07/06/2018 12:12 PM, Heiner Litz wrote:

In the read path, partial reads are currently performed synchronously
which affects performance for workloads that generate many partial
reads. This patch adds an asynchronous partial read path as well as
the required partial read ctx.

Signed-off-by: Heiner Litz 

---

v3: rebase to head, incorporate 64-bit read bitmap

---
  drivers/lightnvm/pblk-read.c | 183 
---

  drivers/lightnvm/pblk.h  |  10 +++
  2 files changed, 130 insertions(+), 63 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 9c9362b..4a44076 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -231,74 +231,36 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
  __pblk_end_io_read(pblk, rqd, true);
  }
-static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
- struct bio *orig_bio, unsigned int bio_init_idx,
- unsigned long *read_bitmap)
+static void pblk_end_partial_read(struct nvm_rq *rqd)
  {
-    struct pblk_sec_meta *meta_list = rqd->meta_list;
-    struct bio *new_bio;
+    struct pblk *pblk = rqd->private;
+    struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
+    struct pblk_pr_ctx *pr_ctx = r_ctx->private;
+    struct bio *new_bio = rqd->bio;
+    struct bio *bio = pr_ctx->orig_bio;
  struct bio_vec src_bv, dst_bv;
-    void *ppa_ptr = NULL;
-    void *src_p, *dst_p;
-    dma_addr_t dma_ppa_list = 0;
-    __le64 *lba_list_mem, *lba_list_media;
-    int nr_secs = rqd->nr_ppas;
+    struct pblk_sec_meta *meta_list = rqd->meta_list;
+    int bio_init_idx = pr_ctx->bio_init_idx;
+    unsigned long *read_bitmap = &pr_ctx->bitmap;
+    int nr_secs = pr_ctx->orig_nr_secs;
  int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-    int i, ret, hole;
-
-    /* Re-use allocated memory for intermediate lbas */
-    lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-    lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
-
-    new_bio = bio_alloc(GFP_KERNEL, nr_holes);
-
-    if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
-    goto fail_add_pages;
-
-    if (nr_holes != new_bio->bi_vcnt) {
-    pblk_err(pblk, "malformed bio\n");
-    goto fail;
-    }
-
-    for (i = 0; i < nr_secs; i++)
-    lba_list_mem[i] = meta_list[i].lba;
-
-    new_bio->bi_iter.bi_sector = 0; /* internal bio */
-    bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
-
-    rqd->bio = new_bio;
-    rqd->nr_ppas = nr_holes;
-    rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM);
-
-    if (unlikely(nr_holes == 1)) {
-    ppa_ptr = rqd->ppa_list;
-    dma_ppa_list = rqd->dma_ppa_list;
-    rqd->ppa_addr = rqd->ppa_list[0];
-    }
-
-    ret = pblk_submit_io_sync(pblk, rqd);
-    if (ret) {
-    bio_put(rqd->bio);
-    pblk_err(pblk, "sync read IO submission failed\n");
-    goto fail;
-    }
-
-    if (rqd->error) {
-    atomic_long_inc(&pblk->read_failed);
-#ifdef CONFIG_NVM_PBLK_DEBUG
-    pblk_print_failed_rqd(pblk, rqd, rqd->error);
-#endif
-    }
+    __le64 *lba_list_mem, *lba_list_media;
+    void *src_p, *dst_p;
+    int hole, i;
  if (unlikely(nr_holes == 1)) {
  struct ppa_addr ppa;
  ppa = rqd->ppa_addr;
-    rqd->ppa_list = ppa_ptr;
-    rqd->dma_ppa_list = dma_ppa_list;
+    rqd->ppa_list = pr_ctx->ppa_ptr;
+    rqd->dma_ppa_list = pr_ctx->dma_ppa_list;
  rqd->ppa_list[0] = ppa;
  }
+    /* Re-use allocated memory for intermediate lbas */
+    lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
+    lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
+
  for (i = 0; i < nr_secs; i++) {
  lba_list_media[i] = meta_list[i].lba;
  meta_list[i].lba = lba_list_mem[i];
@@ -316,7 +278,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
  meta_list[hole].lba = lba_list_media[i];
  src_bv = new_bio->bi_io_vec[i++];
-    dst_bv = orig_bio->bi_io_vec[bio_init_idx + hole];
+    dst_bv = bio->bi_io_vec[bio_init_idx + hole];
  src_p = kmap_atomic(src_bv.bv_page);
  dst_p = kmap_atomic(dst_bv.bv_page);
@@ -334,19 +296,107 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
  } while (hole < nr_secs);
  bio_put(new_bio);
+    kfree(pr_ctx);
  /* restore original request */
  rqd->bio = NULL;
  rqd->nr_ppas = nr_secs;
+    bio_endio(bio);
  __pblk_end_io_read(pblk, rqd, false);
-    return NVM_IO_DONE;
+}
-fail:
-    /* Free allocated pages in new bio */
+static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
+    unsigned int bio_init_idx,
+    unsigned long *read_bitmap,
+    int nr_holes)
+{
+    struct pblk_sec_meta *meta_list = rqd->meta_list;
+    struct pblk_g_ctx *r_ctx 

Re: [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.

2018-06-19 Thread Igor Konopko




On 19.06.2018 05:47, Javier Gonzalez wrote:

On 19 Jun 2018, at 14.42, Matias Bjørling  wrote:

On Tue, Jun 19, 2018 at 1:08 PM, Javier Gonzalez  wrote:

On 16 Jun 2018, at 00.27, Igor Konopko  wrote:

In the current pblk implementation, the l2p mapping for not yet
closed lines is always stored only in the OOB metadata and recovered
from it.

Such a solution does not provide data integrity when the drive does
not have such an OOB metadata space.

The goal of this patch is to add support for so-called packed
metadata, which stores the l2p mapping for open lines in the last
sector of every write unit.

Signed-off-by: Igor Konopko 
---
drivers/lightnvm/pblk-core.c | 52 
drivers/lightnvm/pblk-init.c | 37 ++--
drivers/lightnvm/pblk-rb.c   |  3 +++
drivers/lightnvm/pblk-recovery.c | 25 +++
drivers/lightnvm/pblk-sysfs.c|  7 ++
drivers/lightnvm/pblk-write.c| 14 +++
drivers/lightnvm/pblk.h  |  5 +++-
7 files changed, 128 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index c092ee93a18d..375c6430612e 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk)
{
  unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);

- if (secs_avail >= pblk->min_write_pgs)
+ if (secs_avail >= pblk->min_write_pgs_data)
  pblk_write_kick(pblk);
}

@@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
  struct pblk_line_meta *lm = &pblk->lm;
  struct pblk_line_mgmt *l_mg = &pblk->l_mg;
  struct list_head *move_list = NULL;
- int vsc = le32_to_cpu(*line->vsc);
+ int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+ * (pblk->min_write_pgs - pblk->min_write_pgs_data);
+ int vsc = le32_to_cpu(*line->vsc) + packed_meta;

  lockdep_assert_held(&line->lock);

@@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
}

int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-unsigned long secs_to_flush)
+unsigned long secs_to_flush, bool skip_meta)
{
  int max = pblk->sec_per_write;
  int min = pblk->min_write_pgs;
  int secs_to_sync = 0;

+ if (skip_meta)
+ min = max = pblk->min_write_pgs_data;
+
  if (secs_avail >= max)
  secs_to_sync = max;
  else if (secs_avail >= min)
@@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
next_rq:
  memset(&rqd, 0, sizeof(struct nvm_rq));

- rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+ rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
  rq_len = rq_ppas * geo->csecs;

  bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
  }
  spin_unlock(&pblk->trans_lock);
}
+
+void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+ void *meta_list = rqd->meta_list;
+ void *page;
+ int i = 0;
+
+ if (pblk_is_oob_meta_supported(pblk))
+ return;
+
+ /* We need to zero out metadata corresponding to packed meta page */
+ pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY;
+
+ page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+ /* We need to fill last page of request (packed metadata)
+  * with data from oob meta buffer.
+  */
+ for (; i < rqd->nr_ppas; i++)
+ memcpy(page + (i * sizeof(struct pblk_sec_meta)),
+ pblk_get_meta_at(pblk, meta_list, i),
+ sizeof(struct pblk_sec_meta));
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+ void *meta_list = rqd->meta_list;
+ void *page;
+ int i = 0;
+
+ if (pblk_is_oob_meta_supported(pblk))
+ return;
+
+ page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+ /* We need to fill oob meta buffer with data from packed metadata */
+ for (; i < rqd->nr_ppas; i++)
+ memcpy(pblk_get_meta_at(pblk, meta_list, i),
+ page + (i * sizeof(struct pblk_sec_meta)),
+ sizeof(struct pblk_sec_meta));
+}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index f05112230a52..5eb641da46ed 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk)
  pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
  max_write_ppas = pblk->min_write_pgs * geo->all_luns;
  pblk->max_write_pgs = min_t(int, max_write_

Re: [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata

2018-06-18 Thread Igor Konopko




On 18.06.2018 07:23, Javier Gonzalez wrote:



On 16 Jun 2018, at 00.27, Igor Konopko  wrote:

Currently pblk assumes that the size of the OOB metadata on the drive
is always equal to the size of struct pblk_sec_meta. This commit adds
helpers which allow handling different sizes of OOB metadata on the
drive.

Signed-off-by: Igor Konopko 
---
drivers/lightnvm/pblk-core.c | 10 +
drivers/lightnvm/pblk-map.c  | 21 ---
drivers/lightnvm/pblk-read.c | 45 +---
drivers/lightnvm/pblk-recovery.c | 24 -
drivers/lightnvm/pblk.h  | 29 ++
5 files changed, 91 insertions(+), 38 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 66ab1036f2fb..8a0ac466872f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -685,7 +685,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
rqd.nr_ppas = rq_ppas;

if (dir == PBLK_WRITE) {
-   struct pblk_sec_meta *meta_list = rqd.meta_list;
+   void *meta_list = rqd.meta_list;

rqd.flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
for (i = 0; i < rqd.nr_ppas; ) {
@@ -693,7 +693,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
paddr = __pblk_alloc_page(pblk, line, min);
spin_unlock(&line->lock);
for (j = 0; j < min; j++, i++, paddr++) {
-   meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
+   pblk_get_meta_at(pblk, meta_list, i)->lba =
+   cpu_to_le64(ADDR_EMPTY);
rqd.ppa_list[i] =
addr_to_gen_ppa(pblk, paddr, id);
}
@@ -825,14 +826,15 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
rqd.nr_ppas = lm->smeta_sec;

for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-   struct pblk_sec_meta *meta_list = rqd.meta_list;
+   void *meta_list = rqd.meta_list;

rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);

if (dir == PBLK_WRITE) {
__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);

-   meta_list[i].lba = lba_list[paddr] = addr_empty;
+   pblk_get_meta_at(pblk, meta_list, i)->lba =
+   lba_list[paddr] = addr_empty;
}
}

diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 953ca31dda68..92c40b546c4e 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -21,7 +21,7 @@
static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
  struct ppa_addr *ppa_list,
  unsigned long *lun_bitmap,
- struct pblk_sec_meta *meta_list,
+ void *meta_list,
  unsigned int valid_secs)
{
struct pblk_line *line = pblk_line_get_data(pblk);
@@ -67,14 +67,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
kref_get(&line->ref);
w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
w_ctx->ppa = ppa_list[i];
-   meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+   pblk_get_meta_at(pblk, meta_list, i)->lba =
+   cpu_to_le64(w_ctx->lba);
lba_list[paddr] = cpu_to_le64(w_ctx->lba);
if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
else
atomic64_inc(&pblk->pad_wa);
} else {
-   lba_list[paddr] = meta_list[i].lba = addr_empty;
+   lba_list[paddr] =
+   pblk_get_meta_at(pblk, meta_list, i)->lba =
+   addr_empty;
__pblk_map_invalidate(pblk, line, paddr);
}
}
@@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 unsigned long *lun_bitmap, unsigned int valid_secs,
 unsigned int off)
{
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   void *meta_list = rqd->meta_list;
unsigned int map_secs;
int min = pblk->min_write_pgs;
int i;
@@ -95,7 +98,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
for (i = off; i < rqd->nr_ppas; i += min) {
map_secs = (i + min > val

Re: [PATCH 5/5] lightnvm: pblk: Disable interleaved metadata in pblk

2018-06-18 Thread Igor Konopko




On 18.06.2018 07:29, Javier Gonzalez wrote:

On 16 Jun 2018, at 21.38, Matias Bjørling  wrote:

On 06/16/2018 12:27 AM, Igor Konopko wrote:

Currently pblk and lightnvm only check the size of the OOB metadata
and do not care whether this metadata is located in a separate buffer
or is interleaved with the data in a single buffer.
In reality only the first scenario is supported, and the second mode
will break pblk functionality during any IO operation.
The goal of this patch is to block the creation of pblk devices in
case of interleaved metadata.
Signed-off-by: Igor Konopko 
---
  drivers/lightnvm/pblk-init.c | 6 ++
  drivers/nvme/host/lightnvm.c | 1 +
  include/linux/lightnvm.h | 1 +
  3 files changed, 8 insertions(+)
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 5eb641da46ed..483a6d479e7d 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1238,6 +1238,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
return ERR_PTR(-EINVAL);
}
  + if (geo->ext) {
+   pr_err("pblk: extended (interleaved) metadata in data buffer"
+   " not supported\n");
+   return ERR_PTR(-EINVAL);
+   }
+
pblk = kzalloc(sizeof(struct pblk), GFP_KERNEL);
if (!pblk)
return ERR_PTR(-ENOMEM);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 670478abc754..872ab854ccf5 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -979,6 +979,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
geo->csecs = 1 << ns->lba_shift;
geo->sos = ns->ms;
+   geo->ext = ns->ext;
  }
int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 72a55d71917e..b13e64e2112f 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -350,6 +350,7 @@ struct nvm_geo {
u32 clba;   /* sectors per chunk */
u16 csecs;  /* sector size */
u16 sos;/* out-of-band area size */
+   u16 ext;/* metadata in extended data buffer */
/* device write constrains */
u32 ws_min; /* minimum write size */


I think a bool type would be better here. Can it be placed a bit down, just over
the 1.2 stuff?

Also, feel free to fix up the checkpatch stuff in patch 1 & 3 & 5.


Apart from Matias' comments, it looks good to me.

Traditionally, we have separated subsystem and target patches to make
sure there is no coupling between pblk and lightnvm, but if Matias is ok
with starting to have patches covering all at once, then good for me too.

Javier



Will fix above comments and resend.

Igor


Re: [PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta

2018-06-18 Thread Igor Konopko




On 18.06.2018 07:25, Javier Gonzalez wrote:

On 16 Jun 2018, at 21.27, Matias Bjørling  wrote:

On 06/16/2018 12:27 AM, Igor Konopko wrote:

Since the size of pblk_sec_meta is now flexible and depends on the
drive metadata size, we can remove the unneeded reserved field from
that structure.
Signed-off-by: Igor Konopko 
---
  drivers/lightnvm/pblk.h | 1 -
  1 file changed, 1 deletion(-)
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index f82c3a0b0de5..27658dc6fc1a 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -82,7 +82,6 @@ enum {
  };
struct pblk_sec_meta {
-   u64 reserved;
__le64 lba;
  };



Looks good to me. Javier may have some comment on this, since it is
not completely obvious from the code why that reserved attribute is
there. I would like the change to go in, as the reserved field
needlessly extends the requirement from 8 to 16 bytes.


Looks good to me. Maybe merge this patch with 1/5? It was actually a
comment I added to it.



Sure, can merge it.

Igor


[PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata

2018-06-15 Thread Igor Konopko
Currently pblk assumes that the size of the OOB metadata on the drive
is always equal to the size of struct pblk_sec_meta. This commit adds
helpers which allow handling different sizes of OOB metadata on the
drive.

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c | 10 +
 drivers/lightnvm/pblk-map.c  | 21 ---
 drivers/lightnvm/pblk-read.c | 45 +---
 drivers/lightnvm/pblk-recovery.c | 24 -
 drivers/lightnvm/pblk.h  | 29 ++
 5 files changed, 91 insertions(+), 38 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 66ab1036f2fb..8a0ac466872f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -685,7 +685,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
rqd.nr_ppas = rq_ppas;
 
if (dir == PBLK_WRITE) {
-   struct pblk_sec_meta *meta_list = rqd.meta_list;
+   void *meta_list = rqd.meta_list;
 
rqd.flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
for (i = 0; i < rqd.nr_ppas; ) {
@@ -693,7 +693,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
paddr = __pblk_alloc_page(pblk, line, min);
spin_unlock(&line->lock);
for (j = 0; j < min; j++, i++, paddr++) {
-   meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
+   pblk_get_meta_at(pblk, meta_list, i)->lba =
+   cpu_to_le64(ADDR_EMPTY);
rqd.ppa_list[i] =
addr_to_gen_ppa(pblk, paddr, id);
}
@@ -825,14 +826,15 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
rqd.nr_ppas = lm->smeta_sec;
 
for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-   struct pblk_sec_meta *meta_list = rqd.meta_list;
+   void *meta_list = rqd.meta_list;
 
rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
 
if (dir == PBLK_WRITE) {
__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
 
-   meta_list[i].lba = lba_list[paddr] = addr_empty;
+   pblk_get_meta_at(pblk, meta_list, i)->lba =
+   lba_list[paddr] = addr_empty;
}
}
 
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 953ca31dda68..92c40b546c4e 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -21,7 +21,7 @@
 static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
  struct ppa_addr *ppa_list,
  unsigned long *lun_bitmap,
- struct pblk_sec_meta *meta_list,
+ void *meta_list,
  unsigned int valid_secs)
 {
struct pblk_line *line = pblk_line_get_data(pblk);
@@ -67,14 +67,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
kref_get(&line->ref);
w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
w_ctx->ppa = ppa_list[i];
-   meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+   pblk_get_meta_at(pblk, meta_list, i)->lba =
+   cpu_to_le64(w_ctx->lba);
lba_list[paddr] = cpu_to_le64(w_ctx->lba);
if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
else
atomic64_inc(&pblk->pad_wa);
} else {
-   lba_list[paddr] = meta_list[i].lba = addr_empty;
+   lba_list[paddr] =
+   pblk_get_meta_at(pblk, meta_list, i)->lba =
+   addr_empty;
__pblk_map_invalidate(pblk, line, paddr);
}
}
@@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 unsigned long *lun_bitmap, unsigned int valid_secs,
 unsigned int off)
 {
-   struct pblk_sec_meta *meta_list = rqd->meta_list;
+   void *meta_list = rqd->meta_list;
unsigned int map_secs;
int min = pblk->min_write_pgs;
int i;
@@ -95,7 +98,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
for (i = off; i < rqd->nr_ppas; i += min) {
map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
if (pblk_map_page_data(pblk, s

[PATCH 3/5] lightnvm: Flexible DMA pool entry size

2018-06-15 Thread Igor Konopko
Currently the whole of lightnvm and pblk uses a single DMA pool, for
which the entry size is always equal to PAGE_SIZE. The PPA list
always needs 8B * 64, so there is only 56B * 64 of space left for the
OOB metadata. Since NVMe OOB metadata can be bigger, such as 128B,
this solution is not robust.

This patch adds the possibility to support OOB metadata above 56B by
creating a separate DMA pool for pblk with an entry size that is big
enough to store both the PPA list and such OOB metadata.
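
To make the sizing concrete, a rough back-of-the-envelope calculation
(illustrative numbers, assuming PAGE_SIZE == 4096 and NVM_MAX_VLBA == 64):

	/* Shared pool entry today (one PAGE_SIZE slab):
	 *   PPA list : 64 * 8B          =  512B
	 *   OOB meta : 4096B - 512B     = 3584B -> 56B per sector
	 * With 128B of OOB meta per sector the entry would need:
	 *   64 * 8B + 64 * 128B         = 8704B  (> PAGE_SIZE)
	 * hence the dedicated pool with a larger entry size. */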

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/core.c  | 33 -
 drivers/lightnvm/pblk-core.c | 24 +---
 drivers/lightnvm/pblk-init.c |  9 +
 drivers/lightnvm/pblk-read.c | 40 +++-
 drivers/lightnvm/pblk-recovery.c | 18 ++
 drivers/lightnvm/pblk-write.c|  8 
 drivers/lightnvm/pblk.h  | 11 ++-
 drivers/nvme/host/lightnvm.c |  6 --
 include/linux/lightnvm.h |  8 +---
 9 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 60aa7bc5a630..bc8e6ecea083 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -642,20 +642,33 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt)
 }
 EXPORT_SYMBOL(nvm_unregister_tgt_type);
 
-void *nvm_dev_dma_alloc(struct nvm_dev *dev, gfp_t mem_flags,
-   dma_addr_t *dma_handler)
+void *nvm_dev_dma_alloc(struct nvm_dev *dev, void *pool,
+   gfp_t mem_flags, dma_addr_t *dma_handler)
 {
-   return dev->ops->dev_dma_alloc(dev, dev->dma_pool, mem_flags,
-   dma_handler);
+   return dev->ops->dev_dma_alloc(dev, pool ?: dev->dma_pool,
+   mem_flags, dma_handler);
 }
 EXPORT_SYMBOL(nvm_dev_dma_alloc);
 
-void nvm_dev_dma_free(struct nvm_dev *dev, void *addr, dma_addr_t dma_handler)
+void nvm_dev_dma_free(struct nvm_dev *dev, void *pool,
+   void *addr, dma_addr_t dma_handler)
 {
-   dev->ops->dev_dma_free(dev->dma_pool, addr, dma_handler);
+   dev->ops->dev_dma_free(pool ?: dev->dma_pool, addr, dma_handler);
 }
 EXPORT_SYMBOL(nvm_dev_dma_free);
 
+void *nvm_dev_dma_create(struct nvm_dev *dev, int size, char *name)
+{
+   return dev->ops->create_dma_pool(dev, name, size);
+}
+EXPORT_SYMBOL(nvm_dev_dma_create);
+
+void nvm_dev_dma_destroy(struct nvm_dev *dev, void *pool)
+{
+   dev->ops->destroy_dma_pool(pool);
+}
+EXPORT_SYMBOL(nvm_dev_dma_destroy);
+
 static struct nvm_dev *nvm_find_nvm_dev(const char *name)
 {
struct nvm_dev *dev;
@@ -683,7 +696,8 @@ static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd,
}
 
rqd->nr_ppas = nr_ppas;
-   rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, &rqd->dma_ppa_list);
+   rqd->ppa_list = nvm_dev_dma_alloc(dev, NULL, GFP_KERNEL,
+   &rqd->dma_ppa_list);
if (!rqd->ppa_list) {
pr_err("nvm: failed to allocate dma memory\n");
return -ENOMEM;
@@ -709,7 +723,8 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev *tgt_dev,
if (!rqd->ppa_list)
return;
 
-   nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
+   nvm_dev_dma_free(tgt_dev->parent, NULL, rqd->ppa_list,
+   rqd->dma_ppa_list);
 }
 
 int nvm_get_chunk_meta(struct nvm_tgt_dev *tgt_dev, struct nvm_chk_meta *meta,
@@ -933,7 +948,7 @@ int nvm_register(struct nvm_dev *dev)
if (!dev->q || !dev->ops)
return -EINVAL;
 
-   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
+   dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", PAGE_SIZE);
if (!dev->dma_pool) {
pr_err("nvm: could not create dma pool\n");
return -ENOMEM;
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 8a0ac466872f..c092ee93a18d 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -279,7 +279,7 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
}
 
if (rqd->meta_list)
-   nvm_dev_dma_free(dev->parent, rqd->meta_list,
+   nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd->meta_list,
rqd->dma_meta_list);
mempool_free(rqd, pool);
 }
@@ -652,13 +652,13 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
} else
return -EINVAL;
 
-   meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
-   

[PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.

2018-06-15 Thread Igor Konopko
In the current pblk implementation, the l2p mapping for not yet
closed lines is always stored only in the OOB metadata and recovered
from it.

Such a solution does not provide data integrity when the drive does
not have such an OOB metadata space.

The goal of this patch is to add support for so-called packed
metadata, which stores the l2p mapping for open lines in the last
sector of every write unit.
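
To make the accounting concrete, a worked example (the numbers are
illustrative, not from the patch):

	/* Assume a write unit of min_write_pgs = 8 sectors, of which
	 * min_write_pgs_data = 7 carry user data and 1 carries the packed
	 * metadata (the lba list for the other 7).  A line with
	 * vsc = 70 valid user sectors then also pins
	 *   70 / 7 * (8 - 7) = 10
	 * packed-meta sectors, so the gc list selection sees 80 valid
	 * sectors in total. */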

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-core.c | 52 
 drivers/lightnvm/pblk-init.c | 37 ++--
 drivers/lightnvm/pblk-rb.c   |  3 +++
 drivers/lightnvm/pblk-recovery.c | 25 +++
 drivers/lightnvm/pblk-sysfs.c|  7 ++
 drivers/lightnvm/pblk-write.c| 14 +++
 drivers/lightnvm/pblk.h  |  5 +++-
 7 files changed, 128 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index c092ee93a18d..375c6430612e 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk)
 {
unsigned int secs_avail = pblk_rb_read_count(>rwb);
 
-   if (secs_avail >= pblk->min_write_pgs)
+   if (secs_avail >= pblk->min_write_pgs_data)
pblk_write_kick(pblk);
 }
 
@@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
struct pblk_line_meta *lm = &pblk->lm;
struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct list_head *move_list = NULL;
-   int vsc = le32_to_cpu(*line->vsc);
+   int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+   * (pblk->min_write_pgs - pblk->min_write_pgs_data);
+   int vsc = le32_to_cpu(*line->vsc) + packed_meta;
 
lockdep_assert_held(&line->lock);
 
@@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 }
 
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-  unsigned long secs_to_flush)
+  unsigned long secs_to_flush, bool skip_meta)
 {
int max = pblk->sec_per_write;
int min = pblk->min_write_pgs;
int secs_to_sync = 0;
 
+   if (skip_meta)
+   min = max = pblk->min_write_pgs_data;
+
if (secs_avail >= max)
secs_to_sync = max;
else if (secs_avail >= min)
@@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
 next_rq:
memset(&rqd, 0, sizeof(struct nvm_rq));
 
-   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+   rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
rq_len = rq_ppas * geo->csecs;
 
bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
}
spin_unlock(&pblk->trans_lock);
 }
+
+void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *meta_list = rqd->meta_list;
+   void *page;
+   int i = 0;
+
+   if (pblk_is_oob_meta_supported(pblk))
+   return;
+
+   /* We need to zero out metadata corresponding to packed meta page */
+   pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY;
+
+   page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   /* We need to fill last page of request (packed metadata)
+* with data from oob meta buffer.
+*/
+   for (; i < rqd->nr_ppas; i++)
+   memcpy(page + (i * sizeof(struct pblk_sec_meta)),
+   pblk_get_meta_at(pblk, meta_list, i),
+   sizeof(struct pblk_sec_meta));
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+   void *meta_list = rqd->meta_list;
+   void *page;
+   int i = 0;
+
+   if (pblk_is_oob_meta_supported(pblk))
+   return;
+
+   page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+   /* We need to fill oob meta buffer with data from packed metadata */
+   for (; i < rqd->nr_ppas; i++)
+   memcpy(pblk_get_meta_at(pblk, meta_list, i),
+   page + (i * sizeof(struct pblk_sec_meta)),
+   sizeof(struct pblk_sec_meta));
+}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index f05112230a52..5eb641da46ed 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk)
pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
max_write_ppas = pblk->min_write_pgs * geo->all_luns;
pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
+   pblk->min_write_pgs_data = pblk->min_write_pgs;
pblk_se

[PATCH 5/5] lightnvm: pblk: Disable interleaved metadata in pblk

2018-06-15 Thread Igor Konopko
Currently pblk and lightnvm only check the size of the OOB metadata
and do not care whether this metadata is located in a separate buffer
or is interleaved with the data in a single buffer.

In reality only the first scenario is supported, and the second mode
will break pblk functionality during any IO operation.

The goal of this patch is to block the creation of pblk devices in
case of interleaved metadata.
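
For clarity, a hypothetical layout comparison (example sizes only,
e.g. csecs == 4096B and sos == 16B):

	/* Separate OOB buffer (supported):
	 *   data buffer: [4096B][4096B]...   meta buffer: [16B][16B]...
	 * Extended/interleaved LBA (geo->ext, rejected by this patch):
	 *   data buffer: [4096B|16B][4096B|16B]...
	 * The interleaved form shifts every sector offset that pblk
	 * computes, so any read or write would be corrupted. */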

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk-init.c | 6 ++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 5eb641da46ed..483a6d479e7d 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1238,6 +1238,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
return ERR_PTR(-EINVAL);
}
 
+   if (geo->ext) {
+   pr_err("pblk: extended (interleaved) metadata in data buffer"
+   " not supported\n");
+   return ERR_PTR(-EINVAL);
+   }
+
pblk = kzalloc(sizeof(struct pblk), GFP_KERNEL);
if (!pblk)
return ERR_PTR(-ENOMEM);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 670478abc754..872ab854ccf5 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -979,6 +979,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
 
geo->csecs = 1 << ns->lba_shift;
geo->sos = ns->ms;
+   geo->ext = ns->ext;
 }
 
 int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 72a55d71917e..b13e64e2112f 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -350,6 +350,7 @@ struct nvm_geo {
u32 clba;   /* sectors per chunk */
u16 csecs;  /* sector size */
u16 sos;/* out-of-band area size */
+   u16 ext;/* metadata in extended data buffer */
 
/* device write constrains */
u32 ws_min; /* minimum write size */
-- 
2.14.3



[PATCH 0/5] lightnvm: More flexible approach to metadata

2018-06-15 Thread Igor Konopko
This series of patches introduces some more flexibility in pblk
related to OOB metadata:
-ability to use different sizes of metadata (previously fixed at 16B)
-ability to use pblk on drives without metadata
-ensuring that extended (interleaved) metadata is not in use

I believe that most of these patches, maybe except for number 4
(Support for packed metadata), are rather simple, so I am waiting for
comments, especially about that one.

Igor Konopko (5):
  lightnvm: pblk: Helpers for OOB metadata
  lightnvm: pblk: Remove resv field for sec meta
  lightnvm: Flexible DMA pool entry size
  lightnvm: pblk: Support for packed metadata in pblk.
  lightnvm: pblk: Disable interleaved metadata in pblk

 drivers/lightnvm/core.c  | 33 ++-
 drivers/lightnvm/pblk-core.c | 86 +++-
 drivers/lightnvm/pblk-init.c | 52 +++-
 drivers/lightnvm/pblk-map.c  | 21 ++
 drivers/lightnvm/pblk-rb.c   |  3 ++
 drivers/lightnvm/pblk-read.c | 85 +--
 drivers/lightnvm/pblk-recovery.c | 67 +--
 drivers/lightnvm/pblk-sysfs.c|  7 
 drivers/lightnvm/pblk-write.c| 22 ++
 drivers/lightnvm/pblk.h  | 46 +++--
 drivers/nvme/host/lightnvm.c |  7 +++-
 include/linux/lightnvm.h |  9 +++--
 12 files changed, 333 insertions(+), 105 deletions(-)

-- 
2.14.3



[PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta

2018-06-15 Thread Igor Konopko
Since the size of pblk_sec_meta is now flexible and depends on the
drive metadata size, we can remove the unneeded reserved field from
that structure.

Signed-off-by: Igor Konopko 
---
 drivers/lightnvm/pblk.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index f82c3a0b0de5..27658dc6fc1a 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -82,7 +82,6 @@ enum {
 };
 
 struct pblk_sec_meta {
-   u64 reserved;
__le64 lba;
 };
 
-- 
2.14.3



Re: [PATCH] lightnvm: pblk: add asynchronous partial read

2018-06-13 Thread Igor Konopko
_list;
+    struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
+    struct pblk_pr_ctx *pr_ctx;
+    struct bio *new_bio, *bio = r_ctx->private;
+    __le64 *lba_list_mem;
+    int nr_secs = rqd->nr_ppas;
+    int i;
+
+    /* Re-use allocated memory for intermediate lbas */
+    lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
+
+    new_bio = bio_alloc(GFP_KERNEL, nr_holes);
+
+    if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
+    goto fail;
+
+    if (nr_holes != new_bio->bi_vcnt) {
+    pr_err("pblk: malformed bio\n");
+    goto fail;
Shouldn't we use goto fail_pages here, since we have already allocated
the bio pages correctly?

+    }
+
+    pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
+    if (!pr_ctx)
+    goto fail_pages;
+
+    for (i = 0; i < nr_secs; i++)
+    lba_list_mem[i] = meta_list[i].lba;
+
+    new_bio->bi_iter.bi_sector = 0; /* internal bio */
+    bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
+
+    rqd->bio = new_bio;
+    rqd->nr_ppas = nr_holes;
+    rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM);
+
+    pr_ctx->ppa_ptr = NULL;
+    pr_ctx->orig_bio = bio;
+    pr_ctx->bitmap = *read_bitmap;
+    pr_ctx->bio_init_idx = bio_init_idx;
+    pr_ctx->orig_nr_secs = nr_secs;
+    r_ctx->private = pr_ctx;
+
+    if (unlikely(nr_holes == 1)) {
+    pr_ctx->ppa_ptr = rqd->ppa_list;
+    pr_ctx->dma_ppa_list = rqd->dma_ppa_list;
+    rqd->ppa_addr = rqd->ppa_list[0];
+    }
+    return 0;
+
+fail_pages:
+    pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
+fail:
+    bio_put(new_bio);
+
+    return -ENOMEM;
+}
+
+static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
+ unsigned int bio_init_idx,
+ unsigned long *read_bitmap, int nr_secs)
+{
+    int nr_holes;
+    int ret;
+
+    nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
+
+    if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap,
+    nr_holes))
+    return NVM_IO_ERR;
+
+    rqd->end_io = pblk_end_partial_read;
+
+    ret = pblk_submit_io(pblk, rqd);
+    if (ret) {
+    bio_put(rqd->bio);
+    pr_err("pblk: partial read IO submission failed\n");
+    goto err;
+    }
+
+    return NVM_IO_OK;

err:
pr_err("pblk: failed to perform partial read\n");

/* Free allocated pages in new bio */
-    pblk_bio_free_pages(pblk, orig_bio, 0, new_bio->bi_vcnt);
+    pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt);
__pblk_end_io_read(pblk, rqd, false);
return NVM_IO_ERR;
}
@@ -480,8 +530,15 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
/* The read bio request could be partially filled by the write buffer,
 * but there are some holes that need to be read from the drive.
 */
-    return pblk_partial_read(pblk, rqd, bio, bio_init_idx, &read_bitmap);
+    ret = pblk_partial_read_bio(pblk, rqd, bio_init_idx, &read_bitmap,
+    nr_secs);
+    if (ret)
+    goto fail_meta_free;
+
+    return NVM_IO_OK;

+fail_meta_free:
+    nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
fail_rqd_free:
pblk_free_rqd(pblk, rqd, PBLK_READ);
return ret;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 25ad026..4b28900 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -119,6 +119,16 @@ struct pblk_g_ctx {
u64 lba;
};

+/* partial read context */
+struct pblk_pr_ctx {
+    struct bio *orig_bio;
+    unsigned long bitmap;
+    unsigned int orig_nr_secs;
+    unsigned int bio_init_idx;
+    void *ppa_ptr;
+    dma_addr_t dma_ppa_list;
+};
+
/* Pad context */
struct pblk_pad_rq {
struct pblk *pblk;
--
2.7.4


Thanks Heiner. The patch looks good.

Reviewed-by: Javier González 



+ Marcin & Igor. Could you give this a spin with your drive and see if 
it works for you?
It looks like it does not apply on top of for-4.19/core, but after some
changes I was able to test it. Except for one minor comment above, it
looks good to me.


Tested-by: Igor Konopko 


[PATCH] lightnvm: pblk: sync RB and RL states during GC

2018-05-24 Thread Igor Konopko
During sequential workloads we can hit the case where almost all the
lines are fully written with data. In that case the rate limiter will
significantly reduce the maximum number of requests for user IOs.

Unfortunately, in the case when the ring buffer has been flushed to
the drive but its entries are not yet removed (which is ok, since
there are still enough free entries in the ring buffer for user IO),
we hang on user IO due to a lack of entries in the rate limiter. The
reason is that the rate limiter user entries are decreased only after
freeing the ring buffer entries, which does not happen if there is
still plenty of space in the ring buffer.

The goal of this patch is to force freeing of the ring buffer by
calling pblk_rb_sync_l2p, and thus to create new free entries in the
rate limiter, when there are not enough of them for user IO.
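
As an illustration of the stall (all numbers hypothetical):

	/* The rb holds 1024 entries; 1000 are already synced to disk but
	 * not yet reclaimed through an L2P update, because the rb itself
	 * still has free space.  A 64-sector user write would fit in the
	 * rb, but pblk_rl_user_may_insert() fails: the rate limiter still
	 * counts the 1000 synced entries against the user budget.
	 * Forcing pblk_rb_sync_l2p() reclaims them and replenishes the
	 * rate limiter, so the retried insert can succeed. */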

Signed-off-by: Igor Konopko <igor.j.kono...@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielew...@intel.com>
---
 drivers/lightnvm/pblk-init.c | 2 ++
 drivers/lightnvm/pblk-rb.c   | 7 +++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 0f277744266b..e6aa7726f8ba 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1149,7 +1149,9 @@ static void pblk_tear_down(struct pblk *pblk, bool graceful)
__pblk_pipeline_flush(pblk);
__pblk_pipeline_stop(pblk);
pblk_writer_stop(pblk);
+   spin_lock(&pblk->rwb.w_lock);
pblk_rb_sync_l2p(&pblk->rwb);
+   spin_unlock(&pblk->rwb.w_lock);
pblk_rl_free(&pblk->rl);
 
pr_debug("pblk: consistent tear down (graceful:%d)\n", graceful);
diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
index 1b74ec51a4ad..91824cd3e8d8 100644
--- a/drivers/lightnvm/pblk-rb.c
+++ b/drivers/lightnvm/pblk-rb.c
@@ -266,21 +266,18 @@ static int pblk_rb_update_l2p(struct pblk_rb *rb, unsigned int nr_entries,
  * Update the l2p entry for all sectors stored on the write buffer. This means
  * that all future lookups to the l2p table will point to a device address, not
  * to the cacheline in the write buffer.
+ * Caller must ensure that rb->w_lock is taken.
  */
 void pblk_rb_sync_l2p(struct pblk_rb *rb)
 {
unsigned int sync;
unsigned int to_update;
 
-   spin_lock(&rb->w_lock);
-
/* Protect from reads and writes */
sync = smp_load_acquire(&rb->sync);
 
to_update = pblk_rb_ring_count(sync, rb->l2p_update, rb->nr_entries);
__pblk_rb_update_l2p(rb, to_update);
-
-   spin_unlock(&rb->w_lock);
 }
 
 /*
@@ -462,6 +459,8 @@ int pblk_rb_may_write_user(struct pblk_rb *rb, struct bio *bio,
spin_lock(&rb->w_lock);
io_ret = pblk_rl_user_may_insert(&pblk->rl, nr_entries);
if (io_ret) {
+   /* Sync RB & L2P in order to update rate limiter values */
+   pblk_rb_sync_l2p(rb);
spin_unlock(&rb->w_lock);
return io_ret;
}
-- 
2.14.3



[PATCH 2/3] lightnvm: Handling when whole line is bad

2018-05-23 Thread Igor Konopko
When all the blocks (chunks) in a line are marked as bad (offline),
we shouldn't try to read smeta during the init process.

Currently we try to do so by passing -1 as the PPA address, which
causes multiple warnings that we are issuing IOs to out-of-bound
PPAs.

Signed-off-by: Igor Konopko <igor.j.kono...@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielew...@intel.com>
---
 drivers/lightnvm/pblk-core.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6d21f9dbca5f..5d197f19b77b 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -867,6 +867,11 @@ int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line)
 {
u64 bpaddr = pblk_line_smeta_start(pblk, line);
 
+   if (bpaddr == -1) {
+   /* Whole line is bad - do not try to read smeta. */
+   return 1;
+   }
+
return pblk_line_submit_smeta_io(pblk, line, bpaddr, PBLK_READ_RECOV);
 }
 
-- 
2.14.3



[PATCH 3/3] lightnvm: Fix partial read error path

2018-05-23 Thread Igor Konopko
When an error occurs during bio_add_page on the partial read path,
pblk tries to free pages twice. This patch fixes that issue.

Signed-off-by: Igor Konopko <igor.j.kono...@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielew...@intel.com>
---
 drivers/lightnvm/pblk-read.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index a2e678de428f..fa7b60f852d9 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -256,7 +256,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
-   goto err;
+   goto err_add_pages;
 
if (nr_holes != new_bio->bi_vcnt) {
pr_err("pblk: malformed bio\n");
@@ -347,10 +347,10 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
return NVM_IO_OK;
 
 err:
-   pr_err("pblk: failed to perform partial read\n");
-
/* Free allocated pages in new bio */
-   pblk_bio_free_pages(pblk, bio, 0, new_bio->bi_vcnt);
+   pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt);
+err_add_pages:
+   pr_err("pblk: failed to perform partial read\n");
__pblk_end_io_read(pblk, rqd, false);
return NVM_IO_ERR;
 }
-- 
2.14.3



[PATCH 1/3] lightnvm: Proper error handling for pblk_bio_add_pages

2018-05-23 Thread Igor Konopko
Currently, in case of an error caused by bio_add_pc_page in
pblk_bio_add_pages, two issues occur when called from
pblk_rb_read_to_bio(). The first one is in pblk_bio_free_pages, since
we are trying to free pages that were not allocated from our mempool.
The second one is the warning from dma_pool_free that we are trying
to free a NULL dma pointer. This commit fixes both issues.
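
For reference, a worked example of the fixed free call (hypothetical
numbers):

	/* Suppose the bio already carried 4 pages on entry and this
	 * function added i = 3 more from the mempool before failing, so
	 * bio->bi_vcnt is now 7.  Only our own pages, bi_io_vec[4..6],
	 * may go back to the mempool, i.e. offset bio->bi_vcnt - i == 4
	 * and count i == 3 -- exactly what the fixed call computes. */
	pblk_bio_free_pages(pblk, bio, bio->bi_vcnt - i, i);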

Signed-off-by: Igor Konopko <igor.j.kono...@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielew...@intel.com>
---
 drivers/lightnvm/pblk-core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index e43093e27084..6d21f9dbca5f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -278,7 +278,8 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
return;
}
 
-   nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
+   if (rqd->meta_list)
+   nvm_dev_dma_free(dev->parent, rqd->meta_list,
+rqd->dma_meta_list);
mempool_free(rqd, pool);
 }
 
@@ -316,7 +317,7 @@ int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags,
 
return 0;
 err:
-   pblk_bio_free_pages(pblk, bio, 0, i - 1);
+   pblk_bio_free_pages(pblk, bio, (bio->bi_vcnt - i), i);
return -1;
 }
 
-- 
2.14.3



[PATCH 0/3] lightnvm: Error paths handling

2018-05-23 Thread Igor Konopko
This patchset provides proper handling for some of the errors which
are not handled gracefully right now.

Igor Konopko (3):
  lightnvm: Proper error handling for pblk_bio_add_pages
  lightnvm: Handling when whole line is bad
  lightnvm: Fix partial read error path

 drivers/lightnvm/pblk-core.c | 10 --
 drivers/lightnvm/pblk-read.c |  8 
 2 files changed, 12 insertions(+), 6 deletions(-)

-- 
2.14.3