[PATCH v2 1/2] lightnvm: pblk: Do not overwrite ppa list with meta list
Currently, when pblk is used with 0-sized metadata, the ppa list and the meta list point to the same memory, since pblk_dma_meta_size() returns 0 in that case.

This commit fixes the issue by ensuring that pblk_dma_meta_size() always reserves at least sizeof(struct pblk_sec_meta) per sector, so that the ppa list and the meta list point to different memory addresses.

Even though the drive does not really care about the meta_list pointer in that case, this is the easiest way to fix the issue without introducing changes in many places in the code just for the 0-sized metadata case.

The same approach is also needed in pblk_get_meta(), since entries in the meta buffer must not share the same memory address when the buffer is used by the pblk recovery process.

Reported-by: Hans Holmberg
Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index bc40b1381ff6..85e38ed62f85 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -1388,12 +1388,15 @@ static inline unsigned int pblk_get_min_chks(struct pblk *pblk)
 static inline struct pblk_sec_meta *pblk_get_meta(struct pblk *pblk,
 							 void *meta, int index)
 {
-	return meta + pblk->oob_meta_size * index;
+	return meta +
+	       max_t(int, sizeof(struct pblk_sec_meta), pblk->oob_meta_size)
+	       * index;
 }
 
 static inline int pblk_dma_meta_size(struct pblk *pblk)
 {
-	return pblk->oob_meta_size * NVM_MAX_VLBA;
+	return max_t(int, sizeof(struct pblk_sec_meta), pblk->oob_meta_size)
+	       * NVM_MAX_VLBA;
 }
 
 static inline int pblk_is_oob_meta_supported(struct pblk *pblk)
--
2.17.1
[PATCH v2 2/2] lightnvm: pblk: Ensure that bio is not freed on recovery
When pblk is used with 0-sized metadata, the recovery process needs to reference the last page of the bio. Currently KASAN reports a use-after-free in that case, since the bio is freed on I/O completion.

This patch takes an additional bio reference to ensure that the bio memory can still be used after I/O completion. It also ensures that the same bio is not reused on the retry_rq path.

Reported-by: Hans Holmberg
Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-recovery.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 009faf5db40f..3fcf062d752c 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -376,12 +376,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	rq_ppas = pblk->min_write_pgs;
 	rq_len = rq_ppas * geo->csecs;
 
+retry_rq:
 	bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
 	if (IS_ERR(bio))
 		return PTR_ERR(bio);
 
 	bio->bi_iter.bi_sector = 0; /* internal bio */
 	bio_set_op_attrs(bio, REQ_OP_READ, 0);
+	bio_get(bio);
 
 	rqd->bio = bio;
 	rqd->opcode = NVM_OP_PREAD;
@@ -394,7 +396,6 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	if (pblk_io_aligned(pblk, rq_ppas))
 		rqd->is_seq = 1;
 
-retry_rq:
 	for (i = 0; i < rqd->nr_ppas; ) {
 		struct ppa_addr ppa;
 		int pos;
@@ -417,6 +418,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	if (ret) {
 		pblk_err(pblk, "I/O submission failed: %d\n", ret);
 		bio_put(bio);
+		bio_put(bio);
 		return ret;
 	}
 
@@ -428,19 +430,25 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 
 		if (padded) {
 			pblk_log_read_err(pblk, rqd);
+			bio_put(bio);
 			return -EINTR;
 		}
 
 		pad_distance = pblk_pad_distance(pblk, line);
 		ret = pblk_recov_pad_line(pblk, line, pad_distance);
-		if (ret)
+		if (ret) {
+			bio_put(bio);
 			return ret;
+		}
 
 		padded = true;
+		bio_put(bio);
 		goto retry_rq;
 	}
 
 	pblk_get_packed_meta(pblk, rqd);
+	bio_put(bio);
+
 	for (i = 0; i < rqd->nr_ppas; i++) {
 		struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
 		u64 lba = le64_to_cpu(meta->lba);
--
2.17.1
[PATCH 1/2] lightnvm: pblk: Do not overwrite ppa list with meta list
Currently, when pblk is used with 0-sized metadata, the ppa list and the meta list point to the same memory, since pblk_dma_meta_size() returns 0 in that case.

This commit fixes the issue by ensuring that pblk_dma_meta_size() always reserves at least sizeof(struct pblk_sec_meta) per sector, so that the ppa list and the meta list point to different memory addresses.

Even though the drive does not really care about the meta_list pointer in that case, this is the easiest way to fix the issue without introducing changes in many places in the code just for the 0-sized metadata case.

Reported-by: Hans Holmberg
Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index bc40b1381ff6..e5c9ff2bf0da 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -1393,7 +1393,8 @@ static inline struct pblk_sec_meta *pblk_get_meta(struct pblk *pblk,
 
 static inline int pblk_dma_meta_size(struct pblk *pblk)
 {
-	return pblk->oob_meta_size * NVM_MAX_VLBA;
+	return max_t(int, sizeof(struct pblk_sec_meta), pblk->oob_meta_size)
+	       * NVM_MAX_VLBA;
 }
 
 static inline int pblk_is_oob_meta_supported(struct pblk *pblk)
--
2.17.1
[PATCH 2/2] lightnvm: pblk: Ensure that bio is not freed on recovery
When pblk is used with 0-sized metadata, the recovery process needs to reference the last page of the bio. Currently KASAN reports a use-after-free in that case, since the bio is freed on I/O completion.

This patch takes an additional bio reference to ensure that the bio memory can still be used after I/O completion. It also ensures that the same bio is not reused on the retry_rq path.

Reported-by: Hans Holmberg
Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-recovery.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 009faf5db40f..3fcf062d752c 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -376,12 +376,14 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	rq_ppas = pblk->min_write_pgs;
 	rq_len = rq_ppas * geo->csecs;
 
+retry_rq:
 	bio = bio_map_kern(dev->q, data, rq_len, GFP_KERNEL);
 	if (IS_ERR(bio))
 		return PTR_ERR(bio);
 
 	bio->bi_iter.bi_sector = 0; /* internal bio */
 	bio_set_op_attrs(bio, REQ_OP_READ, 0);
+	bio_get(bio);
 
 	rqd->bio = bio;
 	rqd->opcode = NVM_OP_PREAD;
@@ -394,7 +396,6 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	if (pblk_io_aligned(pblk, rq_ppas))
 		rqd->is_seq = 1;
 
-retry_rq:
 	for (i = 0; i < rqd->nr_ppas; ) {
 		struct ppa_addr ppa;
 		int pos;
@@ -417,6 +418,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 	if (ret) {
 		pblk_err(pblk, "I/O submission failed: %d\n", ret);
 		bio_put(bio);
+		bio_put(bio);
 		return ret;
 	}
 
@@ -428,19 +430,25 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
 
 		if (padded) {
 			pblk_log_read_err(pblk, rqd);
+			bio_put(bio);
 			return -EINTR;
 		}
 
 		pad_distance = pblk_pad_distance(pblk, line);
 		ret = pblk_recov_pad_line(pblk, line, pad_distance);
-		if (ret)
+		if (ret) {
+			bio_put(bio);
 			return ret;
+		}
 
 		padded = true;
+		bio_put(bio);
 		goto retry_rq;
 	}
 
 	pblk_get_packed_meta(pblk, rqd);
+	bio_put(bio);
+
 	for (i = 0; i < rqd->nr_ppas; i++) {
 		struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
 		u64 lba = le64_to_cpu(meta->lba);
--
2.17.1
[PATCH v5 3/5] lightnvm: Flexible DMA pool entry size
Currently lightnvm and pblk share a single DMA pool whose entry size is always equal to PAGE_SIZE. The PPA list always needs 8B*64, so only 56B*64 of space remains for OOB metadata. Since NVMe OOB metadata can be larger, for example 128B, this solution is not robust.

This patch adds the possibility of supporting OOB metadata above 56B by sizing the DMA pool entry according to the OOB metadata size.

It also allows pblk to use OOB metadata >=16B.

Reviewed-by: Javier González
Signed-off-by: Igor Konopko
---
 drivers/lightnvm/core.c          | 9 +++--
 drivers/lightnvm/pblk-core.c     | 8 
 drivers/lightnvm/pblk-init.c     | 2 +-
 drivers/lightnvm/pblk-recovery.c | 4 ++--
 drivers/lightnvm/pblk.h          | 6 +-
 drivers/nvme/host/lightnvm.c     | 5 +++--
 include/linux/lightnvm.h         | 2 +-
 7 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 69b841d682c7..5f82036fe322 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -1140,7 +1140,7 @@ EXPORT_SYMBOL(nvm_alloc_dev);
 
 int nvm_register(struct nvm_dev *dev)
 {
-	int ret;
+	int ret, exp_pool_size;
 
 	if (!dev->q || !dev->ops)
 		return -EINVAL;
@@ -1149,7 +1149,12 @@ int nvm_register(struct nvm_dev *dev)
 	if (ret)
 		return ret;
 
-	dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
+	exp_pool_size = max_t(int, PAGE_SIZE,
+			      (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
+	exp_pool_size = round_up(exp_pool_size, PAGE_SIZE);
+
+	dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist",
+						  exp_pool_size);
 	if (!dev->dma_pool) {
 		pr_err("nvm: could not create dma pool\n");
 		nvm_free(dev);
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index e732b2d12a23..7e3397f8ead1 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -250,8 +250,8 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd)
 	if (rqd->nr_ppas == 1)
 		return 0;
 
-	rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
-	rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+	rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
+	rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk);
 
 	return 0;
 }
@@ -846,8 +846,8 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 	if (!meta_list)
 		return -ENOMEM;
 
-	ppa_list = meta_list + pblk_dma_meta_size;
-	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+	ppa_list = meta_list + pblk_dma_meta_size(pblk);
+	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 next_rq:
 	memset(, 0, sizeof(struct nvm_rq));
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 33361bfb85c3..ff6a6df369c3 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -406,7 +406,7 @@ static int pblk_core_init(struct pblk *pblk)
 	pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
 
 	pblk->oob_meta_size = geo->sos;
-	if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) {
+	if (pblk->oob_meta_size < sizeof(struct pblk_sec_meta)) {
 		pblk_err(pblk, "Unsupported metadata size\n");
 		return -EINVAL;
 	}
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index e4dd634ba05f..3a775d10f616 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -481,8 +481,8 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
 	if (!meta_list)
 		return -ENOMEM;
 
-	ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
-	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+	ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk);
+	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
 
 	data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
 	if (!data) {
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 80f356688803..9087d53d5c25 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -104,7 +104,6 @@ enum {
 	PBLK_RL_LOW = 4
 };
 
-#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
 #define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
 
 /* write buffer completion context */
@@ -1388,4 +1387,9 @@ static inline struct pblk_sec_meta *pblk_get_meta(struct pblk *pblk,
 {
 	return meta + pblk->oob_meta_size * index;
 }
+
+static inline int pblk_dma_meta_size(struct pblk *pblk)
+{
+	return pblk->oob_meta_size * NVM_MAX_VLBA;
+}
 #endif /* PBLK_H_ */
diff -
[PATCH v5 2/5] lightnvm: pblk: Helpers for OOB metadata
Currently pblk assumes that the size of the OOB metadata on the drive is always equal to the size of struct pblk_sec_meta. This commit adds helpers that will make it possible to handle different OOB metadata sizes in the future.

After this patch, only OOB metadata of exactly 16 bytes is still supported.

Reviewed-by: Javier González
Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-core.c     | 5 +++--
 drivers/lightnvm/pblk-init.c     | 6 +
 drivers/lightnvm/pblk-map.c      | 20 +++--
 drivers/lightnvm/pblk-read.c     | 48 ++--
 drivers/lightnvm/pblk-recovery.c | 16 +-
 drivers/lightnvm/pblk.h          | 6 +
 6 files changed, 69 insertions(+), 32 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index f1b411e7c7c9..e732b2d12a23 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -796,10 +796,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
 	rqd.is_seq = 1;
 
 	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-		struct pblk_sec_meta *meta_list = rqd.meta_list;
+		struct pblk_sec_meta *meta = pblk_get_meta(pblk,
+							   rqd.meta_list, i);
 
 		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-		meta_list[i].lba = lba_list[paddr] = addr_empty;
+		meta->lba = lba_list[paddr] = addr_empty;
 	}
 
 	ret = pblk_submit_io_sync_sem(pblk, );
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 72ad3e70318c..33361bfb85c3 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -405,6 +405,12 @@ static int pblk_core_init(struct pblk *pblk)
 		queue_max_hw_sectors(dev->q) / (geo->csecs >> SECTOR_SHIFT));
 	pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
 
+	pblk->oob_meta_size = geo->sos;
+	if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) {
+		pblk_err(pblk, "Unsupported metadata size\n");
+		return -EINVAL;
+	}
+
 	pblk->pad_dist = kcalloc(pblk->min_write_pgs - 1, sizeof(atomic64_t),
 								GFP_KERNEL);
 	if (!pblk->pad_dist)
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 5a3c28cce8ab..81e503ec384e 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -22,7 +22,7 @@
 static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 			      struct ppa_addr *ppa_list,
 			      unsigned long *lun_bitmap,
-			      struct pblk_sec_meta *meta_list,
+			      void *meta_list,
 			      unsigned int valid_secs)
 {
 	struct pblk_line *line = pblk_line_get_data(pblk);
@@ -58,6 +58,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 	paddr = pblk_alloc_page(pblk, line, nr_secs);
 
 	for (i = 0; i < nr_secs; i++, paddr++) {
+		struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
 		__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
 
 		/* ppa to be sent to the device */
@@ -74,14 +75,15 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 			kref_get(>ref);
 			w_ctx = pblk_rb_w_ctx(>rwb, sentry + i);
 			w_ctx->ppa = ppa_list[i];
-			meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+			meta->lba = cpu_to_le64(w_ctx->lba);
 			lba_list[paddr] = cpu_to_le64(w_ctx->lba);
 			if (lba_list[paddr] != addr_empty)
 				line->nr_valid_lbas++;
 			else
 				atomic64_inc(>pad_wa);
 		} else {
-			lba_list[paddr] = meta_list[i].lba = addr_empty;
+			lba_list[paddr] = addr_empty;
+			meta->lba = addr_empty;
 			__pblk_map_invalidate(pblk, line, paddr);
 		}
 	}
@@ -94,7 +96,8 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 		 unsigned long *lun_bitmap, unsigned int valid_secs,
 		 unsigned int off)
 {
-	struct pblk_sec_meta *meta_list = rqd->meta_list;
+	void *meta_list = rqd->meta_list;
+	void *meta_buffer;
 	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
 	unsigned int map_secs;
 	int min = pblk->min_write_pgs;
@@ -103,9 +106,10 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 	for (i = off; i < rqd->nr_ppas; i += min) {
[PATCH v5 4/5] lightnvm: Disable interleaved metadata
Currently pblk and lightnvm only check the size of the OOB metadata and do not care whether this metadata is located in a separate buffer or is interleaved with the data in a single buffer.

In reality only the first scenario is supported, while the second will break pblk functionality during any I/O operation.

The goal of this patch is to block creation of pblk devices in the case of interleaved metadata.

Reviewed-by: Javier González
Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-init.c | 6 ++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h     | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index ff6a6df369c3..e8055b796381 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1175,6 +1175,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 		return ERR_PTR(-EINVAL);
 	}
 
+	if (geo->ext) {
+		pblk_err(pblk, "extended metadata not supported\n");
+		kfree(pblk);
+		return ERR_PTR(-EINVAL);
+	}
+
 	spin_lock_init(>resubmit_lock);
 	spin_lock_init(>trans_lock);
 	spin_lock_init(>lock);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index ba268d7cf141..f145fc0220d6 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -990,6 +990,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
 	geo = >geo;
 	geo->csecs = 1 << ns->lba_shift;
 	geo->sos = ns->ms;
+	geo->ext = ns->ext;
 
 	dev->q = q;
 	memcpy(dev->name, disk_name, DISK_NAME_LEN);
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 7afedaddbd15..5d865a5d5cdc 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -357,6 +357,7 @@ struct nvm_geo {
 	u32	clba;		/* sectors per chunk */
 	u16	csecs;		/* sector size */
 	u16	sos;		/* out-of-band area size */
+	bool	ext;		/* metadata in extended data buffer */
 
 	/* device write constrains */
 	u32	ws_min;		/* minimum write size */
--
2.14.5
[PATCH v5 5/5] lightnvm: pblk: Support for packed metadata
In the current pblk implementation, the l2p mapping for lines that are not yet closed is always stored only in OOB metadata and recovered from it. This approach does not provide data integrity on drives that have no such OOB metadata space.

The goal of this patch is to add support for so-called packed metadata, which stores the l2p mapping for open lines in the last sector of every write unit.

After this set of changes, drives with an OOB size <16B will use packed metadata, while drives with >=16B will continue to use OOB metadata.

Reviewed-by: Javier González
Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-core.c     | 48 
 drivers/lightnvm/pblk-init.c     | 38 ++-
 drivers/lightnvm/pblk-map.c      | 4 ++--
 drivers/lightnvm/pblk-rb.c       | 3 +++
 drivers/lightnvm/pblk-read.c     | 6 +
 drivers/lightnvm/pblk-recovery.c | 5 +++--
 drivers/lightnvm/pblk-sysfs.c    | 7 ++
 drivers/lightnvm/pblk-write.c    | 9 
 drivers/lightnvm/pblk.h          | 10 -
 9 files changed, 112 insertions(+), 18 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 7e3397f8ead1..1ff165351180 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk)
 {
 	unsigned int secs_avail = pblk_rb_read_count(>rwb);
 
-	if (secs_avail >= pblk->min_write_pgs)
+	if (secs_avail >= pblk->min_write_pgs_data)
 		pblk_write_kick(pblk);
 }
 
@@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
 	struct pblk_line_meta *lm = >lm;
 	struct pblk_line_mgmt *l_mg = >l_mg;
 	struct list_head *move_list = NULL;
-	int vsc = le32_to_cpu(*line->vsc);
+	int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+			* (pblk->min_write_pgs - pblk->min_write_pgs_data);
+	int vsc = le32_to_cpu(*line->vsc) + packed_meta;
 
 	lockdep_assert_held(>lock);
 
@@ -620,12 +622,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 }
 
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-		   unsigned long secs_to_flush)
+		   unsigned long secs_to_flush, bool skip_meta)
 {
 	int max = pblk->sec_per_write;
 	int min = pblk->min_write_pgs;
 	int secs_to_sync = 0;
 
+	if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs)
+		min = max = pblk->min_write_pgs_data;
+
 	if (secs_avail >= max)
 		secs_to_sync = max;
 	else if (secs_avail >= min)
@@ -852,7 +857,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 next_rq:
 	memset(, 0, sizeof(struct nvm_rq));
 
-	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
 	rq_len = rq_ppas * geo->csecs;
 
 	bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2169,3 +2174,38 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 	}
 	spin_unlock(>trans_lock);
 }
+
+void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd)
+{
+	void *buffer;
+
+	if (pblk_is_oob_meta_supported(pblk)) {
+		/* Just use OOB metadata buffer as always */
+		buffer = rqd->meta_list;
+	} else {
+		/* We need to reuse last page of request (packed metadata)
+		 * in similar way as traditional oob metadata
+		 */
+		buffer = page_to_virt(
+			rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+	}
+
+	return buffer;
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+	void *meta_list = rqd->meta_list;
+	void *page;
+	int i = 0;
+
+	if (pblk_is_oob_meta_supported(pblk))
+		return;
+
+	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+	/* We need to fill oob meta buffer with data from packed metadata */
+	for (; i < rqd->nr_ppas; i++)
+		memcpy(pblk_get_meta(pblk, meta_list, i),
+			page + (i * sizeof(struct pblk_sec_meta)),
+			sizeof(struct pblk_sec_meta));
+}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index e8055b796381..f9a3e47b6a93 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -399,6 +399,7 @@ static int pblk_core_init(struct pblk *pblk)
 	pblk->nr_flush_rst = 0;
 
 	pblk->min_write_pgs = geo->ws_opt;
+	pblk->min_write_pgs_data = pblk->min_write_pgs;
 	max_write_ppas = pblk->min_write_pgs * geo->all_luns;
 	pblk->max_write_pgs = min_t(i
[PATCH v5 0/5] lightnvm: Flexible metadata
This series of patches extends the way pblk can store L2P sector metadata. After this set of changes, any size of NVMe metadata is supported in pblk. There is also support for the case without NVMe metadata.

Changes v4 --> v5:
-rebase on top of ocssd/for-4.21/core

Changes v3 --> v4:
-rename nvm_alloc_dma_pool() to nvm_create_dma_pool()
-split pblk_get_meta() calls and lba setting into two operations for better code readability
-fixing compilation with CONFIG_NVM disabled
-getting rid of unnecessary memcpy for packed metadata on write path
-support for drives with oob size >0 and <16B in packed metadata mode
-minor commit message updates

Changes v2 --> v3:
-Rebase on top of ocssd/for-4.21/core
-get/set_meta_lba helpers were removed
-dma reallocation was replaced with single allocation
-oob metadata size was added to pblk structure
-proper checks on pblk creation were added

Changes v1 --> v2:
-Revert sector meta size back to 16b for pblk
-Dma pool for larger oob meta is handled in core instead of pblk
-Pblk oob meta helpers use __le64 as input/output instead of u64
-Other minor fixes based on v1 patch review

Igor Konopko (5):
  lightnvm: pblk: Move lba list to partial read context
  lightnvm: pblk: Helpers for OOB metadata
  lightnvm: Flexible DMA pool entry size
  lightnvm: Disable interleaved metadata
  lightnvm: pblk: Support for packed metadata

 drivers/lightnvm/core.c          | 9 --
 drivers/lightnvm/pblk-core.c     | 61 +++--
 drivers/lightnvm/pblk-init.c     | 44 +--
 drivers/lightnvm/pblk-map.c      | 20 +++-
 drivers/lightnvm/pblk-rb.c       | 3 ++
 drivers/lightnvm/pblk-read.c     | 66 +++-
 drivers/lightnvm/pblk-recovery.c | 25 +--
 drivers/lightnvm/pblk-sysfs.c    | 7 +
 drivers/lightnvm/pblk-write.c    | 9 +++---
 drivers/lightnvm/pblk.h          | 24 +--
 drivers/nvme/host/lightnvm.c     | 6 ++--
 include/linux/lightnvm.h         | 3 +-
 12 files changed, 209 insertions(+), 68 deletions(-)

--
2.14.5
[PATCH v5 1/5] lightnvm: pblk: Move lba list to partial read context
Currently, DMA-allocated memory is reused on partial reads for the lba_list_mem and lba_list_media arrays.

In preparation for dynamic DMA pool sizes, these arrays need to be moved into the pblk_pr_ctx structure.

Reviewed-by: Javier González
Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-read.c | 20 +---
 drivers/lightnvm/pblk.h      | 2 ++
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 9fba614adeeb..19917d3c19b3 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -224,7 +224,6 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 	unsigned long *read_bitmap = pr_ctx->bitmap;
 	int nr_secs = pr_ctx->orig_nr_secs;
 	int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-	__le64 *lba_list_mem, *lba_list_media;
 	void *src_p, *dst_p;
 	int hole, i;
 
@@ -237,13 +236,9 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 		rqd->ppa_list[0] = ppa;
 	}
 
-	/* Re-use allocated memory for intermediate lbas */
-	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-	lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
-
 	for (i = 0; i < nr_secs; i++) {
-		lba_list_media[i] = meta_list[i].lba;
-		meta_list[i].lba = lba_list_mem[i];
+		pr_ctx->lba_list_media[i] = meta_list[i].lba;
+		meta_list[i].lba = pr_ctx->lba_list_mem[i];
 	}
 
 	/* Fill the holes in the original bio */
@@ -255,7 +250,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 		line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
 		kref_put(>ref, pblk_line_put);
 
-		meta_list[hole].lba = lba_list_media[i];
+		meta_list[hole].lba = pr_ctx->lba_list_media[i];
 
 		src_bv = new_bio->bi_io_vec[i++];
 		dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@@ -295,13 +290,9 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
 	struct pblk_pr_ctx *pr_ctx;
 	struct bio *new_bio, *bio = r_ctx->private;
-	__le64 *lba_list_mem;
 	int nr_secs = rqd->nr_ppas;
 	int i;
 
-	/* Re-use allocated memory for intermediate lbas */
-	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-
 	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
 	if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
@@ -312,12 +303,12 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 		goto fail_free_pages;
 	}
 
-	pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
+	pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
 	if (!pr_ctx)
 		goto fail_free_pages;
 
 	for (i = 0; i < nr_secs; i++)
-		lba_list_mem[i] = meta_list[i].lba;
+		pr_ctx->lba_list_mem[i] = meta_list[i].lba;
 
 	new_bio->bi_iter.bi_sector = 0; /* internal bio */
 	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@@ -325,7 +316,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	rqd->bio = new_bio;
 	rqd->nr_ppas = nr_holes;
 
-	pr_ctx->ppa_ptr = NULL;
 	pr_ctx->orig_bio = bio;
 	bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
 	pr_ctx->bio_init_idx = bio_init_idx;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index e5b88a25d4d6..0e9d3960ac4c 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -132,6 +132,8 @@ struct pblk_pr_ctx {
 	unsigned int bio_init_idx;
 	void *ppa_ptr;
 	dma_addr_t dma_ppa_list;
+	__le64 lba_list_mem[NVM_MAX_VLBA];
+	__le64 lba_list_media[NVM_MAX_VLBA];
 };
 
 /* Pad context */
--
2.14.5
[PATCH v4 5/5] lightnvm: pblk: Support for packed metadata
In the current pblk implementation, the l2p mapping for lines that are not yet closed is always stored only in OOB metadata and recovered from it. This approach does not provide data integrity on drives that have no such OOB metadata space.

The goal of this patch is to add support for so-called packed metadata, which stores the l2p mapping for open lines in the last sector of every write unit.

After this set of changes, drives with an OOB size <16B will use packed metadata, while drives with >=16B will continue to use OOB metadata.

Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-core.c     | 48 
 drivers/lightnvm/pblk-init.c     | 38 ++-
 drivers/lightnvm/pblk-map.c      | 4 ++--
 drivers/lightnvm/pblk-rb.c       | 3 +++
 drivers/lightnvm/pblk-read.c     | 6 +
 drivers/lightnvm/pblk-recovery.c | 5 +++--
 drivers/lightnvm/pblk-sysfs.c    | 7 ++
 drivers/lightnvm/pblk-write.c    | 9 
 drivers/lightnvm/pblk.h          | 10 -
 9 files changed, 112 insertions(+), 18 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 1347d1a93dd0..a95e18de5beb 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk)
 {
 	unsigned int secs_avail = pblk_rb_read_count(>rwb);
 
-	if (secs_avail >= pblk->min_write_pgs)
+	if (secs_avail >= pblk->min_write_pgs_data)
 		pblk_write_kick(pblk);
 }
 
@@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
 	struct pblk_line_meta *lm = >lm;
 	struct pblk_line_mgmt *l_mg = >l_mg;
 	struct list_head *move_list = NULL;
-	int vsc = le32_to_cpu(*line->vsc);
+	int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+			* (pblk->min_write_pgs - pblk->min_write_pgs_data);
+	int vsc = le32_to_cpu(*line->vsc) + packed_meta;
 
 	lockdep_assert_held(>lock);
 
@@ -620,12 +622,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 }
 
 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-		   unsigned long secs_to_flush)
+		   unsigned long secs_to_flush, bool skip_meta)
 {
 	int max = pblk->sec_per_write;
 	int min = pblk->min_write_pgs;
 	int secs_to_sync = 0;
 
+	if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs)
+		min = max = pblk->min_write_pgs_data;
+
 	if (secs_avail >= max)
 		secs_to_sync = max;
 	else if (secs_avail >= min)
@@ -852,7 +857,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 next_rq:
 	memset(, 0, sizeof(struct nvm_rq));
 
-	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
 	rq_len = rq_ppas * geo->csecs;
 
 	bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2161,3 +2166,38 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 	}
 	spin_unlock(>trans_lock);
 }
+
+void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd)
+{
+	void *buffer;
+
+	if (pblk_is_oob_meta_supported(pblk)) {
+		/* Just use OOB metadata buffer as always */
+		buffer = rqd->meta_list;
+	} else {
+		/* We need to reuse last page of request (packed metadata)
+		 * in similar way as traditional oob metadata
+		 */
+		buffer = page_to_virt(
+			rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+	}
+
+	return buffer;
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+	void *meta_list = rqd->meta_list;
+	void *page;
+	int i = 0;
+
+	if (pblk_is_oob_meta_supported(pblk))
+		return;
+
+	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+	/* We need to fill oob meta buffer with data from packed metadata */
+	for (; i < rqd->nr_ppas; i++)
+		memcpy(pblk_get_meta(pblk, meta_list, i),
+			page + (i * sizeof(struct pblk_sec_meta)),
+			sizeof(struct pblk_sec_meta));
+}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index a728f861edd6..830ebe3d098a 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -403,6 +403,7 @@ static int pblk_core_init(struct pblk *pblk)
 	pblk->nr_flush_rst = 0;
 
 	pblk->min_write_pgs = geo->ws_opt;
+	pblk->min_write_pgs_data = pblk->min_write_pgs;
 	max_write_ppas = pblk->min_write_pgs * geo->all_luns;
 	pblk->max_write_pgs = min_t(int, max_write_ppas
[PATCH v4 4/5] lightnvm: Disable interleaved metadata
Currently pblk and lightnvm only check the size of the OOB metadata and do not care whether this metadata is located in a separate buffer or is interleaved with the data in a single buffer.

In reality only the first scenario is supported, while the second will break pblk functionality during any I/O operation.

The goal of this patch is to block creation of pblk devices in the case of interleaved metadata.

Reviewed-by: Javier González
Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-init.c | 6 ++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h     | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b67bca810eb7..a728f861edd6 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1179,6 +1179,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 		return ERR_PTR(-EINVAL);
 	}
 
+	if (geo->ext) {
+		pblk_err(pblk, "extended metadata not supported\n");
+		kfree(pblk);
+		return ERR_PTR(-EINVAL);
+	}
+
 	spin_lock_init(>resubmit_lock);
 	spin_lock_init(>trans_lock);
 	spin_lock_init(>lock);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 60ac32b03fb6..ceb92610f43b 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -982,6 +982,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
 	if (geo->version != NVM_OCSSD_SPEC_12) {
 		geo->csecs = 1 << ns->lba_shift;
 		geo->sos = ns->ms;
+		geo->ext = ns->ext;
 	}
 
 	if (nvm_create_dma_pool(ndev))
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 216b373b7fea..2717e6141b1a 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -357,6 +357,7 @@ struct nvm_geo {
 	u32	clba;		/* sectors per chunk */
 	u16	csecs;		/* sector size */
 	u16	sos;		/* out-of-band area size */
+	bool	ext;		/* metadata in extended data buffer */
 
 	/* device write constrains */
 	u32	ws_min;		/* minimum write size */
--
2.14.5
[PATCH v4 3/5] lightnvm: Flexible DMA pool entry size
Currently the whole of lightnvm and pblk uses a single DMA pool, for which the entry size is always equal to PAGE_SIZE. The PPA list always needs 8B*64, so there is only 56B*64 of space left for OOB meta. Since NVMe OOB meta can be bigger, such as 128B, this solution is not robust. This patch adds the possibility to support OOB meta above 56B by changing the DMA pool size based on the OOB meta size. It also allows pblk to use OOB metadata >=16B. Reviewed-by: Javier González Signed-off-by: Igor Konopko --- drivers/lightnvm/core.c | 30 -- drivers/lightnvm/pblk-core.c | 8 drivers/lightnvm/pblk-init.c | 2 +- drivers/lightnvm/pblk-recovery.c | 4 ++-- drivers/lightnvm/pblk.h | 6 +- drivers/nvme/host/lightnvm.c | 15 +-- include/linux/lightnvm.h | 7 ++- 7 files changed, 47 insertions(+), 25 deletions(-) diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c index 73ab3cf26868..e3a83e506458 100644 --- a/drivers/lightnvm/core.c +++ b/drivers/lightnvm/core.c @@ -1145,15 +1145,9 @@ int nvm_register(struct nvm_dev *dev) if (!dev->q || !dev->ops) return -EINVAL; - dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist"); - if (!dev->dma_pool) { - pr_err("nvm: could not create dma pool\n"); - return -ENOMEM; - } - ret = nvm_init(dev); if (ret) - goto err_init; + return ret; /* register device with a supported media manager */ down_write(&nvm_lock); @@ -1161,9 +1155,6 @@ int nvm_register(struct nvm_dev *dev) up_write(&nvm_lock); return 0; -err_init: - dev->ops->destroy_dma_pool(dev->dma_pool); - return ret; } EXPORT_SYMBOL(nvm_register); @@ -1187,6 +1178,25 @@ void nvm_unregister(struct nvm_dev *dev) } EXPORT_SYMBOL(nvm_unregister); +int nvm_create_dma_pool(struct nvm_dev *dev) +{ + int exp_pool_size; + + exp_pool_size = max_t(int, PAGE_SIZE, + (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos))); + exp_pool_size = round_up(exp_pool_size, PAGE_SIZE); + + dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", + exp_pool_size); + if (!dev->dma_pool) { + pr_err("nvm: could not create dma pool\n"); + return -ENOMEM; 
+ } + + return 0; +} +EXPORT_SYMBOL(nvm_create_dma_pool); + static int __nvm_configure_create(struct nvm_ioctl_create *create) { struct nvm_dev *dev; diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index aeb10bd78c62..1347d1a93dd0 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -250,8 +250,8 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd) if (rqd->nr_ppas == 1) return 0; - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size; - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size; + rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk); + rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk); return 0; } @@ -846,8 +846,8 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line, if (!meta_list) return -ENOMEM; - ppa_list = meta_list + pblk_dma_meta_size; - dma_ppa_list = dma_meta_list + pblk_dma_meta_size; + ppa_list = meta_list + pblk_dma_meta_size(pblk); + dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk); next_rq: memset(&rqd, 0, sizeof(struct nvm_rq)); diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 6e7a0c6c6655..b67bca810eb7 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -410,7 +410,7 @@ static int pblk_core_init(struct pblk *pblk) pblk_set_sec_per_write(pblk, pblk->min_write_pgs); pblk->oob_meta_size = geo->sos; - if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) { + if (pblk->oob_meta_size < sizeof(struct pblk_sec_meta)) { pblk_err(pblk, "Unsupported metadata size\n"); return -EINVAL; } diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c index 6a30b9971283..52cbe06e3ebc 100644 --- a/drivers/lightnvm/pblk-recovery.c +++ b/drivers/lightnvm/pblk-recovery.c @@ -478,8 +478,8 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line) if (!meta_list) return -ENOMEM; - ppa_list = (void *)(meta_list) + pblk_dma_meta_size; - dma_ppa_list = 
dma_meta_list + pblk_dma_meta_size; + ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk); + dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk); data
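The sizing rule in nvm_create_dma_pool() above can be checked with a few numbers. The following is only a user-space sketch of that arithmetic, not kernel code: EX_PAGE_SIZE, EX_NVM_MAX_VLBA and the ex_* helpers are hypothetical names, and the values 4096 and 64 are assumptions chosen to match the "8B*64" reasoning in the commit message.

```c
#include <assert.h>

/* Assumed values for illustration only; the kernel uses PAGE_SIZE and
 * NVM_MAX_VLBA from its own headers. */
#define EX_PAGE_SIZE    4096
#define EX_NVM_MAX_VLBA 64

/* Round x up to the next multiple of align. */
static int ex_round_up(int x, int align)
{
	assert(align > 0);
	return ((x + align - 1) / align) * align;
}

/* Same arithmetic as the patch: every sector needs one 8-byte PPA plus
 * sos bytes of OOB metadata, and an entry is never smaller than a page. */
static int ex_pool_entry_size(int sos)
{
	int need = EX_NVM_MAX_VLBA * ((int)sizeof(unsigned long long) + sos);
	int size = need > EX_PAGE_SIZE ? need : EX_PAGE_SIZE;

	return ex_round_up(size, EX_PAGE_SIZE);
}
```

Under these assumptions 56 bytes of OOB metadata is the largest size that still fits a single 4096-byte entry (64 * (8 + 56) = 4096), while 128 bytes needs 64 * 136 = 8704 bytes and is rounded up to three pages.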
[PATCH v4 1/5] lightnvm: pblk: Move lba list to partial read context
Currently DMA allocated memory is reused on partial read for the lba_list_mem and lba_list_media arrays. In preparation for dynamic DMA pool sizes we need to move these arrays into the pblk_pr_ctx structure. Reviewed-by: Javier González Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-read.c | 20 +--- drivers/lightnvm/pblk.h | 2 ++ 2 files changed, 7 insertions(+), 15 deletions(-) diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c index 9fba614adeeb..19917d3c19b3 100644 --- a/drivers/lightnvm/pblk-read.c +++ b/drivers/lightnvm/pblk-read.c @@ -224,7 +224,6 @@ static void pblk_end_partial_read(struct nvm_rq *rqd) unsigned long *read_bitmap = pr_ctx->bitmap; int nr_secs = pr_ctx->orig_nr_secs; int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs); - __le64 *lba_list_mem, *lba_list_media; void *src_p, *dst_p; int hole, i; @@ -237,13 +236,9 @@ static void pblk_end_partial_read(struct nvm_rq *rqd) rqd->ppa_list[0] = ppa; } - /* Re-use allocated memory for intermediate lbas */ - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size); - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size); - for (i = 0; i < nr_secs; i++) { - lba_list_media[i] = meta_list[i].lba; - meta_list[i].lba = lba_list_mem[i]; + pr_ctx->lba_list_media[i] = meta_list[i].lba; + meta_list[i].lba = pr_ctx->lba_list_mem[i]; } /* Fill the holes in the original bio */ @@ -255,7 +250,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd) line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]); kref_put(&line->ref, pblk_line_put); - meta_list[hole].lba = lba_list_media[i]; + meta_list[hole].lba = pr_ctx->lba_list_media[i]; src_bv = new_bio->bi_io_vec[i++]; dst_bv = bio->bi_io_vec[bio_init_idx + hole]; @@ -295,13 +290,9 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd, struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd); struct pblk_pr_ctx *pr_ctx; struct bio *new_bio, *bio = r_ctx->private; - __le64 *lba_list_mem; int nr_secs = rqd->nr_ppas; int i; - /* 
Re-use allocated memory for intermediate lbas */ - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size); - new_bio = bio_alloc(GFP_KERNEL, nr_holes); if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes)) @@ -312,12 +303,12 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd, goto fail_free_pages; } - pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL); + pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL); if (!pr_ctx) goto fail_free_pages; for (i = 0; i < nr_secs; i++) - lba_list_mem[i] = meta_list[i].lba; + pr_ctx->lba_list_mem[i] = meta_list[i].lba; new_bio->bi_iter.bi_sector = 0; /* internal bio */ bio_set_op_attrs(new_bio, REQ_OP_READ, 0); @@ -325,7 +316,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd, rqd->bio = new_bio; rqd->nr_ppas = nr_holes; - pr_ctx->ppa_ptr = NULL; pr_ctx->orig_bio = bio; bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA); pr_ctx->bio_init_idx = bio_init_idx; diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h index e5b88a25d4d6..0e9d3960ac4c 100644 --- a/drivers/lightnvm/pblk.h +++ b/drivers/lightnvm/pblk.h @@ -132,6 +132,8 @@ struct pblk_pr_ctx { unsigned int bio_init_idx; void *ppa_ptr; dma_addr_t dma_ppa_list; + __le64 lba_list_mem[NVM_MAX_VLBA]; + __le64 lba_list_media[NVM_MAX_VLBA]; }; /* Pad context */ -- 2.14.5
[PATCH v4 0/5] lightnvm: Flexible metadata
This series of patches extends the way pblk can store L2P sector metadata. After this set of changes any size of NVMe metadata is supported in pblk. There is also support for the case without NVMe metadata. Changes v3 --> v4: -rename nvm_alloc_dma_pool() to nvm_create_dma_pool() -split pblk_get_meta() calls and lba setting into two operations for better code readability -fixing compilation with CONFIG_NVM disabled -getting rid of unnecessary memcpy for packed metadata on write path -support for drives with oob size >0 and <16B in packed metadata mode -minor commit message updates Changes v2 --> v3: -Rebase on top of ocssd/for-4.21/core -get/set_meta_lba helpers were removed -dma reallocation was replaced with single allocation -oob metadata size was added to pblk structure -proper checks on pblk creation were added Changes v1 --> v2: -Revert sector meta size back to 16B for pblk -Dma pool for larger oob meta is handled in core instead of pblk -Pblk oob meta helpers use __le64 as input/output instead of u64 -Other minor fixes based on v1 patch review Igor Konopko (5): lightnvm: pblk: Move lba list to partial read context lightnvm: pblk: Helpers for OOB metadata lightnvm: Flexible DMA pool entry size lightnvm: Disable interleaved metadata lightnvm: pblk: Support for packed metadata drivers/lightnvm/core.c | 30 -- drivers/lightnvm/pblk-core.c | 61 +++-- drivers/lightnvm/pblk-init.c | 44 +-- drivers/lightnvm/pblk-map.c | 20 +++- drivers/lightnvm/pblk-rb.c | 3 ++ drivers/lightnvm/pblk-read.c | 66 +++- drivers/lightnvm/pblk-recovery.c | 25 +-- drivers/lightnvm/pblk-sysfs.c| 7 + drivers/lightnvm/pblk-write.c| 9 +++--- drivers/lightnvm/pblk.h | 24 +-- drivers/nvme/host/lightnvm.c | 16 ++ include/linux/lightnvm.h | 8 - 12 files changed, 233 insertions(+), 80 deletions(-) -- 2.14.5
[PATCH v4 2/5] lightnvm: pblk: Helpers for OOB metadata
Currently pblk assumes that the size of OOB metadata on the drive is always equal to the size of the pblk_sec_meta struct. This commit adds helpers which will allow handling different sizes of OOB metadata on the drive in the future. Still, after this patch only OOB metadata equal to 16 bytes is supported. Reviewed-by: Javier González Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-core.c | 5 +++-- drivers/lightnvm/pblk-init.c | 6 + drivers/lightnvm/pblk-map.c | 20 +++-- drivers/lightnvm/pblk-read.c | 48 ++-- drivers/lightnvm/pblk-recovery.c | 16 +- drivers/lightnvm/pblk.h | 6 + 6 files changed, 69 insertions(+), 32 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 6581c35f51ee..aeb10bd78c62 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -796,10 +796,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line, rqd.is_seq = 1; for (i = 0; i < lm->smeta_sec; i++, paddr++) { - struct pblk_sec_meta *meta_list = rqd.meta_list; + struct pblk_sec_meta *meta = pblk_get_meta(pblk, + rqd.meta_list, i); rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); - meta_list[i].lba = lba_list[paddr] = addr_empty; + meta->lba = lba_list[paddr] = addr_empty; } ret = pblk_submit_io_sync_sem(pblk, &rqd); diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 0e37104de596..6e7a0c6c6655 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -409,6 +409,12 @@ static int pblk_core_init(struct pblk *pblk) queue_max_hw_sectors(dev->q) / (geo->csecs >> SECTOR_SHIFT)); pblk_set_sec_per_write(pblk, pblk->min_write_pgs); + pblk->oob_meta_size = geo->sos; + if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) { + pblk_err(pblk, "Unsupported metadata size\n"); + return -EINVAL; + } + pblk->pad_dist = kcalloc(pblk->min_write_pgs - 1, sizeof(atomic64_t), GFP_KERNEL); if (!pblk->pad_dist) diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c index 
5a3c28cce8ab..81e503ec384e 100644 --- a/drivers/lightnvm/pblk-map.c +++ b/drivers/lightnvm/pblk-map.c @@ -22,7 +22,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, struct ppa_addr *ppa_list, unsigned long *lun_bitmap, - struct pblk_sec_meta *meta_list, + void *meta_list, unsigned int valid_secs) { struct pblk_line *line = pblk_line_get_data(pblk); @@ -58,6 +58,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, paddr = pblk_alloc_page(pblk, line, nr_secs); for (i = 0; i < nr_secs; i++, paddr++) { + struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i); __le64 addr_empty = cpu_to_le64(ADDR_EMPTY); /* ppa to be sent to the device */ @@ -74,14 +75,15 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, kref_get(&line->ref); w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i); w_ctx->ppa = ppa_list[i]; - meta_list[i].lba = cpu_to_le64(w_ctx->lba); + meta->lba = cpu_to_le64(w_ctx->lba); lba_list[paddr] = cpu_to_le64(w_ctx->lba); if (lba_list[paddr] != addr_empty) line->nr_valid_lbas++; else atomic64_inc(&pblk->pad_wa); } else { - lba_list[paddr] = meta_list[i].lba = addr_empty; + lba_list[paddr] = addr_empty; + meta->lba = addr_empty; __pblk_map_invalidate(pblk, line, paddr); } } @@ -94,7 +96,8 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, unsigned long *lun_bitmap, unsigned int valid_secs, unsigned int off) { - struct pblk_sec_meta *meta_list = rqd->meta_list; + void *meta_list = rqd->meta_list; + void *meta_buffer; struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd); unsigned int map_secs; int min = pblk->min_write_pgs; @@ -103,9 +106,10 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, for (i = off; i < rqd->nr_ppas; i += min) {
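The reason the callers above switch from meta_list[i] to pblk_get_meta() is that array indexing hard-codes a 16-byte stride, while the drive may lay entries out further apart. A minimal user-space sketch of the strided lookup follows; struct ex_sec_meta and ex_get_meta() are hypothetical stand-ins, and the two-field 16-byte layout is an assumption about struct pblk_sec_meta rather than something shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct pblk_sec_meta (assumed layout: 16 bytes). */
struct ex_sec_meta {
	uint64_t reserved;
	uint64_t lba;
};

/* Return the metadata entry of sector `index` in a buffer whose entries
 * are oob_meta_size bytes apart. The pointer is advanced by the drive's
 * metadata size, not by sizeof(struct ex_sec_meta). */
static struct ex_sec_meta *ex_get_meta(void *meta, size_t oob_meta_size,
				       size_t index)
{
	/* An entry must be able to hold at least the 16-byte struct. */
	assert(oob_meta_size >= sizeof(struct ex_sec_meta));
	return (struct ex_sec_meta *)((unsigned char *)meta +
				      oob_meta_size * index);
}
```

With a 32-byte stride, entry 3 starts at byte offset 96, whereas plain meta_list[3] indexing would land at offset 48, in the middle of entry 1.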
[PATCH] nvme: Fix PCIe surprise removal scenario
This patch fixes a kernel OOPS in the surprise removal scenario for PCIe connected NVMe drives. After the latest changes, when the PCIe device is not present, nvme_dev_remove_admin() calls blk_cleanup_queue() on the admin queue, which frees the hctx for that queue. A moment later, on the same path, nvme_kill_queues() calls blk_mq_unquiesce_queue() on the admin queue and tries to access its hctx, which leads to the following OOPS: Oops: [#1] SMP PTI RIP: 0010:sbitmap_any_bit_set+0xb/0x40 Call Trace: blk_mq_run_hw_queue+0xd5/0x150 blk_mq_run_hw_queues+0x3a/0x50 nvme_kill_queues+0x26/0x50 nvme_remove_namespaces+0xb2/0xc0 nvme_remove+0x60/0x140 pci_device_remove+0x3b/0xb0 Fixes: cb4bfda62afa2 ("nvme-pci: fix hot removal during error handling") Signed-off-by: Igor Konopko --- drivers/nvme/host/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 65c42448e904..5aff95389694 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3601,7 +3601,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl) down_read(&ctrl->namespaces_rwsem); /* Forcibly unquiesce queues to avoid blocking dispatch */ - if (ctrl->admin_q) + if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q)) blk_mq_unquiesce_queue(ctrl->admin_q); list_for_each_entry(ns, &ctrl->namespaces, list) -- 2.14.5
[PATCH v3 5/5] lightnvm: pblk: Support for packed metadata
In the current pblk implementation, the l2p mapping for lines that are not closed is always stored only in OOB metadata and recovered from it. Such a solution does not provide data integrity when drives do not have such OOB metadata space. The goal of this patch is to add support for so-called packed metadata, which stores the l2p mapping for open lines in the last sector of every write unit. After this set of changes, drives with OOB >0 and <16B are still not supported. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-core.c | 53 +--- drivers/lightnvm/pblk-init.c | 37 +--- drivers/lightnvm/pblk-rb.c | 3 +++ drivers/lightnvm/pblk-read.c | 6 + drivers/lightnvm/pblk-recovery.c | 5 ++-- drivers/lightnvm/pblk-sysfs.c| 7 ++ drivers/lightnvm/pblk-write.c| 14 --- drivers/lightnvm/pblk.h | 10 +++- 8 files changed, 121 insertions(+), 14 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 2ebd3b079a96..615817bf97e3 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk) { unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb); - if (secs_avail >= pblk->min_write_pgs) + if (secs_avail >= pblk->min_write_pgs_data) pblk_write_kick(pblk); } @@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line) struct pblk_line_meta *lm = &pblk->lm; struct pblk_line_mgmt *l_mg = &pblk->l_mg; struct list_head *move_list = NULL; - int vsc = le32_to_cpu(*line->vsc); + int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data) + * (pblk->min_write_pgs - pblk->min_write_pgs_data); + int vsc = le32_to_cpu(*line->vsc) + packed_meta; lockdep_assert_held(&line->lock); @@ -620,12 +622,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data, } int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail, - unsigned long secs_to_flush) + unsigned long secs_to_flush, bool skip_meta) { int max = pblk->sec_per_write; int min = pblk->min_write_pgs; int 
secs_to_sync = 0; + if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs) + min = max = pblk->min_write_pgs_data; + if (secs_avail >= max) secs_to_sync = max; else if (secs_avail >= min) @@ -852,7 +857,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line, next_rq: memset(&rqd, 0, sizeof(struct nvm_rq)); - rq_ppas = pblk_calc_secs(pblk, left_ppas, 0); + rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false); rq_len = rq_ppas * geo->csecs; bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len, @@ -2161,3 +2166,43 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas, } spin_unlock(&pblk->trans_lock); } + +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd) +{ + void *meta_list = rqd->meta_list; + void *page; + int i = 0; + + if (pblk_is_oob_meta_supported(pblk)) + return; + + /* We need to zero out metadata corresponding to packed meta page */ + pblk_get_meta(pblk, meta_list, rqd->nr_ppas - 1)->lba = + cpu_to_le64(ADDR_EMPTY); + + page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page); + /* We need to fill last page of request (packed metadata) +* with data from oob meta buffer. 
+*/ + for (; i < rqd->nr_ppas; i++) + memcpy(page + (i * sizeof(struct pblk_sec_meta)), + pblk_get_meta(pblk, meta_list, i), + sizeof(struct pblk_sec_meta)); +} + +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd) +{ + void *meta_list = rqd->meta_list; + void *page; + int i = 0; + + if (pblk_is_oob_meta_supported(pblk)) + return; + + page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page); + /* We need to fill oob meta buffer with data from packed metadata */ + for (; i < rqd->nr_ppas; i++) + memcpy(pblk_get_meta(pblk, meta_list, i), + page + (i * sizeof(struct pblk_sec_meta)), + sizeof(struct pblk_sec_meta)); +} diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index a728f861edd6..5536338eea76 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -403,6 +403,7 @@ static int pblk_core_init(struct pblk *pblk) pblk->nr_flush_rst = 0; pblk->min_write_pgs = geo-&g
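The two helpers above are mirror images: on write, per-sector metadata is copied from the strided metadata buffer into the last data page, and on read it is copied back. The user-space sketch below shows that round trip under stated assumptions; ex_sec_meta, ex_pack_meta() and ex_unpack_meta() are hypothetical names, the 16-byte entry layout is assumed, and plain byte buffers stand in for the bio page and the DMA metadata buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct pblk_sec_meta (assumed layout: 16 bytes). */
struct ex_sec_meta {
	uint64_t reserved;
	uint64_t lba;
};

/* Write path: copy nr entries from a buffer with `stride` bytes between
 * entries into a densely packed page (16 bytes per entry). */
static void ex_pack_meta(void *page, const void *meta_list, size_t stride,
			 size_t nr)
{
	size_t i;

	assert(stride >= sizeof(struct ex_sec_meta));
	for (i = 0; i < nr; i++)
		memcpy((unsigned char *)page + i * sizeof(struct ex_sec_meta),
		       (const unsigned char *)meta_list + i * stride,
		       sizeof(struct ex_sec_meta));
}

/* Read/recovery path: the inverse copy, from the packed page back into
 * the strided metadata buffer. */
static void ex_unpack_meta(void *meta_list, const void *page, size_t stride,
			   size_t nr)
{
	size_t i;

	assert(stride >= sizeof(struct ex_sec_meta));
	for (i = 0; i < nr; i++)
		memcpy((unsigned char *)meta_list + i * stride,
		       (const unsigned char *)page +
				i * sizeof(struct ex_sec_meta),
		       sizeof(struct ex_sec_meta));
}
```

Packing followed by unpacking with the same stride should reproduce every entry, which is the property recovery relies on when the drive offers no OOB area.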
[PATCH v3 4/5] lightnvm: Disable interleaved metadata
Currently pblk and lightnvm only check the size of the OOB metadata and do not care whether this metadata is located in a separate buffer or is interleaved with data in a single buffer. In reality only the first scenario is supported; the second mode will break pblk functionality during any IO operation. The goal of this patch is to block creation of pblk devices in case of interleaved metadata. Reviewed-by: Javier González Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-init.c | 6 ++ drivers/nvme/host/lightnvm.c | 1 + include/linux/lightnvm.h | 1 + 3 files changed, 8 insertions(+) diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index b67bca810eb7..a728f861edd6 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -1179,6 +1179,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk, return ERR_PTR(-EINVAL); } + if (geo->ext) { + pblk_err(pblk, "extended metadata not supported\n"); + kfree(pblk); + return ERR_PTR(-EINVAL); + } + spin_lock_init(&pblk->resubmit_lock); spin_lock_init(&pblk->trans_lock); spin_lock_init(&pblk->lock); diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index 55076912a673..049425ad8592 100644 --- a/drivers/nvme/host/lightnvm.c +++ b/drivers/nvme/host/lightnvm.c @@ -982,6 +982,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns) if (geo->version != NVM_OCSSD_SPEC_12) { geo->csecs = 1 << ns->lba_shift; geo->sos = ns->ms; + geo->ext = ns->ext; } if (nvm_alloc_dma_pool(ndev)) diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h index 8b4564c17656..fd7b519f3ad2 100644 --- a/include/linux/lightnvm.h +++ b/include/linux/lightnvm.h @@ -357,6 +357,7 @@ struct nvm_geo { u32 clba; /* sectors per chunk */ u16 csecs; /* sector size */ u16 sos; /* out-of-band area size */ + bool ext; /* metadata in extended data buffer */ /* device write constrains */ u32 ws_min; /* minimum write size */ -- 2.14.4
[PATCH v3 0/5] lightnvm: Flexible metadata
This series of patches extends the way pblk can store L2P sector metadata. After this set of changes any size of NVMe metadata of 16B or more is supported in pblk. There is also support for the case without NVMe metadata. Changes v2 --> v3: -Rebase on top of ocssd/for-4.21/core -get/set_meta_lba helpers were removed -dma reallocation was replaced with single allocation -oob metadata size was added to pblk structure -proper checks on pblk creation were added Changes v1 --> v2: -Revert sector meta size back to 16B for pblk -Dma pool for larger oob meta is handled in core instead of pblk -Pblk oob meta helpers use __le64 as input/output instead of u64 -Other minor fixes based on v1 patch review Igor Konopko (5): lightnvm: pblk: Move lba list to partial read context lightnvm: pblk: Helpers for OOB metadata lightnvm: Flexible DMA pool entry size lightnvm: Disable interleaved metadata lightnvm: pblk: Support for packed metadata drivers/lightnvm/core.c | 30 -- drivers/lightnvm/pblk-core.c | 66 ++-- drivers/lightnvm/pblk-init.c | 47 ++-- drivers/lightnvm/pblk-map.c | 21 - drivers/lightnvm/pblk-rb.c | 3 ++ drivers/lightnvm/pblk-read.c | 60 drivers/lightnvm/pblk-recovery.c | 22 -- drivers/lightnvm/pblk-sysfs.c| 7 + drivers/lightnvm/pblk-write.c| 14 ++--- drivers/lightnvm/pblk.h | 24 +-- drivers/nvme/host/lightnvm.c | 16 ++ include/linux/lightnvm.h | 4 ++- 12 files changed, 235 insertions(+), 79 deletions(-) -- 2.14.4
[PATCH v3 2/5] lightnvm: pblk: Helpers for OOB metadata
Currently pblk assumes that the size of OOB metadata on the drive is always equal to the size of the pblk_sec_meta struct. This commit adds helpers which will allow handling different sizes of OOB metadata on the drive in the future. Still, after this patch only OOB metadata equal to 16B is supported. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-core.c | 5 +++-- drivers/lightnvm/pblk-init.c | 6 ++ drivers/lightnvm/pblk-map.c | 21 +--- drivers/lightnvm/pblk-read.c | 42 +--- drivers/lightnvm/pblk-recovery.c | 13 +++-- drivers/lightnvm/pblk.h | 6 ++ 6 files changed, 62 insertions(+), 31 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 6581c35f51ee..9509d6dbed53 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -796,10 +796,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line, rqd.is_seq = 1; for (i = 0; i < lm->smeta_sec; i++, paddr++) { - struct pblk_sec_meta *meta_list = rqd.meta_list; + void *meta_list = rqd.meta_list; rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); - meta_list[i].lba = lba_list[paddr] = addr_empty; + pblk_get_meta(pblk, meta_list, i)->lba = lba_list[paddr] = + addr_empty; } ret = pblk_submit_io_sync_sem(pblk, &rqd); diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 0e37104de596..6e7a0c6c6655 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -409,6 +409,12 @@ static int pblk_core_init(struct pblk *pblk) queue_max_hw_sectors(dev->q) / (geo->csecs >> SECTOR_SHIFT)); pblk_set_sec_per_write(pblk, pblk->min_write_pgs); + pblk->oob_meta_size = geo->sos; + if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) { + pblk_err(pblk, "Unsupported metadata size\n"); + return -EINVAL; + } + pblk->pad_dist = kcalloc(pblk->min_write_pgs - 1, sizeof(atomic64_t), GFP_KERNEL); if (!pblk->pad_dist) diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c index 5a3c28cce8ab..0c6d962bad78 100644 --- 
a/drivers/lightnvm/pblk-map.c +++ b/drivers/lightnvm/pblk-map.c @@ -22,7 +22,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, struct ppa_addr *ppa_list, unsigned long *lun_bitmap, - struct pblk_sec_meta *meta_list, + void *meta_list, unsigned int valid_secs) { struct pblk_line *line = pblk_line_get_data(pblk); @@ -74,14 +74,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, kref_get(&line->ref); w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i); w_ctx->ppa = ppa_list[i]; - meta_list[i].lba = cpu_to_le64(w_ctx->lba); + pblk_get_meta(pblk, meta_list, i)->lba = + cpu_to_le64(w_ctx->lba); lba_list[paddr] = cpu_to_le64(w_ctx->lba); if (lba_list[paddr] != addr_empty) line->nr_valid_lbas++; else atomic64_inc(&pblk->pad_wa); } else { - lba_list[paddr] = meta_list[i].lba = addr_empty; + lba_list[paddr] = addr_empty; + pblk_get_meta(pblk, meta_list, i)->lba = + addr_empty; __pblk_map_invalidate(pblk, line, paddr); } } @@ -94,7 +97,8 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, unsigned long *lun_bitmap, unsigned int valid_secs, unsigned int off) { - struct pblk_sec_meta *meta_list = rqd->meta_list; + void *meta_list = rqd->meta_list; + void *meta_buffer; struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd); unsigned int map_secs; int min = pblk->min_write_pgs; @@ -103,9 +107,10 @@ int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, for (i = off; i < rqd->nr_ppas; i += min) { map_secs = (i + min > valid_secs) ? (valid_secs % min) : min; + meta_buffer = pblk_get_meta(pblk, meta_list, i); ret = pblk_map_page_data(pblk, sentry + i, &ppa_list[i], -
[PATCH v3 3/5] lightnvm: Flexible DMA pool entry size
Currently the whole of lightnvm and pblk uses a single DMA pool, for which the entry size is always equal to PAGE_SIZE. The PPA list always needs 8B*64, so there is only 56B*64 of space left for OOB meta. Since NVMe OOB meta can be bigger, such as 128B, this solution is not robust. This patch adds the possibility to support OOB meta above 56B by changing the DMA pool size based on the OOB meta size. It also allows pblk to use OOB metadata >=16B. Signed-off-by: Igor Konopko --- drivers/lightnvm/core.c | 30 -- drivers/lightnvm/pblk-core.c | 8 drivers/lightnvm/pblk-init.c | 2 +- drivers/lightnvm/pblk-recovery.c | 4 ++-- drivers/lightnvm/pblk.h | 6 +- drivers/nvme/host/lightnvm.c | 15 +-- include/linux/lightnvm.h | 3 ++- 7 files changed, 43 insertions(+), 25 deletions(-) diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c index 73ab3cf26868..c3650b141a30 100644 --- a/drivers/lightnvm/core.c +++ b/drivers/lightnvm/core.c @@ -1145,15 +1145,9 @@ int nvm_register(struct nvm_dev *dev) if (!dev->q || !dev->ops) return -EINVAL; - dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist"); - if (!dev->dma_pool) { - pr_err("nvm: could not create dma pool\n"); - return -ENOMEM; - } - ret = nvm_init(dev); if (ret) - goto err_init; + return ret; /* register device with a supported media manager */ down_write(&nvm_lock); @@ -1161,9 +1155,6 @@ int nvm_register(struct nvm_dev *dev) up_write(&nvm_lock); return 0; -err_init: - dev->ops->destroy_dma_pool(dev->dma_pool); - return ret; } EXPORT_SYMBOL(nvm_register); @@ -1187,6 +1178,25 @@ void nvm_unregister(struct nvm_dev *dev) } EXPORT_SYMBOL(nvm_unregister); +int nvm_alloc_dma_pool(struct nvm_dev *dev) +{ + int exp_pool_size; + + exp_pool_size = max_t(int, PAGE_SIZE, + (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos))); + exp_pool_size = round_up(exp_pool_size, PAGE_SIZE); + + dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", + exp_pool_size); + if (!dev->dma_pool) { + pr_err("nvm: could not create dma pool\n"); + return -ENOMEM; + } + + return 0; +} 
+EXPORT_SYMBOL(nvm_alloc_dma_pool); + static int __nvm_configure_create(struct nvm_ioctl_create *create) { struct nvm_dev *dev; diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 9509d6dbed53..2ebd3b079a96 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -250,8 +250,8 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd) if (rqd->nr_ppas == 1) return 0; - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size; - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size; + rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk); + rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk); return 0; } @@ -846,8 +846,8 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line, if (!meta_list) return -ENOMEM; - ppa_list = meta_list + pblk_dma_meta_size; - dma_ppa_list = dma_meta_list + pblk_dma_meta_size; + ppa_list = meta_list + pblk_dma_meta_size(pblk); + dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk); next_rq: memset(&rqd, 0, sizeof(struct nvm_rq)); diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 6e7a0c6c6655..b67bca810eb7 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -410,7 +410,7 @@ static int pblk_core_init(struct pblk *pblk) pblk_set_sec_per_write(pblk, pblk->min_write_pgs); pblk->oob_meta_size = geo->sos; - if (pblk->oob_meta_size != sizeof(struct pblk_sec_meta)) { + if (pblk->oob_meta_size < sizeof(struct pblk_sec_meta)) { pblk_err(pblk, "Unsupported metadata size\n"); return -EINVAL; } diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c index 902c54ab1318..5bb8a2a4f87b 100644 --- a/drivers/lightnvm/pblk-recovery.c +++ b/drivers/lightnvm/pblk-recovery.c @@ -475,8 +475,8 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line) if (!meta_list) return -ENOMEM; - ppa_list = (void *)(meta_list) + pblk_dma_meta_size; - dma_ppa_list = dma_meta_list + 
pblk_dma_meta_size; + ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk); + dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk); data = kcalloc(pblk->max_write_pgs
[PATCH v3 1/5] lightnvm: pblk: Move lba list to partial read context
Currently DMA allocated memory is reused on partial read for the lba_list_mem and lba_list_media arrays. In preparation for dynamic DMA pool sizes we need to move these arrays into the pblk_pr_ctx structure. Reviewed-by: Javier González Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-read.c | 20 +--- drivers/lightnvm/pblk.h | 2 ++ 2 files changed, 7 insertions(+), 15 deletions(-) diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c index 9fba614adeeb..19917d3c19b3 100644 --- a/drivers/lightnvm/pblk-read.c +++ b/drivers/lightnvm/pblk-read.c @@ -224,7 +224,6 @@ static void pblk_end_partial_read(struct nvm_rq *rqd) unsigned long *read_bitmap = pr_ctx->bitmap; int nr_secs = pr_ctx->orig_nr_secs; int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs); - __le64 *lba_list_mem, *lba_list_media; void *src_p, *dst_p; int hole, i; @@ -237,13 +236,9 @@ static void pblk_end_partial_read(struct nvm_rq *rqd) rqd->ppa_list[0] = ppa; } - /* Re-use allocated memory for intermediate lbas */ - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size); - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size); - for (i = 0; i < nr_secs; i++) { - lba_list_media[i] = meta_list[i].lba; - meta_list[i].lba = lba_list_mem[i]; + pr_ctx->lba_list_media[i] = meta_list[i].lba; + meta_list[i].lba = pr_ctx->lba_list_mem[i]; } /* Fill the holes in the original bio */ @@ -255,7 +250,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd) line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]); kref_put(&line->ref, pblk_line_put); - meta_list[hole].lba = lba_list_media[i]; + meta_list[hole].lba = pr_ctx->lba_list_media[i]; src_bv = new_bio->bi_io_vec[i++]; dst_bv = bio->bi_io_vec[bio_init_idx + hole]; @@ -295,13 +290,9 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd, struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd); struct pblk_pr_ctx *pr_ctx; struct bio *new_bio, *bio = r_ctx->private; - __le64 *lba_list_mem; int nr_secs = rqd->nr_ppas; int i; - /* 
Re-use allocated memory for intermediate lbas */ - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size); - new_bio = bio_alloc(GFP_KERNEL, nr_holes); if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes)) @@ -312,12 +303,12 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd, goto fail_free_pages; } - pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL); + pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL); if (!pr_ctx) goto fail_free_pages; for (i = 0; i < nr_secs; i++) - lba_list_mem[i] = meta_list[i].lba; + pr_ctx->lba_list_mem[i] = meta_list[i].lba; new_bio->bi_iter.bi_sector = 0; /* internal bio */ bio_set_op_attrs(new_bio, REQ_OP_READ, 0); @@ -325,7 +316,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd, rqd->bio = new_bio; rqd->nr_ppas = nr_holes; - pr_ctx->ppa_ptr = NULL; pr_ctx->orig_bio = bio; bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA); pr_ctx->bio_init_idx = bio_init_idx; diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h index e5b88a25d4d6..0e9d3960ac4c 100644 --- a/drivers/lightnvm/pblk.h +++ b/drivers/lightnvm/pblk.h @@ -132,6 +132,8 @@ struct pblk_pr_ctx { unsigned int bio_init_idx; void *ppa_ptr; dma_addr_t dma_ppa_list; + __le64 lba_list_mem[NVM_MAX_VLBA]; + __le64 lba_list_media[NVM_MAX_VLBA]; }; /* Pad context */ -- 2.14.4
[PATCH v2 5/5] lightnvm: pblk: Support for packed metadata
In the current pblk implementation, the L2P mapping for lines that are
not yet closed is stored only in OOB metadata and recovered from it.
This does not provide data integrity when the drive has no OOB metadata
space.

This patch adds support for so-called packed metadata, which stores the
L2P mapping for open lines in the last sector of every write unit.

Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-core.c     | 53 +---
 drivers/lightnvm/pblk-init.c     | 37 ++--
 drivers/lightnvm/pblk-rb.c       |  3 +++
 drivers/lightnvm/pblk-read.c     |  6 +
 drivers/lightnvm/pblk-recovery.c |  5 ++--
 drivers/lightnvm/pblk-sysfs.c    |  7 ++
 drivers/lightnvm/pblk-write.c    | 14 ---
 drivers/lightnvm/pblk.h          | 13 +-
 8 files changed, 125 insertions(+), 13 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index b1e104765868..245abf29620f 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk)
 {
 	unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);

-	if (secs_avail >= pblk->min_write_pgs)
+	if (secs_avail >= pblk->min_write_pgs_data)
 		pblk_write_kick(pblk);
 }

@@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
 	struct pblk_line_meta *lm = &pblk->lm;
 	struct pblk_line_mgmt *l_mg = &pblk->l_mg;
 	struct list_head *move_list = NULL;
-	int vsc = le32_to_cpu(*line->vsc);
+	int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
+			* (pblk->min_write_pgs - pblk->min_write_pgs_data);
+	int vsc = le32_to_cpu(*line->vsc) + packed_meta;

 	lockdep_assert_held(&line->lock);

@@ -620,12 +622,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 }

 int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
-		   unsigned long secs_to_flush)
+		   unsigned long secs_to_flush, bool skip_meta)
 {
 	int max = pblk->sec_per_write;
 	int min = pblk->min_write_pgs;
 	int secs_to_sync = 0;

+	if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs)
+		min = max = pblk->min_write_pgs_data;
+
 	if (secs_avail >= max)
 		secs_to_sync = max;
 	else if (secs_avail >= min)
@@ -852,7 +857,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 next_rq:
 	memset(&rqd, 0, sizeof(struct nvm_rq));

-	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
+	rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
 	rq_len = rq_ppas * geo->csecs;

 	bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@@ -2161,3 +2166,43 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
 	}
 	spin_unlock(&pblk->trans_lock);
 }
+
+void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+	void *meta_list = rqd->meta_list;
+	void *page;
+	int i = 0;
+
+	if (pblk_is_oob_meta_supported(pblk))
+		return;
+
+	/* We need to zero out metadata corresponding to packed meta page */
+	pblk_set_meta_lba(pblk, meta_list, rqd->nr_ppas - 1,
+			  cpu_to_le64(ADDR_EMPTY));
+
+	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+	/* We need to fill last page of request (packed metadata)
+	 * with data from oob meta buffer.
+	 */
+	for (; i < rqd->nr_ppas; i++)
+		memcpy(page + (i * sizeof(struct pblk_sec_meta)),
+		       pblk_get_meta(pblk, meta_list, i),
+		       sizeof(struct pblk_sec_meta));
+}
+
+void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
+{
+	void *meta_list = rqd->meta_list;
+	void *page;
+	int i = 0;
+
+	if (pblk_is_oob_meta_supported(pblk))
+		return;
+
+	page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
+	/* We need to fill oob meta buffer with data from packed metadata */
+	for (; i < rqd->nr_ppas; i++)
+		memcpy(pblk_get_meta(pblk, meta_list, i),
+		       page + (i * sizeof(struct pblk_sec_meta)),
+		       sizeof(struct pblk_sec_meta));
+}
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index ded0618f6cda..7e09717a93d4 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -406,12 +406,44 @@ static int pblk_core_init(struct pblk *pblk)
 	pblk->nr_flush_rst = 0;

 	pblk->min_write_pgs = geo->ws_opt;
+	pblk->min_write_pgs_data = pblk->min_write_pgs;
 	max_write_ppas = pblk->mi
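For readers who want to see the packing scheme in isolation, here is a minimal user-space sketch of the two copy loops in pblk_set_packed_meta() and pblk_get_packed_meta(). It is not the kernel code: `struct sec_meta`, `SECTOR_SIZE`, `set_packed_meta()` and `get_packed_meta()` are hypothetical stand-ins, and the sketch assumes a 4096-byte sector and a 16-byte metadata entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096	/* assumed sector size (geo->csecs in pblk) */

/* Hypothetical stand-in for struct pblk_sec_meta: 16 bytes per sector. */
struct sec_meta {
	uint64_t reserved;
	uint64_t lba;
};

/* Write path: copy per-sector metadata into the last sector of the write. */
static void set_packed_meta(uint8_t *last_sector, const struct sec_meta *meta,
			    size_t nr_secs)
{
	size_t i;

	/* All entries must fit into a single sector. */
	assert(nr_secs * sizeof(struct sec_meta) <= SECTOR_SIZE);

	for (i = 0; i < nr_secs; i++)
		memcpy(last_sector + i * sizeof(struct sec_meta), &meta[i],
		       sizeof(struct sec_meta));
}

/* Recovery path: rebuild the metadata buffer from the last sector. */
static void get_packed_meta(struct sec_meta *meta, const uint8_t *last_sector,
			    size_t nr_secs)
{
	size_t i;

	assert(nr_secs * sizeof(struct sec_meta) <= SECTOR_SIZE);

	for (i = 0; i < nr_secs; i++)
		memcpy(&meta[i], last_sector + i * sizeof(struct sec_meta),
		       sizeof(struct sec_meta));
}
```

The cost of this layout is one sector of user capacity per write unit, which is why the patch tracks min_write_pgs_data separately from min_write_pgs.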
[PATCH v2 2/5] lightnvm: pblk: Helpers for OOB metadata
Currently, pblk assumes that the size of the OOB metadata on the drive
is always equal to the size of struct pblk_sec_meta. This commit adds
helpers that make it possible to handle different OOB metadata sizes.

Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-core.c     |  5 +++--
 drivers/lightnvm/pblk-map.c      | 20 +++---
 drivers/lightnvm/pblk-read.c     | 45 +---
 drivers/lightnvm/pblk-recovery.c | 13 ++--
 drivers/lightnvm/pblk.h          | 22
 5 files changed, 73 insertions(+), 32 deletions(-)

diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 6944aac43b01..0f33055f40eb 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -796,10 +796,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
 	rqd.is_seq = 1;

 	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
-		struct pblk_sec_meta *meta_list = rqd.meta_list;
+		void *meta_list = rqd.meta_list;

 		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
-		meta_list[i].lba = lba_list[paddr] = addr_empty;
+		pblk_set_meta_lba(pblk, meta_list, i, addr_empty);
+		lba_list[paddr] = addr_empty;
 	}

 	ret = pblk_submit_io_sync_sem(pblk, &rqd);
diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
index 6dcbd44e3acb..4bae30129bc9 100644
--- a/drivers/lightnvm/pblk-map.c
+++ b/drivers/lightnvm/pblk-map.c
@@ -22,7 +22,7 @@
 static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 			      struct ppa_addr *ppa_list,
 			      unsigned long *lun_bitmap,
-			      struct pblk_sec_meta *meta_list,
+			      void *meta_list,
 			      unsigned int valid_secs)
 {
 	struct pblk_line *line = pblk_line_get_data(pblk);
@@ -68,14 +68,16 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
 			kref_get(&line->ref);
 			w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
 			w_ctx->ppa = ppa_list[i];
-			meta_list[i].lba = cpu_to_le64(w_ctx->lba);
+			pblk_set_meta_lba(pblk, meta_list, i,
+					  cpu_to_le64(w_ctx->lba));
 			lba_list[paddr] = cpu_to_le64(w_ctx->lba);
 			if (lba_list[paddr] != addr_empty)
 				line->nr_valid_lbas++;
 			else
 				atomic64_inc(&pblk->pad_wa);
 		} else {
-			lba_list[paddr] = meta_list[i].lba = addr_empty;
+			lba_list[paddr] = addr_empty;
+			pblk_set_meta_lba(pblk, meta_list, i, addr_empty);
 			__pblk_map_invalidate(pblk, line, paddr);
 		}
 	}
@@ -88,7 +90,8 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
 		 unsigned long *lun_bitmap, unsigned int valid_secs,
 		 unsigned int off)
 {
-	struct pblk_sec_meta *meta_list = rqd->meta_list;
+	void *meta_list = rqd->meta_list;
+	void *meta_buffer;
 	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
 	unsigned int map_secs;
 	int min = pblk->min_write_pgs;
@@ -96,8 +99,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,

 	for (i = off; i < rqd->nr_ppas; i += min) {
 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
+		meta_buffer = pblk_get_meta(pblk, meta_list, i);
 		if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
-					lun_bitmap, &meta_list[i], map_secs)) {
+					lun_bitmap, meta_buffer, map_secs)) {
 			bio_put(rqd->bio);
 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
 			pblk_pipeline_stop(pblk);
@@ -113,7 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
 	struct nvm_tgt_dev *dev = pblk->dev;
 	struct nvm_geo *geo = &dev->geo;
 	struct pblk_line_meta *lm = &pblk->lm;
-	struct pblk_sec_meta *meta_list = rqd->meta_list;
+	void *meta_list = rqd->meta_list;
+	void *meta_buffer;
 	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
 	struct pblk_line *e_line, *d_line;
 	unsigned int map_secs;
@@ -122,8 +127,9 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,

 	for (i = 0; i < rqd->nr_ppas; i += min) {
 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
+		meta_buffer = pblk_get_meta(pblk, m
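The point of switching meta_list from a typed array to `void *` plus helpers is that the distance between two entries becomes the drive's OOB size rather than sizeof(struct pblk_sec_meta). A minimal user-space sketch of that stride-based access follows. The names `meta_offset()`, `set_meta_lba()` and `get_meta_lba()` are hypothetical, the value is stored in host byte order rather than as `__le64`, and the sketch assumes the OOB area is at least as large as one metadata entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct pblk_sec_meta. */
struct sec_meta {
	uint64_t reserved;
	uint64_t lba;
};

/* Byte offset of entry `index` when each entry occupies oob_meta_size bytes. */
static size_t meta_offset(size_t oob_meta_size, size_t index)
{
	/* Assumption of this sketch: the OOB area can hold one full entry. */
	assert(oob_meta_size >= sizeof(struct sec_meta));
	return oob_meta_size * index;
}

static void set_meta_lba(uint8_t *meta_list, size_t oob_meta_size,
			 size_t index, uint64_t lba)
{
	memcpy(meta_list + meta_offset(oob_meta_size, index) +
	       offsetof(struct sec_meta, lba), &lba, sizeof(lba));
}

static uint64_t get_meta_lba(const uint8_t *meta_list, size_t oob_meta_size,
			     size_t index)
{
	uint64_t lba;

	memcpy(&lba, meta_list + meta_offset(oob_meta_size, index) +
	       offsetof(struct sec_meta, lba), sizeof(lba));
	return lba;
}
```

With a 16-byte OOB area this degenerates to plain array indexing; with a 128-byte area, entry 3 starts at byte 384 of the buffer.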
[PATCH v2 3/5] lightnvm: Flexible DMA pool entry size
Currently, lightnvm and pblk use a single DMA pool whose entry size is
always equal to PAGE_SIZE. The PPA list always needs 8 bytes * 64, so
only 56 bytes * 64 are left for OOB metadata. Since NVMe OOB metadata
can be larger, for example 128 bytes, this solution is not robust.

This patch adds support for OOB metadata larger than 56 bytes by sizing
the DMA pool based on the OOB metadata size.

Signed-off-by: Igor Konopko
---
 drivers/lightnvm/core.c          | 45 ++--
 drivers/lightnvm/pblk-core.c     |  8 +++
 drivers/lightnvm/pblk-recovery.c |  4 ++--
 drivers/lightnvm/pblk.h          | 10 -
 drivers/nvme/host/lightnvm.c     |  8 +--
 include/linux/lightnvm.h         |  4 +++-
 6 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index efb976a863d2..68f0812077d5 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -1145,11 +1145,9 @@ int nvm_register(struct nvm_dev *dev)
 	if (!dev->q || !dev->ops)
 		return -EINVAL;

-	dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
-	if (!dev->dma_pool) {
-		pr_err("nvm: could not create dma pool\n");
-		return -ENOMEM;
-	}
+	ret = nvm_realloc_dma_pool(dev);
+	if (ret)
+		return ret;

 	ret = nvm_init(dev);
 	if (ret)
@@ -1162,7 +1160,12 @@ int nvm_register(struct nvm_dev *dev)

 	return 0;
 err_init:
-	dev->ops->destroy_dma_pool(dev->dma_pool);
+	if (dev->dma_pool) {
+		dev->ops->destroy_dma_pool(dev->dma_pool);
+		dev->dma_pool = NULL;
+		dev->dma_pool_size = 0;
+	}
+
 	return ret;
 }
 EXPORT_SYMBOL(nvm_register);
@@ -1187,6 +1190,36 @@ void nvm_unregister(struct nvm_dev *dev)
 }
 EXPORT_SYMBOL(nvm_unregister);

+int nvm_realloc_dma_pool(struct nvm_dev *dev)
+{
+	int exp_pool_size;
+
+	exp_pool_size = max_t(int, PAGE_SIZE,
+			      (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
+	exp_pool_size = round_up(exp_pool_size, PAGE_SIZE);
+
+	if (dev->dma_pool_size >= exp_pool_size)
+		return 0;
+
+	if (dev->dma_pool) {
+		dev->ops->destroy_dma_pool(dev->dma_pool);
+		dev->dma_pool = NULL;
+		dev->dma_pool_size = 0;
+	}
+
+	dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist",
+						  exp_pool_size);
+	if (!dev->dma_pool) {
+		dev->dma_pool_size = 0;
+		pr_err("nvm: could not create dma pool\n");
+		return -ENOMEM;
+	}
+	dev->dma_pool_size = exp_pool_size;
+
+	return 0;
+}
+EXPORT_SYMBOL(nvm_realloc_dma_pool);
+
 static int __nvm_configure_create(struct nvm_ioctl_create *create)
 {
 	struct nvm_dev *dev;
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
index 0f33055f40eb..b1e104765868 100644
--- a/drivers/lightnvm/pblk-core.c
+++ b/drivers/lightnvm/pblk-core.c
@@ -250,8 +250,8 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd)
 	if (rqd->nr_ppas == 1)
 		return 0;

-	rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
-	rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
+	rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
+	rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk);

 	return 0;
 }
@@ -846,8 +846,8 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
 	if (!meta_list)
 		return -ENOMEM;

-	ppa_list = meta_list + pblk_dma_meta_size;
-	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+	ppa_list = meta_list + pblk_dma_meta_size(pblk);
+	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);

 next_rq:
 	memset(&rqd, 0, sizeof(struct nvm_rq));
diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
index 977b2ca5d849..b5c8a0ed9bb1 100644
--- a/drivers/lightnvm/pblk-recovery.c
+++ b/drivers/lightnvm/pblk-recovery.c
@@ -474,8 +474,8 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
 	if (!meta_list)
 		return -ENOMEM;

-	ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
-	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
+	ppa_list = (void *)(meta_list) + pblk_dma_meta_size(pblk);
+	dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);

 	data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
 	if (!data) {
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index d09c1b341e07..c03fa037d037 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -10
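To make the sizing rule in nvm_realloc_dma_pool() concrete, the arithmetic can be reproduced in user space. This is only a sketch under stated assumptions: `expected_pool_size()` is a hypothetical name, `POOL_PAGE_SIZE` assumes 4096-byte pages, and `NVM_MAX_VLBA` is taken to be 64 as in the commit message ("8 bytes * 64").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NVM_MAX_VLBA 64		/* max sectors per request, per the commit message */
#define POOL_PAGE_SIZE 4096	/* assumed PAGE_SIZE */

/*
 * One pool entry must hold a PPA list (8 bytes per sector) plus the OOB
 * metadata (sos bytes per sector) for NVM_MAX_VLBA sectors, and is never
 * smaller than one page.
 */
static size_t expected_pool_size(size_t sos)
{
	size_t need = NVM_MAX_VLBA * (sizeof(uint64_t) + sos);
	size_t size = need > POOL_PAGE_SIZE ? need : POOL_PAGE_SIZE;

	/* Round up to a multiple of the page size. */
	size = (size + POOL_PAGE_SIZE - 1) / POOL_PAGE_SIZE * POOL_PAGE_SIZE;
	assert(size % POOL_PAGE_SIZE == 0 && size >= need);
	return size;
}
```

Under these assumptions, OOB sizes up to 56 bytes still fit in a single page, while a 128-byte OOB area needs 64 * (8 + 128) = 8704 bytes, which rounds up to three pages (12288 bytes).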
[PATCH v2 0/5] lightnvm: Flexible metadata
This patch series extends the ways in which pblk can store L2P sector
metadata. After this set of changes, any size of NVMe metadata
(including 0) is supported.

The patches are rebased on top of block/for-next since there is no
ocssd/for-4.21 branch yet.

Changes v1 --> v2:
- Revert sector meta size back to 16 bytes for pblk
- DMA pool for larger OOB meta is handled in core instead of pblk
- pblk OOB meta helpers use __le64 as input/output instead of u64
- Other minor fixes based on v1 patch review

Igor Konopko (5):
  lightnvm: pblk: Move lba list to partial read context
  lightnvm: pblk: Helpers for OOB metadata
  lightnvm: Flexible DMA pool entry size
  lightnvm: Disable interleaved metadata
  lightnvm: pblk: Support for packed metadata

 drivers/lightnvm/core.c          | 45 +++
 drivers/lightnvm/pblk-core.c     | 66 ++--
 drivers/lightnvm/pblk-init.c     | 43 --
 drivers/lightnvm/pblk-map.c      | 20 +++-
 drivers/lightnvm/pblk-rb.c       |  3 ++
 drivers/lightnvm/pblk-read.c     | 63 +-
 drivers/lightnvm/pblk-recovery.c | 22 --
 drivers/lightnvm/pblk-sysfs.c    |  7 +
 drivers/lightnvm/pblk-write.c    | 14 ++---
 drivers/lightnvm/pblk.h          | 47 ++--
 drivers/nvme/host/lightnvm.c     |  9 --
 include/linux/lightnvm.h         |  5 ++-
 12 files changed, 272 insertions(+), 72 deletions(-)

--
2.14.4
[PATCH v2 4/5] lightnvm: Disable interleaved metadata
Currently, pblk and lightnvm only check the size of the OOB metadata and
do not care whether this metadata is located in a separate buffer or is
interleaved with data in a single buffer.

In reality, only the first scenario is supported; the second mode breaks
pblk functionality during any IO operation.

This patch blocks the creation of pblk devices when the metadata is
interleaved.

Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-init.c | 6 ++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h     | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 13822594647c..ded0618f6cda 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1154,6 +1154,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 		return ERR_PTR(-EINVAL);
 	}

+	if (geo->ext) {
+		pblk_err(pblk, "extended metadata not supported\n");
+		kfree(pblk);
+		return ERR_PTR(-EINVAL);
+	}
+
 	spin_lock_init(&pblk->resubmit_lock);
 	spin_lock_init(&pblk->trans_lock);
 	spin_lock_init(&pblk->lock);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index d1e47a93bcfd..b71c730a6e32 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -983,6 +983,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)

 	geo->csecs = 1 << ns->lba_shift;
 	geo->sos = ns->ms;
+	geo->ext = ns->ext;

 	if (nvm_realloc_dma_pool(ndev))
 		nvm_unregister(ndev);
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 9d3b7c627cac..4870022ebff1 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -357,6 +357,7 @@ struct nvm_geo {
 	u32	clba;		/* sectors per chunk */
 	u16	csecs;		/* sector size */
 	u16	sos;		/* out-of-band area size */
+	bool	ext;		/* metadata in extended data buffer */

 	/* device write constrains */
 	u32	ws_min;		/* minimum write size */
--
2.14.4
[PATCH v2 1/5] lightnvm: pblk: Move lba list to partial read context
Currently, DMA-allocated memory is reused on partial read for the
lba_list_mem and lba_list_media arrays. In preparation for dynamic DMA
pool sizes, we need to move these arrays into the pblk_pr_ctx structure.

Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-read.c | 20 +---
 drivers/lightnvm/pblk.h      |  2 ++
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 9fba614adeeb..19917d3c19b3 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -224,7 +224,6 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 	unsigned long *read_bitmap = pr_ctx->bitmap;
 	int nr_secs = pr_ctx->orig_nr_secs;
 	int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-	__le64 *lba_list_mem, *lba_list_media;
 	void *src_p, *dst_p;
 	int hole, i;

@@ -237,13 +236,9 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 		rqd->ppa_list[0] = ppa;
 	}

-	/* Re-use allocated memory for intermediate lbas */
-	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-	lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
-
 	for (i = 0; i < nr_secs; i++) {
-		lba_list_media[i] = meta_list[i].lba;
-		meta_list[i].lba = lba_list_mem[i];
+		pr_ctx->lba_list_media[i] = meta_list[i].lba;
+		meta_list[i].lba = pr_ctx->lba_list_mem[i];
 	}

 	/* Fill the holes in the original bio */
@@ -255,7 +250,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 		line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
 		kref_put(&line->ref, pblk_line_put);

-		meta_list[hole].lba = lba_list_media[i];
+		meta_list[hole].lba = pr_ctx->lba_list_media[i];

 		src_bv = new_bio->bi_io_vec[i++];
 		dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@@ -295,13 +290,9 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
 	struct pblk_pr_ctx *pr_ctx;
 	struct bio *new_bio, *bio = r_ctx->private;
-	__le64 *lba_list_mem;
 	int nr_secs = rqd->nr_ppas;
 	int i;

-	/* Re-use allocated memory for intermediate lbas */
-	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-
 	new_bio = bio_alloc(GFP_KERNEL, nr_holes);

 	if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
@@ -312,12 +303,12 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 		goto fail_free_pages;
 	}

-	pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
+	pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
 	if (!pr_ctx)
 		goto fail_free_pages;

 	for (i = 0; i < nr_secs; i++)
-		lba_list_mem[i] = meta_list[i].lba;
+		pr_ctx->lba_list_mem[i] = meta_list[i].lba;

 	new_bio->bi_iter.bi_sector = 0; /* internal bio */
 	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@@ -325,7 +316,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	rqd->bio = new_bio;
 	rqd->nr_ppas = nr_holes;

-	pr_ctx->ppa_ptr = NULL;
 	pr_ctx->orig_bio = bio;
 	bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
 	pr_ctx->bio_init_idx = bio_init_idx;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 02bb2e98f8a9..2aca840c7838 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -132,6 +132,8 @@ struct pblk_pr_ctx {
 	unsigned int bio_init_idx;
 	void *ppa_ptr;
 	dma_addr_t dma_ppa_list;
+	__le64 lba_list_mem[NVM_MAX_VLBA];
+	__le64 lba_list_media[NVM_MAX_VLBA];
 };

 /* Pad context */
--
2.14.4
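The change is easier to follow outside the kernel: the two intermediate LBA lists now live in the per-request context instead of in spare space behind the PPA list, so they no longer depend on how large a DMA pool entry is. A minimal user-space sketch is shown below. `struct pr_ctx`, `setup_partial_read()` and `end_partial_read()` are simplified, hypothetical stand-ins, and the hole-filling step of the real completion path is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_VLBA 64	/* stand-in for NVM_MAX_VLBA */

/* Hypothetical stand-in for struct pblk_sec_meta. */
struct sec_meta {
	uint64_t reserved;
	uint64_t lba;
};

/* Simplified partial-read context carrying its own LBA lists. */
struct pr_ctx {
	uint64_t lba_list_mem[MAX_VLBA];	/* LBAs known before the media read */
	uint64_t lba_list_media[MAX_VLBA];	/* LBAs reported by the media */
};

/* Setup: remember the LBAs currently in the metadata buffer. */
static struct pr_ctx *setup_partial_read(const struct sec_meta *meta_list,
					 size_t nr_secs)
{
	struct pr_ctx *ctx;
	size_t i;

	assert(nr_secs <= MAX_VLBA);

	ctx = calloc(1, sizeof(*ctx));	/* zeroed, like kzalloc() in the patch */
	if (!ctx)
		return NULL;

	for (i = 0; i < nr_secs; i++)
		ctx->lba_list_mem[i] = meta_list[i].lba;

	return ctx;
}

/* Completion: save what the media returned, then restore the saved LBAs. */
static void end_partial_read(struct pr_ctx *ctx, struct sec_meta *meta_list,
			     size_t nr_secs)
{
	size_t i;

	assert(nr_secs <= MAX_VLBA);

	for (i = 0; i < nr_secs; i++) {
		ctx->lba_list_media[i] = meta_list[i].lba;
		meta_list[i].lba = ctx->lba_list_mem[i];
	}
}
```

The trade-off is a larger context allocation (two arrays of MAX_VLBA 64-bit entries) in exchange for not making assumptions about the DMA buffer layout.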
Re: [PATCH 3/5] lightnvm: Flexible DMA pool entry size
On 09.10.2018 14:36, Javier Gonzalez wrote: On 9 Oct 2018, at 21.10, Hans Holmberg wrote: On Tue, Oct 9, 2018 at 12:03 PM Igor Konopko wrote: On 09.10.2018 11:16, Hans Holmberg wrote: On Fri, Oct 5, 2018 at 3:38 PM Igor Konopko wrote: Currently whole lightnvm and pblk uses single DMA pool, for which entry size is always equal to PAGE_SIZE. PPA list always needs 8b*64, so there is only 56b*64 space for OOB meta. Since NVMe OOB meta can be bigger, such as 128b, this solution is not robustness. This patch add the possiblity to support OOB meta above 56b by creating separate DMA pool for PBLK with entry size which is big enough to store both PPA list and such a OOB metadata. Signed-off-by: Igor Konopko --- drivers/lightnvm/core.c | 33 +++- drivers/lightnvm/pblk-core.c | 19 +- drivers/lightnvm/pblk-init.c | 11 +++ drivers/lightnvm/pblk-read.c | 3 ++- drivers/lightnvm/pblk-recovery.c | 9 + drivers/lightnvm/pblk.h | 11 ++- drivers/nvme/host/lightnvm.c | 6 -- include/linux/lightnvm.h | 8 +--- 8 files changed, 71 insertions(+), 29 deletions(-) diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c index efb976a863d2..48db7a096257 100644 --- a/drivers/lightnvm/core.c +++ b/drivers/lightnvm/core.c @@ -641,20 +641,33 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt) } EXPORT_SYMBOL(nvm_unregister_tgt_type); -void *nvm_dev_dma_alloc(struct nvm_dev *dev, gfp_t mem_flags, - dma_addr_t *dma_handler) +void *nvm_dev_dma_alloc(struct nvm_dev *dev, void *pool, + gfp_t mem_flags, dma_addr_t *dma_handler) { - return dev->ops->dev_dma_alloc(dev, dev->dma_pool, mem_flags, - dma_handler); + return dev->ops->dev_dma_alloc(dev, pool ?: dev->dma_pool, + mem_flags, dma_handler); } EXPORT_SYMBOL(nvm_dev_dma_alloc); -void nvm_dev_dma_free(struct nvm_dev *dev, void *addr, dma_addr_t dma_handler) +void nvm_dev_dma_free(struct nvm_dev *dev, void *pool, + void *addr, dma_addr_t dma_handler) { - dev->ops->dev_dma_free(dev->dma_pool, addr, dma_handler); + dev->ops->dev_dma_free(pool ?: 
dev->dma_pool, addr, dma_handler); } EXPORT_SYMBOL(nvm_dev_dma_free); +void *nvm_dev_dma_create(struct nvm_dev *dev, int size, char *name) +{ + return dev->ops->create_dma_pool(dev, name, size); +} +EXPORT_SYMBOL(nvm_dev_dma_create); + +void nvm_dev_dma_destroy(struct nvm_dev *dev, void *pool) +{ + dev->ops->destroy_dma_pool(pool); +} +EXPORT_SYMBOL(nvm_dev_dma_destroy); + static struct nvm_dev *nvm_find_nvm_dev(const char *name) { struct nvm_dev *dev; @@ -682,7 +695,8 @@ static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd, } rqd->nr_ppas = nr_ppas; - rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, >dma_ppa_list); + rqd->ppa_list = nvm_dev_dma_alloc(dev, NULL, GFP_KERNEL, + >dma_ppa_list); if (!rqd->ppa_list) { pr_err("nvm: failed to allocate dma memory\n"); return -ENOMEM; @@ -708,7 +722,8 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, if (!rqd->ppa_list) return; - nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list); + nvm_dev_dma_free(tgt_dev->parent, NULL, rqd->ppa_list, + rqd->dma_ppa_list); } static int nvm_set_flags(struct nvm_geo *geo, struct nvm_rq *rqd) @@ -1145,7 +1160,7 @@ int nvm_register(struct nvm_dev *dev) if (!dev->q || !dev->ops) return -EINVAL; - dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist"); + dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", PAGE_SIZE); if (!dev->dma_pool) { pr_err("nvm: could not create dma pool\n"); return -ENOMEM; Why hack the nvm_dev_ interfaces when you are not using the dev pool anyway? Wouldn't it be more straightforward to use dma_pool_* instead? In order to call dma_pool_create() I need NVMe device structure, which in my understanding is not public, so this is why I decided to reuse plumbing which was available in nvm_dev_* interfaces. Hmm, yes, I see now. If there is some easy way to call dma_pool_create() from pblk module and I'm missing that - let me know. I can rewrite this part, if there is some better way to do so. 
Create and destroy needs to go through dev->ops, but once you have allocated the pool, there is no need for going through t
Re: [PATCH 3/5] lightnvm: Flexible DMA pool entry size
On 09.10.2018 11:16, Hans Holmberg wrote: On Fri, Oct 5, 2018 at 3:38 PM Igor Konopko wrote: Currently whole lightnvm and pblk uses single DMA pool, for which entry size is always equal to PAGE_SIZE. PPA list always needs 8b*64, so there is only 56b*64 space for OOB meta. Since NVMe OOB meta can be bigger, such as 128b, this solution is not robustness. This patch add the possiblity to support OOB meta above 56b by creating separate DMA pool for PBLK with entry size which is big enough to store both PPA list and such a OOB metadata. Signed-off-by: Igor Konopko --- drivers/lightnvm/core.c | 33 +++- drivers/lightnvm/pblk-core.c | 19 +- drivers/lightnvm/pblk-init.c | 11 +++ drivers/lightnvm/pblk-read.c | 3 ++- drivers/lightnvm/pblk-recovery.c | 9 + drivers/lightnvm/pblk.h | 11 ++- drivers/nvme/host/lightnvm.c | 6 -- include/linux/lightnvm.h | 8 +--- 8 files changed, 71 insertions(+), 29 deletions(-) diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c index efb976a863d2..48db7a096257 100644 --- a/drivers/lightnvm/core.c +++ b/drivers/lightnvm/core.c @@ -641,20 +641,33 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt) } EXPORT_SYMBOL(nvm_unregister_tgt_type); -void *nvm_dev_dma_alloc(struct nvm_dev *dev, gfp_t mem_flags, - dma_addr_t *dma_handler) +void *nvm_dev_dma_alloc(struct nvm_dev *dev, void *pool, + gfp_t mem_flags, dma_addr_t *dma_handler) { - return dev->ops->dev_dma_alloc(dev, dev->dma_pool, mem_flags, - dma_handler); + return dev->ops->dev_dma_alloc(dev, pool ?: dev->dma_pool, + mem_flags, dma_handler); } EXPORT_SYMBOL(nvm_dev_dma_alloc); -void nvm_dev_dma_free(struct nvm_dev *dev, void *addr, dma_addr_t dma_handler) +void nvm_dev_dma_free(struct nvm_dev *dev, void *pool, + void *addr, dma_addr_t dma_handler) { - dev->ops->dev_dma_free(dev->dma_pool, addr, dma_handler); + dev->ops->dev_dma_free(pool ?: dev->dma_pool, addr, dma_handler); } EXPORT_SYMBOL(nvm_dev_dma_free); +void *nvm_dev_dma_create(struct nvm_dev *dev, int size, char *name) +{ 
+ return dev->ops->create_dma_pool(dev, name, size); +} +EXPORT_SYMBOL(nvm_dev_dma_create); + +void nvm_dev_dma_destroy(struct nvm_dev *dev, void *pool) +{ + dev->ops->destroy_dma_pool(pool); +} +EXPORT_SYMBOL(nvm_dev_dma_destroy); + static struct nvm_dev *nvm_find_nvm_dev(const char *name) { struct nvm_dev *dev; @@ -682,7 +695,8 @@ static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd, } rqd->nr_ppas = nr_ppas; - rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, >dma_ppa_list); + rqd->ppa_list = nvm_dev_dma_alloc(dev, NULL, GFP_KERNEL, + >dma_ppa_list); if (!rqd->ppa_list) { pr_err("nvm: failed to allocate dma memory\n"); return -ENOMEM; @@ -708,7 +722,8 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, if (!rqd->ppa_list) return; - nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list); + nvm_dev_dma_free(tgt_dev->parent, NULL, rqd->ppa_list, + rqd->dma_ppa_list); } static int nvm_set_flags(struct nvm_geo *geo, struct nvm_rq *rqd) @@ -1145,7 +1160,7 @@ int nvm_register(struct nvm_dev *dev) if (!dev->q || !dev->ops) return -EINVAL; - dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist"); + dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", PAGE_SIZE); if (!dev->dma_pool) { pr_err("nvm: could not create dma pool\n"); return -ENOMEM; Why hack the nvm_dev_ interfaces when you are not using the dev pool anyway? Wouldn't it be more straightforward to use dma_pool_* instead? In order to call dma_pool_create() I need NVMe device structure, which in my understanding is not public, so this is why I decided to reuse plumbing which was available in nvm_dev_* interfaces. If there is some easy way to call dma_pool_create() from pblk module and I'm missing that - let me know. I can rewrite this part, if there is some better way to do so. 
diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 7cb39d84c833..131972b13e27 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -242,16 +242,16 @@ int pblk_alloc_rqd_meta(str
Re: [PATCH 2/5] lightnvm: pblk: Helpers for OOB metadata
On 09.10.2018 11:02, Hans Holmberg wrote: Hi Igor! One important thing: this patch breaks the on-disk-storage format so that needs to be handled(see my comment on this) and some additional nitpicks below. Thanks, Hans On Fri, Oct 5, 2018 at 3:38 PM Igor Konopko wrote: Currently pblk assumes that size of OOB metadata on drive is always equal to size of pblk_sec_meta struct. This commit add helpers which will allow to handle different sizes of OOB metadata on drive. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-core.c | 6 ++--- drivers/lightnvm/pblk-map.c | 21 ++-- drivers/lightnvm/pblk-read.c | 41 +++- drivers/lightnvm/pblk-recovery.c | 14 ++- drivers/lightnvm/pblk.h | 37 +++- 5 files changed, 86 insertions(+), 33 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 6944aac43b01..7cb39d84c833 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -743,7 +743,6 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line) rqd.opcode = NVM_OP_PREAD; rqd.nr_ppas = lm->smeta_sec; rqd.is_seq = 1; - for (i = 0; i < lm->smeta_sec; i++, paddr++) rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); @@ -796,10 +795,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line, rqd.is_seq = 1; for (i = 0; i < lm->smeta_sec; i++, paddr++) { - struct pblk_sec_meta *meta_list = rqd.meta_list; + void *meta_list = rqd.meta_list; rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); - meta_list[i].lba = lba_list[paddr] = addr_empty; + pblk_set_meta_lba(pblk, meta_list, i, ADDR_EMPTY); + lba_list[paddr] = addr_empty; } ret = pblk_submit_io_sync_sem(pblk, ); diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c index 6dcbd44e3acb..4c7a9909308e 100644 --- a/drivers/lightnvm/pblk-map.c +++ b/drivers/lightnvm/pblk-map.c @@ -22,7 +22,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, struct ppa_addr *ppa_list, unsigned long *lun_bitmap, - 
struct pblk_sec_meta *meta_list, + void *meta_list, unsigned int valid_secs) { struct pblk_line *line = pblk_line_get_data(pblk); @@ -68,14 +68,15 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, kref_get(>ref); w_ctx = pblk_rb_w_ctx(>rwb, sentry + i); w_ctx->ppa = ppa_list[i]; - meta_list[i].lba = cpu_to_le64(w_ctx->lba); + pblk_set_meta_lba(pblk, meta_list, i, w_ctx->lba); lba_list[paddr] = cpu_to_le64(w_ctx->lba); if (lba_list[paddr] != addr_empty) line->nr_valid_lbas++; else atomic64_inc(>pad_wa); } else { - lba_list[paddr] = meta_list[i].lba = addr_empty; + lba_list[paddr] = addr_empty; + pblk_set_meta_lba(pblk, meta_list, i, ADDR_EMPTY); __pblk_map_invalidate(pblk, line, paddr); } } @@ -88,7 +89,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, unsigned long *lun_bitmap, unsigned int valid_secs, unsigned int off) { - struct pblk_sec_meta *meta_list = rqd->meta_list; + void *meta_list = rqd->meta_list; struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd); unsigned int map_secs; int min = pblk->min_write_pgs; @@ -97,7 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, for (i = off; i < rqd->nr_ppas; i += min) { map_secs = (i + min > valid_secs) ? (valid_secs % min) : min; if (pblk_map_page_data(pblk, sentry + i, _list[i], - lun_bitmap, _list[i], map_secs)) { + lun_bitmap, + pblk_get_meta_buffer(pblk, +meta_list, i), + map_secs)) { bio_put(rqd->bio); pblk_free_rqd(pblk, rqd, PBLK_WRITE); pblk_pipeline_stop(pblk); @@ -113,7 +117,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd, struct nvm_tgt_dev *dev = pblk->dev;
[PATCH 0/5] lightnvm: pblk: Flexible metadata
This series extends how pblk can store L2P sector metadata. After this set of changes, any size of NVMe metadata (including 0) is supported.

Igor Konopko (5):
  lightnvm: pblk: Do not reuse DMA memory on partial read
  lightnvm: pblk: Helpers for OOB metadata
  lightnvm: Flexible DMA pool entry size
  lightnvm: Disable interleaved metadata
  lightnvm: pblk: Support for packed metadata

 drivers/lightnvm/core.c          | 33 ++
 drivers/lightnvm/pblk-core.c     | 77 +---
 drivers/lightnvm/pblk-init.c     | 54 +-
 drivers/lightnvm/pblk-map.c      | 21 ++---
 drivers/lightnvm/pblk-rb.c       | 3 ++
 drivers/lightnvm/pblk-read.c     | 56 +++
 drivers/lightnvm/pblk-recovery.c | 28 +++-
 drivers/lightnvm/pblk-sysfs.c    | 7 +++
 drivers/lightnvm/pblk-write.c    | 14 --
 drivers/lightnvm/pblk.h          | 55 +--
 drivers/nvme/host/lightnvm.c     | 7 ++-
 include/linux/lightnvm.h         | 9 ++--
 12 files changed, 278 insertions(+), 86 deletions(-)

--
2.17.1
[PATCH 5/5] lightnvm: pblk: Support for packed metadata
In current pblk implementation, l2p mapping for not closed lines is always stored only in OOB metadata and recovered from it. Such a solution does not provide data integrity when drives does not have such a OOB metadata space. The goal of this patch is to add support for so called packed metadata, which store l2p mapping for open lines in last sector of every write unit. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-core.c | 52 +--- drivers/lightnvm/pblk-init.c | 37 +-- drivers/lightnvm/pblk-rb.c | 3 ++ drivers/lightnvm/pblk-recovery.c | 5 +-- drivers/lightnvm/pblk-sysfs.c| 7 + drivers/lightnvm/pblk-write.c| 14 ++--- drivers/lightnvm/pblk.h | 5 ++- 7 files changed, 110 insertions(+), 13 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 131972b13e27..e11a46c05067 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk) { unsigned int secs_avail = pblk_rb_read_count(>rwb); - if (secs_avail >= pblk->min_write_pgs) + if (secs_avail >= pblk->min_write_pgs_data) pblk_write_kick(pblk); } @@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line) struct pblk_line_meta *lm = >lm; struct pblk_line_mgmt *l_mg = >l_mg; struct list_head *move_list = NULL; - int vsc = le32_to_cpu(*line->vsc); + int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data) + * (pblk->min_write_pgs - pblk->min_write_pgs_data); + int vsc = le32_to_cpu(*line->vsc) + packed_meta; lockdep_assert_held(>lock); @@ -620,12 +622,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data, } int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail, - unsigned long secs_to_flush) + unsigned long secs_to_flush, bool skip_meta) { int max = pblk->sec_per_write; int min = pblk->min_write_pgs; int secs_to_sync = 0; + if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs) + min = max = 
pblk->min_write_pgs_data; + if (secs_avail >= max) secs_to_sync = max; else if (secs_avail >= min) @@ -851,7 +856,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line, next_rq: memset(, 0, sizeof(struct nvm_rq)); - rq_ppas = pblk_calc_secs(pblk, left_ppas, 0); + rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false); rq_len = rq_ppas * geo->csecs; bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len, @@ -2161,3 +2166,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas, } spin_unlock(>trans_lock); } + +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd) +{ + void *meta_list = rqd->meta_list; + void *page; + int i = 0; + + if (pblk_is_oob_meta_supported(pblk)) + return; + + /* We need to zero out metadata corresponding to packed meta page */ + pblk_set_meta_lba(pblk, meta_list, rqd->nr_ppas - 1, ADDR_EMPTY); + + page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page); + /* We need to fill last page of request (packed metadata) +* with data from oob meta buffer. 
+*/ + for (; i < rqd->nr_ppas; i++) + memcpy(page + (i * sizeof(struct pblk_sec_meta)), + pblk_get_meta_buffer(pblk, meta_list, i), + sizeof(struct pblk_sec_meta)); +} + +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd) +{ + void *meta_list = rqd->meta_list; + void *page; + int i = 0; + + if (pblk_is_oob_meta_supported(pblk)) + return; + + page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page); + /* We need to fill oob meta buffer with data from packe metadata */ + for (; i < rqd->nr_ppas; i++) + memcpy(pblk_get_meta_buffer(pblk, meta_list, i), + page + (i * sizeof(struct pblk_sec_meta)), + sizeof(struct pblk_sec_meta)); +} diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 1529aa37b30f..d2a63494def6 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -407,8 +407,40 @@ static int pblk_core_init(struct pblk *pblk) pblk->min_write_pgs = geo->ws_opt; max_write_ppas = pblk->min_write_pgs * geo->all_luns; pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA); + pblk->min_write_pgs_data = pblk->min_write_pgs; pblk_set_sec_per_writ
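The packing step described in this patch can be illustrated outside the kernel. The following is only a user-space sketch under stated assumptions: `sketch_sec_meta`, `SKETCH_SECTOR_SIZE` and both function names are hypothetical stand-ins, the write unit is modelled as one contiguous buffer rather than a bio, and endianness conversion is ignored. It shows the idea of copying every per-sector entry into the last sector of the write unit and reading it back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SECTOR_SIZE 4096u	/* assumed sector size, not taken from the patch */

/* Hypothetical stand-in for struct pblk_sec_meta (one LBA per sector). */
struct sketch_sec_meta {
	uint64_t lba;
};

/* Copy all per-sector entries into the last sector of the write unit. */
static void sketch_set_packed_meta(uint8_t *unit, unsigned int nr_secs,
				   const struct sketch_sec_meta *meta)
{
	uint8_t *last;
	unsigned int i;

	/* All entries must fit into a single sector. */
	assert(nr_secs > 0 && nr_secs * sizeof(*meta) <= SKETCH_SECTOR_SIZE);

	last = unit + (size_t)(nr_secs - 1) * SKETCH_SECTOR_SIZE;
	for (i = 0; i < nr_secs; i++)
		memcpy(last + i * sizeof(*meta), &meta[i], sizeof(*meta));
}

/* Rebuild the per-sector entries from the last sector of the write unit. */
static void sketch_get_packed_meta(const uint8_t *unit, unsigned int nr_secs,
				   struct sketch_sec_meta *meta)
{
	const uint8_t *last;
	unsigned int i;

	assert(nr_secs > 0 && nr_secs * sizeof(*meta) <= SKETCH_SECTOR_SIZE);

	last = unit + (size_t)(nr_secs - 1) * SKETCH_SECTOR_SIZE;
	for (i = 0; i < nr_secs; i++)
		memcpy(&meta[i], last + i * sizeof(*meta), sizeof(*meta));
}
```

In the patch itself the entry that corresponds to the packed metadata sector is first set to ADDR_EMPTY, since that sector carries no user data; a caller of this sketch would do the same before packing.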
[PATCH 4/5] lightnvm: Disable interleaved metadata
Currently pblk and lightnvm only check the size of the OOB metadata and do not care whether this metadata is located in a separate buffer or is interleaved with data in a single buffer.

In reality only the first scenario is supported; the second mode will break pblk functionality during any IO operation.

The goal of this patch is to block creation of pblk devices in case of interleaved metadata.

Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-init.c | 6 ++
 drivers/nvme/host/lightnvm.c | 1 +
 include/linux/lightnvm.h     | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index b794e279da31..1529aa37b30f 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -1152,6 +1152,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
 		return ERR_PTR(-EINVAL);
 	}
 
+	if (geo->ext) {
+		pblk_err(pblk, "extended metadata not supported\n");
+		kfree(pblk);
+		return ERR_PTR(-EINVAL);
+	}
+
 	spin_lock_init(&pblk->resubmit_lock);
 	spin_lock_init(&pblk->trans_lock);
 	spin_lock_init(&pblk->lock);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index e370793f52d5..7020e87bcee4 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -989,6 +989,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
 
 	geo->csecs = 1 << ns->lba_shift;
 	geo->sos = ns->ms;
+	geo->ext = ns->ext;
 }
 
 int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index c6c998716ee7..abd29f50f2a1 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -357,6 +357,7 @@ struct nvm_geo {
 	u32 clba; /* sectors per chunk */
 	u16 csecs; /* sector size */
 	u16 sos; /* out-of-band area size */
+	u16 ext; /* metadata in extended data buffer */
 
 	/* device write constrains */
 	u32 ws_min; /* minimum write size */
--
2.17.1
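The check added by this patch amounts to rejecting a geometry that reports interleaved metadata before the target is set up. A minimal user-space sketch of that decision, assuming a hypothetical `sketch_geo` structure that carries only the fields relevant here (it is not the real `struct nvm_geo`):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the device geometry used by the check. */
struct sketch_geo {
	uint16_t csecs;	/* sector size */
	uint16_t sos;	/* out-of-band area size */
	uint16_t ext;	/* non-zero: metadata interleaved with data */
};

/* Return 0 if the target can be created on this geometry, -EINVAL otherwise. */
static int sketch_check_geo(const struct sketch_geo *geo)
{
	assert(geo != NULL);

	/* Only metadata in a separate buffer is handled by the IO paths. */
	if (geo->ext)
		return -EINVAL;

	return 0;
}
```

A geometry with `ext == 0` passes regardless of `sos`, which matches the intent of the series: the size of the metadata is flexible, its placement is not.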
[PATCH 1/5] lightnvm: pblk: Do not reuse DMA memory on partial read
Currently DMA allocated memory is reused on partial read path for some internal pblk structs. In preparation for dynamic DMA pool sizes we need to change it to kmalloc allocated memory.

Signed-off-by: Igor Konopko
---
 drivers/lightnvm/pblk-read.c | 20 +---
 drivers/lightnvm/pblk.h      | 2 ++
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index d340dece1d00..08f6ebd4bc48 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -224,7 +224,6 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 	unsigned long *read_bitmap = pr_ctx->bitmap;
 	int nr_secs = pr_ctx->orig_nr_secs;
 	int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
-	__le64 *lba_list_mem, *lba_list_media;
 	void *src_p, *dst_p;
 	int hole, i;
 
@@ -237,13 +236,9 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 		rqd->ppa_list[0] = ppa;
 	}
 
-	/* Re-use allocated memory for intermediate lbas */
-	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-	lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
-
 	for (i = 0; i < nr_secs; i++) {
-		lba_list_media[i] = meta_list[i].lba;
-		meta_list[i].lba = lba_list_mem[i];
+		pr_ctx->lba_list_media[i] = meta_list[i].lba;
+		meta_list[i].lba = pr_ctx->lba_list_mem[i];
 	}
 
 	/* Fill the holes in the original bio */
@@ -255,7 +250,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
 		line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
 		kref_put(&line->ref, pblk_line_put);
 
-		meta_list[hole].lba = lba_list_media[i];
+		meta_list[hole].lba = pr_ctx->lba_list_media[i];
 
 		src_bv = new_bio->bi_io_vec[i++];
 		dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@@ -295,13 +290,9 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
 	struct pblk_pr_ctx *pr_ctx;
 	struct bio *new_bio, *bio = r_ctx->private;
-	__le64 *lba_list_mem;
 	int nr_secs = rqd->nr_ppas;
 	int i;
 
-	/* Re-use allocated memory for intermediate lbas */
-	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
-
 	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
 
 	if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
@@ -312,12 +303,12 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 		goto fail_free_pages;
 	}
 
-	pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
+	pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
 	if (!pr_ctx)
 		goto fail_free_pages;
 
 	for (i = 0; i < nr_secs; i++)
-		lba_list_mem[i] = meta_list[i].lba;
+		pr_ctx->lba_list_mem[i] = meta_list[i].lba;
 
 	new_bio->bi_iter.bi_sector = 0; /* internal bio */
 	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@@ -325,7 +316,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
 	rqd->bio = new_bio;
 	rqd->nr_ppas = nr_holes;
 
-	pr_ctx->ppa_ptr = NULL;
 	pr_ctx->orig_bio = bio;
 	bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
 	pr_ctx->bio_init_idx = bio_init_idx;
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 0f98ea24ee59..aea09879636f 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -132,6 +132,8 @@ struct pblk_pr_ctx {
 	unsigned int bio_init_idx;
 	void *ppa_ptr;
 	dma_addr_t dma_ppa_list;
+	__le64 lba_list_mem[NVM_MAX_VLBA];
+	__le64 lba_list_media[NVM_MAX_VLBA];
 };
 
 /* Pad context */
--
2.17.1
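The effect of moving the two LBA lists into the partial read context can be shown with a small user-space sketch. Everything here is an illustrative assumption: `sketch_pr_ctx`, `SKETCH_MAX_VLBA` and the function names are hypothetical, and plain `uint64_t` values stand in for `__le64`. The point is only that the saved LBAs now live in memory owned by the context instead of in spare space behind the DMA-allocated PPA list.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_VLBA 64	/* stands in for NVM_MAX_VLBA; value assumed here */

/* Hypothetical context that owns both intermediate LBA lists. */
struct sketch_pr_ctx {
	uint64_t lba_list_mem[SKETCH_MAX_VLBA];
	uint64_t lba_list_media[SKETCH_MAX_VLBA];
};

/* Before the hole read is submitted: remember the LBAs currently in meta. */
static void sketch_save_lbas(struct sketch_pr_ctx *ctx,
			     const uint64_t *meta_lba, unsigned int nr_secs)
{
	unsigned int i;

	assert(nr_secs <= SKETCH_MAX_VLBA);
	for (i = 0; i < nr_secs; i++)
		ctx->lba_list_mem[i] = meta_lba[i];
}

/* On completion: keep what the media returned, then restore the saved LBAs. */
static void sketch_restore_lbas(struct sketch_pr_ctx *ctx,
				uint64_t *meta_lba, unsigned int nr_secs)
{
	unsigned int i;

	assert(nr_secs <= SKETCH_MAX_VLBA);
	for (i = 0; i < nr_secs; i++) {
		ctx->lba_list_media[i] = meta_lba[i];
		meta_lba[i] = ctx->lba_list_mem[i];
	}
}
```

Because the lists are plain members of the context, their size no longer depends on how large each DMA pool entry is, which is what the following patches in the series change.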
[PATCH 2/5] lightnvm: pblk: Helpers for OOB metadata
Currently pblk assumes that size of OOB metadata on drive is always equal to size of pblk_sec_meta struct. This commit add helpers which will allow to handle different sizes of OOB metadata on drive. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-core.c | 6 ++--- drivers/lightnvm/pblk-map.c | 21 ++-- drivers/lightnvm/pblk-read.c | 41 +++- drivers/lightnvm/pblk-recovery.c | 14 ++- drivers/lightnvm/pblk.h | 37 +++- 5 files changed, 86 insertions(+), 33 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 6944aac43b01..7cb39d84c833 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -743,7 +743,6 @@ int pblk_line_smeta_read(struct pblk *pblk, struct pblk_line *line) rqd.opcode = NVM_OP_PREAD; rqd.nr_ppas = lm->smeta_sec; rqd.is_seq = 1; - for (i = 0; i < lm->smeta_sec; i++, paddr++) rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); @@ -796,10 +795,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line, rqd.is_seq = 1; for (i = 0; i < lm->smeta_sec; i++, paddr++) { - struct pblk_sec_meta *meta_list = rqd.meta_list; + void *meta_list = rqd.meta_list; rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); - meta_list[i].lba = lba_list[paddr] = addr_empty; + pblk_set_meta_lba(pblk, meta_list, i, ADDR_EMPTY); + lba_list[paddr] = addr_empty; } ret = pblk_submit_io_sync_sem(pblk, ); diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c index 6dcbd44e3acb..4c7a9909308e 100644 --- a/drivers/lightnvm/pblk-map.c +++ b/drivers/lightnvm/pblk-map.c @@ -22,7 +22,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, struct ppa_addr *ppa_list, unsigned long *lun_bitmap, - struct pblk_sec_meta *meta_list, + void *meta_list, unsigned int valid_secs) { struct pblk_line *line = pblk_line_get_data(pblk); @@ -68,14 +68,15 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, kref_get(>ref); w_ctx = pblk_rb_w_ctx(>rwb, sentry 
+ i); w_ctx->ppa = ppa_list[i]; - meta_list[i].lba = cpu_to_le64(w_ctx->lba); + pblk_set_meta_lba(pblk, meta_list, i, w_ctx->lba); lba_list[paddr] = cpu_to_le64(w_ctx->lba); if (lba_list[paddr] != addr_empty) line->nr_valid_lbas++; else atomic64_inc(>pad_wa); } else { - lba_list[paddr] = meta_list[i].lba = addr_empty; + lba_list[paddr] = addr_empty; + pblk_set_meta_lba(pblk, meta_list, i, ADDR_EMPTY); __pblk_map_invalidate(pblk, line, paddr); } } @@ -88,7 +89,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, unsigned long *lun_bitmap, unsigned int valid_secs, unsigned int off) { - struct pblk_sec_meta *meta_list = rqd->meta_list; + void *meta_list = rqd->meta_list; struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd); unsigned int map_secs; int min = pblk->min_write_pgs; @@ -97,7 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, for (i = off; i < rqd->nr_ppas; i += min) { map_secs = (i + min > valid_secs) ? (valid_secs % min) : min; if (pblk_map_page_data(pblk, sentry + i, _list[i], - lun_bitmap, _list[i], map_secs)) { + lun_bitmap, + pblk_get_meta_buffer(pblk, +meta_list, i), + map_secs)) { bio_put(rqd->bio); pblk_free_rqd(pblk, rqd, PBLK_WRITE); pblk_pipeline_stop(pblk); @@ -113,7 +117,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd, struct nvm_tgt_dev *dev = pblk->dev; struct nvm_geo *geo = >geo; struct pblk_line_meta *lm = >lm; - struct pblk_sec_meta *meta_list = rqd->meta_list; + void *meta_list = rqd->meta_list; struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd); struct pblk_line *e_line, *d_line; unsigned int map_secs; @
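The helper definitions themselves are not visible in the quoted part of this patch, so the following is only a user-space sketch of what index-by-stride accessors of this kind could look like. The names (`sketch_get_meta_buffer`, `sketch_set_meta_lba`, `sketch_get_meta_lba`) are hypothetical, and the lower bound on the stride is borrowed from the separate "Do not overwrite ppa list with meta list" fix rather than from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct pblk_sec_meta. */
struct sketch_sec_meta {
	uint64_t lba;
};

/* Per-entry stride: the drive's OOB size, but never less than one entry. */
static size_t sketch_meta_stride(size_t oob_meta_size)
{
	return oob_meta_size > sizeof(struct sketch_sec_meta) ?
	       oob_meta_size : sizeof(struct sketch_sec_meta);
}

/* Address of the index-th metadata entry inside an untyped buffer. */
static void *sketch_get_meta_buffer(void *meta_list, size_t oob_meta_size,
				    size_t index)
{
	assert(meta_list != NULL);
	return (uint8_t *)meta_list + sketch_meta_stride(oob_meta_size) * index;
}

/* Store an LBA in the index-th entry (byte order conversion omitted). */
static void sketch_set_meta_lba(void *meta_list, size_t oob_meta_size,
				size_t index, uint64_t lba)
{
	memcpy(sketch_get_meta_buffer(meta_list, oob_meta_size, index),
	       &lba, sizeof(lba));
}

/* Load the LBA of the index-th entry. */
static uint64_t sketch_get_meta_lba(void *meta_list, size_t oob_meta_size,
				    size_t index)
{
	uint64_t lba;

	memcpy(&lba, sketch_get_meta_buffer(meta_list, oob_meta_size, index),
	       sizeof(lba));
	return lba;
}
```

With a 16-byte OOB area, entry 2 starts at byte offset 32; with an 8-byte or zero-sized OOB area it starts at offset 16. This is why callers in the patch stop indexing `meta_list[i]` directly and go through helpers instead.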
Re: [PATCH v4] lightnvm: pblk: add asynchronous partial read
nvm_rq *rqd, + unsigned int bio_init_idx, + unsigned long *read_bitmap, + int nr_holes) +{ + struct pblk_sec_meta *meta_list = rqd->meta_list; + struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd); + struct pblk_pr_ctx *pr_ctx; + struct bio *new_bio, *bio = r_ctx->private; + __le64 *lba_list_mem; + int nr_secs = rqd->nr_ppas; + int i; + + /* Re-use allocated memory for intermediate lbas */ + lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size); + + new_bio = bio_alloc(GFP_KERNEL, nr_holes); + + if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes)) + goto fail_bio_put; + + if (nr_holes != new_bio->bi_vcnt) { + WARN_ONCE(1, "pblk: malformed bio\n"); + goto fail_free_pages; + } + + pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL); + if (!pr_ctx) + goto fail_free_pages; + + for (i = 0; i < nr_secs; i++) + lba_list_mem[i] = meta_list[i].lba; + + new_bio->bi_iter.bi_sector = 0; /* internal bio */ + bio_set_op_attrs(new_bio, REQ_OP_READ, 0); + + rqd->bio = new_bio; + rqd->nr_ppas = nr_holes; + rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM); + + pr_ctx->ppa_ptr = NULL; + pr_ctx->orig_bio = bio; + bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA); + pr_ctx->bio_init_idx = bio_init_idx; + pr_ctx->orig_nr_secs = nr_secs; + r_ctx->private = pr_ctx; + + if (unlikely(nr_holes == 1)) { + pr_ctx->ppa_ptr = rqd->ppa_list; + pr_ctx->dma_ppa_list = rqd->dma_ppa_list; + rqd->ppa_addr = rqd->ppa_list[0]; + } + return 0; + +fail_free_pages: pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt); -fail_add_pages: +fail_bio_put: + bio_put(new_bio); + + return -ENOMEM; +} + +static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd, + unsigned int bio_init_idx, + unsigned long *read_bitmap, int nr_secs) +{ + int nr_holes; + int ret; + + nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs); + + if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap, + nr_holes)) + return NVM_IO_ERR; + + rqd->end_io = pblk_end_partial_read; + + ret = 
pblk_submit_io(pblk, rqd); + if (ret) { + bio_put(rqd->bio); + pblk_err(pblk, "partial read IO submission failed\n"); + goto err; + } + + return NVM_IO_OK; + +err: pblk_err(pblk, "failed to perform partial read\n"); + + /* Free allocated pages in new bio */ + pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt); __pblk_end_io_read(pblk, rqd, false); return NVM_IO_ERR; } @@ -480,8 +530,15 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio) /* The read bio request could be partially filled by the write buffer, * but there are some holes that need to be read from the drive. */ - return pblk_partial_read(pblk, rqd, bio, bio_init_idx, read_bitmap); + ret = pblk_partial_read_bio(pblk, rqd, bio_init_idx, read_bitmap, + nr_secs); + if (ret) + goto fail_meta_free; + + return NVM_IO_OK; +fail_meta_free: + nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list); fail_rqd_free: pblk_free_rqd(pblk, rqd, PBLK_READ); return ret; diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h index 5c6904e..4760af7 100644 --- a/drivers/lightnvm/pblk.h +++ b/drivers/lightnvm/pblk.h @@ -119,6 +119,16 @@ struct pblk_g_ctx { u64 lba; }; +/* partial read context */ +struct pblk_pr_ctx { + struct bio *orig_bio; + DECLARE_BITMAP(bitmap, NVM_MAX_VLBA); + unsigned int orig_nr_secs; + unsigned int bio_init_idx; + void *ppa_ptr; + dma_addr_t dma_ppa_list; +}; + /* Pad context */ struct pblk_pad_rq { struct pblk *pblk; Hey Igor, May I add your reviewed-by before I pick up? Sure - now it looks fine. Reviewed-by: Igor Konopko
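The hole bookkeeping used by the partial read path (`nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs)`, then walking the clear bits) can be sketched in user space. This is an illustration under assumptions: a single `uint64_t` stands in for the read bitmap, `int` values stand in for sector payloads, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of sectors not served from the write buffer (clear bits). */
static unsigned int sketch_count_holes(uint64_t read_bitmap,
				       unsigned int nr_secs)
{
	unsigned int i, hits = 0;

	assert(nr_secs <= 64);
	for (i = 0; i < nr_secs; i++)
		if (read_bitmap & (UINT64_C(1) << i))
			hits++;

	return nr_secs - hits;
}

/*
 * Copy the k-th sector read from media into the k-th hole of the
 * original request. Returns the number of holes that were filled.
 */
static unsigned int sketch_fill_holes(uint64_t read_bitmap,
				      unsigned int nr_secs,
				      const int *from_media, int *dst)
{
	unsigned int hole, k = 0;

	assert(nr_secs <= 64);
	for (hole = 0; hole < nr_secs; hole++) {
		if (read_bitmap & (UINT64_C(1) << hole))
			continue;	/* already filled from the write buffer */
		dst[hole] = from_media[k++];
	}

	return k;
}
```

In the asynchronous version the second step runs in the completion callback, which is why the bitmap and the original request size have to be carried in the partial read context.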
Re: [PATCH v3] lightnvm: pblk: add asynchronous partial read
On 06.07.2018 05:18, Matias Bjørling wrote: On 07/06/2018 12:12 PM, Heiner Litz wrote: In the read path, partial reads are currently performed synchronously which affects performance for workloads that generate many partial reads. This patch adds an asynchronous partial read path as well as the required partial read ctx. Signed-off-by: Heiner Litz --- v3: rebase to head, incorporate 64-bit read bitmap --- drivers/lightnvm/pblk-read.c | 183 --- drivers/lightnvm/pblk.h | 10 +++ 2 files changed, 130 insertions(+), 63 deletions(-) diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c index 9c9362b..4a44076 100644 --- a/drivers/lightnvm/pblk-read.c +++ b/drivers/lightnvm/pblk-read.c @@ -231,74 +231,36 @@ static void pblk_end_io_read(struct nvm_rq *rqd) __pblk_end_io_read(pblk, rqd, true); } -static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd, - struct bio *orig_bio, unsigned int bio_init_idx, - unsigned long *read_bitmap) +static void pblk_end_partial_read(struct nvm_rq *rqd) { - struct pblk_sec_meta *meta_list = rqd->meta_list; - struct bio *new_bio; + struct pblk *pblk = rqd->private; + struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd); + struct pblk_pr_ctx *pr_ctx = r_ctx->private; + struct bio *new_bio = rqd->bio; + struct bio *bio = pr_ctx->orig_bio; struct bio_vec src_bv, dst_bv; - void *ppa_ptr = NULL; - void *src_p, *dst_p; - dma_addr_t dma_ppa_list = 0; - __le64 *lba_list_mem, *lba_list_media; - int nr_secs = rqd->nr_ppas; + struct pblk_sec_meta *meta_list = rqd->meta_list; + int bio_init_idx = pr_ctx->bio_init_idx; + unsigned long *read_bitmap = _ctx->bitmap; + int nr_secs = pr_ctx->orig_nr_secs; int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs); - int i, ret, hole; - - /* Re-use allocated memory for intermediate lbas */ - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size); - lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size); - - new_bio = bio_alloc(GFP_KERNEL, nr_holes); - - if 
(pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes)) - goto fail_add_pages; - - if (nr_holes != new_bio->bi_vcnt) { - pblk_err(pblk, "malformed bio\n"); - goto fail; - } - - for (i = 0; i < nr_secs; i++) - lba_list_mem[i] = meta_list[i].lba; - - new_bio->bi_iter.bi_sector = 0; /* internal bio */ - bio_set_op_attrs(new_bio, REQ_OP_READ, 0); - - rqd->bio = new_bio; - rqd->nr_ppas = nr_holes; - rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM); - - if (unlikely(nr_holes == 1)) { - ppa_ptr = rqd->ppa_list; - dma_ppa_list = rqd->dma_ppa_list; - rqd->ppa_addr = rqd->ppa_list[0]; - } - - ret = pblk_submit_io_sync(pblk, rqd); - if (ret) { - bio_put(rqd->bio); - pblk_err(pblk, "sync read IO submission failed\n"); - goto fail; - } - - if (rqd->error) { - atomic_long_inc(>read_failed); -#ifdef CONFIG_NVM_PBLK_DEBUG - pblk_print_failed_rqd(pblk, rqd, rqd->error); -#endif - } + __le64 *lba_list_mem, *lba_list_media; + void *src_p, *dst_p; + int hole, i; if (unlikely(nr_holes == 1)) { struct ppa_addr ppa; ppa = rqd->ppa_addr; - rqd->ppa_list = ppa_ptr; - rqd->dma_ppa_list = dma_ppa_list; + rqd->ppa_list = pr_ctx->ppa_ptr; + rqd->dma_ppa_list = pr_ctx->dma_ppa_list; rqd->ppa_list[0] = ppa; } + /* Re-use allocated memory for intermediate lbas */ + lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size); + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size); + for (i = 0; i < nr_secs; i++) { lba_list_media[i] = meta_list[i].lba; meta_list[i].lba = lba_list_mem[i]; @@ -316,7 +278,7 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd, meta_list[hole].lba = lba_list_media[i]; src_bv = new_bio->bi_io_vec[i++]; - dst_bv = orig_bio->bi_io_vec[bio_init_idx + hole]; + dst_bv = bio->bi_io_vec[bio_init_idx + hole]; src_p = kmap_atomic(src_bv.bv_page); dst_p = kmap_atomic(dst_bv.bv_page); @@ -334,19 +296,107 @@ static int pblk_partial_read(struct pblk *pblk, struct nvm_rq *rqd, } while (hole < nr_secs); bio_put(new_bio); + kfree(pr_ctx); /* 
restore original request */ rqd->bio = NULL; rqd->nr_ppas = nr_secs; + bio_endio(bio); __pblk_end_io_read(pblk, rqd, false); - return NVM_IO_DONE; +} -fail: - /* Free allocated pages in new bio */ +static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd, + unsigned int bio_init_idx, + unsigned long *read_bitmap, + int nr_holes) +{ + struct pblk_sec_meta *meta_list = rqd->meta_list; + struct pblk_g_ctx *r_ctx
Re: [PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.
On 19.06.2018 05:47, Javier Gonzalez wrote: On 19 Jun 2018, at 14.42, Matias Bjørling wrote: On Tue, Jun 19, 2018 at 1:08 PM, Javier Gonzalez wrote: On 16 Jun 2018, at 00.27, Igor Konopko wrote: In current pblk implementation, l2p mapping for not closed lines is always stored only in OOB metadata and recovered from it. Such a solution does not provide data integrity when drives does not have such a OOB metadata space. The goal of this patch is to add support for so called packed metadata, which store l2p mapping for open lines in last sector of every write unit. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-core.c | 52 drivers/lightnvm/pblk-init.c | 37 ++-- drivers/lightnvm/pblk-rb.c | 3 +++ drivers/lightnvm/pblk-recovery.c | 25 +++ drivers/lightnvm/pblk-sysfs.c| 7 ++ drivers/lightnvm/pblk-write.c| 14 +++ drivers/lightnvm/pblk.h | 5 +++- 7 files changed, 128 insertions(+), 15 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index c092ee93a18d..375c6430612e 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk) { unsigned int secs_avail = pblk_rb_read_count(>rwb); - if (secs_avail >= pblk->min_write_pgs) + if (secs_avail >= pblk->min_write_pgs_data) pblk_write_kick(pblk); } @@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line) struct pblk_line_meta *lm = >lm; struct pblk_line_mgmt *l_mg = >l_mg; struct list_head *move_list = NULL; - int vsc = le32_to_cpu(*line->vsc); + int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data) + * (pblk->min_write_pgs - pblk->min_write_pgs_data); + int vsc = le32_to_cpu(*line->vsc) + packed_meta; lockdep_assert_held(>lock); @@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data, } int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail, -unsigned long secs_to_flush) +unsigned long secs_to_flush, bool skip_meta) 
{ int max = pblk->sec_per_write; int min = pblk->min_write_pgs; int secs_to_sync = 0; + if (skip_meta) + min = max = pblk->min_write_pgs_data; + if (secs_avail >= max) secs_to_sync = max; else if (secs_avail >= min) @@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line, next_rq: memset(, 0, sizeof(struct nvm_rq)); - rq_ppas = pblk_calc_secs(pblk, left_ppas, 0); + rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false); rq_len = rq_ppas * geo->csecs; bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len, @@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas, } spin_unlock(>trans_lock); } + +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd) +{ + void *meta_list = rqd->meta_list; + void *page; + int i = 0; + + if (pblk_is_oob_meta_supported(pblk)) + return; + + /* We need to zero out metadata corresponding to packed meta page */ + pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY; + + page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page); + /* We need to fill last page of request (packed metadata) + * with data from oob meta buffer. 
+ */ + for (; i < rqd->nr_ppas; i++) + memcpy(page + (i * sizeof(struct pblk_sec_meta)), + pblk_get_meta_at(pblk, meta_list, i), + sizeof(struct pblk_sec_meta)); +} + +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd) +{ + void *meta_list = rqd->meta_list; + void *page; + int i = 0; + + if (pblk_is_oob_meta_supported(pblk)) + return; + + page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page); + /* We need to fill oob meta buffer with data from packed metadata */ + for (; i < rqd->nr_ppas; i++) + memcpy(pblk_get_meta_at(pblk, meta_list, i), + page + (i * sizeof(struct pblk_sec_meta)), + sizeof(struct pblk_sec_meta)); +} diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index f05112230a52..5eb641da46ed 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk) pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE); max_write_ppas = pblk->min_write_pgs * geo->all_luns; pblk->max_write_pgs = min_t(int, max_write_
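The `skip_meta` argument added to `pblk_calc_secs()` changes how many sectors are taken from the ring buffer per write. The sketch below follows the branch shown in this version of the hunk; the tail of the function (rounding down to a multiple of `min`, and the flush case) is not visible in the quoted text, so those branches are an assumption, as are the `sketch_limits` structure and the function name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the pblk write limits. */
struct sketch_limits {
	int sec_per_write;	/* preferred sectors per write */
	int min_write_pgs;	/* write unit, including packed metadata */
	int min_write_pgs_data;	/* write unit, user data only */
};

static int sketch_calc_secs(const struct sketch_limits *l,
			    unsigned long secs_avail,
			    unsigned long secs_to_flush, bool skip_meta)
{
	int max = l->sec_per_write;
	int min = l->min_write_pgs;
	int secs_to_sync = 0;

	/* When counting user data only, leave room for the metadata sector. */
	if (skip_meta)
		min = max = l->min_write_pgs_data;

	assert(min > 0 && max >= min);

	if (secs_avail >= (unsigned long)max)
		secs_to_sync = max;
	else if (secs_avail >= (unsigned long)min)
		secs_to_sync = min * (int)(secs_avail / (unsigned long)min);
	else if (secs_to_flush)
		secs_to_sync = min;	/* assumed: pad up to one write unit */

	return secs_to_sync;
}
```

With a write unit of 8 sectors of which 7 carry user data, a call with `skip_meta` set never returns more than 7, so one sector per write unit stays free for the packed metadata.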
Re: [PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata
On 18.06.2018 07:23, Javier Gonzalez wrote: On 16 Jun 2018, at 00.27, Igor Konopko wrote: Currently pblk assumes that size of OOB metadata on drive is always equal to size of pblk_sec_meta struct. This commit add helpers which will allow to handle different sizes of OOB metadata on drive. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-core.c | 10 + drivers/lightnvm/pblk-map.c | 21 --- drivers/lightnvm/pblk-read.c | 45 +--- drivers/lightnvm/pblk-recovery.c | 24 - drivers/lightnvm/pblk.h | 29 ++ 5 files changed, 91 insertions(+), 38 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 66ab1036f2fb..8a0ac466872f 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -685,7 +685,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line, rqd.nr_ppas = rq_ppas; if (dir == PBLK_WRITE) { - struct pblk_sec_meta *meta_list = rqd.meta_list; + void *meta_list = rqd.meta_list; rqd.flags = pblk_set_progr_mode(pblk, PBLK_WRITE); for (i = 0; i < rqd.nr_ppas; ) { @@ -693,7 +693,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line, paddr = __pblk_alloc_page(pblk, line, min); spin_unlock(>lock); for (j = 0; j < min; j++, i++, paddr++) { - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY); + pblk_get_meta_at(pblk, meta_list, i)->lba = + cpu_to_le64(ADDR_EMPTY); rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, id); } @@ -825,14 +826,15 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line, rqd.nr_ppas = lm->smeta_sec; for (i = 0; i < lm->smeta_sec; i++, paddr++) { - struct pblk_sec_meta *meta_list = rqd.meta_list; + void *meta_list = rqd.meta_list; rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); if (dir == PBLK_WRITE) { __le64 addr_empty = cpu_to_le64(ADDR_EMPTY); - meta_list[i].lba = lba_list[paddr] = addr_empty; + pblk_get_meta_at(pblk, meta_list, i)->lba = + lba_list[paddr] = addr_empty; } } diff --git 
a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c index 953ca31dda68..92c40b546c4e 100644 --- a/drivers/lightnvm/pblk-map.c +++ b/drivers/lightnvm/pblk-map.c @@ -21,7 +21,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, struct ppa_addr *ppa_list, unsigned long *lun_bitmap, - struct pblk_sec_meta *meta_list, + void *meta_list, unsigned int valid_secs) { struct pblk_line *line = pblk_line_get_data(pblk); @@ -67,14 +67,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, kref_get(>ref); w_ctx = pblk_rb_w_ctx(>rwb, sentry + i); w_ctx->ppa = ppa_list[i]; - meta_list[i].lba = cpu_to_le64(w_ctx->lba); + pblk_get_meta_at(pblk, meta_list, i)->lba = + cpu_to_le64(w_ctx->lba); lba_list[paddr] = cpu_to_le64(w_ctx->lba); if (lba_list[paddr] != addr_empty) line->nr_valid_lbas++; else atomic64_inc(>pad_wa); } else { - lba_list[paddr] = meta_list[i].lba = addr_empty; + lba_list[paddr] = + pblk_get_meta_at(pblk, meta_list, i)->lba = + addr_empty; __pblk_map_invalidate(pblk, line, paddr); } } @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, unsigned long *lun_bitmap, unsigned int valid_secs, unsigned int off) { - struct pblk_sec_meta *meta_list = rqd->meta_list; + void *meta_list = rqd->meta_list; unsigned int map_secs; int min = pblk->min_write_pgs; int i; @@ -95,7 +98,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, for (i = off; i < rqd->nr_ppas; i += min) { map_secs = (i + min > val
Re: [PATCH 5/5] lightnvm: pblk: Disable interleaved metadata in pblk
On 18.06.2018 07:29, Javier Gonzalez wrote: On 16 Jun 2018, at 21.38, Matias Bjørling wrote: On 06/16/2018 12:27 AM, Igor Konopko wrote: Currently pblk and lightnvm does only check for size of OOB metadata and does not care wheather this meta is located in separate buffer or is interleaved with data in single buffer. In reality only the first scenario is supported, where second mode will break pblk functionality during any IO operation. The goal of this patch is to block creation of pblk devices in case of interleaved metadata Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-init.c | 6 ++ drivers/nvme/host/lightnvm.c | 1 + include/linux/lightnvm.h | 1 + 3 files changed, 8 insertions(+) diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 5eb641da46ed..483a6d479e7d 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -1238,6 +1238,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk, return ERR_PTR(-EINVAL); } + if (geo->ext) { + pr_err("pblk: extended (interleaved) metadata in data buffer" + " not supported\n"); + return ERR_PTR(-EINVAL); + } + pblk = kzalloc(sizeof(struct pblk), GFP_KERNEL); if (!pblk) return ERR_PTR(-ENOMEM); diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index 670478abc754..872ab854ccf5 100644 --- a/drivers/nvme/host/lightnvm.c +++ b/drivers/nvme/host/lightnvm.c @@ -979,6 +979,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns) geo->csecs = 1 << ns->lba_shift; geo->sos = ns->ms; + geo->ext = ns->ext; } int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h index 72a55d71917e..b13e64e2112f 100644 --- a/include/linux/lightnvm.h +++ b/include/linux/lightnvm.h @@ -350,6 +350,7 @@ struct nvm_geo { u32 clba; /* sectors per chunk */ u16 csecs; /* sector size */ u16 sos;/* out-of-band area size */ + u16 ext;/* metadata in extended data buffer */ /* device write 
constrains */ u32 ws_min; /* minimum write size */ I think bool type would be better here. Can it be placed a bit down, just over the 1.2 stuff? Also, feel free to fix up the checkpatch stuff in patch 1 & 3 & 5. Apart from Matias' comments, it looks good to me. Traditionally, we have separated subsystem and target patches to make sure there is no coupling between pblk and lightnvm, but if Matias is ok with starting to have patches covering all at once, then good for me too. Javier Will fix above comments and resend. Igor
Re: [PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta
On 18.06.2018 07:25, Javier Gonzalez wrote: On 16 Jun 2018, at 21.27, Matias Bjørling wrote: On 06/16/2018 12:27 AM, Igor Konopko wrote: Since the size of pblk_sec_meta is now flexible and depends on the drive metadata size, we can remove the unneeded reserved field from that structure. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk.h | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h index f82c3a0b0de5..27658dc6fc1a 100644 --- a/drivers/lightnvm/pblk.h +++ b/drivers/lightnvm/pblk.h @@ -82,7 +82,6 @@ enum { }; struct pblk_sec_meta { - u64 reserved; __le64 lba; }; Looks good to me. Javier may have some comment on this, since it is not completely obvious from the code why that reserved attribute is there. I do like the change to go in, as the field needlessly extends the requirement from 8 to 16 bytes. Looks good to me. Maybe merge this patch with 1/5? It was actually a comment I added to it. Sure, can merge it. Igor
[PATCH 1/5] lightnvm: pblk: Helpers for OOB metadata
Currently pblk assumes that the size of OOB metadata on the drive is always equal to the size of the pblk_sec_meta struct. This commit adds helpers that allow pblk to handle different sizes of OOB metadata on the drive. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-core.c | 10 + drivers/lightnvm/pblk-map.c | 21 --- drivers/lightnvm/pblk-read.c | 45 +--- drivers/lightnvm/pblk-recovery.c | 24 - drivers/lightnvm/pblk.h | 29 ++ 5 files changed, 91 insertions(+), 38 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 66ab1036f2fb..8a0ac466872f 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -685,7 +685,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line, rqd.nr_ppas = rq_ppas; if (dir == PBLK_WRITE) { - struct pblk_sec_meta *meta_list = rqd.meta_list; + void *meta_list = rqd.meta_list; rqd.flags = pblk_set_progr_mode(pblk, PBLK_WRITE); for (i = 0; i < rqd.nr_ppas; ) { @@ -693,7 +693,8 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line, paddr = __pblk_alloc_page(pblk, line, min); spin_unlock(&line->lock); for (j = 0; j < min; j++, i++, paddr++) { - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY); + pblk_get_meta_at(pblk, meta_list, i)->lba = + cpu_to_le64(ADDR_EMPTY); rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, id); } @@ -825,14 +826,15 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line, rqd.nr_ppas = lm->smeta_sec; for (i = 0; i < lm->smeta_sec; i++, paddr++) { - struct pblk_sec_meta *meta_list = rqd.meta_list; + void *meta_list = rqd.meta_list; rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); if (dir == PBLK_WRITE) { __le64 addr_empty = cpu_to_le64(ADDR_EMPTY); - meta_list[i].lba = lba_list[paddr] = addr_empty; + pblk_get_meta_at(pblk, meta_list, i)->lba = + lba_list[paddr] = addr_empty; } } diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c index 953ca31dda68..92c40b546c4e 100644 --- 
a/drivers/lightnvm/pblk-map.c +++ b/drivers/lightnvm/pblk-map.c @@ -21,7 +21,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, struct ppa_addr *ppa_list, unsigned long *lun_bitmap, - struct pblk_sec_meta *meta_list, + void *meta_list, unsigned int valid_secs) { struct pblk_line *line = pblk_line_get_data(pblk); @@ -67,14 +67,17 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, kref_get(&line->ref); w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i); w_ctx->ppa = ppa_list[i]; - meta_list[i].lba = cpu_to_le64(w_ctx->lba); + pblk_get_meta_at(pblk, meta_list, i)->lba = + cpu_to_le64(w_ctx->lba); lba_list[paddr] = cpu_to_le64(w_ctx->lba); if (lba_list[paddr] != addr_empty) line->nr_valid_lbas++; else atomic64_inc(&pblk->pad_wa); } else { - lba_list[paddr] = meta_list[i].lba = addr_empty; + lba_list[paddr] = + pblk_get_meta_at(pblk, meta_list, i)->lba = + addr_empty; __pblk_map_invalidate(pblk, line, paddr); } } @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, unsigned long *lun_bitmap, unsigned int valid_secs, unsigned int off) { - struct pblk_sec_meta *meta_list = rqd->meta_list; + void *meta_list = rqd->meta_list; unsigned int map_secs; int min = pblk->min_write_pgs; int i; @@ -95,7 +98,9 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, for (i = off; i < rqd->nr_ppas; i += min) { map_secs = (i + min > valid_secs) ? (valid_secs % min) : min; if (pblk_map_page_data(pblk, s
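For readers following the series, the idea behind the helper can be shown in a few lines of user-space C. This is a minimal sketch under stated assumptions, not the patch's code: `struct sec_meta` and `meta_at()` are illustrative stand-ins for `struct pblk_sec_meta` and `pblk_get_meta_at()`, and the stride is passed in explicitly rather than read from the pblk instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for pblk's per-sector metadata (assumption). */
struct sec_meta {
	uint64_t lba;
};

/*
 * Return the metadata entry for sector `index` in a buffer whose entries
 * are `oob_meta_size` bytes apart. The stride is the device's OOB size,
 * not sizeof(struct sec_meta), so the buffer cannot be indexed as a
 * plain array of structs.
 */
static struct sec_meta *meta_at(void *meta_list, size_t oob_meta_size,
				size_t index)
{
	/* This sketch assumes every entry has room for the LBA. */
	assert(oob_meta_size >= sizeof(struct sec_meta));
	return (struct sec_meta *)((char *)meta_list + oob_meta_size * index);
}
```

With a 16-byte stride, entry 2 starts at byte offset 32, whereas `meta_list[2]` on an 8-byte struct would land at offset 16. That mismatch is why the diff switches callers from array indexing to a helper.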
[PATCH 3/5] lightnvm: Flexible DMA pool entry size
Currently lightnvm and pblk use a single DMA pool whose entry size is always equal to PAGE_SIZE. The PPA list always needs 8b*64, so only 56b*64 of space is left for OOB meta. Since NVMe OOB meta can be bigger, such as 128b, this solution is not robust. This patch adds the possibility to support OOB meta above 56b by creating a separate DMA pool for pblk, with an entry size big enough to store both the PPA list and such OOB metadata. Signed-off-by: Igor Konopko --- drivers/lightnvm/core.c | 33 - drivers/lightnvm/pblk-core.c | 24 +--- drivers/lightnvm/pblk-init.c | 9 + drivers/lightnvm/pblk-read.c | 40 +++- drivers/lightnvm/pblk-recovery.c | 18 ++ drivers/lightnvm/pblk-write.c| 8 drivers/lightnvm/pblk.h | 11 ++- drivers/nvme/host/lightnvm.c | 6 -- include/linux/lightnvm.h | 8 +--- 9 files changed, 106 insertions(+), 51 deletions(-) diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c index 60aa7bc5a630..bc8e6ecea083 100644 --- a/drivers/lightnvm/core.c +++ b/drivers/lightnvm/core.c @@ -642,20 +642,33 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt) } EXPORT_SYMBOL(nvm_unregister_tgt_type); -void *nvm_dev_dma_alloc(struct nvm_dev *dev, gfp_t mem_flags, - dma_addr_t *dma_handler) +void *nvm_dev_dma_alloc(struct nvm_dev *dev, void *pool, + gfp_t mem_flags, dma_addr_t *dma_handler) { - return dev->ops->dev_dma_alloc(dev, dev->dma_pool, mem_flags, - dma_handler); + return dev->ops->dev_dma_alloc(dev, pool ?: dev->dma_pool, + mem_flags, dma_handler); } EXPORT_SYMBOL(nvm_dev_dma_alloc); -void nvm_dev_dma_free(struct nvm_dev *dev, void *addr, dma_addr_t dma_handler) +void nvm_dev_dma_free(struct nvm_dev *dev, void *pool, + void *addr, dma_addr_t dma_handler) { - dev->ops->dev_dma_free(dev->dma_pool, addr, dma_handler); + dev->ops->dev_dma_free(pool ?: dev->dma_pool, addr, dma_handler); } EXPORT_SYMBOL(nvm_dev_dma_free); +void *nvm_dev_dma_create(struct nvm_dev *dev, int size, char *name) +{ + return dev->ops->create_dma_pool(dev, name, size); +} 
+EXPORT_SYMBOL(nvm_dev_dma_create); + +void nvm_dev_dma_destroy(struct nvm_dev *dev, void *pool) +{ + dev->ops->destroy_dma_pool(pool); +} +EXPORT_SYMBOL(nvm_dev_dma_destroy); + static struct nvm_dev *nvm_find_nvm_dev(const char *name) { struct nvm_dev *dev; @@ -683,7 +696,8 @@ static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd, } rqd->nr_ppas = nr_ppas; - rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, &rqd->dma_ppa_list); + rqd->ppa_list = nvm_dev_dma_alloc(dev, NULL, GFP_KERNEL, + &rqd->dma_ppa_list); if (!rqd->ppa_list) { pr_err("nvm: failed to allocate dma memory\n"); return -ENOMEM; @@ -709,7 +723,8 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, if (!rqd->ppa_list) return; - nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list); + nvm_dev_dma_free(tgt_dev->parent, NULL, rqd->ppa_list, + rqd->dma_ppa_list); } int nvm_get_chunk_meta(struct nvm_tgt_dev *tgt_dev, struct nvm_chk_meta *meta, @@ -933,7 +948,7 @@ int nvm_register(struct nvm_dev *dev) if (!dev->q || !dev->ops) return -EINVAL; - dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist"); + dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", PAGE_SIZE); if (!dev->dma_pool) { pr_err("nvm: could not create dma pool\n"); return -ENOMEM; diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 8a0ac466872f..c092ee93a18d 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -279,7 +279,7 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type) } if (rqd->meta_list) - nvm_dev_dma_free(dev->parent, rqd->meta_list, + nvm_dev_dma_free(dev->parent, pblk->dma_pool, rqd->meta_list, rqd->dma_meta_list); mempool_free(rqd, pool); } @@ -652,13 +652,13 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line, } else return -EINVAL; - meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, -
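The arithmetic in the commit message above (8b*64 for the PPA list, 56b*64 left for OOB meta) can be checked with a small user-space sketch. The names `MAX_VLBA`, `PPA_ENTRY_SIZE`, `dma_entry_size()` and `max_oob_meta_size()` are made up for illustration; the 64-entry and 8-byte figures come from the commit message, and a 4096-byte PAGE_SIZE is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VLBA	64u	/* sectors per request, per the commit message */
#define PPA_ENTRY_SIZE	8u	/* bytes per PPA, per the commit message */

/* Bytes one pool entry needs to hold both the PPA list and the OOB list. */
static size_t dma_entry_size(size_t oob_meta_size)
{
	return (oob_meta_size + PPA_ENTRY_SIZE) * MAX_VLBA;
}

/* Largest per-sector OOB size that fits into an entry of `entry_size` bytes. */
static size_t max_oob_meta_size(size_t entry_size)
{
	assert(entry_size >= PPA_ENTRY_SIZE * MAX_VLBA);
	return (entry_size - PPA_ENTRY_SIZE * MAX_VLBA) / MAX_VLBA;
}
```

Under these assumptions a 4096-byte entry leaves room for 56 bytes of metadata per sector, while 128-byte metadata needs an 8704-byte entry, which is the case the separate pool is meant to cover.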
[PATCH 4/5] lightnvm: pblk: Support for packed metadata in pblk.
In the current pblk implementation, the l2p mapping for lines that are not yet closed is stored only in OOB metadata and recovered from it. This does not provide data integrity when a drive has no OOB metadata space. The goal of this patch is to add support for so-called packed metadata, which stores the l2p mapping for open lines in the last sector of every write unit. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-core.c | 52 drivers/lightnvm/pblk-init.c | 37 ++-- drivers/lightnvm/pblk-rb.c | 3 +++ drivers/lightnvm/pblk-recovery.c | 25 +++ drivers/lightnvm/pblk-sysfs.c| 7 ++ drivers/lightnvm/pblk-write.c| 14 +++ drivers/lightnvm/pblk.h | 5 +++- 7 files changed, 128 insertions(+), 15 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index c092ee93a18d..375c6430612e 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -340,7 +340,7 @@ void pblk_write_should_kick(struct pblk *pblk) { unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb); - if (secs_avail >= pblk->min_write_pgs) + if (secs_avail >= pblk->min_write_pgs_data) pblk_write_kick(pblk); } @@ -371,7 +371,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line) struct pblk_line_meta *lm = &pblk->lm; struct pblk_line_mgmt *l_mg = &pblk->l_mg; struct list_head *move_list = NULL; - int vsc = le32_to_cpu(*line->vsc); + int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data) + * (pblk->min_write_pgs - pblk->min_write_pgs_data); + int vsc = le32_to_cpu(*line->vsc) + packed_meta; lockdep_assert_held(&line->lock); @@ -540,12 +542,15 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data, } int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail, - unsigned long secs_to_flush) + unsigned long secs_to_flush, bool skip_meta) { int max = pblk->sec_per_write; int min = pblk->min_write_pgs; int secs_to_sync = 0; + if (skip_meta) + min = max = pblk->min_write_pgs_data; + if (secs_avail >= max) secs_to_sync = max; else 
if (secs_avail >= min) @@ -663,7 +668,7 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line, next_rq: memset(&rqd, 0, sizeof(struct nvm_rq)); - rq_ppas = pblk_calc_secs(pblk, left_ppas, 0); + rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false); rq_len = rq_ppas * geo->csecs; bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len, @@ -2091,3 +2096,42 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas, } spin_unlock(&pblk->trans_lock); } + +void pblk_set_packed_meta(struct pblk *pblk, struct nvm_rq *rqd) +{ + void *meta_list = rqd->meta_list; + void *page; + int i = 0; + + if (pblk_is_oob_meta_supported(pblk)) + return; + + /* We need to zero out metadata corresponding to packed meta page */ + pblk_get_meta_at(pblk, meta_list, rqd->nr_ppas - 1)->lba = ADDR_EMPTY; + + page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page); + /* We need to fill last page of request (packed metadata) +* with data from oob meta buffer. +*/ + for (; i < rqd->nr_ppas; i++) + memcpy(page + (i * sizeof(struct pblk_sec_meta)), + pblk_get_meta_at(pblk, meta_list, i), + sizeof(struct pblk_sec_meta)); +} + +void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd) +{ + void *meta_list = rqd->meta_list; + void *page; + int i = 0; + + if (pblk_is_oob_meta_supported(pblk)) + return; + + page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page); + /* We need to fill oob meta buffer with data from packed metadata */ + for (; i < rqd->nr_ppas; i++) + memcpy(pblk_get_meta_at(pblk, meta_list, i), + page + (i * sizeof(struct pblk_sec_meta)), + sizeof(struct pblk_sec_meta)); +} diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index f05112230a52..5eb641da46ed 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -372,8 +372,40 @@ static int pblk_core_init(struct pblk *pblk) pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE); max_write_ppas = pblk->min_write_pgs * 
geo->all_luns; pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA); + pblk->min_write_pgs_data = pblk->min_write_pgs; pblk_se
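The packing scheme described in this patch can be illustrated outside the kernel. The sketch below is an assumption-laden model, not the driver code: `pack_meta()` and `unpack_meta()` mirror the two memcpy loops of `pblk_set_packed_meta()` and `pblk_get_packed_meta()` on plain buffers, the stride of the in-memory metadata list is passed explicitly, and `secs_with_packed_meta()` (a made-up name) reproduces the sector accounting added to `pblk_line_gc_list()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for struct pblk_sec_meta (assumption). */
struct sec_meta {
	uint64_t lba;
};

/* Copy `nr` strided metadata entries into a densely packed page. */
static void pack_meta(void *page, const void *meta_list, size_t stride,
		      size_t nr)
{
	size_t i;

	assert(stride >= sizeof(struct sec_meta));
	for (i = 0; i < nr; i++)
		memcpy((char *)page + i * sizeof(struct sec_meta),
		       (const char *)meta_list + i * stride,
		       sizeof(struct sec_meta));
}

/* Inverse direction: rebuild the strided list from the packed page. */
static void unpack_meta(void *meta_list, const void *page, size_t stride,
			size_t nr)
{
	size_t i;

	assert(stride >= sizeof(struct sec_meta));
	for (i = 0; i < nr; i++)
		memcpy((char *)meta_list + i * stride,
		       (const char *)page + i * sizeof(struct sec_meta),
		       sizeof(struct sec_meta));
}

/*
 * Valid data sectors plus the packed-metadata sectors that accompany
 * them: one write unit of min_write_pgs sectors carries
 * min_write_pgs_data sectors of user data.
 */
static unsigned int secs_with_packed_meta(unsigned int vsc,
					  unsigned int min_write_pgs,
					  unsigned int min_write_pgs_data)
{
	unsigned int packed;

	assert(min_write_pgs_data > 0 && min_write_pgs >= min_write_pgs_data);
	packed = (vsc / min_write_pgs_data) *
		 (min_write_pgs - min_write_pgs_data);
	return vsc + packed;
}
```

As a worked case under these assumptions, with 8-sector write units of which 7 hold data, 21 valid data sectors correspond to 3 write units and therefore 3 extra metadata sectors, 24 in total.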
[PATCH 5/5] lightnvm: pblk: Disable interleaved metadata in pblk
Currently pblk and lightnvm only check the size of the OOB metadata and do not care whether this metadata is located in a separate buffer or is interleaved with the data in a single buffer. In reality only the first scenario is supported; the second mode will break pblk functionality during any IO operation. The goal of this patch is to block creation of pblk devices when metadata is interleaved. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-init.c | 6 ++ drivers/nvme/host/lightnvm.c | 1 + include/linux/lightnvm.h | 1 + 3 files changed, 8 insertions(+) diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 5eb641da46ed..483a6d479e7d 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -1238,6 +1238,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk, return ERR_PTR(-EINVAL); } + if (geo->ext) { + pr_err("pblk: extended (interleaved) metadata in data buffer" + " not supported\n"); + return ERR_PTR(-EINVAL); + } + pblk = kzalloc(sizeof(struct pblk), GFP_KERNEL); if (!pblk) return ERR_PTR(-ENOMEM); diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index 670478abc754..872ab854ccf5 100644 --- a/drivers/nvme/host/lightnvm.c +++ b/drivers/nvme/host/lightnvm.c @@ -979,6 +979,7 @@ void nvme_nvm_update_nvm_info(struct nvme_ns *ns) geo->csecs = 1 << ns->lba_shift; geo->sos = ns->ms; + geo->ext = ns->ext; } int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node) diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h index 72a55d71917e..b13e64e2112f 100644 --- a/include/linux/lightnvm.h +++ b/include/linux/lightnvm.h @@ -350,6 +350,7 @@ struct nvm_geo { u32 clba; /* sectors per chunk */ u16 csecs; /* sector size */ u16 sos;/* out-of-band area size */ + u16 ext;/* metadata in extended data buffer */ /* device write constrains */ u32 ws_min; /* minimum write size */ -- 2.14.3
[PATCH 0/5] lightnvm: More flexible approach to metadata
This series of patches introduces some more flexibility in pblk related to OOB meta:
- ability to use different sizes of metadata (previously fixed 16b)
- ability to use pblk on drives without metadata
- ensuring that extended (interleaved) metadata is not in use
I believe that most of these patches, maybe except number 4 (Support for packed metadata), are rather simple, so I am waiting for comments especially about that one. Igor Konopko (5): lightnvm: pblk: Helpers for OOB metadata lightnvm: pblk: Remove resv field for sec meta lightnvm: Flexible DMA pool entry size lightnvm: pblk: Support for packed metadata in pblk. lightnvm: pblk: Disable interleaved metadata in pblk drivers/lightnvm/core.c | 33 ++- drivers/lightnvm/pblk-core.c | 86 +++- drivers/lightnvm/pblk-init.c | 52 +++- drivers/lightnvm/pblk-map.c | 21 ++ drivers/lightnvm/pblk-rb.c | 3 ++ drivers/lightnvm/pblk-read.c | 85 +-- drivers/lightnvm/pblk-recovery.c | 67 +-- drivers/lightnvm/pblk-sysfs.c| 7 drivers/lightnvm/pblk-write.c| 22 ++ drivers/lightnvm/pblk.h | 46 +++-- drivers/nvme/host/lightnvm.c | 7 +++- include/linux/lightnvm.h | 9 +++-- 12 files changed, 333 insertions(+), 105 deletions(-) -- 2.14.3
[PATCH 2/5] lightnvm: pblk: Remove resv field for sec meta
Since the size of pblk_sec_meta is now flexible and depends on the drive metadata size, we can remove the unneeded reserved field from that structure. Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk.h | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h index f82c3a0b0de5..27658dc6fc1a 100644 --- a/drivers/lightnvm/pblk.h +++ b/drivers/lightnvm/pblk.h @@ -82,7 +82,6 @@ enum { }; struct pblk_sec_meta { - u64 reserved; __le64 lba; }; -- 2.14.3
Re: [PATCH] lightnvm: pblk: add asynchronous partial read
_list; + struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd); + struct pblk_pr_ctx *pr_ctx; + struct bio *new_bio, *bio = r_ctx->private; + __le64 *lba_list_mem; + int nr_secs = rqd->nr_ppas; + int i; + + /* Re-use allocated memory for intermediate lbas */ + lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size); + + new_bio = bio_alloc(GFP_KERNEL, nr_holes); + + if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes)) + goto fail; + + if (nr_holes != new_bio->bi_vcnt) { + pr_err("pblk: malformed bio\n"); + goto fail; Shouldn't we use goto fail_pages since we already allocate bio pages correctly? + } + + pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL); + if (!pr_ctx) + goto fail_pages; + + for (i = 0; i < nr_secs; i++) + lba_list_mem[i] = meta_list[i].lba; + + new_bio->bi_iter.bi_sector = 0; /* internal bio */ + bio_set_op_attrs(new_bio, REQ_OP_READ, 0); + + rqd->bio = new_bio; + rqd->nr_ppas = nr_holes; + rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM); + + pr_ctx->ppa_ptr = NULL; + pr_ctx->orig_bio = bio; + pr_ctx->bitmap = *read_bitmap; + pr_ctx->bio_init_idx = bio_init_idx; + pr_ctx->orig_nr_secs = nr_secs; + r_ctx->private = pr_ctx; + + if (unlikely(nr_holes == 1)) { + pr_ctx->ppa_ptr = rqd->ppa_list; + pr_ctx->dma_ppa_list = rqd->dma_ppa_list; + rqd->ppa_addr = rqd->ppa_list[0]; + } + return 0; + +fail_pages: + pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt); +fail: + bio_put(new_bio); + + return -ENOMEM; +} + +static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd, + unsigned int bio_init_idx, + unsigned long *read_bitmap, int nr_secs) +{ + int nr_holes; + int ret; + + nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs); + + if (pblk_setup_partial_read(pblk, rqd, bio_init_idx, read_bitmap, + nr_holes)) + return NVM_IO_ERR; + + rqd->end_io = pblk_end_partial_read; + + ret = pblk_submit_io(pblk, rqd); + if (ret) { + bio_put(rqd->bio); + pr_err("pblk: partial read IO submission failed\n"); + goto err; + } + + 
return NVM_IO_OK; err: pr_err("pblk: failed to perform partial read\n"); /* Free allocated pages in new bio */ - pblk_bio_free_pages(pblk, orig_bio, 0, new_bio->bi_vcnt); + pblk_bio_free_pages(pblk, rqd->bio, 0, rqd->bio->bi_vcnt); __pblk_end_io_read(pblk, rqd, false); return NVM_IO_ERR; } @@ -480,8 +530,15 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio) /* The read bio request could be partially filled by the write buffer, * but there are some holes that need to be read from the drive. */ - return pblk_partial_read(pblk, rqd, bio, bio_init_idx, &read_bitmap); + ret = pblk_partial_read_bio(pblk, rqd, bio_init_idx, &read_bitmap, + nr_secs); + if (ret) + goto fail_meta_free; + + return NVM_IO_OK; +fail_meta_free: + nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list); fail_rqd_free: pblk_free_rqd(pblk, rqd, PBLK_READ); return ret; diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h index 25ad026..4b28900 100644 --- a/drivers/lightnvm/pblk.h +++ b/drivers/lightnvm/pblk.h @@ -119,6 +119,16 @@ struct pblk_g_ctx { u64 lba; }; +/* partial read context */ +struct pblk_pr_ctx { + struct bio *orig_bio; + unsigned long bitmap; + unsigned int orig_nr_secs; + unsigned int bio_init_idx; + void *ppa_ptr; + dma_addr_t dma_ppa_list; +}; + /* Pad context */ struct pblk_pad_rq { struct pblk *pblk; -- 2.7.4 Thanks Heiner. The patch looks good. Reviewed-by: Javier González + Marcin & Igor. Could you give this a spin with your drive and see if it works for you? It looks like it does not apply on top of for-4.19/core, but after some changes I was able to test it. Except for one minor comment above, it looks good to me. Tested-by: Igor Konopko
[PATCH] lightnvm: pblk: sync RB and RL states during GC
During sequential workloads we can hit the case where almost all lines are fully written with data. In that case the rate limiter significantly reduces the maximum number of requests for user IOs. Unfortunately, when the ring buffer has been flushed to the drive but its entries are not yet removed (which is ok, since there are still enough free entries in the ring buffer for user IO), we hang on user IO because there are not enough entries in the rate limiter. The reason is that the rate limiter's user entries are decreased only after the ring buffer entries are freed, which does not happen while there is still plenty of space in the ring buffer. The goal of this patch is to force freeing of the ring buffer by calling pblk_rb_sync_l2p, and thus to make new entries available in the rate limiter when there are not enough of them for user IO. Signed-off-by: Igor Konopko <igor.j.kono...@intel.com> Signed-off-by: Marcin Dziegielewski <marcin.dziegielew...@intel.com> --- drivers/lightnvm/pblk-init.c | 2 ++ drivers/lightnvm/pblk-rb.c | 7 +++ 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 0f277744266b..e6aa7726f8ba 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -1149,7 +1149,9 @@ static void pblk_tear_down(struct pblk *pblk, bool graceful) __pblk_pipeline_flush(pblk); __pblk_pipeline_stop(pblk); pblk_writer_stop(pblk); + spin_lock(&pblk->rwb.w_lock); pblk_rb_sync_l2p(&pblk->rwb); + spin_unlock(&pblk->rwb.w_lock); pblk_rl_free(&pblk->rl); pr_debug("pblk: consistent tear down (graceful:%d)\n", graceful); diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c index 1b74ec51a4ad..91824cd3e8d8 100644 --- a/drivers/lightnvm/pblk-rb.c +++ b/drivers/lightnvm/pblk-rb.c @@ -266,21 +266,18 @@ static int pblk_rb_update_l2p(struct pblk_rb *rb, unsigned int nr_entries, * Update the l2p entry for all sectors stored on the write buffer. 
This means * that all future lookups to the l2p table will point to a device address, not * to the cacheline in the write buffer. + * Caller must ensure that rb->w_lock is taken. */ void pblk_rb_sync_l2p(struct pblk_rb *rb) { unsigned int sync; unsigned int to_update; - spin_lock(&rb->w_lock); - /* Protect from reads and writes */ sync = smp_load_acquire(&rb->sync); to_update = pblk_rb_ring_count(sync, rb->l2p_update, rb->nr_entries); __pblk_rb_update_l2p(rb, to_update); - - spin_unlock(&rb->w_lock); } /* @@ -462,6 +459,8 @@ int pblk_rb_may_write_user(struct pblk_rb *rb, struct bio *bio, spin_lock(&rb->w_lock); io_ret = pblk_rl_user_may_insert(&pblk->rl, nr_entries); if (io_ret) { + /* Sync RB & L2P in order to update rate limiter values */ + pblk_rb_sync_l2p(rb); spin_unlock(&rb->w_lock); return io_ret; } -- 2.14.3
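The interaction between the ring buffer and the rate limiter that this patch addresses can be modelled in a few lines. The following is a toy user-space model under loud assumptions: `struct toy_rb` and the `toy_*` functions are hypothetical, locking is omitted, and the rate limiter is reduced to a single counter, so it only illustrates the ordering problem (budget is released only when the l2p pointer catches up with the sync pointer), not the real accounting.

```c
#include <assert.h>

/* Toy model; all names here are hypothetical, not pblk's. */
struct toy_rb {
	unsigned int nr_entries;	/* ring size, power of two */
	unsigned int sync;		/* entries before this are on the device */
	unsigned int l2p_update;	/* entries before this had their L2P updated */
	unsigned int rl_user_cnt;	/* entries charged to the user budget */
	unsigned int rl_user_max;	/* user budget set by the rate limiter */
};

/* Number of entries between tail and head in a power-of-two ring. */
static unsigned int toy_ring_count(unsigned int head, unsigned int tail,
				   unsigned int size)
{
	return (head - tail) & (size - 1);
}

/* Advance l2p_update up to sync and hand the entries back to the budget. */
static void toy_sync_l2p(struct toy_rb *rb)
{
	unsigned int n = toy_ring_count(rb->sync, rb->l2p_update,
					rb->nr_entries);

	assert(n <= rb->rl_user_cnt);
	rb->l2p_update = (rb->l2p_update + n) & (rb->nr_entries - 1);
	rb->rl_user_cnt -= n;
}

/*
 * Returns 1 if the write is admitted. On rejection the model syncs the
 * L2P pointer first, as the patch does, so that a retry can succeed.
 */
static int toy_may_write_user(struct toy_rb *rb, unsigned int nr)
{
	if (rb->rl_user_cnt + nr > rb->rl_user_max) {
		toy_sync_l2p(rb);
		return 0;
	}
	rb->rl_user_cnt += nr;
	return 1;
}
```

In this model, a buffer whose eight entries are all synced but still charged to an eight-entry budget rejects a four-entry write once and then admits the retry. Without the sync on the rejection path, the retry would be rejected indefinitely.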
[PATCH 2/3] lightnvm: Handling when whole line is bad
When all the blocks (chunks) in a line are marked as bad (offline), we shouldn't try to read smeta during the init process. Currently we try to do so by passing -1 as the PPA address, which causes multiple warnings that we are issuing IOs to out-of-bound PPAs. Signed-off-by: Igor Konopko <igor.j.kono...@intel.com> Signed-off-by: Marcin Dziegielewski <marcin.dziegielew...@intel.com> --- drivers/lightnvm/pblk-core.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 6d21f9dbca5f..5d197f19b77b 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -867,6 +867,11 @@ int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line) { u64 bpaddr = pblk_line_smeta_start(pblk, line); + if (bpaddr == -1) { + /* Whole line is bad - do not try to read smeta. */ + return 1; + } + return pblk_line_submit_smeta_io(pblk, line, bpaddr, PBLK_READ_RECOV); } -- 2.14.3
[PATCH 3/3] lightnvm: Fix partial read error path
When an error occurs during bio_add_page on the partial read path, pblk tries to free pages twice. This patch fixes that issue. Signed-off-by: Igor Konopko <igor.j.kono...@intel.com> Signed-off-by: Marcin Dziegielewski <marcin.dziegielew...@intel.com> --- drivers/lightnvm/pblk-read.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c index a2e678de428f..fa7b60f852d9 100644 --- a/drivers/lightnvm/pblk-read.c +++ b/drivers/lightnvm/pblk-read.c @@ -256,7 +256,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd, new_bio = bio_alloc(GFP_KERNEL, nr_holes); if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes)) - goto err; + goto err_add_pages; if (nr_holes != new_bio->bi_vcnt) { pr_err("pblk: malformed bio\n"); @@ -347,10 +347,10 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd, return NVM_IO_OK; err: - pr_err("pblk: failed to perform partial read\n"); - /* Free allocated pages in new bio */ - pblk_bio_free_pages(pblk, bio, 0, new_bio->bi_vcnt); + pblk_bio_free_pages(pblk, new_bio, 0, new_bio->bi_vcnt); +err_add_pages: + pr_err("pblk: failed to perform partial read\n"); __pblk_end_io_read(pblk, rqd, false); return NVM_IO_ERR; } -- 2.14.3
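The label ordering used in this fix can be demonstrated with a small user-space sketch. It is a simplified model with hypothetical names (`toy_add_pages()`, `toy_free_pages()`, `toy_partial_read()`) and plain counters instead of real pages; the assumption it encodes is the one the patch relies on, namely that the page-adding helper already releases its own pages when it fails.

```c
#include <assert.h>

static int pages_allocated;
static int pages_freed;

/*
 * Hypothetical helper: takes `nr` pages, and on failure gives back
 * whatever it took before returning an error.
 */
static int toy_add_pages(int nr, int fail)
{
	assert(nr >= 0);
	pages_allocated += nr;
	if (fail) {
		pages_freed += nr;	/* helper cleans up after itself */
		return -1;
	}
	return 0;
}

static void toy_free_pages(int nr)
{
	pages_freed += nr;
}

static int toy_partial_read(int nr, int fail_add, int fail_submit)
{
	if (toy_add_pages(nr, fail_add))
		goto err_add_pages;	/* pages already released: skip the free */

	if (fail_submit)
		goto err;

	toy_free_pages(nr);		/* normal completion */
	return 0;

err:
	toy_free_pages(nr);
err_add_pages:
	return -1;
}
```

Jumping to `err` from the first failure would count the same pages as freed twice, which is the double free the commit message describes. Placing the second label below the free keeps every path balanced.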
[PATCH 1/3] lightnvm: Proper error handling for pblk_bio_add_pages
Currently, when bio_add_pc_page fails in pblk_bio_add_pages, two issues occur if the call came from pblk_rb_read_to_bio(). The first is in pblk_bio_free_pages, since we are trying to free pages that were not allocated from our mempool. The second is the warning from dma_pool_free that we are trying to free a NULL DMA pointer. This commit fixes both issues. Signed-off-by: Igor Konopko <igor.j.kono...@intel.com> Signed-off-by: Marcin Dziegielewski <marcin.dziegielew...@intel.com> --- drivers/lightnvm/pblk-core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index e43093e27084..6d21f9dbca5f 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -278,7 +278,8 @@ void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type) return; } - nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list); + if (rqd->meta_list) + nvm_dev_dma_free(dev->parent, rqd->meta_list, +rqd->dma_meta_list); mempool_free(rqd, pool); } @@ -316,7 +317,7 @@ int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags, return 0; err: - pblk_bio_free_pages(pblk, bio, 0, i - 1); + pblk_bio_free_pages(pblk, bio, (bio->bi_vcnt - i), i); return -1; } -- 2.14.3
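The changed arguments in the error path (`bio->bi_vcnt - i` as the offset, `i` as the count) can be illustrated with a toy model. Everything below is hypothetical (`struct toy_bio`, `toy_free_pages()`, `toy_unwind_added()`); it assumes, as the commit message suggests, that the bio may already hold pages from another source before pool pages are appended to its tail.

```c
#include <assert.h>

#define TOY_MAX_PAGES 16

/* Toy stand-in for a bio; not the kernel structure. */
struct toy_bio {
	int vcnt;			/* pages currently attached */
	int from_pool[TOY_MAX_PAGES];	/* 1 if the page came from our pool */
	int pool_frees;			/* pages correctly returned to the pool */
	int bad_frees;			/* attempts to return a foreign page */
};

/* Release `nr` pages starting at index `off`. */
static void toy_free_pages(struct toy_bio *bio, int off, int nr)
{
	int i;

	assert(off >= 0 && nr >= 0 && off + nr <= bio->vcnt);
	for (i = off; i < off + nr; i++) {
		if (bio->from_pool[i])
			bio->pool_frees++;
		else
			bio->bad_frees++;
	}
}

/* Error path after `added` pool pages were appended: release only the tail. */
static void toy_unwind_added(struct toy_bio *bio, int added)
{
	toy_free_pages(bio, bio->vcnt - added, added);
}
```

Starting the release at offset 0, as the old code did, would touch pages at the head of the bio that the pool never handed out. Counting back from the current page count limits the release to the pages this call added.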
[PATCH 0/3] lightnvm: Error paths handling
This patchset provides proper handling for some errors that are not handled gracefully right now. Igor Konopko (3): lightnvm: Proper error handling for pblk_bio_add_pages lightnvm: Handling when whole line is bad lightnvm: Fix partial read error path drivers/lightnvm/pblk-core.c | 10 -- drivers/lightnvm/pblk-read.c | 8 2 files changed, 12 insertions(+), 6 deletions(-) -- 2.14.3