Re: [PATCH v2] nvme-pci: fix dbbuf_sq_db point to freed memory
Thanks for replying to my email. My description in the last email was not clear enough, so here is a supplementary note.

The NVMe device I used supports DBBUF, but the nvme_admin_dbbuf request returned a failure that eventually led to a kernel crash. The problem occurs as follows:

1. The device supports NVME_CTRL_OACS_DBBUF_SUPP, so the reset worker allocates memory for dev->dbbuf_dbs.
2. In nvme_setup_io_queues, the nvme_dbbuf_init function is called to assign values to pointers such as nvmeq->dbbuf_sq_db.
3. In nvme_dev_add, the nvme_admin_dbbuf request is sent to the device, but the device returns a failure, so the memory that dev->dbbuf_dbs points to is freed.

Then the driver issues I/O requests. In nvme_write_sq_db, the nvme_dbbuf_update_and_check_event function sees that nvmeq->dbbuf_sq_db is not NULL and writes to the memory it points to, causing memory corruption and a kernel crash.

On 2019/1/5 2:07, Christoph Hellwig wrote:
> On Fri, Dec 21, 2018 at 01:07:25AM +, Lulina (A) wrote:
>> The case is that nvme device support NVME_CTRL_OACS_DBBUF_SUPP, and
>> return failed when the driver sent nvme_admin_dbbuf. The nvmeq->dbbuf_sq_db
>> point to freed memory, as nvme_dbbuf_set is called after nvme_dbbuf_init.
>
> But we never use those pointers in that state, do we? Can you explain
> the problem in a little more detail?
[PATCH v2] nvme-pci: fix dbbuf_sq_db point to freed memory
The case is that the nvme device supports NVME_CTRL_OACS_DBBUF_SUPP, but returns a failure when the driver sends nvme_admin_dbbuf. nvmeq->dbbuf_sq_db then points to freed memory, as nvme_dbbuf_set is called after nvme_dbbuf_init.

Signed-off-by: lulina

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c33bb20..a477905 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -251,16 +251,25 @@ static int nvme_dbbuf_dma_alloc(struct nvme_dev *dev)
 static void nvme_dbbuf_dma_free(struct nvme_dev *dev)
 {
 	unsigned int mem_size = nvme_dbbuf_size(dev->db_stride);
+	unsigned int i;
 
 	if (dev->dbbuf_dbs) {
 		dma_free_coherent(dev->dev, mem_size,
 				  dev->dbbuf_dbs, dev->dbbuf_dbs_dma_addr);
 		dev->dbbuf_dbs = NULL;
+		for (i = dev->ctrl.queue_count - 1; i > 0; i--) {
+			dev->queues[i].dbbuf_sq_db = NULL;
+			dev->queues[i].dbbuf_cq_db = NULL;
+		}
 	}
 	if (dev->dbbuf_eis) {
 		dma_free_coherent(dev->dev, mem_size,
 				  dev->dbbuf_eis, dev->dbbuf_eis_dma_addr);
 		dev->dbbuf_eis = NULL;
+		for (i = dev->ctrl.queue_count - 1; i > 0; i--) {
+			dev->queues[i].dbbuf_sq_ei = NULL;
+			dev->queues[i].dbbuf_cq_ei = NULL;
+		}
 	}
 }
-- 
1.8.3.1
[PATCH] nvme-pci: fix dbbuf_sq_db point to freed memory
The case is that the nvme device supports NVME_CTRL_OACS_DBBUF_SUPP, but returns a failure when the driver sends nvme_admin_dbbuf. nvmeq->dbbuf_sq_db then points to freed memory, as nvme_dbbuf_set is called after nvme_dbbuf_init.

Change-Id: Ief2a5877cb008d3c29cf99053f80fecc9b8db1db
Signed-off-by: lulina

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index da39729..2e11980 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -240,16 +240,25 @@ static int nvme_dbbuf_dma_alloc(struct nvme_dev *dev)
 static void nvme_dbbuf_dma_free(struct nvme_dev *dev)
 {
 	unsigned int mem_size = nvme_dbbuf_size(dev->db_stride);
+	unsigned int i;
 
 	if (dev->dbbuf_dbs) {
 		dma_free_coherent(dev->dev, mem_size,
 				  dev->dbbuf_dbs, dev->dbbuf_dbs_dma_addr);
 		dev->dbbuf_dbs = NULL;
+		for (i = dev->ctrl.queue_count - 1; i > 0; i--) {
+			dev->queues[i]->dbbuf_sq_db = NULL;
+			dev->queues[i]->dbbuf_cq_db = NULL;
+		}
 	}
 	if (dev->dbbuf_eis) {
 		dma_free_coherent(dev->dev, mem_size,
 				  dev->dbbuf_eis, dev->dbbuf_eis_dma_addr);
 		dev->dbbuf_eis = NULL;
+		for (i = dev->ctrl.queue_count - 1; i > 0; i--) {
+			dev->queues[i]->dbbuf_sq_ei = NULL;
+			dev->queues[i]->dbbuf_cq_ei = NULL;
+		}
 	}
 }
-- 
1.8.3.1
Question about bcache buckets utilization
Hi Kent Overstreet,

I tested the bcache branch for Jens: http://evilpiepirate.org/git/linux-bcache.git/log/?h=for-jens

I have a question about bucket utilization; can you help me resolve it?

This is the test fio command:

fio -name iops -rw=randwrite -iodepth=32 -numjobs=1 -filename=/dev/bcache0 -ioengine libaio -direct=1 -bs=4k -size=500M -runtime=600 -time_based -random_distribution=zipf:1.2

My cache device is 1G in size, and the HDD device is 5G. The bucket size is the default 512k. When the test runs, the cache goes over CUTOFF_WRITEBACK_SYNC after 3 or 4 seconds. The test result is as follows:

Device:  rrqm/s  wrqm/s  r/s   w/s       rMB/s  wMB/s   avgrq-sz  avgqu-sz  await  svctm  %util
bcache0  0.00    0.00    0.00  62081.00  0.00   242.50  8.00      0.00      0.01   0.00   0.00
bcache0  0.00    0.00    0.00  72407.00  0.00   282.84  8.00      0.00      0.01   0.00   0.00
bcache0  0.00    0.00    0.00  62990.00  0.00   246.05  8.00      0.00      0.08   0.00   0.00
bcache0  0.00    0.00    0.00  511.00    0.00   2.00    8.00      0.00      62.53  0.00   0.00
bcache0  0.00    0.00    0.00  601.00    0.00   2.35    8.00      0.00      52.73  0.00   0.00

After the cache goes over CUTOFF_WRITEBACK_SYNC, all writes go down to the HDD, and I get poor performance. As the random range is set to 500M, the writes should all overlap within the cache device. Why can't the overlapped buckets be reused? Are there any other conditions that need to be met?

Thanks,
Lina Lu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
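For reference, the rough arithmetic for this setup can be sketched in shell. This assumes bcache's CUTOFF_WRITEBACK_SYNC threshold of 70% dirty data (as defined in drivers/md/bcache/writeback.h); the numbers are computed from the sizes quoted above, not read from a live device:

```shell
cache_mb=1024      # 1G cache device
bucket_kb=512      # default bucket size
workload_mb=500    # fio -size=500M

# Total buckets the cache device holds.
buckets=$(( cache_mb * 1024 / bucket_kb ))

# Dirty-data level (in MB) that trips CUTOFF_WRITEBACK_SYNC, assuming 70%.
sync_cutoff_mb=$(( cache_mb * 70 / 100 ))

echo "buckets=$buckets sync_cutoff_mb=$sync_cutoff_mb workload_mb=$workload_mb"
```

Even though the 500M working set is smaller than the 716M cutoff, 4k random writes scattered across many 512k buckets can dirty far more bucket space than the logical working set, which may be why the cutoff is reached so quickly.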