Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 03:15:54PM -0700, Dan Williams wrote: > > lsblk's blkdev_scsi_type_to_name() considers 4 to mean > > SCSI_TYPE_WORM (write once read many ... used for certain optical > > and tape drives). > > Why is lsblk assuming these are scsi devices? I'll need to go check that out. It's a very common assumption unfortunately. I rember fixing it in various in-house tools at customers and stumbled over it in targetcli recently. Please use a prefix for your type attribute to avoid this problem. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 03:15:54PM -0700, Dan Williams wrote: lsblk's blkdev_scsi_type_to_name() considers 4 to mean SCSI_TYPE_WORM (write once read many ... used for certain optical and tape drives). Why is lsblk assuming these are scsi devices? I'll need to go check that out. It's a very common assumption unfortunately. I rember fixing it in various in-house tools at customers and stumbled over it in targetcli recently. Please use a prefix for your type attribute to avoid this problem. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 2:24 PM, Elliott, Robert (Server Storage) wrote: >> -Original Message- >> From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of >> Dan Williams >> Sent: Tuesday, April 28, 2015 1:24 PM >> To: linux-nvd...@lists.01.org >> Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J. >> Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe; >> Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org; >> Andy Lutomirski; Andrew Morton; Linus Torvalds >> Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device >> support >> >> Changes since v1 [1]: Incorporates feedback received prior to April 24. > > Here are some comments on the sysfs properties reported for a pmem device. > They are based on v1, but I don't think v2 changes anything. > > 1. This confuses lsblk (part of util-linux): > /sys/block/pmem0/device/type:4 > > lsblk shows: > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > pmem0 251:00 8G 0 worm > pmem1 251:16 0 8G 0 worm > pmem2 251:32 0 8G 0 worm > pmem3 251:48 0 8G 0 worm > pmem4 251:64 0 8G 0 worm > pmem5 251:80 0 8G 0 worm > pmem6 251:96 0 8G 0 worm > pmem7 251:112 0 8G 0 worm > > lsblk's blkdev_scsi_type_to_name() considers 4 to mean > SCSI_TYPE_WORM (write once read many ... used for certain optical > and tape drives). Why is lsblk assuming these are scsi devices? I'll need to go check that out. > I'm not sure what nd and pmem are doing to result in that value. That is their libnd specific device type number from include/uapi/ndctl.h. 4 == ND_DEVICE_NAMESPACE_IO. lsblk has no business interpreting this as something SCSI specific. > 2. To avoid confusing software trying to detect fast storage vs. > slow storage devices via sysfs, this value should be 0: > /sys/block/pmem0/queue/rotational:1 > > That can be done by adding this shortly after the blk_alloc_queue call: > queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue); Yeah, good catch. > 3. Is there any reason to have a 512 KiB limit on the transfer > length? > /sys/block/pmem0/queue/max_hw_sectors_kb:512 > > That is from: >blk_queue_max_hw_sectors(pmem->pmem_queue, 1024); I'd only change this from the default if performance testing showed it made a non-trivial difference. > 4. These are read-writeable, but IOs never reach a queue, so > the queue size is irrelevant and merging never happens: > /sys/block/pmem0/queue/nomerges:0 > /sys/block/pmem0/queue/nr_requests:128 > > Consider making them both read-only with: > * nomerges set to 2 (no merging happening) > * nr_requests as small as the block layer allows to avoid > wasting memory. > > 5. No scatter-gather lists are created by the driver, so these > read-only fields are meaningless: > /sys/block/pmem0/queue/max_segments:128 > /sys/block/pmem0/queue/max_segment_size:65536 > > Is there a better way to report them as irrelevant? Again it comes back to the question of whether these default settings are actively harmful. > > 6. There is no completion processing, so the read-writeable > cpu affinity is not used: > /sys/block/pmem0/queue/rq_affinity:0 > > Consider making it read-only and set to 2, meaning the > completions always run on the requesting CPU. There are no completions with pmem, the entire I/O path is synchronous. Ideally, this attribute would disappear for a pmem queue, not be set to 2. > 7. With mmap() allowing less than logical block sized accesses > to the device, this could be considered misleading: > /sys/block/pmem0/queue/physical_block_size:512 I don't see how it is misleading. If you access it as a block device the block size is 512. If the application is mmap() + DAX aware it knows that the physical_block_size is being bypassed. > > Perhaps that needs to be 1 byte or a cacheline size (64 bytes > on x86) to indicate that direct partial logical block accesses > are possible. No, because that breaks the definition of a block device. Through the bdev interface it's always accessed a block at a time. > The btt driver could report 512 as one indication > it is different. > > I wouldn't be surprised if smaller values than the logical block > size confused some software, though. Precisely why we shouldn't go there with pmem. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
> -Original Message- > From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of > Dan Williams > Sent: Tuesday, April 28, 2015 1:24 PM > To: linux-nvd...@lists.01.org > Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J. > Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe; > Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org; > Andy Lutomirski; Andrew Morton; Linus Torvalds > Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device > support > > Changes since v1 [1]: Incorporates feedback received prior to April 24. Here are some comments on the sysfs properties reported for a pmem device. They are based on v1, but I don't think v2 changes anything. 1. This confuses lsblk (part of util-linux): /sys/block/pmem0/device/type:4 lsblk shows: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT pmem0 251:00 8G 0 worm pmem1 251:16 0 8G 0 worm pmem2 251:32 0 8G 0 worm pmem3 251:48 0 8G 0 worm pmem4 251:64 0 8G 0 worm pmem5 251:80 0 8G 0 worm pmem6 251:96 0 8G 0 worm pmem7 251:112 0 8G 0 worm lsblk's blkdev_scsi_type_to_name() considers 4 to mean SCSI_TYPE_WORM (write once read many ... used for certain optical and tape drives). I'm not sure what nd and pmem are doing to result in that value. 2. To avoid confusing software trying to detect fast storage vs. slow storage devices via sysfs, this value should be 0: /sys/block/pmem0/queue/rotational:1 That can be done by adding this shortly after the blk_alloc_queue call: queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue); 3. Is there any reason to have a 512 KiB limit on the transfer length? /sys/block/pmem0/queue/max_hw_sectors_kb:512 That is from: blk_queue_max_hw_sectors(pmem->pmem_queue, 1024); 4. These are read-writeable, but IOs never reach a queue, so the queue size is irrelevant and merging never happens: /sys/block/pmem0/queue/nomerges:0 /sys/block/pmem0/queue/nr_requests:128 Consider making them both read-only with: * nomerges set to 2 (no merging happening) * nr_requests as small as the block layer allows to avoid wasting memory. 5. No scatter-gather lists are created by the driver, so these read-only fields are meaningless: /sys/block/pmem0/queue/max_segments:128 /sys/block/pmem0/queue/max_segment_size:65536 Is there a better way to report them as irrelevant? 6. There is no completion processing, so the read-writeable cpu affinity is not used: /sys/block/pmem0/queue/rq_affinity:0 Consider making it read-only and set to 2, meaning the completions always run on the requesting CPU. 7. With mmap() allowing less than logical block sized accesses to the device, this could be considered misleading: /sys/block/pmem0/queue/physical_block_size:512 Perhaps that needs to be 1 byte or a cacheline size (64 bytes on x86) to indicate that direct partial logical block accesses are possible. The btt driver could report 512 as one indication it is different. I wouldn't be surprised if smaller values than the logical block size confused some software, though. --- Robert Elliott, HP Server Storage -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
-Original Message- From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of Dan Williams Sent: Tuesday, April 28, 2015 1:24 PM To: linux-nvd...@lists.01.org Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J. Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe; Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org; Andy Lutomirski; Andrew Morton; Linus Torvalds Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support Changes since v1 [1]: Incorporates feedback received prior to April 24. Here are some comments on the sysfs properties reported for a pmem device. They are based on v1, but I don't think v2 changes anything. 1. This confuses lsblk (part of util-linux): /sys/block/pmem0/device/type:4 lsblk shows: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT pmem0 251:00 8G 0 worm pmem1 251:16 0 8G 0 worm pmem2 251:32 0 8G 0 worm pmem3 251:48 0 8G 0 worm pmem4 251:64 0 8G 0 worm pmem5 251:80 0 8G 0 worm pmem6 251:96 0 8G 0 worm pmem7 251:112 0 8G 0 worm lsblk's blkdev_scsi_type_to_name() considers 4 to mean SCSI_TYPE_WORM (write once read many ... used for certain optical and tape drives). I'm not sure what nd and pmem are doing to result in that value. 2. To avoid confusing software trying to detect fast storage vs. slow storage devices via sysfs, this value should be 0: /sys/block/pmem0/queue/rotational:1 That can be done by adding this shortly after the blk_alloc_queue call: queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem-pmem_queue); 3. Is there any reason to have a 512 KiB limit on the transfer length? /sys/block/pmem0/queue/max_hw_sectors_kb:512 That is from: blk_queue_max_hw_sectors(pmem-pmem_queue, 1024); 4. These are read-writeable, but IOs never reach a queue, so the queue size is irrelevant and merging never happens: /sys/block/pmem0/queue/nomerges:0 /sys/block/pmem0/queue/nr_requests:128 Consider making them both read-only with: * nomerges set to 2 (no merging happening) * nr_requests as small as the block layer allows to avoid wasting memory. 5. No scatter-gather lists are created by the driver, so these read-only fields are meaningless: /sys/block/pmem0/queue/max_segments:128 /sys/block/pmem0/queue/max_segment_size:65536 Is there a better way to report them as irrelevant? 6. There is no completion processing, so the read-writeable cpu affinity is not used: /sys/block/pmem0/queue/rq_affinity:0 Consider making it read-only and set to 2, meaning the completions always run on the requesting CPU. 7. With mmap() allowing less than logical block sized accesses to the device, this could be considered misleading: /sys/block/pmem0/queue/physical_block_size:512 Perhaps that needs to be 1 byte or a cacheline size (64 bytes on x86) to indicate that direct partial logical block accesses are possible. The btt driver could report 512 as one indication it is different. I wouldn't be surprised if smaller values than the logical block size confused some software, though. --- Robert Elliott, HP Server Storage -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support
On Tue, Apr 28, 2015 at 2:24 PM, Elliott, Robert (Server Storage) elli...@hp.com wrote: -Original Message- From: Linux-nvdimm [mailto:linux-nvdimm-boun...@lists.01.org] On Behalf Of Dan Williams Sent: Tuesday, April 28, 2015 1:24 PM To: linux-nvd...@lists.01.org Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J. Wysocki; Robert Moore; Ingo Molnar; linux-a...@vger.kernel.org; Jens Axboe; Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@vger.kernel.org; Andy Lutomirski; Andrew Morton; Linus Torvalds Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device support Changes since v1 [1]: Incorporates feedback received prior to April 24. Here are some comments on the sysfs properties reported for a pmem device. They are based on v1, but I don't think v2 changes anything. 1. This confuses lsblk (part of util-linux): /sys/block/pmem0/device/type:4 lsblk shows: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT pmem0 251:00 8G 0 worm pmem1 251:16 0 8G 0 worm pmem2 251:32 0 8G 0 worm pmem3 251:48 0 8G 0 worm pmem4 251:64 0 8G 0 worm pmem5 251:80 0 8G 0 worm pmem6 251:96 0 8G 0 worm pmem7 251:112 0 8G 0 worm lsblk's blkdev_scsi_type_to_name() considers 4 to mean SCSI_TYPE_WORM (write once read many ... used for certain optical and tape drives). Why is lsblk assuming these are scsi devices? I'll need to go check that out. I'm not sure what nd and pmem are doing to result in that value. That is their libnd specific device type number from include/uapi/ndctl.h. 4 == ND_DEVICE_NAMESPACE_IO. lsblk has no business interpreting this as something SCSI specific. 2. To avoid confusing software trying to detect fast storage vs. slow storage devices via sysfs, this value should be 0: /sys/block/pmem0/queue/rotational:1 That can be done by adding this shortly after the blk_alloc_queue call: queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem-pmem_queue); Yeah, good catch. 3. Is there any reason to have a 512 KiB limit on the transfer length? /sys/block/pmem0/queue/max_hw_sectors_kb:512 That is from: blk_queue_max_hw_sectors(pmem-pmem_queue, 1024); I'd only change this from the default if performance testing showed it made a non-trivial difference. 4. These are read-writeable, but IOs never reach a queue, so the queue size is irrelevant and merging never happens: /sys/block/pmem0/queue/nomerges:0 /sys/block/pmem0/queue/nr_requests:128 Consider making them both read-only with: * nomerges set to 2 (no merging happening) * nr_requests as small as the block layer allows to avoid wasting memory. 5. No scatter-gather lists are created by the driver, so these read-only fields are meaningless: /sys/block/pmem0/queue/max_segments:128 /sys/block/pmem0/queue/max_segment_size:65536 Is there a better way to report them as irrelevant? Again it comes back to the question of whether these default settings are actively harmful. 6. There is no completion processing, so the read-writeable cpu affinity is not used: /sys/block/pmem0/queue/rq_affinity:0 Consider making it read-only and set to 2, meaning the completions always run on the requesting CPU. There are no completions with pmem, the entire I/O path is synchronous. Ideally, this attribute would disappear for a pmem queue, not be set to 2. 7. With mmap() allowing less than logical block sized accesses to the device, this could be considered misleading: /sys/block/pmem0/queue/physical_block_size:512 I don't see how it is misleading. If you access it as a block device the block size is 512. If the application is mmap() + DAX aware it knows that the physical_block_size is being bypassed. Perhaps that needs to be 1 byte or a cacheline size (64 bytes on x86) to indicate that direct partial logical block accesses are possible. No, because that breaks the definition of a block device. Through the bdev interface it's always accessed a block at a time. The btt driver could report 512 as one indication it is different. I wouldn't be surprised if smaller values than the logical block size confused some software, though. Precisely why we shouldn't go there with pmem. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/