Re: [PATCH] device-dax: fail all private mapping attempts
On Wed, 16 Nov 2016, Dan Williams wrote:
> The device-dax implementation originally tried to be tricky and allow
> private read-only mappings, but in the process allowed writable
> MAP_PRIVATE + MAP_NORESERVE mappings. For simplicity and predictability
> just fail all private mapping attempts since device-dax memory is
> statically allocated and will never support overcommit.
>
> Cc:
> Cc: Dave Hansen
> Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
> Reported-by: Pawel Lebioda
> Signed-off-by: Dan Williams
> ---
>  drivers/dax/dax.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
> index 0e499bfca41c..3d94ff20fdca 100644
> --- a/drivers/dax/dax.c
> +++ b/drivers/dax/dax.c
> @@ -270,8 +270,8 @@ static int check_vma(struct dax_dev *dax_dev, struct vm_area_struct *vma,
>  	if (!dax_dev->alive)
>  		return -ENXIO;
>
> -	/* prevent private / writable mappings from being established */
> -	if ((vma->vm_flags & (VM_NORESERVE|VM_SHARED|VM_WRITE)) == VM_WRITE) {
> +	/* prevent private mappings from being established */
> +	if ((vma->vm_flags & VM_SHARED) != VM_SHARED) {

I think that is more restrictive than you intended: haven't tried, but I
believe it rejects a PROT_READ, MAP_SHARED, O_RDONLY fd mmap, leaving no
way to mmap /dev/dax without write permission to it.

See line 1393 of mm/mmap.c: the test you want is probably

	if (!(vma->vm_flags & VM_MAYSHARE))

Hugh

>  		dev_info(dev, "%s: %s: fail, attempted private mapping\n",
>  			current->comm, func);
>  		return -EINVAL;
>
> --
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
[PATCH v2] nfit: Fix extended status translations for ACPI DSMs
ACPI DSMs can have an 'extended' status which can be non-zero to convey
additional information about the command. In the xlat_status routine, where
we translate the command statuses, we were returning an error for a
non-zero extended status, even if the primary status indicated success.
Return from each command's 'case' once we have verified both its status and
extended status are good.

Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 drivers/acpi/nfit/core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 71a7d07..60acbb1 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -113,7 +113,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
 		flags = ND_ARS_PERSISTENT | ND_ARS_VOLATILE;
 		if ((status >> 16 & flags) == 0)
 			return -ENOTTY;
-		break;
+		return 0;
 	case ND_CMD_ARS_START:
 		/* ARS is in progress */
 		if ((status & 0xffff) == NFIT_ARS_START_BUSY)
@@ -122,7 +122,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
 		/* Command failed */
 		if (status & 0xffff)
 			return -EIO;
-		break;
+		return 0;
 	case ND_CMD_ARS_STATUS:
 		ars_status = buf;
 		/* Command failed */
@@ -154,7 +154,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
 		/* Unknown status */
 		if (status >> 16)
 			return -EIO;
-		break;
+		return 0;
 	case ND_CMD_CLEAR_ERROR:
 		clear_err = buf;
 		if (status & 0xffff)
@@ -163,7 +163,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
 			return -EIO;
 		if (clear_err->length > clear_err->cleared)
 			return clear_err->cleared;
-		break;
+		return 0;
 	default:
 		break;
 	}
--
2.7.4
Re: [PATCH] nfit: Fix extended status translations for ACPI DSMs
On 12/5/2016 5:16 PM, Dan Williams wrote: > On Mon, Dec 5, 2016 at 1:54 PM, Linda Knippers wrote: >> On 12/05/2016 04:43 PM, Verma, Vishal L wrote: >>> On Mon, 2016-12-05 at 13:37 -0800, Dan Williams wrote: On Mon, Dec 5, 2016 at 1:27 PM, Vishal Verma wrote: > > ACPI DSMs can have an 'extended' status which can be non-zero to > convey > additional information about the command. In the xlat_status > routine, > where we translate the command statuses, we were returning an error > for > a non-zero extended status, even if the primary status indicated > success. > > Cc: Dan Williams > Signed-off-by: Vishal Verma > --- > drivers/acpi/nfit/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c > index 71a7d07..d14f09b 100644 > --- a/drivers/acpi/nfit/core.c > +++ b/drivers/acpi/nfit/core.c > @@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int > cmd, u32 status) > } > > /* all other non-zero status results in an error */ > - if (status) > + if (status & 0xffff) > return -EIO; I don't think this is right, because we have no idea at this point whether extended status is fatal or not. Each 'case' statement in that 'switch' should be returning 0 if it does not see any errors. Because that's the only part of the function with per-command knowledge of extended being benign / informational vs fatal. >>> >>> Good point - I was wondering just that.. I'll resend. >> >> But can't that function be called with the status for DSMs that aren't in >> switch >> statement? >> > > Yes, but keep in mind the only consumer of that "cmd_rc" result is > in-kernel callers. > >> All the DSM specs I've seen separate the status into status and extended or >> function-specific >> status, which is either defined or reserved. If the 2 bytes of status don't >> indicate >> a failure, I don't think you should return EIO just because there may be >> something set in a reserved bit.
> > The kernel will only consume that status for ARS and label commands. > In the case of ND_CMD_CALL, and other DSMs that the kernel never > consumes internally, the translation to -EIO is benign. Actually, it looks like -EIO won't be returned because fw_status is still 0 when xlat_status is called so there's nothing to translate. Am I reading that right? If so, you could probably avoid the call. -- ljk
Re: [PATCH] nfit: Fix extended status translations for ACPI DSMs
On Mon, Dec 5, 2016 at 1:54 PM, Linda Knippers wrote: > On 12/05/2016 04:43 PM, Verma, Vishal L wrote: >> On Mon, 2016-12-05 at 13:37 -0800, Dan Williams wrote: >>> On Mon, Dec 5, 2016 at 1:27 PM, Vishal Verma >>> wrote: ACPI DSMs can have an 'extended' status which can be non-zero to convey additional information about the command. In the xlat_status routine, where we translate the command statuses, we were returning an error for a non-zero extended status, even if the primary status indicated success. Cc: Dan Williams Signed-off-by: Vishal Verma --- drivers/acpi/nfit/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index 71a7d07..d14f09b 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status) } /* all other non-zero status results in an error */ - if (status) + if (status & 0xffff) return -EIO; >>> >>> I don't think this is right, because we have no idea at this point >>> whether extended status is fatal or not. >>> >>> Each 'case' statement in that 'switch' should be returning 0 if it >>> does not see any errors. Because that's the only part of the function >>> with per-command knowledge of extended being benign / informational vs >>> fatal. >> >> Good point - I was wondering just that.. I'll resend. > > But can't that function be called with the status for DSMs that aren't in > switch > statement? > Yes, but keep in mind the only consumer of that "cmd_rc" result is in-kernel callers. > All the DSM specs I've seen separate the status into status and extended or > function-specific > status, which is either defined or reserved. If the 2 bytes of status don't > indicate > a failure, I don't think you should return EIO just because there may be > something set in a reserved bit. The kernel will only consume that status for ARS and label commands.
In the case of ND_CMD_CALL, and other DSMs that the kernel never consumes internally, the translation to -EIO is benign.
Re: [PATCH] nfit: Fix extended status translations for ACPI DSMs
On 12/05/2016 04:43 PM, Verma, Vishal L wrote:
> On Mon, 2016-12-05 at 13:37 -0800, Dan Williams wrote:
>> On Mon, Dec 5, 2016 at 1:27 PM, Vishal Verma wrote:
>>>
>>> ACPI DSMs can have an 'extended' status which can be non-zero to
>>> convey additional information about the command. In the xlat_status
>>> routine, where we translate the command statuses, we were returning
>>> an error for a non-zero extended status, even if the primary status
>>> indicated success.
>>>
>>> Cc: Dan Williams
>>> Signed-off-by: Vishal Verma
>>> ---
>>>  drivers/acpi/nfit/core.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
>>> index 71a7d07..d14f09b 100644
>>> --- a/drivers/acpi/nfit/core.c
>>> +++ b/drivers/acpi/nfit/core.c
>>> @@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
>>> 	}
>>>
>>> 	/* all other non-zero status results in an error */
>>> -	if (status)
>>> +	if (status & 0xffff)
>>> 		return -EIO;
>>
>> I don't think this is right, because we have no idea at this point
>> whether extended status is fatal or not.
>>
>> Each 'case' statement in that 'switch' should be returning 0 if it
>> does not see any errors. Because that's the only part of the function
>> with per-command knowledge of extended being benign / informational vs
>> fatal.
>
> Good point - I was wondering just that.. I'll resend.

But can't that function be called with the status for DSMs that aren't in
the switch statement?

All the DSM specs I've seen separate the status into status and extended
or function-specific status, which is either defined or reserved. If the
2 bytes of status don't indicate a failure, I don't think you should
return EIO just because there may be something set in a reserved bit.
-- ljk
Re: [PATCH] nfit: Fix extended status translations for ACPI DSMs
On Mon, 2016-12-05 at 13:37 -0800, Dan Williams wrote:
> On Mon, Dec 5, 2016 at 1:27 PM, Vishal Verma wrote:
> >
> > ACPI DSMs can have an 'extended' status which can be non-zero to
> > convey additional information about the command. In the xlat_status
> > routine, where we translate the command statuses, we were returning
> > an error for a non-zero extended status, even if the primary status
> > indicated success.
> >
> > Cc: Dan Williams
> > Signed-off-by: Vishal Verma
> > ---
> >  drivers/acpi/nfit/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> > index 71a7d07..d14f09b 100644
> > --- a/drivers/acpi/nfit/core.c
> > +++ b/drivers/acpi/nfit/core.c
> > @@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
> > 	}
> >
> > 	/* all other non-zero status results in an error */
> > -	if (status)
> > +	if (status & 0xffff)
> > 		return -EIO;
>
> I don't think this is right, because we have no idea at this point
> whether extended status is fatal or not.
>
> Each 'case' statement in that 'switch' should be returning 0 if it
> does not see any errors. Because that's the only part of the function
> with per-command knowledge of extended being benign / informational vs
> fatal.

Good point - I was wondering just that.. I'll resend.
Re: [PATCH] nfit: Fix extended status translations for ACPI DSMs
On Mon, Dec 5, 2016 at 1:27 PM, Vishal Verma wrote:
> ACPI DSMs can have an 'extended' status which can be non-zero to convey
> additional information about the command. In the xlat_status routine,
> where we translate the command statuses, we were returning an error for
> a non-zero extended status, even if the primary status indicated success.
>
> Cc: Dan Williams
> Signed-off-by: Vishal Verma
> ---
>  drivers/acpi/nfit/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 71a7d07..d14f09b 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
> 	}
>
> 	/* all other non-zero status results in an error */
> -	if (status)
> +	if (status & 0xffff)
> 		return -EIO;

I don't think this is right, because we have no idea at this point
whether extended status is fatal or not.

Each 'case' statement in that 'switch' should be returning 0 if it
does not see any errors. Because that's the only part of the function
with per-command knowledge of extended being benign / informational vs
fatal.
[PATCH] nfit: Fix extended status translations for ACPI DSMs
ACPI DSMs can have an 'extended' status which can be non-zero to convey
additional information about the command. In the xlat_status routine, where
we translate the command statuses, we were returning an error for a
non-zero extended status, even if the primary status indicated success.

Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 drivers/acpi/nfit/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 71a7d07..d14f09b 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
 	}

 	/* all other non-zero status results in an error */
-	if (status)
+	if (status & 0xffff)
 		return -EIO;
 	return 0;
 }
--
2.7.4
Re: Enabling peer to peer device transactions for PCIe devices
On Mon, Dec 05, 2016 at 12:46:14PM -0700, Jason Gunthorpe wrote:
> In any event the allocator still needs to track which regions are in
> use and be able to hook 'free' from userspace. That does suggest it
> should be integrated into the nvme driver and not a bolt on driver..

Two totally different use cases:

 - a card that exposes directly byte addressable storage as a PCI-e bar.
   Think of it as an nvdimm on a PCI-e card. That's the iopmem case.

 - the NVMe CMB, which exposes a byte addressable indirection buffer for
   I/O, but does not actually provide byte addressable persistent storage.
   This is something that needs to be added to the NVMe driver (and the
   block layer for the abstraction, probably).
Re: Enabling peer to peer device transactions for PCIe devices
On 05/12/16 12:46 PM, Jason Gunthorpe wrote:
> NVMe might have to deal with pci-e hot-unplug, which is a similar
> problem-class to the GPU case..

Sure, but if the NVMe device gets hot-unplugged it means that all the CMB
mappings are useless and need to be torn down. This probably means killing
any process that has mappings open.

> In any event the allocator still needs to track which regions are in
> use and be able to hook 'free' from userspace. That does suggest it
> should be integrated into the nvme driver and not a bolt on driver..

Yup, that's correct. And yes, I've never suggested this to be a bolt on
driver -- I always expected for it to get integrated into the nvme driver.
(iopmem was not meant for this.)

Logan
Re: Enabling peer to peer device transactions for PCIe devices
On Mon, Dec 05, 2016 at 12:27:20PM -0700, Logan Gunthorpe wrote: > > > On 05/12/16 12:14 PM, Jason Gunthorpe wrote: > >But CMB sounds much more like the GPU case where there is a > >specialized allocator handing out the BAR to consumers, so I'm not > >sure a general purpose chardev makes a lot of sense? > > I don't think it will ever need to be as complicated as the GPU case. There > will probably only ever be a relatively small amount of memory behind the > CMB and really the only users are those doing P2P work. Thus the specialized > allocator could be pretty simple and I expect it would be fine to just > return -ENOMEM if there is not enough memory. NVMe might have to deal with pci-e hot-unplug, which is a similar problem-class to the GPU case.. In any event the allocator still needs to track which regions are in use and be able to hook 'free' from userspace. That does suggest it should be integrated into the nvme driver and not a bolt on driver.. Jason
Re: Enabling peer to peer device transactions for PCIe devices
On 05/12/16 12:14 PM, Jason Gunthorpe wrote:
> But CMB sounds much more like the GPU case where there is a
> specialized allocator handing out the BAR to consumers, so I'm not
> sure a general purpose chardev makes a lot of sense?

I don't think it will ever need to be as complicated as the GPU case. There
will probably only ever be a relatively small amount of memory behind the
CMB and really the only users are those doing P2P work. Thus the
specialized allocator could be pretty simple and I expect it would be fine
to just return -ENOMEM if there is not enough memory.

Also, if it was implemented this way, if there was a need to make the
allocator more complicated it could easily be added later as the userspace
interface is just mmap to obtain a buffer.

Logan
Re: Enabling peer to peer device transactions for PCIe devices
On Mon, Dec 05, 2016 at 10:48:58AM -0800, Dan Williams wrote: > On Mon, Dec 5, 2016 at 10:39 AM, Logan Gunthorpe wrote: > > On 05/12/16 11:08 AM, Dan Williams wrote: > >> > >> I've already recommended that iopmem not be a block device and instead > >> be a device-dax instance. I also don't think it should claim the PCI > >> ID, rather the driver that wants to map one of its bars this way can > >> register the memory region with the device-dax core. > >> > >> I'm not sure there are enough device drivers that want to do this to > >> have it be a generic /sys/.../resource_dmableX capability. It still > >> seems to be an exotic one-off type of configuration. > > > > > > Yes, this is essentially my thinking. Except I think the userspace interface > > should really depend on the device itself. Device dax is a good choice for > > many and I agree the block device approach wouldn't be ideal. > > > > Specifically for NVME CMB: I think it would make a lot of sense to just hand > > out these mappings with an mmap call on /dev/nvmeX. I expect CMB buffers > > would be volatile and thus you wouldn't need to keep track of where in the > > BAR the region came from. Thus, the mmap call would just be an allocator > > from BAR memory. If device-dax were used, userspace would need to lookup > > which device-dax instance corresponds to which nvme drive. > > I'm not opposed to mapping /dev/nvmeX. However, the lookup is trivial > to accomplish in sysfs through /sys/dev/char to find the sysfs path > of But CMB sounds much more like the GPU case where there is a specialized allocator handing out the BAR to consumers, so I'm not sure a general purpose chardev makes a lot of sense? Jason
Re: Enabling peer to peer device transactions for PCIe devices
On Mon, Dec 5, 2016 at 10:39 AM, Logan Gunthorpe wrote: > On 05/12/16 11:08 AM, Dan Williams wrote: >> >> I've already recommended that iopmem not be a block device and instead >> be a device-dax instance. I also don't think it should claim the PCI >> ID, rather the driver that wants to map one of its bars this way can >> register the memory region with the device-dax core. >> >> I'm not sure there are enough device drivers that want to do this to >> have it be a generic /sys/.../resource_dmableX capability. It still >> seems to be an exotic one-off type of configuration. > > > Yes, this is essentially my thinking. Except I think the userspace interface > should really depend on the device itself. Device dax is a good choice for > many and I agree the block device approach wouldn't be ideal. > > Specifically for NVME CMB: I think it would make a lot of sense to just hand > out these mappings with an mmap call on /dev/nvmeX. I expect CMB buffers > would be volatile and thus you wouldn't need to keep track of where in the > BAR the region came from. Thus, the mmap call would just be an allocator > from BAR memory. If device-dax were used, userspace would need to lookup > which device-dax instance corresponds to which nvme drive. > I'm not opposed to mapping /dev/nvmeX. However, the lookup is trivial to accomplish in sysfs through /sys/dev/char to find the sysfs path of the device-dax instance under the nvme device, or if you already have the nvme sysfs path the dax instance(s) will appear under the "dax" sub-directory.
Re: Enabling peer to peer device transactions for PCIe devices
On 05/12/16 11:08 AM, Dan Williams wrote:
> I've already recommended that iopmem not be a block device and instead
> be a device-dax instance. I also don't think it should claim the PCI
> ID, rather the driver that wants to map one of its bars this way can
> register the memory region with the device-dax core.
>
> I'm not sure there are enough device drivers that want to do this to
> have it be a generic /sys/.../resource_dmableX capability. It still
> seems to be an exotic one-off type of configuration.

Yes, this is essentially my thinking. Except I think the userspace
interface should really depend on the device itself. Device dax is a good
choice for many and I agree the block device approach wouldn't be ideal.

Specifically for NVME CMB: I think it would make a lot of sense to just
hand out these mappings with an mmap call on /dev/nvmeX. I expect CMB
buffers would be volatile and thus you wouldn't need to keep track of
where in the BAR the region came from. Thus, the mmap call would just be
an allocator from BAR memory. If device-dax were used, userspace would
need to lookup which device-dax instance corresponds to which nvme drive.

Logan
Re: Enabling peer to peer device transactions for PCIe devices
On Mon, Dec 5, 2016 at 10:02 AM, Jason Gunthorpe wrote: > On Mon, Dec 05, 2016 at 09:40:38AM -0800, Dan Williams wrote: > >> > If it is kernel only with physical addresess we don't need a uAPI for >> > it, so I'm not sure #1 is at all related to iopmem. >> > >> > Most people who want #1 probably can just mmap >> > /sys/../pci/../resourceX to get a user handle to it, or pass around >> > __iomem pointers in the kernel. This has been asked for before with >> > RDMA. >> > >> > I'm still not really clear what iopmem is for, or why DAX should ever >> > be involved in this.. >> >> Right, by default remap_pfn_range() does not establish DMA capable >> mappings. You can think of iopmem as remap_pfn_range() converted to >> use devm_memremap_pages(). Given the extra constraints of >> devm_memremap_pages() it seems reasonable to have those DMA capable >> mappings be optionally established via a separate driver. > > Except the iopmem driver claims the PCI ID, and presents a block > interface which is really *NOT* what people who have asked for this in > the past have wanted. IIRC it was embedded stuff eg RDMA video > directly out of a capture card or a similar kind of thinking. > > It is a good point about devm_memremap_pages limitations, but maybe > that just says to create a /sys/.../resource_dmableX ? > > Or is there some reason why people want a filesystem on top of BAR > memory? That does not seem to have been covered yet.. > I've already recommended that iopmem not be a block device and instead be a device-dax instance. I also don't think it should claim the PCI ID, rather the driver that wants to map one of its bars this way can register the memory region with the device-dax core. I'm not sure there are enough device drivers that want to do this to have it be a generic /sys/.../resource_dmableX capability. It still seems to be an exotic one-off type of configuration.
Re: Enabling peer to peer device transactions for PCIe devices
On Mon, Dec 05, 2016 at 09:40:38AM -0800, Dan Williams wrote: > > If it is kernel only with physical addresess we don't need a uAPI for > > it, so I'm not sure #1 is at all related to iopmem. > > > > Most people who want #1 probably can just mmap > > /sys/../pci/../resourceX to get a user handle to it, or pass around > > __iomem pointers in the kernel. This has been asked for before with > > RDMA. > > > > I'm still not really clear what iopmem is for, or why DAX should ever > > be involved in this.. > > Right, by default remap_pfn_range() does not establish DMA capable > mappings. You can think of iopmem as remap_pfn_range() converted to > use devm_memremap_pages(). Given the extra constraints of > devm_memremap_pages() it seems reasonable to have those DMA capable > mappings be optionally established via a separate driver. Except the iopmem driver claims the PCI ID, and presents a block interface which is really *NOT* what people who have asked for this in the past have wanted. IIRC it was embedded stuff eg RDMA video directly out of a capture card or a similar kind of thinking. It is a good point about devm_memremap_pages limitations, but maybe that just says to create a /sys/.../resource_dmableX ? Or is there some reason why people want a filesystem on top of BAR memory? That does not seem to have been covered yet.. Jason
Re: Enabling peer to peer device transactions for PCIe devices
On Mon, Dec 5, 2016 at 9:18 AM, Jason Gunthorpe wrote: > On Sun, Dec 04, 2016 at 07:23:00AM -0600, Stephen Bates wrote: >> Hi All >> >> This has been a great thread (thanks to Alex for kicking it off) and I >> wanted to jump in and maybe try and put some summary around the >> discussion. I also wanted to propose we include this as a topic for LFS/MM >> because I think we need more discussion on the best way to add this >> functionality to the kernel. >> >> As far as I can tell the people looking for P2P support in the kernel fall >> into two main camps: >> >> 1. Those who simply want to expose static BARs on PCIe devices that can be >> used as the source/destination for DMAs from another PCIe device. This >> group has no need for memory invalidation and are happy to use >> physical/bus addresses and not virtual addresses. > > I didn't think there was much on this topic except for the CMB > thing.. Even that is really a mapped kernel address.. > >> I think something like the iopmem patches Logan and I submitted recently >> come close to addressing use case 1. There are some issues around >> routability but based on feedback to date that does not seem to be a >> show-stopper for an initial inclusion. > > If it is kernel only with physical addresess we don't need a uAPI for > it, so I'm not sure #1 is at all related to iopmem. > > Most people who want #1 probably can just mmap > /sys/../pci/../resourceX to get a user handle to it, or pass around > __iomem pointers in the kernel. This has been asked for before with > RDMA. > > I'm still not really clear what iopmem is for, or why DAX should ever > be involved in this.. Right, by default remap_pfn_range() does not establish DMA capable mappings. You can think of iopmem as remap_pfn_range() converted to use devm_memremap_pages(). Given the extra constraints of devm_memremap_pages() it seems reasonable to have those DMA capable mappings be optionally established via a separate driver. 
Re: Enabling peer to peer device transactions for PCIe devices
On Sun, Dec 04, 2016 at 07:23:00AM -0600, Stephen Bates wrote: > Hi All > > This has been a great thread (thanks to Alex for kicking it off) and I > wanted to jump in and maybe try and put some summary around the > discussion. I also wanted to propose we include this as a topic for LFS/MM > because I think we need more discussion on the best way to add this > functionality to the kernel. > > As far as I can tell the people looking for P2P support in the kernel fall > into two main camps: > > 1. Those who simply want to expose static BARs on PCIe devices that can be > used as the source/destination for DMAs from another PCIe device. This > group has no need for memory invalidation and are happy to use > physical/bus addresses and not virtual addresses. I didn't think there was much on this topic except for the CMB thing.. Even that is really a mapped kernel address.. > I think something like the iopmem patches Logan and I submitted recently > come close to addressing use case 1. There are some issues around > routability but based on feedback to date that does not seem to be a > show-stopper for an initial inclusion. If it is kernel only with physical addresess we don't need a uAPI for it, so I'm not sure #1 is at all related to iopmem. Most people who want #1 probably can just mmap /sys/../pci/../resourceX to get a user handle to it, or pass around __iomem pointers in the kernel. This has been asked for before with RDMA. I'm still not really clear what iopmem is for, or why DAX should ever be involved in this.. > For use-case 2 it looks like there are several options and some of them > (like HMM) have been around for quite some time without gaining > acceptance. I think there needs to be more discussion on this usecase and > it could be some time before we get something upstreamable. AFAIK, hmm makes parts easier, but isn't directly addressing this need.. I think you need to get ZONE_DEVICE accepted for non-cachable PCI BARs as the first step. 
From there it is pretty clear the DMA API needs to be updated to support that use, and work can be done to solve the various problems there on the basis of using ZONE_DEVICE pages to resolve the PCI-E end points. Jason
Re: [PATCH] e820: use module_platform_driver
On Mon, Dec 5, 2016 at 12:23 AM, Johannes Thumshirn wrote: > Use module_platform_driver for the e820 driver instead of open-coding it. > > Signed-off-by: Johannes Thumshirn Applied, thanks Johannes.
[PATCH] e820: use module_platform_driver
Use module_platform_driver for the e820 driver instead of open-coding it.

Signed-off-by: Johannes Thumshirn
---
 drivers/nvdimm/e820.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c
index 11ea901..6f9a6ff 100644
--- a/drivers/nvdimm/e820.c
+++ b/drivers/nvdimm/e820.c
@@ -84,18 +84,8 @@ static struct platform_driver e820_pmem_driver = {
 	},
 };

-static __init int e820_pmem_init(void)
-{
-	return platform_driver_register(&e820_pmem_driver);
-}
-
-static __exit void e820_pmem_exit(void)
-{
-	platform_driver_unregister(&e820_pmem_driver);
-}
+module_platform_driver(e820_pmem_driver);

 MODULE_ALIAS("platform:e820_pmem*");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Intel Corporation");
-module_init(e820_pmem_init);
-module_exit(e820_pmem_exit);
--
2.10.2