Re: [PATCH] device-dax: fail all private mapping attempts

2016-12-05 Thread Hugh Dickins
On Wed, 16 Nov 2016, Dan Williams wrote:

> The device-dax implementation originally tried to be tricky and allow
> private read-only mappings, but in the process allowed writable
> MAP_PRIVATE + MAP_NORESERVE mappings.  For simplicity and predictability
> just fail all private mapping attempts since device-dax memory is
> statically allocated and will never support overcommit.
> 
> Cc: 
> Cc: Dave Hansen 
> Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
> Reported-by: Pawel Lebioda 
> Signed-off-by: Dan Williams 
> ---
>  drivers/dax/dax.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c
> index 0e499bfca41c..3d94ff20fdca 100644
> --- a/drivers/dax/dax.c
> +++ b/drivers/dax/dax.c
> @@ -270,8 +270,8 @@ static int check_vma(struct dax_dev *dax_dev, struct vm_area_struct *vma,
>   if (!dax_dev->alive)
>   return -ENXIO;
>  
> - /* prevent private / writable mappings from being established */
> - if ((vma->vm_flags & (VM_NORESERVE|VM_SHARED|VM_WRITE)) == VM_WRITE) {
> + /* prevent private mappings from being established */
> + if ((vma->vm_flags & VM_SHARED) != VM_SHARED) {

I think that is more restrictive than you intended: haven't tried,
but I believe it rejects a PROT_READ, MAP_SHARED, O_RDONLY fd mmap,
leaving no way to mmap /dev/dax without write permission to it.
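
For illustration, a minimal userspace sketch of the case described above,
assuming a /dev/dax0.0 node the caller can only read (path and length are
illustrative; device-dax also enforces its own alignment):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 2UL << 20;	/* assume a 2M-aligned device-dax instance */
	int fd = open("/dev/dax0.0", O_RDONLY);
	void *p;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Read-only shared mapping: no write access requested, but the
	 * VM_SHARED-only test above still fails it with EINVAL. */
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		perror("mmap");
	else
		munmap(p, len);
	close(fd);
	return 0;
}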

See line 1393 of mm/mmap.c: the test you want is probably
if (!(vma->vm_flags & VM_MAYSHARE))

Hugh

>   dev_info(dev, "%s: %s: fail, attempted private mapping\n",
>   current->comm, func);
>   return -EINVAL;
> 
> --


[PATCH v2] nfit: Fix extended status translations for ACPI DSMs

2016-12-05 Thread Vishal Verma
ACPI DSMs can have an 'extended' status which can be non-zero to convey
additional information about the command. In the xlat_status routine,
where we translate the command statuses, we were returning an error for
a non-zero extended status, even if the primary status indicated success.

Return from each command's 'case' once we have verified both its status
and extended status are good.

Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 drivers/acpi/nfit/core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 71a7d07..60acbb1 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -113,7 +113,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
flags = ND_ARS_PERSISTENT | ND_ARS_VOLATILE;
if ((status >> 16 & flags) == 0)
return -ENOTTY;
-   break;
+   return 0;
case ND_CMD_ARS_START:
/* ARS is in progress */
if ((status & 0xffff) == NFIT_ARS_START_BUSY)
@@ -122,7 +122,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
/* Command failed */
if (status & 0xffff)
return -EIO;
-   break;
+   return 0;
case ND_CMD_ARS_STATUS:
ars_status = buf;
/* Command failed */
@@ -154,7 +154,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
/* Unknown status */
if (status >> 16)
return -EIO;
-   break;
+   return 0;
case ND_CMD_CLEAR_ERROR:
clear_err = buf;
if (status & 0xffff)
@@ -163,7 +163,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
return -EIO;
if (clear_err->length > clear_err->cleared)
return clear_err->cleared;
-   break;
+   return 0;
default:
break;
}
-- 
2.7.4
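
To make the intended flow concrete, here is a condensed, stand-alone sketch
of the shape xlat_status takes after this change; the command names and the
choice of which bits each command treats as fatal are illustrative
placeholders, not the kernel's:

#include <errno.h>
#include <stdio.h>

enum { CMD_FOO = 1, CMD_BAR = 2 };	/* hypothetical commands */

static int xlat_status_sketch(unsigned int cmd, unsigned int status)
{
	switch (cmd) {
	case CMD_FOO:
		if (status & 0xffff)	/* primary status signals failure */
			return -EIO;
		return 0;	/* a non-zero extended status is benign here */
	case CMD_BAR:
		if (status >> 16)	/* this command treats extended as fatal */
			return -EIO;
		return 0;
	default:
		break;			/* no per-command knowledge */
	}
	/* all other non-zero status results in an error */
	if (status)
		return -EIO;
	return 0;
}

int main(void)
{
	/* extended-only status: known command succeeds, unknown one fails */
	printf("%d\n", xlat_status_sketch(CMD_FOO, 0x10000));	/* 0 */
	printf("%d\n", xlat_status_sketch(99, 0x10000));	/* -5 (-EIO) */
	return 0;
}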



Re: [PATCH] nfit: Fix extended status translations for ACPI DSMs

2016-12-05 Thread Linda Knippers


On 12/5/2016 5:16 PM, Dan Williams wrote:
> On Mon, Dec 5, 2016 at 1:54 PM, Linda Knippers  wrote:
>> On 12/05/2016 04:43 PM, Verma, Vishal L wrote:
>>> On Mon, 2016-12-05 at 13:37 -0800, Dan Williams wrote:
 On Mon, Dec 5, 2016 at 1:27 PM, Vishal Verma  wrote:
>
> ACPI DSMs can have an 'extended' status which can be non-zero to
> convey
> additional information about the command. In the xlat_status
> routine,
> where we translate the command statuses, we were returning an error
> for
> a non-zero extended status, even if the primary status indicated
> success.
>
> Cc: Dan Williams 
> Signed-off-by: Vishal Verma 
> ---
>  drivers/acpi/nfit/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 71a7d07..d14f09b 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
> }
>
> /* all other non-zero status results in an error */
> -   if (status)
> +   if (status & 0xffff)
> return -EIO;

 I don't think this is right, because we have no idea at this point
 whether extended status is fatal or not.

 Each 'case' statement in that 'switch' should be returning 0 if it
 does not see any errors. Because that's the only part of the function
 with per-command knowledge of extended being benign / informational vs
 fatal.
>>>
>>> Good point - I was wondering just that.. I'll resend.
>>
>> But can't that function be called with the status for DSMs that aren't in
>> the switch statement?
>>
> 
> Yes, but keep in mind the only consumer of that "cmd_rc" result is
> in-kernel callers.
> 
>> All the DSM specs I've seen separate the status into status and extended
>> or function-specific status, which is either defined or reserved. If the
>> 2 bytes of status don't indicate a failure, I don't think you should
>> return EIO just because there may be something set in a reserved bit.
> 
> The kernel will only consume that status for ARS and label commands.
> In the case of ND_CMD_CALL, and other DSMs that the kernel never
> consumes internally, the translation to -EIO is benign.

Actually, it looks like -EIO won't be returned because fw_status is still 0
when xlat_status is called so there's nothing to translate.  Am I reading
that right?  If so, you could probably avoid the call.

-- ljk


Re: [PATCH] nfit: Fix extended status translations for ACPI DSMs

2016-12-05 Thread Dan Williams
On Mon, Dec 5, 2016 at 1:54 PM, Linda Knippers  wrote:
> On 12/05/2016 04:43 PM, Verma, Vishal L wrote:
>> On Mon, 2016-12-05 at 13:37 -0800, Dan Williams wrote:
>>> On Mon, Dec 5, 2016 at 1:27 PM, Vishal Verma wrote:

 ACPI DSMs can have an 'extended' status which can be non-zero to
 convey
 additional information about the command. In the xlat_status
 routine,
 where we translate the command statuses, we were returning an error
 for
 a non-zero extended status, even if the primary status indicated
 success.

 Cc: Dan Williams 
 Signed-off-by: Vishal Verma 
 ---
  drivers/acpi/nfit/core.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
 index 71a7d07..d14f09b 100644
 --- a/drivers/acpi/nfit/core.c
 +++ b/drivers/acpi/nfit/core.c
 @@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
 }

 /* all other non-zero status results in an error */
 -   if (status)
 +   if (status & 0xffff)
 return -EIO;
>>>
>>> I don't think this is right, because we have no idea at this point
>>> whether extended status is fatal or not.
>>>
>>> Each 'case' statement in that 'switch' should be returning 0 if it
>>> does not see any errors. Because that's the only part of the function
>>> with per-command knowledge of extended being benign / informational vs
>>> fatal.
>>
>> Good point - I was wondering just that.. I'll resend.
>
> But can't that function be called with the status for DSMs that aren't in
> the switch statement?
>

Yes, but keep in mind the only consumer of that "cmd_rc" result is
in-kernel callers.

> All the DSM specs I've seen separate the status into status and extended
> or function-specific status, which is either defined or reserved. If the
> 2 bytes of status don't indicate a failure, I don't think you should
> return EIO just because there may be something set in a reserved bit.

The kernel will only consume that status for ARS and label commands.
In the case of ND_CMD_CALL, and other DSMs that the kernel never
consumes internally, the translation to -EIO is benign.


Re: [PATCH] nfit: Fix extended status translations for ACPI DSMs

2016-12-05 Thread Linda Knippers
On 12/05/2016 04:43 PM, Verma, Vishal L wrote:
> On Mon, 2016-12-05 at 13:37 -0800, Dan Williams wrote:
>> On Mon, Dec 5, 2016 at 1:27 PM, Vishal Verma wrote:
>>>
>>> ACPI DSMs can have an 'extended' status which can be non-zero to
>>> convey
>>> additional information about the command. In the xlat_status
>>> routine,
>>> where we translate the command statuses, we were returning an error
>>> for
>>> a non-zero extended status, even if the primary status indicated
>>> success.
>>>
>>> Cc: Dan Williams 
>>> Signed-off-by: Vishal Verma 
>>> ---
>>>  drivers/acpi/nfit/core.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
>>> index 71a7d07..d14f09b 100644
>>> --- a/drivers/acpi/nfit/core.c
>>> +++ b/drivers/acpi/nfit/core.c
>>> @@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
>>> }
>>>
>>> /* all other non-zero status results in an error */
>>> -   if (status)
>>> +   if (status & 0xffff)
>>> return -EIO;
>>
>> I don't think this is right, because we have no idea at this point
>> whether extended status is fatal or not.
>>
>> Each 'case' statement in that 'switch' should be returning 0 if it
>> does not see any errors. Because that's the only part of the function
>> with per-command knowledge of extended being benign / informational vs
>> fatal.
> 
> Good point - I was wondering just that.. I'll resend.

But can't that function be called with the status for DSMs that aren't in the
switch statement?

All the DSM specs I've seen separate the status into status and extended or
function-specific status, which is either defined or reserved. If the 2 bytes
of status don't indicate a failure, I don't think you should return EIO just
because there may be something set in a reserved bit.

-- ljk



Re: [PATCH] nfit: Fix extended status translations for ACPI DSMs

2016-12-05 Thread Verma, Vishal L
On Mon, 2016-12-05 at 13:37 -0800, Dan Williams wrote:
> On Mon, Dec 5, 2016 at 1:27 PM, Vishal Verma wrote:
> > 
> > ACPI DSMs can have an 'extended' status which can be non-zero to
> > convey
> > additional information about the command. In the xlat_status
> > routine,
> > where we translate the command statuses, we were returning an error
> > for
> > a non-zero extended status, even if the primary status indicated
> > success.
> > 
> > Cc: Dan Williams 
> > Signed-off-by: Vishal Verma 
> > ---
> >  drivers/acpi/nfit/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> > index 71a7d07..d14f09b 100644
> > --- a/drivers/acpi/nfit/core.c
> > +++ b/drivers/acpi/nfit/core.c
> > @@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
> > }
> >
> > /* all other non-zero status results in an error */
> > -   if (status)
> > +   if (status & 0xffff)
> > return -EIO;
> 
> I don't think this is right, because we have no idea at this point
> whether extended status is fatal or not.
> 
> Each 'case' statement in that 'switch' should be returning 0 if it
> does not see any errors. Because that's the only part of the function
> with per-command knowledge of extended being benign / informational vs
> fatal.

Good point - I was wondering just that.. I'll resend.


Re: [PATCH] nfit: Fix extended status translations for ACPI DSMs

2016-12-05 Thread Dan Williams
On Mon, Dec 5, 2016 at 1:27 PM, Vishal Verma  wrote:
> ACPI DSMs can have an 'extended' status which can be non-zero to convey
> additional information about the command. In the xlat_status routine,
> where we translate the command statuses, we were returning an error for
> a non-zero extended status, even if the primary status indicated success.
>
> Cc: Dan Williams 
> Signed-off-by: Vishal Verma 
> ---
>  drivers/acpi/nfit/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index 71a7d07..d14f09b 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
> }
>
> /* all other non-zero status results in an error */
> -   if (status)
> +   if (status & 0xffff)
> return -EIO;

I don't think this is right, because we have no idea at this point
whether extended status is fatal or not.

Each 'case' statement in that 'switch' should be returning 0 if it
does not see any errors. Because that's the only part of the function
with per-command knowledge of extended being benign / informational vs
fatal.


[PATCH] nfit: Fix extended status translations for ACPI DSMs

2016-12-05 Thread Vishal Verma
ACPI DSMs can have an 'extended' status which can be non-zero to convey
additional information about the command. In the xlat_status routine,
where we translate the command statuses, we were returning an error for
a non-zero extended status, even if the primary status indicated success.

Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 drivers/acpi/nfit/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 71a7d07..d14f09b 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -169,7 +169,7 @@ static int xlat_status(void *buf, unsigned int cmd, u32 status)
}

/* all other non-zero status results in an error */
-   if (status)
+   if (status & 0xffff)
return -EIO;
return 0;
 }
-- 
2.7.4
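
For reference, a stand-alone illustration of the status-word split this
one-liner relies on: primary status in the low 16 bits, extended status in
the high 16 bits (the example value is made up):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t status = 0x00010000;	/* extended = 1, primary = 0 */

	printf("primary:  0x%04x\n", (unsigned)(status & 0xffff)); /* success */
	printf("extended: 0x%04x\n", (unsigned)(status >> 16));    /* info only */

	/* unpatched test trips on benign extended bits */
	printf("if (status): %s\n", status ? "-EIO" : "0");
	/* patched test looks only at the primary half */
	printf("if (status & 0xffff): %s\n", (status & 0xffff) ? "-EIO" : "0");
	return 0;
}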



Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Christoph Hellwig
On Mon, Dec 05, 2016 at 12:46:14PM -0700, Jason Gunthorpe wrote:
> In any event the allocator still needs to track which regions are in
> use and be able to hook 'free' from userspace. That does suggest it
> should be integrated into the nvme driver and not a bolt on driver..

Two totally different use cases:

 - a card that exposes directly byte addressable storage as a PCI-e
   bar.  Think of it as an nvdimm on a PCI-e card.  That's the iopmem
   case.
 - the NVMe CMB which exposes a byte addressable indirection buffer for
   I/O, but does not actually provide byte addressable persistent
   storage.  This is something that needs to be added to the NVMe driver
   (and the block layer for the abstraction probably).


Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Logan Gunthorpe



On 05/12/16 12:46 PM, Jason Gunthorpe wrote:

> NVMe might have to deal with pci-e hot-unplug, which is a similar
> problem-class to the GPU case..


Sure, but if the NVMe device gets hot-unplugged it means that all the 
CMB mappings are useless and need to be torn down. This probably means 
killing any process that has mappings open.



> In any event the allocator still needs to track which regions are in
> use and be able to hook 'free' from userspace. That does suggest it
> should be integrated into the nvme driver and not a bolt on driver..


Yup, that's correct. And yes, I've never suggested this to be a bolt on 
driver -- I always expected for it to get integrated into the nvme 
driver. (iopmem was not meant for this.)


Logan


Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Mon, Dec 05, 2016 at 12:27:20PM -0700, Logan Gunthorpe wrote:
> 
> 
> On 05/12/16 12:14 PM, Jason Gunthorpe wrote:
> >But CMB sounds much more like the GPU case where there is a
> >specialized allocator handing out the BAR to consumers, so I'm not
> >sure a general purpose chardev makes a lot of sense?
> 
> I don't think it will ever need to be as complicated as the GPU case. There
> will probably only ever be a relatively small amount of memory behind the
> CMB and really the only users are those doing P2P work. Thus the specialized
> allocator could be pretty simple and I expect it would be fine to just
> return -ENOMEM if there is not enough memory.

NVMe might have to deal with pci-e hot-unplug, which is a similar
problem-class to the GPU case..

In any event the allocator still needs to track which regions are in
use and be able to hook 'free' from userspace. That does suggest it
should be integrated into the nvme driver and not a bolt on driver..

Jason


Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Logan Gunthorpe



On 05/12/16 12:14 PM, Jason Gunthorpe wrote:

> But CMB sounds much more like the GPU case where there is a
> specialized allocator handing out the BAR to consumers, so I'm not
> sure a general purpose chardev makes a lot of sense?


I don't think it will ever need to be as complicated as the GPU case. 
There will probably only ever be a relatively small amount of memory 
behind the CMB and really the only users are those doing P2P work. Thus 
the specialized allocator could be pretty simple and I expect it would 
be fine to just return -ENOMEM if there is not enough memory.


Also, if it was implemented this way, if there was a need to make the 
allocator more complicated it could easily be added later as the 
userspace interface is just mmap to obtain a buffer.
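
As a rough sketch of how simple that could be: fixed-size chunks carved out
of the CMB, failing with -ENOMEM when full (all sizes and names here are
hypothetical, not the nvme driver's):

#include <errno.h>
#include <stdint.h>

#define CMB_SIZE   (1 << 20)		/* pretend the CMB is 1 MiB */
#define CHUNK_SIZE (64 << 10)		/* 64 KiB allocation granularity */
#define NCHUNKS    (CMB_SIZE / CHUNK_SIZE)

static uint8_t chunk_used[NCHUNKS];

/* Hand out a byte offset into the BAR, or a negative errno. */
static long cmb_alloc_chunk(void)
{
	int i;

	for (i = 0; i < NCHUNKS; i++) {
		if (!chunk_used[i]) {
			chunk_used[i] = 1;
			return (long)i * CHUNK_SIZE;
		}
	}
	return -ENOMEM;		/* fine to just fail when the CMB is full */
}

static void cmb_free_chunk(long offset)
{
	chunk_used[offset / CHUNK_SIZE] = 0;
}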


Logan


Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Mon, Dec 05, 2016 at 10:48:58AM -0800, Dan Williams wrote:
> On Mon, Dec 5, 2016 at 10:39 AM, Logan Gunthorpe  wrote:
> > On 05/12/16 11:08 AM, Dan Williams wrote:
> >>
> >> I've already recommended that iopmem not be a block device and instead
> >> be a device-dax instance. I also don't think it should claim the PCI
> >> ID, rather the driver that wants to map one of its bars this way can
> >> register the memory region with the device-dax core.
> >>
> >> I'm not sure there are enough device drivers that want to do this to
> >> have it be a generic /sys/.../resource_dmableX capability. It still
> >> seems to be an exotic one-off type of configuration.
> >
> >
> > Yes, this is essentially my thinking. Except I think the userspace interface
> > should really depend on the device itself. Device dax is a good  choice for
> > many and I agree the block device approach wouldn't be ideal.
> >
> > Specifically for NVME CMB: I think it would make a lot of sense to just hand
> > out these mappings with an mmap call on /dev/nvmeX. I expect CMB buffers
> > would be volatile and thus you wouldn't need to keep track of where in the
> > BAR the region came from. Thus, the mmap call would just be an allocator
> > from BAR memory. If device-dax were used, userspace would need to lookup
> > which device-dax instance corresponds to which nvme drive.
> 
> I'm not opposed to mapping /dev/nvmeX.  However, the lookup is trivial
> to accomplish in sysfs through /sys/dev/char to find the sysfs path of
> the device-dax instance under the nvme device.

But CMB sounds much more like the GPU case where there is a
specialized allocator handing out the BAR to consumers, so I'm not
sure a general purpose chardev makes a lot of sense?

Jason


Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Dan Williams
On Mon, Dec 5, 2016 at 10:39 AM, Logan Gunthorpe  wrote:
> On 05/12/16 11:08 AM, Dan Williams wrote:
>>
>> I've already recommended that iopmem not be a block device and instead
>> be a device-dax instance. I also don't think it should claim the PCI
>> ID, rather the driver that wants to map one of its bars this way can
>> register the memory region with the device-dax core.
>>
>> I'm not sure there are enough device drivers that want to do this to
>> have it be a generic /sys/.../resource_dmableX capability. It still
>> seems to be an exotic one-off type of configuration.
>
>
> Yes, this is essentially my thinking. Except I think the userspace interface
> should really depend on the device itself. Device dax is a good  choice for
> many and I agree the block device approach wouldn't be ideal.
>
> Specifically for NVME CMB: I think it would make a lot of sense to just hand
> out these mappings with an mmap call on /dev/nvmeX. I expect CMB buffers
> would be volatile and thus you wouldn't need to keep track of where in the
> BAR the region came from. Thus, the mmap call would just be an allocator
> from BAR memory. If device-dax were used, userspace would need to lookup
> which device-dax instance corresponds to which nvme drive.
>

I'm not opposed to mapping /dev/nvmeX.  However, the lookup is trivial
to accomplish in sysfs through /sys/dev/char to find the sysfs path of
the device-dax instance under the nvme device, or if you already have
the nvme sysfs path the dax instance(s) will appear under the "dax"
sub-directory.
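
A sketch of that lookup from userspace, resolving a device-dax node's sysfs
path via /sys/dev/char (the device path is illustrative):

#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

int main(void)
{
	struct stat st;
	char link[64], target[PATH_MAX];
	ssize_t n;

	if (stat("/dev/dax0.0", &st) < 0)	/* hypothetical dax node */
		return 1;
	snprintf(link, sizeof(link), "/sys/dev/char/%u:%u",
		 major(st.st_rdev), minor(st.st_rdev));
	n = readlink(link, target, sizeof(target) - 1);
	if (n < 0)
		return 1;
	target[n] = '\0';
	/* target is the instance's sysfs path, relative to /sys/dev/char,
	 * and sits under the owning (e.g. nvme) device's directory */
	printf("%s -> %s\n", link, target);
	return 0;
}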


Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Logan Gunthorpe

On 05/12/16 11:08 AM, Dan Williams wrote:

> I've already recommended that iopmem not be a block device and instead
> be a device-dax instance. I also don't think it should claim the PCI
> ID, rather the driver that wants to map one of its bars this way can
> register the memory region with the device-dax core.
>
> I'm not sure there are enough device drivers that want to do this to
> have it be a generic /sys/.../resource_dmableX capability. It still
> seems to be an exotic one-off type of configuration.


Yes, this is essentially my thinking. Except I think the userspace 
interface should really depend on the device itself. Device dax is a 
good  choice for many and I agree the block device approach wouldn't be 
ideal.


Specifically for NVME CMB: I think it would make a lot of sense to just 
hand out these mappings with an mmap call on /dev/nvmeX. I expect CMB 
buffers would be volatile and thus you wouldn't need to keep track of 
where in the BAR the region came from. Thus, the mmap call would just be 
an allocator from BAR memory. If device-dax were used, userspace would 
need to lookup which device-dax instance corresponds to which nvme drive.
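
A hedged kernel-side sketch of that idea, with the chardev's mmap handler
doubling as the allocator; cmb_alloc() and cmb_bar_base are hypothetical
helpers, not existing nvme driver code:

#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical helpers, assumed to exist elsewhere in the driver: */
extern long cmb_alloc(size_t len);	/* free CMB byte offset or -ENOMEM */
extern phys_addr_t cmb_bar_base;	/* physical base of the CMB BAR */

static int nvme_cmb_mmap_sketch(struct file *file, struct vm_area_struct *vma)
{
	size_t len = vma->vm_end - vma->vm_start;
	long off = cmb_alloc(len);

	if (off < 0)
		return off;	/* -ENOMEM when the CMB is exhausted */
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
	return remap_pfn_range(vma, vma->vm_start,
			       (cmb_bar_base + off) >> PAGE_SHIFT,
			       len, vma->vm_page_prot);
}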


Logan




Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Dan Williams
On Mon, Dec 5, 2016 at 10:02 AM, Jason Gunthorpe wrote:
> On Mon, Dec 05, 2016 at 09:40:38AM -0800, Dan Williams wrote:
>
>> > If it is kernel only with physical addresses we don't need a uAPI for
>> > it, so I'm not sure #1 is at all related to iopmem.
>> >
>> > Most people who want #1 probably can just mmap
>> > /sys/../pci/../resourceX to get a user handle to it, or pass around
>> > __iomem pointers in the kernel. This has been asked for before with
>> > RDMA.
>> >
>> > I'm still not really clear what iopmem is for, or why DAX should ever
>> > be involved in this..
>>
>> Right, by default remap_pfn_range() does not establish DMA capable
>> mappings. You can think of iopmem as remap_pfn_range() converted to
>> use devm_memremap_pages(). Given the extra constraints of
>> devm_memremap_pages() it seems reasonable to have those DMA capable
>> mappings be optionally established via a separate driver.
>
> Except the iopmem driver claims the PCI ID, and presents a block
> interface which is really *NOT* what people who have asked for this in
> the past have wanted. IIRC it was embedded stuff eg RDMA video
> directly out of a capture card or a similar kind of thinking.
>
> It is a good point about devm_memremap_pages limitations, but maybe
> that just says to create a /sys/.../resource_dmableX ?
>
> Or is there some reason why people want a filesystem on top of BAR
> memory? That does not seem to have been covered yet..
>

I've already recommended that iopmem not be a block device and instead
be a device-dax instance. I also don't think it should claim the PCI
ID, rather the driver that wants to map one of its bars this way can
register the memory region with the device-dax core.

I'm not sure there are enough device drivers that want to do this to
have it be a generic /sys/.../resource_dmableX capability. It still
seems to be an exotic one-off type of configuration.


Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Mon, Dec 05, 2016 at 09:40:38AM -0800, Dan Williams wrote:

> > If it is kernel only with physical addresses we don't need a uAPI for
> > it, so I'm not sure #1 is at all related to iopmem.
> >
> > Most people who want #1 probably can just mmap
> > /sys/../pci/../resourceX to get a user handle to it, or pass around
> > __iomem pointers in the kernel. This has been asked for before with
> > RDMA.
> >
> > I'm still not really clear what iopmem is for, or why DAX should ever
> > be involved in this..
> 
> Right, by default remap_pfn_range() does not establish DMA capable
> mappings. You can think of iopmem as remap_pfn_range() converted to
> use devm_memremap_pages(). Given the extra constraints of
> devm_memremap_pages() it seems reasonable to have those DMA capable
> mappings be optionally established via a separate driver.

Except the iopmem driver claims the PCI ID, and presents a block
interface which is really *NOT* what people who have asked for this in
the past have wanted. IIRC it was embedded stuff eg RDMA video
directly out of a capture card or a similar kind of thinking.

It is a good point about devm_memremap_pages limitations, but maybe
that just says to create a /sys/.../resource_dmableX ?

Or is there some reason why people want a filesystem on top of BAR
memory? That does not seem to have been covered yet..

Jason


Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Dan Williams
On Mon, Dec 5, 2016 at 9:18 AM, Jason Gunthorpe wrote:
> On Sun, Dec 04, 2016 at 07:23:00AM -0600, Stephen Bates wrote:
>> Hi All
>>
>> This has been a great thread (thanks to Alex for kicking it off) and I
>> wanted to jump in and maybe try and put some summary around the
>> discussion. I also wanted to propose we include this as a topic for LFS/MM
>> because I think we need more discussion on the best way to add this
>> functionality to the kernel.
>>
>> As far as I can tell the people looking for P2P support in the kernel fall
>> into two main camps:
>>
>> 1. Those who simply want to expose static BARs on PCIe devices that can be
>> used as the source/destination for DMAs from another PCIe device. This
>> group has no need for memory invalidation and are happy to use
>> physical/bus addresses and not virtual addresses.
>
> I didn't think there was much on this topic except for the CMB
> thing.. Even that is really a mapped kernel address..
>
>> I think something like the iopmem patches Logan and I submitted recently
>> come close to addressing use case 1. There are some issues around
>> routability but based on feedback to date that does not seem to be a
>> show-stopper for an initial inclusion.
>
> If it is kernel only with physical addresses we don't need a uAPI for
> it, so I'm not sure #1 is at all related to iopmem.
>
> Most people who want #1 probably can just mmap
> /sys/../pci/../resourceX to get a user handle to it, or pass around
> __iomem pointers in the kernel. This has been asked for before with
> RDMA.
>
> I'm still not really clear what iopmem is for, or why DAX should ever
> be involved in this..

Right, by default remap_pfn_range() does not establish DMA capable
mappings. You can think of iopmem as remap_pfn_range() converted to
use devm_memremap_pages(). Given the extra constraints of
devm_memremap_pages() it seems reasonable to have those DMA capable
mappings be optionally established via a separate driver.
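
To make the contrast concrete, a sketch of the default remap_pfn_range()
path for handing a BAR to userspace: the mapped PFNs have no struct page
behind them, so other devices cannot use the region as a DMA target, which
is the gap devm_memremap_pages() closes (the BAR variables are placeholders):

#include <linux/fs.h>
#include <linux/mm.h>

/* Placeholder BAR; a real driver would take these from pci_resource_*(). */
static phys_addr_t example_bar_base;
static size_t example_bar_size;

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	unsigned long len = vma->vm_end - vma->vm_start;

	if (len > example_bar_size)
		return -EINVAL;
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
	/* Raw PFN mapping: fine for CPU access, but pfn-only, so there is
	 * no struct page for other subsystems or devices to reference. */
	return remap_pfn_range(vma, vma->vm_start,
			       example_bar_base >> PAGE_SHIFT,
			       len, vma->vm_page_prot);
}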


Re: Enabling peer to peer device transactions for PCIe devices

2016-12-05 Thread Jason Gunthorpe
On Sun, Dec 04, 2016 at 07:23:00AM -0600, Stephen Bates wrote:
> Hi All
> 
> This has been a great thread (thanks to Alex for kicking it off) and I
> wanted to jump in and maybe try and put some summary around the
> discussion. I also wanted to propose we include this as a topic for LFS/MM
> because I think we need more discussion on the best way to add this
> functionality to the kernel.
> 
> As far as I can tell the people looking for P2P support in the kernel fall
> into two main camps:
> 
> 1. Those who simply want to expose static BARs on PCIe devices that can be
> used as the source/destination for DMAs from another PCIe device. This
> group has no need for memory invalidation and are happy to use
> physical/bus addresses and not virtual addresses.

I didn't think there was much on this topic except for the CMB
thing.. Even that is really a mapped kernel address..

> I think something like the iopmem patches Logan and I submitted recently
> come close to addressing use case 1. There are some issues around
> routability but based on feedback to date that does not seem to be a
> show-stopper for an initial inclusion.

If it is kernel only with physical addresses we don't need a uAPI for
it, so I'm not sure #1 is at all related to iopmem.

Most people who want #1 probably can just mmap
/sys/../pci/../resourceX to get a user handle to it, or pass around
__iomem pointers in the kernel. This has been asked for before with
RDMA.
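
For reference, a minimal sketch of that approach, mapping BAR0 of a
hypothetical device straight out of sysfs (placeholder PCI address; needs
root and a BAR whose driver permits mmap):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const char *bar = "/sys/bus/pci/devices/0000:01:00.0/resource0";
	int fd = open(bar, O_RDWR);
	void *p;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Length must not exceed the BAR size; 4 KiB for illustration. */
	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}
	/* p is now a user handle onto device BAR memory. */
	munmap(p, 4096);
	close(fd);
	return 0;
}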

I'm still not really clear what iopmem is for, or why DAX should ever
be involved in this..

> For use-case 2 it looks like there are several options and some of them
> (like HMM) have been around for quite some time without gaining
> acceptance. I think there needs to be more discussion on this usecase and
> it could be some time before we get something upstreamable.

AFAIK, hmm makes parts easier, but isn't directly addressing this
need..

I think you need to get ZONE_DEVICE accepted for non-cachable PCI BARs
as the first step.

From there it is pretty clear the DMA API needs to be updated to
support that use, and work can be done to solve the various problems
there on the basis of using ZONE_DEVICE pages to figure out the
PCI-E end points.

Jason


Re: [PATCH] e820: use module_platform_driver

2016-12-05 Thread Dan Williams
On Mon, Dec 5, 2016 at 12:23 AM, Johannes Thumshirn  wrote:
> Use module_platform_driver for the e820 driver instead of open-coding it.
>
> Signed-off-by: Johannes Thumshirn 

Applied, thanks Johannes.



[PATCH] e820: use module_platform_driver

2016-12-05 Thread Johannes Thumshirn
Use module_platform_driver for the e820 driver instead of open-coding it.

Signed-off-by: Johannes Thumshirn 
---
 drivers/nvdimm/e820.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c
index 11ea901..6f9a6ff 100644
--- a/drivers/nvdimm/e820.c
+++ b/drivers/nvdimm/e820.c
@@ -84,18 +84,8 @@ static struct platform_driver e820_pmem_driver = {
},
 };
 
-static __init int e820_pmem_init(void)
-{
-   return platform_driver_register(&e820_pmem_driver);
-}
-
-static __exit void e820_pmem_exit(void)
-{
-   platform_driver_unregister(&e820_pmem_driver);
-}
+module_platform_driver(e820_pmem_driver);
 
 MODULE_ALIAS("platform:e820_pmem*");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Intel Corporation");
-module_init(e820_pmem_init);
-module_exit(e820_pmem_exit);
-- 
2.10.2
