Re: [PATCH 01/12] pci-p2p: Support peer to peer memory

2018-01-04 Thread Logan Gunthorpe



On 04/01/18 02:59 PM, Bjorn Helgaas wrote:

On Thu, Jan 04, 2018 at 12:01:26PM -0700, Logan Gunthorpe wrote:

Some PCI devices may have memory mapped in a BAR space that's
intended for use in Peer-to-Peer transactions. In order to enable
such transactions the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.
...



+ * pci_p2pmem_alloc_sgl - allocate p2p memory in an sgl
+ * @pdev:  the device to allocate memory from
+ * @sgl:   the allocated sgl
+ * @nents:  the number of sgs in the list
+ * @length: number of bytes to allocate


Your later patches use "SGL" in English text.  If that's the conventional
capitalization, please use it here, too.



I'm not sure there is a convention. scatterlist.h uses 'sglist' and 
'scatterlist' so I'll just change my comments to use the latter.


Logan


Re: [PATCH 01/12] pci-p2p: Support peer to peer memory

2018-01-04 Thread Logan Gunthorpe



On 04/01/18 02:59 PM, Bjorn Helgaas wrote:

On Thu, Jan 04, 2018 at 12:01:26PM -0700, Logan Gunthorpe wrote:

Some PCI devices may have memory mapped in a BAR space that's
intended for use in Peer-to-Peer transactions. In order to enable
such transactions the memory must be registered with ZONE_DEVICE pages
so it can be used by DMA interfaces in existing drivers.
...



+ * pci_p2pmem_alloc_sgl - allocate p2p memory in an sgl
+ * @pdev:  the device to allocate memory from
+ * @sgl:   the allocated sgl
+ * @nents:  the number of sgs in the list
+ * @length: number of bytes to allocate


Your later patches use "SGL" in English text.  If that's the conventional
capitalization, please use it here, too.



I'm not sure there is a convention. scatterlist.h uses 'sglist' and 
'scatterlist' so I'll just change my comments to use the latter.


Logan


Re: [PATCH 01/12] pci-p2p: Support peer to peer memory

2018-01-04 Thread Logan Gunthorpe

Thanks for the speedy review!

On 04/01/18 02:40 PM, Bjorn Helgaas wrote:

Run "git log --oneline drivers/pci" and follow the convention.  I
think it would make sense to add a new tag like "PCI/P2P", although
"P2P" has historically also been used in the "PCI-to-PCI bridge"
context, so maybe there's something less ambiguous.  "P2PDMA"?


Ok, I'll fix this for v2. I'm fine with renaming things to p2pdma


When you add new files, I guess we're looking for the new SPDX
copyright stuff?


Will do.


It's more than "non-trivial" or "with good performance" (from Kconfig
help), isn't it?  AFAIK, there's no standard way at all to discover
whether P2P DMA is supported between root ports or RCs.


Yup, that's correct. This would have to be done with a white list.



s/bars/BARs/ (and similarly below, except in C code)
Similarly, s/dma/DMA/ and s/pci/PCI/ below.
And probably also s/p2p/peer-to-peer DMA/ in messages.


Will do.


Maybe clarify this domain bit.  Using "domain" suggests the common PCI
segment/domain usage, but I think you really mean something like the
part of the hierarchy where peer-to-peer DMA is guaranteed by the PCI
spec to work, i.e., anything below a single PCI bridge.


Ok, I like the wording you proposed.




Seems like there should be

   if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
 return -EINVAL;

or similar here?


That sounds like a good idea. Will add.




+   if (WARN_ON(offset >= pci_resource_len(pdev, bar)))
+   return -EINVAL;


Are these WARN_ONs for debugging purposes, or do you think we need
them in production?  Granted, hitting it would probably be a kernel
driver bug, but still, not sure if the PCI core needs to coddle the
driver author that much.


Sure, I'll drop all the WARN_ONs.


I'm guessing Christoph's dev_pagemap revamp repo must change
pgmap->res from a pointer to a structure, but I don't see the actual
link in your cover letter.


Sorry, the patch set is here:

https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg07323.html

git://git.infradead.org/users/hch/misc.git hch/pgmap-cleanups.3


I think you should set pgmap->res.flags here, too.


Sure, I don't think it's used and not set by the NVDIMM code; but I 
agree that it'd be a good idea to set it anyway.



+   dev_info(>dev, "added %zdB of p2p memory\n", size);


Can we add %pR and print pgmap->res itself, too?


Yup.


You have a bit of a mix of PCI ("pci device", "bridge") and PCIe
("switch", "switch port") terminology.  I haven't read the rest of the
patches yet, so I don't know if you intend to restrict this to
PCIe-only, e.g., so you can use ACS, or if you want to make it
available on conventional PCI as well.

If the latter, I would use the generic PCI terminology, i.e., "bridge"
instead of "switch".


Ok, I'll change it to use the generic term bridge. There's no 
restriction in the code to limit it to PCIe only, though I don't expect 
anybody will ever be using this with legacy PCI.



+ * pci_virt_to_bus - return the pci bus address for a given virtual
+ * address obtained with pci_alloc_p2pmem
+ * @pdev:  the device the memory was allocated from
+ * @addr:  address of the memory that was allocated
+ */
+pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
+{
+   if (!addr)
+   return 0;
+   if (!pdev->p2p)
+   return 0;
+
+   return gen_pool_virt_to_phys(pdev->p2p->pool, (unsigned long)addr);


This doesn't seem right.  A physical address is not the same as a PCI
bus address.  I expected something like pci_bus_address() or
pcibios_resource_to_bus() here.  Am I missing something?  If so, a
clarifying comment would be helpful.


What you're missing is that when we called gen_pool_add_virt we used the 
PCI bus address as the physical address and not the CPU physical address 
(which we don't care about). I'll add a comment explaining this.



I've been noticing that we're accumulating PCI-related files in
include/linux: pci.h, pci-aspm.h pci-ats.h, pci-dma.h, pcieport_if.h,
etc.  I'm not sure there's value in all those and am thinking maybe
they should just be folded into pci.h.  What do you think?


We started with that. Once we reached a certain amount of code, 
Christoph suggested we put it in its own header.


Logan


Re: [PATCH 01/12] pci-p2p: Support peer to peer memory

2018-01-04 Thread Logan Gunthorpe

Thanks for the speedy review!

On 04/01/18 02:40 PM, Bjorn Helgaas wrote:

Run "git log --oneline drivers/pci" and follow the convention.  I
think it would make sense to add a new tag like "PCI/P2P", although
"P2P" has historically also been used in the "PCI-to-PCI bridge"
context, so maybe there's something less ambiguous.  "P2PDMA"?


Ok, I'll fix this for v2. I'm fine with renaming things to p2pdma


When you add new files, I guess we're looking for the new SPDX
copyright stuff?


Will do.


It's more than "non-trivial" or "with good performance" (from Kconfig
help), isn't it?  AFAIK, there's no standard way at all to discover
whether P2P DMA is supported between root ports or RCs.


Yup, that's correct. This would have to be done with a white list.



s/bars/BARs/ (and similarly below, except in C code)
Similarly, s/dma/DMA/ and s/pci/PCI/ below.
And probably also s/p2p/peer-to-peer DMA/ in messages.


Will do.


Maybe clarify this domain bit.  Using "domain" suggests the common PCI
segment/domain usage, but I think you really mean something like the
part of the hierarchy where peer-to-peer DMA is guaranteed by the PCI
spec to work, i.e., anything below a single PCI bridge.


Ok, I like the wording you proposed.




Seems like there should be

   if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
 return -EINVAL;

or similar here?


That sounds like a good idea. Will add.




+   if (WARN_ON(offset >= pci_resource_len(pdev, bar)))
+   return -EINVAL;


Are these WARN_ONs for debugging purposes, or do you think we need
them in production?  Granted, hitting it would probably be a kernel
driver bug, but still, not sure if the PCI core needs to coddle the
driver author that much.


Sure, I'll drop all the WARN_ONs.


I'm guessing Christoph's dev_pagemap revamp repo must change
pgmap->res from a pointer to a structure, but I don't see the actual
link in your cover letter.


Sorry, the patch set is here:

https://www.mail-archive.com/linux-nvdimm@lists.01.org/msg07323.html

git://git.infradead.org/users/hch/misc.git hch/pgmap-cleanups.3


I think you should set pgmap->res.flags here, too.


Sure, I don't think it's used and not set by the NVDIMM code; but I 
agree that it'd be a good idea to set it anyway.



+   dev_info(>dev, "added %zdB of p2p memory\n", size);


Can we add %pR and print pgmap->res itself, too?


Yup.


You have a bit of a mix of PCI ("pci device", "bridge") and PCIe
("switch", "switch port") terminology.  I haven't read the rest of the
patches yet, so I don't know if you intend to restrict this to
PCIe-only, e.g., so you can use ACS, or if you want to make it
available on conventional PCI as well.

If the latter, I would use the generic PCI terminology, i.e., "bridge"
instead of "switch".


Ok, I'll change it to use the generic term bridge. There's no 
restriction in the code to limit it to PCIe only, though I don't expect 
anybody will ever be using this with legacy PCI.



+ * pci_virt_to_bus - return the pci bus address for a given virtual
+ * address obtained with pci_alloc_p2pmem
+ * @pdev:  the device the memory was allocated from
+ * @addr:  address of the memory that was allocated
+ */
+pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
+{
+   if (!addr)
+   return 0;
+   if (!pdev->p2p)
+   return 0;
+
+   return gen_pool_virt_to_phys(pdev->p2p->pool, (unsigned long)addr);


This doesn't seem right.  A physical address is not the same as a PCI
bus address.  I expected something like pci_bus_address() or
pcibios_resource_to_bus() here.  Am I missing something?  If so, a
clarifying comment would be helpful.


What you're missing is that when we called gen_pool_add_virt we used the 
PCI bus address as the physical address and not the CPU physical address 
(which we don't care about). I'll add a comment explaining this.



I've been noticing that we're accumulating PCI-related files in
include/linux: pci.h, pci-aspm.h pci-ats.h, pci-dma.h, pcieport_if.h,
etc.  I'm not sure there's value in all those and am thinking maybe
they should just be folded into pci.h.  What do you think?


We started with that. Once we reached a certain amount of code, 
Christoph suggested we put it in its own header.


Logan


Re: [PATCH 01/12] pci-p2p: Support peer to peer memory

2018-01-04 Thread Bjorn Helgaas
On Thu, Jan 04, 2018 at 12:01:26PM -0700, Logan Gunthorpe wrote:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in Peer-to-Peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.
> ...

> + * pci_p2pmem_alloc_sgl - allocate p2p memory in an sgl
> + * @pdev:the device to allocate memory from
> + * @sgl: the allocated sgl
> + * @nents:  the number of sgs in the list
> + * @length: number of bytes to allocate

Your later patches use "SGL" in English text.  If that's the conventional
capitalization, please use it here, too.



Re: [PATCH 01/12] pci-p2p: Support peer to peer memory

2018-01-04 Thread Bjorn Helgaas
On Thu, Jan 04, 2018 at 12:01:26PM -0700, Logan Gunthorpe wrote:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in Peer-to-Peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.
> ...

> + * pci_p2pmem_alloc_sgl - allocate p2p memory in an sgl
> + * @pdev:the device to allocate memory from
> + * @sgl: the allocated sgl
> + * @nents:  the number of sgs in the list
> + * @length: number of bytes to allocate

Your later patches use "SGL" in English text.  If that's the conventional
capitalization, please use it here, too.



Re: [PATCH 01/12] pci-p2p: Support peer to peer memory

2018-01-04 Thread Bjorn Helgaas
Run "git log --oneline drivers/pci" and follow the convention.  I
think it would make sense to add a new tag like "PCI/P2P", although
"P2P" has historically also been used in the "PCI-to-PCI bridge"
context, so maybe there's something less ambiguous.  "P2PDMA"?

When you add new files, I guess we're looking for the new SPDX
copyright stuff?

On Thu, Jan 04, 2018 at 12:01:26PM -0700, Logan Gunthorpe wrote:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in Peer-to-Peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.
> 
> A kernel interface is provided so that other subsystems can find and
> allocate chunks of P2P memory as necessary to facilitate transfers
> between two PCI peers. Depending on hardware, this may reduce the
> bandwidth of the transfer but would significantly reduce pressure
> on system memory. This may be desirable in many cases: for example a
> system could be designed with a small CPU connected to a PCI switch by a
> small number of lanes which would maximize the number of lanes available
> to connect to NVME devices.
> 
> The interface requires a user driver to collect a list of client devices
> involved in the transaction with the pci_p2pmem_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary. The ACS bits on the
> downstream switch port will be managed for all the registered clients.
> 
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI switch. This is because
> using P2P transactions through the PCI root complex can have performance
> limitations or, worse, might not work at all. Finding out how well a
> particular RC supports P2P transfers is non-trivial. 

It's more than "non-trivial" or "with good performance" (from Kconfig
help), isn't it?  AFAIK, there's no standard way at all to discover
whether P2P DMA is supported between root ports or RCs.

> +config PCI_P2P
> + bool "PCI Peer to Peer transfer support"
> + depends on ZONE_DEVICE
> + select GENERIC_ALLOCATOR
> + help
> +   Enableѕ drivers to do PCI peer to peer transactions to and from
> +   bars that are exposed to other devices in the same domain.

s/bars/BARs/ (and similarly below, except in C code)

Similarly, s/dma/DMA/ and s/pci/PCI/ below.

And probably also s/p2p/peer-to-peer DMA/ in messages.

Maybe clarify this domain bit.  Using "domain" suggests the common PCI
segment/domain usage, but I think you really mean something like the
part of the hierarchy where peer-to-peer DMA is guaranteed by the PCI
spec to work, i.e., anything below a single PCI bridge.

> +
> +   Many PCIe root complexes do not support P2P transactions and
> +   it's hard to tell which support it with good performance, so
> +   at this time you will need a PCIe switch.
> +
> +   If unsure, say N.

> + * pci_p2pmem_add_resource - add memory for use as p2p memory
> + * @pci: the device to add the memory to
> + * @bar: PCI bar to add
> + * @size: size of the memory to add, may be zero to use the whole bar
> + * @offset: offset into the PCI bar
> + *
> + * The memory will be given ZONE_DEVICE struct pages so that it may
> + * be used with any dma request.
> + */
> +int pci_p2pmem_add_resource(struct pci_dev *pdev, int bar, size_t size,
> + u64 offset)
> +{
> + struct dev_pagemap *pgmap;
> + void *addr;
> + int error;

Seems like there should be

  if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
return -EINVAL;

or similar here?

> + if (WARN_ON(offset >= pci_resource_len(pdev, bar)))
> + return -EINVAL;

Are these WARN_ONs for debugging purposes, or do you think we need
them in production?  Granted, hitting it would probably be a kernel
driver bug, but still, not sure if the PCI core needs to coddle the
driver author that much.

> + if (!size)
> + size = pci_resource_len(pdev, bar) - offset;
> +
> + if (WARN_ON(size + offset > pci_resource_len(pdev, bar)))
> + return -EINVAL;
> +
> + if (!pdev->p2p) {
> + error = pci_p2pmem_setup(pdev);
> + if (error)
> + return error;
> + }
> +
> + pgmap = devm_kzalloc(>dev, sizeof(*pgmap), GFP_KERNEL);
> + if (!pgmap)
> + return -ENOMEM;
> +
> + pgmap->res.start = pci_resource_start(pdev, bar) + offset;
> + pgmap->res.end = pgmap->res.start + size - 1;

I'm guessing Christoph's dev_pagemap revamp repo must change
pgmap->res from a pointer to a structure, but I don't see the actual
link in your cover letter.

I think you should set pgmap->res.flags here, too.

> + pgmap->ref = >p2p->devmap_ref;
> + pgmap->type = 

Re: [PATCH 01/12] pci-p2p: Support peer to peer memory

2018-01-04 Thread Bjorn Helgaas
Run "git log --oneline drivers/pci" and follow the convention.  I
think it would make sense to add a new tag like "PCI/P2P", although
"P2P" has historically also been used in the "PCI-to-PCI bridge"
context, so maybe there's something less ambiguous.  "P2PDMA"?

When you add new files, I guess we're looking for the new SPDX
copyright stuff?

On Thu, Jan 04, 2018 at 12:01:26PM -0700, Logan Gunthorpe wrote:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in Peer-to-Peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.
> 
> A kernel interface is provided so that other subsystems can find and
> allocate chunks of P2P memory as necessary to facilitate transfers
> between two PCI peers. Depending on hardware, this may reduce the
> bandwidth of the transfer but would significantly reduce pressure
> on system memory. This may be desirable in many cases: for example a
> system could be designed with a small CPU connected to a PCI switch by a
> small number of lanes which would maximize the number of lanes available
> to connect to NVME devices.
> 
> The interface requires a user driver to collect a list of client devices
> involved in the transaction with the pci_p2pmem_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary. The ACS bits on the
> downstream switch port will be managed for all the registered clients.
> 
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI switch. This is because
> using P2P transactions through the PCI root complex can have performance
> limitations or, worse, might not work at all. Finding out how well a
> particular RC supports P2P transfers is non-trivial. 

It's more than "non-trivial" or "with good performance" (from Kconfig
help), isn't it?  AFAIK, there's no standard way at all to discover
whether P2P DMA is supported between root ports or RCs.

> +config PCI_P2P
> + bool "PCI Peer to Peer transfer support"
> + depends on ZONE_DEVICE
> + select GENERIC_ALLOCATOR
> + help
> +   Enableѕ drivers to do PCI peer to peer transactions to and from
> +   bars that are exposed to other devices in the same domain.

s/bars/BARs/ (and similarly below, except in C code)

Similarly, s/dma/DMA/ and s/pci/PCI/ below.

And probably also s/p2p/peer-to-peer DMA/ in messages.

Maybe clarify this domain bit.  Using "domain" suggests the common PCI
segment/domain usage, but I think you really mean something like the
part of the hierarchy where peer-to-peer DMA is guaranteed by the PCI
spec to work, i.e., anything below a single PCI bridge.

> +
> +   Many PCIe root complexes do not support P2P transactions and
> +   it's hard to tell which support it with good performance, so
> +   at this time you will need a PCIe switch.
> +
> +   If unsure, say N.

> + * pci_p2pmem_add_resource - add memory for use as p2p memory
> + * @pci: the device to add the memory to
> + * @bar: PCI bar to add
> + * @size: size of the memory to add, may be zero to use the whole bar
> + * @offset: offset into the PCI bar
> + *
> + * The memory will be given ZONE_DEVICE struct pages so that it may
> + * be used with any dma request.
> + */
> +int pci_p2pmem_add_resource(struct pci_dev *pdev, int bar, size_t size,
> + u64 offset)
> +{
> + struct dev_pagemap *pgmap;
> + void *addr;
> + int error;

Seems like there should be

  if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
return -EINVAL;

or similar here?

> + if (WARN_ON(offset >= pci_resource_len(pdev, bar)))
> + return -EINVAL;

Are these WARN_ONs for debugging purposes, or do you think we need
them in production?  Granted, hitting it would probably be a kernel
driver bug, but still, not sure if the PCI core needs to coddle the
driver author that much.

> + if (!size)
> + size = pci_resource_len(pdev, bar) - offset;
> +
> + if (WARN_ON(size + offset > pci_resource_len(pdev, bar)))
> + return -EINVAL;
> +
> + if (!pdev->p2p) {
> + error = pci_p2pmem_setup(pdev);
> + if (error)
> + return error;
> + }
> +
> + pgmap = devm_kzalloc(>dev, sizeof(*pgmap), GFP_KERNEL);
> + if (!pgmap)
> + return -ENOMEM;
> +
> + pgmap->res.start = pci_resource_start(pdev, bar) + offset;
> + pgmap->res.end = pgmap->res.start + size - 1;

I'm guessing Christoph's dev_pagemap revamp repo must change
pgmap->res from a pointer to a structure, but I don't see the actual
link in your cover letter.

I think you should set pgmap->res.flags here, too.

> + pgmap->ref = >p2p->devmap_ref;
> + pgmap->type =