Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-13 Thread Sinan Kaya
On 3/13/2018 4:46 PM, Logan Gunthorpe wrote:
> 
> 
> On 13/03/18 01:53 PM, Sinan Kaya wrote:
>> I agree disabling globally would be bad. Somebody can always say, "I have ten
>> switches on my system; I want to do peer-to-peer on one switch only." Now this
>> change has weakened security for the other switches that I had no intention of
>> doing P2P with.
>>
>> Isn't this a problem?
> 
> Well, if it's a problem for someone they'll have to solve it. We're
> targeting JBOFs that have no use for ACS / IOMMU groups at all.
> 
>> Can we specify the BDF of the downstream device we want to do P2P with
>> during boot via the kernel command line?
> 
> That's a painful configuration burden. And then things might stop
> working if you change your topology at all and now have to change boot
> parameters.
> 

It sounds like you have very tight hardware expectations for this to work at
the moment, and you don't want to generalize this code for others or address
its shortcomings.

To get you going, you should limit this change to the switch products that
you have validated, by white-listing their PCI vendor/device IDs. Please do
not enable this feature for all other PCI devices or by default.
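
Something along these lines is what I mean -- just a sketch, with made-up
vendor/device IDs, using the standard pci_match_id() helper:

#include <linux/pci.h>

/*
 * Hypothetical whitelist of switches that have actually been validated
 * for P2P.  The IDs below are placeholders, not real products.
 */
static const struct pci_device_id p2p_switch_whitelist[] = {
        { PCI_DEVICE(0x1234, 0xabcd) }, /* example: validated switch A */
        { PCI_DEVICE(0x1234, 0xabce) }, /* example: validated switch B */
        { }
};

static bool pci_p2pdma_switch_whitelisted(struct pci_dev *bridge)
{
        return pci_match_id(p2p_switch_whitelist, bridge) != NULL;
}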

I think your code qualifies as a virus until this issue is resolved (so NAK). 

Another option is for your CONFIG to depend on BROKEN/EXPERT.

You are delivering general-purpose P2P code with a lot of holes in it and
expecting people to jump through them.

Turning security off by default is also not acceptable. Linux requires ACS
support even though you don't care about it for your particular application.

I'd hate to see ACS broken because some operating system enables your CONFIG
option.


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-13 Thread Sinan Kaya
On 3/13/2018 3:19 PM, Logan Gunthorpe wrote:
> 
> 
> On 13/03/18 01:10 PM, Sinan Kaya wrote:
>> I was thinking of this for the pci_p2pdma_add_client() case for the
>> parent pointer.
>>
>> +struct pci_p2pdma_client {
>> +   struct list_head list;
>> +   struct pci_dev *client;
>> +   struct pci_dev *provider;
>> +};
> 
> Yeah, that structure only exists in a list owned by the client and we
> only check the upstream bridge once per entry so I don't see the point.
> 
>> But then, why bother searching for the switch at all?
> 
> Huh? We have to make sure all the client and provider devices are behind
> the same switch. How can we do that without "searching" for the switch?
> 

Sorry, I was thinking of the ACS case you described below. At the moment, the
only thing the code cares about is whether the device is behind a switch or
not.
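
For reference, the check I mean is roughly the following -- not your actual
patch code, just how I read it: the immediate upstream bridge has to be a
switch downstream port whose own upstream bridge is the switch upstream port.

#include <linux/pci.h>

/* Rough illustration only, not the code from the patch. */
static bool behind_a_switch(struct pci_dev *pdev)
{
        struct pci_dev *dsp = pci_upstream_bridge(pdev);
        struct pci_dev *usp;

        if (!dsp || pci_pcie_type(dsp) != PCI_EXP_TYPE_DOWNSTREAM)
                return false;

        usp = pci_upstream_bridge(dsp);
        return usp && pci_pcie_type(usp) == PCI_EXP_TYPE_UPSTREAM;
}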

> In the ACS case, we only disable ACS on downstream ports of switches. No
> sense disabling it globally as that's worse from an isolation point of
> view and not worth it given we require all P2P transactions to be behind
> a switch.

I agree disabling globally would be bad. Somebody can always say, "I have ten
switches on my system; I want to do peer-to-peer on one switch only." Now this
change has weakened security for the other switches that I had no intention of
doing P2P with.

Isn't this a problem?

Can we specify the BDF of the downstream device we want to do P2P with during
boot via the kernel command line?
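
For example, something like this (the parameter name and format are purely
hypothetical, just to illustrate the idea):

#include <linux/kernel.h>
#include <linux/moduleparam.h>
#include <linux/pci.h>

/* Hypothetical "p2pdma_allow=<domain>:<bus>:<dev>.<fn>" boot parameter. */
static char p2pdma_allow[32];
module_param_string(p2pdma_allow, p2pdma_allow, sizeof(p2pdma_allow), 0444);

static struct pci_dev *p2pdma_allowed_port(void)
{
        unsigned int domain, bus, dev, fn;

        if (sscanf(p2pdma_allow, "%x:%x:%x.%x", &domain, &bus, &dev, &fn) != 4)
                return NULL;

        return pci_get_domain_bus_and_slot(domain, bus, PCI_DEVFN(dev, fn));
}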

> 
>> Even if the switch is there, there is no guarantee that it is currently
>> being used for P2P.
> 
> IOMMU groups are set at boot time and, at present, there's no way to
> dynamically change ACS bits without messing up the groups. So switches
> not used for P2P will not have ACS enabled when CONFIG_PCI_P2PDMA is set
> and I don't know of any good solution to that. Please see the ACS
> discussions in v1 and v2.

Given the implementation limitations, this might be OK as a short-term
solution. 

It depends on if Alex is comfortable with this.

> 
>> It seems we are going with the assumption that enabling this config option
>> implies you want P2P; if so, we can simplify this code as well.
> 
> How so?
> 
> Logan
> 
> 
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-13 Thread Sinan Kaya
On 3/13/2018 2:44 PM, Logan Gunthorpe wrote:
>> Do you think you can keep a pointer to the parent bridge instead of querying
>> it via get_upstream_bridge_port() here, so that we can reuse your
>> pci_p2pdma_disable_acs() in the future?
> 
> Keep a pointer where? pci_p2pdma_disable_acs() and pci_p2pdma_add_client() 
> are used in completely different cases on completely different devices. There 
> really is no overlap and no obvious place to store the port pointer (except 
> in the struct pci_dev itself, in which case why not just call the function 
> again).

I was thinking of this for the pci_p2pdma_add_client() case for the
parent pointer.

+struct pci_p2pdma_client {
+   struct list_head list;
+   struct pci_dev *client;
+   struct pci_dev *provider;
+};
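
More concretely, something like this is what I had in mind (sketch only; the
upstream_port member and the helper are hypothetical):

#include <linux/list.h>
#include <linux/pci.h>

struct pci_p2pdma_client {
        struct list_head list;
        struct pci_dev *client;
        struct pci_dev *provider;
        struct pci_dev *upstream_port;  /* hypothetical new member */
};

/* Resolve the bridge above the client once, when the entry is added. */
static void pci_p2pdma_client_cache_port(struct pci_p2pdma_client *item)
{
        item->upstream_port = pci_upstream_bridge(item->client);
}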

You are right, the second piece of code seems to be there to disable ACS
altogether when this kernel configuration is selected.

It doesn't have any relationship to the client code.

But then, why bother searching for the switch at all?

Even if the switch is there, there is no guarantee that it is currently
being used for P2P.

It seems we are going with the assumption that enabling this config option
implies you want P2P; if so, we can simplify this code as well.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-12 Thread Sinan Kaya
On 3/12/2018 3:35 PM, Logan Gunthorpe wrote:
> +int pci_p2pdma_add_client(struct list_head *head, struct device *dev)

It feels like this code tried to be a generic p2pdma provider first, then got
converted to PCI, yet all the dev parameters are still struct device.

Maybe the dev parameter should also be struct pci_dev so that you can get rid
of all the to_pci_dev() calls in this code, including the
find_parent_pci_dev() function.
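
In other words, something like this (sketch of the signature change only, not
a full patch):

#include <linux/device.h>
#include <linux/list.h>
#include <linux/pci.h>

/* Current form, from the patch: the caller passes a struct device. */
int pci_p2pdma_add_client(struct list_head *head, struct device *dev);

/*
 * Suggested form (sketch): take the pci_dev directly, so that the
 * to_pci_dev() and find_parent_pci_dev() conversions inside the
 * implementation can go away:
 *
 *      int pci_p2pdma_add_client(struct list_head *head, struct pci_dev *pdev);
 */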

Regarding the switch business, it is amazing how much trouble you went to in
order to limit this functionality to very specific hardware.

I thought we had reached an agreement that the code would not impose any
limits on what the user wants.

What happened to all the emails we exchanged?

I understand you are coming from what you tested. Considering that you are
signing up to provide generic PCI functionality in the kernel, why don't you
just blacklist the products that you had problems with and still allow other
architectures to use your code with their root ports?

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v3 08/11] nvme-pci: Use PCI p2pmem subsystem to manage the CMB

2018-03-12 Thread Sinan Kaya
On 3/12/2018 9:55 PM, Sinan Kaya wrote:
> On 3/12/2018 3:35 PM, Logan Gunthorpe wrote:
>> -if (nvmeq->sq_cmds_io)
> 
> I think you should keep the code as it is for the case where
> (!nvmeq->sq_cmds_is_io && nvmeq->sq_cmds_io)

Never mind. I misunderstood the code.


> 
> You are changing the behavior for NVMe drives with CMB buffers.
> You could replace the if statement here with the condition above.
> 
>> -memcpy_toio(&nvmeq->sq_cmds_io[tail], cmd, sizeof(*cmd));
>> -else
>> -memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
>> +memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
>>  
> 
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v3 08/11] nvme-pci: Use PCI p2pmem subsystem to manage the CMB

2018-03-12 Thread Sinan Kaya
On 3/12/2018 3:35 PM, Logan Gunthorpe wrote:
> - if (nvmeq->sq_cmds_io)

I think you should keep the code as it is for the case where
(!nvmeq->sq_cmds_is_io && nvmeq->sq_cmds_io)

You are changing the behavior for NVMe drives with CMB buffers.
You could replace the if statement here with the condition above.

> - memcpy_toio(&nvmeq->sq_cmds_io[tail], cmd, sizeof(*cmd));
> - else
> - memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
> + memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
>  


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v2 07/10] nvme-pci: Use PCI p2pmem subsystem to manage the CMB

2018-03-05 Thread Sinan Kaya
On 3/5/2018 12:10 PM, Logan Gunthorpe wrote:
> 
> 
> On 05/03/18 09:00 AM, Keith Busch wrote:
>> On Mon, Mar 05, 2018 at 12:33:29PM +1100, Oliver wrote:
>>> On Thu, Mar 1, 2018 at 10:40 AM, Logan Gunthorpe <log...@deltatee.com> 
>>> wrote:
>>>> @@ -429,10 +429,7 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
>>>>   {
>>>>  u16 tail = nvmeq->sq_tail;
>>>
>>>> -   if (nvmeq->sq_cmds_io)
>>>> -   memcpy_toio(&nvmeq->sq_cmds_io[tail], cmd, sizeof(*cmd));
>>>> -   else
>>>> -   memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
>>>> +   memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));
>>>
>>> Hmm, how safe is replacing memcpy_toio() with regular memcpy()? On PPC
>>> the _toio() variant enforces alignment, does the copy with 4 byte
>>> stores, and has a full barrier after the copy. In comparison our
>>> regular memcpy() does none of those things and may use unaligned and
>>> vector load/stores. For normal (cacheable) memory that is perfectly
>>> fine, but they can cause alignment faults when targeted at MMIO
>>> (cache-inhibited) memory.
>>>
>>> I think in this particular case it might be ok since we know SQEs are
>>> aligned to 64 byte boundaries and the copy is too small to use our
>>> vectorised memcpy(). I'll assume we don't need explicit ordering
>>> between writes of SQEs since the existing code doesn't seem to care
>>> unless the doorbell is being rung, so you're probably fine there too.
>>> That said, I still think this is a little bit sketchy and at the very
>>> least you should add a comment explaining what's going on when the CMB
>>> is being used. If someone more familiar with the NVMe driver could
>>> chime in I would appreciate it.
>>
>> I may not be understanding the concern, but I'll give it a shot.
>>
>> You're right, the start of any SQE is always 64-byte aligned, so that
>> should satisfy alignment requirements.
>>
>> The order when writing multiple/successive SQEs in a submission queue
>> does matter, and this is currently serialized through the q_lock.
>>
>> The order in which the bytes of a single SQE is written doesn't really
>> matter as long as the entire SQE is written into the CMB prior to writing
>> that SQ's doorbell register.
>>
>> The doorbell register is written immediately after copying a command
>> entry into the submission queue (ignore "shadow buffer" features),
>> so the doorbells written to commands submitted is 1:1.
>>
>> If a CMB SQE and DB order is not enforced with the memcpy, then we do
>> need a barrier after the SQE's memcpy and before the doorbell's writel.
> 
> 
> Thanks for the information Keith.
> 
> Adding to this: regular memcpy generally also enforces alignment as unaligned 
> access to regular memory is typically bad in some way on most arches. The 
> generic memcpy_toio also does not have any barrier as it is just a call to 
> memcpy. Arm64 also does not appear to have a barrier in its implementation 
> and in the short survey I did I could not find any implementation with a 
> barrier. I also did not find a ppc implementation in the tree but it would be 
> weird for it to add a barrier when other arches do not appear to need it.
> 
> We've been operating on the assumption that memory mapped by 
> devm_memremap_pages() can be treated as regular memory. This is emphasized by 
> the fact that it does not return an __iomem pointer. If this assumption does 
> not hold for an arch then we cannot support P2P DMA without an overhaul of 
> many kernel interfaces or creating other backend interfaces into the drivers 
> which take different data types (ie. we'd have to bypass the entire block 
> layer when trying to write data in p2pmem to an nvme device. This is very 
> undesirable.
> 

writel has a barrier inside on ARM64.

https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/io.h#L143

Why do you need another barrier?


ACCESSING DEVICES
-----------------

Many devices can be memory mapped, and so appear to the CPU as if they're just
a set of memory locations.  To control such a device, the driver usually has to
make the right memory accesses in exactly the right order.

However, having a clever CPU or a clever compiler creates a potential problem
in that the carefully sequenced accesses in the driver code won't reach the
device in the requisite order if the CPU or the compiler thinks it is more
efficient to reorder, combine or merge accesses - something that would cause
the device to malfunction.
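
In other words, the path under discussion boils down to something like the
simplified sketch below (shadow-doorbell handling dropped, driver-internal
nvme_queue fields assumed as in drivers/nvme/host/pci.c, so this is not the
actual driver code). The writel() at the end already orders the SQE bytes
ahead of the doorbell on arm64:

static void submit_sqe_sketch(struct nvme_queue *nvmeq, struct nvme_command *cmd)
{
        u16 tail = nvmeq->sq_tail;

        /* SQE copied into the (possibly CMB-backed) submission queue */
        memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));

        if (++tail == nvmeq->q_depth)
                tail = 0;

        /*
         * On arm64, writel() is a write barrier followed by a relaxed
         * store, so the SQE is visible to the device before the doorbell.
         */
        writel(tail, nvmeq->q_db);
        nvmeq->sq_tail = tail;
}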

Re: [PATCH v4 3/8] dmaengine: Add DMA_MEMCPY_SG transaction op

2017-08-08 Thread Sinan Kaya
Hi Dave,

On 8/7/2017 12:39 PM, Dave Jiang wrote:
> Adding a dmaengine transaction operation that allows copy to/from a
> scatterlist and a flat buffer.
> 
> Signed-off-by: Dave Jiang <dave.ji...@intel.com>
> ---
>  Documentation/dmaengine/provider.txt |    3 +++
>  drivers/dma/dmaengine.c              |    2 ++
>  include/linux/dmaengine.h            |    6 ++++++
>  3 files changed, 11 insertions(+)
> 

I was wondering if you could add a test like DMA_SG to dmatest so that we
have a validation suite as well.

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Sinan Kaya
On 8/2/2017 4:52 PM, Dave Jiang wrote:
>> Do we need a new API / new function, or new capability?
> Hmmm...you are right. I wonder if we need something like DMA_SG cap
> 
> 

Unfortunately, DMA_SG means something else. Maybe we need a DMA_MEMCPY_SG
type, similar to DMA_MEMSET_SG.

enum dma_transaction_type {
        DMA_MEMCPY,
        DMA_XOR,
        DMA_PQ,
        DMA_XOR_VAL,
        DMA_PQ_VAL,
        DMA_MEMSET,
        DMA_MEMSET_SG,
        DMA_INTERRUPT,
        DMA_SG,
        DMA_PRIVATE,
        DMA_ASYNC_TX,
        DMA_SLAVE,
        DMA_CYCLIC,
        DMA_INTERLEAVE,
        /* last transaction type for creation of the capabilities mask */
        DMA_TX_TYPE_END,
};
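
If a new DMA_MEMCPY_SG type is added there, clients could then test for it
through the usual capability mask, e.g. (sketch; DMA_MEMCPY_SG itself is the
hypothetical new value):

#include <linux/dmaengine.h>

/* Sketch: check the capability before falling back to plain memcpy/BIO. */
static bool pmem_can_dma_memcpy_sg(struct dma_chan *chan)
{
        return dma_has_cap(DMA_MEMCPY_SG, chan->device->cap_mask);
}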

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq

2017-08-02 Thread Sinan Kaya
On 8/2/2017 2:41 PM, Dave Jiang wrote:
>   if (queue_mode == PMEM_Q_MQ) {
> + chan = dma_find_channel(DMA_MEMCPY);
> + if (!chan) {
> + queue_mode = PMEM_Q_BIO;
> + dev_warn(dev, "Forced back to PMEM_Q_BIO, no DMA\n");
> + }

We can't expect all MEMCPY capable hardware to support this feature, right?

Do we need a new API / new function, or new capability?

> + }


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [RFC 1/8] Introduce Peer-to-Peer memory (p2pmem) device

2017-04-02 Thread Sinan Kaya
On 4/2/2017 1:21 PM, Logan Gunthorpe wrote:
>> This is when the BIOS date helps so that you don't break existing systems.
> I'm not that worried about this code breaking existing systems. There
> are significant trade-offs with using p2pmem (ie. you are quite likely
> sacrificing performance for memory QOS or upstream PCI bandwidth), and
> therefore the user _has_ to specifically say to use it. This is why
> we've put a flag in the nvme target code that defaults to off. Thus we
> are not going to have a situation where people upgrade their kernels and
> see broken or slow systems. People _have_ to make the decision to turn
> it on and decide based on their use case whether it's appropriate.
> 

OK. I didn't know the feature was not enabled by default. This is even 
easier now. 

Push the decision all the way to the user. Let them decide whether they want
this feature to work on a port connected to a root port or under a switch.
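
Even a simple opt-in knob would do, e.g. (hypothetical parameter, sketch
only):

#include <linux/moduleparam.h>

/*
 * Hypothetical opt-in: let the administrator explicitly allow P2P when the
 * devices sit under a root port rather than a switch.
 */
static bool allow_p2p_via_root_port;
module_param(allow_p2p_via_root_port, bool, 0644);
MODULE_PARM_DESC(allow_p2p_via_root_port,
                 "Allow peer-to-peer DMA between devices under a root port");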

>> We can't guarantee all switches will work either. See above for instructions
>> on when this feature should be enabled.
> It's a lot easier to say that all switches will work than it is for root
> ports. This is essentially what switches are designed for, so I'd be
> surprised to find one that doesn't work. Root ports are the trouble here
> seeing it's a lot more likely for them to be designed without
> considering that traffic needs to move between ports efficiently. If we
> do find extremely broken switches that don't support this then we'd
> probably want to create a black list for that. Also, there's
> significantly fewer PCI switch products on the market than there are
> root port instances, so a black list would be much easier to manage there.
> 

I thought the issue was that the feature didn't work at all with some root
ports, or that there was some kind of memory corruption issue you were trying
to avoid on existing systems.

If you are just worried about performance, the switch recommendation belongs
in your particular product's tuning guide or a howto document, not in the
actual code itself.

I think you should get rid of all the PCI searching business in your code.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


Re: [RFC 1/8] Introduce Peer-to-Peer memory (p2pmem) device

2017-03-31 Thread Sinan Kaya
On 3/31/2017 6:42 PM, Logan Gunthorpe wrote:
> 
> 
> On 31/03/17 03:38 PM, Sinan Kaya wrote:
>> On 3/31/2017 5:23 PM, Logan Gunthorpe wrote:
>>> What exactly would you white/black list? It can't be the NIC or the
>>> disk. If it's going to be a white/black list on the switch or root port
>>> then you'd need essentially the same code to ensure they are all behind
>>> the same switch or root port.
>>
>> What is so special about being connected to the same switch?
>>
>> Why don't we allow the feature by default and blacklist the root ports that
>> don't work via a quirk?
> 
> Well root ports have the same issue here. There may be more than one
> root port or other buses (ie QPI) between the devices in question. So
> you can't just say "this system has root port X therefore we can always
> use p2pmem". 

We only care about devices on the data path between two devices.

> In the end, if you want to do any kind of restrictions
> you're going to have to walk the tree, as the code currently does, and
> figure out what's between the devices being used and black or white list
> accordingly. Then seeing there's just such a vast number of devices out
> there you'd almost certainly have to use some kind of white list and not
> a black list. Then the question becomes which devices will be white
> listed? 

How about a combination of blacklist + time bomb + peer-to-peer feature?

You could put a restriction in place based on the DMI/SMBIOS date such that
all devices from 2016 onwards work; everything older goes on the blacklist.
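
I.e. something along these lines (sketch; the 2016 cut-off is just the
example date):

#include <linux/dmi.h>

/*
 * Sketch: refuse P2P on platforms with a BIOS older than a cut-off year,
 * unless they are explicitly whitelisted elsewhere.
 */
static bool p2p_platform_new_enough(void)
{
        int year = 0;

        if (!dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) || !year)
                return false;

        return year >= 2016;
}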

> The first to be listed would be switches seeing they will always
> work. This is pretty much what we have (though it doesn't currently
> cover multiple levels of switches). The next step, if someone wanted to
> test with specific hardware, might be to allow the case where all the
> devices are behind the same root port which Intel Ivy Bridge or newer.

Sorry, I'm not familiar with Intel architecture. Based on what you just
wrote, I think I see your point. 

I'm trying to generalize what you are doing to a slightly bigger context so
that I can use it on another architecture like arm64, where I may or may not
have a switch.

The text below sort of repeats what you wrote above.

How about this (a rough sketch in code follows the list):

The goal is to find a common parent between any two devices that need to
use your code. 

- All bridges/switches on the data path need to support peer-to-peer,
otherwise stop.

- Make sure that all devices on the data path are not blacklisted via your
code.

- If at least one device is blacklisted, we stop and the feature is not
allowed.

- If we find a common parent and no errors, you are good to go.

- We don't care whether the devices above the common parent have some feature
X, Y, Z or not.
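
Roughly, in code (a sketch only; the blacklist checks and any error reporting
are omitted):

#include <linux/pci.h>

/*
 * Sketch: walk up from each device with pci_upstream_bridge() and return
 * the first common ancestor, or NULL if there is none (e.g. the devices
 * hang off different root complexes).
 */
static struct pci_dev *p2p_common_parent(struct pci_dev *a, struct pci_dev *b)
{
        struct pci_dev *up_a, *up_b;

        for (up_a = pci_upstream_bridge(a); up_a;
             up_a = pci_upstream_bridge(up_a)) {
                for (up_b = pci_upstream_bridge(b); up_b;
                     up_b = pci_upstream_bridge(up_b)) {
                        if (up_a == up_b)
                                return up_a;
                }
        }

        return NULL;
}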

Maybe a little bit less code than what you have, but it is flexible and not
too hard to implement.

Well, the code is an RFC. I don't see why we can't remove some restrictions
and still have your code move forward.

> However, I don't think a comprehensive white list should be a
> requirement for this work to go forward and I don't think anything
> you've suggested will remove any of the "ugliness".

I don't think the ask above is a very big deal. If you feel like addressing
this in another patch set, as you suggested in your cover letter, I'm fine
with that too.

> 
> What we discussed at LSF was that only allowing cases with a switch was
> the simplest way to be sure any given setup would actually work.
> 
>> I'm looking at this from a portability perspective, to be honest.
> 
> I'm looking at this from the fact that there's a vast number of
> topologies and devices involved, and figuring out which will work is
> very complicated and could require a lot of hardware testing. The LSF
> folks were primarily concerned with not having users enable the feature
> and see breakage or terrible performance.
> 
>> I'd rather see the feature enabled by default without any assumptions.
>> Using it with a switch is just a use case that you happened to test.
>> It can allow new architectures to use your code tomorrow.
> 
> That's why I was advocating for letting userspace decide such that if
> you're setting up a system with this you say to use a specific p2pmem
> device and then you are responsible to test and benchmark it and decide
> to use it in going forward. However, this has received a lot of push back.

Yeah, we shouldn't trust the userspace for such things.

> 
> Logan
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.