Re: [PATCH V3 1/4] mm: Define coherent device memory (CDM) node

2017-02-21 Thread Anshuman Khandual
On 02/17/2017 07:35 PM, Bob Liu wrote:
> Hi Anshuman,
> 
> I have a few questions about coherent device memory.

Sure.

> 
> On Wed, Feb 15, 2017 at 8:07 PM, Anshuman Khandual
>  wrote:
>> There are certain devices like specialized accelerators, GPU cards, network
>> cards, FPGA cards etc. which might contain onboard memory that is coherent
>> with the existing system RAM while being accessed either from the CPU
>> or from the device. They share some properties with normal
> 
> What's the general size of this kind of memory?

It's comparable to available system RAM sizes, and not as large as
persistent storage or NVDIMM capacities.

> 
>> system RAM but at the same time can also differ from it in some respects.
>>
>> User applications might be interested in using this kind of coherent device
> 
> What kind of applications?

Applications which want to use both CPU compute and device compute on the
same allocated buffer, transparently. For example, an application may load
the problem data into the allocated buffer and then ask the device, through
its driver, to compute results from it.
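
As an illustrative sketch (the /dev/accel0 node, the ACCEL_COMPUTE ioctl
and struct accel_job below are hypothetical, not part of this series),
the flow from the application's point of view looks like:

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct accel_job {			/* hypothetical job descriptor */
	void *buf;
	size_t len;
};

#define ACCEL_COMPUTE	_IOW('A', 1, struct accel_job)	/* hypothetical */

int main(void)
{
	size_t len = 1 << 20;
	char *buf = malloc(len);	/* one buffer for CPU and device */
	int fd = open("/dev/accel0", O_RDWR);
	struct accel_job job = { .buf = buf, .len = len };

	if (!buf || fd < 0)
		return 1;

	memset(buf, 0x2a, len);		/* CPU loads the problem data */
	ioctl(fd, ACCEL_COMPUTE, &job);	/* device computes in place */
	/* CPU consumes the results from the same virtual buffer */

	close(fd);
	free(buf);
	return 0;
}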


> 
>> memory explicitly or implicitly alongside the system RAM, utilizing all
>> possible core memory functions like anon mapping (LRU), file mapping (LRU),
>> page cache (LRU), driver managed (non LRU), HW poisoning, NUMA migrations
> 
> I don't see the benefit of managing the onboard memory the same way as system RAM.
> Why not just map this kind of onboard memory to userspace directly?
> Then only those specific applications can manage/access/use it.

Integration with the core MM, along with driver-assisted migrations, gives
the application the ability to use the allocated buffer seamlessly from
either the CPU or the device without worrying about the actual physical
placement of the pages. That enables a hybrid CPU/device compute paradigm
which cannot be achieved by mapping the device memory directly into user
space.
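
For example, once the device memory is just another NUMA node, even user
space can place the pages of a buffer explicitly with the existing
move_pages(2) syscall; a minimal sketch, assuming 'node' is the CDM node
id (link with -lnuma):

#include <numaif.h>	/* move_pages(), MPOL_MF_MOVE */
#include <stdlib.h>
#include <unistd.h>

/* Migrate every page of 'buf' to the given NUMA node. */
static long move_buffer_to_node(void *buf, size_t size, int node)
{
	long psz = sysconf(_SC_PAGESIZE);
	unsigned long count = (size + psz - 1) / psz;
	void **pages = malloc(count * sizeof(*pages));
	int *nodes = malloc(count * sizeof(*nodes));
	int *status = malloc(count * sizeof(*status));
	unsigned long i;
	long ret;

	for (i = 0; i < count; i++) {
		pages[i] = (char *)buf + i * psz;
		nodes[i] = node;
	}
	ret = move_pages(0 /* self */, count, pages, nodes, status,
			 MPOL_MF_MOVE);
	free(pages);
	free(nodes);
	free(status);
	return ret;
}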

> 
> It does not sound like a good idea to complicate the core memory framework
> significantly for some not widely used devices and uncertain applications.

The applications are not uncertain; they intend to use this framework to
achieve hybrid CPU/device compute working transparently on the same
allocated virtual buffer. IIUC we would want the Linux kernel to enable
new device technologies regardless of whether they are widely used or
not.



Re: [PATCH V3 1/4] mm: Define coherent device memory (CDM) node

2017-02-17 Thread Bob Liu
Hi Anshuman,

I have a few questions about coherent device memory.

On Wed, Feb 15, 2017 at 8:07 PM, Anshuman Khandual
 wrote:
> There are certain devices like specialized accelerators, GPU cards, network
> cards, FPGA cards etc. which might contain onboard memory that is coherent
> with the existing system RAM while being accessed either from the CPU
> or from the device. They share some properties with normal

What's the general size of this kind of memory?

> system RAM but at the same time can also differ from it in some respects.
>
> User applications might be interested in using this kind of coherent device

What kind of applications?

> memory explicitly or implicitly alongside the system RAM, utilizing all
> possible core memory functions like anon mapping (LRU), file mapping (LRU),
> page cache (LRU), driver managed (non LRU), HW poisoning, NUMA migrations

I don't see the benefit of managing the onboard memory the same way as system RAM.
Why not just map this kind of onboard memory to userspace directly?
Then only those specific applications can manage/access/use it.

It does not sound like a good idea to complicate the core memory framework
significantly for some not widely used devices and uncertain applications.

--
Regards,
Bob

> etc. To achieve this kind of tight integration with the core memory
> subsystem, the device onboard coherent memory must be represented as a
> memory-only NUMA node. At the same time the arch must export a function to
> identify this node as coherent device memory, as opposed to any other
> regular CPU-less memory-only NUMA node.
>
> After achieving integration with the core memory subsystem, coherent device
> memory might still need some special consideration inside the kernel. There
> can be a variety of coherent memory nodes with different expectations of
> the core kernel memory. But right now only one kind of special treatment is
> considered, which requires certain isolation.
>
> Now consider the case of a coherent device memory node type which requires
> isolation. This kind of coherent memory is onboard an external device
> attached to the system through a link, where there is always a chance of a
> link failure taking down the entire memory node with it. Moreover, the
> memory might also have a higher chance of ECC failure compared to the
> system RAM. Hence allocation into this kind of coherent memory node should
> be regulated. Kernel allocations must not come here. Normal user space
> allocations too should not come here implicitly (without the user
> application knowing about it). This summarizes the isolation requirement of
> one kind of coherent device memory node as an example. There can be other
> kinds of isolation requirements as well.
>
> Some coherent memory devices might not require isolation at all. There
> might be other coherent memory devices which require some other special
> treatment after becoming part of the core memory representation. For now,
> this series looks into isolation-seeking coherent device memory nodes, not
> the other ones.
>
> To implement the integration as well as the isolation, the coherent memory
> node must be present in N_MEMORY and in a new N_COHERENT_DEVICE node mask
> inside the node_states[] array. During memory hotplug operations, the new
> nodemask N_COHERENT_DEVICE is updated along with N_MEMORY for these coherent
> device memory nodes. This also creates the following new sysfs based
> interface to list all the coherent memory nodes of the system.
>
> /sys/devices/system/node/is_cdm_node
>
> Architectures must export the function arch_check_node_cdm(), which
> identifies any coherent device memory node, when they enable
> CONFIG_COHERENT_DEVICE.
>
> Signed-off-by: Anshuman Khandual 
> ---
>  Documentation/ABI/stable/sysfs-devices-node |  7 
>  arch/powerpc/Kconfig|  1 +
>  arch/powerpc/mm/numa.c  |  7 
>  drivers/base/node.c |  6 +++
>  include/linux/nodemask.h| 58 -
>  mm/Kconfig  |  4 ++
>  mm/memory_hotplug.c |  3 ++
>  mm/page_alloc.c |  8 +++-
>  8 files changed, 91 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> index 5b2d0f0..5df18f7 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -29,6 +29,13 @@ Description:
> Nodes that have regular or high memory.
> Depends on CONFIG_HIGHMEM.
>
> +What:  /sys/devices/system/node/is_cdm_node
> +Date:  January 2017
> +Contact:   Linux Memory Management list 
> +Description:
> +   Lists the nodemask of nodes that have coherent device memory.
> +   Depends on CONFIG_COHERENT_DEVICE.
> +
>  What:  

[PATCH V3 1/4] mm: Define coherent device memory (CDM) node

2017-02-15 Thread Anshuman Khandual
There are certain devices like specialized accelerators, GPU cards, network
cards, FPGA cards etc. which might contain onboard memory that is coherent
with the existing system RAM while being accessed either from the CPU or
from the device. They share some properties with normal system RAM but at
the same time can also differ from it in some respects.

User applications might be interested in using this kind of coherent device
memory explicitly or implicitly alongside the system RAM, utilizing all
possible core memory functions like anon mapping (LRU), file mapping (LRU),
page cache (LRU), driver managed (non LRU), HW poisoning, NUMA migrations
etc. To achieve this kind of tight integration with the core memory
subsystem, the device onboard coherent memory must be represented as a
memory-only NUMA node. At the same time the arch must export a function to
identify this node as coherent device memory, as opposed to any other
regular CPU-less memory-only NUMA node.
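
(Illustrative sketch, not from this patch: a driver could bring such
onboard memory online as a memory-only node through the existing hotplug
API; cdm_nid, dev_mem_start and dev_mem_size are hypothetical values the
driver would discover from its hardware.)

#include <linux/memory_hotplug.h>

static int cdm_add_device_memory(int cdm_nid, u64 dev_mem_start,
				 u64 dev_mem_size)
{
	/* Hot-add the device memory as a (memory only) NUMA node */
	return add_memory(cdm_nid, dev_mem_start, dev_mem_size);
}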

After achieving integration with the core memory subsystem, coherent device
memory might still need some special consideration inside the kernel. There
can be a variety of coherent memory nodes with different expectations of
the core kernel memory. But right now only one kind of special treatment is
considered, which requires certain isolation.

Now consider the case of a coherent device memory node type which requires
isolation. This kind of coherent memory is onboard an external device
attached to the system through a link, where there is always a chance of a
link failure taking down the entire memory node with it. Moreover, the
memory might also have a higher chance of ECC failure compared to the
system RAM. Hence allocation into this kind of coherent memory node should
be regulated. Kernel allocations must not come here. Normal user space
allocations too should not come here implicitly (without the user
application knowing about it). This summarizes the isolation requirement of
one kind of coherent device memory node as an example. There can be other
kinds of isolation requirements as well.
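
As a sketch of what an explicit, application-aware allocation could look
like, the existing mbind(2) syscall already suffices; 'node' here is a
hypothetical CDM node id read from the sysfs interface mentioned below:

#include <stddef.h>
#include <numaif.h>	/* mbind(), MPOL_BIND */
#include <sys/mman.h>

/* Allocate an anonymous buffer whose pages may only come from 'node'. */
static void *alloc_on_node(size_t size, int node)
{
	unsigned long nodemask = 1UL << node;
	void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return NULL;
	/* MPOL_BIND restricts all future faults in this range to 'node' */
	if (mbind(buf, size, MPOL_BIND, &nodemask,
		  8 * sizeof(nodemask), 0)) {
		munmap(buf, size);
		return NULL;
	}
	return buf;
}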

Some coherent memory devices might not require isolation at all. There
might be other coherent memory devices which require some other special
treatment after becoming part of the core memory representation. For now,
this series looks into isolation-seeking coherent device memory nodes, not
the other ones.

To implement the integration as well as the isolation, the coherent memory
node must be present in N_MEMORY and in a new N_COHERENT_DEVICE node mask
inside the node_states[] array. During memory hotplug operations, the new
nodemask N_COHERENT_DEVICE is updated along with N_MEMORY for these coherent
device memory nodes. This also creates the following new sysfs based
interface to list all the coherent memory nodes of the system.

/sys/devices/system/node/is_cdm_node
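
With the new node mask in place, a kernel-side check could look as
follows; my_node_is_cdm() is an illustrative helper, not something this
patch adds:

#include <linux/nodemask.h>

/* Does this node carry coherent device memory? */
static inline bool my_node_is_cdm(int nid)
{
	return node_state(nid, N_COHERENT_DEVICE);
}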

Architectures must export the function arch_check_node_cdm(), which
identifies any coherent device memory node, when they enable
CONFIG_COHERENT_DEVICE.
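
A sketch of one possible shape of this hook, assuming the platform
derives CDM-ness from firmware or device tree information (the actual
powerpc implementation is in the numa.c hunk, truncated in this
archive):

#include <linux/nodemask.h>

/* Hypothetical nodemask populated from firmware data at boot */
static nodemask_t cdm_node_map;

int arch_check_node_cdm(int nid)
{
	return node_isset(nid, cdm_node_map);
}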

Signed-off-by: Anshuman Khandual 
---
 Documentation/ABI/stable/sysfs-devices-node |  7 
 arch/powerpc/Kconfig|  1 +
 arch/powerpc/mm/numa.c  |  7 
 drivers/base/node.c |  6 +++
 include/linux/nodemask.h| 58 -
 mm/Kconfig  |  4 ++
 mm/memory_hotplug.c |  3 ++
 mm/page_alloc.c |  8 +++-
 8 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index 5b2d0f0..5df18f7 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -29,6 +29,13 @@ Description:
Nodes that have regular or high memory.
Depends on CONFIG_HIGHMEM.
 
+What:  /sys/devices/system/node/is_cdm_node
+Date:  January 2017
+Contact:   Linux Memory Management list 
+Description:
+   Lists the nodemask of nodes that have coherent device memory.
+   Depends on CONFIG_COHERENT_DEVICE.
+
 What:  /sys/devices/system/node/nodeX
 Date:  October 2002
 Contact:   Linux Memory Management list 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 281f4f1..1cff239 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -164,6 +164,7 @@ config PPC
select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE
select HAVE_ARCH_HARDENED_USERCOPY
select HAVE_KERNEL_GZIP
+   select COHERENT_DEVICE if PPC_BOOK3S_64 && NEED_MULTIPLE_NODES
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b1099cb..14f0b98 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -41,6 +41,13 @@
 #include