Re: [PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-14 Thread Daniel Vetter
On Tue, Jul 14, 2020 at 01:36:41PM +0200, Christian König wrote:
> Am 14.07.20 um 11:53 schrieb Daniel Vetter:
> > On Tue, Jul 14, 2020 at 11:28:12AM +0200, Christian König wrote:
> > > Am 14.07.20 um 10:58 schrieb Daniel Vetter:
> > > > On Tue, Jul 14, 2020 at 02:26:36PM +1000, Dave Airlie wrote:
> > > > > On Tue, 14 Jul 2020 at 14:09, Felix Kuehling  
> > > > > wrote:
> > > > > > Am 2020-07-13 um 11:28 p.m. schrieb Dave Airlie:
> > > > > > > On Tue, 14 Jul 2020 at 13:14, Felix Kuehling 
> > > > > > >  wrote:
> > > > > > > > This allows exporting and importing buffers. The API generates 
> > > > > > > > handles
> > > > > > > > that can be used with the HIP IPC API, i.e. big numbers rather 
> > > > > > > > than
> > > > > > > > file descriptors.
> > > > > > > First up why? I get the how.
> > > > > > The "why" is compatibility with HIP code ported from CUDA. The
> > > > > > equivalent CUDA IPC API works with handles that can be communicated
> > > > > > through e.g. a pipe or shared memory. You can't do that with file
> > > > > > descriptors.
> > > > > Okay that sort of useful information should definitely be in the patch
> > > > > description.
> > > > > 
> > > > > > https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g8a37f7dfafaca652391d0758b3667539
> > > > > > 
> > > > > > https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g01050a29fefde385b1042081ada4cde9
> > > > > > 
> > > > > > > > + * @share_handle is a 128 bit random number generated with
> > > > > > > > + * @get_random_bytes. This number should be very hard to guess.
> > > > > > > > + * Knowledge of the @share_handle implies authorization to 
> > > > > > > > access
> > > > > > > > + * the shared memory. User mode should treat it like a secret 
> > > > > > > > key.
> > > > > > > > + * It can be used to import this BO in a different process 
> > > > > > > > context
> > > > > > > > + * for IPC buffer sharing. The handle will be valid as long as 
> > > > > > > > the
> > > > > > > > + * underlying BO exists. If the same BO is exported multiple 
> > > > > > > > times,
> > > > > > > Do we have any examples of any APIs in the kernel that operate 
> > > > > > > like
> > > > > > > this? That don't at least layer some sort of file permissions  and
> > > > > > > access control on top?
> > > > > > SystemV shared memory APIs (shmget, shmat) work similarly. There are
> > > > > > some permissions that can be specified by the exporter in shmget.
> > > > > > However, the handles are just numbers and much easier to guess 
> > > > > > (they are
> > > > > > 32-bit integers). The most restrictive permissions would allow only 
> > > > > > the
> > > > > > exporting UID to attach to the shared memory segment.
> > > > > > 
> > > > > > I think DRM flink works similarly as well, with a global name IDR 
> > > > > > used
> > > > > > for looking up GEM objects using global object names.
> > > > > flink is why I asked, because flink was a mistake and not one I'd care
> > > > > to make again.
> > > > > shm is horrible also, but at least has some permissions on what users
> > > > > can attack it.
> > > > Yeah this smells way too much like flink. I had the same reaction, and
> > > > kinda sad that we have to do this because nvidia defines how this works
> > > > with 0 input from anyone else. Oh well, the world sucks.
> > > > 
> > > > > > > The reason fd's are good is that combined with unix sockets, you 
> > > > > > > can't
> > > > > > > sniff it, you can't ptrace a process and find it, you can't write 
> > > > > > > it
> > > > > > > out in a coredump and have someone access it later.
> > > > > > Arguably ptrace and core dumps give you access to all the memory
> > > > > > contents already. So you don't need the shared memory handle to 
> > > > > > access
> > > > > > memory in that case.
> > > > > core dumps might not dump this memory though, but yeah ptrace would
> > > > > likely already mean you have access.
> > > > > 
> > > > > > > Maybe someone who knows security can ack merging this sort of uAPI
> > > > > > > design, I'm not confident in what it's doing is in any ways a good
> > > > > > > idea. I might have to ask some people to take a closer look.
> > > > > > Please do. We have tried to make this API as secure as possible 
> > > > > > within
> > > > > > the constraints of the user mode API we needed to implement.
> > > > > I'll see if I hear back, b

Re: [PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-14 Thread Christian König

Am 14.07.20 um 11:53 schrieb Daniel Vetter:

On Tue, Jul 14, 2020 at 11:28:12AM +0200, Christian König wrote:

Am 14.07.20 um 10:58 schrieb Daniel Vetter:

On Tue, Jul 14, 2020 at 02:26:36PM +1000, Dave Airlie wrote:

On Tue, 14 Jul 2020 at 14:09, Felix Kuehling  wrote:

Am 2020-07-13 um 11:28 p.m. schrieb Dave Airlie:

On Tue, 14 Jul 2020 at 13:14, Felix Kuehling  wrote:

This allows exporting and importing buffers. The API generates handles
that can be used with the HIP IPC API, i.e. big numbers rather than
file descriptors.

First up why? I get the how.

The "why" is compatibility with HIP code ported from CUDA. The
equivalent CUDA IPC API works with handles that can be communicated
through e.g. a pipe or shared memory. You can't do that with file
descriptors.

Okay that sort of useful information should definitely be in the patch
description.


https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g8a37f7dfafaca652391d0758b3667539

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g01050a29fefde385b1042081ada4cde9


+ * @share_handle is a 128 bit random number generated with
+ * @get_random_bytes. This number should be very hard to guess.
+ * Knowledge of the @share_handle implies authorization to access
+ * the shared memory. User mode should treat it like a secret key.
+ * It can be used to import this BO in a different process context
+ * for IPC buffer sharing. The handle will be valid as long as the
+ * underlying BO exists. If the same BO is exported multiple times,

Do we have any examples of any APIs in the kernel that operate like
this? That don't at least layer some sort of file permissions  and
access control on top?

SystemV shared memory APIs (shmget, shmat) work similarly. There are
some permissions that can be specified by the exporter in shmget.
However, the handles are just numbers and much easier to guess (they are
32-bit integers). The most restrictive permissions would allow only the
exporting UID to attach to the shared memory segment.

I think DRM flink works similarly as well, with a global name IDR used
for looking up GEM objects using global object names.

flink is why I asked, because flink was a mistake and not one I'd care
to make again.
shm is horrible also, but at least has some permissions on what users
can attack it.

Yeah this smells way too much like flink. I had the same reaction, and
kinda sad that we have to do this because nvidia defines how this works
with 0 input from anyone else. Oh well, the world sucks.


The reason fd's are good is that combined with unix sockets, you can't
sniff it, you can't ptrace a process and find it, you can't write it
out in a coredump and have someone access it later.

Arguably ptrace and core dumps give you access to all the memory
contents already. So you don't need the shared memory handle to access
memory in that case.

core dumps might not dump this memory though, but yeah ptrace would
likely already mean you have access.


Maybe someone who knows security can ack merging this sort of uAPI
design, I'm not confident in what it's doing is in any ways a good
idea. I might have to ask some people to take a closer look.

Please do. We have tried to make this API as secure as possible within
the constraints of the user mode API we needed to implement.

I'll see if I hear back, but also if danvet has any input like I
suppose it's UUID based buffer access, so maybe 128-bit is enough and
you have enough entropy not to create anything insanely predictable.

So one idea that crossed my mind is if we don't want to do this as a
generic dma-buf handle converter.

Something like /dev/dri/cuda_is_nasty (maybe slightly nicer name) which
provides a generic dma-buf <-> cuda uuid converter. With separate access
restrictions, so admins can decide whether they want to allow this
silliness, or not. Anyone else who wants to reimplement cuda will need
this too, so that's another reason for splitting this out.

Wrt security: I think assuming that there's none and the lookup has a
side-channel you can use to efficiently scan the entire range is probably
the safe approach here. This is way out of my league, but I think people
who know how to do this won't have a much harder time scanning this than
the flink space.

Also, if we have one common uuid->dma-buf converter, we might actually
have a chance to proof the "it's not secur

Re: [PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-14 Thread Daniel Vetter
On Tue, Jul 14, 2020 at 11:28:12AM +0200, Christian König wrote:
> Am 14.07.20 um 10:58 schrieb Daniel Vetter:
> > On Tue, Jul 14, 2020 at 02:26:36PM +1000, Dave Airlie wrote:
> > > On Tue, 14 Jul 2020 at 14:09, Felix Kuehling  
> > > wrote:
> > > > Am 2020-07-13 um 11:28 p.m. schrieb Dave Airlie:
> > > > > On Tue, 14 Jul 2020 at 13:14, Felix Kuehling  
> > > > > wrote:
> > > > > > This allows exporting and importing buffers. The API generates 
> > > > > > handles
> > > > > > that can be used with the HIP IPC API, i.e. big numbers rather than
> > > > > > file descriptors.
> > > > > First up why? I get the how.
> > > > The "why" is compatibility with HIP code ported from CUDA. The
> > > > equivalent CUDA IPC API works with handles that can be communicated
> > > > through e.g. a pipe or shared memory. You can't do that with file
> > > > descriptors.
> > > Okay that sort of useful information should definitely be in the patch
> > > description.
> > > 
> > > > https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g8a37f7dfafaca652391d0758b3667539
> > > > 
> > > > https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g01050a29fefde385b1042081ada4cde9
> > > > 
> > > > > > + * @share_handle is a 128 bit random number generated with
> > > > > > + * @get_random_bytes. This number should be very hard to guess.
> > > > > > + * Knowledge of the @share_handle implies authorization to access
> > > > > > + * the shared memory. User mode should treat it like a secret key.
> > > > > > + * It can be used to import this BO in a different process context
> > > > > > + * for IPC buffer sharing. The handle will be valid as long as the
> > > > > > + * underlying BO exists. If the same BO is exported multiple times,
> > > > > Do we have any examples of any APIs in the kernel that operate like
> > > > > this? That don't at least layer some sort of file permissions  and
> > > > > access control on top?
> > > > SystemV shared memory APIs (shmget, shmat) work similarly. There are
> > > > some permissions that can be specified by the exporter in shmget.
> > > > However, the handles are just numbers and much easier to guess (they are
> > > > 32-bit integers). The most restrictive permissions would allow only the
> > > > exporting UID to attach to the shared memory segment.
> > > > 
> > > > I think DRM flink works similarly as well, with a global name IDR used
> > > > for looking up GEM objects using global object names.
> > > flink is why I asked, because flink was a mistake and not one I'd care
> > > to make again.
> > > shm is horrible also, but at least has some permissions on what users
> > > can attack it.
> > Yeah this smells way too much like flink. I had the same reaction, and
> > kinda sad that we have to do this because nvidia defines how this works
> > with 0 input from anyone else. Oh well, the world sucks.
> > 
> > > > > The reason fd's are good is that combined with unix sockets, you can't
> > > > > sniff it, you can't ptrace a process and find it, you can't write it
> > > > > out in a coredump and have someone access it later.
> > > > Arguably ptrace and core dumps give you access to all the memory
> > > > contents already. So you don't need the shared memory handle to access
> > > > memory in that case.
> > > core dumps might not dump this memory though, but yeah ptrace would
> > > likely already mean you have access.
> > > 
> > > > > Maybe someone who knows security can ack merging this sort of uAPI
> > > > > design, I'm not confident in what it's doing is in any ways a good
> > > > > idea. I might have to ask some people to take a closer look.
> > > > Please do. We have tried to make this API as secure as possible within
> > > > the constraints of the user mode API we needed to implement.
> > > I'll see if I hear back, but also if danvet has any input like I
> > > suppose it's UUID based buffer access, so maybe 128-bit is enough and
> > > you have enough entropy not to create anything insanely predictable.
> > So one idea that crossed my mind is if we don't want to do this as a
> > generic dma-buf handle converter.
> > 
> > Something like /dev/dri/cuda_is_nasty (maybe slightly nicer name) which
> > provides a generic dma-buf <-> cuda uuid converter. With separate access
> > restrictions, so admins can decide whether they want to allow this
> > silliness, or not. Anyone else who wants to reimplement cuda will need
> > this too, so that's another reason for splitting this out.
> > 
> > Wrt security: I think assuming that there's none and the lookup has a
> > side-channel you can use to efficiently scan the entire range is probably
> > the safe approach here. This is way out of my league, but I think people
> > who know how to do this won't have a much harder time scanning this than
> > the flink space.
> > 
> > Also, if we have one common uuid->dma-buf converter, we might actually
> > have a chance to proof the "it's not secure" assumpti

Re: [PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-14 Thread Christian König

Am 14.07.20 um 10:58 schrieb Daniel Vetter:

On Tue, Jul 14, 2020 at 02:26:36PM +1000, Dave Airlie wrote:

On Tue, 14 Jul 2020 at 14:09, Felix Kuehling  wrote:

Am 2020-07-13 um 11:28 p.m. schrieb Dave Airlie:

On Tue, 14 Jul 2020 at 13:14, Felix Kuehling  wrote:

This allows exporting and importing buffers. The API generates handles
that can be used with the HIP IPC API, i.e. big numbers rather than
file descriptors.

First up why? I get the how.

The "why" is compatibility with HIP code ported from CUDA. The
equivalent CUDA IPC API works with handles that can be communicated
through e.g. a pipe or shared memory. You can't do that with file
descriptors.

Okay that sort of useful information should definitely be in the patch
description.


https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g8a37f7dfafaca652391d0758b3667539

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g01050a29fefde385b1042081ada4cde9


+ * @share_handle is a 128 bit random number generated with
+ * @get_random_bytes. This number should be very hard to guess.
+ * Knowledge of the @share_handle implies authorization to access
+ * the shared memory. User mode should treat it like a secret key.
+ * It can be used to import this BO in a different process context
+ * for IPC buffer sharing. The handle will be valid as long as the
+ * underlying BO exists. If the same BO is exported multiple times,

Do we have any examples of any APIs in the kernel that operate like
this? That don't at least layer some sort of file permissions  and
access control on top?

SystemV shared memory APIs (shmget, shmat) work similarly. There are
some permissions that can be specified by the exporter in shmget.
However, the handles are just numbers and much easier to guess (they are
32-bit integers). The most restrictive permissions would allow only the
exporting UID to attach to the shared memory segment.

I think DRM flink works similarly as well, with a global name IDR used
for looking up GEM objects using global object names.

flink is why I asked, because flink was a mistake and not one I'd care
to make again.
shm is horrible also, but at least has some permissions on what users
can attack it.

Yeah this smells way too much like flink. I had the same reaction, and
kinda sad that we have to do this because nvidia defines how this works
with 0 input from anyone else. Oh well, the world sucks.


The reason fd's are good is that combined with unix sockets, you can't
sniff it, you can't ptrace a process and find it, you can't write it
out in a coredump and have someone access it later.

Arguably ptrace and core dumps give you access to all the memory
contents already. So you don't need the shared memory handle to access
memory in that case.

core dumps might not dump this memory though, but yeah ptrace would
likely already mean you have access.


Maybe someone who knows security can ack merging this sort of uAPI
design, I'm not confident in what it's doing is in any ways a good
idea. I might have to ask some people to take a closer look.

Please do. We have tried to make this API as secure as possible within
the constraints of the user mode API we needed to implement.

I'll see if I hear back, but also if danvet has any input like I
suppose it's UUID based buffer access, so maybe 128-bit is enough and
you have enough entropy not to create anything insanely predictable.

So one idea that crossed my mind is if we don't want to do this as a
generic dma-buf handle converter.

Something like /dev/dri/cuda_is_nasty (maybe slightly nicer name) which
provides a generic dma-buf <-> cuda uuid converter. With separate access
restrictions, so admins can decide whether they want to allow this
silliness, or not. Anyone else who wants to reimplement cuda will need
this too, so that's another reason for splitting this out.

Wrt security: I think assuming that there's none and the lookup has a
side-channel you can use to efficiently scan the entire range is probably
the safe approach here. This is way out of my league, but I think people
who know how to do this won't have a much harder time scanning this than
the flink space.

Also, if we have one common uuid->dma-buf converter, we might actually
have a chance to prove the "it's not secure" assumption wrong. Also, we
might be able to tie this into cgroups or namespaces or similar that way.

Just some thoughts to give my initial "eek, why this" reaction a bit more
substance :-) No idea whether this would work or make more sense.


Yeah, my initial reaction was the same. On the pro side, we use 
more than the 32 bits flink did as an identifier.



What we could maybe do to improve this is to link DMA-buf file 
descriptors into the file system from userspace. And then we could just 
do something like:


open("/tmp/dma-buf-0x0123-4567-89AB-CDEF-0123-4567-89AB-CDEF", O_RDWR);

But to be honest I don't really know the fs code that well enough to

Re: [PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-14 Thread Daniel Vetter
On Tue, Jul 14, 2020 at 02:26:36PM +1000, Dave Airlie wrote:
> On Tue, 14 Jul 2020 at 14:09, Felix Kuehling  wrote:
> >
> > Am 2020-07-13 um 11:28 p.m. schrieb Dave Airlie:
> > > On Tue, 14 Jul 2020 at 13:14, Felix Kuehling  
> > > wrote:
> > >> This allows exporting and importing buffers. The API generates handles
> > >> that can be used with the HIP IPC API, i.e. big numbers rather than
> > >> file descriptors.
> > > First up why? I get the how.
> >
> > The "why" is compatibility with HIP code ported from CUDA. The
> > equivalent CUDA IPC API works with handles that can be communicated
> > through e.g. a pipe or shared memory. You can't do that with file
> > descriptors.
> 
> Okay that sort of useful information should definitely be in the patch
> description.
> 
> >
> > https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g8a37f7dfafaca652391d0758b3667539
> >
> > https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g01050a29fefde385b1042081ada4cde9
> >
> > >
> > >> + * @share_handle is a 128 bit random number generated with
> > >> + * @get_random_bytes. This number should be very hard to guess.
> > >> + * Knowledge of the @share_handle implies authorization to access
> > >> + * the shared memory. User mode should treat it like a secret key.
> > >> + * It can be used to import this BO in a different process context
> > >> + * for IPC buffer sharing. The handle will be valid as long as the
> > >> + * underlying BO exists. If the same BO is exported multiple times,
> > > Do we have any examples of any APIs in the kernel that operate like
> > > this? That don't at least layer some sort of file permissions  and
> > > access control on top?
> >
> > SystemV shared memory APIs (shmget, shmat) work similarly. There are
> > some permissions that can be specified by the exporter in shmget.
> > However, the handles are just numbers and much easier to guess (they are
> > 32-bit integers). The most restrictive permissions would allow only the
> > exporting UID to attach to the shared memory segment.
> >
> > I think DRM flink works similarly as well, with a global name IDR used
> > for looking up GEM objects using global object names.
> 
> flink is why I asked, because flink was a mistake and not one I'd care
> to make again.
> shm is horrible also, but at least has some permissions on what users
> can attack it.

Yeah this smells way too much like flink. I had the same reaction, and
kinda sad that we have to do this because nvidia defines how this works
with 0 input from anyone else. Oh well, the world sucks.

> > > The reason fd's are good is that combined with unix sockets, you can't
> > > sniff it, you can't ptrace a process and find it, you can't write it
> > > out in a coredump and have someone access it later.
> >
> > Arguably ptrace and core dumps give you access to all the memory
> > contents already. So you don't need the shared memory handle to access
> > memory in that case.
> 
> core dumps might not dump this memory though, but yeah ptrace would
> likely already mean you have access.
> 
> > > Maybe someone who knows security can ack merging this sort of uAPI
> > > design, I'm not confident in what it's doing is in any ways a good
> > > idea. I might have to ask some people to take a closer look.
> >
> > Please do. We have tried to make this API as secure as possible within
> > the constraints of the user mode API we needed to implement.
> 
> I'll see if I hear back, but also if danvet has any input like I
> suppose it's UUID based buffer access, so maybe 128-bit is enough and
> you have enough entropy not to create anything insanely predictable.

So one idea that crossed my mind is if we don't want to do this as a
generic dma-buf handle converter.

Something like /dev/dri/cuda_is_nasty (maybe slightly nicer name) which
provides a generic dma-buf <-> cuda uuid converter. With separate access
restrictions, so admins can decide whether they want to allow this
silliness, or not. Anyone else who wants to reimplement cuda will need
this too, so that's another reason for splitting this out.
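
To make that a bit more concrete -- purely hypothetical, nothing like this
exists and every name below is invented -- the converter's uAPI could be as
small as two ioctls on a dedicated char device:

/* Hypothetical sketch only: all structs, names and ioctl numbers are made up. */
#include <linux/ioctl.h>
#include <linux/types.h>

struct dmabuf_uuid_export {
	__s32 dmabuf_fd;	/* in: dma-buf to publish */
	__u32 flags;
	__u8  uuid[16];		/* out: freshly generated 128-bit handle */
};

struct dmabuf_uuid_import {
	__u8  uuid[16];		/* in: handle received via pipe/shm/etc. */
	__u32 flags;
	__s32 dmabuf_fd;	/* out: new fd for the same buffer */
};

#define DMABUF_UUID_IOC_EXPORT	_IOWR('u', 0x00, struct dmabuf_uuid_export)
#define DMABUF_UUID_IOC_IMPORT	_IOWR('u', 0x01, struct dmabuf_uuid_import)

Access to that device node would then be the admin-controlled knob mentioned
above.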

Wrt security: I think assuming that there's none and the lookup has a
side-channel you can use to efficiently scan the entire range is probably
the safe approach here. This is way out of my league, but I think people
who know how to do this won't have a much harder time scanning this than
the flink space.

Also, if we have one common uuid->dma-buf converter, we might actually
have a chance to prove the "it's not secure" assumption wrong. Also, we
might be able to tie this into cgroups or namespaces or similar that way.

Just some thoughts to give my initial "eek, why this" reaction a bit more
substance :-) No idea whether this would work or make more sense.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-13 Thread Felix Kuehling
Am 2020-07-14 um 1:05 a.m. schrieb Li, Dennis:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi, Felix,
>   amdgpu_gem_prime_export has a different definition in the old driver. I added 
> some comments in the code below.
>
> Best Regards
> Dennis Li
> -Original Message-
> From: amd-gfx  On Behalf Of Felix 
> Kuehling
> Sent: Tuesday, July 14, 2020 11:15 AM
> To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Cc: Deucher, Alexander ; daniel.vet...@ffwll.ch; 
> airl...@gmail.com
> Subject: [PATCH 1/1] drm/amdkfd: Add IPC API
>
> This allows exporting and importing buffers. The API generates handles that 
> can be used with the HIP IPC API, i.e. big numbers rather than file 
> descriptors.
>
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   5 +
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  56 +++-
>  drivers/gpu/drm/amd/amdkfd/Makefile   |   3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  74 ++---
>  drivers/gpu/drm/amd/amdkfd/kfd_ipc.c  | 263 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_ipc.h  |  55 
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c   |   5 +
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   5 +
>  include/uapi/linux/kfd_ioctl.h|  62 -
>  9 files changed, 488 insertions(+), 40 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_ipc.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_ipc.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 3f2b695cf19e..0f8dc4c4f924 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -49,6 +49,7 @@ struct kfd_bo_va_list {
>  struct kgd_mem {
>   struct mutex lock;
>   struct amdgpu_bo *bo;
> + struct kfd_ipc_obj *ipc_obj;
>   struct list_head bo_va_list;
>   /* protected by amdkfd_process_info.lock */
>   struct ttm_validate_buffer validate_list;
> @@ -240,9 +241,13 @@ int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
>  
>  int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
> struct dma_buf *dmabuf,
> +   struct kfd_ipc_obj *ipc_obj,
> uint64_t va, void *vm,
> struct kgd_mem **mem, uint64_t *size,
> uint64_t *mmap_offset);
> +int amdgpu_amdkfd_gpuvm_export_ipc_obj(struct kgd_dev *kgd, void *vm,
> +struct kgd_mem *mem,
> +struct kfd_ipc_obj **ipc_obj);
>  
>  void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
>  void amdgpu_amdkfd_unreserve_memory_limit(struct amdgpu_bo *bo);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index c408936e8f98..cd5f23c0c2ca 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -29,6 +29,7 @@
>  #include "amdgpu_vm.h"
>  #include "amdgpu_amdkfd.h"
>  #include "amdgpu_dma_buf.h"
> +#include "kfd_ipc.h"
>  #include 
>  
>  /* BO flag to indicate a KFD userptr BO */
> @@ -1353,6 +1354,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>   *size = 0;
>   }
>  
> + /* Unreference the ipc_obj if applicable */
> + kfd_ipc_obj_put(&mem->ipc_obj);
> +
>   /* Free the BO*/
>   drm_gem_object_put_unlocked(&mem->bo->tbo.base);
>   mutex_destroy(&mem->lock);
> @@ -1656,6 +1660,7 @@ int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct 
> kgd_dev *kgd,
>  
>  int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
> struct dma_buf *dma_buf,
> +   struct kfd_ipc_obj *ipc_obj,
> uint64_t va, void *vm,
> struct kgd_mem **mem, uint64_t *size,
> uint64_t *mmap_offset)
> @@ -1692,15 +1697,18 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev 
> *kgd,
>  
>   INIT_LIST_HEAD(&(*mem)->bo_va_list);
>   mutex_init(&(*mem)->lock);
> - 
> - (*mem)->alloc_flags =
> - ((bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) ?
> - KFD_IOC_ALLOC_MEM_FLAGS_VRAM : KFD_IOC_ALLOC_MEM_FLAGS_GTT)
> - | KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE
> -   

RE: [PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-13 Thread Li, Dennis
[AMD Official Use Only - Internal Distribution Only]

Hi, Felix,
  amdgpu_gem_prime_export has a different definition in the old driver. I added 
some comments in the code below.

Best Regards
Dennis Li
-Original Message-
From: amd-gfx  On Behalf Of Felix 
Kuehling
Sent: Tuesday, July 14, 2020 11:15 AM
To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Cc: Deucher, Alexander ; daniel.vet...@ffwll.ch; 
airl...@gmail.com
Subject: [PATCH 1/1] drm/amdkfd: Add IPC API

This allows exporting and importing buffers. The API generates handles that can 
be used with the HIP IPC API, i.e. big numbers rather than file descriptors.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   5 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  56 +++-
 drivers/gpu/drm/amd/amdkfd/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  74 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_ipc.c  | 263 ++
 drivers/gpu/drm/amd/amdkfd/kfd_ipc.h  |  55 
 drivers/gpu/drm/amd/amdkfd/kfd_module.c   |   5 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   5 +
 include/uapi/linux/kfd_ioctl.h|  62 -
 9 files changed, 488 insertions(+), 40 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_ipc.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_ipc.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 3f2b695cf19e..0f8dc4c4f924 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -49,6 +49,7 @@ struct kfd_bo_va_list {
 struct kgd_mem {
struct mutex lock;
struct amdgpu_bo *bo;
+   struct kfd_ipc_obj *ipc_obj;
struct list_head bo_va_list;
/* protected by amdkfd_process_info.lock */
	struct ttm_validate_buffer validate_list;
@@ -240,9 +241,13 @@ int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
 
 int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
  struct dma_buf *dmabuf,
+ struct kfd_ipc_obj *ipc_obj,
  uint64_t va, void *vm,
  struct kgd_mem **mem, uint64_t *size,
  uint64_t *mmap_offset);
+int amdgpu_amdkfd_gpuvm_export_ipc_obj(struct kgd_dev *kgd, void *vm,
+  struct kgd_mem *mem,
+  struct kfd_ipc_obj **ipc_obj);
 
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_unreserve_memory_limit(struct amdgpu_bo *bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index c408936e8f98..cd5f23c0c2ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -29,6 +29,7 @@
 #include "amdgpu_vm.h"
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_dma_buf.h"
+#include "kfd_ipc.h"
 #include 
 
 /* BO flag to indicate a KFD userptr BO */
@@ -1353,6 +1354,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
*size = 0;
}
 
+   /* Unreference the ipc_obj if applicable */
+   kfd_ipc_obj_put(&mem->ipc_obj);
+
/* Free the BO*/
drm_gem_object_put_unlocked(&mem->bo->tbo.base);
mutex_destroy(&mem->lock);
@@ -1656,6 +1660,7 @@ int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev 
*kgd,
 
 int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
  struct dma_buf *dma_buf,
+ struct kfd_ipc_obj *ipc_obj,
  uint64_t va, void *vm,
  struct kgd_mem **mem, uint64_t *size,
  uint64_t *mmap_offset)
@@ -1692,15 +1697,18 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev 
*kgd,
 
INIT_LIST_HEAD(&(*mem)->bo_va_list);
mutex_init(&(*mem)->lock);
-   
-   (*mem)->alloc_flags =
-   ((bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) ?
-   KFD_IOC_ALLOC_MEM_FLAGS_VRAM : KFD_IOC_ALLOC_MEM_FLAGS_GTT)
-   | KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE
-   | KFD_IOC_ALLOC_MEM_FLAGS_EXECUTABLE;
+   if (bo->kfd_bo)
+   (*mem)->alloc_flags = bo->kfd_bo->alloc_flags;
+   else
+   (*mem)->alloc_flags =
+   ((bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) ?
+   KFD_IOC_ALLOC_MEM_FLAGS_VRAM : 
KFD_IOC_ALLOC_MEM_FLAGS_GTT)
+   | KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE
+   | KFD_IOC_ALLOC_MEM_FLAGS_EXECUTABLE;
 
drm_gem_object_get(&bo->tbo.bas

Re: [PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-13 Thread Dave Airlie
On Tue, 14 Jul 2020 at 14:09, Felix Kuehling  wrote:
>
> Am 2020-07-13 um 11:28 p.m. schrieb Dave Airlie:
> > On Tue, 14 Jul 2020 at 13:14, Felix Kuehling  wrote:
> >> This allows exporting and importing buffers. The API generates handles
> >> that can be used with the HIP IPC API, i.e. big numbers rather than
> >> file descriptors.
> > First up why? I get the how.
>
> The "why" is compatibility with HIP code ported from CUDA. The
> equivalent CUDA IPC API works with handles that can be communicated
> through e.g. a pipe or shared memory. You can't do that with file
> descriptors.

Okay that sort of useful information should definitely be in the patch
description.

>
> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g8a37f7dfafaca652391d0758b3667539
>
> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g01050a29fefde385b1042081ada4cde9
>
> >
> >> + * @share_handle is a 128 bit random number generated with
> >> + * @get_random_bytes. This number should be very hard to guess.
> >> + * Knowledge of the @share_handle implies authorization to access
> >> + * the shared memory. User mode should treat it like a secret key.
> >> + * It can be used to import this BO in a different process context
> >> + * for IPC buffer sharing. The handle will be valid as long as the
> >> + * underlying BO exists. If the same BO is exported multiple times,
> > Do we have any examples of any APIs in the kernel that operate like
> > this? That don't at least layer some sort of file permissions  and
> > access control on top?
>
> SystemV shared memory APIs (shmget, shmat) work similarly. There are
> some permissions that can be specified by the exporter in shmget.
> However, the handles are just numbers and much easier to guess (they are
> 32-bit integers). The most restrictive permissions would allow only the
> exporting UID to attach to the shared memory segment.
>
> I think DRM flink works similarly as well, with a global name IDR used
> for looking up GEM objects using global object names.

flink is why I asked, because flink was a mistake and not one I'd care
to make again.
shm is horrible also, but at least has some permissions on what users
can attack it.

> > The reason fd's are good is that combined with unix sockets, you can't
> > sniff it, you can't ptrace a process and find it, you can't write it
> > out in a coredump and have someone access it later.
>
> Arguably ptrace and core dumps give you access to all the memory
> contents already. So you don't need the shared memory handle to access
> memory in that case.

core dumps might not dump this memory though, but yeah ptrace would
likely already mean you have access.

> > Maybe someone who knows security can ack merging this sort of uAPI
> > design, I'm not confident in what it's doing is in any ways a good
> > idea. I might have to ask some people to take a closer look.
>
> Please do. We have tried to make this API as secure as possible within
> the constraints of the user mode API we needed to implement.

I'll see if I hear back, but also if danvet has any input like I
suppose it's UUID based buffer access, so maybe 128-bit is enough and
you have enough entropy not to create anything insanely predictable.

Dave.


Re: [PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-13 Thread Felix Kuehling
Am 2020-07-13 um 11:28 p.m. schrieb Dave Airlie:
> On Tue, 14 Jul 2020 at 13:14, Felix Kuehling  wrote:
>> This allows exporting and importing buffers. The API generates handles
>> that can be used with the HIP IPC API, i.e. big numbers rather than
>> file descriptors.
> First up why? I get the how.

The "why" is compatibility with HIP code ported from CUDA. The
equivalent CUDA IPC API works with handles that can be communicated
through e.g. a pipe or shared memory. You can't do that with file
descriptors.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g8a37f7dfafaca652391d0758b3667539

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g01050a29fefde385b1042081ada4cde9
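
For illustration, the usage model those APIs (and their HIP equivalents) imply
is roughly the following; a sketch with error handling omitted. The handle is
just a blob of bytes, so any IPC channel such as a pipe works:

#include <hip/hip_runtime.h>
#include <unistd.h>

/* Exporting process: allocate device memory and push the opaque handle
 * through an ordinary pipe. */
static void export_side(int pipe_wr)
{
	void *dptr;
	hipIpcMemHandle_t handle;

	hipMalloc(&dptr, 1 << 20);
	hipIpcGetMemHandle(&handle, dptr);
	write(pipe_wr, &handle, sizeof(handle));	/* just bytes */
}

/* Importing process: read the handle and map the same buffer. */
static void import_side(int pipe_rd)
{
	void *dptr;
	hipIpcMemHandle_t handle;

	read(pipe_rd, &handle, sizeof(handle));
	hipIpcOpenMemHandle(&dptr, handle, hipIpcMemLazyEnablePeerAccess);
}

A file descriptor, by contrast, only crosses process boundaries via fork/exec
inheritance or SCM_RIGHTS over a Unix socket.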


>
>> + * @share_handle is a 128 bit random number generated with
>> + * @get_random_bytes. This number should be very hard to guess.
>> + * Knowledge of the @share_handle implies authorization to access
>> + * the shared memory. User mode should treat it like a secret key.
>> + * It can be used to import this BO in a different process context
>> + * for IPC buffer sharing. The handle will be valid as long as the
>> + * underlying BO exists. If the same BO is exported multiple times,
> Do we have any examples of any APIs in the kernel that operate like
> this? That don't at least layer some sort of file permissions  and
> access control on top?

SystemV shared memory APIs (shmget, shmat) work similarly. There are
some permissions that can be specified by the exporter in shmget.
However, the handles are just numbers and much easier to guess (they are
32-bit integers). The most restrictive permissions would allow only the
exporting UID to attach to the shared memory segment.
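
For comparison, a minimal SysV sketch -- the whole "handle" is a 32-bit key,
and the 0600 mode bits are the only gate (error handling omitted):

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
	key_t key = 0x1234;	/* 32-bit, trivially guessable */

	/* Exporter: create the segment with owner-only permissions. */
	int shmid = shmget(key, 4096, IPC_CREAT | 0600);
	void *exp = shmat(shmid, NULL, 0);

	/* Importer (same UID): knowing the key is enough to attach. */
	void *imp = shmat(shmget(key, 0, 0), NULL, 0);

	printf("segment %d mapped at %p and %p\n", shmid, exp, imp);
	return 0;
}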

I think DRM flink works similarly as well, with a global name IDR used
for looking up GEM objects using global object names.
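
And the flink model boils down to two ioctls; the global name is a small
32-bit value from that IDR, which is what makes it easy to probe (sketch):

#include <stdint.h>
#include <xf86drm.h>
#include <drm/drm.h>

/* Exporter: publish a GEM handle under a global 32-bit name. */
static int gem_flink(int drm_fd, uint32_t handle, uint32_t *name)
{
	struct drm_gem_flink flink = { .handle = handle };
	int ret = drmIoctl(drm_fd, DRM_IOCTL_GEM_FLINK, &flink);

	*name = flink.name;	/* small, global, easy to probe */
	return ret;
}

/* Importer: any process that can open the device node can try names. */
static int gem_open_name(int drm_fd, uint32_t name, uint32_t *handle)
{
	struct drm_gem_open op = { .name = name };
	int ret = drmIoctl(drm_fd, DRM_IOCTL_GEM_OPEN, &op);

	*handle = op.handle;
	return ret;
}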


>
> The reason fd's are good is that combined with unix sockets, you can't
> sniff it, you can't ptrace a process and find it, you can't write it
> out in a coredump and have someone access it later.

Arguably ptrace and core dumps give you access to all the memory
contents already. So you don't need the shared memory handle to access
memory in that case.


>
> To me this isn't secure design, it's obscure design, now I get to find
> you've likely shipped this in "downstream" ROCm already, and have
> customers running it.
>
> Maybe someone who knows security can ack merging this sort of uAPI
> design, I'm not confident in what it's doing is in any ways a good
> idea. I might have to ask some people to take a closer look.

Please do. We have tried to make this API as secure as possible within
the constraints of the user mode API we needed to implement.

Thanks,
  Felix


>
> Dave.


Re: [PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-13 Thread Dave Airlie
On Tue, 14 Jul 2020 at 13:14, Felix Kuehling  wrote:
>
> This allows exporting and importing buffers. The API generates handles
> that can be used with the HIP IPC API, i.e. big numbers rather than
> file descriptors.

First up why? I get the how.

> + * @share_handle is a 128 bit random number generated with
> + * @get_random_bytes. This number should be very hard to guess.
> + * Knowledge of the @share_handle implies authorization to access
> + * the shared memory. User mode should treat it like a secret key.
> + * It can be used to import this BO in a different process context
> + * for IPC buffer sharing. The handle will be valid as long as the
> + * underlying BO exists. If the same BO is exported multiple times,
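
For reference, the generation side described in that comment amounts to a
single get_random_bytes() call; a sketch with an illustrative struct name,
not the patch's actual type:

#include <linux/random.h>
#include <linux/types.h>

/* Sketch only; the struct name is illustrative. */
struct ipc_share_handle {
	u32 data[4];	/* 128 bits of randomness, used as a capability */
};

static void ipc_gen_share_handle(struct ipc_share_handle *h)
{
	get_random_bytes(h->data, sizeof(h->data));
}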

Do we have any examples of any APIs in the kernel that operate like
this? That don't at least layer some sort of file permissions  and
access control on top?

The reason fd's are good is that combined with unix sockets, you can't
sniff it, you can't ptrace a process and find it, you can't write it
out in a coredump and have someone access it later.
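
Concretely, that fd-passing model looks like this; the receiver uses recvmsg()
with the same cmsg layout, and only the process holding the other end of the
socket ever sees the descriptor (sketch, error handling omitted):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send an open fd (e.g. a dma-buf) to whoever holds the other end of a
 * Unix domain socket. */
static int send_fd(int sock, int fd)
{
	char dummy = 0;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(u.buf, 0, sizeof(u.buf));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0);
}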

To me this isn't secure design, it's obscure design, now I get to find
you've likely shipped this in "downstream" ROCm already, and have
customers running it.

Maybe someone who knows security can ack merging this sort of uAPI
design; I'm not confident that what it's doing is in any way a good
idea. I might have to ask some people to take a closer look.

Dave.


[PATCH 1/1] drm/amdkfd: Add IPC API

2020-07-13 Thread Felix Kuehling
This allows exporting and importing buffers. The API generates handles
that can be used with the HIP IPC API, i.e. big numbers rather than
file descriptors.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|   5 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  56 +++-
 drivers/gpu/drm/amd/amdkfd/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  74 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_ipc.c  | 263 ++
 drivers/gpu/drm/amd/amdkfd/kfd_ipc.h  |  55 
 drivers/gpu/drm/amd/amdkfd/kfd_module.c   |   5 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   5 +
 include/uapi/linux/kfd_ioctl.h|  62 -
 9 files changed, 488 insertions(+), 40 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_ipc.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_ipc.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 3f2b695cf19e..0f8dc4c4f924 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -49,6 +49,7 @@ struct kfd_bo_va_list {
 struct kgd_mem {
struct mutex lock;
struct amdgpu_bo *bo;
+   struct kfd_ipc_obj *ipc_obj;
struct list_head bo_va_list;
/* protected by amdkfd_process_info.lock */
struct ttm_validate_buffer validate_list;
@@ -240,9 +241,13 @@ int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev 
*kgd,
 
 int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
  struct dma_buf *dmabuf,
+ struct kfd_ipc_obj *ipc_obj,
  uint64_t va, void *vm,
  struct kgd_mem **mem, uint64_t *size,
  uint64_t *mmap_offset);
+int amdgpu_amdkfd_gpuvm_export_ipc_obj(struct kgd_dev *kgd, void *vm,
+  struct kgd_mem *mem,
+  struct kfd_ipc_obj **ipc_obj);
 
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_unreserve_memory_limit(struct amdgpu_bo *bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index c408936e8f98..cd5f23c0c2ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -29,6 +29,7 @@
 #include "amdgpu_vm.h"
 #include "amdgpu_amdkfd.h"
 #include "amdgpu_dma_buf.h"
+#include "kfd_ipc.h"
 #include 
 
 /* BO flag to indicate a KFD userptr BO */
@@ -1353,6 +1354,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
*size = 0;
}
 
+   /* Unreference the ipc_obj if applicable */
+   kfd_ipc_obj_put(&mem->ipc_obj);
+
/* Free the BO*/
drm_gem_object_put_unlocked(&mem->bo->tbo.base);
mutex_destroy(&mem->lock);
@@ -1656,6 +1660,7 @@ int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev 
*kgd,
 
 int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
  struct dma_buf *dma_buf,
+ struct kfd_ipc_obj *ipc_obj,
  uint64_t va, void *vm,
  struct kgd_mem **mem, uint64_t *size,
  uint64_t *mmap_offset)
@@ -1692,15 +1697,18 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev 
*kgd,
 
INIT_LIST_HEAD(&(*mem)->bo_va_list);
mutex_init(&(*mem)->lock);
-   
-   (*mem)->alloc_flags =
-   ((bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) ?
-   KFD_IOC_ALLOC_MEM_FLAGS_VRAM : KFD_IOC_ALLOC_MEM_FLAGS_GTT)
-   | KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE
-   | KFD_IOC_ALLOC_MEM_FLAGS_EXECUTABLE;
+   if (bo->kfd_bo)
+   (*mem)->alloc_flags = bo->kfd_bo->alloc_flags;
+   else
+   (*mem)->alloc_flags =
+   ((bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) ?
+   KFD_IOC_ALLOC_MEM_FLAGS_VRAM : 
KFD_IOC_ALLOC_MEM_FLAGS_GTT)
+   | KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE
+   | KFD_IOC_ALLOC_MEM_FLAGS_EXECUTABLE;
 
drm_gem_object_get(&bo->tbo.base);
(*mem)->bo = bo;
+   (*mem)->ipc_obj = ipc_obj;
(*mem)->va = va;
(*mem)->domain = (bo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) ?
AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT;
@@ -1713,6 +1721,42 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev 
*kgd,
return 0;
 }
 
+int amdgpu_amdkfd_gpuvm_export_ipc_obj(struct kgd_dev *kgd, void *vm,
+  struct kgd_mem *mem,
+  struct kfd_ipc_obj **ipc_obj)
+{
+   struct amdgpu_device *adev = NULL;
+   struct dma_buf *dmabuf;
+   int r =