Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-16 Thread Greg KH
On Wed, Jul 16, 2014 at 10:21:14AM +0200, Daniel Vetter wrote:
> On Tue, Jul 15, 2014 at 7:53 PM, Bridgman, John  wrote:
> [snip away the discussion about hsa device discover, I'm hijacking
> this thread just for the event/fence stuff here.]
> 
> > ... There's an event mechanism still to come - mostly for communicating 
> > fences and shader interrupts back to userspace, but also used for "device 
> > change" notifications, so no polling of sysfs.
> 
> That would be interesting. On i915 my plan is to internally use the
> recently added struct fence from Maarten. For the external interface
> for userspace that wants explicit control over fences I'm leaning
> towards polishing the android syncpt stuff (currently in staging). But
> in any case I _really_ want to avoid ending up with multiple different
> and incompatible explicit fencing interfaces on Linux.

I agree, and I'll say it stronger than that: we WILL NOT have different
and incompatible fencing interfaces in the kernel. That way lies
madness.

John, take a look at what is now in linux-next, it should provide what
you need here, right?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-16 Thread Daniel Vetter
On Wed, Jul 16, 2014 at 10:52:56AM -0400, Jerome Glisse wrote:
> On Wed, Jul 16, 2014 at 10:27:42AM +0200, Daniel Vetter wrote:
> > On Tue, Jul 15, 2014 at 8:04 PM, Jerome Glisse  wrote:
> > >> Yes, although it can be skipped on most systems. We figured that topology
> > >> needed to cover everything that would be handled by a single OS image, so
> > >> in a NUMA system it would need to cover all the CPUs. I think that is still
> > >> the right scope, do you agree?
> > >
> > > I think it is a bad idea to duplicate the cpu topology. I would rather have
> > > each device give its affinity against each cpu, and for the cpu just keep
> > > the existing kernel api that exposes this through sysfs, iirc.
> > 
> > It's all there already if we fix up the hsa dev-node model to expose
> > one dev node per underlying device instead of one for everything:
> > - cpus already expose the full numa topology in sysfs
> > - pci devices have a numa_node file in sysfs to display the link
> > - we can easily add similar stuff for platform devices on arm socs
> > without pci devices.
> > 
> > Then the only thing userspace needs to do is follow the device link in
> > the hsa instance node in sysfs and we have all the information
> > exposed. Iff we expose one hsa driver instance to userspace per
> > physical device (which is the normal linux device driver model
> > anyway).
> > 
> > I don't see a need to add anything hsa specific here at all (well
> > maybe some description of the cache architecture on the hsa device
> > itself, the spec seems to have provisions for that).
> > -Daniel
> 
> What is HSA specific is the userspace command queue, in the form of a
> common ring-buffer execution queue where all queues share a common packet
> format. So yes, I see a reason for an HSA class that provides common
> ioctls through one dev file per device. Note that I am not a fan of
> userspace command queues, given that linux ioctl overhead is small and
> having the kernel do the work would allow a really "infinite" number of
> userspace contexts, while right now the limit is
> DOORBELL_APERTURE_SIZE/PAGE_SIZE.
> 
> No, the CPU should not be included, and neither should the numa topology
> of the device. And yes, all numa topology should use the existing kernel
> interfaces. I do understand, however, that a second GPU-specific topology
> might make sense, e.g. if you have specialized links between some
> discrete GPUs.
> 
> So if Intel wants to join the HSA foundation, fine, but unless you are
> ready to implement what is needed I do not see the value of forcing your
> wishes on another group that is trying to standardize something.

You're mixing up my replies ;-) This was really just a comment on the
proposed hsa interfaces for exposing the topology - we already have all
of this exposed in sysfs for cpus and pci devices, so exposing it again
through an hsa-specific interface doesn't make much sense imo.

What Intel does or does not do is completely irrelevant to my comment,
i.e. I've written the above with my drm hacker hat on, not with my Intel
hat on.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-16 Thread Jerome Glisse
On Wed, Jul 16, 2014 at 10:27:42AM +0200, Daniel Vetter wrote:
> On Tue, Jul 15, 2014 at 8:04 PM, Jerome Glisse  wrote:
> >> Yes, although it can be skipped on most systems. We figured that topology
> >> needed to cover everything that would be handled by a single OS image, so
> >> in a NUMA system it would need to cover all the CPUs. I think that is still
> >> the right scope, do you agree?
> >
> > I think it is a bad idea to duplicate the cpu topology. I would rather have
> > each device give its affinity against each cpu, and for the cpu just keep
> > the existing kernel api that exposes this through sysfs, iirc.
> 
> It's all there already if we fix up the hsa dev-node model to expose
> one dev node per underlying device instead of one for everything:
> - cpus already expose the full numa topology in sysfs
> - pci devices have a numa_node file in sysfs to display the link
> - we can easily add similar stuff for platform devices on arm socs
> without pci devices.
> 
> Then the only thing userspace needs to do is follow the device link in
> the hsa instance node in sysfs and we have all the information
> exposed. Iff we expose one hsa driver instance to userspace per
> physical device (which is the normal linux device driver model
> anyway).
> 
> I don't see a need to add anything hsa specific here at all (well
> maybe some description of the cache architecture on the hsa device
> itself, the spec seems to have provisions for that).
> -Daniel

What is HSA specific is the userspace command queue, in the form of a
common ring-buffer execution queue where all queues share a common packet
format. So yes, I see a reason for an HSA class that provides common
ioctls through one dev file per device. Note that I am not a fan of
userspace command queues, given that linux ioctl overhead is small and
having the kernel do the work would allow a really "infinite" number of
userspace contexts, while right now the limit is
DOORBELL_APERTURE_SIZE/PAGE_SIZE.
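
To make the shape of that concrete, here is a purely illustrative sketch of
what a per-device character device with a common queue-creation ioctl could
look like; every name below is made up, and none of it is the interface from
this patch series:

/*
 * Hypothetical uapi sketch: userspace opens /dev/hsa/card0, /dev/hsa/card1,
 * ... and issues the same ioctls against each device file.
 */
#include <linux/ioctl.h>
#include <linux/types.h>

struct hsa_create_queue {          /* all fields illustrative */
        __u64 ring_base;           /* userspace ring buffer address     */
        __u32 ring_size;           /* ring size in bytes, power of two  */
        __u32 priority;            /* scheduling priority of this queue */
        __u64 doorbell_offset;     /* out: mmap offset of the doorbell  */
        __u32 queue_id;            /* out: handle for later ioctls      */
        __u32 pad;
};

#define HSA_IOC_MAGIC         'H'  /* placeholder magic number */
#define HSA_IOC_CREATE_QUEUE  _IOWR(HSA_IOC_MAGIC, 1, struct hsa_create_queue)
#define HSA_IOC_DESTROY_QUEUE _IOW(HSA_IOC_MAGIC, 2, __u32)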

No, the CPU should not be included, and neither should the numa topology
of the device. And yes, all numa topology should use the existing kernel
interfaces. I do understand, however, that a second GPU-specific topology
might make sense, e.g. if you have specialized links between some
discrete GPUs.

So if Intel wants to join the HSA foundation, fine, but unless you are
ready to implement what is needed I do not see the value of forcing your
wishes on another group that is trying to standardize something.

Cheers,
Jérôme
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-16 Thread Daniel Vetter
On Tue, Jul 15, 2014 at 8:04 PM, Jerome Glisse  wrote:
>> Yes, although it can be skipped on most systems. We figured that topology
>> needed to cover everything that would be handled by a single OS image, so
>> in a NUMA system it would need to cover all the CPUs. I think that is still
>> the right scope, do you agree?
>
> I think it is a bad idea to duplicate the cpu topology. I would rather have
> each device give its affinity against each cpu, and for the cpu just keep
> the existing kernel api that exposes this through sysfs, iirc.

It's all there already if we fix up the hsa dev-node model to expose
one dev node per underlying device instead of one for everything:
- cpus already expose the full numa topology in sysfs
- pci devices have a numa_node file in sysfs to display the link
- we can easily add similar stuff for platform devices on arm socs
without pci devices.

Then the only thing userspace needs to do is follow the device link in
the hsa instance node in sysfs and we have all the information
exposed. Iff we expose one hsa driver instance to userspace per
physical device (which is the normal linux device driver model
anyway).
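
As a minimal sketch of the userspace side of that, assuming the accelerator
is PCI-backed and its address was found by following the (hypothetical)
per-device node's "device" link, the NUMA affinity is a plain sysfs read:

#include <stdio.h>

/* Read the NUMA node the kernel already exposes for a PCI device. */
static int pci_numa_node(const char *bdf)      /* e.g. "0000:01:00.0" */
{
        char path[128];
        int node = -1;
        FILE *f;

        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/numa_node", bdf);
        f = fopen(path, "r");
        if (!f)
                return -1;             /* no such device or no sysfs */
        if (fscanf(f, "%d", &node) != 1)
                node = -1;             /* -1 also means "no affinity" */
        fclose(f);
        return node;
}

int main(void)
{
        printf("accelerator sits on NUMA node %d\n",
               pci_numa_node("0000:01:00.0"));
        return 0;
}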

I don't see a need to add anything hsa specific here at all (well
maybe some description of the cache architecture on the hsa device
itself, the spec seems to have provisions for that).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-16 Thread Daniel Vetter
On Tue, Jul 15, 2014 at 7:53 PM, Bridgman, John  wrote:
[snip away the discussion about hsa device discover, I'm hijacking
this thread just for the event/fence stuff here.]

> ... There's an event mechanism still to come - mostly for communicating 
> fences and shader interrupts back to userspace, but also used for "device 
> change" notifications, so no polling of sysfs.

That would be interesting. On i915 my plan is to internally use the
recently added struct fence from Maarten. For the external interface
for userspace that wants explicit control over fences I'm leaning
towards polishing the android syncpt stuff (currently in staging). But
in any case I _really_ want to avoid ending up with multiple different
and incompatible explicit fencing interfaces on Linux.
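
The userspace-visible core of any such explicit fencing interface is just a
fence file descriptor to wait on; a small sketch, assuming the fd follows the
staging sync driver's convention of signalling POLLIN once the work completes
(how the fd is obtained is left open here):

#include <poll.h>
#include <errno.h>

/* Wait for a fence fd to signal; 0 on completion, -1 on error or timeout. */
static int fence_wait(int fence_fd, int timeout_ms)
{
        struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
        int ret;

        do {
                ret = poll(&pfd, 1, timeout_ms);
        } while (ret < 0 && errno == EINTR);   /* retry if interrupted */

        return ret == 1 ? 0 : -1;
}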

Adding relevant people.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Jerome Glisse
On Tue, Jul 15, 2014 at 05:53:32PM +, Bridgman, John wrote:
> >From: Jerome Glisse [mailto:j.gli...@gmail.com]
> >Sent: Tuesday, July 15, 2014 1:37 PM
> >To: Bridgman, John
> >Cc: Dave Airlie; Christian König; Lewycky, Andrew; linux-
> >ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
> >Alexander; a...@linux-foundation.org
> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
> >
> >On Tue, Jul 15, 2014 at 05:06:56PM +, Bridgman, John wrote:
> >> >From: Dave Airlie [mailto:airl...@gmail.com]
> >> >Sent: Tuesday, July 15, 2014 12:35 AM
> >> >To: Christian König
> >> >Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
> >> >ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
> >> >Alexander; a...@linux-foundation.org
> >> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
> >> >
> >> >On 14 July 2014 18:37, Christian König  wrote:
> >> >>> I vote for HSA module that expose ioctl and is an intermediary
> >> >>> with the kernel driver that handle the hardware. This gives a
> >> >>> single point for HSA hardware and yes this enforce things for any
> >> >>> hardware
> >> >manufacturer.
> >> >>> I am more than happy to tell them that this is it and nothing else
> >> >>> if they want to get upstream.
> >> >>
> >> >> I think we should still discuss this single point of entry a bit more.
> >> >>
> >> >> Just to make it clear the plan is to expose all physical HSA
> >> >> capable devices through a single /dev/hsa device node to userspace.
> >> >
> >> >This is why we don't design kernel interfaces in secret foundations,
> >> >and expect anyone to like them.
> >>
> >> Understood and agree. In this case though this isn't a cross-vendor
> >> interface designed by a secret committee, it's supposed to be more of
> >> an inoffensive little single-vendor interface designed *for* a secret
> >> committee. I'm hoping that's better ;)
> >>
> >> >
> >> >So before we go any further, how is this stuff planned to work for
> >> >multiple GPUs/accelerators?
> >>
> >> Three classes of "multiple" :
> >>
> >> 1. Single CPU with IOMMUv2 and multiple GPUs:
> >>
> >> - all devices accessible via /dev/kfd
> >> - topology information identifies CPU + GPUs, each has "node ID" at
> >> top of userspace API, "global ID" at user/kernel interface  (don't
> >> think we've implemented CPU part yet though)
> >> - userspace builds snapshot from sysfs info & exposes to HSAIL
> >> runtime, which in turn exposes the "standard" API
> >
> >This is why I do not like the sysfs approach; it would be a lot nicer to have
> >a device file per provider so that hsail can listen for device file events and
> >discover whether hardware is vanishing or appearing. Periodically going over
> >sysfs files is not the right way to do that.
> 
> Agree that wouldn't be good. There's an event mechanism still to come - mostly
> for communicating fences and shader interrupts back to userspace, but also 
> used
> for "device change" notifications, so no polling of sysfs.
> 

My point being: do not use sysfs, use /dev/hsa/device* and have hsail listen for
file events on the /dev/hsa/ directory. The hsail runtime would be informed of
new devices and of devices that are unloaded. It would do a first pass to open
each device file and get their capabilities through standardized ioctls.
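
A rough sketch of that discovery flow; the /dev/hsa/ directory and the
HSA_IOC_GET_CAPS ioctl are hypothetical placeholders, not anything in the
posted patches:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
        /* Buffer aligned for the inotify_event structs read() packs into it. */
        char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
        int ifd = inotify_init();

        if (ifd < 0 || inotify_add_watch(ifd, "/dev/hsa", IN_CREATE | IN_DELETE) < 0)
                return 1;

        for (;;) {
                ssize_t len = read(ifd, buf, sizeof(buf));
                if (len <= 0)
                        break;
                for (char *p = buf; p < buf + len; ) {
                        struct inotify_event *ev = (struct inotify_event *)p;

                        if (ev->mask & IN_CREATE) {
                                char path[300];
                                snprintf(path, sizeof(path), "/dev/hsa/%s", ev->name);
                                int fd = open(path, O_RDWR);
                                if (fd >= 0) {
                                        /* ioctl(fd, HSA_IOC_GET_CAPS, &caps);  placeholder */
                                        printf("new HSA device: %s\n", path);
                                        close(fd);
                                }
                        } else if (ev->mask & IN_DELETE) {
                                printf("HSA device gone: %s\n", ev->name);
                        }
                        p += sizeof(*ev) + ev->len;
                }
        }
        return 0;
}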

Though maybe sysfs is ok, given that cpu numa is exposed through sysfs.

> >
> >> - kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2
> >> (fast for APU, relatively less so for dGPU over PCIE)
> >> - to-be-added memory operations allow allocation & residency control
> >> (within existing gfx driver limits) of buffers in VRAM & carved-out
> >> system RAM
> >> - queue operations specify a node ID to userspace library, which
> >> translates to "global ID" before calling kfd
> >>
> >> 2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
> >more GPUs:
> >>
> >> - topology information exposes CPUs & GPUs, along with affinity info
> >> showing what is connected to what
> >> - everything else works as in (1) above
> >>
> >
> >This is supposed to be part of HSA? This is a lot broader than I thought.
> 
> Yes although it

RE: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Bridgman, John


>-Original Message-
>From: Jerome Glisse [mailto:j.gli...@gmail.com]
>Sent: Tuesday, July 15, 2014 1:37 PM
>To: Bridgman, John
>Cc: Dave Airlie; Christian König; Lewycky, Andrew; linux-
>ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
>Alexander; a...@linux-foundation.org
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On Tue, Jul 15, 2014 at 05:06:56PM +, Bridgman, John wrote:
>> >From: Dave Airlie [mailto:airl...@gmail.com]
>> >Sent: Tuesday, July 15, 2014 12:35 AM
>> >To: Christian König
>> >Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>> >ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
>> >Alexander; a...@linux-foundation.org
>> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>> >
>> >On 14 July 2014 18:37, Christian König  wrote:
>> >>> I vote for HSA module that expose ioctl and is an intermediary
>> >>> with the kernel driver that handle the hardware. This gives a
>> >>> single point for HSA hardware and yes this enforce things for any
>> >>> hardware
>> >manufacturer.
>> >>> I am more than happy to tell them that this is it and nothing else
>> >>> if they want to get upstream.
>> >>
>> >> I think we should still discuss this single point of entry a bit more.
>> >>
>> >> Just to make it clear the plan is to expose all physical HSA
>> >> capable devices through a single /dev/hsa device node to userspace.
>> >
>> >This is why we don't design kernel interfaces in secret foundations,
>> >and expect anyone to like them.
>>
>> Understood and agree. In this case though this isn't a cross-vendor
>> interface designed by a secret committee, it's supposed to be more of
>> an inoffensive little single-vendor interface designed *for* a secret
>> committee. I'm hoping that's better ;)
>>
>> >
>> >So before we go any further, how is this stuff planned to work for
>> >multiple GPUs/accelerators?
>>
>> Three classes of "multiple" :
>>
>> 1. Single CPU with IOMMUv2 and multiple GPUs:
>>
>> - all devices accessible via /dev/kfd
>> - topology information identifies CPU + GPUs, each has "node ID" at
>> top of userspace API, "global ID" at user/kernel interface  (don't
>> think we've implemented CPU part yet though)
>> - userspace builds snapshot from sysfs info & exposes to HSAIL
>> runtime, which in turn exposes the "standard" API
>
>This is why I do not like the sysfs approach; it would be a lot nicer to have a
>device file per provider so that hsail can listen for device file events and
>discover whether hardware is vanishing or appearing. Periodically going over
>sysfs files is not the right way to do that.

Agree that wouldn't be good. There's an event mechanism still to come - mostly 
for communicating fences and shader interrupts back to userspace, but also used 
for "device change" notifications, so no polling of sysfs.

>
>> - kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2
>> (fast for APU, relatively less so for dGPU over PCIE)
>> - to-be-added memory operations allow allocation & residency control
>> (within existing gfx driver limits) of buffers in VRAM & carved-out
>> system RAM
>> - queue operations specify a node ID to userspace library, which
>> translates to "global ID" before calling kfd
>>
>> 2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
>more GPUs:
>>
>> - topology information exposes CPUs & GPUs, along with affinity info
>> showing what is connected to what
>> - everything else works as in (1) above
>>
>
>This is supposed to be part of HSA? This is a lot broader than I thought.

Yes, although it can be skipped on most systems. We figured that topology needed
to cover everything that would be handled by a single OS image, so in a NUMA
system it would need to cover all the CPUs. I think that is still the right
scope, do you agree?

>
>> 3. Multiple CPUs not connected via fabric (eg a blade server) each
>> with 0 or more GPUs
>>
>> - no attempt to cover this with HSA topology, each CPU and associated
>> GPUs is accessed independently via separate /dev/kfd instances
>>
>> >
>> >Do we have a userspace to exercise this interface so we can see how
>> >such a thing would look?
>>
>> Yes -- initial IP review done, legal stuff done, sanitizing WIP,
>> hoping for final approval this week
>>
>> There's a separate test harness to exercise the userspace lib calls,
>> haven't started IP review or sanitizing for that but legal stuff is
>> done
>>
>> >
>> >Dave.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Bridgman, John


>-Original Message-
>From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf
>Of Bridgman, John
>Sent: Tuesday, July 15, 2014 1:07 PM
>To: Dave Airlie; Christian König
>Cc: Lewycky, Andrew; linux-kernel@vger.kernel.org; dri-
>de...@lists.freedesktop.org; Deucher, Alexander; akpm@linux-
>foundation.org
>Subject: RE: [PATCH 00/83] AMD HSA kernel driver
>
>
>
>>-Original Message-
>>From: Dave Airlie [mailto:airl...@gmail.com]
>>Sent: Tuesday, July 15, 2014 12:35 AM
>>To: Christian König
>>Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>>ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
>>Alexander; a...@linux-foundation.org
>>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>>
>>On 14 July 2014 18:37, Christian König  wrote:
>>>> I vote for HSA module that expose ioctl and is an intermediary with
>>>> the kernel driver that handle the hardware. This gives a single
>>>> point for HSA hardware and yes this enforce things for any hardware
>>manufacturer.
>>>> I am more than happy to tell them that this is it and nothing else
>>>> if they want to get upstream.
>>>
>>> I think we should still discuss this single point of entry a bit more.
>>>
>>> Just to make it clear the plan is to expose all physical HSA capable
>>> devices through a single /dev/hsa device node to userspace.
>>
>>This is why we don't design kernel interfaces in secret foundations,
>>and expect anyone to like them.
>
>Understood and agree. In this case though this isn't a cross-vendor interface
>designed by a secret committee, it's supposed to be more of an inoffensive
>little single-vendor interface designed *for* a secret committee. I'm hoping
>that's better ;)
>
>>
>>So before we go any further, how is this stuff planned to work for
>>multiple GPUs/accelerators?
>
>Three classes of "multiple" :
>
>1. Single CPU with IOMMUv2 and multiple GPUs:
>
>- all devices accessible via /dev/kfd
>- topology information identifies CPU + GPUs, each has "node ID" at top of
>userspace API, "global ID" at user/kernel interface  (don't think we've
>implemented CPU part yet though)
>- userspace builds snapshot from sysfs info & exposes to HSAIL runtime,
>which in turn exposes the "standard" API
>- kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast
>for APU, relatively less so for dGPU over PCIE)
>- to-be-added memory operations allow allocation & residency control
>(within existing gfx driver limits) of buffers in VRAM & carved-out system
>RAM
>- queue operations specify a node ID to userspace library, which translates to
>"global ID" before calling kfd
>
>2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
>more GPUs:
>
>- topology information exposes CPUs & GPUs, along with affinity info
>showing what is connected to what
>- everything else works as in (1) above

This is probably a good point to stress that HSA topology is only intended as 
an OS-independent way of communicating system info up to higher levels of the 
HSA stack, not as a new and competing way to *manage* system properties inside 
Linux or any other OS.

>
>3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or
>more GPUs
>
>- no attempt to cover this with HSA topology, each CPU and associated GPUs
>is accessed independently via separate /dev/kfd instances
>
>>
>>Do we have a userspace to exercise this interface so we can see how
>>such a thing would look?
>
>Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for 
>final
>approval this week
>
>There's a separate test harness to exercise the userspace lib calls, haven't
>started IP review or sanitizing for that but legal stuff is done
>
>>
>>Dave.
>___
>dri-devel mailing list
>dri-de...@lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Jerome Glisse
On Tue, Jul 15, 2014 at 05:06:56PM +, Bridgman, John wrote:
> >From: Dave Airlie [mailto:airl...@gmail.com]
> >Sent: Tuesday, July 15, 2014 12:35 AM
> >To: Christian König
> >Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
> >ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
> >Alexander; a...@linux-foundation.org
> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
> >
> >On 14 July 2014 18:37, Christian König  wrote:
> >>> I vote for HSA module that expose ioctl and is an intermediary with
> >>> the kernel driver that handle the hardware. This gives a single point
> >>> for HSA hardware and yes this enforce things for any hardware
> >manufacturer.
> >>> I am more than happy to tell them that this is it and nothing else if
> >>> they want to get upstream.
> >>
> >> I think we should still discuss this single point of entry a bit more.
> >>
> >> Just to make it clear the plan is to expose all physical HSA capable
> >> devices through a single /dev/hsa device node to userspace.
> >
> >This is why we don't design kernel interfaces in secret foundations, and
> >expect anyone to like them.
> 
> Understood and agree. In this case though this isn't a cross-vendor interface 
> designed by a secret committee, it's supposed to be more of an inoffensive 
> little single-vendor interface designed *for* a secret committee. I'm hoping 
> that's better ;)
> 
> >
> >So before we go any further, how is this stuff planned to work for multiple
> >GPUs/accelerators?
> 
> Three classes of "multiple" :
> 
> 1. Single CPU with IOMMUv2 and multiple GPUs:
> 
> - all devices accessible via /dev/kfd
> - topology information identifies CPU + GPUs, each has "node ID" at top of 
> userspace API, "global ID" at user/kernel interface
>  (don't think we've implemented CPU part yet though)
> - userspace builds snapshot from sysfs info & exposes to HSAIL runtime, which 
> in turn exposes the "standard" API

This is why I do not like the sysfs approach; it would be a lot nicer to have a
device file per provider so that hsail can listen for device file events and
discover whether hardware is vanishing or appearing. Periodically going over
sysfs files is not the right way to do that.

> - kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast 
> for APU, relatively less so for dGPU over PCIE)
> - to-be-added memory operations allow allocation & residency control (within 
> existing gfx driver limits) of buffers in VRAM & carved-out system RAM
> - queue operations specify a node ID to userspace library, which translates 
> to "global ID" before calling kfd
> 
> 2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or more 
> GPUs:
> 
> - topology information exposes CPUs & GPUs, along with affinity info showing 
> what is connected to what
> - everything else works as in (1) above
> 

This is supposed to be part of HSA? This is a lot broader than I thought.

> 3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or 
> more GPUs
> 
> - no attempt to cover this with HSA topology, each CPU and associated GPUs is 
> accessed independently via separate /dev/kfd instances
> 
> >
> >Do we have a userspace to exercise this interface so we can see how such a
> >thing would look?
> 
> Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for 
> final approval this week
> 
> There's a separate test harness to exercise the userspace lib calls, haven't 
> started IP review or sanitizing for that but legal stuff is done
> 
> >
> >Dave.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Bridgman, John


>-Original Message-
>From: Dave Airlie [mailto:airl...@gmail.com]
>Sent: Tuesday, July 15, 2014 12:35 AM
>To: Christian König
>Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
>Alexander; a...@linux-foundation.org
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On 14 July 2014 18:37, Christian König  wrote:
>>> I vote for HSA module that expose ioctl and is an intermediary with
>>> the kernel driver that handle the hardware. This gives a single point
>>> for HSA hardware and yes this enforce things for any hardware
>manufacturer.
>>> I am more than happy to tell them that this is it and nothing else if
>>> they want to get upstream.
>>
>> I think we should still discuss this single point of entry a bit more.
>>
>> Just to make it clear the plan is to expose all physical HSA capable
>> devices through a single /dev/hsa device node to userspace.
>
>This is why we don't design kernel interfaces in secret foundations, and
>expect anyone to like them.

Understood and agree. In this case though this isn't a cross-vendor interface 
designed by a secret committee, it's supposed to be more of an inoffensive 
little single-vendor interface designed *for* a secret committee. I'm hoping 
that's better ;)

>
>So before we go any further, how is this stuff planned to work for multiple
>GPUs/accelerators?

Three classes of "multiple" :

1. Single CPU with IOMMUv2 and multiple GPUs:

- all devices accessible via /dev/kfd
- topology information identifies CPU + GPUs, each has "node ID" at top of 
userspace API, "global ID" at user/kernel interface
 (don't think we've implemented CPU part yet though)
- userspace builds snapshot from sysfs info & exposes to HSAIL runtime, which 
in turn exposes the "standard" API
- kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast for 
APU, relatively less so for dGPU over PCIE)
- to-be-added memory operations allow allocation & residency control (within 
existing gfx driver limits) of buffers in VRAM & carved-out system RAM
- queue operations specify a node ID to userspace library, which translates
to "global ID" before calling kfd (see the sketch after this list)

2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or more 
GPUs:

- topology information exposes CPUs & GPUs, along with affinity info showing 
what is connected to what
- everything else works as in (1) above

3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or 
more GPUs

- no attempt to cover this with HSA topology, each CPU and associated GPUs is 
accessed independently via separate /dev/kfd instances
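
A sketch of the node ID to "global ID" translation described under class 1
above; the snapshot layout and the names are illustrative only, not the
actual library:

#include <stdint.h>
#include <stddef.h>

struct hsa_node {                  /* one entry of the sysfs-built snapshot    */
        uint32_t node_id;          /* position in the topology (userspace API) */
        uint32_t global_id;        /* id the user/kernel interface expects     */
};

static const struct hsa_node *snapshot;   /* filled from sysfs at library init */
static size_t snapshot_len;

/* Translate a topology node ID into the global ID passed to kfd. */
static int node_to_global(uint32_t node_id, uint32_t *global_id)
{
        for (size_t i = 0; i < snapshot_len; i++) {
                if (snapshot[i].node_id == node_id) {
                        *global_id = snapshot[i].global_id;
                        return 0;
                }
        }
        return -1;                 /* unknown node, e.g. device went away */
}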

>
>Do we have a userspace to exercise this interface so we can see how such a
>thing would look?

Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for 
final approval this week

There's a separate test harness to exercise the userspace lib calls, haven't 
started IP review or sanitizing for that but legal stuff is done

>
>Dave.

Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Jerome Glisse
On Tue, Jul 15, 2014 at 02:35:19PM +1000, Dave Airlie wrote:
> On 14 July 2014 18:37, Christian König  wrote:
> >> I vote for HSA module that expose ioctl and is an intermediary with the
> >> kernel driver that handle the hardware. This gives a single point for
> >> HSA hardware and yes this enforce things for any hardware manufacturer.
> >> I am more than happy to tell them that this is it and nothing else if
> >> they want to get upstream.
> >
> > I think we should still discuss this single point of entry a bit more.
> >
> > Just to make it clear the plan is to expose all physical HSA capable devices
> > through a single /dev/hsa device node to userspace.
> 
> This is why we don't design kernel interfaces in secret foundations,
> and expect anyone to like them.
> 

I think at this point this is unlikely to get into 3.17. But Christian had a
point about having multiple device files, so something like /dev/hsa/*.

> So before we go any further, how is this stuff planned to work for
> multiple GPUs/accelerators?

My understanding is that you create queues, and each queue is associated
with a device. You can create several queues for the same device and have
different priorities between queues.

Btw, a queue here means a ring buffer that understands a common set of pm4
packets.
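
To make "userspace command queue" concrete, a sketch of the submission path
this model implies; the structure and the doorbell convention are illustrative,
not the actual kfd design:

#include <stdint.h>

struct user_queue {                    /* illustrative */
        uint32_t *ring;                /* mmap'ed ring buffer               */
        uint32_t size_dw;              /* ring size in dwords, power of two */
        uint32_t wptr;                 /* write pointer, in dwords          */
        volatile uint32_t *doorbell;   /* mmap'ed doorbell register/page    */
};

/* Copy a packet into the ring and ring the doorbell; no ioctl involved. */
static void queue_submit(struct user_queue *q, const uint32_t *pkt, uint32_t ndw)
{
        for (uint32_t i = 0; i < ndw; i++)
                q->ring[(q->wptr + i) & (q->size_dw - 1)] = pkt[i];
        q->wptr += ndw;

        __sync_synchronize();          /* make the packet visible first  */
        *q->doorbell = q->wptr;        /* tell the hardware where we are */
}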

> Do we have a userspace to exercise this interface so we can see how
> such a thing would look?

I think we need to wait a bit before freezing and accepting the kernel
API, and see enough of the userspace bits to be comfortable. Moreover, if AMD
wants a common API for HSA, I also think that they at the very least need
their HSA partners to comment publicly on the kernel API.

Cheers,
Jérôme


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Jerome Glisse
On Tue, Jul 15, 2014 at 02:35:19PM +1000, Dave Airlie wrote:
 On 14 July 2014 18:37, Christian König deathsim...@vodafone.de wrote:
  I vote for HSA module that expose ioctl and is an intermediary with the
  kernel driver that handle the hardware. This gives a single point for
  HSA hardware and yes this enforce things for any hardware manufacturer.
  I am more than happy to tell them that this is it and nothing else if
  they want to get upstream.
 
  I think we should still discuss this single point of entry a bit more.
 
  Just to make it clear the plan is to expose all physical HSA capable devices
  through a single /dev/hsa device node to userspace.
 
 This is why we don't design kernel interfaces in secret foundations,
 and expect anyone to like them.
 

I think at this time this is unlikely to get into 3.17. But Christian had
point on having multiple device file. So something like /dev/hsa/*

 So before we go any further, how is this stuff planned to work for
 multiple GPUs/accelerators?

My understanding is that you create queue and each queue is associated
with a device. You can create several queue for same device and have
different priority btw queue.

Btw queue here means a ring buffer that understand a common set of pm4
packet.

 Do we have a userspace to exercise this interface so we can see how
 such a thing would look?

I think we need to wait a bit before freezing and accepting the kernel
api and see enough userspace bits to be confortable. Moreover if AMD
wants common API for HSA i also think that they at very least needs
there HSA partner to make public comment on the kernel API.

Cheers,
Jérôme


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Bridgman, John


-Original Message-
From: Dave Airlie [mailto:airl...@gmail.com]
Sent: Tuesday, July 15, 2014 12:35 AM
To: Christian König
Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
Alexander; a...@linux-foundation.org
Subject: Re: [PATCH 00/83] AMD HSA kernel driver

On 14 July 2014 18:37, Christian König deathsim...@vodafone.de wrote:
 I vote for HSA module that expose ioctl and is an intermediary with
 the kernel driver that handle the hardware. This gives a single point
 for HSA hardware and yes this enforce things for any hardware
manufacturer.
 I am more than happy to tell them that this is it and nothing else if
 they want to get upstream.

 I think we should still discuss this single point of entry a bit more.

 Just to make it clear the plan is to expose all physical HSA capable
 devices through a single /dev/hsa device node to userspace.

This is why we don't design kernel interfaces in secret foundations, and
expect anyone to like them.

Understood and agree. In this case though this isn't a cross-vendor interface 
designed by a secret committee, it's supposed to be more of an inoffensive 
little single-vendor interface designed *for* a secret committee. I'm hoping 
that's better ;)


So before we go any further, how is this stuff planned to work for multiple
GPUs/accelerators?

Three classes of multiple :

1. Single CPU with IOMMUv2 and multiple GPUs:

- all devices accessible via /dev/kfd
- topology information identifies CPU + GPUs, each has node ID at top of 
userspace API, global ID at user/kernel interface
 (don't think we've implemented CPU part yet though)
- userspace builds snapshot from sysfs info  exposes to HSAIL runtime, which 
in turn exposes the standard API
- kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast for 
APU, relatively less so for dGPU over PCIE)
- to-be-added memory operations allow allocation  residency control (within 
existing gfx driver limits) of buffers in VRAM  carved-out system RAM
- queue operations specify a node ID to userspace library, which translates to 
global ID before calling kfd

2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or more 
GPUs:

- topology information exposes CPUs  GPUs, along with affinity info showing 
what is connected to what
- everything else works as in (1) above

3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or 
more GPUs

- no attempt to cover this with HSA topology, each CPU and associated GPUs is 
accessed independently via separate /dev/kfd instances


Do we have a userspace to exercise this interface so we can see how such a
thing would look?

Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for 
final approval this week

There's a separate test harness to exercise the userspace lib calls, haven't 
started IP review or sanitizing for that but legal stuff is done


Dave.
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Jerome Glisse
On Tue, Jul 15, 2014 at 05:06:56PM +, Bridgman, John wrote:
 From: Dave Airlie [mailto:airl...@gmail.com]
 Sent: Tuesday, July 15, 2014 12:35 AM
 To: Christian König
 Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
 ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
 Alexander; a...@linux-foundation.org
 Subject: Re: [PATCH 00/83] AMD HSA kernel driver
 
 On 14 July 2014 18:37, Christian König deathsim...@vodafone.de wrote:
  I vote for HSA module that expose ioctl and is an intermediary with
  the kernel driver that handle the hardware. This gives a single point
  for HSA hardware and yes this enforce things for any hardware
 manufacturer.
  I am more than happy to tell them that this is it and nothing else if
  they want to get upstream.
 
  I think we should still discuss this single point of entry a bit more.
 
  Just to make it clear the plan is to expose all physical HSA capable
  devices through a single /dev/hsa device node to userspace.
 
 This is why we don't design kernel interfaces in secret foundations, and
 expect anyone to like them.
 
 Understood and agree. In this case though this isn't a cross-vendor interface 
 designed by a secret committee, it's supposed to be more of an inoffensive 
 little single-vendor interface designed *for* a secret committee. I'm hoping 
 that's better ;)
 
 
 So before we go any further, how is this stuff planned to work for multiple
 GPUs/accelerators?
 
 Three classes of multiple :
 
 1. Single CPU with IOMMUv2 and multiple GPUs:
 
 - all devices accessible via /dev/kfd
 - topology information identifies CPU + GPUs, each has node ID at top of 
 userspace API, global ID at user/kernel interface
  (don't think we've implemented CPU part yet though)
 - userspace builds snapshot from sysfs info  exposes to HSAIL runtime, which 
 in turn exposes the standard API

This is why i do not like the sysfs approach, it would be lot nicer to have
device file per provider and thus hsail can listen on device file event and
discover if hardware is vanishing or appearing. Periodicaly going over sysfs
files is not the right way to do that.

 - kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast 
 for APU, relatively less so for dGPU over PCIE)
 - to-be-added memory operations allow allocation  residency control (within 
 existing gfx driver limits) of buffers in VRAM  carved-out system RAM
 - queue operations specify a node ID to userspace library, which translates 
 to global ID before calling kfd
 
 2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or more 
 GPUs:
 
 - topology information exposes CPUs  GPUs, along with affinity info showing 
 what is connected to what
 - everything else works as in (1) above
 

This is suppose to be part of HSA ? This is lot broader than i thought.

 3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or 
 more GPUs
 
 - no attempt to cover this with HSA topology, each CPU and associated GPUs is 
 accessed independently via separate /dev/kfd instances
 
 
 Do we have a userspace to exercise this interface so we can see how such a
 thing would look?
 
 Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for 
 final approval this week
 
 There's a separate test harness to exercise the userspace lib calls, haven't 
 started IP review or sanitizing for that but legal stuff is done
 
 
 Dave.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Bridgman, John


-Original Message-
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf
Of Bridgman, John
Sent: Tuesday, July 15, 2014 1:07 PM
To: Dave Airlie; Christian König
Cc: Lewycky, Andrew; linux-kernel@vger.kernel.org; dri-
de...@lists.freedesktop.org; Deucher, Alexander; akpm@linux-
foundation.org
Subject: RE: [PATCH 00/83] AMD HSA kernel driver



-Original Message-
From: Dave Airlie [mailto:airl...@gmail.com]
Sent: Tuesday, July 15, 2014 12:35 AM
To: Christian König
Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
Alexander; a...@linux-foundation.org
Subject: Re: [PATCH 00/83] AMD HSA kernel driver

On 14 July 2014 18:37, Christian König deathsim...@vodafone.de wrote:
 I vote for HSA module that expose ioctl and is an intermediary with
 the kernel driver that handle the hardware. This gives a single
 point for HSA hardware and yes this enforce things for any hardware
manufacturer.
 I am more than happy to tell them that this is it and nothing else
 if they want to get upstream.

 I think we should still discuss this single point of entry a bit more.

 Just to make it clear the plan is to expose all physical HSA capable
 devices through a single /dev/hsa device node to userspace.

This is why we don't design kernel interfaces in secret foundations,
and expect anyone to like them.

Understood and agree. In this case though this isn't a cross-vendor interface
designed by a secret committee, it's supposed to be more of an inoffensive
little single-vendor interface designed *for* a secret committee. I'm hoping
that's better ;)


So before we go any further, how is this stuff planned to work for
multiple GPUs/accelerators?

Three classes of multiple :

1. Single CPU with IOMMUv2 and multiple GPUs:

- all devices accessible via /dev/kfd
- topology information identifies CPU + GPUs, each has node ID at top of
userspace API, global ID at user/kernel interface  (don't think we've
implemented CPU part yet though)
- userspace builds snapshot from sysfs info  exposes to HSAIL runtime,
which in turn exposes the standard API
- kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast
for APU, relatively less so for dGPU over PCIE)
- to-be-added memory operations allow allocation  residency control
(within existing gfx driver limits) of buffers in VRAM  carved-out system
RAM
- queue operations specify a node ID to userspace library, which translates to
global ID before calling kfd

2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
more GPUs:

- topology information exposes CPUs  GPUs, along with affinity info
showing what is connected to what
- everything else works as in (1) above

This is probably a good point to stress that HSA topology is only intended as 
an OS-independent way of communicating system info up to higher levels of the 
HSA stack, not as a new and competing way to *manage* system properties inside 
Linux or any other OS.


3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or
more GPUs

- no attempt to cover this with HSA topology, each CPU and associated GPUs
is accessed independently via separate /dev/kfd instances


Do we have a userspace to exercise this interface so we can see how
such a thing would look?

Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for 
final
approval this week

There's a separate test harness to exercise the userspace lib calls, haven't
started IP review or sanitizing for that but legal stuff is done


Dave.
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Bridgman, John


-Original Message-
From: Jerome Glisse [mailto:j.gli...@gmail.com]
Sent: Tuesday, July 15, 2014 1:37 PM
To: Bridgman, John
Cc: Dave Airlie; Christian König; Lewycky, Andrew; linux-
ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
Alexander; a...@linux-foundation.org
Subject: Re: [PATCH 00/83] AMD HSA kernel driver

On Tue, Jul 15, 2014 at 05:06:56PM +, Bridgman, John wrote:
 From: Dave Airlie [mailto:airl...@gmail.com]
 Sent: Tuesday, July 15, 2014 12:35 AM
 To: Christian König
 Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
 ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
 Alexander; a...@linux-foundation.org
 Subject: Re: [PATCH 00/83] AMD HSA kernel driver
 
 On 14 July 2014 18:37, Christian König deathsim...@vodafone.de wrote:
  I vote for HSA module that expose ioctl and is an intermediary
  with the kernel driver that handle the hardware. This gives a
  single point for HSA hardware and yes this enforce things for any
  hardware
 manufacturer.
  I am more than happy to tell them that this is it and nothing else
  if they want to get upstream.
 
  I think we should still discuss this single point of entry a bit more.
 
  Just to make it clear the plan is to expose all physical HSA
  capable devices through a single /dev/hsa device node to userspace.
 
 This is why we don't design kernel interfaces in secret foundations,
 and expect anyone to like them.

 Understood and agree. In this case though this isn't a cross-vendor
 interface designed by a secret committee, it's supposed to be more of
 an inoffensive little single-vendor interface designed *for* a secret
 committee. I'm hoping that's better ;)

 
 So before we go any further, how is this stuff planned to work for
 multiple GPUs/accelerators?

 Three classes of multiple :

 1. Single CPU with IOMMUv2 and multiple GPUs:

 - all devices accessible via /dev/kfd
 - topology information identifies CPU + GPUs, each has node ID at
 top of userspace API, global ID at user/kernel interface  (don't
 think we've implemented CPU part yet though)
 - userspace builds snapshot from sysfs info  exposes to HSAIL
 runtime, which in turn exposes the standard API

This is why i do not like the sysfs approach, it would be lot nicer to have
device file per provider and thus hsail can listen on device file event and
discover if hardware is vanishing or appearing. Periodicaly going over sysfs
files is not the right way to do that.

Agree that wouldn't be good. There's an event mechanism still to come - mostly 
for communicating fences and shader interrupts back to userspace, but also used 
for device change notifications, so no polling of sysfs.
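
To give a feel for what that could look like from userspace (nothing here is
final; the record layout and type values are invented for illustration), the
idea is to block on the device fd instead of rescanning sysfs:

/* Illustrative only: the real event ABI is still to come, so the record
 * layout and type values below are made up. */
#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct hsa_event_record {          /* hypothetical */
    uint32_t type;                 /* 0 = fence/shader interrupt, 1 = device change */
    uint32_t payload;
};

int main(void)
{
    int fd = open("/dev/kfd", O_RDWR);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    struct hsa_event_record ev;

    if (fd < 0)
        return 1;

    /* Sleep until the kernel has something to say instead of rescanning
     * sysfs: fence completion, shader interrupt, or device hotplug. */
    while (poll(&pfd, 1, -1) > 0) {
        if (read(fd, &ev, sizeof(ev)) != (ssize_t)sizeof(ev))
            break;
        if (ev.type == 1)
            printf("device change -> re-read topology snapshot\n");
        else
            printf("fence/interrupt event, payload 0x%x\n", ev.payload);
    }
    close(fd);
    return 0;
}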


 - kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2
 (fast for APU, relatively less so for dGPU over PCIE)
 - to-be-added memory operations allow allocation  residency control
 (within existing gfx driver limits) of buffers in VRAM  carved-out
 system RAM
 - queue operations specify a node ID to userspace library, which
 translates to global ID before calling kfd

 2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
more GPUs:

 - topology information exposes CPUs  GPUs, along with affinity info
 showing what is connected to what
 - everything else works as in (1) above


This is suppose to be part of HSA ? This is lot broader than i thought.

Yes although it can be skipped on most systems. We figured that topology needed 
to cover everything that would be handled by a single OS image, so in a NUMA 
system it would need to cover all the CPUs. I think that is still the right 
scope, do you agree ?


 3. Multiple CPUs not connected via fabric (eg a blade server) each
 with 0 or more GPUs

 - no attempt to cover this with HSA topology, each CPU and associated
 GPUs is accessed independently via separate /dev/kfd instances

 
 Do we have a userspace to exercise this interface so we can see how
 such a thing would look?

 Yes -- initial IP review done, legal stuff done, sanitizing WIP,
 hoping for final approval this week

 There's a separate test harness to exercise the userspace lib calls,
 haven't started IP review or sanitizing for that but legal stuff is
 done

 
 Dave.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-15 Thread Jerome Glisse
On Tue, Jul 15, 2014 at 05:53:32PM +, Bridgman, John wrote:
 From: Jerome Glisse [mailto:j.gli...@gmail.com]
 Sent: Tuesday, July 15, 2014 1:37 PM
 To: Bridgman, John
 Cc: Dave Airlie; Christian König; Lewycky, Andrew; linux-
 ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
 Alexander; a...@linux-foundation.org
 Subject: Re: [PATCH 00/83] AMD HSA kernel driver
 
 On Tue, Jul 15, 2014 at 05:06:56PM +, Bridgman, John wrote:
  From: Dave Airlie [mailto:airl...@gmail.com]
  Sent: Tuesday, July 15, 2014 12:35 AM
  To: Christian König
  Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
  ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; Deucher,
  Alexander; a...@linux-foundation.org
  Subject: Re: [PATCH 00/83] AMD HSA kernel driver
  
  On 14 July 2014 18:37, Christian König deathsim...@vodafone.de wrote:
   I vote for HSA module that expose ioctl and is an intermediary
   with the kernel driver that handle the hardware. This gives a
   single point for HSA hardware and yes this enforce things for any
   hardware
  manufacturer.
   I am more than happy to tell them that this is it and nothing else
   if they want to get upstream.
  
   I think we should still discuss this single point of entry a bit more.
  
   Just to make it clear the plan is to expose all physical HSA
   capable devices through a single /dev/hsa device node to userspace.
  
  This is why we don't design kernel interfaces in secret foundations,
  and expect anyone to like them.
 
  Understood and agree. In this case though this isn't a cross-vendor
  interface designed by a secret committee, it's supposed to be more of
  an inoffensive little single-vendor interface designed *for* a secret
  committee. I'm hoping that's better ;)
 
  
  So before we go any further, how is this stuff planned to work for
  multiple GPUs/accelerators?
 
  Three classes of multiple :
 
  1. Single CPU with IOMMUv2 and multiple GPUs:
 
  - all devices accessible via /dev/kfd
  - topology information identifies CPU + GPUs, each has node ID at
  top of userspace API, global ID at user/kernel interface  (don't
  think we've implemented CPU part yet though)
  - userspace builds snapshot from sysfs info  exposes to HSAIL
  runtime, which in turn exposes the standard API
 
 This is why i do not like the sysfs approach, it would be lot nicer to have
 device file per provider and thus hsail can listen on device file event and
 discover if hardware is vanishing or appearing. Periodicaly going over sysfs
 files is not the right way to do that.
 
 Agree that wouldn't be good. There's an event mechanism still to come - mostly
 for communicating fences and shader interrupts back to userspace, but also 
 used
 for device change notifications, so no polling of sysfs.
 

My point being: do not use sysfs, use /dev/hsa/device* and have hsail listen for
file events on the /dev/hsa/ directory. The hsail runtime would be informed of new
devices and of devices that are unloaded. It would do a first pass to open each
device file and get their capabilities through a standardized ioctl.

Though maybe sysfs is ok given that cpu numa info is exposed through sysfs.
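
As a rough sketch of that flow (the directory name, the device file naming and
the capability ioctl below are all invented):

/* Sketch of the /dev/hsa/device* idea: all names here are invented. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

#define HSA_DIR "/dev/hsa"

static void probe_device(const char *name)
{
    char path[256];
    int fd;

    snprintf(path, sizeof(path), HSA_DIR "/%s", name);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return;
    /* First pass per device: query capabilities through a standardized
     * ioctl, e.g. ioctl(fd, HSA_IOC_GET_CAPS, &caps);  (hypothetical) */
    printf("found HSA device %s\n", path);
    close(fd);
}

int main(void)
{
    char buf[4096];
    ssize_t len;
    int in = inotify_init();

    if (in < 0)
        return 1;
    /* (an initial scan of HSA_DIR with opendir() would go here) */

    /* Get told when device files appear or vanish instead of polling. */
    inotify_add_watch(in, HSA_DIR, IN_CREATE | IN_DELETE);

    while ((len = read(in, buf, sizeof(buf))) > 0) {
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;

            if (ev->len && (ev->mask & IN_CREATE))
                probe_device(ev->name);
            else if (ev->len && (ev->mask & IN_DELETE))
                printf("HSA device %s removed\n", ev->name);
            p += sizeof(*ev) + ev->len;
        }
    }
    close(in);
    return 0;
}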

 
  - kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2
  (fast for APU, relatively less so for dGPU over PCIE)
  - to-be-added memory operations allow allocation  residency control
  (within existing gfx driver limits) of buffers in VRAM  carved-out
  system RAM
  - queue operations specify a node ID to userspace library, which
  translates to global ID before calling kfd
 
  2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or
 more GPUs:
 
  - topology information exposes CPUs  GPUs, along with affinity info
  showing what is connected to what
  - everything else works as in (1) above
 
 
 This is suppose to be part of HSA ? This is lot broader than i thought.
 
 Yes although it can be skipped on most systems. We figured that topology
 needed to cover everything that would be handled by a single OS image, so
 in a NUMA system it would need to cover all the CPUs. I think that is still
 the right scope, do you agree ?

I think it is a bad idea to duplicate cpu info. I would rather have each device
give its affinity against each cpu, and for cpus just keep the existing
kernel api that exposes this through sysfs, iirc.
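
For a PCI GPU the kernel already exposes that kind of per-device affinity; a
minimal sketch of reading it (the PCI address below is just a placeholder):

/* Minimal sketch: read the NUMA affinity the kernel already exposes for a
 * PCI device, instead of duplicating CPU topology in an HSA-specific tree.
 * The PCI address is a placeholder. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/bus/pci/devices/0000:01:00.0/numa_node", "r");
    int node = -1;            /* -1 means the platform reports no affinity */

    if (f) {
        if (fscanf(f, "%d", &node) != 1)
            node = -1;
        fclose(f);
    }
    printf("device is local to NUMA node %d\n", node);
    return 0;
}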

 
 
  3. Multiple CPUs not connected via fabric (eg a blade server) each
  with 0 or more GPUs
 
  - no attempt to cover this with HSA topology, each CPU and associated
  GPUs is accessed independently via separate /dev/kfd instances
 
  
  Do we have a userspace to exercise this interface so we can see how
  such a thing would look?
 
  Yes -- initial IP review done, legal stuff done, sanitizing WIP,
  hoping for final approval this week
 
  There's a separate test harness to exercise the userspace lib calls,
  haven't started IP review or sanitizing for that but legal stuff is
  done
 
  
  Dave.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel

Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-14 Thread Dave Airlie
On 14 July 2014 18:37, Christian König  wrote:
>> I vote for HSA module that expose ioctl and is an intermediary with the
>> kernel driver that handle the hardware. This gives a single point for
>> HSA hardware and yes this enforce things for any hardware manufacturer.
>> I am more than happy to tell them that this is it and nothing else if
>> they want to get upstream.
>
> I think we should still discuss this single point of entry a bit more.
>
> Just to make it clear the plan is to expose all physical HSA capable devices
> through a single /dev/hsa device node to userspace.

This is why we don't design kernel interfaces in secret foundations,
and expect anyone to like them.

So before we go any further, how is this stuff planned to work for
multiple GPUs/accelerators?

Do we have a userspace to exercise this interface so we can see how
such a thing would look?

Dave.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-14 Thread Christian König

I vote for HSA module that expose ioctl and is an intermediary with the
kernel driver that handle the hardware. This gives a single point for
HSA hardware and yes this enforce things for any hardware manufacturer.
I am more than happy to tell them that this is it and nothing else if
they want to get upstream.

I think we should still discuss this single point of entry a bit more.

Just to make it clear the plan is to expose all physical HSA capable 
devices through a single /dev/hsa device node to userspace.


While this obviously makes device enumeration much easier, it's still a
quite hard break with Unix traditions. Essentially we now expose all
devices of one kind through a single device node instead of creating
independent nodes for each physical or logical device.


What makes it even worse is that we want to expose different drivers
through the same device node.


Because of this any effort of a system administrator to limit access to 
HSA is reduced to an on/off decision. It's simply not possible any more 
to apply simple file system access semantics to individual hardware devices.


Just imagine you are an administrator with a bunch of different compute
cards in a system and you want to restrict access to one of them
because it's faulty or has a security problem or something like this. Or
you have several hardware devices and want to assign each of them to a
distinct container.


Just some thoughts,
Christian.

Am 13.07.2014 18:49, schrieb Jerome Glisse:

On Sun, Jul 13, 2014 at 03:34:12PM +, Bridgman, John wrote:

From: Jerome Glisse [mailto:j.gli...@gmail.com]
Sent: Saturday, July 12, 2014 11:56 PM
To: Gabbay, Oded
Cc: linux-kernel@vger.kernel.org; Bridgman, John; Deucher, Alexander;
Lewycky, Andrew; j...@8bytes.org; a...@linux-foundation.org; dri-
de...@lists.freedesktop.org; airl...@linux.ie; oded.gab...@gmail.com
Subject: Re: [PATCH 00/83] AMD HSA kernel driver

On Sat, Jul 12, 2014 at 09:55:49PM +, Gabbay, Oded wrote:

On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:

On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:

  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:

  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:

   This patch set implements a Heterogeneous System
Architecture
  (HSA) driver
   for radeon-family GPUs.

  This is just quick comments on few things. Given size of this,
people  will need to have time to review things.

   HSA allows different processor types (CPUs, DSPs, GPUs,
etc..) to
  share
   system resources more effectively via HW features including
shared pageable
   memory, userspace-accessible work queues, and platform-level
atomics. In
   addition to the memory protection mechanisms in GPUVM and
IOMMUv2, the Sea
   Islands family of GPUs also performs HW-level validation of
commands passed
   in through the queues (aka rings).
   The code in this patch set is intended to serve both as a
sample  driver for
   other HSA-compatible hardware devices and as a production
driver  for
   radeon-family processors. The code is architected to support
multiple CPUs
   each with connected GPUs, although the current
implementation  focuses on a
   single Kaveri/Berlin APU, and works alongside the existing
radeon  kernel
   graphics driver (kgd).
   AMD GPUs designed for use with HSA (Sea Islands and up)
share  some hardware
   functionality between HSA compute and regular gfx/compute
(memory,
   interrupts, registers), while other functionality has been
added
   specifically for HSA compute  (hw scheduler for virtualized
compute rings).
   All shared hardware is owned by the radeon graphics driver,
and  an interface
   between kfd and kgd allows the kfd to make use of those
shared  resources,
   while HSA-specific functionality is managed directly by kfd
by  submitting
   packets into an HSA-specific command queue (the "HIQ").
   During kfd module initialization a char device node
(/dev/kfd) is
  created
   (surviving until module exit), with ioctls for queue
creation &  management,
   and data structures are initialized for managing HSA device
topology.
   The rest of the initialization is driven by calls from the
radeon  kgd at
   the following points :
   - radeon_init (kfd_init)
   - radeon_exit (kfd_fini)
   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
   - radeon_driver_unload_kms (kfd_device_fini)
   During the probe and init processing per-device data
structures  are
   established which connect to the associated graphics kernel
driver. This
   information is exposed to userspace via sysfs, along with a
version number
   allowing userspace to determine if a topology change has
occurred  while it
   was reading from sysfs.
   The interface between kfd and kgd also allows the kfd to
request  buffer
   management services from kgd, and allows kgd to route
interrupt  requests to
   kfd code since the interrupt block is shared between regular
   graphics/compute and HSA compute subsystems in the GPU.


Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-13 Thread Jerome Glisse
On Sun, Jul 13, 2014 at 03:34:12PM +, Bridgman, John wrote:
> >From: Jerome Glisse [mailto:j.gli...@gmail.com]
> >Sent: Saturday, July 12, 2014 11:56 PM
> >To: Gabbay, Oded
> >Cc: linux-kernel@vger.kernel.org; Bridgman, John; Deucher, Alexander;
> >Lewycky, Andrew; j...@8bytes.org; a...@linux-foundation.org; dri-
> >de...@lists.freedesktop.org; airl...@linux.ie; oded.gab...@gmail.com
> >Subject: Re: [PATCH 00/83] AMD HSA kernel driver
> >
> >On Sat, Jul 12, 2014 at 09:55:49PM +, Gabbay, Oded wrote:
> >> On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
> >> > On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:
> >> > >  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> >> > > >  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> >> > > > >   This patch set implements a Heterogeneous System
> >> > > > > Architecture
> >> > > > >  (HSA) driver
> >> > > > >   for radeon-family GPUs.
> >> > > >  This is just quick comments on few things. Given size of this,
> >> > > > people  will need to have time to review things.
> >> > > > >   HSA allows different processor types (CPUs, DSPs, GPUs,
> >> > > > > etc..) to
> >> > > > >  share
> >> > > > >   system resources more effectively via HW features including
> >> > > > > shared pageable
> >> > > > >   memory, userspace-accessible work queues, and platform-level
> >> > > > > atomics. In
> >> > > > >   addition to the memory protection mechanisms in GPUVM and
> >> > > > > IOMMUv2, the Sea
> >> > > > >   Islands family of GPUs also performs HW-level validation of
> >> > > > > commands passed
> >> > > > >   in through the queues (aka rings).
> >> > > > >   The code in this patch set is intended to serve both as a
> >> > > > > sample  driver for
> >> > > > >   other HSA-compatible hardware devices and as a production
> >> > > > > driver  for
> >> > > > >   radeon-family processors. The code is architected to support
> >> > > > > multiple CPUs
> >> > > > >   each with connected GPUs, although the current
> >> > > > > implementation  focuses on a
> >> > > > >   single Kaveri/Berlin APU, and works alongside the existing
> >> > > > > radeon  kernel
> >> > > > >   graphics driver (kgd).
> >> > > > >   AMD GPUs designed for use with HSA (Sea Islands and up)
> >> > > > > share  some hardware
> >> > > > >   functionality between HSA compute and regular gfx/compute
> >> > > > > (memory,
> >> > > > >   interrupts, registers), while other functionality has been
> >> > > > > added
> >> > > > >   specifically for HSA compute  (hw scheduler for virtualized
> >> > > > > compute rings).
> >> > > > >   All shared hardware is owned by the radeon graphics driver,
> >> > > > > and  an interface
> >> > > > >   between kfd and kgd allows the kfd to make use of those
> >> > > > > shared  resources,
> >> > > > >   while HSA-specific functionality is managed directly by kfd
> >> > > > > by  submitting
> >> > > > >   packets into an HSA-specific command queue (the "HIQ").
> >> > > > >   During kfd module initialization a char device node
> >> > > > > (/dev/kfd) is
> >> > > > >  created
> >> > > > >   (surviving until module exit), with ioctls for queue
> >> > > > > creation &  management,
> >> > > > >   and data structures are initialized for managing HSA device
> >> > > > > topology.
> >> > > > >   The rest of the initialization is driven by calls from the
> >> > > > > radeon  kgd at
> >> > > > >   the following points :
> >> > > > >   - radeon_init (kfd_init)
> >> > > > >   - radeon_exit (kfd_fini)
> >> > > > >   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> >> > > > >   - radeon_driver_unload_kms (kfd_device_fini)
> &

Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-13 Thread Jerome Glisse
On Sun, Jul 13, 2014 at 11:42:58AM +0200, Daniel Vetter wrote:
> On Sat, Jul 12, 2014 at 6:49 PM, Jerome Glisse  wrote:
> >> Hm, so the hsa part is a completely new driver/subsystem, not just an
> >> additional ioctl tacked onto radoen? The history of drm is littered with
> >> "generic" ioctls that turned out to be useful for exactly one driver.
> >> Which is why _all_ the command submission is now done with driver-private
> >> ioctls.
> >>
> >> I'd be quite a bit surprised if that suddenly works differently, so before
> >> we bless a generic hsa interface I really want to see some implementation
> >> from a different vendor (i.e. nvdidia or intel) using the same ioctls.
> >> Otherwise we just repeat history and I'm not terribly inclined to keep on
> >> cleanup up cruft forever - one drm legacy is enough ;-)
> >>
> >> Jesse is the guy from our side to talk to about this.
> >> -Daniel
> >
> > I am not worried about that side, the hsa foundation has pretty strict
> > guidelines on what is hsa compliant hardware ie the hw needs to understand
> > the pm4 packet format of radeon (well small subset of it). But of course
> > this require hsa compliant hardware and from member i am guessing ARM Mali,
> > ImgTech, Qualcomm, ... so unless Intel and NVidia joins hsa you will not
> > see it for those hardware.
> >
> > So yes for once same ioctl would apply to different hardware. The only 
> > things
> > that is different is the shader isa. The hsafoundation site has some pdf
> > explaining all that but someone thought that slideshare would be a good idea
> > personnaly i would not register to any of the website just to get the pdf.
> >
> > So to sumup i am ok with having a new device file that present uniform set
> > of ioctl. It would actualy be lot easier for userspace, just open this fix
> > device file and ask for list of compliant hardware.
> >
> > Then radeon kernel driver would register itself as a provider. So all ioctl
> > decoding marshalling would be share which makes sense.
> 
> There's also the other side namely that preparing the cp ring in
> userspace and submitting the entire pile through a doorbell to the hw
> scheduler isn't really hsa exclusive. And for a solid platform with
> seamless gpu/cpu integration that means we need standard ways to set
> gpu context priorities and get at useful stats like gpu time used by a
> given context.
> 
> To get there I guess intel/nvidia need to reuse the hsa subsystem with
> the command submission adjusted a bit. Kinda like drm where kms and
> buffer sharing is common and cs driver specific.

The HSA module would be for HSA-compliant hardware, and thus the hardware would
need to follow the HSA specification, which again is pretty clear on what
the hardware needs to provide. So if Intel and NVidia want to join HSA
I am sure they would be welcome, the more the merrier :)

So I would not block the HSA kernel ioctl design in order to please non-HSA
hardware, especially if at this point in time neither Intel nor NVidia can
share anything concrete on the design and how these things should be set up
for their hardware.

When Intel or NVidia present their own API they should provide their
own set of ioctls through their own platform.
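
To make the provider idea quoted above a bit more concrete, here is a
userspace-style mock of a common layer dispatching into a registered driver;
every identifier in it is invented, nothing like this exists yet:

/* Mock of a common HSA layer dispatching shared operations into per-driver
 * providers.  All identifiers are invented for illustration. */
#include <stdio.h>

struct hsa_provider_ops {
    int (*create_queue)(void *drvdata, unsigned int ring_size);
    int (*get_caps)(void *drvdata, char *buf, unsigned int len);
};

struct hsa_provider {
    const char *name;
    const struct hsa_provider_ops *ops;   /* hardware-specific backend */
    void *drvdata;
};

/* Shared layer: in the real thing this would decode/marshal the common
 * ioctls once and dispatch; here it is just a registry. */
static const struct hsa_provider *providers[8];
static int nr_providers;

static int hsa_register_provider(const struct hsa_provider *p)
{
    if (nr_providers >= 8)
        return -1;
    providers[nr_providers++] = p;
    return 0;
}

/* What a driver like radeon would plug in. */
static int radeon_create_queue(void *drvdata, unsigned int ring_size)
{
    printf("radeon: user queue created, ring %u bytes\n", ring_size);
    return 0;
}

static int radeon_get_caps(void *drvdata, char *buf, unsigned int len)
{
    snprintf(buf, len, "PM4 subset, doorbells, IOMMUv2");
    return 0;
}

static const struct hsa_provider_ops radeon_ops = {
    .create_queue = radeon_create_queue,
    .get_caps     = radeon_get_caps,
};

static const struct hsa_provider radeon_provider = {
    .name = "radeon",
    .ops  = &radeon_ops,
};

int main(void)
{
    char caps[64];

    hsa_register_provider(&radeon_provider);
    /* An ioctl handler in the common module would end up calling down like this: */
    providers[0]->ops->get_caps(providers[0]->drvdata, caps, sizeof(caps));
    providers[0]->ops->create_queue(providers[0]->drvdata, 4096);
    printf("provider %s: %s\n", providers[0]->name, caps);
    return 0;
}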

Cheers,
Jérôme Glisse
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 00/83] AMD HSA kernel driver

2014-07-13 Thread Bridgman, John


>-Original Message-
>From: Jerome Glisse [mailto:j.gli...@gmail.com]
>Sent: Saturday, July 12, 2014 11:56 PM
>To: Gabbay, Oded
>Cc: linux-kernel@vger.kernel.org; Bridgman, John; Deucher, Alexander;
>Lewycky, Andrew; j...@8bytes.org; a...@linux-foundation.org; dri-
>de...@lists.freedesktop.org; airl...@linux.ie; oded.gab...@gmail.com
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On Sat, Jul 12, 2014 at 09:55:49PM +, Gabbay, Oded wrote:
>> On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
>> > On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:
>> > >  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
>> > > >  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
>> > > > >   This patch set implements a Heterogeneous System
>> > > > > Architecture
>> > > > >  (HSA) driver
>> > > > >   for radeon-family GPUs.
>> > > >  This is just quick comments on few things. Given size of this,
>> > > > people  will need to have time to review things.
>> > > > >   HSA allows different processor types (CPUs, DSPs, GPUs,
>> > > > > etc..) to
>> > > > >  share
>> > > > >   system resources more effectively via HW features including
>> > > > > shared pageable
>> > > > >   memory, userspace-accessible work queues, and platform-level
>> > > > > atomics. In
>> > > > >   addition to the memory protection mechanisms in GPUVM and
>> > > > > IOMMUv2, the Sea
>> > > > >   Islands family of GPUs also performs HW-level validation of
>> > > > > commands passed
>> > > > >   in through the queues (aka rings).
>> > > > >   The code in this patch set is intended to serve both as a
>> > > > > sample  driver for
>> > > > >   other HSA-compatible hardware devices and as a production
>> > > > > driver  for
>> > > > >   radeon-family processors. The code is architected to support
>> > > > > multiple CPUs
>> > > > >   each with connected GPUs, although the current
>> > > > > implementation  focuses on a
>> > > > >   single Kaveri/Berlin APU, and works alongside the existing
>> > > > > radeon  kernel
>> > > > >   graphics driver (kgd).
>> > > > >   AMD GPUs designed for use with HSA (Sea Islands and up)
>> > > > > share  some hardware
>> > > > >   functionality between HSA compute and regular gfx/compute
>> > > > > (memory,
>> > > > >   interrupts, registers), while other functionality has been
>> > > > > added
>> > > > >   specifically for HSA compute  (hw scheduler for virtualized
>> > > > > compute rings).
>> > > > >   All shared hardware is owned by the radeon graphics driver,
>> > > > > and  an interface
>> > > > >   between kfd and kgd allows the kfd to make use of those
>> > > > > shared  resources,
>> > > > >   while HSA-specific functionality is managed directly by kfd
>> > > > > by  submitting
>> > > > >   packets into an HSA-specific command queue (the "HIQ").
>> > > > >   During kfd module initialization a char device node
>> > > > > (/dev/kfd) is
>> > > > >  created
>> > > > >   (surviving until module exit), with ioctls for queue
>> > > > > creation &  management,
>> > > > >   and data structures are initialized for managing HSA device
>> > > > > topology.
>> > > > >   The rest of the initialization is driven by calls from the
>> > > > > radeon  kgd at
>> > > > >   the following points :
>> > > > >   - radeon_init (kfd_init)
>> > > > >   - radeon_exit (kfd_fini)
>> > > > >   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
>> > > > >   - radeon_driver_unload_kms (kfd_device_fini)
>> > > > >   During the probe and init processing per-device data
>> > > > > structures  are
>> > > > >   established which connect to the associated graphics kernel
>> > > > > driver. This
>> > > > >   information is exposed to userspace via sysfs, along with a
>> > > > > version numbe

Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-13 Thread Daniel Vetter
On Sat, Jul 12, 2014 at 6:49 PM, Jerome Glisse  wrote:
>> Hm, so the hsa part is a completely new driver/subsystem, not just an
>> additional ioctl tacked onto radoen? The history of drm is littered with
>> "generic" ioctls that turned out to be useful for exactly one driver.
>> Which is why _all_ the command submission is now done with driver-private
>> ioctls.
>>
>> I'd be quite a bit surprised if that suddenly works differently, so before
>> we bless a generic hsa interface I really want to see some implementation
>> from a different vendor (i.e. nvdidia or intel) using the same ioctls.
>> Otherwise we just repeat history and I'm not terribly inclined to keep on
>> cleanup up cruft forever - one drm legacy is enough ;-)
>>
>> Jesse is the guy from our side to talk to about this.
>> -Daniel
>
> I am not worried about that side, the hsa foundation has pretty strict
> guidelines on what is hsa compliant hardware ie the hw needs to understand
> the pm4 packet format of radeon (well small subset of it). But of course
> this require hsa compliant hardware and from member i am guessing ARM Mali,
> ImgTech, Qualcomm, ... so unless Intel and NVidia joins hsa you will not
> see it for those hardware.
>
> So yes for once same ioctl would apply to different hardware. The only things
> that is different is the shader isa. The hsafoundation site has some pdf
> explaining all that but someone thought that slideshare would be a good idea
> personnaly i would not register to any of the website just to get the pdf.
>
> So to sumup i am ok with having a new device file that present uniform set
> of ioctl. It would actualy be lot easier for userspace, just open this fix
> device file and ask for list of compliant hardware.
>
> Then radeon kernel driver would register itself as a provider. So all ioctl
> decoding marshalling would be share which makes sense.

There's also the other side namely that preparing the cp ring in
userspace and submitting the entire pile through a doorbell to the hw
scheduler isn't really hsa exclusive. And for a solid platform with
seamless gpu/cpu integration that means we need standard ways to set
gpu context priorities and get at useful stats like gpu time used by a
given context.

To get there I guess intel/nvidia need to reuse the hsa subsystem with
the command submission adjusted a bit. Kinda like drm where kms and
buffer sharing is common and cs driver specific.
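
Purely as an illustration of the kind of standard knobs meant here (the structs
and ioctl request numbers below are invented, not an existing interface):

/* Invented for illustration only: what vendor-neutral priority/statistics
 * controls for GPU contexts could look like at the ioctl level. */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>

struct ctx_priority_args {     /* hypothetical */
    uint32_t ctx_id;
    int32_t  priority;         /* scheduling priority, higher runs sooner */
};

struct ctx_stats_args {        /* hypothetical */
    uint32_t ctx_id;
    uint64_t gpu_time_ns;      /* GPU time consumed by this context */
};

#define HSA_IOC_SET_CTX_PRIORITY _IOW('H', 0x10, struct ctx_priority_args)
#define HSA_IOC_GET_CTX_STATS    _IOWR('H', 0x11, struct ctx_stats_args)

int main(void)
{
    struct ctx_priority_args prio = { .ctx_id = 1, .priority = 100 };
    struct ctx_stats_args stats = { .ctx_id = 1 };

    /* With a real device fd the calls would be the same for every vendor:
     *   ioctl(fd, HSA_IOC_SET_CTX_PRIORITY, &prio);
     *   ioctl(fd, HSA_IOC_GET_CTX_STATS, &stats);
     */
    printf("ctx %u: set-priority req 0x%lx, get-stats req 0x%lx\n",
           prio.ctx_id,
           (unsigned long)HSA_IOC_SET_CTX_PRIORITY,
           (unsigned long)HSA_IOC_GET_CTX_STATS);
    (void)stats;
    return 0;
}
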
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 00/83] AMD HSA kernel driver

2014-07-13 Thread Bridgman, John


-Original Message-
From: Jerome Glisse [mailto:j.gli...@gmail.com]
Sent: Saturday, July 12, 2014 11:56 PM
To: Gabbay, Oded
Cc: linux-kernel@vger.kernel.org; Bridgman, John; Deucher, Alexander;
Lewycky, Andrew; j...@8bytes.org; a...@linux-foundation.org; dri-
de...@lists.freedesktop.org; airl...@linux.ie; oded.gab...@gmail.com
Subject: Re: [PATCH 00/83] AMD HSA kernel driver

On Sat, Jul 12, 2014 at 09:55:49PM +, Gabbay, Oded wrote:
 On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
  On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:
On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
 On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
   This patch set implements a Heterogeneous System
 Architecture
  (HSA) driver
   for radeon-family GPUs.
 This is just quick comments on few things. Given size of this,
people  will need to have time to review things.
   HSA allows different processor types (CPUs, DSPs, GPUs,
 etc..) to
  share
   system resources more effectively via HW features including
 shared pageable
   memory, userspace-accessible work queues, and platform-level
 atomics. In
   addition to the memory protection mechanisms in GPUVM and
 IOMMUv2, the Sea
   Islands family of GPUs also performs HW-level validation of
 commands passed
   in through the queues (aka rings).
   The code in this patch set is intended to serve both as a
 sample  driver for
   other HSA-compatible hardware devices and as a production
 driver  for
   radeon-family processors. The code is architected to support
 multiple CPUs
   each with connected GPUs, although the current
 implementation  focuses on a
   single Kaveri/Berlin APU, and works alongside the existing
 radeon  kernel
   graphics driver (kgd).
   AMD GPUs designed for use with HSA (Sea Islands and up)
 share  some hardware
   functionality between HSA compute and regular gfx/compute
 (memory,
   interrupts, registers), while other functionality has been
 added
   specifically for HSA compute  (hw scheduler for virtualized
 compute rings).
   All shared hardware is owned by the radeon graphics driver,
 and  an interface
   between kfd and kgd allows the kfd to make use of those
 shared  resources,
   while HSA-specific functionality is managed directly by kfd
 by  submitting
   packets into an HSA-specific command queue (the HIQ).
   During kfd module initialization a char device node
 (/dev/kfd) is
  created
   (surviving until module exit), with ioctls for queue
 creation   management,
   and data structures are initialized for managing HSA device
 topology.
   The rest of the initialization is driven by calls from the
 radeon  kgd at
   the following points :
   - radeon_init (kfd_init)
   - radeon_exit (kfd_fini)
   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
   - radeon_driver_unload_kms (kfd_device_fini)
   During the probe and init processing per-device data
 structures  are
   established which connect to the associated graphics kernel
 driver. This
   information is exposed to userspace via sysfs, along with a
 version number
   allowing userspace to determine if a topology change has
 occurred  while it
   was reading from sysfs.
   The interface between kfd and kgd also allows the kfd to
 request  buffer
   management services from kgd, and allows kgd to route
 interrupt  requests to
   kfd code since the interrupt block is shared between regular
   graphics/compute and HSA compute subsystems in the GPU.
   The kfd code works with an open source usermode library
  (libhsakmt) which
   is in the final stages of IP review and should be published
 in a  separate
   repo over the next few days.
   The code operates in one of three modes, selectable via the
 sched_policy
   module parameter :
   - sched_policy=0 uses a hardware scheduler running in the
 MEC  block within
   CP, and allows oversubscription (more queues than HW slots)
   - sched_policy=1 also uses HW scheduling but does not allow
   oversubscription, so create_queue requests fail when we run
 out  of HW slots
   - sched_policy=2 does not use HW scheduling, so the driver
 manually assigns
   queues to HW slots by programming registers
   The no HW scheduling option is for debug  new hardware
 bringup  only, so
   has less test coverage than the other options. Default in
 the  current code
   is HW scheduling without oversubscription since that is
 where  we have the
   most test coverage but we expect to change the default to
 HW  scheduling
   with oversubscription after further testing. This
 effectively  removes the
   HW


Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-13 Thread Jerome Glisse
On Sun, Jul 13, 2014 at 03:34:12PM +, Bridgman, John wrote:
 From: Jerome Glisse [mailto:j.gli...@gmail.com]
 Sent: Saturday, July 12, 2014 11:56 PM
 To: Gabbay, Oded
 Cc: linux-kernel@vger.kernel.org; Bridgman, John; Deucher, Alexander;
 Lewycky, Andrew; j...@8bytes.org; a...@linux-foundation.org; dri-
 de...@lists.freedesktop.org; airl...@linux.ie; oded.gab...@gmail.com
 Subject: Re: [PATCH 00/83] AMD HSA kernel driver
 
 On Sat, Jul 12, 2014 at 09:55:49PM +, Gabbay, Oded wrote:
  On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
   On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:
 On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
This patch set implements a Heterogeneous System
  Architecture
   (HSA) driver
for radeon-family GPUs.
  This is just quick comments on few things. Given size of this,
 people  will need to have time to review things.
HSA allows different processor types (CPUs, DSPs, GPUs,
  etc..) to
   share
system resources more effectively via HW features including
  shared pageable
memory, userspace-accessible work queues, and platform-level
  atomics. In
addition to the memory protection mechanisms in GPUVM and
  IOMMUv2, the Sea
Islands family of GPUs also performs HW-level validation of
  commands passed
in through the queues (aka rings).
The code in this patch set is intended to serve both as a
  sample  driver for
other HSA-compatible hardware devices and as a production
  driver  for
radeon-family processors. The code is architected to support
  multiple CPUs
each with connected GPUs, although the current
  implementation  focuses on a
single Kaveri/Berlin APU, and works alongside the existing
  radeon  kernel
graphics driver (kgd).
AMD GPUs designed for use with HSA (Sea Islands and up)
  share  some hardware
functionality between HSA compute and regular gfx/compute
  (memory,
interrupts, registers), while other functionality has been
  added
specifically for HSA compute  (hw scheduler for virtualized
  compute rings).
All shared hardware is owned by the radeon graphics driver,
  and  an interface
between kfd and kgd allows the kfd to make use of those
  shared  resources,
while HSA-specific functionality is managed directly by kfd
  by  submitting
packets into an HSA-specific command queue (the HIQ).
During kfd module initialization a char device node
  (/dev/kfd) is
   created
(surviving until module exit), with ioctls for queue
  creation   management,
and data structures are initialized for managing HSA device
  topology.
The rest of the initialization is driven by calls from the
  radeon  kgd at
the following points :
- radeon_init (kfd_init)
- radeon_exit (kfd_fini)
- radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
- radeon_driver_unload_kms (kfd_device_fini)
During the probe and init processing per-device data
  structures  are
established which connect to the associated graphics kernel
  driver. This
information is exposed to userspace via sysfs, along with a
  version number
allowing userspace to determine if a topology change has
  occurred  while it
was reading from sysfs.
The interface between kfd and kgd also allows the kfd to
  request  buffer
management services from kgd, and allows kgd to route
  interrupt  requests to
kfd code since the interrupt block is shared between regular
graphics/compute and HSA compute subsystems in the GPU.
The kfd code works with an open source usermode library
   (libhsakmt) which
is in the final stages of IP review and should be published
  in a  separate
repo over the next few days.
The code operates in one of three modes, selectable via the
  sched_policy
module parameter :
- sched_policy=0 uses a hardware scheduler running in the
  MEC  block within
CP, and allows oversubscription (more queues than HW slots)
- sched_policy=1 also uses HW scheduling but does not allow
oversubscription, so create_queue requests fail when we run
  out  of HW slots
- sched_policy=2 does not use HW scheduling, so the driver
  manually assigns
queues to HW slots by programming registers
The no HW scheduling option is for debug  new hardware
  bringup  only, so
has less test coverage than the other options. Default in
  the  current code
is HW scheduling without oversubscription since that is
  where  we have the
most test coverage but we expect

Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-12 Thread Jerome Glisse
On Sat, Jul 12, 2014 at 09:55:49PM +, Gabbay, Oded wrote:
> On Fri, 2014-07-11 at 17:18 -0400, Jerome Glisse wrote:
> > On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:
> > >  On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> > > >  On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > > > >   This patch set implements a Heterogeneous System Architecture
> > > > >  (HSA) driver
> > > > >   for radeon-family GPUs.
> > > >  This is just quick comments on few things. Given size of this, 
> > > > people
> > > >  will need to have time to review things.
> > > > >   HSA allows different processor types (CPUs, DSPs, GPUs, 
> > > > > etc..) to
> > > > >  share
> > > > >   system resources more effectively via HW features including
> > > > >  shared pageable
> > > > >   memory, userspace-accessible work queues, and platform-level
> > > > >  atomics. In
> > > > >   addition to the memory protection mechanisms in GPUVM and
> > > > >  IOMMUv2, the Sea
> > > > >   Islands family of GPUs also performs HW-level validation of
> > > > >  commands passed
> > > > >   in through the queues (aka rings).
> > > > >   The code in this patch set is intended to serve both as a 
> > > > > sample
> > > > >  driver for
> > > > >   other HSA-compatible hardware devices and as a production 
> > > > > driver
> > > > >  for
> > > > >   radeon-family processors. The code is architected to support
> > > > >  multiple CPUs
> > > > >   each with connected GPUs, although the current implementation
> > > > >  focuses on a
> > > > >   single Kaveri/Berlin APU, and works alongside the existing 
> > > > > radeon
> > > > >  kernel
> > > > >   graphics driver (kgd).
> > > > >   AMD GPUs designed for use with HSA (Sea Islands and up) share
> > > > >  some hardware
> > > > >   functionality between HSA compute and regular gfx/compute 
> > > > > (memory,
> > > > >   interrupts, registers), while other functionality has been 
> > > > > added
> > > > >   specifically for HSA compute  (hw scheduler for virtualized
> > > > >  compute rings).
> > > > >   All shared hardware is owned by the radeon graphics driver, 
> > > > > and
> > > > >  an interface
> > > > >   between kfd and kgd allows the kfd to make use of those 
> > > > > shared
> > > > >  resources,
> > > > >   while HSA-specific functionality is managed directly by kfd 
> > > > > by
> > > > >  submitting
> > > > >   packets into an HSA-specific command queue (the "HIQ").
> > > > >   During kfd module initialization a char device node 
> > > > > (/dev/kfd) is
> > > > >  created
> > > > >   (surviving until module exit), with ioctls for queue 
> > > > > creation &
> > > > >  management,
> > > > >   and data structures are initialized for managing HSA device
> > > > >  topology.
> > > > >   The rest of the initialization is driven by calls from the 
> > > > > radeon
> > > > >  kgd at
> > > > >   the following points :
> > > > >   - radeon_init (kfd_init)
> > > > >   - radeon_exit (kfd_fini)
> > > > >   - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > > > >   - radeon_driver_unload_kms (kfd_device_fini)
> > > > >   During the probe and init processing per-device data 
> > > > > structures
> > > > >  are
> > > > >   established which connect to the associated graphics kernel
> > > > >  driver. This
> > > > >   information is exposed to userspace via sysfs, along with a
> > > > >  version number
> > > > >   allowing userspace to determine if a topology change has 
> > > > > occurred
> > > > >  while it
> > > > >   was reading from sysfs.
> > > > >   The interface between kfd and kgd also allows the kfd to 
> > > > > request
> > > > >  buffer
> > > > >   management services from kgd, and allows kgd to route 
> > > > > interrupt
> > > > >  requests to
> > > > >   kfd code since the interrupt block is shared between regular
> > > > >   graphics/compute and HSA compute subsystems in the GPU.
> > > > >   The kfd code works with an open source usermode library
> > > > >  ("libhsakmt") which
> > > > >   is in the final stages of IP review and should be published 
> > > > > in a
> > > > >  separate
> > > > >   repo over the next few days.
> > > > >   The code operates in one of three modes, selectable via the
> > > > >  sched_policy
> > > > >   module parameter :
> > > > >   - sched_policy=0 uses a hardware scheduler running in the MEC
> > > > >  block within
> > > > >   CP, and allows oversubscription (more queues than HW slots)
> > > > >   - sched_policy=1 also uses HW scheduling but does not allow
> > > > >   oversubscription, so create_queue requests fail when we run 
> > > > > out
> > > > >  of HW slots
> > > > >   - sched_policy=2 does not use HW scheduling, so the driver
> > > > >  manually assigns
> > > > >   queues to HW slots by programming registers
> > > > >   The "no HW scheduling" option is for debug & new hardware 
> > > > > bringup
> > > > >  only, so
> > > > >   has less test coverage than the other options. Default in the
> > > > >  current 

Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-12 Thread Christian König

Am 11.07.2014 23:18, schrieb Jerome Glisse:

On Thu, Jul 10, 2014 at 10:51:29PM +, Gabbay, Oded wrote:

On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:

On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:

  This patch set implements a Heterogeneous System Architecture (HSA)
  driver for radeon-family GPUs.

This is just quick comments on few things. Given size of this, people
will need to have time to review things.

  HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to share
  system resources more effectively via HW features including shared
  pageable memory, userspace-accessible work queues, and platform-level
  atomics. In addition to the memory protection mechanisms in GPUVM and
  IOMMUv2, the Sea Islands family of GPUs also performs HW-level
  validation of commands passed in through the queues (aka rings).

  The code in this patch set is intended to serve both as a sample driver
  for other HSA-compatible hardware devices and as a production driver
  for radeon-family processors. The code is architected to support
  multiple CPUs each with connected GPUs, although the current
  implementation focuses on a single Kaveri/Berlin APU, and works
  alongside the existing radeon kernel graphics driver (kgd).

  AMD GPUs designed for use with HSA (Sea Islands and up) share some
  hardware functionality between HSA compute and regular gfx/compute
  (memory, interrupts, registers), while other functionality has been
  added specifically for HSA compute (hw scheduler for virtualized
  compute rings). All shared hardware is owned by the radeon graphics
  driver, and an interface between kfd and kgd allows the kfd to make use
  of those shared resources, while HSA-specific functionality is managed
  directly by kfd by submitting packets into an HSA-specific command
  queue (the "HIQ").

  During kfd module initialization a char device node (/dev/kfd) is
  created (surviving until module exit), with ioctls for queue creation &
  management, and data structures are initialized for managing HSA device
  topology.
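
For illustration, the userspace side of such a queue-creation ioctl looks
roughly like the sketch below; the ioctl number and argument struct are
placeholders invented for this example, the real UAPI is defined by the kfd
headers in the series:

    /* Placeholder UAPI: KFD_IOC_CREATE_QUEUE and its argument struct are
     * invented for this sketch, not copied from the patch set. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>

    struct kfd_create_queue_args {
        unsigned long ring_base_address;   /* userspace ring buffer */
        unsigned int  ring_size;
        unsigned int  gpu_id;
        unsigned int  queue_id;            /* returned by the driver */
    };
    #define KFD_IOC_CREATE_QUEUE _IOWR('K', 2, struct kfd_create_queue_args)

    int main(void)
    {
        struct kfd_create_queue_args args = { .ring_size = 4096 };
        int fd = open("/dev/kfd", O_RDWR);

        if (fd < 0 || ioctl(fd, KFD_IOC_CREATE_QUEUE, &args) < 0) {
            perror("kfd");
            return 1;
        }
        printf("created queue %u\n", args.queue_id);
        return 0;
    }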

  The rest of the initialization is driven by calls from the radeon kgd
  at the following points :
  - radeon_init (kfd_init)
  - radeon_exit (kfd_fini)
  - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
  - radeon_driver_unload_kms (kfd_device_fini)
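
Schematically, the first two hook points amount to the kind of pairing shown
below. This is a simplified sketch: the signatures, error handling and the
surrounding radeon code are assumed, only the kfd_init/kfd_fini names come
from the list above.

    /* Simplified sketch of the module-level hook points listed above. */
    #include <linux/module.h>

    int kfd_init(void);     /* assumed signatures for the sketch */
    void kfd_fini(void);

    static int __init radeon_init(void)
    {
        int ret = kfd_init();   /* one-time kfd setup: /dev/kfd node, topology */
        if (ret)
            return ret;
        /* ... the normal radeon module initialisation continues here ... */
        return 0;
    }

    static void __exit radeon_exit(void)
    {
        /* ... normal radeon module teardown ... */
        kfd_fini();             /* remove /dev/kfd and free topology state */
    }

    module_init(radeon_init);
    module_exit(radeon_exit);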

  During the probe and init processing per-device data structures are
  established which connect to the associated graphics kernel driver.
  This information is exposed to userspace via sysfs, along with a
  version number allowing userspace to determine if a topology change has
  occurred while it was reading from sysfs.
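
The version number enables the usual seqlock-style retry from userspace,
along these lines; the sysfs path is an assumption made for the example, not
necessarily the driver's actual layout:

    /* Re-read the topology if its generation changed under us. */
    #include <stdio.h>

    #define KFD_GEN_PATH "/sys/class/kfd/kfd/topology/generation_id"  /* assumed path */

    static long read_generation(void)
    {
        long v = -1;
        FILE *f = fopen(KFD_GEN_PATH, "r");
        if (!f)
            return -1;
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;
        fclose(f);
        return v;
    }

    int read_topology(void)
    {
        long before, after;
        do {
            before = read_generation();
            if (before < 0)
                return -1;              /* topology not exposed */
            /* ... walk the per-node attributes and cache them here ... */
            after = read_generation();
        } while (before != after);      /* a change raced with us: retry */
        return 0;
    }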

  The interface between kfd and kgd also allows the kfd to request buffer
  management services from kgd, and allows kgd to route interrupt
  requests to kfd code since the interrupt block is shared between
  regular graphics/compute and HSA compute subsystems in the GPU.

  The kfd code works with an open source usermode library ("libhsakmt")
  which is in the final stages of IP review and should be published in a
  separate repo over the next few days.
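
One way to picture that kfd/kgd contract is a pair of function-pointer tables
exchanged at probe time. The sketch below only illustrates the shape; the
member names and signatures in the actual interface differ.

    /* Shape of the interface only: names and members are illustrative. */
    #include <linux/types.h>

    struct kgd_dev;                     /* opaque device handle owned by radeon (kgd) */

    struct kfd2kgd_calls {              /* services kfd asks kgd to perform */
        int  (*allocate_gtt_mem)(struct kgd_dev *kgd, size_t size, void **mem_obj);
        void (*free_gtt_mem)(struct kgd_dev *kgd, void *mem_obj);
        int  (*program_queue_slot)(struct kgd_dev *kgd, unsigned int pipe,
                                   unsigned int queue, void *mqd);
    };

    struct kgd2kfd_calls {              /* callbacks kgd routes into kfd */
        void (*interrupt)(struct kgd_dev *kgd, const void *ih_ring_entry);
        void (*device_fini)(struct kgd_dev *kgd);
    };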

  The code operates in one of three modes, selectable via the
  sched_policy module parameter :
  - sched_policy=0 uses a hardware scheduler running in the MEC block
  within CP, and allows oversubscription (more queues than HW slots)
  - sched_policy=1 also uses HW scheduling but does not allow
  oversubscription, so create_queue requests fail when we run out of HW
  slots
  - sched_policy=2 does not use HW scheduling, so the driver manually
  assigns queues to HW slots by programming registers

  The "no HW scheduling" option is for debug & new hardware bringup only,
  so has less test coverage than the other options. Default in the
  current code is "HW scheduling without oversubscription" since that is
  where we have the most test coverage but we expect to change the
  default to "HW scheduling with oversubscription" after further testing.
  This effectively removes the HW limit on the number of work queues
  available to applications.

  Programs running on the GPU are associated with an address space
  through the VMID field, which is translated to a unique PASID at access
  time via a set of 16 VMID-to-PASID mapping registers. The available
  VMIDs (currently 16) are partitioned (under control of the radeon kgd)
  between current gfx/compute and HSA compute, with each getting 8 in the
  current code. The VMID-to-PASID mapping registers are updated by the HW
  scheduler when used, and by driver code if HW scheduling is not being
  used.
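
A conceptual sketch of the 8/8 VMID split and the mapping update follows;
the allocator and the register-write helper are placeholders for
illustration, not the driver's actual implementation:

    /* Conceptual only: the real partitioning and register programming live
     * in the radeon kgd / kfd code, with different names. */
    #include <linux/bitops.h>
    #include <linux/errno.h>

    #define TOTAL_VMIDS     16
    #define FIRST_HSA_VMID   8          /* VMIDs 0..7 remain with gfx/compute */

    void write_vmid_pasid_mapping(unsigned int vmid, unsigned int pasid); /* placeholder */

    static unsigned long hsa_vmid_bitmap;   /* one bit per HSA-owned VMID */

    static int acquire_hsa_vmid(unsigned int pasid)
    {
        unsigned int vmid;

        for (vmid = FIRST_HSA_VMID; vmid < TOTAL_VMIDS; vmid++) {
            if (!test_and_set_bit(vmid - FIRST_HSA_VMID, &hsa_vmid_bitmap)) {
                /* Only needed when the HW scheduler is not doing it for us. */
                write_vmid_pasid_mapping(vmid, pasid);
                return vmid;
            }
        }
        return -EBUSY;                  /* all HSA VMIDs currently in use */
    }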

  The Sea Islands compute queues use a new "doorbell" mechanism instead
  of the earlier kernel-managed write pointer registers. Doorbells use a
  separate BAR dedicated for this purpose, and pages within the doorbell
  aperture are mapped to userspace (each page mapped to only one user
  address space). Writes to the doorbell aperture are intercepted by GPU
  hardware, allowing userspace code to safely manage work queues
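
From the userspace side, ringing a doorbell is then just a store into a
mapped page, roughly as below; the mmap offset convention and the 32-bit
doorbell width are assumptions made for this sketch, not the UAPI defined by
the patch set:

    /* Sketch of a userspace doorbell write under the stated assumptions. */
    #include <stdint.h>
    #include <sys/mman.h>
    #include <sys/types.h>

    int ring_doorbell(int kfd_fd, off_t doorbell_page_offset,
                      unsigned int slot, uint32_t new_wptr)
    {
        volatile uint32_t *db = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                     MAP_SHARED, kfd_fd, doorbell_page_offset);
        if (db == MAP_FAILED)
            return -1;

        /* The GPU intercepts this store and picks up the new ring write
         * pointer, so the submission path needs no system call. */
        db[slot] = new_wptr;
        return 0;
    }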

Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-10 Thread Jerome Glisse
On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> This patch set implements a Heterogeneous System Architecture (HSA) driver 
> for radeon-family GPUs. 

This is just quick comments on few things. Given size of this, people
will need to have time to review things.

> 
> HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to share 
> system resources more effectively via HW features including shared pageable 
> memory, userspace-accessible work queues, and platform-level atomics. In 
> addition to the memory protection mechanisms in GPUVM and IOMMUv2, the Sea 
> Islands family of GPUs also performs HW-level validation of commands passed 
> in through the queues (aka rings).
> 
> The code in this patch set is intended to serve both as a sample driver for 
> other HSA-compatible hardware devices and as a production driver for 
> radeon-family processors. The code is architected to support multiple CPUs 
> each with connected GPUs, although the current implementation focuses on a 
> single Kaveri/Berlin APU, and works alongside the existing radeon kernel 
> graphics driver (kgd). 
> 
> AMD GPUs designed for use with HSA (Sea Islands and up) share some hardware 
> functionality between HSA compute and regular gfx/compute (memory, 
> interrupts, registers), while other functionality has been added 
> specifically for HSA compute  (hw scheduler for virtualized compute rings). 
> All shared hardware is owned by the radeon graphics driver, and an interface 
> between kfd and kgd allows the kfd to make use of those shared resources, 
> while HSA-specific functionality is managed directly by kfd by submitting 
> packets into an HSA-specific command queue (the "HIQ").
> 
> During kfd module initialization a char device node (/dev/kfd) is created 
> (surviving until module exit), with ioctls for queue creation & management, 
> and data structures are initialized for managing HSA device topology. 
> 
> The rest of the initialization is driven by calls from the radeon kgd at 
> the following points :
> 
> - radeon_init (kfd_init)
> - radeon_exit (kfd_fini)
> - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> - radeon_driver_unload_kms (kfd_device_fini)
> 
> During the probe and init processing per-device data structures are 
> established which connect to the associated graphics kernel driver. This 
> information is exposed to userspace via sysfs, along with a version number 
> allowing userspace to determine if a topology change has occurred while it 
> was reading from sysfs. 
> 
> The interface between kfd and kgd also allows the kfd to request buffer 
> management services from kgd, and allows kgd to route interrupt requests to 
> kfd code since the interrupt block is shared between regular 
> graphics/compute and HSA compute subsystems in the GPU.
> 
> The kfd code works with an open source usermode library ("libhsakmt") which 
> is in the final stages of IP review and should be published in a separate 
> repo over the next few days. 
> 
> The code operates in one of three modes, selectable via the sched_policy 
> module parameter :
> 
> - sched_policy=0 uses a hardware scheduler running in the MEC block within 
> CP, and allows oversubscription (more queues than HW slots) 
> 
> - sched_policy=1 also uses HW scheduling but does not allow 
> oversubscription, so create_queue requests fail when we run out of HW slots 
> 
> - sched_policy=2 does not use HW scheduling, so the driver manually assigns 
> queues to HW slots by programming registers
> 
> The "no HW scheduling" option is for debug & new hardware bringup only, so 
> has less test coverage than the other options. Default in the current code 
> is "HW scheduling without oversubscription" since that is where we have the 
> most test coverage but we expect to change the default to "HW scheduling 
> with oversubscription" after further testing. This effectively removes the 
> HW limit on the number of work queues available to applications.
> 
> Programs running on the GPU are associated with an address space through the 
> VMID field, which is translated to a unique PASID at access time via a set 
> of 16 VMID-to-PASID mapping registers. The available VMIDs (currently 16) 
> are partitioned (under control of the radeon kgd) between current 
> gfx/compute and HSA compute, with each getting 8 in the current code. The 
> VMID-to-PASID mapping registers are updated by the HW scheduler when used, 
> and by driver code if HW scheduling is not being used.  
> 
> The Sea Islands compute queues use a new "doorbell" mechanism instead of the 
> earlier kernel-managed write pointer registers. Doorbells use a separate BAR 
> dedicated for this purpose, and pages within the doorbell aperture are 
> mapped to userspace (each page mapped to only one user address space). 
> Writes to the doorbell aperture are intercepted by GPU hardware, allowing 
> userspace code to safely manage work queues (rings) without requiring a 
> kernel call for every ring update. 

[PATCH 00/83] AMD HSA kernel driver

2014-07-10 Thread Oded Gabbay
This patch set implements a Heterogeneous System Architecture (HSA) driver 
for radeon-family GPUs. 

HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to share 
system resources more effectively via HW features including shared pageable 
memory, userspace-accessible work queues, and platform-level atomics. In 
addition to the memory protection mechanisms in GPUVM and IOMMUv2, the Sea 
Islands family of GPUs also performs HW-level validation of commands passed 
in through the queues (aka rings).

The code in this patch set is intended to serve both as a sample driver for 
other HSA-compatible hardware devices and as a production driver for 
radeon-family processors. The code is architected to support multiple CPUs 
each with connected GPUs, although the current implementation focuses on a 
single Kaveri/Berlin APU, and works alongside the existing radeon kernel 
graphics driver (kgd). 

AMD GPUs designed for use with HSA (Sea Islands and up) share some hardware 
functionality between HSA compute and regular gfx/compute (memory, 
interrupts, registers), while other functionality has been added 
specifically for HSA compute  (hw scheduler for virtualized compute rings). 
All shared hardware is owned by the radeon graphics driver, and an interface 
between kfd and kgd allows the kfd to make use of those shared resources, 
while HSA-specific functionality is managed directly by kfd by submitting 
packets into an HSA-specific command queue (the "HIQ").

During kfd module initialization a char device node (/dev/kfd) is created 
(surviving until module exit), with ioctls for queue creation & management, 
and data structures are initialized for managing HSA device topology. 
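
For illustration, a userspace client would drive this node roughly as follows. This is
a minimal sketch: the ioctl number, name and argument layout (KFD_IOC_CREATE_QUEUE,
struct kfd_create_queue_args) are hypothetical stand-ins, not the UAPI actually defined
later in the series.

/* Hypothetical sketch of a /dev/kfd client; the ioctl name and
 * argument struct are illustrative, not the series' real UAPI. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct kfd_create_queue_args {		/* hypothetical layout */
	uint64_t ring_base_address;	/* userspace ring buffer */
	uint32_t ring_size;
	uint32_t gpu_id;		/* taken from the sysfs topology */
	uint32_t queue_id;		/* filled in by the driver */
	uint64_t doorbell_offset;	/* filled in: mmap offset of the doorbell page */
};

#define KFD_IOC_CREATE_QUEUE _IOWR('K', 0x01, struct kfd_create_queue_args)

int main(void)
{
	struct kfd_create_queue_args args = { 0 };
	int fd = open("/dev/kfd", O_RDWR);

	if (fd < 0) {
		perror("open /dev/kfd");
		return 1;
	}

	/* ring-buffer allocation omitted; args.ring_base_address would
	 * point at memory the process set up for this queue */
	if (ioctl(fd, KFD_IOC_CREATE_QUEUE, &args) < 0)
		perror("create queue");
	else
		printf("queue %u created, doorbell at offset 0x%llx\n",
		       args.queue_id, (unsigned long long)args.doorbell_offset);

	close(fd);
	return 0;
}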

The rest of the initialization is driven by calls from the radeon kgd at 
the following points :

- radeon_init (kfd_init)
- radeon_exit (kfd_fini)
- radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
- radeon_driver_unload_kms (kfd_device_fini)

During the probe and init processing per-device data structures are 
established which connect to the associated graphics kernel driver. This 
information is exposed to userspace via sysfs, along with a version number 
allowing userspace to determine if a topology change has occurred while it 
was reading from sysfs. 
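
A client can use that version number for a seqlock-style consistent snapshot: read it,
scan the topology, read it again and retry if it changed. A minimal sketch follows; the
sysfs path is a placeholder, not the exact node name defined by the topology patches.

/* Hedged sketch: retry the topology scan if the version changed
 * underneath us.  The sysfs path below is a placeholder. */
#include <stdio.h>

static long read_topology_version(void)
{
	long v = -1;
	FILE *f = fopen("/sys/devices/virtual/kfd/kfd/topology/generation_id", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}

int main(void)
{
	long before, after;

	for (;;) {
		before = read_topology_version();
		if (before < 0) {
			fprintf(stderr, "no kfd topology exposed\n");
			return 1;
		}
		/* ... walk the per-device nodes and parse their properties ... */
		after = read_topology_version();
		if (after == before)
			break;		/* consistent snapshot */
		/* topology changed while we were reading; rescan */
	}

	printf("topology snapshot taken at version %ld\n", after);
	return 0;
}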

The interface between kfd and kgd also allows the kfd to request buffer 
management services from kgd, and allows kgd to route interrupt requests to 
kfd code since the interrupt block is shared between regular 
graphics/compute and HSA compute subsystems in the GPU.
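
One way to picture that interface is a table of callbacks that kgd hands to kfd at
device-init time, plus a hook in the other direction for interrupts. The sketch below is
illustrative only: the structure, member and function names are made up, not the actual
kfd/kgd interface from the patches.

/* Illustrative sketch of a kgd-provided service table; all names here
 * are made up for the example. */
#include <linux/types.h>

struct kgd_dev;				/* opaque device handle owned by radeon */

struct kfd2kgd_ops {
	/* buffer management services implemented by kgd for kfd */
	int  (*alloc_gtt_mem)(struct kgd_dev *kgd, size_t size,
			      void **cpu_ptr, u64 *gpu_addr);
	void (*free_gtt_mem)(struct kgd_dev *kgd, void *cpu_ptr);

	/* access to registers/doorbells in the shared hardware blocks */
	void (*write_reg)(struct kgd_dev *kgd, u32 offset, u32 value);
};

/* In the other direction, kgd forwards interrupts that belong to HSA
 * queues from the shared interrupt block into kfd, e.g. via a hook
 * along these lines: */
void kfd_forward_interrupt(struct kgd_dev *kgd, const void *ih_ring_entry);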

The kfd code works with an open source usermode library ("libhsakmt") which 
is in the final stages of IP review and should be published in a separate 
repo over the next few days. 

The code operates in one of three modes, selectable via the sched_policy 
module parameter :

- sched_policy=0 uses a hardware scheduler running in the MEC block within 
CP, and allows oversubscription (more queues than HW slots) 

- sched_policy=1 also uses HW scheduling but does not allow 
oversubscription, so create_queue requests fail when we run out of HW slots 

- sched_policy=2 does not use HW scheduling, so the driver manually assigns 
queues to HW slots by programming registers

The "no HW scheduling" option is for debug & new hardware bringup only, so 
has less test coverage than the other options. Default in the current code 
is "HW scheduling without oversubscription" since that is where we have the 
most test coverage but we expect to change the default to "HW scheduling 
with oversubscription" after further testing. This effectively removes the 
HW limit on the number of work queues available to applications.
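
In kernel terms this is an ordinary integer module parameter that the queue-creation
path consults. A rough sketch, where everything except the name sched_policy and the
three values described above is illustrative:

/* Sketch of the three scheduling modes as a module parameter; names
 * other than "sched_policy" itself are illustrative. */
#include <linux/module.h>

enum kfd_sched_policy {
	KFD_SCHED_POLICY_HWS = 0,		/* HW scheduler, oversubscription allowed */
	KFD_SCHED_POLICY_HWS_NO_OVERSUB = 1,	/* HW scheduler, fail when HW slots run out */
	KFD_SCHED_POLICY_NO_HWS = 2,		/* driver programs queue registers directly */
};

static int sched_policy = KFD_SCHED_POLICY_HWS_NO_OVERSUB;	/* default per the text above */
module_param(sched_policy, int, 0444);
MODULE_PARM_DESC(sched_policy,
	"0 = HW scheduling with oversubscription, "
	"1 = HW scheduling without oversubscription (default), "
	"2 = no HW scheduling (debug/bring-up only)");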

Programs running on the GPU are associated with an address space through the 
VMID field, which is translated to a unique PASID at access time via a set 
of 16 VMID-to-PASID mapping registers. The available VMIDs (currently 16) 
are partitioned (under control of the radeon kgd) between current 
gfx/compute and HSA compute, with each getting 8 in the current code. The 
VMID-to-PASID mapping registers are updated by the HW scheduler when used, 
and by driver code if HW scheduling is not being used.  
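
As a concrete illustration of the 8/8 split and of what it means for driver code to
update the mapping registers when the HW scheduler is not used, here is a hedged
sketch; the register offset, bit layout and helper names are placeholders, not taken
from the hardware documentation.

/* Placeholder sketch: VMIDs 8..15 are reserved for HSA compute and a
 * PASID is bound to one of them by writing a mapping register. */
#include <linux/types.h>

#define KFD_TOTAL_VMIDS			16
#define KFD_FIRST_HSA_VMID		8	/* 0..7 stay with gfx/compute */
#define VMID_PASID_MAPPING_BASE		0x0	/* placeholder register offset */
#define VMID_PASID_MAPPING_VALID	(1u << 31)	/* placeholder valid bit */

struct kgd_dev;
void kgd_write_reg(struct kgd_dev *kgd, u32 offset, u32 value);	/* provided by kgd */

/* Bind @pasid to HSA VMID @vmid; the driver only does this itself when
 * sched_policy=2, otherwise the HW scheduler maintains the registers. */
static void set_vmid_pasid_mapping(struct kgd_dev *kgd, u8 vmid, u16 pasid)
{
	if (vmid < KFD_FIRST_HSA_VMID || vmid >= KFD_TOTAL_VMIDS)
		return;		/* not one of kfd's 8 VMIDs */

	kgd_write_reg(kgd, VMID_PASID_MAPPING_BASE + vmid * sizeof(u32),
		      VMID_PASID_MAPPING_VALID | pasid);
}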

The Sea Islands compute queues use a new "doorbell" mechanism instead of the 
earlier kernel-managed write pointer registers. Doorbells use a separate BAR 
dedicated for this purpose, and pages within the doorbell aperture are 
mapped to userspace (each page mapped to only one user address space). 
Writes to the doorbell aperture are intercepted by GPU hardware, allowing 
userspace code to safely manage work queues (rings) without requiring a 
kernel call for every ring update. 
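
From the application's side the steady-state flow is therefore: write a packet into the
user-mapped ring, then store the new write index to the doorbell page obtained by
mmap()ing /dev/kfd. A hedged sketch, using a placeholder packet layout rather than the
real packet format:

/* Hedged sketch: submit work by writing the ring, then ringing the
 * doorbell.  Packet layout and queue bookkeeping are placeholders. */
#include <stdint.h>
#include <string.h>

struct queue_packet { uint32_t words[16]; };	/* placeholder packet layout */

struct user_queue {
	struct queue_packet *ring;	/* userspace ring buffer */
	uint32_t ring_entries;		/* power of two */
	uint32_t write_index;
	volatile uint32_t *doorbell;	/* 32-bit doorbell, mmapped from /dev/kfd */
};

static void submit_packet(struct user_queue *q, const struct queue_packet *pkt)
{
	uint32_t slot = q->write_index & (q->ring_entries - 1);

	memcpy(&q->ring[slot], pkt, sizeof(*pkt));

	/* make the packet visible before the doorbell write */
	__atomic_thread_fence(__ATOMIC_RELEASE);

	q->write_index++;
	*q->doorbell = q->write_index;	/* intercepted by the GPU, no syscall */
}

The kernel is only involved when a queue is created or destroyed; submission itself
never leaves userspace.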

First step for an application process is to open the kfd device. Calls to 
open create a kfd "process" structure only for the first thread of the 
process. Subsequent open calls are checked to see if they are from processes 
using the same mm_struct and, if so, don't do anything. The kfd per-process 
data lives as long as the mm_struct exists. 
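
A minimal sketch of that open-time check, with illustrative names; the real per-process
tracking in the series will differ in detail.

/* Hedged sketch: one kfd_process per mm_struct, created on first open. */
#include <linux/fs.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/sched.h>
#include <linux/slab.h>

struct kfd_process {
	struct list_head list;
	struct mm_struct *mm;		/* identity of the owning process */
	/* queues, doorbell pages, pasid, ... */
};

static LIST_HEAD(kfd_processes);
static DEFINE_MUTEX(kfd_processes_lock);

static struct kfd_process *kfd_get_process(void)
{
	struct kfd_process *p;

	mutex_lock(&kfd_processes_lock);
	list_for_each_entry(p, &kfd_processes, list) {
		if (p->mm == current->mm)	/* another thread already opened us */
			goto out;
	}

	p = kzalloc(sizeof(*p), GFP_KERNEL);
	if (p) {
		p->mm = current->mm;
		list_add(&p->list, &kfd_processes);
	}
out:
	mutex_unlock(&kfd_processes_lock);
	return p;
}

static int kfd_open(struct inode *inode, struct file *filep)
{
	struct kfd_process *p = kfd_get_process();

	if (!p)
		return -ENOMEM;
	filep->private_data = p;
	return 0;
}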

Re: [PATCH 00/83] AMD HSA kernel driver

2014-07-10 Thread Gabbay, Oded
On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote:
> > This patch set implements a Heterogeneous System Architecture (HSA) driver for radeon-family GPUs.
> 
> This is just quick comments on few things. Given size of this, people will need to have time to review things.
> 
> > HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to share system resources more effectively via HW features including shared pageable memory, userspace-accessible work queues, and platform-level atomics. In addition to the memory protection mechanisms in GPUVM and IOMMUv2, the Sea Islands family of GPUs also performs HW-level validation of commands passed in through the queues (aka rings).
> >
> > The code in this patch set is intended to serve both as a sample driver for other HSA-compatible hardware devices and as a production driver for radeon-family processors. The code is architected to support multiple CPUs each with connected GPUs, although the current implementation focuses on a single Kaveri/Berlin APU, and works alongside the existing radeon kernel graphics driver (kgd).
> >
> > AMD GPUs designed for use with HSA (Sea Islands and up) share some hardware functionality between HSA compute and regular gfx/compute (memory, interrupts, registers), while other functionality has been added specifically for HSA compute (hw scheduler for virtualized compute rings). All shared hardware is owned by the radeon graphics driver, and an interface between kfd and kgd allows the kfd to make use of those shared resources, while HSA-specific functionality is managed directly by kfd by submitting packets into an HSA-specific command queue (the "HIQ").
> >
> > During kfd module initialization a char device node (/dev/kfd) is created (surviving until module exit), with ioctls for queue creation & management, and data structures are initialized for managing HSA device topology.
> >
> > The rest of the initialization is driven by calls from the radeon kgd at the following points :
> > - radeon_init (kfd_init)
> > - radeon_exit (kfd_fini)
> > - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
> > - radeon_driver_unload_kms (kfd_device_fini)
> >
> > During the probe and init processing per-device data structures are established which connect to the associated graphics kernel driver. This information is exposed to userspace via sysfs, along with a version number allowing userspace to determine if a topology change has occurred while it was reading from sysfs.
> >
> > The interface between kfd and kgd also allows the kfd to request buffer management services from kgd, and allows kgd to route interrupt requests to kfd code since the interrupt block is shared between regular graphics/compute and HSA compute subsystems in the GPU.
> >
> > The kfd code works with an open source usermode library ("libhsakmt") which is in the final stages of IP review and should be published in a separate repo over the next few days.
> >
> > The code operates in one of three modes, selectable via the sched_policy module parameter :
> > - sched_policy=0 uses a hardware scheduler running in the MEC block within CP, and allows oversubscription (more queues than HW slots)
> > - sched_policy=1 also uses HW scheduling but does not allow oversubscription, so create_queue requests fail when we run out of HW slots
> > - sched_policy=2 does not use HW scheduling, so the driver manually assigns queues to HW slots by programming registers
> >
> > The "no HW scheduling" option is for debug & new hardware bringup only, so has less test coverage than the other options. Default in the current code is "HW scheduling without oversubscription" since that is where we have the most test coverage but we expect to change the default to "HW scheduling with oversubscription" after further testing. This effectively removes the HW limit on the number of work queues available to applications.
> >
> > Programs running on the GPU are associated with an address space through the VMID field, which is translated to a unique PASID at access time via a set of 16 VMID-to-PASID mapping registers. The available VMIDs (currently 16) are partitioned (under control of the radeon kgd) between current gfx/compute and HSA compute, with each getting 8 in the current code. The VMID-to-PASID mapping registers are updated by the HW scheduler when used, and by driver code if HW scheduling is not being used.
> >
> > The Sea Islands compute queues use a new "doorbell" mechanism instead of the earlier kernel-managed write pointer registers. Doorbells use a separate BAR dedicated for this purpose, and pages within the doorbell aperture are mapped to userspace (each page mapped to only one user address space). Writes to the doorbell