Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI

2021-01-28 Thread Mikko Perttunen

On 1/28/21 12:06 AM, Dmitry Osipenko wrote:

> 28.01.2021 00:57, Mikko Perttunen пишет:
>>
>> On 1/27/21 11:26 PM, Dmitry Osipenko wrote:
>>> 26.01.2021 05:45, Mikko Perttunen пишет:
>>>>> 5. The hardware state of sync points should be reset when sync point is
>>>>> requested, not when host1x driver is initialized.
>>>>
>>>> This may be doable, but I don't think it is critical for this UAPI, so
>>>> let's consider it after this series.
>>>>
>>>> The userspace should anyway not be able to assume the initial value of
>>>> the syncpoint upon allocation. The kernel should set it to some high
>>>> value to catch any issues related to wraparound.
>>>
>>> This is critical because min != max when sync point is requested.
>>
>> That I would just consider a bug, and it can be fixed. But it's
>> orthogonal to whether the value gets reset every time the syncpoint is
>> allocated.
>>
>>>> Also, this makes code more complicated since it now needs to ensure all
>>>> waits on the syncpoint have completed before freeing the syncpoint,
>>>> which can be nontrivial e.g. if the waiter is in a different virtual
>>>> machine or some other device connected via PCIe (a real usecase).
>>>
>>> It sounds to me that these VM sync points should be treated very
>>> separately from a generic sync points, don't you think so? Let's not mix
>>> them and get the generic sync points usable first.
>>
>> They are not special in any way, I'm just referring to cases where the
>> waiter (consumer) is remote. The allocator of the syncpoint (producer)
>> doesn't necessarily even need to know about it. The same concern is
>> applicable within a single VM, or single application as well. Just
>> putting out the point that this is something that needs to be taken care
>> of if we were to reset the value.
>
> Will the kernel driver know that it deals with a VM sync point?
>
> Will it be possible to get a non-VM sync point explicitly?
>
> If the driver knows that it deals with a VM sync point, then we can treat it
> specially, avoiding the reset, etc.



There is no distinction between a "VM syncpoint" and a "non-VM 
syncpoint". To provide an example on the issue, consider we have VM1 and 
VM2. VM1 is running some camera software that produces frames. VM2 runs 
some analysis software that consumes those frames through shared memory. 
In between there is some software that takes the postfences of the 
camera software and transmits them to the analysis software to be used 
as prefences. Only this transmitting software needs to know anything 
about multiple VMs being in use.


At any time, if we want to reset the value of the syncpoint in question,
we must first ensure that every waiter on that syncpoint has observed its
fence's threshold.


Consider an interleaving like this:

VM1 (Camera)                      VM2 (Analysis)
------------------------------------------------------------
Send postfence (threshold=X)
                                  Recv postfence (threshold=X)
Increment syncpoint value to X
Free syncpoint
Reset syncpoint value to Y
                                  Wait on postfence
------------------------------------------------------------

Now depending on the relative values of X and Y, either VM2 progresses 
(correctly), or gets stuck. If we didn't reset the syncpoint, the race 
could not occur (unless VM1 managed to increment the syncpoint 2^31 
times before VM2's wait starts, which is very unrealistic).
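
For reference, the 2^31 margin comes from the usual wraparound-safe
comparison on free-running 32-bit syncpoint values. A minimal sketch of
that check (illustrative only; the helper name is made up and this is not
the exact in-kernel code):

#include <stdbool.h>
#include <stdint.h>

/*
 * True when 'value' has reached or passed 'threshold', modulo 2^32. The
 * signed difference only gives the wrong answer once the counter has
 * advanced about 2^31 increments past the threshold, which is the margin
 * referred to above.
 */
static bool syncpt_has_expired(uint32_t value, uint32_t threshold)
{
        return (int32_t)(value - threshold) >= 0;
}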


We can remove "VM1" and "VM2" everywhere here, and just consider two 
applications in one VM, too, or two parts of one application. Within one 
VM the issue is of course easier because the driver can have knowledge 
about fences and solve the race internally, but I'd always prefer not 
having such special cases.


Now, admittedly this is probably not a huge problem unless we are 
freeing syncpoints all the time, but nevertheless something to consider.


Mikko


Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI

2021-01-28 Thread Dmitry Osipenko
28.01.2021 00:57, Mikko Perttunen пишет:
> 
> 
> On 1/27/21 11:26 PM, Dmitry Osipenko wrote:
>> 26.01.2021 05:45, Mikko Perttunen пишет:
>>>> 5. The hardware state of sync points should be reset when sync point is
>>>> requested, not when host1x driver is initialized.
>>>
>>> This may be doable, but I don't think it is critical for this UAPI, so
>>> let's consider it after this series.
>>>
>>> The userspace should anyway not be able to assume the initial value of
>>> the syncpoint upon allocation. The kernel should set it to some high
>>> value to catch any issues related to wraparound.
>>
>> This is critical because min != max when sync point is requested.
> 
> That I would just consider a bug, and it can be fixed. But it's
> orthogonal to whether the value gets reset every time the syncpoint is
> allocated.
> 
>>
>>> Also, this makes code more complicated since it now needs to ensure all
>>> waits on the syncpoint have completed before freeing the syncpoint,
>>> which can be nontrivial e.g. if the waiter is in a different virtual
>>> machine or some other device connected via PCIe (a real usecase).
>>
>> It sounds to me that these VM sync points should be treated very
>> separately from a generic sync points, don't you think so? Let's not mix
>> them and get the generic sync points usable first.
>>
> 
> They are not special in any way, I'm just referring to cases where the
> waiter (consumer) is remote. The allocator of the syncpoint (producer)
> doesn't necessarily even need to know about it. The same concern is
> applicable within a single VM, or single application as well. Just
> putting out the point that this is something that needs to be taken care
> of if we were to reset the value.

Will the kernel driver know that it deals with a VM sync point?

Will it be possible to get a non-VM sync point explicitly?

If the driver knows that it deals with a VM sync point, then we can treat it
specially, avoiding the reset, etc.


Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI

2021-01-28 Thread Dmitry Osipenko
26.01.2021 05:45, Mikko Perttunen пишет:
>> 5. The hardware state of sync points should be reset when sync point is
>> requested, not when host1x driver is initialized.
> 
> This may be doable, but I don't think it is critical for this UAPI, so
> let's consider it after this series.
> 
> The userspace should anyway not be able to assume the initial value of
> the syncpoint upon allocation. The kernel should set it to some high
> value to catch any issues related to wraparound.

This is critical because min != max when sync point is requested.

> Also, this makes code more complicated since it now needs to ensure all
> waits on the syncpoint have completed before freeing the syncpoint,
> which can be nontrivial e.g. if the waiter is in a different virtual
> machine or some other device connected via PCIe (a real usecase).

It sounds to me that these VM sync points should be treated very
separately from a generic sync points, don't you think so? Let's not mix
them and get the generic sync points usable first.


Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI

2021-01-27 Thread Mikko Perttunen



On 1/27/21 11:26 PM, Dmitry Osipenko wrote:

> 26.01.2021 05:45, Mikko Perttunen пишет:
>>> 5. The hardware state of sync points should be reset when sync point is
>>> requested, not when host1x driver is initialized.
>>
>> This may be doable, but I don't think it is critical for this UAPI, so
>> let's consider it after this series.
>>
>> The userspace should anyway not be able to assume the initial value of
>> the syncpoint upon allocation. The kernel should set it to some high
>> value to catch any issues related to wraparound.
>
> This is critical because min != max when sync point is requested.


That I would just consider a bug, and it can be fixed. But it's 
orthogonal to whether the value gets reset every time the syncpoint is 
allocated.





>> Also, this makes code more complicated since it now needs to ensure all
>> waits on the syncpoint have completed before freeing the syncpoint,
>> which can be nontrivial e.g. if the waiter is in a different virtual
>> machine or some other device connected via PCIe (a real usecase).
>
> It sounds to me that these VM sync points should be treated very
> separately from a generic sync points, don't you think so? Let's not mix
> them and get the generic sync points usable first.



They are not special in any way, I'm just referring to cases where the 
waiter (consumer) is remote. The allocator of the syncpoint (producer) 
doesn't necessarily even need to know about it. The same concern is 
applicable within a single VM, or single application as well. Just 
putting out the point that this is something that needs to be taken care 
of if we were to reset the value.


Mikko


Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI

2021-01-25 Thread Mikko Perttunen

On 1/20/21 12:29 AM, Dmitry Osipenko wrote:

> 11.01.2021 15:59, Mikko Perttunen пишет:
>> Hi all,
>>
>> here's the fifth revision of the Host1x/TegraDRM UAPI proposal,
>> containing primarily small bug fixes. It has also been
>> rebased on top of recent linux-next.
>>
>> vaapi-tegra-driver has been updated to support the new UAPI
>> as well as Tegra186:
>>
>>   https://github.com/cyndis/vaapi-tegra-driver
>>
>> The `putsurface` program has been tested to work.
>>
>> The test suite for the new UAPI is available at
>> https://github.com/cyndis/uapi-test
>>
>> The series can be also found in
>> https://github.com/cyndis/linux/commits/work/host1x-uapi-v5.
>>
>> Older versions:
>> v1: https://www.spinics.net/lists/linux-tegra/msg51000.html
>> v2: https://www.spinics.net/lists/linux-tegra/msg53061.html
>> v3: https://www.spinics.net/lists/linux-tegra/msg54370.html
>> v4: https://www.spinics.net/lists/dri-devel/msg279897.html
>>
>> Thank you,
>> Mikko


> Basic support for the v5 UAPI has now been added to the Opentegra driver.
> Overall the UAPI works, but there are a couple of things that we need to
> improve; I'll focus on them here.
>
> Problems
> ========
>
> 1. The channel map/unmap API needs some more thought.
>
> The main problem is the difficulty of tracking the liveness of BOs and
> mappings.  The kernel driver refs the BO for each mapping, and userspace
> needs to track both the BO and its mappings together; it's too easy to
> make a mistake and leak BOs without noticing.
>
> 2. The Host1x sync point UAPI should not be used for tracking DRM jobs.
> I remember we discussed this previously, but it pops up again and I don't
> remember where we ended up last time.
>
> This creates unnecessary complexity for userspace.  Userspace needs to
> go through a lot of chores just to get a sync point and then to manage
> it for the jobs.
>
> Nothing stops two different channels from reusing a single sync point
> for a job; fixing this will only add more complexity to the kernel
> driver instead of removing it.
>
> 3. The signalling of DMA fences doesn't work properly in v5, apparently
> because of the host1x waiter bug.  I see that the sync point interrupt
> happens, but the waiter callback isn't invoked.
>
> 4. The sync_file API is not very suitable for DRM purposes because of
> -EMFILE "Too many open files", which I saw while running x11perf.
> It also adds complexity to userspace instead of removing it.  This
> approach is not suitable for the DRM scheduler either.
>
> 5. Sync points have a dirty hardware state when allocated / requested.
> The special sync point reservation is meaningless in this case.
>
> 6. I found that the need to chop the cmdstream into gathers is a bit
> cumbersome for userspace on older SoCs which don't have a h/w firewall.
> Can we support an option where all commands are collected into a single
> dedicated cmdstream for a job?
>
> Possible solutions for the above problems
> =========================================
>
> 1. Stop using the concept of "channels". Switch to DRM contexts only.
>
> Each DRM context should get access to all engines once the DRM context
> is created.  Think of it like "when a DRM context is opened, it opens a
> channel for each engine".
>
> Then each DRM context will get one mapping instance per engine for
> each BO.
>
> enum tegra_engine {
>         TEGRA_GR2D,
>         TEGRA_GR3D,
>         TEGRA_VIC,
>         ...
>         NUM_ENGINES
> };
>
> struct tegra_bo_mapping {
>         dma_addr_t ioaddr;
>         ...
> };
>
> struct tegra_bo {
>         ...
>         struct tegra_bo_mapping *hw_maps[NUM_ENGINES];
> };
>
> Instead of DRM_IOCTL_TEGRA_CHANNEL_MAP we should have
> DRM_IOCTL_TEGRA_GEM_MAP_TO_ENGINE, which will create a BO mapping for a
> specified h/w engine.
>
> Once a BO is closed, all its mappings should be closed too.  This way
> userspace doesn't need to track both BOs and mappings.
>
> All that userspace needs to do is:
>
> 1) Open a DRM context
> 2) Create a GEM
> 3) Map the GEM to the required hardware engines
> 4) Submit a job that uses the GEM handle
> 5) Close the GEM
>
> If the GEM wasn't mapped prior to the job's submission, the job will fail
> because the kernel driver won't resolve the IO mapping of the GEM.

Perhaps we can instead change the reference counting such that if you 
close the GEM object, the mappings will be dropped as well automatically.
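
As a rough sketch of that idea (purely illustrative: the names are made
up, locking is omitted, and this is not the in-tree structure), the
mappings would hang off the BO and the BO's release callback would tear
them down, so userspace only ever tracks the BO handle:

#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/types.h>

struct example_mapping {
        struct list_head head;
        dma_addr_t iova;
        /* per-engine IOMMU/DMA bookkeeping would live here */
};

struct example_bo {
        struct kref ref;
        struct list_head mappings;      /* protected by a lock in real code */
};

/* Runs when the last reference to the BO is dropped. */
static void example_bo_release(struct kref *ref)
{
        struct example_bo *bo = container_of(ref, struct example_bo, ref);
        struct example_mapping *m, *tmp;

        /* Tearing mappings down here means a mapping can never outlive,
         * or be leaked independently of, its BO. */
        list_for_each_entry_safe(m, tmp, &bo->mappings, head) {
                list_del(&m->head);
                /* dma_unmap/iommu teardown for the engine would go here */
                kfree(m);
        }
        kfree(bo);
}

static void example_bo_put(struct example_bo *bo)
{
        kref_put(&bo->ref, example_bo_release);
}

With something like this, closing the GEM handle is enough, and whether
mappings pin the BO or the BO owns its mappings becomes a policy detail
inside the kernel.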




> 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
> increments.  The job's sync point will be allocated dynamically when the
> job is submitted.  We will need a flag for the sync_incr and wait_syncpt
> commands, saying "it's a job's sync point increment/wait".


Negative. Like I have explained in previous discussions, with the 
current way the usage of hardware resources is much more deterministic 
and obvious. I disagree on the point that this is much more complicated 
for the userspace. Separating syncpoint and channel allocation is one of 
the primary motivations of this series for me.




> 3. We should use the dma-fence API directly and the waiter shim should be
> removed.  It's great that you're already working on this.
>
> 4. Sync file shouldn't be needed for the part of the DRM API which doesn't
> interact with external non-DRM devices.  We should use DRM syncobj for
> everything related to DRM; it's a superior API over sync_file, and it's
> suitable for the DRM scheduler.

Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI

2021-01-21 Thread Dmitry Osipenko
11.01.2021 15:59, Mikko Perttunen пишет:
> Hi all,
> 
> here's the fifth revision of the Host1x/TegraDRM UAPI proposal,
> containing primarily small bug fixes. It has also been
> rebased on top of recent linux-next.
> 
> vaapi-tegra-driver has been updated to support the new UAPI
> as well as Tegra186:
> 
>   https://github.com/cyndis/vaapi-tegra-driver
> 
> The `putsurface` program has been tested to work.
> 
> The test suite for the new UAPI is available at
> https://github.com/cyndis/uapi-test
> 
> The series can be also found in
> https://github.com/cyndis/linux/commits/work/host1x-uapi-v5.
> 
> Older versions:
> v1: https://www.spinics.net/lists/linux-tegra/msg51000.html
> v2: https://www.spinics.net/lists/linux-tegra/msg53061.html
> v3: https://www.spinics.net/lists/linux-tegra/msg54370.html
> v4: https://www.spinics.net/lists/dri-devel/msg279897.html
> 
> Thank you,
> Mikko

Basic support for the v5 UAPI has now been added to the Opentegra driver.
Overall the UAPI works, but there are a couple of things that we need to
improve; I'll focus on them here.

Problems
========

1. The channel map/unmap API needs some more thought.

The main problem is the difficulty of tracking the liveness of BOs and
mappings.  The kernel driver refs the BO for each mapping, and userspace
needs to track both the BO and its mappings together; it's too easy to
make a mistake and leak BOs without noticing.

2. The Host1x sync point UAPI should not be used for tracking DRM jobs.
I remember we discussed this previously, but it pops up again and I don't
remember where we ended up last time.

This creates unnecessary complexity for userspace.  Userspace needs to
go through a lot of chores just to get a sync point and then to manage
it for the jobs.

Nothing stops two different channels from reusing a single sync point
for a job; fixing this will only add more complexity to the kernel
driver instead of removing it.

3. The signalling of DMA fences doesn't work properly in v5, apparently
because of the host1x waiter bug.  I see that the sync point interrupt
happens, but the waiter callback isn't invoked.

4. The sync_file API is not very suitable for DRM purposes because of
-EMFILE "Too many open files", which I saw while running x11perf.
It also adds complexity to userspace instead of removing it.  This
approach is not suitable for the DRM scheduler either.

5. Sync points have a dirty hardware state when allocated / requested.
The special sync point reservation is meaningless in this case.

6. I found that the need to chop the cmdstream into gathers is a bit
cumbersome for userspace on older SoCs which don't have a h/w firewall.
Can we support an option where all commands are collected into a single
dedicated cmdstream for a job?

Possible solutions for the above problems
=========================================

1. Stop using the concept of "channels". Switch to DRM contexts only.

Each DRM context should get access to all engines once the DRM context
is created.  Think of it like "when a DRM context is opened, it opens a
channel for each engine".

Then each DRM context will get one mapping instance per engine for
each BO.

enum tegra_engine {
        TEGRA_GR2D,
        TEGRA_GR3D,
        TEGRA_VIC,
        ...
        NUM_ENGINES
};

struct tegra_bo_mapping {
        dma_addr_t ioaddr;
        ...
};

struct tegra_bo {
        ...
        struct tegra_bo_mapping *hw_maps[NUM_ENGINES];
};

Instead of DRM_IOCTL_TEGRA_CHANNEL_MAP we should have
DRM_IOCTL_TEGRA_GEM_MAP_TO_ENGINE, which will create a BO mapping for a
specified h/w engine.

Once a BO is closed, all its mappings should be closed too.  This way
userspace doesn't need to track both BOs and mappings.

All that userspace needs to do is:

1) Open a DRM context
2) Create a GEM
3) Map the GEM to the required hardware engines
4) Submit a job that uses the GEM handle
5) Close the GEM

If the GEM wasn't mapped prior to the job's submission, the job will fail
because the kernel driver won't resolve the IO mapping of the GEM.

2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
increments.  The job's sync point will be allocated dynamically when the
job is submitted.  We will need a flag for the sync_incr and wait_syncpt
commands, saying "it's a job's sync point increment/wait".

3. We should use the dma-fence API directly and the waiter shim should be
removed.  It's great that you're already working on this.
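
For illustration, a hedged sketch of what consuming a fence through the
generic dma_fence API looks like (only the dma_fence_* calls are the real
API; the surrounding names are made up):

#include <linux/dma-fence.h>

struct example_waiter {
        struct dma_fence_cb cb;
        /* consumer state to kick once the fence signals */
};

static void example_fence_signaled(struct dma_fence *fence,
                                   struct dma_fence_cb *cb)
{
        struct example_waiter *w = container_of(cb, struct example_waiter, cb);

        /* Called when the fence signals; no host1x-specific waiter needed. */
        (void)w;
}

static int example_wait_async(struct dma_fence *fence,
                              struct example_waiter *w)
{
        /* Returns -ENOENT if the fence has already signaled, which the
         * caller can simply treat as "done". */
        return dma_fence_add_callback(fence, &w->cb, example_fence_signaled);
}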

4. Sync file shouldn't be needed for the part of the DRM API which doesn't
interact with external non-DRM devices.  We should use DRM syncobj for
everything related to DRM; it's a superior API over sync_file, and it's
suitable for the DRM scheduler.

5. The hardware state of sync points should be reset when sync point is
requested, not when host1x driver is initialized.

6. We will need to allocate a host1x BO for the job's cmdstream and add a
restart command to the end of the job's stream.  CDMA will jump into the
job's stream from the push buffer.

We could add a flag for that to drm_tegra_submit_cmd_gather, 

[PATCH v5 00/21] Host1x/TegraDRM UAPI

2021-01-11 Thread Mikko Perttunen
Hi all,

here's the fifth revision of the Host1x/TegraDRM UAPI proposal,
containing primarily small bug fixes. It has also been
rebased on top of recent linux-next.

vaapi-tegra-driver has been updated to support the new UAPI
as well as Tegra186:

  https://github.com/cyndis/vaapi-tegra-driver

The `putsurface` program has been tested to work.

The test suite for the new UAPI is available at
https://github.com/cyndis/uapi-test

The series can be also found in
https://github.com/cyndis/linux/commits/work/host1x-uapi-v5.

Older versions:
v1: https://www.spinics.net/lists/linux-tegra/msg51000.html
v2: https://www.spinics.net/lists/linux-tegra/msg53061.html
v3: https://www.spinics.net/lists/linux-tegra/msg54370.html
v4: https://www.spinics.net/lists/dri-devel/msg279897.html

Thank you,
Mikko

Mikko Perttunen (21):
  gpu: host1x: Use different lock classes for each client
  gpu: host1x: Allow syncpoints without associated client
  gpu: host1x: Show number of pending waiters in debugfs
  gpu: host1x: Remove cancelled waiters immediately
  gpu: host1x: Use HW-equivalent syncpoint expiration check
  gpu: host1x: Cleanup and refcounting for syncpoints
  gpu: host1x: Introduce UAPI header
  gpu: host1x: Implement /dev/host1x device node
  gpu: host1x: DMA fences and userspace fence creation
  gpu: host1x: Add no-recovery mode
  gpu: host1x: Add job release callback
  gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
  gpu: host1x: Reset max value when freeing a syncpoint
  gpu: host1x: Reserve VBLANK syncpoints at initialization
  drm/tegra: Add new UAPI to header
  drm/tegra: Boot VIC during runtime PM resume
  drm/tegra: Set resv fields when importing/exporting GEMs
  drm/tegra: Allocate per-engine channel in core code
  drm/tegra: Implement new UAPI
  drm/tegra: Implement job submission part of new UAPI
  drm/tegra: Add job firewall

 drivers/gpu/drm/tegra/Makefile |   4 +
 drivers/gpu/drm/tegra/dc.c |  10 +-
 drivers/gpu/drm/tegra/drm.c|  69 ++--
 drivers/gpu/drm/tegra/drm.h|   9 +
 drivers/gpu/drm/tegra/gem.c|   2 +
 drivers/gpu/drm/tegra/gr2d.c   |   4 +-
 drivers/gpu/drm/tegra/gr3d.c   |   4 +-
 drivers/gpu/drm/tegra/uapi.h   |  63 
 drivers/gpu/drm/tegra/uapi/firewall.c  | 221 +
 drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 +
 drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 ++
 drivers/gpu/drm/tegra/uapi/submit.c| 436 +
 drivers/gpu/drm/tegra/uapi/submit.h|  21 ++
 drivers/gpu/drm/tegra/uapi/uapi.c  | 307 +
 drivers/gpu/drm/tegra/vic.c| 118 ---
 drivers/gpu/host1x/Makefile|   2 +
 drivers/gpu/host1x/bus.c   |   7 +-
 drivers/gpu/host1x/cdma.c  |  69 +++-
 drivers/gpu/host1x/debug.c |  14 +-
 drivers/gpu/host1x/dev.c   |  15 +
 drivers/gpu/host1x/dev.h   |  16 +-
 drivers/gpu/host1x/fence.c | 208 
 drivers/gpu/host1x/fence.h |  13 +
 drivers/gpu/host1x/hw/cdma_hw.c|   2 +-
 drivers/gpu/host1x/hw/channel_hw.c |  63 ++--
 drivers/gpu/host1x/hw/debug_hw.c   |  11 +-
 drivers/gpu/host1x/intr.c  |  32 +-
 drivers/gpu/host1x/intr.h  |   6 +-
 drivers/gpu/host1x/job.c   |  79 +++--
 drivers/gpu/host1x/job.h   |  14 +
 drivers/gpu/host1x/syncpt.c| 187 ++-
 drivers/gpu/host1x/syncpt.h|  16 +-
 drivers/gpu/host1x/uapi.c  | 385 ++
 drivers/gpu/host1x/uapi.h  |  22 ++
 drivers/staging/media/tegra-video/vi.c |   4 +-
 include/linux/host1x.h |  47 ++-
 include/uapi/drm/tegra_drm.h   | 338 +--
 include/uapi/linux/host1x.h| 134 
 38 files changed, 2771 insertions(+), 289 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/uapi.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/firewall.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c
 create mode 100644 drivers/gpu/host1x/fence.c
 create mode 100644 drivers/gpu/host1x/fence.h
 create mode 100644 drivers/gpu/host1x/uapi.c
 create mode 100644 drivers/gpu/host1x/uapi.h
 create mode 100644 include/uapi/linux/host1x.h

-- 
2.30.0
