[PATCH] drm/amd/powerplay: fix compile warning for wrong data type V2

2018-09-04 Thread Evan Quan
do_div expects the 1st argument in 64bit instead of 32bit.
Drop the usage of do_div as it seems unnecessary.

V2: drop usage of do_div completely

Change-Id: Id2032a43727e7f1fa516d3565354d412a561
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
index 3efd59e984a3..1e65ac01e0f5 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
@@ -1195,7 +1195,7 @@ static int vega20_set_sclk_od(
int ret = 0;
 
od_sclk = golden_sclk_table->dpm_levels[golden_sclk_table->count - 
1].value * value;
-   do_div(od_sclk, 100);
+   od_sclk /= 100;
od_sclk += golden_sclk_table->dpm_levels[golden_sclk_table->count - 
1].value;
 
ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_GFXCLK_FMAX, od_sclk);
@@ -1242,7 +1242,7 @@ static int vega20_set_mclk_od(
int ret = 0;
 
od_mclk = golden_mclk_table->dpm_levels[golden_mclk_table->count - 
1].value * value;
-   do_div(od_mclk, 100);
+   od_mclk /= 100;
od_mclk += golden_mclk_table->dpm_levels[golden_mclk_table->count - 
1].value;
 
ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX, od_mclk);
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amd/powerplay: fix compile warning for wrong data type

2018-09-04 Thread Quan, Evan


> -Original Message-
> From: Deucher, Alexander
> Sent: 2018年9月5日 12:33
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Cc: Quan, Evan 
> Subject: RE: [PATCH] drm/amd/powerplay: fix compile warning for wrong
> data type
> 
> > -Original Message-
> > From: amd-gfx  On Behalf Of
> > Evan Quan
> > Sent: Tuesday, September 4, 2018 10:21 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Quan, Evan 
> > Subject: [PATCH] drm/amd/powerplay: fix compile warning for wrong data
> > type
> >
> > do_div expects the 1st argument in 64bit instead of 32bit.
> 
> Do we actually need to use do_div here?  If both arguments are 32 bit, can't
> we just use regular division?
> 
[Quan, Evan] That's a good idea. Will update the patch accordingly.
> Alex
> 
> >
> > Change-Id: Id2032a43727e7f1fa516d3565354d412a561
> > Signed-off-by: Evan Quan 
> > ---
> >  drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > index 3efd59e984a3..6ba5f328249d 100644
> > --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> > @@ -1191,14 +1191,14 @@ static int vega20_set_sclk_od(
> > &(data->dpm_table.gfx_table);
> > struct vega20_single_dpm_table *golden_sclk_table =
> > &(data->golden_dpm_table.gfx_table);
> > -   uint32_t od_sclk;
> > +   uint64_t od_sclk;
> > int ret = 0;
> >
> > od_sclk = golden_sclk_table->dpm_levels[golden_sclk_table->count
> > - 1].value * value;
> > do_div(od_sclk, 100);
> > od_sclk += golden_sclk_table->dpm_levels[golden_sclk_table-
> > >count - 1].value;
> >
> > -   ret = vega20_od8_set_settings(hwmgr,
> > OD8_SETTING_GFXCLK_FMAX, od_sclk);
> > +   ret = vega20_od8_set_settings(hwmgr,
> > OD8_SETTING_GFXCLK_FMAX, (uint32_t)od_sclk);
> > PP_ASSERT_WITH_CODE(!ret,
> > "[SetSclkOD] failed to set od gfxclk!",
> > return ret);
> > @@ -1238,14 +1238,14 @@ static int vega20_set_mclk_od(
> > &(data->dpm_table.mem_table);
> > struct vega20_single_dpm_table *golden_mclk_table =
> > &(data->golden_dpm_table.mem_table);
> > -   uint32_t od_mclk;
> > +   uint64_t od_mclk;
> > int ret = 0;
> >
> > od_mclk = golden_mclk_table->dpm_levels[golden_mclk_table-
> > >count - 1].value * value;
> > do_div(od_mclk, 100);
> > od_mclk += golden_mclk_table->dpm_levels[golden_mclk_table-
> > >count - 1].value;
> >
> > -   ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX,
> > od_mclk);
> > +   ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX,
> > (uint32_t)od_mclk);
> > PP_ASSERT_WITH_CODE(!ret,
> > "[SetMclkOD] failed to set od memclk!",
> > return ret);
> > --
> > 2.18.0
> >
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amd/powerplay: fix compile warning for wrong data type

2018-09-04 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx  On Behalf Of Evan
> Quan
> Sent: Tuesday, September 4, 2018 10:21 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Quan, Evan 
> Subject: [PATCH] drm/amd/powerplay: fix compile warning for wrong data
> type
> 
> do_div expects the 1st argument in 64bit instead of 32bit.

Do we actually need to use do_div here?  If both arguments are 32 bit, can't we 
just use regular division?

Alex

> 
> Change-Id: Id2032a43727e7f1fa516d3565354d412a561
> Signed-off-by: Evan Quan 
> ---
>  drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> index 3efd59e984a3..6ba5f328249d 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> @@ -1191,14 +1191,14 @@ static int vega20_set_sclk_od(
>   &(data->dpm_table.gfx_table);
>   struct vega20_single_dpm_table *golden_sclk_table =
>   &(data->golden_dpm_table.gfx_table);
> - uint32_t od_sclk;
> + uint64_t od_sclk;
>   int ret = 0;
> 
>   od_sclk = golden_sclk_table->dpm_levels[golden_sclk_table->count
> - 1].value * value;
>   do_div(od_sclk, 100);
>   od_sclk += golden_sclk_table->dpm_levels[golden_sclk_table-
> >count - 1].value;
> 
> - ret = vega20_od8_set_settings(hwmgr,
> OD8_SETTING_GFXCLK_FMAX, od_sclk);
> + ret = vega20_od8_set_settings(hwmgr,
> OD8_SETTING_GFXCLK_FMAX, (uint32_t)od_sclk);
>   PP_ASSERT_WITH_CODE(!ret,
>   "[SetSclkOD] failed to set od gfxclk!",
>   return ret);
> @@ -1238,14 +1238,14 @@ static int vega20_set_mclk_od(
>   &(data->dpm_table.mem_table);
>   struct vega20_single_dpm_table *golden_mclk_table =
>   &(data->golden_dpm_table.mem_table);
> - uint32_t od_mclk;
> + uint64_t od_mclk;
>   int ret = 0;
> 
>   od_mclk = golden_mclk_table->dpm_levels[golden_mclk_table-
> >count - 1].value * value;
>   do_div(od_mclk, 100);
>   od_mclk += golden_mclk_table->dpm_levels[golden_mclk_table-
> >count - 1].value;
> 
> - ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX,
> od_mclk);
> + ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX,
> (uint32_t)od_mclk);
>   PP_ASSERT_WITH_CODE(!ret,
>   "[SetMclkOD] failed to set od memclk!",
>   return ret);
> --
> 2.18.0
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 17:20, Christian König wrote:

Am 04.09.2018 um 11:00 schrieb zhoucm1:



On 2018年09月04日 16:42, Christian König wrote:

Am 04.09.2018 um 10:27 schrieb zhoucm1:



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb 
node.
2. Each node keeps a reference to the last previously inserted 
node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage 
collection and remove the first node from the tree as long as 
it is signaled.


5. When enable_signaling is requested for a node we cascade 
that to the left using rb_prev.
    This ensures that signaling is enabled for the current 
fence as well as all previous fences.


6. A wait just looks into the tree for the signal point lower 
or equal of the requested sequence number.
After re-thought your idea, I think it doesn't work since there 
is no timeline value as a line:
signal pt value doesn't must be continues, which can be jump by 
signal operation, like 1, 4, 8, 15, 19, e.g. there are five 
singal_pt, 
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if 
a wait pt is 7, do you mean this wait only needs signal_pt1 and 
signal_pt4???  That's certainly not right, we need to make sure 
the timeline value is bigger than wait pt value, that means 
signal_pt8 is need for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
If defined this, then problem is coming again, if 8 is removed when 
garbage collection, you will return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need 
to return anything.
8 is a signaled node, waitA/signal operation do garbage collection, 
how waitB(7) know the garbage history?


Well we of course keep what the last garbage collected number is, 
don't we?


Since there is no timeline as a line, I think this is not right 
direction.


That is actually intended. There is no infinite timeline here, just 
a windows of the last not yet signaled fences.
No one said the it's a infinite timeline, timeline will stop 
increasing when syncobj is released.


Yeah, but the syncobj can live for a very very long time. Not having 
some form of shrinking it when fences are signaled is certainly not 
going to fly very far.

I will try to fix this problem.
btw, when I try your suggestion, I find it will be difficult to 
implement drm_syncobj_array_wait_timeout by your idea, since it needs 
first_signaled. if there is un-signaled syncobj, we will still register 
cb to timeline value change, then still back to need enble_signaling.


Thanks,
David Zhou


Regards,
Christian.



Anyway kref is a good way to solve the 'free' problem, I will try to 
use it improve my patch, of course, will refer your idea.:)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree 
since you always need to keep them around just in case somebody asks 
for a lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a 
previous point is no longer allowed.

That's intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the 
looked up references to the nodes are dropped.
And here, who will destroy rb node since no one do 
enable_signaling, and there is no callback to free themselves.


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree 
object to provide the wait operation, as soon as the sync_obj is 
destroyed we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way 
we can drop all unused signal points as soon as the sync_obj is 
destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work, simple timeline reference also can solve 
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs 
less memory and each node has always the same size which means 
we can use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx








___
amd-gfx mailing list

[PATCH] drm/amd/powerplay: fix compile warning for wrong data type

2018-09-04 Thread Evan Quan
do_div expects the 1st argument in 64bit instead of 32bit.

Change-Id: Id2032a43727e7f1fa516d3565354d412a561
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
index 3efd59e984a3..6ba5f328249d 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
@@ -1191,14 +1191,14 @@ static int vega20_set_sclk_od(
&(data->dpm_table.gfx_table);
struct vega20_single_dpm_table *golden_sclk_table =
&(data->golden_dpm_table.gfx_table);
-   uint32_t od_sclk;
+   uint64_t od_sclk;
int ret = 0;
 
od_sclk = golden_sclk_table->dpm_levels[golden_sclk_table->count - 
1].value * value;
do_div(od_sclk, 100);
od_sclk += golden_sclk_table->dpm_levels[golden_sclk_table->count - 
1].value;
 
-   ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_GFXCLK_FMAX, od_sclk);
+   ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_GFXCLK_FMAX, 
(uint32_t)od_sclk);
PP_ASSERT_WITH_CODE(!ret,
"[SetSclkOD] failed to set od gfxclk!",
return ret);
@@ -1238,14 +1238,14 @@ static int vega20_set_mclk_od(
&(data->dpm_table.mem_table);
struct vega20_single_dpm_table *golden_mclk_table =
&(data->golden_dpm_table.mem_table);
-   uint32_t od_mclk;
+   uint64_t od_mclk;
int ret = 0;
 
od_mclk = golden_mclk_table->dpm_levels[golden_mclk_table->count - 
1].value * value;
do_div(od_mclk, 100);
od_mclk += golden_mclk_table->dpm_levels[golden_mclk_table->count - 
1].value;
 
-   ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX, od_mclk);
+   ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX, 
(uint32_t)od_mclk);
PP_ASSERT_WITH_CODE(!ret,
"[SetMclkOD] failed to set od memclk!",
return ret);
-- 
2.18.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] libdrm: Allow dynamic drm majors on linux

2018-09-04 Thread Dave Airlie
On Tue, 4 Sep 2018 at 03:00, Thomas Hellstrom  wrote:
>
> On 09/03/2018 06:33 PM, Daniel Vetter wrote:
> > On Mon, Sep 03, 2018 at 11:16:29AM +0200, Thomas Hellstrom wrote:
> >> On 08/31/2018 05:30 PM, Thomas Hellstrom wrote:
> >>> On 08/31/2018 05:27 PM, Emil Velikov wrote:
>  On 31 August 2018 at 15:38, Michel Dänzer  wrote:
> > [ Adding the amd-gfx list ]
> >
> > On 2018-08-31 3:05 p.m., Thomas Hellstrom wrote:
> >> On 08/31/2018 02:30 PM, Emil Velikov wrote:
> >>> On 31 August 2018 at 12:54, Thomas Hellstrom 
> >>> wrote:
>  To determine whether a device node is a drm device
>  node or not, the code
>  currently compares the node's major number to the static drm major
>  device
>  number.
> 
>  This breaks the standalone vmwgfx driver on XWayland dri clients,
> 
> >>> Any particular reason why the code doesn't use a fixed node there?
> >>> It will make the diff vs the in-kernel driver a bit smaller.
> >> Because then it won't be able to interoperate with other in-tree
> >> drivers, like virtual drm drivers or passthrough usb drm drivers.
> >> There is no clean way to share the minor number allocation
> >> with in-tree
> >> drm, so standalone vmwgfx is using dynamic major allocation.
> > I wonder why I haven't heard of any of these issues with the standalone
> > version of amdgpu shipped in packaged AMD releases. Does that
> > also use a
> > different major number? If yes, maybe it's just that nobody has tried
> > Xwayland clients with that driver. If no, how does it avoid the other
> > issues described above?
> >
>  AFAICT, the difference is that the standalone vmwgfx uses an internal
>  copy of drm core.
>  It doesn't reuse the in-kernel drm, hence it cannot know which minor
>  it can use.
> 
>  -Emil
> >>> Actually, standalone vmwgfx could perhaps also try to allocate minors
> >>> from 63 and downwards. That might work, but needs some verification.
> >>>
> >> So unfortuntately this doesn't work since the in-tree drm's file operations
> >> are registered with the DRM_MAJOR.
> >> So I still think the patch is the way to go. If people are concerned that
> >> also fbdev file descriptors are allowed, perhaps there are other sysfs
> >> traits we can look at?
> > Somewhat out of curiosity, but why do you have to overwrite all of drm?
> > amdgpu seems to be able to pull their stunt off without ...
> > -Daniel
>
> At the time we launched the standalone vmwgfx, the DRM <-> driver
> interface was moving considerably more rapidly than the DRM <-> kernel
> interface. I think that's still the case. Hence less work for us. Also
> meant we can install the full driver stack with latest features on
> fairly old VMs without backported DRM functionality.
>

I think this should be fine for 99% of drm usage, there may be corner
cases in wierd places, but I can't point to any that really matter
(maybe strace?)

Acked-by: Dave Airlie 

Dave.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Vetter
On Tue, Sep 04, 2018 at 09:36:01PM +0200, Bas Nieuwenhuizen wrote:
> On Tue, Sep 4, 2018 at 9:28 PM Daniel Vetter  wrote:
> >
> > On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen
> >  wrote:
> > > On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter  wrote:
> > >>
> > >> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen
> > >>  wrote:
> > >> > On Tue, Sep 4, 2018 at 7:48 PM Christian König
> > >> >  wrote:
> > >> >>
> > >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> > >> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> > >> >> >  wrote:
> > >> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter  
> > >> >> >> wrote:
> > >> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
> > >> >> >>>  wrote:
> > >> >>  On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  
> > >> >>  wrote:
> > >> >> > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen 
> > >> >> > wrote:
> > >> >> >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  
> > >> >> >> wrote:
> > >> >> >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König 
> > >> >> >>> wrote:
> > >> >>  Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > >> >> > Hi,
> > >> >> >
> > >> >> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter 
> > >> >> >  wrote:
> > >> >> >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
> > >> >> >>  wrote:
> > >> >> >>> +/* The chip this is compatible with.
> > >> >> >>> + *
> > >> >> >>> + * If compression is disabled, use
> > >> >> >>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > >> >> >>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > >> >> >>> + *
> > >> >> >>> + * With compression enabled please use the exact chip.
> > >> >> >>> + *
> > >> >> >>> + * TODO: Do some generations share DCC format?
> > >> >> >>> + */
> > >> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
> > >> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  
> > >> >> >>> 0xff
> > >> >> >> Do you really need all the combinations here of DCC + gpu 
> > >> >> >> gen + tiling
> > >> >> >> details? When we had the entire discussion with nvidia 
> > >> >> >> folks they
> > >> >> >> eventually agreed that they don't need the massive pile 
> > >> >> >> with every
> > >> >> >> possible combination. Do you really plan to share all 
> > >> >> >> these different
> > >> >> >> things?
> > >> >> >>
> > >> >> >> Note that e.g. on i915 we spec some of the tiling 
> > >> >> >> depending upon
> > >> >> >> buffer size and buffer format (because that's how the hw 
> > >> >> >> works), not
> > >> >> >> using explicit modifier flags for everything.
> > >> >> > Right. The conclusion, after people went through and 
> > >> >> > started sorting
> > >> >> > out the kinds of formats for which they would _actually_ 
> > >> >> > export real
> > >> >> > colour buffers for, that most vendors definitely have fewer 
> > >> >> > than
> > >> >> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > >> >> > possible formats to represent, very likely fewer than
> > >> >> > 340,282,366,920,938,463,463,374,607,431,768,211,456 
> > >> >> > formats, probably
> > >> >> > fewer than 72,057,594,037,927,936 formats, and even still 
> > >> >> > generally
> > >> >> > fewer than 281,474,976,710,656 if you want to be generous 
> > >> >> > and leave 8
> > >> >> > bits of the 56 available.
> > >> >>  The problem here is that at least for some parameters we 
> > >> >>  actually don't know
> > >> >>  which formats are actually used.
> > >> >> 
> > >> >>  The following are not real world examples, but just to 
> > >> >>  explain the general
> > >> >>  problem.
> > >> >> 
> > >> >>  The memory configuration for example can be not ASIC 
> > >> >>  specific, but rather
> > >> >>  determined by whoever took the ASIC and glued it together 
> > >> >>  with VRAM on a
> > >> >>  board. It is not likely that somebody puts all the VRAM 
> > >> >>  chips on one
> > >> >>  channel, but it is still perfectly possible.
> > >> >> 
> > >> >>  Same is true for things like harvesting, e.g. of 16 channels 
> > >> >>  halve of them
> > >> >>  could be bad and we need to know which to actually use.
> > >> >> >>> For my understanding: This leaks outside the chip when 
> > >> >> >>> sharing buffers?
> > >> >> >>> All the information you only need locally to a given amdgpu 
> > >> >> >>> instance
> > >> >> >>> don't really need to be encoded in modifiers.
> > >> >> 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Bas Nieuwenhuizen
On Tue, Sep 4, 2018 at 9:28 PM Daniel Vetter  wrote:
>
> On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen
>  wrote:
> > On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter  wrote:
> >>
> >> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen
> >>  wrote:
> >> > On Tue, Sep 4, 2018 at 7:48 PM Christian König
> >> >  wrote:
> >> >>
> >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> >> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> >> >> >  wrote:
> >> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter  wrote:
> >> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
> >> >> >>>  wrote:
> >> >>  On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  
> >> >>  wrote:
> >> >> > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> >> >> >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  
> >> >> >> wrote:
> >> >> >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> >> >>  Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> >> >> > Hi,
> >> >> >
> >> >> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter 
> >> >> >  wrote:
> >> >> >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
> >> >> >>  wrote:
> >> >> >>> +/* The chip this is compatible with.
> >> >> >>> + *
> >> >> >>> + * If compression is disabled, use
> >> >> >>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> >> >> >>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> >> >> >>> + *
> >> >> >>> + * With compression enabled please use the exact chip.
> >> >> >>> + *
> >> >> >>> + * TODO: Do some generations share DCC format?
> >> >> >>> + */
> >> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
> >> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
> >> >> >> Do you really need all the combinations here of DCC + gpu 
> >> >> >> gen + tiling
> >> >> >> details? When we had the entire discussion with nvidia folks 
> >> >> >> they
> >> >> >> eventually agreed that they don't need the massive pile with 
> >> >> >> every
> >> >> >> possible combination. Do you really plan to share all these 
> >> >> >> different
> >> >> >> things?
> >> >> >>
> >> >> >> Note that e.g. on i915 we spec some of the tiling depending 
> >> >> >> upon
> >> >> >> buffer size and buffer format (because that's how the hw 
> >> >> >> works), not
> >> >> >> using explicit modifier flags for everything.
> >> >> > Right. The conclusion, after people went through and started 
> >> >> > sorting
> >> >> > out the kinds of formats for which they would _actually_ 
> >> >> > export real
> >> >> > colour buffers for, that most vendors definitely have fewer 
> >> >> > than
> >> >> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> >> >> > possible formats to represent, very likely fewer than
> >> >> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, 
> >> >> > probably
> >> >> > fewer than 72,057,594,037,927,936 formats, and even still 
> >> >> > generally
> >> >> > fewer than 281,474,976,710,656 if you want to be generous and 
> >> >> > leave 8
> >> >> > bits of the 56 available.
> >> >>  The problem here is that at least for some parameters we 
> >> >>  actually don't know
> >> >>  which formats are actually used.
> >> >> 
> >> >>  The following are not real world examples, but just to explain 
> >> >>  the general
> >> >>  problem.
> >> >> 
> >> >>  The memory configuration for example can be not ASIC specific, 
> >> >>  but rather
> >> >>  determined by whoever took the ASIC and glued it together with 
> >> >>  VRAM on a
> >> >>  board. It is not likely that somebody puts all the VRAM chips 
> >> >>  on one
> >> >>  channel, but it is still perfectly possible.
> >> >> 
> >> >>  Same is true for things like harvesting, e.g. of 16 channels 
> >> >>  halve of them
> >> >>  could be bad and we need to know which to actually use.
> >> >> >>> For my understanding: This leaks outside the chip when sharing 
> >> >> >>> buffers?
> >> >> >>> All the information you only need locally to a given amdgpu 
> >> >> >>> instance
> >> >> >>> don't really need to be encoded in modifiers.
> >> >> >>>
> >> >> >>> Pointers to code where this is all decided (kernel and radeonsi 
> >> >> >>> would be
> >> >> >>> good starters I guess) would be really good here.
> >> >> >> I extracted the information on which bits are relevant mostly 
> >> >> >> from the
> >> >> >> AddrFromCoord functions in addrlib in 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Vetter
On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen
 wrote:
> On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter  wrote:
>>
>> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen
>>  wrote:
>> > On Tue, Sep 4, 2018 at 7:48 PM Christian König
>> >  wrote:
>> >>
>> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
>> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
>> >> >  wrote:
>> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter  wrote:
>> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>> >> >>>  wrote:
>> >>  On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  wrote:
>> >> > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> >> >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  
>> >> >> wrote:
>> >> >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>> >>  Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>> >> > Hi,
>> >> >
>> >> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter 
>> >> >  wrote:
>> >> >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
>> >> >>  wrote:
>> >> >>> +/* The chip this is compatible with.
>> >> >>> + *
>> >> >>> + * If compression is disabled, use
>> >> >>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>> >> >>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>> >> >>> + *
>> >> >>> + * With compression enabled please use the exact chip.
>> >> >>> + *
>> >> >>> + * TODO: Do some generations share DCC format?
>> >> >>> + */
>> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
>> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
>> >> >> Do you really need all the combinations here of DCC + gpu gen 
>> >> >> + tiling
>> >> >> details? When we had the entire discussion with nvidia folks 
>> >> >> they
>> >> >> eventually agreed that they don't need the massive pile with 
>> >> >> every
>> >> >> possible combination. Do you really plan to share all these 
>> >> >> different
>> >> >> things?
>> >> >>
>> >> >> Note that e.g. on i915 we spec some of the tiling depending 
>> >> >> upon
>> >> >> buffer size and buffer format (because that's how the hw 
>> >> >> works), not
>> >> >> using explicit modifier flags for everything.
>> >> > Right. The conclusion, after people went through and started 
>> >> > sorting
>> >> > out the kinds of formats for which they would _actually_ export 
>> >> > real
>> >> > colour buffers for, that most vendors definitely have fewer than
>> >> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> >> > possible formats to represent, very likely fewer than
>> >> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, 
>> >> > probably
>> >> > fewer than 72,057,594,037,927,936 formats, and even still 
>> >> > generally
>> >> > fewer than 281,474,976,710,656 if you want to be generous and 
>> >> > leave 8
>> >> > bits of the 56 available.
>> >>  The problem here is that at least for some parameters we 
>> >>  actually don't know
>> >>  which formats are actually used.
>> >> 
>> >>  The following are not real world examples, but just to explain 
>> >>  the general
>> >>  problem.
>> >> 
>> >>  The memory configuration for example can be not ASIC specific, 
>> >>  but rather
>> >>  determined by whoever took the ASIC and glued it together with 
>> >>  VRAM on a
>> >>  board. It is not likely that somebody puts all the VRAM chips on 
>> >>  one
>> >>  channel, but it is still perfectly possible.
>> >> 
>> >>  Same is true for things like harvesting, e.g. of 16 channels 
>> >>  halve of them
>> >>  could be bad and we need to know which to actually use.
>> >> >>> For my understanding: This leaks outside the chip when sharing 
>> >> >>> buffers?
>> >> >>> All the information you only need locally to a given amdgpu 
>> >> >>> instance
>> >> >>> don't really need to be encoded in modifiers.
>> >> >>>
>> >> >>> Pointers to code where this is all decided (kernel and radeonsi 
>> >> >>> would be
>> >> >>> good starters I guess) would be really good here.
>> >> >> I extracted the information on which bits are relevant mostly from 
>> >> >> the
>> >> >> AddrFromCoord functions in addrlib in mesa:
>> >> >>
>> >> >> for macro-tiles:
>> >> >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>> >> >>
>> >> >> for micro-tiles (or the micro-tiles in macro-tiles):
>> >> >>
>> >> >> 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Vetter
On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen
 wrote:
> On Tue, Sep 4, 2018 at 7:48 PM Christian König
>  wrote:
>>
>> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
>> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
>> >  wrote:
>> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter  wrote:
>> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>> >>>  wrote:
>>  On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  wrote:
>> > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:
>> >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>>  Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>> > Hi,
>> >
>> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter 
>> >  wrote:
>> >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
>> >>  wrote:
>> >>> +/* The chip this is compatible with.
>> >>> + *
>> >>> + * If compression is disabled, use
>> >>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>> >>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>> >>> + *
>> >>> + * With compression enabled please use the exact chip.
>> >>> + *
>> >>> + * TODO: Do some generations share DCC format?
>> >>> + */
>> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
>> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
>> >> Do you really need all the combinations here of DCC + gpu gen + 
>> >> tiling
>> >> details? When we had the entire discussion with nvidia folks they
>> >> eventually agreed that they don't need the massive pile with every
>> >> possible combination. Do you really plan to share all these 
>> >> different
>> >> things?
>> >>
>> >> Note that e.g. on i915 we spec some of the tiling depending upon
>> >> buffer size and buffer format (because that's how the hw works), 
>> >> not
>> >> using explicit modifier flags for everything.
>> > Right. The conclusion, after people went through and started 
>> > sorting
>> > out the kinds of formats for which they would _actually_ export 
>> > real
>> > colour buffers for, that most vendors definitely have fewer than
>> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> > possible formats to represent, very likely fewer than
>> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, 
>> > probably
>> > fewer than 72,057,594,037,927,936 formats, and even still generally
>> > fewer than 281,474,976,710,656 if you want to be generous and 
>> > leave 8
>> > bits of the 56 available.
>>  The problem here is that at least for some parameters we actually 
>>  don't know
>>  which formats are actually used.
>> 
>>  The following are not real world examples, but just to explain the 
>>  general
>>  problem.
>> 
>>  The memory configuration for example can be not ASIC specific, but 
>>  rather
>>  determined by whoever took the ASIC and glued it together with VRAM 
>>  on a
>>  board. It is not likely that somebody puts all the VRAM chips on one
>>  channel, but it is still perfectly possible.
>> 
>>  Same is true for things like harvesting, e.g. of 16 channels halve 
>>  of them
>>  could be bad and we need to know which to actually use.
>> >>> For my understanding: This leaks outside the chip when sharing 
>> >>> buffers?
>> >>> All the information you only need locally to a given amdgpu instance
>> >>> don't really need to be encoded in modifiers.
>> >>>
>> >>> Pointers to code where this is all decided (kernel and radeonsi 
>> >>> would be
>> >>> good starters I guess) would be really good here.
>> >> I extracted the information on which bits are relevant mostly from the
>> >> AddrFromCoord functions in addrlib in mesa:
>> >>
>> >> for macro-tiles:
>> >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>> >>
>> >> for micro-tiles (or the micro-tiles in macro-tiles):
>> >>
>> >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>> > So this is the decoding thing. How many of these actually exist, even 
>> > when
>> > taking all the other information into account?
>> >
>> > E.g. given a platform + memory config (seems needed) + drm_fourcc + 
>> > stride
>> > + height + width, how much of all these bits do you actually still 
>> > freely
>> > pick?
>>  Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
>>  thick 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Christian König

Am 04.09.2018 um 20:00 schrieb Bas Nieuwenhuizen:

On Tue, Sep 4, 2018 at 7:57 PM Bas Nieuwenhuizen
 wrote:

On Tue, Sep 4, 2018 at 7:48 PM Christian König
 wrote:

Am 04.09.2018 um 18:37 schrieb Daniel Vetter:

On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
 wrote:

On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter  wrote:

On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
 wrote:

On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  wrote:

On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:

On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:

On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:

Am 04.09.2018 um 12:15 schrieb Daniel Stone:

Hi,

On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  wrote:

On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen  
wrote:

+/* The chip this is compatible with.
+ *
+ * If compression is disabled, use
+ *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
+ *   - AMDGPU_CHIP_VEGA10 for GFX9+
+ *
+ * With compression enabled please use the exact chip.
+ *
+ * TODO: Do some generations share DCC format?
+ */
+#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
+#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff

Do you really need all the combinations here of DCC + gpu gen + tiling
details? When we had the entire discussion with nvidia folks they
eventually agreed that they don't need the massive pile with every
possible combination. Do you really plan to share all these different
things?

Note that e.g. on i915 we spec some of the tiling depending upon
buffer size and buffer format (because that's how the hw works), not
using explicit modifier flags for everything.

Right. The conclusion, after people went through and started sorting
out the kinds of formats for which they would _actually_ export real
colour buffers for, that most vendors definitely have fewer than
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
possible formats to represent, very likely fewer than
340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
fewer than 72,057,594,037,927,936 formats, and even still generally
fewer than 281,474,976,710,656 if you want to be generous and leave 8
bits of the 56 available.

The problem here is that at least for some parameters we actually don't know
which formats are actually used.

The following are not real world examples, but just to explain the general
problem.

The memory configuration for example can be not ASIC specific, but rather
determined by whoever took the ASIC and glued it together with VRAM on a
board. It is not likely that somebody puts all the VRAM chips on one
channel, but it is still perfectly possible.

Same is true for things like harvesting, e.g. of 16 channels halve of them
could be bad and we need to know which to actually use.

For my understanding: This leaks outside the chip when sharing buffers?
All the information you only need locally to a given amdgpu instance
don't really need to be encoded in modifiers.

Pointers to code where this is all decided (kernel and radeonsi would be
good starters I guess) would be really good here.

I extracted the information on which bits are relevant mostly from the
AddrFromCoord functions in addrlib in mesa:

for macro-tiles:
https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587

for micro-tiles (or the micro-tiles in macro-tiles):

https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016

So this is the decoding thing. How many of these actually exist, even when
taking all the other information into account?

E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
+ height + width, how much of all these bits do you actually still freely
pick?

Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
depth, display-rotated) + whether to use compression, everything else
is fixed given those option, the properties of the chip and the
format.



It might be that all the things you need to know from the memory config
don't encode smaller than the macro/micro/whatever else stuff. But that's
kinda the angle that we looked at this for everyone else.

E.g. for multi-plane stuff, if everyone picks the same config for the
2nd/3rd plane, then you don't actually need to encode that. It just
becomes part of the implied stuff in the modifier.

The problem is some GPUs are compatible for say 8-bpp images, but not
for 32-bpp surfaces. e.g. lets look at the following table showing the
current configuration for all GFX6-GFX8 GPU:

format: (bank width, bank height, macro tile aspect, num banks) for
8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG

verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Bas Nieuwenhuizen
On Tue, Sep 4, 2018 at 7:57 PM Bas Nieuwenhuizen
 wrote:
>
> On Tue, Sep 4, 2018 at 7:48 PM Christian König
>  wrote:
> >
> > Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> > > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> > >  wrote:
> > >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter  wrote:
> > >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
> > >>>  wrote:
> >  On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  wrote:
> > > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> > >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:
> > >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> >  Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > > Hi,
> > >
> > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter 
> > >  wrote:
> > >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
> > >>  wrote:
> > >>> +/* The chip this is compatible with.
> > >>> + *
> > >>> + * If compression is disabled, use
> > >>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > >>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > >>> + *
> > >>> + * With compression enabled please use the exact chip.
> > >>> + *
> > >>> + * TODO: Do some generations share DCC format?
> > >>> + */
> > >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
> > >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
> > >> Do you really need all the combinations here of DCC + gpu gen + 
> > >> tiling
> > >> details? When we had the entire discussion with nvidia folks they
> > >> eventually agreed that they don't need the massive pile with 
> > >> every
> > >> possible combination. Do you really plan to share all these 
> > >> different
> > >> things?
> > >>
> > >> Note that e.g. on i915 we spec some of the tiling depending upon
> > >> buffer size and buffer format (because that's how the hw works), 
> > >> not
> > >> using explicit modifier flags for everything.
> > > Right. The conclusion, after people went through and started 
> > > sorting
> > > out the kinds of formats for which they would _actually_ export 
> > > real
> > > colour buffers for, that most vendors definitely have fewer than
> > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > possible formats to represent, very likely fewer than
> > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, 
> > > probably
> > > fewer than 72,057,594,037,927,936 formats, and even still 
> > > generally
> > > fewer than 281,474,976,710,656 if you want to be generous and 
> > > leave 8
> > > bits of the 56 available.
> >  The problem here is that at least for some parameters we actually 
> >  don't know
> >  which formats are actually used.
> > 
> >  The following are not real world examples, but just to explain the 
> >  general
> >  problem.
> > 
> >  The memory configuration for example can be not ASIC specific, but 
> >  rather
> >  determined by whoever took the ASIC and glued it together with 
> >  VRAM on a
> >  board. It is not likely that somebody puts all the VRAM chips on 
> >  one
> >  channel, but it is still perfectly possible.
> > 
> >  Same is true for things like harvesting, e.g. of 16 channels halve 
> >  of them
> >  could be bad and we need to know which to actually use.
> > >>> For my understanding: This leaks outside the chip when sharing 
> > >>> buffers?
> > >>> All the information you only need locally to a given amdgpu instance
> > >>> don't really need to be encoded in modifiers.
> > >>>
> > >>> Pointers to code where this is all decided (kernel and radeonsi 
> > >>> would be
> > >>> good starters I guess) would be really good here.
> > >> I extracted the information on which bits are relevant mostly from 
> > >> the
> > >> AddrFromCoord functions in addrlib in mesa:
> > >>
> > >> for macro-tiles:
> > >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> > >>
> > >> for micro-tiles (or the micro-tiles in macro-tiles):
> > >>
> > >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
> > > So this is the decoding thing. How many of these actually exist, even 
> > > when
> > > taking all the other information into account?
> > >
> > > E.g. given a platform + memory config (seems needed) + drm_fourcc + 
> > > stride
> > > + height + width, how much of all these 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Bas Nieuwenhuizen
On Tue, Sep 4, 2018 at 7:48 PM Christian König
 wrote:
>
> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> >  wrote:
> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter  wrote:
> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
> >>>  wrote:
>  On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  wrote:
> > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:
> >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>  Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > Hi,
> >
> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  
> > wrote:
> >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
> >>  wrote:
> >>> +/* The chip this is compatible with.
> >>> + *
> >>> + * If compression is disabled, use
> >>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> >>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> >>> + *
> >>> + * With compression enabled please use the exact chip.
> >>> + *
> >>> + * TODO: Do some generations share DCC format?
> >>> + */
> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
> >> Do you really need all the combinations here of DCC + gpu gen + 
> >> tiling
> >> details? When we had the entire discussion with nvidia folks they
> >> eventually agreed that they don't need the massive pile with every
> >> possible combination. Do you really plan to share all these 
> >> different
> >> things?
> >>
> >> Note that e.g. on i915 we spec some of the tiling depending upon
> >> buffer size and buffer format (because that's how the hw works), 
> >> not
> >> using explicit modifier flags for everything.
> > Right. The conclusion, after people went through and started sorting
> > out the kinds of formats for which they would _actually_ export real
> > colour buffers for, that most vendors definitely have fewer than
> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > possible formats to represent, very likely fewer than
> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, 
> > probably
> > fewer than 72,057,594,037,927,936 formats, and even still generally
> > fewer than 281,474,976,710,656 if you want to be generous and leave 
> > 8
> > bits of the 56 available.
>  The problem here is that at least for some parameters we actually 
>  don't know
>  which formats are actually used.
> 
>  The following are not real world examples, but just to explain the 
>  general
>  problem.
> 
>  The memory configuration for example can be not ASIC specific, but 
>  rather
>  determined by whoever took the ASIC and glued it together with VRAM 
>  on a
>  board. It is not likely that somebody puts all the VRAM chips on one
>  channel, but it is still perfectly possible.
> 
>  Same is true for things like harvesting, e.g. of 16 channels halve 
>  of them
>  could be bad and we need to know which to actually use.
> >>> For my understanding: This leaks outside the chip when sharing 
> >>> buffers?
> >>> All the information you only need locally to a given amdgpu instance
> >>> don't really need to be encoded in modifiers.
> >>>
> >>> Pointers to code where this is all decided (kernel and radeonsi would 
> >>> be
> >>> good starters I guess) would be really good here.
> >> I extracted the information on which bits are relevant mostly from the
> >> AddrFromCoord functions in addrlib in mesa:
> >>
> >> for macro-tiles:
> >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> >>
> >> for micro-tiles (or the micro-tiles in macro-tiles):
> >>
> >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
> > So this is the decoding thing. How many of these actually exist, even 
> > when
> > taking all the other information into account?
> >
> > E.g. given a platform + memory config (seems needed) + drm_fourcc + 
> > stride
> > + height + width, how much of all these bits do you actually still 
> > freely
> > pick?
>  Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
>  thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
>  depth, display-rotated) + whether to use compression, everything else
>  is fixed given those option, the properties of 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Christian König

Am 04.09.2018 um 18:37 schrieb Daniel Vetter:

On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
 wrote:

On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter  wrote:

On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
 wrote:

On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  wrote:

On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:

On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:

On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:

Am 04.09.2018 um 12:15 schrieb Daniel Stone:

Hi,

On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  wrote:

On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen  
wrote:

+/* The chip this is compatible with.
+ *
+ * If compression is disabled, use
+ *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
+ *   - AMDGPU_CHIP_VEGA10 for GFX9+
+ *
+ * With compression enabled please use the exact chip.
+ *
+ * TODO: Do some generations share DCC format?
+ */
+#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
+#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff

Do you really need all the combinations here of DCC + gpu gen + tiling
details? When we had the entire discussion with nvidia folks they
eventually agreed that they don't need the massive pile with every
possible combination. Do you really plan to share all these different
things?

Note that e.g. on i915 we spec some of the tiling depending upon
buffer size and buffer format (because that's how the hw works), not
using explicit modifier flags for everything.

Right. The conclusion, after people went through and started sorting
out the kinds of formats for which they would _actually_ export real
colour buffers for, that most vendors definitely have fewer than
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
possible formats to represent, very likely fewer than
340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
fewer than 72,057,594,037,927,936 formats, and even still generally
fewer than 281,474,976,710,656 if you want to be generous and leave 8
bits of the 56 available.

The problem here is that at least for some parameters we actually don't know
which formats are actually used.

The following are not real world examples, but just to explain the general
problem.

The memory configuration for example can be not ASIC specific, but rather
determined by whoever took the ASIC and glued it together with VRAM on a
board. It is not likely that somebody puts all the VRAM chips on one
channel, but it is still perfectly possible.

Same is true for things like harvesting, e.g. of 16 channels halve of them
could be bad and we need to know which to actually use.

For my understanding: This leaks outside the chip when sharing buffers?
All the information you only need locally to a given amdgpu instance
don't really need to be encoded in modifiers.

Pointers to code where this is all decided (kernel and radeonsi would be
good starters I guess) would be really good here.

I extracted the information on which bits are relevant mostly from the
AddrFromCoord functions in addrlib in mesa:

for macro-tiles:
https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587

for micro-tiles (or the micro-tiles in macro-tiles):

https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016

So this is the decoding thing. How many of these actually exist, even when
taking all the other information into account?

E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
+ height + width, how much of all these bits do you actually still freely
pick?

Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
depth, display-rotated) + whether to use compression, everything else
is fixed given those option, the properties of the chip and the
format.



It might be that all the things you need to know from the memory config
don't encode smaller than the macro/micro/whatever else stuff. But that's
kinda the angle that we looked at this for everyone else.

E.g. for multi-plane stuff, if everyone picks the same config for the
2nd/3rd plane, then you don't actually need to encode that. It just
becomes part of the implied stuff in the modifier.

The problem is some GPUs are compatible for say 8-bpp images, but not
for 32-bpp surfaces. e.g. lets look at the following table showing the
current configuration for all GFX6-GFX8 GPU:

format: (bank width, bank height, macro tile aspect, num banks) for
8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG

verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
ADDR_SURF_P8_32x32_8x16
bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
hawaii: (1, 

Re: two KASANs in TTM logic

2018-09-04 Thread Tom St Denis

Sure:

d2917f399e0b250f47d07da551a335843a24f835 is the first bad commit
commit d2917f399e0b250f47d07da551a335843a24f835
Author: Christian König 
Date:   Thu Aug 30 10:04:53 2018 +0200

drm/amdgpu: fix "use bulk moves for efficient VM LRU handling" v2

First step to fix the LRU corruption, we accidentially tried to 
move things

on the LRU after dropping the lock.

Signed-off-by: Christian König 
Tested-by: Michel Dänzer 

:04 04 ed5be1ad4da129c4154b2b43acf7ef349a470700 
0008c4e2fb56512f41559618dd474c916fc09a37 M  drivers



The commit before that I can run xonotic-glx and piglit on my Carrizo 
without a KASAN.


Tom

On 09/04/2018 10:05 AM, Christian König wrote:

The first one should already be fixed.

Not sure where the second comes from. Can you narrow that down further?

Christian.

Am 04.09.2018 um 15:46 schrieb Tom St Denis:

First is caused by this commit while running a GL heavy application.

d78c1fa0c9f815fe951fd57001acca3d35262a17 is the first bad commit
commit d78c1fa0c9f815fe951fd57001acca3d35262a17
Author: Michel Dänzer 
Date:   Wed Aug 29 11:59:38 2018 +0200

    Revert "drm/amdgpu: move PD/PT bos on LRU again"

    This reverts commit 31625ccae4464b61ec8cdb9740df848bbc857a5b.

    It triggered various badness on my development machine when 
running the

    piglit gpu profile with radeonsi on Bonaire, looks like memory
    corruption due to insufficiently protected list manipulations.

    Signed-off-by: Michel Dänzer 
    Signed-off-by: Alex Deucher 

:04 04 b7169f0cf0c7decec631751a9896a92badb67f9d 
42ea58f43199d26fc0c7ddcc655e6d0964b81817 M  drivers


The second is caused by something between that and the tip of the 
4.19-rc1 amd-staging-drm-next (I haven't pinned it down yet) while 
loading GNOME.


Tom



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Vetter
On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
 wrote:
> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter  wrote:
>>
>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>>  wrote:
>> > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  wrote:
>> >>
>> >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> >> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:
>> >> > >
>> >> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>> >> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>> >> > > > > Hi,
>> >> > > > >
>> >> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter 
>> >> > > > >  wrote:
>> >> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
>> >> > > > > >  wrote:
>> >> > > > > > > +/* The chip this is compatible with.
>> >> > > > > > > + *
>> >> > > > > > > + * If compression is disabled, use
>> >> > > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>> >> > > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>> >> > > > > > > + *
>> >> > > > > > > + * With compression enabled please use the exact chip.
>> >> > > > > > > + *
>> >> > > > > > > + * TODO: Do some generations share DCC format?
>> >> > > > > > > + */
>> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
>> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
>> >> > > > > > Do you really need all the combinations here of DCC + gpu gen + 
>> >> > > > > > tiling
>> >> > > > > > details? When we had the entire discussion with nvidia folks 
>> >> > > > > > they
>> >> > > > > > eventually agreed that they don't need the massive pile with 
>> >> > > > > > every
>> >> > > > > > possible combination. Do you really plan to share all these 
>> >> > > > > > different
>> >> > > > > > things?
>> >> > > > > >
>> >> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon
>> >> > > > > > buffer size and buffer format (because that's how the hw 
>> >> > > > > > works), not
>> >> > > > > > using explicit modifier flags for everything.
>> >> > > > > Right. The conclusion, after people went through and started 
>> >> > > > > sorting
>> >> > > > > out the kinds of formats for which they would _actually_ export 
>> >> > > > > real
>> >> > > > > colour buffers for, that most vendors definitely have fewer than
>> >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> >> > > > > possible formats to represent, very likely fewer than
>> >> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, 
>> >> > > > > probably
>> >> > > > > fewer than 72,057,594,037,927,936 formats, and even still 
>> >> > > > > generally
>> >> > > > > fewer than 281,474,976,710,656 if you want to be generous and 
>> >> > > > > leave 8
>> >> > > > > bits of the 56 available.
>> >> > > >
>> >> > > > The problem here is that at least for some parameters we actually 
>> >> > > > don't know
>> >> > > > which formats are actually used.
>> >> > > >
>> >> > > > The following are not real world examples, but just to explain the 
>> >> > > > general
>> >> > > > problem.
>> >> > > >
>> >> > > > The memory configuration for example can be not ASIC specific, but 
>> >> > > > rather
>> >> > > > determined by whoever took the ASIC and glued it together with VRAM 
>> >> > > > on a
>> >> > > > board. It is not likely that somebody puts all the VRAM chips on one
>> >> > > > channel, but it is still perfectly possible.
>> >> > > >
>> >> > > > Same is true for things like harvesting, e.g. of 16 channels halve 
>> >> > > > of them
>> >> > > > could be bad and we need to know which to actually use.
>> >> > >
>> >> > > For my understanding: This leaks outside the chip when sharing 
>> >> > > buffers?
>> >> > > All the information you only need locally to a given amdgpu instance
>> >> > > don't really need to be encoded in modifiers.
>> >> > >
>> >> > > Pointers to code where this is all decided (kernel and radeonsi would 
>> >> > > be
>> >> > > good starters I guess) would be really good here.
>> >> >
>> >> > I extracted the information on which bits are relevant mostly from the
>> >> > AddrFromCoord functions in addrlib in mesa:
>> >> >
>> >> > for macro-tiles:
>> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>> >> >
>> >> > for micro-tiles (or the micro-tiles in macro-tiles):
>> >> >
>> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>> >>
>> >> So this is the decoding thing. How many of these actually exist, even when
>> >> taking all the other information into account?
>> >>
>> >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>> >> + height + width, how much of all these bits do you actually still freely
>> >> pick?
>> >
>> > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
>> > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
>> > depth, display-rotated) + 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Bas Nieuwenhuizen
On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter  wrote:
>
> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>  wrote:
> > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  wrote:
> >>
> >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> >> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:
> >> > >
> >> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> >> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> >> > > > > Hi,
> >> > > > >
> >> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter 
> >> > > > >  wrote:
> >> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
> >> > > > > >  wrote:
> >> > > > > > > +/* The chip this is compatible with.
> >> > > > > > > + *
> >> > > > > > > + * If compression is disabled, use
> >> > > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> >> > > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> >> > > > > > > + *
> >> > > > > > > + * With compression enabled please use the exact chip.
> >> > > > > > > + *
> >> > > > > > > + * TODO: Do some generations share DCC format?
> >> > > > > > > + */
> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
> >> > > > > > Do you really need all the combinations here of DCC + gpu gen + 
> >> > > > > > tiling
> >> > > > > > details? When we had the entire discussion with nvidia folks they
> >> > > > > > eventually agreed that they don't need the massive pile with 
> >> > > > > > every
> >> > > > > > possible combination. Do you really plan to share all these 
> >> > > > > > different
> >> > > > > > things?
> >> > > > > >
> >> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon
> >> > > > > > buffer size and buffer format (because that's how the hw works), 
> >> > > > > > not
> >> > > > > > using explicit modifier flags for everything.
> >> > > > > Right. The conclusion, after people went through and started 
> >> > > > > sorting
> >> > > > > out the kinds of formats for which they would _actually_ export 
> >> > > > > real
> >> > > > > colour buffers for, that most vendors definitely have fewer than
> >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> >> > > > > possible formats to represent, very likely fewer than
> >> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, 
> >> > > > > probably
> >> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally
> >> > > > > fewer than 281,474,976,710,656 if you want to be generous and 
> >> > > > > leave 8
> >> > > > > bits of the 56 available.
> >> > > >
> >> > > > The problem here is that at least for some parameters we actually 
> >> > > > don't know
> >> > > > which formats are actually used.
> >> > > >
> >> > > > The following are not real world examples, but just to explain the 
> >> > > > general
> >> > > > problem.
> >> > > >
> >> > > > The memory configuration for example can be not ASIC specific, but 
> >> > > > rather
> >> > > > determined by whoever took the ASIC and glued it together with VRAM 
> >> > > > on a
> >> > > > board. It is not likely that somebody puts all the VRAM chips on one
> >> > > > channel, but it is still perfectly possible.
> >> > > >
> >> > > > Same is true for things like harvesting, e.g. of 16 channels halve 
> >> > > > of them
> >> > > > could be bad and we need to know which to actually use.
> >> > >
> >> > > For my understanding: This leaks outside the chip when sharing buffers?
> >> > > All the information you only need locally to a given amdgpu instance
> >> > > don't really need to be encoded in modifiers.
> >> > >
> >> > > Pointers to code where this is all decided (kernel and radeonsi would 
> >> > > be
> >> > > good starters I guess) would be really good here.
> >> >
> >> > I extracted the information on which bits are relevant mostly from the
> >> > AddrFromCoord functions in addrlib in mesa:
> >> >
> >> > for macro-tiles:
> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> >> >
> >> > for micro-tiles (or the micro-tiles in macro-tiles):
> >> >
> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
> >>
> >> So this is the decoding thing. How many of these actually exist, even when
> >> taking all the other information into account?
> >>
> >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> >> + height + width, how much of all these bits do you actually still freely
> >> pick?
> >
> > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
> > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
> > depth, display-rotated) + whether to use compression, everything else
> > is fixed given those option, the properties of the chip and the
> > format.
> >
> >
> >>
> >> It might be that all the things you need to know from the 

Re: [PATCH v3 04/13] fbdev: add remove_conflicting_pci_framebuffers()

2018-09-04 Thread kbuild test robot
Hi Michał,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.19-rc2 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Micha-Miros-aw/remove_conflicting_framebuffers-cleanup/20180903-094322
reproduce: make htmldocs
:: branch date: 5 hours ago
:: commit date: 5 hours ago

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick 
(https://www.imagemagick.org)
   include/linux/srcu.h:175: warning: Function parameter or member 'p' not 
described in 'srcu_dereference_notrace'
   include/linux/srcu.h:175: warning: Function parameter or member 'sp' not 
described in 'srcu_dereference_notrace'
   include/linux/gfp.h:1: warning: no structured comments found
   include/net/cfg80211.h:4381: warning: Function parameter or member 
'wext.ibss' not described in 'wireless_dev'
   include/net/cfg80211.h:4381: warning: Function parameter or member 
'wext.connect' not described in 'wireless_dev'
   include/net/cfg80211.h:4381: warning: Function parameter or member 
'wext.keys' not described in 'wireless_dev'
   include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.ie' 
not described in 'wireless_dev'
   include/net/cfg80211.h:4381: warning: Function parameter or member 
'wext.ie_len' not described in 'wireless_dev'
   include/net/cfg80211.h:4381: warning: Function parameter or member 
'wext.bssid' not described in 'wireless_dev'
   include/net/cfg80211.h:4381: warning: Function parameter or member 
'wext.ssid' not described in 'wireless_dev'
   include/net/cfg80211.h:4381: warning: Function parameter or member 
'wext.default_key' not described in 'wireless_dev'
   include/net/cfg80211.h:4381: warning: Function parameter or member 
'wext.default_mgmt_key' not described in 'wireless_dev'
   include/net/cfg80211.h:4381: warning: Function parameter or member 
'wext.prev_bssid_valid' not described in 'wireless_dev'
   include/net/mac80211.h:2328: warning: Function parameter or member 
'radiotap_timestamp.units_pos' not described in 'ieee80211_hw'
   include/net/mac80211.h:2328: warning: Function parameter or member 
'radiotap_timestamp.accuracy' not described in 'ieee80211_hw'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.rts_cts_rate_idx' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.use_rts' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.use_cts_prot' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.short_preamble' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.skip_table' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.jiffies' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.vif' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.hw_key' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.flags' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'control.enqueue_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 'ack' not 
described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'ack.cookie' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.ampdu_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.antenna' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.tx_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:977: warning: Function parameter or member 
'status.status_driver_data' not described 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Vetter
On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
 wrote:
> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  wrote:
>>
>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:
>> > >
>> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>> > > > > Hi,
>> > > > >
>> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  
>> > > > > wrote:
>> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
>> > > > > >  wrote:
>> > > > > > > +/* The chip this is compatible with.
>> > > > > > > + *
>> > > > > > > + * If compression is disabled, use
>> > > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>> > > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>> > > > > > > + *
>> > > > > > > + * With compression enabled please use the exact chip.
>> > > > > > > + *
>> > > > > > > + * TODO: Do some generations share DCC format?
>> > > > > > > + */
>> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
>> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
>> > > > > > Do you really need all the combinations here of DCC + gpu gen + 
>> > > > > > tiling
>> > > > > > details? When we had the entire discussion with nvidia folks they
>> > > > > > eventually agreed that they don't need the massive pile with every
>> > > > > > possible combination. Do you really plan to share all these 
>> > > > > > different
>> > > > > > things?
>> > > > > >
>> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon
>> > > > > > buffer size and buffer format (because that's how the hw works), 
>> > > > > > not
>> > > > > > using explicit modifier flags for everything.
>> > > > > Right. The conclusion, after people went through and started sorting
>> > > > > out the kinds of formats for which they would _actually_ export real
>> > > > > colour buffers for, that most vendors definitely have fewer than
>> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> > > > > possible formats to represent, very likely fewer than
>> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally
>> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
>> > > > > bits of the 56 available.
>> > > >
>> > > > The problem here is that at least for some parameters we actually 
>> > > > don't know
>> > > > which formats are actually used.
>> > > >
>> > > > The following are not real world examples, but just to explain the 
>> > > > general
>> > > > problem.
>> > > >
>> > > > The memory configuration for example can be not ASIC specific, but 
>> > > > rather
>> > > > determined by whoever took the ASIC and glued it together with VRAM on 
>> > > > a
>> > > > board. It is not likely that somebody puts all the VRAM chips on one
>> > > > channel, but it is still perfectly possible.
>> > > >
>> > > > Same is true for things like harvesting, e.g. of 16 channels halve of 
>> > > > them
>> > > > could be bad and we need to know which to actually use.
>> > >
>> > > For my understanding: This leaks outside the chip when sharing buffers?
>> > > All the information you only need locally to a given amdgpu instance
>> > > don't really need to be encoded in modifiers.
>> > >
>> > > Pointers to code where this is all decided (kernel and radeonsi would be
>> > > good starters I guess) would be really good here.
>> >
>> > I extracted the information on which bits are relevant mostly from the
>> > AddrFromCoord functions in addrlib in mesa:
>> >
>> > for macro-tiles:
>> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>> >
>> > for micro-tiles (or the micro-tiles in macro-tiles):
>> >
>> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>>
>> So this is the decoding thing. How many of these actually exist, even when
>> taking all the other information into account?
>>
>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>> + height + width, how much of all these bits do you actually still freely
>> pick?
>
> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
> depth, display-rotated) + whether to use compression, everything else
> is fixed given those option, the properties of the chip and the
> format.
>
>
>>
>> It might be that all the things you need to know from the memory config
>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>> kinda the angle that we looked at this for everyone else.
>>
>> E.g. for multi-plane stuff, if everyone picks the same config for the
>> 2nd/3rd plane, then you don't actually need to encode that. It just
>> becomes part of the implied stuff in the 

[bug report] drm/radeon: use pcie functions for link width

2018-09-04 Thread Dan Carpenter
Hello Alex Deucher,

The patch 5f152a572c10: "drm/radeon: use pcie functions for link
width" from Jun 25, 2018, leads to the following static checker
warning:

drivers/gpu/drm/radeon/cik.c:9646 cik_pcie_gen3_enable()
warn: dead code because of 'speed_cap == 21' and 'speed_cap != 21'

drivers/gpu/drm/radeon/cik.c
  9499  static void cik_pcie_gen3_enable(struct radeon_device *rdev)
  9500  {
  9501  struct pci_dev *root = rdev->pdev->bus->self;
  9502  enum pci_bus_speed speed_cap;
  9503  int bridge_pos, gpu_pos;
  9504  u32 speed_cntl, current_data_rate;
  9505  int i;
  9506  u16 tmp16;
  9507  
  9508  if (pci_is_root_bus(rdev->pdev->bus))
  9509  return;
  9510  
  9511  if (radeon_pcie_gen2 == 0)
  9512  return;
  9513  
  9514  if (rdev->flags & RADEON_IS_IGP)
  9515  return;
  9516  
  9517  if (!(rdev->flags & RADEON_IS_PCIE))
  9518  return;
  9519  
  9520  speed_cap = pcie_get_speed_cap(root);
  9521  if (speed_cap == PCI_SPEED_UNKNOWN)
  9522  return;
  9523  
  9524  if ((speed_cap != PCIE_SPEED_8_0GT) &&
  9525  (speed_cap != PCIE_SPEED_5_0GT))
 ^
We added this condition

  9526  return;
  9527  
  9528  speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
  9529  current_data_rate = (speed_cntl & LC_CURRENT_DATA_RATE_MASK) >>
  9530  LC_CURRENT_DATA_RATE_SHIFT;
  9531  if (speed_cap == PCIE_SPEED_8_0GT) {
  9532  if (current_data_rate == 2) {
  9533  DRM_INFO("PCIE gen 3 link speeds already 
enabled\n");
  9534  return;
  9535  }
  9536  DRM_INFO("enabling PCIE gen 3 link speeds, disable with 
radeon.pcie_gen2=0\n");
  9537  } else if (speed_cap == PCIE_SPEED_5_0GT) {
  9538  if (current_data_rate == 1) {
  9539  DRM_INFO("PCIE gen 2 link speeds already 
enabled\n");
  9540  return;
  9541  }
  9542  DRM_INFO("enabling PCIE gen 2 link speeds, disable with 
radeon.pcie_gen2=0\n");
  9543  }
  9544  
  9545  bridge_pos = pci_pcie_cap(root);
  9546  if (!bridge_pos)
  9547  return;
  9548  
  9549  gpu_pos = pci_pcie_cap(rdev->pdev);
  9550  if (!gpu_pos)
  9551  return;
  9552  
  9553  if (speed_cap == PCIE_SPEED_8_0GT) {
  9554  /* re-try equalization if gen3 is not already enabled */
  9555  if (current_data_rate != 2) {
  9556  u16 bridge_cfg, gpu_cfg;
  9557  u16 bridge_cfg2, gpu_cfg2;
  9558  u32 max_lw, current_lw, tmp;
  9559  
  9560  pci_read_config_word(root, bridge_pos + 
PCI_EXP_LNKCTL, _cfg);
  9561  pci_read_config_word(rdev->pdev, gpu_pos + 
PCI_EXP_LNKCTL, _cfg);
  9562  
  9563  tmp16 = bridge_cfg | PCI_EXP_LNKCTL_HAWD;
  9564  pci_write_config_word(root, bridge_pos + 
PCI_EXP_LNKCTL, tmp16);
  9565  
  9566  tmp16 = gpu_cfg | PCI_EXP_LNKCTL_HAWD;
  9567  pci_write_config_word(rdev->pdev, gpu_pos + 
PCI_EXP_LNKCTL, tmp16);
  9568  
  9569  tmp = RREG32_PCIE_PORT(PCIE_LC_STATUS1);
  9570  max_lw = (tmp & LC_DETECTED_LINK_WIDTH_MASK) >> 
LC_DETECTED_LINK_WIDTH_SHIFT;
  9571  current_lw = (tmp & 
LC_OPERATING_LINK_WIDTH_MASK) >> LC_OPERATING_LINK_WIDTH_SHIFT;
  9572  
  9573  if (current_lw < max_lw) {
  9574  tmp = 
RREG32_PCIE_PORT(PCIE_LC_LINK_WIDTH_CNTL);
  9575  if (tmp & LC_RENEGOTIATION_SUPPORT) {
  9576  tmp &= ~(LC_LINK_WIDTH_MASK | 
LC_UPCONFIGURE_DIS);
  9577  tmp |= (max_lw << 
LC_LINK_WIDTH_SHIFT);
  9578  tmp |= LC_UPCONFIGURE_SUPPORT | 
LC_RENEGOTIATE_EN | LC_RECONFIG_NOW;
  9579  
WREG32_PCIE_PORT(PCIE_LC_LINK_WIDTH_CNTL, tmp);
  9580  }
  9581  }
  9582  
  9583  for (i = 0; i < 10; i++) {
  9584  /* check status */
  9585  pci_read_config_word(rdev->pdev, 
gpu_pos + PCI_EXP_DEVSTA, );
  9586  if (tmp16 & PCI_EXP_DEVSTA_TRPND)
  9587  break;
  9588  
  9589  

Re: two KASANs in TTM logic

2018-09-04 Thread Christian König

The first one should already be fixed.

Not sure where the second comes from. Can you narrow that down further?

Christian.

Am 04.09.2018 um 15:46 schrieb Tom St Denis:

First is caused by this commit while running a GL heavy application.

d78c1fa0c9f815fe951fd57001acca3d35262a17 is the first bad commit
commit d78c1fa0c9f815fe951fd57001acca3d35262a17
Author: Michel Dänzer 
Date:   Wed Aug 29 11:59:38 2018 +0200

    Revert "drm/amdgpu: move PD/PT bos on LRU again"

    This reverts commit 31625ccae4464b61ec8cdb9740df848bbc857a5b.

    It triggered various badness on my development machine when 
running the

    piglit gpu profile with radeonsi on Bonaire, looks like memory
    corruption due to insufficiently protected list manipulations.

    Signed-off-by: Michel Dänzer 
    Signed-off-by: Alex Deucher 

:04 04 b7169f0cf0c7decec631751a9896a92badb67f9d 
42ea58f43199d26fc0c7ddcc655e6d0964b81817 M  drivers


The second is caused by something between that and the tip of the 
4.19-rc1 amd-staging-drm-next (I haven't pinned it down yet) while 
loading GNOME.


Tom



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Bas Nieuwenhuizen
On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter  wrote:
>
> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:
> > >
> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > > > > Hi,
> > > > >
> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  
> > > > > wrote:
> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
> > > > > >  wrote:
> > > > > > > +/* The chip this is compatible with.
> > > > > > > + *
> > > > > > > + * If compression is disabled, use
> > > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > > > > > > + *
> > > > > > > + * With compression enabled please use the exact chip.
> > > > > > > + *
> > > > > > > + * TODO: Do some generations share DCC format?
> > > > > > > + */
> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
> > > > > > Do you really need all the combinations here of DCC + gpu gen + 
> > > > > > tiling
> > > > > > details? When we had the entire discussion with nvidia folks they
> > > > > > eventually agreed that they don't need the massive pile with every
> > > > > > possible combination. Do you really plan to share all these 
> > > > > > different
> > > > > > things?
> > > > > >
> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon
> > > > > > buffer size and buffer format (because that's how the hw works), not
> > > > > > using explicit modifier flags for everything.
> > > > > Right. The conclusion, after people went through and started sorting
> > > > > out the kinds of formats for which they would _actually_ export real
> > > > > colour buffers for, that most vendors definitely have fewer than
> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > > possible formats to represent, very likely fewer than
> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally
> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > > > > bits of the 56 available.
> > > >
> > > > The problem here is that at least for some parameters we actually don't 
> > > > know
> > > > which formats are actually used.
> > > >
> > > > The following are not real world examples, but just to explain the 
> > > > general
> > > > problem.
> > > >
> > > > The memory configuration for example can be not ASIC specific, but 
> > > > rather
> > > > determined by whoever took the ASIC and glued it together with VRAM on a
> > > > board. It is not likely that somebody puts all the VRAM chips on one
> > > > channel, but it is still perfectly possible.
> > > >
> > > > Same is true for things like harvesting, e.g. of 16 channels halve of 
> > > > them
> > > > could be bad and we need to know which to actually use.
> > >
> > > For my understanding: This leaks outside the chip when sharing buffers?
> > > All the information you only need locally to a given amdgpu instance
> > > don't really need to be encoded in modifiers.
> > >
> > > Pointers to code where this is all decided (kernel and radeonsi would be
> > > good starters I guess) would be really good here.
> >
> > I extracted the information on which bits are relevant mostly from the
> > AddrFromCoord functions in addrlib in mesa:
> >
> > for macro-tiles:
> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> >
> > for micro-tiles (or the micro-tiles in macro-tiles):
> >
> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>
> So this is the decoding thing. How many of these actually exist, even when
> taking all the other information into account?
>
> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> + height + width, how much of all these bits do you actually still freely
> pick?

Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
depth, display-rotated) + whether to use compression, everything else
is fixed given those option, the properties of the chip and the
format.


>
> It might be that all the things you need to know from the memory config
> don't encode smaller than the macro/micro/whatever else stuff. But that's
> kinda the angle that we looked at this for everyone else.
>
> E.g. for multi-plane stuff, if everyone picks the same config for the
> 2nd/3rd plane, then you don't actually need to encode that. It just
> becomes part of the implied stuff in the modifier.

The problem is some GPUs are compatible for say 8-bpp images, but not
for 32-bpp surfaces. e.g. lets look at the following table showing the
current configuration for all GFX6-GFX8 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Christian König

Am 04.09.2018 um 15:17 schrieb Daniel Vetter:

On Tue, Sep 4, 2018 at 3:12 PM, Christian König
 wrote:

Am 04.09.2018 um 15:03 schrieb Daniel Vetter:

On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:

On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:

On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:

Am 04.09.2018 um 12:15 schrieb Daniel Stone:

Hi,

On Tue, 4 Sep 2018 at 11:05, Daniel Vetter 
wrote:

On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen
 wrote:

+/* The chip this is compatible with.
+ *
+ * If compression is disabled, use
+ *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
+ *   - AMDGPU_CHIP_VEGA10 for GFX9+
+ *
+ * With compression enabled please use the exact chip.
+ *
+ * TODO: Do some generations share DCC format?
+ */
+#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
+#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff

Do you really need all the combinations here of DCC + gpu gen +
tiling
details? When we had the entire discussion with nvidia folks they
eventually agreed that they don't need the massive pile with every
possible combination. Do you really plan to share all these different
things?

Note that e.g. on i915 we spec some of the tiling depending upon
buffer size and buffer format (because that's how the hw works), not
using explicit modifier flags for everything.

Right. The conclusion, after people went through and started sorting
out the kinds of formats for which they would _actually_ export real
colour buffers for, that most vendors definitely have fewer than

115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
possible formats to represent, very likely fewer than
340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
fewer than 72,057,594,037,927,936 formats, and even still generally
fewer than 281,474,976,710,656 if you want to be generous and leave 8
bits of the 56 available.

The problem here is that at least for some parameters we actually don't
know
which formats are actually used.

The following are not real world examples, but just to explain the
general
problem.

The memory configuration for example can be not ASIC specific, but
rather
determined by whoever took the ASIC and glued it together with VRAM on
a
board. It is not likely that somebody puts all the VRAM chips on one
channel, but it is still perfectly possible.

Same is true for things like harvesting, e.g. of 16 channels halve of
them
could be bad and we need to know which to actually use.

For my understanding: This leaks outside the chip when sharing buffers?
All the information you only need locally to a given amdgpu instance
don't really need to be encoded in modifiers.

Pointers to code where this is all decided (kernel and radeonsi would be
good starters I guess) would be really good here.

I extracted the information on which bits are relevant mostly from the
AddrFromCoord functions in addrlib in mesa:

for macro-tiles:

https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587

for micro-tiles (or the micro-tiles in macro-tiles):


https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016

So this is the decoding thing. How many of these actually exist, even when
taking all the other information into account?

E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
+ height + width, how much of all these bits do you actually still freely
pick?

It might be that all the things you need to know from the memory config
don't encode smaller than the macro/micro/whatever else stuff. But that's
kinda the angle that we looked at this for everyone else.

E.g. for multi-plane stuff, if everyone picks the same config for the
2nd/3rd plane, then you don't actually need to encode that. It just
becomes part of the implied stuff in the modifier.


The pipe/bank configuration for the 2nd/3rd plane is the same, but the tile
configuration is potentially different.

We can make an educated guess, but I don't think that this is universal
applicable as well.

Why? The source code for amdgpu.ko, mesa, radv, amdvk is all open.
Should be possible to figure out what's actually used (on buffers you
want to share), and then figuring out how to best encode the
information you need to reconstruct all the information. You might
need to occasionally respin that if best practices for how to lay out
buffers changes after you started upstreaming for a given generation.
But ime hw people have pretty strong opinions on how to best tile
things. You even know what's coming down the internal pipeline.

Or do you mean something else with "educated guess"?


I mean something else. It can depend on the use case what a buffer 
object ends up with.


In other words when you want to transcode a video you could use a 
different layout as when you want to display the resulting image.


So essentially we would need to encode the use case and 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Vetter
On Tue, Sep 4, 2018 at 3:12 PM, Christian König
 wrote:
> Am 04.09.2018 um 15:03 schrieb Daniel Vetter:
>>
>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>>>
>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:

 On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>
> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>>
>> Hi,
>>
>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter 
>> wrote:
>>>
>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen
>>>  wrote:

 +/* The chip this is compatible with.
 + *
 + * If compression is disabled, use
 + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
 + *   - AMDGPU_CHIP_VEGA10 for GFX9+
 + *
 + * With compression enabled please use the exact chip.
 + *
 + * TODO: Do some generations share DCC format?
 + */
 +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
 +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
>>>
>>> Do you really need all the combinations here of DCC + gpu gen +
>>> tiling
>>> details? When we had the entire discussion with nvidia folks they
>>> eventually agreed that they don't need the massive pile with every
>>> possible combination. Do you really plan to share all these different
>>> things?
>>>
>>> Note that e.g. on i915 we spec some of the tiling depending upon
>>> buffer size and buffer format (because that's how the hw works), not
>>> using explicit modifier flags for everything.
>>
>> Right. The conclusion, after people went through and started sorting
>> out the kinds of formats for which they would _actually_ export real
>> colour buffers for, that most vendors definitely have fewer than
>>
>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> possible formats to represent, very likely fewer than
>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>> fewer than 72,057,594,037,927,936 formats, and even still generally
>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>> bits of the 56 available.
>
> The problem here is that at least for some parameters we actually don't
> know
> which formats are actually used.
>
> The following are not real world examples, but just to explain the
> general
> problem.
>
> The memory configuration for example can be not ASIC specific, but
> rather
> determined by whoever took the ASIC and glued it together with VRAM on
> a
> board. It is not likely that somebody puts all the VRAM chips on one
> channel, but it is still perfectly possible.
>
> Same is true for things like harvesting, e.g. of 16 channels halve of
> them
> could be bad and we need to know which to actually use.

 For my understanding: This leaks outside the chip when sharing buffers?
 All the information you only need locally to a given amdgpu instance
 don't really need to be encoded in modifiers.

 Pointers to code where this is all decided (kernel and radeonsi would be
 good starters I guess) would be really good here.
>>>
>>> I extracted the information on which bits are relevant mostly from the
>>> AddrFromCoord functions in addrlib in mesa:
>>>
>>> for macro-tiles:
>>>
>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>>>
>>> for micro-tiles (or the micro-tiles in macro-tiles):
>>>
>>>
>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>>
>> So this is the decoding thing. How many of these actually exist, even when
>> taking all the other information into account?
>>
>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>> + height + width, how much of all these bits do you actually still freely
>> pick?
>>
>> It might be that all the things you need to know from the memory config
>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>> kinda the angle that we looked at this for everyone else.
>>
>> E.g. for multi-plane stuff, if everyone picks the same config for the
>> 2nd/3rd plane, then you don't actually need to encode that. It just
>> becomes part of the implied stuff in the modifier.
>
>
> The pipe/bank configuration for the 2nd/3rd plane is the same, but the tile
> configuration is potentially different.
>
> We can make an educated guess, but I don't think that this is universal
> applicable as well.

Why? The source code for amdgpu.ko, mesa, radv, amdvk is all open.
Should be possible to figure out what's actually used (on buffers you
want to share), and then figuring out how to best encode the
information you need to reconstruct all the information. You might
need to occasionally respin that if best practices for 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Christian König

Am 04.09.2018 um 15:03 schrieb Daniel Vetter:

On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:

On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:

On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:

Am 04.09.2018 um 12:15 schrieb Daniel Stone:

Hi,

On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  wrote:

On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen  
wrote:

+/* The chip this is compatible with.
+ *
+ * If compression is disabled, use
+ *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
+ *   - AMDGPU_CHIP_VEGA10 for GFX9+
+ *
+ * With compression enabled please use the exact chip.
+ *
+ * TODO: Do some generations share DCC format?
+ */
+#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
+#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff

Do you really need all the combinations here of DCC + gpu gen + tiling
details? When we had the entire discussion with nvidia folks they
eventually agreed that they don't need the massive pile with every
possible combination. Do you really plan to share all these different
things?

Note that e.g. on i915 we spec some of the tiling depending upon
buffer size and buffer format (because that's how the hw works), not
using explicit modifier flags for everything.

Right. The conclusion, after people went through and started sorting
out the kinds of formats for which they would _actually_ export real
colour buffers for, that most vendors definitely have fewer than
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
possible formats to represent, very likely fewer than
340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
fewer than 72,057,594,037,927,936 formats, and even still generally
fewer than 281,474,976,710,656 if you want to be generous and leave 8
bits of the 56 available.

The problem here is that at least for some parameters we actually don't know
which formats are actually used.

The following are not real world examples, but just to explain the general
problem.

The memory configuration for example can be not ASIC specific, but rather
determined by whoever took the ASIC and glued it together with VRAM on a
board. It is not likely that somebody puts all the VRAM chips on one
channel, but it is still perfectly possible.

Same is true for things like harvesting, e.g. of 16 channels halve of them
could be bad and we need to know which to actually use.

For my understanding: This leaks outside the chip when sharing buffers?
All the information you only need locally to a given amdgpu instance
don't really need to be encoded in modifiers.

Pointers to code where this is all decided (kernel and radeonsi would be
good starters I guess) would be really good here.

I extracted the information on which bits are relevant mostly from the
AddrFromCoord functions in addrlib in mesa:

for macro-tiles:
https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587

for micro-tiles (or the micro-tiles in macro-tiles):

https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016

So this is the decoding thing. How many of these actually exist, even when
taking all the other information into account?

E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
+ height + width, how much of all these bits do you actually still freely
pick?

It might be that all the things you need to know from the memory config
don't encode smaller than the macro/micro/whatever else stuff. But that's
kinda the angle that we looked at this for everyone else.

E.g. for multi-plane stuff, if everyone picks the same config for the
2nd/3rd plane, then you don't actually need to encode that. It just
becomes part of the implied stuff in the modifier.


The pipe/bank configuration for the 2nd/3rd plane is the same, but the 
tile configuration is potentially different.


We can make an educated guess, but I don't think that this is universal 
applicable as well.


Regards,
Christian.


-Daniel


-Daniel


Regards,
Christian.


If you do use 256 bits in order to represent
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
modifiers per format, userspace would start hitting OOM pretty quickly
as it attempted to enumerate and negotiate acceptable modifiers.
Either that or we need to replace the fixed 64-bit modifier tokens
with some kind of eBPF script.

Cheers,
Daniel
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Christian König

Am 04.09.2018 um 14:26 schrieb Daniel Vetter:

On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:

Am 04.09.2018 um 12:15 schrieb Daniel Stone:

Hi,

On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  wrote:

On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen  
wrote:

+/* The chip this is compatible with.
+ *
+ * If compression is disabled, use
+ *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
+ *   - AMDGPU_CHIP_VEGA10 for GFX9+
+ *
+ * With compression enabled please use the exact chip.
+ *
+ * TODO: Do some generations share DCC format?
+ */
+#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
+#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff

Do you really need all the combinations here of DCC + gpu gen + tiling
details? When we had the entire discussion with nvidia folks they
eventually agreed that they don't need the massive pile with every
possible combination. Do you really plan to share all these different
things?

Note that e.g. on i915 we spec some of the tiling depending upon
buffer size and buffer format (because that's how the hw works), not
using explicit modifier flags for everything.

Right. The conclusion, after people went through and started sorting
out the kinds of formats for which they would _actually_ export real
colour buffers for, that most vendors definitely have fewer than
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
possible formats to represent, very likely fewer than
340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
fewer than 72,057,594,037,927,936 formats, and even still generally
fewer than 281,474,976,710,656 if you want to be generous and leave 8
bits of the 56 available.

The problem here is that at least for some parameters we actually don't know
which formats are actually used.

The following are not real world examples, but just to explain the general
problem.

The memory configuration for example can be not ASIC specific, but rather
determined by whoever took the ASIC and glued it together with VRAM on a
board. It is not likely that somebody puts all the VRAM chips on one
channel, but it is still perfectly possible.

Same is true for things like harvesting, e.g. of 16 channels halve of them
could be bad and we need to know which to actually use.

For my understanding: This leaks outside the chip when sharing buffers?
All the information you only need locally to a given amdgpu instance
don't really need to be encoded in modifiers.


Yeah, that is certainly a possibility. The problem is that the 
identifiers are then not universal applicable.


But if we can somehow say this format only applies to buffers from 
devices with hardware config X, then that would certainly simplify 
things a lot.



Pointers to code where this is all decided (kernel and radeonsi would be
good starters I guess) would be really good here.


Well that is the problem, a good part of that stuff is rather deeply 
embedded in the firmware.


The kernel driver just reads the hardware config from some registers and 
hands it of to userspace.


Regards,
Christian.


-Daniel


Regards,
Christian.


If you do use 256 bits in order to represent
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
modifiers per format, userspace would start hitting OOM pretty quickly
as it attempted to enumerate and negotiate acceptable modifiers.
Either that or we need to replace the fixed 64-bit modifier tokens
with some kind of eBPF script.

Cheers,
Daniel
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Vetter
On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:
> >
> > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > > > Hi,
> > > >
> > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  
> > > > wrote:
> > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
> > > > >  wrote:
> > > > > > +/* The chip this is compatible with.
> > > > > > + *
> > > > > > + * If compression is disabled, use
> > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > > > > > + *
> > > > > > + * With compression enabled please use the exact chip.
> > > > > > + *
> > > > > > + * TODO: Do some generations share DCC format?
> > > > > > + */
> > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
> > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
> > > > > Do you really need all the combinations here of DCC + gpu gen + tiling
> > > > > details? When we had the entire discussion with nvidia folks they
> > > > > eventually agreed that they don't need the massive pile with every
> > > > > possible combination. Do you really plan to share all these different
> > > > > things?
> > > > >
> > > > > Note that e.g. on i915 we spec some of the tiling depending upon
> > > > > buffer size and buffer format (because that's how the hw works), not
> > > > > using explicit modifier flags for everything.
> > > > Right. The conclusion, after people went through and started sorting
> > > > out the kinds of formats for which they would _actually_ export real
> > > > colour buffers for, that most vendors definitely have fewer than
> > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > possible formats to represent, very likely fewer than
> > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > > > fewer than 72,057,594,037,927,936 formats, and even still generally
> > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > > > bits of the 56 available.
> > >
> > > The problem here is that at least for some parameters we actually don't 
> > > know
> > > which formats are actually used.
> > >
> > > The following are not real world examples, but just to explain the general
> > > problem.
> > >
> > > The memory configuration for example can be not ASIC specific, but rather
> > > determined by whoever took the ASIC and glued it together with VRAM on a
> > > board. It is not likely that somebody puts all the VRAM chips on one
> > > channel, but it is still perfectly possible.
> > >
> > > Same is true for things like harvesting, e.g. of 16 channels halve of them
> > > could be bad and we need to know which to actually use.
> >
> > For my understanding: This leaks outside the chip when sharing buffers?
> > All the information you only need locally to a given amdgpu instance
> > don't really need to be encoded in modifiers.
> >
> > Pointers to code where this is all decided (kernel and radeonsi would be
> > good starters I guess) would be really good here.
> 
> I extracted the information on which bits are relevant mostly from the
> AddrFromCoord functions in addrlib in mesa:
> 
> for macro-tiles:
> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> 
> for micro-tiles (or the micro-tiles in macro-tiles):
> 
> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016

So this is the decoding thing. How many of these actually exist, even when
taking all the other information into account?

E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
+ height + width, how much of all these bits do you actually still freely
pick?

It might be that all the things you need to know from the memory config
don't encode smaller than the macro/micro/whatever else stuff. But that's
kinda the angle that we looked at this for everyone else.

E.g. for multi-plane stuff, if everyone picks the same config for the
2nd/3rd plane, then you don't actually need to encode that. It just
becomes part of the implied stuff in the modifier.
-Daniel

> 
> > -Daniel
> >
> > >
> > > Regards,
> > > Christian.
> > >
> > > >
> > > > If you do use 256 bits in order to represent
> > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > modifiers per format, userspace would start hitting OOM pretty quickly
> > > > as it attempted to enumerate and negotiate acceptable modifiers.
> > > > Either that or we need to replace the fixed 64-bit modifier tokens
> > > > with some kind of eBPF script.
> > > >
> > > > Cheers,
> > > > Daniel
> > > > ___
> > > > dri-devel mailing list
> > > > dri-de...@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Christian König

Am 04.09.2018 um 14:22 schrieb Daniel Stone:

Hi,

On Tue, 4 Sep 2018 at 11:44, Christian König
 wrote:

Am 04.09.2018 um 12:15 schrieb Daniel Stone:

Right. The conclusion, after people went through and started sorting
out the kinds of formats for which they would _actually_ export real
colour buffers for, that most vendors definitely have fewer than
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
possible formats to represent, very likely fewer than
340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
fewer than 72,057,594,037,927,936 formats, and even still generally
fewer than 281,474,976,710,656 if you want to be generous and leave 8
bits of the 56 available.

The problem here is that at least for some parameters we actually don't
know which formats are actually used.

The following are not real world examples, but just to explain the
general problem.

The memory configuration for example can be not ASIC specific, but
rather determined by whoever took the ASIC and glued it together with
VRAM on a board. It is not likely that somebody puts all the VRAM chips
on one channel, but it is still perfectly possible.

Same is true for things like harvesting, e.g. of 16 channels halve of
them could be bad and we need to know which to actually use.

Sure, that's fine, but I'm not sure why, for instance, 'half of the
memory channels are bad and must be avoided' would be attached to a
format in the same way that macrotile size/depth would be?


Because it results in a different memory layout.


Similarly,
what happens if you encode a channel or lane blacklist mask into a
modifier, and then you import a buffer with that modifier into a
client API on another device. Does that also apply to the other device
or only the source device?


If the other device doesn't have the same hardware config it probably 
can't access that buffer.



How then do you handle the capability query between them: is it
relevant to device B attempting to import and sample from an image
that device A only has a 128-bit memory bus because of SKU
configuration? How does generic userspace look at a token which
encodes all of this information and know that the buffer is
interchangeable between devices?


Well that is exactly the point where my understanding stops. No idea how 
the user space driver is handling this.


Marek came up with the interop interface, he needs to explain the details.

Regards,
Christian.



Cheers,
Daniel


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [PATCH libdrm] Add basic CONTRIBUTING file

2018-09-04 Thread Daniel Vetter
On Tue, Sep 04, 2018 at 10:02:40AM +0100, Eric Engestrom wrote:
> On Tuesday, 2018-09-04 16:24:44 +1000, Dave Airlie wrote:
> > On Mon, 3 Sep 2018 at 18:47, Daniel Vetter  wrote:
> > >
> > > I picked up a bunch of the pieces from wayland's version:
> > >
> > > https://gitlab.freedesktop.org/wayland/wayland/blob/master/CONTRIBUTING.md
> > >
> > > The weston one is fairly similar. Then I rather massively trimmed it
> > > down since in reality libdrm is a bit a dumping ground with very few
> > > real rules. The commit rights and CoC sections I've copied verbatim
> > > from igt respectively drm-misc. Weston/Wayland only differ in their
> > > pick of how many patches you need (10 instead of 5). I think for
> > > libdrm this is supremely relevant, since most everyone will get their
> > > commit rights by contributing already to the kernel or mesa and having
> > > commit rights there already.
> > >
> > > Anyway, I figured this is good to get the rules documented, even if
> > > there's mostly not many rules.
> > >
> > > Note: This references maintainers in a MAINTAINERS file, which needs
> > > to be created first.
> > >
> > > Note: With the gitlab migration the entire commit rights process is
> > > still a bit up in the air. But gitlab commit rights and roles are
> > > hierarchical, so we can do libdrm-only maintainer/commiter roles
> > > ("Owner" and "Developer" in gitlab-speak). This should avoid
> > > conflating libdrm roles with mesa roles, useful for those pushing to
> > > libdrm as primarily kernel contributors.
> > 
> > Fine with me,
> > 
> > Acked-by: Dave Airlie 
> > 
> > Dave.
> 
> I think this has gathered enough acks and rbs, you can just push it now
> and if there's anything that should be adjusted we can do that as
> a follow up :)

And pushed.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Bas Nieuwenhuizen
+everyone again
On Tue, Sep 4, 2018 at 2:39 PM Bas Nieuwenhuizen
 wrote:
>
> On Tue, Sep 4, 2018 at 2:22 PM Daniel Stone  wrote:
> >
> > Hi,
> >
> > On Tue, 4 Sep 2018 at 11:44, Christian König
> >  wrote:
> > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > > > Right. The conclusion, after people went through and started sorting
> > > > out the kinds of formats for which they would _actually_ export real
> > > > colour buffers for, that most vendors definitely have fewer than
> > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > possible formats to represent, very likely fewer than
> > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > > > fewer than 72,057,594,037,927,936 formats, and even still generally
> > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > > > bits of the 56 available.
> > >
> > > The problem here is that at least for some parameters we actually don't
> > > know which formats are actually used.
> > >
> > > The following are not real world examples, but just to explain the
> > > general problem.
> > >
> > > The memory configuration for example can be not ASIC specific, but
> > > rather determined by whoever took the ASIC and glued it together with
> > > VRAM on a board. It is not likely that somebody puts all the VRAM chips
> > > on one channel, but it is still perfectly possible.
> > >
> > > Same is true for things like harvesting, e.g. of 16 channels halve of
> > > them could be bad and we need to know which to actually use.
> >
> > Sure, that's fine, but I'm not sure why, for instance, 'half of the
> > memory channels are bad and must be avoided' would be attached to a
> > format in the same way that macrotile size/depth would be? Similarly,
> > what happens if you encode a channel or lane blacklist mask into a
> > modifier, and then you import a buffer with that modifier into a
> > client API on another device. Does that also apply to the other device
> > or only the source device?
> >
> > How then do you handle the capability query between them: is it
> > relevant to device B attempting to import and sample from an image
> > that device A only has a 128-bit memory bus because of SKU
> > configuration? How does generic userspace look at a token which
> > encodes all of this information and know that the buffer is
> > interchangeable between devices?
>
> So my idea here is to encode all the bits that determine format in a
> way that as much as possible if it
> results in the same layout it should get the same modifier(i.e. remove
> useless info):
>
> https://lists.freedesktop.org/archives/dri-devel/2018-September/188418.html
>
> An example of the per-chip things is the PIPE_CONFIG, which is used
> for address calculation in the macro-tiled mode:
>
> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/siaddrlib.cpp#L666
>
> two chips with different PIPE_CONFIG will have different texture
> layouts, as the address swizzles are different.
>
> So it gets included in the modifier (the PIPE_CONFIG is pretty much
> constant per chip but multiple chips might have the same PIPE_CONFIG).
>
> >
> > Cheers,
> > Daniel
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Bas Nieuwenhuizen
On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter  wrote:
>
> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > > Hi,
> > >
> > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  wrote:
> > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
> > > >  wrote:
> > > > > +/* The chip this is compatible with.
> > > > > + *
> > > > > + * If compression is disabled, use
> > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > > > > + *
> > > > > + * With compression enabled please use the exact chip.
> > > > > + *
> > > > > + * TODO: Do some generations share DCC format?
> > > > > + */
> > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
> > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
> > > > Do you really need all the combinations here of DCC + gpu gen + tiling
> > > > details? When we had the entire discussion with nvidia folks they
> > > > eventually agreed that they don't need the massive pile with every
> > > > possible combination. Do you really plan to share all these different
> > > > things?
> > > >
> > > > Note that e.g. on i915 we spec some of the tiling depending upon
> > > > buffer size and buffer format (because that's how the hw works), not
> > > > using explicit modifier flags for everything.
> > > Right. The conclusion, after people went through and started sorting
> > > out the kinds of formats for which they would _actually_ export real
> > > colour buffers for, that most vendors definitely have fewer than
> > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > possible formats to represent, very likely fewer than
> > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > > fewer than 72,057,594,037,927,936 formats, and even still generally
> > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > > bits of the 56 available.
> >
> > The problem here is that at least for some parameters we actually don't know
> > which formats are actually used.
> >
> > The following are not real world examples, but just to explain the general
> > problem.
> >
> > The memory configuration for example can be not ASIC specific, but rather
> > determined by whoever took the ASIC and glued it together with VRAM on a
> > board. It is not likely that somebody puts all the VRAM chips on one
> > channel, but it is still perfectly possible.
> >
> > Same is true for things like harvesting, e.g. of 16 channels halve of them
> > could be bad and we need to know which to actually use.
>
> For my understanding: This leaks outside the chip when sharing buffers?
> All the information you only need locally to a given amdgpu instance
> don't really need to be encoded in modifiers.
>
> Pointers to code where this is all decided (kernel and radeonsi would be
> good starters I guess) would be really good here.

I extracted the information on which bits are relevant mostly from the
AddrFromCoord functions in addrlib in mesa:

for macro-tiles:
https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587

for micro-tiles (or the micro-tiles in macro-tiles):

https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016

> -Daniel
>
> >
> > Regards,
> > Christian.
> >
> > >
> > > If you do use 256 bits in order to represent
> > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > modifiers per format, userspace would start hitting OOM pretty quickly
> > > as it attempted to enumerate and negotiate acceptable modifiers.
> > > Either that or we need to replace the fixed 64-bit modifier tokens
> > > with some kind of eBPF script.
> > >
> > > Cheers,
> > > Daniel
> > > ___
> > > dri-devel mailing list
> > > dri-de...@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Vetter
On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > Hi,
> > 
> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  wrote:
> > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen 
> > >  wrote:
> > > > +/* The chip this is compatible with.
> > > > + *
> > > > + * If compression is disabled, use
> > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > > > + *
> > > > + * With compression enabled please use the exact chip.
> > > > + *
> > > > + * TODO: Do some generations share DCC format?
> > > > + */
> > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
> > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
> > > Do you really need all the combinations here of DCC + gpu gen + tiling
> > > details? When we had the entire discussion with nvidia folks they
> > > eventually agreed that they don't need the massive pile with every
> > > possible combination. Do you really plan to share all these different
> > > things?
> > > 
> > > Note that e.g. on i915 we spec some of the tiling depending upon
> > > buffer size and buffer format (because that's how the hw works), not
> > > using explicit modifier flags for everything.
> > Right. The conclusion, after people went through and started sorting
> > out the kinds of formats for which they would _actually_ export real
> > colour buffers for, that most vendors definitely have fewer than
> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > possible formats to represent, very likely fewer than
> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > fewer than 72,057,594,037,927,936 formats, and even still generally
> > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > bits of the 56 available.
> 
> The problem here is that at least for some parameters we actually don't know
> which formats are actually used.
> 
> The following are not real world examples, but just to explain the general
> problem.
> 
> The memory configuration for example can be not ASIC specific, but rather
> determined by whoever took the ASIC and glued it together with VRAM on a
> board. It is not likely that somebody puts all the VRAM chips on one
> channel, but it is still perfectly possible.
> 
> Same is true for things like harvesting, e.g. of 16 channels halve of them
> could be bad and we need to know which to actually use.

For my understanding: This leaks outside the chip when sharing buffers?
All the information you only need locally to a given amdgpu instance
don't really need to be encoded in modifiers.

Pointers to code where this is all decided (kernel and radeonsi would be
good starters I guess) would be really good here.
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > If you do use 256 bits in order to represent
> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > modifiers per format, userspace would start hitting OOM pretty quickly
> > as it attempted to enumerate and negotiate acceptable modifiers.
> > Either that or we need to replace the fixed 64-bit modifier tokens
> > with some kind of eBPF script.
> > 
> > Cheers,
> > Daniel
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Stone
Hi,

On Tue, 4 Sep 2018 at 11:44, Christian König
 wrote:
> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > Right. The conclusion, after people went through and started sorting
> > out the kinds of formats for which they would _actually_ export real
> > colour buffers for, that most vendors definitely have fewer than
> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > possible formats to represent, very likely fewer than
> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > fewer than 72,057,594,037,927,936 formats, and even still generally
> > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > bits of the 56 available.
>
> The problem here is that at least for some parameters we actually don't
> know which formats are actually used.
>
> The following are not real world examples, but just to explain the
> general problem.
>
> The memory configuration for example can be not ASIC specific, but
> rather determined by whoever took the ASIC and glued it together with
> VRAM on a board. It is not likely that somebody puts all the VRAM chips
> on one channel, but it is still perfectly possible.
>
> Same is true for things like harvesting, e.g. of 16 channels halve of
> them could be bad and we need to know which to actually use.

Sure, that's fine, but I'm not sure why, for instance, 'half of the
memory channels are bad and must be avoided' would be attached to a
format in the same way that macrotile size/depth would be? Similarly,
what happens if you encode a channel or lane blacklist mask into a
modifier, and then you import a buffer with that modifier into a
client API on another device. Does that also apply to the other device
or only the source device?

How then do you handle the capability query between them: is it
relevant to device B attempting to import and sample from an image
that device A only has a 128-bit memory bus because of SKU
configuration? How does generic userspace look at a token which
encodes all of this information and know that the buffer is
interchangeable between devices?

Cheers,
Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Bas Nieuwenhuizen
On Tue, Sep 4, 2018 at 12:04 PM Daniel Vetter  wrote:
>
> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen
>  wrote:
> > This is an initial proposal for format modifiers for AMD hardware.
> >
> > It uses 48 bits including a chip generation, leaving 8 bits for
> > a format version number.
> >
> > On gfx6-gfx8 we have all the fields influencing sample locations
> > in memory.
> >
> > Tile split bytes are optional for single sample buffers as no
> > hardware reaches the split size with 1 sample and hence the actual
> > size does not matter.
> >
> > The macrotile fields are duplicated for images with multiple planes.
> > If the planes have different bitdepth they need different macro
> > tile fields and different tile split bytes if multisample.
> >
> > I could not fit multiple copies in for tile split bytes, but
> > multisample & multiplane images are very rare. Overall, I think
> > we should punt on multisample for a later format version since
> > they are generally not shared on any modifier aware windowing
> > system, and we have more issues like fmask & cmask.
> >
> > We need these copies because the drm modifier of all planes in an
> > image needs to be equal, so we need to fit these together.
> >
> > This adds fields for compression support, using metadata that is
> > compatible with AMDVLK and for which radv and radeonsi can
> > reasonably be extended.
> >
> > The big open question for compression is between which generations
> > the format changed to see if we can share more.
> >
> > This explicitly does not try to solve the linear stride alignment
> > issue, thoguh we could internally just use the tiling modes for
> > the linear modes to indicate linear images with the stride for the
> > given chip.
> >
> > Signed-off-by: Bas Nieuwenhuizen 
> > CC: Chad Versace 
> > CC: Dave Airlie 
> > CC: Marek Olšák 
> > CC: Nicolai Hähnle 
> > CC: Alex Deucher 
> > CC: Daniel Vetter 
> > ---
> >  include/uapi/drm/amdgpu_drm.h | 130 ++
> >  1 file changed, 130 insertions(+)
> >
> > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> > index 9eeba55b..4e1452161dbf 100644
> > --- a/include/uapi/drm/amdgpu_drm.h
> > +++ b/include/uapi/drm/amdgpu_drm.h
> > @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table {
> >  #define AMDGPU_FAMILY_AI   141 /* Vega10 */
> >  #define AMDGPU_FAMILY_RV   142 /* Raven */
> >
> > +#define AMDGPU_CHIP_TAHITI 0
> > +#define AMDGPU_CHIP_PITCAIRN   1
> > +#define AMDGPU_CHIP_VERDE  2
> > +#define AMDGPU_CHIP_OLAND  3
> > +#define AMDGPU_CHIP_HAINAN 4
> > +#define AMDGPU_CHIP_BONAIRE5
> > +#define AMDGPU_CHIP_KAVERI 6
> > +#define AMDGPU_CHIP_KABINI 7
> > +#define AMDGPU_CHIP_HAWAII 8
> > +#define AMDGPU_CHIP_MULLINS9
> > +#define AMDGPU_CHIP_TOPAZ  10
> > +#define AMDGPU_CHIP_TONGA  11
> > +#define AMDGPU_CHIP_FIJI   12
> > +#define AMDGPU_CHIP_CARRIZO13
> > +#define AMDGPU_CHIP_STONEY 14
> > +#define AMDGPU_CHIP_POLARIS10  15
> > +#define AMDGPU_CHIP_POLARIS11  16
> > +#define AMDGPU_CHIP_POLARIS12  17
> > +#define AMDGPU_CHIP_VEGAM  18
> > +#define AMDGPU_CHIP_VEGA10 19
> > +#define AMDGPU_CHIP_VEGA12 20
> > +#define AMDGPU_CHIP_VEGA20 21
> > +#define AMDGPU_CHIP_RAVEN  22
> > +
> > +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the 
> > same
> > + * as AMDGPU_TILING_*. However, the the rules as to when to set them are
> > + * different.
> > + *
> > + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use 
> > DRM_FORMAT_MOD_LINEAR
> > + * instead.
> > + *
> > + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config 
> > should be
> > + * set.
> > + *
> > + * For other ARRAY_MODEs:
> > + *  - Only set TILE_SPLIT if the image is multisample.
> > + *
> > + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1
> > + * different value there. The values are
> > + *   - depth   : 0
> > + *   - displayable : 1
> > + *   - thin: 2
> > + *   - thick (GFX6): 3
> > + *   - rotated (GFX7+) : 4
> > + *
> > + * TODO: What to do with multisample multi plane images? More tile split
> > + * fields don't fit if we want to keep a few bits for a format version.
> > + */
> > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT  0
> > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK   0xf
> > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4
> > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK  0x1f
> > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT  9
> > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK   0x7
> > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12
> > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK  0x7
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT  15
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK   0x3
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT  

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Christian König

Am 04.09.2018 um 12:15 schrieb Daniel Stone:

Hi,

On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  wrote:

On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen  
wrote:

+/* The chip this is compatible with.
+ *
+ * If compression is disabled, use
+ *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
+ *   - AMDGPU_CHIP_VEGA10 for GFX9+
+ *
+ * With compression enabled please use the exact chip.
+ *
+ * TODO: Do some generations share DCC format?
+ */
+#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
+#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff

Do you really need all the combinations here of DCC + gpu gen + tiling
details? When we had the entire discussion with nvidia folks they
eventually agreed that they don't need the massive pile with every
possible combination. Do you really plan to share all these different
things?

Note that e.g. on i915 we spec some of the tiling depending upon
buffer size and buffer format (because that's how the hw works), not
using explicit modifier flags for everything.

Right. The conclusion, after people went through and started sorting
out the kinds of formats for which they would _actually_ export real
colour buffers for, that most vendors definitely have fewer than
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
possible formats to represent, very likely fewer than
340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
fewer than 72,057,594,037,927,936 formats, and even still generally
fewer than 281,474,976,710,656 if you want to be generous and leave 8
bits of the 56 available.


The problem here is that at least for some parameters we actually don't 
know which formats are actually used.


The following are not real world examples, but just to explain the 
general problem.


The memory configuration for example can be not ASIC specific, but 
rather determined by whoever took the ASIC and glued it together with 
VRAM on a board. It is not likely that somebody puts all the VRAM chips 
on one channel, but it is still perfectly possible.


Same is true for things like harvesting, e.g. of 16 channels halve of 
them could be bad and we need to know which to actually use.


Regards,
Christian.



If you do use 256 bits in order to represent
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
modifiers per format, userspace would start hitting OOM pretty quickly
as it attempted to enumerate and negotiate acceptable modifiers.
Either that or we need to replace the fixed 64-bit modifier tokens
with some kind of eBPF script.

Cheers,
Daniel
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Stone
Hi,

On Tue, 4 Sep 2018 at 11:05, Daniel Vetter  wrote:
> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen  
> wrote:
> > +/* The chip this is compatible with.
> > + *
> > + * If compression is disabled, use
> > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > + *
> > + * With compression enabled please use the exact chip.
> > + *
> > + * TODO: Do some generations share DCC format?
> > + */
> > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40
> > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK  0xff
>
> Do you really need all the combinations here of DCC + gpu gen + tiling
> details? When we had the entire discussion with nvidia folks they
> eventually agreed that they don't need the massive pile with every
> possible combination. Do you really plan to share all these different
> things?
>
> Note that e.g. on i915 we spec some of the tiling depending upon
> buffer size and buffer format (because that's how the hw works), not
> using explicit modifier flags for everything.

Right. The conclusion, after people went through and started sorting
out the kinds of formats for which they would _actually_ export real
colour buffers for, that most vendors definitely have fewer than
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
possible formats to represent, very likely fewer than
340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
fewer than 72,057,594,037,927,936 formats, and even still generally
fewer than 281,474,976,710,656 if you want to be generous and leave 8
bits of the 56 available.

If you do use 256 bits in order to represent
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
modifiers per format, userspace would start hitting OOM pretty quickly
as it attempted to enumerate and negotiate acceptable modifiers.
Either that or we need to replace the fixed 64-bit modifier tokens
with some kind of eBPF script.

Cheers,
Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Daniel Vetter
On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen
 wrote:
> This is an initial proposal for format modifiers for AMD hardware.
>
> It uses 48 bits including a chip generation, leaving 8 bits for
> a format version number.
>
> On gfx6-gfx8 we have all the fields influencing sample locations
> in memory.
>
> Tile split bytes are optional for single sample buffers as no
> hardware reaches the split size with 1 sample and hence the actual
> size does not matter.
>
> The macrotile fields are duplicated for images with multiple planes.
> If the planes have different bitdepth they need different macro
> tile fields and different tile split bytes if multisample.
>
> I could not fit multiple copies in for tile split bytes, but
> multisample & multiplane images are very rare. Overall, I think
> we should punt on multisample for a later format version since
> they are generally not shared on any modifier aware windowing
> system, and we have more issues like fmask & cmask.
>
> We need these copies because the drm modifier of all planes in an
> image needs to be equal, so we need to fit these together.
>
> This adds fields for compression support, using metadata that is
> compatible with AMDVLK and for which radv and radeonsi can
> reasonably be extended.
>
> The big open question for compression is between which generations
> the format changed to see if we can share more.
>
> This explicitly does not try to solve the linear stride alignment
> issue, thoguh we could internally just use the tiling modes for
> the linear modes to indicate linear images with the stride for the
> given chip.
>
> Signed-off-by: Bas Nieuwenhuizen 
> CC: Chad Versace 
> CC: Dave Airlie 
> CC: Marek Olšák 
> CC: Nicolai Hähnle 
> CC: Alex Deucher 
> CC: Daniel Vetter 
> ---
>  include/uapi/drm/amdgpu_drm.h | 130 ++
>  1 file changed, 130 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 9eeba55b..4e1452161dbf 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table {
>  #define AMDGPU_FAMILY_AI   141 /* Vega10 */
>  #define AMDGPU_FAMILY_RV   142 /* Raven */
>
> +#define AMDGPU_CHIP_TAHITI 0
> +#define AMDGPU_CHIP_PITCAIRN   1
> +#define AMDGPU_CHIP_VERDE  2
> +#define AMDGPU_CHIP_OLAND  3
> +#define AMDGPU_CHIP_HAINAN 4
> +#define AMDGPU_CHIP_BONAIRE5
> +#define AMDGPU_CHIP_KAVERI 6
> +#define AMDGPU_CHIP_KABINI 7
> +#define AMDGPU_CHIP_HAWAII 8
> +#define AMDGPU_CHIP_MULLINS9
> +#define AMDGPU_CHIP_TOPAZ  10
> +#define AMDGPU_CHIP_TONGA  11
> +#define AMDGPU_CHIP_FIJI   12
> +#define AMDGPU_CHIP_CARRIZO13
> +#define AMDGPU_CHIP_STONEY 14
> +#define AMDGPU_CHIP_POLARIS10  15
> +#define AMDGPU_CHIP_POLARIS11  16
> +#define AMDGPU_CHIP_POLARIS12  17
> +#define AMDGPU_CHIP_VEGAM  18
> +#define AMDGPU_CHIP_VEGA10 19
> +#define AMDGPU_CHIP_VEGA12 20
> +#define AMDGPU_CHIP_VEGA20 21
> +#define AMDGPU_CHIP_RAVEN  22
> +
> +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the 
> same
> + * as AMDGPU_TILING_*. However, the the rules as to when to set them are
> + * different.
> + *
> + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR
> + * instead.
> + *
> + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should 
> be
> + * set.
> + *
> + * For other ARRAY_MODEs:
> + *  - Only set TILE_SPLIT if the image is multisample.
> + *
> + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1
> + * different value there. The values are
> + *   - depth   : 0
> + *   - displayable : 1
> + *   - thin: 2
> + *   - thick (GFX6): 3
> + *   - rotated (GFX7+) : 4
> + *
> + * TODO: What to do with multisample multi plane images? More tile split
> + * fields don't fit if we want to keep a few bits for a format version.
> + */
> +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT  0
> +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK   0xf
> +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4
> +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK  0x1f
> +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT  9
> +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK   0x7
> +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12
> +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK  0x7
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT  15
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK   0x3
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT 17
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK  0x3
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT   19
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK0x3
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT   21
> +#define 

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread Christian König

Am 04.09.2018 um 11:00 schrieb zhoucm1:



On 2018年09月04日 16:42, Christian König wrote:

Am 04.09.2018 um 10:27 schrieb zhoucm1:



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted 
node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade 
that to the left using rb_prev.
    This ensures that signaling is enabled for the current 
fence as well as all previous fences.


6. A wait just looks into the tree for the signal point lower 
or equal of the requested sequence number.
After re-thought your idea, I think it doesn't work since there is 
no timeline value as a line:
signal pt value doesn't must be continues, which can be jump by 
signal operation, like 1, 4, 8, 15, 19, e.g. there are five 
singal_pt, 
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a 
wait pt is 7, do you mean this wait only needs signal_pt1 and 
signal_pt4???  That's certainly not right, we need to make sure 
the timeline value is bigger than wait pt value, that means 
signal_pt8 is need for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
If defined this, then problem is coming again, if 8 is removed when 
garbage collection, you will return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need 
to return anything.
8 is a signaled node, waitA/signal operation do garbage collection, 
how waitB(7) know the garbage history?


Well we of course keep what the last garbage collected number is, don't we?

Since there is no timeline as a line, I think this is not right 
direction.


That is actually intended. There is no infinite timeline here, just a 
windows of the last not yet signaled fences.
No one said the it's a infinite timeline, timeline will stop 
increasing when syncobj is released.


Yeah, but the syncobj can live for a very very long time. Not having 
some form of shrinking it when fences are signaled is certainly not 
going to fly very far.


Regards,
Christian.



Anyway kref is a good way to solve the 'free' problem, I will try to 
use it improve my patch, of course, will refer your idea.:)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree since 
you always need to keep them around just in case somebody asks for a 
lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a 
previous point is no longer allowed.

That's intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the 
looked up references to the nodes are dropped.
And here, who will destroy rb node since no one do 
enable_signaling, and there is no callback to free themselves.


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree 
object to provide the wait operation, as soon as the sync_obj is 
destroyed we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is 
destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work, simple timeline reference also can solve 
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can 
use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx








___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [PATCH libdrm] Add basic CONTRIBUTING file

2018-09-04 Thread Eric Engestrom
On Tuesday, 2018-09-04 16:24:44 +1000, Dave Airlie wrote:
> On Mon, 3 Sep 2018 at 18:47, Daniel Vetter  wrote:
> >
> > I picked up a bunch of the pieces from wayland's version:
> >
> > https://gitlab.freedesktop.org/wayland/wayland/blob/master/CONTRIBUTING.md
> >
> > The weston one is fairly similar. Then I rather massively trimmed it
> > down since in reality libdrm is a bit a dumping ground with very few
> > real rules. The commit rights and CoC sections I've copied verbatim
> > from igt respectively drm-misc. Weston/Wayland only differ in their
> > pick of how many patches you need (10 instead of 5). I think for
> > libdrm this is supremely relevant, since most everyone will get their
> > commit rights by contributing already to the kernel or mesa and having
> > commit rights there already.
> >
> > Anyway, I figured this is good to get the rules documented, even if
> > there's mostly not many rules.
> >
> > Note: This references maintainers in a MAINTAINERS file, which needs
> > to be created first.
> >
> > Note: With the gitlab migration the entire commit rights process is
> > still a bit up in the air. But gitlab commit rights and roles are
> > hierarchical, so we can do libdrm-only maintainer/commiter roles
> > ("Owner" and "Developer" in gitlab-speak). This should avoid
> > conflating libdrm roles with mesa roles, useful for those pushing to
> > libdrm as primarily kernel contributors.
> 
> Fine with me,
> 
> Acked-by: Dave Airlie 
> 
> Dave.

I think this has gathered enough acks and rbs, you can just push it now
and if there's anything that should be adjusted we can do that as
a follow up :)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 16:42, Christian König wrote:

Am 04.09.2018 um 10:27 schrieb zhoucm1:



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted 
node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that 
to the left using rb_prev.
    This ensures that signaling is enabled for the current fence 
as well as all previous fences.


6. A wait just looks into the tree for the signal point lower or 
equal of the requested sequence number.
After re-thought your idea, I think it doesn't work since there is 
no timeline value as a line:
signal pt value doesn't must be continues, which can be jump by 
signal operation, like 1, 4, 8, 15, 19, e.g. there are five 
singal_pt, 
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a 
wait pt is 7, do you mean this wait only needs signal_pt1 and 
signal_pt4???  That's certainly not right, we need to make sure the 
timeline value is bigger than wait pt value, that means signal_pt8 
is need for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
If defined this, then problem is coming again, if 8 is removed when 
garbage collection, you will return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need to 
return anything.
8 is a signaled node, waitA/signal operation do garbage collection, how 
waitB(7) know the garbage history?




Since there is no timeline as a line, I think this is not right 
direction.


That is actually intended. There is no infinite timeline here, just a 
windows of the last not yet signaled fences.
No one said the it's a infinite timeline, timeline will stop increasing 
when syncobj is released.


Anyway kref is a good way to solve the 'free' problem, I will try to use 
it improve my patch, of course, will refer your idea.:)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree since 
you always need to keep them around just in case somebody asks for a 
lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a previous 
point is no longer allowed.

That's intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the looked 
up references to the nodes are dropped.
And here, who will destroy rb node since no one do 
enable_signaling, and there is no callback to free themselves.


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object 
to provide the wait operation, as soon as the sync_obj is destroyed 
we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work, simple timeline reference also can solve 
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can 
use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread Christian König

Am 04.09.2018 um 10:27 schrieb zhoucm1:



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that 
to the left using rb_prev.
    This ensures that signaling is enabled for the current fence 
as well as all previous fences.


6. A wait just looks into the tree for the signal point lower or 
equal of the requested sequence number.
After re-thought your idea, I think it doesn't work since there is 
no timeline value as a line:
signal pt value doesn't must be continues, which can be jump by 
signal operation, like 1, 4, 8, 15, 19, e.g. there are five 
singal_pt, 
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a 
wait pt is 7, do you mean this wait only needs signal_pt1 and 
signal_pt4???  That's certainly not right, we need to make sure the 
timeline value is bigger than wait pt value, that means signal_pt8 
is need for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
If defined this, then problem is coming again, if 8 is removed when 
garbage collection, you will return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need to 
return anything.


Since there is no timeline as a line, I think this is not right 
direction.


That is actually intended. There is no infinite timeline here, just a 
windows of the last not yet signaled fences.


Otherwise you will never be able to release nodes from the tree since 
you always need to keep them around just in case somebody asks for a 
lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a previous 
point is no longer allowed.

That's intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the looked 
up references to the nodes are dropped.
And here, who will destroy rb node since no one do enable_signaling, 
and there is no callback to free themselves.


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object 
to provide the wait operation, as soon as the sync_obj is destroyed 
we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should work 
fine.
Yeah, it could work, simple timeline reference also can solve 
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can 
use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 16:05, Christian König wrote:

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that 
to the left using rb_prev.
    This ensures that signaling is enabled for the current fence 
as well as all previous fences.


6. A wait just looks into the tree for the signal point lower or 
equal of the requested sequence number.
After re-thought your idea, I think it doesn't work since there is no 
timeline value as a line:
signal pt value doesn't must be continues, which can be jump by 
signal operation, like 1, 4, 8, 15, 19, e.g. there are five 
singal_pt, 
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a 
wait pt is 7, do you mean this wait only needs signal_pt1 and 
signal_pt4???  That's certainly not right, we need to make sure the 
timeline value is bigger than wait pt value, that means signal_pt8 is 
need for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks for 
7 we can return 8 as well.
If defined this, then problem is coming again, if 8 is removed when 
garbage collection, you will return 15? Since there is no timeline as a 
line, I think this is not right direction.




The key is that as soon as a signal point is added adding a previous 
point is no longer allowed.

That's intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root (e.g. 
the sync object). It only destructs itself when the looked up 
references to the nodes are dropped.
And here, who will destroy rb node since no one do enable_signaling, 
and there is no callback to free themselves.


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object 
to provide the wait operation, as soon as the sync_obj is destroyed we 
don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should work 
fine.
Yeah, it could work, simple timeline reference also can solve 
'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can use 
a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread Christian König

Am 04.09.2018 um 09:53 schrieb zhoucm1:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection and 
remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that to 
the left using rb_prev.
    This ensures that signaling is enabled for the current fence as 
well as all previous fences.


6. A wait just looks into the tree for the signal point lower or 
equal of the requested sequence number.
After re-thought your idea, I think it doesn't work since there is no 
timeline value as a line:
signal pt value doesn't must be continues, which can be jump by signal 
operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, 
signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a 
wait pt is 7, do you mean this wait only needs signal_pt1 and 
signal_pt4???  That's certainly not right, we need to make sure the 
timeline value is bigger than wait pt value, that means signal_pt8 is 
need for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks for 7 
we can return 8 as well.


The key is that as soon as a signal point is added adding a previous 
point is no longer allowed.




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra reference 
to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root (e.g. 
the sync object). It only destructs itself when the looked up 
references to the nodes are dropped.
And here, who will destroy rb node since no one do enable_signaling, 
and there is no callback to free themselves.


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object to 
provide the wait operation, as soon as the sync_obj is destroyed we 
don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we can 
drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should work 
fine.
Yeah, it could work, simple timeline reference also can solve 'free' 
problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can use 
a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou






___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018年09月04日 15:00, Christian König wrote:

Am 04.09.2018 um 06:04 schrieb zhoucm1:



On 2018年09月03日 19:19, Christian König wrote:

Am 03.09.2018 um 12:07 schrieb Chunming Zhou:



在 2018/9/3 16:50, Christian König 写道:

Am 03.09.2018 um 06:13 schrieb Chunming Zhou:



在 2018/8/30 19:32, Christian König 写道:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait ops points and signal ops points.
For timeline, they are connected by timeline value, works like:
    a. signal pt increase timeline value to signal_pt value 
when signal_pt is signaled. signal_pt is depending on 
previous pt fence and itself signal fence from signal ops.

    b. wait pt tree checks if timeline value over itself value.

For normal, works like:
    a. signal pt increase 1 for 
syncobj_timeline->signal_point every time when signal ops is 
performed.
    b. when wait ops is comming, wait pt will fetch above 
last signal pt value as its wait point. wait pt will be only 
signaled by equal point value signal_pt.




and why is the stub fence their base?
Good question, I tried to kzalloc for them as well when I 
debug them, I encountered a problem:
I lookup/find wait_pt or signal_pt successfully, but when I 
tried to use them, they are freed sometimes, and results in 
NULL point.
and generally, when lookup them, we often need their stub 
fence as well, and in the meantime, their lifecycle are same.

Above reason, I make stub fence as their base.


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. 
blocking in drm_syncobj_timeline_fini() until all waits are 
done is not a good idea.


What you should do instead is to create a fence_array object 
with all the fence we need to wait for when a wait point is 
requested.
Yeah, this is our initial discussion result, but when I tried 
to do that, I found that cannot meet the advance signal 
requirement:
    a. We need to consider the wait and signal pt value are not 
one-to-one match, it's difficult to find dependent point, at 
least, there is some overhead.


As far as I can see that is independent of using a fence array 
here. See you can either use a ring buffer or an rb-tree, but 
when you want to wait for a specific point we need to condense 
the not yet signaled fences into an array.
again, need to find the range of where the specific point in, we 
should close to timeline semantics, I also refered the sw_sync.c 
timeline, usally wait_pt is signalled by timeline point. And I 
agree we can implement it with several methods, but I don't think 
there are basical differences.


The problem is that with your current approach you need the 
sync_obj alive for the synchronization to work. That is most 
likely not a good idea.
Indeed, I will fix that. How abount only creating fence array for 
every wait pt when syncobj release? when syncobj release, wait pt 
must have waited the signal opertation, then we can easily condense 
fences for every wait pt. And meantime, we can take timeline based 
wait pt advantage.


That could work, but I don't see how you want to replace the already 
issued fence with a fence_array when the sync object is destroyed.


Additional to that I would rather prefer a consistent handling, e.g. 
without extra rarely used code paths.
Ah, I find a easy way, we just need to make syncobj_timeline 
structure as a reference. This way syncobj itself can be released 
first, wait_pt/signal_pt don't need syncobj at all.

every wait_pt/signal_pt keep a reference of syncobj_timeline.


I've thought about that as well, but came to the conclusion that you 
run into problems because of circle dependencies.


E.g. sync_obj references sync_point and sync_point references sync_obj.
sync_obj can be freed first, only sync point references syncobj_timeline 
structure, syncobj_timeline doesn't reference sync_pt, no circle dep.




Additional to that it is quite a bit larger memory footprint because 
you need to keep the sync_obj around as well.
all signaled sync_pt are freed immediately except syncobj_timeline 
sturcture, where does extra memory foootprint take?












Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need 
that optimization.
Do you mean increasing timeline based on signal fence is not 
better? only update timeline value when requested by a wait pt?


Yes, exactly.

This way, we will not update timeline value 

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-04 Thread Christian König

Am 04.09.2018 um 03:00 schrieb Bas Nieuwenhuizen:

This is an initial proposal for format modifiers for AMD hardware.

It uses 48 bits including a chip generation, leaving 8 bits for
a format version number.


I'm absolutely not an expert on this, but as far as I know the major 
problem with this approach was that we couldn't fit all necessary 
parameter into a 64bit identifier.


The last thing I've heard was something like we need at least 128bit, 
better 256. But Marek and Alex certainly know more about that stuff.


Christian.



On gfx6-gfx8 we have all the fields influencing sample locations
in memory.

Tile split bytes are optional for single sample buffers as no
hardware reaches the split size with 1 sample and hence the actual
size does not matter.

The macrotile fields are duplicated for images with multiple planes.
If the planes have different bitdepth they need different macro
tile fields and different tile split bytes if multisample.

I could not fit multiple copies in for tile split bytes, but
multisample & multiplane images are very rare. Overall, I think
we should punt on multisample for a later format version since
they are generally not shared on any modifier aware windowing
system, and we have more issues like fmask & cmask.

We need these copies because the drm modifier of all planes in an
image needs to be equal, so we need to fit these together.

This adds fields for compression support, using metadata that is
compatible with AMDVLK and for which radv and radeonsi can
reasonably be extended.

The big open question for compression is between which generations
the format changed to see if we can share more.

This explicitly does not try to solve the linear stride alignment
issue, thoguh we could internally just use the tiling modes for
the linear modes to indicate linear images with the stride for the
given chip.

Signed-off-by: Bas Nieuwenhuizen 
CC: Chad Versace 
CC: Dave Airlie 
CC: Marek Olšák 
CC: Nicolai Hähnle 
CC: Alex Deucher 
CC: Daniel Vetter 
---
  include/uapi/drm/amdgpu_drm.h | 130 ++
  1 file changed, 130 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 9eeba55b..4e1452161dbf 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table {
  #define AMDGPU_FAMILY_AI  141 /* Vega10 */
  #define AMDGPU_FAMILY_RV  142 /* Raven */
  
+#define AMDGPU_CHIP_TAHITI	0

+#define AMDGPU_CHIP_PITCAIRN   1
+#define AMDGPU_CHIP_VERDE  2
+#define AMDGPU_CHIP_OLAND  3
+#define AMDGPU_CHIP_HAINAN 4
+#define AMDGPU_CHIP_BONAIRE5
+#define AMDGPU_CHIP_KAVERI 6
+#define AMDGPU_CHIP_KABINI 7
+#define AMDGPU_CHIP_HAWAII 8
+#define AMDGPU_CHIP_MULLINS9
+#define AMDGPU_CHIP_TOPAZ  10
+#define AMDGPU_CHIP_TONGA  11
+#define AMDGPU_CHIP_FIJI   12
+#define AMDGPU_CHIP_CARRIZO13
+#define AMDGPU_CHIP_STONEY 14
+#define AMDGPU_CHIP_POLARIS10  15
+#define AMDGPU_CHIP_POLARIS11  16
+#define AMDGPU_CHIP_POLARIS12  17
+#define AMDGPU_CHIP_VEGAM  18
+#define AMDGPU_CHIP_VEGA10 19
+#define AMDGPU_CHIP_VEGA12 20
+#define AMDGPU_CHIP_VEGA20 21
+#define AMDGPU_CHIP_RAVEN  22
+
+/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same
+ * as AMDGPU_TILING_*. However, the the rules as to when to set them are
+ * different.
+ *
+ * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR
+ * instead.
+ *
+ * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be
+ * set.
+ *
+ * For other ARRAY_MODEs:
+ *  - Only set TILE_SPLIT if the image is multisample.
+ *
+ * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1
+ * different value there. The values are
+ *   - depth   : 0
+ *   - displayable : 1
+ *   - thin: 2
+ *   - thick (GFX6): 3
+ *   - rotated (GFX7+) : 4
+ *
+ * TODO: What to do with multisample multi plane images? More tile split
+ * fields don't fit if we want to keep a few bits for a format version.
+ */
+#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT  0
+#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK   0xf
+#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4
+#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK  0x1f
+#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT  9
+#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK   0x7
+#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12
+#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK  0x7
+#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT  15
+#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK   0x3
+#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT 17
+#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK  0x3
+#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT   19
+#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK0x3

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread Christian König

Am 04.09.2018 um 06:04 schrieb zhoucm1:



On 2018年09月03日 19:19, Christian König wrote:

Am 03.09.2018 um 12:07 schrieb Chunming Zhou:



在 2018/9/3 16:50, Christian König 写道:

Am 03.09.2018 um 06:13 schrieb Chunming Zhou:



在 2018/8/30 19:32, Christian König 写道:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait ops points and signal ops points.
For timeline, they are connected by timeline value, works like:
    a. signal pt increase timeline value to signal_pt value 
when signal_pt is signaled. signal_pt is depending on previous 
pt fence and itself signal fence from signal ops.

    b. wait pt tree checks if timeline value over itself value.

For normal, works like:
    a. signal pt increase 1 for syncobj_timeline->signal_point 
every time when signal ops is performed.
    b. when wait ops is comming, wait pt will fetch above last 
signal pt value as its wait point. wait pt will be only 
signaled by equal point value signal_pt.




and why is the stub fence their base?
Good question, I tried to kzalloc for them as well when I 
debug them, I encountered a problem:
I lookup/find wait_pt or signal_pt successfully, but when I 
tried to use them, they are freed sometimes, and results in 
NULL point.
and generally, when lookup them, we often need their stub 
fence as well, and in the meantime, their lifecycle are same.

Above reason, I make stub fence as their base.


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. 
blocking in drm_syncobj_timeline_fini() until all waits are 
done is not a good idea.


What you should do instead is to create a fence_array object 
with all the fence we need to wait for when a wait point is 
requested.
Yeah, this is our initial discussion result, but when I tried to 
do that, I found that cannot meet the advance signal requirement:
    a. We need to consider the wait and signal pt value are not 
one-to-one match, it's difficult to find dependent point, at 
least, there is some overhead.


As far as I can see that is independent of using a fence array 
here. See you can either use a ring buffer or an rb-tree, but 
when you want to wait for a specific point we need to condense 
the not yet signaled fences into an array.
again, need to find the range of where the specific point in, we 
should close to timeline semantics, I also refered the sw_sync.c 
timeline, usally wait_pt is signalled by timeline point. And I 
agree we can implement it with several methods, but I don't think 
there are basical differences.


The problem is that with your current approach you need the 
sync_obj alive for the synchronization to work. That is most likely 
not a good idea.
Indeed, I will fix that. How abount only creating fence array for 
every wait pt when syncobj release? when syncobj release, wait pt 
must have waited the signal opertation, then we can easily condense 
fences for every wait pt. And meantime, we can take timeline based 
wait pt advantage.


That could work, but I don't see how you want to replace the already 
issued fence with a fence_array when the sync object is destroyed.


Additional to that I would rather prefer a consistent handling, e.g. 
without extra rarely used code paths.
Ah, I find a easy way, we just need to make syncobj_timeline structure 
as a reference. This way syncobj itself can be released first, 
wait_pt/signal_pt don't need syncobj at all.

every wait_pt/signal_pt keep a reference of syncobj_timeline.


I've thought about that as well, but came to the conclusion that you run 
into problems because of circle dependencies.


E.g. sync_obj references sync_point and sync_point references sync_obj.

Additional to that it is quite a bit larger memory footprint because you 
need to keep the sync_obj around as well.










Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need 
that optimization.
Do you mean increasing timeline based on signal fence is not better? 
only update timeline value when requested by a wait pt?


Yes, exactly.

This way, we will not update timeline value immidiately and cannot 
free signal pt immidiately, and we also need to consider it to CPU 
query and wait.


That is actually the better coding style. We usually try to avoid 
doing things in interrupt handlers as much as possible.
OK, I see your concern, how about to delay handling to a workqueue? 
this way, we only 

Re: [PATCH libdrm] Add basic CONTRIBUTING file

2018-09-04 Thread Dave Airlie
On Mon, 3 Sep 2018 at 18:47, Daniel Vetter  wrote:
>
> I picked up a bunch of the pieces from wayland's version:
>
> https://gitlab.freedesktop.org/wayland/wayland/blob/master/CONTRIBUTING.md
>
> The weston one is fairly similar. Then I rather massively trimmed it
> down since in reality libdrm is a bit a dumping ground with very few
> real rules. The commit rights and CoC sections I've copied verbatim
> from igt respectively drm-misc. Weston/Wayland only differ in their
> pick of how many patches you need (10 instead of 5). I think for
> libdrm this is supremely relevant, since most everyone will get their
> commit rights by contributing already to the kernel or mesa and having
> commit rights there already.
>
> Anyway, I figured this is good to get the rules documented, even if
> there's mostly not many rules.
>
> Note: This references maintainers in a MAINTAINERS file, which needs
> to be created first.
>
> Note: With the gitlab migration the entire commit rights process is
> still a bit up in the air. But gitlab commit rights and roles are
> hierarchical, so we can do libdrm-only maintainer/commiter roles
> ("Owner" and "Developer" in gitlab-speak). This should avoid
> conflating libdrm roles with mesa roles, useful for those pushing to
> libdrm as primarily kernel contributors.

Fine with me,

Acked-by: Dave Airlie 

Dave.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx