[PATCH] drm/amd/powerplay: fix compile warning for wrong data type V2
do_div expects the 1st argument in 64bit instead of 32bit. Drop the usage of do_div as it seems unnecessary. V2: drop usage of do_div completely Change-Id: Id2032a43727e7f1fa516d3565354d412a561 Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c index 3efd59e984a3..1e65ac01e0f5 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c @@ -1195,7 +1195,7 @@ static int vega20_set_sclk_od( int ret = 0; od_sclk = golden_sclk_table->dpm_levels[golden_sclk_table->count - 1].value * value; - do_div(od_sclk, 100); + od_sclk /= 100; od_sclk += golden_sclk_table->dpm_levels[golden_sclk_table->count - 1].value; ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_GFXCLK_FMAX, od_sclk); @@ -1242,7 +1242,7 @@ static int vega20_set_mclk_od( int ret = 0; od_mclk = golden_mclk_table->dpm_levels[golden_mclk_table->count - 1].value * value; - do_div(od_mclk, 100); + od_mclk /= 100; od_mclk += golden_mclk_table->dpm_levels[golden_mclk_table->count - 1].value; ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX, od_mclk); -- 2.18.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amd/powerplay: fix compile warning for wrong data type
> -Original Message- > From: Deucher, Alexander > Sent: 2018年9月5日 12:33 > To: Quan, Evan ; amd-gfx@lists.freedesktop.org > Cc: Quan, Evan > Subject: RE: [PATCH] drm/amd/powerplay: fix compile warning for wrong > data type > > > -Original Message- > > From: amd-gfx On Behalf Of > > Evan Quan > > Sent: Tuesday, September 4, 2018 10:21 PM > > To: amd-gfx@lists.freedesktop.org > > Cc: Quan, Evan > > Subject: [PATCH] drm/amd/powerplay: fix compile warning for wrong data > > type > > > > do_div expects the 1st argument in 64bit instead of 32bit. > > Do we actually need to use do_div here? If both arguments are 32 bit, can't > we just use regular division? > [Quan, Evan] That's a good idea. Will update the patch accordingly. > Alex > > > > > Change-Id: Id2032a43727e7f1fa516d3565354d412a561 > > Signed-off-by: Evan Quan > > --- > > drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 8 > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > > b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > > index 3efd59e984a3..6ba5f328249d 100644 > > --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > > +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > > @@ -1191,14 +1191,14 @@ static int vega20_set_sclk_od( > > &(data->dpm_table.gfx_table); > > struct vega20_single_dpm_table *golden_sclk_table = > > &(data->golden_dpm_table.gfx_table); > > - uint32_t od_sclk; > > + uint64_t od_sclk; > > int ret = 0; > > > > od_sclk = golden_sclk_table->dpm_levels[golden_sclk_table->count > > - 1].value * value; > > do_div(od_sclk, 100); > > od_sclk += golden_sclk_table->dpm_levels[golden_sclk_table- > > >count - 1].value; > > > > - ret = vega20_od8_set_settings(hwmgr, > > OD8_SETTING_GFXCLK_FMAX, od_sclk); > > + ret = vega20_od8_set_settings(hwmgr, > > OD8_SETTING_GFXCLK_FMAX, (uint32_t)od_sclk); > > PP_ASSERT_WITH_CODE(!ret, > > "[SetSclkOD] failed to set od gfxclk!", > > return ret); > > @@ -1238,14 +1238,14 @@ static int vega20_set_mclk_od( > > &(data->dpm_table.mem_table); > > struct vega20_single_dpm_table *golden_mclk_table = > > &(data->golden_dpm_table.mem_table); > > - uint32_t od_mclk; > > + uint64_t od_mclk; > > int ret = 0; > > > > od_mclk = golden_mclk_table->dpm_levels[golden_mclk_table- > > >count - 1].value * value; > > do_div(od_mclk, 100); > > od_mclk += golden_mclk_table->dpm_levels[golden_mclk_table- > > >count - 1].value; > > > > - ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX, > > od_mclk); > > + ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX, > > (uint32_t)od_mclk); > > PP_ASSERT_WITH_CODE(!ret, > > "[SetMclkOD] failed to set od memclk!", > > return ret); > > -- > > 2.18.0 > > > > ___ > > amd-gfx mailing list > > amd-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amd/powerplay: fix compile warning for wrong data type
> -Original Message- > From: amd-gfx On Behalf Of Evan > Quan > Sent: Tuesday, September 4, 2018 10:21 PM > To: amd-gfx@lists.freedesktop.org > Cc: Quan, Evan > Subject: [PATCH] drm/amd/powerplay: fix compile warning for wrong data > type > > do_div expects the 1st argument in 64bit instead of 32bit. Do we actually need to use do_div here? If both arguments are 32 bit, can't we just use regular division? Alex > > Change-Id: Id2032a43727e7f1fa516d3565354d412a561 > Signed-off-by: Evan Quan > --- > drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > index 3efd59e984a3..6ba5f328249d 100644 > --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c > @@ -1191,14 +1191,14 @@ static int vega20_set_sclk_od( > &(data->dpm_table.gfx_table); > struct vega20_single_dpm_table *golden_sclk_table = > &(data->golden_dpm_table.gfx_table); > - uint32_t od_sclk; > + uint64_t od_sclk; > int ret = 0; > > od_sclk = golden_sclk_table->dpm_levels[golden_sclk_table->count > - 1].value * value; > do_div(od_sclk, 100); > od_sclk += golden_sclk_table->dpm_levels[golden_sclk_table- > >count - 1].value; > > - ret = vega20_od8_set_settings(hwmgr, > OD8_SETTING_GFXCLK_FMAX, od_sclk); > + ret = vega20_od8_set_settings(hwmgr, > OD8_SETTING_GFXCLK_FMAX, (uint32_t)od_sclk); > PP_ASSERT_WITH_CODE(!ret, > "[SetSclkOD] failed to set od gfxclk!", > return ret); > @@ -1238,14 +1238,14 @@ static int vega20_set_mclk_od( > &(data->dpm_table.mem_table); > struct vega20_single_dpm_table *golden_mclk_table = > &(data->golden_dpm_table.mem_table); > - uint32_t od_mclk; > + uint64_t od_mclk; > int ret = 0; > > od_mclk = golden_mclk_table->dpm_levels[golden_mclk_table- > >count - 1].value * value; > do_div(od_mclk, 100); > od_mclk += golden_mclk_table->dpm_levels[golden_mclk_table- > >count - 1].value; > > - ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX, > od_mclk); > + ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX, > (uint32_t)od_mclk); > PP_ASSERT_WITH_CODE(!ret, > "[SetMclkOD] failed to set od memclk!", > return ret); > -- > 2.18.0 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 2018年09月04日 17:20, Christian König wrote: Am 04.09.2018 um 11:00 schrieb zhoucm1: On 2018年09月04日 16:42, Christian König wrote: Am 04.09.2018 um 10:27 schrieb zhoucm1: On 2018年09月04日 16:05, Christian König wrote: Am 04.09.2018 um 09:53 schrieb zhoucm1: [SNIP] How about this idea: 1. Each signaling point is a fence implementation with an rb node. 2. Each node keeps a reference to the last previously inserted node. 3. Each node is referenced by the sync object itself. 4. Before each signal/wait operation we do a garbage collection and remove the first node from the tree as long as it is signaled. 5. When enable_signaling is requested for a node we cascade that to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences. 6. A wait just looks into the tree for the signal point lower or equal of the requested sequence number. After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7. That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well. If defined this, then problem is coming again, if 8 is removed when garbage collection, you will return 15? The garbage collection is only done for signaled nodes. So when 8 is already garbage collected and 7 is asked we know that we don't need to return anything. 8 is a signaled node, waitA/signal operation do garbage collection, how waitB(7) know the garbage history? Well we of course keep what the last garbage collected number is, don't we? Since there is no timeline as a line, I think this is not right direction. That is actually intended. There is no infinite timeline here, just a windows of the last not yet signaled fences. No one said the it's a infinite timeline, timeline will stop increasing when syncobj is released. Yeah, but the syncobj can live for a very very long time. Not having some form of shrinking it when fences are signaled is certainly not going to fly very far. I will try to fix this problem. btw, when I try your suggestion, I find it will be difficult to implement drm_syncobj_array_wait_timeout by your idea, since it needs first_signaled. if there is un-signaled syncobj, we will still register cb to timeline value change, then still back to need enble_signaling. Thanks, David Zhou Regards, Christian. Anyway kref is a good way to solve the 'free' problem, I will try to use it improve my patch, of course, will refer your idea.:) Thanks, David Zhou Otherwise you will never be able to release nodes from the tree since you always need to keep them around just in case somebody asks for a lower number. Regards, Christian. The key is that as soon as a signal point is added adding a previous point is no longer allowed. That's intention. Regards, David Zhou 7. When the sync object is released we use rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped. And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves. The node will be destroyed when the last reference drops, not when enable_signaling is called. In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more. We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed. Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation. Regards, Christian. Regards, David Zhou Well that is quite a bunch of logic, but I think that should work fine. Yeah, it could work, simple timeline reference also can solve 'free' problem. I think this approach is still quite a bit better, e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it. Regards, Christian. Thanks, David Zhou ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list
[PATCH] drm/amd/powerplay: fix compile warning for wrong data type
do_div expects the 1st argument in 64bit instead of 32bit. Change-Id: Id2032a43727e7f1fa516d3565354d412a561 Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c index 3efd59e984a3..6ba5f328249d 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c @@ -1191,14 +1191,14 @@ static int vega20_set_sclk_od( &(data->dpm_table.gfx_table); struct vega20_single_dpm_table *golden_sclk_table = &(data->golden_dpm_table.gfx_table); - uint32_t od_sclk; + uint64_t od_sclk; int ret = 0; od_sclk = golden_sclk_table->dpm_levels[golden_sclk_table->count - 1].value * value; do_div(od_sclk, 100); od_sclk += golden_sclk_table->dpm_levels[golden_sclk_table->count - 1].value; - ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_GFXCLK_FMAX, od_sclk); + ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_GFXCLK_FMAX, (uint32_t)od_sclk); PP_ASSERT_WITH_CODE(!ret, "[SetSclkOD] failed to set od gfxclk!", return ret); @@ -1238,14 +1238,14 @@ static int vega20_set_mclk_od( &(data->dpm_table.mem_table); struct vega20_single_dpm_table *golden_mclk_table = &(data->golden_dpm_table.mem_table); - uint32_t od_mclk; + uint64_t od_mclk; int ret = 0; od_mclk = golden_mclk_table->dpm_levels[golden_mclk_table->count - 1].value * value; do_div(od_mclk, 100); od_mclk += golden_mclk_table->dpm_levels[golden_mclk_table->count - 1].value; - ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX, od_mclk); + ret = vega20_od8_set_settings(hwmgr, OD8_SETTING_UCLK_FMAX, (uint32_t)od_mclk); PP_ASSERT_WITH_CODE(!ret, "[SetMclkOD] failed to set od memclk!", return ret); -- 2.18.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH libdrm] libdrm: Allow dynamic drm majors on linux
On Tue, 4 Sep 2018 at 03:00, Thomas Hellstrom wrote: > > On 09/03/2018 06:33 PM, Daniel Vetter wrote: > > On Mon, Sep 03, 2018 at 11:16:29AM +0200, Thomas Hellstrom wrote: > >> On 08/31/2018 05:30 PM, Thomas Hellstrom wrote: > >>> On 08/31/2018 05:27 PM, Emil Velikov wrote: > On 31 August 2018 at 15:38, Michel Dänzer wrote: > > [ Adding the amd-gfx list ] > > > > On 2018-08-31 3:05 p.m., Thomas Hellstrom wrote: > >> On 08/31/2018 02:30 PM, Emil Velikov wrote: > >>> On 31 August 2018 at 12:54, Thomas Hellstrom > >>> wrote: > To determine whether a device node is a drm device > node or not, the code > currently compares the node's major number to the static drm major > device > number. > > This breaks the standalone vmwgfx driver on XWayland dri clients, > > >>> Any particular reason why the code doesn't use a fixed node there? > >>> It will make the diff vs the in-kernel driver a bit smaller. > >> Because then it won't be able to interoperate with other in-tree > >> drivers, like virtual drm drivers or passthrough usb drm drivers. > >> There is no clean way to share the minor number allocation > >> with in-tree > >> drm, so standalone vmwgfx is using dynamic major allocation. > > I wonder why I haven't heard of any of these issues with the standalone > > version of amdgpu shipped in packaged AMD releases. Does that > > also use a > > different major number? If yes, maybe it's just that nobody has tried > > Xwayland clients with that driver. If no, how does it avoid the other > > issues described above? > > > AFAICT, the difference is that the standalone vmwgfx uses an internal > copy of drm core. > It doesn't reuse the in-kernel drm, hence it cannot know which minor > it can use. > > -Emil > >>> Actually, standalone vmwgfx could perhaps also try to allocate minors > >>> from 63 and downwards. That might work, but needs some verification. > >>> > >> So unfortuntately this doesn't work since the in-tree drm's file operations > >> are registered with the DRM_MAJOR. > >> So I still think the patch is the way to go. If people are concerned that > >> also fbdev file descriptors are allowed, perhaps there are other sysfs > >> traits we can look at? > > Somewhat out of curiosity, but why do you have to overwrite all of drm? > > amdgpu seems to be able to pull their stunt off without ... > > -Daniel > > At the time we launched the standalone vmwgfx, the DRM <-> driver > interface was moving considerably more rapidly than the DRM <-> kernel > interface. I think that's still the case. Hence less work for us. Also > meant we can install the full driver stack with latest features on > fairly old VMs without backported DRM functionality. > I think this should be fine for 99% of drm usage, there may be corner cases in wierd places, but I can't point to any that really matter (maybe strace?) Acked-by: Dave Airlie Dave. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 04, 2018 at 09:36:01PM +0200, Bas Nieuwenhuizen wrote: > On Tue, Sep 4, 2018 at 9:28 PM Daniel Vetter wrote: > > > > On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen > > wrote: > > > On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter wrote: > > >> > > >> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen > > >> wrote: > > >> > On Tue, Sep 4, 2018 at 7:48 PM Christian König > > >> > wrote: > > >> >> > > >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter: > > >> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen > > >> >> > wrote: > > >> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter > > >> >> >> wrote: > > >> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > > >> >> >>> wrote: > > >> >> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter > > >> >> wrote: > > >> >> > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen > > >> >> > wrote: > > >> >> >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter > > >> >> >> wrote: > > >> >> >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König > > >> >> >>> wrote: > > >> >> Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > >> >> > Hi, > > >> >> > > > >> >> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter > > >> >> > wrote: > > >> >> >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > > >> >> >> wrote: > > >> >> >>> +/* The chip this is compatible with. > > >> >> >>> + * > > >> >> >>> + * If compression is disabled, use > > >> >> >>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > >> >> >>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > >> >> >>> + * > > >> >> >>> + * With compression enabled please use the exact chip. > > >> >> >>> + * > > >> >> >>> + * TODO: Do some generations share DCC format? > > >> >> >>> + */ > > >> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > >> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK > > >> >> >>> 0xff > > >> >> >> Do you really need all the combinations here of DCC + gpu > > >> >> >> gen + tiling > > >> >> >> details? When we had the entire discussion with nvidia > > >> >> >> folks they > > >> >> >> eventually agreed that they don't need the massive pile > > >> >> >> with every > > >> >> >> possible combination. Do you really plan to share all > > >> >> >> these different > > >> >> >> things? > > >> >> >> > > >> >> >> Note that e.g. on i915 we spec some of the tiling > > >> >> >> depending upon > > >> >> >> buffer size and buffer format (because that's how the hw > > >> >> >> works), not > > >> >> >> using explicit modifier flags for everything. > > >> >> > Right. The conclusion, after people went through and > > >> >> > started sorting > > >> >> > out the kinds of formats for which they would _actually_ > > >> >> > export real > > >> >> > colour buffers for, that most vendors definitely have fewer > > >> >> > than > > >> >> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > >> >> > possible formats to represent, very likely fewer than > > >> >> > 340,282,366,920,938,463,463,374,607,431,768,211,456 > > >> >> > formats, probably > > >> >> > fewer than 72,057,594,037,927,936 formats, and even still > > >> >> > generally > > >> >> > fewer than 281,474,976,710,656 if you want to be generous > > >> >> > and leave 8 > > >> >> > bits of the 56 available. > > >> >> The problem here is that at least for some parameters we > > >> >> actually don't know > > >> >> which formats are actually used. > > >> >> > > >> >> The following are not real world examples, but just to > > >> >> explain the general > > >> >> problem. > > >> >> > > >> >> The memory configuration for example can be not ASIC > > >> >> specific, but rather > > >> >> determined by whoever took the ASIC and glued it together > > >> >> with VRAM on a > > >> >> board. It is not likely that somebody puts all the VRAM > > >> >> chips on one > > >> >> channel, but it is still perfectly possible. > > >> >> > > >> >> Same is true for things like harvesting, e.g. of 16 channels > > >> >> halve of them > > >> >> could be bad and we need to know which to actually use. > > >> >> >>> For my understanding: This leaks outside the chip when > > >> >> >>> sharing buffers? > > >> >> >>> All the information you only need locally to a given amdgpu > > >> >> >>> instance > > >> >> >>> don't really need to be encoded in modifiers. > > >> >>
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 9:28 PM Daniel Vetter wrote: > > On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen > wrote: > > On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter wrote: > >> > >> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen > >> wrote: > >> > On Tue, Sep 4, 2018 at 7:48 PM Christian König > >> > wrote: > >> >> > >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter: > >> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen > >> >> > wrote: > >> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter wrote: > >> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > >> >> >>> wrote: > >> >> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter > >> >> wrote: > >> >> > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > >> >> >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter > >> >> >> wrote: > >> >> >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > >> >> Am 04.09.2018 um 12:15 schrieb Daniel Stone: > >> >> > Hi, > >> >> > > >> >> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter > >> >> > wrote: > >> >> >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > >> >> >> wrote: > >> >> >>> +/* The chip this is compatible with. > >> >> >>> + * > >> >> >>> + * If compression is disabled, use > >> >> >>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > >> >> >>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ > >> >> >>> + * > >> >> >>> + * With compression enabled please use the exact chip. > >> >> >>> + * > >> >> >>> + * TODO: Do some generations share DCC format? > >> >> >>> + */ > >> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > >> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > >> >> >> Do you really need all the combinations here of DCC + gpu > >> >> >> gen + tiling > >> >> >> details? When we had the entire discussion with nvidia folks > >> >> >> they > >> >> >> eventually agreed that they don't need the massive pile with > >> >> >> every > >> >> >> possible combination. Do you really plan to share all these > >> >> >> different > >> >> >> things? > >> >> >> > >> >> >> Note that e.g. on i915 we spec some of the tiling depending > >> >> >> upon > >> >> >> buffer size and buffer format (because that's how the hw > >> >> >> works), not > >> >> >> using explicit modifier flags for everything. > >> >> > Right. The conclusion, after people went through and started > >> >> > sorting > >> >> > out the kinds of formats for which they would _actually_ > >> >> > export real > >> >> > colour buffers for, that most vendors definitely have fewer > >> >> > than > >> >> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > >> >> > possible formats to represent, very likely fewer than > >> >> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, > >> >> > probably > >> >> > fewer than 72,057,594,037,927,936 formats, and even still > >> >> > generally > >> >> > fewer than 281,474,976,710,656 if you want to be generous and > >> >> > leave 8 > >> >> > bits of the 56 available. > >> >> The problem here is that at least for some parameters we > >> >> actually don't know > >> >> which formats are actually used. > >> >> > >> >> The following are not real world examples, but just to explain > >> >> the general > >> >> problem. > >> >> > >> >> The memory configuration for example can be not ASIC specific, > >> >> but rather > >> >> determined by whoever took the ASIC and glued it together with > >> >> VRAM on a > >> >> board. It is not likely that somebody puts all the VRAM chips > >> >> on one > >> >> channel, but it is still perfectly possible. > >> >> > >> >> Same is true for things like harvesting, e.g. of 16 channels > >> >> halve of them > >> >> could be bad and we need to know which to actually use. > >> >> >>> For my understanding: This leaks outside the chip when sharing > >> >> >>> buffers? > >> >> >>> All the information you only need locally to a given amdgpu > >> >> >>> instance > >> >> >>> don't really need to be encoded in modifiers. > >> >> >>> > >> >> >>> Pointers to code where this is all decided (kernel and radeonsi > >> >> >>> would be > >> >> >>> good starters I guess) would be really good here. > >> >> >> I extracted the information on which bits are relevant mostly > >> >> >> from the > >> >> >> AddrFromCoord functions in addrlib in
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen wrote: > On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter wrote: >> >> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen >> wrote: >> > On Tue, Sep 4, 2018 at 7:48 PM Christian König >> > wrote: >> >> >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter: >> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen >> >> > wrote: >> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter wrote: >> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen >> >> >>> wrote: >> >> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter wrote: >> >> > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >> >> >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter >> >> >> wrote: >> >> >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >> >> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >> >> > Hi, >> >> > >> >> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter >> >> > wrote: >> >> >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen >> >> >> wrote: >> >> >>> +/* The chip this is compatible with. >> >> >>> + * >> >> >>> + * If compression is disabled, use >> >> >>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >> >> >>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >> >> >>> + * >> >> >>> + * With compression enabled please use the exact chip. >> >> >>> + * >> >> >>> + * TODO: Do some generations share DCC format? >> >> >>> + */ >> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >> >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >> >> >> Do you really need all the combinations here of DCC + gpu gen >> >> >> + tiling >> >> >> details? When we had the entire discussion with nvidia folks >> >> >> they >> >> >> eventually agreed that they don't need the massive pile with >> >> >> every >> >> >> possible combination. Do you really plan to share all these >> >> >> different >> >> >> things? >> >> >> >> >> >> Note that e.g. on i915 we spec some of the tiling depending >> >> >> upon >> >> >> buffer size and buffer format (because that's how the hw >> >> >> works), not >> >> >> using explicit modifier flags for everything. >> >> > Right. The conclusion, after people went through and started >> >> > sorting >> >> > out the kinds of formats for which they would _actually_ export >> >> > real >> >> > colour buffers for, that most vendors definitely have fewer than >> >> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> >> > possible formats to represent, very likely fewer than >> >> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, >> >> > probably >> >> > fewer than 72,057,594,037,927,936 formats, and even still >> >> > generally >> >> > fewer than 281,474,976,710,656 if you want to be generous and >> >> > leave 8 >> >> > bits of the 56 available. >> >> The problem here is that at least for some parameters we >> >> actually don't know >> >> which formats are actually used. >> >> >> >> The following are not real world examples, but just to explain >> >> the general >> >> problem. >> >> >> >> The memory configuration for example can be not ASIC specific, >> >> but rather >> >> determined by whoever took the ASIC and glued it together with >> >> VRAM on a >> >> board. It is not likely that somebody puts all the VRAM chips on >> >> one >> >> channel, but it is still perfectly possible. >> >> >> >> Same is true for things like harvesting, e.g. of 16 channels >> >> halve of them >> >> could be bad and we need to know which to actually use. >> >> >>> For my understanding: This leaks outside the chip when sharing >> >> >>> buffers? >> >> >>> All the information you only need locally to a given amdgpu >> >> >>> instance >> >> >>> don't really need to be encoded in modifiers. >> >> >>> >> >> >>> Pointers to code where this is all decided (kernel and radeonsi >> >> >>> would be >> >> >>> good starters I guess) would be really good here. >> >> >> I extracted the information on which bits are relevant mostly from >> >> >> the >> >> >> AddrFromCoord functions in addrlib in mesa: >> >> >> >> >> >> for macro-tiles: >> >> >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >> >> >> >> >> >> for micro-tiles (or the micro-tiles in macro-tiles): >> >> >> >> >> >>
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen wrote: > On Tue, Sep 4, 2018 at 7:48 PM Christian König > wrote: >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter: >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen >> > wrote: >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter wrote: >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen >> >>> wrote: >> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter wrote: >> > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >> >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: >> >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >> > Hi, >> > >> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter >> > wrote: >> >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen >> >> wrote: >> >>> +/* The chip this is compatible with. >> >>> + * >> >>> + * If compression is disabled, use >> >>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >> >>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >> >>> + * >> >>> + * With compression enabled please use the exact chip. >> >>> + * >> >>> + * TODO: Do some generations share DCC format? >> >>> + */ >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >> >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >> >> Do you really need all the combinations here of DCC + gpu gen + >> >> tiling >> >> details? When we had the entire discussion with nvidia folks they >> >> eventually agreed that they don't need the massive pile with every >> >> possible combination. Do you really plan to share all these >> >> different >> >> things? >> >> >> >> Note that e.g. on i915 we spec some of the tiling depending upon >> >> buffer size and buffer format (because that's how the hw works), >> >> not >> >> using explicit modifier flags for everything. >> > Right. The conclusion, after people went through and started >> > sorting >> > out the kinds of formats for which they would _actually_ export >> > real >> > colour buffers for, that most vendors definitely have fewer than >> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> > possible formats to represent, very likely fewer than >> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, >> > probably >> > fewer than 72,057,594,037,927,936 formats, and even still generally >> > fewer than 281,474,976,710,656 if you want to be generous and >> > leave 8 >> > bits of the 56 available. >> The problem here is that at least for some parameters we actually >> don't know >> which formats are actually used. >> >> The following are not real world examples, but just to explain the >> general >> problem. >> >> The memory configuration for example can be not ASIC specific, but >> rather >> determined by whoever took the ASIC and glued it together with VRAM >> on a >> board. It is not likely that somebody puts all the VRAM chips on one >> channel, but it is still perfectly possible. >> >> Same is true for things like harvesting, e.g. of 16 channels halve >> of them >> could be bad and we need to know which to actually use. >> >>> For my understanding: This leaks outside the chip when sharing >> >>> buffers? >> >>> All the information you only need locally to a given amdgpu instance >> >>> don't really need to be encoded in modifiers. >> >>> >> >>> Pointers to code where this is all decided (kernel and radeonsi >> >>> would be >> >>> good starters I guess) would be really good here. >> >> I extracted the information on which bits are relevant mostly from the >> >> AddrFromCoord functions in addrlib in mesa: >> >> >> >> for macro-tiles: >> >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >> >> >> >> for micro-tiles (or the micro-tiles in macro-tiles): >> >> >> >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >> > So this is the decoding thing. How many of these actually exist, even >> > when >> > taking all the other information into account? >> > >> > E.g. given a platform + memory config (seems needed) + drm_fourcc + >> > stride >> > + height + width, how much of all these bits do you actually still >> > freely >> > pick? >> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, >> thick
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
Am 04.09.2018 um 20:00 schrieb Bas Nieuwenhuizen: On Tue, Sep 4, 2018 at 7:57 PM Bas Nieuwenhuizen wrote: On Tue, Sep 4, 2018 at 7:48 PM Christian König wrote: Am 04.09.2018 um 18:37 schrieb Daniel Vetter: On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen wrote: On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter wrote: On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen wrote: On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter wrote: On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: Am 04.09.2018 um 12:15 schrieb Daniel Stone: Hi, On Tue, 4 Sep 2018 at 11:05, Daniel Vetter wrote: On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen wrote: +/* The chip this is compatible with. + * + * If compression is disabled, use + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 + * - AMDGPU_CHIP_VEGA10 for GFX9+ + * + * With compression enabled please use the exact chip. + * + * TODO: Do some generations share DCC format? + */ +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff Do you really need all the combinations here of DCC + gpu gen + tiling details? When we had the entire discussion with nvidia folks they eventually agreed that they don't need the massive pile with every possible combination. Do you really plan to share all these different things? Note that e.g. on i915 we spec some of the tiling depending upon buffer size and buffer format (because that's how the hw works), not using explicit modifier flags for everything. Right. The conclusion, after people went through and started sorting out the kinds of formats for which they would _actually_ export real colour buffers for, that most vendors definitely have fewer than 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possible formats to represent, very likely fewer than 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably fewer than 72,057,594,037,927,936 formats, and even still generally fewer than 281,474,976,710,656 if you want to be generous and leave 8 bits of the 56 available. The problem here is that at least for some parameters we actually don't know which formats are actually used. The following are not real world examples, but just to explain the general problem. The memory configuration for example can be not ASIC specific, but rather determined by whoever took the ASIC and glued it together with VRAM on a board. It is not likely that somebody puts all the VRAM chips on one channel, but it is still perfectly possible. Same is true for things like harvesting, e.g. of 16 channels halve of them could be bad and we need to know which to actually use. For my understanding: This leaks outside the chip when sharing buffers? All the information you only need locally to a given amdgpu instance don't really need to be encoded in modifiers. Pointers to code where this is all decided (kernel and radeonsi would be good starters I guess) would be really good here. I extracted the information on which bits are relevant mostly from the AddrFromCoord functions in addrlib in mesa: for macro-tiles: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 for micro-tiles (or the micro-tiles in macro-tiles): https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 So this is the decoding thing. How many of these actually exist, even when taking all the other information into account? E.g. given a platform + memory config (seems needed) + drm_fourcc + stride + height + width, how much of all these bits do you actually still freely pick? Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, depth, display-rotated) + whether to use compression, everything else is fixed given those option, the properties of the chip and the format. It might be that all the things you need to know from the memory config don't encode smaller than the macro/micro/whatever else stuff. But that's kinda the angle that we looked at this for everyone else. E.g. for multi-plane stuff, if everyone picks the same config for the 2nd/3rd plane, then you don't actually need to encode that. It just becomes part of the implied stuff in the modifier. The problem is some GPUs are compatible for say 8-bpp images, but not for 32-bpp surfaces. e.g. lets look at the following table showing the current configuration for all GFX6-GFX8 GPU: format: (bank width, bank height, macro tile aspect, num banks) for 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16)
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 7:57 PM Bas Nieuwenhuizen wrote: > > On Tue, Sep 4, 2018 at 7:48 PM Christian König > wrote: > > > > Am 04.09.2018 um 18:37 schrieb Daniel Vetter: > > > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen > > > wrote: > > >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter wrote: > > >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > > >>> wrote: > > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter wrote: > > > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > > >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: > > >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > > Hi, > > > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter > > > wrote: > > >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > > >> wrote: > > >>> +/* The chip this is compatible with. > > >>> + * > > >>> + * If compression is disabled, use > > >>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > >>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > >>> + * > > >>> + * With compression enabled please use the exact chip. > > >>> + * > > >>> + * TODO: Do some generations share DCC format? > > >>> + */ > > >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > >> Do you really need all the combinations here of DCC + gpu gen + > > >> tiling > > >> details? When we had the entire discussion with nvidia folks they > > >> eventually agreed that they don't need the massive pile with > > >> every > > >> possible combination. Do you really plan to share all these > > >> different > > >> things? > > >> > > >> Note that e.g. on i915 we spec some of the tiling depending upon > > >> buffer size and buffer format (because that's how the hw works), > > >> not > > >> using explicit modifier flags for everything. > > > Right. The conclusion, after people went through and started > > > sorting > > > out the kinds of formats for which they would _actually_ export > > > real > > > colour buffers for, that most vendors definitely have fewer than > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > possible formats to represent, very likely fewer than > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, > > > probably > > > fewer than 72,057,594,037,927,936 formats, and even still > > > generally > > > fewer than 281,474,976,710,656 if you want to be generous and > > > leave 8 > > > bits of the 56 available. > > The problem here is that at least for some parameters we actually > > don't know > > which formats are actually used. > > > > The following are not real world examples, but just to explain the > > general > > problem. > > > > The memory configuration for example can be not ASIC specific, but > > rather > > determined by whoever took the ASIC and glued it together with > > VRAM on a > > board. It is not likely that somebody puts all the VRAM chips on > > one > > channel, but it is still perfectly possible. > > > > Same is true for things like harvesting, e.g. of 16 channels halve > > of them > > could be bad and we need to know which to actually use. > > >>> For my understanding: This leaks outside the chip when sharing > > >>> buffers? > > >>> All the information you only need locally to a given amdgpu instance > > >>> don't really need to be encoded in modifiers. > > >>> > > >>> Pointers to code where this is all decided (kernel and radeonsi > > >>> would be > > >>> good starters I guess) would be really good here. > > >> I extracted the information on which bits are relevant mostly from > > >> the > > >> AddrFromCoord functions in addrlib in mesa: > > >> > > >> for macro-tiles: > > >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > > >> > > >> for micro-tiles (or the micro-tiles in macro-tiles): > > >> > > >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > > > So this is the decoding thing. How many of these actually exist, even > > > when > > > taking all the other information into account? > > > > > > E.g. given a platform + memory config (seems needed) + drm_fourcc + > > > stride > > > + height + width, how much of all these
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 7:48 PM Christian König wrote: > > Am 04.09.2018 um 18:37 schrieb Daniel Vetter: > > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen > > wrote: > >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter wrote: > >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > >>> wrote: > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter wrote: > > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: > >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > Hi, > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter > > wrote: > >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > >> wrote: > >>> +/* The chip this is compatible with. > >>> + * > >>> + * If compression is disabled, use > >>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > >>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ > >>> + * > >>> + * With compression enabled please use the exact chip. > >>> + * > >>> + * TODO: Do some generations share DCC format? > >>> + */ > >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > >> Do you really need all the combinations here of DCC + gpu gen + > >> tiling > >> details? When we had the entire discussion with nvidia folks they > >> eventually agreed that they don't need the massive pile with every > >> possible combination. Do you really plan to share all these > >> different > >> things? > >> > >> Note that e.g. on i915 we spec some of the tiling depending upon > >> buffer size and buffer format (because that's how the hw works), > >> not > >> using explicit modifier flags for everything. > > Right. The conclusion, after people went through and started sorting > > out the kinds of formats for which they would _actually_ export real > > colour buffers for, that most vendors definitely have fewer than > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > possible formats to represent, very likely fewer than > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, > > probably > > fewer than 72,057,594,037,927,936 formats, and even still generally > > fewer than 281,474,976,710,656 if you want to be generous and leave > > 8 > > bits of the 56 available. > The problem here is that at least for some parameters we actually > don't know > which formats are actually used. > > The following are not real world examples, but just to explain the > general > problem. > > The memory configuration for example can be not ASIC specific, but > rather > determined by whoever took the ASIC and glued it together with VRAM > on a > board. It is not likely that somebody puts all the VRAM chips on one > channel, but it is still perfectly possible. > > Same is true for things like harvesting, e.g. of 16 channels halve > of them > could be bad and we need to know which to actually use. > >>> For my understanding: This leaks outside the chip when sharing > >>> buffers? > >>> All the information you only need locally to a given amdgpu instance > >>> don't really need to be encoded in modifiers. > >>> > >>> Pointers to code where this is all decided (kernel and radeonsi would > >>> be > >>> good starters I guess) would be really good here. > >> I extracted the information on which bits are relevant mostly from the > >> AddrFromCoord functions in addrlib in mesa: > >> > >> for macro-tiles: > >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > >> > >> for micro-tiles (or the micro-tiles in macro-tiles): > >> > >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > > So this is the decoding thing. How many of these actually exist, even > > when > > taking all the other information into account? > > > > E.g. given a platform + memory config (seems needed) + drm_fourcc + > > stride > > + height + width, how much of all these bits do you actually still > > freely > > pick? > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, > depth, display-rotated) + whether to use compression, everything else > is fixed given those option, the properties of
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
Am 04.09.2018 um 18:37 schrieb Daniel Vetter: On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen wrote: On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter wrote: On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen wrote: On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter wrote: On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: Am 04.09.2018 um 12:15 schrieb Daniel Stone: Hi, On Tue, 4 Sep 2018 at 11:05, Daniel Vetter wrote: On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen wrote: +/* The chip this is compatible with. + * + * If compression is disabled, use + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 + * - AMDGPU_CHIP_VEGA10 for GFX9+ + * + * With compression enabled please use the exact chip. + * + * TODO: Do some generations share DCC format? + */ +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff Do you really need all the combinations here of DCC + gpu gen + tiling details? When we had the entire discussion with nvidia folks they eventually agreed that they don't need the massive pile with every possible combination. Do you really plan to share all these different things? Note that e.g. on i915 we spec some of the tiling depending upon buffer size and buffer format (because that's how the hw works), not using explicit modifier flags for everything. Right. The conclusion, after people went through and started sorting out the kinds of formats for which they would _actually_ export real colour buffers for, that most vendors definitely have fewer than 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possible formats to represent, very likely fewer than 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably fewer than 72,057,594,037,927,936 formats, and even still generally fewer than 281,474,976,710,656 if you want to be generous and leave 8 bits of the 56 available. The problem here is that at least for some parameters we actually don't know which formats are actually used. The following are not real world examples, but just to explain the general problem. The memory configuration for example can be not ASIC specific, but rather determined by whoever took the ASIC and glued it together with VRAM on a board. It is not likely that somebody puts all the VRAM chips on one channel, but it is still perfectly possible. Same is true for things like harvesting, e.g. of 16 channels halve of them could be bad and we need to know which to actually use. For my understanding: This leaks outside the chip when sharing buffers? All the information you only need locally to a given amdgpu instance don't really need to be encoded in modifiers. Pointers to code where this is all decided (kernel and radeonsi would be good starters I guess) would be really good here. I extracted the information on which bits are relevant mostly from the AddrFromCoord functions in addrlib in mesa: for macro-tiles: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 for micro-tiles (or the micro-tiles in macro-tiles): https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 So this is the decoding thing. How many of these actually exist, even when taking all the other information into account? E.g. given a platform + memory config (seems needed) + drm_fourcc + stride + height + width, how much of all these bits do you actually still freely pick? Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, depth, display-rotated) + whether to use compression, everything else is fixed given those option, the properties of the chip and the format. It might be that all the things you need to know from the memory config don't encode smaller than the macro/micro/whatever else stuff. But that's kinda the angle that we looked at this for everyone else. E.g. for multi-plane stuff, if everyone picks the same config for the 2nd/3rd plane, then you don't actually need to encode that. It just becomes part of the implied stuff in the modifier. The problem is some GPUs are compatible for say 8-bpp images, but not for 32-bpp surfaces. e.g. lets look at the following table showing the current configuration for all GFX6-GFX8 GPU: format: (bank width, bank height, macro tile aspect, num banks) for 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) ADDR_SURF_P8_32x32_8x16 bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 hawaii: (1,
Re: two KASANs in TTM logic
Sure: d2917f399e0b250f47d07da551a335843a24f835 is the first bad commit commit d2917f399e0b250f47d07da551a335843a24f835 Author: Christian König Date: Thu Aug 30 10:04:53 2018 +0200 drm/amdgpu: fix "use bulk moves for efficient VM LRU handling" v2 First step to fix the LRU corruption, we accidentially tried to move things on the LRU after dropping the lock. Signed-off-by: Christian König Tested-by: Michel Dänzer :04 04 ed5be1ad4da129c4154b2b43acf7ef349a470700 0008c4e2fb56512f41559618dd474c916fc09a37 M drivers The commit before that I can run xonotic-glx and piglit on my Carrizo without a KASAN. Tom On 09/04/2018 10:05 AM, Christian König wrote: The first one should already be fixed. Not sure where the second comes from. Can you narrow that down further? Christian. Am 04.09.2018 um 15:46 schrieb Tom St Denis: First is caused by this commit while running a GL heavy application. d78c1fa0c9f815fe951fd57001acca3d35262a17 is the first bad commit commit d78c1fa0c9f815fe951fd57001acca3d35262a17 Author: Michel Dänzer Date: Wed Aug 29 11:59:38 2018 +0200 Revert "drm/amdgpu: move PD/PT bos on LRU again" This reverts commit 31625ccae4464b61ec8cdb9740df848bbc857a5b. It triggered various badness on my development machine when running the piglit gpu profile with radeonsi on Bonaire, looks like memory corruption due to insufficiently protected list manipulations. Signed-off-by: Michel Dänzer Signed-off-by: Alex Deucher :04 04 b7169f0cf0c7decec631751a9896a92badb67f9d 42ea58f43199d26fc0c7ddcc655e6d0964b81817 M drivers The second is caused by something between that and the tip of the 4.19-rc1 amd-staging-drm-next (I haven't pinned it down yet) while loading GNOME. Tom ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen wrote: > On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter wrote: >> >> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen >> wrote: >> > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter wrote: >> >> >> >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >> >> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: >> >> > > >> >> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >> >> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: >> >> > > > > Hi, >> >> > > > > >> >> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter >> >> > > > > wrote: >> >> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen >> >> > > > > > wrote: >> >> > > > > > > +/* The chip this is compatible with. >> >> > > > > > > + * >> >> > > > > > > + * If compression is disabled, use >> >> > > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >> >> > > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ >> >> > > > > > > + * >> >> > > > > > > + * With compression enabled please use the exact chip. >> >> > > > > > > + * >> >> > > > > > > + * TODO: Do some generations share DCC format? >> >> > > > > > > + */ >> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >> >> > > > > > Do you really need all the combinations here of DCC + gpu gen + >> >> > > > > > tiling >> >> > > > > > details? When we had the entire discussion with nvidia folks >> >> > > > > > they >> >> > > > > > eventually agreed that they don't need the massive pile with >> >> > > > > > every >> >> > > > > > possible combination. Do you really plan to share all these >> >> > > > > > different >> >> > > > > > things? >> >> > > > > > >> >> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon >> >> > > > > > buffer size and buffer format (because that's how the hw >> >> > > > > > works), not >> >> > > > > > using explicit modifier flags for everything. >> >> > > > > Right. The conclusion, after people went through and started >> >> > > > > sorting >> >> > > > > out the kinds of formats for which they would _actually_ export >> >> > > > > real >> >> > > > > colour buffers for, that most vendors definitely have fewer than >> >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> >> > > > > possible formats to represent, very likely fewer than >> >> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, >> >> > > > > probably >> >> > > > > fewer than 72,057,594,037,927,936 formats, and even still >> >> > > > > generally >> >> > > > > fewer than 281,474,976,710,656 if you want to be generous and >> >> > > > > leave 8 >> >> > > > > bits of the 56 available. >> >> > > > >> >> > > > The problem here is that at least for some parameters we actually >> >> > > > don't know >> >> > > > which formats are actually used. >> >> > > > >> >> > > > The following are not real world examples, but just to explain the >> >> > > > general >> >> > > > problem. >> >> > > > >> >> > > > The memory configuration for example can be not ASIC specific, but >> >> > > > rather >> >> > > > determined by whoever took the ASIC and glued it together with VRAM >> >> > > > on a >> >> > > > board. It is not likely that somebody puts all the VRAM chips on one >> >> > > > channel, but it is still perfectly possible. >> >> > > > >> >> > > > Same is true for things like harvesting, e.g. of 16 channels halve >> >> > > > of them >> >> > > > could be bad and we need to know which to actually use. >> >> > > >> >> > > For my understanding: This leaks outside the chip when sharing >> >> > > buffers? >> >> > > All the information you only need locally to a given amdgpu instance >> >> > > don't really need to be encoded in modifiers. >> >> > > >> >> > > Pointers to code where this is all decided (kernel and radeonsi would >> >> > > be >> >> > > good starters I guess) would be really good here. >> >> > >> >> > I extracted the information on which bits are relevant mostly from the >> >> > AddrFromCoord functions in addrlib in mesa: >> >> > >> >> > for macro-tiles: >> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >> >> > >> >> > for micro-tiles (or the micro-tiles in macro-tiles): >> >> > >> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >> >> >> >> So this is the decoding thing. How many of these actually exist, even when >> >> taking all the other information into account? >> >> >> >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >> >> + height + width, how much of all these bits do you actually still freely >> >> pick? >> > >> > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, >> > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, >> > depth, display-rotated) +
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter wrote: > > On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > wrote: > > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter wrote: > >> > >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > >> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: > >> > > > >> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > >> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > >> > > > > Hi, > >> > > > > > >> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter > >> > > > > wrote: > >> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > >> > > > > > wrote: > >> > > > > > > +/* The chip this is compatible with. > >> > > > > > > + * > >> > > > > > > + * If compression is disabled, use > >> > > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > >> > > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > >> > > > > > > + * > >> > > > > > > + * With compression enabled please use the exact chip. > >> > > > > > > + * > >> > > > > > > + * TODO: Do some generations share DCC format? > >> > > > > > > + */ > >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > >> > > > > > Do you really need all the combinations here of DCC + gpu gen + > >> > > > > > tiling > >> > > > > > details? When we had the entire discussion with nvidia folks they > >> > > > > > eventually agreed that they don't need the massive pile with > >> > > > > > every > >> > > > > > possible combination. Do you really plan to share all these > >> > > > > > different > >> > > > > > things? > >> > > > > > > >> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon > >> > > > > > buffer size and buffer format (because that's how the hw works), > >> > > > > > not > >> > > > > > using explicit modifier flags for everything. > >> > > > > Right. The conclusion, after people went through and started > >> > > > > sorting > >> > > > > out the kinds of formats for which they would _actually_ export > >> > > > > real > >> > > > > colour buffers for, that most vendors definitely have fewer than > >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > >> > > > > possible formats to represent, very likely fewer than > >> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, > >> > > > > probably > >> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally > >> > > > > fewer than 281,474,976,710,656 if you want to be generous and > >> > > > > leave 8 > >> > > > > bits of the 56 available. > >> > > > > >> > > > The problem here is that at least for some parameters we actually > >> > > > don't know > >> > > > which formats are actually used. > >> > > > > >> > > > The following are not real world examples, but just to explain the > >> > > > general > >> > > > problem. > >> > > > > >> > > > The memory configuration for example can be not ASIC specific, but > >> > > > rather > >> > > > determined by whoever took the ASIC and glued it together with VRAM > >> > > > on a > >> > > > board. It is not likely that somebody puts all the VRAM chips on one > >> > > > channel, but it is still perfectly possible. > >> > > > > >> > > > Same is true for things like harvesting, e.g. of 16 channels halve > >> > > > of them > >> > > > could be bad and we need to know which to actually use. > >> > > > >> > > For my understanding: This leaks outside the chip when sharing buffers? > >> > > All the information you only need locally to a given amdgpu instance > >> > > don't really need to be encoded in modifiers. > >> > > > >> > > Pointers to code where this is all decided (kernel and radeonsi would > >> > > be > >> > > good starters I guess) would be really good here. > >> > > >> > I extracted the information on which bits are relevant mostly from the > >> > AddrFromCoord functions in addrlib in mesa: > >> > > >> > for macro-tiles: > >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > >> > > >> > for micro-tiles (or the micro-tiles in macro-tiles): > >> > > >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > >> > >> So this is the decoding thing. How many of these actually exist, even when > >> taking all the other information into account? > >> > >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride > >> + height + width, how much of all these bits do you actually still freely > >> pick? > > > > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, > > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, > > depth, display-rotated) + whether to use compression, everything else > > is fixed given those option, the properties of the chip and the > > format. > > > > > >> > >> It might be that all the things you need to know from the
Re: [PATCH v3 04/13] fbdev: add remove_conflicting_pci_framebuffers()
Hi Michał, I love your patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.19-rc2 next-20180831] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Micha-Miros-aw/remove_conflicting_framebuffers-cleanup/20180903-094322 reproduce: make htmldocs :: branch date: 5 hours ago :: commit date: 5 hours ago All warnings (new ones prefixed by >>): WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org) include/linux/srcu.h:175: warning: Function parameter or member 'p' not described in 'srcu_dereference_notrace' include/linux/srcu.h:175: warning: Function parameter or member 'sp' not described in 'srcu_dereference_notrace' include/linux/gfp.h:1: warning: no structured comments found include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.ibss' not described in 'wireless_dev' include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.connect' not described in 'wireless_dev' include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.keys' not described in 'wireless_dev' include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.ie' not described in 'wireless_dev' include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.ie_len' not described in 'wireless_dev' include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.bssid' not described in 'wireless_dev' include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.ssid' not described in 'wireless_dev' include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.default_key' not described in 'wireless_dev' include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.default_mgmt_key' not described in 'wireless_dev' include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.prev_bssid_valid' not described in 'wireless_dev' include/net/mac80211.h:2328: warning: Function parameter or member 'radiotap_timestamp.units_pos' not described in 'ieee80211_hw' include/net/mac80211.h:2328: warning: Function parameter or member 'radiotap_timestamp.accuracy' not described in 'ieee80211_hw' include/net/mac80211.h:977: warning: Function parameter or member 'control.rates' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'control.rts_cts_rate_idx' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'control.use_rts' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'control.use_cts_prot' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'control.short_preamble' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'control.skip_table' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'control.jiffies' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'control.vif' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'control.hw_key' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'control.flags' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'control.enqueue_time' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'ack' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'ack.cookie' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'status.rates' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'status.ack_signal' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'status.ampdu_ack_len' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'status.ampdu_len' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'status.antenna' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'status.tx_time' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'status.is_valid_ack_signal' not described in 'ieee80211_tx_info' include/net/mac80211.h:977: warning: Function parameter or member 'status.status_driver_data' not described
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen wrote: > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter wrote: >> >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: >> > > >> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: >> > > > > Hi, >> > > > > >> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter >> > > > > wrote: >> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen >> > > > > > wrote: >> > > > > > > +/* The chip this is compatible with. >> > > > > > > + * >> > > > > > > + * If compression is disabled, use >> > > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >> > > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ >> > > > > > > + * >> > > > > > > + * With compression enabled please use the exact chip. >> > > > > > > + * >> > > > > > > + * TODO: Do some generations share DCC format? >> > > > > > > + */ >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >> > > > > > Do you really need all the combinations here of DCC + gpu gen + >> > > > > > tiling >> > > > > > details? When we had the entire discussion with nvidia folks they >> > > > > > eventually agreed that they don't need the massive pile with every >> > > > > > possible combination. Do you really plan to share all these >> > > > > > different >> > > > > > things? >> > > > > > >> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon >> > > > > > buffer size and buffer format (because that's how the hw works), >> > > > > > not >> > > > > > using explicit modifier flags for everything. >> > > > > Right. The conclusion, after people went through and started sorting >> > > > > out the kinds of formats for which they would _actually_ export real >> > > > > colour buffers for, that most vendors definitely have fewer than >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> > > > > possible formats to represent, very likely fewer than >> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally >> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 >> > > > > bits of the 56 available. >> > > > >> > > > The problem here is that at least for some parameters we actually >> > > > don't know >> > > > which formats are actually used. >> > > > >> > > > The following are not real world examples, but just to explain the >> > > > general >> > > > problem. >> > > > >> > > > The memory configuration for example can be not ASIC specific, but >> > > > rather >> > > > determined by whoever took the ASIC and glued it together with VRAM on >> > > > a >> > > > board. It is not likely that somebody puts all the VRAM chips on one >> > > > channel, but it is still perfectly possible. >> > > > >> > > > Same is true for things like harvesting, e.g. of 16 channels halve of >> > > > them >> > > > could be bad and we need to know which to actually use. >> > > >> > > For my understanding: This leaks outside the chip when sharing buffers? >> > > All the information you only need locally to a given amdgpu instance >> > > don't really need to be encoded in modifiers. >> > > >> > > Pointers to code where this is all decided (kernel and radeonsi would be >> > > good starters I guess) would be really good here. >> > >> > I extracted the information on which bits are relevant mostly from the >> > AddrFromCoord functions in addrlib in mesa: >> > >> > for macro-tiles: >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >> > >> > for micro-tiles (or the micro-tiles in macro-tiles): >> > >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >> >> So this is the decoding thing. How many of these actually exist, even when >> taking all the other information into account? >> >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >> + height + width, how much of all these bits do you actually still freely >> pick? > > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, > depth, display-rotated) + whether to use compression, everything else > is fixed given those option, the properties of the chip and the > format. > > >> >> It might be that all the things you need to know from the memory config >> don't encode smaller than the macro/micro/whatever else stuff. But that's >> kinda the angle that we looked at this for everyone else. >> >> E.g. for multi-plane stuff, if everyone picks the same config for the >> 2nd/3rd plane, then you don't actually need to encode that. It just >> becomes part of the implied stuff in the
[bug report] drm/radeon: use pcie functions for link width
Hello Alex Deucher, The patch 5f152a572c10: "drm/radeon: use pcie functions for link width" from Jun 25, 2018, leads to the following static checker warning: drivers/gpu/drm/radeon/cik.c:9646 cik_pcie_gen3_enable() warn: dead code because of 'speed_cap == 21' and 'speed_cap != 21' drivers/gpu/drm/radeon/cik.c 9499 static void cik_pcie_gen3_enable(struct radeon_device *rdev) 9500 { 9501 struct pci_dev *root = rdev->pdev->bus->self; 9502 enum pci_bus_speed speed_cap; 9503 int bridge_pos, gpu_pos; 9504 u32 speed_cntl, current_data_rate; 9505 int i; 9506 u16 tmp16; 9507 9508 if (pci_is_root_bus(rdev->pdev->bus)) 9509 return; 9510 9511 if (radeon_pcie_gen2 == 0) 9512 return; 9513 9514 if (rdev->flags & RADEON_IS_IGP) 9515 return; 9516 9517 if (!(rdev->flags & RADEON_IS_PCIE)) 9518 return; 9519 9520 speed_cap = pcie_get_speed_cap(root); 9521 if (speed_cap == PCI_SPEED_UNKNOWN) 9522 return; 9523 9524 if ((speed_cap != PCIE_SPEED_8_0GT) && 9525 (speed_cap != PCIE_SPEED_5_0GT)) ^ We added this condition 9526 return; 9527 9528 speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL); 9529 current_data_rate = (speed_cntl & LC_CURRENT_DATA_RATE_MASK) >> 9530 LC_CURRENT_DATA_RATE_SHIFT; 9531 if (speed_cap == PCIE_SPEED_8_0GT) { 9532 if (current_data_rate == 2) { 9533 DRM_INFO("PCIE gen 3 link speeds already enabled\n"); 9534 return; 9535 } 9536 DRM_INFO("enabling PCIE gen 3 link speeds, disable with radeon.pcie_gen2=0\n"); 9537 } else if (speed_cap == PCIE_SPEED_5_0GT) { 9538 if (current_data_rate == 1) { 9539 DRM_INFO("PCIE gen 2 link speeds already enabled\n"); 9540 return; 9541 } 9542 DRM_INFO("enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0\n"); 9543 } 9544 9545 bridge_pos = pci_pcie_cap(root); 9546 if (!bridge_pos) 9547 return; 9548 9549 gpu_pos = pci_pcie_cap(rdev->pdev); 9550 if (!gpu_pos) 9551 return; 9552 9553 if (speed_cap == PCIE_SPEED_8_0GT) { 9554 /* re-try equalization if gen3 is not already enabled */ 9555 if (current_data_rate != 2) { 9556 u16 bridge_cfg, gpu_cfg; 9557 u16 bridge_cfg2, gpu_cfg2; 9558 u32 max_lw, current_lw, tmp; 9559 9560 pci_read_config_word(root, bridge_pos + PCI_EXP_LNKCTL, _cfg); 9561 pci_read_config_word(rdev->pdev, gpu_pos + PCI_EXP_LNKCTL, _cfg); 9562 9563 tmp16 = bridge_cfg | PCI_EXP_LNKCTL_HAWD; 9564 pci_write_config_word(root, bridge_pos + PCI_EXP_LNKCTL, tmp16); 9565 9566 tmp16 = gpu_cfg | PCI_EXP_LNKCTL_HAWD; 9567 pci_write_config_word(rdev->pdev, gpu_pos + PCI_EXP_LNKCTL, tmp16); 9568 9569 tmp = RREG32_PCIE_PORT(PCIE_LC_STATUS1); 9570 max_lw = (tmp & LC_DETECTED_LINK_WIDTH_MASK) >> LC_DETECTED_LINK_WIDTH_SHIFT; 9571 current_lw = (tmp & LC_OPERATING_LINK_WIDTH_MASK) >> LC_OPERATING_LINK_WIDTH_SHIFT; 9572 9573 if (current_lw < max_lw) { 9574 tmp = RREG32_PCIE_PORT(PCIE_LC_LINK_WIDTH_CNTL); 9575 if (tmp & LC_RENEGOTIATION_SUPPORT) { 9576 tmp &= ~(LC_LINK_WIDTH_MASK | LC_UPCONFIGURE_DIS); 9577 tmp |= (max_lw << LC_LINK_WIDTH_SHIFT); 9578 tmp |= LC_UPCONFIGURE_SUPPORT | LC_RENEGOTIATE_EN | LC_RECONFIG_NOW; 9579 WREG32_PCIE_PORT(PCIE_LC_LINK_WIDTH_CNTL, tmp); 9580 } 9581 } 9582 9583 for (i = 0; i < 10; i++) { 9584 /* check status */ 9585 pci_read_config_word(rdev->pdev, gpu_pos + PCI_EXP_DEVSTA, ); 9586 if (tmp16 & PCI_EXP_DEVSTA_TRPND) 9587 break; 9588 9589
Re: two KASANs in TTM logic
The first one should already be fixed. Not sure where the second comes from. Can you narrow that down further? Christian. Am 04.09.2018 um 15:46 schrieb Tom St Denis: First is caused by this commit while running a GL heavy application. d78c1fa0c9f815fe951fd57001acca3d35262a17 is the first bad commit commit d78c1fa0c9f815fe951fd57001acca3d35262a17 Author: Michel Dänzer Date: Wed Aug 29 11:59:38 2018 +0200 Revert "drm/amdgpu: move PD/PT bos on LRU again" This reverts commit 31625ccae4464b61ec8cdb9740df848bbc857a5b. It triggered various badness on my development machine when running the piglit gpu profile with radeonsi on Bonaire, looks like memory corruption due to insufficiently protected list manipulations. Signed-off-by: Michel Dänzer Signed-off-by: Alex Deucher :04 04 b7169f0cf0c7decec631751a9896a92badb67f9d 42ea58f43199d26fc0c7ddcc655e6d0964b81817 M drivers The second is caused by something between that and the tip of the 4.19-rc1 amd-staging-drm-next (I haven't pinned it down yet) while loading GNOME. Tom ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter wrote: > > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: > > > > > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > > > > Hi, > > > > > > > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter > > > > > wrote: > > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > > > > > > wrote: > > > > > > > +/* The chip this is compatible with. > > > > > > > + * > > > > > > > + * If compression is disabled, use > > > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > > > > > > + * > > > > > > > + * With compression enabled please use the exact chip. > > > > > > > + * > > > > > > > + * TODO: Do some generations share DCC format? > > > > > > > + */ > > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > > > > > Do you really need all the combinations here of DCC + gpu gen + > > > > > > tiling > > > > > > details? When we had the entire discussion with nvidia folks they > > > > > > eventually agreed that they don't need the massive pile with every > > > > > > possible combination. Do you really plan to share all these > > > > > > different > > > > > > things? > > > > > > > > > > > > Note that e.g. on i915 we spec some of the tiling depending upon > > > > > > buffer size and buffer format (because that's how the hw works), not > > > > > > using explicit modifier flags for everything. > > > > > Right. The conclusion, after people went through and started sorting > > > > > out the kinds of formats for which they would _actually_ export real > > > > > colour buffers for, that most vendors definitely have fewer than > > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > > > possible formats to represent, very likely fewer than > > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > > > > fewer than 72,057,594,037,927,936 formats, and even still generally > > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > > > > bits of the 56 available. > > > > > > > > The problem here is that at least for some parameters we actually don't > > > > know > > > > which formats are actually used. > > > > > > > > The following are not real world examples, but just to explain the > > > > general > > > > problem. > > > > > > > > The memory configuration for example can be not ASIC specific, but > > > > rather > > > > determined by whoever took the ASIC and glued it together with VRAM on a > > > > board. It is not likely that somebody puts all the VRAM chips on one > > > > channel, but it is still perfectly possible. > > > > > > > > Same is true for things like harvesting, e.g. of 16 channels halve of > > > > them > > > > could be bad and we need to know which to actually use. > > > > > > For my understanding: This leaks outside the chip when sharing buffers? > > > All the information you only need locally to a given amdgpu instance > > > don't really need to be encoded in modifiers. > > > > > > Pointers to code where this is all decided (kernel and radeonsi would be > > > good starters I guess) would be really good here. > > > > I extracted the information on which bits are relevant mostly from the > > AddrFromCoord functions in addrlib in mesa: > > > > for macro-tiles: > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > > > > for micro-tiles (or the micro-tiles in macro-tiles): > > > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > > So this is the decoding thing. How many of these actually exist, even when > taking all the other information into account? > > E.g. given a platform + memory config (seems needed) + drm_fourcc + stride > + height + width, how much of all these bits do you actually still freely > pick? Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, depth, display-rotated) + whether to use compression, everything else is fixed given those option, the properties of the chip and the format. > > It might be that all the things you need to know from the memory config > don't encode smaller than the macro/micro/whatever else stuff. But that's > kinda the angle that we looked at this for everyone else. > > E.g. for multi-plane stuff, if everyone picks the same config for the > 2nd/3rd plane, then you don't actually need to encode that. It just > becomes part of the implied stuff in the modifier. The problem is some GPUs are compatible for say 8-bpp images, but not for 32-bpp surfaces. e.g. lets look at the following table showing the current configuration for all GFX6-GFX8
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
Am 04.09.2018 um 15:17 schrieb Daniel Vetter: On Tue, Sep 4, 2018 at 3:12 PM, Christian König wrote: Am 04.09.2018 um 15:03 schrieb Daniel Vetter: On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: Am 04.09.2018 um 12:15 schrieb Daniel Stone: Hi, On Tue, 4 Sep 2018 at 11:05, Daniel Vetter wrote: On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen wrote: +/* The chip this is compatible with. + * + * If compression is disabled, use + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 + * - AMDGPU_CHIP_VEGA10 for GFX9+ + * + * With compression enabled please use the exact chip. + * + * TODO: Do some generations share DCC format? + */ +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff Do you really need all the combinations here of DCC + gpu gen + tiling details? When we had the entire discussion with nvidia folks they eventually agreed that they don't need the massive pile with every possible combination. Do you really plan to share all these different things? Note that e.g. on i915 we spec some of the tiling depending upon buffer size and buffer format (because that's how the hw works), not using explicit modifier flags for everything. Right. The conclusion, after people went through and started sorting out the kinds of formats for which they would _actually_ export real colour buffers for, that most vendors definitely have fewer than 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possible formats to represent, very likely fewer than 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably fewer than 72,057,594,037,927,936 formats, and even still generally fewer than 281,474,976,710,656 if you want to be generous and leave 8 bits of the 56 available. The problem here is that at least for some parameters we actually don't know which formats are actually used. The following are not real world examples, but just to explain the general problem. The memory configuration for example can be not ASIC specific, but rather determined by whoever took the ASIC and glued it together with VRAM on a board. It is not likely that somebody puts all the VRAM chips on one channel, but it is still perfectly possible. Same is true for things like harvesting, e.g. of 16 channels halve of them could be bad and we need to know which to actually use. For my understanding: This leaks outside the chip when sharing buffers? All the information you only need locally to a given amdgpu instance don't really need to be encoded in modifiers. Pointers to code where this is all decided (kernel and radeonsi would be good starters I guess) would be really good here. I extracted the information on which bits are relevant mostly from the AddrFromCoord functions in addrlib in mesa: for macro-tiles: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 for micro-tiles (or the micro-tiles in macro-tiles): https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 So this is the decoding thing. How many of these actually exist, even when taking all the other information into account? E.g. given a platform + memory config (seems needed) + drm_fourcc + stride + height + width, how much of all these bits do you actually still freely pick? It might be that all the things you need to know from the memory config don't encode smaller than the macro/micro/whatever else stuff. But that's kinda the angle that we looked at this for everyone else. E.g. for multi-plane stuff, if everyone picks the same config for the 2nd/3rd plane, then you don't actually need to encode that. It just becomes part of the implied stuff in the modifier. The pipe/bank configuration for the 2nd/3rd plane is the same, but the tile configuration is potentially different. We can make an educated guess, but I don't think that this is universal applicable as well. Why? The source code for amdgpu.ko, mesa, radv, amdvk is all open. Should be possible to figure out what's actually used (on buffers you want to share), and then figuring out how to best encode the information you need to reconstruct all the information. You might need to occasionally respin that if best practices for how to lay out buffers changes after you started upstreaming for a given generation. But ime hw people have pretty strong opinions on how to best tile things. You even know what's coming down the internal pipeline. Or do you mean something else with "educated guess"? I mean something else. It can depend on the use case what a buffer object ends up with. In other words when you want to transcode a video you could use a different layout as when you want to display the resulting image. So essentially we would need to encode the use case and
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 3:12 PM, Christian König wrote: > Am 04.09.2018 um 15:03 schrieb Daniel Vetter: >> >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >>> >>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: >> >> Hi, >> >> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter >> wrote: >>> >>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen >>> wrote: +/* The chip this is compatible with. + * + * If compression is disabled, use + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 + * - AMDGPU_CHIP_VEGA10 for GFX9+ + * + * With compression enabled please use the exact chip. + * + * TODO: Do some generations share DCC format? + */ +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >>> >>> Do you really need all the combinations here of DCC + gpu gen + >>> tiling >>> details? When we had the entire discussion with nvidia folks they >>> eventually agreed that they don't need the massive pile with every >>> possible combination. Do you really plan to share all these different >>> things? >>> >>> Note that e.g. on i915 we spec some of the tiling depending upon >>> buffer size and buffer format (because that's how the hw works), not >>> using explicit modifier flags for everything. >> >> Right. The conclusion, after people went through and started sorting >> out the kinds of formats for which they would _actually_ export real >> colour buffers for, that most vendors definitely have fewer than >> >> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> possible formats to represent, very likely fewer than >> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >> fewer than 72,057,594,037,927,936 formats, and even still generally >> fewer than 281,474,976,710,656 if you want to be generous and leave 8 >> bits of the 56 available. > > The problem here is that at least for some parameters we actually don't > know > which formats are actually used. > > The following are not real world examples, but just to explain the > general > problem. > > The memory configuration for example can be not ASIC specific, but > rather > determined by whoever took the ASIC and glued it together with VRAM on > a > board. It is not likely that somebody puts all the VRAM chips on one > channel, but it is still perfectly possible. > > Same is true for things like harvesting, e.g. of 16 channels halve of > them > could be bad and we need to know which to actually use. For my understanding: This leaks outside the chip when sharing buffers? All the information you only need locally to a given amdgpu instance don't really need to be encoded in modifiers. Pointers to code where this is all decided (kernel and radeonsi would be good starters I guess) would be really good here. >>> >>> I extracted the information on which bits are relevant mostly from the >>> AddrFromCoord functions in addrlib in mesa: >>> >>> for macro-tiles: >>> >>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >>> >>> for micro-tiles (or the micro-tiles in macro-tiles): >>> >>> >>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >> >> So this is the decoding thing. How many of these actually exist, even when >> taking all the other information into account? >> >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >> + height + width, how much of all these bits do you actually still freely >> pick? >> >> It might be that all the things you need to know from the memory config >> don't encode smaller than the macro/micro/whatever else stuff. But that's >> kinda the angle that we looked at this for everyone else. >> >> E.g. for multi-plane stuff, if everyone picks the same config for the >> 2nd/3rd plane, then you don't actually need to encode that. It just >> becomes part of the implied stuff in the modifier. > > > The pipe/bank configuration for the 2nd/3rd plane is the same, but the tile > configuration is potentially different. > > We can make an educated guess, but I don't think that this is universal > applicable as well. Why? The source code for amdgpu.ko, mesa, radv, amdvk is all open. Should be possible to figure out what's actually used (on buffers you want to share), and then figuring out how to best encode the information you need to reconstruct all the information. You might need to occasionally respin that if best practices for
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
Am 04.09.2018 um 15:03 schrieb Daniel Vetter: On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: Am 04.09.2018 um 12:15 schrieb Daniel Stone: Hi, On Tue, 4 Sep 2018 at 11:05, Daniel Vetter wrote: On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen wrote: +/* The chip this is compatible with. + * + * If compression is disabled, use + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 + * - AMDGPU_CHIP_VEGA10 for GFX9+ + * + * With compression enabled please use the exact chip. + * + * TODO: Do some generations share DCC format? + */ +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff Do you really need all the combinations here of DCC + gpu gen + tiling details? When we had the entire discussion with nvidia folks they eventually agreed that they don't need the massive pile with every possible combination. Do you really plan to share all these different things? Note that e.g. on i915 we spec some of the tiling depending upon buffer size and buffer format (because that's how the hw works), not using explicit modifier flags for everything. Right. The conclusion, after people went through and started sorting out the kinds of formats for which they would _actually_ export real colour buffers for, that most vendors definitely have fewer than 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possible formats to represent, very likely fewer than 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably fewer than 72,057,594,037,927,936 formats, and even still generally fewer than 281,474,976,710,656 if you want to be generous and leave 8 bits of the 56 available. The problem here is that at least for some parameters we actually don't know which formats are actually used. The following are not real world examples, but just to explain the general problem. The memory configuration for example can be not ASIC specific, but rather determined by whoever took the ASIC and glued it together with VRAM on a board. It is not likely that somebody puts all the VRAM chips on one channel, but it is still perfectly possible. Same is true for things like harvesting, e.g. of 16 channels halve of them could be bad and we need to know which to actually use. For my understanding: This leaks outside the chip when sharing buffers? All the information you only need locally to a given amdgpu instance don't really need to be encoded in modifiers. Pointers to code where this is all decided (kernel and radeonsi would be good starters I guess) would be really good here. I extracted the information on which bits are relevant mostly from the AddrFromCoord functions in addrlib in mesa: for macro-tiles: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 for micro-tiles (or the micro-tiles in macro-tiles): https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 So this is the decoding thing. How many of these actually exist, even when taking all the other information into account? E.g. given a platform + memory config (seems needed) + drm_fourcc + stride + height + width, how much of all these bits do you actually still freely pick? It might be that all the things you need to know from the memory config don't encode smaller than the macro/micro/whatever else stuff. But that's kinda the angle that we looked at this for everyone else. E.g. for multi-plane stuff, if everyone picks the same config for the 2nd/3rd plane, then you don't actually need to encode that. It just becomes part of the implied stuff in the modifier. The pipe/bank configuration for the 2nd/3rd plane is the same, but the tile configuration is potentially different. We can make an educated guess, but I don't think that this is universal applicable as well. Regards, Christian. -Daniel -Daniel Regards, Christian. If you do use 256 bits in order to represent 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 modifiers per format, userspace would start hitting OOM pretty quickly as it attempted to enumerate and negotiate acceptable modifiers. Either that or we need to replace the fixed 64-bit modifier tokens with some kind of eBPF script. Cheers, Daniel ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
Am 04.09.2018 um 14:26 schrieb Daniel Vetter: On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: Am 04.09.2018 um 12:15 schrieb Daniel Stone: Hi, On Tue, 4 Sep 2018 at 11:05, Daniel Vetter wrote: On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen wrote: +/* The chip this is compatible with. + * + * If compression is disabled, use + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 + * - AMDGPU_CHIP_VEGA10 for GFX9+ + * + * With compression enabled please use the exact chip. + * + * TODO: Do some generations share DCC format? + */ +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff Do you really need all the combinations here of DCC + gpu gen + tiling details? When we had the entire discussion with nvidia folks they eventually agreed that they don't need the massive pile with every possible combination. Do you really plan to share all these different things? Note that e.g. on i915 we spec some of the tiling depending upon buffer size and buffer format (because that's how the hw works), not using explicit modifier flags for everything. Right. The conclusion, after people went through and started sorting out the kinds of formats for which they would _actually_ export real colour buffers for, that most vendors definitely have fewer than 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possible formats to represent, very likely fewer than 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably fewer than 72,057,594,037,927,936 formats, and even still generally fewer than 281,474,976,710,656 if you want to be generous and leave 8 bits of the 56 available. The problem here is that at least for some parameters we actually don't know which formats are actually used. The following are not real world examples, but just to explain the general problem. The memory configuration for example can be not ASIC specific, but rather determined by whoever took the ASIC and glued it together with VRAM on a board. It is not likely that somebody puts all the VRAM chips on one channel, but it is still perfectly possible. Same is true for things like harvesting, e.g. of 16 channels halve of them could be bad and we need to know which to actually use. For my understanding: This leaks outside the chip when sharing buffers? All the information you only need locally to a given amdgpu instance don't really need to be encoded in modifiers. Yeah, that is certainly a possibility. The problem is that the identifiers are then not universal applicable. But if we can somehow say this format only applies to buffers from devices with hardware config X, then that would certainly simplify things a lot. Pointers to code where this is all decided (kernel and radeonsi would be good starters I guess) would be really good here. Well that is the problem, a good part of that stuff is rather deeply embedded in the firmware. The kernel driver just reads the hardware config from some registers and hands it of to userspace. Regards, Christian. -Daniel Regards, Christian. If you do use 256 bits in order to represent 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 modifiers per format, userspace would start hitting OOM pretty quickly as it attempted to enumerate and negotiate acceptable modifiers. Either that or we need to replace the fixed 64-bit modifier tokens with some kind of eBPF script. Cheers, Daniel ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: > > > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > > > Hi, > > > > > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter > > > > wrote: > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > > > > > wrote: > > > > > > +/* The chip this is compatible with. > > > > > > + * > > > > > > + * If compression is disabled, use > > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > > > > > + * > > > > > > + * With compression enabled please use the exact chip. > > > > > > + * > > > > > > + * TODO: Do some generations share DCC format? > > > > > > + */ > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling > > > > > details? When we had the entire discussion with nvidia folks they > > > > > eventually agreed that they don't need the massive pile with every > > > > > possible combination. Do you really plan to share all these different > > > > > things? > > > > > > > > > > Note that e.g. on i915 we spec some of the tiling depending upon > > > > > buffer size and buffer format (because that's how the hw works), not > > > > > using explicit modifier flags for everything. > > > > Right. The conclusion, after people went through and started sorting > > > > out the kinds of formats for which they would _actually_ export real > > > > colour buffers for, that most vendors definitely have fewer than > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > > possible formats to represent, very likely fewer than > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > > > fewer than 72,057,594,037,927,936 formats, and even still generally > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > > > bits of the 56 available. > > > > > > The problem here is that at least for some parameters we actually don't > > > know > > > which formats are actually used. > > > > > > The following are not real world examples, but just to explain the general > > > problem. > > > > > > The memory configuration for example can be not ASIC specific, but rather > > > determined by whoever took the ASIC and glued it together with VRAM on a > > > board. It is not likely that somebody puts all the VRAM chips on one > > > channel, but it is still perfectly possible. > > > > > > Same is true for things like harvesting, e.g. of 16 channels halve of them > > > could be bad and we need to know which to actually use. > > > > For my understanding: This leaks outside the chip when sharing buffers? > > All the information you only need locally to a given amdgpu instance > > don't really need to be encoded in modifiers. > > > > Pointers to code where this is all decided (kernel and radeonsi would be > > good starters I guess) would be really good here. > > I extracted the information on which bits are relevant mostly from the > AddrFromCoord functions in addrlib in mesa: > > for macro-tiles: > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > > for micro-tiles (or the micro-tiles in macro-tiles): > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 So this is the decoding thing. How many of these actually exist, even when taking all the other information into account? E.g. given a platform + memory config (seems needed) + drm_fourcc + stride + height + width, how much of all these bits do you actually still freely pick? It might be that all the things you need to know from the memory config don't encode smaller than the macro/micro/whatever else stuff. But that's kinda the angle that we looked at this for everyone else. E.g. for multi-plane stuff, if everyone picks the same config for the 2nd/3rd plane, then you don't actually need to encode that. It just becomes part of the implied stuff in the modifier. -Daniel > > > -Daniel > > > > > > > > Regards, > > > Christian. > > > > > > > > > > > If you do use 256 bits in order to represent > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > > modifiers per format, userspace would start hitting OOM pretty quickly > > > > as it attempted to enumerate and negotiate acceptable modifiers. > > > > Either that or we need to replace the fixed 64-bit modifier tokens > > > > with some kind of eBPF script. > > > > > > > > Cheers, > > > > Daniel > > > > ___ > > > > dri-devel mailing list > > > > dri-de...@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
Am 04.09.2018 um 14:22 schrieb Daniel Stone: Hi, On Tue, 4 Sep 2018 at 11:44, Christian König wrote: Am 04.09.2018 um 12:15 schrieb Daniel Stone: Right. The conclusion, after people went through and started sorting out the kinds of formats for which they would _actually_ export real colour buffers for, that most vendors definitely have fewer than 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possible formats to represent, very likely fewer than 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably fewer than 72,057,594,037,927,936 formats, and even still generally fewer than 281,474,976,710,656 if you want to be generous and leave 8 bits of the 56 available. The problem here is that at least for some parameters we actually don't know which formats are actually used. The following are not real world examples, but just to explain the general problem. The memory configuration for example can be not ASIC specific, but rather determined by whoever took the ASIC and glued it together with VRAM on a board. It is not likely that somebody puts all the VRAM chips on one channel, but it is still perfectly possible. Same is true for things like harvesting, e.g. of 16 channels halve of them could be bad and we need to know which to actually use. Sure, that's fine, but I'm not sure why, for instance, 'half of the memory channels are bad and must be avoided' would be attached to a format in the same way that macrotile size/depth would be? Because it results in a different memory layout. Similarly, what happens if you encode a channel or lane blacklist mask into a modifier, and then you import a buffer with that modifier into a client API on another device. Does that also apply to the other device or only the source device? If the other device doesn't have the same hardware config it probably can't access that buffer. How then do you handle the capability query between them: is it relevant to device B attempting to import and sample from an image that device A only has a 128-bit memory bus because of SKU configuration? How does generic userspace look at a token which encodes all of this information and know that the buffer is interchangeable between devices? Well that is exactly the point where my understanding stops. No idea how the user space driver is handling this. Marek came up with the interop interface, he needs to explain the details. Regards, Christian. Cheers, Daniel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [Mesa-dev] [PATCH libdrm] Add basic CONTRIBUTING file
On Tue, Sep 04, 2018 at 10:02:40AM +0100, Eric Engestrom wrote: > On Tuesday, 2018-09-04 16:24:44 +1000, Dave Airlie wrote: > > On Mon, 3 Sep 2018 at 18:47, Daniel Vetter wrote: > > > > > > I picked up a bunch of the pieces from wayland's version: > > > > > > https://gitlab.freedesktop.org/wayland/wayland/blob/master/CONTRIBUTING.md > > > > > > The weston one is fairly similar. Then I rather massively trimmed it > > > down since in reality libdrm is a bit a dumping ground with very few > > > real rules. The commit rights and CoC sections I've copied verbatim > > > from igt respectively drm-misc. Weston/Wayland only differ in their > > > pick of how many patches you need (10 instead of 5). I think for > > > libdrm this is supremely relevant, since most everyone will get their > > > commit rights by contributing already to the kernel or mesa and having > > > commit rights there already. > > > > > > Anyway, I figured this is good to get the rules documented, even if > > > there's mostly not many rules. > > > > > > Note: This references maintainers in a MAINTAINERS file, which needs > > > to be created first. > > > > > > Note: With the gitlab migration the entire commit rights process is > > > still a bit up in the air. But gitlab commit rights and roles are > > > hierarchical, so we can do libdrm-only maintainer/commiter roles > > > ("Owner" and "Developer" in gitlab-speak). This should avoid > > > conflating libdrm roles with mesa roles, useful for those pushing to > > > libdrm as primarily kernel contributors. > > > > Fine with me, > > > > Acked-by: Dave Airlie > > > > Dave. > > I think this has gathered enough acks and rbs, you can just push it now > and if there's anything that should be adjusted we can do that as > a follow up :) And pushed. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
+everyone again On Tue, Sep 4, 2018 at 2:39 PM Bas Nieuwenhuizen wrote: > > On Tue, Sep 4, 2018 at 2:22 PM Daniel Stone wrote: > > > > Hi, > > > > On Tue, 4 Sep 2018 at 11:44, Christian König > > wrote: > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > > > Right. The conclusion, after people went through and started sorting > > > > out the kinds of formats for which they would _actually_ export real > > > > colour buffers for, that most vendors definitely have fewer than > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > > possible formats to represent, very likely fewer than > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > > > fewer than 72,057,594,037,927,936 formats, and even still generally > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > > > bits of the 56 available. > > > > > > The problem here is that at least for some parameters we actually don't > > > know which formats are actually used. > > > > > > The following are not real world examples, but just to explain the > > > general problem. > > > > > > The memory configuration for example can be not ASIC specific, but > > > rather determined by whoever took the ASIC and glued it together with > > > VRAM on a board. It is not likely that somebody puts all the VRAM chips > > > on one channel, but it is still perfectly possible. > > > > > > Same is true for things like harvesting, e.g. of 16 channels halve of > > > them could be bad and we need to know which to actually use. > > > > Sure, that's fine, but I'm not sure why, for instance, 'half of the > > memory channels are bad and must be avoided' would be attached to a > > format in the same way that macrotile size/depth would be? Similarly, > > what happens if you encode a channel or lane blacklist mask into a > > modifier, and then you import a buffer with that modifier into a > > client API on another device. Does that also apply to the other device > > or only the source device? > > > > How then do you handle the capability query between them: is it > > relevant to device B attempting to import and sample from an image > > that device A only has a 128-bit memory bus because of SKU > > configuration? How does generic userspace look at a token which > > encodes all of this information and know that the buffer is > > interchangeable between devices? > > So my idea here is to encode all the bits that determine format in a > way that as much as possible if it > results in the same layout it should get the same modifier(i.e. remove > useless info): > > https://lists.freedesktop.org/archives/dri-devel/2018-September/188418.html > > An example of the per-chip things is the PIPE_CONFIG, which is used > for address calculation in the macro-tiled mode: > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/siaddrlib.cpp#L666 > > two chips with different PIPE_CONFIG will have different texture > layouts, as the address swizzles are different. > > So it gets included in the modifier (the PIPE_CONFIG is pretty much > constant per chip but multiple chips might have the same PIPE_CONFIG). > > > > > Cheers, > > Daniel > > ___ > > dri-devel mailing list > > dri-de...@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter wrote: > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > > Hi, > > > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter wrote: > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > > > > wrote: > > > > > +/* The chip this is compatible with. > > > > > + * > > > > > + * If compression is disabled, use > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > > > > + * > > > > > + * With compression enabled please use the exact chip. > > > > > + * > > > > > + * TODO: Do some generations share DCC format? > > > > > + */ > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > > > Do you really need all the combinations here of DCC + gpu gen + tiling > > > > details? When we had the entire discussion with nvidia folks they > > > > eventually agreed that they don't need the massive pile with every > > > > possible combination. Do you really plan to share all these different > > > > things? > > > > > > > > Note that e.g. on i915 we spec some of the tiling depending upon > > > > buffer size and buffer format (because that's how the hw works), not > > > > using explicit modifier flags for everything. > > > Right. The conclusion, after people went through and started sorting > > > out the kinds of formats for which they would _actually_ export real > > > colour buffers for, that most vendors definitely have fewer than > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > possible formats to represent, very likely fewer than > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > > fewer than 72,057,594,037,927,936 formats, and even still generally > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > > bits of the 56 available. > > > > The problem here is that at least for some parameters we actually don't know > > which formats are actually used. > > > > The following are not real world examples, but just to explain the general > > problem. > > > > The memory configuration for example can be not ASIC specific, but rather > > determined by whoever took the ASIC and glued it together with VRAM on a > > board. It is not likely that somebody puts all the VRAM chips on one > > channel, but it is still perfectly possible. > > > > Same is true for things like harvesting, e.g. of 16 channels halve of them > > could be bad and we need to know which to actually use. > > For my understanding: This leaks outside the chip when sharing buffers? > All the information you only need locally to a given amdgpu instance > don't really need to be encoded in modifiers. > > Pointers to code where this is all decided (kernel and radeonsi would be > good starters I guess) would be really good here. I extracted the information on which bits are relevant mostly from the AddrFromCoord functions in addrlib in mesa: for macro-tiles: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 for micro-tiles (or the micro-tiles in macro-tiles): https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > -Daniel > > > > > Regards, > > Christian. > > > > > > > > If you do use 256 bits in order to represent > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > modifiers per format, userspace would start hitting OOM pretty quickly > > > as it attempted to enumerate and negotiate acceptable modifiers. > > > Either that or we need to replace the fixed 64-bit modifier tokens > > > with some kind of eBPF script. > > > > > > Cheers, > > > Daniel > > > ___ > > > dri-devel mailing list > > > dri-de...@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > Hi, > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter wrote: > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > > > wrote: > > > > +/* The chip this is compatible with. > > > > + * > > > > + * If compression is disabled, use > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > > > + * > > > > + * With compression enabled please use the exact chip. > > > > + * > > > > + * TODO: Do some generations share DCC format? > > > > + */ > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > > Do you really need all the combinations here of DCC + gpu gen + tiling > > > details? When we had the entire discussion with nvidia folks they > > > eventually agreed that they don't need the massive pile with every > > > possible combination. Do you really plan to share all these different > > > things? > > > > > > Note that e.g. on i915 we spec some of the tiling depending upon > > > buffer size and buffer format (because that's how the hw works), not > > > using explicit modifier flags for everything. > > Right. The conclusion, after people went through and started sorting > > out the kinds of formats for which they would _actually_ export real > > colour buffers for, that most vendors definitely have fewer than > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > possible formats to represent, very likely fewer than > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > fewer than 72,057,594,037,927,936 formats, and even still generally > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > bits of the 56 available. > > The problem here is that at least for some parameters we actually don't know > which formats are actually used. > > The following are not real world examples, but just to explain the general > problem. > > The memory configuration for example can be not ASIC specific, but rather > determined by whoever took the ASIC and glued it together with VRAM on a > board. It is not likely that somebody puts all the VRAM chips on one > channel, but it is still perfectly possible. > > Same is true for things like harvesting, e.g. of 16 channels halve of them > could be bad and we need to know which to actually use. For my understanding: This leaks outside the chip when sharing buffers? All the information you only need locally to a given amdgpu instance don't really need to be encoded in modifiers. Pointers to code where this is all decided (kernel and radeonsi would be good starters I guess) would be really good here. -Daniel > > Regards, > Christian. > > > > > If you do use 256 bits in order to represent > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > modifiers per format, userspace would start hitting OOM pretty quickly > > as it attempted to enumerate and negotiate acceptable modifiers. > > Either that or we need to replace the fixed 64-bit modifier tokens > > with some kind of eBPF script. > > > > Cheers, > > Daniel > > ___ > > dri-devel mailing list > > dri-de...@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
Hi, On Tue, 4 Sep 2018 at 11:44, Christian König wrote: > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > Right. The conclusion, after people went through and started sorting > > out the kinds of formats for which they would _actually_ export real > > colour buffers for, that most vendors definitely have fewer than > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > possible formats to represent, very likely fewer than > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > fewer than 72,057,594,037,927,936 formats, and even still generally > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > bits of the 56 available. > > The problem here is that at least for some parameters we actually don't > know which formats are actually used. > > The following are not real world examples, but just to explain the > general problem. > > The memory configuration for example can be not ASIC specific, but > rather determined by whoever took the ASIC and glued it together with > VRAM on a board. It is not likely that somebody puts all the VRAM chips > on one channel, but it is still perfectly possible. > > Same is true for things like harvesting, e.g. of 16 channels halve of > them could be bad and we need to know which to actually use. Sure, that's fine, but I'm not sure why, for instance, 'half of the memory channels are bad and must be avoided' would be attached to a format in the same way that macrotile size/depth would be? Similarly, what happens if you encode a channel or lane blacklist mask into a modifier, and then you import a buffer with that modifier into a client API on another device. Does that also apply to the other device or only the source device? How then do you handle the capability query between them: is it relevant to device B attempting to import and sample from an image that device A only has a 128-bit memory bus because of SKU configuration? How does generic userspace look at a token which encodes all of this information and know that the buffer is interchangeable between devices? Cheers, Daniel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 12:04 PM Daniel Vetter wrote: > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > wrote: > > This is an initial proposal for format modifiers for AMD hardware. > > > > It uses 48 bits including a chip generation, leaving 8 bits for > > a format version number. > > > > On gfx6-gfx8 we have all the fields influencing sample locations > > in memory. > > > > Tile split bytes are optional for single sample buffers as no > > hardware reaches the split size with 1 sample and hence the actual > > size does not matter. > > > > The macrotile fields are duplicated for images with multiple planes. > > If the planes have different bitdepth they need different macro > > tile fields and different tile split bytes if multisample. > > > > I could not fit multiple copies in for tile split bytes, but > > multisample & multiplane images are very rare. Overall, I think > > we should punt on multisample for a later format version since > > they are generally not shared on any modifier aware windowing > > system, and we have more issues like fmask & cmask. > > > > We need these copies because the drm modifier of all planes in an > > image needs to be equal, so we need to fit these together. > > > > This adds fields for compression support, using metadata that is > > compatible with AMDVLK and for which radv and radeonsi can > > reasonably be extended. > > > > The big open question for compression is between which generations > > the format changed to see if we can share more. > > > > This explicitly does not try to solve the linear stride alignment > > issue, thoguh we could internally just use the tiling modes for > > the linear modes to indicate linear images with the stride for the > > given chip. > > > > Signed-off-by: Bas Nieuwenhuizen > > CC: Chad Versace > > CC: Dave Airlie > > CC: Marek Olšák > > CC: Nicolai Hähnle > > CC: Alex Deucher > > CC: Daniel Vetter > > --- > > include/uapi/drm/amdgpu_drm.h | 130 ++ > > 1 file changed, 130 insertions(+) > > > > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h > > index 9eeba55b..4e1452161dbf 100644 > > --- a/include/uapi/drm/amdgpu_drm.h > > +++ b/include/uapi/drm/amdgpu_drm.h > > @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table { > > #define AMDGPU_FAMILY_AI 141 /* Vega10 */ > > #define AMDGPU_FAMILY_RV 142 /* Raven */ > > > > +#define AMDGPU_CHIP_TAHITI 0 > > +#define AMDGPU_CHIP_PITCAIRN 1 > > +#define AMDGPU_CHIP_VERDE 2 > > +#define AMDGPU_CHIP_OLAND 3 > > +#define AMDGPU_CHIP_HAINAN 4 > > +#define AMDGPU_CHIP_BONAIRE5 > > +#define AMDGPU_CHIP_KAVERI 6 > > +#define AMDGPU_CHIP_KABINI 7 > > +#define AMDGPU_CHIP_HAWAII 8 > > +#define AMDGPU_CHIP_MULLINS9 > > +#define AMDGPU_CHIP_TOPAZ 10 > > +#define AMDGPU_CHIP_TONGA 11 > > +#define AMDGPU_CHIP_FIJI 12 > > +#define AMDGPU_CHIP_CARRIZO13 > > +#define AMDGPU_CHIP_STONEY 14 > > +#define AMDGPU_CHIP_POLARIS10 15 > > +#define AMDGPU_CHIP_POLARIS11 16 > > +#define AMDGPU_CHIP_POLARIS12 17 > > +#define AMDGPU_CHIP_VEGAM 18 > > +#define AMDGPU_CHIP_VEGA10 19 > > +#define AMDGPU_CHIP_VEGA12 20 > > +#define AMDGPU_CHIP_VEGA20 21 > > +#define AMDGPU_CHIP_RAVEN 22 > > + > > +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the > > same > > + * as AMDGPU_TILING_*. However, the the rules as to when to set them are > > + * different. > > + * > > + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use > > DRM_FORMAT_MOD_LINEAR > > + * instead. > > + * > > + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config > > should be > > + * set. > > + * > > + * For other ARRAY_MODEs: > > + * - Only set TILE_SPLIT if the image is multisample. > > + * > > + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1 > > + * different value there. The values are > > + * - depth : 0 > > + * - displayable : 1 > > + * - thin: 2 > > + * - thick (GFX6): 3 > > + * - rotated (GFX7+) : 4 > > + * > > + * TODO: What to do with multisample multi plane images? More tile split > > + * fields don't fit if we want to keep a few bits for a format version. > > + */ > > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT 0 > > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK 0xf > > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4 > > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK 0x1f > > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT 9 > > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK 0x7 > > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12 > > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK 0x7 > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT 15 > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
Am 04.09.2018 um 12:15 schrieb Daniel Stone: Hi, On Tue, 4 Sep 2018 at 11:05, Daniel Vetter wrote: On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen wrote: +/* The chip this is compatible with. + * + * If compression is disabled, use + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 + * - AMDGPU_CHIP_VEGA10 for GFX9+ + * + * With compression enabled please use the exact chip. + * + * TODO: Do some generations share DCC format? + */ +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff Do you really need all the combinations here of DCC + gpu gen + tiling details? When we had the entire discussion with nvidia folks they eventually agreed that they don't need the massive pile with every possible combination. Do you really plan to share all these different things? Note that e.g. on i915 we spec some of the tiling depending upon buffer size and buffer format (because that's how the hw works), not using explicit modifier flags for everything. Right. The conclusion, after people went through and started sorting out the kinds of formats for which they would _actually_ export real colour buffers for, that most vendors definitely have fewer than 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possible formats to represent, very likely fewer than 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably fewer than 72,057,594,037,927,936 formats, and even still generally fewer than 281,474,976,710,656 if you want to be generous and leave 8 bits of the 56 available. The problem here is that at least for some parameters we actually don't know which formats are actually used. The following are not real world examples, but just to explain the general problem. The memory configuration for example can be not ASIC specific, but rather determined by whoever took the ASIC and glued it together with VRAM on a board. It is not likely that somebody puts all the VRAM chips on one channel, but it is still perfectly possible. Same is true for things like harvesting, e.g. of 16 channels halve of them could be bad and we need to know which to actually use. Regards, Christian. If you do use 256 bits in order to represent 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 modifiers per format, userspace would start hitting OOM pretty quickly as it attempted to enumerate and negotiate acceptable modifiers. Either that or we need to replace the fixed 64-bit modifier tokens with some kind of eBPF script. Cheers, Daniel ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
Hi, On Tue, 4 Sep 2018 at 11:05, Daniel Vetter wrote: > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > wrote: > > +/* The chip this is compatible with. > > + * > > + * If compression is disabled, use > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > + * > > + * With compression enabled please use the exact chip. > > + * > > + * TODO: Do some generations share DCC format? > > + */ > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > Do you really need all the combinations here of DCC + gpu gen + tiling > details? When we had the entire discussion with nvidia folks they > eventually agreed that they don't need the massive pile with every > possible combination. Do you really plan to share all these different > things? > > Note that e.g. on i915 we spec some of the tiling depending upon > buffer size and buffer format (because that's how the hw works), not > using explicit modifier flags for everything. Right. The conclusion, after people went through and started sorting out the kinds of formats for which they would _actually_ export real colour buffers for, that most vendors definitely have fewer than 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possible formats to represent, very likely fewer than 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably fewer than 72,057,594,037,927,936 formats, and even still generally fewer than 281,474,976,710,656 if you want to be generous and leave 8 bits of the 56 available. If you do use 256 bits in order to represent 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 modifiers per format, userspace would start hitting OOM pretty quickly as it attempted to enumerate and negotiate acceptable modifiers. Either that or we need to replace the fixed 64-bit modifier tokens with some kind of eBPF script. Cheers, Daniel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen wrote: > This is an initial proposal for format modifiers for AMD hardware. > > It uses 48 bits including a chip generation, leaving 8 bits for > a format version number. > > On gfx6-gfx8 we have all the fields influencing sample locations > in memory. > > Tile split bytes are optional for single sample buffers as no > hardware reaches the split size with 1 sample and hence the actual > size does not matter. > > The macrotile fields are duplicated for images with multiple planes. > If the planes have different bitdepth they need different macro > tile fields and different tile split bytes if multisample. > > I could not fit multiple copies in for tile split bytes, but > multisample & multiplane images are very rare. Overall, I think > we should punt on multisample for a later format version since > they are generally not shared on any modifier aware windowing > system, and we have more issues like fmask & cmask. > > We need these copies because the drm modifier of all planes in an > image needs to be equal, so we need to fit these together. > > This adds fields for compression support, using metadata that is > compatible with AMDVLK and for which radv and radeonsi can > reasonably be extended. > > The big open question for compression is between which generations > the format changed to see if we can share more. > > This explicitly does not try to solve the linear stride alignment > issue, thoguh we could internally just use the tiling modes for > the linear modes to indicate linear images with the stride for the > given chip. > > Signed-off-by: Bas Nieuwenhuizen > CC: Chad Versace > CC: Dave Airlie > CC: Marek Olšák > CC: Nicolai Hähnle > CC: Alex Deucher > CC: Daniel Vetter > --- > include/uapi/drm/amdgpu_drm.h | 130 ++ > 1 file changed, 130 insertions(+) > > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h > index 9eeba55b..4e1452161dbf 100644 > --- a/include/uapi/drm/amdgpu_drm.h > +++ b/include/uapi/drm/amdgpu_drm.h > @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table { > #define AMDGPU_FAMILY_AI 141 /* Vega10 */ > #define AMDGPU_FAMILY_RV 142 /* Raven */ > > +#define AMDGPU_CHIP_TAHITI 0 > +#define AMDGPU_CHIP_PITCAIRN 1 > +#define AMDGPU_CHIP_VERDE 2 > +#define AMDGPU_CHIP_OLAND 3 > +#define AMDGPU_CHIP_HAINAN 4 > +#define AMDGPU_CHIP_BONAIRE5 > +#define AMDGPU_CHIP_KAVERI 6 > +#define AMDGPU_CHIP_KABINI 7 > +#define AMDGPU_CHIP_HAWAII 8 > +#define AMDGPU_CHIP_MULLINS9 > +#define AMDGPU_CHIP_TOPAZ 10 > +#define AMDGPU_CHIP_TONGA 11 > +#define AMDGPU_CHIP_FIJI 12 > +#define AMDGPU_CHIP_CARRIZO13 > +#define AMDGPU_CHIP_STONEY 14 > +#define AMDGPU_CHIP_POLARIS10 15 > +#define AMDGPU_CHIP_POLARIS11 16 > +#define AMDGPU_CHIP_POLARIS12 17 > +#define AMDGPU_CHIP_VEGAM 18 > +#define AMDGPU_CHIP_VEGA10 19 > +#define AMDGPU_CHIP_VEGA12 20 > +#define AMDGPU_CHIP_VEGA20 21 > +#define AMDGPU_CHIP_RAVEN 22 > + > +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the > same > + * as AMDGPU_TILING_*. However, the the rules as to when to set them are > + * different. > + * > + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR > + * instead. > + * > + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should > be > + * set. > + * > + * For other ARRAY_MODEs: > + * - Only set TILE_SPLIT if the image is multisample. > + * > + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1 > + * different value there. The values are > + * - depth : 0 > + * - displayable : 1 > + * - thin: 2 > + * - thick (GFX6): 3 > + * - rotated (GFX7+) : 4 > + * > + * TODO: What to do with multisample multi plane images? More tile split > + * fields don't fit if we want to keep a few bits for a format version. > + */ > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT 0 > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK 0xf > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4 > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK 0x1f > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT 9 > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK 0x7 > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12 > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK 0x7 > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT 15 > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT 17 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT 19 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK0x3 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT 21 > +#define
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
Am 04.09.2018 um 11:00 schrieb zhoucm1: On 2018年09月04日 16:42, Christian König wrote: Am 04.09.2018 um 10:27 schrieb zhoucm1: On 2018年09月04日 16:05, Christian König wrote: Am 04.09.2018 um 09:53 schrieb zhoucm1: [SNIP] How about this idea: 1. Each signaling point is a fence implementation with an rb node. 2. Each node keeps a reference to the last previously inserted node. 3. Each node is referenced by the sync object itself. 4. Before each signal/wait operation we do a garbage collection and remove the first node from the tree as long as it is signaled. 5. When enable_signaling is requested for a node we cascade that to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences. 6. A wait just looks into the tree for the signal point lower or equal of the requested sequence number. After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7. That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well. If defined this, then problem is coming again, if 8 is removed when garbage collection, you will return 15? The garbage collection is only done for signaled nodes. So when 8 is already garbage collected and 7 is asked we know that we don't need to return anything. 8 is a signaled node, waitA/signal operation do garbage collection, how waitB(7) know the garbage history? Well we of course keep what the last garbage collected number is, don't we? Since there is no timeline as a line, I think this is not right direction. That is actually intended. There is no infinite timeline here, just a windows of the last not yet signaled fences. No one said the it's a infinite timeline, timeline will stop increasing when syncobj is released. Yeah, but the syncobj can live for a very very long time. Not having some form of shrinking it when fences are signaled is certainly not going to fly very far. Regards, Christian. Anyway kref is a good way to solve the 'free' problem, I will try to use it improve my patch, of course, will refer your idea.:) Thanks, David Zhou Otherwise you will never be able to release nodes from the tree since you always need to keep them around just in case somebody asks for a lower number. Regards, Christian. The key is that as soon as a signal point is added adding a previous point is no longer allowed. That's intention. Regards, David Zhou 7. When the sync object is released we use rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped. And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves. The node will be destroyed when the last reference drops, not when enable_signaling is called. In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more. We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed. Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation. Regards, Christian. Regards, David Zhou Well that is quite a bunch of logic, but I think that should work fine. Yeah, it could work, simple timeline reference also can solve 'free' problem. I think this approach is still quite a bit better, e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it. Regards, Christian. Thanks, David Zhou ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [Mesa-dev] [PATCH libdrm] Add basic CONTRIBUTING file
On Tuesday, 2018-09-04 16:24:44 +1000, Dave Airlie wrote: > On Mon, 3 Sep 2018 at 18:47, Daniel Vetter wrote: > > > > I picked up a bunch of the pieces from wayland's version: > > > > https://gitlab.freedesktop.org/wayland/wayland/blob/master/CONTRIBUTING.md > > > > The weston one is fairly similar. Then I rather massively trimmed it > > down since in reality libdrm is a bit a dumping ground with very few > > real rules. The commit rights and CoC sections I've copied verbatim > > from igt respectively drm-misc. Weston/Wayland only differ in their > > pick of how many patches you need (10 instead of 5). I think for > > libdrm this is supremely relevant, since most everyone will get their > > commit rights by contributing already to the kernel or mesa and having > > commit rights there already. > > > > Anyway, I figured this is good to get the rules documented, even if > > there's mostly not many rules. > > > > Note: This references maintainers in a MAINTAINERS file, which needs > > to be created first. > > > > Note: With the gitlab migration the entire commit rights process is > > still a bit up in the air. But gitlab commit rights and roles are > > hierarchical, so we can do libdrm-only maintainer/commiter roles > > ("Owner" and "Developer" in gitlab-speak). This should avoid > > conflating libdrm roles with mesa roles, useful for those pushing to > > libdrm as primarily kernel contributors. > > Fine with me, > > Acked-by: Dave Airlie > > Dave. I think this has gathered enough acks and rbs, you can just push it now and if there's anything that should be adjusted we can do that as a follow up :) ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 2018年09月04日 16:42, Christian König wrote: Am 04.09.2018 um 10:27 schrieb zhoucm1: On 2018年09月04日 16:05, Christian König wrote: Am 04.09.2018 um 09:53 schrieb zhoucm1: [SNIP] How about this idea: 1. Each signaling point is a fence implementation with an rb node. 2. Each node keeps a reference to the last previously inserted node. 3. Each node is referenced by the sync object itself. 4. Before each signal/wait operation we do a garbage collection and remove the first node from the tree as long as it is signaled. 5. When enable_signaling is requested for a node we cascade that to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences. 6. A wait just looks into the tree for the signal point lower or equal of the requested sequence number. After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7. That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well. If defined this, then problem is coming again, if 8 is removed when garbage collection, you will return 15? The garbage collection is only done for signaled nodes. So when 8 is already garbage collected and 7 is asked we know that we don't need to return anything. 8 is a signaled node, waitA/signal operation do garbage collection, how waitB(7) know the garbage history? Since there is no timeline as a line, I think this is not right direction. That is actually intended. There is no infinite timeline here, just a windows of the last not yet signaled fences. No one said the it's a infinite timeline, timeline will stop increasing when syncobj is released. Anyway kref is a good way to solve the 'free' problem, I will try to use it improve my patch, of course, will refer your idea.:) Thanks, David Zhou Otherwise you will never be able to release nodes from the tree since you always need to keep them around just in case somebody asks for a lower number. Regards, Christian. The key is that as soon as a signal point is added adding a previous point is no longer allowed. That's intention. Regards, David Zhou 7. When the sync object is released we use rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped. And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves. The node will be destroyed when the last reference drops, not when enable_signaling is called. In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more. We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed. Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation. Regards, Christian. Regards, David Zhou Well that is quite a bunch of logic, but I think that should work fine. Yeah, it could work, simple timeline reference also can solve 'free' problem. I think this approach is still quite a bit better, e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it. Regards, Christian. Thanks, David Zhou ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
Am 04.09.2018 um 10:27 schrieb zhoucm1: On 2018年09月04日 16:05, Christian König wrote: Am 04.09.2018 um 09:53 schrieb zhoucm1: [SNIP] How about this idea: 1. Each signaling point is a fence implementation with an rb node. 2. Each node keeps a reference to the last previously inserted node. 3. Each node is referenced by the sync object itself. 4. Before each signal/wait operation we do a garbage collection and remove the first node from the tree as long as it is signaled. 5. When enable_signaling is requested for a node we cascade that to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences. 6. A wait just looks into the tree for the signal point lower or equal of the requested sequence number. After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7. That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well. If defined this, then problem is coming again, if 8 is removed when garbage collection, you will return 15? The garbage collection is only done for signaled nodes. So when 8 is already garbage collected and 7 is asked we know that we don't need to return anything. Since there is no timeline as a line, I think this is not right direction. That is actually intended. There is no infinite timeline here, just a windows of the last not yet signaled fences. Otherwise you will never be able to release nodes from the tree since you always need to keep them around just in case somebody asks for a lower number. Regards, Christian. The key is that as soon as a signal point is added adding a previous point is no longer allowed. That's intention. Regards, David Zhou 7. When the sync object is released we use rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped. And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves. The node will be destroyed when the last reference drops, not when enable_signaling is called. In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more. We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed. Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation. Regards, Christian. Regards, David Zhou Well that is quite a bunch of logic, but I think that should work fine. Yeah, it could work, simple timeline reference also can solve 'free' problem. I think this approach is still quite a bit better, e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it. Regards, Christian. Thanks, David Zhou ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 2018年09月04日 16:05, Christian König wrote: Am 04.09.2018 um 09:53 schrieb zhoucm1: [SNIP] How about this idea: 1. Each signaling point is a fence implementation with an rb node. 2. Each node keeps a reference to the last previously inserted node. 3. Each node is referenced by the sync object itself. 4. Before each signal/wait operation we do a garbage collection and remove the first node from the tree as long as it is signaled. 5. When enable_signaling is requested for a node we cascade that to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences. 6. A wait just looks into the tree for the signal point lower or equal of the requested sequence number. After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7. That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well. If defined this, then problem is coming again, if 8 is removed when garbage collection, you will return 15? Since there is no timeline as a line, I think this is not right direction. The key is that as soon as a signal point is added adding a previous point is no longer allowed. That's intention. Regards, David Zhou 7. When the sync object is released we use rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped. And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves. The node will be destroyed when the last reference drops, not when enable_signaling is called. In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more. We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed. Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation. Regards, Christian. Regards, David Zhou Well that is quite a bunch of logic, but I think that should work fine. Yeah, it could work, simple timeline reference also can solve 'free' problem. I think this approach is still quite a bit better, e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it. Regards, Christian. Thanks, David Zhou ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
Am 04.09.2018 um 09:53 schrieb zhoucm1: [SNIP] How about this idea: 1. Each signaling point is a fence implementation with an rb node. 2. Each node keeps a reference to the last previously inserted node. 3. Each node is referenced by the sync object itself. 4. Before each signal/wait operation we do a garbage collection and remove the first node from the tree as long as it is signaled. 5. When enable_signaling is requested for a node we cascade that to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences. 6. A wait just looks into the tree for the signal point lower or equal of the requested sequence number. After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7. That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well. The key is that as soon as a signal point is added adding a previous point is no longer allowed. 7. When the sync object is released we use rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped. And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves. The node will be destroyed when the last reference drops, not when enable_signaling is called. In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more. We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed. Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation. Regards, Christian. Regards, David Zhou Well that is quite a bunch of logic, but I think that should work fine. Yeah, it could work, simple timeline reference also can solve 'free' problem. I think this approach is still quite a bit better, e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it. Regards, Christian. Thanks, David Zhou ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 2018年09月04日 15:00, Christian König wrote: Am 04.09.2018 um 06:04 schrieb zhoucm1: On 2018年09月03日 19:19, Christian König wrote: Am 03.09.2018 um 12:07 schrieb Chunming Zhou: 在 2018/9/3 16:50, Christian König 写道: Am 03.09.2018 um 06:13 schrieb Chunming Zhou: 在 2018/8/30 19:32, Christian König 写道: [SNIP] + +struct drm_syncobj_wait_pt { + struct drm_syncobj_stub_fence base; + u64 value; + struct rb_node node; +}; +struct drm_syncobj_signal_pt { + struct drm_syncobj_stub_fence base; + struct dma_fence *signal_fence; + struct dma_fence *pre_pt_base; + struct dma_fence_cb signal_cb; + struct dma_fence_cb pre_pt_cb; + struct drm_syncobj *syncobj; + u64 value; + struct list_head list; +}; What are those two structures good for They are used to record wait ops points and signal ops points. For timeline, they are connected by timeline value, works like: a. signal pt increase timeline value to signal_pt value when signal_pt is signaled. signal_pt is depending on previous pt fence and itself signal fence from signal ops. b. wait pt tree checks if timeline value over itself value. For normal, works like: a. signal pt increase 1 for syncobj_timeline->signal_point every time when signal ops is performed. b. when wait ops is comming, wait pt will fetch above last signal pt value as its wait point. wait pt will be only signaled by equal point value signal_pt. and why is the stub fence their base? Good question, I tried to kzalloc for them as well when I debug them, I encountered a problem: I lookup/find wait_pt or signal_pt successfully, but when I tried to use them, they are freed sometimes, and results in NULL point. and generally, when lookup them, we often need their stub fence as well, and in the meantime, their lifecycle are same. Above reason, I make stub fence as their base. That sounds like you only did this because you messed up the lifecycle. Additional to that I don't think you correctly considered the lifecycle of the waits and the sync object itself. E.g. blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea. What you should do instead is to create a fence_array object with all the fence we need to wait for when a wait point is requested. Yeah, this is our initial discussion result, but when I tried to do that, I found that cannot meet the advance signal requirement: a. We need to consider the wait and signal pt value are not one-to-one match, it's difficult to find dependent point, at least, there is some overhead. As far as I can see that is independent of using a fence array here. See you can either use a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not yet signaled fences into an array. again, need to find the range of where the specific point in, we should close to timeline semantics, I also refered the sw_sync.c timeline, usally wait_pt is signalled by timeline point. And I agree we can implement it with several methods, but I don't think there are basical differences. The problem is that with your current approach you need the sync_obj alive for the synchronization to work. That is most likely not a good idea. Indeed, I will fix that. How abount only creating fence array for every wait pt when syncobj release? when syncobj release, wait pt must have waited the signal opertation, then we can easily condense fences for every wait pt. And meantime, we can take timeline based wait pt advantage. That could work, but I don't see how you want to replace the already issued fence with a fence_array when the sync object is destroyed. Additional to that I would rather prefer a consistent handling, e.g. without extra rarely used code paths. Ah, I find a easy way, we just need to make syncobj_timeline structure as a reference. This way syncobj itself can be released first, wait_pt/signal_pt don't need syncobj at all. every wait_pt/signal_pt keep a reference of syncobj_timeline. I've thought about that as well, but came to the conclusion that you run into problems because of circle dependencies. E.g. sync_obj references sync_point and sync_point references sync_obj. sync_obj can be freed first, only sync point references syncobj_timeline structure, syncobj_timeline doesn't reference sync_pt, no circle dep. Additional to that it is quite a bit larger memory footprint because you need to keep the sync_obj around as well. all signaled sync_pt are freed immediately except syncobj_timeline sturcture, where does extra memory foootprint take? Additional to that you enable signaling without a need from the waiting side. That is rather bad for implementations which need that optimization. Do you mean increasing timeline based on signal fence is not better? only update timeline value when requested by a wait pt? Yes, exactly. This way, we will not update timeline value
Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.
Am 04.09.2018 um 03:00 schrieb Bas Nieuwenhuizen: This is an initial proposal for format modifiers for AMD hardware. It uses 48 bits including a chip generation, leaving 8 bits for a format version number. I'm absolutely not an expert on this, but as far as I know the major problem with this approach was that we couldn't fit all necessary parameter into a 64bit identifier. The last thing I've heard was something like we need at least 128bit, better 256. But Marek and Alex certainly know more about that stuff. Christian. On gfx6-gfx8 we have all the fields influencing sample locations in memory. Tile split bytes are optional for single sample buffers as no hardware reaches the split size with 1 sample and hence the actual size does not matter. The macrotile fields are duplicated for images with multiple planes. If the planes have different bitdepth they need different macro tile fields and different tile split bytes if multisample. I could not fit multiple copies in for tile split bytes, but multisample & multiplane images are very rare. Overall, I think we should punt on multisample for a later format version since they are generally not shared on any modifier aware windowing system, and we have more issues like fmask & cmask. We need these copies because the drm modifier of all planes in an image needs to be equal, so we need to fit these together. This adds fields for compression support, using metadata that is compatible with AMDVLK and for which radv and radeonsi can reasonably be extended. The big open question for compression is between which generations the format changed to see if we can share more. This explicitly does not try to solve the linear stride alignment issue, thoguh we could internally just use the tiling modes for the linear modes to indicate linear images with the stride for the given chip. Signed-off-by: Bas Nieuwenhuizen CC: Chad Versace CC: Dave Airlie CC: Marek Olšák CC: Nicolai Hähnle CC: Alex Deucher CC: Daniel Vetter --- include/uapi/drm/amdgpu_drm.h | 130 ++ 1 file changed, 130 insertions(+) diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h index 9eeba55b..4e1452161dbf 100644 --- a/include/uapi/drm/amdgpu_drm.h +++ b/include/uapi/drm/amdgpu_drm.h @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table { #define AMDGPU_FAMILY_AI 141 /* Vega10 */ #define AMDGPU_FAMILY_RV 142 /* Raven */ +#define AMDGPU_CHIP_TAHITI 0 +#define AMDGPU_CHIP_PITCAIRN 1 +#define AMDGPU_CHIP_VERDE 2 +#define AMDGPU_CHIP_OLAND 3 +#define AMDGPU_CHIP_HAINAN 4 +#define AMDGPU_CHIP_BONAIRE5 +#define AMDGPU_CHIP_KAVERI 6 +#define AMDGPU_CHIP_KABINI 7 +#define AMDGPU_CHIP_HAWAII 8 +#define AMDGPU_CHIP_MULLINS9 +#define AMDGPU_CHIP_TOPAZ 10 +#define AMDGPU_CHIP_TONGA 11 +#define AMDGPU_CHIP_FIJI 12 +#define AMDGPU_CHIP_CARRIZO13 +#define AMDGPU_CHIP_STONEY 14 +#define AMDGPU_CHIP_POLARIS10 15 +#define AMDGPU_CHIP_POLARIS11 16 +#define AMDGPU_CHIP_POLARIS12 17 +#define AMDGPU_CHIP_VEGAM 18 +#define AMDGPU_CHIP_VEGA10 19 +#define AMDGPU_CHIP_VEGA12 20 +#define AMDGPU_CHIP_VEGA20 21 +#define AMDGPU_CHIP_RAVEN 22 + +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same + * as AMDGPU_TILING_*. However, the the rules as to when to set them are + * different. + * + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR + * instead. + * + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be + * set. + * + * For other ARRAY_MODEs: + * - Only set TILE_SPLIT if the image is multisample. + * + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1 + * different value there. The values are + * - depth : 0 + * - displayable : 1 + * - thin: 2 + * - thick (GFX6): 3 + * - rotated (GFX7+) : 4 + * + * TODO: What to do with multisample multi plane images? More tile split + * fields don't fit if we want to keep a few bits for a format version. + */ +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT 0 +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK 0xf +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4 +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK 0x1f +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT 9 +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK 0x7 +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12 +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK 0x7 +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT 15 +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT 17 +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT 19 +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK0x3
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
Am 04.09.2018 um 06:04 schrieb zhoucm1: On 2018年09月03日 19:19, Christian König wrote: Am 03.09.2018 um 12:07 schrieb Chunming Zhou: 在 2018/9/3 16:50, Christian König 写道: Am 03.09.2018 um 06:13 schrieb Chunming Zhou: 在 2018/8/30 19:32, Christian König 写道: [SNIP] + +struct drm_syncobj_wait_pt { + struct drm_syncobj_stub_fence base; + u64 value; + struct rb_node node; +}; +struct drm_syncobj_signal_pt { + struct drm_syncobj_stub_fence base; + struct dma_fence *signal_fence; + struct dma_fence *pre_pt_base; + struct dma_fence_cb signal_cb; + struct dma_fence_cb pre_pt_cb; + struct drm_syncobj *syncobj; + u64 value; + struct list_head list; +}; What are those two structures good for They are used to record wait ops points and signal ops points. For timeline, they are connected by timeline value, works like: a. signal pt increase timeline value to signal_pt value when signal_pt is signaled. signal_pt is depending on previous pt fence and itself signal fence from signal ops. b. wait pt tree checks if timeline value over itself value. For normal, works like: a. signal pt increase 1 for syncobj_timeline->signal_point every time when signal ops is performed. b. when wait ops is comming, wait pt will fetch above last signal pt value as its wait point. wait pt will be only signaled by equal point value signal_pt. and why is the stub fence their base? Good question, I tried to kzalloc for them as well when I debug them, I encountered a problem: I lookup/find wait_pt or signal_pt successfully, but when I tried to use them, they are freed sometimes, and results in NULL point. and generally, when lookup them, we often need their stub fence as well, and in the meantime, their lifecycle are same. Above reason, I make stub fence as their base. That sounds like you only did this because you messed up the lifecycle. Additional to that I don't think you correctly considered the lifecycle of the waits and the sync object itself. E.g. blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea. What you should do instead is to create a fence_array object with all the fence we need to wait for when a wait point is requested. Yeah, this is our initial discussion result, but when I tried to do that, I found that cannot meet the advance signal requirement: a. We need to consider the wait and signal pt value are not one-to-one match, it's difficult to find dependent point, at least, there is some overhead. As far as I can see that is independent of using a fence array here. See you can either use a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not yet signaled fences into an array. again, need to find the range of where the specific point in, we should close to timeline semantics, I also refered the sw_sync.c timeline, usally wait_pt is signalled by timeline point. And I agree we can implement it with several methods, but I don't think there are basical differences. The problem is that with your current approach you need the sync_obj alive for the synchronization to work. That is most likely not a good idea. Indeed, I will fix that. How abount only creating fence array for every wait pt when syncobj release? when syncobj release, wait pt must have waited the signal opertation, then we can easily condense fences for every wait pt. And meantime, we can take timeline based wait pt advantage. That could work, but I don't see how you want to replace the already issued fence with a fence_array when the sync object is destroyed. Additional to that I would rather prefer a consistent handling, e.g. without extra rarely used code paths. Ah, I find a easy way, we just need to make syncobj_timeline structure as a reference. This way syncobj itself can be released first, wait_pt/signal_pt don't need syncobj at all. every wait_pt/signal_pt keep a reference of syncobj_timeline. I've thought about that as well, but came to the conclusion that you run into problems because of circle dependencies. E.g. sync_obj references sync_point and sync_point references sync_obj. Additional to that it is quite a bit larger memory footprint because you need to keep the sync_obj around as well. Additional to that you enable signaling without a need from the waiting side. That is rather bad for implementations which need that optimization. Do you mean increasing timeline based on signal fence is not better? only update timeline value when requested by a wait pt? Yes, exactly. This way, we will not update timeline value immidiately and cannot free signal pt immidiately, and we also need to consider it to CPU query and wait. That is actually the better coding style. We usually try to avoid doing things in interrupt handlers as much as possible. OK, I see your concern, how about to delay handling to a workqueue? this way, we only
Re: [PATCH libdrm] Add basic CONTRIBUTING file
On Mon, 3 Sep 2018 at 18:47, Daniel Vetter wrote: > > I picked up a bunch of the pieces from wayland's version: > > https://gitlab.freedesktop.org/wayland/wayland/blob/master/CONTRIBUTING.md > > The weston one is fairly similar. Then I rather massively trimmed it > down since in reality libdrm is a bit a dumping ground with very few > real rules. The commit rights and CoC sections I've copied verbatim > from igt respectively drm-misc. Weston/Wayland only differ in their > pick of how many patches you need (10 instead of 5). I think for > libdrm this is supremely relevant, since most everyone will get their > commit rights by contributing already to the kernel or mesa and having > commit rights there already. > > Anyway, I figured this is good to get the rules documented, even if > there's mostly not many rules. > > Note: This references maintainers in a MAINTAINERS file, which needs > to be created first. > > Note: With the gitlab migration the entire commit rights process is > still a bit up in the air. But gitlab commit rights and roles are > hierarchical, so we can do libdrm-only maintainer/commiter roles > ("Owner" and "Developer" in gitlab-speak). This should avoid > conflating libdrm roles with mesa roles, useful for those pushing to > libdrm as primarily kernel contributors. Fine with me, Acked-by: Dave Airlie Dave. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx