AMD FirePro W5130M radeon/amdgpu driver bug

2020-02-28 Thread David Brinovec
Hello,

I've been struggling for a long time with an issue I have getting either
the radeon or the amdgpu drivers to work with the AMD FirePro W5130M
discrete GPU in my Dell Precision 3510.

Most recently I've been loading the drivers once I have the system booted
with modprobe commands.  This makes things a bit more convenient for
testing.

I get fairly similar behavior whether I use the radeon driver or the amdgpu
driver.

I'm able to insert the module, but once I start an application that uses it
for 3D rendering, the application typically freezes and sometimes the entire
system does too.

In my dmesg output I see a lot of errors like "ring 0 stalled for more than
##msec" and "GPU fault detected".

In the past when I've reported bugs, I've been told to try the most recent
version of the kernel.  I'm wondering if someone can point me to the best
way to go about that.  As in, is there a good Linux distro that I can use
to build the latest kernel, etc.?

Any help or suggestions would be appreciated.

Thanks,
Dave
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/amd/display: Fix dmub_psr_destroy()

2020-02-28 Thread Alex Deucher
Applied.  Thanks!

Alex

On Fri, Feb 28, 2020 at 8:08 AM Kazlauskas, Nicholas
 wrote:
>
> On 2020-02-28 5:58 a.m., Dan Carpenter wrote:
> > This is freeing the wrong variable so it will crash.  It should be
> > freeing "*dmub" instead of "dmub".
> >
> > Fixes: 4c1a1335dfe0 ("drm/amd/display: Driverside changes to support PSR in DMCUB")
> > Signed-off-by: Dan Carpenter 
>
> Reviewed-by: Nicholas Kazlauskas 
>
> Thanks!
>
> > ---
> >   drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> > index 2c932c29f1f9..f0936cb3c056 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> > +++ b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> > @@ -235,6 +235,6 @@ struct dmub_psr *dmub_psr_create(struct dc_context *ctx)
> >*/
> >   void dmub_psr_destroy(struct dmub_psr **dmub)
> >   {
> > - kfree(dmub);
> > + kfree(*dmub);
> >   *dmub = NULL;
> >   }
> >
>
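For context, the pattern Dan's fix restores can be illustrated with a minimal,
self-contained sketch in plain C (stdlib allocation rather than the actual DC
kfree code): a destroy helper takes a double pointer so it can both free the
object (*obj) and clear the caller's reference; passing the double pointer
itself to the free routine is exactly the mistake corrected above.

#include <stdlib.h>

struct example_obj {
        int payload;
};

/* Allocate a zero-initialized object; returns NULL on failure. */
static struct example_obj *example_create(void)
{
        return calloc(1, sizeof(struct example_obj));
}

/*
 * The destroy helper takes a double pointer so it can release the
 * allocation *and* clear the caller's variable.  Calling free(obj)
 * here -- i.e. on the address of the caller's pointer -- would be
 * the same kind of bug the patch above corrects.
 */
static void example_destroy(struct example_obj **obj)
{
        free(*obj);
        *obj = NULL;
}

int main(void)
{
        struct example_obj *p = example_create();

        example_destroy(&p);    /* p is NULL afterwards */
        return 0;
}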
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH][next] drm/amd/display: fix indentation issue on a hunk of code

2020-02-28 Thread Alex Deucher
On Fri, Feb 28, 2020 at 8:16 AM Colin King  wrote:
>
> From: Colin Ian King 
>
> There are multiple statements that are indented incorrectly. Add
> in the missing tabs.
>
> Signed-off-by: Colin Ian King 

Applied.  Thanks!

Alex

> ---
>  .../gpu/drm/amd/display/dc/calcs/dce_calcs.c  | 46 +--
>  1 file changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
> index 5d081c42e81b..2c6db379afae 100644
> --- a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
> +++ b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
> @@ -3265,33 +3265,33 @@ bool bw_calcs(struct dc_context *ctx,
> bw_fixed_to_int(bw_mul(data->
> stutter_exit_watermark[9], 
> bw_int_to_fixed(1000)));
>
> -   calcs_output->stutter_entry_wm_ns[0].b_mark =
> -   bw_fixed_to_int(bw_mul(data->
> -   stutter_entry_watermark[4], 
> bw_int_to_fixed(1000)));
> -   calcs_output->stutter_entry_wm_ns[1].b_mark =
> -   bw_fixed_to_int(bw_mul(data->
> -   stutter_entry_watermark[5], 
> bw_int_to_fixed(1000)));
> -   calcs_output->stutter_entry_wm_ns[2].b_mark =
> -   bw_fixed_to_int(bw_mul(data->
> -   stutter_entry_watermark[6], 
> bw_int_to_fixed(1000)));
> -   if (ctx->dc->caps.max_slave_planes) {
> -   calcs_output->stutter_entry_wm_ns[3].b_mark =
> +   calcs_output->stutter_entry_wm_ns[0].b_mark =
> bw_fixed_to_int(bw_mul(data->
> -   stutter_entry_watermark[0], 
> bw_int_to_fixed(1000)));
> -   calcs_output->stutter_entry_wm_ns[4].b_mark =
> +   stutter_entry_watermark[4], 
> bw_int_to_fixed(1000)));
> +   calcs_output->stutter_entry_wm_ns[1].b_mark =
> bw_fixed_to_int(bw_mul(data->
> -   stutter_entry_watermark[1], 
> bw_int_to_fixed(1000)));
> -   } else {
> -   calcs_output->stutter_entry_wm_ns[3].b_mark =
> +   stutter_entry_watermark[5], 
> bw_int_to_fixed(1000)));
> +   calcs_output->stutter_entry_wm_ns[2].b_mark =
> bw_fixed_to_int(bw_mul(data->
> -   stutter_entry_watermark[7], 
> bw_int_to_fixed(1000)));
> -   calcs_output->stutter_entry_wm_ns[4].b_mark =
> +   stutter_entry_watermark[6], 
> bw_int_to_fixed(1000)));
> +   if (ctx->dc->caps.max_slave_planes) {
> +   calcs_output->stutter_entry_wm_ns[3].b_mark =
> +   bw_fixed_to_int(bw_mul(data->
> +   stutter_entry_watermark[0], 
> bw_int_to_fixed(1000)));
> +   calcs_output->stutter_entry_wm_ns[4].b_mark =
> +   bw_fixed_to_int(bw_mul(data->
> +   stutter_entry_watermark[1], 
> bw_int_to_fixed(1000)));
> +   } else {
> +   calcs_output->stutter_entry_wm_ns[3].b_mark =
> +   bw_fixed_to_int(bw_mul(data->
> +   stutter_entry_watermark[7], 
> bw_int_to_fixed(1000)));
> +   calcs_output->stutter_entry_wm_ns[4].b_mark =
> +   bw_fixed_to_int(bw_mul(data->
> +   stutter_entry_watermark[8], 
> bw_int_to_fixed(1000)));
> +   }
> +   calcs_output->stutter_entry_wm_ns[5].b_mark =
> bw_fixed_to_int(bw_mul(data->
> -   stutter_entry_watermark[8], 
> bw_int_to_fixed(1000)));
> -   }
> -   calcs_output->stutter_entry_wm_ns[5].b_mark =
> -   bw_fixed_to_int(bw_mul(data->
> -   stutter_entry_watermark[9], 
> bw_int_to_fixed(1000)));
> +   stutter_entry_watermark[9], 
> bw_int_to_fixed(1000)));
>
> calcs_output->urgent_wm_ns[0].b_mark =
> bw_fixed_to_int(bw_mul(data->
> --
> 2.25.0
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org

Re: [PATCH][next] drm/amdkfd: fix indentation issue

2020-02-28 Thread Alex Deucher
Applied.  Thanks!

On Fri, Feb 28, 2020 at 8:08 AM Colin King  wrote:
>
> From: Colin Ian King 
>
> There is a statement that is indented with spaces instead of a tab.
> Replace spaces with a tab.
>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index f547e4769831..5db42814dd51 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -490,7 +490,7 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
> dev->node_props.num_sdma_queues_per_engine);
> sysfs_show_32bit_prop(buffer, "num_cp_queues",
> dev->node_props.num_cp_queues);
> -sysfs_show_64bit_prop(buffer, "unique_id",
> +   sysfs_show_64bit_prop(buffer, "unique_id",
> dev->node_props.unique_id);
>
> if (dev->gpu) {
> --
> 2.25.0
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Nuritzi Sanchez
Hi All,


I know there's been a lot of discussion already, but I wanted to respond to
Daniel's original post.

I joined GitLab earlier this month as their new Open Source Program Manager
[1] and wanted to introduce myself here since I’ll be involved from the
GitLab side as we work together to problem-solve the financial situation
here. My role at GitLab is to help make it easier for Open Source
organizations to migrate (by helping to smooth out some of the current pain
points), and to help advocate internally for changes to the product and our
workflows to make GitLab better for Open Source orgs. We want to make sure
that our Open Source community feels supported beyond just migration. As
such, I’ll be running the GitLab Open Source Program [2].

My background is that I’m the former President and Chairperson of the GNOME
Foundation, which is one of the earliest Free Software projects to migrate
to GitLab. GNOME initially faced some limitations with the CI runner costs
too, but thanks to generous support from donors, it has not experienced
those issues in recent times. I know there's already a working relationship
between our communities, but it could be good to examine what GNOME and KDE
have done and see if there's anything we can apply here. We've reached out
to Daniel Stone, our main contact for the freedesktop.org migration, and he
has gotten us in touch with Daniel V. and the X.Org Foundation Board to
learn more about what's already been done and what we can do next.

Please bear with me as I continue to get ramped up in my new job, but I’d
like to offer as much support as possible with this issue. We’ll be
exploring ways for GitLab to help make sure there isn’t a gap in coverage
during the time that freedesktop looks for sponsors. I know that on
GitLab’s side, supporting our Open Source user community is a priority.

Best,

Nuritzi

[1] https://about.gitlab.com/company/team/#nuritzi
[2]
https://about.gitlab.com/handbook/marketing/community-relations/opensource-program/


On Fri, Feb 28, 2020 at 1:22 PM Daniel Vetter 
wrote:

> On Fri, Feb 28, 2020 at 9:31 PM Dave Airlie  wrote:
> >
> > On Sat, 29 Feb 2020 at 05:34, Eric Anholt  wrote:
> > >
> > > On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie 
> wrote:
> > > >
> > > > On Fri, 28 Feb 2020 at 18:18, Daniel Stone 
> wrote:
> > > > >
> > > > > On Fri, 28 Feb 2020 at 03:38, Dave Airlie 
> wrote:
> > > > > > b) we probably need to take a large step back here.
> > > > > >
> > > > > > Look at this from a sponsor POV, why would I give X.org/fd.o
> > > > > > sponsorship money that they are just giving straight to google
> to pay
> > > > > > for hosting credits? Google are profiting in some minor way from
> these
> > > > > > hosting credits being bought by us, and I assume we aren't
> getting any
> > > > > > sort of discounts here. Having google sponsor the credits costs
> google
> > > > > > substantially less than having any other company give us money
> to do
> > > > > > it.
> > > > >
> > > > > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > > > > comparable in terms of what you get and what you pay for them.
> > > > > Obviously providers like Packet and Digital Ocean who offer
> bare-metal
> > > > > services are cheaper, but then you need to find someone who is
> going
> > > > > to properly administer the various machines, install decent
> > > > > monitoring, make sure that more storage is provisioned when we need
> > > > > more storage (which is basically all the time), make sure that the
> > > > > hardware is maintained in decent shape (pretty sure one of the fd.o
> > > > > machines has had a drive in imminent-failure state for the last few
> > > > > months), etc.
> > > > >
> > > > > Given the size of our service, that's a much better plan (IMO) than
> > > > > relying on someone who a) isn't an admin by trade, b) has a million
> > > > > other things to do, and c) hasn't wanted to do it for the past
> several
> > > > > years. But as long as that's the resources we have, then we're
> paying
> > > > > the cloud tradeoff, where we pay more money in exchange for fewer
> > > > > problems.
> > > >
> > > > Admin for gitlab and CI is a full time role anyways. The system is
> > > > definitely not self sustaining without time being put in by you and
> > > > anholt still. If we have $75k to burn on credits, and it was diverted
> > > > to just pay an admin to admin the real hw + gitlab/CI would that not
> > > > be a better use of the money? I didn't know if we can afford $75k for
> > > > an admin, but suddenly we can afford it for gitlab credits?
> > >
> > > As I think about the time that I've spent at google in less than a
> > > year on trying to keep the lights on for CI and optimize our
> > > infrastructure in the current cloud environment, that's more than the
> > > entire yearly budget you're talking about here.  Saying "let's just
> > > pay for people to do more work instead of paying for full-service
> > > cloud" is not a cost 

Re: [PATCH 2/2] drm/amd/display: Allow current eDP link settings to override verified ones.

2020-02-28 Thread Mario Kleiner
On Thu, Feb 27, 2020 at 8:11 PM Mario Kleiner
 wrote:
>
> Hi Harry
>
> Ok, back from various other emergencies and deadlines, sorry for the
> late reply. I also fixed my e-mail address - it was mistyped, causing
> all these delivery failures :/
>
> On Thu, Jan 9, 2020 at 10:26 PM Harry Wentland  wrote:
> >
> > On 2020-01-09 4:13 p.m., Mario Kleiner wrote:
> > > On Thu, Jan 9, 2020 at 7:44 PM Harry Wentland  wrote:
> > >
> > > On 2020-01-09 10:20 a.m., Mario Kleiner wrote:
> > > > If the current eDP link settings, as read from hw, provide a higher
> > > > bandwidth than the verified_link_cap ones (= reported_link_cap), 
> > > then
> > > > override verified_link_cap with current settings.
> > > >
> > > > These initial current eDP link settings have been set up by
> > > > firmware during boot, so they should work on the eDP panel.
> > > > Therefore use them if the firmware thinks they are good and
> > > > they provide higher link bandwidth, e.g., to enable higher
> > > > resolutions / color depths.
> > > >
> ... snip ...
> > >
> > >
> > > Tried that already (see other mail), replacing the whole if statement
> > > with a if (true) to force reading DP_SUPPORTED_LINK_RATES. The whole
> > > table reads back as all-zero, and versions are DP 1.1, eDP 1.3, not 1.4+
> > > as what seems to be required. The use the classic link bw stuff, but
> > > with a non-standard link bandwidth multiplier of 0xc, and a reported
> > > DP_MAX_LINK_RATE of 0xa, contradicting the 0xc setting that the firmware
> > > sets at bootup.
> > >
> > > Seems to be a very Apple thing...
> >
> > Indeed. I think it was a funky panel that was "ahead of its time" and
> > ahead of the spec.
> >
> > I would prefer a DPCD quirk for this panel that updates the reported DP
> > caps, rather than picking the "current" ones from the FW lightup.
> >
> > Harry
> >
>
> How would I do this? I see various options:
>
> I could rewrite my current patch, move it down inside
> dc_link_detect_helper() until after the edid was read and we have
> vendor/model id available, then say if(everything that's there now &&
> (vendor=Apple) && (model=Troublesomepanel)) { ... }
>
> Or I could add quirk code to detect_edp_sink_caps() after
> retrieve_link_cap() [or inside retrieve_link_cap] to override the
> reported_link_cap. But at that point we don't have edid yet and
> therefore no vendor/model id. Is there something inside the dpcd one
> can use to uniquely identify this display model?
>
> struct dp_device_vendor_id sink_id; queried inside retrieve_link_cap()
> sounds like it could be a unique id? I don't know about that.
>
> My intention was to actually do nothing on the AMD side here, as my
> photometer measurements suggest that the panel gives better quality
> results for >= 10 bpc output if it is operated at 8 bit and then the
> gpu's spatial dithering convincingly fakes the extra bits. Quality
> seems worse if one actually switches the panel into 10 bpc, as it
> doesn't seem to be a real 10 bit panel, just an 8 bit panel that
> accepts 10 bit input and then dithers it badly down to 8 bit.
>
> The situation has changed for Linux 5.6-rc, because of this recent
> commit from Roman Li, which is already in 5.6-rc:
> 4a8ca46bae8affba063aabac85a0b1401ba810a3 "drm/amd/display: Default max
> bpc to 16 for eDP"
>
> While that commit supposedly fixes some darkness on some other eDP
> panel, it now breaks my eDP panel. It leaves edid reported bpc
> unclamped, so the driver uses 10 bpc as the basis for required bandwidth
> calculations, and then the required bandwidth for all modes exceeds the
> link bandwidth. I end up with the eDP panel having no valid modes at all
> ==> Panel goes black, game over.
>
> We either need to revert that commit for drm-fixes, or quirk it for
> the specific panels that are troublesome, or need to get some solution
> into 5.6-rc, otherwise there will be a lot of regressions for at least
> Apple MBP users.
>
> thanks,
> -mario
>

Ok, just sent out a patch with a specific dpcd quirk for this as:

[PATCH] drm/amd/display: Add link_rate quirk for Apple 15" MBP 2017

Tested against drm-next for Linux 5.6, works.

-mario
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/display: Add link_rate quirk for Apple 15" MBP 2017

2020-02-28 Thread Mario Kleiner
This fixes a problem found on the MacBookPro 2017 Retina panel:

The panel reports 10 bpc color depth in its EDID, and the
firmware chooses link settings at boot which support enough
bandwidth for 10 bpc (324000 kbit/sec aka LINK_RATE_RBR2
aka 0xc), but the DP_MAX_LINK_RATE dpcd register only reports
2.7 Gbps (multiplier value 0xa) as possible, in direct
contradiction of what the firmware successfully set up.

This restricts the panel to 8 bpc, not providing the full
color depth of the panel on Linux <= 5.5. Additionally, commit
'4a8ca46bae8a ("drm/amd/display: Default max bpc to 16 for eDP")',
introduced into Linux 5.6-rc1, will unclamp panel depth to
its full 10 bpc, thereby requiring an eDP bandwidth for all
modes that exceeds the bandwidth available, causing all modes
to fail validation -> No modes for the laptop panel -> failure
to set any mode -> Panel goes dark.

This patch adds a quirk specific to the MBP 2017 15" Retina
panel to override the reported max link rate to the correct maximum
of 0xc = LINK_RATE_RBR2 to fix the darkness and reduced display
precision.

Please apply for Linux 5.6+ to avoid regressing Apple MBP panel
support.

Signed-off-by: Mario Kleiner 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index cb731c1d30b1..fd9e69634c50 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -3401,6 +3401,17 @@ static bool retrieve_link_cap(struct dc_link *link)
sink_id.ieee_device_id,
sizeof(sink_id.ieee_device_id));
 
+   /* Quirk Apple MBP 2017 15" Retina panel: Wrong DP_MAX_LINK_RATE */
+   {
+   uint8_t str_mbp_2017[] = { 101, 68, 21, 101, 98, 97 };
+
+   if ((link->dpcd_caps.sink_dev_id == 0x0010fa) &&
+   !memcmp(link->dpcd_caps.sink_dev_id_str, str_mbp_2017,
+   sizeof(str_mbp_2017))) {
+   link->reported_link_cap.link_rate = 0x0c;
+   }
+   }
+
core_link_read_dpcd(
link,
DP_SINK_HW_REVISION_START,
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Vetter
On Fri, Feb 28, 2020 at 9:31 PM Dave Airlie  wrote:
>
> On Sat, 29 Feb 2020 at 05:34, Eric Anholt  wrote:
> >
> > On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie  wrote:
> > >
> > > On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
> > > >
> > > > On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> > > > > b) we probably need to take a large step back here.
> > > > >
> > > > > Look at this from a sponsor POV, why would I give X.org/fd.o
> > > > > sponsorship money that they are just giving straight to google to pay
> > > > > for hosting credits? Google are profiting in some minor way from these
> > > > > hosting credits being bought by us, and I assume we aren't getting any
> > > > > sort of discounts here. Having google sponsor the credits costs google
> > > > > substantially less than having any other company give us money to do
> > > > > it.
> > > >
> > > > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > > > comparable in terms of what you get and what you pay for them.
> > > > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > > > services are cheaper, but then you need to find someone who is going
> > > > to properly administer the various machines, install decent
> > > > monitoring, make sure that more storage is provisioned when we need
> > > > more storage (which is basically all the time), make sure that the
> > > > hardware is maintained in decent shape (pretty sure one of the fd.o
> > > > machines has had a drive in imminent-failure state for the last few
> > > > months), etc.
> > > >
> > > > Given the size of our service, that's a much better plan (IMO) than
> > > > relying on someone who a) isn't an admin by trade, b) has a million
> > > > other things to do, and c) hasn't wanted to do it for the past several
> > > > years. But as long as that's the resources we have, then we're paying
> > > > the cloud tradeoff, where we pay more money in exchange for fewer
> > > > problems.
> > >
> > > Admin for gitlab and CI is a full time role anyways. The system is
> > > definitely not self sustaining without time being put in by you and
> > > anholt still. If we have $75k to burn on credits, and it was diverted
> > > to just pay an admin to admin the real hw + gitlab/CI would that not
> > > be a better use of the money? I didn't know if we can afford $75k for
> > > an admin, but suddenly we can afford it for gitlab credits?
> >
> > As I think about the time that I've spent at google in less than a
> > year on trying to keep the lights on for CI and optimize our
> > infrastructure in the current cloud environment, that's more than the
> > entire yearly budget you're talking about here.  Saying "let's just
> > pay for people to do more work instead of paying for full-service
> > cloud" is not a cost optimization.
> >
> >
> > > > Yes, we could federate everything back out so everyone runs their own
> > > > builds and executes those. Tinderbox did something really similar to
> > > > that IIRC; not sure if Buildbot does as well. Probably rules out
> > > > pre-merge testing, mind.
> > >
> > > Why? does gitlab not support the model? having builds done in parallel
> > > on runners closer to the test runners seems like it should be a thing.
> > > I guess artifact transfer would cost less then as a result.
> >
> > Let's do some napkin math.  The biggest artifacts cost we have in Mesa
> > is probably meson-arm64/meson-arm (60MB zipped from meson-arm64,
> > downloaded by 4 freedreno and 6ish lava, about 100 pipelines/day,
> > makes ~1.8TB/month ($180 or so).  We could build a local storage next
> > to the lava dispatcher so that the artifacts didn't have to contain
> > the rootfs that came from the container (~2/3 of the insides of the
> > zip file), but that's another service to build and maintain.  Building
> > the drivers once locally and storing it would save downloading the
> > other ~1/3 of the inside of the zip file, but that requires a big
> > enough system to do builds in time.
> >
> > I'm planning on doing a local filestore for google's lava lab, since I
> > need to be able to move our xml files off of the lava DUTs to get the
> > xml results we've become accustomed to, but this would not bubble up
> > to being a priority for my time if I wasn't doing it anyway.  If it
> > takes me a single day to set all this up (I estimate a couple of
> > weeks), that costs my employer a lot more than sponsoring the costs of
> > the inefficiencies of the system that has accumulated.
>
> I'm not trying to knock the engineering works the CI contributors have
> done at all, but I've never seen a real discussion about costs until
> now. Engineers aren't accountants.
>
> The thing we seem to be missing here is fiscal responsibility. I know
> this email is us being fiscally responsible, but it's kinda after the
> fact.
>
> I cannot commit my employer to spending a large amount of money (> 0
> actually) without a long and lengthy process with checks and bounds.
> Can you?
>
> The 

Re: gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Matt Turner
On Fri, Feb 28, 2020 at 12:00 AM Daniel Stone  wrote:
>
> Hi Matt,
>
> On Thu, 27 Feb 2020 at 23:45, Matt Turner  wrote:
> > We're paying 75K USD for the bandwidth to transfer data from the
> > GitLab cloud instance. i.e., for viewing the https site, for
> > cloning/updating git repos, and for downloading CI artifacts/images to
> > the testing machines (AFAIU).
>
> I believe that in January, we had $2082 of network cost (almost
> entirely egress; ingress is basically free) and $1750 of cloud-storage
> cost (almost all of which was download). That's based on 16TB of
> cloud-storage (CI artifacts, container images, file uploads, Git LFS)
> egress and 17.9TB of other egress (the web service itself, repo
> activity). Projecting that out gives us roughly $45k of network
> activity alone, so it looks like this figure is based on a projected
> increase of ~50%.
>
> The actual compute capacity is closer to $1150/month.

Could we have the full GCP bill posted?
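For reference, a quick check of the arithmetic behind the quoted figures,
assuming January is a representative month:

  $2082 (network egress) + $1750 (cloud storage, mostly download) = $3832/month
  $3832/month x 12 ~= $46k/year

which is consistent with the "roughly $45k of network activity alone"
projection quoted above, before any growth is factored in.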
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdgpu/display: navi1x copy dcn watermark clock settings to smu resume from s3

2020-02-28 Thread Wu, Hersen
Following Evan's review, add smu->mutex locking.


This interface is for dGPU Navi1x. The Linux dc-pplib interface depends
 on the Windows driver dc implementation.

 For Navi1x, clock settings of dcn watermarks are fixed. The settings
 should be passed to smu during boot up and resume from S3.
 boot up: dc calculates dcn watermark clock settings within dc_create,
 dcn20_resource_construct, then call pplib functions below to pass
 the settings to smu:
 smu_set_watermarks_for_clock_ranges
 smu_set_watermarks_table
 navi10_set_watermarks_table
 smu_write_watermarks_table

 For Renoir, clock settings of dcn watermark are also fixed values.
 dc has implemented different flow for window driver:
 dc_hardware_init / dc_set_power_state
 dcn10_init_hw
 notify_wm_ranges
 set_wm_ranges

 For Linux
 smu_set_watermarks_for_clock_ranges
 renoir_set_watermarks_table
 smu_write_watermarks_table

 dc_hardware_init -> amdgpu_dm_init
 dc_set_power_state --> dm_resume

 therefore, linux dc-pplib interface of navi10/12/14 is different
 from that of Renoir.

Signed-off-by: Hersen Wu 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 68 +++
 1 file changed, 68 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 931cbd7b372e..1ee1d6ff2782 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1435,6 +1435,72 @@ static void s3_handle_mst(struct drm_device *dev, bool suspend)
  drm_kms_helper_hotplug_event(dev);
 }

+static int amdgpu_dm_smu_write_watermarks_table(struct amdgpu_device *adev)
+{
+ struct smu_context *smu = &adev->smu;
+ int ret = 0;
+
+ if (!is_support_sw_smu(adev))
+ return 0;
+
+ /* This interface is for dGPU Navi1x. The Linux dc-pplib interface depends
+ * on the Windows driver dc implementation.
+ * For Navi1x, clock settings of dcn watermarks are fixed. the settings
+ * should be passed to smu during boot up and resume from s3.
+ * boot up: dc calculate dcn watermark clock settings within dc_create,
+ * dcn20_resource_construct
+ * then call pplib functions below to pass the settings to smu:
+ * smu_set_watermarks_for_clock_ranges
+ * smu_set_watermarks_table
+ * navi10_set_watermarks_table
+ * smu_write_watermarks_table
+ *
+ * For Renoir, clock settings of dcn watermark are also fixed values.
+ * dc has implemented different flow for window driver:
+ * dc_hardware_init / dc_set_power_state
+ * dcn10_init_hw
+ * notify_wm_ranges
+ * set_wm_ranges
+ * -- Linux
+ * smu_set_watermarks_for_clock_ranges
+ * renoir_set_watermarks_table
+ * smu_write_watermarks_table
+ *
+ * For Linux,
+ * dc_hardware_init -> amdgpu_dm_init
+ * dc_set_power_state --> dm_resume
+ *
+ * therefore, this function applies to navi10/12/14 but not Renoir
+ * *
+ */
+ switch(adev->asic_type) {
+ case CHIP_NAVI10:
+ case CHIP_NAVI14:
+ case CHIP_NAVI12:
+ break;
+ default:
+ return 0;
+ }
+
+ mutex_lock(&smu->mutex);
+
+ /* pass data to smu controller */
+ if ((smu->watermarks_bitmap & WATERMARKS_EXIST) &&
+ !(smu->watermarks_bitmap & WATERMARKS_LOADED)) {
+ ret = smu_write_watermarks_table(smu);
+
+ if (ret) {
+ DRM_ERROR("Failed to update WMTABLE!\n");
+ return ret;
+ }
+ smu->watermarks_bitmap |= WATERMARKS_LOADED;
+ }
+
+ mutex_unlock(&smu->mutex);
+
+ return 0;
+}
+
 /**
  * dm_hw_init() - Initialize DC device
  * @handle: The base driver device containing the amdgpu_dm device.
@@ -1713,6 +1779,8 @@ static int dm_resume(void *handle)

  amdgpu_dm_irq_resume_late(adev);

+ amdgpu_dm_smu_write_watermarks_table(adev);
+
  return 0;
 }

--
2.17.1


From: Quan, Evan 
Sent: February 27, 2020 9:58 PM
To: Wu, Hersen ; amd-gfx@lists.freedesktop.org 

Cc: Feng, Kenneth ; Wu, Hersen 
Subject: RE: [PATCH 2/2] drm/amdgpu/display: navi1x copy dcn watermark clock 
settings to smu resume from s3

Thanks. But could you help to confirm whether this is correctly protected by 
"mutex_lock(>mutex)"?

-Original Message-
From: Hersen Wu 
Sent: Thursday, February 27, 2020 11:54 PM
To: amd-gfx@lists.freedesktop.org
Cc: Quan, Evan ; Feng, Kenneth ; Wu, 
Hersen 
Subject: [PATCH 2/2] drm/amdgpu/display: navi1x copy dcn watermark clock 
settings to smu resume from s3

 This interface is for dGPU Navi1x. Linux dc-pplib interface depends  on window 
driver dc implementation.

 For Navi1x, clock settings of dcn watermarks are fixed. the settings  should 
be passed to smu during boot up and resume from s3.
 boot up: dc calculate dcn watermark clock settings within dc_create,  
dcn20_resource_construct, then call pplib functions below to pass  the settings 
to smu:
 smu_set_watermarks_for_clock_ranges
 smu_set_watermarks_table
 navi10_set_watermarks_table
 smu_write_watermarks_table

 For Renoir, clock settings of dcn watermark are also fixed values.
 dc has implemented different flow for window driver:
 dc_hardware_init / dc_set_power_state
 dcn10_init_hw
 notify_wm_ranges
 set_wm_ranges

 For Linux
 

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Dave Airlie
On Sat, 29 Feb 2020 at 05:34, Eric Anholt  wrote:
>
> On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie  wrote:
> >
> > On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
> > >
> > > On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> > > > b) we probably need to take a large step back here.
> > > >
> > > > Look at this from a sponsor POV, why would I give X.org/fd.o
> > > > sponsorship money that they are just giving straight to google to pay
> > > > for hosting credits? Google are profiting in some minor way from these
> > > > hosting credits being bought by us, and I assume we aren't getting any
> > > > sort of discounts here. Having google sponsor the credits costs google
> > > > substantially less than having any other company give us money to do
> > > > it.
> > >
> > > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > > comparable in terms of what you get and what you pay for them.
> > > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > > services are cheaper, but then you need to find someone who is going
> > > to properly administer the various machines, install decent
> > > monitoring, make sure that more storage is provisioned when we need
> > > more storage (which is basically all the time), make sure that the
> > > hardware is maintained in decent shape (pretty sure one of the fd.o
> > > machines has had a drive in imminent-failure state for the last few
> > > months), etc.
> > >
> > > Given the size of our service, that's a much better plan (IMO) than
> > > relying on someone who a) isn't an admin by trade, b) has a million
> > > other things to do, and c) hasn't wanted to do it for the past several
> > > years. But as long as that's the resources we have, then we're paying
> > > the cloud tradeoff, where we pay more money in exchange for fewer
> > > problems.
> >
> > Admin for gitlab and CI is a full time role anyways. The system is
> > definitely not self sustaining without time being put in by you and
> > anholt still. If we have $75k to burn on credits, and it was diverted
> > to just pay an admin to admin the real hw + gitlab/CI would that not
> > be a better use of the money? I didn't know if we can afford $75k for
> > an admin, but suddenly we can afford it for gitlab credits?
>
> As I think about the time that I've spent at google in less than a
> year on trying to keep the lights on for CI and optimize our
> infrastructure in the current cloud environment, that's more than the
> entire yearly budget you're talking about here.  Saying "let's just
> pay for people to do more work instead of paying for full-service
> cloud" is not a cost optimization.
>
>
> > > Yes, we could federate everything back out so everyone runs their own
> > > builds and executes those. Tinderbox did something really similar to
> > > that IIRC; not sure if Buildbot does as well. Probably rules out
> > > pre-merge testing, mind.
> >
> > Why? does gitlab not support the model? having builds done in parallel
> > on runners closer to the test runners seems like it should be a thing.
> > I guess artifact transfer would cost less then as a result.
>
> Let's do some napkin math.  The biggest artifacts cost we have in Mesa
> is probably meson-arm64/meson-arm (60MB zipped from meson-arm64,
> downloaded by 4 freedreno and 6ish lava, about 100 pipelines/day,
> makes ~1.8TB/month ($180 or so).  We could build a local storage next
> to the lava dispatcher so that the artifacts didn't have to contain
> the rootfs that came from the container (~2/3 of the insides of the
> zip file), but that's another service to build and maintain.  Building
> the drivers once locally and storing it would save downloading the
> other ~1/3 of the inside of the zip file, but that requires a big
> enough system to do builds in time.
>
> I'm planning on doing a local filestore for google's lava lab, since I
> need to be able to move our xml files off of the lava DUTs to get the
> xml results we've become accustomed to, but this would not bubble up
> to being a priority for my time if I wasn't doing it anyway.  If it
> takes me a single day to set all this up (I estimate a couple of
> weeks), that costs my employer a lot more than sponsoring the costs of
> the inefficiencies of the system that has accumulated.

I'm not trying to knock the engineering works the CI contributors have
done at all, but I've never seen a real discussion about costs until
now. Engineers aren't accountants.

The thing we seem to be missing here is fiscal responsibility. I know
this email is us being fiscally responsible, but it's kinda after the
fact.

I cannot commit my employer to spending a large amount of money (> 0
actually) without a long and lengthy process with checks and bounds.
Can you?

The X.org board has budgets and procedures as well. I as a developer
of Mesa should not be able to commit the X.org foundation to spending
large amounts of money without checks and bounds.

The CI infrastructure lacks any checks and 

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Eric Anholt
On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie  wrote:
>
> On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
> >
> > On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> > > b) we probably need to take a large step back here.
> > >
> > > Look at this from a sponsor POV, why would I give X.org/fd.o
> > > sponsorship money that they are just giving straight to google to pay
> > > for hosting credits? Google are profiting in some minor way from these
> > > hosting credits being bought by us, and I assume we aren't getting any
> > > sort of discounts here. Having google sponsor the credits costs google
> > > substantially less than having any other company give us money to do
> > > it.
> >
> > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > comparable in terms of what you get and what you pay for them.
> > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > services are cheaper, but then you need to find someone who is going
> > to properly administer the various machines, install decent
> > monitoring, make sure that more storage is provisioned when we need
> > more storage (which is basically all the time), make sure that the
> > hardware is maintained in decent shape (pretty sure one of the fd.o
> > machines has had a drive in imminent-failure state for the last few
> > months), etc.
> >
> > Given the size of our service, that's a much better plan (IMO) than
> > relying on someone who a) isn't an admin by trade, b) has a million
> > other things to do, and c) hasn't wanted to do it for the past several
> > years. But as long as that's the resources we have, then we're paying
> > the cloud tradeoff, where we pay more money in exchange for fewer
> > problems.
>
> Admin for gitlab and CI is a full time role anyways. The system is
> definitely not self sustaining without time being put in by you and
> anholt still. If we have $75k to burn on credits, and it was diverted
> to just pay an admin to admin the real hw + gitlab/CI would that not
> be a better use of the money? I didn't know if we can afford $75k for
> an admin, but suddenly we can afford it for gitlab credits?

As I think about the time that I've spent at google in less than a
year on trying to keep the lights on for CI and optimize our
infrastructure in the current cloud environment, that's more than the
entire yearly budget you're talking about here.  Saying "let's just
pay for people to do more work instead of paying for full-service
cloud" is not a cost optimization.


> > Yes, we could federate everything back out so everyone runs their own
> > builds and executes those. Tinderbox did something really similar to
> > that IIRC; not sure if Buildbot does as well. Probably rules out
> > pre-merge testing, mind.
>
> Why? does gitlab not support the model? having builds done in parallel
> on runners closer to the test runners seems like it should be a thing.
> I guess artifact transfer would cost less then as a result.

Let's do some napkin math.  The biggest artifacts cost we have in Mesa
is probably meson-arm64/meson-arm (60MB zipped from meson-arm64,
downloaded by 4 freedreno and 6ish lava, about 100 pipelines/day,
makes ~1.8TB/month ($180 or so).  We could build a local storage next
to the lava dispatcher so that the artifacts didn't have to contain
the rootfs that came from the container (~2/3 of the insides of the
zip file), but that's another service to build and maintain.  Building
the drivers once locally and storing it would save downloading the
other ~1/3 of the inside of the zip file, but that requires a big
enough system to do builds in time.
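A quick sanity check of the napkin math above, assuming ~10 artifact
downloads per pipeline (4 freedreno + ~6 lava jobs) and roughly $0.10/GB
cloud egress pricing:

  60 MB x 10 downloads x 100 pipelines/day x 30 days ~= 1.8 TB/month
  1.8 TB/month x $0.10/GB ~= $180/month

which lines up with the "$180 or so" figure above.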

I'm planning on doing a local filestore for google's lava lab, since I
need to be able to move our xml files off of the lava DUTs to get the
xml results we've become accustomed to, but this would not bubble up
to being a priority for my time if I wasn't doing it anyway.  If it
takes me a single day to set all this up (I estimate a couple of
weeks), that costs my employer a lot more than sponsoring the costs of
the inefficiencies of the system that has accumulated.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Kristian Høgsberg
On Thu, Feb 27, 2020 at 7:38 PM Dave Airlie  wrote:
>
> On Fri, 28 Feb 2020 at 07:27, Daniel Vetter  wrote:
> >
> > Hi all,
> >
> > You might have read the short take in the X.org board meeting minutes
> > already, here's the long version.
> >
> > The good news: gitlab.fd.o has become very popular with our
> > communities, and is used extensively. This especially includes all the
> > CI integration. Modern development process and tooling, yay!
> >
> > The bad news: The cost in growth has also been tremendous, and it's
> > breaking our bank account. With reasonable estimates for continued
> > growth we're expecting hosting expenses totalling 75k USD this year,
> > and 90k USD next year. With the current sponsors we've set up we can't
> > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > without any of the CI features enabled would total 30k USD, which is
> > within X.org's ability to support through various sponsorships, mostly
> > through XDC.
> >
> > Note that X.org does no longer sponsor any CI runners themselves,
> > we've stopped that. The huge additional expenses are all just in
> > storing and serving build artifacts and images to outside CI runners
> > sponsored by various companies. A related topic is that with the
> > growth in fd.o it's becoming infeasible to maintain it all on
> > volunteer admin time. X.org is therefore also looking for admin
> > sponsorship, at least medium term.
> >
> > Assuming that we want cash flow reserves for one year of gitlab.fd.o
> > (without CI support) and a trimmed XDC and assuming no sponsor payment
> > meanwhile, we'd have to cut CI services somewhere between May and June
> > this year. The board is of course working on acquiring sponsors, but
> > filling a shortfall of this magnitude is neither easy nor quick work,
> > and we therefore decided to give an early warning as soon as possible.
> > Any help in finding sponsors for fd.o is very much appreciated.
>
> a) Ouch.
>
> b) we probably need to take a large step back here.

If we're taking a step back here, I also want to recognize what a
tremendous success this has been so far and thank everybody involved
for building something so useful. Between gitlab and the CI, our
workflow has improved and code quality has gone up.  I don't have
anything useful to add to the technical discussion, except that
it seems pretty standard engineering practice to build a system,
observe it and identify and eliminate bottlenecks. Planning never
hurts, of course, but I don't think anybody could have realistically
modeled and projected the cost of this infrastructure as it's grown
organically and fast.

Kristian

> Look at this from a sponsor POV, why would I give X.org/fd.o
> sponsorship money that they are just giving straight to google to pay
> for hosting credits? Google are profiting in some minor way from these
> hosting credits being bought by us, and I assume we aren't getting any
> sort of discounts here. Having google sponsor the credits costs google
> substantially less than having any other company give us money to do
> it.
>
> If our current CI architecture is going to burn this amount of money a
> year and we hadn't worked this out in advance of deploying it then I
> suggest the system should be taken offline until we work out what a
> sustainable system would look like within the budget we have, whether
> that be never transferring containers and build artifacts from the
> google network, just having local runner/build combos etc.
>
> Dave.
> ___
> Intel-gfx mailing list
> intel-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH][next] drm/amdkfd: fix indentation issue

2020-02-28 Thread Colin King
From: Colin Ian King 

There is a statement that is indented with spaces instead of a tab.
Replace spaces with a tab.

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index f547e4769831..5db42814dd51 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -490,7 +490,7 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
dev->node_props.num_sdma_queues_per_engine);
sysfs_show_32bit_prop(buffer, "num_cp_queues",
dev->node_props.num_cp_queues);
-sysfs_show_64bit_prop(buffer, "unique_id",
+   sysfs_show_64bit_prop(buffer, "unique_id",
dev->node_props.unique_id);
 
if (dev->gpu) {
-- 
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Jan Engelhardt

On Friday 2020-02-28 08:59, Daniel Stone wrote:
>
>I believe that in January, we had $2082 of network cost (almost
>entirely egress; ingress is basically free) and $1750 of
>cloud-storage cost (almost all of which was download). That's based
>on 16TB of cloud-storage (CI artifacts, container images, file
>uploads, Git LFS) egress and 17.9TB of other egress (the web service
>itself, repo activity). Projecting that out [×12 for a year] gives
>us roughly $45k of network activity alone,

I had come to a similar conclusion a few years back: it is not very
economical to run ephemeral buildroots (or anything like them) between
two (or more) "significant locations" when one end is located in
a large cloud datacenter like EC2/AWS/etc.

For such use cases, my peers and I have used (other) offerings with
50 TB of free network traffic per month, and yes, that may have
entailed doing more adminning than elsewhere - but an admin
appreciates $2000 a lot more than a corporation does.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Erik Faye-Lund
On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:
> On 28/02/2020 11:28, Erik Faye-Lund wrote:
> > On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> > > On Fri, 28 Feb 2020 at 07:27, Daniel Vetter <
> > > daniel.vet...@ffwll.ch>
> > > wrote:
> > > > Hi all,
> > > > 
> > > > You might have read the short take in the X.org board meeting
> > > > minutes
> > > > already, here's the long version.
> > > > 
> > > > The good news: gitlab.fd.o has become very popular with our
> > > > communities, and is used extensively. This especially includes
> > > > all
> > > > the
> > > > CI integration. Modern development process and tooling, yay!
> > > > 
> > > > The bad news: The cost in growth has also been tremendous, and
> > > > it's
> > > > breaking our bank account. With reasonable estimates for
> > > > continued
> > > > growth we're expecting hosting expenses totalling 75k USD this
> > > > year,
> > > > and 90k USD next year. With the current sponsors we've set up
> > > > we
> > > > can't
> > > > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > > > without any of the CI features enabled would total 30k USD,
> > > > which
> > > > is
> > > > within X.org's ability to support through various sponsorships,
> > > > mostly
> > > > through XDC.
> > > > 
> > > > Note that X.org does no longer sponsor any CI runners
> > > > themselves,
> > > > we've stopped that. The huge additional expenses are all just
> > > > in
> > > > storing and serving build artifacts and images to outside CI
> > > > runners
> > > > sponsored by various companies. A related topic is that with
> > > > the
> > > > growth in fd.o it's becoming infeasible to maintain it all on
> > > > volunteer admin time. X.org is therefore also looking for admin
> > > > sponsorship, at least medium term.
> > > > 
> > > > Assuming that we want cash flow reserves for one year of
> > > > gitlab.fd.o
> > > > (without CI support) and a trimmed XDC and assuming no sponsor
> > > > payment
> > > > meanwhile, we'd have to cut CI services somewhere between May
> > > > and
> > > > June
> > > > this year. The board is of course working on acquiring
> > > > sponsors,
> > > > but
> > > > filling a shortfall of this magnitude is neither easy nor quick
> > > > work,
> > > > and we therefore decided to give an early warning as soon as
> > > > possible.
> > > > Any help in finding sponsors for fd.o is very much appreciated.
> > > a) Ouch.
> > > 
> > > b) we probably need to take a large step back here.
> > > 
> > I kinda agree, but maybe the step doesn't have to be *too* large?
> > 
> > I wonder if we could solve this by restructuring the project a bit.
> > I'm
> > talking purely from a Mesa point of view here, so it might not
> > solve
> > the full problem, but:
> > 
> > 1. It feels silly that we need to test changes to e.g the i965
> > driver
> > on dragonboards. We only have a big "do not run CI at all" escape-
> > hatch.
> 
> Yeah, changes on vulkan drivers or backend compilers should be
> fairly 
> sandboxed.
> 
> We also have tools that only work for intel stuff, that should never 
> trigger anything on other people's HW.
> 
> Could something be worked out using the tags?
> 

I think so! We have the pre-defined environment variable
CI_MERGE_REQUEST_LABELS, and we can do variable conditions:

https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables

That sounds like a pretty neat middle-ground to me. I just hope that
new pipelines are triggered if new labels are added, because not
everyone is allowed to set labels, and sometimes people forget...

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Erik Faye-Lund
On Fri, 2020-02-28 at 10:43 +, Daniel Stone wrote:
> On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund
>  wrote:
> > On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:
> > > Yeah, changes on vulkan drivers or backend compilers should be
> > > fairly
> > > sandboxed.
> > > 
> > > We also have tools that only work for intel stuff, that should
> > > never
> > > trigger anything on other people's HW.
> > > 
> > > Could something be worked out using the tags?
> > 
> > I think so! We have the pre-defined environment variable
> > CI_MERGE_REQUEST_LABELS, and we can do variable conditions:
> > 
> > https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables
> > 
> > That sounds like a pretty neat middle-ground to me. I just hope
> > that
> > new pipelines are triggered if new labels are added, because not
> > everyone is allowed to set labels, and sometimes people forget...
> 
> There's also this which is somewhat more robust:
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569
> 
> 

I'm not sure it's more robust, but yeah, that's a useful tool too.

The reason I'm skeptical about the robustness is that we'll miss
testing if this misses a path. That can of course be fixed by testing
everything once things are in master, and fixing up that list when
something breaks on master.

The person who wrote a change knows more about the intricacies of the
changes than a computer will ever do. But humans are also good at
making mistakes, so I'm not sure which one is better. Maybe the union
of both?

As long as we have both rigorous testing after something landed in
master (doesn't nessecarily need to happen right after, but for now
that's probably fine), as well as a reasonable heuristic for what
testing is needed pre-merge, I think we're good.

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Erik Faye-Lund
On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
> wrote:
> > Hi all,
> > 
> > You might have read the short take in the X.org board meeting
> > minutes
> > already, here's the long version.
> > 
> > The good news: gitlab.fd.o has become very popular with our
> > communities, and is used extensively. This especially includes all
> > the
> > CI integration. Modern development process and tooling, yay!
> > 
> > The bad news: The cost in growth has also been tremendous, and it's
> > breaking our bank account. With reasonable estimates for continued
> > growth we're expecting hosting expenses totalling 75k USD this
> > year,
> > and 90k USD next year. With the current sponsors we've set up we
> > can't
> > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > without any of the CI features enabled would total 30k USD, which
> > is
> > within X.org's ability to support through various sponsorships,
> > mostly
> > through XDC.
> > 
> > Note that X.org does no longer sponsor any CI runners themselves,
> > we've stopped that. The huge additional expenses are all just in
> > storing and serving build artifacts and images to outside CI
> > runners
> > sponsored by various companies. A related topic is that with the
> > growth in fd.o it's becoming infeasible to maintain it all on
> > volunteer admin time. X.org is therefore also looking for admin
> > sponsorship, at least medium term.
> > 
> > Assuming that we want cash flow reserves for one year of
> > gitlab.fd.o
> > (without CI support) and a trimmed XDC and assuming no sponsor
> > payment
> > meanwhile, we'd have to cut CI services somewhere between May and
> > June
> > this year. The board is of course working on acquiring sponsors,
> > but
> > filling a shortfall of this magnitude is neither easy nor quick
> > work,
> > and we therefore decided to give an early warning as soon as
> > possible.
> > Any help in finding sponsors for fd.o is very much appreciated.
> 
> a) Ouch.
> 
> b) we probably need to take a large step back here.
> 

I kinda agree, but maybe the step doesn't have to be *too* large?

I wonder if we could solve this by restructuring the project a bit. I'm
talking purely from a Mesa point of view here, so it might not solve
the full problem, but:

1. It feels silly that we need to test changes to e.g. the i965 driver
on dragonboards. We only have a big "do not run CI at all" escape-
hatch.

2. A lot of us are working for a company that can probably pay for
their own needs in terms of CI. Perhaps moving some costs "up front" to
the company that needs it can make the future of CI for those who can't
do this 

3. I think we need a much more detailed break-down of the cost to make
educated changes. For instance, how expensive is Docker image
uploads/downloads (e.g intermediary artifacts) compared to build logs
and final test-results? What kind of artifacts?

One suggestion would be to do something more similar to what the kernel
does, and separate into different repos for different subsystems. This
could allow us to have separate testing-pipelines for these repos,
which would mean that for instance a change to RADV didn't trigger a
full Panfrost test-run.

This would probably require us to accept using a more branch-heavy
work-flow. I don't personally think that would be a bad thing.

But this is all kinda based on an assumption that running hardware-
testing is the expensive part. I think that's quite possibly the case,
but some more numbers would be helpful. I mean, it might turn out that
just throwing up a Docker cache inside the organizations that host
runners might be sufficient for all I know...

We could also do stuff like reducing the number of tests we run on each
commit, and punting some testing to a per-weekend test-run or something
like that. We don't *need* to know about every problem up front, just
the stuff that's about to be released, really. The other stuff is just
nice to have. If it's too expensive, I would say drop it.

I would really hope that we can consider approaches like this before we
throw out the baby with the bathwater...

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH][next] drm/amd/display: fix indentation issue on a hunk of code

2020-02-28 Thread Colin King
From: Colin Ian King 

There are multiple statements that are indented incorrectly. Add
in the missing tabs.

Signed-off-by: Colin Ian King 
---
 .../gpu/drm/amd/display/dc/calcs/dce_calcs.c  | 46 +--
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
index 5d081c42e81b..2c6db379afae 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
@@ -3265,33 +3265,33 @@ bool bw_calcs(struct dc_context *ctx,
bw_fixed_to_int(bw_mul(data->
stutter_exit_watermark[9], 
bw_int_to_fixed(1000)));
 
-   calcs_output->stutter_entry_wm_ns[0].b_mark =
-   bw_fixed_to_int(bw_mul(data->
-   stutter_entry_watermark[4], 
bw_int_to_fixed(1000)));
-   calcs_output->stutter_entry_wm_ns[1].b_mark =
-   bw_fixed_to_int(bw_mul(data->
-   stutter_entry_watermark[5], 
bw_int_to_fixed(1000)));
-   calcs_output->stutter_entry_wm_ns[2].b_mark =
-   bw_fixed_to_int(bw_mul(data->
-   stutter_entry_watermark[6], 
bw_int_to_fixed(1000)));
-   if (ctx->dc->caps.max_slave_planes) {
-   calcs_output->stutter_entry_wm_ns[3].b_mark =
+   calcs_output->stutter_entry_wm_ns[0].b_mark =
bw_fixed_to_int(bw_mul(data->
-   stutter_entry_watermark[0], 
bw_int_to_fixed(1000)));
-   calcs_output->stutter_entry_wm_ns[4].b_mark =
+   stutter_entry_watermark[4], 
bw_int_to_fixed(1000)));
+   calcs_output->stutter_entry_wm_ns[1].b_mark =
bw_fixed_to_int(bw_mul(data->
-   stutter_entry_watermark[1], 
bw_int_to_fixed(1000)));
-   } else {
-   calcs_output->stutter_entry_wm_ns[3].b_mark =
+   stutter_entry_watermark[5], 
bw_int_to_fixed(1000)));
+   calcs_output->stutter_entry_wm_ns[2].b_mark =
bw_fixed_to_int(bw_mul(data->
-   stutter_entry_watermark[7], 
bw_int_to_fixed(1000)));
-   calcs_output->stutter_entry_wm_ns[4].b_mark =
+   stutter_entry_watermark[6], 
bw_int_to_fixed(1000)));
+   if (ctx->dc->caps.max_slave_planes) {
+   calcs_output->stutter_entry_wm_ns[3].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+   stutter_entry_watermark[0], 
bw_int_to_fixed(1000)));
+   calcs_output->stutter_entry_wm_ns[4].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+   stutter_entry_watermark[1], 
bw_int_to_fixed(1000)));
+   } else {
+   calcs_output->stutter_entry_wm_ns[3].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+   stutter_entry_watermark[7], 
bw_int_to_fixed(1000)));
+   calcs_output->stutter_entry_wm_ns[4].b_mark =
+   bw_fixed_to_int(bw_mul(data->
+   stutter_entry_watermark[8], 
bw_int_to_fixed(1000)));
+   }
+   calcs_output->stutter_entry_wm_ns[5].b_mark =
bw_fixed_to_int(bw_mul(data->
-   stutter_entry_watermark[8], 
bw_int_to_fixed(1000)));
-   }
-   calcs_output->stutter_entry_wm_ns[5].b_mark =
-   bw_fixed_to_int(bw_mul(data->
-   stutter_entry_watermark[9], 
bw_int_to_fixed(1000)));
+   stutter_entry_watermark[9], 
bw_int_to_fixed(1000)));
 
calcs_output->urgent_wm_ns[0].b_mark =
bw_fixed_to_int(bw_mul(data->
-- 
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Erik Faye-Lund
On Fri, 2020-02-28 at 10:47 +0100, Daniel Vetter wrote:
> On Fri, Feb 28, 2020 at 10:29 AM Erik Faye-Lund
>  wrote:
> > On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> > > On Fri, 28 Feb 2020 at 07:27, Daniel Vetter <
> > > daniel.vet...@ffwll.ch>
> > > wrote:
> > > > Hi all,
> > > > 
> > > > You might have read the short take in the X.org board meeting
> > > > minutes
> > > > already, here's the long version.
> > > > 
> > > > The good news: gitlab.fd.o has become very popular with our
> > > > communities, and is used extensively. This especially includes
> > > > all
> > > > the
> > > > CI integration. Modern development process and tooling, yay!
> > > > 
> > > > The bad news: The cost in growth has also been tremendous, and
> > > > it's
> > > > breaking our bank account. With reasonable estimates for
> > > > continued
> > > > growth we're expecting hosting expenses totalling 75k USD this
> > > > year,
> > > > and 90k USD next year. With the current sponsors we've set up
> > > > we
> > > > can't
> > > > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > > > without any of the CI features enabled would total 30k USD,
> > > > which
> > > > is
> > > > within X.org's ability to support through various sponsorships,
> > > > mostly
> > > > through XDC.
> > > > 
> > > > Note that X.org does no longer sponsor any CI runners
> > > > themselves,
> > > > we've stopped that. The huge additional expenses are all just
> > > > in
> > > > storing and serving build artifacts and images to outside CI
> > > > runners
> > > > sponsored by various companies. A related topic is that with
> > > > the
> > > > growth in fd.o it's becoming infeasible to maintain it all on
> > > > volunteer admin time. X.org is therefore also looking for admin
> > > > sponsorship, at least medium term.
> > > > 
> > > > Assuming that we want cash flow reserves for one year of
> > > > gitlab.fd.o
> > > > (without CI support) and a trimmed XDC and assuming no sponsor
> > > > payment
> > > > meanwhile, we'd have to cut CI services somewhere between May
> > > > and
> > > > June
> > > > this year. The board is of course working on acquiring
> > > > sponsors,
> > > > but
> > > > filling a shortfall of this magnitude is neither easy nor quick
> > > > work,
> > > > and we therefore decided to give an early warning as soon as
> > > > possible.
> > > > Any help in finding sponsors for fd.o is very much appreciated.
> > > 
> > > a) Ouch.
> > > 
> > > b) we probably need to take a large step back here.
> > > 
> > 
> > I kinda agree, but maybe the step doesn't have to be *too* large?
> > 
> > I wonder if we could solve this by restructuring the project a bit.
> > I'm
> > talking purely from a Mesa point of view here, so it might not
> > solve
> > the full problem, but:
> > 
> > 1. It feels silly that we need to test changes to e.g the i965
> > driver
> > on dragonboards. We only have a big "do not run CI at all" escape-
> > hatch.
> > 
> > 2. A lot of us are working for a company that can probably pay for
> > their own needs in terms of CI. Perhaps moving some costs "up
> > front" to
> > the company that needs it can make the future of CI for those who
> > can't
> > do this
> > 
> > 3. I think we need a much more detailed break-down of the cost to
> > make
> > educated changes. For instance, how expensive is Docker image
> > uploads/downloads (e.g intermediary artifacts) compared to build
> > logs
> > and final test-results? What kind of artifacts?
> 
> We have logs somewhere, but no one yet got around to analyzing that.
> Which will be quite a bit of work to do since the cloud storage is
> totally disconnected from the gitlab front-end, making the connection
> to which project or CI job caused something is going to require
> scripting. Volunteers definitely very much welcome I think.
> 

Fair enough, but just keep in mind that the same principle as in
software optimization applies here: working blindly reduces the impact.
So if we want to fix the current situation, this is going to have to be
a priority, I think.

> > One suggestion would be to do something more similar to what the
> > kernel
> > does, and separate into different repos for different subsystems.
> > This
> > could allow us to have separate testing-pipelines for these repos,
> > which would mean that for instance a change to RADV didn't trigger
> > a
> > full Panfrost test-run.
> 
> Uh as someone who lives the kernel multi-tree model daily, there's a
> _lot_ of pain involved.

Could you please elaborate a bit? We're not the kernel, so I'm not sure
all of the kernel-pains apply to us. But we probably have other pains
as well ;-)

But yeah, it might be better to take smaller steps first, and see if
that helps.

> I think much better to look at filtering out
> CI targets for when nothing relevant happened. But that gets somewhat
> tricky, since "nothing relevant" is always only relative to some
> baseline, so bit of scripting and all involved to make sure you don't
> run stuff too 

Re: [Intel-gfx] [Mesa-dev] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Rob Clark
On Fri, Feb 28, 2020 at 3:43 AM Michel Dänzer  wrote:
>
> On 2020-02-28 10:28 a.m., Erik Faye-Lund wrote:
> >
> > We could also do stuff like reducing the amount of tests we run on each
> > commit, and punt some testing to a per-weekend test-run or someting
> > like that. We don't *need* to know about every problem up front, just
> > the stuff that's about to be released, really. The other stuff is just
> > nice to have. If it's too expensive, I would say drop it.
>
> I don't agree that pre-merge testing is just nice to have. A problem
> which is only caught after it lands in mainline has a much bigger impact
> than one which is already caught earlier.
>

one thought.. since with mesa+margebot we effectively get at least
two(ish) CI runs per MR, ie. one when it is initially pushed, and one
when margebot rebases and tries to merge, could we leverage this to
have trimmed down pre-margebot CI which tries to just target affected
drivers, with margebot doing a full CI run (when it is potentially
batching together multiple MRs)?

Seems like a way to reduce our CI runs with a good safety net to
prevent things from slipping through the cracks.

(Not sure how much that would help reduce bandwidth costs, but I guess
it should help a bit.)

BR,
-R
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 2/2] drm/amd/display: dc_link: code clean up on detect_dp function

2020-02-28 Thread Melissa Wen
Remove code-style issues from the detect_dp function, as suggested by
checkpatch.pl.

CHECK: Lines should not end with a '('
WARNING: Missing a blank line after declarations
WARNING: line over 80 characters
CHECK: Alignment should match open parenthesis

Signed-off-by: Melissa Wen 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 35 +--
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index eb9894e416ed..549bea1d725c 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -585,14 +585,14 @@ static void read_current_link_settings_on_detect(struct 
dc_link *link)
LINK_SPREAD_05_DOWNSPREAD_30KHZ : LINK_SPREAD_DISABLED;
 }
 
-static bool detect_dp(
-   struct dc_link *link,
-   struct display_sink_capability *sink_caps,
-   bool *converter_disable_audio,
-   struct audio_support *audio_support,
-   enum dc_detect_reason reason)
+static bool detect_dp(struct dc_link *link,
+ struct display_sink_capability *sink_caps,
+ bool *converter_disable_audio,
+ struct audio_support *audio_support,
+ enum dc_detect_reason reason)
 {
bool boot = false;
+
sink_caps->signal = link_detect_sink(link, reason);
sink_caps->transaction_type =
get_ddc_transaction_type(sink_caps->signal);
@@ -609,9 +609,8 @@ static bool detect_dp(
sink_caps->signal = SIGNAL_TYPE_DISPLAY_PORT_MST;
link->type = dc_connection_mst_branch;
 
-   dal_ddc_service_set_transaction_type(
-   link->ddc,
-   
sink_caps->transaction_type);
+   dal_ddc_service_set_transaction_type(link->ddc,
+
sink_caps->transaction_type);
 
/*
 * This call will initiate MST topology discovery. Which
@@ -640,13 +639,10 @@ static bool detect_dp(
if (reason == DETECT_REASON_BOOT)
boot = true;
 
-   dm_helpers_dp_update_branch_info(
-   link->ctx,
-   link);
+   dm_helpers_dp_update_branch_info(link->ctx, link);
 
-   if (!dm_helpers_dp_mst_start_top_mgr(
-   link->ctx,
-   link, boot)) {
+   if (!dm_helpers_dp_mst_start_top_mgr(link->ctx,
+link, boot)) {
/* MST not supported */
link->type = dc_connection_single;
sink_caps->signal = SIGNAL_TYPE_DISPLAY_PORT;
@@ -654,7 +650,7 @@ static bool detect_dp(
}
 
if (link->type != dc_connection_mst_branch &&
-   is_dp_active_dongle(link)) {
+   is_dp_active_dongle(link)) {
/* DP active dongles */
link->type = dc_connection_active_dongle;
if (!link->dpcd_caps.sink_count.bits.SINK_COUNT) {
@@ -665,14 +661,15 @@ static bool detect_dp(
return true;
}
 
-   if (link->dpcd_caps.dongle_type != 
DISPLAY_DONGLE_DP_HDMI_CONVERTER)
+   if (link->dpcd_caps.dongle_type !=
+   DISPLAY_DONGLE_DP_HDMI_CONVERTER)
*converter_disable_audio = true;
}
} else {
/* DP passive dongles */
sink_caps->signal = dp_passive_dongle_detection(link->ddc,
-   sink_caps,
-   audio_support);
+   sink_caps,
+   audio_support);
}
 
return true;
-- 
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 1/2] drm/amd/display: dc_link: code clean up on enable_link_dp function

2020-02-28 Thread Melissa Wen
Coding style clean-up of the enable_link_dp function, as suggested by
checkpatch.pl:

CHECK: Lines should not end with a '('
WARNING: line over 80 characters
WARNING: suspect code indent for conditional statements (8, 24)
CHECK: braces {} should be used on all arms of this statement
ERROR: else should follow close brace '}'
CHECK: Comparison to NULL could be written
   "link->preferred_training_settings.fec_enable"

Signed-off-by: Melissa Wen 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 29 +--
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index 02e1ad318203..eb9894e416ed 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -1498,9 +1498,8 @@ static void enable_stream_features(struct pipe_ctx 
*pipe_ctx)
}
 }
 
-static enum dc_status enable_link_dp(
-   struct dc_state *state,
-   struct pipe_ctx *pipe_ctx)
+static enum dc_status enable_link_dp(struct dc_state *state,
+struct pipe_ctx *pipe_ctx)
 {
struct dc_stream_state *stream = pipe_ctx->stream;
enum dc_status status;
@@ -1532,7 +1531,8 @@ static enum dc_status enable_link_dp(
pipe_ctx->stream_res.pix_clk_params.requested_sym_clk =
link_settings.link_rate * LINK_RATE_REF_FREQ_IN_KHZ;
if (state->clk_mgr && !apply_seamless_boot_optimization)
-   state->clk_mgr->funcs->update_clocks(state->clk_mgr, state, 
false);
+   state->clk_mgr->funcs->update_clocks(state->clk_mgr,
+state, false);
 
// during mode switch we do DP_SET_POWER off then on, and OUI is lost
dpcd_set_source_specific_data(link);
@@ -1540,21 +1540,20 @@ static enum dc_status enable_link_dp(
skip_video_pattern = true;
 
if (link_settings.link_rate == LINK_RATE_LOW)
-   skip_video_pattern = false;
-
-   if (perform_link_training_with_retries(
-   &link_settings,
-   skip_video_pattern,
-   LINK_TRAINING_ATTEMPTS,
-   pipe_ctx,
-   pipe_ctx->stream->signal)) {
+   skip_video_pattern = false;
+
+   if (perform_link_training_with_retries(&link_settings,
+  skip_video_pattern,
+  LINK_TRAINING_ATTEMPTS,
+  pipe_ctx,
+  pipe_ctx->stream->signal)) {
link->cur_link_settings = link_settings;
status = DC_OK;
-   }
-   else
+   } else {
status = DC_FAIL_DP_LINK_TRAINING;
+   }
 
-   if (link->preferred_training_settings.fec_enable != NULL)
+   if (link->preferred_training_settings.fec_enable)
fec_enable = *link->preferred_training_settings.fec_enable;
else
fec_enable = true;
-- 
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2 0/2] drm/amd/display: dc_link: cleaning up some code style issues

2020-02-28 Thread Melissa Wen
This patchset fixes some coding style issues in dc_link for readability
and cleans up warnings. Changes suggested by checkpatch.pl.

Changes in v2:
- Apply patches to the right amdgpu repository.
- Remove unnecessary {} added in the previous version.

Melissa Wen (2):
  drm/amd/display: dc_link: code clean up on enable_link_dp function
  drm/amd/display: dc_link: code clean up on detect_dp function

 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 64 +--
 1 file changed, 30 insertions(+), 34 deletions(-)

-- 
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 1/4] drm/amdgpu: set compute queue priority at mqd_init

2020-02-28 Thread Alex Deucher
On Fri, Feb 28, 2020 at 9:36 AM Nirmoy Das  wrote:
>
> We were changing compute ring priority while rings were being used
> before every job submission which is not recommended. This patch
> sets compute queue priority at mqd initialization for gfx8, gfx9 and
> gfx10.
>
> Policy: make queue 0 of each pipe as high priority compute queue
>
> High/normal priority compute sched lists are generated from set of high/normal
> priority compute queues. At context creation, entity of compute queue
> get a sched list from high or normal priority depending on ctx->priority
>
> Signed-off-by: Nirmoy Das 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  4 ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c  | 40 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  8 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  | 13 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  6 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  1 +
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 19 +++
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 23 --
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 20 
>  9 files changed, 113 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index f397ff97b4e4..8304d0c87899 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1205,7 +1205,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
> struct drm_sched_entity *entity = p->entity;
> enum drm_sched_priority priority;
> -   struct amdgpu_ring *ring;
> struct amdgpu_bo_list_entry *e;
> struct amdgpu_job *job;
> uint64_t seq;
> @@ -1258,9 +1257,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> priority = job->base.s_priority;
> drm_sched_entity_push_job(>base, entity);
>
> -   ring = to_amdgpu_ring(entity->rq->sched);
> -   amdgpu_ring_priority_get(ring, priority);
> -
> amdgpu_vm_move_to_lru_tail(p->adev, >vm);
>
> ttm_eu_fence_buffer_objects(>ticket, >validated, p->fence);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 94a6c42f29ea..b21771b37300 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -85,8 +85,8 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, 
> const u32 hw_ip, const
> num_scheds = 1;
> break;
> case AMDGPU_HW_IP_COMPUTE:
> -   scheds = adev->gfx.compute_sched;
> -   num_scheds = adev->gfx.num_compute_sched;
> +   scheds = adev->gfx.compute_prio_sched[priority];
> +   num_scheds = adev->gfx.num_compute_sched[priority];
> break;
> case AMDGPU_HW_IP_DMA:
> scheds = adev->sdma.sdma_sched;
> @@ -628,20 +628,46 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)
> mutex_destroy(>lock);
>  }
>
> +
> +static void amdgpu_ctx_init_compute_sched(struct amdgpu_device *adev)
> +{
> +   int num_compute_sched_normal = 0;
> +   int num_compute_sched_high = AMDGPU_MAX_COMPUTE_RINGS - 1;
> +   int i;
> +
> +
> +   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> +   if (adev->gfx.compute_ring[i].high_priority)
> +   adev->gfx.compute_sched[num_compute_sched_normal++] =
> +   >gfx.compute_ring[i].sched;
> +   else
> +   adev->gfx.compute_sched[num_compute_sched_high--] =
> +   >gfx.compute_ring[i].sched;
> +   }
> +
> +   for (i = DRM_SCHED_PRIORITY_MIN; i <= DRM_SCHED_PRIORITY_NORMAL; i++) 
> {
> +   adev->gfx.compute_prio_sched[i] = >gfx.compute_sched[0];
> +   adev->gfx.num_compute_sched[i] = num_compute_sched_normal;
> +   }
> +
> +   for (i = DRM_SCHED_PRIORITY_NORMAL + 1; i < DRM_SCHED_PRIORITY_MAX; 
> i++) {
> +   adev->gfx.compute_prio_sched[i] =
> +   >gfx.compute_sched[num_compute_sched_high - 1];
> +   adev->gfx.num_compute_sched[i] =
> +   adev->gfx.num_compute_rings - 
> num_compute_sched_normal;
> +   }
> +}
> +
>  void amdgpu_ctx_init_sched(struct amdgpu_device *adev)
>  {
> int i, j;
>
> +   amdgpu_ctx_init_compute_sched(adev);
> for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
> adev->gfx.gfx_sched[i] = >gfx.gfx_ring[i].sched;
> adev->gfx.num_gfx_sched++;
> }
>
> -   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> -   adev->gfx.compute_sched[i] = >gfx.compute_ring[i].sched;
> -   adev->gfx.num_compute_sched++;
> 

[PATCH v2 2/4] drm/scheduler: implement a function to modify sched list

2020-02-28 Thread Nirmoy Das
Implement drm_sched_entity_modify_sched(), which can replace an entity's
existing sched_list with a different one. This is going to be helpful when
userspace changes the priority of a ctx/entity: the driver can then switch
to the corresponding hw sched list for that priority.
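
For context, a minimal caller sketch (illustrative only: apart from the two
scheduler API calls, the names below are invented for the example; the real
amdgpu wiring lands in patch 3/4 of this series):

#include <drm/gpu_scheduler.h>

/* Re-point an entity at a different set of hw schedulers and raise the
 * priority it runs at inside those schedulers. */
static void example_switch_entity_scheds(struct drm_sched_entity *entity,
                                         struct drm_gpu_scheduler **scheds,
                                         unsigned int num_scheds,
                                         enum drm_sched_priority prio)
{
        drm_sched_entity_modify_sched(entity, scheds, num_scheds);
        drm_sched_entity_set_priority(entity, prio);
}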

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 19 +++
 include/drm/gpu_scheduler.h  |  4 
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 63bccd201b97..b94312154e56 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -83,6 +83,25 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 }
 EXPORT_SYMBOL(drm_sched_entity_init);

+/**
+ * drm_sched_entity_modify_sched - Modify sched of an entity
+ *
+ * @entity: scheduler entity to modify
+ * @sched_list: the list of new drm scheds which will replace
+ * existing entity->sched_list
+ * @num_sched_list: number of drm sched in sched_list
+ */
+void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
+ struct drm_gpu_scheduler **sched_list,
+ unsigned int num_sched_list)
+{
+   WARN_ON(!num_sched_list || !sched_list);
+
+   entity->sched_list = sched_list;
+   entity->num_sched_list = num_sched_list;
+}
+EXPORT_SYMBOL(drm_sched_entity_modify_sched);
+
 /**
  * drm_sched_entity_is_idle - Check if entity is idle
  *
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 589be851f8a1..f70a84aaaf7a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -297,6 +297,10 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
 int drm_sched_job_init(struct drm_sched_job *job,
   struct drm_sched_entity *entity,
   void *owner);
+void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
+ struct drm_gpu_scheduler **sched_list,
+  unsigned int num_sched_list);
+
 void drm_sched_job_cleanup(struct drm_sched_job *job);
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
 void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job 
*bad);
--
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 4/4] drm/amdgpu: remove unused functions

2020-02-28 Thread Nirmoy Das
amdgpu now statically sets the priority for compute queues
at initialization, so remove all the functions
responsible for changing compute queue priority dynamically.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  70 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   7 --
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c|  99 --
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 100 ---
 4 files changed, 276 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index ca6b52054b4b..a7e1d0425ed0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -150,76 +150,6 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
ring->funcs->end_use(ring);
 }
 
-/**
- * amdgpu_ring_priority_put - restore a ring's priority
- *
- * @ring: amdgpu_ring structure holding the information
- * @priority: target priority
- *
- * Release a request for executing at @priority
- */
-void amdgpu_ring_priority_put(struct amdgpu_ring *ring,
- enum drm_sched_priority priority)
-{
-   int i;
-
-   if (!ring->funcs->set_priority)
-   return;
-
-   if (atomic_dec_return(&ring->num_jobs[priority]) > 0)
-   return;
-
-   /* no need to restore if the job is already at the lowest priority */
-   if (priority == DRM_SCHED_PRIORITY_NORMAL)
-   return;
-
-   mutex_lock(&ring->priority_mutex);
-   /* something higher prio is executing, no need to decay */
-   if (ring->priority > priority)
-   goto out_unlock;
-
-   /* decay priority to the next level with a job available */
-   for (i = priority; i >= DRM_SCHED_PRIORITY_MIN; i--) {
-   if (i == DRM_SCHED_PRIORITY_NORMAL
-   || atomic_read(&ring->num_jobs[i])) {
-   ring->priority = i;
-   ring->funcs->set_priority(ring, i);
-   break;
-   }
-   }
-
-out_unlock:
-   mutex_unlock(&ring->priority_mutex);
-}
-
-/**
- * amdgpu_ring_priority_get - change the ring's priority
- *
- * @ring: amdgpu_ring structure holding the information
- * @priority: target priority
- *
- * Request a ring's priority to be raised to @priority (refcounted).
- */
-void amdgpu_ring_priority_get(struct amdgpu_ring *ring,
- enum drm_sched_priority priority)
-{
-   if (!ring->funcs->set_priority)
-   return;
-
-   if (atomic_inc_return(&ring->num_jobs[priority]) <= 0)
-   return;
-
-   mutex_lock(&ring->priority_mutex);
-   if (priority <= ring->priority)
-   goto out_unlock;
-
-   ring->priority = priority;
-   ring->funcs->set_priority(ring, priority);
-
-out_unlock:
-   mutex_unlock(&ring->priority_mutex);
-}
-
 /**
  * amdgpu_ring_init - init driver ring struct.
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 34fcd467f18d..87ec35b68bfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -167,9 +167,6 @@ struct amdgpu_ring_funcs {
uint32_t reg0, uint32_t reg1,
uint32_t ref, uint32_t mask);
void (*emit_tmz)(struct amdgpu_ring *ring, bool start);
-   /* priority functions */
-   void (*set_priority) (struct amdgpu_ring *ring,
- enum drm_sched_priority priority);
/* Try to soft recover the ring to make the fence signal */
void (*soft_recovery)(struct amdgpu_ring *ring, unsigned vmid);
int (*preempt_ib)(struct amdgpu_ring *ring);
@@ -259,10 +256,6 @@ void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, 
uint32_t count);
 void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib 
*ib);
 void amdgpu_ring_commit(struct amdgpu_ring *ring);
 void amdgpu_ring_undo(struct amdgpu_ring *ring);
-void amdgpu_ring_priority_get(struct amdgpu_ring *ring,
- enum drm_sched_priority priority);
-void amdgpu_ring_priority_put(struct amdgpu_ring *ring,
- enum drm_sched_priority priority);
 int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 unsigned ring_size, struct amdgpu_irq_src *irq_src,
 unsigned irq_type);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 6c4b7e49f97f..ed9aff72350d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6275,104 +6275,6 @@ static void gfx_v8_0_ring_set_wptr_compute(struct 
amdgpu_ring *ring)
WDOORBELL32(ring->doorbell_index, lower_32_bits(ring->wptr));
 }
 
-static void gfx_v8_0_ring_set_pipe_percent(struct amdgpu_ring *ring,
-  

[PATCH v3 1/4] drm/amdgpu: set compute queue priority at mqd_init

2020-02-28 Thread Nirmoy Das
We were changing compute ring priority while rings were being used,
before every job submission, which is not recommended. This patch
sets compute queue priority at mqd initialization for gfx8, gfx9 and
gfx10.

Policy: make queue 0 of each pipe the high priority compute queue.

High and normal priority compute sched lists are generated from the sets
of high and normal priority compute queues. At context creation, the
entity of a compute queue gets a sched list of high or normal priority
depending on ctx->priority.
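
To make the policy concrete, here is a rough sketch of how the rings end up
tagged at init time (illustrative only; the helper name is an assumption,
but the high_priority flag and the queue-0 rule match the hunks below):

/* Queue 0 of each compute pipe becomes the high priority queue; the flag
 * is what amdgpu_ctx_init_compute_sched() later uses to split the compute
 * rings into the high and normal priority sched lists. */
static void example_tag_high_priority_rings(struct amdgpu_device *adev)
{
        int i;

        for (i = 0; i < adev->gfx.num_compute_rings; i++) {
                struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];

                ring->high_priority = (ring->queue == 0);
        }
}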

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  4 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c  | 59 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  | 15 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  6 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 19 
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 23 +++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 20 
 9 files changed, 134 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index f397ff97b4e4..8304d0c87899 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1205,7 +1205,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
struct drm_sched_entity *entity = p->entity;
enum drm_sched_priority priority;
-   struct amdgpu_ring *ring;
struct amdgpu_bo_list_entry *e;
struct amdgpu_job *job;
uint64_t seq;
@@ -1258,9 +1257,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
priority = job->base.s_priority;
drm_sched_entity_push_job(&job->base, entity);

-   ring = to_amdgpu_ring(entity->rq->sched);
-   amdgpu_ring_priority_get(ring, priority);
-
amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);

ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 94a6c42f29ea..4616741e1295 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -61,12 +61,30 @@ static int amdgpu_ctx_priority_permit(struct drm_file *filp,
return -EACCES;
 }

+static enum gfx_pipe_priority amdgpu_ctx_sched_prio_to_compute_prio(enum 
drm_sched_priority prio)
+{
+   switch(prio) {
+   case DRM_SCHED_PRIORITY_MIN:
+   case DRM_SCHED_PRIORITY_NORMAL:
+   return AMDGPU_GFX_PIPE_PRIO_NORMAL;
+   case DRM_SCHED_PRIORITY_HIGH_SW:
+   case DRM_SCHED_PRIORITY_HIGH_HW:
+   case DRM_SCHED_PRIORITY_KERNEL:
+   return AMDGPU_GFX_PIPE_PRIO_HIGH;
+   default:
+   return AMDGPU_GFX_PIPE_PRIO_NORMAL;
+   }
+
+   return AMDGPU_GFX_PIPE_PRIO_NORMAL;
+}
+
 static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, const u32 hw_ip, 
const u32 ring)
 {
struct amdgpu_device *adev = ctx->adev;
struct amdgpu_ctx_entity *entity;
struct drm_gpu_scheduler **scheds = NULL, *sched = NULL;
unsigned num_scheds = 0;
+   enum gfx_pipe_priority compute_priority;
enum drm_sched_priority priority;
int r;

@@ -85,8 +103,10 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, 
const u32 hw_ip, const
num_scheds = 1;
break;
case AMDGPU_HW_IP_COMPUTE:
-   scheds = adev->gfx.compute_sched;
-   num_scheds = adev->gfx.num_compute_sched;
+   compute_priority =
+   amdgpu_ctx_sched_prio_to_compute_prio(priority);
+   scheds = adev->gfx.compute_prio_sched[compute_priority];
+   num_scheds = 
adev->gfx.num_compute_sched[compute_priority];
break;
case AMDGPU_HW_IP_DMA:
scheds = adev->sdma.sdma_sched;
@@ -628,20 +648,45 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)
mutex_destroy(&mgr->lock);
 }

+
+static void amdgpu_ctx_init_compute_sched(struct amdgpu_device *adev)
+{
+   int num_compute_sched_normal = 0;
+   int num_compute_sched_high = AMDGPU_MAX_COMPUTE_RINGS - 1;
+   int i;
+
+
+   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+   if (adev->gfx.compute_ring[i].high_priority)
+   adev->gfx.compute_sched[num_compute_sched_normal++] =
+   &adev->gfx.compute_ring[i].sched;
+   else
+   adev->gfx.compute_sched[num_compute_sched_high--] =
+   &adev->gfx.compute_ring[i].sched;
+   }
+
+   /* compute rings only have two priorities for now */
+   i = AMDGPU_GFX_PIPE_PRIO_NORMAL;
+   adev->gfx.compute_prio_sched[i] = &adev->gfx.compute_sched[0];

[PATCH v3 3/4] drm/amdgpu: change hw sched list on ctx priority override

2020-02-28 Thread Nirmoy Das
Switch to the appropriate sched list for an entity on priority override.
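
In other words, the override path now ends up doing roughly the following
(a call-flow sketch; all names are taken from the hunks below and from
patch 2/4 of this series):

/*
 * amdgpu_ctx_priority_override(ctx, priority)
 *   -> amdgpu_ctx_set_entity_priority(ctx, aentity, hw_ip, priority)
 *        -> drm_sched_entity_modify_sched(&aentity->entity, scheds, num_scheds)
 *        -> drm_sched_entity_set_priority(&aentity->entity, priority)
 */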

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 30 +
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 4616741e1295..bc7de30b49f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -522,6 +522,30 @@ struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx 
*ctx,
return fence;
 }

+static void amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx,
+  struct amdgpu_ctx_entity *aentity,
+  int hw_ip, enum drm_sched_priority priority)
+{
+   struct amdgpu_device *adev = ctx->adev;
+   enum gfx_pipe_priority compute_priority;
+   struct drm_gpu_scheduler **scheds = NULL;
+   unsigned num_scheds = 0;
+
+   switch (hw_ip) {
+   case AMDGPU_HW_IP_COMPUTE:
+   compute_priority =
+
amdgpu_ctx_sched_prio_to_compute_prio(priority);
+   scheds = adev->gfx.compute_prio_sched[compute_priority];
+   num_scheds = adev->gfx.num_compute_sched[compute_priority];
+   break;
+   default:
+   return;
+   }
+
+   drm_sched_entity_modify_sched(&aentity->entity, scheds, num_scheds);
+   drm_sched_entity_set_priority(&aentity->entity, priority);
+}
+
 void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,
  enum drm_sched_priority priority)
 {
@@ -534,13 +558,11 @@ void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,
ctx->init_priority : ctx->override_priority;
for (i = 0; i < AMDGPU_HW_IP_NUM; ++i) {
for (j = 0; j < amdgpu_ctx_num_entities[i]; ++j) {
-   struct drm_sched_entity *entity;
-
if (!ctx->entities[i][j])
continue;

-   entity = &ctx->entities[i][j]->entity;
-   drm_sched_entity_set_priority(entity, ctx_prio);
+   amdgpu_ctx_set_entity_priority(ctx, ctx->entities[i][j],
+  i, ctx_prio);
}
}
 }
--
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini

2020-02-28 Thread Li, Dennis
[AMD Public Use]

Looks good to me

Tested-by: Dennis Li <dennis...@amd.com>

Best Regards
Dennis Li
From: amd-gfx  On Behalf Of Deucher, 
Alexander
Sent: Thursday, February 27, 2020 11:18 PM
To: Christian König ; Das, Nirmoy 
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini


[AMD Public Use]

Looks good to me.
Reviewed-by: Alex Deucher 
mailto:alexander.deuc...@amd.com>>

From: Christian König 
mailto:ckoenig.leichtzumer...@gmail.com>>
Sent: Thursday, February 27, 2020 9:50 AM
To: Das, Nirmoy mailto:nirmoy@amd.com>>; 
amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>; Deucher, 
Alexander mailto:alexander.deuc...@amd.com>>
Subject: Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini

Alex any comment on this?

Am 25.02.20 um 14:16 schrieb Nirmoy:
> Acked-by: Nirmoy Das mailto:nirmoy@amd.com>>
>
> On 2/25/20 2:07 PM, Christian König wrote:
>> When we stop the HW for example for GPU reset we should not stop the
>> front-end scheduler. Otherwise we run into intermediate failures during
>> command submission.
>>
>> The scheduler should only be stopped in very few cases:
>> 1. We can't get the hardware working in ring or IB test after a GPU
>> reset.
>> 2. The KIQ scheduler is not used in the front-end and should be
>> disabled during GPU reset.
>> 3. In amdgpu_ring_fini() when the driver unloads.
>>
>> Signed-off-by: Christian König 
>> mailto:christian.koe...@amd.com>>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |  2 --
>>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  8 
>>   drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c  |  5 -
>>   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 25 +
>>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |  7 ---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  9 -
>>   drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c |  3 ---
>>   drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |  2 --
>>   drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |  2 --
>>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  4 
>>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c |  3 ---
>>   drivers/gpu/drm/amd/amdgpu/si_dma.c|  1 -
>>   drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c  |  3 ---
>>   drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c  |  3 ---
>>   drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |  3 ---
>>   drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |  7 ---
>>   drivers/gpu/drm/amd/amdgpu/vce_v4_0.c  |  4 
>>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c  |  3 ---
>>   drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c  |  9 -
>>   drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c  | 11 +--
>>   20 files changed, 10 insertions(+), 104 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
>> b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
>> index 4274ccf765de..cb3b3a0a1348 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
>> @@ -320,8 +320,6 @@ static void cik_sdma_gfx_stop(struct
>> amdgpu_device *adev)
>>   WREG32(mmSDMA0_GFX_RB_CNTL + sdma_offsets[i], rb_cntl);
>>   WREG32(mmSDMA0_GFX_IB_CNTL + sdma_offsets[i], 0);
>>   }
>> -sdma0->sched.ready = false;
>> -sdma1->sched.ready = false;
>>   }
>> /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 7b6158320400..36ce67ce4800 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -2391,10 +2391,6 @@ static int gfx_v10_0_cp_gfx_enable(struct
>> amdgpu_device *adev, bool enable)
>>   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL, ME_HALT, enable ? 0 : 1);
>>   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL, PFP_HALT, enable ? 0 : 1);
>>   tmp = REG_SET_FIELD(tmp, CP_ME_CNTL, CE_HALT, enable ? 0 : 1);
>> -if (!enable) {
>> -for (i = 0; i < adev->gfx.num_gfx_rings; i++)
>> -adev->gfx.gfx_ring[i].sched.ready = false;
>> -}
>>   WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp);
>> for (i = 0; i < adev->usec_timeout; i++) {
>> @@ -2869,16 +2865,12 @@ static int gfx_v10_0_cp_gfx_resume(struct
>> amdgpu_device *adev)
>> static void gfx_v10_0_cp_compute_enable(struct amdgpu_device
>> *adev, bool enable)
>>   {
>> -int i;
>> -
>>   if (enable) {
>>   WREG32_SOC15(GC, 0, mmCP_MEC_CNTL, 0);
>>   } else {
>>   WREG32_SOC15(GC, 0, mmCP_MEC_CNTL,
>>(CP_MEC_CNTL__MEC_ME1_HALT_MASK |
>> CP_MEC_CNTL__MEC_ME2_HALT_MASK));
>> -for (i = 0; i < adev->gfx.num_compute_rings; i++)
>> -adev->gfx.compute_ring[i].sched.ready = false;
>>   adev->gfx.kiq.ring.sched.ready = false;
>>   }
>>   udelay(50);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
>> index 31f44d05e606..e462a099dbda 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
>> +++ 

Re: [PATCH v2 1/4] drm/amdgpu: set compute queue priority at mqd_init

2020-02-28 Thread Nirmoy

Thanks Christian, I will send an updated one soon.

On 2/28/20 3:44 PM, Christian König wrote:

On 28.02.20 at 15:39, Nirmoy Das wrote:

We were changing compute ring priority while rings were being used
before every job submission which is not recommended. This patch
sets compute queue priority at mqd initialization for gfx8, gfx9 and
gfx10.

Policy: make queue 0 of each pipe as high priority compute queue

High/normal priority compute sched lists are generated from set of 
high/normal

priority compute queues. At context creation, entity of compute queue
get a sched list from high or normal priority depending on ctx->priority

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  4 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c  | 40 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  8 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  | 13 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  6 
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 19 +++
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c    | 23 --
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 20 
  9 files changed, 113 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index f397ff97b4e4..8304d0c87899 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1205,7 +1205,6 @@ static int amdgpu_cs_submit(struct 
amdgpu_cs_parser *p,

  struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
  struct drm_sched_entity *entity = p->entity;
  enum drm_sched_priority priority;
-    struct amdgpu_ring *ring;
  struct amdgpu_bo_list_entry *e;
  struct amdgpu_job *job;
  uint64_t seq;
@@ -1258,9 +1257,6 @@ static int amdgpu_cs_submit(struct 
amdgpu_cs_parser *p,

  priority = job->base.s_priority;
  drm_sched_entity_push_job(>base, entity);

-    ring = to_amdgpu_ring(entity->rq->sched);
-    amdgpu_ring_priority_get(ring, priority);
-
  amdgpu_vm_move_to_lru_tail(p->adev, >vm);

  ttm_eu_fence_buffer_objects(>ticket, >validated, p->fence);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c

index 94a6c42f29ea..b21771b37300 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -85,8 +85,8 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx 
*ctx, const u32 hw_ip, const

  num_scheds = 1;
  break;
  case AMDGPU_HW_IP_COMPUTE:
-    scheds = adev->gfx.compute_sched;
-    num_scheds = adev->gfx.num_compute_sched;
+    scheds = adev->gfx.compute_prio_sched[priority];
+    num_scheds = adev->gfx.num_compute_sched[priority];
  break;
  case AMDGPU_HW_IP_DMA:
  scheds = adev->sdma.sdma_sched;
@@ -628,20 +628,46 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr 
*mgr)

  mutex_destroy(>lock);
  }

+
+static void amdgpu_ctx_init_compute_sched(struct amdgpu_device *adev)
+{
+    int num_compute_sched_normal = 0;
+    int num_compute_sched_high = AMDGPU_MAX_COMPUTE_RINGS - 1;
+    int i;
+
+
+    for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+    if (adev->gfx.compute_ring[i].high_priority)
+ adev->gfx.compute_sched[num_compute_sched_normal++] =
+    >gfx.compute_ring[i].sched;
+    else
+ adev->gfx.compute_sched[num_compute_sched_high--] =
+    >gfx.compute_ring[i].sched;
+    }
+
+    for (i = DRM_SCHED_PRIORITY_MIN; i <= DRM_SCHED_PRIORITY_NORMAL; 
i++) {

+    adev->gfx.compute_prio_sched[i] = >gfx.compute_sched[0];
+    adev->gfx.num_compute_sched[i] = num_compute_sched_normal;
+    }
+
+    for (i = DRM_SCHED_PRIORITY_NORMAL + 1; i < 
DRM_SCHED_PRIORITY_MAX; i++) {

+    adev->gfx.compute_prio_sched[i] =
+ >gfx.compute_sched[num_compute_sched_high - 1];
+    adev->gfx.num_compute_sched[i] =
+    adev->gfx.num_compute_rings - num_compute_sched_normal;
+    }
+}
+
  void amdgpu_ctx_init_sched(struct amdgpu_device *adev)
  {
  int i, j;

+    amdgpu_ctx_init_compute_sched(adev);
  for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
  adev->gfx.gfx_sched[i] = >gfx.gfx_ring[i].sched;
  adev->gfx.num_gfx_sched++;
  }

-    for (i = 0; i < adev->gfx.num_compute_rings; i++) {
-    adev->gfx.compute_sched[i] = >gfx.compute_ring[i].sched;
-    adev->gfx.num_compute_sched++;
-    }
-
  for (i = 0; i < adev->sdma.num_instances; i++) {
  adev->sdma.sdma_sched[i] = >sdma.instance[i].ring.sched;
  adev->sdma.num_sdma_sched++;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c

index 7403588684b3..952725e7243c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -192,6 +192,14 @@ static bool 

Re: [PATCH 1/2] Revert "drm/amdgpu: add sysfs interface to set arbitrary sclk value for navi14"

2020-02-28 Thread Deucher, Alexander
[AMD Public Use]

Please also revert this fix:
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-drm-next=4e261b86437af7481364a5239c62cc3c5ef0ee38

Alex

From: amd-gfx  on behalf of Quan, Evan 

Sent: Thursday, February 27, 2020 10:37 PM
To: Gui, Jack ; amd-gfx@lists.freedesktop.org 

Cc: Xu, Feifei ; Feng, Kenneth ; Gui, 
Jack 
Subject: RE: [PATCH 1/2] Revert "drm/amdgpu: add sysfs interface to set 
arbitrary sclk value for navi14"

Reviewed-by: Evan Quan 

-Original Message-
From: Chengming Gui 
Sent: Friday, February 28, 2020 10:37 AM
To: amd-gfx@lists.freedesktop.org
Cc: Quan, Evan ; Feng, Kenneth ; Xu, 
Feifei ; Gui, Jack 
Subject: [PATCH 1/2] Revert "drm/amdgpu: add sysfs interface to set arbitrary 
sclk value for navi14"

Revert this commit, and then add a debugfs interface to replace it, to meet
the specific requirement.

This reverts commit 3107269204f8e18f389080673f7848b420970aa5.
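
For illustration, a minimal sketch of what the debugfs replacement could look
like (purely an assumption on my side, not the actual patch 2/2; the only call
reused from the reverted code is smu_set_soft_freq_range()):

#include <linux/debugfs.h>

static int example_force_sclk_set(void *data, u64 val)
{
        struct amdgpu_device *adev = data;

        /* Pin min and max SCLK to the same value, as the sysfs hook did. */
        return smu_set_soft_freq_range(&adev->smu, SMU_SCLK, (uint32_t)val,
                                       (uint32_t)val);
}

DEFINE_DEBUGFS_ATTRIBUTE(example_force_sclk_fops, NULL,
                         example_force_sclk_set, "%llu\n");

/* hooked up somewhere in the driver's debugfs init, for example:
 *      debugfs_create_file("example_force_sclk", 0200, root, adev,
 *                          &example_force_sclk_fops);
 */
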
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c| 42 ---
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c |  9 ++-
 2 files changed, 2 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index 9deff8c..bc3cf04 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -1034,40 +1034,6 @@ static ssize_t amdgpu_read_mask(const char *buf, size_t 
count, uint32_t *mask)
 return 0;
 }

-static ssize_t amdgpu_set_pp_sclk(struct device *dev,
-   struct device_attribute *attr,
-   const char *buf,
-   size_t count)
-{
-   struct drm_device *ddev = dev_get_drvdata(dev);
-   struct amdgpu_device *adev = ddev->dev_private;
-   int ret;
-   uint32_t value;
-
-   if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
-   return -EINVAL;
-
-   ret = pm_runtime_get_sync(ddev->dev);
-   if (ret < 0)
-   return ret;
-
-   ret = kstrtou32(buf, 0, &value);
-   if (ret < 0)
-   return ret;
-   if (is_support_sw_smu(adev))
-   ret = smu_set_soft_freq_range(&adev->smu, SMU_SCLK, value, 
value);
-   else
-   return 0;
-
-   pm_runtime_mark_last_busy(ddev->dev);
-   pm_runtime_put_autosuspend(ddev->dev);
-
-   if (ret)
-   return -EINVAL;
-
-   return count;
-}
-
 static ssize_t amdgpu_set_pp_dpm_sclk(struct device *dev,
 struct device_attribute *attr,
 const char *buf,
@@ -1829,8 +1795,6 @@ static DEVICE_ATTR(pp_force_state, S_IRUGO | S_IWUSR,  
static DEVICE_ATTR(pp_table, S_IRUGO | S_IWUSR,
 amdgpu_get_pp_table,
 amdgpu_set_pp_table);
-static DEVICE_ATTR(pp_sclk, S_IWUSR,
-   NULL, amdgpu_set_pp_sclk);
 static DEVICE_ATTR(pp_dpm_sclk, S_IRUGO | S_IWUSR,
 amdgpu_get_pp_dpm_sclk,
 amdgpu_set_pp_dpm_sclk);
@@ -3322,12 +3286,6 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev)
 return ret;
 }

-   ret = device_create_file(adev->dev, &dev_attr_pp_sclk);
-   if (ret) {
-   DRM_ERROR("failed to create device file pp_sclk\n");
-   return ret;
-   }
-
 ret = device_create_file(adev->dev, &dev_attr_pp_dpm_sclk);
 if (ret) {
 DRM_ERROR("failed to create device file pp_dpm_sclk\n"); diff 
--git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c 
b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
index 1507bb7..c9e5ce1 100644
--- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
@@ -1803,17 +1803,12 @@ int smu_v11_0_set_soft_freq_limited_range(struct 
smu_context *smu, enum smu_clk_  {
 int ret = 0, clk_id = 0;
 uint32_t param;
-   uint32_t min_freq, max_freq;

 clk_id = smu_clk_get_index(smu, clk_type);
 if (clk_id < 0)
 return clk_id;

-   ret = smu_get_dpm_freq_range(smu, clk_type, &min_freq, &max_freq, true);
-   if (ret)
-   return ret;
-
-   if (max > 0 && max <=  max_freq) {
+   if (max > 0) {
 param = (uint32_t)((clk_id << 16) | (max & 0xffff));
 ret = smu_send_smc_msg_with_param(smu, 
SMU_MSG_SetSoftMaxByFreq,
   param);
@@ -1821,7 +1816,7 @@ int smu_v11_0_set_soft_freq_limited_range(struct 
smu_context *smu, enum smu_clk_
 return ret;
 }

-   if (min > 0 && min >= min_freq) {
+   if (min > 0) {
 param = (uint32_t)((clk_id << 16) | (min & 0xffff));
 ret = smu_send_smc_msg_with_param(smu, 
SMU_MSG_SetSoftMinByFreq,
   param);
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org

Re: [PATCH] drm/amdgpu/sriov: Use VF-accessible register for gpu_clock_count

2020-02-28 Thread Deucher, Alexander
Let's just switch everyone over.  I think this should work for both bare metal 
and SR-IOV.  Fewer code paths to maintain.  Feel free to do it as a follow-up 
patch.

Thanks,

Alex



From: Zhao, Jiange 
Sent: Thursday, February 27, 2020 10:02 PM
To: Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org 
Cc: Deng, Emily ; Liu, Monk 
Subject: Re: [PATCH] drm/amdgpu/sriov: Use VF-accessible register for 
gpu_clock_count

Hi,

I got feedback from the Linux team and they simply don't want to change.

I believe that it would work for bare metal as well.

Jiange

From: Deucher, Alexander 
Sent: Thursday, February 27, 2020 10:23 PM
To: Zhao, Jiange ; amd-gfx@lists.freedesktop.org 

Cc: Deng, Emily ; Liu, Monk 
Subject: Re: [PATCH] drm/amdgpu/sriov: Use VF-accessible register for 
gpu_clock_count


[AMD Public Use]

Is there any reason to not just use this for bare metal as well?

Alex


From: amd-gfx  on behalf of jianzh 

Sent: Thursday, February 27, 2020 6:21 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Deng, Emily ; Zhao, Jiange ; Liu, 
Monk 
Subject: [PATCH] drm/amdgpu/sriov: Use VF-accessible register for 
gpu_clock_count

The Navi12 VK CTS subtest timestamp.calibrated.dev_domain_test failed
because the mmRLC_CAPTURE_GPU_CLOCK_COUNT register cannot be
written from a VF due to security policy.

Solution: use a VF-accessible timestamp register pair
mmGOLDEN_TSC_COUNT_LOWER/UPPER for SRIOV case.
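
As a side note, when a 64-bit timestamp is composed from a split LOWER/UPPER
register pair, a common pattern is to guard against the upper half changing
between the two 32-bit reads. A generic sketch of that pattern (not part of
this patch; the read callback is purely illustrative):

#include <linux/types.h>

/* Re-read the upper half until it is stable across the lower read. */
static uint64_t example_read_split_counter(uint32_t (*read32)(void *ctx, bool upper),
                                           void *ctx)
{
        uint32_t hi, lo;

        do {
                hi = read32(ctx, true);
                lo = read32(ctx, false);
        } while (hi != read32(ctx, true));

        return ((uint64_t)hi << 32) | lo;
}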

Signed-off-by: jianzh 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 44f00ec..8787a46 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -35,6 +35,8 @@

 #include "gc/gc_10_1_0_offset.h"
 #include "gc/gc_10_1_0_sh_mask.h"
+#include "smuio/smuio_11_0_0_offset.h"
+#include "smuio/smuio_11_0_0_sh_mask.h"
 #include "navi10_enum.h"
 #include "hdp/hdp_5_0_0_offset.h"
 #include "ivsrcid/gfx/irqsrcs_gfx_10_1.h"
@@ -3920,9 +3922,14 @@ static uint64_t gfx_v10_0_get_gpu_clock_counter(struct 
amdgpu_device *adev)

 amdgpu_gfx_off_ctrl(adev, false);
 mutex_lock(&adev->gfx.gpu_clock_mutex);
-   WREG32_SOC15(GC, 0, mmRLC_CAPTURE_GPU_CLOCK_COUNT, 1);
-   clock = (uint64_t)RREG32_SOC15(GC, 0, mmRLC_GPU_CLOCK_COUNT_LSB) |
-   ((uint64_t)RREG32_SOC15(GC, 0, mmRLC_GPU_CLOCK_COUNT_MSB) << 
32ULL);
+   if (!amdgpu_sriov_vf(adev)) {
+   WREG32_SOC15(GC, 0, mmRLC_CAPTURE_GPU_CLOCK_COUNT, 1);
+   clock = (uint64_t)RREG32_SOC15(GC, 0, 
mmRLC_GPU_CLOCK_COUNT_LSB) |
+   ((uint64_t)RREG32_SOC15(GC, 0, 
mmRLC_GPU_CLOCK_COUNT_MSB) << 32ULL);
+   } else {
+   clock = (uint64_t)RREG32_SOC15(SMUIO, 0, 
mmGOLDEN_TSC_COUNT_LOWER) |
+   ((uint64_t)RREG32_SOC15(SMUIO, 0, 
mmGOLDEN_TSC_COUNT_UPPER) << 32ULL);
+   }
 mutex_unlock(&adev->gfx.gpu_clock_mutex);
 amdgpu_gfx_off_ctrl(adev, true);
 return clock;
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 1/4] drm/amdgpu: set compute queue priority at mqd_init

2020-02-28 Thread Christian König

On 28.02.20 at 15:39, Nirmoy Das wrote:

We were changing compute ring priority while rings were being used
before every job submission which is not recommended. This patch
sets compute queue priority at mqd initialization for gfx8, gfx9 and
gfx10.

Policy: make queue 0 of each pipe as high priority compute queue

High/normal priority compute sched lists are generated from set of high/normal
priority compute queues. At context creation, entity of compute queue
get a sched list from high or normal priority depending on ctx->priority

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  4 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c  | 40 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  8 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  | 13 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  6 
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 19 +++
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 23 --
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 20 
  9 files changed, 113 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index f397ff97b4e4..8304d0c87899 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1205,7 +1205,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
struct drm_sched_entity *entity = p->entity;
enum drm_sched_priority priority;
-   struct amdgpu_ring *ring;
struct amdgpu_bo_list_entry *e;
struct amdgpu_job *job;
uint64_t seq;
@@ -1258,9 +1257,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
priority = job->base.s_priority;
drm_sched_entity_push_job(>base, entity);

-   ring = to_amdgpu_ring(entity->rq->sched);
-   amdgpu_ring_priority_get(ring, priority);
-
amdgpu_vm_move_to_lru_tail(p->adev, >vm);

ttm_eu_fence_buffer_objects(>ticket, >validated, p->fence);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 94a6c42f29ea..b21771b37300 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -85,8 +85,8 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, 
const u32 hw_ip, const
num_scheds = 1;
break;
case AMDGPU_HW_IP_COMPUTE:
-   scheds = adev->gfx.compute_sched;
-   num_scheds = adev->gfx.num_compute_sched;
+   scheds = adev->gfx.compute_prio_sched[priority];
+   num_scheds = adev->gfx.num_compute_sched[priority];
break;
case AMDGPU_HW_IP_DMA:
scheds = adev->sdma.sdma_sched;
@@ -628,20 +628,46 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)
mutex_destroy(>lock);
  }

+
+static void amdgpu_ctx_init_compute_sched(struct amdgpu_device *adev)
+{
+   int num_compute_sched_normal = 0;
+   int num_compute_sched_high = AMDGPU_MAX_COMPUTE_RINGS - 1;
+   int i;
+
+
+   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+   if (adev->gfx.compute_ring[i].high_priority)
+   adev->gfx.compute_sched[num_compute_sched_normal++] =
+   >gfx.compute_ring[i].sched;
+   else
+   adev->gfx.compute_sched[num_compute_sched_high--] =
+   >gfx.compute_ring[i].sched;
+   }
+
+   for (i = DRM_SCHED_PRIORITY_MIN; i <= DRM_SCHED_PRIORITY_NORMAL; i++) {
+   adev->gfx.compute_prio_sched[i] = >gfx.compute_sched[0];
+   adev->gfx.num_compute_sched[i] = num_compute_sched_normal;
+   }
+
+   for (i = DRM_SCHED_PRIORITY_NORMAL + 1; i < DRM_SCHED_PRIORITY_MAX; 
i++) {
+   adev->gfx.compute_prio_sched[i] =
+   >gfx.compute_sched[num_compute_sched_high - 1];
+   adev->gfx.num_compute_sched[i] =
+   adev->gfx.num_compute_rings - num_compute_sched_normal;
+   }
+}
+
  void amdgpu_ctx_init_sched(struct amdgpu_device *adev)
  {
int i, j;

+   amdgpu_ctx_init_compute_sched(adev);
for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
adev->gfx.gfx_sched[i] = >gfx.gfx_ring[i].sched;
adev->gfx.num_gfx_sched++;
}

-   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
-   adev->gfx.compute_sched[i] = >gfx.compute_ring[i].sched;
-   adev->gfx.num_compute_sched++;
-   }
-
for (i = 0; i < adev->sdma.num_instances; i++) {
adev->sdma.sdma_sched[i] = >sdma.instance[i].ring.sched;
adev->sdma.num_sdma_sched++;
diff --git 

[PATCH v2 2/4] drm/scheduler: implement a function to modify sched list

2020-02-28 Thread Nirmoy Das
implement drm_sched_entity_modify_sched() which can modify existing
sched_list with a different one. This is going to be helpful when
userspace changes priority of a ctx/entity then driver can switch to
corresponding hw shced list for that priority

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 19 +++
 include/drm/gpu_scheduler.h  |  4 
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 63bccd201b97..b94312154e56 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -83,6 +83,25 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 }
 EXPORT_SYMBOL(drm_sched_entity_init);

+/**
+ * drm_sched_entity_modify_sched - Modify sched of an entity
+ *
+ * @entity: scheduler entity to init
+ * @sched_list: the list of new drm scheds which will replace
+ * existing entity->sched_list
+ * @num_sched_list: number of drm sched in sched_list
+ */
+void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
+ struct drm_gpu_scheduler **sched_list,
+ unsigned int num_sched_list)
+{
+   WARN_ON(!num_sched_list || !sched_list);
+
+   entity->sched_list = sched_list;
+   entity->num_sched_list = num_sched_list;
+}
+EXPORT_SYMBOL(drm_sched_entity_modify_sched);
+
 /**
  * drm_sched_entity_is_idle - Check if entity is idle
  *
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 589be851f8a1..f70a84aaaf7a 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -297,6 +297,10 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched);
 int drm_sched_job_init(struct drm_sched_job *job,
   struct drm_sched_entity *entity,
   void *owner);
+void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
+ struct drm_gpu_scheduler **sched_list,
+  unsigned int num_sched_list);
+
 void drm_sched_job_cleanup(struct drm_sched_job *job);
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
 void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job 
*bad);
--
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 4/4] drm/amdgpu: remove unused functions

2020-02-28 Thread Nirmoy Das
amdgpu now sets compute queue priority statically at
initialization, so remove all the functions responsible
for changing compute queue priority dynamically.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  70 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   7 --
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c|  99 --
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 100 ---
 4 files changed, 276 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index ca6b52054b4b..a7e1d0425ed0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -150,76 +150,6 @@ void amdgpu_ring_undo(struct amdgpu_ring *ring)
ring->funcs->end_use(ring);
 }
 
-/**
- * amdgpu_ring_priority_put - restore a ring's priority
- *
- * @ring: amdgpu_ring structure holding the information
- * @priority: target priority
- *
- * Release a request for executing at @priority
- */
-void amdgpu_ring_priority_put(struct amdgpu_ring *ring,
- enum drm_sched_priority priority)
-{
-   int i;
-
-   if (!ring->funcs->set_priority)
-   return;
-
-   if (atomic_dec_return(&ring->num_jobs[priority]) > 0)
-   return;
-
-   /* no need to restore if the job is already at the lowest priority */
-   if (priority == DRM_SCHED_PRIORITY_NORMAL)
-   return;
-
-   mutex_lock(&ring->priority_mutex);
-   /* something higher prio is executing, no need to decay */
-   if (ring->priority > priority)
-   goto out_unlock;
-
-   /* decay priority to the next level with a job available */
-   for (i = priority; i >= DRM_SCHED_PRIORITY_MIN; i--) {
-   if (i == DRM_SCHED_PRIORITY_NORMAL
-   || atomic_read(&ring->num_jobs[i])) {
-   ring->priority = i;
-   ring->funcs->set_priority(ring, i);
-   break;
-   }
-   }
-
-out_unlock:
-   mutex_unlock(&ring->priority_mutex);
-}
-
-/**
- * amdgpu_ring_priority_get - change the ring's priority
- *
- * @ring: amdgpu_ring structure holding the information
- * @priority: target priority
- *
- * Request a ring's priority to be raised to @priority (refcounted).
- */
-void amdgpu_ring_priority_get(struct amdgpu_ring *ring,
- enum drm_sched_priority priority)
-{
-   if (!ring->funcs->set_priority)
-   return;
-
-   if (atomic_inc_return(&ring->num_jobs[priority]) <= 0)
-   return;
-
-   mutex_lock(&ring->priority_mutex);
-   if (priority <= ring->priority)
-   goto out_unlock;
-
-   ring->priority = priority;
-   ring->funcs->set_priority(ring, priority);
-
-out_unlock:
-   mutex_unlock(&ring->priority_mutex);
-}
-
 /**
  * amdgpu_ring_init - init driver ring struct.
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 34fcd467f18d..87ec35b68bfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -167,9 +167,6 @@ struct amdgpu_ring_funcs {
uint32_t reg0, uint32_t reg1,
uint32_t ref, uint32_t mask);
void (*emit_tmz)(struct amdgpu_ring *ring, bool start);
-   /* priority functions */
-   void (*set_priority) (struct amdgpu_ring *ring,
- enum drm_sched_priority priority);
/* Try to soft recover the ring to make the fence signal */
void (*soft_recovery)(struct amdgpu_ring *ring, unsigned vmid);
int (*preempt_ib)(struct amdgpu_ring *ring);
@@ -259,10 +256,6 @@ void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, 
uint32_t count);
 void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib 
*ib);
 void amdgpu_ring_commit(struct amdgpu_ring *ring);
 void amdgpu_ring_undo(struct amdgpu_ring *ring);
-void amdgpu_ring_priority_get(struct amdgpu_ring *ring,
- enum drm_sched_priority priority);
-void amdgpu_ring_priority_put(struct amdgpu_ring *ring,
- enum drm_sched_priority priority);
 int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 unsigned ring_size, struct amdgpu_irq_src *irq_src,
 unsigned irq_type);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 6c4b7e49f97f..ed9aff72350d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6275,104 +6275,6 @@ static void gfx_v8_0_ring_set_wptr_compute(struct 
amdgpu_ring *ring)
WDOORBELL32(ring->doorbell_index, lower_32_bits(ring->wptr));
 }
 
-static void gfx_v8_0_ring_set_pipe_percent(struct amdgpu_ring *ring,
-  

[PATCH v2 1/4] drm/amdgpu: set compute queue priority at mqd_init

2020-02-28 Thread Nirmoy Das
We were changing compute ring priority while the rings were in use,
before every job submission, which is not recommended. This patch
sets compute queue priority at mqd initialization for gfx8, gfx9 and
gfx10.

Policy: make queue 0 of each pipe the high priority compute queue.

High/normal priority compute sched lists are generated from the set of
high/normal priority compute queues. At context creation, a compute
entity gets its sched list from the high or normal priority set
depending on ctx->priority.
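
As a sketch of the policy only (the helper name below is made up; the
real per-ASIC mqd init code also has to program the matching pipe/queue
priority fields in the MQD, and the result is recorded per ring in the
high_priority flag used by amdgpu_ctx.c below):

/* Queue 0 of each compute pipe becomes the high priority queue. */
static bool example_is_high_priority_compute_queue(uint32_t pipe,
						   uint32_t queue)
{
	return queue == 0;
}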

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  4 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c  | 40 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  8 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  | 13 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  6 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 19 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 23 --
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 20 
 9 files changed, 113 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index f397ff97b4e4..8304d0c87899 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1205,7 +1205,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
struct drm_sched_entity *entity = p->entity;
enum drm_sched_priority priority;
-   struct amdgpu_ring *ring;
struct amdgpu_bo_list_entry *e;
struct amdgpu_job *job;
uint64_t seq;
@@ -1258,9 +1257,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
priority = job->base.s_priority;
drm_sched_entity_push_job(&job->base, entity);

-   ring = to_amdgpu_ring(entity->rq->sched);
-   amdgpu_ring_priority_get(ring, priority);
-
amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);

ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 94a6c42f29ea..b21771b37300 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -85,8 +85,8 @@ static int amdgpu_ctx_init_entity(struct amdgpu_ctx *ctx, 
const u32 hw_ip, const
num_scheds = 1;
break;
case AMDGPU_HW_IP_COMPUTE:
-   scheds = adev->gfx.compute_sched;
-   num_scheds = adev->gfx.num_compute_sched;
+   scheds = adev->gfx.compute_prio_sched[priority];
+   num_scheds = adev->gfx.num_compute_sched[priority];
break;
case AMDGPU_HW_IP_DMA:
scheds = adev->sdma.sdma_sched;
@@ -628,20 +628,46 @@ void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)
mutex_destroy(&mgr->lock);
 }

+
+static void amdgpu_ctx_init_compute_sched(struct amdgpu_device *adev)
+{
+   int num_compute_sched_normal = 0;
+   int num_compute_sched_high = AMDGPU_MAX_COMPUTE_RINGS - 1;
+   int i;
+
+
+   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+   if (adev->gfx.compute_ring[i].high_priority)
+   adev->gfx.compute_sched[num_compute_sched_normal++] =
+   &adev->gfx.compute_ring[i].sched;
+   else
+   adev->gfx.compute_sched[num_compute_sched_high--] =
+   &adev->gfx.compute_ring[i].sched;
+   }
+
+   for (i = DRM_SCHED_PRIORITY_MIN; i <= DRM_SCHED_PRIORITY_NORMAL; i++) {
+   adev->gfx.compute_prio_sched[i] = &adev->gfx.compute_sched[0];
+   adev->gfx.num_compute_sched[i] = num_compute_sched_normal;
+   }
+
+   for (i = DRM_SCHED_PRIORITY_NORMAL + 1; i < DRM_SCHED_PRIORITY_MAX; 
i++) {
+   adev->gfx.compute_prio_sched[i] =
+   &adev->gfx.compute_sched[num_compute_sched_high - 1];
+   adev->gfx.num_compute_sched[i] =
+   adev->gfx.num_compute_rings - num_compute_sched_normal;
+   }
+}
+
 void amdgpu_ctx_init_sched(struct amdgpu_device *adev)
 {
int i, j;

+   amdgpu_ctx_init_compute_sched(adev);
for (i = 0; i < adev->gfx.num_gfx_rings; i++) {
adev->gfx.gfx_sched[i] = &adev->gfx.gfx_ring[i].sched;
adev->gfx.num_gfx_sched++;
}

-   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
-   adev->gfx.compute_sched[i] = &adev->gfx.compute_ring[i].sched;
-   adev->gfx.num_compute_sched++;
-   }
-
for (i = 0; i < adev->sdma.num_instances; i++) {
adev->sdma.sdma_sched[i] = &adev->sdma.instance[i].ring.sched;
adev->sdma.num_sdma_sched++;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 

[PATCH v2 3/4] drm/amdgpu: change hw sched list on ctx priority override

2020-02-28 Thread Nirmoy Das
Switch to the appropriate hw sched list for an entity on priority override.

Signed-off-by: Nirmoy Das 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 27 +
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index b21771b37300..3744c689affc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -502,6 +502,27 @@ struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx 
*ctx,
return fence;
 }

+static void amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx,
+  struct amdgpu_ctx_entity *aentity,
+  int hw_ip, enum drm_sched_priority priority)
+{
+   struct amdgpu_device *adev = ctx->adev;
+   struct drm_gpu_scheduler **scheds = NULL;
+   unsigned num_scheds = 0;
+
+   switch (hw_ip) {
+   case AMDGPU_HW_IP_COMPUTE:
+   scheds = adev->gfx.compute_prio_sched[priority];
+   num_scheds = adev->gfx.num_compute_sched[priority];
+   break;
+   default:
+   return;
+   }
+
+   drm_sched_entity_modify_sched(&aentity->entity, scheds, num_scheds);
+   drm_sched_entity_set_priority(&aentity->entity, priority);
+}
+
 void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,
  enum drm_sched_priority priority)
 {
@@ -514,13 +535,11 @@ void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,
ctx->init_priority : ctx->override_priority;
for (i = 0; i < AMDGPU_HW_IP_NUM; ++i) {
for (j = 0; j < amdgpu_ctx_num_entities[i]; ++j) {
-   struct drm_sched_entity *entity;
-
if (!ctx->entities[i][j])
continue;

-   entity = &ctx->entities[i][j]->entity;
-   drm_sched_entity_set_priority(entity, ctx_prio);
+   amdgpu_ctx_set_entity_priority(ctx, ctx->entities[i][j],
+  i, ctx_prio);
}
}
 }
--
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/amd/display: Fix dmub_psr_destroy()

2020-02-28 Thread Kazlauskas, Nicholas

On 2020-02-28 5:58 a.m., Dan Carpenter wrote:

This is freeing the wrong variable so it will crash.  It should be
freeing "*dmub" instead of "dmub".

Fixes: 4c1a1335dfe0 ("drm/amd/display: Driverside changes to support PSR in 
DMCUB")
Signed-off-by: Dan Carpenter 


Reviewed-by: Nicholas Kazlauskas 

Thanks!


---
  drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c 
b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
index 2c932c29f1f9..f0936cb3c056 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
@@ -235,6 +235,6 @@ struct dmub_psr *dmub_psr_create(struct dc_context *ctx)
   */
  void dmub_psr_destroy(struct dmub_psr **dmub)
  {
-   kfree(dmub);
+   kfree(*dmub);
*dmub = NULL;
  }



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lionel Landwerlin

On 28/02/2020 13:46, Michel Dänzer wrote:

On 2020-02-28 12:02 p.m., Erik Faye-Lund wrote:

On Fri, 2020-02-28 at 10:43 +, Daniel Stone wrote:

On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund
 wrote:

On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:

Yeah, changes on vulkan drivers or backend compilers should be
fairly
sandboxed.

We also have tools that only work for intel stuff, that should
never
trigger anything on other people's HW.

Could something be worked out using the tags?

I think so! We have the pre-defined environment variable
CI_MERGE_REQUEST_LABELS, and we can do variable conditions:

https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables

That sounds like a pretty neat middle-ground to me. I just hope
that
new pipelines are triggered if new labels are added, because not
everyone is allowed to set labels, and sometimes people forget...

There's also this which is somewhat more robust:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569

I'm not sure it's more robust, but yeah that a useful tool too.

The reason I'm skeptical about the robustness is that we'll miss
testing if this misses a path.

Surely missing a path will be less likely / often to happen compared to
an MR missing a label. (Users which aren't members of the project can't
even set labels for an MR)



Sounds like a good alternative to tags.


-Lionel

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Michel Dänzer
On 2020-02-28 12:02 p.m., Erik Faye-Lund wrote:
> On Fri, 2020-02-28 at 10:43 +, Daniel Stone wrote:
>> On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund
>>  wrote:
>>> On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:
 Yeah, changes on vulkan drivers or backend compilers should be
 fairly
 sandboxed.

 We also have tools that only work for intel stuff, that should
 never
 trigger anything on other people's HW.

 Could something be worked out using the tags?
>>>
>>> I think so! We have the pre-defined environment variable
>>> CI_MERGE_REQUEST_LABELS, and we can do variable conditions:
>>>
>>> https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables
>>>
>>> That sounds like a pretty neat middle-ground to me. I just hope
>>> that
>>> new pipelines are triggered if new labels are added, because not
>>> everyone is allowed to set labels, and sometimes people forget...
>>
>> There's also this which is somewhat more robust:
>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569
> 
> I'm not sure it's more robust, but yeah that a useful tool too.
> 
> The reason I'm skeptical about the robustness is that we'll miss
> testing if this misses a path.

Surely missing a path will be less likely / often to happen compared to
an MR missing a label. (Users which aren't members of the project can't
even set labels for an MR)


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Michel Dänzer
On 2020-02-28 10:28 a.m., Erik Faye-Lund wrote:
> 
> We could also do stuff like reducing the amount of tests we run on each
> commit, and punt some testing to a per-weekend test-run or someting
> like that. We don't *need* to know about every problem up front, just
> the stuff that's about to be released, really. The other stuff is just
> nice to have. If it's too expensive, I would say drop it.

I don't agree that pre-merge testing is just nice to have. A problem
which is only caught after it lands in mainline has a much bigger impact
than one which is already caught earlier.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Stone
Hi Jan,

On Fri, 28 Feb 2020 at 10:09, Jan Engelhardt  wrote:
> On Friday 2020-02-28 08:59, Daniel Stone wrote:
> >I believe that in January, we had $2082 of network cost (almost
> >entirely egress; ingress is basically free) and $1750 of
> >cloud-storage cost (almost all of which was download). That's based
> >on 16TB of cloud-storage (CI artifacts, container images, file
> >uploads, Git LFS) egress and 17.9TB of other egress (the web service
> >itself, repo activity). Projecting that out [×12 for a year] gives
> >us roughly $45k of network activity alone,
>
> I had come to a similar conclusion a few years back: It is not very
> economic to run ephemereal buildroots (and anything like it) between
> two (or more) "significant locations" of which one end is located in
> a Large Cloud datacenter like EC2/AWS/etc.
>
> As for such usecases, me and my surrounding peers have used (other)
> offerings where there is 50 TB free network/month, and yes that may
> have entailed doing more adminning than elsewhere - but an admin
> appreciates $2000 a lot more than a corporation, too.

Yes, absolutely. For context, our storage & network costs have
increased >10x in the past 12 months (~$320 Jan 2019), >3x in the past
6 months (~$1350 July 2019), and ~2x in the past 3 months (~$2000 Oct
2019).

I do now (personally) think that it's crossed the point at which it
would be worthwhile paying an admin to solve the problems that cloud
services currently solve for us - which wasn't true before. Such an
admin could also deal with things like our SMTP delivery failure rate,
which in the past year has spiked over 50% (see previous email),
demand for new services such as Discourse which will enable user
support without either a) users having to subscribe to a mailing list,
or b) bug trackers being cluttered up with user requests and other
non-bugs, etc.

Cheers,
Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH v2] drm/amd/display: Fix dmub_psr_destroy()

2020-02-28 Thread Dan Carpenter
This is freeing the wrong variable so it will crash.  It should be
freeing "*dmub" instead of "dmub".

Fixes: 4c1a1335dfe0 ("drm/amd/display: Driverside changes to support PSR in 
DMCUB")
Signed-off-by: Dan Carpenter 
---
 drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c 
b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
index 2c932c29f1f9..f0936cb3c056 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
@@ -235,6 +235,6 @@ struct dmub_psr *dmub_psr_create(struct dc_context *ctx)
  */
 void dmub_psr_destroy(struct dmub_psr **dmub)
 {
-   kfree(dmub);
+   kfree(*dmub);
*dmub = NULL;
 }
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Stone
On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund
 wrote:
> On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:
> > Yeah, changes on vulkan drivers or backend compilers should be
> > fairly
> > sandboxed.
> >
> > We also have tools that only work for intel stuff, that should never
> > trigger anything on other people's HW.
> >
> > Could something be worked out using the tags?
>
> I think so! We have the pre-defined environment variable
> CI_MERGE_REQUEST_LABELS, and we can do variable conditions:
>
> https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables
>
> That sounds like a pretty neat middle-ground to me. I just hope that
> new pipelines are triggered if new labels are added, because not
> everyone is allowed to set labels, and sometimes people forget...

There's also this which is somewhat more robust:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569

Cheers,
Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lucas Stach
On Fr, 2020-02-28 at 10:47 +0100, Daniel Vetter wrote:
> On Fri, Feb 28, 2020 at 10:29 AM Erik Faye-Lund
>  wrote:
> > On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> > > On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
> > > wrote:
> > > > Hi all,
> > > > 
> > > > You might have read the short take in the X.org board meeting
> > > > minutes
> > > > already, here's the long version.
> > > > 
> > > > The good news: gitlab.fd.o has become very popular with our
> > > > communities, and is used extensively. This especially includes all
> > > > the
> > > > CI integration. Modern development process and tooling, yay!
> > > > 
> > > > The bad news: The cost in growth has also been tremendous, and it's
> > > > breaking our bank account. With reasonable estimates for continued
> > > > growth we're expecting hosting expenses totalling 75k USD this
> > > > year,
> > > > and 90k USD next year. With the current sponsors we've set up we
> > > > can't
> > > > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > > > without any of the CI features enabled would total 30k USD, which
> > > > is
> > > > within X.org's ability to support through various sponsorships,
> > > > mostly
> > > > through XDC.
> > > > 
> > > > Note that X.org does no longer sponsor any CI runners themselves,
> > > > we've stopped that. The huge additional expenses are all just in
> > > > storing and serving build artifacts and images to outside CI
> > > > runners
> > > > sponsored by various companies. A related topic is that with the
> > > > growth in fd.o it's becoming infeasible to maintain it all on
> > > > volunteer admin time. X.org is therefore also looking for admin
> > > > sponsorship, at least medium term.
> > > > 
> > > > Assuming that we want cash flow reserves for one year of
> > > > gitlab.fd.o
> > > > (without CI support) and a trimmed XDC and assuming no sponsor
> > > > payment
> > > > meanwhile, we'd have to cut CI services somewhere between May and
> > > > June
> > > > this year. The board is of course working on acquiring sponsors,
> > > > but
> > > > filling a shortfall of this magnitude is neither easy nor quick
> > > > work,
> > > > and we therefore decided to give an early warning as soon as
> > > > possible.
> > > > Any help in finding sponsors for fd.o is very much appreciated.
> > > 
> > > a) Ouch.
> > > 
> > > b) we probably need to take a large step back here.
> > > 
> > 
> > I kinda agree, but maybe the step doesn't have to be *too* large?
> > 
> > I wonder if we could solve this by restructuring the project a bit. I'm
> > talking purely from a Mesa point of view here, so it might not solve
> > the full problem, but:
> > 
> > 1. It feels silly that we need to test changes to e.g the i965 driver
> > on dragonboards. We only have a big "do not run CI at all" escape-
> > hatch.
> > 
> > 2. A lot of us are working for a company that can probably pay for
> > their own needs in terms of CI. Perhaps moving some costs "up front" to
> > the company that needs it can make the future of CI for those who can't
> > do this
> > 
> > 3. I think we need a much more detailed break-down of the cost to make
> > educated changes. For instance, how expensive is Docker image
> > uploads/downloads (e.g intermediary artifacts) compared to build logs
> > and final test-results? What kind of artifacts?
> 
> We have logs somewhere, but no one yet got around to analyzing that.
> Which will be quite a bit of work to do since the cloud storage is
> totally disconnected from the gitlab front-end, making the connection
> to which project or CI job caused something is going to require
> scripting. Volunteers definitely very much welcome I think.

It's very surprising to me that this kind of cost monitoring is treated
as an afterthought, especially since one of the main jobs of the X.Org
board is to keep spending under control and transparent.

Also, from all the conversations it's still unclear to me whether the
google hosting costs are already over the sponsored credits (and so are
burning a hole in X.org's bank account right now) or if this is only
going to happen at a later point in time.

Even with CI disabled it seems that the board estimates a cost of 30k
annually for the plain gitlab hosting. Is this covered by the credits
sponsored by google? If not, why wasn't there a board vote on this
spending? All other spending seems to require pre-approval by the board.
Why wasn't the gitlab hosting cost discussed much earlier in the public
board meetings, especially if it's going to be such a big chunk of the
overall spending of the X.Org foundation?

Regards,
Lucas

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/display: Use after free in dmub_psr_destroy()

2020-02-28 Thread Dan Carpenter
On Fri, Feb 28, 2020 at 11:05:11AM +0100, Michel Dänzer wrote:
> On 2020-02-28 9:22 a.m., Dan Carpenter wrote:
> > These lines need to be re-ordered so that we don't dereference "dmub"
> > after we just freed it.
> > 
> > Fixes: 4c1a1335dfe0 ("drm/amd/display: Driverside changes to support PSR in 
> > DMCUB")
> > Signed-off-by: Dan Carpenter 
> > ---
> >  drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c 
> > b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> > index 2c932c29f1f9..dc858b152c6e 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> > +++ b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> > @@ -235,6 +235,6 @@ struct dmub_psr *dmub_psr_create(struct dc_context *ctx)
> >   */
> >  void dmub_psr_destroy(struct dmub_psr **dmub)
> >  {
> > -   kfree(dmub);
> > *dmub = NULL;
> > +   kfree(dmub);
> >  }
> > 
> 
> Maybe
> 
>   kfree(*dmub);
> 
> was intended instead?
> 

Ah yeah.  You're right.  I will resend.

> 
> Actually, this function and others in this file seem completely unused?

It's used in linux-next from dcn21_resource_destruct().

drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
   986  
   987  if (pool->base.dp_clock_source != NULL) {
   988  dcn20_clock_source_destroy(&pool->base.dp_clock_source);
   989  pool->base.dp_clock_source = NULL;
   990  }
   991  
   992  
   993  if (pool->base.abm != NULL)
   994  dce_abm_destroy(&pool->base.abm);
   995  
   996  if (pool->base.dmcu != NULL)
   997  dce_dmcu_destroy(&pool->base.dmcu);
   998  
   999  if (pool->base.psr != NULL)
  1000  dmub_psr_destroy(&pool->base.psr);

kfree(&pool->base.psr); will crash.

  1001  
  1002  if (pool->base.dccg != NULL)
  1003  dcn_dccg_destroy(&pool->base.dccg);
  1004  
  1005  if (pool->base.pp_smu != NULL)
  1006  dcn21_pp_smu_destroy(&pool->base.pp_smu);
  1007  }


regards,
dan carpenter

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/display: Use after free in dmub_psr_destroy()

2020-02-28 Thread Michel Dänzer
On 2020-02-28 9:22 a.m., Dan Carpenter wrote:
> These lines need to be re-ordered so that we don't dereference "dmub"
> after we just freed it.
> 
> Fixes: 4c1a1335dfe0 ("drm/amd/display: Driverside changes to support PSR in 
> DMCUB")
> Signed-off-by: Dan Carpenter 
> ---
>  drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c 
> b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> index 2c932c29f1f9..dc858b152c6e 100644
> --- a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> +++ b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
> @@ -235,6 +235,6 @@ struct dmub_psr *dmub_psr_create(struct dc_context *ctx)
>   */
>  void dmub_psr_destroy(struct dmub_psr **dmub)
>  {
> - kfree(dmub);
>   *dmub = NULL;
> + kfree(dmub);
>  }
> 

Maybe

kfree(*dmub);

was intended instead?


Actually, this function and others in this file seem completely unused?


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Vetter
On Fri, Feb 28, 2020 at 10:29 AM Erik Faye-Lund
 wrote:
>
> On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:
> > On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
> > wrote:
> > > Hi all,
> > >
> > > You might have read the short take in the X.org board meeting
> > > minutes
> > > already, here's the long version.
> > >
> > > The good news: gitlab.fd.o has become very popular with our
> > > communities, and is used extensively. This especially includes all
> > > the
> > > CI integration. Modern development process and tooling, yay!
> > >
> > > The bad news: The cost in growth has also been tremendous, and it's
> > > breaking our bank account. With reasonable estimates for continued
> > > growth we're expecting hosting expenses totalling 75k USD this
> > > year,
> > > and 90k USD next year. With the current sponsors we've set up we
> > > can't
> > > sustain that. We estimate that hosting expenses for gitlab.fd.o
> > > without any of the CI features enabled would total 30k USD, which
> > > is
> > > within X.org's ability to support through various sponsorships,
> > > mostly
> > > through XDC.
> > >
> > > Note that X.org does no longer sponsor any CI runners themselves,
> > > we've stopped that. The huge additional expenses are all just in
> > > storing and serving build artifacts and images to outside CI
> > > runners
> > > sponsored by various companies. A related topic is that with the
> > > growth in fd.o it's becoming infeasible to maintain it all on
> > > volunteer admin time. X.org is therefore also looking for admin
> > > sponsorship, at least medium term.
> > >
> > > Assuming that we want cash flow reserves for one year of
> > > gitlab.fd.o
> > > (without CI support) and a trimmed XDC and assuming no sponsor
> > > payment
> > > meanwhile, we'd have to cut CI services somewhere between May and
> > > June
> > > this year. The board is of course working on acquiring sponsors,
> > > but
> > > filling a shortfall of this magnitude is neither easy nor quick
> > > work,
> > > and we therefore decided to give an early warning as soon as
> > > possible.
> > > Any help in finding sponsors for fd.o is very much appreciated.
> >
> > a) Ouch.
> >
> > b) we probably need to take a large step back here.
> >
>
> I kinda agree, but maybe the step doesn't have to be *too* large?
>
> I wonder if we could solve this by restructuring the project a bit. I'm
> talking purely from a Mesa point of view here, so it might not solve
> the full problem, but:
>
> 1. It feels silly that we need to test changes to e.g the i965 driver
> on dragonboards. We only have a big "do not run CI at all" escape-
> hatch.
>
> 2. A lot of us are working for a company that can probably pay for
> their own needs in terms of CI. Perhaps moving some costs "up front" to
> the company that needs it can make the future of CI for those who can't
> do this
>
> 3. I think we need a much more detailed break-down of the cost to make
> educated changes. For instance, how expensive is Docker image
> uploads/downloads (e.g intermediary artifacts) compared to build logs
> and final test-results? What kind of artifacts?

We have logs somewhere, but no one yet got around to analyzing that.
Which will be quite a bit of work to do since the cloud storage is
totally disconnected from the gitlab front-end, making the connection
to which project or CI job caused something is going to require
scripting. Volunteers definitely very much welcome I think.

> One suggestion would be to do something more similar to what the kernel
> does, and separate into different repos for different subsystems. This
> could allow us to have separate testing-pipelines for these repos,
> which would mean that for instance a change to RADV didn't trigger a
> full Panfrost test-run.

Uh as someone who lives the kernel multi-tree model daily, there's a
_lot_ of pain involved. I think much better to look at filtering out
CI targets for when nothing relevant happened. But that gets somewhat
tricky, since "nothing relevant" is always only relative to some
baseline, so bit of scripting and all involved to make sure you don't
run stuff too often or (probably worse) not often enough.
-Daniel

> This would probably require us to accept using a more branch-heavy
> work-flow. I don't personally think that would be a bad thing.
>
> But this is all kinda based on an assumption that running hardware-
> testing is the expensive part. I think that's quite possibly the case,
> but some more numbers would be helpful. I mean, it might turn out that
> just throwing up a Docker cache inside the organizations that host
> runners might be sufficient for all I know...
>
> We could also do stuff like reducing the amount of tests we run on each
> commit, and punt some testing to a per-weekend test-run or someting
> like that. We don't *need* to know about every problem up front, just
> the stuff that's about to be released, really. The other stuff is just
> nice to have. If it's too expensive, I would say drop it.

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lionel Landwerlin

On 28/02/2020 11:28, Erik Faye-Lund wrote:

On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:

On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
wrote:

Hi all,

You might have read the short take in the X.org board meeting
minutes
already, here's the long version.

The good news: gitlab.fd.o has become very popular with our
communities, and is used extensively. This especially includes all
the
CI integration. Modern development process and tooling, yay!

The bad news: The cost in growth has also been tremendous, and it's
breaking our bank account. With reasonable estimates for continued
growth we're expecting hosting expenses totalling 75k USD this
year,
and 90k USD next year. With the current sponsors we've set up we
can't
sustain that. We estimate that hosting expenses for gitlab.fd.o
without any of the CI features enabled would total 30k USD, which
is
within X.org's ability to support through various sponsorships,
mostly
through XDC.

Note that X.org does no longer sponsor any CI runners themselves,
we've stopped that. The huge additional expenses are all just in
storing and serving build artifacts and images to outside CI
runners
sponsored by various companies. A related topic is that with the
growth in fd.o it's becoming infeasible to maintain it all on
volunteer admin time. X.org is therefore also looking for admin
sponsorship, at least medium term.

Assuming that we want cash flow reserves for one year of
gitlab.fd.o
(without CI support) and a trimmed XDC and assuming no sponsor
payment
meanwhile, we'd have to cut CI services somewhere between May and
June
this year. The board is of course working on acquiring sponsors,
but
filling a shortfall of this magnitude is neither easy nor quick
work,
and we therefore decided to give an early warning as soon as
possible.
Any help in finding sponsors for fd.o is very much appreciated.

a) Ouch.

b) we probably need to take a large step back here.


I kinda agree, but maybe the step doesn't have to be *too* large?

I wonder if we could solve this by restructuring the project a bit. I'm
talking purely from a Mesa point of view here, so it might not solve
the full problem, but:

1. It feels silly that we need to test changes to e.g the i965 driver
on dragonboards. We only have a big "do not run CI at all" escape-
hatch.



Yeah, changes on vulkan drivers or backend compilers should be fairly 
sandboxed.


We also have tools that only work for intel stuff, that should never 
trigger anything on other people's HW.


Could something be worked out using the tags?


-Lionel

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Stone
On Fri, 28 Feb 2020 at 08:48, Dave Airlie  wrote:
> On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
> > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > comparable in terms of what you get and what you pay for them.
> > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > services are cheaper, but then you need to find someone who is going
> > to properly administer the various machines, install decent
> > monitoring, make sure that more storage is provisioned when we need
> > more storage (which is basically all the time), make sure that the
> > hardware is maintained in decent shape (pretty sure one of the fd.o
> > machines has had a drive in imminent-failure state for the last few
> > months), etc.
> >
> > Given the size of our service, that's a much better plan (IMO) than
> > relying on someone who a) isn't an admin by trade, b) has a million
> > other things to do, and c) hasn't wanted to do it for the past several
> > years. But as long as that's the resources we have, then we're paying
> > the cloud tradeoff, where we pay more money in exchange for fewer
> > problems.
>
> Admin for gitlab and CI is a full time role anyways. The system is
> definitely not self sustaining without time being put in by you and
> anholt still. If we have $75k to burn on credits, and it was diverted
> to just pay an admin to admin the real hw + gitlab/CI would that not
> be a better use of the money? I didn't know if we can afford $75k for
> an admin, but suddenly we can afford it for gitlab credits?

s/gitlab credits/GCP credits/

I took a quick look at HPE, which we previously used for bare metal,
and it looks like we'd be spending $25-50k (depending on how much
storage you want to provision, how much room you want to leave to
provision more storage later, how much you care about backups) to run
a similar level of service so that'd put a bit of a dint in your
year-one budget.

The bare-metal hosting providers also add up to more expensive than
you might think, again especially if you want either redundancy or
just backups.

> > Yes, we could federate everything back out so everyone runs their own
> > builds and executes those. Tinderbox did something really similar to
> > that IIRC; not sure if Buildbot does as well. Probably rules out
> > pre-merge testing, mind.
>
> Why? does gitlab not support the model? having builds done in parallel
> on runners closer to the test runners seems like it should be a thing.
> I guess artifact transfer would cost less then as a result.

It does support the model but if every single build executor is also
compiling Mesa from scratch locally, how long do you think that's
going to take?

> > Again, if you want everything to be centrally
> > designed/approved/monitored/controlled, that's a fine enough idea, and
> > I'd be happy to support whoever it was who was doing that for all of
> > fd.o.
>
> I don't think we have any choice but to have someone centrally
> controlling it, You can't have a system in place that lets CI users
> burn largs sums of money without authorisation, and that is what we
> have now.

OK, not sure who it is who's going to be approving every update to
every .gitlab-ci.yml in the repository, or maybe we just have zero
shared runners and anyone who wants to do builds can BYO.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC PATCH 2/4] drm/scheduler: implement a function to modify sched list

2020-02-28 Thread Nirmoy


On 2/28/20 8:47 AM, Christian König wrote:

On 2020-02-28 at 06:08, Luben Tuikov wrote:

On 2020-02-27 4:40 p.m., Nirmoy Das wrote:

implement drm_sched_entity_modify_sched() which can modify existing
sched_list with a different one. This is going to be helpful when
userspace changes priority of a ctx/entity then driver can switch to
corresponding hw shced list for that priority

Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/scheduler/sched_entity.c | 24 


  include/drm/gpu_scheduler.h  |  4 
  2 files changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index 63bccd201b97..711e9d504bcb 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -83,6 +83,30 @@ int drm_sched_entity_init(struct drm_sched_entity 
*entity,

  }
  EXPORT_SYMBOL(drm_sched_entity_init);
  +/**
+ * drm_sched_entity_modify_sched - Modify sched of an entity
+ *
+ * @entity: scheduler entity to init
+ * @sched_list: the list of new drm scheds which will replace
+ *    existing entity->sched_list
+ * @num_sched_list: number of drm sched in sched_list
+ *
+ * Returns 0 on success or a negative error code on failure.
+ */
+int drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
+  struct drm_gpu_scheduler **sched_list,
+  unsigned int num_sched_list)
+{
+    if (!(entity && sched_list && (num_sched_list == 0 || 
sched_list[0])))

+    return -EINVAL;

This seems unmaintainable. I'd write it in its natural form:


This is probably just copy & pasted from the init function and 
complete overkill here.

Was indeed lame copy paste from my side.




So if num_sched_list is 0 or 1 we return NULL?
And if num_sched_list is 2 or greater we return sched_list
of which sched_list[0] could be NULL?


This is also copy from the init code and completely wrong here.

What we should do instead is just: WARN_ON(!num_sched_list || 
!sched_list);


Thanks, this will make it much simple. I will have this in next version.

Nirmoy


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: clean wptr on wb when gpu recovery

2020-02-28 Thread Liu, Monk
This is a clear fix:

After TDR we restore the compute ring HQD from its MQD, but the MQD only
records "WPTR_ADDR_LO/HI", so once
the HQD is restored the MEC immediately reads the value at "WPTR_ADDR_LO/HI",
which is WB memory, and that value is sometimes not "0" (because TDR won't
clear the WB; its value is whatever the hung process left there).
So the MEC considers that there are commands in the RB (since RPTR != WPTR),
which leads to a further hang.
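
To make the failure mode concrete, here is a small worked example of the
arithmetic from the commit message quoted below (illustrative only; the
0x7ff mask and the 0x100 values are just the example numbers from that
description):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t ring_buf_mask = 0x7ff;	/* example ring mask */
	uint32_t stale_wb_wptr = 0x100;	/* left in write-back memory by the hung job */
	uint32_t hw_rptr = stale_wb_wptr & ring_buf_mask; /* MEC syncs rptr from the restored MQD */
	uint32_t sw_wptr = 0;		/* driver reset ring->wptr to 0 after TDR */

	sw_wptr += 0x100;		/* ring test submits a 0x100 dword packet */

	/* rptr == wptr, so the CP sees an "empty" ring and never runs the test */
	printf("rptr=0x%x wptr=0x%x%s\n", hw_rptr, sw_wptr & ring_buf_mask,
	       hw_rptr == (sw_wptr & ring_buf_mask) ?
	       " -> ring test never executes" : "");
	return 0;
}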

Reviewed-by: Monk Liu 

_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: Christian König  
Sent: Friday, February 28, 2020 5:20 PM
To: Tao, Yintian ; Koenig, Christian 
; Deucher, Alexander ; 
Liu, Monk 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: clean wptr on wb when gpu recovery

On 2020-02-28 at 07:31, Yintian Tao wrote:
> The TDR will be randomly failed due to compute ring test failure. If 
> the compute ring wptr & 0x7ff(ring_buf_mask) is 0x100 then after map 
> mqd the compute ring rptr will be synced with 0x100. And the ring test 
> packet size is also 0x100.
> Then after invocation of amdgpu_ring_commit, the cp will not really 
> handle the packet on the ring buffer because rptr is equal to wptr.
>
> Signed-off-by: Yintian Tao 

Offhand that looks correct to me, but I can't fully judge whether it won't have 
any negative side effects. Patch is Acked-by: Christian König 
 for now.

Monk, according to git you modified that function as well. Could this have any 
potential negative effect for SRIOV? I don't think so, but better safe than 
sorry.

Regards,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 1 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 1 +
>   2 files changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 44f00ecea322..5df1a6d45457 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -3508,6 +3508,7 @@ static int gfx_v10_0_kcq_init_queue(struct 
> amdgpu_ring *ring)
>   
>   /* reset ring buffer */
>   ring->wptr = 0;
> + atomic64_set((atomic64_t *)&adev->wb.wb[ring->wptr_offs], 0);
>   amdgpu_ring_clear_ring(ring);
>   } else {
>   amdgpu_ring_clear_ring(ring);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 4135e4126e82..ac22490e8656 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3664,6 +3664,7 @@ static int gfx_v9_0_kcq_init_queue(struct 
> amdgpu_ring *ring)
>   
>   /* reset ring buffer */
>   ring->wptr = 0;
> + atomic64_set((atomic64_t *)&adev->wb.wb[ring->wptr_offs], 0);
>   amdgpu_ring_clear_ring(ring);
>   } else {
>   amdgpu_ring_clear_ring(ring);

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: clean wptr on wb when gpu recovery

2020-02-28 Thread Christian König

On 2020-02-28 at 07:31, Yintian Tao wrote:

The TDR will be randomly failed due to compute ring
test failure. If the compute ring wptr & 0x7ff(ring_buf_mask)
is 0x100 then after map mqd the compute ring rptr will be
synced with 0x100. And the ring test packet size is also 0x100.
Then after invocation of amdgpu_ring_commit, the cp will not
really handle the packet on the ring buffer because rptr is equal to wptr.

Signed-off-by: Yintian Tao 


Offhand that looks correct to me, but I can't fully judge whether it won't 
have any negative side effects. Patch is Acked-by: Christian König 
 for now.


Monk, according to git you modified that function as well. Could this 
have any potential negative effect for SRIOV? I don't think so, but 
better safe than sorry.


Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 1 +
  2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 44f00ecea322..5df1a6d45457 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -3508,6 +3508,7 @@ static int gfx_v10_0_kcq_init_queue(struct amdgpu_ring 
*ring)
  
  		/* reset ring buffer */

ring->wptr = 0;
+   atomic64_set((atomic64_t *)&adev->wb.wb[ring->wptr_offs], 0);
amdgpu_ring_clear_ring(ring);
} else {
amdgpu_ring_clear_ring(ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 4135e4126e82..ac22490e8656 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3664,6 +3664,7 @@ static int gfx_v9_0_kcq_init_queue(struct amdgpu_ring 
*ring)
  
  		/* reset ring buffer */

ring->wptr = 0;
+   atomic64_set((atomic64_t *)&adev->wb.wb[ring->wptr_offs], 0);
amdgpu_ring_clear_ring(ring);
} else {
amdgpu_ring_clear_ring(ring);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/2] drm/amdgpu: Add debugfs interface to set arbitrary sclk for navi14 (v2)

2020-02-28 Thread Quan, Evan
Series is reviewed-by: Evan Quan 

-Original Message-
From: Chengming Gui  
Sent: Friday, February 28, 2020 5:00 PM
To: amd-gfx@lists.freedesktop.org
Cc: Quan, Evan ; Feng, Kenneth ; Xu, 
Feifei ; Gui, Jack 
Subject: [PATCH 2/2] drm/amdgpu: Add debugfs interface to set arbitrary sclk 
for navi14 (v2)

add debugfs interface amdgpu_force_sclk
to set arbitrary sclk for navi14

v2: Add lock

Signed-off-by: Chengming Gui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c| 43 ++
 drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h |  3 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index abc1482..831f70d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1257,9 +1257,43 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
return 0;
 }
 
+static int amdgpu_debugfs_sclk_set(void *data, u64 val) {
+   int ret = 0;
+   uint32_t max_freq, min_freq;
+   struct amdgpu_device *adev = (struct amdgpu_device *)data;
+
+   if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
+   return -EINVAL;
+
+   ret = pm_runtime_get_sync(adev->ddev->dev);
+   if (ret < 0)
+   return ret;
+
+   if (is_support_sw_smu(adev)) {
+   ret = smu_get_dpm_freq_range(&adev->smu, SMU_SCLK, &min_freq, 
&max_freq, true);
+   if (ret || val > max_freq || val < min_freq)
+   return -EINVAL;
+   ret = smu_set_soft_freq_range(&adev->smu, SMU_SCLK, 
(uint32_t)val, (uint32_t)val, true);
+   } else {
+   return 0;
+   }
+
+   pm_runtime_mark_last_busy(adev->ddev->dev);
+   pm_runtime_put_autosuspend(adev->ddev->dev);
+
+   if (ret)
+   return -EINVAL;
+
+   return 0;
+}
+
 DEFINE_SIMPLE_ATTRIBUTE(fops_ib_preempt, NULL,
amdgpu_debugfs_ib_preempt, "%llu\n");
 
+DEFINE_SIMPLE_ATTRIBUTE(fops_sclk_set, NULL,
+   amdgpu_debugfs_sclk_set, "%llu\n");
+
 int amdgpu_debugfs_init(struct amdgpu_device *adev)  {
int r, i;
@@ -1273,6 +1307,15 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
return -EIO;
}
 
+   adev->smu.debugfs_sclk =
+   debugfs_create_file("amdgpu_force_sclk", 0200,
+   adev->ddev->primary->debugfs_root, adev,
+   &fops_sclk_set);
+   if (!(adev->smu.debugfs_sclk)) {
+   DRM_ERROR("unable to create amdgpu_set_sclk debugsfs file\n");
+   return -EIO;
+   }
+
/* Register debugfs entries for amdgpu_ttm */
r = amdgpu_ttm_debugfs_init(adev);
if (r) {
diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
index c8e72c7..657a6f1 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
@@ -372,6 +372,9 @@ struct smu_context
struct amd_pp_display_configuration  *display_config;
struct smu_baco_context smu_baco;
void *od_settings;
+#if defined(CONFIG_DEBUG_FS)
+   struct dentry   *debugfs_sclk;
+#endif
 
uint32_t pstate_sclk;
uint32_t pstate_mclk;
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/2] drm/amdgpu: add lock option for smu_set_soft_freq_range()

2020-02-28 Thread Chengming Gui
Add lock_needed param for smu_set_soft_freq_range()

Signed-off-by: Chengming Gui 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c |  7 ++-
 drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h |  2 +-
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 14 +++---
 drivers/gpu/drm/amd/powerplay/renoir_ppt.c |  8 
 4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 9f6da26..e3398f9 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -210,7 +210,7 @@ int smu_get_smc_version(struct smu_context *smu, uint32_t 
*if_version, uint32_t
 }
 
 int smu_set_soft_freq_range(struct smu_context *smu, enum smu_clk_type 
clk_type,
-   uint32_t min, uint32_t max)
+   uint32_t min, uint32_t max, bool lock_needed)
 {
int ret = 0;
 
@@ -220,7 +220,12 @@ int smu_set_soft_freq_range(struct smu_context *smu, enum 
smu_clk_type clk_type,
if (!smu_clk_dpm_is_enabled(smu, clk_type))
return 0;
 
+   if (lock_needed)
+   mutex_lock(&smu->mutex);
ret = smu_set_soft_freq_limited_range(smu, clk_type, min, max);
+   if (lock_needed)
+   mutex_unlock(&smu->mutex);
+
return ret;
 }
 
diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
index d652f920..c8e72c7 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
@@ -707,7 +707,7 @@ int smu_get_dpm_level_count(struct smu_context *smu, enum 
smu_clk_type clk_type,
 int smu_get_dpm_freq_range(struct smu_context *smu, enum smu_clk_type clk_type,
   uint32_t *min, uint32_t *max, bool lock_needed);
 int smu_set_soft_freq_range(struct smu_context *smu, enum smu_clk_type 
clk_type,
-   uint32_t min, uint32_t max);
+   uint32_t min, uint32_t max, bool lock_needed);
 int smu_set_hard_freq_range(struct smu_context *smu, enum smu_clk_type 
clk_type,
uint32_t min, uint32_t max);
 int smu_get_dpm_level_range(struct smu_context *smu, enum smu_clk_type 
clk_type,
diff --git a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c 
b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
index 04b569d..6e41f3c 100644
--- a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
@@ -970,7 +970,7 @@ static int navi10_force_clk_levels(struct smu_context *smu,
if (ret)
return size;
 
-   ret = smu_set_soft_freq_range(smu, clk_type, min_freq, 
max_freq);
+   ret = smu_set_soft_freq_range(smu, clk_type, min_freq, 
max_freq, false);
if (ret)
return size;
break;
@@ -1094,7 +1094,7 @@ static int navi10_force_dpm_limit_value(struct 
smu_context *smu, bool highest)
return ret;
 
force_freq = highest ? max_freq : min_freq;
-   ret = smu_set_soft_freq_range(smu, clk_type, force_freq, 
force_freq);
+   ret = smu_set_soft_freq_range(smu, clk_type, force_freq, 
force_freq, false);
if (ret)
return ret;
}
@@ -1120,7 +1120,7 @@ static int navi10_unforce_dpm_levels(struct smu_context 
*smu)
if (ret)
return ret;
 
-   ret = smu_set_soft_freq_range(smu, clk_type, min_freq, 
max_freq);
+   ret = smu_set_soft_freq_range(smu, clk_type, min_freq, 
max_freq, false);
if (ret)
return ret;
}
@@ -1680,10 +1680,10 @@ static int navi10_set_standard_performance_level(struct 
smu_context *smu)
return navi10_set_performance_level(smu, 
AMD_DPM_FORCED_LEVEL_AUTO);
}
 
-   ret = smu_set_soft_freq_range(smu, SMU_SCLK, sclk_freq, sclk_freq);
+   ret = smu_set_soft_freq_range(smu, SMU_SCLK, sclk_freq, sclk_freq, 
false);
if (ret)
return ret;
-   ret = smu_set_soft_freq_range(smu, SMU_UCLK, uclk_freq, uclk_freq);
+   ret = smu_set_soft_freq_range(smu, SMU_UCLK, uclk_freq, uclk_freq, 
false);
if (ret)
return ret;
 
@@ -1748,10 +1748,10 @@ static int navi10_set_peak_performance_level(struct 
smu_context *smu)
if (ret)
return ret;
 
-   ret = smu_set_soft_freq_range(smu, SMU_SCLK, sclk_freq, sclk_freq);
+   ret = smu_set_soft_freq_range(smu, SMU_SCLK, sclk_freq, sclk_freq, 
false);
if (ret)
return ret;
-   ret = smu_set_soft_freq_range(smu, SMU_UCLK, uclk_freq, uclk_freq);
+   ret = smu_set_soft_freq_range(smu, SMU_UCLK, uclk_freq, uclk_freq, 
false);
if (ret)
return ret;
 
diff --git 

[PATCH 2/2] drm/amdgpu: Add debugfs interface to set arbitrary sclk for navi14 (v2)

2020-02-28 Thread Chengming Gui
Add a debugfs interface, amdgpu_force_sclk, to set an
arbitrary sclk for navi14.

v2: Add lock
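
For reference, a hypothetical userspace usage sketch (assumptions: debugfs
is mounted at /sys/kernel/debug, the card is DRM minor 0, and the written
value is interpreted as a frequency in MHz):

#include <stdio.h>

int main(void)
{
	/* Force sclk by writing the target frequency into the new file. */
	FILE *f = fopen("/sys/kernel/debug/dri/0/amdgpu_force_sclk", "w");

	if (!f)
		return 1;
	fprintf(f, "%u\n", 1100);	/* 1100 MHz (assumed unit) */
	fclose(f);
	return 0;
}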

Signed-off-by: Chengming Gui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c| 43 ++
 drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h |  3 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index abc1482..831f70d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1257,9 +1257,43 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
return 0;
 }
 
+static int amdgpu_debugfs_sclk_set(void *data, u64 val)
+{
+   int ret = 0;
+   uint32_t max_freq, min_freq;
+   struct amdgpu_device *adev = (struct amdgpu_device *)data;
+
+   if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
+   return -EINVAL;
+
+   ret = pm_runtime_get_sync(adev->ddev->dev);
+   if (ret < 0)
+   return ret;
+
+   if (is_support_sw_smu(adev)) {
+   ret = smu_get_dpm_freq_range(&adev->smu, SMU_SCLK, &min_freq, 
&max_freq, true);
+   if (ret || val > max_freq || val < min_freq)
+   return -EINVAL;
+   ret = smu_set_soft_freq_range(&adev->smu, SMU_SCLK, 
(uint32_t)val, (uint32_t)val, true);
+   } else {
+   return 0;
+   }
+
+   pm_runtime_mark_last_busy(adev->ddev->dev);
+   pm_runtime_put_autosuspend(adev->ddev->dev);
+
+   if (ret)
+   return -EINVAL;
+
+   return 0;
+}
+
 DEFINE_SIMPLE_ATTRIBUTE(fops_ib_preempt, NULL,
amdgpu_debugfs_ib_preempt, "%llu\n");
 
+DEFINE_SIMPLE_ATTRIBUTE(fops_sclk_set, NULL,
+   amdgpu_debugfs_sclk_set, "%llu\n");
+
 int amdgpu_debugfs_init(struct amdgpu_device *adev)
 {
int r, i;
@@ -1273,6 +1307,15 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
return -EIO;
}
 
+   adev->smu.debugfs_sclk =
+   debugfs_create_file("amdgpu_force_sclk", 0200,
+   adev->ddev->primary->debugfs_root, adev,
+   &fops_sclk_set);
+   if (!(adev->smu.debugfs_sclk)) {
+   DRM_ERROR("unable to create amdgpu_set_sclk debugsfs file\n");
+   return -EIO;
+   }
+
/* Register debugfs entries for amdgpu_ttm */
r = amdgpu_ttm_debugfs_init(adev);
if (r) {
diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
index c8e72c7..657a6f1 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
@@ -372,6 +372,9 @@ struct smu_context
struct amd_pp_display_configuration  *display_config;
struct smu_baco_context smu_baco;
void *od_settings;
+#if defined(CONFIG_DEBUG_FS)
+   struct dentry   *debugfs_sclk;
+#endif
 
uint32_t pstate_sclk;
uint32_t pstate_mclk;
-- 
2.7.4
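
For anyone who wants to exercise the new node, a rough userspace test helper
is sketched below; it assumes debugfs is mounted at /sys/kernel/debug, that
the primary DRM node is dri/0, and that the file keeps the amdgpu_force_sclk
name from the patch - adjust the path for your setup:

/*
 * Hypothetical test helper, not part of the patch: writes a decimal sclk
 * value (MHz) to the amdgpu_force_sclk debugfs node. The path below is an
 * assumption (debugfs at /sys/kernel/debug, primary GPU at dri/0); run as
 * root and adjust as needed.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = "/sys/kernel/debug/dri/0/amdgpu_force_sclk";
	char buf[32];
	int fd, len;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <sclk-in-mhz>\n", argv[0]);
		return 1;
	}

	fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror(path);
		return 1;
	}

	len = snprintf(buf, sizeof(buf), "%s\n", argv[1]);
	if (write(fd, buf, len) != len)
		perror("write");

	close(fd);
	return 0;
}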

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Dave Airlie
On Fri, 28 Feb 2020 at 18:18, Daniel Stone  wrote:
>
> On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> > b) we probably need to take a large step back here.
> >
> > Look at this from a sponsor POV, why would I give X.org/fd.o
> > sponsorship money that they are just giving straight to google to pay
> > for hosting credits? Google are profiting in some minor way from these
> > hosting credits being bought by us, and I assume we aren't getting any
> > sort of discounts here. Having google sponsor the credits costs google
> > substantially less than having any other company give us money to do
> > it.
>
> The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> comparable in terms of what you get and what you pay for them.
> Obviously providers like Packet and Digital Ocean who offer bare-metal
> services are cheaper, but then you need to find someone who is going
> to properly administer the various machines, install decent
> monitoring, make sure that more storage is provisioned when we need
> more storage (which is basically all the time), make sure that the
> hardware is maintained in decent shape (pretty sure one of the fd.o
> machines has had a drive in imminent-failure state for the last few
> months), etc.
>
> Given the size of our service, that's a much better plan (IMO) than
> relying on someone who a) isn't an admin by trade, b) has a million
> other things to do, and c) hasn't wanted to do it for the past several
> years. But as long as that's the resources we have, then we're paying
> the cloud tradeoff, where we pay more money in exchange for fewer
> problems.

Admin for gitlab and CI is a full-time role anyway. The system is
definitely not self-sustaining without time still being put in by you and
anholt. If we have $75k to burn on credits, and it were instead diverted to
paying an admin to administer the real hw + gitlab/CI, would that not
be a better use of the money? I didn't think we could afford $75k for
an admin, but suddenly we can afford it for gitlab credits?

> Yes, we could federate everything back out so everyone runs their own
> builds and executes those. Tinderbox did something really similar to
> that IIRC; not sure if Buildbot does as well. Probably rules out
> pre-merge testing, mind.

Why? Does gitlab not support that model? Having builds done in parallel
on runners closer to the test runners seems like it should be a thing.
I guess artifact transfer would then cost less as a result.

> The reason we hadn't worked everything out in advance of deploying is
> because Mesa has had 3993 MRs in the little over a year since
> moving, and a similar number in GStreamer, just taking the two biggest
> users. At the start it was 'maybe let's use MRs if you want to but
> make sure everything still goes through the list', and now it's
> something different. Similarly the CI architecture hasn't been
> 'designed', so much as that people want to run dEQP and Piglit on
> their hardware pre-merge in an open fashion that's actually accessible
> to people, and have just done it.
>
> Again, if you want everything to be centrally
> designed/approved/monitored/controlled, that's a fine enough idea, and
> I'd be happy to support whoever it was who was doing that for all of
> fd.o.

I don't think we have any choice but to have someone centrally
controlling it. You can't have a system in place that lets CI users
burn large sums of money without authorisation, and that is what we
have now.

Dave.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdgpu/gfx: fix indentation in new rlc spm code

2020-02-28 Thread Christian König

Series is Reviewed-by: Christian König  as well.

Am 27.02.20 um 21:33 schrieb Nirmoy:

series Reviewed-by:Nirmoy Das 

On 2/27/20 9:14 PM, Alex Deucher wrote:

fixes warnings with -Wmisleading-indentation.

Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +++---
  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 6 +++---
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 6 +++---
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 6 +++---
  4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c

index cb3421e25adc..0dc3ed694fe8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -1016,9 +1016,9 @@ static int gfx_v10_0_rlc_init(struct amdgpu_device *adev)

  return r;
  }
  -    /* init spm vmid with 0xf */
-    if (adev->gfx.rlc.funcs->update_spm_vmid)
-    adev->gfx.rlc.funcs->update_spm_vmid(adev, 0xf);
+    /* init spm vmid with 0xf */
+    if (adev->gfx.rlc.funcs->update_spm_vmid)
+    adev->gfx.rlc.funcs->update_spm_vmid(adev, 0xf);
    return 0;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c

index 397c62c58436..733d398c61cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -3346,9 +3346,9 @@ static int gfx_v7_0_rlc_init(struct amdgpu_device *adev)

  return r;
  }
  -    /* init spm vmid with 0xf */
-    if (adev->gfx.rlc.funcs->update_spm_vmid)
-    adev->gfx.rlc.funcs->update_spm_vmid(adev, 0xf);
+    /* init spm vmid with 0xf */
+    if (adev->gfx.rlc.funcs->update_spm_vmid)
+    adev->gfx.rlc.funcs->update_spm_vmid(adev, 0xf);
    return 0;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c

index 294abff9efb7..393a1324daa9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1318,9 +1318,9 @@ static int gfx_v8_0_rlc_init(struct amdgpu_device *adev)

  return r;
  }
  -    /* init spm vmid with 0xf */
-    if (adev->gfx.rlc.funcs->update_spm_vmid)
-    adev->gfx.rlc.funcs->update_spm_vmid(adev, 0xf);
+    /* init spm vmid with 0xf */
+    if (adev->gfx.rlc.funcs->update_spm_vmid)
+    adev->gfx.rlc.funcs->update_spm_vmid(adev, 0xf);
    return 0;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

index 1d5fd5f17a30..9962ef80da92 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1847,9 +1847,9 @@ static int gfx_v9_0_rlc_init(struct amdgpu_device *adev)

  break;
  }
  -    /* init spm vmid with 0xf */
-    if (adev->gfx.rlc.funcs->update_spm_vmid)
-    adev->gfx.rlc.funcs->update_spm_vmid(adev, 0xf);
+    /* init spm vmid with 0xf */
+    if (adev->gfx.rlc.funcs->update_spm_vmid)
+    adev->gfx.rlc.funcs->update_spm_vmid(adev, 0xf);
    return 0;
  }


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/display: Use after free in dmub_psr_destroy()

2020-02-28 Thread Dan Carpenter
These lines need to be re-ordered so that we don't dereference "dmub"
after we just freed it.

Fixes: 4c1a1335dfe0 ("drm/amd/display: Driverside changes to support PSR in DMCUB")
Signed-off-by: Dan Carpenter 
---
 drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
index 2c932c29f1f9..dc858b152c6e 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c
@@ -235,6 +235,6 @@ struct dmub_psr *dmub_psr_create(struct dc_context *ctx)
  */
 void dmub_psr_destroy(struct dmub_psr **dmub)
 {
-   kfree(dmub);
*dmub = NULL;
+   kfree(dmub);
 }
-- 
2.11.0
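
For readers following the two competing fixes, the helper is the usual
destroy-through-a-double-pointer idiom: free the object the caller's pointer
refers to, then clear that pointer so stale users see NULL. A standalone
sketch with made-up names (plain C, not the DC code, and not asserting which
ordering the driver finally takes):

/*
 * Generic illustration of a destroy helper that takes a double pointer so
 * it can both free the object and NULL out the caller's reference.
 * "widget" is a made-up stand-in for dmub_psr.
 */
#include <stdio.h>
#include <stdlib.h>

struct widget {
	int value;
};

static struct widget *widget_create(int value)
{
	struct widget *w = calloc(1, sizeof(*w));

	if (w)
		w->value = value;
	return w;
}

static void widget_destroy(struct widget **w)
{
	/*
	 * Free what *w points to, then clear the caller's pointer. Freeing
	 * "w" itself would instead try to free the caller's pointer variable
	 * (undefined behaviour) and leak the real object - the shape of the
	 * bug being discussed here.
	 */
	free(*w);
	*w = NULL;
}

int main(void)
{
	struct widget *w = widget_create(42);

	widget_destroy(&w);
	printf("after destroy: %p\n", (void *)w); /* NULL */
	return 0;
}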

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Stone
On Fri, 28 Feb 2020 at 03:38, Dave Airlie  wrote:
> b) we probably need to take a large step back here.
>
> Look at this from a sponsor POV, why would I give X.org/fd.o
> sponsorship money that they are just giving straight to google to pay
> for hosting credits? Google are profiting in some minor way from these
> hosting credits being bought by us, and I assume we aren't getting any
> sort of discounts here. Having google sponsor the credits costs google
> substantially less than having any other company give us money to do
> it.

The last I looked, Google GCP / Amazon AWS / Azure were all pretty
comparable in terms of what you get and what you pay for them.
Obviously providers like Packet and Digital Ocean who offer bare-metal
services are cheaper, but then you need to find someone who is going
to properly administer the various machines, install decent
monitoring, make sure that more storage is provisioned when we need
more storage (which is basically all the time), make sure that the
hardware is maintained in decent shape (pretty sure one of the fd.o
machines has had a drive in imminent-failure state for the last few
months), etc.

Given the size of our service, that's a much better plan (IMO) than
relying on someone who a) isn't an admin by trade, b) has a million
other things to do, and c) hasn't wanted to do it for the past several
years. But as long as that's the resources we have, then we're paying
the cloud tradeoff, where we pay more money in exchange for fewer
problems.

> If our current CI architecture is going to burn this amount of money a
> year and we hadn't worked this out in advance of deploying it then I
> suggest the system should be taken offline until we work out what a
> sustainable system would look like within the budget we have, whether
> that be never transferring containers and build artifacts from the
> google network, just having local runner/build combos etc.

Yes, we could federate everything back out so everyone runs their own
builds and executes those. Tinderbox did something really similar to
that IIRC; not sure if Buildbot does as well. Probably rules out
pre-merge testing, mind.

The reason we hadn't worked everything out in advance of deploying is
because Mesa has had 3993 MRs in the little over a year since
moving, and a similar number in GStreamer, just taking the two biggest
users. At the start it was 'maybe let's use MRs if you want to but
make sure everything still goes through the list', and now it's
something different. Similarly the CI architecture hasn't been
'designed', so much as that people want to run dEQP and Piglit on
their hardware pre-merge in an open fashion that's actually accessible
to people, and have just done it.

Again, if you want everything to be centrally
designed/approved/monitored/controlled, that's a fine enough idea, and
I'd be happy to support whoever it was who was doing that for all of
fd.o.

Cheers,
Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Daniel Stone
Hi Matt,

On Thu, 27 Feb 2020 at 23:45, Matt Turner  wrote:
> We're paying 75K USD for the bandwidth to transfer data from the
> GitLab cloud instance. i.e., for viewing the https site, for
> cloning/updating git repos, and for downloading CI artifacts/images to
> the testing machines (AFAIU).

I believe that in January, we had $2082 of network cost (almost
entirely egress; ingress is basically free) and $1750 of cloud-storage
cost (almost all of which was download). That's based on 16TB of
cloud-storage (CI artifacts, container images, file uploads, Git LFS)
egress and 17.9TB of other egress (the web service itself, repo
activity). Projecting that out gives us roughly $45k of network
activity alone, so it looks like this figure is based on a projected
increase of ~50%.

The actual compute capacity is closer to $1150/month.

> I was not aware that we were being charged for anything wrt GitLab
> hosting yet (and neither was anyone on my team at Intel that I've
> asked). This... kind of needs to be communicated.
>
> A consistent concern put forth when we were discussing switching to
> GitLab and building CI was... how do we pay for it. It felt like that
> concern was always handwaved away. I heard many times that if we
> needed more runners that we could just ask Google to spin up a few
> more. If we needed testing machines they'd be donated. No one
> mentioned that all the while we were paying for bandwidth... Perhaps
> people building the CI would make different decisions about its
> structure if they knew it was going to wipe out the bank account.

The original answer is that GitLab themselves offered to sponsor
enough credit on Google Cloud to get us started. They used GCP
themselves so they could assist us (me) in getting bootstrapped, which
was invaluable. After that, Google's open-source program office
offered to sponsor us for $30k/year, which was I believe last April.
Since then the service usage has increased roughly by a factor of 10,
so our 12-month sponsorship is no longer enough to cover 12 months.

> What percentage of the bandwidth is consumed by transferring CI
> images, etc? Wouldn't 75K USD would be enough to buy all the testing
> machines we need and host them within Google or wherever so we don't
> need to pay for huge amounts of bandwidth?

Unless the Google Cloud Platform starts offering DragonBoards, it
wouldn't reduce our bandwidth usage as the corporate network is
treated separately for egress.

Cheers,
Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: gitlab.fd.o financial situation and impact on services

2020-02-28 Thread The Rasterman
On Thu, 27 Feb 2020 22:27:04 +0100 Daniel Vetter  said:

Might I suggest that given the kind of expenses detailed here, literally buying
1 - 4 reasonably specced boxes and hosting them at OSUOSL would be incredibly
cheaper? (we (enlightenment.org) have been doing so for years on a single
box). We farm out CI to travis via github mirrors as it's not considered
an essential core service (unlike mailing lists, git, phabricator which we
still run - we can live without CI for a while and find other ways).

The cost is the odd HDD replacement every few years and maybe every 10y or so a
new box. That's a massively lower cost than you are quoting below.

OSUOSL provide bandwidth, power, rack space etc. for free. They have been
fantastic IMHO and the whole "no fat bills" thing is awesome and you get a full
system to set up any way you like. You just bring the box. That should drop cost
through the floor. It will require some setup and admin though.

> Hi all,
> 
> You might have read the short take in the X.org board meeting minutes
> already, here's the long version.
> 
> The good news: gitlab.fd.o has become very popular with our
> communities, and is used extensively. This especially includes all the
> CI integration. Modern development process and tooling, yay!
> 
> The bad news: The cost in growth has also been tremendous, and it's
> breaking our bank account. With reasonable estimates for continued
> growth we're expecting hosting expenses totalling 75k USD this year,
> and 90k USD next year. With the current sponsors we've set up we can't
> sustain that. We estimate that hosting expenses for gitlab.fd.o
> without any of the CI features enabled would total 30k USD, which is
> within X.org's ability to support through various sponsorships, mostly
> through XDC.
> 
> Note that X.org does no longer sponsor any CI runners themselves,
> we've stopped that. The huge additional expenses are all just in
> storing and serving build artifacts and images to outside CI runners
> sponsored by various companies. A related topic is that with the
> growth in fd.o it's becoming infeasible to maintain it all on
> volunteer admin time. X.org is therefore also looking for admin
> sponsorship, at least medium term.
> 
> Assuming that we want cash flow reserves for one year of gitlab.fd.o
> (without CI support) and a trimmed XDC and assuming no sponsor payment
> meanwhile, we'd have to cut CI services somewhere between May and June
> this year. The board is of course working on acquiring sponsors, but
> filling a shortfall of this magnitude is neither easy nor quick work,
> and we therefore decided to give an early warning as soon as possible.
> Any help in finding sponsors for fd.o is very much appreciated.
> 
> Thanks, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> ___
> xorg-de...@lists.x.org: X.Org development
> Archives: http://lists.x.org/archives/xorg-devel
> Info: https://lists.x.org/mailman/listinfo/xorg-devel
> 


-- 
- Codito, ergo sum - "I code, therefore I am" --
Carsten Haitzler - ras...@rasterman.com

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx