Re: [PATCH 6/8] drm/msm/dpu: use dpu_perf_cfg in DPU core_perf code
On 6/20/2023 4:31 AM, Konrad Dybcio wrote:
> On 20.06.2023 13:18, Dmitry Baryshkov wrote:
>> On 20/06/2023 13:55, Konrad Dybcio wrote:
>>> On 20.06.2023 02:08, Dmitry Baryshkov wrote:
>>>> Simplify dpu_core_perf code by using only dpu_perf_cfg instead of using
>>>> full-featured catalog data.
>>>>
>>>> Signed-off-by: Dmitry Baryshkov
>>>> ---
>>> Acked-by: Konrad Dybcio
>>>
>>> Check below.
>>>
>>>>  drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 52 ---
>>>>  drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.h |  8 +--
>>>>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c       |  2 +-
>>>>  3 files changed, 27 insertions(+), 35 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
>>>> index 773e641eab28..78a7e3ea27a4 100644
>>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
>>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
>>>> @@ -19,11 +19,11 @@
>>>>
>>>>  /**
>>>>   * _dpu_core_perf_calc_bw() - to calculate BW per crtc
>>>> - * @kms: pointer to the dpu_kms
>>>> + * @perf_cfg: performance configuration
>>>>   * @crtc: pointer to a crtc
>>>>   * Return: returns aggregated BW for all planes in crtc.
>>>>   */
>>>> -static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
>>>> +static u64 _dpu_core_perf_calc_bw(const struct dpu_perf_cfg *perf_cfg,
>>>> 		struct drm_crtc *crtc)
>>>>  {
>>>>  	struct drm_plane *plane;
>>>> @@ -39,7 +39,7 @@ static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
>>>>  		crtc_plane_bw += pstate->plane_fetch_bw;
>>>>  	}
>>>>
>>>> -	bw_factor = kms->catalog->perf->bw_inefficiency_factor;
>>>> +	bw_factor = perf_cfg->bw_inefficiency_factor;
>>> It's set to 120 for all SoCs.. and it sounds very much like some kind of a
>>> hack.
>>>
>>> The 105 on the other inefficiency factor is easy to spot:
>>>
>>> (1024/1000)^2 = 1.048576 =~= 1.05 = 105%
>>>
>>> It comes from a MiB-MB-MHz conversion that Qcom splattered all over
>>> downstream as due to ancient tragical design decisions in msmbus
>>> (which leak to the downstream interconnect a bit):
>> This doesn't describe, why msm8226 and msm8974 had qcom,mdss-clk-factor
>> of 5/4. And 8084 got 1.05 as usual. I can only suppose that MDSS 1.0
>> (8974 v1) and 1.1 (8226) had some internal inefficiency / issues.
>>
>> Also, this 1.05 is a clock inefficiency, so it should not be related
>> to msm bus client code.
> Right. Maybe Abhinav could shed some light on this.
>
> Konrad

I will need to check with someone else about this, as msm8974 and msm8226
are quite old for me to remember. That being said, I really don't think
the explanation behind the number is going to be something which is going
to be explained in detail here even if I did ask. The name of the variable
"clk_inefficiency_factor" says pretty much what has to be said for the
purposes of this patch. I don't know if we will be able to go further into
how that number came about.

Coming to this patch itself, it's not a major gain or a major loss in my
perspective. Sure, we don't need to pass the full catalog today, so we can
just pass the perf_cfg. I cannot guarantee we won't need the full catalog
later.

>>> The logic needs to get some input that corresponds to a clock rate
>>> of a bus clock (19.2, 200, 300 Mhz etc.) but the APIs expect a Kbps
>>> value. So at one point they invented a MHZ_TO_MBPS macro which did this
>>> conversion the other way around and probably had to account for it.
>>>
>>> I think they tried to make it make more sense, but it ended up being
>>> even more spaghetti :/
>>>
>>> Not yet sure how it's done on RPMh icc, but with SMD RPM, passing e.g.
>>>
>>> opp-peak-kBps = <(200 * 8 * 1000)>; # 200 MHz * 8-wide * KHz-to-MHz
>>>
>>> results in a "correct" end rate.
>>>
>>> Konrad
>>>>  	if (bw_factor) {
>>>>  		crtc_plane_bw *= bw_factor;
>>>>  		do_div(crtc_plane_bw, 100);
>>
>> --
>> With best wishes
>> Dmitry
Re: [PATCH 6/8] drm/msm/dpu: use dpu_perf_cfg in DPU core_perf code
On 20.06.2023 13:18, Dmitry Baryshkov wrote:
> On 20/06/2023 13:55, Konrad Dybcio wrote:
>> On 20.06.2023 02:08, Dmitry Baryshkov wrote:
>>> Simplify dpu_core_perf code by using only dpu_perf_cfg instead of using
>>> full-featured catalog data.
>>>
>>> Signed-off-by: Dmitry Baryshkov
>>> ---
>> Acked-by: Konrad Dybcio
>>
>> Check below.
>>
>>>  drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 52 ---
>>>  drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.h |  8 +--
>>>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c       |  2 +-
>>>  3 files changed, 27 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
>>> index 773e641eab28..78a7e3ea27a4 100644
>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
>>> @@ -19,11 +19,11 @@
>>>
>>>  /**
>>>   * _dpu_core_perf_calc_bw() - to calculate BW per crtc
>>> - * @kms: pointer to the dpu_kms
>>> + * @perf_cfg: performance configuration
>>>   * @crtc: pointer to a crtc
>>>   * Return: returns aggregated BW for all planes in crtc.
>>>   */
>>> -static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
>>> +static u64 _dpu_core_perf_calc_bw(const struct dpu_perf_cfg *perf_cfg,
>>> 		struct drm_crtc *crtc)
>>>  {
>>>  	struct drm_plane *plane;
>>> @@ -39,7 +39,7 @@ static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
>>>  		crtc_plane_bw += pstate->plane_fetch_bw;
>>>  	}
>>>
>>> -	bw_factor = kms->catalog->perf->bw_inefficiency_factor;
>>> +	bw_factor = perf_cfg->bw_inefficiency_factor;
>> It's set to 120 for all SoCs.. and it sounds very much like some kind of a
>> hack.
>>
>> The 105 on the other inefficiency factor is easy to spot:
>>
>> (1024/1000)^2 = 1.048576 =~= 1.05 = 105%
>>
>> It comes from a MiB-MB-MHz conversion that Qcom splattered all over
>> downstream as due to ancient tragical design decisions in msmbus
>> (which leak to the downstream interconnect a bit):
> This doesn't describe, why msm8226 and msm8974 had qcom,mdss-clk-factor
> of 5/4. And 8084 got 1.05 as usual. I can only suppose that MDSS 1.0
> (8974 v1) and 1.1 (8226) had some internal inefficiency / issues.
>
> Also, this 1.05 is a clock inefficiency, so it should not be related
> to msm bus client code.

Right. Maybe Abhinav could shed some light on this.

Konrad

>> The logic needs to get some input that corresponds to a clock rate
>> of a bus clock (19.2, 200, 300 Mhz etc.) but the APIs expect a Kbps
>> value. So at one point they invented a MHZ_TO_MBPS macro which did this
>> conversion the other way around and probably had to account for it.
>>
>> I think they tried to make it make more sense, but it ended up being
>> even more spaghetti :/
>>
>> Not yet sure how it's done on RPMh icc, but with SMD RPM, passing e.g.
>>
>> opp-peak-kBps = <(200 * 8 * 1000)>; # 200 MHz * 8-wide * KHz-to-MHz
>>
>> results in a "correct" end rate.
>>
>> Konrad
>>>  	if (bw_factor) {
>>>  		crtc_plane_bw *= bw_factor;
>>>  		do_div(crtc_plane_bw, 100);
>
>
> --
> With best wishes
> Dmitry
Re: [PATCH 6/8] drm/msm/dpu: use dpu_perf_cfg in DPU core_perf code
On 20/06/2023 13:55, Konrad Dybcio wrote:
> On 20.06.2023 02:08, Dmitry Baryshkov wrote:
>> Simplify dpu_core_perf code by using only dpu_perf_cfg instead of using
>> full-featured catalog data.
>>
>> Signed-off-by: Dmitry Baryshkov
>> ---
> Acked-by: Konrad Dybcio
>
> Check below.
>
>>  drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 52 ---
>>  drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.h |  8 +--
>>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c       |  2 +-
>>  3 files changed, 27 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
>> index 773e641eab28..78a7e3ea27a4 100644
>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
>> @@ -19,11 +19,11 @@
>>
>>  /**
>>   * _dpu_core_perf_calc_bw() - to calculate BW per crtc
>> - * @kms: pointer to the dpu_kms
>> + * @perf_cfg: performance configuration
>>   * @crtc: pointer to a crtc
>>   * Return: returns aggregated BW for all planes in crtc.
>>   */
>> -static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
>> +static u64 _dpu_core_perf_calc_bw(const struct dpu_perf_cfg *perf_cfg,
>> 		struct drm_crtc *crtc)
>>  {
>>  	struct drm_plane *plane;
>> @@ -39,7 +39,7 @@ static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
>>  		crtc_plane_bw += pstate->plane_fetch_bw;
>>  	}
>>
>> -	bw_factor = kms->catalog->perf->bw_inefficiency_factor;
>> +	bw_factor = perf_cfg->bw_inefficiency_factor;
> It's set to 120 for all SoCs.. and it sounds very much like some kind of a
> hack.
>
> The 105 on the other inefficiency factor is easy to spot:
>
> (1024/1000)^2 = 1.048576 =~= 1.05 = 105%
>
> It comes from a MiB-MB-MHz conversion that Qcom splattered all over
> downstream as due to ancient tragical design decisions in msmbus
> (which leak to the downstream interconnect a bit):

This doesn't describe, why msm8226 and msm8974 had qcom,mdss-clk-factor
of 5/4. And 8084 got 1.05 as usual. I can only suppose that MDSS 1.0
(8974 v1) and 1.1 (8226) had some internal inefficiency / issues.

Also, this 1.05 is a clock inefficiency, so it should not be related
to msm bus client code.

> The logic needs to get some input that corresponds to a clock rate
> of a bus clock (19.2, 200, 300 Mhz etc.) but the APIs expect a Kbps
> value. So at one point they invented a MHZ_TO_MBPS macro which did this
> conversion the other way around and probably had to account for it.
>
> I think they tried to make it make more sense, but it ended up being
> even more spaghetti :/
>
> Not yet sure how it's done on RPMh icc, but with SMD RPM, passing e.g.
>
> opp-peak-kBps = <(200 * 8 * 1000)>; # 200 MHz * 8-wide * KHz-to-MHz
>
> results in a "correct" end rate.
>
> Konrad
>>  	if (bw_factor) {
>>  		crtc_plane_bw *= bw_factor;
>>  		do_div(crtc_plane_bw, 100);

--
With best wishes
Dmitry
Re: [PATCH 6/8] drm/msm/dpu: use dpu_perf_cfg in DPU core_perf code
On 20.06.2023 02:08, Dmitry Baryshkov wrote:
> Simplify dpu_core_perf code by using only dpu_perf_cfg instead of using
> full-featured catalog data.
>
> Signed-off-by: Dmitry Baryshkov
> ---
Acked-by: Konrad Dybcio

Check below.

>  drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 52 ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.h |  8 +--
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c       |  2 +-
>  3 files changed, 27 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
> index 773e641eab28..78a7e3ea27a4 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
> @@ -19,11 +19,11 @@
>
>  /**
>   * _dpu_core_perf_calc_bw() - to calculate BW per crtc
> - * @kms: pointer to the dpu_kms
> + * @perf_cfg: performance configuration
>   * @crtc: pointer to a crtc
>   * Return: returns aggregated BW for all planes in crtc.
>   */
> -static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
> +static u64 _dpu_core_perf_calc_bw(const struct dpu_perf_cfg *perf_cfg,
> 		struct drm_crtc *crtc)
>  {
>  	struct drm_plane *plane;
> @@ -39,7 +39,7 @@ static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
>  		crtc_plane_bw += pstate->plane_fetch_bw;
>  	}
>
> -	bw_factor = kms->catalog->perf->bw_inefficiency_factor;
> +	bw_factor = perf_cfg->bw_inefficiency_factor;
It's set to 120 for all SoCs.. and it sounds very much like some kind of a
hack.

The 105 on the other inefficiency factor is easy to spot:

(1024/1000)^2 = 1.048576 =~= 1.05 = 105%

It comes from a MiB-MB-MHz conversion that Qcom splattered all over
downstream as due to ancient tragical design decisions in msmbus
(which leak to the downstream interconnect a bit):

The logic needs to get some input that corresponds to a clock rate
of a bus clock (19.2, 200, 300 Mhz etc.) but the APIs expect a Kbps
value. So at one point they invented a MHZ_TO_MBPS macro which did this
conversion the other way around and probably had to account for it.

I think they tried to make it make more sense, but it ended up being
even more spaghetti :/

Not yet sure how it's done on RPMh icc, but with SMD RPM, passing e.g.

opp-peak-kBps = <(200 * 8 * 1000)>; # 200 MHz * 8-wide * KHz-to-MHz

results in a "correct" end rate.

Konrad
>  	if (bw_factor) {
>  		crtc_plane_bw *= bw_factor;
>  		do_div(crtc_plane_bw, 100);
> @@ -50,12 +50,12 @@ static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
>
>  /**
>   * _dpu_core_perf_calc_clk() - to calculate clock per crtc
> - * @kms: pointer to the dpu_kms
> + * @perf_cfg: performance configuration
>   * @crtc: pointer to a crtc
>   * @state: pointer to a crtc state
>   * Return: returns max clk for all planes in crtc.
>   */
> -static u64 _dpu_core_perf_calc_clk(struct dpu_kms *kms,
> +static u64 _dpu_core_perf_calc_clk(const struct dpu_perf_cfg *perf_cfg,
> 		struct drm_crtc *crtc, struct drm_crtc_state *state)
>  {
>  	struct drm_plane *plane;
> @@ -76,7 +76,7 @@ static u64 _dpu_core_perf_calc_clk(struct dpu_kms *kms,
>  		crtc_clk = max(pstate->plane_clk, crtc_clk);
>  	}
>
> -	clk_factor = kms->catalog->perf->clk_inefficiency_factor;
> +	clk_factor = perf_cfg->clk_inefficiency_factor;
>  	if (clk_factor) {
>  		crtc_clk *= clk_factor;
>  		do_div(crtc_clk, 100);
> @@ -92,20 +92,20 @@ static struct dpu_kms *_dpu_crtc_get_kms(struct drm_crtc *crtc)
>  	return to_dpu_kms(priv->kms);
>  }
>
> -static void _dpu_core_perf_calc_crtc(struct dpu_kms *kms,
> +static void _dpu_core_perf_calc_crtc(const struct dpu_perf_cfg *perf_cfg,
> 				     struct drm_crtc *crtc,
> 				     struct drm_crtc_state *state,
> 				     struct dpu_core_perf_params *perf)
>  {
> -	if (!kms || !kms->catalog || !crtc || !state || !perf) {
> +	if (!perf_cfg || !crtc || !state || !perf) {
>  		DPU_ERROR("invalid parameters\n");
>  		return;
>  	}
>
>  	memset(perf, 0, sizeof(struct dpu_core_perf_params));
>
> -	perf->bw_ctl = _dpu_core_perf_calc_bw(kms, crtc);
> -	perf->core_clk_rate = _dpu_core_perf_calc_clk(kms, crtc, state);
> +	perf->bw_ctl = _dpu_core_perf_calc_bw(perf_cfg, crtc);
> +	perf->core_clk_rate = _dpu_core_perf_calc_clk(perf_cfg, crtc, state);
>
>  	DRM_DEBUG_ATOMIC(
>  		"crtc=%d clk_rate=%llu core_ab=%llu\n",
> @@ -122,6 +122,7 @@ int dpu_core_perf_crtc_check(struct drm_crtc *crtc,
>  	struct dpu_crtc_state *dpu_cstate;
>  	struct drm_crtc *tmp_crtc;
>  	struct dpu_kms *kms;
> +	const struct dpu_perf_cfg *perf_cfg;
>
>  	if (!crtc || !state) {
>  		DPU_ERROR("invalid crtc\n");
> @@ -129,10 +130,7 @@ int dpu_core_perf_crtc_check(struct drm_crtc *crtc,
>  	}
>
>  	kms = _dpu_crtc_get_kms(crtc);
> -	if
[PATCH 6/8] drm/msm/dpu: use dpu_perf_cfg in DPU core_perf code
Simplify dpu_core_perf code by using only dpu_perf_cfg instead of using
full-featured catalog data.

Signed-off-by: Dmitry Baryshkov
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 52 ---
 drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.h |  8 +--
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c       |  2 +-
 3 files changed, 27 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
index 773e641eab28..78a7e3ea27a4 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c
@@ -19,11 +19,11 @@

 /**
  * _dpu_core_perf_calc_bw() - to calculate BW per crtc
- * @kms: pointer to the dpu_kms
+ * @perf_cfg: performance configuration
  * @crtc: pointer to a crtc
  * Return: returns aggregated BW for all planes in crtc.
  */
-static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
+static u64 _dpu_core_perf_calc_bw(const struct dpu_perf_cfg *perf_cfg,
		struct drm_crtc *crtc)
 {
 	struct drm_plane *plane;
@@ -39,7 +39,7 @@ static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,
 		crtc_plane_bw += pstate->plane_fetch_bw;
 	}

-	bw_factor = kms->catalog->perf->bw_inefficiency_factor;
+	bw_factor = perf_cfg->bw_inefficiency_factor;
 	if (bw_factor) {
 		crtc_plane_bw *= bw_factor;
 		do_div(crtc_plane_bw, 100);
@@ -50,12 +50,12 @@ static u64 _dpu_core_perf_calc_bw(struct dpu_kms *kms,

 /**
  * _dpu_core_perf_calc_clk() - to calculate clock per crtc
- * @kms: pointer to the dpu_kms
+ * @perf_cfg: performance configuration
  * @crtc: pointer to a crtc
  * @state: pointer to a crtc state
  * Return: returns max clk for all planes in crtc.
  */
-static u64 _dpu_core_perf_calc_clk(struct dpu_kms *kms,
+static u64 _dpu_core_perf_calc_clk(const struct dpu_perf_cfg *perf_cfg,
		struct drm_crtc *crtc, struct drm_crtc_state *state)
 {
 	struct drm_plane *plane;
@@ -76,7 +76,7 @@ static u64 _dpu_core_perf_calc_clk(struct dpu_kms *kms,
 		crtc_clk = max(pstate->plane_clk, crtc_clk);
 	}

-	clk_factor = kms->catalog->perf->clk_inefficiency_factor;
+	clk_factor = perf_cfg->clk_inefficiency_factor;
 	if (clk_factor) {
 		crtc_clk *= clk_factor;
 		do_div(crtc_clk, 100);
@@ -92,20 +92,20 @@ static struct dpu_kms *_dpu_crtc_get_kms(struct drm_crtc *crtc)
 	return to_dpu_kms(priv->kms);
 }

-static void _dpu_core_perf_calc_crtc(struct dpu_kms *kms,
+static void _dpu_core_perf_calc_crtc(const struct dpu_perf_cfg *perf_cfg,
				     struct drm_crtc *crtc,
				     struct drm_crtc_state *state,
				     struct dpu_core_perf_params *perf)
 {
-	if (!kms || !kms->catalog || !crtc || !state || !perf) {
+	if (!perf_cfg || !crtc || !state || !perf) {
 		DPU_ERROR("invalid parameters\n");
 		return;
 	}

 	memset(perf, 0, sizeof(struct dpu_core_perf_params));

-	perf->bw_ctl = _dpu_core_perf_calc_bw(kms, crtc);
-	perf->core_clk_rate = _dpu_core_perf_calc_clk(kms, crtc, state);
+	perf->bw_ctl = _dpu_core_perf_calc_bw(perf_cfg, crtc);
+	perf->core_clk_rate = _dpu_core_perf_calc_clk(perf_cfg, crtc, state);

 	DRM_DEBUG_ATOMIC(
 		"crtc=%d clk_rate=%llu core_ab=%llu\n",
@@ -122,6 +122,7 @@ int dpu_core_perf_crtc_check(struct drm_crtc *crtc,
 	struct dpu_crtc_state *dpu_cstate;
 	struct drm_crtc *tmp_crtc;
 	struct dpu_kms *kms;
+	const struct dpu_perf_cfg *perf_cfg;

 	if (!crtc || !state) {
 		DPU_ERROR("invalid crtc\n");
@@ -129,10 +130,7 @@ int dpu_core_perf_crtc_check(struct drm_crtc *crtc,
 	}

 	kms = _dpu_crtc_get_kms(crtc);
-	if (!kms->catalog) {
-		DPU_ERROR("invalid parameters\n");
-		return 0;
-	}
+	perf_cfg = kms->perf.perf_cfg;

 	/* we only need bandwidth check on real-time clients (interfaces) */
 	if (dpu_crtc_get_client_type(crtc) == NRT_CLIENT)
@@ -141,7 +139,7 @@ int dpu_core_perf_crtc_check(struct drm_crtc *crtc,
 	dpu_cstate = to_dpu_crtc_state(state);

 	/* obtain new values */
-	_dpu_core_perf_calc_crtc(kms, crtc, state, &dpu_cstate->new_perf);
+	_dpu_core_perf_calc_crtc(perf_cfg, crtc, state, &dpu_cstate->new_perf);

 	bw_sum_of_intfs = dpu_cstate->new_perf.bw_ctl;
 	curr_client_type = dpu_crtc_get_client_type(crtc);
@@ -164,7 +162,7 @@ int dpu_core_perf_crtc_check(struct drm_crtc *crtc,
 			bw = DIV_ROUND_UP_ULL(bw_sum_of_intfs, 1000);
 			DRM_DEBUG_ATOMIC("calculated bandwidth=%uk\n", bw);

-			threshold = kms->catalog->perf->max_bw_high;
+			threshold = perf_cfg->max_bw_high;

 			DRM_DEBUG_ATOMIC("final threshold bw limit = %d\n", threshold);

@@ -212,7 +210,7 @@ static int