Re: [PATCH] drm/msm: always parse interconnects

2021-04-17 Thread Rob Clark
On Sat, Apr 17, 2021 at 9:21 AM Caleb Connolly  wrote:
>
> The WARN_ON in dpu_runtime_resume() fires constantly on non-SC7180
> platforms. As SDM845 now has interconnects hooked up we should always
> try and parse them.
>
> Fixes: 627dc55c273d ("drm/msm/disp/dpu1: icc path needs to be set before dpu runtime resume")
> Signed-off-by: Caleb Connolly 

I believe this series in msm-next already solves this

https://patchwork.freedesktop.org/series/88644/

BR,
-R

> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> index 85f2c3564c96..fb061e666faa 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> @@ -933,8 +933,7 @@ static int dpu_kms_hw_init(struct msm_kms *kms)
> DPU_DEBUG("REG_DMA is not defined");
> }
>
> -   if (of_device_is_compatible(dev->dev->of_node, "qcom,sc7180-mdss"))
> -   dpu_kms_parse_data_bus_icc_path(dpu_kms);
> +   dpu_kms_parse_data_bus_icc_path(dpu_kms);
>
> pm_runtime_get_sync(&dpu_kms->pdev->dev);
>
> --
> 2.30.2
>
>


Re: [kbuild] drivers/gpu/drm/msm/adreno/a3xx_gpu.c:600 a3xx_gpu_init() error: passing non negative 1 to ERR_PTR

2021-04-16 Thread Rob Clark
On Thu, Apr 15, 2021 at 9:33 PM Dan Carpenter  wrote:
>
> On Thu, Apr 15, 2021 at 04:21:01PM -0700, Rob Clark wrote:
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  571 icc_path = devm_of_icc_get(&pdev->dev, "gfx-mem");
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  572 ret = IS_ERR(icc_path);
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  573 if (ret)
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  574 goto fail;
> > > >
> > > > IS_ERR() returns true/false, so this will lead to an Oops in the caller.
> > > >
> > > >   icc_path = devm_of_icc_get(&pdev->dev, "gfx-mem");
> > > >   if (IS_ERR(icc_path)) {
> > > >   ret = PTR_ERR(icc_path);
> > > >   goto fail;
> > > >   }
> > > Agree.
> > >
> > > >
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  575
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  576 ocmem_icc_path = devm_of_icc_get(&pdev->dev, "ocmem");
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  577 ret = IS_ERR(ocmem_icc_path);
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  578 if (ret) {
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  579 /* allow -ENODATA, ocmem icc is optional */
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  580 if (ret != -ENODATA)
> > > > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  581 goto fail;
> > > >
> > > > Same.  ret is true/false so it can't be equal to -ENODATA, plus the
> > > > caller will Oops.
> > > >
> > > > Btw, this patch removed the assignments:
> > > >
> > > >   gpu->icc_path = of_icc_get(dev, "gfx-mem");
> > > >   gpu->ocmem_icc_path = of_icc_get(dev, "ocmem");
> > > >
> > > > So I think "gpu->icc_path" and "gpu->ocmem_icc_path" are always
> > > > NULL/unused and they should be removed.
> > > >
> > > Agree. Will share a fix.
> > > Thanks, Dan.
> >
> > gpu->ocmem_icc_path/icc_path is used on older devices.. it sounds like
> > we broke some older devices and no one has noticed yet?
>
> This is error paths and dead code.  Probably no one is affected in
> real life.
>

oh, right, we are using devm now, so we can drop the icc_put()s

BR,
-R


Re: [kbuild] drivers/gpu/drm/msm/adreno/a3xx_gpu.c:600 a3xx_gpu_init() error: passing non negative 1 to ERR_PTR

2021-04-15 Thread Rob Clark
On Thu, Apr 15, 2021 at 9:52 AM Akhil P Oommen  wrote:
>
> On 4/9/2021 3:07 PM, Dan Carpenter wrote:
> > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > head:   2d743660786ec51f5c1fefd5782bbdee7b227db0
> > commit: 5785dd7a8ef0de8049f40a1a109de6a1bf17b479 drm/msm: Fix duplicate gpu node in icc summary
> > config: arm64-randconfig-m031-20210407 (attached as .config)
> > compiler: aarch64-linux-gcc (GCC) 9.3.0
> >
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot 
> > Reported-by: Dan Carpenter 
> >
> > smatch warnings:
> > drivers/gpu/drm/msm/adreno/a3xx_gpu.c:600 a3xx_gpu_init() error: passing non negative 1 to ERR_PTR
> > drivers/gpu/drm/msm/adreno/a4xx_gpu.c:727 a4xx_gpu_init() error: passing non negative 1 to ERR_PTR
> >
> > vim +600 drivers/gpu/drm/msm/adreno/a3xx_gpu.c
> >
> > 7198e6b03155f6 Rob Clark  2013-07-19  515  struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
> > 7198e6b03155f6 Rob Clark  2013-07-19  516  {
> > 7198e6b03155f6 Rob Clark  2013-07-19  517 struct a3xx_gpu *a3xx_gpu = NULL;
> > 55459968176f13 Rob Clark  2013-12-05  518 struct adreno_gpu *adreno_gpu;
> > 7198e6b03155f6 Rob Clark  2013-07-19  519 struct msm_gpu *gpu;
> > 060530f1ea6740 Rob Clark  2014-03-03  520 struct msm_drm_private *priv = dev->dev_private;
> > 060530f1ea6740 Rob Clark  2014-03-03  521 struct platform_device *pdev = priv->gpu_pdev;
> > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  522 struct icc_path *ocmem_icc_path;
> > 5785dd7a8ef0de Akhil P Oommen 2020-10-28  523 struct icc_path *icc_path;
> > 7198e6b03155f6 Rob Clark  2013-07-19  524 int ret;
> > 7198e6b03155f6 Rob Clark  2013-07-19  525
> > 7198e6b03155f6 Rob Clark  2013-07-19  526 if (!pdev) {
> > 6a41da17e87dee Mamta Shukla   2018-10-20  527 DRM_DEV_ERROR(dev->dev, "no a3xx device\n");
> > 7198e6b03155f6 Rob Clark  2013-07-19  528 ret = -ENXIO;
> > 7198e6b03155f6 Rob Clark  2013-07-19  529 goto fail;
> > 7198e6b03155f6 Rob Clark  2013-07-19  530 }
> > 7198e6b03155f6 Rob Clark  2013-07-19  531
> > 7198e6b03155f6 Rob Clark  2013-07-19  532 a3xx_gpu = kzalloc(sizeof(*a3xx_gpu), GFP_KERNEL);
> > 7198e6b03155f6 Rob Clark  2013-07-19  533 if (!a3xx_gpu) {
> > 7198e6b03155f6 Rob Clark  2013-07-19  534 ret = -ENOMEM;
> > 7198e6b03155f6 Rob Clark  2013-07-19  535 goto fail;
> > 7198e6b03155f6 Rob Clark  2013-07-19  536 }
> > 7198e6b03155f6 Rob Clark  2013-07-19  537
> > 55459968176f13 Rob Clark  2013-12-05  538 adreno_gpu = &a3xx_gpu->base;
> > 55459968176f13 Rob Clark  2013-12-05  539 gpu = &adreno_gpu->base;
> > 7198e6b03155f6 Rob Clark  2013-07-19  540
> > 70c70f091b1ffd Rob Clark  2014-05-30  541 gpu->perfcntrs = perfcntrs;
> > 70c70f091b1ffd Rob Clark  2014-05-30  542 gpu->num_perfcntrs = ARRAY_SIZE(perfcntrs);
> > 70c70f091b1ffd Rob Clark  2014-05-30  543
> > 3bcefb0497f9fc Rob Clark  2014-09-05  544 adreno_gpu->registers = a3xx_registers;
> > 3bcefb0497f9fc Rob Clark  2014-09-05  545
> > f97decac5f4c2d Jordan Crouse  2017-10-20  546 ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
> > 7198e6b03155f6 Rob Clark  2013-07-19  547 if (ret)
> > 7198e6b03155f6 Rob Clark  2013-07-19  548 goto fail;
> > 7198e6b03155f6 Rob Clark  2013-07-19  549
> > 55459968176f13 Rob Clark  2013-12-05  550 /* if needed, allocate gmem: */
> > 55459968176f13 Rob Clark  2013-12-05  551 if (adreno_is_a330(adreno_gpu)) {
> > 26c0b26dcd005d Brian Masney   2019-08-23  552 ret = adreno_gpu_ocmem_init(&adreno_gpu->base.pdev->dev,
> > 26c0b26dcd005d Brian Masney   2019-08-23  553   adreno_gpu, &a3xx_gpu->ocmem);
> > 26c0b26dcd005d Brian Masney   2019-08-23  554 if (ret)
> > 26c0b26dcd005d Brian Masney   2019-08-23  555 goto fail;
> > 55459968176f13 Rob Clark  2013-12-05  556 }
> > 55459968176f13 Rob Clark  2013-12-05  557
> > 667ce33e57d0de Rob Cla

Re: [PATCH 0/8] drm/msm: Swappable GEM objects

2021-04-12 Thread Rob Clark
On Mon, Apr 12, 2021 at 7:28 AM Daniel Vetter  wrote:
>
> On Thu, Apr 08, 2021 at 08:23:42AM -0700, Rob Clark wrote:
> > On Thu, Apr 8, 2021 at 4:15 AM Daniel Vetter  wrote:
> > >
> > > On Mon, Apr 05, 2021 at 10:45:23AM -0700, Rob Clark wrote:
> > > > From: Rob Clark 
> > > >
> > > > One would normally hope not to be under enough memory pressure to need
> > > > to swap GEM objects to disk backed swap.  But memory backed zram swap
> > > > (as enabled on chromebooks, for example) can actually be quite fast
> > > > and useful on devices with less RAM.  On a 4GB device, opening up ~4
> > > > memory intensive web pages (in separate windows rather than tabs, to try
> > > > and prevent tab discard), I see ~500MB worth of GEM objects, of which
> > > > maybe only 10% are active at any time, and with unpin/evict enabled,
> > > > only about half resident (which is a number that gets much lower if you
> > > > simulate extreme memory pressure).  Assuming a 2:1 compression ratio (I
> > > > see a bit higher in practice, but cannot isolate swapped out GEM pages
> > > > vs other), that is like having an extra 100+MB of RAM, or more under
> > > > higher memory pressure.
> > > >
> > > > Rob Clark (8):
> > > >   drm/msm: ratelimit GEM related WARN_ON()s
> > > >   drm/msm: Reorganize msm_gem_shrinker_scan()
> > > >   drm/msm: Clear msm_obj->sgt in put_pages()
> > > >   drm/msm: Split iova purge and close
> > > >   drm/msm: Add $debugfs/gem stats on resident objects
> > > >   drm/msm: Track potentially evictable objects
> > > >   drm/msm: Small msm_gem_purge() fix
> > > >   drm/msm: Support evicting GEM objects to swap
> > >
> > > Given how much entertainement shrinkers are, should we aim for more common
> > > code here?
> > >
> > > Christian has tons of fun with adding something like this for ttm (well
> > > different shades of grey). i915 is going to adopt ttm, at least for
> > > discrete.
> > >
> > > The locking is also an utter pain, and msm seems to still live a lot in
> > > its own land here. I think as much as possible a standard approach here
> > > would be really good, ideally maybe as building blocks shared between ttm
> > > and gem-shmem drivers ...
> >
> > I don't disagree.. but also replacing the engines on an airplane
> > mid-flight isn't a great option either.. ;-)
> >
> > The hard part (esp. wrt to locking) is tracking the state of a given
> > bo.. ie. is it active, active+purgable, inactive+purgable,
> > inactive+unpinnable, etc.  Currently the shmem helpers don't really
> > provide anything here.  If they did, I suppose they could provide some
> > shrinker helpers as well.  Unfortunately these days I barely have
> > enough time for drm/msm, let alone bolting this onto the shmem
> > helpers.  I would recommend that if someone wanted to do this, that
> > they look at recent drm/msm shrinker patches that I've sent (ie. make
> > shrinker->count() lockless, and drop the locks in shrinker->scan()
> > body.. when the system is under heavy memory pressure, you start
> > getting shrinker called from all the threads so contention for mm_lock
> > can be a really bad problem)
> >
> > (Well, the other potential problem is that drm/msm has a lot of
> > different possible iommu pairings across the generations, so there is
> > some potential here to uncover exciting new bugs.. the locking at
> > least is the same for all the generations and pretty easy to test with
> > and without lockdep with some tests that push essentially all memory
> > into swap)
>
> So what we aimed for with i915 and discrete gpu is to first align on
> locking with dma_resv_lock for all buffer state, plus a bunch of
> lru/allocator locks for lists and stuff.
>
> And then with more aligned locking, figure out how to maybe share more
> code.
>
> The trouble is that right now neither shmem helpers, nor drivers using
> them, are really using dma_resv_lock to protect their per-buffer state.

We are actually already using dma_resv_lock() since a few release
cycles back.. msm_gem_lock() and friends are a wrapper around that
(from the migration away from using our own lock).. the mm_lock is
simply protecting the lists, not the objects

> So yeah it's a bit an awkward situation, and I don't know myself really
> how to get out of it. Lack of people with tons of free time doesn't help
> much.
>
> So best case I think is that every time we touch helpers o

Re: [Freedreno] [PATCH 1/3] drm/msm/mdp5: Configure PP_SYNC_HEIGHT to double the vtotal

2021-04-08 Thread Rob Clark
On Thu, Apr 8, 2021 at 4:16 PM AngeloGioacchino Del Regno
 wrote:
>
>
> On Thu, 8 Apr 2021, 21:05 Rob Clark  wrote:
>>
>> On Wed, Apr 7, 2021 at 12:11 PM AngeloGioacchino Del Regno
>>  wrote:
>> >
>> > On 07/04/21 20:19, abhin...@codeaurora.org wrote:
>> > > Hi Marijn
>> > >
>> > > On 2021-04-06 14:47, Marijn Suijten wrote:
>> > >> Leaving this at a close-to-maximum register value 0xFFF0 means it takes
>> > >> very long for the MDSS to generate a software vsync interrupt when the
>> > >> hardware TE interrupt doesn't arrive.  Configuring this to double the
>> > >> vtotal (like some downstream kernels) means a frame takes at most
>> > >> twice the frame time before the vsync signal, until hardware TE comes up.
>> > >>
>> > >> In this case the hardware interrupt responsible for providing this
>> > >> signal - "disp-te" gpio - is not hooked up to the mdp5 vsync/pp logic at
>> > >> all.  This solves severe panel update issues observed on at least the
>> > >> Xperia Loire and Tone series, until said gpio is properly hooked up to
>> > >> an irq.
>> > >
>> > > The reason the CONFIG_HEIGHT was at such a high value is to make sure 
>> > > that
>> > > we always get the TE only from the panel vsync and not false positives
>> > > coming
>> > > from the tear check logic itself.
>> > >
>> > > When you say that disp-te gpio is not hooked up, is it something
>> > > incorrect with
>> > > the schematic OR panel is not generating the TE correctly?
>> > >
>> >
>> > Sometimes, some panels aren't getting correctly configured by the
>> > OEM/ODM in the first place: especially when porting devices from
>> > downstream to upstream, developers often get in a situation in which
>> > their TE line is either misconfigured or the DriverIC is not configured
>> > to raise V-Sync interrupts.
>> > Please remember: some DDICs need a "commands sequence" to enable
>> > generating the TE interrupts, sometimes this is not standard, and
>> > sometimes OEMs/ODMs are not even doing that in their downstream code
>> > (but instead they work around it in creative ways "for reasons", even
>> > though their DDIC supports indeed sending TE events).
>> >
>> > This mostly happens when bringing up devices that have autorefresh
>> > enabled from the bootloader (when the bootloader sets up the splash
>> > screen) by using simple-panel as a (hopefully) temporary solution to get
>> > through the initial stages of porting.
>> >
>> > We are not trying to cover cases related to incorrect schematics or
>> > hardware mistakes here, as the fix for that - as you know - is to just
>> > fix your hardware.
>> > What we're trying to do here is to stop freezes and, in some cases,
>> > lockups, other than false positives making the developer go offroad when
>> > the platform shows that something is wrong during early porting.
>> >
>> > Also, sometimes, some DDICs will not generate TE interrupts when
>> > expected... in these cases we get a PP timeout and a MDP5 recovery: this
>> > is totally avoidable if we rely on the 2*vtotal, as we wouldn't get
>> > through the very time consuming task of recovering the entire MDP.
>> >
>> > Of course, if something is wrong in the MDP and the block really needs
>> > recovery, this "trick" won't save anyone and the recovery will anyway be
>> > triggered, as the PP-done will anyway timeout.
>>
>> So, is this (mostly) a workaround due to TE not wired up?  In which
>> case I think it is ok, but maybe should have a comment about the
>> interaction with TE?
>
>
> Mostly, yes.
>
>>
>> Currently I have this patch in msm-next-staging but I guess we need to
>> decide in the next day or so whether to drop it or smash in a comment?
>>
>> BR,
>> -R
>
>
> Marijn, can you please urgently throw a comment in, reminding that these 
> timers are interacting with TE and send a fast V2?
>

Or just reply on list w/ a comment to smash in, if that is easier

BR,
-R


Re: [Freedreno] [PATCH 1/3] drm/msm/mdp5: Configure PP_SYNC_HEIGHT to double the vtotal

2021-04-08 Thread Rob Clark
On Wed, Apr 7, 2021 at 12:11 PM AngeloGioacchino Del Regno
 wrote:
>
> On 07/04/21 20:19, abhin...@codeaurora.org wrote:
> > Hi Marijn
> >
> > On 2021-04-06 14:47, Marijn Suijten wrote:
> >> Leaving this at a close-to-maximum register value 0xFFF0 means it takes
> >> very long for the MDSS to generate a software vsync interrupt when the
> >> hardware TE interrupt doesn't arrive.  Configuring this to double the
> >> vtotal (like some downstream kernels) means a frame takes at most
> >> twice the frame time before the vsync signal, until hardware TE comes up.
> >>
> >> In this case the hardware interrupt responsible for providing this
> >> signal - "disp-te" gpio - is not hooked up to the mdp5 vsync/pp logic at
> >> all.  This solves severe panel update issues observed on at least the
> >> Xperia Loire and Tone series, until said gpio is properly hooked up to
> >> an irq.
> >
> > The reason the CONFIG_HEIGHT was at such a high value is to make sure that
> > we always get the TE only from the panel vsync and not false positives
> > coming
> > from the tear check logic itself.
> >
> > When you say that disp-te gpio is not hooked up, is it something
> > incorrect with
> > the schematic OR panel is not generating the TE correctly?
> >
>
> Sometimes, some panels aren't getting correctly configured by the
> OEM/ODM in the first place: especially when porting devices from
> downstream to upstream, developers often get in a situation in which
> their TE line is either misconfigured or the DriverIC is not configured
> to raise V-Sync interrupts.
> Please remember: some DDICs need a "commands sequence" to enable
> generating the TE interrupts, sometimes this is not standard, and
> sometimes OEMs/ODMs are not even doing that in their downstream code
> (but instead they work around it in creative ways "for reasons", even
> though their DDIC supports indeed sending TE events).
>
> This mostly happens when bringing up devices that have autorefresh
> enabled from the bootloader (when the bootloader sets up the splash
> screen) by using simple-panel as a (hopefully) temporary solution to get
> through the initial stages of porting.
>
> We are not trying to cover cases related to incorrect schematics or
> hardware mistakes here, as the fix for that - as you know - is to just
> fix your hardware.
> What we're trying to do here is to stop freezes and, in some cases,
> lockups, other than false positives making the developer go offroad when
> the platform shows that something is wrong during early porting.
>
> Also, sometimes, some DDICs will not generate TE interrupts when
> expected... in these cases we get a PP timeout and a MDP5 recovery: this
> is totally avoidable if we rely on the 2*vtotal, as we wouldn't get
> through the very time consuming task of recovering the entire MDP.
>
> Of course, if something is wrong in the MDP and the block really needs
> recovery, this "trick" won't save anyone and the recovery will anyway be
> triggered, as the PP-done will anyway timeout.

So, is this (mostly) a workaround due to TE not wired up?  In which
case I think it is ok, but maybe should have a comment about the
interaction with TE?

Currently I have this patch in msm-next-staging but I guess we need to
decide in the next day or so whether to drop it or smash in a comment?

BR,
-R

> >>
> >> Suggested-by: AngeloGioacchino Del Regno
> >> 
> >> Signed-off-by: Marijn Suijten 
> >> Reviewed-by: AngeloGioacchino Del Regno
> >> 
> >> ---
> >>  drivers/gpu/drm/msm/disp/mdp5/mdp5_cmd_encoder.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cmd_encoder.c
> >> b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cmd_encoder.c
> >> index ff2c1d583c79..2d5ac03dbc17 100644
> >> --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cmd_encoder.c
> >> +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cmd_encoder.c
> >> @@ -51,7 +51,7 @@ static int pingpong_tearcheck_setup(struct
> >> drm_encoder *encoder,
> >>
> >>  mdp5_write(mdp5_kms, REG_MDP5_PP_SYNC_CONFIG_VSYNC(pp_id), cfg);
> >>  mdp5_write(mdp5_kms,
> >> -REG_MDP5_PP_SYNC_CONFIG_HEIGHT(pp_id), 0xfff0);
> >> +REG_MDP5_PP_SYNC_CONFIG_HEIGHT(pp_id), (2 * mode->vtotal));
> >>  mdp5_write(mdp5_kms,
> >>  REG_MDP5_PP_VSYNC_INIT_VAL(pp_id), mode->vdisplay);
> >>  mdp5_write(mdp5_kms, REG_MDP5_PP_RD_PTR_IRQ(pp_id),
> >> mode->vdisplay + 1);
>


Re: [PATCH 0/8] drm/msm: Swappable GEM objects

2021-04-08 Thread Rob Clark
On Thu, Apr 8, 2021 at 4:15 AM Daniel Vetter  wrote:
>
> On Mon, Apr 05, 2021 at 10:45:23AM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > One would normally hope not to be under enough memory pressure to need
> > to swap GEM objects to disk backed swap.  But memory backed zram swap
> > (as enabled on chromebooks, for example) can actually be quite fast
> > and useful on devices with less RAM.  On a 4GB device, opening up ~4
> > memory intensive web pages (in separate windows rather than tabs, to try
> > and prevent tab discard), I see ~500MB worth of GEM objects, of which
> > maybe only 10% are active at any time, and with unpin/evict enabled,
> > only about half resident (which is a number that gets much lower if you
> > simulate extreme memory pressure).  Assuming a 2:1 compression ratio (I
> > see a bit higher in practice, but cannot isolate swapped out GEM pages
> > vs other), that is like having an extra 100+MB of RAM, or more under
> > higher memory pressure.
> >
> > Rob Clark (8):
> >   drm/msm: ratelimit GEM related WARN_ON()s
> >   drm/msm: Reorganize msm_gem_shrinker_scan()
> >   drm/msm: Clear msm_obj->sgt in put_pages()
> >   drm/msm: Split iova purge and close
> >   drm/msm: Add $debugfs/gem stats on resident objects
> >   drm/msm: Track potentially evictable objects
> >   drm/msm: Small msm_gem_purge() fix
> >   drm/msm: Support evicting GEM objects to swap
>
> Given how much entertainement shrinkers are, should we aim for more common
> code here?
>
> Christian has tons of fun with adding something like this for ttm (well
> different shades of grey). i915 is going to adopt ttm, at least for
> discrete.
>
> The locking is also an utter pain, and msm seems to still live a lot in
> its own land here. I think as much as possible a standard approach here
> would be really good, ideally maybe as building blocks shared between ttm
> and gem-shmem drivers ...

I don't disagree.. but also replacing the engines on an airplane
mid-flight isn't a great option either.. ;-)

The hard part (esp. wrt to locking) is tracking the state of a given
bo.. ie. is it active, active+purgable, inactive+purgable,
inactive+unpinnable, etc.  Currently the shmem helpers don't really
provide anything here.  If they did, I suppose they could provide some
shrinker helpers as well.  Unfortunately these days I barely have
enough time for drm/msm, let alone bolting this onto the shmem
helpers.  I would recommend that if someone wanted to do this, that
they look at recent drm/msm shrinker patches that I've sent (ie. make
shrinker->count() lockless, and drop the locks in shrinker->scan()
body.. when the system is under heavy memory pressure, you start
getting shrinker called from all the threads so contention for mm_lock
can be a really bad problem)

(Well, the other potential problem is that drm/msm has a lot of
different possible iommu pairings across the generations, so there is
some potential here to uncover exciting new bugs.. the locking at
least is the same for all the generations and pretty easy to test with
and without lockdep with some tests that push essentially all memory
into swap)

BR,
-R

> -Daniel
>
> >
> >  drivers/gpu/drm/msm/msm_drv.c  |   2 +-
> >  drivers/gpu/drm/msm/msm_drv.h  |  13 ++-
> >  drivers/gpu/drm/msm/msm_gem.c  | 155 +
> >  drivers/gpu/drm/msm/msm_gem.h  |  68 +--
> >  drivers/gpu/drm/msm/msm_gem_shrinker.c | 129 
> >  drivers/gpu/drm/msm/msm_gpu_trace.h|  13 +++
> >  6 files changed, 272 insertions(+), 108 deletions(-)
> >
> > --
> > 2.30.2
> >
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


[PATCH] drm/msm: Fix spelling "purgable" -> "purgeable"

2021-04-06 Thread Rob Clark
From: Rob Clark 

The previous patch fixes the user visible spelling.  This one fixes the
code.  Oops.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c  | 12 ++--
 drivers/gpu/drm/msm/msm_gem.h  | 16 
 drivers/gpu/drm/msm/msm_gem_shrinker.c |  2 +-
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 9568d551f7de..3c0b384a8984 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -821,14 +821,14 @@ static void update_inactive(struct msm_gem_object *msm_obj)
	WARN_ON(msm_obj->active_count != 0);
 
	if (msm_obj->dontneed)
-   mark_unpurgable(msm_obj);
+   mark_unpurgeable(msm_obj);
 
	list_del(&msm_obj->mm_list);
	if (msm_obj->madv == MSM_MADV_WILLNEED) {
	list_add_tail(&msm_obj->mm_list, &priv->inactive_willneed);
	} else if (msm_obj->madv == MSM_MADV_DONTNEED) {
	list_add_tail(&msm_obj->mm_list, &priv->inactive_dontneed);
-   mark_purgable(msm_obj);
+   mark_purgeable(msm_obj);
	} else {
	WARN_ON(msm_obj->madv != __MSM_MADV_PURGED);
	list_add_tail(&msm_obj->mm_list, &priv->inactive_purged);
@@ -901,8 +901,8 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m,
	madv = " purged";
	break;
	case MSM_MADV_DONTNEED:
-   stats->purgable.count++;
-   stats->purgable.size += obj->size;
+   stats->purgeable.count++;
+   stats->purgeable.size += obj->size;
	madv = " purgeable";
	break;
	case MSM_MADV_WILLNEED:
@@ -984,7 +984,7 @@ void msm_gem_describe_objects(struct list_head *list, struct seq_file *m)
	seq_printf(m, "Active:%4d objects, %9zu bytes\n",
	stats.active.count, stats.active.size);
	seq_printf(m, "Purgeable: %4d objects, %9zu bytes\n",
-   stats.purgable.count, stats.purgable.size);
+   stats.purgeable.count, stats.purgeable.size);
	seq_printf(m, "Purged:%4d objects, %9zu bytes\n",
	stats.purged.count, stats.purged.size);
 }
@@ -1003,7 +1003,7 @@ void msm_gem_free_object(struct drm_gem_object *obj)
 
	mutex_lock(&priv->mm_lock);
	if (msm_obj->dontneed)
-   mark_unpurgable(msm_obj);
+   mark_unpurgeable(msm_obj);
	list_del(&msm_obj->mm_list);
	mutex_unlock(&priv->mm_lock);
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 7c7d54bad189..13ebecdd70f4 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -163,7 +163,7 @@ struct msm_gem_stats {
struct {
unsigned count;
size_t size;
-   } all, active, purgable, purged;
+   } all, active, purgeable, purged;
 };
 
 void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m,
@@ -207,8 +207,8 @@ static inline bool is_active(struct msm_gem_object *msm_obj)
return msm_obj->active_count;
 }
 
-/* imported/exported objects are not purgable: */
-static inline bool is_unpurgable(struct msm_gem_object *msm_obj)
+/* imported/exported objects are not purgeable: */
+static inline bool is_unpurgeable(struct msm_gem_object *msm_obj)
 {
return msm_obj->base.dma_buf && msm_obj->base.import_attach;
 }
@@ -216,7 +216,7 @@ static inline bool is_unpurgable(struct msm_gem_object *msm_obj)
 static inline bool is_purgeable(struct msm_gem_object *msm_obj)
 {
return (msm_obj->madv == MSM_MADV_DONTNEED) && msm_obj->sgt &&
-   !is_unpurgable(msm_obj);
+   !is_unpurgeable(msm_obj);
 }
 
 static inline bool is_vunmapable(struct msm_gem_object *msm_obj)
@@ -225,13 +225,13 @@ static inline bool is_vunmapable(struct msm_gem_object *msm_obj)
return (msm_obj->vmap_count == 0) && msm_obj->vaddr;
 }
 
-static inline void mark_purgable(struct msm_gem_object *msm_obj)
+static inline void mark_purgeable(struct msm_gem_object *msm_obj)
 {
struct msm_drm_private *priv = msm_obj->base.dev->dev_private;
 
	WARN_ON(!mutex_is_locked(&priv->mm_lock));
 
-   if (is_unpurgable(msm_obj))
+   if (is_unpurgeable(msm_obj))
return;
 
if (WARN_ON(msm_obj->dontneed))
@@ -241,13 +241,13 @@ static inline void mark_purgable(struct msm_gem_object *msm_obj)
msm_obj->dontneed = true;
 }
 
-static inline void mark_unpurgable(struct msm_gem_object *msm_obj)
+static inline void mark_unpurgeable(struct msm_gem_object *msm_obj)
 {
struct msm_drm_private *priv = msm_obj->base.dev->dev_private;
 
WARN_ON(!mutex_is_loc

Re: [PATCH][next] drm/msm: Fix spelling mistake "Purgable" -> "Purgeable"

2021-04-06 Thread Rob Clark
On Tue, Apr 6, 2021 at 6:39 AM Colin King  wrote:
>
> From: Colin Ian King 
>
> There is a spelling mistake in debugfs gem stats. Fix it. Also
> re-align output to cater for the extra 1 character.
>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/msm/msm_gem.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
> index f146d9c5ba9c..4e2e0a93d17d 100644
> --- a/drivers/gpu/drm/msm/msm_gem.c
> +++ b/drivers/gpu/drm/msm/msm_gem.c
> @@ -979,13 +979,13 @@ void msm_gem_describe_objects(struct list_head *list, struct seq_file *m)
> msm_gem_describe(obj, m, &stats);
> }
>
>
> -   seq_printf(m, "Total:%4d objects, %9zu bytes\n",
> +   seq_printf(m, "Total: %4d objects, %9zu bytes\n",
> stats.all.count, stats.all.size);
> -   seq_printf(m, "Active:   %4d objects, %9zu bytes\n",
> +   seq_printf(m, "Active:%4d objects, %9zu bytes\n",
> stats.active.count, stats.active.size);
> -   seq_printf(m, "Purgable: %4d objects, %9zu bytes\n",
> +   seq_printf(m, "Purgeable: %4d objects, %9zu bytes\n",
> stats.purgable.count, stats.purgable.size);

oh, whoops.. I spel gud..

Thanks, applied.. I'll follow-up with fixing the spelling in the code

BR,
-R

> -   seq_printf(m, "Purged:   %4d objects, %9zu bytes\n",
> +   seq_printf(m, "Purged:%4d objects, %9zu bytes\n",
> stats.purged.count, stats.purged.size);
>  }
>  #endif
> --
> 2.30.2
>


[PATCH 8/8] drm/msm: Support evicting GEM objects to swap

2021-04-05 Thread Rob Clark
From: Rob Clark 

Now that tracking is wired up for potentially evictable GEM objects,
wire up shrinker and the remaining GEM bits for unpinning backing pages
of inactive objects.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c  | 23 
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 37 +-
 drivers/gpu/drm/msm/msm_gpu_trace.h| 13 +
 3 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 163a1d30b5c9..2b731cf42294 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -759,6 +759,29 @@ void msm_gem_purge(struct drm_gem_object *obj)
0, (loff_t)-1);
 }
 
+/**
+ * Unpin the backing pages and make them available to be swapped out.
+ */
+void msm_gem_evict(struct drm_gem_object *obj)
+{
+   struct drm_device *dev = obj->dev;
+   struct msm_gem_object *msm_obj = to_msm_bo(obj);
+
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(is_unevictable(msm_obj));
+   GEM_WARN_ON(!msm_obj->evictable);
+   GEM_WARN_ON(msm_obj->active_count);
+
+   /* Get rid of any iommu mapping(s): */
+   put_iova_spaces(obj, false);
+
+   drm_vma_node_unmap(&obj->vma_node, dev->anon_inode->i_mapping);
+
+   put_pages(obj);
+
+   update_inactive(msm_obj);
+}
+
 void msm_gem_vunmap(struct drm_gem_object *obj)
 {
struct msm_gem_object *msm_obj = to_msm_bo(obj);
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 38bf919f8508..52828028b9d4 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -9,12 +9,26 @@
 #include "msm_gpu.h"
 #include "msm_gpu_trace.h"
 
+bool enable_swap = true;
+MODULE_PARM_DESC(enable_swap, "Enable swappable GEM buffers");
+module_param(enable_swap, bool, 0600);
+
+static bool can_swap(void)
+{
+   return enable_swap && get_nr_swap_pages() > 0;
+}
+
 static unsigned long
 msm_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 {
struct msm_drm_private *priv =
container_of(shrinker, struct msm_drm_private, shrinker);
-   return priv->shrinkable_count;
+   unsigned count = priv->shrinkable_count;
+
+   if (can_swap())
+   count += priv->evictable_count;
+
+   return count;
 }
 
 static bool
@@ -32,6 +46,17 @@ purge(struct msm_gem_object *msm_obj)
return true;
 }
 
+static bool
+evict(struct msm_gem_object *msm_obj)
+{
+   if (is_unevictable(msm_obj))
+   return false;
+
+   msm_gem_evict(&msm_obj->base);
+
+   return true;
+}
+
 static unsigned long
 scan(struct msm_drm_private *priv, unsigned nr_to_scan, struct list_head *list,
bool (*shrink)(struct msm_gem_object *msm_obj))
@@ -104,6 +129,16 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct 
shrink_control *sc)
if (freed > 0)
trace_msm_gem_purge(freed << PAGE_SHIFT);
 
+   if (can_swap() && freed < sc->nr_to_scan) {
+   int evicted = scan(priv, sc->nr_to_scan - freed,
+   &priv->inactive_willneed, evict);
+
+   if (evicted > 0)
+   trace_msm_gem_evict(evicted << PAGE_SHIFT);
+
+   freed += evicted;
+   }
+
return (freed > 0) ? freed : SHRINK_STOP;
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gpu_trace.h 
b/drivers/gpu/drm/msm/msm_gpu_trace.h
index 03e0c2536b94..ca0b08d7875b 100644
--- a/drivers/gpu/drm/msm/msm_gpu_trace.h
+++ b/drivers/gpu/drm/msm/msm_gpu_trace.h
@@ -128,6 +128,19 @@ TRACE_EVENT(msm_gem_purge,
 );
 
 
+TRACE_EVENT(msm_gem_evict,
+   TP_PROTO(u32 bytes),
+   TP_ARGS(bytes),
+   TP_STRUCT__entry(
+   __field(u32, bytes)
+   ),
+   TP_fast_assign(
+   __entry->bytes = bytes;
+   ),
+   TP_printk("Evicting %u bytes", __entry->bytes)
+);
+
+
 TRACE_EVENT(msm_gem_purge_vmaps,
TP_PROTO(u32 unmapped),
TP_ARGS(unmapped),
-- 
2.30.2



[PATCH 7/8] drm/msm: Small msm_gem_purge() fix

2021-04-05 Thread Rob Clark
From: Rob Clark 

Shoot down any mmap's *first* before put_pages().  Also add a WARN_ON
that the object is locked (to make it clear that this doesn't race with
msm_gem_fault()) and remove a redundant WARN_ON (since is_purgeable()
already covers that case).

Fixes: 68209390f116 ("drm/msm: shrinker support")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 9ac89951080c..163a1d30b5c9 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -729,14 +729,16 @@ void msm_gem_purge(struct drm_gem_object *obj)
struct drm_device *dev = obj->dev;
struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
GEM_WARN_ON(!is_purgeable(msm_obj));
-   GEM_WARN_ON(obj->import_attach);
 
/* Get rid of any iommu mapping(s): */
put_iova_spaces(obj, true);
 
msm_gem_vunmap(obj);
 
+   drm_vma_node_unmap(&obj->vma_node, dev->anon_inode->i_mapping);
+
put_pages(obj);
 
put_iova_vmas(obj);
@@ -744,7 +746,6 @@ void msm_gem_purge(struct drm_gem_object *obj)
msm_obj->madv = __MSM_MADV_PURGED;
update_inactive(msm_obj);
 
-   drm_vma_node_unmap(&obj->vma_node, dev->anon_inode->i_mapping);
drm_gem_free_mmap_offset(obj);
 
/* Our goal here is to return as much of the memory as
-- 
2.30.2



[PATCH 6/8] drm/msm: Track potentially evictable objects

2021-04-05 Thread Rob Clark
From: Rob Clark 

Objects that are potential candidates for swapping out are (1) willneed
(ie. if they are purgeable/MADV_WONTNEED we can just free the pages without
them having to land in swap), (2) not on an active list, (3) not dma-buf
imported or exported, and (4) not vmap'd.  This repurposes the purged
list for objects that do not have backing pages (either because they
have not been pinned for the first time yet, or, in a later patch, because
they have been unpinned/evicted).

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c |  2 +-
 drivers/gpu/drm/msm/msm_drv.h | 13 ++
 drivers/gpu/drm/msm/msm_gem.c | 44 ++
 drivers/gpu/drm/msm/msm_gem.h | 45 +++
 4 files changed, 89 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index e12d5fbd0a34..d3d6c743b7af 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -451,7 +451,7 @@ static int msm_drm_init(struct device *dev, const struct 
drm_driver *drv)
 
INIT_LIST_HEAD(&priv->inactive_willneed);
INIT_LIST_HEAD(&priv->inactive_dontneed);
-   INIT_LIST_HEAD(&priv->inactive_purged);
+   INIT_LIST_HEAD(&priv->inactive_unpinned);
mutex_init(&priv->mm_lock);
 
/* Teach lockdep about lock ordering wrt. shrinker: */
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 6a42cdf4cf7e..2668941df529 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -182,11 +182,15 @@ struct msm_drm_private {
struct mutex obj_lock;
 
/**
-* Lists of inactive GEM objects.  Every bo is either in one of the
+* LRUs of inactive GEM objects.  Every bo is either in one of the
 * inactive lists (depending on whether or not it is shrinkable) or
 * gpu->active_list (for the gpu it is active on[1]), or transiently
 * on a temporary list as the shrinker is running.
 *
+* Note that inactive_willneed also contains pinned and vmap'd bos,
+* but the number of pinned-but-not-active objects is small (scanout
+* buffers, ringbuffer, etc).
+*
 * These lists are protected by mm_lock (which should be acquired
 * before per GEM object lock).  One should *not* hold mm_lock in
 * get_pages()/vmap()/etc paths, as they can trigger the shrinker.
@@ -194,10 +198,11 @@ struct msm_drm_private {
 * [1] if someone ever added support for the old 2d cores, there could 
be
 * more than one gpu object
 */
-   struct list_head inactive_willneed;  /* inactive + !shrinkable */
-   struct list_head inactive_dontneed;  /* inactive +  shrinkable */
-   struct list_head inactive_purged;/* inactive +  purged */
+   struct list_head inactive_willneed;  /* inactive + potentially unpin/evictable */
+   struct list_head inactive_dontneed;  /* inactive + shrinkable */
+   struct list_head inactive_unpinned;  /* inactive + purged or unpinned */
long shrinkable_count;   /* write access under mm_lock */
+   long evictable_count;/* write access under mm_lock */
struct mutex mm_lock;
 
struct workqueue_struct *wq;
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 9ff37904ec2b..9ac89951080c 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -130,6 +130,9 @@ static struct page **get_pages(struct drm_gem_object *obj)
 */
if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
sync_for_device(msm_obj);
+
+   GEM_WARN_ON(msm_obj->active_count);
+   update_inactive(msm_obj);
}
 
return msm_obj->pages;
@@ -428,7 +431,7 @@ static int msm_gem_pin_iova(struct drm_gem_object *obj,
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma;
struct page **pages;
-   int prot = IOMMU_READ;
+   int ret, prot = IOMMU_READ;
 
if (!(msm_obj->flags & MSM_BO_GPU_READONLY))
prot |= IOMMU_WRITE;
@@ -449,8 +452,13 @@ static int msm_gem_pin_iova(struct drm_gem_object *obj,
if (IS_ERR(pages))
return PTR_ERR(pages);
 
-   return msm_gem_map_vma(aspace, vma, prot,
+   ret = msm_gem_map_vma(aspace, vma, prot,
msm_obj->sgt, obj->size >> PAGE_SHIFT);
+
+   if (!ret)
+   msm_obj->pin_count++;
+
+   return ret;
 }
 
 static int get_and_pin_iova_range_locked(struct drm_gem_object *obj,
@@ -542,14 +550,21 @@ uint64_t msm_gem_iova(struct drm_gem_object *obj,
 void msm_gem_unpin_iova_locked(struct drm_gem_object *obj,
struct msm_gem_address_space *aspace)
 {
+   struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma;
 
GEM_WARN_ON

[PATCH 5/8] drm/msm: Add $debugfs/gem stats on resident objects

2021-04-05 Thread Rob Clark
From: Rob Clark 

Currently nearly everything, other than newly allocated objects which
are not yet backed by pages, is pinned and resident in RAM.  But it will
be nice to have some stats on what is unpinned once that is supported.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 7 +++
 drivers/gpu/drm/msm/msm_gem.h | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 5f0647adc29d..9ff37904ec2b 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -902,6 +902,11 @@ void msm_gem_describe(struct drm_gem_object *obj, struct 
seq_file *m,
stats->active.size += obj->size;
}
 
+   if (msm_obj->pages) {
+   stats->resident.count++;
+   stats->resident.size += obj->size;
+   }
+
switch (msm_obj->madv) {
case __MSM_MADV_PURGED:
stats->purged.count++;
@@ -991,6 +996,8 @@ void msm_gem_describe_objects(struct list_head *list, 
struct seq_file *m)
stats.all.count, stats.all.size);
seq_printf(m, "Active:   %4d objects, %9zu bytes\n",
stats.active.count, stats.active.size);
+   seq_printf(m, "Resident: %4d objects, %9zu bytes\n",
+   stats.resident.count, stats.resident.size);
seq_printf(m, "Purgable: %4d objects, %9zu bytes\n",
stats.purgable.count, stats.purgable.size);
seq_printf(m, "Purged:   %4d objects, %9zu bytes\n",
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 917af526a5c5..e13a9301b616 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -162,13 +162,13 @@ struct drm_gem_object *msm_gem_import(struct drm_device 
*dev,
struct dma_buf *dmabuf, struct sg_table *sgt);
 __printf(2, 3)
 void msm_gem_object_set_name(struct drm_gem_object *bo, const char *fmt, ...);
-#ifdef CONFIG_DEBUG_FS
 
+#ifdef CONFIG_DEBUG_FS
 struct msm_gem_stats {
struct {
unsigned count;
size_t size;
-   } all, active, purgable, purged;
+   } all, active, resident, purgable, purged;
 };
 
 void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m,
-- 
2.30.2



[PATCH 4/8] drm/msm: Split iova purge and close

2021-04-05 Thread Rob Clark
From: Rob Clark 

Currently these always go together, either when we purge MADV_WONTNEED
objects or when the object is freed.  But for unpin, we want to be able
to purge (unmap from iommu) the vma, while keeping the iova range
allocated (so we can remap back to the same GPU virtual address when the
object is re-pinned).

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 71530a89b675..5f0647adc29d 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -357,9 +357,14 @@ static void del_vma(struct msm_gem_vma *vma)
kfree(vma);
 }
 
-/* Called with msm_obj locked */
+/**
+ * If close is true, this also closes the VMA (releasing the allocated
+ * iova range) in addition to removing the iommu mapping.  In the eviction
+ * case (!close), we keep the iova allocated, but only remove the iommu
+ * mapping.
+ */
 static void
-put_iova_spaces(struct drm_gem_object *obj)
+put_iova_spaces(struct drm_gem_object *obj, bool close)
 {
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma;
@@ -369,7 +374,8 @@ put_iova_spaces(struct drm_gem_object *obj)
list_for_each_entry(vma, _obj->vmas, list) {
if (vma->aspace) {
msm_gem_purge_vma(vma->aspace, vma);
-   msm_gem_close_vma(vma->aspace, vma);
+   if (close)
+   msm_gem_close_vma(vma->aspace, vma);
}
}
 }
@@ -711,7 +717,8 @@ void msm_gem_purge(struct drm_gem_object *obj)
GEM_WARN_ON(!is_purgeable(msm_obj));
GEM_WARN_ON(obj->import_attach);
 
-   put_iova_spaces(obj);
+   /* Get rid of any iommu mapping(s): */
+   put_iova_spaces(obj, true);
 
msm_gem_vunmap(obj);
 
@@ -1013,7 +1020,7 @@ void msm_gem_free_object(struct drm_gem_object *obj)
/* object should not be on active list: */
GEM_WARN_ON(is_active(msm_obj));
 
-   put_iova_spaces(obj);
+   put_iova_spaces(obj, true);
 
if (obj->import_attach) {
GEM_WARN_ON(msm_obj->vaddr);
-- 
2.30.2



[PATCH 3/8] drm/msm: Clear msm_obj->sgt in put_pages()

2021-04-05 Thread Rob Clark
From: Rob Clark 

Currently this doesn't matter since we keep the pages pinned until the
object is destroyed.  But when we start unpinning pages to allow objects
to be evicted to swap, it will.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index d5abe8aa9978..71530a89b675 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -162,6 +162,7 @@ static void put_pages(struct drm_gem_object *obj)
 
sg_free_table(msm_obj->sgt);
kfree(msm_obj->sgt);
+   msm_obj->sgt = NULL;
}
 
if (use_pages(obj))
-- 
2.30.2



[PATCH 2/8] drm/msm: Reorganize msm_gem_shrinker_scan()

2021-04-05 Thread Rob Clark
From: Rob Clark 

So we don't have to duplicate the boilerplate for eviction.

This also lets us re-use the main scan loop for vmap shrinker.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 94 +-
 1 file changed, 46 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c 
b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 33a49641ef30..38bf919f8508 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -17,21 +17,35 @@ msm_gem_shrinker_count(struct shrinker *shrinker, struct 
shrink_control *sc)
return priv->shrinkable_count;
 }
 
+static bool
+purge(struct msm_gem_object *msm_obj)
+{
+   if (!is_purgeable(msm_obj))
+   return false;
+
+   /*
+* This will move the obj out of still_in_list to
+* the purged list
+*/
+   msm_gem_purge(&msm_obj->base);
+
+   return true;
+}
+
 static unsigned long
-msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
+scan(struct msm_drm_private *priv, unsigned nr_to_scan, struct list_head *list,
+   bool (*shrink)(struct msm_gem_object *msm_obj))
 {
-   struct msm_drm_private *priv =
-   container_of(shrinker, struct msm_drm_private, shrinker);
+   unsigned freed = 0;
struct list_head still_in_list;
-   unsigned long freed = 0;
 
INIT_LIST_HEAD(&still_in_list);

mutex_lock(&priv->mm_lock);
 
-   while (freed < sc->nr_to_scan) {
+   while (freed < nr_to_scan) {
struct msm_gem_object *msm_obj = list_first_entry_or_null(
-   &priv->inactive_dontneed, typeof(*msm_obj), mm_list);
+   list, typeof(*msm_obj), mm_list);
 
if (!msm_obj)
break;
@@ -62,14 +76,9 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct 
shrink_control *sc)
if (!msm_gem_trylock(&msm_obj->base))
goto tail;
 
-   if (is_purgeable(msm_obj)) {
-   /*
-* This will move the obj out of still_in_list to
-* the purged list
-*/
-   msm_gem_purge(&msm_obj->base);
+   if (shrink(msm_obj))
freed += msm_obj->base.size >> PAGE_SHIFT;
-   }
+
msm_gem_unlock(&msm_obj->base);
 
 tail:
@@ -77,16 +86,25 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct 
shrink_control *sc)
mutex_lock(&priv->mm_lock);
}
 
-   list_splice_tail(&still_in_list, &priv->inactive_dontneed);
+   list_splice_tail(&still_in_list, list);
mutex_unlock(&priv->mm_lock);
 
-   if (freed > 0) {
+   return freed;
+}
+
+static unsigned long
+msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
+{
+   struct msm_drm_private *priv =
+   container_of(shrinker, struct msm_drm_private, shrinker);
+   unsigned long freed;
+
+   freed = scan(priv, sc->nr_to_scan, &priv->inactive_dontneed, purge);
+
+   if (freed > 0)
trace_msm_gem_purge(freed << PAGE_SHIFT);
-   } else {
-   return SHRINK_STOP;
-   }
 
-   return freed;
+   return (freed > 0) ? freed : SHRINK_STOP;
 }
 
 /* since we don't know any better, lets bail after a few
@@ -95,29 +113,15 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct 
shrink_control *sc)
  */
 static const int vmap_shrink_limit = 15;
 
-static unsigned
-vmap_shrink(struct list_head *mm_list)
+static bool
+vmap_shrink(struct msm_gem_object *msm_obj)
 {
-   struct msm_gem_object *msm_obj;
-   unsigned unmapped = 0;
+   if (!is_vunmapable(msm_obj))
+   return false;
 
-   list_for_each_entry(msm_obj, mm_list, mm_list) {
-   /* Use trylock, because we cannot block on a obj that
-* might be trying to acquire mm_lock
-*/
-   if (!msm_gem_trylock(&msm_obj->base))
-   continue;
-   if (is_vunmapable(msm_obj)) {
-   msm_gem_vunmap(&msm_obj->base);
-   unmapped++;
-   }
-   msm_gem_unlock(&msm_obj->base);
+   msm_gem_vunmap(&msm_obj->base);
 
-   if (++unmapped >= vmap_shrink_limit)
-   break;
-   }
-
-   return unmapped;
+   return true;
 }
 
 static int
@@ -133,17 +137,11 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned 
long event, void *ptr)
};
unsigned idx, unmapped = 0;
 
-   mutex_lock(&priv->mm_lock);
-
-   for (idx = 0; mm_lists[idx]; idx++) {
-   unmapped += vmap_shrink(mm_lists[idx]);
-
-   if (unmapped >= vmap_shrink_limit)
-   break;
+   for (idx = 0; mm_lists[idx] &&am

[PATCH 1/8] drm/msm: ratelimit GEM related WARN_ON()s

2021-04-05 Thread Rob Clark
From: Rob Clark 

If you mess something up, you don't really need to see the same WARN_ON
splat 4000 times pumped out over a slow debug UART port..

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 66 +--
 drivers/gpu/drm/msm/msm_gem.h | 19 ++
 2 files changed, 45 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 4e91b095ab77..d5abe8aa9978 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -96,7 +96,7 @@ static struct page **get_pages(struct drm_gem_object *obj)
 {
struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
-   WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
if (!msm_obj->pages) {
struct drm_device *dev = obj->dev;
@@ -180,7 +180,7 @@ struct page **msm_gem_get_pages(struct drm_gem_object *obj)
 
msm_gem_lock(obj);
 
-   if (WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED)) {
+   if (GEM_WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED)) {
msm_gem_unlock(obj);
return ERR_PTR(-EBUSY);
}
@@ -256,7 +256,7 @@ static vm_fault_t msm_gem_fault(struct vm_fault *vmf)
goto out;
}
 
-   if (WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED)) {
+   if (GEM_WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED)) {
msm_gem_unlock(obj);
return VM_FAULT_SIGBUS;
}
@@ -289,7 +289,7 @@ static uint64_t mmap_offset(struct drm_gem_object *obj)
struct drm_device *dev = obj->dev;
int ret;
 
-   WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
/* Make it mmapable */
ret = drm_gem_create_mmap_offset(obj);
@@ -318,7 +318,7 @@ static struct msm_gem_vma *add_vma(struct drm_gem_object 
*obj,
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma;
 
-   WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
vma = kzalloc(sizeof(*vma), GFP_KERNEL);
if (!vma)
@@ -337,7 +337,7 @@ static struct msm_gem_vma *lookup_vma(struct drm_gem_object 
*obj,
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma;
 
-   WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
list_for_each_entry(vma, &msm_obj->vmas, list) {
if (vma->aspace == aspace)
@@ -363,7 +363,7 @@ put_iova_spaces(struct drm_gem_object *obj)
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma;
 
-   WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
list_for_each_entry(vma, &msm_obj->vmas, list) {
if (vma->aspace) {
@@ -380,7 +380,7 @@ put_iova_vmas(struct drm_gem_object *obj)
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct msm_gem_vma *vma, *tmp;
 
-   WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
list_for_each_entry_safe(vma, tmp, &msm_obj->vmas, list) {
del_vma(vma);
@@ -394,7 +394,7 @@ static int get_iova_locked(struct drm_gem_object *obj,
struct msm_gem_vma *vma;
int ret = 0;
 
-   WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
vma = lookup_vma(obj, aspace);
 
@@ -429,13 +429,13 @@ static int msm_gem_pin_iova(struct drm_gem_object *obj,
if (msm_obj->flags & MSM_BO_MAP_PRIV)
prot |= IOMMU_PRIV;
 
-   WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
-   if (WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED))
+   if (GEM_WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED))
return -EBUSY;
 
vma = lookup_vma(obj, aspace);
-   if (WARN_ON(!vma))
+   if (GEM_WARN_ON(!vma))
return -EINVAL;
 
pages = get_pages(obj);
@@ -453,7 +453,7 @@ static int get_and_pin_iova_range_locked(struct 
drm_gem_object *obj,
u64 local;
int ret;
 
-   WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
ret = get_iova_locked(obj, aspace, &local,
range_start, range_end);
@@ -524,7 +524,7 @@ uint64_t msm_gem_iova(struct drm_gem_object *obj,
msm_gem_lock(obj);
vma = lookup_vma(obj, aspace);
msm_gem_unlock(obj);
-   WARN_ON(!vma);
+   GEM_WARN_ON(!vma);
 
return vma ? vma->iova : 0;
 }
@@ -537,11 +537,11 @@ void msm_gem_unpin_iova_locked(struct drm_gem_object *obj,
 {
struct msm_gem_vma *vma;
 
-   WARN_ON(!msm_gem_is_locked(obj));
+   GEM_WARN_ON(!msm_gem_is_locked(obj));
 
vma = lookup_vma(obj, aspace);
 
-   if (!WARN_ON(!vma))
+   if (!GEM_WARN_ON(!vma))
msm_gem_unmap_vma(aspace, vma);
 }
 
@@ -593,12 +593,12 

[PATCH 0/8] drm/msm: Swappable GEM objects

2021-04-05 Thread Rob Clark
From: Rob Clark 

One would normally hope not to be under enough memory pressure to need
to swap GEM objects to disk-backed swap.  But memory-backed zram swap
(as enabled on chromebooks, for example) can actually be quite fast
and useful on devices with less RAM.  On a 4GB device, opening up ~4
memory intensive web pages (in separate windows rather than tabs, to try
and prevent tab discard), I see ~500MB worth of GEM objects, of which
maybe only 10% are active at any time, and with unpin/evict enabled,
only about half resident (which is a number that gets much lower if you
simulate extreme memory pressure).  Assuming a 2:1 compression ratio (I
see a bit higher in practice, but cannot isolate swapped out GEM pages
vs other), that is like having an extra 100+MB of RAM, or more under
higher memory pressure.

Rob Clark (8):
  drm/msm: ratelimit GEM related WARN_ON()s
  drm/msm: Reorganize msm_gem_shrinker_scan()
  drm/msm: Clear msm_obj->sgt in put_pages()
  drm/msm: Split iova purge and close
  drm/msm: Add $debugfs/gem stats on resident objects
  drm/msm: Track potentially evictable objects
  drm/msm: Small msm_gem_purge() fix
  drm/msm: Support evicting GEM objects to swap

 drivers/gpu/drm/msm/msm_drv.c  |   2 +-
 drivers/gpu/drm/msm/msm_drv.h  |  13 ++-
 drivers/gpu/drm/msm/msm_gem.c  | 155 +
 drivers/gpu/drm/msm/msm_gem.h  |  68 +--
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 129 
 drivers/gpu/drm/msm/msm_gpu_trace.h|  13 +++
 6 files changed, 272 insertions(+), 108 deletions(-)

-- 
2.30.2



[PATCH v2] drm/msm: Drop mm_lock in scan loop

2021-04-02 Thread Rob Clark
From: Rob Clark 

lock_stat + mmm_donut[1] say that this reduces contention on mm_lock
significantly (~350x lower waittime-max, and ~100x lower waittime-avg)

[1] 
https://chromium.googlesource.com/chromiumos/platform/microbenchmarks/+/refs/heads/main/mmm_donut.py

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.h  |  3 +-
 drivers/gpu/drm/msm/msm_gem.c  |  2 +-
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 48 ++
 3 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index c84e6f84cb6d..d8d64d34e6e3 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -184,7 +184,8 @@ struct msm_drm_private {
/**
 * Lists of inactive GEM objects.  Every bo is either in one of the
 * inactive lists (depending on whether or not it is shrinkable) or
-* gpu->active_list (for the gpu it is active on[1])
+* gpu->active_list (for the gpu it is active on[1]), or transiently
+* on a temporary list as the shrinker is running.
 *
 * These lists are protected by mm_lock (which should be acquired
 * before per GEM object lock).  One should *not* hold mm_lock in
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 2ecf7f1cef25..75cea5b801da 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -719,7 +719,7 @@ void msm_gem_purge(struct drm_gem_object *obj)
put_iova_vmas(obj);
 
msm_obj->madv = __MSM_MADV_PURGED;
-   mark_unpurgable(msm_obj);
+   update_inactive(msm_obj);
 
drm_vma_node_unmap(&obj->vma_node, dev->anon_inode->i_mapping);
drm_gem_free_mmap_offset(obj);
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c 
b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index f3e948af01c5..33a49641ef30 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -22,26 +22,62 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct 
shrink_control *sc)
 {
struct msm_drm_private *priv =
container_of(shrinker, struct msm_drm_private, shrinker);
-   struct msm_gem_object *msm_obj;
+   struct list_head still_in_list;
unsigned long freed = 0;
 
+   INIT_LIST_HEAD(&still_in_list);
+
mutex_lock(&priv->mm_lock);

-   list_for_each_entry(msm_obj, &priv->inactive_dontneed, mm_list) {
-   if (freed >= sc->nr_to_scan)
+   while (freed < sc->nr_to_scan) {
+   struct msm_gem_object *msm_obj = list_first_entry_or_null(
+   &priv->inactive_dontneed, typeof(*msm_obj), mm_list);
+
+   if (!msm_obj)
break;
-   /* Use trylock, because we cannot block on a obj that
-* might be trying to acquire mm_lock
+
+   list_move_tail(&msm_obj->mm_list, &still_in_list);
+
+   /*
+* If it is in the process of being freed, msm_gem_free_object
+* can be blocked on mm_lock waiting to remove it.  So just
+* skip it.
 */
-   if (!msm_gem_trylock(&msm_obj->base))
+   if (!kref_get_unless_zero(&msm_obj->base.refcount))
continue;
+
+   /*
+* Now that we own a reference, we can drop mm_lock for the
+* rest of the loop body, to reduce contention with the
+* retire_submit path (which could make more objects purgable)
+*/
+
+   mutex_unlock(&priv->mm_lock);
+
+   /*
+* Note that this still needs to be trylock, since we can
+* hit shrinker in response to trying to get backing pages
+* for this obj (ie. while it's lock is already held)
+*/
+   if (!msm_gem_trylock(&msm_obj->base))
+   goto tail;
+
if (is_purgeable(msm_obj)) {
+   /*
+* This will move the obj out of still_in_list to
+* the purged list
+*/
msm_gem_purge(&msm_obj->base);
freed += msm_obj->base.size >> PAGE_SHIFT;
}
msm_gem_unlock(&msm_obj->base);
+
+tail:
+   drm_gem_object_put(&msm_obj->base);
+   mutex_lock(&priv->mm_lock);
}
 
+   list_splice_tail(&still_in_list, &priv->inactive_dontneed);
mutex_unlock(>mm_lock);
 
if (freed > 0) {
-- 
2.30.2



Re: [v1] drm/msm/disp/dpu1: program 3d_merge only if block is attached

2021-04-02 Thread Rob Clark
On Fri, Apr 2, 2021 at 4:55 AM Kalyan Thota  wrote:
>
> Update the 3d merge as active in the data path only if
> the hw block is selected in the configuration.
>
> Reported-by: Stephen Boyd 

Thanks, I've added:

Fixes: 73bfb790ac78 ("msm:disp:dpu1: setup display datapath for SC7180 target")

> Signed-off-by: Kalyan Thota 
> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
> index 8981cfa..92e6f1b 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
> @@ -496,7 +496,9 @@ static void dpu_hw_ctl_intf_cfg_v1(struct dpu_hw_ctl *ctx,
>
> DPU_REG_WRITE(c, CTL_TOP, mode_sel);
> DPU_REG_WRITE(c, CTL_INTF_ACTIVE, intf_active);
> -   DPU_REG_WRITE(c, CTL_MERGE_3D_ACTIVE, BIT(cfg->merge_3d - 
> MERGE_3D_0));
> +   if (cfg->merge_3d)
> +   DPU_REG_WRITE(c, CTL_MERGE_3D_ACTIVE,
> + BIT(cfg->merge_3d - MERGE_3D_0));
>  }
>
>  static void dpu_hw_ctl_intf_cfg(struct dpu_hw_ctl *ctx,
> --
> 2.7.4
>


Re: [PATCH] drm/msm/a6xx: fix for kernels without CONFIG_NVMEM

2021-04-01 Thread Rob Clark
On Thu, Apr 1, 2021 at 2:03 PM Dmitry Baryshkov
 wrote:
>
> On Thu, 1 Apr 2021 at 23:09, Rob Clark  wrote:
> >
> > On Mon, Feb 22, 2021 at 8:06 AM Rob Clark  wrote:
> > >
> > > On Mon, Feb 22, 2021 at 7:45 AM Akhil P Oommen  
> > > wrote:
> > > >
> > > > On 2/19/2021 9:30 PM, Rob Clark wrote:
> > > > > On Fri, Feb 19, 2021 at 2:44 AM Akhil P Oommen 
> > > > >  wrote:
> > > > >>
> > > > >> On 2/18/2021 9:41 PM, Rob Clark wrote:
> > > > >>> On Thu, Feb 18, 2021 at 4:28 AM Akhil P Oommen 
> > > > >>>  wrote:
> > > > >>>>
> > > > >>>> On 2/18/2021 2:05 AM, Jonathan Marek wrote:
> > > > >>>>> On 2/17/21 3:18 PM, Rob Clark wrote:
> > > > >>>>>> On Wed, Feb 17, 2021 at 11:08 AM Jordan Crouse
> > > > >>>>>>  wrote:
> > > > >>>>>>>
> > > > >>>>>>> On Wed, Feb 17, 2021 at 07:14:16PM +0530, Akhil P Oommen wrote:
> > > > >>>>>>>> On 2/17/2021 8:36 AM, Rob Clark wrote:
> > > > >>>>>>>>> On Tue, Feb 16, 2021 at 12:10 PM Jonathan Marek 
> > > > >>>>>>>>> 
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Ignore nvmem_cell_get() EOPNOTSUPP error in the same way as a
> > > > >>>>>>>>>> ENOENT error,
> > > > >>>>>>>>>> to fix the case where the kernel was compiled without 
> > > > >>>>>>>>>> CONFIG_NVMEM.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 
> > > > >>>>>>>>>> gpu")
> > > > >>>>>>>>>> Signed-off-by: Jonathan Marek 
> > > > >>>>>>>>>> ---
> > > > >>>>>>>>>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++---
> > > > >>>>>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > >>>>>>>>>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > >>>>>>>>>> index ba8e9d3cf0fe..7fe5d97606aa 100644
> > > > >>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > >>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > > >>>>>>>>>> @@ -1356,10 +1356,10 @@ static int 
> > > > >>>>>>>>>> a6xx_set_supported_hw(struct
> > > > >>>>>>>>>> device *dev, struct a6xx_gpu *a6xx_gpu,
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>cell = nvmem_cell_get(dev, "speed_bin");
> > > > >>>>>>>>>>/*
> > > > >>>>>>>>>> -* -ENOENT means that the platform doesn't support
> > > > >>>>>>>>>> speedbin which is
> > > > >>>>>>>>>> -* fine
> > > > >>>>>>>>>> +* -ENOENT means no speed bin in device tree,
> > > > >>>>>>>>>> +* -EOPNOTSUPP means kernel was built without 
> > > > >>>>>>>>>> CONFIG_NVMEM
> > > > >>>>>>>>>
> > > > >>>>>>>>> very minor nit, it would be nice to at least preserve the 
> > > > >>>>>>>>> gist of the
> > > > >>>>>>>>> "which is fine" (ie. some variation of "this is an optional 
> > > > >>>>>>>>> thing and
> > > > >>>>>>>>> things won't catch fire without it" ;-))
> > > > >>>>>>>>>
> > > > >>>>>>>>> (which is, I beli

Re: [PATCH] drm/msm/a6xx: fix for kernels without CONFIG_NVMEM

2021-04-01 Thread Rob Clark
On Mon, Feb 22, 2021 at 8:06 AM Rob Clark  wrote:
>
> On Mon, Feb 22, 2021 at 7:45 AM Akhil P Oommen  wrote:
> >
> > On 2/19/2021 9:30 PM, Rob Clark wrote:
> > > On Fri, Feb 19, 2021 at 2:44 AM Akhil P Oommen  
> > > wrote:
> > >>
> > >> On 2/18/2021 9:41 PM, Rob Clark wrote:
> > >>> On Thu, Feb 18, 2021 at 4:28 AM Akhil P Oommen  
> > >>> wrote:
> > >>>>
> > >>>> On 2/18/2021 2:05 AM, Jonathan Marek wrote:
> > >>>>> On 2/17/21 3:18 PM, Rob Clark wrote:
> > >>>>>> On Wed, Feb 17, 2021 at 11:08 AM Jordan Crouse
> > >>>>>>  wrote:
> > >>>>>>>
> > >>>>>>> On Wed, Feb 17, 2021 at 07:14:16PM +0530, Akhil P Oommen wrote:
> > >>>>>>>> On 2/17/2021 8:36 AM, Rob Clark wrote:
> > >>>>>>>>> On Tue, Feb 16, 2021 at 12:10 PM Jonathan Marek 
> > >>>>>>>>> 
> > >>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Ignore nvmem_cell_get() EOPNOTSUPP error in the same way as a
> > >>>>>>>>>> ENOENT error,
> > >>>>>>>>>> to fix the case where the kernel was compiled without 
> > >>>>>>>>>> CONFIG_NVMEM.
> > >>>>>>>>>>
> > >>>>>>>>>> Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 
> > >>>>>>>>>> gpu")
> > >>>>>>>>>> Signed-off-by: Jonathan Marek 
> > >>>>>>>>>> ---
> > >>>>>>>>>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++---
> > >>>>>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
> > >>>>>>>>>>
> > >>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > >>>>>>>>>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > >>>>>>>>>> index ba8e9d3cf0fe..7fe5d97606aa 100644
> > >>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > >>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > >>>>>>>>>> @@ -1356,10 +1356,10 @@ static int a6xx_set_supported_hw(struct
> > >>>>>>>>>> device *dev, struct a6xx_gpu *a6xx_gpu,
> > >>>>>>>>>>
> > >>>>>>>>>>cell = nvmem_cell_get(dev, "speed_bin");
> > >>>>>>>>>>/*
> > >>>>>>>>>> -* -ENOENT means that the platform doesn't support
> > >>>>>>>>>> speedbin which is
> > >>>>>>>>>> -* fine
> > >>>>>>>>>> +* -ENOENT means no speed bin in device tree,
> > >>>>>>>>>> +* -EOPNOTSUPP means kernel was built without 
> > >>>>>>>>>> CONFIG_NVMEM
> > >>>>>>>>>
> > >>>>>>>>> very minor nit, it would be nice to at least preserve the gist of 
> > >>>>>>>>> the
> > >>>>>>>>> "which is fine" (ie. some variation of "this is an optional thing 
> > >>>>>>>>> and
> > >>>>>>>>> things won't catch fire without it" ;-))
> > >>>>>>>>>
> > >>>>>>>>> (which is, I believe, is true, hopefully Akhil could confirm.. if 
> > >>>>>>>>> not
> > >>>>>>>>> we should have a harder dependency on CONFIG_NVMEM..)
> > >>>>>>>> IIRC, if the gpu opp table in the DT uses the 'opp-supported-hw'
> > >>>>>>>> property,
> > >>>>>>>> we will see some error during boot up if we don't call
> > >>>>>>>> dev_pm_opp_set_supported_hw(). So calling "nvmem_cell_get(dev,
> > >>>>>>>> "speed_bin")"
> > >>>>>>>> is a way to test this.
> > >>>>>>>>
> > >>>>>
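For reference, the error handling being discussed boils down to treating two error codes as "no speed bin available, which is fine". A minimal self-contained sketch (the helper name is hypothetical, not the actual driver code):

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper sketching the nvmem_cell_get() error handling
 * discussed above: -ENOENT (no speed-bin cell in the device tree) and
 * -EOPNOTSUPP (kernel built without CONFIG_NVMEM) both mean speed
 * binning is simply unavailable, which is fine -- the GPU just runs
 * with the default OPP table.  Any other error should still be fatal.
 */
static int speed_bin_err_is_fine(int err)
{
	return err == -ENOENT || err == -EOPNOTSUPP;
}
```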

[PATCH] drm/msm: Drop mm_lock in scan loop

2021-04-01 Thread Rob Clark
From: Rob Clark 

lock_stat + mmm_donut[1] say that this reduces contention on mm_lock
significantly (~350x lower waittime-max, and ~100x lower waittime-avg)

[1] 
https://chromium.googlesource.com/chromiumos/platform/microbenchmarks/+/refs/heads/main/mmm_donut.py

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c  |  2 +-
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 48 ++
 2 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 2ecf7f1cef25..75cea5b801da 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -719,7 +719,7 @@ void msm_gem_purge(struct drm_gem_object *obj)
put_iova_vmas(obj);
 
msm_obj->madv = __MSM_MADV_PURGED;
-   mark_unpurgable(msm_obj);
+   update_inactive(msm_obj);
 
drm_vma_node_unmap(&obj->vma_node, dev->anon_inode->i_mapping);
drm_gem_free_mmap_offset(obj);
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index f3e948af01c5..6bbb15d64861 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -22,26 +22,62 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 {
struct msm_drm_private *priv =
container_of(shrinker, struct msm_drm_private, shrinker);
-   struct msm_gem_object *msm_obj;
+   struct list_head still_in_list;
unsigned long freed = 0;
 
+   INIT_LIST_HEAD(&still_in_list);
+
mutex_lock(&priv->mm_lock);
 
-   list_for_each_entry(msm_obj, &priv->inactive_dontneed, mm_list) {
-   if (freed >= sc->nr_to_scan)
+   while (freed < sc->nr_to_scan) {
+   struct msm_gem_object *msm_obj = list_first_entry_or_null(
+   &priv->inactive_dontneed, typeof(*msm_obj), mm_list);
+
+   if (!msm_obj)
break;
-   /* Use trylock, because we cannot block on a obj that
-* might be trying to acquire mm_lock
+
+   /*
+* If it is in the process of being freed, msm_gem_free_object
+* can be blocked on mm_lock waiting to remove it.  So just
+* skip it.
 */
-   if (!msm_gem_trylock(&msm_obj->base))
+   if (!kref_get_unless_zero(&msm_obj->base.refcount))
continue;
+
+   /*
+* Now that we own a reference, we can move it to our own
+* private temporary list and drop mm_lock for the rest of
+* the loop body, to reduce contention with the retire_submit
+* path (which could make more objects available to purge)
+*/
+   list_move_tail(&msm_obj->mm_list, &still_in_list);
+
+   mutex_unlock(&priv->mm_lock);
+
+   /*
+* Note that this still needs to be trylock, since we can
+* hit shrinker in response to trying to get backing pages
+* for this obj (ie. while it's lock is already held)
+*/
+   if (!msm_gem_trylock(&msm_obj->base))
+   goto tail;
+
if (is_purgeable(msm_obj)) {
+   /*
+* This will move the obj out of still_in_list to
+* the purged list
+*/
msm_gem_purge(&msm_obj->base);
freed += msm_obj->base.size >> PAGE_SHIFT;
}
msm_gem_unlock(&msm_obj->base);
+
+tail:
+   drm_gem_object_put(&msm_obj->base);
+   mutex_lock(&priv->mm_lock);
}
 
+   list_splice_tail(&still_in_list, &priv->inactive_dontneed);
mutex_unlock(&priv->mm_lock);
 
if (freed > 0) {
-- 
2.30.2



Re: [PATCH v2 2/4] drm/msm: Avoid mutex in shrinker_count()

2021-04-01 Thread Rob Clark
On Thu, Apr 1, 2021 at 8:34 AM Doug Anderson  wrote:
>
> Hi,
>
> On Wed, Mar 31, 2021 at 6:24 PM Rob Clark  wrote:
> >
> > @@ -45,6 +30,9 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
> > list_for_each_entry(msm_obj, &priv->inactive_dontneed, mm_list) {
> > if (freed >= sc->nr_to_scan)
> > break;
> > +   /* Use trylock, because we cannot block on a obj that
> > +* might be trying to acquire mm_lock
> > +*/
>
> nit: I thought the above multi-line commenting style was only for
> "net" subsystem?

we do use the "net" style a fair bit already.. (OTOH I tend to not
really care what checkpatch says)

> > if (!msm_gem_trylock(&msm_obj->base))
> > continue;
> > if (is_purgeable(msm_obj)) {
> > @@ -56,8 +44,11 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
> >
> > mutex_unlock(&priv->mm_lock);
> >
> > -   if (freed > 0)
> > +   if (freed > 0) {
> > trace_msm_gem_purge(freed << PAGE_SHIFT);
> > +   } else {
> > +   return SHRINK_STOP;
> > +   }
>
> It probably doesn't matter, but I wonder if we should still be
> returning SHRINK_STOP if we got any trylock failures. It could
> possibly be worth returning 0 in that case?

On the surface, you'd think that, but there be mm dragons.. we can hit
shrinker from the submit path when the obj is locked already and we
are trying to allocate backing pages.  We don't want to tell vmscan to
keep trying, because we'll keep failing to grab that object's lock

>
> > @@ -75,6 +66,9 @@ vmap_shrink(struct list_head *mm_list)
> > unsigned unmapped = 0;
> >
> > list_for_each_entry(msm_obj, mm_list, mm_list) {
> > +   /* Use trylock, because we cannot block on a obj that
> > +* might be trying to acquire mm_lock
> > +*/
>
> If you end up changing the commenting style above, should also be here.
>
> At this point this seems fine to land to me. Though I'm not an expert
> on every interaction in this code, I've spent enough time starting at
> it that I'm comfortable with:
>
> Reviewed-by: Douglas Anderson 

thanks

BR,
-R
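The count()/scan() return-value convention discussed in this thread can be sketched in isolation (SHRINK_STOP defined locally with the same value the kernel uses; this is an illustration, not the driver code):

```c
#include <assert.h>

#define SHRINK_STOP (~0UL)	/* same sentinel value the kernel's shrinker API uses */

/*
 * Illustration of the return convention discussed above: if scan()
 * freed pages, report how many; if it freed nothing (e.g. every
 * object failed trylock because the shrinker was entered from the
 * submit path with the object lock already held), return SHRINK_STOP
 * so vmscan stops retrying instead of spinning on the same objects.
 */
static unsigned long scan_result(unsigned long freed)
{
	return freed > 0 ? freed : SHRINK_STOP;
}
```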


[PATCH v2 4/4] drm/msm: Improved debugfs gem stats

2021-03-31 Thread Rob Clark
From: Rob Clark 

The last patch lost the breakdown of active vs inactive GEM objects in
$debugfs/gem.  But we can add some better stats to summarize not just
active vs inactive, but also purgable/purged to make up for that.

Signed-off-by: Rob Clark 
Tested-by: Douglas Anderson 
Reviewed-by: Douglas Anderson 
---
 drivers/gpu/drm/msm/msm_fb.c  |  3 ++-
 drivers/gpu/drm/msm/msm_gem.c | 31 ---
 drivers/gpu/drm/msm/msm_gem.h | 11 ++-
 3 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
index d42f0665359a..91c0e493aed5 100644
--- a/drivers/gpu/drm/msm/msm_fb.c
+++ b/drivers/gpu/drm/msm/msm_fb.c
@@ -33,6 +33,7 @@ static const struct drm_framebuffer_funcs msm_framebuffer_funcs = {
 #ifdef CONFIG_DEBUG_FS
 void msm_framebuffer_describe(struct drm_framebuffer *fb, struct seq_file *m)
 {
+   struct msm_gem_stats stats = {};
int i, n = fb->format->num_planes;
 
seq_printf(m, "fb: %dx%d@%4.4s (%2d, ID:%d)\n",
@@ -42,7 +43,7 @@ void msm_framebuffer_describe(struct drm_framebuffer *fb, struct seq_file *m)
for (i = 0; i < n; i++) {
seq_printf(m, "   %d: offset=%d pitch=%d, obj: ",
i, fb->offsets[i], fb->pitches[i]);
-   msm_gem_describe(fb->obj[i], m);
+   msm_gem_describe(fb->obj[i], m, &stats);
}
 }
 #endif
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 7ca30e36..2ecf7f1cef25 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -873,7 +873,8 @@ static void describe_fence(struct dma_fence *fence, const char *type,
fence->seqno);
 }
 
-void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
+void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m,
+   struct msm_gem_stats *stats)
 {
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct dma_resv *robj = obj->resv;
@@ -885,11 +886,23 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 
msm_gem_lock(obj);
 
+   stats->all.count++;
+   stats->all.size += obj->size;
+
+   if (is_active(msm_obj)) {
+   stats->active.count++;
+   stats->active.size += obj->size;
+   }
+
switch (msm_obj->madv) {
case __MSM_MADV_PURGED:
+   stats->purged.count++;
+   stats->purged.size += obj->size;
madv = " purged";
break;
case MSM_MADV_DONTNEED:
+   stats->purgable.count++;
+   stats->purgable.size += obj->size;
madv = " purgeable";
break;
case MSM_MADV_WILLNEED:
@@ -956,20 +969,24 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 
 void msm_gem_describe_objects(struct list_head *list, struct seq_file *m)
 {
+   struct msm_gem_stats stats = {};
struct msm_gem_object *msm_obj;
-   int count = 0;
-   size_t size = 0;
 
seq_puts(m, "   flags   id ref  offset   kaddrsize madv  name\n");
list_for_each_entry(msm_obj, list, node) {
struct drm_gem_object *obj = &msm_obj->base;
seq_puts(m, "   ");
-   msm_gem_describe(obj, m);
-   count++;
-   size += obj->size;
+   msm_gem_describe(obj, m, &stats);
}
 
-   seq_printf(m, "Total %d objects, %zu bytes\n", count, size);
+   seq_printf(m, "Total:%4d objects, %9zu bytes\n",
+   stats.all.count, stats.all.size);
+   seq_printf(m, "Active:   %4d objects, %9zu bytes\n",
+   stats.active.count, stats.active.size);
+   seq_printf(m, "Purgable: %4d objects, %9zu bytes\n",
+   stats.purgable.count, stats.purgable.size);
+   seq_printf(m, "Purged:   %4d objects, %9zu bytes\n",
+   stats.purged.count, stats.purged.size);
 }
 #endif
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index e6b28edb1db9..7c7d54bad189 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -158,7 +158,16 @@ struct drm_gem_object *msm_gem_import(struct drm_device *dev,
 __printf(2, 3)
 void msm_gem_object_set_name(struct drm_gem_object *bo, const char *fmt, ...);
 #ifdef CONFIG_DEBUG_FS
-void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m);
+
+struct msm_gem_stats {
+   struct {
+   unsigned count;
+   size_t size;
+   } all, active, purgable, purged;
+};
+
+void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m,
+   struct msm_gem_stats *stats);
 void msm_gem_describe_objects(struct list_head *list, struct seq_file *m);
 #endif
 
-- 
2.30.2



[PATCH v2 2/4] drm/msm: Avoid mutex in shrinker_count()

2021-03-31 Thread Rob Clark
From: Rob Clark 

When the system is under heavy memory pressure, we can end up with lots
of concurrent calls into the shrinker.  Keeping a running tab on what we
can shrink avoids grabbing a lock in shrinker->count(), and avoids
shrinker->scan() getting called when not profitable.

Also, we can keep purged objects in their own list to avoid re-traversing
them to help cut down time in the critical section further.

Signed-off-by: Rob Clark 
Tested-by: Douglas Anderson 
---
 drivers/gpu/drm/msm/msm_drv.c  |  1 +
 drivers/gpu/drm/msm/msm_drv.h  |  6 ++-
 drivers/gpu/drm/msm/msm_gem.c  | 20 --
 drivers/gpu/drm/msm/msm_gem.h  | 53 --
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 28 ++
 5 files changed, 81 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 4f9fa0189a07..3462b0ea14c6 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -476,6 +476,7 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv)
 
INIT_LIST_HEAD(&priv->inactive_willneed);
INIT_LIST_HEAD(&priv->inactive_dontneed);
+   INIT_LIST_HEAD(&priv->inactive_purged);
mutex_init(&priv->mm_lock);
 
/* Teach lockdep about lock ordering wrt. shrinker: */
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index a1264cfcac5e..503168817e24 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -179,8 +179,8 @@ struct msm_drm_private {
 * inactive lists (depending on whether or not it is shrinkable) or
 * gpu->active_list (for the gpu it is active on[1])
 *
-* These lists are protected by mm_lock.  If struct_mutex is involved, it
-* should be aquired prior to mm_lock.  One should *not* hold mm_lock in
+* These lists are protected by mm_lock (which should be acquired
+* before per GEM object lock).  One should *not* hold mm_lock in
 * get_pages()/vmap()/etc paths, as they can trigger the shrinker.
 *
 * [1] if someone ever added support for the old 2d cores, there could be
@@ -188,6 +188,8 @@ struct msm_drm_private {
 */
struct list_head inactive_willneed;  /* inactive + !shrinkable */
struct list_head inactive_dontneed;  /* inactive +  shrinkable */
+   struct list_head inactive_purged;/* inactive +  purged */
+   long shrinkable_count;   /* write access under mm_lock */
struct mutex mm_lock;
 
struct workqueue_struct *wq;
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 9d10739c4eb2..bec01bb48fce 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -719,6 +719,7 @@ void msm_gem_purge(struct drm_gem_object *obj)
put_iova_vmas(obj);
 
msm_obj->madv = __MSM_MADV_PURGED;
+   mark_unpurgable(msm_obj);
 
drm_vma_node_unmap(&obj->vma_node, dev->anon_inode->i_mapping);
drm_gem_free_mmap_offset(obj);
@@ -790,10 +791,11 @@ void msm_gem_active_get(struct drm_gem_object *obj, struct msm_gpu *gpu)
might_sleep();
WARN_ON(!msm_gem_is_locked(obj));
WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED);
+   WARN_ON(msm_obj->dontneed);
 
if (msm_obj->active_count++ == 0) {
mutex_lock(&priv->mm_lock);
-   list_del_init(&msm_obj->mm_list);
+   list_del(&msm_obj->mm_list);
list_add_tail(&msm_obj->mm_list, &gpu->active_list);
mutex_unlock(&priv->mm_lock);
}
@@ -818,11 +820,19 @@ static void update_inactive(struct msm_gem_object *msm_obj)
mutex_lock(&priv->mm_lock);
WARN_ON(msm_obj->active_count != 0);
 
-   list_del_init(&msm_obj->mm_list);
-   if (msm_obj->madv == MSM_MADV_WILLNEED)
+   if (msm_obj->dontneed)
+   mark_unpurgable(msm_obj);
+
+   list_del(&msm_obj->mm_list);
+   if (msm_obj->madv == MSM_MADV_WILLNEED) {
list_add_tail(&msm_obj->mm_list, &priv->inactive_willneed);
-   else
+   } else if (msm_obj->madv == MSM_MADV_DONTNEED) {
list_add_tail(&msm_obj->mm_list, &priv->inactive_dontneed);
+   mark_purgable(msm_obj);
+   } else {
+   WARN_ON(msm_obj->madv != __MSM_MADV_PURGED);
+   list_add_tail(&msm_obj->mm_list, &priv->inactive_purged);
+   }
 
mutex_unlock(&priv->mm_lock);
 }
@@ -971,6 +981,8 @@ void msm_gem_free_object(struct drm_gem_object *obj)
struct msm_drm_private *priv = dev->dev_private;
 
mutex_lock(&priv->mm_lock);
+   if (msm_obj->dontneed)
+   mark_unpurgable(msm_obj);
list_del(&msm_obj->mm_list);
mutex_unlock(&priv->mm_lock);
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 7a9107cf1818..13aabfe92d

[PATCH v2 1/4] drm/msm: Remove unused freed llist node

2021-03-31 Thread Rob Clark
From: Rob Clark 

Unused since commit c951a9b284b9 ("drm/msm: Remove msm_gem_free_work")

Signed-off-by: Rob Clark 
Tested-by: Douglas Anderson 
---
 drivers/gpu/drm/msm/msm_gem.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index b3a0a880cbab..7a9107cf1818 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -78,8 +78,6 @@ struct msm_gem_object {
 
struct list_head vmas;/* list of msm_gem_vma */
 
-   struct llist_node freed;
-
/* For physically contiguous buffers.  Used when we don't have
 * an IOMMU.  Also used for stolen/splashscreen buffer.
 */
-- 
2.30.2



[PATCH v2 3/4] drm/msm: Fix debugfs deadlock

2021-03-31 Thread Rob Clark
From: Rob Clark 

In normal cases the gem obj lock is acquired first before mm_lock.  The
exception is iterating the various object lists.  In the shrinker path,
deadlock is avoided by using msm_gem_trylock() and skipping over objects
that cannot be locked.  But for debugfs the straightforward thing is to
split things out into a separate list of all objects protected by its
own lock.

Fixes: d984457b31c4 ("drm/msm: Add priv->mm_lock to protect active/inactive 
lists")
Signed-off-by: Rob Clark 
Tested-by: Douglas Anderson 
---
 drivers/gpu/drm/msm/msm_debugfs.c | 14 +++---
 drivers/gpu/drm/msm/msm_drv.c |  3 +++
 drivers/gpu/drm/msm/msm_drv.h |  9 -
 drivers/gpu/drm/msm/msm_gem.c | 14 +-
 drivers/gpu/drm/msm/msm_gem.h | 10 --
 5 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_debugfs.c b/drivers/gpu/drm/msm/msm_debugfs.c
index 85ad0babc326..d611cc8e54a4 100644
--- a/drivers/gpu/drm/msm/msm_debugfs.c
+++ b/drivers/gpu/drm/msm/msm_debugfs.c
@@ -111,23 +111,15 @@ static const struct file_operations msm_gpu_fops = {
 static int msm_gem_show(struct drm_device *dev, struct seq_file *m)
 {
struct msm_drm_private *priv = dev->dev_private;
-   struct msm_gpu *gpu = priv->gpu;
int ret;
 
-   ret = mutex_lock_interruptible(&priv->mm_lock);
+   ret = mutex_lock_interruptible(&priv->obj_lock);
if (ret)
return ret;
 
-   if (gpu) {
-   seq_printf(m, "Active Objects (%s):\n", gpu->name);
-   msm_gem_describe_objects(&gpu->active_list, m);
-   }
-
-   seq_printf(m, "Inactive Objects:\n");
-   msm_gem_describe_objects(&priv->inactive_dontneed, m);
-   msm_gem_describe_objects(&priv->inactive_willneed, m);
+   msm_gem_describe_objects(&priv->objects, m);
 
-   mutex_unlock(&priv->mm_lock);
+   mutex_unlock(&priv->obj_lock);
 
return 0;
 }
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 3462b0ea14c6..1ef1cd0cc714 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -474,6 +474,9 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv)
 
priv->wq = alloc_ordered_workqueue("msm", 0);
 
+   INIT_LIST_HEAD(&priv->objects);
+   mutex_init(&priv->obj_lock);
+
INIT_LIST_HEAD(&priv->inactive_willneed);
INIT_LIST_HEAD(&priv->inactive_dontneed);
INIT_LIST_HEAD(&priv->inactive_purged);
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 503168817e24..c84e6f84cb6d 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -174,7 +174,14 @@ struct msm_drm_private {
struct msm_rd_state *hangrd;   /* debugfs to dump hanging submits */
struct msm_perf_state *perf;
 
-   /*
+   /**
+* List of all GEM objects (mainly for debugfs, protected by obj_lock
+* (acquire before per GEM object lock)
+*/
+   struct list_head objects;
+   struct mutex obj_lock;
+
+   /**
 * Lists of inactive GEM objects.  Every bo is either in one of the
 * inactive lists (depending on whether or not it is shrinkable) or
 * gpu->active_list (for the gpu it is active on[1])
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index bec01bb48fce..7ca30e36 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -961,7 +961,7 @@ void msm_gem_describe_objects(struct list_head *list, struct seq_file *m)
size_t size = 0;
 
seq_puts(m, "   flags   id ref  offset   kaddrsize madv  name\n");
-   list_for_each_entry(msm_obj, list, mm_list) {
+   list_for_each_entry(msm_obj, list, node) {
struct drm_gem_object *obj = &msm_obj->base;
seq_puts(m, "   ");
msm_gem_describe(obj, m);
@@ -980,6 +980,10 @@ void msm_gem_free_object(struct drm_gem_object *obj)
struct drm_device *dev = obj->dev;
struct msm_drm_private *priv = dev->dev_private;
 
+   mutex_lock(&priv->obj_lock);
+   list_del(&msm_obj->node);
+   mutex_unlock(&priv->obj_lock);
+
mutex_lock(&priv->mm_lock);
if (msm_obj->dontneed)
mark_unpurgable(msm_obj);
@@ -1170,6 +1174,10 @@ static struct drm_gem_object *_msm_gem_new(struct drm_device *dev,
list_add_tail(&msm_obj->mm_list, &priv->inactive_willneed);
mutex_unlock(&priv->mm_lock);
 
+   mutex_lock(&priv->obj_lock);
+   list_add_tail(&msm_obj->node, &priv->objects);
+   mutex_unlock(&priv->obj_lock);
+
return obj;
 
 fail:
@@ -1240,6 +1248,10 @@ struct drm_gem_object *msm_gem_import(struct drm_device *dev,
list_add_tail(&msm_obj->mm_list, &priv->inactive_willneed);
mutex_unlock(&priv->mm_lock);
 
+   mutex_lock(&priv->obj_lock);
+   list_ad

[PATCH v2 0/4] drm/msm: Shrinker (and related) fixes

2021-03-31 Thread Rob Clark
From: Rob Clark 

I've been spending some time looking into how things behave under high
memory pressure.  The first patch is a random cleanup I noticed along
the way.  The second improves the situation significantly when we are
getting shrinker called from many threads in parallel.  And the last
two are $debugfs/gem fixes I needed so I could monitor the state of GEM
objects (ie. how many are active/purgable/purged) while triggering high
memory pressure.

We could probably go a bit further with dropping the mm_lock in the
shrinker->scan() loop, but this is already a pretty big improvement.
The next step is probably actually to add support to unpin/evict
inactive objects.  (We are part way there since we have already
decoupled the iova lifetime from the pages lifetime, but there are a
few sharp corners to work through.)

Rob Clark (4):
  drm/msm: Remove unused freed llist node
  drm/msm: Avoid mutex in shrinker_count()
  drm/msm: Fix debugfs deadlock
  drm/msm: Improved debugfs gem stats

 drivers/gpu/drm/msm/msm_debugfs.c  | 14 ++---
 drivers/gpu/drm/msm/msm_drv.c  |  4 ++
 drivers/gpu/drm/msm/msm_drv.h  | 15 --
 drivers/gpu/drm/msm/msm_fb.c   |  3 +-
 drivers/gpu/drm/msm/msm_gem.c  | 65 ++-
 drivers/gpu/drm/msm/msm_gem.h  | 72 +++---
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 28 --
 7 files changed, 150 insertions(+), 51 deletions(-)

-- 
2.30.2
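As a rough model of the "running tab" in patch 2 (a simplified sketch, not the kernel code): the counter is adjusted at the points where an object changes state, so shrinker->count() becomes a plain read instead of a locked list walk:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the running-tab scheme: mark_purgable() /
 * mark_unpurgable() are called (under mm_lock in the real driver)
 * whenever an object becomes or stops being purgeable, so count()
 * never needs to take a lock or traverse a list.
 */
static long shrinkable_count;

static void mark_purgable(size_t pages)   { shrinkable_count += pages; }
static void mark_unpurgable(size_t pages) { shrinkable_count -= pages; }

static unsigned long shrinker_count(void)
{
	/* returning 0 tells vmscan not to bother calling scan() */
	return shrinkable_count > 0 ? (unsigned long)shrinkable_count : 0;
}
```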



Re: [PATCH 2/4] drm/msm: Avoid mutex in shrinker_count()

2021-03-31 Thread Rob Clark
On Wed, Mar 31, 2021 at 4:39 PM Doug Anderson  wrote:
>
> Hi,
>
> On Wed, Mar 31, 2021 at 4:23 PM Rob Clark  wrote:
> >
> > On Wed, Mar 31, 2021 at 3:44 PM Doug Anderson  wrote:
> > >
> > > Hi,
> > >
> > > On Wed, Mar 31, 2021 at 3:14 PM Rob Clark  wrote:
> > > >
> > > > @@ -818,11 +820,19 @@ static void update_inactive(struct msm_gem_object *msm_obj)
> > > > mutex_lock(&priv->mm_lock);
> > > > WARN_ON(msm_obj->active_count != 0);
> > > >
> > > > +   if (msm_obj->dontneed)
> > > > +   mark_unpurgable(msm_obj);
> > > > +
> > > > list_del_init(&msm_obj->mm_list);
> > > > -   if (msm_obj->madv == MSM_MADV_WILLNEED)
> > > > +   if (msm_obj->madv == MSM_MADV_WILLNEED) {
> > > > list_add_tail(&msm_obj->mm_list, &priv->inactive_willneed);
> > > > -   else
> > > > +   } else if (msm_obj->madv == MSM_MADV_DONTNEED) {
> > > > list_add_tail(&msm_obj->mm_list, &priv->inactive_dontneed);
> > > > +   mark_purgable(msm_obj);
> > > > +   } else {
> > > > +   WARN_ON(msm_obj->madv != __MSM_MADV_PURGED);
> > > > +   list_add_tail(&msm_obj->mm_list, &priv->inactive_purged);
> > >
> > > I'm probably being dense, but what's the point of adding it to the
> > > "inactive_purged" list here? You never look at that list, right? You
> > > already did a list_del_init() on this object's list pointer
> > > ("mm_list"). I don't see how adding it to a bogus list helps with
> > > anything.
> >
> > It preserves the "every bo is in one of these lists" statement, but
> > other than that you are right we aren't otherwise doing anything with
> > that list.  (Or we could replace the list_del_init() with list_del()..
> > I tend to instinctively go for list_del_init())
>
> If you really want this list, it wouldn't hurt to at least have a
> comment saying that it's not used for anything so people like me doing
> go trying to figure out what it's used for. ;-)
>
>
> > > > @@ -198,6 +203,33 @@ static inline bool is_vunmapable(struct msm_gem_object *msm_obj)
> > > > return (msm_obj->vmap_count == 0) && msm_obj->vaddr;
> > > >  }
> > > >
> > > > +static inline void mark_purgable(struct msm_gem_object *msm_obj)
> > > > +{
> > > > +   struct msm_drm_private *priv = msm_obj->base.dev->dev_private;
> > > > +
> > > > +   WARN_ON(!mutex_is_locked(&priv->mm_lock));
> > > > +
> > > > +   if (WARN_ON(msm_obj->dontneed))
> > > > +   return;
> > >
> > > The is_purgeable() function also checks other things besides just
> > > "MSM_MADV_DONTNEED". Do we need to check those too? Specifically:
> > >
> > >  msm_obj->sgt && !msm_obj->base.dma_buf && !msm_obj->base.import_attach
> > >
> > > ...or is it just being paranoid?
> > >
> > > I guess I'm just worried that if any of those might be important then
> > > we'll consistently report back that we have a count of things that can
> > > be purged but then scan() won't find anything to do. That wouldn't be
> > > great.
> >
> > Hmm, I thought msm_gem_madvise() returned an error instead of allowing
> > MSM_MADV_DONTNEED to be set on imported/exported dma-bufs.. it
> > probably should to be complete (but userspace already knows not to
> > madvise an imported/exported buffer for other reasons.. ie. we can't
> > let a shared buffer end up in the bo cache).  I'll re-work that a bit.
> >
> > The msm_obj->sgt case is a bit more tricky.. that will be the case of
> > a freshly allocated obj that does not have backing pages yet.  But
> > it seems like enough of a corner case, that I'm happy to live with
> > it.. ie. the tricky thing is not leaking decrements of
> > priv->shrinkable_count or underflowing priv->shrinkable_count, and
> > caring about the !msm_obj->sgt case doubles the number of states an
> > object can be in, and the shrinker->count() return value is just an
> > estimate.
>
> I think it's equally important to make sure that we don't constantly
> have a non-zero count and then have scan() do nothing.  If there's a
> transit

Re: [PATCH 3/4] drm/msm: Fix debugfs deadlock

2021-03-31 Thread Rob Clark
On Wed, Mar 31, 2021 at 4:13 PM Doug Anderson  wrote:
>
> Hi,
>
> On Wed, Mar 31, 2021 at 3:14 PM Rob Clark  wrote:
> >
> > @@ -111,23 +111,15 @@ static const struct file_operations msm_gpu_fops = {
> >  static int msm_gem_show(struct drm_device *dev, struct seq_file *m)
> >  {
> > struct msm_drm_private *priv = dev->dev_private;
> > -   struct msm_gpu *gpu = priv->gpu;
> > int ret;
> >
> > -   ret = mutex_lock_interruptible(&priv->mm_lock);
> > +   ret = mutex_lock_interruptible(&priv->obj_lock);
> > if (ret)
> > return ret;
> >
> > -   if (gpu) {
> > -   seq_printf(m, "Active Objects (%s):\n", gpu->name);
> > -   msm_gem_describe_objects(&gpu->active_list, m);
> > -   }
> > -
> > -   seq_printf(m, "Inactive Objects:\n");
> > -   msm_gem_describe_objects(&priv->inactive_dontneed, m);
> > -   msm_gem_describe_objects(&priv->inactive_willneed, m);
> > +   msm_gem_describe_objects(&priv->objects, m);
>
> I guess we no longer sort the by Active and Inactive but that doesn't
> really matter?

It turned out to be less useful to sort by active/inactive, as much as
just having the summary at the bottom that the last patch adds.  We
can already tell from the per-object entries whether it is
active/purgable/purged.

I did initially try to come up with an approach that let me keep this,
but it would basically amount to re-writing the gem_submit path
(because you cannot do any memory allocation under mm_lock)

>
> > @@ -174,7 +174,13 @@ struct msm_drm_private {
> > struct msm_rd_state *hangrd;   /* debugfs to dump hanging submits */
> > struct msm_perf_state *perf;
> >
> > -   /*
> > +   /**
> > > +* List of all GEM objects (mainly for debugfs, protected by obj_lock
>
> It wouldn't hurt to talk about lock ordering here? Like: "If we need
> the "obj_lock" and a "gem_lock" at the same time we always grab the
> "obj_lock" first.

good point

>
> > @@ -60,13 +60,20 @@ struct msm_gem_object {
> >  */
> > uint8_t vmap_count;
> >
> > -   /* And object is either:
> > -*  inactive - on priv->inactive_list
> > +   /**
> > +* Node in list of all objects (mainly for debugfs, protected by
> > +* struct_mutex
>
> Not "struct_mutex" in comment, right? Maybe "obj_lock" I think?

oh, right, forgot to fix that from an earlier iteration

BR,
-R


Re: [PATCH 2/4] drm/msm: Avoid mutex in shrinker_count()

2021-03-31 Thread Rob Clark
On Wed, Mar 31, 2021 at 3:44 PM Doug Anderson  wrote:
>
> Hi,
>
> On Wed, Mar 31, 2021 at 3:14 PM Rob Clark  wrote:
> >
> > @@ -818,11 +820,19 @@ static void update_inactive(struct msm_gem_object *msm_obj)
> > mutex_lock(&priv->mm_lock);
> > WARN_ON(msm_obj->active_count != 0);
> >
> > +   if (msm_obj->dontneed)
> > +   mark_unpurgable(msm_obj);
> > +
> > list_del_init(&msm_obj->mm_list);
> > -   if (msm_obj->madv == MSM_MADV_WILLNEED)
> > +   if (msm_obj->madv == MSM_MADV_WILLNEED) {
> > list_add_tail(&msm_obj->mm_list, &priv->inactive_willneed);
> > -   else
> > +   } else if (msm_obj->madv == MSM_MADV_DONTNEED) {
> > list_add_tail(&msm_obj->mm_list, &priv->inactive_dontneed);
> > +   mark_purgable(msm_obj);
> > +   } else {
> > +   WARN_ON(msm_obj->madv != __MSM_MADV_PURGED);
> > +   list_add_tail(&msm_obj->mm_list, &priv->inactive_purged);
>
> I'm probably being dense, but what's the point of adding it to the
> "inactive_purged" list here? You never look at that list, right? You
> already did a list_del_init() on this object's list pointer
> ("mm_list"). I don't see how adding it to a bogus list helps with
> anything.

It preserves the "every bo is in one of these lists" statement, but
other than that you are right we aren't otherwise doing anything with
that list.  (Or we could replace the list_del_init() with list_del()..
I tend to instinctively go for list_del_init())

>
> > @@ -198,6 +203,33 @@ static inline bool is_vunmapable(struct msm_gem_object 
> > *msm_obj)
> > return (msm_obj->vmap_count == 0) && msm_obj->vaddr;
> >  }
> >
> > +static inline void mark_purgable(struct msm_gem_object *msm_obj)
> > +{
> > +   struct msm_drm_private *priv = msm_obj->base.dev->dev_private;
> > +
> > > +   WARN_ON(!mutex_is_locked(&priv->mm_lock));
> > +
> > +   if (WARN_ON(msm_obj->dontneed))
> > +   return;
>
> The is_purgeable() function also checks other things besides just
> "MSM_MADV_DONTNEED". Do we need to check those too? Specifically:
>
>  msm_obj->sgt && !msm_obj->base.dma_buf && !msm_obj->base.import_attach
>
> ...or is it just being paranoid?
>
> I guess I'm just worried that if any of those might be important then
> we'll consistently report back that we have a count of things that can
> be purged but then scan() won't find anything to do. That wouldn't be
> great.

Hmm, I thought msm_gem_madvise() returned an error instead of allowing
MSM_MADV_DONTNEED to be set on imported/exported dma-bufs.. it
probably should to be complete (but userspace already knows not to
madvise an imported/exported buffer for other reasons.. ie. we can't
let a shared buffer end up in the bo cache).  I'll re-work that a bit.

The msm_obj->sgt case is a bit more tricky.. that will be the case of
a freshly allocated obj that does not have backing pages yet.  But
it seems like enough of a corner case, that I'm happy to live with
it.. ie. the tricky thing is not leaking decrements of
priv->shrinkable_count or underflowing priv->shrinkable_count, and
caring about the !msm_obj->sgt case doubles the number of states an
object can be in, and the shrinker->count() return value is just an
estimate.

>
> > +   priv->shrinkable_count += msm_obj->base.size >> PAGE_SHIFT;
> > +   msm_obj->dontneed = true;
> > +}
> > +
> > +static inline void mark_unpurgable(struct msm_gem_object *msm_obj)
> > +{
> > +   struct msm_drm_private *priv = msm_obj->base.dev->dev_private;
> > +
> > > +   WARN_ON(!mutex_is_locked(&priv->mm_lock));
> > +
> > +   if (WARN_ON(!msm_obj->dontneed))
> > +   return;
> > +
> > +   priv->shrinkable_count -= msm_obj->base.size >> PAGE_SHIFT;
> > +   WARN_ON(priv->shrinkable_count < 0);
>
> If you changed the order maybe you could make shrinkable_count
> "unsigned long" to match the shrinker API?
>
>  new_shrinkable = msm_obj->base.size >> PAGE_SHIFT;
>  WARN_ON(new_shrinkable > priv->shrinkable_count);
>  priv->shrinkable_count -= new_shrinkable
>

True, although I've developed a preference for signed integers in
cases where it can underflow if you mess up

BR,
-R


Re: [v1] drm/msm/disp/dpu1: fix warn stack reported during dpu resume

2021-03-31 Thread Rob Clark
On Wed, Mar 31, 2021 at 9:03 AM Dmitry Baryshkov
 wrote:
>
> On 31/03/2021 14:27, Kalyan Thota wrote:
> > WARN_ON was introduced by the below commit to catch runtime resumes
> > that are getting triggered before icc path was set.
> >
> > "drm/msm/disp/dpu1: icc path needs to be set before dpu runtime resume"
> >
> > For the targets where the bw scaling is not enabled, this WARN_ON is
> > a false alarm. Fix the WARN condition appropriately.
>
> Should we change all DPU targets to use bw scaling to the mdp from the
> mdss nodes? The limitation to sc7180 looks artificial.

yes, we should, this keeps biting us on 845

> >
> > Reported-by: Steev Klimaszewski 

Please add Fixes: tag as well

> > Signed-off-by: Kalyan Thota 
> > ---
> >   drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  |  8 +---
> >   drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h  |  9 +
> >   drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c | 11 ++-
> >   3 files changed, 20 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c 
> > b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> > index cab387f..0071a4d 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> > @@ -294,6 +294,9 @@ static int dpu_kms_parse_data_bus_icc_path(struct 
> > dpu_kms *dpu_kms)
> >   struct icc_path *path1;
> >   struct drm_device *dev = dpu_kms->dev;
> >
> > + if (!dpu_supports_bw_scaling(dev))
> > + return 0;
> > +
> >   path0 = of_icc_get(dev->dev, "mdp0-mem");
> >   path1 = of_icc_get(dev->dev, "mdp1-mem");
> >
> > @@ -934,8 +937,7 @@ static int dpu_kms_hw_init(struct msm_kms *kms)
> >   DPU_DEBUG("REG_DMA is not defined");
> >   }
> >
> > - if (of_device_is_compatible(dev->dev->of_node, "qcom,sc7180-mdss"))
> > - dpu_kms_parse_data_bus_icc_path(dpu_kms);
> > + dpu_kms_parse_data_bus_icc_path(dpu_kms);
> >
> >   pm_runtime_get_sync(&dpu_kms->pdev->dev);
> >
> > @@ -1198,7 +1200,7 @@ static int __maybe_unused dpu_runtime_resume(struct 
> > device *dev)
> >
> >   ddev = dpu_kms->dev;
> >
> > - WARN_ON(!(dpu_kms->num_paths));
> > + WARN_ON((dpu_supports_bw_scaling(ddev) && !dpu_kms->num_paths));
> >   /* Min vote of BW is required before turning on AXI clk */
> >   for (i = 0; i < dpu_kms->num_paths; i++)
> >   icc_set_bw(dpu_kms->path[i], 0, Bps_to_icc(MIN_IB_BW));
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h 
> > b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > index d6717d6..f7bcc0a 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > @@ -154,6 +154,15 @@ struct vsync_info {
> >
> >   #define to_dpu_global_state(x) container_of(x, struct dpu_global_state, 
> > base)
> >
> > +/**
> > + * dpu_supports_bw_scaling: returns true for drivers that support bw scaling.
> > + * @dev: Pointer to drm_device structure
> > + */
> > +static inline int dpu_supports_bw_scaling(struct drm_device *dev)
> > +{
> > + return of_device_is_compatible(dev->dev->of_node, "qcom,sc7180-mdss");
> > +}
> > +
> >   /* Global private object state for tracking resources that are shared 
> > across
> >* multiple kms objects (planes/crtcs/etc).
> >*/
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c 
> > b/drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c
> > index cd40788..8cd712c 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_mdss.c
> > @@ -41,6 +41,9 @@ static int dpu_mdss_parse_data_bus_icc_path(struct 
> > drm_device *dev,
> >   struct icc_path *path0 = of_icc_get(dev->dev, "mdp0-mem");
> >   struct icc_path *path1 = of_icc_get(dev->dev, "mdp1-mem");
> >
> > + if (dpu_supports_bw_scaling(dev))
> > + return 0;
> > +
> >   if (IS_ERR_OR_NULL(path0))
> >   return PTR_ERR_OR_ZERO(path0);
> >
> > @@ -276,11 +279,9 @@ int dpu_mdss_init(struct drm_device *dev)
> >
> >   DRM_DEBUG("mapped mdss address space @%pK\n", dpu_mdss->mmio);
> >
> > - if (!of_device_is_compatible(dev->dev->of_node, "qcom,sc7180-mdss")) {
> > - ret = dpu_mdss_parse_data_bus_icc_path(dev, dpu_mdss);
> > - if (ret)
> > - return ret;
> > - }
> > + ret = dpu_mdss_parse_data_bus_icc_path(dev, dpu_mdss);
> > + if (ret)
> > + return ret;
> >
> >   mp = &dpu_mdss->mp;
> >   ret = msm_dss_parse_clock(pdev, mp);
> >
>
>
> --
> With best wishes
> Dmitry


Re: [Freedreno] [PATCH] mailmap: Update email address for Jordan Crouse

2021-03-31 Thread Rob Clark
On Thu, Mar 25, 2021 at 7:37 AM Jordan Crouse  wrote:
>
> jcrouse at codeaurora.org ha started bouncing. Redirect to a

nit: s/ha/has/

> more permanent address.
>
> Signed-off-by: Jordan Crouse 

Acked-by: Rob Clark 

> ---
>
>  .mailmap | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/.mailmap b/.mailmap
> index 85b93cdefc87..8c489cb1d1ce 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -165,6 +165,7 @@ Johan Hovold  
>  Johan Hovold  
>  John Paul Adrian Glaubitz 
>  John Stultz 
> +Jordan Crouse  
>   
>   
>   
> --
> 2.25.1
>
> ___
> Freedreno mailing list
> freedr...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno


[PATCH 4/4] drm/msm: Improved debugfs gem stats

2021-03-31 Thread Rob Clark
From: Rob Clark 

The last patch lost the breakdown of active vs inactive GEM objects in
$debugfs/gem.  But we can add some better stats to summarize not just
active vs inactive, but also purgable/purged to make up for that.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_fb.c  |  3 ++-
 drivers/gpu/drm/msm/msm_gem.c | 31 ---
 drivers/gpu/drm/msm/msm_gem.h | 11 ++-
 3 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
index d42f0665359a..887172a10c9a 100644
--- a/drivers/gpu/drm/msm/msm_fb.c
+++ b/drivers/gpu/drm/msm/msm_fb.c
@@ -33,6 +33,7 @@ static const struct drm_framebuffer_funcs 
msm_framebuffer_funcs = {
 #ifdef CONFIG_DEBUG_FS
 void msm_framebuffer_describe(struct drm_framebuffer *fb, struct seq_file *m)
 {
+   struct msm_gem_stats stats = {{0}};
int i, n = fb->format->num_planes;
 
seq_printf(m, "fb: %dx%d@%4.4s (%2d, ID:%d)\n",
@@ -42,7 +43,7 @@ void msm_framebuffer_describe(struct drm_framebuffer *fb, 
struct seq_file *m)
for (i = 0; i < n; i++) {
seq_printf(m, "   %d: offset=%d pitch=%d, obj: ",
i, fb->offsets[i], fb->pitches[i]);
-   msm_gem_describe(fb->obj[i], m);
+   msm_gem_describe(fb->obj[i], m, &stats);
}
 }
 #endif
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index c184ea68a6d0..a933ca5dc6df 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -873,7 +873,8 @@ static void describe_fence(struct dma_fence *fence, const 
char *type,
fence->seqno);
 }
 
-void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
+void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m,
+   struct msm_gem_stats *stats)
 {
struct msm_gem_object *msm_obj = to_msm_bo(obj);
struct dma_resv *robj = obj->resv;
@@ -885,11 +886,23 @@ void msm_gem_describe(struct drm_gem_object *obj, struct 
seq_file *m)
 
msm_gem_lock(obj);
 
+   stats->all.count++;
+   stats->all.size += obj->size;
+
+   if (is_active(msm_obj)) {
+   stats->active.count++;
+   stats->active.size += obj->size;
+   }
+
switch (msm_obj->madv) {
case __MSM_MADV_PURGED:
+   stats->purged.count++;
+   stats->purged.size += obj->size;
madv = " purged";
break;
case MSM_MADV_DONTNEED:
+   stats->purgable.count++;
+   stats->purgable.size += obj->size;
madv = " purgeable";
break;
case MSM_MADV_WILLNEED:
@@ -956,20 +969,24 @@ void msm_gem_describe(struct drm_gem_object *obj, struct 
seq_file *m)
 
 void msm_gem_describe_objects(struct list_head *list, struct seq_file *m)
 {
+   struct msm_gem_stats stats = {{0}};
struct msm_gem_object *msm_obj;
-   int count = 0;
-   size_t size = 0;
 
seq_puts(m, "   flags   id ref  offset   kaddrsize 
madv  name\n");
list_for_each_entry(msm_obj, list, node) {
struct drm_gem_object *obj = &msm_obj->base;
seq_puts(m, "   ");
-   msm_gem_describe(obj, m);
-   count++;
-   size += obj->size;
+   msm_gem_describe(obj, m, &stats);
}
 
-   seq_printf(m, "Total %d objects, %zu bytes\n", count, size);
+   seq_printf(m, "Total:%4d objects, %9zu bytes\n",
+   stats.all.count, stats.all.size);
+   seq_printf(m, "Active:   %4d objects, %9zu bytes\n",
+   stats.active.count, stats.active.size);
+   seq_printf(m, "Purgable: %4d objects, %9zu bytes\n",
+   stats.purgable.count, stats.purgable.size);
+   seq_printf(m, "Purged:   %4d objects, %9zu bytes\n",
+   stats.purged.count, stats.purged.size);
 }
 #endif
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 49956196025e..43510ac070dd 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -158,7 +158,16 @@ struct drm_gem_object *msm_gem_import(struct drm_device 
*dev,
 __printf(2, 3)
 void msm_gem_object_set_name(struct drm_gem_object *bo, const char *fmt, ...);
 #ifdef CONFIG_DEBUG_FS
-void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m);
+
+struct msm_gem_stats {
+   struct {
+   unsigned count;
+   size_t size;
+   } all, active, purgable, purged;
+};
+
+void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m,
+   struct msm_gem_stats *stats);
 void msm_gem_describe_objects(struct list_head *list, struct seq_file *m);
 #endif
 
-- 
2.30.2



[PATCH 3/4] drm/msm: Fix debugfs deadlock

2021-03-31 Thread Rob Clark
From: Rob Clark 

In normal cases the gem obj lock is acquired first before mm_lock.  The
exception is iterating the various object lists.  In the shrinker path,
deadlock is avoided by using msm_gem_trylock() and skipping over objects
that cannot be locked.  But for debugfs the straightforward thing is to
split things out into a separate list of all objects protected by its
own lock.

Fixes: d984457b31c4 ("drm/msm: Add priv->mm_lock to protect active/inactive lists")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_debugfs.c | 14 +++---
 drivers/gpu/drm/msm/msm_drv.c |  3 +++
 drivers/gpu/drm/msm/msm_drv.h |  8 +++-
 drivers/gpu/drm/msm/msm_gem.c | 14 +-
 drivers/gpu/drm/msm/msm_gem.h | 13 ++---
 5 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_debugfs.c 
b/drivers/gpu/drm/msm/msm_debugfs.c
index 85ad0babc326..d611cc8e54a4 100644
--- a/drivers/gpu/drm/msm/msm_debugfs.c
+++ b/drivers/gpu/drm/msm/msm_debugfs.c
@@ -111,23 +111,15 @@ static const struct file_operations msm_gpu_fops = {
 static int msm_gem_show(struct drm_device *dev, struct seq_file *m)
 {
struct msm_drm_private *priv = dev->dev_private;
-   struct msm_gpu *gpu = priv->gpu;
int ret;
 
-   ret = mutex_lock_interruptible(&priv->mm_lock);
+   ret = mutex_lock_interruptible(&priv->obj_lock);
if (ret)
return ret;
 
-   if (gpu) {
-   seq_printf(m, "Active Objects (%s):\n", gpu->name);
-   msm_gem_describe_objects(&gpu->active_list, m);
-   }
-
-   seq_printf(m, "Inactive Objects:\n");
-   msm_gem_describe_objects(&priv->inactive_dontneed, m);
-   msm_gem_describe_objects(&priv->inactive_willneed, m);
+   msm_gem_describe_objects(&priv->objects, m);
 
-   mutex_unlock(&priv->mm_lock);
+   mutex_unlock(&priv->obj_lock);
 
return 0;
 }
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 3462b0ea14c6..1ef1cd0cc714 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -474,6 +474,9 @@ static int msm_drm_init(struct device *dev, const struct 
drm_driver *drv)
 
priv->wq = alloc_ordered_workqueue("msm", 0);
 
+   INIT_LIST_HEAD(&priv->objects);
+   mutex_init(&priv->obj_lock);
+
INIT_LIST_HEAD(&priv->inactive_willneed);
INIT_LIST_HEAD(&priv->inactive_dontneed);
INIT_LIST_HEAD(&priv->inactive_purged);
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 3ead5755f695..d69f4263bd4e 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -174,7 +174,13 @@ struct msm_drm_private {
struct msm_rd_state *hangrd;   /* debugfs to dump hanging submits */
struct msm_perf_state *perf;
 
-   /*
+   /**
+* List of all GEM objects (mainly for debugfs, protected by obj_lock)
+*/
+   struct list_head objects;
+   struct mutex obj_lock;
+
+   /**
 * Lists of inactive GEM objects.  Every bo is either in one of the
 * inactive lists (depending on whether or not it is shrinkable) or
 * gpu->active_list (for the gpu it is active on[1])
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 74a92eedc992..c184ea68a6d0 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -961,7 +961,7 @@ void msm_gem_describe_objects(struct list_head *list, 
struct seq_file *m)
size_t size = 0;
 
seq_puts(m, "   flags   id ref  offset   kaddrsize 
madv  name\n");
-   list_for_each_entry(msm_obj, list, mm_list) {
+   list_for_each_entry(msm_obj, list, node) {
struct drm_gem_object *obj = &msm_obj->base;
seq_puts(m, "   ");
msm_gem_describe(obj, m);
@@ -980,6 +980,10 @@ void msm_gem_free_object(struct drm_gem_object *obj)
struct drm_device *dev = obj->dev;
struct msm_drm_private *priv = dev->dev_private;
 
+   mutex_lock(&priv->obj_lock);
+   list_del(&msm_obj->node);
+   mutex_unlock(&priv->obj_lock);
+
mutex_lock(&priv->mm_lock);
if (msm_obj->dontneed)
mark_unpurgable(msm_obj);
@@ -1170,6 +1174,10 @@ static struct drm_gem_object *_msm_gem_new(struct 
drm_device *dev,
list_add_tail(&msm_obj->mm_list, &priv->inactive_willneed);
mutex_unlock(&priv->mm_lock);
 
+   mutex_lock(&priv->obj_lock);
+   list_add_tail(&msm_obj->node, &priv->objects);
+   mutex_unlock(&priv->obj_lock);
+
return obj;
 
 fail:
@@ -1240,6 +1248,10 @@ struct drm_gem_object *msm_gem_import(struct drm_device 
*dev,
list_add_tail(&msm_obj->mm_list, &priv->inactive_willneed);
mutex_unlock(&priv->mm_lock);

+   mutex_lock(&priv->obj_lock);
+   list_add_tail(&msm_obj->node, &priv->objects);
+   mutex_unlock(&priv->obj_lock);
+
  

[PATCH 2/4] drm/msm: Avoid mutex in shrinker_count()

2021-03-31 Thread Rob Clark
From: Rob Clark 

When the system is under heavy memory pressure, we can end up with lots
of concurrent calls into the shrinker.  Keeping a running tab on what we
can shrink avoids grabbing a lock in shrinker->count(), and avoids
shrinker->scan() getting called when not profitable.

Also, we can keep purged objects in their own list to avoid re-traversing
them to help cut down time in the critical section further.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c  |  1 +
 drivers/gpu/drm/msm/msm_drv.h  |  2 ++
 drivers/gpu/drm/msm/msm_gem.c  | 16 +++--
 drivers/gpu/drm/msm/msm_gem.h  | 32 ++
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 17 +-
 5 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 4f9fa0189a07..3462b0ea14c6 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -476,6 +476,7 @@ static int msm_drm_init(struct device *dev, const struct 
drm_driver *drv)
 
INIT_LIST_HEAD(>inactive_willneed);
INIT_LIST_HEAD(>inactive_dontneed);
+   INIT_LIST_HEAD(&priv->inactive_purged);
mutex_init(&priv->mm_lock);
 
/* Teach lockdep about lock ordering wrt. shrinker: */
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index a1264cfcac5e..3ead5755f695 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -188,6 +188,8 @@ struct msm_drm_private {
 */
struct list_head inactive_willneed;  /* inactive + !shrinkable */
struct list_head inactive_dontneed;  /* inactive +  shrinkable */
+   struct list_head inactive_purged;/* inactive +  purged */
+   int shrinkable_count;/* write access under mm_lock */
struct mutex mm_lock;
 
struct workqueue_struct *wq;
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 9d10739c4eb2..74a92eedc992 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -719,6 +719,7 @@ void msm_gem_purge(struct drm_gem_object *obj)
put_iova_vmas(obj);
 
msm_obj->madv = __MSM_MADV_PURGED;
+   mark_unpurgable(msm_obj);
 
drm_vma_node_unmap(&obj->vma_node, dev->anon_inode->i_mapping);
drm_gem_free_mmap_offset(obj);
@@ -790,6 +791,7 @@ void msm_gem_active_get(struct drm_gem_object *obj, struct 
msm_gpu *gpu)
might_sleep();
WARN_ON(!msm_gem_is_locked(obj));
WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED);
+   WARN_ON(msm_obj->dontneed);
 
if (msm_obj->active_count++ == 0) {
mutex_lock(&priv->mm_lock);
@@ -818,11 +820,19 @@ static void update_inactive(struct msm_gem_object 
*msm_obj)
mutex_lock(&priv->mm_lock);
WARN_ON(msm_obj->active_count != 0);
 
+   if (msm_obj->dontneed)
+   mark_unpurgable(msm_obj);
+
list_del_init(&msm_obj->mm_list);
-   if (msm_obj->madv == MSM_MADV_WILLNEED)
+   if (msm_obj->madv == MSM_MADV_WILLNEED) {
list_add_tail(&msm_obj->mm_list, &priv->inactive_willneed);
-   else
+   } else if (msm_obj->madv == MSM_MADV_DONTNEED) {
list_add_tail(&msm_obj->mm_list, &priv->inactive_dontneed);
+   mark_purgable(msm_obj);
+   } else {
+   WARN_ON(msm_obj->madv != __MSM_MADV_PURGED);
+   list_add_tail(&msm_obj->mm_list, &priv->inactive_purged);
+   }

mutex_unlock(&priv->mm_lock);
 }
@@ -971,6 +981,8 @@ void msm_gem_free_object(struct drm_gem_object *obj)
struct msm_drm_private *priv = dev->dev_private;
 
mutex_lock(&priv->mm_lock);
+   if (msm_obj->dontneed)
+   mark_unpurgable(msm_obj);
list_del(&msm_obj->mm_list);
mutex_unlock(&priv->mm_lock);
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 7a9107cf1818..0feabae75d3d 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -50,6 +50,11 @@ struct msm_gem_object {
 */
uint8_t madv;
 
+   /**
+* Is object on inactive_dontneed list (ie. counted in priv->shrinkable_count)?
+*/
+   bool dontneed : 1;
+
/**
 * count of active vmap'ing
 */
@@ -198,6 +203,33 @@ static inline bool is_vunmapable(struct msm_gem_object 
*msm_obj)
return (msm_obj->vmap_count == 0) && msm_obj->vaddr;
 }
 
+static inline void mark_purgable(struct msm_gem_object *msm_obj)
+{
+   struct msm_drm_private *priv = msm_obj->base.dev->dev_private;
+
WARN_ON(!mutex_is_locked(&priv->mm_lock));
+
+   if (WARN_ON(msm_obj->dontneed))
+   return;
+
+   priv->shrinkable_count += msm_obj->base.size >> PAGE_SHIFT;
+   msm_obj->dontneed = true;
+}
+
+static inline void mark_unpurgable(str

[PATCH 1/4] drm/msm: Remove unused freed llist node

2021-03-31 Thread Rob Clark
From: Rob Clark 

Unused since c951a9b284b907604759628d273901064c60d09f

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index b3a0a880cbab..7a9107cf1818 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -78,8 +78,6 @@ struct msm_gem_object {
 
struct list_head vmas;/* list of msm_gem_vma */
 
-   struct llist_node freed;
-
/* For physically contiguous buffers.  Used when we don't have
 * an IOMMU.  Also used for stolen/splashscreen buffer.
 */
-- 
2.30.2



[PATCH 0/4] drm/msm: Shrinker (and related) fixes

2021-03-31 Thread Rob Clark
From: Rob Clark 

I've been spending some time looking into how things behave under high
memory pressure.  The first patch is a random cleanup I noticed along
the way.  The second improves the situation significantly when we are
getting shrinker called from many threads in parallel.  And the last
two are $debugfs/gem fixes I needed so I could monitor the state of GEM
objects (ie. how many are active/purgable/purged) while triggering high
memory pressure.

We could probably go a bit further with dropping the mm_lock in the
shrinker->scan() loop, but this is already a pretty big improvement.
The next step is probably actually to add support to unpin/evict
inactive objects.  (We are part way there since we have already de-
coupled the iova lifetime from the pages lifetime, but there are a
few sharp corners to work through.)

Rob Clark (4):
  drm/msm: Remove unused freed llist node
  drm/msm: Avoid mutex in shrinker_count()
  drm/msm: Fix debugfs deadlock
  drm/msm: Improved debugfs gem stats

 drivers/gpu/drm/msm/msm_debugfs.c  | 14 ++
 drivers/gpu/drm/msm/msm_drv.c  |  4 ++
 drivers/gpu/drm/msm/msm_drv.h  | 10 -
 drivers/gpu/drm/msm/msm_fb.c   |  3 +-
 drivers/gpu/drm/msm/msm_gem.c  | 61 +-
 drivers/gpu/drm/msm/msm_gem.h  | 58 +---
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 17 +--
 7 files changed, 122 insertions(+), 45 deletions(-)

-- 
2.30.2



Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

2021-03-30 Thread Rob Clark
On Tue, Mar 30, 2021 at 8:31 AM Will Deacon  wrote:
>
> On Tue, Mar 30, 2021 at 08:03:36AM -0700, Rob Clark wrote:
> > On Tue, Mar 30, 2021 at 2:34 AM Will Deacon  wrote:
> > >
> > > On Mon, Mar 29, 2021 at 09:02:50PM -0700, Rob Clark wrote:
> > > > On Mon, Mar 29, 2021 at 7:47 AM Will Deacon  wrote:
> > > > >
> > > > > On Fri, Mar 26, 2021 at 04:13:02PM -0700, Eric Anholt wrote:
> > > > > > db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> > > > > > the GPU from wedging and then sometimes wedging the kernel after a
> > > > > > page fault), but it doesn't have separate pagetables support yet in
> > > > > > drm/msm so we can't go all the way to the TTBR1 path.
> > > > >
> > > > > What do you mean by "doesn't have separate pagetables support yet"? 
> > > > > The
> > > > > compatible string doesn't feel like the right way to determine this.
> > > >
> > > > the compatible string identifies what it is, not what the sw
> > > > limitations are, so in that regard it seems right to me..
> > >
> > > Well it depends on what "doesn't have separate pagetables support yet"
> > > means. I can't tell if it's a hardware issue, a firmware issue or a driver
> > > issue.
> >
> > Just a driver issue (and the fact that currently we don't have
> > physical access to a device... debugging a5xx per-process-pgtables by
> > pushing untested things to the CI farm is kind of a difficult way to
> > work)
>
> But then in that case, this is using the compatible string to identify a
> driver issue, no?
>

Well, I suppose yes.. but OTOH it is keeping the problem out of the
dtb.  Once per-process pgtables works for a5xx, there would be no dtb
change, just a change to the quirk behavior in arm-smmu-qcom.

BR,
-R


Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

2021-03-30 Thread Rob Clark
On Tue, Mar 30, 2021 at 2:34 AM Will Deacon  wrote:
>
> On Mon, Mar 29, 2021 at 09:02:50PM -0700, Rob Clark wrote:
> > On Mon, Mar 29, 2021 at 7:47 AM Will Deacon  wrote:
> > >
> > > On Fri, Mar 26, 2021 at 04:13:02PM -0700, Eric Anholt wrote:
> > > > db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> > > > the GPU from wedging and then sometimes wedging the kernel after a
> > > > page fault), but it doesn't have separate pagetables support yet in
> > > > drm/msm so we can't go all the way to the TTBR1 path.
> > >
> > > What do you mean by "doesn't have separate pagetables support yet"? The
> > > compatible string doesn't feel like the right way to determine this.
> >
> > the compatible string identifies what it is, not what the sw
> > limitations are, so in that regard it seems right to me..
>
> Well it depends on what "doesn't have separate pagetables support yet"
> means. I can't tell if it's a hardware issue, a firmware issue or a driver
> issue.

Just a driver issue (and the fact that currently we don't have
physical access to a device... debugging a5xx per-process-pgtables by
pushing untested things to the CI farm is kind of a difficult way to
work)

BR,
-R


Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

2021-03-29 Thread Rob Clark
On Mon, Mar 29, 2021 at 7:47 AM Will Deacon  wrote:
>
> On Fri, Mar 26, 2021 at 04:13:02PM -0700, Eric Anholt wrote:
> > db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> > the GPU from wedging and then sometimes wedging the kernel after a
> > page fault), but it doesn't have separate pagetables support yet in
> > drm/msm so we can't go all the way to the TTBR1 path.
>
> What do you mean by "doesn't have separate pagetables support yet"? The
> compatible string doesn't feel like the right way to determine this.

the compatible string identifies what it is, not what the sw
limitations are, so in that regard it seems right to me..

BR,
-R


Re: [RFC PATCH 1/3] dt-bindings: display: simple: Add the panel on sc7180-trogdor-pompom

2021-03-26 Thread Rob Clark
On Fri, Mar 26, 2021 at 5:33 PM Rob Clark  wrote:
>
> On Fri, Mar 26, 2021 at 4:48 PM Rob Herring  wrote:
> >
> > On Fri, Mar 26, 2021 at 4:13 PM Rob Clark  wrote:
> > >
> > > On Fri, Mar 26, 2021 at 12:48 PM Rob Herring  wrote:
> > > >
> > > > On Fri, Mar 26, 2021 at 9:20 AM Rob Clark  wrote:
> > > > >
> > > > > On Fri, Mar 26, 2021 at 8:18 AM Rob Clark  wrote:
> > > > > >
> > > > > > On Fri, Mar 26, 2021 at 5:38 AM Thierry Reding 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, Mar 17, 2021 at 06:53:04PM -0700, Rob Clark wrote:
> > > > > > > > On Wed, Mar 17, 2021 at 4:27 PM Matthias Kaehlcke 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Mar 16, 2021 at 02:08:19PM -0700, Douglas Anderson 
> > > > > > > > > wrote:
> > > > > > > > > > The sc7180-trogdor-pompom board might be attached to any 
> > > > > > > > > > number of a
> > > > > > > > > > pile of eDP panels. At the moment I'm told that the list 
> > > > > > > > > > might include:
> > > > > > > > > > - KD KD116N21-30NV-A010
> > > > > > > > > > - KD KD116N09-30NH-A016
> > > > > > > > > > - Starry 2081116HHD028001-51D
> > > > > > > > > > - Sharp LQ116M1JW10
> > > > > > > > > >
> > > > > > > > > > It should be noted that while the EDID programmed in the 
> > > > > > > > > > first 3
> > > > > > > > > > panels indicates that they should run with exactly the same 
> > > > > > > > > > timing (to
> > > > > > > > > > keep things simple), the 4th panel not only needs different 
> > > > > > > > > > timing but
> > > > > > > > > > has a different resolution.
> > > > > > > > > >
> > > > > > > > > > As is true in general with eDP panels, we can figure out 
> > > > > > > > > > which panel
> > > > > > > > > > we have and all the info needed to drive its pixel clock by 
> > > > > > > > > > reading
> > > > > > > > > > the EDID. However, we can do this only after we've powered 
> > > > > > > > > > the panel
> > > > > > > > > > on. Powering on the panels requires following the timing 
> > > > > > > > > > diagram in
> > > > > > > > > > each panel's datasheet which specifies delays between 
> > > > > > > > > > certain
> > > > > > > > > > actions. This means that, while we can be quite dynamic 
> > > > > > > > > > about handling
> > > > > > > > > > things we can't just totally skip out on describing the 
> > > > > > > > > > panel like we
> > > > > > > > > > could do if it was connected to an external-facing DP port.
> > > > > > > > > >
> > > > > > > > > > While the different panels have slightly different delays, 
> > > > > > > > > > it's
> > > > > > > > > > possible to come up with a set of unified delays that will 
> > > > > > > > > > work on all
> > > > > > > > > > the panels. From reading the datasheets:
> > > > > > > > > > * KD KD116N21-30NV-A010 and KD KD116N09-30NH-A016
> > > > > > > > > >   - HPD absent delay: 200 ms
> > > > > > > > > >   - Unprepare delay: 150 ms (datasheet is confusing, might 
> > > > > > > > > > be 500 ms)
> > > > > > > > > > * Starry 2081116HHD028001-51D
> > > > > > > > > >   - HPD absent delay: 100 ms
> > > > > > > > > >   - Enable delay: (link training done till enable BL): 200 
> > > > > > > > > > ms
> > > > > > > > > >   - Unprepare delay: 500 ms
> > > > > > > > > > * Sharp LQ116M1J

Re: [RFC PATCH 1/3] dt-bindings: display: simple: Add the panel on sc7180-trogdor-pompom

2021-03-26 Thread Rob Clark
On Fri, Mar 26, 2021 at 4:48 PM Rob Herring  wrote:
>
> On Fri, Mar 26, 2021 at 4:13 PM Rob Clark  wrote:
> >
> > On Fri, Mar 26, 2021 at 12:48 PM Rob Herring  wrote:
> > >
> > > On Fri, Mar 26, 2021 at 9:20 AM Rob Clark  wrote:
> > > >
> > > > On Fri, Mar 26, 2021 at 8:18 AM Rob Clark  wrote:
> > > > >
> > > > > On Fri, Mar 26, 2021 at 5:38 AM Thierry Reding 
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, Mar 17, 2021 at 06:53:04PM -0700, Rob Clark wrote:
> > > > > > > On Wed, Mar 17, 2021 at 4:27 PM Matthias Kaehlcke 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Mar 16, 2021 at 02:08:19PM -0700, Douglas Anderson 
> > > > > > > > wrote:
> > > > > > > > > The sc7180-trogdor-pompom board might be attached to any 
> > > > > > > > > number of a
> > > > > > > > > pile of eDP panels. At the moment I'm told that the list 
> > > > > > > > > might include:
> > > > > > > > > - KD KD116N21-30NV-A010
> > > > > > > > > - KD KD116N09-30NH-A016
> > > > > > > > > - Starry 2081116HHD028001-51D
> > > > > > > > > - Sharp LQ116M1JW10
> > > > > > > > >
> > > > > > > > > It should be noted that while the EDID programmed in the 
> > > > > > > > > first 3
> > > > > > > > > panels indicates that they should run with exactly the same 
> > > > > > > > > timing (to
> > > > > > > > > keep things simple), the 4th panel not only needs different 
> > > > > > > > > timing but
> > > > > > > > > has a different resolution.
> > > > > > > > >
> > > > > > > > > As is true in general with eDP panels, we can figure out 
> > > > > > > > > which panel
> > > > > > > > > we have and all the info needed to drive its pixel clock by 
> > > > > > > > > reading
> > > > > > > > > the EDID. However, we can do this only after we've powered 
> > > > > > > > > the panel
> > > > > > > > > on. Powering on the panels requires following the timing 
> > > > > > > > > diagram in
> > > > > > > > > each panel's datasheet which specifies delays between certain
> > > > > > > > > actions. This means that, while we can be quite dynamic about 
> > > > > > > > > handling
> > > > > > > > > things we can't just totally skip out on describing the panel 
> > > > > > > > > like we
> > > > > > > > > could do if it was connected to an external-facing DP port.
> > > > > > > > >
> > > > > > > > > While the different panels have slightly different delays, 
> > > > > > > > > it's
> > > > > > > > > possible to come up with a set of unified delays that will 
> > > > > > > > > work on all
> > > > > > > > > the panels. From reading the datasheets:
> > > > > > > > > * KD KD116N21-30NV-A010 and KD KD116N09-30NH-A016
> > > > > > > > >   - HPD absent delay: 200 ms
> > > > > > > > >   - Unprepare delay: 150 ms (datasheet is confusing, might be 
> > > > > > > > > 500 ms)
> > > > > > > > > * Starry 2081116HHD028001-51D
> > > > > > > > >   - HPD absent delay: 100 ms
> > > > > > > > >   - Enable delay: (link training done till enable BL): 200 ms
> > > > > > > > >   - Unprepare delay: 500 ms
> > > > > > > > > * Sharp LQ116M1JW10
> > > > > > > > >   - HPD absent delay: 200 ms
> > > > > > > > >   - Unprepare delay: 500 ms
> > > > > > > > >   - Prepare to enable delay (power on till backlight): 100 ms
> > > > > > > > >
> > > > > > > > > Unified:
> > > > > > > > > - HPD absent delay: 200 ms
> > > > > > > > > - Unprepare de

Re: [RFC PATCH 1/3] dt-bindings: display: simple: Add the panel on sc7180-trogdor-pompom

2021-03-26 Thread Rob Clark
On Fri, Mar 26, 2021 at 12:48 PM Rob Herring  wrote:
>
> On Fri, Mar 26, 2021 at 9:20 AM Rob Clark  wrote:
> >
> > On Fri, Mar 26, 2021 at 8:18 AM Rob Clark  wrote:
> > >
> > > On Fri, Mar 26, 2021 at 5:38 AM Thierry Reding  
> > > wrote:
> > > >
> > > > On Wed, Mar 17, 2021 at 06:53:04PM -0700, Rob Clark wrote:
> > > > > On Wed, Mar 17, 2021 at 4:27 PM Matthias Kaehlcke  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Mar 16, 2021 at 02:08:19PM -0700, Douglas Anderson wrote:
> > > > > > > The sc7180-trogdor-pompom board might be attached to any number 
> > > > > > > of a
> > > > > > > pile of eDP panels. At the moment I'm told that the list might 
> > > > > > > include:
> > > > > > > - KD KD116N21-30NV-A010
> > > > > > > - KD KD116N09-30NH-A016
> > > > > > > - Starry 2081116HHD028001-51D
> > > > > > > - Sharp LQ116M1JW10
> > > > > > >
> > > > > > > It should be noted that while the EDID programmed in the first 3
> > > > > > > panels indicates that they should run with exactly the same 
> > > > > > > timing (to
> > > > > > > keep things simple), the 4th panel not only needs different 
> > > > > > > timing but
> > > > > > > has a different resolution.
> > > > > > >
> > > > > > > As is true in general with eDP panels, we can figure out which 
> > > > > > > panel
> > > > > > > we have and all the info needed to drive its pixel clock by 
> > > > > > > reading
> > > > > > > the EDID. However, we can do this only after we've powered the 
> > > > > > > panel
> > > > > > > on. Powering on the panels requires following the timing diagram 
> > > > > > > in
> > > > > > > each panel's datasheet which specifies delays between certain
> > > > > > > actions. This means that, while we can be quite dynamic about 
> > > > > > > handling
> > > > > > > things we can't just totally skip out on describing the panel 
> > > > > > > like we
> > > > > > > could do if it was connected to an external-facing DP port.
> > > > > > >
> > > > > > > While the different panels have slightly different delays, it's
> > > > > > > possible to come up with a set of unified delays that will work 
> > > > > > > on all
> > > > > > > the panels. From reading the datasheets:
> > > > > > > * KD KD116N21-30NV-A010 and KD KD116N09-30NH-A016
> > > > > > >   - HPD absent delay: 200 ms
> > > > > > >   - Unprepare delay: 150 ms (datasheet is confusing, might be 500 
> > > > > > > ms)
> > > > > > > * Starry 2081116HHD028001-51D
> > > > > > >   - HPD absent delay: 100 ms
> > > > > > >   - Enable delay: (link training done till enable BL): 200 ms
> > > > > > >   - Unprepare delay: 500 ms
> > > > > > > * Sharp LQ116M1JW10
> > > > > > >   - HPD absent delay: 200 ms
> > > > > > >   - Unprepare delay: 500 ms
> > > > > > >   - Prepare to enable delay (power on till backlight): 100 ms
> > > > > > >
> > > > > > > Unified:
> > > > > > > - HPD absent delay: 200 ms
> > > > > > > - Unprepare delay: 500 ms
> > > > > > > - Enable delay: 200 ms
> > > > > > >
> > > > > > > NOTE: in theory the only thing that we _really_ need unity on is 
> > > > > > > the
> > > > > > > "HPD absent delay" since once the panel asserts HPD we can read 
> > > > > > > the
> > > > > > > EDID and could make per-panel decisions if we wanted.
> > > > > > >
> > > > > > > Let's create a definition of "a panel that can be attached to 
> > > > > > > pompom"
> > > > > > > as a panel that provides a valid EDID and can work with the 
> > > > > > > standard
> > > > > > > pompom power sequencing. If more panels are later attached to 
&

Re: [RFC PATCH 1/3] dt-bindings: display: simple: Add the panel on sc7180-trogdor-pompom

2021-03-26 Thread Rob Clark
On Fri, Mar 26, 2021 at 8:18 AM Rob Clark  wrote:
>
> On Fri, Mar 26, 2021 at 5:38 AM Thierry Reding  
> wrote:
> >
> > On Wed, Mar 17, 2021 at 06:53:04PM -0700, Rob Clark wrote:
> > > On Wed, Mar 17, 2021 at 4:27 PM Matthias Kaehlcke  
> > > wrote:
> > > >
> > > > On Tue, Mar 16, 2021 at 02:08:19PM -0700, Douglas Anderson wrote:
> > > > > The sc7180-trogdor-pompom board might be attached to any number of a
> > > > > pile of eDP panels. At the moment I'm told that the list might 
> > > > > include:
> > > > > - KD KD116N21-30NV-A010
> > > > > - KD KD116N09-30NH-A016
> > > > > - Starry 2081116HHD028001-51D
> > > > > - Sharp LQ116M1JW10
> > > > >
> > > > > It should be noted that while the EDID programmed in the first 3
> > > > > panels indicates that they should run with exactly the same timing (to
> > > > > keep things simple), the 4th panel not only needs different timing but
> > > > > has a different resolution.
> > > > >
> > > > > As is true in general with eDP panels, we can figure out which panel
> > > > > we have and all the info needed to drive its pixel clock by reading
> > > > > the EDID. However, we can do this only after we've powered the panel
> > > > > on. Powering on the panels requires following the timing diagram in
> > > > > each panel's datasheet which specifies delays between certain
> > > > > actions. This means that, while we can be quite dynamic about handling
> > > > > things we can't just totally skip out on describing the panel like we
> > > > > could do if it was connected to an external-facing DP port.
> > > > >
> > > > > While the different panels have slightly different delays, it's
> > > > > possible to come up with a set of unified delays that will work on all
> > > > > the panels. From reading the datasheets:
> > > > > * KD KD116N21-30NV-A010 and KD KD116N09-30NH-A016
> > > > >   - HPD absent delay: 200 ms
> > > > >   - Unprepare delay: 150 ms (datasheet is confusing, might be 500 ms)
> > > > > * Starry 2081116HHD028001-51D
> > > > >   - HPD absent delay: 100 ms
> > > > >   - Enable delay: (link training done till enable BL): 200 ms
> > > > >   - Unprepare delay: 500 ms
> > > > > * Sharp LQ116M1JW10
> > > > >   - HPD absent delay: 200 ms
> > > > >   - Unprepare delay: 500 ms
> > > > >   - Prepare to enable delay (power on till backlight): 100 ms
> > > > >
> > > > > Unified:
> > > > > - HPD absent delay: 200 ms
> > > > > - Unprepare delay: 500 ms
> > > > > - Enable delay: 200 ms
> > > > >
> > > > > NOTE: in theory the only thing that we _really_ need unity on is the
> > > > > "HPD absent delay" since once the panel asserts HPD we can read the
> > > > > EDID and could make per-panel decisions if we wanted.
> > > > >
> > > > > Let's create a definition of "a panel that can be attached to pompom"
> > > > > as a panel that provides a valid EDID and can work with the standard
> > > > > pompom power sequencing. If more panels are later attached to pompom
> > > > > then it's fine as long as they work in a compatible way.
> > > > >
> > > > > One might ask why we can't just use a generic string here and provide
> > > > > the timings directly in the device tree file. As I understand it,
> > > > > trying to describe generic power sequencing in the device tree is
> > > > > frowned upon and the one instance (SD/MMC) is regarded as a mistake
> > > > > that shouldn't be repeated. Specifying a power sequence per board (or
> > > > > per board class) feels like a reasonable compromise. We're not trying
> > > > > to define fully generic power sequence bindings but we can also take
> > > > > advantage of the semi-probable properties of the attached device.
> > > > >
> > > > > NOTE: I believe that past instances of supporting this type of thing
> > > > > have used the "white lie" approach. One representative panel was
> > > > > listed in the device tree. The power sequencings of this
> > > > > representative panel were OK to use across all panels that might be
&g

Re: [RFC PATCH 1/3] dt-bindings: display: simple: Add the panel on sc7180-trogdor-pompom

2021-03-26 Thread Rob Clark
On Fri, Mar 26, 2021 at 5:38 AM Thierry Reding  wrote:
>
> On Wed, Mar 17, 2021 at 06:53:04PM -0700, Rob Clark wrote:
> > On Wed, Mar 17, 2021 at 4:27 PM Matthias Kaehlcke  wrote:
> > >
> > > On Tue, Mar 16, 2021 at 02:08:19PM -0700, Douglas Anderson wrote:
> > > > The sc7180-trogdor-pompom board might be attached to any number of a
> > > > pile of eDP panels. At the moment I'm told that the list might include:
> > > > - KD KD116N21-30NV-A010
> > > > - KD KD116N09-30NH-A016
> > > > - Starry 2081116HHD028001-51D
> > > > - Sharp LQ116M1JW10
> > > >
> > > > It should be noted that while the EDID programmed in the first 3
> > > > panels indicates that they should run with exactly the same timing (to
> > > > keep things simple), the 4th panel not only needs different timing but
> > > > has a different resolution.
> > > >
> > > > As is true in general with eDP panels, we can figure out which panel
> > > > we have and all the info needed to drive its pixel clock by reading
> > > > the EDID. However, we can do this only after we've powered the panel
> > > > on. Powering on the panels requires following the timing diagram in
> > > > each panel's datasheet which specifies delays between certain
> > > > actions. This means that, while we can be quite dynamic about handling
> > > > things we can't just totally skip out on describing the panel like we
> > > > could do if it was connected to an external-facing DP port.
> > > >
> > > > While the different panels have slightly different delays, it's
> > > > possible to come up with a set of unified delays that will work on all
> > > > the panels. From reading the datasheets:
> > > > * KD KD116N21-30NV-A010 and KD KD116N09-30NH-A016
> > > >   - HPD absent delay: 200 ms
> > > >   - Unprepare delay: 150 ms (datasheet is confusing, might be 500 ms)
> > > > * Starry 2081116HHD028001-51D
> > > >   - HPD absent delay: 100 ms
> > > >   - Enable delay: (link training done till enable BL): 200 ms
> > > >   - Unprepare delay: 500 ms
> > > > * Sharp LQ116M1JW10
> > > >   - HPD absent delay: 200 ms
> > > >   - Unprepare delay: 500 ms
> > > >   - Prepare to enable delay (power on till backlight): 100 ms
> > > >
> > > > Unified:
> > > > - HPD absent delay: 200 ms
> > > > - Unprepare delay: 500 ms
> > > > - Enable delay: 200 ms
> > > >
> > > > NOTE: in theory the only thing that we _really_ need unity on is the
> > > > "HPD absent delay" since once the panel asserts HPD we can read the
> > > > EDID and could make per-panel decisions if we wanted.
> > > >
> > > > Let's create a definition of "a panel that can be attached to pompom"
> > > > as a panel that provides a valid EDID and can work with the standard
> > > > pompom power sequencing. If more panels are later attached to pompom
> > > > then it's fine as long as they work in a compatible way.
> > > >
> > > > One might ask why we can't just use a generic string here and provide
> > > > the timings directly in the device tree file. As I understand it,
> > > > trying to describe generic power sequencing in the device tree is
> > > > frowned upon and the one instance (SD/MMC) is regarded as a mistake
> > > > that shouldn't be repeated. Specifying a power sequence per board (or
> > > > per board class) feels like a reasonable compromise. We're not trying
> > > > to define fully generic power sequence bindings but we can also take
> > > > advantage of the semi-probable properties of the attached device.
> > > >
> > > > NOTE: I believe that past instances of supporting this type of thing
> > > > have used the "white lie" approach. One representative panel was
> > > > listed in the device tree. The power sequencings of this
> > > > representative panel were OK to use across all panels that might be
> > > > attached and other differences were handled by EDID. This patch
> > > > attempts to set a new precedent and avoid the need for the white lie.
> > > >
> > > > Signed-off-by: Douglas Anderson 
> > > > ---
> > >
> > > Sounds reasonable to me if DT maintainers can live with this abstract
> > > hardware definition. It's clearer than the 'white lie' appro

[PATCH 0/2] drm/msm: Fixes/updates for perfetto profiling

2021-03-24 Thread Rob Clark
From: Rob Clark 

A couple kernel side things I realized I needed in the process of
implementing performance-counter and render-stage support for perfetto,
the first patch fixes the MSM_PARAM_TIMESTAMP query which was just
wrong on a5xx/a6xx (ALWAYS_COUNT vs ALWAYS_ON).  The second adds a
way for userspace to determine whether the device has suspended since
the last sampling period (which means counters have lost their state
and configuration).

I am a bit tempted to add a way that a CAP_SYS_ADMIN user could ask
the GPU to not suspend (until the drm_file is closed), but so far
I've managed to avoid needing this.

Rob Clark (2):
  drm/msm: Fix a5xx/a6xx timestamps
  drm/msm: Add param for userspace to query suspend count

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 4 ++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 4 ++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 3 +++
 drivers/gpu/drm/msm/msm_drv.c   | 1 +
 drivers/gpu/drm/msm/msm_gpu.c   | 2 ++
 drivers/gpu/drm/msm/msm_gpu.h   | 2 ++
 include/uapi/drm/msm_drm.h  | 1 +
 7 files changed, 13 insertions(+), 4 deletions(-)

-- 
2.29.2



[PATCH 2/2] drm/msm: Add param for userspace to query suspend count

2021-03-24 Thread Rob Clark
From: Rob Clark 

Performance counts, and ALWAYS_ON counters used for capturing GPU
timestamps, lose their state across suspend/resume cycles.  Userspace
tooling for performance monitoring needs to be aware of this.  For
example, after a suspend userspace needs to recalibrate it's offset
between CPU and GPU time.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 3 +++
 drivers/gpu/drm/msm/msm_drv.c   | 1 +
 drivers/gpu/drm/msm/msm_gpu.c   | 2 ++
 drivers/gpu/drm/msm/msm_gpu.h   | 2 ++
 include/uapi/drm/msm_drm.h  | 1 +
 5 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index f09175698827..e473b7c9ff7f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -280,6 +280,9 @@ int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
case MSM_PARAM_FAULTS:
*value = gpu->global_faults;
return 0;
+   case MSM_PARAM_SUSPENDS:
+   *value = gpu->suspend_count;
+   return 0;
default:
DBG("%s: invalid param: %u", gpu->name, param);
return -EINVAL;
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index b29e439eb299..4f9fa0189a07 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -39,6 +39,7 @@
  *   GEM object's debug name
  * - 1.5.0 - Add SUBMITQUERY_QUERY ioctl
  * - 1.6.0 - Syncobj support
+ * - 1.7.0 - Add MSM_PARAM_SUSPENDS to access suspend count
  */
 #define MSM_VERSION_MAJOR  1
 #define MSM_VERSION_MINOR  6
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 7bdb01f202f4..ab888d83b887 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -256,6 +256,8 @@ int msm_gpu_pm_suspend(struct msm_gpu *gpu)
if (ret)
return ret;
 
+   gpu->suspend_count++;
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index d7cd02cd2109..18baf935e143 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -152,6 +152,8 @@ struct msm_gpu {
ktime_t time;
} devfreq;
 
+   uint32_t suspend_count;
+
struct msm_gpu_state *crashstate;
/* True if the hardware supports expanded apriv (a650 and newer) */
bool hw_apriv;
diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
index a6c1f3eb2623..5596d7c37f9e 100644
--- a/include/uapi/drm/msm_drm.h
+++ b/include/uapi/drm/msm_drm.h
@@ -76,6 +76,7 @@ struct drm_msm_timespec {
 #define MSM_PARAM_NR_RINGS   0x07
#define MSM_PARAM_PP_PGTABLE 0x08  /* => 1 for per-process pagetables, else 0 */
 #define MSM_PARAM_FAULTS 0x09
+#define MSM_PARAM_SUSPENDS   0x0a
 
 struct drm_msm_param {
__u32 pipe;   /* in, MSM_PIPE_x */
-- 
2.29.2

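The commit message above notes that after a suspend, profiling tooling must recalibrate its CPU↔GPU clock offset and reconfigure performance counters. A minimal sketch of the userspace-side check is below; `perf_window_invalidated()` is a hypothetical helper (not part of any real tool), and it assumes the value read back via `MSM_PARAM_SUSPENDS` is a monotonically increasing count, as in the patch.

```c
#include <assert.h>   /* for the quick self-check below */
#include <stdint.h>
#include <stdbool.h>

/*
 * Hypothetical userspace helper: given the suspend count sampled at the
 * start and at the end of a profiling window, decide whether counter
 * state (and the CPU<->GPU time offset) must be re-established.  The
 * count only ever increases, so any change means at least one
 * suspend/resume cycle occurred in between and the always-on counters
 * lost their state.  Unsigned comparison handles wrap-around naturally.
 */
static bool perf_window_invalidated(uint32_t suspends_before,
				    uint32_t suspends_after)
{
	return suspends_after != suspends_before;
}
```

In a real profiler the two samples would come from `DRM_MSM_GET_PARAM` ioctls bracketing the sampling period; on `true`, the tool discards the window and recalibrates.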


[PATCH 1/2] drm/msm: Fix a5xx/a6xx timestamps

2021-03-24 Thread Rob Clark
From: Rob Clark 

They were reading a counter that was configured to ALWAYS_COUNT (ie.
cycles that the GPU is doing something) rather than ALWAYS_ON.  This
isn't the thing that userspace is looking for.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 4 ++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index a5af223eaf50..bb82fcd9df81 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1241,8 +1241,8 @@ static int a5xx_pm_suspend(struct msm_gpu *gpu)
 
 static int a5xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 {
-   *value = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_CP_0_LO,
-   REG_A5XX_RBBM_PERFCTR_CP_0_HI);
+   *value = gpu_read64(gpu, REG_A5XX_RBBM_ALWAYSON_COUNTER_LO,
+   REG_A5XX_RBBM_ALWAYSON_COUNTER_HI);
 
return 0;
 }
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 130661898546..59718c304488 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1173,8 +1173,8 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
/* Force the GPU power on so we can read this register */
a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
 
-   *value = gpu_read64(gpu, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
-   REG_A6XX_RBBM_PERFCTR_CP_0_HI);
+   *value = gpu_read64(gpu, REG_A6XX_CP_ALWAYS_ON_COUNTER_LO,
+   REG_A6XX_CP_ALWAYS_ON_COUNTER_HI);
 
a6xx_gmu_clear_oob(&a6xx_gpu->gmu, GMU_OOB_GPU_SET);
return 0;
-- 
2.29.2

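Once the patch above switches `MSM_PARAM_TIMESTAMP` to the ALWAYS_ON counter, userspace can map GPU ticks onto the CPU clock with a single back-to-back calibration sample. The sketch below is a hypothetical helper; the 19.2 MHz rate is an assumption (typical for the Adreno always-on counter) and real tooling should query or measure the actual rate.

```c
#include <assert.h>   /* for the quick self-check below */
#include <stdint.h>

/* ASSUMED counter rate -- not something this thread confirms. */
#define AO_COUNTER_HZ 19200000ULL

/*
 * Hypothetical calibration helper: convert a GPU always-on counter value
 * to CPU-clock nanoseconds, given one (cal_cpu_ns, cal_gpu_ticks) pair
 * sampled back-to-back.  The unsigned subtraction makes counter
 * wrap-around harmless before the conversion to a signed delta.
 */
static int64_t gpu_ticks_to_cpu_ns(uint64_t gpu_ticks,
				   uint64_t cal_gpu_ticks,
				   int64_t cal_cpu_ns)
{
	int64_t delta_ticks = (int64_t)(gpu_ticks - cal_gpu_ticks);

	return cal_cpu_ns + delta_ticks * 1000000000LL / (int64_t)AO_COUNTER_HZ;
}
```

This is exactly the offset that must be re-derived whenever the suspend count (patch 2/2) changes, since the counter resets across suspend.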


Re: [v1] drm/msm/disp/dpu1: icc path needs to be set before dpu runtime resume

2021-03-22 Thread Rob Clark
On Mon, Mar 22, 2021 at 5:44 PM Matthias Kaehlcke  wrote:
>
> On Mon, Mar 22, 2021 at 02:17:12AM -0700, Kalyan Thota wrote:
> > From: Kalyan Thota 
> >
> > DPU runtime resume will request for a min vote on the AXI bus as
> > it is a necessary step before turning ON the AXI clock.
> >
> > The change does below
> > 1) Move the icc path set before requesting runtime get_sync.
> > 2) remove the dependency of hw catalog for min ib vote
> > as it is initialized at a later point.
> >
> > Signed-off-by: Kalyan Thota 
>
> Confirmed that this fixes a bunch of warnings at boot on SC7180 when
> (out-of-tree) camera support is enabled:
>
>   [1.832228] gcc_disp_hf_axi_clk status stuck at 'off'
>   [2.118292] gcc_disp_hf_axi_clk status stuck at 'off'
>   [2.442383] gcc_disp_hf_axi_clk already disabled
>   [2.750054] gcc_disp_hf_axi_clk already unprepared
>   [3.154835] gcc_disp_hf_axi_clk already disabled
>   [3.421835] gcc_disp_hf_axi_clk already unprepared
>
> Tested-by: Matthias Kaehlcke 

thanks for testing on the setup which had this issue.. I've pushed to msm-next

BR,
-R


Re: [RFC PATCH 1/3] dt-bindings: display: simple: Add the panel on sc7180-trogdor-pompom

2021-03-17 Thread Rob Clark
On Wed, Mar 17, 2021 at 4:27 PM Matthias Kaehlcke  wrote:
>
> On Tue, Mar 16, 2021 at 02:08:19PM -0700, Douglas Anderson wrote:
> > The sc7180-trogdor-pompom board might be attached to any number of a
> > pile of eDP panels. At the moment I'm told that the list might include:
> > - KD KD116N21-30NV-A010
> > - KD KD116N09-30NH-A016
> > - Starry 2081116HHD028001-51D
> > - Sharp LQ116M1JW10
> >
> > It should be noted that while the EDID programmed in the first 3
> > panels indicates that they should run with exactly the same timing (to
> > keep things simple), the 4th panel not only needs different timing but
> > has a different resolution.
> >
> > As is true in general with eDP panels, we can figure out which panel
> > we have and all the info needed to drive its pixel clock by reading
> > the EDID. However, we can do this only after we've powered the panel
> > on. Powering on the panels requires following the timing diagram in
> > each panel's datasheet which specifies delays between certain
> > actions. This means that, while we can be quite dynamic about handling
> > things we can't just totally skip out on describing the panel like we
> > could do if it was connected to an external-facing DP port.
> >
> > While the different panels have slightly different delays, it's
> > possible to come up with a set of unified delays that will work on all
> > the panels. From reading the datasheets:
> > * KD KD116N21-30NV-A010 and KD KD116N09-30NH-A016
> >   - HPD absent delay: 200 ms
> >   - Unprepare delay: 150 ms (datasheet is confusing, might be 500 ms)
> > * Starry 2081116HHD028001-51D
> >   - HPD absent delay: 100 ms
> >   - Enable delay: (link training done till enable BL): 200 ms
> >   - Unprepare delay: 500 ms
> > * Sharp LQ116M1JW10
> >   - HPD absent delay: 200 ms
> >   - Unprepare delay: 500 ms
> >   - Prepare to enable delay (power on till backlight): 100 ms
> >
> > Unified:
> > - HPD absent delay: 200 ms
> > - Unprepare delay: 500 ms
> > - Enable delay: 200 ms
> >
> > NOTE: in theory the only thing that we _really_ need unity on is the
> > "HPD absent delay" since once the panel asserts HPD we can read the
> > EDID and could make per-panel decisions if we wanted.
> >
> > Let's create a definition of "a panel that can be attached to pompom"
> > as a panel that provides a valid EDID and can work with the standard
> > pompom power sequencing. If more panels are later attached to pompom
> > then it's fine as long as they work in a compatible way.
> >
> > One might ask why we can't just use a generic string here and provide
> > the timings directly in the device tree file. As I understand it,
> > trying to describe generic power sequencing in the device tree is
> > frowned upon and the one instance (SD/MMC) is regarded as a mistake
> > that shouldn't be repeated. Specifying a power sequence per board (or
> > per board class) feels like a reasonable compromise. We're not trying
> > to define fully generic power sequence bindings but we can also take
> > advantage of the semi-probable properties of the attached device.
> >
> > NOTE: I believe that past instances of supporting this type of thing
> > have used the "white lie" approach. One representative panel was
> > listed in the device tree. The power sequencings of this
> > representative panel were OK to use across all panels that might be
> > attached and other differences were handled by EDID. This patch
> > attempts to set a new precedent and avoid the need for the white lie.
> >
> > Signed-off-by: Douglas Anderson 
> > ---
>
> Sounds reasonable to me if DT maintainers can live with this abstract
> hardware definition. It's clearer than the 'white lie' approach.

Yeah, it is a weird grey area between "discoverable" and "not
discoverable".. but I favor DT reflecting reality as much as
possible/feasible, so I think this is definitely cleaner than "white
lies"

Reviewed-by: Rob Clark 

> It's then up to the vendor/manufacturer to ensure to only ship devices
> with panels that have compatible timings.
>
> >  .../devicetree/bindings/display/panel/panel-simple.yaml   | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git 
> > a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml 
> > b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
> > index 62b0d54d87b7..9807dbc1cceb 100644
> > --- a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
> > +++ b/Documentation/

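The "unified" delays derived in the panel thread above amount to taking, for each delay class, the maximum across all candidate panels (a longer wait is always safe, a shorter one is not). A sketch of that derivation, with the datasheet numbers from the commit message (delays a datasheet does not specify are treated as 0 here):

```c
#include <assert.h>   /* for the quick self-check below */
#include <stddef.h>

/* Delay classes discussed in the thread, in milliseconds. */
struct panel_delays {
	unsigned int hpd_absent;
	unsigned int unprepare;
	unsigned int enable;
};

/*
 * Unify per-panel power-sequencing delays: the value that is safe for
 * every panel is the maximum of each class across the set.
 */
static struct panel_delays unify_delays(const struct panel_delays *p, size_t n)
{
	struct panel_delays u = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (p[i].hpd_absent > u.hpd_absent)
			u.hpd_absent = p[i].hpd_absent;
		if (p[i].unprepare > u.unprepare)
			u.unprepare = p[i].unprepare;
		if (p[i].enable > u.enable)
			u.enable = p[i].enable;
	}
	return u;
}
```

Feeding in the KD (200/150/0), Starry (100/500/200) and Sharp (200/500/0) numbers reproduces the unified 200/500/200 values quoted in the commit message.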
[PATCH] drm/msm: Ratelimit invalid-fence message

2021-03-17 Thread Rob Clark
From: Rob Clark 

We have seen a couple cases where low memory situations cause something
bad to happen, followed by a flood of these messages obscuring the root
cause.  Let's ratelimit the dmesg spam so that next time it happens we
don't lose the kernel traces leading up to this.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index ad2703698b05..cd59a5918038 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -45,7 +45,7 @@ int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
int ret;
 
if (fence > fctx->last_fence) {
-   DRM_ERROR("%s: waiting on invalid fence: %u (of %u)\n",
+   DRM_ERROR_RATELIMITED("%s: waiting on invalid fence: %u (of %u)\n",
fctx->name, fence, fctx->last_fence);
return -EINVAL;
}
-- 
2.29.2

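The policy behind `DRM_ERROR_RATELIMITED` in the patch above (at most `burst` messages per `interval`, further ones dropped until the window rolls over) can be sketched as follows. This is a simplified model for illustration, not the kernel's actual `struct ratelimit_state` implementation; time is passed in explicitly so the logic is easy to test, where the kernel uses jiffies.

```c
#include <assert.h>   /* for the quick self-check below */
#include <stdbool.h>

/* Simplified model of a printk-style rate limiter. */
struct rl_state {
	long interval;  /* window length, in abstract time units */
	int burst;      /* max messages allowed per window */
	long begin;     /* start of the current window */
	int printed;    /* messages emitted in the current window */
};

/* Return true if a message may be emitted at time `now`. */
static bool rl_allow(struct rl_state *rs, long now)
{
	/* Roll over to a new window once the interval has elapsed. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;  /* suppressed: over budget for this window */
}
```

Under a flood of invalid-fence waits, only the first `burst` messages of each window reach dmesg, which is what keeps the earlier kernel traces from scrolling away.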


Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

2021-03-16 Thread Rob Clark
On Tue, Mar 16, 2021 at 10:04 AM Rob Clark  wrote:
>
> On Wed, Feb 3, 2021 at 2:14 PM Rob Clark  wrote:
> >
> > On Wed, Feb 3, 2021 at 1:46 PM Will Deacon  wrote:
> > >
> > > On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
> > > > On 2021-02-01 23:50, Jordan Crouse wrote:
> > > > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> > > > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon  wrote:
> > > > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan 
> > > > > > > wrote:
> > > > > > > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan 
> > > > > > > > > wrote:
> > > > > > > > > > +#define IOMMU_LLC(1 << 6)
> > > > > > > > >
> > > > > > > > > On reflection, I'm a bit worried about exposing this because 
> > > > > > > > > I think it
> > > > > > > > > will
> > > > > > > > > introduce a mismatched virtual alias with the CPU (we don't 
> > > > > > > > > even have a
> > > > > > > > > MAIR
> > > > > > > > > set up for this memory type). Now, we also have that issue 
> > > > > > > > > for the PTW,
> > > > > > > > > but
> > > > > > > > > since we always use cache maintenance (i.e. the streaming 
> > > > > > > > > API) for
> > > > > > > > > publishing the page-tables to a non-coheren walker, it works 
> > > > > > > > > out.
> > > > > > > > > However,
> > > > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API 
> > > > > > > > > coherent
> > > > > > > > > allocation, then they're potentially in for a nasty surprise 
> > > > > > > > > due to the
> > > > > > > > > mismatched outer-cacheability attributes.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Can't we add the syscached memory type similar to what is done 
> > > > > > > > on android?
> > > > > > >
> > > > > > > Maybe. How does the GPU driver map these things on the CPU side?
> > > > > >
> > > > > > Currently we use writecombine mappings for everything, although 
> > > > > > there
> > > > > > are some cases that we'd like to use cached (but have not merged
> > > > > > patches that would give userspace a way to flush/invalidate)
> > > > > >
> > > > >
> > > > > LLC/system cache doesn't have a relationship with the CPU cache.  Its
> > > > > just a
> > > > > little accelerator that sits on the connection from the GPU to DDR and
> > > > > caches
> > > > > accesses. The hint that Sai is suggesting is used to mark the buffers 
> > > > > as
> > > > > 'no-write-allocate' to prevent GPU write operations from being cached 
> > > > > in
> > > > > the LLC
> > > > > which a) isn't interesting and b) takes up cache space for read
> > > > > operations.
> > > > >
> > > > > Its easiest to think of the LLC as a bonus accelerator that has no 
> > > > > cost
> > > > > for
> > > > > us to use outside of the unfortunate per buffer hint.
> > > > >
> > > > > We do have to worry about the CPU cache w.r.t I/O coherency (which is 
> > > > > a
> > > > > different hint) and in that case we have all of concerns that Will
> > > > > identified.
> > > > >
> > > >
> > > > For mismatched outer cacheability attributes which Will mentioned, I was
> > > > referring to [1] in android kernel.
> > >
> > > I've lost track of the conversation here :/
> > >
> > > When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
> > > into the CPU and with what attributes? Rob said "writecombine for
> > > everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
> >
> > Currently userspace asks for everything WC, so pgprot_writecombine()
> >
> > The kernel doesn't enforce this, but so far provides no UAPI to do
> > anything useful with non-coherent cached mappings (although there is
> > interest to support this)
> >
>
> btw, I'm looking at a benchmark (gl_driver2_off) where (after some
> other in-flight optimizations land) we end up bottlenecked on writing
> to WC cmdstream buffers.  I assume in the current state, WC goes all
> the way to main memory rather than just to system cache?
>

oh, I guess this (mentioned earlier in thread) is what I really want
for this benchmark:

https://android-review.googlesource.com/c/kernel/common/+/1549097/3

> BR,
> -R
>
> > BR,
> > -R
> >
> > > Finally, we need to be careful when we use the word "hint" as "allocation
> > > hint" has a specific meaning in the architecture, and if we only mismatch 
> > > on
> > > those then we're actually ok. But I think IOMMU_LLC is more than just a
> > > hint, since it actually drives eviction policy (i.e. it enables 
> > > writeback).
> > >
> > > Sorry for the pedantry, but I just want to make sure we're all talking
> > > about the same things!
> > >
> > > Cheers,
> > >
> > > Will


Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

2021-03-16 Thread Rob Clark
On Wed, Feb 3, 2021 at 2:14 PM Rob Clark  wrote:
>
> On Wed, Feb 3, 2021 at 1:46 PM Will Deacon  wrote:
> >
> > On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
> > > On 2021-02-01 23:50, Jordan Crouse wrote:
> > > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> > > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon  wrote:
> > > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan 
> > > > > > > > wrote:
> > > > > > > > > +#define IOMMU_LLC(1 << 6)
> > > > > > > >
> > > > > > > > On reflection, I'm a bit worried about exposing this because I 
> > > > > > > > think it
> > > > > > > > will
> > > > > > > > introduce a mismatched virtual alias with the CPU (we don't 
> > > > > > > > even have a
> > > > > > > > MAIR
> > > > > > > > set up for this memory type). Now, we also have that issue for 
> > > > > > > > the PTW,
> > > > > > > > but
> > > > > > > > since we always use cache maintenance (i.e. the streaming API) 
> > > > > > > > for
> > > > > > > > publishing the page-tables to a non-coheren walker, it works 
> > > > > > > > out.
> > > > > > > > However,
> > > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API 
> > > > > > > > coherent
> > > > > > > > allocation, then they're potentially in for a nasty surprise 
> > > > > > > > due to the
> > > > > > > > mismatched outer-cacheability attributes.
> > > > > > > >
> > > > > > >
> > > > > > > Can't we add the syscached memory type similar to what is done on 
> > > > > > > android?
> > > > > >
> > > > > > Maybe. How does the GPU driver map these things on the CPU side?
> > > > >
> > > > > Currently we use writecombine mappings for everything, although there
> > > > > are some cases that we'd like to use cached (but have not merged
> > > > > patches that would give userspace a way to flush/invalidate)
> > > > >
> > > >
> > > > LLC/system cache doesn't have a relationship with the CPU cache.  Its
> > > > just a
> > > > little accelerator that sits on the connection from the GPU to DDR and
> > > > caches
> > > > accesses. The hint that Sai is suggesting is used to mark the buffers as
> > > > 'no-write-allocate' to prevent GPU write operations from being cached in
> > > > the LLC
> > > > which a) isn't interesting and b) takes up cache space for read
> > > > operations.
> > > >
> > > > Its easiest to think of the LLC as a bonus accelerator that has no cost
> > > > for
> > > > us to use outside of the unfortunate per buffer hint.
> > > >
> > > > We do have to worry about the CPU cache w.r.t I/O coherency (which is a
> > > > different hint) and in that case we have all of concerns that Will
> > > > identified.
> > > >
> > >
> > > For mismatched outer cacheability attributes which Will mentioned, I was
> > > referring to [1] in android kernel.
> >
> > I've lost track of the conversation here :/
> >
> > When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
> > into the CPU and with what attributes? Rob said "writecombine for
> > everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
>
> Currently userspace asks for everything WC, so pgprot_writecombine()
>
> The kernel doesn't enforce this, but so far provides no UAPI to do
> anything useful with non-coherent cached mappings (although there is
> interest to support this)
>

btw, I'm looking at a benchmark (gl_driver2_off) where (after some
other in-flight optimizations land) we end up bottlenecked on writing
to WC cmdstream buffers.  I assume in the current state, WC goes all
the way to main memory rather than just to system cache?

BR,
-R

> BR,
> -R
>
> > Finally, we need to be careful when we use the word "hint" as "allocation
> > hint" has a specific meaning in the architecture, and if we only mismatch on
> > those then we're actually ok. But I think IOMMU_LLC is more than just a
> > hint, since it actually drives eviction policy (i.e. it enables writeback).
> >
> > Sorry for the pedantry, but I just want to make sure we're all talking
> > about the same things!
> >
> > Cheers,
> >
> > Will


Re: [PATCHv2 2/2] iommu/arm-smmu-qcom: Move the adreno smmu specific impl earlier

2021-02-26 Thread Rob Clark
On Fri, Feb 26, 2021 at 9:24 AM Bjorn Andersson
 wrote:
>
> On Fri 26 Feb 03:55 CST 2021, Sai Prakash Ranjan wrote:
>
> > Adreno(GPU) SMMU and APSS(Application Processor SubSystem) SMMU
> > both implement "arm,mmu-500" in some QTI SoCs and to run through
> > adreno smmu specific implementation such as enabling split pagetables
> > support, we need to match the "qcom,adreno-smmu" compatible first
> > before apss smmu or else we will be running apps smmu implementation
> > for adreno smmu and the additional features for adreno smmu is never
> > set. For ex: we have "qcom,sc7280-smmu-500" compatible for both apps
> > and adreno smmu implementing "arm,mmu-500", so the adreno smmu
> > implementation is never reached because the current sequence checks
> > for apps smmu compatible(qcom,sc7280-smmu-500) first and runs that
> > specific impl and we never reach adreno smmu specific implementation.
> >
>
> So you're saying that you have a single SMMU instance that's compatible
> with both an entry in qcom_smmu_impl_of_match[] and "qcom,adreno-smmu"?
>
> Per your proposed change we will pick the adreno ops _only_ for this
> component, essentially disabling the non-Adreno quirks selected by the
> qcom impl. As such keeping the non-adreno compatible in the
> qcom_smmu_impl_init[] seems to only serve to obfuscate the situation.
>
> Don't we somehow need the combined set of quirks? (At least if we're
> running this with a standard UEFI based boot flow?)
>

are you thinking of the apps-smmu handover of the display context bank?
That shouldn't change; the only thing that changes is that gpu-smmu
becomes an mmu-500, whereas previously only apps-smmu was..

BR,
-R


Re: [v4] drm/msm/disp/dpu1: turn off vblank irqs aggressively in dpu driver

2021-02-22 Thread Rob Clark
On Thu, Feb 18, 2021 at 4:36 AM Kalyan Thota  wrote:
>
> Set the flag vblank_disable_immediate = true to turn off vblank irqs
> immediately as soon as drm_vblank_put is requested so that there are
> no irqs triggered during idle state. This will reduce cpu wakeups
> and help in power saving.
>
> To enable vblank_disable_immediate flag the underlying KMS driver
> needs to support high precision vblank timestamping and also a
> reliable way of providing vblank counter which is incrementing
> at the leading edge of vblank.
>
> This patch also brings in changes to support vblank_disable_immediate
> requirement in dpu driver.
>
> Changes in v1:
>  - Specify reason to add vblank timestamp support. (Rob).
>  - Add changes to provide vblank counter from dpu driver.
>
> Changes in v2:
>  - Fix warn stack reported by Rob Clark with v2 patch.
>
> Changes in v3:
>  - Move back to HW frame counter (Rob).
>

could you let me know what the delta was in v4?  (No need to resend
yet, if needed I can amend the commit msg when applying)

BR,
-R

>  Signed-off-by: Kalyan Thota 
> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c   | 80 
> ++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c| 30 
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h| 11 +++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h   |  1 +
>  .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c   | 26 +++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.c|  1 +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_intf.h|  1 +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c|  5 ++
>  8 files changed, 155 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> index d4662e8..9a80981 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> @@ -65,6 +65,83 @@ static void dpu_crtc_destroy(struct drm_crtc *crtc)
> kfree(dpu_crtc);
>  }
>
> +static struct drm_encoder *get_encoder_from_crtc(struct drm_crtc *crtc)
> +{
> +   struct drm_device *dev = crtc->dev;
> +   struct drm_encoder *encoder;
> +
> +   drm_for_each_encoder(encoder, dev)
> +   if (encoder->crtc == crtc)
> +   return encoder;
> +
> +   return NULL;
> +}
> +
> +static u32 dpu_crtc_get_vblank_counter(struct drm_crtc *crtc)
> +{
> +   struct drm_encoder *encoder;
> +
> +   encoder = get_encoder_from_crtc(crtc);
> +   if (!encoder) {
> +   DRM_ERROR("no encoder found for crtc %d\n", crtc->index);
> +   return false;
> +   }
> +
> +   return dpu_encoder_get_frame_count(encoder);
> +}
> +
> +static bool dpu_crtc_get_scanout_position(struct drm_crtc *crtc,
> +  bool in_vblank_irq,
> +  int *vpos, int *hpos,
> +  ktime_t *stime, ktime_t *etime,
> +  const struct drm_display_mode 
> *mode)
> +{
> +   unsigned int pipe = crtc->index;
> +   struct drm_encoder *encoder;
> +   int line, vsw, vbp, vactive_start, vactive_end, vfp_end;
> +
> +   encoder = get_encoder_from_crtc(crtc);
> +   if (!encoder) {
> +   DRM_ERROR("no encoder found for crtc %d\n", pipe);
> +   return false;
> +   }
> +
> +   vsw = mode->crtc_vsync_end - mode->crtc_vsync_start;
> +   vbp = mode->crtc_vtotal - mode->crtc_vsync_end;
> +
> +   /*
> +* the line counter is 1 at the start of the VSYNC pulse and VTOTAL at
> +* the end of VFP. Translate the porch values relative to the line
> +* counter positions.
> +*/
> +
> +   vactive_start = vsw + vbp + 1;
> +   vactive_end = vactive_start + mode->crtc_vdisplay;
> +
> +   /* last scan line before VSYNC */
> +   vfp_end = mode->crtc_vtotal;
> +
> +   if (stime)
> +   *stime = ktime_get();
> +
> +   line = dpu_encoder_get_linecount(encoder);
> +
> +   if (line < vactive_start)
> +   line -= vactive_start;
> +   else if (line > vactive_end)
> +   line = line - vfp_end - vactive_start;
> +   else
> +   line -= vactive_start;
> +
> +   *vpos = line;
> +   *hpos = 0;
> +
> +   if (etime)
> +   *etime = ktime_get();
> +
> +   return true;
> +}
> +
>  static void _dpu_crtc_setup_blend_cfg(struct dpu_crtc_mixer *mixer,
> struct d

Re: [PATCH] drm/msm/a6xx: fix for kernels without CONFIG_NVMEM

2021-02-22 Thread Rob Clark
On Mon, Feb 22, 2021 at 7:45 AM Akhil P Oommen  wrote:
>
> On 2/19/2021 9:30 PM, Rob Clark wrote:
> > On Fri, Feb 19, 2021 at 2:44 AM Akhil P Oommen  
> > wrote:
> >>
> >> On 2/18/2021 9:41 PM, Rob Clark wrote:
> >>> On Thu, Feb 18, 2021 at 4:28 AM Akhil P Oommen  
> >>> wrote:
> >>>>
> >>>> On 2/18/2021 2:05 AM, Jonathan Marek wrote:
> >>>>> On 2/17/21 3:18 PM, Rob Clark wrote:
> >>>>>> On Wed, Feb 17, 2021 at 11:08 AM Jordan Crouse
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>> On Wed, Feb 17, 2021 at 07:14:16PM +0530, Akhil P Oommen wrote:
> >>>>>>>> On 2/17/2021 8:36 AM, Rob Clark wrote:
> >>>>>>>>> On Tue, Feb 16, 2021 at 12:10 PM Jonathan Marek 
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Ignore nvmem_cell_get() EOPNOTSUPP error in the same way as a
> >>>>>>>>>> ENOENT error,
> >>>>>>>>>> to fix the case where the kernel was compiled without CONFIG_NVMEM.
> >>>>>>>>>>
> >>>>>>>>>> Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
> >>>>>>>>>> Signed-off-by: Jonathan Marek 
> >>>>>>>>>> ---
> >>>>>>>>>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++---
> >>>>>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>>>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>>>> index ba8e9d3cf0fe..7fe5d97606aa 100644
> >>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>>>> @@ -1356,10 +1356,10 @@ static int a6xx_set_supported_hw(struct
> >>>>>>>>>> device *dev, struct a6xx_gpu *a6xx_gpu,
> >>>>>>>>>>
> >>>>>>>>>>cell = nvmem_cell_get(dev, "speed_bin");
> >>>>>>>>>>/*
> >>>>>>>>>> -* -ENOENT means that the platform doesn't support
> >>>>>>>>>> speedbin which is
> >>>>>>>>>> -* fine
> >>>>>>>>>> +* -ENOENT means no speed bin in device tree,
> >>>>>>>>>> +* -EOPNOTSUPP means kernel was built without CONFIG_NVMEM
> >>>>>>>>>
> >>>>>>>>> very minor nit, it would be nice to at least preserve the gist of 
> >>>>>>>>> the
> >>>>>>>>> "which is fine" (ie. some variation of "this is an optional thing 
> >>>>>>>>> and
> >>>>>>>>> things won't catch fire without it" ;-))
> >>>>>>>>>
> >>>>>>>>> (which is, I believe, is true, hopefully Akhil could confirm.. if 
> >>>>>>>>> not
> >>>>>>>>> we should have a harder dependency on CONFIG_NVMEM..)
> >>>>>>>> IIRC, if the gpu opp table in the DT uses the 'opp-supported-hw'
> >>>>>>>> property,
> >>>>>>>> we will see some error during boot up if we don't call
> >>>>>>>> dev_pm_opp_set_supported_hw(). So calling "nvmem_cell_get(dev,
> >>>>>>>> "speed_bin")"
> >>>>>>>> is a way to test this.
> >>>>>>>>
> >>>>>>>> If there is no other harm, we can put a hard dependency on
> >>>>>>>> CONFIG_NVMEM.
> >>>>>>>
> >>>>>>> I'm not sure if we want to go this far given the squishiness about
> >>>>>>> module
> >>>>>>> dependencies. As far as I know we are the only driver that uses this
> >>>>>>> seriously
> >>>>>>> on QCOM SoCs and this is only needed for certain targets. I don't
> >>>>>>> know if we
> >>>>>>> want to force every target to build NVMEM and QFPROM on our behalf.
> >>>>>>> But maybe
> >>>>>>> I'm just saying that because Kconfig dependencies tend to break my
> >>>>>>> brain (and
> >>>>>>> then Arnd has to send a patch to fix it).
> >>>>>>>
> >>>>>>
> >>>>>> Hmm, good point.. looks like CONFIG_NVMEM itself doesn't have any
> >>>>>> other dependencies, so I suppose it wouldn't be the end of the world
> >>>>>> to select that.. but I guess we don't want to require QFPROM
> >>>>>>
> >>>>>> I guess at the end of the day, what is the failure mode if you have a
> >>>>>> speed-bin device, but your kernel config misses QFPROM (and possibly
> >>>>>> NVMEM)?  If the result is just not having the highest clk rate(s)
> >>>>
> >>>> Atleast on sc7180's gpu, using an unsupported FMAX breaks gmu. It won't
> >>>> be very obvious what went wrong when this happens!
> >>>
> >>> Ugg, ok..
> >>>
> >>> I suppose we could select NVMEM, but not QFPROM, and then the case
> >>> where QFPROM is not enabled on platforms that have the speed-bin field
> >>> in DT will fail gracefully and all other platforms would continue on
> >>> happily?
> >>>
> >>> BR,
> >>> -R
> >>
> >> Sounds good to me.
> >>
> >
> > You probably should do a quick test with NVMEM enabled but QFPROM
> > disabled to confirm my theory, but I *think* that should work
> >
> > BR,
> > -R
> >
>
> I tried it on an sc7180 device. The suggested combo (CONFIG_NVMEM + no
> CONFIG_QCOM_QFPROM) makes the gpu probe fail with error "failed to read
> speed-bin. Some OPPs may not be supported by hardware". This is good
> enough clue for the developer that he should fix the broken speedbin
> detection.
>

Ok, great.. then sounds like selecting NVMEM is a good approach

BR,
-R


Re: [PATCH] drm/msm/a6xx: fix for kernels without CONFIG_NVMEM

2021-02-19 Thread Rob Clark
On Fri, Feb 19, 2021 at 2:44 AM Akhil P Oommen  wrote:
>
> On 2/18/2021 9:41 PM, Rob Clark wrote:
> > On Thu, Feb 18, 2021 at 4:28 AM Akhil P Oommen  
> > wrote:
> >>
> >> On 2/18/2021 2:05 AM, Jonathan Marek wrote:
> >>> On 2/17/21 3:18 PM, Rob Clark wrote:
> >>>> On Wed, Feb 17, 2021 at 11:08 AM Jordan Crouse
> >>>>  wrote:
> >>>>>
> >>>>> On Wed, Feb 17, 2021 at 07:14:16PM +0530, Akhil P Oommen wrote:
> >>>>>> On 2/17/2021 8:36 AM, Rob Clark wrote:
> >>>>>>> On Tue, Feb 16, 2021 at 12:10 PM Jonathan Marek 
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Ignore nvmem_cell_get() EOPNOTSUPP error in the same way as a
> >>>>>>>> ENOENT error,
> >>>>>>>> to fix the case where the kernel was compiled without CONFIG_NVMEM.
> >>>>>>>>
> >>>>>>>> Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
> >>>>>>>> Signed-off-by: Jonathan Marek 
> >>>>>>>> ---
> >>>>>>>>drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++---
> >>>>>>>>1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>> index ba8e9d3cf0fe..7fe5d97606aa 100644
> >>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>>>> @@ -1356,10 +1356,10 @@ static int a6xx_set_supported_hw(struct
> >>>>>>>> device *dev, struct a6xx_gpu *a6xx_gpu,
> >>>>>>>>
> >>>>>>>>   cell = nvmem_cell_get(dev, "speed_bin");
> >>>>>>>>   /*
> >>>>>>>> -* -ENOENT means that the platform doesn't support
> >>>>>>>> speedbin which is
> >>>>>>>> -* fine
> >>>>>>>> +* -ENOENT means no speed bin in device tree,
> >>>>>>>> +* -EOPNOTSUPP means kernel was built without CONFIG_NVMEM
> >>>>>>>
> >>>>>>> very minor nit, it would be nice to at least preserve the gist of the
> >>>>>>> "which is fine" (ie. some variation of "this is an optional thing and
> >>>>>>> things won't catch fire without it" ;-))
> >>>>>>>
> >>>>>>> (which is, I believe, is true, hopefully Akhil could confirm.. if not
> >>>>>>> we should have a harder dependency on CONFIG_NVMEM..)
> >>>>>> IIRC, if the gpu opp table in the DT uses the 'opp-supported-hw'
> >>>>>> property,
> >>>>>> we will see some error during boot up if we don't call
> >>>>>> dev_pm_opp_set_supported_hw(). So calling "nvmem_cell_get(dev,
> >>>>>> "speed_bin")"
> >>>>>> is a way to test this.
> >>>>>>
> >>>>>> If there is no other harm, we can put a hard dependency on
> >>>>>> CONFIG_NVMEM.
> >>>>>
> >>>>> I'm not sure if we want to go this far given the squishiness about
> >>>>> module
> >>>>> dependencies. As far as I know we are the only driver that uses this
> >>>>> seriously
> >>>>> on QCOM SoCs and this is only needed for certain targets. I don't
> >>>>> know if we
> >>>>> want to force every target to build NVMEM and QFPROM on our behalf.
> >>>>> But maybe
> >>>>> I'm just saying that because Kconfig dependencies tend to break my
> >>>>> brain (and
> >>>>> then Arnd has to send a patch to fix it).
> >>>>>
> >>>>
> >>>> Hmm, good point.. looks like CONFIG_NVMEM itself doesn't have any
> >>>> other dependencies, so I suppose it wouldn't be the end of the world
> >>>> to select that.. but I guess we don't want to require QFPROM
> >>>>
> >>>> I guess at the end of the day, what is the failure mode if you have a
> >>>> speed-bin device, but your kernel config misses QFPROM (and possibly
> >>>> NVMEM)?  If the result is just not having the highest clk rate(s)
> >>
> >> Atleast on sc7180's gpu, using an unsupported FMAX breaks gmu. It won't
> >> be very obvious what went wrong when this happens!
> >
> > Ugg, ok..
> >
> > I suppose we could select NVMEM, but not QFPROM, and then the case
> > where QFPROM is not enabled on platforms that have the speed-bin field
> > in DT will fail gracefully and all other platforms would continue on
> > happily?
> >
> > BR,
> > -R
>
> Sounds good to me.
>

You probably should do a quick test with NVMEM enabled but QFPROM
disabled to confirm my theory, but I *think* that should work

BR,
-R


Re: [PATCH] drm/msm/a6xx: fix for kernels without CONFIG_NVMEM

2021-02-18 Thread Rob Clark
On Thu, Feb 18, 2021 at 4:28 AM Akhil P Oommen  wrote:
>
> On 2/18/2021 2:05 AM, Jonathan Marek wrote:
> > On 2/17/21 3:18 PM, Rob Clark wrote:
> >> On Wed, Feb 17, 2021 at 11:08 AM Jordan Crouse
> >>  wrote:
> >>>
> >>> On Wed, Feb 17, 2021 at 07:14:16PM +0530, Akhil P Oommen wrote:
> >>>> On 2/17/2021 8:36 AM, Rob Clark wrote:
> >>>>> On Tue, Feb 16, 2021 at 12:10 PM Jonathan Marek 
> >>>>> wrote:
> >>>>>>
> >>>>>> Ignore nvmem_cell_get() EOPNOTSUPP error in the same way as a
> >>>>>> ENOENT error,
> >>>>>> to fix the case where the kernel was compiled without CONFIG_NVMEM.
> >>>>>>
> >>>>>> Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
> >>>>>> Signed-off-by: Jonathan Marek 
> >>>>>> ---
> >>>>>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++---
> >>>>>>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>> index ba8e9d3cf0fe..7fe5d97606aa 100644
> >>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> >>>>>> @@ -1356,10 +1356,10 @@ static int a6xx_set_supported_hw(struct
> >>>>>> device *dev, struct a6xx_gpu *a6xx_gpu,
> >>>>>>
> >>>>>>  cell = nvmem_cell_get(dev, "speed_bin");
> >>>>>>  /*
> >>>>>> -* -ENOENT means that the platform doesn't support
> >>>>>> speedbin which is
> >>>>>> -* fine
> >>>>>> +* -ENOENT means no speed bin in device tree,
> >>>>>> +* -EOPNOTSUPP means kernel was built without CONFIG_NVMEM
> >>>>>
> >>>>> very minor nit, it would be nice to at least preserve the gist of the
> >>>>> "which is fine" (ie. some variation of "this is an optional thing and
> >>>>> things won't catch fire without it" ;-))
> >>>>>
> >>>>> (which is, I believe, is true, hopefully Akhil could confirm.. if not
> >>>>> we should have a harder dependency on CONFIG_NVMEM..)
> >>>> IIRC, if the gpu opp table in the DT uses the 'opp-supported-hw'
> >>>> property,
> >>>> we will see some error during boot up if we don't call
> >>>> dev_pm_opp_set_supported_hw(). So calling "nvmem_cell_get(dev,
> >>>> "speed_bin")"
> >>>> is a way to test this.
> >>>>
> >>>> If there is no other harm, we can put a hard dependency on
> >>>> CONFIG_NVMEM.
> >>>
> >>> I'm not sure if we want to go this far given the squishiness about
> >>> module
> >>> dependencies. As far as I know we are the only driver that uses this
> >>> seriously
> >>> on QCOM SoCs and this is only needed for certain targets. I don't
> >>> know if we
> >>> want to force every target to build NVMEM and QFPROM on our behalf.
> >>> But maybe
> >>> I'm just saying that because Kconfig dependencies tend to break my
> >>> brain (and
> >>> then Arnd has to send a patch to fix it).
> >>>
> >>
> >> Hmm, good point.. looks like CONFIG_NVMEM itself doesn't have any
> >> other dependencies, so I suppose it wouldn't be the end of the world
> >> to select that.. but I guess we don't want to require QFPROM
> >>
> >> I guess at the end of the day, what is the failure mode if you have a
> >> speed-bin device, but your kernel config misses QFPROM (and possibly
> >> NVMEM)?  If the result is just not having the highest clk rate(s)
>
> Atleast on sc7180's gpu, using an unsupported FMAX breaks gmu. It won't
> be very obvious what went wrong when this happens!

Ugg, ok..

I suppose we could select NVMEM, but not QFPROM, and then the case
where QFPROM is not enabled on platforms that have the speed-bin field
in DT will fail gracefully and all other platforms would continue on
happily?

BR,
-R

>
> >> available, that isn't the end of the world.  But if it makes things
> >> not-work, that is sub-optimal.  Generally, especially on ARM, kconfig
> >> seems to be way harder than it should be to build a kernel that works,
> >> if we could somehow not add to that problem (for both people with a6xx
> >> and older gens) that would be nice ;-)
> >>
> >
> > There is a "imply" kconfig option which solves exactly this problem.
> > (you would "imply NVMEM" instead of "select NVMEM". then it would be
> > possible to disable NVMEM but it would get enabled by default)
> >
> >> BR,
> >> -R
> >>
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
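The select-vs-imply trade-off being debated resolves roughly like this in Kconfig terms (a sketch only, not necessarily the change that was merged):

```kconfig
# drivers/gpu/drm/msm/Kconfig (sketch)
config DRM_MSM
	tristate "MSM DRM"
	depends on DRM
	# 'select NVMEM' forces NVMEM on whenever DRM_MSM is enabled,
	# so nvmem_cell_get() can never return -EOPNOTSUPP.
	select NVMEM
	# 'imply QCOM_QFPROM' (Jonathan's suggestion) only defaults the
	# symbol to on; the user may still disable it, in which case
	# the speed-bin read fails with a clear "failed to read
	# speed-bin" probe error rather than a silently broken GMU.
	imply QCOM_QFPROM
```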


Re: [PATCH] drm/msm/a6xx: fix for kernels without CONFIG_NVMEM

2021-02-17 Thread Rob Clark
On Wed, Feb 17, 2021 at 11:08 AM Jordan Crouse  wrote:
>
> On Wed, Feb 17, 2021 at 07:14:16PM +0530, Akhil P Oommen wrote:
> > On 2/17/2021 8:36 AM, Rob Clark wrote:
> > >On Tue, Feb 16, 2021 at 12:10 PM Jonathan Marek  wrote:
> > >>
> > >>Ignore nvmem_cell_get() EOPNOTSUPP error in the same way as a ENOENT 
> > >>error,
> > >>to fix the case where the kernel was compiled without CONFIG_NVMEM.
> > >>
> > >>Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
> > >>Signed-off-by: Jonathan Marek 
> > >>---
> > >>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++---
> > >>  1 file changed, 3 insertions(+), 3 deletions(-)
> > >>
> > >>diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > >>b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > >>index ba8e9d3cf0fe..7fe5d97606aa 100644
> > >>--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > >>+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > >>@@ -1356,10 +1356,10 @@ static int a6xx_set_supported_hw(struct device 
> > >>*dev, struct a6xx_gpu *a6xx_gpu,
> > >>
> > >> cell = nvmem_cell_get(dev, "speed_bin");
> > >> /*
> > >>-* -ENOENT means that the platform doesn't support speedbin which 
> > >>is
> > >>-* fine
> > >>+* -ENOENT means no speed bin in device tree,
> > >>+* -EOPNOTSUPP means kernel was built without CONFIG_NVMEM
> > >
> > >very minor nit, it would be nice to at least preserve the gist of the
> > >"which is fine" (ie. some variation of "this is an optional thing and
> > >things won't catch fire without it" ;-))
> > >
> > >(which is, I believe, is true, hopefully Akhil could confirm.. if not
> > >we should have a harder dependency on CONFIG_NVMEM..)
> > IIRC, if the gpu opp table in the DT uses the 'opp-supported-hw' property,
> > we will see some error during boot up if we don't call
> > dev_pm_opp_set_supported_hw(). So calling "nvmem_cell_get(dev, "speed_bin")"
> > is a way to test this.
> >
> > If there is no other harm, we can put a hard dependency on CONFIG_NVMEM.
>
> I'm not sure if we want to go this far given the squishiness about module
> dependencies. As far as I know we are the only driver that uses this seriously
> on QCOM SoCs and this is only needed for certain targets. I don't know if we
> want to force every target to build NVMEM and QFPROM on our behalf. But maybe
> I'm just saying that because Kconfig dependencies tend to break my brain (and
> then Arnd has to send a patch to fix it).
>

Hmm, good point.. looks like CONFIG_NVMEM itself doesn't have any
other dependencies, so I suppose it wouldn't be the end of the world
to select that.. but I guess we don't want to require QFPROM

I guess at the end of the day, what is the failure mode if you have a
speed-bin device, but your kernel config misses QFPROM (and possibly
NVMEM)?  If the result is just not having the highest clk rate(s)
available, that isn't the end of the world.  But if it makes things
not-work, that is sub-optimal.  Generally, especially on ARM, kconfig
seems to be way harder than it should be to build a kernel that works,
if we could somehow not add to that problem (for both people with a6xx
and older gens) that would be nice ;-)

BR,
-R


Re: [PATCH] drm/msm/a6xx: fix for kernels without CONFIG_NVMEM

2021-02-16 Thread Rob Clark
On Tue, Feb 16, 2021 at 12:10 PM Jonathan Marek  wrote:
>
> Ignore nvmem_cell_get() EOPNOTSUPP error in the same way as a ENOENT error,
> to fix the case where the kernel was compiled without CONFIG_NVMEM.
>
> Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
> Signed-off-by: Jonathan Marek 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index ba8e9d3cf0fe..7fe5d97606aa 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1356,10 +1356,10 @@ static int a6xx_set_supported_hw(struct device *dev, 
> struct a6xx_gpu *a6xx_gpu,
>
> cell = nvmem_cell_get(dev, "speed_bin");
> /*
> -* -ENOENT means that the platform doesn't support speedbin which is
> -* fine
> +* -ENOENT means no speed bin in device tree,
> +* -EOPNOTSUPP means kernel was built without CONFIG_NVMEM

very minor nit, it would be nice to at least preserve the gist of the
"which is fine" (ie. some variation of "this is an optional thing and
things won't catch fire without it" ;-))

(which is, I believe, true.. hopefully Akhil can confirm; if not
we should have a harder dependency on CONFIG_NVMEM..)

BR,
-R

>  */
> -   if (PTR_ERR(cell) == -ENOENT)
> +   if (PTR_ERR(cell) == -ENOENT || PTR_ERR(cell) == -EOPNOTSUPP)
> return 0;
> else if (IS_ERR(cell)) {
> DRM_DEV_ERROR(dev,
> --
> 2.26.1
>


Re: [PATCH v2 1/2] drm/msm: add compatibles for sm8150/sm8250 display

2021-02-16 Thread Rob Clark
On Tue, Feb 16, 2021 at 10:06 AM Jonathan Marek  wrote:
>
> On 2/16/21 11:54 AM, Dmitry Baryshkov wrote:
> > On Mon, 15 Feb 2021 at 19:25, Jonathan Marek  wrote:
> >>
> >> The driver already has support for sm8150/sm8250, but the compatibles were
> >> never added.
> >>
> >> Also inverse the non-mdp4 condition in add_display_components() to avoid
> >> having to check every new compatible in the condition.
> >>
> >> Signed-off-by: Jonathan Marek 
> >> ---
> >>   Documentation/devicetree/bindings/display/msm/dpu.txt | 4 ++--
> >>   drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c   | 2 ++
> >>   drivers/gpu/drm/msm/msm_drv.c | 6 +++---
> >>   3 files changed, 7 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/display/msm/dpu.txt 
> >> b/Documentation/devicetree/bindings/display/msm/dpu.txt
> >> index 551ae26f60da..5763f43200a0 100644
> >> --- a/Documentation/devicetree/bindings/display/msm/dpu.txt
> >> +++ b/Documentation/devicetree/bindings/display/msm/dpu.txt
> >> @@ -8,7 +8,7 @@ The DPU display controller is found in SDM845 SoC.
> >>
> >>   MDSS:
> >>   Required properties:
> >> -- compatible:  "qcom,sdm845-mdss", "qcom,sc7180-mdss"
> >> +- compatible:  "qcom,sdm845-mdss", "qcom,sc7180-mdss", 
> >> "qcom,sm8150-mdss", "qcom,sm8250-mdss"
> >>   - reg: physical base address and length of contoller's registers.
> >>   - reg-names: register region names. The following region is required:
> >> * "mdss"
> >> @@ -41,7 +41,7 @@ Optional properties:
> >>
> >>   MDP:
> >>   Required properties:
> >> -- compatible: "qcom,sdm845-dpu", "qcom,sc7180-dpu"
> >> +- compatible: "qcom,sdm845-dpu", "qcom,sc7180-dpu", "qcom,sm8150-dpu", 
> >> "qcom,sm8250-dpu"
> >>   - reg: physical base address and length of controller's registers.
> >>   - reg-names : register region names. The following region is required:
> >> * "mdp"
> >
> > These two chunks should probably go to the separate patch 'dt-bindings:...'.
> >
>
> In this case I think its better to have this change in the same patch,
> but maybe one of the Robs will disagree.

I *think* typically the reason to split dt bindings into their own
patch is that devicetree@ list isn't interested in reviewing driver
changes, just binding changes..

In this case since it is just adding a compatible I think it is ok..
(or at least ok by me, but maybe other-Rob disagrees ;-))

> > Also, could you please pinpoint the reason for adding more
> > compatibility strings, while they map to the same internal data?
> > I think we might want instead to use some generic name for the dpu
> > block, like "qcom,dpu" or "qcom,mdp-dpu" instead of specifying the
> > platform name.
> >
>
> sdm845 and sc7180 aren't using generic compatibles, this is just being
> consistent with that.

It is good to have a device-specific compatible up front, even if we
fall back to the more generic one for matching.. just in case we find a
reason for needing it later

BR,
-R

> >
> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c 
> >> b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> >> index 5a8e3e1fc48c..fff12a4c8bfc 100644
> >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
> >> @@ -1219,6 +1219,8 @@ static const struct dev_pm_ops dpu_pm_ops = {
> >>   static const struct of_device_id dpu_dt_match[] = {
> >>  { .compatible = "qcom,sdm845-dpu", },
> >>  { .compatible = "qcom,sc7180-dpu", },
> >> +   { .compatible = "qcom,sm8150-dpu", },
> >> +   { .compatible = "qcom,sm8250-dpu", },
> >>  {}
> >>   };
> >>   MODULE_DEVICE_TABLE(of, dpu_dt_match);
> >> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> >> index 94525ac76d4e..928f13d4bfbc 100644
> >> --- a/drivers/gpu/drm/msm/msm_drv.c
> >> +++ b/drivers/gpu/drm/msm/msm_drv.c
> >> @@ -1185,9 +1185,7 @@ static int add_display_components(struct device *dev,
> >>   * Populate the children devices, find the MDP5/DPU node, and 
> >> then add
> >>   * the interfaces to our components list.
> >>   */
> >> -   if (of_device_is_compatible(dev->of_node, "qcom,mdss") ||
> >> -   of_device_is_compatible(dev->of_node, "qcom,sdm845-mdss") ||
> >> -   of_device_is_compatible(dev->of_node, "qcom,sc7180-mdss")) {
> >> +   if (!of_device_is_compatible(dev->of_node, "qcom,mdp4")) {
> >>  ret = of_platform_populate(dev->of_node, NULL, NULL, dev);
> >>  if (ret) {
> >>  DRM_DEV_ERROR(dev, "failed to populate children 
> >> devices\n");
> >> @@ -1320,6 +1318,8 @@ static const struct of_device_id dt_match[] = {
> >>  { .compatible = "qcom,mdss", .data = (void *)KMS_MDP5 },
> >>  { .compatible = "qcom,sdm845-mdss", .data = (void *)KMS_DPU },
> >>  { .compatible = "qcom,sc7180-mdss", .data = (void *)KMS_DPU },
> >> +   { .compatible = "qcom,sm8150-mdss", .data = (void *)KMS_DPU },
> >> +  
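The specific-plus-generic fallback pattern mentioned earlier in the thread would look something like this in a board DTS. This is purely illustrative: the generic "qcom,dpu" string is Dmitry's hypothetical suggestion and is not defined by the bindings in this series:

```dts
/* Hypothetical: device-specific compatible first so it can take
 * precedence later, generic fallback ("qcom,dpu") second. */
display-controller@ae01000 {
	compatible = "qcom,sm8250-dpu", "qcom,dpu";
	/* ... */
};
```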

Re: [Freedreno] [v2] drm/msm/disp/dpu1: turn off vblank irqs aggressively in dpu driver

2021-02-16 Thread Rob Clark
On Tue, Feb 16, 2021 at 7:21 AM  wrote:
>
> On 2021-02-12 23:19, Rob Clark wrote:
> > On Thu, Feb 11, 2021 at 7:31 AM  wrote:
> >>
> >> On 2021-02-11 01:56, Rob Clark wrote:
> >> > On Wed, Feb 10, 2021 at 3:41 AM  wrote:
> >> >>
> >> >> On 2021-02-01 00:46, Rob Clark wrote:
> >> >> > On Fri, Dec 18, 2020 at 2:27 AM Kalyan Thota 
> >> >> > wrote:
> >> >> >>
> >> >> >> Set the flag vblank_disable_immediate = true to turn off vblank irqs
> >> >> >> immediately as soon as drm_vblank_put is requested so that there are
> >> >> >> no irqs triggered during idle state. This will reduce cpu wakeups
> >> >> >> and help in power saving.
> >> >> >>
> >> >> >> To enable vblank_disable_immediate flag the underlying KMS driver
> >> >> >> needs to support high precision vblank timestamping and also a
> >> >> >> reliable way of providing vblank counter which is incrementing
> >> >> >> at the leading edge of vblank.
> >> >> >>
> >> >> >> This patch also brings in changes to support vblank_disable_immediate
> >> >> >> requirement in dpu driver.
> >> >> >>
> >> >> >> Changes in v1:
> >> >> >>  - Specify reason to add vblank timestamp support. (Rob)
> >> >> >>  - Add changes to provide vblank counter from dpu driver.
> >> >> >>
> >> >> >> Signed-off-by: Kalyan Thota 
> >> >> >
> >> >> > This seems to be triggering:
> >> >> >
> >> >> > [  +0.032668] [ cut here ]
> >> >> > [  +0.004759] msm ae0.mdss: drm_WARN_ON_ONCE(cur_vblank !=
> >> >> > vblank->last)
> >> >> > [  +0.24] WARNING: CPU: 0 PID: 362 at
> >> >> > drivers/gpu/drm/drm_vblank.c:354 drm_update_vblank_count+0x1e4/0x258
> >> >> > [  +0.017154] Modules linked in: joydev
> >> >> > [  +0.003784] CPU: 0 PID: 362 Comm: frecon Not tainted
> >> >> > 5.11.0-rc5-00037-g33d3504871dd #2
> >> >> > [  +0.008135] Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> >> >> > [  +0.006167] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
> >> >> > [  +0.006169] pc : drm_update_vblank_count+0x1e4/0x258
> >> >> > [  +0.005105] lr : drm_update_vblank_count+0x1e4/0x258
> >> >> > [  +0.005106] sp : ffc010003b70
> >> >> > [  +0.003409] x29: ffc010003b70 x28: ff80855d9d98
> >> >> > [  +0.005466] x27:  x26: 00fe502a
> >> >> > [  +0.005458] x25: 0001 x24: 0001
> >> >> > [  +0.005466] x23: 0001 x22: ff808561ce80
> >> >> > [  +0.005465] x21:  x20: 
> >> >> > [  +0.005468] x19: ff80850d6800 x18: 
> >> >> > [  +0.005466] x17:  x16: 
> >> >> > [  +0.005465] x15: 000a x14: 263b
> >> >> > [  +0.005466] x13: 0006 x12: 
> >> >> > [  +0.005465] x11: 0010 x10: ffc090003797
> >> >> > [  +0.005466] x9 : ffed200e2a8c x8 : 
> >> >> > [  +0.005466] x7 :  x6 : ffed213b2b51
> >> >> > [  +0.005465] x5 : c000dfff x4 : ffed21218048
> >> >> > [  +0.005465] x3 :  x2 : 
> >> >> > [  +0.005465] x1 :  x0 : 
> >> >> > [  +0.005466] Call trace:
> >> >> > [  +0.002520]  drm_update_vblank_count+0x1e4/0x258
> >> >> > [  +0.004748]  drm_handle_vblank+0xd0/0x35c
> >> >> > [  +0.004130]  drm_crtc_handle_vblank+0x24/0x30
> >> >> > [  +0.004487]  dpu_crtc_vblank_callback+0x3c/0xc4
> >> >> > [  +0.004662]  dpu_encoder_vblank_callback+0x70/0xc4
> >> >> > [  +0.004931]  dpu_encoder_phys_vid_vblank_irq+0x50/0x12c
> >> >> > [  +0.005378]  dpu_core_irq_callback_handler+0xf4/0xfc
> >> >> > [  +0.005107]  dpu_hw_intr_dispatch_irq+0x100/0x120
> >> >> > [  +0.004834]  dpu_core_irq+0x44/0x5c
> >> >> > [  +0.003597]  d

Re: [v3] drm/msm/disp/dpu1: turn off vblank irqs aggressively in dpu

2021-02-12 Thread Rob Clark
On Fri, Feb 12, 2021 at 3:45 AM Kalyan Thota  wrote:
>
> Set the flag vblank_disable_immediate = true to turn off vblank irqs
> immediately as soon as drm_vblank_put is requested so that there are
> no irqs triggered during idle state. This will reduce cpu wakeups
> and help in power saving.
>
> To enable vblank_disable_immediate flag the underlying KMS driver
> needs to support high precision vblank timestamping and also a
> reliable way of providing vblank counter which is incrementing
> at the leading edge of vblank.
>
> This patch also brings in changes to support vblank_disable_immediate
> requirement in dpu driver.
>
> Changes in v1:
>  - Specify reason to add vblank timestamp support. (Rob)
>  - Add changes to provide vblank counter from dpu driver.
>
> Changes in v2:
>  - fix warn stack reported by Rob with v2 patch
>
> Signed-off-by: Kalyan Thota 
> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c| 80 
> +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 28 +-
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h | 11 
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c |  5 ++
>  4 files changed, 123 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> index d4662e8..9a80981 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> @@ -65,6 +65,83 @@ static void dpu_crtc_destroy(struct drm_crtc *crtc)
> kfree(dpu_crtc);
>  }
>
> +static struct drm_encoder *get_encoder_from_crtc(struct drm_crtc *crtc)
> +{
> +   struct drm_device *dev = crtc->dev;
> +   struct drm_encoder *encoder;
> +
> +   drm_for_each_encoder(encoder, dev)
> +   if (encoder->crtc == crtc)
> +   return encoder;
> +
> +   return NULL;
> +}
> +
> +static u32 dpu_crtc_get_vblank_counter(struct drm_crtc *crtc)
> +{
> +   struct drm_encoder *encoder;
> +
> +   encoder = get_encoder_from_crtc(crtc);
> +   if (!encoder) {
> +   DRM_ERROR("no encoder found for crtc %d\n", crtc->index);
> +   return false;
> +   }
> +
> +   return dpu_encoder_get_frame_count(encoder);
> +}
> +
> +static bool dpu_crtc_get_scanout_position(struct drm_crtc *crtc,
> +  bool in_vblank_irq,
> +  int *vpos, int *hpos,
> +  ktime_t *stime, ktime_t *etime,
> +  const struct drm_display_mode 
> *mode)
> +{
> +   unsigned int pipe = crtc->index;
> +   struct drm_encoder *encoder;
> +   int line, vsw, vbp, vactive_start, vactive_end, vfp_end;
> +
> +   encoder = get_encoder_from_crtc(crtc);
> +   if (!encoder) {
> +   DRM_ERROR("no encoder found for crtc %d\n", pipe);
> +   return false;
> +   }
> +
> +   vsw = mode->crtc_vsync_end - mode->crtc_vsync_start;
> +   vbp = mode->crtc_vtotal - mode->crtc_vsync_end;
> +
> +   /*
> +* the line counter is 1 at the start of the VSYNC pulse and VTOTAL at
> +* the end of VFP. Translate the porch values relative to the line
> +* counter positions.
> +*/
> +
> +   vactive_start = vsw + vbp + 1;
> +   vactive_end = vactive_start + mode->crtc_vdisplay;
> +
> +   /* last scan line before VSYNC */
> +   vfp_end = mode->crtc_vtotal;
> +
> +   if (stime)
> +   *stime = ktime_get();
> +
> +   line = dpu_encoder_get_linecount(encoder);
> +
> +   if (line < vactive_start)
> +   line -= vactive_start;
> +   else if (line > vactive_end)
> +   line = line - vfp_end - vactive_start;
> +   else
> +   line -= vactive_start;
> +
> +   *vpos = line;
> +   *hpos = 0;
> +
> +   if (etime)
> +   *etime = ktime_get();
> +
> +   return true;
> +}
> +
>  static void _dpu_crtc_setup_blend_cfg(struct dpu_crtc_mixer *mixer,
> struct dpu_plane_state *pstate, struct dpu_format *format)
>  {
> @@ -1243,6 +1320,8 @@ static const struct drm_crtc_funcs dpu_crtc_funcs = {
> .early_unregister = dpu_crtc_early_unregister,
> .enable_vblank  = msm_crtc_enable_vblank,
> .disable_vblank = msm_crtc_disable_vblank,
> +   .get_vblank_timestamp = drm_crtc_vblank_helper_get_vblank_timestamp,
> +   .get_vblank_counter = dpu_crtc_get_vblank_counter,
>  };
>
>  static const struct drm_crtc_helper_funcs dpu_crtc_helper_funcs = {
> @@ -1251,6 +1330,7 @@ static const struct drm_crtc_helper_funcs 
> dpu_crtc_helper_funcs = {
> .atomic_check = dpu_crtc_atomic_check,
> .atomic_begin = dpu_crtc_atomic_begin,
> .atomic_flush = dpu_crtc_atomic_flush,
> +   .get_scanout_position = dpu_crtc_get_scanout_position,
>  };
>
>  /* initialize crtc */
> diff --git 
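[Editorial sketch] The line-counter translation in the patch above can be modeled as plain arithmetic. This is a standalone sketch, not driver code; the 1080p-style timing numbers in the example are hypothetical:

```python
def scanout_vpos(line, vsync_start, vsync_end, vtotal, vdisplay):
    """Translate a hardware line counter (1 = start of the VSYNC pulse,
    vtotal = end of VFP) into a position relative to active video,
    mirroring the arithmetic in dpu_crtc_get_scanout_position()."""
    vsw = vsync_end - vsync_start        # vsync pulse width
    vbp = vtotal - vsync_end             # vertical back porch
    vactive_start = vsw + vbp + 1
    vactive_end = vactive_start + vdisplay
    vfp_end = vtotal                     # last scan line before VSYNC

    if line < vactive_start:
        line -= vactive_start            # still in vsync/back porch
    elif line > vactive_end:
        line = line - vfp_end - vactive_start  # in the front porch
    else:
        line -= vactive_start            # inside active video
    return line

# Hypothetical timings: vsw = 5, vbp = 36, so active video starts at line 42.
print(scanout_vpos(1, 1084, 1089, 1125, 1080))   # start of vsync -> -41
print(scanout_vpos(42, 1084, 1089, 1125, 1080))  # first active line -> 0
```

Negative values mean the counter is in vertical blanking, which is exactly what the vblank-timestamping helpers expect from get_scanout_position.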

Re: [Freedreno] [v2] drm/msm/disp/dpu1: turn off vblank irqs aggressively in dpu driver

2021-02-12 Thread Rob Clark
On Thu, Feb 11, 2021 at 7:31 AM  wrote:
>
> On 2021-02-11 01:56, Rob Clark wrote:
> > On Wed, Feb 10, 2021 at 3:41 AM  wrote:
> >>
> >> On 2021-02-01 00:46, Rob Clark wrote:
> >> > On Fri, Dec 18, 2020 at 2:27 AM Kalyan Thota 
> >> > wrote:
> >> >>
> >> >> Set the flag vblank_disable_immediate = true to turn off vblank irqs
> >> >> immediately as soon as drm_vblank_put is requested so that there are
> >> >> no irqs triggered during idle state. This will reduce cpu wakeups
> >> >> and help in power saving.
> >> >>
> >> >> To enable vblank_disable_immediate flag the underlying KMS driver
> >> >> needs to support high precision vblank timestamping and also a
> >> >> reliable way of providing vblank counter which is incrementing
> >> >> at the leading edge of vblank.
> >> >>
> >> >> This patch also brings in changes to support vblank_disable_immediate
> >> >> requirement in dpu driver.
> >> >>
> >> >> Changes in v1:
> >> >>  - Specify reason to add vblank timestamp support. (Rob)
> >> >>  - Add changes to provide vblank counter from dpu driver.
> >> >>
> >> >> Signed-off-by: Kalyan Thota 
> >> >
> >> > This seems to be triggering:
> >> >
> >> > [  +0.032668] [ cut here ]
> >> > [  +0.004759] msm ae0.mdss: drm_WARN_ON_ONCE(cur_vblank !=
> >> > vblank->last)
> >> > [  +0.24] WARNING: CPU: 0 PID: 362 at
> >> > drivers/gpu/drm/drm_vblank.c:354 drm_update_vblank_count+0x1e4/0x258
> >> > [  +0.017154] Modules linked in: joydev
> >> > [  +0.003784] CPU: 0 PID: 362 Comm: frecon Not tainted
> >> > 5.11.0-rc5-00037-g33d3504871dd #2
> >> > [  +0.008135] Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> >> > [  +0.006167] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
> >> > [  +0.006169] pc : drm_update_vblank_count+0x1e4/0x258
> >> > [  +0.005105] lr : drm_update_vblank_count+0x1e4/0x258
> >> > [  +0.005106] sp : ffc010003b70
> >> > [  +0.003409] x29: ffc010003b70 x28: ff80855d9d98
> >> > [  +0.005466] x27:  x26: 00fe502a
> >> > [  +0.005458] x25: 0001 x24: 0001
> >> > [  +0.005466] x23: 0001 x22: ff808561ce80
> >> > [  +0.005465] x21:  x20: 
> >> > [  +0.005468] x19: ff80850d6800 x18: 
> >> > [  +0.005466] x17:  x16: 
> >> > [  +0.005465] x15: 000a x14: 263b
> >> > [  +0.005466] x13: 0006 x12: 
> >> > [  +0.005465] x11: 0010 x10: ffc090003797
> >> > [  +0.005466] x9 : ffed200e2a8c x8 : 
> >> > [  +0.005466] x7 :  x6 : ffed213b2b51
> >> > [  +0.005465] x5 : c000dfff x4 : ffed21218048
> >> > [  +0.005465] x3 :  x2 : 
> >> > [  +0.005465] x1 :  x0 : 
> >> > [  +0.005466] Call trace:
> >> > [  +0.002520]  drm_update_vblank_count+0x1e4/0x258
> >> > [  +0.004748]  drm_handle_vblank+0xd0/0x35c
> >> > [  +0.004130]  drm_crtc_handle_vblank+0x24/0x30
> >> > [  +0.004487]  dpu_crtc_vblank_callback+0x3c/0xc4
> >> > [  +0.004662]  dpu_encoder_vblank_callback+0x70/0xc4
> >> > [  +0.004931]  dpu_encoder_phys_vid_vblank_irq+0x50/0x12c
> >> > [  +0.005378]  dpu_core_irq_callback_handler+0xf4/0xfc
> >> > [  +0.005107]  dpu_hw_intr_dispatch_irq+0x100/0x120
> >> > [  +0.004834]  dpu_core_irq+0x44/0x5c
> >> > [  +0.003597]  dpu_irq+0x1c/0x28
> >> > [  +0.003141]  msm_irq+0x34/0x40
> >> > [  +0.003153]  __handle_irq_event_percpu+0xfc/0x254
> >> > [  +0.004838]  handle_irq_event_percpu+0x3c/0x94
> >> > [  +0.004574]  handle_irq_event+0x54/0x98
> >> > [  +0.003944]  handle_level_irq+0xa0/0xd0
> >> > [  +0.003943]  generic_handle_irq+0x30/0x48
> >> > [  +0.004131]  dpu_mdss_irq+0xe4/0x118
> >> > [  +0.003684]  generic_handle_irq+0x30/0x48
> >> > [  +0.004127]  __handle_domain_irq+0xa8/0xac
> >> > [  +0.004215]  gic_handle_irq+0xdc/0x150
> >> > [  +0.003856]  el1_irq+0xb4/0x180
> >> > [  +0.00

Re: [Freedreno] [v2] drm/msm/disp/dpu1: turn off vblank irqs aggressively in dpu driver

2021-02-10 Thread Rob Clark
On Wed, Feb 10, 2021 at 3:41 AM  wrote:
>
> On 2021-02-01 00:46, Rob Clark wrote:
> > On Fri, Dec 18, 2020 at 2:27 AM Kalyan Thota 
> > wrote:
> >>
> >> Set the flag vblank_disable_immediate = true to turn off vblank irqs
> >> immediately as soon as drm_vblank_put is requested so that there are
> >> no irqs triggered during idle state. This will reduce cpu wakeups
> >> and help in power saving.
> >>
> >> To enable vblank_disable_immediate flag the underlying KMS driver
> >> needs to support high precision vblank timestamping and also a
> >> reliable way of providing vblank counter which is incrementing
> >> at the leading edge of vblank.
> >>
> >> This patch also brings in changes to support vblank_disable_immediate
> >> requirement in dpu driver.
> >>
> >> Changes in v1:
> >>  - Specify reason to add vblank timestamp support. (Rob)
> >>  - Add changes to provide vblank counter from dpu driver.
> >>
> >> Signed-off-by: Kalyan Thota 
> >
> > This seems to be triggering:
> >
> > [  +0.032668] [ cut here ]
> > [  +0.004759] msm ae0.mdss: drm_WARN_ON_ONCE(cur_vblank !=
> > vblank->last)
> > [  +0.24] WARNING: CPU: 0 PID: 362 at
> > drivers/gpu/drm/drm_vblank.c:354 drm_update_vblank_count+0x1e4/0x258
> > [  +0.017154] Modules linked in: joydev
> > [  +0.003784] CPU: 0 PID: 362 Comm: frecon Not tainted
> > 5.11.0-rc5-00037-g33d3504871dd #2
> > [  +0.008135] Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
> > [  +0.006167] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
> > [  +0.006169] pc : drm_update_vblank_count+0x1e4/0x258
> > [  +0.005105] lr : drm_update_vblank_count+0x1e4/0x258
> > [  +0.005106] sp : ffc010003b70
> > [  +0.003409] x29: ffc010003b70 x28: ff80855d9d98
> > [  +0.005466] x27:  x26: 00fe502a
> > [  +0.005458] x25: 0001 x24: 0001
> > [  +0.005466] x23: 0001 x22: ff808561ce80
> > [  +0.005465] x21:  x20: 
> > [  +0.005468] x19: ff80850d6800 x18: 
> > [  +0.005466] x17:  x16: 
> > [  +0.005465] x15: 000a x14: 263b
> > [  +0.005466] x13: 0006 x12: 
> > [  +0.005465] x11: 0010 x10: ffc090003797
> > [  +0.005466] x9 : ffed200e2a8c x8 : 
> > [  +0.005466] x7 :  x6 : ffed213b2b51
> > [  +0.005465] x5 : c000dfff x4 : ffed21218048
> > [  +0.005465] x3 :  x2 : 
> > [  +0.005465] x1 :  x0 : 
> > [  +0.005466] Call trace:
> > [  +0.002520]  drm_update_vblank_count+0x1e4/0x258
> > [  +0.004748]  drm_handle_vblank+0xd0/0x35c
> > [  +0.004130]  drm_crtc_handle_vblank+0x24/0x30
> > [  +0.004487]  dpu_crtc_vblank_callback+0x3c/0xc4
> > [  +0.004662]  dpu_encoder_vblank_callback+0x70/0xc4
> > [  +0.004931]  dpu_encoder_phys_vid_vblank_irq+0x50/0x12c
> > [  +0.005378]  dpu_core_irq_callback_handler+0xf4/0xfc
> > [  +0.005107]  dpu_hw_intr_dispatch_irq+0x100/0x120
> > [  +0.004834]  dpu_core_irq+0x44/0x5c
> > [  +0.003597]  dpu_irq+0x1c/0x28
> > [  +0.003141]  msm_irq+0x34/0x40
> > [  +0.003153]  __handle_irq_event_percpu+0xfc/0x254
> > [  +0.004838]  handle_irq_event_percpu+0x3c/0x94
> > [  +0.004574]  handle_irq_event+0x54/0x98
> > [  +0.003944]  handle_level_irq+0xa0/0xd0
> > [  +0.003943]  generic_handle_irq+0x30/0x48
> > [  +0.004131]  dpu_mdss_irq+0xe4/0x118
> > [  +0.003684]  generic_handle_irq+0x30/0x48
> > [  +0.004127]  __handle_domain_irq+0xa8/0xac
> > [  +0.004215]  gic_handle_irq+0xdc/0x150
> > [  +0.003856]  el1_irq+0xb4/0x180
> > [  +0.003237]  dpu_encoder_vsync_time+0x78/0x230
> > [  +0.004574]  dpu_encoder_kickoff+0x190/0x354
> > [  +0.004386]  dpu_crtc_commit_kickoff+0x194/0x1a0
> > [  +0.004748]  dpu_kms_flush_commit+0xf4/0x108
> > [  +0.004390]  msm_atomic_commit_tail+0x2e8/0x384
> > [  +0.004661]  commit_tail+0x80/0x108
> > [  +0.003588]  drm_atomic_helper_commit+0x118/0x11c
> > [  +0.004834]  drm_atomic_commit+0x58/0x68
> > [  +0.004033]  drm_atomic_helper_set_config+0x70/0x9c
> > [  +0.005018]  drm_mode_setcrtc+0x390/0x584
> > [  +0.004131]  drm_ioctl_kernel+0xc8/0x11c
> > [  +0.004035]  drm_ioctl+0x2f8/0x34c
> > [  +0.003500]  drm_compat_ioctl+0x48/0xe8
> > [  +0.003945]  __arm64_compat_sys_ioctl+0xe8/0x104
> > [  +

[PATCH] drm/msm: Fix legacy relocs path

2021-02-04 Thread Rob Clark
From: Rob Clark 

In moving code around, we ended up using the same pointer to
copy_from_user() the relocs table as we used for the cmd table
entry, which is clearly not right.  This went unnoticed because
modern mesa on non-ancient kernels does not actually use relocs.
But this broke ancient mesa on modern kernels.

Reported-by: Emil Velikov 
Fixes: 20224d715a88 ("drm/msm/submit: Move copy_from_user ahead of locking bos")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index d04c349d8112..5480852bdeda 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -198,6 +198,8 @@ static int submit_lookup_cmds(struct msm_gem_submit *submit,
submit->cmd[i].idx  = submit_cmd.submit_idx;
submit->cmd[i].nr_relocs = submit_cmd.nr_relocs;
 
+   userptr = u64_to_user_ptr(submit_cmd.relocs);
+
sz = array_size(submit_cmd.nr_relocs,
sizeof(struct drm_msm_gem_submit_reloc));
/* check for overflow: */
-- 
2.29.2
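[Editorial sketch] The one-line fix above re-points userptr at the relocs table before the second copy; without it, the relocs are copied from the cmd entry's own address. A toy model of that pattern, with a dict standing in for user memory and copy_from_user() (all addresses hypothetical):

```python
# "User memory": address -> value, standing in for copy_from_user().
user_mem = {
    0x1000: {"relocs_addr": 0x2000, "nr_relocs": 2},  # the submit cmd entry
    0x2000: [{"idx": 7}, {"idx": 9}],                 # its relocs table
}

def copy_from_user(addr):
    return user_mem[addr]

userptr = 0x1000
cmd = copy_from_user(userptr)

# The fix: re-point userptr at the relocs table before copying it.
# If this line is dropped, the "relocs" are read from the cmd's address,
# which is the bug the patch repairs.
userptr = cmd["relocs_addr"]

relocs = copy_from_user(userptr)
assert [r["idx"] for r in relocs] == [7, 9]
```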



Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

2021-02-03 Thread Rob Clark
On Wed, Feb 3, 2021 at 1:46 PM Will Deacon  wrote:
>
> On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
> > On 2021-02-01 23:50, Jordan Crouse wrote:
> > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon  wrote:
> > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > > > > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan 
> > > > > > > wrote:
> > > > > > > > +#define IOMMU_LLC(1 << 6)
> > > > > > >
> > > > > > > On reflection, I'm a bit worried about exposing this because I 
> > > > > > > think it
> > > > > > > will
> > > > > > > introduce a mismatched virtual alias with the CPU (we don't even 
> > > > > > > have a
> > > > > > > MAIR
> > > > > > > set up for this memory type). Now, we also have that issue for 
> > > > > > > the PTW,
> > > > > > > but
> > > > > > > since we always use cache maintenance (i.e. the streaming API) for
> > > > > > > publishing the page-tables to a non-coheren walker, it works out.
> > > > > > > However,
> > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API 
> > > > > > > coherent
> > > > > > > allocation, then they're potentially in for a nasty surprise due 
> > > > > > > to the
> > > > > > > mismatched outer-cacheability attributes.
> > > > > > >
> > > > > >
> > > > > > Can't we add the syscached memory type similar to what is done on 
> > > > > > android?
> > > > >
> > > > > Maybe. How does the GPU driver map these things on the CPU side?
> > > >
> > > > Currently we use writecombine mappings for everything, although there
> > > > are some cases that we'd like to use cached (but have not merged
> > > > patches that would give userspace a way to flush/invalidate)
> > > >
> > >
> > > LLC/system cache doesn't have a relationship with the CPU cache.  It's
> > > just a
> > > little accelerator that sits on the connection from the GPU to DDR and
> > > caches
> > > accesses. The hint that Sai is suggesting is used to mark the buffers as
> > > 'no-write-allocate' to prevent GPU write operations from being cached in
> > > the LLC
> > > which a) isn't interesting and b) takes up cache space for read
> > > operations.
> > >
> > > It's easiest to think of the LLC as a bonus accelerator that has no cost
> > > for
> > > us to use outside of the unfortunate per buffer hint.
> > >
> > > We do have to worry about the CPU cache w.r.t I/O coherency (which is a
> > > different hint) and in that case we have all of concerns that Will
> > > identified.
> > >
> >
> > For mismatched outer cacheability attributes which Will mentioned, I was
> > referring to [1] in android kernel.
>
> I've lost track of the conversation here :/
>
> When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
> into the CPU and with what attributes? Rob said "writecombine for
> everything" -- does that mean ioremap_wc() / MEMREMAP_WC?

Currently userspace asks for everything WC, so pgprot_writecombine()

The kernel doesn't enforce this, but so far provides no UAPI to do
anything useful with non-coherent cached mappings (although there is
interest to support this)

BR,
-R

> Finally, we need to be careful when we use the word "hint" as "allocation
> hint" has a specific meaning in the architecture, and if we only mismatch on
> those then we're actually ok. But I think IOMMU_LLC is more than just a
> hint, since it actually drives eviction policy (i.e. it enables writeback).
>
> Sorry for the pedantry, but I just want to make sure we're all talking
> about the same things!
>
> Cheers,
>
> Will
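[Editorial sketch] The no-write-allocate point above can be illustrated with a toy LRU cache model: when streamed GPU writes are allowed to allocate lines, they evict the read working set; marking them no-write-allocate preserves the cache for reads. This is purely illustrative and not how the SoC's LLC is actually organized:

```python
class ToyCache:
    """Minimal LRU cache model: reads always allocate lines; writes
    allocate only when write_allocate=True, otherwise they bypass
    the cache entirely (no-write-allocate)."""
    def __init__(self, nlines, write_allocate):
        self.nlines = nlines
        self.write_allocate = write_allocate
        self.lines = []          # LRU order, most-recent at the end
        self.hits = 0

    def _touch(self, addr):
        if addr in self.lines:
            self.lines.remove(addr)
            self.hits += 1
        elif len(self.lines) == self.nlines:
            self.lines.pop(0)    # evict least-recently-used line
        self.lines.append(addr)

    def read(self, addr):
        self._touch(addr)

    def write(self, addr):
        if self.write_allocate:
            self._touch(addr)    # the write fills a cache line too

# Two lines of cache; repeated reads of A/B interleaved with streamed writes.
for wa in (True, False):
    c = ToyCache(2, write_allocate=wa)
    for i in range(8):
        c.read("A"); c.read("B"); c.write(f"W{i}")
    print(wa, c.hits)            # prints: True 0, then False 14
```

With write-allocate on, every streamed write evicts a read line and the reads never hit; with no-write-allocate, all reads after the first pass hit.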


Re: [PATCH] drm/msm/kms: Make a lock_class_key for each crtc mutex

2021-02-03 Thread Rob Clark
On Wed, Feb 3, 2021 at 1:58 PM Stephen Boyd  wrote:
>
> Quoting Rob Clark (2021-02-03 09:29:09)
> > On Wed, Feb 3, 2021 at 2:10 AM Daniel Vetter  wrote:
> > >
> > > On Tue, Feb 02, 2021 at 08:51:25AM -0800, Rob Clark wrote:
> > > > On Tue, Feb 2, 2021 at 7:46 AM Daniel Vetter  wrote:
> > > > >
> > > > > On Mon, Jan 25, 2021 at 03:49:01PM -0800, Stephen Boyd wrote:
> > > > > > This is because lockdep thinks all the locks taken in lock_crtcs() 
> > > > > > are
> > > > > > the same lock, when they actually aren't. That's because we call
> > > > > > mutex_init() in msm_kms_init() and that assigns one static key for 
> > > > > > every
> > > > > > lock initialized in this loop. Let's allocate a dynamic number of
> > > > > > lock_class_keys and assign them to each lock so that lockdep can 
> > > > > > figure
> > > > > > out an AA deadlock isn't possible here.
> > > > > >
> > > > > > Fixes: b3d91800d9ac ("drm/msm: Fix race condition in msm driver 
> > > > > > with async layer updates")
> > > > > > Cc: Krishna Manikandan 
> > > > > > Signed-off-by: Stephen Boyd 
> > > > >
> > > > > This smells like throwing more bad after initial bad code ...
> > > > >
> > > > > First a rant: 
> > > > > https://blog.ffwll.ch/2020/08/lockdep-false-positives.html
> > >
> > > Some technical on the patch itself: I think you want
> > > mutex_lock_nested(crtc->lock, drm_crtc_index(crtc)), not your own locking
> > > classes hand-rolled. It's de facto the same, but much more obviously
> > > correct since self-documenting.
> >
> > hmm, yeah, that is a bit cleaner.. but this patch is already on
> > msm-next, maybe I'll add a patch on top to change it
>
> How many CRTCs are there? The subclass number tops out at 8, per
> MAX_LOCKDEP_SUBCLASSES so if we have more than that many bits possible
> then it will fail.

conveniently MAX_CRTCS is 8.. realistically I don't *think* you'd ever
see more than 2 or 3

BR,
-R
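[Editorial sketch] The class-per-lock idea in the thread above can be illustrated with a toy model of lockdep's check: lockdep reasons about lock *classes*, not lock instances, so acquiring a class that is already held looks like an AA deadlock. Names like ToyLockdep are hypothetical, and real lockdep tracks far more state:

```python
class ToyLockdep:
    """Tracks held lock classes; flags a possible AA deadlock whenever
    a class that is already held is acquired again, as lockdep does."""
    def __init__(self):
        self.held = []

    def acquire(self, lock_class):
        if lock_class in self.held:
            raise RuntimeError(f"possible recursive locking: {lock_class}")
        self.held.append(lock_class)

    def release(self, lock_class):
        self.held.remove(lock_class)

ld = ToyLockdep()

# One shared class for every commit_lock[i] (mutex_init() in a loop):
# nesting two of them triggers the false positive from the splat.
try:
    for _ in range(2):
        ld.acquire("commit_lock")
except RuntimeError as e:
    print(e)
ld.held.clear()

# A distinct class per crtc (the fix): nesting the locks is fine.
for i in range(2):
    ld.acquire(f"commit_lock[{i}]")
```

mutex_lock_nested(lock, subclass) achieves the same effect by annotating a subclass per acquisition instead of allocating separate keys, which is why it was suggested as the cleaner spelling.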


Re: [PATCH] drm/msm/kms: Make a lock_class_key for each crtc mutex

2021-02-03 Thread Rob Clark
On Wed, Feb 3, 2021 at 2:10 AM Daniel Vetter  wrote:
>
> On Tue, Feb 02, 2021 at 08:51:25AM -0800, Rob Clark wrote:
> > On Tue, Feb 2, 2021 at 7:46 AM Daniel Vetter  wrote:
> > >
> > > On Mon, Jan 25, 2021 at 03:49:01PM -0800, Stephen Boyd wrote:
> > > > Lockdep complains about an AA deadlock when rebooting the device.
> > > >
> > > > 
> > > > WARNING: possible recursive locking detected
> > > > 5.4.91 #1 Not tainted
> > > > 
> > > > reboot/5213 is trying to acquire lock:
> > > > ff80d13391b0 (>commit_lock[i]){+.+.}, at: lock_crtcs+0x60/0xa4
> > > >
> > > > but task is already holding lock:
> > > > ff80d1339110 (>commit_lock[i]){+.+.}, at: lock_crtcs+0x60/0xa4
> > > >
> > > > other info that might help us debug this:
> > > > Possible unsafe locking scenario:
> > > >
> > > > CPU0
> > > > 
> > > > lock(>commit_lock[i]);
> > > > lock(>commit_lock[i]);
> > > >
> > > > *** DEADLOCK ***
> > > >
> > > > May be due to missing lock nesting notation
> > > >
> > > > 6 locks held by reboot/5213:
> > > > __arm64_sys_reboot+0x148/0x2a0
> > > > device_shutdown+0x10c/0x2c4
> > > > drm_atomic_helper_shutdown+0x48/0xfc
> > > > modeset_lock+0x120/0x24c
> > > > lock_crtcs+0x60/0xa4
> > > >
> > > > stack backtrace:
> > > > CPU: 4 PID: 5213 Comm: reboot Not tainted 5.4.91 #1
> > > > Hardware name: Google Pompom (rev1) with LTE (DT)
> > > > Call trace:
> > > > dump_backtrace+0x0/0x1dc
> > > > show_stack+0x24/0x30
> > > > dump_stack+0xfc/0x1a8
> > > > __lock_acquire+0xcd0/0x22b8
> > > > lock_acquire+0x1ec/0x240
> > > > __mutex_lock_common+0xe0/0xc84
> > > > mutex_lock_nested+0x48/0x58
> > > > lock_crtcs+0x60/0xa4
> > > > msm_atomic_commit_tail+0x348/0x570
> > > > commit_tail+0xdc/0x178
> > > > drm_atomic_helper_commit+0x160/0x168
> > > > drm_atomic_commit+0x68/0x80
> > > >
> > > > This is because lockdep thinks all the locks taken in lock_crtcs() are
> > > > the same lock, when they actually aren't. That's because we call
> > > > mutex_init() in msm_kms_init() and that assigns one static key for every
> > > > lock initialized in this loop. Let's allocate a dynamic number of
> > > > lock_class_keys and assign them to each lock so that lockdep can figure
> > > > out an AA deadlock isn't possible here.
> > > >
> > > > Fixes: b3d91800d9ac ("drm/msm: Fix race condition in msm driver with 
> > > > async layer updates")
> > > > Cc: Krishna Manikandan 
> > > > Signed-off-by: Stephen Boyd 
> > >
> > > This smells like throwing more bad after initial bad code ...
> > >
> > > First a rant: https://blog.ffwll.ch/2020/08/lockdep-false-positives.html
>
> Some technical on the patch itself: I think you want
> mutex_lock_nested(crtc->lock, drm_crtc_index(crtc)), not your own locking
> classes hand-rolled. It's de facto the same, but much more obviously
> correct since self-documenting.

hmm, yeah, that is a bit cleaner.. but this patch is already on
msm-next, maybe I'll add a patch on top to change it

> > > Yes I know the locking you're doing here is correct, but that goes to the
> > > second issue: Why is this needed? atomic_async_update helpers are supposed
> > > to take care of ordering fun like this, if they're not, we need to address
> > > things there. The problem that
> >
> > Maybe a better solution would be helper awareness of hw that has
> > double-buffered state and flush bits.. ie. something that looks a bit
> > more like the internal kms fxn ptrs. Currently the locking is
> > protecting something that the atomic helpers are not aware of, ie.
> > we've already written previous cursor updates to hw and are just
> > waiting until close to vblank to write the flush bits
> >
> > But, we've been over this before. I'd tried various approaches.. the
> > current scheme replaces seanpaul's earlier attempts to do it the
> > "helper" way.  The current implementation does the best job of
> > avoiding fps drops when the legacy cursor uapi is in play.  (And yes,
> > legacy cursor + atomic ioctls is maybe not the greatest, but it i

Re: [PATCH 3/5] drm/msm/dsi_pll_10nm: Fix bad VCO rate calculation and prescaler

2021-02-02 Thread Rob Clark
On Tue, Feb 2, 2021 at 11:08 AM Rob Clark  wrote:
>
> On Tue, Feb 2, 2021 at 10:46 AM AngeloGioacchino Del Regno
>  wrote:
> >
> > Il 02/02/21 19:45, Rob Clark ha scritto:
> > > On Tue, Feb 2, 2021 at 6:32 AM AngeloGioacchino Del Regno
> > >  wrote:
> > >>
> > >> Il 01/02/21 18:31, Rob Clark ha scritto:
> > >>> fwiw, this is the clk_summary diff with and without this patch:
> > >>>
> > >>> --
> > >>> 270,282c270,282
> > >>> < dsi0_pll_out_div_clk  110
> > >>> 887039941  0 0  5 Y
> > >>> <dsi0_pll_post_out_div_clk   000
> > >>> 221759985  0 0  5 Y
> > >>> <dsi0_pll_bit_clk   220
> > >>> 887039941  0 0  5 Y
> > >>> <   dsi0_pclk_mux   110
> > >>> 887039941  0 0  5 Y
> > >>> <  dsi0_phy_pll_out_dsiclk   110
> > >>> 147839991  0 0  5 Y
> > >>> < disp_cc_mdss_pclk0_clk_src   110
> > >>> 147839991  0 0  5 Y
> > >>> <disp_cc_mdss_pclk0_clk   110
> > >>>147839991  0 0  5 Y
> > >>> <   dsi0_pll_by_2_bit_clk   000
> > >>> 443519970  0 0  5 Y
> > >>> <   dsi0_phy_pll_out_byteclk   110
> > >>> 110879992  0 0  5 Y
> > >>> <  disp_cc_mdss_byte0_clk_src   220
> > >>> 110879992  0 0  5 Y
> > >>> < disp_cc_mdss_byte0_div_clk_src   11
> > >>> 055439996  0 0  5 Y
> > >>> <disp_cc_mdss_byte0_intf_clk   11
> > >>> 055439996  0 0  5 Y
> > >>> < disp_cc_mdss_byte0_clk   110
> > >>> 110879992  0 0  5 Y
> > >>> ---
> > >>>>   dsi0_pll_out_div_clk  110   
> > >>>> 887039978  0 0  5 Y
> > >>>>  dsi0_pll_post_out_div_clk   000   
> > >>>> 221759994  0 0  5 Y
> > >>>>  dsi0_pll_bit_clk   220   
> > >>>> 887039978  0 0  5 Y
> > >>>> dsi0_pclk_mux   110   
> > >>>> 887039978  0 0  5 Y
> > >>>>dsi0_phy_pll_out_dsiclk   110   
> > >>>> 147839997  0 0  5 Y
> > >>>>   disp_cc_mdss_pclk0_clk_src   110 
> > >>>>   147839997  0 0  5 Y
> > >>>>  disp_cc_mdss_pclk0_clk   110  
> > >>>>  147839997  0 0  5 Y
> > >>>> dsi0_pll_by_2_bit_clk   000   
> > >>>> 443519989  0 0  5 Y
> > >>>> dsi0_phy_pll_out_byteclk   110   
> > >>>> 110879997  0 0  5 Y
> > >>>>disp_cc_mdss_byte0_clk_src   220   
> > >>>> 110879997  0 0  5 Y
> > >>>>   disp_cc_mdss_byte0_div_clk_src   11  
> > >>>>   05543  0 0  5 Y
> > >>>>  disp_cc_mdss_byte0_intf_clk   11  
> > >>>>   05543  0 0  5 Y
> > >>>>   disp_cc_mdss_byte0_clk   110   
> > >>>> 110879997  0 0  5 Y
> > >>> --
> > >>>
> > >>>
> > >>
> > >> This is almost exactly what I saw on

Re: [PATCH 3/5] drm/msm/dsi_pll_10nm: Fix bad VCO rate calculation and prescaler

2021-02-02 Thread Rob Clark
On Tue, Feb 2, 2021 at 10:46 AM AngeloGioacchino Del Regno
 wrote:
>
> Il 02/02/21 19:45, Rob Clark ha scritto:
> > On Tue, Feb 2, 2021 at 6:32 AM AngeloGioacchino Del Regno
> >  wrote:
> >>
> >> Il 01/02/21 18:31, Rob Clark ha scritto:
> >>> On Mon, Feb 1, 2021 at 9:18 AM Rob Clark  wrote:
> >>>>
> >>>> On Mon, Feb 1, 2021 at 9:05 AM Rob Clark  wrote:
> >>>>>
> >>>>> On Mon, Feb 1, 2021 at 7:47 AM Rob Clark  wrote:
> >>>>>>
> >>>>>> On Mon, Feb 1, 2021 at 2:11 AM AngeloGioacchino Del Regno
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>> Il 31/01/21 20:50, Rob Clark ha scritto:
> >>>>>>>> On Sat, Jan 9, 2021 at 5:51 AM AngeloGioacchino Del Regno
> >>>>>>>>  wrote:
> >>>>>>>>>
> >>>>>>>>> The VCO rate was being miscalculated due to a big overlook during
> >>>>>>>>> the process of porting this driver from downstream to upstream:
> >>>>>>>>> here we are really recalculating the rate of the VCO by reading
> >>>>>>>>> the appropriate registers and returning a real frequency, while
> >>>>>>>>> downstream the driver was doing something entirely different.
> >>>>>>>>>
> >>>>>>>>> In our case here, the recalculated rate was wrong, as it was then
> >>>>>>>>> given back to the set_rate function, which was erroneously doing
> >>>>>>>>> a division on the fractional value, based on the prescaler being
> >>>>>>>>> either enabled or disabled: this was actually producing a bug for
> >>>>>>>>> which the final VCO rate was being doubled, causing very obvious
> >>>>>>>>> issues when trying to drive a DSI panel because the actual divider
> >>>>>>>>> value was multiplied by two!
> >>>>>>>>>
> >>>>>>>>> To make things work properly, remove the multiplication of the
> >>>>>>>>> reference clock by two from function dsi_pll_calc_dec_frac and
> >>>>>>>>> account for the prescaler enablement in the vco_recalc_rate (if
> >>>>>>>>> the prescaler is enabled, then the hardware will divide the rate
> >>>>>>>>> by two).
> >>>>>>>>>
> >>>>>>>>> This will make the vco_recalc_rate function to pass the right
> >>>>>>>>> frequency to the (clock framework) set_rate function when called,
> >>>>>>>>> which will - in turn - program the right values in both the
> >>>>>>>>> DECIMAL_DIV_START_1 and the FRAC_DIV_START_{LOW/MID/HIGH}_1
> >>>>>>>>> registers, finally making the PLL to output the right clock.
> >>>>>>>>>
> >>>>>>>>> Also, while at it, remove the prescaler TODO by also adding the
> >>>>>>>>> possibility of disabling the prescaler on the PLL (it is in the
> >>>>>>>>> PLL_ANALOG_CONTROLS_ONE register).
> >>>>>>>>> Of course, both prescaler-ON and OFF cases were tested.
> >>>>>>>>
> >>>>>>>> This somehow breaks things on sc7180 (display gets stuck at first
> >>>>>>>> frame of splash screen).  (This is a setup w/ an ti-sn65dsi86 
> >>>>>>>> dsi->eDP
> >>>>>>>> bridge)
> >>>>>>>>
> >>>>>>>
> >>>>>>> First frame of the splash means that something is "a bit" wrong...
> >>>>>>> ...like the DSI clock is a little off.
> >>>>>>>
> >>>>>>> I don't have such hardware, otherwise I would've tried... but what you
> >>>>>>> describe is a bit strange.
> >>>>>>> Is there any other older qcom platform using this chip? Any other
> >>>>>>> non-qcom platform? Is the driver for the SN65DSI86 surely fine?
> >>>>>>> Anyway, as you know, I would never propose untested patches nor
> >>>>>>> partially working ones for any reason: I'm sorry that this happened.
> >>>&
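[Editorial sketch] The prescaler accounting debated above can be sketched as a toy fractional-N model: set_rate picks decimal/frac multipliers for the VCO, and recalc_rate must undo exactly the same math, including the prescaler's divide-by-two, or the rate round-trips to double its real value. FRAC_BITS and the 19.2 MHz reference are illustrative assumptions; the real register layout differs:

```python
FRAC_BITS = 18          # assumed fractional-divider width, for illustration

def calc_dec_frac(ref_hz, target_hz, prescaler):
    """set_rate direction: with the prescaler on, the PLL output is
    divided by two, so the VCO must run at twice the requested rate."""
    vco = target_hz * 2 if prescaler else target_hz
    mult = vco / ref_hz
    dec = int(mult)
    frac = round((mult - dec) * (1 << FRAC_BITS))
    return dec, frac

def recalc_rate(ref_hz, dec, frac, prescaler):
    """recalc direction: reconstruct the VCO rate from the registers,
    then account for the prescaler's divide-by-two."""
    vco = ref_hz * (dec + frac / (1 << FRAC_BITS))
    return vco / 2 if prescaler else vco

ref = 19_200_000
for presc in (False, True):
    dec, frac = calc_dec_frac(ref, 887_039_941, presc)
    back = recalc_rate(ref, dec, frac, presc)
    # Round-trips to within one fractional-divider step of the target.
    assert abs(back - 887_039_941) < ref / (1 << FRAC_BITS) + 1
```

The bug described in the patch was an asymmetry between these two directions: one side halved the fractional value based on the prescaler while the other did not, so the programmed divider ended up off by a factor of two.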

Re: [PATCH 3/5] drm/msm/dsi_pll_10nm: Fix bad VCO rate calculation and prescaler

2021-02-02 Thread Rob Clark
On Tue, Feb 2, 2021 at 6:32 AM AngeloGioacchino Del Regno
 wrote:
>
> Il 01/02/21 18:31, Rob Clark ha scritto:
> > On Mon, Feb 1, 2021 at 9:18 AM Rob Clark  wrote:
> >>
> >> On Mon, Feb 1, 2021 at 9:05 AM Rob Clark  wrote:
> >>>
> >>> On Mon, Feb 1, 2021 at 7:47 AM Rob Clark  wrote:
> >>>>
> >>>> On Mon, Feb 1, 2021 at 2:11 AM AngeloGioacchino Del Regno
> >>>>  wrote:
> >>>>>
> >>>>> Il 31/01/21 20:50, Rob Clark ha scritto:
> >>>>>> On Sat, Jan 9, 2021 at 5:51 AM AngeloGioacchino Del Regno
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>> The VCO rate was being miscalculated due to a big overlook during
> >>>>>>> the process of porting this driver from downstream to upstream:
> >>>>>>> here we are really recalculating the rate of the VCO by reading
> >>>>>>> the appropriate registers and returning a real frequency, while
> >>>>>>> downstream the driver was doing something entirely different.
> >>>>>>>
> >>>>>>> In our case here, the recalculated rate was wrong, as it was then
> >>>>>>> given back to the set_rate function, which was erroneously doing
> >>>>>>> a division on the fractional value, based on the prescaler being
> >>>>>>> either enabled or disabled: this was actually producing a bug for
> >>>>>>> which the final VCO rate was being doubled, causing very obvious
> >>>>>>> issues when trying to drive a DSI panel because the actual divider
> >>>>>>> value was multiplied by two!
> >>>>>>>
> >>>>>>> To make things work properly, remove the multiplication of the
> >>>>>>> reference clock by two from function dsi_pll_calc_dec_frac and
> >>>>>>> account for the prescaler enablement in the vco_recalc_rate (if
> >>>>>>> the prescaler is enabled, then the hardware will divide the rate
> >>>>>>> by two).
> >>>>>>>
> >>>>>>> This will make the vco_recalc_rate function to pass the right
> >>>>>>> frequency to the (clock framework) set_rate function when called,
> >>>>>>> which will - in turn - program the right values in both the
> >>>>>>> DECIMAL_DIV_START_1 and the FRAC_DIV_START_{LOW/MID/HIGH}_1
> >>>>>>> registers, finally making the PLL to output the right clock.
> >>>>>>>
> >>>>>>> Also, while at it, remove the prescaler TODO by also adding the
> >>>>>>> possibility of disabling the prescaler on the PLL (it is in the
> >>>>>>> PLL_ANALOG_CONTROLS_ONE register).
> >>>>>>> Of course, both prescaler-ON and OFF cases were tested.
> >>>>>>
> >>>>>> This somehow breaks things on sc7180 (display gets stuck at first
> >>>>>> frame of splash screen).  (This is a setup w/ an ti-sn65dsi86 dsi->eDP
> >>>>>> bridge)
> >>>>>>
> >>>>>
> >>>>> First frame of the splash means that something is "a bit" wrong...
> >>>>> ...like the DSI clock is a little off.
> >>>>>
> >>>>> I don't have such hardware, otherwise I would've tried... but what you
> >>>>> describe is a bit strange.
> >>>>> Is there any other older qcom platform using this chip? Any other
> >>>>> non-qcom platform? Is the driver for the SN65DSI86 surely fine?
> >>>>> Anyway, as you know, I would never propose untested patches nor
> >>>>> partially working ones for any reason: I'm sorry that this happened.
> >>>>
> >>>> I don't think there is anything publicly avail w/ sc7180 (yet.. but very 
> >>>> soon)
> >>>>
> >>>> The ti-sn65dsi86 bridge is used on a bunch of 845/850 devices (like
> >>>> the snapdragon windows laptops).. and I think also the older 835
> >>>> laptops.. ofc that doesn't mean that there isn't some bug, but I'd
> >>>> guess maybe more likely that there is some small difference in DSI vs
> >>>> older devices, or some cmd vs video mode difference.
> >>>>
> >>>> Anyways, seems like the screen did eventually recover so that gives me
> >>>> a bit of confidence to bisect this series, which I'll do a bit later
> >>>> today.

Re: [PATCH] drm/msm/kms: Make a lock_class_key for each crtc mutex

2021-02-02 Thread Rob Clark
On Tue, Feb 2, 2021 at 7:46 AM Daniel Vetter  wrote:
>
> On Mon, Jan 25, 2021 at 03:49:01PM -0800, Stephen Boyd wrote:
> > Lockdep complains about an AA deadlock when rebooting the device.
> >
> > ============================================
> > WARNING: possible recursive locking detected
> > 5.4.91 #1 Not tainted
> > --------------------------------------------
> > reboot/5213 is trying to acquire lock:
> > ff80d13391b0 (&kms->commit_lock[i]){+.+.}, at: lock_crtcs+0x60/0xa4
> >
> > but task is already holding lock:
> > ff80d1339110 (&kms->commit_lock[i]){+.+.}, at: lock_crtcs+0x60/0xa4
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > 
> > lock(&kms->commit_lock[i]);
> > lock(&kms->commit_lock[i]);
> >
> > *** DEADLOCK ***
> >
> > May be due to missing lock nesting notation
> >
> > 6 locks held by reboot/5213:
> > __arm64_sys_reboot+0x148/0x2a0
> > device_shutdown+0x10c/0x2c4
> > drm_atomic_helper_shutdown+0x48/0xfc
> > modeset_lock+0x120/0x24c
> > lock_crtcs+0x60/0xa4
> >
> > stack backtrace:
> > CPU: 4 PID: 5213 Comm: reboot Not tainted 5.4.91 #1
> > Hardware name: Google Pompom (rev1) with LTE (DT)
> > Call trace:
> > dump_backtrace+0x0/0x1dc
> > show_stack+0x24/0x30
> > dump_stack+0xfc/0x1a8
> > __lock_acquire+0xcd0/0x22b8
> > lock_acquire+0x1ec/0x240
> > __mutex_lock_common+0xe0/0xc84
> > mutex_lock_nested+0x48/0x58
> > lock_crtcs+0x60/0xa4
> > msm_atomic_commit_tail+0x348/0x570
> > commit_tail+0xdc/0x178
> > drm_atomic_helper_commit+0x160/0x168
> > drm_atomic_commit+0x68/0x80
> >
> > This is because lockdep thinks all the locks taken in lock_crtcs() are
> > the same lock, when they actually aren't. That's because we call
> > mutex_init() in msm_kms_init() and that assigns one static key for every
> > lock initialized in this loop. Let's allocate a dynamic number of
> > lock_class_keys and assign them to each lock so that lockdep can figure
> > out an AA deadlock isn't possible here.
> >
> > Fixes: b3d91800d9ac ("drm/msm: Fix race condition in msm driver with async 
> > layer updates")
> > Cc: Krishna Manikandan 
> > Signed-off-by: Stephen Boyd 
>
> This smells like throwing more bad after initial bad code ...
>
> First a rant: https://blog.ffwll.ch/2020/08/lockdep-false-positives.html
>
> Yes I know the locking you're doing here is correct, but that goes to the
> second issue: Why is this needed? atomic_async_update helpers are supposed
> to take care of ordering fun like this, if they're not, we need to address
> things there. The problem that

Maybe a better solution would be helper awareness of hw that has
double-buffered state and flush bits.. ie. something that looks a bit
more like the internal kms fxn ptrs. Currently the locking is
protecting something that the atomic helpers are not aware of, ie.
we've already written previous cursor updates to hw and are just
waiting until close to vblank to write the flush bits

But, we've been over this before. I'd tried various approaches.. the
current scheme replaces seanpaul's earlier attempts to do it the
"helper" way.  The current implementation does the best job of
avoiding fps drops when the legacy cursor uapi is in play.  (And yes,
legacy cursor + atomic ioctls is maybe not the greatest, but it is
what it is.)

BR,
-R

>
> commit b3d91800d9ac35014e0349292273a6fa7938d402
> Author: Krishna Manikandan 
> Date:   Fri Oct 16 19:40:43 2020 +0530
>
> drm/msm: Fix race condition in msm driver with async layer updates
>
> is _the_ reason we have drm_crtc_commit to track stuff, and Maxime has
> recently rolled out a pile of changes to vc4 to use these things
> correctly. Hacking some glorious hand-rolled locking for synchronization
> of updates really should be the exception for kms drivers, not the rule.
> And this one here doesn't look like an exception by far (the one legit I
> know of is the locking issues amdgpu has between atomic_commit_tail and
> gpu reset, and that one is really nasty, so not going to get fixed in
> helpers, ever).
>
> Cheers, Daniel
>
> > ---
> >  drivers/gpu/drm/msm/msm_kms.h | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
> > index d8151a89e163..4735251a394d 100644
> > --- a/drivers/gpu/drm/msm/msm_kms.h
> > +++ b/drivers/gpu/drm/msm/msm_kms.h
> > @@ -157,6 +157,7 @@ struct msm_kms {
> >* from the crtc's pending_timer close to end of the frame:
> >*/
> >   struct mutex commit_lock[MAX_CRTCS];
> > + struct lock_class_key commit_lock_keys[MAX_CRTCS];
> >   unsigned pending_crtc_mask;
> >   struct msm_pending_timer pending_timers[MAX_CRTCS];
> >  };
> > @@ -166,8 +167,11 @@ static inline int msm_kms_init(struct msm_kms *kms,
> >  {
> >   unsigned i, ret;
> >
> > - for (i = 0; i < ARRAY_SIZE(kms->commit_lock); i++)
> > - mutex_init(&kms->commit_lock[i]);
> > + for (i = 0; i < ARRAY_SIZE(kms->commit_lock); i++) {
> > + 

Re: [PATCH 3/5] drm/msm/dsi_pll_10nm: Fix bad VCO rate calculation and prescaler

2021-02-01 Thread Rob Clark
On Mon, Feb 1, 2021 at 9:18 AM Rob Clark  wrote:
>
> On Mon, Feb 1, 2021 at 9:05 AM Rob Clark  wrote:
> >
> > On Mon, Feb 1, 2021 at 7:47 AM Rob Clark  wrote:
> > >
> > > On Mon, Feb 1, 2021 at 2:11 AM AngeloGioacchino Del Regno
> > >  wrote:
> > > >
> > > > Il 31/01/21 20:50, Rob Clark ha scritto:
> > > > > On Sat, Jan 9, 2021 at 5:51 AM AngeloGioacchino Del Regno
> > > > >  wrote:
> > > > >>
> > > > >> The VCO rate was being miscalculated due to a big overlook during
> > > > >> the process of porting this driver from downstream to upstream:
> > > > >> here we are really recalculating the rate of the VCO by reading
> > > > >> the appropriate registers and returning a real frequency, while
> > > > >> downstream the driver was doing something entirely different.
> > > > >>
> > > > >> In our case here, the recalculated rate was wrong, as it was then
> > > > >> given back to the set_rate function, which was erroneously doing
> > > > >> a division on the fractional value, based on the prescaler being
> > > > >> either enabled or disabled: this was actually producing a bug for
> > > > >> which the final VCO rate was being doubled, causing very obvious
> > > > >> issues when trying to drive a DSI panel because the actual divider
> > > > >> value was multiplied by two!
> > > > >>
> > > > >> To make things work properly, remove the multiplication of the
> > > > >> reference clock by two from function dsi_pll_calc_dec_frac and
> > > > >> account for the prescaler enablement in the vco_recalc_rate (if
> > > > >> the prescaler is enabled, then the hardware will divide the rate
> > > > >> by two).
> > > > >>
> > > > >> This will make the vco_recalc_rate function to pass the right
> > > > >> frequency to the (clock framework) set_rate function when called,
> > > > >> which will - in turn - program the right values in both the
> > > > >> DECIMAL_DIV_START_1 and the FRAC_DIV_START_{LOW/MID/HIGH}_1
> > > > >> registers, finally making the PLL to output the right clock.
> > > > >>
> > > > >> Also, while at it, remove the prescaler TODO by also adding the
> > > > >> possibility of disabling the prescaler on the PLL (it is in the
> > > > >> PLL_ANALOG_CONTROLS_ONE register).
> > > > >> Of course, both prescaler-ON and OFF cases were tested.
> > > > >
> > > > > This somehow breaks things on sc7180 (display gets stuck at first
> > > > > frame of splash screen).  (This is a setup w/ an ti-sn65dsi86 dsi->eDP
> > > > > bridge)
> > > > >
> > > >
> > > > First frame of the splash means that something is "a bit" wrong...
> > > > ...like the DSI clock is a little off.
> > > >
> > > > I don't have such hardware, otherwise I would've tried... but what you
> > > > describe is a bit strange.
> > > > Is there any other older qcom platform using this chip? Any other
> > > > non-qcom platform? Is the driver for the SN65DSI86 surely fine?
> > > > Anyway, as you know, I would never propose untested patches nor
> > > > partially working ones for any reason: I'm sorry that this happened.
> > >
> > > I don't think there is anything publicly avail w/ sc7180 (yet.. but very 
> > > soon)
> > >
> > > The ti-sn65dsi86 bridge is used on a bunch of 845/850 devices (like
> > > the snapdragon windows laptops).. and I think also the older 835
> > > laptops.. ofc that doesn't mean that there isn't some bug, but I'd
> > > guess maybe more likely that there is some small difference in DSI vs
> > > older devices, or some cmd vs video mode difference.
> > >
> > > Anyways, seems like the screen did eventually recover so that gives me
> > > a bit of confidence to bisect this series, which I'll do a bit later
> > > today.
> >
> > fwiw, this series minus this patch, and everything looks ok.. let me
> > take a closer look at what changes with this patch
>
> Btw, it looks like upstream, config->disable_prescaler is always
> false.. I don't suppose you have anything WIP that changes this?

fwiw, this is the clk_summary diff with and without this

Re: [PATCH 3/5] drm/msm/dsi_pll_10nm: Fix bad VCO rate calculation and prescaler

2021-02-01 Thread Rob Clark
On Mon, Feb 1, 2021 at 9:05 AM Rob Clark  wrote:
>
> On Mon, Feb 1, 2021 at 7:47 AM Rob Clark  wrote:
> >
> > On Mon, Feb 1, 2021 at 2:11 AM AngeloGioacchino Del Regno
> >  wrote:
> > >
> > > Il 31/01/21 20:50, Rob Clark ha scritto:
> > > > On Sat, Jan 9, 2021 at 5:51 AM AngeloGioacchino Del Regno
> > > >  wrote:
> > > >>
> > > >> The VCO rate was being miscalculated due to a big overlook during
> > > >> the process of porting this driver from downstream to upstream:
> > > >> here we are really recalculating the rate of the VCO by reading
> > > >> the appropriate registers and returning a real frequency, while
> > > >> downstream the driver was doing something entirely different.
> > > >>
> > > >> In our case here, the recalculated rate was wrong, as it was then
> > > >> given back to the set_rate function, which was erroneously doing
> > > >> a division on the fractional value, based on the prescaler being
> > > >> either enabled or disabled: this was actually producing a bug for
> > > >> which the final VCO rate was being doubled, causing very obvious
> > > >> issues when trying to drive a DSI panel because the actual divider
> > > >> value was multiplied by two!
> > > >>
> > > >> To make things work properly, remove the multiplication of the
> > > >> reference clock by two from function dsi_pll_calc_dec_frac and
> > > >> account for the prescaler enablement in the vco_recalc_rate (if
> > > >> the prescaler is enabled, then the hardware will divide the rate
> > > >> by two).
> > > >>
> > > >> This will make the vco_recalc_rate function to pass the right
> > > >> frequency to the (clock framework) set_rate function when called,
> > > >> which will - in turn - program the right values in both the
> > > >> DECIMAL_DIV_START_1 and the FRAC_DIV_START_{LOW/MID/HIGH}_1
> > > >> registers, finally making the PLL to output the right clock.
> > > >>
> > > >> Also, while at it, remove the prescaler TODO by also adding the
> > > >> possibility of disabling the prescaler on the PLL (it is in the
> > > >> PLL_ANALOG_CONTROLS_ONE register).
> > > >> Of course, both prescaler-ON and OFF cases were tested.
> > > >
> > > > This somehow breaks things on sc7180 (display gets stuck at first
> > > > frame of splash screen).  (This is a setup w/ an ti-sn65dsi86 dsi->eDP
> > > > bridge)
> > > >
> > >
> > > First frame of the splash means that something is "a bit" wrong...
> > > ...like the DSI clock is a little off.
> > >
> > > I don't have such hardware, otherwise I would've tried... but what you
> > > describe is a bit strange.
> > > Is there any other older qcom platform using this chip? Any other
> > > non-qcom platform? Is the driver for the SN65DSI86 surely fine?
> > > Anyway, as you know, I would never propose untested patches nor
> > > partially working ones for any reason: I'm sorry that this happened.
> >
> > I don't think there is anything publicly avail w/ sc7180 (yet.. but very 
> > soon)
> >
> > The ti-sn65dsi86 bridge is used on a bunch of 845/850 devices (like
> > the snapdragon windows laptops).. and I think also the older 835
> > laptops.. ofc that doesn't mean that there isn't some bug, but I'd
> > guess maybe more likely that there is some small difference in DSI vs
> > older devices, or some cmd vs video mode difference.
> >
> > Anyways, seems like the screen did eventually recover so that gives me
> > a bit of confidence to bisect this series, which I'll do a bit later
> > today.
>
> fwiw, this series minus this patch, and everything looks ok.. let me
> take a closer look at what changes with this patch

Btw, it looks like upstream, config->disable_prescaler is always
false.. I don't suppose you have anything WIP that changes this?

BR,
-R

>
> > > In any case, just to be perfectly transparent, while being here waiting
> > > for review, this patch series got tested on more smartphones, even ones
> > > that I don't personally own, with different displays.
> > >
> > > For your reference, here's a list (all MSM8998..):
> > > - OnePlus 5   (1920x1080)
> > > - F(x)Tec Pro 1   (2160x1080)
> > > - Sony Xperia XZ1 Compact (1280x720)
> > - Sony Xperia XZ1 (1920x1080)
> > - Sony Xperia XZ Premium  (3840x2160)

Re: [PATCH 3/5] drm/msm/dsi_pll_10nm: Fix bad VCO rate calculation and prescaler

2021-02-01 Thread Rob Clark
On Mon, Feb 1, 2021 at 7:47 AM Rob Clark  wrote:
>
> On Mon, Feb 1, 2021 at 2:11 AM AngeloGioacchino Del Regno
>  wrote:
> >
> > Il 31/01/21 20:50, Rob Clark ha scritto:
> > > On Sat, Jan 9, 2021 at 5:51 AM AngeloGioacchino Del Regno
> > >  wrote:
> > >>
> > >> The VCO rate was being miscalculated due to a big overlook during
> > >> the process of porting this driver from downstream to upstream:
> > >> here we are really recalculating the rate of the VCO by reading
> > >> the appropriate registers and returning a real frequency, while
> > >> downstream the driver was doing something entirely different.
> > >>
> > >> In our case here, the recalculated rate was wrong, as it was then
> > >> given back to the set_rate function, which was erroneously doing
> > >> a division on the fractional value, based on the prescaler being
> > >> either enabled or disabled: this was actually producing a bug for
> > >> which the final VCO rate was being doubled, causing very obvious
> > >> issues when trying to drive a DSI panel because the actual divider
> > >> value was multiplied by two!
> > >>
> > >> To make things work properly, remove the multiplication of the
> > >> reference clock by two from function dsi_pll_calc_dec_frac and
> > >> account for the prescaler enablement in the vco_recalc_rate (if
> > >> the prescaler is enabled, then the hardware will divide the rate
> > >> by two).
> > >>
> > >> This will make the vco_recalc_rate function to pass the right
> > >> frequency to the (clock framework) set_rate function when called,
> > >> which will - in turn - program the right values in both the
> > >> DECIMAL_DIV_START_1 and the FRAC_DIV_START_{LOW/MID/HIGH}_1
> > >> registers, finally making the PLL to output the right clock.
> > >>
> > >> Also, while at it, remove the prescaler TODO by also adding the
> > >> possibility of disabling the prescaler on the PLL (it is in the
> > >> PLL_ANALOG_CONTROLS_ONE register).
> > >> Of course, both prescaler-ON and OFF cases were tested.
> > >
> > > This somehow breaks things on sc7180 (display gets stuck at first
> > > frame of splash screen).  (This is a setup w/ an ti-sn65dsi86 dsi->eDP
> > > bridge)
> > >
> >
> > First frame of the splash means that something is "a bit" wrong...
> > ...like the DSI clock is a little off.
> >
> > I don't have such hardware, otherwise I would've tried... but what you
> > describe is a bit strange.
> > Is there any other older qcom platform using this chip? Any other
> > non-qcom platform? Is the driver for the SN65DSI86 surely fine?
> > Anyway, as you know, I would never propose untested patches nor
> > partially working ones for any reason: I'm sorry that this happened.
>
> I don't think there is anything publicly avail w/ sc7180 (yet.. but very soon)
>
> The ti-sn65dsi86 bridge is used on a bunch of 845/850 devices (like
> the snapdragon windows laptops).. and I think also the older 835
> laptops.. ofc that doesn't mean that there isn't some bug, but I'd
> guess maybe more likely that there is some small difference in DSI vs
> older devices, or some cmd vs video mode difference.
>
> Anyways, seems like the screen did eventually recover so that gives me
> a bit of confidence to bisect this series, which I'll do a bit later
> today.

fwiw, this series minus this patch, and everything looks ok.. let me
take a closer look at what changes with this patch

BR,
-R

> > In any case, just to be perfectly transparent, while being here waiting
> > for review, this patch series got tested on more smartphones, even ones
> > that I don't personally own, with different displays.
> >
> > For your reference, here's a list (all MSM8998..):
> > - OnePlus 5   (1920x1080)
> > - F(x)Tec Pro 1   (2160x1080)
> > - Sony Xperia XZ1 Compact (1280x720)
> > - Sony Xperia XZ1 (1920x1080)
> > - Sony Xperia XZ Premium  (3840x2160)
> >
>
> Yeah, no worries, I wasn't trying to imply that the patch was untested.
>
> Out of curiosity, are any of those video mode panels?
>
> >
> > > Also, something (I assume DSI related) that I was testing on
> > > msm-next-staging seems to have affected the colors on the panel (ie.
> > > they are more muted).. which seems to persist across reboots (ie. when
> >
> > So much "fun". This makes me think something about the PCC block doing
> > the wrong thing (getting misconfigured).

Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

2021-02-01 Thread Rob Clark
On Mon, Feb 1, 2021 at 3:16 AM Will Deacon  wrote:
>
> On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > On 2021-01-29 14:35, Will Deacon wrote:
> > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > Add a new page protection flag IOMMU_LLC which can be used
> > > > by non-coherent masters to set cacheable memory attributes
> > > > for an outer level of cache called as last-level cache or
> > > > system cache. Initial user of this page protection flag is
> > > > the adreno gpu and then can later be used by other clients
> > > > such as video where this can be used for per-buffer based
> > > > mapping.
> > > >
> > > > Signed-off-by: Sai Prakash Ranjan 
> > > > ---
> > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > > >  include/linux/iommu.h  | 6 ++
> > > >  2 files changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/iommu/io-pgtable-arm.c
> > > > b/drivers/iommu/io-pgtable-arm.c
> > > > index 7439ee7fdcdb..ebe653ef601b 100644
> > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte
> > > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > > >   else if (prot & IOMMU_CACHE)
> > > >   pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > > >   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > + else if (prot & IOMMU_LLC)
> > > > + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > > + << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > >   }
> > > >
> > > >   if (prot & IOMMU_CACHE)
> > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > > index ffaa389ea128..1f82057df531 100644
> > > > --- a/include/linux/iommu.h
> > > > +++ b/include/linux/iommu.h
> > > > @@ -31,6 +31,12 @@
> > > >   * if the IOMMU page table format is equivalent.
> > > >   */
> > > >  #define IOMMU_PRIV   (1 << 5)
> > > > +/*
> > > > + * Non-coherent masters can use this page protection flag to set
> > > > cacheable
> > > > + * memory attributes for only a transparent outer level of cache,
> > > > also known as
> > > > + * the last-level or system cache.
> > > > + */
> > > > +#define IOMMU_LLC(1 << 6)
> > >
> > > On reflection, I'm a bit worried about exposing this because I think it
> > > will
> > > introduce a mismatched virtual alias with the CPU (we don't even have a
> > > MAIR
> > > set up for this memory type). Now, we also have that issue for the PTW,
> > > but
> > > since we always use cache maintenance (i.e. the streaming API) for
> > > publishing the page-tables to a non-coheren walker, it works out.
> > > However,
> > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > allocation, then they're potentially in for a nasty surprise due to the
> > > mismatched outer-cacheability attributes.
> > >
> >
> > Can't we add the syscached memory type similar to what is done on android?
>
> Maybe. How does the GPU driver map these things on the CPU side?

Currently we use writecombine mappings for everything, although there
are some cases that we'd like to use cached (but have not merged
patches that would give userspace a way to flush/invalidate)

BR,
-R
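The precedence in the quoted arm_lpae_prot_to_pte() hunk, where IOMMU_CACHE wins over the new IOMMU_LLC and IOMMU_LLC alone selects the outer-cacheable (system cache) MAIR index, can be modeled standalone. The flag bits and index values below are illustrative placeholders, and the IOMMU_MMIO branch is assumed from the surrounding kernel code rather than shown in the quote:

```python
# Illustrative bit values; the kernel's real ones live in
# include/linux/iommu.h and drivers/iommu/io-pgtable-arm.c.
IOMMU_CACHE = 1 << 2
IOMMU_MMIO  = 1 << 4
IOMMU_LLC   = 1 << 6

IDX_NC, IDX_CACHE, IDX_DEV, IDX_INC_OCACHE = 0, 1, 2, 3

def attrindx(prot):
    """Mirror the if/else chain: device memory first, then fully
    cacheable, then LLC-only (inner non-cacheable, outer cacheable),
    falling back to non-cacheable."""
    if prot & IOMMU_MMIO:
        return IDX_DEV
    if prot & IOMMU_CACHE:
        return IDX_CACHE
    if prot & IOMMU_LLC:
        return IDX_INC_OCACHE
    return IDX_NC

assert attrindx(IOMMU_LLC) == IDX_INC_OCACHE
assert attrindx(IOMMU_CACHE | IOMMU_LLC) == IDX_CACHE  # CACHE takes precedence
assert attrindx(0) == IDX_NC
```

The ordering is the point of Will's concern: a master asking for IOMMU_LLC gets an outer-cacheable attribute that the CPU side may not have a matching MAIR entry for.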


Re: [PATCH 3/5] drm/msm/dsi_pll_10nm: Fix bad VCO rate calculation and prescaler

2021-02-01 Thread Rob Clark
On Mon, Feb 1, 2021 at 2:11 AM AngeloGioacchino Del Regno
 wrote:
>
> Il 31/01/21 20:50, Rob Clark ha scritto:
> > On Sat, Jan 9, 2021 at 5:51 AM AngeloGioacchino Del Regno
> >  wrote:
> >>
> >> The VCO rate was being miscalculated due to a big overlook during
> >> the process of porting this driver from downstream to upstream:
> >> here we are really recalculating the rate of the VCO by reading
> >> the appropriate registers and returning a real frequency, while
> >> downstream the driver was doing something entirely different.
> >>
> >> In our case here, the recalculated rate was wrong, as it was then
> >> given back to the set_rate function, which was erroneously doing
> >> a division on the fractional value, based on the prescaler being
> >> either enabled or disabled: this was actually producing a bug for
> >> which the final VCO rate was being doubled, causing very obvious
> >> issues when trying to drive a DSI panel because the actual divider
> >> value was multiplied by two!
> >>
> >> To make things work properly, remove the multiplication of the
> >> reference clock by two from function dsi_pll_calc_dec_frac and
> >> account for the prescaler enablement in the vco_recalc_rate (if
> >> the prescaler is enabled, then the hardware will divide the rate
> >> by two).
> >>
> >> This will make the vco_recalc_rate function to pass the right
> >> frequency to the (clock framework) set_rate function when called,
> >> which will - in turn - program the right values in both the
> >> DECIMAL_DIV_START_1 and the FRAC_DIV_START_{LOW/MID/HIGH}_1
> >> registers, finally making the PLL to output the right clock.
> >>
> >> Also, while at it, remove the prescaler TODO by also adding the
> >> possibility of disabling the prescaler on the PLL (it is in the
> >> PLL_ANALOG_CONTROLS_ONE register).
> >> Of course, both prescaler-ON and OFF cases were tested.
> >
> > This somehow breaks things on sc7180 (display gets stuck at first
> > frame of splash screen).  (This is a setup w/ an ti-sn65dsi86 dsi->eDP
> > bridge)
> >
>
> First frame of the splash means that something is "a bit" wrong...
> ...like the DSI clock is a little off.
>
> I don't have such hardware, otherwise I would've tried... but what you
> describe is a bit strange.
> Is there any other older qcom platform using this chip? Any other
> non-qcom platform? Is the driver for the SN65DSI86 surely fine?
> Anyway, as you know, I would never propose untested patches nor
> partially working ones for any reason: I'm sorry that this happened.

I don't think there is anything publicly avail w/ sc7180 (yet.. but very soon)

The ti-sn65dsi86 bridge is used on a bunch of 845/850 devices (like
the snapdragon windows laptops).. and I think also the older 835
laptops.. ofc that doesn't mean that there isn't some bug, but I'd
guess maybe more likely that there is some small difference in DSI vs
older devices, or some cmd vs video mode difference.

Anyways, seems like the screen did eventually recover so that gives me
a bit of confidence to bisect this series, which I'll do a bit later
today.

> In any case, just to be perfectly transparent, while being here waiting
> for review, this patch series got tested on more smartphones, even ones
> that I don't personally own, with different displays.
>
> For your reference, here's a list (all MSM8998..):
> - OnePlus 5   (1920x1080)
> - F(x)Tec Pro 1   (2160x1080)
> - Sony Xperia XZ1 Compact (1280x720)
> - Sony Xperia XZ1 (1920x1080)
> - Sony Xperia XZ Premium  (3840x2160)
>

Yeah, no worries, I wasn't trying to imply that the patch was untested.

Out of curiosity, are any of those video mode panels?

>
> > Also, something (I assume DSI related) that I was testing on
> > > msm-next-staging seems to have affected the colors on the panel (ie.
> > they are more muted).. which seems to persist across reboots (ie. when
>
> So much "fun". This makes me think something about the PCC block doing
> the wrong thing (getting misconfigured).
>
> > switching back to a good kernel), and interestingly if I reboot from a
> > good kernel I see part of the login prompt (or whatever was previously
> > on-screen) in the firmware ui screen !?!  (so maybe somehow triggered
> > the display to think it is in PSR mode??)
> >
>
>  From a fast read, the SN65DSI86 is on I2C.. giving it a wrong dsi clock
> cannot produce (logically, at least) this, so I say that it is very
> unlikely for this to be a consequence of the 10nm pll fixes...
>

Note that the 

Re: [v2] drm/msm/disp/dpu1: turn off vblank irqs aggressively in dpu driver

2021-01-31 Thread Rob Clark
On Fri, Dec 18, 2020 at 2:27 AM Kalyan Thota  wrote:
>
> Set the flag vblank_disable_immediate = true to turn off vblank irqs
> immediately as soon as drm_vblank_put is requested so that there are
> no irqs triggered during idle state. This will reduce cpu wakeups
> and help in power saving.
>
> To enable vblank_disable_immediate flag the underlying KMS driver
> needs to support high precision vblank timestamping and also a
> reliable way of providing vblank counter which is incrementing
> at the leading edge of vblank.
>
> This patch also brings in changes to support vblank_disable_immediate
> requirement in dpu driver.
>
> Changes in v1:
>  - Specify reason to add vblank timestamp support. (Rob)
>  - Add changes to provide vblank counter from dpu driver.
>
> Signed-off-by: Kalyan Thota 

This seems to be triggering:

[  +0.032668] [ cut here ]
[  +0.004759] msm ae0.mdss: drm_WARN_ON_ONCE(cur_vblank != vblank->last)
[  +0.24] WARNING: CPU: 0 PID: 362 at
drivers/gpu/drm/drm_vblank.c:354 drm_update_vblank_count+0x1e4/0x258
[  +0.017154] Modules linked in: joydev
[  +0.003784] CPU: 0 PID: 362 Comm: frecon Not tainted
5.11.0-rc5-00037-g33d3504871dd #2
[  +0.008135] Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
[  +0.006167] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
[  +0.006169] pc : drm_update_vblank_count+0x1e4/0x258
[  +0.005105] lr : drm_update_vblank_count+0x1e4/0x258
[  +0.005106] sp : ffc010003b70
[  +0.003409] x29: ffc010003b70 x28: ff80855d9d98
[  +0.005466] x27:  x26: 00fe502a
[  +0.005458] x25: 0001 x24: 0001
[  +0.005466] x23: 0001 x22: ff808561ce80
[  +0.005465] x21:  x20: 
[  +0.005468] x19: ff80850d6800 x18: 
[  +0.005466] x17:  x16: 
[  +0.005465] x15: 000a x14: 263b
[  +0.005466] x13: 0006 x12: 
[  +0.005465] x11: 0010 x10: ffc090003797
[  +0.005466] x9 : ffed200e2a8c x8 : 
[  +0.005466] x7 :  x6 : ffed213b2b51
[  +0.005465] x5 : c000dfff x4 : ffed21218048
[  +0.005465] x3 :  x2 : 
[  +0.005465] x1 :  x0 : 
[  +0.005466] Call trace:
[  +0.002520]  drm_update_vblank_count+0x1e4/0x258
[  +0.004748]  drm_handle_vblank+0xd0/0x35c
[  +0.004130]  drm_crtc_handle_vblank+0x24/0x30
[  +0.004487]  dpu_crtc_vblank_callback+0x3c/0xc4
[  +0.004662]  dpu_encoder_vblank_callback+0x70/0xc4
[  +0.004931]  dpu_encoder_phys_vid_vblank_irq+0x50/0x12c
[  +0.005378]  dpu_core_irq_callback_handler+0xf4/0xfc
[  +0.005107]  dpu_hw_intr_dispatch_irq+0x100/0x120
[  +0.004834]  dpu_core_irq+0x44/0x5c
[  +0.003597]  dpu_irq+0x1c/0x28
[  +0.003141]  msm_irq+0x34/0x40
[  +0.003153]  __handle_irq_event_percpu+0xfc/0x254
[  +0.004838]  handle_irq_event_percpu+0x3c/0x94
[  +0.004574]  handle_irq_event+0x54/0x98
[  +0.003944]  handle_level_irq+0xa0/0xd0
[  +0.003943]  generic_handle_irq+0x30/0x48
[  +0.004131]  dpu_mdss_irq+0xe4/0x118
[  +0.003684]  generic_handle_irq+0x30/0x48
[  +0.004127]  __handle_domain_irq+0xa8/0xac
[  +0.004215]  gic_handle_irq+0xdc/0x150
[  +0.003856]  el1_irq+0xb4/0x180
[  +0.003237]  dpu_encoder_vsync_time+0x78/0x230
[  +0.004574]  dpu_encoder_kickoff+0x190/0x354
[  +0.004386]  dpu_crtc_commit_kickoff+0x194/0x1a0
[  +0.004748]  dpu_kms_flush_commit+0xf4/0x108
[  +0.004390]  msm_atomic_commit_tail+0x2e8/0x384
[  +0.004661]  commit_tail+0x80/0x108
[  +0.003588]  drm_atomic_helper_commit+0x118/0x11c
[  +0.004834]  drm_atomic_commit+0x58/0x68
[  +0.004033]  drm_atomic_helper_set_config+0x70/0x9c
[  +0.005018]  drm_mode_setcrtc+0x390/0x584
[  +0.004131]  drm_ioctl_kernel+0xc8/0x11c
[  +0.004035]  drm_ioctl+0x2f8/0x34c
[  +0.003500]  drm_compat_ioctl+0x48/0xe8
[  +0.003945]  __arm64_compat_sys_ioctl+0xe8/0x104
[  +0.004750]  el0_svc_common.constprop.0+0x114/0x188
[  +0.005019]  do_el0_svc_compat+0x28/0x38
[  +0.004031]  el0_svc_compat+0x20/0x30
[  +0.003772]  el0_sync_compat_handler+0x104/0x18c
[  +0.004749]  el0_sync_compat+0x178/0x180
[  +0.004034] ---[ end trace 2959d178e74f2555 ]---


BR,
-R

> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c   | 80 
> ++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c| 30 
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h| 11 +++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys.h   |  1 +
>  .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c   | 17 +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c|  5 ++
>  6 files changed, 144 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> index d4662e8..9a80981 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> @@ -65,6 +65,83 @@ static void 

Re: [PATCH 3/5] drm/msm/dsi_pll_10nm: Fix bad VCO rate calculation and prescaler

2021-01-31 Thread Rob Clark
On Sat, Jan 9, 2021 at 5:51 AM AngeloGioacchino Del Regno
 wrote:
>
> The VCO rate was being miscalculated due to a big overlook during
> the process of porting this driver from downstream to upstream:
> here we are really recalculating the rate of the VCO by reading
> the appropriate registers and returning a real frequency, while
> downstream the driver was doing something entirely different.
>
> In our case here, the recalculated rate was wrong, as it was then
> given back to the set_rate function, which was erroneously doing
> a division on the fractional value, based on the prescaler being
> either enabled or disabled: this was actually producing a bug for
> which the final VCO rate was being doubled, causing very obvious
> issues when trying to drive a DSI panel because the actual divider
> value was multiplied by two!
>
> To make things work properly, remove the multiplication of the
> reference clock by two from function dsi_pll_calc_dec_frac and
> account for the prescaler enablement in the vco_recalc_rate (if
> the prescaler is enabled, then the hardware will divide the rate
> by two).
>
> This will make the vco_recalc_rate function to pass the right
> frequency to the (clock framework) set_rate function when called,
> which will - in turn - program the right values in both the
> DECIMAL_DIV_START_1 and the FRAC_DIV_START_{LOW/MID/HIGH}_1
> registers, finally making the PLL to output the right clock.
>
> Also, while at it, remove the prescaler TODO by also adding the
> possibility of disabling the prescaler on the PLL (it is in the
> PLL_ANALOG_CONTROLS_ONE register).
> Of course, both prescaler-ON and OFF cases were tested.

This somehow breaks things on sc7180 (display gets stuck at first
frame of splash screen).  (This is a setup w/ an ti-sn65dsi86 dsi->eDP
bridge)

Also, something (I assume DSI related) that I was testing on
msm-next-staging seems to have effected the colors on the panel (ie.
they are more muted).. which seems to persist across reboots (ie. when
switching back to a good kernel), and interestingly if I reboot from a
good kernel I see part of the login prompt (or whatever was previously
on-screen) in the firmware ui screen !?!  (so maybe somehow triggered
the display to think it is in PSR mode??)

Not sure if that is caused by these patches, but if I can figure out
how to get the panel back to normal I can bisect.  I think for now
I'll drop this series.  Possibly it could be a
two-wrongs-makes-a-right situation that had things working before, but
I think someone from qcom who knows the DSI IP should take a look.

BR,
-R


> Signed-off-by: AngeloGioacchino Del Regno 
> 
> ---
>  drivers/gpu/drm/msm/dsi/pll/dsi_pll_10nm.c | 22 +-
>  1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/dsi/pll/dsi_pll_10nm.c 
> b/drivers/gpu/drm/msm/dsi/pll/dsi_pll_10nm.c
> index 8b66e852eb36..5be562dfbf06 100644
> --- a/drivers/gpu/drm/msm/dsi/pll/dsi_pll_10nm.c
> +++ b/drivers/gpu/drm/msm/dsi/pll/dsi_pll_10nm.c
> @@ -165,11 +165,7 @@ static void dsi_pll_calc_dec_frac(struct dsi_pll_10nm 
> *pll)
>
> pll_freq = pll->vco_current_rate;
>
> -   if (config->disable_prescaler)
> -   divider = fref;
> -   else
> -   divider = fref * 2;
> -
> +   divider = fref;
> multiplier = 1 << config->frac_bits;
> dec_multiple = div_u64(pll_freq * multiplier, divider);
> dec = div_u64_rem(dec_multiple, multiplier, &frac);
> @@ -266,9 +262,11 @@ static void dsi_pll_ssc_commit(struct dsi_pll_10nm *pll)
>
>  static void dsi_pll_config_hzindep_reg(struct dsi_pll_10nm *pll)
>  {
> +   struct dsi_pll_config *config = &pll->pll_configuration;
> void __iomem *base = pll->mmio;
> +   u32 val = config->disable_prescaler ? 0x0 : 0x80;
>
> -   pll_write(base + REG_DSI_10nm_PHY_PLL_ANALOG_CONTROLS_ONE, 0x80);
> +   pll_write(base + REG_DSI_10nm_PHY_PLL_ANALOG_CONTROLS_ONE, val);
> pll_write(base + REG_DSI_10nm_PHY_PLL_ANALOG_CONTROLS_TWO, 0x03);
> pll_write(base + REG_DSI_10nm_PHY_PLL_ANALOG_CONTROLS_THREE, 0x00);
> pll_write(base + REG_DSI_10nm_PHY_PLL_DSM_DIVIDER, 0x00);
> @@ -499,17 +497,15 @@ static unsigned long 
> dsi_pll_10nm_vco_recalc_rate(struct clk_hw *hw,
> frac |= ((pll_read(base + REG_DSI_10nm_PHY_PLL_FRAC_DIV_START_HIGH_1) 
> &
>   0x3) << 16);
>
> -   /*
> -* TODO:
> -*  1. Assumes prescaler is disabled
> -*/
> multiplier = 1 << config->frac_bits;
> -   pll_freq = dec * (ref_clk * 2);
> -   tmp64 = (ref_clk * 2 * frac);
> +   pll_freq = dec * ref_clk;
> +   tmp64 = ref_clk * frac;
> pll_freq += div_u64(tmp64, multiplier);
> -
> vco_rate = pll_freq;
>
> +   if (config->disable_prescaler)
> +   vco_rate = div_u64(vco_rate, 2);
> +
> DBG("DSI PLL%d returning vco rate = %lu, dec = %x, frac = %x",
> pll_10nm->id, (unsigned 

Re: [PATCH 2/3] drm/msm: Fix races managing the OOB state for timestamp vs timestamps.

2021-01-28 Thread Rob Clark
On Wed, Jan 27, 2021 at 3:39 PM Eric Anholt  wrote:
>
> Now that we're not racing with GPU setup, also fix races of timestamps
> against other timestamps.  In CI, we were seeing this path trigger
> timeouts on setting the GMU bit, especially on the first set of tests
> right after boot (it's probably easier to lose the race than one might
> think, given that we start many tests in parallel, and waiting for NFS
> to page in code probably means that lots of tests hit the same point
> of screen init at the same time).

Could you add the error msg to the commit msg, to make it more easily
searchable?

BR,
-R

> Signed-off-by: Eric Anholt 
> Cc: sta...@vger.kernel.org # v5.9
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 7424a70b9d35..e8f0b5325a7f 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1175,6 +1175,9 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, 
> uint64_t *value)
>  {
> struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +   static DEFINE_MUTEX(perfcounter_oob);
> +
> +   mutex_lock(&perfcounter_oob);
>
> /* Force the GPU power on so we can read this register */
> a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_PERFCOUNTER_SET);
> @@ -1183,6 +1186,7 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, 
> uint64_t *value)
> REG_A6XX_RBBM_PERFCTR_CP_0_HI);
>
> a6xx_gmu_clear_oob(&a6xx_gpu->gmu, GMU_OOB_PERFCOUNTER_SET);
> +   mutex_unlock(&perfcounter_oob);
> return 0;
>  }
>
> --
> 2.30.0
>


Re: [PATCH] drm/msm/kms: Make a lock_class_key for each crtc mutex

2021-01-28 Thread Rob Clark
On Mon, Jan 25, 2021 at 3:49 PM Stephen Boyd  wrote:
>
> Lockdep complains about an AA deadlock when rebooting the device.
>
> 
> WARNING: possible recursive locking detected
> 5.4.91 #1 Not tainted
> 
> reboot/5213 is trying to acquire lock:
> ff80d13391b0 (&kms->commit_lock[i]){+.+.}, at: lock_crtcs+0x60/0xa4
>
> but task is already holding lock:
> ff80d1339110 (&kms->commit_lock[i]){+.+.}, at: lock_crtcs+0x60/0xa4
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> 
> lock(&kms->commit_lock[i]);
> lock(&kms->commit_lock[i]);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 6 locks held by reboot/5213:
> __arm64_sys_reboot+0x148/0x2a0
> device_shutdown+0x10c/0x2c4
> drm_atomic_helper_shutdown+0x48/0xfc
> modeset_lock+0x120/0x24c
> lock_crtcs+0x60/0xa4
>
> stack backtrace:
> CPU: 4 PID: 5213 Comm: reboot Not tainted 5.4.91 #1
> Hardware name: Google Pompom (rev1) with LTE (DT)
> Call trace:
> dump_backtrace+0x0/0x1dc
> show_stack+0x24/0x30
> dump_stack+0xfc/0x1a8
> __lock_acquire+0xcd0/0x22b8
> lock_acquire+0x1ec/0x240
> __mutex_lock_common+0xe0/0xc84
> mutex_lock_nested+0x48/0x58
> lock_crtcs+0x60/0xa4
> msm_atomic_commit_tail+0x348/0x570
> commit_tail+0xdc/0x178
> drm_atomic_helper_commit+0x160/0x168
> drm_atomic_commit+0x68/0x80
>
> This is because lockdep thinks all the locks taken in lock_crtcs() are
> the same lock, when they actually aren't. That's because we call
> mutex_init() in msm_kms_init() and that assigns on static key for every

nit, s/on/one/ ?

BR,
-R

> lock initialized in this loop. Let's allocate a dynamic number of
> lock_class_keys and assign them to each lock so that lockdep can figure
> out an AA deadlock isn't possible here.
>
> Fixes: b3d91800d9ac ("drm/msm: Fix race condition in msm driver with async 
> layer updates")
> Cc: Krishna Manikandan 
> Signed-off-by: Stephen Boyd 
> ---
>  drivers/gpu/drm/msm/msm_kms.h | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
> index d8151a89e163..4735251a394d 100644
> --- a/drivers/gpu/drm/msm/msm_kms.h
> +++ b/drivers/gpu/drm/msm/msm_kms.h
> @@ -157,6 +157,7 @@ struct msm_kms {
>  * from the crtc's pending_timer close to end of the frame:
>  */
> struct mutex commit_lock[MAX_CRTCS];
> +   struct lock_class_key commit_lock_keys[MAX_CRTCS];
> unsigned pending_crtc_mask;
> struct msm_pending_timer pending_timers[MAX_CRTCS];
>  };
> @@ -166,8 +167,11 @@ static inline int msm_kms_init(struct msm_kms *kms,
>  {
> unsigned i, ret;
>
> -   for (i = 0; i < ARRAY_SIZE(kms->commit_lock); i++)
> -   mutex_init(&kms->commit_lock[i]);
> +   for (i = 0; i < ARRAY_SIZE(kms->commit_lock); i++) {
> +   lockdep_register_key(&kms->commit_lock_keys[i]);
> +   __mutex_init(&kms->commit_lock[i], "&kms->commit_lock[i]",
> +   &kms->commit_lock_keys[i]);
> +   }
>
> kms->funcs = funcs;
>
>
> base-commit: 19c329f6808995b142b3966301f217c831e7cf31
> --
> https://chromeos.dev
>


Re: [PATCH v2 1/3] iommu/arm-smmu: Add support for driver IOMMU fault handlers

2021-01-26 Thread Rob Clark
On Tue, Jan 26, 2021 at 3:41 AM Robin Murphy  wrote:
>
> On 2021-01-25 21:51, Jordan Crouse wrote:
> > On Fri, Jan 22, 2021 at 12:53:17PM +, Robin Murphy wrote:
> >> On 2021-01-22 12:41, Will Deacon wrote:
> >>> On Tue, Nov 24, 2020 at 12:15:58PM -0700, Jordan Crouse wrote:
>  Call report_iommu_fault() to allow upper-level drivers to register their
>  own fault handlers.
> 
>  Signed-off-by: Jordan Crouse 
>  ---
> 
>    drivers/iommu/arm/arm-smmu/arm-smmu.c | 16 +---
>    1 file changed, 13 insertions(+), 3 deletions(-)
> 
>  diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
>  b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>  index 0f28a8614da3..7fd18bbda8f5 100644
>  --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>  +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>  @@ -427,6 +427,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, 
>  void *dev)
> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> struct arm_smmu_device *smmu = smmu_domain->smmu;
> int idx = smmu_domain->cfg.cbndx;
>  +  int ret;
> fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> if (!(fsr & ARM_SMMU_FSR_FAULT))
>  @@ -436,11 +437,20 @@ static irqreturn_t arm_smmu_context_fault(int irq, 
>  void *dev)
> iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
>  -  dev_err_ratelimited(smmu->dev,
>  -  "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
>  cbfrsynra=0x%x, cb=%d\n",
>  +  ret = report_iommu_fault(domain, dev, iova,
>  +  fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : 
>  IOMMU_FAULT_READ);
>  +
>  +  if (ret == -ENOSYS)
>  +  dev_err_ratelimited(smmu->dev,
>  +  "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
>  cbfrsynra=0x%x, cb=%d\n",
> fsr, iova, fsynr, cbfrsynra, idx);
>  -  arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
>  +  /*
>  +   * If the iommu fault returns an error (except -ENOSYS) then assume 
>  that
>  +   * they will handle resuming on their own
>  +   */
>  +  if (!ret || ret == -ENOSYS)
>  +  arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr);
> >>>
> >>> Hmm, I don't grok this part. If the fault handler returned an error and
> >>> we don't clear the FSR, won't we just re-take the irq immediately?
> >>
> >> If we don't touch the FSR at all, yes. Even if we clear the fault indicator
> >> bits, the interrupt *might* remain asserted until a stalled transaction is
> >> actually resolved - that's that lovely IMP-DEF corner.
> >>
> >> Robin.
> >>
> >
> > This is for stall-on-fault. The idea is that if the developer chooses to do 
> > so
> > we would stall the GPU after a fault long enough to take a picture of it 
> > with
> > devcoredump and then release the FSR. Since we can't take the devcoredump 
> > from
> > the interrupt handler we schedule it in a worker and then return an error
> > to let the main handler know that we'll come back around clear the FSR later
> > when we are done.
>
> Sure, but clearing FSR is not writing to RESUME to resolve the stalled
> transaction(s). You can already snarf the FSR contents from your
> report_iommu_fault() handler if you want to, so either way I don't see
> what's gained by not clearing it as expected at the point where we've
> handled the *interrupt*, even if it will take longer to decide what to
> do with the underlying *fault* that it signalled. I'm particularly not
> keen on having unusual behaviour in the core interrupt handling which
> callers may unwittingly trigger, for the sake of one
> very-very-driver-specific flow having a slightly richer debugging
> experience.

Tbf, "slightly" is an understatement.. it is a big enough improvement
that I've hacked up deferred resume several times to debug various
issues. ;-)

(Which is always a bit of a PITA because of things moving around in
arm-smmu as well as the drm side of things.)

But from my recollection, we can clear FSR immediately, all we need to
do is defer writing ARM_SMMU_CB_RESUME

BR,
-R

>
> For actually *handling* faults, I thought we were going to need to hook
> up the new IOPF fault queue stuff anyway?
>
> Robin.
>
> > It is assumed that we'll have to turn off interrupts in our handler to allow
> > this to work. Its all very implementation specific, but then again we're
> > assuming that if you want to do this then you know what you are doing.
> >
> > In that spirit the error that skips the FSR should probably be something
> > specific instead of "all errors" - that way a well meaning handler that 
> > returns
> > a -EINVAL doesn't accidentally break itself.
> >
> > Jordan
> >
> >>> I think
> >>> it would be better to do this unconditionally, and print the "Unhandled
> >>> context fault" message for any non-zero 

Re: [PATCH 2/2] drm/msm/a6xx: Create an A6XX GPU specific address space

2021-01-20 Thread Rob Clark
On Wed, Jan 20, 2021 at 3:04 AM AngeloGioacchino Del Regno
 wrote:
>
> Il 11/01/21 13:04, Sai Prakash Ranjan ha scritto:
> > A6XX GPUs have support for last level cache(LLC) also known
> > as system cache and need to set the bus attributes to
> > use it. Currently we use a generic adreno iommu address space
> > implementation which are also used by older GPU generations
> > which do not have LLC and might introduce issues accidentally
> > and is not clean in a way that anymore additions of GPUs
> > supporting LLC would have to be guarded under ifdefs. So keep
> > the generic code separate and make the address space creation
> > A6XX specific. We also have a helper to set the llc attributes
> > so that if the newer GPU generations do support them, we can
> > use it instead of open coding domain attribute setting for each
> > GPU.
> >
>
> Hello!
>
> > Signed-off-by: Sai Prakash Ranjan 
> > ---
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 46 -
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.c | 23 +
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.h |  7 ++--
> >   3 files changed, 55 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 3b798e883f82..3c7ad51732bb 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1239,6 +1239,50 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu 
> > *gpu)
> >   return (unsigned long)busy_time;
> >   }
> >
> > +static struct msm_gem_address_space *
> > +a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device 
> > *pdev)
> > +{
> > + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > + struct iommu_domain *iommu;
> > + struct msm_mmu *mmu;
> > + struct msm_gem_address_space *aspace;
> > + u64 start, size;
> > +
> > + iommu = iommu_domain_alloc(&platform_bus_type);
> > + if (!iommu)
> > + return NULL;
> > +
> > + /*
> > +  * This allows GPU to set the bus attributes required to use system
> > +  * cache on behalf of the iommu page table walker.
> > +  */
> > + if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice))
> > + adreno_set_llc_attributes(iommu);
> > +
> > + mmu = msm_iommu_new(&pdev->dev, iommu);
> > + if (IS_ERR(mmu)) {
> > + iommu_domain_free(iommu);
> > + return ERR_CAST(mmu);
> > + }
> > +
> > + /*
> > +  * Use the aperture start or SZ_16M, whichever is greater. This will
> > +  * ensure that we align with the allocated pagetable range while still
> > +  * allowing room in the lower 32 bits for GMEM and whatnot
> > +  */
> > + start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
> > + size = iommu->geometry.aperture_end - start + 1;
> > +
> > + aspace = msm_gem_address_space_create(mmu, "gpu",
> > + start & GENMASK_ULL(48, 0), size);
> > +
> > + if (IS_ERR(aspace) && !IS_ERR(mmu))
> > + mmu->funcs->destroy(mmu);
> > +
> > + return aspace;
> > +}
> > +
>
> I get what you're trying to do - yes the intentions are good, however...
> you are effectively duplicating code 1:1, as this *is* the same as
> function adreno_iommu_create_address_space.

I had suggested moving this to a6xx, to avoid breaking earlier gens so
much..  (Note a2xx by necessity already has it's own version of
create_address_space().) I would in general tend to favor a small bit
of code duplication to lower the risk of breaking older gens which not
everybody has hw to test.

But I suppose we could add a has_llcc() and move the htw_llc_slice up
to the 'struct adreno_gpu' level.  Casting to a6xx_gpu in common code
isn't a great idea.  Older gens which don't have LLCC (or don't have
LLCC support yet) could leave the slice ptr NULL.

BR,
-R

> I don't see adding two lines to a function as a valid justification to
> duplicate all the rest: perhaps, you may want to find another way to do
> this;
>
> Here's one of the many ideas, perhaps you could:
> 1. Introduce a "generic feature" to signal LLCC support (perhaps in
> struct adreno_info ?)
> 2. If LLCC is supported, and LLCC slices are initialized, set the LLCC
> attributes on the IOMMU. Of course this would mean passing the init
> state of the slices (maybe just a bool would be fine) back to the
> generic adreno_gpu.c
>
> This, unless you tell me that the entire function is going to be a6xx
> specific, but that doesn't seem to be the case at all.
>
> Concerns are that when an hypotetical Adreno A7XX comes and perhaps also
> uses the LLCC slices, this function will be duplicated yet another time.
>
> >   static struct msm_gem_address_space *
> >   a6xx_create_private_address_space(struct msm_gpu *gpu)
> >   {
> > @@ -1285,7 +1329,7 @@ static const struct adreno_gpu_funcs funcs = {
> >   .gpu_state_get = 

Re: [Freedreno] [PATCH 2/2] drm/msm/a6xx: Create an A6XX GPU specific address space

2021-01-20 Thread Rob Clark
On Mon, Jan 11, 2021 at 4:04 AM Sai Prakash Ranjan
 wrote:
>
> A6XX GPUs have support for last level cache(LLC) also known
> as system cache and need to set the bus attributes to
> use it. Currently we use a generic adreno iommu address space
> implementation which are also used by older GPU generations
> which do not have LLC and might introduce issues accidentally
> and is not clean in a way that anymore additions of GPUs
> supporting LLC would have to be guarded under ifdefs. So keep
> the generic code separate and make the address space creation
> A6XX specific. We also have a helper to set the llc attributes
> so that if the newer GPU generations do support them, we can
> use it instead of open coding domain attribute setting for each
> GPU.
>
> Signed-off-by: Sai Prakash Ranjan 
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 46 -
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 23 +
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  7 ++--
>  3 files changed, 55 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 3b798e883f82..3c7ad51732bb 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1239,6 +1239,50 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
> return (unsigned long)busy_time;
>  }
>
> +static struct msm_gem_address_space *
> +a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
> +{
> +   struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> +   struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> +   struct iommu_domain *iommu;
> +   struct msm_mmu *mmu;
> +   struct msm_gem_address_space *aspace;
> +   u64 start, size;
> +
> +   iommu = iommu_domain_alloc(&platform_bus_type);
> +   if (!iommu)
> +   return NULL;
> +
> +   /*
> +* This allows GPU to set the bus attributes required to use system
> +* cache on behalf of the iommu page table walker.
> +*/
> +   if (!IS_ERR_OR_NULL(a6xx_gpu->htw_llc_slice))
> +   adreno_set_llc_attributes(iommu);
> +
> +   mmu = msm_iommu_new(&pdev->dev, iommu);
> +   if (IS_ERR(mmu)) {
> +   iommu_domain_free(iommu);
> +   return ERR_CAST(mmu);
> +   }
> +
> +   /*
> +* Use the aperture start or SZ_16M, whichever is greater. This will
> +* ensure that we align with the allocated pagetable range while still
> +* allowing room in the lower 32 bits for GMEM and whatnot
> +*/
> +   start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
> +   size = iommu->geometry.aperture_end - start + 1;
> +
> +   aspace = msm_gem_address_space_create(mmu, "gpu",
> +   start & GENMASK_ULL(48, 0), size);
> +
> +   if (IS_ERR(aspace) && !IS_ERR(mmu))
> +   mmu->funcs->destroy(mmu);
> +
> +   return aspace;
> +}
> +
>  static struct msm_gem_address_space *
>  a6xx_create_private_address_space(struct msm_gpu *gpu)
>  {
> @@ -1285,7 +1329,7 @@ static const struct adreno_gpu_funcs funcs = {
> .gpu_state_get = a6xx_gpu_state_get,
> .gpu_state_put = a6xx_gpu_state_put,
>  #endif
> -   .create_address_space = adreno_iommu_create_address_space,
> +   .create_address_space = a6xx_create_address_space,
> .create_private_address_space = 
> a6xx_create_private_address_space,
> .get_rptr = a6xx_get_rptr,
> },
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index b35914de1b27..0f184c3dd9d9 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -186,11 +186,18 @@ int adreno_zap_shader_load(struct msm_gpu *gpu, u32 
> pasid)
> return zap_shader_load_mdt(gpu, adreno_gpu->info->zapfw, pasid);
>  }
>
> +void adreno_set_llc_attributes(struct iommu_domain *iommu)
> +{
> +   struct io_pgtable_domain_attr pgtbl_cfg;
> +
> +   pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;

btw, since quirks is the only field in the struct currently, this is
ok.  But better practice to do something like:

struct io_pgtable_domain_attr pgtbl_cfg = {
.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA,
};

which will zero-initialize any additional fields which might be added
later, rather than inherit random garbage from the stack.

BR,
-R

> +   iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
> +}
> +
>  struct msm_gem_address_space *
>  adreno_iommu_create_address_space(struct msm_gpu *gpu,
> struct platform_device *pdev)
>  {
> -   struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> struct iommu_domain *iommu;
> struct msm_mmu *mmu;
> struct msm_gem_address_space *aspace;
> @@ -200,20 +207,6 @@ 

Re: [PATCH v2 5/7] drm/msm: Add dependency on io-pgtable-arm format module

2021-01-19 Thread Rob Clark
On Mon, Jan 18, 2021 at 1:39 PM Will Deacon  wrote:
>
> On Mon, Jan 18, 2021 at 01:16:03PM -0800, Rob Clark wrote:
> > On Mon, Dec 21, 2020 at 4:44 PM Isaac J. Manjarres
> >  wrote:
> > >
> > > The MSM DRM driver depends on the availability of the ARM LPAE io-pgtable
> > > format code to work properly. In preparation for having the io-pgtable
> > > formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to
> > > ensure that the io-pgtable-arm format module is loaded before loading
> > > the MSM DRM driver module.
> > >
> > > Signed-off-by: Isaac J. Manjarres 
> >
> > Thanks, I've queued this up locally
>
> I don't plan to make the io-pgtable code modular, so please drop this patch.
>
> https://lore.kernel.org/r/20210106123428.GA1798@willie-the-truck

Ok, done. Thanks

BR,
-R


Re: [PATCH v2 5/7] drm/msm: Add dependency on io-pgtable-arm format module

2021-01-18 Thread Rob Clark
On Mon, Dec 21, 2020 at 4:44 PM Isaac J. Manjarres
 wrote:
>
> The MSM DRM driver depends on the availability of the ARM LPAE io-pgtable
> format code to work properly. In preparation for having the io-pgtable
> formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to
> ensure that the io-pgtable-arm format module is loaded before loading
> the MSM DRM driver module.
>
> Signed-off-by: Isaac J. Manjarres 

Thanks, I've queued this up locally

BR,
-R

> ---
>  drivers/gpu/drm/msm/msm_drv.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index 535a026..8be3506 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -1369,3 +1369,4 @@ module_exit(msm_drm_unregister);
>  MODULE_AUTHOR("Rob Clark ");
>  MODULE_DESCRIPTION("MSM DRM Driver");
>  MODULE_LICENSE("GPL");
> +MODULE_SOFTDEP("pre: io-pgtable-arm");
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH] drm/msm: Only enable A6xx LLCC code on A6xx

2021-01-08 Thread Rob Clark
On Fri, Jan 8, 2021 at 6:05 AM Sai Prakash Ranjan
 wrote:
>
> On 2021-01-08 19:09, Konrad Dybcio wrote:
> >> Konrad, can you please test this below change without your change?
> >
> > This brings no difference, a BUG still happens. We're still calling
> > to_a6xx_gpu on ANY device that's probed! Too bad it won't turn my A330
> > into an A640..
> >
> > Also, relying on disabling LLCC in the config is out of question as it
> > makes the arm32 kernel not compile with DRM/MSM and it just removes
> > the functionality on devices with a6xx.. (unless somebody removes the
> > dependency on it, which in my opinion is even worse and will cause
> > more problems for developers!).
> >
>
> Disabling LLCC is not the suggestion, I was under the impression that
> was the cause here for the smmu bug. Anyways, the check for llc slice
> in case llcc is disabled is not correct as well. I will send a patch for
> that as well.
>
> > The bigger question is how and why did that piece of code ever make it
> > to adreno_gpu.c and not a6xx_gpu.c?
> >
>
> My mistake, I will move it.

Thanks, since we don't have kernel-CI coverage for gpu, and there
probably isn't one person who has all the different devices supported
(or enough hours in the day to test them all), it is probably
better/safer to keep things in the backend code that is specific to a
given generation.

> > To solve it in a cleaner way I propose to move it to an a6xx-specific
> > file, or if it's going to be used with next-gen GPUs, perhaps manage
> > calling of this code via an adreno quirk/feature in adreno_device.c.
> > Now that I think about it, A5xx GPMU en/disable could probably managed
> > like that, instead of using tons of if-statements for each GPU model
> > that has it..
> >
> > While we're at it, do ALL (and I truly do mean ALL, including the
> > low-end ones, this will be important later on) A6xx GPUs make use of
> > that feature?
> >
>
> I do not have a list of all A6XX GPUs with me currently, but from what
> I know, A618, A630, A640, A650 has the support.
>

From the PoV of bringing up new a6xx, we should probably consider that
some of them may not *yet* have LLCC enabled.  I have an 8cx laptop
and once I find time to get the display working, the next step would
be bringing up a680.. and I'd probably like to start without LLCC..

BR,
-R

> Thanks,
> Sai
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member
> of Code Aurora Forum, hosted by The Linux Foundation
> ___
> Freedreno mailing list
> freedr...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [PATCH] drm/msm: Fix MSM_INFO_GET_IOVA with carveout

2021-01-07 Thread Rob Clark
On Thu, Jan 7, 2021 at 9:20 AM Rob Clark  wrote:
>
> On Sat, Jan 2, 2021 at 12:26 PM Iskren Chernev  
> wrote:
> >
> > The msm_gem_get_iova should be guarded with gpu != NULL and not aspace
> > != NULL, because aspace is NULL when using vram carveout.
> >
> > Fixes: 933415e24bd0d ("drm/msm: Add support for private address space 
> > instances")
> >
> > Signed-off-by: Iskren Chernev 
> > ---
> >  drivers/gpu/drm/msm/msm_drv.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> > index c5e61cb3356df..c1953fb079133 100644
> > --- a/drivers/gpu/drm/msm/msm_drv.c
> > +++ b/drivers/gpu/drm/msm/msm_drv.c
> > @@ -775,9 +775,10 @@ static int msm_ioctl_gem_info_iova(struct drm_device 
> > *dev,
> > struct drm_file *file, struct drm_gem_object *obj,
> > uint64_t *iova)
> >  {
> > +   struct msm_drm_private *priv = dev->dev_private;
> > struct msm_file_private *ctx = file->driver_priv;
> >
> > -   if (!ctx->aspace)
> > +   if (!priv->gpu)
> > return -EINVAL;
>
> Does this actually work?  It seems like you would hit a null ptr deref
> in msm_gem_init_vma().. and in general I think a lot of code paths
> would be surprised by a null address space, so this seems like a risky
> idea.

oh, actually, I suppose it is ok, since in the vram carveout case we
create the vma up front when the gem obj is created..

(still, it does seem a bit fragile.. and easy for folks testing on
devices not using vram carveout to break.. hmm..)

BR,
-R

> Maybe instead we should be creating an address space for the vram carveout?
>
> BR,
> -R
>
>
> > /*
> > --
> > 2.29.2
> >


Re: [PATCH] drm/msm: Only enable A6xx LLCC code on A6xx

2021-01-07 Thread Rob Clark
On Wed, Jan 6, 2021 at 8:50 PM Sai Prakash Ranjan
 wrote:
>
> On 2021-01-05 01:00, Konrad Dybcio wrote:
> > Using this code on A5xx (and probably older too) causes a
> > smmu bug.
> >
> > Fixes: 474dadb8b0d5 ("drm/msm/a6xx: Add support for using system
> > cache(LLC)")
> > Signed-off-by: Konrad Dybcio 
> > Tested-by: AngeloGioacchino Del Regno
> > 
> > ---
>
> Reviewed-by: Sai Prakash Ranjan 
>
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 21 -
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  5 +
> >  2 files changed, 17 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > index 6cf9975e951e..f09175698827 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > @@ -191,8 +191,6 @@ adreno_iommu_create_address_space(struct msm_gpu
> > *gpu,
> >   struct platform_device *pdev)
> >  {
> >   struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > - struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > - struct io_pgtable_domain_attr pgtbl_cfg;
> >   struct iommu_domain *iommu;
> >   struct msm_mmu *mmu;
> >   struct msm_gem_address_space *aspace;
> > @@ -202,13 +200,18 @@ adreno_iommu_create_address_space(struct msm_gpu
> > *gpu,
> >   if (!iommu)
> >   return NULL;
> >
> > - /*
> > -  * This allows GPU to set the bus attributes required to use system
> > -  * cache on behalf of the iommu page table walker.
> > -  */
> > - if (!IS_ERR(a6xx_gpu->htw_llc_slice)) {
> > - pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
> > - iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG,
> > &pgtbl_cfg);
> > +
> > + if (adreno_is_a6xx(adreno_gpu)) {
> > + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > + struct io_pgtable_domain_attr pgtbl_cfg;
> > + /*
> > + * This allows GPU to set the bus attributes required to use 
> > system
> > + * cache on behalf of the iommu page table walker.
> > + */
> > + if (!IS_ERR(a6xx_gpu->htw_llc_slice)) {
> > + pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
> > + iommu_domain_set_attr(iommu, 
> > DOMAIN_ATTR_IO_PGTABLE_CFG,
> > _cfg);
> > + }

I'm applying this for -fixes as this is an obvious problem..  But kinda
thinking that we should try to move it into an a6xx specific
create_address_space() (or wrapper for the generic fxn)

Sai/Jordan, could I talk one of you into trying to clean this up
better for next cycle?
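
Roughly what I have in mind, as an untested sketch (names and
signatures approximate; the quirk still needs to land before the domain
is attached, which is why it is passed into the generic helper rather
than applied after the fact):

```c
/* untested sketch: the generic helper takes pgtable quirks as an
 * argument instead of poking at a6xx-specific state
 */
struct msm_gem_address_space *
adreno_iommu_create_address_space(struct msm_gpu *gpu,
		struct platform_device *pdev, unsigned long quirks)
{
	struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
	struct io_pgtable_domain_attr pgtbl_cfg;

	if (!iommu)
		return NULL;

	if (quirks) {
		/* let the GPU set the bus attributes required to use
		 * system cache on behalf of the iommu pagetable walker:
		 */
		pgtbl_cfg.quirks = quirks;
		iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG,
				&pgtbl_cfg);
	}

	/* ... rest unchanged: msm_iommu_new() + aspace creation ... */
}

/* ... and a6xx installs its own funcs->create_address_space: */
static struct msm_gem_address_space *
a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
{
	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(to_adreno_gpu(gpu));
	unsigned long quirks = 0;

	if (!IS_ERR(a6xx_gpu->htw_llc_slice))
		quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;

	return adreno_iommu_create_address_space(gpu, pdev, quirks);
}
```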

BR,
-R

> >   }
> >
> >   mmu = msm_iommu_new(>dev, iommu);
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > index 4574d85c5680..08421fa54a50 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > @@ -226,6 +226,11 @@ static inline int adreno_is_a540(struct adreno_gpu
> > *gpu)
> >   return gpu->revn == 540;
> >  }
> >
> > +static inline bool adreno_is_a6xx(struct adreno_gpu *gpu)
> > +{
> > + return ((gpu->revn < 700 && gpu->revn > 599));
> > +}
> > +
> >  static inline int adreno_is_a618(struct adreno_gpu *gpu)
> >  {
> > return gpu->revn == 618;
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member
> of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] drm/msm: Fix MSM_INFO_GET_IOVA with carveout

2021-01-07 Thread Rob Clark
On Sat, Jan 2, 2021 at 12:26 PM Iskren Chernev  wrote:
>
> The msm_gem_get_iova should be guarded with gpu != NULL and not aspace
> != NULL, because aspace is NULL when using vram carveout.
>
> Fixes: 933415e24bd0d ("drm/msm: Add support for private address space 
> instances")
>
> Signed-off-by: Iskren Chernev 
> ---
>  drivers/gpu/drm/msm/msm_drv.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index c5e61cb3356df..c1953fb079133 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -775,9 +775,10 @@ static int msm_ioctl_gem_info_iova(struct drm_device 
> *dev,
> struct drm_file *file, struct drm_gem_object *obj,
> uint64_t *iova)
>  {
> +   struct msm_drm_private *priv = dev->dev_private;
> struct msm_file_private *ctx = file->driver_priv;
>
> -   if (!ctx->aspace)
> +   if (!priv->gpu)
> return -EINVAL;

Does this actually work?  It seems like you would hit a null ptr deref
in msm_gem_init_vma().. and in general I think a lot of code paths
would be surprised by a null address space, so this seems like a risky
idea.

Maybe instead we should be creating an address space for the vram carveout?

BR,
-R


> /*
> --
> 2.29.2
>


Re: [v1] drm/msm/disp/dpu1: turn off vblank irqs aggressively in dpu driver

2020-12-14 Thread Rob Clark
On Mon, Dec 14, 2020 at 3:41 AM Kalyan Thota  wrote:
>
> Turn off vblank irqs immediately as soon as drm_vblank_put is
> requested so that there are no irqs triggered during idle state.
>
> This will reduce cpu wakeups and help in power saving. The change
> also enable driver timestamp for vblanks.
>
> Signed-off-by: Kalyan Thota 
> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c| 69 
> +
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 15 +++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.h |  6 +++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c |  4 ++
>  4 files changed, 94 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> index d4662e8..a4a5733 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
> @@ -65,6 +65,73 @@ static void dpu_crtc_destroy(struct drm_crtc *crtc)
> kfree(dpu_crtc);
>  }
>
> +static struct drm_encoder *get_encoder_from_crtc(struct drm_crtc *crtc)
> +{
> +   struct drm_device *dev = crtc->dev;
> +   struct drm_encoder *encoder;
> +
> +   drm_for_each_encoder(encoder, dev)
> +   if (encoder->crtc == crtc)
> +   return encoder;
> +
> +   return NULL;
> +}
> +
> +static bool dpu_crtc_get_scanout_position(struct drm_crtc *crtc,
> +  bool in_vblank_irq,
> +  int *vpos, int *hpos,
> +  ktime_t *stime, ktime_t *etime,
> +  const struct drm_display_mode 
> *mode)
> +{
> +   unsigned int pipe = crtc->index;
> +   struct drm_encoder *encoder;
> +   int line, vsw, vbp, vactive_start, vactive_end, vfp_end;
> +
> +
> +   encoder = get_encoder_from_crtc(crtc);
> +   if (!encoder) {
> +   DRM_ERROR("no encoder found for crtc %d\n", pipe);
> +   return false;
> +   }
> +
> +   vsw = mode->crtc_vsync_end - mode->crtc_vsync_start;
> +   vbp = mode->crtc_vtotal - mode->crtc_vsync_end;
> +
> +   /*
> +* the line counter is 1 at the start of the VSYNC pulse and VTOTAL at
> +* the end of VFP. Translate the porch values relative to the line
> +* counter positions.
> +*/
> +
> +   vactive_start = vsw + vbp + 1;
> +
> +   vactive_end = vactive_start + mode->crtc_vdisplay;
> +
> +   /* last scan line before VSYNC */
> +   vfp_end = mode->crtc_vtotal;
> +
> +   if (stime)
> +   *stime = ktime_get();
> +
> +   line = dpu_encoder_get_linecount(encoder);
> +
> +   if (line < vactive_start)
> +   line -= vactive_start;
> +   else if (line > vactive_end)
> +   line = line - vfp_end - vactive_start;
> +   else
> +   line -= vactive_start;
> +
> +   *vpos = line;
> +   *hpos = 0;
> +
> +   if (etime)
> +   *etime = ktime_get();
> +
> +   return true;
> +}
> +
> +
>  static void _dpu_crtc_setup_blend_cfg(struct dpu_crtc_mixer *mixer,
> struct dpu_plane_state *pstate, struct dpu_format *format)
>  {
> @@ -1243,6 +1310,7 @@ static const struct drm_crtc_funcs dpu_crtc_funcs = {
> .early_unregister = dpu_crtc_early_unregister,
> .enable_vblank  = msm_crtc_enable_vblank,
> .disable_vblank = msm_crtc_disable_vblank,
> +   .get_vblank_timestamp = drm_crtc_vblank_helper_get_vblank_timestamp,
>  };
>
>  static const struct drm_crtc_helper_funcs dpu_crtc_helper_funcs = {
> @@ -1251,6 +1319,7 @@ static const struct drm_crtc_helper_funcs 
> dpu_crtc_helper_funcs = {
> .atomic_check = dpu_crtc_atomic_check,
> .atomic_begin = dpu_crtc_atomic_begin,
> .atomic_flush = dpu_crtc_atomic_flush,
> +   .get_scanout_position = dpu_crtc_get_scanout_position,
>  };

I'm happy to see get_vblank_timestamp/get_scanout_position wired up, I
had a WIP patch for this somewhere but never got around to finishing
it.

But you probably should mention in the commit msg why this is part of this patch

>
>  /* initialize crtc */
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c 
> b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> index f7f5c25..6c7c7fd 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
> @@ -425,6 +425,21 @@ int dpu_encoder_helper_unregister_irq(struct 
> dpu_encoder_phys *phys_enc,
> return 0;
>  }
>
> +int dpu_encoder_get_linecount(struct drm_encoder *drm_enc)
> +{
> +   struct dpu_encoder_virt *dpu_enc = NULL;
> +   struct dpu_encoder_phys *phys = NULL;
> +   int linecount = 0;
> +
> +   dpu_enc = to_dpu_encoder_virt(drm_enc);
> +   phys = dpu_enc ? dpu_enc->cur_master : NULL;
> +
> +   if (phys && phys->ops.get_line_count)
> +   linecount = phys->ops.get_line_count(phys);
> +
> +   

[PATCH] drm/msm: Fix WARN_ON() splat in _free_object()

2020-12-10 Thread Rob Clark
From: Rob Clark 

[  192.062000] [ cut here ]
[  192.062498] WARNING: CPU: 3 PID: 2039 at drivers/gpu/drm/msm/msm_gem.c:381 
put_iova_vmas+0x94/0xa0 [msm]
[  192.062870] Modules linked in: snd_hrtimer snd_seq snd_seq_device rfcomm 
algif_hash algif_skcipher af_alg bnep xt_CHECKSUM nft_chain_nat xt_MASQUERADE 
nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter xt_tcpudp 
nft_compat cpufreq_powersave cpufreq_conservative q6asm_dai q6routing q6afe_dai 
q6adm bridge q6afe q6asm q6dsp_common q6core stp llc nf_tables libcrc32c 
nfnetlink snd_soc_wsa881x regmap_sdw soundwire_qcom gpio_wcd934x 
snd_soc_wcd934x wcd934x regmap_slimbus venus_enc venus_dec apr videobuf2_dma_sg 
qrtr_smd uvcvideo videobuf2_vmalloc videobuf2_memops ath10k_snoc ath10k_core 
hci_uart btqca btbcm mac80211 bluetooth snd_soc_sdm845 ath snd_soc_rt5663 
snd_soc_qcom_common snd_soc_rl6231 soundwire_bus ecdh_generic ecc 
qcom_spmi_adc5 venus_core qcom_pon qcom_spmi_temp_alarm qcom_vadc_common 
v4l2_mem2mem videobuf2_v4l2 cfg80211 videobuf2_common hid_multitouch 
reset_qcom_pdc qcrypto qcom_rng rfkill qcom_q6v5_mss libarc4 libdes qrtr ns 
qcom_wdt socinfo slim_qcom_ngd_ctrl
[  192.065739]  pdr_interface qcom_q6v5_pas slimbus qcom_pil_info qcom_q6v5 
qcom_sysmon qcom_common qcom_glink_smem qmi_helpers rmtfs_mem tcp_bbr sch_fq 
fuse ip_tables x_tables ipv6 crc_ccitt ti_sn65dsi86 i2c_hid msm mdt_loader 
llcc_qcom rtc_pm8xxx ocmem drm_kms_helper crct10dif_ce phy_qcom_qusb2 
i2c_qcom_geni panel_simple drm pwm_bl
[  192.066066] CPU: 3 PID: 2039 Comm: gnome-shell Tainted: GW 
5.10.0-rc7-next-20201208 #1
[  192.066068] Hardware name: LENOVO 81JL/LNVNB161216, BIOS 9UCN33WW(V2.06) 06/ 
4/2019
[  192.066072] pstate: 4045 (nZcv daif +PAN -UAO -TCO BTYPE=--)
[  192.066099] pc : put_iova_vmas+0x94/0xa0 [msm]
[  192.066262] lr : put_iova_vmas+0x1c/0xa0 [msm]
[  192.066403] sp : 800019efbbb0
[  192.066405] x29: 800019efbbb0 x28: 800019efbd88
[  192.066411] x27:  x26: 109582efa400
[  192.066417] x25: 0009 x24: 012b
[  192.066422] x23: 109582efa438 x22: 109582efa450
[  192.066427] x21: 109582efa528 x20: 1095cbd4f200
[  192.066432] x19: 1095cbd4f200 x18: 
[  192.066438] x17:  x16: c26c200ca750
[  192.066727] x15:  x14: 
[  192.066741] x13: 1096fb8c9100 x12: 0002
[  192.066754] x11:  x10: 0002
[  192.067046] x9 : 0001 x8 : 0a36
[  192.067060] x7 : 4e2ad9f11000 x6 : c26c216d4000
[  192.067212] x5 : c26c2022661c x4 : 1095c2b98000
[  192.067367] x3 : 1095cbd4f300 x2 : 
[  192.067380] x1 : 1095c2b98000 x0 : 
[  192.067667] Call trace:
[  192.067734]  put_iova_vmas+0x94/0xa0 [msm]
[  192.068078]  msm_gem_free_object+0xb4/0x110 [msm]
[  192.068399]  drm_gem_object_free+0x1c/0x30 [drm]
[  192.068717]  drm_gem_object_handle_put_unlocked+0xf0/0xf8 [drm]
[  192.069032]  drm_gem_object_release_handle+0x6c/0x88 [drm]
[  192.069349]  drm_gem_handle_delete+0x68/0xc0 [drm]
[  192.069666]  drm_gem_close_ioctl+0x30/0x48 [drm]
[  192.069984]  drm_ioctl_kernel+0xc0/0x110 [drm]
[  192.070303]  drm_ioctl+0x210/0x440 [drm]
[  192.070588]  __arm64_sys_ioctl+0xa8/0xf0
[  192.070599]  el0_svc_common.constprop.0+0x74/0x190
[  192.070608]  do_el0_svc+0x24/0x90
[  192.070618]  el0_svc+0x14/0x20
[  192.070903]  el0_sync_handler+0xb0/0xb8
[  192.070911]  el0_sync+0x174/0x180
[  192.070918] ---[ end trace bee6b12a899001a3 ]---
[  192.072140] [ cut here ]

Fixes: 9b73bde39cf2 ("drm/msm: Fix use-after-free in msm_gem with carveout")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_gem.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 68a6c7eacc0a..a21be5b910ff 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -990,6 +990,8 @@ void msm_gem_free_object(struct drm_gem_object *obj)
if (msm_obj->pages)
kvfree(msm_obj->pages);
 
+   put_iova_vmas(obj);
+
/* dma_buf_detach() grabs resv lock, so we need to unlock
 * prior to drm_prime_gem_destroy
 */
@@ -999,11 +1001,10 @@ void msm_gem_free_object(struct drm_gem_object *obj)
} else {
msm_gem_vunmap(obj);
put_pages(obj);
+   put_iova_vmas(obj);
msm_gem_unlock(obj);
}
 
-   put_iova_vmas(obj);
-
drm_gem_object_release(obj);
 
kfree(msm_obj);
-- 
2.28.0



Re: [RESEND PATCH v2 4/5] drm/msm: add DRM_MSM_GEM_SYNC_CACHE for non-coherent cache maintenance

2020-11-29 Thread Rob Clark
On Mon, Nov 16, 2020 at 9:55 AM Jonathan Marek  wrote:
>
> On 11/16/20 12:50 PM, Rob Clark wrote:
> > On Mon, Nov 16, 2020 at 9:33 AM Christoph Hellwig  wrote:
> >>
> >> On Sat, Nov 14, 2020 at 03:07:20PM -0500, Jonathan Marek wrote:
> >>> qcom's vulkan driver has nonCoherentAtomSize=1, and it looks like
> >>> dma_sync_single_for_cpu() does deal in some way with the partial cache 
> >>> line
> >>> case, although I'm not sure that means we can have a 
> >>> nonCoherentAtomSize=1.
> >>
> >> No, it doesn't.  You need to ensure ownership is managed at
> >> dma_get_cache_alignment() granularity.
> >
> > my guess is nonCoherentAtomSize=1 only works in the case of cache
> > coherent buffers
> >
>
> nonCoherentAtomSize doesn't apply to coherent memory (as the name
> implies), I guess qcom's driver is just wrong about having
> nonCoherentAtomSize=1.
>
> Jordan just mentioned there is at least one conformance test for this, I
> wonder if it just doesn't test it well enough, or just doesn't test the
> non-coherent memory type?

I was *assuming* (but could be wrong) that Jordan was referring to an
opencl cts test?

At any rate, it is sounding like you should add a
`MSM_PARAM_CACHE_ALIGNMENT` type of param that returns
dma_get_cache_alignment(), and then properly implement offset/end

BR,
-R


[PATCH] msm/mdp5: Fix some kernel-doc warnings

2020-11-29 Thread Rob Clark
From: Rob Clark 

Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c:227: warning: Function parameter or 
member 'ctl' not described in 'mdp5_ctl_set_encoder_state'
 drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c:227: warning: Function parameter or 
member 'pipeline' not described in 'mdp5_ctl_set_encoder_state'
 drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c:227: warning: Function parameter or 
member 'enabled' not described in 'mdp5_ctl_set_encoder_state'
 drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c:227: warning: Excess function 
parameter 'enable' description in 'mdp5_ctl_set_encoder_state'
 drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c:529: warning: Function parameter or 
member 'ctl' not described in 'mdp5_ctl_commit'
 drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c:529: warning: Function parameter or 
member 'pipeline' not described in 'mdp5_ctl_commit'
 drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c:529: warning: Function parameter or 
member 'flush_mask' not described in 'mdp5_ctl_commit'
 drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c:529: warning: Function parameter or 
member 'start' not described in 'mdp5_ctl_commit'

Cc: Lee Jones 
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c
index 030279d7b64b..81b0c7cf954e 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_ctl.c
@@ -216,7 +216,9 @@ static void send_start_signal(struct mdp5_ctl *ctl)
 /**
  * mdp5_ctl_set_encoder_state() - set the encoder state
  *
- * @enable: true, when encoder is ready for data streaming; false, otherwise.
+ * @ctl:  the CTL instance
+ * @pipeline: the encoder's INTF + MIXER configuration
+ * @enabled:  true, when encoder is ready for data streaming; false, otherwise.
  *
  * Note:
  * This encoder state is needed to trigger START signal (data path kickoff).
@@ -510,6 +512,13 @@ static void fix_for_single_flush(struct mdp5_ctl *ctl, u32 
*flush_mask,
 /**
  * mdp5_ctl_commit() - Register Flush
  *
+ * @ctl:the CTL instance
+ * @pipeline:   the encoder's INTF + MIXER configuration
+ * @flush_mask: bitmask of display controller hw blocks to flush
+ * @start:  if true, immediately update flush registers and set START
+ *  bit, otherwise accumulate flush_mask bits until we are
+ *  ready to START
+ *
  * The flush register is used to indicate several registers are all
  * programmed, and are safe to update to the back copy of the double
  * buffered registers.
-- 
2.28.0



Re: [PATCHv8 0/8] System Cache support for GPU and required SMMU support

2020-11-24 Thread Rob Clark
On Tue, Nov 24, 2020 at 1:43 PM Will Deacon  wrote:
>
> On Tue, Nov 24, 2020 at 11:05:39AM -0800, Rob Clark wrote:
> > On Tue, Nov 24, 2020 at 3:10 AM Will Deacon  wrote:
> > > On Tue, Nov 24, 2020 at 09:32:54AM +0530, Sai Prakash Ranjan wrote:
> > > > On 2020-11-24 00:52, Rob Clark wrote:
> > > > > On Mon, Nov 23, 2020 at 9:01 AM Sai Prakash Ranjan
> > > > >  wrote:
> > > > > > On 2020-11-23 20:51, Will Deacon wrote:
> > > > > > > Modulo some minor comments I've made, this looks good to me. What 
> > > > > > > is
> > > > > > > the
> > > > > > > plan for merging it? I can take the IOMMU parts, but patches 4-6 
> > > > > > > touch
> > > > > > > the
> > > > > > > MSM GPU driver and I'd like to avoid conflicts with that.
> > > > > > >
> > > > > >
> > > > > > SMMU bits are pretty much independent and GPU relies on the domain
> > > > > > attribute
> > > > > > and the quirk exposed, so as long as SMMU changes go in first it
> > > > > > should
> > > > > > be good.
> > > > > > Rob?
> > > > >
> > > > > I suppose one option would be to split out the patch that adds the
> > > > > attribute into it's own patch, and merge that both thru drm and iommu?
> > > > >
> > > >
> > > > Ok I can split out domain attr and quirk into its own patch if Will is
> > > > fine with that approach.
> > >
> > > Why don't I just queue the first two patches on their own branch and we
> > > both pull that?
> >
> > Ok, that works for me.  I normally base msm-next on -rc1 but I guess
> > as long as we base the branch on the older of our two -next branches,
> > that should work out nicely
>
> Turns out we're getting a v10 of Sai's stuff, so I've asked him to split
> patch two up anyway. Then I'll make a branch based on -rc1 that we can
> both pull.

Sounds good, thx

BR,
-R


Re: [PATCHv8 0/8] System Cache support for GPU and required SMMU support

2020-11-24 Thread Rob Clark
On Tue, Nov 24, 2020 at 3:10 AM Will Deacon  wrote:
>
> On Tue, Nov 24, 2020 at 09:32:54AM +0530, Sai Prakash Ranjan wrote:
> > On 2020-11-24 00:52, Rob Clark wrote:
> > > On Mon, Nov 23, 2020 at 9:01 AM Sai Prakash Ranjan
> > >  wrote:
> > > >
> > > > On 2020-11-23 20:51, Will Deacon wrote:
> > > > > On Tue, Nov 17, 2020 at 08:00:39PM +0530, Sai Prakash Ranjan wrote:
> > > > >> Some hardware variants contain a system cache or the last level
> > > > >> cache(llc). This cache is typically a large block which is shared
> > > > >> by multiple clients on the SOC. GPU uses the system cache to cache
> > > > >> both the GPU data buffers(like textures) as well the SMMU pagetables.
> > > > >> This helps with improved render performance as well as lower power
> > > > >> consumption by reducing the bus traffic to the system memory.
> > > > >>
> > > > >> The system cache architecture allows the cache to be split into 
> > > > >> slices
> > > > >> which then be used by multiple SOC clients. This patch series is an
> > > > >> effort to enable and use two of those slices preallocated for the 
> > > > >> GPU,
> > > > >> one for the GPU data buffers and another for the GPU SMMU hardware
> > > > >> pagetables.
> > > > >>
> > > > >> Patch 1 - Patch 6 adds system cache support in SMMU and GPU driver.
> > > > >> Patch 7 and 8 are minor cleanups for arm-smmu impl.
> > > > >>
> > > > >> Changes in v8:
> > > > >>  * Introduce a generic domain attribute for pagetable config (Will)
> > > > >>  * Rename quirk to more generic IO_PGTABLE_QUIRK_ARM_OUTER_WBWA 
> > > > >> (Will)
> > > > >>  * Move non-strict mode to use new struct domain_attr_io_pgtbl_config
> > > > >> (Will)
> > > > >
> > > > > Modulo some minor comments I've made, this looks good to me. What is
> > > > > the
> > > > > plan for merging it? I can take the IOMMU parts, but patches 4-6 touch
> > > > > the
> > > > > MSM GPU driver and I'd like to avoid conflicts with that.
> > > > >
> > > >
> > > > SMMU bits are pretty much independent and GPU relies on the domain
> > > > attribute
> > > > and the quirk exposed, so as long as SMMU changes go in first it
> > > > should
> > > > be good.
> > > > Rob?
> > >
> > > I suppose one option would be to split out the patch that adds the
> > > attribute into it's own patch, and merge that both thru drm and iommu?
> > >
> >
> > Ok I can split out domain attr and quirk into its own patch if Will is
> > fine with that approach.
>
> Why don't I just queue the first two patches on their own branch and we
> both pull that?

Ok, that works for me.  I normally base msm-next on -rc1 but I guess
as long as we base the branch on the older of our two -next branches,
that should work out nicely

BR,
-R


Re: [PATCHv8 0/8] System Cache support for GPU and required SMMU support

2020-11-23 Thread Rob Clark
On Mon, Nov 23, 2020 at 9:01 AM Sai Prakash Ranjan
 wrote:
>
> On 2020-11-23 20:51, Will Deacon wrote:
> > On Tue, Nov 17, 2020 at 08:00:39PM +0530, Sai Prakash Ranjan wrote:
> >> Some hardware variants contain a system cache or the last level
> >> cache(llc). This cache is typically a large block which is shared
> >> by multiple clients on the SOC. GPU uses the system cache to cache
> >> both the GPU data buffers(like textures) as well the SMMU pagetables.
> >> This helps with improved render performance as well as lower power
> >> consumption by reducing the bus traffic to the system memory.
> >>
> >> The system cache architecture allows the cache to be split into slices
> >> which then be used by multiple SOC clients. This patch series is an
> >> effort to enable and use two of those slices preallocated for the GPU,
> >> one for the GPU data buffers and another for the GPU SMMU hardware
> >> pagetables.
> >>
> >> Patch 1 - Patch 6 adds system cache support in SMMU and GPU driver.
> >> Patch 7 and 8 are minor cleanups for arm-smmu impl.
> >>
> >> Changes in v8:
> >>  * Introduce a generic domain attribute for pagetable config (Will)
> >>  * Rename quirk to more generic IO_PGTABLE_QUIRK_ARM_OUTER_WBWA (Will)
> >>  * Move non-strict mode to use new struct domain_attr_io_pgtbl_config
> >> (Will)
> >
> > Modulo some minor comments I've made, this looks good to me. What is
> > the
> > plan for merging it? I can take the IOMMU parts, but patches 4-6 touch
> > the
> > MSM GPU driver and I'd like to avoid conflicts with that.
> >
>
> SMMU bits are pretty much independent and GPU relies on the domain
> attribute
> and the quirk exposed, so as long as SMMU changes go in first it should
> be good.
> Rob?

I suppose one option would be to split out the patch that adds the
attribute into its own patch, and merge that both thru drm and iommu?

If Will/Robin dislike that approach, I'll pick up the parts of the drm
patches which don't depend on the new attribute for v5.11 and the rest
for v5.12.. or possibly a second late v5.11 pull req if airlied
doesn't hate me too much for it.

Going forward, I think we will have one or two more co-dependent
series, like the smmu iova fault handler improvements that Jordan
posted.  So I would like to hear how Will and Robin prefer to handle
those.

BR,
-R


> Thanks,
> Sai
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member
> of Code Aurora Forum, hosted by The Linux Foundation

