Re: [PATCH 01/12] slab: Introduce kmalloc_size_roundup()

2022-09-22 Thread Feng Tang
Thanks Hyeonggon for looping me in.

On Thu, Sep 22, 2022 at 07:12:21PM +0800, Hyeonggon Yoo wrote:
> On Wed, Sep 21, 2022 at 08:10:02PM -0700, Kees Cook wrote:
> > In the effort to help the compiler reason about buffer sizes, the
> > __alloc_size attribute was added to allocators. This improves the scope
> > of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near
> > future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well,
> > as the vast majority of callers are not expecting to use more memory
> > than what they asked for.
> > 
> > There is, however, one common exception to this: anticipatory resizing
> > of kmalloc allocations. These cases all use ksize() to determine the
> > actual bucket size of a given allocation (e.g. 128 when 126 was asked
> > for). This comes in two styles in the kernel:
> > 
> > 1) An allocation has been determined to be too small, and needs to be
> >resized. Instead of the caller choosing its own next best size, it
> >wants to minimize the number of calls to krealloc(), so it just uses
> >ksize() plus some additional bytes, forcing the realloc into the next
> >bucket size, from which it can learn how large it is now. For example:
> > 
> > data = krealloc(data, ksize(data) + 1, gfp);
> > data_len = ksize(data);
> > 
> > 2) The minimum size of an allocation is calculated, but since it may
> >grow in the future, just use all the space available in the chosen
> >bucket immediately, to avoid needing to reallocate later. A good
> >example of this is skbuff's allocators:
> > 
> > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
> > ...
> > /* kmalloc(size) might give us more room than requested.
> >  * Put skb_shared_info exactly at the end of allocated zone,
> >  * to allow max possible filling before reallocation.
> >  */
> > osize = ksize(data);
> > size = SKB_WITH_OVERHEAD(osize);
> > 
> > In both cases, the "how large is the allocation?" question is answered
> > _after_ the allocation, where the compiler hinting is not in an easy place
> > to make the association any more. This mismatch between the compiler's
> > view of the buffer length and the code's intention about how much it is
> > going to actually use has already caused problems[1]. It is possible to
> > fix this by reordering the use of the "actual size" information.
> > 
> > We can serve the needs of users of ksize() and still have accurate buffer
> > length hinting for the compiler by doing the bucket size calculation
> > _before_ the allocation. Code can instead ask "how large an allocation
> > would I get for a given size?".
> > 
> > Introduce kmalloc_size_roundup(), to serve this function so we can start
> > replacing the "anticipatory resizing" uses of ksize().
> >
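A minimal sketch of the intended conversion, assuming the kmalloc_size_roundup()
signature this series introduces (size_t in, size_t out); variable names are
illustrative only:

	/* Style (1), before: the bucket size is learned only _after_ the
	 * krealloc(), via ksize(), where the compiler cannot see it. */
	data = krealloc(data, ksize(data) + 1, gfp);
	data_len = ksize(data);

	/* After: round up to the bucket size first, then allocate exactly
	 * that much, so the compiler's view matches the usable size. */
	new_len = kmalloc_size_roundup(data_len + 1);
	data = krealloc(data, new_len, gfp);
	data_len = new_len;

Style (2) changes the same way: call kmalloc_size_roundup(size) before the
kmalloc() and drop the later ksize().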
> 
> Cc-ing Feng Tang who may welcome this series ;)
 
Indeed! This will help our work of extending the slub redzone check,
as we also ran into some trouble with ksize() users when extending
the redzone support to the space allocated beyond the requested
size [1], and had to disable the redzone sanity check for all ksize()
users [2].

[1]. 
https://lore.kernel.org/lkml/20220719134503.ga56...@shbuild999.sh.intel.com/
[2]. https://lore.kernel.org/lkml/20220913065423.520159-5-feng.t...@intel.com/

Thanks,
Feng

> > [1] https://github.com/ClangBuiltLinux/linux/issues/1599
> > https://github.com/KSPP/linux/issues/183
> > 
> > Cc: Vlastimil Babka 
> > Cc: Pekka Enberg 
> > Cc: David Rientjes 
> > Cc: Joonsoo Kim 
> > Cc: Andrew Morton 
> > Cc: linux...@kvack.org
> > Signed-off-by: Kees Cook 
> > ---


Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-09-16 Thread Feng Tang
Hi Thomas,

On Mon, Sep 09, 2019 at 04:12:37PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 04.09.19 um 08:27 schrieb Feng Tang:
> > Hi Thomas,
> > 
> > On Wed, Aug 28, 2019 at 12:51:40PM +0200, Thomas Zimmermann wrote:
> >> Hi
> >>
> >> Am 28.08.19 um 11:37 schrieb Rong Chen:
> >>> Hi Thomas,
> >>>
> >>> On 8/28/19 1:16 AM, Thomas Zimmermann wrote:
> >>>> Hi
> >>>>
> >>>> Am 27.08.19 um 14:33 schrieb Chen, Rong A:
> >>>>> Both patches have little impact on the performance from our side.
> >>>> Thanks for testing. Too bad they don't solve the issue.
> >>>>
> >>>> There's another patch attached. Could you please test this as well?
> >>>> Thanks a lot!
> >>>>
> >>>> The patch comes from Daniel Vetter after discussing the problem on IRC.
> >>>> The idea of the patch is that the old mgag200 code might display many
> >>>> fewer frames than the generic code, because mgag200 only prints from
> >>>> non-atomic context. If we simulate this with the generic code, we should
> >>>> see roughly the original performance.
> >>>>
> >>>>
> >>>
> >>> It's cool, the patch "usecansleep.patch" can fix the issue.
> >>
> >> Thank you for testing. But don't get too excited, because the patch
> >> simulates a bug that was present in the original mgag200 code. A
> >> significant number of frames are simply skipped. That is apparently the
> >> reason why it's faster.
> > 
> > Thanks for the detailed info, so the original code skips time-consuming
> > work inside atomic context on purpose. Is there any space to optimise it?
> > If two scheduled update workers are handled at almost the same time, can
> > one be skipped?
> 
> We discussed ideas on IRC and decided that screen updates could be
> synchronized with vblank intervals. This may give some rate limiting to
> the output.
> 
> If you like, you could try the patch set at [1]. It adds the respective
> code to console and mgag200.

I just tried the two patches; no obvious change (compared to the
-18.8% regression), either in the overall benchmark or in micro-profiling.

90f479ae51afa45e            04a0983095feaee022cdd65e3e4
----------------            ---------------------------
       37236 ±  3%   +2.5%         38167 ±  3%   vm-scalability.median
        0.15 ± 24%  -25.1%          0.11 ± 23%   vm-scalability.median_stddev
        0.15 ± 23%  -25.1%          0.11 ± 22%   vm-scalability.stddev
    12767318 ±  4%   +2.5%      13089177 ±  3%   vm-scalability.throughput
 
Thanks,
Feng

> 
> Best regards
> Thomas
> 
> [1]
> https://lists.freedesktop.org/archives/dri-devel/2019-September/234850.html
> 
> > 
> > Thanks,
> > Feng
> > 
> >>
> >> Best regards
> >> Thomas
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> HRB 21284 (AG Nürnberg)
> 





Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-09-05 Thread Feng Tang
On Thu, Sep 05, 2019 at 06:37:47PM +0800, Daniel Vetter wrote:
> On Thu, Sep 5, 2019 at 8:58 AM Feng Tang  wrote:
> >
> > Hi Vetter,
> >
> > On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote:
> > > On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie  wrote:
> > > >
> > > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter  wrote:
> > > > >
> > > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang  wrote:
> > > > > >
> > > > > > Hi Daniel,
> > > > > >
> > > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > > > > > > >> Thank you for testing. But don't get too excited, because 
> > > > > > > > >> the patch
> > > > > > > > >> simulates a bug that was present in the original mgag200 
> > > > > > > > >> code. A
> > > > > > > > >> significant number of frames are simply skipped. That is 
> > > > > > > > >> apparently the
> > > > > > > > >> reason why it's faster.
> > > > > > > > >
> > > > > > > > > Thanks for the detailed info, so the original code skips
> > > > > > > > > time-consuming work inside atomic context on purpose. Is there
> > > > > > > > > any space to optimise it? If two scheduled update workers are
> > > > > > > > > handled at almost the same time, can one be skipped?
> > > > > > > >
> > > > > > > > To my knowledge, there's only one instance of the worker. 
> > > > > > > > Re-scheduling
> > > > > > > > the worker before a previous instance started, will not create 
> > > > > > > > a second
> > > > > > > > instance. The worker's instance will complete all pending 
> > > > > > > > updates. So in
> > > > > > > > some way, skipping workers already happens.
> > > > > > >
> > > > > > > So I think that the most often fbcon update from atomic context 
> > > > > > > is the
> > > > > > > blinking cursor. If you disable that one you should be back to 
> > > > > > > the old
> > > > > > > performance level I think, since just writing to dmesg is from 
> > > > > > > process
> > > > > > > context, so shouldn't change.
> > > > > >
> > > > > > Hmm, then the old driver should also have done most updates in
> > > > > > non-atomic context?
> > > > > >
> > > > > > One other thing: I profiled that updating a 3 MB shadow buffer needs
> > > > > > 20 ms, which translates to 150 MB/s of bandwidth. Could it be related
> > > > > > to the cache settings of the DRM shadow buffer? Say, does the
> > > > > > original code use a cacheable buffer?
> > > > >
> > > > > Hm, that would indicate the write-combining got broken somewhere. This
> > > > > should definitely be faster. Also we shouldn't transfer the whole
> > > > > thing, except when scrolling ...
> > > >
> > > > First rule of fbcon usage, you are always effectively scrolling.
> > > >
> > > > Also these devices might be on a PCIE 1x piece of wet string, not sure
> > > > if the numbers reflect that.
> > >
> > > pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and
> > > overhead not entirely out of the question that 150MB/s is actually the
> > > hw limit. If it's really pcie 1x 1.0, no idea where to check that.
> > > Also might be worth to double-check that the gpu pci bar is listed as
> > > wc in debugfs/x86/pat_memtype_list.
> >
> > Here is some dump of the device info and the pat_memtype_list, while it is
> > running other 0day task:
> 
> Looks

Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-09-05 Thread Feng Tang
Hi Vetter,

On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote:
> On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie  wrote:
> >
> > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter  wrote:
> > >
> > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang  wrote:
> > > >
> > > > Hi Daniel,
> > > >
> > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann 
> > > > >  wrote:
> > > > > >
> > > > > > Hi
> > > > > >
> > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > > > > > >> Thank you for testing. But don't get too excited, because the 
> > > > > > >> patch
> > > > > > >> simulates a bug that was present in the original mgag200 code. A
> > > > > > >> significant number of frames are simply skipped. That is 
> > > > > > >> apparently the
> > > > > > >> reason why it's faster.
> > > > > > >
> > > > > > > Thanks for the detailed info, so the original code skips
> > > > > > > time-consuming work inside atomic context on purpose. Is there any
> > > > > > > space to optimise it? If two scheduled update workers are handled
> > > > > > > at almost the same time, can one be skipped?
> > > > > >
> > > > > > To my knowledge, there's only one instance of the worker. 
> > > > > > Re-scheduling
> > > > > > the worker before a previous instance started, will not create a 
> > > > > > second
> > > > > > instance. The worker's instance will complete all pending updates. 
> > > > > > So in
> > > > > > some way, skipping workers already happens.
> > > > >
> > > > > So I think that the most often fbcon update from atomic context is the
> > > > > blinking cursor. If you disable that one you should be back to the old
> > > > > performance level I think, since just writing to dmesg is from process
> > > > > context, so shouldn't change.
> > > >
> > > > Hmm, then the old driver should also have done most updates in
> > > > non-atomic context?
> > > >
> > > > One other thing: I profiled that updating a 3 MB shadow buffer needs
> > > > 20 ms, which translates to 150 MB/s of bandwidth. Could it be related
> > > > to the cache settings of the DRM shadow buffer? Say, does the original
> > > > code use a cacheable buffer?
> > >
> > > Hm, that would indicate the write-combining got broken somewhere. This
> > > should definitely be faster. Also we shouldn't transfer the whole
> > > thing, except when scrolling ...
> >
> > First rule of fbcon usage, you are always effectively scrolling.
> >
> > Also these devices might be on a PCIE 1x piece of wet string, not sure
> > if the numbers reflect that.
> 
> pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and
> overhead not entirely out of the question that 150MB/s is actually the
> hw limit. If it's really pcie 1x 1.0, no idea where to check that.
> Also might be worth to double-check that the gpu pci bar is listed as
> wc in debugfs/x86/pat_memtype_list.
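For reference, the negotiated PCIe link speed and width can usually be read
with lspci or from sysfs; a sketch, assuming the 03:00.0 VGA device shown in
the dump below:

	lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'
	cat /sys/bus/pci/devices/0000:03:00.0/current_link_speed
	cat /sys/bus/pci/devices/0000:03:00.0/current_link_width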

Here is some dump of the device info and the pat_memtype_list, while it is
running other 0day task:

controller info
=
03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e 
[Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort+ SERR- 
Capabilities: [144 v1] Vendor Specific Information: ID=0004 Rev=1 
Len=03c 
Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 
Len=00a 
Capabilities: [250 v1] #19
Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 
Len=018 
Capabilities: [298 v1] Vendor Specific Information: ID=0007 Rev=0 
Len=024 


Thanks,
Feng


>
> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-09-04 Thread Feng Tang
Hi Daniel,

On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote:
> On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann  wrote:
> >
> > Hi
> >
> > Am 04.09.19 um 08:27 schrieb Feng Tang:
> > >> Thank you for testing. But don't get too excited, because the patch
> > >> simulates a bug that was present in the original mgag200 code. A
> > >> significant number of frames are simply skipped. That is apparently the
> > >> reason why it's faster.
> > >
> > > Thanks for the detailed info, so the original code skips time-consuming
> > > work inside atomic context on purpose. Is there any space to optimise it?
> > > If two scheduled update workers are handled at almost the same time, can
> > > one be skipped?
> >
> > To my knowledge, there's only one instance of the worker. Re-scheduling
> > the worker before a previous instance started, will not create a second
> > instance. The worker's instance will complete all pending updates. So in
> > some way, skipping workers already happens.
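This matches standard workqueue semantics; a minimal generic sketch (generic
kernel workqueue behaviour, not the actual fbdev helper code):

	/* schedule_work() on an already-queued item is a no-op returning
	 * false, so back-to-back damage reports coalesce into one run. */
	static void dirty_worker_fn(struct work_struct *work)
	{
		/* process all dirty regions accumulated so far */
	}
	static DECLARE_WORK(dirty_work, dirty_worker_fn);

	schedule_work(&dirty_work);	/* true: newly queued */
	schedule_work(&dirty_work);	/* false: already pending, coalesced */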
> 
> So I think that the most often fbcon update from atomic context is the
> blinking cursor. If you disable that one you should be back to the old
> performance level I think, since just writing to dmesg is from process
> context, so shouldn't change.

Hmm, then the old driver should also have done most updates in
non-atomic context?

One other thing: I profiled that updating a 3 MB shadow buffer needs
20 ms, which translates to 150 MB/s of bandwidth. Could it be related
to the cache settings of the DRM shadow buffer? Say, does the original
code use a cacheable buffer?


> 
> https://unix.stackexchange.com/questions/3759/how-to-stop-cursor-from-blinking
> 
> Bunch of tricks, but tbh I haven't tested them.

Thomas suggested disabling the cursor with
echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink

We tried that, and there was no change in the performance data.

Thanks,
Feng

> 
> In any case, I still strongly advise that you don't print anything to dmesg
> or fbcon while benchmarking, because dmesg/printk are anything but
> fast, especially if a gpu driver is involved. There are some efforts to
> make the dmesg/printk side less painful (untangling the console_lock
> from printk), but fundamentally printing to the gpu from the kernel
> through dmesg/fbcon won't be cheap. It's just not something we
> optimize beyond "make sure it works for emergencies".
> -Daniel
> 
> >
> > Best regards
> > Thomas
> >
> > >
> > > Thanks,
> > > Feng
> > >
> > >>
> > >> Best regards
> > >> Thomas
> > >
> >
> > --
> > Thomas Zimmermann
> > Graphics Driver Developer
> > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
> > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
> > HRB 21284 (AG Nürnberg)
> >
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-09-04 Thread Feng Tang
Hi Thomas,

On Wed, Aug 28, 2019 at 12:51:40PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 28.08.19 um 11:37 schrieb Rong Chen:
> > Hi Thomas,
> > 
> > On 8/28/19 1:16 AM, Thomas Zimmermann wrote:
> >> Hi
> >>
> >> Am 27.08.19 um 14:33 schrieb Chen, Rong A:
> >>> Both patches have little impact on the performance from our side.
> >> Thanks for testing. Too bad they don't solve the issue.
> >>
> >> There's another patch attached. Could you please test this as well?
> >> Thanks a lot!
> >>
> >> The patch comes from Daniel Vetter after discussing the problem on IRC.
> >> The idea of the patch is that the old mgag200 code might display many
> >> fewer frames than the generic code, because mgag200 only prints from
> >> non-atomic context. If we simulate this with the generic code, we should
> >> see roughly the original performance.
> >>
> >>
> > 
> > It's cool, the patch "usecansleep.patch" can fix the issue.
> 
> Thank you for testing. But don't get too excited, because the patch
> simulates a bug that was present in the original mgag200 code. A
> significant number of frames are simply skipped. That is apparently the
> reason why it's faster.

Thanks for the detailed info, so the original code skips time-consuming
work inside atomic context on purpose. Is there any space to optimise it?
If two scheduled update workers are handled at almost the same time, can
one be skipped?

Thanks,
Feng

> 
> Best regards
> Thomas

Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-08-23 Thread Feng Tang
Hi Thomas,

On Thu, Aug 22, 2019 at 07:25:11PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> I was traveling and couldn't reply earlier. Sorry for taking so long.

No problem! I guessed so :)

> 
> Am 13.08.19 um 11:36 schrieb Feng Tang:
> > Hi Thomas, 
> > 
> > On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
> >> Hi Thomas,
> >>
> >> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
> >>> Hi,
> >>>
> >>>>> Actually we run the benchmark as a background process, do we need to
> >>>>> disable the cursor and test again?
> >>>> There's a worker thread that updates the display from the shadow buffer.
> >>>> The blinking cursor periodically triggers the worker thread, but the
> >>>> actual update is just the size of one character.
> >>>>
> >>>> The point of the test without output is to see if the regression comes
> >>>> from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
> >>>> from the worker thread. If the regression goes away after disabling the
> >>>> blinking cursor, then the worker thread is the problem. If it already
> >>>> goes away if there's simply no output from the test, the screen update
> >>>> is the problem. On my machine I have to disable the blinking cursor, so
> >>>> I think the worker causes the performance drop.
> >>>
> >>> We disabled redirecting stdout/stderr to /dev/kmsg,  and the regression is
> >>> gone.
> >>>
> >>> commit:
> >>>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
> >>>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic 
> >>> framebuffer
> >>> emulation
> >>>
> >>> f1f8555dfb9a70a2            90f479ae51afa45efab97afdde   testcase/testparams/testbox
> >>> ----------------            --------------------------   ---------------------------
> >>>            43785                                 44481   vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
> >>>            43785                                 44481   GEO-MEAN vm-scalability.median
> >>
> >> Till now, from Rong's tests:
> >> 1. Disabling cursor blinking doesn't cure the regression.
> >> 2. Disabling printing test results to the console works around the
> >> regression.
> >>
> >> Also, if we set prefer_shadow to 0, the regression is also
> >> gone.
> > 
> > We also did some further break down for the time consumed by the
> > new code.
> > 
> > The drm_fb_helper_dirty_work() calls sequentially 
> > 1. drm_client_buffer_vmap (290 us)
> > 2. drm_fb_helper_dirty_blit_real  (19240 us)
> > 3. helper->fb->funcs->dirty()---> NULL for mgag200 driver
> > 4. drm_client_buffer_vunmap   (215 us)
> >
> 
> It's somewhat different to what I observed, but maybe I just couldn't
> reproduce the problem correctly.
> 
> > The average run time is listed after the function names.
> > 
> > From it, we can see drm_fb_helper_dirty_blit_real() takes too long
> > (about 20 ms per run). I guess this is the root cause
> > of this regression, as the original code doesn't use this dirty worker.
> 
> True, the original code uses a temporary buffer, but updates the display
> immediately.
> 
> My guess is that this could be a caching problem. The worker runs on a
> different CPU, which doesn't have the shadow buffer in cache.

Yes, that's my thought too. I profiled the working set size: for most
drm_fb_helper_dirty_blit_real() calls, it updates a 4096x768 buffer (3 MB),
and as it is called 30~40 times per second, it surely affects the cache.
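A rough back-of-the-envelope from those numbers: 3 MB/blit x 30~40 blits/s
is ~90-120 MB/s of memory traffic, and 20 ms/blit x 30~40 blits/s is
~600-800 ms of CPU time per second spent in the blit alone, on top of
repeatedly evicting a 3 MB working set from the cache.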


> > As said in the last email, setting prefer_shadow to 0 can avoid
> > the regression. Could it be an option?
> 
> Unfortunately not. Without the shadow buffer, the console's display
> buffer permanently resides in video memory. It consumes significant
> amount of that memory (say 8 MiB out of 16 MiB). That doesn't leave
> enough room for anything else.
> 
> The best option is to not print to the console.

Do we have other options here?

My thought is that this is clearly a regression: the old driver works
fine, while the new version in linux-next doesn't. Also, for a frame
buffer console, writing dozens of lines of messages to it is not a rare
use case. We have many test platforms (servers/desktops/laptops)
with differ

Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-08-16 Thread Feng Tang
Hi Thomas,

On Tue, Aug 13, 2019 at 05:36:16PM +0800, Feng Tang wrote:
> Hi Thomas, 
> 
> On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
> > Hi Thomas,
> > 
> > On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
> > > Hi,
> > > 
> > > >>Actually we run the benchmark as a background process, do we need to
> > > >>disable the cursor and test again?
> > > >There's a worker thread that updates the display from the shadow buffer.
> > > >The blinking cursor periodically triggers the worker thread, but the
> > > >actual update is just the size of one character.
> > > >
> > > >The point of the test without output is to see if the regression comes
> > > >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
> > > >from the worker thread. If the regression goes away after disabling the
> > > >blinking cursor, then the worker thread is the problem. If it already
> > > >goes away if there's simply no output from the test, the screen update
> > > >is the problem. On my machine I have to disable the blinking cursor, so
> > > >I think the worker causes the performance drop.
> > > 
> > > We disabled redirecting stdout/stderr to /dev/kmsg,  and the regression is
> > > gone.
> > > 
> > > commit:
> > >   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
> > >   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic 
> > > framebuffer
> > > emulation
> > > 
> > > f1f8555dfb9a70a2            90f479ae51afa45efab97afdde   testcase/testparams/testbox
> > > ----------------            --------------------------   ---------------------------
> > >            43785                                 44481   vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
> > >            43785                                 44481   GEO-MEAN vm-scalability.median
> > 
> > Till now, from Rong's tests:
> > 1. Disabling cursor blinking doesn't cure the regression.
> > 2. Disabling printing test results to the console works around the
> > regression.
> > 
> > Also, if we set prefer_shadow to 0, the regression is also
> > gone.
> 
> We also did some further break down for the time consumed by the
> new code.
> 
> The drm_fb_helper_dirty_work() calls sequentially 
> 1. drm_client_buffer_vmap   (290 us)
> 2. drm_fb_helper_dirty_blit_real  (19240 us)
> 3. helper->fb->funcs->dirty()---> NULL for mgag200 driver
> 4. drm_client_buffer_vunmap   (215 us)
> 
> The average run time is listed after the function names.
> 
> From it, we can see drm_fb_helper_dirty_blit_real() takes too long
> (about 20 ms per run). I guess this is the root cause
> of this regression, as the original code doesn't use this dirty worker.
> 
> As said in the last email, setting prefer_shadow to 0 can avoid
> the regression. Could it be an option?

Any comments on this? Thanks.

- Feng

> 
> Thanks,
> Feng
> 
> > 
> > --- a/drivers/gpu/drm/mgag200/mgag200_main.c
> > +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
> > @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
> >  		dev->mode_config.preferred_depth = 16;
> >  	else
> >  		dev->mode_config.preferred_depth = 32;
> > -	dev->mode_config.prefer_shadow = 1;
> > +	dev->mode_config.prefer_shadow = 0;
> > 
> > And from the perf data, one obvious difference is that the good case
> > doesn't call drm_fb_helper_dirty_work(), while the bad case does.
> > 
> > Thanks,
> > Feng
> > 
> > > Best Regards,
> > > Rong Chen

Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-08-13 Thread Feng Tang
Hi Thomas, 

On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
> Hi Thomas,
> 
> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
> > Hi,
> > 
> > >>Actually we run the benchmark as a background process, do we need to
> > >>disable the cursor and test again?
> > >There's a worker thread that updates the display from the shadow buffer.
> > >The blinking cursor periodically triggers the worker thread, but the
> > >actual update is just the size of one character.
> > >
> > >The point of the test without output is to see if the regression comes
> > >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
> > >from the worker thread. If the regression goes away after disabling the
> > >blinking cursor, then the worker thread is the problem. If it already
> > >goes away if there's simply no output from the test, the screen update
> > >is the problem. On my machine I have to disable the blinking cursor, so
> > >I think the worker causes the performance drop.
> > 
> > We disabled redirecting stdout/stderr to /dev/kmsg,  and the regression is
> > gone.
> > 
> > commit:
> >   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
> >   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer
> > emulation
> > 
> > f1f8555dfb9a70a2            90f479ae51afa45efab97afdde   testcase/testparams/testbox
> > ----------------            --------------------------   ---------------------------
> >            43785                                 44481   vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
> >            43785                                 44481   GEO-MEAN vm-scalability.median
> 
> Till now, from Rong's tests:
> 1. Disabling cursor blinking doesn't cure the regression.
> 2. Disabling printing test results to the console works around the
> regression.
> 
> Also, if we set prefer_shadow to 0, the regression is also
> gone.

We also did some further break down for the time consumed by the
new code.

The drm_fb_helper_dirty_work() calls sequentially 
1. drm_client_buffer_vmap (290 us)
2. drm_fb_helper_dirty_blit_real  (19240 us)
3. helper->fb->funcs->dirty()---> NULL for mgag200 driver
4. drm_client_buffer_vunmap   (215 us)

The average run time is listed after the function names.

From it, we can see drm_fb_helper_dirty_blit_real() takes too long
(about 20 ms per run). I guess this is the root cause
of this regression, as the original code doesn't use this dirty worker.

As said in the last email, setting prefer_shadow to 0 can avoid
the regression. Could it be an option?

Thanks,
Feng

> 
> --- a/drivers/gpu/drm/mgag200/mgag200_main.c
> +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
> @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
>  		dev->mode_config.preferred_depth = 16;
>  	else
>  		dev->mode_config.preferred_depth = 32;
> -	dev->mode_config.prefer_shadow = 1;
> +	dev->mode_config.prefer_shadow = 0;
> 
> And from the perf data, one obvious difference is that the good case
> doesn't call drm_fb_helper_dirty_work(), while the bad case does.
> 
> Thanks,
> Feng
> 
> > Best Regards,
> > Rong Chen

Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-08-12 Thread Feng Tang
Hi Thomas,

On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
> Hi,
> 
> >>Actually we run the benchmark as a background process, do we need to
> >>disable the cursor and test again?
> >There's a worker thread that updates the display from the shadow buffer.
> >The blinking cursor periodically triggers the worker thread, but the
> >actual update is just the size of one character.
> >
> >The point of the test without output is to see if the regression comes
> >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
> >from the worker thread. If the regression goes away after disabling the
> >blinking cursor, then the worker thread is the problem. If it already
> >goes away if there's simply no output from the test, the screen update
> >is the problem. On my machine I have to disable the blinking cursor, so
> >I think the worker causes the performance drop.
> 
> We disabled redirecting stdout/stderr to /dev/kmsg,  and the regression is
> gone.
> 
> commit:
>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer
> emulation
> 
> f1f8555dfb9a70a2            90f479ae51afa45efab97afdde   testcase/testparams/testbox
> ----------------            --------------------------   ---------------------------
>            43785                                 44481   vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
>            43785                                 44481   GEO-MEAN vm-scalability.median

Till now, from Rong's tests:
1. Disabling cursor blinking doesn't cure the regression.
2. Disabling printing test results to the console works around the
regression.

Also, if we set prefer_shadow to 0, the regression is also
gone.

--- a/drivers/gpu/drm/mgag200/mgag200_main.c
+++ b/drivers/gpu/drm/mgag200/mgag200_main.c
@@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
 		dev->mode_config.preferred_depth = 16;
 	else
 		dev->mode_config.preferred_depth = 32;
-	dev->mode_config.prefer_shadow = 1;
+	dev->mode_config.prefer_shadow = 0;

And from the perf data, one obvious difference is that the good case
doesn't call drm_fb_helper_dirty_work(), while the bad case does.

Thanks,
Feng

> Best Regards,
> Rong Chen

Re: [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-08-05 Thread Feng Tang
Hi Thomas,

On Mon, Aug 05, 2019 at 12:22:11PM +0200, Thomas Zimmermann wrote:

[snip] 

> >>   2019-08-03 19:29:17  ./case-anon-cow-seq-hugetlb
> >>   2019-08-03 19:29:17  ./usemem --runtime 300 -n 4 --prealloc --prefault
> >> -O -U 815394406
> >>   917318700 bytes / 659419 usecs = 1358497 KB/s
> >>   917318700 bytes / 659658 usecs = 1358005 KB/s
> >>   917318700 bytes / 659916 usecs = 1357474 KB/s
> >>   917318700 bytes / 660168 usecs = 1356956 KB/s
> >>
> >> Rong, Feng, could you confirm this by disabling the cursor or blinking?
> > 
> > Glad to know this method recovers the drop. Rong is running the case.
> > 
> > Meanwhile, I have another finding: as I noticed your patch changed the bpp
> > from 24 to 32, I made a patch to change it back to 24 and ran the case over
> > the weekend; the -18% regression was reduced to about -5%. Could this
> > be related?
> 
> In the original code, the fbdev console already ran with 32 bpp [1] and
> 16 bpp was selected for low-end devices. [2][3] The patch only set the
> same values for userspace; nothing changed for the console.

I did the experiment because I checked the commit

90f479ae51afa4 drm/mgag200: Replace struct mga_fbdev with generic framebuffer 
emulation

in which there is code:

diff --git a/drivers/gpu/drm/mgag200/mgag200_main.c b/drivers/gpu/drm/mgag200/mgag200_main.c
index b10f726..a977333 100644
--- a/drivers/gpu/drm/mgag200/mgag200_main.c
+++ b/drivers/gpu/drm/mgag200/mgag200_main.c
@@ -162,7 +162,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
 	if (IS_G200_SE(mdev) && mdev->mc.vram_size < (2048*1024))
 		dev->mode_config.preferred_depth = 16;
 	else
-		dev->mode_config.preferred_depth = 24;
+		dev->mode_config.preferred_depth = 32;
 	dev->mode_config.prefer_shadow = 1;
 
My debug patch essentially restored this part.

Thanks,
Feng

> 
> Best regards
> Thomas
> 
> [1]
> https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/mgag200/mgag200_fb.c?id=5d17718997367c435dbe5341a8e270d9b19478d3#n259
> [2]
> https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/mgag200/mgag200_fb.c?id=5d17718997367c435dbe5341a8e270d9b19478d3#n263
> [3]
> https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/mgag200/mgag200_fb.c?id=5d17718997367c435dbe5341a8e270d9b19478d3#n286
> 
> > 
> > commit: 
> >   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
> >   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic 
> > framebuffer emulation
> >   01e75fea0d5 mgag200: restore the depth back to 24
> > 
> > f1f8555dfb9a70a2   90f479ae51afa45efab97afdde9   01e75fea0d5ff39d3e588c20ec5
> > ----------------   ---------------------------   ---------------------------
> >      43921 ±  2%      -18.3%      35884              -4.8%      41826        vm-scalability.median
> >   14889337            -17.5%   12291029              -4.1%   14278574        vm-scalability.throughput
> >  
> > commit 01e75fea0d5ff39d3e588c20ec52e7a4e6588a74
> > Author: Feng Tang 
> > Date:   Fri Aug 2 15:09:19 2019 +0800
> > 
> > mgag200: restore the depth back to 24
> > 
> > Signed-off-by: Feng Tang 
> > 
> > diff --git a/drivers/gpu/drm/mgag200/mgag200_main.c b/drivers/gpu/drm/mgag200/mgag200_main.c
> > index a977333..ac8f6c9 100644
> > --- a/drivers/gpu/drm/mgag200/mgag200_main.c
> > +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
> > @@ -162,7 +162,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
> >  	if (IS_G200_SE(mdev) && mdev->mc.vram_size < (2048*1024))
> >  		dev->mode_config.preferred_depth = 16;
> >  	else
> > -		dev->mode_config.preferred_depth = 32;
> > +		dev->mode_config.preferred_depth = 24;
> >  	dev->mode_config.prefer_shadow = 1;
> >  
> > r = mgag200_modeset_init(mdev);
> > 
> > Thanks,
> > Feng
> > 
> >>
> >>
> >> The difference between mgag200's original fbdev support and generic
> >> fbdev emulation is generic fbdev's worker task that updates the VRAM
> >> buffer from the shadow buffer. mgag200 does this immediately, but relies
> >> on drm_can_sleep(), which is deprecated.
> >>
> >> I think that the worker task interferes with the test case, as the
> >> worker has been in fbdev emulation since forever and no performance
> >> regressions have been reported so far.
> >>
> >>
> >> So unless

Re: [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-08-05 Thread Feng Tang
Hi Thomas,

On Sun, Aug 04, 2019 at 08:39:19PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> I did some further analysis on this problem and found that the blinking
> cursor affects performance of the vm-scalability test case.
> 
> I only have a 4-core machine, so scalability is not really testable. Yet
> I see the effects of running vm-scalability against drm-tip, a revert of
> the mgag200 patch and the vmap fixes that I posted a few days ago.
> 
> After reverting the mgag200 patch, running the test as described in the
> report
> 
>   bin/lkp run job.yaml
> 
> gives results like
> 
>   2019-08-02 19:34:37  ./case-anon-cow-seq-hugetlb
>   2019-08-02 19:34:37  ./usemem --runtime 300 -n 4 --prealloc --prefault
> -O -U 815395225
>   917319627 bytes / 756534 usecs = 1184110 KB/s
>   917319627 bytes / 764675 usecs = 1171504 KB/s
>   917319627 bytes / 766414 usecs = 1168846 KB/s
>   917319627 bytes / 777990 usecs = 1151454 KB/s
> 
> Running the test against current drm-tip gives slightly worse results,
> such as:
> 
>   2019-08-03 19:17:06  ./case-anon-cow-seq-hugetlb
>   2019-08-03 19:17:06  ./usemem --runtime 300 -n 4 --prealloc --prefault
> -O -U 815394406
>   917318700 bytes / 871607 usecs = 1027778 KB/s
>   917318700 bytes / 894173 usecs = 1001840 KB/s
>   917318700 bytes / 919694 usecs = 974040 KB/s
>   917318700 bytes / 923341 usecs = 970193 KB/s
> 
> The test puts out roughly one result per second. Strangely sending the
> output to /dev/null can make results significantly worse.
> 
>   bin/lkp run job.yaml > /dev/null
> 
>   2019-08-03 19:23:04  ./case-anon-cow-seq-hugetlb
>   2019-08-03 19:23:04  ./usemem --runtime 300 -n 4 --prealloc --prefault
> -O -U 815394406
>   917318700 bytes / 1207358 usecs = 741966 KB/s
>   917318700 bytes / 1210456 usecs = 740067 KB/s
>   917318700 bytes / 1216572 usecs = 736346 KB/s
>   917318700 bytes / 1239152 usecs = 722929 KB/s
> 
> I realized that there's still a blinking cursor on the screen, which I
> disabled with
> 
>   tput civis
> 
> or alternatively
> 
>   echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink
> 
> Running the test now gives the original or even better results, such as
> 
>   bin/lkp run job.yaml > /dev/null
> 
>   2019-08-03 19:29:17  ./case-anon-cow-seq-hugetlb
>   2019-08-03 19:29:17  ./usemem --runtime 300 -n 4 --prealloc --prefault
> -O -U 815394406
>   917318700 bytes / 659419 usecs = 1358497 KB/s
>   917318700 bytes / 659658 usecs = 1358005 KB/s
>   917318700 bytes / 659916 usecs = 1357474 KB/s
>   917318700 bytes / 660168 usecs = 1356956 KB/s
> 
> Rong, Feng, could you confirm this by disabling the cursor or blinking?

Glad to know this method recovers the drop. Rong is running the case.

Meanwhile, I have another finding: as I noticed your patch changed the bpp
from 24 to 32, I made a patch to change it back to 24 and ran the case over
the weekend; the -18% regression was reduced to about -5%. Could this
be related?
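One plausible arithmetic behind this (an editorial estimate; it assumes a
1024x768 console, matching the ~3 MB shadow buffer profiled elsewhere in this
report, and that 24 bpp is really packed as 3 bytes per pixel rather than
padded to 4):

	32 bpp: 1024 x 768 x 4 bytes = 3.00 MB per full update
	24 bpp: 1024 x 768 x 3 bytes = 2.25 MB per full update (25% less)

so each shadow-to-VRAM copy would move a quarter less data at 24 bpp.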

commit: 
  f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
  90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer 
emulation
  01e75fea0d5 mgag200: restore the depth back to 24

f1f8555dfb9a70a2   90f479ae51afa45efab97afdde9   01e75fea0d5ff39d3e588c20ec5
----------------   ---------------------------   ---------------------------
     43921 ±  2%      -18.3%      35884              -4.8%      41826        vm-scalability.median
  14889337            -17.5%   12291029              -4.1%   14278574        vm-scalability.throughput
 
commit 01e75fea0d5ff39d3e588c20ec52e7a4e6588a74
Author: Feng Tang 
Date:   Fri Aug 2 15:09:19 2019 +0800

mgag200: restore the depth back to 24

Signed-off-by: Feng Tang 

diff --git a/drivers/gpu/drm/mgag200/mgag200_main.c b/drivers/gpu/drm/mgag200/mgag200_main.c
index a977333..ac8f6c9 100644
--- a/drivers/gpu/drm/mgag200/mgag200_main.c
+++ b/drivers/gpu/drm/mgag200/mgag200_main.c
@@ -162,7 +162,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
 	if (IS_G200_SE(mdev) && mdev->mc.vram_size < (2048*1024))
 		dev->mode_config.preferred_depth = 16;
 	else
-		dev->mode_config.preferred_depth = 32;
+		dev->mode_config.preferred_depth = 24;
 	dev->mode_config.prefer_shadow = 1;
 
r = mgag200_modeset_init(mdev);

Thanks,
Feng

> 
> 
> The difference between mgag200's original fbdev support and generic
> fbdev emulation is generic fbdev's worker task that updates the VRAM
> buffer from the shadow buffer. mgag200 does this immediately, but relies
> on drm_can_sleep(), which is deprecated.
> 
> I think that the worker task interferes with the test case, as the
> worker has been in fbdev emul

Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-08-01 Thread Feng Tang
Hi Thomas,

On Thu, Aug 01, 2019 at 11:59:28AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 01.08.19 um 10:37 schrieb Feng Tang:
> > On Thu, Aug 01, 2019 at 02:19:53PM +0800, Rong Chen wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: 
> >>>>>>>>>>> Replace struct mga_fbdev with generic framebuffer emulation")
> >>>>>>>>>>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
> >>>>>>>>>>>  master
> >>>>>>>>>> Daniel, Noralf, we may have to revert this patch.
> >>>>>>>>>>
> >>>>>>>>>> I expected some change in display performance, but not in VM. 
> >>>>>>>>>> Since it's
> >>>>>>>>>> a server chipset, probably no one cares much about display 
> >>>>>>>>>> performance.
> >>>>>>>>>> So that seemed like a good trade-off for re-using shared code.
> >>>>>>>>>>
> >>>>>>>>>> Part of the patch set is that the generic fb emulation now maps and
> >>>>>>>>>> unmaps the fbdev BO when updating the screen. I guess that's the 
> >>>>>>>>>> cause
> >>>>>>>>>> of the performance regression. And it should be visible with other
> >>>>>>>>>> drivers as well if they use a shadow FB for fbdev emulation.
> >>>>>>>>> For fbcon we shouldn't need to do any maps/unmaps at all, this is
> >>>>>>>>> for the fbdev mmap support only. If the testcase mentioned here
> >>>>>>>>> tests fbdev mmap handling it's pretty badly misnamed :-) And as long
> >>>>>>>>> as you don't have an fbdev mmap there shouldn't be any impact at all.
> >>>>>>>> The ast and mgag200 have only a few MiB of VRAM, so we have to get 
> >>>>>>>> the
> >>>>>>>> fbdev BO out if it's not being displayed. If not being mapped, it 
> >>>>>>>> can be
> >>>>>>>> evicted and make room for X, etc.
> >>>>>>>>
> >>>>>>>> To make this work, the BO's memory is mapped and unmapped in
> >>>>>>>> drm_fb_helper_dirty_work() before being updated from the shadow FB. 
> >>>>>>>> [1]
> >>>>>>>> That fbdev mapping is established on each screen update, more or 
> >>>>>>>> less.
> >>>>>>>> From my (yet unverified) understanding, this causes the performance
> >>>>>>>> regression in the VM code.
> >>>>>>>>
> >>>>>>>> The original code in mgag200 used to kmap the fbdev BO while it's 
> >>>>>>>> being
> >>>>>>>> displayed; [2] and the drawing code only mapped it when necessary 
> >>>>>>>> (i.e.,
> >>>>>>>> not being display). [3]
> >>>>>>> Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should
> >>>>>>> cache this.
> >>>>>>>
> >>>>>>>> I think this could be added for VRAM helpers as well, but it's still 
> >>>>>>>> a
> >>>>>>>> workaround and non-VRAM drivers might also run into such a 
> >>>>>>>> performance
> >>>>>>>> regression if they use the fbdev's shadow fb.
> >>>>>>> Yeah agreed, fbdev emulation should try to cache the vmap.
> >>>>>>>
> >>>>>>>> Noralf mentioned that there are plans for other DRM clients besides 
> >>>>>>>> the
> >>>>>>>> console. They would as well run into similar problems.
> >>>>>>>>
> >>>>>>>>>> The thing is that we'd need another generic fbdev emulation for 
> >>>>>>>>>> ast and
> >>>>>>>>>> mgag200 that handles this issue properly.
> >>>>>>>>> Yeah I don't think we want to 

Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression

2019-08-01 Thread Feng Tang
On Thu, Aug 01, 2019 at 02:19:53PM +0800, Rong Chen wrote:
> >
> >commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: 
> >Replace struct mga_fbdev with generic framebuffer emulation")
> >https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
> > master
> Daniel, Noralf, we may have to revert this patch.
> 
> I expected some change in display performance, but not in VM. Since 
> it's
> a server chipset, probably no one cares much about display 
> performance.
> So that seemed like a good trade-off for re-using shared code.
> 
> Part of the patch set is that the generic fb emulation now maps and
> unmaps the fbdev BO when updating the screen. I guess that's the cause
> of the performance regression. And it should be visible with other
> drivers as well if they use a shadow FB for fbdev emulation.
> >>>For fbcon we shouldn't need to do any maps/unmaps at all, this is for the
> >>>fbdev mmap support only. If the testcase mentioned here tests fbdev
> >>>mmap handling it's pretty badly misnamed :-) And as long as you don't
> >>>have an fbdev mmap there shouldn't be any impact at all.
> >>The ast and mgag200 have only a few MiB of VRAM, so we have to get the
> >>fbdev BO out if it's not being displayed. If not being mapped, it can be
> >>evicted and make room for X, etc.
> >>
> >>To make this work, the BO's memory is mapped and unmapped in
> >>drm_fb_helper_dirty_work() before being updated from the shadow FB. [1]
> >>That fbdev mapping is established on each screen update, more or less.
> >> From my (yet unverified) understanding, this causes the performance
> >>regression in the VM code.
> >>
> >>The original code in mgag200 used to kmap the fbdev BO while it's being
> >>displayed; [2] and the drawing code only mapped it when necessary (i.e.,
> >>not being display). [3]
> >Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should
> >cache this.
> >
> >>I think this could be added for VRAM helpers as well, but it's still a
> >>workaround and non-VRAM drivers might also run into such a performance
> >>regression if they use the fbdev's shadow fb.
> >Yeah agreed, fbdev emulation should try to cache the vmap.
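A hedged sketch of the vmap caching being agreed on here (hypothetical field
and helper names; in this era drm_client_buffer_vmap() returned a void *
mapping):

	/* Sketch only: keep the kernel mapping across dirty-worker runs
	 * instead of mapping/unmapping around every blit, and drop it only
	 * when the BO must be evicted from VRAM. 'vaddr' is a hypothetical
	 * cached-mapping field. */
	static void *client_buffer_vmap_cached(struct drm_client_buffer *buffer)
	{
		if (!buffer->vaddr)
			buffer->vaddr = drm_client_buffer_vmap(buffer);
		return buffer->vaddr;
	}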
> >
> >>Noralf mentioned that there are plans for other DRM clients besides the
> >>console. They would as well run into similar problems.
> >>
> The thing is that we'd need another generic fbdev emulation for ast 
> and
> mgag200 that handles this issue properly.
> >>>Yeah I don't think we want to jump the gun here. If you can try to
> >>>repro locally and profile where we're wasting cpu time, I hope that
> >>>should shed some light on what's going wrong here.
> >>I don't have much time ATM and I'm not even officially at work until
> >>late Aug. I'd send you the revert and investigate later. I agree that
> >>using generic fbdev emulation would be preferable.
> >Still not sure that's the right thing to do really. Yes it's a
> >regression, but vm testcases shouldn't run a single line of fbcon or drm
> >code. So why this is impacted so heavily by a silly drm change is very
> >confusing to me. We might be papering over a deeper and much more
> >serious issue ...
> It's a regression, the right thing is to revert first and then work
> out the right thing to do.
> >>>Sure, but I have no idea whether the testcase is doing something
> >>>reasonable. If it's accidentally testing vm scalability of fbdev and
> >>>there's no one else doing something this pointless, then it's not a
> >>>real bug. Plus I think we're shooting the messenger here.
> >>>
> It's likely the test runs on the console and printfs stuff out while 
> running.
> >>>But why did we not regress the world if a few prints on the console
> >>>have such a huge impact? We didn't get an entire stream of mails about
> >>>breaking stuff ...
> >>The regression doesn't seem related to the commit, but we have retested
> >>and confirmed it. Hard to understand what's happening.
> >Does the regressed test cause any output on console while it's
> >measuring? If so, it's probably accidentally measuring fbcon/DRM code in
> >addition to the workload it's trying to measure.
> >
> 
> Sorry, I'm not familiar with DRM. We enabled the console to output logs;
> please find the log file attached.
> 
> "Command line: ... console=tty0 earlyprintk=ttyS0,115200
> console=ttyS0,115200 vga=normal rw"

We did more checking, and found this test machine does use the
mgag200 driver. 

And we are suspecting the regression is caused by 

commit cf1ca9aeb930df074bb5bbcde55f935fec04e529
Author: Thomas Zimmermann 
Date:   Wed Jul 3 09:58:24 2019 +0200

drm/fb-helper: Map DRM client buffer only when required

This 

Re: [Intel-gfx] [PATCH] drm/dp: add module parameter for the dpcd access max retries

2018-05-10 Thread Feng Tang
On Wed, May 09, 2018 at 12:28:15PM +0300, Jani Nikula wrote:
> On Wed, 09 May 2018, Feng Tang <feng.t...@intel.com> wrote:
> > On Wed, May 09, 2018 at 10:53:53AM +0300, Jani Nikula wrote:
> >> On Wed, 09 May 2018, Feng Tang <feng.t...@intel.com> wrote:
> >> > On Tue, May 08, 2018 at 10:30:19PM +0300, Jani Nikula wrote:
> >> >> On Wed, 09 May 2018, Feng Tang <feng.t...@intel.com> wrote:
> >> >> >> Well if it's edp probing, then atm we do need to block since we have
> >> >> >> no support for panel hotplugging. And userspace generally has no
> >> >> >> expectations that built-in panels come&go. The async_synchronize_full
> >> >> >> making our fbdev code less async than desired is kinda a different
> >> >> >> story I think here. First step would be trying to figure out why we
> >> >> >> even bother with edp probing on this platform, when the thing isn't
> >> >> >> there. Sounds like broken VBT.
> >> >> >
> >> >> > Hi Daniel,
> >> >> >
> >> >> > Here are some of the VBT and DPCD related logs on my A3900 (bxt + 
> >> >> > GEN9 LP)
> >> >> > based NUC. but I don't have the knowledge to tell if the VBT is 
> >> >> > broken :)
> >> >> 
> >> >> Please run current upstream drm-tip when you're suggesting changes to
> >> >> upstream code. Looks like you're running at most v4.14. This can't be
> >> >> emphasized enough. We can't and won't merge the changes unless they make
> >> >> sense with current code.
> >> >
> >> > Yes, I understand, the patch posted  was created right after git-pulling
> >> > Linus' tree, and back-ported to test with 4.14 kernel. 
> >> >
> >> >> 
> >> >> As to vbt, please send me /sys/kernel/debug/dri/0/i915_vbt.
> >> >
> >> > Sure. as attached
> >> 
> >> Your VBT claims the device has an eDP panel. Does it have one or not?
> >
> > After asking around, our platform (BXT NUC) does have an eDP interface
> > (someone has tested it with an eDP screen), but most of our platforms are
> > connected with two HDMI LCD monitors.
> 
> Sounds like you should have a different VBT for the cases where you ship
> with/without eDP connected. As you've noticed, we generally try pretty
Yep, this is a good idea. Currently I'm not able to change the VBT, so I
hacked the code to simulate a no-eDP VBT like:

--- a/drivers/gpu/drm/i915/intel_bios.c
+++ b/drivers/gpu/drm/i915/intel_bios.c
@@ -1261,7 +1261,8 @@ static void parse_ddi_port(struct drm_i915_private *dev_priv, enum port port,
 	is_crt = child->device_type & DEVICE_TYPE_ANALOG_OUTPUT;
 	is_hdmi = is_dvi && (child->device_type & DEVICE_TYPE_NOT_HDMI_OUTPUT) == 0;
-	is_edp = is_dp && (child->device_type & DEVICE_TYPE_INTERNAL_CONNECTOR);
+	is_edp = 0;
 
And it does cut the boot time a lot, as it avoids the dpcd access.

Later, i915_hpd_poll_init_work() will still call intel_dp_detect(), which
eventually calls the time-eating drm_dp_dpcd_access(), but the situation
is better as it runs asynchronously at this point.

Thanks,
Feng



Re: [Intel-gfx] [PATCH] drm/dp: add module parameter for the dpcd access max retries

2018-05-09 Thread Feng Tang
On Wed, May 09, 2018 at 10:53:53AM +0300, Jani Nikula wrote:
> On Wed, 09 May 2018, Feng Tang <feng.t...@intel.com> wrote:
> > On Tue, May 08, 2018 at 10:30:19PM +0300, Jani Nikula wrote:
> >> On Wed, 09 May 2018, Feng Tang <feng.t...@intel.com> wrote:
> >> >> Well if it's edp probing, then atm we do need to block since we have
> >> >> no support for panel hotplugging. And userspace generally has no
> >> >> expectations that built-in panels come&go. The async_synchronize_full
> >> >> making our fbdev code less async than desired is kinda a different
> >> >> story I think here. First step would be trying to figure out why we
> >> >> even bother with edp probing on this platform, when the thing isn't
> >> >> there. Sounds like broken VBT.
> >> >
> >> > Hi Daniel,
> >> >
> >> > Here are some of the VBT and DPCD related logs on my A3900 (bxt + GEN9 
> >> > LP)
> >> > based NUC. but I don't have the knowledge to tell if the VBT is broken :)
> >> 
> >> Please run current upstream drm-tip when you're suggesting changes to
> >> upstream code. Looks like you're running at most v4.14. This can't be
> >> emphasized enough. We can't and won't merge the changes unless they make
> >> sense with current code.
> >
> > Yes, I understand, the patch posted  was created right after git-pulling
> > Linus' tree, and back-ported to test with 4.14 kernel. 
> >
> >> 
> >> As to vbt, please send me /sys/kernel/debug/dri/0/i915_vbt.
> >
> > Sure. as attached
> 
> Your VBT claims the device has an eDP panel. Does it have one or not?

After asking around, our platform (BXT NUC) does have an eDP interface
(someone has tested it with an eDP screen), but most of our platforms are
connected with two HDMI LCD monitors.

Thanks,
Feng


Re: [Intel-gfx] [PATCH] drm/dp: add module parameter for the dpcd access max retries

2018-05-09 Thread Feng Tang
On Tue, May 08, 2018 at 10:30:19PM +0300, Jani Nikula wrote:
> On Wed, 09 May 2018, Feng Tang <feng.t...@intel.com> wrote:
> >  >> > > > On Mon, May 07, 2018 at 11:40:45AM +0100, Chris Wilson wrote:
> >> >> > > > > Quoting Feng Tang (2018-05-07 11:36:09)
> >> >> > > > > > To fulfil the Dell 4K monitor, the dpcd max retries has been 
> >> >> > > > > > bumped
> >> >> > > > > > from 7 to 32, which may hurt the boot/init time for some 
> >> >> > > > > > platforms,
> >> >> > > > > > as the 32 retries may take hundreds of ms.
> >> >> > > > >
> >> >> > > > > If we need that many retries, so be it. No modparam, the driver 
> >> >> > > > > just has
> >> >> > > > > to work.
> >> >> > > >
> >> >> > > > I understand your point. The retry number was originally 7, and
> >> >> > > > worked fine until the Dell 4K monitor, which changed it to 32.
> >> >> > > > According to my test, each retry takes about 8 ms on the A3960
> >> >> > > > based NUC.
> >> >> > > >
> >> >> > > > One of our products needs to boot up within a given time limit;
> >> >> > > > these 32 retries will take about 1/3 of the budget (about 270 ms),
> >> >> > > > which is why I tried to make it a parameter.
> >> >> > >
> >> >> > > The essence is that probing whether a monitor is connected should 
> >> >> > > not be
> >> >> > > blocking boot. If an async probe tries and fails to find a monitor,
> >> >> > > fine - no one will notice. If it does take 270ms to find a monitor, 
> >> >> > > it
> >> >> > > turns on 200ms after userspace kicks in, just like any other 
> >> >> > > hotplug.
> >> >> >
> >> >> > Yeah, the fix here is to get the probing out of the driver load path, 
> >> >> > not
> >> >> > to break the driver for some funky monitors. And definitely not using 
> >> >> > a
> >> >> > modparam.
> >> >>
> >> >> Hi Chris and Daniel,
> >> >>
> >> >> After reading the i915 modeset init code, I did some experiments:
> >> >> 1. make the intel_modeset_init() function async (here async means
> >> >>    creating an async func wrapper and calling async_schedule() for it)
> >> >> 2. make the intel_setup_outputs()+intel_modeset_setup_hw_state() async
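For reference, a minimal sketch of such an async wrapper using the kernel's
async infrastructure (illustrative only, not the experiment's actual patch;
the call site shown is hypothetical):

	#include <linux/async.h>

	/* Illustrative: run the modeset init off the synchronous probe path. */
	static void modeset_init_async(void *data, async_cookie_t cookie)
	{
		struct drm_device *dev = data;

		intel_modeset_init(dev);	/* return value ignored in this sketch */
	}

	/* at the original synchronous call site: */
	async_schedule(modeset_init_async, dev);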
> >> >
> >> > You could just set i915_pci_driver.driver.probe_type =
> >> > PROBE_PREFER_ASYNCHRONOUS for the same result, or set i915.async_probe=1
> >> > for testing.
> >> >
> >> > However, if it's blocking on async_synchronize_full(), that will be no
> >> > improvement. So if it is only an existing async_synchronize_full()
> >> > causing the fbdev config to be waited on before userspace, we need to
> >> > stop using the async mechanism and just use an ordinary worker and
> >> > manual flushing. If it's the fbdev probing blocking you, why are you
> >> > using fbdev?
> >> 
> >> Well if it's edp probing, then atm we do need to block since we have
> >> no support for panel hotplugging. And userspace generally has no
> >> expectations that built-in panels come and go. async_synchronize_full()
> >> making our fbdev code less async than desired is kinda a different
> >> story I think here. First step would be trying to figure out why we
> >> even bother with edp probing on this platform, when the thing isn't
> >> there. Sounds like broken VBT.
> >
> > Hi Daniel,
> >
> > Here are some of the VBT and DPCD related logs on my A3900 (BXT + GEN9 LP)
> > based NUC, but I don't have the knowledge to tell whether the VBT is broken :)
> 
> Please run current upstream drm-tip when you're suggesting changes to
> upstream code. Looks like you're running at most v4.14. This can't be
> emphasized enough. We can't and won't merge the changes unless they make
> sense with current code.

Yes, I understand. The patch posted was created right after git-pulling
Linus' tree, and back-ported for testing with a 4.14 kernel.

> 
> As to vbt, please send me /sys/kernel/debug/dri/0/i915_vbt.

Sure, as attached.

Thanks,
Feng



apl_nuc_i915_vbt
Description: Binary data


Re: [Intel-gfx] [PATCH] drm/dp: add module parameter for the dpcd access max retries

2018-05-08 Thread Feng Tang
 >> > > > On Mon, May 07, 2018 at 11:40:45AM +0100, Chris Wilson wrote:
> >> > > > > Quoting Feng Tang (2018-05-07 11:36:09)
> >> > > > > > To satisfy the Dell 4K monitor, the DPCD max retry count has been
> >> > > > > > bumped from 7 to 32, which may hurt boot/init time on some
> >> > > > > > platforms, as the 32 retries may take hundreds of ms.
> >> > > > >
> >> > > > > If we need that many retries, so be it. No modparam, the driver 
> >> > > > > just has
> >> > > > > to work.
> >> > > >
> >> > > > I understand your point. The retry number was originally 7, and worked
> >> > > > fine until the Dell 4K monitor, for which it was changed to 32.
> >> > > > According to my test, each retry takes about 8ms on the A3960 based NUC.
> >> > > >
> >> > > > One of our products needs to boot up within a given time limit; these
> >> > > > 32 retries take about 1/3 of the budget (about 270ms), which is
> >> > > > why I tried to make it a parameter.
> >> > >
> >> > > The essence is that probing whether a monitor is connected should not 
> >> > > be
> >> > > blocking boot. If an async probe tries and fails to find a monitor,
> >> > > fine - no one will notice. If it does take 270ms to find a monitor, it
> >> > > turns on 200ms after userspace kicks in, just like any other hotplug.
> >> >
> >> > Yeah, the fix here is to get the probing out of the driver load path, not
> >> > to break the driver for some funky monitors. And definitely not using a
> >> > modparam.
> >>
> >> Hi Chris and Daniel,
> >>
> >> After reading the i915 modeset init code, I did some experiments:
> >> 1. make the intel_modeset_init() function async (here async means
> >>    creating an async func wrapper and calling async_schedule() for it)
> >> 2. make the intel_setup_outputs()+intel_modeset_setup_hw_state() async
> >
> > You could just set i915_pci_driver.driver.probe_type =
> > PROBE_PREFER_ASYNCHRONOUS for the same result, or set i915.async_probe=1
> > for testing.
> >
> > However, if it's blocking on async_synchronize_full(), that will be no
> > improvement. So if it is only an existing async_synchronize_full()
> > causing the fbdev config to be waited on before userspace, we need to
> > stop using the async mechanism and just use an ordinary worker and
> > manual flushing. If it's the fbdev probing blocking you, why are you
> > using fbdev?
> 
> Well if it's edp probing, then atm we do need to block since we have
> no support for panel hotplugging. And userspace generally has no
> expectations that built-in panels come and go. async_synchronize_full()
> making our fbdev code less async than desired is kinda a different
> story I think here. First step would be trying to figure out why we
> even bother with edp probing on this platform, when the thing isn't
> there. Sounds like broken VBT.

Hi Daniel,

Here are some of the VBT and DPCD related logs on my A3900 (BXT + GEN9 LP)
based NUC, but I don't have the knowledge to tell whether the VBT is broken :)

[0.545231] [drm:intel_bios_init] Set default to SSC at 10 kHz
[0.545237] [drm:intel_bios_init] VBT signature "$VBT BROXTON", BDB version 207
[0.545241] [drm:intel_bios_init] BDB_GENERAL_FEATURES int_tv_support 0 int_crt_support 0 lvds_use_ssc 0 lvds_ssc_freq 12 display_clock_mode 1 fdi_rx_polarity_inverted 0
[0.545245] [drm:intel_bios_init] crt_ddc_bus_pin: 2
[0.545255] [drm:intel_opregion_get_panel_type] Failed to get panel details from OpRegion (-19)
[0.545257] [drm:intel_bios_init] Panel type: 0 (VBT)
[0.545260] [drm:intel_bios_init] DRRS supported mode is seamless
[0.545275] [drm:intel_bios_init] Found panel mode in BIOS VBT tables:
[0.545281] [drm:drm_mode_debug_printmodeline] Modeline 0:"1920x1080" 0 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x8 0xa
[0.545292] [drm:intel_bios_init] VBT backlight PWM modulation frequency 200 Hz, active high, min brightness 0, level 180, controller 1
[0.545301] [drm:intel_bios_init] Unsupported child device size for SDVO mapping.
[0.545305] [drm:intel_bios_init] Expected child device config size for VBT version 207 not known; assuming 38
[0.545323] [drm:intel_bios_init] DRRS State Enabled:1
[0.545334] [drm:intel_bios_init] Port A VBT info: DP:1 HDMI:0 DV

Re: [Intel-gfx] [PATCH] drm/dp: add module parameter for the dpcd access max retries

2018-05-08 Thread Feng Tang

On Mon, May 07, 2018 at 05:09:21PM +0200, Daniel Vetter wrote:
> On Mon, May 07, 2018 at 02:33:25PM +0100, Chris Wilson wrote:
> > Quoting Feng Tang (2018-05-07 22:26:34)
> > > Hi Chris,
> > > 
> > > Thanks for the prompt review!
> > > 
> > > On Mon, May 07, 2018 at 11:40:45AM +0100, Chris Wilson wrote:
> > > > Quoting Feng Tang (2018-05-07 11:36:09)
> > > > > To satisfy the Dell 4K monitor, the DPCD max retry count has been bumped
> > > > > from 7 to 32, which may hurt boot/init time on some platforms,
> > > > > as the 32 retries may take hundreds of ms.
> > > > 
> > > > If we need that many retries, so be it. No modparam, the driver just has
> > > > to work.
> > > 
> > > I understand your point. The retry number was originally 7, and worked
> > > fine until the Dell 4K monitor, for which it was changed to 32.
> > > According to my test, each retry takes about 8ms on the A3960 based NUC.
> > > 
> > > One of our products needs to boot up within a given time limit; these
> > > 32 retries take about 1/3 of the budget (about 270ms), which is
> > > why I tried to make it a parameter.
> > 
> > The essence is that probing whether a monitor is connected should not be
> > blocking boot. If an async probe tries and fails to find a monitor,
> > fine - no one will notice. If it does take 270ms to find a monitor, it
> > turns on 200ms after userspace kicks in, just like any other hotplug.
> 
> Yeah, the fix here is to get the probing out of the driver load path, not
> to break the driver for some funky monitors. And definitely not using a
> modparam.

Hi Chris and Daniel,

After reading the i915 modeset init code, I did some experiments:
1. make the intel_modeset_init() function async (here async means
   creating an async func wrapper and calling async_schedule() for it;
   see the sketch below)
2. make the intel_setup_outputs()+intel_modeset_setup_hw_state() async

But both of them trigger a kernel panic (log message pasted at the end).
Did I make some mistakes, or does the i915 code that runs after these
functions depend on them?
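
To illustrate, experiment 1 amounts to roughly this minimal sketch (not the
exact patch I tested; the names mirror the v4.14 i915 call chain, and error
handling plus ordering against the rest of i915_load_modeset_init() are
omitted):

#include <linux/async.h>

/* Async wrapper matching async_func_t: void (*)(void *, async_cookie_t). */
static void intel_modeset_init_async(void *data, async_cookie_t cookie)
{
	struct drm_i915_private *dev_priv = data;

	/* Now runs from an async worker instead of the driver load path. */
	intel_modeset_init(&dev_priv->drm);
}

/* In i915_load_modeset_init(), instead of the direct synchronous call: */
async_schedule(intel_modeset_init_async, dev_priv);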

IIUC the dpcd access first happens in
i915_driver_load --> i915_load_modeset_init --> intel_modeset_init
--> intel_setup_outputs --> intel_ddi_init --> intel_edp_init_connector
--> intel_edp_init_dpcd (to check if the DPCD exists)

Should we postpone it to a later phase, or even until after user space kicks in?
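
For comparison, the driver-level async probe suggested elsewhere in this
thread would be a one-field change; a minimal sketch, assuming v4.14's
struct pci_driver/struct device_driver layout rather than quoting the
actual i915 code:

#include <linux/pci.h>

static struct pci_driver i915_pci_driver = {
	.name = DRIVER_NAME,
	.id_table = pciidlist,
	.probe = i915_pci_probe,
	.remove = i915_pci_remove,
	/* Let the driver core probe from a worker instead of module init. */
	.driver.probe_type = PROBE_PREFER_ASYNCHRONOUS,
};

Note this only helps if nothing in the boot path then blocks in
async_synchronize_full() waiting for the probe to finish.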

Thanks,
Feng

---
Error msg for my async experiment:

[0.715706] No backend configured for hyper_dmabuf in kernel config
[0.716079] Hyper_dmabuf: no backend found
[0.736361] intel_powerclamp: CPU does not support MWAIT
[0.737643] [drm:wait_panel_status] *ERROR* PPS state mismatch
[0.741381] genirq: Setting trigger mode 3 for irq 127 failed (intel_gpio_irq_type+0x0/0x110)
[0.743244] dmi-sysfs: dmi entry is absent.
[0.765116] [edp_panel_vdd_on()]: exit
[0.765360] BUG: unable to handle kernel NULL pointer dereference at (null)
[0.765809] IP:   (null)
[0.766005] PGD 0 P4D 0 
[0.766168] Oops: 0010 [#1] PREEMPT SMP
[0.766401] Modules linked in:
[0.766592] CPU: 0 PID: 28 Comm: kworker/u8:1 Tainted: G U  W   4.14.39-sos+ #26
[0.767075] Workqueue: events_unbound async_run_entry_fn
[0.767392] task: 88027433c240 task.stack: 88027434
[0.767743] RIP: 0010:  (null)
[0.767969] RSP: :880274343ab8 EFLAGS: 00010246
[0.768281] RAX:  RBX: 2d4003ff RCX: 0001
[0.768701] RDX: 8000 RSI:  RDI: 880272f21100
[0.769121] RBP:  R08: 2d4003ff R09: 0001
[0.769541] R10: 880274343a60 R11: 82e7fe0d R12: 0001
[0.769961] R13: 880273038000 R14: 0004 R15: 880272f21100
[0.770383] FS:  () GS:88027dc0() knlGS:
[0.770858] CS:  0010 DS:  ES:  CR0: 80050033
[0.771199] CR2:  CR3: 02613000 CR4: 003426f0
[0.771619] Call Trace:
[0.771779]  ? intel_dp_aux_ch+0x1a7/0x770
[0.772031]  ? remove_wait_queue+0x60/0x60
[0.772281]  ? intel_dp_aux_transfer+0xa6/0x200
[0.772556]  ? drm_dp_dpcd_access+0x9d/0x150
[0.772815]  ? drm_dp_dpcd_read+0x2c/0x60
[0.773059]  ? drm_dp_read_desc+0x43/0xf0
[0.773303]  ? intel_dp_detect+0x346/0x6a0
[0.773554]  ? drm_helper_probe_single_connector_modes+0xcd/0x6b0
[0.773920]  ? _raw_spin_unlock+0x14/0x30
[0.774165]  ? vt_console_print+0x22a/0x3d0
[0.774420]  ? preempt_count_add+0x56/0xa0
[0.774669]  ? _raw_spin_lock_irqsave+0x32/0x40
[0.774944]  ? drm_setup_crtcs+0x143/0x9e0
[0.775195]

Re: [Intel-gfx] [PATCH] drm/dp: add module parameter for the dpcd access max retries

2018-05-07 Thread Feng Tang
On Mon, May 07, 2018 at 05:09:21PM +0200, Daniel Vetter wrote:
> On Mon, May 07, 2018 at 02:33:25PM +0100, Chris Wilson wrote:
> > Quoting Feng Tang (2018-05-07 22:26:34)
> > > Hi Chris,
> > > 
> > > Thanks for the prompt review!
> > > 
> > > On Mon, May 07, 2018 at 11:40:45AM +0100, Chris Wilson wrote:
> > > > Quoting Feng Tang (2018-05-07 11:36:09)
> > > > > To satisfy the Dell 4K monitor, the DPCD max retry count has been bumped
> > > > > from 7 to 32, which may hurt boot/init time on some platforms,
> > > > > as the 32 retries may take hundreds of ms.
> > > > 
> > > > If we need that many retries, so be it. No modparam, the driver just has
> > > > to work.
> > > 
> > > I understand your point. The retry number was originally 7, and worked
> > > fine until the Dell 4K monitor, for which it was changed to 32.
> > > According to my test, each retry takes about 8ms on the A3960 based NUC.
> > > 
> > > One of our products needs to boot up within a given time limit; these
> > > 32 retries take about 1/3 of the budget (about 270ms), which is
> > > why I tried to make it a parameter.
> > 
> > The essence is that probing whether a monitor is connected should not be
> > blocking boot. If an async probe tries and fails to find a monitor,
> > fine - no one will notice. If it does take 270ms to find a monitor, it
> > turns on 200ms after userspace kicks in, just like any other hotplug.
> 
> Yeah, the fix here is to get the probing out of the driver load path, not
> to break the driver for some funky monitors. And definitely not using a
> modparam.

Thank you both for the suggestion! 

I will try to make the probing async first, though I'm not familiar with the
whole DRM stack yet :)

- Feng



Re: [PATCH] drm/dp: add module parameter for the dpcd access max retries

2018-05-07 Thread Feng Tang
Hi Chris,

Thanks for the prompt review!

On Mon, May 07, 2018 at 11:40:45AM +0100, Chris Wilson wrote:
> Quoting Feng Tang (2018-05-07 11:36:09)
> > To satisfy the Dell 4K monitor, the DPCD max retry count has been bumped
> > from 7 to 32, which may hurt boot/init time on some platforms,
> > as the 32 retries may take hundreds of ms.
> 
> If we need that many retries, so be it. No modparam, the driver just has
> to work.

I understand your point. The retry number was originally 7, and worked
fine until the Dell 4K monitor, for which it was changed to 32. According
to my test, each retry takes about 8ms on the A3960 based NUC, so the 32
retries add up to roughly 32 x 8ms = 256ms.

One of our products needs to boot up within a given time limit; these
32 retries take about 1/3 of the budget (about 270ms), which is
why I tried to make it a parameter.

>  
> > This patch makes no functional change, but makes the max retry count
> > adjustable, so that concerned users/platforms have an option
> > to shorten the wait time.
> > 
> > On an Intel Atom Processor A3960 based NUC, the i915_init() time could
> > be reduced from 450ms to 200ms.
> > 
> > retries = 3:
> > [0.457806] calling  i915_init+0x0/0x51 @ 1
> > [0.662307] initcall i915_init+0x0/0x51 returned 0 after 199694 usecs
> > 
> > retries = 32:
> > [0.465818] calling  i915_init+0x0/0x51 @ 1
> > [0.925831] initcall i915_init+0x0/0x51 returned 0 after 449219 usecs
> 
> Why is this synchronous anyway?

I don't know the reason; maybe some kernel components depend on the GFX
module. But even if we make it async, it is still the longest initcall on
our platform.
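
For anyone reproducing these numbers: the "calling ... / initcall ...
returned" lines quoted in this thread look like initcall_debug output, so
presumably the timings were gathered with something like (an assumption on
my part):

# kernel command line:
initcall_debug ignore_loglevel

# after boot:
dmesg | grep i915_init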

Thanks,
Feng



[PATCH] drm/dp: add module parameter for the dpcd access max retries

2018-05-07 Thread Feng Tang
To satisfy the Dell 4K monitor, the DPCD max retry count has been bumped
from 7 to 32, which may hurt boot/init time on some platforms,
as the 32 retries may take hundreds of ms.

This patch makes no functional change, but makes the max retry count
adjustable, so that concerned users/platforms have an option
to shorten the wait time.

On an Intel Atom Processor A3960 based NUC, the i915_init() time could
be reduced from 450ms to 200ms.

retries = 3:
[0.457806] calling  i915_init+0x0/0x51 @ 1
[0.662307] initcall i915_init+0x0/0x51 returned 0 after 199694 usecs

retries = 32:
[0.465818] calling  i915_init+0x0/0x51 @ 1
[0.925831] initcall i915_init+0x0/0x51 returned 0 after 449219 usecs

Signed-off-by: Feng Tang <feng.t...@intel.com>
---
 drivers/gpu/drm/drm_dp_helper.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
index ffe14ec..6054067 100644
--- a/drivers/gpu/drm/drm_dp_helper.c
+++ b/drivers/gpu/drm/drm_dp_helper.c
@@ -171,6 +171,20 @@ EXPORT_SYMBOL(drm_dp_bw_code_to_link_rate);
 
 #define AUX_RETRY_INTERVAL 500 /* us */
 
+/*
+ * The specification doesn't give any recommendation on how often to
+ * retry native transactions. We used to retry 7 times like for
+ * aux i2c transactions, but for real world devices this wasn't
+ * sufficient; bump to 32, which makes Dell 4k monitors happier.
+ *
+ * Since the retries may take quite some boot time, we make the
+ * count an adjustable parameter so the wait can be shortened.
+ */
+static int dp_dpcd_max_retries __read_mostly = 32;
+module_param_unsafe(dp_dpcd_max_retries, int, 0644);
+MODULE_PARM_DESC(dp_dpcd_max_retries,
+		 "Max retry number for DPCD access (default 32)");
+
 /**
  * DOC: dp helpers
  *
@@ -198,13 +212,7 @@ static int drm_dp_dpcd_access(struct drm_dp_aux *aux, u8 request,
 
 	mutex_lock(&aux->hw_mutex);
 
-	/*
-	 * The specification doesn't give any recommendation on how often to
-	 * retry native transactions. We used to retry 7 times like for
-	 * aux i2c transactions but real world devices this wasn't
-	 * sufficient, bump to 32 which makes Dell 4k monitors happier.
-	 */
-	for (retry = 0; retry < 32; retry++) {
+	for (retry = 0; retry < dp_dpcd_max_retries; retry++) {
 		if (ret != 0 && ret != -ETIMEDOUT) {
 			usleep_range(AUX_RETRY_INTERVAL,
 				     AUX_RETRY_INTERVAL + 100);
-- 
2.7.4
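
For completeness, with this patch applied the retry count could be tuned at
boot or at runtime; a sketch, assuming drm_dp_helper.c is built into
drm_kms_helper.ko as in v4.14 (the 0644 mode above makes the parameter
writable through sysfs):

# on the kernel command line:
drm_kms_helper.dp_dpcd_max_retries=7

# or at runtime:
echo 7 > /sys/module/drm_kms_helper/parameters/dp_dpcd_max_retries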
