Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 16.09.19 um 11:06 schrieb Feng Tang: > Hi Thomas, > > On Mon, Sep 09, 2019 at 04:12:37PM +0200, Thomas Zimmermann wrote: >> Hi >> >> Am 04.09.19 um 08:27 schrieb Feng Tang: >>> Hi Thomas, >>> >>> On Wed, Aug 28, 2019 at 12:51:40PM +0200, Thomas Zimmermann wrote: Hi Am 28.08.19 um 11:37 schrieb Rong Chen: > Hi Thomas, > > On 8/28/19 1:16 AM, Thomas Zimmermann wrote: >> Hi >> >> Am 27.08.19 um 14:33 schrieb Chen, Rong A: >>> Both patches have little impact on the performance from our side. >> Thanks for testing. Too bad they doesn't solve the issue. >> >> There's another patch attached. Could you please tests this as well? >> Thanks a lot! >> >> The patch comes from Daniel Vetter after discussing the problem on IRC. >> The idea of the patch is that the old mgag200 code might display much >> less frames that the generic code, because mgag200 only prints from >> non-atomic context. If we simulate this with the generic code, we should >> see roughly the original performance. >> >> > > It's cool, the patch "usecansleep.patch" can fix the issue. Thank you for testing. But don't get too excited, because the patch simulates a bug that was present in the original mgag200 code. A significant number of frames are simply skipped. That is apparently the reason why it's faster. >>> >>> Thanks for the detailed info, so the original code skips time-consuming >>> work inside atomic context on purpose. Is there any space to optmise it? >>> If 2 scheduled update worker are handled at almost same time, can one be >>> skipped? >> >> We discussed ideas on IRC and decided that screen updates could be >> synchronized with vblank intervals. This may give some rate limiting to >> the output. >> >> If you like, you could try the patch set at [1]. It adds the respective >> code to console and mgag200. > > I just tried the 2 patches, no obvious change (comparing to the > 18.8% regression), both in overall benchmark and micro-profiling. > > 90f479ae51afa45e 04a0983095feaee022cdd65e3e4 > --- > 37236 ± 3% +2.5% 38167 ± 3% vm-scalability.median > 0.15 ± 24% -25.1% 0.11 ± 23% vm-scalability.median_stddev > 0.15 ± 23% -25.1% 0.11 ± 22% vm-scalability.stddev > 12767318 ± 4% +2.5% 13089177 ± 3% vm-scalability.throughput Thank you for testing. I wish we'd seen at least some improvement. Best regards Thomas > Thanks, > Feng > >> >> Best regards >> Thomas >> >> [1] >> https://lists.freedesktop.org/archives/dri-devel/2019-September/234850.html >> >>> >>> Thanks, >>> Feng >>> Best regards Thomas >> >> -- >> Thomas Zimmermann >> Graphics Driver Developer >> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany >> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah >> HRB 21284 (AG Nürnberg) >> > > > > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel > -- Thomas Zimmermann Graphics Driver Developer SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) signature.asc Description: OpenPGP digital signature
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Thomas, On Mon, Sep 09, 2019 at 04:12:37PM +0200, Thomas Zimmermann wrote: > Hi > > Am 04.09.19 um 08:27 schrieb Feng Tang: > > Hi Thomas, > > > > On Wed, Aug 28, 2019 at 12:51:40PM +0200, Thomas Zimmermann wrote: > >> Hi > >> > >> Am 28.08.19 um 11:37 schrieb Rong Chen: > >>> Hi Thomas, > >>> > >>> On 8/28/19 1:16 AM, Thomas Zimmermann wrote: > Hi > > Am 27.08.19 um 14:33 schrieb Chen, Rong A: > > Both patches have little impact on the performance from our side. > Thanks for testing. Too bad they doesn't solve the issue. > > There's another patch attached. Could you please tests this as well? > Thanks a lot! > > The patch comes from Daniel Vetter after discussing the problem on IRC. > The idea of the patch is that the old mgag200 code might display much > less frames that the generic code, because mgag200 only prints from > non-atomic context. If we simulate this with the generic code, we should > see roughly the original performance. > > > >>> > >>> It's cool, the patch "usecansleep.patch" can fix the issue. > >> > >> Thank you for testing. But don't get too excited, because the patch > >> simulates a bug that was present in the original mgag200 code. A > >> significant number of frames are simply skipped. That is apparently the > >> reason why it's faster. > > > > Thanks for the detailed info, so the original code skips time-consuming > > work inside atomic context on purpose. Is there any space to optmise it? > > If 2 scheduled update worker are handled at almost same time, can one be > > skipped? > > We discussed ideas on IRC and decided that screen updates could be > synchronized with vblank intervals. This may give some rate limiting to > the output. > > If you like, you could try the patch set at [1]. It adds the respective > code to console and mgag200. I just tried the 2 patches, no obvious change (comparing to the 18.8% regression), both in overall benchmark and micro-profiling. 90f479ae51afa45e 04a0983095feaee022cdd65e3e4 --- 37236 ± 3% +2.5% 38167 ± 3% vm-scalability.median 0.15 ± 24% -25.1% 0.11 ± 23% vm-scalability.median_stddev 0.15 ± 23% -25.1% 0.11 ± 22% vm-scalability.stddev 12767318 ± 4% +2.5% 13089177 ± 3% vm-scalability.throughput Thanks, Feng > > Best regards > Thomas > > [1] > https://lists.freedesktop.org/archives/dri-devel/2019-September/234850.html > > > > > Thanks, > > Feng > > > >> > >> Best regards > >> Thomas > > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah > HRB 21284 (AG Nürnberg) >
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 04.09.19 um 08:27 schrieb Feng Tang: > Hi Thomas, > > On Wed, Aug 28, 2019 at 12:51:40PM +0200, Thomas Zimmermann wrote: >> Hi >> >> Am 28.08.19 um 11:37 schrieb Rong Chen: >>> Hi Thomas, >>> >>> On 8/28/19 1:16 AM, Thomas Zimmermann wrote: Hi Am 27.08.19 um 14:33 schrieb Chen, Rong A: > Both patches have little impact on the performance from our side. Thanks for testing. Too bad they doesn't solve the issue. There's another patch attached. Could you please tests this as well? Thanks a lot! The patch comes from Daniel Vetter after discussing the problem on IRC. The idea of the patch is that the old mgag200 code might display much less frames that the generic code, because mgag200 only prints from non-atomic context. If we simulate this with the generic code, we should see roughly the original performance. >>> >>> It's cool, the patch "usecansleep.patch" can fix the issue. >> >> Thank you for testing. But don't get too excited, because the patch >> simulates a bug that was present in the original mgag200 code. A >> significant number of frames are simply skipped. That is apparently the >> reason why it's faster. > > Thanks for the detailed info, so the original code skips time-consuming > work inside atomic context on purpose. Is there any space to optmise it? > If 2 scheduled update worker are handled at almost same time, can one be > skipped? We discussed ideas on IRC and decided that screen updates could be synchronized with vblank intervals. This may give some rate limiting to the output. If you like, you could try the patch set at [1]. It adds the respective code to console and mgag200. Best regards Thomas [1] https://lists.freedesktop.org/archives/dri-devel/2019-September/234850.html > > Thanks, > Feng > >> >> Best regards >> Thomas -- Thomas Zimmermann Graphics Driver Developer SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) signature.asc Description: OpenPGP digital signature ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
On Thu, Sep 05, 2019 at 06:37:47PM +0800, Daniel Vetter wrote: > On Thu, Sep 5, 2019 at 8:58 AM Feng Tang wrote: > > > > Hi Vetter, > > > > On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote: > > > On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie wrote: > > > > > > > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter wrote: > > > > > > > > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang wrote: > > > > > > > > > > > > Hi Daniel, > > > > > > > > > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote: > > > > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann > > > > > > > wrote: > > > > > > > > > > > > > > > > Hi > > > > > > > > > > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang: > > > > > > > > >> Thank you for testing. But don't get too excited, because > > > > > > > > >> the patch > > > > > > > > >> simulates a bug that was present in the original mgag200 > > > > > > > > >> code. A > > > > > > > > >> significant number of frames are simply skipped. That is > > > > > > > > >> apparently the > > > > > > > > >> reason why it's faster. > > > > > > > > > > > > > > > > > > Thanks for the detailed info, so the original code skips > > > > > > > > > time-consuming > > > > > > > > > work inside atomic context on purpose. Is there any space to > > > > > > > > > optmise it? > > > > > > > > > If 2 scheduled update worker are handled at almost same time, > > > > > > > > > can one be > > > > > > > > > skipped? > > > > > > > > > > > > > > > > To my knowledge, there's only one instance of the worker. > > > > > > > > Re-scheduling > > > > > > > > the worker before a previous instance started, will not create > > > > > > > > a second > > > > > > > > instance. The worker's instance will complete all pending > > > > > > > > updates. So in > > > > > > > > some way, skipping workers already happens. > > > > > > > > > > > > > > So I think that the most often fbcon update from atomic context > > > > > > > is the > > > > > > > blinking cursor. If you disable that one you should be back to > > > > > > > the old > > > > > > > performance level I think, since just writing to dmesg is from > > > > > > > process > > > > > > > context, so shouldn't change. > > > > > > > > > > > > Hmm, then for the old driver, it should also do the most update in > > > > > > non-atomic context? > > > > > > > > > > > > One other thing is, I profiled that updating a 3MB shadow buffer > > > > > > needs > > > > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related > > > > > > with > > > > > > the cache setting of DRM shadow buffer? say the orginal code use a > > > > > > cachable buffer? > > > > > > > > > > Hm, that would indicate the write-combining got broken somewhere. This > > > > > should definitely be faster. Also we shouldn't transfer the hole > > > > > thing, except when scrolling ... > > > > > > > > First rule of fbcon usage, you are always effectively scrolling. > > > > > > > > Also these devices might be on a PCIE 1x piece of wet string, not sure > > > > if the numbers reflect that. > > > > > > pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and > > > overhead not entirely out of the question that 150MB/s is actually the > > > hw limit. If it's really pcie 1x 1.0, no idea where to check that. > > > Also might be worth to double-check that the gpu pci bar is listed as > > > wc in debugfs/x86/pat_memtype_list. > > > > Here is some dump of the device info and the pat_memtype_list, while it is > > running other 0day task: > > Looks all good, I guess Dave is right with this probably only being a > real slow, real old pcie link, plus maybe some inefficiencies in the > mapping. Your 150MB/s, was that just the copy, or did you include all > the setup/map/unmap/teardown too in your measurement in the trace? Following is the breakdown, the 19240 us is the memory copy time The drm_fb_helper_dirty_work() calls sequentially 1. drm_client_buffer_vmap (290 us) 2. drm_fb_helper_dirty_blit_real (19240 us) 3. helper->fb->funcs->dirty()---> NULL for mgag200 driver 4. drm_client_buffer_vunmap (215 us) Thanks, Feng > -Daniel > > > > > controller info > > = > > 03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA > > G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller]) > > Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1) > > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- > > Stepping- SERR- FastB2B- DisINTx- > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > > SERR- > Interrupt: pin A routed to IRQ 16 > > NUMA node: 0 > > Region 0: Memory at d000 (32-bit, prefetchable) [size=16M] > > Region 1: Memory at d180 (32-bit, non-prefetchable) [size=16K] > > Region 2: Memory at d100 (32-bit, non-prefetchable) [size=8M] > > Expansion ROM at 000c [disabled] [size=128K] > >
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
On Thu, Sep 5, 2019 at 8:58 AM Feng Tang wrote: > > Hi Vetter, > > On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote: > > On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie wrote: > > > > > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter wrote: > > > > > > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang wrote: > > > > > > > > > > Hi Daniel, > > > > > > > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote: > > > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann > > > > > > wrote: > > > > > > > > > > > > > > Hi > > > > > > > > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang: > > > > > > > >> Thank you for testing. But don't get too excited, because the > > > > > > > >> patch > > > > > > > >> simulates a bug that was present in the original mgag200 code. > > > > > > > >> A > > > > > > > >> significant number of frames are simply skipped. That is > > > > > > > >> apparently the > > > > > > > >> reason why it's faster. > > > > > > > > > > > > > > > > Thanks for the detailed info, so the original code skips > > > > > > > > time-consuming > > > > > > > > work inside atomic context on purpose. Is there any space to > > > > > > > > optmise it? > > > > > > > > If 2 scheduled update worker are handled at almost same time, > > > > > > > > can one be > > > > > > > > skipped? > > > > > > > > > > > > > > To my knowledge, there's only one instance of the worker. > > > > > > > Re-scheduling > > > > > > > the worker before a previous instance started, will not create a > > > > > > > second > > > > > > > instance. The worker's instance will complete all pending > > > > > > > updates. So in > > > > > > > some way, skipping workers already happens. > > > > > > > > > > > > So I think that the most often fbcon update from atomic context is > > > > > > the > > > > > > blinking cursor. If you disable that one you should be back to the > > > > > > old > > > > > > performance level I think, since just writing to dmesg is from > > > > > > process > > > > > > context, so shouldn't change. > > > > > > > > > > Hmm, then for the old driver, it should also do the most update in > > > > > non-atomic context? > > > > > > > > > > One other thing is, I profiled that updating a 3MB shadow buffer needs > > > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with > > > > > the cache setting of DRM shadow buffer? say the orginal code use a > > > > > cachable buffer? > > > > > > > > Hm, that would indicate the write-combining got broken somewhere. This > > > > should definitely be faster. Also we shouldn't transfer the hole > > > > thing, except when scrolling ... > > > > > > First rule of fbcon usage, you are always effectively scrolling. > > > > > > Also these devices might be on a PCIE 1x piece of wet string, not sure > > > if the numbers reflect that. > > > > pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and > > overhead not entirely out of the question that 150MB/s is actually the > > hw limit. If it's really pcie 1x 1.0, no idea where to check that. > > Also might be worth to double-check that the gpu pci bar is listed as > > wc in debugfs/x86/pat_memtype_list. > > Here is some dump of the device info and the pat_memtype_list, while it is > running other 0day task: Looks all good, I guess Dave is right with this probably only being a real slow, real old pcie link, plus maybe some inefficiencies in the mapping. Your 150MB/s, was that just the copy, or did you include all the setup/map/unmap/teardown too in your measurement in the trace? -Daniel > > controller info > = > 03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e > [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller]) > Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1) > Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- > Stepping- SERR- FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > SERR- Interrupt: pin A routed to IRQ 16 > NUMA node: 0 > Region 0: Memory at d000 (32-bit, prefetchable) [size=16M] > Region 1: Memory at d180 (32-bit, non-prefetchable) [size=16K] > Region 2: Memory at d100 (32-bit, non-prefetchable) [size=8M] > Expansion ROM at 000c [disabled] [size=128K] > Capabilities: [dc] Power Management version 2 > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA > PME(D0-,D1-,D2-,D3hot-,D3cold-) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00 > DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, > L1 <1us > ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset- > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- > Unsupported- > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- >
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Vetter, On Wed, Sep 04, 2019 at 01:20:29PM +0200, Daniel Vetter wrote: > On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie wrote: > > > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter wrote: > > > > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang wrote: > > > > > > > > Hi Daniel, > > > > > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote: > > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann > > > > > wrote: > > > > > > > > > > > > Hi > > > > > > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang: > > > > > > >> Thank you for testing. But don't get too excited, because the > > > > > > >> patch > > > > > > >> simulates a bug that was present in the original mgag200 code. A > > > > > > >> significant number of frames are simply skipped. That is > > > > > > >> apparently the > > > > > > >> reason why it's faster. > > > > > > > > > > > > > > Thanks for the detailed info, so the original code skips > > > > > > > time-consuming > > > > > > > work inside atomic context on purpose. Is there any space to > > > > > > > optmise it? > > > > > > > If 2 scheduled update worker are handled at almost same time, can > > > > > > > one be > > > > > > > skipped? > > > > > > > > > > > > To my knowledge, there's only one instance of the worker. > > > > > > Re-scheduling > > > > > > the worker before a previous instance started, will not create a > > > > > > second > > > > > > instance. The worker's instance will complete all pending updates. > > > > > > So in > > > > > > some way, skipping workers already happens. > > > > > > > > > > So I think that the most often fbcon update from atomic context is the > > > > > blinking cursor. If you disable that one you should be back to the old > > > > > performance level I think, since just writing to dmesg is from process > > > > > context, so shouldn't change. > > > > > > > > Hmm, then for the old driver, it should also do the most update in > > > > non-atomic context? > > > > > > > > One other thing is, I profiled that updating a 3MB shadow buffer needs > > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with > > > > the cache setting of DRM shadow buffer? say the orginal code use a > > > > cachable buffer? > > > > > > Hm, that would indicate the write-combining got broken somewhere. This > > > should definitely be faster. Also we shouldn't transfer the hole > > > thing, except when scrolling ... > > > > First rule of fbcon usage, you are always effectively scrolling. > > > > Also these devices might be on a PCIE 1x piece of wet string, not sure > > if the numbers reflect that. > > pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and > overhead not entirely out of the question that 150MB/s is actually the > hw limit. If it's really pcie 1x 1.0, no idea where to check that. > Also might be worth to double-check that the gpu pci bar is listed as > wc in debugfs/x86/pat_memtype_list. Here is some dump of the device info and the pat_memtype_list, while it is running other 0day task: controller info = 03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller]) Subsystem: Intel Corporation MGA G200e [Pilot] ServerEngines (SEP1) Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort+ SERR- Capabilities: [144 v1] Vendor Specific Information: ID=0004 Rev=1 Len=03c Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 Len=00a Capabilities: [250 v1] #19 Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 Len=018 Capabilities: [298 v1] Vendor Specific Information: ID=0007 Rev=0 Len=024 Thanks, Feng > > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Thomas, On 9/4/2019 4:43 PM, Thomas Zimmermann wrote: Hi Am 04.09.19 um 10:35 schrieb Feng Tang: Hi Daniel, On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote: On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann wrote: Hi Am 04.09.19 um 08:27 schrieb Feng Tang: Thank you for testing. But don't get too excited, because the patch simulates a bug that was present in the original mgag200 code. A significant number of frames are simply skipped. That is apparently the reason why it's faster. Thanks for the detailed info, so the original code skips time-consuming work inside atomic context on purpose. Is there any space to optmise it? If 2 scheduled update worker are handled at almost same time, can one be skipped? To my knowledge, there's only one instance of the worker. Re-scheduling the worker before a previous instance started, will not create a second instance. The worker's instance will complete all pending updates. So in some way, skipping workers already happens. So I think that the most often fbcon update from atomic context is the blinking cursor. If you disable that one you should be back to the old performance level I think, since just writing to dmesg is from process context, so shouldn't change. Hmm, then for the old driver, it should also do the most update in non-atomic context? One other thing is, I profiled that updating a 3MB shadow buffer needs 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with the cache setting of DRM shadow buffer? say the orginal code use a cachable buffer? https://unix.stackexchange.com/questions/3759/how-to-stop-cursor-from-blinking Bunch of tricks, but tbh I haven't tested them. Thomas has suggested to disable curson by echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink We tried that way, and no change for the performance data. There are several ways of disabling the cursor. On my test system, I entered tput civis before the test and got better performance. Did you try this as well? There's no obvious change on our system. Best Regards, Rong Chen Best regards Thomas Thanks, Feng In any case, I still strongly advice you don't print anything to dmesg or fbcon while benchmarking, because dmesg/printf are anything but fast, especially if a gpu driver is involved. There's some efforts to make the dmesg/printk side less painful (untangling the console_lock from printk), but fundamentally printing to the gpu from the kernel through dmesg/fbcon won't be cheap. It's just not something we optimize beyond "make sure it works for emergencies". -Daniel Best regards Thomas Thanks, Feng Best regards Thomas ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Thomas Zimmermann Graphics Driver Developer SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ LKP mailing list l...@lists.01.org https://lists.01.org/mailman/listinfo/lkp ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
On Wed, Sep 4, 2019 at 1:15 PM Dave Airlie wrote: > > On Wed, 4 Sep 2019 at 19:17, Daniel Vetter wrote: > > > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang wrote: > > > > > > Hi Daniel, > > > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote: > > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann > > > > wrote: > > > > > > > > > > Hi > > > > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang: > > > > > >> Thank you for testing. But don't get too excited, because the patch > > > > > >> simulates a bug that was present in the original mgag200 code. A > > > > > >> significant number of frames are simply skipped. That is > > > > > >> apparently the > > > > > >> reason why it's faster. > > > > > > > > > > > > Thanks for the detailed info, so the original code skips > > > > > > time-consuming > > > > > > work inside atomic context on purpose. Is there any space to > > > > > > optmise it? > > > > > > If 2 scheduled update worker are handled at almost same time, can > > > > > > one be > > > > > > skipped? > > > > > > > > > > To my knowledge, there's only one instance of the worker. > > > > > Re-scheduling > > > > > the worker before a previous instance started, will not create a > > > > > second > > > > > instance. The worker's instance will complete all pending updates. So > > > > > in > > > > > some way, skipping workers already happens. > > > > > > > > So I think that the most often fbcon update from atomic context is the > > > > blinking cursor. If you disable that one you should be back to the old > > > > performance level I think, since just writing to dmesg is from process > > > > context, so shouldn't change. > > > > > > Hmm, then for the old driver, it should also do the most update in > > > non-atomic context? > > > > > > One other thing is, I profiled that updating a 3MB shadow buffer needs > > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with > > > the cache setting of DRM shadow buffer? say the orginal code use a > > > cachable buffer? > > > > Hm, that would indicate the write-combining got broken somewhere. This > > should definitely be faster. Also we shouldn't transfer the hole > > thing, except when scrolling ... > > First rule of fbcon usage, you are always effectively scrolling. > > Also these devices might be on a PCIE 1x piece of wet string, not sure > if the numbers reflect that. pcie 1x 1.0 is 250MB/s, so yeah with a bit of inefficiency and overhead not entirely out of the question that 150MB/s is actually the hw limit. If it's really pcie 1x 1.0, no idea where to check that. Also might be worth to double-check that the gpu pci bar is listed as wc in debugfs/x86/pat_memtype_list. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
On Wed, 4 Sep 2019 at 19:17, Daniel Vetter wrote: > > On Wed, Sep 4, 2019 at 10:35 AM Feng Tang wrote: > > > > Hi Daniel, > > > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote: > > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann > > > wrote: > > > > > > > > Hi > > > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang: > > > > >> Thank you for testing. But don't get too excited, because the patch > > > > >> simulates a bug that was present in the original mgag200 code. A > > > > >> significant number of frames are simply skipped. That is apparently > > > > >> the > > > > >> reason why it's faster. > > > > > > > > > > Thanks for the detailed info, so the original code skips > > > > > time-consuming > > > > > work inside atomic context on purpose. Is there any space to optmise > > > > > it? > > > > > If 2 scheduled update worker are handled at almost same time, can one > > > > > be > > > > > skipped? > > > > > > > > To my knowledge, there's only one instance of the worker. Re-scheduling > > > > the worker before a previous instance started, will not create a second > > > > instance. The worker's instance will complete all pending updates. So in > > > > some way, skipping workers already happens. > > > > > > So I think that the most often fbcon update from atomic context is the > > > blinking cursor. If you disable that one you should be back to the old > > > performance level I think, since just writing to dmesg is from process > > > context, so shouldn't change. > > > > Hmm, then for the old driver, it should also do the most update in > > non-atomic context? > > > > One other thing is, I profiled that updating a 3MB shadow buffer needs > > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with > > the cache setting of DRM shadow buffer? say the orginal code use a > > cachable buffer? > > Hm, that would indicate the write-combining got broken somewhere. This > should definitely be faster. Also we shouldn't transfer the hole > thing, except when scrolling ... First rule of fbcon usage, you are always effectively scrolling. Also these devices might be on a PCIE 1x piece of wet string, not sure if the numbers reflect that. Dave. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
On Wed, Sep 4, 2019 at 10:35 AM Feng Tang wrote: > > Hi Daniel, > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote: > > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann > > wrote: > > > > > > Hi > > > > > > Am 04.09.19 um 08:27 schrieb Feng Tang: > > > >> Thank you for testing. But don't get too excited, because the patch > > > >> simulates a bug that was present in the original mgag200 code. A > > > >> significant number of frames are simply skipped. That is apparently the > > > >> reason why it's faster. > > > > > > > > Thanks for the detailed info, so the original code skips time-consuming > > > > work inside atomic context on purpose. Is there any space to optmise it? > > > > If 2 scheduled update worker are handled at almost same time, can one be > > > > skipped? > > > > > > To my knowledge, there's only one instance of the worker. Re-scheduling > > > the worker before a previous instance started, will not create a second > > > instance. The worker's instance will complete all pending updates. So in > > > some way, skipping workers already happens. > > > > So I think that the most often fbcon update from atomic context is the > > blinking cursor. If you disable that one you should be back to the old > > performance level I think, since just writing to dmesg is from process > > context, so shouldn't change. > > Hmm, then for the old driver, it should also do the most update in > non-atomic context? > > One other thing is, I profiled that updating a 3MB shadow buffer needs > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with > the cache setting of DRM shadow buffer? say the orginal code use a > cachable buffer? Hm, that would indicate the write-combining got broken somewhere. This should definitely be faster. Also we shouldn't transfer the hole thing, except when scrolling ... > > https://unix.stackexchange.com/questions/3759/how-to-stop-cursor-from-blinking > > > > Bunch of tricks, but tbh I haven't tested them. > > Thomas has suggested to disable curson by > echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink > > We tried that way, and no change for the performance data. Huh, if there's other atomic contexts for fbcon update then I'm not aware ... and if it's all the updates, then you wouldn't see a hole lot on your screen, neither with the old or new fbdev support in mgag200. I'm a bit confused ... -Daniel > > Thanks, > Feng > > > > > In any case, I still strongly advice you don't print anything to dmesg > > or fbcon while benchmarking, because dmesg/printf are anything but > > fast, especially if a gpu driver is involved. There's some efforts to > > make the dmesg/printk side less painful (untangling the console_lock > > from printk), but fundamentally printing to the gpu from the kernel > > through dmesg/fbcon won't be cheap. It's just not something we > > optimize beyond "make sure it works for emergencies". > > -Daniel > > > > > > > > Best regards > > > Thomas > > > > > > > > > > > Thanks, > > > > Feng > > > > > > > >> > > > >> Best regards > > > >> Thomas > > > > ___ > > > > dri-devel mailing list > > > > dri-devel@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > > > > > > -- > > > Thomas Zimmermann > > > Graphics Driver Developer > > > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany > > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah > > > HRB 21284 (AG Nürnberg) > > > > > > ___ > > > dri-devel mailing list > > > dri-devel@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > > > > -- > > Daniel Vetter > > Software Engineer, Intel Corporation > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 04.09.19 um 10:35 schrieb Feng Tang: > Hi Daniel, > > On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote: >> On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann wrote: >>> >>> Hi >>> >>> Am 04.09.19 um 08:27 schrieb Feng Tang: > Thank you for testing. But don't get too excited, because the patch > simulates a bug that was present in the original mgag200 code. A > significant number of frames are simply skipped. That is apparently the > reason why it's faster. Thanks for the detailed info, so the original code skips time-consuming work inside atomic context on purpose. Is there any space to optmise it? If 2 scheduled update worker are handled at almost same time, can one be skipped? >>> >>> To my knowledge, there's only one instance of the worker. Re-scheduling >>> the worker before a previous instance started, will not create a second >>> instance. The worker's instance will complete all pending updates. So in >>> some way, skipping workers already happens. >> >> So I think that the most often fbcon update from atomic context is the >> blinking cursor. If you disable that one you should be back to the old >> performance level I think, since just writing to dmesg is from process >> context, so shouldn't change. > > Hmm, then for the old driver, it should also do the most update in > non-atomic context? > > One other thing is, I profiled that updating a 3MB shadow buffer needs > 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with > the cache setting of DRM shadow buffer? say the orginal code use a > cachable buffer? > > >> >> https://unix.stackexchange.com/questions/3759/how-to-stop-cursor-from-blinking >> >> Bunch of tricks, but tbh I haven't tested them. > > Thomas has suggested to disable curson by > echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink > > We tried that way, and no change for the performance data. There are several ways of disabling the cursor. On my test system, I entered tput civis before the test and got better performance. Did you try this as well? Best regards Thomas > > Thanks, > Feng > >> >> In any case, I still strongly advice you don't print anything to dmesg >> or fbcon while benchmarking, because dmesg/printf are anything but >> fast, especially if a gpu driver is involved. There's some efforts to >> make the dmesg/printk side less painful (untangling the console_lock >> from printk), but fundamentally printing to the gpu from the kernel >> through dmesg/fbcon won't be cheap. It's just not something we >> optimize beyond "make sure it works for emergencies". >> -Daniel >> >>> >>> Best regards >>> Thomas >>> Thanks, Feng > > Best regards > Thomas ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel >>> >>> -- >>> Thomas Zimmermann >>> Graphics Driver Developer >>> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany >>> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah >>> HRB 21284 (AG Nürnberg) >>> >>> ___ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >> >> >> -- >> Daniel Vetter >> Software Engineer, Intel Corporation >> +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- Thomas Zimmermann Graphics Driver Developer SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) signature.asc Description: OpenPGP digital signature
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Daniel, On Wed, Sep 04, 2019 at 10:11:11AM +0200, Daniel Vetter wrote: > On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann wrote: > > > > Hi > > > > Am 04.09.19 um 08:27 schrieb Feng Tang: > > >> Thank you for testing. But don't get too excited, because the patch > > >> simulates a bug that was present in the original mgag200 code. A > > >> significant number of frames are simply skipped. That is apparently the > > >> reason why it's faster. > > > > > > Thanks for the detailed info, so the original code skips time-consuming > > > work inside atomic context on purpose. Is there any space to optmise it? > > > If 2 scheduled update worker are handled at almost same time, can one be > > > skipped? > > > > To my knowledge, there's only one instance of the worker. Re-scheduling > > the worker before a previous instance started, will not create a second > > instance. The worker's instance will complete all pending updates. So in > > some way, skipping workers already happens. > > So I think that the most often fbcon update from atomic context is the > blinking cursor. If you disable that one you should be back to the old > performance level I think, since just writing to dmesg is from process > context, so shouldn't change. Hmm, then for the old driver, it should also do the most update in non-atomic context? One other thing is, I profiled that updating a 3MB shadow buffer needs 20 ms, which transfer to 150 MB/s bandwidth. Could it be related with the cache setting of DRM shadow buffer? say the orginal code use a cachable buffer? > > https://unix.stackexchange.com/questions/3759/how-to-stop-cursor-from-blinking > > Bunch of tricks, but tbh I haven't tested them. Thomas has suggested to disable curson by echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink We tried that way, and no change for the performance data. Thanks, Feng > > In any case, I still strongly advice you don't print anything to dmesg > or fbcon while benchmarking, because dmesg/printf are anything but > fast, especially if a gpu driver is involved. There's some efforts to > make the dmesg/printk side less painful (untangling the console_lock > from printk), but fundamentally printing to the gpu from the kernel > through dmesg/fbcon won't be cheap. It's just not something we > optimize beyond "make sure it works for emergencies". > -Daniel > > > > > Best regards > > Thomas > > > > > > > > Thanks, > > > Feng > > > > > >> > > >> Best regards > > >> Thomas > > > ___ > > > dri-devel mailing list > > > dri-devel@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > > > -- > > Thomas Zimmermann > > Graphics Driver Developer > > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah > > HRB 21284 (AG Nürnberg) > > > > ___ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
On Wed, Sep 4, 2019 at 8:53 AM Thomas Zimmermann wrote: > > Hi > > Am 04.09.19 um 08:27 schrieb Feng Tang: > >> Thank you for testing. But don't get too excited, because the patch > >> simulates a bug that was present in the original mgag200 code. A > >> significant number of frames are simply skipped. That is apparently the > >> reason why it's faster. > > > > Thanks for the detailed info, so the original code skips time-consuming > > work inside atomic context on purpose. Is there any space to optmise it? > > If 2 scheduled update worker are handled at almost same time, can one be > > skipped? > > To my knowledge, there's only one instance of the worker. Re-scheduling > the worker before a previous instance started, will not create a second > instance. The worker's instance will complete all pending updates. So in > some way, skipping workers already happens. So I think that the most often fbcon update from atomic context is the blinking cursor. If you disable that one you should be back to the old performance level I think, since just writing to dmesg is from process context, so shouldn't change. https://unix.stackexchange.com/questions/3759/how-to-stop-cursor-from-blinking Bunch of tricks, but tbh I haven't tested them. In any case, I still strongly advice you don't print anything to dmesg or fbcon while benchmarking, because dmesg/printf are anything but fast, especially if a gpu driver is involved. There's some efforts to make the dmesg/printk side less painful (untangling the console_lock from printk), but fundamentally printing to the gpu from the kernel through dmesg/fbcon won't be cheap. It's just not something we optimize beyond "make sure it works for emergencies". -Daniel > > Best regards > Thomas > > > > > Thanks, > > Feng > > > >> > >> Best regards > >> Thomas > > ___ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah > HRB 21284 (AG Nürnberg) > > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 04.09.19 um 08:27 schrieb Feng Tang: >> Thank you for testing. But don't get too excited, because the patch >> simulates a bug that was present in the original mgag200 code. A >> significant number of frames are simply skipped. That is apparently the >> reason why it's faster. > > Thanks for the detailed info, so the original code skips time-consuming > work inside atomic context on purpose. Is there any space to optmise it? > If 2 scheduled update worker are handled at almost same time, can one be > skipped? To my knowledge, there's only one instance of the worker. Re-scheduling the worker before a previous instance started, will not create a second instance. The worker's instance will complete all pending updates. So in some way, skipping workers already happens. Best regards Thomas > > Thanks, > Feng > >> >> Best regards >> Thomas > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel > -- Thomas Zimmermann Graphics Driver Developer SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) signature.asc Description: OpenPGP digital signature
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Thomas, On Wed, Aug 28, 2019 at 12:51:40PM +0200, Thomas Zimmermann wrote: > Hi > > Am 28.08.19 um 11:37 schrieb Rong Chen: > > Hi Thomas, > > > > On 8/28/19 1:16 AM, Thomas Zimmermann wrote: > >> Hi > >> > >> Am 27.08.19 um 14:33 schrieb Chen, Rong A: > >>> Both patches have little impact on the performance from our side. > >> Thanks for testing. Too bad they doesn't solve the issue. > >> > >> There's another patch attached. Could you please tests this as well? > >> Thanks a lot! > >> > >> The patch comes from Daniel Vetter after discussing the problem on IRC. > >> The idea of the patch is that the old mgag200 code might display much > >> less frames that the generic code, because mgag200 only prints from > >> non-atomic context. If we simulate this with the generic code, we should > >> see roughly the original performance. > >> > >> > > > > It's cool, the patch "usecansleep.patch" can fix the issue. > > Thank you for testing. But don't get too excited, because the patch > simulates a bug that was present in the original mgag200 code. A > significant number of frames are simply skipped. That is apparently the > reason why it's faster. Thanks for the detailed info, so the original code skips time-consuming work inside atomic context on purpose. Is there any space to optmise it? If 2 scheduled update worker are handled at almost same time, can one be skipped? Thanks, Feng > > Best regards > Thomas ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 28.08.19 um 11:37 schrieb Rong Chen: > Hi Thomas, > > On 8/28/19 1:16 AM, Thomas Zimmermann wrote: >> Hi >> >> Am 27.08.19 um 14:33 schrieb Chen, Rong A: >>> Both patches have little impact on the performance from our side. >> Thanks for testing. Too bad they doesn't solve the issue. >> >> There's another patch attached. Could you please tests this as well? >> Thanks a lot! >> >> The patch comes from Daniel Vetter after discussing the problem on IRC. >> The idea of the patch is that the old mgag200 code might display much >> less frames that the generic code, because mgag200 only prints from >> non-atomic context. If we simulate this with the generic code, we should >> see roughly the original performance. >> >> > > It's cool, the patch "usecansleep.patch" can fix the issue. Thank you for testing. But don't get too excited, because the patch simulates a bug that was present in the original mgag200 code. A significant number of frames are simply skipped. That is apparently the reason why it's faster. Best regards Thomas > commit: > f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console > 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic > framebuffer emulation > b976b04c2bc only schedule worker from non-atomic context > > f1f8555dfb9a70a2 90f479ae51afa45efab97afdde b976b04c2bcf33148d6c7bc1a2 > testcase/testparams/testbox > -- -- > --- > %stddev change %stddev change %stddev > \ | \ | \ > 42912 -15% 36517 44093 > vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01 > 42912 -15% 36517 44093 GEO-MEAN > vm-scalability.median > > Best Regards, > Rong Chen > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Thomas Zimmermann Graphics Driver Developer SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) signature.asc Description: OpenPGP digital signature ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Thomas, On 8/28/19 1:16 AM, Thomas Zimmermann wrote: Hi Am 27.08.19 um 14:33 schrieb Chen, Rong A: Both patches have little impact on the performance from our side. Thanks for testing. Too bad they doesn't solve the issue. There's another patch attached. Could you please tests this as well? Thanks a lot! The patch comes from Daniel Vetter after discussing the problem on IRC. The idea of the patch is that the old mgag200 code might display much less frames that the generic code, because mgag200 only prints from non-atomic context. If we simulate this with the generic code, we should see roughly the original performance. It's cool, the patch "usecansleep.patch" can fix the issue. commit: f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation b976b04c2bc only schedule worker from non-atomic context f1f8555dfb9a70a2 90f479ae51afa45efab97afdde b976b04c2bcf33148d6c7bc1a2 testcase/testparams/testbox -- -- --- %stddev change %stddev change %stddev \ | \ | \ 42912 -15% 36517 44093 vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01 42912 -15% 36517 44093 GEO-MEAN vm-scalability.median Best Regards, Rong Chen
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 27.08.19 um 14:33 schrieb Chen, Rong A: > > Both patches have little impact on the performance from our side. Thanks for testing. Too bad they doesn't solve the issue. There's another patch attached. Could you please tests this as well? Thanks a lot! The patch comes from Daniel Vetter after discussing the problem on IRC. The idea of the patch is that the old mgag200 code might display much less frames that the generic code, because mgag200 only prints from non-atomic context. If we simulate this with the generic code, we should see roughly the original performance. Best regards Thomas > > prefetch.patch: > commit: > f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console > 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic > framebuffer emulation > 77459f56994 prefetch shadow buffer two lines ahead of blit offset > > f1f8555dfb9a70a2 90f479ae51afa45efab97afdde 77459f56994ab87ee5459920b3 > testcase/testparams/testbox > -- -- > --- > %stddev change %stddev change %stddev > \ | \ | \ > 42912 -15% 36517 -17% 35515 > vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01 > 42912 -15% 36517 -17% 35515 > GEO-MEAN vm-scalability.median > > schedule.patch: > commit: > f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console > 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic > framebuffer emulation > ccc5f095c61 schedule dirty worker on local core > > f1f8555dfb9a70a2 90f479ae51afa45efab97afdde ccc5f095c61ff6eded0f0ab1b7 > testcase/testparams/testbox > -- -- > --- > %stddev change %stddev change %stddev > \ | \ | \ > 42912 -15% 36517 -15% 36556 ± 4% > vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01 > 42912 -15% 36517 -15% 36556 > GEO-MEAN vm-scalability.median > > Best Regards, > Rong Chen > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Thomas Zimmermann Graphics Driver Developer SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) From e6e72031e85e1ad4cbd38fb47f899bab54bf6bdc Mon Sep 17 00:00:00 2001 From: Thomas Zimmermann Date: Tue, 27 Aug 2019 19:00:41 +0200 Subject: only schedule worker from non-atomic context --- drivers/gpu/drm/drm_fb_helper.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index a7ba5b4902d6..3a3e4784eb28 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -642,7 +642,8 @@ static void drm_fb_helper_dirty(struct fb_info *info, u32 x, u32 y, clip->y2 = max_t(u32, clip->y2, y + height); spin_unlock_irqrestore(>dirty_lock, flags); - schedule_work(>dirty_work); + if (drm_can_sleep()) + schedule_work(>dirty_work); } /** -- 2.22.0 signature.asc Description: OpenPGP digital signature
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Thomas, On 8/26/2019 6:50 PM, Thomas Zimmermann wrote: Hi Feng Am 24.08.19 um 07:16 schrieb Feng Tang: Hi Thomas, On Thu, Aug 22, 2019 at 07:25:11PM +0200, Thomas Zimmermann wrote: Hi I was traveling and could reply earlier. Sorry for taking so long. No problem! I guessed so :) Am 13.08.19 um 11:36 schrieb Feng Tang: Hi Thomas, On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote: Hi Thomas, On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote: Hi, Actually we run the benchmark as a background process, do we need to disable the cursor and test again? There's a worker thread that updates the display from the shadow buffer. The blinking cursor periodically triggers the worker thread, but the actual update is just the size of one character. The point of the test without output is to see if the regression comes from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or from the worker thread. If the regression goes away after disabling the blinking cursor, then the worker thread is the problem. If it already goes away if there's simply no output from the test, the screen update is the problem. On my machine I have to disable the blinking cursor, so I think the worker causes the performance drop. We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is gone. commit: f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation f1f8555dfb9a70a2 90f479ae51afa45efab97afdde testcase/testparams/testbox -- --- %stddev change %stddev \ | \ 43785 44481 vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01 43785 44481GEO-MEAN vm-scalability.median Till now, from Rong's tests: 1. Disabling cursor blinking doesn't cure the regression. 2. Disabling printint test results to console can workaround the regression. Also if we set the perfer_shadown to 0, the regression is also gone. We also did some further break down for the time consumed by the new code. The drm_fb_helper_dirty_work() calls sequentially 1. drm_client_buffer_vmap(290 us) 2. drm_fb_helper_dirty_blit_real (19240 us) 3. helper->fb->funcs->dirty()---> NULL for mgag200 driver 4. drm_client_buffer_vunmap (215 us) It's somewhat different to what I observed, but maybe I just couldn't reproduce the problem correctly. The average run time is listed after the function names. From it, we can see drm_fb_helper_dirty_blit_real() takes too long time (about 20ms for each run). I guess this is the root cause of this regression, as the original code doesn't use this dirty worker. True, the original code uses a temporary buffer, but updates the display immediately. My guess is that this could be a caching problem. The worker runs on a different CPU, which doesn't have the shadow buffer in cache. Yes, that's my thought too. I profiled the working set size, for most of the drm_fb_helper_dirty_blit_real(), it will update a buffer 4096x768(3 MB), and as it is called 30~40 times per second, it surely will affect the cache. As said in last email, setting the prefer_shadow to 0 can avoid the regrssion. Could it be an option? Unfortunately not. Without the shadow buffer, the console's display buffer permanently resides in video memory. It consumes significant amount of that memory (say 8 MiB out of 16 MiB). That doesn't leave enough room for anything else. The best option is to not print to the console. Do we have other options here? I attached two patches. Both show an improvement in my setup at least. Could you please test them independently from each other and report back? prefetch.patch prefetches the shadow buffer two scanlines ahead during the blit function. The idea is to have the scanlines in cache when they are supposed to go to hardware. schedule.patch schedules the dirty worker on the current CPU core (i.e., the one that did the drawing to the shadow buffer). Hopefully the shadow buffer remains in cache meanwhile. Best regards Thomas Both patches have little impact on the performance from our side. prefetch.patch: commit: f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation 77459f56994 prefetch shadow buffer two lines ahead of blit offset f1f8555dfb9a70a2 90f479ae51afa45efab97afdde 77459f56994ab87ee5459920b3 testcase/testparams/testbox -- -- --- %stddev change %stddev change %stddev \ | \ | \ 42912 -15% 36517 -17% 35515 vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01 42912 -15% 36517 -17% 35515 GEO-MEAN
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Feng Am 24.08.19 um 07:16 schrieb Feng Tang: > Hi Thomas, > > On Thu, Aug 22, 2019 at 07:25:11PM +0200, Thomas Zimmermann wrote: >> Hi >> >> I was traveling and could reply earlier. Sorry for taking so long. > > No problem! I guessed so :) > >> >> Am 13.08.19 um 11:36 schrieb Feng Tang: >>> Hi Thomas, >>> >>> On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote: Hi Thomas, On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote: > Hi, > >>> Actually we run the benchmark as a background process, do >>> we need to disable the cursor and test again? >> There's a worker thread that updates the display from the >> shadow buffer. The blinking cursor periodically triggers >> the worker thread, but the actual update is just the size >> of one character. >> >> The point of the test without output is to see if the >> regression comes from the buffer update (i.e., the memcpy >> from shadow buffer to VRAM), or from the worker thread. If >> the regression goes away after disabling the blinking >> cursor, then the worker thread is the problem. If it >> already goes away if there's simply no output from the >> test, the screen update is the problem. On my machine I >> have to disable the blinking cursor, so I think the worker >> causes the performance drop. > > We disabled redirecting stdout/stderr to /dev/kmsg, and the > regression is gone. > > commit: f1f8555dfb9 drm/bochs: Use shadow buffer for bochs > framebuffer console 90f479ae51a drm/mgag200: Replace struct > mga_fbdev with generic framebuffer emulation > > f1f8555dfb9a70a2 90f479ae51afa45efab97afdde > testcase/testparams/testbox > -- --- > %stddev change %stddev \ | \ 43785 > 44481 vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01 > 43785 44481GEO-MEAN vm-scalability.median Till now, from Rong's tests: 1. Disabling cursor blinking doesn't cure the regression. 2. Disabling printint test results to console can workaround the regression. Also if we set the perfer_shadown to 0, the regression is also gone. >>> >>> We also did some further break down for the time consumed by the >>> new code. >>> >>> The drm_fb_helper_dirty_work() calls sequentially 1. >>> drm_client_buffer_vmap(290 us) 2. >>> drm_fb_helper_dirty_blit_real (19240 us) 3. >>> helper->fb->funcs->dirty()---> NULL for mgag200 driver 4. >>> drm_client_buffer_vunmap (215 us) >>> >> >> It's somewhat different to what I observed, but maybe I just >> couldn't reproduce the problem correctly. >> >>> The average run time is listed after the function names. >>> >>> From it, we can see drm_fb_helper_dirty_blit_real() takes too >>> long time (about 20ms for each run). I guess this is the root >>> cause of this regression, as the original code doesn't use this >>> dirty worker. >> >> True, the original code uses a temporary buffer, but updates the >> display immediately. >> >> My guess is that this could be a caching problem. The worker runs >> on a different CPU, which doesn't have the shadow buffer in cache. > > Yes, that's my thought too. I profiled the working set size, for most > of the drm_fb_helper_dirty_blit_real(), it will update a buffer > 4096x768(3 MB), and as it is called 30~40 times per second, it surely > will affect the cache. > > >>> As said in last email, setting the prefer_shadow to 0 can avoid >>> the regrssion. Could it be an option? >> >> Unfortunately not. Without the shadow buffer, the console's >> display buffer permanently resides in video memory. It consumes >> significant amount of that memory (say 8 MiB out of 16 MiB). That >> doesn't leave enough room for anything else. >> >> The best option is to not print to the console. > > Do we have other options here? I attached two patches. Both show an improvement in my setup at least. Could you please test them independently from each other and report back? prefetch.patch prefetches the shadow buffer two scanlines ahead during the blit function. The idea is to have the scanlines in cache when they are supposed to go to hardware. schedule.patch schedules the dirty worker on the current CPU core (i.e., the one that did the drawing to the shadow buffer). Hopefully the shadow buffer remains in cache meanwhile. Best regards Thomas > My thought is this is clearly a regression, that the old driver > works fine, while the new version in linux-next doesn't. Also for a > frame buffer console, writting dozens line of message to it is not a > rare user case. We have many test platforms > (servers/desktops/laptops) with different kinds of GFX hardwares, and > this model works fine for many years :) > > Thanks, Feng > > > >> Best regards Thomas >> >>> Thanks, Feng >>> ---
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Thomas, On Thu, Aug 22, 2019 at 07:25:11PM +0200, Thomas Zimmermann wrote: > Hi > > I was traveling and could reply earlier. Sorry for taking so long. No problem! I guessed so :) > > Am 13.08.19 um 11:36 schrieb Feng Tang: > > Hi Thomas, > > > > On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote: > >> Hi Thomas, > >> > >> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote: > >>> Hi, > >>> > > Actually we run the benchmark as a background process, do we need to > > disable the cursor and test again? > There's a worker thread that updates the display from the shadow buffer. > The blinking cursor periodically triggers the worker thread, but the > actual update is just the size of one character. > > The point of the test without output is to see if the regression comes > >>> >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or > >>> >from the worker thread. If the regression goes away after disabling the > blinking cursor, then the worker thread is the problem. If it already > goes away if there's simply no output from the test, the screen update > is the problem. On my machine I have to disable the blinking cursor, so > I think the worker causes the performance drop. > >>> > >>> We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is > >>> gone. > >>> > >>> commit: > >>> f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console > >>> 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic > >>> framebuffer > >>> emulation > >>> > >>> f1f8555dfb9a70a2 90f479ae51afa45efab97afdde testcase/testparams/testbox > >>> -- --- > >>> %stddev change %stddev > >>> \ | \ > >>> 43785 44481 > >>> vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01 > >>> 43785 44481 GEO-MEAN > >>> vm-scalability.median > >> > >> Till now, from Rong's tests: > >> 1. Disabling cursor blinking doesn't cure the regression. > >> 2. Disabling printint test results to console can workaround the > >> regression. > >> > >> Also if we set the perfer_shadown to 0, the regression is also > >> gone. > > > > We also did some further break down for the time consumed by the > > new code. > > > > The drm_fb_helper_dirty_work() calls sequentially > > 1. drm_client_buffer_vmap (290 us) > > 2. drm_fb_helper_dirty_blit_real (19240 us) > > 3. helper->fb->funcs->dirty()---> NULL for mgag200 driver > > 4. drm_client_buffer_vunmap (215 us) > > > > It's somewhat different to what I observed, but maybe I just couldn't > reproduce the problem correctly. > > > The average run time is listed after the function names. > > > > From it, we can see drm_fb_helper_dirty_blit_real() takes too long > > time (about 20ms for each run). I guess this is the root cause > > of this regression, as the original code doesn't use this dirty worker. > > True, the original code uses a temporary buffer, but updates the display > immediately. > > My guess is that this could be a caching problem. The worker runs on a > different CPU, which doesn't have the shadow buffer in cache. Yes, that's my thought too. I profiled the working set size, for most of the drm_fb_helper_dirty_blit_real(), it will update a buffer 4096x768(3 MB), and as it is called 30~40 times per second, it surely will affect the cache. > > As said in last email, setting the prefer_shadow to 0 can avoid > > the regrssion. Could it be an option? > > Unfortunately not. Without the shadow buffer, the console's display > buffer permanently resides in video memory. It consumes significant > amount of that memory (say 8 MiB out of 16 MiB). That doesn't leave > enough room for anything else. > > The best option is to not print to the console. Do we have other options here? My thought is this is clearly a regression, that the old driver works fine, while the new version in linux-next doesn't. Also for a frame buffer console, writting dozens line of message to it is not a rare user case. We have many test platforms (servers/desktops/laptops) with different kinds of GFX hardwares, and this model works fine for many years :) Thanks, Feng > Best regards > Thomas > > > Thanks, > > Feng > > > >> > >> --- a/drivers/gpu/drm/mgag200/mgag200_main.c > >> +++ b/drivers/gpu/drm/mgag200/mgag200_main.c > >> @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, > >> unsigned long flags) > >>dev->mode_config.preferred_depth = 16; > >>else > >>dev->mode_config.preferred_depth = 32; > >> - dev->mode_config.prefer_shadow = 1; > >> + dev->mode_config.prefer_shadow = 0; > >> > >> And from the perf data, one obvious difference is good case don't > >> call drm_fb_helper_dirty_work(), while bad case calls. > >> > >> Thanks, > >> Feng > >> > >>> Best
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 22.08.19 um 22:02 schrieb Dave Airlie: > On Fri, 23 Aug 2019 at 03:25, Thomas Zimmermann wrote: >> >> Hi >> >> I was traveling and could reply earlier. Sorry for taking so long. >> >> Am 13.08.19 um 11:36 schrieb Feng Tang: >>> Hi Thomas, >>> >>> On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote: Hi Thomas, On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote: > Hi, > >>> Actually we run the benchmark as a background process, do we need to >>> disable the cursor and test again? >> There's a worker thread that updates the display from the shadow buffer. >> The blinking cursor periodically triggers the worker thread, but the >> actual update is just the size of one character. >> >> The point of the test without output is to see if the regression comes > >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or > >from the worker thread. If the regression goes away after disabling the >> blinking cursor, then the worker thread is the problem. If it already >> goes away if there's simply no output from the test, the screen update >> is the problem. On my machine I have to disable the blinking cursor, so >> I think the worker causes the performance drop. > > We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is > gone. > > commit: > f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console > 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic > framebuffer > emulation > > f1f8555dfb9a70a2 90f479ae51afa45efab97afdde testcase/testparams/testbox > -- --- > %stddev change %stddev > \ |\ > 43785 44481 > vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01 > 43785 44481GEO-MEAN > vm-scalability.median Till now, from Rong's tests: 1. Disabling cursor blinking doesn't cure the regression. 2. Disabling printint test results to console can workaround the regression. Also if we set the perfer_shadown to 0, the regression is also gone. >>> >>> We also did some further break down for the time consumed by the >>> new code. >>> >>> The drm_fb_helper_dirty_work() calls sequentially >>> 1. drm_client_buffer_vmap (290 us) >>> 2. drm_fb_helper_dirty_blit_real (19240 us) >>> 3. helper->fb->funcs->dirty()---> NULL for mgag200 driver >>> 4. drm_client_buffer_vunmap (215 us) >>> >> >> It's somewhat different to what I observed, but maybe I just couldn't >> reproduce the problem correctly. >> >>> The average run time is listed after the function names. >>> >>> From it, we can see drm_fb_helper_dirty_blit_real() takes too long >>> time (about 20ms for each run). I guess this is the root cause >>> of this regression, as the original code doesn't use this dirty worker. >> >> True, the original code uses a temporary buffer, but updates the display >> immediately. >> >> My guess is that this could be a caching problem. The worker runs on a >> different CPU, which doesn't have the shadow buffer in cache. >> >>> As said in last email, setting the prefer_shadow to 0 can avoid >>> the regrssion. Could it be an option? >> >> Unfortunately not. Without the shadow buffer, the console's display >> buffer permanently resides in video memory. It consumes significant >> amount of that memory (say 8 MiB out of 16 MiB). That doesn't leave >> enough room for anything else. >> >> The best option is to not print to the console. > > Wait a second, I thought the driver did an eviction on modeset of the > scanned out object, this was a deliberate design decision made when > writing those drivers, has this been removed in favour of gem and > generic code paths? Yes. We added back this feature for testing in [1]. It was only an improvement of ~1% compared to the original report. I wouldn't mind landing this patch set, but it probably doesn't make a difference either. Best regards Thomas [1] https://lists.freedesktop.org/archives/dri-devel/2019-August/228950.html > > Dave. > -- Thomas Zimmermann Graphics Driver Developer SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) signature.asc Description: OpenPGP digital signature ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
On Fri, 23 Aug 2019 at 03:25, Thomas Zimmermann wrote: > > Hi > > I was traveling and could reply earlier. Sorry for taking so long. > > Am 13.08.19 um 11:36 schrieb Feng Tang: > > Hi Thomas, > > > > On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote: > >> Hi Thomas, > >> > >> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote: > >>> Hi, > >>> > > Actually we run the benchmark as a background process, do we need to > > disable the cursor and test again? > There's a worker thread that updates the display from the shadow buffer. > The blinking cursor periodically triggers the worker thread, but the > actual update is just the size of one character. > > The point of the test without output is to see if the regression comes > >>> >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or > >>> >from the worker thread. If the regression goes away after disabling the > blinking cursor, then the worker thread is the problem. If it already > goes away if there's simply no output from the test, the screen update > is the problem. On my machine I have to disable the blinking cursor, so > I think the worker causes the performance drop. > >>> > >>> We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is > >>> gone. > >>> > >>> commit: > >>> f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console > >>> 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic > >>> framebuffer > >>> emulation > >>> > >>> f1f8555dfb9a70a2 90f479ae51afa45efab97afdde testcase/testparams/testbox > >>> -- --- > >>> %stddev change %stddev > >>> \ |\ > >>> 43785 44481 > >>> vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01 > >>> 43785 44481GEO-MEAN > >>> vm-scalability.median > >> > >> Till now, from Rong's tests: > >> 1. Disabling cursor blinking doesn't cure the regression. > >> 2. Disabling printint test results to console can workaround the > >> regression. > >> > >> Also if we set the perfer_shadown to 0, the regression is also > >> gone. > > > > We also did some further break down for the time consumed by the > > new code. > > > > The drm_fb_helper_dirty_work() calls sequentially > > 1. drm_client_buffer_vmap (290 us) > > 2. drm_fb_helper_dirty_blit_real (19240 us) > > 3. helper->fb->funcs->dirty()---> NULL for mgag200 driver > > 4. drm_client_buffer_vunmap (215 us) > > > > It's somewhat different to what I observed, but maybe I just couldn't > reproduce the problem correctly. > > > The average run time is listed after the function names. > > > > From it, we can see drm_fb_helper_dirty_blit_real() takes too long > > time (about 20ms for each run). I guess this is the root cause > > of this regression, as the original code doesn't use this dirty worker. > > True, the original code uses a temporary buffer, but updates the display > immediately. > > My guess is that this could be a caching problem. The worker runs on a > different CPU, which doesn't have the shadow buffer in cache. > > > As said in last email, setting the prefer_shadow to 0 can avoid > > the regrssion. Could it be an option? > > Unfortunately not. Without the shadow buffer, the console's display > buffer permanently resides in video memory. It consumes significant > amount of that memory (say 8 MiB out of 16 MiB). That doesn't leave > enough room for anything else. > > The best option is to not print to the console. Wait a second, I thought the driver did an eviction on modeset of the scanned out object, this was a deliberate design decision made when writing those drivers, has this been removed in favour of gem and generic code paths? Dave. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi I was traveling and could reply earlier. Sorry for taking so long. Am 13.08.19 um 11:36 schrieb Feng Tang: > Hi Thomas, > > On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote: >> Hi Thomas, >> >> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote: >>> Hi, >>> > Actually we run the benchmark as a background process, do we need to > disable the cursor and test again? There's a worker thread that updates the display from the shadow buffer. The blinking cursor periodically triggers the worker thread, but the actual update is just the size of one character. The point of the test without output is to see if the regression comes >>> >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or >>> >from the worker thread. If the regression goes away after disabling the blinking cursor, then the worker thread is the problem. If it already goes away if there's simply no output from the test, the screen update is the problem. On my machine I have to disable the blinking cursor, so I think the worker causes the performance drop. >>> >>> We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is >>> gone. >>> >>> commit: >>> f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console >>> 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer >>> emulation >>> >>> f1f8555dfb9a70a2 90f479ae51afa45efab97afdde testcase/testparams/testbox >>> -- --- >>> %stddev change %stddev >>> \ | \ >>> 43785 44481 >>> vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01 >>> 43785 44481 GEO-MEAN vm-scalability.median >> >> Till now, from Rong's tests: >> 1. Disabling cursor blinking doesn't cure the regression. >> 2. Disabling printint test results to console can workaround the >> regression. >> >> Also if we set the perfer_shadown to 0, the regression is also >> gone. > > We also did some further break down for the time consumed by the > new code. > > The drm_fb_helper_dirty_work() calls sequentially > 1. drm_client_buffer_vmap (290 us) > 2. drm_fb_helper_dirty_blit_real (19240 us) > 3. helper->fb->funcs->dirty()---> NULL for mgag200 driver > 4. drm_client_buffer_vunmap (215 us) > It's somewhat different to what I observed, but maybe I just couldn't reproduce the problem correctly. > The average run time is listed after the function names. > > From it, we can see drm_fb_helper_dirty_blit_real() takes too long > time (about 20ms for each run). I guess this is the root cause > of this regression, as the original code doesn't use this dirty worker. True, the original code uses a temporary buffer, but updates the display immediately. My guess is that this could be a caching problem. The worker runs on a different CPU, which doesn't have the shadow buffer in cache. > As said in last email, setting the prefer_shadow to 0 can avoid > the regrssion. Could it be an option? Unfortunately not. Without the shadow buffer, the console's display buffer permanently resides in video memory. It consumes significant amount of that memory (say 8 MiB out of 16 MiB). That doesn't leave enough room for anything else. The best option is to not print to the console. Best regards Thomas > Thanks, > Feng > >> >> --- a/drivers/gpu/drm/mgag200/mgag200_main.c >> +++ b/drivers/gpu/drm/mgag200/mgag200_main.c >> @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned >> long flags) >> dev->mode_config.preferred_depth = 16; >> else >> dev->mode_config.preferred_depth = 32; >> -dev->mode_config.prefer_shadow = 1; >> +dev->mode_config.prefer_shadow = 0; >> >> And from the perf data, one obvious difference is good case don't >> call drm_fb_helper_dirty_work(), while bad case calls. >> >> Thanks, >> Feng >> >>> Best Regards, >>> Rong Chen > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel > -- Thomas Zimmermann Graphics Driver Developer SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg) signature.asc Description: OpenPGP digital signature ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Thomas, On Tue, Aug 13, 2019 at 05:36:16PM +0800, Feng Tang wrote: > Hi Thomas, > > On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote: > > Hi Thomas, > > > > On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote: > > > Hi, > > > > > > >>Actually we run the benchmark as a background process, do we need to > > > >>disable the cursor and test again? > > > >There's a worker thread that updates the display from the shadow buffer. > > > >The blinking cursor periodically triggers the worker thread, but the > > > >actual update is just the size of one character. > > > > > > > >The point of the test without output is to see if the regression comes > > > >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or > > > >from the worker thread. If the regression goes away after disabling the > > > >blinking cursor, then the worker thread is the problem. If it already > > > >goes away if there's simply no output from the test, the screen update > > > >is the problem. On my machine I have to disable the blinking cursor, so > > > >I think the worker causes the performance drop. > > > > > > We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is > > > gone. > > > > > > commit: > > > f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console > > > 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic > > > framebuffer > > > emulation > > > > > > f1f8555dfb9a70a2 90f479ae51afa45efab97afdde testcase/testparams/testbox > > > -- --- > > > %stddev change %stddev > > > \ | \ > > > 43785 44481 > > > vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01 > > > 43785 44481 GEO-MEAN > > > vm-scalability.median > > > > Till now, from Rong's tests: > > 1. Disabling cursor blinking doesn't cure the regression. > > 2. Disabling printint test results to console can workaround the > > regression. > > > > Also if we set the perfer_shadown to 0, the regression is also > > gone. > > We also did some further break down for the time consumed by the > new code. > > The drm_fb_helper_dirty_work() calls sequentially > 1. drm_client_buffer_vmap (290 us) > 2. drm_fb_helper_dirty_blit_real (19240 us) > 3. helper->fb->funcs->dirty()---> NULL for mgag200 driver > 4. drm_client_buffer_vunmap (215 us) > > The average run time is listed after the function names. > > From it, we can see drm_fb_helper_dirty_blit_real() takes too long > time (about 20ms for each run). I guess this is the root cause > of this regression, as the original code doesn't use this dirty worker. > > As said in last email, setting the prefer_shadow to 0 can avoid > the regrssion. Could it be an option? Any comments on this? thanks - Feng > > Thanks, > Feng > > > > > --- a/drivers/gpu/drm/mgag200/mgag200_main.c > > +++ b/drivers/gpu/drm/mgag200/mgag200_main.c > > @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, > > unsigned long flags) > > dev->mode_config.preferred_depth = 16; > > else > > dev->mode_config.preferred_depth = 32; > > - dev->mode_config.prefer_shadow = 1; > > + dev->mode_config.prefer_shadow = 0; > > > > And from the perf data, one obvious difference is good case don't > > call drm_fb_helper_dirty_work(), while bad case calls. > > > > Thanks, > > Feng > > > > > Best Regards, > > > Rong Chen > ___ > LKP mailing list > l...@lists.01.org > https://lists.01.org/mailman/listinfo/lkp ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Thomas, On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote: > Hi Thomas, > > On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote: > > Hi, > > > > >>Actually we run the benchmark as a background process, do we need to > > >>disable the cursor and test again? > > >There's a worker thread that updates the display from the shadow buffer. > > >The blinking cursor periodically triggers the worker thread, but the > > >actual update is just the size of one character. > > > > > >The point of the test without output is to see if the regression comes > > >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or > > >from the worker thread. If the regression goes away after disabling the > > >blinking cursor, then the worker thread is the problem. If it already > > >goes away if there's simply no output from the test, the screen update > > >is the problem. On my machine I have to disable the blinking cursor, so > > >I think the worker causes the performance drop. > > > > We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is > > gone. > > > > commit: > > f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console > > 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer > > emulation > > > > f1f8555dfb9a70a2 90f479ae51afa45efab97afdde testcase/testparams/testbox > > -- --- > > %stddev change %stddev > > \ | \ > > 43785 44481 > > vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01 > > 43785 44481 GEO-MEAN vm-scalability.median > > Till now, from Rong's tests: > 1. Disabling cursor blinking doesn't cure the regression. > 2. Disabling printint test results to console can workaround the > regression. > > Also if we set the perfer_shadown to 0, the regression is also > gone. We also did some further break down for the time consumed by the new code. The drm_fb_helper_dirty_work() calls sequentially 1. drm_client_buffer_vmap (290 us) 2. drm_fb_helper_dirty_blit_real (19240 us) 3. helper->fb->funcs->dirty()---> NULL for mgag200 driver 4. drm_client_buffer_vunmap (215 us) The average run time is listed after the function names. From it, we can see drm_fb_helper_dirty_blit_real() takes too long time (about 20ms for each run). I guess this is the root cause of this regression, as the original code doesn't use this dirty worker. As said in last email, setting the prefer_shadow to 0 can avoid the regrssion. Could it be an option? Thanks, Feng > > --- a/drivers/gpu/drm/mgag200/mgag200_main.c > +++ b/drivers/gpu/drm/mgag200/mgag200_main.c > @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned > long flags) > dev->mode_config.preferred_depth = 16; > else > dev->mode_config.preferred_depth = 32; > - dev->mode_config.prefer_shadow = 1; > + dev->mode_config.prefer_shadow = 0; > > And from the perf data, one obvious difference is good case don't > call drm_fb_helper_dirty_work(), while bad case calls. > > Thanks, > Feng > > > Best Regards, > > Rong Chen ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Thomas, On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote: > Hi, > > >>Actually we run the benchmark as a background process, do we need to > >>disable the cursor and test again? > >There's a worker thread that updates the display from the shadow buffer. > >The blinking cursor periodically triggers the worker thread, but the > >actual update is just the size of one character. > > > >The point of the test without output is to see if the regression comes > >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or > >from the worker thread. If the regression goes away after disabling the > >blinking cursor, then the worker thread is the problem. If it already > >goes away if there's simply no output from the test, the screen update > >is the problem. On my machine I have to disable the blinking cursor, so > >I think the worker causes the performance drop. > > We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is > gone. > > commit: > f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console > 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer > emulation > > f1f8555dfb9a70a2 90f479ae51afa45efab97afdde testcase/testparams/testbox > -- --- > %stddev change %stddev > \ | \ > 43785 44481 > vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01 > 43785 44481 GEO-MEAN vm-scalability.median Till now, from Rong's tests: 1. Disabling cursor blinking doesn't cure the regression. 2. Disabling printint test results to console can workaround the regression. Also if we set the perfer_shadown to 0, the regression is also gone. --- a/drivers/gpu/drm/mgag200/mgag200_main.c +++ b/drivers/gpu/drm/mgag200/mgag200_main.c @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags) dev->mode_config.preferred_depth = 16; else dev->mode_config.preferred_depth = 32; - dev->mode_config.prefer_shadow = 1; + dev->mode_config.prefer_shadow = 0; And from the perf data, one obvious difference is good case don't call drm_fb_helper_dirty_work(), while bad case calls. Thanks, Feng > Best Regards, > Rong Chen ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi, On 8/7/19 6:42 PM, Thomas Zimmermann wrote: Hi Rong Am 06.08.19 um 14:59 schrieb Chen, Rong A: Hi, On 8/5/2019 6:25 PM, Thomas Zimmermann wrote: Hi Am 05.08.19 um 09:28 schrieb Rong Chen: Hi, On 8/5/19 3:02 PM, Feng Tang wrote: Hi Thomas, On Sun, Aug 04, 2019 at 08:39:19PM +0200, Thomas Zimmermann wrote: Hi I did some further analysis on this problem and found that the blinking cursor affects performance of the vm-scalability test case. I only have a 4-core machine, so scalability is not really testable. Yet I see the effects of running vm-scalibility against drm-tip, a revert of the mgag200 patch and the vmap fixes that I posted a few days ago. After reverting the mgag200 patch, running the test as described in the report bin/lkp run job.yaml gives results like 2019-08-02 19:34:37 ./case-anon-cow-seq-hugetlb 2019-08-02 19:34:37 ./usemem --runtime 300 -n 4 --prealloc --prefault -O -U 815395225 917319627 bytes / 756534 usecs = 1184110 KB/s 917319627 bytes / 764675 usecs = 1171504 KB/s 917319627 bytes / 766414 usecs = 1168846 KB/s 917319627 bytes / 777990 usecs = 1151454 KB/s Running the test against current drm-tip gives slightly worse results, such as. 2019-08-03 19:17:06 ./case-anon-cow-seq-hugetlb 2019-08-03 19:17:06 ./usemem --runtime 300 -n 4 --prealloc --prefault -O -U 815394406 917318700 bytes / 871607 usecs = 1027778 KB/s 917318700 bytes / 894173 usecs = 1001840 KB/s 917318700 bytes / 919694 usecs = 974040 KB/s 917318700 bytes / 923341 usecs = 970193 KB/s The test puts out roughly one result per second. Strangely sending the output to /dev/null can make results significantly worse. bin/lkp run job.yaml > /dev/null 2019-08-03 19:23:04 ./case-anon-cow-seq-hugetlb 2019-08-03 19:23:04 ./usemem --runtime 300 -n 4 --prealloc --prefault -O -U 815394406 917318700 bytes / 1207358 usecs = 741966 KB/s 917318700 bytes / 1210456 usecs = 740067 KB/s 917318700 bytes / 1216572 usecs = 736346 KB/s 917318700 bytes / 1239152 usecs = 722929 KB/s I realized that there's still a blinking cursor on the screen, which I disabled with tput civis or alternatively echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink Running the the test now gives the original or even better results, such as bin/lkp run job.yaml > /dev/null 2019-08-03 19:29:17 ./case-anon-cow-seq-hugetlb 2019-08-03 19:29:17 ./usemem --runtime 300 -n 4 --prealloc --prefault -O -U 815394406 917318700 bytes / 659419 usecs = 1358497 KB/s 917318700 bytes / 659658 usecs = 1358005 KB/s 917318700 bytes / 659916 usecs = 1357474 KB/s 917318700 bytes / 660168 usecs = 1356956 KB/s Rong, Feng, could you confirm this by disabling the cursor or blinking? Glad to know this method restored the drop. Rong is running the case. I set "echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink" for both commits, and the regression has no obvious change. Ah, I see. Thank you for testing. There are two questions that come to my mind: did you send the regular output to /dev/null? And what happens if you disable the cursor with 'tput civis'? I didn't send the output to /dev/null because we need to collect data from the output, You can send it to any file, as long as it doesn't show up on the console. I also found the latest results in the file result/vm-scalability. Actually we run the benchmark as a background process, do we need to disable the cursor and test again? There's a worker thread that updates the display from the shadow buffer. The blinking cursor periodically triggers the worker thread, but the actual update is just the size of one character. The point of the test without output is to see if the regression comes from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or from the worker thread. If the regression goes away after disabling the blinking cursor, then the worker thread is the problem. If it already goes away if there's simply no output from the test, the screen update is the problem. On my machine I have to disable the blinking cursor, so I think the worker causes the performance drop. We disabled redirecting stdout/stderr to /dev/kmsg, and the regression is gone. commit: f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation f1f8555dfb9a70a2 90f479ae51afa45efab97afdde testcase/testparams/testbox -- --- %stddev change %stddev \ | \ 43785 44481 vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01 43785 44481 GEO-MEAN vm-scalability.median Best Regards, Rong Chen Best regards Thomas Best Regards, Rong Chen If there is absolutely nothing changing
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Rong Am 06.08.19 um 14:59 schrieb Chen, Rong A: > Hi, > > On 8/5/2019 6:25 PM, Thomas Zimmermann wrote: >> Hi >> >> Am 05.08.19 um 09:28 schrieb Rong Chen: >>> Hi, >>> >>> On 8/5/19 3:02 PM, Feng Tang wrote: Hi Thomas, On Sun, Aug 04, 2019 at 08:39:19PM +0200, Thomas Zimmermann wrote: > Hi > > I did some further analysis on this problem and found that the blinking > cursor affects performance of the vm-scalability test case. > > I only have a 4-core machine, so scalability is not really testable. Yet > I see the effects of running vm-scalibility against drm-tip, a revert of > the mgag200 patch and the vmap fixes that I posted a few days ago. > > After reverting the mgag200 patch, running the test as described in the > report > > bin/lkp run job.yaml > > gives results like > > 2019-08-02 19:34:37 ./case-anon-cow-seq-hugetlb > 2019-08-02 19:34:37 ./usemem --runtime 300 -n 4 --prealloc > --prefault > -O -U 815395225 > 917319627 bytes / 756534 usecs = 1184110 KB/s > 917319627 bytes / 764675 usecs = 1171504 KB/s > 917319627 bytes / 766414 usecs = 1168846 KB/s > 917319627 bytes / 777990 usecs = 1151454 KB/s > > Running the test against current drm-tip gives slightly worse results, > such as. > > 2019-08-03 19:17:06 ./case-anon-cow-seq-hugetlb > 2019-08-03 19:17:06 ./usemem --runtime 300 -n 4 --prealloc > --prefault > -O -U 815394406 > 917318700 bytes / 871607 usecs = 1027778 KB/s > 917318700 bytes / 894173 usecs = 1001840 KB/s > 917318700 bytes / 919694 usecs = 974040 KB/s > 917318700 bytes / 923341 usecs = 970193 KB/s > > The test puts out roughly one result per second. Strangely sending the > output to /dev/null can make results significantly worse. > > bin/lkp run job.yaml > /dev/null > > 2019-08-03 19:23:04 ./case-anon-cow-seq-hugetlb > 2019-08-03 19:23:04 ./usemem --runtime 300 -n 4 --prealloc > --prefault > -O -U 815394406 > 917318700 bytes / 1207358 usecs = 741966 KB/s > 917318700 bytes / 1210456 usecs = 740067 KB/s > 917318700 bytes / 1216572 usecs = 736346 KB/s > 917318700 bytes / 1239152 usecs = 722929 KB/s > > I realized that there's still a blinking cursor on the screen, which I > disabled with > > tput civis > > or alternatively > > echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink > > Running the the test now gives the original or even better results, > such as > > bin/lkp run job.yaml > /dev/null > > 2019-08-03 19:29:17 ./case-anon-cow-seq-hugetlb > 2019-08-03 19:29:17 ./usemem --runtime 300 -n 4 --prealloc > --prefault > -O -U 815394406 > 917318700 bytes / 659419 usecs = 1358497 KB/s > 917318700 bytes / 659658 usecs = 1358005 KB/s > 917318700 bytes / 659916 usecs = 1357474 KB/s > 917318700 bytes / 660168 usecs = 1356956 KB/s > > Rong, Feng, could you confirm this by disabling the cursor or blinking? Glad to know this method restored the drop. Rong is running the case. >>> I set "echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink" for >>> both commits, >>> and the regression has no obvious change. >> Ah, I see. Thank you for testing. There are two questions that come to >> my mind: did you send the regular output to /dev/null? And what happens >> if you disable the cursor with 'tput civis'? > > I didn't send the output to /dev/null because we need to collect data > from the output, You can send it to any file, as long as it doesn't show up on the console. I also found the latest results in the file result/vm-scalability. > Actually we run the benchmark as a background process, do we need to > disable the cursor and test again? There's a worker thread that updates the display from the shadow buffer. The blinking cursor periodically triggers the worker thread, but the actual update is just the size of one character. The point of the test without output is to see if the regression comes from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or from the worker thread. If the regression goes away after disabling the blinking cursor, then the worker thread is the problem. If it already goes away if there's simply no output from the test, the screen update is the problem. On my machine I have to disable the blinking cursor, so I think the worker causes the performance drop. Best regards Thomas > > Best Regards, > Rong Chen > >> If there is absolutely nothing changing on the screen, I don't see how >> the regression could persist. >> >> Best regards >> Thomas >> >> >>> commit: >>> f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console >>> 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic >>>
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi, On 8/5/2019 6:25 PM, Thomas Zimmermann wrote: Hi Am 05.08.19 um 09:28 schrieb Rong Chen: Hi, On 8/5/19 3:02 PM, Feng Tang wrote: Hi Thomas, On Sun, Aug 04, 2019 at 08:39:19PM +0200, Thomas Zimmermann wrote: Hi I did some further analysis on this problem and found that the blinking cursor affects performance of the vm-scalability test case. I only have a 4-core machine, so scalability is not really testable. Yet I see the effects of running vm-scalibility against drm-tip, a revert of the mgag200 patch and the vmap fixes that I posted a few days ago. After reverting the mgag200 patch, running the test as described in the report bin/lkp run job.yaml gives results like 2019-08-02 19:34:37 ./case-anon-cow-seq-hugetlb 2019-08-02 19:34:37 ./usemem --runtime 300 -n 4 --prealloc --prefault -O -U 815395225 917319627 bytes / 756534 usecs = 1184110 KB/s 917319627 bytes / 764675 usecs = 1171504 KB/s 917319627 bytes / 766414 usecs = 1168846 KB/s 917319627 bytes / 777990 usecs = 1151454 KB/s Running the test against current drm-tip gives slightly worse results, such as. 2019-08-03 19:17:06 ./case-anon-cow-seq-hugetlb 2019-08-03 19:17:06 ./usemem --runtime 300 -n 4 --prealloc --prefault -O -U 815394406 917318700 bytes / 871607 usecs = 1027778 KB/s 917318700 bytes / 894173 usecs = 1001840 KB/s 917318700 bytes / 919694 usecs = 974040 KB/s 917318700 bytes / 923341 usecs = 970193 KB/s The test puts out roughly one result per second. Strangely sending the output to /dev/null can make results significantly worse. bin/lkp run job.yaml > /dev/null 2019-08-03 19:23:04 ./case-anon-cow-seq-hugetlb 2019-08-03 19:23:04 ./usemem --runtime 300 -n 4 --prealloc --prefault -O -U 815394406 917318700 bytes / 1207358 usecs = 741966 KB/s 917318700 bytes / 1210456 usecs = 740067 KB/s 917318700 bytes / 1216572 usecs = 736346 KB/s 917318700 bytes / 1239152 usecs = 722929 KB/s I realized that there's still a blinking cursor on the screen, which I disabled with tput civis or alternatively echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink Running the the test now gives the original or even better results, such as bin/lkp run job.yaml > /dev/null 2019-08-03 19:29:17 ./case-anon-cow-seq-hugetlb 2019-08-03 19:29:17 ./usemem --runtime 300 -n 4 --prealloc --prefault -O -U 815394406 917318700 bytes / 659419 usecs = 1358497 KB/s 917318700 bytes / 659658 usecs = 1358005 KB/s 917318700 bytes / 659916 usecs = 1357474 KB/s 917318700 bytes / 660168 usecs = 1356956 KB/s Rong, Feng, could you confirm this by disabling the cursor or blinking? Glad to know this method restored the drop. Rong is running the case. I set "echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink" for both commits, and the regression has no obvious change. Ah, I see. Thank you for testing. There are two questions that come to my mind: did you send the regular output to /dev/null? And what happens if you disable the cursor with 'tput civis'? I didn't send the output to /dev/null because we need to collect data from the output, Actually we run the benchmark as a background process, do we need to disable the cursor and test again? Best Regards, Rong Chen If there is absolutely nothing changing on the screen, I don't see how the regression could persist. Best regards Thomas commit: f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation f1f8555dfb9a70a2 90f479ae51afa45efab97afdde testcase/testparams/testbox -- --- %stddev change %stddev \ | \ 43394 -20% 34575 ± 3% vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01 43393 -20% 34575 GEO-MEAN vm-scalability.median Best Regards, Rong Chen While I have another finds, as I noticed your patch changed the bpp from 24 to 32, I had a patch to change it back to 24, and run the case in the weekend, the -18% regrssion was reduced to about -5%. Could this be related? commit: f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console 90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation 01e75fea0d5 mgag200: restore the depth back to 24 f1f8555dfb9a70a2 90f479ae51afa45efab97afdde9 01e75fea0d5ff39d3e588c20ec5 --- --- 43921 ± 2% -18.3% 35884 -4.8% 41826 vm-scalability.median 14889337 -17.5% 12291029 -4.1% 14278574 vm-scalability.throughput commit 01e75fea0d5ff39d3e588c20ec52e7a4e6588a74 Author: Feng Tang Date: Fri Aug 2 15:09:19 2019 +0800 mgag200:
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 02.08.19 um 09:11 schrieb Rong Chen: > Hi, > > On 8/1/19 7:58 PM, Thomas Zimmermann wrote: >> Hi >> >> Am 01.08.19 um 13:25 schrieb Feng Tang: >>> Hi Thomas, >>> >>> On Thu, Aug 01, 2019 at 11:59:28AM +0200, Thomas Zimmermann wrote: Hi Am 01.08.19 um 10:37 schrieb Feng Tang: > On Thu, Aug 01, 2019 at 02:19:53PM +0800, Rong Chen wrote: >>> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 >>> ("drm/mgag200: Replace struct mga_fbdev with generic >>> framebuffer emulation") >>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git >>> master >> Daniel, Noralf, we may have to revert this patch. >> >> I expected some change in display performance, but not in >> VM. Since it's >> a server chipset, probably no one cares much about display >> performance. >> So that seemed like a good trade-off for re-using shared >> code. >> >> Part of the patch set is that the generic fb emulation now >> maps and >> unmaps the fbdev BO when updating the screen. I guess >> that's the cause >> of the performance regression. And it should be visible >> with other >> drivers as well if they use a shadow FB for fbdev emulation. > For fbcon we should need to do any maps/unamps at all, this > is for the > fbdev mmap support only. If the testcase mentioned here > tests fbdev > mmap handling it's pretty badly misnamed :-) And as long as > you don't > have an fbdev mmap there shouldn't be any impact at all. The ast and mgag200 have only a few MiB of VRAM, so we have to get the fbdev BO out if it's not being displayed. If not being mapped, it can be evicted and make room for X, etc. To make this work, the BO's memory is mapped and unmapped in drm_fb_helper_dirty_work() before being updated from the shadow FB. [1] That fbdev mapping is established on each screen update, more or less. From my (yet unverified) understanding, this causes the performance regression in the VM code. The original code in mgag200 used to kmap the fbdev BO while it's being displayed; [2] and the drawing code only mapped it when necessary (i.e., not being display). [3] >>> Hm yeah, this vmap/vunmap is going to be pretty bad. We >>> indeed should >>> cache this. >>> I think this could be added for VRAM helpers as well, but it's still a workaround and non-VRAM drivers might also run into such a performance regression if they use the fbdev's shadow fb. >>> Yeah agreed, fbdev emulation should try to cache the vmap. >>> Noralf mentioned that there are plans for other DRM clients besides the console. They would as well run into similar problems. >> The thing is that we'd need another generic fbdev >> emulation for ast and >> mgag200 that handles this issue properly. > Yeah I dont think we want to jump the gun here. If you can > try to > repro locally and profile where we're wasting cpu time I > hope that > should sched a light what's going wrong here. I don't have much time ATM and I'm not even officially at work until late Aug. I'd send you the revert and investigate later. I agree that using generic fbdev emulation would be preferable. >>> Still not sure that's the right thing to do really. Yes it's a >>> regression, but vm testcases shouldn run a single line of >>> fbcon or drm >>> code. So why this is impacted so heavily by a silly drm >>> change is very >>> confusing to me. We might be papering over a deeper and much >>> more >>> serious issue ... >> It's a regression, the right thing is to revert first and then >> work >> out the right thing to do. > Sure, but I have no idea whether the testcase is doing something > reasonable. If it's accidentally testing vm scalability of > fbdev and > there's no one else doing something this pointless, then it's > not a > real bug. Plus I think we're shooting the messenger here. > >> It's likely the test runs on the console and printfs stuff out >> while running. > But why did we not regress the world if a few
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 02.08.19 um 09:11 schrieb Rong Chen: > Hi, > > On 8/1/19 7:58 PM, Thomas Zimmermann wrote: >> Hi >> >> Am 01.08.19 um 13:25 schrieb Feng Tang: >>> Hi Thomas, >>> >>> On Thu, Aug 01, 2019 at 11:59:28AM +0200, Thomas Zimmermann wrote: Hi Am 01.08.19 um 10:37 schrieb Feng Tang: > On Thu, Aug 01, 2019 at 02:19:53PM +0800, Rong Chen wrote: >>> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 >>> ("drm/mgag200: Replace struct mga_fbdev with generic >>> framebuffer emulation") >>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git >>> master >> Daniel, Noralf, we may have to revert this patch. >> >> I expected some change in display performance, but not in >> VM. Since it's >> a server chipset, probably no one cares much about display >> performance. >> So that seemed like a good trade-off for re-using shared >> code. >> >> Part of the patch set is that the generic fb emulation now >> maps and >> unmaps the fbdev BO when updating the screen. I guess >> that's the cause >> of the performance regression. And it should be visible >> with other >> drivers as well if they use a shadow FB for fbdev emulation. > For fbcon we should need to do any maps/unamps at all, this > is for the > fbdev mmap support only. If the testcase mentioned here > tests fbdev > mmap handling it's pretty badly misnamed :-) And as long as > you don't > have an fbdev mmap there shouldn't be any impact at all. The ast and mgag200 have only a few MiB of VRAM, so we have to get the fbdev BO out if it's not being displayed. If not being mapped, it can be evicted and make room for X, etc. To make this work, the BO's memory is mapped and unmapped in drm_fb_helper_dirty_work() before being updated from the shadow FB. [1] That fbdev mapping is established on each screen update, more or less. From my (yet unverified) understanding, this causes the performance regression in the VM code. The original code in mgag200 used to kmap the fbdev BO while it's being displayed; [2] and the drawing code only mapped it when necessary (i.e., not being display). [3] >>> Hm yeah, this vmap/vunmap is going to be pretty bad. We >>> indeed should >>> cache this. >>> I think this could be added for VRAM helpers as well, but it's still a workaround and non-VRAM drivers might also run into such a performance regression if they use the fbdev's shadow fb. >>> Yeah agreed, fbdev emulation should try to cache the vmap. >>> Noralf mentioned that there are plans for other DRM clients besides the console. They would as well run into similar problems. >> The thing is that we'd need another generic fbdev >> emulation for ast and >> mgag200 that handles this issue properly. > Yeah I dont think we want to jump the gun here. If you can > try to > repro locally and profile where we're wasting cpu time I > hope that > should sched a light what's going wrong here. I don't have much time ATM and I'm not even officially at work until late Aug. I'd send you the revert and investigate later. I agree that using generic fbdev emulation would be preferable. >>> Still not sure that's the right thing to do really. Yes it's a >>> regression, but vm testcases shouldn run a single line of >>> fbcon or drm >>> code. So why this is impacted so heavily by a silly drm >>> change is very >>> confusing to me. We might be papering over a deeper and much >>> more >>> serious issue ... >> It's a regression, the right thing is to revert first and then >> work >> out the right thing to do. > Sure, but I have no idea whether the testcase is doing something > reasonable. If it's accidentally testing vm scalability of > fbdev and > there's no one else doing something this pointless, then it's > not a > real bug. Plus I think we're shooting the messenger here. > >> It's likely the test runs on the console and printfs stuff out >> while running. > But why did we not regress the world if a few
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 01.08.19 um 15:30 schrieb Michel Dänzer: > On 2019-08-01 8:19 a.m., Rong Chen wrote: >> Hi, >> >> On 7/31/19 6:21 PM, Michel Dänzer wrote: >>> On 2019-07-31 11:25 a.m., Huang, Ying wrote: Hi, Daniel, Daniel Vetter writes: > On Tue, Jul 30, 2019 at 10:27 PM Dave Airlie wrote: >> On Wed, 31 Jul 2019 at 05:00, Daniel Vetter wrote: >>> On Tue, Jul 30, 2019 at 8:50 PM Thomas Zimmermann >>> wrote: Hi Am 30.07.19 um 20:12 schrieb Daniel Vetter: > On Tue, Jul 30, 2019 at 7:50 PM Thomas Zimmermann > wrote: >> Am 29.07.19 um 11:51 schrieb kernel test robot: >>> Greeting, >>> >>> FYI, we noticed a -18.8% regression of vm-scalability.median >>> due to commit:> >>> >>> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 >>> ("drm/mgag200: Replace struct mga_fbdev with generic >>> framebuffer emulation") >>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git >>> master >> Daniel, Noralf, we may have to revert this patch. >> >> I expected some change in display performance, but not in VM. >> Since it's >> a server chipset, probably no one cares much about display >> performance. >> So that seemed like a good trade-off for re-using shared code. >> >> Part of the patch set is that the generic fb emulation now maps >> and >> unmaps the fbdev BO when updating the screen. I guess that's >> the cause >> of the performance regression. And it should be visible with other >> drivers as well if they use a shadow FB for fbdev emulation. > For fbcon we should need to do any maps/unamps at all, this is > for the > fbdev mmap support only. If the testcase mentioned here tests fbdev > mmap handling it's pretty badly misnamed :-) And as long as you > don't > have an fbdev mmap there shouldn't be any impact at all. The ast and mgag200 have only a few MiB of VRAM, so we have to get the fbdev BO out if it's not being displayed. If not being mapped, it can be evicted and make room for X, etc. To make this work, the BO's memory is mapped and unmapped in drm_fb_helper_dirty_work() before being updated from the shadow FB. [1] That fbdev mapping is established on each screen update, more or less. From my (yet unverified) understanding, this causes the performance regression in the VM code. The original code in mgag200 used to kmap the fbdev BO while it's being displayed; [2] and the drawing code only mapped it when necessary (i.e., not being display). [3] >>> Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should >>> cache this. >>> I think this could be added for VRAM helpers as well, but it's still a workaround and non-VRAM drivers might also run into such a performance regression if they use the fbdev's shadow fb. >>> Yeah agreed, fbdev emulation should try to cache the vmap. >>> Noralf mentioned that there are plans for other DRM clients besides the console. They would as well run into similar problems. >> The thing is that we'd need another generic fbdev emulation for >> ast and >> mgag200 that handles this issue properly. > Yeah I dont think we want to jump the gun here. If you can try to > repro locally and profile where we're wasting cpu time I hope that > should sched a light what's going wrong here. I don't have much time ATM and I'm not even officially at work until late Aug. I'd send you the revert and investigate later. I agree that using generic fbdev emulation would be preferable. >>> Still not sure that's the right thing to do really. Yes it's a >>> regression, but vm testcases shouldn run a single line of fbcon or >>> drm >>> code. So why this is impacted so heavily by a silly drm change is >>> very >>> confusing to me. We might be papering over a deeper and much more >>> serious issue ... >> It's a regression, the right thing is to revert first and then work >> out the right thing to do. > Sure, but I have no idea whether the testcase is doing something > reasonable. If it's accidentally testing vm scalability of fbdev and > there's no one else doing something this pointless, then it's not a > real bug. Plus I think we're shooting the messenger here. > >> It's likely the test runs on the console and printfs stuff out >> while running. > But why did we not regress the world if a few prints on the console > have such a huge impact? We
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi, On 8/1/19 7:58 PM, Thomas Zimmermann wrote: Hi Am 01.08.19 um 13:25 schrieb Feng Tang: Hi Thomas, On Thu, Aug 01, 2019 at 11:59:28AM +0200, Thomas Zimmermann wrote: Hi Am 01.08.19 um 10:37 schrieb Feng Tang: On Thu, Aug 01, 2019 at 02:19:53PM +0800, Rong Chen wrote: commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation") https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git master Daniel, Noralf, we may have to revert this patch. I expected some change in display performance, but not in VM. Since it's a server chipset, probably no one cares much about display performance. So that seemed like a good trade-off for re-using shared code. Part of the patch set is that the generic fb emulation now maps and unmaps the fbdev BO when updating the screen. I guess that's the cause of the performance regression. And it should be visible with other drivers as well if they use a shadow FB for fbdev emulation. For fbcon we should need to do any maps/unamps at all, this is for the fbdev mmap support only. If the testcase mentioned here tests fbdev mmap handling it's pretty badly misnamed :-) And as long as you don't have an fbdev mmap there shouldn't be any impact at all. The ast and mgag200 have only a few MiB of VRAM, so we have to get the fbdev BO out if it's not being displayed. If not being mapped, it can be evicted and make room for X, etc. To make this work, the BO's memory is mapped and unmapped in drm_fb_helper_dirty_work() before being updated from the shadow FB. [1] That fbdev mapping is established on each screen update, more or less. From my (yet unverified) understanding, this causes the performance regression in the VM code. The original code in mgag200 used to kmap the fbdev BO while it's being displayed; [2] and the drawing code only mapped it when necessary (i.e., not being display). [3] Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should cache this. I think this could be added for VRAM helpers as well, but it's still a workaround and non-VRAM drivers might also run into such a performance regression if they use the fbdev's shadow fb. Yeah agreed, fbdev emulation should try to cache the vmap. Noralf mentioned that there are plans for other DRM clients besides the console. They would as well run into similar problems. The thing is that we'd need another generic fbdev emulation for ast and mgag200 that handles this issue properly. Yeah I dont think we want to jump the gun here. If you can try to repro locally and profile where we're wasting cpu time I hope that should sched a light what's going wrong here. I don't have much time ATM and I'm not even officially at work until late Aug. I'd send you the revert and investigate later. I agree that using generic fbdev emulation would be preferable. Still not sure that's the right thing to do really. Yes it's a regression, but vm testcases shouldn run a single line of fbcon or drm code. So why this is impacted so heavily by a silly drm change is very confusing to me. We might be papering over a deeper and much more serious issue ... It's a regression, the right thing is to revert first and then work out the right thing to do. Sure, but I have no idea whether the testcase is doing something reasonable. If it's accidentally testing vm scalability of fbdev and there's no one else doing something this pointless, then it's not a real bug. Plus I think we're shooting the messenger here. It's likely the test runs on the console and printfs stuff out while running. But why did we not regress the world if a few prints on the console have such a huge impact? We didn't get an entire stream of mails about breaking stuff ... The regression seems not related to the commit. But we have retested and confirmed the regression. Hard to understand what happens. Does the regressed test cause any output on console while it's measuring? If so, it's probably accidentally measuring fbcon/DRM code in addition to the workload it's trying to measure. Sorry, I'm not familiar with DRM, we enabled the console to output logs, and attached please find the log file. "Command line: ... console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw" We did more check, and found this test machine does use the mgag200 driver. And we are suspecting the regression is caused by commit cf1ca9aeb930df074bb5bbcde55f935fec04e529 Author: Thomas Zimmermann Date: Wed Jul 3 09:58:24 2019 +0200 Yes, that's the commit. Unfortunately reverting it would require reverting a hand full of other patches as well. I have a potential fix for the problem. Could you run and verify that it resolves the problem? Sure, please send it to us. Rong and I will try it. Fantastic, thank you! The patch set is available on dri-devel at https://lists.freedesktop.org/archives/dri-devel/2019-August/228950.html The patch set improves the performance
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
On 2019-08-01 8:19 a.m., Rong Chen wrote: > Hi, > > On 7/31/19 6:21 PM, Michel Dänzer wrote: >> On 2019-07-31 11:25 a.m., Huang, Ying wrote: >>> Hi, Daniel, >>> >>> Daniel Vetter writes: >>> On Tue, Jul 30, 2019 at 10:27 PM Dave Airlie wrote: > On Wed, 31 Jul 2019 at 05:00, Daniel Vetter wrote: >> On Tue, Jul 30, 2019 at 8:50 PM Thomas Zimmermann >> wrote: >>> Hi >>> >>> Am 30.07.19 um 20:12 schrieb Daniel Vetter: On Tue, Jul 30, 2019 at 7:50 PM Thomas Zimmermann wrote: > Am 29.07.19 um 11:51 schrieb kernel test robot: >> Greeting, >> >> FYI, we noticed a -18.8% regression of vm-scalability.median >> due to commit:> >> >> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 >> ("drm/mgag200: Replace struct mga_fbdev with generic >> framebuffer emulation") >> https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git >> master > Daniel, Noralf, we may have to revert this patch. > > I expected some change in display performance, but not in VM. > Since it's > a server chipset, probably no one cares much about display > performance. > So that seemed like a good trade-off for re-using shared code. > > Part of the patch set is that the generic fb emulation now maps > and > unmaps the fbdev BO when updating the screen. I guess that's > the cause > of the performance regression. And it should be visible with other > drivers as well if they use a shadow FB for fbdev emulation. For fbcon we should need to do any maps/unamps at all, this is for the fbdev mmap support only. If the testcase mentioned here tests fbdev mmap handling it's pretty badly misnamed :-) And as long as you don't have an fbdev mmap there shouldn't be any impact at all. >>> The ast and mgag200 have only a few MiB of VRAM, so we have to >>> get the >>> fbdev BO out if it's not being displayed. If not being mapped, it >>> can be >>> evicted and make room for X, etc. >>> >>> To make this work, the BO's memory is mapped and unmapped in >>> drm_fb_helper_dirty_work() before being updated from the shadow >>> FB. [1] >>> That fbdev mapping is established on each screen update, more or >>> less. >>> From my (yet unverified) understanding, this causes the performance >>> regression in the VM code. >>> >>> The original code in mgag200 used to kmap the fbdev BO while it's >>> being >>> displayed; [2] and the drawing code only mapped it when necessary >>> (i.e., >>> not being display). [3] >> Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should >> cache this. >> >>> I think this could be added for VRAM helpers as well, but it's >>> still a >>> workaround and non-VRAM drivers might also run into such a >>> performance >>> regression if they use the fbdev's shadow fb. >> Yeah agreed, fbdev emulation should try to cache the vmap. >> >>> Noralf mentioned that there are plans for other DRM clients >>> besides the >>> console. They would as well run into similar problems. >>> > The thing is that we'd need another generic fbdev emulation for > ast and > mgag200 that handles this issue properly. Yeah I dont think we want to jump the gun here. If you can try to repro locally and profile where we're wasting cpu time I hope that should sched a light what's going wrong here. >>> I don't have much time ATM and I'm not even officially at work until >>> late Aug. I'd send you the revert and investigate later. I agree >>> that >>> using generic fbdev emulation would be preferable. >> Still not sure that's the right thing to do really. Yes it's a >> regression, but vm testcases shouldn run a single line of fbcon or >> drm >> code. So why this is impacted so heavily by a silly drm change is >> very >> confusing to me. We might be papering over a deeper and much more >> serious issue ... > It's a regression, the right thing is to revert first and then work > out the right thing to do. Sure, but I have no idea whether the testcase is doing something reasonable. If it's accidentally testing vm scalability of fbdev and there's no one else doing something this pointless, then it's not a real bug. Plus I think we're shooting the messenger here. > It's likely the test runs on the console and printfs stuff out > while running. But why did we not regress the world if a few prints on the console have such a huge impact? We didn't get an entire stream of mails about breaking stuff ... >>> The regression seems not related to the commit. But we have retested >>> and confirmed
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 01.08.19 um 13:25 schrieb Feng Tang: > Hi Thomas, > > On Thu, Aug 01, 2019 at 11:59:28AM +0200, Thomas Zimmermann wrote: >> Hi >> >> Am 01.08.19 um 10:37 schrieb Feng Tang: >>> On Thu, Aug 01, 2019 at 02:19:53PM +0800, Rong Chen wrote: > > commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: > Replace struct mga_fbdev with generic framebuffer emulation") > https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git > master Daniel, Noralf, we may have to revert this patch. I expected some change in display performance, but not in VM. Since it's a server chipset, probably no one cares much about display performance. So that seemed like a good trade-off for re-using shared code. Part of the patch set is that the generic fb emulation now maps and unmaps the fbdev BO when updating the screen. I guess that's the cause of the performance regression. And it should be visible with other drivers as well if they use a shadow FB for fbdev emulation. >>> For fbcon we should need to do any maps/unamps at all, this is for >>> the >>> fbdev mmap support only. If the testcase mentioned here tests fbdev >>> mmap handling it's pretty badly misnamed :-) And as long as you >>> don't >>> have an fbdev mmap there shouldn't be any impact at all. >> The ast and mgag200 have only a few MiB of VRAM, so we have to get >> the >> fbdev BO out if it's not being displayed. If not being mapped, it >> can be >> evicted and make room for X, etc. >> >> To make this work, the BO's memory is mapped and unmapped in >> drm_fb_helper_dirty_work() before being updated from the shadow FB. >> [1] >> That fbdev mapping is established on each screen update, more or >> less. >> From my (yet unverified) understanding, this causes the performance >> regression in the VM code. >> >> The original code in mgag200 used to kmap the fbdev BO while it's >> being >> displayed; [2] and the drawing code only mapped it when necessary >> (i.e., >> not being display). [3] > Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should > cache this. > >> I think this could be added for VRAM helpers as well, but it's still >> a >> workaround and non-VRAM drivers might also run into such a >> performance >> regression if they use the fbdev's shadow fb. > Yeah agreed, fbdev emulation should try to cache the vmap. > >> Noralf mentioned that there are plans for other DRM clients besides >> the >> console. They would as well run into similar problems. >> The thing is that we'd need another generic fbdev emulation for ast and mgag200 that handles this issue properly. >>> Yeah I dont think we want to jump the gun here. If you can try to >>> repro locally and profile where we're wasting cpu time I hope that >>> should sched a light what's going wrong here. >> I don't have much time ATM and I'm not even officially at work until >> late Aug. I'd send you the revert and investigate later. I agree that >> using generic fbdev emulation would be preferable. > Still not sure that's the right thing to do really. Yes it's a > regression, but vm testcases shouldn run a single line of fbcon or drm > code. So why this is impacted so heavily by a silly drm change is very > confusing to me. We might be papering over a deeper and much more > serious issue ... It's a regression, the right thing is to revert first and then work out the right thing to do. >>> Sure, but I have no idea whether the testcase is doing something >>> reasonable. If it's accidentally testing vm scalability of fbdev and >>> there's no one else doing something this pointless, then it's not a >>> real bug. Plus I think we're shooting the messenger here. >>> It's likely the test runs on the console and printfs stuff out while running. >>> But why did we not regress the world if a few prints on the console >>> have such a huge impact? We didn't get an entire stream of mails about >>> breaking stuff ... >> The regression seems not related to the commit. But we have retested >> and confirmed the regression. Hard to understand what happens. > Does the regressed test cause any output on console while it's > measuring? If so, it's probably accidentally measuring fbcon/DRM code in > addition to the workload it's trying to
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Thomas, On Thu, Aug 01, 2019 at 11:59:28AM +0200, Thomas Zimmermann wrote: > Hi > > Am 01.08.19 um 10:37 schrieb Feng Tang: > > On Thu, Aug 01, 2019 at 02:19:53PM +0800, Rong Chen wrote: > >>> > >>> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: > >>> Replace struct mga_fbdev with generic framebuffer emulation") > >>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git > >>> master > >> Daniel, Noralf, we may have to revert this patch. > >> > >> I expected some change in display performance, but not in VM. > >> Since it's > >> a server chipset, probably no one cares much about display > >> performance. > >> So that seemed like a good trade-off for re-using shared code. > >> > >> Part of the patch set is that the generic fb emulation now maps and > >> unmaps the fbdev BO when updating the screen. I guess that's the > >> cause > >> of the performance regression. And it should be visible with other > >> drivers as well if they use a shadow FB for fbdev emulation. > > For fbcon we should need to do any maps/unamps at all, this is for > > the > > fbdev mmap support only. If the testcase mentioned here tests fbdev > > mmap handling it's pretty badly misnamed :-) And as long as you > > don't > > have an fbdev mmap there shouldn't be any impact at all. > The ast and mgag200 have only a few MiB of VRAM, so we have to get > the > fbdev BO out if it's not being displayed. If not being mapped, it > can be > evicted and make room for X, etc. > > To make this work, the BO's memory is mapped and unmapped in > drm_fb_helper_dirty_work() before being updated from the shadow FB. > [1] > That fbdev mapping is established on each screen update, more or > less. > From my (yet unverified) understanding, this causes the performance > regression in the VM code. > > The original code in mgag200 used to kmap the fbdev BO while it's > being > displayed; [2] and the drawing code only mapped it when necessary > (i.e., > not being display). [3] > >>> Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should > >>> cache this. > >>> > I think this could be added for VRAM helpers as well, but it's still > a > workaround and non-VRAM drivers might also run into such a > performance > regression if they use the fbdev's shadow fb. > >>> Yeah agreed, fbdev emulation should try to cache the vmap. > >>> > Noralf mentioned that there are plans for other DRM clients besides > the > console. They would as well run into similar problems. > > >> The thing is that we'd need another generic fbdev emulation for > >> ast and > >> mgag200 that handles this issue properly. > > Yeah I dont think we want to jump the gun here. If you can try to > > repro locally and profile where we're wasting cpu time I hope that > > should sched a light what's going wrong here. > I don't have much time ATM and I'm not even officially at work until > late Aug. I'd send you the revert and investigate later. I agree that > using generic fbdev emulation would be preferable. > >>> Still not sure that's the right thing to do really. Yes it's a > >>> regression, but vm testcases shouldn run a single line of fbcon or drm > >>> code. So why this is impacted so heavily by a silly drm change is very > >>> confusing to me. We might be papering over a deeper and much more > >>> serious issue ... > >> It's a regression, the right thing is to revert first and then work > >> out the right thing to do. > > Sure, but I have no idea whether the testcase is doing something > > reasonable. If it's accidentally testing vm scalability of fbdev and > > there's no one else doing something this pointless, then it's not a > > real bug. Plus I think we're shooting the messenger here. > > > >> It's likely the test runs on the console and printfs stuff out while > >> running. > > But why did we not regress the world if a few prints on the console > > have such a huge impact? We didn't get an entire stream of mails about > > breaking stuff ... > The regression seems not related to the commit. But we have retested > and confirmed the regression. Hard to understand what happens. > >>> Does the regressed test cause any output on console while it's > >>> measuring? If so, it's probably accidentally measuring fbcon/DRM code in > >>> addition to the workload it's trying to measure. > >>> > >> > >> Sorry, I'm not familiar
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 01.08.19 um 10:37 schrieb Feng Tang: > On Thu, Aug 01, 2019 at 02:19:53PM +0800, Rong Chen wrote: >>> >>> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: >>> Replace struct mga_fbdev with generic framebuffer emulation") >>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git >>> master >> Daniel, Noralf, we may have to revert this patch. >> >> I expected some change in display performance, but not in VM. Since >> it's >> a server chipset, probably no one cares much about display >> performance. >> So that seemed like a good trade-off for re-using shared code. >> >> Part of the patch set is that the generic fb emulation now maps and >> unmaps the fbdev BO when updating the screen. I guess that's the >> cause >> of the performance regression. And it should be visible with other >> drivers as well if they use a shadow FB for fbdev emulation. > For fbcon we should need to do any maps/unamps at all, this is for the > fbdev mmap support only. If the testcase mentioned here tests fbdev > mmap handling it's pretty badly misnamed :-) And as long as you don't > have an fbdev mmap there shouldn't be any impact at all. The ast and mgag200 have only a few MiB of VRAM, so we have to get the fbdev BO out if it's not being displayed. If not being mapped, it can be evicted and make room for X, etc. To make this work, the BO's memory is mapped and unmapped in drm_fb_helper_dirty_work() before being updated from the shadow FB. [1] That fbdev mapping is established on each screen update, more or less. From my (yet unverified) understanding, this causes the performance regression in the VM code. The original code in mgag200 used to kmap the fbdev BO while it's being displayed; [2] and the drawing code only mapped it when necessary (i.e., not being display). [3] >>> Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should >>> cache this. >>> I think this could be added for VRAM helpers as well, but it's still a workaround and non-VRAM drivers might also run into such a performance regression if they use the fbdev's shadow fb. >>> Yeah agreed, fbdev emulation should try to cache the vmap. >>> Noralf mentioned that there are plans for other DRM clients besides the console. They would as well run into similar problems. >> The thing is that we'd need another generic fbdev emulation for ast >> and >> mgag200 that handles this issue properly. > Yeah I dont think we want to jump the gun here. If you can try to > repro locally and profile where we're wasting cpu time I hope that > should sched a light what's going wrong here. I don't have much time ATM and I'm not even officially at work until late Aug. I'd send you the revert and investigate later. I agree that using generic fbdev emulation would be preferable. >>> Still not sure that's the right thing to do really. Yes it's a >>> regression, but vm testcases shouldn run a single line of fbcon or drm >>> code. So why this is impacted so heavily by a silly drm change is very >>> confusing to me. We might be papering over a deeper and much more >>> serious issue ... >> It's a regression, the right thing is to revert first and then work >> out the right thing to do. > Sure, but I have no idea whether the testcase is doing something > reasonable. If it's accidentally testing vm scalability of fbdev and > there's no one else doing something this pointless, then it's not a > real bug. Plus I think we're shooting the messenger here. > >> It's likely the test runs on the console and printfs stuff out while >> running. > But why did we not regress the world if a few prints on the console > have such a huge impact? We didn't get an entire stream of mails about > breaking stuff ... The regression seems not related to the commit. But we have retested and confirmed the regression. Hard to understand what happens. >>> Does the regressed test cause any output on console while it's >>> measuring? If so, it's probably accidentally measuring fbcon/DRM code in >>> addition to the workload it's trying to measure. >>> >> >> Sorry, I'm not familiar with DRM, we enabled the console to output logs, and >> attached please find the log file. >> >> "Command line: ... console=tty0 earlyprintk=ttyS0,115200 >> console=ttyS0,115200 vga=normal rw" > > We did more check, and found this test machine does use the > mgag200 driver. > > And we are suspecting the regression is caused by > > commit
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 01.08.19 um 08:19 schrieb Rong Chen: > Hi, > > On 7/31/19 6:21 PM, Michel Dänzer wrote: >> On 2019-07-31 11:25 a.m., Huang, Ying wrote: >>> Hi, Daniel, >>> >>> Daniel Vetter writes: >>> On Tue, Jul 30, 2019 at 10:27 PM Dave Airlie wrote: > On Wed, 31 Jul 2019 at 05:00, Daniel Vetter wrote: >> On Tue, Jul 30, 2019 at 8:50 PM Thomas Zimmermann >> wrote: >>> Hi >>> >>> Am 30.07.19 um 20:12 schrieb Daniel Vetter: On Tue, Jul 30, 2019 at 7:50 PM Thomas Zimmermann wrote: > Am 29.07.19 um 11:51 schrieb kernel test robot: >> Greeting, >> >> FYI, we noticed a -18.8% regression of vm-scalability.median >> due to commit:> >> >> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 >> ("drm/mgag200: Replace struct mga_fbdev with generic >> framebuffer emulation") >> https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git >> master > Daniel, Noralf, we may have to revert this patch. > > I expected some change in display performance, but not in VM. > Since it's > a server chipset, probably no one cares much about display > performance. > So that seemed like a good trade-off for re-using shared code. > > Part of the patch set is that the generic fb emulation now maps > and > unmaps the fbdev BO when updating the screen. I guess that's > the cause > of the performance regression. And it should be visible with other > drivers as well if they use a shadow FB for fbdev emulation. For fbcon we should need to do any maps/unamps at all, this is for the fbdev mmap support only. If the testcase mentioned here tests fbdev mmap handling it's pretty badly misnamed :-) And as long as you don't have an fbdev mmap there shouldn't be any impact at all. >>> The ast and mgag200 have only a few MiB of VRAM, so we have to >>> get the >>> fbdev BO out if it's not being displayed. If not being mapped, it >>> can be >>> evicted and make room for X, etc. >>> >>> To make this work, the BO's memory is mapped and unmapped in >>> drm_fb_helper_dirty_work() before being updated from the shadow >>> FB. [1] >>> That fbdev mapping is established on each screen update, more or >>> less. >>> From my (yet unverified) understanding, this causes the performance >>> regression in the VM code. >>> >>> The original code in mgag200 used to kmap the fbdev BO while it's >>> being >>> displayed; [2] and the drawing code only mapped it when necessary >>> (i.e., >>> not being display). [3] >> Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should >> cache this. >> >>> I think this could be added for VRAM helpers as well, but it's >>> still a >>> workaround and non-VRAM drivers might also run into such a >>> performance >>> regression if they use the fbdev's shadow fb. >> Yeah agreed, fbdev emulation should try to cache the vmap. >> >>> Noralf mentioned that there are plans for other DRM clients >>> besides the >>> console. They would as well run into similar problems. >>> > The thing is that we'd need another generic fbdev emulation for > ast and > mgag200 that handles this issue properly. Yeah I dont think we want to jump the gun here. If you can try to repro locally and profile where we're wasting cpu time I hope that should sched a light what's going wrong here. >>> I don't have much time ATM and I'm not even officially at work until >>> late Aug. I'd send you the revert and investigate later. I agree >>> that >>> using generic fbdev emulation would be preferable. >> Still not sure that's the right thing to do really. Yes it's a >> regression, but vm testcases shouldn run a single line of fbcon or >> drm >> code. So why this is impacted so heavily by a silly drm change is >> very >> confusing to me. We might be papering over a deeper and much more >> serious issue ... > It's a regression, the right thing is to revert first and then work > out the right thing to do. Sure, but I have no idea whether the testcase is doing something reasonable. If it's accidentally testing vm scalability of fbdev and there's no one else doing something this pointless, then it's not a real bug. Plus I think we're shooting the messenger here. > It's likely the test runs on the console and printfs stuff out > while running. But why did we not regress the world if a few prints on the console have such a huge impact? We didn't get an entire stream of mails about breaking stuff ... >>> The regression seems not related to the commit. But we have retested >>> and
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
On Thu, Aug 01, 2019 at 02:19:53PM +0800, Rong Chen wrote: > > > >commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: > >Replace struct mga_fbdev with generic framebuffer emulation") > >https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git > > master > Daniel, Noralf, we may have to revert this patch. > > I expected some change in display performance, but not in VM. Since > it's > a server chipset, probably no one cares much about display > performance. > So that seemed like a good trade-off for re-using shared code. > > Part of the patch set is that the generic fb emulation now maps and > unmaps the fbdev BO when updating the screen. I guess that's the cause > of the performance regression. And it should be visible with other > drivers as well if they use a shadow FB for fbdev emulation. > >>>For fbcon we should need to do any maps/unamps at all, this is for the > >>>fbdev mmap support only. If the testcase mentioned here tests fbdev > >>>mmap handling it's pretty badly misnamed :-) And as long as you don't > >>>have an fbdev mmap there shouldn't be any impact at all. > >>The ast and mgag200 have only a few MiB of VRAM, so we have to get the > >>fbdev BO out if it's not being displayed. If not being mapped, it can be > >>evicted and make room for X, etc. > >> > >>To make this work, the BO's memory is mapped and unmapped in > >>drm_fb_helper_dirty_work() before being updated from the shadow FB. [1] > >>That fbdev mapping is established on each screen update, more or less. > >> From my (yet unverified) understanding, this causes the performance > >>regression in the VM code. > >> > >>The original code in mgag200 used to kmap the fbdev BO while it's being > >>displayed; [2] and the drawing code only mapped it when necessary (i.e., > >>not being display). [3] > >Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should > >cache this. > > > >>I think this could be added for VRAM helpers as well, but it's still a > >>workaround and non-VRAM drivers might also run into such a performance > >>regression if they use the fbdev's shadow fb. > >Yeah agreed, fbdev emulation should try to cache the vmap. > > > >>Noralf mentioned that there are plans for other DRM clients besides the > >>console. They would as well run into similar problems. > >> > The thing is that we'd need another generic fbdev emulation for ast > and > mgag200 that handles this issue properly. > >>>Yeah I dont think we want to jump the gun here. If you can try to > >>>repro locally and profile where we're wasting cpu time I hope that > >>>should sched a light what's going wrong here. > >>I don't have much time ATM and I'm not even officially at work until > >>late Aug. I'd send you the revert and investigate later. I agree that > >>using generic fbdev emulation would be preferable. > >Still not sure that's the right thing to do really. Yes it's a > >regression, but vm testcases shouldn run a single line of fbcon or drm > >code. So why this is impacted so heavily by a silly drm change is very > >confusing to me. We might be papering over a deeper and much more > >serious issue ... > It's a regression, the right thing is to revert first and then work > out the right thing to do. > >>>Sure, but I have no idea whether the testcase is doing something > >>>reasonable. If it's accidentally testing vm scalability of fbdev and > >>>there's no one else doing something this pointless, then it's not a > >>>real bug. Plus I think we're shooting the messenger here. > >>> > It's likely the test runs on the console and printfs stuff out while > running. > >>>But why did we not regress the world if a few prints on the console > >>>have such a huge impact? We didn't get an entire stream of mails about > >>>breaking stuff ... > >>The regression seems not related to the commit. But we have retested > >>and confirmed the regression. Hard to understand what happens. > >Does the regressed test cause any output on console while it's > >measuring? If so, it's probably accidentally measuring fbcon/DRM code in > >addition to the workload it's trying to measure. > > > > Sorry, I'm not familiar with DRM, we enabled the console to output logs, and > attached please find the log file. > > "Command line: ... console=tty0 earlyprintk=ttyS0,115200 > console=ttyS0,115200 vga=normal rw" We did more check, and found this test machine does use the mgag200 driver. And we are suspecting the regression is caused by commit cf1ca9aeb930df074bb5bbcde55f935fec04e529 Author: Thomas Zimmermann Date: Wed Jul 3 09:58:24 2019 +0200 drm/fb-helper: Map DRM client buffer only when required This
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi, On 7/31/19 6:21 PM, Michel Dänzer wrote: On 2019-07-31 11:25 a.m., Huang, Ying wrote: Hi, Daniel, Daniel Vetter writes: On Tue, Jul 30, 2019 at 10:27 PM Dave Airlie wrote: On Wed, 31 Jul 2019 at 05:00, Daniel Vetter wrote: On Tue, Jul 30, 2019 at 8:50 PM Thomas Zimmermann wrote: Hi Am 30.07.19 um 20:12 schrieb Daniel Vetter: On Tue, Jul 30, 2019 at 7:50 PM Thomas Zimmermann wrote: Am 29.07.19 um 11:51 schrieb kernel test robot: Greeting, FYI, we noticed a -18.8% regression of vm-scalability.median due to commit:> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation") https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git master Daniel, Noralf, we may have to revert this patch. I expected some change in display performance, but not in VM. Since it's a server chipset, probably no one cares much about display performance. So that seemed like a good trade-off for re-using shared code. Part of the patch set is that the generic fb emulation now maps and unmaps the fbdev BO when updating the screen. I guess that's the cause of the performance regression. And it should be visible with other drivers as well if they use a shadow FB for fbdev emulation. For fbcon we should need to do any maps/unamps at all, this is for the fbdev mmap support only. If the testcase mentioned here tests fbdev mmap handling it's pretty badly misnamed :-) And as long as you don't have an fbdev mmap there shouldn't be any impact at all. The ast and mgag200 have only a few MiB of VRAM, so we have to get the fbdev BO out if it's not being displayed. If not being mapped, it can be evicted and make room for X, etc. To make this work, the BO's memory is mapped and unmapped in drm_fb_helper_dirty_work() before being updated from the shadow FB. [1] That fbdev mapping is established on each screen update, more or less. From my (yet unverified) understanding, this causes the performance regression in the VM code. The original code in mgag200 used to kmap the fbdev BO while it's being displayed; [2] and the drawing code only mapped it when necessary (i.e., not being display). [3] Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should cache this. I think this could be added for VRAM helpers as well, but it's still a workaround and non-VRAM drivers might also run into such a performance regression if they use the fbdev's shadow fb. Yeah agreed, fbdev emulation should try to cache the vmap. Noralf mentioned that there are plans for other DRM clients besides the console. They would as well run into similar problems. The thing is that we'd need another generic fbdev emulation for ast and mgag200 that handles this issue properly. Yeah I dont think we want to jump the gun here. If you can try to repro locally and profile where we're wasting cpu time I hope that should sched a light what's going wrong here. I don't have much time ATM and I'm not even officially at work until late Aug. I'd send you the revert and investigate later. I agree that using generic fbdev emulation would be preferable. Still not sure that's the right thing to do really. Yes it's a regression, but vm testcases shouldn run a single line of fbcon or drm code. So why this is impacted so heavily by a silly drm change is very confusing to me. We might be papering over a deeper and much more serious issue ... It's a regression, the right thing is to revert first and then work out the right thing to do. Sure, but I have no idea whether the testcase is doing something reasonable. If it's accidentally testing vm scalability of fbdev and there's no one else doing something this pointless, then it's not a real bug. Plus I think we're shooting the messenger here. It's likely the test runs on the console and printfs stuff out while running. But why did we not regress the world if a few prints on the console have such a huge impact? We didn't get an entire stream of mails about breaking stuff ... The regression seems not related to the commit. But we have retested and confirmed the regression. Hard to understand what happens. Does the regressed test cause any output on console while it's measuring? If so, it's probably accidentally measuring fbcon/DRM code in addition to the workload it's trying to measure. Sorry, I'm not familiar with DRM, we enabled the console to output logs, and attached please find the log file. "Command line: ... console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw" Best Regards, Rong Chen kmsg.xz Description: application/xz ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
On 2019-07-31 11:25 a.m., Huang, Ying wrote: > Hi, Daniel, > > Daniel Vetter writes: > >> On Tue, Jul 30, 2019 at 10:27 PM Dave Airlie wrote: >>> >>> On Wed, 31 Jul 2019 at 05:00, Daniel Vetter wrote: On Tue, Jul 30, 2019 at 8:50 PM Thomas Zimmermann wrote: > > Hi > > Am 30.07.19 um 20:12 schrieb Daniel Vetter: >> On Tue, Jul 30, 2019 at 7:50 PM Thomas Zimmermann >> wrote: >>> Am 29.07.19 um 11:51 schrieb kernel test robot: Greeting, FYI, we noticed a -18.8% regression of vm-scalability.median due to commit:> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation") https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git master >>> >>> Daniel, Noralf, we may have to revert this patch. >>> >>> I expected some change in display performance, but not in VM. Since it's >>> a server chipset, probably no one cares much about display performance. >>> So that seemed like a good trade-off for re-using shared code. >>> >>> Part of the patch set is that the generic fb emulation now maps and >>> unmaps the fbdev BO when updating the screen. I guess that's the cause >>> of the performance regression. And it should be visible with other >>> drivers as well if they use a shadow FB for fbdev emulation. >> >> For fbcon we should need to do any maps/unamps at all, this is for the >> fbdev mmap support only. If the testcase mentioned here tests fbdev >> mmap handling it's pretty badly misnamed :-) And as long as you don't >> have an fbdev mmap there shouldn't be any impact at all. > > The ast and mgag200 have only a few MiB of VRAM, so we have to get the > fbdev BO out if it's not being displayed. If not being mapped, it can be > evicted and make room for X, etc. > > To make this work, the BO's memory is mapped and unmapped in > drm_fb_helper_dirty_work() before being updated from the shadow FB. [1] > That fbdev mapping is established on each screen update, more or less. > From my (yet unverified) understanding, this causes the performance > regression in the VM code. > > The original code in mgag200 used to kmap the fbdev BO while it's being > displayed; [2] and the drawing code only mapped it when necessary (i.e., > not being display). [3] Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should cache this. > I think this could be added for VRAM helpers as well, but it's still a > workaround and non-VRAM drivers might also run into such a performance > regression if they use the fbdev's shadow fb. Yeah agreed, fbdev emulation should try to cache the vmap. > Noralf mentioned that there are plans for other DRM clients besides the > console. They would as well run into similar problems. > >>> The thing is that we'd need another generic fbdev emulation for ast and >>> mgag200 that handles this issue properly. >> >> Yeah I dont think we want to jump the gun here. If you can try to >> repro locally and profile where we're wasting cpu time I hope that >> should sched a light what's going wrong here. > > I don't have much time ATM and I'm not even officially at work until > late Aug. I'd send you the revert and investigate later. I agree that > using generic fbdev emulation would be preferable. Still not sure that's the right thing to do really. Yes it's a regression, but vm testcases shouldn run a single line of fbcon or drm code. So why this is impacted so heavily by a silly drm change is very confusing to me. We might be papering over a deeper and much more serious issue ... >>> >>> It's a regression, the right thing is to revert first and then work >>> out the right thing to do. >> >> Sure, but I have no idea whether the testcase is doing something >> reasonable. If it's accidentally testing vm scalability of fbdev and >> there's no one else doing something this pointless, then it's not a >> real bug. Plus I think we're shooting the messenger here. >> >>> It's likely the test runs on the console and printfs stuff out while >>> running. >> >> But why did we not regress the world if a few prints on the console >> have such a huge impact? We didn't get an entire stream of mails about >> breaking stuff ... > > The regression seems not related to the commit. But we have retested > and confirmed the regression. Hard to understand what happens. Does the regressed test cause any output on console while it's measuring? If so, it's probably accidentally measuring fbcon/DRM code in addition to the workload it's trying to measure. -- Earthling Michel Dänzer | https://www.amd.com Libre software enthusiast |
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi Am 31.07.19 um 11:25 schrieb Huang, Ying: > Hi, Daniel, > > Daniel Vetter writes: > >> On Tue, Jul 30, 2019 at 10:27 PM Dave Airlie wrote: >>> >>> On Wed, 31 Jul 2019 at 05:00, Daniel Vetter wrote: On Tue, Jul 30, 2019 at 8:50 PM Thomas Zimmermann wrote: > > Hi > > Am 30.07.19 um 20:12 schrieb Daniel Vetter: >> On Tue, Jul 30, 2019 at 7:50 PM Thomas Zimmermann >> wrote: >>> Am 29.07.19 um 11:51 schrieb kernel test robot: Greeting, FYI, we noticed a -18.8% regression of vm-scalability.median due to commit:> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation") https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git master >>> >>> Daniel, Noralf, we may have to revert this patch. >>> >>> I expected some change in display performance, but not in VM. Since it's >>> a server chipset, probably no one cares much about display performance. >>> So that seemed like a good trade-off for re-using shared code. >>> >>> Part of the patch set is that the generic fb emulation now maps and >>> unmaps the fbdev BO when updating the screen. I guess that's the cause >>> of the performance regression. And it should be visible with other >>> drivers as well if they use a shadow FB for fbdev emulation. >> >> For fbcon we should need to do any maps/unamps at all, this is for the >> fbdev mmap support only. If the testcase mentioned here tests fbdev >> mmap handling it's pretty badly misnamed :-) And as long as you don't >> have an fbdev mmap there shouldn't be any impact at all. > > The ast and mgag200 have only a few MiB of VRAM, so we have to get the > fbdev BO out if it's not being displayed. If not being mapped, it can be > evicted and make room for X, etc. > > To make this work, the BO's memory is mapped and unmapped in > drm_fb_helper_dirty_work() before being updated from the shadow FB. [1] > That fbdev mapping is established on each screen update, more or less. > From my (yet unverified) understanding, this causes the performance > regression in the VM code. > > The original code in mgag200 used to kmap the fbdev BO while it's being > displayed; [2] and the drawing code only mapped it when necessary (i.e., > not being display). [3] Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should cache this. > I think this could be added for VRAM helpers as well, but it's still a > workaround and non-VRAM drivers might also run into such a performance > regression if they use the fbdev's shadow fb. Yeah agreed, fbdev emulation should try to cache the vmap. > Noralf mentioned that there are plans for other DRM clients besides the > console. They would as well run into similar problems. > >>> The thing is that we'd need another generic fbdev emulation for ast and >>> mgag200 that handles this issue properly. >> >> Yeah I dont think we want to jump the gun here. If you can try to >> repro locally and profile where we're wasting cpu time I hope that >> should sched a light what's going wrong here. > > I don't have much time ATM and I'm not even officially at work until > late Aug. I'd send you the revert and investigate later. I agree that > using generic fbdev emulation would be preferable. Still not sure that's the right thing to do really. Yes it's a regression, but vm testcases shouldn run a single line of fbcon or drm code. So why this is impacted so heavily by a silly drm change is very confusing to me. We might be papering over a deeper and much more serious issue ... >>> >>> It's a regression, the right thing is to revert first and then work >>> out the right thing to do. >> >> Sure, but I have no idea whether the testcase is doing something >> reasonable. If it's accidentally testing vm scalability of fbdev and >> there's no one else doing something this pointless, then it's not a >> real bug. Plus I think we're shooting the messenger here. >> >>> It's likely the test runs on the console and printfs stuff out while >>> running. >> >> But why did we not regress the world if a few prints on the console >> have such a huge impact? We didn't get an entire stream of mails about >> breaking stuff ... > > The regression seems not related to the commit. But we have retested > and confirmed the regression. Hard to understand what happens. Take a look at commit cf1ca9aeb930df074bb5bbcde55f935fec04e529 Best regards Thomas > > Best Regards, > Huang, Ying > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel > -- Thomas
Re: [LKP] [drm/mgag200] 90f479ae51: vm-scalability.median -18.8% regression
Hi, Daniel, Daniel Vetter writes: > On Tue, Jul 30, 2019 at 10:27 PM Dave Airlie wrote: >> >> On Wed, 31 Jul 2019 at 05:00, Daniel Vetter wrote: >> > >> > On Tue, Jul 30, 2019 at 8:50 PM Thomas Zimmermann >> > wrote: >> > > >> > > Hi >> > > >> > > Am 30.07.19 um 20:12 schrieb Daniel Vetter: >> > > > On Tue, Jul 30, 2019 at 7:50 PM Thomas Zimmermann >> > > > wrote: >> > > >> Am 29.07.19 um 11:51 schrieb kernel test robot: >> > > >>> Greeting, >> > > >>> >> > > >>> FYI, we noticed a -18.8% regression of vm-scalability.median due to >> > > >>> commit:> >> > > >>> >> > > >>> commit: 90f479ae51afa45efab97afdde9b94b9660dd3e4 ("drm/mgag200: >> > > >>> Replace struct mga_fbdev with generic framebuffer emulation") >> > > >>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git >> > > >>> master >> > > >> >> > > >> Daniel, Noralf, we may have to revert this patch. >> > > >> >> > > >> I expected some change in display performance, but not in VM. Since >> > > >> it's >> > > >> a server chipset, probably no one cares much about display >> > > >> performance. >> > > >> So that seemed like a good trade-off for re-using shared code. >> > > >> >> > > >> Part of the patch set is that the generic fb emulation now maps and >> > > >> unmaps the fbdev BO when updating the screen. I guess that's the cause >> > > >> of the performance regression. And it should be visible with other >> > > >> drivers as well if they use a shadow FB for fbdev emulation. >> > > > >> > > > For fbcon we should need to do any maps/unamps at all, this is for the >> > > > fbdev mmap support only. If the testcase mentioned here tests fbdev >> > > > mmap handling it's pretty badly misnamed :-) And as long as you don't >> > > > have an fbdev mmap there shouldn't be any impact at all. >> > > >> > > The ast and mgag200 have only a few MiB of VRAM, so we have to get the >> > > fbdev BO out if it's not being displayed. If not being mapped, it can be >> > > evicted and make room for X, etc. >> > > >> > > To make this work, the BO's memory is mapped and unmapped in >> > > drm_fb_helper_dirty_work() before being updated from the shadow FB. [1] >> > > That fbdev mapping is established on each screen update, more or less. >> > > From my (yet unverified) understanding, this causes the performance >> > > regression in the VM code. >> > > >> > > The original code in mgag200 used to kmap the fbdev BO while it's being >> > > displayed; [2] and the drawing code only mapped it when necessary (i.e., >> > > not being display). [3] >> > >> > Hm yeah, this vmap/vunmap is going to be pretty bad. We indeed should >> > cache this. >> > >> > > I think this could be added for VRAM helpers as well, but it's still a >> > > workaround and non-VRAM drivers might also run into such a performance >> > > regression if they use the fbdev's shadow fb. >> > >> > Yeah agreed, fbdev emulation should try to cache the vmap. >> > >> > > Noralf mentioned that there are plans for other DRM clients besides the >> > > console. They would as well run into similar problems. >> > > >> > > >> The thing is that we'd need another generic fbdev emulation for ast >> > > >> and >> > > >> mgag200 that handles this issue properly. >> > > > >> > > > Yeah I dont think we want to jump the gun here. If you can try to >> > > > repro locally and profile where we're wasting cpu time I hope that >> > > > should sched a light what's going wrong here. >> > > >> > > I don't have much time ATM and I'm not even officially at work until >> > > late Aug. I'd send you the revert and investigate later. I agree that >> > > using generic fbdev emulation would be preferable. >> > >> > Still not sure that's the right thing to do really. Yes it's a >> > regression, but vm testcases shouldn run a single line of fbcon or drm >> > code. So why this is impacted so heavily by a silly drm change is very >> > confusing to me. We might be papering over a deeper and much more >> > serious issue ... >> >> It's a regression, the right thing is to revert first and then work >> out the right thing to do. > > Sure, but I have no idea whether the testcase is doing something > reasonable. If it's accidentally testing vm scalability of fbdev and > there's no one else doing something this pointless, then it's not a > real bug. Plus I think we're shooting the messenger here. > >> It's likely the test runs on the console and printfs stuff out while running. > > But why did we not regress the world if a few prints on the console > have such a huge impact? We didn't get an entire stream of mails about > breaking stuff ... The regression seems not related to the commit. But we have retested and confirmed the regression. Hard to understand what happens. Best Regards, Huang, Ying ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel