Re: [PATCH 2/2] fbdev: Improve performance of sys_imageblit()

2022-02-18 Thread Thomas Zimmermann
Hi Sam
On 18.02.22 at 11:14, Sam Ravnborg wrote:
> Hi Thomas,
> On Thu, Feb 17, 2022 at 11:34:05AM +0100, Thomas Zimmermann wrote:
>> Improve the performance of sys_imageblit() by manually unrolling the inner blitting loop and moving some invariants out. The compiler failed to do this

Re: [PATCH 2/2] fbdev: Improve performance of sys_imageblit()

2022-02-18 Thread Sam Ravnborg
Hi Thomas,
On Thu, Feb 17, 2022 at 11:34:05AM +0100, Thomas Zimmermann wrote:
> Improve the performance of sys_imageblit() by manually unrolling
> the inner blitting loop and moving some invariants out. The compiler
> failed to do this automatically. The resulting binary code was even
> slower

Re: [PATCH 2/2] fbdev: Improve performance of sys_imageblit()

2022-02-18 Thread Javier Martinez Canillas
Hello Thomas,
On 2/17/22 11:34, Thomas Zimmermann wrote:
> Improve the performance of sys_imageblit() by manually unrolling
> the inner blitting loop and moving some invariants out. The compiler
> failed to do this automatically. The resulting binary code was even
> slower than the

Re: [PATCH 2/2] fbdev: Improve performance of sys_imageblit()

2022-02-17 Thread Thomas Zimmermann
Hi
On 17.02.22 at 12:05, Gerd Hoffmann wrote:
>> -    for (j = k; j--; ) {
>> -        shift -= ppw;
>> -        end_mask = tab[(*src >> shift) & bit_mask];
>> -        *dst++ = (end_mask & eorx) ^ bgx;
>> -        if (!shift) {
>> -

Re: [PATCH 2/2] fbdev: Improve performance of sys_imageblit()

2022-02-17 Thread Gerd Hoffmann
> -    for (j = k; j--; ) {
> -        shift -= ppw;
> -        end_mask = tab[(*src >> shift) & bit_mask];
> -        *dst++ = (end_mask & eorx) ^ bgx;
> -        if (!shift) {
> -            shift = 8;
> -

[PATCH 2/2] fbdev: Improve performance of sys_imageblit()

2022-02-17 Thread Thomas Zimmermann
Improve the performance of sys_imageblit() by manually unrolling the inner blitting loop and moving some invariants out. The compiler failed to do this automatically. The resulting binary code was even slower than the cfb_imageblit() helper, which uses the same algorithm, but operates on I/O
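For readers without the full diff at hand, here is a rough sketch of what "unrolling the inner loop and moving invariants out" can look like for the 8 bpp case (4 pixels per 32-bit word). It is not the code from the patch. The names expand_nibble(), blit_row_generic(), blit_row_unrolled(), colortab and check_blit_rows() are made up for illustration; expand_nibble() stands in for the tab[] lookup in the quoted loop and assumes a little-endian pixel layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BPP = 8, PPW = 32 / BPP };	/* 4 pixels per 32-bit word */

/* Hypothetical stand-in for tab[]: expand a 4-bit mono pattern into a
 * per-pixel byte mask, leftmost pixel in the lowest byte (little endian). */
static uint32_t expand_nibble(unsigned int n)
{
	uint32_t mask = 0;
	unsigned int b;

	for (b = 0; b < PPW; b++)
		if (n & (8u >> b))
			mask |= (uint32_t)0xff << (8 * b);
	return mask;
}

/* Generic loop, shaped like the removed lines quoted in the thread. */
static void blit_row_generic(uint32_t *dst, const uint8_t *src, size_t k,
			     uint32_t fgx, uint32_t bgx)
{
	const uint32_t bit_mask = (1u << PPW) - 1;
	const uint32_t eorx = fgx ^ bgx;
	unsigned int shift = 8;
	size_t j;

	for (j = k; j--; ) {
		shift -= PPW;
		uint32_t end_mask = expand_nibble((*src >> shift) & bit_mask);
		*dst++ = (end_mask & eorx) ^ bgx;
		if (!shift) {
			shift = 8;
			src++;
		}
	}
}

/* Unrolled variant: the color math is folded into a lookup table up front,
 * and each source byte produces two output words without shift bookkeeping. */
static void blit_row_unrolled(uint32_t *dst, const uint8_t *src, size_t k,
			      uint32_t fgx, uint32_t bgx)
{
	const uint32_t eorx = fgx ^ bgx;
	uint32_t colortab[16];
	unsigned int n;
	size_t j;

	/* Invariant work; real code would do this once per blit, not per row. */
	for (n = 0; n < 16; n++)
		colortab[n] = (expand_nibble(n) & eorx) ^ bgx;

	for (j = k; j >= 2; j -= 2, src++) {
		*dst++ = colortab[*src >> 4];
		*dst++ = colortab[*src & 0xf];
	}
	if (j)	/* odd word count: only the high nibble is left */
		*dst = colortab[*src >> 4];
}

/* Self-check: both variants must produce identical output. */
static void check_blit_rows(void)
{
	const uint8_t src[] = { 0xa5, 0x3c, 0xf0 };
	const uint32_t fgx = 0x07070707, bgx = 0x01010101;
	uint32_t a[5], b[5];
	size_t i;

	blit_row_generic(a, src, 5, fgx, bgx);
	blit_row_unrolled(b, src, 5, fgx, bgx);
	for (i = 0; i < 5; i++)
		assert(a[i] == b[i]);
}
```

The unrolled variant trades a 16-entry table on the stack for a branch-free inner loop body. Whether that wins on a given machine has to be measured; the sketch only shows the shape of the transformation.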