> From: Jeremie Courreges-Anglas <[email protected]>
> Date: Fri, 23 Jul 2021 13:54:31 +0200
> 
> On Fri, Jul 23 2021, Mark Kettenis <[email protected]> wrote:
> >> From: Jeremie Courreges-Anglas <[email protected]>
> >> Date: Fri, 23 Jul 2021 11:54:51 +0200
> >> 
> >> 
> >> I've been using a variation of this diff on my hifive unmatched for
> >> a few days.  The goal is to at least optimize the aligned cases by using
> >> 8- or 4-byte loads/stores.  On this hifive unmatched, I found that
> >> loops of unaligned 8- or 4-byte loads/stores are utterly slow, much slower
> >> than equivalent 1-byte loads/stores (say 40x slower).
> >> 
> >> This improves e.g. I/O throughput and shaves between 10 and 15s off
> >> a total of 11m30s in "make clean; make -j4" kernel builds.
> >> 
> >> I have another diff that tries to re-align initially unaligned addresses
> >> if possible but it's uglier and it's hard to tell whether it makes any
> >> difference in real life.
> >> 
> >> ok?
> >> 
> >> 
> >> Index: copy.S
> >> ===================================================================
> >> RCS file: /d/cvs/src/sys/arch/riscv64/riscv64/copy.S,v
> >> retrieving revision 1.6
> >> diff -u -p -p -u -r1.6 copy.S
> >> --- copy.S 28 Jun 2021 18:53:10 -0000      1.6
> >> +++ copy.S 23 Jul 2021 07:45:16 -0000
> >> @@ -49,8 +49,38 @@ ENTRY(copyin)
> >>    SWAP_FAULT_HANDLER(a3, a4, a5)
> >>    ENTER_USER_ACCESS(a4)
> >>  
> >> -// XXX optimize?
> >>  .Lcopyio:
> >> +.Lcopy8:
> >> +  li      a5, 8
> >> +  bltu    a2, a5, .Lcopy4
> >> +
> >> +  or      a7, a0, a1
> >> +  andi    a7, a7, 7
> >> +  bnez    a7, .Lcopy4
> >> +
> >> +1:        ld      a4, 0(a0)
> >> +  addi    a0, a0, 8
> >> +  sd      a4, 0(a1)
> >> +  addi    a1, a1, 8
> >> +  addi    a2, a2, -8
> >> +  bgtu    a2, a5, 1b
> >
> > Shouldn't this be
> >
> >     bgeu    a2, a5, 1b
> 
> Yes, that's better indeed, thanks!  Updated diff.

ok kettenis@
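
For readers following the bgtu/bgeu exchange, here is a small C model of
the 8-byte loop's exit condition.  It is only a sketch and not part of the
diff; the function names leftover_bgtu and leftover_bgeu are made up for
illustration.  Each function returns how many bytes remain once the 8-byte
loop is done.  With bgtu, a length that is a multiple of 8 leaves a final 8
bytes to the narrower loops; with bgeu the 8-byte loop consumes them.

```c
#include <stddef.h>

/* Hypothetical model: bytes left after the 8-byte loop when it ends in bgtu. */
static size_t leftover_bgtu(size_t len)
{
	if (len < 8)			/* bltu a2, a5, .Lcopy4 */
		return len;
	do {
		len -= 8;		/* addi a2, a2, -8 */
	} while (len > 8);		/* bgtu a2, a5, 1b */
	return len;
}

/* Hypothetical model: same loop, ending in bgeu as suggested in the review. */
static size_t leftover_bgeu(size_t len)
{
	if (len < 8)			/* bltu a2, a5, .Lcopy4 */
		return len;
	do {
		len -= 8;		/* addi a2, a2, -8 */
	} while (len >= 8);		/* bgeu a2, a5, 1b */
	return len;
}
```

Both variants copy the right bytes in the end, since the remainder falls
through to the later loops; the difference is only in how much work is
left for them.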

> Index: copy.S
> ===================================================================
> RCS file: /d/cvs/src/sys/arch/riscv64/riscv64/copy.S,v
> retrieving revision 1.6
> diff -u -p -p -u -r1.6 copy.S
> --- copy.S    28 Jun 2021 18:53:10 -0000      1.6
> +++ copy.S    23 Jul 2021 11:52:54 -0000
> @@ -49,8 +49,38 @@ ENTRY(copyin)
>       SWAP_FAULT_HANDLER(a3, a4, a5)
>       ENTER_USER_ACCESS(a4)
>  
> -// XXX optimize?
>  .Lcopyio:
> +.Lcopy8:
> +     li      a5, 8
> +     bltu    a2, a5, .Lcopy4
> +
> +     or      a7, a0, a1
> +     andi    a7, a7, 7
> +     bnez    a7, .Lcopy4
> +
> +1:   ld      a4, 0(a0)
> +     addi    a0, a0, 8
> +     sd      a4, 0(a1)
> +     addi    a1, a1, 8
> +     addi    a2, a2, -8
> +     bgeu    a2, a5, 1b
> +
> +.Lcopy4:
> +     li      a5, 4
> +     bltu    a2, a5, .Lcopy1
> +
> +     andi    a7, a7, 3
> +     bnez    a7, .Lcopy1
> +
> +1:   lw      a4, 0(a0)
> +     addi    a0, a0, 4
> +     sw      a4, 0(a1)
> +     addi    a1, a1, 4
> +     addi    a2, a2, -4
> +     bgeu    a2, a5, 1b
> +
> +.Lcopy1:
> +     beqz    a2, .Lcopy0
>  1:   lb      a4, 0(a0)
>       addi    a0, a0, 1
>       sb      a4, 0(a1)
> @@ -58,6 +88,7 @@ ENTRY(copyin)
>       addi    a2, a2, -1
>       bnez    a2, 1b
>  
> +.Lcopy0:
>       EXIT_USER_ACCESS(a4)
>       SET_FAULT_HANDLER(a3, a4)
>  .Lcopyiodone:
> 
> 
> -- 
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
> 
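
As a reading aid for the diff above, the overall strategy (8-byte loop,
then 4-byte loop, then byte loop, each wide loop gated on alignment) can
be sketched in C.  This is a hypothetical userland model, not the kernel
code: copy_model is an invented name, the argument order (src, dst, len)
mirrors a0, a1, a2, memcpy stands in for the ld/sd and lw/sw pairs, and
the alignment mask is computed once up front, which is this sketch's own
simplification.  Fault handling and user-access toggling are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the widest-aligned-first copy; not the kernel code. */
static void copy_model(const void *src, void *dst, size_t len)
{
	const unsigned char *s = src;
	unsigned char *d = dst;
	/* OR-ing both addresses lets one mask test cover source and destination. */
	uintptr_t bits = (uintptr_t)s | (uintptr_t)d;

	/* 8-byte loop: only if at least 8 bytes remain and both are 8-aligned. */
	if (len >= 8 && (bits & 7) == 0) {
		do {
			uint64_t w;
			memcpy(&w, s, 8);	/* models ld */
			memcpy(d, &w, 8);	/* models sd */
			s += 8; d += 8; len -= 8;
		} while (len >= 8);
	}

	/* 4-byte loop: tail of the 8-byte loop, or data that is only 4-aligned. */
	if (len >= 4 && (bits & 3) == 0) {
		do {
			uint32_t w;
			memcpy(&w, s, 4);	/* models lw */
			memcpy(d, &w, 4);	/* models sw */
			s += 4; d += 4; len -= 4;
		} while (len >= 4);
	}

	/* Byte loop: whatever is left, including every unaligned case. */
	while (len > 0) {
		*d++ = *s++;
		len--;
	}
}
```

Because the pointers only ever advance by multiples of the access width
inside each wide loop, the alignment established at entry still holds when
control falls through to the next, narrower loop.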
