Hi,
Thank you for your quick feedback; I have added some comments below.
Vincent

On Thu, Oct 14, 2010 at 11:53 AM, Andy Polyakov <[email protected]> wrote:

> Hi,
>
> There will be more comments later.
>
> > - For SHA1 (x2) / SHA256 (+40%), not much to say. The SHA256 gain is
> > limited by the low register count of the SH4.
>
> It was not my intention to make you implement SHA256. But since you've
> chosen to do it, here it goes.
>

=> Of course, but since SHA1 is getting really old (~2^51 for a collision),
SHA256 is becoming required for a number of applications.

>
> 1. The file should have been called sha256-sh4.pl, not sha512-sh4.pl.
> MIPS, as well as a number of other modules, are called sha512-*, because
> they either generate *both* SHA512 and SHA256 or SHA512 alone. Modules
> that can't generate SHA512 code should *not* be called sha512-*:-)
>
>
=> I fully agree.
The goal was to avoid patching the Makefile. Currently there is a sha512-%.o
rule that is used to generate the SHA256 code, but no sha256-%.o rule.
Correcting that only requires renaming the .pl file and adding a rule to the
Makefile, which I wanted to avoid in order to keep the patch minimal.

> 2. Why do you use tables of small constants? There is a 'mov #imm,Rn'
> instruction, where #imm is an 8-bit signed value. It works for all [Ss]igma
> constants. The same goes for mask_ff: there is extu.b, which does &0xff...
>
>
=> This is very important for the SH4 series 200 pipeline: there is only one
ALU pipe, so it pays to load constants through the load/store unit and keep
the ALU free for the actual computation.
I will take a deeper look at using the extu/exts instructions.

> 3. Position independence is still a problem.
>
> > - In SH4 asm, the MOVA is hidden behind a normal mov.l without a base
> > register, so in fact, it is used very often.
>
> Can't confirm this. Well, I can see now that it extensively uses 'mov.l
> @(disp,PC),Rn' for loading constants, but no mova... I.e. the following is
> position-independent:
>
>        mov.l   label,rx
>
> label:
>        .long   xxxx
>
> Only[!] as long as xxxx is *not* another label, in which case a
> relocation record is generated, voiding position independence.
>
>
=> I know. Nevertheless, if you compile code for SH4 with gcc -fPIC, the same
applies, so currently I don't see the point of writing truly
position-independent code for this CPU.
I will look into it further anyway.
Note: mova is an ALU instruction on this CPU.

> > The issue is that the possible offset (at most 255 longwords, i.e. about
> > 1KB, forward only from the mov.l) is very small, so it is just not
> > possible to use direct access most of the time.
> > This is normal for this CPU.
>
> Right. So a position-independent way to load the K256 address is something
> like the following:
>
>        mova    K256,r0
>        bra     skip
>        nop
>
> .align  4
> K256:
>        .long   ....
>
> skip:
>
> Needless to say, 'skip' can be a label that is already there,
> e.g. sha256asm_loop0.
>
> Looping is not position-independent either, neither in SHA1 nor in SHA256.
> The position-independent way (for loops larger than 4KB) is the following:
>
> loop_start:
>        ...
>        mov.l   loop_size,rx
>        braf    rx
>        nop
> loop_stop:
> .align  4
> loop_size:
>        .long   loop_start-loop_stop
>
> More to come... A.
> ______________________________________________________________________
> OpenSSL Project                                 http://www.openssl.org
> Development Mailing List                       [email protected]
> Automated List Manager                           [email protected]
>
