Greetings,

Thanks for the details; that explains it all very nicely.  I grabbed
NASM, installed it, and everything built just fine.

Just for fun, I grabbed "objconv" and disassembled sha256-586.obj to
look at the sections of code that were an issue when using the
unsupported ml.exe.  This is what they look like when NASM creates the
object code:

L$011grand_avx LABEL NEAR
        vmovdqu xmm0, xmmword ptr [edi]                 ; 33A0 _ C5 FA: 6F. 07
        vmovdqu xmm1, xmmword ptr [edi+10H]             ; 33A4 _ C5 FA: 6F. 4F, 10
        vmovdqu xmm2, xmmword ptr [edi+20H]             ; 33A9 _ C5 FA: 6F. 57, 20
        vmovdqu xmm3, xmmword ptr [edi+30H]             ; 33AE _ C5 FA: 6F. 5F, 30
; Note: Immediate operand could be made smaller by sign extension
        add     edi, 64                                 ; 33B3 _ 81. C7, 00000040
        vpshufb xmm0, xmm0, xmm7                        ; 33B9 _ C4 E2 79: 00. C7
        mov     dword ptr [esp+64H], edi                ; 33BE _ 89. 7C 24, 64
        vpshufb xmm1, xmm1, xmm7                        ; 33C2 _ C4 E2 71: 00. CF
        vpshufb xmm2, xmm2, xmm7                        ; 33C7 _ C4 E2 69: 00. D7
        vpaddd  xmm4, xmm0, xmmword ptr [ebp]           ; 33CC _ C5 F9: FE. 65, 00
        vpshufb xmm3, xmm3, xmm7                        ; 33D1 _ C4 E2 61: 00. DF
        vpaddd  xmm5, xmm1, xmmword ptr [ebp+10H]       ; 33D6 _ C5 F1: FE. 6D, 10
        vpaddd  xmm6, xmm2, xmmword ptr [ebp+20H]       ; 33DB _ C5 E9: FE. 75, 20
        vpaddd  xmm7, xmm3, xmmword ptr [ebp+30H]       ; 33E0 _ C5 E1: FE. 7D, 30
        vmovdqa xmmword ptr [esp+20H], xmm4             ; 33E5 _ C5 F9: 7F. 64 24, 20
        vmovdqa xmmword ptr [esp+30H], xmm5             ; 33EB _ C5 F9: 7F. 6C 24, 30
        vmovdqa xmmword ptr [esp+40H], xmm6             ; 33F1 _ C5 F9: 7F. 74 24, 40
        vmovdqa xmmword ptr [esp+50H], xmm7             ; 33F7 _ C5 F9: 7F. 7C 24, 50
; Note: Immediate operand could be made smaller by sign extension
        jmp     L$012avx_00_47                          ; 33FD _ E9, 0000000E

I remembered that x86masm.pl already switches the size specifier to
XMMWORD when an opcode's other arg is an xmm register, so out of
curiosity I added a little code to my previous patch to apply the same
fix to the opcode's third arg as well, if present:

 --- openssl-1.0.2-stable-SNAP-20140226\crypto\perlasm\x86masm_ORIG.pl 2014-02-27 10:40:36.599122930 -0400
 +++ openssl-1.0.2-stable-SNAP-20140226\crypto\perlasm\x86masm.pl 2014-02-27 23:10:15.051142244 -0400
 @@ -22,6 +22,10 @@
      { # fix xmm references
  $arg[0] =~ s/\b[A-Z]+WORD\s+PTR/XMMWORD PTR/i if ($arg[1]=~/\bxmm[0-7]\b/i);
  $arg[1] =~ s/\b[A-Z]+WORD\s+PTR/XMMWORD PTR/i if ($arg[0]=~/\bxmm[0-7]\b/i);
 + if (defined($arg[2])) {
 + $arg[2] =~ s/\b[A-Z]+WORD\s+PTR/XMMWORD PTR/i if ($arg[1]=~/\bxmm[0-7]\b/i);
 + $arg[2] =~ s/\b[A-Z]+WORD\s+PTR/XMMWORD PTR/i if ($arg[0]=~/\bxmm[0-7]\b/i);
 + }
      }
      &::emit($opcode,@arg);
 @@ -160,13 +164,13 @@
  {   push(@out,"PUBLIC\t".&::LABEL($_[0],$nmdecor.$_[0])."\n");   }
  sub ::data_byte
 -{   push(@out,("DB\t").join(',',@_)."\n"); }
 +{   push @out, (("DB\t").join(',',splice(@_, 0, 16))."\n") while @_; }
  sub ::data_short
 -{   push(@out,("DW\t").join(',',@_)."\n"); }
 +{   push @out, (("DW\t").join(',',splice(@_, 0, 8))."\n") while @_; }
  sub ::data_word
 -{   push(@out,("DD\t").join(',',@_)."\n"); }
 +{   push @out, (("DD\t").join(',',splice(@_, 0, 4))."\n") while @_; }
  sub ::align
 {   push(@out,"ALIGN\t$_[0]\n"); }
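
For anyone who wants to poke at the two changes in the patch without running the perlasm scripts, here is a rough Python sketch of the same logic. It is only an illustration under assumptions: the function names fix_xmm_refs and data_lines are hypothetical, the sample operands are made up, and the real behavior is whatever x86masm.pl does in Perl.

```python
import re

# Same patterns as the Perl patch: any "<SIZE>WORD PTR" specifier,
# and an xmm0..xmm7 register operand.
SIZE_PTR = re.compile(r"\b[A-Z]+WORD\s+PTR", re.IGNORECASE)
XMM_REG = re.compile(r"\bxmm[0-7]\b", re.IGNORECASE)


def fix_xmm_refs(args):
    """Hypothetical helper mirroring the patched 'fix xmm references' step.

    args[0] is fixed when args[1] is an xmm register, args[1] when
    args[0] is, and (the new part) args[2] when either of the first
    two is.
    """
    args = list(args)

    def has_xmm(i):
        return i < len(args) and XMM_REG.search(args[i]) is not None

    def to_xmmword(i):
        # count=1 matches Perl's s/// without the /g flag.
        args[i] = SIZE_PTR.sub("XMMWORD PTR", args[i], count=1)

    if len(args) >= 2:
        if has_xmm(1):
            to_xmmword(0)
        if has_xmm(0):
            to_xmmword(1)
    if len(args) >= 3 and (has_xmm(0) or has_xmm(1)):
        to_xmmword(2)
    return args


def data_lines(directive, values, per_line):
    """Hypothetical helper mirroring the DB/DW/DD change: emit at most
    per_line values per directive instead of one very long line."""
    values = [str(v) for v in values]
    return [
        directive + "\t" + ",".join(values[i:i + per_line])
        for i in range(0, len(values), per_line)
    ]


if __name__ == "__main__":
    # Made-up three-operand AVX example with a non-XMMWORD specifier.
    print(fix_xmm_refs(["xmm4", "xmm0", "QWORD PTR [ebp]"]))
    for line in data_lines("DB", range(20), 16):
        print(line)
```

The third-arg branch is the only behavioral addition; two-operand instructions go through the same two substitutions as before.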


Then I generated it again the unsupported way, using ml.exe.  It
passed the tests as far as my older CPU could take it.  I then
disassembled the same section:

$L011grand_avx LABEL NEAR
        vmovdqu xmm0, xmmword ptr [edi]                 ; 3380 _ C5 FA: 6F. 07
        vmovdqu xmm1, xmmword ptr [edi+10H]             ; 3384 _ C5 FA: 6F. 4F, 10
        vmovdqu xmm2, xmmword ptr [edi+20H]             ; 3389 _ C5 FA: 6F. 57, 20
        vmovdqu xmm3, xmmword ptr [edi+30H]             ; 338E _ C5 FA: 6F. 5F, 30
        add     edi, 64                                 ; 3393 _ 83. C7, 40
        vpshufb xmm0, xmm0, xmm7                        ; 3396 _ C4 E2 79: 00. C7
        mov     dword ptr [esp+64H], edi                ; 339B _ 89. 7C 24, 64
        vpshufb xmm1, xmm1, xmm7                        ; 339F _ C4 E2 71: 00. CF
        vpshufb xmm2, xmm2, xmm7                        ; 33A4 _ C4 E2 69: 00. D7
        vpaddd  xmm4, xmm0, xmmword ptr [ebp]           ; 33A9 _ C5 F9: FE. 65, 00
        vpshufb xmm3, xmm3, xmm7                        ; 33AE _ C4 E2 61: 00. DF
        vpaddd  xmm5, xmm1, xmmword ptr [ebp+10H]       ; 33B3 _ C5 F1: FE. 6D, 10
        vpaddd  xmm6, xmm2, xmmword ptr [ebp+20H]       ; 33B8 _ C5 E9: FE. 75, 20
        vpaddd  xmm7, xmm3, xmmword ptr [ebp+30H]       ; 33BD _ C5 E1: FE. 7D, 30
        vmovdqa xmmword ptr [esp+20H], xmm4             ; 33C2 _ C5 F9: 7F. 64 24, 20
        vmovdqa xmmword ptr [esp+30H], xmm5             ; 33C8 _ C5 F9: 7F. 6C 24, 30
        vmovdqa xmmword ptr [esp+40H], xmm6             ; 33CE _ C5 F9: 7F. 74 24, 40
        vmovdqa xmmword ptr [esp+50H], xmm7             ; 33D4 _ C5 F9: 7F. 7C 24, 50
        jmp     $L012avx_00_47                          ; 33DA _ EB, 04


It looks like ml.exe is generating better code now, for that opcode at
least: the vpaddd encodings in this listing match the NASM ones byte
for byte.  I know the issue on
https://github.com/openssl/openssl/issues/34 is officially due to an
"unsupported" use, but just in case that old masm code is still in the
distribution for a reason, I thought I'd report what I found out.

Thanks again,

Steve...

On Thu, Feb 27, 2014 at 3:58 PM, Andy Polyakov <[email protected]> wrote:
> Hi,
>
>> ...  Even with NASM doing the SafeSEH stuff, I
>>
>> think I do have to put the /safeseh part in myself for VS to complete
>> the build properly, don't I?
>
>
> No.
>
>
>> I remember when I did not put it in for
>> ml.exe that it complained about the other stuff not being safeseh, so
>> I think that is needed for VS.
>
>
> That is correct. You must add /safeseh to ml. But you don't need it with
> nasm. Basically it works like this. If all .obj modules you try to link at
> any given occasion are safeseh-aware, then linker will generate safeseh
> table even without /safeseh argument. If at least one .obj is not
> safeseh-aware (like one generated by ml without /safeseh flag), then linker
> will fail if you pass /safeseh argument and silently omit the table
> otherwise.
>
>
>> As for the incorrect results ... that is the "fun" part!  I compared
>> my new "I used vpaddq for the heck of it" 32 bit build to a 64 bit
>> build out there in net-land from a year ago, and I get the same hash,
>> see below.  That is why I was wondering, scratching my head :-)
>
>
> Is your processor AVX-capable? The code in question is AVX and it takes
> an AVX-capable processor to observe the incorrect result. Otherwise *another*
> code path is executed and produces the correct result.
>
> ______________________________________________________________________
> OpenSSL Project                                 http://www.openssl.org
> Development Mailing List                       [email protected]
> Automated List Manager                           [email protected]



-- 
Steve Kneizys
Senior Business Process Engineer
Voice: (610) 256-1396  [For Emergency Service (888)864-3282]
Ferrilli Information Group -- Quality Service and Solutions for Higher Education
web: http://www.ferrilli.com/

Making you a success while exceeding your expectations.