Greetings,
Thanks for the details; that explains it all very nicely. I grabbed
NASM, installed it, and everything built just fine.
Just for fun, I grabbed "objconv", disassembled sha256-586.obj, and
looked at the sections of code that were a problem with the
unsupported ml.exe. When NASM created the object code, they looked
like this:
L$011grand_avx LABEL NEAR
    vmovdqu xmm0, xmmword ptr [edi] ; 33A0 _ C5 FA: 6F. 07
    vmovdqu xmm1, xmmword ptr [edi+10H] ; 33A4 _ C5 FA: 6F. 4F, 10
    vmovdqu xmm2, xmmword ptr [edi+20H] ; 33A9 _ C5 FA: 6F. 57, 20
    vmovdqu xmm3, xmmword ptr [edi+30H] ; 33AE _ C5 FA: 6F. 5F, 30
    ; Note: Immediate operand could be made smaller by sign extension
    add edi, 64 ; 33B3 _ 81. C7, 00000040
    vpshufb xmm0, xmm0, xmm7 ; 33B9 _ C4 E2 79: 00. C7
    mov dword ptr [esp+64H], edi ; 33BE _ 89. 7C 24, 64
    vpshufb xmm1, xmm1, xmm7 ; 33C2 _ C4 E2 71: 00. CF
    vpshufb xmm2, xmm2, xmm7 ; 33C7 _ C4 E2 69: 00. D7
    vpaddd xmm4, xmm0, xmmword ptr [ebp] ; 33CC _ C5 F9: FE. 65, 00
    vpshufb xmm3, xmm3, xmm7 ; 33D1 _ C4 E2 61: 00. DF
    vpaddd xmm5, xmm1, xmmword ptr [ebp+10H] ; 33D6 _ C5 F1: FE. 6D, 10
    vpaddd xmm6, xmm2, xmmword ptr [ebp+20H] ; 33DB _ C5 E9: FE. 75, 20
    vpaddd xmm7, xmm3, xmmword ptr [ebp+30H] ; 33E0 _ C5 E1: FE. 7D, 30
    vmovdqa xmmword ptr [esp+20H], xmm4 ; 33E5 _ C5 F9: 7F. 64 24, 20
    vmovdqa xmmword ptr [esp+30H], xmm5 ; 33EB _ C5 F9: 7F. 6C 24, 30
    vmovdqa xmmword ptr [esp+40H], xmm6 ; 33F1 _ C5 F9: 7F. 74 24, 40
    vmovdqa xmmword ptr [esp+50H], xmm7 ; 33F7 _ C5 F9: 7F. 7C 24, 50
    ; Note: Immediate operand could be made smaller by sign extension
    jmp L$012avx_00_47 ; 33FD _ E9, 0000000E
I remembered that x86masm.pl rewrites a memory operand to XMMWORD PTR
when the other operand is an xmm register, so out of curiosity I added
a little code to my previous patch to do the same for the
instruction's third operand, if present:
--- openssl-1.0.2-stable-SNAP-20140226\crypto\perlasm\x86masm_ORIG.pl	2014-02-27 10:40:36.599122930 -0400
+++ openssl-1.0.2-stable-SNAP-20140226\crypto\perlasm\x86masm.pl	2014-02-27 23:10:15.051142244 -0400
@@ -22,6 +22,10 @@
 { # fix xmm references
 $arg[0] =~ s/\b[A-Z]+WORD\s+PTR/XMMWORD PTR/i if ($arg[1]=~/\bxmm[0-7]\b/i);
 $arg[1] =~ s/\b[A-Z]+WORD\s+PTR/XMMWORD PTR/i if ($arg[0]=~/\bxmm[0-7]\b/i);
+ if (defined($arg[2])) {
+ $arg[2] =~ s/\b[A-Z]+WORD\s+PTR/XMMWORD PTR/i if ($arg[1]=~/\bxmm[0-7]\b/i);
+ $arg[2] =~ s/\b[A-Z]+WORD\s+PTR/XMMWORD PTR/i if ($arg[0]=~/\bxmm[0-7]\b/i);
+ }
 }
 
 &::emit($opcode,@arg);
@@ -160,13 +164,13 @@
 { push(@out,"PUBLIC\t".&::LABEL($_[0],$nmdecor.$_[0])."\n"); }
 
 sub ::data_byte
-{ push(@out,("DB\t").join(',',@_)."\n"); }
+{ push @out, (("DB\t").join(',',splice(@_, 0, 16))."\n") while @_; }
 
 sub ::data_short
-{ push(@out,("DW\t").join(',',@_)."\n"); }
+{ push @out, (("DW\t").join(',',splice(@_, 0, 8))."\n") while @_; }
 
 sub ::data_word
-{ push(@out,("DD\t").join(',',@_)."\n"); }
+{ push @out, (("DD\t").join(',',splice(@_, 0, 4))."\n") while @_; }
 
 sub ::align
 { push(@out,"ALIGN\t$_[0]\n"); }
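For anyone who doesn't read Perl, the rule in the first hunk can be sketched in Python. This is only an illustrative transliteration of the lines shown in that hunk, not code from OpenSSL; the names fix_xmm_refs and _widen are made up for this sketch.

```python
import re
from typing import List

# Same patterns as the Perl: any "<size>WORD PTR" qualifier, and a
# reference to one of the eight xmm registers available in 32-bit mode.
SIZE_PTR = re.compile(r"\b[A-Z]+WORD\s+PTR", re.IGNORECASE)
XMM_REG = re.compile(r"\bxmm[0-7]\b", re.IGNORECASE)


def _widen(operand: str) -> str:
    # Perl's s/// without /g rewrites only the first match.
    return SIZE_PTR.sub("XMMWORD PTR", operand, count=1)


def fix_xmm_refs(args: List[str]) -> List[str]:
    """Hypothetical Python version of the patched 'fix xmm references' block."""
    out = list(args)
    if len(out) > 1:
        if XMM_REG.search(out[1]):
            out[0] = _widen(out[0])
        if XMM_REG.search(out[0]):
            out[1] = _widen(out[1])
    # The new part: also fix a third operand, if present.
    if len(out) > 2 and (XMM_REG.search(out[1]) or XMM_REG.search(out[0])):
        out[2] = _widen(out[2])
    return out
```

With an illustrative input, fix_xmm_refs(["xmm4", "xmm0", "DWORD PTR [ebp]"]) returns ["xmm4", "xmm0", "XMMWORD PTR [ebp]"], which is the operand form both listings show for vpaddd.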
I then generated the object again the "wrong" way, with ml.exe. It
passed the tests as far as my older CPU could take it. Disassembling
that section gave:
$L011grand_avx LABEL NEAR
    vmovdqu xmm0, xmmword ptr [edi] ; 3380 _ C5 FA: 6F. 07
    vmovdqu xmm1, xmmword ptr [edi+10H] ; 3384 _ C5 FA: 6F. 4F, 10
    vmovdqu xmm2, xmmword ptr [edi+20H] ; 3389 _ C5 FA: 6F. 57, 20
    vmovdqu xmm3, xmmword ptr [edi+30H] ; 338E _ C5 FA: 6F. 5F, 30
    add edi, 64 ; 3393 _ 83. C7, 40
    vpshufb xmm0, xmm0, xmm7 ; 3396 _ C4 E2 79: 00. C7
    mov dword ptr [esp+64H], edi ; 339B _ 89. 7C 24, 64
    vpshufb xmm1, xmm1, xmm7 ; 339F _ C4 E2 71: 00. CF
    vpshufb xmm2, xmm2, xmm7 ; 33A4 _ C4 E2 69: 00. D7
    vpaddd xmm4, xmm0, xmmword ptr [ebp] ; 33A9 _ C5 F9: FE. 65, 00
    vpshufb xmm3, xmm3, xmm7 ; 33AE _ C4 E2 61: 00. DF
    vpaddd xmm5, xmm1, xmmword ptr [ebp+10H] ; 33B3 _ C5 F1: FE. 6D, 10
    vpaddd xmm6, xmm2, xmmword ptr [ebp+20H] ; 33B8 _ C5 E9: FE. 75, 20
    vpaddd xmm7, xmm3, xmmword ptr [ebp+30H] ; 33BD _ C5 E1: FE. 7D, 30
    vmovdqa xmmword ptr [esp+20H], xmm4 ; 33C2 _ C5 F9: 7F. 64 24, 20
    vmovdqa xmmword ptr [esp+30H], xmm5 ; 33C8 _ C5 F9: 7F. 6C 24, 30
    vmovdqa xmmword ptr [esp+40H], xmm6 ; 33CE _ C5 F9: 7F. 74 24, 40
    vmovdqa xmmword ptr [esp+50H], xmm7 ; 33D4 _ C5 F9: 7F. 7C 24, 50
    jmp $L012avx_00_47 ; 33DA _ EB, 04
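It looks like ml.exe is generating better code now, at least for that
section: the instructions encode the same as NASM's, except that
ml.exe picks the shorter forms for add edi, 64 (83 C7 40 instead of
81 C7 00000040) and for the final jmp (EB 04 instead of E9 0000000E).
I know the issue at https://github.com/openssl/openssl/issues/34 is
officially due to an "unsupported" use, but just in case that old
MASM code is still in the distribution for a reason, I thought I'd
report what I found.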
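The two listings differ in how add edi, 64 and the jmp are encoded. The sketch below shows the choice between the long and short forms, using the standard x86 encodings (81 /0 id versus 83 /0 ib for ADD r/m32, imm; E9 rel32 versus EB rel8 for JMP). The function names are hypothetical and this is not how either assembler is implemented. Note that objconv prints the 32-bit immediate as a value (00000040); the bytes actually stored are 40 00 00 00. The jmp targets follow from the listing addresses: 0x33DA + 2 + 0x04 = 0x33E0 and 0x33FD + 5 + 0x0E = 0x3410.

```python
import struct

# ModRM "rm" numbers for the 32-bit general-purpose registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_add_reg_imm(reg: str, imm: int, allow_imm8: bool = True) -> bytes:
    """Encode `add <reg32>, imm` with a register destination."""
    modrm = 0xC0 | (0 << 3) | REG32[reg]  # mod=11 (register); /0 selects ADD
    if allow_imm8 and -128 <= imm <= 127:
        # 83 /0 ib: one immediate byte, sign-extended to 32 bits by the CPU.
        return bytes([0x83, modrm]) + struct.pack("<b", imm)
    # 81 /0 id: four immediate bytes, little-endian.
    return bytes([0x81, modrm]) + struct.pack("<I", imm & 0xFFFFFFFF)


def encode_jmp(addr: int, target: int, allow_short: bool = True) -> bytes:
    """Encode a direct jmp at `addr`; the displacement is relative to the next instruction."""
    rel8 = target - (addr + 2)  # EB cb is 2 bytes long
    if allow_short and -128 <= rel8 <= 127:
        return bytes([0xEB]) + struct.pack("<b", rel8)
    rel32 = target - (addr + 5)  # E9 cd is 5 bytes long
    return bytes([0xE9]) + struct.pack("<i", rel32)
```

For example, encode_add_reg_imm("edi", 64) gives the ml.exe bytes 83 C7 40, and passing allow_imm8=False gives the NASM bytes 81 C7 40 00 00 00.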
Thanks again,
Steve...
On Thu, Feb 27, 2014 at 3:58 PM, Andy Polyakov wrote:
> Hi,
>
>> ... Even with NASM doing the SafeSEH stuff, I think I do have to put
>> the /safeseh part in myself for VS to complete the build properly,
>> don't I?
>
>
> No.
>
>
>> I remember when I did not put it in for
>> ml.exe that it complained about the other stuff not being safeseh, so
>> I think that is needed for VS.
>
>
> That is correct. You must add /safeseh to ml, but you don't need it with
> nasm. Basically it works like this: if all .obj modules you link on any
> given occasion are safeseh-aware, then the linker will generate a safeseh
> table even without the /safeseh argument. If at least one .obj is not
> safeseh-aware (like one generated by ml without the /safeseh flag), then
> the linker will fail if you pass the /safeseh argument, and will silently
> omit the table otherwise.
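The linker rule described above can be modelled as a small decision function. This is a sketch of the behaviour as described, not of link.exe's implementation; ObjModule and link_safeseh are hypothetical names, and safeseh_flag stands for passing /safeseh to the linker.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjModule:
    """One .obj file handed to the linker (hypothetical model)."""
    name: str
    safeseh_aware: bool  # e.g. produced by nasm, or by ml with /safeseh


def link_safeseh(modules: List[ObjModule], safeseh_flag: bool) -> bool:
    """Return True if a SafeSEH table is emitted, False if it is silently
    omitted; raise RuntimeError where the real linker would fail."""
    not_aware = [m.name for m in modules if not m.safeseh_aware]
    if not not_aware:
        # Every module is SafeSEH-aware: the table is built with or without the flag.
        return True
    if safeseh_flag:
        # Asking for SafeSEH with a non-aware module is a hard error.
        raise RuntimeError("not SafeSEH-compatible: " + ", ".join(not_aware))
    # No flag: the link succeeds, but without a SafeSEH table.
    return False
```

So an ml-built object assembled without /safeseh either breaks a /safeseh link or quietly drops the table, while an all-nasm (or all-/safeseh) set of objects gets the table either way.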
>
>
>> As for the incorrect results ... that is the "fun" part! I compared
>> my new "I used vpaddq for the heck of it" 32-bit build to a 64-bit
>> build out there in net-land from a year ago, and I get the same hash,
>> see below. That is why I was wondering, scratching my head :-)
>
>
> Is your processor AVX-capable? The code in question is AVX code, and it
> takes an AVX-capable processor to observe the incorrect result. Otherwise
> *another* code path is executed, and it produces the correct result.
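The point that a bug in the AVX path stays invisible on a non-AVX CPU comes down to run-time dispatch. OpenSSL makes that decision from its own capability vector (OPENSSL_ia32cap_P); the sketch below only illustrates the idea with two paths. The function names are hypothetical, and the raw CPUID leaf 1 ECX and XCR0 values are taken as inputs because plain Python cannot execute CPUID or XGETBV itself.

```python
# CPUID leaf 1, ECX feature bits.
OSXSAVE_BIT = 1 << 27  # OS has enabled XSAVE/XGETBV
AVX_BIT = 1 << 28      # CPU implements AVX
# XCR0 bits 1 and 2: the OS saves SSE and AVX (YMM) register state.
XCR0_SSE_AVX = (1 << 1) | (1 << 2)


def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """True only if the CPU has AVX and the OS has enabled saving its state."""
    if not (cpuid1_ecx & AVX_BIT) or not (cpuid1_ecx & OSXSAVE_BIT):
        return False
    return (xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX


def pick_sha256_path(cpuid1_ecx: int, xcr0: int) -> str:
    """Hypothetical dispatcher: a bug in AVX code can only show up on the 'avx' path."""
    return "avx" if avx_usable(cpuid1_ecx, xcr0) else "non-avx"
```

On an older CPU without the AVX bit, pick_sha256_path always returns "non-avx", which is why the tests can pass there even when the AVX code is wrong.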
>
> ______________________________________________________________________
> OpenSSL Project http://www.openssl.org
> Development Mailing List [email protected]
> Automated List Manager [email protected]
--
Steve Kneizys
Senior Business Process Engineer
Voice: (610) 256-1396 [For Emergency Service (888)864-3282]
Ferrilli Information Group -- Quality Service and Solutions for Higher Education
web: http://www.ferrilli.com/
Making you a success while exceeding your expectations.