Re: Bug: dlltool delaylibs corrupt float/double arguments

2022-06-17 Thread Nick Clifton

Hi Strager,


I think the solution is for dll's delaylib trampoline to
save xmm1 on the stack before calling __delayLoadHelper2.
I made a patch which does this, and it fixes the bug for my
code.


Thanks very much for taking the time to track down the cause
of this problem. and for creating a patch. :-)


See attached patch. I think my patch has two problems:

1. AVX/vmovupd/ymm might not be usable on the target
machine, but saving just xmm isn't enough. Should we
perform a CPUID check?


This would only work if the dlltool is run on the same machine,
or same type of machine, as the target machine.  Probably a
safer solution would be to add a new command line option to
select the extended trampoline.  Then it is up to the user to
select the correct trampoline type.

To be really paranoid if the new option is not enabled and
dlltool is running on an x86_64 host, then it could run a
CPUID check and if extended registers are available, issue
a warning message to the user, reminding them of the possible
problem.



2. We store unaligned with vmovupd. Storing aligned with
vmovapd would be better. I haven't looked into how to
align ymm registers when storing on the stack.


"Better" as in better performance, yes ?  I think that in this
case safer is more important than faster, so sticking with
unaligned moves should be OK.



I'd love to get this bug fixed so others don't spend two
days debugging assembly code!


Would you be willing to work on the improvements suggested
above and submitting a revised patch ?  The catch here is
that such a patch would need a copyright assignment from
you before we could accept it.  The links below should provide
more details on this.

Cheers
  Nick

https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html#Legally-Significant
https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob_plain;f=doc/Copyright/request-assign.future;hb=HEAD




Bug: dlltool delaylibs corrupt float/double arguments

2022-05-21 Thread strager
I am calling a function in another x64 DLL with the
following C signature:

int napi_create_double(void*, double, void*);

The first time I call this function, the 'double' argument
ends up as 1.20305e-307 inside napi_create_double, no matter
what value the caller gives. The 'double' is corrupted.
Calls after the first don't corrupt the 'double'.

The cause is ntdll.dll, eventually called by MinGW's
__delayLoadHelper2, modifying the xmm1 register:

#0  0x7ffd26ce3006 in ntdll!RtlLookupFunctionEntry () from
C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x7ffd26ce05e8 in ntdll!LdrGetProcedureAddressForCaller ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x7ffd26ce00a5 in ntdll!LdrGetProcedureAddressForCaller ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#3  0x7ffd245b53dc in KERNELBASE!GetProcAddressForCaller () from
C:\WINDOWS\System32\KernelBase.dll
#4  0x7ffcd7b7ca6f in __delayLoadHelper2 (pidd=0x7ffcd7b8ba70
<__DELAY_IMPORT_DESCRIPTOR_node_napi_lib>,
ppfnIATEntry=0x7ffcd7ecd134 <__imp_napi_create_double>)
at 
C:/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/misc/delayimp.c:209
#5  0x7ffcd7b717c9 in __tailMerge_node_napi_lib ()
   from MYDLL.dll
#6  0x02ad2fe84c50 in ?? ()

   0x7ffd26ce2ffb <+1051>:  movups (%rdx),%xmm0
   0x7ffd26ce2ffe <+1054>:  movups %xmm0,(%rsi)
   0x7ffd26ce3001 <+1057>:  movsd  0x10(%rdx),%xmm1
=> 0x7ffd26ce3006 <+1062>:  movsd  %xmm1,0x10(%rsi)
   0x7ffd26ce300b <+1067>:  mov(%rsi),%rbp
   0x7ffd26ce300e <+1070>:  mov%r11,%rax
   0x7ffd26ce3011 <+1073>:  lock cmpxchg %r12,0x1384d6(%rip)
 # 0x7ffd26e1b4f0
   0x7ffd26ce301a <+1082>:  jne0x7ffd26ce3102


According to Windows x64 documentation, xmm1 is a volatile
register:
https://docs.microsoft.com/en-us/cpp/build/x64-software-conventions?redirectedfrom=MSDN=msvc-170

I think the solution is for dll's delaylib trampoline to
save xmm1 on the stack before calling __delayLoadHelper2.
I made a patch which does this, and it fixes the bug for my
code.

See attached patch. I think my patch has two problems:

1. AVX/vmovupd/ymm might not be usable on the target
   machine, but saving just xmm isn't enough. Should we
   perform a CPUID check?
2. We store unaligned with vmovupd. Storing aligned with
   vmovapd would be better. I haven't looked into how to
   align ymm registers when storing on the stack.

I'd love to get this bug fixed so others don't spend two
days debugging assembly code!

Matthew "strager" Glazar


dlltool.patch
Description: Binary data