Re: DLL symbol identity

2015-05-13 Thread Benjamin Thaut via Digitalmars-d

On Wednesday, 13 May 2015 at 11:27:18 UTC, Logan Capaldo wrote:


Yes, it won't happen for explicit LoadLibrary and GetProcAddress 
calls, but COM or other plugin systems are an example of a 
situation where many DLLs may expose the same named symbols with 
different definitions, and there may be situations where people 
link to those DLLs directly to get other things they provide.


Once again, I'm going to patch the import table. The import table 
is generated only for symbols which are _imported_ via an import 
library. This only happens for things that get imported by D 
libraries / executables. Linking against multiple DLLs via 
import libraries which export the same symbol doesn't work, 
whether or not I do the patching. So nothing changes in that 
regard. Your COM DLLs are not going to break even if each COM DLL 
exports the same symbol, because these COM-specific symbols will 
not be imported by a D library via an import library, so nothing 
changes. The problems you think exist do not exist, because I only 
patch the import table and not the DLLs that export the 
symbols. Even if you mix D with C++ you are not going to have 
that problem, because you can't link against multiple libraries 
with the same symbol in C++ either.


Re: DLL symbol identity

2015-05-13 Thread Logan Capaldo via Digitalmars-d

On Wednesday, 13 May 2015 at 07:49:26 UTC, Benjamin Thaut wrote:

On Wednesday, 13 May 2015 at 07:41:27 UTC, Logan Capaldo wrote:


If my program only links against DLLs written in D, sure this 
is no worse than the static library/version flag situation. 
But one of D's features is C and C++ interop. For instance if 
I link against a DLL that happens to provide COM objects am I 
going to start getting weird behaviors because all the 
DllGetClassObjects are 'unified' and we just pick one?


Well, this unification will only happen for D libraries. It's not 
going to do that for non-D shared libraries (e.g. ones written in C 
or C++).


And for shared libraries written in a mix of D and C++ or C, or 
shared libraries written in D but that expose extern (C) or 
extern (C++) symbols?


Yes, it won't happen for explicit LoadLibrary and GetProcAddress 
calls, but COM or other plugin systems are an example of a 
situation where many DLLs may expose the same named symbols with 
different definitions, and there may be situations where people 
link to those DLLs directly to get other things they provide.


Re: DLL symbol identity

2015-05-13 Thread Logan Capaldo via Digitalmars-d

On Wednesday, 13 May 2015 at 11:41:27 UTC, Benjamin Thaut wrote:

On Wednesday, 13 May 2015 at 11:27:18 UTC, Logan Capaldo wrote:


Yes, it won't happen for explicit LoadLibrary and GetProcAddress 
calls, but COM or other plugin systems are an example of a 
situation where many DLLs may expose the same named symbols with 
different definitions, and there may be situations where people 
link to those DLLs directly to get other things they provide.


Once again, I'm going to patch the import table. The import 
table is generated only for symbols which are _imported_ via an 
import library. This only happens for things that get imported 
by D libraries / executables. Linking against multiple DLLs via 
import libraries which export the same symbol doesn't work, 
whether or not I do the patching. So nothing changes in that 
regard. Your COM DLLs are not going to break even if each COM 
DLL exports the same symbol, because these COM-specific symbols 
will not be imported by a D library via an import library, so 
nothing changes. The problems you think exist do not exist, 
because I only patch the import table and not the DLLs that 
export the symbols. Even if you mix D with C++ you are not 
going to have that problem, because you can't link against 
multiple libraries with the same symbol in C++ either.


a.dll provides symbol s1
b.dll provides symbol s1

c.dll imports symbol s1 from a.dll, provides symbol s2
d.dll imports symbol s1 from b.dll, provides symbol s3

e.exe imports symbol s2 from c.dll, imports symbol s3 from d.dll. 
e.exe only needs the import libs from c.dll and d.dll.


You're patching the import tables at runtime, correct? If you 
patch c's and d's import tables, their s1 imports are going to end 
up pointing at the same symbol.


I can build a.dll and c.dll completely independently of d.dll and 
b.dll. There's no opportunity to prevent this at compile time. 
Likewise e.exe doesn't know or care s1 exists so it builds fine 
as well. You don't need a.lib or b.lib to build e.exe.


Re: DLL symbol identity

2015-05-13 Thread Benjamin Thaut via Digitalmars-d

On Wednesday, 13 May 2015 at 12:57:35 UTC, Logan Capaldo wrote:


a.dll provides symbol s1
b.dll provides symbol s1

c.dll imports symbol s1 from a.dll, provides symbol s2
d.dll imports symbol s1 from b.dll, provides symbol s3

e.exe imports symbol s2 from c.dll, imports symbol s3 from 
d.dll. e.exe only needs the import libs from c.dll and d.dll.


You're patching the import tables at runtime, correct? If you 
patch c's and d's import tables, their s1 imports are going to end 
up pointing at the same symbol.


I can build a.dll and c.dll completely independently of d.dll 
and b.dll. There's no opportunity to prevent this at compile 
time. Likewise e.exe doesn't know or care s1 exists so it 
builds fine as well. You don't need a.lib or b.lib to build 
e.exe.


Yes, but exactly the same behavior is currently in place on 
Linux. Also, your example is quite a corner case; the usual use 
case, where you want the symbols of multiple instances of the same 
template to be merged, is more common. I don't see any real use 
case in D where it would be important that the duplicated s1 
symbols are not merged. Non-D DLLs will not be touched, and if you 
really need that behavior you can always put your non-D code in a 
separate DLL to avoid this behavior.


Re: DLL symbol identity

2015-05-13 Thread Logan Capaldo via Digitalmars-d

On Wednesday, 13 May 2015 at 13:31:15 UTC, Benjamin Thaut wrote:

On Wednesday, 13 May 2015 at 12:57:35 UTC, Logan Capaldo wrote:


a.dll provides symbol s1
b.dll provides symbol s1

c.dll imports symbol s1 from a.dll, provides symbol s2
d.dll imports symbol s1 from b.dll, provides symbol s3

e.exe imports symbol s2 from c.dll, imports symbol s3 from 
d.dll. e.exe only needs the import libs from c.dll and d.dll.


You're patching the import tables at runtime, correct? If you 
patch c's and d's import tables, their s1 imports are going to end 
up pointing at the same symbol.


I can build a.dll and c.dll completely independently of d.dll 
and b.dll. There's no opportunity to prevent this at compile 
time. Likewise e.exe doesn't know or care s1 exists so it 
builds fine as well. You don't need a.lib or b.lib to build 
e.exe.


Yes, but exactly the same behavior is currently in place on 
Linux. Also, your example is quite a corner case; the usual use 
case, where you want the symbols of multiple instances of the same 
template to be merged, is more common.


Imagine a is msvcr90.dll and b is msvcr100.dll. Or a is 
msvcrt.dll. Or a is mfc100u.dll and b is mfc110u.dll. This 
happens all the time, and all we need is for c and d to have a 
little bit of D in them.


Linux (thankfully) doesn't typically have N versions of libc 
floating around.


I _think_ if you only do this for D-mangled symbols you'll get 
99% of the benefits (doing the right things for templates etc.) 
without causing problems for the corner cases.


Re: DLL symbol identity

2015-05-13 Thread Benjamin Thaut via Digitalmars-d

On Wednesday, 13 May 2015 at 13:50:52 UTC, Logan Capaldo wrote:


I _think_ if you only do this for D-mangled symbols you'll get 
99% of the benefits (doing the right things for templates etc.) 
without causing problems for the corner cases.


Yes, that's the plan. I might even do it only for D data symbols, 
because you don't really care about the identity of functions.


Re: DLL symbol identity

2015-05-13 Thread Benjamin Thaut via Digitalmars-d

On Tuesday, 12 May 2015 at 17:48:50 UTC, Logan Capaldo wrote:
q could be a completely different type in a.dll vs. c.dll. 
Please correct me if I am wrong, but my understanding of how 
import libs get used is that you can't detect this at build time 
and disallow it. When linking d.exe we have no reason to look at 
a.lib and notice the conflict, and even if we did there's no type 
information to go off of anyway and you could assume that they 
were the same.


No, q cannot be a different type in a.dll vs. c.dll. Because of 
the mangling, the type would be called a.q in one and c.q in the 
other, so no conflict would arise.


If you define the same type within the same module but it behaves 
differently depending on where it is used (e.g. depending on 
compiler flags such as -version, -debug, etc.), this is already an 
issue and will also explode with static libraries. So nothing new 
here. The user of the language has to ensure that all uses of a 
type see the same declaration of the type.




Is your intent to only apply this unification to extern (D) 
symbols?

Why not? I can't think of anything special about extern (D) 
declarations. Just as a reminder, Linux already does this for 
_all_ symbols. And it doesn't cause any issues there.


Re: DLL symbol identity

2015-05-13 Thread Logan Capaldo via Digitalmars-d

On Wednesday, 13 May 2015 at 06:17:36 UTC, Benjamin Thaut wrote:

On Tuesday, 12 May 2015 at 17:48:50 UTC, Logan Capaldo wrote:
q could be a completely different type in a.dll vs. c.dll. 
Please correct me if I am wrong, but my understanding of how 
import libs get used is that you can't detect this at build time 
and disallow it. When linking d.exe we have no reason to look at 
a.lib and notice the conflict, and even if we did there's no type 
information to go off of anyway and you could assume that they 
were the same.


No, q cannot be a different type in a.dll vs. c.dll. Because of 
the mangling, the type would be called a.q in one and c.q in the 
other, so no conflict would arise.




Not if q is extern C or extern C++.



Is your intent to only apply this unification to extern (D) 
symbols?

Why not? I can't think of anything special about extern (D) 
declarations. Just as a reminder, Linux already does this for 
_all_ symbols. And it doesn't cause any issues there.


The thing that is special about extern (D) symbols is that the 
module mangling sidesteps my 'q' example.
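For readers less familiar with D mangling, here is a minimal single-file sketch of the point above (the module name `a` and the variable names `dq` and `q` are hypothetical). The extern (D) variable's mangled name encodes its module, while the extern (C) one is just the identifier, so two DLLs can export colliding extern (C) names but not colliding extern (D) ones:

```d
module a;

import std.stdio : writeln;

// extern (D) is the default: the mangled name encodes the module ("a"),
// the identifier and the type, so a.dq and c.dq would be distinct symbols.
__gshared int dq;

// extern (C): the mangled name is only the identifier, so another DLL
// exporting its own extern (C) q yields a symbol with the same name.
extern (C) __gshared int q;

void main()
{
    writeln(dq.mangleof); // _D1a2dqi
    writeln(q.mangleof);  // q
}
```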


It does cause issues on Linux, actually. I've seen it multiple 
times, usually when first-party and third-party code, unbeknownst 
to each other, both embed different versions of a popular 
source-only library.


If my program only links against DLLs written in D, sure this is 
no worse than the static library/version flag situation. But one 
of D's features is C and C++ interop. For instance if I link 
against a DLL that happens to provide COM objects am I going to 
start getting weird behaviors because all the DllGetClassObjects 
are 'unified' and we just pick one?




Re: DLL symbol identity

2015-05-13 Thread Benjamin Thaut via Digitalmars-d

On Wednesday, 13 May 2015 at 07:41:27 UTC, Logan Capaldo wrote:


If my program only links against DLLs written in D, sure this 
is no worse than the static library/version flag situation. But 
one of D's features is C and C++ interop. For instance if I 
link against a DLL that happens to provide COM objects am I 
going to start getting weird behaviors because all the 
DllGetClassObjects are 'unified' and we just pick one?


Well, this unification will only happen for D libraries. It's not 
going to do that for non-D shared libraries (e.g. ones written in C 
or C++). The unification is also only going to happen for things 
that are linked in via an import library. So if you load the stuff 
manually with GetProcAddress you still get the real thing. All 
in all, the summary is: if it breaks with static libraries it will 
break with shared libraries as well. If you have multiple static 
libraries that all define a symbol called DllGetClassObject, 
then it won't even link.


Re: DLL symbol identity

2015-05-12 Thread Logan Capaldo via Digitalmars-d

On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
I personally would prefer option 2 because it would be easier 
to use and wouldn't cause lots of additional maintenance effort.


Any opinions on this? As both options would be quite some work, 
I don't want to start blindly with one and risk it being 
rejected later in the PR.


Kind Regards
Benjamin Thaut



(2) would be nice but how would

a.dll provides a.dll!q

b.dll links against a.dll, provides b.dll!w

c.dll provides c.dll!q

d.exe links against b.dll (b.lib) and c.dll (c.lib).

work?

q could be a completely different type in a.dll vs. c.dll. Please 
correct me if I am wrong, but my understanding of how import libs 
get used is that you can't detect this at build time and disallow 
it. When linking d.exe we have no reason to look at a.lib and 
notice the conflict, and even if we did there's no type information 
to go off of anyway and you could assume that they were the same.


Is your intent to only apply this unification to extern (D) 
symbols?




Re: DLL symbol identity

2015-05-11 Thread Benjamin Thaut via Digitalmars-d

On Sunday, 10 May 2015 at 21:44:59 UTC, Dicebot wrote:
Well choice between two presented options seems obvious so I 
suspect a catch :)


Well, exactly as with the shared library visibility, the only 
catch might be Walter's and Andrei's opinion.


Re: DLL symbol identity

2015-05-11 Thread Marco Leise via Digitalmars-d
On Sun, 10 May 2015 19:51:26 +
Dicebot pub...@dicebot.lv wrote:

 On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
  Pro:
  - It's the plain Windows shared library mechanism in all its 
  ugliness.
 
 I wonder if anyone can provide more Pro input :)

Yep, this is an area where I have no expertise and what you
provided made me wonder if it is a technical analysis or a
sales pitch for unique symbols.

Why did Microsoft go with that approach, why did it work for
them and why does it not map well to D ?

-- 
Marco



Re: DLL symbol identity

2015-05-11 Thread Benjamin Thaut via Digitalmars-d

Why did Microsoft go with that approach,


Maybe they didn't know better back then. Historically, DLLs 
initially didn't support data symbols at all; only functions 
were supported. For functions it's not a problem if they are 
duplicated, because usually you don't compare pointers to 
functions a lot. Later they added support for data symbols, 
building on what they had. I assume the system that is in place 
now is a result of that.



why did it work for them

Because C/C++ are not as template-heavy as D, and you basically 
try to avoid cross-DLL templates in C++ at all cost when 
developing for Windows. If you do use templates across DLL 
boundaries and you are not super careful, you get a lot of 
issues due to duplicate symbols (e.g. static variables existing 
twice, etc.). MSVC gets around the casting issue by essentially 
doing string comparisons for dynamic casts, which comes with a 
significant performance impact. On the other hand, you don't use 
dynamic casts in C++ a lot (if you care about performance).



and why does it not map well to D?

D uses tons of templates everywhere. Even type information for 
non-templated types is generated on demand and stored in comdats, 
which can lead to duplicate symbols the same way it does for 
templates. In D the dynamic cast is basically the default, and you 
have to force the compiler not to use a dynamic cast if you care 
about performance.



It's not like the Linux approach doesn't have issues as well. I 
heard of cases where people put large parts of Boost into a 
shared library and the Linux loader would take multiple minutes 
to load the shared library into the program. This, however, is 
mostly due to the fact that on Linux all symbols of a shared 
library are visible by default. In later versions of GCC (4+) they 
added an option to make all symbols hidden by default 
(-fvisibility=hidden), so you can make only those visible that 
you need. This then significantly speeds up loading of shared 
libraries, because the number of symbols that need to be resolved 
is greatly decreased.


On the other hand, the Linux approach has an additional advantage I 
didn't mention yet. You can use the LD_PRELOAD feature to 
inject shared libraries into processes, e.g. for injecting a 
better malloc library to speed up your favorite program. This is 
not easily possible with the Windows approach to shared libraries.


Re: DLL symbol identity

2015-05-11 Thread Marco Leise via Digitalmars-d
Thanks for the insight into how this affects MSVC++, too.

How much work do you think would have to be done at startup of
an application like Firefox or QtCreator if they were not in
C++, but D?

Most of us have no idea what the algorithm would look like and
what data sets to expect.

I guess you'd have to collect all the imported symbols from
all exe/dll modules and put the list of addresses for each
unique symbol into some multi-set that maps symbol names to a
list of addresses:

abc -> [a.dll @ 0x359428F0, b.dll @ 0x5E30A410]
def -> [b.dll @ 0x38C3D200]

Then the symbol name is no longer relevant, so it can be
thought of as an array of address arrays

[
  [0x359428F0, 0x5E30A410],
  [0x38C3D200]
]

where you pick one item from each of the arrays (e.g. the
first one) and map all others to that:

0x359428F0 -> 0x359428F0
0x5E30A410 -> 0x359428F0
0x38C3D200 -> 0x38C3D200

Then you go through all import address tables and perform
the above remapping to make symbols unique.
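The grouping and remapping described above could be sketched in D roughly as follows. This is only a model under stated assumptions: an import address table slot is represented by a hypothetical `ImportSlot` struct holding the symbol name and the loader-resolved address, and "patching" just rewrites that field; it does not touch real PE import tables.

```d
import std.stdio : writefln;

// Hypothetical model of one import address table (IAT) slot: the imported
// symbol name plus the address the Windows loader filled in for it.
struct ImportSlot
{
    string symbol;
    size_t address;
}

// Group all addresses by symbol name, keep the first address of each
// group as the canonical one and map every address in the group to it.
size_t[size_t] buildRemap(const ImportSlot[] slots)
{
    size_t[][string] bySymbol;          // symbol -> addresses seen for it
    foreach (s; slots)
        bySymbol[s.symbol] ~= s.address;

    size_t[size_t] remap;               // old address -> canonical address
    foreach (addresses; bySymbol)
        foreach (a; addresses)
            remap[a] = addresses[0];
    return remap;
}

// Apply the remapping to every slot, i.e. "patch" the import tables.
void patchSlots(ImportSlot[] slots, const size_t[size_t] remap)
{
    foreach (ref s; slots)
        s.address = remap[s.address];
}

void main()
{
    // The addresses from the example above, as seen by three IAT slots.
    auto slots = [
        ImportSlot("abc", 0x359428F0),
        ImportSlot("abc", 0x5E30A410),
        ImportSlot("def", 0x38C3D200),
    ];
    patchSlots(slots, buildRemap(slots));
    // prints "abc -> 0x359428F0" twice, then "def -> 0x38C3D200"
    foreach (s; slots)
        writefln("%s -> 0x%08X", s.symbol, s.address);
}
```

Picking the first address of each group mirrors the "first come, first served" rule; any deterministic choice would work, as long as every slot for a given name ends up with the same address.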

Is that what would happen?

-- 
Marco



Re: DLL symbol identity

2015-05-11 Thread Martin Nowak via Digitalmars-d

On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:
And Step 2) at program start up time. This means that symbols 
don't have identity. If different shared libraries provide the 
same symbol it may exist multiple times and multiple instances 
might be in use.


Can you elaborate a bit on that?
How would you run into such an ODR violation, by linking against 
multiple import libraries that contain the same symbol?


Any opinions on this? As both options would be quite some work, 
I don't want to start blindly with one and risk it being 
rejected later in the PR.


Last time we thought about this, we came to the conclusion that 
global uniqueness for symbols isn't possible, even on Unix, when 
you have 2 comdat/weak typeinfos for template classes in 2 
different shared libraries but not in the executable. I suggested 
that we could wrap typeinfos for template types in something like 
TypeInfo_Comdat that would do an equality comparison based on name 
and type size.
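A rough sketch of the name-and-size comparison suggested above. `sameType` is a hypothetical free function, not an existing druntime type; it only relies on `TypeInfo.toString()` and `TypeInfo.tsize`:

```d
import std.stdio : writeln;

// Hypothetical helper in the spirit of the suggested TypeInfo_Comdat:
// two TypeInfo objects describe the same type if they are the same
// object, or (for duplicated comdat copies) if name and size agree.
bool sameType(TypeInfo a, TypeInfo b)
{
    if (a is b)
        return true;                  // cheap identity check first
    if (a is null || b is null)
        return false;
    return a.toString() == b.toString() && a.tsize == b.tsize;
}

struct Pair(T) { T first, second; }

void main()
{
    writeln(sameType(typeid(Pair!int), typeid(Pair!int)));  // true
    writeln(sameType(typeid(Pair!int), typeid(Pair!long))); // false
}
```

Within a single executable the first call already succeeds through the identity fast path; the string comparison only comes into play when two shared libraries each carry their own copy of the same TypeInfo.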


Re: DLL symbol identity

2015-05-11 Thread Benjamin Thaut via Digitalmars-d

On Monday, 11 May 2015 at 14:57:46 UTC, Marco Leise wrote:


Is that what would happen?


Yes, that's exactly what would happen. You could go one step 
further and not do it for all symbols; instead you make the 
compiler emit an additional section with references to all 
relevant data symbols. Then you only do the patching operation on 
the data symbols and leave all other symbols as is. This would 
greatly reduce the number of symbols that require patching.
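A minimal sketch of that filtering step, assuming the compiler-emitted section can be read back as a set of symbol names. `symbolsToPatch` and the names in the example are hypothetical:

```d
import std.stdio : writeln;

// Hypothetical: `dataSymbols` stands in for the additional section the
// compiler would emit, listing the data symbols whose identity matters
// (e.g. TypeInfo objects). Only imports found there get patched.
string[] symbolsToPatch(const string[] imported, const bool[string] dataSymbols)
{
    string[] result;
    foreach (name; imported)
        if (name in dataSymbols)
            result ~= name;
    return result;
}

void main()
{
    // Hypothetical, unmangled names purely for illustration.
    auto imported = ["foo.someFunction", "foo.someGlobal", "TypeInfo for foo.S"];
    bool[string] dataSymbols = ["foo.someGlobal": true, "TypeInfo for foo.S": true];
    writeln(symbolsToPatch(imported, dataSymbols));
    // ["foo.someGlobal", "TypeInfo for foo.S"]
}
```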


The expected data set size should be significantly smaller than 
on Linux, because currently on Linux D simply exports all 
symbols, which means that the Linux loader does this patching for 
all symbols. On Windows only symbols with the export protection 
level get exported. That means the set of symbols this patching 
has to be done for is a lot smaller to begin with. The additional 
optimization would reduce the number of symbols to patch once 
again. So even if the custom implementation is vastly inferior to 
what the Linux loader does (which I don't think it will be), it 
still should be fast enough not to influence program startup time 
a lot.


Re: DLL symbol identity

2015-05-11 Thread Benjamin Thaut via Digitalmars-d

On 11.05.2015 at 16:21, Martin Nowak wrote:


Can you elaborate a bit on that?
How would you run into such an ODR violation, by linking against
multiple import libraries that contain the same symbol?


I will post some code examples later. Code usually shows the issue best.



Last time we thought about this, we came to the conclusion that global
uniqueness for symbols isn't possible, even on Unix, when you have 2
comdat/weak typeinfos for template classes in 2 different shared
libraries but not in the executable. I suggested that we could wrap
typeinfos for template types in something like TypeInfo_Comdat that
would do an equality comparison based on name and type size.


Do you have a code example for this issue? I wasn't able to produce a 
duplicate symbol with linux shared libraries yet.




Re: DLL symbol identity

2015-05-11 Thread Laeeth Isharc via Digitalmars-d

On Monday, 11 May 2015 at 12:54:09 UTC, Benjamin Thaut wrote:

and why does it not map well to D?
D uses tons of templates everywhere. Even type information for 
non-templated types is generated on demand and stored in 
comdats, which can lead to duplicate symbols the same way it 
does for templates. In D the dynamic cast is basically the 
default, and you have to force the compiler not to use a dynamic 
cast if you care about performance.


Sorry for the rookie question, but my background is C rather than 
C++. How do I force a static cast, and roughly what order of 
magnitude is the cost of a dynamic cast?


Would you mean, for example, rather than casting a char[] to a 
string, taking the address and casting the pointer?


Re: DLL symbol identity

2015-05-11 Thread Paulo Pinto via Digitalmars-d

On Monday, 11 May 2015 at 15:32:47 UTC, Benjamin Thaut wrote:

On Monday, 11 May 2015 at 14:57:46 UTC, Marco Leise wrote:


Is that what would happen?


Yes, that's exactly what would happen. You could go one step 
further and not do it for all symbols; instead you make the 
compiler emit an additional section with references to all 
relevant data symbols. Then you only do the patching operation 
on the data symbols and leave all other symbols as is. This 
would greatly reduce the number of symbols that require 
patching.


The expected data set size should be significantly smaller than 
on Linux, because currently on Linux D simply exports all 
symbols, which means that the Linux loader does this patching 
for all symbols. On Windows only symbols with the export 
protection level get exported. That means the set of symbols 
this patching has to be done for is a lot smaller to begin 
with. The additional optimization would reduce the number of 
symbols to patch once again. So even if the custom 
implementation is vastly inferior to what the Linux loader does 
(which I don't think it will be), it still should be fast enough 
not to influence program startup time a lot.



Just as info, Windows is not alone.

There are a few other systems that follow the same process.

For example, AIX used to be Windows-like and nowadays it has a 
mix of ELF and Windows modes.


http://www.ibm.com/developerworks/aix/library/au-aix-symbol-visibility/

Symbian although dead, also used the Windows approach if I 
remember correctly.


I expect other non-POSIX OSes not to follow the ELF way.

--
Paulo


Re: DLL symbol identity

2015-05-11 Thread Piotrek via Digitalmars-d

On Sunday, 10 May 2015 at 19:27:03 UTC, Benjamin Thaut wrote:

Does nobody have an opinion on this?


Sorry for being an extreme noob in the matter.

Probably only Manu has fought with Windows DLLs for real.
As a user I would say I want short startup times, as I 
change/execute the active application *very* often. However, I'm 
not sure whether I'm hitting an HDD seek-time penalty or system 
loader activity.


TBH I think Linux feels more sluggish, which I don't like (but 
again, this may be a prefetch problem, I don't know).


And by maintenance overhead for the 1st option, do you mean 
explicit handling in library source code? Isn't that the job of 
the compiler/linker?


Piotrek


Re: DLL symbol identity

2015-05-11 Thread Benjamin Thaut via Digitalmars-d

On 11.05.2015 at 21:39, Laeeth Isharc wrote:

On Monday, 11 May 2015 at 12:54:09 UTC, Benjamin Thaut wrote:

and why does it not map well to D?

D uses tons of templates everywhere. Even type information for
non-templated types is generated on demand and stored in comdats,
which can lead to duplicate symbols the same way it does for
templates. In D the dynamic cast is basically the default, and you
have to force the compiler not to use a dynamic cast if you care
about performance.


Sorry for the rookie question, but my background is C rather than C++.
How do I force a static cast, and roughly what order of magnitude is
the cost of a dynamic cast?

Would you mean, for example, rather than casting a char[] to a string,
taking the address and casting the pointer?


Dynamic casts only apply to classes. They don't apply to basic types.

Example

Object o = instance;
SomeClass c = cast(SomeClass)o; // dynamic cast, checks type info
SomeClass c2 = cast(SomeClass)cast(void*)o; // unsafe cast, simply assumes o is a SomeClass


If you do the cast in a tight loop it can have quite some performance 
impact, because it walks the type info chain. Walking the type info 
hierarchy may cause multiple cache misses and thus a significant 
performance impact. The unsafe cast literally does nothing besides 
copying the pointer.
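A self-contained, runnable version of the example above; the class names are placeholders:

```d
import std.stdio : writeln;

class Base {}
class SomeClass : Base {}
class Other : Base {}

void main()
{
    Object o = new SomeClass;
    Object p = new Other;

    // Dynamic cast: checks the type info at run time and yields null
    // when the object is not a SomeClass (or derived from it).
    SomeClass c = cast(SomeClass) o;
    SomeClass d = cast(SomeClass) p;
    writeln(c !is null); // true
    writeln(d is null);  // true

    // Unsafe cast through void*: no check at all, the reference is just
    // reinterpreted. Only valid here because o really is a SomeClass.
    SomeClass c2 = cast(SomeClass) cast(void*) o;
    writeln(c2 is c);    // true
}
```

If `p` were cast through `void*` instead, the result would be a `SomeClass` reference to an `Other` object, and using it would be undefined behavior; that is the price of skipping the type info walk.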


Re: DLL symbol identity

2015-05-11 Thread Laeeth Isharc via Digitalmars-d

On Monday, 11 May 2015 at 20:53:40 UTC, Benjamin Thaut wrote:

On 11.05.2015 at 21:39, Laeeth Isharc wrote:

On Monday, 11 May 2015 at 12:54:09 UTC, Benjamin Thaut wrote:

and why does it not map well to D?
D uses tons of templates everywhere. Even type information 
for non-templated types is generated on demand and stored in 
comdats, which can lead to duplicate symbols the same way it 
does for templates. In D the dynamic cast is basically the 
default, and you have to force the compiler not to use a 
dynamic cast if you care about performance.


Sorry for the rookie question, but my background is C rather 
than C++. How do I force a static cast, and roughly what order 
of magnitude is the cost of a dynamic cast?

Would you mean, for example, rather than casting a char[] to a 
string, taking the address and casting the pointer?


Dynamic casts only apply to classes. They don't apply to basic 
types.


Example

Object o = instance;
SomeClass c = cast(SomeClass)o; // dynamic cast, checks type info
SomeClass c2 = cast(SomeClass)cast(void*)o; // unsafe cast, simply assumes o is a SomeClass


If you do the cast in a tight loop it can have quite some 
performance impact, because it walks the type info chain. 
Walking the type info hierarchy may cause multiple cache misses 
and thus a significant performance impact. The unsafe cast 
literally does nothing besides copying the pointer.


aha - thank you.  I appreciate it.  Laeeth.


Re: DLL symbol identity

2015-05-10 Thread Dicebot via Digitalmars-d
Well choice between two presented options seems obvious so I 
suspect a catch :)


Re: DLL symbol identity

2015-05-10 Thread Benjamin Thaut via Digitalmars-d

On 10.05.2015 at 21:51, Dicebot wrote:

On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:

Pro:
- It's the plain Windows shared library mechanism in all its ugliness.


I wonder if anyone can provide more Pro input :)


I described both implementations of shared libraries. From the 
description alone you should be able to find any other pro arguments 
for the Windows approach. The only one I could find was that it's faster 
at program startup time compared to the Linux one, but it is inferior in 
all other points.


Re: DLL symbol identity

2015-05-10 Thread Benjamin Thaut via Digitalmars-d

Does nobody have an opinion on this?



Re: DLL symbol identity

2015-05-10 Thread Dicebot via Digitalmars-d

On Friday, 8 May 2015 at 05:26:01 UTC, Benjamin Thaut wrote:

Pro:
- It's the plain Windows shared library mechanism in all its 
ugliness.


I wonder if anyone can provide more Pro input :)


Re: DLL symbol identity

2015-05-08 Thread Kagamin via Digitalmars-d
As I understand, if SomeClass is in some dll, it will be there 
and be unique. If typeid(SomeClass) loads the symbol address from 
IAT, it will be the same address as in dll.


Re: DLL symbol identity

2015-05-08 Thread Benjamin Thaut via Digitalmars-d

On Friday, 8 May 2015 at 08:04:20 UTC, Kagamin wrote:
As I understand, if SomeClass is in some dll, it will be there 
and be unique. If typeid(SomeClass) loads the symbol address 
from IAT, it will be the same address as in dll.


No, you don't understand. TypeInfos are stored in comdats, and 
they are only created if needed. So if you have SomeClass there 
is a TypeInfo for SomeClass, but not all possible TypeInfos are 
created. Say the DLL containing SomeClass never uses 
const(SomeClass), but two other DLLs do; then each of those two 
DLLs will contain an instance of the TypeInfo for 
const(SomeClass). This issue gets even worse with TypeInfos of 
templated types.
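A small illustration of the on-demand TypeInfo point, assuming a current DMD/druntime: `typeid(const(SomeClass))` yields its own TypeInfo object, distinct from `typeid(SomeClass)`. In a single executable there is exactly one of each; the problem described above appears once several DLLs each instantiate their own copy of the const one.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;

class SomeClass {}

void main()
{
    TypeInfo plain = typeid(SomeClass);
    TypeInfo constQualified = typeid(const(SomeClass));

    // const(SomeClass) is described by its own TypeInfo object, separate
    // from the class's TypeInfo; an identity comparison tells them apart.
    writeln(plain is constQualified);                        // false
    writeln(constQualified.toString().startsWith("const(")); // true
}
```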


Re: DLL symbol identity

2015-05-08 Thread Benjamin Thaut via Digitalmars-d

On 08.05.2015 at 13:34, Kagamin wrote:

bool checkIfSomeClass(Object o)
{
   return typeid(o) is typeid(SomeClass);
}

Doesn't typeid(o) extract TypeInfo from the object? If it's stored as a
physical value in the object, how can it change transparently for const
class?

As I understand, C++ resorts to preinstantiation of needed templates
when compiling to dlls.


This is obviously a very simplified example. You either have to take my 
word for it about the actual issue and voice your opinion on the 
decision to make, or dig into DMD's sources, understand how type infos 
work and then question my issue description. But please don't question 
my description of the issue without actually understanding what the 
implementation looks like.


Let me put my question in a different way:

From the point of view of a D user, would you rather have 'is' expressions 
and 'static' / '__gshared' variables inside classes do strange things 
sometimes when using DLLs, or would you want it to always work without 
considering the underlying implementation? Please choose option 1 or 
option 2.


Re: DLL symbol identity

2015-05-08 Thread Kagamin via Digitalmars-d

bool checkIfSomeClass(Object o)
{
  return typeid(o) is typeid(SomeClass);
}

Doesn't typeid(o) extract TypeInfo from the object? If it's 
stored as a physical value in the object, how can it change 
transparently for const class?


As I understand, C++ resorts to preinstantiation of needed 
templates when compiling to dlls.


DLL symbol identity

2015-05-07 Thread Benjamin Thaut via Digitalmars-d
To implement shared libraries on an operating system level, 
generally two steps have to be taken:


1) Locate which shared library provides a required symbol
2) Load that library and retrieve the final address of the symbol

Linux does both of those steps at program start up time. As a 
result all symbols have identity. If a symbol appears in 
multiple shared libraries, only one will be used (first come, first 
served) and the rest will remain unused.


Windows does step 1) at link time (through so-called import 
libraries) and step 2) at program start up time. This means that 
symbols don't have identity. If different shared libraries 
provide the same symbol, it may exist multiple times and multiple 
instances might be in use.
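To make the difference concrete, here is a hypothetical model (not how either loader is actually implemented) in which each library is just a map from exported names to addresses. The Linux-style resolver picks the first provider in load order for every importer; the Windows-style resolver uses whichever DLL was recorded at link time through the import library:

```d
import std.stdio : writeln;

// Hypothetical model: each library maps its exported names to addresses.
alias Exports = size_t[string];

// Linux-style binding at start-up: search the loaded libraries in load
// order; the first library exporting the name wins for every importer.
size_t resolveFirstCome(const Exports[] loadOrder, string name)
{
    foreach (lib; loadOrder)
        if (auto p = name in lib)
            return *p;
    assert(0, "unresolved symbol: " ~ name);
}

// Windows-style binding: the import library already recorded which DLL
// provides the name, so each importer gets that particular DLL's copy.
size_t resolveBound(const Exports[string] dlls, string dll, string name)
{
    return dlls[dll][name];
}

void main()
{
    Exports a = ["s1": 0x1000];   // a provides s1
    Exports b = ["s1": 0x2000];   // b provides its own s1

    // Linux: every importer of s1 ends up with a's definition.
    writeln(resolveFirstCome([a, b], "s1") == 0x1000);  // true

    // Windows: one importer was linked against a's import library,
    // another against b's, so two different s1 instances are in use.
    Exports[string] dlls = ["a.dll": a, "b.dll": b];
    writeln(resolveBound(dlls, "a.dll", "s1")
        != resolveBound(dlls, "b.dll", "s1"));        // true
}
```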


Why is this important for D?
D uses symbol identity in a few places usually through the 'is' 
operator. The most notable is type info objects.


bool checkIfSomeClass(Object o)
{
  return typeid(o) is typeid(SomeClass);
}

The everyday D-user relies on this behavior usually when doing 
dynamic casts.

Object o = ...;
SomeClass c = cast(SomeClass)o;
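Why duplicated symbols break these checks can be modelled without any DLLs: 'is' compares addresses, so two copies of the same type description are different objects even though they describe the same type. `TypeDescriptor` below is a hypothetical stand-in for a TypeInfo object, not a druntime class:

```d
import std.stdio : writeln;

// Hypothetical stand-in for a TypeInfo object. If two DLLs each carry a
// copy of the same comdat TypeInfo, the process holds two such objects.
final class TypeDescriptor
{
    string name;
    this(string name) { this.name = name; }
}

void main()
{
    auto copyInDllA = new TypeDescriptor("const(foo.SomeClass)");
    auto copyInDllB = new TypeDescriptor("const(foo.SomeClass)");

    // 'is' compares addresses, so the two copies look like different types...
    writeln(copyInDllA is copyInDllB);            // false
    // ...while a string comparison of the names treats them as equal.
    writeln(copyInDllA.name == copyInDllB.name);  // true
}
```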

So if symbols don't have identity, all places within druntime and 
phobos which rely on symbol identity have to be identified and 
changed to make it work with Windows DLLs. I'm currently at a 
point in my Windows DLL implementation where I have to decide how 
to solve this issue. There are two options now.


Option 1)
Leave as is, symbols won't have identity.

Con:
- It has a performance impact, because to make casts and other 
features that rely on type info objects work, we will have to 
fall back to string comparisons on Windows.
- All places within druntime and phobos which use symbol identity 
have to be found and fixed. This is a lot of work and might 
produce many bugs.
- Library writers have to consider this problem every time they 
extend / modify druntime / phobos.
- There are going to be tons of threads on D.learn about "Why 
does this not work in a DLL?"


Pro:
- It's the plain Windows shared library mechanism in all its 
ugliness.
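To give an idea of what the string comparison fallback in Option 1 could look like, here is a sketch of a name-based instance check. It walks `TypeInfo_Class.base` and compares `TypeInfo_Class.name` strings; `isInstanceByName` is a hypothetical helper, not druntime's actual cast implementation, and it ignores interfaces:

```d
import std.stdio : writeln;

// Hypothetical fallback: decide whether o is a `target` (or a subclass
// of it) by walking the class hierarchy through TypeInfo_Class.base and
// comparing names instead of TypeInfo addresses. Interfaces are ignored.
bool isInstanceByName(Object o, TypeInfo_Class target)
{
    if (o is null)
        return false;
    for (TypeInfo_Class ti = typeid(o); ti !is null; ti = ti.base)
    {
        // Identity is the cheap check; the string comparison is what
        // costs time when this runs in a tight loop.
        if (ti is target || ti.name == target.name)
            return true;
    }
    return false;
}

class Base {}
class Derived : Base {}

void main()
{
    Object o = new Derived;
    writeln(isInstanceByName(o, typeid(Base)));           // true
    writeln(isInstanceByName(new Base, typeid(Derived))); // false
}
```

Each step of the walk may now cost a string comparison instead of a single pointer comparison, which is where the performance concern in the first Con comes from.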


Option 2)
Windows already generates an indirection table we could patch: 
rebind the symbols at program start up time, overwriting the 
results of the Windows program loader, essentially reproducing 
the behavior of Linux with code in druntime.


Pro:
- Symbols would have identity.
- Everything would behave the same way as on Linux.
- No runtime performance impact.

Con:
- Performance impact at program start up time.
- Might increase the binary size (I'm not entirely sure yet if I 
can read all required information out of the binary itself or if 
I have to add more myself)




I personally would prefer option 2 because it would be easier to 
use and wouldn't cause lots of additional maintenance effort.


Any opinions on this? As both options would be quite some work, I 
don't want to start blindly with one and risk it being 
rejected later in the PR.


Kind Regards
Benjamin Thaut