Re: DMD producing huge binaries

2016-05-23 Thread Steven Schveighoffer via Digitalmars-d

On 5/23/16 4:25 PM, Walter Bright wrote:

On 5/23/2016 12:03 PM, Steven Schveighoffer wrote:

Indeed, D does not overload based on return type ever.


It does factor into type matching when a function pointer is used, for
example.



Yes, that may be the only time the return type is factored in, but 
practically speaking, a function overload set that overloads solely on 
return type is not directly callable (you will get an ambiguity error), so 
there is little point in creating such a situation.


-Steve


Re: DMD producing huge binaries

2016-05-23 Thread Walter Bright via Digitalmars-d

On 5/23/2016 12:03 PM, Steven Schveighoffer wrote:

Indeed, D does not overload based on return type ever.


It does factor into type matching when a function pointer is used, for example.



Re: DMD producing huge binaries

2016-05-23 Thread Steven Schveighoffer via Digitalmars-d

On 5/23/16 3:03 PM, Steven Schveighoffer wrote:

On 5/22/16 5:42 PM, Andrei Alexandrescu wrote:

On 05/21/2016 03:13 PM, Kagamin wrote:

On Saturday, 21 May 2016 at 18:18:21 UTC, Andrei Alexandrescu wrote:

He said that that won't happen any longer, the growth was because of
the return type. Is that correct?


https://issues.dlang.org/show_bug.cgi?id=15831#c4
Looks like growth is due to the fact that the voldemort type is in the
scope of a function and function is fun!(T).fun(T) - hence each level
multiplies by 2. And return type is not part of the mangled name already
- that would cause circular dependency, you would need the voldemort
type mangling to generate it.


Just to make sure I understand: do you agree or disagree that there's no
more exponential growth if we encode "auto return" in the mangling? Thx!
-- Andrei


Not sure if someone has definitively answered before, but no, it does
not. I think this would only shrink the growth from 3^n to 2^n.



To clarify, I think this is still a good idea, but only if the type is 
defined internally (no reason to repeat the entire function signature of 
the function you are defining). But if you specify auto return and 
return int, it should still be mangled as returning int.


-Steve


Re: DMD producing huge binaries

2016-05-23 Thread Steven Schveighoffer via Digitalmars-d

On 5/23/16 3:03 PM, Steven Schveighoffer wrote:


In fact, this may be a reason to NOT shortcut the return mangle
(if you had compiled against foo one day, and foo's internal voldemort
type changes the next, it would still link, even though the type may
have changed).


Actually, this is not true. It will only fail to link if the name of the 
internal type changes, or you return some other type :(


-Steve


Re: DMD producing huge binaries

2016-05-23 Thread Steven Schveighoffer via Digitalmars-d

On 5/22/16 5:42 PM, Andrei Alexandrescu wrote:

On 05/21/2016 03:13 PM, Kagamin wrote:

On Saturday, 21 May 2016 at 18:18:21 UTC, Andrei Alexandrescu wrote:

He said that that won't happen any longer, the growth was because of
the return type. Is that correct?


https://issues.dlang.org/show_bug.cgi?id=15831#c4
Looks like growth is due to the fact that the voldemort type is in the
scope of a function and function is fun!(T).fun(T) - hence each level
multiplies by 2. And return type is not part of the mangled name already
- that would cause circular dependency, you would need the voldemort
type mangling to generate it.


Just to make sure I understand: do you agree or disagree that there's no
more exponential growth if we encode "auto return" in the mangling? Thx!
-- Andrei


Not sure if someone has definitively answered before, but no, it does 
not. I think this would only shrink the growth from 3^n to 2^n.


The exponential growth happens because of the repeating of the template 
types.


If you look at the example in the bug report, there are no return types 
anywhere. They exist in the mangled names, but are not relevant to the 
FQN or symbol resolution. The return type only plays a role in 
preventing linking against other files that expect a certain return 
type. In fact, this may be a reason to NOT shortcut the return mangle 
(if you had compiled against foo one day, and foo's internal voldemort 
type changed the next, it would still link, even though the type may 
have changed).


Indeed, D does not overload based on return type ever.

-Steve


Re: DMD producing huge binaries

2016-05-23 Thread Steven Schveighoffer via Digitalmars-d

On 5/19/16 6:17 PM, Walter Bright wrote:

On 5/19/2016 8:39 AM, Steven Schveighoffer wrote:

template foo(T) if (someConstraints)
{
    struct Result
    {
        T t;
    }

    auto foo(T t)
    {
        return Result(t);
    }
}


This solution works awesomely, actually. It even produces smaller code
than
moving the struct completely outside.

I'm going to use this mechanism in iopipe to get my voldemorts back :)



Consider using a 'static struct'. A non-static struct will include a
hidden member that points to the stack frame of foo(), i.e. "produces
smaller code".


Yes, I did use static in my actual code. In fact, I found a bug because 
of this (one of my voldemort functions was allocating a closure because 
I had thought I was using a member but was actually using a function 
parameter).
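
The difference Walter points out can be sketched like this (a minimal
hypothetical example; the names `withContext`/`withoutContext` are mine,
not from the thread):

```d
// Hypothetical sketch: static vs. non-static nested struct.
auto withContext(T)(T t)
{
    struct R { T t; }  // non-static: carries a hidden pointer to the enclosing frame
    return R(t);
}

auto withoutContext(T)(T t)
{
    static struct R { T t; }  // static: no hidden frame pointer
    return R(t);
}

void main()
{
    import std.stdio : writeln;
    writeln(typeof(withContext(1)).sizeof);    // includes the hidden context pointer
    writeln(typeof(withoutContext(1)).sizeof); // just the payload
}
```

Comparing the two `.sizeof` values shows the extra pointer the non-static
version drags along.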


But this doesn't fix the symbol size problem. The other mechanism does.

-Steve


Re: DMD producing huge binaries

2016-05-22 Thread Anon via Digitalmars-d

On Sunday, 22 May 2016 at 21:40:07 UTC, Andrei Alexandrescu wrote:

On 05/21/2016 06:27 PM, Anon wrote:

On Saturday, 21 May 2016 at 20:50:56 UTC, Walter Bright wrote:

On 5/21/2016 1:49 PM, Walter Bright wrote:
We already have a compressor in the compiler source for compressing names:

https://github.com/dlang/dmd/blob/master/src/backend/compress.c

A faster one would certainly be nice. Anyone game?


Note how well it does:

https://issues.dlang.org/show_bug.cgi?id=15831#c5


The underlying symbols are still growing exponentially. Nobody has the 
RAM for that, regardless of what size the resultant binary is.

Compressing the symbol names is a bandage. The compiler needs 
a new kidney.


My understanding is that the encoding "auto" return in the 
mangling makes any exponential growth disappear. Is that the 
case? -- Andrei


No:

auto foo(T)(T x)
{
struct Vold { T payload; }
return Vold(x);
}

foo(5)
_D3mod10__T3fooTiZ3fooFNaNbNiNfiZS3mod10__T3fooTiZ3fooFiZ4Vold

foo(5).foo
_D3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFNaNbNiNfS3mod10__T3fooTiZ3fooFiZ4VoldZS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4Vold

foo(5).foo.foo
_D3mod94__T3fooTS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZ3fooFNaNbNiNfS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZS3mod94__T3fooTS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZ3fooFS3mod38__T3fooTS3mod10__T3fooTiZ3fooFiZ4VoldZ3fooFS3mod10__T3fooTiZ3fooFiZ4VoldZ4VoldZ4Vold

Lengths: 62 | 174 | 398

Just dropping the return types to a single character ($) shrinks 
the names, but it doesn't solve the core of the problem. Still 
exponential:


foo(5)
_D3mod1010__T3fooTiZ3fooFNaNbNiNfiZ($)

foo(5).foo
_D3mod38__T3fooT(S3mod10__T3fooTiZ3fooFiZ4Vold)Z3fooFNaNbNiNf(S3mod10__T3fooTiZ3fooFiZ4Vold)Z{$}

foo(5).foo.foo
_D3mod94__T3fooT{S3mod38__T3fooT(S3mod10__T3fooTiZ3fooFiZ4Vold)Z3fooF(S3mod10__T3fooTiZ3fooFiZ4Vold)Z4Vold}Z3fooFNaNbNiNf{S3mod38__T3fooT(S3mod10__T3fooTiZ3fooFiZ4Vold)Z3fooF(S3mod10__T3fooTiZ3fooFiZ4Vold)Z4Vold}Z$

Lengths: 36 | 90 | 202

Note: the part inside () is the return type of the first. The 
part in {} is the return type of the second. I left those in for 
illustrative purposes.
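
For what it's worth, the growth is easy to observe directly with
`.mangleof` (a hypothetical sketch; the exact lengths depend on the
compiler version, so I won't quote numbers):

```d
// Hypothetical sketch: watching the mangled-name length grow per chain level.
auto foo(T)(T x)
{
    struct Vold { T payload; }
    return Vold(x);
}

void main()
{
    import std.stdio : writeln;
    writeln(typeof(foo(5)).mangleof.length);         // one level
    writeln(typeof(foo(5).foo).mangleof.length);     // two levels (UFCS chain)
    writeln(typeof(foo(5).foo.foo).mangleof.length); // three levels
}
```

Each added level should roughly multiply the previous length, matching the
62 | 174 | 398 pattern above.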


Re: DMD producing huge binaries

2016-05-22 Thread Andrei Alexandrescu via Digitalmars-d

On 05/21/2016 03:13 PM, Kagamin wrote:

On Saturday, 21 May 2016 at 18:18:21 UTC, Andrei Alexandrescu wrote:

He said that that won't happen any longer, the growth was because of
the return type. Is that correct?


https://issues.dlang.org/show_bug.cgi?id=15831#c4
Looks like growth is due to the fact that the voldemort type is in the
scope of a function and function is fun!(T).fun(T) - hence each level
multiplies by 2. And return type is not part of the mangled name already
- that would cause circular dependency, you would need the voldemort
type mangling to generate it.


Just to make sure I understand: do you agree or disagree that there's no 
more exponential growth if we encode "auto return" in the mangling? Thx! 
-- Andrei


Re: DMD producing huge binaries

2016-05-22 Thread Andrei Alexandrescu via Digitalmars-d

On 05/21/2016 06:27 PM, Anon wrote:

On Saturday, 21 May 2016 at 20:50:56 UTC, Walter Bright wrote:

On 5/21/2016 1:49 PM, Walter Bright wrote:

We already have a compressor in the compiler source for compressing
names:

https://github.com/dlang/dmd/blob/master/src/backend/compress.c

A faster one would certainly be nice. Anyone game?


Note how well it does:

https://issues.dlang.org/show_bug.cgi?id=15831#c5


The underlying symbols are still growing exponentially. Nobody has the
RAM for that, regardless of what size the resultant binary is.

Compressing the symbol names is a bandage. The compiler needs a new kidney.


My understanding is that the encoding "auto" return in the mangling 
makes any exponential growth disappear. Is that the case? -- Andrei


Re: DMD producing huge binaries

2016-05-21 Thread Anon via Digitalmars-d

On Saturday, 21 May 2016 at 20:50:56 UTC, Walter Bright wrote:

On 5/21/2016 1:49 PM, Walter Bright wrote:
We already have a compressor in the compiler source for compressing names:

https://github.com/dlang/dmd/blob/master/src/backend/compress.c

A faster one would certainly be nice. Anyone game?


Note how well it does:

https://issues.dlang.org/show_bug.cgi?id=15831#c5


The underlying symbols are still growing exponentially. Nobody 
has the RAM for that, regardless of what size the resultant 
binary is.


Compressing the symbol names is a bandage. The compiler needs a 
new kidney.


Re: DMD producing huge binaries

2016-05-21 Thread Walter Bright via Digitalmars-d

On 5/21/2016 1:49 PM, Walter Bright wrote:

We already have a compressor in the compiler source for compressing names:

  https://github.com/dlang/dmd/blob/master/src/backend/compress.c

A faster one would certainly be nice. Anyone game?


Note how well it does:

https://issues.dlang.org/show_bug.cgi?id=15831#c5


Re: DMD producing huge binaries

2016-05-21 Thread Walter Bright via Digitalmars-d

On 5/21/2016 1:09 PM, Era Scarecrow wrote:

 Depends on the implementation and algorithm. However, even the weakest compression
settings can yield huge initial benefits; in normal text a reduction of 2:1 is
expected. The benefit is that compressed output still contains all of its
information, whereas hashing reduces the size to a fixed length and throws that
information away, keeping only a unique identity.

 LZO is a compressor/decompressor that prioritizes speed and can be used in
real-time applications, adding compression to any application for very little
cost. I tinkered with some of its options a while back and it did okay - not as
good as zlib, but that's to be expected. Although using zlib on its fastest
setting will likely yield good results too.

 http://www.oberhumer.com/opensource/lzo/


We already have a compressor in the compiler source for compressing names:

  https://github.com/dlang/dmd/blob/master/src/backend/compress.c

A faster one would certainly be nice. Anyone game?


Re: DMD producing huge binaries

2016-05-21 Thread Era Scarecrow via Digitalmars-d

On Saturday, 21 May 2016 at 17:34:19 UTC, Kagamin wrote:
Equivalent to not mangling return type at all. But this leaves 
you with 2^^n growth, still exponential. Also is there any 
evidence that compression is faster than hashing?


 Depends on the implementation and algorithm. However, even the 
weakest compression settings can yield huge initial benefits; 
in normal text a reduction of 2:1 is expected. The benefit is 
that compressed output still contains all of its information, 
whereas hashing reduces the size to a fixed length and throws 
that information away, keeping only a unique identity.


 LZO is a compressor/decompressor that prioritizes speed and can 
be used in real-time applications, adding compression to any 
application for very little cost. I tinkered with some of its 
options a while back and it did okay - not as good as zlib, but 
that's to be expected. Although using zlib on its fastest 
setting will likely yield good results too.


 http://www.oberhumer.com/opensource/lzo/
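
As a rough way to compare the two approaches on a repetitive name, here is
a sketch using Phobos (`replicate` just fakes the kind of repetition a deep
template chain produces; the stand-in string is hypothetical):

```d
// Hypothetical sketch: compression keeps information, hashing fixes the size.
void main()
{
    import std.array : replicate;
    import std.digest.sha : sha256Of;
    import std.stdio : writeln;
    import std.zlib : compress;

    // Stand-in for a highly repetitive mangled name.
    auto name = replicate("S3mod10__T3fooTiZ3fooFiZ4Vold", 64);

    writeln("original:   ", name.length, " bytes");
    writeln("compressed: ", compress(name).length, " bytes (reversible)");
    writeln("hashed:     ", sha256Of(name).length, " bytes (fixed, one-way)");
}
```

The compressed form stays proportional to the information content, while
the hash is constant-size but discards everything except identity.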


Re: DMD producing huge binaries

2016-05-21 Thread Guillaume Boucher via Digitalmars-d

On Saturday, 21 May 2016 at 18:25:46 UTC, Walter Bright wrote:

On 5/20/2016 11:18 PM, poliklosio wrote:

foo!(boo!(bar!(baz!(int))), #1, #2)
Where #1 and #2 are special symbols that refer to stuff that was **already in the name**, particularly:
#1: bar!(baz!(int))
#2: baz!(int)


This is what LZW compression does, except that LZW does it in 
general rather than just for identifiers.


It's more like LZ77 than LZW.


Re: DMD producing huge binaries

2016-05-21 Thread poliklosio via Digitalmars-d

On Saturday, 21 May 2016 at 18:25:46 UTC, Walter Bright wrote:

On 5/20/2016 11:18 PM, poliklosio wrote:
I have an idea of reproducible, less-than-exponential time mangling, although I don't know how actionable.

Leave mangling as is, but pretend that you are mangling something different, for example when the input is
foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int))
pretend that you are mangling
foo!(boo!(bar!(baz!(int))), #1, #2)
Where #1 and #2 are special symbols that refer to stuff that was **already in the name**, particularly:
#1: bar!(baz!(int))
#2: baz!(int)


This is what LZW compression does, except that LZW does it in 
general rather than just for identifiers.


That's true, but a general compression algorithm requires a 
stream of chars as input, so to compress something of exponential 
size you still need exponential runtime in order to iterate 
(while compressing) on the exponential input.
The idea was to avoid such exponential iteration in the first 
place by doing some sort of caching.
My thinking is that if you only reduce size but keep exponential 
runtime, you are going to have to revisit the issue in a few 
months anyway when people start using things you enabled more 
heavily.

Anyway I wish you good luck with all this!


Re: DMD producing huge binaries

2016-05-21 Thread poliklosio via Digitalmars-d

On Saturday, 21 May 2016 at 08:57:57 UTC, Johan Engelen wrote:

On Saturday, 21 May 2016 at 06:18:05 UTC, poliklosio wrote:


I have an idea of reproducible, less-than-exponential time 
mangling, although I don't know how actionable.


Leave mangling as is, but pretend that you are mangling 
something different, for example when the input is

foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int))
pretend that you are mangling
foo!(boo!(bar!(baz!(int))), #1, #2)
Where #1 and #2 are special symbols that refer to stuff that 
was **already in the name**, particularly:

#1: bar!(baz!(int))
#2: baz!(int)


See 
http://forum.dlang.org/post/szodxrizfmufqdkpd...@forum.dlang.org


So if I understand correctly, you tried implementing something 
like this, but it didn't help much even with size of mangled 
names. Are you sure you were testing on the pathological case 
(exponential stuff), rather than a bad one? Assuming your 
experiment is correct, something interesting is happening, and I 
will be observing you guys finding out what it is. :)


Re: DMD producing huge binaries

2016-05-21 Thread Kagamin via Digitalmars-d
On Saturday, 21 May 2016 at 18:18:21 UTC, Andrei Alexandrescu 
wrote:
He said that that won't happen any longer, the growth was 
because of the return type. Is that correct?


https://issues.dlang.org/show_bug.cgi?id=15831#c4
Looks like growth is due to the fact that the voldemort type is 
in the scope of a function and function is fun!(T).fun(T) - hence 
each level multiplies by 2. And return type is not part of the 
mangled name already - that would cause circular dependency, you 
would need the voldemort type mangling to generate it.


Re: DMD producing huge binaries

2016-05-21 Thread Walter Bright via Digitalmars-d

On 5/20/2016 11:18 PM, poliklosio wrote:

I have an idea of reproducible, less-than-exponential time mangling, although I
don't know how actionable.

Leave mangling as is, but pretend that you are mangling something different, for
example when the input is
foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int))
pretend that you are mangling
foo!(boo!(bar!(baz!(int))), #1, #2)
Where #1 and #2 are special symbols that refer to stuff that was **already in
the name**, particularly:
#1: bar!(baz!(int))
#2: baz!(int)


This is what LZW compression does, except that LZW does it in general rather 
than just for identifiers.


Re: DMD producing huge binaries

2016-05-21 Thread Andrei Alexandrescu via Digitalmars-d

On 05/21/2016 01:34 PM, Kagamin wrote:

But this leaves you with 2^^n growth, still exponential


He said that that won't happen any longer, the growth was because of the 
return type. Is that correct? -- Andrei


Re: DMD producing huge binaries

2016-05-21 Thread krzaq via Digitalmars-d

On Saturday, 21 May 2016 at 17:34:19 UTC, Kagamin wrote:
On Friday, 20 May 2016 at 19:37:23 UTC, Andrei Alexandrescu 
wrote:
Was talking to Walter on the phone and he just had one of 
those great ideas: encode in the function mangle that it 
returns "auto". I thought that was fantastic. Would that work? 
-- Andrei


Equivalent to not mangling return type at all. But this leaves 
you with 2^^n growth, still exponential. Also is there any 
evidence that compression is faster than hashing?


Would it be practical to use type-erasure in the middle of a call 
chain? It would cut 2^^n right at the knees at the possibly 
negligible runtime cost.


It would impose a burden on the client (as I don't think a 
compiler should do it on its own - or even if it could), but if 
it's really Pareto distributed then it shouldn't be that 
difficult to find the hot spots.
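
One concrete way to type-erase mid-chain already exists in Phobos (a
sketch; `inputRangeObject` from std.range.interfaces is the erasure point,
and the particular chain here is made up):

```d
// Hypothetical sketch: type-erasing the middle of a range chain.
void main()
{
    import std.algorithm : map;
    import std.range.interfaces : InputRange, inputRangeObject;
    import std.stdio : writeln;

    auto chain = [1, 2, 3].map!(x => x * 2).map!(x => x + 1);

    // Erase the accumulated template type into a single interface type;
    // anything chained after this point starts from InputRange!int again.
    InputRange!int erased = inputRangeObject(chain);

    writeln(erased.map!(x => x * 10));
}
```

The cost is a class allocation plus virtual dispatch per element, which is
the "possibly negligible runtime cost" mentioned above.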


Re: DMD producing huge binaries

2016-05-21 Thread Kagamin via Digitalmars-d

On Friday, 20 May 2016 at 19:37:23 UTC, Andrei Alexandrescu wrote:
Was talking to Walter on the phone and he just had one of those 
great ideas: encode in the function mangle that it returns 
"auto". I thought that was fantastic. Would that work? -- Andrei


Equivalent to not mangling return type at all. But this leaves 
you with 2^^n growth, still exponential. Also is there any 
evidence that compression is faster than hashing?


Re: DMD producing huge binaries

2016-05-21 Thread Johan Engelen via Digitalmars-d

On Saturday, 21 May 2016 at 06:18:05 UTC, poliklosio wrote:


I have an idea of reproducible, less-than-exponential time 
mangling, although I don't know how actionable.


Leave mangling as is, but pretend that you are mangling 
something different, for example when the input is

foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int))
pretend that you are mangling
foo!(boo!(bar!(baz!(int))), #1, #2)
Where #1 and #2 are special symbols that refer to stuff that 
was **already in the name**, particularly:

#1: bar!(baz!(int))
#2: baz!(int)


See 
http://forum.dlang.org/post/szodxrizfmufqdkpd...@forum.dlang.org


Re: DMD producing huge binaries

2016-05-21 Thread poliklosio via Digitalmars-d
On Thursday, 19 May 2016 at 13:45:18 UTC, Andrei Alexandrescu 
wrote:

On 05/19/2016 08:38 AM, Steven Schveighoffer wrote:

Yep. chain uses voldemort type, map does not.


We definitely need to fix Voldemort types. Walter and I bounced 
a few ideas during DConf.


1. Do the cleartext/compress/hash troika everywhere (currently 
we do it on Windows). That should help in several places.


2. For Voldemort types, simply generate a GUID-style random 
string of e.g. 64 bytes. Currently the name of the type is 
composed out of its full scope, with some 
exponentially-increasing repetition of substrings. That would 
compress well, but it's just a lot of work to produce and then 
compress the name. A random string is cheap to produce and 
adequate for Voldemort types (you don't care for their name 
anyway... Voldemort... get it?).


I very much advocate slapping a 64-long random string for all 
Voldermort returns and calling it a day. I bet Liran's code 
will get a lot quicker to build and smaller to boot.



Andrei


I have an idea of reproducible, less-than-exponential time 
mangling, although I don't know how actionable.


Leave mangling as is, but pretend that you are mangling something 
different, for example when the input is

foo!(boo!(bar!(baz!(int))), bar!(baz!(int)), baz!(int))
pretend that you are mangling
foo!(boo!(bar!(baz!(int))), #1, #2)
Where #1 and #2 are special symbols that refer to stuff that was 
**already in the name**, particularly:

#1: bar!(baz!(int))
#2: baz!(int)

Now, this is a reduction of the exponential computation time 
only if you can detect repetitions before you go inside them; for 
example, in the case of #1, detecting the existence of bar!(baz!(int)) 
without looking inside it and seeing baz!(int). Do you think it 
could be done?


You would also need a deterministic way to assign those symbols 
#1, #2, #3... to the stuff in the name.


Compression can also be done at the end if necessary to reduce 
the polynomial growth.
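
The token-assignment part of the idea could look roughly like this (a
hypothetical sketch, not the compiler's actual mangler; it assumes the
name has already been split into template-argument parts):

```d
// Hypothetical sketch: replace repeated parts with back-reference tokens.
string backref(const string[] parts)
{
    import std.conv : to;

    size_t[string] firstSeen; // part -> token number, in order of appearance
    string result;
    foreach (p; parts)
    {
        if (auto tok = p in firstSeen)
            result ~= "#" ~ (*tok).to!string; // repeat: emit a short token
        else
        {
            const n = firstSeen.length + 1;
            firstSeen[p] = n;
            result ~= p;                      // first occurrence: emit in full
        }
    }
    return result;
}

void main()
{
    import std.stdio : writeln;
    writeln(backref(["boo!(bar!(baz!(int)))", "bar!(baz!(int))", "baz!(int)",
                     "bar!(baz!(int))", "baz!(int)"]));
    // → boo!(bar!(baz!(int)))bar!(baz!(int))baz!(int)#2#3
}
```

Detecting the repeats *before* descending into them is the hard part this
sketch sidesteps: it works on already-flattened parts, which is exactly
the exponential iteration the post wants to avoid.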


Re: DMD producing huge binaries

2016-05-20 Thread Bill Hicks via Digitalmars-d

On Saturday, 21 May 2016 at 02:16:30 UTC, Era Scarecrow wrote:


 Unless there's a way to seed it consistently so order doesn't 
matter, it won't work. Although a fast CRC32 on the source 
could then give you a seed to work with, I'd prefer a hash over 
PRNG.


There is no seed; it's a low-discrepancy sequence, and each 
element of the sequence can be computed independently.


Re: DMD producing huge binaries

2016-05-20 Thread Era Scarecrow via Digitalmars-d

On Saturday, 21 May 2016 at 01:35:05 UTC, Bill Hicks wrote:

On Friday, 20 May 2016 at 19:41:16 UTC, Walter Bright wrote:
Hashing produces reproducible unique values. Random will not 
be reproducible and may not even be unique.


Just a troll here, but Quasirandom numbers are reproducible and 
unique.  e.g., Sobol in base 2 can be very efficient and fast, 
for any output length.


 Unless there's a way to seed it consistently so order doesn't 
matter, it won't work. Although a fast CRC32 on the source could 
then give you a seed to work with, I'd prefer a hash over PRNG.


Re: DMD producing huge binaries

2016-05-20 Thread Observer via Digitalmars-d

On Saturday, 21 May 2016 at 01:29:18 UTC, Observer wrote:
My recollection is that successively compiled binaries are 
rarely directly comparable, because of timestamps embedded by 
compilers.

So I have to ask:  are there standard tools to understand enough
of the ELF binary format (or whatever, for the given platform) 
to compare everything except the timestamps?


I just looked at an ELF format document and didn't see any
reference to timestamps.  Maybe I'm confusing that with 
compression formats like what gzip produces.


Re: DMD producing huge binaries

2016-05-20 Thread Bill Hicks via Digitalmars-d

On Friday, 20 May 2016 at 19:41:16 UTC, Walter Bright wrote:

On 5/20/2016 6:24 AM, Andrei Alexandrescu wrote:
I don't see a need for hashing something. Would a 
randomly-generated string suffice?


Hashing produces reproducible unique values. Random will not be 
reproducible and may not even be unique.


Just a troll here, but Quasirandom numbers are reproducible and 
unique.  e.g., Sobol in base 2 can be very efficient and fast, 
for any output length.


Re: DMD producing huge binaries

2016-05-20 Thread Observer via Digitalmars-d

On Friday, 20 May 2016 at 21:09:23 UTC, cym13 wrote:
It would make binary comparison of libraries and executables 
difficult which troubles me as comparing hashes is a basics of 
binary distribution security : you can check that a precompiled 
binary is legit by recompiling it in the same conditions and 
comparing the two. It would be way harder if random components 
were added.


My recollection is that successively compiled binaries are rarely
directly comparable, because of timestamps embedded by compilers.
So I have to ask:  are there standard tools to understand enough
of the ELF binary format (or whatever, for the given platform) to
compare everything except the timestamps?


Re: DMD producing huge binaries

2016-05-20 Thread Andrei Alexandrescu via Digitalmars-d

On 05/20/2016 06:01 PM, Georgi D wrote:

In essence the problem that should be solved is the long names of types.


Voldemort returns are by far the worst. Compression and hashing should 
handle the rest. -- Andrei


Re: DMD producing huge binaries

2016-05-20 Thread Johan Engelen via Digitalmars-d

On Friday, 20 May 2016 at 19:45:36 UTC, Walter Bright wrote:


Hashing isn't algorithmically cheap, either.


I also don't think compression should be a performance issue. I 
heard that some compression algorithms are as fast as the data 
comes in, so fast enough for our purpose.
The reason I chose hashing is simplicity and a guaranteed small 
symbol size. I wasn't sure whether a 5MB symbol would compress to 
a reasonable size. I wanted symbols to be readable; so less than, 
say, 80 chars.


For people interested in finding out whether mangling changes 
will improve compile speed performance: forget about linking and 
mangling, and just assign symbol names like "a", "b", "c", etc., 
and see what happens.




Re: DMD producing huge binaries

2016-05-20 Thread Era Scarecrow via Digitalmars-d

On Friday, 20 May 2016 at 22:01:21 UTC, Georgi D wrote:
This is a very interesting idea. I see one problem though: the 
real issue is not just the return type but also the types of 
the input parameters to a function. So even if the return type 
of a method is encoded as "auto", the moment the result of that 
method is passed as a parameter to another template method, the 
long type name will resurface. I do not think the type of the 
input parameters could be encoded as "auto", since the different 
instances of a template method would clash in names.


In essence the problem that should be solved is the long names 
of types.


 I keep having it go through my head: if we had to hash the 
result, I'd prefer to hash the source (minus comments & 
whitespace, and after replacements have been done) to come up 
with a reproducible value. This of course is if long names or 
compression won't do the job.


Re: DMD producing huge binaries

2016-05-20 Thread Georgi D via Digitalmars-d

On Friday, 20 May 2016 at 19:37:23 UTC, Andrei Alexandrescu wrote:

On 5/20/16 2:34 PM, Georgi D wrote:

1) Exponential growth of symbol name with voldemort types.
I like Steven's solution where the compiler lowers the struct 
outside of the method.


Was talking to Walter on the phone and he just had one of those 
great ideas: encode in the function mangle that it returns 
"auto". I thought that was fantastic. Would that work? -- Andrei


This is a very interesting idea. I see one problem though: the 
real issue is not just the return type but also the types of the 
input parameters to a function. So even if the return type of a 
method is encoded as "auto", the moment the result of that method 
is passed as a parameter to another template method, the long 
type name will resurface. I do not think the type of the input 
parameters could be encoded as "auto", since the different 
instances of a template method would clash in names.


In essence the problem that should be solved is the long names of 
types.


Re: DMD producing huge binaries

2016-05-20 Thread H. S. Teoh via Digitalmars-d
On Fri, May 20, 2016 at 02:08:02PM -0700, Walter Bright via Digitalmars-d wrote:
[...]
> 2. having the compiler produce different .o files on different runs
> with the same inputs is pretty eyebrow raising, and makes it harder to
> debug the compiler, and harder to have a test suite for the compiler.

Yeah, I think hashing is the way to go. Random strings just raise more
issues than they solve.


T

-- 
In theory, software is implemented according to the design that has been 
carefully worked out beforehand. In practice, design documents are written 
after the fact to describe the sorry mess that has gone on before.


Re: DMD producing huge binaries

2016-05-20 Thread Dicebot via Digitalmars-d

On Friday, 20 May 2016 at 21:09:23 UTC, cym13 wrote:
It would make binary comparison of libraries and executables 
difficult which troubles me as comparing hashes is a basics of 
binary distribution security : you can check that a precompiled 
binary is legit by recompiling it in the same conditions and 
comparing the two. It would be way harder if random components 
were added.


Yeah, that is good reason.


Re: DMD producing huge binaries

2016-05-20 Thread cym13 via Digitalmars-d

On Friday, 20 May 2016 at 20:49:20 UTC, Dicebot wrote:

On Friday, 20 May 2016 at 19:41:16 UTC, Walter Bright wrote:

On 5/20/2016 6:24 AM, Andrei Alexandrescu wrote:
I don't see a need for hashing something. Would a 
randomly-generated string suffice?


Hashing produces reproducible unique values. Random will not 
be reproducible and may not even be unique.


The question is: is it actually good for them to be 
reproducible? The very idea behind voldemort types is that you 
don't reference them directly in any way, it is just 
implementation detail. To me it does make sense to apply it to 
debugging too (debugging of deeply chained template types isn't 
really very usable anyway).


It would make binary comparison of libraries and executables 
difficult which troubles me as comparing hashes is a basics of 
binary distribution security : you can check that a precompiled 
binary is legit by recompiling it in the same conditions and 
comparing the two. It would be way harder if random components 
were added.


Re: DMD producing huge binaries

2016-05-20 Thread Walter Bright via Digitalmars-d

On 5/20/2016 1:49 PM, Dicebot wrote:

The question is: is it actually good for them to be reproducible?


1. yes, because the same type can be generated from multiple compilation units 
(the linker removes the duplicates - if they're not duplicates, the linker won't 
remove them).


2. having the compiler produce different .o files on different runs with the 
same inputs is pretty eyebrow raising, and makes it harder to debug the 
compiler, and harder to have a test suite for the compiler.




Re: DMD producing huge binaries

2016-05-20 Thread Dicebot via Digitalmars-d

On Friday, 20 May 2016 at 19:41:16 UTC, Walter Bright wrote:

On 5/20/2016 6:24 AM, Andrei Alexandrescu wrote:
I don't see a need for hashing something. Would a 
randomly-generated string suffice?


Hashing produces reproducible unique values. Random will not be 
reproducible and may not even be unique.


The question is: is it actually good for them to be reproducible? 
The very idea behind voldemort types is that you don't reference 
them directly in any way, it is just implementation detail. To me 
it does make sense to apply it to debugging too (debugging of 
deeply chained template types isn't really very usable anyway).


Re: DMD producing huge binaries

2016-05-20 Thread Andrei Alexandrescu via Digitalmars-d

On 05/20/2016 03:45 PM, Walter Bright wrote:

On 5/20/2016 5:57 AM, ZombineDev wrote:

Walter's PR slows down the compilation by 25-40% according to my
tests. I expect that compilation would be faster if the
whole process is skipped altogether.



The compressor only kicks in if the string is longer than 64 bytes. I
set it pretty low in order to have it kick in often, and hopefully flush
out any bugs in it.

For a production version, the minimum size should be set substantially
larger, based on testing to provide a reasonable balance.

That should speed it up a lot. Also, the algorithm is a bit naively
implemented; I bet it could be sped up substantially.

Hashing isn't algorithmically cheap, either.


From the measurements shown the work seems Pareto distributed - the 
major time spender is the few long symbols, not the many short symbols. 
There are a few long symbols that dominate everything else by an order 
of magnitude. The one percenters! :o) -- Andrei


Re: DMD producing huge binaries

2016-05-20 Thread Walter Bright via Digitalmars-d

On 5/20/2016 5:57 AM, ZombineDev wrote:

Walter's PR slows down the compilation with
25-40% according to my tests. I expect that compilation would be faster if the
whole process is skipped altogether.



The compressor only kicks in if the string is longer than 64 bytes. I set it 
pretty low in order to have it kick in often, and hopefully flush out any bugs 
in it.


For a production version, the minimum size should be set substantially larger, 
based on testing to provide a reasonable balance.


That should speed it up a lot. Also, the algorithm is a bit naively implemented; 
I bet it could be sped up substantially.


Hashing isn't algorithmically cheap, either.
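The threshold idea described above can be sketched with a generic compressor (zlib here, purely illustrative — this is not DMD's actual compression algorithm, and the 64-byte cutoff is the trial value from the post):

```python
import zlib

THRESHOLD = 64  # only compress identifiers longer than this (trial value)

def maybe_compress(symbol: bytes) -> bytes:
    # Short symbols pass through untouched; long, highly repetitive
    # mangled names (deep template chains) compress very well.
    if len(symbol) <= THRESHOLD:
        return symbol
    return zlib.compress(symbol)

short = b"_D3foo3barFZi"
long_sym = b"_D3std5range" * 200  # stand-in for a deep template chain

assert maybe_compress(short) == short
assert len(maybe_compress(long_sym)) < len(long_sym) // 10
```

Raising the threshold trades compile time for symbol size: every symbol below it skips the compressor entirely.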



Re: DMD producing huge binaries

2016-05-20 Thread Walter Bright via Digitalmars-d

On 5/20/2016 6:24 AM, Andrei Alexandrescu wrote:

I don't see a need for hashing something. Would a randomly-generated string
suffice?


Hashing produces reproducible unique values. Random will not be reproducible and 
may not even be unique.




Re: DMD producing huge binaries

2016-05-20 Thread Andrei Alexandrescu via Digitalmars-d

On 5/20/16 2:34 PM, Georgi D wrote:

1) Exponential growth of symbol name with voldemort types.
I like Steven's solution where the compiler lowers the struct outside of
the method.


Was talking to Walter on the phone and he just had one of those great 
ideas: encode in the function mangle that it returns "auto". I thought 
that was fantastic. Would that work? -- Andrei
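Rough intuition for why this would help, as a toy recurrence (the constants are invented and real mangles are more complex, but the growth factors match the 3^n-vs-2^n analysis discussed in this thread):

```python
def mangle_len(levels: int, copies_per_level: int, base: int = 20) -> int:
    # Toy model: each chained template instantiation embeds
    # `copies_per_level` copies of the previous level's mangled name.
    n = base
    for _ in range(levels):
        n = base + copies_per_level * n
    return n

# Return type spelled out in full: ~3 copies per level, ~3^n growth.
with_return = mangle_len(10, 3)
# Return type mangled as just "auto": ~2 copies per level, ~2^n growth.
auto_return = mangle_len(10, 2)
assert with_return > 40 * auto_return
```

So encoding "auto" in the mangle slows the blow-up but does not eliminate it; the growth is still exponential in the chain depth.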


Re: DMD producing huge binaries

2016-05-20 Thread Georgi D via Digitalmars-d

On Friday, 20 May 2016 at 16:21:55 UTC, ZombineDev wrote:

On Friday, 20 May 2016 at 13:16:32 UTC, Johan Engelen wrote:

On Friday, 20 May 2016 at 12:57:40 UTC, ZombineDev wrote:


As I said earlier, it would be best if we can prevent the 
generation of long symbols in the first place, because that 
would improve the compilation times significantly.


From what I've observed, generating the long symbol name 
itself is fast. If we avoid the deep type hierarchy, then I 
think indeed you can expect compile time improvement.


Walter's PR slows down the compilation with 25-40% according 
to my tests. I expect that compilation would be faster if the 
whole process is skipped altogether.


MD5 hashing slowed down builds by a few percent for Weka 
(note: LDC machinecodegen is slower than DMD's, so 
percentage-wise...), which can then be compensated for using 
PGO ;-)  /+  <-- shameless PGO plug  +/


IIUC, your scheme works like this:
1. DMDFE creates a mangled symbol name
2. Create an MD5 hash of the symbol and use the hash instead of the 
full name.


If a minimal change (w.r.t. LoC) in Georgi's almost trivial program 
(compared to e.g. Weka's huge codebase) increases compilation 
time from 1.5 sec to 27 sec, I can't imagine how much slower it 
would be for a larger project.


We should cure the root cause. Generating uselessly large symbols 
(even if hashed after that) is part of that problem, so I think 
it should never be done once they start growing beyond e.g. 500 bytes.


The solution that Steven showed is exactly what the compiler 
should do. Another reason why the compiler should do it is 
because often voldemort types capture outer context (local 
variables, alias parameters, delegates, etc.), which makes it 
very hard for the user to manually extract the voldemort type 
out of the function.


I see two separate issues that I think should be handled 
independently:


1) Exponential growth of symbol name with voldemort types.
I like Steven's solution where the compiler lowers the struct 
outside of the method.


2) Long symbol names in general which could arise even without 
voldemort types involved especially with chaining multiple 
algorithms.


I like Johan Engelen solution in LDC for symbols longer than a 
threshold. For symbols shorter than the threshold I think 
Walter's compression algorithm could be used to gain 40-60% 
reduction and still retain the full type information.




Re: DMD producing huge binaries

2016-05-20 Thread Adam D. Ruppe via Digitalmars-d

On Friday, 20 May 2016 at 17:11:52 UTC, Wyatt wrote:
I've probably missed something, but it seems like line noise 
just because we feel we must.


Line and file are not unique, because the same template would 
generate different functions for different arguments.


Template arguments are a part of the identifier... and that can 
really blow up the size because they might be template arguments 
which might be template arguments, etc., but each unique argument 
could be creating an entirely different function.


Re: DMD producing huge binaries

2016-05-20 Thread Wyatt via Digitalmars-d

On Friday, 20 May 2016 at 13:24:42 UTC, Andrei Alexandrescu wrote:


I don't see a need for hashing something. Would a 
randomly-generated string suffice?



Naïve question: is a (randomly-)?generated _anything_ required?

I've probably missed something, but it seems like line noise just 
because we feel we must.  I guess there's a possibility that 
there would be multiple matches on the same line with the same 
object and identifier... Stick the column in there too and call 
it a day?


-Wyatt


Re: DMD producing huge binaries

2016-05-20 Thread ZombineDev via Digitalmars-d

On Friday, 20 May 2016 at 13:16:32 UTC, Johan Engelen wrote:

On Friday, 20 May 2016 at 12:57:40 UTC, ZombineDev wrote:


As I said earlier, it would be best if we can prevent the 
generation of long symbols in the first place, because that 
would improve the compilation times significantly.


From what I've observed, generating the long symbol name itself 
is fast. If we avoid the deep type hierarchy, then I think 
indeed you can expect compile time improvement.


Walter's PR slows down the compilation with 25-40% according 
to my tests. I expect that compilation would be faster if the 
whole process is skipped altogether.


MD5 hashing slowed down builds by a few percent for Weka (note: 
LDC machinecodegen is slower than DMD's, so 
percentage-wise...), which can then be compensated for using 
PGO ;-)  /+  <-- shameless PGO plug  +/


IIUC, your scheme works like this:
1. DMDFE creates a mangled symbol name
2. Create an MD5 hash of the symbol and use the hash instead of the 
full name.


If a minimal change (w.r.t. LoC) in Georgi's almost trivial program 
(compared to e.g. Weka's huge codebase) increases compilation 
time from 1.5 sec to 27 sec, I can't imagine how much slower it 
would be for a larger project.


We should cure the root cause. Generating uselessly large symbols 
(even if hashed after that) is part of that problem, so I think 
it should never be done once they start growing beyond e.g. 500 bytes.


The solution that Steven showed is exactly what the compiler 
should do. Another reason why the compiler should do it is 
because often voldemort types capture outer context (local 
variables, alias parameters, delegates, etc.), which makes it very 
hard for the user to manually extract the voldemort type out of 
the function.





Re: DMD producing huge binaries

2016-05-20 Thread ZombineDev via Digitalmars-d

On Friday, 20 May 2016 at 15:39:18 UTC, Rene Zwanenburg wrote:

On Friday, 20 May 2016 at 12:08:37 UTC, ZombineDev wrote:

@Rene
How do you expect the compiler to know the exact return type, 
only by looking at this signature:

auto transmogrify(string str);

A possible implementation might be this:
auto transmogrify(string str)
{
   return str.map!someFunc.filter!otherFunc.joiner();
}

or something completely different.


I was thinking of something along the lines of this:

===
size_t frobnicate(int i)
{
return 0;
}

auto frobnicator(T)(T t)
{
static struct Result
{
int index;

size_t front()
{
return frobnicate(index);
}

enum empty = false;

void popFront()
{
++index;
}
}

return Result(t.index);
}
===

Automatically generating a header with DMD gives me:

===
size_t frobnicate(int i);
auto frobnicator(T)(T t)
{
static struct Result
{
int index;
size_t front();
enum empty = false;
void popFront();
}
return Result(t.index);
}
===

Now frobnicator returns a different type for the same T 
depending on whether you're using the .d or the .di file. I'm 
not sure if this is a problem, but it sounds like something 
that can come back to bite you in edge cases.


The only issue is code bloat. It also happens with separate 
compilation, because the compiler can't know if the template has 
already been instantiated in a different compilation unit.
Only in a single compilation unit/invocation the compiler can 
reliably see all the places where the same template instance is 
used. In that case, the actual mangled name shouldn't matter, 
AFAIU.


Re: DMD producing huge binaries

2016-05-20 Thread Rene Zwanenburg via Digitalmars-d

On Friday, 20 May 2016 at 12:08:37 UTC, ZombineDev wrote:

@Rene
How do you expect the compiler to know the exact return type, 
only by looking at this signature:

auto transmogrify(string str);

A possible implementation might be this:
auto transmogrify(string str)
{
   return str.map!someFunc.filter!otherFunc.joiner();
}

or something completely different.


I was thinking of something along the lines of this:

===
size_t frobnicate(int i)
{
return 0;
}

auto frobnicator(T)(T t)
{
static struct Result
{
int index;

size_t front()
{
return frobnicate(index);
}

enum empty = false;

void popFront()
{
++index;
}
}

return Result(t.index);
}
===

Automatically generating a header with DMD gives me:

===
size_t frobnicate(int i);
auto frobnicator(T)(T t)
{
static struct Result
{
int index;
size_t front();
enum empty = false;
void popFront();
}
return Result(t.index);
}
===

Now frobnicator returns a different type for the same T depending 
on whether you're using the .d or the .di file. I'm not sure if 
this is a problem, but it sounds like something that can come 
back to bite you in edge cases.


Re: DMD producing huge binaries

2016-05-20 Thread Marc Schütz via Digitalmars-d

On Friday, 20 May 2016 at 13:24:42 UTC, Andrei Alexandrescu wrote:
I don't see a need for hashing something. Would a 
randomly-generated string suffice?


That would break separate compilation, wouldn't it?


Re: DMD producing huge binaries

2016-05-20 Thread H. S. Teoh via Digitalmars-d
On Fri, May 20, 2016 at 09:24:42AM -0400, Andrei Alexandrescu via Digitalmars-d 
wrote:
> On 05/20/2016 09:07 AM, Johan Engelen wrote:
[...]
> > One obstacle is the hasher itself: I am not going to implement one
> > myself! In the LDC PR, I used LLVM's MD5 hasher and Phobos's MD5
> > hasher.  Perhaps it is better to use a faster hasher (I have no
> > expertise on that; Murmur?), so we will have to carry our own copy
> > of a good hasher implementation. Or perhaps the speed limit is memory
> > access and hash algorithm speed doesn't matter.
> > 
> > I made the hashing optional, with a symbol length threshold. Getting
> > rid of the variable threshold would be good, such that the (few)
> > large symbols in Phobos are hashed too and all will work fine.
> > Perhaps 1k is a good threshold.
> 
> I don't see a need for hashing something. Would a randomly-generated
> string suffice?
[...]

Wouldn't we want the same symbol to be generated if we call a
Voldemort-returning function with the same compile-time arguments (but
not necessarily runtime arguments) multiple times from different places?
This is likely not an issue when compiling all sources at once, but may
be a problem with incremental compilation.


T

-- 
"Hi." "'Lo."


Re: DMD producing huge binaries

2016-05-20 Thread Andrei Alexandrescu via Digitalmars-d

On 05/20/2016 09:07 AM, Johan Engelen wrote:

On Friday, 20 May 2016 at 12:30:10 UTC, Andrei Alexandrescu wrote:

On 05/20/2016 08:21 AM, Johan Engelen wrote:


https://github.com/ldc-developers/ldc/pull/1445:
"Hashed symbols look like this:
_D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
ddemangle gives:
one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
Meaning: this symbol is defined in module one.two.three on line 34. The
identifier is foo and is contained in the struct or class Result."


This is nice. How difficult would it be to rework it into a PR for
dmd? -- Andrei


I can work on it, but only if it will not result in a long debate
afterwards (!!!).


Thanks. I'll get back to you on that.


One obstacle is the hasher itself: I am not going to implement one
myself! In the LDC PR, I used LLVM's MD5 hasher and Phobos's MD5 hasher.
Perhaps it is better to use a faster hasher (I have no expertise on
that; Murmur?), so we will have to carry our own copy of a good hasher
implementation. Or perhaps the speed limit is memory access and hash
algorithm speed doesn't matter.

I made the hashing optional, with a symbol length threshold. Getting rid
of the variable threshold would be good, such that the (few) large
symbols in Phobos are hashed too and all will work fine. Perhaps 1k is a
good threshold.


I don't see a need for hashing something. Would a randomly-generated 
string suffice?



Andrei



Re: DMD producing huge binaries

2016-05-20 Thread Johan Engelen via Digitalmars-d

On Friday, 20 May 2016 at 12:57:40 UTC, ZombineDev wrote:


As I said earlier, it would be best if we can prevent the 
generation of long symbols in the first place, because that 
would improve the compilation times significantly.


From what I've observed, generating the long symbol name itself 
is fast. If we avoid the deep type hierarchy, then I think indeed 
you can expect compile time improvement.


Walter's PR slows down the compilation with 25-40% according to 
my tests. I expect that compilation would be faster if the 
whole process is skipped altogether.


MD5 hashing slowed down builds by a few percent for Weka (note: 
LDC machinecodegen is slower than DMD's, so percentage-wise...), 
which can then be compensated for using PGO ;-)  /+  <-- 
shameless PGO plug  +/


Re: DMD producing huge binaries

2016-05-20 Thread Johan Engelen via Digitalmars-d

On Friday, 20 May 2016 at 12:30:10 UTC, Andrei Alexandrescu wrote:

On 05/20/2016 08:21 AM, Johan Engelen wrote:


https://github.com/ldc-developers/ldc/pull/1445:
"Hashed symbols look like this:
_D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
ddemangle gives:
one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
Meaning: this symbol is defined in module one.two.three on 
line 34. The
identifier is foo and is contained in the struct or class 
Result."


This is nice. How difficult would it be to rework it into a PR 
for dmd? -- Andrei


I can work on it, but only if it will not result in a long debate 
afterwards (!!!).


One obstacle is the hasher itself: I am not going to implement 
one myself! In the LDC PR, I used LLVM's MD5 hasher and Phobos's 
MD5 hasher. Perhaps it is better to use a faster hasher (I have 
no expertise on that; Murmur?), so we will have to carry our own 
copy of a good hasher implementation. Or perhaps the speed limit 
is memory access and hash algorithm speed doesn't matter.


I made the hashing optional, with a symbol length threshold. 
Getting rid of the variable threshold would be good, such that 
the (few) large symbols in Phobos are hashed too and all will 
work fine. Perhaps 1k is a good threshold.




Re: DMD producing huge binaries

2016-05-20 Thread ZombineDev via Digitalmars-d

On Friday, 20 May 2016 at 12:21:58 UTC, Johan Engelen wrote:
On Friday, 20 May 2016 at 10:54:18 UTC, Andrei Alexandrescu 
wrote:

On 5/19/16 6:16 PM, Walter Bright wrote:

On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:

I very much advocate slapping a 64-long random string for all
Voldemort returns and calling it a day. I bet Liran's code will
get a lot quicker to build and smaller to boot.


Let's see how far we get with compression first.

   https://github.com/dlang/dmd/pull/5793

Using 64 character random strings will make symbolic 
debugging unpleasant.


This is a fallacy. I don't think so, at all, when the baseline 
is an extremely long string.


I agree with Andrei.
I solved it this way 
https://github.com/ldc-developers/ldc/pull/1445:

"Hashed symbols look like this:
_D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
ddemangle gives:
one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
Meaning: this symbol is defined in module one.two.three on line 
34. The identifier is foo and is contained in the struct or 
class Result."


I like your approach. As I said earlier, it would be best if we can 
prevent the generation of long symbols in the first place, 
because that would improve the compilation times significantly. 
Walter's PR slows down the compilation with 25-40% according to 
my tests. I expect that compilation would be faster if the whole 
process is skipped altogether.
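Johan's hashed-symbol format could be approximated like this (a hypothetical helper, not the real LDC implementation — the actual mangling grammar is more involved, and the threshold value here is a guess):

```python
import hashlib

THRESHOLD = 1000  # only hash symbols longer than this (assumed cutoff)

def shorten(mangled: str, module: str, line: int, tail: str) -> str:
    # Replace an over-long mangle with module + line + MD5(full mangle)
    # + the innermost identifiers, keeping the result short, stable,
    # and still informative under a demangler.
    if len(mangled) <= THRESHOLD:
        return mangled
    digest = hashlib.md5(mangled.encode()).hexdigest()
    return f"{module}.L{line}._{digest}.{tail}"

huge = "S" * 5000  # stand-in for a multi-megabyte voldemort mangle
a = shorten(huge, "one.two.three", 34, "Result.foo")
b = shorten(huge, "one.two.three", 34, "Result.foo")
assert a == b        # reproducible across runs, unlike a random name
assert len(a) < 64
```

Because the short name is derived from the full mangle's content, the same instantiation yields the same symbol in every compilation unit, which is what keeps separate compilation linking.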


Re: DMD producing huge binaries

2016-05-20 Thread Andrei Alexandrescu via Digitalmars-d

On 05/20/2016 08:21 AM, Johan Engelen wrote:

On Friday, 20 May 2016 at 10:54:18 UTC, Andrei Alexandrescu wrote:

On 5/19/16 6:16 PM, Walter Bright wrote:

On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:

I very much advocate slapping a 64-long random string for all
Voldemort returns and calling it a day. I bet Liran's code will
get a lot quicker to build and smaller to boot.


Let's see how far we get with compression first.

   https://github.com/dlang/dmd/pull/5793

Using 64 character random strings will make symbolic debugging
unpleasant.


This is a fallacy. I don't think so, at all, when the baseline is an
extremely long string.


I agree with Andrei.
I solved it this way https://github.com/ldc-developers/ldc/pull/1445:
"Hashed symbols look like this:
_D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
ddemangle gives:
one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
Meaning: this symbol is defined in module one.two.three on line 34. The
identifier is foo and is contained in the struct or class Result."


This is nice. How difficult would it be to rework it into a PR for dmd? 
-- Andrei




Re: DMD producing huge binaries

2016-05-20 Thread Johan Engelen via Digitalmars-d

On Friday, 20 May 2016 at 10:54:18 UTC, Andrei Alexandrescu wrote:

On 5/19/16 6:16 PM, Walter Bright wrote:

On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:

I very much advocate slapping a 64-long random string for all
Voldemort returns and calling it a day. I bet Liran's code will
get a lot quicker to build and smaller to boot.


Let's see how far we get with compression first.

   https://github.com/dlang/dmd/pull/5793

Using 64 character random strings will make symbolic debugging 
unpleasant.


This is a fallacy. I don't think so, at all, when the baseline 
is an extremely long string.


I agree with Andrei.
I solved it this way 
https://github.com/ldc-developers/ldc/pull/1445:

"Hashed symbols look like this:
_D3one3two5three3L3433_46a82aac733d8a4b3588d7fa8937aad66Result3fooZ
ddemangle gives:
one.two.three.L34._46a82aac733d8a4b3588d7fa8937aad6.Result.foo
Meaning: this symbol is defined in module one.two.three on line 
34. The identifier is foo and is contained in the struct or class 
Result."





Re: DMD producing huge binaries

2016-05-20 Thread ZombineDev via Digitalmars-d

On Friday, 20 May 2016 at 12:01:14 UTC, ZombineDev wrote:

On Friday, 20 May 2016 at 11:40:12 UTC, Rene Zwanenburg wrote:

On Friday, 20 May 2016 at 11:32:16 UTC, ZombineDev wrote:

IMO, the best way forward is:
+ The compiler should lower voldemort types, according to the 
scheme that Steve suggested 
(http://forum.dlang.org/post/nhkmo7$ob5$1...@digitalmars.com)
+ After that, during symbol generation (mangling) if a symbol 
starts getting larger than some threshold (e.g. 800 
characters), the mangling algorithm should detect that and 
bail out by generating some unique id instead. The only 
valuable information that the symbol must include is the 
module name and location (line and column number) of the 
template instantiation.


Location info shouldn't be used. This will break things like 
interface files and dynamic libraries.


Well... template-heavy code doesn't play well with header files and 
dynamic libraries. Most of the time templates are used for the 
implementation of an interface, but template types such as 
ranges are unusable in function signatures. That's why they're 
called voldemort types - because they're the ones that can 
not/must not be named.
Instead, dynamic libraries should use stable types such as 
interfaces, arrays and function pointers, which don't have the 
aforementioned symbol size problems.


Since we're using random numbers for symbols (instead of the 
actual names) it would not be possible for such symbols to be 
part of an interface, because a different invocation of the 
compiler would produce different symbol names. Such symbols 
should always be an implementation detail, and not part of an 
interface. That's why location info would play no role, except 
for debugging purposes.


@Rene
How do you expect the compiler to know the exact return type, 
only by looking at this signature:

auto transmogrify(string str);

A possible implementation might be this:
auto transmogrify(string str)
{
   return str.map!someFunc.filter!otherFunc.joiner();
}

or something completely different.




Re: DMD producing huge binaries

2016-05-20 Thread ZombineDev via Digitalmars-d

On Friday, 20 May 2016 at 11:40:12 UTC, Rene Zwanenburg wrote:

On Friday, 20 May 2016 at 11:32:16 UTC, ZombineDev wrote:

IMO, the best way forward is:
+ The compiler should lower voldemort types, according to the 
scheme that Steve suggested 
(http://forum.dlang.org/post/nhkmo7$ob5$1...@digitalmars.com)
+ After that, during symbol generation (mangling) if a symbol 
starts getting larger than some threshold (e.g. 800 
characters), the mangling algorithm should detect that and 
bail out by generating some unique id instead. The only 
valuable information that the symbol must include is the 
module name and location (line and column number) of the 
template instantiation.


Location info shouldn't be used. This will break things like 
interface files and dynamic libraries.


Well... template-heavy code doesn't play well with header files and 
dynamic libraries. Most of the time templates are used for the 
implementation of an interface, but template types such as ranges 
are unusable in function signatures. That's why they're called 
voldemort types - because they're the ones that can not/must not 
be named.
Instead, dynamic libraries should use stable types such as 
interfaces, arrays and function pointers, which don't have the 
aforementioned symbol size problems.


Since we're using random numbers for symbols (instead of the 
actual names) it would not be possible for such symbols to be 
part of an interface, because a different invocation of the 
compiler would produce different symbol names. Such symbols 
should always be an implementation detail, and not part of an 
interface. That's why location info would play no role, except 
for debugging purposes.


Re: DMD producing huge binaries

2016-05-20 Thread Rene Zwanenburg via Digitalmars-d

On Friday, 20 May 2016 at 11:32:16 UTC, ZombineDev wrote:

IMO, the best way forward is:
+ The compiler should lower voldemort types, according to the 
scheme that Steve suggested 
(http://forum.dlang.org/post/nhkmo7$ob5$1...@digitalmars.com)
+ After that, during symbol generation (mangling) if a symbol 
starts getting larger than some threshold (e.g. 800 
characters), the mangling algorithm should detect that and bail 
out by generating some unique id instead. The only valuable 
information that the symbol must include is the module name and 
location (line and column number) of the template instantiation.


Location info shouldn't be used. This will break things like 
interface files and dynamic libraries.


Re: DMD producing huge binaries

2016-05-20 Thread ZombineDev via Digitalmars-d

On Friday, 20 May 2016 at 09:48:15 UTC, ZombineDev wrote:

On Thursday, 19 May 2016 at 22:16:03 UTC, Walter Bright wrote:

On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:
I very much advocate slapping a 64-long random string for all
Voldemort returns and calling it a day. I bet Liran's code will
get a lot quicker to build and smaller to boot.


Let's see how far we get with compression first.

  https://github.com/dlang/dmd/pull/5793

Using 64 character random strings will make symbolic debugging 
unpleasant.


Unfortunately, the PR doesn't cure the root cause, but only 
provides linear 45-55% improvement, which doesn't scale with 
the exponential growth of the symbol size:



case             time to compile   du -h   strings file | wc -c
NG-case before   0m19.708s         339M    338.23M
NG-case after    0m27.006s         218M    209.35M
OK-case before   0m1.466s          16M     15.33M
OK-case after    0m1.856s          11M     9.28M

For more info on the measurements:
https://github.com/dlang/dmd/pull/5793#issuecomment-220550682


IMO, the best way forward is:
+ The compiler should lower voldemort types, according to the 
scheme that Steve suggested 
(http://forum.dlang.org/post/nhkmo7$ob5$1...@digitalmars.com)
+ After that, during symbol generation (mangling) if a symbol 
starts getting larger than some threshold (e.g. 800 characters), 
the mangling algorithm should detect that and bail out by 
generating some unique id instead. The only valuable information 
that the symbol must include is the module name and location 
(line and column number) of the template instantiation.


Re: DMD producing huge binaries

2016-05-20 Thread Andrei Alexandrescu via Digitalmars-d

On 5/19/16 6:16 PM, Walter Bright wrote:

On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:

I very much advocate slapping a 64-long random string for all
Voldemort returns and calling it a day. I bet Liran's code will
get a lot quicker to build and smaller to boot.


Let's see how far we get with compression first.

   https://github.com/dlang/dmd/pull/5793

Using 64 character random strings will make symbolic debugging unpleasant.


This is a fallacy. I don't think so, at all, when the baseline is an 
extremely long string. In any given scope there are at most a handful of 
Voldemort types at play, and it's easy to identify which is which. We've 
all done so many times with IPs, GUIDs of objects, directories, etc. 
etc. -- Andrei


Re: DMD producing huge binaries

2016-05-20 Thread ZombineDev via Digitalmars-d

On Thursday, 19 May 2016 at 22:16:03 UTC, Walter Bright wrote:

On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:
I very much advocate slapping a 64-long random string for all
Voldemort returns and calling it a day. I bet Liran's code will
get a lot quicker to build and smaller to boot.


Let's see how far we get with compression first.

  https://github.com/dlang/dmd/pull/5793

Using 64 character random strings will make symbolic debugging 
unpleasant.


Unfortunately, the PR doesn't cure the root cause, but only 
provides linear 45-55% improvement, which doesn't scale with the 
exponential growth of the symbol size:



case             time to compile   du -h   strings file | wc -c
NG-case before   0m19.708s         339M    338.23M
NG-case after    0m27.006s         218M    209.35M
OK-case before   0m1.466s          16M     15.33M
OK-case after    0m1.856s          11M     9.28M

For more info on the measurements:
https://github.com/dlang/dmd/pull/5793#issuecomment-220550682




Re: DMD producing huge binaries

2016-05-20 Thread Georgi D via Digitalmars-d
On Thursday, 19 May 2016 at 16:41:59 UTC, Andrei Alexandrescu 
wrote:

On 05/19/2016 11:56 AM, Georgi D wrote:
Making a local copy of chain and moving the structure outside of the
method solved the problem and reduced the code size.

The stripped size even reduced from the version that did not experience
the huge increase. Stripped: 7.4MB -> 5.5MB


Thanks very much for helping with these measurements! -- Andrei


Just some more numbers:

On the version with the local chain which is 5.5MB

# strings test.local_choose|wc -l
4170
# strings test.local_choose|wc -c
5194012

Which means that, of the 5.5MB total binary size, 4.95MB are 
strings.


Inspecting the content of the strings, over 98% are symbol names. 
Is there a way to not have all the symbol names as strings even 
after the binary was stripped?


I think that some compression for the symbol names is a good idea.

I agree though that an O(n)-or-better scheme for the voldemort 
types is also needed.





Re: DMD producing huge binaries

2016-05-20 Thread poliklosio via Digitalmars-d

On Friday, 20 May 2016 at 05:34:15 UTC, default0 wrote:

On Thursday, 19 May 2016 at 22:16:03 UTC, Walter Bright wrote:
> (...)
(...) you still avoid generating a 5 Exabyte large symbol name 
just to compress/hash/whatever it.


This is why compression is not good enough.


Re: DMD producing huge binaries

2016-05-19 Thread default0 via Digitalmars-d

On Thursday, 19 May 2016 at 22:16:03 UTC, Walter Bright wrote:

On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:
I very much advocate slapping a 64-long random string for all
Voldemort returns and calling it a day. I bet Liran's code will
get a lot quicker to build and smaller to boot.


Let's see how far we get with compression first.

  https://github.com/dlang/dmd/pull/5793

Using 64 character random strings will make symbolic debugging 
unpleasant.


You could simply add a "trivial" version of the struct + 
enclosing function name (i.e. once, without repetitions) before or 
after the random character string. This way you would know which 
struct its referring to, its unique, and you still avoid 
generating a 5 Exabyte large symbol name just to 
compress/hash/whatever it.


Re: DMD producing huge binaries

2016-05-19 Thread Patrick Schluter via Digitalmars-d

On Thursday, 19 May 2016 at 22:46:02 UTC, Adam D. Ruppe wrote:

On Thursday, 19 May 2016 at 22:16:03 UTC, Walter Bright wrote:
Using 64 character random strings will make symbolic debugging 
unpleasant.


Using 6.4 megabyte strings already makes symbolic debugging 
unpleasant.


The one thing that worries me about random strings is that it 
needs to be the same across all builds, or you'll get random 
linking errors when doing package-at-a-time or whatever (dmd 
already has some problems like this!). But building a gigantic 
string then compressing or hashing it still sucks... what we 
need is an O(1) solution that is still unique and repeatable.


Good point. Using a SHA1 derived from the string instead of a 
GUID is imho better. It has the advantage of repeatability, is 
even shorter and not very expensive to generate.
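The repeatability difference is easy to demonstrate (illustrative sketch; the symbol string is a made-up stand-in):

```python
import hashlib
import uuid

long_symbol = "_D3std5range3Map..." * 100  # stand-in for a huge mangled name

# Content-derived hash: two independent "compiler runs" agree.
run1 = hashlib.sha1(long_symbol.encode()).hexdigest()
run2 = hashlib.sha1(long_symbol.encode()).hexdigest()
assert run1 == run2 and len(run1) == 40

# Randomly generated name: every run differs, so separately compiled
# objects would no longer link against each other.
assert uuid.uuid4().hex != uuid.uuid4().hex
```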




Re: DMD producing huge binaries

2016-05-19 Thread poliklosio via Digitalmars-d

On Thursday, 19 May 2016 at 23:56:46 UTC, poliklosio wrote:

(...)
Clearly the reason for building such a gigantic string was some 
sort of repetition. Detect the repetition on-the-fly to avoid 
processing all of it. This way the generated name is already 
compressed.


It seems like dynamic programming can be useful.


Re: DMD producing huge binaries

2016-05-19 Thread poliklosio via Digitalmars-d

On Thursday, 19 May 2016 at 22:46:02 UTC, Adam D. Ruppe wrote:

On Thursday, 19 May 2016 at 22:16:03 UTC, Walter Bright wrote:
Using 64 character random strings will make symbolic debugging 
unpleasant.


Using 6.4 megabyte strings already makes symbolic debugging 
unpleasant.


The one thing that worries me about random strings is that it 
needs to be the same across all builds, or you'll get random 
linking errors when doing package-at-a-time or whatever (dmd 
already has some problems like this!). But building a gigantic 
string then compressing or hashing it still sucks... what we 
need is an O(1) solution that is still unique and repeatable.


Clearly the reason for building such a gigantic string was some 
sort of repetition. Detect the repetition on-the-fly to avoid 
processing all of it. This way the generated name is already 
compressed.


Re: DMD producing huge binaries

2016-05-19 Thread Walter Bright via Digitalmars-d

On 5/19/2016 4:36 PM, jmh530 wrote:

Cue me going to the language reference to look up static structs and reading

"A struct can be prevented from being nested by using the static attribute, but
then of course it will not be able to access variables from its enclosing 
scope."

Not a fan of that description...


You can always submit a PR for a better one.


Re: DMD producing huge binaries

2016-05-19 Thread jmh530 via Digitalmars-d

On Thursday, 19 May 2016 at 22:17:37 UTC, Walter Bright wrote:


Consider using a 'static struct'. A non-static struct will 
include a hidden member that points to the stack frame of 
foo(), i.e. "produces smaller code".


Cue me going to the language reference to look up static 
structs and reading


"A struct can be prevented from being nested by using the static 
attribute, but then of course it will not be able to access 
variables from its enclosing scope."


Not a fan of that description...


Re: DMD producing huge binaries

2016-05-19 Thread Adam D. Ruppe via Digitalmars-d

On Thursday, 19 May 2016 at 22:16:03 UTC, Walter Bright wrote:
Using 64 character random strings will make symbolic debugging 
unpleasant.


Using 6.4 megabyte strings already makes symbolic debugging 
unpleasant.


The one thing that worries me about random strings is that it 
needs to be the same across all builds, or you'll get random 
linking errors when doing package-at-a-time or whatever (dmd 
already has some problems like this!). But building a gigantic 
string then compressing or hashing it still sucks... what we need 
is an O(1) solution that is still unique and repeatable.


Re: DMD producing huge binaries

2016-05-19 Thread Walter Bright via Digitalmars-d

On 5/19/2016 6:45 AM, Andrei Alexandrescu wrote:

I very much advocate slapping a 64-long random string for all Voldemort returns
and calling it a day. I bet Liran's code will get a lot quicker to build and
smaller to boot.


Let's see how far we get with compression first.

  https://github.com/dlang/dmd/pull/5793

Using 64 character random strings will make symbolic debugging unpleasant.


Re: DMD producing huge binaries

2016-05-19 Thread Walter Bright via Digitalmars-d

On 5/19/2016 8:39 AM, Steven Schveighoffer wrote:

template foo(T) if (someConstraints)
{
struct Result
{
   T t;
}

auto foo(T t)
{
   return Result(t);
}
}


This solution works awesomely, actually. It even produces smaller code than
moving the struct completely outside.

I'm going to use this mechanism in iopipe to get my voldemorts back :)



Consider using a 'static struct'. A non-static struct will include a hidden 
member that points to the stack frame of foo(), i.e. "produces smaller code".
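
[Editor's note: a minimal sketch of the difference Walter describes; the function names `nested`/`flat` are made up for illustration.]

```d
auto nested(T)(T t)
{
    // Non-static: Result secretly carries a pointer to nested()'s stack frame.
    struct Result { T t; }
    return Result(t);
}

auto flat(T)(T t)
{
    // static: no hidden frame pointer, so the struct is smaller and the
    // compiler need not keep the enclosing frame alive.
    static struct Result { T t; }
    return Result(t);
}

unittest
{
    // The hidden context pointer shows up directly in the type's size.
    assert(typeof(nested(1)).sizeof > typeof(flat(1)).sizeof);
}
```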


Re: DMD producing huge binaries

2016-05-19 Thread Andrei Alexandrescu via Digitalmars-d

On 05/19/2016 11:56 AM, Georgi D wrote:

Making a local copy of chain and moving the structure outside of the
method solved the problem and reduced the code size.

The stripped size even reduced from the version that did not experience
the huge increase. Stripped: 7.4MB -> 5.5MB


Thanks very much for helping with these measurements! -- Andrei


Re: DMD producing huge binaries

2016-05-19 Thread Steven Schveighoffer via Digitalmars-d

On 5/19/16 11:56 AM, Georgi D wrote:

On Thursday, 19 May 2016 at 12:38:09 UTC, Steven Schveighoffer wrote:

On 5/17/16 6:04 PM, ZombineDev wrote:

I'm guessing that is highly related to this one:
https://issues.dlang.org/show_bug.cgi?id=15831


Yep. chain uses voldemort type, map does not.

Georgi: try copying chain implementation into your local file, but
instead of having a static internal struct, move it to a module-level
struct (you probably have to repeat the template parameters, but not
the constraints). See if it produces a similar slimming effect.



Yes,

Making a local copy of chain and moving the structure outside of the
method solved the problem and reduced the code size.


Thanks, that confirms the bug reports are for the same issue.



The stripped size even reduced from the version that did not experience
the huge increase. Stripped: 7.4MB -> 5.5MB

Should we actually change the implementation in phobos to be like this?


I'd much prefer we fix the compiler to handle voldemort types in a more 
sane way rather than try and fix the code to deal with compiler 
deficiencies. However, if this isn't something that's easily done (or 
done within a decent time frame), we may look at that solution.



In the company I work in we are very sensitive to the code size so
anything that can be made for this in phobos will help with adoption.

I am still curious though why the code size increased so dramatically
with such a small change. Both versions return a result from chain and
actually the one that does not experience the increase is more complex
(includes the one that does).


Each call to a voldemort wrapping function (one that wraps one type into 
another "voldemort" type, or internal struct) increases the symbol size 
at a rate of 3^n. Non-voldemort types do not suffer from this, because 
the symbol size increases at just n.
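
[Editor's note: a back-of-the-envelope model of that growth, as a sketch only — the exact multiplier depends on the mangling scheme. The intuition: each Voldemort return type's mangled name embeds the function's full mangled name, which in turn embeds the argument type, i.e. the previous level's name, roughly three times over; a module-level struct only appends one copy per level.]

```d
// len after n chained calls, Voldemort vs. module-level return types.
size_t voldemortLen(size_t n) { return n == 0 ? 1 : 3 * voldemortLen(n - 1); }
size_t moduleLen(size_t n)    { return n + 1; }

unittest
{
    assert(voldemortLen(8) == 6561); // 3^8 -- four orders of magnitude already
    assert(moduleLen(8) == 9);       // linear, as Steve describes
}
```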



In my actual code I have multiple calls to chain on top of the result of
this and the code size does not increase dramatically.

I am also concerned by the fact that ldc on mac just could not compile
the source which produced the big binary (entering an infinite loop).


I would expect the possibility that the issue is with the linker. In my 
toy experiments, the compiler works just fine and produces an object 
file. It's the linker having issues with parsing those humungous symbols.


-Steve


Re: DMD producing huge binaries

2016-05-19 Thread Georgi D via Digitalmars-d
On Thursday, 19 May 2016 at 12:38:09 UTC, Steven Schveighoffer 
wrote:

On 5/17/16 6:04 PM, ZombineDev wrote:
On Tuesday, 17 May 2016 at 21:58:06 UTC, Andrei Alexandrescu 
wrote:

On 05/17/2016 05:44 PM, Georgi D wrote:

Hi,

While working on a D project which heavily uses the lazy 
algorithms for
ranges I noticed a sudden huge increase in the compilation 
time and the

produced binary size.

[snip]

Thanks. That's definitely deserving of a bug report. -- Andrei


I'm guessing that is highly related to this one:
https://issues.dlang.org/show_bug.cgi?id=15831


Yep. chain uses voldemort type, map does not.

Georgi: try copying chain implementation into your local file, 
but instead of having a static internal struct, move it to a 
module-level struct (you probably have to repeat the template 
parameters, but not the constraints). See if it produces a 
similar slimming effect.


-Steve


Yes,

Making a local copy of chain and moving the structure outside of 
the method solved the problem and reduced the code size.


The stripped size even reduced from the version that did not 
experience the huge increase. Stripped: 7.4MB -> 5.5MB


Should we actually change the implementation in phobos to be like 
this?


In the company I work in we are very sensitive to the code size 
so anything that can be made for this in phobos will help with 
adoption.


I am still curious though why the code size increased so 
dramatically with such a small change. Both versions return a 
result from chain and actually the one that does not experience 
the increase is more complex (includes the one that does).


In my actual code I have multiple calls to chain on top of the 
result of this and the code size does not increase dramatically.


I am also concerned by the fact that ldc on mac just could not 
compile the source which produced the big binary (entering an 
infinite loop).


Thanks



Re: DMD producing huge binaries

2016-05-19 Thread Steven Schveighoffer via Digitalmars-d

On 5/19/16 10:43 AM, Steven Schveighoffer wrote:


Or even better:

template foo(T) if (someConstraints)
{
struct Result
{
   T t;
}

auto foo(T t)
{
   return Result(t);
}
}


This solution works awesomely, actually. It even produces smaller code 
than moving the struct completely outside.


I'm going to use this mechanism in iopipe to get my voldemorts back :)

-Steve



Re: DMD producing huge binaries

2016-05-19 Thread Andrei Alexandrescu via Digitalmars-d

On 5/19/16 10:56 AM, Daniel Murphy wrote:

On 20/05/2016 12:44 AM, Andrei Alexandrescu wrote:

On 05/19/2016 10:43 AM, Steven Schveighoffer wrote:

This may be crazy, but I solved this problem by simply moving the struct
outside the function. What about a lowering that does this for you?


That's also a possibility, but likely a more complicated one. -- Andrei


We don't actually need to lower it, just mangle it as if it was lowered.


Noice! -- Andrei


Re: DMD producing huge binaries

2016-05-19 Thread Daniel Murphy via Digitalmars-d

On 20/05/2016 12:44 AM, Andrei Alexandrescu wrote:

On 05/19/2016 10:43 AM, Steven Schveighoffer wrote:

This may be crazy, but I solved this problem by simply moving the struct
outside the function. What about a lowering that does this for you?


That's also a possibility, but likely a more complicated one. -- Andrei


We don't actually need to lower it, just mangle it as if it was lowered.


Re: DMD producing huge binaries

2016-05-19 Thread Steven Schveighoffer via Digitalmars-d

On 5/19/16 9:45 AM, Andrei Alexandrescu wrote:

On 05/19/2016 08:38 AM, Steven Schveighoffer wrote:

Yep. chain uses voldemort type, map does not.


We definitely need to fix Voldemort types. Walter and I bounced a few
ideas during DConf.

1. Do the cleartext/compress/hash troika everywhere (currently we do it
on Windows). That should help in several places.

2. For Voldemort types, simply generate a GUID-style random string of
e.g. 64 bytes. Currently the name of the type is composed out of its
full scope, with some exponentially-increasing repetition of substrings.
That would compress well, but it's just a lot of work to produce and
then compress the name. A random string is cheap to produce and adequate
for Voldemort types (you don't care for their name anyway...
Voldemort... get it?).

I very much advocate slapping a 64-long random string for all Voldemort
returns and calling it a day. I bet Liran's code will get a lot quicker
to build and smaller to boot.


This may be crazy, but I solved this problem by simply moving the struct 
outside the function. What about a lowering that does this for you?


Example:

auto foo(T)(T t) if (someConstraints)
{
   static struct Result
   {
  T t;
   }
   return Result(t);
}

Changes to:

private struct foo_Result(T) if (someConstraints)
{
   T t;
}

auto foo(T)(T t) if (someConstraints)
{
   alias Result = foo_Result!T; // inserted by compiler
   return Result(t);
}

Or even better:

template foo(T) if (someConstraints)
{
   struct Result
   {
  T t;
   }

   auto foo(T t)
   {
  return Result(t);
   }
}

What this does is lower the number of times the template parameters (and 
function arguments) appears in the type name to 1. This gets rid of the 
exponential nature of the symbol name.


This doesn't work for voldemorts with closures, but maybe it can somehow 
be reconstructed so the context is included in the generated struct.


-Steve


Re: DMD producing huge binaries

2016-05-19 Thread Andrei Alexandrescu via Digitalmars-d

On 05/19/2016 08:38 AM, Steven Schveighoffer wrote:

Yep. chain uses voldemort type, map does not.


We definitely need to fix Voldemort types. Walter and I bounced a few 
ideas during DConf.


1. Do the cleartext/compress/hash troika everywhere (currently we do it 
on Windows). That should help in several places.


2. For Voldemort types, simply generate a GUID-style random string of 
e.g. 64 bytes. Currently the name of the type is composed out of its 
full scope, with some exponentially-increasing repetition of substrings. 
That would compress well, but it's just a lot of work to produce and 
then compress the name. A random string is cheap to produce and adequate 
for Voldemort types (you don't care for their name anyway... 
Voldemort... get it?).


I very much advocate slapping a 64-long random string for all Voldemort 
returns and calling it a day. I bet Liran's code will get a lot quicker 
to build and smaller to boot.



Andrei



Re: DMD producing huge binaries

2016-05-19 Thread Steven Schveighoffer via Digitalmars-d

On 5/17/16 6:04 PM, ZombineDev wrote:

On Tuesday, 17 May 2016 at 21:58:06 UTC, Andrei Alexandrescu wrote:

On 05/17/2016 05:44 PM, Georgi D wrote:

Hi,

While working on a D project which heavily uses the lazy algorithms for
ranges I noticed a sudden huge increase in the compilation time and the
produced binary size.

[snip]

Thanks. That's definitely deserving of a bug report. -- Andrei


I'm guessing that is highly related to this one:
https://issues.dlang.org/show_bug.cgi?id=15831


Yep. chain uses voldemort type, map does not.

Georgi: try copying chain implementation into your local file, but 
instead of having a static internal struct, move it to a module-level 
struct (you probably have to repeat the template parameters, but not the 
constraints). See if it produces a similar slimming effect.


-Steve


Re: DMD producing huge binaries

2016-05-18 Thread Georgi D via Digitalmars-d
On Tuesday, 17 May 2016 at 21:58:06 UTC, Andrei Alexandrescu 
wrote:

On 05/17/2016 05:44 PM, Georgi D wrote:

Hi,

While working on a D project which heavily uses the lazy 
algorithms for
ranges I noticed a sudden huge increase in the compilation 
time and the

produced binary size.

[snip]

Thanks. That's definitely deserving of a bug report. -- Andrei


I have filed a bug: https://issues.dlang.org/show_bug.cgi?id=16039



Re: DMD producing huge binaries

2016-05-17 Thread ZombineDev via Digitalmars-d
On Tuesday, 17 May 2016 at 21:58:06 UTC, Andrei Alexandrescu 
wrote:

On 05/17/2016 05:44 PM, Georgi D wrote:

Hi,

While working on a D project which heavily uses the lazy 
algorithms for
ranges I noticed a sudden huge increase in the compilation 
time and the

produced binary size.

[snip]

Thanks. That's definitely deserving of a bug report. -- Andrei


I'm guessing that is highly related to this one: 
https://issues.dlang.org/show_bug.cgi?id=15831


Re: DMD producing huge binaries

2016-05-17 Thread Andrei Alexandrescu via Digitalmars-d

On 05/17/2016 05:44 PM, Georgi D wrote:

Hi,

While working on a D project which heavily uses the lazy algorithms for
ranges I noticed a sudden huge increase in the compilation time and the
produced binary size.

[snip]

Thanks. That's definitely deserving of a bug report. -- Andrei



DMD producing huge binaries

2016-05-17 Thread Georgi D via Digitalmars-d

Hi,

While working on a D project which heavily uses the lazy 
algorithms for ranges I noticed a sudden huge increase in the 
compilation time and the produced binary size.


I chased down the offending change to just a small change in one 
line of the code which made the resulting binary to jump from 
18MB to over 400MB.


The code that produces the large binary actually failed to link 
using DMD on osx (works on linux). The link command just consumes 
all the CPU and never completes. Using LDC on osx just failed to 
compile. ldc was using all the CPU and never completing.


There is also a very big difference between passing -unittest and 
not passing it.


I was able to shrink the code down to the following:

import std.range.primitives;
import std.range : chain, choose, repeat, chooseAmong, takeOne;
import std.algorithm.iteration : joiner , map;
import std.range.interfaces : InputRange, inputRangeObject;
import std.conv : text;

struct AType {
   string id;
   Annotation[] annotations;
   TypeArgument[] typeArguments;
}

struct Annotation {
   string type;
}

struct TypeArgument {
   AType refType;
}

struct PrimaryValue {
   int type;
}

struct FieldDecl {
   AType type;
   string id;
   PrimaryValue[] values;
}

struct MethodDecl {
   string name;
   AType returnType;
   FieldDecl[] params;
}

auto toCharsArray(R)(R r, string sep = "\n", string tail = "\n")
 if (isInputRange!R) {
return choose(r.empty, "", chain(r.joiner(sep), tail));
}


auto toChars(Annotation uda) {
   return chain("@", uda.type);
}

InputRange!(ElementType!string) toChars(TypeArgument ta) {
   auto r = chain("", ta.refType.id);
   with (ta.refType) {
  auto tArgs = choose(typeArguments.empty, "",
  chain("!(", typeArguments.map!(toChars).joiner(", "), ")"));


  return chain(id, tArgs).inputRangeObject();
   }
}

auto toChars(AType t) {
   auto udas = t.annotations.map!(toChars).toCharsArray();
   auto tArgs = choose(t.typeArguments.empty, "",
   chain("!(", t.typeArguments.map!(toChars).joiner(", "), ")"));


   return chain(udas, t.id, tArgs);
}

/// PrimaryValue
auto toChars(PrimaryValue pv) {
   return chooseAmong(pv.type, "val", "val");
}

auto toCharsSingleOrArray(R)(R r) if (isInputRange!R) {
   return choose(r.length == 1,
 r.takeOne.map!(toChars).joiner(),
 chain("[", r.map!(toChars).joiner(", "), "]")
 );
}

auto toCharsBare(FieldDecl f) {
   auto def = chain(f.type.toChars, " ", f.id);

   return choose(f.values.empty,
 def,
  chain(def, " = ", toCharsSingleOrArray(f.values)));

}

auto toChars(FieldDecl f) {
   return chain("", f.toCharsBare());
}

auto toChars(MethodDecl m) {
   //Just changing the line below to have map!(toChars) reduces
   //the binary size to 18MB from 403MB
   auto prms = chain("(",
   m.params.map!(toCharsBare).toCharsArray(", ", ""), ")");

   return chain(" ", m.name, prms, ";");
}

unittest {
   AType t1 = {id : "Foo"};
   FieldDecl f1 = {type : t1, id : "f1"};

   MethodDecl m1 = {name : "m", returnType : t1 };
   assert(!m1.toChars().text().empty); // removing this line drops the size by half

}

void main() { }

I am compiling this with:

# dmd -g -debug -unittest


I have marked the line that needs to be changed for the code size 
to drop significantly. What is interesting is that the using 
toChars should be more complex than using toCharsBare since the 
former calls the latter but the binary size just explodes when 
using toCharsBare.


I apologize for the long code segment.