Re: Desperately looking for a work-around to load and unload D shared libraries from C on OSX

2015-09-16 Thread ponce via Digitalmars-d

On Wednesday, 16 September 2015 at 23:24:29 UTC, bitwise wrote:


I was trying to solve this one myself, but the modifications to 
DMD's backend that are needed are out of reach for me right now.


If you're willing to build your own druntime, you may be able 
to get by.


I'd prefer a solution that works with existing compilers, but 
maybe building a custom LDC is possible if I figure it out.


If I understand correctly, you want to repeatedly load/unload 
the same shared library, correct? I ask because druntime for 
osx only supports loading a single image at a time:


https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L26



In practice I've found that the D shared libraries I produce can 
be dlopen'd/dlclose'd any number of times, simultaneously too, 
using both LDC and DMD; I don't know why it works.
What doesn't work is the C host program dlopen'ing the D shared 
library, dlclosing it, then dlopening another shared library 
written in C.



Anyways, when main() of a D program runs, it calls rt_init() 
and rt_term(). If you don't have a D entry point in your 
program, you have to retrieve these from your shared lib (which 
has druntime statically linked) using dlsym() and call them 
yourself.


I don't control the host program. My shared libs do have an 
entry point, from which I call Runtime.initialize().


I can also use LDC global constructor/destructor to call 
Runtime.initialize / Runtime.terminate, but it doesn't work any 
better because of the callback.






https://github.com/D-Programming-Language/druntime/blob/478b6c5354470bc70e688c45821eea71b766e70d/src/rt/dmain2.d#L158

Now, initSections() and finiSections() are responsible for 
setting up the image. If you look at initSections(), the 
function "_dyld_register_func_for_add_image" is the one that 
causes the crash, because there is no way to remove the 
callback, which will reside in your shared lib.


https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L76

So what happens is, when you call 
_dyld_register_func_for_add_image, dyld will call your callback 
for every shared-library/image (including the main application's 
image) that is currently loaded. However, you can skip the 
callback and just call "sections_osx_onAddImage" yourself.


You would have to add something like this to sections_osx.d, 
and call it instead of adding the callback:


void callOnAddImage()
{
    // dladdr() should give you information about the
    // shared lib in which the symbol you pass resides.
    // Passing the address of this function should work.
    Dl_info info;
    int ret = dladdr(cast(void*)&callOnAddImage, &info);
    assert(ret);

    // "dli_fbase" is actually a pointer to the mach_header
    // for the shared library. Once you have the mach_header,
    // you can also retrieve the image slide, and finally
    // call sections_osx_onAddImage().
    mach_header* header = cast(mach_header*)info.dli_fbase;
    intptr_t slide = _dyld_get_image_slide(header);
    sections_osx_onAddImage(header, slide);
}

Now, if you look at finiSections(), it seems to be incomplete. 
There is nothing like sections_osx_onRemoveImage, so you'll 
have to complete it to make sure the library is unloaded 
correctly:


https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L83

You may have to make other mods here and there to get this 
working correctly, but this is the bulk of it.


 Bit


Thanks for your answer. This is really helpful, though I don't 
understand the first thing about what images, headers and 
sections are in this context.





Re: Anyone interested on working on a D parser ?

2015-09-16 Thread thedeemon via Digitalmars-d
On Thursday, 17 September 2015 at 01:35:42 UTC, Leandro T. C. 
Melo wrote:


An alternative would be an LL parser generator. I think ANTLR 
added a C++ target, but I don't know how mature it is.


I used the C++ target of ANTLR about 13 years ago and it was 
fine. So I suppose it should be mature by now. ;)


Re: Anyone interested on working on a D parser ?

2015-09-16 Thread deadalnix via Digitalmars-d
On Thursday, 17 September 2015 at 01:38:02 UTC, Adam D. Ruppe 
wrote:
Did you take a look at 
https://github.com/Hackerpilot/libdparse/tree/master already?


Yes. libdparse and/or SDC's parser seem like good places to 
start.


Checked integer type API design poll

2015-09-16 Thread tsbockman via Digitalmars-d
I have written some poll questions concerning the design 
trade-offs involved in making a good `SafeInt`/`CheckedInt` type. 
They are about the actual semantics of the API, not the 
internals, nor bike-shedding about names.


(`SafeInt` and `CheckedInt` are integer data types which use 
`core.checkedint` to guard against overflow, divide-by-zero, etc. 
Links to current work-in-progress versions by


1) Robert Schadek (burner):
https://github.com/D-Programming-Language/phobos/pull/3389
2) Myself (tsbockman):
https://github.com/tsbockman/CheckedInt)

For the purposes of this poll please assume the following (based 
on my own extensive testing):


1) Code using checked operations will take about **1.5x longer to 
run** than unchecked code. (If you compile with GDC, anyway; DMD 
and LDC are another story...)
2) The main design decision with a significant runtime 
performance cost is whether to throw exceptions or not. With 
some optimization, the hit is modest, but noticeable.
3) Even if the API uses exceptions in some places, it can still 
be used in `nothrow @nogc` code, at the cost of some extra 
typing.


Two further points I would ask the reader to consider:

* A checked integer type is fundamentally semantically different 
from an unchecked type. The difference is of similar magnitude to 
that of floating-point vs fixed-point.
* It might be wise to read the entire poll before answering it - 
the questions are all related in some way.


The poll results are here, if you wish to preview the questions:

http://polljunkie.com/poll/kzrije/checked-integer-type-behaviour/view

When you are ready, please take the poll yourself:

http://polljunkie.com/poll/cytdbq/checked-integer-type-behaviour


Thanks for your time.


Re: Anyone interested on working on a D parser ?

2015-09-16 Thread Adam D. Ruppe via Digitalmars-d
Did you take a look at 
https://github.com/Hackerpilot/libdparse/tree/master already?


Anyone interested on working on a D parser ?

2015-09-16 Thread Leandro T. C. Melo via Digitalmars-d
Hi D enthusiasts,

I'm developing a multi-language code modelling engine. The heart of
the project is a language-unifying AST, a generic pipeline of binding,
type checking, code completion, etc, and hooks that allow each
language to plug in their specific behavior where needed. Also, the
library is not tied to any particular IDE or text editor.

One "issue" I have so far is the D parser.

Mostly because of convenience I prototyped it with Bison. Despite
being tricky to get such LR parsers working in an interactive
environment, it's still possible to error-recover at the right spots
and provide a decent user experience - you can see some action in the
videos below, one for D and another for Go [1]. However, in the case
of D there's an additional challenge due to its grammar. Even though
I'm using a GLR parser (so ambiguities are handled), it's still
difficult to get everything in place.

Would anyone be interested in working on this parser or perhaps
building a recursive descent one? The parser is supposed to be
lightweight, not to perform symbol lookup (it can afford some
impreciseness), and its result must be the special AST. Therefore,
simply taking the official dmd2's parser is not a solution, although
it could certainly serve as a reference.

An alternative would be an LL parser generator. I think ANTLR added a
C++ target, but I don't know how mature it is. There's also llgen, but
I never tried it. I might experiment with one of them in Rust.

This is a project I work on in my free time, but I'm trying to make it
move. So if anyone is interested, please get in touch; I'd be glad to
take contributions: https://github.com/ltcmelo/uaiso

Leandro

[1] https://www.youtube.com/watch?v=ZwMQ_GB-Zv0 and
https://www.youtube.com/watch?v=nUpcVBAw0DM


Re: Implement the "unum" representation in D ?

2015-09-16 Thread H. S. Teoh via Digitalmars-d
On Wed, Sep 16, 2015 at 08:06:42PM +, deadalnix via Digitalmars-d wrote:
[...]
> When you have a floating point unit, you get your 32 bits: 23
> bits go into the mantissa FU and 8 into the exponent FU. For
> instance, if you multiply floats, you send the 2 exponents into
> an adder, you send the 2 mantissas into a 24-bit multiplier
> (you add a leading 1), and you xor the sign bits.
> 
> You get the carry from the adder, and emit a multiply, or you
> count the leading 0s of the 48-bit multiply result, shift by
> that amount and add the shift to the exponent.
> 
> If you get a carry in the exponent adder, you saturate and emit
> an infinity.
> 
> Each bit goes into a given functional unit. That means you need
> one wire from each input bit to the functional unit it goes to.
> Same for the result bits.
> 
> Now, if the format is variadic, you need to wire all bits to all
> functional units, because they can potentially end up there.
> That's a lot of wire; in fact the number of wires grows
> quadratically with that scheme.
> 
> The author keeps repeating that wires have become the expensive
> thing, and he is right. Meaning a solution with quadratic wiring
> is not going to cut it.

I found this .pdf that explains the unum representation a bit more:

http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf

On p.31, you can see the binary representation of unum. The utag has 3
bits for exponent size, presumably meaning the exponent can vary in size
up to 7 bits.  There are 5 bits in the utag for the mantissa, so it can
be anywhere from 0 to 31 bits.

It's not completely variadic, but it's complex enough that you will
probably need some kind of shift register to extract the exponent and
mantissa so that you can pass them in the right format to the various
parts of the hardware.  It definitely won't be as straightforward as the
current floating-point format; you can't just wire the bits directly to
the adders and multipliers. This is probably what the author meant by
needing "more transistors".  I guess his point was that we have to do
more work in the CPU, but in return we (hopefully) reduce the traffic to
DRAM, thereby saving the cost of data transfer.

I'm not so sure how well this will work in practice, though, unless we
have a working prototype that proves the benefits.  What if you have a
10*10 unum matrix, and during some operation the size of the unums in
the matrix changes?  Assuming the worst case, you could have started out
with 10*10 unums with small exponent/mantissa, maybe fitting in 2-3
cache lines, but after the operation most of the entries expand to 7-bit
exponent and 31-bit mantissa, so now your matrix doesn't fit into the
allocated memory anymore.  So now your hardware has to talk to druntime
to have it allocate new memory for storing the resulting unum matrix?

The only sensible solution seems to be to allocate the maximum size for
each matrix entry, so that if the value changes you won't run out of
space.  But that means we have lost the benefit of having a variadic
encoding to begin with -- you will have to transfer the maximum size's
worth of data when you load the matrix from DRAM, even if most of that
data is unused (because the unum only takes up a small percentage of the
space).  The author proposed GC, but I have a hard time imagining a GC
implemented in *CPU*, no less, colliding with the rest of the world
where it's the *software* that controls DRAM allocation.  (GC too slow
for your application? Too bad, gotta upgrade your CPU...)

The way I see it from reading the PDF slides, is that what the author is
proposing would work well as a *software* library, perhaps backed up by
hardware support for some of the lower-level primitives.  I'm a bit
skeptical of the claims of data traffic / power savings, unless there is
hard data to prove that it works.


T

-- 
"The number you have dialed is imaginary. Please rotate your phone 90 degrees 
and try again."


Re: Desperately looking for a work-around to load and unload D shared libraries from C on OSX

2015-09-16 Thread bitwise via Digitalmars-d

On Wednesday, 16 September 2015 at 22:29:46 UTC, ponce wrote:
Context: On OSX, a C program can load a D shared library but 
once unloaded the next dlopen will crash, jumping into a 
callback that doesn't exist anymore.


I've filed it here: 
https://issues.dlang.org/show_bug.cgi?id=15060



It looks like this was known and discussed several times 
already:

http://forum.dlang.org/post/vixoqmidlbizawbxm...@forum.dlang.org (2015)
https://github.com/D-Programming-Language/druntime/pull/228 
(2012)



Any idea to work-around this problem would be awesome.

I'm not looking for something correct, future-proof, or pretty. 
Any shit that still sticks to the wall will do. Anything!


The only case I need to support is: C host, D shared library, 
with runtime statically linked.


Please help!


I was trying to solve this one myself, but the modifications to 
DMD's backend that are needed are out of reach for me right now.


If you're willing to build your own druntime, you may be able to 
get by.


If I understand correctly, you want to repeatedly load/unload the 
same shared library, correct? I ask because druntime for osx only 
supports loading a single image at a time:


https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L26

Anyways, when main() of a D program runs, it calls rt_init() and 
rt_term(). If you don't have a D entry point in your program, you 
have to retrieve these from your shared lib (which has druntime 
statically linked) using dlsym() and call them yourself.


https://github.com/D-Programming-Language/druntime/blob/478b6c5354470bc70e688c45821eea71b766e70d/src/rt/dmain2.d#L158

Now, initSections() and finiSections() are responsible for 
setting up the image. If you look at initSections(), the function 
"_dyld_register_func_for_add_image" is the one that causes the 
crash, because there is no way to remove the callback, which will 
reside in your shared lib.


https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L76

So what happens is, when you call 
_dyld_register_func_for_add_image, dyld will call your callback 
for every shared-library/image (including the main application's 
image) that is currently loaded. However, you can skip the 
callback and just call "sections_osx_onAddImage" yourself.


You would have to add something like this to sections_osx.d, and 
call it instead of adding the callback:


void callOnAddImage()
{
    // dladdr() should give you information about the
    // shared lib in which the symbol you pass resides.
    // Passing the address of this function should work.
    Dl_info info;
    int ret = dladdr(cast(void*)&callOnAddImage, &info);
    assert(ret);

    // "dli_fbase" is actually a pointer to the mach_header
    // for the shared library. Once you have the mach_header,
    // you can also retrieve the image slide, and finally
    // call sections_osx_onAddImage().
    mach_header* header = cast(mach_header*)info.dli_fbase;
    intptr_t slide = _dyld_get_image_slide(header);
    sections_osx_onAddImage(header, slide);
}

Now, if you look at finiSections(), it seems to be incomplete. 
There is nothing like sections_osx_onRemoveImage, so you'll have 
to complete it to make sure the library is unloaded correctly:


https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L83

You may have to make other mods here and there to get this 
working correctly, but this is the bulk of it.


 Bit



Desperately looking for a work-around to load and unload D shared libraries from C on OSX

2015-09-16 Thread ponce via Digitalmars-d
Context: On OSX, a C program can load a D shared library but once 
unloaded the next dlopen will crash, jumping into a callback that 
doesn't exist anymore.


I've filed it here: https://issues.dlang.org/show_bug.cgi?id=15060


It looks like this was known and discussed several times already:
http://forum.dlang.org/post/vixoqmidlbizawbxm...@forum.dlang.org 
(2015)

https://github.com/D-Programming-Language/druntime/pull/228 (2012)


Any idea to work-around this problem would be awesome.

I'm not looking for something correct, future-proof, or pretty. 
Any shit that still sticks to the wall will do. Anything!


The only case I need to support is: C host, D shared library, 
with runtime statically linked.


Please help!


Re: running code on the homepage

2015-09-16 Thread Vladimir Panteleev via Digitalmars-d

On Wednesday, 16 September 2015 at 09:52:23 UTC, ixid wrote:

On Wednesday, 16 September 2015 at 06:44:30 UTC, nazriel wrote:
On Wednesday, 16 September 2015 at 05:54:03 UTC, Andrei 
Amatuni wrote:
maybe I'm doing something wrong...but the output of running 
the default code snippet on the dlang.org homepage is:


"unable to fork: Cannot allocate memory"

not a good look


Thank you for letting us know,

This issue will be fixed very soon.

Best regards,
Damian Ziemba


Would it be possible to set things up so the ones that fail are 
retired until they can be fixed? Non-working examples look 
awful for the language.


https://github.com/D-Programming-Language/dlang.org/pull/1098

This removes unfixable examples. I think Damian is working on 
getting the one fixable-but-broken example (rounding 
floating-point numbers) to work.




Re: Implement the "unum" representation in D ?

2015-09-16 Thread deadalnix via Digitalmars-d
On Wednesday, 16 September 2015 at 21:12:11 UTC, Ola Fosheim 
Grøstad wrote:
On Wednesday, 16 September 2015 at 20:53:37 UTC, deadalnix 
wrote:
On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim 
Grøstad wrote:
On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix 
wrote:
You know, when you have no idea what you are talking about, 
you can just move on to something you understand.


Ah, nice move. Back to your usual habits?



Stop


OK. I stop. You are beyond reason.


True, how blind I was. It is fairly obvious now, thinking about 
it, that you can get a 3-orders-of-magnitude increase in 
sequential decoding in hardware by having a compiler with a 
vectorized SSA and a scratchpad!


Or maybe you have numbers to present that show I'm wrong?



Re: Implement the "unum" representation in D ?

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d

On Wednesday, 16 September 2015 at 20:53:37 UTC, deadalnix wrote:
On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim 
Grøstad wrote:
On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix 
wrote:
You know, when you have no idea what you are talking about, 
you can just move on to something you understand.


Ah, nice move. Back to your usual habits?



Stop


OK. I stop. You are beyond reason.



Re: Implement the "unum" representation in D ?

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d

On Wednesday, 16 September 2015 at 20:35:16 UTC, Wyatt wrote:
On Wednesday, 16 September 2015 at 08:53:24 UTC, Ola Fosheim 
Grøstad wrote:


I don't think he is downplaying it. He has said that it will 
probably take at least 10 years before it is available in 
hardware. There is also a company called Rex Computing that 
are looking at unum:


Oh hey, I remember these jokers.  They were trying to blow some 
smoke about moving 288 GB/s at 4W.  They're looking at unum?  
Of course they are; care to guess who's advising them?  Yep.


I'll be shocked if they ever even get to tape out.


Yes, of course, most hardware startups don't succeed. I assume 
they get know-how from Adapteva.




Re: running code on the homepage

2015-09-16 Thread Andrei Alexandrescu via Digitalmars-d

On 09/16/2015 09:49 AM, nazriel wrote:

1-2 days more and we will be done with it, so IMHO no need to take
any additional steps for it right now.


That's great, thanks for doing this. What is the current status with 
regard to making the online compilation infrastructure publicly 
accessible and improvable? Ideally everything would be in the open, and 
we (= the fledgling D Language Foundation) would pay for the server 
infrastructure. Please advise, thanks. -- Andrei




Re: Implement the "unum" representation in D ?

2015-09-16 Thread deadalnix via Digitalmars-d
On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim 
Grøstad wrote:
On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix 
wrote:
You know, when you have no idea what you are talking about, 
you can just move on to something you understand.


Ah, nice move. Back to your usual habits?



Stop

Prefetching would not change anything here. The problem comes 
from variable-size encoding, and the challenge it poses for 
hardware. You can have 100% L1 hit and still have the same 
problem.


There is _no_ cache. The compiler fully controls the layout of 
the scratchpad.




You are the king of goalpost shifting. You get an answer about 
x86 decoding, you get served.


You want to talk about a scratchpad? Good! How does the data 
end up in the scratchpad to begin with? Using magic? What is 
the scratchpad made of, if not flip-flops? And if so, how is it 
different from a cache as far as the hardware is concerned?


You can play with words, but the problem remains the same. When 
you have on-chip memory, be it cache or scratchpad, and a variadic 
encoding, you can't even feed a handful of ALUs. How do you 
expect to feed 256+ VLIW cores? There are 3 orders of magnitude 
of gap in your reasoning.


You can't pull 3 orders of magnitude out of your ass and just 
pretend it can be done.



That's hardware 101.


Is it?



Yes, wiring is hardware 101. I mean seriously, if one does not 
get how components can be wired together, one should probably 
abstain from making any hardware comments.


You cannot predict at this point what the future will be like. 
Is it unlikely that anything specific will change the status 
quo? Yes. Is it highly probable that something will change the 
status quo? Yes. Will it happen overnight? No.


50+ years have been invested in floating point design. Will 
this be offset overnight? No.


It'll probably take 10+ years before anyone has a different 
type of numerical ALU on their desktop than IEEE754. By that 
time we are in a new era.


Ok, listen, it's not complicated.

I don't know what car will come out next year. But I know there 
won't be a car that can go 1km on 10 centiliters of gasoline. 
That would be physics-defying stuff.


Same thing: you won't be able to feed 256+ cores if you load data 
sequentially.


Don't give me this stupid "we don't know what's going to happen 
tomorrow" bullshit. We won't have unicorn meat in supermarkets. 
We won't have free energy. We won't have interstellar travel. And 
we won't have the capability to feed 256+ cores sequentially.


I gave you numbers; you gave me bullshit.



Re: dmd codegen improvements

2015-09-16 Thread Walter Bright via Digitalmars-d

On 9/16/2015 7:16 AM, Bruno Medeiros wrote:

On 28/08/2015 22:59, Walter Bright wrote:

People told me I couldn't write a C compiler, then told me I couldn't
write a C++ compiler. I'm still the only person who has ever implemented
a complete C++ compiler (C++98). Then they all (100%) laughed at me for
starting D, saying nobody would ever use it.

My whole career is built on stepping over people who told me I couldn't
do anything and wouldn't amount to anything.


So your whole career is fundamentally based not on bringing value to the
software world, but rather merely proving people wrong? That amounts to living
your professional life in thrall to other people's validation, and it's not
commendable at all. It's a waste of your potential.

It is only worthwhile to prove people wrong when it brings you a considerable
amount of either monetary resources or clout - and more so than you would have
got doing something else with your time.

It's not clear to me that was always the case throughout your career... was it?


Wow, such an interpretation never occurred to me. I will reiterate that I worked 
on things that I believed had value and nobody else did. I.e. I did not need 
validation from others.




Re: Implement the "unum" representation in D ?

2015-09-16 Thread Wyatt via Digitalmars-d
On Wednesday, 16 September 2015 at 08:53:24 UTC, Ola Fosheim 
Grøstad wrote:


I don't think he is downplaying it. He has said that it will 
probably take at least 10 years before it is available in 
hardware. There is also a company called Rex Computing that are 
looking at unum:


Oh hey, I remember these jokers.  They were trying to blow some 
smoke about moving 288 GB/s at 4W.  They're looking at unum?  Of 
course they are; care to guess who's advising them?  Yep.


I'll be shocked if they ever even get to tape out.

-Wyatt


Re: Implement the "unum" representation in D ?

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d

On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix wrote:
You know, when you have no idea what you are talking about, you 
can just move on to something you understand.


Ah, nice move. Back to your usual habits?

Prefetching would not change anything here. The problem comes 
from variable-size encoding, and the challenge it poses for 
hardware. You can have 100% L1 hit and still have the same 
problem.


There is _no_ cache. The compiler fully controls the layout of 
the scratchpad.



That's hardware 101.


Is it?

The core point is this:

1. if there is academic interest (i.e. publishing opportunities) 
you get research


2. if there is research you get new algorithms

3. you get funding

etc

You cannot predict at this point what the future will be like. Is 
it unlikely that anything specific will change the status quo? 
Yes. Is it highly probable that something will change the status 
quo? Yes. Will it happen overnight? No.


50+ years have been invested in floating point design. Will this 
be offset overnight? No.


It'll probably take 10+ years before anyone has a different type 
of numerical ALU on their desktop than IEEE754. By that time we 
are in a new era.




Re: Implement the "unum" representation in D ?

2015-09-16 Thread deadalnix via Digitalmars-d
On Wednesday, 16 September 2015 at 19:40:49 UTC, Ola Fosheim 
Grøstad wrote:
You can load continuously 64 bytes in a stream, decode to your 
internal format and push them into the scratchpad of other 
cores. You could even do this in hardware.




1/ If you load for the worst-case scenario, then your power 
advantage is gone.
2/ If you load these one by one, how do you expect to feed 256+ 
cores?


Obviously you can make this in hardware. And obviously this is 
not going to be able to feed 256+ cores. Even with a chip at a 
low frequency, let's say 800MHz or so, you have about 80 cycles 
to access memory. That means you need to have 20,000+ cycles of 
work to do per core per unum.


That's a simple back-of-the-envelope calculation. Your proposal 
is simply ludicrous. It's a complete non-starter.


You can make this in hardware. Sure you can, no problem. But you 
won't because it is a stupid idea.


To give you a similar example, x86 decoding is often the 
bottleneck on an x86 CPU. The number of ALUs in x86 over the 
past decade decreased rather than increased, because you 
simply can't decode fast enough to feed them. Yet, x86 CPUs 
have 64-way speculative decoding as a first stage.


That's because we use a dumb compiler that does not prefetch 
intelligently.


You know, when you have no idea what you are talking about, you 
can just move on to something you understand.


Prefetching would not change anything here. The problem comes from 
variable-size encoding, and the challenge it poses for hardware. 
You can have 100% L1 hit and still have the same problem.


No sufficiently smart compiler can fix that.

If you are writing for a tile based VLIW CPU you preload. These 
calculations are highly iterative so I'd rather think of it as 
a co-processor solving a single equation repeatedly than 
running the whole program. You can run the larger program on a 
regular CPU or a few cores.




That's irrelevant. The problem is not the kind of CPU, it is how 
you feed it at a fast enough rate.


The problem is not transistors, it is wires. Because the damn 
thing is variadic in every way, pretty much every input bit 
can end up anywhere in the functional unit. That is a 
LOT of wire.


I haven't seen a design, so I cannot comment. But keep in mind 
that the CPU does not have to work with the format, it can use 
a different format internally.


We'll probably see FPGA implementations that can be run on FPGA 
cards for PCs within a few years. I read somewhere that a group 
in Singapore was working on it.


That's hardware 101.

When you have a floating point unit, you get your 32 bits: 23 
bits go into the mantissa FU and 8 into the exponent FU. 
For instance, if you multiply floats, you send the 2 exponents 
into an adder, you send the 2 mantissas into a 24-bit multiplier 
(you add a leading 1), and you xor the sign bits.


You get the carry from the adder, and emit a multiply, or you 
count the leading 0s of the 48-bit multiply result, shift by 
that amount and add the shift to the exponent.


If you get a carry in the exponent adder, you saturate and emit 
an infinity.


Each bit goes into a given functional unit. That means you need 
one wire from each input bit to the functional unit it goes to. 
Same for the result bits.


Now, if the format is variadic, you need to wire all bits to all 
functional units, because they can potentially end up there. 
That's a lot of wire; in fact the number of wires grows 
quadratically with that scheme.


The author keeps repeating that wires have become the expensive 
thing, and he is right. Meaning a solution with quadratic wiring 
is not going to cut it.




Re: Implement the "unum" representation in D ?

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d

On Wednesday, 16 September 2015 at 19:21:59 UTC, deadalnix wrote:
No you don't, because the streamer still needs to load the unums 
one by one. Maybe 2 by 2 with a fair amount of hardware 
speculation (which means you are already trading energy for 
performance, so the energy argument is weak). There is no way 
you can feed 256+ cores that way.


You can load continuously 64 bytes in a stream, decode to your 
internal format and push them into the scratchpad of other cores. 
You could even do this in hardware.


If you look at the ubox brute-forcing method, you compute many 
calculations over the same data, because you solve spatially, not 
by timesteps. So you can run many, many parallel computations 
over the same data.


To give you a similar example, x86 decoding is often the 
bottleneck on an x86 CPU. The number of ALUs in x86 over the 
past decade decreased rather than increased, because you simply 
can't decode fast enough to feed them. Yet, x86 CPUs have 64-way 
speculative decoding as a first stage.


That's because we use a dumb compiler that does not prefetch 
intelligently. If you are writing for a tile based VLIW CPU you 
preload. These calculations are highly iterative so I'd rather 
think of it as a co-processor solving a single equation 
repeatedly than running the whole program. You can run the larger 
program on a regular CPU or a few cores.


The problem is not transistors, it is wires. Because the damn 
thing is variadic in every way, pretty much every input bit 
can end up anywhere in the functional unit. That is a LOT of 
wire.


I haven't seen a design, so I cannot comment. But keep in mind 
that the CPU does not have to work with the format, it can use a 
different format internally.


We'll probably see FPGA implementations that can be run on FPGA 
cards for PCs within a few years. I read somewhere that a group 
in Singapore was working on it.




Re: Implement the "unum" representation in D ?

2015-09-16 Thread deadalnix via Digitalmars-d
On Wednesday, 16 September 2015 at 14:11:04 UTC, Ola Fosheim 
Grøstad wrote:
On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix 
wrote:
The energy comparison is bullshit. As long as you haven't 
loaded the data, you don't know how wide they are. Meaning you 
either need to go pessimistic and load for the worst-case 
scenario, or do two round trips to memory.


That really depends on memory layout and algorithm. A likely 
implementation would be a co-processor that would take a unum 
stream and then pipe it through a network of cores (tile based 
co-processor). The internal busses between cores are very very 
fast and with 256+ cores you get tremendous throughput. But you 
need a good compiler/libraries and software support.




No you don't. Because the streamer still needs to load the unums 
one by one. Maybe two by two with a fair amount of hardware 
speculation (which means you are already trading energy for 
performance, so the energy argument is weak). There is no way 
you can feed 256+ cores that way.
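To make the sequential-decode constraint concrete, here is a small Rust sketch of my own (LEB128-style varints rather than actual unums, purely as an illustration): the byte offset of value N depends on the widths of all preceding values, so a decoder must walk the stream strictly in order.

```rust
// Hypothetical illustration (LEB128-style varints, not actual unums):
// the offset of value N depends on the widths of all previous values,
// so a decoder must walk the stream strictly in order.
fn decode_varint(buf: &[u8], pos: &mut usize) -> u64 {
    let (mut result, mut shift) = (0u64, 0u32);
    loop {
        let byte = buf[*pos];
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if (byte & 0x80) == 0 {
            return result;
        }
        shift += 7;
    }
}

fn main() {
    // 1, 300 and 2 encode to widths of 1, 2 and 1 bytes respectively.
    let buf = [0x01u8, 0xac, 0x02, 0x02];
    let mut pos = 0;
    let mut values = Vec::new();
    while pos < buf.len() {
        values.push(decode_varint(&buf, &mut pos)); // inherently sequential
    }
    assert_eq!(values, [1, 300, 2]);
}
```

You cannot jump to element N without scanning everything before it, which is exactly why feeding many cores from one variable-width stream is hard.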


To give you a similar example: x86 instruction decoding is often 
the bottleneck on an x86 CPU. The number of ALUs in x86 over the 
past decade has decreased rather than increased, because you 
simply can't decode fast enough to feed them. Yet x86 CPUs have 
64-way speculative decoding as a first stage.


The hardware is likely to be slower as you'll need way more 
wiring than for regular floats, and wire is not only cost, but 
also time.


You need more transistors per ALU, but slower does not matter 
if the algorithm needs bounded accuracy or if it converges more 
quickly with unums. The key challenge for him is to create a 
market, meaning getting the semantics into scientific software 
and getting initial workable implementations out to scientists.


If there is a market demand, then there will be products. But 
you need to create the market first. Hence he wrote an 
easy-to-read book on the topic and supports people who want to 
implement it.


The problem is not transistors, it is wire. Because the damn thing 
is variable in every way, pretty much every input bit can end 
up anywhere in the functional unit. That is a LOT of wire.




Re: Implementing typestate

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d
On Wednesday, 16 September 2015 at 18:41:33 UTC, Ola Fosheim 
Grøstad wrote:
I don't think this is possible to establish in the general 
case. Wouldn't this require a full theorem prover? I think the 
only way for that to work is to fully unroll all loops and hope 
that a theorem prover can deal with it.


For example:

Object obj = create();

for ... {
  (Object obj, Ref r) = obj.borrow();
  queue.push(r);
  dostuff(queue);
}


On the other hand if you have this:

  for i=0..2 {
(Object obj, Ref r[i]) = obj.borrow();
dostuff(r);
  }

then you can unwind it as (hopefully):

  (Object obj, Ref r[0]) = obj.borrow();
  (Object obj, Ref r[1]) = obj.borrow();
  (Object obj, Ref r[2]) = obj.borrow();


  x += somepurefunction(r[0]);
  x += somepurefunction(r[1]);
  x += somepurefunction(r[2]);

  r[0].~this();  // r[0] proven unmodified, type is Ref

  r[1].~this();  // r[1] proven to be Ref
  r[2].~this(); // r[2] proven to be Ref
  r.~this();

If the lend IDs are always unique, then you can sometimes prove 
that all constructors have a matching destructor... Or something 
like that...


?



Re: Overview of D User Groups?

2015-09-16 Thread Ali Çehreli via Digitalmars-d

On 09/16/2015 11:56 AM, qznc wrote:

Is there an overview of D user groups somewhere?

There is one in Berlin and one in the Valley, apparently. Walter
participates in the C++ group in Seattle or something, if I remember
correctly.


If a Meetup group happens to list the right keywords (topics?) then it 
shows up on this map:


  http://dpl.meetup.com/

Ali



Overview of D User Groups?

2015-09-16 Thread qznc via Digitalmars-d

Is there an overview of D user groups somewhere?

There is one in Berlin and one in the Valley, apparently. Walter 
participates in the C++ group in Seattle or something, if I 
remember correctly.


Re: Implement the "unum" representation in D ?

2015-09-16 Thread Timon Gehr via Digitalmars-d

On 09/16/2015 10:17 AM, Don wrote:


So:
...
* There is no guarantee that it would be possible to implement it in
hardware without a speed penalty, regardless of how many transistors you
throw at it (hardware analogue of Amdahl's Law)


https://en.wikipedia.org/wiki/Gustafson's_law :o)



Re: Implementing typestate

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d
On Wednesday, 16 September 2015 at 18:01:29 UTC, Marc Schütz 
wrote:

typestate(alias owner) {
this.owner := owner; // re-alias operator
this.owner.refcount++;
}


I don't think this is possible to establish in the general case. 
Wouldn't this require a full theorem prover? I think the only way 
for that to work is to fully unroll all loops and hope that a 
theorem prover can deal with it. Either that or painstakingly 
construct a proof manually (Hoare logic).


Like, how can you statically determine if borrowed references  
stuffed into a queue are all released? To do that you must prove 
when the queue is empty for borrowed references from a specific 
object, but it could be interleaved with references to other 
objects.




Re: Implement the "unum" representation in D ?

2015-09-16 Thread Timon Gehr via Digitalmars-d

On 09/16/2015 10:46 AM, deadalnix wrote:

On Saturday, 11 July 2015 at 18:16:22 UTC, Timon Gehr wrote:

On 07/11/2015 05:07 PM, Andrei Alexandrescu wrote:

On 7/10/15 11:02 PM, Nick B wrote:

John Gustafson book is now out:

It can be found here:

http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY




Very interesting, I'll read it. Thanks! -- Andrei



I think Walter should read chapter 5.


What is this chapter about ?


Relevant quote: "Programmers and users were never given visibility or 
control of when a value was promoted to “double extended precision” 
(80-bit or higher) format, unless they wrote assembly language; it just 
happened automatically, opportunistically, and unpredictably. Confusion 
caused by different results outweighed the advantage of reduced 
rounding-overflow-underflow problems, and now coprocessors must dumb 
down their results to mimic systems that have no such extra scratchpad 
capability."





Re: Implementing typestate

2015-09-16 Thread Marc Schütz via Digitalmars-d
On Wednesday, 16 September 2015 at 17:15:55 UTC, Ola Fosheim 
Grøstad wrote:
On Wednesday, 16 September 2015 at 17:03:14 UTC, Marc Schütz 
wrote:

On Tuesday, 15 September 2015 at 21:44:25 UTC, Freddy wrote:

On Tuesday, 15 September 2015 at 17:45:45 UTC, Freddy wrote:

 Rust style memory management in a library


Wait nevermind about that part, it's harder than I thought.


Yeah, I thought about type-states as a way of implementing 
borrowing, too. I think the biggest difficulty is that the 
state of one object (the owner) can be affected by what 
happens in other objects (i.e., it becomes mutable again when 
those are destroyed).


If the borrowed reference itself follows move semantics, can't 
you just require it to be swallowed by its origin as the 
"close" operation?


pseudocode:

File f = open();
(File f, FileRef r) = f.borrow();

dostuff(r);

(File f, FileRef r) = f.unborrow(r);

File f = f.close()


But the `unborrow` is explicit. What I'd want is to use the 
implicit destructor call:


struct S {
static struct Ref {
private @typestate alias owner;
private S* p;
@disable this();
this()
typestate(alias owner) {
this.owner := owner; // re-alias operator
this.owner.refcount++;
}
body {
this.p = &owner;
}
this(this) {
this.owner.refcount++;
}
~this() {
this.owner.refcount--;
}
}
@typestate size_t refcount = 0;
S.Ref opUnary(string op : "*")() {
// overload address operator (not yet supported)
return S.Ref(@typestate this);
}
~this() static if(refcount == 0) { }
}

void foo(scope S.Ref p);
void bar(-> S.Ref p); // move
void baz(S.Ref p);

S a;  // => S<0>
{
auto p = &a;  // => S<1>
foo(p);   // pass-by-scope doesn't copy or destroy
  // => S<1>
p.~this();// (implicit) => S<0>
}
{
auto p = &a;  // => S<1>
bar(p);   // pass-by-move, no copy or destruction
  // => S<1>
p.~this();// (implicit) => S<0>
}
{
auto p = &a;  // => S<1>
baz(p);   // compiler sees only the copy,
  // but no destructor => S<2>
p.~this();// (implicit) => S<1>
}
a.~this();// ERROR: a.refcount != 0

The first two cases can be analyzed at the call site. But the 
third one is problematic, because inside `baz()`, the compiler 
doesn't know where the alias actually points to, because it could 
be in an entirely different compilation unit. I guess this can be 
solved by disallowing all operations modifying or depending on an 
alias type-state.


(Other complicated things, like preserving type-state through 
references or array indices, probably shouldn't even be 
attempted.)


Re: Implement the "unum" representation in D ?

2015-09-16 Thread jmh530 via Digitalmars-d

On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote:


Also, predictable size mean you can split your dataset and 
process it in parallel, which is impossible if sizes are random.


I don't recall how he would deal with something similar to cache 
misses when you have to promote or demote a unum. However, my 
recollection of the book is that there was quite a bit of focus 
on a unum representation that has the same size as a double. If 
you only did the computations with this format, I would expect 
the sizes would be more-or-less fixed. Promotion would be pretty 
rare, but still possible, I would think.


Compared to calculations with doubles there might not be a strong 
case for energy efficiency (but I don't really know for sure). My 
understanding was that the benefit for energy efficiency is only 
when you use a smaller sized unum instead of a float. I don't 
recall how he would resolve your point about cache misses.


Anyway, while I can see a benefit from using unum numbers 
(accuracy, avoiding overflow, etc.) rather than floating point 
numbers, I think that performance or energy efficiency would have 
to be within range of floating point numbers for it to have any 
meaningful adoption.


Re: Implementing typestate

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d
On Wednesday, 16 September 2015 at 17:15:55 UTC, Ola Fosheim 
Grøstad wrote:

dostuff(r);

(File f, FileRef r) = f.unborrow(r);


Of course, files are tricky since they can change their state 
themselves (like IO error). Doing that statically would require 
some kind of branching mechanism with a try-catch that jumps to a 
different location where the file type changes to "File"...


Sounds non-trivial to bolt onto an existing language.





Re: Implementing typestate

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d
On Wednesday, 16 September 2015 at 17:03:14 UTC, Marc Schütz 
wrote:

On Tuesday, 15 September 2015 at 21:44:25 UTC, Freddy wrote:

On Tuesday, 15 September 2015 at 17:45:45 UTC, Freddy wrote:

 Rust style memory management in a library


Wait nevermind about that part, it's harder than I thought.


Yeah, I thought about type-states as a way of implementing 
borrowing, too. I think the biggest difficulty is that the 
state of one object (the owner) can be affected by what happens 
in other objects (i.e., it becomes mutable again when those are 
destroyed).


If the borrowed reference itself follows move semantics, can't 
you just require it to be swallowed by its origin as the "close" 
operation?


pseudocode:

File f = open();
(File f, FileRef r) = f.borrow();

dostuff(r);

(File f, FileRef r) = f.unborrow(r);

File f = f.close()
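For comparison, Rust's borrow checker enforces this pattern with an implicit "unborrow" at scope end; a minimal Rust sketch (mine, added only to illustrate the semantics being discussed, not part of the proposal):

```rust
fn main() {
    let mut f = String::from("file");
    {
        let r = &f;                  // borrow(): the owner is frozen (read-only)
        // f.push('!');              // ERROR: cannot mutate `f` while borrowed
        assert_eq!(r.as_str(), "file");
    }                                // scope end acts as the implicit unborrow()
    f.push('!');                     // the owner is mutable again
    assert_eq!(f, "file!");
}
```

The scope-based borrow end is what makes the explicit `unborrow(r)` unnecessary there.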








Re: Implementing typestate

2015-09-16 Thread Marc Schütz via Digitalmars-d

On Tuesday, 15 September 2015 at 21:44:25 UTC, Freddy wrote:

On Tuesday, 15 September 2015 at 17:45:45 UTC, Freddy wrote:

 Rust style memory management in a library


Wait nevermind about that part, it's harder than I thought.


Yeah, I thought about type-states as a way of implementing 
borrowing, too. I think the biggest difficulty is that the state 
of one object (the owner) can be affected by what happens in 
other objects (i.e., it becomes mutable again when those are 
destroyed).


Re: GC performance: collection frequency

2015-09-16 Thread H. S. Teoh via Digitalmars-d
On Tue, Sep 15, 2015 at 07:08:01AM +0200, Daniel Kozák via Digitalmars-d wrote:
> 
> http://dlang.org/changelog/2.067.0.html#gc-options
[...]

Wow that is obscure.  This really needs to go into the main docs so
that it can actually be found...


T

-- 
People demand freedom of speech to make up for the freedom of thought which 
they avoid. -- Soren Aabye Kierkegaard (1813-1855)


Re: Implementing typestate

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d

On Wednesday, 16 September 2015 at 16:24:49 UTC, Idan Arye wrote:
No need for `reinterpret_cast`. The `close` function is 
declared in the same module as the `File` struct, so it has 
access to its private d'tor.


True, so it might work for D. Interesting.



Re: Implementing typestate

2015-09-16 Thread Idan Arye via Digitalmars-d
On Wednesday, 16 September 2015 at 15:57:14 UTC, Ola Fosheim 
Grøstad wrote:
On Wednesday, 16 September 2015 at 15:34:40 UTC, Idan Arye 
wrote:
Move semantics should be enough. We can declare the destructor 
private, and then any code outside the module that implicitly 
calls the d'tor when the variable goes out of scope will raise 
a compilation error. In order to "get rid" of the variable, 
you'll have to pass ownership to the `close` function, so your 
code won't try to implicitly call the d'tor.


Sounds plausible, but does this work in C++ and D? I assume you 
mean that you "reinterpret_cast" to a different type in the 
close() function, which is cheating, but ok :).


No need for `reinterpret_cast`. The `close` function is declared 
in the same module as the `File` struct, so it has access to its 
private d'tor.


Re: dpaste web site

2015-09-16 Thread John Colvin via Digitalmars-d

On Wednesday, 16 September 2015 at 16:12:03 UTC, Kagamin wrote:
On Wednesday, 16 September 2015 at 13:54:36 UTC, Andrea Fontana 
wrote:
I mean: check the frequencies of common D keywords/combos 
like "class", "struct", "int", "float", "if(", "while(", "(int 
", "(float ", etc. that are not common in the plain English 
used by spammers...


Solving a dcaptcha costs maybe $1, so it should solve the problem 
of human spammers (too expensive).


I dunno, I reckon I could solve them in ~5 seconds each, 
especially with practice... At $1/solve it'd be one hell of an 
hourly rate!


Re: dpaste web site

2015-09-16 Thread Kagamin via Digitalmars-d
On Wednesday, 16 September 2015 at 13:54:36 UTC, Andrea Fontana 
wrote:
I mean: check the frequencies of common D keywords/combos 
like "class", "struct", "int", "float", "if(", "while(", "(int 
", "(float ", etc. that are not common in the plain English 
used by spammers...


Solving a dcaptcha costs maybe $1, so it should solve the problem 
of human spammers (too expensive).


Re: Implementing typestate

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d

On Wednesday, 16 September 2015 at 15:34:40 UTC, Idan Arye wrote:
Move semantics should be enough. We can declare the destructor 
private, and then any code outside the module that implicitly 
calls the d'tor when the variable goes out of scope will raise 
a compilation error. In order to "get rid" of the variable, 
you'll have to pass ownership to the `close` function, so your 
code won't try to implicitly call the d'tor.


Sounds plausible, but does this work in C++ and D? I assume you 
mean that you "reinterpret_cast" to a different type in the 
close() function, which is cheating, but ok :).




Re: Implementing typestate

2015-09-16 Thread Idan Arye via Digitalmars-d
On Wednesday, 16 September 2015 at 14:34:05 UTC, Ola Fosheim 
Grøstad wrote:
On Wednesday, 16 September 2015 at 10:31:58 UTC, Idan Arye 
wrote:
What's wrong with two `open()`s in a row? Each will return a 
new file handle.


Yes, but if you do it by mistake then you don't get the 
compiler to check that you call close() on both. I should have 
written "what if you forget close()". Will the compiler then 
complain at compile time?


You can't make that happen with just move semantics; you need 
linear typing, so that every resource created is consumed 
exactly once.


Move semantics should be enough. We can declare the destructor 
private, and then any code outside the module that implicitly 
calls the d'tor when the variable goes out of scope will raise a 
compilation error. In order to "get rid" of the variable, you'll 
have to pass ownership to the `close` function, so your code 
won't try to implicitly call the d'tor.
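As a rough illustration of the same idea in a language with native move semantics, here is a hypothetical Rust sketch (names and signatures invented for the example): `close` takes the handle by value, so the compiler rejects any use after the call.

```rust
// Typestate via move semantics: `close` consumes the open handle,
// so using it afterwards is a compile error, mirroring the
// "pass ownership to the close function" idea above.
struct OpenFile { name: String }
struct ClosedFile { name: String }

fn open(name: &str) -> OpenFile {
    OpenFile { name: name.to_string() }
}

impl OpenFile {
    fn read(&self) -> String {
        format!("contents of {}", self.name)
    }
    // Takes `self` by value: ownership moves into `close`, and the
    // caller's binding becomes unusable from here on.
    fn close(self) -> ClosedFile {
        ClosedFile { name: self.name }
    }
}

fn main() {
    let f = open("log.txt");
    let _data = f.read();
    let _closed = f.close();
    // f.read(); // ERROR: use of moved value `f`
}
```

The private-destructor trick discussed above aims at the remaining gap: in Rust you can still let the value silently drop, whereas a private d'tor would force the explicit `close` call.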


Re: dmd codegen improvements

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d
On Wednesday, 16 September 2015 at 14:40:26 UTC, Bruno Medeiros 
wrote:
Me and other people from the D community: "ok... now we have a 
new half-baked functionality in D, adding complexity for little 
value, and put here only to please people who are extremely 
unlikely to ever use D in any case"...


D is fun for prototyping ideas, so yes half-baked and not stable, 
but still useful.


I'm waiting for Rust to head down the same lane of adding 
features and obfuscating the syntax (and their starting point is 
even more complex than D's was)...




Re: dmd codegen improvements

2015-09-16 Thread Bruno Medeiros via Digitalmars-d

On 02/09/2015 19:58, Walter Bright wrote:

On 8/29/2015 12:37 PM, Laeeth Isharc wrote:

In my experience you can deliver
everything people say they want, and then find it isn't that at all.


That's so true. My favorite anecdote on that was back in the 1990's. A
friend of mine said that what he and the world really needs was a Java
native compiler. It'd be worth a fortune!

I told him that I had that idea a while back, and had implemented one
for Symantec. I could get him a copy that day.

He changed the subject.

I have many, many similar stories.

I also have many complementary stories - implementing things that people
laugh at me for doing, that turn out to be crucial. We can start with
the laundry list of D features that C++ is rushing to adopt :-)



Yes, and this I think is demonstrative of a very important 
consideration: if someone says they want X (and they are not paying 
upfront for it), then it is crucial for *you* to be able to figure out 
if that person or group actually wants X or not.


If someone spends time building a product or feature that turns out 
people don't want... the failure is on that someone.



And on this aspect I think the development of D does very poorly. Often 
people clamored for a feature or change (whether people in the D 
community, or the C++ one), and Walter you went ahead and did it, 
regardless of whether it will actually increase D usage in the long run. 
You are prone to this, given your nature to please people who ask for 
things, or to prove people wrong (as you yourself admitted).


I apologize for not remembering any examples at the moment, but I 
know there were quite a few, especially many years back. It 
usually went like this:


C++ community guy: "D is crap, it's not gonna be used without X"
*some time later*
Walter: "Ok, I've now implemented X in D!"
the same C++ community guy: either finds another feature or change to 
complain about (repeat), or goes silent, or goes "meh, D is still not good"
Me and other people from the D community: "ok... now we have a new 
half-baked functionality in D, adding complexity for little value, 
and put here only to please people who are extremely unlikely to 
ever use D in any case"...



--
Bruno Medeiros
https://twitter.com/brunodomedeiros


Re: Implementing typestate

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d

On Wednesday, 16 September 2015 at 10:31:58 UTC, Idan Arye wrote:
What's wrong with two `open()`s in a row? Each will return a 
new file handle.


Yes, but if you do it by mistake then you don't get the compiler 
to check that you call close() on both. I should have written 
"what if you forget close()". Will the compiler then complain at 
compile time?


You can't make that happen with just move semantics; you need 
linear typing, so that every resource created is consumed exactly 
once.





Weird "circular initialization of isInputRange" error

2015-09-16 Thread Alex Parrill via Digitalmars-d
This piece of code (which I reduced with dustmite) gives me the 
following error when I try to compile it:


$ rdmd -main parser.d
parser.d(28): Error: circular initialization of isInputRange
parser.d(31): Error: template instance std.meta.staticMap!(handler, ArrayReader*) error instantiating
parser.d(36):        instantiated from here: unpacker!(RefRange!(immutable(ubyte)[]))
parser.d(40): Error: template instance std.range.primitives.isInputRange!(ArrayReader*) error instantiating
/usr/include/dmd/phobos/std/meta.d(546):        instantiated from here: F!(ArrayReader*)
parser.d(43):        instantiated from here: staticMap!(toTD, ArrayReader*)

Failed: ["dmd", "-main", "-v", "-o-", "parser.d", "-I."]


I'm not really sure what's causing the error; I'm not declaring 
`isInputRange` in my code. Commenting out the definition of `TD` 
(the very last line) removes the error. Am I doing something 
wrong here, or is this a compiler bug?


Tested with dmd v2.068.1 on Linux x64

Code:
-

import std.range;
import std.variant;
import std.typetuple;

///
template unpacker(Range)
{
/// Element data types. See `unpack` for usage.
alias MsgPackData = Algebraic!(
ArrayReader*,
);


/// Reader range for arrays.
struct ArrayReader {
MsgPackData _front;
void update() {
_front.drain;
}

void popFront() {
update;
}
}

void drain(MsgPackData d) {
static handler(T)(T t) {
static if(isInputRange!T)
data;
}
d.visit!(staticMap!(handler, MsgPackData.AllowedTypes));
}
}


alias TestUnpacker = unpacker!(RefRange!(immutable(ubyte)[]));
alias D = TestUnpacker.MsgPackData;

template toTD(T) {
static if(isInputRange!T)
alias toTD = This;
}
alias TD = Algebraic!(staticMap!(toTD, D.AllowedTypes)); // test data type




Re: dmd codegen improvements

2015-09-16 Thread Bruno Medeiros via Digitalmars-d

On 28/08/2015 22:59, Walter Bright wrote:

People told me I couldn't write a C compiler, then told me I couldn't
write a C++ compiler. I'm still the only person who has ever implemented
a complete C++ compiler (C++98). Then they all (100%) laughed at me for
starting D, saying nobody would ever use it.

My whole career is built on stepping over people who told me I couldn't
do anything and wouldn't amount to anything.


So your whole career is fundamentally based not on bringing value to the 
software world, but rather merely proving people wrong? That amounts to 
living your professional life in thrall of other people's validation, 
and it's not commendable at all. It's a waste of your potential.


It is only worthwhile to prove people wrong when it brings you a 
considerable amount of either monetary resources or clout - and more so 
than you would have got doing something else with your time.


It's not clear to me that was always the case throughout your career... 
was it?


--
Bruno Medeiros
https://twitter.com/brunodomedeiros


Re: Implement the "unum" representation in D ?

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d

On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote:
The energy comparison is bullshit. As long as you haven't 
loaded the data, you don't know how wide they are. Meaning you 
either need to go pessimistic and load for the worst-case 
scenario, or do two round trips to memory.


That really depends on memory layout and algorithm. A likely 
implementation would be a co-processor that would take a unum 
stream and then pipe it through a network of cores (tile based 
co-processor). The internal busses between cores are very very 
fast and with 256+ cores you get tremendous throughput. But you 
need a good compiler/libraries and software support.


The hardware is likely to be slower as you'll need way more 
wiring than for regular floats, and wire is not only cost, but 
also time.


You need more transistors per ALU, but slower does not matter if 
the algorithm needs bounded accuracy or if it converges more 
quickly with unums. The key challenge for him is to create a 
market, meaning getting the semantics into scientific software 
and getting initial workable implementations out to scientists.


If there is a market demand, then there will be products. But you 
need to create the market first. Hence he wrote an easy-to-read 
book on the topic and supports people who want to implement it.
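As a toy illustration of "bounded accuracy beats raw speed" (my own example, not from Gustafson's book): carry a rigorous enclosure of the answer, as unum/ubox methods do, and stop once the bound is as tight as the application needs.

```rust
fn main() {
    // Maintain a provable enclosure of sqrt(2): the true value is
    // always inside [lo, hi], so accuracy is bounded by construction.
    let (mut lo, mut hi) = (1.0_f64, 2.0_f64);
    while hi - lo > 1e-9 {
        let mid = (lo + hi) / 2.0;
        // mid*mid < 2 means mid is below sqrt(2), so tighten from below.
        if mid * mid < 2.0 { lo = mid; } else { hi = mid; }
    }
    let s = 2.0_f64.sqrt();
    assert!(lo <= s && s <= hi);
    println!("sqrt(2) is in [{}, {}]", lo, hi);
}
```

Each step costs more than a single rounded multiply, but the result comes with a guarantee instead of an unknown rounding error, which is the trade the paragraph above describes.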




Re: dpaste web site

2015-09-16 Thread Andrea Fontana via Digitalmars-d

On Wednesday, 16 September 2015 at 13:46:07 UTC, nazriel wrote:
On Wednesday, 16 September 2015 at 06:52:57 UTC, Ola Fosheim 
Grøstad wrote:

How about just using a single click recaptcha:

https://www.google.com/recaptcha/intro/index.html


Used that before - I was still getting spam.

As Vladimir mentioned - it costs $0.001 to get a captcha solved :)


Why don't you try to check some stats over posts?

I mean: check the frequencies of common D keywords/combos like 
"class", "struct", "int", "float", "if(", "while(", "(int ", 
"(float ", etc. that are not common in the plain English used by 
spammers...
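A minimal sketch of that heuristic (the token list and threshold are made up purely for illustration):

```rust
// Score a paste by counting D-ish tokens that rarely appear in
// plain-English spam; flag it as likely code above a threshold.
fn d_score(text: &str) -> usize {
    const TOKENS: [&str; 6] = ["class", "struct", "int ", "float ", "if(", "while("];
    TOKENS.iter().map(|t| text.matches(t).count()).sum()
}

fn looks_like_d(text: &str) -> bool {
    d_score(text) >= 2 // threshold is a guess; tune on real pastes
}

fn main() {
    assert!(looks_like_d("struct S { int x; } void f(int y) { if(y) {} }"));
    assert!(!looks_like_d("Buy cheap watches now, best prices!!!"));
}
```

A real filter would likely combine this score with other signals (links, post rate), since spammers could trivially pad posts with keywords once the rule is known.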





Re: running code on the homepage

2015-09-16 Thread nazriel via Digitalmars-d
On Wednesday, 16 September 2015 at 10:17:21 UTC, Dmitry Olshansky 
wrote:

On 16-Sep-2015 09:44, nazriel wrote:
On Wednesday, 16 September 2015 at 05:54:03 UTC, Andrei 
Amatuni wrote:
maybe I'm doing something wrong...but the output of running 
the

default code snippet on the dlang.org homepage is:

"unable to fork: Cannot allocate memory"

not a good look


Thank you for letting us know,

This issue will be fixed very soon.

Best regards,
Damian Ziemba


May I suggest you to record such conditions with automatic 
notification e.g. by e-mail.


Only 1 in 10 of visitors will consider reporting an issue, of 
these only 1 in 10 will get to dlang forum to post a message.


It is a known issue to me.

At the time I was working on runnable examples, the samples on the 
main page were way simpler. Now we are hitting some limitations of 
the container Dpaste's backend is running in.


I am working on a new version of the backend (and a new container) 
as we speak, so it will be solved once and for all.


One or two more days and we will be done with it, so IMHO there is 
no need to take any additional steps right now.




Re: dpaste web site

2015-09-16 Thread nazriel via Digitalmars-d
On Wednesday, 16 September 2015 at 06:52:57 UTC, Ola Fosheim 
Grøstad wrote:

How about just using a single click recaptcha:

https://www.google.com/recaptcha/intro/index.html


Used that before - I was still getting spam.

As Vladimir mentioned - it costs $0.001 to get a captcha solved :)


Re: running code on the homepage

2015-09-16 Thread Andrei Amatuni via Digitalmars-d
On Wednesday, 16 September 2015 at 10:17:21 UTC, Dmitry Olshansky 
wrote:

On 16-Sep-2015 09:44, nazriel wrote:
On Wednesday, 16 September 2015 at 05:54:03 UTC, Andrei 
Amatuni wrote:
maybe I'm doing something wrong...but the output of running 
the

default code snippet on the dlang.org homepage is:

"unable to fork: Cannot allocate memory"

not a good look


Thank you for letting us know,

This issue will be fixed very soon.

Best regards,
Damian Ziemba


May I suggest you to record such conditions with automatic 
notification e.g. by e-mail.


Only 1 in 10 of visitors will consider reporting an issue, of 
these only 1 in 10 will get to dlang forum to post a message.


well now I feel special :)


Re: Type helpers instead of UFCS

2015-09-16 Thread Per Nordlöw via Digitalmars-d

On Saturday, 12 September 2015 at 20:37:37 UTC, BBasile wrote:

UFCS is good but there are two huge problems:
- code completion in IDE. It'will never work.


It is possible.

DCD plans to support it:

https://github.com/Hackerpilot/DCD/issues/13

I agree that this is a big issue, though, and is one of the most 
important things to work on.


Re: Implementing typestate

2015-09-16 Thread Idan Arye via Digitalmars-d
On Wednesday, 16 September 2015 at 06:25:59 UTC, Ola Fosheim 
Grostad wrote:
On Wednesday, 16 September 2015 at 05:51:50 UTC, Tobias Müller 
wrote:
Ola Fosheim Grøstad  
wrote:
On Tuesday, 15 September 2015 at 20:34:43 UTC, Tobias Müller 
wrote:

There's a Blog post somewhere but I can't find it atm.


Ok, found it: 
http://pcwalton.github.io/blog/2012/12/26/typestate-is-dead/


But that is for runtime detection, not compile time?


Not as far as I understand it.
The marker is a type, not a value. And it's used as template 
param.

But you need non-copyable move-only types for it to work.


Yes... But will it prevent you from doing two open() calls in a 
row at compile time?


What's wrong with two `open()`s in a row? Each will return a new 
file handle.


Re: running code on the homepage

2015-09-16 Thread Dmitry Olshansky via Digitalmars-d

On 16-Sep-2015 09:44, nazriel wrote:

On Wednesday, 16 September 2015 at 05:54:03 UTC, Andrei Amatuni wrote:

maybe I'm doing something wrong...but the output of running the
default code snippet on the dlang.org homepage is:

"unable to fork: Cannot allocate memory"

not a good look


Thank you for letting us know,

This issue will be fixed very soon.

Best regards,
Damian Ziemba


May I suggest you to record such conditions with automatic notification 
e.g. by e-mail.


Only 1 in 10 of visitors will consider reporting an issue, of these only 
1 in 10 will get to dlang forum to post a message.


--
Dmitry Olshansky


Re: running code on the homepage

2015-09-16 Thread ixid via Digitalmars-d

On Wednesday, 16 September 2015 at 06:44:30 UTC, nazriel wrote:
On Wednesday, 16 September 2015 at 05:54:03 UTC, Andrei Amatuni 
wrote:
maybe I'm doing something wrong...but the output of running 
the default code snippet on the dlang.org homepage is:


"unable to fork: Cannot allocate memory"

not a good look


Thank you for letting us know,

This issue will be fixed very soon.

Best regards,
Damian Ziemba


Would it be possible to set things up so that examples which fail 
are retired until they can be fixed? Non-working examples look 
awful for the language.


Re: Implement the "unum" representation in D ?

2015-09-16 Thread Ola Fosheim Grøstad via Digitalmars-d

On Wednesday, 16 September 2015 at 08:17:59 UTC, Don wrote:
I'm not convinced. I think they are downplaying the hardware 
difficulties. Slide 34:


I don't think he is downplaying it. He has said that it will 
probably take at least 10 years before it is available in 
hardware. There is also a company called Rex Computing that are 
looking at unum:


http://www.theplatform.net/2015/07/22/supercomputer-chip-startup-scores-funding-darpa-contract/

He assumes that you use a scratchpad (a big register file), not 
caching, for intermediate calculations.


His basic reasoning is that brute-force ubox methods make for 
highly parallel calculations. It might be possible to design ALUs 
that can work with various unum bit widths efficiently (many 
small or a few large)... who knows. You'll have to try first.


Let's not forget that there are a _lot_ of legacy constraints and 
architectural assumptions baked into the x86 architecture.


The energy comparisons are plain dishonest. The power required 
for accessing from DRAM is the energy consumption of a *cache 
miss* !! What's the energy consumption of a load from cache?


I think this argument is aiming at HPC where you can find funding 
for ASICs. They push a lot of data over the memory bus.




Re: Implement the "unum" representation in D ?

2015-09-16 Thread deadalnix via Digitalmars-d

On Saturday, 11 July 2015 at 18:16:22 UTC, Timon Gehr wrote:

On 07/11/2015 05:07 PM, Andrei Alexandrescu wrote:

On 7/10/15 11:02 PM, Nick B wrote:

John Gustafson book is now out:

It can be found here:

http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY



Very interesting, I'll read it. Thanks! -- Andrei



I think Walter should read chapter 5.


What is this chapter about ?


Re: Implement the "unum" representation in D ?

2015-09-16 Thread deadalnix via Digitalmars-d

On Wednesday, 16 September 2015 at 08:17:59 UTC, Don wrote:
On Tuesday, 15 September 2015 at 11:13:59 UTC, Ola Fosheim 
Grøstad wrote:

On Tuesday, 15 September 2015 at 10:38:23 UTC, ponce wrote:
On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim 
Grøstad wrote:

http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf


That's a pretty convincing case. Who does it :)?


I'm not convinced. I think they are downplaying the hardware 
difficulties. Slide 34:


Disadvantages of the Unum Format
* Non-power-of-two alignment. Needs packing and unpacking, 
garbage collection.


I think that disadvantage is so enormous that it negates most 
of the advantages. Note that in the x86 world, unaligned memory 
loads of SSE values still take longer than aligned loads. And 
that's a trivial case!


The energy savings are achieved by using a primitive form of 
compression. Sure, you can reduce the memory bandwidth required 
by compressing the data. You could do that for *any* form of 
data, not just floating point. But I don't think anyone thinks 
that's worthwhile.




GPUs do it a lot, especially (but not exclusively) on mobile. Not 
to reduce the misses (a miss is pretty much guaranteed: you run 
32 threads at once in a shader core, and each of them will require at 
least 8 pixels for a bilinear texture fetch with mipmapping, which is 
the bare minimum. That means 256 memory accesses at once. One of those 
pixels WILL miss, and it is going to stall all 32 threads). It is not 
a latency issue, but a bandwidth and energy one.


But yeah, in the general case random access is preferable; memory 
alignment and the fact that you don't need to do as much 
bookkeeping are very significant.


Also, a predictable size means you can split your dataset and 
process it in parallel, which is impossible if sizes are random.


The energy comparisons are plain dishonest. The power required 
for accessing from DRAM is the energy consumption of a *cache 
miss* !! What's the energy consumption of a load from cache? 
That would show you what the real gains are, and my guess is 
they are tiny.




The energy comparison is bullshit. As long as you haven't loaded the 
data, you don't know how wide it is, meaning you either have to be 
pessimistic and load for the worst-case scenario, or do two round 
trips to memory.


The author also leans heavily on the wire vs. transistor cost, and how 
it has evolved. He is right about that. Except that you won't cram 
more wire into the CPU at runtime: the CPU needs the wiring for the 
worst-case scenario, always.


The hardware is likely to be slower, as you'll need far more 
wiring than for regular floats, and wire is not only a cost in 
area but also in time.


That being said, even a hit in L1 is very energy hungry. Think 
about it: you need to do an 8-way fetch (so you'll end up 
loading 4k of data from the cache) in parallel with address 
translation (usually 16-way) and in parallel with snooping into the 
load and store buffers.


If the load is not aligned, you pretty much have to multiply this 
by 2 when it crosses a cache-line boundary.


I'm not sure what his numbers represent, but hitting L1 is quite 
power hungry. He is right on that one.



So:
* I don't believe the energy savings are real.
* There is no guarantee that it would be possible to implement 
it in hardware without a speed penalty, regardless of how many 
transistors you throw at it (hardware analogue of Amdahl's Law)

* but the error bound stuff is cool.


Yup, that's pretty much what I get out of it as well.



Re: Implement the "unum" representation in D ?

2015-09-16 Thread Don via Digitalmars-d
On Tuesday, 15 September 2015 at 11:13:59 UTC, Ola Fosheim 
Grøstad wrote:

On Tuesday, 15 September 2015 at 10:38:23 UTC, ponce wrote:
On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim 
Grøstad wrote:

http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf


That's a pretty convincing case. Who does it :)?


I'm not convinced. I think they are downplaying the hardware 
difficulties. Slide 34:


Disadvantages of the Unum Format
* Non-power-of-two alignment. Needs packing and unpacking, 
garbage collection.


I think that disadvantage is so enormous that it negates most of 
the advantages. Note that in the x86 world, unaligned memory 
loads of SSE values still take longer than aligned loads. And 
that's a trivial case!


The energy savings are achieved by using a primitive form of 
compression. Sure, you can reduce the memory bandwidth required 
by compressing the data. You could do that for *any* form of 
data, not just floating point. But I don't think anyone thinks 
that's worthwhile.


The energy comparisons are plain dishonest. The power required 
for accessing from DRAM is the energy consumption of a *cache 
miss* !! What's the energy consumption of a load from cache? That 
would show you what the real gains are, and my guess is they are 
tiny.


So:
* I don't believe the energy savings are real.
* There is no guarantee that it would be possible to implement it 
in hardware without a speed penalty, regardless of how many 
transistors you throw at it (hardware analogue of Amdahl's Law)

* but the error bound stuff is cool.