Re: Desperately looking for a work-around to load and unload D shared libraries from C on OSX
On Wednesday, 16 September 2015 at 23:24:29 UTC, bitwise wrote: I was trying to solve this one myself, but the modifications to DMD's backend that are needed are out of reach for me right now. If you're willing to build your own druntime, you may be able to get by. I'd prefer a solution that works with existing compilers, but maybe building a custom LDC is possible if I figure it out. If I understand correctly, you want to repeatedly load/unload the same shared library, correct? I ask because druntime for OSX only supports loading a single image at a time: https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L26 In practice, I've found that the D shared libraries I produce can be dlopen'd/dlclose'd any number of times, simultaneously too, using both LDC and DMD; I don't know why it works. The thing that doesn't work is the C host program dlopen'ing the shared library, dlclose'ing it, then dlopen'ing another shared library written in C. Anyways, when main() of a D program runs, it calls rt_init() and rt_term(). If you don't have a D entry point in your program, you have to retrieve these from your shared lib (which has druntime statically linked) using dlsym() and call them yourself. I don't control the host program. My shared libs do have an entry point, from which I call Runtime.initialize(). I can also use LDC global constructors/destructors to call Runtime.initialize / Runtime.terminate, but it doesn't work any better, because of the callback. https://github.com/D-Programming-Language/druntime/blob/478b6c5354470bc70e688c45821eea71b766e70d/src/rt/dmain2.d#L158 Now, initSections() and finiSections() are responsible for setting up the image. If you look at initSections(), the function "_dyld_register_func_for_add_image" is the one that causes the crash, because there is no way to remove the callback, which will reside in your shared lib. 
https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L76 So what happens is: when you call _dyld_register_func_for_add_image, dyld will call your callback for every shared-library/image (including the main application's image) that is currently loaded. However, you can skip the callback and just call "sections_osx_onAddImage" yourself. You would have to add something like this to sections_osx.d, and call it instead of adding the callback:

void callOnAddImage()
{
    // dladdr() should give you information about the
    // shared lib in which the symbol you pass resides.
    // Passing the address of this function should work.
    Dl_info info;
    int ret = dladdr(cast(void*)&callOnAddImage, &info);
    assert(ret);

    // "dli_fbase" is actually a pointer to
    // the mach_header for the shared library.
    // Once you have the mach_header, you can
    // also retrieve the image slide, and finally
    // call sections_osx_onAddImage().
    mach_header* header = cast(mach_header*)info.dli_fbase;
    intptr_t slide = _dyld_get_image_slide(header);
    sections_osx_onAddImage(header, slide);
}

Now, if you look at finiSections(), it seems to be incomplete. There is nothing like sections_osx_onRemoveImage, so you'll have to complete it to make sure the library is unloaded correctly: https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L83 You may have to make other mods here and there to get this working correctly, but this is the bulk of it. Bit

Thanks for your answer. This is really helpful, though I don't understand the first thing about what images, headers and sections are in this context. 
Re: Anyone interested in working on a D parser?
On Thursday, 17 September 2015 at 01:35:42 UTC, Leandro T. C. Melo wrote: An alternative would be an LL parser generator. I think ANTLR added a C++ target, but I don't know how mature it is. I used the C++ target of ANTLR about 13 years ago and it was fine, so I suppose it should be mature by now. ;)
Re: Anyone interested in working on a D parser?
On Thursday, 17 September 2015 at 01:38:02 UTC, Adam D. Ruppe wrote: Did you take a look at https://github.com/Hackerpilot/libdparse/tree/master already? Yes. libdparse and/or SDC's parser seem like good places to start.
Checked integer type API design poll
I have written some poll questions concerning the design trade-offs involved in making a good `SafeInt`/`CheckedInt` type. They are about the actual semantics of the API, not the internals, nor bike-shedding about names. (`SafeInt` and `CheckedInt` are integer data types which use `core.checkedint` to guard against overflow, divide-by-zero, etc. Links to current work-in-progress versions by 1) Robert Schadek (burner): https://github.com/D-Programming-Language/phobos/pull/3389 2) Myself (tsbockman): https://github.com/tsbockman/CheckedInt) For the purposes of this poll, please assume the following (based on my own extensive testing): 1) Code using checked operations will take about **1.5x longer to run** than unchecked code. (If you compile with GDC, anyway; DMD and LDC are another story...) 2) The main design decision with a significant runtime performance cost is whether to throw exceptions or not. With some optimization, the hit is modest but noticeable. 3) Even if the API uses exceptions in some places, it can still be used in `nothrow @nogc` code, at the cost of some extra typing. Two further points I would ask the reader to consider: * A checked integer type is fundamentally semantically different from an unchecked type. The difference is of similar magnitude to that of floating-point vs fixed-point. * It might be wise to read the entire poll before answering it - the questions are all related in some way. The poll results are here, if you wish to preview the questions: http://polljunkie.com/poll/kzrije/checked-integer-type-behaviour/view When you are ready, please take the poll yourself: http://polljunkie.com/poll/cytdbq/checked-integer-type-behaviour Thanks for your time.
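For readers who haven't used `core.checkedint`: its primitives (`adds`, `subs`, `muls`, ...) perform the operation and report overflow through a flag rather than wrapping silently. A rough C sketch of the same mechanism, using the GCC/Clang builtin - this illustrates the underlying idea only, not the proposed D API:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Overflow-checked addition, the kind of building block a CheckedInt
   type wraps around every operator. Returns true on success; on
   overflow it returns false instead of silently wrapping. */
bool checked_add(int a, int b, int *out)
{
    /* __builtin_add_overflow returns true when the result wrapped. */
    return !__builtin_add_overflow(a, b, out);
}
```

The design questions in the poll are about what the wrapper does with that flag: ignore it, saturate, throw, or expose it to the caller.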
Re: Anyone interested in working on a D parser?
Did you take a look at https://github.com/Hackerpilot/libdparse/tree/master already?
Anyone interested in working on a D parser?
Hi D enthusiasts, I'm developing a multi-language code modelling engine. The heart of the project is a language-unifying AST, a generic pipeline of binding, type checking, code completion, etc., and hooks that allow each language to plug in its specific behavior where needed. Also, the library is not tied to any particular IDE or text editor. One "issue" I have so far is the D parser. Mostly for convenience, I prototyped it with Bison. Despite it being tricky to get such LR parsers working in an interactive environment, it's still possible to error-recover at the right spots and provide a decent user experience - you can see some action in the videos below, one for D and another for Go [1]. However, in the case of D there's an additional challenge due to its grammar. Even though I'm using a GLR parser (so ambiguities are handled), it's still difficult to get everything in place. Would anyone be interested in working out this parser, or perhaps building a recursive descent one? The parser is supposed to be lightweight, not to perform symbol lookup (it can afford some impreciseness), and its result must be the special AST. Therefore, simply taking the official dmd2 parser is not a solution, although it could certainly serve as a reference. An alternative would be an LL parser generator. I think ANTLR added a C++ target, but I don't know how mature it is. There's also llgen, but I never tried it. I might experiment with one of them in Rust. This is a project I work on in my free time, but I'm trying to make it move. So if anyone is interested, please get in touch; I'd be glad to take contributions: https://github.com/ltcmelo/uaiso Leandro [1] https://www.youtube.com/watch?v=ZwMQ_GB-Zv0 and https://www.youtube.com/watch?v=nUpcVBAw0DM
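Since recursive descent came up as an option: the technique maps one function to each grammar rule, each function consuming tokens and calling the rules it references. A toy sketch in C over a tiny expression grammar (nothing from uaiso, no error recovery - purely an illustration of the shape):

```c
#include <ctype.h>

/* Toy recursive-descent parser/evaluator for:
   expr   -> term ('+' term)*
   factor -> number | '(' expr ')'
   term   -> factor ('*' factor)*
   One function per rule; `p` is the input cursor. */
static const char *p;

static int expr(void);

static int factor(void)
{
    if (*p == '(') {
        p++;                 /* consume '(' */
        int v = expr();
        p++;                 /* consume ')' */
        return v;
    }
    int v = 0;
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

static int term(void)
{
    int v = factor();
    while (*p == '*') { p++; v *= factor(); }
    return v;
}

static int expr(void)
{
    int v = term();
    while (*p == '+') { p++; v += term(); }
    return v;
}

int parse(const char *s) { p = s; return expr(); }
```

Error recovery - the part that matters most for interactive IDE use - is typically layered on top by having each rule skip ahead to a synchronizing token when it sees something unexpected.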
Re: Implement the "unum" representation in D ?
On Wed, Sep 16, 2015 at 08:06:42PM +, deadalnix via Digitalmars-d wrote: [...]
> When you have a floating point unit, you get your 32 bits: 23 bits go into the mantissa FU and 8 into the exponent FU. For instance, if you multiply floats, you send the two exponents into an adder, you send the two mantissas into a 24-bit multiplier (you add a leading 1), and you xor the sign bits.
> You get the carry from the adder and emit a multiply, or you count the leading 0s of the 48-bit multiply result, shift by that amount, and add the shift to the exponent.
> If you get a carry in the exponent adder, you saturate and emit an infinity.
> Each bit goes into a given functional unit. That means you need one wire from the input to the functional unit it goes to. Same for the result bits.
> Now, if the format is variadic, you need to wire all bits to all functional units, because they can potentially end up there. That's a lot of wire; in fact the number of wires grows quadratically with that joke.
> The author keeps repeating that wires became the expensive thing, and he is right. Meaning a solution with quadratic wiring is not going to cut it.

I found this .pdf that explains the unum representation a bit more: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf On p.31, you can see the binary representation of unum. The utag has 3 bits for exponent size, presumably meaning the exponent can vary in size up to 7 bits. There are 5 bits in the utag for the mantissa, so it can be anywhere from 0 to 31 bits. It's not completely variadic, but it's complex enough that you will probably need some kind of shift register to extract the exponent and mantissa so that you can pass them in the right format to the various parts of the hardware. It definitely won't be as straightforward as the current floating-point format; you can't just wire the bits directly to the adders and multipliers. 
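To make that concrete, here is a hypothetical decoder for the utag as read above - 5 bits of fraction size, 3 bits of exponent size, plus a "ubit" marking inexactness. The field order and the ubit position are assumptions for illustration, not the normative unum layout:

```c
/* Hypothetical utag decoder, following the post's reading of the
   slides. Field order and the 9-bit total tag width are assumptions
   for illustration only. */
typedef struct { unsigned es, fs, ubit; } UTag;

UTag decode_utag(unsigned bits)
{
    UTag t;
    t.fs   = bits & 0x1F;        /* fraction size: 0..31 bits */
    t.es   = (bits >> 5) & 0x7;  /* exponent size: 0..7 bits  */
    t.ubit = (bits >> 8) & 1;    /* exact/inexact marker      */
    return t;
}

/* Total stored width: sign + exponent + fraction + the tag itself.
   The point of the exercise: the width is data-dependent, which is
   exactly why fixed wiring no longer suffices. */
unsigned unum_width(unsigned tag_bits)
{
    UTag t = decode_utag(tag_bits);
    return 1 + t.es + t.fs + 9;
}
```

Even in software this needs shifts and masks driven by the data itself; the hardware equivalent of that data-dependent extraction is the shift register mentioned above.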
This is probably what the author meant by needing "more transistors". I guess his point was that we have to do more work in the CPU, but in return we (hopefully) reduce the traffic to DRAM, thereby saving the cost of data transfer. I'm not so sure how well this will work in practice, though, unless we have a working prototype that proves the benefits. What if you have a 10*10 unum matrix, and during some operation the size of the unums in the matrix changes? Assuming the worst case, you could have started out with 10*10 unums with a small exponent/mantissa, maybe fitting in 2-3 cache lines, but after the operation most of the entries expand to a 7-bit exponent and 31-bit mantissa, so now your matrix doesn't fit into the allocated memory anymore. Does your hardware now have to talk to druntime to have it allocate new memory for storing the resulting unum matrix? The only sensible solution seems to be to allocate the maximum size for each matrix entry, so that if the value changes you won't run out of space. But that means we have lost the benefit of having a variadic encoding to begin with -- you will have to transfer the maximum size's worth of data when you load the matrix from DRAM, even if most of that data is unused (because the unum only takes up a small percentage of the space). The author proposed GC, but I have a hard time imagining a GC implemented in the *CPU*, no less, colliding with the rest of the world where it's the *software* that controls DRAM allocation. (GC too slow for your application? Too bad, gotta upgrade your CPU...) The way I see it from reading the PDF slides, what the author is proposing would work well as a *software* library, perhaps backed by hardware support for some of the lower-level primitives. I'm a bit skeptical of the claims of data traffic / power savings unless there is hard data to prove that it works. T -- "The number you have dialed is imaginary. Please rotate your phone 90 degrees and try again."
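The worst case in the matrix example can be put in numbers. Assuming the field sizes as read from the slides, plus a ~9-bit utag and 64-byte cache lines (both assumptions for illustration):

```c
/* Rough storage arithmetic for the 10x10 matrix example. The 9-bit
   utag and 64-byte cache line are illustrative assumptions. */
int worst_case_cache_lines(void)
{
    int entries    = 10 * 10;
    int worst_bits = 1 + 7 + 31 + 9;                /* sign+exp+frac+utag */
    int bytes      = (entries * worst_bits + 7) / 8; /* 600 bytes          */
    return (bytes + 63) / 64;                        /* cache lines needed */
}
```

So an "allocate the maximum" strategy reserves on the order of 10 cache lines for a matrix that may usually fit in 2-3, which is the traffic the variable encoding was supposed to save.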
Re: Desperately looking for a work-around to load and unload D shared libraries from C on OSX
On Wednesday, 16 September 2015 at 22:29:46 UTC, ponce wrote: Context: On OSX, a C program can load a D shared library, but once it is unloaded the next dlopen will crash, jumping into a callback that doesn't exist anymore. I've filed it here: https://issues.dlang.org/show_bug.cgi?id=15060 It looks like this was known and discussed several times already: http://forum.dlang.org/post/vixoqmidlbizawbxm...@forum.dlang.org (2015) https://github.com/D-Programming-Language/druntime/pull/228 (2012) Any idea to work around this problem would be awesome. I'm not looking for something correct, future-proof, or pretty. Any shit that still sticks to the wall will do. Anything! The only case I need to support is: C host, D shared library, with the runtime statically linked. Please help! I was trying to solve this one myself, but the modifications to DMD's backend that are needed are out of reach for me right now. If you're willing to build your own druntime, you may be able to get by. If I understand correctly, you want to repeatedly load/unload the same shared library, correct? I ask because druntime for OSX only supports loading a single image at a time: https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L26 Anyways, when main() of a D program runs, it calls rt_init() and rt_term(). If you don't have a D entry point in your program, you have to retrieve these from your shared lib (which has druntime statically linked) using dlsym() and call them yourself. https://github.com/D-Programming-Language/druntime/blob/478b6c5354470bc70e688c45821eea71b766e70d/src/rt/dmain2.d#L158 Now, initSections() and finiSections() are responsible for setting up the image. If you look at initSections(), the function "_dyld_register_func_for_add_image" is the one that causes the crash, because there is no way to remove the callback, which will reside in your shared lib. 
https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L76 So what happens is: when you call _dyld_register_func_for_add_image, dyld will call your callback for every shared-library/image (including the main application's image) that is currently loaded. However, you can skip the callback and just call "sections_osx_onAddImage" yourself. You would have to add something like this to sections_osx.d, and call it instead of adding the callback:

void callOnAddImage()
{
    // dladdr() should give you information about the
    // shared lib in which the symbol you pass resides.
    // Passing the address of this function should work.
    Dl_info info;
    int ret = dladdr(cast(void*)&callOnAddImage, &info);
    assert(ret);

    // "dli_fbase" is actually a pointer to
    // the mach_header for the shared library.
    // Once you have the mach_header, you can
    // also retrieve the image slide, and finally
    // call sections_osx_onAddImage().
    mach_header* header = cast(mach_header*)info.dli_fbase;
    intptr_t slide = _dyld_get_image_slide(header);
    sections_osx_onAddImage(header, slide);
}

Now, if you look at finiSections(), it seems to be incomplete. There is nothing like sections_osx_onRemoveImage, so you'll have to complete it to make sure the library is unloaded correctly: https://github.com/D-Programming-Language/druntime/blob/1e25749cd01ad08dc08319a3853fbe86356c3e62/src/rt/sections_osx.d#L83 You may have to make other mods here and there to get this working correctly, but this is the bulk of it. Bit
Desperately looking for a work-around to load and unload D shared libraries from C on OSX
Context: On OSX, a C program can load a D shared library, but once it is unloaded the next dlopen will crash, jumping into a callback that doesn't exist anymore. I've filed it here: https://issues.dlang.org/show_bug.cgi?id=15060 It looks like this was known and discussed several times already: http://forum.dlang.org/post/vixoqmidlbizawbxm...@forum.dlang.org (2015) https://github.com/D-Programming-Language/druntime/pull/228 (2012) Any idea to work around this problem would be awesome. I'm not looking for something correct, future-proof, or pretty. Any shit that still sticks to the wall will do. Anything! The only case I need to support is: C host, D shared library, with the runtime statically linked. Please help!
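For concreteness, the host side of the setup looks roughly like the C sketch below: dlopen the D library, then start and stop druntime through its C-linkage entry points rt_init()/rt_term() via dlsym(). This is a sketch of the shape of the scenario, not a fix - the crash described above happens in dyld after the dlclose, regardless of how cleanly the host does this:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Load a D shared library (with druntime statically linked) from a C
   host and start its runtime. Returns the handle, or NULL on failure.
   rt_init/rt_term are druntime's extern(C) entry points. */
void *load_d_library(const char *path)
{
    void *lib = dlopen(path, RTLD_LAZY);
    if (!lib)
        return NULL;
    int (*rt_init)(void) = (int (*)(void))dlsym(lib, "rt_init");
    if (!rt_init || !rt_init()) {
        dlclose(lib);
        return NULL;
    }
    return lib;
}

void unload_d_library(void *lib)
{
    int (*rt_term)(void) = (int (*)(void))dlsym(lib, "rt_term");
    if (rt_term)
        rt_term();
    /* On OSX, the dyld add-image callback registered by druntime
       survives this dlclose -- that is the crash described above. */
    dlclose(lib);
}
```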
Re: running code on the homepage
On Wednesday, 16 September 2015 at 09:52:23 UTC, ixid wrote: On Wednesday, 16 September 2015 at 06:44:30 UTC, nazriel wrote: On Wednesday, 16 September 2015 at 05:54:03 UTC, Andrei Amatuni wrote: maybe I'm doing something wrong...but the output of running the default code snippet on the dlang.org homepage is: "unable to fork: Cannot allocate memory" not a good look Thank you for letting us know, This issue will be fixed very soon. Best regards, Damian Ziemba Would it be possible to set things up so ones that fail are retired until they can be fixed? Non-working examples look awful for the language. https://github.com/D-Programming-Language/dlang.org/pull/1098 This removes unfixable examples. I think Damian is working on getting the one fixable-but-broken example (rounding floating-point numbers) to work.
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 21:12:11 UTC, Ola Fosheim Grøstad wrote: On Wednesday, 16 September 2015 at 20:53:37 UTC, deadalnix wrote: On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim Grøstad wrote: On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix wrote: You know, when you have no idea what you are talking about, you can just move on to something you understand. Ah, nice move. Back to your usual habits? Stop OK. I stop. You are beyond reason. True, how blind I was. It is fairly obvious now, thinking about it, that you can get a 3-orders-of-magnitude increase in sequential decoding in hardware by having a compiler with a vectorized SSA and a scratchpad! Or maybe you have numbers to present us that show I'm wrong?
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 20:53:37 UTC, deadalnix wrote: On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim Grøstad wrote: On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix wrote: You know, when you have no idea what you are talking about, you can just move on to something you understand. Ah, nice move. Back to your usual habits? Stop OK. I stop. You are beyond reason.
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 20:35:16 UTC, Wyatt wrote: On Wednesday, 16 September 2015 at 08:53:24 UTC, Ola Fosheim Grøstad wrote: I don't think he is downplaying it. He has said that it will probably take at least 10 years before it is available in hardware. There is also a company called Rex Computing that are looking at unum: Oh hey, I remember these jokers. They were trying to blow some smoke about moving 288 GB/s at 4W. They're looking at unum? Of course they are; care to guess who's advising them? Yep. I'll be shocked if they ever even get to tape out. Yes, of course, most startups in hardware don't succeed. I assume they get knowhow from Adapteva.
Re: running code on the homepage
On 09/16/2015 09:49 AM, nazriel wrote: 1-2 days more and we will be done with it so IMHO no need take any additionals steps for it right now. That's great, thanks for doing this. What is the current status with regard to making the online compilation infrastructure publicly accessible and improvable? Ideally everything would be in the open, and we (= the fledgling D Language Foundation) would pay for the server infrastructure. Please advise, thanks. -- Andrei
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim Grøstad wrote: On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix wrote: You know, when you have no idea what you are talking about, you can just move on to something you understand. Ah, nice move. Back to your usual habits? Stop Prefetching would not change anything here. The problem comes from variable-size encoding and the challenge it causes for hardware. You can have a 100% L1 hit rate and still have the same problem. There is _no_ cache. The compiler fully controls the layout of the scratchpad. You are the king of goalpost shifting. You answer about x86 decoding, you get served. You want to talk about a scratchpad? Good! How does the data end up in the scratchpad to begin with? Using magic? What is the scratchpad made of, if not flip-flops? And if so, how is it different from a cache as far as the hardware is concerned? You can play with words, but the problem remains the same. When you have on-chip memory, be it cache or scratchpad, and a variadic encoding, you can't even feed a handful of ALUs. How do you expect to feed 256+ VLIW cores? There are 3 orders of magnitude of gap in your reasoning. You can't pull 3 orders of magnitude out of your ass and just pretend it can be done. That's hardware 101. Is it? Yes, wire is hardware 101. I mean, seriously, if one does not get how components can be wired together, one should probably abstain from making any hardware comments. You cannot predict at this point what the future will be like. Is it unlikely that anything specific will change the status quo? Yes. Is it highly probable that something will change the status quo? Yes. Will it happen overnight? No. 50+ years have been invested in floating point design. Will this be offset overnight? No. It'll probably take 10+ years before anyone has a different type of numerical ALU on their desktop than IEEE754. By that time we are in a new era. OK, listen, that is not complicated. 
I don't know what car will come out next year. But I know there won't be a car that can go 1 km on 10 centiliters of gasoline. That would be physics-defying stuff. Same thing: you won't be able to feed 256+ cores if you load data sequentially. Don't give me this stupid "we don't know what's going to happen tomorrow" bullshit. We won't have unicorn meat in supermarkets. We won't have free energy. We won't have interstellar travel. And we won't have the capability to feed 256+ cores sequentially. I gave you numbers; you gave me bullshit.
Re: dmd codegen improvements
On 9/16/2015 7:16 AM, Bruno Medeiros wrote: On 28/08/2015 22:59, Walter Bright wrote: People told me I couldn't write a C compiler, then told me I couldn't write a C++ compiler. I'm still the only person who has ever implemented a complete C++ compiler (C++98). Then they all (100%) laughed at me for starting D, saying nobody would ever use it. My whole career is built on stepping over people who told me I couldn't do anything and wouldn't amount to anything. So your whole career is fundamentally based not on bringing value to the software world, but rather merely on proving people wrong? That amounts to living your professional life in thrall to other people's validation, and it's not commendable at all. It's a waste of your potential. It is only worthwhile to prove people wrong when it brings you a considerable amount of either monetary resources or clout - and more so than you would have got doing something else with your time. It's not clear to me that was always the case throughout your career... was it? Wow, such an interpretation never occurred to me. I will reiterate that I worked on things that I believed had value and nobody else did. I.e., I did not need validation from others.
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 08:53:24 UTC, Ola Fosheim Grøstad wrote: I don't think he is downplaying it. He has said that it will probably take at least 10 years before it is available in hardware. There is also a company called Rex Computing that are looking at unum: Oh hey, I remember these jokers. They were trying to blow some smoke about moving 288 GB/s at 4W. They're looking at unum? Of course they are; care to guess who's advising them? Yep. I'll be shocked if they ever even get to tape out. -Wyatt
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix wrote: You know, when you have no idea what you are talking about, you can just move on to something you understand. Ah, nice move. Back to your usual habits? Prefetching would not change anything here. The problem comes from variable-size encoding, and the challenge it causes for hardware. You can have 100% L1 hit and still have the same problem. There is _no_ cache. The compiler fully controls the layout of the scratchpad. That's hardware 101. Is it? The core point is this: 1. if there is academic interest (i.e. publishing opportunities) you get research 2. if there is research you get new algorithms 3. you get funding, etc. You cannot predict at this point what the future will be like. Is it unlikely that anything specific will change the status quo? Yes. Is it highly probable that something will change the status quo? Yes. Will it happen overnight? No. 50+ years have been invested in floating point design. Will this be offset overnight? No. It'll probably take 10+ years before anyone has a different type of numerical ALU on their desktop than IEEE754. By that time we are in a new era.
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 19:40:49 UTC, Ola Fosheim Grøstad wrote: You can load continuously 64 bytes in a stream, decode to your internal format and push them into the scratchpad of other cores. You could even do this in hardware. 1/ If you load for the worst-case scenario, then your power advantage is gone. 2/ If you load these one by one, how do you expect to feed 256+ cores? Obviously you can make this in hardware. And obviously this is not going to be able to feed 256+ cores. Even with a chip at low frequency, let's say 800MHz or so, you have about 80 cycles to access memory. That means you need to have 20 000+ cycles of work to do per core per unum. That's a simple back-of-the-envelope calculation. Your proposal is simply ludicrous. It's a complete non-starter. You can make this in hardware. Sure you can, no problem. But you won't, because it is a stupid idea. To give you a similar example, x86 decoding is often the bottleneck on an x86 CPU. The number of ALUs in x86 over the past decade decreased rather than increased, because you simply can't decode fast enough to feed them. Yet, x86 CPUs have 64-way speculative decoding as a first stage. That's because we use a dumb compiler that does not prefetch intelligently. You know, when you have no idea what you are talking about, you can just move on to something you understand. Prefetching would not change anything here. The problem comes from variable-size encoding, and the challenge it causes for hardware. You can have 100% L1 hit and still have the same problem. No sufficiently smart compiler can fix that. If you are writing for a tile-based VLIW CPU you preload. These calculations are highly iterative, so I'd rather think of it as a co-processor solving a single equation repeatedly than running the whole program. You can run the larger program on a regular CPU or a few cores. That's irrelevant. The problem is not the kind of CPU, it is how you feed it at a fast enough rate. 
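The "20 000+" figure follows directly from the numbers given: with 256 cores sharing one sequential load stream at ~80 cycles per access, a fresh operand reaches each core only every 256 × 80 cycles, so each core needs roughly that many cycles of work per operand to stay busy:

```c
/* Back-of-the-envelope from the post: 256 cores fed one unum at a
   time, ~80 cycles per memory access at ~800MHz. Each core sees a
   fresh operand only every cores * cycles_per_access cycles. */
int cycles_of_work_needed(void)
{
    int cores = 256;
    int cycles_per_access = 80;
    return cores * cycles_per_access;  /* the "20 000+" figure */
}
```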
The problem is not transistors, it is wires. Because the damn thing is variadic in every way, pretty much every input bit can end up anywhere in the functional unit. That is a LOT of wire. I haven't seen a design, so I cannot comment. But keep in mind that the CPU does not have to work with the format; it can use a different format internally. We'll probably see FPGA implementations that can be run on FPGA cards for PCs within a few years. I read somewhere that a group in Singapore was working on it. That's hardware 101. When you have a floating point unit, you get your 32 bits: 23 bits go into the mantissa FU and 8 into the exponent FU. For instance, if you multiply floats, you send the two exponents into an adder, you send the two mantissas into a 24-bit multiplier (you add a leading 1), and you xor the sign bits. You get the carry from the adder and emit a multiply, or you count the leading 0s of the 48-bit multiply result, shift by that amount, and add the shift to the exponent. If you get a carry in the exponent adder, you saturate and emit an infinity. Each bit goes into a given functional unit. That means you need one wire from the input to the functional unit it goes to. Same for the result bits. Now, if the format is variadic, you need to wire all bits to all functional units, because they can potentially end up there. That's a lot of wire; in fact the number of wires grows quadratically with that joke. The author keeps repeating that wires became the expensive thing, and he is right. Meaning a solution with quadratic wiring is not going to cut it.
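The fixed datapath described above can be replayed in software. The sketch below follows the recipe literally - xor the signs, add the exponents, multiply the 24-bit mantissas (leading 1 restored), renormalize - but truncates instead of rounding and ignores subnormals, infinities, and NaN, so it is a teaching aid, not an FPU:

```c
#include <stdint.h>
#include <string.h>

/* Split a float into the fields described above:
   1 sign bit, 8 exponent bits, 23 mantissa bits. */
static void split(float f, uint32_t *sign, int32_t *exp, uint32_t *mant)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    *sign = bits >> 31;
    *exp  = (int32_t)((bits >> 23) & 0xFF) - 127;  /* remove the bias */
    *mant = (bits & 0x7FFFFF) | 0x800000;          /* add the leading 1 */
}

/* Multiply two normal floats following the recipe in the post. */
float mul_by_hand(float a, float b)
{
    uint32_t sa, sb, ma, mb;
    int32_t ea, eb;
    split(a, &sa, &ea, &ma);
    split(b, &sb, &eb, &mb);

    uint32_t sign = sa ^ sb;            /* xor the sign bits       */
    int32_t  exp  = ea + eb;            /* add the exponents       */
    uint64_t prod = (uint64_t)ma * mb;  /* 24x24 -> 48-bit product */

    /* Normalize: mantissas are in [1,2), so the product is in [1,4)
       and needs at most one shift. */
    if (prod & (1ULL << 47)) { prod >>= 1; exp++; }
    prod >>= 23;                        /* keep a 24-bit mantissa  */

    uint32_t bits = (sign << 31)
                  | ((uint32_t)(exp + 127) << 23)
                  | ((uint32_t)prod & 0x7FFFFF);
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}
```

Every mask and shift count here is a compile-time constant - in hardware those are fixed wires. With a variable-width format they all become data-dependent, which is the wiring blow-up being argued about.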
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 19:21:59 UTC, deadalnix wrote: No you don't. Because the streamer still needs to load the unums one by one. Maybe 2 by 2 with a fair amount of hardware speculation (which means you are already trading energy for performance, so the energy argument is weak). There is no way you can feed 256+ cores that way. You can load continuously 64 bytes in a stream, decode to your internal format and push them into the scratchpad of other cores. You could even do this in hardware. If you look at the ubox brute-forcing method, you compute many calculations over the same data, because you solve spatially, not by timesteps. So you can run many, many parallel computations over the same data. To give you a similar example, x86 decoding is often the bottleneck on an x86 CPU. The number of ALUs in x86 over the past decade decreased rather than increased, because you simply can't decode fast enough to feed them. Yet, x86 CPUs have 64-way speculative decoding as a first stage. That's because we use a dumb compiler that does not prefetch intelligently. If you are writing for a tile-based VLIW CPU you preload. These calculations are highly iterative, so I'd rather think of it as a co-processor solving a single equation repeatedly than running the whole program. You can run the larger program on a regular CPU or a few cores. The problem is not transistors, it is wires. Because the damn thing is variadic in every way, pretty much every input bit can end up anywhere in the functional unit. That is a LOT of wire. I haven't seen a design, so I cannot comment. But keep in mind that the CPU does not have to work with the format; it can use a different format internally. We'll probably see FPGA implementations that can be run on FPGA cards for PCs within a few years. I read somewhere that a group in Singapore was working on it.
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 14:11:04 UTC, Ola Fosheim Grøstad wrote: On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote: The energy comparison is bullshit. As long as you haven't loaded the data, you don't know how wide it is. Meaning you need either to go pessimistic and load for the worst-case scenario, or do 2 round trips to memory. That really depends on memory layout and algorithm. A likely implementation would be a co-processor that would take a unum stream and then pipe it through a network of cores (tile-based co-processor). The internal busses between cores are very, very fast, and with 256+ cores you get tremendous throughput. But you need a good compiler/libraries and software support. No you don't. Because the streamer still needs to load the unums one by one. Maybe 2 by 2 with a fair amount of hardware speculation (which means you are already trading energy for performance, so the energy argument is weak). There is no way you can feed 256+ cores that way. To give you a similar example, x86 decoding is often the bottleneck on an x86 CPU. The number of ALUs in x86 over the past decade decreased rather than increased, because you simply can't decode fast enough to feed them. Yet, x86 CPUs have 64-way speculative decoding as a first stage. The hardware is likely to be slower, as you'll need way more wiring than for regular floats, and wire is not only cost but also time. You need more transistors per ALU, but slower does not matter if the algorithm needs bounded accuracy or if it converges more quickly with unums. The key challenge for him is to create a market, meaning getting the semantics into scientific software and getting initial workable implementations out to scientists. If there is market demand, then there will be products. But you need to create the market first. Hence he wrote an easy-to-read book on the topic and supports people who want to implement it. The problem is not transistors, it is wires. Because the damn thing is variadic in every way, pretty much every input bit can end up anywhere in the functional unit. That is a LOT of wire.
Re: Implementing typestate
On Wednesday, 16 September 2015 at 18:41:33 UTC, Ola Fosheim Grøstad wrote: I don't think this is possible to establish in the general case. Wouldn't this require a full theorem prover? I think the only way for that to work is to fully unroll all loops and hope that a theorem prover can deal with it. For example:

Object obj = create();
for ... {
    (Object obj, Ref r) = obj.borrow();
    queue.push(r);
    dostuff(queue);
}

On the other hand, if you have this:

for i=0..2 {
    (Object obj, Ref r[i]) = obj.borrow();
    dostuff(r);
}

then you can unroll it as (hopefully):

(Object obj, Ref r[0]) = obj.borrow();
(Object obj, Ref r[1]) = obj.borrow();
(Object obj, Ref r[2]) = obj.borrow();
x += somepurefunction(r[0]);
x += somepurefunction(r[1]);
x += somepurefunction(r[2]);
r[0].~this(); // r[0] proven unmodified, type is Ref
r[1].~this(); // r[1] proven to be Ref
r[2].~this(); // r[2] proven to be Ref
r.~this();

If the lend IDs are always unique, then you can sometimes prove that all constructors have a matching destructor... Or something like that... ?
Re: Overview of D User Groups?
On 09/16/2015 11:56 AM, qznc wrote: Is there an overview of D user groups somewhere? There is one in Berlin and one in the Valley, apparently. Walter participates in the Cpp group in Seattle or something, if I remember correctly. If a Meetup group happens to list the right keywords (topics?) then it shows up on this map: http://dpl.meetup.com/ Ali
Overview of D User Groups?
Is there an overview of D user groups somewhere? There is one in Berlin and one in the Valley, apparently. Walter participates in the Cpp group in Seattle or something, if I remember correctly.
Re: Implement the "unum" representation in D ?
On 09/16/2015 10:17 AM, Don wrote: So: ... * There is no guarantee that it would be possible to implement it in hardware without a speed penalty, regardless of how many transistors you throw at it (hardware analogue of Amdahl's Law) https://en.wikipedia.org/wiki/Gustafson's_law :o)
Re: Implementing typestate
On Wednesday, 16 September 2015 at 18:01:29 UTC, Marc Schütz wrote: typestate(alias owner) { this.owner := owner; // re-alias operator this.owner.refcount++; } I don't think this is possible to establish in the general case. Wouldn't this require a full theorem prover? I think the only way for that to work is to fully unroll all loops and hope that a theorem prover can deal with it. Either that or painstakingly construct a proof manually (Hoare logic). Like, how can you statically determine if borrowed references stuffed into a queue are all released? To do that you must prove when the queue is empty for borrowed references from a specific object, but it could be interleaved with references to other objects.
Re: Implement the "unum" representation in D ?
On 09/16/2015 10:46 AM, deadalnix wrote: On Saturday, 11 July 2015 at 18:16:22 UTC, Timon Gehr wrote: On 07/11/2015 05:07 PM, Andrei Alexandrescu wrote: On 7/10/15 11:02 PM, Nick B wrote: John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY Very interesting, I'll read it. Thanks! -- Andrei I think Walter should read chapter 5. What is this chapter about ? Relevant quote: "Programmers and users were never given visibility or control of when a value was promoted to “double extended precision” (80-bit or higher) format, unless they wrote assembly language; it just happened automatically, opportunistically, and unpredictably. Confusion caused by different results outweighed the advantage of reduced rounding-overflow-underflow problems, and now coprocessors must dumb down their results to mimic systems that have no such extra scratchpad capability."
Re: Implementing typestate
On Wednesday, 16 September 2015 at 17:15:55 UTC, Ola Fosheim Grøstad wrote: On Wednesday, 16 September 2015 at 17:03:14 UTC, Marc Schütz wrote: On Tuesday, 15 September 2015 at 21:44:25 UTC, Freddy wrote: On Tuesday, 15 September 2015 at 17:45:45 UTC, Freddy wrote: Rust style memory management in a library Wait nevermind about that part, it's harder than I thought. Yeah, I thought about type-states as a way of implementing borrowing, too. I think the biggest difficulty is that the state of one object (the owner) can be affected by what happens in other objects (i.e., it becomes mutable again when those are destroyed). If the borrowed reference itself follows move semantics, can't you just require it to be swallowed by its origin as the "close" operation? pseudocode:

File f = open();
(File f, FileRef r) = f.borrow();
dostuff(r);
(File f, FileRef r) = f.unborrow(r);
File f = f.close()

But the `unborrow` is explicit. What I'd want is to use the implicit destructor call:

struct S {
    static struct Ref {
        private @typestate alias owner;
        private S* p;

        @disable this();

        this() typestate(alias owner) {
            this.owner := owner; // re-alias operator
            this.owner.refcount++;
        }
        body {
            this.p = &owner;
        }

        this(this) { this.owner.refcount++; }
        ~this() { this.owner.refcount--; }
    }

    @typestate size_t refcount = 0;

    S.Ref opUnary(string op : "*")() { // overload address operator (not yet supported)
        return S.Ref(@typestate this);
    }

    ~this() static if(refcount == 0) { }
}

void foo(scope S.Ref p);
void bar(-> S.Ref p); // move
void baz(S.Ref p);

S a;              // => S<0>
{
    auto p = &a;  // => S<1>
    foo(p);       // pass-by-scope doesn't copy or destroy => S<1>
    p.~this();    // (implicit) => S<0>
}
{
    auto p = &a;  // => S<1>
    bar(p);       // pass-by-move, no copy or destruction => S<1>
    p.~this();    // (implicit) => S<0>
}
{
    auto p = &a;  // => S<1>
    baz(p);       // compiler sees only the copy, but no destructor => S<2>
    p.~this();    // (implicit) => S<1>
}
a.~this();        // ERROR: a.refcount != 0

The first two cases can be analyzed at the call site. But the third one is problematic, because inside `baz()`, the compiler doesn't know where the alias actually points to, because it could be in an entirely different compilation unit. I guess this can be solved by disallowing all operations modifying or depending on an alias type-state. (Other complicated things, like preserving type-state through references or array indices, probably shouldn't even be attempted.)
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote: Also, predictable size mean you can split your dataset and process it in parallel, which is impossible if sizes are random. I don't recall how he would deal with something similar to cache misses when you have to promote or demote a unum. However, my recollection of the book is that there was quite a bit of focus on a unum representation that has the same size as a double. If you only did the computations with this format, I would expect the sizes would be more-or-less fixed. Promotion would be pretty rare, but still possible, I would think. Compared to calculations with doubles there might not be a strong case for energy efficiency (but I don't really know for sure). My understanding was that the benefit for energy efficiency is only when you use a smaller sized unum instead of a float. I don't recall how he would resolve your point about cache misses. Anyway, while I can see a benefit from using unum numbers (accuracy, avoiding overflow, etc.) rather than floating point numbers, I think that performance or energy efficiency would have to be within range of floating point numbers for it to have any meaningful adoption.
Re: Implementing typestate
On Wednesday, 16 September 2015 at 17:15:55 UTC, Ola Fosheim Grøstad wrote: dostuff(r); (File f, FileRef r) = f.unborrow(r); Of course, files are tricky since they can change their state themselves (like IO error). Doing that statically would require some kind of branching mechanism with a try-catch that jumps to a different location where the file type changes to "File"... Sounds non-trivial to bolt onto an existing language.
Re: Implementing typestate
On Wednesday, 16 September 2015 at 17:03:14 UTC, Marc Schütz wrote: On Tuesday, 15 September 2015 at 21:44:25 UTC, Freddy wrote: On Tuesday, 15 September 2015 at 17:45:45 UTC, Freddy wrote: Rust style memory management in a library Wait nevermind about that part, it's harder than I thought. Yeah, I thought about type-states as a way of implementing borrowing, too. I think the biggest difficulty is that the state of one object (the owner) can be affected by what happens in other objects (i.e., it becomes mutable again when those are destroyed). If the borrowed reference itself follows move semantics, can't you just require it to be swallowed by its origin as the "close" operation? pseudocode:

File f = open();
(File f, FileRef r) = f.borrow();
dostuff(r);
(File f, FileRef r) = f.unborrow(r);
File f = f.close()
Re: Implementing typestate
On Tuesday, 15 September 2015 at 21:44:25 UTC, Freddy wrote: On Tuesday, 15 September 2015 at 17:45:45 UTC, Freddy wrote: Rust style memory management in a library Wait nevermind about that part, it's harder than I thought. Yeah, I thought about type-states as a way of implementing borrowing, too. I think the biggest difficulty is that the state of one object (the owner) can be affected by what happens in other objects (i.e., it becomes mutable again when those are destroyed).
Re: GC performance: collection frequency
On Tue, Sep 15, 2015 at 07:08:01AM +0200, Daniel Kozák via Digitalmars-d wrote: > > http://dlang.org/changelog/2.067.0.html#gc-options [...] Wow that is obscure. This really needs to go into the main docs so that it can actually be found... T -- People demand freedom of speech to make up for the freedom of thought which they avoid. -- Soren Aabye Kierkegaard (1813-1855)
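Until it lands in the main docs, here is roughly how the options from that changelog entry are used: they are passed on the command line of the already-compiled program via `--DRT-gcopt` (the program name below is hypothetical, and the exact option keys should be checked against the linked changelog page):

```sh
# Tune druntime's GC without recompiling.
# "profile:1" prints collection statistics at program exit;
# the pool-size options control heap growth between collections.
./myapp --DRT-gcopt="profile:1 minPoolSize:16 incPoolSize:8"
```

The same options can reportedly be baked into the binary by declaring an `rt_options` array in the program, for cases where the command line isn't under your control.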
Re: Implementing typestate
On Wednesday, 16 September 2015 at 16:24:49 UTC, Idan Arye wrote: No need for `reinterpret_cast`. The `close` function is declared in the same module as the `File` struct, so it has access to its private d'tor. True, so it might work for D. Interesting.
Re: Implementing typestate
On Wednesday, 16 September 2015 at 15:57:14 UTC, Ola Fosheim Grøstad wrote: On Wednesday, 16 September 2015 at 15:34:40 UTC, Idan Arye wrote: Move semantics should be enough. We can declare the destructor private, and then any code outside the module that implicitly calls the d'tor when the variable goes out of scope will raise a compilation error. In order to "get rid" of the variable, you'll have to pass ownership to the `close` function, so your code won't try to implicitly call the d'tor. Sounds plausible, but does this work in C++ and D? I assume you mean that you "reinterpret_cast" to a different type in the close() function, which is cheating, but ok :). No need for `reinterpret_cast`. The `close` function is declared in the same module as the `File` struct, so it has access to its private d'tor.
Re: dpaste web site
On Wednesday, 16 September 2015 at 16:12:03 UTC, Kagamin wrote: On Wednesday, 16 September 2015 at 13:54:36 UTC, Andrea Fontana wrote: I mean: to check some frequencies of common d keywords/combo like "class", "struct", "int", "float", "if(" "while(", "(int ", "(float ", etc that are not common in plain english used by spammers... Solving dcaptcha costs maybe 1$, so it should solve the problem of human spammers (too expensive). I dunno, I reckon I could solve them in ~5 seconds each, especially with practice... At $1/solve it'd be one hell of an hourly rate!
Re: dpaste web site
On Wednesday, 16 September 2015 at 13:54:36 UTC, Andrea Fontana wrote: I mean: to check some frequencies of common d keywords/combo like "class", "struct", "int", "float", "if(" "while(", "(int ", "(float ", etc that are not common in plain english used by spammers... Solving dcaptcha costs maybe 1$, so it should solve the problem of human spammers (too expensive).
Re: Implementing typestate
On Wednesday, 16 September 2015 at 15:34:40 UTC, Idan Arye wrote: Move semantics should be enough. We can declare the destructor private, and then any code outside the module that implicitly calls the d'tor when the variable goes out of scope will raise a compilation error. In order to "get rid" of the variable, you'll have to pass ownership to the `close` function, so your code won't try to implicitly call the d'tor. Sounds plausible, but does this work in C++ and D? I assume you mean that you "reinterpret_cast" to a different type in the close() function, which is cheating, but ok :).
Re: Implementing typestate
On Wednesday, 16 September 2015 at 14:34:05 UTC, Ola Fosheim Grøstad wrote: On Wednesday, 16 September 2015 at 10:31:58 UTC, Idan Arye wrote: What's wrong with two `open()`s in a row? Each will return a new file handle. Yes, but if you do it by mistake then you don't get the compiler to check that you call close() on both. I should have written "what if you forget close()". Will the compiler then complain at compile time? You can't make that happen with just move semantics, you need linear typing so that every resource created is consumed exactly once. Move semantics should be enough. We can declare the destructor private, and then any code outside the module that implicitly calls the d'tor when the variable goes out of scope will raise a compilation error. In order to "get rid" of the variable, you'll have to pass ownership to the `close` function, so your code won't try to implicitly call the d'tor.
Re: dmd codegen improvements
On Wednesday, 16 September 2015 at 14:40:26 UTC, Bruno Medeiros wrote: Me and other people from D community: "ok... now we have a new half-baked functionality in D, adding complexity for little value, and put here only to please people that are extremely unlikely to ever be using D in any case"... D is fun for prototyping ideas, so yes half-baked and not stable, but still useful. I'm waiting for Rust to head down the same lane of adding features and obfuscating the syntax (and their starting point is even more complex than D's was)...
Re: dmd codegen improvements
On 02/09/2015 19:58, Walter Bright wrote: On 8/29/2015 12:37 PM, Laeeth Isharc wrote: In my experience you can deliver everything people say they want, and then find it isn't that at all. That's so true. My favorite anecdote on that was back in the 1990's. A friend of mine said that what he and the world really needs was a Java native compiler. It'd be worth a fortune! I told him that I had that idea a while back, and had implemented one for Symantec. I could get him a copy that day. He changed the subject. I have many, many similar stories. I also have many complementary stories - implementing things that people laugh at me for doing, that turn out to be crucial. We can start with the laundry list of D features that C++ is rushing to adopt :-) Yes, and this I think is demonstrative of a very important consideration: if someone says they want X (and they are not paying upfront for it), then it is crucial for *you* to be able to figure out if that person or group actually wants X or not. If someone spends time building a product or feature that turns out people don't want... the failure is on that someone. And on this aspect I think the development of D does very poorly. Often people clamored for a feature or change (whether people in the D community, or the C++ one), and Walter you went ahead and did it, regardless of whether it will actually increase D usage in the long run. You are prone to this, given your nature to please people who ask for things, or to prove people wrong (as you yourself admitted). I apologize for not remembering any example at the moment, but I know there were quite a few, especially many years back. It usually went like this:

C++ community guy: "D is crap, it's not gonna be used without X"

*some time later*

Walter: "Ok, I've now implemented X in D!"

the same C++ community guy: either finds another feature or change to complain about (repeat), or goes silent, or goes "meh, D is still not good"

Me and other people from D community: "ok... now we have a new half-baked functionality in D, adding complexity for little value, and put here only to please people that are extremely unlikely to ever be using D in any case"... -- Bruno Medeiros https://twitter.com/brunodomedeiros
Re: Implementing typestate
On Wednesday, 16 September 2015 at 10:31:58 UTC, Idan Arye wrote: What's wrong with two `open()`s in a row? Each will return a new file handle. Yes, but if you do it by mistake then you don't get the compiler to check that you call close() on both. I should have written "what if you forget close()". Will the compiler then complain at compile time? You can't make that happen with just move semantics, you need linear typing so that every resource created is consumed exactly once.
Weird "circular initialization of isInputRange" error
This piece of code (which I reduced with dustmite) gives me the following error when I try to compile it:

$ rdmd -main parser.d
parser.d(28): Error: circular initialization of isInputRange
parser.d(31): Error: template instance std.meta.staticMap!(handler, ArrayReader*) error instantiating
parser.d(36):        instantiated from here: unpacker!(RefRange!(immutable(ubyte)[]))
parser.d(40): Error: template instance std.range.primitives.isInputRange!(ArrayReader*) error instantiating
/usr/include/dmd/phobos/std/meta.d(546):        instantiated from here: F!(ArrayReader*)
parser.d(43):        instantiated from here: staticMap!(toTD, ArrayReader*)
Failed: ["dmd", "-main", "-v", "-o-", "parser.d", "-I."]

I'm not really sure what's causing the error; I'm not declaring `isInputRange` in my code. Commenting out the definition of `TD` (the very last line) removes the error. Am I doing something wrong here, or is this a compiler bug? Tested with dmd v2.068.1 on Linux x64. Code:

import std.range;
import std.variant;
import std.typetuple;

///
template unpacker(Range)
{
    /// Element data types. See `unpack` for usage.
    alias MsgPackData = Algebraic!(
        ArrayReader*,
    );

    /// Reader range for arrays.
    struct ArrayReader
    {
        MsgPackData _front;
        void update() { _front.drain; }
        void popFront() { update; }
    }

    void drain(MsgPackData d)
    {
        static handler(T)(T t)
        {
            static if(isInputRange!T) data;
        }
        d.visit!(staticMap!(handler, MsgPackData.AllowedTypes));
    }
}

alias TestUnpacker = unpacker!(RefRange!(immutable(ubyte)[]));
alias D = TestUnpacker.MsgPackData;

template toTD(T)
{
    static if(isInputRange!T) alias toTD = This;
}

alias TD = Algebraic!(staticMap!(toTD, D.AllowedTypes)); // test data type
Re: dmd codegen improvements
On 28/08/2015 22:59, Walter Bright wrote: People told me I couldn't write a C compiler, then told me I couldn't write a C++ compiler. I'm still the only person who has ever implemented a complete C++ compiler (C++98). Then they all (100%) laughed at me for starting D, saying nobody would ever use it. My whole career is built on stepping over people who told me I couldn't do anything and wouldn't amount to anything. So your whole career is fundamentally based not on bringing value to the software world, but rather merely proving people wrong? That amounts to living your professional life in thrall to other people's validation, and it's not commendable at all. It's a waste of your potential. It is only worthwhile to prove people wrong when it brings you a considerable amount of either monetary resources or clout - and more so than you would have got doing something else with your time. It's not clear to me that was always the case throughout your career... was it? -- Bruno Medeiros https://twitter.com/brunodomedeiros
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote: The energy comparison is bullshit. As long as you haven't loaded the data, you don't know how wide they are. Meaning you either need to go pessimistic and load for the worst-case scenario or do 2 round trips to memory. That really depends on memory layout and algorithm. A likely implementation would be a co-processor that would take a unum stream and then pipe it through a network of cores (tile-based co-processor). The internal busses between cores are very, very fast, and with 256+ cores you get tremendous throughput. But you need a good compiler/libraries and software support. The hardware is likely to be slower as you'll need way more wiring than for regular floats, and wire is not only cost, but also time. You need more transistors per ALU, but slower does not matter if the algorithm needs bounded accuracy or if it converges more quickly with unums. The key challenge for him is to create a market, meaning getting the semantics into scientific software and getting initial workable implementations out to scientists. If there is market demand, then there will be products. But you need to create the market first. Hence he wrote an easy-to-read book on the topic and supports people who want to implement it.
Re: dpaste web site
On Wednesday, 16 September 2015 at 13:46:07 UTC, nazriel wrote: On Wednesday, 16 September 2015 at 06:52:57 UTC, Ola Fosheim Grøstad wrote: How about just using a single click recaptcha: https://www.google.com/recaptcha/intro/index.html Used that before - still was getting spam. As Vladimir mentioned - it costs 0.001$ to get Captcha solved :) Why don't you try to check some stats over post? I mean: to check some frequencies of common d keywords/combo like "class", "struct", "int", "float", "if(" "while(", "(int ", "(float ", etc that are not common in plain english used by spammers...
Re: running code on the homepage
On Wednesday, 16 September 2015 at 10:17:21 UTC, Dmitry Olshansky wrote: On 16-Sep-2015 09:44, nazriel wrote: On Wednesday, 16 September 2015 at 05:54:03 UTC, Andrei Amatuni wrote: maybe I'm doing something wrong...but the output of running the default code snippet on the dlang.org homepage is: "unable to fork: Cannot allocate memory" not a good look Thank you for letting us know, This issue will be fixed very soon. Best regards, Damian Ziemba May I suggest you to record such conditions with automatic notification e.g. by e-mail. Only 1 in 10 of visitors will consider reporting an issue, of these only 1 in 10 will get to dlang forum to post a message. It is a known issue for me. When I was working on the runnable examples, the samples on the main page were way simpler. Now we are hitting some limitations of the container Dpaste's backend is running in. I am working on a new version of the backend (and a new container) as we speak, so it will be solved once and for all. 1-2 more days and we will be done with it, so IMHO there's no need to take any additional steps for it right now.
Re: dpaste web site
On Wednesday, 16 September 2015 at 06:52:57 UTC, Ola Fosheim Grøstad wrote: How about just using a single click recaptcha: https://www.google.com/recaptcha/intro/index.html Used that before - still was getting spam. As Vladimir mentioned - it costs 0.001$ to get Captcha solved :)
Re: running code on the homepage
On Wednesday, 16 September 2015 at 10:17:21 UTC, Dmitry Olshansky wrote: On 16-Sep-2015 09:44, nazriel wrote: On Wednesday, 16 September 2015 at 05:54:03 UTC, Andrei Amatuni wrote: maybe I'm doing something wrong...but the output of running the default code snippet on the dlang.org homepage is: "unable to fork: Cannot allocate memory" not a good look Thank you for letting us know, This issue will be fixed very soon. Best regards, Damian Ziemba May I suggest you to record such conditions with automatic notification e.g. by e-mail. Only 1 in 10 of visitors will consider reporting an issue, of these only 1 in 10 will get to dlang forum to post a message. well now I feel special :)
Re: Type helpers instead of UFCS
On Saturday, 12 September 2015 at 20:37:37 UTC, BBasile wrote: UFCS is good but there are two huge problems: - code completion in IDE. It'll never work. It is possible. DCD plans to support it: https://github.com/Hackerpilot/DCD/issues/13 I agree that this is a big issue, though, and is one of the most important things to work on.
Re: Implementing typestate
On Wednesday, 16 September 2015 at 06:25:59 UTC, Ola Fosheim Grostad wrote: On Wednesday, 16 September 2015 at 05:51:50 UTC, Tobias Müller wrote: Ola Fosheim Grøstad wrote: On Tuesday, 15 September 2015 at 20:34:43 UTC, Tobias Müller wrote: There's a Blog post somewhere but I can't find it atm. Ok found it: > http://pcwalton.github.io/blog/2012/12/26/typestate-is-dead/ But that is for runtime detection, not compile time? Not as far as I understand it. The marker is a type, not a value. And it's used as template param. But you need non-copyable move-only types for it to work. Yes... But will it prevent you from doing two open() in a row at compiletime? What's wrong with two `open()`s in a row? Each will return a new file handle.
Re: running code on the homepage
On 16-Sep-2015 09:44, nazriel wrote: On Wednesday, 16 September 2015 at 05:54:03 UTC, Andrei Amatuni wrote: maybe I'm doing something wrong...but the output of running the default code snippet on the dlang.org homepage is: "unable to fork: Cannot allocate memory" not a good look Thank you for letting us know, This issue will be fixed very soon. Best regards, Damian Ziemba May I suggest you to record such conditions with automatic notification e.g. by e-mail. Only 1 in 10 of visitors will consider reporting an issue, of these only 1 in 10 will get to dlang forum to post a message. -- Dmitry Olshansky
Re: running code on the homepage
On Wednesday, 16 September 2015 at 06:44:30 UTC, nazriel wrote: On Wednesday, 16 September 2015 at 05:54:03 UTC, Andrei Amatuni wrote: maybe I'm doing something wrong...but the output of running the default code snippet on the dlang.org homepage is: "unable to fork: Cannot allocate memory" not a good look Thank you for letting us know, This issue will be fixed very soon. Best regards, Damian Ziemba Would it be possible to set things up so ones that fail are retired until they can be fixed? Non-working examples look awful for the language.
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 08:17:59 UTC, Don wrote: I'm not convinced. I think they are downplaying the hardware difficulties. Slide 34: I don't think he is downplaying it. He has said that it will probably take at least 10 years before it is available in hardware. There is also a company called Rex Computing that is looking at unum: http://www.theplatform.net/2015/07/22/supercomputer-chip-startup-scores-funding-darpa-contract/ He assumes that you use a scratchpad (a big register file), not caching, for intermediate calculations. His basic reasoning is that brute-force ubox methods make for highly parallel calculations. It might be possible to design ALUs that can work with various unum bit widths efficiently (many small or a few large)... who knows. You'll have to try first. Let's not forget that there are a _lot_ of legacy constraints and architectural assumptions in the x86 architecture. The energy comparisons are plain dishonest. The power required for accessing from DRAM is the energy consumption of a *cache miss* !! What's the energy consumption of a load from cache? I think this argument is aiming at HPC, where you can find funding for ASICs. They push a lot of data over the memory bus.
Re: Implement the "unum" representation in D ?
On Saturday, 11 July 2015 at 18:16:22 UTC, Timon Gehr wrote: On 07/11/2015 05:07 PM, Andrei Alexandrescu wrote: On 7/10/15 11:02 PM, Nick B wrote: John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY Very interesting, I'll read it. Thanks! -- Andrei I think Walter should read chapter 5. What is this chapter about ?
Re: Implement the "unum" representation in D ?
On Wednesday, 16 September 2015 at 08:17:59 UTC, Don wrote: On Tuesday, 15 September 2015 at 11:13:59 UTC, Ola Fosheim Grøstad wrote: On Tuesday, 15 September 2015 at 10:38:23 UTC, ponce wrote: On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim Grøstad wrote: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf That's a pretty convincing case. Who does it :)? I'm not convinced. I think they are downplaying the hardware difficulties. Slide 34: Disadvantages of the Unum Format * Non-power-of-two alignment. Needs packing and unpacking, garbage collection. I think that disadvantage is so enormous that it negates most of the advantages. Note that in the x86 world, unaligned memory loads of SSE values still take longer than aligned loads. And that's a trivial case! The energy savings are achieved by using a primitive form of compression. Sure, you can reduce the memory bandwidth required by compressing the data. You could do that for *any* form of data, not just floating point. But I don't think anyone thinks that's worthwhile. GPUs do it a lot, especially, but not exclusively, on mobile. Not to reduce the misses (a miss is pretty much guaranteed: you load 32 threads at once in a shader core, and each of them will require at least 8 pixels for a bilinear texture with mipmaps; that's the bare minimum. That means 256 memory accesses at once. One of these pixels WILL miss, and it is going to stall the 32 threads). It is not a latency issue, but a bandwidth and energy one. But yeah, in the general case random access is preferable; memory alignment and the fact that you don't need to do as much bookkeeping are very significant. Also, a predictable size means you can split your dataset and process it in parallel, which is impossible if sizes are random. The energy comparisons are plain dishonest. The power required for accessing from DRAM is the energy consumption of a *cache miss* !! What's the energy consumption of a load from cache? That would show you what the real gains are, and my guess is they are tiny. The energy comparison is bullshit. As long as you haven't loaded the data, you don't know how wide they are. Meaning you either need to go pessimistic and load for the worst-case scenario or do 2 round trips to memory. The author also leans a lot on the wire vs. transistor cost and how it evolved. He is right. Except that you won't cram more wire into the CPU at runtime. The CPU needs the wiring for the worst-case scenario, always. The hardware is likely to be slower as you'll need way more wiring than for regular floats, and wire is not only cost, but also time. That being said, even a hit in L1 is very energy hungry. Think about it: you need to do an 8-way fetch (so you'll end up loading 4k of data from the cache) in parallel with address translation (usually 16 ways), in parallel with snooping into the load and the store buffers. If the load is not aligned, you pretty much have to multiply this by 2 when it crosses a cache line boundary. I'm not sure what his numbers represent, but hitting L1 is quite power hungry. He is right on that one. So: * I don't believe the energy savings are real. * There is no guarantee that it would be possible to implement it in hardware without a speed penalty, regardless of how many transistors you throw at it (hardware analogue of Amdahl's Law) * but the error bound stuff is cool. Yup, that's pretty much what I get out of it as well.
Re: Implement the "unum" representation in D ?
On Tuesday, 15 September 2015 at 11:13:59 UTC, Ola Fosheim Grøstad wrote: On Tuesday, 15 September 2015 at 10:38:23 UTC, ponce wrote: On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim Grøstad wrote: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf That's a pretty convincing case. Who does it :)? I'm not convinced. I think they are downplaying the hardware difficulties. Slide 34: Disadvantages of the Unum Format * Non-power-of-two alignment. Needs packing and unpacking, garbage collection. I think that disadvantage is so enormous that it negates most of the advantages. Note that in the x86 world, unaligned memory loads of SSE values still take longer than aligned loads. And that's a trivial case! The energy savings are achieved by using a primitive form of compression. Sure, you can reduce the memory bandwidth required by compressing the data. You could do that for *any* form of data, not just floating point. But I don't think anyone thinks that's worthwhile. The energy comparisons are plain dishonest. The power required for accessing from DRAM is the energy consumption of a *cache miss* !! What's the energy consumption of a load from cache? That would show you what the real gains are, and my guess is they are tiny. So: * I don't believe the energy savings are real. * There is no guarantee that it would be possible to implement it in hardware without a speed penalty, regardless of how many transistors you throw at it (hardware analogue of Amdahl's Law) * but the error bound stuff is cool.