Re: Challenge Tuples
On Friday, 3 May 2024 at 05:11:28 UTC, Salih Dincer wrote: .. Wouldn't it be great if there was a feature that worked at runtime... SDB@79

```d
module m;
@safe:
private:

import std;

void main()
{
    auto myTuple = tuple(1, 2, 3, [1, 3], 5);
    int[] arrToSum;

    foreach (int i, val; myTuple.expand)
    {
        if (typeof(val).stringof == "int[]")
        {
            foreach (v; myTuple.expand[i .. i + 1])
                arrToSum ~= v;
        }
        else
        {
            arrToSum ~= val;
        }
    }

    writefln("The total value of the tuples is: %s", arrToSum.sum); // 15
}
```
Re: Challenge Tuples
On Wednesday, 1 May 2024 at 14:15:19 UTC, Andrey Zherikov wrote: Shorter and without allocations:

```d
import std.typecons : tuple;
import std.algorithm : sum, each;

auto sum(int i) => i;

void main()
{
    auto t = tuple(1, 2, 3, [1, 3], 5);
    int res = 0;
    t.each!(e => res += sum(e));
    assert(res == 15);
}
```

Super! In summary, D is clearly ahead on tuples. Especially with features similar to `AliasSeq`, I think it is unrivaled. Wouldn't it be great if there was a feature that worked at runtime... SDB@79
Re: Challenge Tuples
On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote: You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); implement the sum (total = 15) with the least code, using the sum() function of the language you are coding in... Let's start with D:

```d
import std.typecons : tuple;
import std.algorithm : sum, each;

void main()
{
    auto t = tuple(1, 2, 3, [1, 3], 5);
    int[] arr;
    t.each!(e => arr ~= e);
    assert(arr.sum == 15);
}
```

and bonus:

```d
import std.typecons : tuple;
import std.stdio : writeln;

void main()
{
    auto t = tuple(1, 2, 3, [1, 3], 5);
    auto results = [0];

    foreach (data; t)
    {
        static if (is(typeof(data) == int[]))
        {
            int sum;
            foreach (d; data)
            {
                sum += d;
            }
            results ~= sum;
        }
        else
        {
            results ~= data;
        }
    }
    results.writeln; // [0, 1, 2, 3, 4, 5]
}
```

I bet you won't be able to do it this easily with other languages! Note: I tried with C# and Python and it didn't work! SDB@79

Shorter and without allocations:

```d
import std.typecons : tuple;
import std.algorithm : sum, each;

auto sum(int i) => i;

void main()
{
    auto t = tuple(1, 2, 3, [1, 3], 5);
    int res = 0;
    t.each!(e => res += sum(e));
    assert(res == 15);
}
```
Re: Challenge Tuples
On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote: You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); implement the sum (total = 15) with the least code, using the sum() function of the language you are coding in...

Nim:

```nim
import std/[math, typetraits, macros]

macro unrollRange(low, high: static int; name, body: untyped) =
  result = newStmtList()
  for i in low ..< high:
    result.add(newBlockStmt(newStmtList(
      newConstStmt(name, newLit i),
      copy body
    )))

let t = (1, 2, 3, @[1, 3], 5)

var arr: seq[int]

unrollRange(0, t.tupleLen, i):
  arr.add t[i]

doAssert arr.sum == 15
```

vs. D:

1. there's no `each` on tuples
2. there's no static for loop, so a macro is needed for the tuple indices
3. `add` makes use of the same overload as D's `~=`
Re: Challenge Tuples
On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote: You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); implement the sum (total = 15) with the least code, using the sum() function of the language you are coding in... Let's start with D:

```d
import std.typecons : tuple;
import std.algorithm : sum, each;

void main()
{
    auto t = tuple(1, 2, 3, [1, 3], 5);
    int[] arr;
    t.each!(e => arr ~= e);
    assert(arr.sum == 15);
}
```

I bet you won't be able to do it this easily with other languages! Note: I tried with C# and Python and it didn't work!

For Python it is possible to use something like:

```python
t = (1, 2, 3, [1, 3], 5)
a = []
for e in t:
    a.append(e) if isinstance(e, int) else a.extend(e)
print(sum(a))
```
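The loop above can also be collapsed into a single generator expression, summing nested lists in place (a sketch using the same tuple as above):

```python
t = (1, 2, 3, [1, 3], 5)

# Sum plain ints directly; sum each nested list first, then add it in.
total = sum(x if isinstance(x, int) else sum(x) for x in t)
print(total)  # 15
```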
Re: Challenge Tuples
On Saturday, 27 April 2024 at 15:36:40 UTC, Nick Treleaven wrote: On Saturday, 27 April 2024 at 15:32:40 UTC, Nick Treleaven wrote: On Saturday, 27 April 2024 at 11:55:58 UTC, Basile B. wrote:

```
foreach const e in u do
    if echo(is, e, T) do result += e;
```

```d
static if (is(typeof(e) == int)) r += e;
```

Actually I would write that:

```d
R r;
foreach (e; v) {
    static if (is(typeof(e) : R))
```

I like the new sum() function, great Nick! Moreover, it also works with an ordinary range:

```d
enum limit = 100;

// sum = 5050
iota(limit + 1).sum.writeln;
```

Python seems too complicated to me, but C# looks delicious. Moreover, it was implemented without defining IEnumerator. Well done Matheus! SDB@79
Re: Challenge Tuples
On Saturday, 27 April 2024 at 15:32:40 UTC, Nick Treleaven wrote: On Saturday, 27 April 2024 at 11:55:58 UTC, Basile B. wrote:

```
foreach const e in u do
    if echo(is, e, T) do result += e;
```

```d
static if (is(typeof(e) == int)) r += e;
```

Actually I would write that:

```d
R r;
foreach (e; v) {
    static if (is(typeof(e) : R))
```
Re: Challenge Tuples
On Saturday, 27 April 2024 at 11:55:58 UTC, Basile B. wrote: Here's [STYX](https://gitlab.com/styx-lang/styx) solution:

```
function sum[T,U](U u): u32
```

I think you meant `: T`.

```
{
    var T result;
    foreach const e in u do
        if echo(is, e, T) do result += e;
        else do result += sum![T](e);
    return result;
}

function main(): s32
{
    assert((1, 2, 3, [1, 3], 5).sum![u32]() == 15);
    return 0;
}
```

Mostly equivalent D:

```d
R sum(R = long, T)(T v)
{
    R r;
    foreach (e; v)
    {
        static if (is(typeof(e) == int))
            r += e;
        else
            r += sum!R(e);
    }
    return r;
}

void main()
{
    import std;
    assert(tuple(1, 2, 3, [1, 3], 5).sum == 15);
}
```
Re: Challenge Tuples
On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote: You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); implement the sum (total = 15) with the least code, using the sum() function of the language you are coding in... Let's start with D:

Here's [STYX](https://gitlab.com/styx-lang/styx) solution:

```
function sum[T,U](U u): u32
{
    var T result;
    foreach const e in u do
        if echo(is, e, T) do result += e;
        else do result += sum![T](e);
    return result;
}

function main(): s32
{
    assert((1, 2, 3, [1, 3], 5).sum![u32]() == 15);
    return 0;
}
```

A few notes:

- tuples are first-class citizens
- `foreach` over a tuple is like in D, i.e. unrolling
- `echo` is like D's `__traits`
- _Implicit Generic Application_ of `U` (that's like D's IFTI) makes the task easy
Re: Challenge Tuples
On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote: You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); implement the sum (total = 15) with the least code, using the sum() function of the language you are coding in...

My Python solution (function named dosum to avoid collision with the Python built-in):

```python
def dosum(itm):
    if isinstance(itm, (int, float)):
        return itm
    return sum(dosum(_i) for _i in itm)

print(dosum([1, 2, 3, [1, 3], 5]))
```
Re: Challenge Tuples
On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote: ...

Very nice. For your first example I need to think a bit because I'm a bit rusty in C#, but I think it will not be as easy as the D version. For the bonus part:

```csharp
private static void Main(string[] args)
{
    var a = (1, 2, 3, (1, 3), 5);
    var t = a as ITuple;
    var xx = new List<int>();

    for (var i = 0; i < t.Length; i++)
    {
        if (t[i].GetType() == typeof(System.ValueTuple<int, int>))
        {
            var t2 = (t[i] as ITuple);
            var s = 0;
            for (var j = 0; j < t2.Length; j++)
                s += (int)t2[j];
            xx.Add(s);
        }
        else
        {
            xx.Add((int)t[i]);
        }
    }
}
```

Again, I'm rusty in C#... but so far, yes, it's more verbose than the D version.

Matheus.
Challenge Tuples
You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); implement the sum (total = 15) with the least code, using the sum() function of the language you are coding in... Let's start with D:

```d
import std.typecons : tuple;
import std.algorithm : sum, each;

void main()
{
    auto t = tuple(1, 2, 3, [1, 3], 5);
    int[] arr;
    t.each!(e => arr ~= e);
    assert(arr.sum == 15);
}
```

and bonus:

```d
import std.typecons : tuple;
import std.stdio : writeln;

void main()
{
    auto t = tuple(1, 2, 3, [1, 3], 5);
    auto results = [0];

    foreach (data; t)
    {
        static if (is(typeof(data) == int[]))
        {
            int sum;
            foreach (d; data)
            {
                sum += d;
            }
            results ~= sum;
        }
        else
        {
            results ~= data;
        }
    }
    results.writeln; // [0, 1, 2, 3, 4, 5]
}
```

I bet you won't be able to do it this easily with other languages! Note: I tried with C# and Python and it didn't work! SDB@79
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
Bit fields are currently going through the DIP process, although because ImportC already has them, it's just a matter of turning them on and adding the parser support. However, there is a major drawback to them, and it is why you'll still need to use a struct: you can't take a pointer to a bit field.
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Sat, Mar 16, 2024 at 09:16:51PM +, Liam McGillivray via Digitalmars-d-learn wrote:

> On Friday, 15 March 2024 at 00:21:42 UTC, H. S. Teoh wrote:
[...]
> > When dealing with units of data smaller than a byte, you generally
> > need to do it manually, because memory is not addressable by
> > individual bits, making it difficult to implement things like
> > slicing an array of bool.
[...]
> I'm curious as to what "manual implementation" would mean, since
> clearly making my own struct with `bool[3]` doesn't count. Does D have
> features for precise memory manipulation?

Manual implementation as in you would deal with the machine representation in terms of bytes or, more likely, uints (on modern CPUs, even though bytes are individually addressable, the hardware actually works in terms of a larger unit, typically a 4-byte 32-bit unit or an 8-byte 64-bit unit), using bitwise operators to manipulate the bits the way you want to.

> Anyway, I'm surprised that D has a special operator `&=` for doing bit
> manipulation on integers, especially given that the steps to convert
> an int into a bool array is more complicated. I would imagine the
> former would be a rather niche thing.

You should understand that bitwise operators are directly implemented in hardware, and thus operators like &, |, ^, <<, >>, ~, etc. typically map directly to individual CPU instructions. As such, they are very fast and preferred when you're doing bit-level manipulation. At this level you typically do not work with individual bits per se, but with machine words (typically 32-bit or 64-bit units). Bitwise operators operate on all 32 or 64 bits at once, so performance-aware code typically manipulates all these bits simultaneously rather than individually. Of course, using suitable bit masking you *can* address individual bits, but the hardware instructions themselves typically work with all 32/64 bits at once.

Here's a simple example. Suppose you have 3 bits you want to store.
Since the machine doesn't have a 3-bit built-in type, you typically just use the next larger available size: either a ubyte (8 bits) if you want compact storage or, if compactness isn't a big issue, just a uint (32 bits; you simply ignore the other 29 bits that you don't need). So you'd declare the storage something like this:

    uint myBits;

Bits are usually indexed from 0, so bit 0 is the first position, bit 1 is the second position, and so on. So to set the first bit to 1, you'd do:

    myBits |= 0b001;

Note that at the machine level, this operator works on all 32 bits at the same time. Most of the bits remain unchanged, though, because bitwise OR does not change the original value when the operand bit is 0. So the overall effect is that the first bit is set.

To set the first bit to 0, there isn't a direct operator that does that, but you can take advantage of the behaviour of bitwise AND, in which any bit that is 0 in the operand gets cleared and everything else remains unchanged. So you'd do this:

    myBits &= 0b110;

Now, since we don't really care about the other 29 bits, we could write this as follows instead, to make our intent clearer:

    myBits &= ~0b001;

The ~ operator flips all the bits, so this is equivalent to writing:

    myBits &= 0b11111111_11111111_11111111_11111110;

Writing it with ~ also has the advantage that should we later decide to add another bit to our "bit array", we don't have to update the code; whereas if we'd used `myBits &= 0b110;` then we'd need to change it to `myBits &= 0b1110;`, otherwise our new 4th bit may get unexpectedly cleared when we only wanted to clear the first bit.

Now, what if we wanted to set both the 1st and 3rd bits?
In a hypothetical bit array implementation, we'd do the equivalent of:

    bool[3] myBits;
    myBits[0] = 1;
    myBits[2] = 1;

However, in our uint approach, we can cut the number of operations in half, because the CPU is already operating on the entire 32 bits of the uint at once -- so there's no need for two instructions to set two individual bits when we could just do it all in one:

    myBits |= 0b101;    // look, ma! both bits set at once!

Similarly, to clear the 1st and 3rd bits simultaneously, we simply write:

    myBits &= ~0b101;   // clear both bits in 1 instruction!

Of course, when we only have 3 bits to work with, the savings aren't that significant. However, if you have a larger bit array, say an array of 32 bits, this can speed your code up by 32x, because you're taking advantage of the fact that the hardware is already operating on all 32 bits at the same time. On 64-bit CPUs, you can speed it up by 64x, because the CPU operates on all 64 bits simultaneously: you can manipulate an entire array of 64 bits in a single instruction, which is 64x faster than looping over an array of bool with 64 iterations.

T

-- 
Without outlines, life would be pointless.
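The set/clear operations described above can be tried directly in Python, whose integers support the same bitwise operators (a quick sketch; the variable name mirrors the post):

```python
myBits = 0b000

# Set bit 0: OR leaves all other bits unchanged.
myBits |= 0b001
assert myBits == 0b001

# Set bits 0 and 2 together in a single operation.
myBits |= 0b101
assert myBits == 0b101

# Clear bit 0: AND with the complement of its mask.
myBits &= ~0b001
assert myBits == 0b100
```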
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Friday, 15 March 2024 at 17:25:09 UTC, Daniel N wrote: On Tuesday, 12 March 2024 at 05:38:03 UTC, Liam McGillivray wrote: I am in need of a data type for holding direction information; one of 8 directions on a single axis. They are named in terms of compass directions. If D had a 4-bit datatype, I would just use this and do `+=2` whenever I want to change the direction, but it doesn't. Perhaps this would be a good programming challenge for someone more experienced than me. Make a data type (probably a struct) for holding one of 8 directional values using 3 bits. It should accept the use of increment operators to change the angle. Ideally (but optionally), it should work as an enum of the same name; "Direction". Here's a unittest that it should ideally pass:

D actually supports both 3 and 4 bit integers. People will likely warn you of minor portability risks... but if you target a limited number of platforms and prefer concise, readable code, then it's a textbook example for bitfields. The risk can easily be mitigated by having a unittest to catch the error directly (if you try to port to some exotic platform).

dmd -preview=bitfields

(Some lines stolen from Rikki)

```d
struct Direction
{
    private uint value : 3;
    alias this = value;

    enum Direction N  = Direction(0);
    enum Direction NE = Direction(1);
    enum Direction E  = Direction(2);
    enum Direction SE = Direction(3);
    enum Direction S  = Direction(4);
    enum Direction SW = Direction(5);
    enum Direction W  = Direction(6);
    enum Direction NW = Direction(7);
}
```

Oh wow! That looks so clean and elegant, aside from the `: 3` part being easy to miss and not very readable for those not aware of this feature. This would be so simple. If I used this, I wouldn't even need to make a struct and write the operator overload functions; just make an enum for a 3-bit uint. Based on the words in the command and a quick search, I'm guessing that this is an experimental feature that has not yet been accepted as part of the language.
Perhaps I shouldn't use this then, just in case it gets pulled and someone who discovers my project in the future will have a build error that they don't know how to solve. This seems like a bigger problem than the extra memory the current ubyte version takes up, which is probably quite small for a computer of today anyway. I suppose this would be a nice feature for the language to have if the syntax were reworked. Perhaps something like `uint!3 value;` would be better.
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Friday, 15 March 2024 at 00:21:42 UTC, H. S. Teoh wrote: On Thu, Mar 14, 2024 at 11:39:33PM +, Liam McGillivray via Digitalmars-d-learn wrote: [...] I tried to rework the functions to use bitwise operations, but it was difficult to figure out the correct logic. I decided that it's not worth the hassle, so I just changed the value storage from `bool[3]` to `ubyte`. [...] Just wanted to note that while in theory bool[3] could be optimized by the compiler for compact storage, what you're most likely to get is 3 bytes, one for each bool, or perhaps even 3 ints (24 bytes). When dealing with units of data smaller than a byte, you generally need to do it manually, because memory is not addressable by individual bits, making it difficult to implement things like slicing an array of bool. So the compiler is most likely to simplify things by making it an array of bytes rather than emit complex bit manipulation code to make up for the lack of bit-addressability in the underlying hardware. Using bit operators like others have pointed out in this thread is probably the best way to implement what you want. T I'm curious as to what "manual implementation" would mean, since clearly making my own struct with `bool[3]` doesn't count. Does D have features for precise memory manipulation? Anyway, I'm surprised that D has a special operator `&=` for doing bit manipulation on integers, especially given that the steps to convert an int into a bool array is more complicated. I would imagine the former would be a rather niche thing.
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Tuesday, 12 March 2024 at 05:38:03 UTC, Liam McGillivray wrote: I am in need of a data type for holding direction information; one of 8 directions on a single axis. They are named in terms of compass directions. If D had a 4-bit datatype, I would just use this and do `+=2` whenever I want to change the direction, but it doesn't. Perhaps this would be a good programming challenge for someone more experienced than me. Make a data type (probably a struct) for holding one of 8 directional values using 3 bits. It should accept the use of increment operators to change the angle. Ideally (but optionally), it should work as an enum of the same name; "Direction". Here's a unittest that it should ideally pass:

D actually supports both 3 and 4 bit integers. People will likely warn you of minor portability risks... but if you target a limited number of platforms and prefer concise, readable code, then it's a textbook example for bitfields. The risk can easily be mitigated by having a unittest to catch the error directly (if you try to port to some exotic platform).

dmd -preview=bitfields

(Some lines stolen from Rikki)

```d
struct Direction
{
    private uint value : 3;
    alias this = value;

    enum Direction N  = Direction(0);
    enum Direction NE = Direction(1);
    enum Direction E  = Direction(2);
    enum Direction SE = Direction(3);
    enum Direction S  = Direction(4);
    enum Direction SW = Direction(5);
    enum Direction W  = Direction(6);
    enum Direction NW = Direction(7);
}
```
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Thu, Mar 14, 2024 at 11:39:33PM +, Liam McGillivray via Digitalmars-d-learn wrote: [...] > I tried to rework the functions to use bitwise operations, but it was > difficult to figure out the correct logic. I decided that it's not > worth the hassle, so I just changed the value storage from `bool[3]` > to `ubyte`. [...] Just wanted to note that while in theory bool[3] could be optimized by the compiler for compact storage, what you're most likely to get is 3 bytes, one for each bool, or perhaps even 3 ints (24 bytes). When dealing with units of data smaller than a byte, you generally need to do it manually, because memory is not addressable by individual bits, making it difficult to implement things like slicing an array of bool. So the compiler is most likely to simplify things by making it an array of bytes rather than emit complex bit manipulation code to make up for the lack of bit-addressability in the underlying hardware. Using bit operators like others have pointed out in this thread is probably the best way to implement what you want. T -- LINUX = Lousy Interface for Nefarious Unix Xenophobes.
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Friday, 15 March 2024 at 00:00:01 UTC, Richard (Rikki) Andrew Cattermole wrote: On 15/03/2024 12:47 PM, Basile B. wrote: On Thursday, 14 March 2024 at 23:39:33 UTC, Liam McGillivray wrote: On Thursday, 14 March 2024 at 01:58:46 UTC, Richard (Rikki) Andrew Cattermole wrote: [...] I tried to rework the functions to use bitwise operations, but it was difficult to figure out the correct logic. I decided that it's not worth the hassle, so I just changed the value storage from `bool[3]` to `ubyte`. Now it works much more like your version. https://github.com/LiamM32/Open_Emblem/blob/c2014ab3f77e89c0cedcd6dbf7f8362ebfac33a9/source/common.d I did a little reading, so now I understand what it means when you have `&= 7`. But I want to ask, is this faster than `%= 8`? If not, I would like to change it to the latter for readability. `%= 8` will be codegened using slower instructions without optimizations enabled, but with `&= 7` you directly get the right instruction, which does not involve integer division. See https://godbolt.org/z/74vbba5aG Yes, it'll depend upon how smart the compiler is at optimizing, and it may not occur in non-optimizing builds.

Indeed, GDC (so very likely GCC too, or whatever language uses it as a backend...) does it without optimizations (https://godbolt.org/z/Ke7c54Gqj). That's not very surprising. If you look at the LLVM bug tracker, under the tag "missed optimisations", you'll see a lot of "GDC does that but we don't" in the report bodies, even if here it's a bit different, as optimizations are not enabled.
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On 15/03/2024 12:47 PM, Basile B. wrote: On Thursday, 14 March 2024 at 23:39:33 UTC, Liam McGillivray wrote: On Thursday, 14 March 2024 at 01:58:46 UTC, Richard (Rikki) Andrew Cattermole wrote: [...] I tried to rework the functions to use bitwise operations, but it was difficult to figure out the correct logic. I decided that it's not worth the hassle, so I just changed the value storage from `bool[3]` to `ubyte`. Now it works much more like your version. https://github.com/LiamM32/Open_Emblem/blob/c2014ab3f77e89c0cedcd6dbf7f8362ebfac33a9/source/common.d I did a little reading, so now I understand what it means when you have `&= 7`. But I want to ask, is this faster than `%= 8`? If not, I would like to change it to the latter for readability. `%= 8` will be codegened using slower instructions without optimizations enabled, but with `&= 7` you directly get the right instruction, which does not involve integer division. See https://godbolt.org/z/74vbba5aG

Yes, it'll depend upon how smart the compiler is at optimizing, and it may not occur in non-optimizing builds. The modulo instructions are based upon division, and this is an incredibly expensive operation. https://stackoverflow.com/a/8022107 The division instruction on Haswell for integers ranges from 9 cycles for 8-bit all the way up to 36 cycles for 64-bit.
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Thursday, 14 March 2024 at 23:39:33 UTC, Liam McGillivray wrote: On Thursday, 14 March 2024 at 01:58:46 UTC, Richard (Rikki) Andrew Cattermole wrote: [...] I tried to rework the functions to use bitwise operations, but it was difficult to figure out the correct logic. I decided that it's not worth the hassle, so I just changed the value storage from `bool[3]` to `ubyte`. Now it works much more like your version. https://github.com/LiamM32/Open_Emblem/blob/c2014ab3f77e89c0cedcd6dbf7f8362ebfac33a9/source/common.d I did a little reading, so now I understand what it means when you have `&= 7`. But I want to ask, is this faster than `%= 8`? If not, I would like to change it to the latter for readability.

`%= 8` will be codegened using slower instructions without optimizations enabled, but with `&= 7` you directly get the right instruction, which does not involve integer division. See https://godbolt.org/z/74vbba5aG
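For non-negative values the two spellings agree, which is easy to check for yourself (a quick sketch in Python):

```python
# For non-negative x, x % 8 equals x & 7: the low three bits of x
# are exactly the remainder when dividing by 8 (a power of two).
for x in range(100):
    assert x % 8 == (x & 7)

print(13 % 8, 13 & 7)  # 5 5
```

The masking form is what optimizing compilers emit for `% 8` on unsigned operands, since it avoids the division unit entirely.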
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Thursday, 14 March 2024 at 01:58:46 UTC, Richard (Rikki) Andrew Cattermole wrote: The cost of an add + increment then a bitwise and is only 2-4 cycles on a Haswell cpu. Depending upon if its working solely in registers (via inlining) or its operating on ram. Whereas if you need to do branching (if statement, loops), this is an unpredictable cost and loops where a simple bitwise operation can be done is out right non-optimized. As for exceptions, totally not required. You can solve this by simply making your state private so nobody else can mutate it. Bounds checking will ensure if the state is corrupted it'll error out if you use the lookup method I suggested above. I tried to rework the functions to use bitwise operations, but it was difficult to figure out the correct logic. I decided that it's not worth the hassle, so I just changed the value storage from `bool[3]` to `ubyte`. Now it works much more like your version. https://github.com/LiamM32/Open_Emblem/blob/c2014ab3f77e89c0cedcd6dbf7f8362ebfac33a9/source/common.d I did a little reading, so now I understand what it means when you have `&= 7`. But I want to ask, is this faster than `%= 8`? If not, I would like to change it to the latter for readability.
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
There appear to be a few things that you may not be aware of, based upon this implementation.

The cost of an add + increment then a bitwise and is only 2-4 cycles on a Haswell CPU, depending upon whether it's working solely in registers (via inlining) or operating on RAM. The cost of a move from RAM into a register is about 1 cycle, but can have latencies of around 3 cycles if not cached. Whereas if you need to do branching (if statements, loops), this is an unpredictable cost, and loops where a simple bitwise operation would do are outright non-optimized.

As for methods like to:

```d
static immutable string[] table = ["north", ...];
return table[this.value];
```

That's two moves from RAM to register.

In essence, in your design you have blown out the cost by a significant margin, well beyond the 12% that is described as being the minimum to consider optimization. If you want to understand where the 12% number comes from, I suggest reading the paper "Structured Programming with go to Statements" by Donald Knuth.

As for exceptions: totally not required. You can solve this by simply making your state private so nobody else can mutate it. Bounds checking will ensure that if the state is corrupted it'll error out, if you use the lookup method I suggested above.

Understanding what I have said is very important for game engine programmers. So if you're interested in it beyond some hobbyist endeavors, you have some reading up to do ;)
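The table-lookup suggestion, sketched in Python for comparison: indexing a constant array replaces a chain of comparisons with a couple of loads (the names here are illustrative, not from the post):

```python
# Direction names indexed by the 3-bit value 0..7 -- two loads
# instead of an if/else chain of comparisons.
DIRECTION_NAMES = ["north", "northeast", "east", "southeast",
                   "south", "southwest", "west", "northwest"]

def to_name(value):
    # Bounds-checked indexing errors out on corrupted state,
    # so no exception plumbing is needed.
    return DIRECTION_NAMES[value]

print(to_name(3))  # southeast
```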
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Tuesday, 12 March 2024 at 06:38:28 UTC, Richard (Rikki) Andrew Cattermole wrote: By taking advantage of integer wrapping and a bitwise and, it's quite a simple problem to solve! Challenge for the reader: add support for binary operations and toString support.

Last night I pushed the latest commit to the GitHub repository for my game. It contains the `Direction` struct in [`source/common.d`](https://github.com/LiamM32/Open_Emblem/blob/master/source/common.d). Here it is:

```d
struct Direction //One of 8 directions stored in 3 bits
{
    import std.conv;
    import std.traits: isNumeric;

    bool[3] b;

    static Direction N  = Direction(b:[false,false,false]);
    static Direction NE = Direction(b:[true,false,false]);
    static Direction E  = Direction(b:[false,true,false]);
    static Direction SE = Direction(b:[true,true,false]);
    static Direction S  = Direction(b:[false,false,true]);
    static Direction SW = Direction(b:[true,false,true]);
    static Direction W  = Direction(b:[false,true,true]);
    static Direction NW = Direction(b:[true,true,true]);

    ref Direction opUnary(string op)() if (op == "++" || op == "--")
    {
        static if (op == "++") const bool up = true;
        else const bool up = false;
        if (b[0])
        {
            if (b[1]) b[2] = !b[2];
            b[1] = !b[1];
        }
        b[0] = !b[0];
        return this;
    }

    void opOpAssign(string op)(int amount) if (op == "+" || op == "-")
    {
        amount = amount%8;
        if (amount > 0) for (uint i = 0; i < amount; i++)
        {
            static if (op=="+") this++;
            else this--;
        }
        else for (uint i=0; i > amount; i--)
        {
            static if (op=="+") this--;
            else this++;
        }
    }

    T to(T)() const if(isNumeric!T)
    {
        return cast(T)(b[0] + 2*b[1] + 4*b[2]);
    }

    T opCast(T)() if (isNumeric!T)
    {
        return cast(T)(b[0] + 2*b[1] + 4*b[2]);
    }

    T to(T)() const if(is(T==string))
    {
        if (this==Direction.N) return "north";
        else if (this==Direction.NE) return "northeast";
        else if (this==Direction.E) return "east";
        else if (this==Direction.SE) return "southeast";
        else if (this==Direction.S) return "south";
        else if (this==Direction.SW) return "southwest";
        else if (this==Direction.W) return "west";
        else if (this==Direction.NW) return "northwest";
        else throw new Exception("Direction.to!: direction has a value that should be impossible.");
        //else return ""; //This should never happen.
    }

    bool[3] opCast() const
    {
        return this.b;
    }

    Direction opposite() const
    {
        return Direction([b[0], b[1], !b[2]]);
    }

    bool diagonal() { return b[0]; }

    int getAngle()
    {
        if (this==Direction.N) return 0;
        else if (this==Direction.NE) return 45;
        else if (this==Direction.E) return 90;
        else if (this==Direction.SE) return 135;
        else if (this==Direction.S) return 180;
        else if (this==Direction.SW) return 225;
        else if (this==Direction.W) return 270;
        else if (this==Direction.NW) return 315;
        else throw new Exception("Direction.getAngle: direction has a value that should be impossible.");
    }
}
```

The one thing that can't be done with this is `direction+8`. Perhaps it would have been easier if I had chosen to store the value as a ubyte rather than an array of bools. While this is probably over-optimized given the large amount of memory in today's computers, I like it that it can never hold an illegitimate value.

There's probably some trick I can use to make it easier to figure out the functions to do numerical operations on bits, but I don't know how best to cleanly represent this kind of thing in code. Maybe I need to look for examples of how it's already done.

I noticed among the options for overloading unary operations are the symbols "+" & "-". What operation is being overloaded with the function `ref Direction opUnary(string op:"+")(int amount)`?
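Storing the value as a single integer 0..7 (as the post suspects would have been easier) makes both the wrapping arithmetic and the angle/name lookups one-liners, since the angle is just 45 times the stored value. A sketch of that alternative in Python (class and names are mine, not from the post):

```python
NAMES = ["north", "northeast", "east", "southeast",
         "south", "southwest", "west", "northwest"]

class Direction:
    def __init__(self, value):
        self.value = value & 7   # can never hold an illegitimate value

    def __iadd__(self, amount):
        self.value = (self.value + amount) & 7   # wraps around
        return self

    @property
    def angle(self):
        return self.value * 45   # replaces the if/else chain

    @property
    def name(self):
        return NAMES[self.value]

d = Direction(6)        # west
d += 9                  # wraps: (6 + 9) & 7 == 7
print(d.name, d.angle)  # northwest 315
```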
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Wednesday, 13 March 2024 at 10:27:49 UTC, Basile B. wrote: The semantics of the operators are actually not as clear as that. What if you define

```d
enum Direction { N = 1, NE, S = 45, SW }
```

?

Certainly! `EnumMembers` can be used. The `EnumMembers` template from the `std.traits` module is used to retrieve all the members of an enumeration. It generates a tuple containing all the enumeration values, which can be iterated over using a `foreach` loop. In D, you can use `EnumMembers` to iterate over enum values at compile time, which is useful for generating code based on enum members. Here's a simple code example:

```d
import std.stdio : writeln;
import std.traits : EnumMembers;

enum Days
{
    Monday    = 1001,
    Tuesday   = 1010,
    Wednesday = 1011,
    Thursday  = 1100,
    Friday    = 1101,
    Saturday  = 1110,
    Sunday    = 1111
}

void main()
{
    foreach (day; EnumMembers!Days)
        day.writeln(":", cast(int)day);
}
```

SDB@79
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Tuesday, 12 March 2024 at 05:38:03 UTC, Liam McGillivray wrote: I am in need of a data type for holding direction information; one of 8 directions on a single axis. They are named in terms of compass directions. If D had a 4-bit datatype, I would just use this and do `+=2` whenever I want to change the direction, but it doesn't. Perhaps this would be a good programming challenge for someone more experienced than me. Make a data type (probably a struct) for holding one of 8 directional values using 3 bits. It should accept the use of increment operators to change the angle. Ideally (but optionally), it should work as an enum of the same name; "Direction". Here's a unittest that it should ideally pass:

```d
unittest
{
    Direction direction = Direction.N;
    direction++;
    assert(direction == Direction.NE);
    direction += 3;
    assert(direction == Direction.S);
    direction--;
    assert(direction == Direction.SE);
    direction -= 4;
    assert(direction == Direction.NW);
}
```

While there are workarounds (as proposed in the other answers, using operator overloads), I tend to think that the way enums currently work with arithmetic operators is a symptom of an incomplete type system. Enums made of integer numbers are subclasses of the parent [integral sequence](https://en.wikipedia.org/wiki/Integer_sequence). The semantics of the operators are actually not as clear as that. What if you define

```d
enum Direction { N = 1, NE, S = 45, SW }
```

? You directly see that the proposed workarounds don't work anymore. There are natural numbers, subsets of natural numbers, unordered sequences, etc. The type system does not represent that. Anyway, I don't blame D; plenty of languages have the exact same problem.
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Tuesday, 12 March 2024 at 05:38:03 UTC, Liam McGillivray wrote: Perhaps this would be a good programming challenge for someone more experienced than me. Make a data type (probably a struct) for holding one of 8 directional values using 3 bits. It should accept the use of increment operators to change the angle. D is such a perfect language that you can implement the features you mentioned and more. I also implemented the following during my rookie years:

```d
import std.stdio, std.conv;

enum Directions { N, NE, E, SE, S, SW, W, NW }

alias Direction = enumSet;

union enumSet(T)
{
    T e;
    alias e this;

    struct
    {
        int i;

        auto opUnary(string op: "++")() => i = i < e.max ? ++i : e.min;
        auto opUnary(string op: "--")() => i = i > e.min ? --i : e.max;

        auto opOpAssign(string op: "+")(int val)
        {
            i += val;
            if (i > e.max) i -= e.max + 1;
        }

        auto opOpAssign(string op: "-")(int val)
        {
            i -= val;
            if (i < e.min) i += e.max + 1;
        }
    }

    string toString() const => e.to!string;
}

unittest
{
    auto direction = Direction!Directions(Directions.N);
    direction++;
    assert(direction == Directions.NE);
    direction += 3;
    assert(direction == Directions.S);
    direction--;
    assert(direction == Directions.SE);
    direction -= 4;
    assert(direction == Directions.NW);
}

void main()
{
    auto test = enumSet!Directions(Directions.W);
    test += 9;
    test.writeln;
    test--;
    test--;
    test.writeln;
}
```

Here union was used with an extra template parameter. I also added 2 aliases to avoid confusion. SDB@79
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On 13/03/2024 11:00 AM, Liam McGillivray wrote: I'm not familiar with the syntax of the line `value &= 7;`. Is it equivalent to writing `value = value % 7;`? `&` is a bitwise and:

```
LSB 123456789 MSB
  &         7
LSB 123000000 MSB
```

Anyway, you used an int, but I used an array of 3 bools. I'm guessing that mine uses less memory, but I'm not sure how much memory it adds when it's a struct with functions. Due to alignment, it'll probably use just as much. Mine only needs a single ``byte``, at 7 bits it's more than enough. But ``int`` doesn't make much difference unless you are packing instances together with ``align(0):``, and realistically CPUs are optimized for 32 bits, not 8.
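To pin down the modulo question asked above: since `7` is `0b111`, `x & 7` keeps only the low three bits, which for non-negative values is `x % 8`, not `x % 7`. A quick self-contained check:

```d
void main()
{
    // & 7 masks in the low three bits: equivalent to modulo 8
    // for non-negative values
    foreach (int v; 0 .. 100)
        assert((v & 7) == (v % 8));

    // for negative values the two differ: D's % truncates toward
    // zero, while & always yields the low bits of the two's-complement
    // representation
    assert((-1 & 7) == 7);
    assert(-1 % 8 == -1);
}
```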
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
On Tuesday, 12 March 2024 at 06:38:28 UTC, Richard (Rikki) Andrew Cattermole wrote: By taking advantage of integer wrapping and a bitwise and, its quite a simple problem to solve! Challenge for the reader: add support for binary operations and toString support. https://dlang.org/spec/operatoroverloading.html ```d struct Direction { private int value; Direction opUnary(string op:"++")() { value++; value &= 7; return this; } Direction opUnary(string op:"--")() { value--; value &= 7; return this; } void opOpAssign(string op:"+")(int amount) { value += amount; value &= 7; } void opOpAssign(string op:"-")(int amount) { value -= amount; value &= 7; } enum Direction N = Direction(0); enum Direction NE = Direction(1); enum Direction E = Direction(2); enum Direction SE = Direction(3); enum Direction S = Direction(4); enum Direction SW = Direction(5); enum Direction W = Direction(6); enum Direction NW = Direction(7); } unittest { Direction direction = Direction.N; direction++; assert(direction == Direction.NE); direction+=3; assert(direction == Direction.S); direction--; assert(direction == Direction.SE); direction-=4; assert(direction == Direction.NW); } ``` Interesting. I didn't know that an enum can be defined inside a struct like that. I had used functions to get around it. 
Here is what I had already mostly written, using help from ChatGPT (but only for the opUnary syntax, not the algorithm):

```
struct Direction //One of 8 directions stored in 3 bits
{
    bool[3] d;   // d[2] is the least significant bit

    static Direction N()  { return Direction(d:[false,false,false]); }
    static Direction NE() { return Direction(d:[false,false,true]); }
    static Direction E()  { return Direction(d:[false,true,false]); }
    static Direction SE() { return Direction(d:[false,true,true]); }
    static Direction S()  { return Direction(d:[true,false,false]); }
    static Direction SW() { return Direction(d:[true,false,true]); }
    static Direction W()  { return Direction(d:[true,true,false]); }
    static Direction NW() { return Direction(d:[true,true,true]); }

    ref Direction opUnary(string op)() if (op == "++" || op == "--")
    {
        enum bool up = (op == "++");  // carry on increment, borrow on decrement
        if (d[2] == up)
        {
            if (d[1] == up) d[0] = !d[0];
            d[1] = !d[1];
        }
        d[2] = !d[2];
        return this;
    }

    auto to(T)() const { return cast(T)(4*d[0] + 2*d[1] + d[2]); }
}
```

I am not entirely sure how well it works. I will come back later with an updated version with more functions. I'm not familiar with the syntax of the line `value &= 7;`. Is it equivalent to writing `value = value % 7;`? Anyway, you used an int, but I used an array of 3 bools. I'm guessing that mine uses less memory, but I'm not sure how much memory it adds when it's a struct with functions.
Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
By taking advantage of integer wrapping and a bitwise and, its quite a simple problem to solve! Challenge for the reader: add support for binary operations and toString support. https://dlang.org/spec/operatoroverloading.html ```d struct Direction { private int value; Direction opUnary(string op:"++")() { value++; value &= 7; return this; } Direction opUnary(string op:"--")() { value--; value &= 7; return this; } void opOpAssign(string op:"+")(int amount) { value += amount; value &= 7; } void opOpAssign(string op:"-")(int amount) { value -= amount; value &= 7; } enum Direction N = Direction(0); enum Direction NE = Direction(1); enum Direction E = Direction(2); enum Direction SE = Direction(3); enum Direction S = Direction(4); enum Direction SW = Direction(5); enum Direction W = Direction(6); enum Direction NW = Direction(7); } unittest { Direction direction = Direction.N; direction++; assert(direction == Direction.NE); direction+=3; assert(direction == Direction.S); direction--; assert(direction == Direction.SE); direction-=4; assert(direction == Direction.NW); } ```
Challenge: Make a data type for holding one of 8 directions allowing increment and overflow
I am in need of a data type for holding direction information; one of 8 directions on a single axis. They are named in terms of compass directions. If D had a 4-bit datatype, I would just use this and do `+=2` whenever I want to change the datatype, but it doesn't. Perhaps this would be a good programming challenge for someone more experienced than me. Make a data type (probably a struct) for holding one of 8 directional values using 3 bits. It should accept the use of increment operators to change the angle. Ideally (but optionally), it should work as an enum of the same name; "Direction". Here's a unittest that it should ideally pass: ``` unittest { Direction direction = Direction.N; direction++; assert(direction == Direction.NE); direction+=3; assert(direction == Direction.S); direction--; assert(direction == Direction.SE); direction-=4; assert(direction == Direction.NW); } ```
Re: The One Billion Row Challenge
On Saturday, 13 January 2024 at 23:25:07 UTC, monkyyy wrote: On Thursday, 11 January 2024 at 11:21:39 UTC, Sergey wrote: On Thursday, 11 January 2024 at 08:57:43 UTC, Christian Köstlin wrote: Did someone already try to do this in dlang? I guess it will be very hard to beat the java solutions running with graalvm! https://news.ycombinator.com/item?id=38851337 Kind regards, Christian I think C++ people already beated Java's performance https://github.com/buybackoff/1brc?tab=readme-ov-file#native I feel we could beat c++ if they didn't radix sort The project is genuinely hard: many optimizations and tricks were applied by the other entrants, and it takes a lot of skill to implement all of them well.
Re: The One Billion Row Challenge
On Thursday, 11 January 2024 at 11:21:39 UTC, Sergey wrote: On Thursday, 11 January 2024 at 08:57:43 UTC, Christian Köstlin wrote: Did someone already try to do this in dlang? I guess it will be very hard to beat the java solutions running with graalvm! https://news.ycombinator.com/item?id=38851337 Kind regards, Christian I think C++ people already beated Java's performance https://github.com/buybackoff/1brc?tab=readme-ov-file#native I feel we could beat c++ if they didn't radix sort
Re: The One Billion Row Challenge
On Thursday, 11 January 2024 at 08:57:43 UTC, Christian Köstlin wrote: Did someone already try to do this in dlang? I guess it will be very hard to beat the java solutions running with graalvm! https://news.ycombinator.com/item?id=38851337 Kind regards, Christian The problem with this challenge can be seen in the initial comments. Writing the fastest possible program *for a specific dataset* is not the same thing as writing the fastest program for an arbitrary dataset of that size. And, indeed, the fastest program is the one that does nothing but print the answer. Speed on this task doesn't tell you anything about performance with different types/sizes of data or constraints on programmer time needed to produce a correct implementation and maintain it over time.
Re: The One Billion Row Challenge
On Thursday, 11 January 2024 at 08:57:43 UTC, Christian Köstlin wrote: Did someone already try to do this in dlang? I guess it will be very hard to beat the java solutions running with graalvm! https://news.ycombinator.com/item?id=38851337 Kind regards, Christian I think the C++ people have already beaten Java's performance https://github.com/buybackoff/1brc?tab=readme-ov-file#native
Re: Threading challenge: calculate fib(45) while spinning
On Friday, 15 October 2021 at 03:35:44 UTC, jfondren wrote: The book, "The Go Programming Language" has this simple goroutine example: [...] Here is a similar implementation using the concurrency library: ```d import concurrency; import concurrency.stream; import concurrency.sender : justFrom; import concurrency.operations : via, race; import concurrency.thread : ThreadSender; import core.time : msecs; import std.stdio : writef, writefln, stdout; import core.thread : Thread; void main() @safe { enum chars = `-\|/`; auto spinner = infiniteStream(0) .scan((int acc, int _) => acc + 1, 0) .collect((int i) shared @trusted { writef("\r%c", chars[i % chars.length]); stdout.flush(); Thread.sleep(100.msecs); }) .via(ThreadSender()); enum n = 45; auto work = justFrom(() => fib(n)); auto result = race(spinner, work).syncWait.value; writefln("\rFibonacci(%d) = %d", n, result.get); } int fib(int x) pure @safe @nogc { if (x < 2) return x; return fib(x - 1) + fib(x - 2); } ``` Go has language support for this, so it is a bit unfair to compare. But this code will properly handle errors (in case `writef` or `flush` throws), as well as have an explicit synchronization point so that the final `writefln` always runs after the spinner is done.
Re: Threading challenge: calculate fib(45) while spinning
On 10/15/21 10:01 AM, Ali Çehreli wrote: > writefln!"\rFibonacci(%d) = %d"(n, fibN); That '\r' bothered me because the actual work needs to know what the spinner is doing to clear its remaining character. I would expect the original go code had the same problem. -Steve
Re: Threading challenge: calculate fib(45) while spinning
On 10/14/21 8:54 PM, Ali Çehreli wrote: >writefln!"\rFibonacci(%d) = %d"(n, fibN); That '\r' bothered me because the actual work needs to know what the spinner is doing to clear its remaining character. >receiveTimeout(delay, > (OwnerTerminated msg) { And there is a race condition because the spinner can print an extra character by the time it receives the OwnerTerminated message. (You can observe this by adding e.g. Thread.sleep(300.msecs) after the "\rFibonacci..." line above.) So, I improved it by removing both of those concerns as well as adding the following: - An optional message when spinning (it can be further improved because there is an extra space character if the message is empty) - A withSpinner() function to work with any delegate The only requirement is that the delegate should not output to stdout if we want a clean output. import std.stdio : stdout, writef, writefln; import std.concurrency : receiveTimeout, send, spawn; import std.traits : ReturnType; import core.thread : Duration, msecs, Thread; import std.range : cycle, repeat, walkLength; import std.format : format; void main() { enum n = 45; int fibN; // Left mutable not to complicate the example withSpinner({ fibN = fib(n); // slow }, format!"Calculating fib(%s)"(n)); writefln!"Fibonacci(%d) = %d"(n, fibN); } // The delegate 'dg' should not output to stdout.
void withSpinner(Dg)(Dg dg, string message = null, Duration delay = 100.msecs) { shared(bool) spinnerDone = false; auto s = spawn(&spinner, message, delay, &spinnerDone); // Do work while spinning dg(); // Tell the spinner to stop (the value does not matter) s.send(0x0FF); // Busy(ish) wait until the spinner is done while (!spinnerDone) { Thread.yield(); } } void spinner(string message, Duration delay, shared(bool)* done) { foreach (c; `-\|/`.cycle) { if (receiveTimeout(delay, (int _) {})) { // Stop request received // Clear the spinning message writef("\r%s \r", ' '.repeat(walkLength(message))); // Tell the user we are done *done = true; return; } writef!"\r%s %c"(message, c); stdout.flush(); } } auto fib(int x) { if (x < 2) { return x; } return fib(x-1) + fib(x-2); } Ali
Re: Threading challenge: calculate fib(45) while spinning
On 10/14/21 11:35 PM, jfondren wrote: The book, "The Go Programming Language" has this simple goroutine example: ```go func main() { go spinner(100 * time.Millisecond) const n = 45 fibN := fib(n) // slow fmt.Printf("\rFibonacci(%d) = %d\n", n, fibN) } func spinner(delay time.Duration) { for { for _, r := range `-\|/` { fmt.Printf("\r%c", r) time.Sleep(delay) } } } func fib(x int) int { if x < 2 { return x } return fib(x-1) + fib(x-2) } ``` Attempt #1, with std.concurrency: ```d import std.concurrency : spawn; import core.thread : Thread; import std.stdio : writefln, writef, stdout; import std.datetime : msecs, Duration; void main() @safe { (() @trusted { spawn(&spinner, 100.msecs); })(); const n = 45; const fibN = fib(n); // slow writefln!"\rFibonacci(%d) = %d"(n, fibN); } void spinner(Duration delay) @safe { (() @trusted { Thread.getThis.isDaemon(true); })(); while (true) { foreach (char c; `-\|/`) { writef!"\r%c"(c); (() @trusted { stdout.flush; })(); (() @trusted { Thread.sleep(delay); })(); } } } int fib(int x) pure @safe @nogc { if (x < 2) return x; return fib(x - 1) + fib(x - 2); } ``` This version has two problems: 1. a race condition with `isDaemon`: if `main()` ends before `isDaemon(true)` is called, then the program never ends because the kill-non-daemon-threads module destructor is called while the new thread isn't a daemon thread. You can also just spawn a thread directly with `Thread`, which I believe allows you to set the daemon-ness from `main`. 2. it crashes about 10% of the time on exit (in dmd, gdc, and ldc). valgrind on a gdc build complains about "Conditional jump or move depends on uninitialised value(s)" early on. The crash is likely because you are using D i/o utilities while the runtime is shutting down. Technically it shouldn't cause a problem, but possibly there are things that are needed deep inside `writef`. If you switch to `printf`, it will probably work. -Steve
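Steve's closing `printf` suggestion would look roughly like this in the spinner. This is only a sketch, on the assumption that `core.stdc.stdio` stays usable while the D runtime shuts down (the daemon-flag race from problem 1 is unaffected); the loop is bounded here so it can run standalone:

```d
import core.stdc.stdio : fflush, printf, stdout;
import core.thread : Thread;
import core.time : msecs;

// One revolution of the spinner using C stdio instead of writef,
// so no D runtime machinery is involved in the output path.
void spinOnce()
{
    foreach (char c; `-\|/`)
    {
        printf("\r%c", c);  // char promotes to int through C varargs
        fflush(stdout);     // stdout here is the C FILE*
        Thread.sleep(50.msecs);
    }
}

void main()
{
    spinOnce();
    printf("\rdone\n");
}
```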
Re: Threading challenge: calculate fib(45) while spinning
On Friday, 15 October 2021 at 03:54:17 UTC, Ali Çehreli wrote: On 10/14/21 8:35 PM, jfondren wrote: [...] Here is one that uses receiveTimeout and OwnerTerminated: import std.stdio; import std.concurrency; import core.thread; void main() { spawnLinked(&spinner, 100.msecs); enum n = 45; const fibN = fib(n); // slow writefln!"\rFibonacci(%d) = %d"(n, fibN); } void spinner(const(Duration) delay) { for (;;) { foreach (r; `-\|/`) { writef!"\r%c"(r); stdout.flush(); bool done; receiveTimeout(delay, (OwnerTerminated msg) { done = true; }); if (done) { return; } } } } auto fib(int x) { if (x < 2) { return x; } return fib(x-1) + fib(x-2); } Ali This is a "similar" approach to what Erlang does. I have always liked it ☀️
Re: Threading challenge: calculate fib(45) while spinning
On 10/14/21 9:17 PM, jfondren wrote: On Friday, 15 October 2021 at 03:54:17 UTC, Ali Çehreli wrote: On 10/14/21 8:35 PM, jfondren wrote: The book, "The Go Programming Language" has this simple goroutine example: Here is one that uses receiveTimeout and OwnerTerminated: Very nice, replacing Thread.sleep with receiveTimeout and getting graceful interruption for free. This also doesn't crash. Cool. :) Actually, it can be shorter by checking the return value of receiveTimeout: if (receiveTimeout(delay, (OwnerTerminated msg) {})) { return; } I didn't use this method earlier because I was afraid an unexpected message might make receiveTimeout return 'true'. But I've tested just now: Only the expected OwnerTerminated makes it return 'true'. Ali
Re: Threading challenge: calculate fib(45) while spinning
On Friday, 15 October 2021 at 03:54:17 UTC, Ali Çehreli wrote: On 10/14/21 8:35 PM, jfondren wrote: The book, "The Go Programming Language" has this simple goroutine example: Here is one that uses receiveTimeout and OwnerTerminated: Very nice, replacing Thread.sleep with receiveTimeout and getting graceful interruption for free. This also doesn't crash.
Re: Threading challenge: calculate fib(45) while spinning
On 10/14/21 8:35 PM, jfondren wrote: The book, "The Go Programming Language" has this simple goroutine example: Here is one that uses receiveTimeout and OwnerTerminated: import std.stdio; import std.concurrency; import core.thread; void main() { spawnLinked(&spinner, 100.msecs); enum n = 45; const fibN = fib(n); // slow writefln!"\rFibonacci(%d) = %d"(n, fibN); } void spinner(const(Duration) delay) { for (;;) { foreach (r; `-\|/`) { writef!"\r%c"(r); stdout.flush(); bool done; receiveTimeout(delay, (OwnerTerminated msg) { done = true; }); if (done) { return; } } } } auto fib(int x) { if (x < 2) { return x; } return fib(x-1) + fib(x-2); } Ali
Threading challenge: calculate fib(45) while spinning
The book, "The Go Programming Language" has this simple goroutine example: ```go func main() { go spinner(100 * time.Millisecond) const n = 45 fibN := fib(n) // slow fmt.Printf("\rFibonacci(%d) = %d\n", n, fibN) } func spinner(delay time.Duration) { for { for _, r := range `-\|/` { fmt.Printf("\r%c", r) time.Sleep(delay) } } } func fib(x int) int { if x < 2 { return x } return fib(x-1) + fib(x-2) } ``` Attempt #1, with std.concurrency: ```d import std.concurrency : spawn; import core.thread : Thread; import std.stdio : writefln, writef, stdout; import std.datetime : msecs, Duration; void main() @safe { (() @trusted { spawn(&spinner, 100.msecs); })(); const n = 45; const fibN = fib(n); // slow writefln!"\rFibonacci(%d) = %d"(n, fibN); } void spinner(Duration delay) @safe { (() @trusted { Thread.getThis.isDaemon(true); })(); while (true) { foreach (char c; `-\|/`) { writef!"\r%c"(c); (() @trusted { stdout.flush; })(); (() @trusted { Thread.sleep(delay); })(); } } } int fib(int x) pure @safe @nogc { if (x < 2) return x; return fib(x - 1) + fib(x - 2); } ``` This version has two problems: 1. a race condition with `isDaemon`: if `main()` ends before `isDaemon(true)` is called, then the program never ends because the kill-non-daemon-threads module destructor is called while the new thread isn't a daemon thread. 2. it crashes about 10% of the time on exit (in dmd, gdc, and ldc). valgrind on a gdc build complains about "Conditional jump or move depends on uninitialised value(s)" early on.
Attempt #2, with std.parallelism: ```d import std.parallelism : task, taskPool; import core.thread : Thread; import std.stdio : writefln, writef, stdout; import std.datetime : msecs, Duration; void main() @safe { auto spin = task!spinner(100.msecs); taskPool.put(spin); const n = 45; const fibN = fib(n); // slow writefln!"\rFibonacci(%d) = %d"(n, fibN); } void spinner(Duration delay) @safe { while (true) { foreach (char c; `-\|/`) { writef!"\r%c"(c); (() @trusted { stdout.flush; })(); (() @trusted { Thread.sleep(delay); })(); } } } int fib(int x) pure @safe @nogc { if (x < 2) return x; return fib(x - 1) + fib(x - 2); } ``` This version continues to spin after the Fibonacci result is printed, despite https://dlang.org/phobos/std_parallelism.html#.taskPool saying that `taskPool` worker threads are daemon by default, and despite various attempts to add `isDaemon(true)` calls. Is there a D version without these problems, and without varying substantially from the Go version (by e.g. having the spinner poll to see if it should exit gracefully)?
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 14:56:14 UTC, burt wrote: On Saturday, 30 January 2021 at 14:41:59 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote: [...] cast as uint and shift. cast the result as ubyte[4]. obviously, that works for n=4 with uint and n=8 for ulong, only. Yes I used to do this, but then I needed it for n > 8. You can try something like this: https://run.dlang.io/is/POQgnb import std.range : cycle, take, drop; import std.algorithm : copy; import std.stdio; version (LittleEndian) ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation){ typeof(return) result; array[] .cycle() .drop(n - (rotation / 8) % n) .take(n) .copy(result[]); const ubyte bit_rotation = rotation % 8; enum ubyte full = 0b1111_1111; if(bit_rotation == 0) return result; ubyte next_prefix(const ubyte elm){ const ubyte suffix = (elm & ~(full << bit_rotation)); const ubyte prefix = cast(ubyte)(suffix << (8 - bit_rotation)); return prefix; } ubyte prefix = next_prefix(result[$-1]); foreach(ref ubyte elm; result[]){ const new_prefix = next_prefix(elm); elm = (elm >> bit_rotation) | prefix; prefix = new_prefix; } return result; } void main(){ ubyte[4] x = [ 0b00011000, 0b0011, 0b00010101, 0b0010, ]; writefln!"%(%8b,\n%)"(x.rotateRight(4)); }
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 14:56:14 UTC, burt wrote: On Saturday, 30 January 2021 at 14:41:59 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote: [...] cast as uint and shift. cast the result as ubyte[4]. obviously, that works for n=4 with uint and n=8 for ulong, only. Yes I used to do this, but then I needed it for n > 8. As suggested in the other answer, BitArray may be the best generic solution.
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 14:41:59 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote: [...] cast as uint and shift. cast the result as ubyte[4]. obviously, that works for n=4 with uint and n=8 for ulong, only. Yes I used to do this, but then I needed it for n > 8.
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 14:17:06 UTC, Paul Backus wrote: On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote: [...] Now I want to bit-rotate the array as if it is one big integer. You may find `std.bitmanip.BitArray` useful for this: http://phobos.dpldocs.info/std.bitmanip.BitArray.html Thank you, this is indeed what I am looking for! For future reference, this is how I implemented it: ```d ubyte[n] rotateRight(size_t n)(ubyte[n] x, uint rotation) { import std.bitmanip : BitArray; ubyte[n] x2; foreach (i, value; x) // have to swap because of endianness x2[n - 1 - i] = value; auto copy = x2; auto bitArray1 = BitArray(cast(void[]) x2[], n * 8); auto bitArray2 = BitArray(cast(void[]) copy[], n * 8); bitArray1 >>= rotation; bitArray2 <<= n * 8 - rotation; bitArray1 |= bitArray2; foreach (i, value; x2) // swap back x[n - 1 - i] = value; return x; } ubyte[4] x = [ 0b00011000, 0b0011, 0b00010101, 0b0010, ]; writefln!"%(%8b,\n%)"(x.rotateRight(4)); ```
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote: On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote: I have a static array of `ubyte`s of arbitrary size: ```d ubyte[4] x = [ // in reality, ubyte[64] 0b1000, 0b0001, 0b00010101, 0b0010, ]; ``` Now I want to bit-rotate the array as if it is one big integer. So: ```d ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation) { // ? } // same for rotateLeft ubyte[4] y = [ 0b1001, 0b0100, 0b, 0b10001010, ]; assert(x.rotateRight(9) == y); assert(y.rotateLeft(9) == x); ``` Any ideas how this could be achieved? I.e. what should go at the "?" for rotateRight and rotateLeft? cast as uint and shift. cast the result as ubyte[4]. obiously, that works for n=4 with uint and n=8 for ulong, only.
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote: I have a static array of `ubyte`s of arbitrary size: ```d ubyte[4] x = [ // in reality, ubyte[64] 0b1000, 0b0001, 0b00010101, 0b0010, ]; ``` Now I want to bit-rotate the array as if it is one big integer. So: ```d ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation) { // ? } // same for rotateLeft ubyte[4] y = [ 0b1001, 0b0100, 0b, 0b10001010, ]; assert(x.rotateRight(9) == y); assert(y.rotateLeft(9) == x); ``` Any ideas how this could be achieved? I.e. what should go at the "?" for rotateRight and rotateLeft? cast as uint and shift. cast the result as ubyte[4].
Re: Bit rotation question/challenge
On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote: I have a static array of `ubyte`s of arbitrary size: ```d ubyte[4] x = [ // in reality, ubyte[64] 0b1000, 0b0001, 0b00010101, 0b0010, ]; ``` Now I want to bit-rotate the array as if it is one big integer. You may find `std.bitmanip.BitArray` useful for this: http://phobos.dpldocs.info/std.bitmanip.BitArray.html
Bit rotation question/challenge
I have a static array of `ubyte`s of arbitrary size: ```d ubyte[4] x = [ // in reality, ubyte[64] 0b1000, 0b0001, 0b00010101, 0b0010, ]; ``` Now I want to bit-rotate the array as if it is one big integer. So: ```d ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation) { // ? } // same for rotateLeft ubyte[4] y = [ 0b1001, 0b0100, 0b, 0b10001010, ]; assert(x.rotateRight(9) == y); assert(y.rotateLeft(9) == x); ``` Any ideas how this could be achieved? I.e. what should go at the "?" for rotateRight and rotateLeft?
Challenge
I recently came across an interesting exercise. Given a sequence of positive numbers, each of which belongs to the set { 2^^i * 3^^j * 5^^k | i, j, k ≥ 0 }. The sequence is ordered in ascending order. The beginning looks like this: { 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, ... } The goal is to get the Nth element of the sequence. For example, for N = 10, the answer is 12. On the Internet, I found an elegant and incredibly fast solution in Haskell: ``` mergeUniq :: Ord a => [a] -> [a] -> [a] mergeUniq (x:xs) (y:ys) = case x `compare` y of EQ -> x : mergeUniq xs ys LT -> x : mergeUniq xs (y:ys) GT -> y : mergeUniq (x:xs) ys powers :: [Integer] powers = 1 : expand 2 `mergeUniq` expand 3 `mergeUniq` expand 5 where expand factor = (factor *) <$> powers main = print $ powers!!99 ``` On my machine, the 100th element is found in 0.215s. OMG! After that, I spent almost the entire day searching for at least an equally fast solution in D. My first attempt was simple and quite pretty, but awfully slow: ``` import std.stdio; import std.array; import std.bigint; import std.algorithm; auto generate(int n) { BigInt[] nums = [BigInt("1")]; foreach(i; 0..n) { nums = merge(nums, [nums[i] * 2, nums[i] * 3, nums[i] * 5]).uniq().array(); } return nums[n - 1]; } void main() { writeln(generate(5000)); } ``` 0.275s for the 5000th element. At the end of the day, I managed to catch up with and even overtake Haskell by emulating its laziness with ranges and a funny trick. Victory! :) If you find this interesting, I will publish my latest version. And feel free to find your own solution. ;)
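For comparison, the classic three-index construction of this sequence (often attributed to Dijkstra) sidesteps lazy list merging entirely and does O(n) work. This is a sketch of that standard algorithm, not the poster's withheld version; `hamming` is just an illustrative name:

```d
import std.algorithm : min;
import std.bigint;
import std.stdio;

// h[k] is the k-th 5-smooth number: each new element is the smallest
// not-yet-used multiple of 2, 3 or 5 of an earlier element.
BigInt hamming(size_t n)
{
    auto h = new BigInt[](n);
    h[0] = 1;
    size_t i2, i3, i5;              // index of the next element to multiply
    BigInt n2 = 2, n3 = 3, n5 = 5;  // next candidate products
    foreach (k; 1 .. n)
    {
        h[k] = min(n2, n3, n5);
        // advance every stream that produced the minimum,
        // which also skips duplicates like 6 = 2*3 = 3*2
        if (h[k] == n2) n2 = h[++i2] * 2;
        if (h[k] == n3) n3 = h[++i3] * 3;
        if (h[k] == n5) n5 = h[++i5] * 5;
    }
    return h[n - 1];
}

void main()
{
    writeln(hamming(10));   // 12, matching the example in the post
    writeln(hamming(5000));
}
```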
Re: A Friendly Challenge for D
On Tuesday, 16 October 2018 at 21:12:39 UTC, welkam wrote: On Tuesday, 16 October 2018 at 20:58:54 UTC, Jabari Zakiya wrote: And they could be modded to catch semantics like this and produce faster code. Its hard to prove that you will only write 1 or 0 in the array and even if you write such pass it wont fire very often. So slower compile times for no gain in common case. My guess is the difference in code execution times based on the ALU operation pipeline. and my guess is that CPU knows that its going to write the same value to memory so it ignores that write. Exactly. It does the same instruction twice, and once it's set the first time it can ignore setting it the second time. However with seg[k] = 1 , the mov instruction always moves, because it doesn't know|care what the previous value was. That's my guess. Life is too short for such things. Today compilers are good at optimizing low level stuff. Those decades of improvements add up. Amen! :-)
Re: A Friendly Challenge for D
On Tuesday, 16 October 2018 at 20:58:54 UTC, Jabari Zakiya wrote: And they could be modded to catch semantics like this and produce faster code. It's hard to prove that you will only write 1 or 0 into the array, and even if you wrote such a pass it wouldn't fire very often. So: slower compile times for no gain in the common case. My guess is the difference in code execution times based on the ALU operation pipeline. and my guess is that the CPU knows it's going to write the same value to memory, so it ignores that write. I used to know most of these tricks back in the Pentium days, but I haven't had a need to keep up with Intel Ixx core chips machine|compiler level instruction optimizations. Life is too short for such things. Today compilers are good at optimizing low level stuff. Those decades of improvements add up.
Re: A Friendly Challenge for D
On Tuesday, 16 October 2018 at 20:38:24 UTC, welkam wrote: On Tuesday, 16 October 2018 at 17:57:23 UTC, Jabari Zakiya wrote: This is the exact same behavior I found with the Nim compiler too. Well Nim compiler is more like translator. It translates Nim code to c or c++. Since gcc was responsible for optimizations and instruction selection it would be more correct to say that gcc behavior was same as llvm. How many other unknown gotchas are in the compilers? :-( the number of inlining shall be 3 https://www.youtube.com/watch?v=s4wnuiCwTGU Yes, it finally is the compiler doing it, and Clang does the same thing. And they could be modded to catch semantics like this and produce faster code. My guess is the difference in code execution times based on the ALU operation pipeline. seg[k] = seg[k] | 1 can be done "inplace" in the pipeline, where the lsb can to be changed in mem, whereas seg[k] = 1 may require a separate constant|value creation and memory write, and bypass the ALU. I used to know most of these tricks back in the Pentium days, but I haven't had a need to keep up with Intel Ixx core chips machine|compiler level instruction optimizations.
Re: A Friendly Challenge for D
On Tuesday, 16 October 2018 at 17:57:23 UTC, Jabari Zakiya wrote: This is the exact same behavior I found with the Nim compiler too. Well, the Nim compiler is more like a translator: it translates Nim code to C or C++. Since gcc was responsible for optimizations and instruction selection, it would be more correct to say that gcc's behavior was the same as llvm's. How many other unknown gotchas are in the compilers? :-( the number of inlining shall be 3 https://www.youtube.com/watch?v=s4wnuiCwTGU
Re: A Friendly Challenge for D
On Tuesday, 16 October 2018 at 07:09:05 UTC, Vijay Nayar wrote: On Monday, 15 October 2018 at 22:17:57 UTC, Jabari Zakiya wrote: $ dub build --compiler=ldc2 -b=release && echo "30" | ./twinprimes Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 175324676; resgroups = 1298702 each 135 threads has nextp[2 x 5566] array setup time = 1 ms, 864 μs, and 7 hnsecs perform twinprimes ssoz sieve sieve time = 196 ms, 566 μs, and 5 hnsecs last segment = 53518 resgroups; segment slices = 20 total twins = 9210144; last twin = 299712+/- 1 total time = 198 ms, 431 μs, and 2 hnsecs My understanding is that the difference in performance is largely due to slightly better optimization from the LLVM-based ldc2 compiler, where I believe Nim is using gcc. Here's what I get on my system. $ echo 3_000_000_000 | ./twinprimes_test7yc.0180.gcc821 Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 175324676; resgroups = 1298702 each 135 threads has nextp[2 x 5566] array setup time = 0.000 secs perform twinprimes ssoz sieve sieve time = 0.144 secs last segment = 53518 resgroups; segment slices = 20 total twins = 9210144; last twin = 299712+/-1 total time = 0.144 secs Hey Vijay, I found subtle static typing errors. When N < 2^32 - 1 (4,294,967,295) the twinprime count and lastprime values are correct. When N > 2^32 - 1 both values start to be incorrect.
Examples: For N = 4,000,000,000 you get correct answers $ echo 40 | ./twinprimes_ssoz Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 233766231; resgroups = 1731602 each 135 threads has nextp[2 x 6332] array setup time = 667 μs and 5 hnsecs perform twinprimes ssoz sieve sieve time = 181 ms, 277 μs, and 6 hnsecs last segment = 27666 resgroups; segment slices = 27 total twins = 11944438; last twin = 399798+/- 1 total time = 181 ms, 945 μs, and 1 hnsec Examples: For N = 5,000,000,000 you get wrong answers $ echo 50 | ./twinprimes_ssoz Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 292207792; resgroups = 2164503 each 135 threads has nextp[2 x 6999] array setup time = 744 μs and 7 hnsecs perform twinprimes ssoz sieve sieve time = 225 ms, 323 μs, and 4 hnsecs last segment = 1815 resgroups; segment slices = 34 total twins = 14618173; last twin = 705034592+/- 1 total time = 226 ms, 68 μs, and 1 hnsec These are correct answers for N = 5,000,000,000 $ echo 50 | ./twinprimes_test7yc.0180.gcc821 Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 292207792; resgroups = 2164503 each 135 threads has nextp[2 x 6999] array setup time = 0.001 secs perform twinprimes ssoz sieve sieve time = 0.237 secs last segment = 1815 resgroups; segment slices = 34 total twins = 14618166; last twin = 499860+/-1 total time = 0.237 secs The first easy check should show the lastprime value just a little smaller than N. See the timing|results table I posted earlier to see correct outputs as N gets bigger. It took me a looong time, and was a big PITA, to get the correct static typing in Nim too, especially coming from a dynamic language like Ruby.
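The symptom described above (correct results below 2^32, garbage above) is the classic signature of a 32-bit integer type silently wrapping. A minimal D sketch of the failure mode (the variable names here are illustrative, not the program's actual code):

```d
import std.stdio : writeln;

void main()
{
    ulong n = 5_000_000_000;     // N > 2^32 - 1
    uint wrapped = cast(uint) n; // a 32-bit type keeps only the low 32 bits
    writeln(wrapped);            // prints 705032704, i.e. 5_000_000_000 mod 2^32
    assert(wrapped == n % 4_294_967_296);
}
```

Notably, the bogus last twin of 705034592 reported above sits just past 5_000_000_000 mod 2^32 = 705032704, which is consistent with a count or index somewhere being held in a 32-bit type.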
Re: A Friendly Challenge for D
On Tuesday, 16 October 2018 at 16:57:12 UTC, welkam wrote: So I ran the profiler, and 97% of the time is spent in the void twinsSieve function, with the hotspots being the seg[k] = seg[k] | 1; lines. Since seg[k] can only be 1 or 0, I removed the or operation. And the results are... cue the drum-roll... 5% slower. I thought that all of my studying was getting somewhere, that I was beginning to understand things, but no. Performing an OR operation and then storing the data is faster than just storing the data. [sarcasm] Of course it makes sense [/sarcasm] I looked at the assembler and nothing changed except that orb $0x1,(%rbx,%rdi,1) changed to movb $0x1,(%rbx,%rdi,1). I'm completely lost. This is the exact same behavior I found with the Nim compiler too. seg[k] = 1 is slower than seg[k] = seg[k] or 1, which is why it's coded like that. Thanks for finding the exact instruction difference. How many other unknown gotchas are in the compilers? :-(
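The effect is easy to poke at in isolation. Below is a hedged micro-benchmark sketch in D (function names and sizes are illustrative; results will vary by CPU, compiler, and flags, and an optimizer may well treat the two loops identically):

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

void main()
{
    enum len = 1 << 20;
    auto seg = new ubyte[len];

    // unconditional store: the inner loop becomes a plain movb
    void storeOnly() { foreach (k; 0 .. len) seg[k] = 1; }

    // read-modify-write: the inner loop becomes an orb against memory
    void orStore() { foreach (k; 0 .. len) seg[k] = seg[k] | 1; }

    auto times = benchmark!(storeOnly, orStore)(200);
    writefln("movb variant: %s  orb variant: %s", times[0], times[1]);
}
```

Either way, both variants leave every element equal to 1, so the comparison is purely about the store instruction chosen.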
Re: A Friendly Challenge for D
On Tuesday, 16 October 2018 at 07:09:05 UTC, Vijay Nayar wrote: D has multiple compilers, but for the speed of the finished binary, LDC2 is generally recommended. I used version 1.11.0. https://github.com/ldc-developers/ldc/releases/tag/v1.11.0 I was using DUB to manage the project, but to build the stand-alone file from the gist link, use this command: $ ldc2 -release -O3 twinprimes_ssoz.d And to run it: $ echo "30" | ./twinprimes_ssoz It'd be interesting to see if LTO or PGO generated an improvement. It looks like it could in this case, as it might optimize some of the inner loops. LTO is easy, enable it with: -flto=<thin|full> -defaultlib=phobos2-ldc-lto,druntime-ldc-lto (see: https://github.com/ldc-developers/ldc/releases/tag/v1.9.0). I've been using 'thin' on OSX, 'full' on Linux. PGO is a bit more work, but not too bad. A good primer is here: https://johanengelen.github.io/ldc/2016/07/15/Profile-Guided-Optimization-with-LDC.html --Jon
Re: A Friendly Challenge for D
So I ran the profiler, and 97% of the time is spent in the void twinsSieve function, with the hotspots being the seg[k] = seg[k] | 1; lines. Since seg[k] can only be 1 or 0, I removed the or operation. And the results are... cue the drum-roll... 5% slower. I thought that all of my studying was getting somewhere, that I was beginning to understand things, but no. Performing an OR operation and then storing the data is faster than just storing the data. [sarcasm] Of course it makes sense [/sarcasm] I looked at the assembler and nothing changed except that orb $0x1,(%rbx,%rdi,1) changed to movb $0x1,(%rbx,%rdi,1). I'm completely lost.
Re: A Friendly Challenge for D
On Monday, 15 October 2018 at 22:17:57 UTC, Jabari Zakiya wrote: $ dub build --compiler=ldc2 -b=release && echo "30" | ./twinprimes Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 175324676; resgroups = 1298702 each 135 threads has nextp[2 x 5566] array setup time = 1 ms, 864 μs, and 7 hnsecs perform twinprimes ssoz sieve sieve time = 196 ms, 566 μs, and 5 hnsecs last segment = 53518 resgroups; segment slices = 20 total twins = 9210144; last twin = 299712+/- 1 total time = 198 ms, 431 μs, and 2 hnsecs My understanding is that the difference in performance is largely due to slightly better optimization from the LLVM based ldc2 compiler, where I believe Nim is using gcc. Here's what I get on my system. $ echo 3_000_000_000 | ./twinprimes_test7yc.0180.gcc821 Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 175324676; resgroups = 1298702 each 135 threads has nextp[2 x 5566] array setup time = 0.000 secs perform twinprimes ssoz sieve sieve time = 0.144 secs last segment = 53518 resgroups; segment slices = 20 total twins = 9210144; last twin = 299712+/-1 total time = 0.144 secs Could you list your hardware, D ver, compiler specs. I will run your code on my system with your D version and compiler, if I can. Excellent work! D has multiple compilers, but for the speed of the finished binary, LDC2 is generally recommended. I used version 1.11.0. https://github.com/ldc-developers/ldc/releases/tag/v1.11.0 I was using DUB to manage the project, but to build the stand-alone file from the gist link, use this command: $ ldc2 -release -O3 twinprimes_ssoz.d And to run it: $ echo "30" | ./twinprimes_ssoz Running the program a bunch of times, I get variable results, mostly between 195ms and 250ms. Running the Nim version, I also get variable results, mostly between 230ms and 280ms. My system is an Alienware 14x R2 from 2012. 
Specs can be found here: https://www.cnet.com/products/alienware-m14xr2-14-core-i7-3630qm-8-gb-ram-750-gb-hdd/specs/
Re: A Friendly Challenge for D
$ dub build --compiler=ldc2 -b=release && echo "30" | ./twinprimes Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 175324676; resgroups = 1298702 each 135 threads has nextp[2 x 5566] array setup time = 1 ms, 864 μs, and 7 hnsecs perform twinprimes ssoz sieve sieve time = 196 ms, 566 μs, and 5 hnsecs last segment = 53518 resgroups; segment slices = 20 total twins = 9210144; last twin = 299712+/- 1 total time = 198 ms, 431 μs, and 2 hnsecs My understanding is that the difference in performance is largely due to slightly better optimization from the LLVM based ldc2 compiler, where I believe Nim is using gcc. Here's what I get on my system. $ echo 3_000_000_000 | ./twinprimes_test7yc.0180.gcc821 Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 175324676; resgroups = 1298702 each 135 threads has nextp[2 x 5566] array setup time = 0.000 secs perform twinprimes ssoz sieve sieve time = 0.144 secs last segment = 53518 resgroups; segment slices = 20 total twins = 9210144; last twin = 299712+/-1 total time = 0.144 secs Could you list your hardware, D ver, compiler specs. I will run your code on my system with your D version and compiler, if I can. Excellent work!
Re: A Friendly Challenge for D
I don't actually understand the underlying algorithm, but I at least understand the flow of the program and the structure. The algorithm utilized depends heavily on using shared memory access, which can be done in D, but I definitely wouldn't call it idiomatic. In D, message passing is preferred, but it really can't be well demonstrated on your algorithm without a deeper understanding of the algorithm itself. A complete working version can be found at: https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7 I modified both versions of the program to utilize the pgParameters13 for more of an apples-to-apples comparison. The final results are as follows: $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim && echo "30" | ./twinprimes_ssoz Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 175324676; resgroups = 1298702 each 135 threads has nextp[2 x 5566] array setup time = 0.000 secs perform twinprimes ssoz sieve sieve time = 0.222 secs last segment = 53518 resgroups; segment slices = 20 total twins = 9210144; last twin = 299712+/-1 total time = 0.223 secs $ dub build --compiler=ldc2 -b=release && echo "30" | ./twinprimes Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 175324676; resgroups = 1298702 each 135 threads has nextp[2 x 5566] array setup time = 1 ms, 864 μs, and 7 hnsecs perform twinprimes ssoz sieve sieve time = 196 ms, 566 μs, and 5 hnsecs last segment = 53518 resgroups; segment slices = 20 total twins = 9210144; last twin = 299712+/- 1 total time = 198 ms, 431 μs, and 2 hnsecs My understanding is that the difference in performance is largely due to slightly better optimization from the LLVM based ldc2 compiler, where I believe Nim is using gcc. Hey Great! I'll take some time and study your code. Nim currently allows using gcc or clang: --cc:gcc or --cc:clang, and compiles to C: nim c --cc:gcc... or C++: nim c++ --cc:gcc... 
You can compile in Nim with/without using garbage collection (GC) and choose the GC. This version needs GC to recover the memory created in each thread, or it will eat it up. See the operation of the program using htop/top to watch threads|mem at work. If D has a fast bitarray structure, try it to create the data arrays "prms" in proc "sozpg" and the "seg" arrays in "twin_sieves". This could allow for larger segment sizes|arrays that fit into cpu cache, reducing the number of segment slices to process. This would all have to be profiled and encapsulated in "selectPG", which adaptively selects the best PG to use. Here is a table of output data and performance times (in seconds). The results were produced on a System76 laptop with an Intel I7 6700HQ cpu, 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. I compiled the file "twinprimes_ssoz.nim" with Nim versions 0.18.0 and the just-released 0.19.0, using gcc 8.2.1 with Manjaro (Linux) in VirtualBox. I then ran the executables on my host Linux system (which only has gcc-4.9.8) to use all 8 threads. The output was produced on a "quiet" system, where I turned off the laptop and rebooted, and ran tests with only a terminal and spreadsheet open (to record results). The times are the "best" times from multiple runs. I also compare times from the program "primesieve" - https://primesieve.org/ which states on its site: primesieve is a free software program and C/C++ library that generates primes using a highly optimized sieve of Eratosthenes implementation. It counts the primes below 10^10 in just 0.45 seconds on an Intel Core i7-6700 CPU (4 x 3.4 GHz). primesieve can generate primes and prime k‑tuplets up to 2^64. (An optimized C/C++ version of twinprimes_ssoz would be nice to compare against too.)
| Number | Twinprimes | Last Twinprime | Nim 0.18.0 | Nim 0.19.0 | primesieve 9.7.1 |
|--|--|--|--|--|--|
| 1 x 10^09 | 3424506 | 199872 | 0.043 | 0.041 | 0.049 |
| 5 x 10^09 | 14618166 | 499860 | 0.219 | 0.226 | 0.248 |
| 1 x 10^10 | 27412679 | 999702 | 0.398 | 0.401 | 0.533 |
| 5 x 10^10 | 118903682 | 4999590 | 2.364 | 2.331 | 2.978 |
| 1 x 10^11 | 224376048 | 762 | | | |
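On the bitarray question above: D's standard library does have a packed bit array, std.bitmanip.BitArray, which stores one bit per entry instead of one byte, so a segment eight times larger fits in the same cache footprint. A hedged sketch of the basic operations (illustrative only, not wired into the sieve):

```d
import std.bitmanip : BitArray;
import std.stdio : writeln;

void main()
{
    // one segment: the same 65536 bytes hold 8x as many bit flags
    BitArray seg;
    seg.length = 65536 * 8;

    seg[42] = true;                 // mark residue 42 as composite
    writeln(seg[42], " ", seg[41]); // true false
}
```

Whether the extra shift/mask work per access beats the smaller cache footprint is exactly the kind of thing that would have to be profiled, as the post says.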
Re: A Friendly Challenge for D
On Sunday, 14 October 2018 at 10:51:11 UTC, Vijay Nayar wrote: Once I get the bugs out, I'm curious to see if any performance differences crop up. There's the theory that says they should be the same, and then there's the practice. I don't actually understand the underlying algorithm, but I at least understand the flow of the program and the structure. The algorithm utilized depends heavily on using shared memory access, which can be done in D, but I definitely wouldn't call it idiomatic. In D, message passing is preferred, but it really can't be well demonstrated on your algorithm without a deeper understanding of the algorithm itself. A complete working version can be found at: https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7 I modified both versions of the program to utilize the pgParameters13 for more of an apples-to-apples comparison. The final results are as follows: $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim && echo "30" | ./twinprimes_ssoz Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 175324676; resgroups = 1298702 each 135 threads has nextp[2 x 5566] array setup time = 0.000 secs perform twinprimes ssoz sieve sieve time = 0.222 secs last segment = 53518 resgroups; segment slices = 20 total twins = 9210144; last twin = 299712+/-1 total time = 0.223 secs $ dub build --compiler=ldc2 -b=release && echo "30" | ./twinprimes Enter integer number: threads = 8 each thread segment is [1 x 65536] bytes array twinprime candidates = 175324676; resgroups = 1298702 each 135 threads has nextp[2 x 5566] array setup time = 1 ms, 864 μs, and 7 hnsecs perform twinprimes ssoz sieve sieve time = 196 ms, 566 μs, and 5 hnsecs last segment = 53518 resgroups; segment slices = 20 total twins = 9210144; last twin = 299712+/- 1 total time = 198 ms, 431 μs, and 2 hnsecs My understanding is that the difference in performance is largely due to slightly better optimization from the LLVM based ldc2 
compiler, where I believe Nim is using gcc.
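The message-passing style mentioned above can be shown with a minimal sketch using std.concurrency, here summing array halves in worker threads (a toy illustration, not the twinprimes algorithm, which relies on shared segment arrays):

```d
import std.concurrency : spawn, send, receiveOnly, thisTid, Tid;
import std.stdio : writeln;

// worker: computes a partial sum over its slice and sends it back to the owner
void worker(Tid owner, immutable(int)[] slice)
{
    int partial = 0;
    foreach (v; slice) partial += v;
    owner.send(partial);
}

void main()
{
    immutable(int)[] data = [1, 2, 3, 4, 5, 6, 7, 8];
    auto half = data.length / 2;
    spawn(&worker, thisTid, data[0 .. half]);
    spawn(&worker, thisTid, data[half .. $]);

    // collect the two partial sums via messages, with no shared mutable state
    int total = receiveOnly!int() + receiveOnly!int();
    writeln(total); // prints 36
}
```

Immutable slices can be passed to spawn freely, which is why this style avoids the data races that shared-memory sieving has to manage by hand.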
Re: A Friendly Challenge for D
On Sunday, 14 October 2018 at 10:51:11 UTC, Vijay Nayar wrote: But as previous posters have said, the code is not really very different between Nim and D. Most of it is array manipulation and arithmetic operations, and not many of the features of either D or Nim are very different. Both turn into fast code, both have garbage collection, and both have generally similar operators and libraries for this kind of problem. The biggest differences I observed revolved not around the languages themselves, but around code style. Are you aware of DCompute? [1] I haven’t used it myself but since Jabari explicitly mentioned taking advantage of the GPU, it may be worth giving it a try. It should have an impact on performance and possibly on style as well, even though it will still be D. It would be nice if Nicholas could chime in on this thread. Bastiaan. [1] https://dlang.org/blog/2017/10/30/d-compute-running-d-on-the-gpu/
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 19:04:48 UTC, Jabari Zakiya wrote: On Saturday, 13 October 2018 at 18:31:57 UTC, Vijay Nayar wrote: On Saturday, 13 October 2018 at 18:14:20 UTC, Vijay Nayar wrote: On Saturday, 13 October 2018 at 18:05:45 UTC, Jabari Zakiya wrote: It may be also running into a hard time limit imposed on compilation that Nim had/has that prevented my code from initially compiling. I'm generating a lot of PG parameter constants at compile time, and it's doing a lot of number crunching and building larger and larger arrays of constants as the PG's get larger. Try compiling with successive PG's (just P5, then P5 and P7, etc) to see where it fails. That will let you know the code is working correctly, and that the compiler is choking either/and because of a hard time limit and/or memory limit. That's why I put in a compiler output statement in 'genPGparameters' to see the progression of the PG parameters being built by the compiler to initially find when the compiler started choking. You may also need to patch whatever facility in the D compiler chain that controls this too. It's P17, the biggest one that takes the longest to build in the Nim version. I actually don't know what memory limits exist for the D compiler at compile-time, so I may need to do some homework. It's not just DMD either. $ ldc2 twinprimes_ssoz.d ... generating parameters for P17 Killed $ gdc twinprimes_ssoz.d ... generating parameters for P17 gdc: internal compiler error: Killed (program cc1d) Please submit a full bug report, with preprocessed source if appropriate. See for instructions. $ dmd twinprimes_ssoz.d ... generating parameters for P17 Killed In the Nim code, starting line 91 is when the PG constants are being generate at compile time. - # Generate at compile time the parameters for PGs P5-P17. 
const parametersp5 = genPGparameters(5) const parametersp7 = genPGparameters(7) const parametersp11 = genPGparameters(11) const parametersp13 = genPGparameters(13) const parametersp17 = genPGparameters(17) - Can it compile just using P5 (the first line, others commented out), and then P7, etc? I'm not understanding your comments now. If you can get a working version up and running (with correct output) we can solve the P17 compiler issues later (or in a parallel forum thread), especially if you have to delve into the weeds of the compiler chain. In my mind (same with Nim process) getting working code using any PG is first order priority (because you'll need getting multi-threading working too). Once you can do that, by default, you can then use any generator you want if you create the correct parameters for it. That's one of the advantages of the algorithm, it's PG agnostic (as long as your hardware will accommodate it). So don't prioritize getting P17 to compile right now (in this thread). Create the working generic structure that can work with any PG first. Updated: https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7 I still get a few runtime errors likely from mistakes in my conversion for certain primes. I'll resolve those after I get back from the gym. But as previous posters have said, the code is not really very different between Nim and D. Most of it is array manipulation and arithmetic operations, and not many of the features of either D or Nim are very different. Both turn into fast code, both have garbage collection, and both have generally similar operators and libraries for this kind of problem. The biggest differences I observed revolved not around the languages themselves, but around code style. For example, can you put a loop and 3 additional statements on a single line in D? Yes. But it is considered to be not very readable code from a style perspective. Once I get the bugs out, I'm curious to see if any performance differences crop up. 
There's the theory that says they should be the same, and then there's the practice.
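The Nim `const parametersp5 = genPGparameters(5)` lines quoted above have a direct D analogue: initializing an `enum` (or `static immutable`) forces the function to run at compile time via CTFE. A toy sketch, where `genPGparameters` is a stand-in that only computes the residues coprime to the primorial, not the real program's full parameter set:

```d
import std.stdio : writeln;

// stand-in for the real generator: residues in [1, mod] coprime to 2, 3, 5
int[] genPGparameters(int p)
{
    int mod = 1;
    foreach (q; [2, 3, 5]) if (q <= p) mod *= q;
    int[] residues;
    foreach (r; 1 .. mod + 1)
    {
        bool coprime = true;
        foreach (q; [2, 3, 5]) if (q <= p && r % q == 0) coprime = false;
        if (coprime) residues ~= r;
    }
    return residues;
}

// evaluated entirely at compile time, like Nim's `const`
enum parametersP5 = genPGparameters(5);

void main()
{
    static assert(parametersP5.length == 8); // phi(30) = 8 residues
    writeln(parametersP5); // prints [1, 7, 11, 13, 17, 19, 23, 29]
}
```

This also shows why the real P17 case is so heavy: the compiler's constant-folding interpreter has to do all that number crunching itself, which is where both the Nim iteration limit and the D "Killed" failures come from.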
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 18:31:57 UTC, Vijay Nayar wrote: On Saturday, 13 October 2018 at 18:14:20 UTC, Vijay Nayar wrote: On Saturday, 13 October 2018 at 18:05:45 UTC, Jabari Zakiya wrote: It may be also running into a hard time limit imposed on compilation that Nim had/has that prevented my code from initially compiling. I'm generating a lot of PG parameter constants at compile time, and it's doing a lot of number crunching and building larger and larger arrays of constants as the PG's get larger. Try compiling with successive PG's (just P5, then P5 and P7, etc) to see where it fails. That will let you know the code is working correctly, and that the compiler is choking either/and because of a hard time limit and/or memory limit. That's why I put in a compiler output statement in 'genPGparameters' to see the progression of the PG parameters being built by the compiler to initially find when the compiler started choking. You may also need to patch whatever facility in the D compiler chain that controls this too. It's P17, the biggest one that takes the longest to build in the Nim version. I actually don't know what memory limits exist for the D compiler at compile-time, so I may need to do some homework. It's not just DMD either. $ ldc2 twinprimes_ssoz.d ... generating parameters for P17 Killed $ gdc twinprimes_ssoz.d ... generating parameters for P17 gdc: internal compiler error: Killed (program cc1d) Please submit a full bug report, with preprocessed source if appropriate. See for instructions. $ dmd twinprimes_ssoz.d ... generating parameters for P17 Killed In the Nim code, starting line 91 is when the PG constants are being generated at compile time. - # Generate at compile time the parameters for PGs P5-P17. 
const parametersp5 = genPGparameters(5) const parametersp7 = genPGparameters(7) const parametersp11 = genPGparameters(11) const parametersp13 = genPGparameters(13) const parametersp17 = genPGparameters(17) - Can it compile just using P5 (the first line, others commented out), and then P7, etc? I'm not understanding your comments now. If you can get a working version up and running (with correct output) we can solve the P17 compiler issues later (or in a parallel forum thread), especially if you have to delve into the weeds of the compiler chain. In my mind (same with Nim process) getting working code using any PG is first order priority (because you'll need getting multi-threading working too). Once you can do that, by default, you can then use any generator you want if you create the correct parameters for it. That's one of the advantages of the algorithm, it's PG agnostic (as long as your hardware will accommodate it). So don't prioritize getting P17 to compile right now (in this thread). Create the working generic structure that can work with any PG first.
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 18:14:20 UTC, Vijay Nayar wrote: On Saturday, 13 October 2018 at 18:05:45 UTC, Jabari Zakiya wrote: It may be also running into a hard time limit imposed on compilation that Nim had/has that prevented my code from initially compiling. I'm generating a lot of PG parameter constants at compile time, and it's doing a lot of number crunching and building larger and larger arrays of constants as the PG's get larger. Try compiling with successive PG's (just P5, then P5 and P7, etc) to see where it fails. That will let you know the code is working correctly, and that the compiler is choking either/and because of a hard time limit and/or memory limit. That's why I put in a compiler output statement in 'genPGparameters' to see the progression of the PG parameters being built by the compiler to initially find when the compiler started choking. You may also need to patch whatever facility in the D compiler chain that controls this too. It's P17, the biggest one that takes the longest to build in the Nim version. I actually don't know what memory limits exist for the D compiler at compile-time, so I may need to do some homework. It's not just DMD either. $ ldc2 twinprimes_ssoz.d ... generating parameters for P17 Killed $ gdc twinprimes_ssoz.d ... generating parameters for P17 gdc: internal compiler error: Killed (program cc1d) Please submit a full bug report, with preprocessed source if appropriate. See for instructions. $ dmd twinprimes_ssoz.d ... generating parameters for P17 Killed
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 18:14:20 UTC, Vijay Nayar wrote: On Saturday, 13 October 2018 at 18:05:45 UTC, Jabari Zakiya wrote: It may be also running into a hard time limit imposed on compilation that Nim had/has that prevented my code from initially compiling. I'm generating a lot of PG parameter constants at compile time, and it's doing a lot of number crunching and building larger and larger arrays of constants as the PG's get larger. Try compiling with successive PG's (just P5, then P5 and P7, etc) to see where it fails. That will let you know the code is working correctly, and that the compiler is choking either/and because of a hard time limit and/or memory limit. That's why I put in a compiler output statement in 'genPGparameters' to see the progression of the PG parameters being built by the compiler to initially find when the compiler started choking. You may also need to patch whatever facility in the D compiler chain that controls this too. It's P17, the biggest one that takes the longest to build in the Nim version. I actually don't know what memory limits exist for the D compiler at compile-time, so I may need to do some homework. Yes, that's what I figured, because that's when the Nim compiler initially balked. The variable that was patched in the Nim config file set a hard limit on the number of operations the compiler could do. It's used as a break switch on a runaway process (infinite loop, etc). It's probably something like that in the D compiler too, just guessing offhand. I hope it isn't a memory limit. However, the program will run fine without using P17. It's currently selected as the optimum PG for inputs greater than 15 trillion (15,000,000,000,000), so the program will run fine and produce correct output using P13 instead, but just not as fast.
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 18:05:45 UTC, Jabari Zakiya wrote: It may be also running into a hard time limit imposed on compilation that Nim had/has that prevented my code from initially compiling. I'm generating a lot of PG parameter constants at compile time, and it's doing a lot of number crunching and building larger and larger arrays of constants as the PG's get larger. Try compiling with successive PG's (just P5, then P5 and P7, etc) to see where it fails. That will let you know the code is working correctly, and that the compiler is choking either/and because of a hard time limit and/or memory limit. That's why I put in a compiler output statement in 'genPGparameters' to see the progression of the PG parameters being built by the compiler to initially find when the compiler started choking. You may also need to patch whatever facility in the D compiler chain that controls this too. It's P17, the biggest one that takes the longest to build in the Nim version. I actually don't know what memory limits exist for the D compiler at compile-time, so I may need to do some homework.
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 17:36:33 UTC, Vijay Nayar wrote: On Saturday, 13 October 2018 at 15:50:06 UTC, Vijay Nayar wrote: On Saturday, 13 October 2018 at 15:19:07 UTC, Jabari Zakiya wrote: On Saturday, 13 October 2018 at 14:32:33 UTC, welkam wrote: On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar wrote: [...] import algorithm thats all but then it spits out lib/nim/pure/algorithm.nim(144, 11) Error: interpretation requires too many iterations My mistake. I updated the file and forgot to include the 'import algorithm' directive. The file is now fixed to include it. Download the corrected version or patch your file accordingly. As stated in the file intro **YOU MUST DO THIS** to get it to compile with current Nim (they were supposed to fix this in this version 0.19.0 but didn't). To compile for nim versions <= 0.19.0 do following: 1) in file: ~/nim-0.19.0/compiler/vmdef.nim 2) set variable: MaxLoopIterations* = 1_000_000_000 (1 Billion or >) 3) then rebuild sysem: ./koch boot -d:release If you are using 'choosenim' to install Nim (highly advisable) the full path is: ~/.choosenim/toolchains/nim-0.19.0/compiler/vmdef.nim I'll post performance results from my laptop to give reference times to compare against. Ok, now it builds. I was previously following the build instructions from the Nim website and am not super clear what the "koch" tool does, but following your instructions, the program does build and run. I'll take a stab at making a D version. Interesting results so far. I have a partially converted program here: https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7 The interesting part is that during compilation (with the command "dmd twinprimes_ssoz.d"), the compilation will abort with the message "Killed" and no further information. That's a new one for me, so I'm looking into the cause. It may be also running into a hard time limit imposed on compilation that Nim had/has that prevented my code from initially compiling. 
I'm generating a lot of PG parameter constants at compile time, and it's doing a lot of number crunching and building larger and larger arrays of constants as the PG's get larger. Try compiling with successive PG's (just P5, then P5 and P7, etc) to see where it fails. That will let you know the code is working correctly, and that the compiler is choking either/and because of a hard time limit and/or memory limit. That's why I put in a compiler output statement in 'genPGparameters' to see the progression of the PG parameters being built by the compiler to initially find when the compiler started choking. You may also need to patch whatever facility in the D compiler chain that controls this too.
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 15:50:06 UTC, Vijay Nayar wrote: On Saturday, 13 October 2018 at 15:19:07 UTC, Jabari Zakiya wrote: On Saturday, 13 October 2018 at 14:32:33 UTC, welkam wrote: On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar wrote: [...] import algorithm thats all but then it spits out lib/nim/pure/algorithm.nim(144, 11) Error: interpretation requires too many iterations My mistake. I updated the file and forgot to include the 'import algorithm' directive. The file is now fixed to include it. Download the corrected version or patch your file accordingly. As stated in the file intro **YOU MUST DO THIS** to get it to compile with current Nim (they were supposed to fix this in this version 0.19.0 but didn't). To compile for nim versions <= 0.19.0 do following: 1) in file: ~/nim-0.19.0/compiler/vmdef.nim 2) set variable: MaxLoopIterations* = 1_000_000_000 (1 Billion or >) 3) then rebuild sysem: ./koch boot -d:release If you are using 'choosenim' to install Nim (highly advisable) the full path is: ~/.choosenim/toolchains/nim-0.19.0/compiler/vmdef.nim I'll post performance results from my laptop to give reference times to compare against. Ok, now it builds. I was previously following the build instructions from the Nim website and am not super clear what the "koch" tool does, but following your instructions, the program does build and run. I'll take a stab at making a D version. Interesting results so far. I have a partially converted program here: https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7 The interesting part is that during compilation (with the command "dmd twinprimes_ssoz.d"), the compilation will abort with the message "Killed" and no further information. That's a new one for me, so I'm looking into the cause.
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 15:19:07 UTC, Jabari Zakiya wrote: On Saturday, 13 October 2018 at 14:32:33 UTC, welkam wrote: On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar wrote: [...] import algorithm thats all but then it spits out lib/nim/pure/algorithm.nim(144, 11) Error: interpretation requires too many iterations My mistake. I updated the file and forgot to include the 'import algorithm' directive. The file is now fixed to include it. Download the corrected version or patch your file accordingly. As stated in the file intro **YOU MUST DO THIS** to get it to compile with current Nim (they were supposed to fix this in this version 0.19.0 but didn't). To compile for nim versions <= 0.19.0 do following: 1) in file: ~/nim-0.19.0/compiler/vmdef.nim 2) set variable: MaxLoopIterations* = 1_000_000_000 (1 Billion or >) 3) then rebuild sysem: ./koch boot -d:release If you are using 'choosenim' to install Nim (highly advisable) the full path is: ~/.choosenim/toolchains/nim-0.19.0/compiler/vmdef.nim I'll post performance results from my laptop to give reference times to compare against. Ok, now it builds. I was previously following the build instructions from the Nim website and am not super clear what the "koch" tool does, but following your instructions, the program does build and run. I'll take a stab at making a D version.
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 14:32:33 UTC, welkam wrote: On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar wrote: I downloaded the reference NIM implementation and got the latest nim compiler, but I received the following error: $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim twinprimes_ssoz.nim(74, 11) Error: attempting to call undeclared routine: 'sort' For a person not familiar with nim, what's the fastest way to fix that? import algorithm that's all but then it spits out lib/nim/pure/algorithm.nim(144, 11) Error: interpretation requires too many iterations My mistake. I updated the file and forgot to include the 'import algorithm' directive. The file is now fixed to include it. Download the corrected version or patch your file accordingly. As stated in the file intro **YOU MUST DO THIS** to get it to compile with current Nim (they were supposed to fix this in version 0.19.0 but didn't). To compile for Nim versions <= 0.19.0 do the following: 1) in file: ~/nim-0.19.0/compiler/vmdef.nim 2) set variable: MaxLoopIterations* = 1_000_000_000 (1 billion or >) 3) then rebuild the system: ./koch boot -d:release If you are using 'choosenim' to install Nim (highly advisable) the full path is: ~/.choosenim/toolchains/nim-0.19.0/compiler/vmdef.nim I'll post performance results from my laptop to give reference times to compare against.
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 14:32:33 UTC, welkam wrote: On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar wrote: I downloaded the reference NIM implementation and got the latest nim compiler, but I received the following error: $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim twinprimes_ssoz.nim(74, 11) Error: attempting to call undeclared routine: 'sort' For a person not familiar with nim, what's the fastest way to fix that? import algorithm that's all but then it spits out lib/nim/pure/algorithm.nim(144, 11) Error: interpretation requires too many iterations I ran into the same problem as you did, and then followed the instructions from the error. I modified the compiler source and increased the number of maximum iterations from 3_000_000 to 1_000_000_000, rebuilt and installed it, but still ran into the exact same problem. There may be something up with the algorithm itself.
Re: A Friendly Challenge for D
On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar wrote: I downloaded the reference NIM implementation and got the latest nim compiler, but I received the following error: $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim twinprimes_ssoz.nim(74, 11) Error: attempting to call undeclared routine: 'sort' For a person not familiar with nim, what's the fastest way to fix that? import algorithm that's all but then it spits out lib/nim/pure/algorithm.nim(144, 11) Error: interpretation requires too many iterations
Re: A Friendly Challenge for D
On Friday, 12 October 2018 at 21:08:03 UTC, Jabari Zakiya wrote: On Friday, 12 October 2018 at 20:05:29 UTC, welkam wrote: On Friday, 12 October 2018 at 16:19:59 UTC, Jabari Zakiya wrote: The real point of the challenge is to see what idiomatic code... There is no idiomatic D code. There are only better implementations. D doesn't tell you how to write your code. It gives you many tools and you choose which tools to use. That's what people like about D. I await your implementation(s)! :-) I downloaded the reference NIM implementation and got the latest nim compiler, but I received the following error: $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim twinprimes_ssoz.nim(74, 11) Error: attempting to call undeclared routine: 'sort' For a person not familiar with nim, what's the fastest way to fix that?
Re: A Friendly Challenge for D
On Friday, 12 October 2018 at 20:05:29 UTC, welkam wrote: On Friday, 12 October 2018 at 16:19:59 UTC, Jabari Zakiya wrote: The real point of the challenge is to see what idiomatic code... There is no idiomatic D code. There are only better implementations. D doesn't tell you how to write your code. It gives you many tools and you choose which tools to use. That's what people like about D. I await your implementation(s)! :-)
Re: A Friendly Challenge for D
On Friday, 12 October 2018 at 16:19:59 UTC, Jabari Zakiya wrote: The real point of the challenge is to see what idiomatic code... There is no idiomatic D code. There are only better implementations. D doesn't tell you how to write your code. It gives you many tools and you choose which tools to use. That's what people like about D.
Re: A Friendly Challenge for D
On Friday, 12 October 2018 at 16:19:59 UTC, Jabari Zakiya wrote: Hmm, I don't think what you're saying about similar output|performance with other languages is empirically correct, but it's really not the point of the challenge. That's why godbolt exists. C++ and Rust https://godbolt.org/z/FffXY2 C and D https://godbolt.org/z/YQvkXU Look at the assembly output. It's all the same. Compilers are very good at recognizing simple arithmetic computations and can reason about them. What they struggle with is memory access. On Friday, 12 October 2018 at 16:19:59 UTC, Jabari Zakiya wrote: Finally, a really fast D implementation can be a marketing bonanza to show people in the numerical analysis, data|signal processing fields, et al, that D can be used by them to solve their problems and be more performant than C++, etc eBay's tsv-utils is the fastest in the world https://github.com/eBay/tsv-utils/blob/master/docs/comparative-benchmarks-2018.md#top-four-in-each-benchmark D's JSON parsing is the fastest in the world https://github.com/kostya/benchmarks#json Sambamba is a BAM data processor and is the fastest in the world https://www.basepairtech.com/blog/sorting-bam-files-samtools-vs-sambamba/ Mir GLAS is faster than OpenBLAS and Eigen http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html There are enough examples, but they are not enough to change minds
Re: A Friendly Challenge for D
On Friday, 12 October 2018 at 15:11:17 UTC, welkam wrote: On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote: What I am requesting here is for a person(s) who is an "expert" (very good) to create a very fast D version, using whatever tricks it has to maximize performance. I would like to include in my paper a good comparison of various implementations in different compiled languages (C/C++, D, Nim, etc) to show how it performs with each. I looked into your NIM code and from programmers point of view there is nothing interesting going on. Simple data structures and simple operations. If you wrote equivalent code in C, C++, D, NIM, Rust, Zig and compiled with same optimizing compiler (llvm or gcc) you should get the same machine code and almost the same performance (less than 1% difference due to runtime). If you got different machine code for equivalent implementation then you should file a bug report. The only way you will get different performance is by changing implementation details but then you would compare apples to oranges. Hmm, I don't think what you're saying about similar output|performance with other languages is empirically correct, but it's really not the point of the challenge. The real point of the challenge is to see what idiomatic code, written for performance, using the best resources that the language provides, will produce compared to the Nim version. It's not to see what a line-by-line translation from Nim to D would look like. That may be a start to get something working, but shouldn't be the end goal. I'm using the Nim version here as the "reference implementation" so it can be used as the standard for comparison (accuracy of results and performance). The goal for D (et al) users is to use whatever resources it provides to maybe do better. Example: Nim currently doesn't provide standard bitarrays. Using bitarrays in place of byte arrays should perform faster because more data can fit in cache and operate faster.
Also, parallelizing the algorithm, maybe using OpenMP, CUDA, etc., is the way to do it for D. I don't know what constructs D uses for parallel multiprocessing. And as noted before, this algorithm screams out to be done with GPUs. But you are correct that the Nim code uses very simple coding operations. That is one of its beauties! :) It is simple to understand and implement mathematically, short and simple to code, and architecturally adaptable to hardware. So to really do the challenge, the Nim code needs to be compiled and run (per the instructions in the code) to use as the "reference implementation", to see what correct outputs look like, and their times, and then other implementations can be compared to it. I would hope, after getting an output-correct implementation done (to show you really know what you're doing), that alternative implementations can be done to wring out better performance. I think this is a good challenge for anyone wanting to learn D too, because it involves something substantially more than a "toy" algorithm, but short enough to do with minimal time and effort, and it involves the need to know (learn about) D in enough detail to determine the "best" (alternative) way to do it. Finally, a really fast D implementation can be a marketing bonanza to show people in the numerical analysis, data|signal processing fields, et al, that D can be used by them to solve their problems and be more performant than C++, etc. Again, people should feel free to email me if they want more direct answers to questions, or help.
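To make the two D-side suggestions above concrete, here is a minimal, hedged sketch (not the SSoZ itself): std.bitmanip.BitArray gives the packed bit storage Nim lacks, and std.parallelism can run one work unit per twin-pair residue track, each with its own thread-private array so nothing shared is written. The track values, kmax, and the marking rule are placeholders, not the real PG parameters.

```d
import std.bitmanip : BitArray;
import std.parallelism : parallel;
import std.stdio : writefln;

void main()
{
    // Hypothetical twin-pair residue tracks; real values depend on the PG.
    auto tracks = [11, 17, 29];
    auto counts = new size_t[tracks.length];
    enum kmax = 1000;   // number of resgroups, placeholder value

    // One work unit per track; each thread owns a private BitArray,
    // so there are no shared bit writes (bits of one word would race).
    foreach (i, r; tracks.parallel)
    {
        auto sieve = BitArray(new bool[kmax]);   // 8 resgroup flags per byte
        foreach (k; 0 .. kmax)
            if ((k * r) % 7 == 0)                // stand-in composite marking
                sieve[k] = true;
        counts[i] = kmax - sieve.count;          // survivors on this track
    }
    writefln("survivors per track: %s", counts);
}
```

The per-thread-private-array shape mirrors the one-thread-per-twinpair design described in this thread; for the real port the marking loop would be the SSoZ inner sieve.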
Re: A Friendly Challenge for D
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote: What I am requesting here is for a person(s) who is an "expert" (very good) to create a very fast D version, using whatever tricks it has to maximize performance. I would like to include in my paper a good comparison of various implementations in different compiled languages (C/C++, D, Nim, etc) to show how it performs with each. I looked into your NIM code and from programmers point of view there is nothing interesting going on. Simple data structures and simple operations. If you wrote equivalent code in C, C++, D, NIM, Rust, Zig and compiled with same optimizing compiler (llvm or gcc) you should get the same machine code and almost the same performance (less than 1% difference due to runtime). If you got different machine code for equivalent implementation then you should file a bug report. The only way you will get different performance is by changing implementation details but then you would compare apples to oranges.
Re: A Friendly Challenge for D
On 10/11/2018 10:14 AM, Jabari Zakiya wrote: > Ok, hopefully this will work for everyone. Try this link: > > https://mega.nz/#!yJxUEQgK!MY9dwjiWheE8tACtEeS0szduIvdBjiyTn4O6mMD_aZw Thank you. That worked just fine. I clicked the Download link and the pdf was saved on my end. :) Ali
Re: A Friendly Challenge for D
On Thursday, 11 October 2018 at 16:13:17 UTC, Jabari Zakiya wrote: On Thursday, 11 October 2018 at 14:49:54 UTC, Carl Sturtivant wrote: On Thursday, 11 October 2018 at 13:26:19 UTC, Jabari Zakiyth wrote: On Thursday, 11 October 2018 at 05:11:20 UTC, Ali Çehreli wrote: [...] What country are you trying to get access from, because I know people in the US have gotten the papers from those links, for free and without an account. It may also have something to do with your browser. They work for me in various modern browsers, including the Tor Browser. Not credible --- I cannot get either without an account browsing from the US using Firefox with no unusual arrangements. Wow. Those links used to work with no problem, so I don't know what it is (VPN use, etc.?). OK, this will definitely work. Send me an email and I will email you the paper (and figure out a surefire way to make it available). Construct this full email (for humans). (my first initial)(lastname) at gmail Ok, hopefully this will work for everyone. Try this link: https://mega.nz/#!yJxUEQgK!MY9dwjiWheE8tACtEeS0szduIvdBjiyTn4O6mMD_aZw However, I'm still interested to find out what the specific issues are with the other links so I can make a possible bug report to those services. 1) Can you actually get to the site? 2) Can you see the document (a reference to it)? 3) Can you read the document online (did you try)? 4) What happened when you tried to download it (popup requiring account, etc)? The paper is from 2014; Scribd has changed a lot since then. The more information people can provide, the easier it will be for me to determine how to provide better access to the papers. But you can always email me to get them as a last resort, or just to ask questions.
Re: A Friendly Challenge for D
On Thursday, 11 October 2018 at 14:49:54 UTC, Carl Sturtivant wrote: On Thursday, 11 October 2018 at 13:26:19 UTC, Jabari Zakiyth wrote: On Thursday, 11 October 2018 at 05:11:20 UTC, Ali Çehreli wrote: [...] What country are you trying to get access from, because I know people in the US have gotten the papers from those links, for free and without an account. It may also have something to do with your browser. They work for me in various modern browsers, including the Tor Browser. Not credible --- I cannot get either without an account browsing from the US using Firefox with no unusual arrangements. Wow. Those links used to work with no problem, so I don't know what it is (VPN use, etc.?). OK, this will definitely work. Send me an email and I will email you the paper (and figure out a surefire way to make it available). Construct this full email (for humans). (my first initial)(lastname) at gmail
Re: A Friendly Challenge for D
On Thursday, 11 October 2018 at 13:26:19 UTC, Jabari Zakiyth wrote: On Thursday, 11 October 2018 at 05:11:20 UTC, Ali Çehreli wrote: [...] What country are you trying to get access from, because I know people in the US have gotten the papers from those links, for free and without an account. It may also have something to do with your browser. They work for me in various modern browsers, including the Tor Browser. Not credible --- I cannot get either without an account browsing from the US using Firefox with no unusual arrangements.
Re: A Friendly Challenge for D
On Thursday, 11 October 2018 at 05:11:20 UTC, Ali Çehreli wrote: On 10/10/2018 07:52 PM, Jabari Zakiyth wrote: > On Wednesday, 10 October 2018 at 22:25:17 UTC, Neia Neutuladh wrote: >> On 10/10/2018 03:05 PM, Jabari Zakiya wrote: >>> https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ >> >> It would be great if you could provide a link to a freely downloadable >> version of this. > > You can download the paper for free from that link. Did you have trouble > doing it? I think the problem is, scribd requires an account, which they are apparently happy to link to an existing Google or Facebook account. > Here's another link to paper. > > https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_ Similarly, that one requires a Google account. > Here, again, is the link to the Nim code. Just install Nim (current > 0.19.0) and compile and run it per instructions in code. I recommend > people do that to see its outputs and verify its results. > > https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e That works! :) Ali What country are you trying to get access from, because I know people in the US have gotten the papers from those links, for free and without an account. It may also have something to do with your browser. They work for me in various modern browsers, including the Tor Browser.
Re: A Friendly Challenge for D
On 10/10/2018 07:52 PM, Jabari Zakiyth wrote: > On Wednesday, 10 October 2018 at 22:25:17 UTC, Neia Neutuladh wrote: >> On 10/10/2018 03:05 PM, Jabari Zakiya wrote: >>> https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ >> >> It would be great if you could provide a link to a freely downloadable >> version of this. > > You can download the paper for free from that link. Did you have trouble > doing it? I think the problem is, scribd requires an account, which they are apparently happy to link to an existing Google or Facebook account. > Here's another link to paper. > > https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_ Similarly, that one requires a Google account. > Here, again, is the link to the Nim code. Just install Nim (current > 0.19.0) and compile and run it per instructions in code. I recommend > people do that to see its outputs and verify its results. > > https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e That works! :) Ali
Re: A Friendly Challenge for D
On Thursday, 11 October 2018 at 00:22:10 UTC, tide wrote: On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote: I would like to include in my paper a good comparison of various implementations in different compiled languages (C/C++, D, Nim, etc) to show how it performs with each. If you want help with your paper, possibly some kind of decent financial incentive would be appropriate. If the algorithm benefits from more threads, then finding or creating an implementation that runs on a GPU would probably be the true performance test. CPUs have like 4-8 cores in the mainstream? A GPU has hundreds, though with some limitations. I'm writing the paper anyway (just like the others), so other implementations are icing on the cake to show implementation variations, as a benefit to readers. Maybe I should set up a website and create a Rosetta Code repo for people to post their different language implementations, and offer a T-shirt for the fastest implementation. :-) Yes, a GPU-based implementation would be the epitome for this algorithm, by far. This is actually why I have gotten the algorithm to this implementation, so that the number crunching can all be done in parallel threads. (It would also be screamingly fast done in hardware in an FPGA too.) However, I only have standard consumer-grade laptops. Hopefully someone(s) with sufficient hardware, interest, and time will take this upon themselves to do this and publicize their results.
Re: A Friendly Challenge for D
On Wednesday, 10 October 2018 at 22:25:17 UTC, Neia Neutuladh wrote: On 10/10/2018 03:05 PM, Jabari Zakiya wrote: https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ It would be great if you could provide a link to a freely downloadable version of this. You can download the paper for free from that link. Did you have trouble doing it? Here's another link to paper. https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_ Here, again, is the link to the Nim code. Just install Nim (current 0.19.0) and compile and run it per instructions in code. I recommend people do that to see its outputs and verify its results. https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e
Re: A Friendly Challenge for D
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote: I would like to include in my paper a good comparison of various implementations in different compiled languages (C/C++, D, Nim, etc) to show how it performs with each. If you want help with your paper, possibly some kind of decent financial incentive would be appropriate. If the algorithm benefits from more threads, then finding or creating an implementation that runs on a GPU would probably be the true performance test. CPUs have like 4-8 cores in the mainstream? A GPU has hundreds, though with some limitations.
Re: A Friendly Challenge for D
On 10/10/2018 03:05 PM, Jabari Zakiya wrote: https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ It would be great if you could provide a link to a freely downloadable version of this.
Re: A Friendly Challenge for D
On Wednesday, 10 October 2018 at 20:43:01 UTC, Kagamin wrote: On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote: https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e As I understand, the main thread preallocates global memory and tracks it, and other threads don't track it? Here's an abbreviated elementary explanation of the algorithm and implementation. You will need to read (download) my paper "The Segmented Sieve of Zakiya (SSoZ)" because I will make reference to things in it, to keep this as short as possible. https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ To really understand the math and theory of the algorithm requires primarily just understanding Table 3 on page 3 of the paper, as it encapsulates everything. You can read the paper to understand the language of the algorithm used in the code. Twinprimes are 2 (odd) primes that differ by only 2, e.g. 3-5, 5-7, 11-13, 29-31. Table 3 shows all the residue values (prime candidates|pcs) and residue groups (resgroups|columns) to find all the primes up to 541 using P5 (modpg = 30, rescnt = 8). For a given value N, it will be represented with a PG table of some number of resgroups, with a max size I call Kmax (the resgroup the value N resides in). Using P5, I only need to sieve primes along the residue tracks (restracks) that can produce twinprimes, here 11-13, 17-19, 29-31. Thus I create 3 byte arrays, one for each twinpair, and use the lower 2 bits to represent the upper and lower twinprime restracks. Then for each twinpair (here 3) I run 3 threads which perform the SSoZ on the entire Kmax length of resgroups in parallel. At the end I accumulate the results and print them out. This, in a nutshell, is what the algorithm does. The paper gives you enough to understand the fundamental nature of the algorithm, though I've learned so much more than in 2014. :) The larger the PG, the more twinpair restracks (see page 14) there are to use.
For larger numbers you want to use the largest PG possible that 'fits' into the hardware cpu. All my development|testing was done on laptops using Intel I5|I7 cpus with 4 or 8 threads. I'm really interested in how it performs on other platforms (ARM, AMD, PowerPC, etc). The main proc "twinprimes_ssoz" manages program flow and is set up as follows: 1) accepts the input number (an integer) in "val" from the cli 2) sets "num" to be the first odd number < "val" if "val" is even 3) calls "selectPG" with "num" to select the optimum Prime Generator (PG) parameters 4) computes various working parameters per the selected PG and number value (see refs) 5) computes global values for the number of primes, and their values, <= sqrt(num) for the selected PG with proc "sozpg" 6) sets the initial twinprime count for the selected PG 7) then with proc "segsieve" allocates one thread to perform the SSoZ (segmented sieve of zakiya) on each twinprime pair's residues (determined by the selected PG), and counts the number of twinprimes computed in each thread 8) then determines the true total twinprime count and the last twinprime <= "num" It also times the different intervals and prints out diagnostics and final output. The proc "twins_sieve" is the primary function that manages and performs the SSoZ for a given twinprime pair's parameters in each thread. The more threads, the faster the process goes. I'll provide more info later. I have to run now. I wanted to get this out now while I was at my laptop and online.
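For anyone attempting the D port, a naive reference counter is handy for cross-checking an optimized implementation's outputs on small inputs. This is only an illustrative check in D, not the SSoZ: a plain sieve of Eratosthenes followed by a scan for pairs (p, p+2).

```d
import std.stdio : writefln;

void main()
{
    enum num = 100;                       // small test bound
    auto composite = new bool[num + 1];   // classic sieve of Eratosthenes
    foreach (i; 2 .. num + 1)
        if (!composite[i])
            for (auto j = i * i; j <= num; j += i)
                composite[j] = true;

    size_t twins, last;
    foreach (p; 3 .. num - 1)             // scan each candidate pair (p, p+2)
        if (!composite[p] && !composite[p + 2])
        {
            ++twins;
            last = p;
        }
    writefln("twin pairs <= %s: %s, last pair: (%s, %s)", num, twins, last, last + 2);
    // prints: twin pairs <= 100: 8, last pair: (71, 73)
}
```

An optimized port should reproduce these counts for any bound where the naive version is still fast enough to run.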
Re: A Friendly Challenge for D
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote: https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e As i understand, main thread preallocates global memory and tracks it, and other threads don't track it?
Re: A Friendly Challenge for D
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote: [...] Looking forward to this :)
A Friendly Challenge for D
Hi. I hope this is the right place to request this, if not please tell me a better one. I had looked at D, and played with it some circa 2010~2012, but time and life took my priorities away. I'm still interested in learning different languages, but there are so many more now that it's hard to devote the time to learn them to a high level of proficiency. Subsequently, I started learning Nim, because I find the syntax and constructs simpler and more familiar than most (I'm coming mostly from a Ruby background). I have developed in Nim a program to find twinprimes that seems to be the fastest of its type, compared to primesieve (https://primesieve.org/), which claims to be the fastest, written in C++. Here is the code for my Nim implementation of twinprimes_ssoz. https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e The total file is just 318 lines of code, with about 60 separate lines of comments. The code is extensively commented per line to explain what's happening. Reading the references given in the code introduction will explain the general process. See "The Segmented Sieve of Zakiya (SSoZ)" https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ I am in the process of writing up a paper to explain this application and implementation. What I am requesting here is for a person(s) who is an "expert" (very good) to create a very fast D version, using whatever tricks it has to maximize performance. I would like to include in my paper a good comparison of various implementations in different compiled languages (C/C++, D, Nim, etc) to show how it performs with each. This algorithm is designed to be run in multiprocessing environments (more cores/threads, better performance). The Nim version (0.18.0, 0.19.0 recently released) uses its native parallel processing structure. In C++, et al, it may be best implemented using OpenMP, or CUDA, etc. These are implementation details one versed in a language could determine. Well, that's it.
I am willing to share performance data, compared to primesieve, if you like. The beauty of this algorithm|implementation is its simplicity in design, and minimal code. It shouldn't be that hard to understand in order to determine how to translate it into D. Of course, I am fully available to answer questions and provide help. Thanks Jabari
Re: Knowing the approach to solve a D challenge
On Friday, 16 February 2018 at 09:44:27 UTC, aberba wrote: D has a ton of features and library solutions. When you encounter a problem, how do you approach solving it in code? 1. Do you first write it in idiomatic D style or a more general approach before porting to idiomatic D? Like always, it depends. Do I attribute up all my functions with const/@safe/nothrow... (is that even idiomatic?) no. Do I build a range to process some data... sometimes. Do I utilize std.algorithm/std.range functions to process my data... when available. Even my C# code tends to be more idiomatic D than others'. 2. Do you find yourself mostly rolling out your own implementation first before using a library function? I don't know. If I rolled my own, I probably didn't know of the library function/function combination that does the same thing. 3. Does the use of generics come out of a first try or a rewrite? Usually not. If I don't have another use-case I probably don't know what needs to be generic. 4. What rough percentage of phobos knowledge is required for reasonably good problem solving efficiency? I would guess average. The reason I say this is because, unlike with popular languages, search doesn't always provide a good example for every question, so you kind of need to already know what you're looking for. It doesn't help that sometimes the answer is a combination of things (.reduce!max, which isn't found in std.math).
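A small illustration of that last point: the "maximum of a range" answer lives in std.algorithm, not std.math, as a combination of reduce and max.

```d
import std.algorithm.comparison : max;
import std.algorithm.iteration : reduce;

void main()
{
    auto data = [3, 1, 4, 1, 5];
    assert(data.reduce!max == 5);   // fold the range with max as the seed op
}
```

Newer Phobos also offers std.algorithm.searching.maxElement for the same job, but the point stands: you have to know to look for the combination.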
Re: Knowing the approach to solve a D challenge
On Friday, 16 February 2018 at 09:44:27 UTC, aberba wrote: 1. Do you first write it in idiomatic D style or a more general approach before porting to idiomatic D? At the micro level, it's usually fairly idiomatic from the get-go. Usually no heap allocations in inner loops, no raw for-looping, no static variables, etc. Sometimes if I get a bit desperate or frustrated I might do this a bit. But at a larger level, I often find myself shifting work between functions, structs, modules and so on. In other words, redesigning. 2. Do you find yourself mostly rolling out your own implementation first before using a library function? No, the vast majority of my calls are from Phobos or other libraries I use. And in the case of Phobos at least, its stuff is so general that I don't often need to design my own function. An exception is std.algorithm.each, but just to get better error messages (template constraints don't tell why the call did not work). 3. Does the use of generics come out of a first try or a rewrite? Rewriting a lot, but that's the case with all coding. With generics, I often tend to forget the ! before an alias parameter. 4. What rough percentage of phobos knowledge is required for reasonably good problem solving efficiency? If we talk about experience of the language in general, it's more than with C-style programming, Java or C#, but less than with C++ generics. I don't think you have to know Phobos throughout, but you have to know the language and understand the philosophy of ranges quite deeply. Percentages aren't important; I think that as long as you know the general purpose of std.algorithm, std.range, std.stdio, std.conv and std.array, that gets you started. Again, if you know the language and understand the range concept.
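A sketch of that each-wrapper idea (the helper name here is hypothetical, just to show the shape): explicit static asserts can say why a call failed, where a failed template constraint only reports "no match".

```d
import std.algorithm.iteration : each;
import std.range.primitives;   // isInputRange, plus front/empty/popFront for arrays

// Thin wrapper over std.algorithm.each that spells out the failure reason.
void eachVerbose(alias fun, R)(R range)
{
    static assert(isInputRange!R,
        R.stringof ~ " is not an input range -- does it define front/popFront/empty?");
    static assert(__traits(compiles, fun(range.front)),
        "the callable cannot accept an element of type " ~ typeof(range.front).stringof);
    range.each!fun;   // constraints satisfied, forward to the real each
}

void main()
{
    int total;
    [1, 2, 3].eachVerbose!(x => total += x);
    assert(total == 6);
    // eachVerbose!(x => total += x)(42);  // would fail with the "not an input range" message
}
```

The same pattern works for any constrained Phobos call you find yourself debugging: restate each constraint as its own static assert with a human-readable message.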
Re: Knowing the approach to solve a D challenge
On Friday, 16 February 2018 at 09:44:27 UTC, aberba wrote: D has a ton of features and library solutions. When you encounter a problem, how do you approach solving it in code? 1. Do you first write it in idiomatic D style or a more general approach before porting to idiomatic D? As idiomatic as possible. But it depends on how big and/or important the method/function is. The more important and/or bigger, the more idiomatic. Sometimes a `foreach` will do in a small function. But D makes it easy to write idiomatic code straight away, although sometimes it can be a bit annoying when you get messages like "cannot implicitly convert x of type y to z". So you have to keep track of what you are chaining. 2. Do you find yourself mostly rolling out your own implementation first before using a library function? I usually start to roll out my own for a few minutes until I say "Hold on, maybe the library already has a function that does exactly that!" And it often does. That's because I get a better understanding of what I need when I roll my own for a while. Using the library straight away can sometimes result in shooting yourself in the foot. So it's a mix of both. 3. Does the use of generics come out of a first try or a rewrite? Depends on the problem at hand. If you know you will have to handle several types down the road, it's a minimal generic that accepts only one type to begin with but that can be extended to others. It's a bit "play it by ear". If I'm dealing with HTML and I know that JSON or CSV will be needed later, I write the function in a way that it can easily be extended. If a function returns the user name as a `string`, I won't bother with generics. That's just normal, I think. 4. What rough percentage of phobos knowledge is required for reasonably good problem solving efficiency? You cannot know every function, of course, and new functions are being added all the time. What you need to know is what Phobos can do in general and where to look for it.
The cheat sheet should be the first port of call. You can save a lot of time by skimming through the library regularly. Also, it's worth looking at other people's code in the "learn" section. What often happens to me, when I see a solution, I go "Jesus, that's clever, I'll keep that in mind!" The learn section is good because it focuses on one (small) problem at a time.
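One concrete instance of that "cannot implicitly convert" message when chaining: map returns a lazy wrapper type (a MapResult), not an int[], until you materialise the chain with .array.

```d
import std.algorithm.iteration : map;
import std.array : array;

void main()
{
    auto xs = [1, 2, 3];
    // int[] doubled = xs.map!(x => x * 2);      // error: MapResult!(...) is not int[]
    int[] doubled = xs.map!(x => x * 2).array;   // .array materialises the lazy chain
    assert(doubled == [2, 4, 6]);
}
```

Keeping chains lazy until the last possible moment is usually the idiomatic choice; .array is only needed where an actual slice is required.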