Re: Challenge Tuples

2024-05-03 Thread NotYouAgain via Digitalmars-d-learn

On Friday, 3 May 2024 at 05:11:28 UTC, Salih Dincer wrote:


..
Wouldn't it be great if there was a feature that worked at 
runtime...


SDB@79


```d
module m;
@safe:
private:
import std;

void main()
{
    auto myTuple = tuple(1, 2, 3, [1, 3], 5);
    int[] arrToSum;

    foreach (int i, val; myTuple.expand)
    {
        // foreach over a tuple is unrolled, so the type of `val`
        // is known at compile time in each iteration
        if (typeof(val).stringof == "int[]")
        {
            foreach (v; myTuple.expand[i .. i + 1])
                arrToSum ~= v;
        }
        else
        {
            arrToSum ~= val;
        }
    }

    writefln("The total value of the tuples is: %s", arrToSum.sum); // 15
}
```



Re: Challenge Tuples

2024-05-02 Thread Salih Dincer via Digitalmars-d-learn

On Wednesday, 1 May 2024 at 14:15:19 UTC, Andrey Zherikov wrote:



Shorter and without allocations:
```d
import std.typecons : tuple;
import std.algorithm : sum, each;

auto sum(int i) => i;

void main()
{
  auto t = tuple(1, 2, 3, [1, 3], 5);

  int res=0;
  t.each!(e => res += sum(e));
  assert(res == 15);
}
```


Super!

In summary, D is clearly ahead with tuples. Especially with its 
features similar to AliasSeq, I think it is unrivaled. Wouldn't 
it be great if there was a feature that worked at runtime...


SDB@79



Re: Challenge Tuples

2024-05-01 Thread Andrey Zherikov via Digitalmars-d-learn

On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote:
You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); 
implement the sum (total = 15) with the least code, using the 
sum() function of the language you are coding in...



Let's start with D:

```d
import std.typecons : tuple;
import std.algorithm : sum, each;

void main()
{
  auto t = tuple(1, 2, 3, [1, 3], 5);

  int[] arr;
  t.each!(e => arr ~= e);
  assert(arr.sum == 15);
}
```

and bonus:

```d
import std.typecons : tuple;
import std.stdio : writeln;
void main()
{
  auto t = tuple(1, 2, 3, [1, 3], 5);
  auto results = [0];

  foreach (data; t)
  {
    static if (is(typeof(data) == int[]))
{
  int sum;
  foreach (d; data)
  {
sum += d;
  }
  results ~= sum;
}
else
{
  results ~= data;
}
  }
  results.writeln; // [0, 1, 2, 3, 4, 5]
}
```

I bet you won't be able to do it this easily with other 
languages!  Note: I tried with C# and Python and it didn't work!


SDB@79



Shorter and without allocations:
```d
import std.typecons : tuple;
import std.algorithm : sum, each;

auto sum(int i) => i;

void main()
{
  auto t = tuple(1, 2, 3, [1, 3], 5);

  int res=0;
  t.each!(e => res += sum(e));
  assert(res == 15);
}
```



Re: Challenge Tuples

2024-04-27 Thread Julian Fondren via Digitalmars-d-learn

On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote:
You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); 
implement the sum (total = 15) with the least code, using the 
sum() function of the language you are coding in...


Nim:

```nim
import std/[math, typetraits, macros]

macro unrollRange(low, high: static int; name, body: untyped) =
  result = newStmtList()
  for i in low ..< high:
result.add(newBlockStmt(newStmtList(
  newConstStmt(name, newLit i),
  copy body
)))

let t = (1, 2, 3, @[1, 3], 5)
var arr: seq[int]
unrollRange(0, t.tupleLen, i):
  arr.add t[i]
doAssert arr.sum == 15
```

vs. D

1. there's no `each` on tuples
2. there's no static for loop, so a macro is needed for the tuple indices
3. `add` is making use of the same overload as D's `~=`


Re: Challenge Tuples

2024-04-27 Thread Sergey via Digitalmars-d-learn

On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote:
You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); 
implement the sum (total = 15) with the least code, using the 
sum() function of the language you are coding in...



Let's start with D:

```d
import std.typecons : tuple;
import std.algorithm : sum, each;

void main()
{
  auto t = tuple(1, 2, 3, [1, 3], 5);

  int[] arr;
  t.each!(e => arr ~= e);
  assert(arr.sum == 15);
}
```
I bet you won't be able to do it this easily with other 
languages!  Note: I tried with C# and Python and it didn't work!


For Python it is possible to use something like:
```python
t = (1, 2, 3, [1, 3], 5)
a = []
for e in t:
    a.append(e) if isinstance(e, int) else a.extend(e)
print(sum(a))
```



Re: Challenge Tuples

2024-04-27 Thread Salih Dincer via Digitalmars-d-learn

On Saturday, 27 April 2024 at 15:36:40 UTC, Nick Treleaven wrote:
On Saturday, 27 April 2024 at 15:32:40 UTC, Nick Treleaven 
wrote:

On Saturday, 27 April 2024 at 11:55:58 UTC, Basile B. wrote:

foreach const e in u do
if echo(is, e, T) do
result += e;



static if (is(typeof(e) == int))
r += e;


Actually I would write that:
```d
R r;
foreach (e; v)
{
static if (is(typeof(e) : R))
```


I like the new sum() function, great Nick!  Moreover, it also 
works with an ordinary range:


```d
  enum limit = 100; // sum = 5050
  iota(limit + 1).sum.writeln;
```

Python seems too complicated to me, but C# looks delicious. 
Moreover, it was implemented without defining IEnumerator. Well 
done Matheus!


SDB@79


Re: Challenge Tuples

2024-04-27 Thread Nick Treleaven via Digitalmars-d-learn

On Saturday, 27 April 2024 at 15:32:40 UTC, Nick Treleaven wrote:

On Saturday, 27 April 2024 at 11:55:58 UTC, Basile B. wrote:

foreach const e in u do
if echo(is, e, T) do
result += e;



static if (is(typeof(e) == int))
r += e;


Actually I would write that:
```d
R r;
foreach (e; v)
{
static if (is(typeof(e) : R))
```


Re: Challenge Tuples

2024-04-27 Thread Nick Treleaven via Digitalmars-d-learn

On Saturday, 27 April 2024 at 11:55:58 UTC, Basile B. wrote:

Here's [STYX](https://gitlab.com/styx-lang/styx) solution:


function sum[T,U](U u): u32


I think you meant `: T`.


{
var T result;
foreach const e in u do
if echo(is, e, T) do
result += e;
else do
result += sum![T](e);
return result;
}

function main(): s32
{
assert((1, 2, 3, [1, 3], 5).sum![u32]() == 15);
return 0;
}


Mostly equivalent D:
```d
R sum(R = long, T)(T v)
{
R r;
foreach (e; v)
{
static if (is(typeof(e) == int))
r += e;
else
r += sum!R(e);
}
return r;
}

void main()
{
import std;
assert(tuple(1, 2, 3, [1, 3], 5).sum == 15);
}
```


Re: Challenge Tuples

2024-04-27 Thread Basile B. via Digitalmars-d-learn

On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote:
You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); 
implement the sum (total = 15) with the least code, using the 
sum() function of the language you are coding in...



Let's start with D:


Here's [STYX](https://gitlab.com/styx-lang/styx) solution:


```
function sum[T,U](U u): u32
{
    var T result;
    foreach const e in u do
        if echo(is, e, T) do
            result += e;
        else do
            result += sum![T](e);
    return result;
}

function main(): s32
{
    assert((1, 2, 3, [1, 3], 5).sum![u32]() == 15);
    return 0;
}
```


A few notes:

- tuples are first class citizen
- `foreach` over a tuple is like in D, i.e. unrolling
- `echo` is like D `__traits`
- _Implicit Generic Application_ of `U` (that's like D's IFTI) 
makes the task easy


Re: Challenge Tuples

2024-04-26 Thread Andy Valencia via Digitalmars-d-learn

On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote:
You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); 
implement the sum (total = 15) with the least code, using the 
sum() function of the language you are coding in...


My Python solution (function named dosum to avoid collision with 
the Python built-in):


```python
def dosum(itm):
    if isinstance(itm, (int, float)):
        return itm
    return sum(dosum(_i) for _i in itm)

print(dosum([1, 2, 3, [1, 3], 5]))
```



Re: Challenge Tuples

2024-04-26 Thread matheus via Digitalmars-d-learn

On Friday, 26 April 2024 at 13:25:34 UTC, Salih Dincer wrote:

...


Very nice. For your first example I need to think a bit because 
I'm a bit rusty in C#, but I think it will not be as easy as the 
D version.


For the bonus part:

The code below was garbled in the archive (the `<` comparisons and 
generic type arguments were swallowed); this is a reconstruction 
of the evident intent:

```csharp
private static void Main(string[] args){
    var a = (1,2,3,(1,3),5);
    var t = a as ITuple;
    var xx = new List<int>();

    for(var i=0;i<t.Length;++i){
        if(t[i].GetType() == typeof(System.ValueTuple<int,int>)){
            var t2 = (t[i] as ITuple);
            var s = 0;
            for(var j=0;j<t2.Length;++j) s += (int)t2[j];
            xx.Add(s);
        }
        else xx.Add((int)t[i]);
    }
    Console.WriteLine(xx.Sum()); // 15
}
```

Again I'm rusty in C#... but so far yes it's more verbose than 
the D version.


Matheus.


Challenge Tuples

2024-04-26 Thread Salih Dincer via Digitalmars-d-learn
You have a 5-item data tuple, Tuple(1, 2, 3, [1, 3], 5); 
implement the sum (total = 15) with the least code, using the 
sum() function of the language you are coding in...



Let's start with D:

```d
import std.typecons : tuple;
import std.algorithm : sum, each;

void main()
{
  auto t = tuple(1, 2, 3, [1, 3], 5);

  int[] arr;
  t.each!(e => arr ~= e);
  assert(arr.sum == 15);
}
```

and bonus:

```d
import std.typecons : tuple;
import std.stdio : writeln;
void main()
{
  auto t = tuple(1, 2, 3, [1, 3], 5);
  auto results = [0];

  foreach (data; t)
  {
    static if (is(typeof(data) == int[]))
{
  int sum;
  foreach (d; data)
  {
sum += d;
  }
  results ~= sum;
}
else
{
  results ~= data;
}
  }
  results.writeln; // [0, 1, 2, 3, 4, 5]
}
```

I bet you won't be able to do it this easily with other 
languages!  Note: I tried with C# and Python and it didn't work!


SDB@79


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-16 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-learn
Bit fields are currently going through the DIP process, although because 
of ImportC having them, it's just a matter of turning them on and adding 
the parser stuff.


However, there is a major drawback, and it is why you'll still need to 
use a struct: you can't take a pointer to a bit field.


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-16 Thread H. S. Teoh via Digitalmars-d-learn
On Sat, Mar 16, 2024 at 09:16:51PM +, Liam McGillivray via 
Digitalmars-d-learn wrote:
> On Friday, 15 March 2024 at 00:21:42 UTC, H. S. Teoh wrote:
[...]
> > When dealing with units of data smaller than a byte, you generally
> > need to do it manually, because memory is not addressable by
> > individual bits, making it difficult to implement things like
> > slicing an array of bool.
[...]
> I'm curious as to what "manual implementation" would mean, since
> clearly making my own struct with `bool[3]` doesn't count. Does D have
> features for precise memory manipulation?

Manual implementation as in you would deal with the machine
representation in terms of bytes, or more likely, uints (on modern CPUs
even though bytes are individually addressible, the hardware actually
works in terms of a larger unit, typically a 4-byte 32-bit unit, or an
8-byte 64-bit unit), using bitwise operators to manipulate the bits the
way you want to.


> Anyway, I'm surprised that D has a special operator `&=` for doing bit
> manipulation on integers, especially given that the steps to convert
> an int into a bool array is more complicated. I would imagine the
> former would be a rather niche thing.

You should understand that bitwise operators are directly implemented in
hardware, and thus operators like &, |, ^, <<, >>, ~, etc., typically
map directly to individual CPU instructions. As such, they are very
fast, and preferred when you're doing bit-level manipulations.  At this
level, you typically do not work with individual bits per se, but with
machine words (typically 32-bit or 64-bit units).  Bitwise operators
operate on all 32 or 64 bits at once, so performance-aware code
typically manipulates all these bits simultaneously rather than
individually.  Of course, using suitable bit-masking you *can* address
individual bits, but the hardware instructions themselves typically work
with all 32/64 bits at once.

Here's a simple example. Suppose you have 3 bits you want to store.
Since the machine doesn't have a 3-bit built-in type, you typically just
use the next larger available size, either a ubyte (8 bits) if you want
compact storage, or if compactness isn't a big issue just a uint (32
bits, you just ignore the other 29 bits that you don't need). So you'd
declare the storage something like this:

uint myBits;

Bits are usually indexed from 0, so bit 0 is the first position, bit 1
is the second position, and so on.  So to set the first bit to 1, you'd
do:

myBits |= 0b001;

Note that at the machine level, this operator works on all 32 bits at
the same time. Most of the bits remain unchanged, though, because
bitwise OR does not change the original value if the operand is 0. So
the overall effect is that the first bit is set.

To set the first bit to 0, there isn't a direct operator that does that,
but you can take advantage of the behaviour of bitwise AND, in which
any bit which is 0 in the operand will get cleared, everything else
remains unchanged. So you'd do this:

myBits &= 0b110;

Now, since we don't really care about the other 29 bits, we could write
this as follows instead, to make our intent clearer:

myBits &= ~0b001;

The ~ operator flips all the bits, so this is equivalent to writing:

	myBits &= 0b1111_1111_1111_1111_1111_1111_1111_1110;

Writing it with ~ also has the advantage that should we later decide to
add another bit to our "bit array", we don't have to update the code;
whereas if we'd used `myBits &= 0b110;` then we'd need to change it to
`myBits &= 0b1110;` otherwise our new 4th bit may get unexpectedly
cleared when we only wanted to clear the first bit.

Now, what if we wanted to set both the 1st and 3rd bits?  In a
hypothetical bit array implementation, we'd do the equivalent of:

bool[3] myBits;
myBits[0] = 1;
myBits[2] = 1;

However, in our uint approach, we can cut the number of operations by
half, because the CPU is already operating on the entire 32 bits of the
uint at once -- so there's no need to have two instructions to set two
individual bits when we could just do it all in one:

myBits |= 0b101; // look, ma! both bits set at once!

Similarly, to clear the 1st and 3rd bits simultaneously, we simply
write:

myBits &= ~0b101; // clear both bits in 1 instruction!

Of course, when we only have 3 bits to work with, the savings isn't that
significant.  However, if you have a larger bit array, say you need an
array of 32 bits, this can speed your code up by 32x, because you're
taking advantage of the fact that the hardware is already operating on
all 32 bits at the same time.  On 64-bit CPUs, you can speed it up by
64x because the CPU operates on all 64 bits simultaneously, so you can
manipulate an entire array of 64 bits in a single instruction, which is
64x faster than if you looped over an array of bool with 64 iterations.
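
The individual operations described above can be collected into one 
small, compilable sketch (the variable name `myBits` follows the text; 
the asserts just confirm each described effect):

```d
void main()
{
    uint myBits;

    myBits |= 0b001;   // set bit 0
    assert(myBits == 0b001);

    myBits |= 0b101;   // set bits 0 and 2 in a single operation
    assert(myBits == 0b101);

    myBits &= ~0b001;  // clear bit 0, leaving the others untouched
    assert(myBits == 0b100);

    myBits &= ~0b101;  // clear bits 0 and 2 at once
    assert(myBits == 0);
}
```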


T

-- 
Without outlines, life would be pointless.


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-16 Thread Liam McGillivray via Digitalmars-d-learn

On Friday, 15 March 2024 at 17:25:09 UTC, Daniel N wrote:
On Tuesday, 12 March 2024 at 05:38:03 UTC, Liam McGillivray 
wrote:
I am in need of a data type for holding direction information; 
one of 8 directions on a single axis. They are named in terms 
of compass directions. If D had a 4-bit datatype, I would just 
use this and do `+=2` whenever I want to change the datatype, 
but it doesn't.


Perhaps this would be a good programming challenge for someone 
more experienced than me. Make a data type (probably a struct) 
for holding one of 8 directional values using 3 bits. It 
should accept the use of increment operators to change the 
angle.


Ideally (but optionally), it should work as an enum of the 
same name; "Direction".


Here's a unittest that it should ideally pass:


D actually supports both 3 and 4 bit integers. People will 
likely warn you of minor portability risks... but if you target 
a limited number of platforms and prefer concise, readable code, 
then it's a textbook example for bit fields. The risk can 
easily be mitigated by having a unittest to catch the error 
directly (if you try to port to some exotic platform).


dmd -preview=bitfields

(Some lines stolen from Rikki)

```d
struct Direction
{
private uint value : 3;
alias this = value;

enum Direction N  = Direction(0);
enum Direction NE = Direction(1);
enum Direction E  = Direction(2);
enum Direction SE = Direction(3);
enum Direction S  = Direction(4);
enum Direction SW = Direction(5);
enum Direction W  = Direction(6);
enum Direction NW = Direction(7);
}
```


Oh wow! That looks so clean and elegant, aside from the `: 3` 
part being easy to miss, and not very readable for those not 
aware of this feature. This would be so simple. If I used this, I 
wouldn't even need to make a struct and write the operator 
overload functions; just make an enum for a 3-bit uint.


Based on the wording of the compiler flag and a quick search, I'm 
guessing that this is an experimental feature that has not yet 
been accepted as part of the language. Perhaps I shouldn't use 
this then, just in case it gets pulled and someone who discovers 
my project in the future will have a build error that they don't 
know how to solve. This seems like a bigger problem than the 
extra memory the current ubyte version takes up, which is 
probably quite small for a computer of today anyway.


I suppose this would be a nice feature for the language to have 
if the syntax were reworked. Perhaps something like `uint!3 
value;` would be better.


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-16 Thread Liam McGillivray via Digitalmars-d-learn

On Friday, 15 March 2024 at 00:21:42 UTC, H. S. Teoh wrote:
On Thu, Mar 14, 2024 at 11:39:33PM +, Liam McGillivray via 
Digitalmars-d-learn wrote: [...]
I tried to rework the functions to use bitwise operations, but 
it was difficult to figure out the correct logic. I decided 
that it's not worth the hassle, so I just changed the value 
storage from `bool[3]` to `ubyte`.

[...]

Just wanted to note that while in theory bool[3] could be 
optimized by the compiler for compact storage, what you're most 
likely to get is 3 bytes, one for each bool, or perhaps even 3 
ints (24 bytes). When dealing with units of data smaller than a 
byte, you generally need to do it manually, because memory is 
not addressable by individual bits, making it difficult to 
implement things like slicing an array of bool. So the compiler 
is most likely to simplify things by making it an array of 
bytes rather than emit complex bit manipulation code to make up 
for the lack of bit-addressability in the underlying hardware.


Using bit operators like others have pointed out in this thread 
is probably the best way to implement what you want.


T


I'm curious as to what "manual implementation" would mean, since 
clearly making my own struct with `bool[3]` doesn't count. Does D 
have features for precise memory manipulation?


Anyway, I'm surprised that D has a special operator `&=` for 
doing bit manipulation on integers, especially given that the 
steps to convert an int into a bool array is more complicated. I 
would imagine the former would be a rather niche thing.


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-15 Thread Daniel N via Digitalmars-d-learn

On Tuesday, 12 March 2024 at 05:38:03 UTC, Liam McGillivray wrote:
I am in need of a data type for holding direction information; 
one of 8 directions on a single axis. They are named in terms 
of compass directions. If D had a 4-bit datatype, I would just 
use this and do `+=2` whenever I want to change the datatype, 
but it doesn't.


Perhaps this would be a good programming challenge for someone 
more experienced than me. Make a data type (probably a struct) 
for holding one of 8 directional values using 3 bits. It should 
accept the use of increment operators to change the angle.


Ideally (but optionally), it should work as an enum of the same 
name; "Direction".


Here's a unittest that it should ideally pass:


D actually supports both 3 and 4 bit integers. People will likely 
warn you of minor portability risks... but if you target a 
limited number of platforms and prefer concise, readable code, 
then it's a textbook example for bit fields. The risk can easily 
be mitigated by having a unittest to catch the error directly (if 
you try to port to some exotic platform).


dmd -preview=bitfields

(Some lines stolen from Rikki)

```d
struct Direction
{
private uint value : 3;
alias this = value;

enum Direction N  = Direction(0);
enum Direction NE = Direction(1);
enum Direction E  = Direction(2);
enum Direction SE = Direction(3);
enum Direction S  = Direction(4);
enum Direction SW = Direction(5);
enum Direction W  = Direction(6);
enum Direction NW = Direction(7);
}
```


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-14 Thread H. S. Teoh via Digitalmars-d-learn
On Thu, Mar 14, 2024 at 11:39:33PM +, Liam McGillivray via 
Digitalmars-d-learn wrote:
[...]
> I tried to rework the functions to use bitwise operations, but it was
> difficult to figure out the correct logic. I decided that it's not
> worth the hassle, so I just changed the value storage from `bool[3]`
> to `ubyte`.
[...]

Just wanted to note that while in theory bool[3] could be optimized by
the compiler for compact storage, what you're most likely to get is 3
bytes, one for each bool, or perhaps even 3 ints (24 bytes). When
dealing with units of data smaller than a byte, you generally need to do
it manually, because memory is not addressable by individual bits,
making it difficult to implement things like slicing an array of bool.
So the compiler is most likely to simplify things by making it an array
of bytes rather than emit complex bit manipulation code to make up for
the lack of bit-addressability in the underlying hardware.

Using bit operators like others have pointed out in this thread is
probably the best way to implement what you want.


T

-- 
LINUX = Lousy Interface for Nefarious Unix Xenophobes.


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-14 Thread Basile B. via Digitalmars-d-learn
On Friday, 15 March 2024 at 00:00:01 UTC, Richard (Rikki) Andrew 
Cattermole wrote:


On 15/03/2024 12:47 PM, Basile B. wrote:
On Thursday, 14 March 2024 at 23:39:33 UTC, Liam McGillivray 
wrote:
On Thursday, 14 March 2024 at 01:58:46 UTC, Richard (Rikki) 
Andrew Cattermole wrote:

[...]


I tried to rework the functions to use bitwise operations, 
but it was difficult to figure out the correct logic. I 
decided that it's not worth the hassle, so I just changed the 
value storage from `bool[3]` to `ubyte`. Now it works much 
more like your version.

https://github.com/LiamM32/Open_Emblem/blob/c2014ab3f77e89c0cedcd6dbf7f8362ebfac33a9/source/common.d

I did a little reading, so now I understand what it means 
when you have `&= 7`. But I want to ask, is this faster than 
`%= 8`? If not, I would like to change it to the latter for 
readability.


`%=8` will be code-generated using slower instructions without 
optimizations enabled, but with `&=7` you directly get the right 
instruction, which does not involve integer division. See 
https://godbolt.org/z/74vbba5aG


Yes, it'll depend upon how smart the compiler is at optimizing 
and it may not occur in non-optimizing builds.


Indeed GDC (so very likely GCC too, or whatever language uses it 
as a backend...) does it without optimizations 
(https://godbolt.org/z/Ke7c54Gqj).


That's not very surprising. If you look at the LLVM bug tracker, 
under the tag "missed optimisations", in the report bodies you'll 
see a lot of "GDC does that but we don't", even if here it's a 
bit different, as optimizations are not enabled.





Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-14 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-learn



On 15/03/2024 12:47 PM, Basile B. wrote:

On Thursday, 14 March 2024 at 23:39:33 UTC, Liam McGillivray wrote:
On Thursday, 14 March 2024 at 01:58:46 UTC, Richard (Rikki) Andrew 
Cattermole wrote:

[...]


I tried to rework the functions to use bitwise operations, but it was 
difficult to figure out the correct logic. I decided that it's not 
worth the hassle, so I just changed the value storage from `bool[3]` 
to `ubyte`. Now it works much more like your version.

https://github.com/LiamM32/Open_Emblem/blob/c2014ab3f77e89c0cedcd6dbf7f8362ebfac33a9/source/common.d

I did a little reading, so now I understand what it means when you 
have `&= 7`. But I want to ask, is this faster than `%= 8`? If not, I 
would like to change it to the latter for readability.


`%=8` will be code-generated using slower instructions without 
optimizations enabled, but with `&=7` you directly get the right 
instruction, which does not involve integer division. See 
https://godbolt.org/z/74vbba5aG


Yes, it'll depend upon how smart the compiler is at optimizing and it 
may not occur in non-optimizing builds.


The modulus instructions are based upon division, which is an incredibly 
expensive operation.


https://stackoverflow.com/a/8022107

The division instruction on Haswell for integers ranges from 9 cycles 
for 8-bit, all the way up to 36 cycles for 64-bit.
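
The substitution the compilers make is easy to convince yourself of: 
for unsigned values, a power-of-two modulus is just a mask (a small 
sketch, not from the thread):

```d
void main()
{
    // For unsigned x, x % 8 keeps the low three bits, exactly like x & 7,
    // so the expensive division can be replaced with a single AND.
    foreach (uint x; 0 .. 1_000)
        assert((x % 8) == (x & 7));
}
```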


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-14 Thread Basile B. via Digitalmars-d-learn
On Thursday, 14 March 2024 at 23:39:33 UTC, Liam McGillivray 
wrote:
On Thursday, 14 March 2024 at 01:58:46 UTC, Richard (Rikki) 
Andrew Cattermole wrote:

[...]


I tried to rework the functions to use bitwise operations, but 
it was difficult to figure out the correct logic. I decided 
that it's not worth the hassle, so I just changed the value 
storage from `bool[3]` to `ubyte`. Now it works much more like 
your version.

https://github.com/LiamM32/Open_Emblem/blob/c2014ab3f77e89c0cedcd6dbf7f8362ebfac33a9/source/common.d

I did a little reading, so now I understand what it means when 
you have `&= 7`. But I want to ask, is this faster than `%= 8`? 
If not, I would like to change it to the latter for readability.


`%=8` will be code-generated using slower instructions without 
optimizations enabled, but with `&=7` you directly get the right 
instruction, which does not involve integer division. See 
https://godbolt.org/z/74vbba5aG


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-14 Thread Liam McGillivray via Digitalmars-d-learn
On Thursday, 14 March 2024 at 01:58:46 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
The cost of an add + increment then a bitwise and is only 2-4 
cycles on a Haswell CPU, depending upon whether it's working 
solely in registers (via inlining) or operating on RAM.


Whereas if you need to do branching (if statements, loops), the 
cost is unpredictable, and loops where a simple bitwise 
operation would do are outright unoptimized.


As for exceptions, totally not required. You can solve this by 
simply making your state private so nobody else can mutate it. 
Bounds checking will ensure if the state is corrupted it'll 
error out if you use the lookup method I suggested above.


I tried to rework the functions to use bitwise operations, but it 
was difficult to figure out the correct logic. I decided that 
it's not worth the hassle, so I just changed the value storage 
from `bool[3]` to `ubyte`. Now it works much more like your 
version.

https://github.com/LiamM32/Open_Emblem/blob/c2014ab3f77e89c0cedcd6dbf7f8362ebfac33a9/source/common.d

I did a little reading, so now I understand what it means when 
you have `&= 7`. But I want to ask, is this faster than `%= 8`? 
If not, I would like to change it to the latter for readability.


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-13 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-learn
There appears to be a few things that you may not be aware of based upon 
this implementation.


The cost of an add + increment then a bitwise and is only 2-4 cycles on 
a Haswell CPU, depending upon whether it's working solely in registers 
(via inlining) or operating on RAM.


The cost of a move from RAM into a register is about 1 cycle, but can 
have latencies around 3 cycles if not cached.


Whereas if you need to do branching (if statements, loops), the cost is 
unpredictable, and loops where a simple bitwise operation would do are 
outright unoptimized.


As for methods like to:

```d
static immutable string[] table = ["north", ...];
return table[this.value];
```

That's two moves from ram to register.

In essence in your design, you have blown out the cost by a significant 
margin well beyond the 12% that is described as being the minimum to 
consider optimization.


If you want to understand where the 12% number comes from I suggest 
reading the paper "Structured Programming with go to Statements" by 
Donald Knuth.


As for exceptions, totally not required. You can solve this by simply 
making your state private so nobody else can mutate it. Bounds checking 
will ensure if the state is corrupted it'll error out if you use the 
lookup method I suggested above.


Understanding what I have said is very important for game engine 
programmers. So if you're interested in it beyond some hobbyist 
endeavors you have some reading up to do ;)
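
A minimal sketch of the combination described above — private state, 
wrap-around via a bitwise and instead of branching, and a table lookup 
guarded by bounds checking (the names are illustrative, not from the 
thread):

```d
struct Direction
{
    private ubyte value; // 0 .. 7; private, so nobody else can corrupt it

    ref Direction opUnary(string op)() if (op == "++" || op == "--")
    {
        static if (op == "++") ++value;
        else --value;
        value &= 7; // wrap within 0..7 without any branching
        return this;
    }

    string toString() const
    {
        static immutable string[8] table =
            ["north", "northeast", "east", "southeast",
             "south", "southwest", "west", "northwest"];
        return table[value]; // bounds checking errors out on corrupted state
    }
}

void main()
{
    Direction d;                          // starts at north (0)
    d++;
    assert(d.toString() == "northeast");
    d--;
    d--;
    assert(d.toString() == "northwest");  // wrapped around past north
}
```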


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-13 Thread Liam McGillivray via Digitalmars-d-learn
On Tuesday, 12 March 2024 at 06:38:28 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
By taking advantage of integer wrapping and a bitwise and, its 
quite a simple problem to solve!


Challenge for the reader: add support for binary operations and 
toString support.


Last night I pushed the latest commit to the GitHub repository 
for my game. It contains the `Direction` struct in 
[`source/common.d`](https://github.com/LiamM32/Open_Emblem/blob/master/source/common.d). Here it is:


```d
struct Direction //One of 8 directions stored in 3 bits
{
import std.conv;
import std.traits: isNumeric;

bool[3] b;

static Direction N = Direction(b:[false,false,false]);
static Direction NE = Direction(b:[true,false,false]);
static Direction E = Direction(b:[false,true,false]);
static Direction SE = Direction(b:[true,true,false]);
static Direction S = Direction(b:[false,false,true]);
static Direction SW = Direction(b:[true,false,true]);
static Direction W = Direction(b:[false,true,true]);
static Direction NW = Direction(b:[true,true,true]);

    ref Direction opUnary(string op)() if (op == "++" || op == "--") {

static if (op == "++") const bool up = true;
else const bool up = false;

if (b[0]) {
if (b[1]) b[2] = !b[2];
b[1] = !b[1];
}
b[0] = !b[0];
return this;
}

    void opOpAssign(string op)(int amount) if (op == "+" || op == "-") {

amount = amount%8;
if (amount > 0) for (uint i = 0; i < amount; i++) {
static if (op=="+") this++;
else this--;
} else for (uint i=0; i > amount; i--) {
static if (op=="+") this--;
else this++;
}
}

T to(T)() const if(isNumeric!T) {
return cast(T)(b[0] + 2*b[1] + 4*b[2]);
}

T opCast(T)() if (isNumeric!T) {
return cast(T)(b[0] + 2*b[1] + 4*b[2]);
}

T to(T)() const if(is(T==string)) {
if (this==Direction.N) return "north";
else if (this==Direction.NE) return "northeast";
else if (this==Direction.E) return "east";
else if (this==Direction.SE) return "southeast";
else if (this==Direction.S) return "south";
else if (this==Direction.SW) return "southwest";
else if (this==Direction.W) return "west";
else if (this==Direction.NW) return "northwest";
else throw new Exception("Direction.to!: direction has a value that should be impossible.");

//else return ""; //This should never happen.
}

bool[3] opCast() const {
return this.b;
}

Direction opposite() const {
return Direction([b[0], b[1], !b[2]]);
}

bool diagonal() {
return b[0];
}

int getAngle() {
if (this==Direction.N) return 0;
else if (this==Direction.NE) return 45;
else if (this==Direction.E) return 90;
else if (this==Direction.SE) return 135;
else if (this==Direction.S) return 180;
else if (this==Direction.SW) return 225;
else if (this==Direction.W) return 270;
else if (this==Direction.NW) return 315;
else throw new Exception("Direction.getAngle: direction has a value that should be impossible.");

}
}
```

The one thing that can't be done on this is doing `direction+8`. 
Perhaps it would have been easier if I had chosen to store the 
value as a ubyte rather than an array of bools. While this is 
probably over-optimized given the large amount of memory in 
today's computers, I like that it can never hold an illegitimate 
value.
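
That `direction+8` gap can be filled with a non-mutating `opBinary` 
overload alongside the existing operators. A minimal sketch (the 
`Dir8` name and the int storage are assumptions for illustration, 
not code from this thread):

```d
struct Dir8
{
    private int value; // only the low 3 bits are meaningful

    // Binary + and - return a new, wrapped direction, so expressions
    // like `d + 8` or `d - 3` work without mutating d.
    Dir8 opBinary(string op)(int amount) if (op == "+" || op == "-")
    {
        static if (op == "+") return Dir8((value + amount) & 7);
        else                  return Dir8((value - amount) & 7);
    }

    enum Dir8 N = Dir8(0), E = Dir8(2), S = Dir8(4), W = Dir8(6);
}

unittest
{
    auto d = Dir8.N;
    assert(d + 8 == Dir8.N); // a full turn wraps back to the start
    assert(d + 2 == Dir8.E);
    assert(d - 2 == Dir8.W); // negative results wrap: -2 & 7 == 6
}
```

Masking with `& 7` relies on two's-complement wrap, so negative 
intermediate values come out right without a separate branch.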


There's probably some trick to make it easier to write the 
functions that do numerical operations on bits, but I don't know 
the cleanest way to represent this kind of thing in code. Maybe I 
should look for examples of how it's already done.


I noticed among the options for overloading unary operations are 
the symbols "+" & "-". What operation is being overloaded with 
the function `ref Direction opUnary(string op:"+")(int amount)`?


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-13 Thread Salih Dincer via Digitalmars-d-learn

On Wednesday, 13 March 2024 at 10:27:49 UTC, Basile B. wrote:
The semantics of the operators are actually not as clear as 
that. What if you define


```d
enum Direction
{
N = 1, NE, S = 45, SW
}
```

?


Certainly! `EnumMembers` can be used. The `EnumMembers` template 
from the `std.traits` module retrieves all the members of an 
enumeration as a tuple, which can be iterated over with a 
`foreach` loop. Since the iteration happens at compile time, it is 
useful for generating code based on enum members.

Here’s a simple code example:

```d
 enum Days
 {
   Monday= 1001,
   Tuesday   = 1010,
   Wednesday = 1011,
   Thursday  = 1100,
   Friday= 1101,
   Saturday  = 1110,
   Sunday= 1111
 }

 import std.traits : EnumMembers;
 foreach(day; EnumMembers!Days)
   day.writeln(":", cast(int)day);
```

SDB@79


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-13 Thread Basile B. via Digitalmars-d-learn

On Tuesday, 12 March 2024 at 05:38:03 UTC, Liam McGillivray wrote:
I am in need of a data type for holding direction information; 
one of 8 directions on a single axis. They are named in terms 
of compass directions. If D had a 4-bit datatype, I would just 
use this and do `+=2` whenever I want to change the datatype, 
but it doesn't.


Perhaps this would be a good programming challenge for someone 
more experienced than me. Make a data type (probably a struct) 
for holding one of 8 directional values using 3 bits. It should 
accept the use of increment operators to change the angle.


Ideally (but optionally), it should work as an enum of the same 
name; "Direction".


Here's a unittest that it should ideally pass:
```
unittest
{
Direction direction = Direction.N;
direction++;
assert(direction == Direction.NE);
direction+=3;
assert(direction == Direction.S);
direction--;
assert(direction == Direction.SE);
direction-=4;
assert(direction == Direction.NW);
}
```


While there are workarounds (as proposed in the other answers, 
using operator overloads), I tend to think that the way enums 
currently work with arithmetic operators is a symptom of an 
incomplete type system. Enums made of integer numbers are 
subclasses of the parent [integral 
sequence](https://en.wikipedia.org/wiki/Integer_sequence).


The semantics of the operators are actually not as clear as that. 
What if you define


```d
enum Direction
{
N = 1, NE, S = 45, SW
}
```

?

You directly see that the proposed workarounds don't work anymore.

There are natural numbers, subsets of natural numbers, unordered 
sequences, etc. The type system does not represent that. Anyway, 
I don't blame D; plenty of languages have the exact same problem.
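
One way around that (a sketch for illustration, not code from the 
thread) is to stop doing arithmetic on the enum's *values* and 
instead index its declared members via `EnumMembers`:

```d
import std.traits : EnumMembers;
import std.algorithm.searching : countUntil;

enum Direction { N = 1, NE, S = 45, SW }

// Step through the *declared* members in declaration order, so the
// arbitrary numeric values (N = 1, S = 45, ...) no longer matter.
Direction step(Direction d, int amount)
{
    static immutable members = [EnumMembers!Direction];
    const i = members.countUntil(d);
    const n = cast(int) members.length;
    return members[((i + amount) % n + n) % n]; // wrap both ways
}

unittest
{
    assert(Direction.N.step(1) == Direction.NE);
    assert(Direction.SW.step(1) == Direction.N);  // wraps forward
    assert(Direction.N.step(-1) == Direction.SW); // wraps backward
}
```

This trades O(1) masking for a linear member lookup, but it stays 
correct no matter what values the enum members carry.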


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-12 Thread Salih Dincer via Digitalmars-d-learn

On Tuesday, 12 March 2024 at 05:38:03 UTC, Liam McGillivray wrote:
Perhaps this would be a good programming challenge for someone 
more experienced than me. Make a data type (probably a struct) 
for holding one of 8 directional values using 3 bits. It should 
accept the use of increment operators to change the angle.


D is such a perfect language that you can do the features you 
mentioned and more. I also implemented the following during my 
rookie years:


```d
alias Direction = enumSet;
union enumSet(T)
{
  T e;

  alias e this;
  struct
  {
int i;

auto opUnary(string op: "++")()
  => i = i < e.max ? ++i : e.min;

auto opUnary(string op: "--")()
  => i = i > e.min ? --i : e.max;

auto opOpAssign(string op: "+")(int val)
{
  i += val;
  if(i > e.max) i -= e.max + 1;
}

auto opOpAssign(string op: "-")(int val)
{
  i -= val;
  if(i < e.min) i += e.max + 1;
}
  }

  string toString() const
=> e.to!string;
}

unittest
{
 auto direction = Direction!Directions(Directions.N);
 direction++;
 assert(direction == Directions.NE);
 direction+=3;
 assert(direction == Directions.S);
 direction--;
 assert(direction == Directions.SE);
 direction-=4;
 assert(direction == Directions.NW);
}

import std.stdio, std.conv;

void main()
{
  enum Directions
  {
N , NE , E, SE, S, SW , W, NW
  }

  auto test = enumSet!Directions(Directions.W);
   test += 9; /*

   ++test; ++test; ++test;
   ++test; ++test; ++test;
   ++test; ++test; ++test;//*/

   test.writeln;

   test--; test--;
   test.writeln;
}
```

Here union was used with an extra template parameter. I also 
added 2 aliases to avoid confusion.


SDB@79


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-12 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-learn

On 13/03/2024 11:00 AM, Liam McGillivray wrote:
I'm not familiar with the syntax of the line `value &= 7;`. Is it 
equivalent to writing `value = value % 7;`?


`&` is a bitwise AND. Masking with 7 (binary 111) keeps the lowest 
three bits and clears the rest:

LSB 123456789 MSB

& 7

LSB 123000000 MSB

Anyway, you used an int, but I used an array of 3 bools. I'm guessing 
that mine uses less memory, but I'm not sure how much memory it adds 
when it's a struct with functions.


Due to alignment, it'll probably use just as much.

Mine only needs a single ``byte``; at 7 bits it's more than enough.

But ``int`` doesn't make much difference unless you are packing 
instances together with ``align(0):``, and realistically CPUs are 
optimized for 32 bits, not 8.
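
The size claims are easy to check: methods don't add to a struct's 
instance size in D, so it comes down to the fields. A quick sketch 
(the struct names are made up for the comparison):

```d
// bool is 1 byte in D, so three bools cost 3 bytes with alignment 1;
// methods live outside the instance and add nothing to .sizeof.
struct BoolDir { bool[3] b; }
struct ByteDir { ubyte value; }
struct IntDir  { int value; }

static assert(BoolDir.sizeof == 3 && BoolDir.alignof == 1);
static assert(ByteDir.sizeof == 1);
static assert(IntDir.sizeof == 4);

void main() {}
```

So the bool-array version is smaller than the int one as a bare 
struct, though padding can eat the difference once it is embedded 
in a larger, int-aligned type.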


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-12 Thread Liam McGillivray via Digitalmars-d-learn
On Tuesday, 12 March 2024 at 06:38:28 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
By taking advantage of integer wrapping and a bitwise AND, it's 
quite a simple problem to solve!


Challenge for the reader: add support for binary operations and 
toString support.


https://dlang.org/spec/operatoroverloading.html

```d
struct Direction {
private int value;

Direction opUnary(string op:"++")() {
value++;
value &= 7;
return this;
}

Direction opUnary(string op:"--")() {
value--;
value &= 7;
return this;
}

void opOpAssign(string op:"+")(int amount) {
value += amount;
value &= 7;
}

void opOpAssign(string op:"-")(int amount) {
value -= amount;
value &= 7;
}

enum Direction N = Direction(0);
enum Direction NE = Direction(1);
enum Direction E = Direction(2);
enum Direction SE = Direction(3);
enum Direction S = Direction(4);
enum Direction SW = Direction(5);
enum Direction W = Direction(6);
enum Direction NW = Direction(7);
}

unittest {
 Direction direction = Direction.N;
 direction++;
 assert(direction == Direction.NE);
 direction+=3;
 assert(direction == Direction.S);
 direction--;
 assert(direction == Direction.SE);
 direction-=4;
 assert(direction == Direction.NW);
}
```


Interesting. I didn't know that an enum can be defined inside a 
struct like that. I had used functions to get around it.


Here is what I had already mostly written, using help from 
ChatGPT (but only for the opUnary syntax, not the algorithm):

```
struct Direction //One of 8 directions stored in 3 bits
{
bool[3] d;

static Direction N() { return Direction(d:[false,false,false]); }
static Direction NE() { return Direction(d:[false,false,true]); }
static Direction E() { return Direction(d:[false,true,false]); }
static Direction SE() { return Direction(d:[false,true,true]); }
static Direction S() { return Direction(d:[true,false,false]); }
static Direction SW() { return Direction(d:[true,false,true]); }
static Direction W() { return Direction(d:[true,true,false]); }
static Direction NW() { return Direction(d:[true,true,true]); }


ref Direction opUnary(string op)() if (op == "++" || op == "--") {

if (op == "++") const bool up = true;
else const bool up = false;

if (d[0]) {
if (d[1]) d[2] = !d[2];
d[1] = !d[1];
}
d[0] = !d[0];
return this;
}

auto to(T)() const {
return cast(T)(d[0] + 2*d[1] + 4*d[2]);
}
}
```

I am not entirely sure how well it works. I will come back later 
with an updated version with more functions.


I'm not familiar with the syntax of the line `value &= 7;`. Is it 
equivalent to writing `value = value % 7;`?
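
As a side note to the question above: for non-negative values, 
`value & 7` is equivalent to `value % 8` (not `% 7`), since 7 is 
`0b111` and masks the value into the range 0..7. A quick check 
(illustration only, not code from the thread):

```d
void main()
{
    foreach (value; 0 .. 32)
        assert((value & 7) == (value % 8)); // mask == modulo 8 here

    // Unlike %, the mask also wraps negative values into 0..7,
    // because D ints are two's complement:
    assert((-1 & 7) == 7);
    assert((-1 % 8) == -1); // D's % keeps the sign of the dividend
}
```

That sign behaviour is why the masking versions of `--` and `-=` in 
the answers above wrap cleanly without any extra branching.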


Anyway, you used an int, but I used an array of 3 bools. I'm 
guessing that mine uses less memory, but I'm not sure how much 
memory it adds when it's a struct with functions.


Re: Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-12 Thread Richard (Rikki) Andrew Cattermole via Digitalmars-d-learn
By taking advantage of integer wrapping and a bitwise AND, it's quite a 
simple problem to solve!


Challenge for the reader: add support for binary operations and toString 
support.


https://dlang.org/spec/operatoroverloading.html

```d
struct Direction {
private int value;

Direction opUnary(string op:"++")() {
value++;
value &= 7;
return this;
}

Direction opUnary(string op:"--")() {
value--;
value &= 7;
return this;
}

void opOpAssign(string op:"+")(int amount) {
value += amount;
value &= 7;
}

void opOpAssign(string op:"-")(int amount) {
value -= amount;
value &= 7;
}

enum Direction N = Direction(0);
enum Direction NE = Direction(1);
enum Direction E = Direction(2);
enum Direction SE = Direction(3);
enum Direction S = Direction(4);
enum Direction SW = Direction(5);
enum Direction W = Direction(6);
enum Direction NW = Direction(7);
}

unittest {
 Direction direction = Direction.N;
 direction++;
 assert(direction == Direction.NE);
 direction+=3;
 assert(direction == Direction.S);
 direction--;
 assert(direction == Direction.SE);
 direction-=4;
 assert(direction == Direction.NW);
}
```


Challenge: Make a data type for holding one of 8 directions allowing increment and overflow

2024-03-11 Thread Liam McGillivray via Digitalmars-d-learn
I am in need of a data type for holding direction information; 
one of 8 directions on a single axis. They are named in terms of 
compass directions. If D had a 4-bit datatype, I would just use 
this and do `+=2` whenever I want to change the datatype, but it 
doesn't.


Perhaps this would be a good programming challenge for someone 
more experienced than me. Make a data type (probably a struct) 
for holding one of 8 directional values using 3 bits. It should 
accept the use of increment operators to change the angle.


Ideally (but optionally), it should work as an enum of the same 
name; "Direction".


Here's a unittest that it should ideally pass:
```
unittest
{
Direction direction = Direction.N;
direction++;
assert(direction == Direction.NE);
direction+=3;
assert(direction == Direction.S);
direction--;
assert(direction == Direction.SE);
direction-=4;
assert(direction == Direction.NW);
}
```


Re: The One Billion Row Challenge

2024-01-13 Thread Sergey via Digitalmars-d-learn

On Saturday, 13 January 2024 at 23:25:07 UTC, monkyyy wrote:

On Thursday, 11 January 2024 at 11:21:39 UTC, Sergey wrote:
On Thursday, 11 January 2024 at 08:57:43 UTC, Christian 
Köstlin wrote:

Did someone already try to do this in dlang?
I guess it will be very hard to beat the java solutions 
running with graalvm!


https://news.ycombinator.com/item?id=38851337

Kind regards,
Christian


I think C++ people already beat Java's performance: 
https://github.com/buybackoff/1brc?tab=readme-ov-file#native


I feel we could beat C++ if they didn't radix sort


The project is very hard. Many optimizations and tricks were 
applied by others. It requires a lot of skill to implement 
everything at a high level.


Re: The One Billion Row Challenge

2024-01-13 Thread monkyyy via Digitalmars-d-learn

On Thursday, 11 January 2024 at 11:21:39 UTC, Sergey wrote:
On Thursday, 11 January 2024 at 08:57:43 UTC, Christian Köstlin 
wrote:

Did someone already try to do this in dlang?
I guess it will be very hard to beat the java solutions 
running with graalvm!


https://news.ycombinator.com/item?id=38851337

Kind regards,
Christian


I think C++ people already beat Java's performance: 
https://github.com/buybackoff/1brc?tab=readme-ov-file#native


I feel we could beat C++ if they didn't radix sort


Re: The One Billion Row Challenge

2024-01-11 Thread bachmeier via Digitalmars-d-learn
On Thursday, 11 January 2024 at 08:57:43 UTC, Christian Köstlin 
wrote:

Did someone already try to do this in dlang?
I guess it will be very hard to beat the java solutions running 
with graalvm!


https://news.ycombinator.com/item?id=38851337

Kind regards,
Christian


The problem with this challenge can be seen in the initial 
comments. Writing the fastest possible program *for a specific 
dataset* is not the same thing as writing the fastest program for 
an arbitrary dataset of that size. And, indeed, the fastest 
program is the one that does nothing but print the answer.


Speed on this task doesn't tell you anything about performance 
with different types/sizes of data or constraints on programmer 
time needed to produce a correct implementation and maintain it 
over time.


Re: The One Billion Row Challenge

2024-01-11 Thread Sergey via Digitalmars-d-learn
On Thursday, 11 January 2024 at 08:57:43 UTC, Christian Köstlin 
wrote:

Did someone already try to do this in dlang?
I guess it will be very hard to beat the java solutions running 
with graalvm!


https://news.ycombinator.com/item?id=38851337

Kind regards,
Christian


I think C++ people already beat Java's performance: 
https://github.com/buybackoff/1brc?tab=readme-ov-file#native


Re: Threading challenge: calculate fib(45) while spinning

2021-10-16 Thread Sebastiaan Koppe via Digitalmars-d-learn

On Friday, 15 October 2021 at 03:35:44 UTC, jfondren wrote:
The book, "The Go Programming Language" has this simple 
goroutine example:


[...]


Here is a similar implementation using the concurrency library:

```d
import concurrency;
import concurrency.stream;
import concurrency.sender : justFrom;
import concurrency.operations : via, race;
import concurrency.thread : ThreadSender;
import core.time : msecs;
import std.stdio : writef, writefln, stdout;
import core.thread : Thread;

void main() @safe {
enum chars = `-\|/`;
auto spinner = infiniteStream(0)
.scan((int acc, int _) => acc + 1, 0)
.collect((int i) shared @trusted {
writef("\r%c", chars[i % chars.length]);
stdout.flush();
Thread.sleep(100.msecs);
})
.via(ThreadSender());

enum n = 45;
auto work = justFrom(() => fib(n));

auto result = race(spinner, work).syncWait.value;
writefln("\rFibonacci(%d) = %d", n, result.get);
}

int fib(int x) pure @safe @nogc {
if (x < 2)
return x;
return fib(x - 1) + fib(x - 2);
}
```

Go has language support, so it is a bit unfair to compare it.

But this code will properly handle errors (in case `writef` or 
`flush` throws), as well as provide an explicit synchronization 
point so that the final `writefln` always runs after the spinner 
is done.


Re: Threading challenge: calculate fib(45) while spinning

2021-10-15 Thread Steven Schveighoffer via Digitalmars-d-learn

On 10/15/21 10:01 AM, Ali Çehreli wrote:

 >    writefln!"\rFibonacci(%d) = %d"(n, fibN);

That '\r' bothered me because the actual work needs to know what the 
spinner is doing to clear its remaining character.


I would expect the original go code had the same problem.

-Steve


Re: Threading challenge: calculate fib(45) while spinning

2021-10-15 Thread Ali Çehreli via Digitalmars-d-learn

On 10/14/21 8:54 PM, Ali Çehreli wrote:

>writefln!"\rFibonacci(%d) = %d"(n, fibN);

That '\r' bothered me because the actual work needs to know what the 
spinner is doing to clear its remaining character.


>receiveTimeout(delay,
>   (OwnerTerminated msg) {

And there is a race condition because the spinner can print an extra 
character by the time it receives the OwnerTerminated message. (You can 
observe this by adding e.g. Thread.sleep(300.msecs) after the 
"\rFibonnacci..." line above.)


So, I improved it by removing both of those concerns as well as adding 
the following:


- An optional message when spinning (it can be further improved because 
there is an extra space character if the message is empty)


- A withSpinner() function to work with any delegate

The only requirement is that the delegate should not output to stdout if 
we want a clean output.


import std.stdio : stdout, writef, writefln;
import std.concurrency : receiveTimeout, send, spawn;
import std.traits : ReturnType;
import core.thread : Duration, msecs, Thread;
import std.range : cycle, repeat, walkLength;
import std.format : format;

void main() {
  enum n = 45;

  int fibN; // Left mutable not to complicate the example

  withSpinner({
  fibN = fib(n); // slow
},
format!"Calculating fib(%s)"(n));

  writefln!"Fibonacci(%d) = %d"(n, fibN);
}

// The delegate 'dg' should not output to stdout.
void withSpinner(Dg)(Dg dg,
 string message = null,
 Duration delay = 100.msecs) {
  shared(bool) spinnerDone = false;
  auto s = spawn(&spinner, message, delay, &spinnerDone);

  // Do work while spinning
  dg();

  // Tell the spinner to stop (the value does not matter)
  s.send(0x0FF);

  // Busy(ish) wait until the spinner is done
  while (!spinnerDone) {
Thread.yield();
  }
}

void spinner(string message,
 Duration delay,
 shared(bool) * done) {
  foreach (c; `-\|/`.cycle) {
if (receiveTimeout(delay, (int _) {})) {
  // Stop request received

  // Clear the spinning message
  writef("\r%s  \r", " ".repeat(walkLength(message)));

  // Tell the user we are done
  *done = true;
  return;
}
writef!"\r%s %c"(message, c);
stdout.flush();
  }
}

auto fib(int x) {
  if (x < 2) {
return x;
  }
  return fib(x-1) + fib(x-2);
}

Ali




Re: Threading challenge: calculate fib(45) while spinning

2021-10-15 Thread Steven Schveighoffer via Digitalmars-d-learn

On 10/14/21 11:35 PM, jfondren wrote:

The book, "The Go Programming Language" has this simple goroutine example:

```go
func main() {
     go spinner(100 * time.Millisecond)
     const n = 45
     fibN := fib(n) // slow
     fmt.Printf("\rFibonacci(%d) = %d\n", n, fibN)
}

func spinner(delay time.Duration) {
     for {
     for _, r := range `-\|/` {
     fmt.Printf("\r%c", r)
     time.Sleep(delay)
     }
     }
}

func fib(x int) int {
     if x < 2 {
     return x
     }
     return fib(x-1) + fib(x-2)
}
```

Attempt #1, with std.concurrency:

```d
import std.concurrency : spawn;
import core.thread : Thread;
import std.stdio : writefln, writef, stdout;
import std.datetime : msecs, Duration;

void main() @safe {
     (() @trusted { spawn(&spinner, 100.msecs); })();
     const n = 45;
     const fibN = fib(n); // slow
     writefln!"\rFibonacci(%d) = %d"(n, fibN);
}

void spinner(Duration delay) @safe {
     (() @trusted { Thread.getThis.isDaemon(true); })();
     while (true) {
     foreach (char c; `-\|/`) {
     writef!"\r%c"(c);
     (() @trusted { stdout.flush; })();
     (() @trusted { Thread.sleep(delay); })();
     }
     }
}

int fib(int x) pure @safe @nogc {
     if (x < 2)
     return x;
     return fib(x - 1) + fib(x - 2);
}
```

This version has two problems:

1. a race condition with `isDaemon`: if `main()` ends before 
`isDaemon(true)` is called, then the program never ends because the 
kill-non-daemon-threads module destructor is called while the new thread 
isn't a daemon thread.


You can also just spawn a thread directly with `Thread`, which I believe 
allows you to set the daemon-ness from `main`.
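
That suggestion looks roughly like this (a minimal sketch with a 
stand-in spinner body, not the book example):

```d
import core.thread : Thread;
import core.time : msecs;
import std.stdio : writeln;

void main()
{
    auto spin = new Thread({
        while (true)
            Thread.sleep(10.msecs); // stand-in for the spinner loop
    });
    spin.isDaemon = true; // set *before* start, so no race with main
    assert(spin.isDaemon);
    spin.start();

    writeln("work done"); // daemon thread is killed when main returns
}
```

Because the flag is set before `start()`, there is no window where 
the runtime could see a non-daemon thread at shutdown.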




2. it crashes about 10% of the time on exit (in dmd, gdc, and ldc). 
valgrind on a gdc build complains about "Conditional jump or move 
depends on uninitialised value(s)" early on.



The crash is likely because you are using D i/o utilities, and the 
runtime is shut down. Technically it shouldn't cause a problem, but 
possibly there are things that are needed deep inside `writef`.


If you switch to `printf`, it will probably work.

-Steve


Re: Threading challenge: calculate fib(45) while spinning

2021-10-15 Thread Imperatorn via Digitalmars-d-learn

On Friday, 15 October 2021 at 03:54:17 UTC, Ali Çehreli wrote:

On 10/14/21 8:35 PM, jfondren wrote:

[...]


Here is one that uses receiveTimeout and OwnerTerminated:

import std.stdio;
import std.concurrency;
import core.thread;

void main() {
  spawnLinked(&spinner, 100.msecs);
  enum n = 45;
  const fibN = fib(n); // slow
  writefln!"\rFibonacci(%d) = %d"(n, fibN);
}

void spinner(const(Duration) delay) {
  for (;;) {
foreach (r; `-\|/`) {
  writef!"\r%c"(r);
  stdout.flush();
  bool done;
  receiveTimeout(delay,
 (OwnerTerminated msg) {
   done = true;
 });
  if (done) {
return;
  }
}
  }
}

auto fib(int x) {
  if (x < 2) {
return x;
  }
  return fib(x-1) + fib(x-2);
}

Ali


This is a "similar" approach to what Erlang does. I have always 
liked it ☀️


Re: Threading challenge: calculate fib(45) while spinning

2021-10-14 Thread Ali Çehreli via Digitalmars-d-learn

On 10/14/21 9:17 PM, jfondren wrote:

On Friday, 15 October 2021 at 03:54:17 UTC, Ali Çehreli wrote:

On 10/14/21 8:35 PM, jfondren wrote:
The book, "The Go Programming Language" has this simple goroutine 
example:


Here is one that uses receiveTimeout and OwnerTerminated:



Very nice, replacing Thread.sleep with receiveTimeout and getting 
graceful interruption for free. This also doesn't crash.


Cool. :)

Actually, it can be shorter by checking the return value of receiveTimeout:

  if (receiveTimeout(delay, (OwnerTerminated msg) {})) {
return;
  }

I didn't use this method earlier because I was afraid an unexpected 
message might make receiveTimeout return 'true'. But I've tested just 
now: Only the expected OwnerTerminated makes it return 'true'.


Ali



Re: Threading challenge: calculate fib(45) while spinning

2021-10-14 Thread jfondren via Digitalmars-d-learn

On Friday, 15 October 2021 at 03:54:17 UTC, Ali Çehreli wrote:

On 10/14/21 8:35 PM, jfondren wrote:
The book, "The Go Programming Language" has this simple 
goroutine example:


Here is one that uses receiveTimeout and OwnerTerminated:



Very nice, replacing Thread.sleep with receiveTimeout and getting 
graceful interruption for free. This also doesn't crash.


Re: Threading challenge: calculate fib(45) while spinning

2021-10-14 Thread Ali Çehreli via Digitalmars-d-learn

On 10/14/21 8:35 PM, jfondren wrote:

The book, "The Go Programming Language" has this simple goroutine example:


Here is one that uses receiveTimeout and OwnerTerminated:

import std.stdio;
import std.concurrency;
import core.thread;

void main() {
  spawnLinked(&spinner, 100.msecs);
  enum n = 45;
  const fibN = fib(n); // slow
  writefln!"\rFibonacci(%d) = %d"(n, fibN);
}

void spinner(const(Duration) delay) {
  for (;;) {
foreach (r; `-\|/`) {
  writef!"\r%c"(r);
  stdout.flush();
  bool done;
  receiveTimeout(delay,
 (OwnerTerminated msg) {
   done = true;
 });
  if (done) {
return;
  }
}
  }
}

auto fib(int x) {
  if (x < 2) {
return x;
  }
  return fib(x-1) + fib(x-2);
}

Ali


Threading challenge: calculate fib(45) while spinning

2021-10-14 Thread jfondren via Digitalmars-d-learn
The book, "The Go Programming Language" has this simple goroutine 
example:


```go
func main() {
go spinner(100 * time.Millisecond)
const n = 45
fibN := fib(n) // slow
fmt.Printf("\rFibonacci(%d) = %d\n", n, fibN)
}

func spinner(delay time.Duration) {
for {
for _, r := range `-\|/` {
fmt.Printf("\r%c", r)
time.Sleep(delay)
}
}
}

func fib(x int) int {
if x < 2 {
return x
}
return fib(x-1) + fib(x-2)
}
```

Attempt #1, with std.concurrency:

```d
import std.concurrency : spawn;
import core.thread : Thread;
import std.stdio : writefln, writef, stdout;
import std.datetime : msecs, Duration;

void main() @safe {
    (() @trusted { spawn(&spinner, 100.msecs); })();
const n = 45;
const fibN = fib(n); // slow
writefln!"\rFibonacci(%d) = %d"(n, fibN);
}

void spinner(Duration delay) @safe {
(() @trusted { Thread.getThis.isDaemon(true); })();
while (true) {
foreach (char c; `-\|/`) {
writef!"\r%c"(c);
(() @trusted { stdout.flush; })();
(() @trusted { Thread.sleep(delay); })();
}
}
}

int fib(int x) pure @safe @nogc {
if (x < 2)
return x;
return fib(x - 1) + fib(x - 2);
}
```

This version has two problems:

1. a race condition with `isDaemon`: if `main()` ends before 
`isDaemon(true)` is called, then the program never ends because 
the kill-non-daemon-threads module destructor is called while the 
new thread isn't a daemon thread.


2. it crashes about 10% of the time on exit (in dmd, gdc, and 
ldc). valgrind on a gdc build complains about "Conditional jump 
or move depends on uninitialised value(s)" early on.


Attempt #2, with std.parallelism:

```d
import std.parallelism : task, taskPool;
import core.thread : Thread;
import std.stdio : writefln, writef, stdout;
import std.datetime : msecs, Duration;

void main() @safe {
auto spin = task!spinner(100.msecs);
taskPool.put(spin);
const n = 45;
const fibN = fib(n); // slow
writefln!"\rFibonacci(%d) = %d"(n, fibN);
}

void spinner(Duration delay) @safe {
while (true) {
foreach (char c; `-\|/`) {
writef!"\r%c"(c);
(() @trusted { stdout.flush; })();
(() @trusted { Thread.sleep(delay); })();
}
}
}

int fib(int x) pure @safe @nogc {
if (x < 2)
return x;
return fib(x - 1) + fib(x - 2);
}
```

This version continues to spin after the Fibonacci result is 
printed, despite 
https://dlang.org/phobos/std_parallelism.html#.taskPool saying 
that `taskPool` worker threads are daemon by default, and despite 
various attempts to add `isDaemon(true)` calls.


Is there a D version without these problems, and without varying 
substantially from the Go (e.g. by having the spinner poll to see 
if it should exit gracefully)?


Re: Bit rotation question/challenge

2021-01-30 Thread vitamin via Digitalmars-d-learn

On Saturday, 30 January 2021 at 14:56:14 UTC, burt wrote:

On Saturday, 30 January 2021 at 14:41:59 UTC, Afgdr wrote:

On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote:

On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:

[...]


cast as uint and shift. cast the result as ubyte[4].


Obviously, that works only for n=4 with uint and n=8 with ulong.


Yes I used to do this, but then I needed it for n > 8.


You can try something like this:

https://run.dlang.io/is/POQgnb

import std.range : cycle, take, drop;
import std.algorithm : copy;
import std.stdio;

version (LittleEndian)
ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation){

typeof(return) result;

array[]
.cycle()
.drop(n - (rotation / 8) % n)
.take(n)
.copy(result[]);

const ubyte bit_rotation = rotation % 8;

    enum ubyte full = 0b1111_1111;

if(bit_rotation == 0)
return result;

ubyte next_prefix(const ubyte elm){
const ubyte suffix = (elm & ~(full << bit_rotation));
        const ubyte prefix = cast(ubyte)(suffix << (8 - bit_rotation));

return prefix;
}

ubyte prefix = next_prefix(result[$-1]);

foreach(ref ubyte elm; result[]){
const new_prefix = next_prefix(elm);
elm = (elm >> bit_rotation) | prefix;
prefix = new_prefix;
}

return result;
}

void main(){
ubyte[4] x = [
0b00011000,
0b0011,
0b00010101,
0b0010,
];

writefln!"%(%8b,\n%)"(x.rotateRight(4));
}



Re: Bit rotation question/challenge

2021-01-30 Thread Afgdr via Digitalmars-d-learn

On Saturday, 30 January 2021 at 14:56:14 UTC, burt wrote:

On Saturday, 30 January 2021 at 14:41:59 UTC, Afgdr wrote:

On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote:

On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:

[...]


cast as uint and shift. cast the result as ubyte[4].


Obviously, that works only for n=4 with uint and n=8 with ulong.


Yes I used to do this, but then I needed it for n > 8.


As suggested in the other answer BitArray may be the best generic 
solution.


Re: Bit rotation question/challenge

2021-01-30 Thread burt via Digitalmars-d-learn

On Saturday, 30 January 2021 at 14:41:59 UTC, Afgdr wrote:

On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote:

On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:

[...]


cast as uint and shift. cast the result as ubyte[4].


Obviously, that works only for n=4 with uint and n=8 with ulong.


Yes I used to do this, but then I needed it for n > 8.


Re: Bit rotation question/challenge

2021-01-30 Thread burt via Digitalmars-d-learn

On Saturday, 30 January 2021 at 14:17:06 UTC, Paul Backus wrote:

On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:

[...]

Now I want to bit-rotate the array as if it is one big integer.


You may find `std.bitmanip.BitArray` useful for this:

http://phobos.dpldocs.info/std.bitmanip.BitArray.html


Thank you, this is indeed what I am looking for!

For future reference, this is how I implemented it:

```d
ubyte[n] rotateRight(size_t n)(ubyte[n] x, uint rotation)
{
import std.bitmanip : BitArray;

ubyte[n] x2;
foreach (i, value; x) // have to swap because of endianness
x2[n - 1 - i] = value;

auto copy = x2;

auto bitArray1 = BitArray(cast(void[]) x2[], n * 8);
auto bitArray2 = BitArray(cast(void[]) copy[], n * 8);
bitArray1 >>= rotation;
bitArray2 <<= n * 8 - rotation;
bitArray1 |= bitArray2;

foreach (i, value; x2) // swap back
x[n - 1 - i] = value;
return x;
}

ubyte[4] x = [
0b00011000,
0b0011,
0b00010101,
0b0010,
];
writefln!"%(%8b,\n%)"(x.rotateRight(4));
```
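
For comparison, the rotation can also be done without `BitArray` by 
moving bits one at a time; a self-contained sketch (it assumes 
MSB-first bit order within each byte, and it is not code from the 
thread):

```d
// Rotate an n-byte value right by `rotation` bits using plain
// byte/bit arithmetic. Bit i of the input moves to bit (i + r).
ubyte[n] rotRight(size_t n)(const ubyte[n] x, uint rotation)
{
    enum total = n * 8;                 // total bit count
    const size_t r = rotation % total;  // reduce over-rotation once
    ubyte[n] result;
    foreach (i; 0 .. total)
    {
        const bit = (x[i / 8] >> (7 - i % 8)) & 1; // read bit i (MSB-first)
        const j = (i + r) % total;                 // destination bit index
        if (bit)
            result[j / 8] |= cast(ubyte)(1 << (7 - j % 8));
    }
    return result;
}

unittest
{
    ubyte[2] x  = [0b00000001, 0b00000000];
    ubyte[2] r1 = [0b00000000, 0b10000000];
    assert(rotRight(x, 1) == r1);  // LSB of x[0] moves to MSB of x[1]
    assert(rotRight(x, 16) == x);  // a full rotation is the identity
}
```

It is O(total bits) rather than a few word-wide shifts, but it works 
for any n and sidesteps the endianness swap entirely.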


Re: Bit rotation question/challenge

2021-01-30 Thread Afgdr via Digitalmars-d-learn

On Saturday, 30 January 2021 at 14:40:49 UTC, Afgdr wrote:

On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:

I have a static array of `ubyte`s of arbitrary size:

```d
ubyte[4] x = [ // in reality, ubyte[64]
0b1000,
0b0001,
0b00010101,
0b0010,
];
```

Now I want to bit-rotate the array as if it is one big 
integer. So:


```d
ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation)

{
// ?
}
// same for rotateLeft

ubyte[4] y = [
0b1001,
0b0100,
0b11111111,
0b10001010,
];
assert(x.rotateRight(9) == y);
assert(y.rotateLeft(9) == x);
```

Any ideas how this could be achieved? I.e. what should go at 
the "?" for rotateRight and rotateLeft?


cast as uint and shift. cast the result as ubyte[4].


Obviously, that works only for n=4 with uint and n=8 with ulong.


Re: Bit rotation question/challenge

2021-01-30 Thread Afgdr via Digitalmars-d-learn

On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:

I have a static array of `ubyte`s of arbitrary size:

```d
ubyte[4] x = [ // in reality, ubyte[64]
0b1000,
0b0001,
0b00010101,
0b0010,
];
```

Now I want to bit-rotate the array as if it is one big integer. 
So:


```d
ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation)

{
// ?
}
// same for rotateLeft

ubyte[4] y = [
0b1001,
0b0100,
0b,
0b10001010,
];
assert(x.rotateRight(9) == y);
assert(y.rotateLeft(9) == x);
```

Any ideas how this could be achieved? I.e. what should go at 
the "?" for rotateRight and rotateLeft?


cast as uint and shift. cast the result as ubyte[4].


Re: Bit rotation question/challenge

2021-01-30 Thread Paul Backus via Digitalmars-d-learn

On Saturday, 30 January 2021 at 13:30:49 UTC, burt wrote:

I have a static array of `ubyte`s of arbitrary size:

```d
ubyte[4] x = [ // in reality, ubyte[64]
0b1000,
0b0001,
0b00010101,
0b0010,
];
```

Now I want to bit-rotate the array as if it is one big integer.


You may find `std.bitmanip.BitArray` useful for this:

http://phobos.dpldocs.info/std.bitmanip.BitArray.html


Bit rotation question/challenge

2021-01-30 Thread burt via Digitalmars-d-learn

I have a static array of `ubyte`s of arbitrary size:

```d
ubyte[4] x = [ // in reality, ubyte[64]
0b10000000,
0b00000001,
0b00010101,
0b00100000,
];
```

Now I want to bit-rotate the array as if it is one big integer. 
So:


```d
ubyte[n] rotateRight(size_t n)(ref const ubyte[n] array, uint rotation)
{
// ?
}
// same for rotateLeft

ubyte[4] y = [
0b10010000,
0b01000000,
0b00000000,
0b10001010,
];
assert(x.rotateRight(9) == y);
assert(y.rotateLeft(9) == x);
```

Any ideas how this could be achieved? I.e. what should go at the 
"?" for rotateRight and rotateLeft?


Challenge

2020-05-09 Thread Jack Applegame via Digitalmars-d-learn

I recently came across an interesting exercise.

Given a series of positive numbers, each of which belongs to the 
set


{ 2^^i * 3^^j * 5^^k | i, j, k ≥ 0 }.

The series is ordered in ascending order. The beginning looks 
like this:


{ 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, ... }

The goal is to get the Nth element of the series. For example, 
for N = 10, the answer is 12.


On the Internet, I found an elegant and incredibly fast solution 
in Haskell:


```
mergeUniq :: Ord a => [a] -> [a] -> [a]
mergeUniq (x:xs) (y:ys) = case x `compare` y of
   EQ -> x : mergeUniq xs ys
   LT -> x : mergeUniq xs (y:ys)
   GT -> y : mergeUniq (x:xs) ys

powers :: [Integer]
powers = 1 : expand 2 `mergeUniq` expand 3 `mergeUniq` expand 5
  where
expand factor = (factor *) <$> powers

main = print $ powers!!999999
```
On my machine, the 1000000th element is found in 0.215s. OMG!


After that, I spent almost the entire day searching for at least 
the same fast solution in D.


My first attempt was simple and quite pretty, but awfully slow:

```
import std.stdio;
import std.array;
import std.bigint;
import std.algorithm;

auto generate(int n) {
BigInt[] nums = [BigInt("1")];
foreach(i; 0..n) {
nums = merge(nums, [nums[i] * 2, nums[i] * 3, nums[i] * 5]).uniq().array();
}
return nums[n - 1];
}

void main() {
  writeln(generate(5000));
}
```

0.275s for 5000th element.

At the end of the day, I managed to catch up and even overtake 
Haskell by emulating its laziness with ranges and a funny trick.


Victory! :)

If you find this interesting, I will publish my latest version.
And feel free to find your own solution. ;)
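For comparison (not the author's solution, which isn't shown here): Dijkstra's classic three-index formulation computes the Nth element in O(N) with a single array, no merging of lazy lists required. A sketch:

```d
import std.algorithm.comparison : min;
import std.bigint : BigInt;

// Every element is 2, 3 or 5 times a smaller one; keep one cursor
// per factor and always append the smallest candidate product.
BigInt hamming(size_t n)
{
    assert(n > 0);
    auto h = new BigInt[n];
    h[0] = BigInt(1);
    size_t i2, i3, i5;
    auto n2 = BigInt(2), n3 = BigInt(3), n5 = BigInt(5);
    foreach (i; 1 .. n)
    {
        h[i] = min(n2, n3, n5);
        if (h[i] == n2) n2 = h[++i2] * 2;  // advance every cursor that
        if (h[i] == n3) n3 = h[++i3] * 3;  // produced the value, which
        if (h[i] == n5) n5 = h[++i5] * 5;  // also skips duplicates
    }
    return h[n - 1];
}
```

Here hamming(10) == 12, matching the N = 10 example above.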


Re: A Friendly Challenge for D

2018-10-16 Thread Jabari Zakiya via Digitalmars-d

On Tuesday, 16 October 2018 at 21:12:39 UTC, welkam wrote:
On Tuesday, 16 October 2018 at 20:58:54 UTC, Jabari Zakiya 
wrote:
And they could be modded to catch semantics like this and 
produce faster code.


Its hard to prove that you will only write 1 or 0 in the array 
and even if you write such pass it wont fire very often. So 
slower compile times for no gain in common case.



My guess is the difference in code execution times based on 
the ALU operation pipeline.



and my guess is that CPU knows that its going to write the same 
value to memory so it ignores that write.


Exactly. It does the same instruction twice, and once it's set 
the first time it can ignore setting it the second time.


However with seg[k] = 1, the mov instruction always moves, because
it doesn't know|care what the previous value was. That's my guess.


Life is too short for such things. Today compilers are good at 
optimizing low level stuff. Those decades of improvements add 
up.


Amen! :-)



Re: A Friendly Challenge for D

2018-10-16 Thread welkam via Digitalmars-d

On Tuesday, 16 October 2018 at 20:58:54 UTC, Jabari Zakiya wrote:
And they could be modded to catch semantics like this and 
produce faster code.


Its hard to prove that you will only write 1 or 0 in the array 
and even if you write such pass it wont fire very often. So 
slower compile times for no gain in common case.



My guess is the difference in code execution times based on the 
ALU operation pipeline.



and my guess is that CPU knows that its going to write the same 
value to memory so it ignores that write.


I used to know most of these tricks back in the Pentium days, 
but I haven't had a need to keep up with Intel Ixx core chips 
machine|compiler > level instruction optimizations.


Life is too short for such things. Today compilers are good at 
optimizing low level stuff. Those decades of improvements add up.


Re: A Friendly Challenge for D

2018-10-16 Thread Jabari Zakiya via Digitalmars-d

On Tuesday, 16 October 2018 at 20:38:24 UTC, welkam wrote:
On Tuesday, 16 October 2018 at 17:57:23 UTC, Jabari Zakiya 
wrote:
This is the exact same behavior I found with the Nim compiler 
too.


Well Nim compiler is more like translator. It translates Nim 
code to c or c++. Since gcc was responsible for optimizations 
and instruction selection it would be more correct to say that 
gcc behavior was same as llvm.




How many other unknown gotchas are in the compilers? :-(


the number of inlining shall be 3
https://www.youtube.com/watch?v=s4wnuiCwTGU


Yes, it finally is the compiler doing it, and Clang does the same 
thing.
And they could be modded to catch semantics like this and produce 
faster code.


My guess is the difference in code execution times based on the 
ALU operation pipeline.


seg[k] = seg[k] | 1

can be done "in place" in the pipeline, where the lsb can be
changed in mem, whereas


seg[k] = 1

may require a separate constant|value creation and memory write, 
and bypass the ALU.


I used to know most of these tricks back in the Pentium days, but 
I haven't had a need to keep up with Intel Ixx core chips 
machine|compiler level instruction optimizations.




Re: A Friendly Challenge for D

2018-10-16 Thread welkam via Digitalmars-d

On Tuesday, 16 October 2018 at 17:57:23 UTC, Jabari Zakiya wrote:
This is the exact same behavior I found with the Nim compiler 
too.


Well Nim compiler is more like translator. It translates Nim code 
to c or c++. Since gcc was responsible for optimizations and 
instruction selection it would be more correct to say that gcc 
behavior was same as llvm.




How many other unknown gotchas are in the compilers? :-(


the number of inlining shall be 3
https://www.youtube.com/watch?v=s4wnuiCwTGU




Re: A Friendly Challenge for D

2018-10-16 Thread Jabari Zakiya via Digitalmars-d

On Tuesday, 16 October 2018 at 07:09:05 UTC, Vijay Nayar wrote:

On Monday, 15 October 2018 at 22:17:57 UTC, Jabari Zakiya wrote:
$ dub build --compiler=ldc2 -b=release && echo "3000000000" | ./twinprimes

Enter integer number:
threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 1 ms, 864 μs, and 7 hnsecs
perform twinprimes ssoz sieve
sieve time = 196 ms, 566 μs, and 5 hnsecs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/- 1
total time = 198 ms, 431 μs, and 2 hnsecs

My understanding is that the difference in performance is 
largely due to slightly better optimization from the LLVM 
based ldc2 compiler, where I believe Nim is using gcc.


Here's what I get on my system.

$ echo 3_000_000_000 | ./twinprimes_test7yc.0180.gcc821
Enter integer number: threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 0.000 secs
perform twinprimes ssoz sieve
sieve time = 0.144 secs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/-1
total time = 0.144 secs


Hey Vijay I found subtle static typing errors.

When N < 2^32 - 1 (4,294,967,295) the twinprime count and
lastprime values are correct.

When N > 2^32 - 1 both values start to be incorrect.

Examples: For N = 4,000,000,000 you get correct answers

$ echo 4000000000 | ./twinprimes_ssoz
Enter integer number:
threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 233766231; resgroups = 1731602
each 135 threads has nextp[2 x 6332] array
setup time = 667 μs and 5 hnsecs
perform twinprimes ssoz sieve
sieve time = 181 ms, 277 μs, and 6 hnsecs
last segment = 27666 resgroups; segment slices = 27
total twins = 11944438; last twin = 3999999798+/- 1
total time = 181 ms, 945 μs, and 1 hnsec

Examples: For N = 5,000,000,000 you get wrong answers

$ echo 5000000000 | ./twinprimes_ssoz
Enter integer number:
threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 292207792; resgroups = 2164503
each 135 threads has nextp[2 x 6999] array
setup time = 744 μs and 7 hnsecs
perform twinprimes ssoz sieve
sieve time = 225 ms, 323 μs, and 4 hnsecs
last segment = 1815 resgroups; segment slices = 34
total twins = 14618173; last twin = 705034592+/- 1
total time = 226 ms, 68 μs, and 1 hnsec

These are correct answers for N = 5,000,000,000

$ echo 5000000000 | ./twinprimes_test7yc.0180.gcc821
Enter integer number: threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 292207792; resgroups = 2164503
each 135 threads has nextp[2 x 6999] array
setup time = 0.001 secs
perform twinprimes ssoz sieve
sieve time = 0.237 secs
last segment = 1815 resgroups; segment slices = 34
total twins = 14618166; last twin = 4999999860+/-1
total time = 0.237 secs

The first easy check should show the lastprime value just a 
little smaller than N.
See the timing|results table I posted earlier to see correct 
outputs as N gets bigger.


It took me a looong time, and was a big PITA, to get the correct 
static typing too in Nim, especially coming from a dynamic 
language like Ruby.
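The symptom above is ordinary unsigned wraparound: a value stored in a 32-bit type is silently reduced modulo 2^32. A tiny illustration (my numbers, not the program's internals, though note how close 705,032,704 is to the bogus last twin reported above):

```d
void main()
{
    ulong n = 5_000_000_000;          // only fits in a 64-bit type
    uint wrapped = cast(uint) n;      // silently reduced mod 2^32
    assert(n > uint.max);
    assert(wrapped == 705_032_704);   // 5_000_000_000 - 4_294_967_296
}
```

So any count, index, or prime value that can exceed 4,294,967,295 has to be ulong (or long) end to end.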




Re: A Friendly Challenge for D

2018-10-16 Thread Jabari Zakiya via Digitalmars-d

On Tuesday, 16 October 2018 at 16:57:12 UTC, welkam wrote:
So I run profiler and 97% of time is spent in void twinsSieve 
function and hotspots are seg[k] = seg[k] |  1; lines. Since 
seg[k] can only be 1 or 0 I removed that or operation. And the 
results are. Cue the drum-roll... 5% slower.


I thought that all of my studying was getting somewhere. That I 
beginning to understand things but no. Performing OR operation 
and then storing data is faster than just storing data. 
[sarcasm] Of course it makes sense [/sarcasm]


I looked at assembler and nothing changed except

orb    $0x1,(%rbx,%rdi,1)

is changed to

movb   $0x1,(%rbx,%rdi,1)

I'm completely lost.


This is the exact same behavior I found with the Nim compiler too.

seg[k] = 1  is slower than seg[k] = seg[k] or 1, which is why 
it's coded like that.


Thanks for finding the exact instruction difference.

How many other unknown gotchas are in the compilers? :-(



Re: A Friendly Challenge for D

2018-10-16 Thread Jon Degenhardt via Digitalmars-d

On Tuesday, 16 October 2018 at 07:09:05 UTC, Vijay Nayar wrote:
D has multiple compilers, but for the speed of the finished 
binary, LDC2 is generally recommended.  I used version 1.11.0.  
https://github.com/ldc-developers/ldc/releases/tag/v1.11.0


I was using DUB to manage the project, but to build the 
stand-alone file from the gist link, use this command:  $ ldc2 
-release -O3 twinprimes_ssoz.d

And to run it:  $ echo "3000000000" | ./twinprimes_ssoz


It'd be interesting to see if LTO or PGO generated an 
improvement. It looks like it could in this case, as it might 
optimize some of the inner loops. LTO is easy, enable it with:


-flto=<thin|full> -defaultlib=phobos2-ldc-lto,druntime-ldc-lto

(see: https://github.com/ldc-developers/ldc/releases/tag/v1.9.0). 
I've been using 'thin' on OSX, 'full' on Linux.


PGO is a bit more work, but not too bad. A good primer is here: 
https://johanengelen.github.io/ldc/2016/07/15/Profile-Guided-Optimization-with-LDC.html


--Jon
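Putting those pieces together, one possible invocation sequence (the training input and file names are illustrative, not from the thread):

```shell
# LTO build ('thin' or 'full', per the flags above)
ldc2 -release -O3 -flto=full \
     -defaultlib=phobos2-ldc-lto,druntime-ldc-lto twinprimes_ssoz.d

# PGO: instrument, run a representative workload, rebuild with the profile
ldc2 -release -O3 -fprofile-instr-generate=profile.raw \
     twinprimes_ssoz.d -of=twinprimes_instr
echo "3000000000" | ./twinprimes_instr
ldc-profdata merge -output=profile.data profile.raw
ldc2 -release -O3 -fprofile-instr-use=profile.data twinprimes_ssoz.d
```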




Re: A Friendly Challenge for D

2018-10-16 Thread welkam via Digitalmars-d
So I run profiler and 97% of time is spent in void twinsSieve 
function and hotspots are seg[k] = seg[k] |  1; lines. Since 
seg[k] can only be 1 or 0 I removed that or operation. And the 
results are. Cue the drum-roll... 5% slower.


I thought that all of my studying was getting somewhere. That I 
beginning to understand things but no. Performing OR operation 
and then storing data is faster than just storing data. [sarcasm] 
Of course it makes sense [/sarcasm]


I looked at assembler and nothing changed except

orb    $0x1,(%rbx,%rdi,1)

is changed to

movb   $0x1,(%rbx,%rdi,1)

I'm completely lost.
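A minimal harness to reproduce the comparison (results are heavily machine- and compiler-dependent, so treat it as a sketch):

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// The sieve hotspot, isolated: a plain store vs. a read-modify-write OR.
void storeFill(ubyte[] seg) { foreach (k; 0 .. seg.length) seg[k] = 1; }
void orFill(ubyte[] seg)    { foreach (k; 0 .. seg.length) seg[k] |= 1; }

void main()
{
    auto seg = new ubyte[1 << 20];
    auto r = benchmark!(() => storeFill(seg), () => orFill(seg))(200);
    writefln("store: %s   or+store: %s", r[0], r[1]);
}
```

Both loops leave the array in the same state, so any timing gap is purely the instruction choice (and what the CPU does with a redundant store).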


Re: A Friendly Challenge for D

2018-10-16 Thread Vijay Nayar via Digitalmars-d

On Monday, 15 October 2018 at 22:17:57 UTC, Jabari Zakiya wrote:
$ dub build --compiler=ldc2 -b=release && echo "3000000000" | ./twinprimes

Enter integer number:
threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 1 ms, 864 μs, and 7 hnsecs
perform twinprimes ssoz sieve
sieve time = 196 ms, 566 μs, and 5 hnsecs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/- 1
total time = 198 ms, 431 μs, and 2 hnsecs

My understanding is that the difference in performance is 
largely due to slightly better optimization from the LLVM 
based ldc2 compiler, where I believe Nim is using gcc.


Here's what I get on my system.

$ echo 3_000_000_000 | ./twinprimes_test7yc.0180.gcc821
Enter integer number: threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 0.000 secs
perform twinprimes ssoz sieve
sieve time = 0.144 secs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/-1
total time = 0.144 secs

Could you list your hardware, D ver, compiler specs.

I will run your code on my system with your D version and 
compiler, if I can.


Excellent work!


D has multiple compilers, but for the speed of the finished 
binary, LDC2 is generally recommended.  I used version 1.11.0.  
https://github.com/ldc-developers/ldc/releases/tag/v1.11.0


I was using DUB to manage the project, but to build the 
stand-alone file from the gist link, use this command:  $ ldc2 
-release -O3 twinprimes_ssoz.d

And to run it:  $ echo "3000000000" | ./twinprimes_ssoz

Running the program a bunch of times, I get variable results, 
mostly between 195ms and 250ms.  Running the Nim version, I also 
get variable results, mostly between 230ms and 280ms.


My system is an Alienware 14x R2 from 2012.  Specs can be found 
here: 
https://www.cnet.com/products/alienware-m14xr2-14-core-i7-3630qm-8-gb-ram-750-gb-hdd/specs/


Re: A Friendly Challenge for D

2018-10-15 Thread Jabari Zakiya via Digitalmars-d
$ dub build --compiler=ldc2 -b=release && echo "3000000000" | ./twinprimes

Enter integer number:
threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 1 ms, 864 μs, and 7 hnsecs
perform twinprimes ssoz sieve
sieve time = 196 ms, 566 μs, and 5 hnsecs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/- 1
total time = 198 ms, 431 μs, and 2 hnsecs

My understanding is that the difference in performance is 
largely due to slightly better optimization from the LLVM based 
ldc2 compiler, where I believe Nim is using gcc.


Here's what I get on my system.

$ echo 3_000_000_000 | ./twinprimes_test7yc.0180.gcc821
Enter integer number: threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 0.000 secs
perform twinprimes ssoz sieve
sieve time = 0.144 secs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/-1
total time = 0.144 secs

Could you list your hardware, D ver, compiler specs.

I will run your code on my system with your D version and 
compiler, if I can.


Excellent work!



Re: A Friendly Challenge for D

2018-10-15 Thread Jabari Zakiya via Digitalmars-d
I don't actually understand the underlying algorithm, but I at 
least understand the flow of the program and the structure.  
The algorithm utilized depends heavily on using shared memory 
access, which can be done in D, but I definitely wouldn't call 
it idiomatic.  In D, message passing is preferred, but it 
really can't be well demonstrated on your algorithm without a 
deeper understanding of the algorithm itself.


A complete working version can be found at: 
https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7


I modified both versions of the program to utilize the 
pgParameters13 for more of an apples-to-apples comparison.


The final results are as follows:
$ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim && echo "3000000000" | ./twinprimes_ssoz

Enter integer number: threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 0.000 secs
perform twinprimes ssoz sieve
sieve time = 0.222 secs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/-1
total time = 0.223 secs

$ dub build --compiler=ldc2 -b=release && echo "3000000000" | ./twinprimes

Enter integer number:
threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 1 ms, 864 μs, and 7 hnsecs
perform twinprimes ssoz sieve
sieve time = 196 ms, 566 μs, and 5 hnsecs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/- 1
total time = 198 ms, 431 μs, and 2 hnsecs

My understanding is that the difference in performance is 
largely due to slightly better optimization from the LLVM based 
ldc2 compiler, where I believe Nim is using gcc.


Hey Great!  I'll take some time and study your code.

Nim currently allows using gcc or clang: --cc:gcc or --cc:clang,
and compiles to C: nim c --cc:gcc... or C++: nim c++ --cc:gcc...

You can compile in Nim with/out using garbage collection (GC) and
choose GC. This version needs GC to recover mem created in each
thread, or it will eat it up. See operation of program using
htop/top to see threads|mem at work.


If D has a fast bitarray structure, try it to create the data
arrays "prms" in proc "sozpg" and the "seg" arrays in
"twin_sieves". This could allow for larger segment sizes|arrays
that fit into cpu cache, reducing the number of segment slices
to process. This would all have to be profiled and encapsulated
in "selectPG", which adaptively selects the best PG to use.
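Phobos does have one: std.bitmanip.BitArray packs one flag per bit. A hedged sketch of the idea (not wired into the sieve):

```d
import std.bitmanip : BitArray;

void main()
{
    // One bit per residue group instead of one byte: 8x the
    // residue groups in the same 64 KiB cache footprint.
    BitArray seg;
    seg.length = 8 * 65_536;

    seg[12_345] = true;        // mark a residue group composite
    seg[54_321] = true;

    size_t marked;
    foreach (i; 0 .. seg.length)
        if (seg[i]) ++marked;
    assert(marked == 2);
}
```

The trade-off is that each single-bit read/write costs a shift and mask, so as the post says, it would have to be profiled.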


Here is a table of output data and performance times (in seconds).

The results were produced on a System76 laptop with an Intel I7 
6700HQ cpu,
2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. I compiled 
the file
"twinprimes_ssoz.nim" with Nim versions 0.18.0, and the just 
released 0.19.0,
using gcc 8.2.1 with Manjaro (Linux) in Virtual Box. I then ran 
the executables
on my host Linux system (which only has gcc-4.9.8) to use all 8 
threads.


The output was produced on a "quiet" system, where I turned off 
the laptop and
rebooted, and ran tests with only a terminal and spreadsheet open
(to record results). The times are the "best" times from multiple runs.

I also compare times from the program "primesieve" - 
https://primesieve.org/

which states on its site:


primesieve is a free software program and C/C++ library that 
generates primes using a highly optimized sieve of Eratosthenes 
implementation. It counts the primes below 10^10 in just 0.45 
seconds on an Intel Core i7-6700 CPU (4 x 3.4 GHz). primesieve 
can generate primes and prime k‑tuplets up to 2^64.



(An optimized C/C++ version of twinprimes_ssoz would be nice to
compare against too.)



  Number  | Twinprimes | Last Twinprime | twinprime_ssoz.nim, gcc-8.2.1 | primesieve |
          |            |                |    0.18.0     |    0.19.0     |    9.7.1   |
----------|------------|----------------|---------------|---------------|------------|
1 x 10^09 |    3424506 |         199872 |     0.043     |     0.041     |    0.049   |
5 x 10^09 |   14618166 |     4999999860 |     0.219     |     0.226     |    0.248   |
1 x 10^10 |   27412679 |     9999999702 |     0.398     |     0.401     |    0.533   |
5 x 10^10 |  118903682 |    49999999590 |     2.364     |     2.331     |    2.978   |
1 x 10^11 |  224376048 |762 |  

Re: A Friendly Challenge for D

2018-10-15 Thread Vijay Nayar via Digitalmars-d

On Sunday, 14 October 2018 at 10:51:11 UTC, Vijay Nayar wrote:
Once I get the bugs out, I'm curious to see if any performance 
differences crop up.  There's the theory that says they should 
be the same, and then there's the practice.


I don't actually understand the underlying algorithm, but I at 
least understand the flow of the program and the structure.  The 
algorithm utilized depends heavily on using shared memory access, 
which can be done in D, but I definitely wouldn't call it 
idiomatic.  In D, message passing is preferred, but it really 
can't be well demonstrated on your algorithm without a deeper 
understanding of the algorithm itself.


A complete working version can be found at: 
https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7


I modified both versions of the program to utilize the 
pgParameters13 for more of an apples-to-apples comparison.


The final results are as follows:
$ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim && echo "3000000000" | ./twinprimes_ssoz

Enter integer number: threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 0.000 secs
perform twinprimes ssoz sieve
sieve time = 0.222 secs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/-1
total time = 0.223 secs

$ dub build --compiler=ldc2 -b=release && echo "3000000000" | ./twinprimes

Enter integer number:
threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 1 ms, 864 μs, and 7 hnsecs
perform twinprimes ssoz sieve
sieve time = 196 ms, 566 μs, and 5 hnsecs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/- 1
total time = 198 ms, 431 μs, and 2 hnsecs

My understanding is that the difference in performance is largely 
due to slightly better optimization from the LLVM based ldc2 
compiler, where I believe Nim is using gcc.




Re: A Friendly Challenge for D

2018-10-14 Thread Bastiaan Veelo via Digitalmars-d

On Sunday, 14 October 2018 at 10:51:11 UTC, Vijay Nayar wrote:
But as previous posters have said, the code is not really very 
different between Nim and D.  Most of it is array manipulation 
and arithmetic operations, and not many of the features of 
either D or Nim are very different.  Both turn into fast code, 
both have garbage collection, and both have generally similar 
operators and libraries for this kind of problem.


The biggest differences I observed revolved not around the 
languages themselves, but around code style.


Are you aware of DCompute? [1] I haven’t used it myself but since 
Jabari explicitly mentioned taking advantage of the GPU, it may 
be worth giving it a try. It should have an impact on performance 
and possibly on style as well, even though it will still be D. It 
would be nice if Nicholas could chime in on this thread.


Bastiaan.

[1] 
https://dlang.org/blog/2017/10/30/d-compute-running-d-on-the-gpu/


Re: A Friendly Challenge for D

2018-10-14 Thread Vijay Nayar via Digitalmars-d

On Saturday, 13 October 2018 at 19:04:48 UTC, Jabari Zakiya wrote:

On Saturday, 13 October 2018 at 18:31:57 UTC, Vijay Nayar wrote:
On Saturday, 13 October 2018 at 18:14:20 UTC, Vijay Nayar 
wrote:
On Saturday, 13 October 2018 at 18:05:45 UTC, Jabari Zakiya 
wrote:


It may be also running into a hard time limit imposed on 
compilation that Nim had/has that prevented my code from 
initially compiling. I'm generating a lot of PG parameter 
constants at compile time, and it's doing a lot of number 
crunching and building larger and larger arrays of constants 
as the PG's get larger.


Try compiling with successive PG's (just P5, then P5 and P7, 
etc) to see where it fails. That will let you know the code 
is working correctly, and that the compiler is choking 
either/and because of a hard time limit and/or memory limit. 
That's why I put in a compiler output statement in 
'genPGparameters' to see the progression of the PG 
parameters being built by the compiler to initially find 
when the compiler started choking. You may also need to 
patch whatever facility in the D compiler chain that 
controls this too.


It's P17, the biggest one that takes the longest to build in 
the Nim version. I actually don't know what memory limits 
exist for the D compiler at compile-time, so I may need to do 
some homework.


It's not just DMD either.

$ ldc2 twinprimes_ssoz.d
...
generating parameters for P17
Killed

$ gdc twinprimes_ssoz.d
...
generating parameters for P17
gdc: internal compiler error: Killed (program cc1d)
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

$ dmd twinprimes_ssoz.d
...
generating parameters for P17
Killed


In the Nim code, starting line 91 is when the PG constants are 
being generate at compile time.


-
# Generate at compile time the parameters for PGs P5-P17.
const parametersp5  = genPGparameters(5)
const parametersp7  = genPGparameters(7)
const parametersp11 = genPGparameters(11)
const parametersp13 = genPGparameters(13)
const parametersp17 = genPGparameters(17)
-

Can it compile just using P5 (the first line, others commented 
out), and then P7, etc?


I'm not understanding your comments now.

If you can get a working version up and running (with correct 
output) we can solve the P17 compiler issues later (or in a 
parallel forum thread), especially if you have to delve into 
the weeds of the compiler chain.


In my mind (same with Nim process) getting working code using 
any PG is first order priority (because you'll need getting 
multi-threading working too). Once you can do that, by default, 
you can then use any generator you want if you create the 
correct parameters for it. That's one of the advantages of the 
algorithm, it's PG agnostic (as long as your hardware will 
accommodate it).


So don't prioritize getting P17 to compile right now (in this 
thread). Create the working generic structure that can work 
with any PG first.


Updated:  
https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7


I still get a few runtime errors likely from mistakes in my 
conversion for certain primes.  I'll resolve those after I get 
back from the gym.


But as previous posters have said, the code is not really very 
different between Nim and D.  Most of it is array manipulation 
and arithmetic operations, and not many of the features of either 
D or Nim are very different.  Both turn into fast code, both have 
garbage collection, and both have generally similar operators and 
libraries for this kind of problem.


The biggest differences I observed revolved not around the 
languages themselves, but around code style.  For example, can 
you put a loop and 3 additional statements on a single line in D? 
 Yes.  But it is considered to be not very readable code from a 
style perspective.


Once I get the bugs out, I'm curious to see if any performance 
differences crop up.  There's the theory that says they should be 
the same, and then there's the practice.


Re: A Friendly Challenge for D

2018-10-13 Thread Jabari Zakiya via Digitalmars-d

On Saturday, 13 October 2018 at 18:31:57 UTC, Vijay Nayar wrote:

On Saturday, 13 October 2018 at 18:14:20 UTC, Vijay Nayar wrote:
On Saturday, 13 October 2018 at 18:05:45 UTC, Jabari Zakiya 
wrote:


It may be also running into a hard time limit imposed on 
compilation that Nim had/has that prevented my code from 
initially compiling. I'm generating a lot of PG parameter 
constants at compile time, and it's doing a lot of number 
crunching and building larger and larger arrays of constants 
as the PG's get larger.


Try compiling with successive PG's (just P5, then P5 and P7, 
etc) to see where it fails. That will let you know the code 
is working correctly, and that the compiler is choking 
either/and because of a hard time limit and/or memory limit. 
That's why I put in a compiler output statement in 
'genPGparameters' to see the progression of the PG parameters 
being built by the compiler to initially find when the 
compiler started choking. You may also need to patch whatever 
facility in the D compiler chain that controls this too.


It's P17, the biggest one that takes the longest to build in 
the Nim version. I actually don't know what memory limits 
exist for the D compiler at compile-time, so I may need to do 
some homework.


It's not just DMD either.

$ ldc2 twinprimes_ssoz.d
...
generating parameters for P17
Killed

$ gdc twinprimes_ssoz.d
...
generating parameters for P17
gdc: internal compiler error: Killed (program cc1d)
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

$ dmd twinprimes_ssoz.d
...
generating parameters for P17
Killed


In the Nim code, starting line 91 is when the PG constants are 
being generate at compile time.


-
# Generate at compile time the parameters for PGs P5-P17.
const parametersp5  = genPGparameters(5)
const parametersp7  = genPGparameters(7)
const parametersp11 = genPGparameters(11)
const parametersp13 = genPGparameters(13)
const parametersp17 = genPGparameters(17)
-

Can it compile just using P5 (the first line, others commented 
out), and then P7, etc?


I'm not understanding your comments now.

If you can get a working version up and running (with correct 
output) we can solve the P17 compiler issues later (or in a 
parallel forum thread), especially if you have to delve into the 
weeds of the compiler chain.


In my mind (same with Nim process) getting working code using any 
PG is first order priority (because you'll need getting 
multi-threading working too). Once you can do that, by default, 
you can then use any generator you want if you create the correct 
parameters for it. That's one of the advantages of the algorithm, 
it's PG agnostic (as long as your hardware will accommodate it).


So don't prioritize getting P17 to compile right now (in this 
thread). Create the working generic structure that can work with 
any PG first.
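For readers unfamiliar with the mechanism under discussion: D runs ordinary functions at compile time (CTFE) whenever their result initializes a constant, which is how the `parametersp5` .. `parametersp17` tables are built. A miniature sketch of the mechanism (my example, far smaller than genPGparameters):

```d
// Trial division against previously found primes; plain runtime code.
int[] smallPrimes(int limit)
{
    int[] ps;
    outer: foreach (n; 2 .. limit)
    {
        foreach (p; ps)
            if (n % p == 0) continue outer;
        ps ~= n;
    }
    return ps;
}

// 'enum' forces evaluation inside the compiler, like Nim's 'const'.
enum primesTo30 = smallPrimes(30);
static assert(primesTo30 == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]);
```

Large tables built this way cost compiler time and memory, which is exactly the P17 failure mode seen above.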


Re: A Friendly Challenge for D

2018-10-13 Thread Vijay Nayar via Digitalmars-d

On Saturday, 13 October 2018 at 18:14:20 UTC, Vijay Nayar wrote:
On Saturday, 13 October 2018 at 18:05:45 UTC, Jabari Zakiya 
wrote:


It may be also running into a hard time limit imposed on 
compilation that Nim had/has that prevented my code from 
initially compiling. I'm generating a lot of PG parameter 
constants at compile time, and it's doing a lot of number 
crunching and building larger and larger arrays of constants 
as the PG's get larger.


Try compiling with successive PG's (just P5, then P5 and P7, 
etc) to see where it fails. That will let you know the code is 
working correctly, and that the compiler is choking either/and 
because of a hard time limit and/or memory limit. That's why I 
put in a compiler output statement in 'genPGparameters' to see 
the progression of the PG parameters being built by the 
compiler to initially find when the compiler started choking. 
You may also need to patch whatever facility in the D compiler 
chain that controls this too.


It's P17, the biggest one that takes the longest to build in 
the Nim version. I actually don't know what memory limits exist 
for the D compiler at compile-time, so I may need to do some 
homework.


It's not just DMD either.

$ ldc2 twinprimes_ssoz.d
...
generating parameters for P17
Killed

$ gdc twinprimes_ssoz.d
...
generating parameters for P17
gdc: internal compiler error: Killed (program cc1d)
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

$ dmd twinprimes_ssoz.d
...
generating parameters for P17
Killed



Re: A Friendly Challenge for D

2018-10-13 Thread Jabari Zakiya via Digitalmars-d

On Saturday, 13 October 2018 at 18:14:20 UTC, Vijay Nayar wrote:
On Saturday, 13 October 2018 at 18:05:45 UTC, Jabari Zakiya 
wrote:


It may also be running into a hard time limit imposed on 
compilation that Nim had/has that prevented my code from 
initially compiling. I'm generating a lot of PG parameter 
constants at compile time, and it's doing a lot of number 
crunching and building larger and larger arrays of constants 
as the PG's get larger.


Try compiling with successive PG's (just P5, then P5 and P7, 
etc) to see where it fails. That will let you know the code is 
working correctly, and that the compiler is choking because of 
a hard time limit and/or a memory limit. That's why I 
put in a compiler output statement in 'genPGparameters' to see 
the progression of the PG parameters being built by the 
compiler to initially find when the compiler started choking. 
You may also need to patch whatever facility in the D compiler 
chain that controls this too.


It's P17, the biggest one that takes the longest to build in 
the Nim version. I actually don't know what memory limits exist 
for the D compiler at compile-time, so I may need to do some 
homework.


Yes, that's what I figured, because that's when the Nim compiler 
initially balked. The variable that was patched in the Nim 
config file set a hard limit on the number of operations the 
compiler could do. It's used as a break switch on a runaway 
process (infinite loop, etc). It's probably something like that 
in the D compiler too, just guessing offhand. I hope it isn't a 
memory limit. However, the program will run fine without using 
P17. It's currently selected as the optimum PG for inputs greater 
than 15 trillion (15,000,000,000,000), so the program will run 
fine and produce correct output using P13 instead, but just not 
as fast.
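The selection logic described here (P17 above 15 trillion, P13 as 
the correct-but-slower fallback when P17 is unavailable) can be 
sketched as follows. This is a hypothetical illustration, not the 
program's actual selectPG proc; only the 15 trillion threshold 
comes from the post, and the fallback flag is mine:

```python
def select_pg(num, p17_available=True):
    # Only the 15 trillion cutoff for P17 is stated in the thread;
    # below it (or when P17 parameters could not be built at compile
    # time) P13 is used, which is correct but slower.
    if num > 15_000_000_000_000 and p17_available:
        return "P17"
    return "P13"

pg = select_pg(15_000_000_000_001, p17_available=False)  # falls back to "P13"
```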


Re: A Friendly Challenge for D

2018-10-13 Thread Vijay Nayar via Digitalmars-d

On Saturday, 13 October 2018 at 18:05:45 UTC, Jabari Zakiya wrote:


It may also be running into a hard time limit imposed on 
compilation that Nim had/has that prevented my code from 
initially compiling. I'm generating a lot of PG parameter 
constants at compile time, and it's doing a lot of number 
crunching and building larger and larger arrays of constants as 
the PG's get larger.


Try compiling with successive PG's (just P5, then P5 and P7, 
etc) to see where it fails. That will let you know the code is 
working correctly, and that the compiler is choking because of 
a hard time limit and/or a memory limit. That's why I 
put in a compiler output statement in 'genPGparameters' to see 
the progression of the PG parameters being built by the 
compiler to initially find when the compiler started choking. 
You may also need to patch whatever facility in the D compiler 
chain that controls this too.


It's P17, the biggest one that takes the longest to build in the 
Nim version. I actually don't know what memory limits exist for 
the D compiler at compile-time, so I may need to do some homework.


Re: A Friendly Challenge for D

2018-10-13 Thread Jabari Zakiya via Digitalmars-d

On Saturday, 13 October 2018 at 17:36:33 UTC, Vijay Nayar wrote:

On Saturday, 13 October 2018 at 15:50:06 UTC, Vijay Nayar wrote:
On Saturday, 13 October 2018 at 15:19:07 UTC, Jabari Zakiya 
wrote:

On Saturday, 13 October 2018 at 14:32:33 UTC, welkam wrote:
On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar 
wrote:

[...]


import algorithm

thats all but then it spits out

lib/nim/pure/algorithm.nim(144, 11) Error: interpretation 
requires too many iterations


My mistake. I updated the file and forgot to include the 
'import algorithm' directive. The file is now fixed to 
include it. Download the corrected version or patch your file 
accordingly.


As stated in the file intro **YOU MUST DO THIS** to get it to 
compile with current Nim (they were supposed to fix this in 
this version 0.19.0 but didn't).


 To compile for nim versions <= 0.19.0 do the following:
 1) in file: ~/nim-0.19.0/compiler/vmdef.nim
 2) set variable: MaxLoopIterations* = 1_000_000_000 (1 
Billion or >)

 3) then rebuild system: ./koch boot -d:release

If you are using 'choosenim' to install Nim (highly 
advisable) the full path is:


 ~/.choosenim/toolchains/nim-0.19.0/compiler/vmdef.nim

I'll post performance results from my laptop to give 
reference times to compare against.


Ok, now it builds.  I was previously following the build 
instructions from the Nim website and am not super clear what 
the "koch" tool does, but following your instructions, the 
program does build and run.  I'll take a stab at making a D 
version.


Interesting results so far.  I have a partially converted 
program here:  
https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7


The interesting part is that during compilation (with the 
command "dmd twinprimes_ssoz.d"), the compilation will abort 
with the message "Killed" and no further information. That's a 
new one for me, so I'm looking into the cause.


It may also be running into a hard time limit imposed on 
compilation that Nim had/has that prevented my code from 
initially compiling. I'm generating a lot of PG parameter 
constants at compile time, and it's doing a lot of number 
crunching and building larger and larger arrays of constants as 
the PG's get larger.


Try compiling with successive PG's (just P5, then P5 and P7, etc) 
to see where it fails. That will let you know the code is working 
correctly, and that the compiler is choking because of a hard 
time limit and/or a memory limit. That's why I put in a 
compiler output statement in 'genPGparameters' to see the 
progression of the PG parameters being built by the compiler to 
initially find when the compiler started choking. You may also 
need to patch whatever facility in the D compiler chain that 
controls this too.




Re: A Friendly Challenge for D

2018-10-13 Thread Vijay Nayar via Digitalmars-d

On Saturday, 13 October 2018 at 15:50:06 UTC, Vijay Nayar wrote:
On Saturday, 13 October 2018 at 15:19:07 UTC, Jabari Zakiya 
wrote:

On Saturday, 13 October 2018 at 14:32:33 UTC, welkam wrote:
On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar 
wrote:

[...]


import algorithm

thats all but then it spits out

lib/nim/pure/algorithm.nim(144, 11) Error: interpretation 
requires too many iterations


My mistake. I updated the file and forgot to include the 
'import algorithm' directive. The file is now fixed to include 
it. Download the corrected version or patch your file 
accordingly.


As stated in the file intro **YOU MUST DO THIS** to get it to 
compile with current Nim (they were supposed to fix this in 
this version 0.19.0 but didn't).


 To compile for nim versions <= 0.19.0 do the following:
 1) in file: ~/nim-0.19.0/compiler/vmdef.nim
 2) set variable: MaxLoopIterations* = 1_000_000_000 (1 
Billion or >)

 3) then rebuild system: ./koch boot -d:release

If you are using 'choosenim' to install Nim (highly advisable) 
the full path is:


 ~/.choosenim/toolchains/nim-0.19.0/compiler/vmdef.nim

I'll post performance results from my laptop to give reference 
times to compare against.


Ok, now it builds.  I was previously following the build 
instructions from the Nim website and am not super clear what 
the "koch" tool does, but following your instructions, the 
program does build and run.  I'll take a stab at making a D 
version.


Interesting results so far.  I have a partially converted program 
here:  
https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7


The interesting part is that during compilation (with the command 
"dmd twinprimes_ssoz.d"), the compilation will abort with the 
message "Killed" and no further information. That's a new one for 
me, so I'm looking into the cause.


Re: A Friendly Challenge for D

2018-10-13 Thread Vijay Nayar via Digitalmars-d

On Saturday, 13 October 2018 at 15:19:07 UTC, Jabari Zakiya wrote:

On Saturday, 13 October 2018 at 14:32:33 UTC, welkam wrote:
On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar 
wrote:

[...]


import algorithm

thats all but then it spits out

lib/nim/pure/algorithm.nim(144, 11) Error: interpretation 
requires too many iterations


My mistake. I updated the file and forgot to include the 
'import algorithm' directive. The file is now fixed to include 
it. Download the corrected version or patch your file 
accordingly.


As stated in the file intro **YOU MUST DO THIS** to get it to 
compile with current Nim (they were supposed to fix this in 
this version 0.19.0 but didn't).


 To compile for nim versions <= 0.19.0 do the following:
 1) in file: ~/nim-0.19.0/compiler/vmdef.nim
 2) set variable: MaxLoopIterations* = 1_000_000_000 (1 Billion 
or >)

 3) then rebuild system: ./koch boot -d:release

If you are using 'choosenim' to install Nim (highly advisable) 
the full path is:


 ~/.choosenim/toolchains/nim-0.19.0/compiler/vmdef.nim

I'll post performance results from my laptop to give reference 
times to compare against.


Ok, now it builds.  I was previously following the build 
instructions from the Nim website and am not super clear what the 
"koch" tool does, but following your instructions, the program 
does build and run.  I'll take a stab at making a D version.


Re: A Friendly Challenge for D

2018-10-13 Thread Jabari Zakiya via Digitalmars-d

On Saturday, 13 October 2018 at 14:32:33 UTC, welkam wrote:

On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar wrote:


I downloaded the reference NIM implementation and got the 
latest nim compiler, but I received the following error:

  $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim
  twinprimes_ssoz.nim(74, 11) Error: attempting to call 
undeclared routine: 'sort'


For a person not familiar with nim, what's the fastest way to 
fix that?


import algorithm

thats all but then it spits out

lib/nim/pure/algorithm.nim(144, 11) Error: interpretation 
requires too many iterations


My mistake. I updated the file and forgot to include the 'import 
algorithm' directive. The file is now fixed to include it. 
Download the corrected version or patch your file accordingly.


As stated in the file intro **YOU MUST DO THIS** to get it to 
compile with current Nim (they were supposed to fix this in this 
version 0.19.0 but didn't).


 To compile for nim versions <= 0.19.0 do the following:
 1) in file: ~/nim-0.19.0/compiler/vmdef.nim
 2) set variable: MaxLoopIterations* = 1_000_000_000 (1 Billion 
or >)

 3) then rebuild system: ./koch boot -d:release

If you are using 'choosenim' to install Nim (highly advisable) 
the full path is:


 ~/.choosenim/toolchains/nim-0.19.0/compiler/vmdef.nim

I'll post performance results from my laptop to give reference 
times to compare against.


Re: A Friendly Challenge for D

2018-10-13 Thread Vijay Nayar via Digitalmars-d

On Saturday, 13 October 2018 at 14:32:33 UTC, welkam wrote:

On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar wrote:


I downloaded the reference NIM implementation and got the 
latest nim compiler, but I received the following error:

  $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim
  twinprimes_ssoz.nim(74, 11) Error: attempting to call 
undeclared routine: 'sort'


For a person not familiar with nim, what's the fastest way to 
fix that?


import algorithm

thats all but then it spits out

lib/nim/pure/algorithm.nim(144, 11) Error: interpretation 
requires too many iterations


I ran into the same problem as you did, and then followed the 
instructions from the error.  I modified the compiler source and 
increased the number of maximum iterations from 3_000_000 to 
1_000_000_000, rebuilt and installed it, but still ran into the 
exact same problem.  There may be something up with the algorithm 
itself.


Re: A Friendly Challenge for D

2018-10-13 Thread welkam via Digitalmars-d

On Saturday, 13 October 2018 at 09:22:16 UTC, Vijay Nayar wrote:


I downloaded the reference NIM implementation and got the 
latest nim compiler, but I received the following error:

  $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim
  twinprimes_ssoz.nim(74, 11) Error: attempting to call 
undeclared routine: 'sort'


For a person not familiar with nim, what's the fastest way to 
fix that?


import algorithm

thats all but then it spits out

lib/nim/pure/algorithm.nim(144, 11) Error: interpretation 
requires too many iterations




Re: A Friendly Challenge for D

2018-10-13 Thread Vijay Nayar via Digitalmars-d

On Friday, 12 October 2018 at 21:08:03 UTC, Jabari Zakiya wrote:

On Friday, 12 October 2018 at 20:05:29 UTC, welkam wrote:
On Friday, 12 October 2018 at 16:19:59 UTC, Jabari Zakiya 
wrote:
The real point of the challenge is to see what idiomatic 
code...


There is no idiomatic D code. There are only better 
implementations.


D doesn't tell you how to write your code. It gives you many 
tools and you choose which tools to use. That's what people 
like about D.


I await your implementation(s)! :-)


I downloaded the reference NIM implementation and got the latest 
nim compiler, but I received the following error:

  $ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim
  twinprimes_ssoz.nim(74, 11) Error: attempting to call 
undeclared routine: 'sort'


For a person not familiar with nim, what's the fastest way to fix 
that?


Re: A Friendly Challenge for D

2018-10-12 Thread Jabari Zakiya via Digitalmars-d

On Friday, 12 October 2018 at 20:05:29 UTC, welkam wrote:

On Friday, 12 October 2018 at 16:19:59 UTC, Jabari Zakiya wrote:
The real point of the challenge is to see what idiomatic 
code...


There is no idiomatic D code. There are only better 
implementations.


D doesn't tell you how to write your code. It gives you many 
tools and you choose which tools to use. That's what people 
like about D.


I await your implementation(s)! :-)


Re: A Friendly Challenge for D

2018-10-12 Thread welkam via Digitalmars-d

On Friday, 12 October 2018 at 16:19:59 UTC, Jabari Zakiya wrote:
The real point of the challenge is to see what idiomatic 
code...


There is no idiomatic D code. There are only better 
implementations.


D doesn't tell you how to write your code. It gives you many tools 
and you choose which tools to use. That's what people like about 
D.


Re: A Friendly Challenge for D

2018-10-12 Thread welkam via Digitalmars-d

On Friday, 12 October 2018 at 16:19:59 UTC, Jabari Zakiya wrote:
Hmm, I don't think what you're saying about similar 
output|performance with other languages is empirically correct, 
but it's really not the point of the challenge.



That's why godbolt exists.

c++ and Rust
https://godbolt.org/z/FffXY2
C and D
https://godbolt.org/z/YQvkXU

Look at the assembly output. It's all the same. Compilers are very 
good at recognizing simple arithmetic computations and can reason 
about them. What they struggle with is memory access.


On Friday, 12 October 2018 at 16:19:59 UTC, Jabari Zakiya wrote:
Finally, a really fast D implementation can be a marketing 
bonanza to show people in the numerical analysis, data|signal 
processing fields, et al, that D can be used by them to solve 
their problems and be more performant than C++, etc


eBay's tsv-utils is the fastest in the world
https://github.com/eBay/tsv-utils/blob/master/docs/comparative-benchmarks-2018.md#top-four-in-each-benchmark

D's JSON parsing is the fastest in the world
https://github.com/kostya/benchmarks#json

Sambamba is a BAM data processor and is the fastest in the world
https://www.basepairtech.com/blog/sorting-bam-files-samtools-vs-sambamba/

Mir GLAS is faster than OpenBLAS and Eigen
http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html

There are enough examples, but they are not enough to change minds.


Re: A Friendly Challenge for D

2018-10-12 Thread Jabari Zakiya via Digitalmars-d

On Friday, 12 October 2018 at 15:11:17 UTC, welkam wrote:
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya 
wrote:
What I am requesting here is for a person(s) who is an 
"expert" (very good) to create a very fast D version, using 
whatever tricks it has to maximize performance.


I would like to include in my paper a good comparison of 
various implementations in different compiled languages 
(C/C++, D, Nim, etc) to show how it performs with each.


I looked into your NIM code and from a programmer's point of view 
there is nothing interesting going on. Simple data structures 
and simple operations. If you wrote equivalent code in C, C++, 
D, NIM, Rust, Zig and compiled with the same optimizing compiler 
(llvm or gcc) you should get the same machine code and almost 
the same performance (less than 1% difference due to runtime). 
If you got different machine code for equivalent implementation 
then you should file a bug report.


The only way you will get different performance is by changing 
implementation details but then you would compare apples to 
oranges.


Hmm, I don't think what you're saying about similar 
output|performance with other languages is empirically correct, 
but it's really not the point of the challenge.


The real point of the challenge is to see what idiomatic code, 
written for performance, using the best resources that the 
language provides, will produce compared to the Nim version. 
It's not to see what a line-by-line translation from Nim to D 
would look like. That may be a start to get something working, 
but shouldn't be the end goal.


I'm using the Nim version here as the "reference implementation" 
so it can be used as the standard for comparison (accuracy of 
results and performance). The goal for D (et al) users is to use 
whatever resources it provides to maybe do better.


For example, Nim currently doesn't provide standard bitarrays. 
Using bitarrays in place of byte arrays should perform faster 
because more data can fit in cache.
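The cache point is plain arithmetic: a byte array spends one byte 
per flag, while a bitarray packs eight flags per byte, so the same 
sieve state occupies one eighth the memory. A minimal hand-rolled 
sketch (illustrative only; the helper names are mine):

```python
def bit_get(bits, i):
    # Read flag i from a packed bitarray.
    return (bits[i >> 3] >> (i & 7)) & 1

def bit_set(bits, i):
    # Set flag i in a packed bitarray.
    bits[i >> 3] |= 1 << (i & 7)

n = 1000
byte_flags = bytearray(n)              # one flag per byte: 1000 bytes
bit_flags = bytearray((n + 7) // 8)    # eight flags per byte: 125 bytes
bit_set(bit_flags, 541)
```

The trade-off is that every read/write costs a shift and a mask, 
which is why the cache win has to be measured rather than assumed.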


Also, maybe using OpenMP, CUDA, etc. is the way to parallelize 
the algorithm for D. I don't know what constructs D uses for 
parallel multiprocessing. And as noted before, this algorithm 
screams out to be done with GPUs.


But you are correct that the Nim code uses very simple coding 
operations. That is one of its beauties! :) It is simple to 
understand and implement mathematically, short and simple to 
code, and architecturally adaptable to hardware.


So to really do the challenge, the Nim code needs to be compiled 
and run (per instructions in code) to use as the "reference 
implementation", to see what correct outputs look like, and their 
times, and then other implementations can be compared to it.


I would hope that, after getting an output-correct implementation 
done (to show you really know what you're doing), alternative 
implementations can be done to wring out better performance.


I think this is a good challenge for anyone wanting to learn D 
too, because it involves something substantially more than a 
"toy" algorithm, but short enough to do with minimal time and 
effort, that involves the need to know (learn about) D in enough 
detail to determine the "best" (alternative) way to do it.


Finally, a really fast D implementation can be a marketing 
bonanza to show people in the numerical analysis, data|signal 
processing fields, et al, that D can be used by them to solve 
their problems and be more performant than C++, etc.


Again, people should feel free to email me if the want more 
direct answers to questions, or help.




Re: A Friendly Challenge for D

2018-10-12 Thread welkam via Digitalmars-d
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya 
wrote:
What I am requesting here is for a person(s) who is an "expert" 
(very good) to create a very fast D version, using whatever 
tricks it has to maximize performance.


I would like to include in my paper a good comparison of 
various implementations in different compiled languages (C/C++, 
D, Nim, etc) to show how it performs with each.


I looked into your NIM code and from a programmer's point of view 
there is nothing interesting going on. Simple data structures and 
simple operations. If you wrote equivalent code in C, C++, D, 
NIM, Rust, Zig and compiled with the same optimizing compiler (llvm 
or gcc) you should get the same machine code and almost the same 
performance (less than 1% difference due to runtime). If you got 
different machine code for equivalent implementation then you 
should file a bug report.


The only way you will get different performance is by changing 
implementation details but then you would compare apples to 
oranges.





Re: A Friendly Challenge for D

2018-10-11 Thread Ali Çehreli via Digitalmars-d

On 10/11/2018 10:14 AM, Jabari Zakiya wrote:

> Ok, hopefully this will work for everyone. Try this link:
>
> https://mega.nz/#!yJxUEQgK!MY9dwjiWheE8tACtEeS0szduIvdBjiyTn4O6mMD_aZw

Thank you. That worked just fine. I clicked the Download link and the 
pdf was saved on my end. :)


Ali



Re: A Friendly Challenge for D

2018-10-11 Thread Jabari Zakiya via Digitalmars-d

On Thursday, 11 October 2018 at 16:13:17 UTC, Jabari Zakiya wrote:
On Thursday, 11 October 2018 at 14:49:54 UTC, Carl Sturtivant 
wrote:
On Thursday, 11 October 2018 at 13:26:19 UTC, Jabari Zakiyth 
wrote:
On Thursday, 11 October 2018 at 05:11:20 UTC, Ali Çehreli 
wrote:

[...]


What country are you trying to get access from, because I 
know people in the US have gotten the papers from those links, 
for free and without an account. It may also have something 
to do with your browser. They work for me in various modern 
browsers, including the Tor Browser.


Not credible --- I cannot get either without an account 
browsing from the US using Firefox with no unusual 
arrangements.


Wow. Those links used to work with no problem, so I don't know 
what it is (VPN use, etc.?).


OK, this will definitely work. Send me an email and I will 
email you the paper (and figure out a sure fire way to make it 
available).


Construct this full email (for humans).

(my first initial)(lastname) at gmail


Ok, hopefully this will work for everyone. Try this link:

https://mega.nz/#!yJxUEQgK!MY9dwjiWheE8tACtEeS0szduIvdBjiyTn4O6mMD_aZw

However, I'm still interested to find out what is|are the 
specific issues with the other links so I can make a possible bug 
report to those services.


1) Can you actually get to the site?

2) Can you see the document (a reference to it)?

3) Can you read the document online (did you try)?

4) What happened when you tried to download it (popup requiring 
account, etc)?


The paper is from 2014; Scribd has changed a lot since then. The 
more information people can provide, the better I can determine 
how to provide access to the papers.


But you can always email me to get them as a last resort, or just 
to ask questions.




Re: A Friendly Challenge for D

2018-10-11 Thread Jabari Zakiya via Digitalmars-d
On Thursday, 11 October 2018 at 14:49:54 UTC, Carl Sturtivant 
wrote:
On Thursday, 11 October 2018 at 13:26:19 UTC, Jabari Zakiyth 
wrote:
On Thursday, 11 October 2018 at 05:11:20 UTC, Ali Çehreli 
wrote:

[...]


What country are you trying to get access from, because I know 
people in the US have gotten the papers from those links, for 
free and without an account. It may also have something to do 
with your browser. They work for me in various modern 
browsers, including the Tor Browser.


Not credible --- I cannot get either without an account 
browsing from the US using Firefox with no unusual arrangements.


Wow. Those links used to work with no problem, so I don't know 
what it is (VPN use, etc.?).


OK, this will definitely work. Send me an email and I will email 
you the paper (and figure out a sure-fire way to make it 
available).


Construct this full email (for humans).

(my first initial)(lastname) at gmail


Re: A Friendly Challenge for D

2018-10-11 Thread Carl Sturtivant via Digitalmars-d
On Thursday, 11 October 2018 at 13:26:19 UTC, Jabari Zakiyth 
wrote:

On Thursday, 11 October 2018 at 05:11:20 UTC, Ali Çehreli wrote:

[...]


What country are you trying to get access from, because I know 
people in the US have gotten the papers from those links, for 
free and without an account. It may also have something to do 
with your browser. They work for me in various modern browsers, 
including the Tor Browser.


Not credible --- I cannot get either without an account browsing 
from the US using Firefox with no unusual arrangements.


Re: A Friendly Challenge for D

2018-10-11 Thread Jabari Zakiyth via Digitalmars-d

On Thursday, 11 October 2018 at 05:11:20 UTC, Ali Çehreli wrote:

On 10/10/2018 07:52 PM, Jabari Zakiyth wrote:
> On Wednesday, 10 October 2018 at 22:25:17 UTC, Neia Neutuladh
wrote:
>> On 10/10/2018 03:05 PM, Jabari Zakiya wrote:
>>> 
https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

>>
>> It would be great if you could provide a link to a freely
downloadable
>> version of this.
>
> You can download the paper for free from that link. Did you
have trouble
> doing it?

I think the problem is, scribd requires an account, which they 
are apparently happy to link to an existing Google or Facebook 
account.


> Here's another link to paper.
>
> 
https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_


Similarly, that one requires a Google account.

> Here, again, is the link to the Nim code. Just install Nim
(current
> 0.19.0) and compile and run it per instructions in code. I
recommend
> people do that to see its outputs and verify its results.
>
> 
https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e


That works! :)

Ali


What country are you trying to get access from, because I know 
people in the US have gotten the papers from those links, for free 
and without an account. It may also have something to do with 
your browser. They work for me in various modern browsers, 
including the Tor Browser.





Re: A Friendly Challenge for D

2018-10-10 Thread Ali Çehreli via Digitalmars-d

On 10/10/2018 07:52 PM, Jabari Zakiyth wrote:
> On Wednesday, 10 October 2018 at 22:25:17 UTC, Neia Neutuladh wrote:
>> On 10/10/2018 03:05 PM, Jabari Zakiya wrote:
>>> https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ
>>
>> It would be great if you could provide a link to a freely downloadable
>> version of this.
>
> You can download the paper for free from that link. Did you have trouble
> doing it?

I think the problem is, scribd requires an account, which they 
are apparently happy to link to an existing Google or Facebook account.


> Here's another link to paper.
>
> https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_

Similarly, that one requires a Google account.

> Here, again, is the link to the Nim code. Just install Nim (current
> 0.19.0) and compile and run it per instructions in code. I recommend
> people do that to see its outputs and verify its results.
>
> https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e

That works! :)

Ali



Re: A Friendly Challenge for D

2018-10-10 Thread Jabari Zakiyth via Digitalmars-d

On Thursday, 11 October 2018 at 00:22:10 UTC, tide wrote:
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya 
wrote:
I would like to include in my paper a good comparison of 
various implementations in different compiled languages 
(C/C++, D, Nim, etc) to show how it performs with each.


If you want help with your paper, possibly some kind of decent 
financial incentive would be appropriate. If the algorithm 
benefits from more threads, then finding or creating an 
implementation that runs on a GPU would probably be the true 
performance test. CPUs have like 4-8 cores in the mainstream? A 
GPU has hundreds, though with some limitations.


I'm writing the paper anyway (just like the others), so other 
implementations are icing on the cake to show implementation 
variations, as a benefit to readers. Maybe if I set up a website 
and created a Rosetta Code repo for people to post their 
different language implementations, and offer a T-shirt for 
fastest implementation. :-)


Yes, a GPU-based implementation would be the epitome for this 
algorithm, by far. This is actually why I have gotten the 
algorithm to this implementation so that the number crunching can 
all be done in parallel threads. (It would also be screamingly 
fast done in hardware in a FPGA too.) However, I only have 
standard consumer grade laptops. Hopefully someone(s) with 
sufficient hardware, interest, and time, will take this upon 
themselves to do this and publicize their results.


Re: A Friendly Challenge for D

2018-10-10 Thread Jabari Zakiyth via Digitalmars-d
On Wednesday, 10 October 2018 at 22:25:17 UTC, Neia Neutuladh 
wrote:

On 10/10/2018 03:05 PM, Jabari Zakiya wrote:

https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ


It would be great if you could provide a link to a freely 
downloadable version of this.


You can download the paper for free from that link. Did you have 
trouble doing it?


Here's another link to paper.

https://www.academia.edu/7583194/The_Segmented_Sieve_of_Zakiya_SSoZ_


Here, again, is the link to the Nim code. Just install Nim 
(current 0.19.0) and compile and run it per instructions in code. 
I recommend people do that to see its outputs and verify its 
results.


https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e


Re: A Friendly Challenge for D

2018-10-10 Thread tide via Digitalmars-d
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya 
wrote:
I would like to include in my paper a good comparison of 
various implementations in different compiled languages (C/C++, 
D, Nim, etc) to show how it performs with each.


If you want help with your paper, possibly some kind of decent 
financial incentive would be appropriate. If the algorithm 
benefits from more threads, then finding or creating an 
implementation that runs on a GPU would probably be the true 
performance test. CPUs have like 4-8 cores in the mainstream? A 
GPU has hundreds, though with some limitations.


Re: A Friendly Challenge for D

2018-10-10 Thread Neia Neutuladh via Digitalmars-d

On 10/10/2018 03:05 PM, Jabari Zakiya wrote:

https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ


It would be great if you could provide a link to a freely downloadable 
version of this.


Re: A Friendly Challenge for D

2018-10-10 Thread Jabari Zakiya via Digitalmars-d

On Wednesday, 10 October 2018 at 20:43:01 UTC, Kagamin wrote:
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya 
wrote:

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e


As I understand it, the main thread preallocates global memory 
and tracks it, and other threads don't track it?


Here's an abbreviated elementary explanation of the algorithm 
and implementation. You will need to read (download) my paper 
"The Segmented Sieve of Zakiya (SSoZ)" because I will make 
reference to things in it, to keep this as short as possible.


https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

Really understanding the math and theory of the algorithm 
requires primarily just understanding Table 3 on page 3 of the 
paper, as it encapsulates everything. You can read the paper to 
understand the language of the algorithm used in the code.


Twinprimes are 2 (odd) primes that differ by only 2, e.g. 3-5, 5-7, 
11-13, 29-31.


Table 3 shows all the residue values (prime candidates|pcs) and 
residue groups (resgroups|columns) to find all the primes up to 
541 using P5 (modpg = 30, rescnt = 8).


For a given value N, it will be represented with a PG table of 
some number of resgroups, with a max size I call Kmax (the 
resgroup that N resides in).


Using P5, I only need to sieve primes along the residue tracks 
(restracks) that can
produce twinprimes, here 11-13, 17-19, 29-31. Thus I create 3 
byte arrays, one for each twinpair, and use the lower 2 bits to 
represent the upper and lower twinprime restracks.
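
A small D sketch (my illustration, not the author's code) of 
where those three twinpair restracks come from for P5: the 
residues coprime to modpg = 30, and the adjacent ones that 
differ by 2.

```d
import std.algorithm : filter;
import std.array : array;
import std.numeric : gcd;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    enum modpg = 30; // P5 modulus = 2 * 3 * 5
    // the rescnt = 8 residues coprime to 30; 31 is included as the
    // wrap-around of residue 1 into the next resgroup
    auto residues = iota(1, modpg + 2)
        .filter!(r => gcd(r, modpg) == 1)
        .array;
    writeln(residues); // [1, 7, 11, 13, 17, 19, 23, 29, 31]
    // twinpair restracks: adjacent residues that differ by 2
    foreach (i; 1 .. residues.length)
        if (residues[i] - residues[i - 1] == 2)
            writeln(residues[i - 1], "-", residues[i]); // 11-13, 17-19, 29-31
}
```

Every twinprime > 5 must fall on one of these three restrack 
pairs, which is why only three byte arrays are needed.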


Then for each twinpair (here 3) I run a thread, so the 3 
threads perform the SSoZ on the entire Kmax length of resgroups 
in parallel. At the end I accumulate the results and print them 
out. This, in a nutshell, is what the algorithm does. The paper 
gives you enough to understand the fundamental nature of the 
algorithm, though I've learned so much more than in 2014. :)


The larger the PG the more twinpair restracks (see page 14) there 
are to use. For larger numbers you want to use the largest PG 
possible that 'fits' into the hardware cpu. All my 
development|testing was done on laptops using Intel I5|I7 cpus 
with 4 or 8 threads. I'm really interested how it performs on 
other platforms (ARM, AMD, PowerPC, etc).



The main proc "twinprimes_ssoz" manages program flow and is set 
up as follows:


1) accepts the input number (an integer) in "val" from the cli

2) sets "num" to be the first odd number < "val" if "val" is even

3) calls "selectPG" with "num" to select optimum Prime Generator 
(PG) parameters


4) compute various working parameters per selected PG and number 
value (see refs)


5) compute global values for the number of primes, and their 
values, <= sqrt(num) for the selected PG with proc "sozpg"

6) Set initial twinprime count for selected PG

7) Then with proc "segsieve" allocate one thread to perform the 
SSoZ (Segmented Sieve of Zakiya) on each twinprime pair's 
residues (determined by the selected PG), and count the number 
of twinprimes computed in each thread.

8) Then determine true total twinprime count and last twinprime 
<= "num"


It also times the different intervals and prints out 
diagnostics and the final output.
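
A hypothetical D skeleton of that flow, to make the steps 
concrete. The proc names (selectPG, sozpg, segsieve) are taken 
from this post, but the bodies below are placeholder stubs, not 
the real SSoZ implementation (see the Nim source for that).

```d
import std.stdio : writeln;

// stand-in for the selected Prime Generator's parameters
struct PG { uint modpg; uint rescnt; }

PG selectPG(ulong num) { return PG(30, 8); }                   // step 3 (stub)
ulong[] sozpg(ulong num, PG pg) { return [7UL, 11UL, 13UL]; }  // step 5 (stub)
ulong segsieve(ulong num, PG pg, ulong[] primes) { return 0; } // step 7 (stub)

void main()
{
    ulong val = 541;                          // step 1: cli input
    ulong num = val % 2 == 0 ? val - 1 : val; // step 2: first odd <= val
    auto pg = selectPG(num);                  // steps 3-4: PG parameters
    auto primes = sozpg(num, pg);             // step 5: primes <= sqrt(num)
    auto count = segsieve(num, pg, primes);   // steps 6-8: threaded SSoZ
    writeln("twinprimes <= ", num, ": ", count);
}
```

The real program does steps 4, 6, and 8 inline around these 
calls; they are folded into the comments here for brevity.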



The proc "twins_sieve" is the primary function that manages and 
performs the SSoZ for a given twinprime pair's parameters in 
each thread. The more threads, the faster the process goes.


I'll provide more info later. I have to run now. I wanted to 
get this out now while I was at my laptop, and online.


Re: A Friendly Challenge for D

2018-10-10 Thread Kagamin via Digitalmars-d
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya 
wrote:

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e


As i understand, main thread preallocates global memory and 
tracks it, and other threads don't track it?


Re: A Friendly Challenge for D

2018-10-10 Thread SrMordred via Digitalmars-d
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya 
wrote:

[...]


Looking forward to this :)


A Friendly Challenge for D

2018-10-10 Thread Jabari Zakiya via Digitalmars-d

Hi.

I hope this is the right place to request this, if not please 
tell me a better one.


I had looked at D, and played with it some circa 2010~2012, but 
time and life took my priorities away. But I'm still interested 
in learning different languages, but there are so many more now 
it's hard to devote the time to learn them to some high level of 
proficiency.


Subsequently, I started learning Nim, because I find the syntax 
and constructs simpler and more familiar than most (I'm coming 
mostly from a Ruby background).


I have developed in Nim a program to find twinprimes that seems 
to be the fastest of its type, compared to primesieve 
(https://primesieve.org/) which claims to be the fastest, written 
in C++.


Here is the code to my Nim implementation of twinprimes_ssoz.

https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e

The total file is just 318 lines of code, with about 60 separate 
lines of comments.
The code is extensively commented per line to explain what's 
happening.
Reading the references given in the code introduction will 
explain the general process.

See  "The Segmented Sieve of Zakiya (SSoZ)"
https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

I am in the process of writing up a paper to explain this 
application, and implementation. What I am requesting here is for 
a person(s) who is an "expert" (very good) to create a very fast 
D version, using whatever tricks it has to maximize performance.


I would like to include in my paper a good comparison of various 
implementations in different compiled languages (C/C++, D, Nim, 
etc) to show how it performs with each.


This algorithm is designed to be run in multiprocessing 
environments (more core/threads, better performance). The Nim 
version (0.18.0, 0.19.0 recently released) uses its native 
parallel processing structure. In C++, et al, it may be best 
implemented using OpenMP, or CUDA, etc. These are implementation 
details one versed in a language could determine.


Well that's it. I am willing to share performance data, 
compared to primesieve, if you like. The beauty of this 
algorithm|implementation is its simplicity in design, and 
minimal code. It shouldn't be that hard to understand, or to 
determine how to translate into D. Of course, I am fully 
available to answer questions and provide help.


Thanks

Jabari




Re: Knowing the approach to solve a D challenge

2018-02-16 Thread Jesse Phillips via Digitalmars-d

On Friday, 16 February 2018 at 09:44:27 UTC, aberba wrote:
D has a ton of features and library solutions. When you 
encounter a problem, how do you approach solving it in code?


1. Do you first write it in idiomatic D style or a more general 
approach before porting to idiomatic D?


Like always, it depends. Do I attribute up all my functions 
with const/@safe/nothrow... (is that even idiomatic)? No. Do I 
build a range to process some data... sometimes. Do I utilize 
std.algorithm/std.range functions to process my data... when 
available.


Even my C# code tends to be more idiomatic D than others.

2. Do you find yourself mostly rolling out your own 
implementation first before using a library function?


I don't know. If I rolled my own, I probably didn't know of the 
library function/function combination that does the same thing.



3. Do the use of generics come out of first try or a rewrite?


Usually not. If I don't have another use-case, I probably don't 
know what needs to be generic.

4. What rough percentage of phobos knowledge is required for 
reasonable good problem solving efficiency?


I would guess average. The reason I say this is that, unlike 
with popular languages, search doesn't always provide a good 
example for every question, so you kind of need to already know 
what you're looking for. It doesn't help that sometimes the 
answer is a combination of things (e.g. .reduce!max, which 
isn't found in std.math).
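
A tiny D example of the combination mentioned above (my 
illustration): the "maximum of an array" one-liner needs 
`reduce` and `max` together, both from std.algorithm rather 
than std.math.

```d
import std.algorithm : max, reduce;

void main()
{
    // reduce without a seed uses the first element as the start,
    // then folds max over the rest
    assert([3, 1, 4, 1, 5, 9, 2].reduce!max == 9);
}
```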


Re: Knowing the approach to solve a D challenge

2018-02-16 Thread Dukc via Digitalmars-d

On Friday, 16 February 2018 at 09:44:27 UTC, aberba wrote:
1. Do you first write it in idiomatic D style or a more general 
approach before porting to idiomatic D?


At the micro level, it's usually fairly idiomatic from the 
get-go: usually no heap allocations in inner loops, no 
for-looping, no static variables, etc. Sometimes, if I get a 
bit desperate or frustrated, I might do these a bit.


But at a larger level, I often find myself shifting work 
between functions, structs, modules and so on. In other words, 
redesigning.


2. Do you find yourself mostly rolling out your own 
implementation first before using a library function?


No, the vast majority of my calls are from Phobos or other 
libraries I use. And in the case of Phobos at least, its stuff 
is so general that I don't often need to design my own 
function. An exception is std.algorithm.each, but just to get 
better error messages (template constraints don't tell why the 
call did not work).



3. Do the use of generics come out of first try or a rewrite?


Rewriting a lot, but that's the case with all coding. With 
generics, I often tend to forget the ! before an alias 
parameter.


4. What rough percentage of phobos knowledge is required for 
reasonable good problem solving efficiency?


If we talk about experience with the language in general, it's 
more than with C-style programming, Java, or C#, but less than 
with C++ generics. I don't think you have to know Phobos 
thoroughly, but you have to know the language and understand 
the philosophy of ranges quite deeply.


Percentages aren't important, I think that as long as you know 
the general purpose of std.algorithm, std.range, std.stdio, 
std.conv and std.array that gets you started. Again, if you know 
the language and understand the range concept.




Re: Knowing the approach to solve a D challenge

2018-02-16 Thread Chris via Digitalmars-d

On Friday, 16 February 2018 at 09:44:27 UTC, aberba wrote:
D has a ton of features and library solutions. When you 
encounter a problem, how do you approach solving it in code?


1. Do you first write it in idiomatic D style or a more general 
approach before porting to idiomatic D?


As idiomatic as possible. But it depends on how big and/or 
important the method/function is. The more important and/or 
bigger, the more idiomatic. Sometimes a `foreach` will do in a 
small function.


But D makes it easy to write idiomatic code straight away, 
although sometimes it can be a bit annoying when you get messages 
like "cannot implicitly convert x of type y to z". So you have to 
keep track of what you are chaining.


2. Do you find yourself mostly rolling out your own 
implementation first before using a library function?


I usually start to roll out my own for a few minutes until I say 
"Hold on, maybe the library already has a function that does 
exactly that!" And it often does. That's because I get a better 
understanding of what I need, when I roll my own for a while. 
Using the library straight away can sometimes result in shooting 
yourself in the foot. So it's a mix of both.



3. Do the use of generics come out of first try or a rewrite?


Depends on the problem at hand. If you know you will have to 
handle several types down the road, it's a minimal generic that 
accepts only one type to begin with but that can be extended to 
others. It's a bit "play it by ear". If I'm dealing with HTML and I 
know that JSON or CSV will be needed later, I write the function 
in a way that it can easily be extended. If a function returns 
the user name as a `string`, I won't bother with generics. That's 
just normal, I think.


4. What rough percentage of phobos knowledge is required for 
reasonable good problem solving efficiency?


You cannot know every function, of course, and new functions are 
being added all the time. What you need to know is what Phobos 
can do in general and where to look for it. The cheat sheet 
should be the first port of call. You can save a lot of time by 
skimming through the library regularly. Also, it's worth looking 
at other people's code in the "learn" section. What often happens 
to me, when I see a solution, I go "Jesus, that's clever, I'll 
keep that in mind!" The learn section is good because it focuses 
on one (small) problem at a time.



