Re: Comparisons of Nim to the Chapel computer programming language...

2020-06-17 Thread GordonBGood
@Brad, just a few pertinent points:

> That said, in our experience, Cygwin itself also incurs a lot of overhead 
> relative to a native installation of Chapel, so it's hard to say how much of 
> the poor performance you observed was due to fifo as opposed to Cygwin 
> overheads.

Cygwin may introduce some additional overhead, but Windows is what I have, and 
I have barely enough memory to run WSL on top, so the Linux option is somewhat 
difficult (although likely possible). That said, other than even 
slower-than-usual compile times, the Chapel-produced executables seem to run at 
about the same speed on my cygwin setup as they do on "tio.run", other than 
this "fifo" issue.

Also, other than this issue (which I've worked around with my implementation of 
a thread pool), Chapel-produced code, whether run on my cygwin system or on 
"tio.run"/Linux, is as fast as or faster than that from any compiler out there, 
maybe even including Nim. I am just putting the finishing touches on a 
Page-Segmented Sieve of Eratosthenes implementation in Chapel that should be at 
least as fast as (and it's looking like it may be faster than) Kim Walisch's 
"primesieve" written in C++, although the reason it's faster isn't really the 
language (other than the bad logic habits C/C++ programmers often fall into) 
but my algorithm, which simplifies the code.

Eventually I'll make a GitHub repo for this Chapel project. Do you have an 
"awesome Chapel" page somewhere to post a link to this eventual repo?

I'll also be updating my implementation of the same in Nim so that a comparison 
of the code and performance can be readily made.

Let's just chalk Chapel/cygwin's limitations up to the "fifo" implementation 
and leave it at that.

> With respect to the performance of our hash tables / associative domains and 
> arrays, this is admittedly another area that has not received much 
> performance focus in our work to date compared to dense rectangular arrays. 
> But unlike fifo and cygwin, it is an area where we intend to invest more 
> effort. For example, members of the team have recently been working on 
> refactoring the existing code with an eye toward improving it...

Looks like I should file an issue on the performance I'm finding for this one 
then, if your team is actively working on it.

I won't bother filing an issue on the fifo performance since it looks like you 
are unlikely to do anything about it anytime soon.

> Best wishes

And mine to you, Gordon


Re: Comparisons of Nim to the Chapel computer programming language...

2020-06-17 Thread GordonBGood
@Brad, thanks for joining the forum and for your hints as to Chapel threading 
performance:

> Are you able to share the benchmark you were using with us (as well as 
> information like back-end compiler and version, processor, etc)? In our 
> experiments, Chapel has generally shown to be competitive with OpenMP, so it 
> would be interesting for us to understand better what you were doing (prior 
> to resorting to a homegrown thread pool) in order to make sure nothing's 
> going horribly awry. I'd also be curious whether you were using 
> CHPL_TASKS=qthreads or CHPL_TASKS=fifo. Thanks.
> 
> (For example, these benchmarks ... demonstrate Chapel creating 12,000,000 
> tasks (500,000 x 24 cores) in 4.4 seconds, compared to 4.1 seconds for GNU 
> OpenMP.) (about 8.8/~0.2 microseconds per thread, machine unspecified - GBG)

Thanks for your prompting to check the CHPL_TASKS environment variable; due to 
this prompting I re-read the [documentation section on Using 
Tasks](https://chapel-lang.org/docs/usingchapel/tasks.html) and found that, 
because I use cygwin on Windows, this defaults to "fifo", which is pthreads. I 
was somewhat confused because some of my benchmarks were run on "tio.run" (an 
online IDE using Linux and therefore "qthreads") and some on my own machine 
using cygwin and therefore defaulting to "fifo".

Your results as posted above are in line with what I was seeing on 
"tio.run"/Linux of just a few microseconds of overhead per task switch, using 
the following benchmark program: 


// testing simple threading...

use Time;

config const NUMLOOPS = 1000;
assert(NUMLOOPS >= 0, "NUMLOOPS must be zero or above!");
config const NUMTHRDS = here.maxTaskPar;
assert(NUMTHRDS > 0, "NUMTHRDS must be at least one!");
writeln("Number of loops:  ", NUMLOOPS, "; Number of threads:  ", NUMTHRDS);

var rsltsq:  [ 0 .. NUMTHRDS - 1 ] sync int;

proc dowrk(qi: int, v: int) {
  rsltsq[qi] = v; // thread work that doesn't do much of anything!
}

var timer: Timer; timer.start();
// pre load the thread pool work
for i in 0 .. NUMTHRDS - 1 {
  rsltsq[i].reset(); begin with (const in i) dowrk(i, i); }
var qi = 0; var sum = 0;
for i in NUMTHRDS .. NUMLOOPS + NUMTHRDS {
  sum += rsltsq[qi];
  if i <= NUMLOOPS then begin with (const in i) dowrk(qi, i);
  qi = if qi >= NUMTHRDS - 1 then 0 else qi + 1;
}
timer.stop();
write("The sum is ", sum, " in ");
writeln(timer.elapsed(TimeUnits.milliseconds), " milliseconds!");



When run on 
[tio.run](https://tio.run/##hVPNctowEL77KTZcagZwTI845tRDD22Sackpw0HYa6xBtoxWQNNOXr10JcWYTjNTLtKutd/P7lLUokN1Pt/egkWyst0CyaZTCLY2KEpOJEkSRQdCWMkGsygqdFvJLfBBFu6fvn55eHj8DjnM0zTNIkGExsaX/DKHdAqjS9wcuGqD8BONBm1AbPQRb0bj7B/c1edvnxxujQaTRvxYCdo9CnNNEZ4se4YQ9gzCgkLBd90GgpORFlUbj@4PzQYN6AqU1h0tAEbTixOGymB4EdowvPEcDBYdhQFDytKevz1DCkkyqJ7BHNZAL20BsrX8ujO6gFKfzC7ey4VLTuHozzH8iuAN6Xkv1@z4mIGbh2eGkzY7vrObUiO1H9zJHovaqRPti615RjfRa1BkeUZm4UdlshAlZAW3izUzaGeQTTOsrfsRQ6e18jRRxQORLOodN1ci5ToxSOgguc9bfn6StoY4TI5DOX6zKqd8z@DVS9tLtpZm4O50aHwwMF7IAnFYlsmQdvyuapJf9SrjpKwY4C4fithZ@x9d@yCMq70ohuBzmf/t2OOkgIpXnz9PYJ5xl/uW6i7uVyoerbiZTpwktyV84yVyhNdbFwpRiY6wjN2AnlppKWmkUpKQRZY0dnXXCb@35/N5Nuvt5e5vxr@Q8mrzj7@LSoktca7ihf8D)
 (qthreads), there are no real surprises: it runs a million loops on two 
threads (the maximum available) on a Sandy Bridge class processor at 2.9 GHz in 
about 675 milliseconds, and with only one thread in a little over 1000 
milliseconds; this isn't surprising as, with such trivial work (almost none), 
most of the time expended will be in just running the sync'ing.

However, when run on my cygwin system (Windows, Intel Sandy Bridge class 
i3-2100 CPU with 2 cores/4 threads at 3.1 GHz, compiled with Chapel version 
1.22 and gcc version 9.3.0) using the default 4 threads, it runs much, much 
slower than the above, at about 3000 to 5000 milliseconds for only 1000 loops 
with the full four threads, which is slow but perhaps can be attributed to 
pthreads being slower than qthreads (although the variable results are 
worrisome).

Furthermore, the shocker is that when the number of threads is dropped to, say, 
only two, it can only run 1000 loops in about 2500 milliseconds (highly 
variable upwards from this), and with 1000 loops run on only one thread 
(argument of \--NUMTHRDS=1), it takes 16,000 milliseconds (and sometimes more). 
Obviously, taking over ten milliseconds per task start prevented me from 
effectively running tasks which take about one millisecond each, as the 
overhead per task is over ten times the actual work, and this explains why I 
had to implement the thread pool.

This seems to indicate that you do indeed have a problem with your pthreads 
implementation, as I suspected before (even before I understood the difference 
between "qthreads" and "fifo" and which is the default where).

Why is "fifo" the default with cygwin? Are there problems compiling with 

Re: Fastest Prime Sieve, in Nim

2020-06-13 Thread GordonBGood
@BLM2:

> @GordonBGood Please provide:

  1. Link to full source code to your work.
  2. Timing results comparisons between your code, my code, and primesieve.
  3. Link to detailed writeup on your methodology and implementation details.



It's been some time, but I have done quite a lot of work on this across various 
languages.

Answers to your requests as follows:

  1. In Nim: [here is a Wandbox of a fast version of a true Sieve of 
Eratosthenes with wheel factorization as 
discussed](https://wandbox.org/permlink/Fynn7Z4hss1fH9mQ). This is not fully 
optimized to the level of Kim Walisch's "primesieve", but it shows the 
principles of using wheel modulo residual bit planes in efficient wheel 
factorization sieving using something you would call PG4 (wheel factors of 
2/3/5/7) but with the added refinement of using preculling for the primes of 
11/13/17/19.
  2. This has many more optimizations to the innermost culling loop than your 
code so is actually faster, but **as your code isn't really a Sieve of 
Eratosthenes since it can't count or output the found primes in order**, they 
are not really comparable. It is about as fast as "primesieve" even though it 
isn't complete as to the optimizations extending its range above about 16 
billion, but you can compare it to primesieve over the range of a billion to 16 
billion yourself.
  3. I have written quite a full tutorial on how it works as [a StackOverflow 
answer on implementing this in 
JavaScript](https://stackoverflow.com/a/57108107/549617) which describes 
everything that is done here other than the macro that does extreme loop 
unrolling (not possible to use in Javascript) and also describes the other 
changes necessary to extend the work to much higher sieving ranges.



Now as to your work:

  1. No, it's not really unique as [Huang Yuanbing's 
KTPrime](https://github.com/ktprime/ktprime) is really the same thing other 
than his code isn't limited to only counting Twin Primes and can handle all 
prime tuples. His code is written in C++ but could easily enough be translated 
into Nim or any other language producing efficient native code. As his code is 
much more optimized than yours, I suspect that his code is at least as fast as 
yours even though it is written for the more general finding of any/all prime 
tuples and not just Twin Primes.
  2. Huang Yuanbing doesn't claim that his code is a full Sieve of Eratosthenes 
as it doesn't count or find primes in order but rather across all modulo 
residual bit planes with only a range of bit planes in memory at any given 
time. In order to produce results in order, they would need to be stored and 
sorted at an extra computational cost. Your code has the same limitation and in 
fact isn't written for the general case of finding or counting all primes at 
all in any order, making it not really a full Sieve of Eratosthenes although it 
uses some of the SoE culling principles.
  3. As you can see from my linked example in 1., your code still misses many 
culling loop optimizations that would make it much faster if they were applied. 
If I cared to re-write your code using my optimizations, it would run up to 
about three times faster, but I'm not going to do that because the end result 
is a Twin Prime finder with the results found out-of-order. My efforts are 
better spent on extending the work as per the posted link in 1. to extended 
sieving ranges.
  4. I could adapt my work to finding prime K tuples as well as individual 
primes and prime gaps **in order** and will likely do that for the general case 
so my code is more comparable to "primesieve". The fact that your code may be 
slightly faster (or quite a bit faster if it were better optimized) means 
little as your code is for a limited use case whereas "primesieve" and my code 
are for the more general full sieve use case.




Re: Comparisons of Nim to the Chapel computer programming language...

2020-06-12 Thread GordonBGood
@mratsim:

> Very nice analysis.

Thank you.

> Nim threadpool actually does this. And it should be the other way around, 
> channels are the building blocks and you build Flowvars on top. And you also 
> need a Channel type that is aware of async frameworks to bridge the async 
> world and the sync/multithreaded world. In Weave, Flowvars are just a ptr 
> Channel.

Difference of opinion: Golang seems to do as you suggest, Haskell builds 
everything on MVar's, and F# has both async and task parallelism with the Task 
type it gets from DotNet, including the FlowVar concept, but with those Task's 
able to work together with Async. F# doesn't have built-in channels, so if 
required one would have to build them from primitives.

My opinion could be changed if the version of FlowVar's I propose truly can't 
work with a version of async, but I'd like to see the implementation as 
researched by F# first.

> Did you look into how they implement data parallelism? If they use something 
> similar to OpenMP they have to pack the inner body of the parallel for into a 
> closure that can be distributed to other threads.

I spent about a week fighting task parallelism, looking into why it was so 
slow, but didn't have an immediate need for data parallelism so didn't look 
much into it. However, I see that much of the compilation time is spent doing 
various data flow analyses, such that the final way of doing parallelism 
depends on the particular case. Regarding data parallelism, the documentation 
states that it isn't guaranteed that each inner element will be handled by a 
separate thread, so it seems that some grouping may be taking place and that 
some internal thread pools may also be in use; otherwise the feature would be 
almost useless. Documentation for data parallelism shows parallel operations 
being carried out on an individual Array element basis, so it isn't taking just 
the naive approach.

All parallelism seems to be implemented by using the pthread library directly 
and not through the use of OpenMP; however, they do internally generate some 
sort of closure or closure emulation (remember, neither Chapel nor low level C 
has real closures) where the required parameters/captured values are passed in 
as parameters to a function which is the body of the background task. It seems 
to be reasonably intelligent in how it does that for data parallelism, so that 
the overheads are greatly reduced.

My interest was more in low level task parallelism as I had an application in 
which I wanted to control execution order and the reasonably simple way I found 
to do it was in assigning tasks to a thread pool and controlling their order 
with sync variable queues.

> There is no reason to have such a high overhead especially in high 
> performance computing. OpenMP is way way way lower.

I agree, and I'm still not 100% sure where all the time is going, just that the 
loss of time went away when I implemented thread pools; creating individual 
pthreads directly is on the order of ten times faster. That said, with their 
fancy data flow analysis, Chapel sometimes surprises: when I did a little toy 
low level test with extremely simple tasks, it was able to give the expected 
boost in multi-threaded performance (boosted by the multiplier of the number of 
real cores) for doing almost no work per task, and I think that it somehow saw 
the work was negligible/trivial and automatically made thread pool groups out 
of the tasks.

My actual application was much more complex so that it couldn't see that an 
automatically generated thread pool was what was necessary; thus my fix of 
manually generating thread pools.

> As an example it takes 0.33 ns to do 4 additions on a vectorized 3GHz CPU, 
> i.e. you can do 12 additions per ns, so 12M addition per milliseconds, i.e. 
> with your figures, you would need to work on matrices of size 3000x4000 for 
> parallelization to start being useful, that seems way too large.

As per my discussion above and the documentation examples, Chapel is very much 
specialized for doing Array/Matrix parallel manipulations, so it almost 
certainly does something in data parallelism such that one would never 
experience these delays, even for much smaller matrices than you describe.

> Intel TBB also targets tasks in the 10,000 cycles range, which would 
> translate to 3.33 µs.

Chapel uses pthread mutexes and monitors to implement sync variables (as does 
Nim), and so much of the delay for task parallelism using a thread pool, when a 
work queue/channel and result queue/channel are being used, will be whatever 
execution delays those impose, which is likely to be the same as for Intel TBB 
or any language using these mechanisms. It is more than adequate when the 
actual task "Chunk" takes about one millisecond (or much less, down to say 100 
microseconds), and I could increase the Chunk to up to about 50 milliseconds if 
I wanted, so the overhead time would be minuscule.

My main lament in 

Re: Comparisons of Nim to the Chapel computer programming language...

2020-06-12 Thread GordonBGood
I've edited/added a short lament on the lack of any template/macro capability 
in Chapel in the OP above.


Re: Comparisons of Nim to the Chapel computer programming language...

2020-06-12 Thread GordonBGood
@kcvinu, as to readability of generated C, there is always going to be some 
name mangling (changing of names to be sure they are unique) and changing of 
forms of code into simple standard forms. In this regard, Chapel-generated C 
code is maybe easier to read than Nim's, as Chapel is closer to C in syntax.

Or perhaps you find C code difficult to read from the outset, with which I 
agree if one doesn't have experience with it...


Comparisons of Nim to the Chapel computer programming language...

2020-06-12 Thread GordonBGood
You might be interested in my observations about the Chapel computing language, 
which is a Cray (now owned by HPE) offshoot that is in the Open Source domain. 
The reason for my interest is that it is a language of roughly the same age as 
Nim but one that specializes in parallel computing, which I am interested in 
exploring and do explore, including using Nim:

**Management of the project...**

Chapel seems to have as Benevolent Dictator For Life (BDFL) a supportive person 
named Brad Chamberlain in place of our @araq, and a group of contributors which 
I would judge to be smaller than our group.

There is probably not as much information available about Chapel as about Nim 
on-line, but StackOverflow does have some Questions/Answers, many answered by 
Brad Chamberlain himself or by other contributors.

**Ease of installation and use...**

Chapel is not so easy to install other than on *nix platforms (including OSX), 
for which there are pre-compiled packages available; it can be used on Windows 
using cygwin, although one then needs to compile the compiler and libraries 
from source.

The compiler is written in C/C++, not Chapel (at least yet), and can generate 
very fast code as all the usual compiler optimizations are available, but the 
process of compiling Chapel source code itself is (currently) slow, taking 
about a minute to compile about 200 lines of code on my older Sandy Bridge 
class desktop with full optimizations enabled.

The compiler emits C code for further processing by a C compiler, just as one 
of Nim's backend options does, and the generated C code is fairly easy to read 
and understand, as Nim's generated code is.

It doesn't support nearly as many CPU targets, such as all the ARM variants 
(although it does support ARM64); it doesn't directly support a JavaScript 
backend (again, it might be made to work through Emscripten but isn't directly 
supported) or all the various C/C++ compilers that Nim supports, although it 
does support an LLVM backend.

I would judge that the Chapel language has many fewer available libraries than 
Nim does, probably reflecting the smaller number of contributors (AFAIKT) and a 
smaller and more specialized user base.

**Nature of the language...**

Chapel syntax is similar to Python, as is Nim's, but Chapel brought back 
compulsory semicolon statement terminators and curly bracket block delimiters 
in the manner of C/C++; it seems to aim to appease users of those languages who 
refuse to learn how to effectively use white space/off-side rule languages.

Chapel does have Generics for all of functions, classes, and records, although 
the syntax to use them isn't quite as clean as that of Nim.

Chapel is very much an Object Oriented Programming (OOP) language and depends 
on that paradigm whereas Nim can kind of support that paradigm but it probably 
isn't the preferred way to use it.

Along that line, Chapel has First Class Functions (FCF's) and lambdas 
(anonymous functions), but neither of these is a closure (capable of capturing 
bindings external to its function block and/or input parameters). This puts 
some severe limitations on things one would commonly do using closures, as one 
is forced to emulate closures using classes, which isn't always possible, as in 
the following discussion.

Currently, Chapel doesn't have reference fields for classes (which use heap 
storage) or records (which are instantiated as value storage, likely on the 
stack), and thus any captured values in a class or record are copy-on-capture 
and can't follow any external mutations to their state.

Thus, the only way to have a function/method follow external state is by 
calling it with a ref formal parameter, which is somewhat restrictive if one 
needs to implement a function using recursive variables, such as for the 
simpler recursive implementation of a Y-combinator.

As for system-level features, Chapel doesn't really directly support a very low 
level of code, although pointers and some low level functions are available in 
order to make the (mostly C/C++) Foreign Function Interface (FFI) work. These 
can be used to do such things as call into the operating system for 
cpymem/movmem and use a different bitness for views into Array's, etc., by 
casting c_ptr's. That said, Chapel actually uses efficient pointer references 
internally, so these are rarely needed in actual user code, other than when 
squeezing every last CPU cycle out of particular types of code that must run 
fast.

**Use for parallelism/multi-threading...**

Chapel has a wealth of what they consider low level task oriented parallelism 
using sync/single variables (more-or-less equivalent to Nim's FlowVar's; 
Haskell's MVar's, etc.) and also many ways of doing high level "data 
parallelism" (which might be considered to be somewhat equivalent to Nim's 
Threadpool spawn).

The real reason for Chapel's existence is its support for multi "locale" 
parallelism over many computing 

Re: What's happening with devel releases and version numbers?

2020-06-12 Thread GordonBGood
@araq:

> No, there are currently problems with the nightly builds but nim devel is 
> green on all major OSes including BSD.

Thanks, good to know.

Any idea when the nightly builds might be back up to speed? Some online IDE's 
such as Wandbox only update their devel option when the nightlies are updated.


Re: What's happening with devel releases and version numbers?

2020-06-11 Thread GordonBGood
@araq, thanks for the quick reply regarding version numbers; I get it.

> If you regard Nim as incomplete without arc then that's fair but it's not the 
> same as unstable.

Yes, that's what I mean: it's incomplete.

How about the first part of the question, on why there have been no OSX or 
Linux x64 devel 1.3.5 compiled releases for over a month? Does that mean there 
are problems compiling for these two targets?


What's happening with devel releases and version numbers?

2020-06-10 Thread GordonBGood
First, why haven't there been any full devel 1.3.5 releases including Linux x64 
and OSX for the last month, since the 11th of May this year? There have been 
lots of releases but they haven't been "full", only including various ARM 
versions for Linux and no versions for OSX, at least not pre-compiled versions, 
which kind of implies that these versions can't be compiled for the Linux and 
OSX targets.

Second, I thought I understood Nim version numbers as having odd numbers when 
being developed and even numbers when stable; thus 1.0.6 and 1.2.0 are stable 
but 1.2.1 is not; rather, it is a development version of a future possible 
1.2.2, which would be a version including relevant bug fixes and back ports.

What confuses me about the repo version numbers is that occasionally we get a 
version 1.0.7, which would imply that there will eventually be a 1.0.8, even 
though most are using the (mostly) stable 1.2 and waiting for either a minor 
update to 1.2.2 or a major update to 1.4.0 (or higher), which is what the 
1.3.X versions will become.

Also, the current devel branch went from 1.3.1 to 1.3.3 to the current 1.3.5 
even though there have been no stable releases in between, although I suppose 
that might somehow be tied to the state of back ports to the previous versions. 
This isn't so much of a problem, as the highest number is probably always the 
latest devel.

As an observation, this "odd = unstable" scheme kind of makes sense even as 
applied to major versions 1.0, 1.0.6, 1.2, etc., as there are so many 
anticipated changes related to using gc:arc to make threading more usable that 
these versions could almost be considered semi pre-final non-breaking-change 
versions, at least if one wants to get away from using GC in Nim but doesn't 
want to drop down to fully unmanaged memory, as I really want to see the 
language able to do in a stable version.

I'm starting to feel that Nim can't be regarded as fully stable until it 
approaches version 2.0, whether that includes owned/borrowed/lent on references 
or not, as it seems that implementing gc:arc has been much more complex than 
anticipated, what with having to support so much legacy code.

I'm sorry that I haven't been able to help more with the \--gc:arc 
implementation, which is too complex for my limited computer science 
background: my background is enough that I can follow as fixes are made, but 
not enough to contribute effectively. I am preparing some tests that will fully 
tax gc:arc when I feel it is ready, but until then I have fallen back to mostly 
using unmanaged memory for threading, as that is adequate for most of the 
things I do to show that Nim is effective as a systems level language. However, 
for most of this there is no reason that the memory management can't use 
\--gc:arc reference counting, as the overhead will be minimal and the resulting 
code would be cleaner.


Re: Another state of generating Android APK thread...

2020-04-20 Thread GordonBGood
My [Quora answer using 
this](https://www.quora.com/Can-you-write-a-C-program-that-finds-all-prime-numbers-from-2-to-2-billion-in-under-1-second-on-an-average-500-PC/answer/W-Gordon-Goodsman)
 has been restored - enjoy!


Re: Another state of generating Android APK thread...

2020-04-09 Thread GordonBGood
@glukhov:

Thanks for the advice on compiling as a static library; looks like I should 
give it a try for these repos.

Also, thanks for the pointer on using ANDROID_HOME to find the location of the 
Android NDK. However, it isn't automatically set on my Windows machines with 
Android Studio installed; does it always have to be set manually?


Re: Another state of generating Android APK thread...

2020-04-08 Thread GordonBGood
@yglukhov: glad you liked it. In a couple of weeks when I'm unblocked from 
editing, I'll remove or change a few links and try my answer again, as it 
seemed to be good. It showed that the task didn't need C++ to do the job on an 
Intel i3-8100 or i3-9100, as the right algorithm (mine ;-) ) can do it in 
JavaScript, and if we use the same algorithm with a few extra tweaks made 
possible by an efficient native-code-producing language such as Nim, it can 
fulfill the task on the least expensive smartphone available.

I considered trying the approach you use in nimx (also, couldn't one use a 
dynamic library as easily as a static one?) for the reasons you give, but 
didn't do it because it seemed one would have to hard code the root location of 
the NDK in the make file. Or am I wrong? I might go back and try that.


Re: Another state of generating Android APK thread...

2020-04-07 Thread GordonBGood
So here is my version of a [Maximally Factorized Sieve of Eratosthenes 
Benchmark as an Android 
app](https://github.com/GordonBGood/SieveofEratosthenesBenchmark) using an 
algorithm which is at least as fast as [Kim Walisch's 
primesieve](https://github.com/kimwalisch/primesieve) on platforms on which 
they run in common up to the range of about 65 billion or so; it lacks the 
optimizations for more efficient sieving above this point in order to keep the 
code simpler and easier to understand. The required optimizations in order to 
extend the range to be the same are explained in the README file of the repo.

This work was done to show that one doesn't need to write code in C/C++ in 
order to be fast as per [a Quora question on prime 
sieving](https://www.quora.com/Can-you-write-a-C-program-that-finds-all-prime-numbers-from-2-to-2-billion-in-under-1-second-on-an-average-500-PC),
 but the Quora Moderators didn't accept my 
[answer](https://www.quora.com/Can-you-write-a-C-program-that-finds-all-prime-numbers-from-2-to-2-billion-in-under-1-second-on-an-average-500-PC/answer/W-Gordon-Goodsman),
 likely because it uses Nim rather than C++ and shows that it doesn't take a 
€500 PC to perform the task of sieving to two billion in under one second but 
that it can be done on the least expensive of currently available smartphones 
and not using C++.

You can try this app by [downloading the zip file from 
here](https://github.com/GordonBGood/SieveofEratosthenesBenchmark/blob/master/app/release/SoEBenchmarkAndroid.zip),
 extracting the APK file, and side loading the APK file to an Android 
smartphone. The results should look something like the following:



Re: Another state of generating Android APK thread...

2020-03-26 Thread GordonBGood
@yglukhov: Now that I'm stuck at home and have lots of time to work on this, 
I've made it work as follows:

Most of the things that made it not work before were things not complete in 
other examples such as @akavel's "hellomello" or @treeform's [GLFM 
project](https://forum.nim-lang.org/t/5197#32584), with people not really 
understanding the relationship between the Android GC and Nim's; many 
recommended not using Nim's GC but using \--gc:none or one of the new 
experimental ones. Doing that isn't really satisfactory, as one then can't 
easily use Nim's strings (or any other structure that uses heap data) and 
needs to do a lot of work generating cstrings and manually managing their 
memory to avoid leaks.

In fact, the official Nim documentation contains most of what is needed, with 
@araq seeming to have worked on this at one time, as follows:

  1. The need to call NimMain() before doing anything else is documented 
[here](https://nim-lang.org/docs/nimc.html#cross-compilation-for-android). I 
think the very slickest way to do this for a dynamically loaded library is to 
use the JNI_OnLoad event so that is what I've done and will continue to do.
  2. The above link recommends using Nim in compile-only mode to generate the 
required C or C++ files, implying that one then uses the Android Studio Gradle 
build scripts to do the rest; but that second step means we need to make 
"nimbase.h" available for Gradle to find, which is mentioned 
[here](https://nim-lang.org/docs/backends.html#backend-code-calling-nim-nim-invocation-example-from-c).



There are things that I haven't seen mentioned anywhere as follows:

  1. Gradle is usually used to produce different "flavors" of the required 
APK/Bundle files, containing APK's for the different supported ANDROID_ABI's 
(armeabi-v7a, arm64-v8a, x86, and x86_64 currently) as well as a "fat" APK 
containing all of them, from which the Android installation system can select 
the native libraries required. That is easy for C/C++, where primitive types 
always have the same number of bits no matter the platform, but Nim has 
different int/uint bitness depending on the platform and thus we need 
separate compilations for each CPU. I chose to use ndk-build Android.mk make 
files (I find them simpler to use) and adjusted them to account for the (four) 
different directories that must be used to cover this case.
  2. The above-mentioned Android.mk file needs to be adjusted to refer to 
LOCAL_LDLIBS in order to link in other Android capabilities such as logging.
  3. Theoretically, one could refer to the required "nimbase.h" file in place 
from the "Android.mk" file so one wouldn't need to make a copy into the code 
base, but I couldn't get that working nor did I see any forum posts where 
anyone has. If anyone can solve this, it would be appreciated!
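For what it's worth, the ndk-build setup described in points 1 to 3 might look 
roughly like the fragment below. This is only a sketch under my own assumptions, 
not a tested build file: the module name, the per-ABI "nimcache" directory 
layout, and the generated file name are all illustrative choices of mine.

```makefile
LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)
LOCAL_MODULE     := hellonim
# Nim-generated C file, compiled separately per ABI as per point 1
# (file name illustrative; paths are relative to LOCAL_PATH):
LOCAL_SRC_FILES  := nimcache/$(TARGET_ARCH_ABI)/main.c
# Directory holding the copied "nimbase.h" (point 3):
LOCAL_C_INCLUDES := $(LOCAL_PATH)/nimcache
# Link the Android logging library (point 2):
LOCAL_LDLIBS     := -llog
include $(BUILD_SHARED_LIBRARY)
```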



In my projects, I have included a copy of @yglukhov's jnim "jni_wrapper.nim" 
file as a convenient way to use all the boilerplate, and have tried to comply 
with all the licence requirements in doing so.

So my first working project was to translate Android's NDK sample "hello-jni" 
from C to Nim, which was successful, as per my repo 
<https://github.com/GordonBGood/NimHelloJNI>. This example is similar to 
@akavel's "hellomello" project but doesn't depend on `NativeActivity`; it does 
need Android Studio (and its contained JDK) in order to be used, but is more 
complete for the above reasons.

My second and more useful example was to translate Android's NDK sample 
"hello-jnicallback" from C to Nim, which was successful, as per my repo 
<https://github.com/GordonBGood/NimHelloJNICallback>; this includes all of the 
above but also shows how to move data back and forth between native Nim and the 
Android Java/Kotlin code via called methods, and includes a background thread 
as @yglukhov mentioned using in [one of the posts 
above](https://forum.nim-lang.org/t/6045#37435).

So I'm finally ready to write my Android benchmark app using what I have 
learned so far, and will post a link to the repo and comment on its development 
here when it is up and running, which I no longer have any doubts about being 
able to do. Although I have the Nim GC running, my actual multi-threaded code 
won't be using heap data, to avoid the problems and overhead of multiple heap 
spaces; thus I will give up on the use of channels and threadpools, which I 
don't regard as being stable enough for use until they work with the new 
\--gc:arc. They would make the code more elegant but won't make any difference 
to the speed of the benchmark.
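As a rough sketch of what I mean by avoiding GC'd heap data in the threaded 
part (the names and the trivial doubling work are purely illustrative, not my 
actual benchmark), one can pass plain value objects to raw `Thread`s and use 
`allocShared`/`deallocShared` for shared buffers, compiled with `--threads:on`:

```nim
type WorkerArg = object            # plain value type: no GC'd fields cross over
  data: ptr UncheckedArray[int]
  len: int

proc worker(arg: WorkerArg) {.thread.} =
  for i in 0 ..< arg.len:
    arg.data[i] = arg.data[i] * 2  # operate directly on the shared buffer

proc main =
  const n = 8
  let buf = cast[ptr UncheckedArray[int]](allocShared0(n * sizeof(int)))
  for i in 0 ..< n: buf[i] = i
  var th: Thread[WorkerArg]
  createThread(th, worker, WorkerArg(data: buf, len: n))
  joinThread(th)                   # no channels or threadpool needed
  echo buf[n - 1]                  # last element: 7 * 2 = 14
  deallocShared(buf)

main()
```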

So, to get on with it...


Re: Another state of generating Android APK thread...

2020-03-10 Thread GordonBGood
Hi @akavel:

I have quite a lot of time to spend on what I think are worthy projects, too, 
and find that easy-to-use tutorials tend to lead to advances along certain 
paths that otherwise tend to be ignored; I think one of those is the use of Nim 
for building Android apps.

> I would probably recommend to go with a path more or less like this...

Yes, that is generally my approach and I have done the first few steps already. 
My problems likely come at the stage of introducing Nim to generate the C/C++ 
code: I tried to do too much too soon by compiling for arm64-v8a before just 
doing a basic arm build - the APK compiled but wouldn't install even though my 
phone is 64-bit Android with an arm64-v8a SoC. I should have gone back a step 
and compiled for just arm to be sure that works, and I'll get to that.

> As to multi-threading, I have no slightest idea how it works with Nim and NDK 
> on Android...

I actually know quite a bit about multi-threading with Nim and also a fair 
amount about it with the JDK (which I assume should be about the same as for 
Android); also, I must assume that the basic Nim threading interfaces work with 
Android, but I'll have to test to verify that once I get the interfaces between 
native Nim and Java/Kotlin working.

> EDIT: One extra note: please bear in mind, that with the Android Studio path, 
> you'll never be able to fully escape writing some Java/Kotlin code, esp. when 
> using GUIs...

I'm not worried about writing the GUI glue code in Java/Kotlin; this isn't much 
different than what one has to do for a Web Assembly app running in a web page 
as the glue code needs to work through Javascript.

> Also, the memory model between Java's garbage collector and Nim's garbage 
> collector is somewhat tricky...

Yes, and I intend to avoid that for now by not using GC (\--gc:none) and manual 
memory management for this particular project at least until \--gc:arc gets 
more stable, as that would seem to be the path forward as per @araq's goals.

My general aims are to show that one can use the same Nim code base with very 
few changes to generate a web page by compiling directly to Javascript ("nim 
js"), compiling through emscripten to either asm.js or to web assembly, 
compiling to mobile apps as is the point of this thread, or obviously compiling 
to a desktop environment with a GUI of one's choice as would be commonly 
currently done.

> ...using Android Studio will let you include various JAR packages from the 
> Internet in your project, while the Dali path won't, at least for a very long 
> time.

Without knocking your project, that is why I've chosen this approach rather 
than your Dali approach.

Thanks for your suggestions and for your Dali project which got me started 
thinking of the possibilities of writing Android app's using Nim.


Re: Another state of generating Android APK thread...

2020-03-07 Thread GordonBGood
@yglukhov:

> I'm not into writing tutorials right now, but I can answer any concrete 
> questions.

I can appreciate that not everyone is into writing tutorials; thus I am 
proposing that I will write the tutorial so that others can refer to it from 
their documentation, including yourself, but I need some help getting it 
working so that I can document it...

Perhaps I need to start with a repo showing my problems so far in which I can 
produce an APK file, but can't install the file and you can quickly spot my 
problems and make suggestions on how I can improve the process.

> Jnim at this point allows "implementing" java classes in nim (still 
> experimental feature though).

It seems that many things in Nim are still "experimental", including the memory 
management model, but unless we provide some reasonably easy-to-use methods to 
use these features, they will never get beyond being "experimental".

> One important point to consider is how pure-nim you wanna go with the 
> toolchain. Surely we want to build apks without java, android sdk, gradle, 
> etc

The above would seem to be the @akavel/Dali approach, but apart from the 
problem with the restrictive license he chose, I see the same problems as you 
do with the required reverse engineering, etc. I also question what advantage a 
pure Nim toolchain would give us: we already have a usable NDK/JDK/Android 
Studio/Gradle toolchain, and the main problem is just that I don't want to be 
forced to write C/C++ code but want a way to have the output of the Nim 
compiler used instead. This is what I would like to document.

> Then comes the runtime question, in particular: on which android thread is 
> your nim "main" thread. Your mentioned use-case suggests nim main thread is 
> equal to activity UI thread, but any high-fps opengl/vk app would likely 
> prefer a secondary thread ... controlling this threading aspect from nim 
> still remains unsolved afaik, partly because of nim threading nuances.

Am I right that if Nim procedures are just called from Java/Kotlin through JNI 
that they will then just run on the activity thread from which they are called? 
If this is true, the threading control would be from the JDK end and would only 
be a problem if inter thread communication were needed at the Nim end, which 
isn't always required.

Also, it seems that Nim multi-threading on Android probably already works, and 
the main problems are due to Nim's legacy GC'ed memory management. For simple 
app's, manual C-style memory management is likely enough, and for the future, 
\--gc:arc or \--newruntime will likely work (as per @araq's aims) and provide 
what is needed.

My main drive for trying to do this at this time is to be able to demonstrate 
the speed of a multi-threading algorithm on Android. The algorithm can actually 
be written to use manual memory management (although a working \--gc:arc, with 
channels and spawn'ing that worked with it, would make the code more beautiful 
and minimize extra "glue" code); however, in order to easily change test 
parameters, it would be good to have a user interface, and an easy way to do 
that would seem to be as proposed.

I guess if the above is too difficult to get working, I could take the 
alternate approach of using a working example of SDL2 with a working interface 
and just patch in the algorithm I am trying to show working, but it would seem 
that SDL2 would be more complex and more suited to writing native games than to 
what I am trying to do here.


Another state of generating Android APK thread...

2020-03-06 Thread GordonBGood
# Another state of generating Android APK thread

## Synopsis

The capability of easily compiling Nim to generate an Android installable APK 
file via the Android Native Development Kit (NDK) as per the title of the 
thread seems to be a subject of on-going interest, including by me.

## Past references and their limitations

The following links contain some relevant information, but none seem to be 
complete as per the comments:

  1. [The Dali link by @akavel](https://forum.nim-lang.org/t/4840#30334) 
contains some useful runnable examples, but unlike it, I am not against using a 
Java Development Kit (JDK) or Android Studio.
  2. [Old forum thread](https://forum.nim-lang.org/t/1866#11614) in which 
@yglukhov gives a link to an nmake file in his nimx project that is said to 
help generate an APK file through the NDK.
  3. [A newer forum thread](https://forum.nim-lang.org/t/3575#22318) discussing 
the same but still with no specific examples.



The main problem with all of these previous inquiries or work is that none 
other than the "Dali" thread have specific simple runnable examples and none 
include having the native code receive input and output to a user interface 
that would preferably be implemented through a standard XML layout (via JNI?).

## What I would like to see

A step by step procedure or link to a GIT project that does the same as my 
posts showing how to generate a simple Javascript Single (web) Page Application 
using only Nim-generated Javascript and/or Web Assembly as per [my Javascript 
post here](https://forum.nim-lang.org/t/5296#33267) and [my Web Assembly post 
here](https://forum.nim-lang.org/t/5296#4).

## Useful libraries that might be helpful

  1. [jnim library](https://github.com/yglukhov/jnim), this contains some 
useful functionality for addressing the interface between Nim and the Java 
Native Interface (JNI), but there are no examples of its use other than as 
mentioned in the tests.
  2. other libraries that get referenced here that I'll edit in here...,



## Conclusion

A simple runnable example showing how to generate an APK which is an Android 
app with a simple button layout that takes input from an editable field and 
outputs to another text field when the "go" button is pressed but using Nim 
code through the NDK for the business logic would seem to be useful to anyone 
trying to accomplish this (including me).

## This query is particularly directed to the following forum members:

@yglukhov, @akavel


Re: undeclared identifier: 'PGenericSeq' when using '--seqsv2:on'

2019-11-06 Thread GordonBGood
@drkameleon: "\--seqsv2" is a "future feature", a recent development that has 
come about because of experimental work done with "\--gc:destructors" and 
"\--newruntime" (and \--gc:hooks, which is a special adaptation of 
\--gc:destructors), for which it is a hard-coded prerequisite; it is intended 
that it eventually be back-ported for use across all current GC's. In fact, it 
changes the implementation of both `seq`'s and `string`'s, not just `seq`'s.

You'll find that it doesn't currently work for any of the other GC's or even 
--gc:none with the same kind of compiler error.

Hopefully, you'll find that it will work for something like Nim version 1.1 or 
1.2, and it is intended to eventually be the default implementation for `seq`'s 
and `string`'s for everything.

For normal use (when it works), you shouldn't see any difference from the 
current "legacy" `seq`'s and `string`'s, as the API is intended to be exactly 
the same. The reason for the proposed change is that the implementation inside 
the compiler and runtime libraries is cleaner, uses less memory, and should 
hopefully be faster.

As well, do note that the "\--gc:destructors" and "\--newruntime" compiler 
command line switches you'll find mentioned are in development and don't work 
entirely properly yet either. Destructors is likely to work before Newruntime, 
but these are development explorations that will likely work in some form in 
future versions.

Just to shortcut your frustration in your experimentation, "\--gc:go" doesn't 
currently work either as there is a dependency on a Go language library file 
that currently isn't shipped with the Nim installation files for both Linux and 
Windows (don't know about MacOS). In case you see it mentioned, "\--gc:v2" has 
never been fully developed and tested and is likely to be eliminated in future 
versions. If you are trying other GC's and want one that works across threads, 
"\--gc:boehm" does work pretty well.

You should know that since you are running a devel branch version of Nim, there 
will be many features other than as mentioned that are experimental even though 
they aren't documented anywhere as such.


Re: Why is Seq.reverse() not part of the standard lib?

2019-11-04 Thread GordonBGood
@kayle, possibly reversing `seq`'s is not implemented in any of the current 
standard libraries because it is so easy to do writing it yourself.

> You have, map, filter, reduce, etc. but not reverse?

These are a little trickier to implement than a simple reversal.

I recently needed to reverse a result string in place, so investigated the best 
way to do it. I ended up using the first method from [rosetta code reversing a 
string](https://rosettacode.org/wiki/Reverse_a_string#Nim) in the first 
example, which is about as efficient as anything when one already has a 
"result" as a mutable (var) string. If one wanted a `proc` returning a 
reversal, the second example is just about as simple although requiring an 
allocation.

`string`'s and `seq`'s are really the same thing except that strings only 
contain `char`'s; thus one could quickly write the following generic `proc`'s 
to reverse `seq`'s, either in-place as for the first `proc` or as a new 
version, as in the second: 


proc reverse[T](s: var seq[T]) =
  for i in 0 .. (s.high - 1) shr 1: swap(s[i], s[s.high - i])

proc reversed[T](s: seq[T]): seq[T] =
  result = newSeq[T](s.len)
  for i in 0 .. s.high: result[s.high - i] = s[i]

var s = @[ 1, 2, 3, 4, 5 ]; s.echo # original

s.reverse; s.echo # reversed in place

let ns = s.reversed; ns.echo # new copy reversed again



which outputs: 


@[1, 2, 3, 4, 5]
@[5, 4, 3, 2, 1]
@[1, 2, 3, 4, 5]



and which you can play with [on this Nim 
Playground](https://play.nim-lang.org/#ix=20QU).

Hardly seems worth importing a library for one of these and you could possibly 
even inline them to avoid the `proc` call!


Re: Borrow Checker In Nim?

2019-11-04 Thread GordonBGood
@MrZoraman, Although there was some discussion on this forum in the past about 
making Nim a little safer as to controlling mutation, I don't think that it was 
@Araq who instigated the suggestions but rather others such as myself who 
thought it might be a good option to have so as to compete with languages that 
are regarded as "safe" such as Rust.

However, neither @Araq nor most of us here that suggested such an option want a 
"borrow checker" with all the syntax and rules such as Rust, and I don't think 
there is any way @Araq would approve such a change: he already laments the 
requirement of "newruntime" to have added "owned" key words at particular spots 
and regards that as one of the hold-ups to that concept, even if it can be made 
to work and accomplish its goals.

The form of "borrow checker" that I suggested would be built on top of 
"newruntime" (when it works) and, as suggested, wouldn't require any more 
syntax changes than what are required to support it.

That said, we are up to version 1.0 and therefore there is a backward 
compatibility guarantee which would mean that any such additional features that 
would break current Nim would have to be optional so all current code would 
continue to compile and run.

That said, Nim already has an almost invisible form of "borrow checker" in its 
Data Flow Analysis (DFA) of the lifetimes of things that can be accessed, 
erroring out when it sees that something would be accessed after its lifetime. 
That is one of the essential parts of what a "borrow checker" does, the other 
being to prevent mutation of things by more than one "controller" at the same 
time, which is what my proposal for "newruntime" would attempt to extend it to 
do with (almost?) no cost in extra syntax.

Finally, both "\--gc:destructors" and "\--newruntime" are features under 
development to check the feasibility of such schemes, with neither fully 
working nor any definite time frame for when either may be completed and 
accepted as mainstream Nim.

For the above reasons, I wouldn't let rumors and FUD statements by others 
prevent you from exploring and using Nim in its current state of development, 
which is a very productive and useful computing language. The time "invested" 
in learning and using any computing language is almost always a net win even if 
you never use it for work related projects, as you advance as a programmer in 
learning other points of view.

For myself, if Nim ever adopts the form of "borrow checker" that Rust uses 
without options to not use it, I'm "out of here", as one of the reasons I am a 
fan of Nim as compared to Rust is its ease of learning and use. I think many 
here share my view (and yours).


Re: Current status of Nim for Webassembly?

2019-10-26 Thread GordonBGood
@bluenote: Just to augment what @yglukhov has said:

> Does the EMSCRIPTEN_KEEPALIVE play a role in this?

All this flag does is prevent the clang C/C++ compiler from 
dead-code-eliminating the native wasm `proc`, which otherwise wouldn't have any 
references if it is only called from the JavaScript side.
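From the Nim side, my understanding is that one can get this attribute onto the 
generated C function with the `codegenDecl` pragma. A minimal sketch, assuming 
an emscripten target where `EMSCRIPTEN_KEEPALIVE` is brought into scope via 
emscripten.h (the proc name is just an example of mine):

```nim
# Sketch: export a proc into the wasm module and keep it past dead-code
# elimination even though only JavaScript will reference it.
{.emit: """/*INCLUDESECTION*/ #include <emscripten.h>""".}

proc addTwo(a, b: cint): cint {.exportc,
    codegenDecl: "EMSCRIPTEN_KEEPALIVE $# $#$#".} =
  a + b   # would then be callable from JS as, e.g., Module._addTwo(1, 2)
```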

As to GC issues, through the "emscripten magic" it seems the Nim GC is able to 
treat the provided memory as its heap so that GC is made to appear normal to 
the Nim code; the problem is when one tries to FFI share memory between Nim and 
JavaScript. The emscripten documentation is actually quite good at describing 
the standard things that can be done to allow this, but of course because c/c++ 
doesn't have a GC, it must be extended a bit when a Nim GC is active.

Or at least that's as I understand it and it seems to work...


Re: Nim v1.1 and beyond roadmap

2019-10-24 Thread GordonBGood
@Araq: Thanks for the further data on `async` using ref counting GC - I didn't 
realize it only worked without the cycle detector and with a further "tiny" 
code gen patch.

> So we need a way to duplicate owned refs that still guarantees cycle freedom, 
> statically. I know that's what you're proposing via a `=deepCopy` for 
> `owned`...

Perhaps we should stop calling what I proposed `deepCopy` via `=deepCopy`, as I 
just grabbed that term as being the location where "outer" ref counting would 
be injected. Perhaps that would be the "hook" whereby my proposal could best be 
implemented, maybe not...

> but I don't know if it can work out.

 **So how to test this and what code base to test it against?** I am probably 
able to do some patches against the current devel branch if they don't include 
code gen/"compiler magic" but I would prefer to leave any AST manipulations to 
you and your dev team as I have little experience in that area.

I have held off doing too much in the way of experiments as I would like to 
include "\--seqsv2", but I suppose that isn't really necessary as it is 
implicitly turned on by "\--newruntime". I guess I'll start with some copying 
experiments somewhere, combining use of copying `seq`'s, `string`'s, closures 
containing `ref`'s, and `ref`'s. It would seem to be easy to test if I just 
knew how to "hook" into the disposal of `ref`'s at the end of `proc`'s in the 
way that "nominal" types get the "hooks" called now.

**I would like to work within the framework of the ``=destroy``, ``=``, and 
``=sink`` "hooks" but as applied to ``ref``'s and not just to "nominal" types. 
Would this be possible?**

**Is there a place in the code that could make these "hooks" readily accessible 
to me?**


Re: Current status of Nim for Webassembly?

2019-10-24 Thread GordonBGood
@bluenote:

> Does compiling via emscripten and loading the resulting JS glue code satisfy 
> this criterion or not?

It seems to work as to satisfying this criterion as [in the code I posted 
here](https://forum.nim-lang.org/t/5296#4).

The biggest problem in coding for WebAssembly use is handling data that needs 
to cross over between WebAssembly and JavaScript, but my experiments with that 
code seem to indicate it's possible through the emscripten facilities even 
though it may be messy.

Yes, you have to consider GC as JavaScript has its own and the WebAssembly code 
may also have its own if you use Nim's GC, but it doesn't seem too bad, at 
least until you try to use WebAssembly multi-threaded.


Re: Nim v1.1 and beyond roadmap

2019-10-24 Thread GordonBGood
@Araq:

> It's important to see the full picture, most of what Nim's `async` does is 
> probably simply hidden from your view as F# delegates the hard work to C#'s 
> event loop implementation whereas in Nim there is little in between `async` 
> and the underlying event loop. IMHO that's why you think is "bulky".

 **You are probably right**; although I have looked at how F#'s computational 
expressions are implemented (investigating why `seq` enumerations are so slow), 
I've never looked under the covers at the DotNet code used to actually 
implement them, though that is now open source and available in DotNet Core. I 
do know that F# calls into DotNet's thread pool dispatcher when it needs 
threads for any sort of multi-threading, including where threading is required 
for `async`/`await`, so it isn't so much calling into C# facilities as into the 
general facilities available in DotNet.

**As stated, it's beside the point of the problem** of implementing either a 
GC that can support Nim's `async` in whatever it does, or making it possible to 
use "newruntime" to do it. As you say, any working tracing shared-memory GC 
such as Boehm, Go, or your new "araqsgc" can be made to do the job. **My point 
is that it doesn't look to be too hard to make "newruntime" support a hybrid of 
reference counting and B/D to also do the job** without requiring changes to 
the current `async`.

**My idea of reasonably easily adding a combination of the different concepts 
to "newruntime" would seem to make it work.** It combines the currently 
almost-working `owned ref`'s with reference counting where multiple ownership 
of the same data is required (especially in multi-threading), and with 
`deepDestroy`/`deepDispose` that I think may be able to accommodate cyclic data 
if desired, so as to seemingly fill the bill. I've now backed off from trying 
to do this through "\--gc:destructors" hooks and now think that these ideas 
would just be part of "newruntime", **with the main extra requirement that 
there would be a little bit of extra code required to handle the two levels of 
destruction as in B/D and the shared-memory reference count**.

I've backed off from tying into the old `threadpool`-style `deepCopy`, which 
has already been removed for "newruntime"; my point there is that one would 
inject the shared-memory ref-counted copies at the same places where those 
`deepCopy`'s used to be injected, but they shouldn't need all the "compiler 
magic and cruft" given the new destructor-based versions of `seq`'s/`string`'s 
(and closures?), perhaps with the currently forbidden copy of an `owned ref` 
becoming possible through reference counting.

You've mentioned adding some extra fields such as task ID's to `async` to make 
it work with "newruntime"; my idea here is to make "newruntime" work so that 
changes aren't required to `async`, so that the accommodations would be made in 
one place. **My thinking is that if field(s) need to be added** to `async` 
tasks, fields that likely need to be atomic in order to make it work with 
"newruntime", **it would be better to first investigate just adding atomic ref 
counting to B/D in "newruntime"** in such a way that it would be used about as 
often as those extra task fields, in order to narrow and thus simplify the 
things that need to be changed and tested.

**I am trying to think of a way to easily test my ideas in the current code 
base**, as dropping back to using "newruntime" would seem to eliminate the 
requirement for "compiler magic" that isn't already there and would only 
require changes within the handling of `owned ref`. The primary areas that need 
to be tested are how the "big three" of `seq`'s, `string`'s, and closure 
environments, as well as `ref`'s, cross the shared memory barrier. You have 
mentioned back-porting `deepDispose` and `dispose` across all the memory 
management, so that part will have to be done anyway.

**You've mentioned a "race" between "araqsgc" and "newruntime"**, with 
whichever supports `async` first winning. Although I see that the mi-malloc 
allocator may be a desirable facility to have available as an option in Nim, I 
see that "araqsgc" is just another mark-and-sweep GC, and **I would really like 
to see "newruntime" win, as it is unique and more elegant.**


Re: Nim v1.1 and beyond roadmap

2019-10-23 Thread GordonBGood
@Araq:

> I haven't tried to use `async` in Nim yet because I haven't had a need for 
> it, but I am very familiar with the concept from F#, where it works very well 
> (but which also has a tracing GC).

I've now had a quick look at those libraries from Nim. I'm not really in a 
position to comment meaningfully as I haven't tried to use them, but my first 
impression is that they are quite "bulky" as compared to the F# `async`/`await` 
implementations with which I am familiar.

However, since we are now at version 1.0+ and these don't seem to be marked as 
experimental libraries, I don't know that changes to make it appear as clean as 
F# are possible without breaking the API's. Part of why F#'s implementation is 
so clean for the user is that `async` is implemented as a computational 
expression, which has elements of Haskell's monads; the fact that computational 
expressions are slow doesn't much matter here, since `async` is more of a 
"coarse grained" use - unlike sequences (the F# `seq`), where implementing 
enumeration with computational expressions makes it extremely slow. However, 
`async`/`await` has been crossed over to C# (maybe partly by "compiler magic", 
as I don't believe it has computational expressions) and TypeScript, and 
adopted by JavaScript/ECMAScript, so it seems to be possible to migrate.

I suppose that Nim's "asyncmacro" library could be considered to be the 
"compiler magic" applied to make it possible in Nim and with that in mind, the 
implementation doesn't seem that bad, but somehow we seem to have lost some of 
the F# elegance in being able to do something like the following: 


let fetchAsync (name, url: string) =
    async {
        try
            let uri = new System.Uri(url)
            let webClient = new WebClient()
            let! html = webClient.AsyncDownloadString(uri)
            printfn "Read %d characters for %s" html.Length name
        with
        | ex -> printfn "%s" (ex.Message)
    }



But as I said at the start, I haven't used the Nim `async` libraries enough to 
be able to give a valid opinion.


Re: Nim v1.1 and beyond roadmap

2019-10-23 Thread GordonBGood
@Araq:

> Please don't repeat my mistake, no need to implement it when you can already 
> talk about it. ;-)

 **I really do like to back up my "talking" with little "mini-experiments" in 
code to prove and develop the concepts**, but I'll have to think more about how 
to do it simply, as otherwise it needs compiler help. Meanwhile...

I'll just throw the idea out and hopefully it isn't too laughable, as follows:

  1. We seem to be agreed that we would prefer not to just implement Yet 
Another Gar(bage) Collector (YAGarC), as we already have an excess of usable 
ones in Boehm, GoGC, and now "araqsgc", which I'm sure can be made to work. All 
of them trade off computational overhead against latency against throughput, 
and we've grown to not really like any of them, plus any of them can be chosen 
by those who truly love GC's.
  2. This has nothing to do with allocators, of which any of the Nim native 
one, malloc, mi-malloc, or some other newly invented one can be used as 
appropriate; they all do the same job of providing heap memory when asked for 
it and freeing it when desired, along with a couple of "tweaks" like resizing 
and reallocating. They just again provide various trade-offs, with the new 
"mi-malloc" seemingly a very good one.
  3. The idea of "hooks" strikes my fancy, and in particular the `newObjHook` 
that when asked for memory returns a pointer to a free area of the requested 
size, but is well able to have an "extra" attached structure "head" containing 
anything we desire and do anything else on the side at the same time such as 
creating and using pointer structures, tables, etc.
  4. I don't particularly like the `traverseObjHook`, even when it works, as it 
would only seem to be of use in implementing GC's, as in requesting a Mark 
phase or a Sweep phase. Remember, we don't really want YAGarC!
  5. We both really like the current destructor/copy/move semantics, which also 
is what really helped make the `owned ref` B/D reasonably easy to implement; 
however, although I still like it and think B/D can be combined with this idea 
in order to optimize non-shared ownership, this idea (kind of) extends it to 
efficiently "cloning" ownership.
  6. So I propose a new `destroyObjHook` (instead of `traverseObjHook`, or as 
well as) that gets called only for `ref`'s when they go out of scope or at the 
end of `proc`'s, just as destructors get called for "nominal" types. If the 
compiler can figure out when it's time to call `=destroy` on those types (or 
inject a `=destroy` when there isn't one), it should be able to easily do this 
just for the `ref` type. But wait for it, this isn't all...
  7. If `ref`'s are destroyed at the end of scope or `proc` when they haven't 
been "sink'ed" away, how do we create clones so we have multiple owners of the 
data? Answer: we (oh no, that dreaded word) `deepCopy` them. But wait, 
`deepCopy` in its final form (not the `channels` version, which is terrible) 
actually wasn't so bad, and this isn't the `deepCopy` where everything gets 
duplicated; this is the overridable `=deepCopy` that can be applied to `ref T` 
and is always overridden to just do a ref count! In a way, it means that 
copying of owned refs is no longer forbidden; a copy just becomes the same 
owned ref with a higher separate (permanent even for production) ref count.
  8. Now, the destroy called through the hook only destroys if there is no ref 
count and otherwise just decrements the ref count, and thus the object will 
only be destroyed once.
  9. Oh no, you say, that sounds a lot like atomic ref counting, and that isn't 
particularly fast. No problem, I say, as this "cloning" only happens when one 
needs multiple owners, such as across threads, which is (or should be) quite 
"coarse-grained" anyway. For other uses, this can be combined with such ideas 
as B/D if performance turns out to be a problem, for almost no overhead (in 
production, with that part of the "inner" ref counting turned off).
  10. Ah, you say, but what about data cycles? Well, I say, maybe the "hooks" 
can support cycle detection analysis as some sort of deferred action as part of 
the "semi GC" role through the hooks; the proposed options of B/D or "araqsgc" 
don't really handle that very well yet either.
  11. But wouldn't this be hard to implement, I hear the (dev) crowd groan. I 
reply: it seems that you have implemented most of it already, and this just 
corrects it. Ugly `deepCopy` (as implemented for `threadpool`) can be applied 
in exactly the same places, except it becomes beautiful, as it just makes the 
data structures (including closures, which no longer have to be prohibited) 
able to be shared. REALLY REALLY UGLY `deepCopy` as implemented in `channels` 
gets changed to the same `system.deepCopy` (or `system.clone` if you prefer).
  12. `async` becomes possible and beautiful and all is sweetness and light.
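Points 6 through 9 can be emulated today, since destructor/copy hooks already attach to nominal types; the following is a rough, hypothetical sketch of the proposed semantics with invented names (`Rc`, `newRc`, and `clone` are not an existing Nim API), wrapping the payload in an object so the hooks fire:

```nim
type
  Rc*[T] = object
    cnt: ptr int  # shared count; would need to be atomic for cross-thread use
    data: ptr T

proc `=destroy`*[T](rc: var Rc[T]) =
  # the proposed `destroyObjHook` behavior: decrement, free only at zero
  if rc.cnt != nil:
    dec rc.cnt[]
    if rc.cnt[] == 0:
      `=destroy`(rc.data[])
      dealloc(rc.data)
      dealloc(rc.cnt)
    rc.cnt = nil
    rc.data = nil

proc `=`*[T](dst: var Rc[T]; src: Rc[T]) =
  # the proposed `=deepCopy`-overridden-to-just-ref-count behavior
  if src.cnt != nil: inc src.cnt[]
  `=destroy`(dst)
  dst.cnt = src.cnt
  dst.data = src.data

proc newRc*[T](x: sink T): Rc[T] =
  result.cnt = cast[ptr int](alloc0(sizeof(int)))
  result.cnt[] = 1
  result.data = cast[ptr T](alloc0(sizeof(T)))
  result.data[] = x

proc clone*[T](rc: Rc[T]): Rc[T] =
  result = rc  # copying just becomes "the same owned ref, higher count"
```

With this, copying the wrapper is effectively the "clone" of point 7, and destruction only really happens once, as in point 8.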



**There, I'm done talking except for discussion. Does it merit consideration, 
investigation, and implementation?**

Re: Nim v1.1 and beyond roadmap

2019-10-23 Thread GordonBGood
@Araq:

> As I said "araqsgc" is a hybrid, it will work with async too.

I can see how "araqsgc" is intended to be a hybrid; **what I fail to see in the 
code or in testing is how to make the ``traverseObjHook`` fire** so I can test 
it without calling `dispose`, `deepDispose`, or `collect`, which would be the 
other manual part of the "hybrid".

But also, being a hybrid, **it also has all the disadvantages of mark-and-sweep 
GCs in nondeterministic destruction, execution overheads, etc.**, even if the 
overheads have been reduced in this simplification.

> However, to me it's an open question of whether `async` intrinsically really 
> needs it (atomic ref counting) or the current `async` implementations need it 
> or neither...

I haven't looked at the code for `async` yet; I guess it's time I did. Maybe 
it's good that I am coming from outside the dev team, as all of you have been 
buried in the current implementations and problems for so many years that maybe 
there is something else so simple that it has been missed. I've got something 
cooking that I think might be possible within the current general code base 
framework. It should be compatible with `owned`, although having `owned` 
shouldn't be necessary to make it work, given how clever the current Nim 
compiler is at the data flow analysis built around the AST; another good thing 
about it is that it isn't a tracing GC, but I think it should be able to 
support some implementation of `async`.

Give me a few hours to let it stew and to try to test some of the ideas within 
the current version (now up to version 1.0.2 I see - unannounced?) and I'll 
present it here for your opinion. I may not be able to easily test it within 
the current code as I need to be able to use some sort of "hooks" to tie into 
the handling of `ref`'s, so I may need to emulate that with some use of 
destructors plus some manual code.


Re: Nim v1.1 and beyond roadmap

2019-10-22 Thread GordonBGood
@Araq:

> Much more problematic to me is that neither `owned` nor `dispose` can handle 
> `async`'s complexities which is the most important requirement for us. If we 
> cannot solve this problem, using the tracing GC aspect just for this would be 
> valid option but then how can move the task/continuation to a different 
> thread...

 **Wow, this is a shocker; I thought that "araqsgc" was well on the way to 
providing a solution for** `async` **, but now you are saying that it can't - 
because it isn't a "tracing" GC?**

I haven't tried to use `async` in Nim yet because I haven't had a need for it, 
but I am very familiar with the concept from F#, where it works very well (but 
which also has a tracing GC).

Questions, as follows:

  1. I assume that `async` can be made to work with the current default GC, as 
that must be how it passes testing, but only with a lot of "cruft" and compiler 
magic, which isn't really acceptable?
  2. Does the various "Back to Boehm" recommendations you've made in the 
various threads in the forum mean that it passes as a multi-threading tracing 
GC?
  3. Does "\--gc:go", which I assume also supports multi-threaded tracing GC 
since it is used by Go itself, also pass the requirements?
  4. If the Go GC passes the requirements, would it be fair to say that one 
would use Boehm for higher throughput at the cost of higher latency, where the 
Go GC is lower latency at the cost of throughput?
  5. If used with the current version 1.0.2 code base, both the Boehm and Go 
GCs are saddled with all the "cruft" required to support non-multi-threading 
GC; would it be possible to make them usable through the "GC hooks" available 
from "\--gc:destructors"/"\--gc:noCruftHooks" to avoid that "cruft" if it isn't 
necessary?
  6. Is "araqsgc" not a tracing GC? I assumed that it can be given the `trace` 
`proc`, but only if the `traverseObjHook` calls it when required; doesn't it do 
that? When do calls to `traverseObjHook` get injected, if ever (in testing, I 
could never get it to fire)?
  7. Can the "GC hooks" be modified to support a full atomic reference counting 
model, given the thinking that multi-threading should really be done at quite a 
"coarse-grained" level of no context switch in less than about a millisecond so 
the execution overhead wouldn't be a problem?



AFAIK, the only working non-tracing GC implementations that support `async` are 
based on atomic reference counting as in Rust and Swift (each of which may have 
other problems, as we have discussed).

It seems to me that "newruntime" (much as we love the beauty of the idea) can 
never support `async` in the general case, because it can never be certain that 
there never need to be multiple owners of the same data (namely the closures 
required to cleanly supply the continuations/callbacks), and would thus still 
need the UGLINESS of "deepCopy" to produce `owned` data/closures with a 
different owner. Rust's ownership model ran into the same problems; their fix, 
as applied to general data, is to use atomic reference counted structures 
manually applied when required; however, they still have problems when applying 
all the ideas to single-use closures.

> Oh and to answer your question, yes, the "hooks" currently do no(t) allow for 
> a pure refcounting GC.

In looking at the code for "gc_hooks.nim", I see no provision for external 
libraries to register global or thread local pointers to mark/sweep, as the 
`nimRegisterGlobalMarker` and `nimRegisterThreadLocalMarker` `proc`'s are not 
made public and are therefore orphan `proc`'s. Even if they were made public, 
as long as the `traverseObjHook` is never fired, there is no way to cause a 
mark/sweep automatically anyway. If these observations are true, then "araqsgc" 
is nothing more than another way to manually manage destruction with manually 
called `dispose` and `deepDispose` plus a different (likely better than the 
default or malloc) allocator, and as you say, that is never going to work with 
`async`.

So my major question is " **Can we modify the hooks to get an automatic 
deferred multi-threaded reference counted mark and sweep working?** ".

The only alternative would seem to be an automatic non-deferred reference 
counting, but as long as we can only attach destructors/assignment/moves to 
"nominal" types (ie. objects) then we can't implement this through the "GC 
hooks".


Re: Help me see what's wrong with this

2019-10-22 Thread GordonBGood
bbxiong1999: I think the following working code is what you were trying to do:


# example1:

let t1 = "Hello"

var
  t2 = t1 # note that this must be indented in order to use the `var` section
  t3: pointer = addr(t2) # note that you need a new line and alignment

# I think the following is what you meant to do...
echo repr(cast[ptr string](t3)[])
# must cast to "ptr string", then dereference it to print it!
# written more idiomatically in Nim as "cast[ptr string](t3)[].repr.echo"

# example2:

from dynlib import loadLib # must import the function from the library to use it

let ws2 = loadLib("ws2.dll"); echo repr(ws2) # note the argument to `loadLib` is a string
# note the `;` needed to separate statements on one line, and you can't
# print a "handle", which is what `loadLib` returns, although you can
# print a `repr` of it!



On an online compiler (Linux, not that it matters), this produces the following 
output: 


0x7f8cd2672080"Hello"
nil



Conclusion:

I don't know what languages you know after which you were modelling your code, 
but you need to spend some more serious time with the tutorials and modifying 
working examples before you are ready to write your own code. In these very 
simple examples you made the following glaring errors:

  1. You didn't realize that you need to separate statements on the same line 
with semi-colons `;`.
  2. You don't have any idea about indentation in blocks or type sections.
  3. You seem to have little idea about static typing and how it's used.
  4. You seem to be trying to run this as a Nim script file (file.nims), which 
has its own special limits; I would do some "normal" Nim coding before trying 
scripting.
  5. Nim is not Python, although what you wrote wouldn't work in Python either.



The reason for the VM error messages is that it is the VM that runs scripts. 
Using nim as a compiler will produce better error messages.

The reason that the last printout is `nil` is that on the machine on which I 
ran the code, there was no file in the same directory named "ws2.dll" and this 
is the response when a library can't be loaded for the given file name.

As I said, you need to spend a lot more time modifying and understanding 
working examples. I don't think this forum is intended as a teaching aid for 
someone who clearly doesn't yet know how to code.


Re: Nim v1.1 and beyond roadmap

2019-10-21 Thread GordonBGood
@Araq:

Thanks for chipping in and providing the details that I missed. However, it 
leads to a couple of questions:

> A simple mark GC has the very nice property of not adding any overhead 
> to pointer assignments. It's fundamentally compatible with manual memory 
> management,..

 **Which of the current GC options are "simple mark & sweep"?** Can both the 
default and "\--gc:markAndSweep" be used this way?

> `dispose` and `deepDispose` operations ... are currently being backported to 
> the other GCs)...

Wow, **that's along the lines of something I proposed earlier as 
"deepDestroy"**, which you seemed to find crazy at that time :-D

> you can encapsulate your data structure in a refcounted wrapper like 
> Refcount[JsonNode]...

Even at a "macro" level, this also seems to be a major reversion to what we 
were looking at before...

> Having said that, classic B/D with owned still looks more elegant... ;-)

 **You've converted me to that way of thinking; unfortunately, we are still 
not able to fully explore it due to an incomplete implementation still fighting 
some UGLY legacy code.** For instance, it can't work with the `channels` 
library because of the "compiler magic" deep copying of messages by that 
library; even worse, the library doesn't even use the `system.deepCopy` that 
can be overridden by a `=deepCopy`, but implements its own "cruft". Also, the 
`threadpool` library is still full of eventually unnecessary problems to work 
around, such as the types that are forbidden to be passed through a `FlowVar` 
(especially closures), all there because it wasn't considered that there would 
be better solutions for passing heap allocated data across threads than what 
the default GC(s) allowed. Even the `threadpool` implementation of `spawn`, 
which requires a non-closure function with restricted arguments, requires 
compiler magic and "cruft" due to this, rather than just simply being passed a 
"thunk"/closure, had the considerations of passing single threaded GC'ed 
`ref`'s not been locked in.

"When will we see the last of `deepCopy`?"

"The great power of being able to manipulate the AST in infinite ways leads to 
infinite ways to abuse that power."...

One final question that I think I have the answer to but may be wrong: **Would 
it be possible to implement a reference counted memory management system based 
on destructors using the "\--gc:destructors"/"\--gc:nocrufthook" GC "hooks"?** 
**I don't see that it can**, as although one can produce such a thing through 
the `newObjHook`, it would seem to be immediately destroyed, because 
destructors are based on "nominal types", which means that one has to have an 
object to keep the structure alive, whereas **`newObj` just produces a 
pointer**. I see that if one made this work, one would just have produced 
another form of GC, as in a deferred reference count like the ones we have 
already been provided.


Re: Can someone help me fill in missing info about Nim's 8 GC models?

2019-10-21 Thread GordonBGood
@Araq:

> I'm sorry but I really want something that works before announcing it and 
> documentation follows after a working implementation...

I'm sure **everyone appreciates that**, and it's obvious that we don't want to 
waste too much time on the documentation until things stabilize.

That said, **there are experiments and failed experiments in the code base that 
are confusing for any programmers getting into Nim well after they were 
considered**. For instance, of the actual ten methods of GC (eight plus 
"\--gc:stack", the same as "\--gc:regions", and "\--gc:none") as per the OP 
here, which doesn't include "\--newruntime" (another form of no GC), at least 
two options ("\--gc:generational" and "\--gc:v2") have been dropped/deprecated, 
lead nowhere, and there are no plans to advance them. It's true that one needs 
to read the source code to actually find the dropped ones, as they no longer 
even show up in "nim --fullhelp". However, for the others, there doesn't seem 
to be much explanation, even in the "garbage collection" document, which seems 
to be very out-of-date.

**When it comes to GC, Nim is getting to the state of too many options to try 
for someone new to Nim, without any documentation on the advantages and 
disadvantages of each.** For instance, there isn't any documentation that says 
that, of the older no-longer-advancing methods, both the Boehm and Go (I 
think?) GC options support GC across threads; what the Regions GC does that 
makes it different; or that MarkAndSweep is the more tunable GC as to real time 
latencies compared to the default deferred RC (I think?). AFAICT, all of the 
above methods are stable, have been available for years, and are unlikely to 
change, especially in light of evolving "better" ways such as the new 
"Destructors" methods of "araqsgc" or "newruntime".

**There can also be the issue of somewhat confusing names/switches**: 
"\--gc:destructors" originally seems to have been conceived as a way of 
providing some sort of memory management based on the new destructors but with 
a GC "hook" option; now the "destructors" part of that has been moved into a 
separate "\--seqsv2" compiler switch, and the remaining purpose of the 
"\--gc:destructors" switch seems to be to remove cruft (;-) ), as in by-passing 
"deepCopy" and various AST types of magic, as well as all forms of built in GC, 
leaving only whatever can be provided through the GC "hooks". A better name for 
it might be "\--gc:nocrufthook" ;-) ; in fact it is evolving into something 
rather nice!

**However, I wonder if it is getting ambiguous with respect to the "\--gc:none" 
switch:** if the default for "\--gc:nocrufthook" were to "hook" the `newObjHook` 
to the appropriate allocator (regular or malloc) just as "\--gc:none" does, 
with the `traverseObjHook` hooked to a discard, then when no "hooks" are 
provided it would work as if "\--gc:none" had been used, but with the guarantee 
of no cruft; it would also not blow up when no implementation of the "hooks" is 
provided and ref's (including closures) are used, as "nogc" would then be the 
default. In order to be backward compatible, the "\--gc:none" compiler switch 
could just activate this mode. **I can think of no reason that one would want 
"nogc" with cruft, can you?** I think doing it as proposed might actually 
decrease the code, as in not turning off/bypassing/short-cutting the built-in 
GC in two places...

It's a minor quibble, but while I highly applaud the cruft-removing "\--seqsv2" 
switch, **shouldn't it be clearer that this applies "destructors" to `string` 
as well as `seq`, and be named "\--seqstrv2"**?


Re: Nim v1.1 and beyond roadmap

2019-10-21 Thread GordonBGood
@Araq:

> Ok, now what are the differences? How does "araqsgc" work? Will tell you 
> later, I have to go now.

I've been checking this thread for answers for the past two weeks since this 
post appeared and there haven't been any, so I have researched [your "araqsgc" 
project](https://github.com/Araq/araqsgc) to find the following (part of what I 
think I found [is posted in the "types of gc" 
thread](https://forum.nim-lang.org/t/5232#33671)):

  1. As per that thread, the "\--gc:destructors" compiler option is intended so 
that custom implementations of memory management/garbage collection/allocators 
can be "hooked" in using the available (but undocumented) newObjHook and 
traverseObjHook.
  2. Currently, use of the above compiler option also turns on the new 
seq/string semantics (which also apply to the "newruntime" option, where this 
is automatically turned on), making their use more efficient; now, in the 
"devel" branch there is a separate "\--seqsv2" compiler option (automatically 
turned on for "newruntime") which can be optionally enabled (for any of the gc 
modes?) to get these improved destructor semantics without forcing one to use 
"\--gc:destructors" (or "newruntime").
  3. As to what "araqsgc" is, it is a simple garbage collector built around the 
"mi-malloc" allocator [developed by Daan Leijen, et al, of Microsoft 
Research](https://www.microsoft.com/en-us/research/publication/mimalloc-free-list-sharding-in-action/)
 and available to the open source community under the same MIT license that Nim 
uses. Its [technical merits are described 
here](https://microsoft.github.io/mimalloc/) (with links to performance 
measurements), and [the actual repo is 
here](https://github.com/microsoft/mimalloc). Your "araqsgc" project is a 
simple Nim wrapper around this implementation that uses the parts of it as 
appropriate and hooks it all into the above mentioned "gc hooks".



The "newruntime" has been discussed at length in other forum threads and 
currently partially works inside the current Nim version 1.0 and the "devel" 
branch, with promises that work will continue and that it won't be dropped. The 
main limitation of the current state of "newruntime" is that it still doesn't 
smoothly work with multi-threading and the current multi-threading libraries 
(which was supposed to be one of its features/advantages), but it does look to 
be possible to make it work. Paraphrasing what you have said, you have pulled 
back a little from the "newruntime" implementation in favor of this new GC 
implementation due to the changes required to existing libraries to support 
"newruntime", not limited only to the required addition of the `owned ref` 
keyword and other supporting programming constructs needed to make it work 
with those libraries.

The "araqsgc" project seems to be a "drop-in" solution to the problems of 
multi-threading without requiring library changes, requiring only that the 
araqsgc library be imported and that the "\--gc:destructors" compiler flag be 
used.

To those reading this post: Both of these sub-projects show that @Araq 
continues to do the research to look for better solutions for the current 
"state-of-Nim" with the "newruntime" the result of his reading a paper by 
Bacon/Dingle describing the use of the owned ref idea and this latest version 
of a multi-threaded GC from his reading of the Microsoft paper describing the 
"mi-malloc" project. Hopefully, one of these solutions will make the use of 
multi-threading in Nim much easier, which is one of Nim version 1.0's main (and 
most difficult to work around) weaknesses...


Re: Can someone help me fill in missing info about Nim's 8 GC models?

2019-10-21 Thread GordonBGood
Interest in this thread seems to have evaporated (as is often the case with 
some interesting forum threads, just as in the one where @Araq promised to 
explain the direction memory management is going [in the version 1.1 roadmap 
thread](https://forum.nim-lang.org/postActivity.xml), in which interest has 
seemingly waned).

I got tired of waiting for the explanation of the "\--gc:destructors" option 
and just read the source, with the following observations:

  1. With the "\--gc:none" compiler option, it does as it says, with nothing 
ever automatically deallocated, meaning that everything allocated on the heap 
is a potential memory leak unless one arranges for its deallocation. This may 
be less onerous than one thinks, as `seq`'s and `string`'s can handle the 
destruction of the heap memory they use (especially with the new "\--seqsv2" 
flag in "devel"), but the heap memory used for captures by closures looks to be 
an important cause of memory leaks that one must consider, as all `ref`'s, 
including the ones inside closure captures, become the same as `ptr`'s with 
"\--gc:none"; perhaps the easiest way to handle closures and other uses of 
`ref`'s would be to "wrap" them in objects with destroy/copy/move 
(`=destroy`/`=`/`=sink`) semantics and even do the simple additions to add 
reference counting if these are to be used cross thread. This is what appears 
to be done for `seq`'s/`string`'s with the new "\--seqsv2" compiler flag (and 
previously with the "\--gc:destructors" flag, but one was expected to supply 
the custom memory management/garbage collection).
  2. The "\--gc:destructors" compiler flag seems to have been created expressly 
to support custom memory management/garbage collectors/allocators, as without 
supplying such a custom implementation one cannot use ref's, since the "hooks" 
aren't assigned to anything and nil out.
  3. In the "devel" branch, there seems to be some simplification of seq's and 
string's going on, as the new "\--seqsv2" takes the place of the use of the 
"gcDestructors" flag as applied to seq's and string's so that the better 
semantics of copying/moving can be used without using the "\--gc:destructors" 
compiler flag, which would imply supplying a custom memory manager.



So to answer the question as follows:

> For Destructors what function can I call to free?

the right answer is that you don't need to free manually because the custom 
memory management/garbage collector that you supply is expected to do it for 
you through use of the automatically called "hooks".

I'm sure that there are many in the dev team who know all of this and likely 
more, but it takes quite a long time for the information to be disseminated 
across to where it will readily pop up in a search such as on this forum, let 
alone the length of time it takes to be properly and completely documented in 
the official documentation. If I have made errors, please chip in with 
corrections, and if we can get the "full picture", let's look into doing a PR 
on the documentation to complete this...


Re: Any tutorials for compiling to JavaScript (specifically DOM)?

2019-10-08 Thread GordonBGood
@dotheda:

> Also, I would do WebAssembly but the process seems a lot more complicated, so 
> I'll wait until it gets proper support, official or not. Any news on that?

I don't know about official support, but it has been done often enough, even by 
@Araq, although many users are on Linux and there is a bit of a twist required 
for use on Windows. As I am on Windows, I explored this, with the results [as 
per my forum post here](https://forum.nim-lang.org/t/3667#32669).

As mentioned in that post, it works well, but adds a bit of a level of 
complexity in that you can no longer use the dom library directly from code and 
will likely have to call into the HTML DOM by using emit of JavaScript code if 
doing it from "asm.js" or by building two dependent projects when using 
WebAssembly, with one producing the JavaScript and the other producing the 
WebAssembly; alternatively, one could produce the required minor JavaScript 
"glue" code just in JavaScript using the produced HTML "runner" code as an 
example.

It doesn't look to be that hard and I'll likely be doing something along these 
lines shortly.


Re: Sequence of objects which contain a sequence

2019-10-07 Thread GordonBGood
@MaineTim: Yup, new to Nim...

In the following, I've shortened your code just to show the essentials: 


type
  TSeg = object
    sub: seq[int]

var seg = TSeg(sub: @[])
var segments = newSeq[TSeg]()

for i in 0 .. 2: segments.add seg

segments.echo

segments[2].sub.add(7)
segments.add(TSeg(sub: @[42]))

segments.echo



No need to create the seq's using capacities (unless you are optimizing). 
Creating all the elements from your uninitialized model of segment doesn't 
actually create any sub sequences, as the default values are all zeros, 
including the `sub` field, so you can't assign or add to it until it has been 
created with some sort of a "new". In my example code, I initialized the model 
`seg`, including initializing its seq contents with the seq literal initializer 
`@[]`, but you could use any sort of `newSeq` or `newSeqOfCap`, then either 
assign to the existing elements or add to the elements as you wish, as I have 
done here.


Re: Any tutorials for compiling to JavaScript (specifically DOM)?

2019-10-07 Thread GordonBGood
I haven't actually done it myself, but there seem to be two general approaches: 
generate the DOM programmatically using 
[karax](https://github.com/pragmagic/karax), or just interface with the HTML 
DOM as you seem to be trying to do.

Using dom would seem to be pretty straightforward following what one would do 
in JavaScript: just 
[getElementById](https://nim-lang.org/docs/dom.html#getElementById%2Ccstring) 
which returns an 
[Element](https://github.com/nim-lang/Nim/tree/version-1-0/lib/js/dom.nim#L193),
 which contains a contentEditable as a cstring and if that was (say) a text 
element, I would say you could just set that to whatever you wanted to show.
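As a minimal sketch of that approach (the element id "output" and the surrounding HTML page are my own assumptions, not anything from the original question):

```nim
# Compile for the JS backend with: nim js -o:app.js app.nim
# and include app.js from a page containing, e.g., <div id="output"></div>.
import dom

proc showGreeting() =
  let el = document.getElementById("output")
  if el != nil:
    el.innerHTML = "Hello from Nim!"  # set the element's content directly

showGreeting()
```
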

One of these days I'll get around to actually trying this, but I'm pretty sure 
something along this line will work.


Re: Can someone help me fill in missing info about Nim's 8 GC models?

2019-09-27 Thread GordonBGood
@Sixte:

> the new keywords owned, sink and lent...

As @Araq has explained as a follow-up, they are in the library system.nim, but 
they aren't explained very well or at all in the library documentation. The 
best source of documentation for them is [the "destructors" 
document](https://nim-lang.org/docs/destructors.html). The top part of this 
page describes Nim's implementation of destructor/assignment/move semantics 
including sink/=sink (move) and the bottom part the implementation of owned 
ref, lent, etc.

@treeform:

> For Destructors what function can I call to free?

That's the point, with \--gc:destructors you don't need to call anything to 
free standard library allocated items that allocate heap space as in string's 
and seq's as this turns on the generation of =destroy/=/=sink "hooks" for these 
whose functionality is as described in the above documentation. That said, the 
default destruction time is when the binding goes out of scope at the end of a 
proc and if one wants to force destruction, =destroy can be called whenever 
immediate destruction is desired.

For your own allocated pointers, you can tap into this at any time (you don't 
need the \--gc:destuctors compiler flag) by wrapping them in an object for 
which you have defined your own custom override versions of these "hooks" that 
take care of destruction/assignment/move semantics; The compiler flag just 
makes it available automatically for the standard library structures.
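As a concrete illustration of that wrapping technique, here is a minimal, hypothetical sketch of managing a manually allocated buffer through such "hooks" (the `Buffer` type and its fields are invented names for illustration, not a standard library API):

```nim
type
  Buffer = object
    len: int
    p: ptr UncheckedArray[byte]  # manually allocated storage

proc `=destroy`(b: var Buffer) =
  # called automatically when a Buffer goes out of scope
  if b.p != nil:
    dealloc(b.p)
    b.p = nil

proc initBuffer(len: int): Buffer =
  Buffer(len: len, p: cast[ptr UncheckedArray[byte]](alloc0(len)))

proc use() =
  var b = initBuffer(1024)
  b.p[0] = 42
  # no manual free needed; `=destroy` runs at the end of this proc

use()
```

A full version would also define `=` and `=sink` overrides to control copying and moving, as described in the "destructors" document.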


Re: 1.0.0 is here

2019-09-27 Thread GordonBGood
@jehan: Thanks for the summary on Boehm; I thought the disadvantage might be 
latency, but as you say it's only a big problem if one does a lot of 
allocations/deallocations.

@Araq: If you are impressed with Boehm (other than if latency is a problem) as 
compared to the current default, it must be good and it looks like I'll have to 
try it.


Re: 1.0.0 is here

2019-09-25 Thread GordonBGood
@Araq: Thanks for the update:

> we actually have two competing designs and whatever supports async first will 
> "win".

That's news to me. As you know, I've been a convert to the "owned ref's" 
newruntime. Any quick words on the proposed alternative? Is it a version of 
automatic reference counting?

Yes, perhaps we should revisit the Boehm Garbage Collector...

@jehan:

> The Boehm GC is also, contrary to popular belief, a very good option...

Thanks for the link to your GC benchmarks; they really show how good Boehm can 
be. What are Boehm's disadvantages compared to the others?


Re: Emscripten/WebAssembly GC considerations?

2019-09-22 Thread GordonBGood
To anyone trying to do this currently for Linux **or** Windows (I am using 
Windows), here are the steps as of 22-September-2019:

  1. Install 
[emscripten](https://emscripten.org/docs/getting_started/downloads.html) for 
your platform including Windows.
  2. Thanks to [@yglukhov](https://forum.nim-lang.org/t/4232#26349), we know we 
have to cross-compile to linux to make it work, especially for Windows, which 
was what I was missing. This can be used by putting the following "nim.cfg" 
file into your project directory:




@if emscripten:
  cc = clang
  @if windows:
    clang.exe = "emcc.bat"
    clang.linkerexe = "emcc.bat"
  @else:
    clang.exe = "emcc"
    clang.linkerexe = "emcc"
  @end
  os = linux
  cpu = i386
  @if asmjs:
    passC = "-s WASM=0"
    passL = "-s WASM=0"
  @end
@end



  3. Call the nim compiler with the following for compilation to wasm:


nim c -d:emscripten -d:release -d:danger -o:./index.html yourprogram.nim



or the following for compilation to asm.js: 


nim c -d:emscripten -d:asmjs -d:release -d:danger -o:./index.html yourprogram.nim



  4. The output file extension can be ".html" for a pre-canned emscripten 
standard web page with the console output embedded, or a ".js" file extension 
for a file that can be run with node.
  5. From these, one can work out how to build a custom HTML document file to 
embed the project, complete with custom JavaScript to handle the data passed 
to/from between wasm and the HTML DOM.



Use of wasm is generally at least a little faster than even asm.js, but for 
many applications it may not be of "killer speed", considering that JavaScript 
engines such as Chrome's V8 already Just In Time (JIT) optimize "hot spots" in 
the code quite efficiently.

I haven't tried it yet but plan to: one of the most important reasons to use 
wasm instead of JavaScript or asm.js output is that one can do single-web-page 
multi-threading within the same page, with shared data, by using the new 
experimental feature that [can be enabled in Chrome since version 
70](https://developers.google.com/web/updates/2018/10/wasm-threads) (now up to 
version 77), if not yet in other web browsers (especially Firefox?). Finally, 
we should be able to make proper web-page use of multi-threading and all the 
cores we have available, even in smartphones!
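
I haven't tested this, but if/when the threading experiment works, the extra 
emscripten flags would presumably go into the same "nim.cfg"; something along 
these lines (the `USE_PTHREADS`/`PTHREAD_POOL_SIZE` flags are real emscripten 
options, but the combination with Nim's threading and the `wasmthreads` define 
name are my guesses):

```
# hypothetical additions to the "nim.cfg" above for wasm threads; untested
@if wasmthreads:
  passC = "-s USE_PTHREADS=1 -s PTHREAD_POOL_SIZE=4"
  passL = "-s USE_PTHREADS=1 -s PTHREAD_POOL_SIZE=4"
  threads = on
@end
```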


Re: Who owns the widget in this example?

2019-09-05 Thread GordonBGood
@Sixte; You missed my point:

It seems that Nim **parameters** don't have to be specified as being owned and 
if not specified can handle either with the compiler causing the right behavior 
depending on what is provided; if it were required that the "ownedness" be 
specified, then one would need to provide proc overloads for all the 
combinations, leading to a lot of extra boilerplate code - IIRC, that is what 
Rust does. OTOH, storage **locations** need to be specified as owned or not and 
everything including use in copying/moving proc parameters must respect the 
"ownedness" of the source and destination.

This last is what the bug is for this one case where the source is a proc 
parameter when it is owned (which the compiler already knows) and the 
destination is not owned but the compiler is mistakenly treating it as if it 
were.


Re: Who owns the widget in this example?

2019-09-04 Thread GordonBGood
@Sixte:

> ... but proc sinkprocUsing(dst: var Widget; src: sink Widget) doesn't expect 
> an owned Widget...

It seems that it is @Araq's intention that one doesn't have to force proc 
parameters to be owned or not and the compiler determines which they are 
according to "last read of" or not; this isn't the problem.

What is the problem (as @Araq stated in his first reply to this thread) is that 
the logic to determine whether to "move" an owned parameter or to copy it as a 
"dangling" is too ambiguous when the destination doesn't specify that it is 
owned: when the compiler applies the same "last read/use of" rule it uses for 
proc parameters, it incorrectly moves an owned ref to a "dangling" one.

Since only known "owned" references are destroyed and the type of dw is not 
owned, it is never destroyed, which causes the memory leak.

Applying the general rule of making dw automatically "lift" to be an owned on 
such an assignment would seem to be overly complicated and confusing, as in the 
case where the assignee were an element of a seq as in the Original Post (OP) 
the compiler (and programmer) would have to deal with the case where some seq 
elements are owned and others not - not something that should have to be 
considered.

The simplest solution is likely to only consider the owned/"dangling" 
(non-owned) status as declared and cause a compiler error where possible when 
there is a lifetime discrepancy, or at least a "dangling reference error" at 
run time when the discrepancy cannot be detected.
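
A minimal sketch of the problem case discussed in this thread (a reconstruction 
from the discussion, not the exact code from the original post; `Widget`, 
`sinkprocUsing`, `w`, and `dw` are the names used in the thread):

```nim
# compile with "--newruntime"; reconstructed sketch of the buggy pattern
type Widget = ref object
  x: int

proc sinkprocUsing(dst: var Widget; src: sink Widget) =
  # `src` arrives as an owned ref moved from the caller; assigning it to
  # the non-owned `dst` should copy it as a "dangling" ref (or error),
  # but the buggy compiler "moves"/zeroes the owned source instead, so
  # the owned ref is never destroyed -- a memory leak
  dst = src

proc main() =
  var dw: Widget            # NOT declared `owned` -- a dangling ref
  var w = Widget(x: 1)      # created as `owned` under the newruntime
  sinkprocUsing(dw, w)      # "last use of" `w`: moved into the sink param
  echo dw.x

main()
```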


Re: Who owns the widget in this example?

2019-09-03 Thread GordonBGood
@Sixte:

> Where are the (new) B/D improvements to be found in the source BTW?

The things related to the newruntime are mostly in the [runtime_v2.nim 
file](https://github.com/nim-lang/Nim/blob/79f1b8592a24e878fd886127708929d7fc613842/lib/core/runtime_v2.nim),
 but also there are tweaks scattered through the code; [a search for 
nimV2](https://github.com/nim-lang/Nim/search?q=nimV2) will 
locate most of them.

> There is no problem with your proc sinkprocUsing.

True, there would be no problem with sinkprocUsing if the compiler didn't have 
a bug: it fails to report an error when assigning the passed owned ref to a 
"dangling" one, and moves it as if the destination were owned.

> > Despite this, an owned ref can't be passed to a sink
> 
> sink would become useless in combination with owned.

Once the compiler bug is fixed, an owned ref can be passed to a sink parameter, 
in which case there will be a compiler error if the sink proc passes it to a 
destination that is not an owned ref, but everything will be fine if the 
destination is an owned ref whose lifetime outlives any dangling ref's that 
have been created. If the destination of the owned ref is outside the scope of 
the dangling ref's, then there will be a "dangling ref error" when it is 
destroyed, if the compiler can't detect the shorter lifetime and give an error 
first. For instance, even now, if sinkprocUsing doesn't pass the owned ref 
given to its sink parameter on anywhere, it is destroyed when sinkprocUsing 
returns, causing the "dangling ref error" if there were other references to w, 
which is the correct behaviour.

If dw were an owned ref instead of a dangling one, the current behaviour is 
correct and there should be no errors, as dw lives beyond any other references 
that would be made in the body of main.


Re: Who owns the widget in this example?

2019-09-02 Thread GordonBGood
@Araq: and all...

Since no one else seems to have filed an issue on this, I just did as 
[issue#12116](https://github.com/nim-lang/Nim/issues/12116). I further refined 
the problem: it has nothing to do with having global variables or seq's, but 
only with passing an owned ref to a sink proc as a sink parameter. The ref gets 
correctly moved (marked with `wasMoved` to be zeroed) for the caller on "last 
use of", but when the sink proc then assigns it to an "unowned"/"dangling" 
ref, it erroneously "moves"/zeroes it rather than causing a compiler error 
that the assignment is invalid, as the "dangling" reference lives longer than 
its source.

The memory leak is caused because the erroneous zeroing/"moving" makes the only 
owned ref source copy unable to be destroyed.


Re: Who owns the widget in this example?

2019-08-31 Thread GordonBGood
@Sixte: You seem to be a proponent of the Pony memory model, and in being so 
wrapped up in it are missing the simplicity of what is being implemented here 
in the Nim newruntime. As @Araq once did for me, I highly recommend you read 
the Bacon/Dingle document thoroughly to see what was originally proposed and 
then go through the Nim implementation of that to see how even that model has 
been improved upon.

What you are doing is something like I often see in trying to teach foreigners 
to speak the local language here: they keep drawing parallels to how things 
work in English or whatever language they may already know and in so doing 
limit their appreciation of the full nuances of the language being learned, 
which is based on an entirely different history and culture.

As to the Pony language, I see the author's point, but lament the 
implementation complexity both as to concepts and as to required syntax to 
support those concepts; the Nim newruntime model is much simpler in concept and 
syntax and should be much easier for people new to the language to understand.

The Actor model on which the Pony concurrency model is based does have its 
merits, but the method of restricting ownership that Pony uses is not the only 
way to implement it - for instance, Erlang/Elixir do this in what I think is a 
more elegant way. If we want an Actor model of concurrency for Nim, I believe 
that it would be possible to build one as a library, and that use of newruntime 
could help express the ownership in a simpler way, although there might need to 
be some alternate means to restrict mutability for passed messages. I haven't 
looked into it too deeply because, while the Actor model has its advantages, it 
also has its limitations. What is great about Nim is that it doesn't restrict 
itself to any particular model, but tries to provide the tools to build 
whatever model is required for a given situation. While we aren't yet there for 
concurrency, explorations like newruntime are getting us much closer.

For many of us, we are glad that @Araq isn't using the Pony memory management 
model, and if he ever reverted to that, I think that many of us would be "out 
of here". **There must be a better way than all the complexity of Pony or 
Rust**. Most of us would rather just have some sort of automatic reference 
counting, something like Swift's, than the complexity of those others if the 
newruntime doesn't work out.


Re: Who owns the widget in this example?

2019-08-31 Thread GordonBGood
@Sixte, @Araq; confirming the problem is with the sink parameter of g1.add:

I've modified my example code showing the problem in a post above by 
eliminating addWidget entirely to show the problem still exists with 
seq.add/g1.add, which is the real problem that sink parameters are not 
destroyed when owned and only a dangling is passed on by an auto assignment to 
an unowned.


Re: Who owns the widget in this example?

2019-08-31 Thread GordonBGood
@Araq:

> The owned ref's destructor then still would be run at the end of addWidget...

Shouldn't that be " The owned ref's destructor then still would be run at the 
end of seq.add...", as the owned ref's ownership would have been transferred to 
seq.add (g1.add(w)) due to the sink parameter so that it would be its 
responsibility to destroy it (as creating the temporary location, if that's how 
it's implemented, shouldn't transfer ownership although that seems to be what 
it is doing currently)?


Re: Who owns the widget in this example?

2019-08-30 Thread GordonBGood
@Araq:

> The problem is that in addWidget you don't take over the ownership of w...

I don't quite get this: why doesn't the Widget owned by "addWidget" (to which 
ownership has been moved) get destroyed at the end of "addWidget"'s scope when 
it hasn't transferred ownership anywhere else, such as to the seq? Upon that 
destruction there should then be a dangling error for two reasons: the x 
reference in main and the added element in the seq, both of which are dangling.

Or is this just a bug?


Re: Testing `newruntime` `owned ref` as to cyclic data...

2019-08-28 Thread GordonBGood
@Araq:

> B/D does mention that cycles are really rare but can come up.

I'd love to see concrete examples of such cycles to see if the end can be 
achieved by other means...

> the cycles that cause trouble in practice for Swift etc. are those resulting 
> from closures pointing back to some "upper layer" and afaict these cycles are 
> prevented. Statically.

Yes, that's how the Elm compiler (version 0.19) rejects cyclic data statically. 
Without mutation or pointers, they can always be detected as the only other way 
to cause them is through closures.

However, **our closures can contain `var` 's/`ref`'s/`ptr`'s whose contents can 
be mutated to point back to "some upper layer" after creation**, and I don't 
know that those can be detected. I suppose the same can be done in Swift/etc. 
and isn't to worry about as pointer manipulations are "unsafe behaviour"?

> Having said that, I'm still musing and tinkering whether some variation of 
> these ideas wouldn't work out better for Nim, as **adding owned to everything 
> is a pretty disruptive change**.

Now that is interesting, as **I understood you were completely committed to 
seeing how `owned` works out for `newruntime`**.

Let me just throw an idea out for your consideration; it may be completely 
wrong, but then I have been before :-) , as follows:

  1. Suppose that the **"everything can be `owned`" principle** that I raised 
in issue #12021, which the compiler already supports but you said shouldn't as 
"the compiler isn't ready for it yet", were to be accepted as the way the 
newruntime would work.
  2. Further, suppose that the rule of **"everything is created as owned"** 
were enforced by the compiler, where "everything" **would include all 
primitives and literals** as well as all the other things you already consider 
as being created owned.
  3. Now, we wouldn't have to worry about adding **`owned`** everywhere as that 
**would be implicit**.
  4. The compiler already detects when we are trying to **copy an `owned`** and 
errors out, so why not have it **inject an `unown`** instead and proceed, 
rather than forcing us to inject it manually.
  5. If this were done, the main **problem would be that the programmer 
couldn't easily tell by code inspection whether a given binding is `owned` or 
not**, as it may be difficult to keep track of whether a given use is "last 
read of" or not, **but the compiler always knows through its data flow 
analysis**, as the whole B/D implementation for owned depends on it.
  6. Thus, if the programmer tries to do something they shouldn't, **the 
compiler can still give them an appropriate error message**, or, if not, the 
**"dangling" detection will catch it at run time** if turned on (for debugging 
or otherwise).
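
As a tiny illustration of points 2-4 (hypothetical behaviour of the proposal, 
not the current newruntime semantics):

```nim
# sketch of the "everything is created owned" idea; hypothetical --
# today's newruntime requires the explicit annotations
type Widget = ref object
  x: int

proc demo() =
  var w = Widget(x: 1)  # would be implicitly `owned Widget`, no annotation
  var d = w             # instead of a "copy of owned" error, the compiler
                        # would inject `unown w`, making `d` a dangling ref
  echo d.x
  # `w` (the owned ref) is destroyed here; the dangling-ref check, when
  # enabled, verifies that `d` did not outlive it
```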



The above is starting to stink of Rust's ideas of ownership, but at least it 
doesn't have an onerous syntax of lifetimes and reference notations to deal 
with (I hope; it's worse than the newruntime as already proposed if it does).

The one problem I can immediately see is how to handle the cases (reasonably 
rare, except for multi-threading) when **multiple ownership needs to be 
handled**, and for that I can't see any other way than a deepCopy or a 
reference-counted reference, which are Rust's two solutions to the same problem 
(Clone traits that can include use of Rc/Arc). I don't think we ever want to 
see system.deepCopy come back again, but maybe we just do something like 
override system.shallowCopy to actually do a deep copy for these cases, as 
perhaps shallowCopy isn't usually used, or maybe we have a new system.clone 
that normally does a shallow copy but can be overridden to do either this deep 
copy or a reference-counted wrapper copy, chosen by the programmer as suitable 
for the particular need?

It seems to me that **such a system would offer the advantages of B/D** of 
optionally better performance than reference counting or Garbage Collection 
(especially when dangling checks can be turned off) and at least some detection 
of cyclic data, if not the actual handling of it, while not adding too much 
extra syntax burden and **while also supporting an easy workaround when single 
ownership isn't appropriate.**

But, OTOH, **I may be completely wrong** and such a system isn't possible in 
Nim... :-)


Re: Testing `newruntime` `owned ref` as to cyclic data...

2019-08-28 Thread GordonBGood
@cdunn2001:

> I'm not sure I see your point.

The point I was trying to make is that "reluctant adopters" might see the 
current implementation, with reference counting of "dangling" references that 
can't be turned off costing execution speed, plus the need to completely 
re-write code that makes heavy use of cyclic data, as reasons for rejecting the 
whole newruntime concept.

For myself, I don't think that cyclic data is a good implementation at any 
time, as it can cause memory leaks even for Garbage Collection, with which it 
works better but not perfectly. I would happily see cyclic data rejected 
completely (other than for unsafe code, of course, where we can't deny it).

> But is your code even legal?

Probably not legal and @Araq confirms that; I used unsafe casting just because 
I couldn't think of another way to generate cyclic data.

> Aren't you violating the single-ownership model?

In this example, probably not, as casting only ended up converting an owned ref 
to the same owned ref and was only done to circumvent the compiler detecting 
the assignment as a copy, which would be forbidden. I wanted both the list node 
and the contained "tail" reference to itself to be owned, which doesn't 
circumvent single ownership as long as ownership isn't passed across to another 
outside entity. For instance, there is nothing wrong with an owned data 
structure containing any number of nested owned and "dangling" things; the 
problems come about when an outside longer-lifetime reference prevents the 
dangling ones from being cleared, whereas the owned ones will be destroyed when 
the outer owner is destroyed.


Re: Testing `newruntime` `owned ref` as to cyclic data...

2019-08-28 Thread GordonBGood
@Araq: you know I am a convert to the newruntime, I'm just trying to take a 
step back to look at it from the point of view of those who are reluctant to 
accept its benefits:

> see .nodestroy.

I saw it but also noted that it is deemed not to be ideal; it also only solves 
one part of the problem of preventing =destroy races when used, but doesn't 
allow one to apply custom destructors to ref's and owned ref's.

> I never claimed to care about this list-business,

I know you don't, and I agree with you that lists have neither good performance 
nor memory storage efficiency when misused, as many functional programmers and 
especially Haskell programmers tend to do (as in "a list of char is a string" 
:-) ), but **lists can be useful in expressing some concepts concisely outside 
the code that needs to have high performance**. I hark back to it just because 
I think some will criticize Nim if it doesn't have reasonable support.

> A dangling ref detector can always be turned off

I only meant that the dangling reference detector **can't be turned off in the 
current implementation**, and mentioned that I hoped that means would be 
provided to turn it off when not necessary (after debugging) in the final 
version.

> You cannot get ownership back via casting.

I wasn't really trying to get ownership back; **I used that casting just in 
order to get a cyclic structure**, as I couldn't think of another way to 
generate one within the newruntime and I wanted to test whether the newruntime 
really does work or not with cyclic data, of which a cyclic list is the 
simplest example.

I would really like to see **how one legally generates a cyclic anything in 
`newruntime`**, as if they can't be generated legally (and especially if the 
compiler could detect illegal use such as this, but perhaps not, as casting is 
unsafe), then cyclic data isn't a concern.

This would be similar to Elm, where cyclic data is absolutely forbidden at any 
level where it would interfere with destruction, but being a purely immutable 
language and without pointers at all, Elm has the advantage that cyclic data 
can easily be detected at compile time and as cyclic "infinite" data structures 
are already detected and rejected, that only leaves generating a cyclic 
reference by deferred execution, which Elm also detects and rejects.

In contrast, Nim has the problem that a mutable reference that wasn't cyclic 
can become cyclic because the reference is modified to refer to something 
further up the tree (which is what I did with the strange casting for the tail 
of the list), in which case the problem I was trying to demonstrate comes up.

**I feel about cyclic data about the same way as you feel about lists**, and 
if you could find some reliable way for the compiler to detect and reject all 
cyclic data (other than perhaps at the global level, as Elm allows), I would 
greet it with a sense of relief, as we would never have to consider it again 
other than to help library writers who have used it to **replace it with 
something better!**


Re: What do you think about the programming language NIM?

2019-08-26 Thread GordonBGood
@jyelon: A little further comment with respect to the advantages of Nim's owned 
ref...

> I'm surprised everyone is treating owned and unowned refs as if they were 
> something new. C++ has owned and unowned refs

As explained in my previous post, **Nim's `owned ref`'s are not the same as 
C++'s `unique_ptr` nor `shared_ptr` but combine the two**: the Nim compiler 
does data flow analysis just as the C++ compiler does for `unique_ptr`'s to 
determine when these references go out of scope and can be deallocated. 
However, Nim's version also has (originally intended to be optional) reference 
counting, as C++'s `shared_ptr`'s do, to verify the data flow analysis and that 
the compiler/programmer are correct, which C++ does not. Thus, Nim's "dangling" 
ref's are also better than C++'s `weak_ptr` in that in Nim, they can be (maybe 
optionally, eventually) checked that they go out of scope before the owned 
ref's to which they refer.
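
As a rough analogy (my own mapping, not an official correspondence):

```nim
# rough analogy between newruntime refs and C++ smart pointers;
# my own reading, not an official mapping
var a: owned (ref int)  # ~ C++ unique_ptr: a single owner, deallocated when
                        #   it goes out of scope (found by data flow analysis)
var b: ref int          # ~ C++ weak_ptr: a "dangling" ref, except checked
                        #   (optionally) at the owner's destruction rather
                        #   than on every access, so no per-access cost
```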

So in addition to the advantage that these can be (maybe when this is 
implemented) optional and then zero run-time execution cost, what I didn't 
mention was that **they also can prevent cyclic data memory leaks, which both 
C++'s `shared_ptr` and Swift's automatic reference counting can have**.

**In the current implementation, cyclic data causes a crash on the attempt to 
deallocate an `owned ref` which contains a cycle, which is better than an 
undetected memory leak.**

**There is a (currently unimplemented) proposed extension in the original 
Bacon/Dingle paper that uses a recursive "deepDestroy" technique to cause 
destruction of cyclic data that can be implemented without data races**; this 
is somewhat costly when there are many deeply nested ref's, as they require 
extra nested destruction passes, but nesting isn't that common or commonly that 
deep, so the cost should be minimal and only on deallocation. **If a version of 
this were implemented in Nim, the `newruntime` would work much like the current 
Garbage Collected Nim**, other than needing to have a few owned's and unown's 
injected into the code in a few places, and as things progress, even some of 
those could possibly be inferred by the compiler.


Testing `newruntime` `owned ref` as to cyclic data...

2019-08-26 Thread GordonBGood
As you may have noticed in many of my recent posts, I have been a convert to 
the purported advantages of the newruntime over both Garbage Collection and 
common or even automatic ref reference counting as a means of not needing to do 
manual memory management of heap memory allocation/deallocation while being 
more compatible with threading than the current Garbage Collector, with the 
main two advantages over common ref reference counting as follows:

  1. Performance advantages in that the actual reference counting is only done 
as a check that there are no "dangling" ref's left when the owned ref is 
destroyed, as these were said to be very costly in performance for threading 
when they must be atomic.
  2. The newruntime was said to prevent memory leaks with memory not able to be 
reclaimed due to mutual references by cyclic data structures not allowing the 
reference count to drop to zero.



However, we find the following in the current implementation:

  1. Reference counting is not able to be turned off as the source code says 
that would be an "unsafe" option. Now, this decision could revert back to the 
original specification when the newruntime has been fully implemented and 
tested, but meanwhile this implementation will be just as slow as common 
reference counting, especially for the atomic counting necessary for threading.
  2. According to the following testing, at least cyclic data causes a crash so 
it is detected when reference counting is on as per the following code:




# test cyclic owned...

# compile with "--newruntime"

type
  ListObj = object
    head: int
    tail: owned (ref ListObj)
  # without parentheses: "Error: region needs to be an object type"
  List = owned (ref ListObj)

proc `=destroy`(x: var ListObj) =
  echo "destroying ListObj" # to know when it's being destroyed
  x.head.`=destroy`

template own[T](x: T): owned T = # complement to `unown`
  var rslt: owned T
  cast[ptr T](rslt.unsafeAddr)[] = x
  rslt

proc main() =
  proc makeones(): List = # a cyclic list of ones...
    result = List(head: 1); result.tail = own(unown result)
  let ones = makeones()
  echo "head of the list type and value: ", ones.typeof, " ", ones[]
  echo "first tail of the list type and pointer value: ", ones.tail.typeof,
    " ", cast[int](ones.tail)
  var onesv = unown ones
  stdout.write "The first 10 values of the cyclic list are:  "
  for _ in 1 .. 10:
    stdout.write onesv.head, " "
    onesv = unown onesv.tail
    if onesv == nil: break
  "".echo

main()


Run

with the output of the above program on Wandbox as follows: 


head of the list type and value: List (head: 1, tail: ...)
first tail of the list type and pointer value: owned ref ListObj 
140515572523088
[FATAL] dangling references exist
The first 10 values of the cyclic list are:  1 1 1 1 1 1 1 1 1 1
destroying ListObj
1


Run

However, just because a crash is caused by cyclic data when reference counting 
checks are on, it isn't actually handling cyclic data as the current Garbage 
Collector does, meaning the newruntime will break current code that uses cyclic 
data.

Also, if the code is modified to destroy-chase the tail(s), the code goes into 
a recursive race with resulting stack overflow, which isn't a desirable outcome 
either. In order to make this work so as not to break any possible current code 
that uses cyclic data (not really a good idea), the work would need to be 
extended to do a "deepDestroy" as suggested in the Bacon/Dingle article as an 
extension in order to handle cyclic data, where first the non-owned ref's are 
cleared, then the owned ref's, and finally the whole structure can be 
destroyed. This can't be done with just object destructors as ref's and owned 
ref's are created by compiler magic and we can't make custom destructors for 
them.

If this example and my thinking are correct, we are doing a lot of work on 
something that doesn't seem to achieve the goals, unless we can extend some 
ideas to include this case?


Re: possible in Nim to have a macro/template

2019-08-26 Thread GordonBGood
@mratsim:

> with backticks...

That's really neat! I knew about backticks, but never thought of using them 
this way. Thanks.


Re: possible in Nim to have a macro/template

2019-08-26 Thread GordonBGood
@ScottTorres: a Julia fan, eh?

> is it possible in Nim to have a macro/template (name - GBG) that starts with 
> a custom character?

No, I don't think it's possible to start a macro name with anything other than 
a letter character, as that would seem to be contrary [to the general naming of 
(any) identifiers in the 
manual](https://nim-lang.org/docs/manual.html#lexical-analysis-identifiers-amp-keywords),
 which need to start with a letter.


Re: typedesc with subtype won't compile when returning objects

2019-08-22 Thread GordonBGood
@:

You can make your code trying to limit the contained type T inside your 
"SomeObject" container to SomeNumber successfully compile by the following, 
which forces the typedesc to appear to the compiler as a type; however, as 
shown, it doesn't accomplish what you desire in limiting the type to only 
numbers: 


type
  SomeObject[T: SomeNumber] = object
    val: T

#ex3
proc test3(T: typedesc = typedesc[float]): SomeObject[T.typeof] =
  echo "type is ", $T

# works, but shouldn't
echo test3(pointer)


Run

I think this doesn't constrain the type properly [as documented in the manual 
on Generic Inference 
Restrictions](https://nim-lang.org/docs/manual.html#generics-generic-inference-restrictions)
 as "typedesc[T] cannot be inferred in a generic instantiation. ". I think that 
typedesc is intended to be used more with custom templates and macros where 
this limitation isn't as much of a problem. Remember that a generic proc is 
actually automatically creating a template that instantiates the proc for the 
specific type used **at compile time** but this automatically generated 
template for generics is limited as specified in the manual. Due to the above 
restriction, the combination of regular default proc parameter behaviour and 
this "template" behaviour conflicts enough to interfere with the more complex 
type resolution in your example 3 - it's not a bug, it's a restriction.

@lucian's approach using a template thus works, of course, because templates 
are processed at compile time, and his template is doing something like what I 
was suggesting in my first post in using when, which also makes changes at 
compile time.

So, the short answer is: use compile time restrictions as in when or by using 
templates or macros (if you need to step up that far) to do anything more than 
the basic described short cuts in specifying typedesc in proc's.


Re: [RFC] Project Picasso - A multithreading runtime for Nim

2019-08-21 Thread GordonBGood
@Araq: **Regarding the `channels` and `threads` libraries use with 
`newruntime`...**

> No, I mean they are currently "submodules" of system.nim but there is no 
> reason for that, it should be a separate, explicit import.

As I said before, I can see the sense of making these separate modules and not 
auto imported when "\--threads:on" is used.

However, currently in the latest builds **with "\--newruntime", there seems to 
be no `channels` library at all**, neither implicitly imported nor available 
for explicit import. I see that the old library is full of "cruft" to do with 
deep copying and isn't suitable, nor did it use system.deepCopy so as to 
automatically do whatever threadpool did in the case that deepCopy disappeared.

Does this mean that channels is being re-written to be an explicitly importable 
module without the "cruft"? I can see that it should be, but **the lack of 
`channels` makes any code that depends on `channels` break** so needs some 
immediate attention!

The threads module is still implicitly imported with "\--threads:on" whether 
"\--newruntime" is used or not, but then it isn't full of "cruft", being a 
fairly thin wrapper over the OS threading capabilities.


Re: typedesc with subtype won't compile when returning objects

2019-08-21 Thread GordonBGood
@vitreo:

As to your use in example three, the difference (and the reason for the bug, if 
bug it is) may be the more complex inheritance in returning an object.

One can get the effect of specifying that "SomeObject[T]" must contain a type 
of "SomeNumber" by the following code, which might be considered a workaround: 


type
  SomeObject[T] = object
    val: T

#ex3
proc test3(T: typedesc = typedesc[float]): SomeObject[T] =
  when T isnot SomeNumber: {.fatal: "type must be SomeNumber!!!".}
  echo "type is ", $T
  # return SomeObject[T](val: default(T)) # redundant

# works
echo test3(); echo test3(int)
# shouldn't work and doesn't
# echo test3(pointer)


Run

Note that it would be better to use "default(T)" (not default(0)) in your 
return statements, but the whole return statement is redundant in the case of 
pure non-ref objects, as the return value will already just be the default 
value for the type. 
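
To illustrate the implicit default result (a small sketch; `makeDefault` is 
just a hypothetical name for demonstration):

```nim
type
  SomeObject[T] = object
    val: T

proc makeDefault(T: typedesc): SomeObject[T] =
  discard  # `result` is already zero-initialized, i.e. holds default(T)

# no explicit return statement is needed for plain (non-ref) objects
echo makeDefault(int)    # prints the default-valued object, e.g. (val: 0)
echo makeDefault(float)  # e.g. (val: 0.0)
```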


Re: No Windows nightlies for the last while?

2019-08-19 Thread GordonBGood
@shashlick: Thanks for the heads-up on the fix; the Windows version nightlies 
are appearing now, in fact we got two as of last night - looks like @Araq 
released a nightly that should have been released 7 nights ago that 
specifically should fix the **" newruntime with threads:on"** problem. Since 
I'd also like to try the fix that is supposed to **output newline properly on 
Windows** , I'll try the other version first as it includes that fix ;-)

Good luck on the Arm build; no Sqlite for Arm, eh?

@Araq, any plans to do another version 1 "beta" (0.20.4?) release soon, as some 
of the things that have been fixed lately seem to be "ShowStoppers" such as 
**the above two things**?


No Windows nightlies for the last while?

2019-08-18 Thread GordonBGood
I am just wondering if there is a reason there haven't been any ready-to-use 
Windows nightlies generated since the 7th August 2019 (this year).

There have been some fairly major fixes since that time related to Windows, as 
in the UTF-8/UTF-16 mess and what was done with linefeed characters, as well as 
more general things like making the newruntime work with \--threads:on that I 
would like to try on Windows, but it's not so convenient as I have to compile a 
clone of the latest source using my pretty minimal machine. The "ready-to-use" 
nightlies generated lately have been limited to Linux 32-bit/64-bit and OSX, 
but no Windows (64-bit is fine).

If one can't compile these because of bootstrap issues then I won't be able to 
compile either in any case, but it would be nice to know.


Re: What do you think about the programming language NIM?

2019-08-16 Thread GordonBGood
@jyelon:

> I'm surprised everyone is treating owned and unowned refs as if they were 
> something new. C++ has owned and unowned refs

I'm afraid you are far behind the curve here, Nim's owned/dangling ref's aren't 
like those of C++ at all, see [@Araq's original post on the 
forum](https://forum.nim-lang.org/t/4743#29588) with follow-up discussion and 
[another follow-up thread](https://forum.nim-lang.org/t/4976#31173).

In short, in his research, @Araq found a paper that describes a way of doing 
this that is not just pure reference counting but rather more like C++ 
"unique_ptr" but better as it can optionally verify the correctness with 
reference counting - it is more like a cross between C++ "unique_ptr" and 
"shared_ptr" but not the same as either. Since then, we have been working at 
improving on the work in the original research paper to (hopefully) get 
something even better than that.

> But anybody considering this would have immediately understood the cost 
> tradeoffs: you pay the performance penalty of maintaining the reference 
> counts, and you pay the price of doing the extra work to clear out the 
> unowned refs.

In Nim's implementation, reference counts are optional and something that one 
would turn on for debug/development so as to make sure the compiler/programmer 
are doing their job properly, then turned off for release, in which case no 
reference counters are used nor checks made if that option is selected. As to 
"clearing out unowned ref's", if you are referring to dangling ref's, they are 
actually just something like C++ "weak_ptr" and thus there is no "clearing 
out" to be done; if checks are turned off and the compiler and programmer are 
mistaken, then the program may fail by trying to reference a deallocated 
pointer. As to "clearing out" owned ref's when they go out of scope, that is 
just a simple automatic deallocation just as for any pointer when it is 
deallocated.

Thus, the Nim version of owned/dangling ref's can be zero cost for 
release/danger mode.

When the ownership model fails as when one needs to have the equivalent of 
multiple owners of the same data, then it is proposed that one would implement 
an override that would do a deep copy just as one would implement a "Clone" 
trait in Rust for this case. When one wants to share the same data such as 
across threads, then one does the overrides to implement reference counted 
pointers that are basically the same as C++ "shared_ptr", but the need for 
those should be fairly rare and when required the overhead of their use will 
likely be much less than other overheads, such as those related to thread 
context switching.
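
A minimal sketch of such a manually reference-counted shared wrapper follows; 
the names (`RCRef`, `newRCRef`, `incRef`, `decRef`) are my own illustration, 
not an actual Nim library API, and it only suits payload types without GC'ed 
fields:


```nim
type
  RCRef*[T] = object
    cnt: ptr int # shared count, allocated outside the GC'ed per-thread heaps
    data: ptr T

proc newRCRef*[T](val: T): RCRef[T] =
  result.cnt = cast[ptr int](allocShared0(sizeof(int)))
  result.cnt[] = 1
  result.data = cast[ptr T](allocShared0(sizeof(T)))
  result.data[] = val

proc `[]`*[T](r: RCRef[T]): T = r.data[]

proc incRef*[T](r: RCRef[T]) = atomicInc r.cnt[]

proc decRef*[T](r: RCRef[T]) =
  if atomicDec(r.cnt[]) <= 0: # the last owner frees the shared blocks
    deallocShared(r.data)
    deallocShared(r.cnt)

let r = newRCRef(42)
incRef r # a second (e.g. cross-thread) owner
doAssert r[] == 42
decRef r
doAssert r[] == 42 # still one owner left
decRef r # count hits zero; memory is freed
```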


Re: [RFC] Project Picasso - A multithreading runtime for Nim

2019-08-15 Thread GordonBGood
@Araq, @mratsim: As promised, I'm posting some code that shows the current 
difficulty of bypassing the overly simplistic (and performance costly) default 
behaviour for the current channels and threadpool libraries of deepCopy'ing all 
GC'ed ref's so they won't be prematurely destroyed if they go out of scope in 
the thread where they were created.

I was wrong previously in thinking that my difficulties in making this work 
were due to generic procs and thus nested templates; in fact, I think the 
problems were due to all the "cruft and compiler magic" not considering 
recursive algorithms where there may be threads spawned from within threads ad 
nauseam, which is also likely the problem that @mratsim has had in running his 
benchmarks. Accordingly, this benchmark **wraps absolutely everything that has 
to do with GC** (just in case) and should be able to handle just about any 
reasonable level of recursion, although I'm not sure what the nesting levels 
of "toDispose" lists generated by the protect/dispose pairs will do to the 
stack - there isn't much I can do about those without compiler magic anyway, 
and we certainly don't want to add more of that if it isn't necessary.

The code linked below implements a little benchmark that cycles through 
spawning 10,000 (trivial) tasks from the threadpool using a manually 
implemented customizable iterator built from closures, and includes a 
"polymorphic" converter function closure parameter. It is close to what I 
require to cleanly implement my version of the "Ultimate Sieve of 
Eratosthenes in Nim" algorithm, which does require the ability to nest and 
recursively spawn threads. I've divided the code into modules by functionality 
(the file tabs across the top of the source code section) for ready reference 
and so you can see that the actual benchmark is fairly trivial; most of the 
code is there to make deepCopy unnecessary by preserving the GC'ed ref's in 
the currently provided Nim ways. I've tried to make the code concise and 
elegant, but the need to do this is **UGLY**.

However, it is likely the easiest to implement "Plan B" if the "newruntime" 
doesn't work out rather than the huge project of implementing a multi-threaded 
GC - just make the extra support modules available as one or more libraries. 
This [link on Wandbox](https://wandbox.org/permlink/TSMrMyVVcikS9Bty) is the 
runnable code. It runs in full release mode in about 400 milliseconds on an 
Intel Xeon Sandy Bridge CPU at 2.5 GHz, for which we are given the use of 
three threads, two of which likely share a core (Hyper-Threaded). Thus, about 
2.5 billion total cycles are used across all available threads, which means 
there are about 250 thousand cycles or about 100 microseconds per thread 
spawn including overheads. This sounds like a lot but actually isn't bad 
considering it takes something like ten to a hundred times as long to do this 
by "spinning up a new thread" for every task.

Now, if "newruntime" does work out and for algorithms such as mine where single 
ownership is adequate, much of this code would just "go away": owned ref's 
would replace "RCRef", no wrappers would be required for closures or for ref's 
because they would be owned, and the channels and threadpool libraries could be 
re-written to be much simpler without the "cruft and compiler magic", **not 
depending on using global resources** , and thus written to be completely 
recursive if necessary.

All that would be left would be the benchmark itself, and even that would be 
much simpler, more concise, and more elegant through not having to call into 
the extra wrappers. It should also be a little faster (perhaps twice as fast) 
through not having the GC fighting us in the background and through the more 
direct forms of the code.

I had started work on converting this to use my own emulation of owned ref's 
(since release builds won't run with threading and newruntime on at the same 
time), but don't think I'll pursue it as emulating the new closures is quite 
hard without compiler help and there is little point if we are soon to have 
newruntime run for threads, which looks to be imminent. I'll reserve the effort 
for when that happens, as I think it is reasonably clear from this work how 
much easier that could make working across threads!


Re: What do you think about the programming language NIM?

2019-08-14 Thread GordonBGood
@lscrd, @mratsim, @cblack:

> @mratsim's code may not trigger it, but at least in /devel there seems to be 
> a check in semexprs.semOverloadedCallAnalyseEffects that errors out with 
> errRecursiveDependencyIteratorX ("recursion is not supported in iterators: 
> '$1'"). The error message was even updated just a couple months ago. It is 
> not hard to trigger this erroring out with trees that are "run-time switched" 
> recursions vs. @mratsim's compile-time switched recursion. I for one would 
> love full recursion support in iterators...

The message is correct that "recursion is not supported in iterators: '$1'"; it 
can't be supported for iterators because in the general case these are 
re-written as simple loops which can't reliably be nested when there is no 
proper "escape" logic. @mratsim's example showing it working for accessing 
array elements works because that use triggers a different special re-write 
rule that doesn't need the same escape logic. The general case is shown 
triggering the error message [in this Wandbox 
snippet](https://wandbox.org/permlink/9McHwZuENwU04lkA), which, if the error 
message weren't triggered, would run away and overflow the stack.

There are a limited number of cases where recursion is actually useful with 
its results not easy to obtain otherwise, the main one being lazily deferred 
"function" type streams or lazy lists; for these, the lazy deferral breaks the 
race. However, these are actually their own iterators so there is no real need 
for recursive iterators; rather, they are implemented just with closures as 
follows: 


# rolling your own recursive closure "iterator" stream...

import sugar

type
  CIS[T] = ref object # a Co-Inductive Stream (CIS), non-memoizing
    head: T
    tailf: () -> CIS[T]

iterator items[T](cis: CIS[T]): T {.inline.} =
  var ccs = cis
  while ccs != nil: yield ccs.head; ccs = ccs.tailf()

proc numsFrom(strt: int): CIS[int] = # we specify the iteration here
  if strt > 100: return nil # we can add the escape logic here
  CIS[int](head: strt, tailf: () => numsFrom(strt + 1))

for n in 0.numsFrom: # or can add escape logic here
  if n > 1000: break else: stdout.write n, " "



as per [this other Wandbox 
snippet](https://wandbox.org/permlink/bt6nUKHpdpe8KSIo).

The latter technique is highly recommended, as with it data races are much 
easier to control, and it can be easily extended for multi-level recursion.
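
As a sketch of that multi-level extension (my own illustration, not from the 
linked snippets): an in-order tree traversal is a classic "run-time switched" 
recursion that plain inline iterators can't express, but it works with the 
same CIS closures because the recursion lives inside deferred thunks:


```nim
import sugar

type
  Tree = ref object
    val: int
    left, right: Tree
  CIS[T] = ref object # non-memoizing co-inductive stream, as above
    head: T
    tailf: () -> CIS[T]

iterator items[T](cis: CIS[T]): T {.inline.} =
  var ccs = cis
  while ccs != nil: yield ccs.head; ccs = ccs.tailf()

# the recursion is hidden in the closures, so arbitrary run-time shapes work
proc inorder(t: Tree, rest: () -> CIS[int]): CIS[int] =
  if t == nil: rest()
  else: inorder(t.left,
          () => CIS[int](head: t.val, tailf: () => inorder(t.right, rest)))

let done: () -> CIS[int] = proc(): CIS[int] = nil
let tree = Tree(val: 2, left: Tree(val: 1), right: Tree(val: 3))
var acc: seq[int]
for v in inorder(tree, done): acc.add v
doAssert acc == @[1, 2, 3] # in-order: left, root, right
```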

There are some current difficulties in implementing this across multiple 
threads due to the discussed limitations of the single-threaded garbage 
collector, but there are workarounds (although we need a fair amount of 
boilerplate code).


Re: What do you think about the programming language NIM?

2019-08-14 Thread GordonBGood
@dom96:

> For the time being, Nim v1 will have a GC. We should improve what we have now 
> instead of jumping ship to this brand new runtime which is a massive risk..

I respect your caution about "newruntime", and of course version 1.0 will still 
have the current GC options as a fall back.

If there were no concern with the current state of multi-threading in Nim, or 
a user doesn't require multi-threading or only very simple "naive" limited 
multi-threading, the current GC runtime is fine; we should have been at 
version 1.0 long ago, or we are already there as proposed with the version 
0.20.x alpha/beta.

However, @jyelon has expressed the need for a **multi-threaded GC** , which 
would seem to be the only way to cleanly fix the problems and limitations with 
current multi-threading, and @Araq has said that "improving what we have now" 
by implementing a multi-threading GC is much bigger task than the 
implementation of the current "newruntime" spec.

Yes, the jury is out until we can actually try the new spec on practical 
applications and see how much it affects current packages, but the alternative 
of "fixing what we have" as to making multi-threading competitive could take 
years and Nim may well lose the currently developing interest momentum by that 
time.

I have become an owned ref evangelist because I see it quite simply mitigating 
the huge mess which is the current type system structure built around all the 
limitations of the current GC and data type definitions, but if it doesn't work 
out, will become a promoter of whatever way forward we can come up with that 
fixes these problems in a reasonable amount of time.

That said, I tend to favour non-GC solutions as even the best multi-threading 
GC in the world (perhaps that of Haskell) still has those limitations of 
non-deterministic destruction, garbage-collection compaction moving memory 
locations, etc., etc. Besides, depending on GC just makes us an "also-ran" to 
be compared to such languages as GoLang, D, Crystal, Chapel, etc., etc. and not 
in the realm of "something unique" as the likes of Rust, Pony, etc., yet when 
compared to these last offers a much easier to learn and use experience (I 
think and hope).

My personal "back-up plan" if the "newruntime" doesn't work out would be to add 
some libraries or packages that would replace the "naive" implementations of 
channels and threadpool so as not to require all the extra boilerplate of 
creating GC-Safe wrappers in order to use non-"naive" multi-threading, and am 
actively working on those now while waiting for the "newruntime" to stabilize 
to the point where I can try it out. I believe that would be the fastest way 
forward to a stable version 1.0 if the "newruntime" doesn't work out. IMHO, we 
aren't really anywhere near version 1.0 without doing something about the 
implementation of multi-threading (for those that need it).

Of course, I admit that my use case is just for casual "hobby" projects and not 
in a production environment so it could well be different for others such as 
yourself.

I love something innovative and unique, and the "newruntime" version of Nim 
could well be "it".


Re: [RFC] Project Picasso - A multithreading runtime for Nim

2019-08-14 Thread GordonBGood
@mratsim: Thanks for the reply and your correction of my misunderstandings.

> Also the current threadpool/spawn/^ does not work on the "Hello World!" of 
> multithreading which for better or worse is fibonacci...

Ah, I understand. So the order of business is still likely correct: get 
threading working with "newruntime" as seems to actively being worked on by the 
DEVS, then get the current concept of channels and threadpool at least working 
with "newruntime" as I am discussing with @Araq here, then improve the "load 
balancing" of those implementations as per your RFC.

Sounds like a good plan to me!


Re: What do you think about the programming language NIM?

2019-08-14 Thread GordonBGood
@nickjonson:

> I’ve found (Nim's) supposed to be faster than Julia...

Referring to your opening post, Nim is at least as fast as Julia but not that 
much faster for well-written Julia code, other than Julia's wait to 
pre-compile its code when it's called the first time. The problem with writing 
Julia code is that one always has to think about and check how it sees the 
(dynamic) types in order to get this speed; also, I think its state of 
development of multi-threading capabilities is in at least as much a state of 
flux as Nim's (currently).

But they are two different use cases: Nim produces stand alone executable code 
whereas Julia depends on its development environment to run (again at least 
currently).

@jyelon:

> the thread-local heaps made it impossible to pass structured data from one 
> thread to another

Not impossible, but a little more boilerplate code is required to accomplish 
this if one doesn't want to allow the system to just deepCopy the data between 
threads.

> I think sharing data structures between threads in garbage collected 
> languages is more common than you think. Java, definitely. C#, golang. 
> Anything that runs on the JVM, like scala or kotlin. All pure functional 
> languages, like haskell...

You may as well add F#, too.

Yes, all of those named languages now have multi-threaded GC; that is the way 
memory management has been conventionally done for Virtual Machine types of 
languages and Haskell now has a multi-threaded GC implementation, too.

There are lots of modern languages that don't have GC as in Apple's Swift 
(using reference counted pointers with automatic destruction, but with flaws), 
Rust (a kind of safe smart pointer but with reference counting as an option 
when that doesn't work), and of course C++ (depends on libraries used, but 
something like Rust without the type safety) and one can still get things done, 
although you are right that GC makes it fairly easy for lots of types of 
applications.

> I want a garbage collected language.

Don't knock Nim's new runtime's owned ref's until you've tried them: we 
believe that for typical use they will be almost as easy to use as GC'ed 
references but without GC's downsides: the huge development effort needed to 
reduce memory-access latency while supporting multi-threading, the high 
execution-time overheads (even worse for multi-threaded versions), and the 
non-deterministic destruction (not knowing exactly when heap memory will be 
freed after the data goes out of scope or even after the reference to the 
data is manually set to nil).

Have you actually tried multi-threading in those other languages to see how 
easy they are to use for your particular application? Have you actually tried 
to write something using multi-threading in modern Nim for your application? 
You'll never know how well they work or don't until you've actually written 
some Proof Of Concept little micro applications.


Re: [RFC] Project Picasso - A multithreading runtime for Nim

2019-08-14 Thread GordonBGood
@Araq:

> (the theads and channels libraries) should be a separate, explicit import.

Okay, I see that.

> (about deepCopy) the answer is "no" because C++ and Rust and C etc. # all 
> lack it. ;-)

C/C++ lack deepCopy but then, in spite of their bulk, they aren't very complete 
languages are they ;-)

Rust has this thing called Clone and I was under the impression that it does a 
deep copy, as how else could one consider the copy to be a "clone". But we are 
alright without it as with Nim's overloaded "hooks" and 
operators/proc's/func's, we could implement a custom "clone" to do what we 
needed if it were necessary. In that case, it would be nice not to need 
shallowCopy either, but that would likely only be possible by eliminating the 
"special cases" by making standard data types that contain ref's be ref 
object's?
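
A minimal sketch of such a hand-rolled "clone" (my own illustration; real 
code could also override the `=deepCopy` hook mentioned above):


```nim
type Node = ref object
  data: int
  next: Node

proc clone(n: Node): Node =
  # manual deep copy; note this simple version does NOT handle cyclic
  # data - that is exactly the case where system.deepCopy earns its keep
  if n == nil: return nil
  Node(data: n.data, next: n.next.clone)

let a = Node(data: 1, next: Node(data: 2))
let b = a.clone
a.next.data = 99 # mutate the original...
doAssert b.next.data == 2 # ...the clone is independent
```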

Don't we still need a deepDestroy to be sure we have handled possibly cyclic 
ref's correctly?


Re: [RFC] Project Picasso - A multithreading runtime for Nim

2019-08-14 Thread GordonBGood
@Araq: ???

> with --newruntime we shouldn't have threads or channels or deepcopy in system 
> anymore

What? threads and channels are their own libraries and aren't currently in the 
system library. I think you mean that they won't need to use system.deepCopy 
anymore?

We still need the threads library if one has some real need for very low level 
multi-threading and the channels library for convenient inter-thread 
communication, and in fact these are what makes my new implementation of the 
threadpool library work just as the current threadpool library depends on them. 
**What is nice is that when these are updated to use newruntime, `channels` 
will be perfectly compatible with `spawn` without the problems we have now**, 
once we remove much of the "cruft and compiler magic" from them.

However, **there still is a need for `system.deepCopy` as other than wrapping 
in a "RCRef", `deepCopy` is the only way of getting a new freshly owned 
non-shared/non-dangling version of an `owned` ref, and we need its handling of 
cyclic data structures.** I think that deepCopy must still be automatically 
used when the type of a passed sink parameter to the channel send or a spawn is 
a dangling ref or contains one such as when ownership is retained by not being 
"lastReadOf" in order for the thread to get an owned copy; if the programmer 
desires to avoid this behaviour such as to avoid a high execution overhead of a 
full deepCopy, then they need to wrap the parameter so that it doesn't appear 
as a dangling ref such as by hiding it behind a manually managed raw pointer or 
by using a "RCRef" wrapper with an override for deepCopy that bypasses the full 
deepCopy just as suggested in the [experimental documentation for 
spawn](https://nim-lang.org/docs/manual_experimental.html#parallel-amp-spawn-spawn-statement).
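
A sketch of the "hiding it behind a manually managed raw pointer" idea (my 
own illustration, compiled with --threads:on; a real cross-thread user must 
arrange for GC_unref to run on the owning thread when done):


```nim
type Payload = ref object
  data: seq[int]

var chan: Channel[pointer] # pointer payloads are sent verbatim, no deepCopy

proc sendNoCopy(p: Payload) =
  GC_ref(p) # pin it on the sending thread's heap so the GC can't free it
  chan.send(cast[pointer](p))

proc recvNoCopy(): Payload =
  cast[Payload](chan.recv())

chan.open()
sendNoCopy(Payload(data: @[1, 2, 3]))
let got = recvNoCopy()
doAssert got.data == @[1, 2, 3]
GC_unref(got) # here sender and receiver are the same thread, so this is safe
chan.close()
```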

As per [the experimental documentation on 
spawn](https://nim-lang.org/docs/manual_experimental.html#parallel-amp-spawn-spawn-statement)
 **about the limitation that the `FlowVar` returned from a spawn not being able 
to contain a type containing a `ref` type (including a closure?)** , I see the 
reason for that being that it currently returns a FlowVar containing a closure 
that defers the nested production of the actual FlowVar containing the type 
perhaps in order to support the experimental parallel optimizations through 
"crud and compiler magic" and I think we need to find a better way to implement 
parallel than this if that is what causes this limitation. **Good 
multi-threading code uses the ability to pass closures to and fro as a 
convenient way to pass owned data and also cleanly defer execution without 
"smoke and mirrors"**.

In fact, **to fully implement B/D as to avoiding data races on destruction, we 
need to implement a `deepDestroy` as is part of the suggested extension at the 
end of that document**, and that also must avoid data races on cyclic data 
through ref's, but that is a bit easier. This was what you called my "crazy 
talk" but it is in the spec and needed unless you come up with a different 
simpler way of handling cyclic data?

There **may be a need for `shallowCopy` to handle the "special case handling" 
for `string`'s and `seq`'s** (with which I disagree as per my KISS precept 2). 
However (due to this precept), **I highly recommend you consider making these 
two data types, and any others that persist in appearing as value types while 
containing `ref` pointers, be converted to `ref object` reference types if 
possible without breaking everything**. I think that this can possibly be done 
for newruntime for any code that uses them by keeping their API's the same, 
and even for code that obtains raw pointers to their data, as it should work 
in the same way. **I hate special cases**, and the libraries are full of them 
due to handling this!


Re: [RFC] Project Picasso - A multithreading runtime for Nim

2019-08-13 Thread GordonBGood
@Araq: Good of you to indicate you are following this! Sorry to inflict my long 
posts on you, but they are also **written for new users who don't know the 
history and background of Nim**, and I **bold the main points** so those who 
are "in-the-know" such as yourself can skim.

> > It appears that Nim's threading model fails at several if not all of the 
> > above precepts; this is similar to @Araq just recognizing that Nim's GC 
> > based memory model is likely wrong in that it is taking more and more 
> > support time and still doesn't make managing memory Simple.
> 
> I take that as a compliment, thank you. :-)

 **It was intended as a compliment, but also a lament**: [@gradha was right 
those many years 
ago](https://gradha.github.io/articles/2015/02/goodbye-nim-and-good-luck.html) 
and **we've lost years using GC and trying to adapt the single-threaded GC to 
cross threads**; as an aside to this thread's subject, perhaps he was also 
right in his lament on the lack of a standard GUI, but I see that you are open 
to someone developing something, as in exploring the possibility of making the 
new wNim package cross-platform. For this, perhaps once our community gets 
more used to the newruntime (once it works), it will also make that 
development easier.

On the current **subject of a cleaner threading experience, it isn't so much a 
problem with the `threadpool` library as with the `channels` library**: the 
storeAux proc that is related to supporting the deepCopy'ing of send 
parameters is almost half the total file size by the metric of the number of 
code lines used! I suppose that it was developed before system.deepCopy became 
available, as it seems that if it were still necessary (hopefully not when 
newruntime is fully deployed) it could avoid the redundancy of seemingly doing 
much the same thing. But again, I digress, as the current channels works 
"well enough" and it is 
easy enough to work around this complexity. I think we have already talked 
somewhere else about making Channel send parameters to be sink and now perhaps 
owned if they are ref to make this library much cleaner and more flexible as to 
being more usable with (a proposed new) threadpool/spawn.

> > I can post this code somewhere if anyone is interested.
> 
> Definitely!

I am looking at my code, and it is as beautiful as the whole concept of 
threadpool/spawn/FlowVar (which names it uses/borrows/steals as it attempts to 
provide the identical functionality); **however, in order to make it more 
readily generally adaptable to the current non-newruntime environment, it 
includes its own implementation of reference counted pointers (30 lines), and a 
closure wrapper to prevent premature garbage collection of closures and their 
captured bindings (another 30 lines)**. That works well enough as I am always 
**careful not to use cyclic data** and performance at the level of invocation 
of threads isn't a problem since it is of **negligible cost in CPU cycles 
compared to the cost of invoking even pooled threads** , but it is a 
significant part of the overall Proof Of Concept (POC) file (currently 160 
lines including performance testing code).

I see that in most cases, **one will be able to use the newruntime's `owned 
ref` instead of the "RCRef" emulation and won't need to wrap closures since 
they will use `owned ref` internally**, which would make the code even more 
beautiful and be a good selling point for the use of the newruntime. However, 
**the newruntime doesn't currently work with threading** on the stable release 
and only maybe (untested as of yet as to the latest commits) with the latest 
DEVEL build, so it can't really be done properly yet. **I have considered 
emulating `owned ref` rather than "RCRef" on top of the current runtime** as a 
POC so that it would be more easily converted when the newruntime works, **but 
emulating closure captures isn't easy for general closures without some 
compiler magic**.

So what I propose is this: I'll **first post a bit of code using the current 
`threadpool` to establish a performance benchmark, post another bit of code 
that shows the problems using the current `threadpool` with generic `proc`'s 
using closures and bypassing `deepCopy`** (this could be posted as an issue 
against threadpool but what's the point when we can avoid all that complexity 
with newruntime), **then the current "RCRef"/"Closure" wrapper version** 
showing the concept works both for performance and solves the problem across 
generic proc's, and **finally, either a different emulation using `owned ref` 
or (if newruntime works with threading soon) using the actual new runtime** to 
show we gain both performance and code elegance.

What do you think?

At least I'm not proposing "throwing the baby out along with the bathwater" as 
per @gradha and others ;-)


Re: Official "Web Playground" + ASM inspector

2019-08-13 Thread GordonBGood
My current favourite is Wandbox: 
[https://wandbox.org/permlink/9qvzPvhvPhaSp96Q](https://wandbox.org/permlink/9qvzPvhvPhaSp96Q),
 which supports all current Nim versions and even has HEAD updated a few times 
a month. The great thing about it is that one can test multi-threading (three 
threads available), can set most compiler options as well as the command line 
options for the generated run program (so we can set release/danger mode, 
etc.), can even select other native compiler backends (--cc:clang, or one can 
make your own "nim.cfg" file as one of the other files to control the process), 
and can use multi-files with "includes" to bring any selected file (optionally 
named as per the user's preference except for the first main "program" file).

It seems that their server is currently a Xeon processor with Sandy Bridge 
class CPU's running at 2.5 GHz.

It doesn't allow one to view the generated assembly code though, although of 
course one can generate it by passing the compiler '-S' on the command line 
(-t:-S) and using the "--noLinking:on" command-line option; there just isn't 
any way to view the resulting output assembly file, no matter where one 
directs it. One could contact the website owners and they would likely be able 
to easily add this capability...

I'm afraid that currently to inspect the generated assembly code, one has to 
install Nim, but that isn't very hard, as it's an 18 megabyte download, an 
extraction, and setting a few environment variables, all as per the quite good 
instructions. On Windows one has to install MSYS2 and the requisite binutils 
capabilities, but again the instructions are reasonably complete.


Re: newruntime doesn't work with threads:on...

2019-08-07 Thread GordonBGood
@Araq, you should just reopen @mratsim's issue at 
[https://github.com/nim-lang/Nim/issues/11844](https://github.com/nim-lang/Nim/issues/11844)
 as the problem it describes is still there.


Re: newruntime doesn't work with threads:on...

2019-08-06 Thread GordonBGood
@Araq: Fixed, not...

Your commit 
([https://github.com/nim-lang/Nim/commit/7024bb800ccf2c3e075c5dd605b8384a399ffe53](https://github.com/nim-lang/Nim/commit/7024bb800ccf2c3e075c5dd605b8384a399ffe53))
 on file "lib/core/runtime_v2.nim" doesn't seem to have gone far enough, as I 
see three other places at lines 56, 62, and 68, where you use 
atomicInc/atomicDec without using discard, having only fixed the one at line 87.

But that isn't the source of the error anyway, as the message says, the error 
comes about because the allocs is less than or equal to zero when it is 
expected to be at least one. It seems allocations aren't balancing out with 
deallocations with threading on, perhaps because some file isn't seeing that 
hasThreadSupport is enabled and another file seeing it?


Re: Fastest Prime Sieve, in Nim

2019-08-06 Thread GordonBGood
@mratsim:

> Nim's FlowVars have contention issues and no load balancing.

I have been doing some more thinking about your performance problem with the 
"pibench2" program and think that the problem may have been your 36 threads 
being run pinned to cores via thread affinity - there should be very good 
reasons to pin threads to a core, and this code isn't a good example of one of 
them. As I show in my code above, it isn't necessary, and the above code could 
reasonably easily be converted to use raw threads rather than spawn.

That said, there are advantages in reducing context-switching overhead by 
using spawn and, as you said in your comment on using raw threads, they really 
are too low level if one doesn't need those extra capabilities, as one doesn't 
here.

For your use, you might use setMinPoolSize to your countProcessors() value to 
be sure that enough threads are "spun up" in the thread pool.
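
For instance (a trivial sketch; `threadpool` needs --threads:on, and the 
`max(1, ...)` guard is mine since countProcessors may report 0 when the count 
is unknown):


```nim
from cpuinfo import countProcessors
from threadpool import setMinPoolSize

# pre-spin one pool thread per logical core so the first spawns don't
# pay the thread-creation cost inside the measured region
let procs = max(1, countProcessors())
setMinPoolSize(procs)
doAssert procs >= 1
```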


Re: Fastest Prime Sieve, in Nim

2019-08-05 Thread GordonBGood
@mratsim

> Not sure if it's the only bug but for now this is the one to fix for 
> multithreading on --newruntime #11844.

It seems that @Araq has pushed a commit to fix that issue; all that remains is 
to compile the DEVEL branch and test or wait for a new release...

As to exploring different alternative multi-threading:

> approaching the limits of OpenMP (nested parallelism especially)

I rejected OpenMP early as it is too C'ish and therefore somewhat kludgy, 
although I have used it before with C and see how its use can be expanded 
beyond the basics available "out-of-the-box".

> raw Threads is a too low-level abstraction

At first I thought using the low-level would be the answer as in easily 
emulating GoLang's green threads and that channels would work like GoLang 
channels; however, it kept getting more complex due to the limitations of not 
having GoLang's multi-threaded GC and therefore all the Nim limitations on the 
use of channels, meaning that one has to stoop to the use of low level features 
as in use of Lock and Cond and essentially building a Nim version of Haskell's 
"MVar" to pass through the channels. This has caused me to explore @Araq's 
recommended threadpool.

> Nim's FlowVars have contention issues and no load balancing

Funny you should say that about FlowVar's, as your link to the "pibench2" 
project doesn't use threadpool and therefore doesn't use FlowVar's. I agree 
that the linked project using raw threads isn't very efficient and should be 
re-written to use threadpool and spawn as per the following:


# Compute PI in an inefficient way but multi-threaded...
from cpuinfo import countProcessors
from threadpool import FlowVar, spawn, `^`
from times import epochTime

const DELTA = 1_000_000 # number of terms calculated per thread call!

let numprocs = countProcessors()

proc pi(numterms: int64): float =
  result = 0
  proc terms(k, klmt: int64): float {.inline.} =
    result = 0
    for i in countup(k, klmt, 2):
      let frst = i + i + 1; result += 1 / frst.float - 1 / (frst + 2).float
  # [ # take out the space between the `#` and the `[` to comment this out!
  var rslt = newSeq[FlowVar[float]](numprocs) # acts as a rotary queue!
  for i in 0 ..< numprocs: # preload waiting queue!
    let k = i * DELTA; let klmt = k + (if k + DELTA > numterms: numterms - k else: DELTA - 1)
    rslt[i] = terms(k, klmt).spawn
  for i in 0 .. numterms div DELTA:
    let id = i mod rslt.len; result += ^rslt[id] # get values from queue!
    let k = (i + numprocs) * DELTA
    let klmt = k + (if k + DELTA > numterms: numterms - k else: DELTA - 1)
    rslt[id] = terms(k, klmt).spawn # add new tasks to queue! # ]#
  #[ # put a space between the `#` and the `[` to uncomment this!
  for k in countup(0, numterms, DELTA): # do it without threading for comparison
    let xtra = (if k + DELTA > numterms: numterms - k else: DELTA - 1)
    result += terms(k, k + xtra) # ]#
  result *= 4

let start = epochTime()
let answr = pi(10_000_000_000)
let elpsd = (epochTime() - start) * 1000.0
echo answr, " in ", elpsd, " milliseconds."


Run

The above code does the same calculations as "pibench2" with default settings, 
which calculates 10^10 = 10 billion terms for 10 "digits". One can vary DELTA 
to see that scaling across threads starts to be lost somewhere below about a 
hundred thousand terms per task, when each thread task takes only about a 
millisecond.

The above takes about 3.X seconds on a Sandy Bridge CPU with three cores at 2.5 
Gigahertz to do a tenth of the job as per the following WandBox link: 
[https://wandbox.org/permlink/9gOlb1WsjjgWFGIm](https://wandbox.org/permlink/9gOlb1WsjjgWFGIm),
 but that isn't too certain as two of the three threads will likely be sharing 
cores; it takes about 54.2 seconds on my Intel Atom i5-Z8350 Windows tablet CPU 
(Silvermont) at 1.44 GHz with four cores, and about 36.25 seconds on my Intel 
i3-2100 at 3.1 GHz with two cores and four threads. From all of that, your 
processor, having a half-again faster clock speed than this last, nine times 
the number of cores, and being seven generations newer, might run this in a 
half second, but I would expect it to be more like two to three seconds. One 
can comment out the multi-threading code and uncomment the single-threaded code 
to see that the code scales with the number of cores, apart from the 
uncertainties of Hyperthreading/Simultaneous Multi-Threading threads and of how 
many floating point execution units there are per core and how the CPU uses 
them. It is easy to calculate from these times (especially the single-threaded 
ones) that it is taking about 30 CPU clock cycles per term calculation on these 
older generation CPU's, so that seems about right and explains why we need at 
least about a hundred thousand 

Re: newruntime doesn't work with threads:on...

2019-08-04 Thread GordonBGood
@mratsim, you had already reported the issue as at the following: 
[https://github.com/nim-lang/Nim/issues/11844](https://github.com/nim-lang/Nim/issues/11844).

@Araq just (hopefully ;-) ) fixed the issue with 
[https://github.com/nim-lang/Nim/commit/7024bb800ccf2c3e075c5dd605b8384a399ffe53](https://github.com/nim-lang/Nim/commit/7024bb800ccf2c3e075c5dd605b8384a399ffe53),
 which turned out to be a very simple fix: just discard'ing the returned value 
from an atomicDec.

Thank you very much, @Araq, for this timely response. Now just have to compile 
the DEVEL branch or wait for the next release to check it out.


newruntime doesn't work with threads:on...

2019-08-04 Thread GordonBGood
Now that the \--newruntime is starting to stabilize, I really wanted to try it 
with \--threads:on as one of the worst problems with threading is not being 
able to pass closures (which contain a hidden ref to the captured bindings) 
across threads. I had high hopes that the newruntime would fix that as I 
understand that the hidden ref then becomes an owned ref which should be able 
to be passed, although perhaps not yet until the threads, channels and 
threadpool modules get refactored to match.

We can't even try it yet because of an error in the new runtime: "[FATAL] 
unpaired dealloc" even for a "Hello World" program that doesn't actually use 
any threading features!

It seems to me that one of the main reasons for the new runtime is that it will 
allow standard data structures such as ref object's, seq's, string's, anything 
containing a ref, and closures (which contain a hidden ref) to be more easily 
used across threads such as by spawn and may make channels more consistent to 
use anywhere. It may also mean that many fewer deepCopy's are needed across 
threads when ownership is clearly transferred.

Any ideas on why this is and when it might get fixed? 


Re: seq[owned T] and delete()

2019-08-04 Thread GordonBGood
@Stefan_Salewski:

> Setting the ref to nil seems to compile.

It doesn't work either way under version 0.20.2; I think it only works on HEAD 
version, perhaps because there was a patch as a response to the issue you 
raised before - so I can't try it...

If it works when the value being changed is set to nil first, then the reason 
is that the copy semantics have perhaps been changed to allow an owned copy 
when there is no l-value being copied over, with the setting to nil causing a 
destruction of the old value so that it becomes the accepted case. I think it's 
working as it should.

The reported error is then correct as the delete is (at least perceptually) 
trying to copy a nil into the value at index one, which is explicitly 
disallowed.



Re: Fastest Prime Sieve, in Nim

2019-08-03 Thread GordonBGood
@BLM2, I'm still working on this. Comparing single-threaded performance is like 
racing cars where each is limited to firing on only one cylinder when they have 
at least four and maybe six, eight, twelve, thirty-two or sixty-four: one can 
compare performance per cylinder, but some cars and driving techniques may not 
scale across the different cars under this limitation. Anyway, even if one is 
going to compare this way, it should be on the basis of performance in CPU 
clock cycles per core-pair, since HyperThreading (HT)/Simultaneous Multi 
Threading (SMT) thread pairs usually share L1 and L2 cache resources on modern 
CPU's.

That is the problem with Dana Jacobsen's single-threaded sieve comparison 
chart 
([http://ntheory.org/sieves/benchmarks.html](http://ntheory.org/sieves/benchmarks.html)),
 although it is somewhat useful for comparison by scaling clock speeds as 
described below.

The top contenders in her chart are likely no longer the fastest, as Huang 
YuanBing's "ktprime" (K Tuple Prime), written in C 
([https://github.com/ktprime/ktprime](https://github.com/ktprime/ktprime)), is 
actually up to about 10% faster than Kim Walisch's "primesieve" when run single 
threaded producing 'k'=1 (the non-k case), as one can determine by scaling his 
(single-threaded) 1e10 and 1e12 results on his i7-6700 CPU running at 3.9 
Gigahertz against Dana Jacobsen's single-threaded tests run on an i7-6700K CPU 
running at 4.2 Gigahertz. Unfortunately, Dana doesn't seem to see comparing 
single-threaded results the way I do (or, more likely, she is comparing many 
algorithms that haven't been written to be multi-threaded), and "ktprime" 
hasn't yet had multi-threading added anyway, but I see no reason it wouldn't 
hold its lead over "primesieve" when multi-threading is added.

The code for "ktprime" is much more complex than yours, as it is more general 
in being able to count/find "K-Tuple Primes" in the general case and not just 
Twin Primes as your code does; part of the extra lines of code are also due to 
it being written in C and not Nim, as well as it having a full command line 
interface rather than just the basics, but the general principles of using 
"PGX" wheel factorization generators are the same. Your trick of boiling down 
the performance by eliminating some sieving of residual bit planes that aren't 
required for Twin Primes would still beat it for that special case if the same 
optimizations he uses were applied to your code, but that is very much a 
special case.

Note that most newer algorithms don't even bother with sieving to a billion, as 
that is now such a trivial case (about 40 milliseconds multi-threaded on a 
modern desktop) for modern algorithms, and simple tricks that only work for 
these "smaller" sieving ranges can change the outcome but aren't generally 
applicable to larger ranges: thus, the comparisons should really start at ten 
billion or (even better) a hundred billion. For instance, "primesieve" runs 
about 20% faster sieving to a billion on a CPU with a 32 Kilobyte sieve buffer 
size than with a 16 Kilobyte sieve buffer size when run single threaded, but 
these gains don't translate to multi-threaded use, as two threads share this 
same cache, meaning that it effectively becomes about the same performance as 
if it were only 16 Kilobytes per thread. Also, these better algorithms are too 
"grainy" at small ranges, with possibly a single thread task sieving up to 
about a four billion number range at a time and a processor with eight threads 
thus processing a range of about 32 billion per step!

What really counts in speed is the ability to sieve these larger ranges 
efficiently and in a way that scales well with multiple threads. Kim Walisch's 
"primesieve" only scales moderately well with increasing range due to HT/SMT 
threads' conflicting memory accesses in the shared CPU L1 cache as described 
above, with "ktprime" possibly scaling better and my own work, where I always 
use just half the L1 cache, definitely scaling for multiple threads in a direct 
way.

> ...and may finally try to do a C++, since people want to see a C++ version.

I don't know that you need to go that far if you don't want to. I personally 
dislike C/C++ intensely and never write anything in them longer than a couple 
of hundred lines of code, which is why I am with Nim: most people capable of 
reading and understanding C++ code should be able to easily translate your few 
hundred lines of Nim code, and if it is just for the purposes of comparative 
benchmarks, it is a trivial task to install the Nim compiler, which basically 
just generates C or C++ code from your code anyway. But "have at it" if you 
must, and it will be an interesting exercise anyway in comparing the code.

In my experience, Nim can run just as fast for the same algorithm as C/C++ as 
long as one pays attention to memory allocation as in not using the Garbage 
Collector (GC) and is aware of when deepCopy operations will be 

Re: Emscripten/WebAssembly GC considerations?

2019-07-14 Thread GordonBGood
I doubt I have a program error, as it won't compile "Hello World". The problem 
is likely with the Windows implementation, as I've seen referenced before. The 
thing is that Emscripten works fine from C/C++ code, so it seems Nim is doing 
something wrong when compiling in a Windows terminal.

I have MSYS2, so I can try running in that terminal. My machine is running out 
of System space, so I would prefer not to have to install WSL Linux, which I'm 
sure would work.

I don't think there is anything major wrong, Nim just needs some compiler 
options set to handle output to Emscripten.


Re: Nim's future: GC and the newruntime

2019-07-07 Thread GordonBGood
@Arag:

> But again, if you want RC, atomic or not, Nim is giving you the building 
> blocks to do exactly that.

I'm grateful for that capability and have used it with a destructor 
implementation when I wanted to try an algorithm that depended on list-like 
structures without using GC.
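As an illustration of that approach (a hedged sketch, not the actual code referred to: Rc, RcBox, and newRc are made-up names, and the `=destroy`/`=copy` hook spelling is from newer Nim versions than the one under discussion here):


type
  RcBox[T] = object
    count: int
    value: T
  Rc[T] = object
    p: ptr RcBox[T]

proc `=destroy`[T](rc: var Rc[T]) =
  # Drop one reference; free the cell when the last owner goes away.
  if rc.p != nil:
    dec rc.p.count
    if rc.p.count == 0:
      `=destroy`(rc.p.value)  # destroy the payload first
      dealloc(rc.p)
    rc.p = nil

proc `=copy`[T](dest: var Rc[T], src: Rc[T]) =
  # Copying just bumps the count; no deepCopy of the payload.
  if dest.p == src.p: return
  if src.p != nil: inc src.p.count
  `=destroy`(dest)
  dest.p = src.p

proc newRc[T](v: sink T): Rc[T] =
  result.p = cast[ptr RcBox[T]](alloc0(sizeof(RcBox[T])))
  result.p.count = 1
  result.p.value = v

proc get[T](rc: Rc[T]): lent T = rc.p.value

var a = newRc(@[1, 2, 3])
var b = a            # shares the same cell; count is now 2
echo b.get           # both names see the one payload


Run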

> The real issue is that RC cannot even detect cycles so you're left with 
> having to run a separate leak detector.

Would it help to reject cyclic data completely (except at the global level) as 
per what Elm does currently? Or do you see use cases where we can't live 
without cyclic data? According to the Elm people, cyclic data isn't a problem 
if you can't create it in the first place and they apparently think that they 
can live without it.

I see that "introducing a cycle (leak) detector" could cost a lot in cycles and 
is greatly to be avoided.

> Introducing cycles with B/D is much harder to accomplish.

I don't see it being hard to introduce cycles with B/D at all if we want B/D to 
be able to handle everything that GC currently can. The following trivial 
example works fine with current GC, but would be a cycle that would require 
cycle detection or that the special destruction extensions be used if the ref 
were a B/D owned ref or a combination of owned ref and dangling ref: 


type
  List[T] = ref object
    head: T
    tail: List[T]

proc makeCycle[T](x: T): List[T] =
  result = List[T](head: x); result.tail = result

var ones = makeCycle(1)

for _ in 1 .. 20: stdout.write ones.head, " "; ones = ones.tail


Run

Yes, this is a list which isn't really very useful, but the same principles 
apply if it were a more useful structure such as a binary tree (with cycles).

It's possible to fix this for B/D by using the extension(s) proposed by B/D for 
destruction, but those, in turn, cost cycles and don't seem to cover deepCopy 
if one truly needs it. Again, deepCopy extensions can be implemented in a 
similar way, but they also cost cycles, although I suppose that would be 
acceptable given that deepCopy would be needed so rarely.

As well, B/D doesn't seem to apply in all cases where one must have multiple 
owners, such as for the sub-nodes in a binary tree, as the only alternatives 
would seem to be deepCopy ("Clone" for Rust), which is extremely wasteful for a 
large tree structure, having GC as an alternative (I've come to see your 
objections to it, so I don't really want to have to revert to using that), or 
RC (which is Rust's solution in spite of being "incomplete").


Re: Nim's future: GC and the newruntime

2019-07-06 Thread GordonBGood
@Sixte:

> That said, I'd like to complete the "storage classes" aka "reference types" 
> (abbr reft) with the following system...

I'm glad to see you take a combination of the Rust and my ideas and run with it 
(although I haven't seen your RFC as suggested by @Araq yet).

However, the more I see this expanded, especially to the idea of "lifetimes" 
beyond what Rust calls implicitly inferred "static" global lifetimes, the more 
I hate it. If we need to expand beyond "static" lifetimes to make this work, I 
see that Nim would be just another Rust, and I don't think any of us want that. 
The only advantage of it for Nim that I can see is mutability type safety, and 
as Nim has mostly been built around the concept that arbitrary mutability is a 
given whenever the var keyword is used, it's going to be a huge change to use 
controlled mutability. Part of it is that the Nim ecosystem prefers using 
"value"/"struct" objects as the basic building block of new types rather than 
"reference" types built on the heap with just a pointer to them; although the 
second is possible, it doesn't seem to be encouraged by the ecosystem. For 
instance, the crucial string and seq types are primarily objects that contain a 
pointer to heap data rather than a pointer to an object on the heap that 
contains all the fields related to the structure. There are 
advantages/disadvantages either way, but one of the major disadvantages of the 
current system is that special overloaded proc's need to be used to manually 
deal with edge cases, such as GC_ref/GC_unref for string and seq, rather than 
just being able to handle them generally as standard ref types.

Although in one of the other forum threads @Araq has convinced me of the 
advantages of B/D, when fully implemented as "the team" seems to be moving 
toward, it is becoming more and more a subset of what Rust does: from what I 
understand, "ownership" no longer just applies to the new ref type but can be 
applied to everything, especially objects and tuples, and if extended to those, 
why not to all primitives as well, as in SomeNumber, enum, set, proc, func 
(which can include the closure versions of them), etc. As I pointed out in my 
post, that makes it even more possible to implement the "kind of" affine type 
safe system similar to Rust's, to which I have no objection as long as it 
doesn't overly impact program complexity. However, I'm afraid that it will make 
programs considerably more complex, as per Rust.

My other objection to B/D and/or Rust's ideas is that they don't work for 
everything, meaning we still need either GC or RC. Rust chose RC; many here 
seem to hate pure RC, objecting to both its speed and the fact that one can't 
use cyclic data with it. The reason Rust must support either RC or GC is the 
same as ours: some data structures such as lists usually have multiple owners 
of their nodes, and duplication by deepCopy ("Clone" in Rust terms) is too 
expensive for structures that may be linked millions of items deep. Rust also 
has a huge problem handling the captured environment for closures, as it 
doesn't fit the ownership paradigm, but that seems to be alright (so far) for 
us.

According to my tests, RC speed isn't that bad when comparing apples with 
apples: Current GC can't efficiently be used across threads, and 
non-thread-safe RC is actually faster than the current GC. If the current GC 
were expanded at great difficulty to include being able to be used across 
threads, it would be slower and likely about the same speed as atomic RC. Plus 
there are all kinds of optimizations we can make with RC as to improving 
efficiency of allocations that are likely much more difficult with GC and 
eliminating atomic ref counts by using the simpler of the Bacon/Dingle 
techniques for many use cases is not difficult. I don't really see speed being 
an issue, just as Rust doesn't seem to see it as an issue other than offering 
the option of non atomic RC when it isn't necessary to make it a little faster.
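The speed difference being weighed here is essentially a plain increment versus an atomic one on the count field; a tiny sketch (using the std/atomics module, an assumption beyond this post's era):


import std/atomics

var plainCount: int          # what a non-thread-safe RC would use
var sharedCount: Atomic[int] # what an atomic (thread-safe) RC needs

for _ in 1 .. 1_000_000:
  inc plainCount             # compiles to a simple add
  atomicInc sharedCount      # a locked read-modify-write, several times slower

echo plainCount, " ", sharedCount.load


Run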

So the real issue is RC not handling cyclic data. Do we really need cyclic 
data? Elm has taken the step of forbidding cyclic data except at the global 
level to make RC a future option (to perhaps be applied for output of 
WebAssembly) and it doesn't seem to be much of a limitation. One can still 
define cyclic lists/structures at the global level (deferred definition tails 
because Elm doesn't have pointers exposed to the programmer) if one really 
needs them, but in most cases there are other (and more memory efficient) ways 
to get the same effect. In what cases must one have cyclic data at a non-global 
level? I've been able to apply such things as "Ring" lists, but there is almost 
always an alternate data structure that turns out to be more efficient as to 
memory use and/or execution speed. Remember that an ordinary linked list of 
numbers takes as much space for the tail links as it does for the actual 
storage of numbers, plus there is 

Re: Nim's future: GC and the newruntime

2019-06-30 Thread GordonBGood
**TLDR:** Excuse this long post, of which the gist is that @Araq seems to be 
moving toward a more Rust-like ownership model (into which Bacon/Dingle - B/D - 
principles would seem to fit), but that this model does not obliterate the need 
for some sort of GC/RC allocation/deallocation when multiple ownership is 
required. It also includes a discussion about "type safety" by controlling 
mutation a la Rust, which could optionally also be included. Finally, I give an 
example of a little algorithm that is easy to implement with GC or RC but 
seemingly can't be implemented using only B/D.

@leorize:

> Nim's new runtime is different from Rust's ownership/borrowing/lifetime 
> system...

@dom96:

> Indeed, @leorize is completely right from what I understand as well.

You are both correct as to the existing documentation for the "owned ref" 
proposal, but it seems that @Araq has moved beyond basic Bacon/Dingle (B/D), as 
per this post in response to my statement in that other big thread:

> That would mean you've extended the definition of "owned" to beyond just 
> owned ref as was in the spec 2...

To which @Araq replied with an example that "owned" now applies to objects (as 
well as ref's) as per his post: 
[https://forum.nim-lang.org/t/4743#30882](https://forum.nim-lang.org/t/4743#30882)

> **I'm afraid it will take a lot to convince me to bother annotating my types 
> with ``owned``...**

Well, we already have some other new keywords, as in sink and lent and move and 
wasMoved, that don't have to be used if one doesn't worry about the 
optimization of making sure things are "moved" instead of "copied" (which often 
means a deep copy if it isn't a "last use of"). With ownership now able to be 
applied to possibly "everything", it may be that we rarely have to actually use 
the word "owned", as "everything" will automatically be created as "owned" and 
the default for owned things is to move them; @Araq has also said that "owned" 
things would be implicitly cast to "dangling" things when the semantics are 
"copy", so again nothing would necessarily have to be added. It seems to me 
that if we don't add the type safety of Rust and if the Nim compiler can infer 
the implicit "copy" (made much easier when "everything" is owned), then there 
may be very few if any changes required for current libraries.

**A possible "kind-of" Affine Type Safety Control Model a la Rust - an option**

However, the original spec also mentioned introducing mutation type safety much 
as Rust has in a set of rules:

  1. Everything starts out "owned" as suggested above, meaning that Rust 
doesn't need an "owned" keyword as it is the default.
  2. Owned things can be mutated by their owner only if there are no mutable 
references, implying in Nim terms that they can be "moved" only if there are no 
"mdangling" references and still can never be copied.
  3. Owned things can still be mutated no matter how many immutable references 
there are; they just can't be destroyed, which is as per the specification.
  4. There can be only one mutable reference to anything, and when there is 
one, owned references can no longer mutate as above.



If this were implemented, we would have Rust's "kind of" affine type system 
with "dangling" = "&" and "mdangling" (mutable) = "&mut", with the limitation 
that all lifetimes would be "static" in Rust terms; however, that would limit 
the use of the type system. If this were desired, it could be turned on by a 
switch of some kind, and in so doing almost all current libraries would be 
broken (whilst this was enabled), meaning they are not "type safe" by this 
definition.

It really should be a separate RFC which is an optional "add-on" to the "owned 
ref" one, but if this idea of everything is owned idea is pursued, we really 
need a "spec 3" to cover it with this add-on as an option to it.

As safety isn't one of Nim's stated objectives of "efficiency, conciseness, and 
elegance" (but maybe should be), it is kind of beside the point of what to do 
with Nim's memory management system.

**Bacon /Dingle isn't complete without some sort of RC for multi ownership**

The problem with implementing ownership as a memory management system is that 
it doesn't work in all cases (or conflicts with Nim's primary objectives) just 
about as much as the current GC's has problems in other areas.

Let's look at the primary structure that limits the usability of each, as 
follows:

  1. Nim encourages the use of "value" structures, as in object/tuple, as the 
basic building block for new types; ref object or ref tuple or ref anything is 
supported, but these aren't commonly used as the base for new types, although 
the "tables" library also offers ref versions. This is currently quite ugly 
when one needs to share things, as in a separate "shared table" library (shared 
lists, etc.) and all the possible variations of shared table/table, shared ref 
table/ref table, etc. It is also ugly when one needs to 

Re: Fastest Prime Sieve, in Nim

2019-06-27 Thread GordonBGood
> Please provide...

Working on it. I am currently sidetracked by considering the implications of 
the new runtime memory management proposal, to see how it impacts my work and 
whether applying it can make things better. The old memory model's lack of 
support for threading is very likely the reason you are experiencing some very 
serious crashes when using parallel execution.

For the algorithm as to methodology, the general idea is pretty well described 
in "Chapter 4" of my answer on how to implement the Page-Segmented Sieve of 
Eratosthenes in JavaScript at: 
[https://stackoverflow.com/questions/39312107/implementing-the-page-segmented-sieve-of-eratosthenes-in-javascript/55761023#55761023](https://stackoverflow.com/questions/39312107/implementing-the-page-segmented-sieve-of-eratosthenes-in-javascript/55761023#55761023).

The JavaScript version of Chapter 4 isn't included because it wouldn't fit and 
because I stepped back from JavaScript (I hate it for anything more than about 
100 lines) in favour of finishing the Nim version, with the plan to compile 
that to JavaScript as a quick way to do it. It's an algorithm, so it is 
applicable to any programming language within the limitations of what the 
language allows. For instance, JavaScript doesn't have pointers and thus can't 
have the extreme loop unrolling applied, but even the "odds-only" code does 
have the optimizations to cull about twice as fast by culling over one of the 
mask patterns at a time for the eight different mask patterns.
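The "one mask pattern at a time" optimization mentioned above can be sketched as follows in Nim (a hedged illustration with made-up names, assuming an odds-only bit-packed sieve where the odd number m is represented by bit (m - 3) div 2):


proc cullPrime(buf: var openArray[byte]; p: int) =
  ## Mark all odd multiples of the odd prime `p` starting at p*p, one of
  ## the eight fixed bit masks at a time, so each inner loop uses a
  ## constant mask and a constant byte stride of `p`.
  let start = (p * p - 3) div 2        # bit index of the first composite
  for j in 0 ..< 8:                    # the eight mask patterns
    let bitIdx = start + j * p         # successive hits step `p` bits...
    let mask = byte(1 shl (bitIdx and 7))
    var byteIdx = bitIdx shr 3
    while byteIdx < buf.len:
      buf[byteIdx] = buf[byteIdx] or mask
      byteIdx += p                     # ...same bit-in-byte position
                                       # repeats every 8*p bits = p bytes

var sieve = newSeq[byte](1024)  # represents the odd numbers 3 .. 16385
for p in [3, 5, 7, 11]:         # a few base primes for illustration
  cullPrime(sieve, p)


Run

Because p is odd, the bit position within a byte cycles through all eight values, so each pass keeps its mask in a register while striding whole bytes; this is the core of the roughly two-times cull speedup described above (the sketch may mark a few bits past the top of the range inside the final byte, which a real sieve would clamp).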

Even the JavaScript version in odds-only can sieve to a billion in about a 
second on a fast modern desktop with a recent good JavaScript engine such as 
Chrome V8/FireFox Spider Monkey/Safari. Since using the modulo residue bit 
planes discussed here leaves the complexity per bit plane almost exactly the 
same, and the number of operations is reduced exactly as per the Wikipedia 
tables, this will be almost exactly a factor of four faster when wheel 
factorization is done (including using the "combo" technique with pre-culling 
some larger primes than the basic residue wheel) or about a quarter of a second 
in JavaScript according to the above environment.

With Nim and extreme loop unrolling using pointers without JIT compilation, it 
should be at least two and a half times faster than that, then multi-threading 
is added for a further factor of the number of effective cores...


Re: Fastest Prime Sieve, in Nim

2019-06-27 Thread GordonBGood
> the implementation is now even faster than primesieve over the tested ranges.

As I've noted before, your implementation is only faster than "primesieve" when 
it limited to the task of finding twin primes, for which your trick reduces the 
work. If one took "primesieve"'s implementation and used your trick to only 
find twin primes, it would be about two and a half to three times faster. Even 
better, my implementation is simpler and already divides the work into 48 or 
480 modulo residue bit planes, so it would be even easier to adapt it to only 
sieving the limited planes needed for twin/triplet/etc. primes.

> primesieve is a very large code base,

You are right that "primesieve" is a large complex program. However, it has 
grown to be overly complex and that is part of what is costing an extra 20% in 
execution time. My program is just over 500 lines of code including comments 
and performs about 20% better than "primesieve" for just counting primes even 
though it also makes (automatic) adjustments in steps for different size 
sieving ranges.

> fine tuning ... fine tuning...

A sign that one doesn't fully understand one's algorithm is when extensive 
empirical adjustments are made for performance; in contrast, one can predict 
and know exactly why "primesieve" and my algorithm are as fast as they are. One 
of your bottlenecks is your inner loop, which would run at about 3.5 CPU cycles 
per culling operation if it were run within a single CPU L1 cache size most of 
the time. When run multi-threaded on a Hyper-Threaded CPU with a 32 Kilobyte L1 
cache, as most current Intel CPU's have, you can only use 16 Kilobytes per 
thread (half) for the highest culling speed. Even 3.5 cycles per cull isn't 
fast, as extreme loop unrolling gives as little as an average of about 1.25 
cycles per culling operation for small base prime sizes, and there is an 
intermediate method that can reduce cull operation cycles for smaller base 
prime values to about 1.875 cycles.

> segment sizes...

As above, making your segment sizes too small limits your range and/or makes 
culling over larger ranges too inefficient, as too many base primes don't "hit" 
the segment with your naive implementation; too large a segment size with your 
simple implementation will mean more/most base primes "hit", but the culls are 
slower because of cache associativity. That is why programs use different 
schemes for different ranges of base primes.

> It takes so much time to run tests above that, I'm resource limited to fully 
> verify optimizing it over the full 64-bit range.

One doesn't actually test over the full 64-bit number range, which would take 
hundreds of years at your rate; one uses the segmentation to test over a 
limited range at every order of magnitude of growth. But your current algorithm 
is limited in its use of fixed-size segments, so it won't be very efficient 
over ranges above about 1e14.

You are putting a lot of effort into what is a very limited use that is only 
proven useful for finding twin primes (not triples and higher) faster. It seems 
to me that the effort could be better spent in doing something more general, 
but YMMV.


Re: Fastest Prime Sieve, in Nim

2019-06-21 Thread GordonBGood
@BLM2:

I'll reply here one more time before switching to email if necessary, as a few 
peripheral issues are open and readers may want to know the outcome:

> 1) The Classical Sieve of Eratothenes (CSoE) needs to search the whole 
> integer number space up to some N (see 
> [https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes\)).
>  The Optimized SoE (OSoe) seeks to reduce the number space (by skipping even 
> number, factors of 3, 5, etc) but fundamentally still use as their 
> starting/end points the whole number space up to N.

In fact, the linked Sieve of Eratosthenes article specifically mentions the use 
of wheel factorization and has tables showing the effect of "reduced number 
space"; for instance, what you would call "PG7" (wheel factorization using the 
prime factors two, three, five, and seven) for a range of a hundred billion 
(10^11) reduces the required number of cull operations from about 276 billion 
for the "classical" SoE to about 36.2 billion due to "reduced number space", 
and further to about 22.3 billion culling operations for what you would call 
"PG19". The odds-only SoE is just the trivial case of "PG2" and reduces the 
number of culling operations to about 113 billion for this range, as per the 
table: 
[https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Algorithmic_complexity](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Algorithmic_complexity).
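As a rough illustration of how such cull-operation counts can be checked 
empirically, here is a minimal sketch (the proc name and layout are my own 
invention, not code from either implementation) of an odds-only ("PG2") sieve 
that counts its culling operations:

```nim
# Minimal odds-only ("PG2") Sieve of Eratosthenes that counts its
# composite-number culling operations; index i represents the odd number 3 + 2*i.
proc oddsOnlyCulls(limit: int): tuple[primes, culls: int] =
  if limit < 3: return (primes: (if limit >= 2: 1 else: 0), culls: 0)
  let top = (limit - 3) div 2          # index of the largest odd <= limit
  var composite = newSeq[bool](top + 1)
  var culls = 0
  var i = 0
  while true:
    let p = 3 + 2 * i
    if p * p > limit: break
    if not composite[i]:
      var j = (p * p - 3) div 2        # start culling at p^2
      while j <= top:
        composite[j] = true
        inc culls
        j += p                         # step of p in index space = 2*p in value space
    inc i
  var count = 1                        # account for the prime 2
  for k in 0 .. top:
    if not composite[k]: inc count
  result = (primes: count, culls: culls)

echo oddsOnlyCulls(1_000_000)  # primes = 78498
```

For growing ranges, the ratio of culls to range should approach the table's 
figure for odds-only sieving (about 1.13 times the range at 10^11).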

> The OSoEs talk nothing about Prime Generators, residues, etc.

The article doesn't talk about actual implementation methods for wheel 
factorization because that is beyond its scope of describing the SoE algorithm; 
however, it has a formula whereby one can calculate the estimated number of 
culling operations for a given range and can then confirm that estimate using 
whatever implementation one chooses. It also goes beyond your implementations 
as in using optional "pre-culling" whereby for a given "PGX" one can 
pre-initialize with a wheel pattern for even larger primes than "X" and shows 
the resulting reduction in culling operations by doing this.

> 2) My methods are fundamentally different in conception and implementation. 
> They are based on Prime Generator math...

Your methods use one of the most efficient **implementations** (perhaps the 
best in general terms) but as discussed above "Prime Generator Math" is just 
wheel factorization math as has been well known for many years in the field and 
is thus nothing unique.

> They are also modern, as they are inherently structured to be optimally done 
> in parallel.

As I mentioned in my first post, your method of doing this in parallel is just 
one possible **implementation** and is exactly the one chosen by Professor 
Caldwell in the article I quoted, which he has used for perhaps the last 20 
years. Before that, there was a 1977 paper on an implementation for an IBM 
370/136 mainframe that implemented a page-segmented SoE and, at the end of the 
article, proposed refinements amounting to wheel factorization for "reduced 
number space": 
[http://www.math.ubc.ca/~gerg/teaching/592-Fall2018/papers/1977.Bays.pdf](http://www.math.ubc.ca/~gerg/teaching/592-Fall2018/papers/1977.Bays.pdf).
 Application of wheel factorization to the SoE is well known in the art.

Your implementation of running the individual modulo residue bit planes in 
parallel is not necessarily the best one in all situations, although it may be 
the best for the specific problem of finding the twin primes over the ranges 
you propose. Running each successive segment over all modulo residue bit planes 
per thread has the advantage that it scales to any number of threads, as on a 
supercomputer or a compute platform such as the computation facilities of a 
modern GPU; although the code for calculating starting addresses per segment 
is a little more complex, it doesn't then need to be run as often.

> It's interesting you raise an issue of naming my work after myself. I mean 
> after all, if you look on Wikipedia, besides the SoE, it has pages for the 
> Sieve of Atkins, even though Daniel J. Bernstein as a grad student did all 
> the programming for it 
> [https://en.wikipedia.org/wiki/Sieve_of_Atkin](https://en.wikipedia.org/wiki/Sieve_of_Atkin),
>  and Sieve of Sundaram 
> [https://en.wikipedia.org/wiki/Sieve_of_Sundaram](https://en.wikipedia.org/wiki/Sieve_of_Sundaram)...

First, the Sieve of Atkin is deservedly named after Professor Atkin because it 
is a unique sieve with an academic paper which has been defended and recognized 
as being distinctly different from the Sieve of Eratosthenes. In fact, it also 
uses "baked in" wheel factorization of what you would call "PG5", but that 
comes about naturally due to the unique algorithm based on quadratic equations. 
What the paper and reference implementation by Daniel Bernstein fail to 
acknowledge is that a practical page-segmented version of the SoA is very 
difficult to realize

Re: Fastest Prime Sieve, in Nim

2019-06-21 Thread GordonBGood
@BLM2:

Your results are as valid for AMD or other processors as they are for Intel. 
The computer I am using is just a Windows tablet with an Intel Atom x5-Z8350 at 
1.44 GHz (when multi-threaded), which is a great machine for the price but 
definitely at the low end of the performance scale: primesieve version 7.4 
takes about 40 seconds to find the twin primes to a hundred billion whereas 
your program takes about 25 seconds. I had a brief chance to play with a new 
laptop with an AMD Ryzen 5 2500U at 2.0 GHz multithreaded (slower than an 
equivalent Intel i5 in instructions per clock cycle by perhaps 20% to 25%, but 
much less expensive) and the ratio was a little closer than on my Windows 
tablet, but still a little bigger than you report for your Intel Core i7 
notebook.

So I read your articles and your code to see why it's faster...

The following is not really on the topic of Nim but rather on your algorithm...

While it's great that you are doing innovative things with Nim as your chosen 
language, naming this sieving algorithm after yourself as something unique is 
stretching things: it is just a Sieve of Eratosthenes with a high degree of 
wheel factorization applied using residue bit planes, as has been well known at 
least since Bernstein of Bernstein and Atkin fame, and I recall other 
implementations that used this same trick of separating the modulo residue bit 
planes prior to him; among them the reference from Professor Chris Caldwell: 
[https://primes.utm.edu/nthprime/algorithm.php](https://primes.utm.edu/nthprime/algorithm.php)
 with a quote from that page as "The modified algorithm actually sieves many 
times, once for each **residue relatively prime** to some number with many 
small prime factors. (I used 30030 = 2*3*5*7*11*13.) So, for example, in the 
first sieve, all of the numbers have remainder 1 when divided by 30030, so 
instead of having the flag bytes represent 0,1,2,... as in the standard sieve 
algorithm, they represent 1,1+30030,1+2*30030,... This may sound like it 
creates more work than before, but it turns out to be faster since we only need 
to consider remainders which are relatively prime to 30030. In this way, the 
multiples of 2,3,5,7,11, and 13 are automatically eliminated, making the 
algorithm roughly 6 times faster. What's more, the modified algorithm can be 
easily **parallelized** , as many computers **can each work on separate 
residues**. "

However, you are right that it is a great algorithm and something that Kim 
Walisch of primesieve missed, as he persisted in byte-packing the PG5 residues 
into a byte "because it works out - eight values in eight bits" and missed that 
by doing so he reduces the effective range covered by the CPU L1 cache, with a 
subsequent loss of efficiency. However, he made up for this loss in ways that 
you are currently not using by making the inner composite-number culling 
loop(s) very efficient through extreme loop unrolling, which reduces his time 
per cull operation by a factor of about two compared to your implementation, 
with the technique described in his GitHub repo. This could be applied to your 
code to get the same gain and isn't that hard in Nim, as you can use a hygienic 
code-generation macro for the requisite 32 cases, using about 100 lines of code 
to generate about 1600 lines at compile time. So your code could be about twice 
as fast as it is now, especially for ranges of hundreds of billions or so.
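As a toy sketch of that hygienic code-generation idea (this is not Kim 
Walisch's actual kernel and the `unroll` macro name is my own invention), a 
Nim macro can stamp out an unrolled body at compile time:

```nim
import macros

# Toy illustration of compile-time code generation for loop unrolling:
# `unroll(4): body` expands into four consecutive copies of `body`.
# The real culling kernel would instead generate ~32 specialized cases,
# one per (bit-position, step-pattern) combination.
macro unroll(n: static int; body: untyped): untyped =
  result = newStmtList()
  for _ in 1 .. n:
    result.add copyNimTree(body)   # fresh copy of the AST per repetition

var count = 0
unroll(4):
  inc count
echo count  # prints 4
```

The same technique, with the macro emitting the specialized culling statements 
instead of a trivial body, is how the roughly 100 lines of generator code can 
expand to about 1600 lines of unrolled culling code.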

What you could claim as your own original implementation (and have ;-) ) is 
your method of reducing the number of composite-number culls when finding twin 
primes by using the above bit planes but computing only those bit planes that 
have an immediately adjacent bit plane (prime-residue wise). For a range of, 
say, a hundred billion, 1485 of the 5760 residues in the PG13 modulo generator 
you use qualify as having an immediately adjacent modulo bit plane "to the 
right", and you calculate two residue bit planes for each, so 2970 bit planes' 
worth. You have thus reduced the number of culls to about 10 billion for this 
range (about 10% of the range), whereas primesieve culls them all and then 
checks for twin primes, so it does about 30 billion culls (about 30%). If your 
inner loop were as efficient as his, you should therefore be about three times 
as fast; but because your inner loop isn't as efficient, you are only at about 
70% of his execution time (rather than 33%), although you gain more with 
increasing range because processing bigger ranges per segment makes more 
efficient use of the CPU L1 cache.
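The residue counts quoted above can be verified in a few lines of Nim; this is 
just a check of the arithmetic, not part of either sieve:

```nim
import math  # for gcd

# Of the residues coprime to the "PG13" modulus 30030 = 2*3*5*7*11*13,
# count how many also have a coprime residue two above them
# (i.e. a potential twin-prime position).
const modulus = 2 * 3 * 5 * 7 * 11 * 13   # 30030
var total, twins = 0
for r in 0 ..< modulus:
  if gcd(r, modulus) == 1:
    inc total
    if gcd((r + 2) mod modulus, modulus) == 1:
      inc twins
echo total, " ", twins  # prints "5760 1485"
```

The 1485 figure also follows analytically as the product of (p - 2) over the 
odd wheel primes: 1 * 3 * 5 * 9 * 11 = 1485.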

Understanding why your algorithm is faster, it's easy to see that it will 
always be faster due to the reduced amount of work for any CPU when compared to 
primesieve on that same CPU, with the gains greater for low end processors such 
as mine or current AMD Zen due to them not being 

Re: A few questions about procs

2019-06-16 Thread GordonBGood
> or am I too pythonic in my way of thinking?

You probably are too "pythonic" in your way of thinking and need to get used to 
static typing rather than dynamic typing. Static is safer when you get used to 
it. To pass an iterator as a parameter: 


proc inputiter[T](x: iterator(): T) =
  for v in x(): echo v


Run

To call it for instance with iterators made from a seq, array, and string; note 
that the iterator can be created in the call as follows: 


inputiter(iterator(): int =
  for e in @[1,2,3]: yield e)

inputiter(iterator(): int =
  for e in [4,5,6]: yield e)

inputiter(iterator(): char =
  for c in "abc": yield c)


Run


Re: Owned refs

2019-06-14 Thread GordonBGood
@Araq, the following even more "craziness" is not off-topic as the original 
blog post opened the idea of controlling mutability in the following:

> With ownership becoming part of the type system we can easily envision a rule 
> like "only the owner should be allowed to mutate the object"

Now this won't work generally as explored further in that blog post as there 
are many cases where the "dangling" reference must be able to mutate the 
object. However, there could be controlled mutability as in languages like Rust 
and Pony, etc. which allow reference mutation when there is only one mutator 
reference, with the following changes:

1\. The "owned"/"dangling" "wrappers" are applied to every entity when created 
(as "owned") including primitives, not just the ones according to the "simple 
rule" where they either reference a heap structure or contain something that 
references a heap structure. This actually makes things simpler because the 
compiler doesn't have to check for the "simple rule", just "wrap" everything.

2\. Now, we have three types of distinct "wrapper": "owned", "dangling 
immutable" and "dangling mutable", with as said above everything created as 
"owned".

3\. Destruction rules for all three stay the same as now: only "owned" does the 
actual destruction if the reference counts are zero (if we are checking) but we 
need to keep separate track of the reference counts for "dangling mutable" and 
"dangling immutable". As there can only ever be one "dangling mutable" at a 
time, that count will be zero or one.

4\. Then "owned" types still can't be assigned/copied and the rule is now 
changed so they can only be moved if the "dangling mutable" count is zero.

5\. The "dangling mutable" wrapper can't be shallow-copy assigned, only 
shallow-copy moved. When assignment is attempted, it is converted to an 
assignment "in-place" and limited to either moving another "dangling mutable" 
of the same enclosed type, in which case the destination mutable ref count is 
decremented before it is replaced by the source, or the usual move/cast of an 
"owned" of the same wrapped type, in which case the mutable ref count is 
implicitly set back to one. All other attempts to copy/move are compiler and/or 
runtime errors. The proof is that the only ways of copying are from an "owned" 
of the same type (by implicit casting) or from its own type, both of which need 
a ref count of one when the source is one of these.

6\. The "dangling immutable" wrapper is the same as "dangling" now: it can be 
copied or moved and can't be implicitly cast to move to an "owned", but it can 
be cast/assigned to a "dangling mutable" only if its ref count is one, in which 
case it becomes a "dangling mutable" (mutable ref count of one, immutable ref 
count of zero). As now, copying assignment or moving just decrements the 
destination and increments the source immutable ref counts. In other words, 
this type can be shallow copied/moved but not "in-place".

I may have missed some rules or simplification of rules, but I'm sure they can 
be filled in, as this is basically the scheme used by Rust/Pony/languages that 
control mutation type safety.

If this were done, I believe we would have the type safety regarding mutability 
of Rust or Pony but perhaps without the complications and covering the cases as 
mentioned in the blog where a particular wrapper cannot be owned but needs 
mutation. The above rules would seem to be reasonably easy for the compiler to 
enforce, as it already seems to have accommodated verifying that lifetimes are 
long enough, as for lent and var return values. As before, the ref counts can 
be "virtual" ref counts that don't even exist if ref count checking is turned 
off, and are therefore of zero execution and memory cost for the "release" or 
"danger" cases.

It is well beyond B/D (but then we are moving that way anyway) but the 
increased type safety might be worth it?


Re: Owned refs

2019-06-14 Thread GordonBGood
@Araq:

> You are now in crazy-talk-land... "deepDestroy"?

> I'll re-read B/D as per your suggestion to see what I'm missing. I understood 
> that your implementation is "beyond B/D" in overcoming some of its 
> limitations...

I have re-read B/D thoroughly and see we are well beyond their original concept 
although true to the basic concepts. I think what we are developing is even 
better. They had the same problems with destroying potentially cyclic data and 
their proposed extended solution was as follows: 


To overcome these limitations, we propose adding destructors
to Gel, and propose a multi-phase object destruction mechanism.
Gel could destroy an object graph rooted at a pointer P in the
following phases.
1. Recursively descend all objects in the ownership tree
below P, running each object's destructor.
2. Recursively descend again; for each non-owning pointer
in every object, decrement the reference count of the
object pointed to.
3. Recursively descend again; for each object, check that
its reference count is zero and free the object.


Run

Note that they only consider the concept of owned pointers, not the owned 
"object" structures we have expanded to, I think for good reasons, one of which 
is that it means fewer code changes to existing code. Also note the multiple 
recursive passes.

My "deepDestroy" implements this as they suggest but adapted to the outer 
controlling "owned" being an object rather than a pointer. They seem to have 
missed the precaution of setting the pointers (all of owned, dangling, and 
primitive) to nil **before** we destroy what they reference from a temporary 
copy, so that if a recursive destruction path leads back to the root owned 
object itself, we don't enter a data race: the recursive path will be nil. When 
applied to any dangling pointers, they will only be decremented once even if 
reference count checking is on, but we do need to first do a scan "destroying" 
(possibly decrementing the reference counts of) dangling pointers as proposed, 
in case any of these pointers are the same as some of the owned pointers. That 
means every pointer must be "wrapped" in an owned/dangling object so we know 
which are which, and that means these interior owned/dangling "wrappers" can't 
be elided.

So the concept of "deepDestroy" isn't so crazy and is in line with what B/D 
proposed.


Re: Owned refs

2019-06-14 Thread GordonBGood
@Araq:

> In what sense is owned "a wrapper"?

I think of "owned" as a "wrapper" because it contains a "T" type and adds some 
behavior to it, namely the B/D "hooks" along with the other "dangling" behavior 
for its alternate distinct type.

> And why does that cause problems with "nesting"?

I may be wrong in my reasoning, but I am envisioning a complex "owned" 
structure that contains nested objects with (owned or dangling) heap reference 
pointers, some of which nested structures (including possibly both tuples and 
objects) may in turn be "owned" because they in turn contain heap reference 
pointers and so on. I am asking the question of what happens when it comes time 
to destroy from the outside in as when the outer "owned" goes out of scope. My 
concern is due to that only objects can have "hooks" yet some of the nested 
items will be objects (or tuples) and others may be pointers of any of the 
three kinds (pointer/ptr T/ref T). Objects (and eventually tuples when the bug 
is fixed) will be automatically destroyed when they go out of scope because the 
outer "container" goes out of scope, but pointers and AFAIK ref's don't 
currently have "hooks". When the "owned" was applied only to ref, that wasn't a 
problem because the "owned" applied the hooks directly to the ref and caused 
destruction of what it referenced but now? This is why I envisioned a 
"deepDestroy" which would take care of all of that as it's the only way I could 
think of to make it work while still under the current version; but...

Perhaps you have now modified the compiler so it applies B/D destroy/copy/move 
semantics to ref's directly as well? This would then automatically "drill down" 
through a nested structure.

> destroying 'ptr' automatically...

Once we have the B/D "hooks" applied to what ref's point to **either** by 
"deepDestroy" **or** by some other method such as making "hooks" apply directly 
to "owned" ref's, my idea was that since B/D works so well, it could take care 
of the problem of such a nested structure also containing pointer's or 'ptr T's 
and apply B/D hooks to them as well, thus having complete protection from 
memory leaks for anything inside an "owned".

An example of something that might need this would be closures where the second 
pointer in the tuple is untyped and the type is known only to the compiler and 
the first pointer which is a nimcall proc that uses it as a hidden parameter. 
There needs to be some custom hooks to cause that pointer to be deallocated.

> Now compare that to using RC for a compiler...

I've backed entirely off of using RC except as a check as per spec 2 and, as I 
understand it, B/D (optionally) uses it.

> You are now in crazy-talk-land... "deepDestroy"?

I'll re-read B/D as per your suggestion to see what I'm missing. I understood 
that your implementation is "beyond B/D" in overcoming some of its 
limitations, so am trying to think "outside the box" to discover if there are 
limitations.

Part of the reason I was thinking about "deepDestroy" is to try to make most of 
the work be done by default versions of the B/D hooks so that the programmer 
wouldn't have to write so many custom hooks. Thus, deepDestroy isn't something 
the compiler generates but rather part of the runtime that the B/D default 
"hooks" can call as an easy means to destroy as deep as necessary according to 
the B/D rules.

Yes, B/D wrote a complete compiler design adequate to prove the points they 
were making, but AFAIK it was never used for a production application and may 
have limitations beyond what was tested.

Another limitation I can think of that I'll re-read to see how it would be 
overcome is the case where one needs to "deepCopy" (because B/D assignments are 
essentially shallow by definition) such as a case where one might need to pass 
the same data to multiple threads that would own it and either pass it back to 
an owned or pass it on, etc.; Although pure B/D does work with cyclic data, and 
can destroy cyclic data if the defaults take some care (resetting pointers to 
nil before destroying from a temporary copy), "deepCopy" is harder because it 
has to follow the links and some multiple number of them may be recursive. 
Again, it can be overcome, but it would be better to be a proc in the runtime 
and not something the compiler has to generate itself.

All of this is motivated by the desire to have the system as safe as possible 
while being as performant as possible, yet have most of that safety come about 
by default without the programmer having to write much custom boilerplate code, 
and preferably needing only minimal changes, if any, to existing libraries.


Re: Owned refs

2019-06-13 Thread GordonBGood
@Araq, regarding my thinking on the treatment of raw pointers to do with 
"owned":

> The only complication on automatic application of the "simple rule" is what 
> to do with the raw pointers pointer and ptr T which might reference the heap 
> but can reference any memory location anywhere including the code as for 
> literals or the stack as for local bindings...

I've done more thinking about this, and I think the right answer is to 
deallocate these raw pointers just as we do ref's at the end of the scope of 
their "owned". There need be no extra overhead complications: when the 
Allocator is asked to deallocate a pointer to some memory address, it will try 
to look up that address in some way to find the memory control block, which it 
won't find for literals or for stack bindings as they were never allocated. In 
that case, it should just no-op and make no changes, if that isn't what it does 
now.

By deallocating all forms of pointer when they are inside an "owned", we can 
have no forms of memory leak when the memory pointers are inside a "owned" data 
type structure. The only way to get memory leaks is to allocate manually and 
not manually deallocate for a pointer that is not part of a data structure and 
thus not automatically "deepDestroy"'ed, and even that would not be possible if 
we also made normally allocated pointers "owned".

Then the final consideration is what to do with pointers that come about as a 
result of addr and unsafeAddr: we need these in order to bypass copy/move 
semantics so these need to remain as truly "raw", but as the documentation 
says, these are "unsafe" and "really really unsafe", so only to be used 
internally and by people who know what they are doing for reasons of 
performance.


Re: Owned refs

2019-06-12 Thread GordonBGood
@Araq:

> This "owned wrapper" becomes a memory region with all its known up- and 
> downsides...

I may be too simple to understand how an "owned" wrapper turns into a memory 
region; perhaps the following example will help and you can tell me its 
problems or give me a reference so I can learn; a proposed implementation of an 
Owned/Dangling wrapper pair follows: 


const hasThreadSupport = compileOption("threads") and defined(threadsafe)

# can only create Owned; can not create Dangling...
# defined in this order because we can't {.borrow `.`.} generic types yet...
type
  Dangling[T] = object
    cntnts: T
    refcnt: int
  Owned[T] = distinct Dangling[T]

proc `=destroy`[T](x: var Owned[T]) {.inline.} =
  if x != nil:
    let cst = cast[Dangling[T]](x) # so we can access non-Dangling fields
    assert cst.refcnt == 0, "destroying owned type with dangling references!"
    cast[ptr T](cst.cntnts.unsafeAddr)[].deepDestroy; x.reset
proc `=`[T](dst: var Owned[T]; src: Owned[T]) {.error: "owned types can only be moved".}
proc `=sink`[T](dst: var Owned[T]; src: Owned[T]) {.inline.} =
  let cstdst = cast[Dangling[T]](dst); let cstsrc = cast[Dangling[T]](src)
  if cstdst.cntnts != cstsrc.cntnts:
    dst.`=destroy`; cast[ptr T](cstdst.cntnts.unsafeAddr)[] = cstsrc.cntnts
# when `=move` is used instead of `=sink`
proc `=move`[T](dst, src: var Owned[T]) {.inline.} =
  let cstdst = cast[Dangling[T]](dst); let cstsrc = cast[Dangling[T]](src)
  if cstdst.cntnts != cstsrc.cntnts:
    dst.`=destroy`; cast[ptr T](cstdst.cntnts.unsafeAddr)[] = cstsrc.cntnts; src.reset
proc `$`[T](x: Owned[T]): string =
  result = "owned:  "; result &= $cast[Dangling[T]](x).cntnts

proc `=destroy`[T](x: var Dangling[T]) {.inline.} =
  when hasThreadSupport: x.refcnt.atomicDec else: x.refcnt.dec
proc `=`[T](dst: var Dangling[T]; src: Dangling[T]) {.inline.} =
  when hasThreadSupport: dst.refcnt.atomicInc else: dst.refcnt.inc
  when hasThreadSupport: src.refcnt.atomicInc else: src.refcnt.inc
  copyMem(dst.unsafeAddr, src.unsafeAddr, src.sizeof) # just copying the pointer is a byte copy
proc `=sink`[T](dst: var Dangling[T]; src: Dangling[T]) {.inline.} =
  # moves are the same as assignments
  `=`(dst, src)
proc `=move`[T](dst, src: var Dangling[T]) {.inline.} =
  # moves are the same as assignments
  `=`(dst, src)
proc `$`[T](x: Dangling[T]): string =
  result = "unowned:  "; result &= $x.cntnts


Run

> Owned vs non-owned looked intractable to compute to me...

You're "the man" regarding what you think can be done with the compiler ;-) I'm 
just envisioning a simple syntax where one doesn't have to go back and change 
every declaration of a data type that is a heap reference or contains a heap 
reference to add the "owned" designation. There is also the question of 
handling nested "owned"'s but perhaps there is no need if my understanding of 
"owned" just being a wrapper is incorrect.


Re: Owned refs

2019-06-12 Thread GordonBGood
@Araq:

> I don't think that is a good argument, (regarding tuples work in C++)

I meant that I don't care what works or partly works in C++; I care about what 
is easy to remember how to use in Nim. In Nim, to the casual programmer, tuples 
are just like objects but with a shortcut declaration, construction, and 
dereferencing. Considering them this way, it might seem strange that one can't 
apply custom "hooks" to them, although there are ways around that using what 
does work or will work when you fix the bug. "I'm just sayin'..." ;-)


Re: Owned refs

2019-06-12 Thread GordonBGood
@Araq:

As per my consideration in my last posts:

> unless every object was owned by default...

Although that could work, it likely isn't a good idea as it would put the 
overhead of owned/dangling "hooks" on every data type and it's unnecessary.

Rather, it must be used according to the "simple rule" I mentioned: every data 
type that either refers to the heap or contains something referring to the heap 
must be created within an "owned" wrapper so that it can be either "owned" or 
cast to the distinct "dangling" wrapper according to the rules of its copy/move 
semantics.

So the next question is: Given such a simple rule, it seems that the compiler 
will be able to check whether "owned" needs to be applied or not and can guide 
the programmer, then could the compiler insert the "owned" wrapper itself when 
necessary on creation?

If it could, then would the "owned/dangling" keywords not be necessary and "it 
would just work" including the implicit casting to "dangling" according to the 
copy/move rules, although when one could see when things have been wrapped by 
looking at the "repr" of a binding where the "owned/dangling" wrappers would 
show up? One could still override what the compiler would automatically do by 
applying the deepCopy/shallowCopy procs, which for these wrappers are 
complements of each other where deepCopy takes an "owned" or a "dangling" and 
makes a new independent "owned" and shallowCopy takes a "owned" or a "dangling" 
and always casts to output a "dangling".

I see that making this automatic avoids some complication the compiler might 
have in eliding nested "owned" wrappers: consider a Window created as an 
"owned" wrapper that contains not only a Button that is "owned" but also many 
nested items, all of which are owned, perhaps many levels deep; for efficiency 
it would be good if the compiler elided all those nested "owned" wrappers away 
except for the outer one.

If the compiler needs to elide away all the declared "owned" wrappers, wouldn't 
it be better (simple) that the compiler just injects the outer "owned" wrapper 
by itself as necessary according to the "simple rule"?


Re: Owned refs

2019-06-12 Thread GordonBGood
@Araq:

> The hooks of tuples are lifted from their components' hooks

Having direct custom hooks for tuples was just a suggestion. In the 
documentation, tuples and objects are grouped together; the alternate 
declaration syntax for tuples makes them look like objects, there is the same 
alternate way of constructing named tuples, and the same alternate way of 
using a default named tuple constructor and named fields; I just thought it 
made sense that there might be the same capability of adding custom hooks 
directly to tuples as there is for objects.

However, if one needs to do this, one can just use an object instead of a tuple 
and apply the custom hooks to the object, since the syntax would be the same. 
For tuples inside a ref structure, my tests seem to indicate that tuple 
contents **aren't** destroyed, and for any tuple that contains other than a 
primitive, we get an error when we trigger automatic destructor generation for 
a tuple containing a destroyable object, as in the following:


type
  Foo = object
    cntnts: int

proc `=destroy`(x: var Foo) =
  echo "destroying Foo"
  x.reset

var t = (33, Foo(cntnts: 42)); t.`=destroy`; t.echo


Run

where the error turns out to be "Error: internal error: destructor turned out 
to be not trivial". Perhaps that's just a bug in the current version 0.20.0?

> It works for C++...

I don't think that is a good argument, as I consider Nim good because it isn't 
C++ and better when one no longer cares for the whole OOP paradigm...

> Likewise, having owned(proc ())...

That would mean you've extended the definition of "owned" to beyond just owned 
ref as was in the spec 2 and I was trying to make everything fit within that 
spec. A simple "nimcall" proc is just a pointer to a proc implementation but a 
"closure" is actually a tuple of two pointers, one to the proc and one to the 
environment on the heap. However, your implementation works if the concept of 
owned is extended to tuples/objects as it would protect the interior pointers 
in the same way as an owned ref would do without the indirection as you say.

If owned were applied to the motivating example, then that would also work with 
the same advantages I mentioned of allowing pointers of the same fields inside 
the owned have the same target and shallow copying by default as per the 
specification for owned/non-owned items.

Given that you've extended the spec 2 so that owned can be applied to anything 
especially including object and tuple containers (as well as perhaps pointer 
and ptr T to go with ref?), then the extra indirection isn't necessary when the 
owned object is a pointer or in the case of closures a pair of pointers.

Also, you then might need the definition of the "owned" keyword unless every 
object was owned by default, in which case the alternate of only having an 
"unowned" or "dangling" keyword could be available for the rare cases where one 
needs to specify this manually in a type signature.

What I wrote then stands other than point 3) as it considered the spec 2 
exactly as not having anything other than owned ref; I think that the rest 
after skipping the extra indirection is still valid?

Even the motivating example would still need to be changed, as it must specify 
that the seq object type is owned, with consideration for that in the "hooks"; 
I think that most owned/not-owned objects would be able to use a default set of 
"hooks", something like those proposed for the owned/unowned ref's. Next, I'll 
provide an implementation that would allow these default "hooks", following on 
from data other than just ref being able to be owned...
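As a sketch of what such default "hooks" might look like for a non-ref container that owns heap data, using the v0.20 hook signatures (`Buffer` is a hypothetical example type, not anything from the spec):

```nim
type Buffer = object
  len: int
  data: ptr UncheckedArray[byte]   # interior heap pointer to protect

proc `=destroy`(b: var Buffer) =
  # deep destroy: free the payload exactly once
  if b.data != nil:
    dealloc(b.data)
    b.data = nil

proc `=sink`(dst: var Buffer; src: Buffer) =
  # shallow transfer: destroy the old payload, steal the pointer
  `=destroy`(dst)
  dst.len = src.len
  dst.data = src.data
```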


Re: Owned refs

2019-06-12 Thread GordonBGood
@Araq, thanks for the acknowledgment and the tracking link; I assumed that you 
read even when you don't reply as I see changes here and there I have suggested:

> (Work has begun to make closures work with the new runtime.)

Lately, most of my thinking has been about how closures and other heap-based 
data types should be treated in the new runtime, and it has made me think that 
the current spec 2 has a major problem, as follows:

1\. In the current run time, a closure can be considered (or is) a tuple as in 
the following: 


type Closure[Args, T] = object # or tuple
  prc: proc(hiddnEnvTpl: ptr tuple[...]; args: nrmlArgs): T {.nimcall.}
  env: ptr tuple[...]


Run

2\. I heard you mention that you thought the new runtime could just implement 
the Closure type by changing the ptr to an owned ref, meaning that the Closure 
is the owner of its own environment, which is destroyed when the closure goes 
out of scope; this assumes use of an object instead of a tuple, or giving 
tuples the ability to have the same custom "hooks" as objects (which I highly 
recommend for consistency anyway, as tuples/objects are just two slightly 
different forms of similar containers).

3\. However, I think that would lead to some problems: objects/tuples don't 
have a concept of ownership, only the new ref's do, so as the closure gets 
passed around and used, it becomes unclear when it will be destroyed and 
whether its lifetime exceeds its use for many edge cases. Even if the compiler 
can do static flow analysis to cover that, another problem is that when the 
implied ownership changes, the implied type of all the contained ref's needs to 
change too, basically meaning that a new data type is implicitly created based 
on copy/move. I'm sure there are ways to work around this, but they all seem 
quite complex (and therefore likely wrong ;-) ).

4\. The solution I propose is quite simple (and therefore hopefully correct): 
the rule is that everything that needs a concept of ownership (which is 
everything that is allocated on the heap, or contains anything allocated on the 
heap) must be "contained" by a ref, which may be either owned or unowned 
according to the new copy/move semantics. Now the new Closure would be defined 
as follows: 


type Closure[Args, T] = ref object # or tuple...
  prc: proc(hiddnEnvTpl: ptr tuple[...]; args: nrmlArgs): T {.nimcall.}
  env: ptr tuple[...]

Run

Note that the environment could be referenced by an owned ref, but there is 
little point, as using just a ptr is more flexible and it will never escape the 
owning object/tuple. Now this new ref object Closure obeys all of the new 
destroy/copy/move semantics and can be passed around freely with respect to 
those. If there is an edge case where the use exceeds the lifetime, we can 
simply do a "deepCopy" of the Closure (I have ideas on implementing that 
generally for all types, too), including allocating a fresh ptr for the copy of 
the environment, and end up with as many copies as required to get the job 
done, with all of the owned ones destroyed when they go out of scope in each of 
their uses.
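That "deepCopy with a fresh environment" idea might be sketched as follows (here `Env` is a concrete stand-in for the per-closure environment tuple, and the single-argument `prc` shape is simplified for illustration):

```nim
type
  Env = tuple[count: int]                  # stand-in environment tuple
  Closure = ref object
    prc: proc(env: ptr Env): int {.nimcall.}
    env: ptr Env

proc deepCopy(c: Closure): Closure =
  # share the code pointer, but give the copy its own environment
  result = Closure(prc: c.prc)
  if c.env != nil:
    result.env = create(Env)               # fresh heap allocation
    result.env[] = c.env[]                 # copy the environment's contents
```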

5\. Proof that this will work: we only have one owner, which means the object 
will only be destroyed once, no matter how many unowned references there are. 
Only =destroy does a deep destroy, chasing down the ptr; =copy (in the case of 
an unowned ref) and =move only do shallow copies/swaps, just transferring the 
ptr over. Ending up with more than one copy whose field has the same ptr target 
isn't a problem, because unowned ref's never destroy.

6\. Adjunct benefit of the scheme: it overcomes one of the primary limitations 
expressed in spec 2, in that a given field in two instances can now hold a 
pointer to the same memory area when that field is inside a ref. As explained 
in the proof, this is no longer a problem because it is protected by the 
ownership of the ref.

7\. This "simple scheme" would be consistently applied to all data types 
containing heap data, as in strings and seqs and anything else in the libraries 
- they will all have this same benefit. This makes the "Motivating example" in 
spec 2 null and void, and I will shortly produce a new motivating example 
according to this "simple rule". Unfortunately, it will mean all of the data 
types which are implemented as objects containing pointers or ref pointers will 
have to be changed to follow the ref object "simple plan", but the benefits are 
manifold...
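Restructured under this "simple rule", a seq-like type might look something like the following sketch (type and field names here are hypothetical, not the stdlib's actual layout):

```nim
type
  SeqPayload[T] = object
    cap: int                       # allocated capacity
    data: UncheckedArray[T]        # flexible array of elements

  MySeq[T] = ref object            # the ref carries ownership, so...
    len: int
    p: ptr SeqPayload[T]           # ...the raw interior ptr never escapes it
```

Copying a `MySeq` is then just a shallow ref copy governed by the owned/unowned ref semantics, with the payload freed exactly once by the owner's =destroy.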

8\. They will also enjoy a further benefit which will boost performance in a 
large number of cases: copying of these data types can now almost always be 
shallow (by default), since these are reference values and shallow copying is 
enabled by the copy/move semantics of the owned/unowned ref.

9\. In the interests of simplicity I will make one more suggestion: I suggest 
that we don't have a keyword "owned" as all ref's are created as owned and in 
most 

Re: Owned refs

2019-06-12 Thread GordonBGood
@Araq, regarding saving time in atomic ref counts and shared allocations:

>   1. I think you will be surprised when you benchmark this. ;-)
>   2. There is also a different solution:
>   3. Keep in mind that this is only a dangling ref detector...
> 


I see what you mean now: ref counting only happens when ref counting checking 
is turned on with a future pragma/compiler switch, and when it's turned off 
there doesn't even need to be a ref count field. As you said, all ref counting 
is then of zero cost when off, and of minimal cost otherwise: the only real 
cost occurs when ref count checking is left on **and** "threads:on" is set 
**and** "threadSafe" is defined **and** the variable is not a "threadvar" 
(since the compiler can recognize use in a threadvar) \- not so bad. With ref 
count checking turned off, this truly is a minimal cost memory management 
system!
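The "no count field when checking is off" idea could be sketched with a compile-time conditional layout like this (the switch name `danglingRefChecks` is purely hypothetical, for illustration only):

```nim
type RefHeader = object
  when defined(danglingRefChecks):   # hypothetical compiler define
    rc: int                          # bumped only by unowned refs,
                                     # checked to be zero at =destroy
  size: int                          # example ordinary header field
```

With the define off, the header shrinks and every count inc/dec compiles away, which is what makes the checker zero cost when disabled.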

As to the Allocator refinements, I'm sure those will come, and some don't need 
locks: using compare-and-swap to check whether memory was successfully 
allocated avoids even thread barriers/critical section waits, along with the 
other allocation tricks you mentioned. The good thing is that we can play with 
Allocators all we want after version 1.0 without breaking changes. At any rate, 
according to my quick benchmarks, even using fully shared allocations with 
barriers is about 20% faster than using the current default non-sharing GC, 
and, as we agree, we could tweak the allocator in many ways.
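The lock-free compare-and-swap approach can be sketched as a bump allocator over a shared arena, using the `cas` proc from Nim's system module (the `Arena` type and `bumpAlloc` are hypothetical illustration, not the actual allocator):

```nim
type Arena = object
  next: int        # current bump offset into the arena
  size: int        # total arena size in bytes
  base: pointer    # start of the pre-reserved memory block

proc bumpAlloc(a: var Arena; bytes: int): pointer =
  # lock-free: no barrier or critical section, just CAS retry
  while true:
    let old = a.next
    if old + bytes > a.size:
      return nil                         # arena exhausted
    # if another thread advanced `next` first, cas fails and we retry
    if cas(addr a.next, old, old + bytes):
      return cast[pointer](cast[int](a.base) + old)
```

The CAS result is exactly the "check if memory was successfully allocated" step: a failed swap means another thread won the race, and we simply loop.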

I'm getting excited about implementing this!

I'm excited enough to implement a trial test on top of Nim version 0.20.0, 
without doing any compiler magic, to see if there are any specification 
problems; I am encountering some problems but have been able to work around 
them. More on that a little later when I get a little further...


Re: Nim v0.20.0 is here (1.0 RC)

2019-06-09 Thread GordonBGood
@Araq, as I have mentioned in other posts, I am trying to apply some of the new 
memory management ideas under the current version 0.20.0 system, with problems 
as follows:

1\. The "old" closures aren't compatible with what closures will be, as they 
rely on a (local) GC'ed ref to an environment structure. One can work around 
that by wrapping "old" closures in an object with a sidecar ForeignCell field, 
which holds the result of protect called on the environment pointer at 
creation, with dispose called on this field for destruction, thus making it 
look like the new closures that will rely on owned ref to do this. This is a 
kludgy but viable workaround which should mean there are minimal changes when 
the new closures are available.
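The wrapper described above might be sketched like this, using the real system procs `protect`, `dispose`, and `rawEnv` (the `WrappedClosure`/`wrap`/`release` names are hypothetical, just one way to shape the kludge):

```nim
type WrappedClosure[T] = object
  fn: proc(): T {.closure.}
  cell: ForeignCell      # sidecar: keeps the GC'ed env alive

proc wrap[T](p: proc(): T {.closure.}): WrappedClosure[T] =
  # pin the closure's environment so the local GC won't collect it
  result.fn = p
  result.cell = protect(rawEnv(p))

proc release[T](w: var WrappedClosure[T]) =
  # the stand-in for "destruction": drop the protection
  dispose(w.cell)
  w.fn = nil
```

This mimics what an owned ref to the environment would do automatically under the new runtime: one pinned owner, released exactly once.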

2\. My major problem is due to the different syntax and use of the old =sink 
versus the new =move as per the new destructor spec 2:

> under the old design so called "self assignments" could not work.

> Consequence: sink parameters for objects that have a non-trivial destructor 
> must be passed as by-pointer under the hood. A further advantage is that 
> parameters are never destroyed, only variables are. The caller's location 
> passed to a sink parameter has to be destroyed by the caller and does not 
> burden the callee. (with the following problem example:)


from random import randomize, rand

proc select(cond: bool; a, b: sink string): string =
  if cond:
    result = a # moves a into result
  else:
    result = b # moves b into result

proc main =
  randomize()
  var x = "abc"
  var y = "xyz"
  
  # possible self-assignment:
  x = select(rand(1.0) < 0.5, x, y)
  # 'select' must communicate which parameter has been
  # consumed. We cannot simply generate:
  # (select(...); wasMoved(x); wasMoved(y))
  # NOTE: WE DON'T HAVE TO; IT JUST WORKS...


Run

However, it seems that things have been fixed so that the above example does 
work in Nim version 0.20.0, with the compiler using a combination of the hooks 
and wasMoved called on the parameter that is used and not on the other one. The 
main difference between how it works now and what is proposed in Destructors 
spec 2 seems to be the different type signature of =sink versus =move (the 
current one has a var only on the destination parameter where the proposed one 
has a var on both parameters), and what the compiler does underneath rather 
than combining the destination destruction and source resetting in the 
assignment and moving "hooks".

The new =move[T](dst, src: var T), not available yet, has a different type 
signature than the =sink[T](dst: var T; src: T) it functionally replaces, due 
to having to modify its source to reset it and show that it has been 
transferred (an implicit move). As these are automatically invoked when 
ownership is transferred under move semantics, along with "cheats" so that the 
var parameter can be applied to a let variable, the only workaround is to make 
all bindings to which we want to apply it var's and not let's, and to manually 
invoke a move on the source of the sink parameter when in the sink proc.
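Side by side, the two hook shapes being compared are (signatures only, as a sketch; the =move form is the spec-2 proposal, not implemented in v0.20):

```nim
proc `=sink`[T](dst: var T; src: T)   # current v0.20: source left untouched,
                                      # the compiler resets it separately
proc `=move`[T](dst, src: var T)      # spec 2 proposal: the hook itself
                                      # resets the source to mark the move
```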

I don't know enough about the implementation details to know what is the 
easiest way, but it seems to me that a decision should be made whether to use 
the current =sink syntax or the new =move syntax (along with the changed 
definition of all of the "hooks") so that there aren't breaking changes forward.

In other words, it seems to me that the new Destructors spec 2 has two distinct 
parts as follows:

  1. Redefinition of the move semantics and syntax
  2. Possible implementation of the new non-GC'ed ref T and owned ref T



If implemented as per the spec, the first would be a breaking change due to the 
different signature of the =move hook and I would recommend that, if desired, 
it be included pre-version 1.0. This doesn't seem to be risky at all as the 
functionality seems to be the same as currently, although it could break some 
libraries due to the different syntax.

The second isn't very much a breaking change other than making such kludges as 
are necessary to make closures work across threads no longer be necessary and 
requiring an occasional owned, and as you say the compiler will guide where 
those are necessary. As this is relatively untested as yet, it likely should be 
deferred to a later point upgrade version or version 2.0.

If somehow development of the second could be accelerated to be included in 
version 1.0 if it works, it would make me happy ;-)

