On Sun Oct 23 18:01:55 2016, d...@theschrags.net wrote:
> This probably is an issue with moarvm.

It was indeed.

> When trying to benchmark the concurrent and non-concurrent versions of
> Damian Conway's 'bogosort' algorithm, I ran into numerous problems with
> Rakudo Star 2016.07 - depending on the RAKUDO_MAX_THREADS setting, the
> process might succeed, or it might slowly eat up all of memory, and if
> more than a few threads were used only some would actually process
> anything (accumulate CPU time). rakudo-star-2016.10-RC0 looks like it
> solved the deadlock problem, so that all threads are actually working.
>
> However, now this version of the program crashes after the first thread
> completes with:
> Internal error: zeroed target thread ID in work pass
>
> or occasionally:
> Internal error: invalid thread ID 8770096 in GC work pass
>
> Test runs:
>
> doug@ender:~/study/Perl6/benchmarks$ RAKUDO_MAX_THREADS=2 perl6 bench_bogosort_simple.pl6
> [e i l p r s x]
> 1 seconds.
> Internal error: invalid thread ID 8770096 in GC work pass
> doug@ender:~/study/Perl6/benchmarks$ RAKUDO_MAX_THREADS=2 perl6 bench_bogosort_simple.pl6
> [e i l p r s x]
> 4 seconds.
> Internal error: zeroed target thread ID in work pass
> doug@ender:~/study/Perl6/benchmarks$ RAKUDO_MAX_THREADS=2 perl6 bench_bogosort_simple.pl6
> [e i l p r s x]
> 24 seconds.
> Internal error: zeroed target thread ID in work pass
> doug@ender:~/study/Perl6/benchmarks$ RAKUDO_MAX_THREADS=2 perl6 bench_bogosort_simple.pl6
> [e i l p r s x]
> 6 seconds.
> Internal error: zeroed target thread ID in work pass
>
>
> I also found that the problem is likely related to a channel not being
> explicitly closed, so I was able to get normal test runs as follows:
>
> [ With a LEAVE block to close the channels ]
>
> doug@ender:~/study/Perl6/benchmarks$ RAKUDO_MAX_THREADS=2 perl6 bench_bogosort_simple.pl6
> [e i l p r s x]
> 4 seconds.
> [e i l p r s x]
> 29 seconds.
> [e i l p r s x]
> 57 seconds.
> [e i l p r s x]
> 58 seconds.
> doug@ender:~/study/Perl6/benchmarks$ RAKUDO_MAX_THREADS=2 perl6 bench_bogosort_simple.pl6
> [e i l p r s x]
> 4 seconds.
> [e i l p r s x]
> 10 seconds.
> [e i l p r s x]
> 20 seconds.
> [e i l p r s x]
> 31 seconds.
>
> I'm figuring that failing to close the channels is bad practice to start,
> but we don't really want to crash the VM with an obscure error message.
> Also, I saw a bug report earlier this year where this message was
> produced, but closed because of uncertainty in reproducing it. This test,
> however fails quite repeatably, at least on my system (linux).
>

The channel really does want closing, otherwise you end up with a bunch of threads sat in a hot loop eating CPU (and each run sets off another thread doing exactly that, which is why the thing gets slower every run with the missing Channel.close).

Of course, a VM crash is the wrong response; I've now hunted that down and committed a fix. It turns out the bug wasn't actually anything to do with closing the channel per se; it's just that missing the `LEAVE` out resulted in a lot more load and GC runs. The problem almost certainly could have happened with the `LEAVE` there; it was just a lot less likely.

I've added this example as a stresstest, to catch any regressions.

Thanks!

/jnthn
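
For anyone reading along without the script to hand, here is a minimal sketch of the "close the channel in a `LEAVE` block" pattern discussed above. It is not the reporter's bench_bogosort_simple.pl6 (that code isn't quoted in this thread); the names `produce`, `$worker` and `$total` are made up for illustration, and the summing worker is just a stand-in for real work.

```perl6
# Hypothetical example, not the code from the bug report.
my $channel = Channel.new;

# Consumer: iterating .list waits for values and finishes once the
# channel has been closed and drained, so the worker can then exit.
my $worker = start {
    my $total = 0;
    $total += $_ for $channel.list;
    $total
}

# Producer: the LEAVE phaser closes the channel whenever this sub is
# left, whether it returns normally or an exception is thrown.
sub produce(@items) {
    LEAVE $channel.close;
    $channel.send($_) for @items;
}

produce(1..10);
say await $worker;
```

Putting the close in a `LEAVE` rather than at the end of the sub means the consumer is told "no more work" on every exit path, so the worker is not left waiting on a channel that will never be closed.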