That was a bit vague - I meant that I suspect the workers are being starved, since you have many consumers and only a single thread generating the 1k strings. I would prime the channel so it is already full, or do other restructuring to ensure all threads are kept busy.
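Something along these lines (untested, just a sketch that reuses the worker routine and names from your benchmark, with the Bench/cmpthese harness unchanged) fills the channel completely before any worker starts, so the consumers never wait on the producer:

    use Digest::SHA;

    sub worker ( Str:D $str ) {
        my $digest = $str;
        $digest = sha256 $digest for 1..100;
    }

    sub run ( Int $workers ) {
        my $c = Channel.new;

        # Prime the channel: generate all 50 strings up front, then close it,
        # so the hashing threads never block waiting on a single producer.
        for 1..50 {
            $c.send( (1..1024).map( { (' '..'Z').pick } ).join );
        }
        $c.close;

        my @w;
        for 1..$workers {
            @w.push: start {
                react {
                    whenever $c -> $str {
                        worker( $str );
                    }
                }
            }
        }

        await @w;
    }

The string generation is still measured inside each run, but at least it no longer competes with the hashing once the workers are going.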
-y

On Thu, Dec 6, 2018 at 10:56 PM yary <not....@gmail.com> wrote:
>
> Not sure if your test is measuring what you expect- the setup of
> generating 50 x 1k strings is taking 2.7sec on my laptop, and that's
> reducing the apparent effect of parallelism.
>
> $ perl6
> To exit type 'exit' or '^D'
> > my $c = Channel.new;
> Channel.new
> > { for 1..50 {$c.send((1..1024).map( { (' '..'Z').pick } ).join);}; say now - ENTER now; }
> 2.7289092
>
> I'd move the setup outside the "cmpthese" and try again, re-think the
> new results.
>
>
> On 12/6/18, Vadim Belman <vr...@lflat.org> wrote:
> > Hi everybody!
> >
> > I have recently played a bit with somewhat intense computations and tried to
> > parallelize them among a couple of threaded workers. The results were
> > somewhat... eh... discouraging. To sum up my findings I wrote a simple demo
> > benchmark:
> >
> >     use Digest::SHA;
> >     use Bench;
> >
> >     sub worker ( Str:D $str ) {
> >         my $digest = $str;
> >
> >         for 1..100 {
> >             $digest = sha256 $digest;
> >         }
> >     }
> >
> >     sub run ( Int $workers ) {
> >         my $c = Channel.new;
> >
> >         my @w;
> >         @w.push: start {
> >             for 1..50 {
> >                 $c.send(
> >                     (1..1024).map( { (' '..'Z').pick } ).join
> >                 );
> >             }
> >             LEAVE $c.close;
> >         }
> >
> >         for 1..$workers {
> >             @w.push: start {
> >                 react {
> >                     whenever $c -> $str {
> >                         worker( $str );
> >                     }
> >                 }
> >             }
> >         }
> >
> >         await @w;
> >     }
> >
> >     my $b = Bench.new;
> >     $b.cmpthese(
> >         1,
> >         {
> >             workers1  => sub { run( 1 ) },
> >             workers5  => sub { run( 5 ) },
> >             workers10 => sub { run( 10 ) },
> >             workers15 => sub { run( 15 ) },
> >         }
> >     );
> >
> > I tried this code with a macOS installation of Rakudo and with Linux in a
> > VM box. Here are the macOS results (6 CPU cores):
> >
> >     Timing 1 iterations of workers1, workers10, workers15, workers5...
> >     workers1: 27.176 wallclock secs (28.858 usr 0.348 sys 29.206 cpu) @ 0.037/s (n=1)
> >         (warning: too few iterations for a reliable count)
> >     workers10: 7.504 wallclock secs (56.903 usr 10.127 sys 67.030 cpu) @ 0.133/s (n=1)
> >         (warning: too few iterations for a reliable count)
> >     workers15: 7.938 wallclock secs (63.357 usr 9.483 sys 72.840 cpu) @ 0.126/s (n=1)
> >         (warning: too few iterations for a reliable count)
> >     workers5: 9.452 wallclock secs (40.185 usr 4.807 sys 44.992 cpu) @ 0.106/s (n=1)
> >         (warning: too few iterations for a reliable count)
> >     O-----------O----------O----------O-----------O-----------O----------O
> >     |           | s/iter   | workers1 | workers10 | workers15 | workers5 |
> >     O===========O==========O==========O===========O===========O==========O
> >     | workers1  | 27176370 | --       | -72%      | -71%      | -65%     |
> >     | workers10 | 7503726  | 262%     | --        | 6%        | 26%      |
> >     | workers15 | 7938428  | 242%     | -5%       | --        | 19%      |
> >     | workers5  | 9452421  | 188%     | -21%      | -16%      | --       |
> >     ----------------------------------------------------------------------
> >
> > And Linux (4 virtual cores):
> >
> >     Timing 1 iterations of workers1, workers10, workers15, workers5...
> >     workers1: 27.240 wallclock secs (29.143 usr 0.129 sys 29.272 cpu) @ 0.037/s (n=1)
> >         (warning: too few iterations for a reliable count)
> >     workers10: 10.339 wallclock secs (37.964 usr 0.611 sys 38.575 cpu) @ 0.097/s (n=1)
> >         (warning: too few iterations for a reliable count)
> >     workers15: 10.221 wallclock secs (35.452 usr 1.432 sys 36.883 cpu) @ 0.098/s (n=1)
> >         (warning: too few iterations for a reliable count)
> >     workers5: 10.663 wallclock secs (36.983 usr 0.848 sys 37.831 cpu) @ 0.094/s (n=1)
> >         (warning: too few iterations for a reliable count)
> >     O-----------O----------O----------O----------O-----------O-----------O
> >     |           | s/iter   | workers5 | workers1 | workers15 | workers10 |
> >     O===========O==========O==========O==========O===========O===========O
> >     | workers5  | 10663102 | --       | 155%     | -4%       | -3%       |
> >     | workers1  | 27240221 | -61%     | --       | -62%      | -62%      |
> >     | workers15 | 10220862 | 4%       | 167%     | --        | 1%        |
> >     | workers10 | 10338829 | 3%       | 163%     | -1%       | --        |
> >     ----------------------------------------------------------------------
> >
> > Am I missing something here? Do I do something wrong? Because it just
> > doesn't fit into my mind...
> >
> > As a side note: by playing with 1-2-3 workers I see that each new thread
> > gradually adds atop of the total run time until a plateau is reached. The
> > plateau is seemingly defined by the number of cores or, more correctly, by the
> > number of supported threads. Proving this hypothesis would require more time
> > than I have on my hands right now. And not even sure if such proof ever
> > makes sense.
> >
> > Best regards,
> > Vadim Belman
> >
> >
>
>
> --
> -y