That was a bit vague: I meant that I suspect the workers are being
starved, since you have many consumers and only a single thread
generating the 1k strings. I would prime the channel so it is full
before the workers start, or do some other restructuring to ensure all
threads are kept busy.
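
Something along these lines is what I have in mind (a rough, untested
sketch that reuses your worker() sub): fill and close the channel before
any consumer starts, so no worker ever waits on the producer.

    my @strings = (1..1024).map({ (' '..'Z').pick }).join xx 50;

    sub run ( Int $workers ) {
        my $c = Channel.new;

        # prime the channel completely, then close it; there is no
        # producer thread left for the consumers to be starved by
        $c.send($_) for @strings;
        $c.close;

        my @w = (1..$workers).map: {
            start {
                react {
                    whenever $c -> $str {
                        worker( $str );
                    }
                }
            }
        };
        await @w;
    }

This also keeps the 1k-string generation out of the code Bench is
timing, per my earlier note below.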

-y

On Thu, Dec 6, 2018 at 10:56 PM yary <not....@gmail.com> wrote:
>
> Not sure if your test is measuring what you expect: the setup of
> generating 50 x 1k strings takes 2.7 sec on my laptop, and that's
> reducing the apparent effect of parallelism.
>
> $ perl6
> To exit type 'exit' or '^D'
> > my $c = Channel.new;
> Channel.new
> > { for 1..50 {$c.send((1..1024).map( { (' '..'Z').pick } ).join);}; say now - ENTER now; }
> 2.7289092
>
> I'd move the setup outside the "cmpthese", try again, and re-think
> the new results.
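>
> For example, a rough, untested sketch of what I mean, keeping the same
> structure as your run() and worker() below, just with the strings
> generated once, up front, outside the timed code:
>
>     # generated once, before Bench ever runs
>     my @strings = (1..1024).map({ (' '..'Z').pick }).join xx 50;
>
>     sub run ( Int $workers ) {
>         my $c = Channel.new;
>
>         my @w;
>         @w.push: start {
>             # the producer only sends; no string generation in here
>             $c.send($_) for @strings;
>             LEAVE $c.close;
>         }
>
>         for 1..$workers {
>             @w.push: start {
>                 react {
>                     whenever $c -> $str {
>                         worker( $str );
>                     }
>                 }
>             }
>         }
>
>         await @w;
>     }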
>
>
>
> On 12/6/18, Vadim Belman <vr...@lflat.org> wrote:
> > Hi everybody!
> >
> > I have recently played a bit with somewhat intense computations and tried to
> > parallelize them among a couple of threaded workers. The results were
> > somewhat... eh... discouraging. To sum up my findings I wrote a simple demo
> > benchmark:
> >
> >      use Digest::SHA;
> >      use Bench;
> >
> >      sub worker ( Str:D $str ) {
> >          my $digest = $str;
> >
> >          for 1..100 {
> >              $digest = sha256 $digest;
> >          }
> >      }
> >
> >      sub run ( Int $workers ) {
> >          my $c = Channel.new;
> >
> >          my @w;
> >          @w.push: start {
> >              for 1..50 {
> >                  $c.send(
> >                      (1..1024).map( { (' '..'Z').pick } ).join
> >                  );
> >              }
> >              LEAVE $c.close;
> >          }
> >
> >          for 1..$workers {
> >              @w.push: start {
> >                  react {
> >                      whenever $c -> $str {
> >                          worker( $str );
> >                      }
> >                  }
> >              }
> >          }
> >
> >          await @w;
> >      }
> >
> >      my $b = Bench.new;
> >      $b.cmpthese(
> >          1,
> >          {
> >              workers1 => sub { run( 1 ) },
> >              workers5 => sub { run( 5 ) },
> >              workers10 => sub { run( 10 ) },
> >              workers15 => sub { run( 15 ) },
> >          }
> >      );
> >
> > I tried this code with a macOS installation of Rakudo and with Linux
> > in a VM. Here are the macOS results (6 CPU cores):
> >
> > Timing 1 iterations of workers1, workers10, workers15, workers5...
> >   workers1: 27.176 wallclock secs (28.858 usr 0.348 sys 29.206 cpu) @ 0.037/s (n=1)
> >               (warning: too few iterations for a reliable count)
> >  workers10: 7.504 wallclock secs (56.903 usr 10.127 sys 67.030 cpu) @ 0.133/s (n=1)
> >               (warning: too few iterations for a reliable count)
> >  workers15: 7.938 wallclock secs (63.357 usr 9.483 sys 72.840 cpu) @ 0.126/s (n=1)
> >               (warning: too few iterations for a reliable count)
> >   workers5: 9.452 wallclock secs (40.185 usr 4.807 sys 44.992 cpu) @ 0.106/s (n=1)
> >               (warning: too few iterations for a reliable count)
> > O-----------O----------O----------O-----------O-----------O----------O
> > |           | s/iter   | workers1 | workers10 | workers15 | workers5 |
> > O===========O==========O==========O===========O===========O==========O
> > | workers1  | 27176370 | --       | -72%      | -71%      | -65%     |
> > | workers10 | 7503726  | 262%     | --        | 6%        | 26%      |
> > | workers15 | 7938428  | 242%     | -5%       | --        | 19%      |
> > | workers5  | 9452421  | 188%     | -21%      | -16%      | --       |
> > ----------------------------------------------------------------------
> >
> > And Linux (4 virtual cores):
> >
> > Timing 1 iterations of workers1, workers10, workers15, workers5...
> >   workers1: 27.240 wallclock secs (29.143 usr 0.129 sys 29.272 cpu) @ 0.037/s (n=1)
> >               (warning: too few iterations for a reliable count)
> >  workers10: 10.339 wallclock secs (37.964 usr 0.611 sys 38.575 cpu) @ 0.097/s (n=1)
> >               (warning: too few iterations for a reliable count)
> >  workers15: 10.221 wallclock secs (35.452 usr 1.432 sys 36.883 cpu) @ 0.098/s (n=1)
> >               (warning: too few iterations for a reliable count)
> >   workers5: 10.663 wallclock secs (36.983 usr 0.848 sys 37.831 cpu) @ 0.094/s (n=1)
> >               (warning: too few iterations for a reliable count)
> > O-----------O----------O----------O----------O-----------O-----------O
> > |           | s/iter   | workers5 | workers1 | workers15 | workers10 |
> > O===========O==========O==========O==========O===========O===========O
> > | workers5  | 10663102 | --       | 155%     | -4%       | -3%       |
> > | workers1  | 27240221 | -61%     | --       | -62%      | -62%      |
> > | workers15 | 10220862 | 4%       | 167%     | --        | 1%        |
> > | workers10 | 10338829 | 3%       | 163%     | -1%       | --        |
> > ----------------------------------------------------------------------
> >
> > Am I missing something here? Am I doing something wrong? Because it
> > just doesn't make sense to me...
> >
> > As a side note: by playing with 1, 2, and 3 workers I see that each
> > new thread gradually adds on top of the total run time until a plateau
> > is reached. The plateau is seemingly defined by the number of cores
> > or, more precisely, by the number of supported threads. Proving this
> > hypothesis would require more time than I have on my hands right now,
> > and I'm not even sure such a proof would make sense.
> >
> > Best regards,
> > Vadim Belman
> >
> >
>
>
> --
> -y
