Re: [X10-users] Performance: at (...) async vs. at (...) @Uncounted async

Marco Bungart Wed, 27 Jul 2016 00:33:07 -0700

Hi Josh,

Sorry for the late answer.


On 26.07.2016 at 21:12, Joshua J Milthorpe wrote:
> Hi Marco,
>
> a few questions and suggestions about your benchmark code:
>
> Why measure the time over so few runs? With the number of runs in the
> attached output (500), the total execution time for the timed portion is
> around 0.05-0.1 seconds. This is probably too short a run to get a
> reliable measurement. I would suggest aiming for an execution time of at
> least 0.5 seconds (10000 runs or so) - maybe longer if the network is
> involved.
>
> What is the purpose of these blocks in the benchmark code?
>
> at (target) async {
>     sleep(SLEEP);
> }

Yes. This part is essential. It simulates the receiving side (which
has only one thread) being blocked by some other activity for some time.
The original program resembles GLB, in which there is only one worker
thread per place, executing tasks and probing the network from time to
time to send/receive messages.

What is not essential (and a mistake on my side) was starting this
activity within the loop. I will comment on this further below.
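
To illustrate the effect described above, here is a minimal sketch in
Python (not X10, and not the attached benchmark). It assumes a place can
be modelled as one worker thread draining an inbox; the names
run_single_worker and SLEEP are hypothetical and chosen only for this
illustration.

```python
import queue
import threading
import time


def run_single_worker(tasks):
    """Run callables one at a time on a single worker thread.

    Returns the completion time of each task, in seconds since the call started.
    """
    inbox = queue.Queue()
    finished = []
    start = time.perf_counter()

    def worker():
        # The only worker: a long task delays everything queued behind it.
        while True:
            task = inbox.get()
            if task is None:  # sentinel: no more work
                break
            task()
            finished.append(time.perf_counter() - start)

    thread = threading.Thread(target=worker)
    thread.start()
    for task in tasks:
        inbox.put(task)
    inbox.put(None)
    thread.join()
    return finished


SLEEP = 0.05  # seconds; stands in for the benchmark's sleep(SLEEP)

# First task blocks the worker, second task models an incoming message.
done = run_single_worker([lambda: time.sleep(SLEEP), lambda: None])
```

In this sketch the second, empty task cannot complete before the
sleeping task has finished, which is the situation the sleep block is
meant to simulate on the receiving place.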

>
> These blocks would seem to double the number of (counted) activities,
> but it's not clear whether they were meant to be included in the timing.
> More importantly, they tie up worker threads at the receiving place,
> meaning there is a limit to the number of remote async activities (RUNS)
> that can be processed. Is this an important feature of the benchmark? (I
> doubt a real X10 program would have large numbers of calls to sleep()
> from user code.)
>
> Also, in the version of the code you sent, asyncAt() is used for both
> tests, and asyncUncountedAt() is never called - I presume this is a typo
> not present in the version you tested?

Yes, that was a typo.
>
> Finally, I suspect the timer calls are at too low a level of nesting to
> be accurate. The benchmark code calls System.nanoTime() twice per run.
> This means that a significant amount of time may be attributable to the
> overhead of the timer function itself. I would suggest hoisting the
> timer calls outside of the loop over RUNS, so the timer is only called
> twice per execution.
>
> If I modify your benchmark code accordingly and use RUNS=10000, the
> repeatable result is that Uncounted async is faster over both Sockets
> and MPI (see attached version of the code). Can you confirm on your test
> machines?
>

I modified my test accordingly and ran both tests (yours and mine). I 
can confirm your results. Thanks for pointing out my mistakes =)
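
For reference, the measurement pattern suggested in the quoted message
(read the timer once before and once after the loop over RUNS, then
divide) can be sketched as follows. This is Python rather than X10 and
is not the attached Test.x10; time_runs and ns_per_run are hypothetical
names used only here.

```python
import time


def ns_per_run(total_seconds, runs):
    """Convert a total wall-clock time into nanoseconds per run."""
    return total_seconds * 1e9 / runs


def time_runs(task, runs):
    """Time `runs` calls of `task` with the timer hoisted outside the loop.

    Returns (total seconds, nanoseconds per run).
    """
    start = time.perf_counter_ns()  # first timer call, before the loop
    for _ in range(runs):
        task()
    elapsed_ns = time.perf_counter_ns() - start  # second and last timer call
    return elapsed_ns / 1e9, elapsed_ns / runs


# Example: time an empty task; a real benchmark would launch the remote activity here.
total_s, per_run = time_runs(lambda: None, 10000)
```

Applied to the first sockets figure in the quoted output below
(0.515896001 seconds for 10000 runs), ns_per_run gives about 51589.6 ns
per run; the log prints 51589.0.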

Cheers,
Marco

> /(See attached file: Test.x10)/
>
> $ x10c++ -O -NO_CHECKS Test.x10
> $ X10_NPLACES=2 ./a.out 3 10000 1000 2000
> [MASTER Place(0)]: starting test...
> --------------------------------------------------------------------------------
> [MASTER Place(0)]: starting execution 0
> [MASTER Place(0)]: starting asyncAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: starting asyncUncountedAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: test statistics:
> [MASTER Place(0)]: asyncAt: 10000 runs in 0.515896001 seconds -> 51589.0
> ns per run.
> [MASTER Place(0)]: asyncUncountedAt: 10000 runs in 0.177069871 seconds
> -> 17706.0 ns per run.
> [MASTER Place(0)]: execution 0 done.
> --------------------------------------------------------------------------------
> [MASTER Place(0)]: starting execution 1
> [MASTER Place(0)]: starting asyncAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: starting asyncUncountedAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: test statistics:
> [MASTER Place(0)]: asyncAt: 10000 runs in 0.461985174 seconds -> 46198.0
> ns per run.
> [MASTER Place(0)]: asyncUncountedAt: 10000 runs in 0.199475774 seconds
> -> 19947.0 ns per run.
> [MASTER Place(0)]: execution 1 done.
> --------------------------------------------------------------------------------
> [MASTER Place(0)]: starting execution 2
> [MASTER Place(0)]: starting asyncAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: starting asyncUncountedAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: test statistics:
> [MASTER Place(0)]: asyncAt: 10000 runs in 0.485613161 seconds -> 48561.0
> ns per run.
> [MASTER Place(0)]: asyncUncountedAt: 10000 runs in 0.269239342 seconds
> -> 26923.0 ns per run.
> [MASTER Place(0)]: execution 2 done.
> $
> $ x10c++ -x10rt mpi -O -NO_CHECKS Test.x10
> $ mpiexec -n 2 ./a.out 3 10000 1000 2000
> [MASTER Place(0)]: starting test...
> --------------------------------------------------------------------------------
> [MASTER Place(0)]: starting execution 0
> [MASTER Place(0)]: starting asyncAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: starting asyncUncountedAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: test statistics:
> [MASTER Place(0)]: asyncAt: 10000 runs in 0.070323175 seconds -> 7032.0
> ns per run.
> [MASTER Place(0)]: asyncUncountedAt: 10000 runs in 0.059020186 seconds
> -> 5902.0 ns per run.
> [MASTER Place(0)]: execution 0 done.
> --------------------------------------------------------------------------------
> [MASTER Place(0)]: starting execution 1
> [MASTER Place(0)]: starting asyncAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: starting asyncUncountedAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: test statistics:
> [MASTER Place(0)]: asyncAt: 10000 runs in 0.071089413 seconds -> 7108.0
> ns per run.
> [MASTER Place(0)]: asyncUncountedAt: 10000 runs in 0.03705698 seconds ->
> 3705.0 ns per run.
> [MASTER Place(0)]: execution 1 done.
> --------------------------------------------------------------------------------
> [MASTER Place(0)]: starting execution 2
> [MASTER Place(0)]: starting asyncAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: starting asyncUncountedAt test... done!
> [MASTER Place(0)]: sleeping 2000 ms to let all open activities finish.
> [MASTER Place(0)]: test statistics:
> [MASTER Place(0)]: asyncAt: 10000 runs in 0.079891188 seconds -> 7989.0
> ns per run.
> [MASTER Place(0)]: asyncUncountedAt: 10000 runs in 0.038455636 seconds
> -> 3845.0 ns per run.
> [MASTER Place(0)]: execution 2 done.
>
> Cheers,
>
> *Josh Milthorpe*
> Post Doctoral Researcher
> Cognitive Systems: Learning to Reason
> IBM Research
>
> ------------------------------------------------------------------------
> *Phone:* 1-914-945-2209
> *E-mail:* jjmil...@us.ibm.com
> 1101 Kitchawan Rd
> Yorktown Heights, NY 10598
> United States
>
>
>
>
> From: Marco Bungart <m.bung...@gmx.net>
> To: Mailing list for users of the X10 programming language
> <x10-users@lists.sourceforge.net>
> Date: 07/26/2016 09:59 AM
> Subject: [X10-users] Performance: at (...) async vs. at (...) @Uncounted
> async
>
> ------------------------------------------------------------------------
>
>
>
> Hi all,
>
> we monitored one of our programs and observed a strange behaviour: the
> compiler-annotation @Uncounted seems to slow down program execution. I
> constructed a sample program (attached) to simulate the behaviour
> (please ignore any kind of synchronization problem, I tried to keep the
> test as simple as possible).
>
> I compiled the attached program with the current git-version of X10. The
> tests were conducted with the default (sockets) and mpi (OpenMPI
> 1.7.1ULFM) RT implementations. Environment variables were set to
> X10_NPLACES=2 and X10_NTHREADS=1. The program output is in the
> corresponding attachments.
>
> Bottom line: with the socket implementation, at (...) @Uncounted async {
> ... } seems to be slower than or equally fast as a "normal" at (...)
> async { ... }. With the MPI implementation, execution times are as
> expected (@Uncounted is faster than its counterpart). I was able to
> observe this behaviour on two different machines (both linux, one Ubuntu
> 16.04.1 LTS, the other a CentOS Linux release 7.2.1511). Is there
> something wrong within the sockets implementation or is this behaviour
> expected? If this is expected behaviour, could someone tell me why, with
> sockets RT, an at @Uncounted async is slower?
>
> Thanks and cheers,
> Marco
> [attachment "Test.x10" deleted by Joshua J Milthorpe/Watson/IBM]
> [attachment "mpi.txt" deleted by Joshua J Milthorpe/Watson/IBM]
> [attachment "sockets.txt" deleted by Joshua J Milthorpe/Watson/IBM]
> _______________________________________________
> X10-users mailing list
> X10-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/x10-users

