Hi Marco - the short answer is that Team.x10, as well as Clocks, are not
yet resilient-aware.  If a place dies while using either of these, you may
get unexpected behavior, at least in the upcoming 2.5.0 release.



   - Ben



From:   Marco Bungart <m.bung...@gmx.net>
To:     x10-users@lists.sourceforge.net
Date:   09/17/2014 08:59 AM
Subject:        [X10-users] Place.places() and Team.allreduce(...)



Hi there,

I am playing around with the current svn-version (R28225). In
particular, I tried using allreduce in combination with Place.places.
Example code is appended. I was able to make the following observations:

- The behaviour of the code changes with X10_NPLACES.
- The behaviour of the code changes depending on which place died.
- The behaviour is (partially) non-deterministic.

I tested the code under Ubuntu Linux (14.04) with gcc 4.8.3 and OpenJDK
1.7.0_65. I summarized my results in the table below.
I compiled the program with both the C++ and the Java backend; both
show the same behaviour.

NPLACES | ID of killed place (args(0))
        |   1  |   2  |   3  |   4  |   5  |   6  |   7
--------+------+------+------+------+------+------+------
      2 |   P  |      |      |      |      |      |
      4 |  ok  |   P  |   P  |      |      |      |
      8 |  ok  |  S/2 |  S/2 |   P  |   P  |   P  |   P

"P" means: the program aborts prematurely.
The output can take one of two forms: a short one and a long one.
short:
$ ./Hello 2
x10.lang.DeadPlaceException: DeadPlaceException at Place(2)
Place 2 exited unexpectedly with exit code: 1
Launcher 2: cleanup complete, exit code=1.  Goodbye!
Launcher 0: cleanup complete, exit code=1.  Goodbye!

long:
$ ./Hello 3
x10.lang.DeadPlaceException: DeadPlaceException at Place(3)
Place 3 exited unexpectedly with exit code: 1
Launcher 3: cleanup complete, exit code=1.  Goodbye!
Launcher 1: cleanup complete, exit code=1.  Goodbye!
                 at x10::lang::FinishResilientPlace0::addDeadPlaceException(x10::lang::FinishResilientPlace0__State*, long long)
                 at x10::lang::FinishResilientPlace0::quiescent(long long)
                 at x10::lang::FinishResilientPlace0::notifyPlaceDeath()
                 at x10::lang::FinishResilient::notifyPlaceDeath()
                 at x10::lang::Runtime::notifyPlaceDeath()
                 at x10::lang::Runtime__Pool::scan(x10::util::Random*, x10::lang::Runtime__Worker*)
                 at x10::lang::Runtime__Worker::loop()
                 at x10::lang::Runtime__Worker::__apply()
                 at x10::lang::Thread::thread_start_routine(void*)
                 at GC_inner_start_routine
                 at GC_call_with_stack_base
                 at GC_start_routine
                 at
                 at clone
Place(0): Place.places():
Launcher 0: cleanup complete, exit code=1.  Goodbye!
Launcher -1: cleanup complete, exit code=1.  Goodbye!

"S" means: the program gets stuck. The output looks like this:
$ ./Hello 3
x10.lang.DeadPlaceException: DeadPlaceException at Place(3)
                 at x10::lang::FinishResilientPlace0::addDeadPlaceException(x10::lang::FinishResilientPlace0__State*, long long)
                 at x10::lang::FinishResilientPlace0::quiescent(long long)
                 at x10::lang::FinishResilientPlace0::notifyPlaceDeath()
                 at x10::lang::FinishResilient::notifyPlaceDeath()
                 at x10::lang::Runtime::notifyPlaceDeath()
                 at x10::lang::Runtime__Pool::scan(x10::util::Random*, x10::lang::Runtime__Worker*)
                 at x10::lang::Runtime__Worker::loop()
                 at x10::lang::Runtime__Worker::__apply()
                 at x10::lang::Thread::thread_start_routine(void*)
                 at GC_inner_start_routine
                 at GC_call_with_stack_base
                 at GC_start_routine
                 at
                 at clone
Place(0): Place.places():
     Place(0)
     Place(1)
     Place(2)
     Place(4)
     Place(5)
     Place(6)
     Place(7)
Place(0): stage 1.
Place 3 exited unexpectedly with exit code: 1
(still running)

"S/2" means: the program gets stuck about half of the time and runs as
expected the other half, i.e. the behaviour is non-deterministic.

Is there something wrong with the way I use broadcast/allreduce, or are
the corresponding library functions still under construction for X10 2.5.0?

Cheers,
Marco
[attachment "Hello.x10" deleted by Benjamin Herta/Watson/IBM]
------------------------------------------------------------------------------

Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
_______________________________________________
X10-users mailing list
X10-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/x10-users