Jonas,
I have now reduced the connection retry count to 0.
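For anyone who wants to experiment with this outside of APGAS, here is a minimal sketch of how such settings can be handed to Hazelcast through JVM system properties. The two property names are existing Hazelcast 3.x properties, but treating "hazelcast.connection.monitor.max.faults" as the retry count mentioned above, and the values shown, are assumptions for illustration and not necessarily what the commit contains. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of APGAS or Hazelcast.
public class FastFailureDetection {

    // Assumed settings: tolerate no connection faults before a member is
    // dropped, and give up on a socket connect attempt after 1 second.
    public static Map<String, String> fastFailProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hazelcast.connection.monitor.max.faults", "0");
        props.put("hazelcast.socket.connect.timeout.seconds", "1");
        return props;
    }

    // Publishes the settings as JVM system properties. This has to happen
    // before the Hazelcast instance is created, otherwise it has no effect.
    public static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }

    public static void main(String[] args) {
        apply(fastFailProperties());
        for (String key : fastFailProperties().keySet()) {
            System.out.println(key + "=" + System.getProperty(key));
        }
    }
}
```

The same values can be passed on the command line as -D options, which avoids touching any code when comparing recovery times across settings.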
Olivier
Olivier Tardieu/Watson/IBM@IBMUS wrote on 10/03/2016 09:30:03 PM:
> From: Olivier Tardieu/Watson/IBM@IBMUS
> To: Mailing list for users of the X10 programming language <x10-
> us...@lists.sourceforge.net>
> Date: 10/03/2016 09:31 PM
> Subject: Re: [X10-users] [APGAS] Performance at place failure
>
> Hi Jonas,
>
> I am starting to look into this. Sorry for the delay.
>
> There is a big difference between X10 and APGAS when it comes to
> detecting place failures.
> X10 does the detection quickly because it owns the communication
> channels between the places.
> As soon as a channel is closed, it assumes the place has failed and
> reports the failure.
> APGAS, on the other hand, relies on Hazelcast for all communication
> channels.
> Hazelcast does not consider the loss of a channel as an
> irrecoverable error and tries to reconnect.
> I have just committed a patch to instruct Hazelcast to give up on
> failed connections faster.
> That should help a lot.
>
> Like you, I have also observed some performance issues with
> Hazelcast 3.6.3 and upgraded to 3.7.1 in another commit.
>
> Now, this is not going to make up for the difference as Hazelcast
> will still spend up to 1s trying to reconnect and some time doing
> reconfiguration.
> I will continue to look into ways to improve the recovery time with
> Hazelcast.
>
> Could you please update and rerun your benchmark programs and share
> the results?
>
> Regards,
>
> Olivier
>
>
> Jonas Posner <jonas.pos...@uni-kassel.de> wrote on 08/31/2016 08:02:45 AM:
>
> > From: Jonas Posner <jonas.pos...@uni-kassel.de>
> > To: Mailing list for users of the X10 programming language <x10-
> > us...@lists.sourceforge.net>
> > Date: 08/31/2016 08:38 AM
> > Subject: [X10-users] [APGAS] Performance at place failure
> >
> > Hi all,
> >
> > I am playing around with the resiliency of APGAS. I wondered about the
> > relatively high times for handling a place failure. For comparison, I
> > wrote a simple program in both X10 and APGAS. Both are attached. The
> > results show a significant difference.
> >
> > Every experiment was run with 4 places. X10 and APGAS are deployed from
> > the official git repository.
> >
> > Native X10 with X10_RESILIENT_MODE=1
> > without crash: 0.003
> > with crash: 0.015
> >
> >
> > Managed X10 with Hazelcast 3.3.1 and X10_RESILIENT_MODE=1
> > without crash: 0.34
> > with crash: 0.74
> >
> >
> > APGAS with Hazelcast 3.6.3 and -Dapgas.serialization=java
> > -Dapgas.resilient=true -Dapgas.compact=false
> > without crash: 0.86
> > with crash: varies widely: 8-38
> >
> >
> > APGAS with Hazelcast 3.6.3 and -Dapgas.serialization=java
> > -Dapgas.resilient=true -Dapgas.compact=true
> > without crash: 0.8
> > with crash: varies widely: 8-31
> >
> >
> > APGAS with Hazelcast 3.7 and -Dapgas.serialization=java
> > -Dapgas.resilient=true -Dapgas.compact=false
> > without crash: 0.77
> > with crash: 5.7 or 11.33 (50:50)
> >
> >
> > APGAS with Hazelcast 3.7 and -Dapgas.serialization=java
> > -Dapgas.resilient=true -Dapgas.compact=true
> > without crash: 0.74
> > with crash: 5.7 or 11.33 (50:50)
> >
> >
> >
> > Managed X10 absorbs a failure significantly better than APGAS. However,
> > Managed X10 uses Hazelcast 3.3.1 and (official) APGAS uses Hazelcast
> > 3.6.3. What causes the differences?
> >
> > Hazelcast 3.7 was released a few days ago. In my experiments, it
> > performs better.
> >
> > What is the purpose of "apgas.compact=true"? Should APGAS perform
> > better with it?
> >
> > Are there other options to improve APGAS's performance at a place
> > failure?
> >
> >
> >
> > Thanks and cheers
> >
> > --
> > Jonas Posner
> > Universitaet Kassel
> > Fachbereich 16 Elektrotechnik/Informatik
> > Fachgebiet Programmiersprachen/-methodik
> > Wilhelmshoeher Allee 71-73
> > 34121 Kassel, Germany
> >
> > Phone: +49 (0)561 804-6498
> > Fax: +49 (0)561 804-6219
> > mailto: jonas.pos...@uni-kassel.de
> > www.uni-kassel.de
> > [attachment "MessCrashingTime.java" deleted by Olivier Tardieu/
> > Watson/IBM] [attachment "MessCrashingTime.x10" deleted by Olivier
> > Tardieu/Watson/IBM]
> >
>
------------------------------------------------------------------------------
> > _______________________________________________
> > X10-users mailing list
> > X10-users@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/x10-users