Jonas,

I have now reduced the connection retry count to 0.

Olivier
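
For readers who want to try a similar setting outside APGAS, here is a minimal sketch. It assumes that the "connection retry count" corresponds to the Hazelcast 3.x property `hazelcast.connection.monitor.max.faults` and that the 1s reconnect window mentioned below corresponds to `hazelcast.socket.connect.timeout.seconds`. This thread does not say which properties the actual commits touch, so the mapping is a guess. The class name `FastFailSettings` and its methods are hypothetical helpers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of APGAS or Hazelcast.
class FastFailSettings {
    // Assumed mapping: which Hazelcast properties the APGAS commits
    // actually change is not stated in this thread.
    static Map<String, String> properties() {
        Map<String, String> p = new LinkedHashMap<>();
        // Tolerate no I/O faults before a connection is dropped.
        p.put("hazelcast.connection.monitor.max.faults", "0");
        // Give up on a socket connect attempt after one second.
        p.put("hazelcast.socket.connect.timeout.seconds", "1");
        return p;
    }

    // Set the values as JVM system properties. This must run before
    // the Hazelcast instance is created to have any effect.
    static void apply() {
        properties().forEach(System::setProperty);
    }

    // Render the same settings as -D flags for the java command line.
    static String asJvmFlags() {
        return properties().entrySet().stream()
                .map(e -> "-D" + e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(" "));
    }
}
```

Calling `FastFailSettings.apply()` at the very start of `main`, or passing the string from `asJvmFlags()` to the `java` command, should be equivalent as long as it happens before the runtime starts.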

Olivier Tardieu/Watson/IBM@IBMUS wrote on 10/03/2016 09:30:03 PM:

> From: Olivier Tardieu/Watson/IBM@IBMUS
> To: Mailing list for users of the X10 programming language
> <x10-us...@lists.sourceforge.net>
> Date: 10/03/2016 09:31 PM
> Subject: Re: [X10-users] [APGAS] Performance at place failure
> 
> Hi Jonas,
> 
> I am starting to look into this. Sorry for the delay.
> 
> There is a big difference between X10 and APGAS when it comes to 
> detecting place failures.
> X10 does the detection quickly because it owns the communication 
> channels between the places.
> As soon as a channel is closed it assumes the place has failed and 
> reports the failure.
> APGAS on the other hand relies on Hazelcast for all communication 
> channels.
> Hazelcast does not consider the loss of a channel as an 
> irrecoverable error and tries to reconnect.
> I have just committed a patch to instruct Hazelcast to give up on 
> failed connections faster.
> That should help a lot.
> 
> Like you, I have also observed some performance issues with 
> Hazelcast 3.6.3 and upgraded to 3.7.1 in another commit.
> 
> Now, this is not going to make up for the difference as Hazelcast 
> will still spend up to 1s trying to reconnect and some time doing 
> reconfiguration.
> I will continue to look into ways to improve the recovery time with 
> Hazelcast.
> 
> Could you please update and rerun your benchmark programs and share 
> the results?
> 
> Regards,
> 
> Olivier
> 
> 
> Jonas Posner <jonas.pos...@uni-kassel.de> wrote on 08/31/2016 08:02:45 AM:
> 
> > From: Jonas Posner <jonas.pos...@uni-kassel.de>
> > To: Mailing list for users of the X10 programming language
> > <x10-us...@lists.sourceforge.net>
> > Date: 08/31/2016 08:38 AM
> > Subject: [X10-users] [APGAS] Performance at place failure
> > 
> > Hi all,
> > 
> > I am playing around with the resiliency of APGAS. I wondered about the 
> > relatively high times for handling a place failure. For comparison, I 
> > wrote a simple program in X10 and in APGAS. Both are attached. The 
> > results show a significant difference.
> > 
> > Every experiment was run with 4 places. X10 and APGAS are deployed 
> > from the official git repository.
> > 
> > Native X10 with X10_RESILIENT_MODE=1
> > without crash: 0.003
> > with crash: 0.015
> > 
> > 
> > Managed X10 with Hazelcast 3.3.1 and X10_RESILIENT_MODE=1
> > without crash: 0.34
> > with crash: 0.74
> > 
> > 
> > APGAS with Hazelcast 3.6.3 and -Dapgas.serialization=java 
> > -Dapgas.resilient=true -Dapgas.compact=false
> > without crash: 0.86
> > with crash: varies widely: 8-38
> > 
> > 
> > APGAS with Hazelcast 3.6.3 and -Dapgas.serialization=java 
> > -Dapgas.resilient=true -Dapgas.compact=true
> > without crash: 0.8
> > with crash: varies widely: 8-31
> > 
> > 
> > APGAS with Hazelcast 3.7 and -Dapgas.serialization=java 
> > -Dapgas.resilient=true -Dapgas.compact=false
> > without crash: 0.77
> > with crash: 5.7 or 11.33 (50:50)
> > 
> > 
> > APGAS with Hazelcast 3.7 and -Dapgas.serialization=java 
> > -Dapgas.resilient=true -Dapgas.compact=true
> > without crash: 0.74
> > with crash: 5.7 or 11.33 (50:50)
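
To make the figures above easier to compare, the slowdown caused by a crash can be expressed as the ratio of the "with crash" time to the "without crash" time. The sketch below only does that arithmetic on the numbers quoted in this message. The class name `CrashSlowdown` is hypothetical, and the unit of the timings is not stated in the original, so the ratios are unitless.

```java
// Hypothetical helper for comparing the timings quoted in this message.
class CrashSlowdown {
    // Ratio of the time measured with a crash to the time without one.
    static double slowdown(double withoutCrash, double withCrash) {
        return withCrash / withoutCrash;
    }

    static void report(String label, double withoutCrash, double... withCrash) {
        StringBuilder sb = new StringBuilder(label).append(":");
        for (double t : withCrash) {
            sb.append(String.format(" %.1fx", slowdown(withoutCrash, t)));
        }
        System.out.println(sb);
    }

    public static void main(String[] args) {
        // Timings copied from the message; for ranges both ends are listed.
        report("Native X10", 0.003, 0.015);
        report("Managed X10, Hazelcast 3.3.1", 0.34, 0.74);
        report("APGAS, Hazelcast 3.6.3, compact=false", 0.86, 8, 38);
        report("APGAS, Hazelcast 3.6.3, compact=true", 0.8, 8, 31);
        report("APGAS, Hazelcast 3.7, compact=false", 0.77, 5.7, 11.33);
        report("APGAS, Hazelcast 3.7, compact=true", 0.74, 5.7, 11.33);
    }
}
```

Note that the ratio alone understates the gap: native X10 slows down by a larger factor than managed X10, yet its absolute time with a crash is still the smallest of all configurations.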
> > 
> > 
> > 
> > Managed X10 absorbs a failure significantly better than APGAS. 
> > However, Managed X10 uses Hazelcast 3.3.1 and (official) APGAS uses 
> > Hazelcast 3.6.3. What causes the differences?
> > 
> > A few days ago, Hazelcast 3.7 was released. In my experiments, it 
> > performs better.
> > 
> > What is the purpose of "apgas.compact=true"? Should APGAS perform 
> > better with it?
> > 
> > Are there other options to improve APGAS's performance at a place 
> > failure?
> > 
> > 
> > 
> > Thanks and cheers
> > 
> > -- 
> > Jonas Posner
> > Universitaet Kassel
> > Fachbereich 16 Elektrotechnik/Informatik
> > Fachgebiet Programmiersprachen/-methodik
> > Wilhelmshoeher Allee 71-73
> > 34121 Kassel, Germany
> > 
> > Phone:  +49 (0)561 804-6498
> > Fax:    +49 (0)561 804-6219
> > mailto: jonas.pos...@uni-kassel.de
> > www.uni-kassel.de
> > [attachment "MessCrashingTime.java" deleted by Olivier Tardieu/
> > Watson/IBM] [attachment "MessCrashingTime.x10" deleted by Olivier 
> > Tardieu/Watson/IBM] 
> > 
> 
------------------------------------------------------------------------------
> > _______________________________________________
> > X10-users mailing list
> > X10-users@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/x10-users
> 
> ------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most 
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot

