Hi Olivier,

I have rerun my benchmark and experimented with some settings. With your optimizations alone, I saw no improvement in run time in my small benchmark. However, I added the following line to the Transport class:

config.setProperty("hazelcast.partition.count", System.getProperty(Configuration.APGAS_PLACES));
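A minimal sketch of the idea in the line above, for illustration only. It uses a plain java.util.Properties object as a stand-in for the Hazelcast Config, assumes that Configuration.APGAS_PLACES resolves to the key "apgas.places", and falls back to Hazelcast's default partition count of 271 when the places property is unset or not a positive integer (System.getProperty returns null for an unset property). The class name PartitionCountSketch and the helper partitionProperties are hypothetical and not part of APGAS or Hazelcast.

```java
import java.util.Properties;

public class PartitionCountSketch {
    // Hazelcast's default value for hazelcast.partition.count.
    public static final String DEFAULT_PARTITION_COUNT = "271";

    // Assumption: this is the key behind Configuration.APGAS_PLACES.
    public static final String APGAS_PLACES_KEY = "apgas.places";

    // Builds the properties that would be passed on to Hazelcast.
    // `places` is the raw value of the places system property (may be null).
    public static Properties partitionProperties(String places) {
        String count = DEFAULT_PARTITION_COUNT;
        if (places != null) {
            String trimmed = places.trim();
            try {
                // Only accept a positive integer; otherwise keep the default.
                if (Integer.parseInt(trimmed) > 0) {
                    count = trimmed;
                }
            } catch (NumberFormatException e) {
                // Not a number: keep the default partition count.
            }
        }
        Properties props = new Properties();
        props.setProperty("hazelcast.partition.count", count);
        return props;
    }

    public static void main(String[] args) {
        Properties props = partitionProperties(System.getProperty(APGAS_PLACES_KEY));
        System.out.println("hazelcast.partition.count="
                + props.getProperty("hazelcast.partition.count"));
    }
}
```

In the actual Transport class, the resulting value would be handed to config.setProperty exactly as in the line above; the guard only avoids passing a null or malformed value when the number of places is not given on the command line.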
With this setting, the benchmark with one place crash takes only 0.9 - 1.4 sec, which is only a small overhead compared to a run without a place crash. In my actual project, this property has a positive impact, too. Do you know whether setting the partition count to the number of places has any disadvantages?

Moreover, when I use Hazelcast transactions in an APGAS program, I additionally use this:

TransactionOptions txOptions = new TransactionOptions().setTimeout(10, TimeUnit.SECONDS);

Otherwise, Hazelcast waits up to 120 sec for a transaction in the worst case.

Thanks and cheers
Jonas

On 04.10.2016 at 05:22, Olivier Tardieu wrote:
> Jonas,
>
> I have now reduced the connection retry count to 0.
>
> Olivier
>
> Olivier Tardieu/Watson/IBM@IBMUS wrote on 10/03/2016 09:30:03 PM:
>
>> From: Olivier Tardieu/Watson/IBM@IBMUS
>> To: Mailing list for users of the X10 programming language <x10-us...@lists.sourceforge.net>
>> Date: 10/03/2016 09:31 PM
>> Subject: Re: [X10-users] [APGAS] Performance at place failure
>>
>> Hi Jonas,
>>
>> I am starting to look into this. Sorry for the delay.
>>
>> There is a big difference between X10 and APGAS when it comes to
>> detecting place failures.
>> X10 does the detection quickly because it owns the communication
>> channels between the places.
>> As soon as a channel is closed, it assumes the place has failed and
>> reports the failure.
>> APGAS, on the other hand, relies on Hazelcast for all communication channels.
>> Hazelcast does not consider the loss of a channel an
>> irrecoverable error and tries to reconnect.
>> I have just committed a patch to instruct Hazelcast to give up on
>> failed connections faster.
>> That should help a lot.
>>
>> Like you, I have also observed some performance issues with
>> Hazelcast 3.6.3 and upgraded to 3.7.1 in another commit.
>>
>> Now, this is not going to make up for the difference, as Hazelcast
>> will still spend up to 1s trying to reconnect and some time doing
>> reconfiguration.
>> I will continue to look into ways to improve the recovery time with Hazelcast.
>>
>> Could you please update and rerun your benchmark programs and share
>> the results?
>>
>> Regards,
>>
>> Olivier
>>
>>
>> Jonas Posner <jonas.pos...@uni-kassel.de> wrote on 08/31/2016 08:02:45 AM:
>>
>> > From: Jonas Posner <jonas.pos...@uni-kassel.de>
>> > To: Mailing list for users of the X10 programming language <x10-us...@lists.sourceforge.net>
>> > Date: 08/31/2016 08:38 AM
>> > Subject: [X10-users] [APGAS] Performance at place failure
>> >
>> > Hi all,
>> >
>> > I am playing around with the resiliency of APGAS. I wondered about the
>> > relatively high times for handling a place failure. For comparison, I wrote
>> > a simple program in X10 and APGAS. Both are attached. The results show a
>> > significant difference.
>> >
>> > Every experiment was run with 4 places. X10 and APGAS are deployed from
>> > the official git repository.
>> >
>> > Native X10 with X10_RESILIENT_MODE=1
>> > without crash: 0.003
>> > with crash: 0.015
>> >
>> >
>> > Managed X10 with Hazelcast 3.3.1 and X10_RESILIENT_MODE=1
>> > without crash: 0.34
>> > with crash: 0.74
>> >
>> >
>> > APGAS with Hazelcast 3.6.3 and -Dapgas.serialization=java
>> > -Dapgas.resilient=true -Dapgas.compact=false
>> > without crash: 0.86
>> > with crash: very varying: 8-38
>> >
>> >
>> > APGAS with Hazelcast 3.6.3 and -Dapgas.serialization=java
>> > -Dapgas.resilient=true -Dapgas.compact=true
>> > without crash: 0.8
>> > with crash: very varying: 8-31
>> >
>> >
>> > APGAS with Hazelcast 3.7 and -Dapgas.serialization=java
>> > -Dapgas.resilient=true -Dapgas.compact=false
>> > without crash: 0.77
>> > with crash: 5.7 or 11.33 (50:50)
>> >
>> >
>> > APGAS with Hazelcast 3.7 and -Dapgas.serialization=java
>> > -Dapgas.resilient=true -Dapgas.compact=true
>> > without crash: 0.74
>> > with crash: 5.7 or 11.33 (50:50)
>> >
>> >
>> >
>> > Managed X10 absorbs a failure significantly better than APGAS. However,
>> > Managed X10 uses Hazelcast 3.3.1 and (official) APGAS uses Hazelcast
>> > 3.6.3. What causes the differences?
>> >
>> > A few days ago, Hazelcast 3.7 was released. In my experiments,
>> > it performs better.
>> >
>> > What is the purpose of "apgas.compact=true"? Should APGAS perform better with it?
>> >
>> > Are there other options to improve APGAS's performance at a place failure?
>> >
>> >
>> >
>> > Thanks and cheers
>> >
>> > --
>> > Jonas Posner
>> > Universitaet Kassel
>> > Fachbereich 16 Elektrotechnik/Informatik
>> > Fachgebiet Programmiersprachen/-methodik
>> > Wilhelmshoeher Allee 71-73
>> > 34121 Kassel, Germany
>> >
>> > Phone: +49 (0)561 804-6498
>> > Fax: +49 (0)561 804-6219
>> > mailto: jonas.pos...@uni-kassel.de
>> > www.uni-kassel.de
>> > [attachment "MessCrashingTime.java" deleted by Olivier Tardieu/Watson/IBM]
>> > [attachment "MessCrashingTime.x10" deleted by Olivier Tardieu/Watson/IBM]
>> >
>> > _______________________________________________
>> > X10-users mailing list
>> > X10-users@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/x10-users

--
Jonas Posner
Universitaet Kassel
Fachbereich 16 Elektrotechnik/Informatik
Fachgebiet Programmiersprachen/-methodik
Wilhelmshoeher Allee 71-73
34121 Kassel, Germany

Phone: +49 (0)561 804-6498
Fax: +49 (0)561 804-6219
mailto: jonas.pos...@uni-kassel.de
www.uni-kassel.de