Hi Olivier,

I have rerun my benchmark and experimented with some settings. With
your optimizations alone, I saw no improvement in the run times of my
small benchmark. However, I added the following to the Transport class:
config.setProperty("hazelcast.partition.count",
    System.getProperty(Configuration.APGAS_PLACES));

With this setting, the benchmark takes only 0.9 - 1.4 sec with one place
crash, which is only a small overhead compared to a run without a place
crash. In my actual project, this property has a positive impact, too.
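
A minimal sketch of how the value for that property could be derived
defensively, since System.getProperty returns null when the property is
not set. The class and method names are hypothetical and not part of
APGAS or Hazelcast; the sketch assumes that Configuration.APGAS_PLACES
resolves to "apgas.places" and falls back to 271, Hazelcast's default
partition count. A plain java.util.Properties object stands in for the
Hazelcast Config so that the example runs without any dependency.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of APGAS or Hazelcast.
final class PartitionCountSetting {
    // Hazelcast's default value for hazelcast.partition.count.
    static final String DEFAULT_PARTITION_COUNT = "271";
    // Assumption: the property name behind Configuration.APGAS_PLACES.
    static final String APGAS_PLACES = "apgas.places";

    // Returns the place count as a string if it is a positive integer,
    // otherwise the default partition count.
    static String partitionCountFor(String places) {
        if (places == null) {
            return DEFAULT_PARTITION_COUNT;
        }
        try {
            int n = Integer.parseInt(places.trim());
            return n > 0 ? Integer.toString(n) : DEFAULT_PARTITION_COUNT;
        } catch (NumberFormatException e) {
            return DEFAULT_PARTITION_COUNT;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the properties of the Hazelcast Config object.
        Properties hazelcastProperties = new Properties();
        hazelcastProperties.setProperty("hazelcast.partition.count",
                partitionCountFor(System.getProperty(APGAS_PLACES)));
        System.out.println(hazelcastProperties);
    }
}
```

In the Transport class, the result of partitionCountFor(...) would be
passed to config.setProperty in place of the raw system property.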

Do you know whether setting the partition count to the number of places
has any disadvantages?

Moreover, when I use Hazelcast transactions in an APGAS program, I
additionally set a timeout:
TransactionOptions txOptions =
    new TransactionOptions().setTimeout(10, TimeUnit.SECONDS);

Otherwise, Hazelcast waits up to 120 sec for a transaction in the worst
case.


Thanks and cheers
Jonas

On 04.10.2016 at 05:22, Olivier Tardieu wrote:
> Jonas,
>
> I have now reduced the connection retry count to 0.
>
> Olivier
>
> Olivier Tardieu/Watson/IBM@IBMUS wrote on 10/03/2016 09:30:03 PM:
>
>> From: Olivier Tardieu/Watson/IBM@IBMUS
>> To: Mailing list for users of the X10 programming language <x10-
>> us...@lists.sourceforge.net>
>> Date: 10/03/2016 09:31 PM
>> Subject: Re: [X10-users] [APGAS] Performance at place failure
>>
>> Hi Jonas,
>>
>> I am starting to look into this. Sorry for the delay.
>>
>> There is a big difference between X10 and APGAS when it comes to
>> detecting place failures.
>> X10 does the detection quickly because it owns the communication
>> channels between the places.
>> As soon as a channel is closed, it assumes the place has failed and
>> reports the failure.
>> APGAS on the other hand relies on Hazelcast for all communication
>> channels.
>> Hazelcast does not consider the loss of a channel as an
>> irrecoverable error and tries to reconnect.
>> I have just committed a patch to instruct Hazelcast to give up on
>> failed connections faster.
>> That should help a lot.
>>
>> Like you, I have also observed some performance issues with
>> Hazelcast 3.6.3 and upgraded to 3.7.1 in another commit.
>>
>> Now, this is not going to make up for the difference as Hazelcast
>> will still spend up to 1s trying to reconnect and some time doing
>> reconfiguration.
>> I will continue to look into ways to improve the recovery time with
>> Hazelcast.
>>
>> Could you please update and rerun your benchmark programs and share
>> the results?
>>
>> Regards,
>>
>> Olivier
>>
>>
>> Jonas Posner <jonas.pos...@uni-kassel.de> wrote on 08/31/2016 08:02:45 AM:
>>
>> > From: Jonas Posner <jonas.pos...@uni-kassel.de>
>> > To: Mailing list for users of the X10 programming language <x10-
>> > us...@lists.sourceforge.net>
>> > Date: 08/31/2016 08:38 AM
>> > Subject: [X10-users] [APGAS] Performance at place failure
>> >
>> > Hi all,
>> >
>> > I am playing around with the resiliency of APGAS. I wondered about
>> > the relatively long times needed to handle a place failure. For
>> > comparison, I wrote a simple program in X10 and APGAS. Both are
>> > attached. The results show a significant difference.
>> >
>> > Every experiment was run with 4 places. X10 and APGAS are deployed from
>> > the official git repository.
>> >
>> > Native X10 with X10_RESILIENT_MODE=1
>> > without crash: 0.003
>> > with crash: 0.015
>> >
>> >
>> > Managed X10 with Hazelcast 3.3.1 and X10_RESILIENT_MODE=1
>> > without crash: 0.34
>> > with crash: 0.74
>> >
>> >
>> > APGAS with Hazelcast 3.6.3 and -Dapgas.serialization=java
>> > -Dapgas.resilient=true -Dapgas.compact=false
>> > without crash: 0.86
>> > with crash: very varying: 8-38
>> >
>> >
>> > APGAS with Hazelcast 3.6.3 and -Dapgas.serialization=java
>> > -Dapgas.resilient=true -Dapgas.compact=true
>> > without crash: 0.8
>> > with crash: very varying: 8-31
>> >
>> >
>> > APGAS with Hazelcast 3.7 and -Dapgas.serialization=java
>> > -Dapgas.resilient=true -Dapgas.compact=false
>> > without crash: 0.77
>> > with crash: 5.7 or 11.33 (50:50)
>> >
>> >
>> > APGAS with Hazelcast 3.7 and -Dapgas.serialization=java
>> > -Dapgas.resilient=true -Dapgas.compact=true
>> > without crash: 0.74
>> > with crash: 5.7 or 11.33 (50:50)
>> >
>> >
>> >
>> > Managed X10 absorbs a failure significantly better than APGAS. However,
>> > Managed X10 uses Hazelcast 3.3.1 and (official) APGAS uses Hazelcast
>> > 3.6.3. What causes the differences?
>> >
>> > A few days ago, Hazelcast 3.7 was released. In my experiments, it
>> > performs better.
>> >
>> > What is the purpose of "apgas.compact=true"? Should APGAS perform
>> > better with it?
>> >
>> > Are there other options to improve APGAS's performance at a place
>> > failure?
>> >
>> >
>> >
>> > Thanks and cheers
>> >
>> > --
>> > Jonas Posner
>> > Universitaet Kassel
>> > Fachbereich 16 Elektrotechnik/Informatik
>> > Fachgebiet Programmiersprachen/-methodik
>> > Wilhelmshoeher Allee 71-73
>> > 34121 Kassel, Germany
>> >
>> > Phone:  +49 (0)561 804-6498
>> > Fax:    +49 (0)561 804-6219
>> > mailto: jonas.pos...@uni-kassel.de
>> > www.uni-kassel.de
>> > [attachment "MessCrashingTime.java" deleted by Olivier Tardieu/
>> > Watson/IBM] [attachment "MessCrashingTime.x10" deleted by Olivier
>> > Tardieu/Watson/IBM]
>> >
>>
>> > ------------------------------------------------------------------------------
>> > _______________________________________________
>> > X10-users mailing list
>> > X10-users@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/x10-users

-- 
Jonas Posner
Universitaet Kassel
Fachbereich 16 Elektrotechnik/Informatik
Fachgebiet Programmiersprachen/-methodik
Wilhelmshoeher Allee 71-73
34121 Kassel, Germany

Phone:  +49 (0)561 804-6498
Fax:    +49 (0)561 804-6219
mailto: jonas.pos...@uni-kassel.de
www.uni-kassel.de
