Hi Jonas,

In theory, I think having few partitions increases the risk of imbalance,
where some Hazelcast instances end up with a lot more data than others.
There are probably other considerations; I have never played with this
setting.
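The imbalance risk can be made concrete with a deliberately simplified model. This is an assumption for illustration, not Hazelcast's actual partition-assignment algorithm: partitions are dealt round-robin to the surviving members, and every partition holds the same amount of data. With the partition count set to 4 places, losing one place leaves 4 partitions on 3 members, so one member carries twice as much data as the others. With Hazelcast's default of 271 partitions, the busiest member carries only about 1% more. The class name `PartitionBalance` and its methods are hypothetical.

```java
import java.util.Arrays;

// Simplified model (assumption): partitions are dealt round-robin to members
// and every partition holds the same amount of data. Hazelcast's real
// partition table assignment differs; this only shows how partition
// granularity limits how evenly data can be spread.
class PartitionBalance {

    // Number of partitions owned by each member under round-robin assignment.
    static int[] partitionsPerMember(int partitionCount, int members) {
        int[] owned = new int[members];
        for (int p = 0; p < partitionCount; p++) {
            owned[p % members]++;
        }
        return owned;
    }

    // Load of the busiest member divided by load of the idlest member
    // (Infinity if some member owns no partition at all).
    static double imbalance(int partitionCount, int members) {
        int[] owned = partitionsPerMember(partitionCount, members);
        int max = Arrays.stream(owned).max().getAsInt();
        int min = Arrays.stream(owned).min().getAsInt();
        return (double) max / min;
    }

    public static void main(String[] args) {
        int places = 4;
        int survivors = places - 1; // after one place failure

        // Partition count equal to the number of places: prints [2, 1, 1] and 2.0.
        System.out.println(Arrays.toString(partitionsPerMember(places, survivors)));
        System.out.println(imbalance(places, survivors));

        // Hazelcast's default of 271 partitions: [91, 90, 90], ratio about 1.01.
        System.out.println(Arrays.toString(partitionsPerMember(271, survivors)));
        System.out.println(imbalance(271, survivors));
    }
}
```

The model ignores backup replicas and migration, and real keys do not hash perfectly evenly. Skewed data per partition would only make a small partition count worse.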

We have also encountered issues with Hazelcast transactions, where a failed
transaction (even a spurious failure) may introduce this 2-minute delay.
AFAIK there is no configuration setting for this, but the last time I
looked was probably at version 3.5.
Because of this issue, we are not using Hazelcast transactions.
For instance, for UTS we rolled our own mini-transactions.
See method set2 in HazelcastStore.x10:

https://github.com/x10-lang/x10/blob/master/x10.runtime/src-x10/x10/util/resilient/store/HazelcastStore.x10
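The thread does not show what set2 does. As a hypothetical illustration of the general idea of avoiding full transactions, here is a single-key optimistic update: read the current value, compute the new one, and commit with compare-and-set, retrying if another writer got there first. This is not the set2 method. `ConcurrentHashMap` stands in for a distributed map. Hazelcast's `IMap` also implements `java.util.concurrent.ConcurrentMap`, so the same `putIfAbsent`/`replace` calls exist there, but their behavior under place failure is not covered by this sketch. The class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.UnaryOperator;

// Hypothetical single-key "mini transaction" using optimistic compare-and-set.
// NOT the set2 method from HazelcastStore.x10; ConcurrentHashMap stands in
// for a distributed map that implements ConcurrentMap.
class MiniTransaction {

    // Atomically applies fn to the value stored under key, treating a missing
    // entry as initial. Returns the value that was committed.
    // Comparison is by value equality, so an A-B-A change goes unnoticed.
    static <K, V> V update(ConcurrentMap<K, V> map, K key, V initial, UnaryOperator<V> fn) {
        while (true) {
            V current = map.get(key);
            if (current == null) {
                // No entry yet: install the updated initial value only if still absent.
                V created = fn.apply(initial);
                if (map.putIfAbsent(key, created) == null) {
                    return created;
                }
            } else {
                V next = fn.apply(current);
                // Commit only if the entry still holds the value we read.
                if (map.replace(key, current, next)) {
                    return next;
                }
            }
            // Another writer changed the entry in between: re-read and retry.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentMap<String, Integer> store = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    update(store, "counter", 0, v -> v + 1);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // Every increment is committed exactly once, so this prints 4000.
        System.out.println(store.get("counter"));
    }
}
```

A failed compare-and-set simply retries locally instead of aborting a transaction, which is the kind of failure mode the paragraph above wants to avoid. Updates spanning several keys would need more machinery than this.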

Olivier

Jonas Posner <jonas.pos...@uni-kassel.de> wrote on 11/04/2016 11:05:56 AM:

> From: Jonas Posner <jonas.pos...@uni-kassel.de>
> To: Mailing list for users of the X10 programming language <x10-us...@lists.sourceforge.net>
> Date: 11/04/2016 11:06 AM
> Subject: Re: [X10-users] [APGAS] Performance at place failure
> 
> Hi Olivier,
> 
> I have rerun my benchmark and played around with some settings. With
> your optimizations, I noticed no improvement in run time for my small
> benchmark. However, I added this to the Transport class:
> config.setProperty("hazelcast.partition.count", System.getProperty(Configuration.APGAS_PLACES));
> 
> With this setting, the benchmark takes only 0.9-1.4 sec with one place
> crash, which is only a little overhead compared to a run without a place
> crash. In my actual project, this property has a positive impact, too.
> 
> Do you know if setting the partition count to the number of places has
> any disadvantages?
> 
> Moreover, if I use Hazelcast transactions in an APGAS program, I
> additionally use this:
> TransactionOptions txOptions = new TransactionOptions().setTimeout(10, TimeUnit.SECONDS);
> 
> Otherwise, Hazelcast waits up to 120 sec for a transaction in the worst case.
> 
> 
> Thanks and cheers
> Jonas
> 
> On 04.10.2016 at 05:22, Olivier Tardieu wrote:
> > Jonas,
> >
> > I have now reduced the connection retry count to 0.
> >
> > Olivier
> >
> > Olivier Tardieu/Watson/IBM@IBMUS wrote on 10/03/2016 09:30:03 PM:
> >
> >> From: Olivier Tardieu/Watson/IBM@IBMUS
> >> To: Mailing list for users of the X10 programming language <x10-us...@lists.sourceforge.net>
> >> Date: 10/03/2016 09:31 PM
> >> Subject: Re: [X10-users] [APGAS] Performance at place failure
> >>
> >> Hi Jonas,
> >>
> >> I am starting to look into this. Sorry for the delay.
> >>
> >> There is a big difference between X10 and APGAS when it comes to
> >> detecting place failures.
> >> X10 does the detection quickly because it owns the communication
> >> channels between the places.
> >> As soon as a channel is closed, it assumes the place has failed and
> >> reports the failure.
> >> APGAS, on the other hand, relies on Hazelcast for all communication
> >> channels.
> >> Hazelcast does not consider the loss of a channel as an
> >> irrecoverable error and tries to reconnect.
> >> I have just committed a patch to instruct Hazelcast to give up on
> >> failed connections faster.
> >> That should help a lot.
> >>
> >> Like you, I have also observed some performance issues with
> >> Hazelcast 3.6.3 and upgraded to 3.7.1 in another commit.
> >>
> >> Now, this is not going to make up for the difference, as Hazelcast
> >> will still spend up to 1s trying to reconnect and some time doing
> >> reconfiguration.
> >> I will continue to look into ways to improve the recovery time with
> >> Hazelcast.
> >>
> >> Could you please update and rerun your benchmark programs and share
> >> the results?
> >>
> >> Regards,
> >>
> >> Olivier
> >>
> >>
> >> Jonas Posner <jonas.pos...@uni-kassel.de> wrote on 08/31/2016 08:02:45 AM:
> >>
> >> > From: Jonas Posner <jonas.pos...@uni-kassel.de>
> >> > To: Mailing list for users of the X10 programming language <x10-us...@lists.sourceforge.net>
> >> > Date: 08/31/2016 08:38 AM
> >> > Subject: [X10-users] [APGAS] Performance at place failure
> >> >
> >> > Hi all,
> >> >
> >> > I am playing around with the resiliency of APGAS. I was surprised by
> >> > the relatively high time needed to handle a place failure. For
> >> > comparison, I wrote a simple program in X10 and in APGAS; both are
> >> > attached. The results show a significant difference.
> >> >
> >> > Every experiment was run with 4 places. X10 and APGAS were installed
> >> > from the official git repository.
> >> >
> >> > Native X10 with X10_RESILIENT_MODE=1
> >> > without crash: 0.003
> >> > with crash: 0.015
> >> >
> >> >
> >> > Managed X10 with Hazelcast 3.3.1 and X10_RESILIENT_MODE=1
> >> > without crash: 0.34
> >> > with crash: 0.74
> >> >
> >> >
> >> > APGAS with Hazelcast 3.6.3 and -Dapgas.serialization=java
> >> > -Dapgas.resilient=true -Dapgas.compact=false
> >> > without crash: 0.86
> >> > with crash: highly variable, 8-38
> >> >
> >> >
> >> > APGAS with Hazelcast 3.6.3 and -Dapgas.serialization=java
> >> > -Dapgas.resilient=true -Dapgas.compact=true
> >> > without crash: 0.8
> >> > with crash: highly variable, 8-31
> >> >
> >> >
> >> > APGAS with Hazelcast 3.7 and -Dapgas.serialization=java
> >> > -Dapgas.resilient=true -Dapgas.compact=false
> >> > without crash: 0.77
> >> > with crash: either 5.7 or 11.33 (50:50)
> >> >
> >> >
> >> > APGAS with Hazelcast 3.7 and -Dapgas.serialization=java
> >> > -Dapgas.resilient=true -Dapgas.compact=true
> >> > without crash: 0.74
> >> > with crash: either 5.7 or 11.33 (50:50)
> >> >
> >> >
> >> >
> >> > Managed X10 absorbs a failure significantly better than APGAS. However,
> >> > Managed X10 uses Hazelcast 3.3.1, whereas the official APGAS uses
> >> > Hazelcast 3.6.3. What causes the difference?
> >> >
> >> > Hazelcast 3.7 was released a few days ago. In my experiments, it
> >> > performs better.
> >> >
> >> > What is the purpose of "apgas.compact=true"? Should APGAS perform
> >> > better with it?
> >> >
> >> > Are there other options to improve APGAS's performance on a place
> >> > failure?
> >> >
> >> >
> >> >
> >> > Thanks and cheers
> >> >
> >> > --
> >> > Jonas Posner
> >> > Universitaet Kassel
> >> > Department 16: Electrical Engineering/Computer Science
> >> > Programming Languages/Methodology Group
> >> > Wilhelmshoeher Allee 71-73
> >> > 34121 Kassel, Germany
> >> >
> >> > Phone:  +49 (0)561 804-6498
> >> > Fax:    +49 (0)561 804-6219
> >> > mailto: jonas.pos...@uni-kassel.de
> >> > www.uni-kassel.de
> >> > [attachment "MessCrashingTime.java" deleted by Olivier Tardieu/
> >> > Watson/IBM] [attachment "MessCrashingTime.x10" deleted by Olivier
> >> > Tardieu/Watson/IBM]
> >> >
> >>
> > 
> 
------------------------------------------------------------------------------
> >> > _______________________________________________
> >> > X10-users mailing list
> >> > X10-users@lists.sourceforge.net
> >> > https://lists.sourceforge.net/lists/listinfo/x10-users
> >>
> >
> >
> 

