Hi Jens.

I am using GCP to fire up 3 servers. The import is quick enough, and the
cluster and network look OK at that point.
Network speed between the 3 nodes also looks fine.

I have these properties enabled when I start the server:

java -server -agentpath:/home/r2d2/yourkit/bin/linux-x86-64/libyjpagent.so
-javaagent:lib/aspectj/lib/aspectjweaver.jar -Dgemfire.EXPIRY_THREADS=20
-Dgemfire.PREFER_SERIALIZED=false
-Dgemfire.enable.network.partition.detection=false
-Dgemfire.autopdx.ignoreConstructor=true
-Dgemfire.ALLOW_PERSISTENT_TRANSACTIONS=true
-Dgemfire.member-timeout=600000 -Xmx90G -Xms90G -Xmn30G -XX:SurvivorRatio=1
-XX:MaxTenuringThreshold=15 -XX:CMSInitiatingOccupancyFraction=78
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
-XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC
-XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -verbose:gc
-Xloggc:/home/r2d2/rdb-geode-server/gc/gc-server.log
-Djava.rmi.server.hostname='localhost'
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.rmi.port=9010
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
.....org.rdb.geode.server.GeodeServer

Could this setting influence the cluster:
-Dgemfire.enable.network.partition.detection=false

I am seeing a lot of recovery messages:

[info 2018/10/16 15:32:26.867 UTC <Recovery thread for bucket _B__net.lautus.gls.domain.life.instruction.instruction.rebalance.AggregatePortfolioRebalanceChoice_92> tid=0x42c9] Initialization of region _B__net.lautus.gls.domain.life.instruction.instruction.rebalance.AggregatePortfolioRebalanceChoice_92 completed

[info 2018/10/14 11:19:17.329 SAST <RedundancyLogger for region net.lautus.gls.domain.life.additionalfields.AdditionalFieldConfiguration> tid=0x1858] Region /net.lautus.gls.domain.life.additionalfields.AdditionalFieldConfiguration (and any colocated sub-regions) has potentially stale data. Buckets [3] are waiting for another offline member to recover the latest data.
  My persistent id is:
    DiskStore ID: 932530bc-4c45-4926-b4a1-6fe5fe1f0493
    Name:
    Location: /10.154.0.2:/home/r2d2/rdb-geode-server/geode/tauDiskStore

  Offline members with potentially new data:
  [
    DiskStore ID: c09e4cce-51e9-4111-8643-fe582677f49f
    Location: /10.154.0.4:/home/r2d2/rdb-geode-server/geode/tauDiskStore
    Buckets: [3]
  ]
  Use the "gfsh show missing-disk-stores" command to see all disk stores that are being waited on by other members.

[info 2018/10/14 11:19:35.250 SAST <Pooled Waiting Message Processor 7> tid=0x1318] Configured redundancy of 1 copies has been restored to /net.lautus.gls.domain.life.additionalfields.AdditionalFieldConfiguration
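
Following the hint in that log message, this is roughly how I check which disk stores the cluster is waiting on (the locator address below is just an example for our setup, and I would only consider the revoke step if the offline member's copy of the data is known to be disposable):

gfsh> connect --locator=localhost[10334]
gfsh> show missing-disk-stores
gfsh> revoke missing-disk-store --id=c09e4cce-51e9-4111-8643-fe582677f49f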


By the way, we are using Apache Geode 1.7.0.

Kindly
Pieter


On Wed, Oct 17, 2018 at 3:56 PM Jens Deppe <[email protected]> wrote:

> Hi Pieter,
>
> Your startup times are definitely too long - probably by at least an order
> of magnitude. My first guess is that this is network related. This may
> either be a DNS lookup issue or, if the cluster is isolated from the
> internet, it may be some problem with XSD validation needing internet
> access (even though we do bundle the XSD files with Geode - it should be
> the same for Spring too). I will see if I can find any potential XSD issue.
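>
> A quick, generic way to test the DNS theory (nothing Geode-specific, just
> timing hostname resolution on each node, plus the host that serves the
> XSDs) would be something along these lines:
>
>   time getent hosts $(hostname -f)
>   time getent hosts geode.apache.org
>
> If either of those hangs for several seconds, slow lookups would be a
> likely culprit.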
>
> --Jens
>
> On Wed, Oct 17, 2018 at 3:22 AM Pieter van Zyl <[email protected]>
> wrote:
>
>> Good day.
>>
>> We are currently running a 3 node Geode cluster.
>>
>> We are running the locator from gfsh and then starting up 3 servers with
>> Spring, which connect to the central locator.
>>
>> We are using persistence on all the regions and have basically one data
>> store and one PDX store per node.
>>
>> The problem we are experiencing is that with no data (i.e. a clean
>> cluster) it takes 75 minutes to start up.
>>
>> Once data has been imported into the cluster and we shut down all
>> nodes/servers and start up again, it takes 128 to 160 minutes.
>> This is very slow.
>>
>> The question is: is there any way to improve the startup speed? Or is
>> this the normal and expected speed?
>>
>> We have a 100 GB database distributed across the 3 nodes.
>> Server 1: 100 GB memory, 90 GB assigned heap, DB size of 49 GB, 32 cores.
>> Server 2: 64 GB memory, 60 GB assigned heap, DB size of 34 GB, 16 cores.
>> Server 3: 64 GB memory, 60 GB assigned heap, DB size of 34 GB, 16 cores.
>>
>> Should we have more disk stores? Maybe separate stores for the partitioned
>> vs replicated regions?
>>
>> <gfe:disk-store id="pdx-disk-store" allow-force-compaction="true"
>> auto-compact="true" max-oplog-size="1024">
>>    * <gfe:disk-dir location="geode/pdx"/>*
>> </gfe:disk-store>
>>
>> <gfe:disk-store id="tauDiskStore" allow-force-compaction="true"
>> auto-compact="true" max-oplog-size="5120"
>>                 compaction-threshold="90">
>>   *  <gfe:disk-dir location="geode/tauDiskStore"/>*
>> </gfe:disk-store>
>>
>> We have a mix of regions:
>>
>> Example partitioned region:
>>
>> <gfe:partitioned-region id="net.lautus.gls.domain.life.accounting.Account"
>>                         disk-store-ref="tauDiskStore"
>>                         statistics="true"
>>                         persistent="true">
>>     <!--<gfe:cache-listener ref="cacheListener"/>-->
>>     <gfe:eviction type="HEAP_PERCENTAGE" action="OVERFLOW_TO_DISK"/>
>> </gfe:partitioned-region>
>>
>> Example replicated region:
>>
>> <gfe:replicated-region id="org.rdb.internal.session.rootmap.RootMapHolder"
>>                        disk-store-ref="tauDiskStore"
>>                        statistics="true"
>>                        persistent="true">
>>     <!--<gfe:cache-listener ref="cacheListener"/>-->
>>     <gfe:eviction type="ENTRY_COUNT" action="OVERFLOW_TO_DISK"
>>                   threshold="100">
>>         <gfe:object-sizer ref="objectSizer"/>
>>     </gfe:eviction>
>> </gfe:replicated-region>
>>
>>
>> Any advice would be appreciated
>>
>> Kindly
>> Pieter
>>
>
