Hi,

Please properly subscribe to the user list (this way we will not have to
manually approve your emails) if you want to get answers from the community
earlier. All you need to do is send an email to
"[email protected]" and follow simple instructions in the
reply.


Is there any particular reason why you set the following low-level settings
for TcpDiscoverySpi?
                <property name="joinTimeout" value="30000"/>
                <property name="ackTimeout" value="30000"/>
                <property name="maxAckTimeout" value="60000"/>
                <property name="reconnectCount" value="5"/>

If you observe high latencies in your network, then you need to increase
'socketWriteTimeout' as well. The following can also cause the issue:
- long GC pauses on the server or client side. Check the GC logs -
https://apacheignite.readme.io/docs/jvm-and-system-tuning#section-detailed-garbage-collection-stats;
- insufficient network throughput during some periods.

I would suggest removing all these low-level settings for TcpDiscoverySpi and
setting IgniteConfiguration.failureDetectionTimeout instead (preferably on all
the nodes). Low-level tuning of TcpDiscoverySpi is needed only in rare cases.
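
For illustration, a client configuration following this suggestion might look
roughly like the fragment below. It is only a sketch under a few assumptions:
the 30000 ms value is a placeholder rather than a recommendation, clientMode is
assumed because the affected nodes are clients, and the fragment is meant to
sit inside an existing Spring <beans> file. The addresses are the ones from the
configuration quoted below. To my knowledge, failureDetectionTimeout is ignored
when ackTimeout, maxAckTimeout, reconnectCount or socketTimeout are set
explicitly on TcpDiscoverySpi, which is why they are left out here.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Assumption: the node is started as a client. -->
    <property name="clientMode" value="true"/>

    <!-- Single high-level timeout in milliseconds; 30000 is a placeholder value. -->
    <property name="failureDetectionTimeout" value="30000"/>

    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <!-- No ackTimeout, maxAckTimeout or reconnectCount here. -->
            <property name="ipFinder">
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                    <property name="addresses">
                        <list>
                            <value>grid-tp1-prod:47500..47509</value>
                            <value>grid-tp2-prod:47500..47509</value>
                            <value>grid-tp3-prod:47500..47509</value>
                            <value>grid-tp4-prod:47500..47509</value>
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```

The same failureDetectionTimeout value would preferably be applied on the
server nodes too, so that all nodes agree on how long to wait before treating a
peer as failed.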

--
Denis
 
------
Ignite Community, would you please help us diagnose a production outage? We
are on Ignite version 1.5.0-final. Some of our clients are unable to connect
to the grid and throw a join timeout exception. This is very intermittent: not
all clients have this problem, only a few at irregular times, and we
cannot replicate it.

Here is our configuration :

JVM heap size 10 GB for each node. 16 nodes total. 

Overnight, a few of our clients were not able to connect to the grid. Most
clients were fine.
We checked JVM utilization via JMX for each node; memory was under-utilized.

Cache configuration snippet. 

<property name="backups" value="1"/>
<property name="startSize" value="#{1 * 1024 * 1024}"/>  
<property name="memoryMode" value="OFFHEAP_TIERED"/>
<property name="offHeapMaxMemory" value="#{10 * 1024L * 1024L * 1024L}"/>


Have you guys seen this before? Here are our settings from the client config
file and the error is below. 

<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="joinTimeout" value="30000"/>
        <property name="ackTimeout" value="30000"/>
        <property name="maxAckTimeout" value="60000"/>
        <property name="reconnectCount" value="5"/>
        <property name="ipFinder">
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                <property name="addresses">
                    <list>
                        <value>grid-tp1-prod:47500..47509</value>
                        <value>grid-tp2-prod:47500..47509</value>
                        <value>grid-tp3-prod:47500..47509</value>
                        <value>grid-tp4-prod:47500..47509</value>
                    </list>
                </property>
            </bean>
        </property>
    </bean>
</property>


Error Logs 

2016-06-13 18:58:11,657 ERROR orderserver.client.GridClient (GridClient.java:174) - class org.apache.ignite.IgniteException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
class org.apache.ignite.IgniteException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:906)
        at org.apache.ignite.Ignition.start(Ignition.java:350)
        at com.tudor.datagridI.TradingDataAccessImpl.<init>(TradingDataAccessImpl.java:104)
        at com.tudor.datagridI.DataGridClient.getTradingDataAccess(DataGridClient.java:16)
        at orderserver.client.GridClient.getTradingDataAccess(GridClient.java:94)
        at orderserver.client.GridClient.updateOrderInGrid(GridClient.java:164)
        at orderserver.OrderFactory.saveOrders(OrderFactory.java:5683)
        at com.tudor.fix.processor.SaveOrders.saveOrders(SaveOrders.java:124)
        at com.tudor.fix.processor.SaveOrders.saveOrders(SaveOrders.java:94)
        at com.tudor.fix.processor.SaveOrders.transform(SaveOrders.java:38)
        at com.tudor.fix.transformer.CompositeFilteringFixStateTransformer.transform(CompositeFilteringFixStateTransformer.java:59)
        at com.tudor.fix.transformer.ReportingBatchTransformer.transform(ReportingBatchTransformer.java:74)
        at com.tudor.fix.transformer.BatchingFixStateTransformer.batch(BatchingFixStateTransformer.java:158)
        at com.tudor.fix.transformer.BatchingFixStateTransformer.transform(BatchingFixStateTransformer.java:104)
        at com.tudor.fix.service.MessageKeeper.run(MessageKeeper.java:107)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1536)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:897)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1736)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1589)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1042)
        at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:964)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:850)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:749)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:619)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:589)
        at org.apache.ignite.Ignition.start(Ignition.java:347)
        ... 14 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=30000, reconCnt=5, maxAckTimeout=60000, forceSrvMode=false, clientReconnectDisabled=false]
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:258)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:677)
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1531)
        ... 24 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Join process timed out, did not receive response for join request (consider increasing 'joinTimeout' configuration property) [joinTimeout=30000, sock=Socket[addr=grid-tp2-prod/10.22.50.41,port=47503,localport=38191]]
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1335)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)


Appreciate your help. 

Thanks, 
Sparkle. 



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Production-outage-Join-process-time-out-tp5631p5699.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
