Re: ignite memory issues:Urgent in production

2016-09-20 Thread percent620
Can anyone help me fix this issue? It is happening in our production
environment.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/ignite-memory-issues-Urgent-in-production-tp7817p7842.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: ignite memory issues:Urgent in production

2016-09-18 Thread percent620
*Here are the logs from another server. I found that several Ignite servers
shut down automatically.*
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send
message to remote node: TcpDiscoveryNode
[id=f59d7d01-b01d-46b2-b679-17b73313ae98, addrs=[y, 127.0.0.1],
sockAddrs=[y/y:0, /127.0.0.1:0], discPort=0, order=2486,
intOrder=1265, lastExchangeTime=1474169373925, loc=false,
ver=1.7.0#20160801-sha1:383273e3, isClient=true]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1996)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1936)
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1304)
    ... 30 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect
to node (is node still alive?). Make sure that each ComputeTask and cache
Transaction has a timeout set in order to prevent parties from waiting
forever in case of network issues
[nodeId=f59d7d01-b01d-46b2-b679-17b73313ae98, addrs=[y/y:47100,
/127.0.0.1:47100]]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2499)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2140)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2034)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1970)
    ... 32 more
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: y/y:47100
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2504)
        ... 35 more
    Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:117)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2363)
        ... 35 more
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47100
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2504)
        ... 35 more
    Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=f59d7d01-b01d-46b2-b679-17b73313ae98, rcvd=a4df12c5-fe9e-4b3f-b652-0ec02111dc7b]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2614)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2371)
        ... 35 more
[11:29:44,714][SEVERE][marshaller-cache-#228%null%][CacheContinuousQueryHandler]
Failed to send event notification to node:
19ca5b90-ae92-41fa-ae54-d1427e41185d
class org.apache.ignite.IgniteCheckedException: Failed to send message (node
may have left the grid or TCP connection cannot be established due to
firewall issues) [node=TcpDiscoveryNode
[id=19ca5b90-ae92-41fa-ae54-d1427e41185d, addrs=[y, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, y/y:0], discPort=0, order=2481,
intOrder=1260, lastExchangeTime=1474169373764, loc=false,
ver=1.7.0#20160801-sha1:383273e3, isClient=true], topic=T4
[topic=TOPIC_CACHE, id1=1fd3a002-42a8-3e13-a1aa-bf164b7f2d64,
id2=19ca5b90-ae92-41fa-ae54-d1427e41185d, id3=1], msg=GridContinuousMessage
[type=MSG_EVT_NOTIFICATION, routineId=bf2fb8b0-db98-4f6e-8fa4-514d00dcf5e7,
data=null, futId=null], policy=2]
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1309)
    at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1540)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1337)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1308)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1290)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:945)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:888)
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
   

ignite memory issues:Urgent in production

2016-09-18 Thread percent620
Hello, I have an urgent issue with Ignite in our production environment.

I have deployed a standalone Ignite cluster with 7 server nodes. Each Ignite
node has 40GB of memory, and the total is 270GB.

[10:05:11] Topology snapshot [ver=2356, servers=7, clients=0, CPUs=1488,
heap=270GB]



All of our connections to Ignite use "client" mode. When we have 60 clients
(4GB each), Ignite reports the following information:

*[10:05:11] Topology snapshot [ver=2356, servers=7, clients=60, CPUs=1488,
heap=510GB]*

Sometimes all the Ignite nodes shut down quickly, and the error message is:
[13:08:12,549][SEVERE][exchange-worker-#136%null%][GridDhtPartitionsExchangeFuture]
Failed to reinitialize local partitions (preloading will be stopped):
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=2582,
minorTopVer=0], nodeId=8837eae8, evt=NODE_FAILED]
java.lang.NullPointerException
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:734)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:473)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1440)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:745)
[13:08:12] Ignite node stopped OK [uptime=25:07:08:283]



I have 2 questions:
1) Can you please tell me what is going wrong according to this error message?
2)
*[10:05:11] Topology snapshot [ver=2356, servers=7, clients=60, CPUs=1488,
heap=510GB]*
The total client memory is 240GB (60 client nodes * 4GB). Is this the root cause?
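
As a sanity check on the numbers in question 2, here is a minimal sketch that
parses the two "Topology snapshot" lines quoted in this message and subtracts
the server-only heap from the heap reported with clients connected. It assumes
the heap figure in the snapshot is a sum over all nodes in the topology, which
the figures here are consistent with. The class name TopologyHeap and its
helper methods are made up for illustration and are not part of Ignite. It only
checks the arithmetic and says nothing about whether the client heap is the
root cause.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an Ignite class: reads "Topology snapshot" log lines.
class TopologyHeap {
    // Matches the fields printed in the snapshot lines quoted in this thread.
    private static final Pattern SNAPSHOT = Pattern.compile(
        "servers=(\\d+), clients=(\\d+), CPUs=(\\d+), heap=(\\d+)GB");

    /** Returns {servers, clients, heapGb} parsed from one snapshot line. */
    static int[] parse(String line) {
        Matcher m = SNAPSHOT.matcher(line);
        if (!m.find())
            throw new IllegalArgumentException("Not a topology snapshot line: " + line);
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(4))
        };
    }

    /** Heap (GB) added by clients, assuming the snapshot heap is a sum over all nodes. */
    static int clientHeapGb(String serversOnly, String withClients) {
        return parse(withClients)[2] - parse(serversOnly)[2];
    }

    public static void main(String[] args) {
        // The two snapshot lines from this message, unwrapped onto one line each.
        String serversOnly = "[10:05:11] Topology snapshot [ver=2356, servers=7, clients=0, CPUs=1488, heap=270GB]";
        String withClients = "[10:05:11] Topology snapshot [ver=2356, servers=7, clients=60, CPUs=1488, heap=510GB]";

        int clientHeap = clientHeapGb(serversOnly, withClients);
        int clients = parse(withClients)[1];
        System.out.println("client heap total = " + clientHeap + "GB");
        System.out.println("heap per client   = " + (clientHeap / clients) + "GB");
    }
}
```

Under that assumption, the difference between the two snapshots (510GB - 270GB)
matches the 60 clients * 4GB stated above.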


