Re: Exception on CacheEntryProcessor invoke (2.10.0)

2021-05-25 Thread ihalilaltun
Hi,

Here is the debug log: ignite.zip

In the meantime I'll try to simplify the use case as you suggested.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Exception on CacheEntryProcessor invoke (2.10.0)

2021-05-24 Thread ihalilaltun
Hi,

I've run more detailed tests over the weekend and I can now say for sure that
the problem is not related to the migrated data. With a new cluster setup and
zero data we still get the error.

What I have in mind is this: with the new version there may be a new
configuration parameter that has to be set for cache entry processors to be
DEPLOYED in SHARED mode to all cluster nodes, but I cannot find such a
parameter.

So at this point my question becomes: is there a configuration parameter
that forces all cache entry processors to be deployed on every cluster node
from the client?
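For reference, these are the only deployment-related knobs I could find; a
minimal sketch in Java of how we set them (the attached zip below has the
actual configuration, this is just the relevant part):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DeploymentMode;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DeploymentConfigSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Peer class loading, so entry processor classes can be shipped
        // from the client nodes to the server nodes.
        cfg.setPeerClassLoadingEnabled(true);

        // SHARED is what we already use on both clients and servers.
        cfg.setDeploymentMode(DeploymentMode.SHARED);

        Ignition.start(cfg);
    }
}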

The cluster and client configuration follows;
client-server-configs.zip

When the client nodes start and run the necessary jobs -containing
cache entry processors- the first one or two cache entry processors are
deployed to both clusters; after that, new cache entry processors start to get
ClassNotFoundException and the cluster nodes keep giving the following
warnings;


[2021-05-24T15:36:00,010][WARN
][sys-stripe-4-#5][GridDeploymentPerVersionStore] Failed to load peer class
(ignore if class got undeployed during preloading)
[alias=com.segmentify.lotr.frodo.cacheentryprocessor.ShiftPromotionCountersEntryProcessor,
dep=SharedDeployment [rmv=false, super=GridDeployment [ts=1621870560005,
depMode=SHARED, clsLdr=GridDeploymentClassLoader
[id=627410f9971-f4e082a1-4012-4720-bcbb-e438359221e1, singleNode=false,
nodeLdrMap=HashMap
{db22a85d-37a5-45c4-ae63-bdd535eaca44=75f020f9971-db22a85d-37a5-45c4-ae63-bdd535eaca44},
p2pTimeout=5000, usrVer=0, depMode=SHARED, quiet=false],
clsLdrId=627410f9971-f4e082a1-4012-4720-bcbb-e438359221e1, userVer=0,
loc=false,
sampleClsName=com.segmentify.lotr.frodo.cacheentryprocessor.ShiftPromotionCountersEntryProcessor,
pendingUndeploy=false, undeployed=false, usage=0]]]
[2021-05-24T15:36:00,103][WARN
][sys-stripe-2-#3][GridDeploymentPerVersionStore] Failed to load peer class
(ignore if class got undeployed during preloading)
[alias=com.segmentify.lotr.frodo.cacheentryprocessor.RockScoreResetProcessor,
dep=SharedDeployment [rmv=false, super=GridDeployment [ts=1621870560100,
depMode=SHARED, clsLdr=GridDeploymentClassLoader
[id=a27410f9971-f4e082a1-4012-4720-bcbb-e438359221e1, singleNode=false,
nodeLdrMap=HashMap
{db22a85d-37a5-45c4-ae63-bdd535eaca44=75f020f9971-db22a85d-37a5-45c4-ae63-bdd535eaca44},
p2pTimeout=5000, usrVer=0, depMode=SHARED, quiet=false],
clsLdrId=a27410f9971-f4e082a1-4012-4720-bcbb-e438359221e1, userVer=0,
loc=false,
sampleClsName=com.segmentify.lotr.frodo.cacheentryprocessor.RockScoreResetProcessor,
pendingUndeploy=false, undeployed=false, usage=0]]]
[2021-05-24T15:36:00,180][WARN
][sys-stripe-1-#2][GridDeploymentPerVersionStore] Failed to load peer class
(ignore if class got undeployed during preloading)
[alias=com.segmentify.lotr.frodo.cacheentryprocessor.RockScoreResetProcessor,
dep=SharedDeployment [rmv=false, super=GridDeployment [ts=1621870560171,
depMode=SHARED, clsLdr=GridDeploymentClassLoader
[id=f27410f9971-f4e082a1-4012-4720-bcbb-e438359221e1, singleNode=false,
nodeLdrMap=HashMap
{db22a85d-37a5-45c4-ae63-bdd535eaca44=75f020f9971-db22a85d-37a5-45c4-ae63-bdd535eaca44},
p2pTimeout=5000, usrVer=0, depMode=SHARED, quiet=false],
clsLdrId=f27410f9971-f4e082a1-4012-4720-bcbb-e438359221e1, userVer=0,
loc=false,
sampleClsName=com.segmentify.lotr.frodo.cacheentryprocessor.RockScoreResetProcessor,
pendingUndeploy=false, undeployed=false, usage=0]]]
[2021-05-24T15:36:00,202][WARN
][sys-stripe-1-#2][GridDeploymentPerVersionStore] Failed to load peer class
(ignore if class got undeployed during preloading)
[alias=com.segmentify.lotr.frodo.cacheentryprocessor.RockScoreResetProcessor,
dep=SharedDeployment [rmv=false, super=GridDeployment [ts=1621870560191,
depMode=SHARED, clsLdr=GridDeploymentClassLoader
[id=137410f9971-f4e082a1-4012-4720-bcbb-e438359221e1, singleNode=false,
nodeLdrMap=HashMap
{db22a85d-37a5-45c4-ae63-bdd535eaca44=75f020f9971-db22a85d-37a5-45c4-ae63-bdd535eaca44},
p2pTimeout=5000, usrVer=0, depMode=SHARED, quiet=false],
clsLdrId=137410f9971-f4e082a1-4012-4720-bcbb-e438359221e1, userVer=0,
loc=false,
sampleClsName=com.segmentify.lotr.frodo.cacheentryprocessor.RockScoreResetProcessor,
pendingUndeploy=false, undeployed=false, usage=0]]]
[2021-05-24T15:36:00,308][WARN
][sys-stripe-1-#2][GridDeploymentPerVersionStore] Failed to load peer class
(ignore if class got undeployed during preloading)
[alias=com.segmentify.lotr.frodo.cacheentryprocessor.RockScoreResetProcessor,
dep=SharedDeployment [rmv=false, super=GridDeployment [ts=1621870560302,
depMode=SHARED, clsLdr=GridDeploymentClassLoader
[id=337410f9971-f4e082a1-4012-4720-bcbb-e438359221e1, singleNode=false,
nodeLdrMap=HashMap
{db22a85d-37a5-45c4-ae63-bdd535eaca44=75f020f9971-db22a85d-37a5-45c4-ae63-bdd535eaca44},
p2pTimeout=5000, usrVer=0, depMode=SHARED, quiet=false],
clsLdrId=337410f9971-f4e082a1-4012-4720-bcbb-e438359221e1, userVer=0,
loc=

Re: Exception on CacheEntryProcessor invoke (2.10.0)

2021-05-21 Thread ihalilaltun
hi

the case can be reproduced only by upgrading from 2.7.6 to 2.10.0 with
existing data. Can you run that kind of reproduction step?



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Exception on CacheEntryProcessor invoke (2.10.0)

2021-05-21 Thread ihalilaltun
Hi Ilya

The exact same applications run on the system; there is no way that class is
missing.

By the way, I have run some more tests: with a new, clean Ignite cluster setup
we did not get the errors and the system runs smoothly. The only difference
here is the upgrade process, so there should be a problem with the data
migration during the upgrade process.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Exception on CacheEntryProcessor invoke (2.10.0)

2021-05-21 Thread ihalilaltun
Hi,
sorry, but I cannot share such a project; company policies restrict it.

I tried to reproduce it with new code but had no luck (due to different
environments and the existing data structure). The idea was to call the
cache entry processors within an ExecutorService, but I could not get the
error.

Let me give more information about the migration step that causes the error. I
have a small portion of production data on the test environment. Ignite is
running on version 2.7.6. None of our tests fail, and none of the test
automations cause the error. The next step is to gracefully shut down the
Ignite nodes, then the *yum upgrade apache-ignite* command is executed and the
successfully-upgraded message is received. Then the Ignite nodes are
successfully started with version 2.10.0. After that point, all cache entry
processors that are called from runnable contexts start to throw
ClassNotFoundExceptions.

The DEBUG mode log is here -> ignite-debug.log

the most interesting log message is this;
*[2021-05-21T12:10:00,012][DEBUG][sys-stripe-9-#10][GridDeploymentPerVersionStore]
Failed to find class on remote node
[class=com.segmentify.lotr.frodo.cacheentryprocessor.ShiftPromotionCountersEntryProcessor,
nodeId=f2c50fc3-0e7b-43eb-bcbb-5d3dda635b6b,
clsLdrId=d12c24e8971-f2c50fc3-0e7b-43eb-bcbb-5d3dda635b6b, reason=Failed to
find local deployment for peer request: GridDeploymentRequest
[rsrcName=com/segmentify/lotr/frodo/cacheentryprocessor/ShiftPromotionCountersEntryProcessor.class,
ldrId=d12c24e8971-f2c50fc3-0e7b-43eb-bcbb-5d3dda635b6b, isUndeploy=false,
nodeIds=null]]*

Although the class is present on the remote node, somehow the Ignite node
cannot find it.




-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Exception on CacheEntryProcessor invoke (2.10.0)

2021-05-20 Thread ihalilaltun
Hi igniters,

recently we have upgraded from 2.7.6 to 2.10.0 and some of the
cache entry processors started to throw the following errors on
cache.invoke(...) calls.

Caused by: java.lang.ClassNotFoundException:
com.segmentify.lotr.frodo.cacheentryprocessor.RockScoreUpdateProcessor
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
~[?:1.8.0_261]
at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_261]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
~[?:1.8.0_261]
at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_261]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_261]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_261]


On version 2.7.6 we also got these errors from time to time, but when the
application that uses these cache entry processors was restarted the errors
did not occur. On version 2.10.0 this workaround did not solve our problem.

Currently we have 23 different cache entry processors running on the system.
After many different test scenarios and checks we found a pattern in the above
error case: only 4 out of the 23 cache entry processors keep getting this
error, and *3 of these are invoked by ExecutorServices*;

sample usage is something like the following;

private ExecutorService executorService = Executors.newCachedThreadPool();

executorService.submit(() -> {
    ...
    igniteCache.withKeepBinary()
            .invoke(record.getKey(), new RockScoreUpdateProcessor(),
                    "arg1", "arg2", "arg3");
});

*One cache entry processor is invoked via XSync
(https://github.com/antkorwin/xsync).*
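That invocation looks roughly like the following (a minimal sketch assuming
XSync's execute(key, Runnable) API; the wrapper class and field names here are
illustrative):

import com.antkorwin.xsync.XSync;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;

public class RockScoreUpdater {
    private final XSync<String> xsync = new XSync<>();
    private final IgniteCache<String, Object> igniteCache;

    public RockScoreUpdater(IgniteCache<String, Object> igniteCache) {
        this.igniteCache = igniteCache;
    }

    public void update(String key) {
        // The entry processor is again invoked from inside a lambda/runnable,
        // here the per-key synchronized runnable provided by XSync.
        xsync.execute(key, () -> {
            IgniteCache<String, BinaryObject> binCache = igniteCache.withKeepBinary();
            binCache.invoke(key, new RockScoreUpdateProcessor(), "arg1", "arg2", "arg3");
        });
    }
}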


So what we see here is that somehow, when a cache entry processor is invoked
from a runnable context, a ClassNotFoundException is thrown.

The *peerClassLoading* property is set to true, *deploymentMode* is set to
SHARED and *persistenceEnabled* is set to true.
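For completeness, those three settings correspond to roughly the following
Java configuration (a minimal sketch; the data region settings are left at
defaults, the real values are in our actual node configuration):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.DeploymentMode;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ServerNodeConfigSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Classes (including entry processors) are peer-loaded and shared
        // across the cluster.
        cfg.setPeerClassLoadingEnabled(true);
        cfg.setDeploymentMode(DeploymentMode.SHARED);

        // Native persistence on the default data region.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        cfg.setDataStorageConfiguration(storageCfg);

        Ignition.start(cfg);
    }
}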


Can this be a bug, either known or unknown?


Currently this is a blocker issue for our upgrade of the production
environment. Any help is appreciated.

Thanks.




-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: CacheEntryProcessor ClassNotFoundException after 2.7.6 -> 2.10.0 Upgrade

2021-05-18 Thread ihalilaltun
What we expect here is that the related cache entry processors, or any other
class, should be redeployed in SHARED mode and do the task they are supposed
to do.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


CacheEntryProcessor ClassNotFoundException after 2.7.6 -> 2.10.0 Upgrade

2021-05-17 Thread ihalilaltun
Hi igniters,

recently we have upgraded from 2.7.6 to 2.10.0 and some of the
cache entry processors started to act weird. We have the following
cache entry processor: RockScoreUpdateProcessor.java
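For readers without the attachment, such a processor has roughly the following
shape (a minimal sketch only; the actual field names and update logic are in
the attached RockScoreUpdateProcessor.java, the "rockScore" field below is
illustrative):

import javax.cache.processor.EntryProcessorException;
import javax.cache.processor.MutableEntry;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.cache.CacheEntryProcessor;

public class RockScoreUpdateProcessor implements CacheEntryProcessor<String, BinaryObject, Void> {
    @Override
    public Void process(MutableEntry<String, BinaryObject> entry, Object... args)
            throws EntryProcessorException {
        if (!entry.exists())
            return null;

        // Rebuild the binary value with the updated score field and write it back.
        BinaryObject updated = entry.getValue().toBuilder()
                .setField("rockScore", args.length > 0 ? args[0] : null)
                .build();

        entry.setValue(updated);
        return null;
    }
}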

  

When the processor is called we are getting the following error from the
application: application.log
The related exception is logged by Ignite as follows: ignite.log

The strange thing is that when the application is restarted the related entry
processor is deployed just fine, with the following Ignite log:
*[2021-05-17T15:30:54,124][INFO
][sys-stripe-121-#122][GridDeploymentPerVersionStore] Class was deployed in
SHARED or CONTINUOUS mode: class
com.segmentify.lotr.frodo.cacheentryprocessor.RockScoreUpdateProcessor*

But most of the time we are getting the above situation.


Our client and server nodes have the same properties.


Prior to the upgrade we did not have such a problem. Any help is appreciated.

Thanks.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


ignite 2.10 index type change

2021-05-07 Thread ihalilaltun
Hello igniters,

We've recently upgraded from Ignite 2.7.6 to 2.10. On cluster start we saw
that all indexes were rebuilt, which went very well -> no data loss :)

After the upgrade we ran some tests and encountered the following problem; we
have the following field in one of our objects

@QuerySqlField(index = true, descending = true)
protected Timestamp lastUpdateTime;

but this field is persisted as follows;

write -> binaryWriter.writeDate("lastUpdateTime",
             lastUpdateTime != null ? new Date(lastUpdateTime.getTime()) : null);
read  -> Date h = binaryReader.readDate("lastUpdateTime");
         lastUpdateTime = (h != null) ? new Timestamp(h.getTime()) : null;
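In context, the write/read calls above live in the object's Binarylizable
methods; a minimal sketch under that assumption, showing only the
lastUpdateTime handling (the class name is illustrative):

import java.sql.Timestamp;
import java.util.Date;
import org.apache.ignite.binary.BinaryObjectException;
import org.apache.ignite.binary.BinaryReader;
import org.apache.ignite.binary.BinaryWriter;
import org.apache.ignite.binary.Binarylizable;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class SampleRecord implements Binarylizable {
    @QuerySqlField(index = true, descending = true)
    protected Timestamp lastUpdateTime;

    @Override
    public void writeBinary(BinaryWriter binaryWriter) throws BinaryObjectException {
        // The Timestamp field is written as a plain java.util.Date,
        // which is what makes the index see 'Date' instead of 'Timestamp'.
        binaryWriter.writeDate("lastUpdateTime",
                lastUpdateTime != null ? new Date(lastUpdateTime.getTime()) : null);
    }

    @Override
    public void readBinary(BinaryReader binaryReader) throws BinaryObjectException {
        Date h = binaryReader.readDate("lastUpdateTime");
        lastUpdateTime = (h != null) ? new Timestamp(h.getTime()) : null;
    }
}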

So after the upgrade operation we're getting the following error. My question
is: can I somehow update the field type to Date and update all the indexes on
the cluster without any data loss? Any help is appreciated.

javax.cache.processor.EntryProcessorException: class
org.apache.ignite.IgniteCheckedException: Type for a column 'lastUpdateTime'
is not compatible with index definition. Expected 'Timestamp', actual type
'Date'
at
org.apache.ignite.internal.processors.cache.CacheInvokeResult.get(CacheInvokeResult.java:108)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.invoke(IgniteCacheProxyImpl.java:1715)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.invoke(IgniteCacheProxyImpl.java:1759)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.invoke(GatewayProtectedCacheProxy.java:1264)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
com.segmentify.lotr.gimli.model.push.LastVisitReminderCampaign.runPrioritized(LastVisitReminderCampaign.java:180)
~[classes!/:0.0.1-SNAPSHOT]
at
com.segmentify.lotr.gimli.campaign.PushManager.executePrioritizedCamps(PushManager.java:264)
~[classes!/:0.0.1-SNAPSHOT]
at
com.segmentify.lotr.gimli.campaign.PushManager.lambda$executeCampaigns$1(PushManager.java:146)
~[classes!/:0.0.1-SNAPSHOT]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[?:1.8.0_261]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_261]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_261]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_261]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
Caused by: org.apache.ignite.IgniteCheckedException: Type for a column
'lastUpdateTime' is not compatible with index definition. Expected
'Timestamp', actual type 'Date'
at 
org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7587)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.GridCacheContext.validateKeyAndValue(GridCacheContext.java:1916)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6204)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5923)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:4019)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3913)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2042)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2013)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1920)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1758)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1741)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2766)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:439)
~[ignite-core-2.10.0.jar!/:2.10.0]
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2338)
~[ignite-core-2.10.

Re: [ANNOUNCE] Apache Ignite 2.9.1 Released

2021-01-04 Thread ihalilaltun
Hi Yaroslav,

We are at v2.7.6 and want to upgrade to the latest version. Do you have any
guidance for this kind of upgrade?
Can we upgrade to the latest version directly without any problem, or should
we upgrade version by version?

2.7.6 -> 2.8 -> 2.8.1 -> 2.9 then 2.9.1

thanks



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Critical Workers Health Check on client side

2021-01-04 Thread ihalilaltun
hi there,

I am curious whether we can somehow manage the *Critical Workers Health
Check* on the client side. What I need to do is catch the critical workers
health check results on the client side; can this be done by implementing a
custom StopNodeOrHaltFailureHandler on the client side?

We are on ignite v2.7.6
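What I have in mind is roughly the following on the client node's
configuration (a minimal sketch assuming a custom FailureHandler is acceptable
instead of the stock StopNodeOrHaltFailureHandler; the logging is just a
placeholder):

import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.FailureContext;
import org.apache.ignite.failure.FailureHandler;

public class ClientFailureHandlerSketch {
    public static IgniteConfiguration clientConfig() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true);

        cfg.setFailureHandler(new FailureHandler() {
            @Override
            public boolean onFailure(Ignite ignite, FailureContext failureCtx) {
                // React to critical failures reported on this node,
                // e.g. blocked or terminated critical workers.
                System.err.println("Critical failure: " + failureCtx.type()
                        + ", error: " + failureCtx.error());

                // Return false so the node is not invalidated/stopped here.
                return false;
            }
        });

        return cfg;
    }
}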

thanks



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


node down after Caught unhandled exception in NIO worker thread (restart the node) log

2019-11-25 Thread ihalilaltun
Hi Igniters,

We had a strange node-down incident after getting the following log (we've
been using Ignite in production for almost 1 year and we're getting this error
for the first time)

[2019-11-22T21:19:54,222][INFO
][grid-nio-worker-tcp-comm-3-#203][TcpCommunicationSpi] Established outgoing
communication connection [locAddr=/192.168.199.60:43720,
rmtAddr=/192.168.199.222:47100]
[2019-11-22T21:19:54,230][ERROR][grid-nio-worker-tcp-comm-0-#200][TcpCommunicationSpi]
Caught unhandled exception in NIO worker thread (restart the node).
java.nio.channels.CancelledKeyException: null
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
~[?:1.8.0_201]
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:82)
~[?:1.8.0_201]
at
java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:204)
~[?:1.8.0_201]
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1997)
~[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1794)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
[ignite-core-2.7.6.jar:2.7.6]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
[2019-11-22T21:19:54,343][ERROR][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi]
Failed to process selector key [ses=GridSelectorNioSessionImpl
[worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2,
bytesRcvd=617063277634, bytesSent=8878293076427, bytesRcvd0=107695,
bytesSent0=727192, select=true, super=GridWorker
[name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null, finished=false,
heartbeatTs=1574457593322, hashCode=1772114147, interrupted=false,
runner=grid-nio-worker-tcp-comm-2-#202]]],
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
inRecovery=null, outRecovery=null, super=GridNioSessionImpl
[locAddr=/192.168.199.60:47100, rmtAddr=/192.168.199.68:62054,
createTime=1574457593307, closeTime=0, bytesSent=38, bytesRcvd=42,
bytesSent0=38, bytesRcvd0=42, sndSchedTime=1574457593322,
lastSndTime=1574457593322, lastRcvTime=1574457593322, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@2f6039d0, directMode=true],
GridConnectionBytesVerifyFilter], accepted=true, markedForClose=false]]]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_201]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
~[?:1.8.0_201]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
~[?:1.8.0_201]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_201]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
~[?:1.8.0_201]
at
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1282)
~[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2386)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2153)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1794)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
[ignite-core-2.7.6.jar:2.7.6]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
[2019-11-22T21:19:54,624][WARN
][grid-nio-worker-tcp-comm-2-#202][TcpCommunicationSpi] Closing NIO session
because of unhandled exception [cls=class o.a.i.i.util.nio.GridNioException,
msg=Connection reset by peer]


- there were no network issues of any kind
- here are the node logs;
ignite.zip

- here is the client node log;
proto-20191122.log

any thoughts?

Regards




-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: excessive timeouts and load on new cache creations

2019-11-25 Thread ihalilaltun
Hi Anton,

We have faced the same bug on non-byte-array types as well.
Here is the POJO we use;
UserMailInfo.java

I've already read the topic you shared, thanks :)




-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: excessive timeouts and load on new cache creations

2019-11-22 Thread ihalilaltun
Hi Pavel,

Thanks for your reply and suggestions, but currently we cannot use
cache groups. As you know there is a known bug for them ->
https://issues.apache.org/jira/browse/IGNITE-11953




-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: excessive timeouts and load on new cache creations

2019-11-21 Thread ihalilaltun
Hi Anton,

Timeouts can be found in the logs that I shared;

[query-#13207879][GridMapQueryExecutor] Failed to execute local query.
org.apache.ignite.cache.query.QueryCancelledException: The query was
cancelled while executing.

Huge loads on the server nodes are monitored via the Zabbix agent;


Just after cache creation we cannot respond to requests; these metrics are
monitored via Prometheus, here is the screenshot;


For some reason, timeouts occur right after cache proxy initializations (cache
creations).


-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


excessive timeouts and load on new cache creations

2019-11-21 Thread ihalilaltun
Hi Igniters,

Every time a new cache is created dynamically we get an excessive number of
timeouts and huge load on the grid nodes. The current grid-node metrics are
the following;

[2019-11-21T14:03:16,079][INFO ][grid-timeout-worker-#199][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=5f730d51, uptime=54 days, 22:36:17.169]
^-- H/N/C [hosts=51, nodes=52, CPUs=290]
^-- CPU [cur=14.27%, avg=14.48%, GC=0%]
^-- PageMemory [pages=7115228]
^-- Heap [used=4278MB, free=47.77%, comm=8192MB]
^-- Off-heap [used=28119MB, free=2.94%, comm=28972MB]
^--   sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
^--   default region [used=28118MB, free=1.93%, comm=28672MB]
^--   metastoreMemPlc region [used=1MB, free=98.89%, comm=100MB]
^--   TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=36102MB]
^--   sysMemPlc region [used=0MB]
^--   default region [used=36102MB]
^--   metastoreMemPlc region [used=unknown]
^--   TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=128, qSize=0]


Ignite logs and GC logs;
Ignite logs:
ignite.zip
GC logs:
gc.zip
Ignite configuration XML:
ignite-config.xml


Any thoughts and suggestions for configuration optimization?

Regards



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: excessive timeouts and load on NODE_JOINED and NODE_LEFT events

2019-11-13 Thread ihalilaltun
Hi Maksim,

Thanks, I think it will.

regards



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: excessive timeouts and load on NODE_JOINED and NODE_LEFT events

2019-11-12 Thread ihalilaltun
Hi,

Timeouts always start when the NODE_JOINED event has been fired; I am not sure
whether this event causes a PME to take place or not.

As I said before, this is a live system and we cannot stop Ignite operations
while PME is running :(

I'll try to change the log level to DEBUG; if I can do that, I'll share the
logs here.

Regards



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: excessive timeouts and load on NODE_JOINED and NODE_LEFT events

2019-11-11 Thread ihalilaltun
Hi Ilya,

Can we restrict PME operations for client nodes explicitly? As far as I know,
PME does not occur when client nodes are connected. This is a production
environment and, as you may expect, we have many clients joining and leaving
the grid nodes under heavy traffic.

Any suggestions other than increasing timeouts and avoiding client
connections? Maybe a configuration that will block PME operations on clients
joining/leaving?

Regards.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


excessive timeouts and load on NODE_JOINED and NODE_LEFT events

2019-11-11 Thread ihalilaltun
Hi igniters,

Every time a client node connects to or disconnects from the grid we get an
excessive number of timeouts and huge load on the grid nodes. The current
grid-node metrics are the following;

Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=cbdf5b45, uptime=40 days, 08:39:18.888]
^-- H/N/C [hosts=50, nodes=51, CPUs=288]
^-- CPU [cur=50.3%, avg=11.15%, GC=0%]
^-- PageMemory [pages=7115259]
^-- Heap [used=6753MB, free=17.56%, comm=8192MB]
^-- Off-heap [used=28119MB, free=2.94%, comm=28972MB]
^--   sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
^--   default region [used=28118MB, free=1.93%, comm=28672MB]
^--   metastoreMemPlc region [used=1MB, free=98.76%, comm=100MB]
^--   TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=32000MB]
^--   sysMemPlc region [used=0MB]
^--   default region [used=32000MB]
^--   metastoreMemPlc region [used=unknown]
^--   TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=128, qSize=0]

this log is from our internal monitoring tool;
PROBLEM: ignite-22: Warning: Processor load is high on pk-ignite: 5.485

Ignite logs and GC logs;
ignite.zip
gc.zip

any thoughts?





-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


RE: Ignite node failure after network issue

2019-10-31 Thread ihalilaltun
Hi Alex,

Thanks for the response. We've made some optimizations on thread sizes and
reorganized the classpaths. I'll write again if we face the problem again.

regards



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


RE: Ignite node failure after network issue

2019-10-28 Thread ihalilaltun
Hi Alex,

I removed the IP addresses for security reasons; that's why it looks
non-standard.

I'll try to adjust all the thread-pool sizes. I am not sure whether we need
them or not, since the configurations were made by our previous software
architect.

I'll look further into the serialization and marshaller problems, thanks. Will
it be enough if I add the JAR files under the lib directories on the server
nodes in order not to get these serialization problems?

thanks.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-25 Thread ihalilaltun
Hi Ilya,

It is almost impossible for us to get thread dumps; since this is a production
environment we cannot use a profiler :(

Our biggest objects range from 2 to 4 kilobytes. We are planning to shrink the
sizes, but the timing for this has not been decided yet.

regards.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Ignite node failure after network issue

2019-10-25 Thread ihalilaltun
Hi Igniters,

We had a network glitch last night and one node halted itself. Both client and
node logs are attached; can someone have a look and tell me the exact problem
here?

Archive.zip

We are on version 2.7.6. Our client application runs on Spring Boot v2.0.6.

I have been searching all over the Apache Ignite online sources for best
practices on handling network problems on both the server and client side; if
someone has such a source, I would be happy to read it.

Regards.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-18 Thread ihalilaltun
Hi Ilya,

Sorry for the late response. We don't use a lock mechanism in our environment.
We have a lot of put and get operations, and as far as I remember these
operations do not hold locks. In addition, in many update/put operations we
use CacheEntryProcessor, which also does not hold locks.

My guess would be this: sometimes we put big objects into the cache. I know
using big objects causes this kind of problem, but can you confirm this as
well?

Regards.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Starvation in striped pool

2019-10-17 Thread ihalilaltun
Hi Ilya,

From time to time, we have faced exactly the same problem. Are there any best
practices for handling network issues? What I mean is, if there are any
network issues between the client(s) and server(s), we want the cluster to
keep living. As for the clients, they can be disconnected from the servers.

Regards.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-11 Thread ihalilaltun
Hi Pavel,

Thank you for the detailed explanation. We are discussing a hotfix with
management, but I think the decision will be negative :(

I think we'll have to wait for the 2.8 release, which seems to be scheduled
for January 17, 2020. I hope we'll have this issue fixed by then.

Regards.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-11 Thread ihalilaltun
Hi Ilya,

Yes, we have persistence enabled.

The OS is not swapping out Ignite memory, since we have more than enough
resources on the server. The disks used for persistence are SSDs with 96 MB/s
read and write speed. Is there an easy way to check whether we are running out
of data region?
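For example, something like the following is what I was thinking of for
checking the regions from code (a minimal sketch using the DataRegionMetrics
API, assuming data region metrics are enabled in the configuration):

import org.apache.ignite.DataRegionMetrics;
import org.apache.ignite.Ignite;

public class DataRegionUsageCheck {
    // Prints per-region page counts and fill factors so we can see how close
    // a region is to its configured maximum.
    public static void printRegionUsage(Ignite ignite) {
        for (DataRegionMetrics m : ignite.dataRegionMetrics()) {
            System.out.printf("region=%s allocatedPages=%d pagesFillFactor=%.2f%n",
                    m.getName(), m.getTotalAllocatedPages(), m.getPagesFillFactor());
        }
    }
}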

Regards.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-11 Thread ihalilaltun
Hi Pavel,

Here are the logs from the node with localId 3561ac09-6752-4e2e-8279-d975c268d045:
ignite-2019-10-06.gz

Cache creation is done with Java code on our side; we use the getOrCreateCache
method. Here is the piece of code showing how we configure and create caches;

...
ignite.getOrCreateCache(getCommonCacheConfigurationForAccount(accountId, initCacheType));

private CacheConfiguration
getCommonCacheConfigurationForAccount(String accountId, IgniteCacheType cacheType) {
    CacheConfiguration cacheConfiguration = new CacheConfiguration<>();
    cacheConfiguration.setName(accountId.concat(cacheType.getCacheNameSuffix()));
    if (cacheType.isSqlTable()) {
        cacheConfiguration.setIndexedTypes(cacheType.getKeyClass(), cacheType.getValueClass());
        cacheConfiguration.setSqlSchema(accountId);
        cacheConfiguration.setSqlEscapeAll(true);
    }
    cacheConfiguration.setEventsDisabled(true);
    cacheConfiguration.setStoreKeepBinary(true);
    cacheConfiguration.setAtomicityMode(CacheAtomicityMode.ATOMIC);
    cacheConfiguration.setBackups(1);
    if (!cacheType.getCacheGroupName().isEmpty()) {
        cacheConfiguration.setGroupName(cacheType.getCacheGroupName());
    }
    if (cacheType.getExpiryDurationInDays().getDurationAmount() > 0) {
        cacheConfiguration.setExpiryPolicyFactory(
            TouchedExpiryPolicy.factoryOf(cacheType.getExpiryDurationInDays()));
    }
    return cacheConfiguration;
}



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Cluster went down after "Unable to await partitions release latch within timeout" WARN

2019-10-09 Thread ihalilaltun
Hi There Igniters,

We had a very strange cluster behaviour while creating new caches on the fly.
Just after the caches are created we start to get the following warnings from
all cluster nodes, including the coordinator node;

[2019-09-27T15:00:17,727][WARN
][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Unable to await
partitions release latch within timeout: ServerLatch [permits=1,
pendingAcks=[3561ac09-6752-4e2e-8279-d975c268d045], super=CompletableLatch
[id=exchange, topVer=AffinityTopologyVersion [topVer=92, minorTopVer=2]]]

After a while all client nodes seemed to be disconnected from the cluster,
with no logs on the clients' side.

The coordinator node has many logs like;
2019-09-27T15:00:03,124][WARN
][sys-#337823][GridDhtPartitionsExchangeFuture] Partition states validation
has failed for group: acc_1306acd07be78000_userPriceDrop. Partitions cache
sizes are inconsistent for Part 129:
[9497f1c4-13bd-4f90-bbf7-be7371cea22f=757
1486cd47-7d40-400c-8e36-b66947865602=2427 ] Part 138:
[1486cd47-7d40-400c-8e36-b66947865602=2463
f9cf594b-24f2-4a91-8d84-298c97eb0f98=736 ] Part 156:
[b7782803-10da-45d8-b042-b5b4a880eb07=672
9f0c2155-50a4-4147-b444-5cc002cf6f5d=2414 ] Part 284:
[b7782803-10da-45d8-b042-b5b4a880eb07=690
1486cd47-7d40-400c-8e36-b66947865602=1539 ] Part 308:
[1486cd47-7d40-400c-8e36-b66947865602=2401
7750e2f1-7102-4da2-9a9d-ea202f73905a=706 ] Part 362:
[1486cd47-7d40-400c-8e36-b66947865602=2387
7750e2f1-7102-4da2-9a9d-ea202f73905a=697 ] Part 434:
[53c253e1-ccbe-4af1-a3d6-178523023c8b=681
1486cd47-7d40-400c-8e36-b66947865602=1541 ] Part 499:
[1486cd47-7d40-400c-8e36-b66947865602=2505
7750e2f1-7102-4da2-9a9d-ea202f73905a=699 ] Part 622:
[1486cd47-7d40-400c-8e36-b66947865602=2436
e97a0f3f-3175-49f7-a476-54eddd59d493=662 ] Part 662:
[b7782803-10da-45d8-b042-b5b4a880eb07=686
1486cd47-7d40-400c-8e36-b66947865602=2445 ] Part 699:
[1486cd47-7d40-400c-8e36-b66947865602=2427
f9cf594b-24f2-4a91-8d84-298c97eb0f98=646 ] Part 827:
[62a05754-3f3a-4dc8-b0fa-53c0a0a0da63=703
1486cd47-7d40-400c-8e36-b66947865602=1549 ] Part 923:
[1486cd47-7d40-400c-8e36-b66947865602=2434
a9e9eaba-d227-4687-8c6c-7ed522e6c342=706 ] Part 967:
[62a05754-3f3a-4dc8-b0fa-53c0a0a0da63=673
1486cd47-7d40-400c-8e36-b66947865602=1595 ] Part 976:
[33301384-3293-417f-b94a-ed36ebc82583=666
1486cd47-7d40-400c-8e36-b66947865602=2384 ] 

The coordinator's log and one of the cluster nodes' logs are attached.
coordinator_log.gz
cluster_node_log.gz

Any help/comment is appreciated.

Thanks.





-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Unresponsive cluster after "Checkpoint read lock acquisition has been timed out" Error

2019-10-09 Thread ihalilaltun
Hi There,

We had an unresponsive cluster today after the following error;

[2019-10-09T07:08:13,623][ERROR][sys-stripe-94-#95][GridCacheDatabaseSharedManager]
Checkpoint read lock acquisition has been timed out.
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$CheckpointReadLockTimeoutException:
Checkpoint read lock acquisition has been timed out.
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.failCheckpointReadLock(GridCacheDatabaseSharedManager.java:1564)
~[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1497)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1739)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3138)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:135)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:271)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:266)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
[ignite-core-2.7.6.jar:2.7.6]
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
[ignite-core-2.7.6.jar:2.7.6]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]



After this log the cluster somehow went into an infinite loop and became
unresponsive. Since the log files are bigger than 5 MB, I am sharing a Google
Drive link for all log files.
https://drive.google.com/drive/folders/1XHaw2YZq3_F4CMw8m_mJZkUz1K17njU9?usp=sharing

Any help is appreciated

thanks




-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Too many file descriptors on ignite node

2019-09-30 Thread ihalilaltun
Hi Denis,

Our problem is not related to configuration parameters. We already limited the
archive size to 8, but some nodes do not release the files back to the
filesystem. When we look at the archive directory we only see 8 files, but
when we look at the file descriptors on the server, we see thousands of WAL
files that are not released by the filesystem. The list I shared is from the
filesystem. I think there is a big issue here.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Too many file descriptors on ignite node

2019-09-23 Thread ihalilaltun
Hi Igniters it is me again :)

We are seeing weird behaviour on some of the cluster nodes. The cluster uses
native persistence with MMAP disabled. Some clusters have too many WAL files:
even though they have already been deleted, for some reason they still persist
on the disk. I do not have any logs from the cluster or the related machines,
but I have screenshots and the file descriptor list;
here is the screenshot from the related node;
Screen_Shot_2019-09-23_at_14.png

here is the file descriptor list;
open_files.zip

I am not sure if this is related to
https://issues.apache.org/jira/browse/IGNITE-12127; I hope it is, since we are
planning to upgrade all clusters before the end of this week. If it is not
related to IGNITE-12127, then any comments on how this is possible?

cheers



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: [ANNOUNCE] Apache Ignite 2.7.6 Released

2019-09-23 Thread ihalilaltun
We have had some issues with native persistence; in fact, I reported this
issue :) https://issues.apache.org/jira/browse/IGNITE-12127
We are hoping to do the upgrade before this week ends.

cheers



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Node failure with "Failed to write buffer." error

2019-09-02 Thread ihalilaltun
I am sorry, but it has been a long time since we changed the configuration and
we do not have any logs or traces :(
Is there any estimated date for the 2.7.6 release?



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Node failure with "Failed to write buffer." error

2019-09-02 Thread ihalilaltun
Hi mmuzaf,

Sorry for the late response. When we enabled mmap we had some IO issues;
that's why we disabled it. If there is such a bug as you said, we can
re-enable mmap.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Node failure with "Failed to write buffer." error

2019-08-23 Thread ihalilaltun
Hi Mmuzaf

IGNITE_WAL_MMAP is false in our environment.

Here is the configuration;


<beans xmlns="http://www.springframework.org/schema/beans"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">

-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Node failure with "Failed to write buffer." error

2019-08-23 Thread ihalilaltun
Hi Dmagda

Here are all the log files that I could get from the server;
ignite.zip
gc.zip
gc-logs-continued



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Node failure with "Failed to write buffer." error

2019-08-22 Thread ihalilaltun
Hi folks,

We have been experiencing node failures with the error "Failed to write
buffer." recently. Any ideas or optimizations to avoid the error and the node
failure?

Thanks...

[2019-08-22T01:20:55,916][ERROR][wal-write-worker%null-#221][] Critical
system error detected. Will be handled accordingly to configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class
o.a.i.i.processors.cache.persistence.StorageException: Failed to write
buffer.]]
org.apache.ignite.internal.processors.cache.persistence.StorageException:
Failed to write buffer.
at
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.writeBuffer(FileWriteAheadLogManager.java:3484)
[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.body(FileWriteAheadLogManager.java:3301)
[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
[ignite-core-2.7.5.jar:2.7.5]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
Caused by: java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
~[?:1.8.0_201]
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:253)
~[?:1.8.0_201]
at
org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIO.position(RandomAccessFileIO.java:48)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator.position(FileIODecorator.java:41)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO.writeFully(AbstractFileIO.java:111)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.writeBuffer(FileWriteAheadLogManager.java:3477)
~[ignite-core-2.7.5.jar:2.7.5]
... 3 more
[2019-08-22T01:20:55,921][WARN
][wal-write-worker%null-#221][FailureProcessor] No deadlocked threads
detected.
[2019-08-22T01:20:56,347][WARN
][wal-write-worker%null-#221][FailureProcessor] Thread dump at 2019/08/22
01:20:56 UTC


*Ignite version*: 2.7.5 
*Cluster size*: 16 
*Client size*: 22 
*Cluster OS version*: Centos 7 
*Cluster Kernel version*: 4.4.185-1.el7.elrepo.x86_64 
*Java version* : 
java version "1.8.0_201" 
Java(TM) SE Runtime Environment (build 1.8.0_201-b09) 
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode) 

Current disk sizes;
Screen_Shot_2019-08-22_at_12.png
Ignite and GC logs;
ignite-9.zip
Ignite configuration file;
default-config.xml



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Sudden node failure on Ignite v2.7.5

2019-07-18 Thread ihalilaltun
Hi Ivan

Thanks for the reply. I've checked the Jira issue and it says it will be
released in v2.8; when do you think v2.8 will be released?



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: CacheEntryProcessor causes cluster node to stop

2019-07-17 Thread ihalilaltun
Hi Vladimir,

here are the logs from the other node:
ignite-3.zip



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Random CorruptedTreeException from Apache Ignite

2019-07-17 Thread ihalilaltun
Hi Maxim,

we are facing the exact same problem :(

Is it OK/safe to remove cacheGroupName from code for caches that have already
been created on the nodes? If so, when we start our applications from the
updated code, will we still have access to the same caches, or will new caches
be created on the nodes?



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


CacheEntryProcessor causes cluster node to stop

2019-07-17 Thread ihalilaltun
Hi Igniters,

Although the class was deployed in SHARED or CONTINUOUS mode, the node got an
exception and halted itself.

Log added;
ignite.zip


*Ignite version*: 2.7.5 
*Cluster size*: 16 
*Client size*: 22 
*Cluster OS version*: Centos 7 
*Cluster Kernel version*: 4.4.185-1.el7.elrepo.x86_64 
*Java version* : 
java version "1.8.0_201" 
Java(TM) SE Runtime Environment (build 1.8.0_201-b09) 
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode) 



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


NegativeArraySizeException on cluster rebalance

2019-07-17 Thread ihalilaltun
Hi Igniters,

We are getting a NegativeArraySizeException during the rebalancing of one of
our cache groups. I am adding the details that I could get from the logs;

*Ignite version*: 2.7.5 
*Cluster size*: 16 
*Client size*: 22 
*Cluster OS version*: Centos 7 
*Cluster Kernel version*: 4.4.185-1.el7.elrepo.x86_64 
*Java version* : 
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)


Pojo we use;
UserPriceDropRecord.txt

  

Error we got;

[2019-07-17T10:23:13,162][ERROR][sys-#241][GridDhtPartitionSupplier] Failed
to continue supplying [grp=userPriceDropDataCacheGroup,
demander=12d8bad8-62a9-465d-aca4-4afa203d6778,
topVer=AffinityTopologyVersion [topVer=238, minorTopVer=0], topic=1]
java.lang.NegativeArraySizeException: null
at 
org.apache.ignite.internal.pagemem.PageUtils.getBytes(PageUtils.java:63)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.readFullRow(CacheDataRowAdapter.java:330)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:167)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:108)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.tree.DataRow.(DataRow.java:55)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.tree.CacheDataRowStore.dataRow(CacheDataRowStore.java:92)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(CacheDataTree.java:200)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(CacheDataTree.java:49)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5512)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5280)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.nextPage(BPlusTree.java:5332)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.next(BPlusTree.java:5566)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$9.onHasNext(IgniteCacheOffheapManagerImpl.java:1185)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.advance(IgniteRebalanceIteratorImpl.java:79)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.nextX(IgniteRebalanceIteratorImpl.java:139)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.next(IgniteRebalanceIteratorImpl.java:185)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.next(IgniteRebalanceIteratorImpl.java:37)
~[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:333)
[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:404)
[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:424)
[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:409)
[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:101)
[ignite-core-2.7.5.jar:2.7.5]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheI

Re: Sudden node failure on Ignite v2.7.5

2019-07-15 Thread ihalilaltun
Hi Pavel,

Thanks for your reply. Since we use the whole system in a production
environment we cannot apply the second solution.
Do you have an estimated time for the first solution/fix?

Thanks.



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Sudden node failure on Ignite v2.7.5

2019-07-13 Thread ihalilaltun
Hi Igniters,

Recently (11.07.2019), we upgraded our Ignite version from 2.7.0 to 2.7.5.
Just about 11 hours later, one of our nodes killed itself without any
notification. I am adding the details that I could get from the server and the
topology we use;

*Ignite version*: 2.7.5
*Cluster size*: 16
*Client size*: 22
*Cluster OS version*: Centos 7
*Cluster Kernel version*: 4.4.185-1.el7.elrepo.x86_64
*Java version* : 
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)

By the way, this is a production environment and we have been using this
topology for almost 5 months. Our average TPS is ~5000 for the cluster. We
have 8 to 10 different objects that we persist in Ignite, some of them
relatively big and some just strings.

ignite.zip
  
gc.current
  
hs_err_pid18537.log
 
 
 



-
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/