Number of WAL segments in wal folder
Hi Guys - For some weird reason, files are filling up in the wal folder; right now I have around 65,000 files occupying almost 4.5TB of disk. Ideally there should not be more than 10 files, right? Also, I have disabled WAL archiving. Why is this happening, and what am I missing? Following is my configuration regarding the WAL. Thanx and Regards, KR Kumar
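PS: Roughly, the WAL-related part of the configuration looks like this (an illustrative sketch with placeholder values, not my exact file):

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.WALMode;

DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.setWalMode(WALMode.LOG_ONLY);        // placeholder; the mode we actually run may differ
storageCfg.setWalSegments(10);                  // number of active WAL segments (the default)
storageCfg.setWalSegmentSize(64 * 1024 * 1024); // 64 MB per segment (the default)
// My understanding is that pointing the archive path at the WAL path disables archiving:
storageCfg.setWalArchivePath(storageCfg.getWalPath());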
Ignite data loss
Hi Guys - I have a four node cluster with native persistence enabled. It's a partitioned cache with sync rebalance enabled. When I restart the cluster, the first node that starts retains its data, while the data on all the other nodes is deleted and their Ignite data files turn into 4096-byte files. Am I missing something, or is there some configuration that I'm missing? Following is the cache configuration:

CacheConfiguration cacheConfig = new CacheConfiguration();
cacheConfig.setCacheMode(CacheMode.PARTITIONED);
cacheConfig.setRebalanceMode(CacheRebalanceMode.SYNC);
// cacheConfig.setRebalanceDelay(30);
cacheConfig.setName("eventCache-" + tenantRunId + "-" + tenantId);
cacheConfig.setBackups(1);
cacheConfig.setAtomicityMode(CacheAtomicityMode.ATOMIC);
cacheConfig.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
IgniteCache cache = IgniteContextWrapper.getInstance().getEngine().getOrCreateCache(cacheConfig);

Here is the configuration of Ignite. Any quick pointers? Thanx and Regards, KR Kumar
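PS: For reference, a programmatic sketch of the persistence side of the config (illustrative, placeholder sizes; the actual config is XML). My understanding is that a persistent cluster starts inactive and must be activated:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

IgniteConfiguration cfg = new IgniteConfiguration();
DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.getDefaultDataRegionConfiguration()
    .setPersistenceEnabled(true)
    .setMaxSize(8L * 1024 * 1024 * 1024); // 8 GB; placeholder value
cfg.setDataStorageConfiguration(storageCfg);
Ignite ignite = Ignition.start(cfg);
// With persistence the cluster starts inactive; activate once all nodes have joined:
ignite.cluster().active(true);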
Error while adding a node to the baseline topology
Hi guys - I am running into the following issue when trying to add a node to the baseline topology. It happens only after we upgraded from 2.3 to 2.7.5. Any pointers would be appreciated.

[2019-10-22 10:31:42,441][WARN ][data-streamer-stripe-3-#52][PageMemoryImpl] Parking thread=data-streamer-stripe-3-#52 for timeout (ms)=771038
[2019-10-22 10:31:45,635][ERROR][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-30, blockedFor=95s]
[2019-10-22 10:31:45,635][WARN ][tcp-disco-msg-worker-#2][G] Thread [name="data-streamer-stripe-30-#79", id=110, state=TIMED_WAITING, blockCnt=0, waitCnt=36470]
[2019-10-22 10:31:45,637][ERROR][tcp-disco-msg-worker-#2][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-30, igniteInstanceName=null, finished=false, heartbeatTs=1571754609956]]]
class org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-30, igniteInstanceName=null, finished=false, heartbeatTs=1571754609956]

Thanx and Regards, KR Kumar
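PS: For reference, a minimal sketch of the baseline update on the Java API side ('ignite' is a started node handle; this assumes the new node has already joined the topology):

import org.apache.ignite.Ignite;

// Reset the baseline to the current topology version once the new node has joined.
ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());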
DataStreamer addData takes a lot of time after 500 million writes
Hi all - Why does the data streamer randomly take a lot of time after 500+ million writes? It frequently and very consistently takes a long time to finish the writes, to the extent of 25 to 45 seconds per write. Maybe it's flushing the data, as I have a flush frequency set, but then why not in the beginning, and why only at the end? I also see the heap size going up after some time, and the trend is consistently upwards. Here is the streamer configuration:

dataStreamer.autoFlushFrequency(1);
dataStreamer.perNodeBufferSize(32 * 1024);
dataStreamer.perNodeParallelOperations(32);

Not sure if this is of any use, but here is the dataStorageConfiguration.
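PS: For completeness, the write path is a plain addData loop, roughly like this (illustrative sketch; 'ignite' is a started node handle, and the cache name, keys and payloads are placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

try (IgniteDataStreamer<String, String> streamer = ignite.dataStreamer("eventCache")) {
    streamer.autoFlushFrequency(1);            // flush every 1 ms, as in the config above
    streamer.perNodeBufferSize(32 * 1024);
    streamer.perNodeParallelOperations(32);
    for (long i = 0; i < 1_000_000L; i++)      // ~500M entries in the real run
        streamer.addData("event-" + i, "payload-" + i);
} // close() flushes whatever is still buffered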
Data streamer closed randomly
Hi guys - I have a three node cluster in which one node has 192GB RAM and 48 cores (I call this the manager, as it does some heavy lifting) and the other 2 nodes have 60GB RAM and 36 cores (worker nodes). I am getting the following exception randomly:

[2019-08-28 03:03:52,346][ERROR][client-connector-#277][JdbcRequestHandler] Failed to execute batch query [qry=SqlFieldsQuery [sql=INSERT INTO EVENTS_IDX_MAIN_SPL_1 VALUES (?, ?, ?, ?, ?, ?), args=null, collocated=false, timeout=0, enforceJoinOrder=false, distributedJoins=false, replicatedOnly=false, lazy=false, schema=PUBLIC]]
javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: Data streamer has been closed.
    at org.apache.ignite.internal.processors.query.GridQueryProcessor.streamBatchedUpdateQuery(GridQueryProcessor.java:2120)
    at org.apache.ignite.internal.processors.odbc.jdbc.JdbcRequestHandler.executeBatchedQuery(JdbcRequestHandler.java:694)
    at org.apache.ignite.internal.processors.odbc.jdbc.JdbcRequestHandler.executeBatch(JdbcRequestHandler.java:650)
    at org.apache.ignite.internal.processors.odbc.jdbc.JdbcRequestHandler.executeBatchOrdered(JdbcRequestHandler.java:278)
    at org.apache.ignite.internal.processors.odbc.jdbc.JdbcRequestHandler.dispatchBatchOrdered(JdbcRequestHandler.java:264)
    at org.apache.ignite.internal.processors.odbc.jdbc.JdbcRequestHandler.handle(JdbcRequestHandler.java:218)
    at org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:160)
    at org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:44)
    at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)

The worker nodes are up and healthy, but I'm not sure if this is some sort of split-brain syndrome? Request your help in this regard. Regards, RAGHAV
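PS: For context, the inserts come from a plain JDBC batch along these lines (illustrative sketch; the host, batch size and values are placeholders, the SQL mirrors the query in the log):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
     PreparedStatement ps = conn.prepareStatement(
         "INSERT INTO EVENTS_IDX_MAIN_SPL_1 VALUES (?, ?, ?, ?, ?, ?)")) {
    for (int row = 0; row < 1000; row++) {
        for (int col = 1; col <= 6; col++)
            ps.setObject(col, "v" + row + "-" + col); // placeholder values
        ps.addBatch();
    }
    ps.executeBatch();
}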
too many dirty pages
Hi - I have an Ignite persistence-enabled cache with about 100 cache tables that I created using Ignite SQL. When I am writing data (a lot of data) into these tables, I see these messages in the logs, and writes hang for some time and then resume.

[2019-06-23 10:12:47,647][INFO ][db-checkpoint-thread-#144][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=d067e58d-d62d-4914-9ce8-531369e08c33, startPtr=FileWALPointer [idx=214, fileOff=115119, len=2123305], checkpointLockWait=0ms, checkpointLockHoldTime=1727ms, walCpRecordFsyncDuration=2305ms, pages=304819, reason='too many dirty pages']

Any pointers, please let me know. Thanx and Regards, KR Kumar
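PS: From what I've read, the knobs that govern checkpointing live on the storage configuration; a minimal sketch of my understanding (illustrative values, not a recommendation):

import org.apache.ignite.configuration.DataStorageConfiguration;

DataStorageConfiguration storageCfg = new DataStorageConfiguration();
// How often checkpoints run when not forced by dirty-page pressure:
storageCfg.setCheckpointFrequency(180_000); // 3 minutes; illustrative
// A larger checkpoint page buffer should give more headroom before writes are throttled:
storageCfg.getDefaultDataRegionConfiguration()
    .setCheckpointPageBufferSize(2L * 1024 * 1024 * 1024); // 2 GB; illustrative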
Splitting a cache table slows down and causes long pauses
Hi Guys - I am using Ignite file-based persistence. I have a cache table that I created using the JDBC driver. This table has grown very big, so I have split it into 100 tables to improve query performance. Now the problem: the application has become very slow, and I also see long pauses. One thing I notice is that each cache instance creates a bunch of files in the data folders, and now I see a lot of open file handles. Is that a problem? Any pointers? Thanx and Regards, KR Kumar
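PS: The split tables are created through the JDBC driver roughly like this (illustrative sketch; the real DDL has more columns and a different naming scheme):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
     Statement stmt = conn.createStatement()) {
    // One table per split; 100 of these in total.
    for (int i = 0; i < 100; i++) {
        stmt.executeUpdate(
            "CREATE TABLE IF NOT EXISTS EVENTS_" + i +
            " (ID VARCHAR PRIMARY KEY, PAYLOAD VARCHAR)" +
            " WITH \"template=partitioned\"");
    }
}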
BLOB with JDBC Driver
Hi - I am trying out the JDBC driver with Ignite SQL tables. How do I insert a BLOB into my cache table through JDBC? Thanx and Regards, KR Kumar
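PS: For concreteness, this is the kind of thing I'm after (a sketch assuming a BINARY column and PreparedStatement.setBytes; I don't know whether that's the right approach, hence the question; the table and values are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1")) {
    try (Statement stmt = conn.createStatement()) {
        stmt.executeUpdate("CREATE TABLE IF NOT EXISTS DOCS (ID VARCHAR PRIMARY KEY, BODY BINARY)");
    }
    byte[] blob = "example-bytes".getBytes(); // placeholder payload
    try (PreparedStatement ps = conn.prepareStatement("INSERT INTO DOCS VALUES (?, ?)")) {
        ps.setString(1, "doc-1");
        ps.setBytes(2, blob);
        ps.executeUpdate();
    }
}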
Re: Partitioned cache is not distributing the data
We finally figured out why it's not working. It's something to do with the baseline topology.
Partitioned cache is not distributing the data
I posted this earlier but did not get any response, hence trying one more time. I have an application where the writes happen mostly from one node and reads/compute happen on all the nodes. The problem that I have is that the data is not getting distributed across nodes, meaning the disk on the node where the writes happen is filling up and running out of space, whereas the disks on the other nodes are almost 99% empty and no data is getting written there. Here is the cache configuration:

CacheConfiguration cacheConfig = new CacheConfiguration();
cacheConfig.setCacheMode(CacheMode.PARTITIONED);
cacheConfig.setRebalanceMode(CacheRebalanceMode.ASYNC);
cacheConfig.setRebalanceDelay(12);
cacheConfig.setName("eventCache-" + System.getProperty(RUN_ID) + "-" + tenantId);
cacheConfig.setBackups(0);
cacheConfig.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
cacheConfig.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);

Any pointers? Thanx and Regards, KR Kumar
Re: DataStreamer
The following code creates the data streamer instance:

dataStreamer = IgniteContextWrapper.getInstance().getEngine()
    .dataStreamer("eventCache-" + System.getProperty(RUN_ID) + "-" + tenantId);

and this writes the data to the cache:

dataStreamer.addData(key, value);

Nothing fancy, very simple code. If I use cache.put instead of the streamer, everything works fine, but it's dead slow.
Data not getting distributed across the cluster.
Hi - I have an application where the writes happen mostly from one node and reads/compute happen on all the nodes. The problem that I have is that the data is not getting distributed across nodes, meaning the disk on the node where the writes happen is filling up and running out of space, whereas the disks on the other nodes are almost 99% empty and no data is getting written there. Here is the cache configuration:

CacheConfiguration cacheConfig = new CacheConfiguration();
cacheConfig.setCacheMode(CacheMode.PARTITIONED);
cacheConfig.setRebalanceMode(CacheRebalanceMode.ASYNC);
cacheConfig.setRebalanceDelay(12);
cacheConfig.setName("eventCache-" + System.getProperty(RUN_ID) + "-" + tenantId);
cacheConfig.setBackups(0);
cacheConfig.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
cacheConfig.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);

Any pointers? Thanx and Regards, KR Kumar
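PS: For reference, a quick way to see where keys should land according to affinity (a sketch I'd use to debug; 'ignite', the cache name and the keys are placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

Affinity<String> aff = ignite.affinity("eventCache");
for (int i = 0; i < 10; i++) {
    ClusterNode primary = aff.mapKeyToNode("event-" + i);
    System.out.println("event-" + i + " -> " + primary.consistentId());
}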
DataStreamer
Hi Guys - I have a weird problem. When I am using the data streamer to write data to Ignite (file-based persistence), not all the entries are getting persisted: later, some gets return null for a few keys. It's very random in terms of which keys get persisted, but consistent in that not all data is written to the files. I am using a 24-node cluster for both the persistence-based cache and distributed compute task execution. Any pointers? Thanx and Regards, KR Kumar
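PS: A minimal sketch of the write-then-read pattern I'm describing (illustrative; 'ignite' is a started node handle, the cache name and keys are placeholders, and my understanding of flush()/close() semantics is from the Javadoc):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;

try (IgniteDataStreamer<String, String> streamer = ignite.dataStreamer("eventCache")) {
    streamer.addData("k1", "v1");
    streamer.flush(); // should block until buffered entries are written
} // close() also flushes anything still buffered

IgniteCache<String, String> cache = ignite.cache("eventCache");
System.out.println(cache.get("k1")); // this is what randomly comes back null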
Ignite getAll randomly does not return values
Hi Guys - I have a 10-node Ignite cluster that's used for caching and some distributed computing as well. Now the problem: when I do a getAll on the cache, zero cache entries are returned, even though persisting reported success. Does WALMode.BACKGROUND or CacheRebalanceMode.NONE have any effect on how I read the data? Thanx and Regards, KR Kumar
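PS: The two settings in question, for reference (sketch; the cache name is a placeholder):

import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.WALMode;

DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.setWalMode(WALMode.BACKGROUND);
CacheConfiguration<String, String> cacheCfg = new CacheConfiguration<>("eventCache");
cacheCfg.setRebalanceMode(CacheRebalanceMode.NONE);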
RE: Deployment of Ignite/Application upgrades to production in 100+ node cluster
Hi Stan - Thanks a lot for the quick response on this. Yeah, just checked GridGain and it looks like exactly what I need. Maybe I will explore this feature. Not sure if they have a trial. Thanx and Regards, KR Kumar
Deployment of Ignite/Application upgrades to production in 100+ node cluster
Hi - I have a 100 node cluster that is going to production in the next 3 months. The problem I am trying to figure out is how to handle version and application upgrades, i.e. when I am deploying the nodes in a sequence, I will reach a point where half of my nodes are on the old version and half on the newer, which could lead to unpredictable application behavior. Are there any ways/tricks that people have already figured out in this area that I should be aware of? One solution I am thinking of is to create cluster groups and deploy one group at a time, as sketched below. Please share your thoughts. Thanx and Regards, KR Kumar
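PS: A rough sketch of the cluster-group idea (the "deploy.group" attribute and its values are made up for illustration; 'ignite' is a started node handle):

import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.ClusterGroup;
import org.apache.ignite.configuration.IgniteConfiguration;

// On startup, tag each node with its deployment group (hypothetical attribute):
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setUserAttributes(Collections.singletonMap("deploy.group", "A"));

// Later, target one group at a time, e.g. to verify it after an upgrade:
ClusterGroup groupA = ignite.cluster().forAttribute("deploy.group", "A");
ignite.compute(groupA).broadcast(() -> System.out.println("group A is up"));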
Ignite reads are slower with getAll
Hi guys - I have an Ignite cluster with persistence enabled that has 200 million events in it. Right now read throughput is around 3000 events per second. I have increased the IOPS to 1 and even then I have the same performance. Am I doing something really wrong, or is this just how it performs with large amounts of data? I am using getAll with a batch of 300 keys per read. The cache is basically a string key and a JSON message, so it's a String,String type of cache. Any help/pointers? Thanx and Regards, KR Kumar
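PS: The read path, roughly (illustrative sketch; 'ignite', the cache name and the key format are placeholders):

import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.ignite.IgniteCache;

IgniteCache<String, String> cache = ignite.cache("eventCache");
Set<String> batch = new HashSet<>();
for (int i = 0; i < 300; i++)   // 300 keys per read, as above
    batch.add("event-" + i);
Map<String, String> events = cache.getAll(batch);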
Re: NODE_SEGMENTED AND NODE_FAILED errors
Thanks ... will try that configuration.
NODE_SEGMENTED AND NODE_FAILED errors
Hi - I have a three node cluster on c4.2xlarge machines with Ignite persistence enabled. While load testing the cluster, after writing about 50 million events to the cache, I am getting NODE_SEGMENTED and NODE_FAILED notifications, and after that the cache server more or less hangs. But when I look, all the JVMs are perfectly up and running. For the NODE_FAILED and NODE_SEGMENTED errors, can you give me some pointers as to what I should be looking at? Thanx and Regards, KR Kumar
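PS: If it matters, the only related knob I'm aware of is the failure detection timeout (a sketch, not our current setting; I believe the default is 10 seconds):

import org.apache.ignite.configuration.IgniteConfiguration;

IgniteConfiguration cfg = new IgniteConfiguration();
// A longer timeout makes the cluster more tolerant of long GC pauses / IO stalls:
cfg.setFailureDetectionTimeout(30_000); // illustrative; default is 10_000 ms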
Where are Ignite indexes stored?
Hi Guys - This could be a dumb question, but I think it's version dependent, so let me ask it anyway: does Ignite store indexes on heap or off heap by default? I ask because when I am profiling the application, I see a huge bunch of int arrays instantiated by Ignite and I am wondering what they are. I am currently on version 2.3. Thanx and Regards, KR Kumar
Re: Ignite Persistence performance issue
Yeah, that's exactly what I am doing, and it seems to have improved. The frequency is set at 5000, i.e. 5 seconds. Thanx and Regards, KR Kumar
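PS: Concretely (assuming this refers to the checkpointing frequency; on 2.3 I believe it lives on PersistentStoreConfiguration):

import org.apache.ignite.configuration.PersistentStoreConfiguration;

PersistentStoreConfiguration psCfg = new PersistentStoreConfiguration();
psCfg.setCheckpointingFrequency(5000); // 5 seconds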
Ignite Persistence performance issue
Hi All - I am using Ignite persistence (2.3) for storing events that we receive from different sources. Currently the write and read throughput is 2k to 4k per second on t2.xlarge and c4.2xlarge machines in a 3 node cluster. The attached disks are provisioned SSDs with 1 IOPS. These numbers are not very encouraging, and I am wondering if there is any way to improve them or whether I am missing something fundamental. The event size I have is around 1kb. The data storage configuration is as follows. Not sure the details I have provided are good enough for a concrete suggestion, so let me know if you need more info. Thanx and Regards, KR Kumar
Re: checkpoint marker is present on disk, but checkpoint record is missed in WAL
Hi AG, Thanks for responding to the thread. I have tried with 2.3 and I still face the same problem. Just to explore further, I killed the Ignite instance with kill -9 and also did a reboot; in both situations, Ignite just hangs during restart. Thanx and Regards, KR Kumar
checkpoint marker is present on disk, but checkpoint record is missed in WAL
Hi Guys - I am using Ignite persistence with an 8 node cluster, currently in dev/PoC stages. I get the following exception when I try to restart a node after killing the process with "kill". I have a shutdown hook in the code in which I shut down Ignite with G.stop(false). I read in a blog that when you stop Ignite with cancel=false, it checkpoints the data and then stops the node, so there should not be any issues on restart. Any help is greatly appreciated.

Invocation of init method failed; nested exception is class org.apache.ignite.IgniteCheckedException: Failed to restore memory state (checkpoint marker is present on disk, but checkpoint record is missed in WAL) [cpStatus=CheckpointStatus [cpStartTs=1507546382988, cpStartId=abeb760a-0388-4ad5-8473-62ed9c7bc0f3, startPtr=FileWALPointer [idx=6, fileOffset=33982453, len=2380345, forceFlush=false], cpEndId=c257dd1f-c350-4b0d-aefc-cad6d2c2082b, endPtr=FileWALPointer [idx=4, fileOffset=38761373, len=1586221, forceFlush=false]], lastRead=null]

06:55:09.341 [main] WARN org.springframework.context.support.ClassPathXmlApplicationContext - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'igniteContainer' defined in class path resource [mihi-gridworker-s.xml]: Invocation of init method failed; nested exception is class org.apache.ignite.IgniteCheckedException: Failed to restore memory state (checkpoint marker is present on disk, but checkpoint record is missed in WAL) [cpStatus=CheckpointStatus [cpStartTs=1507546382988, cpStartId=abeb760a-0388-4ad5-8473-62ed9c7bc0f3, startPtr=FileWALPointer [idx=6, fileOffset=33982453, len=2380345, forceFlush=false], cpEndId=c257dd1f-c350-4b0d-aefc-cad6d2c2082b, endPtr=FileWALPointer [idx=4, fileOffset=38761373, len=1586221, forceFlush=false]], lastRead=null]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1628)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:555)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:483)
    at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:761)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:866)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:542)
    at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:139)
    at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:83)
    at com.pointillist.gridworker.agent.MihiGridWorker.start(MihiGridWorker.java:32)
    at com.pointillist.gridworker.MihiWorker.main(MihiWorker.java:20)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to restore memory state (checkpoint marker is present on disk, but checkpoint record is missed in WAL) [cpStatus=CheckpointStatus [cpStartTs=1507546382988, cpStartId=abeb760a-0388-4ad5-8473-62ed9c7bc0f3, startPtr=FileWALPointer [idx=6, fileOffset=33982453, len=2380345, forceFlush=false], cpEndId=c257dd1f-c350-4b0d-aefc-cad6d2c2082b, endPtr=FileWALPointer [idx=4, fileOffset=38761373, len=1586221, forceFlush=false]], lastRead=null]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1433)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:539)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:616)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1901)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:745)

Appreciate your help. Thanx and Regards, KR Kumar
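PS: For reference, the shutdown hook is essentially the following (a minimal sketch using the public Ignition API; G.stop(false) in our code is the same call via the internal typedef):

import org.apache.ignite.Ignition;

// Shutdown hook as described above; cancel=false lets Ignite finish
// in-flight work and checkpoint before the node stops.
Runtime.getRuntime().addShutdownHook(new Thread(() -> Ignition.stop(false)));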
Ignite server start failing with an exception on Ignite 2.1.0
)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:258)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:206)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:158)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1911)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:82)
    at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:92)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.init(PagesList.java:174)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.FreeListImpl.<init>(FreeListImpl.java:357)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore$1.<init>(GridCacheOffheapManager.java:893)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.init0(GridCacheOffheapManager.java:885)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.updateCounter(GridCacheOffheapManager.java:1130)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.updateCounter(GridDhtLocalPartition.java:882)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.casState(GridDhtLocalPartition.java:564)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.own(GridDhtLocalPartition.java:594)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.initPartitions0(GridDhtPartitionTopologyImpl.java:337)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.beforeExchange(GridDhtPartitionTopologyImpl.java:507)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:991)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:632)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1901)
    ... 2 more
[2017-09-21 07:47:18,639][INFO ][grid-timeout-worker-#15%null%][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)

Appreciate your help, KR Kumar