[jira] [Comment Edited] (IGNITE-20299) Creating a cache with an unknown data region name causes total unrecoverable failure of the grid
[ https://issues.apache.org/jira/browse/IGNITE-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761283#comment-17761283 ]

Pavel Tupitsyn edited comment on IGNITE-20299 at 9/1/23 11:29 AM:
------------------------------------------------------------------

[~rpwilson] I can reproduce the issue. Some observations:
* To fix the grid, remove {{BadCacheCreationReproducer\Persistence\...\cache-ABadCache}} directory
* Reproduces on Apache.Ignite 2.15, but not on GridGain.Ignite 8.8.33

was (Author: ptupitsyn):
[~rpwilson] I can reproduce the issue. Some observations:
* To fix the grid, remove {code}BadCacheCreationReproducer\Persistence\...\cache-ABadCache{code} directory
* Reproduces on Apache.Ignite 2.15, but not on GridGain.Ignite 8.8.33

> Creating a cache with an unknown data region name causes total unrecoverable failure of the grid
> ------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-20299
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20299
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.15
>         Environment: Observed in:
> C# client and grid running on Linux in a container
> C# client and grid running on Windows
>            Reporter: Raymond Wilson
>            Priority: Major
>
> Using the Ignite C# client.
>
> Given a running grid, having a client (and perhaps server) node in the grid attempt to create a cache using a DataRegionName that does not exist in the grid causes immediate failure in the client node with the following log output.
>
> 2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer] Completed partition exchange [localNode=15122bd7-bf81-44e6-a548-e70dbd9334c0, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode [id=9d5ed68d-38bb-447d-aed5-189f52660716, consistentId=9d5ed68d-38bb-447d-aed5-189f52660716, addrs=ArrayList [127.0.0.1], sockAddrs=null, discPort=0, order=8, intOrder=8, lastExchangeTime=1693112858024, loc=false, ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], rebalanced=false, done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
> 2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer] Exchange timings [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], stage="Waiting in exchange queue" (14850 ms), stage="Exchange parameters initialization" (2 ms), stage="Determine exchange type" (3 ms), stage="Exchange done" (4 ms), stage="Total time" (14859 ms)]
> 2023-08-27 17:08:48,522 [44] INF [ImmutableClientServer] Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
> 2023-08-27 17:08:48,524 [44] INF [ImmutableClientServer] Finished exchange init [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], crd=false]
> 2023-08-27 17:08:48,525 [44] INF [ImmutableClientServer] AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=9d5ed68d-38bb-447d-aed5-189f52660716, client=true]
> Unhandled exception: Apache.Ignite.Core.Cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
>  ---> Apache.Ignite.Core.Common.IgniteException: Failed to complete exchange process.
>  ---> Apache.Ignite.Core.Common.JavaException: javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
>     at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1272)
>     at org.apache.ignite.internal.IgniteKernal.getOrCreateCache0(IgniteKernal.java:2278)
>     at org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:2242)
>     at org.apache.ignite.internal.processors.platform.PlatformProcessorImpl.processInStreamOutObject(PlatformProcessorImpl.java:643)
>     at org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutObject(PlatformTargetProxyImpl.java:79)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.createExchangeException(GridDhtPartitionsExchangeFuture.java:3709)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendExchangeFailureMessage(GridDhtPartitionsExchangeFuture.java:3737)
>     at
[jira] [Comment Edited] (IGNITE-20299) Creating a cache with an unknown data region name causes total unrecoverable failure of the grid
[ https://issues.apache.org/jira/browse/IGNITE-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760136#comment-17760136 ]

Raymond Wilson edited comment on IGNITE-20299 at 8/30/23 6:31 PM:
------------------------------------------------------------------

I think persistence is an important aspect for this issue as it is on restart that the grid complains that it cannot (a) start the incorrectly created cache (which raises the question as to why it is still known about if creation of it was unsuccessful) and (b) initialise the persisted caches.

The cache folder for the incorrectly created cache is also constructed, which indicates that the grid has somehow accepted the cache as a valid new cache while at the same time throwing the exchange process exception. All of this indicates that the validation of the parameters for the new cache is not enforcing the requirement for the data region to be known.

was (Author: rpwilson):
I think persistence is an important for this issue as it is on restart that the grid complains that it cannot (a) start the incorrectly created cache (which raises the question as to why it is still known about if creation of it was unsuccessful) and (b) fails to initialise the persisted caches. The cache folder for the incorrectly create cache is also constructed which indicates that the grid has somehow accepted the cache as a valid new cache while at the same time throwing the exchange process exception, all of which indicates the validation of the parameters for the new cache is not enforcing the requirement for the data region to be known.
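Until the grid enforces this itself, an application could reject an unknown region name before it ever requests the cache. The following is only a minimal sketch of that idea, not Ignite's own validation: {{DataRegionGuard}} and {{EnsureKnown}} are hypothetical names, and the list of known region names is assumed to be supplied by the application from whatever it knows about the server configuration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Apache.Ignite.Core.
public static class DataRegionGuard
{
    // Throws if regionName is set but is not among the region names the
    // application believes are configured on the server nodes.
    public static void EnsureKnown(string regionName, ICollection<string> knownRegionNames)
    {
        if (knownRegionNames == null)
            throw new ArgumentNullException(nameof(knownRegionNames));

        // A null region name leaves the cache in the default data region,
        // so there is nothing to check.
        if (regionName == null)
            return;

        // Exact (case-sensitive) match against the supplied names.
        if (!knownRegionNames.Contains(regionName))
            throw new ArgumentException(
                $"Unknown data region '{regionName}'. Known regions: {string.Join(", ", knownRegionNames)}",
                nameof(regionName));
    }
}
```

The guard would be called with the {{DataRegionName}} of the cache configuration immediately before the cache is created, so a typo fails fast on the caller instead of reaching the exchange process. It does not help a grid that already holds the bad cache directory.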
[jira] [Comment Edited] (IGNITE-20299) Creating a cache with an unknown data region name causes total unrecoverable failure of the grid
[ https://issues.apache.org/jira/browse/IGNITE-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759846#comment-17759846 ]

Raymond Wilson edited comment on IGNITE-20299 at 8/29/23 7:00 AM:
------------------------------------------------------------------

[~ptupitsyn] Yes, we are using persistence. This is our persistence XML file:

{noformat}
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd
                           http://www.springframework.org/schema/util
                           http://www.springframework.org/schema/util/spring-util.xsd">
{noformat}

Our configuration is mostly in code. Here is the primary configuration for the server nodes:

{noformat}
public void ConfigureTRexGrid(IgniteConfiguration cfg)
{
    cfg.IgniteInstanceName = TRexGrids.ImmutableGridName();
    cfg.JvmOptions = CommonJavaJVMOptions();

    var configStore = DIContext.Obtain();

    // Note: Set the PSN JVM heap size minimum and maximum sizes to be the maximum defined JVM heap size for the node.
    // This is to ensure the JVM always has access to the heap promised to it so will never act to resize the heap.
    // This provides better performance and removes chances of surprise if the OS cannot allocate a larger heap size block
    // for another reason.
    cfg.JvmMaxMemoryMb = configStore.GetValueInt(PSNODE_IGNITE_JVM_MAX_HEAP_SIZE_MB, DEFAULT_IGNITE_JVM_MAX_HEAP_SIZE_MB);
    cfg.JvmInitialMemoryMb = configStore.GetValueInt(PSNODE_IGNITE_JVM_MAX_HEAP_SIZE_MB, DEFAULT_IGNITE_JVM_MAX_HEAP_SIZE_MB);

    cfg.UserAttributes = new Dictionary<string, object>
    {
        { "Owner", TRexGrids.ImmutableGridName() }
    };

    // Configure the Ignite persistence layer to store our data
    cfg.DataStorageConfiguration = new DataStorageConfiguration
    {
        WalMode = WalMode.Fsync,
        PageSize = IgniteDataRegionPageSize(),
        StoragePath = Path.Combine(TRexServerConfig.PersistentCacheStoreLocation, "Immutable", "Persistence"),
        WalPath = Path.Combine(TRexServerConfig.PersistentCacheStoreLocation, "Immutable", "WalStore"),
        WalArchivePath = Path.Combine(TRexServerConfig.PersistentCacheStoreLocation, "Immutable", "WalArchive"),
        WalSegmentSize = 512 * 1024 * 1024, // Set the WalSegmentSize to 512Mb to better support high write loads (can be set to max 2Gb)
        MaxWalArchiveSize = (long)10 * 512 * 1024 * 1024, // Ensure there are 10 segments in the WAL archive at the defined segment size
        CheckpointThreads = configStore.GetValueInt(IGNITE_NUMBER_OF_CHECKPOINTING_THREADS, DEFAULT_IGNITE_NUMBER_OF_CHECKPOINTING_THREADS),
        CheckpointFrequency = TimeSpan.FromSeconds(configStore.GetValueInt(IGNITE_CHECKPOINTING_INTERVAL_SECONDS, DEFAULT_IGNITE_CHECKPOINTING_INTERVAL_SECONDS)),
        DefaultDataRegionConfiguration = new DataRegionConfiguration
        {
            Name = DataRegions.DEFAULT_IMMUTABLE_DATA_REGION_NAME,
            InitialSize = configStore.GetValueLong(IMMUTABLE_DATA_REGION_INITIAL_SIZE_MB, DEFAULT_IMMUTABLE_DATA_REGION_INITIAL_SIZE_MB) * 1024 * 1024,
            MaxSize = configStore.GetValueLong(IMMUTABLE_DATA_REGION_MAX_SIZE_MB, DEFAULT_IMMUTABLE_DATA_REGION_MAX_SIZE_MB) * 1024 * 1024,
            PersistenceEnabled = true
        }
    };

    Log.LogInformation($"cfg.DataStorageConfiguration.StoragePath={cfg.DataStorageConfiguration.StoragePath}");
    Log.LogInformation($"cfg.DataStorageConfiguration.WalArchivePath={cfg.DataStorageConfiguration.WalArchivePath}");
    Log.LogInformation($"cfg.DataStorageConfiguration.WalPath={cfg.DataStorageConfiguration.WalPath}");

    if (!bool.TryParse(Environment.GetEnvironmentVariable("IS_KUBERNETES"), out var isKubernetes))
    {
        Log.LogWarning($"Failed to parse the value of the 'IS_KUBERNETES' environment variable as a bool. Value is {Environment.GetEnvironmentVariable("IS_KUBERNETES")}. Defaulting to true");
    }

    cfg = isKubernetes ? SetKubernetesIgniteConfiguration(cfg) : SetLocalIgniteConfiguration(cfg);
    cfg.WorkDirectory = Path.Combine(TRexServerConfig.PersistentCacheStoreLocation, "Immutable");

    cfg.Logger = new TRexIgniteLogger(configStore, Logger.CreateLogger("ImmutableCacheComputeServer"));

    // Set an Ignite metrics heartbeat
    cfg.MetricsLogFrequency = new TimeSpan(0, 0, 0, configStore.GetValueInt(IGNITE_HEARTBEAT_FREQUENCY_SECONDS, DEFAULT_IGNITE_HEARTBEAT_FREQUENCY_SECONDS));

    cfg.PublicThreadPoolSize = configStore.GetValueInt(IGNITE_PUBLIC_THREAD_POOL_SIZE, DEFAULT_IGNITE_PUBLIC_THREAD_POOL_SIZE);
    cfg.SystemThreadPoolSize =