[jira] [Comment Edited] (IGNITE-20299) Creating a cache with an unknown data region name causes total unrecoverable failure of the grid

2023-09-01 Thread Pavel Tupitsyn (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761283#comment-17761283
 ] 

Pavel Tupitsyn edited comment on IGNITE-20299 at 9/1/23 11:29 AM:
--

[~rpwilson] I can reproduce the issue. Some observations:
* To fix the grid, remove the {{BadCacheCreationReproducer\Persistence\...\cache-ABadCache}} directory.
* Reproduces on Apache.Ignite 2.15, but not on GridGain.Ignite 8.8.33
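
As a hedged illustration of the workaround in the first observation, the sketch below only lists directories named {{cache-<cacheName>}} under a storage root, so that they can be inspected before anything is deleted. The class name OrphanedCacheDirFinder, the method Find, and the idea of scanning the whole storage root are assumptions made for illustration; the only detail taken from this thread is the {{cache-}} directory prefix. It uses no Ignite API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of Ignite: locates leftover cache directories.
public static class OrphanedCacheDirFinder
{
    // Returns every directory whose name is exactly "cache-<cacheName>"
    // anywhere below storageRoot. Nothing is deleted; this is a dry run.
    public static List<string> Find(string storageRoot, string cacheName)
    {
        var matches = new List<string>();
        if (!Directory.Exists(storageRoot))
            return matches;

        // The "cache-" prefix is taken from the path quoted in the comment above.
        var expectedName = "cache-" + cacheName;

        foreach (var dir in Directory.EnumerateDirectories(
                     storageRoot, "cache-*", SearchOption.AllDirectories))
        {
            // Compare the final path segment exactly, so that a cache called
            // "ABadCache2" is not mistaken for "ABadCache".
            if (string.Equals(Path.GetFileName(dir), expectedName, StringComparison.Ordinal))
                matches.Add(dir);
        }

        return matches;
    }
}
```

For the reproducer, this would presumably be called with the Persistence directory as the storage root and ABadCache as the cache name. It seems prudent to stop all nodes and take a backup before removing anything the helper reports.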



> Creating a cache with an unknown data region name causes total unrecoverable 
> failure of the grid
> 
>
> Key: IGNITE-20299
> URL: https://issues.apache.org/jira/browse/IGNITE-20299
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Affects Versions: 2.15
> Environment: Observed in:
>   C# client and grid running on Linux in a container
>   C# client and grid running on Windows
> Reporter: Raymond Wilson
> Priority: Major
>
> Using the Ignite C# client.
>  
> Given a running grid, having a client (and perhaps server) node in the grid 
> attempt to create a cache using a DataRegionName that does not exist in the 
> grid causes immediate failure in the client node with the following log 
> output. 
>  
> 2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer]   Completed 
> partition exchange [localNode=15122bd7-bf81-44e6-a548-e70dbd9334c0, 
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
> [topVer=15, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode 
> [id=9d5ed68d-38bb-447d-aed5-189f52660716, 
> consistentId=9d5ed68d-38bb-447d-aed5-189f52660716, addrs=ArrayList 
> [127.0.0.1], sockAddrs=null, discPort=0, order=8, intOrder=8, 
> lastExchangeTime=1693112858024, loc=false, ver=2.15.0#20230425-sha1:f98f7f35, 
> isClient=true], rebalanced=false, done=true, newCrdFut=null], 
> topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
> 2023-08-27 17:08:48,520 [44] INF [ImmutableClientServer]   Exchange timings 
> [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], 
> resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], stage="Waiting in 
> exchange queue" (14850 ms), stage="Exchange parameters initialization" (2 
> ms), stage="Determine exchange type" (3 ms), stage="Exchange done" (4 ms), 
> stage="Total time" (14859 ms)]
> 2023-08-27 17:08:48,522 [44] INF [ImmutableClientServer]   Exchange longest 
> local stages [startVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], 
> resVer=AffinityTopologyVersion [topVer=15, minorTopVer=0]]
> 2023-08-27 17:08:48,524 [44] INF [ImmutableClientServer]   Finished exchange 
> init [topVer=AffinityTopologyVersion [topVer=15, minorTopVer=0], crd=false]
> 2023-08-27 17:08:48,525 [44] INF [ImmutableClientServer]   
> AffinityTopologyVersion [topVer=15, minorTopVer=0], evt=NODE_FAILED, 
> evtNode=9d5ed68d-38bb-447d-aed5-189f52660716, client=true]
> Unhandled exception: Apache.Ignite.Core.Cache.CacheException: class 
> org.apache.ignite.IgniteCheckedException: Failed to complete exchange process.
>  ---> Apache.Ignite.Core.Common.IgniteException: Failed to complete exchange 
> process.
>  ---> Apache.Ignite.Core.Common.JavaException: javax.cache.CacheException: 
> class org.apache.ignite.IgniteCheckedException: Failed to complete exchange 
> process.
>         at 
> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1272)
>         at 
> org.apache.ignite.internal.IgniteKernal.getOrCreateCache0(IgniteKernal.java:2278)
>         at 
> org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:2242)
>         at 
> org.apache.ignite.internal.processors.platform.PlatformProcessorImpl.processInStreamOutObject(PlatformProcessorImpl.java:643)
>         at 
> org.apache.ignite.internal.processors.platform.PlatformTargetProxyImpl.inStreamOutObject(PlatformTargetProxyImpl.java:79)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to complete 
> exchange process.
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.createExchangeException(GridDhtPartitionsExchangeFuture.java:3709)
>         at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendExchangeFailureMessage(GridDhtPartitionsExchangeFuture.java:3737)
>         at 
> 

[jira] [Comment Edited] (IGNITE-20299) Creating a cache with an unknown data region name causes total unrecoverable failure of the grid

2023-08-30 Thread Raymond Wilson (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760136#comment-17760136
 ] 

Raymond Wilson edited comment on IGNITE-20299 at 8/30/23 6:31 PM:
--

I think persistence is an important aspect of this issue: it is on restart 
that the grid (a) complains that it cannot start the incorrectly created cache 
(which raises the question of why the cache is still known if its creation 
was unsuccessful) and (b) fails to initialise the persisted caches.

The cache folder for the incorrectly created cache is also created, which 
indicates that the grid has somehow accepted the cache as a valid new cache 
while at the same time throwing the exchange process exception. Together, 
this suggests that validation of the new cache's parameters is not enforcing 
the requirement that the data region be known.
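
Until such validation is enforced by the grid, an application could guard against the mistake itself. The sketch below is one possible shape for that guard, not Ignite's own validation. DataRegionGuard and EnsureKnown are hypothetical names. The list of known region names is assumed to come from the application's own configuration (for example, the names it sets in its DataStorageConfiguration), since a client node may not know the regions configured on the servers. Treating an empty name as "use the default region" is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical application-side guard, not part of Ignite.
public static class DataRegionGuard
{
    // Throws ArgumentException when regionName is not one of knownRegions.
    // Names are compared exactly (case-sensitive), which is an assumption
    // about how region names are matched.
    public static void EnsureKnown(string regionName, IEnumerable<string> knownRegions)
    {
        if (knownRegions == null)
            throw new ArgumentNullException(nameof(knownRegions));

        // Assumption: no explicit region name means the default region is used.
        if (string.IsNullOrEmpty(regionName))
            return;

        if (!knownRegions.Contains(regionName, StringComparer.Ordinal))
        {
            throw new ArgumentException(
                $"Unknown data region '{regionName}'. Known regions: {string.Join(", ", knownRegions)}",
                nameof(regionName));
        }
    }
}
```

A caller would pass the value it is about to assign to CacheConfiguration.DataRegionName and invoke the guard before calling GetOrCreateCache, so that a misspelled region name fails locally rather than reaching the exchange process.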






[jira] [Comment Edited] (IGNITE-20299) Creating a cache with an unknown data region name causes total unrecoverable failure of the grid

2023-08-29 Thread Raymond Wilson (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759846#comment-17759846
 ] 

Raymond Wilson edited comment on IGNITE-20299 at 8/29/23 7:00 AM:
--

[~ptupitsyn]

Yes, we are using persistence. 

This is our persistence XML file:


{noformat}


<beans xmlns="http://www.springframework.org/schema/beans"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:util="http://www.springframework.org/schema/util"
   xsi:schemaLocation="http://www.springframework.org/schema/beans
   http://www.springframework.org/schema/beans/spring-beans.xsd
   http://www.springframework.org/schema/util
   http://www.springframework.org/schema/util/spring-util.xsd">

  
  

</beans>
{noformat}

Our configuration is mostly in code. Here is the primary configuration for the 
server nodes:


{noformat}
public void ConfigureTRexGrid(IgniteConfiguration cfg)
{
  cfg.IgniteInstanceName = TRexGrids.ImmutableGridName();
  cfg.JvmOptions = CommonJavaJVMOptions();

  var configStore = DIContext.Obtain();

  // Note: Set the PSN JVM heap size minimum and maximum to the maximum JVM heap size defined for the node.
  // This ensures the JVM always has access to the heap promised to it, so it never needs to resize the heap.
  // This provides better performance and removes the chance of surprise if the OS cannot allocate a larger
  // heap block for some other reason.
  cfg.JvmMaxMemoryMb = configStore.GetValueInt(PSNODE_IGNITE_JVM_MAX_HEAP_SIZE_MB, DEFAULT_IGNITE_JVM_MAX_HEAP_SIZE_MB);
  cfg.JvmInitialMemoryMb = configStore.GetValueInt(PSNODE_IGNITE_JVM_MAX_HEAP_SIZE_MB, DEFAULT_IGNITE_JVM_MAX_HEAP_SIZE_MB);

  cfg.UserAttributes = new Dictionary<string, object>
  {
    { "Owner", TRexGrids.ImmutableGridName() }
  };

  // Configure the Ignite persistence layer to store our data
  cfg.DataStorageConfiguration = new DataStorageConfiguration
  {
    WalMode = WalMode.Fsync,
    PageSize = IgniteDataRegionPageSize(),

    StoragePath = Path.Combine(TRexServerConfig.PersistentCacheStoreLocation, "Immutable", "Persistence"),
    WalPath = Path.Combine(TRexServerConfig.PersistentCacheStoreLocation, "Immutable", "WalStore"),
    WalArchivePath = Path.Combine(TRexServerConfig.PersistentCacheStoreLocation, "Immutable", "WalArchive"),

    // Set the WalSegmentSize to 512 MB to better support high write loads (can be set to max 2 GB)
    WalSegmentSize = 512 * 1024 * 1024,
    // Ensure there are 10 segments in the WAL archive at the defined segment size
    MaxWalArchiveSize = (long)10 * 512 * 1024 * 1024,

    CheckpointThreads = configStore.GetValueInt(IGNITE_NUMBER_OF_CHECKPOINTING_THREADS, DEFAULT_IGNITE_NUMBER_OF_CHECKPOINTING_THREADS),
    CheckpointFrequency = TimeSpan.FromSeconds(configStore.GetValueInt(IGNITE_CHECKPOINTING_INTERVAL_SECONDS, DEFAULT_IGNITE_CHECKPOINTING_INTERVAL_SECONDS)),

    DefaultDataRegionConfiguration = new DataRegionConfiguration
    {
      Name = DataRegions.DEFAULT_IMMUTABLE_DATA_REGION_NAME,
      InitialSize = configStore.GetValueLong(IMMUTABLE_DATA_REGION_INITIAL_SIZE_MB, DEFAULT_IMMUTABLE_DATA_REGION_INITIAL_SIZE_MB) * 1024 * 1024,
      MaxSize = configStore.GetValueLong(IMMUTABLE_DATA_REGION_MAX_SIZE_MB, DEFAULT_IMMUTABLE_DATA_REGION_MAX_SIZE_MB) * 1024 * 1024,

      PersistenceEnabled = true
    }
  };

  Log.LogInformation($"cfg.DataStorageConfiguration.StoragePath={cfg.DataStorageConfiguration.StoragePath}");
  Log.LogInformation($"cfg.DataStorageConfiguration.WalArchivePath={cfg.DataStorageConfiguration.WalArchivePath}");
  Log.LogInformation($"cfg.DataStorageConfiguration.WalPath={cfg.DataStorageConfiguration.WalPath}");

  // bool.TryParse leaves isKubernetes as false when parsing fails, so the fallback is the local configuration
  if (!bool.TryParse(Environment.GetEnvironmentVariable("IS_KUBERNETES"), out var isKubernetes))
  {
    Log.LogWarning($"Failed to parse the value of the 'IS_KUBERNETES' environment variable as a bool. Value is {Environment.GetEnvironmentVariable("IS_KUBERNETES")}. Defaulting to false");
  }

  cfg = isKubernetes ? SetKubernetesIgniteConfiguration(cfg) : SetLocalIgniteConfiguration(cfg);
  cfg.WorkDirectory = Path.Combine(TRexServerConfig.PersistentCacheStoreLocation, "Immutable");

  cfg.Logger = new TRexIgniteLogger(configStore, Logger.CreateLogger("ImmutableCacheComputeServer"));

  // Set an Ignite metrics heartbeat
  cfg.MetricsLogFrequency = new TimeSpan(0, 0, 0, configStore.GetValueInt(IGNITE_HEARTBEAT_FREQUENCY_SECONDS, DEFAULT_IGNITE_HEARTBEAT_FREQUENCY_SECONDS));

  cfg.PublicThreadPoolSize = configStore.GetValueInt(IGNITE_PUBLIC_THREAD_POOL_SIZE, DEFAULT_IGNITE_PUBLIC_THREAD_POOL_SIZE);
  cfg.SystemThreadPoolSize =