Re: The code inside the CacheEntryProcessor executes multiple times. Why?

2021-10-18 Thread 38797715

Any feedback?

On 2021/10/14 15:03, 38797715 wrote:


Hi,

The code inside the CacheEntryProcessor in the attached example executes
multiple times. Why?

Is there a simple way to avoid this?
package com.test;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class IgniteTestDemo {

    public static final String TEST_DATA = "TEST_DATA";
    public static CacheConfiguration<Long, UserInfoData> cfg1 = new CacheConfiguration<>();
    public static IgniteCache<Long, UserInfoData> TEST_CACHE;

    private static Ignite ignite = null;

    public static void main(String[] args) {
        init();
        initCache();
        TEST_CACHE.put(1L, new UserInfoData());
        testInvoke(1L);
    }

    private static void init() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setMetricsLogFrequency(0);
        cfg.setPeerClassLoadingEnabled(true);

        TcpDiscoverySpi spi = new TcpDiscoverySpi();
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Collections.singletonList("127.0.0.1"));
        spi.setIpFinder(ipFinder);
        cfg.setDiscoverySpi(spi);

        ignite = Ignition.start(cfg);
    }

    private static void initCache() {
        cfg1.setIndexedTypes(Long.class, UserInfoData.class);
        cfg1.setName(TEST_DATA);
        TEST_CACHE = ignite.getOrCreateCache(cfg1);
    }

    public static void testInvoke(long uid) {
        System.out.println(" ... pre  testInvoke ... ");
        TEST_CACHE.invoke(uid, (entry, arguments) -> {
            // This block is observed to execute more than once.
            System.out.println("\n-");
            UserInfoData value = new UserInfoData();
            value.getMap1().put(1, new Object());
            value.getMap2().put(1, new ItemClass1());
            entry.setValue(value);
            return null;
        });
        System.out.println(" ... post  testInvoke ... ");
    }

    static class UserInfoData {
        private Map<Integer, Object> map1 = new HashMap<>();
        private Map<Integer, ItemClass1> map2 = new HashMap<>();

        public Map<Integer, Object> getMap1() {
            return map1;
        }

        public void setMap1(Map<Integer, Object> map1) {
            this.map1 = map1;
        }

        public Map<Integer, ItemClass1> getMap2() {
            return map2;
        }

        public void setMap2(Map<Integer, ItemClass1> map2) {
            this.map2 = map2;
        }
    }

    static class ItemClass1 {
    }
}


Re: Problem with Cache KV Remote Query

2021-10-18 Thread MJ
Confirmed "Compact footer" setting fixed the problem. Thanks a lot.
-MJ




RE: Problem with ScanQuery with filter on ClientCache

2021-10-18 Thread Prasad Kommoju
Hi Alex,

Copying the jar file worked. Thanks.


-
Regards,
Prasad Kommoju



Re: Problem with ScanQuery with filter on ClientCache

2021-10-18 Thread Alex Plehanov
You should deploy not only the Person class, but also the filter class. You
can, for example, build a jar with the required classes and put it on the
server classpath.
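A hedged sketch of that deployment (all file, directory, and class names here
are illustrative placeholders, not taken from the thread):

```shell
# Package the key class and the scan-query filter class into one jar.
# com/example/Person.class and com/example/PersonFilter.class are
# placeholder names for the classes the server needs.
jar cf query-classes.jar \
    -C target/classes com/example/Person.class \
    -C target/classes com/example/PersonFilter.class

# Put the jar on every server node's classpath; jars in $IGNITE_HOME/libs
# are picked up automatically at node startup.
cp query-classes.jar "$IGNITE_HOME/libs/"

# Restart the node so the new jar is loaded.
"$IGNITE_HOME/bin/ignite.sh" config/default-config.xml
```

After the restart, the server can deserialize both the cache values and the
scan-query filter sent by the thin client.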



RE: Problem with ScanQuery with filter on ClientCache

2021-10-18 Thread Prasad Kommoju
Hi Alex,

Thanks for the clarification. I am running it on a laptop, using the default
configuration file, so there is only one node and the client code and the
server both run on the same laptop. Could you suggest how to deploy the Person
class to other nodes (in production there will be several) and in the laptop
case?


-
Regards,
Prasad Kommoju



Re: Problem with ScanQuery with filter on ClientCache

2021-10-18 Thread Alex Plehanov
Hello,

The thin client doesn't have a peer class loader (and most probably never
will). To use predicates you should deploy the classes for those predicates to
the Ignite nodes. Implementing Serializable on the Person class will not help
here.



RE: Problem with ScanQuery with filter on ClientCache

2021-10-18 Thread Prasad Kommoju
Thanks for the tip, but it did not help. There was also another suggestion
that Person should implement Serializable. The examples do not show this
requirement; I am going to try this too.



-
Regards,
Prasad Kommoju

From: Ilya Kazakov 
Sent: Sunday, October 17, 2021 8:41 PM
To: user@ignite.apache.org
Subject: Re: Problem with ScanQuery with filter on ClientCache

Hello, it looks like you should enable Peer Class Loading: 
https://ignite.apache.org/docs/latest/code-deployment/peer-class-loading

--
Ilya Kazakov
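
For reference, the setting Ilya points to is a node-side configuration flag. A
minimal Java sketch, assuming a plain programmatic configuration (the
equivalent Spring XML property is peerClassLoadingEnabled):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PeerClassLoadingExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Ships classes (e.g. compute closures) between server and
        // thick-client nodes on demand. Note: this does not apply to
        // thin clients, which have no peer class loader.
        cfg.setPeerClassLoadingEnabled(true);

        try (Ignite ignite = Ignition.start(cfg)) {
            // ... run closures / scan queries here ...
        }
    }
}
```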

Sat, Oct 16, 2021 at 02:47, Prasad Kommoju <pkomm...@futurewei.com>:

Hi,

I am having trouble getting ScanQuery to work with a filter from a thin client.

I have attached a working minimal reproduction program (an IntelliJ project
zip export), the Ignite config file, and the relevant part of the Ignite log
file from after running the query. The code is modeled on the example in the
documentation.

Here is the querying part of the code:

...
IgniteBiPredicate<Long, Person> filter =
        (key, p) -> p.getName().equals(srchName);

try (QueryCursor<Cache.Entry<Long, Person>> scnCursor =
         personCache.query(new ScanQuery<>(filter))) {
    scnCursor.forEach(
        entry -> System.out.println(entry.getKey() + " " + entry.getValue()));
} catch (Exception e) {
    System.out.println("Scan query failed " + e.getMessage());
}

...

Any help in pointing out the problem will be greatly appreciated.


-
Regards,
Prasad Kommoju


RE: [EXT] Re: Crash of Ignite (B+Tree corrupted) on a large PutIfAbsent

2021-10-18 Thread Semeria, Vincent
> some changes can be already made in the DB prior to the exception
Ok. Then fixing the .NET error and the documentation would be nice.


Re: Problem with Cache KV Remote Query

2021-10-18 Thread Alex Plehanov
Hello,

How was the entry inserted into the cache? You are trying to get the entry
via a thin client; if it was inserted via a thick client (an Ignite node), you
can face such a problem. The Ignite thin client and Ignite nodes have
different default "compact footer" property values, so POJO keys are
marshalled in different ways and end up treated as different keys by thin
clients and by Ignite nodes.
Try to change the "compact footer" property in the thin-client configuration:

ClientConfiguration cfg = new ClientConfiguration()
    .setBinaryConfiguration(new BinaryConfiguration().setCompactFooter(true))
    .setAddresses("127.0.0.1:10800");


Mon, Oct 18, 2021 at 15:02, MJ <6733...@qq.com>:

> Hi,
>
>
> I experienced below problem when testing the “Affinity Colocation”
> functionality. The code is from
> https://ignite.apache.org/docs/latest/data-modeling/affinity-collocation#configuring-affinity-key
> .
>
> When I run the code in a single JVM, it works perfectly and successfully
> retrieves the cached object (personCache.get(new PersonKey(1, "company1"))).
> But when I try to run the client code in another new JVM (meanwhile leaving
> the server node running locally), something goes wrong (see below). Please
> can anyone explain why the first test case succeeded but the second one
> failed?
>
> Logger log = LoggerFactory.getLogger(getClass());
>
> // success
> @Test
> public void test_iterate() throws ClientException, Exception {
>     ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");
>     try (IgniteClient client = Ignition.startClient(cfg)) {
>         ClientCache<PersonKey, Person> cache = client.cache("persons");
>         try (QueryCursor<Cache.Entry<PersonKey, Person>> qryCursor =
>                  cache.query(new ScanQuery<>(null))) {
>             qryCursor.forEach(entry -> System.out.println(
>                 "Key = " + entry.getKey() + ", Value = " + entry.getValue()));
>         }
>     }
> }
>
> // fail
> @Test
> public void test_query() throws ClientException, Exception {
>     ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");
>     try (IgniteClient client = Ignition.startClient(cfg)) {
>         ClientCache<PersonKey, Person> cache = client.cache("persons");
>         Person row = cache.get(new PersonKey(1, "company1"));
>         Assert.assertNotNull(row);  // no data returned
>         log.info("{}", row);
>     }
> }
>
>
> Thanks,
> -MJ
>
>


Re: [EXT] Re: Crash of Ignite (B+Tree corrupted) on a large PutIfAbsent

2021-10-18 Thread Alex Plehanov
> Will you add a test in Ignite to avoid the crash
I'm not quite sure, but perhaps a crash is the best we can do in this case.
Throwing an exception to the user might not be enough, since some changes can
already have been made in the DB prior to the exception, and this can lead to
data inconsistency.

> give a clearer error message
The nested exception already contains the root cause of the problem (for
example: "Record is too long [capacity=67108864, size=92984514]"; see the
ticket with the same problem reproduced in Java [1]), but perhaps this nested
exception is not displayed correctly by Ignite.NET.

[1]: https://issues.apache.org/jira/browse/IGNITE-13965
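
The workaround discussed in this thread (a WAL segment large enough to hold
the biggest single cache entry) looks roughly like this in Java; the 150 MB
figure comes from the reply quoted below, and the rest of the configuration is
illustrative:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalSegmentSizeExample {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Each WAL segment must be able to hold the largest single cache
        // entry; 150 MB accommodates the ~100 MB values from this thread.
        storageCfg.setWalSegmentSize(150 * 1024 * 1024);

        // Persistence on the default data region, as in the reported setup.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg);

        Ignition.start(cfg);
    }
}
```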

Sat, Oct 16, 2021 at 15:02, Semeria, Vincent:

> Yes indeed, setting the WAL segment size to 150 megabytes accepts my
> PutIfAbsent. Thanks.
>
>
>
> Will you add a test in Ignite to avoid the crash, give a clearer error
> message and mention in the documentation that the WAL segment size should
> be higher than any single cache entry ? At the moment the doc just says this
>
> //
>
> // Summary:
>
> // Gets or sets the size of the WAL (Write Ahead Log) segment.
> For performance reasons,
>
> // the whole WAL is split into files of fixed length called
> segments.
>
>
>
> The limit should also be written in this page
>
> Ignite Persistence | Ignite Documentation (apache.org)
> 
>
>
>
> *From:* Alex Plehanov 
> *Sent:* Friday, October 15, 2021 16:12
> *To:* user@ignite.apache.org
> *Subject:* [EXT] Re: Crash of Ignite (B+Tree corrupted) on a large
> PutIfAbsent
>
>
>
> Hello,
>
>
>
> Perhaps you have too small WAL segment size (WAL segment should be large
> enough to fit the whole cache entry), try to
> change DataStorageConfiguration.WalSegmentSize property.
>
>
>
> Thu, Oct 14, 2021 at 00:43, Semeria, Vincent <vincent.seme...@finastra.com>:
>
> Hello Ignite,
>
>
>
> I currently use the C# API of Ignite 2.10 to store large objects of type V
> in an ICache. Typically an object of V is around 100 megabytes.
> My data region is persisted on the hard drive. PutIfAbsent crashes Ignite
> with the complicated message below. As a workaround, I split type V into
> smaller types and used loops of smaller PutIfAbsent, which succeeded.
> Ultimately the data stored in the cache is the same, which shows that
> Ignite accepts my data (this is not a problem in the binary serializer).
>
>
>
> Is there a configuration of the data region that would accept a single
> PutIfAbsent of 100 megabytes ?
>
>
>
> Anyway Ignite should probably not crash when this limit is exceeded.
> Please send a clean error instead like “Insertion request exceeds limit
> XYZ” and keep Ignite alive in this case.
>
>
>
> Regards,
>
> Vincent Semeria
>
>
>
>
>
> error :
> PerformanceAttributionServer[PerformanceAttributionServer1]_Global@PARDH7JQHS2
> (41660.6010) : [2021/10/12-10:21:48.215] :
> Apache.Ignite.NLog.IgniteNLogLogger::LoggerLog() : Critical system error
> detected. Will be handled accordingly to configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
> o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is
> corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=241659666,
> val2=1127239936638982]], msg=Runtime failure on search row: SearchRow
> [key=KeyCacheObjectImpl [part=312,
> val=56ae72d3-a91a-4211-8279-0b0447881544, hasValBytes=true],
> hash=746501958, cacheId=0
>
> error :
> PerformanceAttributionServer[PerformanceAttributionServer1]_Global@PARDH7JQHS2
> (41660.6010) : [2021/10/12-10:21:48.216] :
> org.apache.ignite.internal.processors.failure.FailureProcessor::LoggerLog()
> : A critical problem with persistence data structures was detected. Please
> make backup of persistence storage and WAL files for further analysis.
> Persistence storage path:  WAL path: db/wal WAL archive path: db/wal/archive
>
> error :
> PerformanceAttributionServer[PerformanceAttributionServer1]_Global@PARDH7JQHS2
> (41660.6010) : [2021/10/12-10:21:48.219] :
> org.apache.ignite.internal.processors.failure.FailureProcessor::LoggerLog()
> : No deadlocked threads detected.
>
> error :
> PerformanceAttributionServer[PerformanceAttributionServer1]_Global@PARDH7JQHS2
> (41660.6010) : [2021/10/12-10:21:48.276] :
> org.apache.ignite.internal.processors.failure.FailureProcessor::LoggerLog()
> : Thread dump at 2021/10/12 10:21:48 CEST
>
> Thread [name="sys-#200", id=235, state=TIMED_WAITING, blockCnt=0,
> waitCnt=1]
>
> Lock
> [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@529e4706,
> ownerName=null, ownerId=-1]
>
> at sun.misc.Unsafe.park(Native Method)
>
> at
> 

Ignite Cluster becomes unresponsive when relaunching client nodes

2021-10-18 Thread Randall Woodruff
Link to Stack Overflow post:

We are intermittently seeing the following error on our Kubernetes (k8s)
setup. The issue happens after we relaunch our Tomcat pod, which launches new
Ignite client nodes.

I understand the first stack trace shows that Ignite has detected that the
TCP communication SPI has become unresponsive, but I do not see how this has
anything to do with the second stack trace. These seem like two totally
unrelated errors, yet the second one says the thread dump is at the same
timestamp as the first: "Thread dump at 2021/10/12 15:57:17".

The issue can be resolved by bringing down all the Ignite pods and relaunching
them, but a better understanding of this issue, and a way to avoid restarting
Ignite, would be appreciated.

12-Oct-2021 15:57:17.139 WARNING 
[grid-timeout-worker-#134%igniteClientInstance%] 
org.apache.ignite.logger.java.JavaLogger.warning Possible failure suppressed 
accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
[name=tcp-comm-worker, igniteInstanceName=igniteClientInstance, finished=false, 
heartbeatTs=163405418]]]
class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=igniteClientInstance, finished=false, 
heartbeatTs=163405418]
at java.base/sun.nio.ch.Net.poll(Native Method)
at 
java.base/sun.nio.ch.SocketChannelImpl.pollConnected(SocketChannelImpl.java:991)
at java.base/sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:119)
at 
org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createNioSession(GridNioServerWrapper.java:465)
at 
org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:691)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1255)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$$Lambda$389/0x12e5ffc0.apply(Unknown
 Source)
at 
org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper.createTcpClient(GridNioServerWrapper.java:689)
at 
org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.createCommunicationClient(ConnectionClientPool.java:453)
at 
org.apache.ignite.spi.communication.tcp.internal.ConnectionClientPool.reserveClient(ConnectionClientPool.java:228)
at 
org.apache.ignite.spi.communication.tcp.internal.CommunicationWorker.processDisconnect(CommunicationWorker.java:374)
at 
org.apache.ignite.spi.communication.tcp.internal.CommunicationWorker.body(CommunicationWorker.java:174)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$6.body(TcpCommunicationSpi.java:923)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
12-Oct-2021 15:57:17.141 WARNING 
[grid-timeout-worker-#134%igniteClientInstance%] 
org.apache.ignite.logger.java.JavaLogger.warning No deadlocked threads detected.
12-Oct-2021 15:57:17.170 WARNING 
[grid-timeout-worker-#134%igniteClientInstance%] 
org.apache.ignite.logger.java.JavaLogger.warning Thread dump at 2021/10/12 
15:57:17 GMT
Thread [name="main", id=1, state=RUNNABLE, blockCnt=19, waitCnt=416]
at java.base/java.net.SocketInputStream.socketRead0(Native Method)
at 
java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
at 
java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
at 
java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:271)
- locked java.io.BufferedInputStream@263909ea
at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:256)
at 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1163)
at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
- locked org.postgresql.core.v3.QueryExecutorImpl@1b338a37
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:437)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:353)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:257)
at 
com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:116)
at 

Re: Exception using ThinClient in dotnet core

2021-10-18 Thread Pavel Tupitsyn
Hi Stéphane,

Thanks for the bug report, this issue is tracked in
https://issues.apache.org/jira/browse/IGNITE-14776

On Mon, Oct 18, 2021 at 1:55 PM Stéphane Gayet wrote:

> Hi community,
>
> I'm trying to use thin client in a net5.0 project. When running the
> process, I get an error when calling StartClient().
>
> Exception":"System.ArgumentNullException: Value cannot be null. (Parameter
> 'logger')
> [...]
> Looking at the sources of the dotnet Ignite core, I noticed that the
> constructor of ClientFailoverSocket calls the method GetIpEndPoints, which
> in turn calls the method GetIps.
> In the GetIps method, when a socket exception occurs, the method calls
> *_logger.Debug()* to log the error. But the *_logger* member has not yet
> been set in the *ClientFailoverSocket* constructor.
>
>
>
> Regards,
>
>
> * Stéphane Gayet *
>  Responsable Développement / Development Manager
> (+33) 1 70 38 70 74 +33 6 00 00 00 00
> stephane.ga...@misterfly.com
>
> *Adopt the eco-attitude! Only print this email if necessary.*
>
> 
> 

Exception using ThinClient in dotnet core

2021-10-18 Thread Stéphane Gayet
Hi community,

I'm trying to use the thin client in a net5.0 project. When running the process, I 
get an error when calling StartClient().

Exception":"System.ArgumentNullException: Value cannot be null. (Parameter 'logger')
   at Apache.Ignite.Core.Impl.Common.IgniteArgumentCheck.NotNull(Object arg, String argName)
   at Apache.Ignite.Core.Log.LoggerExtensions.Log(ILogger logger, LogLevel level, Exception ex, String message)
   at Apache.Ignite.Core.Log.LoggerExtensions.Debug(ILogger logger, Exception ex, String message)
   at Apache.Ignite.Core.Impl.Client.ClientFailoverSocket.GetIps(String host, Boolean suppressExceptions)
   at Apache.Ignite.Core.Impl.Client.ClientFailoverSocket.d__15.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Apache.Ignite.Core.Impl.Client.ClientFailoverSocket..ctor(IgniteClientConfiguration config, Marshaller marsh, TransactionsClient transactions)
   at Apache.Ignite.Core.Impl.Client.IgniteClient..ctor(IgniteClientConfiguration clientConfiguration)
   at Apache.Ignite.Core.Ignition.StartClient(IgniteClientConfiguration clientConfiguration)
   at MrFly.Flight.DsFeed.Infra.ThinClientDataStore..ctor(IMapper mapper, IOptions`1 settings, ILogger`1 logger) in /home/jenkins/agent/workspace/Connectivity/DataScience/C.Flight.DsFeed/src/DsFeed/Infra/ThinClientDataStore.cs:line 42
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSiteMain(ServiceCallSite callSite, TArgument argument)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitCache(ServiceCallSite callSite, RuntimeResolverContext context, ServiceProviderEngineScope serviceProviderEngine, RuntimeResolverLock lockType)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitScopeCache(ServiceCallSite singletonCallSite, RuntimeResolverContext context)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitConstructor(ConstructorCallSite constructorCallSite, RuntimeResolverContext context)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSiteMain(ServiceCallSite callSite, TArgument argument)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitDisposeCache(ServiceCallSite transientCallSite, RuntimeResolverContext context)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite callSite, ServiceProviderEngineScope scope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.DynamicServiceProviderEngine.<>c__DisplayClass1_0.b__0(ServiceProviderEngineScope scope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngine.GetService(Type serviceType, ServiceProviderEngineScope serviceProviderEngineScope)
   at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type serviceType)
   at Worker.Program.Main(String[] args) in /home/jenkins/agent/workspace/Connectivity/DataScience/C.Flight.DsFeed/src/DsFeed/Worker/Program.cs:line

Looking at the sources of the dotnet Ignite core, I noticed that the 
constructor of ClientFailoverSocket calls the method GetIpEndPoints, which in 
turn calls the method GetIps.
In the GetIps method, when a socket exception occurs, the method calls 
_logger.Debug() to log the error. But the _logger member has not yet been set 
in the ClientFailoverSocket constructor.
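The failure mode described above can be sketched in plain Java (a hypothetical minimal reproduction of the pattern, not the actual Ignite source; the class and method names are invented for illustration):

```java
// Hypothetical reproduction of the reported pattern: the constructor runs
// work that tries to log through a field which is only assigned later in
// the same constructor, so the logger is still null at that point.
public class EagerLogDemo {
    private StringBuilder logger; // still null while resolve() runs
    private String firstLogAttempt;

    public EagerLogDemo() {
        resolve();                    // logs before 'logger' is assigned
        logger = new StringBuilder(); // too late for the call above
    }

    private void resolve() {
        // In Ignite.NET this surfaces as ArgumentNullException('logger');
        // here we just record whether the field was initialized in time.
        firstLogAttempt = (logger == null) ? "logger was null" : "logger ready";
    }

    public String firstLogAttempt() {
        return firstLogAttempt;
    }

    public static void main(String[] args) {
        System.out.println(new EagerLogDemo().firstLogAttempt());
    }
}
```

The fix, as the report implies, is simply to assign the logger field before any code path that may log.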


Regards,


 Stéphane Gayet
 Responsable Développement / Development Manager
   (+33) 1 70 38 70 74 +33 6 00 00 00 00
  stephane.ga...@misterfly.com
Adopt the eco-attitude! Only print this email if necessary.

Re: Ignite cluster stability problems under heavy load

2021-10-18 Thread Piotr Jagielski

Hi again,

We managed to stabilize the heap a bit for now by tuning some general and 
cache configuration settings:


- enabled direct I/O

- enabled write throttling

- changed writeSynchronizationMode to FULL_ASYNC
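For reference, two of these changes can be applied programmatically; a minimal Java sketch, assuming Ignite 2.x (the cache name is illustrative, and direct I/O itself is enabled via an optional classpath module rather than a flag):

```java
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class TunedConfig {
    public static IgniteConfiguration create() {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        // Throttle page modifications so checkpoints can keep up with writes.
        storageCfg.setWriteThrottlingEnabled(true);
        // Direct I/O has no configuration flag: it is enabled by putting the
        // optional ignite-direct-io module on the node classpath.

        CacheConfiguration<Long, Object> cacheCfg = new CacheConfiguration<>("myCache");
        // FULL_ASYNC: updates do not wait for backup acknowledgements.
        cacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_ASYNC);

        return new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg)
            .setCacheConfiguration(cacheCfg);
    }
}
```

Note that FULL_ASYNC trades durability guarantees for throughput: a node failure can lose updates that backups have not yet applied.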

We have now had 5 days without peaks in heap (stabilized at around 4 GB) or 
long GC pauses (< 10 ms). Update speed (DataStreamer) is around 5K/sec.


In the logs we have the following entries:

2021-10-18 10:59:42 INFO  Throttling is applied to page modifications 
[percentOfPartTime=0.64, markDirty=2120 pages/sec, checkpointWrite=1926 
pages/sec, estIdealMarkDirty=0 pages/sec, curDirty=0.02, maxDirty=0.13, 
avgParkTime=13575716 ns, pages: (total=382258, evicted=0, 
written=108269, synced=0, cpBufUsed=10503, cpBufTotal=259107)]
2021-10-18 10:59:52 INFO  Throttling is applied to page modifications 
[percentOfPartTime=0.60, markDirty=2367 pages/sec, checkpointWrite=2151 
pages/sec, estIdealMarkDirty=0 pages/sec, curDirty=0.04, maxDirty=0.22, 
avgParkTime=12109253 ns, pages: (total=382258, evicted=0, 
written=204076, synced=0, cpBufUsed=11903, cpBufTotal=259107)]
2021-10-18 11:00:02 INFO  Throttling is applied to page modifications 
[percentOfPartTime=0.25, markDirty=2577 pages/sec, checkpointWrite=2340 
pages/sec, estIdealMarkDirty=0 pages/sec, curDirty=0.07, maxDirty=0.31, 
avgParkTime=4708297 ns, pages: (total=382258, evicted=0, written=298971, 
synced=0, cpBufUsed=8287, cpBufTotal=259107)]



We also observed that the duration of the fsync phase of checkpointing 
dropped from 2-5 seconds to under 20 milliseconds.


So, is the throttling the thing that helped? Is our disk too slow?

On 08.10.2021 at 08:44, Piotr Jagielski wrote:


Hi again,

Any advice? We're really struggling with our cluster's stability - we 
had to throttle data before sending it to the DataStreamer, but 
problems still happen.


In DataStreamer javadoc I found this:

perNodeParallelOperations(int) - sometimes data may be added to the 
data streamer via addData(Object, Object) method faster than it can be 
put in cache. In this case, new buffered stream messages are sent to 
remote nodes before responses from previous ones are received. This 
could cause unlimited heap memory utilization growth on local and 
remote nodes. To control memory utilization, this setting limits 
maximum allowed number of parallel buffered stream messages that are 
being processed on remote nodes. If this number is exceeded, then 
addData(Object, Object) method will block to control memory 
utilization. Default is equal to CPU count on remote node multiplied by 
DFLT_PARALLEL_OPS_MULTIPLIER.
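A bounded-streamer sketch based on that javadoc (the cache name and limit are hypothetical, and a running Ignite node is assumed to be passed in):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class BoundedStreamerDemo {
    public static void stream(Ignite ignite) {
        try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("myCache")) {
            // Cap in-flight buffered stream messages per node so addData()
            // blocks instead of letting unacknowledged updates pile up on
            // the heap of local and remote nodes.
            streamer.perNodeParallelOperations(4);

            for (long i = 0; i < 1_000_000; i++)
                streamer.addData(i, "value-" + i);
        } // close() flushes any remaining buffered data
    }
}
```

Lowering this limit reduces heap pressure at the cost of ingestion throughput, which may be an acceptable trade-off here.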


This could be the case - we have unlimited heap memory utilization 
growth, and in the histogram I see the GridDhtAtomicSingleUpdateRequest and 
GridNearAtomicUpdateResponse classes.


How can we check that "data may be added... faster than it can be put 
in cache"? Is there any growing metric exposed via JMX that we can check?


Regarding HDD I managed to run hdparm:

/dev/sda1:
 Timing cached reads:   15036 MB in  2.00 seconds = 7525.21 MB/sec
 Timing buffered disk reads: 2664 MB in  3.00 seconds = 887.36 MB/sec

Regards,

Piotr


On 2021/10/06 12:26:07, Piotr Jagielski wrote:
> OK I managed to take a larger heap histogram - attached
>
> Also I found WARNs about long running cache futures:
>
> 2021-10-06 14:15:29 WARN  First 10 long running cache futures
> [total=5986879]
> 2021-10-06 14:15:29 WARN  >>> Future [startTime=14:00:24.385,
> curTime=14:15:28.372, fut=GridDhtAtomicSingleUpdat
> eFuture [allUpdated=true, super=GridDhtAtomicAbstractUpdateFuture
> [futId=436214273, resCnt=0, addedReader=false,
>  dhtRes=TransformMapView
> {44055f7f-02d5-42bf-bcbe-2e78b45b7954=[res=false, size=1, nearSize=0]}]]]

> 2021-10-06 14:15:29 WARN  >>> Future [startTime=14:00:24.385,
> curTime=14:15:28.372, fut=GridDhtAtomicSingleUpdat
> eFuture [allUpdated=true, super=GridDhtAtomicAbstractUpdateFuture
> [futId=436214275, resCnt=0, addedReader=false,
>  dhtRes=TransformMapView
> {44055f7f-02d5-42bf-bcbe-2e78b45b7954=[res=false, size=1, nearSize=0]}]]]

> 2021-10-06 14:15:29 WARN  >>> Future [startTime=14:00:24.385,
> curTime=14:15:28.372, fut=GridDhtAtomicSingleUpdat
> eFuture [allUpdated=true, super=GridDhtAtomicAbstractUpdateFuture
> [futId=436214277, resCnt=0, addedReader=false,
>  dhtRes=TransformMapView
> {44055f7f-02d5-42bf-bcbe-2e78b45b7954=[res=false, size=1, nearSize=0]}]]]

> 2021-10-06 14:15:29 WARN  >>> Future [startTime=14:00:24.385,
> curTime=14:15:28.372, fut=GridDhtAtomicSingleUpdat
> eFuture [allUpdated=true, super=GridDhtAtomicAbstractUpdateFuture
> [futId=436214279, resCnt=0, addedReader=false,
>  dhtRes=TransformMapView
> {44055f7f-02d5-42bf-bcbe-2e78b45b7954=[res=false, size=1, nearSize=0]}]]]

> 2021-10-06 14:15:29 WARN  >>> Future [startTime=14:00:24.385,
> curTime=14:15:28.372, fut=GridDhtAtomicSingleUpdat
> eFuture [allUpdated=true, super=GridDhtAtomicAbstractUpdateFuture
> [futId=436214281, resCnt=0, addedReader=false,
>  dhtRes=TransformMapView
> {44055f7f-02d5-42bf-bcbe-2e78b45b7954=[res=false, size=1, 

Re[6]: Failed to perform cache operation (cache is stopped)

2021-10-18 Thread Zhenya Stanilovsky


Akash, can you attach the full logs of the failure here, not only a part?
Thanks! 
>Could someone please help me find the root cause of this problem?
>It is now happening very frequently.  
>On Wed, Oct 13, 2021 at 3:56 PM Akash Shinde < akashshi...@gmail.com > wrote:
>>Yes, I have set  failureDetectionTimeout  = 6.
>>There is no long GC pause. 
>>Core1 GC report
>>Core 2 GC Report  
>>On Wed, Oct 13, 2021 at 1:16 PM Zhenya Stanilovsky < arzamas...@mail.ru > 
>>wrote:
>>>
>>>Ok, additionally there is info about node segmentation:
>>>Node FAILED: TcpDiscoveryNode [id=fb67a5fd-f1ab-441d-a38e-bab975cd1037, 
>>>consistentId=0:0:0:0:0:0:0:1%lo,XX.XX.XX.XX, 127.0.0.1:47500 , 
>>>addrs=ArrayList [0:0:0:0:0:0:0:1%lo, XX.XX.XX.XX, 127.0.0.1], 
>>>sockAddrs=HashSet [ qagmscore02.xyz.com/XX.XX.XX.XX:47500 , 
>>>/0:0:0:0:0:0:0:1%lo:47500, / 127.0.0.1:47500 ], discPort=47500, order=25, 
>>>intOrder=16, lastExchangeTime=1633426750418, loc=false, 
>>>ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false]
>>> 
>>>Local node SEGMENTED: TcpDiscoveryNode 
>>>[id=7f357ca2-0ae2-4af0-bfa4-d18e7bcb3797
>>> 
>>>Possible too long JVM pause: 1052 milliseconds.
>>>
>>>Did you change the default networking timeouts? If not, try rechecking the 
>>>failureDetectionTimeout setting.
>>>If you have a GC pause longer than 10 seconds, the node will be dropped from 
>>>the cluster (by default). 
>>>   
This is the code of AgmsCacheJdbcStoreSessionListener.java.
The NullPointerException occurs because the datasource bean is not found. 
The cluster was working fine, so what could make the datasource bean 
unavailable in a running cluster? 
 
 
import javax.sql.DataSource;

import org.apache.ignite.cache.store.jdbc.CacheJdbcStoreSessionListener;
import org.apache.ignite.resources.SpringApplicationContextResource;
import org.springframework.context.ApplicationContext;

public class AgmsCacheJdbcStoreSessionListener extends 
CacheJdbcStoreSessionListener {

  // Ignite injects the Spring application context into this setter.
  @SpringApplicationContextResource
  public void setupDataSourceFromSpringContext(Object appCtx) {
    ApplicationContext appContext = (ApplicationContext) appCtx;
    setDataSource((DataSource) appContext.getBean("dataSource"));
  }
}
 
I can see one log line that points to a problem on the network side. 
Could this be the reason?
 
2021-10-07 16:28:22,889 197776202 [tcp-disco-msg-worker-[fb67a5fd 
XX.XX.XX.XX:47500 crd]-#2%springDataNode%-#69%springDataNode%] WARN  
o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to 
short-time network problems).
   
On Mon, Oct 11, 2021 at 7:15 PM stanilovsky evgeny < 
estanilovs...@gridgain.com > wrote:
>Maybe this? 
> 
>Caused by: java.lang.NullPointerException: null
>at 
>com.xyz.agms.grid.cache.loader.AgmsCacheJdbcStoreSessionListener.setupDataSourceFromSpringContext(AgmsCacheJdbcStoreSessionListener.java:14)
>... 23 common frames omitted
> 
> 
>>Hi Zhenya,
>>CacheStoppedException occurred again on our Ignite cluster. I have 
>>captured logs with  IGNITE_QUIET = false.
>>There are four core nodes in the cluster and two nodes gone down. I am 
>>attaching the logs for two failed nodes.
>>Please let me know if you need any further details.
>> 
>>Thanks,
>>Akash   
>>On Tue, Sep 7, 2021 at 12:19 PM Zhenya Stanilovsky < arzamas...@mail.ru > 
>>wrote:
>>>Please share these logs somehow; if you don't know how to share them, you can 
>>>send them directly to  arzamas...@mail.ru
>>>   
Meanwhile I will grep the logs for the next occurrence of the cache-stopped 
exception. Can someone point out whether there is any known bug related to 
this?
I want to check the possible reason for this cache-stopped exception.  
On Mon, Sep 6, 2021 at 6:27 PM Akash Shinde < akashshi...@gmail.com > 
wrote:
>Hi Zhenya,
>Thanks for the quick response.
>I believe you are talking about Ignite instances. There is a single 
>Ignite instance used in the application.
>I also want to point out that I am not using the destroyCache() method 
>anywhere in the application.
> 
>I will set  IGNITE_QUIET = false  and try to grep the required logs.
>This issue occurs at random and there is no way to reproduce it.
> 
>Thanks,
>Akash
> 
>   
>On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky < arzamas...@mail.ru 
>> wrote:
>>Hi, Akash
>>You can hit such a case, for example, when you have several 
>>instances:
>>inst1:
>>cache = inst1.getOrCreateCache("cache1");
>> 
>>inst2:
>>inst2.destroyCache("cache1");
>> 
>>then, after inst2's destroy, on inst1:
>>cache._some_method_call_
>> 
>>Or, in short: you are still using an instance that was already destroyed. You 
>>can simply grep your logs and find the time when the cache was stopped.
>>You probably need to set  IGNITE_QUIET = false.
>>[1]
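The scenario Zhenya describes can be sketched as a single-JVM Java reproduction (the cache name is hypothetical; this assumes the Ignite libraries are on the classpath and starts a local node):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class StoppedCacheDemo {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("cache1");
            cache.put(1, "a");

            // Some other code path (or another node) destroys the cache...
            ignite.destroyCache("cache1");

            // ...but the old proxy is still used: this call should now fail
            // with "Failed to perform cache operation (cache is stopped)".
            cache.get(1);
        }
    }
}
```

Grepping the node logs for the cache-stop event, as suggested above, shows which code path destroyed the cache before the failing operation.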