[jira] [Commented] (IGNITE-10959) Memory leaks in continuous query handlers

2020-05-22 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114194#comment-17114194
 ] 

Zane Hu commented on IGNITE-10959:
--

As I commented before, there are two confirmed cases of memory blowup in 
Ignite, both caused by too many cache-update events accumulating in pending 
buffers while an update event older than the buffered pending events has not 
yet arrived.
 # One is the per-partition TreeMap 
CacheContinuousQueryPartitionRecovery.pendingEvts in 
CacheContinuousQueryHandler.rcvs. It is capped by an upper bound, 
MAX_BUFF_SIZE.
 # The other is the per-partition ConcurrentSkipListMap 
CacheContinuousQueryEventBuffer.pending. It has no upper bound at all.

Attached below is pseudo-code of the main logic flow showing how these buffers 
are processed in Ignite. I hope it helps people fix the problem. Both paths 
start in CacheContinuousQueryHandler.CacheContinuousQueryListener.onEntryUpdated() 
when a cache entry is updated.

[^Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt]
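
To make the failure mode easier to see, here is a small, self-contained toy model (not the actual Ignite code; the class and its fields are made up for illustration) of why a pending buffer keyed by update counter keeps growing when one counter never arrives:

{code:java}
import java.util.Map;
import java.util.TreeMap;

/** Toy model: events are delivered in update-counter order, so a missing counter blocks the drain. */
public class PendingBufferToyModel {
    private long lastDelivered;                                  // highest counter delivered so far
    private final Map<Long, String> pending = new TreeMap<>();   // stands in for pendingEvts / pending

    void onEvent(long updateCounter, String payload) {
        pending.put(updateCounter, payload);

        // Drain only while the next expected counter is present.
        while (pending.containsKey(lastDelivered + 1)) {
            String evt = pending.remove(++lastDelivered);
            System.out.println("delivered #" + lastDelivered + ": " + evt);
        }

        System.out.println("pending size = " + pending.size());
    }

    public static void main(String[] args) {
        PendingBufferToyModel buf = new PendingBufferToyModel();

        buf.onEvent(1, "a");
        buf.onEvent(2, "b");

        // Counter 3 never arrives: everything after it stays pending and the map keeps growing.
        for (long c = 4; c <= 10; c++)
            buf.onEvent(c, "evt-" + c);
    }
}
{code}

With counter 3 missing, every later event stays in the map, which is essentially what happens to the two per-partition buffers listed above.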

 

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Assignee: Maxim Muzafarov
>Priority: Critical
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, 
> CacheContinuousQueryMemoryUsageTest2.java, 
> Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt, 
> Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt, 
> Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt, 
> continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-10959) Memory leaks in continuous query handlers

2020-05-22 Thread Zane Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zane Hu updated IGNITE-10959:
-
Attachment: Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Assignee: Maxim Muzafarov
>Priority: Critical
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, 
> CacheContinuousQueryMemoryUsageTest2.java, 
> Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt, 
> Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt, 
> Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt, 
> continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-10959) Memory leaks in continuous query handlers

2020-05-22 Thread Zane Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zane Hu updated IGNITE-10959:
-
Attachment: Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Assignee: Maxim Muzafarov
>Priority: Critical
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, 
> CacheContinuousQueryMemoryUsageTest2.java, 
> Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt, 
> Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt, 
> continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-10959) Memory leaks in continuous query handlers

2020-05-22 Thread Zane Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zane Hu updated IGNITE-10959:
-
Attachment: Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Assignee: Maxim Muzafarov
>Priority: Critical
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, 
> CacheContinuousQueryMemoryUsageTest2.java, 
> Memory_blowup_in_Ignite_CacheContinuousQueryHandler.txt, 
> continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-10959) Memory leaks in continuous query handlers

2019-11-13 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973903#comment-16973903
 ] 

Zane Hu commented on IGNITE-10959:
--

In addition to having an upper-bound limit that flushes and removes 10% of the 
events from CacheContinuousQueryEventBuffer.pending when the limit is reached, 
it would be nice to somehow inform the app that at least one event earlier than 
the flushed 10% of pending events has been dropped because it did not arrive in 
time. This way the app has a chance to handle such an exception afterwards, for 
example by doing a full scan of the partition to which the dropped event 
belongs, if possible. We would like to handle such an exception for both 
CacheContinuousQueryPartitionRecovery.pendingEvts and 
CacheContinuousQueryEventBuffer.pending.
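
A minimal sketch of what that could look like, assuming a hypothetical callback for notifying the application (this is not an existing Ignite API, just an illustration of the idea):

{code:java}
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongConsumer;

/** Sketch only: a bounded pending buffer that reports possible gaps to the application. */
class BoundedPendingBuffer {
    private static final int MAX_BUFF_SIZE = 10_000;   // same default as the recovery buffer

    private final TreeMap<Long, Object> pending = new TreeMap<>();
    private final LongConsumer gapListener;             // hypothetical "events up to counter X may be lost" callback

    BoundedPendingBuffer(LongConsumer gapListener) {
        this.gapListener = gapListener;
    }

    void add(long updateCounter, Object evt) {
        pending.put(updateCounter, evt);

        if (pending.size() >= MAX_BUFF_SIZE) {
            // Flush the oldest 10% and tell the application up to which counter events
            // may be missing, so it can, for example, rescan the affected partition.
            long flushedUpTo = 0;

            Iterator<Map.Entry<Long, Object>> it = pending.entrySet().iterator();

            for (int i = 0; i < MAX_BUFF_SIZE / 10 && it.hasNext(); i++) {
                flushedUpTo = it.next().getKey();
                it.remove();
            }

            gapListener.accept(flushedUpTo);
        }
    }
}
{code}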

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Priority: Major
> Fix For: 2.9
>
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, 
> CacheContinuousQueryMemoryUsageTest2.java, continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-10959) Memory leaks in continuous query handlers

2019-11-05 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967866#comment-16967866
 ] 

Zane Hu commented on IGNITE-10959:
--

We have observed two cases of huge memory usage in Ignite Continuous Query, 
both caused by too many pending cache-update events accumulating because an 
event earlier than the pending ones has not arrived yet. BTW, we use Ignite 
2.7.0.
 * One is CacheContinuousQueryHandler.rcvs growing to 7.7 GB Retained Heap, as 
seen in jmap/Memory Analyzer. We also saw "Pending events reached max of buffer 
size" in the Ignite log file. According to 
[https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryPartitionRecovery.java#L196],
 this happens because the size of CacheContinuousQueryPartitionRecovery.pendingEvts 
has reached MAX_BUFF_SIZE (default 10,000). Ignite then flushes and removes 10% 
of the entries in pendingEvts, even though some not-yet-arrived earlier events 
are dropped without notifying the listener. This upper-bound limit of 
MAX_BUFF_SIZE at least prevents the memory from growing further to OOM.
 * The other is CacheContinuousQueryEventBuffer.pending growing to 22 GB Retained 
Heap, as seen in jmap/Memory Analyzer. According to 
[https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryEventBuffer.java#L168],
 cache-update events are processed in batches of 
CacheContinuousQueryEventBuffer.Batch.entries[BUF_SIZE] (default BUF_SIZE is 
1,000). If an event entry falls within the current batch (e.updateCounter() <= 
batch.endCntr), it is processed by batch.processEntry0(); otherwise it is put 
into CacheContinuousQueryEventBuffer.pending. However, according to 
[https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryEventBuffer.java#L425],
 if any event within the current batch has not arrived, Ignite will not move on 
to the next batch and process the entries in CacheContinuousQueryEventBuffer.pending. 
Unlike CacheContinuousQueryPartitionRecovery.pendingEvts with its MAX_BUFF_SIZE, 
there is NO upper-bound limit on CacheContinuousQueryEventBuffer.pending. So if 
an event earlier than the ones in CacheContinuousQueryEventBuffer.pending never 
arrives for some reason (a high rate of events, high concurrency, a timeout, 
...), CacheContinuousQueryEventBuffer.pending will grow until OOM. To prevent 
this, I think Ignite at least needs to add an upper-bound limit and some 
handling here, to flush and remove 10% of the events from 
CacheContinuousQueryEventBuffer.pending, similarly to 
CacheContinuousQueryPartitionRecovery.pendingEvts (see the sketch after this 
list). In terms of exception handling, I think dropping some events is better 
than OOM.
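
As a rough illustration of the proposed upper bound (illustration only, not a patch; the limit constant is made up), the pending map could be capped like this:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Sketch of capping a pending map the way MAX_BUFF_SIZE caps the recovery buffer. */
class PendingLimitSketch {
    static final int PENDING_LIMIT = 10_000;   // hypothetical limit, analogous to MAX_BUFF_SIZE

    final ConcurrentSkipListMap<Long, Object> pending = new ConcurrentSkipListMap<>();

    void put(long updateCounter, Object entry) {
        pending.put(updateCounter, entry);

        if (pending.size() >= PENDING_LIMIT) {
            // Force out the oldest 10% of entries instead of letting the map grow without bound.
            for (int i = 0; i < PENDING_LIMIT / 10; i++) {
                Map.Entry<Long, Object> oldest = pending.pollFirstEntry();

                if (oldest == null)
                    break;

                // In a real fix these entries would be flushed to the listener (and the
                // application told about the gap) rather than silently discarded.
            }
        }
    }
}
{code}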

 

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Priority: Major
> Fix For: 2.9
>
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, 
> CacheContinuousQueryMemoryUsageTest2.java, continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-10959) Memory leaks in continuous query handlers

2019-10-23 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958202#comment-16958202
 ] 

Zane Hu commented on IGNITE-10959:
--

The test result in [^CacheContinuousQueryMemoryUsageTest.result] was produced 
by running the following program with Ignite 2.7.0.

 

[^CacheContinuousQueryMemoryUsageTest2.java]

 

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Priority: Major
> Fix For: 2.9
>
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, 
> CacheContinuousQueryMemoryUsageTest2.java, continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-10959) Memory leaks in continuous query handlers

2019-10-23 Thread Zane Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zane Hu updated IGNITE-10959:
-
Attachment: CacheContinuousQueryMemoryUsageTest2.java

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Priority: Major
> Fix For: 2.9
>
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, 
> CacheContinuousQueryMemoryUsageTest2.java, continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-10959) Memory leaks in continuous query handlers

2019-10-22 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957176#comment-16957176
 ] 

Zane Hu commented on IGNITE-10959:
--

Here is the test result.

[^CacheContinuousQueryMemoryUsageTest.result]

 

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Priority: Major
> Fix For: 2.9
>
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-10959) Memory leaks in continuous query handlers

2019-10-22 Thread Zane Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zane Hu updated IGNITE-10959:
-
Attachment: CacheContinuousQueryMemoryUsageTest.result

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Priority: Major
> Fix For: 2.9
>
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (IGNITE-10959) Memory leaks in continuous query handlers

2019-10-19 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955345#comment-16955345
 ] 

Zane Hu edited comment on IGNITE-10959 at 10/20/19 1:09 AM:


For the TransactionalPartitionedTwoBackupFullSync case we got the following 
error from a slightly modified CacheContinuousQueryMemoryUsageTest.java. We 
don't see such an error for the TransactionalReplicatedTwoBackupFullSync and 
TransactionalPartitionedOneBackupFullSync cases.
{quote}[ERROR] 
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
 Backup queue is not empty. Node: 
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache. 
expected:<0> but was:<1>
{quote}
Looking at the Ignite code, the relevant snippet of onEntryUpdated() in 
CacheContinuousQueryHandler.java is:

 
{code:java}
if (primary || skipPrimaryCheck) // TransactionalReplicatedTwoBackupFullSync goes here.
    onEntryUpdate(evt, notify, loc, recordIgniteEvt); // Notify the query client without putting evt.entry() into backupQ.
else // A backup node of TransactionalPartitionedTwoBackupFullSync goes here.
    handleBackupEntry(cctx, evt.entry()); // This will put evt.entry() into backupQ.
{code}
  

After the query client is notified, an ack message CacheContinuousQueryBatchAck 
appears to be sent to the CQ server side on backup nodes to clean up the 
entries in backupQ. There is also a periodic BackupCleaner task that runs every 
5 seconds to clean up backupQ. The actual cleanup code is below:

 
{code:java}
/**
 * @param updateCntr Acknowledged counter.
 */
void cleanupBackupQueue(Long updateCntr) {
    Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

    while (it.hasNext()) {
        CacheContinuousQueryEntry backupEntry = it.next();

        // Remove backupEntry if its updateCounter <= the acknowledged updateCntr.
        if (backupEntry.updateCounter() <= updateCntr)
            it.remove();
    }
}
{code}
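
For reference, here is a tiny standalone illustration of that cleanup rule (plain longs stand in for CacheContinuousQueryEntry): only entries whose update counter is covered by the acknowledged counter are removed, so an entry that is never acknowledged stays behind, which is consistent with the single leftover entry in the failing test.

{code:java}
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

/** Standalone illustration of the cleanup rule shown above. */
public class BackupQueueCleanupDemo {
    public static void main(String[] args) {
        Queue<Long> backupQ = new ArrayDeque<>();

        for (long cntr = 1; cntr <= 5; cntr++)
            backupQ.add(cntr);

        long ackedCntr = 4;   // the ack only covers counters 1..4

        for (Iterator<Long> it = backupQ.iterator(); it.hasNext(); ) {
            if (it.next() <= ackedCntr)
                it.remove();
        }

        System.out.println("left in backupQ: " + backupQ);   // prints [5]
    }
}
{code}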
 

So some questions are:
 # Why is a backupEntry still left over in backupQ after all this?
 # Is it possible that the updateCounter and the acknowledged updateCntr are miscalculated?
 # Is it possible that the ack message is sent to only one of the two backup nodes? 
The load is 1,000 updates across 3 nodes on a stable network, so a message 
shouldn't simply be dropped in the middle.

 

Please help look into this further, especially Ignite experts and developers.

Thanks,

 


was (Author: zanehu):
An error case TransactionalPartitionedTwoBackupFullSync is as the following log 
we got from a slightly modified CacheContinuousQueryMemoryUsageTest.java. But 
we don't see such error for cases of TransactionalReplicatedTwoBackupFullSync 
and TransactionalPartitionedOneBackupFullSync. 

 

[ERROR] 
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
 Backup queue is not empty. Node: 
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache. 
expected:<0> but was:<1>

 

Looked at Ignite code, the following snip of onEntryUpdated() in 
CacheContinuousQueryHandler.java

 
{code:java}
if (primary || skipPrimaryCheck) // TransactionalReplicatedTwoBackupFullSync goes here.
    onEntryUpdate(evt, notify, loc, recordIgniteEvt); // Notify the query client without putting evt.entry() into backupQ.
else // A backup node of TransactionalPartitionedTwoBackupFullSync goes here.
    handleBackupEntry(cctx, evt.entry()); // This will put evt.entry() into backupQ.
{code}
  

After notifying the query client, there seems an ack msg 
CacheContinuousQueryBatchAck sent to the CQ server side on backup nodes to 
clean up the entries in backupQ. And there is even a periodic BackupCleaner 
task every 5 seconds to clean up backupQ. The actual cleanup code is as below:

 
{code:java}
/**
 * @param updateCntr Acknowledged counter.
 */
void cleanupBackupQueue(Long updateCntr) {
    Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

    while (it.hasNext()) {
        CacheContinuousQueryEntry backupEntry = it.next();

        // Remove backupEntry if its updateCounter <= the acknowledged updateCntr.
        if (backupEntry.updateCounter() <= updateCntr)
            it.remove();
    }
}
{code}
 

So some questions are
 # Why is a backupEntry still left over in backupQ after all these?
 # Is it possible that the updateCounter and Ack updateCntr are mis-calculated?
 # Is it possible that the ack msg is sent to only one of the two backup nodes? 
The load of 1000 updates of 3 nodes in a stable network, so there shouldn't be 
a msg somehow dropped 

[jira] [Comment Edited] (IGNITE-10959) Memory leaks in continuous query handlers

2019-10-19 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955345#comment-16955345
 ] 

Zane Hu edited comment on IGNITE-10959 at 10/20/19 1:08 AM:


For the TransactionalPartitionedTwoBackupFullSync case we got the following 
error from a slightly modified CacheContinuousQueryMemoryUsageTest.java. We 
don't see such an error for the TransactionalReplicatedTwoBackupFullSync and 
TransactionalPartitionedOneBackupFullSync cases.

 

[ERROR] 
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
 Backup queue is not empty. Node: 
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache. 
expected:<0> but was:<1>

 

Looking at the Ignite code, the relevant snippet of onEntryUpdated() in 
CacheContinuousQueryHandler.java is:

 
{code:java}
if (primary || skipPrimaryCheck) // TransactionalReplicatedTwoBackupFullSync goes here.
    onEntryUpdate(evt, notify, loc, recordIgniteEvt); // Notify the query client without putting evt.entry() into backupQ.
else // A backup node of TransactionalPartitionedTwoBackupFullSync goes here.
    handleBackupEntry(cctx, evt.entry()); // This will put evt.entry() into backupQ.
{code}
  

After the query client is notified, an ack message CacheContinuousQueryBatchAck 
appears to be sent to the CQ server side on backup nodes to clean up the 
entries in backupQ. There is also a periodic BackupCleaner task that runs every 
5 seconds to clean up backupQ. The actual cleanup code is below:

 
{code:java}
/**
 * @param updateCntr Acknowledged counter.
 */
void cleanupBackupQueue(Long updateCntr) {
    Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

    while (it.hasNext()) {
        CacheContinuousQueryEntry backupEntry = it.next();

        // Remove backupEntry if its updateCounter <= the acknowledged updateCntr.
        if (backupEntry.updateCounter() <= updateCntr)
            it.remove();
    }
}
{code}
 

So some questions are:
 # Why is a backupEntry still left over in backupQ after all this?
 # Is it possible that the updateCounter and the acknowledged updateCntr are miscalculated?
 # Is it possible that the ack message is sent to only one of the two backup nodes? 
The load is 1,000 updates across 3 nodes on a stable network, so a message 
shouldn't simply be dropped in the middle.

 

Please help look into this further, especially Ignite experts and developers.

Thanks,

 


was (Author: zanehu):
An error case TransactionalPartitionedTwoBackupFullSync is as the following log 
we got from a slightly modified CacheContinuousQueryMemoryUsageTest.java:

 

[ERROR] 
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
 Backup queue is not empty. Node: 
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache. 
expected:<0> but was:<1>

 

But we don't see such error for TransactionalReplicatedTwoBackupFullSync or 
TransactionalPartitionedOneBackupFullSync. Looked at Ignite code, the following 
snip of onEntryUpdated() in CacheContinuousQueryHandler.java

 
{code:java}
if (primary || skipPrimaryCheck) // TransactionalReplicatedTwoBackupFullSync goes here.
    onEntryUpdate(evt, notify, loc, recordIgniteEvt); // Notify the query client without putting evt.entry() into backupQ.
else // A backup node of TransactionalPartitionedTwoBackupFullSync goes here.
    handleBackupEntry(cctx, evt.entry()); // This will put evt.entry() into backupQ.
{code}
  

After notifying the query client, there seems an ack msg 
CacheContinuousQueryBatchAck sent to the CQ server side on backup nodes to 
clean up the entries in backupQ. And there is even a periodic BackupCleaner 
task every 5 seconds to clean up backupQ. The actual cleanup code is as below:

 
{code:java}
/**
 * @param updateCntr Acknowledged counter.
 */
void cleanupBackupQueue(Long updateCntr) {
    Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

    while (it.hasNext()) {
        CacheContinuousQueryEntry backupEntry = it.next();

        // Remove backupEntry if its updateCounter <= the acknowledged updateCntr.
        if (backupEntry.updateCounter() <= updateCntr)
            it.remove();
    }
}
{code}
 

So some questions are
 # Why is a backupEntry still left over in backupQ after all these?
 # Is it possible that the updateCounter and Ack updateCntr are mis-calculated?
 # Is it possible that the ack msg is sent to only one of the two backup nodes? 
The load of 1000 updates of 3 nodes in a stable network, so there shouldn't be 
a msg somehow dropped in the middle. 

 

[jira] [Comment Edited] (IGNITE-10959) Memory leaks in continuous query handlers

2019-10-19 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955345#comment-16955345
 ] 

Zane Hu edited comment on IGNITE-10959 at 10/20/19 1:07 AM:


For the TransactionalPartitionedTwoBackupFullSync case we got the following 
error from a slightly modified CacheContinuousQueryMemoryUsageTest.java:

 

[ERROR] 
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
 Backup queue is not empty. Node: 
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache. 
expected:<0> but was:<1>

 

We don't see such an error for TransactionalReplicatedTwoBackupFullSync or 
TransactionalPartitionedOneBackupFullSync. Looking at the Ignite code, the 
relevant snippet of onEntryUpdated() in CacheContinuousQueryHandler.java is:

 
{code:java}
if (primary || skipPrimaryCheck) // TransactionalReplicatedTwoBackupFullSync goes here.
    onEntryUpdate(evt, notify, loc, recordIgniteEvt); // Notify the query client without putting evt.entry() into backupQ.
else // A backup node of TransactionalPartitionedTwoBackupFullSync goes here.
    handleBackupEntry(cctx, evt.entry()); // This will put evt.entry() into backupQ.
{code}
  

After the query client is notified, an ack message CacheContinuousQueryBatchAck 
appears to be sent to the CQ server side on backup nodes to clean up the 
entries in backupQ. There is also a periodic BackupCleaner task that runs every 
5 seconds to clean up backupQ. The actual cleanup code is below:

 
{code:java}
/**
 * @param updateCntr Acknowledged counter.
 */
void cleanupBackupQueue(Long updateCntr) {
    Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

    while (it.hasNext()) {
        CacheContinuousQueryEntry backupEntry = it.next();

        // Remove backupEntry if its updateCounter <= the acknowledged updateCntr.
        if (backupEntry.updateCounter() <= updateCntr)
            it.remove();
    }
}
{code}
 

So some questions are:
 # Why is a backupEntry still left over in backupQ after all this?
 # Is it possible that the updateCounter and the acknowledged updateCntr are miscalculated?
 # Is it possible that the ack message is sent to only one of the two backup nodes? 
The load is 1,000 updates across 3 nodes on a stable network, so a message 
shouldn't simply be dropped in the middle.

 

Please help look into this further, especially Ignite experts and developers.

Thanks,

 


was (Author: zanehu):
An error case TransactionalPartitionedTwoBackupFullSync is as the following log 
we got from a slightly modified CacheContinuousQueryMemoryUsageTest.java:

 

[ERROR] 
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
 Backup queue is not empty. Node: 
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache. 
expected:<0> but was:<1>

 

But we don't see such error for TransactionalReplicatedTwoBackupFullSync or 
TransactionalPartitionedOneBackupFullSync

Looked at Ignite code, the following snip of onEntryUpdated() in 
CacheContinuousQueryHandler.java

 
{code:java}
if (primary || skipPrimaryCheck) // TransactionalReplicatedTwoBackupFullSync goes here.
    onEntryUpdate(evt, notify, loc, recordIgniteEvt); // Notify the query client without putting evt.entry() into backupQ.
else // A backup node of TransactionalPartitionedTwoBackupFullSync goes here.
    handleBackupEntry(cctx, evt.entry()); // This will put evt.entry() into backupQ.
{code}
  

After notifying the query client, there seems an ack msg 
CacheContinuousQueryBatchAck sent to the CQ server side on backup nodes to 
clean up the entries in backupQ. And there is even a periodic BackupCleaner 
task every 5 seconds to clean up backupQ. The actual cleanup code is as below:

 
{code:java}
/**
 * @param updateCntr Acknowledged counter.
 */
void cleanupBackupQueue(Long updateCntr) {
    Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

    while (it.hasNext()) {
        CacheContinuousQueryEntry backupEntry = it.next();

        // Remove backupEntry if its updateCounter <= the acknowledged updateCntr.
        if (backupEntry.updateCounter() <= updateCntr)
            it.remove();
    }
}
{code}
 

So some questions are
 # Why is a backupEntry still left over in backupQ after all these?
 # Is it possible that the updateCounter and Ack updateCntr are mis-calculated?
 # Is it possible that the ack msg is sent to only one of the two backup nodes? 
The load of 1000 updates of 3 nodes in a stable network, so there shouldn't be 
a msg somehow dropped in the middle. 

 

Please 

[jira] [Comment Edited] (IGNITE-10959) Memory leaks in continuous query handlers

2019-10-19 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955345#comment-16955345
 ] 

Zane Hu edited comment on IGNITE-10959 at 10/20/19 1:06 AM:


For the TransactionalPartitionedTwoBackupFullSync case we got the following 
error from a slightly modified CacheContinuousQueryMemoryUsageTest.java:

 

[ERROR] 
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
 Backup queue is not empty. Node: 
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache. 
expected:<0> but was:<1>

 

We don't see such an error for TransactionalReplicatedTwoBackupFullSync or 
TransactionalPartitionedOneBackupFullSync.

Looking at the Ignite code, the relevant snippet of onEntryUpdated() in 
CacheContinuousQueryHandler.java is:

 
{code:java}
if (primary || skipPrimaryCheck) // TransactionalReplicatedTwoBackupFullSync goes here.
    onEntryUpdate(evt, notify, loc, recordIgniteEvt); // Notify the query client without putting evt.entry() into backupQ.
else // A backup node of TransactionalPartitionedTwoBackupFullSync goes here.
    handleBackupEntry(cctx, evt.entry()); // This will put evt.entry() into backupQ.
{code}
  

After the query client is notified, an ack message CacheContinuousQueryBatchAck 
appears to be sent to the CQ server side on backup nodes to clean up the 
entries in backupQ. There is also a periodic BackupCleaner task that runs every 
5 seconds to clean up backupQ. The actual cleanup code is below:

 
{code:java}
/**
 * @param updateCntr Acknowledged counter.
 */
void cleanupBackupQueue(Long updateCntr) {
    Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

    while (it.hasNext()) {
        CacheContinuousQueryEntry backupEntry = it.next();

        // Remove backupEntry if its updateCounter <= the acknowledged updateCntr.
        if (backupEntry.updateCounter() <= updateCntr)
            it.remove();
    }
}
{code}
 

So some questions are:
 # Why is a backupEntry still left over in backupQ after all this?
 # Is it possible that the updateCounter and the acknowledged updateCntr are miscalculated?
 # Is it possible that the ack message is sent to only one of the two backup nodes? 
The load is 1,000 updates across 3 nodes on a stable network, so a message 
shouldn't simply be dropped in the middle.

 

Please help look into this further, especially Ignite experts and developers.

Thanks,

 


was (Author: zanehu):
An error case TransactionalPartitionedTwoBackupFullSync is as the following log 
we got from a slightly modified CacheContinuousQueryMemoryUsageTest.java:

 

[ERROR] 
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
 Backup queue is not empty. Node: 
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache. 
expected:<0> but was:<1>

 

But we don't see such error for TransactionalReplicatedTwoBackupFullSync or 
TransactionalPartitionedOneBackupFullSync

Looked at Ignite code, the following snip of onEntryUpdated() in 
CacheContinuousQueryHandler.java

 

 
{code:java}
if (primary || skipPrimaryCheck) // TransactionalReplicatedTwoBackupFullSync goes here.
    onEntryUpdate(evt, notify, loc, recordIgniteEvt); // Notify the query client without putting evt.entry() into backupQ.
else // A backup node of TransactionalPartitionedTwoBackupFullSync goes here.
    handleBackupEntry(cctx, evt.entry()); // This will put evt.entry() into backupQ.
{code}
 

 

After notifying the query client, there seems an ack msg 
CacheContinuousQueryBatchAck sent to the CQ server side on backup nodes to 
clean up the entries in backupQ. And there is even a periodic BackupCleaner 
task every 5 seconds to clean up backupQ. The actual cleanup code is as below:

 

 

 
{code:java}
/**
 * @param updateCntr Acknowledged counter.
 */
void cleanupBackupQueue(Long updateCntr) {
    Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

    while (it.hasNext()) {
        CacheContinuousQueryEntry backupEntry = it.next();

        // Remove backupEntry if its updateCounter <= the acknowledged updateCntr.
        if (backupEntry.updateCounter() <= updateCntr)
            it.remove();
    }
}
{code}
 

So some questions are
 # Why is a backupEntry still left over in backupQ after all these?
 # Is it possible that the updateCounter and Ack updateCntr are mis-calculated?
 # Is it possible that the ack msg is sent to only one of the two backup nodes? 
The load of 1000 updates of 3 nodes in a stable network, so there shouldn't be 
a msg somehow dropped in the middle.

 


[jira] [Commented] (IGNITE-10959) Memory leaks in continuous query handlers

2019-10-19 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955345#comment-16955345
 ] 

Zane Hu commented on IGNITE-10959:
--

For the TransactionalPartitionedTwoBackupFullSync case we got the following 
error from a slightly modified CacheContinuousQueryMemoryUsageTest.java:

 

[ERROR] 
CacheContinuousQueryMemoryUsageTest>GridAbstractTest.access$000:143->GridAbstractTest.runTestInternal:2177->testTransactionalPartitionedTwoBackupFullSync:235->testContinuousQuery:355->assertEntriesReleased:423->assertEntriesReleased:435->checkEntryBuffers:466
 Backup queue is not empty. Node: 
continuous.CacheContinuousQueryMemoryUsageTest0; cache: test-cache. 
expected:<0> but was:<1>

 

We don't see such an error for TransactionalReplicatedTwoBackupFullSync or 
TransactionalPartitionedOneBackupFullSync.

Looking at the Ignite code, the relevant snippet of onEntryUpdated() in 
CacheContinuousQueryHandler.java is:

 

 
{code:java}
if (primary || skipPrimaryCheck) // TransactionalReplicatedTwoBackupFullSync goes here.
    onEntryUpdate(evt, notify, loc, recordIgniteEvt); // Notify the query client without putting evt.entry() into backupQ.
else // A backup node of TransactionalPartitionedTwoBackupFullSync goes here.
    handleBackupEntry(cctx, evt.entry()); // This will put evt.entry() into backupQ.
{code}
 

 

After the query client is notified, an ack message CacheContinuousQueryBatchAck 
appears to be sent to the CQ server side on backup nodes to clean up the 
entries in backupQ. There is also a periodic BackupCleaner task that runs every 
5 seconds to clean up backupQ. The actual cleanup code is below:

 

 

 
{code:java}
/**
 * @param updateCntr Acknowledged counter.
 */
void cleanupBackupQueue(Long updateCntr) {
    Iterator<CacheContinuousQueryEntry> it = backupQ.iterator();

    while (it.hasNext()) {
        CacheContinuousQueryEntry backupEntry = it.next();

        // Remove backupEntry if its updateCounter <= the acknowledged updateCntr.
        if (backupEntry.updateCounter() <= updateCntr)
            it.remove();
    }
}
{code}
 

So some questions are:
 # Why is a backupEntry still left over in backupQ after all this?
 # Is it possible that the updateCounter and the acknowledged updateCntr are miscalculated?
 # Is it possible that the ack message is sent to only one of the two backup nodes? 
The load is 1,000 updates across 3 nodes on a stable network, so a message 
shouldn't simply be dropped in the middle.

 

Please help look into this further, especially Ignite experts and developers.

Thanks,

 

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Priority: Major
> Fix For: 2.9
>
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-10959) Memory leaks in continuous query handlers

2019-10-18 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954721#comment-16954721
 ] 

Zane Hu commented on IGNITE-10959:
--

We hit this issue too. Is it possible to have a quick fix patch soon? Thanks!

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Priority: Major
> Fix For: 2.9
>
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)