[jira] [Commented] (IGNITE-10959) Memory leaks in continuous query handlers
[ https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967866#comment-16967866 ] Zane Hu commented on IGNITE-10959: -- We have observed two cases of Ignite Continuous Query using a huge amount of memory; both are caused by too many pending cache-update events accumulating because an event earlier than the pending ones has not arrived yet. BTW, we use Ignite 2.7.0. * One is CacheContinuousQueryHandler.rcvs growing to 7.7 GB retained heap, as seen in jmap/Memory Analyzer. We also saw "Pending events reached max of buffer size" in the Ignite log file. According to [https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryPartitionRecovery.java#L196], this happens when the size of CacheContinuousQueryPartitionRecovery.pendingEvts reaches MAX_BUFF_SIZE (default 10,000). Ignite then flushes and removes 10% of the entries in pendingEvts, even though some not-yet-arrived earlier events are thereby dropped without notifying the listener. This upper bound of MAX_BUFF_SIZE prevents the memory from growing further toward OOM. * The other is CacheContinuousQueryEventBuffer.pending growing to 22 GB retained heap, as seen in jmap/Memory Analyzer. According to [https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryEventBuffer.java#L168], cache-update events are processed in batches of CacheContinuousQueryEventBuffer.Batch.entries[BUF_SIZE] (default BUF_SIZE is 1,000). If an event entry falls within the current batch (e.updateCounter() <= batch.endCntr), it is processed by batch.processEntry0(). Otherwise it is put into CacheContinuousQueryEventBuffer.pending. 
However, according to [https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryEventBuffer.java#L425], if any event within the current batch has not arrived, Ignite will not move on to the next batch to process the entries in CacheContinuousQueryEventBuffer.pending. Unlike the processing of CacheContinuousQueryPartitionRecovery.pendingEvts with its MAX_BUFF_SIZE, there is NO upper bound on CacheContinuousQueryEventBuffer.pending. This means that if an event earlier than those in CacheContinuousQueryEventBuffer.pending never arrives for some reason (a high rate of events, high concurrency, timeouts, ...), CacheContinuousQueryEventBuffer.pending will grow until OOM. To prevent this, I think Ignite at least needs to add an upper bound and some handling here to flush and remove 10% of the events from CacheContinuousQueryEventBuffer.pending, similarly to CacheContinuousQueryPartitionRecovery.pendingEvts. In terms of exception handling, I think dropping some events is better than OOM. > Memory leaks in continuous query handlers > - > > Key: IGNITE-10959 > URL: https://issues.apache.org/jira/browse/IGNITE-10959 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.7 >Reporter: Denis Mekhanikov >Priority: Major > Fix For: 2.9 > > Attachments: CacheContinuousQueryMemoryUsageTest.java, > CacheContinuousQueryMemoryUsageTest.result, > CacheContinuousQueryMemoryUsageTest2.java, continuousquery_leak_profile.png > > > Continuous query handlers don't clear internal data structures after cache > events are processed. > A test that reproduces the problem is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
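The bound-and-flush behavior proposed above can be sketched in isolation. The class and method names below are hypothetical (this is not Ignite's actual API); the sketch only illustrates the "flush the oldest 10% when an upper bound is hit" idea that CacheContinuousQueryPartitionRecovery applies to pendingEvts and that the comment suggests adding to CacheContinuousQueryEventBuffer.pending:

```java
import java.util.*;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of a pending-event buffer with an upper bound.
 * Out-of-order events are keyed by update counter; when the bound is
 * reached, the oldest 10% are flushed to the listener even though
 * earlier (still missing) events will then never be delivered.
 */
class BoundedPendingBuffer {
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private final int maxSize;
    private final Consumer<String> lsnr;

    BoundedPendingBuffer(int maxSize, Consumer<String> lsnr) {
        this.maxSize = maxSize;
        this.lsnr = lsnr;
    }

    /** Buffers an out-of-order event; flushes the oldest 10% when the bound is hit. */
    void add(long updateCntr, String evt) {
        pending.put(updateCntr, evt);

        if (pending.size() >= maxSize) {
            int toFlush = Math.max(1, maxSize / 10);

            // Drop the oldest entries: dropping some events is preferable to OOM.
            for (int i = 0; i < toFlush && !pending.isEmpty(); i++) {
                Map.Entry<Long, String> e = pending.pollFirstEntry();
                lsnr.accept(e.getValue());
            }
        }
    }

    int size() { return pending.size(); }
}
```

With a bound of 10, buffering ten events triggers one flush (10% of 10), leaving nine entries pending.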
[jira] [Commented] (IGNITE-11765) Vulnerable library H2 Database Engine1.4.197 used
[ https://issues.apache.org/jira/browse/IGNITE-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967823#comment-16967823 ] Erikson Murrugarra commented on IGNITE-11765: - Hi. Do you have any updates for this issue? Thank you. > Vulnerable library H2 Database Engine1.4.197 used > - > > Key: IGNITE-11765 > URL: https://issues.apache.org/jira/browse/IGNITE-11765 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.7 >Reporter: VIJAY BHATT >Priority: Major > > We use blackduck for scanning our project. It has identified Ignite 2.7.0 > using H2 Database Engine version 1.4.197 as a vulnerable library having the > following 2 vulnerabilities: > BDSA-2018-1048 (CVE-2018-10054) > BDSA-2018-2507 (CVE-2018-14335) > Suggested fix by blackduck is to use version 1.4.198 > We tried using 1.4.198 using jar override but it has some breaking changes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12300) ComputeJob#cancel executes with wrong SecurityContext
[ https://issues.apache.org/jira/browse/IGNITE-12300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967635#comment-16967635 ] Andrey Kuznetsov commented on IGNITE-12300: --- The fix and tests look OK to me. > ComputeJob#cancel executes with wrong SecurityContext > - > > Key: IGNITE-12300 > URL: https://issues.apache.org/jira/browse/IGNITE-12300 > Project: Ignite > Issue Type: Bug >Reporter: Denis Garus >Assignee: Denis Garus >Priority: Major > Labels: iep-38 > Attachments: ComputeJobCancelReproducerTest.java > > Time Spent: 10m > Remaining Estimate: 0h > > ComputeJob#cancel executes with the security context of the current node rather > than the security context of the node that initiated the ComputeJob. > > Reproducer: > [https://github.com/apache/ignite/pull/6984/files] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (IGNITE-12303) Change comment for an enumeration item CACHE_DESTROY
[ https://issues.apache.org/jira/browse/IGNITE-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert Iskhakov reassigned IGNITE-12303: Assignee: Albert Iskhakov > Change comment for an enumeration item CACHE_DESTROY > > > Key: IGNITE-12303 > URL: https://issues.apache.org/jira/browse/IGNITE-12303 > Project: Ignite > Issue Type: Wish > Components: documentation, security >Reporter: Surkov Aleksandr >Assignee: Albert Iskhakov >Priority: Minor > Labels: newbie > > For the _org.apache.ignite.plugin.security.SecurityPermission#CACHE_DESTROY_ > enumeration element, it would be worth changing the comment: "Cache create > permission." is not a good description for a destroy permission. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (IGNITE-12278) Add metric showing how many nodes may safely leave the cluster without partition loss
[ https://issues.apache.org/jira/browse/IGNITE-12278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert Iskhakov reassigned IGNITE-12278: Assignee: Albert Iskhakov > Add metric showing how many nodes may safely leave the cluster without > partition loss > > > Key: IGNITE-12278 > URL: https://issues.apache.org/jira/browse/IGNITE-12278 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Rakov >Assignee: Albert Iskhakov >Priority: Major > Labels: IEP-35, newbie > Fix For: 2.8 > > > We already have the CacheGroupMetricsMXBean#getMinimumNumberOfPartitionCopies > metric that shows the partition redundancy number for a specific cache group. > It would be handy if the user had a single aggregated metric over all cache groups > showing how many nodes may leave the cluster without partition loss in any > cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12351) Append additional cp tracking activity - pages sort.
Vladimir Malinovskiy created IGNITE-12351: - Summary: Append additional cp tracking activity - pages sort. Key: IGNITE-12351 URL: https://issues.apache.org/jira/browse/IGNITE-12351 Project: Ignite Issue Type: Bug Reporter: Vladimir Malinovskiy Assignee: Vladimir Malinovskiy CheckpointMetricsTracker has no info about the _splitAndSortCpPagesIfNeeded_ stage, so in the case of a huge number of dirty pages one can observe in the log:
10:08:00 checkpoint started
10:10:00 checkpoint finished <--- ??
10:10:20 checkpoint started
if checkpointFrequency = 3, and none of the tracker durations (beforeLockDuration, lockWaitDuration) gives any clue what kind of work (20 sec) the Checkpointer thread is busy with. Additionally (hopefully not a big deal), the redundant effectivePageId computation needs to be fixed, because FullPageId already carries the effectivePageId info: {{return Long.compare(PageIdUtils.effectivePageId(o1.pageId()), PageIdUtils.effectivePageId(o2.pageId()));}} The writeCheckpointEntry() duration should also be logged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
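The comparator fix suggested above amounts to computing the effective page id once per page instead of masking on every comparison during the checkpoint sort. A self-contained sketch with a hypothetical FullPageIdSketch class and a made-up bit mask (Ignite's real FullPageId/PageIdUtils encoding differs):

```java
import java.util.*;

/**
 * Hypothetical stand-in for Ignite's FullPageId that stores its
 * effective page id once instead of recomputing it per comparison.
 */
class FullPageIdSketch {
    /** Hypothetical mask: in this sketch the "effective" id is the low 48 bits. */
    private static final long EFFECTIVE_MASK = 0x0000FFFFFFFFFFFFL;

    private final long pageId;
    private final long effectivePageId; // computed once in the constructor

    FullPageIdSketch(long pageId) {
        this.pageId = pageId;
        this.effectivePageId = pageId & EFFECTIVE_MASK;
    }

    long pageId()          { return pageId; }
    long effectivePageId() { return effectivePageId; }

    /** Comparator for checkpoint page sorting: reads a field, no per-call masking. */
    static final Comparator<FullPageIdSketch> CP_CMP =
        Comparator.comparingLong(FullPageIdSketch::effectivePageId);
}
```

Sorting a dirty-page collection with CP_CMP then touches each precomputed field instead of calling a masking utility twice per comparison.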
[jira] [Commented] (IGNITE-12189) Implement correct limit for TextQuery
[ https://issues.apache.org/jira/browse/IGNITE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967465#comment-16967465 ] Ivan Pavlukhin commented on IGNITE-12189: - [~Yuriy_Shuliha], could you please create a ticket about passing `List` instead of `Collection` to the `GridCacheQueryFutureAdapter#enqueue` method? > Implement correct limit for TextQuery > - > > Key: IGNITE-12189 > URL: https://issues.apache.org/jira/browse/IGNITE-12189 > Project: Ignite > Issue Type: Improvement > Components: general >Reporter: Yuriy Shuliha >Assignee: Yuriy Shuliha >Priority: Major > Fix For: 2.8 > > Time Spent: 10h 50m > Remaining Estimate: 0h > > PROBLEM > For now each server node returns all response records to the client node, and the response > may contain thousands or hundreds of thousands of records, > even if we need only the first 10-100. Furthermore, all the results are added to a > queue in _*GridCacheQueryFutureAdapter*_ in arbitrary order, by pages. > There are no means to deliver a deterministic result. > SOLUTION > Implement _*limit*_ as a parameter of _*TextQuery*_ and > _*GridCacheQueryRequest*_. > It should be passed as the limit parameter to Lucene's > _*IndexSearcher.search()*_ in _*GridLuceneIndex*_. > For distributed queries, _*limit*_ will also trim the response queue when merging > results. > Type: long > Special value: 0 -> No limit (Integer.MAX_VALUE); -- This message was sent by Atlassian Jira (v8.3.4#803005)
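The reducer-side trimming proposed in the SOLUTION section can be sketched independently of Ignite. The LimitedReducer class below is hypothetical (not the actual GridCacheQueryFutureAdapter code); it only illustrates trimming the merged response queue to the limit as pages arrive from server nodes, with 0 meaning "no limit":

```java
import java.util.*;

/** Hypothetical sketch of client-side trimming when merging query result pages. */
class LimitedReducer<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final int limit;

    LimitedReducer(int limit) {
        // Special value 0 means "no limit", as the ticket proposes.
        this.limit = limit == 0 ? Integer.MAX_VALUE : limit;
    }

    /** Merges one page of results from a server node, discarding overflow. */
    void onPage(List<T> page) {
        for (T row : page) {
            if (queue.size() >= limit)
                return; // servers may still send rows, but we keep only `limit`

            queue.add(row);
        }
    }

    List<T> results() { return new ArrayList<>(queue); }
}
```

Pushing the same limit down to Lucene's IndexSearcher.search(query, n) on each server node keeps the per-node responses small; this sketch only covers the final merge.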
[jira] [Commented] (IGNITE-12189) Implement correct limit for TextQuery
[ https://issues.apache.org/jira/browse/IGNITE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967463#comment-16967463 ] Ivan Pavlukhin commented on IGNITE-12189: - [~Yuriy_Shuliha], the PR looks fine to be merged. If there is no work left to be done, I will merge it; please confirm. > Implement correct limit for TextQuery > - > > Key: IGNITE-12189 > URL: https://issues.apache.org/jira/browse/IGNITE-12189 > Project: Ignite > Issue Type: Improvement > Components: general >Reporter: Yuriy Shuliha >Assignee: Yuriy Shuliha >Priority: Major > Fix For: 2.8 > > Time Spent: 10h 50m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12189) Implement correct limit for TextQuery
[ https://issues.apache.org/jira/browse/IGNITE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967460#comment-16967460 ] Ignite TC Bot commented on IGNITE-12189: {panel:title=Branch: [pull/6917/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} [TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4747552&buildTypeId=IgniteTests24Java8_RunAll] > Implement correct limit for TextQuery > - > > Key: IGNITE-12189 > URL: https://issues.apache.org/jira/browse/IGNITE-12189 > Project: Ignite > Issue Type: Improvement > Components: general >Reporter: Yuriy Shuliha >Assignee: Yuriy Shuliha >Priority: Major > Fix For: 2.8 > > Time Spent: 10h 50m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-12049) Allow custom authenticators to use SSL certificates
[ https://issues.apache.org/jira/browse/IGNITE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967400#comment-16967400 ] Alexei Scherbakov commented on IGNITE-12049: [~SomeFire] I left comments on the PR, please address them. Some general questions: 1. For "normal" cluster nodes, attributes are already available via ClusterNode.attributes, and a user can simply set any attribute and use it in a custom authenticator without any changes in the core by implementing [1]. Do I understand correctly that the fix is only relevant for thin clients, which are authenticated using [2] and have no associated local attributes? Shouldn't we instead provide the ability for thin clients to have attributes and avoid changing IgniteConfiguration? 2. Why is the new attribute not available during authentication for jdbc/odbc client types? 3. Can you create an example of using a custom authenticator with certificates? [1] org.apache.ignite.internal.processors.security.GridSecurityProcessor#authenticateNode [2] org.apache.ignite.internal.processors.security.GridSecurityProcessor#authenticate > Allow custom authenticators to use SSL certificates > --- > > Key: IGNITE-12049 > URL: https://issues.apache.org/jira/browse/IGNITE-12049 > Project: Ignite > Issue Type: Improvement >Reporter: Ryabov Dmitrii >Assignee: Ryabov Dmitrii >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Add SSL certificates to AuthenticationContext so that authenticators can make > additional checks based on SSL certificates. -- This message was sent by Atlassian Jira (v8.3.4#803005)
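Question 3 above asks for an example of a certificate-based custom authenticator. A minimal, self-contained sketch, with hypothetical class names and certificate subjects represented as plain strings (a real AuthenticationContext would expose java.security.cert.Certificate objects, from which the subject DN is read):

```java
import java.util.*;

/**
 * Hypothetical sketch of a certificate-aware authenticator check:
 * the authentication context supplies the subject DNs of the client's
 * certificate chain (leaf first), and the authenticator accepts only
 * clients whose leaf certificate subject is on an allow-list.
 */
class CertAuthenticatorSketch {
    private final Set<String> trustedSubjects;

    CertAuthenticatorSketch(Set<String> trustedSubjects) {
        this.trustedSubjects = trustedSubjects;
    }

    boolean authenticate(List<String> certChainSubjects) {
        // Reject contexts without certificates (e.g. non-SSL connections).
        if (certChainSubjects == null || certChainSubjects.isEmpty())
            return false;

        // Check the leaf certificate's subject against the allow-list.
        return trustedSubjects.contains(certChainSubjects.get(0));
    }
}
```

This only covers the additional check the ticket enables; real validation of the chain itself (signatures, expiry) stays with the SSL layer.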
[jira] [Commented] (IGNITE-12189) Implement correct limit for TextQuery
[ https://issues.apache.org/jira/browse/IGNITE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967319#comment-16967319 ] Ignite TC Bot commented on IGNITE-12189: {panel:title=Branch: [pull/6917/head] Base: [master] : Possible Blockers (1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1} {color:#d04437}Scala (Visor Console){color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4747474]] * VisorConsoleSelfTestSuite: VisorActivationCommandSpec.A top visor command for cluster activation should deactivate cluster - Test has low fail rate in base branch 0,0% and is not flaky {panel} [TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4747552&buildTypeId=IgniteTests24Java8_RunAll] > Implement correct limit for TextQuery > - > > Key: IGNITE-12189 > URL: https://issues.apache.org/jira/browse/IGNITE-12189 > Project: Ignite > Issue Type: Improvement > Components: general >Reporter: Yuriy Shuliha >Assignee: Yuriy Shuliha >Priority: Major > Fix For: 2.8 > > Time Spent: 10h 50m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)