[jira] [Commented] (IGNITE-11996) Assertion error in IgniteCacheOffheapManagerImpl#destroyCacheDataStore
[ https://issues.apache.org/jira/browse/IGNITE-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888580#comment-16888580 ] Ivan Pavlukhin commented on IGNITE-11996: - Linked PR [#6705|https://github.com/apache/ignite/pull/6705] contains code that makes the issue easy to reproduce. > Assertion error in IgniteCacheOffheapManagerImpl#destroyCacheDataStore > -- > > Key: IGNITE-11996 > URL: https://issues.apache.org/jira/browse/IGNITE-11996 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.7.5 >Reporter: Ivan Pavlukhin >Assignee: Ivan Pavlukhin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Assertion error occurs in a following code: > {code} > boolean removed = partDataStores.remove(p, store); > assert removed; > {code} > It asserts that a partition store must be removed from a map here. But in > practice a removal can occur at least in 2 places: node stop and partition > eviction. Employed synchronization is not sufficient to guarantee that a > removal happens exactly once. > The issue is reproduced in {{IgniteSqlQueryMinMaxTest}} from time to time. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (IGNITE-11996) Assertion error in IgniteCacheOffheapManagerImpl#destroyCacheDataStore
[ https://issues.apache.org/jira/browse/IGNITE-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Pavlukhin updated IGNITE-11996: Description: Assertion error occurs in a following code: {code} boolean removed = partDataStores.remove(p, store); assert removed; {code} It asserts that a partition store must be removed from a map here. But in practice a removal can occur at least in 2 places: node stop and partition eviction. Employed synchronization is not sufficient to guarantee that a removal happens exactly once. The issues is reproduced in {{IgniteSqlQueryMinMaxTest}} from time to time. was: Assertion error occurs in a following code: {code} boolean removed = partDataStores.remove(p, store); assert removed; {code} It asserts that a partition store must be removed from a map here. But in practice a removal can occur at least in 2 places: node stop and partition eviction. Employed synchronization is not sufficient to guarantee that a removal happens exactly once. > Assertion error in IgniteCacheOffheapManagerImpl#destroyCacheDataStore > -- > > Key: IGNITE-11996 > URL: https://issues.apache.org/jira/browse/IGNITE-11996 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.7.5 >Reporter: Ivan Pavlukhin >Assignee: Ivan Pavlukhin >Priority: Major > > Assertion error occurs in a following code: > {code} > boolean removed = partDataStores.remove(p, store); > assert removed; > {code} > It asserts that a partition store must be removed from a map here. But in > practice a removal can occur at least in 2 places: node stop and partition > eviction. Employed synchronization is not sufficient to guarantee that a > removal happens exactly once. > The issues is reproduced in {{IgniteSqlQueryMinMaxTest}} from time to time. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (IGNITE-11996) Assertion error in IgniteCacheOffheapManagerImpl#destroyCacheDataStore
[ https://issues.apache.org/jira/browse/IGNITE-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Pavlukhin updated IGNITE-11996: Description: Assertion error occurs in a following code: {code} boolean removed = partDataStores.remove(p, store); assert removed; {code} It asserts that a partition store must be removed from a map here. But in practice a removal can occur at least in 2 places: node stop and partition eviction. Employed synchronization is not sufficient to guarantee that a removal happens exactly once. The issue is reproduced in {{IgniteSqlQueryMinMaxTest}} from time to time. was: Assertion error occurs in a following code: {code} boolean removed = partDataStores.remove(p, store); assert removed; {code} It asserts that a partition store must be removed from a map here. But in practice a removal can occur at least in 2 places: node stop and partition eviction. Employed synchronization is not sufficient to guarantee that a removal happens exactly once. The issues is reproduced in {{IgniteSqlQueryMinMaxTest}} from time to time. > Assertion error in IgniteCacheOffheapManagerImpl#destroyCacheDataStore > -- > > Key: IGNITE-11996 > URL: https://issues.apache.org/jira/browse/IGNITE-11996 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.7.5 >Reporter: Ivan Pavlukhin >Assignee: Ivan Pavlukhin >Priority: Major > > Assertion error occurs in a following code: > {code} > boolean removed = partDataStores.remove(p, store); > assert removed; > {code} > It asserts that a partition store must be removed from a map here. But in > practice a removal can occur at least in 2 places: node stop and partition > eviction. Employed synchronization is not sufficient to guarantee that a > removal happens exactly once. > The issue is reproduced in {{IgniteSqlQueryMinMaxTest}} from time to time. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IGNITE-11996) Assertion error in IgniteCacheOffheapManagerImpl#destroyCacheDataStore
Ivan Pavlukhin created IGNITE-11996: --- Summary: Assertion error in IgniteCacheOffheapManagerImpl#destroyCacheDataStore Key: IGNITE-11996 URL: https://issues.apache.org/jira/browse/IGNITE-11996 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 2.7.5 Reporter: Ivan Pavlukhin Assignee: Ivan Pavlukhin Assertion error occurs in a following code: {code} boolean removed = partDataStores.remove(p, store); assert removed; {code} It asserts that a partition store must be removed from a map here. But in practice a removal can occur at least in 2 places: node stop and partition eviction. Employed synchronization is not sufficient to guarantee that a removal happens exactly once. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
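The race described in IGNITE-11996 can be sketched roughly as follows. This is not Ignite code: the class DoubleRemovalSketch is hypothetical, it assumes that partDataStores behaves like a ConcurrentMap, and the two threads only stand in for node stop and partition eviction. Both callers attempt the same remove(p, store); the sketch counts how many of them see it succeed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the destroy path; not Ignite code. */
final class DoubleRemovalSketch {
    /** Returns how many of the two racing callers saw remove(p, store) return true. */
    static int successfulRemovals() {
        // Assumption: the real map is concurrent; a ConcurrentHashMap is used here.
        ConcurrentMap<Integer, Object> partDataStores = new ConcurrentHashMap<>();
        Object store = new Object();
        int p = 0;
        partDataStores.put(p, store);

        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger removedCnt = new AtomicInteger();

        Runnable destroy = () -> {
            try {
                start.await();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // An "assert removed" at this point would fail in the caller that loses the race.
            if (partDataStores.remove(p, store))
                removedCnt.incrementAndGet();
        };

        Thread nodeStop = new Thread(destroy, "node-stop");
        Thread eviction = new Thread(destroy, "partition-eviction");
        nodeStop.start();
        eviction.start();
        start.countDown();

        try {
            nodeStop.join();
            eviction.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return removedCnt.get();
    }

    public static void main(String[] args) {
        System.out.println("successful removals: " + successfulRemovals());
    }
}
```

Because the conditional remove is atomic, only one caller can succeed. Any path that asserts on the result therefore needs a guarantee that it is the only remover, which is the guarantee the description says is missing.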
[jira] [Commented] (IGNITE-11995) control.sh if experimental command disabled - don't show help for experimental commands
[ https://issues.apache.org/jira/browse/IGNITE-11995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888537#comment-16888537 ] Ignite TC Bot commented on IGNITE-11995: {panel:title=Branch: [pull/6704/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} [TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4348011&buildTypeId=IgniteTests24Java8_RunAll] > control.sh if experimental command disabled - don't show help for > experimental commands > --- > > Key: IGNITE-11995 > URL: https://issues.apache.org/jira/browse/IGNITE-11995 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.8 > > Time Spent: 10m > Remaining Estimate: 0h > > If experimental commands are disabled: > * don't show WALCommand help > * if the user asks for help for a particular command - print out a warning about > experimental commands instead of ignoring the user's request -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-11994) [TC Bot] Prepare new view to select base branch and other build parameters
[ https://issues.apache.org/jira/browse/IGNITE-11994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888200#comment-16888200 ] Dmitriy Pavlov commented on IGNITE-11994: - https://github.com/apache/ignite-teamcity-bot/pull/134 > [TC Bot] Prepare new view to select base branch and other build parameters > -- > > Key: IGNITE-11994 > URL: https://issues.apache.org/jira/browse/IGNITE-11994 > Project: Ignite > Issue Type: Improvement >Reporter: Dmitriy Pavlov >Assignee: Dmitriy Pavlov >Priority: Major > > New view required for reports view and for VISAs creation for non-standard > master branches, > additional features may be contributed to this new view later -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (IGNITE-11989) Preload predicate not used in GridCachePreloader
[ https://issues.apache.org/jira/browse/IGNITE-11989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Muzafarov updated IGNITE-11989: - Ignite Flags: (was: Docs Required) > Preload predicate not used in GridCachePreloader > > > Key: IGNITE-11989 > URL: https://issues.apache.org/jira/browse/IGNITE-11989 > Project: Ignite > Issue Type: Improvement >Reporter: Maxim Muzafarov >Assignee: Maxim Muzafarov >Priority: Minor > Fix For: 2.8 > > Time Spent: 10m > Remaining Estimate: 0h > > {code:title=GridCachePreloader.java} > /** > * @return Preload predicate. If not {@code null}, will evaluate each > preloaded entry during > * send and receive, and if predicate evaluates to {@code false}, > entry will be skipped. > */ > public IgnitePredicate preloadPredicate(); > {code} > This is internal cache preload predicate, which is not used and not tested > for entry preloading. Can be removed to keep code simple. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-10761) GridCacheProcessor should add info about cache in exception message, if applicable.
[ https://issues.apache.org/jira/browse/IGNITE-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888109#comment-16888109 ] Sergey Antonov commented on IGNITE-10761: - [~ktkale...@gridgain.com] Changes look good to me. > GridCacheProcessor should add info about cache in exception message, if > applicable. > - > > Key: IGNITE-10761 > URL: https://issues.apache.org/jira/browse/IGNITE-10761 > Project: Ignite > Issue Type: Improvement >Reporter: Sergey Antonov >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.8 > > > We should add info about the problem cache to the exception message. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-11988) control.sh validate_indexes SQL Index issue add information about group and cache id
[ https://issues.apache.org/jira/browse/IGNITE-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888105#comment-16888105 ] Sergey Antonov commented on IGNITE-11988: - [~ktkale...@gridgain.com] Changes look good to me. > control.sh validate_indexes SQL Index issue add information about group and > cache id > > > Key: IGNITE-11988 > URL: https://issues.apache.org/jira/browse/IGNITE-11988 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.8 > > Time Spent: 10m > Remaining Estimate: 0h > > At the moment we have the following output in case of SQL index problems: > {noformat} > SQL Index > [cache=com.sbt.processing.replication.dpl.data.ReplicationApplyStateV1Entity_DPL_union-module, > idx=_key_PK] ValidateIndexesPartitionResult > [consistentId=10.116.241.93:47500, sqlIdxName=_key_PK] > IndexValidationIssue [key=678073218895971307, > cacheName=com.sbt.processing.replication.dpl.data.ReplicationApplyStateV1Entity_DPL_union-module, > idxName=_key_PK], class org.apache.ignite.IgniteCheckedException: Key is > present in SQL index, but can't be found in CacheDataTree. > IndexValidationIssue [key=2495557143516676100, > cacheName=com.sbt.processing.replication.dpl.data.ReplicationApplyStateV1Entity_DPL_union-module, > idxName=_key_PK], class org.apache.ignite.IgniteCheckedException: Key is > present in SQL index, but can't be found in CacheDataTree. 
> IndexValidationIssue [key=null, > cacheName=com.sbt.processing.replication.dpl.data.ReplicationApplyStateV1Entity_DPL_union-module, > idxName=_key_PK], class java.lang.AssertionError: itemId=9, directCnt=9, > indirectCnt=0, page=000133230112 [3883, 3669, 3456, 3242, 3029, 2815, > 2602, 2386, 1747][][free=2101] > IndexValidationIssue [key=2760988046554825752, > cacheName=com.sbt.processing.replication.dpl.data.ReplicationApplyStateV1Entity_DPL_union-module, > idxName=_key_PK], class org.apache.ignite.IgniteCheckedException: Key is > present in SQL index, but can't be found in CacheDataTree. > {noformat} > We print only the cache name. > Now we should add the group and cache id. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-11990) Optimize heap usage for TcpDiscoveryNodeAddedMessage stored in pending messages in ServerImpl
[ https://issues.apache.org/jira/browse/IGNITE-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888102#comment-16888102 ] Sergey Antonov commented on IGNITE-11990: - [~vmalinovskiy] Changes look good to me. > Optimize heap usage for TcpDiscoveryNodeAddedMessage stored in pending > messages in ServerImpl > - > > Key: IGNITE-11990 > URL: https://issues.apache.org/jira/browse/IGNITE-11990 > Project: Ignite > Issue Type: Improvement >Reporter: Vladimir Malinovskiy >Assignee: Vladimir Malinovskiy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We are storing pending discovery messages in deserialized form. A pending > message can be heavy, for example TcpDiscoveryNodeAddedMessage. I think we > should store only the information required for resending messages across the ring. > In case of TcpDiscoveryNodeAddedMessage we couldn't store unmarshalled data in > DiscoveryDataPacket -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-11993) Print warning if awaiting next wal segment is too long
[ https://issues.apache.org/jira/browse/IGNITE-11993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888058#comment-16888058 ] Sergey Antonov commented on IGNITE-11993: - [~ktkale...@gridgain.com] Changes look good to me. > Print warning if awaiting next wal segment is too long > -- > > Key: IGNITE-11993 > URL: https://issues.apache.org/jira/browse/IGNITE-11993 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.8 > > Time Spent: 10m > Remaining Estimate: 0h > > We must print a warning to the log if awaiting the next WAL segment takes longer than the defined > threshold. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
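The behaviour requested in IGNITE-11993 could look roughly like the sketch below. It is an illustration only: the class SegmentAwaitSketch, the latch that stands in for "next WAL segment is available", and the warning text are all hypothetical and are not taken from the Ignite code base. The waiter wakes up every thresholdMs and records one warning per elapsed threshold until the segment becomes available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a threshold-based wait; not Ignite code. */
final class SegmentAwaitSketch {
    /** Waits until the latch opens and returns the warnings produced while waiting. */
    static List<String> awaitNextSegment(CountDownLatch segmentReady, long thresholdMs) {
        List<String> warnings = new ArrayList<>();
        long startNanos = System.nanoTime();

        try {
            // await() returns false when the timeout elapses before the latch opens.
            while (!segmentReady.await(thresholdMs, TimeUnit.MILLISECONDS)) {
                long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);

                // A real implementation would call a logger here instead of collecting strings.
                warnings.add("Still waiting for the next WAL segment [waitedMs=" + waitedMs +
                    ", thresholdMs=" + thresholdMs + ']');
            }
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        return warnings;
    }

    /** Demo: the segment becomes available after 300 ms while the threshold is 50 ms. */
    static int warningsWhenSegmentIsLate() {
        CountDownLatch ready = new CountDownLatch(1);

        Thread archiver = new Thread(() -> {
            try {
                Thread.sleep(300);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ready.countDown();
        }, "archiver");

        archiver.start();

        return awaitNextSegment(ready, 50).size();
    }

    public static void main(String[] args) {
        System.out.println("warnings: " + warningsWhenSegmentIsLate());
    }
}
```

Repeating the warning on every elapsed threshold is one possible choice; printing it only once would match the description equally well.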
[jira] [Comment Edited] (IGNITE-11927) [IEP-35] Add ability to enable\disable subset of metrics
[ https://issues.apache.org/jira/browse/IGNITE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887990#comment-16887990 ] Andrey Gura edited comment on IGNITE-11927 at 7/18/19 2:07 PM: --- [~NIzhikov] > Can you, please, make a simple, pseudo-code example of your idea? Caches, for example. Not pseudo-code, but a list of action items. On cache start or when metrics are enabled for a cache: - create the metrics holder object. - create a MetricRegistry instance. - register metrics in the MetricRegistry. - add the MetricRegistry to GridMetricManager. On cache stop or when metrics are disabled for a cache: - remove the MetricRegistry from GridMetricManager. - assign null to the metrics holder object reference. > 1. If we have 5000 caches, Ignite structures are already huge. Why do you think > metrics bring a huge impact on GC? Why should we ignore this impact if we can just avoid it without much effort? > 2. All AtomicLong fields are created in previous versions of > CacheMetricsImpl. MetricRegistry is the only addition we made with the new > framework. You are right. But this addition led to some changes in design. It's a good time to improve the implementation. Also, I think it would be a better design if MetricRegistry were immutable. That would lead to a clearer code structure and behavior. > Do we have some benchmarks or other descriptions of this issue? No, we don't. But obviously all these objects in the heap are not free. was (Author: agura): [~NIzhikov] > Can you, please, make a simple, pseudo-code example of your idea? Caches, for example. Not pseudo-code, but a list of action items. On cache start or when metrics are enabled for a cache: - create the metrics holder object. - create a MetricRegistry instance. - register metrics in the MetricRegistry. - add the MetricRegistry to GridMetricManager. On cache stop or when metrics are disabled for a cache: - remove the MetricRegistry from GridMetricManager. - assign null to the metrics holder object reference. > 1. If we have 5000 caches, Ignite structures are already huge. 
Why do you think > metrics bring a huge impact on GC? Why should we ignore this impact if we can just avoid it without much effort? > 2. All AtomicLong fields are created in previous versions of > CacheMetricsImpl. MetricRegistry is the only addition we made with the new > framework. You are right. But this addition led to some changes in design. It's a good time to improve the implementation. > Do we have some benchmarks or other descriptions of this issue? No, we don't. But obviously all these objects in the heap are not free. > [IEP-35] Add ability to enable\disable subset of metrics > > > Key: IGNITE-11927 > URL: https://issues.apache.org/jira/browse/IGNITE-11927 > Project: Ignite > Issue Type: Improvement >Reporter: Nikolay Izhikov >Assignee: Nikolay Izhikov >Priority: Major > Labels: IEP-35 > Time Spent: 10m > Remaining Estimate: 0h > > Ignite should be able to: > * Enable or disable an arbitrary subset of the metrics. User should be able > to do it in runtime. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
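The action items in the comment above can be read as the following lifecycle. This is a sketch under assumptions, not the design that was eventually implemented: CacheMetricsSketch, its nested Registry and Holder classes and the plain Map used as the "manager" are hypothetical stand-ins for CacheMetricsImpl, MetricRegistry, the metrics holder object and GridMetricManager. The order of the first two steps is swapped for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of enabling and disabling cache metrics; not Ignite code. */
final class CacheMetricsSketch {
    /** Stand-in for a metric registry: a named group of counters. */
    static final class Registry {
        final String name;
        final Map<String, AtomicLong> metrics = new ConcurrentHashMap<>();

        Registry(String name) {
            this.name = name;
        }

        AtomicLong register(String metricName) {
            return metrics.computeIfAbsent(metricName, k -> new AtomicLong());
        }
    }

    /** Stand-in for the holder object with direct references to the metrics. */
    static final class Holder {
        final AtomicLong gets;

        Holder(AtomicLong gets) {
            this.gets = gets;
        }
    }

    private final String regName;
    private final Map<String, Registry> manager;

    /** Null while metrics are disabled, so nothing metric-related stays reachable. */
    private volatile Holder holder;

    CacheMetricsSketch(String cacheName, Map<String, Registry> manager) {
        this.regName = "cache." + cacheName;
        this.manager = manager;
    }

    /** Cache start or metrics enabling. */
    void enableMetrics() {
        Registry reg = new Registry(regName);        // create the registry
        holder = new Holder(reg.register("gets"));   // create the holder, register metrics
        manager.put(regName, reg);                   // add the registry to the manager
    }

    /** Cache stop or metrics disabling. */
    void disableMetrics() {
        manager.remove(regName);                     // remove the registry from the manager
        holder = null;                               // drop the holder reference
    }

    /** Update path: a single volatile read decides whether anything is recorded. */
    void onGet() {
        Holder h = holder;

        if (h != null)
            h.gets.incrementAndGet();
    }

    /** Returns the counter value, or -1 while metrics are disabled. */
    long gets() {
        Holder h = holder;

        return h == null ? -1 : h.gets.get();
    }
}
```

After disableMetrics() neither the manager nor the cache object references the registry or its counters, so they become eligible for garbage collection, which is the memory argument made in the comment.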
[jira] [Commented] (IGNITE-11927) [IEP-35] Add ability to enable\disable subset of metrics
[ https://issues.apache.org/jira/browse/IGNITE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888011#comment-16888011 ] Andrey Gura commented on IGNITE-11927: -- BTW, I found one more issue with memory consumption. When a metric is registered in a metric registry, we use the metric name without the group name as the key in the registry and the metric name concatenated with the group name as the {{AbstractMetric.name}} field value. So we consume twice as much memory just for the metric name. > [IEP-35] Add ability to enable\disable subset of metrics > > > Key: IGNITE-11927 > URL: https://issues.apache.org/jira/browse/IGNITE-11927 > Project: Ignite > Issue Type: Improvement >Reporter: Nikolay Izhikov >Assignee: Nikolay Izhikov >Priority: Major > Labels: IEP-35 > Time Spent: 10m > Remaining Estimate: 0h > > Ignite should be able to: > * Enable or disable an arbitrary subset of the metrics. User should be able > to do it in runtime. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-11927) [IEP-35] Add ability to enable\disable subset of metrics
[ https://issues.apache.org/jira/browse/IGNITE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887990#comment-16887990 ] Andrey Gura commented on IGNITE-11927: -- [~NIzhikov] > Can you, please, make a simple, pseudo-code example of your idea? Caches, for example. Not pseudo-code, but a list of action items. On cache start or when metrics are enabled for a cache: - create the metrics holder object. - create a MetricRegistry instance. - register metrics in the MetricRegistry. - add the MetricRegistry to GridMetricManager. On cache stop or when metrics are disabled for a cache: - remove the MetricRegistry from GridMetricManager. - assign null to the metrics holder object reference. > 1. If we have 5000 caches, Ignite structures are already huge. Why do you think > metrics bring a huge impact on GC? Why should we ignore this impact if we can just avoid it without much effort? > 2. All AtomicLong fields are created in previous versions of > CacheMetricsImpl. MetricRegistry is the only addition we made with the new > framework. You are right. But this addition led to some changes in design. It's a good time to improve the implementation. > Do we have some benchmarks or other descriptions of this issue? No, we don't. But obviously all these objects in the heap are not free. > [IEP-35] Add ability to enable\disable subset of metrics > > > Key: IGNITE-11927 > URL: https://issues.apache.org/jira/browse/IGNITE-11927 > Project: Ignite > Issue Type: Improvement >Reporter: Nikolay Izhikov >Assignee: Nikolay Izhikov >Priority: Major > Labels: IEP-35 > Time Spent: 10m > Remaining Estimate: 0h > > Ignite should be able to: > * Enable or disable an arbitrary subset of the metrics. User should be able > to do it in runtime. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-11990) Optimize heap usage for TcpDiscoveryNodeAddedMessage stored in pending messages in ServerImpl
[ https://issues.apache.org/jira/browse/IGNITE-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887982#comment-16887982 ] Ignite TC Bot commented on IGNITE-11990: {panel:title=--> Run :: All: Possible Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1} {color:#d04437}PDS (Indexing){color} [[tests 0 TIMEOUT , Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=4348387]] {panel} [TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4344092&buildTypeId=IgniteTests24Java8_RunAll] > Optimize heap usage for TcpDiscoveryNodeAddedMessage stored in pending > messages in ServerImpl > - > > Key: IGNITE-11990 > URL: https://issues.apache.org/jira/browse/IGNITE-11990 > Project: Ignite > Issue Type: Improvement >Reporter: Vladimir Malinovskiy >Assignee: Vladimir Malinovskiy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We are storing pending discovery messages in deserialized form. A pending > message can be heavy, for example TcpDiscoveryNodeAddedMessage. I think we > should store only the information required for resending messages across the ring. > In case of TcpDiscoveryNodeAddedMessage we couldn't store unmarshalled data in > DiscoveryDataPacket -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-11961) Provide JMX metrics for PME timings
[ https://issues.apache.org/jira/browse/IGNITE-11961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887976#comment-16887976 ] Amelchev Nikita commented on IGNITE-11961: -- I have implemented a new metric, isCurrentPmeBlocksOperations. It reports whether the current PME blocks operations. Together with the getCurrentPmeDuration metric, it will show the influence of the PME on the cluster and on user operations. > Provide JMX metrics for PME timings > --- > > Key: IGNITE-11961 > URL: https://issues.apache.org/jira/browse/IGNITE-11961 > Project: Ignite > Issue Type: Improvement >Reporter: Amelchev Nikita >Assignee: Amelchev Nikita >Priority: Major > Labels: IEP-35 > > Currently, partition map exchange timings printed to log(IGNITE-10493). It > will be useful if we allow external tools to collect and aggregate partition > map exchange metrics. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
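For readers unfamiliar with how such metrics reach external tools, the sketch below exposes the two metric names from the comment through plain JMX. Only the names getCurrentPmeDuration and isCurrentPmeBlocksOperations come from the comment. The interface PmeMXBean, the class PmeMetrics, the ObjectName and the way the values are computed are assumptions made for illustration and do not describe Ignite's actual MXBean.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical JMX sketch; not Ignite's MXBean. */
final class PmeMetricsSketch {
    /** Management interface. Only the two method names come from the comment. */
    public interface PmeMXBean {
        long getCurrentPmeDuration();

        boolean isCurrentPmeBlocksOperations();
    }

    /** Assumed implementation: remembers when the current exchange started. */
    static final class PmeMetrics implements PmeMXBean {
        private volatile boolean running;
        private volatile boolean blocking;
        private volatile long startNanos;

        void onPmeStart(boolean blocksOperations) {
            startNanos = System.nanoTime();
            blocking = blocksOperations;
            running = true;
        }

        void onPmeEnd() {
            running = false;
            blocking = false;
        }

        /** Duration of the current exchange in milliseconds, or 0 if none is running. */
        @Override public long getCurrentPmeDuration() {
            return running ? (System.nanoTime() - startNanos) / 1_000_000 : 0;
        }

        @Override public boolean isCurrentPmeBlocksOperations() {
            return running && blocking;
        }
    }

    /** Registers the bean, reads both attributes the way an external tool would, unregisters. */
    static Object[] registerAndRead() {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();

        try {
            // The ObjectName is made up for this sketch.
            ObjectName name = new ObjectName("org.example.sketch:type=PmeMetrics");

            PmeMetrics metrics = new PmeMetrics();
            metrics.onPmeStart(true);

            srv.registerMBean(metrics, name);

            try {
                return new Object[] {
                    srv.getAttribute(name, "CurrentPmeDuration"),
                    srv.getAttribute(name, "CurrentPmeBlocksOperations")
                };
            }
            finally {
                srv.unregisterMBean(name);
            }
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object[] attrs = registerAndRead();

        System.out.println("duration=" + attrs[0] + ", blocksOperations=" + attrs[1]);
    }
}
```

JMX derives the attribute names from the getter names, so a monitoring tool would see CurrentPmeDuration and CurrentPmeBlocksOperations.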
[jira] [Commented] (IGNITE-11961) Provide JMX metrics for PME timings
[ https://issues.apache.org/jira/browse/IGNITE-11961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887974#comment-16887974 ] Ignite TC Bot commented on IGNITE-11961: {panel:title=--> Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} [TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4348868&buildTypeId=IgniteTests24Java8_RunAll] > Provide JMX metrics for PME timings > --- > > Key: IGNITE-11961 > URL: https://issues.apache.org/jira/browse/IGNITE-11961 > Project: Ignite > Issue Type: Improvement >Reporter: Amelchev Nikita >Assignee: Amelchev Nikita >Priority: Major > Labels: IEP-35 > > Currently, partition map exchange timings printed to log(IGNITE-10493). It > will be useful if we allow external tools to collect and aggregate partition > map exchange metrics. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-11990) Optimize heap usage for TcpDiscoveryNodeAddedMessage stored in pending messages in ServerImpl
[ https://issues.apache.org/jira/browse/IGNITE-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887964#comment-16887964 ] Ignite TC Bot commented on IGNITE-11990: {panel:title=--> Run :: All: Possible Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1} {color:#d04437}PDS (Indexing){color} [[tests 0 TIMEOUT , Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=4348387]] {panel} [TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4344092&buildTypeId=IgniteTests24Java8_RunAll] > Optimize heap usage for TcpDiscoveryNodeAddedMessage stored in pending > messages in ServerImpl > - > > Key: IGNITE-11990 > URL: https://issues.apache.org/jira/browse/IGNITE-11990 > Project: Ignite > Issue Type: Improvement >Reporter: Vladimir Malinovskiy >Assignee: Vladimir Malinovskiy >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We are storing pending discovery messages in deserialized form. A pending > message can be heavy, for example TcpDiscoveryNodeAddedMessage. I think we > should store only the information required for resending messages across the ring. > In case of TcpDiscoveryNodeAddedMessage we couldn't store unmarshalled data in > DiscoveryDataPacket -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (IGNITE-11927) [IEP-35] Add ability to enable\disable subset of metrics
[ https://issues.apache.org/jira/browse/IGNITE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887958#comment-16887958 ] Nikolay Izhikov edited comment on IGNITE-11927 at 7/18/19 1:01 PM: --- [~agura] Sorry, I still don't understand you. Can you, please, make a simple, pseudo-code example of your idea? > Motivation: reducing memory consumption and GC pressure. There are users with > a large number of caches (I saw cases with 5000 caches in 200 cache groups). We can have a NoOp implementation for MetricRegistries that are disabled in the config file. And these registries can never be enabled. But this motivation looks very odd to me: 1. If we have 5000 caches, Ignite structures are already huge. Why do you think metrics bring a huge impact on GC? 2. All {{AtomicLong}} fields are created in previous versions of CacheMetricsImpl. MetricRegistry is the only addition we made with the new framework. Do we have some benchmarks or other descriptions of this issue? was (Author: nizhikov): [~agura] Sorry, I still don't understand you. Can you, please, make a simple, pseudo-code example of your idea? > Motivation: reducing memory consumption and GC pressure. There are users with > a large number of caches (I saw cases with 5000 caches in 200 cache groups). We can have a NoOp implementation for MetricRegistries that are disabled in the config file. And these registries can never be enabled. But this motivation looks very odd to me: 1. If we have 5000 caches, Ignite structures are already huge. Why do you think metrics bring a huge impact on GC? 2. All {AtomicLong} fields are created in previous versions of CacheMetricsImpl. MetricRegistry is the only addition we made with the new framework. Do we have some benchmarks or other descriptions of this issue? 
> [IEP-35] Add ability to enable\disable subset of metrics > > > Key: IGNITE-11927 > URL: https://issues.apache.org/jira/browse/IGNITE-11927 > Project: Ignite > Issue Type: Improvement >Reporter: Nikolay Izhikov >Assignee: Nikolay Izhikov >Priority: Major > Labels: IEP-35 > Time Spent: 10m > Remaining Estimate: 0h > > Ignite should be able to: > * Enable or disable an arbitrary subset of the metrics. User should be able > to do it in runtime. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-11927) [IEP-35] Add ability to enable\disable subset of metrics
[ https://issues.apache.org/jira/browse/IGNITE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887958#comment-16887958 ] Nikolay Izhikov commented on IGNITE-11927: -- [~agura] Sorry, I still don't understand you. Can you, please, make a simple, pseudo-code example of your idea? > Motivation: reducing memory consumption and GC pressure. There are users with > a large number of caches (I saw cases with 5000 caches in 200 cache groups). We can have a NoOp implementation for MetricRegistries that are disabled in the config file. And these registries can never be enabled. But this motivation looks very odd to me: 1. If we have 5000 caches, Ignite structures are already huge. Why do you think metrics bring a huge impact on GC? 2. All {AtomicLong} fields are created in previous versions of CacheMetricsImpl. MetricRegistry is the only addition we made with the new framework. Do we have some benchmarks or other descriptions of this issue? > [IEP-35] Add ability to enable\disable subset of metrics > > > Key: IGNITE-11927 > URL: https://issues.apache.org/jira/browse/IGNITE-11927 > Project: Ignite > Issue Type: Improvement >Reporter: Nikolay Izhikov >Assignee: Nikolay Izhikov >Priority: Major > Labels: IEP-35 > Time Spent: 10m > Remaining Estimate: 0h > > Ignite should be able to: > * Enable or disable an arbitrary subset of the metrics. User should be able > to do it in runtime. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
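The NoOp alternative mentioned in the comment above might look like the sketch below. Everything in it is hypothetical: the Counter interface, the shared NO_OP instance and the set of disabled registry names stand in for whatever the real metric classes and configuration would be. The point is only that a registry disabled in the configuration hands out one shared do-nothing counter instead of allocating an AtomicLong per metric.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of no-op metrics for registries disabled in configuration. */
final class NoOpMetricSketch {
    /** Minimal counter contract assumed for this sketch. */
    interface Counter {
        void increment();

        long value();
    }

    /** Shared instance: disabled metrics allocate nothing per metric. */
    static final Counter NO_OP = new Counter() {
        @Override public void increment() {
            // Intentionally does nothing.
        }

        @Override public long value() {
            return 0;
        }
    };

    /** Regular counter backed by an AtomicLong. */
    static final class AtomicCounter implements Counter {
        private final AtomicLong val = new AtomicLong();

        @Override public void increment() {
            val.incrementAndGet();
        }

        @Override public long value() {
            return val.get();
        }
    }

    /** Picks the implementation once, at creation time, from the configured set. */
    static Counter counter(String registryName, Set<String> disabledRegistries) {
        return disabledRegistries.contains(registryName) ? NO_OP : new AtomicCounter();
    }

    public static void main(String[] args) {
        Set<String> disabled = Collections.singleton("cache.a");

        Counter a = counter("cache.a", disabled);
        Counter b = counter("cache.b", disabled);

        a.increment();
        b.increment();

        System.out.println("cache.a=" + a.value() + ", cache.b=" + b.value());
    }
}
```

As the comment notes, the trade-off is that a registry created this way cannot be enabled later without replacing the counters held by the code that updates them.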
[jira] [Commented] (IGNITE-11799) Do not always clear partition in MOVING state before exchange
[ https://issues.apache.org/jira/browse/IGNITE-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887952#comment-16887952 ] Alexei Scherbakov commented on IGNITE-11799: [~Mmuzaf] This is still relevant. I still haven't donated several follow-up fixes from GridGain CE, where the comment is removed. I'm currently on vacation and expect to donate them at the start of August. > Do not always clear partition in MOVING state before exchange > - > > Key: IGNITE-11799 > URL: https://issues.apache.org/jira/browse/IGNITE-11799 > Project: Ignite > Issue Type: Improvement >Reporter: Alexei Scherbakov >Assignee: Alexei Scherbakov >Priority: Major > > After IGNITE-10078 if partition was in moving state before exchange and > chosen for full rebalance (for example, this will happen if any minor PME > cancels previous rebalance) we will always clear it to avoid desync issues if > some removals were not delivered to demander. > This is not necessary to do if previous rebalance was full. > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor
[ https://issues.apache.org/jira/browse/IGNITE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887951#comment-16887951 ] Aleksey Zinoviev commented on IGNITE-9283: -- [~ilantukh] Happy to hear that. Mention me for review when the PR is ready. > [ML] Add Discrete Cosine preprocessor > - > > Key: IGNITE-9283 > URL: https://issues.apache.org/jira/browse/IGNITE-9283 > Project: Ignite > Issue Type: Sub-task > Components: ml >Reporter: Aleksey Zinoviev >Assignee: Ilya Lantukh >Priority: Major > > Add [https://en.wikipedia.org/wiki/Discrete_cosine_transform] > Please look at the MinMaxScaler or Normalization packages in preprocessing > package. > Add classes if required > 1) Preprocessor > 2) Trainer > 3) custom PartitionData if shuffling is a step of algorithm > > Requirements for successful PR: > # PartitionedDataset usage > # Trainer-Model paradigm support > # Tests for Model and for Trainer (and other stuff) > # Example of usage with small, but famous dataset like IRIS, Titanic or > House Prices > # Javadocs/codestyle according guidelines > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor
[ https://issues.apache.org/jira/browse/IGNITE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887941#comment-16887941 ] Ilya Lantukh commented on IGNITE-9283: -- [~zaleslaw] , I will prepare a PR in a few days. > [ML] Add Discrete Cosine preprocessor > - > > Key: IGNITE-9283 > URL: https://issues.apache.org/jira/browse/IGNITE-9283 > Project: Ignite > Issue Type: Sub-task > Components: ml >Reporter: Aleksey Zinoviev >Assignee: Ilya Lantukh >Priority: Major > > Add [https://en.wikipedia.org/wiki/Discrete_cosine_transform] > Please look at the MinMaxScaler or Normalization packages in preprocessing > package. > Add classes if required > 1) Preprocessor > 2) Trainer > 3) custom PartitionData if shuffling is a step of algorithm > > Requirements for successful PR: > # PartitionedDataset usage > # Trainer-Model paradigm support > # Tests for Model and for Trainer (and other stuff) > # Example of usage with small, but famous dataset like IRIS, Titanic or > House Prices > # Javadocs/codestyle according guidelines > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (IGNITE-11927) [IEP-35] Add ability to enable\disable subset of metrics
[ https://issues.apache.org/jira/browse/IGNITE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887906#comment-16887906 ]

Andrey Gura commented on IGNITE-11927:
--------------------------------------

[~NIzhikov]

>> 3. There is no need for a disabled flag in MetricRegistry. As we discussed
>> earlier, when metrics are disabled they don't consume any memory. MetricsRegistry
>> should be collected by GC. After enabling, it can be created and registered
>> into GridMetricManager again.

> I don't understand your proposal.
> Metric instances are stored as class fields in the places where they are updated.
> You can take GridJobProcessor or CacheMetricImpl as examples.
> How and why should we clear these variables on disabling?
> Can you provide simple pseudocode for disable\enable processing?

At the very least, the metric registry can simply be removed from the registries. On enabling, a new instance can be created. Ideally, metrics can be moved to a special holder class, and the reference to it can be set to null after disabling.

Motivation: reducing memory consumption and GC pressure. There are users with a large number of caches (I have seen cases with 5000 caches in 200 cache groups).

> [IEP-35] Add ability to enable\disable subset of metrics
> --------------------------------------------------------
>
> Key: IGNITE-11927
> URL: https://issues.apache.org/jira/browse/IGNITE-11927
> Project: Ignite
> Issue Type: Improvement
> Reporter: Nikolay Izhikov
> Assignee: Nikolay Izhikov
> Priority: Major
> Labels: IEP-35
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Ignite should be able to:
> * Enable or disable an arbitrary subset of the metrics. The user should be able
> to do it at runtime.
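The holder-class idea from the IGNITE-11927 comment above could look roughly like the sketch below. This is only an illustration under assumptions: CacheMetricsSketch, Holder, enable, disable and onPut are hypothetical names, not Ignite classes or methods, and the real update sites (GridJobProcessor, CacheMetricImpl) may need a different shape.

```java
import java.util.concurrent.atomic.LongAdder;

/** Sketch only: every name here is hypothetical and none of it is Ignite API. */
final class CacheMetricsSketch {
    /** Groups the metric objects so that they can be dropped together. */
    static final class Holder {
        final LongAdder puts = new LongAdder();
    }

    /** Null while metrics are disabled, which leaves the old holder eligible for GC. */
    private volatile Holder holder;

    /** Creates a fresh holder, so counters start from zero after re-enabling. */
    synchronized void enable() {
        if (holder == null)
            holder = new Holder();
    }

    /** Drops the only reference to the holder. */
    synchronized void disable() {
        holder = null;
    }

    /** Update site: read the reference once and skip the update when disabled. */
    void onPut() {
        Holder h = holder;

        if (h != null)
            h.puts.increment();
    }

    /** Returns -1 when metrics are disabled. */
    long puts() {
        Holder h = holder;

        return h == null ? -1 : h.puts.sum();
    }
}
```

The cost of this shape is one volatile read and a null check at each update site. In exchange, a disabled cache keeps no metric objects alive.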
[jira] [Commented] (IGNITE-11979) Add ability to set default parallelizm of rebuild indexes in configuration
[ https://issues.apache.org/jira/browse/IGNITE-11979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887899#comment-16887899 ]

Andrew Mashenkov commented on IGNITE-11979:
-------------------------------------------

[~Denis Chudov], I've left a few comments on the PR. Please take a look.

> Add ability to set default parallelizm of rebuild indexes in configuration
> --------------------------------------------------------------------------
>
> Key: IGNITE-11979
> URL: https://issues.apache.org/jira/browse/IGNITE-11979
> Project: Ignite
> Issue Type: Improvement
> Reporter: Denis Chudov
> Assignee: Denis Chudov
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> We can't change SchemaIndexCacheVisitorImpl#DFLT_PARALLELISM at the moment:
> {code:java}
> /** Default degree of parallelism. */
> private static final int DFLT_PARALLELISM = Math.min(4, Math.max(1, Runtime.getRuntime().availableProcessors() / 4));
> {code}
> On huge servers with a lot of cores (such as 56) we will still rebuild indexes in only 4
> threads. I think we should have the ability to set DFLT_PARALLELISM in the Ignite
> configuration.
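To make the arithmetic in the IGNITE-11979 description concrete, the sketch below evaluates the quoted formula for a given core count and adds one possible override rule. RebuildParallelism and resolve are hypothetical names used only for illustration. The rule "a positive configured value wins" is an assumption, not the behaviour of the linked PR.

```java
/** Sketch only: RebuildParallelism and resolve(...) are hypothetical, not Ignite API. */
final class RebuildParallelism {
    /** Mirrors the formula quoted in the issue: min(4, max(1, cores / 4)). */
    static int defaultParallelism(int cores) {
        return Math.min(4, Math.max(1, cores / 4));
    }

    /** Assumed override rule: a positive configured value wins, otherwise use the default. */
    static int resolve(int configured, int cores) {
        return configured > 0 ? configured : defaultParallelism(cores);
    }

    public static void main(String[] args) {
        System.out.println(defaultParallelism(56)); // 56 / 4 = 14, capped at 4
        System.out.println(defaultParallelism(2));  // 2 / 4 = 0, raised to 1
        System.out.println(resolve(14, 56));        // explicit setting is used as-is
    }
}
```

With 56 cores the current default stays at 4, which is the limitation the issue describes.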
[jira] [Commented] (IGNITE-11992) Improvements for new security approach
[ https://issues.apache.org/jira/browse/IGNITE-11992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887890#comment-16887890 ]

Ignite TC Bot commented on IGNITE-11992:
----------------------------------------

{panel:title=--> Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4345271&buildTypeId=IgniteTests24Java8_RunAll]

> Improvements for new security approach
> --------------------------------------
>
> Key: IGNITE-11992
> URL: https://issues.apache.org/jira/browse/IGNITE-11992
> Project: Ignite
> Issue Type: Improvement
> Components: security
> Affects Versions: 2.8
> Reporter: Stepachev Maksim
> Assignee: Stepachev Maksim
> Priority: Major
> Fix For: 2.8
> Time Spent: 10m
> Remaining Estimate: 0h
>
> 1. ZookeeperDiscoveryImpl doesn't integrate security.
> As a result: Caused by: class org.apache.ignite.spi.IgniteSpiException:
> Security context isn't certain.
> 2. The visor tasks lose their permissions.
> The method VisorQueryUtils#scheduleQueryStart creates a new thread and loses
> the context.
> 3. The GridRestProcessor runs tasks outside the "withContext" section. As a result,
> the context is lost.
> 4. The GridRestProcessor isn't a client, so we can't read the security subject from
> a node attribute.
> We should transmit secCtx for fake nodes and secSubjId for real ones.
> 5. NoOpIgniteSecurityProcessor should include a disabled processor and
> validate it too if it is not null. This is important for a client node.
> For example:
> In the IgniteKernal#securityProcessor method, createComponent returns a
> GridSecurityProcessor. It is enabled for server nodes, but not for clients.
> The clients aren't able to pass validation for this reason.
> 6. ATTR_SECURITY_SUBJECT was removed. This broke compatibility.
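Items 2 and 3 of the IGNITE-11992 description both describe a context that is lost when work moves to another thread or runs outside a "withContext" section. The sketch below shows the general mechanism with a plain ThreadLocal. It assumes, without confirmation from the issue, that the security context is thread-bound. ContextPropagationSketch, CTX and this withContext helper are hypothetical and are not the Ignite implementation.

```java
/** Sketch only: assumes a thread-bound context; all names are hypothetical. */
final class ContextPropagationSketch {
    /** Stand-in for a thread-bound security context. */
    static final ThreadLocal<String> CTX = new ThreadLocal<>();

    /** Captures the caller's context now and restores it around the task on the executing thread. */
    static Runnable withContext(Runnable task) {
        String captured = CTX.get();

        return () -> {
            String prev = CTX.get();

            CTX.set(captured);

            try {
                task.run();
            }
            finally {
                CTX.set(prev);
            }
        };
    }

    /** Runs a task on a new thread and returns the context value that thread observed. */
    static String observedOnNewThread(boolean propagate) {
        String[] seen = new String[1];
        Runnable task = () -> seen[0] = CTX.get();
        Thread t = new Thread(propagate ? withContext(task) : task);

        t.start();

        try {
            t.join(); // join() makes the write to seen[0] visible to the caller
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }

        return seen[0];
    }
}
```

Without the wrapper, the new thread sees no context at all, which matches the symptom described for VisorQueryUtils#scheduleQueryStart. Wrapping the task at the point where it is handed to the new thread carries the caller's context across.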
[jira] [Commented] (IGNITE-11799) Do not always clear partition in MOVING state before exchange
[ https://issues.apache.org/jira/browse/IGNITE-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887827#comment-16887827 ]

Maxim Muzafarov commented on IGNITE-11799:
------------------------------------------

[~ascherbakov] Hello, can you please clarify? The issue is closed as `won't fix`, but the comment below still persists in the code. Should we remove it?

org/apache/ignite/internal/processors/cache/distributed/dht/topology/GridDhtPartitionTopologyImpl.java:799
{code}
// It's important to clear non empty moving partitions before full rebalancing.
// Consider the scenario:
// Node1 has keys k1 and k2 in the same partition.
// Node2 started rebalancing from Node1.
// Node2 received k1, k2 and failed before moving partition to OWNING state.
// Node1 removes k2 but update has not been delivered to Node1 because of failure.
// After new full rebalance Node1 will only send k1 to Node2 causing lost removal.
// NOTE: avoid calling clearAsync for partition twice per topology version.
// TODO FIXME clearing is not always needed see IGNITE-11799
{code}

> Do not always clear partition in MOVING state before exchange
> -------------------------------------------------------------
>
> Key: IGNITE-11799
> URL: https://issues.apache.org/jira/browse/IGNITE-11799
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexei Scherbakov
> Assignee: Alexei Scherbakov
> Priority: Major
>
> After IGNITE-10078, if a partition was in the MOVING state before exchange and was
> chosen for full rebalance (for example, this happens if any minor PME
> cancels the previous rebalance), we always clear it to avoid desync issues in case
> some removals were not delivered to the demander.
> This is not necessary if the previous rebalance was full.
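The lost-removal scenario in the code comment quoted above can be replayed with two in-memory key sets. This is a toy model, not Ignite code. LostRemovalSketch and replay are hypothetical names. It reads the quoted line "has not been delivered to Node1" as meaning that the removal never reached Node2, which is an interpretation rather than something the source states.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of the scenario; it does not use any Ignite classes. */
final class LostRemovalSketch {
    /** Replays the scenario and returns Node2's keys after the second full rebalance. */
    static Set<String> replay(boolean clearBeforeRebalance) {
        Set<String> node1 = new HashSet<>(Arrays.asList("k1", "k2")); // supplier
        Set<String> node2 = new HashSet<>(node1); // demander received k1, k2 while MOVING, then failed

        node1.remove("k2"); // the removal is assumed never to reach the failed Node2

        if (clearBeforeRebalance)
            node2.clear(); // the step that the quoted comment calls important

        node2.addAll(node1); // a full rebalance only transfers entries that still exist (k1)

        return node2;
    }

    public static void main(String[] args) {
        System.out.println(replay(false).contains("k2")); // stale k2 survives without clearing
        System.out.println(replay(true).contains("k2"));  // clearing removes the stale key
    }
}
```

A full rebalance carries no tombstones in this model, so the stale key can only disappear if the demander's partition is cleared first.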
[jira] [Commented] (IGNITE-3195) Rebalancing: IgniteConfiguration.rebalanceThreadPoolSize is wrongly treated
[ https://issues.apache.org/jira/browse/IGNITE-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887745#comment-16887745 ]

Anton Vinogradov commented on IGNITE-3195:
------------------------------------------

[~Mmuzaf], [~xtern]
I've prepared the PoC. Please pre-review the code.
I'm going to run benchmarks on a real environment after your check.

> Rebalancing: IgniteConfiguration.rebalanceThreadPoolSize is wrongly treated
> ---------------------------------------------------------------------------
>
> Key: IGNITE-3195
> URL: https://issues.apache.org/jira/browse/IGNITE-3195
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Reporter: Denis Magda
> Assignee: Anton Vinogradov
> Priority: Major
> Labels: iep-16
> Fix For: 2.8
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Presently it's considered that the maximum number of threads that have to
> process all demand and supply messages coming from all the nodes must not be
> bigger than {{IgniteConfiguration.rebalanceThreadPoolSize}}.
> The current implementation relies on the ordered messages functionality, creating a
> number of topics equal to {{IgniteConfiguration.rebalanceThreadPoolSize}}.
> However, the implementation doesn't take into account that ordered messages
> that correspond to a particular topic are processed in parallel for
> different nodes. Refer to the implementation of
> {{GridIoManager.processOrderedMessage}} to see that for every topic there
> will be a unique {{GridCommunicationMessageSet}} for every node.
> Also, to prove that this is true, you can refer to this execution stack:
> {noformat}
> java.lang.RuntimeException: HAPPENED DEMAND
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:378)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:364)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:622)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:320)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$300(GridCacheIoManager.java:81)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1125)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1219)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$1600(GridIoManager.java:105)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2456)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1179)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$1900(GridIoManager.java:105)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$6.run(GridIoManager.java:1148)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> All this means that in fact the number of threads that will be busy with
> replication activity will be equal to
> {{IgniteConfiguration.rebalanceThreadPoolSize}} x number_of_nodes_participated_in_rebalancing
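The thread-count claim in the IGNITE-3195 description above follows from counting one message set per (topic, node) pair. The sketch below does that counting. RebalanceThreadEstimate and messageSets are hypothetical names, and the pool size of 4 and the 10 nodes are illustrative values, not figures from the issue.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch only: counts (topic, node) pairs; it does not use any Ignite classes. */
final class RebalanceThreadEstimate {
    /**
     * The issue states that every topic has a unique message set per node and that
     * these sets are processed in parallel, so the number of distinct pairs bounds
     * the number of busy threads.
     */
    static int messageSets(int rebalanceThreadPoolSize, int nodes) {
        Set<String> sets = new HashSet<>();

        for (int topic = 0; topic < rebalanceThreadPoolSize; topic++)
            for (int node = 0; node < nodes; node++)
                sets.add(topic + "/" + node);

        return sets.size();
    }

    public static void main(String[] args) {
        // Illustrative values: pool size 4, 10 nodes taking part in rebalancing.
        System.out.println(messageSets(4, 10)); // 4 x 10 = 40 message sets
    }
}
```

Under these assumed values the configured limit of 4 would turn into as many as 40 concurrently processed message sets.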
[jira] [Commented] (IGNITE-3195) Rebalancing: IgniteConfiguration.rebalanceThreadPoolSize is wrongly treated
[ https://issues.apache.org/jira/browse/IGNITE-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887742#comment-16887742 ]

Ignite TC Bot commented on IGNITE-3195:
---------------------------------------

{panel:title=--> Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4347484&buildTypeId=IgniteTests24Java8_RunAll]

> Rebalancing: IgniteConfiguration.rebalanceThreadPoolSize is wrongly treated
> ---------------------------------------------------------------------------
>
> Key: IGNITE-3195
> URL: https://issues.apache.org/jira/browse/IGNITE-3195
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Reporter: Denis Magda
> Assignee: Anton Vinogradov
> Priority: Major
> Labels: iep-16
> Fix For: 2.8
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Presently it's considered that the maximum number of threads that have to
> process all demand and supply messages coming from all the nodes must not be
> bigger than {{IgniteConfiguration.rebalanceThreadPoolSize}}.
> The current implementation relies on the ordered messages functionality, creating a
> number of topics equal to {{IgniteConfiguration.rebalanceThreadPoolSize}}.
> However, the implementation doesn't take into account that ordered messages
> that correspond to a particular topic are processed in parallel for
> different nodes. Refer to the implementation of
> {{GridIoManager.processOrderedMessage}} to see that for every topic there
> will be a unique {{GridCommunicationMessageSet}} for every node.
> Also, to prove that this is true, you can refer to this execution stack:
> {noformat}
> java.lang.RuntimeException: HAPPENED DEMAND
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:378)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:364)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:622)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:320)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$300(GridCacheIoManager.java:81)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1125)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1219)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$1600(GridIoManager.java:105)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2456)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1179)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$1900(GridIoManager.java:105)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$6.run(GridIoManager.java:1148)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> All this means that in fact the number of threads that will be busy with
> replication activity will be equal to
> {{IgniteConfiguration.rebalanceThreadPoolSize}} x number_of_nodes_participated_in_rebalancing