[jira] [Commented] (IGNITE-10959) Memory leaks in continuous query handlers

2019-11-05 Thread Zane Hu (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967866#comment-16967866
 ] 

Zane Hu commented on IGNITE-10959:
--

We have observed two cases of huge memory usage in Ignite Continuous Query, 
both caused by too many pending cache-update events accumulating because an 
event earlier than the pending ones has not arrived yet. BTW, we use Ignite 
2.7.0.
 * One is CacheContinuousQueryHandler.rcvs growing to 7.7 GB retained heap, as 
seen in jmap/Memory Analyzer. We also saw "Pending events reached max of buffer 
size" in the Ignite log file. According to 
[https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryPartitionRecovery.java#L196],
 this happens when the size of CacheContinuousQueryPartitionRecovery.pendingEvts 
reaches MAX_BUFF_SIZE (default 10,000). Ignite then flushes and removes 10% of 
the entries in pendingEvts, even though some not-yet-arrived earlier events are 
dropped without notifying the listener. This upper-bound limit of MAX_BUFF_SIZE 
prevents the memory from growing further to OOM.
 * The other is CacheContinuousQueryEventBuffer.pending growing to 22 GB 
retained heap, as seen in jmap/Memory Analyzer. According to 
[https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryEventBuffer.java#L168],
 cache-update events are processed in batches of 
CacheContinuousQueryEventBuffer.Batch.entries[BUF_SIZE] (default BUF_SIZE is 
1,000). If an event entry falls within the current batch (e.updateCounter() <= 
batch.endCntr), it is processed by batch.processEntry0(); otherwise it is put 
into CacheContinuousQueryEventBuffer.pending. However, according to 
[https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryEventBuffer.java#L425],
 if any event within the current batch has not arrived, Ignite will not move on 
to the next batch and process the entries in 
CacheContinuousQueryEventBuffer.pending. Unlike 
CacheContinuousQueryPartitionRecovery.pendingEvts with its MAX_BUFF_SIZE, there 
is NO upper-bound limit on CacheContinuousQueryEventBuffer.pending. This means 
that if an event earlier than those in CacheContinuousQueryEventBuffer.pending 
never arrives for some reason (a high rate of events, high concurrency, a 
timeout, ...), CacheContinuousQueryEventBuffer.pending will grow until OOM. To 
prevent this, I think Ignite at least needs to add an upper-bound limit and, 
when it is reached, flush and remove 10% of the events from 
CacheContinuousQueryEventBuffer.pending, similarly to 
CacheContinuousQueryPartitionRecovery.pendingEvts (see the sketch below). In 
terms of exception handling, I think dropping some events is better than OOM.
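
A rough sketch of the kind of cap-and-flush I mean (all names below are 
illustrative, not the actual CacheContinuousQueryEventBuffer internals; only 
the 10% policy is borrowed from CacheContinuousQueryPartitionRecovery):

{code:java}
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative bounded buffer; names and the limit value are hypothetical. */
class BoundedPendingBuffer<V> {
    /** Assumed cap, mirroring CacheContinuousQueryPartitionRecovery.MAX_BUFF_SIZE. */
    private static final int MAX_PENDING_BUFF_SIZE = 10_000;

    /** Pending events ordered by update counter. */
    private final TreeMap<Long, V> pending = new TreeMap<>();

    void put(long updateCntr, V evt) {
        pending.put(updateCntr, evt);

        if (pending.size() >= MAX_PENDING_BUFF_SIZE) {
            // Drop the oldest 10% of pending events, the same policy
            // CacheContinuousQueryPartitionRecovery applies to pendingEvts.
            int toDrop = MAX_PENDING_BUFF_SIZE / 10;

            Iterator<Map.Entry<Long, V>> it = pending.entrySet().iterator();

            for (int i = 0; i < toDrop && it.hasNext(); i++) {
                it.next();
                // Ideally flush these to the listener as filtered events
                // instead of dropping them silently, but either is better
                // than OOM.
                it.remove();
            }
        }
    }
}
{code}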

 

> Memory leaks in continuous query handlers
> -
>
> Key: IGNITE-10959
> URL: https://issues.apache.org/jira/browse/IGNITE-10959
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Denis Mekhanikov
>Priority: Major
> Fix For: 2.9
>
> Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, 
> CacheContinuousQueryMemoryUsageTest2.java, continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test that reproduces the problem is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-11765) Vulnerable library H2 Database Engine 1.4.197 used

2019-11-05 Thread Erikson Murrugarra (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967823#comment-16967823
 ] 

Erikson Murrugarra commented on IGNITE-11765:
-

Hi. Do you have any updates for this issue?

Thank you.

> Vulnerable library H2 Database Engine 1.4.197 used
> -
>
> Key: IGNITE-11765
> URL: https://issues.apache.org/jira/browse/IGNITE-11765
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: VIJAY BHATT
>Priority: Major
>
> We use Black Duck for scanning our project. It has identified Ignite 2.7.0 
> as using H2 Database Engine version 1.4.197, a vulnerable library with the 
> following 2 vulnerabilities:
> BDSA-2018-1048 (CVE-2018-10054)
> BDSA-2018-2507 (CVE-2018-14335)
> The fix suggested by Black Duck is to use version 1.4.198.
> We tried using 1.4.198 via a jar override, but it has some breaking changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12300) ComputeJob#cancel executes with wrong SecurityContext

2019-11-05 Thread Andrey Kuznetsov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967635#comment-16967635
 ] 

Andrey Kuznetsov commented on IGNITE-12300:
---

The fix and tests look OK to me.

> ComputeJob#cancel executes with wrong SecurityContext
> -
>
> Key: IGNITE-12300
> URL: https://issues.apache.org/jira/browse/IGNITE-12300
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Garus
>Assignee: Denis Garus
>Priority: Major
>  Labels: iep-38
> Attachments: ComputeJobCancelReproducerTest.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ComputeJob#cancel executes with the security context of the current node 
> rather than the security context of the node that initiated the ComputeJob.
>  
> Reproducer:
> [https://github.com/apache/ignite/pull/6984/files]
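
A minimal sketch of where the wrong context matters, assuming a security plugin 
that authorizes cache operations is installed (the job and the "jobState" cache 
below are illustrative, not the attached reproducer):

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.compute.ComputeJobAdapter;
import org.apache.ignite.resources.IgniteInstanceResource;

/** Illustrative job: any secured operation inside cancel() hits the problem. */
public class CancellableJob extends ComputeJobAdapter {
    @IgniteInstanceResource
    private Ignite ignite;

    @Override public Object execute() {
        while (!isCancelled()) {
            // Long-running work performed on behalf of the remote initiator.
        }

        return null;
    }

    @Override public void cancel() {
        super.cancel();

        // Per this issue, the cleanup below is authorized under the local
        // node's security context instead of the context of the node that
        // submitted the job.
        ignite.cache("jobState").remove("progress");
    }
}
{code}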



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-12303) Change comment for an enumeration item CACHE_DESTROY

2019-11-05 Thread Albert Iskhakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Albert Iskhakov reassigned IGNITE-12303:


Assignee: Albert Iskhakov

> Change comment for an enumeration item CACHE_DESTROY
> 
>
> Key: IGNITE-12303
> URL: https://issues.apache.org/jira/browse/IGNITE-12303
> Project: Ignite
>  Issue Type: Wish
>  Components: documentation, security
>Reporter: Surkov Aleksandr
>Assignee: Albert Iskhakov
>Priority: Minor
>  Labels: newbie
>
> For the _org.apache.ignite.plugin.security.SecurityPermission#CACHE_DESTROY_ 
> enumeration element it would be worth changing the Javadoc comment: "Cache 
> create permission." is not accurate for a destroy permission.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-12278) Add metric showing how many nodes may safely leave the cluster without partition loss

2019-11-05 Thread Albert Iskhakov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Albert Iskhakov reassigned IGNITE-12278:


Assignee: Albert Iskhakov

> Add metric showing how many nodes may safely leave the cluster without 
> partition loss
> 
>
> Key: IGNITE-12278
> URL: https://issues.apache.org/jira/browse/IGNITE-12278
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Assignee: Albert Iskhakov
>Priority: Major
>  Labels: IEP-35, newbie
> Fix For: 2.8
>
>
> We already have the CacheGroupMetricsMXBean#getMinimumNumberOfPartitionCopies 
> metric, which shows the partition redundancy number for a specific cache group.
> It would be handy if the user had a single aggregated metric for all cache 
> groups showing how many nodes may leave the cluster without partition loss in 
> any cache.
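
A rough sketch of the proposed aggregation, assuming the per-group 
getMinimumNumberOfPartitionCopies() values have already been collected (e.g. 
from the CacheGroupMetricsMXBean JMX beans); the helper name is hypothetical:

{code:java}
import java.util.Collection;

final class SafeNodeCount {
    /**
     * A cache group whose least-redundant partition has N owners tolerates the
     * loss of N - 1 nodes; the cluster-wide value is the minimum over groups.
     */
    static int nodesThatMaySafelyLeave(Collection<Integer> minPartitionCopiesPerGroup) {
        return minPartitionCopiesPerGroup.stream()
            .mapToInt(Integer::intValue)
            .min()
            .orElse(1) - 1;
    }
}
{code}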



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12351) Append additional cp tracking activity - pages sort.

2019-11-05 Thread Vladimir Malinovskiy (Jira)
Vladimir Malinovskiy created IGNITE-12351:
-

 Summary: Append additional cp tracking activity - pages sort.
 Key: IGNITE-12351
 URL: https://issues.apache.org/jira/browse/IGNITE-12351
 Project: Ignite
  Issue Type: Bug
Reporter: Vladimir Malinovskiy
Assignee: Vladimir Malinovskiy


CheckpointMetricsTracker has no info about the _splitAndSortCpPagesIfNeeded_ 
stage, so in case of a huge number of dirty pages one can observe in the log:
10:08:00 checkpoint started
10:10:00 checkpoint finished
<--- ?? 
10:10:20 checkpoint started

If checkpointFrequency = 3, the tracker durations (beforeLockDuration, 
lockWaitDuration) give no clue what kind of work (20 sec) the Checkpointer 
thread is waiting for.

Additionally (hopefully not a big deal), the redundant effectivePageId 
computation needs to be fixed, because FullPageId already has the 
effectivePageId info:

 

{{return Long.compare(PageIdUtils.effectivePageId(o1.pageId()),
PageIdUtils.effectivePageId(o2.pageId()));}}
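
A minimal sketch of the suggested simplification, assuming the compared objects 
are FullPageId instances whose precomputed effectivePageId() can be reused:

{code:java}
// Fragment only: reuse the value already stored in FullPageId instead of
// recomputing it via PageIdUtils on every comparison.
private static final Comparator<FullPageId> CP_PAGE_CMP = (o1, o2) ->
    Long.compare(o1.effectivePageId(), o2.effectivePageId());
{code}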

The writeCheckpointEntry() duration should also be logged.
 
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12189) Implement correct limit for TextQuery

2019-11-05 Thread Ivan Pavlukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967465#comment-16967465
 ] 

Ivan Pavlukhin commented on IGNITE-12189:
-

[~Yuriy_Shuliha], could you please create a ticket about passing `List` instead 
of `Collection` to the `GridCacheQueryFutureAdapter#enqueue` method?

> Implement correct limit for TextQuery
> -
>
> Key: IGNITE-12189
> URL: https://issues.apache.org/jira/browse/IGNITE-12189
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Yuriy Shuliha 
>Assignee: Yuriy Shuliha 
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> PROBLEM
> Currently each server node returns all response records to the client node, 
> which may amount to thousands or hundreds of thousands of records, even if we 
> need only the first 10-100. All the results are added to the queue in 
> _*GridCacheQueryFutureAdapter*_ in arbitrary order, page by page, so there is 
> no way to deliver a deterministic result.
> SOLUTION
> Implement _*limit*_ as a parameter of _*TextQuery*_ and 
> _*GridCacheQueryRequest*_.
> It should be passed as the limit parameter to Lucene's 
> _*IndexSearcher.search()*_ in _*GridLuceneIndex*_.
> For distributed queries, _*limit*_ will also trim the response queue when 
> merging results.
> Type: long
> Special value: 0 -> no limit (Integer.MAX_VALUE).
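
For reference, a usage sketch of the proposed API, assuming the limit ends up 
exposed as a TextQuery setter (the exact method name may differ in the final 
change; the Person cache is hypothetical):

{code:java}
// Hypothetical usage of the proposed limit for Lucene-backed text queries.
TextQuery<Long, Person> qry = new TextQuery<>(Person.class, "john").setLimit(10);

try (QueryCursor<Cache.Entry<Long, Person>> cur = personCache.query(qry)) {
    for (Cache.Entry<Long, Person> e : cur)
        System.out.println(e.getValue());
}
{code}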



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12189) Implement correct limit for TextQuery

2019-11-05 Thread Ivan Pavlukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967463#comment-16967463
 ] 

Ivan Pavlukhin commented on IGNITE-12189:
-

[~Yuriy_Shuliha], the PR looks fine to be merged. If there is no work to be 
done, I will merge it, please confirm.

> Implement correct limit for TextQuery
> -
>
> Key: IGNITE-12189
> URL: https://issues.apache.org/jira/browse/IGNITE-12189
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Yuriy Shuliha 
>Assignee: Yuriy Shuliha 
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> PROBLEM
> Currently each server node returns all response records to the client node, 
> which may amount to thousands or hundreds of thousands of records, even if we 
> need only the first 10-100. All the results are added to the queue in 
> _*GridCacheQueryFutureAdapter*_ in arbitrary order, page by page, so there is 
> no way to deliver a deterministic result.
> SOLUTION
> Implement _*limit*_ as a parameter of _*TextQuery*_ and 
> _*GridCacheQueryRequest*_.
> It should be passed as the limit parameter to Lucene's 
> _*IndexSearcher.search()*_ in _*GridLuceneIndex*_.
> For distributed queries, _*limit*_ will also trim the response queue when 
> merging results.
> Type: long
> Special value: 0 -> no limit (Integer.MAX_VALUE).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12189) Implement correct limit for TextQuery

2019-11-05 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967460#comment-16967460
 ] 

Ignite TC Bot commented on IGNITE-12189:


{panel:title=Branch: [pull/6917/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4747552&buildTypeId=IgniteTests24Java8_RunAll]

> Implement correct limit for TextQuery
> -
>
> Key: IGNITE-12189
> URL: https://issues.apache.org/jira/browse/IGNITE-12189
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Yuriy Shuliha 
>Assignee: Yuriy Shuliha 
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> PROBLEM
> Currently each server node returns all response records to the client node, 
> which may amount to thousands or hundreds of thousands of records, even if we 
> need only the first 10-100. All the results are added to the queue in 
> _*GridCacheQueryFutureAdapter*_ in arbitrary order, page by page, so there is 
> no way to deliver a deterministic result.
> SOLUTION
> Implement _*limit*_ as a parameter of _*TextQuery*_ and 
> _*GridCacheQueryRequest*_.
> It should be passed as the limit parameter to Lucene's 
> _*IndexSearcher.search()*_ in _*GridLuceneIndex*_.
> For distributed queries, _*limit*_ will also trim the response queue when 
> merging results.
> Type: long
> Special value: 0 -> no limit (Integer.MAX_VALUE).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12049) Allow custom authenticators to use SSL certificates

2019-11-05 Thread Alexei Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967400#comment-16967400
 ] 

Alexei Scherbakov commented on IGNITE-12049:


[~SomeFire]

I left comments on PR, please address them.

Some general questions:

1. For "normal" cluster nodes, attributes are already available via 
ClusterNode.attributes(), and a user can set any attribute and use it in a 
custom authenticator without any changes in the core by implementing [1].

Do I understand correctly that the fix is only relevant for thin clients 
authenticated via [2], which have no associated local attributes? 
Shouldn't we instead provide the ability for thin clients to have attributes 
and avoid changing IgniteConfiguration?

2. Why is the new attribute not available during authentication for JDBC/ODBC 
client types?

3. Can you create an example of using a custom authenticator with certificates?

[1] 
org.apache.ignite.internal.processors.security.GridSecurityProcessor#authenticateNode
[2] 
org.apache.ignite.internal.processors.security.GridSecurityProcessor#authenticate
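
For point 1, a small sketch with the existing public API, assuming a 
hypothetical attribute name "auth.cert.dn":

{code:java}
import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeAttributeExample {
    public static void main(String[] args) {
        // Publish a custom attribute on a "normal" node via the existing public API.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setUserAttributes(Collections.singletonMap("auth.cert.dn", "CN=node-1"));

        try (Ignite ignite = Ignition.start(cfg)) {
            // A custom GridSecurityProcessor#authenticateNode implementation
            // can read the same value from the joining ClusterNode:
            //     Object dn = node.attribute("auth.cert.dn");
        }
    }
}
{code}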

> Allow custom authenticators to use SSL certificates
> ---
>
> Key: IGNITE-12049
> URL: https://issues.apache.org/jira/browse/IGNITE-12049
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ryabov Dmitrii
>Assignee: Ryabov Dmitrii
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add SSL certificates to AuthenticationContext, so, authenticators can make 
> additional checks based on SSL certificates.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12189) Implement correct limit for TextQuery

2019-11-05 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967319#comment-16967319
 ] 

Ignite TC Bot commented on IGNITE-12189:


{panel:title=Branch: [pull/6917/head] Base: [master] : Possible Blockers 
(1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Scala (Visor Console){color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=4747474]]
* VisorConsoleSelfTestSuite: VisorActivationCommandSpec.A top visor 
command for cluster activation should deactivate cluster - Test has low fail 
rate in base branch 0,0% and is not flaky

{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=4747552&buildTypeId=IgniteTests24Java8_RunAll]

> Implement correct limit for TextQuery
> -
>
> Key: IGNITE-12189
> URL: https://issues.apache.org/jira/browse/IGNITE-12189
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Yuriy Shuliha 
>Assignee: Yuriy Shuliha 
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> PROBLEM
> Currently each server node returns all response records to the client node, 
> which may amount to thousands or hundreds of thousands of records, even if we 
> need only the first 10-100. All the results are added to the queue in 
> _*GridCacheQueryFutureAdapter*_ in arbitrary order, page by page, so there is 
> no way to deliver a deterministic result.
> SOLUTION
> Implement _*limit*_ as a parameter of _*TextQuery*_ and 
> _*GridCacheQueryRequest*_.
> It should be passed as the limit parameter to Lucene's 
> _*IndexSearcher.search()*_ in _*GridLuceneIndex*_.
> For distributed queries, _*limit*_ will also trim the response queue when 
> merging results.
> Type: long
> Special value: 0 -> no limit (Integer.MAX_VALUE).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)