[jira] [Updated] (IGNITE-6923) Cache metrics are updated in timeout-worker potentially delaying critical code execution due to current implementation issues.

2017-11-17 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6923:
--
Description: 
Some metrics are calculated by iterating over the cache.

See the stack trace for an example.

{noformat}
"grid-timeout-worker-#39%DPL_GRID%DplGridNodeName%" #152 prio=5 os_prio=0 tid=0x7f1009a03000 nid=0x5caa runnable [0x7f0f059d9000]
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.containsKey(HashMap.java:595)
    at java.util.HashSet.contains(HashSet.java:203)
    at java.util.Collections$UnmodifiableCollection.contains(Collections.java:1032)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$3.apply(IgniteCacheOffheapManagerImpl.java:339)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$3.apply(IgniteCacheOffheapManagerImpl.java:337)
    at org.apache.ignite.internal.util.lang.gridfunc.TransformFilteringIterator.hasNext(TransformFilteringIterator.java:90)
    at org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.cacheEntriesCount(IgniteCacheOffheapManagerImpl.java:293)
    at org.apache.ignite.internal.processors.cache.CacheMetricsImpl.getOffHeapPrimaryEntriesCount(CacheMetricsImpl.java:240)
    at org.apache.ignite.internal.processors.cache.CacheMetricsSnapshot.<init>(CacheMetricsSnapshot.java:271)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.localMetrics(GridCacheAdapter.java:3217)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1151)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.nonHeapMemoryUsed(GridDiscoveryManager.java:1121)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.metrics(GridDiscoveryManager.java:1087)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.metrics(TcpDiscoveryNode.java:269)
    at org.apache.ignite.internal.IgniteKernal$3.run(IgniteKernal.java:1175)
    at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$CancelableTask.onTimeout(GridTimeoutProcessor.java:256)
    - locked <0x7f115f5bf890> (a org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$CancelableTask)
    at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:158)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
{noformat}

  was:
Some metrics are calculated by a full iteration over the cache.

See the stack trace for an example.

{noformat}
"grid-timeout-worker-#39%DPL_GRID%DplGridNodeName%" #152 prio=5 os_prio=0 tid=0x7f1009a03000 nid=0x5caa runnable [0x7f0f059d9000]
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.containsKey(HashMap.java:595)
    at java.util.HashSet.contains(HashSet.java:203)
    at java.util.Collections$UnmodifiableCollection.contains(Collections.java:1032)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$3.apply(IgniteCacheOffheapManagerImpl.java:339)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$3.apply(IgniteCacheOffheapManagerImpl.java:337)
    at org.apache.ignite.internal.util.lang.gridfunc.TransformFilteringIterator.hasNext(TransformFilteringIterator.java:90)
    at org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.cacheEntriesCount(IgniteCacheOffheapManagerImpl.java:293)
    at org.apache.ignite.internal.processors.cache.CacheMetricsImpl.getOffHeapPrimaryEntriesCount(CacheMetricsImpl.java:240)
    at org.apache.ignite.internal.processors.cache.CacheMetricsSnapshot.<init>(CacheMetricsSnapshot.java:271)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.localMetrics(GridCacheAdapter.java:3217)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1151)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.nonHeapMemoryUsed(GridDiscoveryManager.java:1121)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.metrics(GridDiscoveryManager.java:1087)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.metrics(TcpDiscoveryNode.java:269)
    at org.apache.ignite.internal.IgniteKernal$3.run(IgniteKernal.java:1175)
    at

[jira] [Commented] (IGNITE-6858) Wait for exchange inside GridReduceQueryExecutor.query which never finishes due to opened transaction

2017-11-16 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255006#comment-16255006
 ] 

Alexei Scherbakov commented on IGNITE-6858:
---

Latest TC result: 
https://ci.ignite.apache.org/viewLog.html?buildId=944130=buildResultsDiv=Ignite20Tests_RunAll

> Wait for exchange inside GridReduceQueryExecutor.query which never finishes 
> due to opened transaction
> -
>
> Key: IGNITE-6858
> URL: https://issues.apache.org/jira/browse/IGNITE-6858
> Project: Ignite
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: sql
>Affects Versions: 2.3
>Reporter: Alexandr Kuramshin
>Assignee: Alexei Scherbakov
> Fix For: 2.4
>
>
> The query waits forever in this loop
> {noformat}
> for (int attempt = 0;; attempt++) {
>     if (attempt != 0) {
>         try {
>             Thread.sleep(attempt * 10); // Wait for exchange.
>         }
>         catch (InterruptedException e) {
>             Thread.currentThread().interrupt();
>             throw new CacheException("Query was interrupted.", e);
>         }
>     }
> {noformat}
> because the exchange will wait for partition eviction while a transaction is 
> open in a related thread
> {noformat}
> at java.lang.Thread.sleep(Native Method)
> at o.a.i.i.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:546)
> at o.a.i.i.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1236)
> at o.a.i.i.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
> {noformat}
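
The loop quoted in the issue has no exit other than interruption, so it can spin for as long as the exchange is blocked. As an illustration only, here is a minimal sketch of how such a retry loop could be bounded by a deadline. The class BoundedRetry and the method awaitWithDeadline are hypothetical names, not Ignite API, and this is not necessarily the fix adopted for this ticket.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not Ignite API): retry with growing back-off, but give up at a deadline. */
public final class BoundedRetry {
    private BoundedRetry() {
    }

    /**
     * Polls {@code ready} until it returns true or {@code timeoutMs} elapses.
     *
     * @return true if the condition became true, false if the deadline passed first.
     */
    public static boolean awaitWithDeadline(BooleanSupplier ready, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);

        for (int attempt = 0; ; attempt++) {
            if (ready.getAsBoolean())
                return true;

            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());

            if (remainingMs <= 0)
                return false; // Deadline reached: the caller decides how to fail.

            try {
                // Linear back-off (10 ms, 20 ms, ...), capped by the time that is left.
                Thread.sleep(Math.min((attempt + 1) * 10L, remainingMs));
            }
            catch (InterruptedException e) {
                // Restore the interrupt flag, as the quoted loop does, and fail fast.
                Thread.currentThread().interrupt();

                throw new IllegalStateException("Wait was interrupted.", e);
            }
        }
    }
}
```

A caller would pass a condition such as "the exchange has completed" and treat a false result as a query timeout instead of waiting indefinitely.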



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6923) Cache metrics are updated in timeout-worker potentially delaying critical code execution due to current implementation issues.

2017-11-15 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6923:
-

 Summary: Cache metrics are updated in timeout-worker potentially 
delaying critical code execution due to current implementation issues.
 Key: IGNITE-6923
 URL: https://issues.apache.org/jira/browse/IGNITE-6923
 Project: Ignite
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
Affects Versions: 2.3
Reporter: Alexei Scherbakov
 Fix For: 2.4


Some metrics are calculated by a full iteration over the cache.

See the stack trace for an example.

{noformat}
"grid-timeout-worker-#39%DPL_GRID%DplGridNodeName%" #152 prio=5 os_prio=0 tid=0x7f1009a03000 nid=0x5caa runnable [0x7f0f059d9000]
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.containsKey(HashMap.java:595)
    at java.util.HashSet.contains(HashSet.java:203)
    at java.util.Collections$UnmodifiableCollection.contains(Collections.java:1032)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$3.apply(IgniteCacheOffheapManagerImpl.java:339)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$3.apply(IgniteCacheOffheapManagerImpl.java:337)
    at org.apache.ignite.internal.util.lang.gridfunc.TransformFilteringIterator.hasNext(TransformFilteringIterator.java:90)
    at org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.cacheEntriesCount(IgniteCacheOffheapManagerImpl.java:293)
    at org.apache.ignite.internal.processors.cache.CacheMetricsImpl.getOffHeapPrimaryEntriesCount(CacheMetricsImpl.java:240)
    at org.apache.ignite.internal.processors.cache.CacheMetricsSnapshot.<init>(CacheMetricsSnapshot.java:271)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.localMetrics(GridCacheAdapter.java:3217)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1151)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.nonHeapMemoryUsed(GridDiscoveryManager.java:1121)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.metrics(GridDiscoveryManager.java:1087)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.metrics(TcpDiscoveryNode.java:269)
    at org.apache.ignite.internal.IgniteKernal$3.run(IgniteKernal.java:1175)
    at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$CancelableTask.onTimeout(GridTimeoutProcessor.java:256)
    - locked <0x7f115f5bf890> (a org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$CancelableTask)
    at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:158)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
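
The trace shows an entry count being computed by iteration directly on the timeout worker thread. One possible way to keep that thread responsive, sketched here purely as an illustration, is to compute the expensive value elsewhere and let latency-sensitive callers read a cached copy. The class CachedMetric is a hypothetical name, not Ignite code, and nothing in this ticket states that this is the approach that was taken.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch (not Ignite code): expensive metric with a cheap, possibly stale, read path. */
public final class CachedMetric {
    /** Expensive calculation, e.g. one that iterates over cache entries. */
    private final LongSupplier expensive;

    /** Last computed value; 0 until the first refresh. */
    private final AtomicLong cached = new AtomicLong();

    public CachedMetric(LongSupplier expensive) {
        this.expensive = expensive;
    }

    /** Recomputes the value. Intended to run on a dedicated thread, not on the timeout worker. */
    public void refresh() {
        cached.set(expensive.getAsLong());
    }

    /** Constant-time read for latency-sensitive callers; the value may be slightly stale. */
    public long value() {
        return cached.get();
    }
}
```

The trade-off in this sketch is staleness: readers see the value from the last refresh rather than an exact count.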





[jira] [Commented] (IGNITE-6858) Wait for exchange inside GridReduceQueryExecutor.query which never finishes due to opened transaction

2017-11-11 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248477#comment-16248477
 ] 

Alexei Scherbakov commented on IGNITE-6858:
---

https://ci.ignite.apache.org/viewQueued.html?itemId=938907

> Wait for exchange inside GridReduceQueryExecutor.query which never finishes 
> due to opened transaction
> -
>
> Key: IGNITE-6858
> URL: https://issues.apache.org/jira/browse/IGNITE-6858
> Project: Ignite
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: sql
>Affects Versions: 2.3
>Reporter: Alexandr Kuramshin
>Assignee: Alexei Scherbakov
> Fix For: 2.4
>
>
> The query waits forever in this loop
> {noformat}
> for (int attempt = 0;; attempt++) {
>     if (attempt != 0) {
>         try {
>             Thread.sleep(attempt * 10); // Wait for exchange.
>         }
>         catch (InterruptedException e) {
>             Thread.currentThread().interrupt();
>             throw new CacheException("Query was interrupted.", e);
>         }
>     }
> {noformat}
> because the exchange will wait for partition eviction while a transaction is 
> open in a related thread
> {noformat}
> at java.lang.Thread.sleep(Native Method)
> at o.a.i.i.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:546)
> at o.a.i.i.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1236)
> at o.a.i.i.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
> {noformat}





[jira] [Assigned] (IGNITE-6858) Wait for exchange inside GridReduceQueryExecutor.query which never finishes due to opened transaction

2017-11-10 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov reassigned IGNITE-6858:
-

Assignee: Alexei Scherbakov  (was: Vladimir Ozerov)

> Wait for exchange inside GridReduceQueryExecutor.query which never finishes 
> due to opened transaction
> -
>
> Key: IGNITE-6858
> URL: https://issues.apache.org/jira/browse/IGNITE-6858
> Project: Ignite
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: sql
>Affects Versions: 2.3
>Reporter: Alexandr Kuramshin
>Assignee: Alexei Scherbakov
> Fix For: 2.4
>
>
> The query waits forever in this loop
> {noformat}
> for (int attempt = 0;; attempt++) {
>     if (attempt != 0) {
>         try {
>             Thread.sleep(attempt * 10); // Wait for exchange.
>         }
>         catch (InterruptedException e) {
>             Thread.currentThread().interrupt();
>             throw new CacheException("Query was interrupted.", e);
>         }
>     }
> {noformat}
> because the exchange will wait for partition eviction while a transaction is 
> open in a related thread
> {noformat}
> at java.lang.Thread.sleep(Native Method)
> at o.a.i.i.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:546)
> at o.a.i.i.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1236)
> at o.a.i.i.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
> {noformat}





[jira] [Commented] (IGNITE-6667) Allow discovery cache instance reuse if only minor topology change has occured.

2017-11-08 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243772#comment-16243772
 ] 

Alexei Scherbakov commented on IGNITE-6667:
---

Fixed in IGNITE-6831

> Allow discovery cache instance reuse if only minor topology change has 
> occured.
> ---
>
> Key: IGNITE-6667
> URL: https://issues.apache.org/jira/browse/IGNITE-6667
> Project: Ignite
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Affects Versions: 2.2
>Reporter: Alexei Scherbakov
> Fix For: 2.4
>
>
> Currently we always recreate the DiscoCache instance, even if only a minor 
> topology change has occurred and the cache could be reused.
> Profiling shows that initializing such an object takes tens of milliseconds, 
> which adds to ring latency and is especially noticeable in large 
> topologies.
> Solution: reuse current discovery cache instance whenever possible.





[jira] [Created] (IGNITE-6827) Configurable rollback for long running transactions before partition exchange

2017-11-03 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6827:
-

 Summary: Configurable rollback for long running transactions 
before partition exchange
 Key: IGNITE-6827
 URL: https://issues.apache.org/jira/browse/IGNITE-6827
 Project: Ignite
  Issue Type: Improvement
  Security Level: Public (Viewable by anyone)
Affects Versions: 2.0
Reporter: Alexei Scherbakov
Priority: Major
 Fix For: 2.4


Currently, long-running or buggy user transactions force partition exchange to 
block while waiting for 
org.apache.ignite.internal.processors.cache.GridCacheSharedContext#partitionReleaseFuture,
 preventing all grid progress.

I suggest introducing a new global flag in TransactionConfiguration, such as 

{{txRollbackTimeoutOnTopologyChange}}

which will roll back a transaction that blocks the exchange once the timeout expires.

We still need to decide what to do with other topology-locking activities.
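
To make the proposal concrete, here is a minimal, self-contained sketch of the intended behaviour: once an exchange has waited longer than the configured timeout, the transactions still blocking it are rolled back. Everything in it is an assumption for illustration: the Tx class is a stand-in rather than the Ignite Transaction interface, the method name rollbackBlockers is hypothetical, and treating a timeout of 0 as "disabled" is a guess, since the ticket does not define the flag's semantics.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the proposed flag; none of these types are Ignite API. */
public final class TxRollbackOnExchangeSketch {
    /** Minimal stand-in for a user transaction. */
    public static final class Tx {
        public final String id;
        public boolean finished;
        public boolean rolledBack;

        public Tx(String id) {
            this.id = id;
        }
    }

    private TxRollbackOnExchangeSketch() {
    }

    /**
     * Rolls back unfinished transactions once the exchange has waited for at least {@code timeoutMs}.
     *
     * @param active Transactions that were active when the topology change started.
     * @param waitedMs How long the exchange has been waiting so far.
     * @param timeoutMs Value of the proposed flag; 0 or less is assumed to mean "disabled".
     * @return Ids of the transactions that were rolled back.
     */
    public static List<String> rollbackBlockers(List<Tx> active, long waitedMs, long timeoutMs) {
        List<String> rolledBack = new ArrayList<>();

        if (timeoutMs <= 0 || waitedMs < timeoutMs)
            return rolledBack; // Disabled, or still within the grace period.

        for (Tx tx : active) {
            if (!tx.finished) {
                tx.rolledBack = true;

                rolledBack.add(tx.id);
            }
        }

        return rolledBack;
    }
}
```

This sketch covers only user transactions; the other topology-locking activities mentioned in the description are left out.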





[jira] [Commented] (IGNITE-5037) Fix broken @AffinityKeyMapped annotation for compute jobs.

2017-10-29 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223960#comment-16223960
 ] 

Alexei Scherbakov commented on IGNITE-5037:
---

[~ihorpts],

Actually, there is a way to do affinity-collocated computations using the current 
API.

You should use the IgniteCompute.affinityRunAsync/affinityCallAsync methods, which 
take partId as an argument. 

Collect the futures and wait for all of them to complete.

Each closure is guaranteed to run locally with the data for the given partition, and 
will protect the partition data from eviction (on rebalancing) until the computation 
has finished.

If a node leaves while a closure is running, the closure will be restarted on the new 
primary node for the partition.

I recommend asking on the user list if you still have questions.
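
The "one closure per partition, collect the futures, wait for all" pattern can be sketched as follows. To stay self-contained, this sketch uses only java.util.concurrent and does not call Ignite; the class PerPartitionFanOut and the submit callback are hypothetical. In a real application the callback would presumably wrap one of the affinityRunAsync/affinityCallAsync calls mentioned above, passing the partition id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

/** Hypothetical sketch of the fan-out pattern; the submit callback stands in for an async compute call. */
public final class PerPartitionFanOut {
    private PerPartitionFanOut() {
    }

    /**
     * Submits one job per partition, then waits for all of them.
     *
     * @param parts Number of partitions.
     * @param submit Starts the job for the given partition id and returns its future.
     * @return Job results in partition order.
     */
    public static <R> List<R> runOnAllPartitions(int parts, IntFunction<CompletableFuture<R>> submit) {
        List<CompletableFuture<R>> futs = new ArrayList<>(parts);

        // Start every job first so that they can run concurrently.
        for (int p = 0; p < parts; p++)
            futs.add(submit.apply(p));

        List<R> res = new ArrayList<>(parts);

        // Only then wait; join() rethrows a job failure as an unchecked exception.
        for (CompletableFuture<R> fut : futs)
            res.add(fut.join());

        return res;
    }
}
```

Submitting all jobs before waiting on any of them is what lets the partitions be processed in parallel.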



> Fix broken @AffinityKeyMapped annotation for compute jobs.
> --
>
> Key: IGNITE-5037
> URL: https://issues.apache.org/jira/browse/IGNITE-5037
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 1.7
>Reporter: Alexei Scherbakov
>Assignee: Maksim Kozlov
>  Labels: newbie
>
> See the related discussion on the dev list entitled "Proper collocation of 
> computations and data" 
> (http://apache-ignite-developers.2346864.n4.nabble.com/Proper-collocation-of-computations-and-data-td16945.html).
> We must repair data affinity routing for compute jobs. It should work the same way as 
> for affinityCall/Run with a partition.
> Currently, the ComputeTask map method returns Map<? extends ComputeJob, ClusterNode>,
> but we have to provide an API that allows mapping ComputeJobs to partitions or 
> keys. 
> This can be done using the AffinityKeyMapped annotation or in any other way.
> Since that's a public API, any fixes should be discussed on the dev list prior to 
> implementation.





[jira] [Comment Edited] (IGNITE-6667) Allow discovery cache instance reuse if only minor topology change has occured.

2017-10-26 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216662#comment-16216662
 ] 

Alexei Scherbakov edited comment on IGNITE-6667 at 10/26/17 10:40 AM:
--

[~sboikov]

I've implemented a flexible strategy for disco cache reuse; please take a look.


was (Author: ascherbakov):
[~sboikov]

I've implemented a flexible strategy for disco cache reuse; please take a look.

TC: https://ci.ignite.apache.org/viewQueued.html?itemId=909093 (still in queue)

> Allow discovery cache instance reuse if only minor topology change has 
> occured.
> ---
>
> Key: IGNITE-6667
> URL: https://issues.apache.org/jira/browse/IGNITE-6667
> Project: Ignite
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Affects Versions: 2.2
>Reporter: Alexei Scherbakov
>Assignee: Semen Boikov
> Fix For: 2.4
>
>
> Currently we always recreate the DiscoCache instance, even if only a minor 
> topology change has occurred and the cache could be reused.
> Profiling shows that initializing such an object takes tens of milliseconds, 
> which adds to ring latency and is especially noticeable in large 
> topologies.
> Solution: reuse current discovery cache instance whenever possible.





[jira] [Comment Edited] (IGNITE-6667) Allow discovery cache instance reuse if only minor topology change has occured.

2017-10-24 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216662#comment-16216662
 ] 

Alexei Scherbakov edited comment on IGNITE-6667 at 10/24/17 1:57 PM:
-

[~sboikov]

I've implemented a flexible strategy for disco cache reuse; please take a look.

TC: https://ci.ignite.apache.org/viewQueued.html?itemId=909093 (still in queue)


was (Author: ascherbakov):
[~sboikov]

I've implemented a flexible strategy for disco cache reuse; please take a look.

TC: https://ci.ignite.apache.org/viewQueued.html?itemId=906993 (still in queue)

> Allow discovery cache instance reuse if only minor topology change has 
> occured.
> ---
>
> Key: IGNITE-6667
> URL: https://issues.apache.org/jira/browse/IGNITE-6667
> Project: Ignite
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Affects Versions: 2.2
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.4
>
>
> Currently we always recreate the DiscoCache instance, even if only a minor 
> topology change has occurred and the cache could be reused.
> Profiling shows that initializing such an object takes tens of milliseconds, 
> which adds to ring latency and is especially noticeable in large 
> topologies.
> Solution: reuse current discovery cache instance whenever possible.





[jira] [Commented] (IGNITE-6667) Allow discovery cache instance reuse if only minor topology change has occured.

2017-10-24 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216662#comment-16216662
 ] 

Alexei Scherbakov commented on IGNITE-6667:
---

[~sboikov]

I've implemented a flexible strategy for disco cache reuse; please take a look.

TC: https://ci.ignite.apache.org/viewQueued.html?itemId=906993 (still in queue)

> Allow discovery cache instance reuse if only minor topology change has 
> occured.
> ---
>
> Key: IGNITE-6667
> URL: https://issues.apache.org/jira/browse/IGNITE-6667
> Project: Ignite
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Affects Versions: 2.2
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.4
>
>
> Currently we always recreate the DiscoCache instance, even if only a minor 
> topology change has occurred and the cache could be reused.
> Profiling shows that initializing such an object takes tens of milliseconds, 
> which adds to ring latency and is especially noticeable in large 
> topologies.
> Solution: reuse current discovery cache instance whenever possible.





[jira] [Assigned] (IGNITE-6628) Make possible to rebuild all SQL indexes programmatically with enabled persistence.

2017-10-21 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov reassigned IGNITE-6628:
-

Assignee: (was: Alexei Scherbakov)

> Make possible to rebuild all SQL indexes programmatically with enabled 
> persistence.
> ---
>
> Key: IGNITE-6628
> URL: https://issues.apache.org/jira/browse/IGNITE-6628
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.0
>Reporter: Alexei Scherbakov
> Fix For: 2.4
>
>
> We have an unofficial way to rebuild indexes, which is invoked on activation 
> if index.bin has been removed from the PDS directory.
> The code is located in [1].
> I think it's OK to make it public for several cases: the model has changed, an index 
> is damaged, etc.
> Also, the current implementation has a bug: the CacheEntry in [2] is not touched, which 
> pollutes the heap and leads to OOM.
> [1] 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager#beforeExchange
> [2] 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing#rebuildIndexesFromHash





[jira] [Updated] (IGNITE-6667) Allow discovery cache instance reuse if only minor topology change has occured.

2017-10-18 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6667:
--
Description: 
Currently we always recreate the DiscoCache instance, even if only a minor topology 
change has occurred and the cache could be reused.

Profiling shows that initializing such an object takes tens of milliseconds, 
which adds to ring latency and is especially noticeable in large topologies.

Solution: reuse current discovery cache instance whenever possible.



  was:
Currently we always recreate the DiscoCache instance, even if only a minor topology 
change has occurred and the cache could be reused.

Profiling shows that initializing such an object takes tens of milliseconds, 
which adds to ring latency and is especially noticeable in large topologies.

Solution: reuse current discovery cache instance if possible.




> Allow discovery cache instance reuse if only minor topology change has 
> occured.
> ---
>
> Key: IGNITE-6667
> URL: https://issues.apache.org/jira/browse/IGNITE-6667
> Project: Ignite
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Affects Versions: 2.2
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.4
>
>
> Currently we always recreate the DiscoCache instance, even if only a minor 
> topology change has occurred and the cache could be reused.
> Profiling shows that initializing such an object takes tens of milliseconds, 
> which adds to ring latency and is especially noticeable in large 
> topologies.
> Solution: reuse current discovery cache instance whenever possible.





[jira] [Created] (IGNITE-6667) Allow discovery cache instance reuse if only minor topology change has occured.

2017-10-18 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6667:
-

 Summary: Allow discovery cache instance reuse if only minor 
topology change has occured.
 Key: IGNITE-6667
 URL: https://issues.apache.org/jira/browse/IGNITE-6667
 Project: Ignite
  Issue Type: Improvement
  Security Level: Public (Viewable by anyone)
Affects Versions: 2.2
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.4


Currently we always recreate the DiscoCache instance, even if only a minor topology 
change has occurred and the cache could be reused.

Profiling shows that initializing such an object takes tens of milliseconds, 
which adds to ring latency and is especially noticeable in large topologies.

Solution: reuse current discovery cache instance if possible.







[jira] [Created] (IGNITE-6633) Repair basic SQL functionality

2017-10-16 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6633:
-

 Summary: Repair basic SQL functionality
 Key: IGNITE-6633
 URL: https://issues.apache.org/jira/browse/IGNITE-6633
 Project: Ignite
  Issue Type: Improvement
  Security Level: Public (Viewable by anyone)
  Components: sql
Reporter: Alexei Scherbakov
 Fix For: 2.4


The SQL engine has long had a known, H2-related limitation [1].

This is a huge usability issue, because the proposed workaround requires rewriting 
queries and is difficult to implement in some cases, e.g. when some kind of 
query builder is used.

I suggest finally fixing it.

[1] 
https://apacheignite.readme.io/v2.1/docs/sql-performance-and-debugging#sql-performance-and-usability-considerations





[jira] [Updated] (IGNITE-6628) Make possible to rebuild all SQL indexes programmatically with enabled persistence.

2017-10-13 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6628:
--
Description: 
We have an unofficial way to rebuild indexes, which is invoked on activation if 
index.bin has been removed from the PDS directory.

The code is located in [1].

I think it's OK to make it public for several cases: the model has changed, an index is 
damaged, etc.

Also, the current implementation has a bug: the CacheEntry in [2] is not touched, which 
pollutes the heap and leads to OOM.

[1] 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager#beforeExchange
[2] 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing#rebuildIndexesFromHash

  was:
We have unofficial way for rebuilding indexes, which is called on activation if 
index.bin is removed from PDS directory.

Code is located here [1]

I think it's ok to make it public for several cases: model is changed, index is 
damage, etc...

Also current impl has a bug: CacheEntry in [2] is not touched, polluting heap 
and leading to OOM.

[1] 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager#beforeExchange
[2] 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing#rebuildIndexesFromHash


> Make possible to rebuild all SQL indexes programmatically with enabled 
> persistence.
> ---
>
> Key: IGNITE-6628
> URL: https://issues.apache.org/jira/browse/IGNITE-6628
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.0
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.4
>
>
> We have an unofficial way to rebuild indexes, which is invoked on activation 
> if index.bin has been removed from the PDS directory.
> The code is located in [1].
> I think it's OK to make it public for several cases: the model has changed, an index 
> is damaged, etc.
> Also, the current implementation has a bug: the CacheEntry in [2] is not touched, which 
> pollutes the heap and leads to OOM.
> [1] 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager#beforeExchange
> [2] 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing#rebuildIndexesFromHash





[jira] [Created] (IGNITE-6628) Make possible to rebuild all SQL indexes programmatically with enabled persistence.

2017-10-13 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6628:
-

 Summary: Make possible to rebuild all SQL indexes programmatically 
with enabled persistence.
 Key: IGNITE-6628
 URL: https://issues.apache.org/jira/browse/IGNITE-6628
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.0
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.4


We have an unofficial way to rebuild indexes, which is invoked on activation if 
index.bin has been removed from the PDS directory.

The code is located in [1].

I think it's OK to make it public for several cases: the model has changed, an index is 
damaged, etc.

Also, the current implementation has a bug: the CacheEntry in [2] is not touched, which 
pollutes the heap and leads to OOM.

[1] 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager#beforeExchange
[2] 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing#rebuildIndexesFromHash





[jira] [Commented] (IGNITE-5357) Replicated cache reads load balancing.

2017-09-29 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186027#comment-16186027
 ] 

Alexei Scherbakov commented on IGNITE-5357:
---

[~mlipkovich],

Yes, you should choose among alive nodes if the topology is locked.

Something like:

{code}
// Always use primary, if reads from backups are not allowed.
if (!cctx.config().isReadFromBackup())
    return affNodes.get(0);

Object mac = cctx.localNode().attribute(IgniteNodeAttributes.ATTR_MACS);

assert mac != null;

// Random index in [0, affNodes.size()).
int r = ThreadLocalRandom8.current().nextInt(affNodes.size());

int c = 0;

ClusterNode n0 = null;

int lastMatch = -1;

for (int i = 0; i < affNodes.size(); i++) {
    ClusterNode node = affNodes.get(i);

    if (canRemap || cctx.discovery().alive(node)) {
        // Prefer collocated node.
        if (mac.equals(node.attribute(IgniteNodeAttributes.ATTR_MACS)))
            return node;

        // Remember the r-th eligible node.
        if (c++ == r)
            n0 = node;

        lastMatch = i;
    }
}

// Fewer than r + 1 nodes were eligible: fall back to the last eligible one.
if (n0 == null && lastMatch != -1)
    n0 = affNodes.get(lastMatch);

return n0;
{code}





> Replicated cache reads load balancing.
> --
>
> Key: IGNITE-5357
> URL: https://issues.apache.org/jira/browse/IGNITE-5357
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Mikhail Lipkovich
>  Labels: newbie
> Fix For: 2.3
>
>
> Currently, all read requests from a client node to a replicated cache go 
> through the primary node for the key.
> We need to select a random affinity node in the topology and send the request there (only 
> if readFromBackups=true).
> If there are server nodes collocated on the same host as the client, the 
> target node must be selected from them.





[jira] [Updated] (IGNITE-6426) Add support for packed representation of int and long primitives in raw readers/writers.

2017-09-29 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6426:
--
Description: This is useful for implementing custom compression schemes.  
(was: This is useful for implementing custom efficient compression.)

> Add support for packed representation of int and long primitives in raw 
> readers/writers.
> 
>
> Key: IGNITE-6426
> URL: https://issues.apache.org/jira/browse/IGNITE-6426
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
>  Labels: iep-2
>
> This is useful for implementing custom compression schemes.





[jira] [Updated] (IGNITE-6426) Add support for packed representation of int and long primitives in raw readers/writers.

2017-09-29 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6426:
--
Summary: Add support for packed representation of int and long primitives 
in raw readers/writers.  (was: Add support for variable length numbers in raw 
readers/writers.)

> Add support for packed representation of int and long primitives in raw 
> readers/writers.
> 
>
> Key: IGNITE-6426
> URL: https://issues.apache.org/jira/browse/IGNITE-6426
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
>  Labels: iep-2
>
> This is useful for implementing custom efficient compression.





[jira] [Updated] (IGNITE-6507) Commit can be lost in network split scenario

2017-09-28 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6507:
--
Description: 
Commit can be lost in network split scenario

Reproducer:

https://github.com/ascherbakoff/ignite/blob/ignite-6507/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/dht/IgniteCacheTopologySplitTxConsistencyTest.java

If routing is switched to the second data center, new transactions will not see 
the committed state.

  was:
Commit can be lost in network split scenario

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.distributed.dht;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.BinaryConfiguration;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.MemoryConfiguration;
import org.apache.ignite.internal.IgniteEx;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.TestRecordingCommunicationSpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.TcpDiscoveryIpFinder;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.GridTestUtils;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;

import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC;

/**
 * Tests commit consistency in split-brain scenario.
 */
public class GridCacheGridSplitTxConsistencyTest extends GridCommonAbstractTest 
{
/** */
private static final TcpDiscoveryIpFinder IP_FINDER = new 
TcpDiscoveryVmIpFinder(true);

/**
 * {@inheritDoc}
 */
@Override protected void afterTest() throws Exception {
super.afterTest();

stopAllGrids();

GridTestUtils.deleteDbFiles();
}

/**
 * {@inheritDoc}
 */
@Override protected IgniteConfiguration getConfiguration(String gridName) 
throws Exception {
IgniteConfiguration cfg = super.getConfiguration(gridName);

cfg.setCommunicationSpi(new TestRecordingCommunicationSpi());

cfg.setConsistentId(gridName);

MemoryConfiguration memCfg = new MemoryConfiguration();
memCfg.setPageSize(1024);
memCfg.setDefaultMemoryPolicySize(100 * 1024 * 1024);

cfg.setMemoryConfiguration(memCfg);

((TcpDiscoverySpi) cfg.getDiscoverySpi()).setIpFinder(IP_FINDER);

CacheConfiguration ccfg = new CacheConfiguration();
ccfg.setName(DEFAULT_CACHE_NAME);
ccfg.setAtomicityMode(TRANSACTIONAL);
ccfg.setWriteSynchronizationMode(FULL_SYNC);
ccfg.setAffinity(new RendezvousAffinityFunction(false, 3));
ccfg.setBackups(2);

cfg.setCacheConfiguration(ccfg);

return cfg;
}

/**
 * Tests if commits are working as expected.
 * @throws Exception
 */
public void testSplitTxConsistency() throws Exception {
IgniteEx grid0 = startGrid(0);
grid0.active(true);

IgniteEx grid1 = startGrid(1);
IgniteEx grid2 = startGrid(2);

int key = 0;

Affinity aff = grid0.affinity(DEFAULT_CACHE_NAME);
assertTrue(aff.isPrimary(grid0.localNode(), key));
assertTrue(aff.isBackup(grid1.localNode(), key));
assertTrue(aff.isBackup(grid2.localNode(), key));

final TestRecordingCommunicationSpi spi0 = 
(TestRecordingCommunicationSpi) grid0.configuration().getCommunicationSpi();

spi0.blockMessages(GridDhtTxFinishRequest.class, grid1.name());
spi0.blockMessages(GridDhtTxFinishRequest.class, grid2.name());

IgniteInternalFuture fut = multithreadedAsync(new Runnable() {
@Override public void run() {
try {
spi0.waitForBlocked();


[jira] [Commented] (IGNITE-6507) Commit can be lost in network split scenario

2017-09-28 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184313#comment-16184313
 ] 

Alexei Scherbakov commented on IGNITE-6507:
---

Added split emulation in reproducer.

> Commit can be lost in network split scenario
> 
>
> Key: IGNITE-6507
> URL: https://issues.apache.org/jira/browse/IGNITE-6507
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Priority: Critical
>  Labels: important
> Fix For: 2.4
>
>
> Commit can be lost in network split scenario
> Reproducer:
> https://github.com/ascherbakoff/ignite/blob/ignite-6507/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/dht/IgniteCacheTopologySplitTxConsistencyTest.java
> If routing is switched to the second data center, new transactions will not 
> see the committed state.





[jira] [Commented] (IGNITE-6507) Commit can be lost in network split scenario

2017-09-27 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182226#comment-16182226
 ] 

Alexei Scherbakov commented on IGNITE-6507:
---

[~sboikov],

The test demonstrates a problem in the commit phase: the commit succeeds on the 
primary node but not on the backup nodes in a network split scenario 
(simulated in the test by stopping backup nodes at the wrong time).

In a network split scenario, usually a single DC is selected for transaction 
processing, and if the wrong DC is selected the commit is lost (as shown in the test).

I can try to implement network split emulation in the test by somehow hacking 
TcpDiscoverySpi. Could you suggest a proper way?


> Commit can be lost in network split scenario
> 
>
> Key: IGNITE-6507
> URL: https://issues.apache.org/jira/browse/IGNITE-6507
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Priority: Critical
> Fix For: 2.3
>
>
> Commit can be lost in network split scenario
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements. See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License. You may obtain a copy of the License at
>  *
>  * http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.internal.processors.cache.distributed.dht;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.cache.affinity.Affinity;
> import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
> import org.apache.ignite.configuration.BinaryConfiguration;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.configuration.MemoryConfiguration;
> import org.apache.ignite.internal.IgniteEx;
> import org.apache.ignite.internal.IgniteInternalFuture;
> import org.apache.ignite.internal.TestRecordingCommunicationSpi;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.TcpDiscoveryIpFinder;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
> import org.apache.ignite.testframework.GridTestUtils;
> import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
> import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
> import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC;
> /**
>  * Tests commit consistency in split-brain scenario.
>  */
> public class GridCacheGridSplitTxConsistencyTest extends 
> GridCommonAbstractTest {
> /** */
> private static final TcpDiscoveryIpFinder IP_FINDER = new 
> TcpDiscoveryVmIpFinder(true);
> /**
>  * {@inheritDoc}
>  */
> @Override protected void afterTest() throws Exception {
> super.afterTest();
> stopAllGrids();
> GridTestUtils.deleteDbFiles();
> }
> /**
>  * {@inheritDoc}
>  */
> @Override protected IgniteConfiguration getConfiguration(String gridName) 
> throws Exception {
> IgniteConfiguration cfg = super.getConfiguration(gridName);
> cfg.setCommunicationSpi(new TestRecordingCommunicationSpi());
> cfg.setConsistentId(gridName);
> MemoryConfiguration memCfg = new MemoryConfiguration();
> memCfg.setPageSize(1024);
> memCfg.setDefaultMemoryPolicySize(100 * 1024 * 1024);
> cfg.setMemoryConfiguration(memCfg);
> ((TcpDiscoverySpi) cfg.getDiscoverySpi()).setIpFinder(IP_FINDER);
> CacheConfiguration ccfg = new CacheConfiguration();
> ccfg.setName(DEFAULT_CACHE_NAME);
> ccfg.setAtomicityMode(TRANSACTIONAL);
> ccfg.setWriteSynchronizationMode(FULL_SYNC);
> ccfg.setAffinity(new RendezvousAffinityFunction(false, 3));
> ccfg.setBackups(2);
> cfg.setCacheConfiguration(ccfg);
> return cfg;
> }
> /**
>  * Tests if commits are working as expected.
>  * @throws Exception
>  */
> public void testSplitTxConsistency() throws Exception {
> IgniteEx grid0 = startGrid(0);
> grid0.active(true);
> IgniteEx grid1 = startGrid(1);
> IgniteEx grid2 

[jira] [Created] (IGNITE-6507) Commit can be lost in network split scenario

2017-09-26 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6507:
-

 Summary: Commit can be lost in network split scenario
 Key: IGNITE-6507
 URL: https://issues.apache.org/jira/browse/IGNITE-6507
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.1
Reporter: Alexei Scherbakov
Priority: Critical
 Fix For: 2.3


Commit can be lost in network split scenario

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.distributed.dht;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.BinaryConfiguration;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.MemoryConfiguration;
import org.apache.ignite.internal.IgniteEx;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.TestRecordingCommunicationSpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.TcpDiscoveryIpFinder;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.GridTestUtils;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;

import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC;

/**
 * Tests commit consistency in split-brain scenario.
 */
public class GridCacheGridSplitTxConsistencyTest extends GridCommonAbstractTest {
    /** */
    private static final TcpDiscoveryIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true);

    /**
     * {@inheritDoc}
     */
    @Override protected void afterTest() throws Exception {
        super.afterTest();

        stopAllGrids();

        GridTestUtils.deleteDbFiles();
    }

    /**
     * {@inheritDoc}
     */
    @Override protected IgniteConfiguration getConfiguration(String gridName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(gridName);

        cfg.setCommunicationSpi(new TestRecordingCommunicationSpi());

        cfg.setConsistentId(gridName);

        MemoryConfiguration memCfg = new MemoryConfiguration();
        memCfg.setPageSize(1024);
        memCfg.setDefaultMemoryPolicySize(100 * 1024 * 1024);

        cfg.setMemoryConfiguration(memCfg);

        ((TcpDiscoverySpi) cfg.getDiscoverySpi()).setIpFinder(IP_FINDER);

        CacheConfiguration ccfg = new CacheConfiguration();
        ccfg.setName(DEFAULT_CACHE_NAME);
        ccfg.setAtomicityMode(TRANSACTIONAL);
        ccfg.setWriteSynchronizationMode(FULL_SYNC);
        ccfg.setAffinity(new RendezvousAffinityFunction(false, 3));
        ccfg.setBackups(2);

        cfg.setCacheConfiguration(ccfg);

        return cfg;
    }

    /**
     * Tests if commits are working as expected.
     * @throws Exception
     */
    public void testSplitTxConsistency() throws Exception {
        IgniteEx grid0 = startGrid(0);
        grid0.active(true);

        IgniteEx grid1 = startGrid(1);
        IgniteEx grid2 = startGrid(2);

        int key = 0;

        Affinity aff = grid0.affinity(DEFAULT_CACHE_NAME);
        assertTrue(aff.isPrimary(grid0.localNode(), key));
        assertTrue(aff.isBackup(grid1.localNode(), key));
        assertTrue(aff.isBackup(grid2.localNode(), key));

        final TestRecordingCommunicationSpi spi0 =
            (TestRecordingCommunicationSpi) grid0.configuration().getCommunicationSpi();

        spi0.blockMessages(GridDhtTxFinishRequest.class, grid1.name());
        spi0.blockMessages(GridDhtTxFinishRequest.class, grid2.name());

        IgniteInternalFuture fut = multithreadedAsync(new Runnable() {
            @Override public void run() {
                try {
                    spi0.waitForBlocked();

                } catch (InterruptedException e) {
                    fail();
                }

                stopGrid(1);

[jira] [Resolved] (IGNITE-6491) Race in TopologyValidator.validate() and EVT_NODE_LEFT listener calls (split-brain activator)

2017-09-26 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov resolved IGNITE-6491.
---
   Resolution: Fixed
Fix Version/s: (was: 2.2)
   2.3

> Race in TopologyValidator.validate() and EVT_NODE_LEFT listener calls 
> (split-brain activator)
> -
>
> Key: IGNITE-6491
> URL: https://issues.apache.org/jira/browse/IGNITE-6491
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, general
>Affects Versions: 2.1
>Reporter: Alexandr Kuramshin
>Assignee: Alexandr Kuramshin
> Fix For: 2.3
>
>
> The following wrong cache {{validate}}/{{put}} sequence may occur.
> When a node leaves, a {{GridDhtPartitionsExchangeFuture}} is generated by the 
> {{disco-event-worker}} thread.
> Then the {{exchange-worker}} thread does:
> {noformat}
> Split-brain detected [cacheName=test40, activatorTopVer=0, cacheTopVer=14]
>   at 
> org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator.validate(IgniteTopologyValidatorGridSplitCacheTest.java:307)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCacheGroup(GridDhtTopologyFutureAdapter.java:64)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1456)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:115)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:450)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:668)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2278)
> {noformat}
> The result of validation is stored in {{grpValidRes}} with value of {{false}}.
> After some delay the {{disco-event-worker}} thread will do
> {noformat}
> java.lang.Exception: Node is segment activator [cacheName=test40, 
> activatorTopVer=14]
>   at 
> org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:360)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:349)
>   at 
> org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$UserListenerWrapper.onEvent(GridEventStorageManager.java:1463)
>   at 
> org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:859)
>   at 
> org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:844)
>   at 
> org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:341)
>   at 
> org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:307)
>   at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2478)
>   at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2684)
>   at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2507)
> {noformat}
> After this invocation the result of {{SplitAwareTopologyValidator.validate}} 
> should be changed to {{true}}, but it was already invoked and the result has 
> been cached in {{grpValidRes}} with the value of {{false}}.
> So any subsequent call to {{cache.put}} fails:
> {noformat}
> Test failed.
> java.lang.RuntimeException: tryPut() failed 
> [gridName=cache.IgniteTopologyValidatorGridSplitCacheTest0]
>   at 
> org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.tryPut(IgniteTopologyValidatorGridSplitCacheTest.java:262)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.testTopologyValidator(IgniteTopologyValidatorGridSplitCacheTest.java:182)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 

[jira] [Comment Edited] (IGNITE-6484) Fix IgnitePdsThreadInterruptionTest failure with larger number of threads.

2017-09-22 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176314#comment-16176314
 ] 

Alexei Scherbakov edited comment on IGNITE-6484 at 9/22/17 4:18 PM:


writeComplete conditional wait is made uninterruptible.
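
For illustration only, a minimal sketch of what an uninterruptible conditional wait can look like with plain java.util.concurrent. The class WriteTracker and its members are hypothetical names chosen for this example and are not the actual Ignite code; the sketch assumes a lock-guarded flag signalled through a Condition.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical write tracker; the names are illustrative, not Ignite's. */
final class WriteTracker {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition writeComplete = lock.newCondition();
    private boolean done;

    /** Marks the write as finished and wakes all waiters. */
    void complete() {
        lock.lock();

        try {
            done = true;

            writeComplete.signalAll();
        }
        finally {
            lock.unlock();
        }
    }

    /** Waits for completion; an interrupt does not abort the wait and the interrupt flag stays set. */
    void awaitComplete() {
        lock.lock();

        try {
            while (!done)
                writeComplete.awaitUninterruptibly();
        }
        finally {
            lock.unlock();
        }
    }
}
```

With Condition.await() an interrupt would surface as InterruptedException in the middle of the wait; awaitUninterruptibly() keeps waiting until signalled and leaves the interrupted status set for the caller to inspect afterwards.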


was (Author: ascherbakov):
writeComplete conditinal wait is made uninterruptable.

> Fix IgnitePdsThreadInterruptionTest failure with larger number of threads.
> --
>
> Key: IGNITE-6484
> URL: https://issues.apache.org/jira/browse/IGNITE-6484
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.3
>
>
> Test fails on interruptions of conditional wait.
> Related ticket IGNITE-6228





[jira] [Updated] (IGNITE-6484) Fix IgnitePdsThreadInterruptionTest failure with larger number of threads.

2017-09-22 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6484:
--
Description: 
Test fails on interruptions of conditional wait.

Related ticket IGNITE-6228

  was:Test fails on interruptions of conditional wait.


> Fix IgnitePdsThreadInterruptionTest failure with larger number of threads.
> --
>
> Key: IGNITE-6484
> URL: https://issues.apache.org/jira/browse/IGNITE-6484
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.3
>
>
> Test fails on interruptions of conditional wait.
> Related ticket IGNITE-6228





[jira] [Created] (IGNITE-6484) Fix IgnitePdsThreadInterruptionTest failure with larger number of threads.

2017-09-22 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6484:
-

 Summary: Fix IgnitePdsThreadInterruptionTest failure with larger 
number of threads.
 Key: IGNITE-6484
 URL: https://issues.apache.org/jira/browse/IGNITE-6484
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.3


Test fails on interruptions of conditional wait.





[jira] [Assigned] (IGNITE-6228) Avoid closing page store file with ClosedByInterruptException when user thread is interrupted

2017-09-18 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov reassigned IGNITE-6228:
-

Assignee: Alexei Scherbakov  (was: Ivan Rakov)

> Avoid closing page store file with ClosedByInterruptException when user 
> thread is interrupted
> -
>
> Key: IGNITE-6228
> URL: https://issues.apache.org/jira/browse/IGNITE-6228
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.1
>Reporter: Ivan Rakov
>Assignee: Alexei Scherbakov
> Fix For: 2.3
>
> Attachments: RestartGridTest.java
>
>
> If the cache proxy is in synchronous mode, the user thread may be interrupted while 
> reading from a page store file. This closes the partition file 
> with ClosedByInterruptException.
> Example stacktrace:
> {noformat}
> class org.apache.ignite.IgniteCheckedException: Runtime failure on lookup 
> row: 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$SearchRow@717729d
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1070)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.find(IgniteCacheOffheapManagerImpl.java:1476)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.find(GridCacheOffheapManager.java:1276)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.read(IgniteCacheOffheapManagerImpl.java:394)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:371)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.onTtlExpired(GridCacheMapEntry.java:2952)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheTtlManager$1.applyx(GridCacheTtlManager.java:61)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheTtlManager$1.applyx(GridCacheTtlManager.java:52)
>   at 
> org.apache.ignite.internal.util.lang.IgniteInClosure2X.apply(IgniteInClosure2X.java:38)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expire(IgniteCacheOffheapManagerImpl.java:1012)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:198)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:868)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheGateway.leaveNoLock(GridCacheGateway.java:240)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheGateway.leave(GridCacheGateway.java:225)
>   at 
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onLeave(GatewayProtectedCacheProxy.java:1680)
>   at 
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:875)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.db.RestartGridTest$TestService.execute(RestartGridTest.java:160)
>   at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor$2.run(GridServiceProcessor.java:1160)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.IgniteCheckedException: Read error
>   at 
> org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:356)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:287)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:272)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:570)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:488)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:129)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.treeMeta(BPlusTree.java:822)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$7700(BPlusTree.java:81)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Get.init(BPlusTree.java:2392)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doFind(BPlusTree.java:1099)
>   at 

[jira] [Created] (IGNITE-6426) Add support for variable length numbers in raw readers/writers.

2017-09-18 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6426:
-

 Summary: Add support for variable length numbers in raw 
readers/writers.
 Key: IGNITE-6426
 URL: https://issues.apache.org/jira/browse/IGNITE-6426
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.3


This is useful for implementing custom efficient compression.
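
As an illustration of what a variable-length number encoding may look like, here is a minimal sketch. It assumes the common scheme of ZigZag mapping followed by 7-bit groups with a continuation bit; the ticket does not specify the format, and PackedNumbers, writePackedLong and readPackedLong are hypothetical names, not the Ignite raw reader/writer API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical helper; not the Ignite raw reader/writer API. */
final class PackedNumbers {
    private PackedNumbers() {}

    /** Writes a long as 1 to 10 bytes: 7 payload bits per byte, high bit set when more bytes follow. */
    static void writePackedLong(ByteArrayOutputStream out, long val) {
        // ZigZag maps values of small magnitude, negative or positive, to small unsigned values.
        long v = (val << 1) ^ (val >> 63);

        while ((v & ~0x7FL) != 0) {
            out.write((int)((v & 0x7F) | 0x80));

            v >>>= 7;
        }

        out.write((int)v);
    }

    /** Reads a value previously written by writePackedLong. */
    static long readPackedLong(ByteBuffer in) {
        long v = 0;
        int shift = 0;
        byte b;

        do {
            b = in.get();

            v |= (long)(b & 0x7F) << shift;

            shift += 7;
        }
        while ((b & 0x80) != 0);

        // Undo ZigZag.
        return (v >>> 1) ^ -(v & 1);
    }
}
```

Under this scheme values close to zero take a single byte instead of eight, which is where the space saving for a custom compression scheme would come from; an int can be written through the same method after widening to long.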





[jira] [Updated] (IGNITE-6405) Deadlock is not detected if timed out on client.

2017-09-18 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6405:
--
Description: 
Timeout exception is thrown instead.

Reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.transactions;

import java.util.Collections;
import java.util.concurrent.CountDownLatch;
import javax.cache.CacheException;
import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.util.typedef.internal.U;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionDeadlockException;

import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static 
org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

/**
 * Tests an ability to eagerly rollback timed out transactions.
 */
public class TxPessimisticDeadlockDetectionClient extends 
GridCommonAbstractTest {
/** */
private static final long TX_MIN_TIMEOUT = 1;

/** */
private static final long TX_TIMEOUT = 300;

/** */
private static final long TX_DEFAULT_TIMEOUT = 3_000;

/** */
private static final String CACHE_NAME = "test";

/** IP finder. */
private static final TcpDiscoveryVmIpFinder IP_FINDER = new 
TcpDiscoveryVmIpFinder(true);

/** */
private static final int GRID_CNT = 3;

/** */
private final CountDownLatch blocked = new CountDownLatch(1);

/** */
private final CountDownLatch unblocked = new CountDownLatch(1);

/** {@inheritDoc} */
@Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

cfg.setClientMode("client".equals(igniteInstanceName));

((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(IP_FINDER);

TransactionConfiguration txCfg = new TransactionConfiguration();
txCfg.setDefaultTxTimeout(TX_DEFAULT_TIMEOUT);

cfg.setTransactionConfiguration(txCfg);

CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);
ccfg.setAtomicityMode(TRANSACTIONAL);
ccfg.setBackups(2);

cfg.setCacheConfiguration(ccfg);

return cfg;
}

/** {@inheritDoc} */
@Override protected void beforeTest() throws Exception {
super.beforeTest();

startGridsMultiThreaded(GRID_CNT);
}

/** {@inheritDoc} */
@Override protected void afterTest() throws Exception {
super.afterTest();

stopAllGrids();
}

/** */
protected void validateException(Exception e) {
assertEquals("Deadlock report is expected",
TransactionDeadlockException.class, e.getCause().getCause().getClass());
}

/**
 * Tests if deadlock is resolved on timeout with correct message.
 *
 * @throws Exception If failed.
 */
public void testDeadlockUnblockedOnTimeout() throws Exception {
Ignite client = startGrid("client");

testDeadlockUnblockedOnTimeout0(client, ignite(0));
}

/**
 * Tests if deadlock is resolved on timeout with correct message.
 *
 * @param node1 First node.
 * @param node2 Second node.
 * @throws Exception If failed.
 */
private void testDeadlockUnblockedOnTimeout0(final Ignite node1, final Ignite node2) throws Exception {
final CountDownLatch l = new CountDownLatch(2);

IgniteInternalFuture fut1 = multithreadedAsync(new Runnable() {
@Override public void run() {
try {
try (Transaction tx = node1.transactions().txStart()) {
   

[jira] [Updated] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-09-17 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6181:
--
Description: 
Incorrectly handled transactions (not calling commit, rollback or close) are 
staying in grid forever, potentially holding locks and preventing exchange 
start.
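
The failure mode described above can be modelled without Ignite. The sketch
below is a hypothetical stand-in, not Ignite code: TimedTx, its State enum and
the 50 ms timeout are assumptions made for illustration. It shows the idea of
an eager rollback on timeout: a timer moves a still-active transaction to
ROLLED_BACK, so a late commit by the owner fails instead of the transaction
staying active indefinitely.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a transaction that must finish before a timeout. */
public class TimedTx {
    /** Possible transaction states in this model. */
    public enum State { ACTIVE, COMMITTED, ROLLED_BACK }

    private final AtomicReference<State> state = new AtomicReference<>(State.ACTIVE);

    /** Schedules a rollback that only takes effect if the tx is still active. */
    public TimedTx(ScheduledExecutorService timer, long timeoutMs) {
        timer.schedule(() -> {
            state.compareAndSet(State.ACTIVE, State.ROLLED_BACK);
        }, timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** @return {@code true} if the commit happened before the timeout fired. */
    public boolean commit() {
        return state.compareAndSet(State.ACTIVE, State.COMMITTED);
    }

    /** @return Current state. */
    public State state() {
        return state.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        try {
            TimedTx tx = new TimedTx(timer, 50);

            // The owner "forgets" to finish the transaction in time.
            Thread.sleep(500);

            System.out.println("state=" + tx.state() + ", late commit=" + tx.commit());
        }
        finally {
            timer.shutdownNow();
        }
    }
}
```

The compare-and-set on a single state field is a design choice of this sketch:
it lets the timer and the owner race safely, and whichever side finishes first
decides the outcome.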

Unit test reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.cache;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteException;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.util.typedef.internal.U;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

/**
 * Tests the ability to roll back a transaction that was not properly closed.
 */
public class IgniteTxTimeoutTest extends GridCommonAbstractTest {
/** */
private static final long TX_TIMEOUT = 3_000L;

/** */
private static final String CACHE_NAME = "test";

/** IP finder. */
private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true);

/** */
private final CountDownLatch l = new CountDownLatch(1);

/** */
private final Object mux = new Object();

/** {@inheritDoc} */
@Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(IP_FINDER));

TransactionConfiguration txCfg = new TransactionConfiguration();
txCfg.setDefaultTxTimeout(TX_TIMEOUT);

cfg.setTransactionConfiguration(txCfg);

CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);
ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

cfg.setCacheConfiguration(ccfg);

return cfg;
}

/** */
public void testTxTimeoutHandling() throws Exception {
try {
final Ignite ignite = startGrid(0);

final AtomicBoolean released = new AtomicBoolean();

multithreadedAsync(new Runnable() {
@Override public void run() {
// Start tx with default settings.
try (Transaction tx = ignite.transactions().txStart()) {
ignite.cache(CACHE_NAME).put(1, 1);

l.countDown();

// Wait longer than default timeout.
synchronized (mux) {
while (!released.get()) {
try {
mux.wait();
}
catch (InterruptedException e) {
throw new IgniteException(e);
}
}
}

try {
tx.commit();

fail();
}
catch (IgniteException e) {
// Expect exception - tx is rolled back.
}
}
}
}, 1, "Locker");

IgniteInternalFuture fut2 = multithreadedAsync(new Runnable() {
@Override public void run() {
U.awaitQuiet(l);

// Try to acquire lock.
// Acquisition will be successful when first transaction will be 

[jira] [Commented] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-09-17 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169254#comment-16169254
 ] 

Alexei Scherbakov commented on IGNITE-6181:
---

Fixed some issues with concurrent deadlock detection. If a tx is rolled back on
timeout because of a deadlock, the deadlock might not be reported correctly.

It looks like test [1] does not work correctly with low timeout values
(<= 100 ms).

Larger timeouts work fine. Investigation is needed.

TC result: https://ci.ignite.apache.org/viewLog.html?buildId=833829

[1] TxRollbackOnTimeoutTest#testDeadlockUnblockedOnTimeout3

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.3

[jira] [Created] (IGNITE-6405) Deadlock is not detected if timed out on client.

2017-09-15 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6405:
-

 Summary: Deadlock is not detected if timed out on client.
 Key: IGNITE-6405
 URL: https://issues.apache.org/jira/browse/IGNITE-6405
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Alexei Scherbakov
Priority: Minor
 Fix For: 2.3


Timeout exception is thrown instead.
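
The reproducer's validateException checks one fixed position in the cause
chain (e.getCause().getCause()). As an alternative sketch, a check could search
the whole chain for the expected type, so it does not depend on how many
wrappers surround the deadlock exception. CauseChain and findCause are
hypothetical names, and the standard exceptions in main are stand-ins for the
Ignite exception classes.

```java
/** Hypothetical helper that searches an exception's cause chain for a type. */
public class CauseChain {
    /** Upper bound on the walk, to guard against cyclic cause chains. */
    private static final int MAX_DEPTH = 64;

    /**
     * @param t Exception to inspect, may be {@code null}.
     * @param type Type to look for.
     * @return First throwable in the chain that is an instance of {@code type}, or {@code null}.
     */
    public static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        for (int depth = 0; t != null && depth < MAX_DEPTH; depth++, t = t.getCause()) {
            if (type.isInstance(t))
                return type.cast(t);
        }

        return null;
    }

    public static void main(String[] args) {
        // Stand-in chain: outer wrapper -> timeout-like wrapper -> deadlock-like cause.
        Throwable chain = new RuntimeException("outer",
            new RuntimeException("timeout (stand-in)",
                new IllegalStateException("deadlock (stand-in)")));

        System.out.println(findCause(chain, IllegalStateException.class) != null);
        System.out.println(findCause(chain, IllegalArgumentException.class) != null);
    }
}
```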

Reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.transactions;

import java.util.Collections;
import java.util.concurrent.CountDownLatch;
import javax.cache.CacheException;
import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.util.typedef.internal.U;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionDeadlockException;

import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

/**
 * Tests an ability to eagerly rollback timed out transactions.
 */
public class TxPessimisticDeadlockDetectionClient extends GridCommonAbstractTest {
/** */
private static final long TX_MIN_TIMEOUT = 1;

/** */
private static final long TX_TIMEOUT = 300;

/** */
private static final long TX_DEFAULT_TIMEOUT = 3_000;

/** */
private static final String CACHE_NAME = "test";

/** IP finder. */
private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true);

/** */
private static final int GRID_CNT = 3;

/** */
private final CountDownLatch blocked = new CountDownLatch(1);

/** */
private final CountDownLatch unblocked = new CountDownLatch(1);

/** {@inheritDoc} */
@Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

cfg.setClientMode("client".equals(igniteInstanceName));

((TcpDiscoverySpi)cfg.getDiscoverySpi()).setIpFinder(IP_FINDER);

TransactionConfiguration txCfg = new TransactionConfiguration();
txCfg.setDefaultTxTimeout(TX_DEFAULT_TIMEOUT);

cfg.setTransactionConfiguration(txCfg);

CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);
ccfg.setAtomicityMode(TRANSACTIONAL);
ccfg.setBackups(2);

cfg.setCacheConfiguration(ccfg);

return cfg;
}

/** {@inheritDoc} */
@Override protected void beforeTest() throws Exception {
super.beforeTest();

startGridsMultiThreaded(GRID_CNT);
}

/** {@inheritDoc} */
@Override protected void afterTest() throws Exception {
super.afterTest();

stopAllGrids();
}

/** */
protected void validateException(Exception e) {
assertEquals("Deadlock report is expected",
TransactionDeadlockException.class, e.getCause().getCause().getClass());
}

/**
 * Tests if deadlock is resolved on timeout with correct message.
 *
 * @throws Exception If failed.
 */
public void testDeadlockUnblockedOnTimeout() throws Exception {
Ignite client = startGrid("client");

testDeadlockUnblockedOnTimeout0(client, ignite(0));
}

/**
 * Tests if deadlock is resolved on timeout with correct message.
 *
 * @param node1 First node.
 * @param node2 Second node.
 * @throws Exception If failed.
 */
private void testDeadlockUnblockedOnTimeout0(final Ignite node1, final Ignite node2) throws Exception {
final CountDownLatch l = new CountDownLatch(2);


[jira] [Commented] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-09-12 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162757#comment-16162757
 ] 

Alexei Scherbakov commented on IGNITE-6181:
---

TC run results: https://ci.ignite.apache.org/viewLog.html?buildId=822749;

Seems not OK, investigation is needed.

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.3

[jira] [Created] (IGNITE-6343) Index is not used properly if changing sort order.

2017-09-11 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6343:
-

 Summary: Index is not used properly if changing sort order.
 Key: IGNITE-6343
 URL: https://issues.apache.org/jira/browse/IGNITE-6343
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.0
Reporter: Alexei Scherbakov
 Fix For: 2.3


Unit test reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache;

import java.util.Calendar;
import java.util.Collections;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.cache.QueryIndex;
import org.apache.ignite.cache.QueryIndexType;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.MemoryConfiguration;
import org.apache.ignite.configuration.MemoryPolicyConfiguration;
import org.apache.ignite.internal.util.typedef.internal.U;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.TcpDiscoveryIpFinder;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;

import static org.apache.ignite.cache.CacheMode.PARTITIONED;
import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC;
import static java.util.Calendar.*;

/**
 * Tests index usage when the sort order of a query changes.
 */
public class GridCacheQueryIndexUsageSelfTest extends GridCommonAbstractTest {
/** */
private static final int GRID_CNT = 1;

/** */
private static final String CACHE_NAME = "A";

/** */
private static final CacheMode CACHE_MODE = PARTITIONED;

/** */
private static TcpDiscoveryIpFinder ipFinder = new TcpDiscoveryVmIpFinder(true);

/** {@inheritDoc} */
@Override protected void beforeTest() throws Exception {
startGridsMultiThreaded(GRID_CNT);
}

/** {@inheritDoc} */
@Override protected void afterTest() throws Exception {
stopAllGrids();
}

/** {@inheritDoc} */
@Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

MemoryPolicyConfiguration mpcfg = new MemoryPolicyConfiguration();
//mpcfg.setMaxSize(2 * 1024 * 1024 * 1024L);
mpcfg.setName("def");

MemoryConfiguration mcfg = new MemoryConfiguration();
mcfg.setDefaultMemoryPolicyName("def");
mcfg.setMemoryPolicies(mpcfg);

cfg.setMemoryConfiguration(mcfg);

TcpDiscoverySpi disco = new TcpDiscoverySpi();

disco.setIpFinder(ipFinder);

cfg.setDiscoverySpi(disco);

CacheConfiguration cacheCfg = defaultCacheConfiguration();

cacheCfg.setName(CACHE_NAME);
cacheCfg.setCacheMode(CACHE_MODE);
cacheCfg.setWriteSynchronizationMode(FULL_SYNC);

QueryEntity qe = new QueryEntity();
qe.setKeyType(Long.class.getName());
qe.setValueType(IndexedValue.class.getName());

LinkedHashMap<String, String> fields = U.newLinkedHashMap(3);

fields.put("id", Long.class.getName());
fields.put("startDate", Date.class.getName());
qe.setFields(fields);

QueryIndex idx = new QueryIndex();
idx.setIndexType(QueryIndexType.SORTED);
LinkedHashMap<String, Boolean> idxFields = U.newLinkedHashMap(3);

idxFields.put("startDate", Boolean.TRUE);

idx.setFields(idxFields);

qe.setIndexes(Collections.singleton(idx));

cacheCfg.setQueryEntities(Collections.singleton(qe));

cfg.setCacheConfiguration(cacheCfg);

return cfg;
}

/** */
public void testIndexUsageAscSort() {
testIndexUsage0(true);
}

/** */
public void testIndexUsageDescSort() {
  

[jira] [Updated] (IGNITE-6333) Improve cache transaction metrics to better understand transactions efficiency.

2017-09-11 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6333:
--
Description: 
I suggest adding:

1. Count of rollbacks due to transaction timeout. Depends on IGNITE-6181

2. Count of rollbacks due to deadlocks.

  was:
I suggest adding:

1. Count of rollbacks due to transaction timeout.

2. Count of rollbacks due to deadlocks.

1 depends on: IGNITE-6181

Summary: Improve cache transaction metrics to better understand 
transactions efficiency.  (was: Improve cache transaction metrics to understand 
transactions efficiency.)

> Improve cache transaction metrics to better understand transactions 
> efficiency.
> ---
>
> Key: IGNITE-6333
> URL: https://issues.apache.org/jira/browse/IGNITE-6333
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.0
>Reporter: Alexei Scherbakov
>  Labels: cache, newbie
> Fix For: 2.3
>
>
> I suggest adding:
> 1. Count of rollbacks due to transaction timeout. Depends on IGNITE-6181
> 2. Count of rollbacks due to deadlocks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-6333) Improve cache transaction metrics to understand transactions efficiency.

2017-09-11 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6333:
--
Description: 
I suggest adding:

1. Count of rollbacks due to transaction timeout.

2. Count of rollbacks due to deadlocks.

1 depends on: IGNITE-6181

  was:
I suggest adding:

1. Count of rollbacks due to transaction timeout.

2. Count of rollbacks due to deadlocks.


> Improve cache transaction metrics to understand transactions efficiency.
> 
>
> Key: IGNITE-6333
> URL: https://issues.apache.org/jira/browse/IGNITE-6333
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.0
>Reporter: Alexei Scherbakov
>  Labels: cache, newbie
> Fix For: 2.3
>
>
> I suggest adding:
> 1. Count of rollbacks due to transaction timeout.
> 2. Count of rollbacks due to deadlocks.
> 1 depends on: IGNITE-6181





[jira] [Created] (IGNITE-6333) Improve cache transaction metrics to understand transactions efficiency.

2017-09-11 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6333:
-

 Summary: Improve cache transaction metrics to understand 
transactions efficiency.
 Key: IGNITE-6333
 URL: https://issues.apache.org/jira/browse/IGNITE-6333
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.0
Reporter: Alexei Scherbakov
 Fix For: 2.3


I suggest adding:

1. Count of rollbacks due to transaction timeout.

2. Count of rollbacks due to deadlocks.
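
A minimal sketch of what the two proposed counters could look like, assuming
they are kept as simple thread-safe counters. TxRollbackMetrics and its method
names are hypothetical and are not part of the Ignite metrics API.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical holder for the two proposed rollback counters. */
public class TxRollbackMetrics {
    /** Rollbacks caused by a transaction timeout. */
    private final LongAdder rollbacksOnTimeout = new LongAdder();

    /** Rollbacks caused by a detected deadlock. */
    private final LongAdder rollbacksOnDeadlock = new LongAdder();

    /** Called when a transaction is rolled back because its timeout expired. */
    public void onTimeoutRollback() {
        rollbacksOnTimeout.increment();
    }

    /** Called when a transaction is rolled back because a deadlock was detected. */
    public void onDeadlockRollback() {
        rollbacksOnDeadlock.increment();
    }

    /** @return Number of rollbacks due to transaction timeout. */
    public long timeoutRollbacks() {
        return rollbacksOnTimeout.sum();
    }

    /** @return Number of rollbacks due to deadlocks. */
    public long deadlockRollbacks() {
        return rollbacksOnDeadlock.sum();
    }

    public static void main(String[] args) {
        TxRollbackMetrics metrics = new TxRollbackMetrics();

        metrics.onTimeoutRollback();
        metrics.onTimeoutRollback();
        metrics.onDeadlockRollback();

        System.out.println("timeout=" + metrics.timeoutRollbacks()
            + ", deadlock=" + metrics.deadlockRollbacks());
    }
}
```

LongAdder is used here because such counters are typically updated from many
threads and read rarely; an AtomicLong would work equally well for the sketch.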





[jira] [Commented] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-09-11 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161032#comment-16161032
 ] 

Alexei Scherbakov commented on IGNITE-6181:
---

Found a proper way to implement the proposed solution from 1.

TC is in progress.

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.3

[jira] [Commented] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-09-11 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160868#comment-16160868
 ] 

Alexei Scherbakov commented on IGNITE-6181:
---

TC results look acceptable:
https://ci.ignite.apache.org/viewLog.html?buildId=820550


[jira] [Commented] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-09-10 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160360#comment-16160360
 ] 

Alexei Scherbakov commented on IGNITE-6181:
---

It looks like the suggestion to get rid of threadMap will not work, because there 
is no strict requirement to call the close() method when working with the tx API.
For example, the JTA integration does not assume that close() is called.

I reverted my changes and added a set of thread IDs to track timeout events and 
prevent subsequent cache operations after the timeout.
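
A minimal sketch of the thread-ID tracking idea, in plain Java. The class and method names (TimedOutThreadsSketch, onTimeout, checkBeforeCacheOp, clear) are hypothetical and are not Ignite's actual API; the point at which the marker is cleared is also an assumption.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: remembers which threads had their tx timed out. */
class TimedOutThreadsSketch {
    /** IDs of threads whose transaction was rolled back by the timeout handler. */
    private final Set<Long> timedOutThreads = ConcurrentHashMap.newKeySet();

    /** Called by the timeout handler for the thread that owns the tx. */
    void onTimeout(long threadId) {
        timedOutThreads.add(threadId);
    }

    /** Called at the start of every cache operation on the calling thread. */
    void checkBeforeCacheOp() {
        if (timedOutThreads.contains(Thread.currentThread().getId()))
            throw new IllegalStateException("Transaction timed out, cache operation rejected");
    }

    /** Assumption: the marker is cleared when the owner rolls back or starts a new tx. */
    boolean clear(long threadId) {
        return timedOutThreads.remove(threadId);
    }
}
```

Because the marker is keyed by thread ID and not by a transaction object, it survives even if the application never calls close(), which is the case the comment raises for JTA.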

TC in progress.


[jira] [Commented] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-09-10 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160307#comment-16160307
 ] 

Alexei Scherbakov commented on IGNITE-6181:
---

1. Done.
2. Added a test with the smallest possible timeout value. Everything looks OK.
3. Fixed.
4. Done.
5. Done.
6. Done.

TC is in progress; I'll update the issue as soon as I get the results.


[jira] [Commented] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-09-05 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153670#comment-16153670
 ] 

Alexei Scherbakov commented on IGNITE-6181:
---

Sam, thanks for the review.

1. Removed.
2. Missed that. Fixed.
3. Removed. It is probably better to leave it to the application side.
4. Because timeouts < 100 ms make no sense, and to prevent races between tx init 
and a concurrent tx timeout.
5. Consider a scenario:
A thread started a tx with a timeout.
Tx execution reached the timeout point and the tx was removed by the timeout handler.
Tx execution continued.
Expected behavior: all subsequent operations within the current transaction must 
fail; otherwise they will execute as an implicit transaction (a new tx will start 
because the context is empty), which is incorrect.
To fix it, I delayed the threadMap cleanup so that the timed-out state is not lost 
until the tx close method is called, which in its turn calls onLocalClose and 
removes the map entry completely.
6. This is needed because threadMap may contain an uncleared entry for a timed-out 
transaction, as described in 5. In such a case the creation of a new explicit 
transaction is allowed.
7. I removed the block that checked the timeout state because it is no longer 
needed. A transaction will be rolled back by the timeout handler.
8, 9, 10. Fixed.
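
The behavior described in items 5 and 6 can be sketched in plain Java as follows. This is only an illustration under stated assumptions: ThreadMapSketch, Tx, txStart, onTimeout, put and close are hypothetical names, and the real threadMap and onLocalClose logic in Ignite is more involved.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of a per-thread tx map whose cleanup is delayed until close. */
class ThreadMapSketch {
    /** Hypothetical tx handle; only the timed-out flag matters here. */
    static final class Tx {
        volatile boolean timedOut;
    }

    /** Thread ID to the transaction started by that thread. */
    private final ConcurrentMap<Long, Tx> threadMap = new ConcurrentHashMap<>();

    /** Starts an explicit tx; a stale timed-out entry does not block it (item 6). */
    Tx txStart() {
        long id = Thread.currentThread().getId();
        Tx prev = threadMap.get(id);

        if (prev != null && !prev.timedOut)
            throw new IllegalStateException("Transaction already started by this thread");

        Tx tx = new Tx();
        threadMap.put(id, tx);
        return tx;
    }

    /** Timeout handler: marks the tx but keeps the map entry (item 5). */
    void onTimeout(Tx tx) {
        tx.timedOut = true;
    }

    /** Cache operation: fails inside a timed-out tx, never falls back to an implicit tx. */
    void put(Object key, Object val) {
        Tx tx = threadMap.get(Thread.currentThread().getId());

        if (tx != null && tx.timedOut)
            throw new IllegalStateException("Transaction timed out");

        // The actual write (explicit or implicit tx) is omitted in this sketch.
    }

    /** Close removes the entry completely, but only if it still belongs to this tx. */
    void close(Tx tx) {
        threadMap.remove(Thread.currentThread().getId(), tx);
    }

    /** Whether the calling thread still has an entry in the map. */
    boolean hasEntry() {
        return threadMap.containsKey(Thread.currentThread().getId());
    }
}
```

The two-argument remove in close() keeps a late close of the timed-out transaction from discarding the entry of a newer transaction started on the same thread.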



[jira] [Resolved] (IGNITE-5385) Get rid of discovery custom message on exchange completion

2017-08-29 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov resolved IGNITE-5385.
---
Resolution: Won't Fix

Fixed as a part of exchange optimizations for 2.1.4 by [~sboikov]

> Get rid of discovery custom message on exchange completion
> --
>
> Key: IGNITE-5385
> URL: https://issues.apache.org/jira/browse/IGNITE-5385
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.0
>Reporter: Yakov Zhdanov
>Assignee: Alexei Scherbakov
>Priority: Blocker
>  Labels: performance
> Fix For: 2.3
>
>
> Currently, if late affinity assignment is on, we send the full partition map as 
> a custom message to make sure all nodes get it. With a greater number of nodes 
> and caches this can cause significant slowdowns.
> We suggest moving the sending to communication. In this case the scenario with 
> coordinator failure requires special handling, since some nodes may receive the 
> full map, complete the exchange and proceed with cache operations, while others 
> may not have received the full map yet. In this case the full map should be 
> resent from the new coordinator: it should be recalculated if no node has 
> received one from the former coordinator, or requested from one of the lucky 
> receivers and forwarded to the other nodes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6203) Valid query cannot be processed.

2017-08-28 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6203:
-

 Summary: Valid query cannot be processed.
 Key: IGNITE-6203
 URL: https://issues.apache.org/jira/browse/IGNITE-6203
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Alexei Scherbakov
 Fix For: 2.3


Query: select * from Integer where _KEY=? and false
Exception:
{noformat}
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to bind parameter [idx=1, obj=1, stmt=prep4: SELECT
__Z0._KEY,
__Z0._VAL
FROM "default".INTEGER __Z0
WHERE FALSE]
    at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.bindObject(IgniteH2Indexing.java:515)
    at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.bindParameters(IgniteH2Indexing.java:1048)
    at org.apache.ignite.internal.processors.query.h2.sql.GridSqlQuerySplitter.optimize(GridSqlQuerySplitter.java:1628)
    at org.apache.ignite.internal.processors.query.h2.sql.GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:220)
    at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1336)
    ... 18 more
Caused by: org.h2.jdbc.JdbcSQLException: Invalid value "1" for parameter "parameterIndex" [90008-195]
...
{noformat}


{noformat}
public void test() throws Exception {
    try {
        Ignite ignite = startGrid();

        SqlFieldsQuery qry = new SqlFieldsQuery("select * from Integer where _KEY=? and false");

        qry.setArgs(1);

        FieldsQueryCursor query = ignite.cache(DEFAULT_CACHE_NAME).query(qry);

        System.out.println(query.getAll());
    } finally {
        stopAllGrids();
    }
}
{noformat}





[jira] [Updated] (IGNITE-6200) org.dom4j.QName can't be serialized

2017-08-28 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6200:
--
Affects Version/s: 2.1
Fix Version/s: 2.3

> org.dom4j.QName can't be serialized
> ---
>
> Key: IGNITE-6200
> URL: https://issues.apache.org/jira/browse/IGNITE-6200
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
> Fix For: 2.3
>
>





[jira] [Created] (IGNITE-6200) org.dom4j.QName can't be serialized

2017-08-28 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6200:
-

 Summary: org.dom4j.QName can't be serialized
 Key: IGNITE-6200
 URL: https://issues.apache.org/jira/browse/IGNITE-6200
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


Exception:

{noformat}
class org.apache.ignite.binary.BinaryObjectException: Failed to marshal object with optimized marshaller: org.dom4j.QName@364492 [name: test namespace: "org.dom4j.Namespace@e20 [Namespace: prefix qq mapped to URI ""]"]
    at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:186)
    at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:147)
    at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:134)
    at org.apache.ignite.internal.binary.GridBinaryMarshaller.marshal(GridBinaryMarshaller.java:248)
    at org.apache.ignite.internal.binary.BinaryMarshaller.marshal0(BinaryMarshaller.java:82)
    at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:58)
    at org.apache.ignite.internal.processors.cache.MarshallerTest.test(MarshallerTest.java:160)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000)
    at org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
    at org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915)
    at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to serialize object: org.dom4j.QName@364492 [name: test namespace: "org.dom4j.Namespace@e20 [Namespace: prefix qq mapped to URI ""]"]
    at org.apache.ignite.internal.marshaller.optimized.OptimizedMarshaller.marshal0(OptimizedMarshaller.java:206)
    at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:58)
    at org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:9836)
    at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:179)
    ... 15 more
Caused by: java.io.IOException: java.io.NotActiveException: Not in writeObject() call.
    at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeSerializable(OptimizedObjectOutputStream.java:324)
    at org.apache.ignite.internal.marshaller.optimized.OptimizedClassDescriptor.write(OptimizedClassDescriptor.java:827)
    at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeObject0(OptimizedObjectOutputStream.java:224)
    at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeObjectOverride(OptimizedObjectOutputStream.java:152)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344)
    at org.apache.ignite.internal.marshaller.optimized.OptimizedMarshaller.marshal0(OptimizedMarshaller.java:201)
    ... 18 more
Caused by: java.io.NotActiveException: Not in writeObject() call.
    at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.defaultWriteObject(OptimizedObjectOutputStream.java:684)
    at org.dom4j.QName.writeObject(QName.java:239)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectOutputStream.writeSerializable(OptimizedObjectOutputStream.java:318)
    ... 23 more
{noformat}

Reproducer:

{noformat}
public void test() throws Exception {
    try {
        IgniteEx ex = startGrid(0);

        QName qName = new QName("test", new Namespace("qq", null), "q");

        byte[] marshal = ex.configuration().getMarshaller().marshal(qName);
    } finally {
        stopAllGrids();
    }
}
{noformat}
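
For context, the stack trace shows a custom writeObject hook (QName.writeObject) calling defaultWriteObject() on the stream. The sketch below uses only the JDK's own java.io.ObjectOutputStream, not Ignite's optimized marshaller, to show where that call is legal and where the JDK itself raises NotActiveException. The Name class is a hypothetical stand-in, not dom4j's QName.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotActiveException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Sketch of the standard JDK contract around defaultWriteObject(). */
class DefaultWriteObjectSketch {
    /** Hypothetical class with a custom serialization hook. */
    static class Name implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String name;

        Name(String name) {
            this.name = name;
        }

        /** Invoked reflectively by the stream; defaultWriteObject() is legal here. */
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }
    }

    /** Serializes the object with the plain JDK stream and returns the byte count. */
    static int serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        return bytes.size();
    }

    /** Calls defaultWriteObject() outside any writeObject hook. */
    static boolean failsOutsideHook() {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.defaultWriteObject();
            return false;
        } catch (NotActiveException e) {
            // The JDK stream rejects the call because no object is being written.
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the JDK stream the hook succeeds, so the failure in the trace appears to come from how the optimized stream tracks the "currently written object" context; that reading is an inference from the trace, not something the issue states.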






[jira] [Updated] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-28 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6181:
--
Issue Type: Improvement  (was: Bug)

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.3
>
>
> Unit test reproducer:
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.cache;
>
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteException;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.configuration.TransactionConfiguration;
> import org.apache.ignite.internal.IgniteInternalFuture;
> import org.apache.ignite.internal.util.typedef.internal.U;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
> import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
> import org.apache.ignite.transactions.Transaction;
> import org.apache.ignite.transactions.TransactionConcurrency;
> import org.apache.ignite.transactions.TransactionIsolation;
>
> /**
>  * Tests ability to rollback not properly closed transaction.
>  */
> public class IgniteTxTimeoutTest extends GridCommonAbstractTest {
>     /** */
>     private static final long TX_TIMEOUT = 3_000L;
>
>     /** */
>     private static final String CACHE_NAME = "test";
>
>     /** IP finder. */
>     private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true);
>
>     /** */
>     private final CountDownLatch l = new CountDownLatch(1);
>
>     /** */
>     private final Object mux = new Object();
>
>     /** {@inheritDoc} */
>     @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
>         IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);
>
>         cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(IP_FINDER));
>
>         TransactionConfiguration txCfg = new TransactionConfiguration();
>         txCfg.setDefaultTxTimeout(TX_TIMEOUT);
>
>         cfg.setTransactionConfiguration(txCfg);
>
>         CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);
>         ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>
>         cfg.setCacheConfiguration(ccfg);
>
>         return cfg;
>     }
>
>     /** */
>     public void testTxTimeoutHandling() throws Exception {
>         try {
>             final Ignite ignite = startGrid(0);
>
>             final AtomicBoolean released = new AtomicBoolean();
>
>             multithreadedAsync(new Runnable() {
>                 @Override public void run() {
>                     // Start tx with default settings.
>                     try (Transaction tx = ignite.transactions().txStart()) {
>                         ignite.cache(CACHE_NAME).put(1, 1);
>
>                         l.countDown();
>
>                         // Wait longer than default timeout.
>                         synchronized (mux) {
>                             while (!released.get()) {
>                                 try {
>                                     mux.wait();
>                                 }
>                                 catch (InterruptedException e) {
>                                     throw new IgniteException(e);
>                                 }
>                             }
>                         }
>
>                         try {
>                             tx.commit();
>
>                             fail();
>                         }
>                         catch 

[jira] [Commented] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-28 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143596#comment-16143596
 ] 

Alexei Scherbakov commented on IGNITE-6181:
---

Proposed fix: associate a TimeoutObject, containing a cleanup closure, with the
transaction on the initiator node.

The TimeoutObject must be removed once the transaction reaches the PREPARE state.

If the timeout closure is triggered, the tx state must be atomically switched to
rollback-only, preventing further access.
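
The proposal above can be sketched in plain Java. This is only an illustration
under stated assumptions, not Ignite's implementation: the class
TxTimeoutSketch, its State enum, and the use of a ScheduledExecutorService as
the timer are hypothetical stand-ins for the real transaction and TimeoutObject
machinery. It tries to show the two rules from the proposal: the timeout task
is cancelled once the tx enters PREPARE, and a fired timeout switches the state
to rollback-only with a single compare-and-set.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a transaction on the initiator node. */
final class TxTimeoutSketch {
    /** Simplified states; real transactions have more. */
    enum State { ACTIVE, PREPARING, ROLLBACK_ONLY }

    private final AtomicReference<State> state = new AtomicReference<>(State.ACTIVE);

    /** Cleanup closure (e.g. releasing locks); supplied by the caller. */
    private final Runnable cleanup;

    /** Plays the role of the timeout object associated with this tx. */
    private final ScheduledFuture<?> timeoutObj;

    TxTimeoutSketch(ScheduledExecutorService timer, long timeoutMs, Runnable cleanup) {
        this.cleanup = cleanup;

        Runnable task = this::onTimeout;

        this.timeoutObj = timer.schedule(task, timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Invoked by the timer. Wins only if the tx is still ACTIVE. */
    void onTimeout() {
        if (state.compareAndSet(State.ACTIVE, State.ROLLBACK_ONLY))
            cleanup.run();
    }

    /** @return true if the tx entered PREPARE, false if it was already rollback-only. */
    boolean prepare() {
        if (!state.compareAndSet(State.ACTIVE, State.PREPARING))
            return false;

        // The timeout object is no longer needed once PREPARE has started.
        timeoutObj.cancel(false);

        return true;
    }

    State state() {
        return state.get();
    }
}
```

Because both transitions start from ACTIVE and go through the same
compare-and-set, exactly one of "timeout fired" and "prepare started" can win,
which is the atomic switch the proposal asks for.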



> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.3

[jira] [Assigned] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-28 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov reassigned IGNITE-6181:
-

Assignee: Alexei Scherbakov

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.3

[jira] [Comment Edited] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-25 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141996#comment-16141996
 ] 

Alexei Scherbakov edited comment on IGNITE-6181 at 8/25/17 6:54 PM:


"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly what needs to be fixed.

If a transaction is started and then abandoned for some reason (e.g., an
exception in the middle, with the transaction not wrapped in a
try-with-resources block), it will block all subsequent transactions (those
without timeouts) that try to acquire locks held by the first transaction, and
it will block the exchange process.

I suggest associating a TimeoutObject with every transaction that has a
non-zero timeout.

Reopening the issue.
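
The abandonment scenario described above can be illustrated without Ignite.
The sketch below is an assumption-laden analogy, not Ignite code: Tx is a
hypothetical AutoCloseable that holds one key lock (modelled by a
java.util.concurrent.Semaphore with a single permit) from start until close().
It contrasts an exception escaping an unguarded transaction with the same
exception inside a try-with-resources block.

```java
import java.util.concurrent.Semaphore;

final class AbandonedTxSketch {
    /** Hypothetical transaction holding a single key lock until closed. */
    static final class Tx implements AutoCloseable {
        private final Semaphore keyLock;

        Tx(Semaphore keyLock) {
            this.keyLock = keyLock;

            // Lock is taken when the tx touches the key, as in a pessimistic tx.
            keyLock.acquireUninterruptibly();
        }

        /** Closing the tx (rollback) releases the lock. */
        @Override public void close() {
            keyLock.release();
        }
    }

    /** The exception escapes before close(): the lock stays held. */
    static void abandoned(Semaphore keyLock) {
        Tx tx = new Tx(keyLock);

        throw new IllegalStateException("failure in the middle of the tx");
    }

    /** try-with-resources closes the tx even though the body throws. */
    static void guarded(Semaphore keyLock) {
        try (Tx tx = new Tx(keyLock)) {
            throw new IllegalStateException("failure in the middle of the tx");
        }
    }
}
```

After abandoned(...) the permit is never returned, so any later acquirer that
waits without a timeout would wait indefinitely; after guarded(...) the permit
is available again.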


was (Author: ascherbakov):
"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly what needs to be fixed.

If a transaction is started and then abandoned for some reason (e.g., an
exception in the middle, with the transaction not wrapped in a
try-with-resources block), it will block all subsequent transactions (those
without timeouts) that try to acquire locks held by the hanging thread, and it
will block the exchange process.

I suggest associating a TimeoutObject with every transaction that has a
non-zero timeout.

Reopening the issue.

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
> Fix For: 2.3

[jira] [Comment Edited] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-25 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141996#comment-16141996
 ] 

Alexei Scherbakov edited comment on IGNITE-6181 at 8/25/17 6:52 PM:


"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly what needs to be fixed.

If a transaction is started and then abandoned for some reason (e.g., an
exception in the middle, with the transaction not wrapped in a
try-with-resources block), it will block all other transactions (those without
timeouts) that try to acquire locks held by the hanging thread, and it will
block the exchange process.

I suggest associating a TimeoutObject with every transaction that has a
non-zero timeout.

Reopening the issue.


was (Author: ascherbakov):
"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly what needs to be fixed.

If a transaction is started and then abandoned for some reason (e.g., an
exception in the middle, with the transaction not wrapped in a
try-with-resources block) without being properly closed, it will block all
other transactions (those without timeouts) that try to acquire locks held by
the hanging thread, and it will block the exchange process.

I suggest associating a TimeoutObject with every transaction that has a
non-zero timeout.

Reopening the issue.

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
> Fix For: 2.3

[jira] [Comment Edited] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-25 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141996#comment-16141996
 ] 

Alexei Scherbakov edited comment on IGNITE-6181 at 8/25/17 6:51 PM:


"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly what needs to be fixed.

If a transaction is started and then abandoned for some reason (e.g., an
exception in the middle, with the transaction not wrapped in a
try-with-resources block) without being properly closed, it will block all
other transactions (those without timeouts) that try to acquire locks held by
the hanging thread, and it will block the exchange process.

I suggest associating a TimeoutObject with every transaction that has a
non-zero timeout.

Reopening the issue.


was (Author: ascherbakov):
"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly what needs to be fixed.

If a transaction is started and then abandoned for some reason (e.g., an
exception in the middle) without being properly closed, it will block all other
transactions (those without timeouts) that try to acquire locks held by the
hanging thread, and it will block the exchange process.

I suggest associating a TimeoutObject with every transaction that has a
non-zero timeout.

Reopening the issue.

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
> Fix For: 2.3

[jira] [Updated] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-25 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-6181:
--
Description: 
Unit test reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.cache;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteException;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.util.typedef.internal.U;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

/**
 * Tests ability to rollback not properly closed transaction.
 */
public class IgniteTxTimeoutTest extends GridCommonAbstractTest {
    /** */
    private static final long TX_TIMEOUT = 3_000L;

    /** */
    private static final String CACHE_NAME = "test";

    /** IP finder. */
    private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true);

    /** */
    private final CountDownLatch l = new CountDownLatch(1);

    /** */
    private final Object mux = new Object();

    /** {@inheritDoc} */
    @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

        cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(IP_FINDER));

        TransactionConfiguration txCfg = new TransactionConfiguration();
        txCfg.setDefaultTxTimeout(TX_TIMEOUT);

        cfg.setTransactionConfiguration(txCfg);

        CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);
        ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

        cfg.setCacheConfiguration(ccfg);

        return cfg;
    }

    /** */
    public void testTxTimeoutHandling() throws Exception {
        try {
            final Ignite ignite = startGrid(0);

            final AtomicBoolean released = new AtomicBoolean();

            multithreadedAsync(new Runnable() {
                @Override public void run() {
                    // Start tx with default settings.
                    try (Transaction tx = ignite.transactions().txStart()) {
                        ignite.cache(CACHE_NAME).put(1, 1);

                        l.countDown();

                        // Wait longer than default timeout.
                        synchronized (mux) {
                            while (!released.get()) {
                                try {
                                    mux.wait();
                                }
                                catch (InterruptedException e) {
                                    throw new IgniteException(e);
                                }
                            }
                        }

                        try {
                            tx.commit();

                            fail();
                        }
                        catch (IgniteException e) {
                            // Expect exception - tx is rolled back.
                        }
                    }
                }
            }, 1, "Locker");

            IgniteInternalFuture fut2 = multithreadedAsync(new Runnable() {
                @Override public void run() {
                    U.awaitQuiet(l);

                    // Try to acquire lock.
                    // Acquisition will succeed once the first transaction is rolled back after the timeout.
                    try (Transaction tx = ignite.transactions().txStart(TransactionConcurrency.PESSIMISTIC,

[jira] [Reopened] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-25 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov reopened IGNITE-6181:
---

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
> Fix For: 2.3

[jira] [Commented] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-25 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141996#comment-16141996
 ] 

Alexei Scherbakov commented on IGNITE-6181:
---

"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly what needs to be fixed.

If a transaction is started and then abandoned for some reason (e.g. an 
exception in the middle) without being properly closed, it will block all other 
transactions (those without timeouts) that try to acquire locks held by the 
hanging thread, and it will block the exchange process.

I suggest associating a TimeoutObject with each transaction that has a non-zero 
timeout.

Reopening the issue.
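
The suggestion above can be illustrated with a minimal sketch that uses only java.util.concurrent, not Ignite internals. SketchTx, its State enum and the timer wiring are hypothetical names introduced for this example; the point is only that a timer-driven object can roll the transaction back even when the owning thread never touches it again.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a transaction; this is not an Ignite class. */
final class SketchTx {
    enum State { ACTIVE, COMMITTED, ROLLED_BACK }

    private final AtomicReference<State> state = new AtomicReference<>(State.ACTIVE);

    /** Timeout task, or null when the transaction has no timeout. */
    private final ScheduledFuture<?> timeoutTask;

    SketchTx(ScheduledExecutorService timer, long timeoutMs) {
        // Only transactions with a non-zero timeout get a timeout object.
        timeoutTask = timeoutMs > 0
            ? timer.schedule(this::onTimeout, timeoutMs, TimeUnit.MILLISECONDS)
            : null;
    }

    /** Runs on the timer thread, independently of the thread that owns the tx. */
    private void onTimeout() {
        state.compareAndSet(State.ACTIVE, State.ROLLED_BACK);
    }

    /** Commits if still active; fails if the timeout already rolled the tx back. */
    void commit() {
        if (!state.compareAndSet(State.ACTIVE, State.COMMITTED))
            throw new IllegalStateException("Transaction is " + state.get());

        if (timeoutTask != null)
            timeoutTask.cancel(false);
    }

    State state() {
        return state.get();
    }
}
```

In this sketch an abandoned transaction ends up in ROLLED_BACK once the timer fires, and a late commit() throws, which matches what the reproducer expects from tx.commit(). Releasing locks on rollback is left out.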

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
> Fix For: 2.3
>
>
> Unit test reproducer:
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.cache;
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteException;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.configuration.TransactionConfiguration;
> import org.apache.ignite.internal.IgniteInternalFuture;
> import org.apache.ignite.internal.util.typedef.internal.U;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
> import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
> import org.apache.ignite.transactions.Transaction;
> import org.apache.ignite.transactions.TransactionConcurrency;
> import org.apache.ignite.transactions.TransactionIsolation;
> /**
>  * Tests ability to rollback not properly closed transaction.
>  */
> public class IgniteTxTimeoutTest extends GridCommonAbstractTest {
>     /** */
>     private static final long TX_TIMEOUT = 3_000L;
>     /** */
>     private static final String CACHE_NAME = "test";
>     /** IP finder. */
>     private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true);
>     /** */
>     private final CountDownLatch l = new CountDownLatch(1);
>     /** */
>     private final Object mux = new Object();
>     /** {@inheritDoc} */
>     @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
>         IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);
>         cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(IP_FINDER));
>         TransactionConfiguration txCfg = new TransactionConfiguration();
>         txCfg.setDefaultTxTimeout(TX_TIMEOUT);
>         cfg.setTransactionConfiguration(txCfg);
>         CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);
>         ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>         cfg.setCacheConfiguration(ccfg);
>         return cfg;
>     }
>     /** */
>     public void testTxTimeoutHandling() throws Exception {
>         try {
>             final Ignite ignite = startGrid();
>             final AtomicBoolean released = new AtomicBoolean();
>             multithreadedAsync(new Runnable() {
>                 @Override public void run() {
>                     // Start tx with default settings.
>                     try (Transaction tx = ignite.transactions().txStart()) {
>                         ignite.cache(CACHE_NAME).put(1, 1);
>                         l.countDown();
>                         // Wait longer than default timeout.
>                         synchronized (mux) {
>                             while (!released.get()) {
>   

[jira] [Created] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-24 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-6181:
-

 Summary: Tx is not rolled back on timeout leading to potential 
whole grid hang
 Key: IGNITE-6181
 URL: https://issues.apache.org/jira/browse/IGNITE-6181
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Alexei Scherbakov
 Fix For: 2.3


Unit test reproducer:

{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.cache;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteException;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.util.typedef.internal.U;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

/**
 * Tests ability to rollback not properly closed transaction.
 */
public class IgniteTxTimeoutTest extends GridCommonAbstractTest {
    /** */
    private static final long TX_TIMEOUT = 3_000L;

    /** */
    private static final String CACHE_NAME = "test";

    /** IP finder. */
    private static final TcpDiscoveryVmIpFinder IP_FINDER = new TcpDiscoveryVmIpFinder(true);

    /** */
    private final CountDownLatch l = new CountDownLatch(1);

    /** */
    private final Object mux = new Object();

    /** {@inheritDoc} */
    @Override protected IgniteConfiguration getConfiguration(String igniteInstanceName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);

        cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(IP_FINDER));

        TransactionConfiguration txCfg = new TransactionConfiguration();
        txCfg.setDefaultTxTimeout(TX_TIMEOUT);

        cfg.setTransactionConfiguration(txCfg);

        CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);
        ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

        cfg.setCacheConfiguration(ccfg);

        return cfg;
    }

    /** */
    public void testTxTimeoutHandling() throws Exception {
        try {
            final Ignite ignite = startGrid();

            final AtomicBoolean released = new AtomicBoolean();

            multithreadedAsync(new Runnable() {
                @Override public void run() {
                    // Start tx with default settings.
                    try (Transaction tx = ignite.transactions().txStart()) {
                        ignite.cache(CACHE_NAME).put(1, 1);

                        l.countDown();

                        // Wait longer than default timeout.
                        synchronized (mux) {
                            while (!released.get()) {
                                try {
                                    mux.wait();
                                }
                                catch (InterruptedException e) {
                                    throw new IgniteException(e);
                                }
                            }
                        }

                        try {
                            tx.commit();

                            fail();
                        }
                        catch (IgniteException e) {
                            // Expect exception - tx is rolled back.
                        }
                    }
                }
            }, 1, "Locker");

            IgniteInternalFuture fut2 = multithreadedAsync(new Runnable() {
                @Override public void run() {
                    U.awaitQuiet(l);

                    // Try to acquire lock.
                    // Acquisition will be 

[jira] [Resolved] (IGNITE-5093) Better heap usage during exchange on large topologies and cache numbers/partitions.

2017-07-24 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov resolved IGNITE-5093.
---
Resolution: Duplicate

Major fixes were done in other tickets, including:

1. store difference with topology version

2. GridPartitionStateMap

> Better heap usage during exchange on large topologies and cache 
> numbers/partitions.
> ---
>
> Key: IGNITE-5093
> URL: https://issues.apache.org/jira/browse/IGNITE-5093
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.2
>
>
> I observed huge heap usage on a large grid installation with 136 
> nodes/1k caches.
> Example from a machine with a 64g heap:
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     897287977    43069822896  java.util.HashMap$Node
>    2:       9273162    14866180592  [Ljava.util.HashMap$Node;
>    3:     201282292     4830775008  java.lang.Integer
>    4:       6247215      983811096  [Ljava.lang.Object;
>    5:       3383402      767741664  [C
>    6:         12188      669411952  [B
>    7:       9923859      635126976  java.util.HashMap
> ...
> {noformat}
> Further investigation showed that the heap is polluted during the exchange 
> process, which creates many hash maps that occupy large amounts of 
> memory.
> Proposal: use other data structures to help keep heap usage low.
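
As an illustration of the proposal (a sketch only, not the actual GridPartitionStateMap code), a partition-to-state mapping can be packed into a java.util.BitSet instead of a HashMap<Integer, State>. That avoids one HashMap$Node and usually one boxed Integer per entry, the two classes that dominate the histogram. The class name, the list of states and the 3-bit encoding below are assumptions made for the example.

```java
import java.util.BitSet;

/** Hypothetical compact map from partition id to state, 3 bits per partition. */
final class CompactPartStateMap {
    /** Assumed set of partition states, for illustration only. */
    enum State { MOVING, OWNING, RENTING, EVICTED, LOST }

    /** Bits per partition: code 0 means "no mapping", codes 1..5 are states. */
    private static final int BITS = 3;

    private final BitSet bits = new BitSet();

    private int size;

    /** Stores the state for a partition and returns the previous state, or null. */
    State put(int part, State st) {
        State old = get(part);

        int code = st.ordinal() + 1;

        for (int i = 0; i < BITS; i++)
            bits.set(part * BITS + i, (code & (1 << i)) != 0);

        if (old == null)
            size++;

        return old;
    }

    /** Returns the state stored for a partition, or null if there is none. */
    State get(int part) {
        int code = 0;

        for (int i = 0; i < BITS; i++) {
            if (bits.get(part * BITS + i))
                code |= 1 << i;
        }

        return code == 0 ? null : State.values()[code - 1];
    }

    /** Number of partitions that have a state. */
    int size() {
        return size;
    }
}
```

For 1024 partitions this needs roughly 3 * 1024 bits (about 384 bytes of payload) per map, with no per-entry objects. The trade-off is that keys must be small non-negative integers.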



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-5709) Node stopped on OutOfMemoryException with persistence

2017-07-07 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5709:
--
Affects Version/s: (was: 2.1)
   2.2

> Node stopped on OutOfMemoryException with persistence
> -
>
> Key: IGNITE-5709
> URL: https://issues.apache.org/jira/browse/IGNITE-5709
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.2
>Reporter: Alexander Belyak
>Priority: Critical
>
> Under long heavy (100%) load, a node with persistence configured can stop with an 
> "org.apache.ignite.internal.mem.OutOfMemoryException: Failed to find a page 
> for eviction" exception. In my test it failed after 23 hours of 100% load while 
> expiring outdated entries (by CreatedExpiryPolicy).





[jira] [Commented] (IGNITE-5385) Get rid of discovery custom message on exchange completion

2017-06-26 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063853#comment-16063853
 ] 

Alexei Scherbakov commented on IGNITE-5385:
---

Implemented a procedure in which a new coordinator fetches ready assignments from 
other nodes and finishes stale exchanges (if any) with the same assignment.

Working on tests.

> Get rid of discovery custom message on exchange completion
> --
>
> Key: IGNITE-5385
> URL: https://issues.apache.org/jira/browse/IGNITE-5385
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.0
>Reporter: Yakov Zhdanov
>Assignee: Alexei Scherbakov
>Priority: Blocker
>  Labels: performance
> Fix For: 2.2
>
>
> Currently, if late affinity assignment is on, we send the full partition map as a 
> custom message to make sure all nodes get it. With a greater number of nodes 
> and caches this can cause significant slowdowns.
> We suggest moving the sending to communication. In this case the scenario with 
> coordinator failure requires special handling, since some nodes 
> may receive the full map, complete the exchange and proceed with cache operations, 
> while others may not have received the full map yet. In this case the full map should be 
> resent from the new coordinator: it should be recalculated if no node has received 
> one from the former coordinator, or it should be requested from one of the lucky 
> receivers and forwarded to the other nodes.
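
The recovery rule described above could be sketched as follows. This is a self-contained illustration, not Ignite code: FullMapRecovery, Plan and the per-node Optional reports are hypothetical names, and the real exchange protocol carries far more state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical decision logic for a new coordinator after the old one failed. */
final class FullMapRecovery {
    /** Outcome: the full map to use and the nodes that still have to receive it. */
    static final class Plan<M> {
        final M fullMap;
        final List<String> resendTo;
        final boolean recalculated;

        Plan(M fullMap, List<String> resendTo, boolean recalculated) {
            this.fullMap = fullMap;
            this.resendTo = resendTo;
            this.recalculated = recalculated;
        }
    }

    /**
     * @param reports Per node id: the full map received from the former coordinator, if any.
     * @param recalculate Fallback used only when no node has received a full map.
     */
    static <M> Plan<M> plan(Map<String, Optional<M>> reports, Supplier<M> recalculate) {
        M found = null;
        List<String> missing = new ArrayList<>();

        for (Map.Entry<String, Optional<M>> e : reports.entrySet()) {
            if (e.getValue().isPresent()) {
                // A "lucky receiver": reuse its copy (the first one found).
                if (found == null)
                    found = e.getValue().get();
            }
            else
                missing.add(e.getKey());
        }

        boolean recalc = found == null;

        return new Plan<>(recalc ? recalculate.get() : found, missing, recalc);
    }
}
```

Reusing a copy that some node has already applied keeps every node on the same assignment, since nodes that completed the exchange have already acted on it; recalculation happens only when nobody did.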





[jira] [Updated] (IGNITE-5385) Get rid of discovery custom message on exchange completion

2017-06-15 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5385:
--
Fix Version/s: (was: 2.1)
   2.2

> Get rid of discovery custom message on exchange completion
> --
>
> Key: IGNITE-5385
> URL: https://issues.apache.org/jira/browse/IGNITE-5385
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.0
>Reporter: Yakov Zhdanov
>Assignee: Alexei Scherbakov
>Priority: Blocker
>  Labels: performance
> Fix For: 2.2
>
>
> Currently, if late affinity assignment is on, we send the full partition map as a 
> custom message to make sure all nodes get it. With a greater number of nodes 
> and caches this can cause significant slowdowns.
> We suggest moving the sending to communication. In this case the scenario with 
> coordinator failure requires special handling, since some nodes 
> may receive the full map, complete the exchange and proceed with cache operations, 
> while others may not have received the full map yet. In this case the full map should be 
> resent from the new coordinator: it should be recalculated if no node has received 
> one from the former coordinator, or it should be requested from one of the lucky 
> receivers and forwarded to the other nodes.





[jira] [Updated] (IGNITE-5457) Weird discovery behavior on split brain.

2017-06-08 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5457:
--
Description: 
I observe buggy behavior in case of a simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
connect timeout, but the number of nodes does not change.

Note that my failureDetectionTimeout is significantly higher than the 
communication connect timeout.

Looks like the coordinator cannot be kicked from the topology by TcpCommunicationSpi.

{noformat}
19:14:03.289 [WARN ] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - Connect timed 
out (consider increasing 'connTimeout' configuration property) 
[addr=grid457.ca.sbrf.ru/10.116.206.193:47100, connTimeout=12]
19:14:03.289 [ERROR] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - 
TcpCommunicationSpi failed to establish connection to node, node will be 
dropped from cluster [rmtNode=TcpDiscoveryNode 
[id=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, addrs=[10.116.206.193], 
sockAddrs=[grid457.ca.sbrf.ru/10.116.206.193:47500], discPort=47500, order=1, 
intOrder=1, lastExchangeTime=1496936257121, loc=false, 
ver=1.10.3#20170604-sha1:30521a17, isClient=false]]
org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node 
still alive?). Make sure that each ComputeTask and cache Transaction has a 
timeout set in order to prevent parties from waiting forever in case of network 
issues [nodeId=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, 
addrs=[grid457.ca.sbrf.ru/10.116.206.193:47100]]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3022)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2636)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2528)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$5800(TcpCommunicationSpi.java:245)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:3830)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:3656)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) 
[ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect 
to address [addr=grid457.ca.sbrf.ru/10.116.206.193:47100, err=null]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3027)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
... 6 common frames omitted
Caused by: java.net.SocketTimeoutException: null
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2884)
... 6 common frames omitted
19:14:37.967 [INFO ] [o.a.i.i.IgniteKernal%DPL_GRID%grid880] [T:] - 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=21e01ea7, name=DPL_GRID%grid880, uptime=00:37:00:200]
^-- H/N/C [hosts=144, nodes=160, CPUs=8064]
^-- CPU [cur=0.2%, avg=2.37%, GC=0%]
^-- PageMemory [pages=604144]
^-- Heap [used=33396MB, free=49.04%, comm=65536MB]
^-- Non heap [used=171MB, free=-1%, comm=173MB]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=0, qSize=0]
^-- Outbound messages queue [size=0]
{noformat}

The number of nodes is the same as before the kick attempt.

  was:
I observe buggy behavior in case of a simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
connect timeout, but the number of nodes does not change.

Note that my failureDetectionTimeout is significantly higher than the 
communication connect timeout.

Looks like the coordinator cannot be kicked from the topology by TcpCommunicationSpi.

{noformat}
19:14:03.289 [WARN ] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - Connect timed 
out (consider increasing 'connTimeout' configuration property) 
[addr=grid457.ca.sbrf.ru/10.116.206.193:47100, connTimeout=12]
19:14:03.289 [ERROR] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - 
TcpCommunicationSpi failed to establish connection to node, node 

[jira] [Updated] (IGNITE-5457) Weird discovery behavior on split brain.

2017-06-08 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5457:
--
Description: 
I observe buggy behavior in case of a simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
connect timeout, but the number of nodes does not change.

Note that my failureDetectionTimeout is significantly higher than the 
communication connect timeout.

Looks like the coordinator cannot be kicked from the topology by TcpCommunicationSpi.

{noformat}
19:14:03.289 [WARN ] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - Connect timed 
out (consider increasing 'connTimeout' configuration property) 
[addr=grid457.ca.sbrf.ru/10.116.206.193:47100, connTimeout=12]
19:14:03.289 [ERROR] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - 
TcpCommunicationSpi failed to establish connection to node, node will be 
dropped from cluster [rmtNode=TcpDiscoveryNode 
[id=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, addrs=[10.116.206.193], 
sockAddrs=[grid457.ca.sbrf.ru/10.116.206.193:47500], discPort=47500, order=1, 
intOrder=1, lastExchangeTime=1496936257121, loc=false, 
ver=1.10.3#20170604-sha1:30521a17, isClient=false]]
org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node 
still alive?). Make sure that each ComputeTask and cache Transaction has a 
timeout set in order to prevent parties from waiting forever in case of network 
issues [nodeId=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, 
addrs=[grid457.ca.sbrf.ru/10.116.206.193:47100]]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3022)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2636)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2528)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$5800(TcpCommunicationSpi.java:245)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:3830)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:3656)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) 
[ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect 
to address [addr=grid457.ca.sbrf.ru/10.116.206.193:47100, err=null]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3027)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
... 6 common frames omitted
Caused by: java.net.SocketTimeoutException: null
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2884)
... 6 common frames omitted
19:14:37.967 [INFO ] [o.a.i.i.IgniteKernal%DPL_GRID%grid880] [T:] - 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=21e01ea7, name=DPL_GRID%grid880, uptime=00:37:00:200]
^-- H/N/C [hosts=144, nodes=160, CPUs=8064]
^-- CPU [cur=0.2%, avg=2.37%, GC=0%]
^-- PageMemory [pages=604144]
^-- Heap [used=33396MB, free=49.04%, comm=65536MB]
^-- Non heap [used=171MB, free=-1%, comm=173MB]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=0, qSize=0]
^-- Outbound messages queue [size=0]
{noformat}

  was:
I observe buggy behavior in case of a simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
socket timeout, but the number of nodes does not change.

Note that my failureDetectionTimeout is significantly higher than the 
communication socket timeout.

Looks like the coordinator cannot be kicked from the topology by TcpCommunicationSpi.

{noformat}
19:14:03.289 [WARN ] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - Connect timed 
out (consider increasing 'connTimeout' configuration property) 
[addr=grid457.ca.sbrf.ru/10.116.206.193:47100, connTimeout=12]
19:14:03.289 [ERROR] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - 
TcpCommunicationSpi failed to establish connection to node, node will be 
dropped from cluster 

[jira] [Updated] (IGNITE-5457) Weird discovery behavior on split brain.

2017-06-08 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5457:
--
Description: 
I observe buggy behavior in case of a simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
socket timeout, but the number of nodes does not change.

Note that my failureDetectionTimeout is significantly higher than the 
communication socket timeout.

Looks like the coordinator cannot be kicked from the topology by TcpCommunicationSpi.

{noformat}
19:14:03.289 [WARN ] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - Connect timed 
out (consider increasing 'connTimeout' configuration property) 
[addr=grid457.ca.sbrf.ru/10.116.206.193:47100, connTimeout=12]
19:14:03.289 [ERROR] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - 
TcpCommunicationSpi failed to establish connection to node, node will be 
dropped from cluster [rmtNode=TcpDiscoveryNode 
[id=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, addrs=[10.116.206.193], 
sockAddrs=[grid457.ca.sbrf.ru/10.116.206.193:47500], discPort=47500, order=1, 
intOrder=1, lastExchangeTime=1496936257121, loc=false, 
ver=1.10.3#20170604-sha1:30521a17, isClient=false]]
org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node 
still alive?). Make sure that each ComputeTask and cache Transaction has a 
timeout set in order to prevent parties from waiting forever in case of network 
issues [nodeId=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, 
addrs=[grid457.ca.sbrf.ru/10.116.206.193:47100]]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3022)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2636)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2528)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$5800(TcpCommunicationSpi.java:245)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:3830)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:3656)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) 
[ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect 
to address [addr=grid457.ca.sbrf.ru/10.116.206.193:47100, err=null]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3027)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
... 6 common frames omitted
Caused by: java.net.SocketTimeoutException: null
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2884)
... 6 common frames omitted
19:14:37.967 [INFO ] [o.a.i.i.IgniteKernal%DPL_GRID%grid880] [T:] - 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=21e01ea7, name=DPL_GRID%grid880, uptime=00:37:00:200]
^-- H/N/C [hosts=144, nodes=160, CPUs=8064]
^-- CPU [cur=0.2%, avg=2.37%, GC=0%]
^-- PageMemory [pages=604144]
^-- Heap [used=33396MB, free=49.04%, comm=65536MB]
^-- Non heap [used=171MB, free=-1%, comm=173MB]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=0, qSize=0]
^-- Outbound messages queue [size=0]
{noformat}

  was:
I observe buggy behavior in case of a simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
socket timeout, but the number of nodes does not change.

Note that my failureDetectionTimeout is significantly higher than the 
communication socket timeout.

Looks like the coordinator cannot be kicked from the topology by TcpCommunicationSpi.

{noformat}
19:13:53.978 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=131ms, reason='timeout']
19:14:03.289 [WARN ] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - Connect timed 
out (consider increasing 'connTimeout' configuration property) 
[addr=grid457.ca.sbrf.ru/10.116.206.193:47100, 

[jira] [Updated] (IGNITE-5457) Weird discovery behavior on split brain.

2017-06-08 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5457:
--
Description: 
I observe buggy behavior in case of a simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
socket timeout, but the number of nodes does not change.

Note that my failureDetectionTimeout is significantly higher than the 
communication socket timeout.

Looks like the coordinator cannot be kicked from the topology by TcpCommunicationSpi.

{noformat}
19:13:53.978 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=131ms, reason='timeout']
19:14:03.289 [WARN ] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - Connect timed 
out (consider increasing 'connTimeout' configuration property) 
[addr=grid457.ca.sbrf.ru/10.116.206.193:47100, connTimeout=12]
19:14:03.289 [ERROR] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - 
TcpCommunicationSpi failed to establish connection to node, node will be 
dropped from cluster [rmtNode=TcpDiscoveryNode 
[id=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, addrs=[10.116.206.193], 
sockAddrs=[grid457.ca.sbrf.ru/10.116.206.193:47500], discPort=47500, order=1, 
intOrder=1, lastExchangeTime=1496936257121, loc=false, 
ver=1.10.3#20170604-sha1:30521a17, isClient=false]]
org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node 
still alive?). Make sure that each ComputeTask and cache Transaction has a 
timeout set in order to prevent parties from waiting forever in case of network 
issues [nodeId=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, 
addrs=[grid457.ca.sbrf.ru/10.116.206.193:47100]]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3022)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2636)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2528)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$5800(TcpCommunicationSpi.java:245)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:3830)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:3656)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) 
[ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect 
to address [addr=grid457.ca.sbrf.ru/10.116.206.193:47100, err=null]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3027)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
... 6 common frames omitted
Caused by: java.net.SocketTimeoutException: null
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2884)
... 6 common frames omitted
19:14:23.989 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=130ms, reason='timeout']
19:14:34.078 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=211ms, reason='timeout']
19:14:37.967 [INFO ] [o.a.i.i.IgniteKernal%DPL_GRID%grid880] [T:] - 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=21e01ea7, name=DPL_GRID%grid880, uptime=00:37:00:200]
^-- H/N/C [hosts=144, nodes=160, CPUs=8064]
^-- CPU [cur=0.2%, avg=2.37%, GC=0%]
^-- PageMemory [pages=604144]
^-- Heap [used=33396MB, free=49.04%, comm=65536MB]
^-- Non heap [used=171MB, free=-1%, comm=173MB]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=0, qSize=0]
^-- Outbound messages queue [size=0]
{noformat}

  was:
I observe buggy behavior in case of simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
socket timeout, but the number of nodes does not change.

[jira] [Updated] (IGNITE-5457) Weird discovery behavior on split brain.

2017-06-08 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5457:
--
Description: 
I observe buggy behavior in case of simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
socket timeout, but the number of nodes does not change.

{noformat}
19:13:53.978 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=131ms, reason='timeout']
19:14:03.289 [WARN ] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - Connect timed 
out (consider increasing 'connTimeout' configuration property) 
[addr=grid457.ca.sbrf.ru/10.116.206.193:47100, connTimeout=12]
19:14:03.289 [ERROR] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - 
TcpCommunicationSpi failed to establish connection to node, node will be 
dropped from cluster [rmtNode=TcpDiscoveryNode 
[id=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, addrs=[10.116.206.193], 
sockAddrs=[grid457.ca.sbrf.ru/10.116.206.193:47500], discPort=47500, order=1, 
intOrder=1, lastExchangeTime=1496936257121, loc=false, 
ver=1.10.3#20170604-sha1:30521a17, isClient=false]]
org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node 
still alive?). Make sure that each ComputeTask and cache Transaction has a 
timeout set in order to prevent parties from waiting forever in case of network 
issues [nodeId=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, 
addrs=[grid457.ca.sbrf.ru/10.116.206.193:47100]]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3022)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2636)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2528)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$5800(TcpCommunicationSpi.java:245)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:3830)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:3656)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) 
[ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect 
to address [addr=grid457.ca.sbrf.ru/10.116.206.193:47100, err=null]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3027)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
... 6 common frames omitted
Caused by: java.net.SocketTimeoutException: null
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2884)
... 6 common frames omitted
19:14:23.989 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=130ms, reason='timeout']
19:14:34.078 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=211ms, reason='timeout']
19:14:37.967 [INFO ] [o.a.i.i.IgniteKernal%DPL_GRID%grid880] [T:] - 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=21e01ea7, name=DPL_GRID%grid880, uptime=00:37:00:200]
^-- H/N/C [hosts=144, nodes=160, CPUs=8064]
^-- CPU [cur=0.2%, avg=2.37%, GC=0%]
^-- PageMemory [pages=604144]
^-- Heap [used=33396MB, free=49.04%, comm=65536MB]
^-- Non heap [used=171MB, free=-1%, comm=173MB]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=0, qSize=0]
^-- Outbound messages queue [size=0]
{noformat}

  was:
I observe buggy behavior in case of simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
socket timeout, but the number of nodes does not change.

{noformat}
19:13:53.978 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 

[jira] [Created] (IGNITE-5457) Weird discovery behavior on split brain.

2017-06-08 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-5457:
-

 Summary: Weird discovery behavior on split brain.
 Key: IGNITE-5457
 URL: https://issues.apache.org/jira/browse/IGNITE-5457
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.0
Reporter: Alexei Scherbakov
Priority: Critical
 Fix For: 2.2


I observe buggy behavior in case of simulated split brain.

Nodes in DataCenter1 (where the coordinator is located) slowly leave the grid,

while nodes in DataCenter2 stay in the grid forever.

In the logs I see multiple attempts by communication to kick the coordinator on 
socket timeout, but the number of nodes does not change.

{noformat}
19:13:53.978 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=131ms, reason='timeout']
19:14:03.289 [WARN ] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - Connect timed 
out (consider increasing 'connTimeout' configuration property) 
[addr=grid457.ca.sbrf.ru/10.116.206.193:47100, connTimeout=12]
19:14:03.289 [ERROR] [o.a.i.s.c.tcp.TcpCommunicationSpi] [T:] - 
TcpCommunicationSpi failed to establish connection to node, node will be 
dropped from cluster [rmtNode=TcpDiscoveryNode 
[id=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, addrs=[10.116.206.193], 
sockAddrs=[grid457.ca.sbrf.ru/10.116.206.193:47500], discPort=47500, order=1, 
intOrder=1, lastExchangeTime=1496936257121, loc=false, 
ver=1.10.3#20170604-sha1:30521a17, isClient=false]]
org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node 
still alive?). Make sure that each ComputeTask and cache Transaction has a 
timeout set in order to prevent parties from waiting forever in case of network 
issues [nodeId=a8ac1b24-8377-4064-a3d9-02bad9c6f2bb, 
addrs=[grid457.ca.sbrf.ru/10.116.206.193:47100]]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3022)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2636)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2528)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$5800(TcpCommunicationSpi.java:245)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:3830)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:3656)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) 
[ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect 
to address [addr=grid457.ca.sbrf.ru/10.116.206.193:47100, err=null]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3027)
 [ignite-core-1.10.3.ea10.jar:1.10.3.ea10]
... 6 common frames omitted
Caused by: java.net.SocketTimeoutException: null
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2884)
... 6 common frames omitted
19:14:23.989 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=130ms, reason='timeout']
19:14:34.078 [INFO ] [o.g.g.i.p.c.d.GridCacheDatabaseSharedManager] [T:] - 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=211ms, reason='timeout']
19:14:37.967 [INFO ] [o.a.i.i.IgniteKernal%DPL_GRID%grid880] [T:] - 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=21e01ea7, name=DPL_GRID%grid880, uptime=00:37:00:200]
^-- H/N/C [hosts=144, nodes=160, CPUs=8064]
^-- CPU [cur=0.2%, avg=2.37%, GC=0%]
^-- PageMemory [pages=604144]
^-- Heap [used=33396MB, free=49.04%, comm=65536MB]
^-- Non heap [used=171MB, free=-1%, comm=173MB]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=0, qSize=0]
^-- Outbound messages queue [size=0]
{noformat}
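
To check whether the topology really stops shrinking, the node count can be 
pulled from the periodic metrics lines in the log above. This is only a minimal 
sketch: it assumes the "H/N/C [hosts=.., nodes=.., CPUs=..]" format shown in the 
log, and TopologyMetricsParser / parseNodeCount are hypothetical names, not part 
of Ignite.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the node count from an "H/N/C" metrics log line. */
final class TopologyMetricsParser {
    /** Matches e.g. "H/N/C [hosts=144, nodes=160, CPUs=8064]" (format assumed from the log). */
    private static final Pattern HNC =
        Pattern.compile("H/N/C \\[hosts=(\\d+), nodes=(\\d+), CPUs=(\\d+)\\]");

    private TopologyMetricsParser() {
        // No instances.
    }

    /** Returns the node count if the line carries an H/N/C entry, otherwise empty. */
    static OptionalInt parseNodeCount(String line) {
        Matcher m = HNC.matcher(line);

        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(2))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String line = "^-- H/N/C [hosts=144, nodes=160, CPUs=8064]";

        System.out.println("nodes=" + parseNodeCount(line).getAsInt());
    }
}
```

Feeding every metrics line of a node's log through parseNodeCount and comparing 
consecutive values would show whether the node count changes over time.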



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (IGNITE-5385) Get rid of discovery custom message on exchange completion

2017-06-07 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov reassigned IGNITE-5385:
-

Assignee: Alexei Scherbakov  (was: Yakov Zhdanov)

> Get rid of discovery custom message on exchange completion
> --
>
> Key: IGNITE-5385
> URL: https://issues.apache.org/jira/browse/IGNITE-5385
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.0
>Reporter: Yakov Zhdanov
>Assignee: Alexei Scherbakov
>Priority: Blocker
>  Labels: performance
> Fix For: 2.1
>
>
> Currently, if late affinity assignment is on, we send the full partition map as a 
> custom message to make sure all nodes get it. With a greater number of nodes 
> and caches this can cause significant slowdowns.
> We suggest moving the sending to communication. In this case the scenario with 
> coordinator failure requires special handling, since some nodes 
> may receive the full map, complete the exchange and proceed with cache operations, 
> while others may not have received the full map yet. In this case the full map should be 
> resent from the new coordinator: it should be recalculated if no node has received 
> it from the former coordinator, or requested from one of the lucky 
> receivers and forwarded to the other nodes.
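
The coordinator-failure handling described above can be sketched as a small 
decision function. This is a toy model under stated assumptions, not Ignite 
internals: the full map is reduced to a String, and FullMapRecovery, 
resolveFullMap and the recalc supplier are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Toy model of the recovery step performed by the new coordinator (hypothetical names). */
final class FullMapRecovery {
    private FullMapRecovery() {
        // No instances.
    }

    /**
     * @param receivedByNode Node id -> full map received from the former coordinator (empty if none).
     * @param recalc Recomputes the full map when no node has received it.
     * @return Full map the new coordinator should send to the nodes that lack it.
     */
    static String resolveFullMap(Map<String, Optional<String>> receivedByNode, Supplier<String> recalc) {
        return receivedByNode.values().stream()
            .filter(Optional::isPresent) // Keep only the "lucky receivers".
            .map(Optional::get)
            .findFirst()                 // Any receiver's copy will do.
            .orElseGet(recalc);          // Nobody has the map: recalculate it.
    }
}
```

The sketch leaves out how the new coordinator learns which nodes hold the map; 
that would need an extra request/response round over communication.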





[jira] [Commented] (IGNITE-3714) Introduce new performance hint for default store-by-value caches behavior.

2017-06-04 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036245#comment-16036245
 ] 

Alexei Scherbakov commented on IGNITE-3714:
---

Note that a configurable on-heap records cache will be introduced to speed up 
reads.

> Introduce new performance hint for default store-by-value caches behavior.
> --
>
> Key: IGNITE-3714
> URL: https://issues.apache.org/jira/browse/IGNITE-3714
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Reporter: Alexei Scherbakov
>Assignee: Wuwei Lin
>  Labels: newbie
> Fix For: 2.1
>
>
> Default store-by-value semantics of Ignite have a bad impact on performance and 
> are rarely needed.
> We must print the performance hint if some of the caches have copyOnRead 
> property set to true (default value).
> It must work both for static and dynamic caches.
> Corresponding thread on dev list: 
> http://apache-ignite-developers.2346864.n4.nabble.com/copyOnRead-performance-issues-td10762.html
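
The check requested in the description could look roughly like the sketch below. 
It is a toy model: cache configurations are reduced to a name -> copyOnRead map, 
CopyOnReadHint is a hypothetical class, and the hint wording is made up for 
illustration rather than taken from Ignite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model of the proposed performance hint check (hypothetical names and wording). */
final class CopyOnReadHint {
    private CopyOnReadHint() {
        // No instances.
    }

    /** Returns one hint per cache whose copyOnRead flag is true (the default value). */
    static List<String> hints(Map<String, Boolean> copyOnReadByCache) {
        List<String> res = new ArrayList<>();

        for (Map.Entry<String, Boolean> e : copyOnReadByCache.entrySet()) {
            if (e.getValue())
                res.add("Consider disabling copyOnRead for cache '" + e.getKey() + "'");
        }

        return res;
    }
}
```

To cover both static and dynamic caches, the same function would have to be 
called at node start and again on every dynamic cache start.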





[jira] [Resolved] (IGNITE-3714) Introduce new performance hint for default store-by-value caches behavior.

2017-06-04 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov resolved IGNITE-3714.
---
Resolution: Won't Fix

> Introduce new performance hint for default store-by-value caches behavior.
> --
>
> Key: IGNITE-3714
> URL: https://issues.apache.org/jira/browse/IGNITE-3714
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Reporter: Alexei Scherbakov
>Assignee: Wuwei Lin
>  Labels: newbie
> Fix For: 2.1
>
>
> Default store-by-value semantics of Ignite have a bad impact on performance and 
> are rarely needed.
> We must print the performance hint if some of the caches have copyOnRead 
> property set to true (default value).
> It must work both for static and dynamic caches.
> Corresponding thread on dev list: 
> http://apache-ignite-developers.2346864.n4.nabble.com/copyOnRead-performance-issues-td10762.html





[jira] [Commented] (IGNITE-3714) Introduce new performance hint for default store-by-value caches behavior.

2017-06-04 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16036244#comment-16036244
 ] 

Alexei Scherbakov commented on IGNITE-3714:
---

[~vinx13],

Looks like the ticket is no longer relevant, because in 2.0 we have a new page 
memory architecture, where all reads copy the value.

I'll close the ticket as deprecated.

> Introduce new performance hint for default store-by-value caches behavior.
> --
>
> Key: IGNITE-3714
> URL: https://issues.apache.org/jira/browse/IGNITE-3714
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Reporter: Alexei Scherbakov
>Assignee: Wuwei Lin
>  Labels: newbie
> Fix For: 2.1
>
>
> Default store-by-value semantics of Ignite have a bad impact on performance and 
> are rarely needed.
> We must print the performance hint if some of the caches have copyOnRead 
> property set to true (default value).
> It must work both for static and dynamic caches.
> Corresponding thread on dev list: 
> http://apache-ignite-developers.2346864.n4.nabble.com/copyOnRead-performance-issues-td10762.html





[jira] [Created] (IGNITE-5389) Allow to start sequences in batches.

2017-06-02 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-5389:
-

 Summary: Allow to start sequences in batches.
 Key: IGNITE-5389
 URL: https://issues.apache.org/jira/browse/IGNITE-5389
 Project: Ignite
  Issue Type: Improvement
  Components: cache, data structures
Affects Versions: 1.6
Reporter: Alexei Scherbakov
 Fix For: 2.2


Currently I observe long start times when starting multiple sequences on a large 
topology one by one.

This must be optimized the same way as for caches, by allowing batch start.
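
A toy model of why a batch start helps, assuming (this is an assumption, not a 
measured fact) that every start request costs one cluster-wide round trip. 
SequenceRegistry and its methods are hypothetical and are not Ignite API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Toy registry that counts cluster-wide round trips (hypothetical, not Ignite API). */
final class SequenceRegistry {
    /** Names of started sequences. */
    private final List<String> started = new ArrayList<>();

    /** Number of simulated cluster-wide round trips. */
    private int roundTrips;

    /** Starts one sequence; assumed to cost one round trip. */
    void start(String name) {
        started.add(name);
        roundTrips++;
    }

    /** Starts many sequences; assumed to cost a single round trip for the whole batch. */
    void startAll(Collection<String> names) {
        if (names.isEmpty())
            return;

        started.addAll(names);
        roundTrips++;
    }

    int roundTrips() {
        return roundTrips;
    }

    int startedCount() {
        return started.size();
    }
}
```

Under that assumption, starting N sequences one by one costs N round trips, 
while a batch start costs one regardless of N.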





[jira] [Created] (IGNITE-5371) Persistence for text indices

2017-06-01 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-5371:
-

 Summary: Persistence for text indices
 Key: IGNITE-5371
 URL: https://issues.apache.org/jira/browse/IGNITE-5371
 Project: Ignite
  Issue Type: New Feature
  Components: cache
Affects Versions: 2.1
Reporter: Alexei Scherbakov
 Fix For: 2.2


Currently text indices (used with TextQuery) reside in the Java heap and do not 
survive node restarts. With the incoming persistence feature in 2.1 this 
behavior is unacceptable.

We need to implement LuceneDirectory based on PageMemory to make indices 
persistable and turn Ignite into a full-fledged full-text search engine.
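
Lucene stores index files as byte streams behind its Directory abstraction, so a 
PageMemory-backed implementation would mostly need to map byte positions onto 
fixed-size pages. The sketch below shows only that mapping. PagedBytes is a 
hypothetical stand-in: it uses plain heap arrays instead of PageMemory and does 
not implement any Lucene interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy page-backed byte store (hypothetical; heap arrays stand in for PageMemory pages). */
final class PagedBytes {
    /** Fixed page size in bytes. */
    private final int pageSize;

    /** Allocated pages. */
    private final List<byte[]> pages = new ArrayList<>();

    /** Total number of bytes written. */
    private long length;

    PagedBytes(int pageSize) {
        if (pageSize <= 0)
            throw new IllegalArgumentException("pageSize must be positive");

        this.pageSize = pageSize;
    }

    /** Appends bytes, allocating a new page whenever the last one is full. */
    void append(byte[] data) {
        for (byte b : data) {
            int off = (int)(length % pageSize);

            if (off == 0)
                pages.add(new byte[pageSize]);

            pages.get(pages.size() - 1)[off] = b;

            length++;
        }
    }

    /** Reads one byte at an absolute position by locating its page and in-page offset. */
    byte read(long pos) {
        if (pos < 0 || pos >= length)
            throw new IndexOutOfBoundsException("pos=" + pos);

        return pages.get((int)(pos / pageSize))[(int)(pos % pageSize)];
    }

    int pageCount() {
        return pages.size();
    }

    long length() {
        return length;
    }
}
```

A real implementation would also need file naming, deletion, locking and 
durability, none of which is modelled here.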





[jira] [Created] (IGNITE-5357) Replicated cache reads load balancing.

2017-05-31 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-5357:
-

 Summary: Replicated cache reads load balancing.
 Key: IGNITE-5357
 URL: https://issues.apache.org/jira/browse/IGNITE-5357
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 1.6
Reporter: Alexei Scherbakov
 Fix For: 2.2


Currently all read requests from a client node to a replicated cache go 
through the primary node for the key.

We need to select a random affinity node in the topology and send the request 
there (only if readFromBackups=true).
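
The proposed routing rule can be sketched as follows. This is a toy model: nodes 
are plain strings rather than Ignite ClusterNode objects, and 
ReplicatedReadBalancer / selectReadNode are hypothetical names.

```java
import java.util.List;
import java.util.Random;

/** Toy model of the proposed read routing for replicated caches (hypothetical names). */
final class ReplicatedReadBalancer {
    private ReplicatedReadBalancer() {
        // No instances.
    }

    /**
     * @param primary Primary node for the key.
     * @param affinityNodes All nodes holding a copy of the key (primary and backups).
     * @param readFromBackups Value of the cache's readFromBackups flag.
     * @param rnd Source of randomness.
     * @return Node the read request should be sent to.
     */
    static String selectReadNode(String primary, List<String> affinityNodes, boolean readFromBackups, Random rnd) {
        // Without readFromBackups only the primary may serve the read.
        if (!readFromBackups || affinityNodes.isEmpty())
            return primary;

        // Spread reads uniformly over every node that holds the key.
        return affinityNodes.get(rnd.nextInt(affinityNodes.size()));
    }
}
```

In a replicated cache every server node holds every key, so affinityNodes would 
normally be the full list of server nodes.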





[jira] [Commented] (IGNITE-5293) Replicated cache performance degradation.

2017-05-26 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026251#comment-16026251
 ] 

Alexei Scherbakov commented on IGNITE-5293:
---

Updated the description with my local results.

> Replicated cache performance degradation.
> -
>
> Key: IGNITE-5293
> URL: https://issues.apache.org/jira/browse/IGNITE-5293
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.0
>Reporter: Alexei Scherbakov
> Fix For: 2.2
>
>
> As the number of nodes increases, puts to a replicated cache slow down 
> almost proportionally.
> Unit test reproducer:
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.internal.processors.cache.distributed.replicated;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.cache.CacheAtomicityMode;
> import org.apache.ignite.cache.CacheMode;
> import org.apache.ignite.cache.CacheWriteSynchronizationMode;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.configuration.MemoryConfiguration;
> import org.apache.ignite.internal.IgniteEx;
> import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
> import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
> import org.apache.ignite.transactions.Transaction;
> import org.apache.ignite.transactions.TransactionConcurrency;
> import org.apache.ignite.transactions.TransactionIsolation;
> /**
>  * Tests replicated cache performance.
>  */
> public class GridCacheReplicatedTransactionalDegradationTest extends GridCommonAbstractTest {
>     /** Keys. */
>     private static final int KEYS = 100_000;
>
>     /** {@inheritDoc} */
>     @Override protected IgniteConfiguration getConfiguration(String gridName) throws Exception {
>         IgniteConfiguration cfg = super.getConfiguration(gridName);
>         // Any grid whose name starts with "client" runs in client mode.
>         cfg.setClientMode(gridName.startsWith("client"));
>         CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<>();
>         ccfg.setCacheMode(CacheMode.REPLICATED);
>         ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>         ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
>         ccfg.setName("test");
>         cfg.setCacheConfiguration(ccfg);
>         return cfg;
>     }
>
>     /** Measures throughput with one, two and three server nodes. */
>     public void testThroughput() throws Exception {
>         try {
>             IgniteEx grid0 = startGrid(0);
>             Ignite client = startGrid("client");
>             IgniteCache<Integer, Integer> cache = client.getOrCreateCache("test");
>             doTest(client, cache);
>             startGrid(1);
>             doTest(client, cache);
>             startGrid(2);
>             doTest(client, cache);
>         } finally {
>             stopAllGrids();
>         }
>     }
>
>     /**
>      * @param client Client node.
>      * @param cache Cache.
>      */
>     private void doTest(Ignite client, IgniteCache<Integer, Integer> cache) {
>         long t1 = System.currentTimeMillis();
>         // One pessimistic transaction per put.
>         for (int i = 0; i < KEYS; i++) {
>             try (Transaction tx = client.transactions().txStart(TransactionConcurrency.PESSIMISTIC,
>                 TransactionIsolation.REPEATABLE_READ)) {
>                 cache.put(i, i);
>                 tx.commit();
>             }
>         }
>         // Transactions per second: keys written divided by elapsed seconds.
>         log.info("TPS: " + Math.round(KEYS / (float)(System.currentTimeMillis() - t1) * 1000));
>     }
> }
> {noformat}
> My test results are:
> 1. transactional cache, explicit transaction.
> TPS: 2507
> TPS: 1660
> TPS: 1148
> 2. atomic cache
> TPS: 6416
> TPS: 5177
> TPS: 4403
> 3. transactional cache, no explicit transaction
> TPS: 4485
> TPS: 2289
> TPS: 1439





[jira] [Updated] (IGNITE-5293) Replicated cache performance degradation.

2017-05-26 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5293:
--
Description: 
As the number of nodes increases, puts to a replicated cache slow down 
almost proportionally.

Unit test reproducer:
{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.distributed.replicated;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.MemoryConfiguration;
import org.apache.ignite.internal.IgniteEx;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

/**
 * Tests replicated cache performance.
 */
public class GridCacheReplicatedTransactionalDegradationTest extends GridCommonAbstractTest {
    /** Keys. */
    private static final int KEYS = 100_000;

    /** {@inheritDoc} */
    @Override protected IgniteConfiguration getConfiguration(String gridName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(gridName);

        // Any grid whose name starts with "client" runs in client mode.
        cfg.setClientMode(gridName.startsWith("client"));

        CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<>();

        ccfg.setCacheMode(CacheMode.REPLICATED);
        ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
        ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        ccfg.setName("test");

        cfg.setCacheConfiguration(ccfg);

        return cfg;
    }

    /** Measures throughput with one, two and three server nodes. */
    public void testThroughput() throws Exception {
        try {
            IgniteEx grid0 = startGrid(0);

            Ignite client = startGrid("client");

            IgniteCache<Integer, Integer> cache = client.getOrCreateCache("test");

            doTest(client, cache);

            startGrid(1);

            doTest(client, cache);

            startGrid(2);

            doTest(client, cache);
        } finally {
            stopAllGrids();
        }
    }

    /**
     * @param client Client node.
     * @param cache Cache.
     */
    private void doTest(Ignite client, IgniteCache<Integer, Integer> cache) {
        long t1 = System.currentTimeMillis();

        // One pessimistic transaction per put.
        for (int i = 0; i < KEYS; i++) {
            try (Transaction tx = client.transactions().txStart(TransactionConcurrency.PESSIMISTIC,
                TransactionIsolation.REPEATABLE_READ)) {
                cache.put(i, i);

                tx.commit();
            }
        }

        // Transactions per second: keys written divided by elapsed seconds.
        log.info("TPS: " + Math.round(KEYS / (float)(System.currentTimeMillis() - t1) * 1000));
    }
}
{noformat}

My test results are:

1. transactional cache, explicit transaction.
TPS: 2507
TPS: 1660
TPS: 1148

2. atomic cache
TPS: 6416
TPS: 5177
TPS: 4403

3. transactional cache, no explicit transaction
TPS: 4485
TPS: 2289
TPS: 1439
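
To compare the three modes, the share of throughput retained between the first 
and the last run can be computed from the figures above. The sketch assumes the 
three TPS lines of each group correspond, in order, to the three doTest calls of 
the reproducer (one, two and three server nodes). TpsDegradation is a 
hypothetical helper name.

```java
/** Relative throughput computed from the reported TPS figures (hypothetical helper). */
final class TpsDegradation {
    private TpsDegradation() {
        // No instances.
    }

    /** Percentage of the first run's throughput that the last run retains, rounded. */
    static long retainedPercent(int firstTps, int lastTps) {
        return Math.round(100.0 * lastTps / firstTps);
    }

    public static void main(String[] args) {
        System.out.println("transactional, explicit tx:    " + retainedPercent(2507, 1148) + "%");
        System.out.println("atomic:                        " + retainedPercent(6416, 4403) + "%");
        System.out.println("transactional, no explicit tx: " + retainedPercent(4485, 1439) + "%");
    }
}
```

With these inputs the transactional runs keep roughly a third to a half of the 
single-server throughput, while the atomic run keeps about two thirds.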



  was:
As the number of nodes increases, puts to a replicated cache slow down 
almost proportionally.

Unit test reproducer:
{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and

[jira] [Commented] (IGNITE-5293) Replicated cache performance degradation.

2017-05-26 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026088#comment-16026088
 ] 

Alexei Scherbakov commented on IGNITE-5293:
---

It seems this is not an issue in atomic mode.

> Replicated cache performance degradation.
> -
>
> Key: IGNITE-5293
> URL: https://issues.apache.org/jira/browse/IGNITE-5293
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.0
>Reporter: Alexei Scherbakov
> Fix For: 2.2
>
>
> As the number of nodes increases, puts to a replicated cache slow down 
> almost proportionally.
> Unit test reproducer:
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.internal.processors.cache.distributed.replicated;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.cache.CacheAtomicityMode;
> import org.apache.ignite.cache.CacheMode;
> import org.apache.ignite.cache.CacheWriteSynchronizationMode;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.configuration.MemoryConfiguration;
> import org.apache.ignite.internal.IgniteEx;
> import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
> import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
> import org.apache.ignite.transactions.Transaction;
> import org.apache.ignite.transactions.TransactionConcurrency;
> import org.apache.ignite.transactions.TransactionIsolation;
> /**
>  * Tests replicated cache performance.
>  */
> public class GridCacheReplicatedTransactionalDegradationTest extends GridCommonAbstractTest {
>     /** Keys. */
>     private static final int KEYS = 100_000;
>
>     /** {@inheritDoc} */
>     @Override protected IgniteConfiguration getConfiguration(String gridName) throws Exception {
>         IgniteConfiguration cfg = super.getConfiguration(gridName);
>         // Any grid whose name starts with "client" runs in client mode.
>         cfg.setClientMode(gridName.startsWith("client"));
>         CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<>();
>         ccfg.setCacheMode(CacheMode.REPLICATED);
>         ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>         ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
>         ccfg.setName("test");
>         cfg.setCacheConfiguration(ccfg);
>         return cfg;
>     }
>
>     /** Measures throughput with one, two and three server nodes. */
>     public void testThroughput() throws Exception {
>         try {
>             IgniteEx grid0 = startGrid(0);
>             Ignite client = startGrid("client");
>             IgniteCache<Integer, Integer> cache = client.getOrCreateCache("test");
>             doTest(client, cache);
>             startGrid(1);
>             doTest(client, cache);
>             startGrid(2);
>             doTest(client, cache);
>         } finally {
>             stopAllGrids();
>         }
>     }
>
>     /**
>      * @param client Client node.
>      * @param cache Cache.
>      */
>     private void doTest(Ignite client, IgniteCache<Integer, Integer> cache) {
>         long t1 = System.currentTimeMillis();
>         // One pessimistic transaction per put.
>         for (int i = 0; i < KEYS; i++) {
>             try (Transaction tx = client.transactions().txStart(TransactionConcurrency.PESSIMISTIC,
>                 TransactionIsolation.REPEATABLE_READ)) {
>                 cache.put(i, i);
>                 tx.commit();
>             }
>         }
>         // Transactions per second: keys written divided by elapsed seconds.
>         log.info("TPS: " + Math.round(KEYS / (float)(System.currentTimeMillis() - t1) * 1000));
>     }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (IGNITE-5293) Replicated cache performance degradation.

2017-05-25 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5293:
--
Affects Version/s: 2.0
Fix Version/s: 2.2

> Replicated cache performance degradation.
> -
>
> Key: IGNITE-5293
> URL: https://issues.apache.org/jira/browse/IGNITE-5293
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.0
>Reporter: Alexei Scherbakov
> Fix For: 2.2
>
>
> As the number of nodes increases, puts to a replicated cache slow down in 
> almost the same proportion.
> Unit test reproducer:
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.internal.processors.cache.distributed.replicated;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.cache.CacheAtomicityMode;
> import org.apache.ignite.cache.CacheMode;
> import org.apache.ignite.cache.CacheWriteSynchronizationMode;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.configuration.MemoryConfiguration;
> import org.apache.ignite.internal.IgniteEx;
> import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
> import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
> import org.apache.ignite.transactions.Transaction;
> import org.apache.ignite.transactions.TransactionConcurrency;
> import org.apache.ignite.transactions.TransactionIsolation;
> /**
>  * Tests replicated cache performance.
>  */
> public class GridCacheReplicatedTransactionalDegradationTest extends GridCommonAbstractTest {
>     /** Keys. */
>     private static final int KEYS = 100_000;
>
>     @Override protected IgniteConfiguration getConfiguration(String gridName) throws Exception {
>         IgniteConfiguration cfg = super.getConfiguration(gridName);
>         cfg.setClientMode(gridName.startsWith("client"));
>         CacheConfiguration ccfg = new CacheConfiguration();
>         ccfg.setCacheMode(CacheMode.REPLICATED);
>         ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>         ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
>         ccfg.setName("test");
>         cfg.setCacheConfiguration(ccfg);
>         return cfg;
>     }
>
>     /** */
>     public void testThroughput() throws Exception {
>         try {
>             IgniteEx grid0 = startGrid(0);
>             Ignite client = startGrid("client");
>             IgniteCache<Integer, Integer> cache = client.getOrCreateCache("test");
>             doTest(client, cache);
>             startGrid(1);
>             doTest(client, cache);
>             startGrid(2);
>             doTest(client, cache);
>         } finally {
>             stopAllGrids();
>         }
>     }
>
>     /**
>      * @param client Client node.
>      * @param cache Cache.
>      */
>     private void doTest(Ignite client, IgniteCache<Integer, Integer> cache) {
>         long t1 = System.currentTimeMillis();
>         for (int i = 0; i < KEYS; i++) {
>             try (Transaction tx = client.transactions().txStart(TransactionConcurrency.PESSIMISTIC,
>                 TransactionIsolation.REPEATABLE_READ)) {
>                 cache.put(i, i);
>                 tx.commit();
>             }
>         }
>         log.info("TPS: " + Math.round(KEYS / (float)(System.currentTimeMillis() - t1) * 1000));
>     }
> }
> {noformat}





[jira] [Created] (IGNITE-5293) Replicated cache performance degradation.

2017-05-25 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-5293:
-

 Summary: Replicated cache performance degradation.
 Key: IGNITE-5293
 URL: https://issues.apache.org/jira/browse/IGNITE-5293
 Project: Ignite
  Issue Type: Bug
  Components: cache
Reporter: Alexei Scherbakov


As the number of nodes increases, puts to a replicated cache slow down in 
almost the same proportion.

Unit test reproducer:
{noformat}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.ignite.internal.processors.cache.distributed.replicated;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.MemoryConfiguration;
import org.apache.ignite.internal.IgniteEx;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

/**
 * Tests replicated cache performance.
 */
public class GridCacheReplicatedTransactionalDegradationTest extends GridCommonAbstractTest {
    /** Keys. */
    private static final int KEYS = 100_000;

    @Override protected IgniteConfiguration getConfiguration(String gridName) throws Exception {
        IgniteConfiguration cfg = super.getConfiguration(gridName);

        cfg.setClientMode(gridName.startsWith("client"));

        CacheConfiguration ccfg = new CacheConfiguration();

        ccfg.setCacheMode(CacheMode.REPLICATED);
        ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
        ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        ccfg.setName("test");

        cfg.setCacheConfiguration(ccfg);

        return cfg;
    }

    /** */
    public void testThroughput() throws Exception {
        try {
            IgniteEx grid0 = startGrid(0);

            Ignite client = startGrid("client");

            IgniteCache<Integer, Integer> cache = client.getOrCreateCache("test");

            doTest(client, cache);

            startGrid(1);

            doTest(client, cache);

            startGrid(2);

            doTest(client, cache);
        } finally {
            stopAllGrids();
        }
    }

    /**
     * @param client Client node.
     * @param cache Cache.
     */
    private void doTest(Ignite client, IgniteCache<Integer, Integer> cache) {
        long t1 = System.currentTimeMillis();

        for (int i = 0; i < KEYS; i++) {
            try (Transaction tx = client.transactions().txStart(TransactionConcurrency.PESSIMISTIC,
                TransactionIsolation.REPEATABLE_READ)) {
                cache.put(i, i);

                tx.commit();
            }
        }

        log.info("TPS: " + Math.round(KEYS / (float)(System.currentTimeMillis() - t1) * 1000));
    }
}
{noformat}
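
The reproducer logs throughput as KEYS divided by elapsed milliseconds, times 1000. A minimal sketch of that arithmetic, pulled out into a helper so that runs with one, two and three server nodes can be compared, might look like this. The class and method names are hypothetical and are not part of the test above.

```java
/** Hypothetical helper for comparing the TPS figures logged by the reproducer. */
final class Throughput {
    /** Operations per second for {@code ops} operations completed in {@code elapsedMillis}. */
    static long tps(long ops, long elapsedMillis) {
        if (elapsedMillis <= 0)
            throw new IllegalArgumentException("elapsedMillis must be positive: " + elapsedMillis);

        // Same formula as the reproducer, computed in double to avoid float rounding.
        return Math.round(ops * 1000.0 / elapsedMillis);
    }

    /** How many times slower the current run is compared to the baseline run. */
    static double slowdown(long baselineTps, long currentTps) {
        if (currentTps <= 0)
            throw new IllegalArgumentException("currentTps must be positive: " + currentTps);

        return (double)baselineTps / currentTps;
    }
}
```

If puts really slow down in proportion to the node count, slowdown() between the one-node and three-node runs would be close to 3.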





[jira] [Created] (IGNITE-5212) Allow custom affinity for system caches

2017-05-12 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-5212:
-

 Summary: Allow custom affinity for system caches
 Key: IGNITE-5212
 URL: https://issues.apache.org/jira/browse/IGNITE-5212
 Project: Ignite
  Issue Type: Bug
Reporter: Alexei Scherbakov


Currently there is no option to specify an affinity function for the atomics 
system cache, which may be inappropriate in custom data placement scenarios.

Suggestion: allow setting a custom affinity function in AtomicConfiguration.





[jira] [Created] (IGNITE-5100) Exchange queue is not used properly

2017-04-27 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-5100:
-

 Summary: Exchange queue is not used properly
 Key: IGNITE-5100
 URL: https://issues.apache.org/jira/browse/IGNITE-5100
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 1.6
Reporter: Alexei Scherbakov
Priority: Critical
 Fix For: 2.1


Currently exchange futures share the same queue for pending (incomplete) and 
completed exchanges.

The queue has a fixed, hardcoded size of 1000.

This causes a problem when more than 1000 nodes try to join the grid.

In that case the oldest exchanges are removed by the ExchangeFutureSet size 
limit, which leads to the whole exchange hanging.

Solution: 

1. Use separate queues for pending and completed exchanges.

2. The pending exchange queue must be unbounded.

3. Add a system property to control the size of the exchange history.
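
The three points above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: ExchangeQueues and the EXAMPLE_EXCHANGE_HISTORY_SIZE property are hypothetical names invented here, and the real exchange bookkeeping in Ignite is more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for exchange bookkeeping; this is not Ignite code. */
final class ExchangeQueues<T> {
    /** Hypothetical system property name, invented for this sketch. */
    static final String HISTORY_SIZE_PROP = "EXAMPLE_EXCHANGE_HISTORY_SIZE";

    /** Pending (incomplete) exchanges: unbounded, nothing is ever evicted from here. */
    private final Deque<T> pending = new ArrayDeque<>();

    /** Completed exchanges: bounded history, oldest entries are dropped first. */
    private final Deque<T> completed = new ArrayDeque<>();

    /** Maximum number of completed exchanges to keep. */
    private final int historySize;

    /** Reads the history size from the system property, defaulting to 1000. */
    ExchangeQueues() {
        this(Integer.getInteger(HISTORY_SIZE_PROP, 1000));
    }

    ExchangeQueues(int historySize) {
        if (historySize < 1)
            throw new IllegalArgumentException("historySize must be positive: " + historySize);

        this.historySize = historySize;
    }

    /** Registers a new incomplete exchange. */
    synchronized void addPending(T exch) {
        pending.addLast(exch);
    }

    /** Moves an exchange from the pending queue to the bounded history. */
    synchronized boolean complete(T exch) {
        if (!pending.remove(exch))
            return false;

        completed.addLast(exch);

        // Only completed exchanges are subject to the size limit.
        while (completed.size() > historySize)
            completed.removeFirst();

        return true;
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    synchronized int completedCount() {
        return completed.size();
    }
}
```

With this split, more than 1000 simultaneously pending exchanges no longer evict each other, and only the completed history is trimmed.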







[jira] [Updated] (IGNITE-5093) Better heap usage during exchange on large topologies and cache numbers/partitions.

2017-04-26 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5093:
--
Description: 
I observed huge heap occupation during exchange on a large grid installation 
of 136 nodes and 1k caches.

Example from a machine with a 64g heap:

{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:     897287977    43069822896  java.util.HashMap$Node
   2:       9273162    14866180592  [Ljava.util.HashMap$Node;
   3:     201282292     4830775008  java.lang.Integer
   4:       6247215      983811096  [Ljava.lang.Object;
   5:       3383402      767741664  [C
   6:         12188      669411952  [B
   7:       9923859      635126976  java.util.HashMap
...
{noformat}

Further investigation showed that the heap is polluted during the exchange 
process, which creates many hash maps occupying large amounts of memory.

Proposal: use other data structures to keep heap usage low.



  was:
I observed huge heap occupation during exchange on a large grid installation 
of 136 nodes and 1k caches.

Example from a machine with a 64g heap:

{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:     897287977    43069822896  java.util.HashMap$Node
   2:       9273162    14866180592  [Ljava.util.HashMap$Node;
   3:     201282292     4830775008  java.lang.Integer
   4:       6247215      983811096  [Ljava.lang.Object;
   5:       3383402      767741664  [C
   6:         12188      669411952  [B
   7:       9923859      635126976  java.util.HashMap
...
{noformat}

Further investigation showed that the heap is polluted during the exchange 
process, which creates many hash maps occupying large amounts of memory.

Proposal: use other data structures to keep heap usage low.




> Better heap usage during exchange on large topologies and cache 
> numbers/partitions.
> ---
>
> Key: IGNITE-5093
> URL: https://issues.apache.org/jira/browse/IGNITE-5093
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.1
>
>
> I observed huge heap occupation during exchange on a large grid installation 
> of 136 nodes and 1k caches.
> Example from a machine with a 64g heap:
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     897287977    43069822896  java.util.HashMap$Node
>    2:       9273162    14866180592  [Ljava.util.HashMap$Node;
>    3:     201282292     4830775008  java.lang.Integer
>    4:       6247215      983811096  [Ljava.lang.Object;
>    5:       3383402      767741664  [C
>    6:         12188      669411952  [B
>    7:       9923859      635126976  java.util.HashMap
> ...
> {noformat}
> Further investigation showed that the heap is polluted during the exchange 
> process, which creates many hash maps occupying large amounts of memory.
> Proposal: use other data structures to keep heap usage low.





[jira] [Updated] (IGNITE-5093) Better heap usage during exchange on large topologies and cache numbers/partitions.

2017-04-26 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-5093:
--
Description: 
I observed huge heap occupation on a large grid installation of 136 nodes and 
1k caches.

Example from a machine with a 64g heap:

{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:     897287977    43069822896  java.util.HashMap$Node
   2:       9273162    14866180592  [Ljava.util.HashMap$Node;
   3:     201282292     4830775008  java.lang.Integer
   4:       6247215      983811096  [Ljava.lang.Object;
   5:       3383402      767741664  [C
   6:         12188      669411952  [B
   7:       9923859      635126976  java.util.HashMap
...
{noformat}

Further investigation showed that the heap is polluted during the exchange 
process, which creates many hash maps occupying large amounts of memory.

Proposal: use other data structures to keep heap usage low.



  was:
I observed huge heap occupation during exchange on a large grid installation 
of 136 nodes and 1k caches.

Example from a machine with a 64g heap:

{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:     897287977    43069822896  java.util.HashMap$Node
   2:       9273162    14866180592  [Ljava.util.HashMap$Node;
   3:     201282292     4830775008  java.lang.Integer
   4:       6247215      983811096  [Ljava.lang.Object;
   5:       3383402      767741664  [C
   6:         12188      669411952  [B
   7:       9923859      635126976  java.util.HashMap
...
{noformat}

Further investigation showed that the heap is polluted during the exchange 
process, which creates many hash maps occupying large amounts of memory.

Proposal: use other data structures to keep heap usage low.




> Better heap usage during exchange on large topologies and cache 
> numbers/partitions.
> ---
>
> Key: IGNITE-5093
> URL: https://issues.apache.org/jira/browse/IGNITE-5093
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.1
>
>
> I observed huge heap occupation on a large grid installation of 136 nodes and 
> 1k caches.
> Example from a machine with a 64g heap:
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     897287977    43069822896  java.util.HashMap$Node
>    2:       9273162    14866180592  [Ljava.util.HashMap$Node;
>    3:     201282292     4830775008  java.lang.Integer
>    4:       6247215      983811096  [Ljava.lang.Object;
>    5:       3383402      767741664  [C
>    6:         12188      669411952  [B
>    7:       9923859      635126976  java.util.HashMap
> ...
> {noformat}
> Further investigation showed that the heap is polluted during the exchange 
> process, which creates many hash maps occupying large amounts of memory.
> Proposal: use other data structures to keep heap usage low.





[jira] [Created] (IGNITE-5093) Better heap usage during exchange on large topologies and cache numbers/partitions.

2017-04-26 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-5093:
-

 Summary: Better heap usage during exchange on large topologies and 
cache numbers/partitions.
 Key: IGNITE-5093
 URL: https://issues.apache.org/jira/browse/IGNITE-5093
 Project: Ignite
  Issue Type: Improvement
  Components: general
Affects Versions: 1.6
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.1


I observed huge heap occupation during exchange on a large grid installation 
of 136 nodes and 1k caches.

Example from a machine with a 64g heap:

{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:     897287977    43069822896  java.util.HashMap$Node
   2:       9273162    14866180592  [Ljava.util.HashMap$Node;
   3:     201282292     4830775008  java.lang.Integer
   4:       6247215      983811096  [Ljava.lang.Object;
   5:       3383402      767741664  [C
   6:         12188      669411952  [B
   7:       9923859      635126976  java.util.HashMap
...
{noformat}

Further investigation showed that the heap is polluted during the exchange 
process, which creates many hash maps occupying large amounts of memory.

Proposal: use other data structures to keep heap usage low.
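
Reading the first histogram row as 897287977 instances and 43069822896 bytes gives 48 bytes per HashMap$Node, and the java.lang.Integer row works out to 24 bytes per instance, so a map keyed and valued by boxed integers is expensive per entry. One possible direction, sketched below purely as an assumption about what "other data structures" could mean, is a primitive array indexed by partition number, which costs 4 bytes per slot. IntPartitionMap is a hypothetical name and is not taken from the Ignite code base or from the eventual fix.

```java
import java.util.Arrays;

/** Hypothetical partition -> int value map backed by a primitive array. */
final class IntPartitionMap {
    /** Sentinel marking an empty slot; this value cannot be stored. */
    private static final int ABSENT = Integer.MIN_VALUE;

    /** One slot per partition, no per-entry node or boxed Integer objects. */
    private final int[] vals;

    /** Number of occupied slots. */
    private int size;

    IntPartitionMap(int parts) {
        vals = new int[parts];

        Arrays.fill(vals, ABSENT);
    }

    void put(int part, int val) {
        if (val == ABSENT)
            throw new IllegalArgumentException("Reserved value: " + val);

        if (vals[part] == ABSENT)
            size++;

        vals[part] = val;
    }

    int getOrDefault(int part, int dflt) {
        int v = vals[part];

        return v == ABSENT ? dflt : v;
    }

    boolean containsKey(int part) {
        return vals[part] != ABSENT;
    }

    int size() {
        return size;
    }
}
```

The trade-off is that the array is sized by the partition count even when only a few partitions are present, so it suits dense maps better than sparse ones.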







[jira] [Updated] (IGNITE-4523) Allow distributed SQL query execution over explicit set of partitions

2017-04-25 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-4523:
--
Fix Version/s: (was: 2.1)
   2.0

> Allow distributed SQL query execution over explicit set of partitions
> -
>
> Key: IGNITE-4523
> URL: https://issues.apache.org/jira/browse/IGNITE-4523
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, SQL
>Affects Versions: 1.8
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
>  Labels: important
> Fix For: 2.0
>
>
> Currently a distributed SQL query is executed on all nodes containing primary 
> partitions for a cache, sending map query requests to all nodes in the grid.
> Sometimes we know in advance which partitions hold the data for a query, for 
> example, in the case of a custom affinity function. 
> Therefore it's possible to reduce the number of nodes receiving the map query 
> request by providing an explicit set of partitions, which gives a significant 
> performance advantage and traffic reduction on very large clusters.
> Internally we already have such functionality, so the only necessary thing is 
> to provide a public API for it.
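
To illustrate why an explicit partition set reduces the number of map query targets, here is a minimal sketch in plain Java. It assumes a hypothetical representation in which the primary owner of each partition is given as an array of node ids; PartitionRouting and nodesFor are invented names and not part of any Ignite API.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: which nodes must receive a map query for the given partitions. */
final class PartitionRouting {
    /**
     * @param primaryByPartition primaryByPartition[p] is the id of the node owning partition p.
     * @param parts Partitions known to hold the data for the query.
     * @return Distinct ids of nodes that own at least one of the partitions.
     */
    static Set<String> nodesFor(String[] primaryByPartition, int... parts) {
        Set<String> nodes = new TreeSet<>();

        for (int p : parts) {
            if (p < 0 || p >= primaryByPartition.length)
                throw new IllegalArgumentException("Unknown partition: " + p);

            nodes.add(primaryByPartition[p]);
        }

        return nodes;
    }
}
```

For an assignment such as {"A", "B", "C", "A", "B", "C"}, a query restricted to partitions 0 and 3 only needs to reach node A, whereas a query over all partitions reaches all three nodes.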





[jira] [Created] (IGNITE-5037) Fix broken @AffinityKeyMapped annotation for compute jobs.

2017-04-20 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-5037:
-

 Summary: Fix broken @AffinityKeyMapped annotation for compute jobs.
 Key: IGNITE-5037
 URL: https://issues.apache.org/jira/browse/IGNITE-5037
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 1.7
Reporter: Alexei Scherbakov
 Fix For: 2.1


See the related discussion on the dev list entitled "Proper collocation of 
computations and data".

We must repair data affinity routing for compute jobs. It should work the same 
way as affinityCall/affinityRun with a partition.





[jira] [Commented] (IGNITE-4523) Allow distributed SQL query execution over explicit set of partitions

2017-04-19 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974176#comment-15974176
 ] 

Alexei Scherbakov commented on IGNITE-4523:
---

Merged 2.0 changes.
Added new test on unstable topology, fixed issues with unmapped partitions.
TC result looks acceptable [1]
Waiting for review [2]

[1] http://ci.ignite.apache.org/viewLog.html?buildId=564989;
[2] reviews.ignite.apache.org/ignite/review/IGNT-CR-155

> Allow distributed SQL query execution over explicit set of partitions
> -
>
> Key: IGNITE-4523
> URL: https://issues.apache.org/jira/browse/IGNITE-4523
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, SQL
>Affects Versions: 1.8
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
>  Labels: important
> Fix For: 2.0
>
>
> Currently a distributed SQL query is executed on all nodes containing primary 
> partitions for a cache, sending map query requests to all nodes in the grid.
> Sometimes we know in advance which partitions hold the data for a query, for 
> example, in the case of a custom affinity function. 
> Therefore it's possible to reduce the number of nodes receiving the map query 
> request by providing an explicit set of partitions, which gives a significant 
> performance advantage and traffic reduction on very large clusters.
> Internally we already have such functionality, so the only necessary thing is 
> to provide a public API for it.





[jira] [Commented] (IGNITE-4920) LocalDeploymentSpi resources cleanup on spi.register() might clean resources from other tasks using delegating classloader.

2017-04-06 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959514#comment-15959514
 ] 

Alexei Scherbakov commented on IGNITE-4920:
---

Fix is ready, waiting for TC.

> LocalDeploymentSpi resources cleanup on spi.register() might clean resources 
> from other tasks using delegating classloader.
> ---
>
> Key: IGNITE-4920
> URL: https://issues.apache.org/jira/browse/IGNITE-4920
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.0
>
>
> The culprit is the "if" condition in LocalDeploymentSpi:
> {noformat}
> if (entry.getKey().equals(entry.getValue()) && isResourceExist(ldr, entry.getKey()) &&
>     !U.hasParent(clsLdrToIgnore, ldr) && ldrRsrcs.remove(ldr, clsLdrRsrcs)) {
>     ...
> }
> {noformat}
> It can be fixed by adding clsLdrRsrcs.containsKey(entry.getKey()):
> {noformat}
> if (entry.getKey().equals(entry.getValue()) && isResourceExist(ldr, entry.getKey()) &&
>     !U.hasParent(clsLdrToIgnore, ldr) && clsLdrRsrcs.containsKey(entry.getKey()) &&
>     ldrRsrcs.remove(ldr, clsLdrRsrcs)) {
>     ...
> }
> {noformat}
> Reproducer (might require multiple runs):
> {noformat}
> import java.net.MalformedURLException;
> import java.net.URL;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.Ignition;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.internal.util.lang.GridPeerDeployAware;
> import org.apache.ignite.lang.IgniteCallable;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
>
> /** */
> public class Main {
>     public static void main(String args[]) throws MalformedURLException, ClassNotFoundException {
>         System.setProperty("IGNITE_CACHE_REMOVED_ENTRIES_TTL", "1");
>         IgniteConfiguration cfg = new IgniteConfiguration();
>         cfg.setPeerClassLoadingEnabled(true);
>         TcpDiscoverySpi spi = new TcpDiscoverySpi();
>         spi.setIpFinder(new TcpDiscoveryVmIpFinder(true));
>         cfg.setDiscoverySpi(spi);
>         final Ignite ignite = Ignition.start(cfg);
>         final ClassLoader moduleClsLdr = Main.class.getClassLoader();
>         final ClassLoader moduleCLImpl = new DelegateClassLoader(null, moduleClsLdr);
>         // Repeatedly deploy the same callable through the delegating class loader.
>         for (int i = 0; i < 100; i++)
>             try {
>                 Class<?> clazz = moduleCLImpl.loadClass("Main$CallFunction");
>                 ignite.compute().call(
>                     (IgniteCallable<Object>)clazz.getDeclaredConstructor(ClassLoader.class).newInstance(moduleCLImpl)
>                 );
>             }
>             catch (Exception e) {
>                 e.printStackTrace();
>             }
>         System.out.println("Done");
>     }
>
>     public static class CallFunction implements IgniteCallable<Object>, GridPeerDeployAware {
>         transient ClassLoader classLoader;
>
>         public CallFunction(ClassLoader cls) {
>             this.classLoader = cls;
>         }
>
>         public Object call() throws Exception {
>             return null;
>         }
>
>         public Class<?> deployClass() {
>             return this.getClass();
>         }
>
>         public ClassLoader classLoader() {
>             return classLoader;
>         }
>     }
>
>     public static class DelegateClassLoader extends ClassLoader {
>         private ClassLoader delegateCL;
>
>         public DelegateClassLoader(ClassLoader parent, ClassLoader delegateCL) {
>             super(parent); // Parent doesn't matter.
>             this.delegateCL = delegateCL;
>         }
>
>         @Override
>         public URL getResource(String name) {
>             return delegateCL.getResource(name);
>         }
>
>         @Override
>         public Class<?> loadClass(String name) throws ClassNotFoundException {
>             return delegateCL.loadClass(name);
>         }
>     }
> }
> {noformat}





[jira] [Updated] (IGNITE-4920) LocalDeploymentSpi resources cleanup on spi.register() might clean resources from other tasks using delegating classloader.

2017-04-06 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-4920:
--
Fix Version/s: (was: 2.1)
   2.0

> LocalDeploymentSpi resources cleanup on spi.register() might clean resources 
> from other tasks using delegating classloader.
> ---
>
> Key: IGNITE-4920
> URL: https://issues.apache.org/jira/browse/IGNITE-4920
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.0
>
>
> The culprit is the "if" condition in LocalDeploymentSpi:
> {noformat}
> if (entry.getKey().equals(entry.getValue()) && isResourceExist(ldr, entry.getKey()) &&
>     !U.hasParent(clsLdrToIgnore, ldr) && ldrRsrcs.remove(ldr, clsLdrRsrcs)) {
>     ...
> }
> {noformat}
> It can be fixed by adding clsLdrRsrcs.containsKey(entry.getKey()):
> {noformat}
> if (entry.getKey().equals(entry.getValue()) && isResourceExist(ldr, entry.getKey()) &&
>     !U.hasParent(clsLdrToIgnore, ldr) && clsLdrRsrcs.containsKey(entry.getKey()) &&
>     ldrRsrcs.remove(ldr, clsLdrRsrcs)) {
>     ...
> }
> {noformat}
> Reproducer (might require multiple runs):
> {noformat}
> import java.net.MalformedURLException;
> import java.net.URL;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.Ignition;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.internal.util.lang.GridPeerDeployAware;
> import org.apache.ignite.lang.IgniteCallable;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
>
> /** */
> public class Main {
>     public static void main(String args[]) throws MalformedURLException, ClassNotFoundException {
>         System.setProperty("IGNITE_CACHE_REMOVED_ENTRIES_TTL", "1");
>         IgniteConfiguration cfg = new IgniteConfiguration();
>         cfg.setPeerClassLoadingEnabled(true);
>         TcpDiscoverySpi spi = new TcpDiscoverySpi();
>         spi.setIpFinder(new TcpDiscoveryVmIpFinder(true));
>         cfg.setDiscoverySpi(spi);
>         final Ignite ignite = Ignition.start(cfg);
>         final ClassLoader moduleClsLdr = Main.class.getClassLoader();
>         final ClassLoader moduleCLImpl = new DelegateClassLoader(null, moduleClsLdr);
>         // Repeatedly deploy the same callable through the delegating class loader.
>         for (int i = 0; i < 100; i++)
>             try {
>                 Class<?> clazz = moduleCLImpl.loadClass("Main$CallFunction");
>                 ignite.compute().call(
>                     (IgniteCallable<Object>)clazz.getDeclaredConstructor(ClassLoader.class).newInstance(moduleCLImpl)
>                 );
>             }
>             catch (Exception e) {
>                 e.printStackTrace();
>             }
>         System.out.println("Done");
>     }
>
>     public static class CallFunction implements IgniteCallable<Object>, GridPeerDeployAware {
>         transient ClassLoader classLoader;
>
>         public CallFunction(ClassLoader cls) {
>             this.classLoader = cls;
>         }
>
>         public Object call() throws Exception {
>             return null;
>         }
>
>         public Class<?> deployClass() {
>             return this.getClass();
>         }
>
>         public ClassLoader classLoader() {
>             return classLoader;
>         }
>     }
>
>     public static class DelegateClassLoader extends ClassLoader {
>         private ClassLoader delegateCL;
>
>         public DelegateClassLoader(ClassLoader parent, ClassLoader delegateCL) {
>             super(parent); // Parent doesn't matter.
>             this.delegateCL = delegateCL;
>         }
>
>         @Override
>         public URL getResource(String name) {
>             return delegateCL.getResource(name);
>         }
>
>         @Override
>         public Class<?> loadClass(String name) throws ClassNotFoundException {
>             return delegateCL.loadClass(name);
>         }
>     }
> }
> {noformat}





[jira] [Updated] (IGNITE-4358) Better error reporting in case of unmarshallable classes.

2017-04-06 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-4358:
--
Fix Version/s: (was: 2.0)
   2.1

> Better error reporting in case of unmarshallable classes.
> -
>
> Key: IGNITE-4358
> URL: https://issues.apache.org/jira/browse/IGNITE-4358
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute, messaging, newbie
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Rohit Mohta
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.1
>
> Attachments: IGNITE-4358-Exceptionlog-05Dec2016.txt, 
> IGNITE-4358-GridClosureProcessor-05Dec2016.patch, PureIgniteRunTest.java
>
>
> When a class that extends Thread and implements IgniteRunnable is executed 
> through the compute API, it is silently serialized to null, because Thread 
> serialization is prohibited by MarshallerExclusions, and an NPE is thrown on 
> the executing node.
> We need to throw a more informative exception in this case.
> Reproducer in the attachment.
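
A minimal sketch of the kind of fail-fast check the issue asks for is shown below. It is not the attached patch: MarshalChecks, ensureMarshallable and the choice of IllegalArgumentException are assumptions made for illustration, and the check covers only the Thread case described above.

```java
/** Hypothetical pre-marshalling check; not the attached patch and not Ignite code. */
final class MarshalChecks {
    /**
     * Rejects jobs that would be silently marshalled to null.
     *
     * @param job Job about to be sent to a remote node.
     */
    static void ensureMarshallable(Object job) {
        if (job instanceof Thread)
            throw new IllegalArgumentException("Cannot marshal " + job.getClass().getName() +
                ": Thread subclasses are excluded from marshalling, so the job would arrive as null." +
                " Implement the job as a plain runnable that does not extend Thread.");
    }
}
```

Failing on the sending node with the class name in the message points the user at the actual cause instead of an NPE on the executing node.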





[jira] [Updated] (IGNITE-4378) Affinity function should support assigning partition to subset of cluster nodes

2017-04-06 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-4378:
--
Fix Version/s: (was: 2.0)
   2.1

> Affinity function should support assigning partition to subset of cluster 
> nodes
> ---
>
> Key: IGNITE-4378
> URL: https://issues.apache.org/jira/browse/IGNITE-4378
> Project: Ignite
>  Issue Type: New Feature
>  Components: cache
>Reporter: Dmitriy Setrakyan
>Assignee: Alexei Scherbakov
> Fix For: 2.1
>
>
> Currently both default affinity function (AF) implementations randomly choose 
> the primary node among all topology nodes.
> This may not be enough to handle complex data placement scenarios without 
> implementing a custom AF.
> For example, some partitions can be assigned to more powerful hardware, or 
> limited to a subset of cluster nodes for ease of management or fault 
> tolerance.
> We should implement a node filter that allows choosing the subset of cluster 
> nodes on which primary and backup partitions are placed.
> Together with the existing ability to filter backup nodes (using 
> {{AffinityBackupFilter}}), it will allow implementing different approaches to 
> data placement with Ignite without resorting to a custom AF.
> It's also desirable to include a practical example of both topology filters 
> based on node attribute values.
> The proposed primary filter interface is below.
> {noformat}
> /**
>  * Allows partition placement on a subset of cluster nodes.
>  *
>  * Backup nodes will also be assigned from the subset.
>  */
> public interface AffinityPrimaryFilter extends IgniteBiClosure<Integer, List<ClusterNode>, List<ClusterNode>> {
>     /**
>      * Returns nodes allowed to contain the given partition.
>      *
>      * @param partition Partition.
>      * @param currentTopologyNodes All nodes from current topology.
>      * @return Subset of nodes.
>      */
>     @Override public List<ClusterNode> apply(Integer partition, List<ClusterNode> currentTopologyNodes);
> }
> {noformat}
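
As an illustration of a filter based on node attribute values, here is a self-contained sketch. To keep it runnable without Ignite on the classpath it uses java.util.function.BiFunction and a hypothetical TopologyNode class in place of IgniteBiClosure and ClusterNode; the attribute name, the reserved partition range and all class names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical stand-in for a cluster node with user attributes. */
final class TopologyNode {
    final String id;
    final Map<String, String> attrs;

    TopologyNode(String id, Map<String, String> attrs) {
        this.id = id;
        this.attrs = attrs;
    }
}

/** Restricts partitions [0, reservedParts) to nodes whose attribute has the given value. */
final class AttributePrimaryFilter implements BiFunction<Integer, List<TopologyNode>, List<TopologyNode>> {
    private final int reservedParts;
    private final String attrName;
    private final String attrVal;

    AttributePrimaryFilter(int reservedParts, String attrName, String attrVal) {
        this.reservedParts = reservedParts;
        this.attrName = attrName;
        this.attrVal = attrVal;
    }

    /** Returns the nodes allowed to hold the given partition. */
    @Override public List<TopologyNode> apply(Integer part, List<TopologyNode> nodes) {
        // Partitions outside the reserved range may be placed on any node.
        if (part >= reservedParts)
            return nodes;

        List<TopologyNode> res = new ArrayList<>();

        for (TopologyNode n : nodes) {
            if (attrVal.equals(n.attrs.get(attrName)))
                res.add(n);
        }

        return res;
    }
}
```

A real implementation would also need to decide what to do when no node matches, since an empty result leaves the partition without an owner.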





[jira] [Updated] (IGNITE-4437) Make sure data structures do not use outTx call

2017-04-06 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-4437:
--
Fix Version/s: (was: 2.0)
   2.1

> Make sure data structures do not use outTx call
> ---
>
> Key: IGNITE-4437
> URL: https://issues.apache.org/jira/browse/IGNITE-4437
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, data structures
>Reporter: Alexei Scherbakov
> Fix For: 2.1
>
>
> Ignite's data structures use the outTx call to avoid intersecting with a user 
> transaction.
> This is no longer necessary, because system and user transactions are now 
> separated.
> We need to get rid of these calls.





[jira] [Updated] (IGNITE-4448) Implement correct affinity validation on joining topology.

2017-04-06 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-4448:
--
Fix Version/s: (was: 2.0)
   2.1

> Implement correct affinity validation on joining topology.
> --
>
> Key: IGNITE-4448
> URL: https://issues.apache.org/jira/browse/IGNITE-4448
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.1
>
>
> Currently, when a node joins a topology, only the affinity class name and the 
> number of partitions are compared between the local and remote node 
> configurations.
> This is not enough when a backup filter is configured (or once a primary 
> filter is added), and node misconfiguration can lead to disastrous situations.
> We should implement something like {{AffinityValidator}} with the following 
> signature:
> {noformat}
> boolean validate(Affinity affinity)
> {noformat}
> It may be useful for other grid objects as well, such as {{CacheStore}}, 
> {{NodeFilter}}, etc.
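
A minimal sketch of what such a validator could look like, assuming the joining node sends a small snapshot of its affinity settings. The AffinitySettings class, its fields, and StrictAffinityValidator are hypothetical names used only for illustration; only the AffinityValidator name and the boolean validate(...) shape come from the ticket, and the parameter type here is simplified.

```java
import java.util.Objects;

/** Hypothetical snapshot of the affinity settings a node would send on join. */
final class AffinitySettings {
    final String functionClassName;
    final int partitions;
    final String backupFilterClassName;  // null when no backup filter is configured
    final String primaryFilterClassName; // null when no primary filter is configured

    AffinitySettings(String functionClassName, int partitions,
        String backupFilterClassName, String primaryFilterClassName) {
        this.functionClassName = Objects.requireNonNull(functionClassName);
        this.partitions = partitions;
        this.backupFilterClassName = backupFilterClassName;
        this.primaryFilterClassName = primaryFilterClassName;
    }
}

/** Hypothetical validator; the name follows the ticket, the parameter type is assumed. */
interface AffinityValidator {
    boolean validate(AffinitySettings remote);
}

/** Rejects a remote node unless every affinity setting matches the local one. */
final class StrictAffinityValidator implements AffinityValidator {
    private final AffinitySettings local;

    StrictAffinityValidator(AffinitySettings local) {
        this.local = local;
    }

    @Override public boolean validate(AffinitySettings remote) {
        // Class name and partition count are what is compared today;
        // the two filter checks are the additional part proposed here.
        return local.functionClassName.equals(remote.functionClassName)
            && local.partitions == remote.partitions
            && Objects.equals(local.backupFilterClassName, remote.backupFilterClassName)
            && Objects.equals(local.primaryFilterClassName, remote.primaryFilterClassName);
    }
}
```

Comparing filter class names is only a stand-in: two filters of the same class can still be configured differently, so a real validator would probably need to compare filter state as well.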





[jira] [Updated] (IGNITE-3714) Introduce new performance hint for default store-by-value caches behavior.

2017-04-06 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-3714:
--
Fix Version/s: (was: 2.0)
   2.1

> Introduce new performance hint for default store-by-value caches behavior.
> --
>
> Key: IGNITE-3714
> URL: https://issues.apache.org/jira/browse/IGNITE-3714
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, newbie
>Reporter: Alexei Scherbakov
> Fix For: 2.1
>
>
> Ignite's default store-by-value semantics hurt performance and are rarely 
> needed.
> We must print a performance hint if any cache has the copyOnRead property 
> set to true (the default value).
> It must work for both static and dynamic caches.
> Corresponding thread on dev list: 
> http://apache-ignite-developers.2346864.n4.nabble.com/copyOnRead-performance-issues-td10762.html





[jira] [Updated] (IGNITE-3905) Optimize RendezvousAffinityFunction

2017-04-06 Thread Alexei Scherbakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexei Scherbakov updated IGNITE-3905:
--
Fix Version/s: (was: 2.0)
   2.1

> Optimize RendezvousAffinityFunction
> ---
>
> Key: IGNITE-3905
> URL: https://issues.apache.org/jira/browse/IGNITE-3905
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, community, general
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Priority: Critical
> Fix For: 2.1
>
>
> Currently RendezvousAffinityFunction.assignPartition generates a lot of 
> garbage if called very often, for example when rebalancing many caches.
> This puts excessive pressure on the GC, which is not always fast enough to 
> reclaim memory, producing long GC pauses that lead to node segmentation.
> We should cache the calculated nodeHashBytes in a node attribute or switch to 
> a more efficient node hash calculation.
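
The caching idea can be sketched as follows. This is not the Ignite implementation: CachedRendezvous, its methods, and the use of plain UUIDs as node identifiers are assumptions made for illustration, and the MurmurHash3 64-bit finalizer stands in for whatever hash the real function uses. The point is only that the per-node hash is computed once and then combined with the partition number using primitive arithmetic, instead of building a byte array per (node, partition) pair.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Rendezvous-style assignment that caches one hash per node (illustrative only). */
final class CachedRendezvous {
    /** Per-node hash, computed once per node rather than once per (node, partition) pair. */
    private final Map<UUID, Long> nodeHashes = new ConcurrentHashMap<>();

    /** MurmurHash3 64-bit finalizer, used here as a cheap mixing function. */
    static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    private long nodeHash(UUID nodeId) {
        return nodeHashes.computeIfAbsent(nodeId, id ->
            mix(id.getMostSignificantBits() ^ Long.rotateLeft(id.getLeastSignificantBits(), 32)));
    }

    /** Returns up to {@code copies} nodes for the partition, primary first. */
    List<UUID> assignPartition(int part, List<UUID> nodes, int copies) {
        List<UUID> sorted = new ArrayList<>(nodes);

        // Order nodes by the mixed (node hash, partition) value; ties are broken by node id
        // so the result does not depend on the order of the input list.
        sorted.sort(Comparator
            .comparingLong((UUID id) -> mix(nodeHash(id) ^ part))
            .thenComparing(Comparator.<UUID>naturalOrder()));

        return new ArrayList<>(sorted.subList(0, Math.min(copies, sorted.size())));
    }
}
```

The sketch still allocates a list per call, so it reduces garbage rather than eliminating it; the cached hashes would also have to be evicted when nodes leave the topology.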





[jira] [Created] (IGNITE-4920) LocalDeploymentSpi resources cleanup on spi.register() might clean resources from other tasks using delegating classloader.

2017-04-05 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-4920:
-

 Summary: LocalDeploymentSpi resources cleanup on spi.register() 
might clean resources from other tasks using delegating classloader.
 Key: IGNITE-4920
 URL: https://issues.apache.org/jira/browse/IGNITE-4920
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 1.6
Reporter: Alexei Scherbakov
Assignee: Alexei Scherbakov
 Fix For: 2.1


The culprit is the "if" condition in LocalDeploymentSpi:

{noformat}
if (entry.getKey().equals(entry.getValue()) && isResourceExist(ldr, entry.getKey()) &&
    !U.hasParent(clsLdrToIgnore, ldr) && ldrRsrcs.remove(ldr, clsLdrRsrcs)) {
    ...
}
{noformat}

It can be fixed by adding a clsLdrRsrcs.containsKey(entry.getKey()) check:

{noformat}
if (entry.getKey().equals(entry.getValue()) && isResourceExist(ldr, entry.getKey()) &&
    !U.hasParent(clsLdrToIgnore, ldr) && clsLdrRsrcs.containsKey(entry.getKey()) &&
    ldrRsrcs.remove(ldr, clsLdrRsrcs)) {
    ...
}
{noformat}

Reproducer (might require multiple runs):

{noformat}
import java.net.MalformedURLException;
import java.net.URL;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.internal.util.lang.GridPeerDeployAware;
import org.apache.ignite.lang.IgniteCallable;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

/** */
public class Main {
    public static void main(String args[]) throws MalformedURLException, ClassNotFoundException {
        System.setProperty("IGNITE_CACHE_REMOVED_ENTRIES_TTL", "1");

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setPeerClassLoadingEnabled(true);

        TcpDiscoverySpi spi = new TcpDiscoverySpi();
        spi.setIpFinder(new TcpDiscoveryVmIpFinder(true));

        cfg.setDiscoverySpi(spi);

        final Ignite ignite = Ignition.start(cfg);

        final ClassLoader moduleClsLdr = Main.class.getClassLoader();

        // Every task is deployed through the same delegating class loader.
        final ClassLoader moduleCLImpl = new DelegateClassLoader(null, moduleClsLdr);

        for (int i = 0; i < 100; i++) {
            try {
                Class<?> clazz = moduleCLImpl.loadClass("Main$CallFunction");

                ignite.compute().call(
                    (IgniteCallable<?>)clazz.getDeclaredConstructor(ClassLoader.class).newInstance(moduleCLImpl)
                );
            }
            catch (Exception e) {
                e.printStackTrace();
            }
        }

        System.out.println("Done");
    }

    /** Job that reports the delegating class loader as its deployment class loader. */
    public static class CallFunction implements IgniteCallable<Object>, GridPeerDeployAware {
        transient ClassLoader classLoader;

        public CallFunction(ClassLoader cls) {
            this.classLoader = cls;
        }

        public Object call() throws Exception {
            return null;
        }

        public Class<?> deployClass() {
            return this.getClass();
        }

        public ClassLoader classLoader() {
            return classLoader;
        }
    }

    /** Class loader that forwards resource and class lookups to another class loader. */
    public static class DelegateClassLoader extends ClassLoader {
        private ClassLoader delegateCL;

        public DelegateClassLoader(ClassLoader parent, ClassLoader delegateCL) {
            super(parent); // Parent doesn't matter.
            this.delegateCL = delegateCL;
        }

        @Override
        public URL getResource(String name) {
            return delegateCL.getResource(name);
        }

        @Override
        public Class<?> loadClass(String name) throws ClassNotFoundException {
            return delegateCL.loadClass(name);
        }
    }
}
{noformat}





[jira] [Commented] (IGNITE-4554) Optimize integer sets.

2017-04-04 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955671#comment-15955671
 ] 

Alexei Scherbakov commented on IGNITE-4554:
---

Completed the compressed bitset implementation based on roaring bitmaps [1].

Next is to replace Set<Integer> with the new data structure and measure heap 
usage/performance.

[1] http://roaringbitmap.org/

> Optimize integer sets.
> --
>
> Key: IGNITE-4554
> URL: https://issues.apache.org/jira/browse/IGNITE-4554
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.1
>
>
> Ignite has many uses of data structures like Set<Integer>, IntArray, etc.
> This is not the most efficient way to represent integer sets. The best way is to 
> use compressed bit sets, which should save a lot of heap space.
> We should optimize integer sets wherever possible.
> The most obvious place to start is GridAffinityAssignment.
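
To illustrate the heap argument, here is a small sketch that stores partition ids in a bit set instead of a set of boxed integers. It uses the uncompressed java.util.BitSet from the JDK as a stand-in; the ticket itself targets compressed bit sets (roaring bitmaps), which would shrink sparse sets further. PartitionSets and its methods are hypothetical helper names.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Illustrative helpers for replacing a boxed integer set with a bit set. */
final class PartitionSets {
    /** Converts a set of non-negative partition ids into a BitSet. */
    static BitSet toBitSet(Set<Integer> parts) {
        BitSet bits = new BitSet();

        for (int p : parts)
            bits.set(p); // Throws IndexOutOfBoundsException for negative ids.

        return bits;
    }

    /** Size in bytes of the words needed to hold the set bits (object headers excluded). */
    static int payloadBytes(BitSet bits) {
        return bits.toLongArray().length * Long.BYTES;
    }

    public static void main(String[] args) {
        Set<Integer> boxed = new HashSet<>();

        for (int p = 0; p < 1024; p += 2)
            boxed.add(p);

        BitSet bits = toBitSet(boxed);

        System.out.println(bits.cardinality() + " partitions stored in " + payloadBytes(bits) + " payload bytes");
    }
}
```

A HashSet holding the same ids needs an entry object and a boxed Integer per element, so the difference grows with the number of partitions per node.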





[jira] [Commented] (IGNITE-4523) Allow distributed SQL query execution over explicit set of partitions

2017-04-02 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952727#comment-15952727
 ] 

Alexei Scherbakov commented on IGNITE-4523:
---

Made final fixes, waiting for TC.

> Allow distributed SQL query execution over explicit set of partitions
> -
>
> Key: IGNITE-4523
> URL: https://issues.apache.org/jira/browse/IGNITE-4523
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, SQL
>Affects Versions: 1.8
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
>  Labels: important
> Fix For: 2.0
>
>
> Currently a distributed SQL query is executed on all nodes containing primary 
> partitions for a cache, sending map query requests to all nodes in the grid.
> Sometimes we know in advance which partitions hold the data for a query, for 
> example, in the case of a custom affinity function. 
> It is therefore possible to reduce the number of nodes receiving map query 
> requests by providing an explicit set of partitions, which gives a significant 
> performance advantage and traffic reduction in very large clusters.
> Internally we already have such functionality, so the only thing needed is 
> to provide a public API for it.
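
The node reduction described above can be sketched without any Ignite classes. PartitionRouting, nodesFor, and the node names are hypothetical; the list passed in stands for the primary assignment that the affinity function would produce.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative routing helper: which nodes must receive a map query request. */
final class PartitionRouting {
    /**
     * @param primaryByPartition Entry {@code p} is the name of the node that owns partition {@code p} as primary.
     * @param parts Explicit partitions the query is restricted to.
     * @return Names of the nodes that own at least one of the given partitions.
     */
    static Set<String> nodesFor(List<String> primaryByPartition, int... parts) {
        Set<String> nodes = new TreeSet<>();

        for (int p : parts) {
            if (p < 0 || p >= primaryByPartition.size())
                throw new IllegalArgumentException("Unknown partition: " + p);

            nodes.add(primaryByPartition.get(p));
        }

        return nodes;
    }
}
```

With six partitions spread over nodes A, B, and C in round-robin order, a query restricted to partitions 0 and 3 only has to reach node A, while a query without a partition set would go to all three nodes.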





[jira] [Created] (IGNITE-4875) Allow execution over explicit set of partitions for every non-SQL query type with data integrity guarantees.

2017-03-28 Thread Alexei Scherbakov (JIRA)
Alexei Scherbakov created IGNITE-4875:
-

 Summary: Allow execution over explicit set of partitions for every 
non-SQL query type with data integrity guarantees.
 Key: IGNITE-4875
 URL: https://issues.apache.org/jira/browse/IGNITE-4875
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 1.6
Reporter: Alexei Scherbakov
 Fix For: 2.1


The ticket has the same origin as IGNITE-4523.

I've just split the SQL and non-SQL parts for the sake of clarity.




