Re: Tx lock partial happens before

2019-07-12 Thread Ivan Rakov

Hi Anton,


Each get method now checks consistency.
The check means:
1) tx lock acquired on primary
2) data obtained from each owner (primary and backups)
3) data compared
Did you consider acquiring locks on the backups as well during your check,
just like 2PC prepare does?
If there's an HB between step 1 (lock primary) and step 2 (update primary +
lock backup + update backup), you can be sure that there will be no
false-positive results and no deadlocks either. The protocol won't be
complicated: a checking read from a backup will simply wait for the commit if
it's in progress.
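
To illustrate the idea, here is a minimal plain-Java model (a sketch only, not
Ignite code; two ReentrantLocks stand in for the primary and backup entry
locks): because the writer holds the backup lock until the backup copy is
updated, a check that takes the same locks in the same order waits for an
in-flight commit instead of reporting a false positive, and the shared lock
order rules out deadlocks.
{code:java}
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the suggestion above (not Ignite internals): the checking read
// takes the lock on the backup too, just like 2PC prepare does, so it waits
// for an in-flight commit instead of observing a stale backup value.
public class ReadRepairCheckModel {
    private final ReentrantLock primaryLock = new ReentrantLock();
    private final ReentrantLock backupLock = new ReentrantLock();
    private volatile int primaryVal;
    private volatile int backupVal;

    /** Writer: lock primary -> lock backup -> update both -> release. */
    void commit(int val) {
        primaryLock.lock();
        backupLock.lock();
        try {
            primaryVal = val;
            backupVal = val;
        }
        finally {
            backupLock.unlock();
            primaryLock.unlock();
        }
    }

    /** Checker: same lock order, so it simply waits for a commit in progress. */
    boolean check() {
        primaryLock.lock();
        backupLock.lock();
        try {
            return primaryVal == backupVal; // never false under this ordering
        }
        finally {
            backupLock.unlock();
            primaryLock.unlock();
        }
    }
}
{code}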


Best Regards,
Ivan Rakov

On 12.07.2019 9:47, Anton Vinogradov wrote:

Igniters,

Let me explain the problem in detail.
Read Repair within a pessimistic tx (locks acquired on primary, full sync, 2PC)
is able to see a consistency violation because the backups are not updated yet.
It does not seem to be a good idea to "fix" the code so that the primary is
unlocked only after the backups are updated; this would definitely cause a
performance drop.
Currently, there is no explicit sync feature that allows waiting until the
backups have been updated by the previous tx.
The previous tx just sends GridNearTxFinishResponse to the originating node.

Bad ideas on how to handle this:
- retry several times (a false positive is still possible)
- lock the tx entry on backups (will definitely break the failover logic)
- wait for the same entry version on backups within some timeout (will require
huge changes to the "get" logic, and a false positive is still possible)

Is there any simple fix for this issue?
Thanks in advance for any tips.

Ivan,
thanks for your interest.

>> 4. Very fast and lucky txB writes a value 2 for the key on primary and backup.
AFAIK, such reordering is not possible since the backups are "prepared" before
the primary releases the lock.
So, consistency is guaranteed by failover and by the "prepare" phase of 2PC.
It seems the problem is NOT with consistency in AI, but with the consistency
detection implementation (RR) and its possible "false positive" results.
BTW, I checked the 1PC case (only one data node in the test) and got no issues.

On Fri, Jul 12, 2019 at 9:26 AM Павлухин Иван  wrote:


Anton,

Is such behavior observed for 2PC or for the 1PC optimization? Doesn't it
mean that things can be even worse and an inconsistent write is
possible on a backup? E.g. in the following scenario:
1. txA writes a value 1 for the key on the primary.
2. txA unlocks the key on the primary.
3. txA freezes before updating the backup.
4. Very fast and lucky txB writes a value 2 for the key on the primary and
backup.
5. txA wakes up and writes 1 for the key on the backup.
6. As a result there is 2 on the primary and 1 on the backup.

Naively it seems that locks should be released after all replicas are
updated.

Wed, 10 Jul 2019 at 16:36, Anton Vinogradov :

Folks,

I'm now investigating unexpected repairs [1] caused by ReadRepair usage in
testAccountTxNodeRestart.
I updated [2] the test to check whether any repairs happen.
The test's name is now "testAccountTxNodeRestartWithReadRepair".

Each get method now checks consistency.
The check means:
1) tx lock acquired on primary
2) data obtained from each owner (primary and backups)
3) data compared

Sometimes a backup may have an obsolete value during such a check.

It seems this happens because the tx commit on the primary goes the following
way (check the code [2] for details):
1) performing localFinish (releases the tx lock)
2) performing dhtFinish (commits on backups)
3) transferring control back to the caller

So it seems the problem here is that "tx lock released on primary" does not
mean that the backups are updated, while "commit() method finished in the
caller's thread" does.
This means that, currently, there is no happens-before between
1) thread 1 committed data on the primary and the tx lock can be re-obtained
2) thread 2 reads from a backup
but there is still a strong HB between "commit() finished" and "backups updated".

So it seems possible, for example, to get a notification from a
continuous query, then read from a backup and get an obsolete value.

Is this "partial happens-before" behavior expected?

[1] https://issues.apache.org/jira/browse/IGNITE-11973
[2] https://github.com/apache/ignite/pull/6679/files
[3]
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx



--
Best regards,
Ivan Pavlukhin



[jira] [Created] (IGNITE-11979) Add ability to set default parallelism of index rebuild in configuration

2019-07-12 Thread Denis Chudov (JIRA)
Denis Chudov created IGNITE-11979:
-

 Summary: Add ability to set default parallelism of index rebuild
in configuration
 Key: IGNITE-11979
 URL: https://issues.apache.org/jira/browse/IGNITE-11979
 Project: Ignite
  Issue Type: Improvement
Reporter: Denis Chudov
Assignee: Denis Chudov


We can't change SchemaIndexCacheVisitorImpl#DFLT_PARALLELISM at the moment:
{code:java}
/** Default degree of parallelism. */
private static final int DFLT_PARALLELISM = Math.min(4, Math.max(1, 
Runtime.getRuntime().availableProcessors() / 4));
{code}
On huge servers with many cores (such as 56) we will still rebuild indexes in only 4 
threads. I think we should have the ability to set DFLT_PARALLELISM in the Ignite 
configuration.
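
One possible minimal change is sketched below (just an illustration: the 
IGNITE_INDEX_REBUILD_PARALLELISM system property is hypothetical and does not exist in 
Ignite today; a dedicated IgniteConfiguration property would be the cleaner long-term 
option) — keep the current formula as the fallback and allow it to be overridden:
{code:java}
/** Default degree of parallelism, overridable via a (hypothetical) system property. */
private static final int DFLT_PARALLELISM = Integer.getInteger(
    "IGNITE_INDEX_REBUILD_PARALLELISM",
    Math.min(4, Math.max(1, Runtime.getRuntime().availableProcessors() / 4)));
{code}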



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IGNITE-11978) Javadoc enhancement for the ReadRepair feature.

2019-07-12 Thread Vyacheslav Koptilin (JIRA)
Vyacheslav Koptilin created IGNITE-11978:


 Summary: Javadoc enhancement for the ReadRepair feature.
 Key: IGNITE-11978
 URL: https://issues.apache.org/jira/browse/IGNITE-11978
 Project: Ignite
  Issue Type: Bug
Reporter: Vyacheslav Koptilin
Assignee: Vyacheslav Koptilin


The newly added `ReadRepair` feature requires Javadoc improvements.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Abandoning support of Visual Studio 2010

2019-07-12 Thread Igor Sapego
Hello Igniters,

Soon there will be Ignite 3.0, so I thought this may be a good time
to talk about abandoning Visual Studio 2010 (msvc 10.0) support,
which will allow us to move to more modern features in Ignite C++.

Any thoughts or objections? Are there people who will be critically
affected by this? Which version of VS are you using, if you are
using one?

Best Regards,
Igor


[jira] [Created] (IGNITE-11977) Data streamer pool MXBean is registered as ThreadPoolMXBean instead of StripedExecutorMXBean

2019-07-12 Thread Stanislav Lukyanov (JIRA)
Stanislav Lukyanov created IGNITE-11977:
---

 Summary: Data streamer pool MXBean is registered as 
ThreadPoolMXBean instead of StripedExecutorMXBean
 Key: IGNITE-11977
 URL: https://issues.apache.org/jira/browse/IGNITE-11977
 Project: Ignite
  Issue Type: Bug
Reporter: Stanislav Lukyanov


The data streamer pool is registered with a ThreadPoolMXBean while it is actually a 
StripedExecutor and can use a StripedExecutorMXBean.

The registration in the IgniteKernal code needs to be changed: the pool should be 
registered the same way as the striped executor pool.
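
For illustration only, a generic sketch of the direction (this is NOT the actual 
IgniteKernal registration code; the MXBean interface, the stub bean and the ObjectName 
below are made up): expose the striped pool through an MXBean that reflects stripes and 
register it with the MBean server like any other pool bean.
{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MXBean;
import javax.management.ObjectName;

public class StripedPoolBeanSketch {
    /** Hypothetical minimal MXBean interface; Ignite's real one is StripedExecutorMXBean. */
    @MXBean
    public interface DataStreamerPoolMXBean {
        /** @return Number of stripes in the pool. */
        int getStripesCount();
    }

    /** Stub standing in for the data streamer striped executor. */
    public static class DataStreamerPoolBean implements DataStreamerPoolMXBean {
        @Override public int getStripesCount() {
            return 8; // made-up stripe count
        }
    }

    public static void main(String[] args) throws Exception {
        // Register the bean with the platform MBean server under a made-up name.
        ManagementFactory.getPlatformMBeanServer().registerMBean(
            new DataStreamerPoolBean(),
            new ObjectName("org.apache.ignite.sketch:group=ThreadPools,name=StripedDataStreamerExecutor"));
    }
}
{code}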



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IGNITE-11976) @SpringResource is silently ignored if no Spring context is provided

2019-07-12 Thread Stanislav Lukyanov (JIRA)
Stanislav Lukyanov created IGNITE-11976:
---

 Summary: @SpringResource is silently ignored if no Spring context 
is provided
 Key: IGNITE-11976
 URL: https://issues.apache.org/jira/browse/IGNITE-11976
 Project: Ignite
  Issue Type: Improvement
  Components: spring
Affects Versions: 2.7
Reporter: Stanislav Lukyanov


The @SpringResource annotation is silently ignored and the annotated field is left null 
if no Spring context is provided.

For @SpringResource to work, the node needs to be started with 
IgniteSpring::start instead of Ignition::start, but the user may not know that.

Need to add a warning for this.
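
A rough usage sketch (the IgniteSpring.start(IgniteConfiguration, ApplicationContext) 
overload, the "myDataSource" bean and the spring-beans.xml file are assumptions for 
illustration): with plain Ignition.start() the injected field below would silently stay 
null, which is exactly what the warning should point out.
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteSpring;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.SpringResource;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class SpringResourceExample {
    /** Closure with a Spring-injected field. */
    static class DataSourceTask implements IgniteRunnable {
        /** Injected only if the node was started through IgniteSpring. */
        @SpringResource(resourceName = "myDataSource")
        private transient Object dataSource;

        @Override public void run() {
            // With Ignition.start() this would print "null" and no warning would be logged.
            System.out.println("dataSource = " + dataSource);
        }
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext springCtx = new ClassPathXmlApplicationContext("spring-beans.xml");

        try (Ignite ignite = IgniteSpring.start(new IgniteConfiguration(), springCtx)) {
            ignite.compute().run(new DataSourceTask());
        }
    }
}
{code}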



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: Tx lock partial happens before

2019-07-12 Thread Anton Vinogradov
Igniters,

Let me explain the problem in detail.
Read Repair within a pessimistic tx (locks acquired on primary, full sync, 2PC)
is able to see a consistency violation because the backups are not updated yet.
It does not seem to be a good idea to "fix" the code so that the primary is
unlocked only after the backups are updated; this would definitely cause a
performance drop.
Currently, there is no explicit sync feature that allows waiting until the
backups have been updated by the previous tx.
The previous tx just sends GridNearTxFinishResponse to the originating node.

Bad ideas on how to handle this:
- retry several times (a false positive is still possible)
- lock the tx entry on backups (will definitely break the failover logic)
- wait for the same entry version on backups within some timeout (will require
huge changes to the "get" logic, and a false positive is still possible)

Is there any simple fix for this issue?
Thanks in advance for any tips.
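
To make the race concrete, here is a tiny plain-Java model (a sketch only, not
Ignite internals) of the current ordering: the primary lock is released before
the backup copy is written, so a consistency check running under the primary
lock alone can observe primary != backup even though the data converges later.
{code:java}
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the CURRENT ordering: localFinish releases the per-key lock
// before the backup is written, which is exactly the window where a
// Read Repair check can see a stale backup value.
public class StaleBackupWindowModel {
    private final ReentrantLock primaryLock = new ReentrantLock();
    private volatile int primaryVal;
    private volatile int backupVal;

    /** Commit path: primary updated and unlocked, backup written later. */
    void commit(int val) throws InterruptedException {
        primaryLock.lock();
        try {
            primaryVal = val;
        }
        finally {
            primaryLock.unlock(); // "localFinish": lock released here
        }

        Thread.sleep(10);         // models the delay before dhtFinish
        backupVal = val;          // "dhtFinish": backup updated later
    }

    /** Read Repair style check: runs under the primary lock only. */
    boolean check() {
        primaryLock.lock();
        try {
            return primaryVal == backupVal; // may be false in the window above
        }
        finally {
            primaryLock.unlock();
        }
    }
}
{code}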

Ivan,
thanks for your interest.

>> 4. Very fast and lucky txB writes a value 2 for the key on primary and backup.
AFAIK, such reordering is not possible since the backups are "prepared" before
the primary releases the lock.
So, consistency is guaranteed by failover and by the "prepare" phase of 2PC.
It seems the problem is NOT with consistency in AI, but with the consistency
detection implementation (RR) and its possible "false positive" results.
BTW, I checked the 1PC case (only one data node in the test) and got no issues.

On Fri, Jul 12, 2019 at 9:26 AM Павлухин Иван  wrote:

> Anton,
>
> Is such behavior observed for 2PC or for the 1PC optimization? Doesn't it
> mean that things can be even worse and an inconsistent write is
> possible on a backup? E.g. in the following scenario:
> 1. txA writes a value 1 for the key on the primary.
> 2. txA unlocks the key on the primary.
> 3. txA freezes before updating the backup.
> 4. Very fast and lucky txB writes a value 2 for the key on the primary and
> backup.
> 5. txA wakes up and writes 1 for the key on the backup.
> 6. As a result there is 2 on the primary and 1 on the backup.
>
> Naively it seems that locks should be released after all replicas are
> updated.
>
> Wed, 10 Jul 2019 at 16:36, Anton Vinogradov :
> >
> > Folks,
> >
> > I'm now investigating unexpected repairs [1] caused by ReadRepair usage in
> > testAccountTxNodeRestart.
> > I updated [2] the test to check whether any repairs happen.
> > The test's name is now "testAccountTxNodeRestartWithReadRepair".
> >
> > Each get method now checks consistency.
> > The check means:
> > 1) tx lock acquired on primary
> > 2) data obtained from each owner (primary and backups)
> > 3) data compared
> >
> > Sometimes a backup may have an obsolete value during such a check.
> >
> > It seems this happens because the tx commit on the primary goes the
> > following way (check the code [2] for details):
> > 1) performing localFinish (releases the tx lock)
> > 2) performing dhtFinish (commits on backups)
> > 3) transferring control back to the caller
> >
> > So it seems the problem here is that "tx lock released on primary" does
> > not mean that the backups are updated, while "commit() method finished in
> > the caller's thread" does.
> > This means that, currently, there is no happens-before between
> > 1) thread 1 committed data on the primary and the tx lock can be re-obtained
> > 2) thread 2 reads from a backup
> > but there is still a strong HB between "commit() finished" and "backups updated".
> >
> > So it seems possible, for example, to get a notification from a
> > continuous query, then read from a backup and get an obsolete value.
> >
> > Is this "partial happens-before" behavior expected?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > [2] https://github.com/apache/ignite/pull/6679/files
> > [3]
> > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>


Re: Tx lock partial happens before

2019-07-12 Thread Павлухин Иван
Anton,

Is such behavior observed for 2PC or for the 1PC optimization? Doesn't it
mean that things can be even worse and an inconsistent write is
possible on a backup? E.g. in the following scenario:
1. txA writes a value 1 for the key on the primary.
2. txA unlocks the key on the primary.
3. txA freezes before updating the backup.
4. Very fast and lucky txB writes a value 2 for the key on the primary and backup.
5. txA wakes up and writes 1 for the key on the backup.
6. As a result there is 2 on the primary and 1 on the backup.

Naively it seems that locks should be released after all replicas are updated.

Wed, 10 Jul 2019 at 16:36, Anton Vinogradov :
>
> Folks,
>
> I'm now investigating unexpected repairs [1] caused by ReadRepair usage in
> testAccountTxNodeRestart.
> I updated [2] the test to check whether any repairs happen.
> The test's name is now "testAccountTxNodeRestartWithReadRepair".
>
> Each get method now checks consistency.
> The check means:
> 1) tx lock acquired on primary
> 2) data obtained from each owner (primary and backups)
> 3) data compared
>
> Sometimes a backup may have an obsolete value during such a check.
>
> It seems this happens because the tx commit on the primary goes the following way
> (check the code [2] for details):
> 1) performing localFinish (releases the tx lock)
> 2) performing dhtFinish (commits on backups)
> 3) transferring control back to the caller
>
> So it seems the problem here is that "tx lock released on primary" does not
> mean that the backups are updated, while "commit() method finished in the
> caller's thread" does.
> This means that, currently, there is no happens-before between
> 1) thread 1 committed data on the primary and the tx lock can be re-obtained
> 2) thread 2 reads from a backup
> but there is still a strong HB between "commit() finished" and "backups updated".
>
> So it seems possible, for example, to get a notification from a
> continuous query, then read from a backup and get an obsolete value.
>
> Is this "partial happens-before" behavior expected?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-11973
> [2] https://github.com/apache/ignite/pull/6679/files
> [3]
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx



-- 
Best regards,
Ivan Pavlukhin