date:20180424

Right, as far as I understand we are not arguing on whether BLT is needed
or not. The main questions are how to properly deliver this feature to
users and how to deal with co-location issues between persistent and
non-persistent caches. Looks like change policies are the way to go for the
first question.

As far as co-location, it is important to note that different affinity
distribution for in-memory and persistent caches automatically means that
we loose SQL joins and predictable behavior of any affinity-based
operations. It means that if we calculated the same affinity for persistent
and in-memory caches at some point, we cannot re-distribute in-memory
caches differently if some nodes go down without breaking co-located
computations, am I right?

On Tue, Apr 24, 2018 at 10:19 PM, Alexey Goncharuk <
alexey.goncha...@gmail.com> wrote:

> Well, this means that the concept of baseline is still needed because we
> must not reassign partitions immediately (note that this is not identical
> to rebalance delay!). The approach you describe is identical to baseline
> change policies and I have nothing against this, their implementation was
> planned to phase II of baseline changes.
>
> 2018-04-24 21:31 GMT+03:00 Vladimir Ozerov :
>
> > Alex,
> >
> > CockroachDB is based on RAFT and is able to repair itself automatically
> [1]
> > [2]. Their approach looks reasonable to me and is pretty much similar to
> > MongoDB and Cassandra. In short, you distinguish between short-term and
> > long-term failures.
> > 1) First, you wait for small time window in hope that it was a network
> > glitch or restart. Even if this was a segmentation, with true consensus
> > algorithm this is not an issue - you partitions or the whole cluster is
> > unavailable during this window.
> > 2) Then, if majority is still there and cluster is operational you
> trigger
> > automatic rebalance.
> > 3) Last, if you need fine-grained control you can tune or disable
> > auto-rebalance and do some manual magic.
> >
> > This is very nice approach: it is simple for simple use cases and complex
> > for complex use cases. Ideally, this is how Ignite should work. Want to
> > play and write hello-world app? Just learn what cache is. Started
> > developing moderately complex application? Learn about affinity, cache
> > modes, etc.. Going to enterprise scale? Learn about BLAT, activation,
> etc..
> >
> > It seems that old behavior without BLAT and even without manual
> activation
> > would be enough for majority of our users. At the very least it is enough
> > for order of magnitude more popular Cassandra and MongoDB.
> >
> > [1]
> > https://www.cockroachlabs.com/docs/stable/frequently-asked-
> > questions.html#how-does-cockroachdb-survive-failures
> > [2]
> > https://www.cockroachlabs.com/docs/stable/training/fault-
> > tolerance-and-automated-repair.html
> >
> > On Tue, Apr 24, 2018 at 7:55 PM, Alexey Goncharuk <
> > alexey.goncha...@gmail.com> wrote:
> >
> > > Vladimir,
> > >
> > > Automatic cluster membership changes may be implemented to grow the
> > > topology, but auto-shrinking topology is usually not possible because a
> > > process cannot distinguish between a node shutdown and network
> > > partitioning. If we want to deal with split-brain scenarios as a
> grown-up
> > > system, we should change the replication strategy within partitions to
> a
> > > consensus algorithm (I really hope we will). None of the consensus
> > > algorithms (at least known to me - paxos, raft, ZAB) do auto cluster
> > > adjustments based on a internally-detected process failure. I consider
> > > baseline topology as a step towards this model.
> > >
> > > Addressing your second concern, If a node was down for a short period
> of
> > > time, we should (and we do) rebalance only deltas, which is faster than
> > > erasing the whole node and moving all data from scratch.
> > >
> > > 2018-04-24 19:42 GMT+03:00 Vladimir Ozerov :
> > >
> > > > Ivan,
> > > >
> > > > This reasoning sounds questionable to me. First, separate logic for
> in
> > > > memory and persistent regions means that we loose collocation between
> > > > persistent and non persistent caches. Second, “data is still on disk”
> > > > assumption might be not valid if node has left due to disk crash, or
> > when
> > > > data is updated on remaining nodes.
> > > >
> > > > вт, 24 апр. 2018 г. в 19:21, Ivan Rakov :
> > > >
> > > > > Stan,
> > > > >
> > > > > I believe it was discussed at the design proposal thread:
> > > > >
> > > > > http://apache-ignite-developers.2346864.n4.nabble.
> > > > com/Cluster-auto-activation-design-proposal-td20295.html
> > > > >
> > > > > The short answer: backup factor decreases if node leaves. In
> > > > > non-persistent mode we have to rebalance data ASAP - otherwise last
> > > node
> > > > > that owns partition may fail and data will be lost forever.
> > > > > This is not necessary if data is persisted to disk storage,

Re: Ticket review checklist

2018-04-24 Thread Andrey Kuznetsov

+1.

Once again, I beg for "small refactoring permission" in a checklist. As of
today, separate tickets for small refactorings has lowest priority, since
they neither fix any flaw nor add new functionality. Also, the attempts to
make issue-related code safer / cleaner / more readable in "real" pull
requests are typically rejected, since they contradict our current
guidelines.

I understand this will require a bit more effort from committer/maintainer,
but otherwise we will get constantly degrading code quality.


2018-04-24 18:52 GMT+03:00 Eduard Shangareev :

> Vladimir,
>
> I am not talking about massive/sophisticated refactoring. But I believe
> that ask to extract some methods should be OK to do without an extra
> ticket.
>
> A checklist shouldn't be necessarily a set of certain rules but also it
> could include suggestion and reminders.
>
> On Tue, Apr 24, 2018 at 6:39 PM, Vladimir Ozerov 
> wrote:
>
> > Ed,
> >
> > Refactoring is a separate task. If you would like to rework exchange
> future
> > - please do this in a ticket "Refactor exchange task", nobody would
> against
> > this. This is just a matter of creating separate ticket and separate PR.
> If
> > one have a time for refactoring, it should not be a problem for him to
> > spend several minutes on JIRA and GitHub.
> >
> > As far as documentation - what you describe is normal review process,
> when
> > reviewer might want to ask contributor to fix something. Checklist is a
> > different thing - this is a set of rules which must be followed by
> anyone.
> > I do not understand how you can define documentation in this checklist.
> > Same problem with logging - what is "enough"?
> >
> > On Tue, Apr 24, 2018 at 4:51 PM, Eduard Shangareev <
> > eduard.shangar...@gmail.com> wrote:
> >
> > > Igniters,
> > >
> > > I don't understand why you are so against refactoring.
> > > Code already smells like hell. Methods 200+ line is normal. Exchange
> > future
> > > is asking to be separated on several one. Transaction code could
> > understand
> > > few people.
> > >
> > > If we separate refactoring from development it would mean that no one
> > will
> > > do it.
> > >
> > >
> > > 2) Documentation.
> > > Everything which was asked by reviewers to clarify idea should be
> > reflected
> > > in the code.
> > >
> > > 3) Logging.
> > > Logging should be enough to troubleshoot the problem if someone comes
> to
> > > user-list with an issue in the code.
> > >
> > >
> > > On Fri, Apr 20, 2018 at 7:06 PM, Dmitry Pavlov 
> > > wrote:
> > >
> > > > Hi Igniters,
> > > >
> > > > +1 to idea of checklist.
> > > >
> > > > +1 to refactoring and documenting code related to ticket in +/-20 LOC
> > at
> > > > least.
> > > >
> > > > If we start to do it as part of our regular contribution, code will
> be
> > > > better, it would became common practice and part of Apache Ignite
> > > > development culure.
> > > >
> > > > If we will hope we will have free time to submit separate patch
> someday
> > > and
> > > > have patience to complete patch-submission process, code will remain
> > > > undocumented and poor-readable.
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > пт, 20 апр. 2018 г. в 18:56, Александр Меньшиков <
> sharple...@gmail.com
> > >:
> > > >
> > > > > 4) Metrics.
> > > > > partially +1
> > > > >
> > > > > It makes sense to have some minimal code coverage for new code in
> PR.
> > > > IMHO.
> > > > >
> > > > > Also, we can limit the cyclomatic complexity of the new code in PR
> > too.
> > > > >
> > > > > 6) Refactoring
> > > > > -1
> > > > >
> > > > > I understand why people want to refactor old code.
> > > > > But I think refactoring should be always a separate task.
> > > > > And it's better to remove all refactoring from PR, if it's not the
> > > sense
> > > > of
> > > > > the issue.
> > > > >
> > > > >
> > > > > 2018-04-20 16:54 GMT+03:00 Andrey Kuznetsov :
> > > > >
> > > > > > What about adding the following item to the checklist: when the
> > > change
> > > > > adds
> > > > > > new functionality, then unit tests should also be provided, if
> it's
> > > > > > technically possible?
> > > > > >
> > > > > > As for refactorings, in fact they are strongly discouraged today
> > for
> > > > some
> > > > > > unclear reason. Let's permit to make refactorings in the
> checklist
> > > > being
> > > > > > discussed. (Of cource, refactoring should relate to problem being
> > > > > solved.)
> > > > > >
> > > > > > 2018-04-20 16:16 GMT+03:00 Vladimir Ozerov  >:
> > > > > >
> > > > > > > Hi Ed,
> > > > > > >
> > > > > > > Unfortunately some of these points are not good candidates for
> > the
> > > > > > > checklist because of these:
> > > > > > > - It must be clear and disallow *multiple interpretations*
> > > > > > > - It must be *lightweight*, otherwise Ignite development would
> > > > become a
> > > > > > > nightmare
> > > > > > >
> > > > > > >

Re: New definition for affinity node (issues with baseline)

Well, this means that the concept of baseline is still needed because we
must not reassign partitions immediately (note that this is not identical
to rebalance delay!). The approach you describe is identical to baseline
change policies and I have nothing against this, their implementation was
planned to phase II of baseline changes.

2018-04-24 21:31 GMT+03:00 Vladimir Ozerov :

> Alex,
>
> CockroachDB is based on RAFT and is able to repair itself automatically [1]
> [2]. Their approach looks reasonable to me and is pretty much similar to
> MongoDB and Cassandra. In short, you distinguish between short-term and
> long-term failures.
> 1) First, you wait for small time window in hope that it was a network
> glitch or restart. Even if this was a segmentation, with true consensus
> algorithm this is not an issue - you partitions or the whole cluster is
> unavailable during this window.
> 2) Then, if majority is still there and cluster is operational you trigger
> automatic rebalance.
> 3) Last, if you need fine-grained control you can tune or disable
> auto-rebalance and do some manual magic.
>
> This is very nice approach: it is simple for simple use cases and complex
> for complex use cases. Ideally, this is how Ignite should work. Want to
> play and write hello-world app? Just learn what cache is. Started
> developing moderately complex application? Learn about affinity, cache
> modes, etc.. Going to enterprise scale? Learn about BLAT, activation, etc..
>
> It seems that old behavior without BLAT and even without manual activation
> would be enough for majority of our users. At the very least it is enough
> for order of magnitude more popular Cassandra and MongoDB.
>
> [1]
> https://www.cockroachlabs.com/docs/stable/frequently-asked-
> questions.html#how-does-cockroachdb-survive-failures
> [2]
> https://www.cockroachlabs.com/docs/stable/training/fault-
> tolerance-and-automated-repair.html
>
> On Tue, Apr 24, 2018 at 7:55 PM, Alexey Goncharuk <
> alexey.goncha...@gmail.com> wrote:
>
> > Vladimir,
> >
> > Automatic cluster membership changes may be implemented to grow the
> > topology, but auto-shrinking topology is usually not possible because a
> > process cannot distinguish between a node shutdown and network
> > partitioning. If we want to deal with split-brain scenarios as a grown-up
> > system, we should change the replication strategy within partitions to a
> > consensus algorithm (I really hope we will). None of the consensus
> > algorithms (at least known to me - paxos, raft, ZAB) do auto cluster
> > adjustments based on a internally-detected process failure. I consider
> > baseline topology as a step towards this model.
> >
> > Addressing your second concern, If a node was down for a short period of
> > time, we should (and we do) rebalance only deltas, which is faster than
> > erasing the whole node and moving all data from scratch.
> >
> > 2018-04-24 19:42 GMT+03:00 Vladimir Ozerov :
> >
> > > Ivan,
> > >
> > > This reasoning sounds questionable to me. First, separate logic for in
> > > memory and persistent regions means that we loose collocation between
> > > persistent and non persistent caches. Second, “data is still on disk”
> > > assumption might be not valid if node has left due to disk crash, or
> when
> > > data is updated on remaining nodes.
> > >
> > > вт, 24 апр. 2018 г. в 19:21, Ivan Rakov :
> > >
> > > > Stan,
> > > >
> > > > I believe it was discussed at the design proposal thread:
> > > >
> > > > http://apache-ignite-developers.2346864.n4.nabble.
> > > com/Cluster-auto-activation-design-proposal-td20295.html
> > > >
> > > > The short answer: backup factor decreases if node leaves. In
> > > > non-persistent mode we have to rebalance data ASAP - otherwise last
> > node
> > > > that owns partition may fail and data will be lost forever.
> > > > This is not necessary if data is persisted to disk storage, that's
> the
> > > > reason for Baseline Topology concept.
> > > >
> > > > Best Regards,
> > > > Ivan Rakov
> > > >
> > > > On 24.04.2018 18:48, Stanislav Lukyanov wrote:
> > > > > + for Vladimir's point - adding more complexity may (and likely
> will)
> > > be
> > > > > even more misleading.
> > > > >
> > > > > Can we take a step back and discuss why do we need to have
> different
> > > > > behavior for persistent and in-memory caches? Can we make in-memory
> > > > caches
> > > > > honor baseline instead of special-casing them?
> > > > >
> > > > > Thanks,
> > > > > Stan
> > > > >
> > > > >
> > > > > вт, 24 апр. 2018 г., 18:28 Vladimir Ozerov :
> > > > >
> > > > >> Guys,
> > > > >>
> > > > >> As a user I definitely do not want to think about BLATs, SATs,
> DATs,
> > > > >> whatsoever. I want to query data, iterate over data, send compute
> > > tasks
> > > > to
> > > > >> data. If certain node is outside of BLAT and do not have data,
> then
> > > > this is
> > > > >> not affinity node. Can we

[jira] [Created] (IGNITE-8382) Problem with ignite-spring-data and Spring Boot 2

2018-04-24 Thread Patrice R (JIRA)

Patrice R created IGNITE-8382:
-

 Summary: Problem with ignite-spring-data and Spring Boot 2
 Key: IGNITE-8382
 URL: https://issues.apache.org/jira/browse/IGNITE-8382
 Project: Ignite
  Issue Type: Bug
  Components: spring
Affects Versions: 2.4
Reporter: Patrice R


Hi,

I've tried to update to Spring Boot 2 using an IgniteRepository (from 
ignite-spring-data) and I got the following exception during the start.

The same code with Spring Boot 1.5.9 is working.

 

{color:#FF}_***_{color}
{color:#FF}_APPLICATION FAILED TO START_{color}
{color:#FF}_***_{color}

{color:#FF}_Description:_{color}

{color:#FF}_Parameter 0 of constructor in 
org.apache.ignite.springdata.repository.support.IgniteRepositoryImpl required a 
bean of type 'org.apache.ignite.IgniteCache' that could not be found._{color}


{color:#FF}_Action:_{color}

{color:#FF}_Consider defining a bean of type 
'org.apache.ignite.IgniteCache' in your configuration._{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: New definition for affinity node (issues with baseline)

Alex,

CockroachDB is based on RAFT and is able to repair itself automatically [1]
[2]. Their approach looks reasonable to me and is pretty much similar to
MongoDB and Cassandra. In short, you distinguish between short-term and
long-term failures.
1) First, you wait for small time window in hope that it was a network
glitch or restart. Even if this was a segmentation, with true consensus
algorithm this is not an issue - you partitions or the whole cluster is
unavailable during this window.
2) Then, if majority is still there and cluster is operational you trigger
automatic rebalance.
3) Last, if you need fine-grained control you can tune or disable
auto-rebalance and do some manual magic.

This is very nice approach: it is simple for simple use cases and complex
for complex use cases. Ideally, this is how Ignite should work. Want to
play and write hello-world app? Just learn what cache is. Started
developing moderately complex application? Learn about affinity, cache
modes, etc.. Going to enterprise scale? Learn about BLAT, activation, etc..

It seems that old behavior without BLAT and even without manual activation
would be enough for majority of our users. At the very least it is enough
for order of magnitude more popular Cassandra and MongoDB.

[1]
https://www.cockroachlabs.com/docs/stable/frequently-asked-questions.html#how-does-cockroachdb-survive-failures
[2]
https://www.cockroachlabs.com/docs/stable/training/fault-tolerance-and-automated-repair.html

On Tue, Apr 24, 2018 at 7:55 PM, Alexey Goncharuk <
alexey.goncha...@gmail.com> wrote:

> Vladimir,
>
> Automatic cluster membership changes may be implemented to grow the
> topology, but auto-shrinking topology is usually not possible because a
> process cannot distinguish between a node shutdown and network
> partitioning. If we want to deal with split-brain scenarios as a grown-up
> system, we should change the replication strategy within partitions to a
> consensus algorithm (I really hope we will). None of the consensus
> algorithms (at least known to me - paxos, raft, ZAB) do auto cluster
> adjustments based on a internally-detected process failure. I consider
> baseline topology as a step towards this model.
>
> Addressing your second concern, If a node was down for a short period of
> time, we should (and we do) rebalance only deltas, which is faster than
> erasing the whole node and moving all data from scratch.
>
> 2018-04-24 19:42 GMT+03:00 Vladimir Ozerov :
>
> > Ivan,
> >
> > This reasoning sounds questionable to me. First, separate logic for in
> > memory and persistent regions means that we loose collocation between
> > persistent and non persistent caches. Second, “data is still on disk”
> > assumption might be not valid if node has left due to disk crash, or when
> > data is updated on remaining nodes.
> >
> > вт, 24 апр. 2018 г. в 19:21, Ivan Rakov :
> >
> > > Stan,
> > >
> > > I believe it was discussed at the design proposal thread:
> > >
> > > http://apache-ignite-developers.2346864.n4.nabble.
> > com/Cluster-auto-activation-design-proposal-td20295.html
> > >
> > > The short answer: backup factor decreases if node leaves. In
> > > non-persistent mode we have to rebalance data ASAP - otherwise last
> node
> > > that owns partition may fail and data will be lost forever.
> > > This is not necessary if data is persisted to disk storage, that's the
> > > reason for Baseline Topology concept.
> > >
> > > Best Regards,
> > > Ivan Rakov
> > >
> > > On 24.04.2018 18:48, Stanislav Lukyanov wrote:
> > > > + for Vladimir's point - adding more complexity may (and likely will)
> > be
> > > > even more misleading.
> > > >
> > > > Can we take a step back and discuss why do we need to have different
> > > > behavior for persistent and in-memory caches? Can we make in-memory
> > > caches
> > > > honor baseline instead of special-casing them?
> > > >
> > > > Thanks,
> > > > Stan
> > > >
> > > >
> > > > вт, 24 апр. 2018 г., 18:28 Vladimir Ozerov :
> > > >
> > > >> Guys,
> > > >>
> > > >> As a user I definitely do not want to think about BLATs, SATs, DATs,
> > > >> whatsoever. I want to query data, iterate over data, send compute
> > tasks
> > > to
> > > >> data. If certain node is outside of BLAT and do not have data, then
> > > this is
> > > >> not affinity node. Can we just fix affinity logic to take in count
> > BLAT
> > > >> appropriately?
> > > >>
> > > >> On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov 
> > > wrote:
> > > >>
> > > >>> Eduard,
> > > >>>
> > > >>> Can you please summarize code changes that you are proposing?
> > > >>> I agree that BLT is a bit misleading term and DAT/SAT make more
> > sense.
> > > >>> However, establishing a consensus on v2.4 Baseline Topology
> > terminology
> > > >>> took a long time and seems like you are going to cause a bit more
> > > >>> perturbations.
> > > >>> I still don't understand what and how should be changed.

[GitHub] ignite pull request #3670: IGNITE-7823

2018-04-24 Thread xtern

GitHub user xtern reopened a pull request:

https://github.com/apache/ignite/pull/3670

IGNITE-7823



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xtern/ignite IGNITE-7823

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3670.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3670


commit 5b14addadd7c35c55ff89fa6e3b667d6f9543d90
Author: pereslegin-pa 
Date:   2018-03-14T07:57:27Z

IGNITE-7823 wip

commit e35b3b3fc7d53c4bafa554e9af8b2dc15aac92e1
Author: pereslegin-pa 
Date:   2018-03-15T07:41:21Z

IGNITE-7823 wip igniteset -> set.

commit 2c64017ff204bd956d5b9a8a5f104b98118bade2
Author: pereslegin-pa 
Date:   2018-03-15T13:28:17Z

IGNITE-7823 replicated test

commit 68aad61342fb7daa8b1fe2cc2647f1a1aaf8c7b6
Author: pereslegin-pa 
Date:   2018-03-21T11:24:13Z

IGNITE-7823 Use separate cache for non-collocated IgniteSet (don't create 
IgniteCache#asSet API)

commit 837fe8467895d9e49eba9ce24c61483f06dcd626
Author: pereslegin-pa 
Date:   2018-03-28T15:03:44Z

IGNITE-7823 Fail flaky test.

commit dc9c117ceaffe2307b582e7844549e1758e3017d
Author: pereslegin-pa 
Date:   2018-04-03T07:49:37Z

IGNITE-7823 Code cleanup.

commit 6561cc9535697a1b544ac98fd80c40f6579a977d
Author: pereslegin-pa 
Date:   2018-04-06T14:31:06Z

IGNITE-7823 Backward compatibility support.

commit b934305c3e9e2de3a11278b10110d66bb1bfa68b
Author: pereslegin-pa 
Date:   2018-04-09T18:16:35Z

IGNITE-7823 Refactoring.

commit 0f17fc028cf1b0b968a3d245017143294c125da2
Author: pereslegin-pa 
Date:   2018-04-17T13:33:11Z

IGNITE-7823 Code cleanup.

commit 1ad5ba910015671ad1e5bce696e39e5804477c7e
Author: pereslegin-pa 
Date:   2018-04-19T14:47:07Z

IGNITE-7823 Do not create new version of IgniteSet if old node present.

commit 8a3e80dd68211eb9906eb0559e75a793a7e80193
Author: pereslegin-pa 
Date:   2018-04-19T15:58:07Z

IGNITE-7823 No inheritance.

commit 1837e51d1397caf49a2ffa0e098a0a2a36e5aac0
Author: pereslegin-pa 
Date:   2018-04-20T16:46:46Z

IGNITE-7823 Non collocated mode enabled only for PARTITIONED cache.

commit 53a60e27d262804f21dab0d5f10fc805ac354e9f
Author: pereslegin-pa 
Date:   2018-04-22T18:35:55Z

IGNITE-7823 Cache compatible mode.

commit 252c566cbc16adb4f1ee88eee3d055ba52f3e3c6
Author: pereslegin-pa 
Date:   2018-04-23T10:55:28Z

IGNITE-7823 Minor renaming.

commit 550bf74a462775c2066ec4ffb08fa0c413558f8a
Author: pereslegin-pa 
Date:   2018-04-23T13:38:04Z

IGNITE-7823 Documentation improvement.

commit 9bb1240c6b9df9797f0ed652699b6ab6bd8d545a
Author: pereslegin-pa 
Date:   2018-04-23T15:44:10Z

IGNITE-7823 IgnieSet CHM mem leak fix.

commit bc483d2386ca96a0f30ce05a217edc4cc86b8bcd
Author: pereslegin-pa 
Date:   2018-04-24T13:15:03Z

IGNITE-7823 Calc collocated in manager.

commit 81618bba9639b67342bab43b9403967e6a198472
Author: pereslegin-pa 
Date:   2018-04-24T17:18:10Z

IGNITE-7823 Code cleanup.




---

[GitHub] ignite pull request #3670: Separate cache for non-collocated IgniteSet (TC r...

2018-04-24 Thread xtern

Github user xtern closed the pull request at:

https://github.com/apache/ignite/pull/3670


---

Re: New definition for affinity node (issues with baseline)


- for in-memory caches, affinity would calculate with SAT/BLAT on the first
step and because of it collocation would work between in-memory and
persistent caches;
- on the next step, if there are offline nodes, we would spread their
partitions among alive nodes. This would save us from data loss.

+1 to this approach.
I can't estimate how hard is it to implement, but seems like it solves 
both collocation and data loss issues.


Best Regards,
Ivan Rakov

On 24.04.2018 20:29, Eduard Shangareev wrote:

Igniters,

I have introduced DAT in opposition to BLAT (SAT) because they reflect how
Ignite works.

But I actually have concerns about the necessity of such separation.

DAT exists only because we don't want to lose any data in in-memory caches.

But there are alternatives. Besides BLAT auto-change policies I would pay
attention to next approach:
- for in-memory caches, affinity would calculate with SAT/BLAT on the first
step and because of it collocation would work between in-memory and
persistent caches;
- on the next step, if there are offline nodes, we would spread their
partitions among alive nodes. This would save us from data loss.

I don't want to propose any changes until we don't have consensus.



On Tue, Apr 24, 2018 at 7:55 PM, Alexey Goncharuk <
alexey.goncha...@gmail.com> wrote:


Vladimir,

Automatic cluster membership changes may be implemented to grow the
topology, but auto-shrinking topology is usually not possible because a
process cannot distinguish between a node shutdown and network
partitioning. If we want to deal with split-brain scenarios as a grown-up
system, we should change the replication strategy within partitions to a
consensus algorithm (I really hope we will). None of the consensus
algorithms (at least known to me - paxos, raft, ZAB) do auto cluster
adjustments based on a internally-detected process failure. I consider
baseline topology as a step towards this model.

Addressing your second concern, If a node was down for a short period of
time, we should (and we do) rebalance only deltas, which is faster than
erasing the whole node and moving all data from scratch.

2018-04-24 19:42 GMT+03:00 Vladimir Ozerov :


Ivan,

This reasoning sounds questionable to me. First, separate logic for in
memory and persistent regions means that we loose collocation between
persistent and non persistent caches. Second, “data is still on disk”
assumption might be not valid if node has left due to disk crash, or when
data is updated on remaining nodes.

вт, 24 апр. 2018 г. в 19:21, Ivan Rakov :


Stan,

I believe it was discussed at the design proposal thread:

http://apache-ignite-developers.2346864.n4.nabble.

com/Cluster-auto-activation-design-proposal-td20295.html

The short answer: backup factor decreases if node leaves. In
non-persistent mode we have to rebalance data ASAP - otherwise last

node

that owns partition may fail and data will be lost forever.
This is not necessary if data is persisted to disk storage, that's the
reason for Baseline Topology concept.

Best Regards,
Ivan Rakov

On 24.04.2018 18:48, Stanislav Lukyanov wrote:

+ for Vladimir's point - adding more complexity may (and likely will)

be

even more misleading.

Can we take a step back and discuss why do we need to have different
behavior for persistent and in-memory caches? Can we make in-memory

caches

honor baseline instead of special-casing them?

Thanks,
Stan


вт, 24 апр. 2018 г., 18:28 Vladimir Ozerov :


Guys,

As a user I definitely do not want to think about BLATs, SATs, DATs,
whatsoever. I want to query data, iterate over data, send compute

tasks

to

data. If certain node is outside of BLAT and do not have data, then

this is

not affinity node. Can we just fix affinity logic to take in count

BLAT

appropriately?

On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov 

wrote:

Eduard,

Can you please summarize code changes that you are proposing?
I agree that BLT is a bit misleading term and DAT/SAT make more

sense.

However, establishing a consensus on v2.4 Baseline Topology

terminology

took a long time and seems like you are going to cause a bit more
perturbations.
I still don't understand what and how should be changed. Please

provide

summary of upcoming class renamings and changes of existing system

parts.

Best Regards,
Ivan Rakov


On 24.04.2018 17:46, Eduard Shangareev wrote:


Hi, Igniters,

I want to raise a topic about our affinity node definition.

After adding baseline (affinity) topology (BL(A)T) things start

being

complicated.

Plenty of bugs appears:

IGNITE-8173
ignite.getOrCreateCache(cacheConfig).iterator() method works

incorrect

for
replicated cache in case if some data node isn't in baseline

IGNITE-7628
SqlQuery hangs indefinitely with additional not registered in

baseline

node.

It's because everything relies on concept "affinity node".
And until now it was as simple as a server node which

Re: New definition for affinity node (issues with baseline)

Igniters,

I have introduced DAT in opposition to BLAT (SAT) because they reflect how
Ignite works.

But I actually have concerns about the necessity of such separation.

DAT exists only because we don't want to lose any data in in-memory caches.

But there are alternatives. Besides BLAT auto-change policies I would pay
attention to next approach:
- for in-memory caches, affinity would calculate with SAT/BLAT on the first
step and because of it collocation would work between in-memory and
persistent caches;
- on the next step, if there are offline nodes, we would spread their
partitions among alive nodes. This would save us from data loss.

I don't want to propose any changes until we don't have consensus.



On Tue, Apr 24, 2018 at 7:55 PM, Alexey Goncharuk <
alexey.goncha...@gmail.com> wrote:

> Vladimir,
>
> Automatic cluster membership changes may be implemented to grow the
> topology, but auto-shrinking topology is usually not possible because a
> process cannot distinguish between a node shutdown and network
> partitioning. If we want to deal with split-brain scenarios as a grown-up
> system, we should change the replication strategy within partitions to a
> consensus algorithm (I really hope we will). None of the consensus
> algorithms (at least known to me - paxos, raft, ZAB) do auto cluster
> adjustments based on a internally-detected process failure. I consider
> baseline topology as a step towards this model.
>
> Addressing your second concern, If a node was down for a short period of
> time, we should (and we do) rebalance only deltas, which is faster than
> erasing the whole node and moving all data from scratch.
>
> 2018-04-24 19:42 GMT+03:00 Vladimir Ozerov :
>
> > Ivan,
> >
> > This reasoning sounds questionable to me. First, separate logic for in
> > memory and persistent regions means that we loose collocation between
> > persistent and non persistent caches. Second, “data is still on disk”
> > assumption might be not valid if node has left due to disk crash, or when
> > data is updated on remaining nodes.
> >
> > вт, 24 апр. 2018 г. в 19:21, Ivan Rakov :
> >
> > > Stan,
> > >
> > > I believe it was discussed at the design proposal thread:
> > >
> > > http://apache-ignite-developers.2346864.n4.nabble.
> > com/Cluster-auto-activation-design-proposal-td20295.html
> > >
> > > The short answer: backup factor decreases if node leaves. In
> > > non-persistent mode we have to rebalance data ASAP - otherwise last
> node
> > > that owns partition may fail and data will be lost forever.
> > > This is not necessary if data is persisted to disk storage, that's the
> > > reason for Baseline Topology concept.
> > >
> > > Best Regards,
> > > Ivan Rakov
> > >
> > > On 24.04.2018 18:48, Stanislav Lukyanov wrote:
> > > > + for Vladimir's point - adding more complexity may (and likely will)
> > be
> > > > even more misleading.
> > > >
> > > > Can we take a step back and discuss why do we need to have different
> > > > behavior for persistent and in-memory caches? Can we make in-memory
> > > caches
> > > > honor baseline instead of special-casing them?
> > > >
> > > > Thanks,
> > > > Stan
> > > >
> > > >
> > > > вт, 24 апр. 2018 г., 18:28 Vladimir Ozerov :
> > > >
> > > >> Guys,
> > > >>
> > > >> As a user I definitely do not want to think about BLATs, SATs, DATs,
> > > >> whatsoever. I want to query data, iterate over data, send compute
> > tasks
> > > to
> > > >> data. If certain node is outside of BLAT and do not have data, then
> > > this is
> > > >> not affinity node. Can we just fix affinity logic to take in count
> > BLAT
> > > >> appropriately?
> > > >>
> > > >> On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov 
> > > wrote:
> > > >>
> > > >>> Eduard,
> > > >>>
> > > >>> Can you please summarize code changes that you are proposing?
> > > >>> I agree that BLT is a bit misleading term and DAT/SAT make more
> > sense.
> > > >>> However, establishing a consensus on v2.4 Baseline Topology
> > terminology
> > > >>> took a long time and seems like you are going to cause a bit more
> > > >>> perturbations.
> > > >>> I still don't understand what and how should be changed. Please
> > provide
> > > >>> summary of upcoming class renamings and changes of existing system
> > > parts.
> > > >>>
> > > >>> Best Regards,
> > > >>> Ivan Rakov
> > > >>>
> > > >>>
> > > >>> On 24.04.2018 17:46, Eduard Shangareev wrote:
> > > >>>
> > >  Hi, Igniters,
> > > 
> > >  I want to raise a topic about our affinity node definition.
> > > 
> > >  After adding baseline (affinity) topology (BL(A)T) things start
> > being
> > >  complicated.
> > > 
> > >  Plenty of bugs appears:
> > > 
> > >  IGNITE-8173
> > >  ignite.getOrCreateCache(cacheConfig).iterator() method works
> > incorrect
> > >  for
> > >  replicated cache in case if some data node isn't in baseline
> > > 
> > >  IGNITE-7628
> >

Re: New definition for affinity node (issues with baseline)

Vladimir,

Automatic cluster membership changes may be implemented to grow the
topology, but auto-shrinking topology is usually not possible because a
process cannot distinguish between a node shutdown and network
partitioning. If we want to deal with split-brain scenarios as a grown-up
system, we should change the replication strategy within partitions to a
consensus algorithm (I really hope we will). None of the consensus
algorithms (at least known to me - paxos, raft, ZAB) do auto cluster
adjustments based on a internally-detected process failure. I consider
baseline topology as a step towards this model.

Addressing your second concern, If a node was down for a short period of
time, we should (and we do) rebalance only deltas, which is faster than
erasing the whole node and moving all data from scratch.

2018-04-24 19:42 GMT+03:00 Vladimir Ozerov :

> Ivan,
>
> This reasoning sounds questionable to me. First, separate logic for in
> memory and persistent regions means that we loose collocation between
> persistent and non persistent caches. Second, “data is still on disk”
> assumption might be not valid if node has left due to disk crash, or when
> data is updated on remaining nodes.
>
> вт, 24 апр. 2018 г. в 19:21, Ivan Rakov :
>
> > Stan,
> >
> > I believe it was discussed at the design proposal thread:
> >
> > http://apache-ignite-developers.2346864.n4.nabble.
> com/Cluster-auto-activation-design-proposal-td20295.html
> >
> > The short answer: backup factor decreases if node leaves. In
> > non-persistent mode we have to rebalance data ASAP - otherwise last node
> > that owns partition may fail and data will be lost forever.
> > This is not necessary if data is persisted to disk storage, that's the
> > reason for Baseline Topology concept.
> >
> > Best Regards,
> > Ivan Rakov
> >
> > On 24.04.2018 18:48, Stanislav Lukyanov wrote:
> > > + for Vladimir's point - adding more complexity may (and likely will)
> be
> > > even more misleading.
> > >
> > > Can we take a step back and discuss why do we need to have different
> > > behavior for persistent and in-memory caches? Can we make in-memory
> > caches
> > > honor baseline instead of special-casing them?
> > >
> > > Thanks,
> > > Stan
> > >
> > >
> > > вт, 24 апр. 2018 г., 18:28 Vladimir Ozerov :
> > >
> > >> Guys,
> > >>
> > >> As a user I definitely do not want to think about BLATs, SATs, DATs,
> > >> whatsoever. I want to query data, iterate over data, send compute
> tasks
> > to
> > >> data. If certain node is outside of BLAT and do not have data, then
> > this is
> > >> not affinity node. Can we just fix affinity logic to take in count
> BLAT
> > >> appropriately?
> > >>
> > >> On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov 
> > wrote:
> > >>
> > >>> Eduard,
> > >>>
> > >>> Can you please summarize code changes that you are proposing?
> > >>> I agree that BLT is a bit misleading term and DAT/SAT make more
> sense.
> > >>> However, establishing a consensus on v2.4 Baseline Topology
> terminology
> > >>> took a long time and seems like you are going to cause a bit more
> > >>> perturbations.
> > >>> I still don't understand what and how should be changed. Please
> provide
> > >>> summary of upcoming class renamings and changes of existing system
> > parts.
> > >>>
> > >>> Best Regards,
> > >>> Ivan Rakov
> > >>>
> > >>>
> > >>> On 24.04.2018 17:46, Eduard Shangareev wrote:
> > >>>
> >  Hi, Igniters,
> > 
> >  I want to raise a topic about our affinity node definition.
> > 
> >  After adding baseline (affinity) topology (BL(A)T) things start
> being
> >  complicated.
> > 
> >  Plenty of bugs appears:
> > 
> >  IGNITE-8173
> >  ignite.getOrCreateCache(cacheConfig).iterator() method works
> incorrect
> >  for
> >  replicated cache in case if some data node isn't in baseline
> > 
> >  IGNITE-7628
> >  SqlQuery hangs indefinitely with additional not registered in
> baseline
> >  node.
> > 
> >  It's because everything relies on concept "affinity node".
> >  And until now it was as simple as a server node which passes node
> > >> filter.
> >  Other words any server node which is not filtered out by node
> filter.
> > 
> >  But node which is not in BL(A)T and which passes node filter would
> be
> >  treated as affinity node. And it's definitely wrong. At least, it
> is a
> >  source of many bugs (I believe there are much more than those 2
> which
> > I
> >  already have mentioned).
> > 
> >  It's clear that this definition should be changed.
> >  Let's start with a new definition of "Affinity topology". Affinity
> >  topology
> >  is a set of nodes which potentially could keep data.
> > 
> >  If we use knowledge about the current realization we can say that 1.
> > for
> >  in-memory cache groups it would be all server nodes;
> >  2.

[GitHub] ignite pull request #3911: IGNITE-8358 Destroy partition inside evictor to p...

2018-04-24 Thread Jokser

GitHub user Jokser opened a pull request:

https://github.com/apache/ignite/pull/3911

IGNITE-8358 Destroy partition inside evictor to prevent possible deadlock



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-8358

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3911.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3911


commit c8d9423521f9be9123c570b57cf638aad46583dc
Author: Pavel Kovalenko 
Date:   2018-04-24T16:45:08Z

IGNITE-8358 Destroy partition inside evictor to prevent possible deadlock.




---

Re: New definition for affinity node (issues with baseline)

Ivan,

This reasoning sounds questionable to me. First, separate logic for in
memory and persistent regions means that we loose collocation between
persistent and non persistent caches. Second, “data is still on disk”
assumption might be not valid if node has left due to disk crash, or when
data is updated on remaining nodes.

вт, 24 апр. 2018 г. в 19:21, Ivan Rakov :

> Stan,
>
> I believe it was discussed at the design proposal thread:
>
> http://apache-ignite-developers.2346864.n4.nabble.com/Cluster-auto-activation-design-proposal-td20295.html
>
> The short answer: backup factor decreases if node leaves. In
> non-persistent mode we have to rebalance data ASAP - otherwise last node
> that owns partition may fail and data will be lost forever.
> This is not necessary if data is persisted to disk storage, that's the
> reason for Baseline Topology concept.
>
> Best Regards,
> Ivan Rakov
>
> On 24.04.2018 18:48, Stanislav Lukyanov wrote:
> > + for Vladimir's point - adding more complexity may (and likely will) be
> > even more misleading.
> >
> > Can we take a step back and discuss why do we need to have different
> > behavior for persistent and in-memory caches? Can we make in-memory
> caches
> > honor baseline instead of special-casing them?
> >
> > Thanks,
> > Stan
> >
> >
> > вт, 24 апр. 2018 г., 18:28 Vladimir Ozerov :
> >
> >> Guys,
> >>
> >> As a user I definitely do not want to think about BLATs, SATs, DATs,
> >> whatsoever. I want to query data, iterate over data, send compute tasks
> to
> >> data. If certain node is outside of BLAT and do not have data, then
> this is
> >> not affinity node. Can we just fix affinity logic to take in count BLAT
> >> appropriately?
> >>
> >> On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov 
> wrote:
> >>
> >>> Eduard,
> >>>
> >>> Can you please summarize code changes that you are proposing?
> >>> I agree that BLT is a bit misleading term and DAT/SAT make more sense.
> >>> However, establishing a consensus on v2.4 Baseline Topology terminology
> >>> took a long time and seems like you are going to cause a bit more
> >>> perturbations.
> >>> I still don't understand what and how should be changed. Please provide
> >>> summary of upcoming class renamings and changes of existing system
> parts.
> >>>
> >>> Best Regards,
> >>> Ivan Rakov
> >>>
> >>>
> >>> On 24.04.2018 17:46, Eduard Shangareev wrote:
> >>>
>  Hi, Igniters,
> 
>  I want to raise a topic about our affinity node definition.
> 
>  After adding baseline (affinity) topology (BL(A)T) things start being
>  complicated.
> 
>  Plenty of bugs appears:
> 
>  IGNITE-8173
>  ignite.getOrCreateCache(cacheConfig).iterator() method works incorrect
>  for
>  replicated cache in case if some data node isn't in baseline
> 
>  IGNITE-7628
>  SqlQuery hangs indefinitely with additional not registered in baseline
>  node.
> 
>  It's because everything relies on concept "affinity node".
>  And until now it was as simple as a server node which passes node
> >> filter.
>  Other words any server node which is not filtered out by node filter.
> 
>  But node which is not in BL(A)T and which passes node filter would be
>  treated as affinity node. And it's definitely wrong. At least, it is a
>  source of many bugs (I believe there are much more than those 2 which
> I
>  already have mentioned).
> 
>  It's clear that this definition should be changed.
>  Let's start with a new definition of "Affinity topology". Affinity
>  topology
>  is a set of nodes which potentially could keep data.
> 
>  If we use knowledge about the current realization we can say that 1.
> for
>  in-memory cache groups it would be all server nodes;
>  2. for persistent cache groups it would be BL(A)T.
> 
>  I will further use Dynamic Affinity Topology or DAT for 1 (in-memory
> >> cache
>  groups) and Static Affinity Topology or SAT instead BL(A)T, or 2nd
> >> point.
>  Denote node filter as f(X), where X is affinity topology.
> 
>  Then we can say that node A is affinity node if
>  A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.
> 
>  It worth to mention that AT' should be used to pass to affinity
> function
>  of
>  cache groups.
>  Also, AT and AT' could change during the time (BL(A)T changes or node
>  joins/disconnections).
> 
>  And I don't like fact that usage of DAT or SAT relies on persistence
>  settings (Should we make it configurable per cache group?).
> 
>  Ok, I have created a ticket to implement this changes and will start
>  working on it.
>  https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
>  calculation doesn't take into account BLT).
> 
>  Also, I want to use these definitions (Affinity Topology, Affinity
> Node,
>  DAT, SAT) in

[GitHub] ignite pull request #3910: IGNITE-7896 FilePageStore truncate now actually r...

2018-04-24 Thread ivandasch

GitHub user ivandasch opened a pull request:

https://github.com/apache/ignite/pull/3910

IGNITE-7896 FilePageStore truncate now actually remove redundant

partition page file.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ivandasch/ignite ignite-7896

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3910.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3910






---

Re: New definition for affinity node (issues with baseline)

Stan,

I believe it was discussed at the design proposal thread:
http://apache-ignite-developers.2346864.n4.nabble.com/Cluster-auto-activation-design-proposal-td20295.html

The short answer: backup factor decreases if node leaves. In
non-persistent mode we have to rebalance data ASAP - otherwise last node
that owns partition may fail and data will be lost forever.
This is not necessary if data is persisted to disk storage, that's the
reason for Baseline Topology concept.

Best Regards,
Ivan Rakov

On 24.04.2018 18:48, Stanislav Lukyanov wrote:

+ for Vladimir's point - adding more complexity may (and likely will) be
even more misleading.

Can we take a step back and discuss why do we need to have different
behavior for persistent and in-memory caches? Can we make in-memory caches
honor baseline instead of special-casing them?

Thanks,
Stan

вт, 24 апр. 2018 г., 18:28 Vladimir Ozerov :

Guys,

As a user I definitely do not want to think about BLATs, SATs, DATs,
whatsoever. I want to query data, iterate over data, send compute tasks to
data. If certain node is outside of BLAT and do not have data, then this is
not affinity node. Can we just fix affinity logic to take in count BLAT
appropriately?

On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov wrote:

Eduard,

Can you please summarize code changes that you are proposing?
I agree that BLT is a bit misleading term and DAT/SAT make more sense.
However, establishing a consensus on v2.4 Baseline Topology terminology
took a long time and seems like you are going to cause a bit more
perturbations.
I still don't understand what and how should be changed. Please provide
summary of upcoming class renamings and changes of existing system parts.

Best Regards,
Ivan Rakov

On 24.04.2018 17:46, Eduard Shangareev wrote:

Hi, Igniters,

I want to raise a topic about our affinity node definition.

After adding baseline (affinity) topology (BL(A)T) things start being
complicated.

Plenty of bugs appears:

IGNITE-8173
ignite.getOrCreateCache(cacheConfig).iterator() method works incorrect
for
replicated cache in case if some data node isn't in baseline

IGNITE-7628
SqlQuery hangs indefinitely with additional not registered in baseline
node.

It's because everything relies on concept "affinity node".
And until now it was as simple as a server node which passes node

filter.

Other words any server node which is not filtered out by node filter.

But node which is not in BL(A)T and which passes node filter would be
treated as affinity node. And it's definitely wrong. At least, it is a
source of many bugs (I believe there are much more than those 2 which I
already have mentioned).

It's clear that this definition should be changed.
Let's start with a new definition of "Affinity topology". Affinity
topology
is a set of nodes which potentially could keep data.

If we use knowledge about the current realization we can say that 1. for
in-memory cache groups it would be all server nodes;
2. for persistent cache groups it would be BL(A)T.

I will further use Dynamic Affinity Topology or DAT for 1 (in-memory

cache

groups) and Static Affinity Topology or SAT instead BL(A)T, or 2nd

point.

Denote node filter as f(X), where X is affinity topology.

Then we can say that node A is affinity node if
A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.

It worth to mention that AT' should be used to pass to affinity function
of
cache groups.
Also, AT and AT' could change during the time (BL(A)T changes or node
joins/disconnections).

And I don't like fact that usage of DAT or SAT relies on persistence
settings (Should we make it configurable per cache group?).

Ok, I have created a ticket to implement this changes and will start
working on it.
https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
calculation doesn't take into account BLT).

Also, I want to use these definitions (Affinity Topology, Affinity Node,
DAT, SAT) in documentation and java docs.

Maybe, we also should consider replacing BL(A)T with SAT.

Thank you for your attention.

Re: Ticket review checklist

Vladimir,

I am not talking about massive/sophisticated refactoring. But I believe
that ask to extract some methods should be OK to do without an extra ticket.

A checklist shouldn't be necessarily a set of certain rules but also it
could include suggestion and reminders.

On Tue, Apr 24, 2018 at 6:39 PM, Vladimir Ozerov 
wrote:

> Ed,
>
> Refactoring is a separate task. If you would like to rework exchange future
> - please do this in a ticket "Refactor exchange task", nobody would against
> this. This is just a matter of creating separate ticket and separate PR. If
> one have a time for refactoring, it should not be a problem for him to
> spend several minutes on JIRA and GitHub.
>
> As far as documentation - what you describe is normal review process, when
> reviewer might want to ask contributor to fix something. Checklist is a
> different thing - this is a set of rules which must be followed by anyone.
> I do not understand how you can define documentation in this checklist.
> Same problem with logging - what is "enough"?
>
> On Tue, Apr 24, 2018 at 4:51 PM, Eduard Shangareev <
> eduard.shangar...@gmail.com> wrote:
>
> > Igniters,
> >
> > I don't understand why you are so against refactoring.
> > Code already smells like hell. Methods 200+ line is normal. Exchange
> future
> > is asking to be separated on several one. Transaction code could
> understand
> > few people.
> >
> > If we separate refactoring from development it would mean that no one
> will
> > do it.
> >
> >
> > 2) Documentation.
> > Everything which was asked by reviewers to clarify idea should be
> reflected
> > in the code.
> >
> > 3) Logging.
> > Logging should be enough to troubleshoot the problem if someone comes to
> > user-list with an issue in the code.
> >
> >
> > On Fri, Apr 20, 2018 at 7:06 PM, Dmitry Pavlov 
> > wrote:
> >
> > > Hi Igniters,
> > >
> > > +1 to idea of checklist.
> > >
> > > +1 to refactoring and documenting code related to ticket in +/-20 LOC
> at
> > > least.
> > >
> > > If we start to do it as part of our regular contribution, code will be
> > > better, it would became common practice and part of Apache Ignite
> > > development culure.
> > >
> > > If we will hope we will have free time to submit separate patch someday
> > and
> > > have patience to complete patch-submission process, code will remain
> > > undocumented and poor-readable.
> > >
> > > Sincerely,
> > > Dmitriy Pavlov
> > >
> > > пт, 20 апр. 2018 г. в 18:56, Александр Меньшиков  >:
> > >
> > > > 4) Metrics.
> > > > partially +1
> > > >
> > > > It makes sense to have some minimal code coverage for new code in PR.
> > > IMHO.
> > > >
> > > > Also, we can limit the cyclomatic complexity of the new code in PR
> too.
> > > >
> > > > 6) Refactoring
> > > > -1
> > > >
> > > > I understand why people want to refactor old code.
> > > > But I think refactoring should be always a separate task.
> > > > And it's better to remove all refactoring from PR, if it's not the
> > sense
> > > of
> > > > the issue.
> > > >
> > > >
> > > > 2018-04-20 16:54 GMT+03:00 Andrey Kuznetsov :
> > > >
> > > > > What about adding the following item to the checklist: when the
> > change
> > > > adds
> > > > > new functionality, then unit tests should also be provided, if it's
> > > > > technically possible?
> > > > >
> > > > > As for refactorings, in fact they are strongly discouraged today
> for
> > > some
> > > > > unclear reason. Let's permit to make refactorings in the checklist
> > > being
> > > > > discussed. (Of cource, refactoring should relate to problem being
> > > > solved.)
> > > > >
> > > > > 2018-04-20 16:16 GMT+03:00 Vladimir Ozerov :
> > > > >
> > > > > > Hi Ed,
> > > > > >
> > > > > > Unfortunately some of these points are not good candidates for
> the
> > > > > > checklist because of these:
> > > > > > - It must be clear and disallow *multiple interpretations*
> > > > > > - It must be *lightweight*, otherwise Ignite development would
> > > become a
> > > > > > nightmare
> > > > > >
> > > > > > We cannot have "nice to have" points here. Checklist should
> answer
> > > the
> > > > > > question "is ticket eligible to be merged?"
> > > > > >
> > > > > > >>> 1) Code style.
> > > > > > +1
> > > > > >
> > > > > > >>>  2) Documentation
> > > > > > -1, it is impossible to define what is "well-documented". A piece
> > of
> > > > code
> > > > > > could be obvious for one contributor, and non-obvious for
> another.
> > In
> > > > any
> > > > > > case this is not a blocker for merge. Instead, during review one
> > can
> > > > ask
> > > > > > implementer to add more docs, but it cannot be forced.
> > > > > >
> > > > > > >>>  3) Logging
> > > > > > -1, same problem - what is "enough logging?". Enough for whom?
> How
> > to
> > > > > > understand whether it is enough or not?
> > > > > >
> > > > > > >>>  4) Metrics
> > > > > > -1, no clear boundaries, and decision on

Re: New definition for affinity node (issues with baseline)

2018-04-24 Thread Stanislav Lukyanov

+ for Vladimir's point - adding more complexity may (and likely will) be
even more misleading.

Can we take a step back and discuss why do we need to have different
behavior for persistent and in-memory caches? Can we make in-memory caches
honor baseline instead of special-casing them?

Thanks,
Stan


вт, 24 апр. 2018 г., 18:28 Vladimir Ozerov :

> Guys,
>
> As a user I definitely do not want to think about BLATs, SATs, DATs,
> whatsoever. I want to query data, iterate over data, send compute tasks to
> data. If certain node is outside of BLAT and do not have data, then this is
> not affinity node. Can we just fix affinity logic to take in count BLAT
> appropriately?
>
> On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov  wrote:
>
> > Eduard,
> >
> > Can you please summarize code changes that you are proposing?
> > I agree that BLT is a bit misleading term and DAT/SAT make more sense.
> > However, establishing a consensus on v2.4 Baseline Topology terminology
> > took a long time and seems like you are going to cause a bit more
> > perturbations.
> > I still don't understand what and how should be changed. Please provide
> > summary of upcoming class renamings and changes of existing system parts.
> >
> > Best Regards,
> > Ivan Rakov
> >
> >
> > On 24.04.2018 17:46, Eduard Shangareev wrote:
> >
> >> Hi, Igniters,
> >>
> >> I want to raise a topic about our affinity node definition.
> >>
> >> After adding baseline (affinity) topology (BL(A)T) things start being
> >> complicated.
> >>
> >> Plenty of bugs appears:
> >>
> >> IGNITE-8173
> >> ignite.getOrCreateCache(cacheConfig).iterator() method works incorrect
> >> for
> >> replicated cache in case if some data node isn't in baseline
> >>
> >> IGNITE-7628
> >> SqlQuery hangs indefinitely with additional not registered in baseline
> >> node.
> >>
> >> It's because everything relies on concept "affinity node".
> >> And until now it was as simple as a server node which passes node
> filter.
> >> Other words any server node which is not filtered out by node filter.
> >>
> >> But node which is not in BL(A)T and which passes node filter would be
> >> treated as affinity node. And it's definitely wrong. At least, it is a
> >> source of many bugs (I believe there are much more than those 2 which I
> >> already have mentioned).
> >>
> >> It's clear that this definition should be changed.
> >> Let's start with a new definition of "Affinity topology". Affinity
> >> topology
> >> is a set of nodes which potentially could keep data.
> >>
> >> If we use knowledge about the current realization we can say that 1. for
> >> in-memory cache groups it would be all server nodes;
> >> 2. for persistent cache groups it would be BL(A)T.
> >>
> >> I will further use Dynamic Affinity Topology or DAT for 1 (in-memory
> cache
> >> groups) and Static Affinity Topology or SAT instead BL(A)T, or 2nd
> point.
> >>
> >> Denote node filter as f(X), where X is affinity topology.
> >>
> >> Then we can say that node A is affinity node if
> >> A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.
> >>
> >> It worth to mention that AT' should be used to pass to affinity function
> >> of
> >> cache groups.
> >> Also, AT and AT' could change during the time (BL(A)T changes or node
> >> joins/disconnections).
> >>
> >> And I don't like fact that usage of DAT or SAT relies on persistence
> >> settings (Should we make it configurable per cache group?).
> >>
> >> Ok, I have created a ticket to implement this changes and will start
> >> working on it.
> >> https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
> >> calculation doesn't take into account BLT).
> >>
> >> Also, I want to use these definitions (Affinity Topology, Affinity Node,
> >> DAT, SAT) in documentation and java docs.
> >>
> >> Maybe, we also should consider replacing BL(A)T with SAT.
> >>
> >> Thank you for your attention.
> >>
> >>
> >
>

Re: New definition for affinity node (issues with baseline)

Ed,

Agreed. Can we see proposed API changes?

On Tue, Apr 24, 2018 at 6:39 PM, Eduard Shangareev <
eduard.shangar...@gmail.com> wrote:

> Vladimir,
>
> It will be fixed, But it is not user-list.
>
> We (developers) should decide ourselves how to go ahead with these
> concepts.
>
> And I think that our old approach to describe BLAT is sophisticated and not
> clear (maybe, even error-prone).
>
> On Tue, Apr 24, 2018 at 6:28 PM, Vladimir Ozerov 
> wrote:
>
> > Guys,
> >
> > As a user I definitely do not want to think about BLATs, SATs, DATs,
> > whatsoever. I want to query data, iterate over data, send compute tasks
> to
> > data. If certain node is outside of BLAT and do not have data, then this
> is
> > not affinity node. Can we just fix affinity logic to take in count BLAT
> > appropriately?
> >
> > On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov 
> wrote:
> >
> > > Eduard,
> > >
> > > Can you please summarize code changes that you are proposing?
> > > I agree that BLT is a bit misleading term and DAT/SAT make more sense.
> > > However, establishing a consensus on v2.4 Baseline Topology terminology
> > > took a long time and seems like you are going to cause a bit more
> > > perturbations.
> > > I still don't understand what and how should be changed. Please provide
> > > summary of upcoming class renamings and changes of existing system
> parts.
> > >
> > > Best Regards,
> > > Ivan Rakov
> > >
> > >
> > > On 24.04.2018 17:46, Eduard Shangareev wrote:
> > >
> > >> Hi, Igniters,
> > >>
> > >> I want to raise a topic about our affinity node definition.
> > >>
> > >> After adding baseline (affinity) topology (BL(A)T) things start being
> > >> complicated.
> > >>
> > >> Plenty of bugs appears:
> > >>
> > >> IGNITE-8173
> > >> ignite.getOrCreateCache(cacheConfig).iterator() method works
> incorrect
> > >> for
> > >> replicated cache in case if some data node isn't in baseline
> > >>
> > >> IGNITE-7628
> > >> SqlQuery hangs indefinitely with additional not registered in baseline
> > >> node.
> > >>
> > >> It's because everything relies on concept "affinity node".
> > >> And until now it was as simple as a server node which passes node
> > filter.
> > >> Other words any server node which is not filtered out by node filter.
> > >>
> > >> But node which is not in BL(A)T and which passes node filter would be
> > >> treated as affinity node. And it's definitely wrong. At least, it is a
> > >> source of many bugs (I believe there are much more than those 2 which
> I
> > >> already have mentioned).
> > >>
> > >> It's clear that this definition should be changed.
> > >> Let's start with a new definition of "Affinity topology". Affinity
> > >> topology
> > >> is a set of nodes which potentially could keep data.
> > >>
> > >> If we use knowledge about the current realization we can say that 1.
> for
> > >> in-memory cache groups it would be all server nodes;
> > >> 2. for persistent cache groups it would be BL(A)T.
> > >>
> > >> I will further use Dynamic Affinity Topology or DAT for 1 (in-memory
> > cache
> > >> groups) and Static Affinity Topology or SAT instead BL(A)T, or 2nd
> > point.
> > >>
> > >> Denote node filter as f(X), where X is affinity topology.
> > >>
> > >> Then we can say that node A is affinity node if
> > >> A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.
> > >>
> > >> It worth to mention that AT' should be used to pass to affinity
> function
> > >> of
> > >> cache groups.
> > >> Also, AT and AT' could change during the time (BL(A)T changes or node
> > >> joins/disconnections).
> > >>
> > >> And I don't like fact that usage of DAT or SAT relies on persistence
> > >> settings (Should we make it configurable per cache group?).
> > >>
> > >> Ok, I have created a ticket to implement this changes and will start
> > >> working on it.
> > >> https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
> > >> calculation doesn't take into account BLT).
> > >>
> > >> Also, I want to use these definitions (Affinity Topology, Affinity
> Node,
> > >> DAT, SAT) in documentation and java docs.
> > >>
> > >> Maybe, we also should consider replacing BL(A)T with SAT.
> > >>
> > >> Thank you for your attention.
> > >>
> > >>
> > >
> >
>

Re: Ticket review checklist

Ed,

Refactoring is a separate task. If you would like to rework exchange future
- please do this in a ticket "Refactor exchange task", nobody would against
this. This is just a matter of creating separate ticket and separate PR. If
one have a time for refactoring, it should not be a problem for him to
spend several minutes on JIRA and GitHub.

As far as documentation - what you describe is normal review process, when
reviewer might want to ask contributor to fix something. Checklist is a
different thing - this is a set of rules which must be followed by anyone.
I do not understand how you can define documentation in this checklist.
Same problem with logging - what is "enough"?

On Tue, Apr 24, 2018 at 4:51 PM, Eduard Shangareev <
eduard.shangar...@gmail.com> wrote:

> Igniters,
>
> I don't understand why you are so against refactoring.
> Code already smells like hell. Methods 200+ line is normal. Exchange future
> is asking to be separated on several one. Transaction code could understand
> few people.
>
> If we separate refactoring from development it would mean that no one will
> do it.
>
>
> 2) Documentation.
> Everything which was asked by reviewers to clarify idea should be reflected
> in the code.
>
> 3) Logging.
> Logging should be enough to troubleshoot the problem if someone comes to
> user-list with an issue in the code.
>
>
> On Fri, Apr 20, 2018 at 7:06 PM, Dmitry Pavlov 
> wrote:
>
> > Hi Igniters,
> >
> > +1 to idea of checklist.
> >
> > +1 to refactoring and documenting code related to ticket in +/-20 LOC at
> > least.
> >
> > If we start to do it as part of our regular contribution, code will be
> > better, it would became common practice and part of Apache Ignite
> > development culure.
> >
> > If we will hope we will have free time to submit separate patch someday
> and
> > have patience to complete patch-submission process, code will remain
> > undocumented and poor-readable.
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > пт, 20 апр. 2018 г. в 18:56, Александр Меньшиков :
> >
> > > 4) Metrics.
> > > partially +1
> > >
> > > It makes sense to have some minimal code coverage for new code in PR.
> > IMHO.
> > >
> > > Also, we can limit the cyclomatic complexity of the new code in PR too.
> > >
> > > 6) Refactoring
> > > -1
> > >
> > > I understand why people want to refactor old code.
> > > But I think refactoring should be always a separate task.
> > > And it's better to remove all refactoring from PR, if it's not the
> sense
> > of
> > > the issue.
> > >
> > >
> > > 2018-04-20 16:54 GMT+03:00 Andrey Kuznetsov :
> > >
> > > > What about adding the following item to the checklist: when the
> change
> > > adds
> > > > new functionality, then unit tests should also be provided, if it's
> > > > technically possible?
> > > >
> > > > As for refactorings, in fact they are strongly discouraged today for
> > some
> > > > unclear reason. Let's permit to make refactorings in the checklist
> > being
> > > > discussed. (Of cource, refactoring should relate to problem being
> > > solved.)
> > > >
> > > > 2018-04-20 16:16 GMT+03:00 Vladimir Ozerov :
> > > >
> > > > > Hi Ed,
> > > > >
> > > > > Unfortunately some of these points are not good candidates for the
> > > > > checklist because of these:
> > > > > - It must be clear and disallow *multiple interpretations*
> > > > > - It must be *lightweight*, otherwise Ignite development would
> > become a
> > > > > nightmare
> > > > >
> > > > > We cannot have "nice to have" points here. Checklist should answer
> > the
> > > > > question "is ticket eligible to be merged?"
> > > > >
> > > > > >>> 1) Code style.
> > > > > +1
> > > > >
> > > > > >>>  2) Documentation
> > > > > -1, it is impossible to define what is "well-documented". A piece
> of
> > > code
> > > > > could be obvious for one contributor, and non-obvious for another.
> In
> > > any
> > > > > case this is not a blocker for merge. Instead, during review one
> can
> > > ask
> > > > > implementer to add more docs, but it cannot be forced.
> > > > >
> > > > > >>>  3) Logging
> > > > > -1, same problem - what is "enough logging?". Enough for whom? How
> to
> > > > > understand whether it is enough or not?
> > > > >
> > > > > >>>  4) Metrics
> > > > > -1, no clear boundaries, and decision on whether metrics are to be
> > > added
> > > > or
> > > > > not should be performed during design phase. As before, it is
> > perfectly
> > > > > valid to ask contributor to add metrics with clear explanation why,
> > but
> > > > > this is not part of the checklist.
> > > > >
> > > > > >>> 5) TC status
> > > > > +1, already mentioned
> > > > >
> > > > > >>>  6) Refactoring
> > > > > Strong -1. OOP is a slippery slope, there are no good and bad
> > receipts
> > > > for
> > > > > all cases, hence it cannot be used in a checklist.
> > > > >
> > > > > We can borrow useful rules from p.2, p.3 and p.4 if you provide
> clear
> > > > >

Re: New definition for affinity node (issues with baseline)

Vladimir,

It will be fixed, But it is not user-list.

We (developers) should decide ourselves how to go ahead with these concepts.

And I think that our old approach to describe BLAT is sophisticated and not
clear (maybe, even error-prone).

On Tue, Apr 24, 2018 at 6:28 PM, Vladimir Ozerov 
wrote:

> Guys,
>
> As a user I definitely do not want to think about BLATs, SATs, DATs,
> whatsoever. I want to query data, iterate over data, send compute tasks to
> data. If certain node is outside of BLAT and do not have data, then this is
> not affinity node. Can we just fix affinity logic to take in count BLAT
> appropriately?
>
> On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov  wrote:
>
> > Eduard,
> >
> > Can you please summarize code changes that you are proposing?
> > I agree that BLT is a bit misleading term and DAT/SAT make more sense.
> > However, establishing a consensus on v2.4 Baseline Topology terminology
> > took a long time and seems like you are going to cause a bit more
> > perturbations.
> > I still don't understand what and how should be changed. Please provide
> > summary of upcoming class renamings and changes of existing system parts.
> >
> > Best Regards,
> > Ivan Rakov
> >
> >
> > On 24.04.2018 17:46, Eduard Shangareev wrote:
> >
> >> Hi, Igniters,
> >>
> >> I want to raise a topic about our affinity node definition.
> >>
> >> After adding baseline (affinity) topology (BL(A)T) things start being
> >> complicated.
> >>
> >> Plenty of bugs appears:
> >>
> >> IGNITE-8173
> >> ignite.getOrCreateCache(cacheConfig).iterator() method works incorrect
> >> for
> >> replicated cache in case if some data node isn't in baseline
> >>
> >> IGNITE-7628
> >> SqlQuery hangs indefinitely with additional not registered in baseline
> >> node.
> >>
> >> It's because everything relies on concept "affinity node".
> >> And until now it was as simple as a server node which passes node
> filter.
> >> Other words any server node which is not filtered out by node filter.
> >>
> >> But node which is not in BL(A)T and which passes node filter would be
> >> treated as affinity node. And it's definitely wrong. At least, it is a
> >> source of many bugs (I believe there are much more than those 2 which I
> >> already have mentioned).
> >>
> >> It's clear that this definition should be changed.
> >> Let's start with a new definition of "Affinity topology". Affinity
> >> topology
> >> is a set of nodes which potentially could keep data.
> >>
> >> If we use knowledge about the current realization we can say that 1. for
> >> in-memory cache groups it would be all server nodes;
> >> 2. for persistent cache groups it would be BL(A)T.
> >>
> >> I will further use Dynamic Affinity Topology or DAT for 1 (in-memory
> cache
> >> groups) and Static Affinity Topology or SAT instead BL(A)T, or 2nd
> point.
> >>
> >> Denote node filter as f(X), where X is affinity topology.
> >>
> >> Then we can say that node A is affinity node if
> >> A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.
> >>
> >> It worth to mention that AT' should be used to pass to affinity function
> >> of
> >> cache groups.
> >> Also, AT and AT' could change during the time (BL(A)T changes or node
> >> joins/disconnections).
> >>
> >> And I don't like fact that usage of DAT or SAT relies on persistence
> >> settings (Should we make it configurable per cache group?).
> >>
> >> Ok, I have created a ticket to implement this changes and will start
> >> working on it.
> >> https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
> >> calculation doesn't take into account BLT).
> >>
> >> Also, I want to use these definitions (Affinity Topology, Affinity Node,
> >> DAT, SAT) in documentation and java docs.
> >>
> >> Maybe, we also should consider replacing BL(A)T with SAT.
> >>
> >> Thank you for your attention.
> >>
> >>
> >
>

Re: New definition for affinity node (issues with baseline)

Guys,

As a user I definitely do not want to think about BLATs, SATs, DATs,
whatsoever. I want to query data, iterate over data, send compute tasks to
data. If certain node is outside of BLAT and do not have data, then this is
not affinity node. Can we just fix affinity logic to take in count BLAT
appropriately?

On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov  wrote:

> Eduard,
>
> Can you please summarize code changes that you are proposing?
> I agree that BLT is a bit misleading term and DAT/SAT make more sense.
> However, establishing a consensus on v2.4 Baseline Topology terminology
> took a long time and seems like you are going to cause a bit more
> perturbations.
> I still don't understand what and how should be changed. Please provide
> summary of upcoming class renamings and changes of existing system parts.
>
> Best Regards,
> Ivan Rakov
>
>
> On 24.04.2018 17:46, Eduard Shangareev wrote:
>
>> Hi, Igniters,
>>
>> I want to raise a topic about our affinity node definition.
>>
>> After adding baseline (affinity) topology (BL(A)T) things start being
>> complicated.
>>
>> Plenty of bugs appears:
>>
>> IGNITE-8173
>> ignite.getOrCreateCache(cacheConfig).iterator() method works incorrect
>> for
>> replicated cache in case if some data node isn't in baseline
>>
>> IGNITE-7628
>> SqlQuery hangs indefinitely with additional not registered in baseline
>> node.
>>
>> It's because everything relies on concept "affinity node".
>> And until now it was as simple as a server node which passes node filter.
>> Other words any server node which is not filtered out by node filter.
>>
>> But node which is not in BL(A)T and which passes node filter would be
>> treated as affinity node. And it's definitely wrong. At least, it is a
>> source of many bugs (I believe there are much more than those 2 which I
>> already have mentioned).
>>
>> It's clear that this definition should be changed.
>> Let's start with a new definition of "Affinity topology". Affinity
>> topology
>> is a set of nodes which potentially could keep data.
>>
>> If we use knowledge about the current realization we can say that 1. for
>> in-memory cache groups it would be all server nodes;
>> 2. for persistent cache groups it would be BL(A)T.
>>
>> I will further use Dynamic Affinity Topology or DAT for 1 (in-memory cache
>> groups) and Static Affinity Topology or SAT instead BL(A)T, or 2nd point.
>>
>> Denote node filter as f(X), where X is affinity topology.
>>
>> Then we can say that node A is affinity node if
>> A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.
>>
>> It worth to mention that AT' should be used to pass to affinity function
>> of
>> cache groups.
>> Also, AT and AT' could change during the time (BL(A)T changes or node
>> joins/disconnections).
>>
>> And I don't like fact that usage of DAT or SAT relies on persistence
>> settings (Should we make it configurable per cache group?).
>>
>> Ok, I have created a ticket to implement this changes and will start
>> working on it.
>> https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
>> calculation doesn't take into account BLT).
>>
>> Also, I want to use these definitions (Affinity Topology, Affinity Node,
>> DAT, SAT) in documentation and java docs.
>>
>> Maybe, we also should consider replacing BL(A)T with SAT.
>>
>> Thank you for your attention.
>>
>>
>

[GitHub] ignite pull request #3881: IGNITE-8313 Add trace logs on exchange phases and...

2018-04-24 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3881


---

Re: New definition for affinity node (issues with baseline)


Eduard,

Can you please summarize code changes that you are proposing?
I agree that BLT is a bit misleading term and DAT/SAT make more sense. 
However, establishing a consensus on v2.4 Baseline Topology terminology 
took a long time and seems like you are going to cause a bit more 
perturbations.
I still don't understand what and how should be changed. Please provide 
summary of upcoming class renamings and changes of existing system parts.


Best Regards,
Ivan Rakov

On 24.04.2018 17:46, Eduard Shangareev wrote:

Hi, Igniters,

I want to raise a topic about our affinity node definition.

After adding baseline (affinity) topology (BL(A)T) things start being
complicated.

Plenty of bugs appears:

IGNITE-8173
ignite.getOrCreateCache(cacheConfig).iterator() method works incorrect for
replicated cache in case if some data node isn't in baseline

IGNITE-7628
SqlQuery hangs indefinitely with additional not registered in baseline node.

It's because everything relies on concept "affinity node".
And until now it was as simple as a server node which passes node filter.
Other words any server node which is not filtered out by node filter.

But node which is not in BL(A)T and which passes node filter would be
treated as affinity node. And it's definitely wrong. At least, it is a
source of many bugs (I believe there are much more than those 2 which I
already have mentioned).

It's clear that this definition should be changed.
Let's start with a new definition of "Affinity topology". Affinity topology
is a set of nodes which potentially could keep data.

If we use knowledge about the current realization we can say that 1. for
in-memory cache groups it would be all server nodes;
2. for persistent cache groups it would be BL(A)T.

I will further use Dynamic Affinity Topology or DAT for 1 (in-memory cache
groups) and Static Affinity Topology or SAT instead BL(A)T, or 2nd point.

Denote node filter as f(X), where X is affinity topology.

Then we can say that node A is affinity node if
A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.

It worth to mention that AT' should be used to pass to affinity function of
cache groups.
Also, AT and AT' could change during the time (BL(A)T changes or node
joins/disconnections).

And I don't like fact that usage of DAT or SAT relies on persistence
settings (Should we make it configurable per cache group?).

Ok, I have created a ticket to implement this changes and will start
working on it.
https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
calculation doesn't take into account BLT).

Also, I want to use these definitions (Affinity Topology, Affinity Node,
DAT, SAT) in documentation and java docs.

Maybe, we also should consider replacing BL(A)T with SAT.

Thank you for your attention.

Re: Orphaned, duplicate, and main-class tests!

2018-04-24 Thread Dmitry Pavlov

I agree with Yakov here. If nobody responds here we can consider we have
lazy consensus on removal of tests.

I'm going to review PRs from Ilya.

вт, 24 апр. 2018 г. в 6:11, Yakov Zhdanov :

> Alexey Goncharuk, Vladimir Ozerov, what do you think about these tests?
>
> I believe they were created as a part of variuos optimization and profiling
> activities. I also think we can remove them since nobody cares about them
> for too long.
>
> Thoughts?
>
> Yakov Zhdanov
>
> ср, 18 апр. 2018 г., 16:42 Ilya Kasnacheev :
>
> > Hello!
> >
> > I've decided to return to this task after a break.
> >
> > Can you please tell me why do we have main-class tests? Such as
> >
> > GridBasicPerformanceTest.class,
> > GridBenchmarkCacheGetLoadTest.class,
> > GridBoundedConcurrentLinkedHashSetLoadTest.class,
> > GridCacheDataStructuresLoadTest.class,
> > GridCacheReplicatedPreloadUndeploysTest.class,
> > GridCacheLoadTest.class,
> > GridCacheMultiNodeDataStructureTest.class,
> > GridCapacityLoadTest.class,
> > GridContinuousOperationsLoadTest.class,
> > GridFactoryVmShutdownTest.class,
> > GridFutureListenPerformanceTest.class,
> > GridFutureQueueTest.class,
> > GridGcTimeoutTest.class,
> > GridJobExecutionSingleNodeLoadTest.class,
> > GridJobExecutionSingleNodeSemaphoreLoadTest.class,
> > GridJobLoadTest.class,
> > GridMergeSortLoadTest.class,
> > GridNioBenchmarkTest.class,
> > GridThreadPriorityTest.class,
> > GridSystemCurrentTimeMillisTest.class,
> > BlockingQueueTest.class,
> > MultipleFileIOTest.class,
> > GridSingleExecutionTest.class
> >
> >
> > If nobody wants them, how about we delete them in master branch? Start
> > afresh?
> >
> > --
> > Ilya Kasnacheev
> >
> > 2018-02-13 17:02 GMT+03:00 Ilya Kasnacheev :
> >
> > > Anton,
> > >
> > > >Tests should be attached to appropriate suites
> > >
> > > This I can do
> > >
> > > > and muted if necessary, Issues should be created on each mute.
> > >
> > > This is roughly a week of work. I can't spare that right now. I doubt
> > > anyone can.
> > >
> > > Can we approach this by smaller steps?
> > >
> > > --
> > > Ilya Kasnacheev
> > >
> > > 2018-02-06 19:55 GMT+03:00 Anton Vinogradov  >:
> > >
> > >> Val,
> > >>
> > >> Tests should be attached to appropriate suites and muted if necessary,
> > >> Issues should be created on each mute.
> > >>
> > >> On Tue, Feb 6, 2018 at 7:23 PM, Valentin Kulichenko <
> > >> valentin.kuliche...@gmail.com> wrote:
> > >>
> > >> > Anton,
> > >> >
> > >> > I tend to agree with Ilya that identifying and fixing all the
> possible
> > >> > broken tests in one go is not feasible. What is the proper way in
> your
> > >> > view? What are you suggesting?
> > >> >
> > >> > -Val
> > >> >
> > >> > On Mon, Feb 5, 2018 at 2:18 AM, Anton Vinogradov <
> > >> avinogra...@gridgain.com
> > >> > >
> > >> > wrote:
> > >> >
> > >> > > Ilya,
> > >> > >
> > >> > > 1) Still see no reason for such changes. Does this break
> something?
> > >> > >
> > >> > > 2) Looks like you're trying to add Trash*TestSuite.java which will
> > >> never
> > >> > be
> > >> > > refactored.
> > >> > > We should do everything in proper way now, not sometime.
> > >> > >
> > >> > > 3) Your comments looks odd to me.
> > >> > > Issue should be resolved in proper way.
> > >> > >
> > >> > > On Mon, Feb 5, 2018 at 1:07 PM, Ilya Kasnacheev <
> > >> > ilya.kasnach...@gmail.com
> > >> > > >
> > >> > > wrote:
> > >> > >
> > >> > > > Anton,
> > >> > > >
> > >> > > > 1) We already have ~100 files named "*AbstractTest.java".
> Renaming
> > >> > these
> > >> > > > several files will help checking for orphaned tests in the
> future,
> > >> as
> > >> > > well
> > >> > > > as increasing code base consistency.
> > >> > > >
> > >> > > > 2) This is huge work that is not doable by any single developer.
> > >> While
> > >> > > > IgniteLostAndFoundTestSuite can be slowly refactored away
> > >> > > > This is unless you are OK with putting all these tests, most of
> > >> which
> > >> > are
> > >> > > > red and some are hanging, in production test suites and
> therefore
> > >> > > breaking
> > >> > > > productivity for a couple months while this gets sorted.
> > >> > > > Are you OK with that? Anybody else?
> > >> > > >
> > >> > > > 3) I think I *could* put them in some test suite or another, but
> > I'm
> > >> > > pretty
> > >> > > > sure I can't fix them all, not in one commit, not ever. Nobody
> can
> > >> do
> > >> > > that
> > >> > > > single-handedly. We need a plan here.
> > >> > > >
> > >> > > > Ilya.
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Ilya Kasnacheev
> > >> > > >
> > >> > > > 2018-02-05 13:00 GMT+03:00 Anton Vinogradov <
> > >> avinogra...@gridgain.com
> > >> > >:
> > >> > > >
> > >> > > > > Ilya,
> > >> > > > >
> > >> > > > > 1) I don't think it's a good idea to rename classes to
> > >> > >

Re: Reconsider TTL expire mechanics in Ignite

I think, it would be more fair and simple to configure distributed 
expiration as flag in cache configuration.


By the way, we still have to store ordered set of expirable entries on 
every node. Having https://issues.apache.org/jira/browse/IGNITE-5874 
merged, we can do the following: if distributed eviction is enabled, 
primary node will scan PendingEntriesTree and generate remove requests, 
if it's disabled, every node will clear it's own PendingEntriesTree. 
This will allow user to switch distributed expiration on/off after grid 
restart.


Best Regards,
Ivan Rakov

On 24.04.2018 17:02, Alexey Goncharuk wrote:


1.
Ivan,

Agree about the use-case when we have a read-write-through store. However,
we allow to use Ignite in-memory caches even without 3rd party stores, in
this case the same issue is still present. Maybe we can keep local expire
for read-through caches and have strongly consistent expire for other modes?

2018-04-24 16:51 GMT+03:00 Ivan Rakov :


Alexey,

Distributed expire will result in serious performance overhead, mostly on
network level.
I think, the main use case of TTL are in-memory caches that accelerate
access to slower third-party data source. In such case nothing is broken if
data is missing; strong consistency guarantees are not needed. I think,
that's why we should keep "local expiration" at least for in-memory caches.
Our in-memory page eviction works in the same way.

Best Regards,
Ivan Rakov

On 24.04.2018 16:05, Alexey Goncharuk wrote:


Igniters,

We recently experienced some issues with TTL with enabled persistence, the
issues were related to persistence implementation details. However, when
we
were adding tests to cover more cases, we found more failures, which, I
think, reveal some fundamental issues with expire mechanism.

In short, the root cause of the issue is that we expire entries on primary
and backup nodes independently, which means:
1) Partition sizes may have different values at different times which will
trigger false-negative checks on partition map exchange which was recently
added
2) More importantly, this may lead to inconsistent primary and backup node
values when EntryProcessor is used, because an entry processor may observe
a non-null value on one node and a null value on another node.

In my opinion, the second issue is critical and we must change the expiry
mechanics to run expiry in a distributed mode, with cache mode semantics
for entry remove.

Thoughts?

[GitHub] ignite pull request #3909: Fix tx hanging on node stop

2018-04-24 Thread AMashenkov

GitHub user AMashenkov opened a pull request:

https://github.com/apache/ignite/pull/3909

Fix tx hanging on node stop

For test purposes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-gg-13317-1.8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3909.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3909


commit 5fb5c7e3b54ae4efb7a6a1832ba647677d93e0cd
Author: Evgenii Zhuravlev 
Date:   2017-06-22T06:43:03Z

IGNITE-5399 Manual cache rebalancing feature is broken

commit 01d41b72ecc3e81dfc8966cc0e395c247037241c
Author: Evgenii Zhuravlev 
Date:   2017-06-21T10:48:15Z

GG-12256 H2Indexes are not deleted if key class implements Externalizable

commit 5ac9afc719138e37a7d97d9d9db05243eee9a942
Author: Evgenii Zhuravlev 
Date:   2017-06-22T09:36:14Z

IGNITE-5399 add test to testsuite

commit a935d40a80e2f928a84a145aba540a45b156687f
Author: Evgenii Zhuravlev 
Date:   2017-06-22T12:10:32Z

GG-12256 Minor fixes

commit 7e2468770a4eb47a4f61204d8c2000b6ab67c967
Author: nikolay_tikhonov 
Date:   2017-06-22T13:13:01Z

IGNITE-GG-12197 Fixed "Ignore events for discarded update in CLOCK mode".

Signed-off-by: nikolay_tikhonov 

commit 5858efd406bb54352de14a0a7e7f21c2ac7bf899
Author: sboikov 
Date:   2016-12-16T16:23:29Z

IGNITE-2412 - Do not acquire asyncSemaphore for synchronous operations 
(cherry-picked from master)

(cherry picked from commit 82b4073)

commit 113a1380da34ea804d68757d39926da97dee09b6
Author: Alexey Goncharuk 
Date:   2017-06-13T05:20:22Z

GG-12355: Backported IO latency test.

commit 540ca449f1bd2386d3ba0722afb21dd3a504d044
Author: Alexey Goncharuk 
Date:   2017-06-13T17:55:38Z

GG-12355: Added discovery ring latency test + made it available from MBean 
(cherry-picked from master).

commit 0fc6271d8e39125bf5ee341e50a2665a04fc8b1e
Author: Andrey V. Mashenkov 
Date:   2017-06-21T10:42:12Z

GG-12350: GridDhtAtomicSingleUpdateRequest misses topologyVersion() method 
override.

commit f8224d13cf9a6432ba65e0016370ba51bbb544e9
Author: Alexey Goncharuk 
Date:   2017-06-15T19:49:45Z

GG-12299: Make sure concurrent type registrations do not trigger multiple 
cache updates.

commit 4ffc3acfa1bc43bea8c79bfd1864787c15cfc4de
Author: Alexey Goncharuk 
Date:   2017-06-20T04:59:09Z

IGNITE-5528 - IS_EVICT_DISABLED flag is not cleared on cache store 
exception.

commit 8cd9e829380f4c91cc9bb126169863286d1cb323
Author: Andrey V. Mashenkov 
Date:   2017-06-21T12:40:14Z

GG-12353: Added local binary context flag.

Backport of IGNITE-5223 with fixes.

commit 9036ad239d68eff663bc73a81baab2826b054d9a
Author: Andrey V. Mashenkov 
Date:   2017-06-21T15:25:31Z

Added MBean for system cache executors.

commit ed34a5dc681ea8f284f4d25c5575ad46569cc600
Author: Andrey V. Mashenkov 
Date:   2017-06-21T15:33:55Z

Partial fix of IGNITE-5562.

commit d427021f329292fb69d348ba949ad1f8f1e9089e
Author: Andrey V. Mashenkov 
Date:   2017-06-21T16:30:27Z

IGNITE-5552: ServiceProcessor recalculates all service assignments even if 
there is a pending topology change.

commit f1b9cdc0716a1b23f54d68ce0fe19eb85107567d
Author: Alexey Goncharuk 
Date:   2017-06-14T18:37:54Z

GG-12354: Partial fix of IGNITE-5473: Introduce troubleshooting logger.

commit beb2409cfe2045789443d47de735d879961d371e
Author: Andrey V. Mashenkov 
Date:   2017-06-23T09:26:06Z

GG-12352: Forcible node drop makes cluster instable in some cases.
Disable forcible node drop by default.

commit 802f18fc250cbae8959192c78bb28dc525ed3cf7
Author: AMRepo 
Date:   2017-06-22T21:24:57Z

Fix compilation

commit 39d2dec85a3c571dfdb1cd6189b53ae2413a5d22
Author: Andrey V. Mashenkov 
Date:   2017-06-23T10:41:30Z

Merge branch 'ignite-1.7.12-b2' into ignite-1.8.8

# Conflicts:
#   modules/core/src/main/java/org/apache/ignite/internal/GridTopic.java
#   
modules/core/src/main/java/org/apache/ignite/internal/managers/communication/GridIoManager.java
#   
modules/core/src/main/java/org/apache/ignite/internal/managers/communication/GridIoMessageFactory.java
#   
modules/core/src/main/java/org/apache/ignite/internal/managers/communication/IgniteIoTestMessage.java
#   
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheAdapter.java
#

New definition for affinity node (issues with baseline)

Hi, Igniters,

I want to raise a topic about our affinity node definition.

After adding baseline (affinity) topology (BL(A)T) things start being
complicated.

Plenty of bugs appears:

IGNITE-8173
ignite.getOrCreateCache(cacheConfig).iterator() method works incorrect for
replicated cache in case if some data node isn't in baseline

IGNITE-7628
SqlQuery hangs indefinitely with additional not registered in baseline node.

It's because everything relies on concept "affinity node".
And until now it was as simple as a server node which passes node filter.
Other words any server node which is not filtered out by node filter.

But node which is not in BL(A)T and which passes node filter would be
treated as affinity node. And it's definitely wrong. At least, it is a
source of many bugs (I believe there are much more than those 2 which I
already have mentioned).

It's clear that this definition should be changed.
Let's start with a new definition of "Affinity topology". Affinity topology
is a set of nodes which potentially could keep data.

If we use knowledge about the current realization we can say that 1. for
in-memory cache groups it would be all server nodes;
2. for persistent cache groups it would be BL(A)T.

I will further use Dynamic Affinity Topology or DAT for 1 (in-memory cache
groups) and Static Affinity Topology or SAT instead BL(A)T, or 2nd point.

Denote node filter as f(X), where X is affinity topology.

Then we can say that node A is affinity node if
A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.

It worth to mention that AT' should be used to pass to affinity function of
cache groups.
Also, AT and AT' could change during the time (BL(A)T changes or node
joins/disconnections).

And I don't like fact that usage of DAT or SAT relies on persistence
settings (Should we make it configurable per cache group?).

Ok, I have created a ticket to implement this changes and will start
working on it.
https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
calculation doesn't take into account BLT).

Also, I want to use these definitions (Affinity Topology, Affinity Node,
DAT, SAT) in documentation and java docs.

Maybe, we also should consider replacing BL(A)T with SAT.

Thank you for your attention.

[jira] [Created] (IGNITE-8381) testNodeSingletonDeploy in Basic 2 has high fail rate

2018-04-24 Thread Dmitriy Pavlov (JIRA)

Dmitriy Pavlov created IGNITE-8381:
--

 Summary: testNodeSingletonDeploy in Basic 2 has high fail rate
 Key: IGNITE-8381
 URL: https://issues.apache.org/jira/browse/IGNITE-8381
 Project: Ignite
  Issue Type: Improvement
Reporter: Dmitriy Pavlov
Assignee: Vyacheslav Daradur


IgniteServiceConfigVariationsFullApiTestSuite: 
IgniteServiceConfigVariationsFullApiTest.testNodeSingletonDeploy
is one from most failing test in TC Run All now

On dev.list IEP-17 is discussed to redesign services and fix deployemnt. In the 
same time test itself can't await service to be deployed and confuses Igniters 
in RunAll PR results.

It is suggested to find out fast fix to make test passing. It should be 
probably based on waiting instead of checking deployed state.

Test history

https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=1909279554207447487=%3Cdefault%3E=testDetails

{noformat}
class org.apache.ignite.IgniteException: Service not found: testService
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1858)
at 
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6695)
at 
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)
at 
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at 
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1189)
at 
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1921)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class 
org.apache.ignite.internal.processors.service.GridServiceNotFoundException: 
Service not found: testService
at 
org.apache.ignite.internal.processors.service.GridServiceProxy$ServiceProxyCallable.call(GridServiceProxy.java:415)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1855)
... 14 more
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

I want to contribute to Apache Ignite

2018-04-24 Thread polyakov_alex

Hello Ignite Community!

My name is Alexandr. I want to contribute to Apache Ignite and want to start 
with this issue - IGNITE-7883, my JIRA username "a-polyakov".

Thanks!

Re: Reconsider TTL expire mechanics in Ignite

2018-04-24 Thread Andrey Mashenkov

Alexey,

Actually, there are 2 cases with readFromBackup=true which is used by
default for Replicated caches:
-user touch expired entry on backup node:
we can just return null and keep entry as-is with hope primary will remove
it.

-user touch alive entry on backup node:
TTL should be updated on primary somehow to prevent frequently used entries
eviction.




On Tue, Apr 24, 2018 at 5:18 PM, Alexey Goncharuk <
alexey.goncha...@gmail.com> wrote:

> Andrey,
>
> No, in this case the entry must not be evicted and kept as-is because only
> primary node can decide when an entry must be expired. The read in this
> case should return null, though. I understand that we can get non-monotonic
> reads, but this is always the case when readFromBackup is true.
>
> 2018-04-24 17:15 GMT+03:00 Andrey Mashenkov :
>
> > Alexey,
> >
> > What if user touch backup entry via readFromBackup=true?
> > Should we start distributed operation ( e.g. TTL update or expiration) in
> > that case?
> >
> > On Tue, Apr 24, 2018 at 5:02 PM, Alexey Goncharuk <
> > alexey.goncha...@gmail.com> wrote:
> >
> > > Ivan,
> > >
> > > Agree about the use-case when we have a read-write-through store.
> > However,
> > > we allow to use Ignite in-memory caches even without 3rd party stores,
> in
> > > this case the same issue is still present. Maybe we can keep local
> expire
> > > for read-through caches and have strongly consistent expire for other
> > > modes?
> > >
> > > 2018-04-24 16:51 GMT+03:00 Ivan Rakov :
> > >
> > > > Alexey,
> > > >
> > > > Distributed expire will result in serious performance overhead,
> mostly
> > on
> > > > network level.
> > > > I think, the main use case of TTL are in-memory caches that
> accelerate
> > > > access to slower third-party data source. In such case nothing is
> > broken
> > > if
> > > > data is missing; strong consistency guarantees are not needed. I
> think,
> > > > that's why we should keep "local expiration" at least for in-memory
> > > caches.
> > > > Our in-memory page eviction works in the same way.
> > > >
> > > > Best Regards,
> > > > Ivan Rakov
> > > >
> > > > On 24.04.2018 16:05, Alexey Goncharuk wrote:
> > > >
> > > >> Igniters,
> > > >>
> > > >> We recently experienced some issues with TTL with enabled
> persistence,
> > > the
> > > >> issues were related to persistence implementation details. However,
> > when
> > > >> we
> > > >> were adding tests to cover more cases, we found more failures,
> which,
> > I
> > > >> think, reveal some fundamental issues with expire mechanism.
> > > >>
> > > >> In short, the root cause of the issue is that we expire entries on
> > > primary
> > > >> and backup nodes independently, which means:
> > > >> 1) Partition sizes may have different values at different times
> which
> > > will
> > > >> trigger false-negative checks on partition map exchange which was
> > > recently
> > > >> added
> > > >> 2) More importantly, this may lead to inconsistent primary and
> backup
> > > node
> > > >> values when EntryProcessor is used, because an entry processor may
> > > observe
> > > >> a non-null value on one node and a null value on another node.
> > > >>
> > > >> In my opinion, the second issue is critical and we must change the
> > > expiry
> > > >> mechanics to run expiry in a distributed mode, with cache mode
> > semantics
> > > >> for entry remove.
> > > >>
> > > >> Thoughts?
> > > >>
> > > >>
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
> >
>



-- 
Best regards,
Andrey V. Mashenkov

Re: Reconsider TTL expire mechanics in Ignite

Andrey,

No, in this case the entry must not be evicted and kept as-is because only
primary node can decide when an entry must be expired. The read in this
case should return null, though. I understand that we can get non-monotonic
reads, but this is always the case when readFromBackup is true.

2018-04-24 17:15 GMT+03:00 Andrey Mashenkov :

> Alexey,
>
> What if user touch backup entry via readFromBackup=true?
> Should we start distributed operation ( e.g. TTL update or expiration) in
> that case?
>
> On Tue, Apr 24, 2018 at 5:02 PM, Alexey Goncharuk <
> alexey.goncha...@gmail.com> wrote:
>
> > Ivan,
> >
> > Agree about the use-case when we have a read-write-through store.
> However,
> > we allow to use Ignite in-memory caches even without 3rd party stores, in
> > this case the same issue is still present. Maybe we can keep local expire
> > for read-through caches and have strongly consistent expire for other
> > modes?
> >
> > 2018-04-24 16:51 GMT+03:00 Ivan Rakov :
> >
> > > Alexey,
> > >
> > > Distributed expire will result in serious performance overhead, mostly
> on
> > > network level.
> > > I think, the main use case of TTL are in-memory caches that accelerate
> > > access to slower third-party data source. In such case nothing is
> broken
> > if
> > > data is missing; strong consistency guarantees are not needed. I think,
> > > that's why we should keep "local expiration" at least for in-memory
> > caches.
> > > Our in-memory page eviction works in the same way.
> > >
> > > Best Regards,
> > > Ivan Rakov
> > >
> > > On 24.04.2018 16:05, Alexey Goncharuk wrote:
> > >
> > >> Igniters,
> > >>
> > >> We recently experienced some issues with TTL with enabled persistence,
> > the
> > >> issues were related to persistence implementation details. However,
> when
> > >> we
> > >> were adding tests to cover more cases, we found more failures, which,
> I
> > >> think, reveal some fundamental issues with expire mechanism.
> > >>
> > >> In short, the root cause of the issue is that we expire entries on
> > primary
> > >> and backup nodes independently, which means:
> > >> 1) Partition sizes may have different values at different times which
> > will
> > >> trigger false-negative checks on partition map exchange which was
> > recently
> > >> added
> > >> 2) More importantly, this may lead to inconsistent primary and backup
> > node
> > >> values when EntryProcessor is used, because an entry processor may
> > observe
> > >> a non-null value on one node and a null value on another node.
> > >>
> > >> In my opinion, the second issue is critical and we must change the
> > expiry
> > >> mechanics to run expiry in a distributed mode, with cache mode
> semantics
> > >> for entry remove.
> > >>
> > >> Thoughts?
> > >>
> > >>
> > >
> >
>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: Reconsider TTL expire mechanics in Ignite

2018-04-24 Thread Andrey Mashenkov

Alexey,

What if user touch backup entry via readFromBackup=true?
Should we start distributed operation ( e.g. TTL update or expiration) in
that case?

On Tue, Apr 24, 2018 at 5:02 PM, Alexey Goncharuk <
alexey.goncha...@gmail.com> wrote:

> Ivan,
>
> Agree about the use-case when we have a read-write-through store. However,
> we allow to use Ignite in-memory caches even without 3rd party stores, in
> this case the same issue is still present. Maybe we can keep local expire
> for read-through caches and have strongly consistent expire for other
> modes?
>
> 2018-04-24 16:51 GMT+03:00 Ivan Rakov :
>
> > Alexey,
> >
> > Distributed expire will result in serious performance overhead, mostly on
> > network level.
> > I think, the main use case of TTL are in-memory caches that accelerate
> > access to slower third-party data source. In such case nothing is broken
> if
> > data is missing; strong consistency guarantees are not needed. I think,
> > that's why we should keep "local expiration" at least for in-memory
> caches.
> > Our in-memory page eviction works in the same way.
> >
> > Best Regards,
> > Ivan Rakov
> >
> > On 24.04.2018 16:05, Alexey Goncharuk wrote:
> >
> >> Igniters,
> >>
> >> We recently experienced some issues with TTL with enabled persistence,
> the
> >> issues were related to persistence implementation details. However, when
> >> we
> >> were adding tests to cover more cases, we found more failures, which, I
> >> think, reveal some fundamental issues with expire mechanism.
> >>
> >> In short, the root cause of the issue is that we expire entries on
> primary
> >> and backup nodes independently, which means:
> >> 1) Partition sizes may have different values at different times which
> will
> >> trigger false-negative checks on partition map exchange which was
> recently
> >> added
> >> 2) More importantly, this may lead to inconsistent primary and backup
> node
> >> values when EntryProcessor is used, because an entry processor may
> observe
> >> a non-null value on one node and a null value on another node.
> >>
> >> In my opinion, the second issue is critical and we must change the
> expiry
> >> mechanics to run expiry in a distributed mode, with cache mode semantics
> >> for entry remove.
> >>
> >> Thoughts?
> >>
> >>
> >
>



-- 
Best regards,
Andrey V. Mashenkov

Re: Reconsider TTL expire mechanics in Ignite

Ivan,

Agree about the use-case when we have a read-write-through store. However,
we allow to use Ignite in-memory caches even without 3rd party stores, in
this case the same issue is still present. Maybe we can keep local expire
for read-through caches and have strongly consistent expire for other modes?

2018-04-24 16:51 GMT+03:00 Ivan Rakov :

> Alexey,
>
> Distributed expire will result in serious performance overhead, mostly on
> network level.
> I think, the main use case of TTL are in-memory caches that accelerate
> access to slower third-party data source. In such case nothing is broken if
> data is missing; strong consistency guarantees are not needed. I think,
> that's why we should keep "local expiration" at least for in-memory caches.
> Our in-memory page eviction works in the same way.
>
> Best Regards,
> Ivan Rakov
>
> On 24.04.2018 16:05, Alexey Goncharuk wrote:
>
>> Igniters,
>>
>> We recently experienced some issues with TTL with enabled persistence, the
>> issues were related to persistence implementation details. However, when
>> we
>> were adding tests to cover more cases, we found more failures, which, I
>> think, reveal some fundamental issues with expire mechanism.
>>
>> In short, the root cause of the issue is that we expire entries on primary
>> and backup nodes independently, which means:
>> 1) Partition sizes may have different values at different times which will
>> trigger false-negative checks on partition map exchange which was recently
>> added
>> 2) More importantly, this may lead to inconsistent primary and backup node
>> values when EntryProcessor is used, because an entry processor may observe
>> a non-null value on one node and a null value on another node.
>>
>> In my opinion, the second issue is critical and we must change the expiry
>> mechanics to run expiry in a distributed mode, with cache mode semantics
>> for entry remove.
>>
>> Thoughts?
>>
>>
>

Re: Ticket review checklist

Igniters,

I don't understand why you are so against refactoring.
Code already smells like hell. Methods 200+ line is normal. Exchange future
is asking to be separated on several one. Transaction code could understand
few people.

If we separate refactoring from development it would mean that no one will
do it.


2) Documentation.
Everything which was asked by reviewers to clarify idea should be reflected
in the code.

3) Logging.
Logging should be enough to troubleshoot the problem if someone comes to
user-list with an issue in the code.


On Fri, Apr 20, 2018 at 7:06 PM, Dmitry Pavlov 
wrote:

> Hi Igniters,
>
> +1 to idea of checklist.
>
> +1 to refactoring and documenting code related to ticket in +/-20 LOC at
> least.
>
> If we start to do it as part of our regular contribution, code will be
> better, it would became common practice and part of Apache Ignite
> development culure.
>
> If we will hope we will have free time to submit separate patch someday and
> have patience to complete patch-submission process, code will remain
> undocumented and poor-readable.
>
> Sincerely,
> Dmitriy Pavlov
>
> пт, 20 апр. 2018 г. в 18:56, Александр Меньшиков :
>
> > 4) Metrics.
> > partially +1
> >
> > It makes sense to have some minimal code coverage for new code in PR.
> IMHO.
> >
> > Also, we can limit the cyclomatic complexity of the new code in PR too.
> >
> > 6) Refactoring
> > -1
> >
> > I understand why people want to refactor old code.
> > But I think refactoring should be always a separate task.
> > And it's better to remove all refactoring from PR, if it's not the sense
> of
> > the issue.
> >
> >
> > 2018-04-20 16:54 GMT+03:00 Andrey Kuznetsov :
> >
> > > What about adding the following item to the checklist: when the change
> > adds
> > > new functionality, then unit tests should also be provided, if it's
> > > technically possible?
> > >
> > > As for refactorings, in fact they are strongly discouraged today for
> some
> > > unclear reason. Let's permit to make refactorings in the checklist
> being
> > > discussed. (Of cource, refactoring should relate to problem being
> > solved.)
> > >
> > > 2018-04-20 16:16 GMT+03:00 Vladimir Ozerov :
> > >
> > > > Hi Ed,
> > > >
> > > > Unfortunately some of these points are not good candidates for the
> > > > checklist because of these:
> > > > - It must be clear and disallow *multiple interpretations*
> > > > - It must be *lightweight*, otherwise Ignite development would
> become a
> > > > nightmare
> > > >
> > > > We cannot have "nice to have" points here. Checklist should answer
> the
> > > > question "is ticket eligible to be merged?"
> > > >
> > > > >>> 1) Code style.
> > > > +1
> > > >
> > > > >>>  2) Documentation
> > > > -1, it is impossible to define what is "well-documented". A piece of
> > code
> > > > could be obvious for one contributor, and non-obvious for another. In
> > any
> > > > case this is not a blocker for merge. Instead, during review one can
> > ask
> > > > implementer to add more docs, but it cannot be forced.
> > > >
> > > > >>>  3) Logging
> > > > -1, same problem - what is "enough logging?". Enough for whom? How to
> > > > understand whether it is enough or not?
> > > >
> > > > >>>  4) Metrics
> > > > -1, no clear boundaries, and decision on whether metrics are to be
> > added
> > > or
> > > > not should be performed during design phase. As before, it is
> perfectly
> > > > valid to ask contributor to add metrics with clear explanation why,
> but
> > > > this is not part of the checklist.
> > > >
> > > > >>> 5) TC status
> > > > +1, already mentioned
> > > >
> > > > >>>  6) Refactoring
> > > > Strong -1. OOP is a slippery slope, there are no good and bad
> receipts
> > > for
> > > > all cases, hence it cannot be used in a checklist.
> > > >
> > > > We can borrow useful rules from p.2, p.3 and p.4 if you provide clear
> > > > definitions on how to measure them.
> > > >
> > > > Vladimir.
> > > >
> > > > On Fri, Apr 20, 2018 at 3:50 PM, Eduard Shangareev <
> > > > eduard.shangar...@gmail.com> wrote:
> > > >
> > > > > Also, I want to add some technical requirement. Let's discuss them.
> > > > >
> > > > > 1) Code style.
> > > > > The code needs to be formatted according to coding guidelines
> > > > > <
> > https://cwiki.apache.org/confluence/display/IGNITE/Coding+Guidelines
> > > >.
> > > > > The
> > > > > code must not contain TODOs without a ticket reference.
> > > > >
> > > > > It is highly recommended to make major formatting changes in
> existing
> > > > code
> > > > > as a separate commit, to make review process more practical.
> > > > >
> > > > > 2) Documentation.
> > > > > Added code should be well-documented. Any methods that raise
> > questions
> > > > > regarding their code flow, invariants, synchronization, etc., must
> be
> > > > > documented with comprehensive javadoc. Any reviewer can request
> that
> > a
> > > > > particular added method be

Re: Reconsider TTL expire mechanics in Ignite


Alexey,

Distributed expire will result in serious performance overhead, mostly 
on network level.
I think, the main use case of TTL are in-memory caches that accelerate 
access to slower third-party data source. In such case nothing is broken 
if data is missing; strong consistency guarantees are not needed. I 
think, that's why we should keep "local expiration" at least for 
in-memory caches. Our in-memory page eviction works in the same way.


Best Regards,
Ivan Rakov

On 24.04.2018 16:05, Alexey Goncharuk wrote:

Igniters,

We recently experienced some issues with TTL with enabled persistence, the
issues were related to persistence implementation details. However, when we
were adding tests to cover more cases, we found more failures, which, I
think, reveal some fundamental issues with expire mechanism.

In short, the root cause of the issue is that we expire entries on primary
and backup nodes independently, which means:
1) Partition sizes may have different values at different times which will
trigger false-negative checks on partition map exchange which was recently
added
2) More importantly, this may lead to inconsistent primary and backup node
values when EntryProcessor is used, because an entry processor may observe
a non-null value on one node and a null value on another node.

In my opinion, the second issue is critical and we must change the expiry
mechanics to run expiry in a distributed mode, with cache mode semantics
for entry remove.

Thoughts?

[jira] [Created] (IGNITE-8380) Affinity node calculation doesn't take into account BLT

2018-04-24 Thread Eduard Shangareev (JIRA)

Eduard Shangareev created IGNITE-8380:
-

 Summary: Affinity node calculation doesn't take into account BLT
 Key: IGNITE-8380
 URL: https://issues.apache.org/jira/browse/IGNITE-8380
 Project: Ignite
  Issue Type: Bug
Reporter: Eduard Shangareev






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Reconsider TTL expire mechanics in Ignite

Huge +1.

On Tue, Apr 24, 2018 at 4:05 PM, Alexey Goncharuk <
alexey.goncha...@gmail.com> wrote:

> Igniters,
>
> We recently experienced some issues with TTL with enabled persistence, the
> issues were related to persistence implementation details. However, when we
> were adding tests to cover more cases, we found more failures, which, I
> think, reveal some fundamental issues with expire mechanism.
>
> In short, the root cause of the issue is that we expire entries on primary
> and backup nodes independently, which means:
> 1) Partition sizes may have different values at different times which will
> trigger false-negative checks on partition map exchange which was recently
> added
> 2) More importantly, this may lead to inconsistent primary and backup node
> values when EntryProcessor is used, because an entry processor may observe
> a non-null value on one node and a null value on another node.
>
> In my opinion, the second issue is critical and we must change the expiry
> mechanics to run expiry in a distributed mode, with cache mode semantics
> for entry remove.
>
> Thoughts?
>

[jira] [Created] (IGNITE-8379) Add maven-surefire-plugin support for PDS Compatibility tests

2018-04-24 Thread Peter Ivanov (JIRA)

Peter Ivanov created IGNITE-8379:


 Summary: Add maven-surefire-plugin support for PDS Compatibility 
tests
 Key: IGNITE-8379
 URL: https://issues.apache.org/jira/browse/IGNITE-8379
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.5
Reporter: Peter Ivanov
 Fix For: 2.6


In continuation of the works on PDS Compatibility test suite, it is required to 
add support for {{maven-surefire-plugin}} in Compatibility Framework.
See IGNITE-8275 for details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

cache size() calculation for MVCC

2018-04-24 Thread Sergey Kalashnikov

Hi Igniters,

I need your advice on a task at hand.

Currently cache API size() is a constant time operation, since the
number of entries is maintained as a separate counter.
However, for MVCC-enabled cache there can be multiple versions of the
same entry.
In order to calculate the size we need to obtain a MVCC snapshot and
then iterate over data pages filtering invisible versions.
So, it is impossible to keep the same complexity guarantees.

My current implementation internally switches to "full-scan" approach
if cache in question is a MVCC-enabled cache.
It happens unbeknown to users, which may expect lightning-fast
response as before.
Perhaps, we might add a new constant to CachePeekMode enumeration that
is passed to cache size() to make it explicit?

The second concern is that cache size calculation is also included
into Cache Metrics API and Visor functionality.
Will it be OK for metrics and things alike to keep returning raw
unfiltered number of entries?
Is there any sense in showing raw unfiltered number of entries which
may vary greatly from invokation to invokation with just simple
updates running in background?

Please share your thoughts.

Thanks in advance.
--
Sergey

[GitHub] ignite pull request #3885: IGNITE-8339 Do not log to WAL partition own durin...

2018-04-24 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3885


---

Reconsider TTL expire mechanics in Ignite

Igniters,

We recently experienced some issues with TTL with enabled persistence, the
issues were related to persistence implementation details. However, when we
were adding tests to cover more cases, we found more failures, which, I
think, reveal some fundamental issues with expire mechanism.

In short, the root cause of the issue is that we expire entries on primary
and backup nodes independently, which means:
1) Partition sizes may have different values at different times which will
trigger false-negative checks on partition map exchange which was recently
added
2) More importantly, this may lead to inconsistent primary and backup node
values when EntryProcessor is used, because an entry processor may observe
a non-null value on one node and a null value on another node.

In my opinion, the second issue is critical and we must change the expiry
mechanics to run expiry in a distributed mode, with cache mode semantics
for entry remove.

Thoughts?

Re: Service grid redesign

2018-04-24 Thread Vyacheslav Daradur

Hi, Denis M.,

I'd like to pick up a ticket from IEP-17 next week.

Could you please advise a ticket to start?

On Tue, Apr 24, 2018 at 11:47 AM, Dmitriy Setrakyan
 wrote:
> On Tue, Apr 24, 2018, 3:59 PM Denis Mekhanikov 
> wrote:
>
>> Dmitriy,
>>
>> After the proposed changes are made the utility cache won't be needed at
>> all.
>>
>
> I was rather talking about prioritization. In my view, first and foremost
> we must fix deployment before anything else.
>
> D.



-- 
Best Regards, Vyacheslav D.

[jira] [Created] (IGNITE-8378) Java crash upon node start after some restarts during failover test with 500 logical and 26 physical caches

2018-04-24 Thread Ksenia Rybakova (JIRA)

Ksenia Rybakova created IGNITE-8378:
---

 Summary: Java crash upon node start after some restarts during 
failover test with 500 logical and 26 physical caches
 Key: IGNITE-8378
 URL: https://issues.apache.org/jira/browse/IGNITE-8378
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Ksenia Rybakova


Load config:
- Yardstick benchmark: CacheRandomOperationBenchmark
- 10 client nodes, 20 server nodes
- 10K perloading, 20K key range
- 26 physical caches
- 500 logical caches
- 2 backups
- 1 server node is being restarted periodically
Complete configs are attached.

After several successfull restarts the node being restarted crashes (Java 
crash). Node log and error file are attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (IGNITE-8377) Add cluster (de)activation LifecycleBean callbacks

2018-04-24 Thread Alexey Goncharuk (JIRA)

Alexey Goncharuk created IGNITE-8377:


 Summary: Add cluster (de)activation LifecycleBean callbacks
 Key: IGNITE-8377
 URL: https://issues.apache.org/jira/browse/IGNITE-8377
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexey Goncharuk


I suggest to add new {{LifecycleEventType}}, {{BEFORE_CLUSTER_ACTIVATE}}, 
{{AFTER_CLUSTER_ACTIVATE}}, {{BEFORE_CLUSTER_DEACTIVATE}}, 
{{AFTER_CLUSTER_DEACTIVATE}} and fire corresponding lifecycle events along with 
regular events.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (IGNITE-8376) Add cluster (de)activation events

2018-04-24 Thread Alexey Goncharuk (JIRA)

Alexey Goncharuk created IGNITE-8376:


 Summary: Add cluster (de)activation events
 Key: IGNITE-8376
 URL: https://issues.apache.org/jira/browse/IGNITE-8376
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexey Goncharuk
 Fix For: 2.6


Currently, we do not have any way to detect that a cluster got activated, which 
results in busy-loops polling {{cluster().active()}}.

I suggest to add new events, {{EVT_CLUSTER_ACTIVATED}}, 
{{EVT_CLUSTER_DEACTIVATED}}, {{EVT_CLUSTER_ACTIVATION_FAILED}} which will be 
fired when corresponding steps are completed. The event should contain, if 
possible, information about the activation source (public API or 
auto-activation), topology version on which activation was performed. The fail 
event should contain information about the cause of the failure. If needed, a 
new class for this event should be introduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[GitHub] ignite pull request #3908: IGNITE-8252 NPE is replaced with an IgniteExcepti...

2018-04-24 Thread sergey-chugunov-1985

GitHub user sergey-chugunov-1985 opened a pull request:

https://github.com/apache/ignite/pull/3908

IGNITE-8252 NPE is replaced with an IgniteException with informative message



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-8252-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3908.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3908


commit 3267f4b1ff2b985b1efe4f091cfd762a9745fbef
Author: Sergey Chugunov 
Date:   2018-04-24T11:23:36Z

IGNITE-8252 NPE is replaced with IgniteException with meaningful message




---

[jira] [Created] (IGNITE-8375) NPE due to race on cache stop and timeout handler execution.

2018-04-24 Thread Alexei Scherbakov (JIRA)

Alexei Scherbakov created IGNITE-8375:
-

 Summary: NPE due to race on cache stop and timeout handler 
execution.
 Key: IGNITE-8375
 URL: https://issues.apache.org/jira/browse/IGNITE-8375
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Alexei Scherbakov
 Fix For: 2.6


NPE caused by execution of method [1] during timeout handler execution [2]:

cacheCfg.isLoadPreviousValue() throws NPE because cacheCfg can be nulled by [3] 
on stop.

[1] 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture#loadMissingFromStore
[2] 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.LockTimeoutObject#onTimeout
[3] org.apache.ignite.internal.processors.cache.GridCacheContext#cleanup



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (IGNITE-8374) Test IgnitePdsCorruptedStoreTest.testCacheMetaCorruption hangs during node start

2018-04-24 Thread Aleksey Plekhanov (JIRA)

Aleksey Plekhanov created IGNITE-8374:
-

 Summary: Test IgnitePdsCorruptedStoreTest.testCacheMetaCorruption 
hangs during node start
 Key: IGNITE-8374
 URL: https://issues.apache.org/jira/browse/IGNITE-8374
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.5
Reporter: Aleksey Plekhanov
Assignee: Aleksey Plekhanov
 Fix For: 2.5


Call to cluster().active() in IgniteKernal.ackStart() synchronously waits for 
state transition to complete, but due to error during activation process this 
transition will never end.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: IGNITE-8167

2018-04-24 Thread Dmitry Pavlov

Hi Pavel,

In this case Run All PDS is ok, but as default I suggest to always use
run-all.

Hi Dmitriy G,

could you please take a look to this change?

Sincerely,
Dmitriy Pavlov

вт, 24 апр. 2018 г. в 10:56, Pavel Sapezhko :

> Run-All? As mentioned in contribute guide I was only need to run tests that
> have been affected by my changes. So I used Persistent Data Store test
> suite.
>
> https://ci.ignite.apache.org/viewLog.html?buildId=1187554=buildResultsDiv=IgniteTests24Java8_RunAllPds
>
> On Mon, Apr 23, 2018 at 9:16 PM Dmitry Pavlov 
> wrote:
>
> > I'll add to my to-do list.
> >
> > Igniters, any assistance is welcomed here (pre-review), especially from
> > Native Persistence Experts
> >
> > Pavel, could you please add link to TC Run-All to ticket?
> >
> > пн, 23 апр. 2018 г. в 20:32, Pavel Sapezhko :
> >
> > > I've made the patch for some time ago. Tests have been passed. I've
> > changed
> > > the state of Jira task to "Patch available" almost two weeks ago. The
> > patch
> > > is just one line of code, so I think it can be reviewed fast :) Thanks.
> > >
> > > On Mon, Apr 23, 2018 at 1:23 PM Dmitry Pavlov 
> > > wrote:
> > >
> > > > Hi Pavel,
> > > >
> > > > It seems that Denis added you to the list of contributors.
> > > >
> > > > What updates do you expect?
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > пн, 23 апр. 2018 г. в 9:26, Pavel Sapezhko <
> pavel.sapez...@synesis.ru
> > >:
> > > >
> > > > > Any updates?
> > > > >
> > > > > On Sat, Apr 7, 2018 at 2:01 AM Denis Magda 
> > wrote:
> > > > >
> > > > > > Pavel, added you to JIRA contributors list.
> > > > > >
> > > > > > --
> > > > > > Denis
> > > > > >
> > > > > > On Fri, Apr 6, 2018 at 8:12 AM, Pavel Sapezhko <
> > > > > pavel.sapez...@synesis.ru>
> > > > > > wrote:
> > > > > >
> > > > > > > My JIRA ID: pavel.sapezhko
> > > > > > > As I mentioned above, first we will have only archiver thread
> > > crashed
> > > > > and
> > > > > > > absolute wal started from 0, but we will have alive ignite
> > > instance.
> > > > > Logs
> > > > > > > can be found in attach.
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Apr 6, 2018 at 5:42 PM Dmitry Pavlov <
> > > dpavlov@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Pavel,
> > > > > > >>
> > > > > > >> Thank you. Could you please attach logs/stacktraces. Now it is
> > not
> > > > > quite
> > > > > > >> clear where Ignite has failed?
> > > > > > >>
> > > > > > >> Could you please share your JIRA ID?
> > > > > > >>
> > > > > > >> Hi PMCs,
> > > > > > >>
> > > > > > >> could you please add Pavel to contributor list so
> > > > > > >> https://issues.apache.org/jira/browse/IGNITE-8167 issue can
> be
> > > > > > assigned?
> > > > > > >>
> > > > > > >> Sincerely,
> > > > > > >> Dmitriy Pavlov
> > > > > > >>
> > > > > > >>
> > > > > > >> пт, 6 апр. 2018 г. в 17:38, Pavel Sapezhko <
> > > > pavel.sapez...@synesis.ru
> > > > > >:
> > > > > > >>
> > > > > > >> > Working on it.
> > > > > > >> > --
> > > > > > >> >
> > > > > > >> > С уважением,
> > > > > > >> > Cапежко Павел Александрович
> > > > > > >> > Инженер-программист ООО "Synesis"
> > > > > > >> > Skype: p.sapezhko
> > > > > > >> >
> > > > > > >>
> > > > > > > --
> > > > > > >
> > > > > > > С уважением,
> > > > > > > Cапежко Павел Александрович
> > > > > > > Инженер-программист ООО "Synesis"
> > > > > > > Skype: p.sapezhko
> > > > > > >
> > > > > > >
> > > > > >
> > > > > --
> > > > >
> > > > > С уважением,
> > > > > Cапежко Павел Александрович
> > > > > Инженер-программист ООО "Synesis"
> > > > > Skype: p.sapezhko
> > > > >
> > > >
> > > --
> > >
> > > С уважением,
> > > Cапежко Павел Александрович
> > > Инженер-программист ООО "Synesis"
> > > Skype: p.sapezhko
> > >
> >
> --
>
> С уважением,
> Cапежко Павел Александрович
> Инженер-программист ООО "Synesis"
> Skype: p.sapezhko
>

Re: Apache Ignite 2.4+ Go language client

2018-04-24 Thread Igor Sapego

Aleksandr,

Great job! Do you have any plans on adding new features to
your client?

Pavel,

There are  also CacheGet and CachePut [1] operations, as
far as I can see.

[1] -
https://github.com/amsokol/ignite-go-client/blob/master/binary/v1/client.go#L120

Best Regards,
Igor

On Tue, Apr 24, 2018 at 10:14 AM, Dmitriy Setrakyan 
wrote:

> Any chance we can add key-value support as well?
>
> On Tue, Apr 24, 2018, 2:48 PM Pavel Tupitsyn  wrote:
>
> > Hi Aleksandr,
> >
> > This is awesome, thank you!
> >
> > However, let's make it clear that this client supports SQL only,
> > and none of the other Thin Client protocol features.
> >
> > Pavel
> >
> > On Mon, Apr 23, 2018 at 10:41 PM, Aleksandr Sokolovskii <
> amso...@gmail.com
> > >
> > wrote:
> >
> > > Hi Oleg,
> > >
> > > Thanks for your answer.
> > >
> > > > Community is currently working on formal test specification.
> > > Great. Waiting for this one.
> > >
> > > > As far as NodeJS please note that it is already being developed by
> > > community at the moment [1].
> > > Cool. I stop my initiatives.
> > >
> > > Thanks,
> > > Aleksandr
> > >
> > > From: Vladimir Ozerov
> > > Sent: 23 апреля 2018 г. 22:35
> > > To: dev@ignite.apache.org
> > > Subject: Re: Apache Ignite 2.4+ Go language client
> > >
> > > Hi Alexander,
> > >
> > > Awesome thing! Please note that before accepting the client we need to
> > make
> > > sure it is operational. Community is currently working on formal test
> > > specification. I hope it will be ready soon.
> > >
> > > As far as NodeJS please note that it is already being developed by
> > > community at the moment [1]. We hope to have it in Apache Ignite 2.6.
> > >
> > > [1]
> > > https://issues.apache.org/jira/browse/IGNITE-
> > >
> > > пн, 23 апр. 2018 г. в 22:24, Aleksandr Sokolovskii  >:
> > >
> > > > Hi All,
> > > >
> > > > I hope you are well.
> > > >
> > > > I released Apache Ignite 2.4+ Go language client:
> > > > https://github.com/apache-ignite/go-client
> > > >
> > > > I updated link here:
> > > > https://github.com/golang/go/wiki/SQLDrivers
> > > >
> > > > Is it possible to add link to my repo to this page?:
> > > > https://apacheignite.readme.io/docs/binary-client-protocol
> > > > or this page:
> > > > https://apacheignite-net.readme.io/docs/thin-client
> > > > Golang is much more easy to understand than Java or С#.
> > > > It’s very easy to pull, build and run test for my library.
> > > > I believe it helps another guys to write more thin clients.
> > > >
> > > > P.S.: I started developing Node.js client also.
> > > >
> > > > Thanks,
> > > > Aleksandr
> > > >
> > > >
> > >
> > >
> >
>

[jira] [Created] (IGNITE-8373) BinaryObjectException: Cannot find schema for object with compact footer during load test

2018-04-24 Thread Ksenia Rybakova (JIRA)

Ksenia Rybakova created IGNITE-8373:
---

 Summary: BinaryObjectException: Cannot find schema for object with 
compact footer during load test
 Key: IGNITE-8373
 URL: https://issues.apache.org/jira/browse/IGNITE-8373
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Ksenia Rybakova






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: IGNITE-6827 - Review needed.

2018-04-24 Thread Ivan Daschinsky

Hi all, I've implemented corresponded .NET api.
Pavel, could you review my PR, please?


https://issues.apache.org/jira/browse/IGNITE-8075

2018-04-10 21:06 GMT+03:00 Dmitry Pavlov :

> Hi Pavel,
>
>  thank you for bring up test questions. It seems my previous comments were
> not taken into account.
>
> Igniters,
>
>  let me remind we should get passing TC suites before merge,
> https://cwiki.apache.org/confluence/display/IGNITE/How+
> to+Contribute#HowtoContribute-ReviewProcessandMaintainers
> (highlighted
> note).
>
> For disabling parity test checks please consider steps describled in
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Tests+How+To#
> IgniteTestsHowTo-Testof.NETAPIparitywithJavaAPI
>
> Sincerely,
> Dmitriy Pavlov
>
>
> пн, 9 апр. 2018 г. в 21:18, Pavel Tupitsyn :
>
> > > Pavel Tupitsyn, what about .NET stuff ?
> >
> > 1) Thank you for filing the ticket, personally I have no plans to work on
> > it in the near future.
> >
> > 2) .NET tests fail, please make sure they are fixed before merging:
> > https://ci.ignite.apache.org/viewLog.html?buildId=1175956
> >
> > TransactionsParityTest should be fixed by adding new properties to ignore
> > list with a reference to IGNITE-8075, this is simple.
> >
> > But I have concerns about
> > *CachePartitionedTest.TestTransactionScopeMultiCache, *
> > seems like something is broken with multi-cache transactions. Please
> > investigate this one.
> >
> > Thanks,
> > Pavel
> >
> >
> > On Mon, Apr 9, 2018 at 6:24 PM, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com> wrote:
> >
> > > Guys,
> > >
> > > I've slightly modified public API javadoc as Denis Magda has suggested
> in
> > > PR review.
> > >
> > > Please take a look.
> > >
> > > Pavel Tupitsyn, what about .NET stuff ?
> > >
> > > I provided all necessary information in ticket [2]
> > >
> > > Upsource link [1]
> > >
> > > [1] https://reviews.ignite.apache.org/ignite/branch/PR%203624
> > >
> > > [2] https://issues.apache.org/jira/browse/IGNITE-8075
> > >
> > >
> > >
> > > пн, 9 апр. 2018 г. в 16:57, Alexey Goncharuk <
> alexey.goncha...@gmail.com
> > >:
> > >
> > > > I am not aware of any additional timeouts that we are willing to add
> in
> > > the
> > > > nearest future.
> > > >
> > > > 2018-04-09 16:01 GMT+03:00 Dmitriy Setrakyan  >:
> > > >
> > > > > On Mon, Apr 9, 2018 at 5:42 AM, Alexey Goncharuk <
> > > > > alexey.goncha...@gmail.com
> > > > > > wrote:
> > > > >
> > > > > > Guys,
> > > > > >
> > > > > > After the review in Upsource the configuration parameter was
> > renamed
> > > > > > to txTimeoutOnPartMapSync, and it makes sense to me because PME
> is
> > an
> > > > > > implementation detail and it may change in future, partition map
> > sync
> > > > is
> > > > > a
> > > > > > more abstract term. For the same reason I like this parameter
> being
> > > > > placed
> > > > > > on transactions configuration - we do not have any parameters for
> > > PME,
> > > > so
> > > > > > the configuration property goes to an object which affects a
> > > > user-exposed
> > > > > > API.
> > > > > >
> > > > >
> > > > > AG, are we going to have any other timeouts on PME, like locks? If
> > yes,
> > > > > then I would still vote of adding PmeTimeout property.
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
>



-- 
Sincerely yours, Ivan Daschinskiy

[GitHub] ignite pull request #3907: IGNITE-8372 ZookeeperClusterNode was made Externa...

2018-04-24 Thread sergey-chugunov-1985

GitHub user sergey-chugunov-1985 opened a pull request:

https://github.com/apache/ignite/pull/3907

IGNITE-8372 ZookeeperClusterNode was made Externalizable



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-8372

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3907.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3907


commit 6e37daf2d34ca083f367e982a3db80460ca8de16
Author: Sergey Chugunov 
Date:   2018-04-24T09:14:14Z

IGNITE-8372 ZookeeperClusterNode was made Externalizable to preserve 
serializing local node's metrics




---

[GitHub] ignite pull request #3906: IGNITE-8191 Hotfix (don't wait for transition)

2018-04-24 Thread alex-plekhanov

GitHub user alex-plekhanov opened a pull request:

https://github.com/apache/ignite/pull/3906

IGNITE-8191 Hotfix (don't wait for transition)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alex-plekhanov/ignite ignite-8191-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3906.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3906


commit 828ad780194570f330e8120c81dba0c264cab8c0
Author: Aleksey Plekhanov 
Date:   2018-04-24T09:10:58Z

IGNITE-8191 Hotfix (don't wait for transition)




---

[jira] [Created] (IGNITE-8372) Cluster metrics are reported incorrectly on joining node with ZK-based discovery

2018-04-24 Thread Sergey Chugunov (JIRA)

Sergey Chugunov created IGNITE-8372:
---

 Summary: Cluster metrics are reported incorrectly on joining node 
with ZK-based discovery
 Key: IGNITE-8372
 URL: https://issues.apache.org/jira/browse/IGNITE-8372
 Project: Ignite
  Issue Type: Bug
  Components: zookeeper
Reporter: Sergey Chugunov
Assignee: Sergey Chugunov
 Fix For: 2.5


When new node joins with ZK discovery it sometimes reports negative number of 
CPUs and incorrect heap size.
Message in log looks like this:

{noformat}
[myid:] - INFO  [disco-event-worker-#61:Log4JLogger@495] - Topology snapshot 
[ver=100, servers=100, clients=0, CPUs=-6, heap=0.5GB]
{noformat}

There is a race though between this report and ClusterMetricsUpdateMessage: if 
the node receives and process this message first which happens in a separate 
thread correct values are printed to log.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (IGNITE-8371) MVCC TX: Force key request during rebalance may cause error on backups.

2018-04-24 Thread Roman Kondakov (JIRA)

Roman Kondakov created IGNITE-8371:
--

 Summary: MVCC TX: Force key request during rebalance may cause 
error on backups.
 Key: IGNITE-8371
 URL: https://issues.apache.org/jira/browse/IGNITE-8371
 Project: Ignite
  Issue Type: Bug
  Components: sql
Reporter: Roman Kondakov


When backup is updating during rebalance and the key to be updated in TX is not 
supplied yet from the previous partition owner, backup makes force key request 
in order to obtain this key and all its versions. But later this key can be 
send to this backup from the previous owner once again as a part of standard 
rebalance process. And this causes write conflict: we have to write this key on 
the backup once again.

Solution: do not update key when it has already been written before (during 
rebalance or force key request process).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Service grid redesign

2018-04-24 Thread Dmitriy Setrakyan

On Tue, Apr 24, 2018, 3:59 PM Denis Mekhanikov 
wrote:

> Dmitriy,
>
> After the proposed changes are made the utility cache won't be needed at
> all.
>

I was rather talking about prioritization. In my view, first and foremost
we must fix deployment before anything else.

D.

[jira] [Created] (IGNITE-8370) Web console: split page-signin into three separate pages

2018-04-24 Thread Ilya Borisov (JIRA)

Ilya Borisov created IGNITE-8370:


 Summary: Web console: split page-signin into three separate pages
 Key: IGNITE-8370
 URL: https://issues.apache.org/jira/browse/IGNITE-8370
 Project: Ignite
  Issue Type: Improvement
  Components: wizards
Reporter: Ilya Borisov
Assignee: Ilya Borisov


Currently, the page-signin component solves three separate cases: user 
registration, sign in and password restore. Since none of those features has to 
be on the same page, let's split those to separate pages.

What to do:
1. Split signup, signin and forgot password features to separate components and 
pages.
2. While at it, add an optional "Phone" input to signup page in order to match 
inputs available on user profile page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [GitHub] ignite pull request #3719: IGNITE-8048 merge query entities for dynamic cach...