Re: Apache Ignite 3.0

2020-10-30 Thread Yakov Zhdanov
Alexey,
Thanks for the details!

The common replication infra suggestion looks great!
I agree with your points regarding per-page replication, but I still have a
feeling that this protocol can be made compact enough, e.g. by sending only
deltas. As for entry processors, we can decide what to send: if the
serialized processor is smaller, we can send it instead of the delta. This
may sound like an overcomplication, but it is still worth measuring.
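A minimal sketch of the size-based choice described above (the class and method names are illustrative, not Ignite API):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: on the primary, replicate whichever representation
// is smaller - the serialized entry processor or the computed page delta.
public class ReplicationPayloadChooser {
    enum PayloadKind { PROCESSOR, DELTA }

    static PayloadKind pickReplicationPayload(byte[] serializedProcessor, byte[] pageDelta) {
        // Send the serialized processor only when it is strictly smaller;
        // otherwise fall back to the delta, which backups can apply directly.
        return serializedProcessor.length < pageDelta.length
            ? PayloadKind.PROCESSOR
            : PayloadKind.DELTA;
    }

    public static void main(String[] args) {
        byte[] proc = "incr(counter)".getBytes(StandardCharsets.UTF_8);
        byte[] delta = new byte[4096]; // e.g. a full-page delta
        System.out.println(pickReplicationPayload(proc, delta)); // PROCESSOR
    }
}
```

This is exactly the kind of decision that is cheap to make per update and easy to measure, as suggested.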

Regards,
Yakov


Re: Apache Ignite 3.0

2020-10-17 Thread Yakov Zhdanov
Hey Valentin!

Any design docs/wiki for 1, 4 and 5 so far?

Yakov Zhdanov


Re: Apache Ignite 3.0

2020-10-10 Thread Yakov Zhdanov
Hi!
I am back!

Here are several ideas off the top of my head for Ignite 3.0:
1. Client nodes should take the config from servers. Basically it should be
enough to provide some cluster identifier or any known IP address to start
a client.

2. Thread per partition. Again. I strongly recommend taking a look at how
Scylla DB operates. I think this is the best threading model for a
distributed database and can be a perfect fit for Ignite. The main idea is
the "share nothing" approach - thread switches are kept to the necessary
minimum, and messages reading or updating data are processed within the
thread that reads them (of course, the sender should properly route the
message, i.e. send it to the correct socket). Blocking operations such as
fsync happen outside of worker (shard) threads.
This will require splitting indexes per partition, which should be quite ok
for most use cases in my view. Edge cases with high-selectivity indexes
selecting 1-2 rows per value can be sped up with hash indexes or even with
a secondary index backed by another cache.
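The routing part of the thread-per-partition idea can be sketched like this (an illustration only, not the Scylla or Ignite implementation):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of "share nothing" routing: each partition is pinned
// to exactly one single-threaded executor, so all operations touching that
// partition run on one thread without cross-thread synchronization.
public class ThreadPerPartition {
    private final ExecutorService[] shards;

    ThreadPerPartition(int shardCnt) {
        shards = new ExecutorService[shardCnt];
        for (int i = 0; i < shardCnt; i++)
            shards[i] = Executors.newSingleThreadExecutor();
    }

    // Deterministic key-to-partition mapping (mask out the sign bit).
    static int partition(Object key, int parts) {
        return (key.hashCode() & Integer.MAX_VALUE) % parts;
    }

    // All operations on a key are serialized on its partition's thread.
    <T> Future<T> submit(Object key, Callable<T> op) {
        return shards[partition(key, shards.length)].submit(op);
    }

    void shutdown() {
        for (ExecutorService s : shards)
            s.shutdown();
    }
}
```

The sender-side routing mentioned above would use the same partition function to pick the destination socket, so a message already arrives on the correct shard.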

3. Replicate physical updates instead of logical ones. This will reduce the
logic running on backups to zero: read a page sent by the primary node and
apply it locally. Most probably this change will require the pure
thread-per-partition model described above.

4. Merge (all?) network components (communication, disco, REST, etc?) and
listen to one port.

5. Revisit transaction concurrency and isolation settings. Currently some
of the combinations do not work as users may expect them to and some look
just weird.

Ready to discuss the above and other points.

Thanks!
Yakov


Re: Review of IGNITE-11521

2019-03-13 Thread Yakov Zhdanov
I think there should be a link on the page - Suggest edits

--Yakov


Re: Review of IGNITE-11521

2019-03-13 Thread Yakov Zhdanov
Lukas, would you be so kind as to suggest edits for the mentioned
documentation page? Your suggestions will be reviewed and incorporated.

Thanks!

--Yakov


Re: Storing short/empty strings in Ignite

2019-03-06 Thread Yakov Zhdanov
We still need to differentiate between nulls and empty strings.

--Yakov


Re: ipFinder configuration for Samples

2019-02-28 Thread Yakov Zhdanov
Stan, I think we will never know the truth here. Imagine you have never
dealt with distributed systems. You just copy the distributions to the
intended machines and start up a distributed cluster out of the box. Isn't
that good? If we make this change, we should expect many questions on
user@ asking why nodes do not see each other. Agree?

--Yakov


Re: ipFinder configuration for Samples

2019-02-28 Thread Yakov Zhdanov
Guys,

I remember we did the opposite change some time ago - switched the VM IP
finder to multicast. That was done so that users could start a cluster
spanning multiple machines using the examples configuration. With this
change you remove all working samples for starting a truly distributed
environment.

What was the problem? The multicast IP finder has the list of addresses
that we try to connect to on start. As for clashes - they seem to affect
only engineers sitting in the same office, and those can set an env var to
override the default mcast group. The primary goal of switching to
multicast was to allow people new to Ignite to quickly start a cluster.
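For reference, the examples configuration with the multicast IP finder could look roughly like the fragment below; the env var name and the SpEL default are illustrative, not a built-in convention:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                    <!-- Override the group per team/office to avoid clashes;
                         EXAMPLE_MCAST_GROUP is an example env var, not built-in. -->
                    <property name="multicastGroup"
                              value="#{systemEnvironment['EXAMPLE_MCAST_GROUP'] ?: '228.10.10.157'}"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```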

As far as I remember, Hazelcast has multicast enabled by default. Can
anybody check this?

--Yakov


Re: ipFinder configuration for Samples

2019-02-28 Thread Yakov Zhdanov
Guys, I remember we did this to
--Yakov


Wed, Feb 27, 2019 at 14:46, Dmitrii Ryabov :

> Hello, Igniters!
>
> Code is ready and reviewed, tests are passed.
>
> Can we make final decision about this change? Do we really need it? [1]
>
> Pros:
>
> * Multicast ipFinder adds some instability when several persons try & debug
> samples or evaluate a new Ignite version at the same local network.
> * Speedup for node start.
> * Same change was made for test framework [2].
>
> [1]
>
> https://issues.apache.org/jira/browse/IGNITE-6826?focusedCommentId=16752473=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16752473
> [2] https://issues.apache.org/jira/browse/IGNITE-10555
>
> Fri, Nov 3, 2017, 11:14 Alexey Popov :
>
> > I've created https://issues.apache.org/jira/browse/IGNITE-6826
> >
> > Thanks,
> > Alexey
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >
>


Re: High priority TCP discovery messages

2019-01-11 Thread Yakov Zhdanov
> How big the message worker's queue may grow until it becomes a problem?

Denis, you never know. Imagine a node being flooded with messages because
of increased timeouts and network problems. I remember cases with hundreds
of messages in the queue on large topologies. Please, no O(n)
approaches =)

> So, we may never come to a point, when an actual
TcpDiscoveryMetricsUpdateMessage is processed.

Good catch! You can put a hard limit in place and process an enqueued
MetricsUpdate message only if the last one of that kind was processed more
than metricsUpdFreq milliseconds ago.
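The suggested hard limit can be sketched as follows (metricsUpdFreq handling only; the names are illustrative, not the TcpDiscoverySpi internals):

```java
// Hypothetical sketch: process an enqueued metrics-update message only if
// the previous one of that kind was handled more than metricsUpdFreq
// milliseconds ago; otherwise skip it.
public class MetricsThrottle {
    private final long metricsUpdFreq; // minimum interval, ms

    private long lastProcessed = -1; // -1 means "never processed yet"

    MetricsThrottle(long metricsUpdFreq) {
        this.metricsUpdFreq = metricsUpdFreq;
    }

    // Returns true if the message should be processed now.
    boolean shouldProcess(long nowMs) {
        if (lastProcessed < 0 || nowMs - lastProcessed >= metricsUpdFreq) {
            lastProcessed = nowMs;
            return true;
        }
        return false; // a fresh one was handled recently: skip
    }
}
```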

Denis, also note - the initial problem is message queue growth. When we
choose to skip messages, it means that the node cannot process certain
messages and is most probably experiencing problems. We need to think about
killing such nodes. I would suggest we allow queue overflow for 1 minute,
but if the situation does not return to normal, then the node should fire a
special event and kill itself. Thoughts?

--Yakov


Re: High priority TCP discovery messages

2019-01-10 Thread Yakov Zhdanov
Denis, what if we remove priority difference for messages and always add
new to the end of the queue?

As for traversing the queue - I don't like O(n) approaches =). So, with all
messages added to the end of the queue (priority difference removed), I
would suggest that we save the latest 1st lap message and the latest 2nd
lap message, and process metrics messages in the message worker thread in
queue order if they are the latest, skipping them otherwise.
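A sketch of this scheme, with every message appended to the tail in O(1) and stale metrics messages skipped on dequeue (illustrative names, not the discovery SPI internals):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: no message priorities - everything goes to the tail.
// For metrics updates we remember only the latest instance per "lap"; the
// worker skips any older (stale) instances it dequeues.
public class DiscoveryQueueSketch {
    static final class Msg {
        final boolean metrics;
        final int lap; // 1 or 2 for metrics messages, 0 otherwise

        Msg(boolean metrics, int lap) {
            this.metrics = metrics;
            this.lap = lap;
        }
    }

    private final Queue<Msg> queue = new ArrayDeque<>();
    private final Msg[] latestMetrics = new Msg[3]; // indexed by lap

    void enqueue(Msg m) {
        queue.add(m); // always the tail: O(1), no traversal
        if (m.metrics)
            latestMetrics[m.lap] = m; // newer instance supersedes older ones
    }

    // Returns the next message the worker should actually process, or null.
    Msg poll() {
        Msg m;
        while ((m = queue.poll()) != null) {
            if (m.metrics && latestMetrics[m.lap] != m)
                continue; // stale metrics message: skip it
            return m;
        }
        return null;
    }
}
```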

Does this make sense?

--Yakov


[jira] [Created] (IGNITE-10698) Get rid of @MXBeanParametersNames and @MXBeanParametersDescriptions

2018-12-14 Thread Yakov Zhdanov (JIRA)
Yakov Zhdanov created IGNITE-10698:
--

 Summary: Get rid of @MXBeanParametersNames and 
@MXBeanParametersDescriptions
 Key: IGNITE-10698
 URL: https://issues.apache.org/jira/browse/IGNITE-10698
 Project: Ignite
  Issue Type: Task
Reporter: Yakov Zhdanov
 Fix For: 3.0


{noformat}
@MXBeanDescription("Returns or kills transactions matching the filter 
conditions.")
@MXBeanParametersNames(
{
"minDuration",
"minSize",
"prj",
"consistentIds",
"xid",
"lbRegex",
"limit",
"order",
"detailed",
"kill"
}
)
@MXBeanParametersDescriptions(
{
"Minimum duration (seconds).",
"Minimum size.",
"Projection (servers|clients).",
"Consistent ids (separated by comma).",
"Transaction XID.",
"Label regexp.",
"Limit a number of transactions collected on each node.",
"Order by DURATION|SIZE.",
"Show detailed description, otherwise only count.",
"Kill matching transactions (be careful)."
}
)
{noformat}

The above looks pretty ugly and is very error prone: it is easy to mix up
the order or the number of the name and description strings.

I would suggest introducing individual parameter annotations and reading
them via mtd.getParameterAnnotations() at runtime.
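A sketch of the proposed per-parameter annotation, read back via Method.getParameterAnnotations(); the annotation name and method are illustrative, not the final API:

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical sketch: one annotation per parameter instead of two parallel
// string arrays, so name and description can never get out of sync.
public class MxBeanParamSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    public @interface MXBeanParameter {
        String name();
        String description();
    }

    public void kill(
        @MXBeanParameter(name = "minDuration", description = "Minimum duration (seconds).") long minDuration,
        @MXBeanParameter(name = "limit", description = "Limit of transactions collected on each node.") int limit) {
        // management operation body omitted
    }

    // Read a parameter name back through Method.getParameterAnnotations().
    public static String paramName(int idx) {
        try {
            Method m = MxBeanParamSketch.class.getMethod("kill", long.class, int.class);
            for (Annotation a : m.getParameterAnnotations()[idx])
                if (a instanceof MXBeanParameter)
                    return ((MXBeanParameter) a).name();
            return null;
        }
        catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        }
    }
}
```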



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Case sensitive indexes question.

2018-11-30 Thread Yakov Zhdanov
Zhenya,

Vladimir suggested not to restrict anything. However, my opinion is to
throw an exception on duplicate indexes. We should rather add the ability
to rename an index if that can be useful for anyone. Having the same field
set indexed with the same index type is pretty strange and adds a lot of
risk to the performance of the system. If this is hard to support in 2.x,
then let's do it in 3.0. Vladimir, what do you think?

-- Yakov


Re: Apache Ignite 2.7. Last Mile

2018-11-29 Thread Yakov Zhdanov
Vladimir, can you please take a look at
https://issues.apache.org/jira/browse/IGNITE-10376?

--Yakov


Re: control.sh: a bug or feature?

2018-11-15 Thread Yakov Zhdanov
Max, the correct link to the ticket is
https://issues.apache.org/jira/browse/IGNITE-10258

I would agree with you and Vladimir that the parameters in the mentioned
case may appear in any order.

--Yakov


Re: Brainstorm: Make TC Run All faster

2018-11-15 Thread Yakov Zhdanov
Denis, you can go even further. E.g. you can start the topology once for
the full set of single-threaded full API cache tests. Each test should
start a cache dynamically and run its logic.

As for me, I would think of splitting RunAll into 2 steps - one containing
basic tests and another with more complex tests. The 2nd step should not
start (except manually) if the 1st step results in any build failure.
--Yakov


Re: Service grid redesign

2018-11-08 Thread Yakov Zhdanov
Nikolay, let me take a look at the changes. I will do it, possibly over the
weekend.

Thanks!

--Yakov

2018-11-08 17:20 GMT+03:00 Nikolay Izhikov :

> Hello, Igniters.
>
> Please, respond if anyone wish to do the additional review of this
> improvement.
>
> I think it's ready to be merged, so if no one has time to review, I can
> merge the patch.
>
> Wed, Nov 7, 2018, 18:04 Vyacheslav Daradur daradu...@gmail.com:
>
> > Dmitriy, I published documentation in wiki:
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=95654584
> >
> > Thank you!
> > On Wed, Nov 7, 2018 at 5:10 PM Dmitriy Pavlov 
> > wrote:
> > >
> > > Hi I think wiki is better than any attached docs. Could you please
> > create a
> > > page?
> > >
> > > Wed, Nov 7, 2018, 14:39 Vyacheslav Daradur :
> > >
> > > > I prepared a description of the implemented solution and attached it
> > > > to the issue [1].
> > > >
> > > > This should help during a review. Should I post the document into
> wiki
> > or
> > > > IEP?
> > > >
> > > > I'd like to ask Ignite's experts review the solution [1] [2], please?
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-9607
> > > > [2] https://github.com/apache/ignite/pull/4434
> > > > On Wed, Oct 31, 2018 at 5:04 PM Vyacheslav Daradur <
> > daradu...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hi, Igniters! Good news!
> > > > >
> > > > > Service Grid Redesign Phase 1 - is in Patch Available now.
> > > > >
> > > > > Nikolay Izhikov has reviewed implementation.
> > > > >
> > > > > However, we need additional review from other Ignite experts.
> > > > >
> > > > > Here is an umbrella ticket [1] and PR [2].
> > > > >
> > > > > Could someone step in and do the review?
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9607
> > > > > [2] https://github.com/apache/ignite/pull/4434
> > > > > On Sat, Aug 18, 2018 at 11:44 AM Denis Mekhanikov <
> > dmekhani...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > Pavel, could you assist?
> > > > > >
> > > > > > Does it make sense for .Net to specify service class name instead
> > of
> > > > its
> > > > > > implementation?
> > > > > >
> > > > > > I think, it shouldn't be a problem.
> > > > > >
> > > > > > Denis
> > > > > >
> > > > > > On Sat, Aug 18, 2018, 11:33 Vyacheslav Daradur <
> > daradu...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > I think that the replacement of serialized instance makes sense
> > to me
> > > > > > > for Java part.
> > > > > > >
> > > > > > > But how it should work for .NET client?
> > > > > > >
> > > > > > > On Tue, Aug 14, 2018 at 4:07 PM Dmitriy Setrakyan <
> > > > dsetrak...@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Tue, Aug 14, 2018 at 6:10 AM, Nikita Amelchev <
> > > > nsamelc...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello, Igniters.
> > > > > > > > >
> > > > > > > > > I am working on task [1] that would replace serialized
> > service's
> > > > > > > instance
> > > > > > > > > by service's class name and properties map in
> > > > {ServiceConfiguration}.
> > > > > > > > >
> > > > > > > > > The task describes that we should use
> > > > > > > > > {String className} + {Map properties}
> instead
> > > > {Service
> > > > > > > > > srvc}.
> > > > > > > > >
> > > > > > > > > I'd like to clarify the following questions:
> > > > > > > > >
> > > > > > > > > 1. What about public methods?
> > > > > > > > > I suggest to mark them as deprecated and use class name of
> > > > provided
> > > > > > > > > instance.
> > > > > > > > > Also to add deploying methods with new parameters:
> > > > > > > > >
> > > > > > > > > @Deprecated
> > > > > > > > > public IgniteInternalFuture
> > deployNodeSingleton(ClusterGroup
> > > > prj,
> > > > > > > > > String
> > > > > > > > > name, Service svc)
> > > > > > > > >
> > > > > > > > > public IgniteInternalFuture
> > deployNodeSingleton(ClusterGroup
> > > > prj,
> > > > > > > > > String
> > > > > > > > > name, String srvcClsName, Map prop)
> > > > > > > > >
> > > > > > > >
> > > > > > > > I think this makes sense, but I would like other committers
> to
> > > > confirm.
> > > > > > > > Perhaps Vladimir Ozerov should comment here.
> > > > > > > >
> > > > > > > >
> > > > > > > > > 2. Is {Map properties} parameter mandatory
> > when
> > > > > > > deploying a
> > > > > > > > > service?
> > > > > > > > > Is it make sense to add deploying methods without it? For
> > > > example:
> > > > > > > > >
> > > > > > > > > public IgniteInternalFuture
> > deployNodeSingleton(ClusterGroup
> > > > prj,
> > > > > > > > > String
> > > > > > > > > name, String srvcClsName)
> > > > > > > > >
> > > > > > > > > public IgniteInternalFuture
> > deployNodeSingleton(ClusterGroup
> > > > prj,
> > > > > > > > > String
> > > > > > > > > name, String srvcClsName, Map prop)
> > > > > > > > >
> > > > > > > >
> > > > > > > > I would always ask the user to pass the property map, but
> would
> > > > allow it
> > > > > > > to
> > > > > > > > be null.

Re: destroy cache holding residual metadata in memory (2.7)

2018-11-06 Thread Yakov Zhdanov
Wayne, can you please share a reproducer for this problem that can be
launched from IDE?

--Yakov


Re: Abbreviation code-style requirement.

2018-11-02 Thread Yakov Zhdanov
No, I meant under Ignite's git, so that any change to the resource file
arrives with project workspace updates and is automatically picked up by
the plugin.

Makes sense?

--Yakov


Re: Abbreviation code-style requirement.

2018-11-02 Thread Yakov Zhdanov
Agree with Vyacheslav - reviewers can either fix the issues or ask for them
to be fixed. After several PRs, new contributors will get used to the
project requirements.

As for one-time contributions, they are usually pretty simple and should
not take any significant time to fix. If a one-time contributor returns
with more contributions, then he or she should take into account all the
changes made during review and, again, come to a point where all project
requirements are satisfied.

Btw, Vyacheslav, can we have abbreviations.properties in the project under
git and have the plugin use it?

--Yakov


Re: Ignite documentation process

2018-11-02 Thread Yakov Zhdanov
Denis, there were email notifications from the wiki on the corresponding edits =)

--Yakov


Re: Abbreviation code-style requirement.

2018-11-01 Thread Yakov Zhdanov
Ivan, I removed "lic" from the list. Thanks for the catch!

Agree with Andrey. After several code reviews newcomers will get used to
abbreviations.

Andrey, try searching for "fut" and make sure to have "Word" checked. You
will see plenty of usages. "f" is also ok for a future, as long as it does
not cause confusion and does not hurt readability.

Let's keep using abbreviations and treat them as a mandatory requirement.
This is important for keeping our codebase consistent and tidy.

--Yakov


Re: Abbreviation code-style requirement.

2018-11-01 Thread Yakov Zhdanov
Igniters,

I have shortened the list of abbreviation rules and edited our wiki page -
https://cwiki.apache.org/confluence/display/IGNITE/Abbreviation+Rules.
Thanks to Vladimir Ozerov and Alexey Goncharuk for their useful feedback.
My idea was to leave only "common sense" abbreviations and those that are
Ignite domain specific.

I would also suggest that we treat names mentioned in the table on the page
as names that are required to be abbreviated. Please take this into account
when conducting code reviews.

Thanks!

--Yakov


Re: Problem with reading incomplete payload - IGNITE-7153

2018-10-31 Thread Yakov Zhdanov
Hi Mike!

Thanks for the reproducer. Now I understand the problem. The NIO worker
reads chunks from the network and notifies the parser on data read. The
parser expects chunks to be complete, with all the data needed to read an
entire message, but this is not guaranteed: a single message can arrive in
several chunks, which is demonstrated by your test.

The problem is inside GridRedisProtocolParser. We should add the ability to
store the parsing context when we do not have all the data to complete
message parsing, as is done, for example, in GridBufferedParser. So, it is
definitely an issue and should be fixed by adding parsing state. I see you
attempted to do so in PR
https://github.com/apache/ignite/pull/5044/files. I did not do a formal
review, so let's ask the community to review your patch.

A couple of comments about your reproducer.

1. Let's dump the bytes of a proper Redis message sent by Jedis.
2. Let's split this dump into 5 chunks and send them with 100 ms delays.

This should fail before the fix is applied, and pass, with the message
properly parsed, after the issue is fixed.
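A sketch of a parser that keeps its context across chunks; the length-prefixed wire format here (ASCII length, newline, payload) is simplified for illustration and is not the real Redis protocol:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of stateful parsing: incoming chunks are buffered,
// and a message is emitted only once the full payload has arrived.
public class StatefulParser {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    // Feed one network chunk; returns the parsed message, or null if more
    // bytes are needed (partial data stays in the parsing context).
    String onChunk(byte[] chunk) {
        buf.write(chunk, 0, chunk.length);
        byte[] data = buf.toByteArray();
        int nl = indexOf(data, (byte) '\n');
        if (nl < 0)
            return null; // header incomplete: wait for the next chunk
        int len = Integer.parseInt(new String(data, 0, nl, StandardCharsets.US_ASCII));
        if (data.length - nl - 1 < len)
            return null; // payload incomplete: keep state, wait for more
        String msg = new String(data, nl + 1, len, StandardCharsets.US_ASCII);
        buf.reset();
        // Keep any trailing bytes of the next message in the context.
        buf.write(data, nl + 1 + len, data.length - nl - 1 - len);
        return msg;
    }

    private static int indexOf(byte[] a, byte b) {
        for (int i = 0; i < a.length; i++)
            if (a[i] == b)
                return i;
        return -1;
    }
}
```

A test along the lines suggested above would feed the dump in several chunks and assert that only the final chunk yields the parsed message.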

Thanks!

--Yakov


Re: Abbreviation code-style requirement.

2018-10-30 Thread Yakov Zhdanov
Guys, I am sorry I missed this discussion. Apparently, abbreviation use is
far from being the biggest problem in the project. I think everyone agrees
here.

I vote for leaving abbreviations mandatory, and I would be strongly against
making them optional, since we would end up in a situation where different
lines of the same method or class contain abbreviated and non-abbreviated
variable, field and parameter names. This will look ugly. I think nobody
thought about source files that are several thousand lines long. Undoing
abbreviations throughout the entire project is hard work, pretty pointless
on such a huge code base, and I am sure it would introduce problems and
failures on TeamCity.

Instead I want to suggest the following:
1. Abbreviations stay mandatory. Making them optional does not make any
sense.
2. List of abbreviations should be shortened to up to 20 items and we
should leave only those which are common sense.
3. A contributor may also choose to use full words in complex variable
names, where abbreviated and non-abbreviated words would otherwise mix, if
this helps readability.

I will suggest a shorter abbreviation list today or tomorrow and let you
know in this thread.

Thanks!

--Yakov


Re: Problem with reading incomplete payload - IGNITE-7153

2018-10-30 Thread Yakov Zhdanov
Michael, can you please share a reproducer? Is it possible to snapshot a
packet that causes the error and just emulate the packet send with a
manually opened socket, bypassing the Redis client lib?

--Yakov


Re: Code inspection

2018-10-26 Thread Yakov Zhdanov
Maxim,

Thanks for the response, let's do it the way you suggested.

Please consider adding more checks:
- line endings. I think we should only have \n
- ensure a blank line at the end of each file

These are all code review issues I have pointed out many times when
reviewing contributions. It would be cool to have the TC build fail if any
are found.
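The two proposed checks can be sketched as a standalone utility (not the actual inspection configuration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: flag files that contain \r\n line endings or that
// do not end with a final \n.
public class LineEndingCheck {
    static boolean hasCrlf(byte[] content) {
        for (int i = 1; i < content.length; i++)
            if (content[i] == '\n' && content[i - 1] == '\r')
                return true;
        return false;
    }

    static boolean endsWithNewline(byte[] content) {
        return content.length > 0 && content[content.length - 1] == '\n';
    }

    // A file passes only if it uses \n endings and ends with a newline.
    static boolean check(Path file) throws IOException {
        byte[] content = Files.readAllBytes(file);
        return !hasCrlf(content) && endsWithNewline(content);
    }
}
```

A TC build step could walk the source tree, apply check() to every file, and fail the build on the first violation.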

Thanks!

--Yakov


Re: Pre-touch for Ignite off-heap memory

2018-10-26 Thread Yakov Zhdanov
Andrey,

The probability of an OOM kill will be much lower if off-heap memory is
pretouched. What do you mean by JVM internal needs? In my understanding, if
the user enables the option to pretouch the heap and fixes the heap size to
prevent the JVM from releasing memory back to the OS, then an OOM kill is
very unlikely.

I would agree that pretouching off-heap memory may be helpful in many
cases.
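A sketch of what off-heap pre-touch could look like (the 4 KB page size is an assumption; the real page size is platform dependent):

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: write one byte into every OS page of a direct buffer
// at startup, so the memory is physically committed before the node starts
// serving load rather than faulted in on first access.
public class OffheapPretouch {
    static final int PAGE_SIZE = 4096; // assumed OS page size

    static ByteBuffer allocatePretouched(int size) {
        ByteBuffer buf = ByteBuffer.allocateDirect(size);
        for (int i = 0; i < size; i += PAGE_SIZE)
            buf.put(i, (byte) 0); // fault the page in now
        return buf;
    }
}
```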

--Yakov


Re: Code inspection

2018-10-26 Thread Yakov Zhdanov
Agree with Petr.

Maxim, what are our next steps? Can we add checks for:
- line length
- indents (tabs vs spaces)

This may require some effort (will it, and how much?), but can we also add
checks for:
- log message structure
- log.warn() vs U.warn()
- abbreviations for local variables and fields.

And a last question:

> - the new configuration ignite_inspections_teamcity.xml added to PR;

Can this be installed locally by every contributor to check the code? Can
we add this to the setup steps we have on the wiki?

--Yakov


Re: Critical worker threads liveness checking drawbacks

2018-09-28 Thread Yakov Zhdanov
> > failures on per-failure-type basis.
> > According to this I have updated the implementation: [1]
> >
> > [1] https://github.com/apache/ignite/pull/4089
> >
> > Mon, Sep 10, 2018 at 22:35, David Harvey <syssoft...@gmail.com>:
> >
> > > When I've done this before, I've needed to find the oldest thread, and
> > > kill the node running that. From a language standpoint, Maxim's
> > > "without progress" is better than "heartbeat". For example, what I'm
> > > most interested in on a distributed system is which thread started the
> > > work it has not completed the earliest, and when did that thread last
> > > make forward progress. You don't want to kill a node because a thread
> > > is waiting on a lock held by a thread that went off-node and has not
> > > gotten a response. If you don't understand the dependency
> > > relationships, you will make incorrect recovery decisions.
> > >
> > > On Mon, Sep 10, 2018 at 4:08 AM Maxim Muzafarov <maxmu...@gmail.com>
> > > wrote:
> > >
> > > > I think we should find exact answers to these questions:
> > > >  1. What exactly is a `critical` issue?
> > > >  2. How can we find critical issues?
> > > >  3. How can we handle critical issues?
> > > >
> > > > First,
> > > >  - Ignore uninterruptable

Re: Critical worker threads liveness checking drawbacks

2018-09-08 Thread Yakov Zhdanov
Agree with David. We need the ability to set a backup count threshold (at
runtime too!) that will not allow any automatic stop if it would lead to
data loss. Andrey, what do you think?

--Yakov


Re: Critical worker threads liveness checking drawbacks

2018-09-07 Thread Yakov Zhdanov
Yes, and you should suggest a solution, e.g. throttling rebalancing threads
more to produce less load.

What you are suggesting kills the idea of this enhancement.

--Yakov

2018-09-07 19:03 GMT+03:00 Andrey Kuznetsov :

> Yakov,
>
> Thanks for reply. Indeed, initial design assumed node termination when
> hanging critical thread has been detected. But sometimes it looks
> inappropriate. Let, for example fsync in WAL writer thread takes too long,
> and we terminate the node. Upon rebalancing, this may lead to long fsyncs
> on other nodes due to increased per node load, hence we can terminate the
> next node as well. Eventually we can collapse the entire cluster. Is it a
> possible scenario?
>
> Fri, Sep 7, 2018 at 18:44, Yakov Zhdanov :
>
> > Andrey,
> >
> > I don't understand your point. My opinion, the idea of these changes is
> to
> > make cluster more stable and responsive by eliminating hanged nodes. I
> > would not make too much difference between threads trapped in deadlock
> and
> > threads hanging on fsync calls for too long. Both situations lead to
> > increasing latency in cluster till its full unavailability.
> >
> > So, killing node hanging on fsync may be reasonable. Agree?
> >
> > You may implement the approach when you have warning messages in logs by
> > default, but termination option should also be available.
> >
> > Thanks!
> >
> > --Yakov
> >
> >
>


Re: Critical worker threads liveness checking drawbacks

2018-09-07 Thread Yakov Zhdanov
Andrey,

I don't understand your point. In my opinion, the idea of these changes is
to make the cluster more stable and responsive by eliminating hung nodes. I
would not make too much difference between threads trapped in a deadlock
and threads hanging on fsync calls for too long. Both situations lead to
increasing latency in the cluster, up to its full unavailability.

So, killing node hanging on fsync may be reasonable. Agree?

You may implement the approach when you have warning messages in logs by
default, but termination option should also be available.

Thanks!

--Yakov

2018-09-06 17:02 GMT+03:00 Andrey Kuznetsov :

> Igniters,
>
> Currently, we have a nearly completed implementation for system-critical
> threads liveness checking [1], in terms of IEP-14 [2] and IEP-5 [3]. In a
> nutshell, system-critical threads monitor each other and checks for two
> aspects:
> - whether a thread is alive;
> - whether a thread is active, i.e. it updates its heartbeat timestamp
> periodically.
> When either check fails, critical failure handler is called, this in fact
> means node stop.
>
> The implementation of activity checks has a flaw now: some blocking actions
> are parts of normal operation and should not lead to node stop, e.g.
> - WAL writer thread can call {{fsync()}};
> - any cache write that occurs in system striped executor can lead to
> {{fsync()}} call again.
> The former example can be fixed by disabling heartbeat checks temporarily
> for known long-running actions, but it won't work for the latter one.
>
> I see a few options to address the issue:
> - Just log any long-running action instead of calling critical failure
> handler.
> - Introduce several severity levels for long-running actions handling. Each
> level will have its own failure handler. Depending on the level,
> long-running action can lead to node stop, error logging or no-op reaction.
>
> I encourage you to suggest other options. Any idea is appreciated.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-6587
> [2]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-
> 14+Ignite+failures+handling
> [3]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74683878
>
> --
> Best regards,
>   Andrey Kuznetsov.
>
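The heartbeat checks and the opt-out for known long-running actions described above can be sketched as follows (the names are illustrative, not the IEP-14 API):

```java
// Hypothetical sketch: each critical worker publishes a heartbeat; known
// blocking calls (e.g. fsync) are wrapped so the watchdog ignores the
// worker while it is legitimately blocked.
public class LivenessSketch {
    static final class CriticalWorker {
        volatile long heartbeatTs = System.currentTimeMillis();
        volatile boolean inBlockingSection;

        void updateHeartbeat() {
            heartbeatTs = System.currentTimeMillis();
        }

        // Wrap known-blocking operations so the watchdog skips this worker.
        void blockingSection(Runnable op) {
            inBlockingSection = true;
            try {
                op.run();
            }
            finally {
                inBlockingSection = false;
                updateHeartbeat();
            }
        }
    }

    // Watchdog side: a worker is suspect only if it is both past the
    // timeout and not inside a declared blocking section.
    static boolean isBlocked(CriticalWorker w, long nowMs, long timeoutMs) {
        return !w.inBlockingSection && nowMs - w.heartbeatTs > timeoutMs;
    }
}
```

Note this only covers the first case raised in the message (a known long-running action on the worker's own thread); it does not help when the blocking call happens indirectly, e.g. inside the striped executor.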


Re: MVCC and transactional SQL is merged to master

2018-08-30 Thread Yakov Zhdanov
Great news, Vladimir! Congratulations!

--Yakov

2018-08-30 15:15 GMT+03:00 Vladimir Ozerov :

> Igniters,
>
> I am glad to announce that we finally merged MVCC and transactional SQL
> support to master branch.
>
> This long journey started more than a year ago with multiple design
> brainstorm sessions, conducted by Apache Ignite fellows - Semen Boikov,
> Alexey Goncharuk, Sergi Vladykin.
>
> As things became clearer, we gradually switched to the active
> development phase in November 2017. Since then we implemented new
> transactional model based on multi-version approach and snapshot isolation,
> and almost fully reworked SQL engine to support transactions.
>
> But this is not the end of the story. In Apache Ignite 2.7 we expect to
> release transactional SQL as "release candidate". To achieve this we still
> need to implement a number of things, such as new transactional protocol
> for key-value API, historical rebalance, continuous queries. Between AI 2.7
> and AI 2.8 we will work on several not-yet-supported cache operations, and
> also will focus on performance and stability.
>
> I would like to thank all community members, who worked hard to make MVCC
> happen: Igor Seliverstov, Alexander Paschenko, Sergey Kalashnikov, igor
> Sapego, Roman Kondakov, Pavel Kuznetsov, Ivan Pavlukihn, Andrey Mashenkov,
> and many other contributors who helped us with design, testing and
> benchmarking.
>
> Release notes and documentation will be prepared by AI 2.7 release.
>
> Please feel free to ask any questions about the feature here.
>
> Vladimir.
>


Re: Apache Ignite 2.7 release

2018-08-29 Thread Yakov Zhdanov
Nikolay,

I think we should have 2 weeks after code freeze, which, by the way, may
include the RC1 voting stage. With this in mind, I would like us to agree
that the release candidate should be sent to vote on Oct 11th so we can
release on Oct 15th.

What do you think?

--Yakov


Re: Apache Ignite 2.7 release

2018-08-24 Thread Yakov Zhdanov
Igniters,

We should definitely expand the list of important tickets, adding tickets
related to (1) Partition Map Exchange speedup and (2) SQL memory
optimization. Alex Goncharuk, Vladimir, can you please add labels to the
corresponding tickets?

Also, we have several blocker tickets (mostly related to failing tests) not
yet assigned to anyone. Nikolay, can you please list those tickets and ask
the community to pick them up?

We have 556 open tickets now according to [1]. This seems a bit too much.
Can I ask everyone to review the tickets assigned to them and move out
those that will not be fixed? Then Nikolay will process the ones that are
not assigned yet. Nikolay, can you take a look please?

--Yakov

2018-08-24 12:58 GMT+03:00 Nikolay Izhikov :

> Hello, Dmitriy.
>
> Release page - [1]
>
> > I think it is important to include the links to all important Jira
> tickets> in this thread
>
> Open:
>
> https://issues.apache.org/jira/browse/IGNITE-6980 - Automatic cancelling
> of hanging Ignite operations
> https://issues.apache.org/jira/browse/IGNITE-5473 - Create ignite
> troubleshooting logger
> https://issues.apache.org/jira/browse/IGNITE-6903 - Implement new JMX
> metrics for Indexing
> https://issues.apache.org/jira/browse/IGNITE-6507 - Commit can be lost in
> network split scenario
>
> In Progress:
>
> https://issues.apache.org/jira/browse/IGNITE-9349
>
> Patch Available:
>
> https://issues.apache.org/jira/browse/IGNITE-7251 - Remove term "fabric"
> from Ignite deliverables
>
>
> Resolved:
>
> https://issues.apache.org/jira/browse/IGNITE-8780 - File I/O operations
> must be retried if buffer hasn't read/written completely
> https://issues.apache.org/jira/browse/IGNITE-5059 - Implement logistic
> regression
> https://issues.apache.org/jira/browse/IGNITE-3478 - Multi-version
> concurrency control
> https://issues.apache.org/jira/browse/IGNITE-9340 - Update jetty version
> in Apache Ignite (ignite-rest-http)
>
>
>
> [1] https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+2.7
>
> В Пт, 24/08/2018 в 10:47 +0300, Nikolay Izhikov пишет:
> > Hello, Pavel.
> >
> > Please, be aware of IGNITE-6055 [1]
> >
> > I'm editing the thin protocol in that ticket.
> > I can't support changes in the Python and PHP clients because they are not
> > merged into master yet.
> > Write me if you have any questions about the new fields.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-6055
> >
> > В Чт, 23/08/2018 в 18:02 -0700, Pavel Petroshenko пишет:
> > > Hi Nikolay,
> > >
> > > Python [1], PHP [2], and Node.js [3] thin clients will get into the
> release.
> > >
> > > Thanks,
> > > p.
> > >
> > > [1] https://jira.apache.org/jira/browse/IGNITE-7782
> > > [2] https://jira.apache.org/jira/browse/IGNITE-7783
> > > [3] https://jira.apache.org/jira/browse/IGNITE-
> > >
> > >
> > > On Tue, Aug 21, 2018 at 12:20 PM, Dmitriy Setrakyan <
> dsetrak...@apache.org> wrote:
> > > > Thanks, Nikolay!
> > > >
> > > > I think it is important to include the links to all important Jira
> tickets
> > > > in this thread, so that the community can track them.
> > > >
> > > > D.
> > > >
> > > > On Tue, Aug 21, 2018 at 12:06 AM, Nikolay Izhikov <
> nizhi...@apache.org>
> > > > wrote:
> > > >
> > > > > Hello, Dmitriy.
> > > > >
> > > > > I think Transparent Data Encryption will be available in 2.7
> > > > >
> > > > > В Пн, 20/08/2018 в 13:20 -0700, Dmitriy Setrakyan пишет:
> > > > > > Hi Nikolay,
> > > > > >
> > > > > > Thanks for being the release manager!
> > > > > >
> > > > > > I am getting a bit lost in all these tickets. Can we specify some
> > > > > > high-level tickets, that are not plain bug fixes, which will be
> > > > >
> > > > > interesting
> > > > > > for the community to notice?
> > > > > >
> > > > > > For example, here are some significant tasks that the community
> is either
> > > > > > working on or has been working on:
> > > > > >
> > > > > > - Node.JS client
> > > > > > - Python client
> > > > > > - Transactional SQL (MVCC)
> > > > > > - service grid stabilization
> > > > > > - SQL memory utilization improvements
> > > > > > - more?
> > > > > >
> > > > > > Can you please solicit status from the community for these tasks?
> > > > > >
> > > > > > D.
> > > > > >
> > > > > > On Mon, Aug 20, 2018 at 11:22 AM, Nikolay Izhikov <
> nizhi...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hello, Igniters.
> > > > > > >
> > > > > > > I'm release manager of Apache Ignite 2.7.
> > > > > > >
> > > > > > > It's time to start discussion of release. [1]
> > > > > > >
> > > > > > > The current code freeze date is September 30.
> > > > > > > If you have any objections, please respond to this thread.
> > > > > > >
> > > > > > > [1] https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+2.7
> > > > >
>


Re: ignite PureJavaCrc32 vs java.util.zip.CRC32 bench.

2018-08-14 Thread Yakov Zhdanov
Guys, what time in % does crc calculation take in WAL logging process?

--Yakov

2018-08-14 13:37 GMT+03:00 Dmitriy Pavlov :

> Hi Alex, thank you for this idea.
>
> Evgeniy, Alex, would you like to submit the patch with bypassing
> implementation differences to keep compatibility?
>
> Sincerely,
> Dmitriy Pavlov
>
> вт, 14 авг. 2018 г. в 12:06, Alex Plehanov :
>
> > Hello, Igniters!
> >
> > In Java 8, java.util.zip.CRC32 methods became intrinsic; moreover, a new
> > "update" method that takes a ByteBuffer was introduced. Since we moved to
> > Java 8, perhaps we really can get a performance boost by using the
> > standard java.util.zip.CRC32 instead of PureJavaCrc32.
> >
> > About compatibility: it looks like PureJavaCrc32 implements the same
> > algorithm as java.util.zip.CRC32. These two implementations use the same
> > polynomial and the same initial value. The only difference is the final
> > XOR mask (0xFFFFFFFF for java.util.zip.CRC32). So we can easily convert
> > from PureJavaCrc32 to the standard CRC32 and vice versa using this
> > expression: crc32 ^= 0xFFFFFFFF
> >
> >
> > 2018-08-14 0:19 GMT+03:00 Eduard Shangareev  >:
> >
> > > Evgeniy,
> > >
> > > Could you share benchmark code? And please share what version of JVM
> > > you have used.
> > >
> > > On Mon, Aug 13, 2018 at 10:44 PM Zhenya 
> > > wrote:
> > >
> > > > I think it would break backward compatibility; as Nikolay mentioned
> > > > above, we would get an exception here:
> > > >
> > > > [1]
> > > >
> > > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/file/FilePageStore.java#L372
> > > >
> > > > That's why I'm asking for the community's thoughts here.
> > > >
> > > > > Hi Evgeniy,
> > > > >
> > > > > would you like to submit a patch with CRC32 implementation change?
> > > > >
> > > > > Sincerely,
> > > > > Dmitriy Pavlov
> > > > >
> > > > > пн, 13 авг. 2018 г. в 22:08, Евгений Станиловский
> > > > > :
> > > > >
> > > > >> Hi igniters, I wrote a simple benchmark; it looks like PureJavaCrc32
> > > > >> has performance problems compared with java.util.zip.CRC32.
> > > > >>
> > > > >> Benchmark Mode Cnt Score Error Units
> > > > >> BenchmarkCRC.Crc32 avgt 5 1088914.540 ± 368851.822 ns/op
> > > > >> BenchmarkCRC.pureJavaCrc32 avgt 5 6619408.049 ± 3746712.210 ns/op
> > > > >>
> > > > >> thoughts?
> > > >
> > >
> >
>
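The conversion discussed in this thread can be checked with plain JDK code. Below is a minimal sketch (PureJavaCrc32 itself is not used here; the claim that it differs from the standard CRC-32 only by the final XOR mask is taken from the thread above, and the XOR trick works because XOR with the same mask is its own inverse):

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class CrcConversionDemo {
    /** Standard CRC-32 of a byte array via java.util.zip.CRC32. */
    static long standardCrc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Same checksum computed through the ByteBuffer overload added in Java 8. */
    static long standardCrc(ByteBuffer buf) {
        CRC32 crc = new CRC32();
        crc.update(buf);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "hello ignite".getBytes();

        long std = standardCrc(data);

        // Both overloads produce the same value.
        System.out.println(std == standardCrc(ByteBuffer.wrap(data)));

        // A value computed without the final XOR differs from the standard one
        // only by the 0xFFFFFFFF mask, so conversion in either direction is a
        // single XOR:
        long withoutFinalXor = std ^ 0xFFFFFFFFL;
        System.out.println((withoutFinalXor ^ 0xFFFFFFFFL) == std);
    }
}
```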


Re: welcome

2018-08-08 Thread Yakov Zhdanov
Hi Iurii!

Go ahead! You have been added!

--Yakov

2018-08-07 17:19 GMT+03:00 Юрий :

> Hello, Ignite Community!
>
> My name is Iurii. I want to contribute to Apache Ignite.
> my JIRA user name is jooger. Any help on this will be appreciated.
>
> Thanks!
>
> --
> Live with a smile! :D
>


Re: welcome

2018-08-07 Thread Yakov Zhdanov
Sergei, you are welcome! I have added you to contributors. Please go ahead!

--Yakov

2018-08-07 16:18 GMT+03:00 s v :

> Hello, Ignite Community!
>
> My name is Sergei. I want to contribute to Apache Ignite and want to start
> with this issue - https://issues.apache.org/jira/browse/IGNITE-9141 ,
> my JIRA user name is SGrimstad. Any help on this will be appreciated.
>
> Thanks!
>
>


Re: IP finder in tests

2018-08-03 Thread Yakov Zhdanov
Well, I tend to agree. Can you try applying it on some suite and share the
results in terms of run-time decrease and reduction in random failures due
to improper test stop or cleanup?

--Yakov
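For reference, this is roughly what the static (Vm) IP finder setup under discussion looks like in Spring XML — a sketch only; the 127.0.0.1:47500..47509 address range is the usual local-test choice, not something mandated by this thread:

```xml
<!-- Discovery section of an IgniteConfiguration bean for tests:
     a static list of local addresses instead of multicast scanning. -->
<property name="discoverySpi">
  <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
    <property name="ipFinder">
      <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
        <property name="addresses">
          <list>
            <!-- Local nodes only; no other cluster can get in the way. -->
            <value>127.0.0.1:47500..47509</value>
          </list>
        </property>
      </bean>
    </property>
  </bean>
</property>
```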

2018-08-02 16:37 GMT+03:00 Denis Mekhanikov :

> Yakov,
>
> Almost every test in the project uses the Vm IP finder anyway.
> It has become a convention to use it in all tests.
> So, I'm trying to reduce the amount of copy-pasted code and improve test
> isolation.
>
> Also, tests may be run outside TeamCity during the development process.
> The multicast IP finder makes developers disconnect from their networks to
> guarantee that no other nodes will get in the way.
> So, I don't see any advantages of the multicast IP finder over Vm in the
> context of tests.
>
> Denis
>
> чт, 2 авг. 2018 г. в 1:46, Pavel Kovalenko :
>
> > Hi Yakov,
> >
> > Currently TC agents are isolated by a Docker virtual network; that's why
> > we don't see intersection between several clusters, but in case of any
> > step aside (running several suites on one agent, running several tests on
> > one machine and so on) we will have problems and come back to this
> > conversation.
> > I'm voting for simplifying and speeding up the testing process. It will
> > also reduce the amount of copy-paste in a ton of tests where the Vm IP
> > finder is used explicitly. As a developer, I'm confused when I see
> > VmIpFinder in one test and Multicast in another without any reason or
> > comment.
> > If you care about test coverage of MulticastIpFinder, you can pick several
> > suites where node starts/stops are most frequent and leave multicast
> > there, but in general it's not necessary to have it everywhere.
> >
> > 2018-08-02 0:54 GMT+03:00 Yakov Zhdanov :
> >
> > > It should be true, otherwise we would have nodes from all agents
> > > intersecting. No?
> > >
> > > And the multicast IP finder is the default one, so I would not reduce
> > > its test volume.
> > >
> > > Yakov Zhdanov
> > > www.gridgain.com
> > >
> > > 2018-08-02 0:32 GMT+03:00 Dmitriy Pavlov :
> > >
> > > > Hi Yakov,
> > > >
> > > > Regarding "each TC agent uses its own multicast": I'm not sure it is
> > > > true; TC admins tried to do so but did not succeed.
> > > >
> > > > One more reason is the speed of test runs. Why do we need to scan
> > > > something if we will always connect to localhost? Almost no TC test
> > > > uses multicast.
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > чт, 2 авг. 2018 г. в 0:27, Yakov Zhdanov :
> > > >
> > > > > I disagree. Probably no change is required. Each TC agent uses its
> > > > > own multicast group, so nodes do not intersect. If any test does not
> > > > > properly clean up and leaves nodes running, this should be flagged
> > > > > as a test failure, which is the case.
> > > > >
> > > > > Please provide strong reasons to start with this.
> > > > >
> > > > > --Yakov
> > > > >
> > > >
> > >
> >
>


Re: [MTCGA]: new failures in builds [1575775] needs to be handled

2018-08-03 Thread Yakov Zhdanov
Maxim, did we have .NET tests failing before the merge? If yes, how come we
merged it?

--Yakov

2018-08-02 18:03 GMT+03:00 Maxim Muzafarov :

> Folks,
>
> Seems like the rebalancing changes lead to a hanging TestRebalance() test
> for .NET.
> Created issue [1], I will try to investigate it.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-9170
>
> On Thu, 2 Aug 2018 at 17:12  wrote:
>
> > Hi Ignite Developer,
> >
> > I am MTCGA.Bot, and I've detected some issue on TeamCity to be addressed.
> > I hope you can help.
> >
> >  *New Critical Failure in master Platform .NET (Long Running)
> > https://ci.ignite.apache.org/viewType.html?buildTypeId=
> IgniteTests24Java8_PlatformNetLongRunning=%3Cdefault%3E=
> buildTypeStatusDiv
> >  Changes may led to failure were done by
> >  - sboikov
> > http://ci.ignite.apache.org/viewModification.html?modId=
> 827278=false
> >  - saikat.maitra
> > http://ci.ignite.apache.org/viewModification.html?modId=
> 827276=false
> >  - maxmuzaf
> > http://ci.ignite.apache.org/viewModification.html?modId=
> 827268=false
> >  - biryukovvitaliy92
> > http://ci.ignite.apache.org/viewModification.html?modId=
> 827267=false
> >  - alkuznetsov.sb
> > http://ci.ignite.apache.org/viewModification.html?modId=
> 827258=false
> >  - ilantukh
> > http://ci.ignite.apache.org/viewModification.html?modId=
> 827250=false
> >
> > - If your changes can led to this failure(s), please create issue
> > with label MakeTeamCityGreenAgain and assign it to you.
> > -- If you have fix, please set ticket to PA state and write to
> dev
> > list fix is ready
> > -- For case fix will require some time please mute test and set
> > label Muted_Test to issue
> > - If you know which change caused failure please contact change
> > author directly
> > - If you don't know which change caused failure please send
> > message to dev list to find out
> > Should you have any questions please contact dpav...@apache.org or write
> > to dev.list
> > Best Regards,
> > MTCGA.Bot
> > Notification generated at Thu Aug 02 17:12:39 MSK 2018
> >
> --
> --
> Maxim Muzafarov
>


Re: IP finder in tests

2018-08-01 Thread Yakov Zhdanov
It should be true, otherwise we would have nodes from all agents
intersecting. No?

And the multicast IP finder is the default one, so I would not reduce its
test volume.

Yakov Zhdanov
www.gridgain.com

2018-08-02 0:32 GMT+03:00 Dmitriy Pavlov :

> Hi Yakov,
>
> Regarding "each TC agent uses its own multicast": I'm not sure it is true;
> TC admins tried to do so but did not succeed.
>
> One more reason is the speed of test runs. Why do we need to scan something
> if we will always connect to localhost? Almost no TC test uses multicast.
>
> Sincerely,
> Dmitriy Pavlov
>
> чт, 2 авг. 2018 г. в 0:27, Yakov Zhdanov :
>
> > I disagree. Probably no change is required. Each TC agent uses its own
> > multicast group, so nodes do not intersect. If any test does not properly
> > clean up and leaves nodes running, this should be flagged as a test
> > failure, which is the case.
> >
> > Please provide strong reasons to start with this.
> >
> > --Yakov
> >
>


Re: IP finder in tests

2018-08-01 Thread Yakov Zhdanov
I disagree. Probably no change is required. Each TC agent uses its own
multicast group, so nodes do not intersect. If any test does not properly
clean up and leaves nodes running, this should be flagged as a test failure,
which is the case.

Please provide strong reasons to start with this.

--Yakov


Re: Apache Ignite tickets with empty description & dev list references

2018-08-01 Thread Yakov Zhdanov
Agree with Dmitry P.

Guys! Even if you file a ticket for yourself, please spend an additional
5 minutes adding a minimal description, so that the rest of the community,
and those who will have to deal with your changes in the future, get an idea
of the intended changes.

--Yakov


Re: Pushing IGNITE-6826 forward

2018-07-19 Thread Yakov Zhdanov
Guys, the multicast IP finder gives a new user the opportunity to run tests
on several machines with zero config changes, and you want to change this,
which is not good in my view.

Probably we need to output a warning pointing out that the user can change
the multicast group to avoid undesired discovery and isolate several
clusters on the same network.
--Yakov
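Such a warning could point users at the multicastGroup property; a configuration sketch (the group address here is an arbitrary example, not a recommended value):

```xml
<property name="discoverySpi">
  <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
    <property name="ipFinder">
      <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
        <!-- A non-default group isolates this cluster from others
             discovering via multicast on the same network. -->
        <property name="multicastGroup" value="228.10.10.157"/>
      </bean>
    </property>
  </bean>
</property>
```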

2018-07-18 17:30 GMT+03:00 Dmitry Pavlov :

> Hi Stanislav,
>
> I wish this push will have effect.
>
> Just two proposals that will help Igniters to easily jump into such emails:
> 1. Include ticket short description into subject, not only number.
> 2. Include link to JIRA issue into body so it could be easily clicked to
> find out details.
> It can seem not important, but saves a minute for everyone.
>
> Sincerely,
> Dmitriy Pavlov
>
> ср, 18 июл. 2018 г. в 16:32, Stanislav Lukyanov :
>
> > Hi Igniters,
> >
> > There is a small but annoying issue with examples using MulticastIpFinder
> > by default.
> > The JIRA ticket is IGNITE-6826.
> >
> > AntonK and DmitriiR have suggested PRs to fix this, but PavelT had some
> > concerns, and the fix got stuck as a result.
> >
> > Pavel, could you please suggest necessary changes to the PRs so that guys
> > can move forward with integration?
> >
> > Thanks,
> > Stan
> >
>


Re: Potential OOM while iterating over query cursor. Review needed.

2018-07-18 Thread Yakov Zhdanov
Yes! Just deprecate getAll() and change default keepAll for scans to false.

--Yakov

2018-07-18 13:39 GMT+03:00 Alexey Goncharuk :

> Folks,
>
> There is no need to add a getNext() method because the object we are
> discussing is already an iterator. Then, to summarize the solution, we are
> going to deprecate the getAll() method and set the keepAll flag to false
> for scan queries.
>
> Agree?
>
> пн, 16 июл. 2018 г. в 23:40, Dmitriy Setrakyan :
>
> > On Mon, Jul 16, 2018 at 5:42 PM, Yakov Zhdanov 
> > wrote:
> >
> > > Dmitry, let's have only getNext(), the same as JDBC. All other
> > > shortcuts seem to overload the API without adding much value.
> > >
> >
> > Agree. Do you mind creating a ticket?
> >
>


Re: Async cache groups rebalance not started with rebalanceOrder ZERO

2018-07-18 Thread Yakov Zhdanov
Maxim, I checked, and it seems that the send retry count is used only in the
cache IO manager, and its usage is semantically very far from what I
suggest. The resend count limits the number of attempts, while I meant a
successful send but possible problems on the supplier side.

--Yakov

2018-07-17 19:01 GMT+03:00 Maxim Muzafarov :

> Yakov,
>
> But we already have DFLT_SEND_RETRY_CNT and DFLT_SEND_RETRY_DELAY for
> configuring our CommunicationSPI behavior. What if a user configures these
> parameters his own way and sees a lot of WARN messages in the log that make
> no sense?
>
> Maybe we could use GridCachePartitionExchangeManager#forceRebalance (or
> maybe forceReassign) if rebalancing fails after all those retries. What do
> you think?
>
>
>
> пн, 16 июл. 2018 г. в 21:12, Yakov Zhdanov :
>
> > Maxim, I looked at the code you provided. I think we need to add some
> > timeout validation and output warning to logs on demander side in case
> > there is no supply message within 30 secs and repeat demanding process.
> > This should apply to any demand message throughout the rebalancing
> process
> > not only the 1st one.
> >
> > You can use the following message
> >
> > Failed to wait for supply message from node within 30 secs [cache=C,
> > partId=XX]
> >
> > Alex Goncharuk do you have comments here?
> >
> > Yakov Zhdanov
> > www.gridgain.com
> >
> > 2018-07-14 19:45 GMT+03:00 Maxim Muzafarov :
> >
> > > Yakov,
> > >
> > > Yes, you're right. The whole rebalancing progress will be stopped.
> > >
> > > Actually, the rebalance order doesn't matter; you are right about that
> > > too. The javadoc just describes how rebalancing should work for caches,
> > > but in fact it doesn't work as described. Personally, I'd prefer to
> > > start the rebalance of each cache group asynchronously and
> > > independently.
> > >
> > > Please look at my reproducer [1].
> > >
> > > Scenario:
> > > Cluster with two REPLICATED caches.
> > > Start a new node.
> > > The first cache group's rebalance fails to start (e.g. network issues) -
> > > that's OK.
> > > The second cache group's rebalance will never be started - all further
> > > progress is stuck (I think rebalance here should be started!).
> > >
> > >
> > > [1]
> > > https://github.com/Mmuzaf/ignite/blob/rebalance-cancel/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/rebalancing/GridCacheRebalancingCancelSelfTest.java
> > >
> > > пт, 13 июл. 2018 г. в 17:46, Yakov Zhdanov :
> > >
> > > > Maxim, I do not understand the problem. Imagine I do not have any
> > > ordering
> > > > but rebalancing of some cache fails to start - so in my understanding
> > > > overall rebalancing progress becomes blocked. Is that true?
> > > >
> > > > Can you please provide a reproducer for your problem?
> > > >
> > > > --Yakov
> > > >
> > > > 2018-07-09 16:42 GMT+03:00 Maxim Muzafarov :
> > > >
> > > > > Hello Igniters,
> > > > >
> > > > > Each cache group has “rebalance order” property. As javadoc for
> > > > > getRebalanceOrder() says: “Note that cache with order {@code 0}
> does
> > > not
> > > > > participate in ordering. This means that cache with rebalance order
> > > > {@code
> > > > > 0} will never wait for any other caches. All caches with order
> {@code
> > > 0}
> > > > > will be rebalanced right away concurrently with each other and
> > ordered
> > > > > rebalance processes. If not set, cache order is 0, i.e. rebalancing
> > is
> > > > not
> > > > > ordered.”
> > > > >
> > > > > In fact, GridCachePartitionExchangeManager always builds the chain
> > > > > of rebalancing cache groups to start (even for cache order ZERO):
> > > > >
> > > > > ignite-sys-cache -> cacheR -> cacheR3 -> cacheR2 -> cacheR5 ->
> > > > > cacheR1.
> > > > >
> > > > > If one of these groups fails to start, further groups will never
> > > > > run.
> > > > >
> > > > > *Question 1*: Should we fix the javadoc description or create a bug
> > > > > for fixing such rebalance behavior?
> > > > >
> > > > > [1]
> > > > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCachePartitionExchangeManager.java#L2630
> > > > >
> > > >
> > > --
> > > --
> > > Maxim Muzafarov
> > >
> >
> --
> --
> Maxim Muzafarov
>


Re: Neighbors exclusion

2018-07-16 Thread Yakov Zhdanov
Dmitry, I think we can do this change right away. All we need is to add a
proper error message on cache config validation in order to tell the user
that the default changed and manual configuration is needed for
compatibility.

--Yakov
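For reference, the flag under discussion is set on the affinity function of a cache configuration; a Spring XML sketch (the cache name is illustrative):

```xml
<property name="cacheConfiguration">
  <bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="myCache"/>
    <property name="affinity">
      <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
        <!-- Never place a backup on the same physical host as the primary. -->
        <property name="excludeNeighbors" value="true"/>
      </bean>
    </property>
  </bean>
</property>
```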

2018-07-16 15:47 GMT+03:00 Dmitry Karachentsev :

> Created a ticket and mapped it to version 3.0, as it changes basic default
> behavior:
> https://issues.apache.org/jira/browse/IGNITE-9011
>
> Thanks!
>
> 13.07.2018 22:10, Valentin Kulichenko пишет:
>
> Dmitry,
>>
>> Good point. I think it makes sense to even remove (deprecate) the
>> excludeNeighbors property and always distribute primary and backups to
>> different physical hosts in this scenario. Because why would anyone ever
>> set this to false if we switch default to true? This also automatically
>> fixes the confusing behavior of backupFilter - it should never be ignored
>> if it's set.
>>
>> -Val
>>
>> On Fri, Jul 13, 2018 at 8:05 AM Dmitry Karachentsev <
>> dkarachent...@gridgain.com> wrote:
>>
>> Hi folks,
>>>
>>> Why is RendezvousAffinityFunction.excludeNeighbors [1] false by default?
>>> It's not obvious that a user who wants to run more than one node per
>>> machine also has to set this flag to true explicitly. Maybe it would be
>>> better to set it to true by default?
>>>
>>> At the same time, if excludeNeighbors is true, it ignores backupFilter.
>>> Why isn't it vice versa? For example:
>>>
>>> 1) if backupFilter is set - it will be used,
>>>
>>> 2) if there are not enough backup nodes (or no backupFilter) - try to
>>> distribute according to excludeNeighbors = true,
>>>
>>> 3) if this is not possible either (or excludeNeighbors = false) - assign
>>> partitions as best as possible.
>>>
>>> [1]
>>>
>>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/rendezvous/RendezvousAffinityFunction.html#setExcludeNeighbors-boolean-
>>>
>>> Are there any drawbacks in such approach?
>>>
>>> Thanks!
>>>
>>>
>>>
>


Re: Async cache groups rebalance not started with rebalanceOrder ZERO

2018-07-16 Thread Yakov Zhdanov
Maxim, I looked at the code you provided. I think we need to add some
timeout validation and output warning to logs on demander side in case
there is no supply message within 30 secs and repeat demanding process.
This should apply to any demand message throughout the rebalancing process
not only the 1st one.

You can use the following message

Failed to wait for supply message from node within 30 secs [cache=C,
partId=XX]

Alex Goncharuk do you have comments here?

Yakov Zhdanov
www.gridgain.com

2018-07-14 19:45 GMT+03:00 Maxim Muzafarov :

> Yakov,
>
> Yes, you're right. The whole rebalancing progress will be stopped.
>
> Actually, the rebalance order doesn't matter; you are right about that too.
> The javadoc just describes how rebalancing should work for caches, but in
> fact it doesn't work as described. Personally, I'd prefer to start the
> rebalance of each cache group asynchronously and independently.
>
> Please look at my reproducer [1].
>
> Scenario:
> Cluster with two REPLICATED caches.
> Start a new node.
> The first cache group's rebalance fails to start (e.g. network issues) -
> that's OK.
> The second cache group's rebalance will never be started - all further
> progress is stuck (I think rebalance here should be started!).
>
>
> [1]
> https://github.com/Mmuzaf/ignite/blob/rebalance-cancel/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/distributed/rebalancing/GridCacheRebalancingCancelSelfTest.java
>
> пт, 13 июл. 2018 г. в 17:46, Yakov Zhdanov :
>
> > Maxim, I do not understand the problem. Imagine I do not have any
> ordering
> > but rebalancing of some cache fails to start - so in my understanding
> > overall rebalancing progress becomes blocked. Is that true?
> >
> > Can you please provide a reproducer for your problem?
> >
> > --Yakov
> >
> > 2018-07-09 16:42 GMT+03:00 Maxim Muzafarov :
> >
> > > Hello Igniters,
> > >
> > > Each cache group has “rebalance order” property. As javadoc for
> > > getRebalanceOrder() says: “Note that cache with order {@code 0} does
> not
> > > participate in ordering. This means that cache with rebalance order
> > {@code
> > > 0} will never wait for any other caches. All caches with order {@code
> 0}
> > > will be rebalanced right away concurrently with each other and ordered
> > > rebalance processes. If not set, cache order is 0, i.e. rebalancing is
> > not
> > > ordered.”
> > >
> > > In fact, GridCachePartitionExchangeManager always builds the chain of
> > > rebalancing cache groups to start (even for cache order ZERO):
> > >
> > > ignite-sys-cache -> cacheR -> cacheR3 -> cacheR2 -> cacheR5 -> cacheR1.
> > >
> > > If one of these groups fails to start, further groups will never run.
> > >
> > > *Question 1*: Should we fix the javadoc description or create a bug for
> > > fixing such rebalance behavior?
> > >
> > > [1]
> > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCachePartitionExchangeManager.java#L2630
> > >
> >
> --
> --
> Maxim Muzafarov
>


Re: Ignite guide for community developes

2018-07-16 Thread Yakov Zhdanov
I think you need to signup to Apache jira and let us know your user ID so
we can add you to contributors. Dmitry Pavlov, can you please help.

--Yakov

2018-07-12 18:54 GMT+03:00 vgrigorev :

> Hi colleagues!
>
> I would like to move the topic to a suitable place.
>
> Please just clarify how to do it:
> the page about creating an IEP
> (Ignite+Enhancement+Proposal?showChildren=false)
> contains no appropriate information.
>
> If you can do it, please do.
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


Re: Potential OOM while iterating over query cursor. Review needed.

2018-07-16 Thread Yakov Zhdanov
Dmitry, let's have only getNext(), the same as JDBC. All other shortcuts
seem to overload the API without adding much value.

--Yakov

2018-07-16 17:33 GMT+03:00 Dmitriy Setrakyan :

> Well, instead of getFirst(), I would have getNext(). This way we do not
> have to keep the first entry forever, which could present a problem in case
> the entry is too large.
>
> As far as initializing keepAll() to false - completely agree.
>
> D.
>
> On Mon, Jul 16, 2018 at 4:43 PM, Alexey Goncharuk <
> alexey.goncha...@gmail.com> wrote:
>
> > No objections from my side. Would be nice to receive some feedback from
> > other community members, though, because this is formally a breaking
> > change.
> >
> > пн, 16 июл. 2018 г. в 16:40, Yakov Zhdanov :
> >
> > > Guys, it seems we need to deprecate getAll() and remove it in 3.0. I
> > > think it is usable only for queries that return 1 row. Every other case
> > > needs iteration. So having getFirst() seems better. Thoughts?
> > >
> > > As for ScanQuery, I think we can properly initialize keepAll to false
> > > on scan query instantiation. I am pretty sure no one needs getAll() in
> > > scans. Alex?
> > >
> > > --
> > > Yakov
> > >
> >
>


Re: Potential OOM while iterating over query cursor. Review needed.

2018-07-16 Thread Yakov Zhdanov
Guys, it seems we need to deprecate getAll() and remove it in 3.0. I think
it is usable only for queries that return 1 row. Every other case needs
iteration. So having getFirst() seems better. Thoughts?

As for ScanQuery, I think we can properly initialize keepAll to false on
scan query instantiation. I am pretty sure no one needs getAll() in scans.
Alex?

--
Yakov
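The memory argument behind deprecating getAll() can be illustrated without Ignite at all. A plain-Java sketch (the lazy `Iterator` stands in for a query cursor): a getAll()-style call materializes every row into a list before anything is processed, while iterating lets each row become garbage as soon as it is consumed:

```java
import java.util.Iterator;
import java.util.stream.IntStream;

public class CursorIterationDemo {
    // Sums "rows" one at a time; only the current row is ever held in memory,
    // which is the behavior an iterator-based cursor API preserves.
    static long sumStreaming(Iterator<int[]> rows) {
        long sum = 0;
        while (rows.hasNext())
            sum += rows.next()[0];
        return sum;
    }

    public static void main(String[] args) {
        // A lazy "cursor" over a million one-field rows.
        Iterator<int[]> rows =
            IntStream.range(0, 1_000_000).mapToObj(i -> new int[] {1}).iterator();

        // A getAll()-style call would first collect all million rows into a
        // list (O(n) extra memory, a potential OOM); streaming computes the
        // same result with O(1) extra memory.
        System.out.println(sumStreaming(rows)); // prints 1000000
    }
}
```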


Re: Async cache groups rebalance not started with rebalanceOrder ZERO

2018-07-13 Thread Yakov Zhdanov
Maxim, I do not understand the problem. Imagine I do not have any ordering
but rebalancing of some cache fails to start - so in my understanding
overall rebalancing progress becomes blocked. Is that true?

Can you please provide a reproducer for your problem?

--Yakov

2018-07-09 16:42 GMT+03:00 Maxim Muzafarov :

> Hello Igniters,
>
> Each cache group has “rebalance order” property. As javadoc for
> getRebalanceOrder() says: “Note that cache with order {@code 0} does not
> participate in ordering. This means that cache with rebalance order {@code
> 0} will never wait for any other caches. All caches with order {@code 0}
> will be rebalanced right away concurrently with each other and ordered
> rebalance processes. If not set, cache order is 0, i.e. rebalancing is not
> ordered.”
>
> In fact, GridCachePartitionExchangeManager always builds the chain of
> rebalancing cache groups to start (even for cache order ZERO):
>
> ignite-sys-cache -> cacheR -> cacheR3 -> cacheR2 -> cacheR5 -> cacheR1.
>
> If one of these groups fails to start, further groups will never run.
>
> *Question 1*: Should we fix the javadoc description or create a bug for
> fixing such rebalance behavior?
>
> [1]
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCachePartitionExchangeManager.java#L2630
>
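For context, the property quoted above is set per cache configuration; a Spring XML sketch (the cache names are illustrative):

```xml
<property name="cacheConfiguration">
  <list>
    <bean class="org.apache.ignite.configuration.CacheConfiguration">
      <property name="name" value="referenceData"/>
      <!-- Lower order rebalances first; 0 (the default) means
           "not ordered" per the javadoc quoted above. -->
      <property name="rebalanceOrder" value="1"/>
    </bean>
    <bean class="org.apache.ignite.configuration.CacheConfiguration">
      <property name="name" value="operationalData"/>
      <property name="rebalanceOrder" value="2"/>
    </bean>
  </list>
</property>
```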


Re: Add cluster (de)activation events IGNITE-8376

2018-07-12 Thread Yakov Zhdanov
If events work when the grid is not active, and adding/removing listeners
is also possible, then I agree.

--Yakov


Re: Ignite guide for community developes

2018-07-12 Thread Yakov Zhdanov
Hi!

Can you please move your proposal to the Apache Ignite Wiki as a new IEP, so
that the community can discuss and comment? Eventually we will end up with a
plan for creating the guide. What do you think?

--Yakov

2018-07-12 12:24 GMT+03:00 vgrigorev :

> Many developers want to participate in Ignite development, but their
> attempts are typically unsuccessful due to the big barrier to starting
> development on such a complex project. During Moscow Ignite Meetup #3 we
> discussed with GridGain presenters that, to attract the community to the
> development process, creating a good guide could be crucial, and I want to
> propose creating such a guide. To make the proposal more concrete and
> substantial, I wrote a document with my vision of the basic organization,
> the problem aspects that stop developers, and a sample of such
> documentation from a really successful community. The file is attached
> (Proposal_for_Ignite_development_guide.docx). Please consider and discuss
> this proposal.
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Re: Add cluster (de)activation events IGNITE-8376

2018-07-12 Thread Yakov Zhdanov
Its counter event on_after_deactivated cannot be fired through the regular
event notification support. Therefore, it is better to gather such events in
LifecycleEventType.

--Yakov


Re: Clusterwide settings validation

2018-07-11 Thread Yakov Zhdanov
Ivan, yes. We can go with reflection through the configuration and SPIs,
and, you are right, the suppressed list should be manually defined.

--Yakov
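A reflection-based comparison of two configuration objects can be sketched in plain Java. Here `SampleConfig` is a hypothetical stand-in for `IgniteConfiguration` (which has far more getters), and the "suppressed" set is the manually defined list of intentionally node-local properties mentioned above:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class ConfigDiff {
    /** Stand-in for IgniteConfiguration: a couple of bean-style getters. */
    public static class SampleConfig {
        private final int rebalanceThreadPoolSize;
        private final String workDirectory;

        public SampleConfig(int poolSize, String workDir) {
            this.rebalanceThreadPoolSize = poolSize;
            this.workDirectory = workDir;
        }

        public int getRebalanceThreadPoolSize() { return rebalanceThreadPoolSize; }
        public String getWorkDirectory() { return workDirectory; }
    }

    /**
     * Compares all no-arg getters of two configs, skipping properties that
     * are legitimately node-local (the manually defined "suppressed" list).
     */
    public static List<String> diff(Object a, Object b, Set<String> suppressed) throws Exception {
        List<String> mismatches = new ArrayList<>();
        for (Method m : a.getClass().getMethods()) {
            if (!m.getName().startsWith("get") || m.getParameterCount() != 0
                || m.getDeclaringClass() == Object.class || suppressed.contains(m.getName()))
                continue;
            Object va = m.invoke(a), vb = m.invoke(b);
            if (!Objects.equals(va, vb))
                mismatches.add(m.getName() + ": " + va + " != " + vb);
        }
        return mismatches;
    }

    public static void main(String[] args) throws Exception {
        SampleConfig n1 = new SampleConfig(4, "/opt/ignite");
        SampleConfig n2 = new SampleConfig(8, "/opt/ignite");

        // Work directory may legitimately differ per host, so it is suppressed;
        // the thread pool size mismatch is reported.
        System.out.println(diff(n1, n2, Set.of("getWorkDirectory")));
        // prints [getRebalanceThreadPoolSize: 4 != 8]
    }
}
```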


Re: Clusterwide settings validation

2018-07-10 Thread Yakov Zhdanov
Ivan, I would think of some test that will randomly generate configs for
nodes and run some logic.

--Yakov


Re: Move CacheStore::loadCache to a separate interface

2018-07-09 Thread Yakov Zhdanov
Stan, feel free to file the ticket. Just make sure to add a detailed
description to it. Your suggestion seems to make sense.

--Yakov


Re: Add cluster (de)activation events IGNITE-8376

2018-07-09 Thread Yakov Zhdanov
> What is the difference between a lifecycle even and regular events?

Lifecycle events should be used when there is no opportunity for Ignite to
fire regular event, e.g. node stops or is not started yet. Please see
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/lifecycle/LifecycleEventType.html

--Yakov


Re: MVCC and IgniteDataStreamer

2018-07-09 Thread Yakov Zhdanov
Igor,

I can't say if I agree with any of the suggestions. I would like us to
start from answering the question - what is data streamer used for?

First of all, for initial data loading. This should be super fast mode
probably ignoring all transactional semantics, but providing certain
guarantees for data passed into streamer to be loaded.

Second, for continuously streaming updates to some tables (from more than
one streamer) and running some analytics over the data, probably with some
modifications from the non-streamer side (user transactions). This way
streamers should not roll back user txs or do any kind of unexpected
visibility tricks. I think we can think of a proper streamer tx at the batch
or key level.

The third case I see is a combination of the above - we stream portions of
data to an existing table, let's say once a day (which may be some market
data after closing or an offloaded operations data set), with or without any
other concurrent non-streamer operations. This mode may involve table locks
or do the same as the 2nd mode, which should be up to the user to decide.

So, the planned changes to the streamer should support at least these 3
scenarios. What do you think?

Igniters, feel free to share your thoughts on this. The question is pretty
important for us.

--Yakov


Re: Clusterwide settings validation

2018-07-06 Thread Yakov Zhdanov
Guys, I created ticket for config params validation -
https://issues.apache.org/jira/browse/IGNITE-8951. Feel free to comment.

Yakov Zhdanov
www.gridgain.com

2018-07-04 10:51 GMT+03:00 Andrew Medvedev :

> Hi Nikolay
>
> No, we have been beaten by
> https://issues.apache.org/jira/browse/IGNITE-8904?jql=text%20~%20%
> 22rebalanceThreadPoolSize%22
> it is not checked on start
>
> By utility I mean a check of
> org.apache.ignite.configuration.IgniteConfiguration and its children
>
> On Wed, Jul 4, 2018 at 10:36 AM, Nikolay Izhikov 
> wrote:
> > Hello, Andrew.
> >
> > Can you clarify your question?
> >
> > What checks do you mean, exactly?
> > Do you mean internal Ignite checks or user-provided checks?
> >
> > Ignite checks configuration consistency on node start [1].
> >
> > Ignite does have a consistency check for a joining node. Take a look at [2]
> > and all of its children.
> >
> > [1] https://github.com/apache/ignite/blob/master/modules/
> core/src/main/java/org/apache/ignite/internal/IgniteKernal.java#L825
> > [2] https://github.com/apache/ignite/blob/master/modules/
> core/src/main/java/org/apache/ignite/internal/GridComponent.java#L153
> >
> > On Wed, 04/07/2018 at 08:58 +0300, Andrew Medvedev wrote:
> >> Hello everybody
> >>
> >> Our company has lots of nodes in the cluster, and we have seen some
> >> problems with inconsistent settings on nodes clusterwide. To help us
> >> with this, we made a utility to check consistency of settings on a
> >> running cluster, but it is a hack; a better way seems to be settings
> >> validation by each node itself on start/joining topology/etc.
> >>
> >> 1) Is this needed?
> >> 2) Have the implementation details been discussed somewhere?
> >>
> >> Cheers
>


[jira] [Created] (IGNITE-8951) Need to validate nodes configuration across cluster and warn on different parameters value

2018-07-06 Thread Yakov Zhdanov (JIRA)
Yakov Zhdanov created IGNITE-8951:
-

 Summary: Need to validate nodes configuration across cluster and 
warn on different parameters value
 Key: IGNITE-8951
 URL: https://issues.apache.org/jira/browse/IGNITE-8951
 Project: Ignite
  Issue Type: Task
Reporter: Yakov Zhdanov


On node start, the node should output in its logs the list of parameters having
values different from values on remote nodes. This should be skipped for
parameters that are always different, e.g. host name, node ID or IP; however,
there should be an option to include parameters from the default ignore list as
well.

Another requirement is that the intended output may be fully suppressed by
setting the system property -DIGNITE_SKIP_CONFIGURATION_CONSISTENCY_CHECK=true

It seems that the implementation approach should be similar to performance 
suggestions Ignite currently has.

The output may be as following
{noformat}
Local node has different configuration compared to remote nodes for the
following parameters (fix if possible). To disable, set
-DIGNITE_SKIP_CONFIGURATION_CONSISTENCY_CHECK=true
  ^-- rebalanceThreadPoolSize [locNodeVal=4, rmtNodeId=X1X1, rmtNodeVal=8]
  ^-- commSpi.selectorsCnt [locNodeVal=2, rmtNodeId=Y1Y1, rmtNodeVal=4]
  ^-- commSpi.selectorsCnt [locNodeVal=2, rmtNodeId=Z1Z1, rmtNodeVal=8]
{noformat}

All components should add messages to {{cfgConsistencyRegister}} on startup and 
then all differences should be output in one step.

If node aborts startup due to any problem differences collected so far should 
be output to logs.

If there is more than one value for some config parameter among remote nodes,
then all distinct options should be output (see {{commSpi.selectorsCnt}} in the
example above).
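The reporting step could be sketched roughly like this. Names such as `CfgConsistency` and the map-based parameter model are illustrative assumptions, not actual Ignite internals; the sketch just shows producing one "^--" line per mismatch, including distinct values from different remote nodes:

```java
import java.util.*;

// For each locally registered parameter, compare against each remote node's
// value and format one report line per mismatch, mirroring the example output.
public class CfgConsistency {
    static List<String> report(Map<String, String> loc, Map<String, Map<String, String>> remote) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : loc.entrySet()) {
            for (Map.Entry<String, Map<String, String>> n : remote.entrySet()) {
                String rmtVal = n.getValue().get(e.getKey());
                if (rmtVal != null && !rmtVal.equals(e.getValue()))
                    lines.add("  ^-- " + e.getKey() + " [locNodeVal=" + e.getValue()
                        + ", rmtNodeId=" + n.getKey() + ", rmtNodeVal=" + rmtVal + "]");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> loc = Map.of("rebalanceThreadPoolSize", "4", "commSpi.selectorsCnt", "2");
        Map<String, Map<String, String>> rmt = new LinkedHashMap<>();
        rmt.put("X1X1", Map.of("rebalanceThreadPoolSize", "8", "commSpi.selectorsCnt", "2"));
        rmt.put("Y1Y1", Map.of("commSpi.selectorsCnt", "4"));
        report(loc, rmt).forEach(System.out::println);
    }
}
```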



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Things To Do Before You Turn 30^W^WRelease 3.0

2018-06-08 Thread Yakov Zhdanov
We can start with separate discussion on dev list. Or can you point to
existing one? I think I need some details here.

--Yakov


Re: Things To Do Before You Turn 30^W^WRelease 3.0

2018-06-08 Thread Yakov Zhdanov
Ilya,

In my view putting @Deprecated is enough for now for things that should be
removed. When we will come closer to 3.0 release we will look through all
deprecated stuff and remove it. This applies to
IGNITE_BINARY_SORT_OBJECT_FIELDS.
Can you please annotate it?

As far as list of breaking changes we may want to do in 3.0, I think we
need to file tickets for 3.0 and label them as "breaking change" to reflect
in release notes those that will be implemented.

Thoughts?

--Yakov

2018-06-07 19:36 GMT+03:00 Ilya Kasnacheev :

> Hello!
>
> Do we have an official subj list? Such as Wiki page or JIRA label?
>
> Cause if we don't, we'll surely forget a lot of things and will have to
> wait for 4.0.
> We already did that with IGNITE_BINARY_SORT_OBJECT_FIELDS in 2.0 :(
>
> I expect to have in this list breaking changes, and @Deprecated stuff to be
> finally removed, and maybe unmaintained modules to be put to rest.
>
> Regards,
> --
> Ilya Kasnacheev
>


Re: WAL iterator unexpected behavior

2018-05-30 Thread Yakov Zhdanov
This is for offline WAL analysis, so skipping the record with a proper message
is also a solution. If possible, the iterator should output a suggestion on
what is missing in the classpath. An option to suppress warnings should also
be present.

Makes sense?

And final question - did we look at similar utilities from other vendors?

--Yakov


Re: Thin clients connections management.

2018-05-30 Thread Yakov Zhdanov
Andrey, your suggestions look good to me. Let maintainer review your patch.

--Yakov


Re: async operation is not fair async

2018-05-24 Thread Yakov Zhdanov
Alexey Goncharuk, I remember we started working on async connection
establishment. This should fix the latency issue related to the network, which
I believe contributes the most to overall latency. Mapping logic and
other stuff can be ignored, as it can very rarely be an issue, at least on
stable topologies. What is the status with async connections? That would
really be a huge improvement!

Also please remember that uncontrolled async operations may lead to OOME;
therefore, at some point, when there are too many uncompleted async
operations, newly invoked async operations should become synchronous, i.e.
we should return a completed future, ignoring the fact that the user expected
us to be async.
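A minimal sketch of this backpressure idea, assuming a semaphore cap on in-flight operations (the names are illustrative, not Ignite API): once the cap is hit the caller blocks, so the "async" call effectively degrades to synchronous.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Caps the number of in-flight async operations with a semaphore; acquire
// blocks the caller when too many operations are still pending.
public class AsyncBackpressure {
    private final Semaphore inFlight;
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    final AtomicInteger completed = new AtomicInteger();

    AsyncBackpressure(int maxInFlight) { inFlight = new Semaphore(maxInFlight); }

    CompletableFuture<Void> putAsync(Runnable op) {
        inFlight.acquireUninterruptibly(); // blocks when the cap is reached
        return CompletableFuture.runAsync(op, pool)
            .whenComplete((r, e) -> { inFlight.release(); completed.incrementAndGet(); });
    }

    void shutdown() { pool.shutdown(); }

    public static void main(String[] args) {
        AsyncBackpressure s = new AsyncBackpressure(8);
        CompletableFuture<?>[] futs = new CompletableFuture<?>[100];
        for (int i = 0; i < 100; i++)
            futs[i] = s.putAsync(() -> { try { Thread.sleep(1); } catch (InterruptedException ignored) {} });
        CompletableFuture.allOf(futs).join();
        s.shutdown();
        System.out.println("completed=" + s.completed.get());
    }
}
```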

I would like to have very strong reasons to start reapproaching this.

--Yakov


Re: Postpone Apache Ignite 2.5 release to fix baseline topology

2018-04-28 Thread Yakov Zhdanov
Guys, how about we release 2.5 in the near future after adding proper
usability log messages that explain to the user what to do, and also output a
link to readme.io with the first BLT-related message during node uptime.
This should not take much time, and we can reuse the same messages when we
have (BL)AT modes in 2.6. I think that adding the messages makes sense and
will make things clear for users, which is not the case for 2.4.

--Yakov


Re: cache size() calculation for MVCC

2018-04-25 Thread Yakov Zhdanov
Guys,

How do we update the counter right now?

If we move to fair thread-per-partition, we can update the counter only when
we add a new key, and skip the update when we add or remove a version. Thoughts?
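The idea can be illustrated with a toy version map (hypothetical names, not Ignite internals): the size counter moves only when the first version of a key appears or the last version is purged, so adding or removing intermediate versions leaves it untouched.

```java
import java.util.*;

// Counts keys, not versions: size changes only on first-version insert
// and last-version removal.
public class MvccSizeCounter {
    private final Map<String, List<Long>> versions = new HashMap<>();
    private long size;

    void addVersion(String key, long ver) {
        List<Long> vs = versions.computeIfAbsent(key, k -> new ArrayList<>());
        if (vs.isEmpty()) size++; // new key: counter changes
        vs.add(ver);              // extra version: counter unchanged
    }

    void removeVersion(String key, long ver) {
        List<Long> vs = versions.get(key);
        if (vs == null) return;
        vs.remove(ver); // removes the boxed Long value, not an index
        if (vs.isEmpty()) { versions.remove(key); size--; } // last version gone
    }

    long size() { return size; }
}
```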

--Yakov

2018-04-25 12:07 GMT+03:00 Vladimir Ozerov :

> This is an interesting question. A full-scan size may be a tremendously slow
> operation on large data sets. On the other hand, printing the total number of
> tuples including old and aborted versions makes little to no sense as well.
> Looks like we need to choose lesser of two evils. What if we do the
> following:
> 1) Left default behavior as is - O(1) complexity, but includes invalid
> versions
> 2) As Sergey suggested, add new peek mode "MVCC_ALIVE_ONLY" which will
> perform full scan.
>
> Alternatively we may throw an "UnsupportedOperationException" from this
> method - why not?
>
> Thoughts?
>
> On Tue, Apr 24, 2018 at 4:28 PM, Sergey Kalashnikov  >
> wrote:
>
> > Hi Igniters,
> >
> > I need your advice on a task at hand.
> >
> > Currently cache API size() is a constant time operation, since the
> > number of entries is maintained as a separate counter.
> > However, for MVCC-enabled cache there can be multiple versions of the
> > same entry.
> > In order to calculate the size we need to obtain a MVCC snapshot and
> > then iterate over data pages filtering invisible versions.
> > So, it is impossible to keep the same complexity guarantees.
> >
> > My current implementation internally switches to "full-scan" approach
> > if cache in question is a MVCC-enabled cache.
> > It happens unbeknown to users, which may expect lightning-fast
> > response as before.
> > Perhaps, we might add a new constant to CachePeekMode enumeration that
> > is passed to cache size() to make it explicit?
> >
> > The second concern is that cache size calculation is also included
> > into Cache Metrics API and Visor functionality.
> > Will it be OK for metrics and things alike to keep returning raw
> > unfiltered number of entries?
> > Is there any sense in showing a raw unfiltered number of entries which
> > may vary greatly from invocation to invocation with just simple
> > updates running in the background?
> >
> > Please share your thoughts.
> >
> > Thanks in advance.
> > --
> > Sergey
> >
>


Re: Orphaned, duplicate, and main-class tests!

2018-04-23 Thread Yakov Zhdanov
Alexey Goncharuk, Vladimir Ozerov, what do you think about these tests?

I believe they were created as a part of various optimization and profiling
activities. I also think we can remove them, since nobody has cared about them
for too long.

Thoughts?

Yakov Zhdanov

ср, 18 апр. 2018 г., 16:42 Ilya Kasnacheev <ilya.kasnach...@gmail.com>:

> Hello!
>
> I've decided to return to this task after a break.
>
> Can you please tell me why do we have main-class tests? Such as
>
> GridBasicPerformanceTest.class,
> GridBenchmarkCacheGetLoadTest.class,
> GridBoundedConcurrentLinkedHashSetLoadTest.class,
> GridCacheDataStructuresLoadTest.class,
> GridCacheReplicatedPreloadUndeploysTest.class,
> GridCacheLoadTest.class,
> GridCacheMultiNodeDataStructureTest.class,
> GridCapacityLoadTest.class,
> GridContinuousOperationsLoadTest.class,
> GridFactoryVmShutdownTest.class,
> GridFutureListenPerformanceTest.class,
> GridFutureQueueTest.class,
> GridGcTimeoutTest.class,
> GridJobExecutionSingleNodeLoadTest.class,
> GridJobExecutionSingleNodeSemaphoreLoadTest.class,
> GridJobLoadTest.class,
> GridMergeSortLoadTest.class,
> GridNioBenchmarkTest.class,
> GridThreadPriorityTest.class,
> GridSystemCurrentTimeMillisTest.class,
> BlockingQueueTest.class,
> MultipleFileIOTest.class,
> GridSingleExecutionTest.class
>
>
> If nobody wants them, how about we delete them in master branch? Start
> afresh?
>
> --
> Ilya Kasnacheev
>
> 2018-02-13 17:02 GMT+03:00 Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
>
> > Anton,
> >
> > >Tests should be attached to appropriate suites
> >
> > This I can do
> >
> > > and muted if necessary, Issues should be created on each mute.
> >
> > This is roughly a week of work. I can't spare that right now. I doubt
> > anyone can.
> >
> > Can we approach this by smaller steps?
> >
> > --
> > Ilya Kasnacheev
> >
> > 2018-02-06 19:55 GMT+03:00 Anton Vinogradov <avinogra...@gridgain.com>:
> >
> >> Val,
> >>
> >> Tests should be attached to appropriate suites and muted if necessary,
> >> Issues should be created on each mute.
> >>
> >> On Tue, Feb 6, 2018 at 7:23 PM, Valentin Kulichenko <
> >> valentin.kuliche...@gmail.com> wrote:
> >>
> >> > Anton,
> >> >
> >> > I tend to agree with Ilya that identifying and fixing all the possible
> >> > broken tests in one go is not feasible. What is the proper way in your
> >> > view? What are you suggesting?
> >> >
> >> > -Val
> >> >
> >> > On Mon, Feb 5, 2018 at 2:18 AM, Anton Vinogradov <
> >> avinogra...@gridgain.com
> >> > >
> >> > wrote:
> >> >
> >> > > Ilya,
> >> > >
> >> > > 1) Still see no reason for such changes. Does this break something?
> >> > >
> >> > > 2) Looks like you're trying to add Trash*TestSuite.java which will
> >> never
> >> > be
> >> > > refactored.
> >> > > We should do everything in proper way now, not sometime.
> >> > >
> >> > > 3) Your comments looks odd to me.
> >> > > Issue should be resolved in proper way.
> >> > >
> >> > > On Mon, Feb 5, 2018 at 1:07 PM, Ilya Kasnacheev <
> >> > ilya.kasnach...@gmail.com
> >> > > >
> >> > > wrote:
> >> > >
> >> > > > Anton,
> >> > > >
> >> > > > 1) We already have ~100 files named "*AbstractTest.java". Renaming
> >> > these
> >> > > > several files will help checking for orphaned tests in the future,
> >> as
> >> > > well
> >> > > > as increasing code base consistency.
> >> > > >
> >> > > > 2) This is huge work that is not doable by any single developer.
> >> While
> >> > > > IgniteLostAndFoundTestSuite can be slowly refactored away
> >> > > > This is unless you are OK with putting all these tests, most of
> >> which
> >> > are
> >> > > > red and some are hanging, in production test suites and therefore
> >> > > breaking
> >> > > > productivity for a couple months while this gets sorted.
> >> > > > Are you OK with that? Anybody else?
> >> > > >

Re: Topology-wide notification on critical errors

2018-04-20 Thread Yakov Zhdanov
Of course, no guarantees, but at least an effort.

--Yakov


Topology-wide notification on critical errors

2018-04-19 Thread Yakov Zhdanov
Guys,

We have activity to implement a set of mechanisms to handle critical issues
on nodes (IEP-14 - [1]).

I have an idea to spread a message about critical issues through the
entire topology and put it into the logs of all nodes. In my view this will add
much more clarity. Imagine all nodes outputting a message to the log - "Critical
system thread failed on node XXX [details=...]". This should help a lot
with investigations.

Andrey Gura, Alex Goncharuk what do you think?

--Yakov

[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling


Re: Deprecate CacheRebalanceMode.NONE

2018-04-17 Thread Yakov Zhdanov
+1 here

Always wanted to remove ForceKeysRequest =)

--Yakov


Re: Rebalancing - how to make it faster

2018-04-09 Thread Yakov Zhdanov
How about skipWalOnRebalancing or disableWalOnRebalancing? I like the first
one better and both are shorter.

--Yakov

2018-04-09 19:55 GMT+03:00 Denis Magda :

> I would enable this option only after we confirm it's stable. Until it
> happens it should be treated as an experimental feature that is turned on
> manually with a parameter like Ilya is suggesting.
>
> Ilya, the name sounds good to me.
>
> --
> Denis
>
> On Mon, Apr 9, 2018 at 9:16 AM, Anton Vinogradov  wrote:
>
> > Ilya,
> >
> > WAL should be automatically disabled at initial rebalancing.
> > It can't be disabled at regular rebalancing, since you are never ready to
> > lose data you protected by the WAL.
> > Is there any need to have a special param in that case?
> >
> > 2018-04-09 16:41 GMT+03:00 Ilya Lantukh :
> >
> > > Igniters,
> > >
> > > I am currently at the finish line of
> > > https://issues.apache.org/jira/browse/IGNITE-8017 ("Disable WAL during
> > > initial preloading") implementation. And I need that such behavior
> should
> > > be configurable. In my intermediate implementation I have parameter
> > called
> > > "disableWalDuringRebalancing" in IgniteConfiguration. Do you think such a
> > > name is meaningful and self-explanatory? Do we need to ensure that it
> has
> > > the same value on every node? Should I make it configurable per cache
> > > rather than globally?
> > >
> > > Please share your thoughts.
> > >
> > > On Mon, Apr 9, 2018 at 4:32 PM, Ilya Lantukh 
> > > wrote:
> > >
> > > > Denis,
> > > >
> > > > Those ticket are rather complex, and so I don't know when I'll be
> able
> > to
> > > > start working on them.
> > > >
> > > > On Fri, Mar 30, 2018 at 11:45 PM, Denis Magda 
> > wrote:
> > > >
> > > >> Ilya,
> > > >>
> > > >> Just came across the IEP put together by you:
> > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-16%
> > > >> 3A+Optimization+of+rebalancing
> > > >>
> > > >> Excellent explanation, thanks for aggregating everything there.
> > > >>
> > > >> Two tickets below don't have a fixed version assigned:
> > > >> https://issues.apache.org/jira/browse/IGNITE-8020
> > > >> https://issues.apache.org/jira/browse/IGNITE-7935
> > > >>
> > > >> Do you plan to work on them in 2.6 time frame, right?
> > > >>
> > > >> --
> > > >> Denis
> > > >>
> > > >> On Tue, Mar 27, 2018 at 9:29 AM, Denis Magda 
> > wrote:
> > > >>
> > > >> > Ilya, granted you all the required permissions. Please let me know
> > if
> > > >> you
> > > >> > still have troubles with the wiki.
> > > >> >
> > > >> > --
> > > >> > Denis
> > > >> >
> > > >> > On Tue, Mar 27, 2018 at 8:56 AM, Ilya Lantukh <
> > ilant...@gridgain.com>
> > > >> > wrote:
> > > >> >
> > > >> >> Unfortunately, I don't have permission to create page for IEP on
> > > wiki.
> > > >> >> Denis, can you grant it? My username is ilantukh.
> > > >> >>
> > > >> >> On Mon, Mar 26, 2018 at 8:04 PM, Anton Vinogradov  >
> > > >> wrote:
> > > >> >>
> > > >> >> > >> It is impossible to disable WAL only for certain partitions
> > > >> without
> > > >> >> > >> completely overhauling design of Ignite storage mechanism.
> > Right
> > > >> now
> > > >> >> we
> > > >> >> > can
> > > >> >> > >> afford only to change WAL mode per cache group.
> > > >> >> >
> > > >> >> > Cache group rebalancing is a one cache rebalancing, and then
> this
> > > >> cache
> > > >> >> > ("cache group") can be presented as a set of virtual caches.
> > > >> >> > So, there is no issues for initial rebalancing.
> > > >> >> > Lets disable WAL on initial rebalancing.
> > > >> >> >
> > > >> >> > 2018-03-26 16:46 GMT+03:00 Ilya Lantukh  >:
> > > >> >> >
> > > >> >> > > Dmitry,
> > > >> >> > > It is impossible to disable WAL only for certain partitions
> > > without
> > > >> >> > > completely overhauling design of Ignite storage mechanism.
> > Right
> > > >> now
> > > >> >> we
> > > >> >> > can
> > > >> >> > > afford only to change WAL mode per cache group.
> > > >> >> > >
> > > >> >> > > The idea is to disable WAL when node doesn't have any
> partition
> > > in
> > > >> >> OWNING
> > > >> >> > > state, which means it doesn't have any consistent data and
> > won't
> > > be
> > > >> >> able
> > > >> >> > to
> > > >> >> > > restore from WAL anyway. I don't see any potential use for
> WAL
> > on
> > > >> such
> > > >> >> > > node, but we can keep a configurable parameter indicating can
> > we
> > > >> >> > > automatically disable WAL in such case or not.
> > > >> >> > >
> > > >> >> > > On Fri, Mar 23, 2018 at 10:40 PM, Dmitry Pavlov <
> > > >> >> dpavlov@gmail.com>
> > > >> >> > > wrote:
> > > >> >> > >
> > > >> >> > > > Denis, as I understood, there is and idea to exclude only
> > > >> rebalanced
> > > >> >> > > > partition(s) data. All other data will go to the WAL.
> > > >> >> > > >
> > > >> >> > > > Ilya, please correct me if I'm wrong.
> > > >> >> > > >
> > > >> >> 

Re: Thin clients release cycle

2018-04-09 Thread Yakov Zhdanov
Guys, has anybody checked with INFRA if we can have module structure? Denis?

--Yakov


Re: Atomic caches

2018-04-07 Thread Yakov Zhdanov
Dmitry, I think Anton meant AtomicConfiguration, not atomic caches.
However, I would make sure we validate all configuration parameters.

Anton, can you please share junit test that shows the problem?

Yakov Zhdanov

сб, 7 апр. 2018 г., 6:12 Dmitriy Setrakyan <dsetrak...@apache.org>:

> I would say absolutely YES - we need to have configuration validation.
>
> Igniters, why was the validation skipped in atomic caches?
>
> D.
>
> On Fri, Apr 6, 2018 at 1:43 PM, akurbanov <antkr@gmail.com> wrote:
>
> > Hello Igniters,
> >
> > I want to address a question on AtomicConfiguration validation. I've
> tested
> > in ignite-1.8 branch that it is impossible to start two nodes with
> > different
> > AtomicConfiguration parameters e.g. different cache modes or numbers of
> > backups are provided.
> > JIRA link: https://issues.apache.org/jira/browse/IGNITE-2096
> >
> > In ignite-2.4 AtomicConfiguration validation is completely skipped on
> node
> > startup and this issue is non-reproducible. Node with alternative
> > configuration successfully joins, but this configuration is being
> > completely
> > ignored, all created atomics will reference the same initial
> configuration
> > and belong to the same cache "ignite-sys-atomic-cache@default-ds-group",
> > even if configuration is provided in constructor.
> >
> > Do we need any kind of validation for this configuration?
> > Would it be correct to use the same approach for atomic types instances
> > caching as used for IgniteQueue/IgniteSet, cache for each unique
> > configuration?
> >
> > Best regards,
> > Anton Kurbanov
> >
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >
>


Re: Upgrade from 2.1.0 to 2.4.0 resulting in error within transaction block

2018-04-02 Thread Yakov Zhdanov
Cross posting to dev.

Vladimir Ozerov, can you please take a look at NPE from query processor
(see below - GridQueryProcessor.typeByValue(GridQueryProcessor.java:1901))?

--Yakov

2018-03-29 0:19 GMT+03:00 smurphy :

> Code works in Ignite 2.1.0. Upgrading to 2.4.0 produces the stack trace
> below. The delete statement that is causing the error is:
>
> SqlFieldsQuery sqlQuery = new SqlFieldsQuery("delete from EngineFragment
> where " + criteria());
> fragmentCache.query(sqlQuery.setArgs(criteria.getArgs()));
>
> The code above is called from within a transactional block managed by a
> PlatformTransactionManager which is in turn managed by Spring's
> ChainedTransactionManager. If the @Transactional annotation is removed from
> the surrounding code, then the code works ok...
>
> 2018-03-28 15:50:05,748 WARN  [engine 127.0.0.1] progress_monitor_2 unknown
> unknown {ProgressMonitorImpl.java:112} - Scan
> [ec7af5e8-a773-40fd-9722-f81103de73dc] is unable to process!
> javax.cache.CacheException: Failed to process key '247002'
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(
> IgniteCacheProxyImpl.java:618)
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(
> IgniteCacheProxyImpl.java:557)
> at
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.
> query(GatewayProtectedCacheProxy.java:382)
> at
> com.company.core.dao.ignite.IgniteFragmentDao.delete(
> IgniteFragmentDao.java:143)
> at
> com.company.core.dao.ignite.IgniteFragmentDao$$FastClassBySpringCGLIB$$
> c520aa1b.invoke()
> at org.springframework.cglib.proxy.MethodProxy.invoke(
> MethodProxy.java:204)
> at
> org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.
> invokeJoinpoint(CglibAopProxy.java:720)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(
> ReflectiveMethodInvocation.java:157)
> at
> org.springframework.dao.support.PersistenceExceptionTranslatio
> nInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:136)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(
> ReflectiveMethodInvocation.java:179)
> at
> org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.
> intercept(CglibAopProxy.java:655)
> at
> com.company.core.dao.ignite.IgniteFragmentDao$$EnhancerBySpringCGLIB$$
> ce60f71c.delete()
> at
> com.company.core.core.service.impl.InternalScanServiceImpl.
> purgeScanFromGrid(InternalScanServiceImpl.java:455)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
> 62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection
> (AopUtils.java:302)
> at
> org.springframework.aop.framework.JdkDynamicAopProxy.
> invoke(JdkDynamicAopProxy.java:202)
> at com.sun.proxy.$Proxy417.purgeScanFromGrid(Unknown Source)
> at com.company.core.core.async.tasks.PurgeTask.process(
> PurgeTask.java:85)
> at sun.reflect.GeneratedMethodAccessor197.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection
> (AopUtils.java:302)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.
> invokeJoinpoint(ReflectiveMethodInvocation.java:190)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(
> ReflectiveMethodInvocation.java:157)
> at
> org.springframework.transaction.interceptor.TransactionInterceptor$1.
> proceedWithInvocation(TransactionInterceptor.java:99)
> at
> org.springframework.transaction.interceptor.TransactionAspectSupport.
> invokeWithinTransaction(TransactionAspectSupport.java:281)
> at
> org.springframework.transaction.interceptor.TransactionInterceptor.invoke(
> TransactionInterceptor.java:96)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(
> ReflectiveMethodInvocation.java:179)
> at
> org.springframework.aop.framework.JdkDynamicAopProxy.
> invoke(JdkDynamicAopProxy.java:208)
> at com.sun.proxy.$Proxy418.process(Unknown Source)
> at
> com.company.core.core.async.impl.ProgressMonitorImpl._
> runTasks(ProgressMonitorImpl.java:128)
> at
> com.company.core.core.async.impl.ProgressMonitorImpl.lambda$null$0(
> ProgressMonitorImpl.java:98)
> at java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:511)
> at
> 

Re: IEP-14: Ignite failures handling (Discussion)

2018-03-23 Thread Yakov Zhdanov
Andrey, I understand your point but you are trying to build one more
mechanism and introduce abstractions that are already here. Again, please
take a look at segmentation policy and event types we already have.

Thanks!

Yakov


Re: IEP-14: Ignite failures handling (Discussion)

2018-03-20 Thread Yakov Zhdanov
If Java hits an OOME then you cannot guarantee anything, including calling
Runtime.halt().

My point is about consistent approach throughout the project. I think
developing new mechanism with separate interface is incorrect.

Yakov


Re: (Partition Map) Exchange at wiki

2018-03-19 Thread Yakov Zhdanov
Awesome article, Dmitry!

Denis Magda, should we put a link to it from apacheignite.readme.io?

--Yakov


Re: IEP-14: Ignite failures handling (Discussion)

2018-03-19 Thread Yakov Zhdanov
Andrey Gura,

Why should we have any FailureHandler abstraction? We already have it -
this is EventListener. In my view it is better (and a cleaner design) to add
events (similar to, for
example, org.apache.ignite.events.EventType#EVT_NODE_SEGMENTED) like
EVT_IGNITE_OOME, EVT_SYS_WORKER_FAILED and fire events according to the
situation + execute configured system logic. We do exactly the same with
segmentation: we have a policy which defines how the system reacts, and we also
allow the user to add event listeners.

For better understanding please take a look
at org.apache.ignite.plugin.segmentation.SegmentationPolicy
and 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.DiscoveryWorker#onSegmentation.
Discovery manager records the event (allowing user to get notification on
it) and executes internal logic in case segmentation policy is not NOOP.
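A minimal sketch of that pattern - the event constants `EVT_IGNITE_OOME`/`EVT_SYS_WORKER_FAILED` are from this proposal and the rest of the names are illustrative, not an existing API: record the event so user listeners are notified first, then execute the configured policy unless it is NOOP.

```java
import java.util.*;
import java.util.function.Consumer;

// Segmentation-style failure handling: notify user listeners of the event,
// then apply the configured policy.
public class FailureEvents {
    static final int EVT_IGNITE_OOME = 1001;
    static final int EVT_SYS_WORKER_FAILED = 1002;

    enum Policy { NOOP, STOP, RESTART }

    private final Map<Integer, List<Consumer<String>>> listeners = new HashMap<>();
    private final Policy policy;
    String lastAction = "none"; // records which internal action was taken

    FailureEvents(Policy policy) { this.policy = policy; }

    void listen(int type, Consumer<String> l) {
        listeners.computeIfAbsent(type, t -> new ArrayList<>()).add(l);
    }

    void onFailure(int type, String msg) {
        // 1. Record the event so user listeners get the notification.
        listeners.getOrDefault(type, List.of()).forEach(l -> l.accept(msg));
        // 2. Execute internal logic unless the policy is NOOP.
        if (policy != Policy.NOOP) lastAction = policy.name();
    }
}
```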

Thanks!

--Yakov


Re: MTCGA: Tests of the week

2018-03-06 Thread Yakov Zhdanov
Great progress, guys! Keep going!

--Yakov


Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts

2018-03-02 Thread Yakov Zhdanov
Alexey, generally I agree. However, I don't understand what exactly you
suggest. Can you please list the parameters you want to deprecate (1), the
internal logic changes (2), and the updates to the javadocs/descriptions of
the params you want to keep (3)?

--Yakov


Re: WAL Archive Issue

2018-02-13 Thread Yakov Zhdanov
Ivan,

I do not want to create new files. As far as I know, we now copy segments
to the archive dir before they get checkpointed. What I suggest is to copy
them to a temp dir under the WAL directory and then move them to the archive.
In my understanding, by the time we copy the files to the temp folder, all
changes to them are already fsynced.

Correct?
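A rough sketch of the copy-then-move sequence with `java.nio.file` (paths and names are illustrative): the final step is a same-filesystem rename, so the archive never observes a half-written segment.

```java
import java.io.IOException;
import java.nio.file.*;

// Copy the already-fsynced segment into a temp dir under the WAL directory,
// then atomically move it into the archive.
public class WalArchiver {
    static Path archiveSegment(Path segment, Path walDir, Path archiveDir) throws IOException {
        Path tmpDir = Files.createDirectories(walDir.resolve("archive-tmp"));
        Path tmp = tmpDir.resolve(segment.getFileName());
        Files.copy(segment, tmp, StandardCopyOption.REPLACE_EXISTING); // full copy first
        Path dst = archiveDir.resolve(segment.getFileName());
        // Atomic rename where supported; a real implementation would need a fallback.
        return Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path walDir = Files.createTempDirectory("wal");
        Path archive = Files.createDirectories(walDir.resolve("archive"));
        Path seg = Files.write(walDir.resolve("0001.wal"), new byte[]{1, 2, 3});
        Path archived = archiveSegment(seg, walDir, archive);
        System.out.println(Files.exists(archived)); // true
    }
}
```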

Yakov Zhdanov,
www.gridgain.com

2018-02-13 21:29 GMT+03:00 Ivan Rakov <ivan.glu...@gmail.com>:

> Yakov,
>
> I see only one problem with your suggestion - the number of
> "uncheckpointed" segments is potentially unlimited.
> Right now we have a limited number (10) of file segments with immutable
> names in the WAL "work" directory. We have to keep this approach due to a
> known bug in XFS - fsync time is nearly twice as long for recently created files.
>
> Best Regards,
> Ivan Rakov
>
>
> On 13.02.2018 21:22, Yakov Zhdanov wrote:
>
>> I meant we still will be copying segment once and then will be moving it
>> to
>> archive which should not affect file system much.
>>
>> Thoughts?
>>
>> --Yakov
>>
>> 2018-02-13 21:19 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
>>
>> Alex,
>>>
>>> I remember we had some confusing behavior for WAL archive when archived
>>> segments were required for successful recovery.
>>>
>>> Is issue still present?
>>>
>>> If yes, what if we copy "uncheckpointed" segments to a directory under
>>> wal
>>> directory and then move the segments to archive after checkpoint? Will
>>> this
>>> work?
>>>
>>> Thanks!
>>>
>>> --Yakov
>>>
>>>
>


Re: WAL Archive Issue

2018-02-13 Thread Yakov Zhdanov
I meant we will still be copying the segment once and then moving it to the
archive, which should not affect the file system much.

Thoughts?

--Yakov

2018-02-13 21:19 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:

> Alex,
>
> I remember we had some confusing behavior for WAL archive when archived
> segments were required for successful recovery.
>
> Is issue still present?
>
> If yes, what if we copy "uncheckpointed" segments to a directory under wal
> directory and then move the segments to archive after checkpoint? Will this
> work?
>
> Thanks!
>
> --Yakov
>


WAL Archive Issue

2018-02-13 Thread Yakov Zhdanov
Alex,

I remember we had some confusing behavior for WAL archive when archived
segments were required for successful recovery.

Is issue still present?

If yes, what if we copy "uncheckpointed" segments to a directory under wal
directory and then move the segments to archive after checkpoint? Will this
work?

Thanks!

--Yakov


Re: Looks like a bug in ServerImpl.joinTopology()

2018-02-13 Thread Yakov Zhdanov
Alex, you can alter ServerImpl and insert a latch or Thread.sleep(xxx)
anywhere you like to demonstrate the incorrect behavior you describe.

--Yakov


Re: Should we annotate @CacheLocalStore as @Depricated?

2018-01-29 Thread Yakov Zhdanov
+1 for deprecation



--Yakov

2018-01-30 1:06 GMT+03:00 Valentin Kulichenko :

> +1
>
> On Mon, Jan 29, 2018 at 8:31 AM, Andrey Mashenkov <
> andrey.mashen...@gmail.com> wrote:
>
> > Vyacheslav,
> >
> > +1 for dropping @CacheLocalStore.
> > Ignite have no support 2-phase commit for store and public API provides
> no
> > methods to users can easily implement it by themselves.
> >
> >
> >
> >
> > On Mon, Jan 29, 2018 at 7:11 PM, Vyacheslav Daradur  >
> > wrote:
> >
> > > Hi Igniters,
> > >
> > > I've worked with Apache Ignite 3rd Party Persistent Storage tools
> > recently.
> > >
> > > I found that use of CacheLocalStore annotation has hidden issues, for
> > > example:
> > > * rebalancing issues [1]
> > > * possible data consistency issues [1]
> > > * handling of CacheLocalStore on clients nodes [2]
> > >
> > > Valentin K. considers it necessary to make @CacheLocalStore deprecated
> > > and remove. If we want to have a decentralized persistent storage we
> > > should use Apache Ignite Native Persistence.
> > >
> > > If the community supports this decision I will create a new Jira issue.
> > >
> > > Any thoughts?
> > >
> > > [1] http://apache-ignite-developers.2346864.n4.nabble.
> > > com/Losing-data-during-restarting-cluster-with-
> > > persistence-enabled-tt24267.html
> > > [2] http://apache-ignite-developers.2346864.n4.nabble.
> com/How-to-handle-
> > > CacheLocalStore-on-clients-node-tt25703.html
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
> >
>


Plugins in tests

2018-01-29 Thread Yakov Zhdanov
Guys,

When running tests from the core module I see that Ignite has 2 plugins
configured by default (because they are available in the classpath):

-TestReconnectPlugin 1.0
-StanByClusterTestProvider 1.0

It seems they were introduced by Dmitry Karachentsev and Dmitry Govorukhin.
Guys, can you please move the plugins to the extdata project, similar to
PlatformTestPlugin, and configure them only when needed? It is not correct
that every test we run for Ignite runs with plugins configured; by default
Ignite does not have any plugins.

--Yakov


Re: .NET development on Linux & macOS is now possible

2018-01-29 Thread Yakov Zhdanov
Pavel,

I tried this out recently and hit a few minor issues.

1. Due to path issues the mvn executable was not found, but the process
continued to the .NET build.
2. After I set the path for mvn it was still failing due to an incorrect
Java home.

Should we report Maven build issues and stop the overall process with an
appropriate error message?

Also 2 tests failed for me:

Failed   TestClientDisposeWhileOperationsAreInProgress
Error Message:
   Some tasks should have failed.
  Expected: True
  But was:  False

Stack Trace:
   at
Apache.Ignite.Core.Tests.Client.ClientConnectionTest.TestClientDisposeWhileOperationsAreInProgress()
in
/home/yzhdanov/projects/incubator-ignite/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Client/ClientConnectionTest.cs:line
277

Failed   TestAsyncCompletionOrder
Error Message:
   Expected: False
  But was:  True

Stack Trace:
   at
Apache.Ignite.Core.Tests.Client.Cache.CacheTest.TestAsyncCompletionOrder()
in
/home/yzhdanov/projects/incubator-ignite/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Client/Cache/CacheTest.cs:line
855

If you need logs just let me know.

--Yakov


Re: Ignite Semaphore Bug 7090

2018-01-25 Thread Yakov Zhdanov
Vlad and Tim, thanks for working on this!

Tim, please assign the ticket to yourself to follow the community process.
Currently I see it is unassigned.

--Yakov

2018-01-25 8:53 GMT-08:00 Vladisav Jelisavcic <vladis...@gmail.com>:

> Hi Tim,
>
> thank you for delving deeper into the problem,
> I left you a comment/question on the JIRA.
>
> Best regards,
> Vladisav
>
> On Tue, Jan 23, 2018 at 3:00 PM, Vladisav Jelisavcic <vladis...@gmail.com>
> wrote:
>
> > Hi Tim,
> >
> > thanks for the update! I left you a comment on Jira.
> >
> > Best regards,
> > Vladisav
> >
> > On Mon, Jan 22, 2018 at 6:17 PM, Tim Onyschak <tonysc...@gmail.com>
> wrote:
> >
> >> Hey Vladisav,
> >>
> >> I implemented your requests. Take a look, specifically, i created an
> >> interface to encapsulate the NodeUpdates and let
> >> the DataStructuresProcessor handle the execution by checking for one
> type
> >> as opposed to multiple if checks. In this case it checks for
> GridCacheNodeUpdate
> >> instance and executes onNodeRemoved. Let me know what you think.
> >>
> >> Thanks,
> >> Tim
> >>
> >>
> >>
> >> On Sat, Jan 20, 2018 at 8:10 AM, Vladisav Jelisavcic <
> vladis...@gmail.com
> >> > wrote:
> >>
> >>> Hi Tim,
> >>>
> >>> I reviewed your contribution and left you some comments on the pr.
> >>> Thanks!
> >>>
> >>> Vladisav
> >>>
> >>> On Wed, Jan 17, 2018 at 10:14 PM, Vladisav Jelisavcic <
> >>> vladis...@gmail.com> wrote:
> >>>
> >>>> Hi Tim,
> >>>>
> >>>> thank you for the contribution!
> >>>> I'll do the review soon and let you know.
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Jan 17, 2018 at 8:56 AM, Yakov Zhdanov <yzhda...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Thanks Tim! I hope Vlad can review your patch. If this does not
> happen
> >>>>> in
> >>>>> 2-3 days I will take a look. Can you please let me know on weekend
> if I
> >>>>> need to?
> >>>>>
> >>>>> --Yakov
> >>>>>
> >>>>> 2018-01-16 23:36 GMT+03:00 Tim Onyschak <tonysc...@gmail.com>:
> >>>>>
> >>>>> > Hey all,
> >>>>> >
> >>>>> > I created a patch and posted to user group, was told feed back
> would
> >>>>> be
> >>>>> > left on the jira. I wanted to see if we could get a fix in with
> 2.4,
> >>>>> could
> >>>>> > somebody please review.
> >>>>> >
> >>>>> > http://apache-ignite-users.70518.x6.nabble.com/Semaphore-
> >>>>> > Stuck-when-no-acquirers-to-assign-permit-td18639.html
> >>>>> >
> >>>>> > https://issues.apache.org/jira/browse/IGNITE-7090
> >>>>> >
> >>>>> > https://github.com/apache/ignite/pull/3138
> >>>>> >
> >>>>> > Thanks,
> >>>>> > Tim
> >>>>> >
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>


Re: Java 9 support

2018-01-25 Thread Yakov Zhdanov
Did we add what Pavel suggested to README.txt and readme.io documentation?

Yakov Zhdanov,
www.gridgain.com

2018-01-25 14:27 GMT-08:00 Denis Magda <dma...@apache.org>:

> Pavel, it’s a good point.
>
> Peter, could you ensure that all Ignite scripts (ignite.sh/bat,
> control.sh/bat, etc.) perform this auto detection of JVM 9 as well? Alex
> K. please do the same for Visor and Web Console scripts.
>
> —
> Denis
>
> > On Jan 25, 2018, at 1:58 AM, Pavel Tupitsyn <ptupit...@apache.org>
> wrote:
> >
> > I would add that to run Ignite on Java 9 without default scripts one has
> to
> > use the following JVM options:
> >
> >--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED"
> >--add-exports=java.base/sun.nio.ch=ALL-UNNAMED
> >
> > --add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED
> >
> > --add-exports=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED
> >
> > Ignite.NET adds these options automatically when Java 9 is detected, no
> > user steps required.
> >
> > Thanks,
> > Pavel
> >
> >
> > On Thu, Jan 25, 2018 at 12:53 PM, vveider <mr.wei...@gmail.com> wrote:
> >
> >> Hi, Igniters!
> >>
> >>
> >> Ticket IGNITE-6730 [1] was merged to master (and ignite-2.4) and now we
> >> have
> >> preliminary support of Java 9, which includes:
> >> - compilation with JDK9 with some constraints (scala-2.10 based modules
> >> should be turned off)
> >> - run with JRE9/JDK9 through default scripts (ignite.{sh|bat})
> >>
> >> Also, temporary TC project for Ignite Tests on Java 9 [2] was prepared;
> >> analysis of test problems and corresponding fixes is in progress.
> >>
> >> Please, be advised.
> >>
> >>
> >> [1] https://issues.apache.org/jira/browse/IGNITE-6730
> >> [2] https://ci.ignite.apache.org/project.html?projectId=
> IgniteTests24Java9
> >>
> >>
> >>
> >> --
> >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >>
>
>
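The auto-detection Pavel describes for Ignite.NET can be sketched in Java as follows. The --add-exports strings are the ones quoted in the thread; the detection helper and class name are illustrative and not Ignite's actual startup code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch: prepend the Java 9 --add-exports flags when needed. */
public class Java9Flags {
    static final List<String> JAVA9_EXPORTS = Arrays.asList(
        "--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED",
        "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED",
        "--add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED",
        "--add-exports=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED");

    /** Pre-9 JVMs report "1.x" in java.version; 9+ report the major version first. */
    static boolean isJava9OrLater(String javaVersion) {
        return !javaVersion.startsWith("1.");
    }

    /** Build the final JVM argument list for the given runtime version. */
    static List<String> jvmArgs(String javaVersion, List<String> userArgs) {
        List<String> args = new ArrayList<>();
        if (isJava9OrLater(javaVersion))
            args.addAll(JAVA9_EXPORTS);
        args.addAll(userArgs);
        return args;
    }
}
```

Centralizing the flags this way is what lets launchers (scripts, Visor, Web Console) stay in sync instead of each hard-coding the list.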


Re: Partition loss policy to disable cache completely

2018-01-25 Thread Yakov Zhdanov
No, this is not 100% consistent, since operations started on the previous
topology version after a node has left (but before the system has received
the event) would succeed. For me, consistent behavior is to throw an
exception for "select avg(x) from bla" if data is currently missing or any
data loss occurs in the middle of the query, and to return a result for
cache.get(key) if the partition for that key is still in the grid.

--Yakov


Re: Partition loss policy to disable cache completely

2018-01-23 Thread Yakov Zhdanov
I'm still not sure about what Val has suggested. Dmitry, Val, do you have
any concrete API/algorithm in mind?

--Yakov


Re: Partition loss policy to disable cache completely

2018-01-23 Thread Yakov Zhdanov
Val, your computation fails once it reaches the absent partition. I agree
with the point that any new computation should not start. Guys, any ideas
on how to achieve that? I would think of a scan/SQL query checking that
there is no data loss on the current topology version prior to starting.
Val, please note that along with queries that require the full data set
there can be operations that require only a limited set of partitions (most
probably only 1), so there is no point in such strict limitations. Agree?

--Yakov
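The pre-flight check suggested above could look roughly like this. IgniteCache#lostPartitions() is the real API referenced in the thread; the guard class itself is a hypothetical sketch operating on the collection it returns.

```java
import java.util.Collection;

/** Hypothetical guard over the collection returned by IgniteCache#lostPartitions(). */
class LostPartitionGuard {
    /** Full scans and SQL queries need the whole data set: fail fast on any loss. */
    static void ensureNoDataLoss(Collection<Integer> lostParts) {
        if (!lostParts.isEmpty())
            throw new IllegalStateException(
                "Query requires the full data set, but partitions are lost: " + lostParts);
    }

    /** A single-key operation only needs its own partition to be present. */
    static boolean canReadKey(Collection<Integer> lostParts, int keyPartition) {
        return !lostParts.contains(keyPartition);
    }
}
```

This captures the distinction Yakov draws: full-data-set queries fail up front, while single-partition operations proceed as long as their own partition survives.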


Re: Partition loss policy to disable cache completely

2018-01-23 Thread Yakov Zhdanov
Alex, I am against reducing cluster availability. I tried to explain in the
previous email that it is impossible to have a fully consistent approach
here. You can prohibit operations only after the exchange completes.
However, in this case plenty of transactions get committed on the previous
cache topology while nodes they do not touch have crashed or left the grid.

--Yakov

2018-01-23 9:28 GMT-08:00 Alexey Goncharuk <alexey.goncha...@gmail.com>:

> Valentin,
>
> I am ok with having a policy which prohibits all cache operations, and this
> is not very hard to implement. Although, I agree with Yakov - I do not see
> any point in reducing cluster availability when operations can be safely
> completed.
>
> 2018-01-23 2:22 GMT+03:00 Yakov Zhdanov <yzhda...@apache.org>:
>
> > Val,
> >
> > Your suggestion to prohibit any cache operation on partition loss does
> not
> > make sense to me. Why should I care about some partition during
> particular
> > operation if I don't access it? Imagine I use data on nodes A and B
> > performing reads and writes and node C crashes in the middle of tx.
> Should
> > my tx be rolled back? I think no.
> >
> > As far as difference it seems that IGNORE resets lost status for affected
> > partitions and READ_WRITE_ALL does not.
> >
> > * @see Ignite#resetLostPartitions(Collection)
> > * @see IgniteCache#lostPartitions()
> >
> > --Yakov
> >
> > 2018-01-17 14:36 GMT-08:00 Valentin Kulichenko <
> > valentin.kuliche...@gmail.com>:
> >
> > > Folks,
> > >
> > > Our PartitionLossPolicy allows to disable operations on lost
> partitions,
> > > however all available policies allow any operations on partitions that
> > were
> > > not lost. It seems to me it can be very useful to also have a policy
> that
> > > completely blocks the cache in case of data loss. Is it possible to add
> > > one?
> > >
> > > And as a side question: what is the difference between READ_WRITE_ALL
> and
> > > IGNORE policies? Looks like both allow both read and write on all
> > > partitions.
> > >
> > > -Val
> > >
> >
>


Re: cache operation failed after transaction rolled back due to deadlock

2018-01-23 Thread Yakov Zhdanov
Alex Goncharuk, can you please take a look and comment? The test seems
valid from my standpoint.

Yakov Zhdanov,
www.gridgain.com

2018-01-22 23:14 GMT-08:00 ALEKSEY KUZNETSOV <alkuznetsov...@gmail.com>:

>
> created ticket with reproducer [1]
>
> I think the fix should be to clear context in tx.close()
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7486
>
> > 23 янв. 2018 г., в 1:56, Yakov Zhdanov <yzhda...@apache.org> написал(а):
> >
> > Guys, can you check if you call tx.close(); or properly use
> > try-with-resources construct?
> >
> > Tx context should not be cleared automatically otherwise user would not
> get
> > any notification that original transaction failed. I believe context
> should
> > be cleared on tx.close().
> >
> > Anyway, let's take a look at reproducer from Alexey first.
> >
> > --Yakov
> >
> > 2018-01-22 2:02 GMT-08:00 ALEKSEY KUZNETSOV <alkuznetsov...@gmail.com>:
> >
> >> Sure
> >>
> >> пн, 22 янв. 2018 г. в 12:25, Andrey Gura <ag...@apache.org>:
> >>
> >>> It seems that problem isn't related with deadlock detection and should
> be
> >>> reproducible when deadlock detection disabled.
> >>>
> >>> Anyway it sounds like a bug. Could you please file a ticket and provide
> >>> minimal reproducer?
> >>>
> >>> 19 янв. 2018 г. 3:55 PM пользователь "ALEKSEY KUZNETSOV" <
> >>> alkuznetsov...@gmail.com> написал:
> >>>
> >>>> Hi, Igniters!
> >>>>
> >>>>
> >>>>
> >>>> When your transaction is rolled back due to a detected deadlock, you
> >>>> are unable to perform cache operations (in the thread where the tx
> >>>> was started and rolled back), because they lead to a
> >>>> TransactionTimeoutException.
> >>>>
> >>>>
> >>>>
> >>>> The reason for such behavior is that the tx thread map
> >>>> (txManager#threadMap) was not cleared of the tx when the rollback
> >>>> occurred.
> >>>>
> >>>> In GridNearTxLocal#onTimeout you can find comment on that :
> >>>>
> >>>> *// Note: if rollback asynchronously on timeout should not clear
> thread
> >>>> map*
> >>>>
> >>>> *// since thread started tx still should be able to see this tx.*
> >>>>
> >>>> Cache operation picks up tx from that map and throws exception.
> >>>>
> >>>>
> >>>>
> >>>> So, one must create a new thread in order to perform cache operations?
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> *Best Regards,*
> >>>>
> >>>> *Kuznetsov Aleksey*
> >>>>
> >>>
> >>
> >>
> >> --
> >>
> >> *Best Regards,*
> >>
> >> *Kuznetsov Aleksey*
> >>
>
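The close-clears-context behavior under discussion can be illustrated with a toy model: a thread-local slot standing in for the tx thread map, and an AutoCloseable "transaction" whose close() removes it, so try-with-resources always leaves the thread able to start a new tx. This mirrors the fix proposed in IGNITE-7486 but uses no Ignite API; the class is purely hypothetical.

```java
/**
 * Toy model of the tx thread map: a thread-local "current tx" slot that is
 * cleared only by close(), mirroring the behavior proposed in IGNITE-7486.
 * Not Ignite's actual transaction API.
 */
class TxContext {
    static final ThreadLocal<Object> CURRENT = new ThreadLocal<>();

    /** Binds a new "transaction" to the current thread. */
    static AutoCloseable txStart() {
        CURRENT.set(new Object());
        // close() clears the thread's slot even if commit failed with a rollback.
        return CURRENT::remove;
    }
}
```

Used with try-with-resources, close() runs even when commit throws (e.g. on a deadlock-triggered rollback), so the thread sees a clean context and can start the next transaction.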


Re: Partition loss policy to disable cache completely

2018-01-22 Thread Yakov Zhdanov
Val,

Your suggestion to prohibit any cache operation on partition loss does not
make sense to me. Why should I care about some partition during a
particular operation if I don't access it? Imagine I use data on nodes A
and B performing reads and writes, and node C crashes in the middle of a
tx. Should my tx be rolled back? I think no.

As for the difference, it seems that IGNORE resets the lost status for
affected partitions while READ_WRITE_ALL does not.

* @see Ignite#resetLostPartitions(Collection)
* @see IgniteCache#lostPartitions()

--Yakov

2018-01-17 14:36 GMT-08:00 Valentin Kulichenko <
valentin.kuliche...@gmail.com>:

> Folks,
>
> Our PartitionLossPolicy allows to disable operations on lost partitions,
> however all available policies allow any operations on partitions that were
> not lost. It seems to me it can be very useful to also have a policy that
> completely blocks the cache in case of data loss. Is it possible to add
> one?
>
> And as a side question: what is the difference between READ_WRITE_ALL and
> IGNORE policies? Looks like both allow both read and write on all
> partitions.
>
> -Val
>

