from:"Andrey Kuznetsov"

Re: Model of permissions for Ignite 3

2021-04-08 Thread Andrey Kuznetsov

Hi Denis!

The idea and prototype look great.

I'd like to highlight one arguable point. Default authorization
implementation still assumes there are permissions provided in
SecuritySubject. In turn, authentication is still responsible for filling
these permissions. I suggest decoupling authentication from authorization,
so that GridSecurityProcessor implementation is fully responsible for
obtaining permissions for SecuritySubject given on authorization. In
particular, implementation can choose an existing behavior of bundling
permissions with SecuritySubject.

Makes sense?
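
A minimal sketch of the decoupling suggested above, under stated assumptions: `SubjectInfo`, `NamedPermission` and `Authorizer` are hypothetical names invented for illustration and are not Ignite API; only the `java.security` classes are real. The authorizer alone decides where permissions come from, and the existing "bundled with the subject" behavior is just one possible strategy. It assumes Java 9+ for `Map.of` in the usage note.

```java
import java.security.BasicPermission;
import java.security.Permission;
import java.security.PermissionCollection;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a security subject; authentication only has to fill in the identity. */
final class SubjectInfo {
    final String login;
    final PermissionCollection bundled; // null when permissions are not bundled with the subject

    SubjectInfo(String login, PermissionCollection bundled) {
        this.login = login;
        this.bundled = bundled;
    }
}

/** Hypothetical permission type; BasicPermission supplies trailing-wildcard matching ("a.b.*"). */
final class NamedPermission extends BasicPermission {
    NamedPermission(String name) {
        super(name);
    }
}

/** Hypothetical authorizer: the only component that knows how permissions are obtained. */
final class Authorizer {
    private final Function<SubjectInfo, PermissionCollection> resolver;

    Authorizer(Function<SubjectInfo, PermissionCollection> resolver) {
        this.resolver = resolver;
    }

    /** Returns true if the permissions resolved for the subject imply the requested one. */
    boolean isAllowed(SubjectInfo subj, Permission perm) {
        PermissionCollection perms = resolver.apply(subj);

        return perms != null && perms.implies(perm);
    }

    /** Existing behavior: permissions travel with the subject. */
    static Authorizer bundled() {
        return new Authorizer(s -> s.bundled);
    }

    /** Decoupled behavior: permissions are looked up at authorization time. */
    static Authorizer external(Map<String, PermissionCollection> store) {
        return new Authorizer(s -> store.get(s.login));
    }
}
```

For example, with a `java.security.Permissions` collection holding `new NamedPermission("cache.orders.*")`, `Authorizer.external(Map.of("alice", perms))` would allow `cache.orders.get` for a subject that carries no permissions of its own.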

Thu, Apr 8, 2021 at 17:52, Denis Garus:

> Sorry, I forgot to include the link
>
> 1. https://github.com/apache/ignite/pull/8989
>
> Thu, Apr 8, 2021 at 17:50, Denis Garus:
>
> > Hello, Igniters!
> >
> > I would like to propose improving the way we represent
> > permissions in Ignite 3.
> >
> > The current permission model in Ignite has a number of drawbacks.
> > The main drawback, IMHO: if you need to add a new permission,
> > you have to change the core module by extending the 'SecurityPermission'
> > enum.
> > This approach becomes even more challenging if the new permission is
> > created for an extension.
> >
> > The existing permission model is overcomplicated.
> > The SecurityPermission enum is divided into four groups,
> > and to determine whether a security subject has been given permission,
> > a plugin developer has to know what the permission group is.
> > But 'CACHE_CREATE' and 'CACHE_DESTROY' are included in two groups (system
> > operations and cache operations).
> > When 'CACHE_CREATE' ('CACHE_DESTROY') is treated as a system permission,
> > it applies to all caches. In other cases, when 'CACHE_CREATE'
> > ('CACHE_DESTROY') is treated as a cache permission,
> > the permission check takes the cache name into account.
> > IMHO, this logic is hard to understand.
> > There is also no way to represent a compound operation as a single
> > permission, and so on.
> >
> >
> > So I would like to suggest using a permission model that is based on
> > 'java.security.Permission'.
> > I prepared the concept [1] of how this model could look in Ignite.
> > Classes 'CachePermission', 'ComputePermission', and 'ServicePermission'
> > represent cache, compute, and service permissions respectively, and allow
> > wildcards, for example, "org.apache.ignite.internal.*".
> > Class 'IgniteClusterPermission' represents permission without actions.
> > Interface 'GridSecurityProcessor' has a default implementation of the
> > 'authorize' method.
> > 'SecurityTestSuite' is green.
> >
> >
> > This representation of permission, IMHO, has the following advantages:
> > - A developer can easily add new permission without needing to touch the
> > core module.
> > - There is no need to implement complicated logic to authorize an
> > operation inside a security plugin.
> > But a developer still has the opportunity to add custom logic.
> > - Wildcards in permission names out of the box, for example, 'new
> > CachePermission("x.y.z.*", "get,put")'.
> > - There is no need to implement 'SecurityPermissionSet' and a set of
> > methods from 'SecurityContext' ('xxxAllowed(String, SecurityPermission)').
> > - We can define a security policy in a file as Java does. It could
> > simplify work for administrators.
> >
> > WDYT?
> >
>


-- 
Best regards,
  Andrey Kuznetsov.

[jira] [Created] (IGNITE-14292) Change permissions required to create/destroy caches in GridRestProcessor

2021-03-09 Thread Andrey Kuznetsov (Jira)

Andrey Kuznetsov created IGNITE-14292:
-

 Summary: Change permissions required to create/destroy caches in 
GridRestProcessor
 Key: IGNITE-14292
 URL: https://issues.apache.org/jira/browse/IGNITE-14292
 Project: Ignite
  Issue Type: Improvement
  Components: security
Affects Versions: 2.9.1
Reporter: Andrey Kuznetsov


{{GridRestProcessor}} authorizes {{ADMIN_CACHE}} permission before cache 
creation/destruction. This is inconsistent with thin client connector behavior 
and looks counterintuitive. {{ADMIN_CACHE}} should be replaced with 
{{CACHE_CREATE}} and {{CACHE_DESTROY}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (IGNITE-14207) Access to system views is not controlled by security processor

2021-02-18 Thread Andrey Kuznetsov (Jira)

Andrey Kuznetsov created IGNITE-14207:
-

 Summary: Access to system views is not controlled by security 
processor
 Key: IGNITE-14207
 URL: https://issues.apache.org/jira/browse/IGNITE-14207
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.9.1
Reporter: Andrey Kuznetsov


As of now, it is impossible to restrict access to system views (SYS schema) 
with {{IgniteSecurityProcessor}}; this should be fixed.

Suggestions:
- add new {{SecurityPermission}};
- authorize this permission before accessing any system view.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (IGNITE-14160) Issue log warning when GridNioSslHandler.handshake() takes too long

2021-02-10 Thread Andrey Kuznetsov (Jira)

Andrey Kuznetsov created IGNITE-14160:
-

 Summary: Issue log warning when GridNioSslHandler.handshake() 
takes too long
 Key: IGNITE-14160
 URL: https://issues.apache.org/jira/browse/IGNITE-14160
 Project: Ignite
  Issue Type: Task
Affects Versions: 2.9.1
Reporter: Andrey Kuznetsov


This will be helpful in investigating client connectivity/performance issues.
Threshold duration can be just a reasonable constant, say, 1 second.
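
A rough sketch of the idea, not the actual {{GridNioSslHandler}} code: the handshake is timed and a warning is emitted when it exceeds a constant threshold. The class name {{TimedHandshake}}, the {{Runnable}} standing in for the handshake and the {{Consumer}} standing in for the logger are all assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper that times an operation and warns when it is slow. */
final class TimedHandshake {
    /** Threshold suggested in the ticket: one second. */
    static final long WARN_THRESHOLD_NANOS = TimeUnit.SECONDS.toNanos(1);

    /** Returns true if the elapsed time is above the warning threshold. */
    static boolean isTooLong(long elapsedNanos) {
        return elapsedNanos > WARN_THRESHOLD_NANOS;
    }

    /**
     * Runs the handshake and reports a warning if it took too long.
     * The warning is reported even when the handshake throws.
     */
    static void run(Runnable handshake, Consumer<String> warn) {
        long start = System.nanoTime();

        try {
            handshake.run();
        }
        finally {
            long elapsed = System.nanoTime() - start;

            if (isTooLong(elapsed))
                warn.accept("SSL handshake took too long [durationMs=" +
                    TimeUnit.NANOSECONDS.toMillis(elapsed) + ']');
        }
    }
}
```

{{System.nanoTime()}} is used rather than wall-clock time so that clock adjustments do not distort the measured duration.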



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (IGNITE-13505) WalCompactionAfterRestartTest fails stabilly

2020-10-01 Thread Andrey Kuznetsov (Jira)

Andrey Kuznetsov created IGNITE-13505:
-

 Summary: WalCompactionAfterRestartTest fails stabilly
 Key: IGNITE-13505
 URL: https://issues.apache.org/jira/browse/IGNITE-13505
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.8.1
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov


Also, the test is not included in any test suite.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: Add user defined attributes to all GridClient messages.

2020-04-13 Thread Andrey Kuznetsov

+1

We should guarantee that user attributes are transmitted to the cluster once
they are set in the client configuration.

Mon, Apr 13, 2020 at 15:09, Oleg Ostanin:

> Hello, Igniters!
>
> Recently we added the possibility of sending user-defined attributes from
> clients and checking those attributes in a custom authenticator
> implementation [1]. However, in some cases this does not work well for
> GridClient, because currently the attributes are only added to the TOPOLOGY
> message. I've created a ticket with a reproducer:
>
> https://issues.apache.org/jira/browse/IGNITE-12891
>
> I suggest solving this problem by adding user-defined attributes to other
> GridClient messages such as GridClientAuthenticationRequest and so on.
>
> What do you think?
>
> Best regards
> Oleg
>
> [1]
> https://issues.apache.org/jira/browse/IGNITE-12049
>


-- 
Best regards,
  Andrey Kuznetsov.

Re: Inconsistent API IgniteClient and REST

2020-03-31 Thread Andrey Kuznetsov

I'd prefer marking ADMIN_CACHE as deprecated, but postponing its removal from
GridRestProcessor until the next Ignite release (2.10 or 3.0?). For now we could
just add checks for CACHE_CREATE / CACHE_DESTROY there along
with ADMIN_CACHE.
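
One possible reading of this transition, sketched with hypothetical names (`Perm` and `RestCacheAuth` are illustrations, not Ignite classes): during the deprecation period the REST path accepts either the new fine-grained permission or the legacy ADMIN_CACHE. Whether the legacy permission should be accepted as an alternative is an assumption here.

```java
import java.util.Set;

/** Hypothetical subset of permissions, named after the ones discussed in this thread. */
enum Perm { ADMIN_CACHE, CACHE_CREATE, CACHE_DESTROY }

/** Hypothetical REST-side check used only to illustrate the transition period. */
final class RestCacheAuth {
    /** Cache creation: CACHE_CREATE, or the deprecated ADMIN_CACHE for backward compatibility. */
    static boolean canCreate(Set<Perm> granted) {
        return granted.contains(Perm.CACHE_CREATE) || granted.contains(Perm.ADMIN_CACHE);
    }

    /** Cache destruction: CACHE_DESTROY, or the deprecated ADMIN_CACHE for backward compatibility. */
    static boolean canDestroy(Set<Perm> granted) {
        return granted.contains(Perm.CACHE_DESTROY) || granted.contains(Perm.ADMIN_CACHE);
    }
}
```

Once ADMIN_CACHE is removed, each method would shrink to a single `contains` check.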

Tue, Mar 31, 2020 at 12:30, Nikolay Izhikov:

> Hello, Sergey.
>
>
> I’m +1 to make this change.
>
> I think we should make security consistent across all APIs.
>
> > On Mar 31, 2020, at 12:14, Sergei Ryzhov wrote:
> >
> > Hello!
> > Currently, permission handling differs between the IgniteClient API and REST.
> > To create/delete a cache:
> > IgniteClient authorises CACHE_CREATE/CACHE_DESTROY
> > (GridCacheProcessor#authorizeCacheCreate
> > <https://github.com/apache/ignite/blob/aefad946ebd7720f81b460aa39e205c10dc24b26/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheProcessor.java#L3983>,
> > authorizeCacheDestroy
> > <https://github.com/apache/ignite/blob/aefad946ebd7720f81b460aa39e205c10dc24b26/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheProcessor.java#L3973>).
> > REST authorises ADMIN_CACHE (GridRestProcessor#authorize
> > <https://github.com/apache/ignite/blob/aefad946ebd7720f81b460aa39e205c10dc24b26/modules/core/src/main/java/org/apache/ignite/internal/processors/rest/GridRestProcessor.java#L841>).
> > I think this is inconsistent.
> >
> > I suggest marking ADMIN_CACHE as @Deprecated
> > and replacing it in the GridRestProcessor with CACHE_CREATE / CACHE_DESTROY
> > while maintaining backward compatibility for ADMIN_CACHE.
> >
> > This will allow us to remove ADMIN_CACHE in the future.
> >
> >
> >
> > Sergei Ryzhov
> > s.vi.ryz...@gmail.com
> >
>
>

-- 
Best regards,
  Andrey Kuznetsov.

Per-cache CACHE_CREATE/CACHE_DESTROY permissions handling

2020-03-25 Thread Andrey Kuznetsov

Hi, Igniters!

Long ago I raised the issue [1]. The change is obvious yet useful; it
allows Ignite security implementations to manage CACHE_CREATE/CACHE_DESTROY
permissions on a per-cache basis (as opposed to system-level permissions).
It is ready now, and several community members have approved it. Could
someone do the final review, please?

[1] https://issues.apache.org/jira/browse/IGNITE-12220

-- 
Best regards,
  Andrey Kuznetsov.

[jira] [Created] (IGNITE-12832) Add user attributes support to control.sh

2020-03-23 Thread Andrey Kuznetsov (Jira)

Andrey Kuznetsov created IGNITE-12832:
-

 Summary: Add user attributes support to control.sh
 Key: IGNITE-12832
 URL: https://issues.apache.org/jira/browse/IGNITE-12832
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.8
Reporter: Andrey Kuznetsov


Change [1] introduced user attributes for various thin clients. 
The {{control.sh|bat}} script uses {{GridClient}} to connect to the cluster, but 
it is currently impossible to set user attributes in the corresponding 
{{GridClientConfiguration}}. I suggest adding such an ability via a 
{{--attr-some-attr-name=attrValue}} command line option.

To prevent command line pollution I also suggest implementing {{.properties}} 
file support, so that command line arguments (including {{--attr*}} arguments) 
could be hidden in a file specified by {{--config filename.properties}}. In 
case of duplication, explicit command line arguments should take precedence over 
arguments set in the {{.properties}} file.
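
The precedence rule can be sketched as follows. This is an illustration only: {{ArgMerger}} is a hypothetical name, and the way {{control.sh}} would actually parse options and name the keys in the file is not specified in this ticket.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper that merges arguments from a .properties file with explicit ones. */
final class ArgMerger {
    /**
     * Starts with the values read from the file, then overlays the explicit
     * command line arguments, so an explicit argument wins on duplication.
     */
    static Map<String, String> merge(Properties fromFile, Map<String, String> fromCmdLine) {
        Map<String, String> res = new LinkedHashMap<>();

        for (String name : fromFile.stringPropertyNames())
            res.put(name, fromFile.getProperty(name));

        res.putAll(fromCmdLine); // Explicit arguments take precedence.

        return res;
    }
}
```

The {{Properties}} object would typically be filled with {{Properties.load}} from the file given by {{--config}}.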

[1] https://issues.apache.org/jira/browse/IGNITE-12049



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (IGNITE-12744) Add user attributes to GridRestRequest creation routine

2020-03-03 Thread Andrey Kuznetsov (Jira)

Andrey Kuznetsov created IGNITE-12744:
-

 Summary: Add user attributes to GridRestRequest creation routine
 Key: IGNITE-12744
 URL: https://issues.apache.org/jira/browse/IGNITE-12744
 Project: Ignite
  Issue Type: Improvement
  Components: rest
Affects Versions: 2.8
Reporter: Andrey Kuznetsov


Improvement [1] has added user attributes support to Ignite thin clients. REST 
API connections should also support this feature: 
{{GridJettyRestHandler.createRequest}} can read user attributes from HTTP 
request parameters.

[1] https://issues.apache.org/jira/browse/IGNITE-12049



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (IGNITE-12718) Python Thin Client: add an ability to specify keyfile password

2020-02-26 Thread Andrey Kuznetsov (Jira)

Andrey Kuznetsov created IGNITE-12718:
-

 Summary: Python Thin Client: add an ability to specify keyfile 
password
 Key: IGNITE-12718
 URL: https://issues.apache.org/jira/browse/IGNITE-12718
 Project: Ignite
  Issue Type: Improvement
  Components: thin client
Affects Versions: 2.8
Reporter: Andrey Kuznetsov


In pyignite, there is no way to specify a password for the keyfile used to 
establish a TLS connection to an Ignite cluster. If the keyfile is encrypted, 
the OpenSSL library prompts for the password interactively.

To make the password configurable, one can set up an explicit {{SSLContext}} 
instead of calling {{ssl.wrap_socket}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: Security Subject of thin client on remote nodes

2020-02-21 Thread Andrey Kuznetsov

Hi, guys!

The change suggested by Denis looks robust to me: it covers security
subject handling by all kinds of clients/nodes at once. As for the
ATTR_SECURITY_SUBJECT_V2 attribute, it is really better to move it to
plugin implementations to support backward compatibility with peer nodes of
older versions. Obviously, a cluster with security disabled will not suffer
from the attribute removal. Ignite core should know nothing about the specific
way of security context propagation.

Denis, could you please create a Jira issue for your change?

Thu, Feb 20, 2020 at 17:01, Denis Garus:

> > I just transmitted security subjects for rest requests.
>
> SecurityContext has an unlimited size so we can get significant overhead.
> And we do not solve problems with other thin clients.
>
> > If you remove ATTR_SECURITY_SUBJECT_V2, it breaks compatibility between
> > old versions and new.
>
> I suggest removing ATTR_SECURITY_SUBJECT_V2 from Ignite's codebase, but for
> compatibility, it can be used by a security plugin like in PoC.
>
> Thu, Feb 20, 2020 at 16:47, Maksim Stepachev:
>
> > Yes, I brought this up in 07.19:
> >
> > http://apache-ignite-developers.2346864.n4.nabble.com/Improvements-for-new-security-approach-td42698.html#a42708
> > And in my solution, I just transmitted security subjects for REST
> > requests.
> >
> > If you remove ATTR_SECURITY_SUBJECT_V2, it breaks compatibility between
> > old versions and new.
> >
> > Thu, Feb 20, 2020 at 15:56, Denis Garus:
> >
> > > Hi, Igniters!
> > >
> > >
> > > At present, a security subject id is assumed to be a node id.
> > >
> > > But when we are dealing with a thin client, JDBC, or REST, the subject id
> > > is a random UUID. In this case, we cannot get the subject information on
> > > a remote node, and we get problems like these [1], [2].
> > >
> > > To fix the problem, we should spread the client session to the whole
> > > cluster.
> > >
> > >
> > > I want to suggest a solution to the problem.
> > >
> > >
> > > First, we should get subject information using GridSecurityProcessor.
> > >
> > > How GridSecurityProcessor retrieves the subject data is up to plugin
> > > developers.
> > >
> > >
> > > Second, we should get rid of the assumption that a subject id is a node
> > > id and remove the ATTR_SECURITY_SUBJECT_V2 attribute.
> > >
> > >
> > > I have prepared a PoC PR [3] that:
> > >
> > > - places the existing logic of spreading the security context in
> > > GridSecurityProcessor;
> > >
> > > - uses GridSecurityProcessor to get SecurityContext.
> > >
> > >
> > > [1] http://apache-ignite-developers.2346864.n4.nabble.com/JDBC-thin-client-incorrect-security-context-td45929.html
> > > [2] https://issues.apache.org/jira/browse/IGNITE-12589
> > > [3] https://github.com/apache/ignite/pull/7375
> > >
> >
>


-- 
Best regards,
  Andrey Kuznetsov.

Re: [ANNOUNCE] New committer: Vyacheslav Koptilin

2020-02-18 Thread Andrey Kuznetsov

Congratulations, Slava!

Tue, Feb 18, 2020 at 22:20, Dmitriy Pavlov:

> Hello Ignite Community,
>
> The Project Management Committee (PMC) for Apache Ignite has invited
> Vyacheslav Koptilin to become a committer and we are pleased to announce
> that he has accepted.
>
> Vyacheslav investigated and fixed a number of non-trivial issues in the
> Ignite Native persistent store, was a reviewer of Read Repair (ex.
> Consistency Check).
>
> Being a committer enables easier contribution to the project since there is
> no need to go via the patch submission process. This should enable better
> productivity.
>
> Vyacheslav, thanks for supporting the community and keep the pace!
>
> Best Regards,
> Dmitriy Pavlov
> on behalf of Apache Ignite PMC
>


-- 
Best regards,
  Andrey Kuznetsov.

Re: Add user attributes to thin clients

2020-01-28 Thread Andrey Kuznetsov

> > > > > > > > > > > > > > Binary marshaller, before packing object to byte[], will try to use
> > > > > > > > > > > > > > discovery processor and send message containing class descriptor. But thin
> > > > > > > > > > > > > > clients don't have discovery. Furthermore, if we write binary marshaller
> > > > > > > > > > > > > > without class descriptor synchronization, we can get objects with different
> > > > > > > > > > > > > > class versions and corresponding exceptions.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Even if compact footer is disabled?
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > But we can tell users to declare their classes in
> > > > > > > > > > > > > > META-INF/classnames.properties, and the current binary marshaller will
> > > > > > > > > > > > > > work fine.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > This approach doesn't look cross-platform.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thu, Jan 23, 2020, 12:13 Alex Plehanov <plehanov.a...@gmail.com>:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > User attributes (besides authentication) can also be used to pass some
> > > > > > > > > > > > > > > info about an application that uses a client and then display this
> > > > > > > > > > > > > > > information in monitoring tools. Other vendors use such an approach
> > > > > > > > > > > > > > > (Oracle DB, for example, has the DBMS_APPLICATION_INFO package; PostgreSQL
> > > > > > > > > > > > > > > has the application_name connection property, and application information
> > > > > > > > > > > > > > > is available later in system views).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > About allowed data types: we should definitely limit attribute types to
> > > > > > > > > > > > > > > only primitive types. Thin client binary marshaller can't send information
> > > > > > > > > > > > > > >

Re: Add user attributes to thin clients

2020-01-22 Thread Andrey Kuznetsov

Hi, Pavel!

Sometimes a single authentication factor is not enough. The proposed
attributes make it possible to add extra factors flexibly.

Wed, Jan 22, 2020, 17:39 Pavel Tupitsyn:

> Token can be sent instead of a password (like git works with GitHub
> tokens).
>
> For now I don't see a reason to include attributes into the handshake
> message.
>
> On Wed, Jan 22, 2020 at 5:32 PM Ilya Kasnacheev wrote:
>
> > Hello!
> >
> > One does not send security certificate as attribute. The only way to
> > obtain peer security certificate is to ask SSL engine to provide it.
> >
> > Nevertheless, I can see how it can be useful with e.g. Kerberos, which is
> > token-based IIRC.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > Wed, Jan 22, 2020 at 17:20, Dmitrii Ryabov:
> >
> > > This map is something like user object from `SecurityCredentials`.
> > > Sometimes login and password are not enough for security checks. For
> > > example, we can send security certificate and validate it inside
> > > authenticator.
> > >
> > > Wed, Jan 22, 2020, 17:16 Igor Sapego:
> > >
> > > > Hi Dmitrii,
> > > >
> > > > Can you please explain your use case?
> > > > I'm not sure I'm getting what is the motivation of this change.
> > > >
> > > > Best Regards,
> > > > Igor
> > > >
> > > >
> > > > On Wed, Jan 22, 2020 at 5:11 PM Pavel Tupitsyn wrote:
> > > >
> > > > > Hi Dmitrii,
> > > > >
> > > > > Honestly, I could not grasp the problem, can you explain it in more
> > > > > detail?
> > > > > What do we solve by adding a map with arbitrary stuff to the client
> > > > > protocol handshake?
> > > > >
> > > > > On Wed, Jan 22, 2020 at 5:02 PM Dmitrii Ryabov <somefire...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello, Igniters!
> > > > > >
> > > > > > I want to add the possibility of sending user-defined attributes from
> > > > > > thin clients and checking them inside a custom authenticator during
> > > > > > the handshake [1].
> > > > > >
> > > > > > There is an issue with the hardcoded binary writer for JDBC and
> > > > > > `IgniteClient`. This writer searches for classes in the JDK and
> > > > > > META-INF/classnames.properties, and tries to sync undeclared classes
> > > > > > with the cluster. But it fails because the current classloading uses
> > > > > > discovery.
> > > > > >
> > > > > > I'd like to keep this writer and allow only primitive types and
> > > > > > `String` for user attributes to prevent unexpected failures. I think
> > > > > > it is better than changing the writer to one with heavy classloading.
> > > > > >
> > > > > > Is it OK to restrict thin client attributes to primitives and `String`?
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12049
> > > > > >
> > > > >
> > > >
> > >
> >
>

[jira] [Created] (IGNITE-12304) All DataRegionMetrics should be documented

2019-10-21 Thread Andrey Kuznetsov (Jira)

Andrey Kuznetsov created IGNITE-12304:
-

 Summary: All DataRegionMetrics should be documented
 Key: IGNITE-12304
 URL: https://issues.apache.org/jira/browse/IGNITE-12304
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Andrey Kuznetsov
 Fix For: 2.8


All metrics added to {{DataRegionMetrics}} interface by [1] should be 
documented.

[1] https://issues.apache.org/jira/browse/IGNITE-8078



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (IGNITE-12220) Allow to use cache-related permissions both at system and per-cache levels

2019-09-23 Thread Andrey Kuznetsov (Jira)

Andrey Kuznetsov created IGNITE-12220:
-

 Summary: Allow to use cache-related permissions both at system and 
per-cache levels
 Key: IGNITE-12220
 URL: https://issues.apache.org/jira/browse/IGNITE-12220
 Project: Ignite
  Issue Type: Task
  Components: security
Affects Versions: 2.7.6
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.8


Currently, {{CACHE_CREATE}} and {{CACHE_DESTROY}} permissions are enforced to 
be system-level permissions, see for instance 
{{SecurityPermissionSetBuilder#appendCachePermissions}}. This looks inflexible: 
Ignite Security implementations are not able to manage cache creation and 
deletion permissions on a per-cache basis (unlike get/put/remove permissions). 
All such limitations should be found and removed in order to allow all 
{{CACHE_*}} permissions to be set both at system and per-cache levels.
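
A simplified illustration of the intended semantics, not the actual Ignite implementation: a permission granted at the system level applies to every cache, otherwise the per-cache grants are consulted. {{CachePermCheck}} and its parameters are hypothetical names; the sketch assumes Java 9+ for {{Set.of}}.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical two-level permission lookup: system level first, then per-cache level. */
final class CachePermCheck {
    /**
     * @param perm Permission name, for example "CACHE_CREATE".
     * @param cacheName Name of the cache the operation targets.
     * @param sysPerms Permissions granted at the system level (apply to all caches).
     * @param cachePerms Permissions granted per cache, keyed by cache name.
     * @return true if the operation is permitted at either level.
     */
    static boolean allowed(
        String perm,
        String cacheName,
        Set<String> sysPerms,
        Map<String, Set<String>> cachePerms
    ) {
        if (sysPerms.contains(perm))
            return true;

        return cachePerms.getOrDefault(cacheName, Set.of()).contains(perm);
    }
}
```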



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: New Committer: Maxim Muzafarov

2019-08-28 Thread Andrey Kuznetsov

Great news! Congratulations!

Wed, Aug 28, 2019, 18:28 Alex Plehanov:

> Maxim, congratulations!
>
> Wed, Aug 28, 2019 at 18:13, Nikita Amelchev:
>
> > My congratulations, Maxim!
> >
> > Wed, Aug 28, 2019 at 18:11, Dmitriy Pavlov:
> > >
> > > Dear community,
> > >
> > > The Project Management Committee (PMC) for Apache Ignite has invited
> > > Maxim Muzafarov to become a committer and we are pleased to announce
> > > that he has accepted.
> > >
> > > PMC recognizes Maxim's efforts in developing file transfer for
> > > rebalancing, removal of WAL applying steps from Partition Map Exchange,
> > > finding and fixing non-trivial issues with the product, like mem leaks,
> > > and in setting up code inspections and maintaining this process.
> > >
> > > Being a committer enables easier contribution to the project since
> > > there is no need to go via the patch submission process. This should
> > > enable better productivity.
> > >
> > > Best regards,
> > > Dmitriy Pavlov
> > > on behalf of PMC, Apache Ignite
> >
> >
> >
> > --
> > Best wishes,
> > Amelchev Nikita
> >
>

Re: Coding guidelines. Useless JavaDoc comments.

2019-08-07 Thread Andrey Kuznetsov

+1 for making javadoc comments optional.

- Empty and tautological comments are a kind of garbage that reduces
readability.
- It's better to leave an entity undocumented than to write an
inexpressive or misleading comment.
- Even classes may not require javadocs, e.g. simple DTOs.

Wed, Aug 7, 2019, 13:39 Denis Garus:

> Thx for feedback!
>
> >> we have to write proper javadoc for all production classes, including
> >> internal.
>
> Nikolay, I cannot agree with it.
>
> What would be the best comment for the following fields?
> /** */
> private static final long serialVersionUID = 0L;
> or
> /** */
> @LoggerResource
> private IgniteLogger log;
>
> There are more than 8000 lines of /** */ in the ignite-core module alone
> (not including tests).
>
> Any comment there would be redundant and just noise. Obvious comments teach
> readers to skip all comments.
>
>
> >> identical distance/padding/margin between fields in a class - is really
> >> cool
>
> Vyacheslav, but we have a blank line between fields. Why is one blank line
> not enough?
>
> Wed, Aug 7, 2019 at 12:58, Ivan Pavlukhin:
>
> > Hi,
> >
> > Denis, thank you for starting this discussion!
> >
> > My opinion here is that having a good javadoc for every class and
> > method is not feasible in the real world. I am quite curious to see a
> > non-trivial project which follows it. Also, all comments and javadocs
> > are prone to become misleading when code changes (human factor). In my
> > experience good method and variable names and clear code organization
> > often are more helpful than javadocs.
> >
> > Wed, Aug 7, 2019 at 12:49, Vyacheslav Daradur:
> > >
> > > I agree that useless comments look weird in the codebase.
> > >
> > > But, identical distance/padding/margin between fields in a class - is
> > > really cool, and helps read the class very fast.
> > >
> > > On Wed, Aug 7, 2019 at 12:26 PM Nikolay Izhikov wrote:
> > > >
> > > > Hello, Denis.
> > > >
> > > > Thanks for starting this discussion.
> > > >
> > > > I think we have to write proper javadoc for all production classes,
> > > > including internal ones.
> > > > We should fix the useless javadoc examples you provided.
> > > > We should not accept patches without good javadocs.
> > > >
> > > > As for the tests, seems, we can make javadoc optional.
> > > >
> > > > What do you think?
> > > >
> > > >
> > > > On Wed, 07/08/2019 at 11:41 +0300, Denis Garus wrote:
> > > > > Igniters!
> > > > >
> > > > > I think it's time to change the coding guidelines regarding JavaDoc
> > > > > comments [1]:
> > > > > > > Every method, field or initializer public, private or protected in
> > > > > > > top-level, inner or anonymous type should have at least minimal
> > > > > > > Javadoc comments including description and description of parameters
> > > > > > > using @param, @return and @throws Javadoc tags, where applicable.
> > > > >
> > > > > Let's look at JavaDoc comments in the IgniteKernal class:
> > > > >
> > > > > Why?
> > > > >
> > > > > /** */ - 15 matches.
> > > > >
> > > > > What new information do you get from these comments?
> > > > >
> > > > > /** Periodic starvation check interval. */
> > > > > private static final long PERIODIC_STARVATION_CHECK_FREQ = 1000 * 30;
> > > > >
> > > > > /** Long jvm pause detector. */
> > > > > private LongJVMPauseDetector longJVMPauseDetector;
> > > > >
> > > > > /** Scheduler. */
> > > > > private IgniteScheduler scheduler;
> > > > >
> > > > > /** Stop guard. */
> > > > > private final AtomicBoolean stopGuard = new AtomicBoolean();
> > > > >
> > > > > /**
> > > > >  * @param cfg Configuration to use.
> > > > >  * @param utilityCachePool Utility cache pool.
> > > > >  * @param execSvc Executor service.
> > > > >  * @param sysExecSvc System executor service.
> > > > >  * @param stripedExecSvc Striped executor.
> > > > >  * @param p2pExecSvc P2P executor service.
> > > > >  * @param mgmtExecSvc Management executor service.
> > > > >  * @param igfsExecSvc IGFS executor service.
> > > > >  * @param dataStreamExecSvc data stream executor service.
> > > > >  * @param restExecSvc Reset executor service.
> > > > >  * @param affExecSvc Affinity executor service.
> > > > >  * @param idxExecSvc Indexing executor service.
> > > > >  * @param callbackExecSvc Callback executor service.
> > > > >  * @param qryExecSvc Query executor service.
> > > > >  * @param schemaExecSvc Schema executor service.
> > > > >  * @param customExecSvcs Custom named executors.
> > > > >  * @param errHnd Error handler to use for notification about startup
> > > > >  *      problems.
> > > > >  * @param workerRegistry Worker registry.
> > > > >  * @param hnd Default uncaught exception handler used by thread pools.
> > > > >  * @throws IgniteCheckedException Thrown in case of any errors.
> > > > >  */
> > > > > public void start(
> > > > > final IgniteConfiguration cfg,
> > > > >

Re: GridDhtInvalidPartitionException takes the cluster down

2019-03-27 Thread Andrey Kuznetsov

I see no other dependencies for IGNITE-10003.

Best regards,
Andrey Kuznetsov.

Wed, Mar 27, 2019, 18:25 Andrey Gura ag...@apache.org:

> What do you think about including patches [1] and [2] in Ignite 2.7.5?
> It's all about default failure handler behavior in cases of
> SYSTEM_WORKER_BLOCKED and SYSTEM_CRITICAL_OPERATION_TIMEOUT.
>
> Andrey Kuznetsov, could you please check whether IGNITE-10003 depends on
> any other issue that isn't included in the 2.7 release?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-10154
> [2] https://issues.apache.org/jira/browse/IGNITE-10003
>
> On Wed, Mar 27, 2019 at 8:11 AM Denis Magda wrote:
> >
> > Folks, thanks for sharing details and inputs. This is helpful. Since I
> > spend a lot of time working with Ignite users, I'll look into this topic
> > in a couple of days to propose some changes. In the meantime, here is a
> > fresh report on the user list:
> >
> > http://apache-ignite-users.70518.x6.nabble.com/Triggering-Rebalancing-Programmatically-get-error-while-requesting-td27651.html
> >
> >
> > -
> > Denis
> >
> >
> > On Tue, Mar 26, 2019 at 9:04 AM Andrey Gura wrote:
> >
> > > CleanupWorker termination can lead to the following effects:
> > >
> > > - Queries can retrieve data that should have expired, so the application
> > > will behave incorrectly.
> > > - Memory and/or disk can overflow because entries weren't expired.
> > > - Performance degradation is possible due to unmanageable data set growth.
> > >
> > > On Tue, Mar 26, 2019 at 4:58 PM Roman Shtykh wrote:
> > > >
> > > > Vyacheslav, if you are talking about this particular case I described,
> > > > I believe it has no influence on PME. What could happen is having the
> > > > CleanupWorker thread dead (which is not good either). But I believe we
> > > > are talking in a wider scope.
> > > >
> > > > -- Roman
> > > >
> > > >
> > > > On Tuesday, March 26, 2019, 10:23:30 p.m. GMT+9, Vyacheslav Daradur
> > > > <daradu...@gmail.com> wrote:
> > > >
> > > > In general I agree with Andrey, the handler is very useful in itself.
> > > > It lets us learn that ‘GridDhtInvalidPartitionException’ is not
> > > > processed properly in the PME process by the worker.
> > > >
> > > > Nikolay, look at the code: if the Failure Handler handles an exception,
> > > > this means that the while-true loop in the worker’s body has been
> > > > interrupted by an unexpected exception and the thread has completed its
> > > > lifecycle.
> > > >
> > > > Without the Failure Handler, in the current case, the cluster would hang
> > > > because of the inability to participate in the PME process.
> > > >
> > > > So, the problem is the incorrect handling of the exception in PME’s
> > > > task, which should be fixed.
> > > >
> > > >
> > > > > Tue, 26 Mar 2019 at 14:24, Andrey Kuznetsov:
> > > >
> > > > > Nikolay,
> > > > >
> > > > > Feel free to suggest better error messages to indicate
> > > internal/critical
> > > > > failures. User actions in response to critical failures are rather
> > > limited:
> > > > > mail to user-list or maybe file an issue. As for repetitive
> warnings,
> > > it
> > > > > makes sense, but requires additional stuff to deliver such signals,
> > > mere
> > > > > spamming to log will not have an effect.
> > > > >
> > > > > Anyway, when experienced committers suggest to disable failure
> > > handling and
> > > > > hide existing issues, I feel as if they are pulling my leg.
> > > > >
> > > > > Best regards,
> > > > > Andrey Kuznetsov.
> > > > >
> > > > > > Tue, 26 Mar 2019, 13:30 Nikolay Izhikov nizhi...@apache.org:
> > > > >
> > > > > > Andrey.
> > > > > >
> > > > > > >  the thread can be made non-critical, and we can restart it
> every
> > > time
> > > > > it
> > > > > > dies
> > > > > >
> > > > > > Why we can't restart critical thread?
> > > > > > What is the root difference between critical and non critical
> > > threads?
> > > > > >
> > > > > > > It's much simpler to catch and handle all ex

Re: GridDhtInvalidPartitionException takes the cluster down

2019-03-26 Thread Andrey Kuznetsov

Nikolay,

Feel free to suggest better error messages to indicate internal/critical
failures. User actions in response to critical failures are rather limited:
mail the user list or maybe file an issue. As for repeated warnings, the idea
makes sense, but it requires additional machinery to deliver such signals;
merely spamming the log will have no effect.

Anyway, when experienced committers suggest disabling failure handling and
hiding existing issues, I feel as if they are pulling my leg.

Best regards,
Andrey Kuznetsov.

Tue, 26 Mar 2019, 13:30 Nikolay Izhikov nizhi...@apache.org:

> Andrey.
>
> > the thread can be made non-critical, and we can restart it every time it
> > dies
>
> Why can't we restart a critical thread?
> What is the fundamental difference between critical and non-critical threads?
>
> > It's much simpler to catch and handle all exceptions in critical threads
>
> I don't agree with you.
> We develop Ignite not because it is simple!
> We must spend extra time to make it robust and resilient to failures.
>
> > Failure handling is a last-chance tool that reveals internal Ignite
> > errors
> > 100% agree with you: overcome, but not hide.
>
> Logging a stack trace with a proper explanation is not hiding.
> Killing nodes and the whole cluster is not "handling".
>
> > As far as I see from user-list messages, our users are qualified enough
> > to provide necessary information from their cluster-wide logs.
>
> We shouldn't develop our product only for users who are able to read Ignite
> sources to decipher the failure reason behind "starvation in striped pool".
>
> Some of my questions remain unanswered :) :
>
> 1. How can a user know it's an Ignite bug? Where should this bug be reported?
> 2. Do we log it somewhere?
> 3. Do we warn the user several times before shutdown?
> 4. "starvation in striped pool": I think it's not a clear error message.
> Let's make it more specific!
> 5. Let's write to the user log what he or she should do to prevent this
> error in the future.
>
>
> Tue, 26 Mar 2019 at 12:13, Andrey Kuznetsov:
>
> > Nikolay,
> >
> > >  Why we can't restart some thread?
> > Technically, we can. It's just matter of design: the thread can be made
> > non-critical, and we can restart it every time it dies. But such design
> > looks poor to me. It's much simpler to catch and handle all exceptions in
> > critical threads. Failure handling is a last-chance tool that reveals
> > internal Ignite errors. It's not pleasant for us when users see these
> > errors, but it's better than hiding.
> >
> > >  Actually, distributed systems are designed to overcome some bugs,
> thread
> > failure, node failure, for example, isn't it?
> > 100% agree with you: overcome, but not hide.
> >
> > >  How user can know it's a bug? Where this bug should be reported?
> > As far as I see from user-list messages, our users are qualified enough
> to
> > provide necessary information from their cluster-wide logs.
> >
> >
> > Tue, 26 Mar 2019 at 11:19, Nikolay Izhikov:
> >
> > > Andrey.
> > >
> > > > As for SYSTEM_WORKER_TERMINATION, it's unrecoverable, there is no use
> > to
> > > wait for dead thread's magical resurrection.
> > >
> > > Why is it unrecoverable?
> > > Why we can't restart some thread?
> > > Is there some kind of nature limitation to not restart system thread?
> > >
> > > Actually, distributed systems are designed to overcome some bugs,
> thread
> > > failure, node failure, for example, isn't it?
> > > > if under some circumstances node> stop leads to cascade cluster
> crash,
> > > then it's a bug
> > >
> > > How user can know it's a bug? Where this bug should be reported?
> > > Do we log it somewhere?
> > > Do we warn user before shutdown one or several times?
> > >
> > > This feature kills user experience literally now.
> > >
> > > If I would be a user of the product that just shutdown with poor log I
> > > would throw this product away.
> > > Do we want it for Ignite?
> > >
> > > From SO discussion I see following error message: ": >>> Possible
> > > starvation in striped pool."
> > > Are you sure this message are clear for Ignite user(not Ignite hacker)?
> > > What user should do to prevent this error in future?
> > >
> > > > On Tue, 26/03/2019 at 10:10 +0300, Andrey Kuznetsov wrote:
> > > > By default, SYSTEM_WORKER_BLOCKED failure type is not handled. I
> don't
> > > like
> > > > this behavior, but it m

Re: GridDhtInvalidPartitionException takes the cluster down

2019-03-26 Thread Andrey Kuznetsov

Nikolay,

>  Why we can't restart some thread?
Technically, we can. It's just a matter of design: the thread can be made
non-critical, and we can restart it every time it dies. But such a design
looks poor to me. It's much simpler to catch and handle all exceptions in
critical threads. Failure handling is a last-chance tool that reveals
internal Ignite errors. It's not pleasant for us when users see these
errors, but it's better than hiding them.

>  Actually, distributed systems are designed to overcome some bugs, thread
>  failure, node failure, for example, isn't it?
100% agree with you: overcome, but not hide.

>  How user can know it's a bug? Where this bug should be reported?
As far as I can see from user-list messages, our users are qualified enough to
provide the necessary information from their cluster-wide logs.
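
To make the "catch and handle in the critical thread" point concrete, here is
a minimal sketch. It is not Ignite's actual worker code: the class, the
RecoverableException type and the callback are hypothetical names invented for
illustration. Expected exceptions are handled inside the loop, and only an
unexpected one ends the loop and reaches the last-chance handler.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical model of a critical worker loop; not Ignite's GridWorker. */
class CriticalWorkerSketch {
    /** Stand-in for an exception the worker knows how to recover from. */
    static class RecoverableException extends RuntimeException {
        RecoverableException(String msg) {
            super(msg);
        }
    }

    /**
     * Runs tasks in order. A recoverable exception is handled in place and the
     * loop continues; any other throwable ends the loop and is passed to the
     * last-chance handler.
     *
     * @return Number of tasks that completed normally.
     */
    static int runLoop(List<Runnable> tasks, List<String> log, Consumer<Throwable> lastChanceHnd) {
        int done = 0;

        try {
            for (Runnable task : tasks) {
                try {
                    task.run();

                    done++;
                }
                catch (RecoverableException e) {
                    // Expected problem: record it and keep the worker alive.
                    log.add("skipped: " + e.getMessage());
                }
            }
        }
        catch (Throwable t) {
            // Unexpected problem: the worker terminates, so report it loudly.
            lastChanceHnd.accept(t);
        }

        return done;
    }
}
```

With this shape the last-chance handler fires only for errors nobody
anticipated, which is exactly the class of errors we should not hide.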


Tue, 26 Mar 2019 at 11:19, Nikolay Izhikov:

> Andrey.
>
> > As for SYSTEM_WORKER_TERMINATION, it's unrecoverable, there is no use to
> > wait for dead thread's magical resurrection.
>
> Why is it unrecoverable?
> Why can't we restart a thread?
> Is there some inherent limitation that prevents restarting a system thread?
>
> Actually, distributed systems are designed to overcome some bugs, such as
> thread failure or node failure, aren't they?
> > if under some circumstances node stop leads to cascade cluster crash,
> > then it's a bug
>
> How can a user know it's a bug? Where should this bug be reported?
> Do we log it somewhere?
> Do we warn the user one or several times before shutdown?
>
> This feature literally kills the user experience now.
>
> If I were a user of a product that just shut down with a poor log, I
> would throw this product away.
> Do we want that for Ignite?
>
> From the SO discussion I see the following error message: ": >>> Possible
> starvation in striped pool."
> Are you sure this message is clear to an Ignite user (not an Ignite hacker)?
> What should the user do to prevent this error in the future?
>
> On Tue, 26/03/2019 at 10:10 +0300, Andrey Kuznetsov wrote:
> > By default, SYSTEM_WORKER_BLOCKED failure type is not handled. I don't like
> > this behavior, but it may be useful sometimes: "frozen" threads have a
> > chance to become active again after load decreases. As for
> > SYSTEM_WORKER_TERMINATION, it's unrecoverable, there is no use to wait for
> > dead thread's magical resurrection. Then, if under some circumstances node
> > stop leads to cascade cluster crash, then it's a bug, and it should be
> > fixed. Once and for all. Instead of hiding the flaw we have in the product.
> >
> > Tue, 26 Mar 2019 at 09:17, Roman Shtykh:
> >
> > > + 1 for having the default settings revisited.
> > > I understand Andrey's reasonings, but sometimes taking nodes down is
> too
> > > radical (as in my case it was GridDhtInvalidPartitionException which
> could
> > > be ignored for a while when rebalancing <- I might be wrong here).
> > >
> > > -- Roman
> > >
> > >
> > > On Tuesday, March 26, 2019, 2:52:14 p.m. GMT+9, Denis Magda <
> > > dma...@apache.org> wrote:
> > >
> > > Nikolay,
> > > Thanks for kicking off this discussion. Surprisingly, planned to start
> a
> > > similar one today and incidentally came across this thread.
> > > Agree that the failure handler should be off by default or the default
> > > settings have to be revisited. That's true that people are complaining
> of
> > > nodes shutdowns even on moderate workloads. For instance, that's the
> most
> > > recent feedback related to slow checkpointing:
> > >
> https://stackoverflow.com/questions/55299337/stripped-pool-starvation-in-wal-writing-causes-node-cluster-node-failure
> > >
> > > At a minimum, let's consider the following:
> > >- A failure handler needs to provide hints on how to come around the
> > > shutdown in the future. Take the checkpointing SO thread above. It's
> > > unclear from the logs how to prevent the same situation next time
> (suggest
> > > parameters for tuning, flash drives, etc).
> > >- Is there any protection for a full cluster restart? We need to
> > > distinguish a slow cluster from the stuck one. A node removal should
> not
> > > lead to a meltdown of the whole storage.
> > >- Should we enable the failure handler for things like transactions
> or
> > > PME and have it off for checkpointing and something else? Let's have it
> > > enabled for cases when we are 100% certain that a node shutdown is the
> > > right thing and print out warnings with suggestions whenever we'r

Re: GridDhtInvalidPartitionException takes the cluster down

2019-03-26 Thread Andrey Kuznetsov

By default, the SYSTEM_WORKER_BLOCKED failure type is not handled. I don't like
this behavior, but it may be useful sometimes: "frozen" threads have a
chance to become active again after the load decreases. As for
SYSTEM_WORKER_TERMINATION, it's unrecoverable: there is no use waiting for a
dead thread's magical resurrection. And if under some circumstances a node
stop leads to a cascading cluster crash, then it's a bug, and it should be
fixed once and for all, instead of hiding the flaw we have in the product.
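
The default policy described above can be modelled in a few lines. This is
only a sketch under stated assumptions: the class and the Action enum are
hypothetical and are not Ignite's failure handler API (in Ignite itself a
handler is installed through IgniteConfiguration.setFailureHandler). The two
failure type names are the ones discussed in this thread.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of a failure-handling policy; not Ignite's FailureHandler API. */
class FailurePolicySketch {
    /** The two failure types named in this thread. */
    enum FailureType { SYSTEM_WORKER_BLOCKED, SYSTEM_WORKER_TERMINATION }

    /** What the node does in response to a failure (assumed for illustration). */
    enum Action { LOG_ONLY, STOP_NODE }

    /** Failure types that are reported but do not stop the node. */
    private final Set<FailureType> ignored;

    FailurePolicySketch(Set<FailureType> ignored) {
        // Copy defensively; noneOf + addAll also works for an empty input set.
        this.ignored = EnumSet.noneOf(FailureType.class);
        this.ignored.addAll(ignored);
    }

    /** Default described above: a blocked worker is ignored, a dead one is not. */
    static FailurePolicySketch defaults() {
        return new FailurePolicySketch(EnumSet.of(FailureType.SYSTEM_WORKER_BLOCKED));
    }

    /** Decides how to react to a failure of the given type. */
    Action onFailure(FailureType type) {
        return ignored.contains(type) ? Action.LOG_ONLY : Action.STOP_NODE;
    }
}
```

Passing an empty set to the constructor gives the stricter variant I would
prefer, where a blocked worker stops the node as well.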

Tue, 26 Mar 2019 at 09:17, Roman Shtykh:

> + 1 for having the default settings revisited.
> I understand Andrey's reasonings, but sometimes taking nodes down is too
> radical (as in my case it was GridDhtInvalidPartitionException which could
> be ignored for a while when rebalancing <- I might be wrong here).
>
> -- Roman
>
>
> On Tuesday, March 26, 2019, 2:52:14 p.m. GMT+9, Denis Magda <
> dma...@apache.org> wrote:
>
>  Nikolay,
> Thanks for kicking off this discussion. Surprisingly, planned to start a
> similar one today and incidentally came across this thread.
> Agree that the failure handler should be off by default or the default
> settings have to be revisited. That's true that people are complaining of
> nodes shutdowns even on moderate workloads. For instance, that's the most
> recent feedback related to slow checkpointing:
> https://stackoverflow.com/questions/55299337/stripped-pool-starvation-in-wal-writing-causes-node-cluster-node-failure
>
> At a minimum, let's consider the following:
> - A failure handler needs to provide hints on how to come around the
>   shutdown in the future. Take the checkpointing SO thread above. It's
>   unclear from the logs how to prevent the same situation next time (suggest
>   parameters for tuning, flash drives, etc).
> - Is there any protection for a full cluster restart? We need to
>   distinguish a slow cluster from the stuck one. A node removal should not
>   lead to a meltdown of the whole storage.
> - Should we enable the failure handler for things like transactions or
>   PME and have it off for checkpointing and something else? Let's have it
>   enabled for cases when we are 100% certain that a node shutdown is the
>   right thing and print out warnings with suggestions whenever we're not
>   confident that the removal is appropriate.
> --Denis
>
> On Mon, Mar 25, 2019 at 5:52 AM Andrey Gura  wrote:
>
> Failure handlers were introduced in order to avoid cluster hanging and
> they kill nodes instead.
>
> If critical worker was terminated by GridDhtInvalidPartitionException
> then your node is unable to work anymore.
>
> Unexpected cluster shutdown with reasons in logs that failure handlers
> provide is better than hanging. So answer is NO. We mustn't disable
> failure handlers.
>
> On Mon, Mar 25, 2019 at 2:47 PM Roman Shtykh 
> wrote:
> >
> > If it sticks to the behavior we had before introducing the failure handler,
> > I think it's better to have it disabled instead of killing the whole
> > cluster, as in my case, and create a parent issue for those ten bugs.
> > Pavel, thanks for the suggestion!
> >
> >
> >
> > On Monday, March 25, 2019, 7:07:20 p.m. GMT+9, Nikolay Izhikov <
> nizhi...@apache.org> wrote:
> >
> >  Guys.
> >
> > We should fix the SYSTEM_WORKER_TERMINATION once and for all.
> > Seems, we have ten or more "cluster shutdown" bugs with this subsystem
> > since it was introduced.
> >
> > Should we disable it by default in 2.7.5?
> >
> >
> > Mon, 25 Mar 2019 at 13:04, Pavel Kovalenko:
> >
> > > Hi Roman,
> > >
> > > I think this InvalidPartition case can be simply handled
> > > in GridCacheTtlManager.expire method.
> > > For workaround a custom FailureHandler can be configured that will not
> stop
> > > a node in case of such exception is thrown.
> > >
> > > Mon, 25 Mar 2019 at 08:38, Roman Shtykh:
> > >
> > > > Igniters,
> > > >
> > > > Restarting a node when injecting data and having it expired, results
> at
> > > > GridDhtInvalidPartitionException which terminates nodes with
> > > > SYSTEM_WORKER_TERMINATION one by one taking the whole cluster down.
> This
> > > is
> > > > really bad and I didn't find the way to save the cluster from
> > > disappearing.
> > > > I created a JIRA issue
> > > https://issues.apache.org/jira/browse/IGNITE-11620
> > > > with a test case. Any clues how to fix this inconsistency when
> > > rebalancing?
> > > >
> > > > -- Roman
> > > >
> > >
>
>



-- 
Best regards,
  Andrey Kuznetsov.

Re: Ignite 2.7.5 Release scope

2019-03-25 Thread Andrey Kuznetsov

Roman, I think the worst thing we can do is to hide the bug you discovered.
The sane options are either to fix it urgently or to classify it as
non-critical and postpone it.

Tue, 26 Mar 2019 at 05:13, Roman Shtykh:

> Guys, what do you think about disabling SYSTEM_WORKER_TERMINATION
> (introduced with IEP-14) before "cluster shutdown" bugs are fixed, as
> suggested by Nikolay I. in "GridDhtInvalidPartitionException takes the
> cluster down" thread?
>
> -- Roman
>
>
> On Tuesday, March 26, 2019, 3:41:29 a.m. GMT+9, Dmitriy Pavlov <
> dpav...@apache.org> wrote:
>
>  Hi Ignite Developers,
>
> So because nobody raised any feature I would like to call for scope freeze
> for 2.7.5.
>
> The scope is limited with corruption fix, Java 11 issues addressed.
> https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+2.7.5
>
> Also, launch scripts will be tested for Java 12.
>
> We entered the Rampdown phase. See more info in
> https://cwiki.apache.org/confluence/display/IGNITE/Release+Process
>
> Issues can be added to the scope only through discussion.
>
> Sincerely,
> Dmitriy Pavlov
>
> Mon, 25 Mar 2019 at 11:24, Ilya Kasnacheev:
>
> > Hello!
> >
> > It seems that I can no longer test this case, on account of
> >
> >
> TcpDiscoveryCoordinatorFailureTest#testClusterFailedNewCoordinatorInitialized
> > hanging every time under Java 11 on Windows.
> >
> > Alexey, Ivan, can you please take a look?
> >
> >
> >
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_SpiWindows=buildTypeStatusDiv_IgniteTests24Java8=__all_branches__
> >
> > Regards,
> >
> > --
> > Ilya Kasnacheev
> >
> >
> > Fri, 22 Mar 2019 at 16:59, Ilya Kasnacheev:
> >
> > > Hello!
> > >
> > > Basically there is a test that explicitly highlights this problem, that
> > is
> > > running SSL tests on Windows + Java 11. They will hang on Master but
> pass
> > > with this patch.
> > >
> > > I have started that on TC, results will probably be available later
> > today:
> > >
> > >
> >
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_SpiWindows=buildTypeStatusDiv_IgniteTests24Java8=__all_branches__
> > > (mind the Java version).
> > >
> > > Regards,
> > > --
> > > Ilya Kasnacheev
> > >
> > >
> > > Fri, 22 Mar 2019 at 14:13, Maxim Muzafarov:
> > >
> > >> Dmitry, Ilya,
> > >>
> > >> Yes, I've looked through those changes [1] as they can affect my local
> > >> PR.  Basically, changes look good to me.
> > >>
> > >> I'm not an expert with CommunicationSpi component, so can miss some
> > >> details and I haven't tested these changes under Java 11. One more
> > >> thing I'd like to say, I would add additional tests to PR that will
> > >> explicitly highlight the problem being solved.
> > >>
> > >>
> > >> [1] https://issues.apache.org/jira/browse/IGNITE-11299
> > >>
> > >> On Thu, 21 Mar 2019 at 22:57, Dmitriy Pavlov 
> > wrote:
> > >> >
> > >> > Hi Igniters,
> > >> >
> > >> > fix https://issues.apache.org/jira/browse/IGNITE-11299 Avoid busy
> > wait
> > >> on
> > >> > processWrite during SSL handshake.
> > >> > seems to be blocker cause it is related to Java 11
> > >> >
> > >> > I see Maxim M left some comments. Ilya K., Maxim M.were these
> comments
> > >> > addressed?
> > >> >
> > >> > The ticket is in Patch Available. Reviewer needed. Changes located
> in
> > >> > GridNioServer.
> > >> >
> > >> > Sincerely,
> > >> > Dmitriy Pavlov
> > >> >
> > >> > P.S. a quite obvious ticket came to sope, as well:
> > >> > https://issues.apache.org/jira/browse/IGNITE-11600
> > >> >
> > >> >
> > >> > Thu, 21 Mar 2019 at 16:55, Petr Ivanov:
> > >> >
> > >> > > Huge +1
> > >> > >
> > >> > > Will try to add new JDK in nearest time to our Teamcity.
> > >> > >
> > >> > >
> > >> > > > On 21 Mar 2019, at 16:27, Dmitriy Pavlov 
> > >> wrote:
> > >> > > >
> > >> > > > Hi Igniters,
> > >> > > >
> > >> > > > Meanwhile, Java 12 GA is available. I suggest at least test our
> > new
> > >> tests
> > >> > > > scripts with a couple of Java builds. WDYT?
> > >> > > >
> > >> > > > Sincerely,
> > >> > > > Dmitriy Pavlov
> > >> > > >
> > >> > > > Wed, 20 Mar 2019 at 19:21, Dmitriy Pavlov:
> > >> > > >
> > >> > > >> Hi Ignite Developers,
> > >> > > >>
> > >> > > >> In a separate discussion, I've shared a log with all commits.
> > >> > > >>
> > >> > > >> As far as I can see, nobody removed commits from this sheet, so
> > the
> > >> > > scope
> > >> > > >> of release will be discussed in another way: only explicitly
> > >> declared
> > >> > > >> commits will be cherry-picked.
> > >> > > >>
> > >> > > >> Sincerely,
> > >> > > >> Dmitriy Pavlov
> > >> > > >>
> > >> > >
> > >> > >
> > >>
> > >
> >



-- 
Best regards,
  Andrey Kuznetsov.

Re: how to get zipkin tracing of REST calls from one microservice to all microservices calls from any task defined in apache ignite

2019-03-14 Thread Andrey Kuznetsov

Hi!

As far as I know, Ignite does not support Zipkin trace propagation out of
the box, unlike Spring Cloud. Hence the request handler in {{rest-http}} loses
the tracing-context-specific headers (see [1]) sent with the request, and the
trace gets broken into two parts. This can be worked around by
replacing {{rest-http}} with a Spring Cloud request handler, which in turn
should execute the Ignite compute task explicitly.

[1]
https://cloud.spring.io/spring-cloud-sleuth/single/spring-cloud-sleuth.html#_propagation
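
For illustration only, this is roughly what "keeping the tracing headers"
means. The helper below is hypothetical and belongs to neither Ignite nor
Spring Cloud Sleuth; the header names are the standard B3 propagation headers.
A custom request handler would pick them up from the incoming request and
attach them to the outgoing calls made by the task.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper; not part of Ignite or Spring Cloud Sleuth. */
class B3PropagationSketch {
    /** B3 propagation header names, lower-cased for case-insensitive matching. */
    private static final String[] B3_HEADERS = {
        "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid", "x-b3-sampled", "x-b3-flags"
    };

    /**
     * Extracts the B3 headers from an incoming request so that they can be
     * re-attached to outgoing calls. Keys of the result are lower-cased.
     */
    static Map<String, String> extractB3(Map<String, String> incoming) {
        Map<String, String> out = new LinkedHashMap<>();

        for (Map.Entry<String, String> e : incoming.entrySet()) {
            // HTTP header names are case-insensitive, so normalize before comparing.
            String name = e.getKey().toLowerCase(Locale.ROOT);

            for (String b3 : B3_HEADERS) {
                if (b3.equals(name))
                    out.put(b3, e.getValue());
            }
        }

        return out;
    }
}
```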

Thu, 14 Mar 2019 at 09:16, Ivan Pavlukhin:

> Hi,
>
> As far as I remember zipkin defines tracing units in a current thread
> of execution. I cannot say for sure what goes wrong in your case. But
> it might be that traced execution on ser3 side switches from one
> thread to another and you see 2 units as a result.
>
> Thu, 14 Mar 2019 at 02:09, Aditya Kumar:
> >
> >
> > Hi Team,
> >
> > I was using Ignite as a dependency in our application and was able to
> > trace microservice calls end to end.
> > Then, to let Ignite handle our services in compute tasks, we removed all
> > spring-boot dependencies and created a task for each service we had in our
> > microservice.
> >
> > The issue we are facing is explained using the POC below, done at our end.
> >
> >
> > We created a sample app, deployed in Ignite, where the following was done:
> > 1. created the beans and interceptors needed to start and track Zipkin and
> >    Brave traces
> > 2. created a task using org.apache.ignite.compute.ComputeTaskAdapter
> > 3. registered the task in the config file used when starting the Ignite
> >    server
> >
> > Below is the list of services (each service exposes a REST endpoint)
> > created to test this scenario:
> > 1. ms1 (spring boot app)
> > 2. ser2 (spring mvc app having a rest endpoint to serve the incoming
> >    request)
> > 3. ser3 (spring mvc app using Ignite ComputeTaskAdapter to serve the
> >    incoming request from '/ignite'; Ignite's ignite-rest-http.jar is used
> >    to enable the '/ignite' endpoint)
> > 4. ms4 (spring boot app)
> > 5. ms5 (spring boot app)
> >
> > Then, there were two execution scenarios:
> > Case1. ms1 -> ser2 -> ms4 -> ms5   ==> we get a single unit of tracing in
> >        Zipkin from ms1 to ms5 (i.e. ms1 -> ser2 -> ms4 -> ms5)
> > Case2. ms1 -> ser3 -> ms4 -> ms5   ==> we get two units of tracing in
> >        Zipkin: one is ms1 -> ser3 and the other is ser3 -> ms4 -> ms5
> >
> > I need to get a single unit of tracing in Zipkin for the Case2
> > execution (i.e. as we get in Case1).
> > The sample app (ser3) is checked-in at
> https://github.com/aditya2910/adzzz1/tree/master/ignite-rest-task
> >
> > Any help will be appreciated.
> > Please let me know incase you need any more info.
> >
> > Thanks,
> > Aditya
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>


-- 
Best regards,
  Andrey Kuznetsov.

[jira] [Created] (IGNITE-11286) Add console prompt for password in control.(sh|bat) script

2019-02-11 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-11286:
-

 Summary: Add console prompt for password in control.(sh|bat) script
 Key: IGNITE-11286
 URL: https://issues.apache.org/jira/browse/IGNITE-11286
 Project: Ignite
  Issue Type: Task
Affects Versions: 2.7
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov


For security reasons, we should add an interactive alternative to the
{{--password}} option. Other password-related options already have such an
alternative, see [1].

[1] https://jira.apache.org/jira/browse/IGNITE-10257
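
A rough sketch of the intended behavior, not the actual control script code:
the class and method names are hypothetical. If the password is given on the
command line it is used as is; otherwise the user is prompted on the console
without echo.

```java
import java.io.Console;
import java.util.function.Supplier;

/** Hypothetical sketch; not the actual control.(sh|bat) implementation. */
class PasswordPromptSketch {
    /** Returns the command-line value if present, otherwise asks the prompt. */
    static char[] resolvePassword(String fromArg, Supplier<char[]> prompt) {
        if (fromArg != null)
            return fromArg.toCharArray();

        return prompt.get();
    }

    /** Prompt backed by the JVM console; typed characters are not echoed. */
    static char[] consolePrompt() {
        Console console = System.console();

        // There is no console when input or output is redirected.
        if (console == null)
            throw new IllegalStateException("No interactive console available");

        return console.readPassword("Password: ");
    }
}
```

A caller would use resolvePassword(arg, PasswordPromptSketch::consolePrompt).
Returning char[] rather than String lets the caller wipe the array after use.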



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (IGNITE-11268) Unable to run control.bat or ignite.bat from Ignite source tree with compiled classes

2019-02-08 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-11268:
-

 Summary: Unable to run control.bat or ignite.bat from Ignite 
source tree with compiled classes
 Key: IGNITE-11268
 URL: https://issues.apache.org/jira/browse/IGNITE-11268
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov


Under Windows, the {{control}} and {{ignite}} scripts collect the Java classpath
value from the {{%IGNITE_HOME%\modules\*\target}} directories into a variable.
The length of a batch script variable is limited to 8k characters, which leads
to an error when there are many compiled/packaged modules in a source tree.

Possible (yet imperfect) solutions:
- Limit modules list to some minimal required sublist.
- Create Class-Path-header-only jar "on the fly".
- (Java 9+ only) Generate command-line arguments file, see [1].

[1] 
https://docs.oracle.com/javase/9/tools/java.htm#JSWOR-GUID-4856361B-8BFD-4964-AE84-121F5F6CF111
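
The second option, a Class-Path-header-only jar, could look roughly like the
sketch below. It is an illustration only: the class name, the method and the
way the entry list is obtained are assumptions, not existing Ignite code. The
entries are space-separated URLs, and directory entries must end with a slash.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Hypothetical sketch of building a manifest-only "classpath jar". */
class ClasspathJarSketch {
    /**
     * Writes a jar that contains nothing but a manifest whose Class-Path
     * header lists the given entries (relative or absolute URLs).
     */
    static void write(Path jar, List<String> classPathUrls) throws IOException {
        Manifest mf = new Manifest();

        // Manifest-Version is mandatory; without it the manifest is written empty.
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.CLASS_PATH, String.join(" ", classPathUrls));

        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out, mf)) {
            // No entries besides the manifest are needed.
        }
    }
}
```

The launch script would then pass only this single jar with -cp, so the
variable stays short no matter how many modules have been built.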



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (IGNITE-11267) Print Warn user when keystore password arguments

2019-02-08 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-11267:
-

 Summary: Print Warn user when keystore password arguments
 Key: IGNITE-11267
 URL: https://issues.apache.org/jira/browse/IGNITE-11267
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Unreliable checks in tests for string presence in GridStringLogger contents

2019-02-01 Thread Andrey Kuznetsov

Hi, Sergey!

Your note sounds convincing. +1 for adding a throwing version of {{check}}.

Best regards,
Andrey Kuznetsov.
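
Just to show that both flavors can coexist, here is a minimal sketch. It is
not the real listener from the Ignite test framework: the class and the
method name checkOrThrow are made up, and only the shape of the error message
is borrowed from the snippet quoted below.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical listener; not the actual LogListener from Ignite tests. */
class SubstringListenerSketch {
    private final String substr;
    private final int min;
    private final int max;

    /** Number of logged messages that contained the substring. */
    private final AtomicInteger matches = new AtomicInteger();

    SubstringListenerSketch(String substr, int min, int max) {
        this.substr = substr;
        this.min = min;
        this.max = max;
    }

    /** Feeds one logged message to the listener. */
    void accept(String msg) {
        if (msg.contains(substr))
            matches.incrementAndGet();
    }

    /** Non-throwing check, convenient for polling with a timeout. */
    boolean check() {
        int n = matches.get();

        return n >= min && n <= max;
    }

    /** Throwing check that keeps the descriptive failure message. */
    void checkOrThrow() {
        if (!check()) {
            String exp = min == max ? String.valueOf(min) : "[" + min + ".." + max + "]";

            throw new AssertionError("\"" + substr + "\" matches " + matches.get() +
                " times, expected: " + exp + ".");
        }
    }
}
```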

Tue, 29 Jan 2019, 19:14 Sergey macrerg...@gmail.com:

> Hi!
> I appreciate your efforts in replacing GridStringLogger, just a remark.
> I think it was a mistake to change check() to return boolean.
> I suppose the method should not have been changed; a new one should have
> been added instead, as both methods are useful.
>
> Now we've lost the error description messages that existed in the previous
> implementation.
>
> I mean, if we previously matched specific counts, the following error
> message was returned:
>
> String err = errMsg != null ? errMsg :
>     "\"" + subj + "\" matches " + matchesCnt + " times, expected: " +
>     (exp.getMaximum() == exp.getMinimum() ? exp.getMinimum() : exp) + ".";
>
> But now, in case of an error, we have less information.
>
> Hence, I suppose we should add a new method (the name can be assert()) with
> the old implementation throwing AssertionError.
>
> What do you think?
>
> Best regards,
> Sergey Kosarev.
>
>
> Fri, 26 Oct 2018 at 16:53, Nikita Amelchev:
>
> > Thanks for comments,
> >
> > I have filed a ticket [1] and will implement it if you don't mind.
> >
> > 1. https://issues.apache.org/jira/browse/IGNITE-10023
> > Fri, 26 Oct 2018 at 15:56, Dmitrii Ryabov:
> > >
> > > > waitForCondition(lsnr::check, timeout);
> > > Agree, it is more convenient to use.
> > > Fri, 26 Oct 2018 at 13:01, Pavel Pereslegin:
> > > >
> > > > Nikita,
> > > > personally, I don’t like that "check()" throws an AssertionError, but
> > in
> > > > the case of a composite listener, it will indicate which of the
> > conditions
> > > > did not work.
> > > > Btw, your case can be solved with custom listener, but I think it's
> > good
> > > > improvement, let's do it.
> > > >
> > > > Thu, 25 Oct 2018 at 21:31, Andrey Kuznetsov:
> > > >
> > > > > Nikita,
> > > > >
> > > > > I like your suggestion. It looks more expressive for me than
> existing
> > > > > throwing version.
> > > > >
> > > > > Thu, 25 Oct 2018 at 21:07, Nikita Amelchev:
> > > > >
> > > > > > Hi, Igniters.
> > > > > >
> > > > > > I suggest improving new listening test logger.
> > > > > >
> > > > > > I found usage case when needs wait for conditions for test
> duration
> > > > > > optimization.
> > > > > > For example, that messages A and B will be logged.
> > > > > >
> > > > > > For now, LogListener.check() doesn't return checking result as
> > boolean.
> > > > > > It throws the exception if conditions fail. Code for this case:
> > > > > >
> > > > > > waitForCondition(() -> {
> > > > > >  try {
> > > > > >lsnr.check();
> > > > > >
> > > > > >return true;
> > > > > >  }
> > > > > >  catch (AssertionError ignored) {
> > > > > >return false;
> > > > > >  }
> > > > > > }, timeout);
> > > > > >
> > > > > > For code readability, I suggest make LogListener.check() with
> > boolean
> > > > > type:
> > > > > >
> > > > > > waitForCondition(lsnr::check, timeout);
> > > > > >
> > > > > > Also, it's more understandable when we write explicit assert in
> > tests:
> > > > > > assertTrue("Fail reason.", lsnr.check());
> > > > > > Wed, 23 May 2018 at 14:36, Andrey Kuznetsov:
> > > > > > >
> > > > > > > Thanks, Vyacheslav.
> > > > > > >
> > > > > > > Created the issue [1] based on your idea.
> > > > > > >
> > > > > > > [1]  https://issues.apache.org/jira/browse/IGNITE-8570
> > > > > > >
> > > > > > >
> > > > > > > 2018-05-23 12:41 GMT+03:00 Vyacheslav Daradur <
> > daradu...@gmail.com>:
> > > > > > >
> > > > > > > > Hi, Andrey, I have faced this problem too.
> > > > > > > >
> > > > >

Re: [MTCGA]: new failures in builds [2636079] needs to be handled

2018-12-25 Thread Andrey Kuznetsov

A separate issue has been created to address this: IGNITE-10813.

Best regards,
Andrey Kuznetsov.

Mon, 24 Dec 2018, 19:37 dpavlov.ta...@gmail.com:

> Hi Igniters,
>
>  I've detected some new issue on TeamCity to be handled. You are more than
> welcomed to help.
>
>  If your changes can lead to this failure(s): We're grateful that you were
> a volunteer to make the contribution to this project, but things change and
> you may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate if you wish to continue and
> fix test failures, or step down so that a committer may revert your commit?
>
>  *Recently contributed test failed in master
> CheckpointReadLockFailureTest.initializationError
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-5099637610936041665=%3Cdefault%3E=testDetails
>  Changes may lead to failure were done by
>  - stkuzma
> https://ci.ignite.apache.org/viewModification.html?modId=850850
>
>  - Here's a reminder of what contributors were agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 19:36:56 24-12-2018
>

[jira] [Created] (IGNITE-10813) Run CheckpointReadLockFailureTest with JUnit4 runner

2018-12-25 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-10813:
-

 Summary: Run CheckpointReadLockFailureTest with JUnit4 runner
 Key: IGNITE-10813
 URL: https://issues.apache.org/jira/browse/IGNITE-10813
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov


The test fails on TeamCity. It should be run with the JUnit 4 runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Write access to Apache Ignite Confluence

2018-12-18 Thread Andrey Kuznetsov

Thanks, Dmitriy.

The page stub [1] has now been created.

[1]
https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+3.0+Wishlist

Best regards,
Andrey Kuznetsov.

Tue, 18 Dec 2018, 12:45 Dmitriy Pavlov dpav...@apache.org:

> Hi Andrey,
>
> I've updated permissions, please check.
>
> Sincerely,
> Dmitriy Pavlov
>
> Tue, 18 Dec 2018 at 12:34, Andrey Kuznetsov:
>
> > Hi, Igniters.
> >
> > I'd like to add new wiki page with suggestions for future Ignite 3.0
> > release. We can collect any ideas there and then discuss and refine them
> > on dev-list.
> >
> > Could someone grant me (andrey-kuznetsov) write permission?
> >
>

Write access to Apache Ignite Confluence

2018-12-18 Thread Andrey Kuznetsov

Hi, Igniters.

I'd like to add a new wiki page with suggestions for the future Ignite 3.0
release. We can collect any ideas there and then discuss and refine them on
the dev list.

Could someone grant me (andrey-kuznetsov) write permission?

[GitHub] ignite pull request #4547: Tdr 26

2018-12-10 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/4547


---

[GitHub] ignite pull request #5092: TDR-93 Several WAL compressor fixes.

2018-12-10 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/5092


---

[GitHub] ignite pull request #5219: IGNITE-10079 Fixed inconsistent lastCompactedSegm...

2018-12-10 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/5219


---

[GitHub] ignite pull request #5541: IGNITE-10386 Add mode when WAL won't be disabled ...

2018-12-10 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/5541


---

Re: Default failure handler was changed for tests

2018-12-05 Thread Andrey Kuznetsov

> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Dmitri,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The meaningful failure handler as a default one looks reasonable.
> > > > > > > > > > > > > > > > Thanks a lot.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > But what is the reason to fall back to noop for 100+ tests?
> > > > > > > > > > > > > > > > Does it mean these tests started failing after changing the default failure handler?
> > > > > > > > > > > > > > > > If so, let's create a ticket (maybe an umbrella one) to investigate and fix this.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I see 100+ touched files in the PR and some of them are abstract classes, so
> > > > > > > > > > > > > > > > we have many more affected tests.
> > > > > > > > > > > > > > > > It seems most failover tests don't expect any critical internal issue to
> > > > > > > > > > > > > > > > occur, so there is no need to fall back to noop.
> > > > > > > > > > > > > > > > Other tests should set a custom failure handler to detect expected failures,
> > > > > > > > > > > > > > > > or if grid hanging simulation is needed (to keep the hung grid under control).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Dec 5, 2018 at 12:16 PM Anton Vinogradov <a...@apache.org> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Dmitrii,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > No-op means "hide any problem", so we lose the guarantees.
> > > > > > > > > > > > > > > > > Could you please share some examples where "no-op" is better than "strict
> > > > > > > > > > > > > > > > > try-catch with a check"?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Dec 5, 2018 at 11:37 AM Dmitrii Ryabov <somefire...@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Anton, I think wrapping every disconnecting node with try-catch will be
> > > > > > > > > > > > > > > > > > less readable than a no-op handler.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Wed, Dec 5, 2018, 9:26 Dmitriy Pavlov dpav...@apache.org:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Folks, let me remind you that Dmitry changed the default of ALL tests from
> > > > > > > > > > > > > > > > > > > noop to a meaningful handler. So we should start every message here by
> > > > > > > > > > > > > > > > > > > saying thank you to Dmitry.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Please review the remaining tests and remove noop where possible.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Tue, Dec 4, 2018, 23:48 Andrey Mashenkov <andrey.mashen...@gmail.com>:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Really, why noop?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > If you expect the failure handler to be triggered, you can override the
> > > > > > > > > > > > > > > > > > > > default one and raise some flag, which can be checked in the test.
> > > > > > > > > > > > > > > > > > > > This will make the test clearer.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > With noop, you'll get the previous unwanted behavior that you are trying
> > > > > > > > > > > > > > > > > > > > to improve, isn't it?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Dec 4, 2018 23:25, user "Anton Vinogradov" <a...@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > And you have to check the reason of the failure inside the try-catch
> > > > > > > > > > > > > > > > > > > > block, of course.
> > > > > > > > > > > > > > > > > > > > In case the one found is not equal to the expected one, the test should
> > > > > > > > > > > > > > > > > > > > rethrow the exception.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Tue, Dec 4, 2018 at 23:21, Anton Vinogradov <a...@apache.org>:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Dmitrii,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > The solution is not clear to me.
> > > > > > > > > > > > > > > > > > > > > In case you expect the failure, the correct way is to wrap it with a
> > > > > > > > > > > > > > > > > > > > > try-catch block instead of using a no-op failure handler.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Tue, Dec 4, 2018 at 21:41, Dmitrii Ryabov <somefire...@gmail.com>:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >> Anton,
> > > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > > >> Tests in these classes check fail cases when we expect a critical
> > > > > > > > > > > > > > > > > > > > >> failure like a node stop or a thrown exception. Such tests trigger the
> > > > > > > > > > > > > > > > > > > > >> failure handler and it fails the test when everything goes as it should.
> > > > > > > > > > > > > > > > > > > > >> That's why we need the no-op handler here.
> > > > > > > > > > > > > > > > > > > > >> Tue, Dec 4, 2018 at 20:06, Dmitriy Pavlov <dpav...@apache.org>:
> > > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > > >> > Hi Igniters,
> > > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > > >> > BTW, if you find that any of your tests doesn't need the old value of
> > > > > > > > > > > > > > > > > > > > >> > the handler (=NoOp), feel free to remove it.
> > > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > > >> > Sincerely,
> > > > > > > > > > > > > > > > > > > > >> > Dmitriy Pavlov
> > > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > > >> > Tue, Dec 4, 2018 at 20:02, Anton Vinogradov <a...@apache.org>:
> > > > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > > > >> > > Dmitrii,
> > > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > > >> > > Could you please explain the reason for explicitly setting 100+
> > > > > > > > > > > > > > > > > > > > >> > > NoOpFailureHandlers?
> > > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > > >> > > Tue, Dec 4, 2018 at 19:12, Dmitrii Ryabov <somefire...@gmail.com>:
> > > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > > >> > > > Hello, Igniters!
> > > > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > > > >> > > > Today the test framework's default no-op failure handler was
> > > > > > > > > > > > > > > > > > > > >> > > > replaced with a handler that stops the node and fails the test.
> > > > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > > > >> > > > Over 100 tests kept the no-op failure handler by overriding the
> > > > > > > > > > > > > > > > > > > > >> > > > `getFailureHandler()` method.
> > > > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > > > >> > > > If you find a problem or something unexpected, write here or in
> > > > > > > > > > > > > > > > > > > > >> > > > the ticket [1].
> > > > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > > > >> > > > [1] https://issues.apache.org/jira/browse/IGNITE-8227
> > > > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > Andrey V. Mashenkov
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best regards,
> > > > > > > > > > Ivan Pavlukhin
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > > Ivan Pavlukhin
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrey V. Mashenkov
> > >
> >
>


-- 
Best regards,
  Andrey Kuznetsov.
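
A minimal sketch of the alternative to a no-op handler suggested in the quoted
thread: a handler that raises a flag which the test then checks. The
SketchFailureHandler interface and all class names below are hypothetical
stand-ins written for illustration; they are not the Ignite API, and the
failure is simulated by calling the handler directly.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Demo entry point: simulates one expected failure and checks the flag. */
class FlagFailureHandlerSketch {
    public static void main(String[] args) {
        FlagFailureHandler hnd = new FlagFailureHandler();

        // Simulate the critical failure the test expects to happen.
        hnd.onFailure("node-0", new IllegalStateException("expected failure"));

        // Unlike a no-op handler, the test can verify what actually happened.
        if (!hnd.triggered() || !(hnd.cause() instanceof IllegalStateException))
            throw new AssertionError("Handler was not triggered as expected");

        System.out.println("Failure observed: " + hnd.cause().getMessage());
    }
}

/** Hypothetical stand-in for a failure handler callback (not the Ignite interface). */
interface SketchFailureHandler {
    /** Returns true if the node should be stopped after the failure. */
    boolean onFailure(String nodeName, Throwable cause);
}

/** Records the failure instead of silently ignoring it. */
class FlagFailureHandler implements SketchFailureHandler {
    private final AtomicBoolean triggered = new AtomicBoolean();

    private volatile Throwable cause;

    @Override public boolean onFailure(String nodeName, Throwable cause) {
        // Store the cause first so that a reader who sees the flag also sees the cause.
        this.cause = cause;
        triggered.set(true);

        return false; // Keep the node running; the test inspects the flag itself.
    }

    boolean triggered() {
        return triggered.get();
    }

    Throwable cause() {
        return cause;
    }
}
```

With such a handler a test can assert both that the expected failure occurred
and that its cause is the expected one, which a no-op handler cannot show.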

Re: Default failure handler was changed for tests

2018-12-05 Thread Andrey Kuznetsov

n PR and some of them are abstract classes, so
> > > > > > > > > > > > > > we have many more affected tests.
> > > > > > > > > > > > > > It seems most failover tests don't expect any critical internal issue to
> > > > > > > > > > > > > > occur, so there is no need to fall back to noop.
> > > > > > > > > > > > > > Other tests should set a custom failure handler to detect expected failures,
> > > > > > > > > > > > > > or if grid hanging simulation is needed (to keep the hung grid under control).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Dec 5, 2018 at 12:16 PM Anton Vinogradov <a...@apache.org> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Dmitrii,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > No-op means "hide any problem", so we lose the guarantees.
> > > > > > > > > > > > > > > Could you please share some examples where "no-op" is better than "strict
> > > > > > > > > > > > > > > try-catch with a check"?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Dec 5, 2018 at 11:37 AM Dmitrii Ryabov <somefire...@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Anton, I think wrapping every disconnecting node with try-catch will be
> > > > > > > > > > > > > > > > less readable than a no-op handler.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Wed, Dec 5, 2018, 9:26 Dmitriy Pavlov dpav...@apache.org:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Folks, let me remind you that Dmitry changed the default of ALL tests from
> > > > > > > > > > > > > > > > > noop to a meaningful handler. So we should start every message here by
> > > > > > > > > > > > > > > > > saying thank you to Dmitry.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Please review the remaining tests and remove noop where possible.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Tue, Dec 4, 2018, 23:48 Andrey Mashenkov <andrey.mashen...@gmail.com>:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Really, why noop?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > If you expect the failure handler to be triggered, you can override the
> > > > > > > > > > > > > > > > > > default one and raise some flag, which can be checked in the test.
> > > > > > > > > > > > > > > > > > This will make the test clearer.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > With noop, you'll get the previous unwanted behavior that you are trying
> > > > > > > > > > > > > > > > > > to improve, isn't it?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Dec 4, 2018 23:25, user "Anton Vinogradov" <a...@apache.org> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > And you have to check the reason of the failure inside the try-catch
> > > > > > > > > > > > > > > > > > block, of course.
> > > > > > > > > > > > > > > > > > In case the one found is not equal to the expected one, the test should
> > > > > > > > > > > > > > > > > > rethrow the exception.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Tue, Dec 4, 2018 at 23:21, Anton Vinogradov <a...@apache.org>:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Dmitrii,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The solution is not clear to me.
> > > > > > > > > > > > > > > > > > > In case you expect the failure, the correct way is to wrap it with a
> > > > > > > > > > > > > > > > > > > try-catch block instead of using a no-op failure handler.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Tue, Dec 4, 2018 at 21:41, Dmitrii Ryabov <somefire...@gmail.com>:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >> Anton,
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >> Tests in these classes check fail cases when we expect a critical
> > > > > > > > > > > > > > > > > > >> failure like a node stop or a thrown exception. Such tests trigger the
> > > > > > > > > > > > > > > > > > >> failure handler and it fails the test when everything goes as it should.
> > > > > > > > > > > > > > > > > > >> That's why we need the no-op handler here.
> > > > > > > > > > > > > > > > > > >> Tue, Dec 4, 2018 at 20:06, Dmitriy Pavlov <dpav...@apache.org>:
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > Hi Igniters,
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > BTW, if you find that any of your tests doesn't need the old value of
> > > > > > > > > > > > > > > > > > >> > the handler (=NoOp), feel free to remove it.
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > Sincerely,
> > > > > > > > > > > > > > > > > > >> > Dmitriy Pavlov
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > Tue, Dec 4, 2018 at 20:02, Anton Vinogradov <a...@apache.org>:
> > > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > > >> > > Dmitrii,
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > Could you please explain the reason for explicitly setting 100+
> > > > > > > > > > > > > > > > > > >> > > NoOpFailureHandlers?
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > Tue, Dec 4, 2018 at 19:12, Dmitrii Ryabov <somefire...@gmail.com>:
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >> > > > Hello, Igniters!
> > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > >> > > > Today the test framework's default no-op failure handler was
> > > > > > > > > > > > > > > > > > >> > > > replaced with a handler that stops the node and fails the test.
> > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > >> > > > Over 100 tests kept the no-op failure handler by overriding the
> > > > > > > > > > > > > > > > > > >> > > > `getFailureHandler()` method.
> > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > >> > > > If you find a problem or something unexpected, write here or in
> > > > > > > > > > > > > > > > > > >> > > > the ticket [1].
> > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > >> > > > [1] https://issues.apache.org/jira/browse/IGNITE-8227
> > > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > Andrey V. Mashenkov
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Ivan Pavlukhin
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ivan Pavlukhin
> > > > >
> > > >
> > >
> >
>


-- 
Best regards,
  Andrey Kuznetsov.

[GitHub] ignite pull request #5541: IGNITE-10386 Add mode when WAL won't be disabled ...

2018-11-30 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/5541

IGNITE-10386 Add mode when WAL won't be disabled during rebalancing caused 
by BLT change

Just for tests now

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-10386

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/5541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5541


commit 69638ce7bcaaade30eba4de800e4f29c31e9144f
Author: Andrey Kuznetsov 
Date:   2018-11-29T13:47:18Z

IGNITE-10386 Suppress IGNITE_DISABLE_WAL_DURING_REBALANCING effect when BLT 
changes.

commit a867fe85d234909cbd2b9a2163bc3482b3c952b3
Author: Andrey Kuznetsov 
Date:   2018-11-29T13:55:45Z

IGNITE-10386 (WiP) Test for change introduced, some other are broken.

commit a9d240dab55e2f05fa8d4c4437c3a12df14604bb
Author: Andrey Kuznetsov 
Date:   2018-11-29T14:57:41Z

IGNITE-10386 (WiP) Fixed testLocalAndGlobalWalStateInterdependence.




---

Re: IGNITE-2.7. New Features

2018-11-02 Thread Andrey Kuznetsov

Great news!

The upcoming release will include liveness monitoring for mission-critical
Ignite workers, introduced in IGNITE-6587.


Fri, Nov 2, 2018 at 13:23, Nikolay Izhikov:

> Hello, Guys.
>
> Good news! We have 2 final tickets for 2.7.
> So the release date is very near!
>
> Let's collect the new features and improvements of Ignite 2.7 and include
> them in the release notes and other documents.
>
> Can you answer and describe your contributions?
>


-- 
Best regards,
  Andrey Kuznetsov.

Re: Abbreviation code-style requirement.

2018-11-02 Thread Andrey Kuznetsov

Ivan, I agree with you: some of our code style rules are really uncommon.

As for one-time contributions, if somebody decides to make a contribution
to some project, it's ok to adopt that project's rules. Moreover, the reviewing
committer can silently fix minor code style issues himself upon merge.

Fri, Nov 2, 2018 at 10:08, Павлухин Иван:

> Andrey, Yakov,
>
> Actually my concern is more about one-time contributions. I imagine
> the following. Someone finds a bug and decides to contribute a fix.
> I think it is quite a common scenario in Open Source.
> He creates a PR and awaits a review. I think that a smooth and fast
> review process will encourage new contributions. But if the review
> process is not like that, the contributor can simply give up.
>
> P.S. In my mind there are quite uncommon code style rules in the Ignite
> project. But that is definitely not for this topic. I imagine some "New
> Contributor Survey".
>
> Thu, Nov 1, 2018 at 18:28, Yakov Zhdanov:
>
> > Ivan, I removed "lic" from the list. Thanks for the catch!
> >
> > Agree with Andrey. After several code reviews newcomers will get used to
> > abbreviations.
> >
> > Andrey, try searching for "fut" and make sure to have "Word" checked. You
> > will see plenty of usages. "f" is also ok for future in case it does not
> > bring confusion and does not hurt readability.
> >
> > Let's keep using abbreviations and treat them as a mandatory requirement.
> > This is important for keeping our codebase consistent and tidy.
> >
> > --Yakov
> >
>
>
> --
> Best regards,
> Ivan Pavlukhin
>


-- 
Best regards,
  Andrey Kuznetsov.

Re: Abbreviation code-style requirement.

2018-11-01 Thread Andrey Kuznetsov

Ivan, I think it's harder to read others' code than to write new code, so
well-known abbreviations may be helpful. As for writing, it's a matter of
habit, and the abbreviation plugin is also a good aid.

I like the current abbreviation list, except 'fut'. I never saw this before
Ignite. 'f' or 'future' would look better to me. Also, futures often denote
some asynchronous action, and it could be more expressive to use the name
of the action as the identifier instead of 'fut'.

Thu, Nov 1, 2018 at 17:46, Павлухин Иван:

> Hi Yakov and all,
>
> Recently I went through the abbreviations list [1] to find items which are
> not clear to me. After the list was shortened by Yakov and others, most of
> them are gone. But pay attention to "lic -> license". I cannot find usages
> of it in the Ignite codebase. Could it be removed as well?
>
> And a little follow-up. I worry about how comfortable contributing is for
> an external contributor in the presence of abbreviation rules. I always
> thought that long names are common practice in the Java world, and our
> abbreviations might distract a typical Java engineer. Does it make any sense?
>
> [1]
>
> https://cwiki.apache.org/confluence/display/IGNITE/Abbreviation+Rules#AbbreviationRules-VariableAbbreviation
>
> Thu, Nov 1, 2018 at 17:33, Dmitriy Pavlov:
>
> > Hi Yakov, thank you for your efforts.
> >
> > I think no one is suggesting to de-abbreviate; it would be pointless work.
> > I think the initial reason to start this discussion was the case when an
> > abbreviation of a multi-word name seemed to hide its meaning. I'm glad we
> > agree that complex multi-word variable names may be left non-abbreviated
> > if that is meaningful.
> >
> > Vyacheslav D.,
> >
> > could you please take a look, and would you like to change the abbrev
> > plugin rules?
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > Thu, Nov 1, 2018 at 17:27, Yakov Zhdanov:
> >
> > > Igniters,
> > >
> > > I have shortened the list of abbreviation rules and edited our wiki page -
> > > https://cwiki.apache.org/confluence/display/IGNITE/Abbreviation+Rules.
> > > Thanks to Vladimir Ozerov and Alexey Goncharuk for their useful feedback.
> > > My idea was to leave only "common sense" abbreviations and those that are
> > > Ignite domain specific.
> > >
> > > I would also suggest that we treat names mentioned in the table on the
> > > page as names that are required to be abbreviated. Please take this into
> > > account when conducting code reviews.
> > >
> > > Thanks!
> > >
> > > --Yakov
> > >
> >
>
>
> --
> Best regards,
> Ivan Pavlukhin
>


-- 
Best regards,
  Andrey Kuznetsov.
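
To illustrate the naming point about 'fut' made above, here is a small sketch
that contrasts the abbreviated name with a name taken from the asynchronous
action. The loadCountAsync() method and the variable names are hypothetical
and serve only this illustration.

```java
import java.util.concurrent.CompletableFuture;

class FutureNamingSketch {
    /** Hypothetical asynchronous action that produces a count. */
    static CompletableFuture<Integer> loadCountAsync() {
        return CompletableFuture.supplyAsync(() -> 42);
    }

    public static void main(String[] args) {
        // Abbreviated name: says only that this is a future.
        CompletableFuture<Integer> fut = loadCountAsync();

        // Name of the action: says what the future is waiting for.
        CompletableFuture<Integer> countLoading = loadCountAsync();

        System.out.println(fut.join() + " " + countLoading.join());
    }
}
```

Both variables behave identically; only the second name tells the reader which
operation is in flight.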

[GitHub] ignite pull request #5219: IGNITE-10079 Fixed inconsistent lastCompactedSegm...

2018-10-31 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/5219

IGNITE-10079 Fixed inconsistent lastCompactedSegment in FileWALMgr.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-10079

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/5219.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5219


commit 93c5ab7aa3a694deb83a97080d745c56d555c85d
Author: Andrey Kuznetsov 
Date:   2018-10-31T07:55:34Z

IGNITE-10079 Fixed inconsistent lastCompactedSegment in FileWALMgr.




---

[jira] [Created] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment

2018-10-31 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-10079:
-

        Summary: FileWriteAheadLogManager may return invalid lastCompactedSegment
            Key: IGNITE-10079
            URL: https://issues.apache.org/jira/browse/IGNITE-10079
        Project: Ignite
     Issue Type: Bug
     Components: persistence
       Reporter: Andrey Kuznetsov
       Assignee: Andrey Kuznetsov
        Fix For: 2.8
    Attachments: WalCompactionAfterRestartTest.java

As of the current {{master}} branch,
{{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after some
segments have actually been compressed. A reproducer is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[GitHub] ignite pull request #5092: TDR-93 Several WAL compressor fixes.

2018-10-26 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/5092

TDR-93 Several WAL compressor fixes.

For test purposes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite tdr-93

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/5092.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5092


commit 32f31efe37eb1dbb450333ae9005a728ab56e722
Author: Andrey Kuznetsov 
Date:   2018-10-26T15:25:34Z

TDR-93 Several WAL compressor fixes.




---

Re: Unreliable checks in tests for string presence in GridStringLogger contents

2018-10-25 Thread Andrey Kuznetsov

Nikita,

I like your suggestion. It looks more expressive to me than the existing
throwing version.

Thu, Oct 25, 2018 at 21:07, Nikita Amelchev:

> Hi, Igniters.
>
> I suggest improving the new listening test logger.
>
> I found a use case where a test needs to wait for conditions in order to
> reduce its duration, for example, until messages A and B are logged.
>
> For now, LogListener.check() doesn't return the check result as a boolean.
> It throws an exception if the conditions fail. Code for this case:
>
> waitForCondition(() -> {
>     try {
>         lsnr.check();
>
>         return true;
>     }
>     catch (AssertionError ignored) {
>         return false;
>     }
> }, timeout);
>
> For code readability, I suggest making LogListener.check() return a boolean:
>
> waitForCondition(lsnr::check, timeout);
>
> Also, it's more understandable when we write an explicit assert in tests:
> assertTrue("Fail reason.", lsnr.check());
> Wed, 23 May 2018 at 14:36, Andrey Kuznetsov :
> >
> > Thanks, Vyacheslav.
> >
> > Created the issue [1] based on your idea.
> >
> > [1]  https://issues.apache.org/jira/browse/IGNITE-8570
> >
> >
> > 2018-05-23 12:41 GMT+03:00 Vyacheslav Daradur :
> >
> > > Hi, Andrey, I have faced this problem too.
> > >
> > > I'd suggest introducing a new logger for tests instead of extending the API
> > > of *GridStringLogger*.
> > >
> > > The new logger should be a kind of *listened* logger, for example with the
> > > following API:
> > >
> > > void addListener(String pattern, CountDownLatch latch);
> > > void addListener(IgniteInClosure lsnr);
> > >
> > > This approach reduces memory load in comparison with *GridStringLogger*.
> > >
> > > Just as an example, these should demonstrate my idea: *listened logger* -
> > > [1], *listener* - [2]:
> > >
> > > [1] https://github.com/apache/ignite/blob/master/modules/compatibility/src/test/java/org/apache/ignite/compatibility/testframework/junits/logger/ListenedGridTestLog4jLogger.java
> > > [2] https://github.com/apache/ignite/blob/master/modules/compatibility/src/test/java/org/apache/ignite/compatibility/testframework/junits/IgniteCompatibilityAbstractTest.java#L304
> > >
> > >
> > >
> > --
> > Best regards,
> >   Andrey Kuznetsov.
>
>
>
> --
> Best wishes,
> Amelchev Nikita
>


-- 
Best regards,
  Andrey Kuznetsov.
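
For reference, the boolean-returning check discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: SimpleLogListener and waitForCondition below are hypothetical stand-ins written for this example, not the actual Ignite test-framework classes, and the polling interval is arbitrary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a listening test logger; not the real Ignite API. */
public class LogListenerSketch {
    /** Collects logged messages and reports whether all expected substrings were seen. */
    static class SimpleLogListener {
        private final List<String> expected;
        private final List<String> logged = new CopyOnWriteArrayList<>();

        SimpleLogListener(String... expected) {
            this.expected = Arrays.asList(expected);
        }

        /** Called by the (imaginary) logger for every message. */
        void accept(String msg) {
            logged.add(msg);
        }

        /** Boolean flavour proposed in the thread: returns the result instead of throwing. */
        boolean check() {
            return expected.stream()
                .allMatch(exp -> logged.stream().anyMatch(msg -> msg.contains(exp)));
        }
    }

    /** Polls the condition until it holds or the timeout (in milliseconds) expires. */
    static boolean waitForCondition(BooleanSupplier cond, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;

        while (!cond.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline)
                return false;

            try {
                Thread.sleep(10);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();

                return false;
            }
        }

        return true;
    }

    public static void main(String[] args) {
        SimpleLogListener lsnr = new SimpleLogListener("message A", "message B");

        // Some other thread logs the messages while the test waits.
        new Thread(() -> {
            lsnr.accept("got message A");
            lsnr.accept("got message B");
        }).start();

        // No try/catch wrapper is needed once check() returns a boolean.
        if (!waitForCondition(lsnr::check, 5_000))
            throw new AssertionError("Fail reason.");

        System.out.println("Both messages were logged");
    }
}
```

With a boolean check() the method reference can be passed to the waiting helper directly, and a test that wants a hard failure can still wrap the call in an explicit assertion.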

[GitHub] ignite pull request #5084: IGNITE-10003 Changed checkpointReadLock timeout f...

2018-10-25 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/5084

IGNITE-10003 Changed checkpointReadLock timeout failure type.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-10003

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/5084.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5084


commit 61b26e4004bf70d40e8aae55422c8e60e11761ac
Author: Andrey Kuznetsov 
Date:   2018-10-25T15:34:02Z

IGNITE-10003 Changed checkpointReadLock timeout failure type.




---

[jira] [Created] (IGNITE-10003) Raise SYSTEM_WORKER_BLOCKED instead of CRITICAL_ERROR when checkpoint read lock timeout detected

2018-10-25 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-10003:
-

 Summary: Raise SYSTEM_WORKER_BLOCKED instead of CRITICAL_ERROR when checkpoint read lock timeout detected
 Key: IGNITE-10003
 URL: https://issues.apache.org/jira/browse/IGNITE-10003
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7
Reporter: Andrey Kuznetsov
 Fix For: 2.8


{{GridCacheDatabaseSharedManager#failCheckpointReadLock}} should report 
{{SYSTEM_WORKER_BLOCKED}} to the failure handler: it is closer to the truth, and 
the default consequences are less severe than those of {{CRITICAL_ERROR}}.
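
A rough sketch of why the reported failure type matters is given below. It is an illustration only: the enum and the handler are simplified, hypothetical stand-ins rather than Ignite classes, and the assumption that blocked-worker reports are ignored by default is made here just to show how a "less severe" type could be treated differently.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; simplified stand-ins, not the Ignite failure-handling classes. */
public class FailureTypeSketch {
    /** Only the two failure types mentioned in the ticket. */
    enum FailureType { SYSTEM_WORKER_BLOCKED, CRITICAL_ERROR }

    /** Handler that merely logs the failure types it is configured to ignore. */
    static class SketchHandler {
        /** Assumption for this sketch: blocked-worker reports are ignored by default. */
        private final Set<FailureType> ignored = EnumSet.of(FailureType.SYSTEM_WORKER_BLOCKED);

        /** Returns true if the node should be stopped because of this failure. */
        boolean onFailure(FailureType type) {
            if (ignored.contains(type)) {
                System.out.println("Ignored failure: " + type);

                return false;
            }

            System.out.println("Critical failure, stopping node: " + type);

            return true;
        }
    }

    /** The type a checkpoint read lock timeout would report under the ticket's proposal. */
    static FailureType checkpointReadLockTimeoutType() {
        return FailureType.SYSTEM_WORKER_BLOCKED;
    }

    public static void main(String[] args) {
        SketchHandler hnd = new SketchHandler();

        hnd.onFailure(checkpointReadLockTimeoutType());
        hnd.onFailure(FailureType.CRITICAL_ERROR);
    }
}
```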




Re: Switching to real FailureHandler in tests

2018-10-22 Thread Andrey Kuznetsov

Hi, Dmitrii!

As for [1], I think your suggestion is good to complete the change.

As for [2], I tend to disagree: in future tests, the default no-op handler can
hide bugs in new Ignite functionality. If it is really needed, a developer
can explicitly state that a critical failure is a normal part of the test's
operation.

[1] https://issues.apache.org/jira/browse/IGNITE-8227
[2] https://issues.apache.org/jira/browse/IGNITE-9660

Mon, 22 Oct 2018 at 15:18, Dmitrii Ryabov :

> I tried to replace the default no-op handler with a handler that stops the
> node and fails the test.
>
> I've returned the no-op handler in many classes because critical
> situations are expected behavior there. But the PR still has a lot of failed
> tests and suites. In some tests, I can't understand the failure reason.
>
> I haven't finished checking the failures, but after several RunAll runs, I
> see that new flaky tests (1 or 2 fails) appeared because of the new handler.
>
> I think we should keep the no-op handler as the default, but add the new
> handler for a few classes where critical situations aren't expected.
>
> Fri, 21 Sep 2018 at 17:03, Andrey Kuznetsov :
> >
> > Thanks to all for participating in the discussion.
> >
> > I've updated [1]: now it requires the new handler from [2] for completion.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-9660
> > [2] https://issues.apache.org/jira/browse/IGNITE-8227
> >
> > Thu, 20 Sep 2018 at 21:56, Vladimir Ozerov :
> >
> > > A stop-node handler is not a very good choice. Some tests will continue to
> > > work as usual even if some node has failed. E.g. SQL queries with backups may
> > > continue to function in some cases, especially if these are tests with a
> > > REPLICATED cache.
> > >
> > > New test-scope handler looks like a better candidate to me.
> > >
> > > Thu, 20 Sep 2018 at 21:22, Andrey Kuznetsov :
> > >
> > > > I meant the first comment in [1]. We are to decide first whether we'll
> > > > do it or not.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-8227
> > > > <
> > > >
> > >
> https://issues.apache.org/jira/browse/IGNITE-8227?focusedCommentId=16435298=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16435298
> > > > >
> > > >
> > > > > Thu, 20 Sep 2018 at 21:18, Dmitriy Pavlov  >:
> > > >
> > > > > Sorry, incomplete message.
> > > > >
> > > > > Why do you think there is no consensus?
> > > > >
> > > > > > I have no clue what can be a reason for another approach.
> > > > > > By default, the failure handler should fail all tests.
> > > > > >
> > > > > > Failure handler tests will always be a minority of tests, so a failure
> > > > > > handler call is something abnormal.
> > > > > >
> > > > > > Thu, 20 Sep 2018 at 21:15, Dmitriy Pavlov <
> dpavlov@gmail.com>:
> > > > >
> > > > > > Why do you think there is no consensus?
> > > > > >
> > > > > > I have no clue that by default failure handler should fail all
> test.
> > > > > >
> > > > > > > Thu, 20 Sep 2018 at 21:10, Andrey Kuznetsov <
> stku...@gmail.com>:
> > > > > >
> > > > > >> I've created [1] to address this.
> > > > > >>
> > > > > > >> Dmitriy, I like your idea of creating a special test-scope handler. But
> > > > > > >> there is no consensus about it, so I don't want to rely on that potential
> > > > > > >> handler right now. We can switch to it later, of course.
> > > > > > >>
> > > > > > >> [1] https://issues.apache.org/jira/browse/IGNITE-9660
> > > > > > >>
> > > > > > >> Thu, 20 Sep 2018 at 20:03, Maxim Muzafarov <
> maxmu...@gmail.com>:
> > > > > >>
> > > > > >> > Andrey,
> > > > > >> >
> > > > > >> > I like your idea.
> > > > > >> >
> > > > > >> > After changing the default node failure handler to the new
> one we
> > > > > should
> > > > > >> > carefully review the whole new test failures. For instance,
> > > calling
> > > > > this
> > > > > >> > method in tests should not lead test to the node bei

[GitHub] ignite pull request #5026: IGNITE-9932 Ignoring exchanger critical section b...

2018-10-18 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/5026

IGNITE-9932 Ignoring exchanger critical section begin/end if called from 
illegal thread.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-9932

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/5026.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5026


commit f5db2ad273b5e79b30971e6be5e4c5e501e0315e
Author: Andrey Kuznetsov 
Date:   2018-10-18T14:08:16Z

IGNITE-9932 Ignoring exchanger critical section begin/end if called from 
illegal thread.




---

Re: Apache Ignite 2.7. Last Mile

2018-10-18 Thread Andrey Kuznetsov

I have one more potential 2.7 blocker [1] with a straightforward fix. I
believe it will not break any production use case, but it leads to a test
suite hang, thus affecting other urgent issues.

[1] https://issues.apache.org/jira/browse/IGNITE-9932

Thu, 18 Oct 2018 at 14:59, Ivan Daschinsky :

> Hi! Is it possible to merge IGNITE-9854? Fix is pretty simple, but quite
> important.
>
> Wed, 17 Oct 2018 at 17:49, Andrey Gura :
>
> > JFYI
> >
> > IGNITE-9737 and IGNITE-9710 are merged to release branch.
> > On Wed, Oct 17, 2018 at 5:41 PM Pavel Tupitsyn 
> > wrote:
> > >
> > > Thank you. Fix has been merged to master and cherry-picked to
> ignite-2.7.
> > >
> > > On Wed, Oct 17, 2018 at 1:26 PM Nikolay Izhikov 
> > wrote:
> > >
> > > > Pavel.
> > > >
> > > > Ok, I agree to include this ticket into 2.7
> > > > Let's do it.
> > > >
> > > > On Wed, 17/10/2018 at 13:20 +0300, Pavel Tupitsyn wrote:
> > > > > Nikolay,
> > > > >
> > > > > It completely breaks a major feature under certain conditions. I
> > would
> > > > > consider it a blocker.
> > > > >
> > > > > On Wed, Oct 17, 2018 at 1:00 PM Nikolay Izhikov <
> nizhi...@apache.org
> > >
> > > > wrote:
> > > > >
> > > > > > Hello, Pavel.
> > > > > >
> > > > > > Is it a blocker?
> > > > > >
> > > > > > On Wed, 17/10/2018 at 12:58 +0300, Pavel Tupitsyn wrote:
> > > > > > > Hi Igniters,
> > > > > > >
> > > > > > > I'd like to include IGNITE-9877 in 2.7, can we do that?
> > > > > > > The fix is ready, I'm waiting for TC run.
> > > > > > >
> > > > > > > Pavel
> > > > > > >
> > > > > > > On Wed, Oct 17, 2018 at 11:45 AM Павлухин Иван <
> > vololo...@gmail.com>
> > > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > > Hi Nikolay,
> > > > > > > > >
> > > > > > > > > Thank you for keeping everybody focused! Regarding my ticket
> > > > > > > > > IGNITE-5935: it is in the final stage now. Tests look good. I believe
> > > > > > > > > that it will be merged in a couple of days (at most).
> > > > > > > >
> > > > > > > > Wed, 17 Oct 2018 at 11:39, Nikolay Izhikov <
> > nizhi...@apache.org
> > > > >:
> > > > > > > >
> > > > > > > > > Hello, Igniters.
> > > > > > > > >
> > > > > > > > > 9 tickets to go!
> > > > > > > > >
> > > > > > > > > Alexey Goncharuk - IGNITE-9784
> > > > > > > > > Dmitriy Govorukhin - IGNITE-9898
> > > > > > > > > Andrey Kuznetsov   - IGNITE-9737, IGNITE-9710
> > > > > > > > > Taras Ledkov - IGNITE-9882
> > > > > > > > > Petr Ivanov - IGNITE-9852
> > > > > > > > > Ivan Pavlukhin - IGNITE-5935
> > > > > > > > > Roman Kondakov - IGNITE-9663
> > > > > > > > > Alexey Stelmak - IGNITE-9776
> > > > > > > > >
> > > > > > > > > On Tue, 16/10/2018 at 16:20 +0300, Andrey Gura wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I've found that IGNITE-9723 was resolved but wasn't cherry-picked
> > > > > > > > > > to the ignite-2.7 branch. So I'll do it.
> > > > > > > > > > On Tue, Oct 16, 2018 at 2:30 PM Nikolay Izhikov <
> > > > > >
> > > > > > nizhi...@apache.org>
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hello, Igniters.
> > > > > > > > > > >
> > > > > > > > > > > We have 13 tickets mapped to 2.7.
> > > >

[jira] [Created] (IGNITE-9932) Exchanger blocking session bounds can be accessed from invalid thread

2018-10-18 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9932:


 Summary: Exchanger blocking session bounds can be accessed from invalid thread
 Key: IGNITE-9932
 URL: https://issues.apache.org/jira/browse/IGNITE-9932
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov


{{GridDhtPartitionsExchangeFuture}} uses critical sections surrounded by 
{{exchangerBlockingSectionBegin}} and {{exchangerBlockingSectionEnd}}. 
Currently, these begin/end bounds assert that they are called from the 
partition-exchanger thread. It turned out that this assertion can legitimately 
fail. So it is better to make the begin/end bounds a no-op unless they are 
called from the partition-exchanger thread.
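
The proposed behavior can be sketched as follows. This is a minimal illustration, not the actual Ignite code: the class, the open-section counter and the helper thread name are hypothetical and exist only to make the thread check observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of thread-guarded blocking-section bounds; not the Ignite implementation. */
public class ExchangerSectionSketch {
    /** The only thread whose blocking sections are tracked (stands in for partition-exchanger). */
    private final Thread exchangerThread;

    /** Net number of currently open sections; kept only to make the effect visible. */
    private final AtomicInteger openSections = new AtomicInteger();

    ExchangerSectionSketch(Thread exchangerThread) {
        this.exchangerThread = exchangerThread;
    }

    /** No-op (instead of a failed assertion) when called from any other thread. */
    void exchangerBlockingSectionBegin() {
        if (Thread.currentThread() != exchangerThread)
            return;

        openSections.incrementAndGet();
    }

    /** No-op (instead of a failed assertion) when called from any other thread. */
    void exchangerBlockingSectionEnd() {
        if (Thread.currentThread() != exchangerThread)
            return;

        openSections.decrementAndGet();
    }

    int openSections() {
        return openSections.get();
    }

    /** Runs the task on a separate thread and waits for it to finish. */
    static void runOnOtherThread(Runnable task) {
        Thread t = new Thread(task, "other-thread-sketch");

        t.start();

        try {
            t.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // The current thread plays the role of the partition-exchanger thread.
        ExchangerSectionSketch sketch = new ExchangerSectionSketch(Thread.currentThread());

        sketch.exchangerBlockingSectionBegin();

        // A call from a foreign thread is silently ignored.
        runOnOtherThread(sketch::exchangerBlockingSectionBegin);

        System.out.println("Open sections: " + sketch.openSections());

        sketch.exchangerBlockingSectionEnd();

        System.out.println("Open sections: " + sketch.openSections());
    }
}
```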

{{IgniteStableBaselineBinObjFieldsQuerySelfTest#testQueryReplicatedTransactional}}
 may hang due to this issue, see [1]. Exception stack trace leading to critical 
failure follows.

{noformat}
java.lang.AssertionError
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.exchangerBlockingSectionBegin(GridCachePartitionExchangeManager.java:2351)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitUntilNewCachesAreRegistered(GridDhtPartitionsExchangeFuture.java:2261)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2066)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:3980)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$2100(GridDhtPartitionsExchangeFuture.java:141)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$7.apply(GridDhtPartitionsExchangeFuture.java:3667)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$7.apply(GridDhtPartitionsExchangeFuture.java:3655)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:355)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:3655)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:1655)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:393)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:380)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3178)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3157)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
    at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{noformat}

[1] https://ci.ignite.apache.org/viewLog.html?buildId=2111470=buildResultsDiv=IgniteTests24Java8_BinaryObjectsSimpleMapperQueries





[GitHub] ignite pull request #4962: IGNITE-9710 Ignite watchdog service handles longr...

2018-10-17 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/4962


---

[GitHub] ignite pull request #4914: IGNITE-9737

2018-10-17 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/4914


---

Re: Applicability of term 'cache' to Apache Ignite

2018-10-17 Thread Andrey Kuznetsov

I'm not an ML expert, so the 'dataset' term just reminds me of various client
drivers used to access tables from RDBMS servers. For me, the only common trait
of all kinds of Ignite caches is their associativity. So if we rename them,
I'd suggest something like KVStore.

Wed, 17 Oct 2018 at 12:56, Alexey Zinoviev :

> From my perspective, the main goal is to make it easy to explain what
> Ignite is at conferences, in marketing deals, in papers, and in documentation.
> And the /cache/ term really narrows the area of Ignite usage in users' minds.
>
> I don't support critical changes in the code base, but I support all
> changes that help the goal described above in this letter.
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


-- 
Best regards,
  Andrey Kuznetsov.

Re: Critical worker threads liveness checking drawbacks

2018-10-11 Thread Andrey Kuznetsov

Igniters,

Now I spot blocking / long-running code arising from
{{GridDhtPartitionsExchangeFuture#init}} calls in the partition-exchanger
thread, see [1]. Ideally, all blocking operations along all possible code
paths should be guarded implicitly from the critical failure detector to
prevent the thread from being considered blocked. There is a pull request [2]
that provides a shallow solution. I didn't change code outside
{{GridDhtPartitionsExchangeFuture}}, otherwise it could be broken by any
upcoming change. Also, I didn't touch the code runnable by threads other
than partition-exchanger. So I have a number of guarded sections that are
wider than they could be, and this potentially hides issues from the failure
detector. Does this PR make sense? Or maybe it's better to exclude
partition-exchanger from the critical threads registry altogether?

[1] https://issues.apache.org/jira/browse/IGNITE-9710
[2] https://github.com/apache/ignite/pull/4962
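
To make the trade-off above concrete, here is a minimal sketch of how a heartbeat-based liveness check and blocking sections could interact. It is written under assumptions: the Worker class, its method names and the use of Long.MAX_VALUE as an "inside a blocking section" marker are illustrative choices for this example and are not claimed to match the Ignite implementation; timestamps are passed in explicitly to keep the example deterministic.

```java
/** Hypothetical sketch of heartbeat-based liveness checking; not the Ignite implementation. */
public class HeartbeatSketch {
    /** A worker that reports progress via heartbeats and may declare blocking sections. */
    static class Worker {
        /** Timestamp of the last heartbeat; Long.MAX_VALUE marks "inside a blocking section". */
        private volatile long heartbeatTs;

        Worker(long now) {
            heartbeatTs = now;
        }

        /** Called by the worker whenever it makes progress. */
        void updateHeartbeat(long now) {
            heartbeatTs = now;
        }

        /** The worker is about to do a legitimately long or blocking operation. */
        void blockingSectionBegin() {
            heartbeatTs = Long.MAX_VALUE;
        }

        /** The blocking operation is over; liveness checking resumes from "now". */
        void blockingSectionEnd(long now) {
            heartbeatTs = now;
        }

        /** Watchdog-side check: no heartbeat for longer than the timeout means "blocked". */
        boolean isBlocked(long now, long timeoutMs) {
            return now - heartbeatTs > timeoutMs;
        }
    }

    public static void main(String[] args) {
        Worker worker = new Worker(0);

        System.out.println("Stale heartbeat detected: " + worker.isBlocked(2_000, 1_000));

        worker.blockingSectionBegin();

        // While the section is open the watchdog sees nothing, however long it lasts.
        System.out.println("Detected inside section: " + worker.isBlocked(1_000_000, 1_000));

        worker.blockingSectionEnd(1_000_000);
    }
}
```

The second check illustrates the concern raised above: the wider a guarded section is, the more code runs without any liveness checking at all.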


Fri, 28 Sep 2018 at 18:56, Maxim Muzafarov :

> Andrey, Andrey
>
> > Thanks for being attentive! It's definitely a typo. Could you please
> > create an issue?
>
> I've created an issue [1] and prepared PR [2].
> Please, review this change.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-9723
> [2] https://github.com/apache/ignite/pull/4862
>
> On Fri, 28 Sep 2018 at 16:58 Yakov Zhdanov  wrote:
>
> > Config option + mbean access. Does that make sense?
> >
> > Yakov
> >
> > On Fri, Sep 28, 2018, 17:17 Vladimir Ozerov 
> wrote:
> >
> > > Then it should be config option.
> > >
> > > Fri, 28 Sep 2018 at 13:15, Andrey Gura :
> > >
> > > > Guys,
> > > >
> > > > Why do we need both a config option and a system property? I believe one
> > > > way is enough.
> > > > On Fri, Sep 28, 2018 at 12:38 PM Nikolay Izhikov <
> nizhi...@apache.org>
> > > > wrote:
> > > > >
> > > > > Ticket created - https://issues.apache.org/jira/browse/IGNITE-9737
> > > > >
> > > > > Fixed version is 2.7.
> > > > >
> > > > > On Fri, 28/09/2018 at 11:41 +0300, Alexey Goncharuk wrote:
> > > > > > Nikolay, I agree, a user should be able to disable both the thread
> > > > > > liveness check and the checkpoint read lock timeout check from config
> > > > > > and a system property.
> > > > > >
> > > > > > Fri, 28 Sep 2018 at 11:30, Nikolay Izhikov <
> nizhi...@apache.org
> > >:
> > > > > >
> > > > > > > Hello, Igniters.
> > > > > > >
> > > > > > > I found that this feature can't be disabled from config.
> > > > > > > The only way to disable it is from a JMX bean.
> > > > > > >
> > > > > > > I think it is very dangerous: if we have some corner case or a bug in
> > > > > > > this watchdog, it can make Ignite unusable.
> > > > > > > I propose implementing the ability to disable this feature both from
> > > > > > > config and from JVM options.
> > > > > > >
> > > > > > > What do you think?
> > > > > > >
> > > > > > > On Thu, 27/09/2018 at 16:14 +0300, Andrey Kuznetsov wrote:
> > > > > > > > Maxim,
> > > > > > > >
> > > > > > > > Thanks for being attentive! It's definitely a typo. Could you please
> > > > > > > > create an issue?
> > > > > > > >
> > > > > > > > Thu, 27 Sep 2018 at 16:00, Maxim Muzafarov <
> > > maxmu...@gmail.com
> > > > >:
> > > > > > > >
> > > > > > > > > Folks,
> > > > > > > > >
> > > > > > > > > I've found that in `GridCachePartitionExchangeManager:2684` [1] (master
> > > > > > > > > branch) the exchange future is wrapped with a double `blockingSectionEnd`
> > > > > > > > > call. Is it correct? I just want to understand this change and
> > > > > > > > > how I should use this in the future.
> > > > > > > > >
> >

[GitHub] ignite pull request #4559: Ignite 9280

2018-10-11 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/4559


---

[GitHub] ignite pull request #4736: Ignite 6587 debug

2018-10-11 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/4736


---

[GitHub] ignite pull request #4089: Ignite 6587 true

2018-10-11 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/4089


---

[GitHub] ignite pull request #4962: IGNITE-9710 Ignite watchdog service handles longr...

2018-10-11 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/4962

IGNITE-9710 Ignite watchdog service handles longrunning cache creation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-9710

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4962


commit 2eb31c9d1680ecb7a3c16284c96fc4641e330dba
Author: Andrey Kuznetsov 
Date:   2018-10-11T15:09:13Z

IGNITE-9710 Added blocking sections in GridDhtPartitionsExchangeFuture#init.

commit 269bc84894138d654fe448ca93b09369e9878e3d
Author: Andrey Kuznetsov 
Date:   2018-10-11T16:21:44Z

Merge branch 'master' into ignite-9710

# Conflicts:
#   
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java

commit d667c11853a212fb3bad41d4f7be933f4ff35dc1
Author: Andrey Kuznetsov 
Date:   2018-10-11T18:10:33Z

IGNITE-9710 Fixing blocking sections.




---

[GitHub] ignite pull request #4015: Ignite 6587

2018-10-11 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/4015


---

[jira] [Created] (IGNITE-9860) Unreliable listener invocation order in GridDhtPartitionsExchangeFuture#onDone

2018-10-11 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9860:


 Summary: Unreliable listener invocation order in GridDhtPartitionsExchangeFuture#onDone
 Key: IGNITE-9860
 URL: https://issues.apache.org/jira/browse/IGNITE-9860
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
 Fix For: 2.8


The listener added right before the {{super.onDone()}} call is intended to be 
invoked earlier than all other listeners. There is a small probability of 
breaking this guarantee: some other thread can call {{listen()}} before the 
future-completing thread enters {{super.onDone()}}.
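
One possible way to make such an ordering independent of listener registration timing is sketched below. It is not the fix proposed in this ticket and not Ignite code: SimpleFuture is a hypothetical, simplified future written for illustration, and it runs the privileged callback under the lock, which keeps the ordering guarantee simple at the cost of executing foreign code while holding a monitor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a future with a privileged completion callback; not Ignite code. */
public class ListenerOrderSketch {
    static class SimpleFuture {
        private final List<Runnable> listeners = new ArrayList<>();

        private boolean done;

        /** Registers a listener, or runs it immediately if the future is already done. */
        void listen(Runnable lsnr) {
            synchronized (this) {
                if (!done) {
                    listeners.add(lsnr);

                    return;
                }
            }

            lsnr.run();
        }

        /**
         * Completes the future. The privileged callback runs before the future becomes done,
         * so no ordinary listener can observe completion ahead of it.
         */
        void onDone(Runnable first) {
            List<Runnable> toNotify;

            synchronized (this) {
                if (done)
                    return;

                first.run();

                done = true;

                toNotify = new ArrayList<>(listeners);
            }

            for (Runnable lsnr : toNotify)
                lsnr.run();
        }
    }

    /** Returns the invocation order observed for a simple single-threaded scenario. */
    static List<String> demo() {
        List<String> order = new ArrayList<>();
        SimpleFuture fut = new SimpleFuture();

        fut.listen(() -> order.add("A"));
        // Stands in for a listener registered by a racing thread just before completion.
        fut.listen(() -> order.add("B"));

        fut.onDone(() -> order.add("first"));

        fut.listen(() -> order.add("late"));

        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```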




[jira] [Created] (IGNITE-9838) TxStateChangeEventTest fails sometimes on TeamCity

2018-10-10 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9838:


 Summary: TxStateChangeEventTest fails sometimes on TeamCity
 Key: IGNITE-9838
 URL: https://issues.apache.org/jira/browse/IGNITE-9838
 Project: Ignite
  Issue Type: Test
Reporter: Andrey Kuznetsov
 Fix For: 2.8


Both test methods may fail to acquire the transaction lock. Presumably, 
increasing the timeout would be enough to fix this.




Re: Apache Ignite 2.7 release

2018-10-09 Thread Andrey Kuznetsov

Igniters,

Recently, I filed an issue [1] that deals with a possible hang of WAL
logging. I would appreciate your thoughts on its severity. For logging to
hang, two conditions must be satisfied: WAL mode is {{FSYNC}}, and WAL
archiving is disabled. Should we investigate and fix this immediately, or is
it possible to postpone it till 2.8?

[1] https://issues.apache.org/jira/browse/IGNITE-9776

Tue, 9 Oct 2018 at 11:17, Andrey Kuznetsov :

> Ignite committers,
>
> I have prepared a PR for 2.7 blocker [1]. Could anybody merge it to 2.7
> and master?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-9737
>
>
> Wed, 3 Oct 2018 at 14:02, Nikolay Izhikov :
>
>> Alexey.
>>
>> Sorry, I lost link to IGNITE-9760 in this thread :)
>>
>> Thanks, for a clarification.
>>
>>
>> On Wed, 03/10/2018 at 13:58 +0300, Alexey Goncharuk wrote:
>> > Nikolay, both commits fixed a regression compared to ignite-2.6. First
>> one was mentioned by Anton Kalashnikov before (java-level deadlock during
>> WAL flush), another - by Andrey Kuznetsov (NPE during a concurrent WAL
>> flush).
>> >
>> > --AG
>> >
>> > Wed, 3 Oct 2018 at 13:38, Nikolay Izhikov :
>> > > Hello, Igniters.
>> > >
>> > > Release scope is frozen.
>> > > Please, if you include some new issues in release - discuss it in
>> this thread.
>> > >
>> > > Alexey, can you, please, comment on including fix for IGNITE-9760,
>> IGNITE-9761 in 2.7 branch.
>> > >
>> > >
>> https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=commit;h=3355201f3e8cafd23b2250aaf3b91b8b8ed1
>> > >
>> https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=commit;h=9d6e6ff394c05ddf7ef31a9d9ed1b492d9eeba69
>> > >
>> > > On Wed, 03/10/2018 at 13:24 +0300, Vladimir Ozerov wrote:
>> > > > Nobody vetos anything, let's stop use this term unless some really
>> > > > important problem is discussed.
>> > > >
>> > > > At this point we are in situation when new tickets are still
>> included into
>> > > > the scope. All want to ask is to stop including new tickets without
>> > > > explaining on why they should be in AI 2.7. Regression between is
>> AI 2.6
>> > > > and AI 2.7 is enough. But "I found new NPE" is not.
>> > > >
>> > > > On Wed, Oct 3, 2018 at 11:10 AM Dmitriy Pavlov <
>> dpavlov@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Nikolay,
>> > > > >
>> > > > > this has nothing about scaring someone. Let me explain about
>> Apache Way.
>> > > > >
>> > > > > Voting -1 to release does not mean blocking it, release can't be
>> vetoed.
>> > > > > Approving release is done by policy: majority approval. 3+1
>> binding and
>> > > > > more +1 than -1. Consensus approval is better but not mandatory.
>> > > > >
>> > > > > Instead, if PMC says -1 to code modification it means veto and
>> can't be
>> > > > > bypassed to anyone. This is a very strong statement, which should
>> be
>> > > > > applied reasonably and with technical justification. Lack of
>> > > > > understanding is not a justification.
>> > > > >
>> > > > > So my point instead of vetoing bugfix let's veto commits where
>> the bugs
>> > > > > were introduced. I feel a number of bugs reported recently are all
>> > > > > connected to WalManager, and these bugs may come from just a
>> couple of
>> > > > > fixes. PDS tests were quite stable last time, so I think it is
>> possible to
>> > > > > find out why WAL crashes and hangs.
>> > > > >
>> > > > > Sincerely,
>> > > > > Dmitriy Pavlov
>> > > > >
>> > > > > Wed, 3 Oct 2018 at 10:05, Andrey Kuznetsov :
>> > > > >
>> > > > > > Vladimir, Nikolay,
>> > > > > >
>> > > > > > For sure, I'm not an experienced Ignite contributor, so I'm
>> sorry for
>> > > > > > intervening. I've just run the reproducer from [1] against
>> ignite-2.6
>> > > > > > branch and it has passed. So, it's not an legacy bug, we've
>> brought it
>> > > > >
>> > > > > with
>> > > > > > some c

Re: Apache Ignite 2.7 release

2018-10-09 Thread Andrey Kuznetsov

Ignite committers,

I have prepared a PR for 2.7 blocker [1]. Could anybody merge it to 2.7 and
master?

[1] https://issues.apache.org/jira/browse/IGNITE-9737


Wed, 3 Oct 2018 at 14:02, Nikolay Izhikov :

> Alexey.
>
> Sorry, I lost link to IGNITE-9760 in this thread :)
>
> Thanks, for a clarification.
>
>
> On Wed, 03/10/2018 at 13:58 +0300, Alexey Goncharuk wrote:
> > Nikolay, both commits fixed a regression compared to ignite-2.6. First
> one was mentioned by Anton Kalashnikov before (java-level deadlock during
> WAL flush), another - by Andrey Kuznetsov (NPE during a concurrent WAL
> flush).
> >
> > --AG
> >
> > Wed, 3 Oct 2018 at 13:38, Nikolay Izhikov :
> > > Hello, Igniters.
> > >
> > > Release scope is frozen.
> > > Please, if you include some new issues in release - discuss it in this
> thread.
> > >
> > > Alexey, can you, please, comment on including fix for IGNITE-9760,
> IGNITE-9761 in 2.7 branch.
> > >
> > >
> https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=commit;h=3355201f3e8cafd23b2250aaf3b91b8b8ed1
> > >
> https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=commit;h=9d6e6ff394c05ddf7ef31a9d9ed1b492d9eeba69
> > >
> > > On Wed, 03/10/2018 at 13:24 +0300, Vladimir Ozerov wrote:
> > > > Nobody vetos anything, let's stop use this term unless some really
> > > > important problem is discussed.
> > > >
> > > > At this point we are in situation when new tickets are still
> included into
> > > > the scope. All want to ask is to stop including new tickets without
> > > > explaining on why they should be in AI 2.7. Regression between is AI
> 2.6
> > > > and AI 2.7 is enough. But "I found new NPE" is not.
> > > >
> > > > On Wed, Oct 3, 2018 at 11:10 AM Dmitriy Pavlov <
> dpavlov@gmail.com>
> > > > wrote:
> > > >
> > > > > Nikolay,
> > > > >
> > > > > this has nothing about scaring someone. Let me explain about
> Apache Way.
> > > > >
> > > > > Voting -1 to release does not mean blocking it, release can't be
> vetoed.
> > > > > Approving release is done by policy: majority approval. 3+1
> binding and
> > > > > more +1 than -1. Consensus approval is better but not mandatory.
> > > > >
> > > > > Instead, if PMC says -1 to code modification it means veto and
> can't be
> > > > > bypassed to anyone. This is a very strong statement, which should
> be
> > > > > applied reasonably and with technical justification. Lack of
> > > > > understanding is not a justification.
> > > > >
> > > > > So my point instead of vetoing bugfix let's veto commits where the
> bugs
> > > > > were introduced. I feel a number of bugs reported recently are all
> > > > > connected to WalManager, and these bugs may come from just a
> couple of
> > > > > fixes. PDS tests were quite stable last time, so I think it is
> possible to
> > > > > find out why WAL crashes and hangs.
> > > > >
> > > > > Sincerely,
> > > > > Dmitriy Pavlov
> > > > >
> > > > > Wed, 3 Oct 2018 at 10:05, Andrey Kuznetsov :
> > > > >
> > > > > > Vladimir, Nikolay,
> > > > > >
> > > > > > For sure, I'm not an experienced Ignite contributor, so I'm
> sorry for
> > > > > > intervening. I've just run the reproducer from [1] against
> ignite-2.6
> > > > > > branch and it has passed. So, it's not an legacy bug, we've
> brought it
> > > > >
> > > > > with
> > > > > > some change of 2.7 scope. Is it still ok to ignore the bug?
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9776
> > > > > >
> > > > > > ср, 3 окт. 2018 г. в 2:07, Nikolay Izhikov  >:
> > > > > >
> > > > > > > Hello, Dmitriy.
> > > > > > >
> > > > > > > I'm sorry, but I don't understand your concern.
> > > > > > >
> > > > > > > Vladimir just asks experienced Ignite contributor to *explain
> impact*
> > > > >
> > > > > of
> > > > > > a
> > > > > > > bug.
> > > > > > >
> > > > > > > Why are you scaring us with your

[GitHub] ignite pull request #4914: Ignite 9737

2018-10-04 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/4914

Ignite 9737



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-9737

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4914.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4914


commit 23591291a3ce6401e0a98db961bf0ac90649017b
Author: Andrey Kuznetsov 
Date:   2018-10-03T16:08:54Z

IGNITE-9737 Added separate timeout for blocked workers detection.

commit e07e40cade56a33ea57ee98907eb40d5a2b3
Author: Andrey Kuznetsov 
Date:   2018-10-04T11:56:20Z

IGNITE-9737 Added separate timeout for checkpoint read lock.

commit e043f9aece6941d30233341249de14f3f40d457b
Author: Andrey Kuznetsov 
Date:   2018-10-04T12:18:28Z

IGNITE-9737 Refined throwing rules on cp-read-lock timeout.




---

Re: Apache Ignite 2.7 release

2018-10-03 Thread Andrey Kuznetsov

Vladimir, Nikolay,

For sure, I'm not an experienced Ignite contributor, so I'm sorry for
intervening. I've just run the reproducer from [1] against the ignite-2.6
branch and it has passed. So, it's not a legacy bug; we've brought it in with
some change in the 2.7 scope. Is it still OK to ignore the bug?

[1] https://issues.apache.org/jira/browse/IGNITE-9776

Wed, Oct 3, 2018 at 2:07, Nikolay Izhikov :

> Hello, Dmitriy.
>
> I'm sorry, but I don't understand your concern.
>
> Vladimir is just asking an experienced Ignite contributor to *explain the
> impact* of a bug.
>
> Why are you scaring us with your "-1"?
> Is it the Apache Way to do so?
> What should be done for you to return to a constructive discussion?
>
> On Wed, 03/10/2018 at 00:23 +0300, Dmitriy Pavlov wrote:
> > Hi Igniters, Vladimir,
> >
> > An NPE or a hang in the WAL means a completely non-functional grid (if
> > persistence is enabled).
> >
> > I see no reason to release 2.7 with such symptoms until we're sure they
> > are too rare or impossible to reproduce. But that does not seem to be the
> > case. I will definitely vote -1 on the release if I'm aware that such
> > problems exist and have not been researched. The community guarantees the
> > quality and usability of the product.
> >
> > We should ask and answer other questions:
> > 1) why so many NPEs and hangs have been reported recently in the same area
> > 2) and why we signed off on the commit(s).
> >
> > Probably we can identify and revert these commit(s) from 2.7 and research
> > these failures in master (with no rush).
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > Tue, Oct 2, 2018 at 23:54, Vladimir Ozerov :
> >
> > > Andrey, Anton,
> > >
> > > How do you conclude that these tickets are blockers? What is the impact
> > > on users, and in what circumstances can users meet them?
> > >
> > > Note that we have many hundreds of open bugs, and yet we do not strive
> > > to include them all, because bug != blocker.
> > >
> > > So -1 from my side to including these tickets in the release scope,
> > > unless the impact is explained.
> > >
> > > Vladimir.
> > >
> > > Tue, Oct 2, 2018 at 22:45, Andrey Kuznetsov :
> > >
> > > > I've caught a bug [1] in FsyncModeFileWriteAheadLogManager. It looks
> > > > like a release blocker to me.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-9776
> > > >
> > > > Tue, Oct 2, 2018 at 13:14, Dmitriy Pavlov :
> > > >
> > > > > Hi Anton,
> > > > >
> > > > >  I definitely agree it is a blocker.
> > > > >
> > > > > Sincerely,
> > > > > Dmitriy Pavlov
> > > > >
> > > > > Tue, Oct 2, 2018 at 13:09, Anton Kalashnikov :
> > > > >
> > > > > > Hi Igniters.
> > > > > >
> > > > > > I have one more possible blocker - a deadlock in the archiver -
> > > > > > https://issues.apache.org/jira/browse/IGNITE-9761. I have almost
> > > > > > fixed it. It seems it should be included in the scope.
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Anton Kalashnikov
> > > > > >
> > > > > >
> > > > > > 02.10.2018, 00:08, "Dmitriy Setrakyan" :
> > > > > > > Thanks, got it.
> > > > > > >
> > > > > > > On Mon, Oct 1, 2018 at 1:14 PM Dmitriy Pavlov <
> > >
> > > dpavlov@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > >  Here I agree with Vladimir. Furthermore, I do my absolute
> best to
> > > > > >
> > > > > > finalize
> > > > > > > >  all reviews in all 2.7 tickets I'm related to. I think most
> of
> > >
> > > the
> > > > > > > >  contributors doing the same.
> > > > > > > >
> > > > > > > >  пн, 1 окт. 2018 г. в 23:03, Vladimir Ozerov <
> > >
> > > voze...@gridgain.com
> > > > > :
> > > > > > > >
> > > > > > > >  > This is precisely the scope we have at the moment. All
> these
> > > > >
> > > > > tickets
> > > >

Re: Apache Ignite 2.7 release

2018-10-02 Thread Andrey Kuznetsov

I've caught a bug [1] in FsyncModeFileWriteAheadLogManager. It looks like a
release blocker to me.

[1] https://issues.apache.org/jira/browse/IGNITE-9776

Tue, Oct 2, 2018 at 13:14, Dmitriy Pavlov :

> Hi Anton,
>
>  I definitely agree it is a blocker.
>
> Sincerely,
> Dmitriy Pavlov
>
> Tue, Oct 2, 2018 at 13:09, Anton Kalashnikov :
>
> > Hi Igniters.
> >
> > I have one more possible blocker - a deadlock in the archiver -
> > https://issues.apache.org/jira/browse/IGNITE-9761. I have almost fixed it.
> > It seems it should be included in the scope.
> >
> > --
> > Best regards,
> > Anton Kalashnikov
> >
> >
> > 02.10.2018, 00:08, "Dmitriy Setrakyan" :
> > > Thanks, got it.
> > >
> > > On Mon, Oct 1, 2018 at 1:14 PM Dmitriy Pavlov 
> > wrote:
> > >
> > >>  Here I agree with Vladimir. Furthermore, I do my absolute best to
> > >>  finalize all reviews in all 2.7 tickets I'm related to. I think most of
> > >>  the contributors are doing the same.
> > >>
> > >>  Mon, Oct 1, 2018 at 23:03, Vladimir Ozerov :
> > >>
> > >>  > This is precisely the scope we have at the moment. All these tickets
> > >>  > were considered carefully on whether to include them into AI 2.7
> > >>  > scope. I would say that 10-15% of current tickets may be moved further.
> > >>  >
> > >>  > A third of current tickets are features in their final review stages
> > >>  > (e.g. TDE, MVCC invoke, TensorFlow, Thin Clients), another big part is
> > >>  > stabilization tickets (mainly various test failures), and another big
> > >>  > part is infrastructure (adopting new modules, Java 9+ support, etc.).
> > >>  > So despite the big absolute number, most of these tickets are grouped
> > >>  > around several big areas, and overall progress over this week should
> > >>  > be very good.
> > >>  >
> > >>  > On Mon, Oct 1, 2018 at 9:50 PM Dmitriy Setrakyan <dsetrak...@apache.org>
> > >>  > wrote:
> > >>  >
> > >>  > > If this filter is for 2.7 release, then I do not believe all these
> > >>  > > tickets will be closed. It would be nice to leave only "must-have"
> > >>  > > tickets in 2.7 and move the rest to 2.8.
> > >>  > >
> > >>  > > D.
> > >>  > >
> > >>  > > On Mon, Oct 1, 2018 at 11:02 AM Vladimir Ozerov <
> > voze...@gridgain.com>
> > >>  > > wrote:
> > >>  > >
> > >>  > > > Igniters,
> > >>  > > >
> > >>  > > > Please use this filter, as it properly handles tickets without
> > >>  > > components:
> > >>  > > >
> > >>  > > >
> > >>  > > >
> > >>  > >
> > >>  >
> > >>
> >
> https://issues.apache.org/jira/issues/?jql=(project%20%3D%20%27Ignite%27%20AND%20fixVersion%20is%20not%20empty%20AND%20fixVersion%20in%20(%272.7%27)%20AND%20status%20NOT%20IN%20(Resolved%2C%20Closed)%20and%20(component%20is%20null%20or%20component%20not%20in%20(documentation)))%20ORDER%20BY%20priority%20%20%20%20%20%20%20%20%20%20%20%20%20%20
> > >>  > > >
> > >>  > > > On Mon, Oct 1, 2018 at 6:18 PM Nikolay Izhikov <
> > nizhi...@apache.org>
> > >>  > > > wrote:
> > >>  > > >
> > >>  > > > > Hello, Igniters.
> > >>  > > > >
> > >>  > > > > I announce scope freeze for an Apache Ignite 2.7 release.
> > >>  > > > >
> > >>  > > > > It means:
> > >>  > > > >
> > >>  > > > > 1. We add to 2.7 only critical bugs.
> > >>  > > > > 2. We merge to 2.7 branch only previously announced features.
> > >>  > > > > 3. I expect we should exclude or *MERGE ALL TASKS FOR 2.7 BY
> > >>  > > > > OCTOBER 10*.
> > >>  > > > > So *October 10 is the DEADLINE* for new features.
> > >>  > > > >
> > >>  > > > > Thoughts?
> > >>  > > > >

[jira] [Created] (IGNITE-9776) FsyncModeFileWriteAheadLogManager can block forever in log() call

2018-10-02 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9776:


 Summary: FsyncModeFileWriteAheadLogManager can block forever in 
log() call
 Key: IGNITE-9776
 URL: https://issues.apache.org/jira/browse/IGNITE-9776
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
 Fix For: 2.7
 Attachments: FsyncWalRolloverDoesNotBlockTest.java

If WAL archiver is disabled and WALRecord being logged has {{rollOver() == 
true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) method:

{noformat}
nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver 
(org.apache.ignite.internal.processors.cache.persistence.wal)
access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver 
(org.apache.ignite.internal.processors.cache.persistence.wal)
pollNextFile:1384, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
rollOver:1130, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
log:712, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
{noformat}

Reproducer is attached.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Apache Ignite 2.7 release

2018-10-01 Thread Andrey Kuznetsov

> > > > > > > > > > > > > > > whether something goes to 2.7”.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Fri, Sep 28, 2018 at 11:11, Dmitriy Pavlov <dpavlov@gmail.com>:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > No, it is up to the community to discuss after their review results.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Fri, Sep 28, 2018 at 11:09, Vladimir Ozerov <voze...@gridgain.com>:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Dmitriy,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Did I read your words correctly that it is up to the implementor of a
> > > > > > > > > > > > > > > > > single feature to decide whether the release of all other features and
> > > > > > > > > > > > > > > > > fixes is to be delayed?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Fri, Sep 28, 2018 at 11:00, Dmitriy Pavlov <dpavlov@gmail.com>:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > My point is that we can wait a bit for services because
> > > > > > > > > > > > > > > > > > 1  we are open-minded and we don't have outside pressure to do the
> > > > > > > > > > > > > > > > > > release in October
> > > > > > > > > > > > > > > > > > 2  and services are not some new feature which suddenly appeared in
> > > > > > > > > > > > > > > > > > autumn; it is a well-known and important feature.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > So it is up to Vyacheslav, Anton and Nikolay to decide.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Decisions can be services are not ready/ready to merge only

[GitHub] ignite pull request #4876: Ignite 9744

2018-09-30 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/4876

Ignite 9744



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-9744

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4876.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4876


commit e5c7e049f3fa0d630f77db4f660706e63f073fab
Author: Andrey Kuznetsov 
Date:   2018-09-30T18:29:19Z

IGNITE-9744 Dropped pointless test; refined handler for synthetic test.

commit a2db02adb0a0a7b77ca29dd753595df48d3d53f6
Author: Andrey Kuznetsov 
Date:   2018-09-30T18:31:14Z

IGNITE-9744 Added general-case SYSTEM_WORKER_TERMINATION detection.




---

[jira] [Created] (IGNITE-9744) Fix SYSTEM_WORKER_TERMINATION detection in general case

2018-09-30 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9744:


 Summary: Fix SYSTEM_WORKER_TERMINATION detection in general case
 Key: IGNITE-9744
 URL: https://issues.apache.org/jira/browse/IGNITE-9744
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.7


All existing critical workers handle unintended termination individually. This 
should be done for an arbitrary critical worker as well. There is a test to check 
this situation, {{SystemWorkersTerminationTest.testTermination}}, but now it 
passes in fact due to {{SYSTEM_WORKER_BLOCKED}} instead of 
{{SYSTEM_WORKER_TERMINATION}}, and this should be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Apache Ignite 2.7 release

2018-09-27 Thread Andrey Kuznetsov

Igniters,

I've bumped into a new bug in WAL manager recently, see [1]. It looks
critical enough, and can be a good candidate for fixing before 2.7 release.

Do you agree?

[1] https://issues.apache.org/jira/browse/IGNITE-9731

Thu, Sep 27, 2018 at 19:45, Dmitriy Pavlov :

> I need Vyacheslav's opinion to be absolutely sure what the status is now.
>
> We never committed to release dates, either. I don't quite understand
> what 'the community committed to doing/releasing something' can mean.
>
> About SG, I'm also concerned about why such a big feature has quite a few
> discussions on the list. But that is another story.
>
> Thu, Sep 27, 2018 at 19:33, Vladimir Ozerov :
>
> > Folks,
> >
> > Please stop looking for enemies everywhere. Just go through this thread
> > and search for the word "service".
> >
> > On Thu, Sep 27, 2018 at 7:30 PM Denis Magda  wrote:
> >
> >> >
> >> > Denis, as PMC Chair, could you please control, that Service Grid
> >> > inclusion/exclusion is discussed properly according to the Apache Way.
> >>
> >>
> >> It's fine when committers/contributors have private discussions related
> >> to a feature they've been working on. Not everything should go through
> >> the dev list. Otherwise, it will be inundated. However, I agree that
> >> architectural and release decisions need to be made publicly.
> >>
> >> Speaking about Service Grid, there was a discussion where I saw that it
> >> was questionable whether it would get added to the release or not.
> >>
> >> *Vladislav*, could you please shed some light on the current status of
> >> the service grid?
> >>
> >> On Thu, Sep 27, 2018 at 9:12 AM Dmitriy Pavlov 
> >> wrote:
> >>
> >> > Ok, let's wait for feedback from SG Author(s)/Reviewer(s) first. If it
> >> > is not ready, ok. But I thought it was almost done.
> >> >
> >> > I apologize if I missed some discussion (it can happen), but
> >> > given the statement "our current agreement",
> >> > I suspect some members are making some sort of private agreements
> >> > and do not discuss them on the list.
> >> >
> >> > Let's build consensus here first, and then name it an agreement.
> >> >
> >> > Denis, as PMC Chair, could you please ensure that Service Grid
> >> > inclusion/exclusion is discussed properly according to the Apache Way.
> >> >
> >> > Thu, Sep 27, 2018 at 18:55, Vladimir Ozerov :
> >> >
> >> >> Dmitriy,
> >> >>
> >> >> This is an outcome of the current state of Service Grid - it is not
> >> >> ready. We never committed to having it in 2.7. Our goal was to try to
> >> >> include it into 2.7.
> >> >>
> >> >> On Thu, Sep 27, 2018 at 6:48 PM Dmitriy Pavlov <dpavlov@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Could you please provide a reference to some thread? Probably I
> >> >> > missed it.
> >> >> >
> >> >> > Thu, Sep 27, 2018 at 18:46, Vladimir Ozerov <voze...@gridgain.com>:
> >> >> >
> >> >> > > Our current agreement is that Service Grid is out of scope. This
> >> >> > > is a huge feature which hasn't entered the review stage so far. We
> >> >> > > will not be able to review/fix/test it properly.
> >> >> > >
> >> >> > > On Thu, Sep 27, 2018 at 6:32 PM Dmitriy Pavlov <dpavlov@gmail.com>
> >> >> > > wrote:
> >> >> > >
> >> >> > > > I agree, and I prefer four weeks for stabilization* (1 Oct - 29
> >> >> > > > Oct)
> >> >> > > >
> >> >> > > > Do I understand it correctly: Service Grid is still in scope,
> >> >> > > > isn't it? I find it very important.
> >> >> > > >
> >> >> > > > Thu, Sep 27, 2018 at 18:28, Nikolay Izhikov <nizhi...@apache.org>:
> >> >> > > >
> >> >> > > > > Hello, Vova.
> >> >> > > > >
> >> >> > > > > Thank you for the clear release status.
> >> >> > > > > I'm +1 for your proposal.
> >> >> > > > >
> >> >> > > > > Thu, Sep 27, 2018, 18:25 Alexey Kuznetsov <akuznet...@apache.org>:
> >> >> > > > >
> >> >> > > > > > Vova,
> >> >> > > > > >
> >> >> > > > > > Huge +1 to do a stabilization.
> >> >> > > > > >
> >> >> > > > > >
> >> >> > > > > > --
> >> >> > > > > > Alexey Kuznetsov
> >> >> > > > > >
> >> >> > > > >
> >> >> > > >
> >> >> > >
> >> >> >
> >> >>
> >> >
> >>
> >
>


-- 
Best regards,
  Andrey Kuznetsov.

[jira] [Created] (IGNITE-9731) NPE is possible during WAL flushing

2018-09-27 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9731:


 Summary: NPE is possible during WAL flushing
 Key: IGNITE-9731
 URL: https://issues.apache.org/jira/browse/IGNITE-9731
 Project: Ignite
  Issue Type: Task
Reporter: Andrey Kuznetsov
 Fix For: 2.7
 Attachments: WalRolloverRecordLoggingTest.java

{{FileWriteAheadLogManager.flush()}} seems to be not thread-safe anymore in 
master branch. The test attached produces the following NPE:

{noformat}
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileHandle.getSegmentId(FileWriteAheadLogManager.java:2371)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.needFsync(FileWriteAheadLogManager.java:2642)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.fsync(FileWriteAheadLogManager.java:2668)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1900(FileWriteAheadLogManager.java:2445)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.flush(FileWriteAheadLogManager.java:866)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3633)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3126)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3025)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
{noformat}

This was possibly introduced by commit [1].

[1] 
https://github.com/apache/ignite/commit/2f72fe758d4256c4eb4610e5922ad3d174b43dc5




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Critical worker threads liveness checking drawbacks

2018-09-27 Thread Andrey Kuznetsov

Maxim,

Thanks for being attentive! It's definitely a typo. Could you please create
an issue?
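
For reference, the guard pattern that the snippet quoted below was presumably
meant to follow can be sketched like this. It is a minimal, self-contained
illustration and not Ignite's GridWorker: the class BlockingSectionSketch, the
monitored() flag and the callBlocking() helper are hypothetical, and only the
method names blockingSectionBegin/blockingSectionEnd come from this thread.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a critical worker; not Ignite's GridWorker. */
class BlockingSectionSketch {
    /** True while the worker is inside a section excluded from liveness monitoring. */
    private volatile boolean inBlockingSection;

    /** Marks the start of an intentionally long blocking call. */
    void blockingSectionBegin() {
        inBlockingSection = true;
    }

    /** Marks the end of the blocking call; monitoring resumes. */
    void blockingSectionEnd() {
        inBlockingSection = false;
    }

    /** Whether a watchdog should currently expect heartbeats from this worker. */
    boolean monitored() {
        return !inBlockingSection;
    }

    /** Runs a blocking operation between a matching begin/end pair. */
    <T> T callBlocking(Supplier<T> op) {
        blockingSectionBegin(); // Begin, not End: the first call of the pair.

        try {
            return op.get();
        }
        finally {
            blockingSectionEnd(); // Always re-enable monitoring, even on failure.
        }
    }

    public static void main(String[] args) {
        BlockingSectionSketch worker = new BlockingSectionSketch();

        String res = worker.callBlocking(() -> "monitored inside: " + worker.monitored());

        System.out.println(res + ", monitored after: " + worker.monitored());
    }
}
```

Keeping the begin/end pair inside one helper is a design choice of this sketch:
it makes a mismatched pair harder to write by hand.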

Thu, Sep 27, 2018 at 16:00, Maxim Muzafarov :

> Folks,
>
> I've found in `GridCachePartitionExchangeManager:2684` [1] (master branch)
> that the exchange future is wrapped with a double `blockingSectionEnd` call.
> Is it correct? I just want to understand this change and
> how I should use this in the future.
>
> Should I file a new issue to fix this? I think the `blockingSectionBegin`
> method should be used here.
>
> -
> blockingSectionEnd();
>
> try {
> resVer = exchFut.get(exchTimeout, TimeUnit.MILLISECONDS);
> } finally {
> blockingSectionEnd();
> }
>
>
> [1]
>
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCachePartitionExchangeManager.java#L2684
>
> On Wed, 26 Sep 2018 at 22:47 Vyacheslav Daradur 
> wrote:
>
> > Andrey Gura, thank you for the answer!
> >
> > I agree that wrapping the 'init' method reduces the benefit of the
> > watchdog service in the case of the PME worker, but in other cases we
> > should wrap all possibly long sections in GridDhtPartitionExchangeFuture,
> > for example the 'onCacheChangeRequest' method or
> > 'cctx.affinity().onCacheChangeRequest' inside it, because it may take
> > significant time (reproducer attached).
> >
> > I only want to point out a possible issue which may allow an end user to
> > halt the Ignite cluster accidentally.
> >
> > I'm sure that PME experts know how to fix this issue properly.
> > On Wed, Sep 26, 2018 at 10:28 PM Andrey Gura  wrote:
> > >
> > > Vyacheslav,
> > >
> > > Exchange worker is strongly tied with
> > > GridDhtPartitionExchangeFuture#init and it is ok. Exchange worker also
> > > shouldn't be blocked for a long time, but in reality it happens. It also
> > > means that your change doesn't make sense.
> > >
> > > What actually makes sense is identifying the places which
> > > block intentionally. Maybe some places/actions should be wrapped in
> > > blocking guards.
> > >
> > > If you have failing tests please make sure that your failureHandler is
> > > NoOpFailureHandler or any other handler with ignoreFailureTypes =
> > > [CRITICAL_WORKER_BLOCKED].
> > >
> > >
> > > On Wed, Sep 26, 2018 at 9:43 PM Vyacheslav Daradur <
> daradu...@gmail.com>
> > wrote:
> > > >
> > > > Hi Igniters!
> > > >
> > > > Thank you for this important improvement!
> > > >
> > > > I've looked through the implementation and noticed that
> > > > GridDhtPartitionsExchangeFuture#init has not been wrapped in a blocking
> > > > section. This means it is easy to halt the node in case of long-running
> > > > actions during PME, for example when we create a cache with a
> > > > StoreFactory which connects to a 3rd party DB.
> > > >
> > > > I'm not sure that it is the right behavior.
> > > >
> > > > I filed the issue [1] and prepared the PR [2] with a reproducer and a
> > > > possible fix.
> > > >
> > > > Andrey, could you please look at and confirm that it makes sense?
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-9710
> > > > [2] https://github.com/apache/ignite/pull/4845
> > > > On Mon, Sep 24, 2018 at 9:46 PM Andrey Kuznetsov 
> > wrote:
> > > > >
> > > > > Denis,
> > > > >
> > > > > I've created the ticket [1] with a short description of the
> > > > > functionality.
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9679
> > > > >
> > > > >
> > > > > Mon, Sep 24, 2018 at 17:46, Denis Magda :
> > > > >
> > > > > > Andrey K. and G.,
> > > > > >
> > > > > > Thanks, do we have a documentation ticket created? Prachi (copied)
> > > > > > can help with the documentation.
> > > > > >
> > > > > > --
> > > > > > Denis
> > > > > >
> > > > > > On Mon, Sep 24, 2018 at 5:51 AM Andrey Gura 
> > wrote:
> > > > > >
> > > > > > > Andrey,
> > > > > > >
> > > > > > > finally your change is merged to master branch. Congratulations
> > and
> > > > > > > thank you very much! :)
> > >

[GitHub] ignite pull request #4835: IGNITE-9695 WAL disabling prohibition in WalState...

2018-09-26 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/4835

IGNITE-9695 WAL disabling prohibition in WalStateManager.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-9695

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4835.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4835


commit 48a00e37be36d3bab1291c852adf7709425551e9
Author: Andrey Kuznetsov 
Date:   2018-09-26T08:48:59Z

IGNITE-9695 WAL disabling prohibition in WalStateManager.




---

[jira] [Created] (IGNITE-9695) Add a way to prevent per-cache WAL disabling in WalStateManager

2018-09-25 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9695:


 Summary: Add a way to prevent per-cache WAL disabling in 
WalStateManager
 Key: IGNITE-9695
 URL: https://issues.apache.org/jira/browse/IGNITE-9695
 Project: Ignite
  Issue Type: Task
Affects Versions: 2.6
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.8


When this prevention is on, {{WalStateManager.init()}} should return an 
error-holding future immediately.
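
A minimal sketch of the "fail fast with an error-holding future" behaviour
described above. It is an illustration under stated assumptions, not the actual
change: {{WalStateSketch}}, {{changeWalMode}} and the {{walDisablingProhibited}}
flag are hypothetical names, and {{java.util.concurrent.CompletableFuture}}
stands in for whatever future type the real manager returns.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for a WAL state manager; not Ignite's WalStateManager. */
class WalStateSketch {
    /** Assumed switch that turns the prohibition on. */
    private final boolean walDisablingProhibited;

    WalStateSketch(boolean walDisablingProhibited) {
        this.walDisablingProhibited = walDisablingProhibited;
    }

    /** Requests a per-cache WAL mode change. */
    CompletableFuture<Boolean> changeWalMode(String cacheName, boolean enable) {
        if (!enable && walDisablingProhibited) {
            // Return an already-failed future immediately instead of starting the operation.
            CompletableFuture<Boolean> failed = new CompletableFuture<>();

            failed.completeExceptionally(
                new IllegalStateException("WAL disabling is prohibited [cache=" + cacheName + ']'));

            return failed;
        }

        // Placeholder for the real asynchronous state change.
        return CompletableFuture.completedFuture(Boolean.TRUE);
    }

    public static void main(String[] args) {
        CompletableFuture<Boolean> fut = new WalStateSketch(true).changeWalMode("demo", false);

        System.out.println("done=" + fut.isDone() + ", failed=" + fut.isCompletedExceptionally());
    }
}
```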



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Critical worker threads liveness checking drawbacks

2018-09-24 Thread Andrey Kuznetsov

Denis,

I've created the ticket [1] with a short description of the functionality.

[1] https://issues.apache.org/jira/browse/IGNITE-9679


Mon, Sep 24, 2018 at 17:46, Denis Magda :

> Andrey K. and G.,
>
> Thanks, do we have a documentation ticket created? Prachi (copied) can help
> with the documentation.
>
> --
> Denis
>
> On Mon, Sep 24, 2018 at 5:51 AM Andrey Gura  wrote:
>
> > Andrey,
> >
> > finally your change is merged to master branch. Congratulations and
> > thank you very much! :)
> >
> > I think that the next step is a feature that will allow signaling about
> > blocked threads to the monitoring tools via MXBean.
> >
> > I hope you will continue development of this feature and provide your
> > vision in a new JIRA issue.
> >
> >
> > On Tue, Sep 11, 2018 at 6:54 PM Andrey Kuznetsov 
> > wrote:
> > >
> > > David, Maxim!
> > >
> > > Thanks a lot for your ideas. Unfortunately, I can't adopt all of them
> > > right now: the scope is much broader than the scope of the change I
> > > implement. I have had a talk with a group of Ignite committers, and we
> > > agreed to complete the change as follows.
> > > - Blocking instructions in system-critical workers which may reasonably
> > > last long should be explicitly excluded from the monitoring.
> > > - Failure handlers should have a setting to suppress some failures on
> > > a per-failure-type basis.
> > > According to this I have updated the implementation: [1]
> > >
> > > [1] https://github.com/apache/ignite/pull/4089
> > >
> > > Mon, Sep 10, 2018 at 22:35, David Harvey :
> > >
> > > > When I've done this before, I've needed to find the oldest thread and
> > > > kill the node running that. From a language standpoint, Maxim's
> > > > "without progress" is better than "heartbeat". For example, what I'm
> > > > most interested in on a distributed system is which thread started
> > > > the work it has not completed the earliest, and when that thread last
> > > > made forward progress. You don't want to kill a node because a thread
> > > > is waiting on a lock held by a thread that went off-node and has not
> > > > gotten a response. If you don't understand the dependency
> > > > relationships, you will make incorrect recovery decisions.
> > > >
> > > > On Mon, Sep 10, 2018 at 4:08 AM Maxim Muzafarov 
> > > > wrote:
> > > >
> > > > > I think we should find exact answers to these questions:
> > > > >  1. What exactly is a `critical` issue?
> > > > >  2. How can we find critical issues?
> > > > >  3. How can we handle critical issues?
> > > > >
> > > > > First,
> > > > >  - Ignore uninterruptible actions (e.g. worker\service shutdown)
> > > > >  - Long I/O operations (there should be a configurable timeout for
> > > > >    each type of usage)
> > > > >  - Infinite loops
> > > > >  - Stalled\deadlocked threads (and\or too many parked threads,
> > > > >    excluding I/O)
> > > > >
> > > > > Second,
> > > > >  - The working queue is without progress (e.g. disco, exchange
> > > > >    queues)
> > > > >  - Work hasn't been completed since the last heartbeat (checking
> > > > >    milestones)
> > > > >  - Too many system resources used by a thread for a long period of
> > > > >    time (allocated memory, CPU)
> > > > >  - Timing fields associated with each thread status exceeded a
> > > > >    maximum time limit.
> > > > >
> > > > > Third (not too many options here),
> > > > >  - `log everything` should be the default behaviour in all these
> > > > >    cases, since it may be difficult to find the cause after the
> > > > >    restart.
> > > > >  - Wait some interval of time and kill the hanging node (the cluster
> > > > >    should be configured to be stable enough)
> > > > >
> > > > > Questions,
> > > > >  - Not sure, but can workers miss their heartbeat deadlines if CPU
> > > > >    load goes up to 80%-90%? Bursts of momentary overloads can be
> > > > >    expected behaviour as a normal part of system operations.

[jira] [Created] (IGNITE-9679) Document critical workers liveness checking implementation

2018-09-24 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9679:


 Summary: Document critical workers liveness checking implementation
 Key: IGNITE-9679
 URL: https://issues.apache.org/jira/browse/IGNITE-9679
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Andrey Kuznetsov
Assignee: Denis Magda
 Fix For: 2.7


Newly implemented critical worker thread liveness checks should be mentioned in 
Ignite Documentation. A brief description of the functionality follows.

An Ignite node has a number of critical worker threads that should be alive and 
responsive, otherwise the node's health is not guaranteed. These threads monitor 
each other periodically and track two aspects for a thread being checked:
- whether it's alive;
- whether it updates its internal heartbeat timestamp.
Both checks use the {{IgniteConfiguration.failureDetectionTimeout}} property as a 
threshold value.
Whenever at least one of the above conditions is violated, the checker thread logs 
an error and calls the currently configured {{FailureHandler}}.

Liveness checks are enabled by default, but can be disabled through 
{{WorkersControlMXBean.healthMonitoringEnabled}} property.
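
The two checks described above can be illustrated with a minimal sketch. It is 
not the Ignite implementation: the class names {{LivenessCheckerSketch}} and 
{{CriticalWorkerSketch}} are hypothetical, time is passed in explicitly, and a 
single timeout value stands in for {{failureDetectionTimeout}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical checker: reports workers that are dead or have a stale heartbeat. */
class LivenessCheckerSketch {
    private final Map<String, CriticalWorkerSketch> workers = new ConcurrentHashMap<>();
    private final long timeoutMs;

    LivenessCheckerSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void register(CriticalWorkerSketch w) {
        workers.put(w.name, w);
    }

    /** Returns the sorted names of workers violating either condition at time 'now'. */
    List<String> failedWorkers(long now) {
        List<String> failed = new ArrayList<>();
        for (CriticalWorkerSketch w : workers.values()) {
            boolean stale = now - w.heartbeatTs > timeoutMs;
            if (!w.alive || stale)
                failed.add(w.name);
        }
        Collections.sort(failed);
        return failed;
    }

    public static void main(String[] args) {
        LivenessCheckerSketch checker = new LivenessCheckerSketch(10_000);
        CriticalWorkerSketch wal = new CriticalWorkerSketch("wal-writer", 0);
        CriticalWorkerSketch disco = new CriticalWorkerSketch("disco-worker", 0);
        checker.register(wal);
        checker.register(disco);

        wal.updateHeartbeat(9_000);
        // At t=15s only disco-worker has a heartbeat older than the 10s timeout.
        System.out.println(checker.failedWorkers(15_000));
    }
}

/** Hypothetical stand-in for a critical worker thread; not an Ignite class. */
class CriticalWorkerSketch {
    final String name;
    volatile boolean alive = true;
    volatile long heartbeatTs;

    CriticalWorkerSketch(String name, long now) {
        this.name = name;
        this.heartbeatTs = now;
    }

    /** Called by the worker itself from its main loop. */
    void updateHeartbeat(long now) {
        heartbeatTs = now;
    }
}
```

In a real node the checker would pass the reported workers to the configured 
failure handler instead of returning a list.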




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (IGNITE-9666) TxPessimisticDeadlockDetectionCrossCacheTest.testDeadlockAnotherNear is flaky on master

2018-09-23 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9666:


 Summary: 
TxPessimisticDeadlockDetectionCrossCacheTest.testDeadlockAnotherNear is flaky 
on master
 Key: IGNITE-9666
 URL: https://issues.apache.org/jira/browse/IGNITE-9666
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.6
Reporter: Andrey Kuznetsov
 Fix For: 2.8


Sometimes the test fails at {{assertTrue(deadlock.get())}}. 

Presumably, it's due to ignoring possible long JVM pauses. For example, one can 
see the following near the first pair of 'put' operations (note the timestamps):

{noformat}
[2018-09-23 11:16:55,975][INFO ][tx-thread-1][root] >>> Performs put 
[node=TcpDiscoveryNode [id=dd46ab0e-ed28-4c67-b3c4-98900bb0, 
addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], 
discPort=47500, order=1, intOrder=1, lastExchangeTime=1537690615852, loc=true, 
ver=2.7.0#19700101-sha1:, isClient=false], tx=TransactionProxyImpl 
[tx=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=149170604, 
order=1537690611182, nodeOrder=1], writeVer=null, implicit=false, loc=true, 
threadId=129, startTime=1537690615791, 
nodeId=dd46ab0e-ed28-4c67-b3c4-98900bb0, startVer=GridCacheVersion 
[topVer=149170604, order=1537690611182, nodeOrder=1], endVer=null, 
isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=500, 
sysInvalidate=false, sys=false, plc=2, commitVer=null, finalizing=NONE, 
invalidParts=null, state=ACTIVE, timedOut=false, topVer=AffinityTopologyVersion 
[topVer=-1, minorTopVer=0], 
txCounters=org.apache.ignite.internal.processors.cache.transactions.TxCounters@31c7393f,
 duration=155ms, onePhaseCommit=false]IgniteTxLocalAdapter [completedBase=null, 
sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl 
[activeCacheIds=[], recovery=null, mvccEnabled=null, txMap=EmptySet []], 
mvccWaitTxs=null, qryEnlisted=false, super=, size=0]GridDhtTxLocalAdapter 
[nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [], 
explicitLock=false, super=]GridNearTxLocal [mappings=IgniteTxMappingsImpl [], 
nearLocallyMapped=false, colocatedLocallyMapped=false, needCheckBackup=null, 
hasRemoteLocks=false, trackTimeout=true, lb=null, mvccTracker=null, sql=null, 
thread=tx-thread-1, mappings=IgniteTxMappingsImpl [], super=], async=false, 
asyncRes=null], key=2, cache=cache0]
[2018-09-23 11:16:55,975][INFO ][tx-thread-2][root] >>> Performs put 
[node=TcpDiscoveryNode [id=dd46ab0e-ed28-4c67-b3c4-98900bb0, 
addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], 
discPort=47500, order=1, intOrder=1, lastExchangeTime=1537690615852, loc=true, 
ver=2.7.0#19700101-sha1:, isClient=false], tx=TransactionProxyImpl 
[tx=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=149170604, 
order=1537690611181, nodeOrder=1], writeVer=null, implicit=false, loc=true, 
threadId=130, startTime=1537690615791, 
nodeId=dd46ab0e-ed28-4c67-b3c4-98900bb0, startVer=GridCacheVersion 
[topVer=149170604, order=1537690611182, nodeOrder=1], endVer=null, 
isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=500, 
sysInvalidate=false, sys=false, plc=2, commitVer=null, finalizing=NONE, 
invalidParts=null, state=ACTIVE, timedOut=false, topVer=AffinityTopologyVersion 
[topVer=-1, minorTopVer=0], 
txCounters=org.apache.ignite.internal.processors.cache.transactions.TxCounters@14d54c9c,
 duration=155ms, onePhaseCommit=false]IgniteTxLocalAdapter [completedBase=null, 
sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl 
[activeCacheIds=[], recovery=null, mvccEnabled=null, txMap=EmptySet []], 
mvccWaitTxs=null, qryEnlisted=false, super=, size=0]GridDhtTxLocalAdapter 
[nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [], 
explicitLock=false, super=]GridNearTxLocal [mappings=IgniteTxMappingsImpl [], 
nearLocallyMapped=false, colocatedLocallyMapped=false, needCheckBackup=null, 
hasRemoteLocks=false, trackTimeout=true, lb=null, mvccTracker=null, sql=null, 
thread=tx-thread-2, mappings=IgniteTxMappingsImpl [], super=], async=false, 
asyncRes=null], key=2, cache=cache1]
[2018-09-23 11:16:56,378][INFO 
][exchange-worker-#38%transactions.TxPessimisticDeadlockDetectionCrossCacheTest0%][time]
 Started exchange init [topVer=AffinityTopologyVersion [topVer=2, 
minorTopVer=3], mvccCrd=MvccCoordinator 
[nodeId=dd46ab0e-ed28-4c67-b3c4-98900bb0, crdVer=1537690602134, 
topVer=AffinityTopologyVersion [topVer=1, minorTopVer=0]], mvccCrdChange=false, 
crd=true, evt=DISCOVERY_CUSTOM_EVT, 
evtNode=dd46ab0e-ed28-4c67-b3c4-98900bb0, 
customEvt=CacheAffinityChangeMessage 
[id=d7540850661-799b6d10-6e53-4f8b-9595-98f8c060efa1, 
topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], exchId=null, 
partsMsg=null, exchangeNeeded=true], allowMerge=false]
{noformat}

And then, the transactions have to roll back due to the 500 ms timeout, leaving no 
possibility to produce a deadlock.





Re: Switching to real FailureHandler in tests

2018-09-21 Thread Andrey Kuznetsov

Thanks to all for participating in the discussion.

I've updated [1]: now it requires the new handler from [2] for completion.

[1] https://issues.apache.org/jira/browse/IGNITE-9660
[2] https://issues.apache.org/jira/browse/IGNITE-8227

Thu, 20 Sep 2018 at 21:56, Vladimir Ozerov:

> The stop-node handler is not a very good choice. Some tests will continue to
> work as usual even if some node has failed. E.g. SQL queries with backups may
> continue to function in some cases, especially in tests with a REPLICATED cache.
>
> New test-scope handler looks like a better candidate to me.
>
> Thu, 20 Sep 2018 at 21:22, Andrey Kuznetsov:
>
> > I meant the first comment in [1]. We are to decide first whether we'll do
> > it or not.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-8227
> > <
> >
> https://issues.apache.org/jira/browse/IGNITE-8227?focusedCommentId=16435298&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16435298
> > >
> >
> > Thu, 20 Sep 2018 at 21:18, Dmitriy Pavlov:
> >
> > > Sorry, incomplete message.
> > >
> > > Why do you think there is no consensus?
> > >
> > > I have no clue what can be a reason for another approach.
> > > By default failure handler should fail all test.
> > >
> > > Failure handlers test will be always a minority of tests, so fail
> handler
> > > call is something abnormal.
> > >
> > > Thu, 20 Sep 2018 at 21:15, Dmitriy Pavlov:
> > >
> > > > Why do you think there is no consensus?
> > > >
> > > > I have no clue that by default failure handler should fail all test.
> > > >
> > > > Thu, 20 Sep 2018 at 21:10, Andrey Kuznetsov:
> > > >
> > > >> I've created [1] to address this.
> > > >>
> > > >> Dmitriy, I like your idea of creating special test-scope handler.
> But
> > > >> there
> > > >> is no consensus about it, so I don't want to rely on that potential
> > > >> handler
> > > >> right now. We can switch to it later, of course.
> > > >>
> > > >> [1] https://issues.apache.org/jira/browse/IGNITE-9660
> > > >>
> > > >> Thu, 20 Sep 2018 at 20:03, Maxim Muzafarov:
> > > >>
> > > >> > Andrey,
> > > >> >
> > > >> > I like your idea.
> > > >> >
> > > >> > After changing the default node failure handler to the new one we
> > > should
> > > >> > carefully review the whole new test failures. For instance,
> calling
> > > this
> > > >> > method in tests should not lead test to the node being stopped:
> > > >> >
> > > >> > FOR TEST ONLY!!!
> > > >> > TcpDiscoverySpi#simulateNodeFailure
> > > >> >
> > > >> > BTW, I would like to remove this method at all from production
> code.
> > > >> >
> > > >> > On Thu, 20 Sep 2018 at 19:43 Dmitriy Pavlov <
> dpavlov@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> > > But the totally ideal situation would be finding a way to fail
> the
> > > >> test
> > > >> > by
> > > >> > > default, not only stopping a node.
> > > >> > >
> > > >> > > Some time ago I've created
> > > >> > > https://issues.apache.org/jira/browse/IGNITE-8227 to
> > > >> > > find out a way to do so.
> > > >> > >
> > > >> > > Thu, 20 Sep 2018 at 19:40, Dmitriy Pavlov <
> > dpavlov@gmail.com
> > > >:
> > > >> > >
> > > >> > > > ++1
> > > >> > > >
> > > >> > > > Thu, 20 Sep 2018 at 19:39, Andrey Kuznetsov <
> > stku...@gmail.com
> > > >:
> > > >> > > >
> > > >> > > >> Igniters,
> > > >> > > >>
> > > >> > > >> While running tests I see a lot of ignored critical failures
> > > >> caused by
> > > >> > > >> {{NoOpFailureHandler}} that we use by default. In some tests,
> > of
> > > > >> > course,
> > > >> > > >> critical failures are the part of normal workflow, but in the
> > > >> majority
> > > >> > > of
> > > >> > > >> tests they indicate bugs. By using {{NoOpFailureHandler}} we
> > just
> > > >> hide
> > > >> > > >> these bugs from ourselves.
> > > >> > > >>
> > > >> > > >> What do you think about changing default handler to
> > > >> > > >> {{StopNodeFailureHandler}} at {{GridAbstractTest}} level?
> This
> > > >> could
> > > >> > be
> > > >> > > >> overridden in subclasses.
> > > >> > > >>
> > > >> > > >> --
> > > >> > > >> Best regards,
> > > >> > > >>   Andrey Kuznetsov.
> > > >> > > >>
> > > >> > > >
> > > >> > >
> > > >> > --
> > > >> > --
> > > >> > Maxim Muzafarov
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >> Best regards,
> > > >>   Andrey Kuznetsov.
> > > >>
> > > >
> > >
> >
> >
> > --
> > Best regards,
> >   Andrey Kuznetsov.
> >
>


-- 
Best regards,
  Andrey Kuznetsov.

Re: Switching to real FailureHandler in tests

2018-09-20 Thread Andrey Kuznetsov

I meant the first comment in [1]. We are to decide first whether we'll do
it or not.

[1] https://issues.apache.org/jira/browse/IGNITE-8227
<https://issues.apache.org/jira/browse/IGNITE-8227?focusedCommentId=16435298&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16435298>

Thu, 20 Sep 2018 at 21:18, Dmitriy Pavlov:

> Sorry, incomplete message.
>
> Why do you think there is no consensus?
>
> I have no clue what can be a reason for another approach.
> By default failure handler should fail all test.
>
> Failure handlers test will be always a minority of tests, so fail handler
> call is something abnormal.
>
> Thu, 20 Sep 2018 at 21:15, Dmitriy Pavlov:
>
> > Why do you think there is no consensus?
> >
> > I have no clue that by default failure handler should fail all test.
> >
> > Thu, 20 Sep 2018 at 21:10, Andrey Kuznetsov:
> >
> >> I've created [1] to address this.
> >>
> >> Dmitriy, I like your idea of creating special test-scope handler. But
> >> there
> >> is no consensus about it, so I don't want to rely on that potential
> >> handler
> >> right now. We can switch to it later, of course.
> >>
> >> [1] https://issues.apache.org/jira/browse/IGNITE-9660
> >>
> >> Thu, 20 Sep 2018 at 20:03, Maxim Muzafarov:
> >>
> >> > Andrey,
> >> >
> >> > I like your idea.
> >> >
> >> > After changing the default node failure handler to the new one we
> should
> >> > carefully review the whole new test failures. For instance, calling
> this
> >> > method in tests should not lead test to the node being stopped:
> >> >
> >> > FOR TEST ONLY!!!
> >> > TcpDiscoverySpi#simulateNodeFailure
> >> >
> >> > BTW, I would like to remove this method at all from production code.
> >> >
> >> > On Thu, 20 Sep 2018 at 19:43 Dmitriy Pavlov 
> >> wrote:
> >> >
> >> > > But the totally ideal situation would be finding a way to fail the
> >> test
> >> > by
> >> > > default, not only stopping a node.
> >> > >
> >> > > Some time ago I've created
> >> > > https://issues.apache.org/jira/browse/IGNITE-8227 to
> >> > > find out a way to do so.
> >> > >
> >> > > Thu, 20 Sep 2018 at 19:40, Dmitriy Pavlov:
> >> > >
> >> > > > ++1
> >> > > >
> >> > > > Thu, 20 Sep 2018 at 19:39, Andrey Kuznetsov:
> >> > > >
> >> > > >> Igniters,
> >> > > >>
> >> > > >> While running tests I see a lot of ignored critical failures
> >> caused by
> >> > > >> {{NoOpFailureHandler}} that we use by default. In some tests, of
> >> > course,
> >> > > >> critical failures are the part of normal workflow, but in the
> >> majority
> >> > > of
> >> > > >> tests they indicate bugs. By using {{NoOpFailureHandler}} we just
> >> hide
> >> > > >> these bugs from ourselves.
> >> > > >>
> >> > > >> What do you think about changing default handler to
> >> > > >> {{StopNodeFailureHandler}} at {{GridAbstractTest}} level? This
> >> could
> >> > be
> >> > > >> overridden in subclasses.
> >> > > >>
> >> > > >> --
> >> > > >> Best regards,
> >> > > >>   Andrey Kuznetsov.
> >> > > >>
> >> > > >
> >> > >
> >> > --
> >> > --
> >> > Maxim Muzafarov
> >> >
> >>
> >>
> >> --
> >> Best regards,
> >>   Andrey Kuznetsov.
> >>
> >
>


-- 
Best regards,
  Andrey Kuznetsov.

Re: Switching to real FailureHandler in tests

2018-09-20 Thread Andrey Kuznetsov

I've created [1] to address this.

Dmitriy, I like your idea of creating special test-scope handler. But there
is no consensus about it, so I don't want to rely on that potential handler
right now. We can switch to it later, of course.

[1] https://issues.apache.org/jira/browse/IGNITE-9660

Thu, 20 Sep 2018 at 20:03, Maxim Muzafarov:

> Andrey,
>
> I like your idea.
>
> After changing the default node failure handler to the new one we should
> carefully review the whole new test failures. For instance, calling this
> method in tests should not lead test to the node being stopped:
>
> FOR TEST ONLY!!!
> TcpDiscoverySpi#simulateNodeFailure
>
> BTW, I would like to remove this method at all from production code.
>
> On Thu, 20 Sep 2018 at 19:43 Dmitriy Pavlov  wrote:
>
> > But the totally ideal situation would be finding a way to fail the test
> by
> > default, not only stopping a node.
> >
> > Some time ago I've created
> > https://issues.apache.org/jira/browse/IGNITE-8227 to
> > find out a way to do so.
> >
> > Thu, 20 Sep 2018 at 19:40, Dmitriy Pavlov:
> >
> > > ++1
> > >
> > > Thu, 20 Sep 2018 at 19:39, Andrey Kuznetsov:
> > >
> > >> Igniters,
> > >>
> > >> While running tests I see a lot of ignored critical failures caused by
> > >> {{NoOpFailureHandler}} that we use by default. In some tests, of
> course,
> > >> critical failures are the part of normal workflow, but in the majority
> > of
> > >> tests they indicate bugs. By using {{NoOpFailureHandler}} we just hide
> > >> these bugs from ourselves.
> > >>
> > >> What do you think about changing default handler to
> > >> {{StopNodeFailureHandler}} at {{GridAbstractTest}} level? This could
> be
> > >> overridden in subclasses.
> > >>
> > >> --
> > >> Best regards,
> > >>   Andrey Kuznetsov.
> > >>
> > >
> >
> --
> --
> Maxim Muzafarov
>


-- 
Best regards,
  Andrey Kuznetsov.

[jira] [Created] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler

2018-09-20 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9660:


 Summary: Switch default test FailureHandler to 
StopNodeFailureHandler
 Key: IGNITE-9660
 URL: https://issues.apache.org/jira/browse/IGNITE-9660
 Project: Ignite
  Issue Type: Test
Affects Versions: 2.6
Reporter: Andrey Kuznetsov
 Fix For: 2.8


{{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} 
instance. This often hides bugs occurring in tests. 
{{getFailureHandler()}} should return {{StopNodeFailureHandler}} instead.

The change assumes re-checking failed tests and setting the handler to 
{{NoOpFailureHandler}} in subclasses where it's really a must.
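
The intended override pattern can be sketched as follows. All types here are 
simplified, hypothetical stand-ins for the Ignite classes named above (the real 
signatures are not reproduced); only the shape of "stop by default, opt out in 
a subclass" is shown.

```java
/** Hypothetical base test class; mirrors only the shape of the proposal. */
class AbstractTestSketch {
    /** Default proposed in this issue: a critical failure stops the node. */
    public FailureHandlerSketch getFailureHandler(String nodeName) {
        return new StopNodeHandlerSketch();
    }

    public static void main(String[] args) {
        Throwable err = new IllegalStateException("critical failure");
        // Default test: the failure is not hidden.
        System.out.println(new AbstractTestSketch()
            .getFailureHandler("node0").onFailure("node0", err));
        // Test that expects critical failures as part of its workflow.
        System.out.println(new ExpectedFailureTestSketch()
            .getFailureHandler("node0").onFailure("node0", err));
    }
}

/** Hypothetical handler contract: returns true if the node must be stopped. */
interface FailureHandlerSketch {
    boolean onFailure(String nodeName, Throwable cause);
}

/** Ignores every failure (the current default in tests). */
class NoOpHandlerSketch implements FailureHandlerSketch {
    @Override public boolean onFailure(String nodeName, Throwable cause) {
        return false;
    }
}

/** Stops the node on every failure (the proposed default). */
class StopNodeHandlerSketch implements FailureHandlerSketch {
    @Override public boolean onFailure(String nodeName, Throwable cause) {
        return true;
    }
}

/** A subclass where ignoring failures is really a must. */
class ExpectedFailureTestSketch extends AbstractTestSketch {
    @Override public FailureHandlerSketch getFailureHandler(String nodeName) {
        return new NoOpHandlerSketch();
    }
}
```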




Switching to real FailureHandler in tests

2018-09-20 Thread Andrey Kuznetsov

Igniters,

While running tests I see a lot of critical failures that are ignored because of
the {{NoOpFailureHandler}} that we use by default. In some tests, of course,
critical failures are part of the normal workflow, but in the majority of
tests they indicate bugs. By using {{NoOpFailureHandler}} we just hide
these bugs from ourselves.

What do you think about changing the default handler to
{{StopNodeFailureHandler}} at the {{GridAbstractTest}} level? This could be
overridden in subclasses.

--
Best regards,
  Andrey Kuznetsov.

[jira] [Created] (IGNITE-9653) StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master branch

2018-09-20 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9653:


 Summary: StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky 
failures on master branch
 Key: IGNITE-9653
 URL: https://issues.apache.org/jira/browse/IGNITE-9653
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
 Fix For: 2.8


```
junit.framework.AssertionFailedError
at 
org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93)
```




[jira] [Created] (IGNITE-9640) [TC Bot] Determine repetitive failure types by analyzing build log

2018-09-18 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9640:


 Summary: [TC Bot] Determine repetitive failure types by analyzing 
build log
 Key: IGNITE-9640
 URL: https://issues.apache.org/jira/browse/IGNITE-9640
 Project: Ignite
  Issue Type: Task
Reporter: Andrey Kuznetsov


When someone is analyzing a flaky test failure, it's important to distinguish 
between a newly introduced failure and a pre-existing one. In the latter case, the bot 
should not draw the contributor's attention to the test.

In more detail, TC build log fragments very often start with identical 
substrings for identical failures, e.g.

{noformat}
junit.framework.AssertionFailedError
at 
org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54)
{noformat}
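
One possible way to use this observation is sketched below. It is only an 
illustration, not the TC Bot code: the class {{FailureSignatureSketch}}, the 
choice of the first two non-blank lines as a signature, and the in-memory set of 
known signatures are all assumptions.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: treats the leading lines of a log fragment as a failure signature. */
class FailureSignatureSketch {
    private final Set<String> known = new HashSet<>();

    /** Builds a signature from the first 'maxLines' non-blank lines, trimmed. */
    static String signature(String logFragment, int maxLines) {
        StringBuilder sb = new StringBuilder();
        int taken = 0;
        for (String line : logFragment.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty())
                continue;
            if (taken > 0)
                sb.append('\n');
            sb.append(trimmed);
            if (++taken == maxLines)
                break;
        }
        return sb.toString();
    }

    /** Returns true if a failure with this signature is seen for the first time. */
    boolean isNewFailure(String logFragment) {
        return known.add(signature(logFragment, 2));
    }

    public static void main(String[] args) {
        FailureSignatureSketch bot = new FailureSignatureSketch();
        String first = "junit.framework.AssertionFailedError\n"
            + "  at org.example.FooTest.testBar(FooTest.java:54)\n"
            + "  at org.example.Runner.run(Runner.java:10)";
        String second = "junit.framework.AssertionFailedError\n"
            + "  at org.example.FooTest.testBar(FooTest.java:54)\n"
            + "  at org.example.Other.call(Other.java:77)";
        System.out.println(bot.isNewFailure(first));  // first occurrence of the signature
        System.out.println(bot.isNewFailure(second)); // same leading lines as 'first'
    }
}
```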





CRUD issues in Ignite Upsource

2018-09-17 Thread Andrey Kuznetsov

Igniters,

I am experiencing issues with Upsource [1]. Creating or deleting a review, or
changing the set of reviewers in an existing review, leads to an error message
in a toast. However, the requested changes may show up some time later or
after a page reload. Can anyone comment on this?

[1] https://reviews.ignite.apache.org/ignite

-- 
Best regards,
  Andrey Kuznetsov.

[GitHub] ignite pull request #4762: IGNITE-9601 Writing rollover record to the end of...

2018-09-14 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/4762

IGNITE-9601 Writing rollover record to the end of current segment.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-9601

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4762.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4762


commit 13a1dfea64e5974f0014930a08bcf58d239f0428
Author: Andrey Kuznetsov 
Date:   2018-09-14T12:50:47Z

IGNITE-9601 Writing rollover record to the end of current segment.




---

[jira] [Created] (IGNITE-9601) Write rollover WAL record as the last

2018-09-14 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9601:


 Summary: Write rollover WAL record as the last 
 Key: IGNITE-9601
 URL: https://issues.apache.org/jira/browse/IGNITE-9601
 Project: Ignite
  Issue Type: Task
Affects Versions: 2.6
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.8


Currently, the rollover WAL record goes to the next segment when being logged. 
Moreover, the implementation allows data races, so the rollover record is not 
necessarily the first record in the next segment. We are to add an option to 
the logging facility to allow writing the rollover record to the end of the current 
segment; subsequent records should then go to the next segment.
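
The desired ordering can be shown with a toy model. This is a sketch under 
strong assumptions, not the WAL manager: {{SegmentedLogSketch}} is hypothetical, 
segments are in-memory lists, and a single lock makes "append the rollover 
record" and "switch to the next segment" one atomic step so that no other record 
can slip in between.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory log split into segments. */
class SegmentedLogSketch {
    private final List<List<String>> segments = new ArrayList<>();

    SegmentedLogSketch() {
        segments.add(new ArrayList<>());
    }

    /**
     * Appends a record to the current segment. With rollOver set, the record
     * becomes the last one in the current segment and a new segment is opened
     * under the same lock.
     *
     * @return index of the segment the record was written to.
     */
    synchronized int log(String rec, boolean rollOver) {
        int idx = segments.size() - 1;
        segments.get(idx).add(rec);
        if (rollOver)
            segments.add(new ArrayList<>());
        return idx;
    }

    /** Returns a copy of the records of the given segment. */
    synchronized List<String> segment(int idx) {
        return new ArrayList<>(segments.get(idx));
    }

    public static void main(String[] args) {
        SegmentedLogSketch wal = new SegmentedLogSketch();
        wal.log("data-1", false);
        wal.log("rollover", true);
        wal.log("data-2", false);
        System.out.println(wal.segment(0)); // ends with the rollover record
        System.out.println(wal.segment(1)); // starts with the subsequent record
    }
}
```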




[GitHub] ignite pull request #4736: Ignite 6587 debug

2018-09-12 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/4736

Ignite 6587 debug

For debug purposes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-6587-debug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4736.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4736


commit 3b25cb934838feb3300c7088db4b3840ec7f9ab5
Author: Andrey Kuznetsov 
Date:   2018-05-07T12:43:26Z

IGNITE-6587: Refactored ServerImpl.TcpServer to use GridWorker.

commit 665ba863ceb95b643a805d6920f3f384275dae20
Author: Andrey Kuznetsov 
Date:   2018-05-07T13:06:22Z

IGNITE-6587: Refactored ServerImpl.RingMessageWorker to use GridWorker.

commit b10bcd350efb55de95792b369996ba266b3612a9
Author: Andrey Kuznetsov 
Date:   2018-05-07T15:59:56Z

IGNITE-6587: Refactored WAL manager critical threads to use GridWorker.

commit 3154112c8bab70d1ff4fe20639648e8a9c7da373
Author: Andrey Kuznetsov 
Date:   2018-05-07T18:55:25Z

IGNITE-6587: Minor improvements.

commit a44fc8064a9ba8f42e4faacde018ac9bc1193f01
Author: Andrey Kuznetsov 
Date:   2018-05-08T10:52:01Z

IGNITE-6587: Refactored CommunicationWorker to use GridWorker.

commit 0efc5354f33666ac705c738d75955258084091de
Author: Andrey Kuznetsov 
Date:   2018-05-08T15:25:56Z

IGNITE-6587: (WIP) switching to timed waits/polls, updating heartbeats.

commit 48c889441a119ef2192c1ff61485c20f80659e82
Author: Andrey Kuznetsov 
Date:   2018-05-14T13:38:47Z

IGNITE-6587: Heartbeat updates in critical WAL manager workers.

commit 69059bda2071f7d6af0cef30cf8d77b160fc56be
Author: Andrey Kuznetsov 
Date:   2018-05-14T15:55:51Z

IGNITE-6587: (WiP) More heartbeat updates in critical workers.

commit 6fb2edf1fef15bb187481f89eddb3710d5daadba
Author: Andrey Kuznetsov 
Date:   2018-05-15T13:09:16Z

IGNITE-6587: Heartbeat updates in checkpointer.

commit 9cd8cba1fdc8c21d999df4c334daaef7e226aa0f
Author: Andrey Kuznetsov 
Date:   2018-05-15T13:23:17Z

IGNITE-6587: More heartbeat updates in critical workers.

commit 5245abc0c72812c4746127b25638a28dac7af11a
Author: Andrey Kuznetsov 
Date:   2018-05-16T20:14:31Z

IGNITE-6587: Extended StripedExecutor.Stripe from GridWorker.

commit 2eb448f6a7ff7a6be423742336480a7b98793c2d
Author: Andrey Kuznetsov 
Date:   2018-05-17T11:37:08Z

IGNITE-6587: Heartbeat updates in StripedExecutor workers.

commit 8304fa92879fe293f112d1c7ba515b5bd30cc271
Author: Andrey Kuznetsov 
Date:   2018-05-17T12:33:23Z

Merge branch 'master' into ignite-6587

# Conflicts:
#   
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/wal/FileWriteAheadLogManager.java
#   
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/wal/FsyncModeFileWriteAheadLogManager.java

commit 5051aa4d44e78b5362dd0094bc5a57566eea4855
Author: Andrey Kuznetsov 
Date:   2018-05-17T12:36:23Z

IGNITE-6587: Addition to merge commit.

commit 732d2c25013d94d67188f42b2f96c786eeaf6346
Author: Andrey Kuznetsov 
Date:   2018-05-17T12:42:05Z

IGNITE-6587: Updated WAL FileDecompressors to conform uniform poll style.

commit 0c6f7762f1bc709c1444e545ff6475edc3fd6152
Author: Andrey Kuznetsov 
Date:   2018-05-18T09:54:05Z

IGNITE-6587: Added Checkpointer to worker registry.

commit d64e2f8e97e9d8a339695087ceea1443f8f75141
Author: Andrey Kuznetsov 
Date:   2018-05-18T14:43:22Z

IGNITE-6587: Added Stripe workers to worker registry.

commit fd142e80f80e73d420c0a03923ba3a4783dc7f34
Author: Andrey Kuznetsov 
Date:   2018-05-21T11:40:46Z

IGNITE-6587: Optional reference to worker registry in GridNioServer.

commit de98f8782f1728cc7e4a2852fbd0931cb21d3af9
Author: Andrey Kuznetsov 
Date:   2018-05-21T11:42:11Z

IGNITE-6587: Added worker registry to communication SPI GridNioServer.

commit f24f93abe8b8a3833f556b204c9995d28cd9cf44
Author: Andrey Kuznetsov 
Date:   2018-05-21T13:47:11Z

Merge branch 'master' into ignite-6587

# Conflicts:
#   
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/wal/FileWriteAheadLogManager.java
#   
modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/wal/FsyncModeFileWriteAheadLogManager.java

commit e3a7b9aaed79d45b205715b6b281db702e7be95d
Author: Andrey Kuznetsov 
Date:   2018-05-22T12:30:30Z

IGNITE-6587: Recovering after incorrect merge commit.

commit 45c64a1ebaf5dabcfdf64e35a14542c0bddaa4c2
Author: Andrey Kuznetsov 
Date:   2018-05-22T12:56:27Z

IGNITE-6587: Moved WAL timeout constants to proper place.

commit 95f1ce0a9c3002a72fa47c616f928521570d2d40
Author: Andrey Kuznetsov 
Date:   2018-05-22T13:12:36Z

IGNITE-6587: Made WAL FileDecompressors not critical again.

commit

Re: Critical worker threads liveness checking drawbacks

2018-09-11 Thread Andrey Kuznetsov

David, Maxim!

Thanks a lot for your ideas. Unfortunately, I can't adopt all of them right
now: the scope is much broader than the scope of the change I implement. I
have had a talk with a group of Ignite committers, and we agreed to complete
the change as follows.
- Blocking instructions in system-critical threads which may reasonably last
long should be explicitly excluded from the monitoring.
- Failure handlers should have a setting to suppress some failures on
a per-failure-type basis.
According to this I have updated the implementation: [1]

[1] https://github.com/apache/ignite/pull/4089
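
To make the two points concrete, here is a rough sketch. It does not reproduce
the code from the pull request: the names BlockingSectionSketch,
FailureTypeSketch and SuppressingHandlerSketch, as well as the explicit time
arguments, are made up for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical worker state: heartbeat plus an explicit "blocking section" flag. */
class BlockingSectionSketch {
    private volatile long heartbeatTs;
    private volatile boolean inBlockingSection;

    void updateHeartbeat(long now) {
        heartbeatTs = now;
    }

    /** Marks the start of an operation that may legitimately last long (e.g. fsync). */
    void blockingSectionBegin() {
        inBlockingSection = true;
    }

    /** Refreshes the heartbeat first, then re-enables monitoring. */
    void blockingSectionEnd(long now) {
        heartbeatTs = now;
        inBlockingSection = false;
    }

    /** A worker inside a blocking section is never reported as blocked. */
    boolean isBlocked(long now, long timeoutMs) {
        return !inBlockingSection && now - heartbeatTs > timeoutMs;
    }

    public static void main(String[] args) {
        BlockingSectionSketch w = new BlockingSectionSketch();
        w.updateHeartbeat(0);
        w.blockingSectionBegin();
        System.out.println(w.isBlocked(60_000, 10_000)); // excluded from monitoring
        w.blockingSectionEnd(60_000);
        System.out.println(w.isBlocked(61_000, 10_000)); // heartbeat is fresh again

        SuppressingHandlerSketch hnd =
            new SuppressingHandlerSketch(EnumSet.of(FailureTypeSketch.WORKER_BLOCKED));
        System.out.println(hnd.onFailure(FailureTypeSketch.WORKER_BLOCKED));
        System.out.println(hnd.onFailure(FailureTypeSketch.WORKER_TERMINATED));
    }
}

/** Hypothetical failure types. */
enum FailureTypeSketch { WORKER_BLOCKED, WORKER_TERMINATED }

/** Hypothetical handler with a per-failure-type suppression setting. */
class SuppressingHandlerSketch {
    private final Set<FailureTypeSketch> ignored = EnumSet.noneOf(FailureTypeSketch.class);

    SuppressingHandlerSketch(Set<FailureTypeSketch> ignoredTypes) {
        ignored.addAll(ignoredTypes);
    }

    /** Returns true if the failure should stop the node, false if it is suppressed. */
    boolean onFailure(FailureTypeSketch type) {
        return !ignored.contains(type);
    }
}
```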

Mon, 10 Sep 2018 at 22:35, David Harvey:

> When I've done this before, I've needed to find the oldest thread, and kill
> the node running that. From a language standpoint, Maxim's "without
> progress" is better than "heartbeat". For example, what I'm most interested
> in on a distributed system is which thread started the work it has not
> completed the earliest, and when did that thread last make forward
> progress. You don't want to kill a node because a thread is waiting on a
> lock held by a thread that went off-node and has not gotten a response.
> If you don't understand the dependency relationships, you will make
> incorrect recovery decisions.
>
> On Mon, Sep 10, 2018 at 4:08 AM Maxim Muzafarov 
> wrote:
>
> > I think we should find exact answers to these questions:
> >  1. What `critical` issue exactly is?
> >  2. How can we find critical issues?
> >  3. How can we handle critical issues?
> >
> > First,
> >  - Ignore uninterruptable actions (e.g. worker\service shutdown)
> >  - Long I/O operations (should be a configurable timeout for each type of
> > usage)
> >  - Infinite loops
> >  - Stalled\deadlocked threads (and\or too many parked threads, exclude
> I/O)
> >
> > Second,
> >  - The working queue is without progress (e.g. disco, exchange queues)
> >  - Work hasn't been completed since the last heartbeat (checking
> > milestones)
> >  - Too many system resources used by a thread for the long period of time
> > (allocated memory, CPU)
> >  - Timing fields associated with each thread status exceeded a maximum
> time
> > limit.
> >
> > Third (not too many options here),
> >  - `log everything` should be the default behaviour in all these cases,
> > since it may be difficult to find the cause after the restart.
> >  - Wait some interval of time and kill the hanging node (cluster should
> be
> > configured stable enough)
> >
> > Questions,
> >  - Not sure, but can workers miss their heartbeat deadlines if CPU loads
> up
> > to 80%-90%? Bursts of momentary overloads can be
> > expected behaviour as a normal part of system operations.
> >  - Why do we decide that critical thread should monitor each other? For
> > instance, if all the tasks were blocked and unable to run,
> > node reset would never occur. As for me, a better solution is to use
> a
> > separate monitor thread or pool (maybe both with software
> > and hardware checks) that not only checks heartbeats but monitors the
> > other system as well.
> >
> > On Mon, 10 Sep 2018 at 00:07 David Harvey  wrote:
> >
> > > It would be safer to restart the entire cluster than to remove the last
> > > node for a cache that should be redundant.
> > >
> > > On Sun, Sep 9, 2018, 4:00 PM Andrey Gura  wrote:
> > >
> > > > Hi,
> > > >
> > > > I agree with Yakov that we can provide some option that manage worker
> > > > liveness checker behavior in case of observing that some worker is
> > > > blocked too long.
> > > > At least it will be some workaround for cases when node failure is too
> > > > annoying.
> > > >
> > > > Backups count threshold sounds good but I don't understand how it
> will
> > > > help in case of cluster hanging.
> > > >
> > > > The simplest solution here is alert in cases of blocking of some
> > > > critical worker (we can improve WorkersRegistry for this purpose and
> > > > expose list of blocked workers) and optionally call system configured
> > > > failure processor. BTW, failure processor can be extended in order to
> > > > perform any checks (e.g. backup count) and decide whether it should
> > > > stop node or not.
> > > > On Sat, Sep 8, 2018 at 3:42 PM Andrey Kuznetsov 
> > > wrote:
> > > > >
> > > > > David, Yakov, I understand your fears. But liveness checks deal
> with
> > > > > _critical_ co

Re: Critical worker threads liveness checking drawbacks

2018-09-08 Thread Andrey Kuznetsov

David, Yakov, I understand your fears. But liveness checks deal with
_critical_ conditions, i.e. when such a condition is met we consider the
node totally broken, and there is no sense in keeping it alive regardless of
the data it contains. If we want to give it a chance, then the condition
(long fsync etc.) should not be considered critical at all.

Sat, 8 Sep 2018 at 15:18, Yakov Zhdanov:

> Agree with David. We need to have an opportunity to set a backups count
> threshold (at runtime also!) that will not allow any automatic stop if there
> will be a data loss. Andrey, what do you think?
>
> --Yakov
>


-- 
Best regards,
  Andrey Kuznetsov.

Re: Critical worker threads liveness checking drawbacks

2018-09-07 Thread Andrey Kuznetsov

Yakov,

Thanks for the reply. Indeed, the initial design assumed node termination when
a hanging critical thread has been detected. But sometimes this looks
inappropriate. Suppose, for example, that fsync in the WAL writer thread takes
too long, and we terminate the node. Upon rebalancing, this may lead to long
fsyncs on other nodes due to increased per-node load, hence we can terminate
the next node as well. Eventually we can collapse the entire cluster. Is it a
possible scenario?

Fri, 7 Sep 2018 at 18:44, Yakov Zhdanov:

> Andrey,
>
> I don't understand your point. In my opinion, the idea of these changes is to
> make the cluster more stable and responsive by eliminating hung nodes. I
> would not make too much difference between threads trapped in deadlock and
> threads hanging on fsync calls for too long. Both situations lead to
> increasing latency in the cluster till its full unavailability.
>
> So, killing node hanging on fsync may be reasonable. Agree?
>
> You may implement the approach when you have warning messages in logs by
> default, but termination option should also be available.
>
> Thanks!
>
> --Yakov
>
>

Critical worker threads liveness checking drawbacks

2018-09-06 Thread Andrey Kuznetsov

Igniters,

Currently, we have a nearly completed implementation of liveness checking for
system-critical threads [1], in terms of IEP-14 [2] and IEP-5 [3]. In a
nutshell, system-critical threads monitor each other and check two
aspects:
- whether a thread is alive;
- whether a thread is active, i.e. it updates its heartbeat timestamp
periodically.
When either check fails, the critical failure handler is called, which in fact
means node stop.
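
To make the two checks concrete, here is a minimal sketch. It is not the
code from [1]: the class LivenessRegistry, its methods and the failure
strings are hypothetical names made up for illustration, and timestamps are
passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the two liveness checks; not Ignite's actual classes. */
final class LivenessRegistry {
    /** Last heartbeat timestamp (ms) for each registered worker thread. */
    private final Map<Thread, Long> heartbeats = new ConcurrentHashMap<>();

    /** Maximum allowed gap between heartbeats (an assumed, illustrative parameter). */
    private final long timeoutMs;

    LivenessRegistry(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Starts tracking a worker; registration counts as its first heartbeat. */
    void register(Thread worker, long nowMs) {
        heartbeats.put(worker, nowMs);
    }

    /** Called by a worker periodically to show it is making progress. */
    void heartbeat(Thread worker, long nowMs) {
        heartbeats.put(worker, nowMs);
    }

    /** Returns one description per failed check; an empty list means all workers look healthy. */
    List<String> check(long nowMs) {
        List<String> failures = new ArrayList<>();

        for (Map.Entry<Thread, Long> e : heartbeats.entrySet()) {
            Thread t = e.getKey();

            if (!t.isAlive())
                failures.add(t.getName() + ": not alive");       // check 1: thread is alive
            else if (nowMs - e.getValue() > timeoutMs)
                failures.add(t.getName() + ": no heartbeat");    // check 2: thread is active
        }

        return failures;
    }
}
```

In this sketch a non-empty result of check() is the point where the critical
failure handler would be invoked.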

The implementation of activity checks has a flaw now: some blocking actions
are part of normal operation and should not lead to node stop, e.g.
- WAL writer thread can call {{fsync()}};
- any cache write that occurs in system striped executor can lead to
{{fsync()}} call again.
The former example can be fixed by disabling heartbeat checks temporarily
for known long-running actions, but it won't work for the latter one.
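
A rough sketch of the "disable heartbeat checks temporarily" idea follows.
MonitoredWorker, runBlocking and isStalled are hypothetical names, not the
API from [1], and the clock is injected only to make the example testable.

```java
import java.util.function.LongSupplier;

/** Hypothetical worker-side state; illustrates suspending activity checks around a known blocking call. */
final class MonitoredWorker {
    /** Time source in milliseconds (injected so the sketch is deterministic). */
    private final LongSupplier clock;

    private volatile long lastHeartbeatMs;

    /** True while the worker is inside a declared long-running action. */
    private volatile boolean blockingSection;

    MonitoredWorker(LongSupplier clock) {
        this.clock = clock;
        this.lastHeartbeatMs = clock.getAsLong();
    }

    void updateHeartbeat() {
        lastHeartbeatMs = clock.getAsLong();
    }

    /** Runs a known long-running action (e.g. an fsync) with activity checks suspended. */
    void runBlocking(Runnable action) {
        blockingSection = true;

        try {
            action.run();
        }
        finally {
            // Refresh the heartbeat first so the time spent blocked is not counted afterwards.
            updateHeartbeat();
            blockingSection = false;
        }
    }

    /** The activity check as seen by a monitoring thread. */
    boolean isStalled(long timeoutMs) {
        return !blockingSection && clock.getAsLong() - lastHeartbeatMs > timeoutMs;
    }
}
```

This only helps where the blocking call site is known in advance, as in the
WAL writer. A cache write in the striped executor can reach {{fsync()}}
through many code paths, which is why the same trick does not cover it.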

I see a few options to address the issue:
- Just log any long-running action instead of calling the critical failure
handler.
- Introduce several severity levels for handling long-running actions. Each
level will have its own failure handler. Depending on the level, a
long-running action can lead to node stop, error logging or no reaction.
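
The second option could look roughly like the sketch below. Everything in it
is an assumption made for illustration: the level names, the idea of
choosing a level from how long the thread has been blocked, and the
LongActionPolicy class itself. The proposal does not fix any of these.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical severity levels, matching the three reactions listed above. */
enum Severity { NO_OP, LOG_ERROR, STOP_NODE }

/** Hypothetical policy: picks a level by blocked duration and runs that level's handler. */
final class LongActionPolicy {
    private final long logAfterMs;
    private final long stopAfterMs;

    /** One handler per level; a level without a handler means no reaction. */
    private final Map<Severity, Consumer<String>> handlers = new EnumMap<>(Severity.class);

    LongActionPolicy(long logAfterMs, long stopAfterMs) {
        if (logAfterMs > stopAfterMs)
            throw new IllegalArgumentException("logAfterMs must not exceed stopAfterMs");

        this.logAfterMs = logAfterMs;
        this.stopAfterMs = stopAfterMs;
    }

    void setHandler(Severity level, Consumer<String> handler) {
        handlers.put(level, handler);
    }

    /** Maps the time a worker has been blocked to a severity level (assumed thresholds). */
    Severity classify(long blockedMs) {
        if (blockedMs >= stopAfterMs)
            return Severity.STOP_NODE;

        if (blockedMs >= logAfterMs)
            return Severity.LOG_ERROR;

        return Severity.NO_OP;
    }

    /** Classifies the event and invokes the handler registered for that level, if any. */
    Severity handle(String workerName, long blockedMs) {
        Severity level = classify(blockedMs);
        Consumer<String> handler = handlers.get(level);

        if (handler != null)
            handler.accept(workerName);

        return level;
    }
}
```

With only a LOG_ERROR handler registered, this degenerates into the first
option (just logging), so the two options are not mutually exclusive.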

I encourage you to suggest other options. Any idea is appreciated.

[1] https://issues.apache.org/jira/browse/IGNITE-6587
[2]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling
[3]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74683878

--
Best regards,
  Andrey Kuznetsov.

[GitHub] ignite pull request #4335: IGNITE-8823 Quick fix.

2018-09-03 Thread andrey-kuznetsov

Github user andrey-kuznetsov closed the pull request at:

https://github.com/apache/ignite/pull/4335


---

[GitHub] ignite pull request #4667: IGNITE-8823 Quick fix.

2018-09-03 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/4667

IGNITE-8823 Quick fix.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-8823-true

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4667


commit b907d2a8329d58acca77049de580c345a3dee27d
Author: Andrey Kuznetsov 
Date:   2018-09-03T11:21:23Z

IGNITE-8823 Quick fix.




---

[GitHub] ignite pull request #4559: Ignite 9280

2018-08-16 Thread andrey-kuznetsov

GitHub user andrey-kuznetsov opened a pull request:

https://github.com/apache/ignite/pull/4559

Ignite 9280



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrey-kuznetsov/ignite ignite-9280

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4559


commit b74b4a645a13fea214937652bc4523bb0340995e
Author: Andrey Kuznetsov 
Date:   2018-08-15T12:15:33Z

IGNITE-9280 Compressing WAL archive segments ASAP.

commit 099d67410ee19e68b2dc3c335d3497b1f37c4696
Author: Andrey Kuznetsov 
Date:   2018-08-15T15:52:02Z

IGNITE-9280 Added WAL segment compaction completion event.

commit 72bbecd98c19b1cda2c605b19f2b1ba2e7867d18
Author: Andrey Kuznetsov 
Date:   2018-08-15T17:42:44Z

IGNITE-9280 Exposed last WAL segment compaction time to public API.

commit ea6a2e1249695cd3d838eb7d01367d21feaed821
Author: Andrey Kuznetsov 
Date:   2018-08-16T10:28:34Z

Merge branch 'master' into ignite-9280




---

[jira] [Created] (IGNITE-9280) Decrease WAL archive compaction latency

2018-08-15 Thread Andrey Kuznetsov (JIRA)

Andrey Kuznetsov created IGNITE-9280:


 Summary: Decrease WAL archive compaction latency
 Key: IGNITE-9280
 URL: https://issues.apache.org/jira/browse/IGNITE-9280
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.6
Reporter: Andrey Kuznetsov
 Fix For: 2.7


If {{n}} is the index of the WAL segment containing the latest checkpoint mark,
then segment {{n-1}} is currently prevented from compaction. This limitation
can be removed safely (while the requirement of preserving raw segment {{n-1}}
should remain untouched).
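
The index arithmetic can be written down as a small sketch. This is one
reading of the description above, not Ignite code: WalCompactionRule and its
methods are hypothetical, and the assumption that every segment with an
index below {{n-1}} is already eligible for compaction is mine.

```java
/** Hypothetical helper; n is the index of the segment holding the latest checkpoint mark. */
final class WalCompactionRule {
    private WalCompactionRule() {
        // Static helpers only.
    }

    /** Current behaviour as described: segment n-1 is excluded from compaction. */
    static boolean canCompactCurrent(long idx, long n) {
        return idx < n - 1;
    }

    /** Proposed behaviour: segment n-1 may be compacted as well. */
    static boolean canCompactProposed(long idx, long n) {
        return idx < n;
    }

    /** Unchanged requirement: the raw (uncompressed) segment n-1 must be preserved. */
    static boolean mustKeepRaw(long idx, long n) {
        return idx >= n - 1;
    }
}
```

Under this reading, for n = 10 the proposal makes segment 9 compactable one
checkpoint earlier, while its raw file stays on disk next to the compacted
copy.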



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
