Re: Apache Ignite 2.7. Last Mile

2018-11-16 Thread Alexey Goncharuk
Igniters,

I've just found that the S.toString() implementation is broken in ignite-2.7
and master [1]. It leads to the message
*Wrapper [p=Parent [a=0]Child [b=0, super=]]*
being formed instead of
*Wrapper [p=Child [b=0, super=Parent [a=0]]]*
for classes with inheritance that use S.toString(SomeClass.class, this,
super.toString()) and are embedded in some other object.

Dmitrii Ryabov, I've reverted two commits related to IGNITE-602 and
IGNITE-9209 tickets locally and it fixes the issue. Can you take a look at
the issue?

I think this regression essentially makes our logs unreadable in some cases
and I would like to get it fixed in ignite-2.7 or revert both commits from
the release.

[1] https://issues.apache.org/jira/browse/IGNITE-10301

Fri, Nov 9, 2018 at 09:22, Nikolay Izhikov :

> Hello, Igniters.
>
> We still have 5 tickets for 2.7:
>
> IGNITE-10052  Andrew Mashenkov  Restart node during TX causes vacuum error.
> IGNITE-10170  Unassigned        .NET: Services.ServicesTestAsync fails
> IGNITE-10196  Maxim Pudov       Remove kafka-clients-*-test dependency
> IGNITE-10154  Andrey Gura       Critical worker liveness check configuration is non-trivial and inconsistent
> IGNITE-9996   Nikolay Izhikov   Investigate possible performance drop in FSYNC mode for ignite-2.7 compared to ignite-2.6
>
>
> On Thu, 08/11/2018 at 14:25 +0300, Nikolay Izhikov wrote:
> > I'm OK with this.
> >
> > Thu, Nov 8, 2018, 13:44 Andrey Gura ag...@apache.org:
> > > Long, long way to release :)
> > >
> > > Guys, we have a breaking change in Ignite 2.7 so we must add
> > > IGNITE-10154 [1] fix to the release.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-10154
> > > On Tue, Nov 6, 2018 at 6:30 PM Igor Sapego  wrote:
> > > >
> > > > Guys,
> > > >
> > > > I've found the following issue: [1]. It is quite local (only affects
> > > > Ignite C++ Linux build system) but quite critical too. I think it
> > > > should be included in 2.7.
> > > >
> > > > What do you think?
> > > >
> > > > [1] - https://issues.apache.org/jira/browse/IGNITE-10147
> > > >
> > > > Best Regards,
> > > > Igor
> > > >
> > > >
> > > > On Tue, Oct 30, 2018 at 4:35 PM Вячеслав Коптилин <slava.kopti...@gmail.com> wrote:
> > > >
> > > > > Hello Nikolay, Igniters,
> > > > >
> > > > > It seems that we lost the following commit, which should be included in
> > > > > the 'ignite-2.7' branch
> > > > > (it looks like the change was accidentally not cherry-picked from 'master'
> > > > > to 'ignite-2.7'):
> > > > >  - https://github.com/apache/ignite/commit/6e0ff06f8e309657a16c94da605348d9c3b804ad
> > > > >
> > > > > The most important part is the change introduced into GridDhtAtomicCache;
> > > > > the fix prevents a NullPointerException during cache updates under some
> > > > > circumstances.
> > > > > So, I propose including the fix in ignite-2.7, at least the change to
> > > > > GridDhtAtomicCache.
> > > > >
> > > > > Thanks,
> > > > > Slava.
> > > > >
> > > > > Mon, Oct 29, 2018 at 11:20, Nikolay Izhikov :
> > > > >
> > > > > > Hello, guys.
> > > > > >
> > > > > > As of today, we have 11 tickets mapped to 2.7:
> > > > > >
> > > > > > IGNITE-10010 Alexey Goncharuk Node halted if second node was stopped, then cache destroyed, then second node returned
> > > > > > IGNITE-10015 Alexey Goncharuk Sporadic JVM crash due to restart nodes
> > > > > > IGNITE-10013 Unassigned Node restart may lead to NPE in GridDhtPartitionsExchangeFuture
> > > > > > IGNITE-9928 Igor Seliverstov MVCC TX: Late affinity assignment support.
> > > > > > IGNITE-9985 Igor Seliverstov MVCC TX: fix backup mappings
> > > > > > IGNITE-10007 Sergey Kozlov Deactivation hangs if an open transaction exists
> > > > > > IGNITE-10004 Andrew Mashenkov Parse error leads to leave the transaction
> > > > > > IGNITE-10024 Ivan Pavlukhin MVCC TX: Stackoverflow during DhtEnlistFuture mapping

[jira] [Created] (IGNITE-10301) GridToStringBuilder is broken for classes with inheritance

2018-11-16 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-10301:
-

 Summary: GridToStringBuilder is broken for classes with inheritance
 Key: IGNITE-10301
 URL: https://issues.apache.org/jira/browse/IGNITE-10301
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7
Reporter: Alexey Goncharuk
 Fix For: 2.7


Given the following class hierarchy
{code}
/** */
private static class Parent {
    /** */
    private int a;

    /** {@inheritDoc} */
    @Override public String toString() {
        return S.toString(Parent.class, this);
    }
}

/** */
private static class Child extends Parent {
    /** */
    private int b;

    /** {@inheritDoc} */
    @Override public String toString() {
        return S.toString(Child.class, this, super.toString());
    }
}

private static class Wrapper {
    /** */
    @GridToStringInclude
    Parent p = new Child();

    /** {@inheritDoc} */
    @Override public String toString() {
        return S.toString(Wrapper.class, this);
    }
}
{code}
the following test fails:
{code}
/**
 */
public void testHierarchy() {
    Wrapper w = new Wrapper();
    Parent p = w.p;

    String wS = w.toString();
    String pS = p.toString();

    // Expect wS to be "Wrapper [p=" + pS + ']'.
    assertEquals("Wrapper [p=" + pS + ']', wS);
}
{code}

{code}
Expected :Wrapper [p=Child [b=0, super=Parent [a=0]]]
Actual   :Wrapper [p=Parent [a=0]Child [b=0, super=]]
{code}

This is a regression from IGNITE-602. We need to fix this in 2.7 or revert 
IGNITE-602.
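
The misordered output is consistent with nested toString() calls appending to one shared buffer and returning an empty string. This is only an assumption about the mechanism; the ticket does not state the root cause. The self-contained sketch below does not use GridToStringBuilder at all: the class ToStringOrderSketch, its format() helper, and the `share` flag are hypothetical names introduced purely for illustration.

```java
import java.util.function.Consumer;

/** Hypothetical model of the symptom; not Ignite code. */
final class ToStringOrderSketch {
    /** Shared buffer standing in for a builder reused across nested calls (assumption). */
    private static final StringBuilder SHARED = new StringBuilder();

    /** Nesting depth of format() calls that use the shared buffer. */
    private static int depth;

    /** Formats "cls [body]", either into a private builder or into the shared one. */
    static String format(boolean share, String cls, Consumer<StringBuilder> body) {
        if (!share) {
            StringBuilder sb = new StringBuilder(cls).append(" [");
            body.accept(sb);
            return sb.append(']').toString();
        }

        boolean outer = depth++ == 0;

        try {
            SHARED.append(cls).append(" [");
            body.accept(SHARED);
            SHARED.append(']');

            if (!outer)
                return ""; // Nested call: the text already sits in the shared buffer.

            String res = SHARED.toString();
            SHARED.setLength(0);

            return res;
        }
        finally {
            depth--;
        }
    }

    static String parent(boolean share) {
        return format(share, "Parent", sb -> sb.append("a=0"));
    }

    static String child(boolean share) {
        // Evaluated before Child starts formatting, like the super.toString() argument.
        String sup = parent(share);

        return format(share, "Child", sb -> sb.append("b=0, super=").append(sup));
    }

    static String wrapper(boolean share) {
        return format(share, "Wrapper", sb -> sb.append("p=").append(child(share)));
    }

    public static void main(String[] args) {
        System.out.println(wrapper(true));  // Wrapper [p=Parent [a=0]Child [b=0, super=]]
        System.out.println(wrapper(false)); // Wrapper [p=Child [b=0, super=Parent [a=0]]]
    }
}
```

With a private builder per call, the parent string is complete before the child embeds it, which gives the expected nesting.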



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10238) Intermittent Client Nodes suite hang

2018-11-13 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-10238:
-

 Summary: Intermittent Client Nodes suite hang
 Key: IGNITE-10238
 URL: https://issues.apache.org/jira/browse/IGNITE-10238
 Project: Ignite
  Issue Type: Test
 Environment: There are occasional hangs of Client Nodes suite in 
master. A quick peek at the thread dumps reveals an interesting deadlock (only 
relevant parts of the thread dump are left):
{code}
"disco-notifier-worker-#634%internal.IgniteClientReconnectApiExceptionTest0%" 
#791 prio=5 os_prio=0 tid=0x7f990c12d800 nid=0x11b9 waiting on condition [0x7f991a0eb000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
    at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:656)
    at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:206)
    at org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1293)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2007)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:286)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:185)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1984)
    at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:703)
    at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:188)
    at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:874)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1764)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1984)
    at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:703)
    at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:188)
    at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:874)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1764)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
    at org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:313)
    at org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:101)
    at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:81)
    at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10131)
    at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10160)
    at org.apache.ignite.internal.GridEventConsumeHandler.p2pUnmarshal(GridEventConsumeHandler.java:390)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.processStartRequest(GridContinuousProcessor.java:1362)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.access$400(GridContinuousProcessor.java:111)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:203)
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:194)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:725)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:602)
    - locked <0x0007b62859b8> (a java.lang.Object)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$$Lambda$17/432384581.run(Unknown Source)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2665)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryM

[jira] [Created] (IGNITE-10237) Inspections build is broken in master

2018-11-13 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-10237:
-

 Summary: Inspections build is broken in master
 Key: IGNITE-10237
 URL: https://issues.apache.org/jira/browse/IGNITE-10237
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk








[jira] [Created] (IGNITE-10123) Intermittent OOME errors in PDS indexing tests

2018-11-02 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-10123:
-

 Summary: Intermittent OOME errors in PDS indexing tests
 Key: IGNITE-10123
 URL: https://issues.apache.org/jira/browse/IGNITE-10123
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk
 Fix For: 2.8








[jira] [Created] (IGNITE-10094) TC: Introduce overnight builds

2018-10-31 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-10094:
-

 Summary: TC: Introduce overnight builds
 Key: IGNITE-10094
 URL: https://issues.apache.org/jira/browse/IGNITE-10094
 Project: Ignite
  Issue Type: Task
Reporter: Alexey Goncharuk


Creating this ticket to collect all efforts on shortening a single TC run and
introducing overnight TC runs.
From the infrastructure side, we need to create a separate run configuration
(for example, Run All Nightly). To begin with, Run All Nightly will delegate to
Run All, and later we will move several long-running suites to the nightly run.
Run All Nightly should have a nightly trigger.
From the TC bot side, we need to configure it to push nightly builds when TC
is idle and, additionally, to track new failures in nightly runs.
From the code side, we need to define an environment property that
distinguishes a quick run from the nightly run. Later this property will be
used to scale test duration.
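
A minimal sketch of the code-side idea, assuming the flag is passed as a JVM system property. The ticket does not name the property or the scaling rule, so `test.nightly.run`, `NIGHTLY_FACTOR`, and the class TestScaleSketch are hypothetical placeholders.

```java
/** Hypothetical helper; names and values are illustrative only. */
final class TestScaleSketch {
    /** Hypothetical property name; the ticket does not define one. */
    static final String NIGHTLY_PROP = "test.nightly.run";

    /** Hypothetical multiplier applied to test durations in nightly runs. */
    static final int NIGHTLY_FACTOR = 10;

    /** @return {@code true} if the JVM was started with -Dtest.nightly.run=true. */
    static boolean isNightly() {
        return Boolean.getBoolean(NIGHTLY_PROP);
    }

    /** Scales a quick-run duration up when the nightly flag is set. */
    static long scaledDurationMs(long quickDurationMs) {
        return isNightly() ? quickDurationMs * NIGHTLY_FACTOR : quickDurationMs;
    }

    public static void main(String[] args) {
        System.out.println("Nightly run: " + isNightly());
        System.out.println("Duration, ms: " + scaledDurationMs(30_000L));
    }
}
```

A long-running test could then call scaledDurationMs() instead of hard-coding its duration, so the same test runs briefly in Run All and longer in the nightly configuration.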

[~dpavlov], [~sergey-chugunov], [~vveider], can you chime in?





[jira] [Created] (IGNITE-10068) Update documentation for username and password handling in control.sh

2018-10-30 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-10068:
-

 Summary: Update documentation for username and password handling 
in control.sh
 Key: IGNITE-10068
 URL: https://issues.apache.org/jira/browse/IGNITE-10068
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Alexey Goncharuk


We need to update the documentation on how the ./control.sh utility handles the
username and password, according to the linked change.





Re: [MTCGA]: new failures in builds [2171480] needs to be handled

2018-10-27 Thread Alexey Goncharuk
Pushed a fix for the failed tests.

Sat, Oct 27, 2018 at 5:40, :

> Hi Igniters,
>
>  I've detected a new issue on TeamCity that needs to be handled. You are more
> than welcome to help.
>
>  If your changes could have led to this failure: We're grateful that you
> volunteered to contribute to this project, but things change and
> you may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate whether you wish to continue and
> fix the test failures, or step down so that a committer may revert your commit?
>
>  *Recently contributed test failed in master
> IgnitePdsPartitionPreloadTest.testPreloadLocalTransactionalAsync
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=4952717927608879685=%3Cdefault%3E=testDetails
>
>  *Recently contributed test failed in master
> IgnitePdsPartitionPreloadTest.testPreloadPartitionAtomicClientAsync
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-173857265556901553=%3Cdefault%3E=testDetails
>
>  *Recently contributed test failed in master
> IgnitePdsPartitionPreloadTest.testPreloadPartitionTransactionalNodeFilteredAsync
>
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=5208646148789249104=%3Cdefault%3E=testDetails
>
>  *Recently contributed test failed in master
> IgnitePdsPartitionPreloadTest.testPreloadPartitionTransactionalNodeFilteredSync
>
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=2958209006766821475=%3Cdefault%3E=testDetails
>
>  *Recently contributed test failed in master
> IgnitePdsPartitionPreloadTest.testPreloadPartitionTransactionalClientAsync
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-8661712816760577172=%3Cdefault%3E=testDetails
>
>  *Recently contributed test failed in master
> IgnitePdsPartitionPreloadTest.testPreloadPartitionAtomicClientSync
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-872876181572445460=%3Cdefault%3E=testDetails
>
>  *Recently contributed test failed in master
> IgnitePdsPartitionPreloadTest.testPreloadPartitionTransactionalClientSync
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=567804134723325=%3Cdefault%3E=testDetails
>
>  *Recently contributed test failed in master
> IgnitePdsPartitionPreloadTest.testPreloadPartitionAtomicNodeFilteredAsync
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-2375861829685564274=%3Cdefault%3E=testDetails
>
>  *Recently contributed test failed in master
> IgnitePdsPartitionPreloadTest.testPreloadLocalTransactionalSync
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-5804002776086444183=%3Cdefault%3E=testDetails
>
>  *Recently contributed test failed in master
> IgnitePdsPartitionPreloadTest.testPreloadPartitionAtomicNodeFilteredSync
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=7109598602825094156=%3Cdefault%3E=testDetails
>  Changes that may have led to the failure were made by
>  - y.chief
> http://ci.ignite.apache.org/viewModification.html?modId=836543=false
>  - zaleslaw.sin
> http://ci.ignite.apache.org/viewModification.html?modId=836515=false
>  - maxmuzaf
> http://ci.ignite.apache.org/viewModification.html?modId=836501=false
>  - alexey.scherbakoff
> http://ci.ignite.apache.org/viewModification.html?modId=836489=false
>  - verbalab
> http://ci.ignite.apache.org/viewModification.html?modId=836477=false
>  - maxmuzaf
> http://ci.ignite.apache.org/viewModification.html?modId=836471=false
>  - ilya.kasnacheev
> http://ci.ignite.apache.org/viewModification.html?modId=836467=false
>
>  - Here's a reminder of what contributors agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 05:39:58 27-10-2018
>


[jira] [Created] (IGNITE-9999) Add verbose logging for node recovery

2018-10-25 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9999:


 Summary: Add verbose logging for node recovery
 Key: IGNITE-9999
 URL: https://issues.apache.org/jira/browse/IGNITE-9999
 Project: Ignite
  Issue Type: Task
Reporter: Alexey Goncharuk








Re: Critical worker threads liveness checking drawbacks

2018-10-25 Thread Alexey Goncharuk
Andrey,

I still see that checkpoint read lock acquisition raises a CRITICAL_ERROR,
which by default will shut down the local node. As far as I remember, we
decided that by default a thread timeout should not trigger node failure.
Now, however, it does, because we ignore only SYSTEM_WORKER_BLOCKED events in
the default configuration.

Should we introduce another critical failure type,
CHECKPOINT_READ_LOCK_BLOCKED, or use SYSTEM_WORKER_BLOCKED for checkpoint
read lock acquisition failure?
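
To make the two options concrete, here is a small self-contained model of a handler that ignores a configurable set of failure types. It is a sketch only: FailureHandlingSketch and its nested enum are hypothetical and do not reproduce Ignite's failure handling classes; the type names are taken from this message, and CHECKPOINT_READ_LOCK_BLOCKED is merely the type proposed here.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of failure-type filtering; not Ignite's actual classes. */
final class FailureHandlingSketch {
    /** Failure types named in this thread; the last one is only a proposal. */
    enum FailureType { CRITICAL_ERROR, SYSTEM_WORKER_BLOCKED, CHECKPOINT_READ_LOCK_BLOCKED }

    /** Types that are reported but must not stop the node. */
    private final Set<FailureType> ignored = EnumSet.noneOf(FailureType.class);

    FailureHandlingSketch(Set<FailureType> ignoredTypes) {
        ignored.addAll(ignoredTypes);
    }

    /** @return {@code true} if a failure of the given type stops the local node. */
    boolean stopsNode(FailureType type) {
        return !ignored.contains(type);
    }

    public static void main(String[] args) {
        // Default described in the message: only SYSTEM_WORKER_BLOCKED is ignored.
        FailureHandlingSketch dflt =
            new FailureHandlingSketch(EnumSet.of(FailureType.SYSTEM_WORKER_BLOCKED));

        // Current behavior: the lock timeout is reported as CRITICAL_ERROR.
        System.out.println(dflt.stopsNode(FailureType.CRITICAL_ERROR));        // true

        // Option 2: report the lock timeout as SYSTEM_WORKER_BLOCKED instead.
        System.out.println(dflt.stopsNode(FailureType.SYSTEM_WORKER_BLOCKED)); // false

        // Option 1: a dedicated type, which must also be added to the ignored set.
        FailureHandlingSketch withNewType = new FailureHandlingSketch(EnumSet.of(
            FailureType.SYSTEM_WORKER_BLOCKED, FailureType.CHECKPOINT_READ_LOCK_BLOCKED));

        System.out.println(withNewType.stopsNode(FailureType.CHECKPOINT_READ_LOCK_BLOCKED)); // false
    }
}
```

In this model, a dedicated type keeps the lock timeout distinguishable in logs and configuration, while reusing SYSTEM_WORKER_BLOCKED needs no change to the ignored set.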

--AG

Fri, Oct 12, 2018 at 8:29, Andrey Kuznetsov :

> Igniters,
>
> Now I have spotted blocking / long-running code arising from
> {{GridDhtPartitionsExchangeFuture#init}} calls in the partition-exchanger
> thread, see [1]. Ideally, all blocking operations along all possible code
> paths should be guarded implicitly from the critical failure detector to
> prevent the thread from being considered blocked. There is a pull request [2]
> that provides a shallow solution. I didn't change code outside
> {{GridDhtPartitionsExchangeFuture}}, otherwise it could be broken by any
> upcoming change. Also, I didn't touch the code runnable by threads other
> than partition-exchanger. So I have a number of guarded sections that are
> wider than they could be, and this potentially hides issues from the failure
> detector. Does this PR make sense? Or maybe it's better to exclude
> partition-exchanger from the critical threads registry altogether?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-9710
> [2] https://github.com/apache/ignite/pull/4962
>
>
> Fri, Sep 28, 2018 at 18:56, Maxim Muzafarov :
>
> > Andrey, Andrey
> >
> > > Thanks for being attentive! It's definitely a typo. Could you please
> > > create an issue?
> >
> > I've created an issue [1] and prepared PR [2].
> > Please, review this change.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-9723
> > [2] https://github.com/apache/ignite/pull/4862
> >
> > On Fri, 28 Sep 2018 at 16:58 Yakov Zhdanov  wrote:
> >
> > > Config option + mbean access. Does that make sense?
> > >
> > > Yakov
> > >
> > > On Fri, Sep 28, 2018, 17:17 Vladimir Ozerov wrote:
> > >
> > > > Then it should be a config option.
> > > >
> > > > Fri, Sep 28, 2018 at 13:15, Andrey Gura :
> > > >
> > > > > Guys,
> > > > >
> > > > > why do we need both a config option and a system property? I believe
> > > > > one way is enough.
> > > > > On Fri, Sep 28, 2018 at 12:38 PM Nikolay Izhikov <nizhi...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > Ticket created - https://issues.apache.org/jira/browse/IGNITE-9737
> > > > > >
> > > > > > Fix version is 2.7.
> > > > > >
> > > > > > On Fri, 28/09/2018 at 11:41 +0300, Alexey Goncharuk wrote:
> > > > > > > Nikolay, I agree, a user should be able to disable both the thread
> > > > > > > liveness check and the checkpoint read lock timeout check from
> > > > > > > config and a system property.
> > > > > > >
> > > > > > > Fri, Sep 28, 2018 at 11:30, Nikolay Izhikov <nizhi...@apache.org>:
> > > > > > >
> > > > > > > > Hello, Igniters.
> > > > > > > >
> > > > > > > > I found that this feature can't be disabled from config.
> > > > > > > > The only way to disable it is from a JMX bean.
> > > > > > > >
> > > > > > > > I think it is very dangerous: if we have some corner case or a
> > > > > > > > bug in this watchdog, it can make Ignite unusable.
> > > > > > > > I propose implementing the ability to disable this feature both
> > > > > > > > from config and from JVM options.
> > > > > > > >
> > > > > > > > What do you think?
> > > > > > > >
> > > > > > > > On Thu, 27/09/2018 at 16:14 +0300, Andrey Kuznetsov wrote:
> > > > > > > > > Maxim,
> > > > > > > > >
> > > > > > > > > Thanks for being attentive! It's definitely a typo. Could you
> > > > > > > > > please

[jira] [Created] (IGNITE-9996) Investigate possible performance drop in FSYNC mode for ignite-2.7 compared to ignite-2.6

2018-10-25 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9996:


 Summary: Investigate possible performance drop in FSYNC mode for 
ignite-2.7 compared to ignite-2.6
 Key: IGNITE-9996
 URL: https://issues.apache.org/jira/browse/IGNITE-9996
 Project: Ignite
  Issue Type: Task
Reporter: Alexey Goncharuk








Re: [MTCGA]: new failures in builds [2123440] needs to be handled

2018-10-23 Thread Alexey Goncharuk
All,

We had to revert the commit because the fix appeared to be more complex
than we expected. Tests should be ok now.

Tue, Oct 23, 2018 at 11:20, Alexey Goncharuk :

> Hi all,
>
> We are working on the fix; it should be merged to master ASAP.
>
> Tue, Oct 23, 2018 at 11:18, Maxim Muzafarov :
>
>> Hello,
>>
>> Are there any updates?
>> The build has been constantly failing with `Execution timeout` in the master
>> branch since October 20.
>>
>> The problematic commit is supposedly related to issue [2], and I think the
>> `IgniteSqlSplitterSelfTest#testPushDown` test probably fails with an
>> exception:
>>
>> class org.apache.ignite.binary.BinaryObjectException: Failed to register class.
>> at org.apache.ignite.internal.binary.BinaryContext.registerUserClassName(BinaryContext.java:1249)
>> at org.apache.ignite.internal.binary.BinaryContext.registerUserClassDescriptor(BinaryContext.java:798)
>> at org.apache.ignite.internal.binary.BinaryContext.registerClassDescriptor(BinaryContext.java:775)
>>
>> and the whole suite hangs.
>>
>> [1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperQueries=buildTypeStatusDiv_IgniteTests24Java8=%3Cdefault%3E
>> [2] https://issues.apache.org/jira/browse/IGNITE-5795
>>
>> On Sun, 21 Oct 2018 at 08:30 Павлухин Иван  wrote:
>>
>> > Hi Anton,
>> >
>> > I ran the problematic build against my contribution [1] and it seems to
>> > pass fine. Then I ran the build against your PR branch and it hung [2].
>> > It is not surprising that it fired only in master, because the build
>> > was added to RunAll after your PR runs.
>> > Could you please take a look?
>> >
>> > [1] https://ci.ignite.apache.org/viewLog.html?buildId=2127238;
>> > [2] https://ci.ignite.apache.org/viewLog.html?buildId=2127232=buildResultsDiv=IgniteTests24Java8_BinaryObjectsSimpleMapperQueries
>> >
>> > Sat, Oct 20, 2018 at 10:30, :
>> >
>> > > Hi Igniters,
>> > >
>> > >  I've detected a new issue on TeamCity that needs to be handled. You are
>> > > more than welcome to help.
>> > >
>> > >  If your changes could have led to this failure: We're grateful that you
>> > > volunteered to contribute to this project, but things change and
>> > > you may no longer be able to finalize your contribution.
>> > >  Could you respond to this email and indicate whether you wish to continue
>> > > and fix the test failures, or step down so that a committer may revert
>> > > your commit?
>> > >
>> > >  *New Critical Failure in master Queries (Binary Objects Simple Mapper)
>> > > https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperQueries=%3Cdefault%3E=buildTypeStatusDiv
>> > >  Changes that may have led to the failure were made by
>> > >  - kaa.dev
>> > > http://ci.ignite.apache.org/viewModification.html?modId=835798=false
>> > >  - vololo100
>> > > http://ci.ignite.apache.org/viewModification.html?modId=835794=false
>> > >
>> > >  - Here's a reminder of what contributors agreed to do
>> > > https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>> > >  - Should you have any questions please contact
>> > > dev@ignite.apache.org
>> > >
>> > > Best Regards,
>> > > Apache Ignite TeamCity Bot
>> > > https://github.com/apache/ignite-teamcity-bot
>> > > Notification generated at 10:30:43 20-10-2018
>> > >
>> >
>> >
>> > --
>> > Best regards,
>> > Ivan Pavlukhin
>> >
>> --
>> --
>> Maxim Muzafarov
>>
>


Re: [MTCGA]: new failures in builds [2142325] needs to be handled

2018-10-23 Thread Alexey Goncharuk
This can be ignored. I removed the test because it measured performance and
we should run performance tests in a verified environment.

Tue, Oct 23, 2018 at 5:16, :

> Hi Igniters,
>
>  I've detected a new issue on TeamCity that needs to be handled. You are more
> than welcome to help.
>
>  If your changes could have led to this failure: We're grateful that you
> volunteered to contribute to this project, but things change and
> you may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate whether you wish to continue and
> fix the test failures, or step down so that a committer may revert your commit?
>
>  *Recently contributed test failed in master
> CacheStartInParallelTest.testParallelizationAcceleratesStartOfCaches2
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-7547494506920381112=%3Cdefault%3E=testDetails
>  Changes that may have led to the failure were made by
>  - vololo100
> http://ci.ignite.apache.org/viewModification.html?modId=835983=false
>  - dmitry.melnichuk
> http://ci.ignite.apache.org/viewModification.html?modId=835941=false
>  - vpyatkov
> http://ci.ignite.apache.org/viewModification.html?modId=835937=false
>  - kaa.dev
> http://ci.ignite.apache.org/viewModification.html?modId=835901=false
>  - alexey.goncharuk
> http://ci.ignite.apache.org/viewModification.html?modId=835898=false
>  - andrey.mashenkov
> http://ci.ignite.apache.org/viewModification.html?modId=835896=false
>  - stanlukyanov
> http://ci.ignite.apache.org/viewModification.html?modId=835887=false
>  - bessonov.ip
> http://ci.ignite.apache.org/viewModification.html?modId=835884=false
>  - nsamelchev
> http://ci.ignite.apache.org/viewModification.html?modId=835881=false
>  - nsamelchev
> http://ci.ignite.apache.org/viewModification.html?modId=835875=false
>  - nsamelchev
> http://ci.ignite.apache.org/viewModification.html?modId=835873=false
>
>  - Here's a reminder of what contributors agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 05:15:59 23-10-2018
>


Re: [MTCGA]: new failures in builds [2123440] needs to be handled

2018-10-23 Thread Alexey Goncharuk
Hi all,

We are working on the fix; it should be merged to master ASAP.

Tue, Oct 23, 2018 at 11:18, Maxim Muzafarov :

> Hello,
>
> Are there any updates?
> The build has been constantly failing with `Execution timeout` in the master
> branch since October 20.
>
> The problematic commit is supposedly related to issue [2], and I think the
> `IgniteSqlSplitterSelfTest#testPushDown` test probably fails with an
> exception:
>
> class org.apache.ignite.binary.BinaryObjectException: Failed to register class.
> at org.apache.ignite.internal.binary.BinaryContext.registerUserClassName(BinaryContext.java:1249)
> at org.apache.ignite.internal.binary.BinaryContext.registerUserClassDescriptor(BinaryContext.java:798)
> at org.apache.ignite.internal.binary.BinaryContext.registerClassDescriptor(BinaryContext.java:775)
>
> and the whole suite hangs.
>
> [1] https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperQueries=buildTypeStatusDiv_IgniteTests24Java8=%3Cdefault%3E
> [2] https://issues.apache.org/jira/browse/IGNITE-5795
>
> On Sun, 21 Oct 2018 at 08:30 Павлухин Иван  wrote:
>
> > Hi Anton,
> >
> > I ran the problematic build against my contribution [1] and it seems to
> > pass fine. Then I ran the build against your PR branch and it hung [2].
> > It is not surprising that it fired only in master, because the build
> > was added to RunAll after your PR runs.
> > Could you please take a look?
> >
> > [1] https://ci.ignite.apache.org/viewLog.html?buildId=2127238;
> > [2] https://ci.ignite.apache.org/viewLog.html?buildId=2127232=buildResultsDiv=IgniteTests24Java8_BinaryObjectsSimpleMapperQueries
> >
> > Sat, Oct 20, 2018 at 10:30, :
> >
> > > Hi Igniters,
> > >
> > >  I've detected a new issue on TeamCity that needs to be handled. You are
> > > more than welcome to help.
> > >
> > >  If your changes could have led to this failure: We're grateful that you
> > > volunteered to contribute to this project, but things change and
> > > you may no longer be able to finalize your contribution.
> > >  Could you respond to this email and indicate whether you wish to continue
> > > and fix the test failures, or step down so that a committer may revert
> > > your commit?
> > >
> > >  *New Critical Failure in master Queries (Binary Objects Simple Mapper)
> > > https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperQueries=%3Cdefault%3E=buildTypeStatusDiv
> > >  Changes that may have led to the failure were made by
> > >  - kaa.dev
> > > http://ci.ignite.apache.org/viewModification.html?modId=835798=false
> > >  - vololo100
> > > http://ci.ignite.apache.org/viewModification.html?modId=835794=false
> > >
> > >  - Here's a reminder of what contributors agreed to do
> > > https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
> > >  - Should you have any questions please contact
> > > dev@ignite.apache.org
> > >
> > > Best Regards,
> > > Apache Ignite TeamCity Bot
> > > https://github.com/apache/ignite-teamcity-bot
> > > Notification generated at 10:30:43 20-10-2018
> > >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >
> --
> --
> Maxim Muzafarov
>


[jira] [Created] (IGNITE-9943) Update documentation for default WAL archive size (added auto-adjust)

2018-10-19 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9943:


 Summary: Update documentation for default WAL archive size (added 
auto-adjust)
 Key: IGNITE-9943
 URL: https://issues.apache.org/jira/browse/IGNITE-9943
 Project: Ignite
  Issue Type: Bug
Reporter: Alexey Goncharuk








Re: [MTCGA]: new failures in builds [2075095] needs to be handled

2018-10-16 Thread Alexey Goncharuk
The test is flaky.

Sat, Oct 13, 2018 at 23:10, :

> Hi Igniters,
>
>  I've detected a new issue on TeamCity that needs to be handled. You are more
> than welcome to help.
>
>  If your changes could have led to this failure: We're grateful that you
> volunteered to contribute to this project, but things change and
> you may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate whether you wish to continue and
> fix the test failures, or step down so that a committer may revert your commit?
>
>  *New stable failure of a flaky test in master
> TcpDiscoverySslTrustedSelfTest.testNodeShutdownOnRingMessageWorkerStartNotFinished
>
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=4205209624038097982=%3Cdefault%3E=testDetails
>  No changes in the build
>
>  - Here's a reminder of what contributors were agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 23:10:37 13-10-2018
>


Re: [MTCGA]: new failures in builds [2063682] needs to be handled

2018-10-16 Thread Alexey Goncharuk
Last two runs are green, let's keep monitoring.

Mon, Oct 15, 2018 at 3:25, :

> Hi Igniters,
>
>  I've detected some new issue on TeamCity to be handled. You are more than
> welcomed to help.
>
>  If your changes can lead to this failure(s): We're grateful that you were
> a volunteer to make the contribution to this project, but things change and
> you may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate if you wish to continue and
> fix test failures or step down and some committer may revert you commit.
>
>  *New stable failure of a flaky test in master
> HadoopMapReduceErrorResilienceTest.testRecoveryAfterAnError0_Error
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=2414028243453692891=%3Cdefault%3E=testDetails
>  Changes may lead to failure were done by
>  - jokserfn
> http://ci.ignite.apache.org/viewModification.html?modId=834843=false
>  - aplatonovv
> http://ci.ignite.apache.org/viewModification.html?modId=834837=false
>  - kondakov87
> http://ci.ignite.apache.org/viewModification.html?modId=834798=false
>  - mr.weider
> http://ci.ignite.apache.org/viewModification.html?modId=834773=false
>  - tledkov
> http://ci.ignite.apache.org/viewModification.html?modId=834768=false
>
>  - Here's a reminder of what contributors were agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 03:25:42 15-10-2018
>


[jira] [Created] (IGNITE-9895) DiscoveryMessageNotifierWorker must be instanceof IgniteDiscoveryThread

2018-10-16 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9895:


 Summary: DiscoveryMessageNotifierWorker must be instanceof 
IgniteDiscoveryThread
 Key: IGNITE-9895
 URL: https://issues.apache.org/jira/browse/IGNITE-9895
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7
Reporter: Alexey Goncharuk
 Fix For: 2.7


This is a regression from IGNITE-9398. The newly added thread must implement 
the marker interface; otherwise a blocking future get() can be called inside 
the discovery worker, which leads to a cluster-wide deadlock:

{code}
"disco-notyfier-worker-#625%internal.IgniteClientReconnectApiExceptionTest0%" 
#770 prio=5 os_prio=0 tid=0x7f479c263800 nid=0x209b waiting on condition 
[0x7f49287ec000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at 
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:579)
at 
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$2.metadata(CacheObjectBinaryProcessorImpl.java:197)
at 
org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1283)
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2007)
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:286)
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:185)
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1984)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:698)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:183)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:870)
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1764)
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1984)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:698)
at 
org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:183)
at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:870)
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1764)
at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
at 
org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:310)
at 
org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:99)
at 
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:82)
at 
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10014)
at 
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10043)
at 
org.apache.ignite.internal.GridMessageListenHandler.p2pUnmarshal(GridMessageListenHandler.java:194)
at 
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.processStartRequest(GridContinuousProcessor.java:1331)
at 
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.access$400(GridContinuousProcessor.java:108)
at 
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:200)
at 
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:191)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:721)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:600)
- locked <0x0007860b5c70> (a java.lang.Object)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$$Lambda$10/346299427.run(Unknown
 Source)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifyerWorker.body0(GridDiscoveryManager.java:2681)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifyerWorker.body(GridDiscoveryM

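The marker-interface requirement described in IGNITE-9895 above can be sketched roughly as follows. This is only an illustration of the general pattern, not Ignite code: `DiscoveryThreadMarker`, `MarkedWorker`, `GuardedFuture` and `MarkerDemo` are hypothetical names, and failing fast with an exception is an assumption made for the sketch (the ticket does not say how Ignite reacts when it detects a marked thread).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical marker, standing in for an interface such as IgniteDiscoveryThread. */
interface DiscoveryThreadMarker { }

/** Hypothetical worker thread that carries the marker. */
class MarkedWorker extends Thread implements DiscoveryThreadMarker {
    MarkedWorker(Runnable body) { super(body, "marked-worker"); }
}

/** Hypothetical future wrapper that refuses to block on a marked thread. */
class GuardedFuture<T> {
    private final CompletableFuture<T> delegate = new CompletableFuture<>();

    void complete(T val) { delegate.complete(val); }

    T get() throws Exception {
        // Assumption for this sketch: a marked thread must never park here,
        // because it may be the only thread able to deliver the message that
        // completes the future. Fail fast instead of hanging.
        if (!delegate.isDone() && Thread.currentThread() instanceof DiscoveryThreadMarker)
            throw new IllegalStateException("Blocking get() on a marked thread");

        return delegate.get();
    }
}

class MarkerDemo {
    /** Returns true if get() on an incomplete future was rejected on a marked thread. */
    static boolean failsFastOnMarkedThread() {
        GuardedFuture<String> fut = new GuardedFuture<>();
        AtomicBoolean rejected = new AtomicBoolean();

        Thread t = new MarkedWorker(() -> {
            try {
                fut.get();
            }
            catch (IllegalStateException e) {
                rejected.set(true);
            }
            catch (Exception e) {
                // Not expected in this sketch.
            }
        });

        t.start();

        try {
            t.join();
        }
        catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }

        return rejected.get();
    }
}
```

A thread that does not implement the marker would pass the `instanceof` check and park in `get()`, which is the situation the stack trace above shows for the notifier worker.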
Re: [MTCGA]: new failures in builds [2093987] needs to be handled

2018-10-16 Thread Alexey Goncharuk
This is a flaky failure, can be ignored.

Tue, Oct 16, 2018 at 8:40, :

> Hi Igniters,
>
>  I've detected some new issue on TeamCity to be handled. You are more than
> welcomed to help.
>
>  If your changes can lead to this failure(s): We're grateful that you were
> a volunteer to make the contribution to this project, but things change and
> you may no longer be able to finalize your contribution.
>  Could you respond to this email and indicate if you wish to continue and
> fix test failures or step down and some committer may revert you commit.
>
>  *New stable failure of a flaky test in master
> TcpDiscoverySslTrustedSelfTest.testNodeShutdownOnRingMessageWorkerStartNotFinished
>
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=4205209624038097982=%3Cdefault%3E=testDetails
>  No changes in the build
>
>  - Here's a reminder of what contributors were agreed to do
> https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
>  - Should you have any questions please contact
> dev@ignite.apache.org
>
> Best Regards,
> Apache Ignite TeamCity Bot
> https://github.com/apache/ignite-teamcity-bot
> Notification generated at 08:40:43 16-10-2018
>


[jira] [Created] (IGNITE-9857) .NET: IgniteConfigurationTest.TestSpringXml is flaky in master

2018-10-11 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9857:


 Summary: .NET: IgniteConfigurationTest.TestSpringXml is flaky in 
master
 Key: IGNITE-9857
 URL: https://issues.apache.org/jira/browse/IGNITE-9857
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk


The test is constantly failing on a specific set of agents.





Re: Apache Ignite 2.7. Last Mile

2018-10-11 Thread Alexey Goncharuk
Nikolay,

I am waiting for final benchmark results for 9784, after that I will merge
the change.

On the subject of Ignite 2.7 scope, our fellow Igniter Alexey Platonov has
found another case when a failure handler is incorrectly called on node
stop: https://issues.apache.org/jira/browse/IGNITE-9834. The case is rare,
but it is quite an unpleasant UX. Should we include it in 2.7 as well?

Thu, Oct 11, 2018 at 11:22, Nikolay Izhikov :

> Hello, Igniters.
>
> We made a good progress yesterday.
>
> Here is the list of remaining tickets(17) mapped to 2.7:
>
> Alexey Goncharuk   - IGNITE-9784
> Taras Ledkov   - IGNITE-9171
> Andrey Kuznetsov   - IGNITE-9737, IGNITE-9710
> Peter Ivanov   - IGNITE-9583, IGNITE-9823, IGNITE-9685
> Igor Seliverstov   - IGNITE-9749, IGNITE-9292
> Dmitry Melnichuk   - IGNITE-7782
> Ivan Pavlukhin - IGNITE-5935
> Yury Babak - IGNITE-8670
> Roman Kondakov - IGNITE-7953, IGNITE-9446
> Alexey Stelmak - IGNITE-9776
>
> Unassigned:
>
> IGNITE-9620 - MVCC: select throwing `Transaction is already completed`
> exception after mvcc missmatch
> IGNITE-9663 - MVCC: Data node failure can cause TX hanging.
>
>
> On Thu, 11/10/2018 at 10:40 +0300, Vladimir Ozerov wrote:
> > What kind of help is needed?
> >
> > On Wed, Oct 10, 2018 at 11:51 PM Dmitriy Setrakyan <
> dsetrak...@apache.org>
> > wrote:
> >
> > > Vladimir Ozerov,
> > >
> > > Can you help with the unassigned MVCC tickets?
> > >
> > > D.
> > >
> > > On Wed, Oct 10, 2018 at 3:32 AM Nikolay Izhikov 
> > > wrote:
> > >
> > > > Hello, Igniters.
> > > >
> > > > I list all contributors that assigned to the 2.7 tickets.
> > > > If you can help them to finish that tickets - please, do.
> > > > Assigners, if you need any help - please, respond to this thread.
> > > >
> > > > NOTE: We have 6 Unassigned tickets for 2.7. Let's start work on it!
> > > >
> > > > Peter Ivanov   - IGNITE-9559, IGNITE-9583, IGNITE-9685,
> IGNITE-9823
> > > > Andrey Kuznetsov   - IGNITE-9737, IGNITE-9710
> > > > Taras Ledkov   - IGNITE-9171
> > > > Alexey Goncharuk   - IGNITE-9784
> > > > Dmitriy Govorukhin - IGNITE-9550
> > > > Igor Seliverstov   - IGNITE-9749
> > > > Dmitry Melnichuk   - IGNITE-7782
> > > > Alexey Platonov- IGNITE-9726
> > > > Ivan Pavlukhin - IGNITE-5935
> > > > Yury Babak - IGNITE-8670
> > > > Roman Kondakov - IGNITE-7953, IGNITE-9446
> > > > Maxim Pudov- IGNITE-9126
> > > > Alexey Stelmak - IGNITE-9776
> > > > Alexey Kuznetsov   - IGNITE-7926
> > > >
> > > > Unassigned tickets:
> > > >
> > > > IGNITE-9781 - JDK11: SSL handshake is failed
> > > > IGNITE-9620 - MVCC: select throwing `Transaction is already
> completed`
> > > > exception after mvcc missmatch
> > > > IGNITE-9292 - MVCC SQL: Unexpected state exception when updating
> backup
> > > > IGNITE-9663 - MVCC: Data node failure can cause TX hanging.
> > > > IGNITE-9724 - MVCC SQL: Test
> > > >
> CacheMvccSelectForUpdateQueryAbstractTest.testSelectForUpdateDistributed
> > > > hangs sporadically.
> > > > IGNITE-9133 - MVCC: Proper empty DHT transactions handling.
> > > >
>


Re: Apache Ignite 2.7 release

2018-10-09 Thread Alexey Goncharuk
Igniters, Nikolay,

I've recently discovered an issue [1] which was causing the test suite to quit
on TC. The root cause was incorrect handling of the WAL archiver stop, which
caused a failure to be propagated to the failure processor and led to a JVM
halt on each node stop. This is a regression compared to 2.6.
I think it is worth including this fix in 2.7 given that the fix is ready
and verified on TC.

Please let me know if you have any objections.
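
To illustrate the kind of problem described above, here is a minimal sketch of a worker that tells an orderly stop apart from an unexpected termination before it notifies a failure handler. It is not the actual fix and not Ignite's WAL archiver: `StoppableWorker`, `StopDemo`, the `Runnable` failure handler and the use of interruption to simulate an unexpected termination are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical worker; not Ignite's WAL archiver. */
class StoppableWorker implements Runnable {
    private volatile boolean stopRequested;
    private final Runnable failureHandler;
    private final Thread thread = new Thread(this, "sketch-worker");

    StoppableWorker(Runnable failureHandler) { this.failureHandler = failureHandler; }

    void start() { thread.start(); }

    /** Requests a graceful stop and waits for the thread to exit. */
    void stop() throws InterruptedException {
        stopRequested = true; // Set the flag before interrupting the thread.
        thread.interrupt();
        thread.join();
    }

    /** Simulates an unexpected termination: interrupt without a stop request. */
    void interruptWithoutStop() throws InterruptedException {
        thread.interrupt();
        thread.join();
    }

    @Override public void run() {
        try {
            while (!stopRequested)
                Thread.sleep(10); // Placeholder for real work.
        }
        catch (InterruptedException e) {
            // An interrupt during an orderly stop is expected and is not a failure.
            if (!stopRequested)
                failureHandler.run();
        }
    }
}

class StopDemo {
    /** Returns how many times the failure handler was called. */
    static int failuresAfter(boolean graceful) {
        AtomicInteger failures = new AtomicInteger();
        StoppableWorker w = new StoppableWorker(failures::incrementAndGet);

        w.start();

        try {
            if (graceful)
                w.stop();
            else
                w.interruptWithoutStop();
        }
        catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }

        return failures.get();
    }
}
```

In this sketch `StopDemo.failuresAfter(true)` leaves the handler untouched, while `StopDemo.failuresAfter(false)` invokes it once. Without the `stopRequested` check in the catch block, every normal stop would reach the handler.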

Tue, Oct 9, 2018 at 19:00, Andrey Kuznetsov :

> Igniters,
>
> Recently, I have filed an issue [1] that deals with possible hanging of WAL
> logging. I will appreciate your thoughts on its severity. To make logging
> hang two conditions should be satisfied: WAL mode is {{FSYNC}}, and WAL
> archiving is disabled. Should we investigate and fix this immediately or is
> it possible to postpone till 2.8?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-9776
>
> Tue, Oct 9, 2018 at 11:17, Andrey Kuznetsov :
>
> > Ignite committers,
> >
> > I have prepared a PR for 2.7 blocker [1]. Could anybody merge it to 2.7
> > and master?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-9737
> >
> >
> > Wed, Oct 3, 2018 at 14:02, Nikolay Izhikov :
> >
> >> Alexey.
> >>
> >> Sorry, I lost link to IGNITE-9760 in this thread :)
> >>
> >> Thanks, for a clarification.
> >>
> >>
> >> On Wed, 03/10/2018 at 13:58 +0300, Alexey Goncharuk wrote:
> >> > Nikolay, both commits fixed a regression compared to ignite-2.6. First
> >> one was mentioned by Anton Kalashnikov before (java-level deadlock
> during
> >> WAL flush), another - by Andrey Kuznetsov (NPE during a concurrent WAL
> >> flush).
> >> >
> >> > --AG
> >> >
> >> > Wed, Oct 3, 2018 at 13:38, Nikolay Izhikov :
> >> > > Hello, Igniters.
> >> > >
> >> > > Release scope is frozen.
> >> > > Please, if you include some new issues in release - discuss it in
> >> this thread.
> >> > >
> >> > > Alexey, can you, please, comment on including fix for IGNITE-9760,
> >> IGNITE-9761 in 2.7 branch.
> >> > >
> >> > >
> >>
> https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=commit;h=3355201f3e8cafd23b2250aaf3b91b8b8ed1
> >> > >
> >>
> https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=commit;h=9d6e6ff394c05ddf7ef31a9d9ed1b492d9eeba69
> >> > >
> >> > > On Wed, 03/10/2018 at 13:24 +0300, Vladimir Ozerov wrote:
> >> > > > Nobody vetos anything, let's stop use this term unless some really
> >> > > > important problem is discussed.
> >> > > >
> >> > > > At this point we are in situation when new tickets are still
> >> included into
> >> > > > the scope. All want to ask is to stop including new tickets
> without
> >> > > > explaining on why they should be in AI 2.7. Regression between is
> >> AI 2.6
> >> > > > and AI 2.7 is enough. But "I found new NPE" is not.
> >> > > >
> >> > > > On Wed, Oct 3, 2018 at 11:10 AM Dmitriy Pavlov <
> >> dpavlov@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Nikolay,
> >> > > > >
> >> > > > > this has nothing about scaring someone. Let me explain about
> >> Apache Way.
> >> > > > >
> >> > > > > Voting -1 to release does not mean blocking it, release can't be
> >> vetoed.
> >> > > > > Approving release is done by policy: majority approval. 3+1
> >> binding and
> >> > > > > more +1 than -1. Consensus approval is better but not mandatory.
> >> > > > >
> >> > > > > Instead, if PMC says -1 to code modification it means veto and
> >> can't be
> >> > > > > bypassed to anyone. This is a very strong statement, which
> should
> >> be
> >> > > > > applied reasonably and with technical justification. Lack of
> >> > > > > understanding is not a justification.
> >> > > > >
> >> > > > > So my point instead of vetoing bugfix let's veto commits where
> >> the bugs
> >> > > > > were introduced. I feel a number of bugs reported recently are
> all
> >> > > > > connected to WalManager, and these bugs may come from just

[jira] [Created] (IGNITE-9822) Correct WAL archiver termination is incorrectly reported as critical thread termination

2018-10-09 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9822:


 Summary: Correct WAL archiver termination is incorrectly reported 
as critical thread termination
 Key: IGNITE-9822
 URL: https://issues.apache.org/jira/browse/IGNITE-9822
 Project: Ignite
  Issue Type: Bug
Reporter: Alexey Goncharuk








[jira] [Created] (IGNITE-9821) IgnitePdsCacheConfigurationFileConsistencyCheckTest is flaky in master

2018-10-09 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9821:


 Summary: IgnitePdsCacheConfigurationFileConsistencyCheckTest is 
flaky in master
 Key: IGNITE-9821
 URL: https://issues.apache.org/jira/browse/IGNITE-9821
 Project: Ignite
  Issue Type: Bug
Reporter: Alexey Goncharuk
Assignee: Alexey Goncharuk


The test fails with a timeout: a node is waiting for activation to complete.

Example of the failure:
https://ci.ignite.apache.org/viewLog.html?buildId=2035822=IgniteTests24Java8_PdsDirectIo1=buildResultsDiv

The root cause of the hang is still to be investigated.





Re: Danger (?) change of DiscoveryCustomEvent in GridDhtPartitionsExchangeFuture#onDone

2018-10-04 Thread Alexey Goncharuk
Vyacheslav,

I think it would be more correct to capture all the state that will be
needed later in a custom object and use that object in the service processor.
Nullifying the field is an explicit action that was taken to reduce memory
consumption on server nodes, so we cannot simply drop it. Another solution
is reference counting, nullifying the field only after all parties have
finished processing the message, but that looks like overengineering to me.
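
The first option can be sketched as follows. This is only an illustration under stated assumptions: `SketchEvent`, `CapturedState` and `SketchProcessor` are hypothetical stand-ins for the discovery event, the custom object and the service processor, the payload is reduced to a single string, and the example is single-threaded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mutable event whose payload may be cleared by another component. */
class SketchEvent {
    private volatile String payload;

    SketchEvent(String payload) { this.payload = payload; }

    String payload() { return payload; }

    /** Plays the role of customMessage(null) in this sketch. */
    void clearPayload() { payload = null; }
}

/** Immutable copy of only the fields that are needed later. */
final class CapturedState {
    final String action;

    CapturedState(String action) { this.action = action; }
}

/** Hypothetical processor with its own queue of captured state. */
class SketchProcessor {
    private final List<CapturedState> queue = new ArrayList<>();

    /** Called from the listener, while the payload is still present. */
    void onEvent(SketchEvent evt) {
        queue.add(new CapturedState(evt.payload()));
    }

    /** Called later; works on the captured copies and never reads the event. */
    List<String> drain() {
        List<String> res = new ArrayList<>();

        for (CapturedState s : queue)
            res.add(s.action);

        queue.clear();

        return res;
    }
}
```

Usage: after `proc.onEvent(evt)` the event can be cleared with `evt.clearPayload()`, and `proc.drain()` still returns the captured action. The event itself can then be released for GC as before.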

Wed, Oct 3, 2018 at 15:00, Vyacheslav Daradur :

> Alexey, thank you for the answer. I'd ask your advice about the
> following question:
>
> New Service Grid implementation listens to messages:
> * ChangeGlobalStateMessage - to perform activation/deactivation actions;
> * DynamicCacheChangeBatch - to handle caches stopping to undeploy
> related affinity services;
> * CacheAffinityChangeMessage - to recalculate assignments for affinity
> services in case of late affinity.
>
> It's important to handle these messages in order from disco-spi that
> means SG is storing them in own exchange queue to process.
> In some cases, PME may nullify this message earlier than they will be
> processed by SG.
>
> Could I exclude these messages from PME nullifying?
>
> On Wed, Oct 3, 2018 at 11:26 AM Alexey Goncharuk
>  wrote:
> >
> > Vyacheslav,
> >
> > Thanks for investigating this. User code should never listen to system
> > custom events because this is an internal API and it's a subject to
> change.
> > If there is anything a user interested in, the corresponding public event
> > should be added.
> >
> > Nullifying the event in this case looks ok for me.
> >
> > Sun, Sep 30, 2018 at 11:45, Vyacheslav Daradur :
> >
> > > I think that I understand a reason for doing this:
> > > The most custom events which handle in
> > > 'GridDhtPartitionsExchangeFuture' are using only in PME flow and
> > > reason is release them for GC as soon as possible.
> > >
> > > But there are some other systems which can listen to the same events,
> > > for example, to perform activation/deactivation actions them should
> > > handle [ChangeGlobalStateMessage, ChangeGlobalStateFinishMessage]
> > > which can be reset to 'null' by PME earlier then they will be handled
> > > by other systems.
> > >
> > > I'd suggest do not reset to 'null' custom messages in
> > > 'DiscoveryCustomEvent ' (at least without properly logic from the
> > > discovery-spi side).
> > >
> > > Thoughts?
> > >
> > >
> > >
> > > On Sat, Sep 29, 2018 at 11:43 PM Vyacheslav Daradur <
> daradu...@gmail.com>
> > > wrote:
> > > >
> > > > Hi Igniters!
> > > >
> > > > I think I found an illegal behavior in
> > > > GridDhtPartitionsExchangeFuture#onDone, the following code is called
> > > > here:
> > > > ((DiscoveryCustomEvent)firstDiscoEvt).customMessage(null);
> > > >
> > > > That means a global instance of 'DiscoveryCustomEvent' is being
> > > > mutated outside discovery-spi infrastructure. It also means that
> > > > discovery listeners receive 'DiscoveryCustomEvent' with 'null' field
> > > > instead of 'CustomMessage' which they may rely on.
> > > >
> > > > Could someone confirm if it is wrong behavior and should be fixed?
> > > >
> > > > --
> > > > Best Regards, Vyacheslav D.
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> > >
>
>
>
> --
> Best Regards, Vyacheslav D.
>


Re: [MTCGA]: new failures in builds [1871897] needs to be handled

2018-10-04 Thread Alexey Goncharuk
Dmitriy, to my knowledge, the test will be fixed by the ticket
https://issues.apache.org/jira/browse/IGNITE-9550, we expect it to be
merged by the end of this week.

Wed, Oct 3, 2018 at 18:00, Dmitriy Pavlov :

> Hi Alexey,
>
> Could you please assist with fixing test?
>
> Sincerely,
> Dmitriy Pavlov
>
> Sat, Sep 29, 2018 at 12:23, Dmitriy Pavlov :
>
>> Folks,
>>
>> both tests fail in ignite-2.7: IgniteStandByClusterTest.testSimple &
>> IgniteChangeGlobalStateFailOverTest.testActivateDeActivateOnFixTopologyWithPutValues
>>
>>
>> Can I hope these failures will be fixed in master and 2.7 before release?
>>
>> https://issues.apache.org/jira/browse/IGNITE-7618
>> 
>>
>>
>>
>> Fri, Sep 21, 2018 at 11:33, Dmitrii Ryabov :
>>
>>> Hi, Dmitriy,
>>> I checked 7618 and previous commits: test fails locally starting from
>>> 7618.
>>> It fails because `cache.get()` remembers deactivated state and doesn't
>>> check current state.
>>>
>>> 2018-09-20 18:41 GMT+03:00 Dmitriy Pavlov :
>>>
>>> > Hi,
>>> >
>>> > IgniteStandByClusterTest seems to fail, Dmitriy G., Ivan, would it be
>>> > reasonable to revert commit?
>>> >
>>> > Dmitriy Ryabov, is it related to recent fix or is it a standalone
>>> problem?
>>> >
>>> > Sincerely,
>>> > Dmitriy Pavlov
>>> >
>>> > Mon, Sep 17, 2018 at 18:45, Dmitrii Ryabov :
>>> >
>>> > > Looks like problem I had described in the ticket.
>>> > >
>>> > >
>>> > > https://issues.apache.org/jira/browse/IGNITE-7618?
>>> > focusedCommentId=16506923=com.atlassian.jira.
>>> > plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16506923
>>> > >
>>> > > 2018-09-15 12:01 GMT+03:00 Dmitriy Pavlov :
>>> > >
>>> > > > Dmitriy G, Ivan B,
>>> > > >
>>> > > > could you please double-check if failure is not coming from
>>> > > > https://issues.apache.org/jira/browse/IGNITE-7618
>>> > > >
>>> > > > Sincerely,
>>> > > > Dmitriy Pavlov
>>> > > >
>>> > > > Sat, Sep 15, 2018 at 5:42, :
>>> > > >
>>> > > > > Hi Ignite Developer,
>>> > > > >
>>> > > > > I am MTCGA.Bot, and I've detected some issue on TeamCity to be
>>> > > addressed.
>>> > > > > I hope you can help.
>>> > > > >
>>> > > > >  *New test failure in master
>>> IgniteStandByClusterTest.testSimple
>>> > > > > https://ci.ignite.apache.org/project.html?projectId=
>>> > > > IgniteTests24Java8=1332314705000986815=%
>>> > > > 3Cdefault%3E=testDetails
>>> > > > >  Changes may led to failure were done by
>>> > > > >  - bessonov.ip
>>> > > > > http://ci.ignite.apache.org/viewModification.html?modId=
>>> > > > 831651=false
>>> > > > >
>>> > > > > - If your changes can led to this failure(s), please
>>> create
>>> > > issue
>>> > > > > with label MakeTeamCityGreenAgain and assign it to you.
>>> > > > > -- If you have fix, please set ticket to PA state and
>>> write
>>> > to
>>> > > > dev
>>> > > > > list fix is ready
>>> > > > > -- For case fix will require some time please mute test
>>> and
>>> > set
>>> > > > > label Muted_Test to issue
>>> > > > > - If you know which change caused failure please contact
>>> > change
>>> > > > > author directly
>>> > > > > - If you don't know which change caused failure please
>>> send
>>> > > > > message to dev list to find out
>>> > > > > Should you have any questions please contact
>>> dev@ignite.apache.org
>>> > > > > Best Regards,
>>> > > > > MTCGA.Bot
>>> > > > > Notification generated at Sat Sep 15 05:42:21 MSK 2018
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>


[jira] [Created] (IGNITE-9790) Assertion error on full messages merge after coordinator failover

2018-10-04 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9790:


 Summary: Assertion error on full messages merge after coordinator 
failover
 Key: IGNITE-9790
 URL: https://issues.apache.org/jira/browse/IGNITE-9790
 Project: Ignite
  Issue Type: Bug
Reporter: Alexey Goncharuk








Re: Apache Ignite 2.7 release

2018-10-03 Thread Alexey Goncharuk
Nikolay, both commits fixed a regression compared to ignite-2.6. The first one
was mentioned by Anton Kalashnikov before (java-level deadlock during WAL
flush), the other by Andrey Kuznetsov (NPE during a concurrent WAL flush).

--AG

Wed, Oct 3, 2018 at 13:38, Nikolay Izhikov :

> Hello, Igniters.
>
> Release scope is frozen.
> Please, if you include some new issues in release - discuss it in this
> thread.
>
> Alexey, can you, please, comment on including fix for IGNITE-9760,
> IGNITE-9761 in 2.7 branch.
>
>
> https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=commit;h=3355201f3e8cafd23b2250aaf3b91b8b8ed1
>
> https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=commit;h=9d6e6ff394c05ddf7ef31a9d9ed1b492d9eeba69
>
> On Wed, 03/10/2018 at 13:24 +0300, Vladimir Ozerov wrote:
> > Nobody vetos anything, let's stop use this term unless some really
> > important problem is discussed.
> >
> > At this point we are in situation when new tickets are still included
> into
> > the scope. All want to ask is to stop including new tickets without
> > explaining on why they should be in AI 2.7. Regression between is AI 2.6
> > and AI 2.7 is enough. But "I found new NPE" is not.
> >
> > On Wed, Oct 3, 2018 at 11:10 AM Dmitriy Pavlov 
> > wrote:
> >
> > > Nikolay,
> > >
> > > this has nothing about scaring someone. Let me explain about Apache
> Way.
> > >
> > > Voting -1 to release does not mean blocking it, release can't be
> vetoed.
> > > Approving release is done by policy: majority approval. 3+1 binding and
> > > more +1 than -1. Consensus approval is better but not mandatory.
> > >
> > > Instead, if PMC says -1 to code modification it means veto and can't be
> > > bypassed to anyone. This is a very strong statement, which should be
> > > applied reasonably and with technical justification. Lack of
> > > understanding is not a justification.
> > >
> > > So my point instead of vetoing bugfix let's veto commits where the bugs
> > > were introduced. I feel a number of bugs reported recently are all
> > > connected to WalManager, and these bugs may come from just a couple of
> > > fixes. PDS tests were quite stable last time, so I think it is
> possible to
> > > find out why WAL crashes and hangs.
> > >
> > > Sincerely,
> > > Dmitriy Pavlov
> > >
> > > Wed, Oct 3, 2018 at 10:05, Andrey Kuznetsov :
> > >
> > > > Vladimir, Nikolay,
> > > >
> > > > For sure, I'm not an experienced Ignite contributor, so I'm sorry for
> > > > intervening. I've just run the reproducer from [1] against ignite-2.6
> > > > branch and it has passed. So, it's not an legacy bug, we've brought
> it
> > >
> > > with
> > > > some change of 2.7 scope. Is it still ok to ignore the bug?
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-9776
> > > >
> > > > Wed, Oct 3, 2018 at 2:07, Nikolay Izhikov :
> > > >
> > > > > Hello, Dmitriy.
> > > > >
> > > > > I'm sorry, but I don't understand your concern.
> > > > >
> > > > > Vladimir just asks experienced Ignite contributor to *explain
> impact*
> > >
> > > of
> > > > a
> > > > > bug.
> > > > >
> > > > > Why are you scaring us with your "-1"?
> > > > > Is it Apache Way to do so?
> > > > > What should be done for you to return to a constructive discussion?
> > > > >
> > > > > On Wed, 03/10/2018 at 00:23 +0300, Dmitriy Pavlov wrote:
> > > > > > Hi Igniters, Vladimir,
> > > > > >
> > > > > > NPEs or hangs in WAL is a completely non-functional grid (if
> > > >
> > > > persistence
> > > > > > enabled).
> > > > > >
> > > > > > I see no reasons to release 2.7 with such symptoms until we're
> sure
> > >
> > > it
> > > > is
> > > > > > too rare/impossible to reproduce. But it seems it is not the
> case. I
> > > >
> > > > will
> > > > > > definitely vote -1 for the release if I'm aware of such problems
> > >
> > > exist
> > > > > and
> > > > > > were not researched. Community guarantees the quality and
> usability
> > >
> > > of
> > > > > the
> > > > > > product.
> > > > > >
> > > > > > We should ask and answer other questions:
> > > > > > 1) why there are a lot of NPEs and hangs reported recently in the
> > >
> > > same
> > > > > area
> > > > > > 2) and why we signed-off commit(s).
> > > > > >
> > > > > > Probably we can identify and revert these commit(s) from 2.7 and
> > > >
> > > > research
> > > > > > these failures in master (with no rush).
> > > > > >
> > > > > > Sincerely,
> > > > > > Dmitriy Pavlov
> > > > > >
> > > > > > Tue, Oct 2, 2018 at 23:54, Vladimir Ozerov <
> voze...@gridgain.com>:
> > > > > >
> > > > > > > Andrey, Anton,
> > > > > > >
> > > > > > > How do you conclude that these tickets are blockers? What is
> the
> > > > >
> > > > > impact to
> > > > > > > users and in what circumstances users can met them?
> > > > > > >
> > > > > > > Note that we have many hundreds opened bugs, and yet we do not
> > >
> > > strive
> > > > > to
> > > > > > > include them all, because bug != blocker.
> > > > > > >
> > > > > > > So -1 from my side to including these tickets to release scope,
> > > >
> 

Re: Danger (?) change of DiscoveryCustomEvent in GridDhtPartitionsExchangeFuture#onDone

2018-10-03 Thread Alexey Goncharuk
Vyacheslav,

Thanks for investigating this. User code should never listen to system
custom events because this is an internal API and it is subject to change.
If there is anything a user is interested in, the corresponding public event
should be added.

Nullifying the event in this case looks OK to me.

Sun, Sep 30, 2018 at 11:45, Vyacheslav Daradur :

> I think that I understand a reason for doing this:
> The most custom events which handle in
> 'GridDhtPartitionsExchangeFuture' are using only in PME flow and
> reason is release them for GC as soon as possible.
>
> But there are some other systems which can listen to the same events,
> for example, to perform activation/deactivation actions them should
> handle [ChangeGlobalStateMessage, ChangeGlobalStateFinishMessage]
> which can be reset to 'null' by PME earlier then they will be handled
> by other systems.
>
> I'd suggest do not reset to 'null' custom messages in
> 'DiscoveryCustomEvent ' (at least without properly logic from the
> discovery-spi side).
>
> Thoughts?
>
>
>
> On Sat, Sep 29, 2018 at 11:43 PM Vyacheslav Daradur 
> wrote:
> >
> > Hi Igniters!
> >
> > I think I found an illegal behavior in
> > GridDhtPartitionsExchangeFuture#onDone, the following code is called
> > here:
> > ((DiscoveryCustomEvent)firstDiscoEvt).customMessage(null);
> >
> > That means a global instance of 'DiscoveryCustomEvent' is being
> > mutated outside discovery-spi infrastructure. It also means that
> > discovery listeners receive 'DiscoveryCustomEvent' with 'null' field
> > instead of 'CustomMessage' which they may rely on.
> >
> > Could someone confirm if it is wrong behavior and should be fixed?
> >
> > --
> > Best Regards, Vyacheslav D.
>
>
>
> --
> Best Regards, Vyacheslav D.
>


[jira] [Created] (IGNITE-9764) Node may hang on start if cluster state is in transition

2018-10-02 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9764:


 Summary: Node may hang on start if cluster state is in transition
 Key: IGNITE-9764
 URL: https://issues.apache.org/jira/browse/IGNITE-9764
 Project: Ignite
  Issue Type: Bug
Reporter: Alexey Goncharuk


The following sequence of events may cause a node to hang on start.
The node starts, detects a cluster state transition and waits for it to complete:
{code}
"start-node-1" #11804 prio=5 os_prio=0 tid=0x7f9cc4022000 nid=0x1094 waiting on condition [0x7f9ffc4c2000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1084)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2033)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1728)
- locked <0x9467c890> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1156)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:654)
at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:917)
at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:855)
at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:843)
at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:809)
at org.apache.ignite.internal.processors.cache.IgniteClusterActivateDeactivateTest.lambda$testConcurrentJoinAndActivate$4(IgniteClusterActivateDeactivateTest.java:539)
at org.apache.ignite.internal.processors.cache.IgniteClusterActivateDeactivateTest$$Lambda$99/295822519.call(Unknown Source)
at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
{code}

The NIO thread that should process the message that would complete the exchange
is attempting to create a session and get the local node ID:
{code}
"grid-nio-worker-tcp-comm-3-#9833%cache.IgniteClusterActivateDeactivateTest3%" #11875 prio=5 os_prio=0 tid=0x7f9c8009e800 nid=0x10dc waiting on condition [0x7f9ff4d76000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at org.apache.ignite.internal.util.IgniteUtils.await(IgniteUtils.java:7577)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.getSpiContext(TcpCommunicationSpi.java:2266)
at org.apache.ignite.spi.IgniteSpiAdapter.getLocalNode(IgniteSpiAdapter.java:156)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeLocalNodeId(TcpCommunicationSpi.java:4006)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.nodeIdMessage(TcpCommunicationSpi.java:3999)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$300(TcpCommunicationSpi.java:271)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onConnected(TcpCommunicationSpi.java:412)
at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionOpened(GridNioFilterChain.java:251)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionOpened(GridNioCodecFilter.java:67)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
at org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onSessionOpened(GridConnectionBytesVerifyFilter.java:58)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionOpen

Re: Request for review : IGNITE-3303 Apache Flink Integration - Flink source

2018-10-01 Thread Alexey Goncharuk
Hello Saikat,

I am OK with the production code changes, but I am a bit confused by the
example being added to the tests folder. I think it should either be added
to the examples (not sure about the dependency though) or not be added
at all. Also, I see that you added a new suite. Has it been added to
a TC configuration?



Mon, Oct 1, 2018 at 16:36, Nikolay Izhikov wrote:

> Hello, Saikat.
>
> I have no objection to including this integration in the 2.7 release.
> But we should ask for a final review from Alex Goncharyuk.
>
> Alex, can you comment on this?
> Is this patch ready to be merged?
> Do you see any risks in including it in the 2.7 release?
>
>
> On Sun, 30/09/2018 at 19:57 -0500, Saikat Maitra wrote:
> > Hi Alex, Nicolay
> >
> > As discussed with Andrew, the changes look good. Would it be OK to merge
> > this change to master considering the 2.7 release plan?
> >
> > Regards,
> > Saikat
> >
> > On Fri, Sep 28, 2018 at 7:15 PM Saikat Maitra 
> > wrote:
> >
> > > Thank you Andrew
> > >
> > > Regards,
> > > Saikat
> > >
> > > On Fri, Sep 28, 2018 at 7:00 PM Andrey Mashenkov <
> > > andrey.mashen...@gmail.com> wrote:
> > >
> > > > Hi Saikat,
> > > >
> > > > > Sorry for the late answer. I checked the changes a day ago. They look
> > > > > good now. I hope it will be merged soon.
> > > > >
> > > > > Alex, would you please merge the PR to master?
> > > >
> > > > > Sat, Sep 29, 2018, 2:29 Saikat Maitra wrote:
> > > >
> > > > > Hi Andrew,
> > > > >
> > > > > I have updated the changes.
> > > > >
> > > > > Can you please review and share feedback.
> > > > >
> > > > > Regards
> > > > > Saikat
> > > > >
> > > > > On Sat, Sep 22, 2018 at 2:23 PM Saikat Maitra <
> saikat.mai...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Andrew
> > > > > >
> > > > > >
> > > > > > I have updated the changes.
> > > > > >
> > > > > >
> > > > > > Can you please review and share feedback.
> > > > > >
> > > > > >
> > > > > > Regards
> > > > > > Saikat
> > > > > >
> > > > > >
> > > > > > On Wed, Sep 19, 2018 at 8:11 PM, Saikat Maitra <
> > > >
> > > > saikat.mai...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Andrew,
> > > > > > >
> > > > > > > I have updated the tests and also added java docs.
> > > > > > >
> > > > > > > Can you please review and share feedback.
> > > > > > >
> > > > > > >
> > > > > > > Regards
> > > > > > > Saikat
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Sep 16, 2018 at 11:53 AM, Saikat Maitra <
> > > > >
> > > > > saikat.mai...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Andrew,
> > > > > > > >
> > > > > > > > I have updated the tests and also added java docs.
> > > > > > > >
> > > > > > > > Please review and share feedback.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Saikat
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sat, Sep 8, 2018 at 2:09 PM, Saikat Maitra <
> > > >
> > > > saikat.mai...@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Andrew, Alexey
> > > > > > > > >
> > > > > > > > > I have incorporated the review changes.
> > > > > > > > >
> > > > > > > > > I have also refactored the CacheEventSerializer class and
> moved it
> > > >
> > > > to
> > > > > > > > > test folder because it is used only in the
> > > > &g

[jira] [Created] (IGNITE-9749) Assertion error in JdbcThinTransactionsServerAutoCommitComplexSelfTest leading to JDBC MVCC suite hang

2018-10-01 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9749:


 Summary: Assertion error in 
JdbcThinTransactionsServerAutoCommitComplexSelfTest leading to JDBC MVCC suite 
hang
 Key: IGNITE-9749
 URL: https://issues.apache.org/jira/browse/IGNITE-9749
 Project: Ignite
  Issue Type: Bug
Reporter: Alexey Goncharuk


The following assertion can be observed in master
{code}
[10:34:12]W: [org.apache.ignite:ignite-clients] [07:34:12] (err) 
Failed to notify listener: 
o.a.i.i.util.future.GridEmbeddedFuture$2...@4e56da7bjava.lang.AssertionError: 
localNode = 14353600-ea43-42ae-bf7c-4b467800, dhtNodes = [TcpDiscoveryNode 
[id=04134719-3eb1-4969-99dc-f520f982, addrs=ArrayList [127.0.0.1], 
sockAddrs=HashSet [/127.0.0.1:47501], discPort=47501, order=3, intOrder=3, 
lastExchangeTime=1538379249752, loc=false, ver=2.7.0#20181001-sha1:9ab8ebd7, 
isClient=false]]
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxAbstractEnlistFuture.backupNodes(GridDhtTxAbstractEnlistFuture.java:867)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxAbstractEnlistFuture.addToBatch(GridDhtTxAbstractEnlistFuture.java:627)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxAbstractEnlistFuture.processEntry(GridDhtTxAbstractEnlistFuture.java:614)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxAbstractEnlistFuture.continueLoop(GridDhtTxAbstractEnlistFuture.java:501)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxAbstractEnlistFuture.init(GridDhtTxAbstractEnlistFuture.java:363)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxQueryEnlistFuture.map(GridNearTxQueryEnlistFuture.java:212)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxAbstractEnlistFuture.mapOnTopology(GridNearTxAbstractEnlistFuture.java:332)
[10:34:12] : [Step 4/5] [2018-10-01 07:34:12,762][INFO 
][exchange-worker-#2510%thin.JdbcThinTransactionsServerAutoCommitComplexSelfTest2%][GridCachePartitionExchangeManager]
 Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
[topVer=4, minorTopVer=16], force=false, evt=DISCOVERY_CUSTOM_EVT, 
node=14353600-ea43-42ae-bf7c-4b467800]
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxAbstractEnlistFuture.access$000(GridNearTxAbstractEnlistFuture.java:56)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxAbstractEnlistFuture$2.apply(GridNearTxAbstractEnlistFuture.java:340)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxAbstractEnlistFuture$2.apply(GridNearTxAbstractEnlistFuture.java:335)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:349)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:337)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:497)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:476)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1947)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:3168)
[10:34:12]W: [org.apache.ignite:ignite-clients] at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:2934)
[10:34:12]W: [org.apache.ignite:ignite-clients

Re: Apache Ignite 2.7 release

2018-09-28 Thread Alexey Goncharuk
I think if a commit does not lead to any test failure in the current
master, there are no reasons to revert the commit. If there are valid
scenarios which are failing, corresponding tests should be added and the
root cause should be fixed under a separate issue.

Fri, Sep 28, 2018 at 11:19, Dmitriy Pavlov wrote:

> Hi Maxim,
>
> Once 1) you are sure that commit is related to the failure, and 2) in case
> contributors are not responding,
> please let me know, probably we need to open one more separate topic about
> revert.
>
> Sincerely,
> Dmitriy Pavlov
>
> Fri, Sep 28, 2018 at 11:15, Maxim Muzafarov wrote:
>
> > Andrey, Dmitry,
> >
> > > I've bumped into a new bug in WAL manager recently, see [1]. It looks
> > critical enough and can be a good candidate for fixing before 2.7
> release.
> >
> > I've found that commit [2] also leads the exchange worker to hang in my
> > branch related to IGNITE-7196.
> > I'm not sure I'm able to fix the whole issue [1], but I will take a look at
> > it.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-9731
> > [2]
> >
> >
> https://github.com/apache/ignite/commit/2f72fe758d4256c4eb4610e5922ad3d174b43dc5
> >
> >
> >
> > On Fri, 28 Sep 2018 at 11:11 Dmitriy Pavlov 
> wrote:
> >
> > > No, it is up to the community to discuss after their review results.
> > >
> > > Fri, Sep 28, 2018 at 11:09, Vladimir Ozerov wrote:
> > >
> > > > Dmitriy,
> > > >
> > > > Did I read your words correctly that it is up to implementor of a
> > single
> > > > feature to decide whether release of all other features and fixes to
> be
> > > > delayed?
> > > >
> > > > Fri, Sep 28, 2018 at 11:00, Dmitriy Pavlov wrote:
> > > >
> > > > > My point is that we can wait a bit for services because:
> > > > > 1. we are open-minded and have no outside pressure to release in
> > > > > October;
> > > > > 2. services are not some new feature that suddenly appeared in
> > > > > autumn; it is a well-known and important feature.
> > > > >
> > > > > So it is up to Vyacheslav, Anton and Nikolay to decide.
> > > > >
> > > > > The decision can be: services are not ready / ready to merge only to
> > > > > master / ready to merge to master and to 2.7.
> > > > >
> > > > >
> > > > > Fri, Sep 28, 2018 at 10:46, Vladimir Ozerov <voze...@gridgain.com>:
> > > > >
> > > > > > Dmitry,
> > > > > >
> > > > > > Community agreement was to perform the release in October. Of
> > > > > > course we can wait a bit for services. Then we wait a bit for
> > > > > > other cool features ready by that time, then again and again, and
> > > > > > the release will never happen. And while we are waiting for new
> > > > > > features to come, already completed features cannot be used by
> > > > > > anyone.
> > > > > >
> > > > > > This is why we have an agreement that if a feature is not ready,
> > > > > > it should be moved to a future release instead of shifting the
> > > > > > release. The sole reason to have strict dates when decisions are
> > > > > > made is to let the release happen.
> > > > > >
> > > > > >
> > > > > > On Fri, Sep 28, 2018 at 2:22 AM Dmitriy Pavlov <
> > > dpavlov@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Vladimir, I'm not searching for an enemy, and I'm not fighting
> > > > > > > with you. I'm not happy about cases when we are hurrying.
> > > > > > >
> > > > > > > We can't fix tests, fill in ticket details, or wait for
> > > > > > > contributors to finish their tasks. It is not the best idea to
> > > > > > > use experience from commercial companies in open source. Is
> > > > > > > there any pressure from outside the community? Did someone
> > > > > > > promise that the rest of the features would be released on
> > > > > > > 30 September?
> > > > > > >
> > > > > > > Let's remember the principle of do-ocracy, the power of those
> > > > > > > who do. If a contributor makes a change and a reviewer does the
> > > > > > > review, let's give the right to make the decision to them, and
> > > > > > > not to some closed club of people who privately discuss
> > > > > > > something.
> > > > > > >
> > > > > > > Sincerely
> > > > > > > Dmitriy Pavlov
> > > > > > >
> > > > > > > Thu, Sep 27, 2018, 23:42 Vyacheslav Daradur <daradu...@gmail.com>:
> > > > > > >
> > > > > > > > Hi Igniters!
> > > > > > > >
> > > > > > > > As I have written about Service Grid before [1], I'm
> > > > > > > > finalizing the solution to be sure that the implementation
> > > > > > > > is reliable.
> > > > > > > >
> > > > > > > > About including it in 2.7: if the code freeze is tomorrow,
> > > > > > > > then the solution is not ready to merge yet.
> > > > > > > > I hope that pre-reviewers Anton Vinogradov and Nikolay
> > > > > > > > Izhikov will be able to answer within a couple of days
> > > > > > > > whether the solution is out of scope or not.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 

Re: Critical worker threads liveness checking drawbacks

2018-09-28 Thread Alexey Goncharuk
Nikolay, I agree: a user should be able to disable both the thread liveness
check and the checkpoint read lock timeout check, via config and via a system
property.
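
A minimal sketch of how such a switch could be resolved is shown below. It is
not Ignite code: the class name, the property name and the precedence order
(system property over config over default) are all assumptions made for
illustration, as is the convention that a non-positive timeout disables the
check.

```java
/** Hypothetical resolver for a watchdog timeout; none of these names are Ignite API. */
class WatchdogSettings {
    /** Hypothetical system property name used only in this sketch. */
    static final String PROP = "SKETCH_WORKER_BLOCKED_TIMEOUT";

    /**
     * Picks the effective timeout: a parsable system property value wins over
     * the configured value, which in turn wins over the default.
     */
    static long effectiveTimeout(String sysPropVal, Long cfgVal, long dflt) {
        if (sysPropVal != null) {
            try {
                return Long.parseLong(sysPropVal.trim());
            }
            catch (NumberFormatException ignored) {
                // Unparsable property value: fall through to the configured value.
            }
        }

        return cfgVal != null ? cfgVal : dflt;
    }

    /** In this sketch a non-positive timeout means that the check is disabled. */
    static boolean checkEnabled(long timeout) {
        return timeout > 0;
    }

    public static void main(String[] args) {
        long timeout = effectiveTimeout(System.getProperty(PROP), 5_000L, 10_000L);

        System.out.println("timeout=" + timeout + ", enabled=" + checkEnabled(timeout));
    }
}
```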

Fri, Sep 28, 2018 at 11:30, Nikolay Izhikov wrote:

> Hello, Igniters.
>
> I found that this feature can't be disabled from config.
> The only way to disable it is through a JMX bean.
>
> I think it is very dangerous: if we have some corner case or a bug in this
> watchdog, it can make Ignite unusable.
> I propose to implement the ability to disable this feature both from
> config and from JVM options.
>
> What do you think?
>
> On Thu, 27/09/2018 at 16:14 +0300, Andrey Kuznetsov wrote:
> > Maxim,
> >
> > Thanks for being attentive! It's definitely a typo. Could you please
> > create an issue?
> >
> > Thu, Sep 27, 2018 at 16:00, Maxim Muzafarov wrote:
> >
> > > Folks,
> > >
> > > I've found that in `GridCachePartitionExchangeManager:2684` [1] (master
> > > branch) the exchange future is wrapped with a double
> > > `blockingSectionEnd` call. Is it correct? I just want to
> > > understand this change and how I should use it in the future.
> > >
> > > Should I file a new issue to fix this? I think the
> > > `blockingSectionBegin` method should be used here.
> > >
> > > -
> > > blockingSectionEnd();
> > >
> > > try {
> > > resVer = exchFut.get(exchTimeout, TimeUnit.MILLISECONDS);
> > > } finally {
> > > blockingSectionEnd();
> > > }
> > >
> > >
> > > [1]
> > >
> > >
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCachePartitionExchangeManager.java#L2684
> > >
> > > On Wed, 26 Sep 2018 at 22:47 Vyacheslav Daradur 
> > > wrote:
> > >
> > > > Andrey Gura, thank you for the answer!
> > > >
> > > > I agree that wrapping of 'init' method reduces the profit of watchdog
> > > > service in case of PME worker, but in other cases, we should wrap all
> > > > possible long sections on GridDhtPartitionExchangeFuture. For example
> > > > 'onCacheChangeRequest' method or
> > > > 'cctx.affinity().onCacheChangeRequest' inside because it may take
> > > > significant time (reproducer attached).
> > > >
> > > > I only want to point out a possible issue which may allow an end user
> > > > to halt the Ignite cluster accidentally.
> > > >
> > > > I'm sure that PME experts know how to fix this issue properly.
> > > > On Wed, Sep 26, 2018 at 10:28 PM Andrey Gura 
> wrote:
> > > > >
> > > > > Vyacheslav,
> > > > >
> > > > > The exchange worker is strongly tied to
> > > > > GridDhtPartitionExchangeFuture#init and it is OK. The exchange worker
> > > > > also shouldn't be blocked for a long time, but in reality it happens.
> > > > > It also means that your change doesn't make sense.
> > > > >
> > > > > What actually makes sense is identifying the places which are
> > > > > intentionally blocking. Maybe some places/actions should be wrapped
> > > > > in blocking guards.
> > > > >
> > > > > If you have failing tests, please make sure that your failureHandler
> > > > > is NoOpFailureHandler or any other handler with ignoreFailureTypes =
> > > > > [CRITICAL_WORKER_BLOCKED].
> > > > >
> > > > >
> > > > > On Wed, Sep 26, 2018 at 9:43 PM Vyacheslav Daradur <
> > >
> > > daradu...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > Hi Igniters!
> > > > > >
> > > > > > Thank you for this important improvement!
> > > > > >
> > > > > > > I've looked through the implementation and noticed that
> > > > > > > GridDhtPartitionsExchangeFuture#init has not been wrapped in a
> > > > > > > blocking section. This means it is easy to halt the node in case
> > > > > > > of long-running actions during PME, for example when we create a
> > > > > > > cache with a StoreFactory which connects to a 3rd party DB.
> > > > > > >
> > > > > > > I'm not sure that it is the right behavior.
> > > > > > >
> > > > > > > I filed the issue [1] and prepared the PR [2] with a reproducer
> > > > > > > and a possible fix.
> > > > > > >
> > > > > > > Andrey, could you please take a look and confirm that it makes
> > > > > > > sense?
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9710
> > > > > > [2] https://github.com/apache/ignite/pull/4845
> > > > > > On Mon, Sep 24, 2018 at 9:46 PM Andrey Kuznetsov <
> stku...@gmail.com>
> > > >
> > > > wrote:
> > > > > > >
> > > > > > > Denis,
> > > > > > >
> > > > > > > > I've created the ticket [1] with a short description of the
> > > > > > > > functionality.
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9679
> > > > > > >
> > > > > > >
> > > > > > > > Mon, Sep 24, 2018 at 17:46, Denis Magda wrote:
> > > > > > >
> > > > > > > > Andrey K. and G.,
> > > > > > > >
> > > > > > > > > Thanks, do we have a documentation ticket created? Prachi
> > > > > > > > > (copied) can help with the documentation.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Denis
> > > > > > > >
> > > > > > > > On Mon, Sep 24, 2018 at 5:51 AM Andrey Gura <
> ag...@apache.org>
> > > >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Andrey,

[jira] [Created] (IGNITE-9720) Initialize partition free lists lazily

2018-09-27 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9720:


 Summary: Initialize partition free lists lazily
 Key: IGNITE-9720
 URL: https://issues.apache.org/jira/browse/IGNITE-9720
 Project: Ignite
  Issue Type: Task
Reporter: Alexey Goncharuk


When persistence is enabled, partition free list metadata may take up quite a 
lot of pages.
This results in a very long start time because 
{{GridCacheOffheapManager.GridCacheDataStore#init0}} reads all free list 
metadata for each partition on exchange start (this is done in the 
{{CacheFreeListImpl}} constructor).
We should read only the required information on exchange and defer the actual 
free list initialization to the first access.
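
A minimal sketch of the deferred initialization idea, assuming a generic holder 
with a loader callback. LazyFreeList is a hypothetical name and does not 
reflect the actual fix or the real free list classes; the loader is assumed to 
return a non-null value.

```java
import java.util.function.Supplier;

/** Hypothetical lazy holder: the loader runs at most once, on first access. */
class LazyFreeList<T> {
    /** Expensive loader, e.g. one that would read free list metadata pages. */
    private final Supplier<T> loader;

    /** Loaded value, or null if not initialized yet. */
    private volatile T val;

    LazyFreeList(Supplier<T> loader) {
        this.loader = loader;
    }

    boolean initialized() {
        return val != null;
    }

    /** Double-checked locking: cheap volatile read on the fast path. */
    T get() {
        T v = val;

        if (v == null) {
            synchronized (this) {
                v = val;

                if (v == null)
                    val = v = loader.get();
            }
        }

        return v;
    }
}
```

With such a holder, constructing the object on exchange start costs nothing, 
and the metadata read is paid by the first operation that touches the 
partition.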



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9716) Document partition distribution and reset lost partitions commands of control script

2018-09-27 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9716:


 Summary: Document partition distribution and reset lost partitions 
commands of control script
 Key: IGNITE-9716
 URL: https://issues.apache.org/jira/browse/IGNITE-9716
 Project: Ignite
  Issue Type: Task
Affects Versions: 2.7
Reporter: Alexey Goncharuk
 Fix For: 2.7






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9715) Document WAL compression level

2018-09-27 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9715:


 Summary: Document WAL compression level
 Key: IGNITE-9715
 URL: https://issues.apache.org/jira/browse/IGNITE-9715
 Project: Ignite
  Issue Type: Task
  Components: documentation
Affects Versions: 2.7
Reporter: Alexey Goncharuk
 Fix For: 2.7


In 2.7 we added the ability to set the WAL compression level; this ability 
should be documented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9659) NonCollocatedRetryMessageSelfTest is flaky

2018-09-20 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9659:


 Summary: NonCollocatedRetryMessageSelfTest is flaky
 Key: IGNITE-9659
 URL: https://issues.apache.org/jira/browse/IGNITE-9659
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk


https://ci.ignite.apache.org/viewLog.html?buildId=1881869=buildResultsDiv=IgniteTests24Java8_Queries2#testNameId-2853122976880171731

A few concerns on the test code:
1) What is the point of setting an anonymous discovery SPI? The overridden 
method is identical to super(), but the new discovery SPI loses the test VM IP 
finder which is set by default.
2) It looks like there is a race in the test communication SPI code: there is 
an if-assign block for a volatile variable.
3) The test fails with "Node left during query execution". Since the node is 
stopped, this should either be an allowed exception, or the test communication 
SPI should be crafted more carefully.

(FYI: Locally I have this test failing with "No CacheException emitted. 
Collection size=10")
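
The race in item 2 is the usual check-then-act problem: a volatile field 
guarantees visibility, but an "if (!flag) flag = true" block is still two 
separate steps, so two threads can both pass the check. A minimal sketch of an 
atomic alternative is below; OneShotGate is a hypothetical name and is not 
taken from the test code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical one-shot gate; the name is illustrative, not taken from the test. */
class OneShotGate {
    /** Flips from false to true exactly once. */
    private final AtomicBoolean fired = new AtomicBoolean();

    /**
     * Atomic replacement for a racy "if (!fired) fired = true" block:
     * returns true for exactly one caller, even under contention.
     */
    boolean tryFire() {
        return fired.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        OneShotGate gate = new OneShotGate();

        System.out.println(gate.tryFire());
        System.out.println(gate.tryFire());
    }
}
```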



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


MTCGA: SqlSystemViewsSelfTest is flaky

2018-09-18 Thread Alexey Goncharuk
Igniters,

I noticed a pretty sharp rise in the SqlSystemViewsSelfTest failure rate
recently [1]. There is a chance that the failure was introduced by another
SQL system view ticket [2].

Alexey, as an author of the change and the test, would you mind taking a
look? The test failure looks simple (it fails with the MXBean registration
failure).

Thanks,
--AG

[1]
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-7838705496181695292=%3Cdefault%3E=testDetails
[2] https://issues.apache.org/jira/browse/IGNITE-9366


Re: The future of Affinity / Topology concepts and possible PME optimizations.

2018-09-18 Thread Alexey Goncharuk
Ilya,

This is a great idea, but before we can ultimately decouple the affinity
version from the topology version, we need to fix a few things with
baseline topology first. Currently the in-memory caches are not using the
baseline topology. We are going to fix this as a part of IEP-4 Phase II
(baseline auto-adjust). Once fixed, we can safely assume that
an out-of-baseline node does not affect affinity distribution.

Agree with Dmitriy that we should start with simpler optimizations first.

Thu, Sep 13, 2018 at 15:58, Ilya Lantukh wrote:

> Igniters,
>
> As most of you know, Ignite has a concept of AffinityTopologyVersion, which
> is associated with nodes that are currently present in topology and a
> global cluster state (active/inactive, baseline topology, started caches).
> Modification of either of them involves process called Partition Map
> Exchange (PME) and results in new AffinityTopologyVersion. At that moment
> all new cache and compute grid operations are globally "frozen". This might
> lead to indeterminate cache downtimes.
>
> However, our recent changes (esp. introduction of Baseline Topology) caused
> me to re-think those concepts. Currently there are many cases when we
> trigger PME, but it isn't necessary. For example, adding/removing a client
> node or a server node not in BLT should never cause partition map
> modifications. Those events modify the *topology*, but *affinity* is
> unaffected. On the other hand, there are events that affect only *affinity*
> - the most straightforward example is the CacheAffinityChange event, which is
> triggered after rebalance is finished to assign new primary/backup nodes.
> So the term *AffinityTopologyVersion* now looks weird - it tries to "merge"
> two entities that aren't always related. To me it makes sense to introduce
> separate *AffinityVersion* and *TopologyVersion*, review all events that
> currently modify AffinityTopologyVersion and split them into 3 categories:
> those that modify only AffinityVersion, only TopologyVersion and both. It
> will allow us to process such events using different mechanics and avoid
> redundant steps, and also reconsider mapping of operations - some will be
> mapped to topology, others - to affinity.
>
> Here is my view about how different event types theoretically can be
> optimized:
> 1. Client node start / stop: as stated above, no PME is needed, ticket
> https://issues.apache.org/jira/browse/IGNITE-9558 is already in progress.
> 2. Server node start / stop not from baseline: should be similar to the
> previous case, since nodes outside of baseline cannot be partition owners.
> 3. Start node in baseline: both affinity and topology versions should be
> incremented, but it might be possible to optimize PME for such case and
> avoid cluster-wide freeze. Partition assignments for such node are already
> calculated, so we can simply put them all into MOVING state. However, it
> might take significant effort to avoid race conditions and redesign our
> architecture.
> 4. Cache start / stop: starting or stopping one cache doesn't modify
> partition maps for other caches. It should be possible to change this
> procedure to skip PME and perform all necessary actions (compute affinity,
> start/stop cache contexts on each node) in background, but it looks like a
> very complex modification too.
> 5. Rebalance finish: it seems possible to design a "lightweight" PME for
> this case as well. If there were no node failures (and if there were, PME
> should be triggered and rebalance should be cancelled anyways) all
> partition states are already known by coordinator. Furthermore, no new
> MOVING or OWNING node for any partition is introduced, so all previous
> mappings should still be valid.
>
> For the latter complex cases it might be necessary to introduce an "is
> compatible" relationship between affinity versions. An operation needs to be
> remapped only if the new version isn't compatible with the previous one.
>
> Please share your thoughts.
>
> --
> Best regards,
> Ilya
>
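
The split proposed in the quoted message can be sketched as two independent
counters plus a classification of events. All names below are hypothetical and
only mirror the categories stated in the proposal; cache start/stop is left out
because the message does not assign it to a category.

```java
/** Hypothetical event kinds mirroring the cases listed in the proposal. */
enum ClusterChange {
    /** Client node start/stop: topology changes, affinity does not. */
    CLIENT_NODE_JOIN_OR_LEAVE(false, true),

    /** Server node outside of baseline start/stop: same as the client case. */
    NON_BASELINE_SERVER_JOIN_OR_LEAVE(false, true),

    /** Baseline node start: both versions are incremented. */
    BASELINE_NODE_START(true, true),

    /** Rebalance finish (affinity change): affinity only. */
    REBALANCE_FINISH(true, false);

    final boolean changesAffinity;

    final boolean changesTopology;

    ClusterChange(boolean changesAffinity, boolean changesTopology) {
        this.changesAffinity = changesAffinity;
        this.changesTopology = changesTopology;
    }
}

/** Hypothetical pair of independent versions replacing a single merged version. */
final class SplitVersion {
    final long affinityVer;

    final long topologyVer;

    SplitVersion(long affinityVer, long topologyVer) {
        this.affinityVer = affinityVer;
        this.topologyVer = topologyVer;
    }

    /** Returns a new version with only the affected counters incremented. */
    SplitVersion apply(ClusterChange change) {
        return new SplitVersion(
            affinityVer + (change.changesAffinity ? 1 : 0),
            topologyVer + (change.changesTopology ? 1 : 0));
    }
}
```

Under this sketch, an operation mapped to affinity would not need to react to a
client node joining, because affinityVer stays the same.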


[jira] [Created] (IGNITE-9636) CacheBaselineTopologyTest#testClusterActiveWhileBaselineChanging is flaky in master

2018-09-18 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9636:


 Summary: 
CacheBaselineTopologyTest#testClusterActiveWhileBaselineChanging is flaky in 
master
 Key: IGNITE-9636
 URL: https://issues.apache.org/jira/browse/IGNITE-9636
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk


The test asynchronously sets baseline topology and while the transition happens 
checks that public API active state returns true. Sometimes this fails because 
publicApiActiveState(false) checks for transition and may return false if 
transition is in progress.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Cache scan efficiency

2018-09-18 Thread Alexey Goncharuk
Dmitriy,

In my understanding, the proper fix for the scan query looks like a big
change and it is unlikely that we will include it in Ignite 2.7. On the other
hand, the method suggested by Alexei is quite simple and it definitely
fits Ignite 2.7, which will provide a better user experience. Even with a
proper scan query implemented, this method can be useful in some specific
scenarios, so we will not have to deprecate it.

--AG

Mon, Sep 17, 2018 at 19:15, Dmitriy Pavlov wrote:

> As I understood it is not a hack, it is an advanced feature for warming up
> the partition. We can build warm-up of the overall cache by calling its
> partitions warm-up. Users often ask about this feature and are not
> confident with our lazy upload.
>
> Please correct me if I misunderstood the idea.
>
> Mon, Sep 17, 2018 at 18:37, Dmitriy Setrakyan wrote:
>
> > I would rather fix the scan than hack the scan. Is there any technical
> > reason for hacking it now instead of fixing it properly? Can some of the
> > experts in this thread provide an estimate of complexity and difference
> > in work that would be required for each approach?
> >
> > D.
> >
> > On Mon, Sep 17, 2018 at 4:42 PM Alexey Goncharuk <
> > alexey.goncha...@gmail.com>
> > wrote:
> >
> > > I think it would be beneficial for some Ignite users if we added such a
> > > partition warmup method to the public API. The method should be
> > > well-documented and state that it may invalidate existing page cache.
> > > It will be a very effective instrument until we add the proper scan
> > > ability that Vladimir was referring to.
> > >
> > > Mon, Sep 17, 2018 at 13:05, Maxim Muzafarov :
> > >
> > > > Folks,
> > > >
> > > > > Such warming up can be an effective technique for performing
> > > > > calculations that require large cache data reads, but I think it is
> > > > > a single narrow use case among all Ignite store usages. Like all
> > > > > other powerful techniques, it should be used wisely. In the general
> > > > > case, I think we should consider the other techniques mentioned by
> > > > > Vladimir and maybe create something like `global statistics of cache
> > > > > data usage` to choose the best technique in each case.
> > > > >
> > > > > For instance, it's not obvious what would take longer: a multi-block
> > > > > read or 50 single-block reads issued sequentially. It strongly
> > > > > depends on the underlying hardware and might depend on the workload's
> > > > > use of system resources (CPU-intensive calculations and I/O access)
> > > > > as well. But `statistics` will help us choose the right way.
> > > >
> > > >
> > > > > On Sun, 16 Sep 2018 at 23:59 Dmitriy Pavlov wrote:
> > > >
> > > > > Hi Alexei,
> > > > >
> > > > > > I did not find any PRs associated with the ticket to check the
> > > > > > code changes behind this idea. Are there any PRs?
> > > > > >
> > > > > > If we create some forward scan of pages, it should be a very
> > > > > > intelligent algorithm that takes a lot of parameters into account
> > > > > > (how much RAM is free, how likely it is that we will need the next
> > > > > > page, etc.). We had a private talk about such an idea some time
> > > > > > ago.
> > > > > >
> > > > > > In my experience, Linux systems already do such forward reading of
> > > > > > file data (for file descriptors flagged as sequential), but some
> > > > > > prefetching of data at the application level may be useful for
> > > > > > O_DIRECT file descriptors.
> > > > > >
> > > > > > One more concern from me is about selecting the right place in the
> > > > > > system to do such a prefetch.
> > > > >
> > > > > Sincerely,
> > > > > Dmitriy Pavlov
> > > > >
> > > > > > Sun, Sep 16, 2018 at 19:54, Vladimir Ozerov <voze...@gridgain.com>:
> > > > >
> > > > > > > Hi Alex,
> > > > > >
> > > > > > This is good that you observed speedup. But I do not think this
> > > > solution
> > > > > > works 

Re: Cache scan efficiency

2018-09-17 Thread Alexey Goncharuk
I think it would be beneficial for some Ignite users if we added such a
partition warmup method to the public API. The method should be
well-documented and state that it may invalidate existing page cache. It
will be a very effective instrument until we add the proper scan ability
that Vladimir was referring to.
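
To make the caveat about invalidating the existing page cache concrete, here
is a minimal sketch in plain Java. It is not Ignite code: `LruPageCache`, the
page ids and the capacity are made up for illustration, and a `LinkedHashMap`
in access order stands in for page memory with LRU eviction. It only shows
why warming up a partition that is larger than the free space pushes out
pages that were hot before.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy LRU "page memory"; a hypothetical stand-in, not an Ignite class. */
class LruPageCache extends LinkedHashMap<Long, String> {
    private final int capacity;

    LruPageCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true gives LRU ordering
        this.capacity = capacity;
    }

    /** Evicts the least recently used page once capacity is exceeded. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
        return size() > capacity;
    }

    /** Touches a page, loading it (and possibly evicting the LRU page) on a miss. */
    void touch(long pageId) {
        if (get(pageId) == null)
            put(pageId, "page-" + pageId);
    }

    /**
     * Loads hot pages 0..3, then warms up a "partition" of scanPages pages.
     * Returns how many of the hot pages are still cached afterwards.
     */
    static int hotPagesLeftAfterWarmup(int capacity, int scanPages) {
        LruPageCache cache = new LruPageCache(capacity);

        for (long p = 0; p < 4; p++)
            cache.touch(p);

        for (long p = 1000; p < 1000 + scanPages; p++)
            cache.touch(p);

        int left = 0;

        for (long p = 0; p < 4; p++) {
            if (cache.containsKey(p))
                left++;
        }

        return left;
    }
}
```

With a capacity of 8 pages, warming up 2 pages keeps all four hot pages,
while warming up 8 pages evicts every one of them. This is the effect the
method documentation would have to warn about.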

Mon, Sep 17, 2018 at 13:05, Maxim Muzafarov :

> Folks,
>
> Such warming up can be an effective technique for performing calculations
> that require large cache data reads, but I think it is a single narrow use
> case among all Ignite store usages. Like all other powerful techniques, it
> should be used wisely. In the general case, I think we should consider the
> other techniques mentioned by Vladimir and maybe create something like
> `global statistics of cache data usage` to choose the best technique in
> each case.
>
> For instance, it's not obvious what would take longer: a multi-block read
> or 50 single-block reads issued sequentially. It strongly depends on the
> underlying hardware and might depend on the workload's use of system
> resources (CPU-intensive calculations and I/O access) as well. But
> `statistics` will help us choose the right way.
>
>
> On Sun, 16 Sep 2018 at 23:59 Dmitriy Pavlov  wrote:
>
> > Hi Alexei,
> >
> > I did not find any PRs associated with the ticket to check the code
> > changes behind this idea. Are there any PRs?
> >
> > If we create some forward scan of pages, it should be a very intelligent
> > algorithm that takes a lot of parameters into account (how much RAM is
> > free, how likely it is that we will need the next page, etc.). We had a
> > private talk about such an idea some time ago.
> >
> > In my experience, Linux systems already do such forward reading of file
> > data (for file descriptors flagged as sequential), but some prefetching
> > of data at the application level may be useful for O_DIRECT file
> > descriptors.
> >
> > One more concern from me is about selecting the right place in the system
> > to do such a prefetch.
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > Sun, Sep 16, 2018 at 19:54, Vladimir Ozerov :
> >
> > > Hi Alex,
> > >
> > > It is good that you observed a speedup. But I do not think this solution
> > > works for the product in the general case. The amount of RAM is limited,
> > > and even a single partition may need more space than the RAM available.
> > > Moving a lot of pages to page memory for a scan means that you evict a
> > > lot of other pages, which will ultimately lead to bad performance of
> > > subsequent queries and defeat LRU algorithms, which are of great
> > > importance for good database performance.
> > >
> > > Database vendors choose another approach: skip BTrees, iterate directly
> > > over data pages, read them in multi-block fashion, and use a separate
> > > scan buffer to avoid excessive evictions of other hot pages. A
> > > corresponding ticket for SQL exists [1], but the idea is common to all
> > > parts of the system that require scans.
> > >
> > > As for the proposed solution, it might be a good idea to add a special
> > > API to "warm up" a partition, with a clear explanation of the pros (fast
> > > scan after warmup) and cons (slowdown of any other operations). But I
> > > think we should not make this approach part of normal scans.
> > >
> > > Vladimir.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-6057
> > >
> > >
> > > On Sun, Sep 16, 2018 at 6:44 PM Alexei Scherbakov <
> > > alexey.scherbak...@gmail.com> wrote:
> > >
> > > > Igniters,
> > > >
> > > > My use case involves a scenario where it's necessary to iterate over
> > > > a large (many TBs) persistent cache, doing some calculation on the
> > > > read data.
> > > >
> > > > The basic solution is to iterate over the cache using ScanQuery.
> > > >
> > > > This turns out to be slow because iteration over the cache involves a
> > > > lot of random disk access for reading data pages referenced from leaf
> > > > pages by links.
> > > >
> > > > This is especially true when data is stored on disks with slow random
> > > > access, like SAS disks. In my case, on a modern SAS disk array the
> > > > read speed was around several MB/sec, while the sequential read speed
> > > > in a perf test was on the order of GB/sec.
> > > >
> > > > I was able to fix the issue by using ScanQuery with an explicit
> > > > partition set and running simple warmup code before each partition
> > > > scan.
> > > >
> > > > The code pins cold pages in memory in sequential order, thus
> > > > eliminating random disk access. The speedup was about x100.
> > > >
> > > > I suggest adding the improvement to the product's core by always
> > > > sequentially preloading pages for all internal partition iterations
> > > > (cache iterators, scan queries, SQL queries with a scan plan) if the
> > > > partition is cold (low number of pinned pages).
> > > >
> > > > This should also speed up rebalancing from cold partitions.
> > > >
> > > > Ignite JIRA ticket [1]
> > > >
> > > > Thoughts?
> > > >
> > > > [1] 

Re: Apache Ignite 2.7 release

2018-09-14 Thread Alexey Goncharuk
We already have all the mechanics in place to work with properties - we use
ignite.build and ignite.revision from ignite.properties which are adjusted
during the build in the binary package.

Should I create the ticket if there are no objections?
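
For illustration, reading such build-time values with a fallback could look
like the sketch below. This is an assumption-laden sketch in plain Java, not
the existing Ignite code: the `BuildProps` class and the default values are
hypothetical, and the properties are parsed from a string so that the example
is self-contained (real code would load ignite.properties from the classpath
instead). Only the `ignite.build` and `ignite.revision` keys come from the
message above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper; the class name and default values are illustrative only. */
class BuildProps {
    private final Properties props = new Properties();

    BuildProps(Reader src) throws IOException {
        props.load(src);
    }

    /** Returns the property value, or dflt if the build did not substitute one. */
    String get(String key, String dflt) {
        String val = props.getProperty(key);

        // An unreplaced Maven-style placeholder such as ${...} is treated as missing.
        return val == null || val.startsWith("${") ? dflt : val;
    }

    /** Parses an in-memory properties file, as a stand-in for one on the classpath. */
    static BuildProps parse(String text) {
        try {
            return new BuildProps(new StringReader(text));
        }
        catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the fallback is that the code keeps working when the build did
not adjust the file, for example when running from an IDE.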

Fri, Sep 14, 2018 at 13:22, Ilya Kasnacheev :

> Hello!
>
> So now there's an issue: this script changes the sources after every
> build, and the changes show up in git status.
>
> What we could do about it:
> - Commit the changes after the build, once, in the hope that they won't
> change very often. The benefit is that we could do that right now, before
> the code freeze.
> - Move these values to a properties file from both pom.xml and
> IgniteProvider.java. Any problems with this approach? We'll just read them
> from a classpath properties file.
> - Update the links in the file once and remove them from the build
> process. Why were they added to the build process in the first place - to
> make them configurable during build?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Tue, Sep 11, 2018 at 5:53, Roman Shtykh :
>
> > Ilya,
> >
> > The "latest" version is the default, and it is resolved by
> > https://ignite.apache.org/latest, which is used by our web site when a
> > user downloads the latest Ignite version. And I think this is the
> > authority for determining the latest official release (the pom.xml you
> > suggest can have SNAPSHOTs etc.).
> > Also, as I explained during our review sessions, ignite-mesos-2.6.0 is a
> > driver and doesn't mean you need to have Ignite 2.6.0. A user can run any
> > version of Ignite he/she specifies. By default, it's "latest", but a user
> > can specify any version needed, even from a non-archive URL.
> >
> > In short, this is what we have now:
> > 1. The mesos driver (ignite-mesos-x.x.x) will use the "latest" version by
> > default -> it will try to resolve the latest officially released version
> > of Apache Ignite, find the closest mirror and download Ignite in a
> > minute. If the version resolution fails, we fall back to the slow apache
> > archive (as you suggest; in my opinion we had better fail fast instead of
> > waiting for hours to download, so the user can choose another download
> > option (3)).
> > 2. If the user specifies the version explicitly, it goes to the slow
> > apache archive.
> > 3. The user can put the ignite zip file on his/her http server and
> > provide the URL as a parameter to the driver, if options 1 and 2 don't
> > work.
> >
> > As you see, there are 3 options. I just fix the 1st one with
> > https://issues.apache.org/jira/browse/IGNITE-9388 and don't change the
> > original logic (which I find reasonable) documented on our site -- I
> > don't see how it blocks anything.
> >
> > Roman Shtykh
> >
> >
> > On Monday, September 10, 2018, 6:16:15 p.m. GMT+9, Ilya Kasnacheev <
> > ilya.kasnach...@gmail.com> wrote:
> >
> >
> > Hello!
> >
> > There are still two issues with the submission.
> >
> > The first one is that we're downloading the "latest" version from the
> > preferred mirror, but a specified version, such as "2.6", we're still
> > going to download from the "slow" archive.apache.org/dist.
> > That's a great limitation for this change, since most real deployments of
> > Apache Ignite will have their Ignite version pegged to a specific
> > release. In this case there's no win in download speed.
> > *In my opinion it is a blocker.*
> >
> > The second one is that we can't download anything when we fail to
> > resolve "latest". My idea is that we should try to download the last
> > known version in this case, which can be pushed to the source from
> > pom.xml, as we already do with URLs. So if you could not resolve "latest"
> > you will download 2.7.0.
> >
> > Buuut, maybe it's not necessary, maybe we should just *discourage
> > "latest"*, which is in my opinion almost always a bad idea.
> >
> > WDYT?
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > Sun, Sep 9, 2018 at 5:47, Roman Shtykh :
> >
> > Hi Ilya,
> >
> > Sorry, missed that.
> > Added now.
> >
> > --
> > Roman Shtykh
> >
> >
> > On Thursday, September 6, 2018, 6:16:58 p.m. GMT+9, Ilya Kasnacheev <
> > ilya.kasnach...@gmail.com> wrote:
> >
> >
> > Hello!
> >
> > The last of my requests still standing is that we should fall back to
> > single URL download in case of error with 'latest' version. Everything
> > else looks good to me.
> >
> > Can we do that? I'm really worried that Apache API will go sour.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > Thu, Sep 6, 2018 at 8:56, Roman Shtykh :
> >
> > Hi Ilya,
> >
> > Thanks again.
> >
> > 1) Done.
> > 2) Used catch() for latest version.
> >
> > Please see my comments on github.
> > --
> > Roman Shtykh
> >
> >
> > On Wednesday, September 5, 2018, 11:30:10 p.m. GMT+9, Ilya Kasnacheev <
> > ilya.kasnach...@gmail.com> wrote:
> >
> >
> > Hello!
> >
> > I've left a new wave of replies.
> >
> > Basically, 1) let's keep DOWNLOAD_URL_PATTERN string value inlined so
> > that it will work even if build process is broken (would be useful for
> e.g.
> > developing 

[jira] [Created] (IGNITE-9603) Cache proxy flags are not (de)serialized

2018-09-14 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9603:


 Summary: Cache proxy flags are not (de)serialized
 Key: IGNITE-9603
 URL: https://issues.apache.org/jira/browse/IGNITE-9603
 Project: Ignite
  Issue Type: Bug
Reporter: Alexey Goncharuk


The following code will fail with {{ClassNotFoundException}} assuming that 
cache contains a complex object with key {{1}}.

    IgniteCache cache = ignite.cache(CACHE_NAME).withKeepBinary();

    ignite.compute().broadcast(() -> {
        cache.get(1);
    });

The {{withKeepBinary()}} flag is not serialized into the closure, which causes
{{cache.get(1)}} to attempt to deserialize the cache read result.
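
The general shape of this kind of bug can be sketched in plain Java without
Ignite on the classpath. `ToyCacheProxy` below is hypothetical and is not the
actual cache proxy implementation; it only assumes that a proxy with custom
serialization writes the cache name but forgets the flag, so the flag falls
back to its default on the receiving side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

/** Hypothetical proxy used only to illustrate the bug class; not the Ignite implementation. */
class ToyCacheProxy implements Externalizable {
    String cacheName;
    boolean keepBinary;

    /** Public no-arg constructor required by Externalizable. */
    public ToyCacheProxy() {
        // No-op.
    }

    ToyCacheProxy(String cacheName, boolean keepBinary) {
        this.cacheName = cacheName;
        this.keepBinary = keepBinary;
    }

    @Override public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(cacheName);
        // Bug being illustrated: keepBinary is never written.
    }

    @Override public void readExternal(ObjectInput in) throws IOException {
        cacheName = in.readUTF();
        // keepBinary keeps its default value (false) on the receiving side.
    }

    /** Serializes and deserializes the proxy, as sending a closure to another node would. */
    static ToyCacheProxy roundTrip(ToyCacheProxy proxy) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();

            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(proxy);
            }

            try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ToyCacheProxy)in.readObject();
            }
        }
        catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After `roundTrip(new ToyCacheProxy("myCache", true))` the copy still has the
cache name, but `keepBinary` is false again, which is the analogue of the
remote `cache.get(1)` trying to deserialize the value.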





[jira] [Created] (IGNITE-9584) .NET DataStorageMetricsTest is flaky in master

2018-09-13 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9584:


 Summary: .NET DataStorageMetricsTest is flaky in master
 Key: IGNITE-9584
 URL: https://issues.apache.org/jira/browse/IGNITE-9584
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk








Re: Class field ThreadLocal. Why not static?

2018-09-13 Thread Alexey Goncharuk
Maxim,

If multiple instances of Ignite are started in the same JVM and a user
thread accesses first one instance of Ignite and then another, you will end
up with the static thread local holding the last WAL pointer from the
second grid. This is possible, for example, when a user thread commits a
transaction or runs an atomic update on a data node. Any subsequent access
to the first Ignite instance will then see an invalid thread-local value.
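
A minimal sketch of this effect in plain Java is below. `ToyWalManager` is a
made-up stand-in for a per-node component, not the real
FileWriteAheadLogManager, and the string "pointers" are illustrative only.

```java
/** Toy stand-in for a per-node component; not the real FileWriteAheadLogManager. */
class ToyWalManager {
    /** Shared by every instance in the JVM: one slot per thread. */
    private static final ThreadLocal<String> STATIC_LAST_PTR = new ThreadLocal<>();

    /** One slot per thread per instance. */
    private final ThreadLocal<String> lastPtr = new ThreadLocal<>();

    private final String nodeName;

    ToyWalManager(String nodeName) {
        this.nodeName = nodeName;
    }

    /** Pretends to log a record and remembers its pointer in both thread-locals. */
    void log(long idx) {
        String ptr = nodeName + ":" + idx;

        STATIC_LAST_PTR.set(ptr);
        lastPtr.set(ptr);
    }

    /** Last pointer as seen through the static thread-local. */
    String lastPtrStatic() {
        return STATIC_LAST_PTR.get();
    }

    /** Last pointer as seen through the instance thread-local. */
    String lastPtrPerInstance() {
        return lastPtr.get();
    }
}
```

If the same thread calls `node1.log(10)` and then `node2.log(20)`, the
instance field of node1 still returns "node1:10", while the static
thread-local read through node1 returns "node2:20", i.e. a pointer that
belongs to the other grid.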

Tue, Sep 11, 2018 at 13:29, Maxim Muzafarov :

> Alexey, Ivan,
>
> Agree. Keeping strong references to the Thread object is the source of
> memory leaks with ThreadLocal variables and the values they store.
> ThreadLocalMap is bound to the Thread lifespan [1], so I think that when we
> use everything right, all will be GC'ed correctly.
> Are these the memory leaks with ThreadLocals that you mean, Alexey? If not,
> please share your example.
>
> Also, agree that these usages should be bound to the component lifespan.
> But for `FileWriteAheadLogManager` I think this variable is not used in a
> semantically right way. I've dumped all threads (total ~49 threads)
> that are using `lastWalPtr` in `FileWriteAheadLogManager`. For instance,
>  * exchange-worker-#40%wal.IgniteWalRecoveryTest0%
>  * sys-#148%wal.IgniteWalRecoveryTest1%
>  * db-checkpoint-thread-#129%wal.IgniteWalRecoveryTest2%
> I suppose everything would be OK here for both the `static` and
> `non-static` cases of ThreadLocal.
>
> [1]
>
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/lang/Thread.java#l760
>
> On Tue, 11 Sep 2018 at 13:05 Павлухин Иван  wrote:
>
> > Dmitriy,
> >
> > Could you point to some piece of code implementing described pattern?
> >
> > 2018-09-11 13:02 GMT+03:00 Павлухин Иван :
> >
> > > Alex,
> > >
> > > A ThreadLocal subclass is used in IgniteH2Indexing for simple access to
> > > the H2 Connection from the current thread. Such a subclass has the
> > > capability to create a connection if one does not exist, so obtaining a
> > > connection is merely ThreadLocal.get. Also there are scheduled routines
> > > to clean up connections and the statement caches associated with them
> > > after some expiration time. For that reason a Map is maintained. As a
> > > query can run on a user thread, we need to clean up the mentioned map to
> > > avoid a leak when a Thread is terminated. So we need to check the thread
> > > status in the cleanup routines and remove entries for terminated
> > > Threads. Historically there was no cleanup for terminated threads, and a
> > > leak was possible. Also, great care must be taken to avoid a cyclic
> > > reference between the ThreadLocal instance and a stored value, which
> > > could easily occur if the stored value is covered by multiple layers of
> > > abstraction.
> > >
> > > And I am describing some historical state. Now the machinery in
> > > IgniteH2Indexing is even more complex (I hope we will have a chance to
> > > improve it).
> > >
> > > 2018-09-11 11:00 GMT+03:00 Alexey Goncharuk <
> alexey.goncha...@gmail.com
> > >:
> > >
> > >> Ivan,
> > >>
> > >> Can you elaborate on the issue with the thread local cleanup you've
> > faced?
> > >>
> > >> Tue, Sep 11, 2018 at 9:13, Павлухин Иван :
> > >>
> > >> > Guys,
> > >> >
> > >> > As we know, ThreadLocal is an instrument which should be used with
> > >> > great care. I recently faced problems related to proper cleanup of
> > >> > a ThreadLocal which is not needed anymore. In my opinion the best
> > >> > thing (in an ideal world) is to get rid of ThreadLocal where possible,
> > >> > but I guess that it is quite hard (in the real world).
> > >> >
> > >> > Also, a question comes to my mind. As ThreadLocal is so common in our
> > >> > code, could you suggest some guidance or code fragments which address
> > >> > proper ThreadLocal lifecycle control and especially cleanup?
> > >> >
> > >> > 2018-09-10 12:46 GMT+03:00 Alexey Goncharuk <
> > alexey.goncha...@gmail.com
> > >> >:
> > >> >
> > >> > > Maxim,
> > >> > >
> > >> > > Ignite supports starting multiple instances of Ignite in the same
> > VM,
> > >> so
> > >> > > having static thread locals for the fields you mentioned does no

[jira] [Created] (IGNITE-9558) Avoid changing AffinityTopologyVersion on client connect when possible

2018-09-12 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9558:


 Summary: Avoid changing AffinityTopologyVersion on client connect 
when possible
 Key: IGNITE-9558
 URL: https://issues.apache.org/jira/browse/IGNITE-9558
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.0
Reporter: Alexey Goncharuk


Currently a client join event changes discovery topology version which, in 
turn, changes AffinityTopologyVersion.
When a client maps transaction on new AffinityTopologyVersion, corresponding 
message is not processed on remote node until remote node receives the 
corresponding discovery event. If discovery event delivery is delayed for some 
reason, this will result in transaction stalls on client joins.

Since the client node does not change partition affinity, we can safely map 
transactions on the previous topology version and do not change the affinity 
topology version at all.
Some cases need special care and probably do not qualify for this optimization, 
such as when client has near cache or client hosts partition for REPLICATED 
cache.
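
The proposed behaviour can be modelled roughly as follows. This is a
simplified, hypothetical model in plain Java, not Ignite internals: the class
name, the single `long` per version and the `needsSpecialCare` flag (standing
for the near-cache and REPLICATED-partition cases above) are assumptions made
for the sketch.

```java
/** Hypothetical model of the proposed behaviour; class and field names are illustrative. */
class ToyTopologyVersions {
    /** Changes on every node join. */
    private long discoVer;

    /** Changes only when partition affinity may change. */
    private long affinityVer;

    /**
     * Every join bumps the discovery version. The affinity version follows it
     * only for server nodes and for clients that need special care.
     */
    void onNodeJoined(boolean client, boolean needsSpecialCare) {
        discoVer++;

        if (!client || needsSpecialCare)
            affinityVer = discoVer;
    }

    long discoveryVersion() {
        return discoVer;
    }

    long affinityVersion() {
        return affinityVer;
    }
}
```

In this model a transaction mapped on affinity version 1 stays valid after a
plain client joins, because only the discovery version moves to 2.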





Re: Class field ThreadLocal. Why not static?

2018-09-11 Thread Alexey Goncharuk
Ivan,

Can you elaborate on the issue with the thread local cleanup you've faced?

Tue, Sep 11, 2018 at 9:13, Павлухин Иван :

> Guys,
>
> As we know, ThreadLocal is an instrument which should be used with great
> care. I recently faced problems related to proper cleanup of a
> ThreadLocal which is not needed anymore. In my opinion the best thing (in
> an ideal world) is to get rid of ThreadLocal where possible, but I guess
> that it is quite hard (in the real world).
>
> Also, a question comes to my mind. As ThreadLocal is so common in our code,
> could you suggest some guidance or code fragments which address proper
> ThreadLocal lifecycle control and especially cleanup?
>
> 2018-09-10 12:46 GMT+03:00 Alexey Goncharuk :
>
> > Maxim,
> >
> > Ignite supports starting multiple instances of Ignite in the same VM, so
> > having static thread locals for the fields you mentioned does not work.
> >
> > Generally, I think a thread-local should be bound to the lifespan of the
> > component it describes. Static thread-locals are hard to clean up and
> > they often lead to leaks, so I would rather change the existing static
> > thread-locals to be non-static.
> >
> > --AG
> >
> > Mon, Sep 10, 2018 at 11:54, Maxim Muzafarov :
> >
> > > Igniters,
> > >
> > > According to the javadoc [1] of class ThreadLocal:
> > > `ThreadLocal instances are typically private *static* fields in classes
> > > that wish to associate state with a thread (e.g., a user ID or
> > > Transaction ID).`
> > >
> > > So, AFAIK non-static ThreadLocal usage means `per thread - per class
> > > instance`. What are the real cases for using non-static ThreadLocal
> > > class fields in the Ignite code base? When do we need it?
> > >
> > > In the Ignite code base I've found ThreadLocal usages as:
> > >  - non-static - 67
> > >  - static  - 68
> > >
> > > Back to my example, I've checked FileWriteAheadLogManager. It has:
> > > 1) private final ThreadLocal interrupted [2]
> > > 2) private final ThreadLocal lastWALPtr [3]
> > > I think both of these fields should be set and used as `static`. Can
> > > anyone confirm it?
> > >
> > >
> > > [1] https://docs.oracle.com/javase/8/docs/api/java/lang/ThreadLocal.html
> > > [2]
> > >
> > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/wal/FileWriteAheadLogManager.java#L253
> > > [3]
> > >
> > > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/wal/FileWriteAheadLogManager.java#L340
> > > --
> > > --
> > > Maxim Muzafarov
> > >
> >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>


[jira] [Created] (IGNITE-9520) Investigate fuzzy free lists

2018-09-10 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9520:


 Summary: Investigate fuzzy free lists
 Key: IGNITE-9520
 URL: https://issues.apache.org/jira/browse/IGNITE-9520
 Project: Ignite
  Issue Type: Task
Reporter: Alexey Goncharuk


We have several data structures (free list, reuse list) associated with each 
partition. For these structures a major part of their state is maintained 
on-heap and persisted during checkpoints.
This yields a lot of random disk accesses during checkpoints, which 
significantly lengthens the checkpoint mark phase (it is done under the 
checkpoint write lock and essentially blocks all tx ops on the node).

Need to investigate if we can implement some sort of a data structure which is 
updated lazily and may be out of date, so that we can update these data 
structures outside of checkpoint mark phases.





Re: Class field ThreadLocal. Why not static?

2018-09-10 Thread Alexey Goncharuk
Maxim,

Ignite supports starting multiple instances of Ignite in the same VM, so
having static thread locals for the fields you mentioned does not work.

Generally, I think a thread-local should be bound to the lifespan of the
component it describes. Static thread-locals are hard to clean up and they
often lead to leaks, so I would rather change the existing static
thread-locals to be non-static.
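
One possible way to bind a thread-local to a component's lifespan is sketched
below in plain Java. Everything in it is hypothetical (the `ToyComponent`
class, the `StringBuilder` payload, the method names); it only assumes that
the component owns a non-static thread-local plus a registry of the threads
that used it, so stale entries can be dropped and the whole state released
when the component stops.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical component; shows one way to tie per-thread state to a component's lifespan. */
class ToyComponent implements AutoCloseable {
    /** Tracks which threads hold a value so the component can drop them later. */
    private final Map<Thread, StringBuilder> perThread = new ConcurrentHashMap<>();

    /** Non-static: belongs to this component instance only. */
    private final ThreadLocal<StringBuilder> buf = ThreadLocal.withInitial(() -> {
        StringBuilder sb = new StringBuilder();

        perThread.put(Thread.currentThread(), sb);

        return sb;
    });

    /** Returns the calling thread's buffer, creating and registering it on first use. */
    StringBuilder buffer() {
        return buf.get();
    }

    /** Drops entries of terminated threads; meant to be called periodically. */
    int cleanupTerminated() {
        int removed = 0;

        for (Thread t : perThread.keySet()) {
            if (!t.isAlive() && perThread.remove(t) != null)
                removed++;
        }

        return removed;
    }

    int trackedThreads() {
        return perThread.size();
    }

    /** Releases all tracked state when the component stops. */
    @Override public void close() {
        perThread.clear();
        buf.remove(); // Only clears the calling thread's slot.
    }

    /** Runs a short-lived thread that uses the component, then waits for it to finish. */
    static void useFromShortLivedThread(ToyComponent c) {
        Thread t = new Thread(() -> c.buffer().append("x"));

        t.start();

        try {
            t.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }
    }
}
```

The registry holds strong references to Thread objects, so the periodic
cleanup of terminated threads is what keeps it from becoming a leak itself.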

--AG

Mon, Sep 10, 2018 at 11:54, Maxim Muzafarov :

> Igniters,
>
> According to the javadoc [1] of class ThreadLocal:
> `ThreadLocal instances are typically private *static* fields in classes
> that wish to associate state with a thread (e.g., a user ID or Transaction
> ID).`
>
> So, AFAIK non-static ThreadLocal usage means `per thread - per class
> instance`. What are the real cases for using non-static ThreadLocal class
> fields in the Ignite code base? When do we need it?
>
> In the Ignite code base I've found ThreadLocal usages as:
>  - non-static - 67
>  - static  - 68
>
> Back to my example, I've checked FileWriteAheadLogManager. It has:
> 1) private final ThreadLocal interrupted [2]
> 2) private final ThreadLocal lastWALPtr [3]
> I think both of these fields should be set and used as `static`. Can anyone
> confirm it?
>
>
> [1] https://docs.oracle.com/javase/8/docs/api/java/lang/ThreadLocal.html
> [2]
>
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/wal/FileWriteAheadLogManager.java#L253
> [3]
>
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/wal/FileWriteAheadLogManager.java#L340
> --
> --
> Maxim Muzafarov
>


Re: IoC/DI support in Apache Ignite.NET

2018-09-10 Thread Alexey Goncharuk
Hello Artyom,

Welcome to the Apache Ignite community! I've added you to the list of
contributors, you should now be able to assign tickets to yourself.

Please get familiar with the Apache Ignite development process described here:
https://cwiki.apache.org/confluence/display/IGNITE/Development+Process

Instructions on how to contribute can be found here:
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute

Project setup in IntelliJ IDEA:
https://cwiki.apache.org/confluence/display/IGNITE/Project+Setup

--AG

Sun, Sep 9, 2018 at 15:20, Artyom Sokolov :

> Hello,
>
> I would like to start implementing IoC/DI support (Autofac, Castle Windsor,
> etc.) in Apache Ignite.NET.
>
> Please add me as a contributor in JIRA, so I could create ticket and start
> working on this.
>
> My JIRA username is applicazza.
>
> Cheers,
> Artyom.
>


Re: IGNITE-7482 Cursor in TextQuery fetches all data in first call to next() or hasNext()

2018-09-10 Thread Alexey Goncharuk
Hi,

Please send your jira account ID so we can add you to the contributors
list. Then you will be able to assign tickets to yourself and contribute to
the project according to the process.

You can get more info here:

https://cwiki.apache.org/confluence/display/IGNITE/Development+Process
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute

--AG

Mon, Sep 10, 2018 at 9:16, Tâm Nguyễn Mạnh :

> Hi,
> I have not been assigned yet, but I really want to be.
>
> On Fri, Sep 7, 2018 at 4:13 PM Ilya Kasnacheev wrote:
>
> > Hello!
> >
> > Can you please frame it as Github pull request as per our process? Do you
> > have ticket for that?
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > Fri, Sep 7, 2018 at 5:08, Tâm Nguyễn Mạnh :
> >
> > >
> > >
> >
> modules\indexing\src\main\java\org\apache\ignite\internal\processors\query\h2\opt\GridLuceneIndex.java
> > > ```java
> > > /*
> > >  * Licensed to the Apache Software Foundation (ASF) under one or more
> > >  * contributor license agreements.  See the NOTICE file distributed
> with
> > >  * this work for additional information regarding copyright ownership.
> > >  * The ASF licenses this file to You under the Apache License, Version
> > 2.0
> > >  * (the "License"); you may not use this file except in compliance with
> > >  * the License.  You may obtain a copy of the License at
> > >  *
> > >  *  http://www.apache.org/licenses/LICENSE-2.0
> > >  *
> > >  * Unless required by applicable law or agreed to in writing, software
> > >  * distributed under the License is distributed on an "AS IS" BASIS,
> > >  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> > implied.
> > >  * See the License for the specific language governing permissions and
> > >  * limitations under the License.
> > >  */
> > >
> > > package org.apache.ignite.internal.processors.query.h2.opt;
> > >
> > > import java.io.IOException;
> > > import java.util.Collection;
> > > import java.util.concurrent.atomic.AtomicLong;
> > > import org.apache.ignite.IgniteCheckedException;
> > > import org.apache.ignite.internal.GridKernalContext;
> > > import org.apache.ignite.internal.processors.cache.CacheObject;
> > > import org.apache.ignite.internal.processors.cache.CacheObjectContext;
> > > import
> > > org.apache.ignite.internal.processors.cache.version.GridCacheVersion;
> > > import
> > > org.apache.ignite.internal.processors.query.GridQueryIndexDescriptor;
> > > import
> > org.apache.ignite.internal.processors.query.GridQueryTypeDescriptor;
> > > import org.apache.ignite.internal.util.GridAtomicLong;
> > > import org.apache.ignite.internal.util.GridCloseableIteratorAdapter;
> > > import org.apache.ignite.internal.util.lang.GridCloseableIterator;
> > > import org.apache.ignite.internal.util.offheap.unsafe.GridUnsafeMemory;
> > > import org.apache.ignite.internal.util.typedef.internal.U;
> > > import org.apache.ignite.lang.IgniteBiTuple;
> > > import org.apache.ignite.spi.indexing.IndexingQueryFilter;
> > > import org.apache.ignite.spi.indexing.IndexingQueryCacheFilter;
> > > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > > import org.apache.lucene.document.Document;
> > > import org.apache.lucene.document.Field;
> > > import org.apache.lucene.document.LongField;
> > > import org.apache.lucene.document.StoredField;
> > > import org.apache.lucene.document.StringField;
> > > import org.apache.lucene.document.TextField;
> > > import org.apache.lucene.index.DirectoryReader;
> > > import org.apache.lucene.index.IndexReader;
> > > import org.apache.lucene.index.IndexWriter;
> > > import org.apache.lucene.index.IndexWriterConfig;
> > > import org.apache.lucene.index.Term;
> > > import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
> > > import org.apache.lucene.search.BooleanClause;
> > > import org.apache.lucene.search.BooleanQuery;
> > > import org.apache.lucene.search.IndexSearcher;
> > > import org.apache.lucene.search.NumericRangeQuery;
> > > import org.apache.lucene.search.Query;
> > > import org.apache.lucene.search.ScoreDoc;
> > > import org.apache.lucene.search.TopDocs;
> > > import org.apache.lucene.util.BytesRef;
> > > import org.h2.util.JdbcUtils;
> > > import org.jetbrains.annotations.Nullable;
> > >
> > > import static
> > > org.apache.ignite.internal.processors.query.QueryUtils.KEY_FIELD_NAME;
> > > import static
> > > org.apache.ignite.internal.processors.query.QueryUtils.VAL_FIELD_NAME;
> > >
> > > /**
> > >  * Lucene fulltext index.
> > >  */
> > > public class GridLuceneIndex implements AutoCloseable {
> > > /** Field name for string representation of value. */
> > > public static final String VAL_STR_FIELD_NAME = "_gg_val_str__";
> > >
> > > /** Field name for value version. */
> > > public static final String VER_FIELD_NAME = "_gg_ver__";
> > >
> > > /** Field name for value expiration time. */
> > > public static final String EXPIRATION_TIME_FIELD_NAME =
> > > "_gg_expires__";
> > >
> > >  

[jira] [Created] (IGNITE-9498) SchemaExchangeSelfTest should use test IP finder for all discovery SPIs

2018-09-07 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9498:


 Summary: SchemaExchangeSelfTest should use test IP finder for all 
discovery SPIs
 Key: IGNITE-9498
 URL: https://issues.apache.org/jira/browse/IGNITE-9498
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk








Re: Can not gather default cluster on localhost

2018-09-07 Thread Alexey Goncharuk
Hello Dmitriy,

Did you use the default config shipped with Ignite? Can you try modifying
the config to use TcpDiscoveryVmIpFinder providing only 127.0.0.1:47500 in
the list of IP addresses?

Thu, Sep 6, 2018 at 22:16, Dmitry Melnichuk <
dmitry.melnic...@nobitlost.com>:

> Hello igniters!
>
> I am again in need of your mighty helping hand.
>
> Recently I experienced a strange bug. It may be a fresh regression, or
> maybe I myself did something wrong. It would be great if someone looked
> into this matter.
>
> I have a need of testing some things in my client against a cluster of
> multiple Ignite instances. If my understanding is correct, I can create
> an Ignite cluster on localhost just by launching multiple
> `bin/ignite.sh` in console(s) with no special parameters. I tried doing
> so, but each Ignite instance I created was forming a separate cluster,
> and no data sharing occurred.
>
> I was told that in some cases IPv6 may prevent default Ignite instances
> from gathering into a cluster; at least, there was the same or a similar
> case with Ignite 2.7 on a fresh Mac OS X install. I disabled IPv6 with
>
> ```
> $ _JAVA_OPTIONS=-Djava.net.preferIPv4Stack=true bin/ignite.sh
> ```
>
> but the outcome did not change. For me it looks like clustering is
> enabled, but instances can not detect each other's presence.
>
> My OS is Arch Linux. This is my java version:
>
> ```
> $ java -version
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (build 1.8.0_181-b13)
> OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
> ```
>
> Output of the first console:
>
> ```
>  >>> +--+
>  >>> Ignite ver. 2.7.0.20180830#19700101-sha1:DEV
>  >>> +--+
>  >>> OS name: Linux 4.18.3-arch1-1-ARCH amd64
>  >>> CPU(s): 4
>  >>> Heap: 1.0GB
>  >>> VM name: 20751@ibmpc
>  >>> Local node [ID=FD43A01E-C0D1-4ED3-A653-036982C0205F, order=1,
> clientMode=false]
>  >>> Local node addresses: [ibmpc/127.0.0.1, ibmpc/192.168.0.222,
> /192.168.122.1]
>  >>> Local ports: TCP:8080 TCP:10800 TCP:11211 TCP:47100 UDP:47400
> TCP:47500
>
> [07-09-2018 03:39:41][INFO ][main][GridDiscoveryManager] Topology
> snapshot [ver=1, servers=1, clients=0, CPUs=4, offheap=3.1GB, heap=1.0GB]
> [07-09-2018 03:39:41][INFO ][main][GridDiscoveryManager]   ^-- Node
> [id=FD43A01E-C0D1-4ED3-A653-036982C0205F, clusterState=ACTIVE]
> [07-09-2018 03:39:41][INFO ][main][GridDiscoveryManager] Data Regions
> Configured:
> [07-09-2018 03:39:41][INFO ][main][GridDiscoveryManager]   ^-- default
> [initSize=256.0 MiB, maxSize=3.1 GiB, persistenceEnabled=false]
>
> ```
>
> Second console:
> ```
>  >>> +--+
>  >>> Ignite ver. 2.7.0.20180830#19700101-sha1:DEV
>  >>> +--+
>  >>> OS name: Linux 4.18.3-arch1-1-ARCH amd64
>  >>> CPU(s): 4
>  >>> Heap: 1.0GB
>  >>> VM name: 20168@ibmpc
>  >>> Local node [ID=108D943D-CE66-4A5D-B8DD-F59F199F4E66, order=1,
> clientMode=false]
>  >>> Local node addresses: [ibmpc/127.0.0.1, ibmpc/192.168.0.222,
> /192.168.122.1]
>  >>> Local ports: TCP:8081 TCP:10801 TCP:11212 TCP:47101 UDP:47400
> TCP:47501
>
> [07-09-2018 03:37:13][INFO ][main][GridDiscoveryManager] Topology
> snapshot [ver=1, servers=1, clients=0, CPUs=4, offheap=3.1GB, heap=1.0GB]
> [07-09-2018 03:37:13][INFO ][main][GridDiscoveryManager]   ^-- Node
> [id=108D943D-CE66-4A5D-B8DD-F59F199F4E66, clusterState=ACTIVE]
> ```
>
> Third console:
> ```
>  >>> +--+
>  >>> Ignite ver. 2.7.0.20180830#19700101-sha1:DEV
>  >>> +--+
>  >>> OS name: Linux 4.18.3-arch1-1-ARCH amd64
>  >>> CPU(s): 4
>  >>> Heap: 1.0GB
>  >>> VM name: 20433@ibmpc
>  >>> Local node [ID=40D54138-491D-4AF9-BD10-5A3B53E38E77, order=1,
> clientMode=false]
>  >>> Local node addresses: [ibmpc/127.0.0.1, ibmpc/192.168.0.222,
> /192.168.122.1]
>  >>> Local ports: TCP:8082 TCP:10802 TCP:11213 TCP:47102 UDP:47400
> TCP:47502
>
> [07-09-2018 03:38:23][INFO ][main][GridDiscoveryManager] Topology
> snapshot [ver=1, servers=1, clients=0, CPUs=4, offheap=3.1GB, heap=1.0GB]
> [07-09-2018 03:38:23][INFO ][main][GridDiscoveryManager]   ^-- Node
> [id=40D54138-491D-4AF9-BD10-5A3B53E38E77, clusterState=ACTIVE]
> [07-09-2018 03:38:23][INFO ][main][GridDiscoveryManager] Data Regions
> Configured:
> [07-09-2018 03:38:23][INFO ][main][GridDiscoveryManager]   ^-- default
> [initSize=256.0 MiB, maxSize=3.1 GiB, persistenceEnabled=false]
>
> ```
>
> Dmitry
>


[jira] [Created] (IGNITE-9485) Update documentation for ScanQuery with setLocal flag

2018-09-06 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9485:


 Summary: Update documentation for ScanQuery with setLocal flag
 Key: IGNITE-9485
 URL: https://issues.apache.org/jira/browse/IGNITE-9485
 Project: Ignite
  Issue Type: Task
Reporter: Alexey Goncharuk








[jira] [Created] (IGNITE-9479) Spontaneous rebalance may be triggered after a cache start

2018-09-06 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9479:


 Summary: Spontaneous rebalance may be triggered after a cache start
 Key: IGNITE-9479
 URL: https://issues.apache.org/jira/browse/IGNITE-9479
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.5
Reporter: Alexey Goncharuk
Assignee: Alexey Goncharuk
 Fix For: 2.7


As an optimization, we do not run the two-phase exchange latch release and do not
validate partition counters during cache start.
This can lead to a situation where rebalance is mistakenly triggered under load
after a cache start.

We should treat the cache start event the same way as other exchange events that
need a counter sync.





Re: Request for review : IGNITE-3303 Apache Flink Integration - Flink source

2018-09-04 Thread Alexey Goncharuk
Hello Saikat,

I see a few fellow Igniters added some comments to your PR (including me).
I believe the PR can be merged after you address them.

Thanks,
AG

Fri, Aug 31, 2018 at 3:11, Saikat Maitra :

> Thank you, Denis
>
> Regards,
> Saikat
>
> On Thu, Aug 30, 2018 at 7:01 PM, Denis Magda  wrote:
>
> > Hello Saikat,
> >
> > Hopefully, someone from the community will review the changes in the
> > nearest time.
> >
> > --
> > Denis
> >
> > On Thu, Aug 30, 2018 at 4:37 PM Saikat Maitra 
> > wrote:
> >
> > > Hello,
> > >
> > > > The changes for IGNITE-3303 for IgniteSource are complete. This will
> > > > help in streaming data from an Ignite cluster and processing,
> > > > filtering, transforming, and publishing it back to Ignite using
> > > > IgniteSink or to any other data sink.
> > > >
> > > > I was hoping that, if the changes can be approved, I can go ahead and
> > > > merge them.
> > >
> > >
> > > Regards,
> > > Saikat
> > >
> > >
> > >
> > > > On Tue, Aug 28, 2018 at 12:56 AM, Saikat Maitra <saikat.mai...@gmail.com>
> > > > wrote:
> > >
> > > > Hi Andrew,
> > > >
> > > > > As discussed, I have incorporated the changes. Please review and let me
> > > > > know if any changes are required.
> > > >
> > > > Regards,
> > > > Saikat
> > > >
> > > > > On Mon, Aug 27, 2018 at 1:45 AM, Saikat Maitra <saikat.mai...@gmail.com>
> > > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I have updated the PR with additional tests.
> > > >>
> > > >> Please review and share feedback.
> > > >>
> > > > >> This PR is related to IgniteSink but allows streaming data from Ignite.
> > > >>
> > > >> PR https://github.com/apache/ignite/pull/870/files
> > > >>
> > > >> Review https://reviews.ignite.apache.org/ignite/review/IGNT-CR-135
> > > >>
> > > >> Regards,
> > > >> Saikat
> > > >>
> > > >
> > > >
> > >
> >
>


Re: Questions about getAllInternal(...)

2018-09-04 Thread Alexey Goncharuk
Hello Steve,

A cache entry becomes obsolete once the on-heap object is no longer locked
and is not used by any thread. Since we moved to an off-heap-first model in
Ignite 2.0, we must clean on-heap entries as soon as possible to keep the
heap small. This is the reason for the code in both places you pointed out.

Hope this helps,
--AG

Tue, Sep 4, 2018 at 12:04, steve.hostett...@gmail.com <
steve.hostett...@gmail.com>:

> Hello,
>
> In the case of local caches without an eviction policy, I have the
> following questions:
>
> 1) I would like to understand why, in the method getAllInternal, the call
> entryEx(cacheKey) uses the topology in the case of a local cache.
> Furthermore, it calls the method map.putEntryIfObsoleteOrAbsent, but the
> entry cannot be obsolete or absent because it is a local cache.
>
>
> 2) Similarly, why is there a touch on the key, ctx.evicts().touch(entry,
> ctx.affinity().affinityTopologyVersion()), when the eviction policy is null
> (never evict)? This takes locks even when the context has lock=false.
>
>
> Thanks a lot for shedding some light on this.
>
> Steve
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


Re: Hello!

2018-09-04 Thread Alexey Goncharuk
Hello Ivan,

Welcome to the Ignite community! I've added you to the list of
contributors, you should now be able to assign tickets to yourself.

Get familiar with the Apache Ignite development process described here:
https://cwiki.apache.org/confluence/display/IGNITE/Development+Process

Instructions on how to contribute can be found here:
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute

Project setup in IntelliJ IDEA:
https://cwiki.apache.org/confluence/display/IGNITE/Project+Setup

--AG

Mon, Sep 3, 2018 at 18:27, Ivan Bessonov :

> I'm new to Ignite and I would like to join Apache Ignite development.
> My JIRA's login is ibessonov
>


Re: Zookeeper mass timeouts

2018-09-03 Thread Alexey Goncharuk
Dmitriy,

The ZooKeeper timeouts were caused by my commit (I looked through the wrong
PR when I was checking tests); it has already been reverted from master.

As for the bot, does it already have the logic to detect continuous
timeouts and send a notification only after a successful run? If not, I guess
we should put it on our helper roadmap because this will not be the last
timeout change.
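
One possible reading of the rule asked about above, sketched in plain Java: a timeout is reported only when the previous run of the same suite did not time out, so a continuous streak produces a single notification. The class and method names (`TimeoutStreakFilter`, `shouldNotify`) are hypothetical and are not taken from the actual MTCGA bot.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; not part of the actual MTCGA bot. */
class TimeoutStreakFilter {
    /** Suites whose most recent run timed out. */
    private final Set<String> timingOut = new HashSet<>();

    /**
     * Records a run and tells whether a notification should go out.
     *
     * @param suite Suite name.
     * @param timedOut Whether this run ended with a timeout.
     * @return true only for the first timeout after a successful run (or after no runs at all).
     */
    boolean shouldNotify(String suite, boolean timedOut) {
        if (!timedOut) {
            // A successful run ends the streak, so the next timeout is reported again.
            timingOut.remove(suite);

            return false;
        }

        // Set.add returns false when the suite is already in a timeout streak.
        return timingOut.add(suite);
    }
}

class TimeoutStreakFilterDemo {
    public static void main(String[] args) {
        TimeoutStreakFilter filter = new TimeoutStreakFilter();

        // Run history: timeout, timeout, success, timeout.
        boolean[] runs = {true, true, false, true};

        for (boolean timedOut : runs)
            System.out.println(filter.shouldNotify("ZooKeeper", timedOut));
    }
}
```

With this history only the first and the last runs would trigger a notification; the second timeout is suppressed as part of the same streak.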

--AG

Sun, Sep 2, 2018 at 10:11, Dmitriy Pavlov :

> Hi,
>
> First of all, I apologize for the mass emails caused by ZooKeeper timeout
> failures.
>
> Both the failures and the bot should be investigated. I believe we can
> improve the bot notification rules to avoid mass emails in case of flaky
> timeouts.
>
> I believe it is a better solution to perfect the rules instead of creating
> a separate channel.
>
> Sincerely,
> Dmitriy Pavlov
>


Re: Hello

2018-09-03 Thread Alexey Goncharuk
Hello Maxim,

Welcome to the Ignite community! I've added you to the list of
contributors, you should now be able to assign tickets to yourself.

Get familiar with the Apache Ignite development process described here:
https://cwiki.apache.org/confluence/display/IGNITE/Development+Process

Instructions on how to contribute can be found here:
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute

Project setup in IntelliJ IDEA:
https://cwiki.apache.org/confluence/display/IGNITE/Project+Setup

--AG

Mon, Sep 3, 2018 at 15:21, Maxim.Pudov :

> I'm new to Ignite and I would like to join Apache Ignite development.
> My JIRA's login is Maxim.Pudov
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>


Re: [MTCGA]: new failures in builds [1757777] needs to be handled

2018-09-03 Thread Alexey Goncharuk
Petr,

Can you by any chance check if rackspace tests have actual keys? I've
created a ticket for the failing test [1].

[1] https://issues.apache.org/jira/browse/IGNITE-9444

Fri, Aug 31, 2018 at 11:44, Alexey Goncharuk :

> The failure is not related to the change (verified in a separate branch
> with revert) and seems to be related to some change in rackspace
> infrastructure. Will create a ticket and mute the test.
>
> Fri, Aug 31, 2018 at 0:07, :
>
>> Hi Ignite Developer,
>>
>> I am MTCGA.Bot, and I've detected some issue on TeamCity to be addressed.
>> I hope you can help.
>>
>>  *New test failure in master
>> TcpDiscoveryCloudIpFinderSelfTest.testRackspace
>> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=3102247230760992596=%3Cdefault%3E=testDetails
>>  Changes may led to failure were done by
>>  - irakov
>> http://ci.ignite.apache.org/viewModification.html?modId=829932=false
>>
>> - If your changes can led to this failure(s), please create issue
>> with label MakeTeamCityGreenAgain and assign it to you.
>> -- If you have fix, please set ticket to PA state and write to
>> dev list fix is ready
>> -- For case fix will require some time please mute test and set
>> label Muted_Test to issue
>> - If you know which change caused failure please contact change
>> author directly
>> - If you don't know which change caused failure please send
>> message to dev list to find out
>> Should you have any questions please contact dev@ignite.apache.org
>> Best Regards,
>> MTCGA.Bot
>> Notification generated at Fri Aug 31 00:06:59 MSK 2018
>>
>


[jira] [Created] (IGNITE-9444) TcpDiscoveryCloudIpFinderSelfTest.testRackspace fails in master

2018-08-31 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9444:


 Summary: TcpDiscoveryCloudIpFinderSelfTest.testRackspace fails in 
master
 Key: IGNITE-9444
 URL: https://issues.apache.org/jira/browse/IGNITE-9444
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk








Re: [MTCGA]: new failures in builds [1757777] needs to be handled

2018-08-31 Thread Alexey Goncharuk
The failure is not related to the change (verified in a separate branch
with revert) and seems to be related to some change in rackspace
infrastructure. Will create a ticket and mute the test.

Fri, Aug 31, 2018 at 0:07, :

> Hi Ignite Developer,
>
> I am MTCGA.Bot, and I've detected some issue on TeamCity to be addressed.
> I hope you can help.
>
>  *New test failure in master
> TcpDiscoveryCloudIpFinderSelfTest.testRackspace
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=3102247230760992596=%3Cdefault%3E=testDetails
>  Changes may led to failure were done by
>  - irakov
> http://ci.ignite.apache.org/viewModification.html?modId=829932=false
>
> - If your changes can led to this failure(s), please create issue
> with label MakeTeamCityGreenAgain and assign it to you.
> -- If you have fix, please set ticket to PA state and write to dev
> list fix is ready
> -- For case fix will require some time please mute test and set
> label Muted_Test to issue
> - If you know which change caused failure please contact change
> author directly
> - If you don't know which change caused failure please send
> message to dev list to find out
> Should you have any questions please contact dev@ignite.apache.org
> Best Regards,
> MTCGA.Bot
> Notification generated at Fri Aug 31 00:06:59 MSK 2018
>


Re: Hello ignite team, IGNITE-9433 contribution

2018-08-31 Thread Alexey Goncharuk
Hi Pavel,

Welcome to the Ignite community! I've added you to the contributors list,
you should now be able to assign the tickets to yourself.

--AG

Thu, Aug 30, 2018 at 16:08, Pavel Voronkin :

> Hi Team,
>
> I'd like to join Apache Ignite development,
> especially the ML part.
> Currently, I work on IGNITE-9433
> https://issues.apache.org/jira/browse/IGNITE-9433
>
> My Jira login is voropava
>
> Best regards,
> Pavel
>


[jira] [Created] (IGNITE-9429) GridCacheReplicatedDataStructuresFailoverSelfTest#testAtomicSequenceConstantTopologyChange is flaky

2018-08-30 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9429:


 Summary: 
GridCacheReplicatedDataStructuresFailoverSelfTest#testAtomicSequenceConstantTopologyChange
 is flaky
 Key: IGNITE-9429
 URL: https://issues.apache.org/jira/browse/IGNITE-9429
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk








Re: Retries in write-behind store

2018-08-29 Thread Alexey Goncharuk
Val,

There is no need to have two implementations of the store; the exception
may be handled based on the cache configuration, and the store only needs to
check whether write-behind is enabled. The configuration change will be
transparently handled by the store. This change can be easily incorporated
into our POJO store.

I am ok with the callback idea, but we need to discuss the API changes
first.

--AG

Wed, Aug 29, 2018 at 16:07, Valentin Kulichenko <
valentin.kuliche...@gmail.com>:

> Alex,
>
> I see your point, but I still think it should be incorporated into the
> product.
>
> First of all, your solution implies changing the CacheStore implementation
> every time you switch between write-through and write-behind. This
> contradicts the overall design.
>
> Second of all, one of the most commonly used implementations is the POJO
> store which is provided out of the box. Moreover, users usually take
> advantage of the automatic integration feature and generate the config using
> Web Console, so they often don't even know what "CacheJdbcPojoStore" is.
>
> That said, your suggestion works as a workaround, but it doesn't seem to be
> very user-friendly. I actually like Gaurav's idea: instead of blindly
> limiting the number of retries, we can have a callback to handle errors.
>
> -Val
>
> On Wed, Aug 29, 2018 at 1:31 AM Alexey Goncharuk <
> alexey.goncha...@gmail.com>
> wrote:
>
> > Since the write-behind store wraps the store provided by a user, I think
> > the correct solution would be to catch the exception and not propagate it
> > further in the store, because only the end-user knows which errors can be
> > retried, and which errors cannot.
> >
> > In this case no changes in the write-behind logic are needed.
> >
> > Wed, Aug 29, 2018 at 7:14, Valentin Kulichenko <
> > valentin.kuliche...@gmail.com>:
> >
> > > Folks,
> > >
> > > Is there a way to limit or disable retries of failed updates in the
> > > write-behind store? I can't find one; it looks like if an update fails,
> > > it is moved to the end of the queue and then eventually retried. If it
> > > fails again, the process is repeated.
> > >
> > > Such behavior *might* be OK if failures are caused by the database being
> > > temporarily unavailable. But what if an update fails deterministically,
> > > for example due to a constraint violation? There is absolutely no reason
> > > to retry it, and at the same time it can cause stability and performance
> > > issues when the buffer is full of such "broken" updates.
> > >
> > > Does it make sense to add an option that would allow limiting the number
> > > of retries (or even disabling them)?
> > >
> > > -Val
> > >
> >
>


Re: Bots on dev list

2018-08-29 Thread Alexey Goncharuk
Denis,

I am against filtering out MTCGA messages from the dev-list because test
failures affect every developer in the community and may be caused by any
developer in the community. Usually such emails require immediate action
and it would be wrong to move them to a separate list.

I understand, though, your concern about the dev-list appearance on nabble.
I wonder if there is an option to create subfolders there so that certain
emails are put into separate subfolders, as Dmitrii Ryabov mentioned
before.

--AG

Wed, Aug 29, 2018 at 12:48, Maxim Muzafarov :

> Denis,
>
> I would like to keep a single entry point into the whole Ignite development
> process, but maybe other Igniters have another opinion on this. For me, it
> is a more convenient way to search for any activity on the dev list by a
> single keyword (e.g. PRs, JIRAs, topics).
>
> As Dmitry mentioned, you can configure your mail agent to group bot
> messages or move them into a separate directory. These messages have
> predefined topic names and can be easily filtered.
>
> On Wed, 29 Aug 2018 at 12:26 Dmitrii Ryabov  wrote:
>
> >  Modern mail services allow users to filter messages. You can easily
> > filter out bot messages.
> >
> > 2018-08-29 11:48 GMT+03:00 Denis Mekhanikov :
> >
> > > Igniters,
> > >
> > > We have a lot of threads, created by bots on the dev list.
> > > Currently messages are sent by JIRA, GitHub and MTCGA bots. Maybe, some
> > > others too, but these are the most active.
> > >
> > > Take a look at this page:
> > > http://apache-ignite-developers.2346864.n4.nabble.com/Apache-Ignite-Developers-f1i35.html
> > > It's hard to find actual discussions in this mess. I'd like to see
> > > something like what we have on the users list:
> > > http://apache-ignite-users.70518.x6.nabble.com/
> > >
> > > It doesn't seem necessary to me to mix discussions with this cryptic
> > > flow of information.
> > > Can we create a separate mailing list for bots?
> > >
> > > Denis
> > >
> >
> --
> --
> Maxim Muzafarov
>


Re: TeamCity's security certificate expired

2018-08-29 Thread Alexey Goncharuk
Certificates are updated, TC is up and running.

Wed, Aug 29, 2018 at 12:55, Dmitrii Ryabov :

> Hi, Igniters!
>
> Who can refresh TeamCity's security certificate?
>


Re: Retries in write-behind store

2018-08-29 Thread Alexey Goncharuk
Since the write-behind store wraps the store provided by a user, I think
the correct solution would be to catch the exception and not propagate it
further in the store, because only the end-user knows which errors can be
retried, and which errors cannot.

In this case no changes in the write-behind logic are needed.
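
A minimal sketch of this idea in plain Java, under stated assumptions: `KeyValueStore` is a hypothetical stand-in for the user-provided store (it is not Ignite's `CacheStore` API), and a constraint violation is taken as the example of an error that can never succeed on retry. The wrapper swallows such errors so the caller has nothing to retry, and lets every other error propagate.

```java
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.sql.SQLTransientException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a user-provided store; not the Ignite CacheStore API. */
interface KeyValueStore {
    void write(String key, String val) throws SQLException;
}

/** Wraps a store and decides which failures are worth retrying. */
class RetryAwareStore implements KeyValueStore {
    private final KeyValueStore delegate;

    /** Keys whose updates were dropped because a retry could not succeed. */
    private final List<String> dropped = new ArrayList<>();

    RetryAwareStore(KeyValueStore delegate) {
        this.delegate = delegate;
    }

    @Override public void write(String key, String val) throws SQLException {
        try {
            delegate.write(key, val);
        }
        catch (SQLIntegrityConstraintViolationException e) {
            // Deterministic failure: record it and do not rethrow, so nothing is retried.
            dropped.add(key);
        }
        // Any other SQLException (for example, a transient connection error)
        // propagates to the caller, which may retry the update later.
    }

    List<String> dropped() {
        return dropped;
    }
}

class RetryAwareStoreDemo {
    public static void main(String[] args) throws SQLException {
        RetryAwareStore store = new RetryAwareStore((key, val) -> {
            if (key.equals("dup"))
                throw new SQLIntegrityConstraintViolationException("duplicate key");
        });

        store.write("ok", "1");
        store.write("dup", "2");

        System.out.println("Dropped keys: " + store.dropped());
    }
}
```

Which exception types count as non-retryable is exactly the knowledge that only the end user has, which is why the decision lives in the wrapped store in this sketch rather than in the write-behind layer.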

Wed, Aug 29, 2018 at 7:14, Valentin Kulichenko <
valentin.kuliche...@gmail.com>:

> Folks,
>
> Is there a way to limit or disable retries of failed updates in the
> write-behind store? I can't find one; it looks like if an update fails, it
> is moved to the end of the queue and then eventually retried. If it fails
> again, the process is repeated.
>
> Such behavior *might* be OK if failures are caused by the database being
> temporarily unavailable. But what if an update fails deterministically, for
> example due to a constraint violation? There is absolutely no reason to
> retry it, and at the same time it can cause stability and performance
> issues when the buffer is full of such "broken" updates.
>
> Does it make sense to add an option that would allow limiting the number of
> retries (or even disabling them)?
>
> -Val
>


[jira] [Created] (IGNITE-9405) Update documentation for metrics for remains to evict keys/partitions

2018-08-29 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9405:


 Summary: Update documentation for metrics for remains to evict 
keys/partitions
 Key: IGNITE-9405
 URL: https://issues.apache.org/jira/browse/IGNITE-9405
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Alexey Goncharuk








[jira] [Created] (IGNITE-9399) Document new WAL history size configuration parameter

2018-08-28 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9399:


 Summary: Document new WAL history size configuration parameter
 Key: IGNITE-9399
 URL: https://issues.apache.org/jira/browse/IGNITE-9399
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Alexey Goncharuk








Re: [MTCGA]: new failures in builds [1588345] needs to be handled

2018-08-27 Thread Alexey Goncharuk
Folks, this is a new failure in the IGFS/Hadoop domain. It looks like the archive at
http://www.us.apache.org/dist/hadoop/core/hadoop-2.5.2/hadoop-2.5.2.tar.gz
was moved (the URL gives 404).

Vladimir, can you take a look or point the community to how to take care of
this failure? Should we simply upgrade the Hadoop version?

Mon, Aug 27, 2018 at 9:43, :

> Hi Ignite Developer,
>
> I am MTCGA.Bot, and I've detected some issue on TeamCity to be addressed.
> I hope you can help.
>
>  *Recently contributed test failed in master
> org.apache.ignite.testsuites.IgniteIgfsLinuxAndMacOSTestSuite.initializationError
>
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-4034437294452110435=%3Cdefault%3E=testDetails
>  No changes in build
>
> - If your changes can led to this failure(s), please create issue
> with label MakeTeamCityGreenAgain and assign it to you.
> -- If you have fix, please set ticket to PA state and write to dev
> list fix is ready
> -- For case fix will require some time please mute test and set
> label Muted_Test to issue
> - If you know which change caused failure please contact change
> author directly
> - If you don't know which change caused failure please send
> message to dev list to find out
> Should you have any questions please contact dev@ignite.apache.org
> Best Regards,
> MTCGA.Bot
> Notification generated at Mon Aug 27 09:43:20 MSK 2018
>


Re: [MTCGA]: new failures in builds [1731430, 1470304, 1263230] needs to be handled

2018-08-27 Thread Alexey Goncharuk
This can be ignored for now, last 10 runs are green.

Sat, Aug 25, 2018 at 15:12, :

> Hi Ignite Developer,
>
> I am MTCGA.Bot, and I've detected some issue on TeamCity to be addressed.
> I hope you can help.
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestPutGetTime
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=7658720346544183028=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestGetAndPutIfAbsent
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=6996817533011399981=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestContainsKeys
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=7285770292804096604=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestPutAll
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-6390205996503748689=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestRemoveAllKeysIterVector
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=8758275678114310806=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheStoreTestSuite:
> LocalLoadCacheSingleNodeNoPredicate
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=9015012699833728261=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestCreateCache
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-5095691067384514204=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestLocalClearAllIterArray
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=3166615619754517087=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestPutGetDate
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-2606088378994662262=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestLocalEvictIterArray
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=7575936322131817559=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheStoreTestSuite:
> LoadCacheSeveralNodesNoPredicate
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=7743436803082385641=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestContainsKey
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=1115225155613585414=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestLocalClearAllIterList
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=2661124177463452745=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheStoreTestSuite:
> LoadCacheSingleNodeNoPredicate
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=3359013355795990869=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite: TestGet
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=5606130551689605135=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestPutIfAbsent
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=4462541653374489660=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestRemoveAllKeysIterArray
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-2310399454784119338=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestGetAndReplace
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=1749168869106643063=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest:
> BinaryIdentityResolverTestSuite: IdentityEquilityWithGuid
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-3273658252531597324=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> TestLocalEvictIterSet
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=3524550646347789055=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite: TestSizes
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=2370244242758466103=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest:
> BinaryIdentityResolverTestSuite: IdentityEquilityWithoutGuid
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=2359304547582357827=%3Cdefault%3E=testDetails
>
>  *New test failure in master IgniteCoreTest: CacheTestSuite:
> 

[jira] [Created] (IGNITE-9384) Transaction state PREPARED may be set too early or too late

2018-08-27 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9384:


 Summary: Transaction state PREPARED may be set too early or too 
late
 Key: IGNITE-9384
 URL: https://issues.apache.org/jira/browse/IGNITE-9384
 Project: Ignite
  Issue Type: Bug
Reporter: Alexey Goncharuk


I see the following issues in the {{GridDhtTxPrepareFutureAdapter}} class:
1) The {{PREPARED}} state on a near local transaction may be set during the
future completion. I think the {{onComplete(res)}} method should be corrected
as follows:
{code}
if (last || tx.isSystemInvalidate() && !(tx.near() && tx.local()))
    tx.state(PREPARED);
...
{code}

2) In {{onDone}} we have the following code:
{code}
if (REPLIED_UPD.compareAndSet(this, 0, 1)) {
    GridNearTxPrepareResponse res = createPrepareResponse(this.err);

    try {
        sendPrepareResponse(res);
    }
    finally {
        // Will call super.onDone().
        onComplete(res);
    }

    return true;
}
...
{code}
This code sends the near prepare response to the near node before the local DHT
transaction sets its state to {{PREPARED}}, which violates the invariant that
all transactions are prepared before any of the transactions is committed. I
think these two calls should be swapped, but we need to carefully check the
error handling (note that {{onComplete}} is called in the {{finally}} block).
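
The swapped ordering can be illustrated with a small standalone sketch. It is not Ignite code: `PrepareFutureSketch` and its two private methods are hypothetical stand-ins, and sending the response from the `finally` block is only one possible answer to the error-handling question raised above, which would still need to be checked against the real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

/** Hypothetical stand-in for the prepare future; not the Ignite class. */
class PrepareFutureSketch {
    /** Guards against replying more than once. */
    private static final AtomicIntegerFieldUpdater<PrepareFutureSketch> REPLIED_UPD =
        AtomicIntegerFieldUpdater.newUpdater(PrepareFutureSketch.class, "replied");

    /** Updated only through REPLIED_UPD. */
    private volatile int replied;

    /** Records the order of the two steps for inspection. */
    final List<String> events = new ArrayList<>();

    boolean onDone() {
        if (REPLIED_UPD.compareAndSet(this, 0, 1)) {
            try {
                // The local state transition happens first...
                onComplete();
            }
            finally {
                // ...and only then does the remote side learn about it.
                // Assumption: the response is sent even if onComplete() throws.
                sendPrepareResponse();
            }

            return true;
        }

        return false;
    }

    private void onComplete() {
        events.add("state=PREPARED");
    }

    private void sendPrepareResponse() {
        events.add("response sent");
    }
}

class PrepareFutureSketchDemo {
    public static void main(String[] args) {
        PrepareFutureSketch fut = new PrepareFutureSketch();

        fut.onDone();

        System.out.println(fut.events);
    }
}
```

In this sketch the compare-and-set guard still ensures that only the first `onDone()` call performs both steps, so a second call returns `false` without touching the state again.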





Re: [MTCGA]: new failures in builds [1711030] needs to be handled

2018-08-23 Thread Alexey Goncharuk
This new test fails intentionally (to be fixed in a separate ticket);
it is muted on TC.

Thu, Aug 23, 2018 at 2:37, :

> Hi Ignite Developer,
>
> I am MTCGA.Bot, and I've detected some issue on TeamCity to be addressed.
> I hope you can help.
>
>  *Recently contributed test failed in master
> TransactionIntegrityWithPrimaryIndexCorruptionTest.testPrimaryIndexCorruptionDuringCommitOnPrimaryNode3
>
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=3475401922666481512=%3Cdefault%3E=testDetails
>  No changes in build
>
> - If your changes can lead to this failure(s), please create an issue
> with the label MakeTeamCityGreenAgain and assign it to yourself.
> -- If you have a fix, please set the ticket to PA state and write to the
> dev list that the fix is ready
> -- If the fix will require some time, please mute the test and set the
> label Muted_Test on the issue
> - If you know which change caused the failure, please contact the change
> author directly
> - If you don't know which change caused the failure, please send a
> message to the dev list to find out
> Should you have any questions, please contact dpav...@apache.org or write
> to dev.list
> Best Regards,
> MTCGA.Bot
> Notification generated at Thu Aug 23 02:37:34 MSK 2018
>


[jira] [Created] (IGNITE-9326) IgniteCacheFailedUpdateResponseTest hangs in master

2018-08-20 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9326:


 Summary: IgniteCacheFailedUpdateResponseTest hangs in master
 Key: IGNITE-9326
 URL: https://issues.apache.org/jira/browse/IGNITE-9326
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk
Assignee: Alexey Goncharuk


The test started to hang after IGNITE-8926 was merged.

The reason is that the entry processor result is now lazily serialized
during the message send, which results in a failure handler invocation.
However, the test checks that the exception is rethrown to the user.

One possible fix is to marshal the result after the topology lock is
released.
The test is muted in master for now.





Re: [Distributed SQL] Do we have a plan to implement QuadTree index?

2018-08-16 Thread Alexey Goncharuk
Alexey,

I am not sure if it will be a proper fit for you, but I think it is worth a
try.

Apache Ignite has an option to index geospatial data using third-party
libraries (note that it is not included in the default Ignite build; the
module is activated via the lgpl profile). The index is located in the
ignite-geospatial module and uses an R-tree implementation underneath. One
downside of this module is that the geospatial index is not yet supported
with Ignite native persistence.

Hope this helps!
--AG

Thu, Aug 16, 2018 at 6:21, Alexey Zinoviev :

> Sorry for the delay, dear Pavel and Denis.
>
> Yes, I need a kind of indexing to improve KNN algorithms during training
> the model.
>
> In my draft solution I've implemented building of
> https://en.wikipedia.org/wiki/K-d_tree
>  on the training data set.
> It collects the information about data distributed in our specific ML
> Datasets (distributed by nodes over Ignite cache)
>
> Pavel, you asked me some questions and I've prepared answers.
>
> 1) Should this index be in-memory only, or do you want to persist it?
> *Maybe it should be persisted (to reuse it for subsequent predictions)*
>
> 2) In general index can be implemented in two ways per-partition and
> per-node.
> *Thank you for your explanation. Of course the per-node is better.*
>
> 3) I think K-d tree is preferable
> *You are absolutely right, it should be K-d tree*
>
> 4) Could you please share use cases you're trying to solve with QuadTree?
> With
> close to real data and examples of queries?
>
> I need to solve *k-nearest neighbors search task *on a set of vectors with
> unique keys presented in Ignite Cache (training set),
> during the training phase I'm going to build a temporary index as a KD-Tree
> (based on distance between vectors).
>
> The distance metric is a parameter too.
>
> After that, in prediction phase the *k-nearest neighbors search task *is
> solved by brute-force iteration over all vectors to find the *k-nearest
> neighbors.*
> I'd like to improve the search part with queries to index to extract
> closest vectors.
>
> Of course, it's a kind of experiment, and maybe it's a bad idea to patch SQL
> internals to solve this private task, but as you mentioned it can be a good
> point of collaboration between ML and SQL components.
>
> Can I take one of the implemented indices as a primary example and implement
> it by myself?
> Could you recommend something as a starting point for me?
>
> Thanks for your help.
>
>
>
>
> 2018-08-04 11:18 GMT+06:00 Denis Magda :
>
> > Alexey, are you working on some new ML/DL APIs/algorithms? Please
> > elaborate on what you'd like to add to Ignite.
> >
> > --
> > Denis
> >
> > On Wed, Aug 1, 2018 at 3:10 PM Pavel Kovalenko 
> wrote:
> >
> > > Hello Alexey,
> > >
> > > > It's not so difficult to implement a new type of data index, but if
> > > > you want to reach good performance in a distributed environment you
> > > > need strong knowledge of the data you're indexing and of the kind of
> > > > queries you want to execute.
> > > > Should this index be in-memory only, or do you want to persist it?
> > > > In case of persistence, your index should fit our page memory model
> > > > requirements.
> > > > In both cases your index should be ready to work in a concurrent
> > > > environment.
> > > >
> > > > In general, an index can be implemented in two ways: per-partition
> > > > and per-node.
> > > > Per-partition may be efficient if you have a lot of points (x,y)
> > > > representing one big object, e.g. an image. In this case all of these
> > > > points must be in one partition, so that a query which, e.g.,
> > > > intersects images executes on one node. But if you have multiple
> > > > images, your index will also contain points from other objects,
> > > > which will overload it.
> > > > Per-node may be efficient if you have a lot of points (x,y) that are
> > > > independent of each other and that you will use, e.g., as spatial
> > > > data. But in this case, I think a K-d tree is preferable, as it can
> > > > be used more widely.
> > > >
> > > > Could you please share the use cases you're trying to solve with
> > > > QuadTree, with close-to-real data and examples of queries? Right now
> > > > the question is too abstract, and it's hard to understand how it
> > > > should be implemented to reach good results.
> > >
> > >
> > > 2018-08-01 16:45 GMT+03:00 Alexey Zinoviev :
> > >
> > > > Hi, Igniters.
> > > >
> > > > > Currently I'm working on various math stuff on top of Apache Ignite,
> > > > > and in a few tasks I need to implement something like this in memory:
> > > > >
> > > > > https://en.wikipedia.org/wiki/Quadtree
> > > > >
> > > > > I didn't find such an index in Apache Ignite, but maybe it's under
> > > > > development by somebody?
> > > > >
> > > > > Is it difficult to add a new index type to our distributed SQL (from
> > > > > the point of view of various infrastructure issues and so on)? P.S. I
> > > > > don't worry about the math stuff here because I've implemented it many

Re: CacheStore and ignite.close

2018-08-16 Thread Alexey Goncharuk
Ilya,

Can you please clarify what you mean by 'abandons cache store operations'?
Does it mean that a read-through/write-through op is omitted, but the
public API method returns without an error? If so, then this is a
bug. If a public API method finishes with an exception when read-through is
omitted, then I think this is correct behavior.

Wed, Aug 15, 2018 at 16:38, Ilya Kasnacheev :

> Hello!
>
> I'm working on the https://issues.apache.org/jira/browse/IGNITE-9093 test fix.
> It turns out that IgniteDbPutGetWithCacheStoreTest.testReadThrough fails
> sporadically because the default ignite.close() is close(cancel=true), and
> it seems that it abandons cache store operations. So not all data is
> read from or written to the cache store.
>
> Is it really so? Is it considered safe?
>
> I have a pull request on this topic:
> https://github.com/apache/ignite/pull/4545
>
> Regards,
>
> --
> Ilya Kasnacheev
>


Re: Synchronous tx entries unlocking in discovery\exchange threads.

2018-08-16 Thread Alexey Goncharuk
Andrey, I agree that most likely this can be done in an async way. There
are some nuances, though, because if a node leaves during an ongoing
exchange, we should remove the locks in the context of the ongoing exchange
and not wait for the next exchange event.

I will take a look at your PR shortly.

Wed, Aug 15, 2018 at 15:57, Andrey Mashenkov :

> Hi Igniters,
>
> I've found that an Ignite node tries to unlock tx entries when a node leaves
> the grid.
> Ignite does this synchronously in
> GridCacheMvccManager.removeExplicitNodeLocks() in discovery and exchange
> threads.
>
> It looks like this can be done in an async way.
> I've made PR #4565 and it seems there are no new test failures [1].
>
> I'm not familiar enough with the exchange manager code, but it looks like we
> can scan locked entries more than once per node-left event.
> Also, it looks possible to scan locked entries once for several merged
> exchange events.
>
> Thoughts? Any ideas on how this can be refactored?
>
> [1]
>
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=projectOverview_IgniteTests24Java8=pull%2F4546%2Fhead
>
> --
> Best regards,
> Andrey V. Mashenkov
>


[jira] [Created] (IGNITE-9275) Introduce mechanism to fetch partition file via a p2p protocol

2018-08-15 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9275:


 Summary: Introduce mechanism to fetch partition file via a p2p 
protocol
 Key: IGNITE-9275
 URL: https://issues.apache.org/jira/browse/IGNITE-9275
 Project: Ignite
  Issue Type: Sub-task
Reporter: Alexey Goncharuk








[jira] [Created] (IGNITE-9269) Stall optimistic transactions reads if there is a candidate for PREPARED transaction

2018-08-14 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9269:


 Summary: Stall optimistic transactions reads if there is a 
candidate for PREPARED transaction
 Key: IGNITE-9269
 URL: https://issues.apache.org/jira/browse/IGNITE-9269
 Project: Ignite
  Issue Type: Improvement
  Components: general
Reporter: Alexey Goncharuk


This is an idea that needs confirmation and accurate benchmarking.

Currently, when a read is performed inside an optimistic serializable
transaction, we capture the value and entry version immediately, regardless of
the pending transactional locks for the read entry.
If there is a write candidate on the entry, this scenario will very likely
result in an optimistic write conflict exception. If a read observes a single
(or several?) write candidate on the entry, we may stall the read until the
pending prepared transaction is committed and proceed with the read only then.
This is a speculative optimization, but it may reduce the chance of getting
an optimistic write conflict exception.
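
A rough sketch of the idea, assuming a hypothetical single-entry model (StallingEntrySketch, prepareWrite(), commit() and read() are illustrative names, not Ignite classes or methods). The read waits for a pending write candidate for a bounded time and falls back to the current behavior if the wait times out:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical single cache entry; not an Ignite class. */
final class StallingEntrySketch {
    /** Value and version captured by an optimistic read. */
    static final class Snapshot {
        final String val;
        final long ver;

        Snapshot(String val, long ver) {
            this.val = val;
            this.ver = ver;
        }
    }

    private String val;
    private long ver;

    /** Non-null while a prepared writer (write candidate) is pending. */
    private CompletableFuture<Void> pendingCommit;

    /** Registers a write candidate on the entry. */
    synchronized void prepareWrite() {
        pendingCommit = new CompletableFuture<>();
    }

    /** Commits the write, bumps the version and releases stalled readers. */
    synchronized void commit(String newVal) {
        val = newVal;
        ver++;

        if (pendingCommit != null) {
            pendingCommit.complete(null);
            pendingCommit = null;
        }
    }

    /** Optimistic read that stalls for at most maxStallMs on a write candidate. */
    Snapshot read(long maxStallMs) {
        CompletableFuture<Void> fut;

        synchronized (this) {
            fut = pendingCommit;
        }

        if (fut != null) {
            try {
                fut.get(maxStallMs, TimeUnit.MILLISECONDS);
            }
            catch (TimeoutException | ExecutionException e) {
                // Give up stalling and capture the current value, as is done today.
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        synchronized (this) {
            return new Snapshot(val, ver);
        }
    }
}
```

The bounded wait is a design choice of this sketch only; the right stall policy is exactly what the benchmarking would need to determine.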





Re: Data regions on client nodes

2018-08-09 Thread Alexey Goncharuk
Once the OS has given us the pointer, no exceptions can be raised in the user
code. If the OS overcommitted the memory, then the process will be halted
when OOME occurs; there is not much we can do here.

My point was that dynamic data region allocation can lead to another
dynamic exception that should be properly handled during cache start. When
the data region is allocated statically, this exception is handled on node
start, which is much easier.
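
To illustrate the difference, here is a minimal sketch, assuming a hypothetical LazyRegionSketch class with an injectable allocator (neither is Ignite API). With lazy allocation the failure surfaces on cache start and has to be turned into a cache start error there, instead of failing the node on startup:

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

/** Hypothetical lazily allocated data region; not an Ignite class. */
final class LazyRegionSketch {
    private final int size;

    /** Injected so that an allocation failure can be simulated. */
    private final IntFunction<ByteBuffer> allocator;

    /** Null until the first cache assigned to the region is started. */
    private ByteBuffer mem;

    LazyRegionSketch(int size, IntFunction<ByteBuffer> allocator) {
        this.size = size;
        this.allocator = allocator;
    }

    /** Called on cache start; allocates the region on first use. */
    synchronized void onCacheStart(String cacheName) {
        if (mem != null)
            return;

        try {
            mem = allocator.apply(size);
        }
        catch (OutOfMemoryError e) {
            // The dynamic case: the error must fail this cache start only.
            throw new IllegalStateException("Failed to start cache '" + cacheName +
                "': cannot allocate data region", e);
        }
    }

    synchronized boolean allocated() {
        return mem != null;
    }
}
```

In this sketch the caller of onCacheStart() decides what to do with the exception; with static allocation the same failure would simply abort node start.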

Wed, Aug 8, 2018 at 18:18, Dmitriy Pavlov :

> I used to think that the OS allocates pages not immediately after the call
> to allocate(), but only on the first touch of each page.
>
> I'm not sure every OS provides a guarantee that 'allocated' memory will never
> cause OOME. Please correct me if I'm wrong.
>
> Wed, Aug 8, 2018 at 17:38, Dmitriy Setrakyan :
>
> > Alexey,
> >
> > I am not sure I understand. If cache creation proceeds, but memory region
> > was not allocated, how can the cache operate?
> >
> > D.
> >
> > On Wed, Aug 8, 2018 at 8:05 AM, Alexey Goncharuk <
> > alexey.goncha...@gmail.com
> > > wrote:
> >
> > > > I do not mind making this change, but note that the reason for non-lazy
> > > > region allocation is the need to gracefully handle OOME errors during
> > > > cache start. When a region is pre-allocated, no OOME can happen.
> > > >
> > > > If a region is allocated dynamically, then all errors that previously
> > > > happened during node start should be properly handled (a client node
> > > > must be stopped, but cache creation should proceed).
> > >
> > > > Fri, Jul 27, 2018 at 20:04, Valentin Kulichenko <
> > > valentin.kuliche...@gmail.com>:
> > >
> > > > Ticket created: https://issues.apache.org/jira/browse/IGNITE-9113
> > > >
> > > > -Val
> > > >
> > > > On Fri, Jul 27, 2018 at 5:59 AM Dmitry Pavlov  >
> > > > wrote:
> > > >
> > > > > Maxim, thank you.
> > > > >
> > > > > > If it seems technically possible, we can file a ticket for this
> > > > > > change.
> > > > > >
> > > > > > I find this proposal reasonable; the change makes perfect sense to
> > > > > > me.
> > > > > >
> > > > > > We can wait for Alex G.'s feedback on this change before starting
> > > > > > the actual implementation. It can take a while, because he is
> > > > > > travelling now.
> > > > >
> > > > > > Fri, Jul 27, 2018 at 14:35, Maxim Muzafarov :
> > > > >
> > > > > > Guys,
> > > > > >
> > > > > > > I may be missing some details, but at first glance we have
> > > > > > > everything we need to defer region memory allocation if the
> > > > > > > region has no cache groups assigned. And it doesn't matter
> > > > > > > whether it happens on client or server nodes.
> > > > > > >
> > > > > > > Currently, region memory allocation happens in the exchange
> > > > > > > future init method. At node startup, the method
> > > > > > > initCachesOnLocalJoin executes. This method is responsible for
> > > > > > > memory allocation (through initiating cache managers) and it
> > > > > > > also starts caches. So, at this point we have all existing cache
> > > > > > > descriptors and can find out which cache matches which region,
> > > > > > > and defer the initialization of some regions to the moment when
> > > > > > > a new cache is assigned to the region (which happens in
> > > > > > > onCacheChangeRequest).
> > > > > > >
> > > > > > > Please correct me if I'm wrong or missing something.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, 25 Jul 2018 at 19:32 Dmitry Pavlov <
> dpavlov@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi Maxim,
> > > > > > >
> > > > > > > > thank you for stepping in. What do you think, is it possible to
> > > > > > > > check cache assignment to a region at the memory allocation stage?
> > > > > > >
> > > > > > 

Re: ConcurrentLinkedHashMap works incorrectly after clear()

2018-08-09 Thread Alexey Goncharuk
Guys, I think we can definitely replace the current implementation of CLHM
with a more stable one, but as a temporary solution I see nothing wrong with
throwing an UnsupportedOperationException from the clear() method. Ilya has
already provided a patch which replaces all clear() calls with the creation
of a new map; semantically it has the same behavior as a correct clear()
method.

I suggest merging the provided PR, because IgniteH2Indexing is broken
by this bug, and then creating a ticket to replace or fix the current CLHM,
whichever feels right.
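
For illustration only, replacing clear() with a new map could look roughly like the sketch below. ResettableMapSketch is a hypothetical holder and java.util.concurrent.ConcurrentHashMap stands in for the CLHM; this is not the code from the PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder; ConcurrentHashMap stands in for the CLHM here. */
final class ResettableMapSketch<K, V> {
    /** Current map; volatile so that all threads see the swapped instance. */
    private volatile Map<K, V> map = new ConcurrentHashMap<K, V>();

    V put(K key, V val) {
        return map.put(key, val);
    }

    V get(K key) {
        return map.get(key);
    }

    int size() {
        return map.size();
    }

    /** Used instead of clear(): drops the old map and starts with a fresh one. */
    void reset() {
        map = new ConcurrentHashMap<K, V>();
    }
}
```

One caveat of this approach: a put() racing with reset() may land in the old map and be dropped, which is acceptable for caches and throttles but would not be for authoritative state.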

Thoughts?

Fri, Jul 27, 2018 at 16:43, Anton Vinogradov :

> Is it possible to replace the current implementation with Google's [1] or
> some other?
> It's a bad idea to have our own CLHM and keep it outdated (this version is
> based on CHM from Java 7!) and broken.
>
> [1]
>
> https://mvnrepository.com/artifact/com.googlecode.concurrentlinkedhashmap/concurrentlinkedhashmap-lru
>
> Fri, Jul 27, 2018 at 16:40, Ilya Kasnacheev :
>
> > Hello!
> >
> > Most of our code uses CLHM as a container with a fixed size. I can surely
> > fix LogThrottle, but my main concern is the H2 indexing code, which uses
> > the same CLHM with a cap.
> >
> > Regards,
> >
> > --
> > Ilya Kasnacheev
> >
> > 2018-07-27 16:38 GMT+03:00 Maxim Muzafarov :
> >
> > > Ilya,
> > >
> > > > As for me, the main cause of this problem is not in CLHM itself but
> > > > in the fact that we are using it in GridLogThrottle as a container
> > > > with a fixed size. I suppose that at the moment we have no
> > > > alternative and should start thinking about further steps.
> > > >
> > > > Can you create a clear reproducer for the CLHM issue described above?
> > >
> > > > As a workaround for GridLogThrottle we can change clear() like this:
> > > >
> > > > public static void clear() {
> > > >     errors.forEach((k, v) -> errors.remove(k));
> > > > }
> > > >
> > > > Will it help you to fix the test?
> > >
> > > Thoughts?
> > >
> > > On Wed, 25 Jul 2018 at 19:57 Ilya Kasnacheev <
> ilya.kasnach...@gmail.com>
> > > wrote:
> > >
> > > > Hello!
> > > >
> > > > > LT stops throttling input as soon as it is unable to add any entries
> > > > > to the underlying map, since the map would consider itself full with
> > > > > 0 entries.
> > > > >
> > > > > I have tried to implement both suggestions, and they both have a big
> > > > > impact on CLHM code. I am super uncomfortable with making non-trivial
> > > > > edits to it.
> > > > >
> > > > > If the first approach is chosen, there's a need to iterate over
> > > > > values instead of clearing the table, and if the second approach is
> > > > > chosen, locking becomes non-trivial since we're grabbing segment
> > > > > locks outside of segment code.
> > > > >
> > > > > Changing LongAdder to AtomicLong also has potential implications
> > > > > which are not readily understood.
> > > > >
> > > > > Note that entryQ is not cleared in clear() either, which can cause
> > > > > further problems. My suggestion is still to either disallow clear()
> > > > > or throw this class away in its entirety.
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > Ilya Kasnacheev
> > > >
> > > > 2018-07-25 12:00 GMT+03:00 Maxim Muzafarov :
> > > >
> > > > > Hello Ilya,
> > > > >
> > > > > > Can you add more info about why and how LT prints the log message
> > > > > > twice in this test case?
> > > > > >
> > > > > > From my point of view, making the clear() method throw
> > > > > > UnsupportedOperationException is not the right fix for flaky test
> > > > > > issues. A brief search through CLHM led me to the thought that we
> > > > > > just forgot to decrease the LongAdder size when iterating over the
> > > > > > HashEntry array. We increment and decrement this counter on
> > > > > > put/remove operations, but why not in the clear method? Am I right?
> > > > > > So, replacing LongAdder with AtomicLong sounds good to me too, as
> > > > > > was suggested by Ilya Lantukh. But I may be mistaken here.
> > > > > >
> > > > > > In general, I think it's a good occasion to start thinking about how
> > > > > > to get rid of CLHM. E.g. we can consider this implementation [1].
> > > > >
> > > > > [1] https://github.com/ben-manes/concurrentlinkedhashmap
> > > > >
> > > > > > Tue, Jul 24, 2018 at 16:45, Stanislav Lukyanov <
> > > stanlukya...@gmail.com
> > > > >:
> > > > >
> > > > > > > It seems that we currently use the CLHM primarily as a FIFO cache.
> > > > > > > I see two ways around that.
> > > > > > >
> > > > > > > First, we could implement such an LRU/FIFO cache based on another,
> > > > > > > properly supported data structure from j.u.c.
> > > > > > > ConcurrentSkipListMap seems OK. I have a draft implementation of
> > > > > > > LruEvictionPolicy based on it that passes functional tests,
> > > > > > > but I haven’t benchmarked it yet.
> > > > > > >
> > > > > > > Second, Guava has a Cache API with a lot of configuration options
> > > > > > > that we could use (license is Apache, should be OK).
> > > > > >
> > > > > > Stan
> > > > > >
> > > > > > From: Nikolay Izhikov
> > > > > > > Sent: July 24, 2018, 16:27
> 

Re: Data regions on client nodes

2018-08-08 Thread Alexey Goncharuk
I do not mind making this change, but note that the reason for non-lazy
region allocation is the need to gracefully handle OOME errors during cache
start. When a region is pre-allocated, no OOME can happen.

If a region is allocated dynamically, then all errors that previously
happened during node start should be properly handled (a client node must be
stopped, but cache creation should proceed).

Fri, Jul 27, 2018 at 20:04, Valentin Kulichenko <
valentin.kuliche...@gmail.com>:

> Ticket created: https://issues.apache.org/jira/browse/IGNITE-9113
>
> -Val
>
> On Fri, Jul 27, 2018 at 5:59 AM Dmitry Pavlov 
> wrote:
>
> > Maxim, thank you.
> >
> > If it seems technically possible, we can file a ticket for this change.
> >
> > I find this proposal reasonable; the change makes perfect sense to me.
> >
> > We can wait for Alex G.'s feedback on this change before starting the
> > actual implementation. It can take a while, because he is travelling now.
> >
> > Fri, Jul 27, 2018 at 14:35, Maxim Muzafarov :
> >
> > > Guys,
> > >
> > > > I may be missing some details, but at first glance we have everything
> > > > we need to defer region memory allocation if the region has no cache
> > > > groups assigned. And it doesn't matter whether it happens on client
> > > > or server nodes.
> > > >
> > > > Currently, region memory allocation happens in the exchange future
> > > > init method. At node startup, the method initCachesOnLocalJoin
> > > > executes. This method is responsible for memory allocation (through
> > > > initiating cache managers) and it also starts caches. So, at this
> > > > point we have all existing cache descriptors and can find out which
> > > > cache matches which region, and defer the initialization of some
> > > > regions to the moment when a new cache is assigned to the region
> > > > (which happens in onCacheChangeRequest).
> > > >
> > > > Please correct me if I'm wrong or missing something.
> > >
> > >
> > >
> > > On Wed, 25 Jul 2018 at 19:32 Dmitry Pavlov 
> > wrote:
> > >
> > > > Hi Maxim,
> > > >
> > > > > thank you for stepping in. What do you think, is it possible to check
> > > > > cache assignment to a region at the memory allocation stage?
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > > Wed, Jul 25, 2018 at 18:22, Maxim Muzafarov :
> > > >
> > > > > Folks,
> > > > >
> > > > > > I've checked memory allocation. It looks like we allocate memory
> > > > > > only when the first exchange future init on local join occurs on
> > > > > > the node. Also, it seems that we allocate only the first chunk of
> > > > > > memory (not all of it), and its size is calculated as:
> > > > > >
> > > > > > Math.max((maxSize - startSize) / (SEG_CNT - 1), 256L * 1024 * 1024)
> > > > > >
> > > > > > But I agree with Val. It's better to allocate memory only when
> > > > > > the first cache is assigned to this region.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Also, it seems that we have a problem with notifying the user about
> > > > > > available physical resources. For client nodes the method
> > > > > > requiredOffheap() always returns zero [1]. That's why the WARN
> > > > > > message shown here [2] would not be quite right if we have a lot of
> > > > > > client nodes in the cluster.
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/managers/discovery/GridDiscoveryManager.java#L1501
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/IgniteKernal.java#L1489
> > > > >
> > > > > > Sat, Jul 21, 2018 at 14:15, Dmitriy Setrakyan <
> > dsetrak...@apache.org
> > > >:
> > > > >
> > > > > > On Sat, Jul 21, 2018 at 5:22 AM, Valentin Kulichenko <
> > > > > > valentin.kuliche...@gmail.com> wrote:
> > > > > >
> > > > > > > > Actually, I would go even further: only allocate a data region
> > > > > > > > on a node when the first cache assigned to this region is
> > > > > > > > deployed on that node, because the issue is broader than client
> > > > > > > > nodes and local caches. One can have server nodes without any
> > > > > > > > caches as well, running only services, for example.
> > > > > > >
> > > > > >
> > > > > > > It would be great if this was possible, but to my knowledge,
> > > > > > > regions need to be allocated on startup.
> > > > > >
> > > > > > Alexey Goncharuk, do you have any suggestions on this?
> > > > > >
> > > > > > D.
> > > > > >
> > > > > --
> > > > > --
> > > > > Maxim Muzafarov
> > > > >
> > > >
> > > --
> > > --
> > > Maxim Muzafarov
> > >
> >
>


Re: [MTCGA]: new failures in builds [1532044] needs to be handled

2018-07-24 Thread Alexey Goncharuk
Thanks, merged to master.

Tue, Jul 24, 2018 at 16:28, Anton Vinogradov :

> It seems to work.
> Cancelled the hung suites; they hang because of problems already fixed in
> master.
>
> Tue, Jul 24, 2018 at 15:20, Anton Vinogradov :
>
> > Started more [1] Data Structures tasks to make sure fix works.
> >
> > [1]
> >
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_DataStructures_IgniteTests24Java8=pull%2F4415%2Fhead=buildTypeStatusDiv
> >
> > Tue, Jul 24, 2018 at 14:59, Alexey Goncharuk <
> alexey.goncha...@gmail.com
> > >:
> >
> >> I will merge the fix once the TC passes for the new PR.
> >>
> >> Tue, Jul 24, 2018 at 13:13, Andrey Mashenkov <
> andrey.mashen...@gmail.com
> >> >:
> >>
> >> > Hi Dmitry,
> >> >
> >> > Looks like the Data Structures suite timeouts are caused by the
> >> > IGNITE-8892 fix [1].
> >> > The issue here is that IgniteSetImpl uses the internal CacheQueryFuture
> >> > in the wrong way.
> >> >
> >> > I've already run the PR branch with a fix on TC and it looks fine [2].
> >> >
> >> > [1] issues.apache.org/jira/browse/IGNITE-8892
> >> > [2]
> >> >
> >> >
> >>
> https://ci.ignite.apache.org/viewLog.html?buildId=1534153=queuedBuildOverviewTab
> >> >
> >> > On Tue, Jul 24, 2018 at 12:53 PM Dmitry Pavlov  >
> >> > wrote:
> >> >
> >> > > Hi Igniters,
> >> > >
> >> > > Test caused suite timeout 3 times in a row
> >> > > ⚠ IgniteDataStructureWithJobTest.testJobWithRestart (last started)
> >> > > ⚠ IgniteDataStructureWithJobTest.testJobWithRestart (last started)
> >> > > ⚠ IgniteDataStructureWithJobTest.testJobWithRestart (last started)
> >> > > History -
> >> > >
> >> > >
> >> >
> >>
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_DataStructures=%3Cdefault%3E=buildTypeStatusDiv
> >> > >
> >> > > Could it be related to IGNITE-8783 change?
> >> > >
> >> > > Test seems to be waiting for all job responses from worker nodes
> >> before
> >> > > stopping grid.
> >> > >
> >> > > Anton, please step in.
> >> > >
> >> > > Sincerely,
> >> > > Dmitriy Pavlov
> >> > >
> >> > > Tue, Jul 24, 2018 at 3:17, :
> >> > >
> >> > > > Hi Ignite Developer,
> >> > > >
> >> > > > I am MTCGA.Bot, and I've detected some issue on TeamCity to be
> >> > addressed.
> >> > > > I hope you can help.
> >> > > >
> >> > > >  *New Critical Failure in master Data Structures
> >> > > >
> >> > >
> >> >
> >>
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_DataStructures=%3Cdefault%3E=buildTypeStatusDiv
> >> > > >  Changes that may have led to the failure were made by
> >> > > >  - somefireone
> >> > > >
> >> > >
> >> >
> >>
> http://ci.ignite.apache.org/viewModification.html?modId=826273=false
> >> > > >  - vinokurov.pasha
> >> > > >
> >> > >
> >> >
> >>
> http://ci.ignite.apache.org/viewModification.html?modId=826253=false
> >> > > >  - dmitriy.govorukhin
> >> > > >
> >> > >
> >> >
> >>
> http://ci.ignite.apache.org/viewModification.html?modId=826250=false
> >> > > >  - kaa.dev
> >> > > >
> >> > >
> >> >
> >>
> http://ci.ignite.apache.org/viewModification.html?modId=826246=false
> >> > > >  - vanen31
> >> > > >
> >> > >
> >> >
> >>
> http://ci.ignite.apache.org/viewModification.html?modId=826242=false
> >> > > >  - garus.d.g
> >> > > >
> >> > >
> >> >
> >>
> http://ci.ignite.apache.org/viewModification.html?modId=826234=false
> >> > > >  - ivandasch
> >> > > >
> >> > >
> >> >
> >>
> http://ci.ignite.apache.org/viewModification.html?modId=826229=false
> >> > > >  - av
> >> > > >
> >> > >
> >> >
> >>
> http://ci.ignite.apache.org/viewModification.html?modId=826218=false
> >> > > >  - estanilovskiy
> >> > > >
> >> > >
> >> >
> >>
> http://ci.ignite.apache.org/viewModification.html?modId=826197=false
> >> > > >  - dmitriy.govorukhin
> >> > > >
> >> > >
> >> >
> >>
> http://ci.ignite.apache.org/viewModification.html?modId=826195=false
> >> > > >
> >> > > > - If your changes can lead to this failure(s), please create an
> >> > > > issue with the label MakeTeamCityGreenAgain and assign it to yourself.
> >> > > > -- If you have a fix, please set the ticket to PA state and write
> >> > > > to the dev list that the fix is ready
> >> > > > -- If the fix will require some time, please mute the test and set
> >> > > > the label Muted_Test on the issue
> >> > > > - If you know which change caused the failure, please contact the
> >> > > > change author directly
> >> > > > - If you don't know which change caused the failure, please send a
> >> > > > message to the dev list to find out
> >> > > > Should you have any questions, please contact dpav...@apache.org or
> >> > > > write to dev.list
> >> > > > Best Regards,
> >> > > > MTCGA.Bot
> >> > > > Notification generated at Tue Jul 24 03:17:09 MSK 2018
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Best regards,
> >> > Andrey V. Mashenkov
> >> >
> >>
> >
>


Re: [MTCGA]: new failures in builds [1532044] needs to be handled

2018-07-24 Thread Alexey Goncharuk
I will merge the fix once the TC passes for the new PR.

Tue, Jul 24, 2018 at 13:13, Andrey Mashenkov :

> Hi Dmitry,
>
> Looks like the Data Structures suite timeouts are caused by the IGNITE-8892
> fix [1].
> The issue here is that IgniteSetImpl uses the internal CacheQueryFuture in
> the wrong way.
>
> I've already run PR branch with a fix on TC and it looks fine [2].
>
> [1] issues.apache.org/jira/browse/IGNITE-8892
> [2]
>
> https://ci.ignite.apache.org/viewLog.html?buildId=1534153=queuedBuildOverviewTab
>
> On Tue, Jul 24, 2018 at 12:53 PM Dmitry Pavlov 
> wrote:
>
> > Hi Igniters,
> >
> > Test caused suite timeout 3 times in a row
> > ⚠ IgniteDataStructureWithJobTest.testJobWithRestart (last started)
> > ⚠ IgniteDataStructureWithJobTest.testJobWithRestart (last started)
> > ⚠ IgniteDataStructureWithJobTest.testJobWithRestart (last started)
> > History -
> >
> >
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_DataStructures=%3Cdefault%3E=buildTypeStatusDiv
> >
> > Could it be related to IGNITE-8783 change?
> >
> > Test seems to be waiting for all job responses from worker nodes before
> > stopping grid.
> >
> > Anton, please step in.
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > Tue, Jul 24, 2018 at 3:17, :
> >
> > > Hi Ignite Developer,
> > >
> > > I am MTCGA.Bot, and I've detected some issue on TeamCity to be
> addressed.
> > > I hope you can help.
> > >
> > >  *New Critical Failure in master Data Structures
> > >
> >
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_DataStructures=%3Cdefault%3E=buildTypeStatusDiv
> > >  Changes that may have led to the failure were made by
> > >  - somefireone
> > >
> >
> http://ci.ignite.apache.org/viewModification.html?modId=826273=false
> > >  - vinokurov.pasha
> > >
> >
> http://ci.ignite.apache.org/viewModification.html?modId=826253=false
> > >  - dmitriy.govorukhin
> > >
> >
> http://ci.ignite.apache.org/viewModification.html?modId=826250=false
> > >  - kaa.dev
> > >
> >
> http://ci.ignite.apache.org/viewModification.html?modId=826246=false
> > >  - vanen31
> > >
> >
> http://ci.ignite.apache.org/viewModification.html?modId=826242=false
> > >  - garus.d.g
> > >
> >
> http://ci.ignite.apache.org/viewModification.html?modId=826234=false
> > >  - ivandasch
> > >
> >
> http://ci.ignite.apache.org/viewModification.html?modId=826229=false
> > >  - av
> > >
> >
> http://ci.ignite.apache.org/viewModification.html?modId=826218=false
> > >  - estanilovskiy
> > >
> >
> http://ci.ignite.apache.org/viewModification.html?modId=826197=false
> > >  - dmitriy.govorukhin
> > >
> >
> http://ci.ignite.apache.org/viewModification.html?modId=826195=false
> > >
> > > - If your changes can lead to this failure(s), please create an issue
> > > with the label MakeTeamCityGreenAgain and assign it to yourself.
> > > -- If you have a fix, please set the ticket to PA state and write to
> > > the dev list that the fix is ready
> > > -- If the fix will require some time, please mute the test and set the
> > > label Muted_Test on the issue
> > > - If you know which change caused the failure, please contact the
> > > change author directly
> > > - If you don't know which change caused the failure, please send a
> > > message to the dev list to find out
> > > Should you have any questions, please contact dpav...@apache.org or
> > > write to dev.list
> > > Best Regards,
> > > MTCGA.Bot
> > > Notification generated at Tue Jul 24 03:17:09 MSK 2018
> > >
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


[jira] [Created] (IGNITE-9056) BinaryTypeMismatchLoggingTest.testEntryWriteQueryEntities is flaky

2018-07-23 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9056:


 Summary: BinaryTypeMismatchLoggingTest.testEntryWriteQueryEntities is flaky
 Key: IGNITE-9056
 URL: https://issues.apache.org/jira/browse/IGNITE-9056
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk
 Fix For: 2.7


The test fails because the captured message contains more than one occurrence
of the payload value.
The root cause needs to be investigated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Potential OOM while iterating over query cursor. Review needed.

2018-07-18 Thread Alexey Goncharuk
Folks,

There is no need to add a getNext() method because the object we are
discussing is already an iterator. To summarize the solution: we are
going to deprecate the getAll() method and set the keepAll flag to false for
scan queries.

Agree?
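
To illustrate the point, here is a minimal sketch of a cursor that is already iterable, so no extra getNext() is needed. PagedCursor and its pageLoader parameter are hypothetical names made up for this example; this is not Ignite's QueryCursor, only an assumption-laden model of it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Hypothetical paged cursor; a stand-in for a query cursor, not Ignite's class. */
class PagedCursor<T> implements Iterable<T> {
    /** Loads page N; must return an empty list once the pages run out. */
    private final IntFunction<List<T>> pageLoader;

    PagedCursor(IntFunction<List<T>> pageLoader) {
        this.pageLoader = pageLoader;
    }

    /** Streams rows page by page; only the current page is referenced. */
    @Override public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int nextPage = 0;
            private Iterator<T> cur = Collections.emptyIterator();
            private boolean exhausted = false;

            @Override public boolean hasNext() {
                while (!cur.hasNext() && !exhausted) {
                    List<T> page = pageLoader.apply(nextPage++);
                    if (page.isEmpty())
                        exhausted = true;
                    else
                        cur = page.iterator();
                }
                return cur.hasNext();
            }

            @Override public T next() {
                if (!hasNext())
                    throw new NoSuchElementException();
                return cur.next();
            }
        };
    }

    /** Materializes every row; memory grows with the size of the result. */
    @Deprecated
    List<T> getAll() {
        List<T> all = new ArrayList<>();
        for (T row : this)
            all.add(row);
        return all;
    }
}
```

A plain for-each loop over such a cursor covers what getNext() would offer, while getAll() remains only as a deprecated convenience.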

On Mon, Jul 16, 2018 at 23:40, Dmitriy Setrakyan wrote:

> On Mon, Jul 16, 2018 at 5:42 PM, Yakov Zhdanov 
> wrote:
>
> > Dmitry, let's have only getNext() same as jdbc. All other shortcuts seem
> > to overload API without adding much value.
> >
>
> Agree. Do you mind creating a ticket?
>


[jira] [Created] (IGNITE-9029) TcpDiscoverySpiMBeanTest flaky-fails in master

2018-07-18 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9029:


 Summary: TcpDiscoverySpiMBeanTest flaky-fails in master
 Key: IGNITE-9029
 URL: https://issues.apache.org/jira/browse/IGNITE-9029
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk
 Fix For: 2.7


Only test-related issue, the test does not use shared IP finder.





Re: Potential OOM while iterating over query cursor. Review needed.

2018-07-16 Thread Alexey Goncharuk
No objections from my side. Would be nice to receive some feedback from
other community members, though, because this is formally a breaking change.

On Mon, Jul 16, 2018 at 16:40, Yakov Zhdanov wrote:

> Guys, it seems we need to deprecate getAll() and remove it in 3.0. I think
> it is usable only for queries that return 1 row. Every other case needs
> iteration. So having getFirst() seems to be better. Thoughts?
>
> As for ScanQuery, I think we can properly initialize keepAll to false on
> scan query instantiation. I am pretty sure no one needs getAll() in scans.
> Alex?
>
> --
> Yakov
>


Re: Delete ticket - issues.apache.org

2018-07-16 Thread Alexey Goncharuk
Hi,

You can edit the ticket so it does not display any information, but the
content will still be accessible in the history. We can change the ticket
visibility to private, but it will still be accessible to committers. I
suggest you contact the Apache infra team regarding your request.

On Wed, Jun 20, 2018 at 11:15, Puviarasu S wrote:

> Hi Team,
>
> I have created a ticket in https://issues.apache.org/jira. I am in a
> situation to delete the ticket such that it is not accessible over the
> Internet.
>
> Kindly requesting you to help me in deleting the ticket from Apache JIRA.
>
> Thanks in advance!!!
>
> Regards,
> Puviarasu
>


[jira] [Created] (IGNITE-9006) Add per-call tracing capabilities for IgniteCache#{get, getAll} methods

2018-07-16 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-9006:


 Summary: Add per-call tracing capabilities for IgniteCache#{get, getAll} methods
 Key: IGNITE-9006
 URL: https://issues.apache.org/jira/browse/IGNITE-9006
 Project: Ignite
  Issue Type: New Feature
Affects Versions: 2.6
Reporter: Alexey Goncharuk
 Fix For: 2.7


As an experimental feature to debug object visibility issues, we should add an
API method {{IgniteCache#withTrace()}} which will enable per-call tracing of
API calls.

I think we can start with read tracing, as the simplest change. We need to
introduce an additional flag mask in {{NearSingleGetRequest}},
{{NearGetRequest}}, and {{Near...Response}} to identify such requests, and when
processing them, print out the whole message processing path (send, receive),
entry swap/unswap, versions and values read,
{{GridCacheEntryRemovedException}}s encountered, etc. For transactional caches,
I think it also makes sense to print out all pending transactions that touch
the key being read.
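
A rough sketch of the flag-mask idea follows. GetRequestSketch, its flag constants, and onStage() are hypothetical names for illustration only; the actual message classes and their existing flags may look quite different.

```java
/** Hypothetical request carrying a flag bit mask; not one of Ignite's message classes. */
class GetRequestSketch {
    /** Example of a pre-existing flag bit (assumed for illustration). */
    static final byte SKIP_VALS_FLAG = 0x01;

    /** New bit that marks the request as traced. */
    static final byte TRACE_FLAG = 0x02;

    private byte flags;
    private final StringBuilder traceLog = new StringBuilder();

    /** Sets the trace bit, leaving all other flag bits untouched. */
    GetRequestSketch withTrace() {
        flags |= TRACE_FLAG;
        return this;
    }

    boolean traced() {
        return (flags & TRACE_FLAG) != 0;
    }

    /** Records a processing stage (send, receive, ...) only for traced requests. */
    void onStage(String stage) {
        if (traced())
            traceLog.append(stage).append(';');
    }

    String traceLog() {
        return traceLog.toString();
    }
}
```

Untraced requests pay only for a single bit check per stage, which is the main reason to carry the marker in the request itself.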





Re: Potential OOM while iterating over query cursor. Review needed.

2018-07-11 Thread Alexey Goncharuk
Andrey,

Correct me if I am wrong, but my impression was that after the change
cursor#getAll() will return only the last page of the result, not the whole
collection. If public API method semantics is preserved, then no objections
from my side.

On Wed, Jul 11, 2018 at 15:18, Andrey Mashenkov wrote:

> Alexey,
>
> I saw no issues on TC with this change, and the change affects only private
> API. If you think it can break something, then we can mark the keepAll flag
> as deprecated, to be deleted in 3.0, and change the default value to true.
> I doubt this flag is useful for Ignite internals; moreover, a user can
> always wrap the query and implement the same logic if they really need such
> behavior.
> So, I vote for removing the code if it is indeed useless.
>
> On Wed, Jul 11, 2018 at 13:26, Alexey Goncharuk wrote:
>
> > Folks,
> >
> > Bumping up the discussion as it is hitting one of the Ignite users.
> >
> > The change seems reasonable to me, but it is a breaking change and may
> > affect existing users. Would the community be ok if we change the
> > QueryCursor#getAll method for scan queries? If not, we should expose the
> > keepAll() flag to the public API.
> >
> > On Fri, Jun 29, 2018 at 11:37, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
> >
> > > Hi Igniters,
> > >
> > > There is an issue, IGNITE-8892 [1], related to OOM during distributed
> > > query execution.
> > > It is not limited to ScanQuery usage and appears to affect all query
> > > types.
> > >
> > > The use case is quite simple: 1 server and 1 client.
> > > The client starts a scan query with default flags and iterates over the
> > > cursor. If the whole query result does not fit in memory, the JVM
> > > crashes with OOM. This is unexpected, because the client takes entries
> > > one by one and discards them immediately.
> > > A reproducer is attached to the ticket.
> > >
> > > The same query works fine if it is started on the server. It seems we
> > > have no DistributedQueryFuture in that case, so everything works.
> > >
> > > I've found that GridCacheDistributedQueryFuture collects all entries
> > > and tries to return the collection via onDone().
> > > This behaviour is enabled by the 'keepAll' flag, which is true by
> > > default. Iterating over the cache via cache.iterator() has no OOM
> > > issues because we set the keepAll flag to false.
> > >
> > > Why do we have keepAll=true by default, given that no one seems to
> > > expect future.get() to return any data and all queries work through a
> > > queue in paging mode?
> > > Would it be OK to get rid of 'allCol' and the keepAll flag altogether?
> > >
> > > I've made a PR and TC looks fine.
> > > Could someone review it, please?
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-8892
> > >
> > >
> > > --
> > > Best regards,
> > > Andrey V. Mashenkov
> > >
> >
>


[jira] [Created] (IGNITE-8983) .NET long-running suite fails in master

2018-07-11 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-8983:


 Summary: .NET long-running suite fails in master
 Key: IGNITE-8983
 URL: https://issues.apache.org/jira/browse/IGNITE-8983
 Project: Ignite
  Issue Type: Test
Affects Versions: 2.6
Reporter: Alexey Goncharuk
 Fix For: 2.7








Re: Potential OOM while iterating over query cursor. Review needed.

2018-07-11 Thread Alexey Goncharuk
Folks,

Bumping up the discussion as it is hitting one of the Ignite users.

The change seems reasonable to me, but it is a breaking change and may
affect existing users. Would the community be ok if we change the
QueryCursor#getAll method for scan queries? If not, we should expose the
keepAll() flag to the public API.
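
For readers new to the thread, the memory effect of keepAll described in the quoted message can be modelled with a small sketch. QueryFutureSketch, onPage(), pollPage(), and retainedRows() are hypothetical names; this is an illustration under simplifying assumptions, not the code of GridCacheDistributedQueryFuture.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for a distributed query future; not Ignite's implementation. */
class QueryFutureSketch<T> {
    private final boolean keepAll;

    /** Pages received from remote nodes and not yet consumed by the iterator. */
    private final Queue<Collection<T>> pages = new ArrayDeque<>();

    /** Copy of every received row; grows only when keepAll is true. */
    private final List<T> allCol = new ArrayList<>();

    QueryFutureSketch(boolean keepAll) {
        this.keepAll = keepAll;
    }

    /** Called when a page arrives from a remote node. */
    void onPage(Collection<T> page) {
        pages.add(page);
        if (keepAll)
            allCol.addAll(page);
    }

    /** Hands the next page to the iterating client, or null if none is queued. */
    Collection<T> pollPage() {
        return pages.poll();
    }

    /** Number of rows the future still references. */
    int retainedRows() {
        int queued = 0;
        for (Collection<T> p : pages)
            queued += p.size();
        return queued + allCol.size();
    }
}
```

With keepAll=false the future references only the pages that have not been consumed yet; with keepAll=true every row stays reachable until the future itself is released, no matter how quickly the client discards what it reads.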

On Fri, Jun 29, 2018 at 11:37, Andrey Mashenkov wrote:

> Hi Igniters,
>
> There is an issue, IGNITE-8892 [1], related to OOM during distributed query
> execution.
> It is not limited to ScanQuery usage and appears to affect all query types.
>
> The use case is quite simple: 1 server and 1 client.
> The client starts a scan query with default flags and iterates over the
> cursor. If the whole query result does not fit in memory, the JVM crashes
> with OOM. This is unexpected, because the client takes entries one by one
> and discards them immediately.
> A reproducer is attached to the ticket.
>
> The same query works fine if it is started on the server. It seems we have
> no DistributedQueryFuture in that case, so everything works.
>
> I've found that GridCacheDistributedQueryFuture collects all entries and
> tries to return the collection via onDone().
> This behaviour is enabled by the 'keepAll' flag, which is true by default.
> Iterating over the cache via cache.iterator() has no OOM issues because we
> set the keepAll flag to false.
>
> Why do we have keepAll=true by default, given that no one seems to expect
> future.get() to return any data and all queries work through a queue in
> paging mode?
> Would it be OK to get rid of 'allCol' and the keepAll flag altogether?
>
> I've made a PR and TC looks fine.
> Could someone review it, please?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-8892
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>


Re: [VOTE] Apache Ignite 2.6.0 RC1

2018-07-11 Thread Alexey Goncharuk
+1

Checked src archive signature, verified build from source, started a node
with and without persistence.

On Wed, Jul 11, 2018 at 11:24, Dmitriy Setrakyan wrote:

> Got it. I did not see the section up top. No more questions.
>
> On Wed, Jul 11, 2018 at 11:21 AM, Ilya Suntsov 
> wrote:
>
> > Dmitry,
> >
> > I see 2.6 changes above 2.5:
> >
> > Apache Ignite Release Notes
> > > ===
> > > Apache Ignite In-Memory Data Fabric 2.6
> > > ---
> > > Ignite:
> > > * Fixed incorrect calculation of client affinity assignment with
> > > baseline.
> > > * Fixed incorrect calculation of switch segment record in WAL.
> > > * Fixed JVM crush during in-progress eviction and cache group stop.
> > > REST:
> > > * Fixed serialization of BinaryObjects to JSON.
> > >
> >
> >
> > Apache Ignite In-Memory Data Fabric 2.5
> >
> > ---
> >
> >
> > 2018-07-11 9:54 GMT+03:00 Dmitriy Setrakyan :
> >
> > > Andrey, this is the link I have:
> > > https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=blob_plain;f=RELEASE_NOTES.txt;hb=refs/tags/2.6.0-rc1
> > >
> > > This list does not look short. It looks like a copy of 2.5.
> > >
> > > D.
> > >
> > > On Wed, Jul 11, 2018 at 5:48 AM, Andrey Gura  wrote:
> > >
> > > > Sorry,
> > > >
> > > > You could check it using provided links.
> > > >
> > > >
> > > >
> > > > On Wed, Jul 11, 2018 at 5:41, Andrey Gura wrote:
> > > >
> > > > > Dmitry,
> > > > >
> > > > > It is an emergency release, so we have a very short list of changes.
> > > > > You could check it using the provided links.
> > > > >
> > > > >
> > > > > On Wed, Jul 11, 2018 at 3:01, Dmitriy Setrakyan wrote:
> > > > >
> > > > > >> Not sure if this is worth a down vote, but the release notes seem
> > > > > >> like a copy of 2.5 release. I think 2.6 should have its own release
> > > > > >> notes.
> > > > >>
> > > > >> D.
> > > > >>
> > > > > >> On Tue, Jul 10, 2018 at 8:02 PM, Andrey Gura wrote:
> > > > >>
> > > > >> > Igniters,
> > > > >> >
> > > > >> > We've uploaded a 2.6.0 release candidate to
> > > > >> > https://dist.apache.org/repos/dist/dev/ignite/2.6.0-rc1/
> > > > >> >
> > > > >> > Git tag name is 2.6.0-rc1.
> > > > >> >
> > > > >> > This release includes the following changes:
> > > > >> >
> > > > >> > Ignite:
> > > > > >> > * Fixed incorrect calculation of client affinity assignment with
> > > > > >> > baseline.
> > > > > >> > * Fixed incorrect calculation of switch segment record in WAL.
> > > > > >> > * Fixed JVM crush during in-progress eviction and cache group
> > > > > >> > stop.
> > > > >> >
> > > > >> > REST:
> > > > >> > * Fixed serialization of BinaryObjects to JSON.
> > > > >> >
> > > > >> > Complete list of closed issues:
> > > > >> >
> > > > > >> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.6%20AND%20(status%20%3D%20closed%20or%20status%20%3D%20resolved)
> > > > >> >
> > > > >> > DEVNOTES
> > > > > >> > https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=blob_plain;f=DEVNOTES.txt;hb=refs/tags/2.6.0-rc1
> > > > >> >
> > > > >> > RELEASE NOTES
> > > > > >> > https://git-wip-us.apache.org/repos/asf?p=ignite.git;a=blob_plain;f=RELEASE_NOTES.txt;hb=refs/tags/2.6.0-rc1
> > > > >> >
> > > > >> > Please start voting.
> > > > >> >
> > > > >> > +1 - to accept Apache Ignite 2.6.0-RC1
> > > > >> > 0 - don't care either way
> > > > >> > -1 - DO NOT accept Apache Ignite 2.6.0-RC1 (explain why)
> > > > >> >
> > > > >> > This vote will go for 72 hours.
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Ilya Suntsov
> > email: isunt...@gridgain.com
> > *GridGain Systems*
> > www.gridgain.com
> >
>


Re: data extractor

2018-07-02 Thread Alexey Goncharuk
Dmitriy,

A few questions regarding the use cases for the utility:
1) Would I be able to read the extracted data from the dumped file without
the Ignite node's binary/marshaller metadata? In other words, will I be able
to move only the dumped file to another grid, or will I need to move the
metadata as well?
2) Are you planning to add a public API version of this utility as a part
of Ignite? For example, if I am planning to run some statistics on
checkpointed data, will I be able to get some sort of an iterator to
process this data?
3) How will a user choose which caches (cache groups) to process? Will the
user need to provide a cache name or a cache ID (or either of them)? Will the
utility be able to extract the data of a single cache from a cache group?
4) I think the upload part of the utility is missing some input parameters,
for example, what cluster to connect to, what caches to upload to, etc.

On Sat, Jun 30, 2018 at 22:38, Dmitriy Govorukhin <dmitriy.govoruk...@gmail.com> wrote:

> Igniters,
>
> I am working on IGNITE-7644 (export all key-value
> data from a persisted partition).
> It will be a command-line tool for extracting data from an Ignite partition
> file without the need to start a node.
> The main motivation is to have a lifebuoy in case a file is damaged for
> some reason.
>
> I suggest a simple API and two commands for the first implementation:
>
> -c
> --CRC [srcPath] - check CRC for all (or by type) pages in a partition
>
> -e
> --extract [srcPath] [outPath] - dump all surviving data from a partition to
> another file in a raw key/value pair format
> (requires a graceful stop of the node; not necessary after --restore is
> implemented)
>
> For the output file format, see the attachment. This format does not
> contain any index, but it is very simple and
> flexible for future work with raw key/value data.
>
> Future features:
> -u
> --upload - reload raw key/value pairs to node
>
> -s
> --status - check the current node file status: whether binary recovery is
> needed or not (node crashed in the middle of a checkpoint)
>
> -r
> --restore - restore binary consistency (finish the checkpoint; requires the
> WAL file for recovery)
>
> Let's start a discussion, any comments are welcome.
>
>
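
To make the proposed --CRC command more concrete, here is a minimal sketch of a per-page check. The page layout is an assumption made up for this example (a CRC32 of the page body stored in the last 4 bytes of each page); the real partition file format is not described in this thread, and PageCrcCheckSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical page checker; assumes each page ends with a 4-byte CRC32 of its body. */
class PageCrcCheckSketch {
    /** Returns the indexes of pages whose stored CRC does not match the computed one. */
    static List<Integer> corruptedPages(byte[] partition, int pageSize) {
        List<Integer> bad = new ArrayList<>();
        for (int off = 0, idx = 0; off + pageSize <= partition.length; off += pageSize, idx++) {
            CRC32 crc = new CRC32();
            crc.update(partition, off, pageSize - 4);
            int stored = ByteBuffer.wrap(partition, off + pageSize - 4, 4).getInt();
            if ((int) crc.getValue() != stored)
                bad.add(idx);
        }
        return bad;
    }

    /** Writes the CRC trailer of one page; used here only to build sample data. */
    static void seal(byte[] partition, int off, int pageSize) {
        CRC32 crc = new CRC32();
        crc.update(partition, off, pageSize - 4);
        ByteBuffer.wrap(partition, off + pageSize - 4, 4).putInt((int) crc.getValue());
    }
}
```

Reporting page indexes rather than failing on the first mismatch would let the extract step skip damaged pages and still dump the rest.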


Re: Introduce a sample of activation policy when cluster is activated first time

2018-06-27 Thread Alexey Goncharuk
I think a proper fix would be to reuse the logic of the automatic baseline
change that is planned to be implemented, and to activate the cluster
automatically after a certain timeout is reached.

Note that immediate cluster activation on first node start won't work:
when nodes start concurrently and a baseline already exists, a new node may
start first, form a new baseline, and prevent the existing nodes with the old
baseline from joining the cluster.
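
A minimal sketch of such a policy is below, combining the timeout with the node-count condition proposed in the quoted message. ClusterView and ActivationPolicySketch are hypothetical names; a real implementation would back the interface with the Ignite API, and the rule of never activating when a baseline already exists is my assumption for avoiding the race described above.

```java
/** Hypothetical view of a cluster; a real version would delegate to the Ignite API. */
interface ClusterView {
    int serverNodes();
    boolean active();
    boolean hasBaseline();
    void activate();
}

/** Activates a fresh cluster once enough servers joined or a timeout elapsed. */
class ActivationPolicySketch {
    private final int expectedServers;
    private final long timeoutMs;
    private final long startedAtMs;

    ActivationPolicySketch(int expectedServers, long timeoutMs, long startedAtMs) {
        this.expectedServers = expectedServers;
        this.timeoutMs = timeoutMs;
        this.startedAtMs = startedAtMs;
    }

    /** Call periodically or on topology events; returns true if activation was triggered. */
    boolean onTick(ClusterView cluster, long nowMs) {
        // Never override an existing baseline: regular auto-activation handles restarts.
        if (cluster.active() || cluster.hasBaseline())
            return false;

        boolean enoughNodes = cluster.serverNodes() >= expectedServers;
        boolean timedOut = nowMs - startedAtMs >= timeoutMs;

        if (enoughNodes || timedOut) {
            cluster.activate();
            return true;
        }
        return false;
    }
}
```

Passing the clock value in as a parameter keeps the policy deterministic, so it can be exercised without a running cluster.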

On Tue, Jun 26, 2018 at 14:49, Ivan Rakov wrote:

> Guys,
>
> We have auto-activation on restart of persistent cluster when last
> baseline node joins the cluster. On first start we have no baseline
> topology, and thus no auto-activation.
>
> I think it would be useful to add a Java snippet to the Ignite documentation
> that will safely activate the cluster on a certain condition. We already
> did this as an intermediate solution for the baseline autochange issue, see
>
> https://apacheignite.readme.io/docs/baseline-topology#section-triggering-rebalancing-programmatically
> page.
>
> I'll share my version of ActivationWatcher in this thread soon.
>
> Best Regards,
> Ivan Rakov
>
> On 26.06.2018 14:36, Dmitriy Govorukhin wrote:
> > Vladimir,
> >
> > Auto-activation on the first start?
> > Please share an issue link if you have one.
> >
> > On Tue, Jun 26, 2018 at 11:29 AM Vladimir Ozerov 
> > wrote:
> >
> >> Pavel,
> >>
> >> As far as I know we agreed to implement auto activation in one of the
> >> nearest releases. Am I missing something?
> >>
> >> On Tue, Jun 26, 2018 at 0:56, Pavel Kovalenko wrote:
> >>
> >>> Igniters,
> >>>
> >>> Following the recent Ignite meeting in St. Petersburg, I've noticed
> >>> that some of our users get stuck when a cluster has to be
> >>> activated for the first time.
> >>> At the moment we have only manual options to do it (control.sh, Visor,
> >>> etc.), but that's not enough. Manual activation might be fine when users
> >>> have a dedicated cluster in production with a stable environment.
> >>> But the problem becomes harder when users deploy embedded Ignite (with
> >>> persistence) inside other services, or frequently deploy to temporary
> >>> stage / test environments.
> >>> It's inconvenient to invoke control.sh manually after each deploy to a
> >>> clean environment, and hard to write a custom script to do it
> >>> automatically.
> >>> This is clearly a usability problem.
> >>>
> >>> I think we should introduce an example of how to write such a policy
> >>> using the Ignite API, similar to what we did with Baseline Watcher.
> >>>
> >>> I've created a ticket regarding the problem:
> >>> https://issues.apache.org/jira/browse/IGNITE-8844
> >>> I think we should provide an example of one of the simplest and most
> >>> useful policies: activate when the number of server nodes reaches some
> >>> value.
> >>>
> >>> Moreover, I think it would be nice to have some sort of automatic
> >>> cluster management service (external or internal), like Spark Driver or
> >>> Storm Nimbus, which will do such things without user actions.
> >>>
> >>> What do you think?
> >>>
>
>

