[jira] [Commented] (IGNITE-15343) NullPointerException occurs when restarting ignite client application

2021-08-24 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403882#comment-17403882
 ] 

Pavel Vinokurov commented on IGNITE-15343:
--

You could also check that 10.211.80.15:6200 is accessible from the server instances.
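For a quick check, a minimal reachability probe such as the one below could be run 
from each server host (the address, port, and 3-second timeout are examples taken 
from this thread, not from any attached code):

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Attempts a plain TCP connection to the client's discovery address/port and
// reports whether it is accepted within three seconds.
public class PortProbe {
    public static void main(String[] args) {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress("10.211.80.15", 6200), 3_000);
            System.out.println("10.211.80.15:6200 is reachable");
        }
        catch (IOException e) {
            System.out.println("10.211.80.15:6200 is NOT reachable: " + e.getMessage());
        }
    }
}
{code}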

> NullPointerException occurs when restarting ignite client application
> -
>
> Key: IGNITE-15343
> URL: https://issues.apache.org/jira/browse/IGNITE-15343
> Project: Ignite
>  Issue Type: Bug
>Reporter: Franco Po
>Priority: Critical
> Attachments: failed_startup-ignite_info.1st.attempt.log, 
> failed_startup-ignite_info.2nd.attempt.log, 
> server1-ignite_info.1st.attempt.log, server2-ignite_info.1st.attempt.log, 
> successful_startup-ignite_info.log
>
>
> I upgraded one of my API backend applications from Apache Ignite 2.6 to 
> GridGain Community Edition 8.8.5 successfully in live environment a couple of 
> months ago. The entire setup is 2 instances of this ignite client application 
> plus a cluster of 2 ignite server instances. A planned maintenance needed to 
> restart the ignite client application. However, it couldn't be started again 
> due to the sequence of exceptions below (see 
> [^failed_startup-ignite_info.1st.attempt.log] and 
> [^failed_startup-ignite_info.2nd.attempt.log] for the full logs):
>  # java.io.IOException: Failed to get acknowledge for message: 
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage 
> [sndNodeId=null, id=fef7e5e5b71-b588bb65-6fe8-4aff-8f17-4a9e8733369b, 
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
>  # java.net.SocketException: Socket is closed
>  # java.lang.NullPointerException: null
>  # org.apache.ignite.IgniteCheckedException: Node stopped
> I could restart the same ignite client applications running in the hot standby 
> environment where the ignite server contains no active data (see 
> [^successful_startup-ignite_info.log]).
> Is this problem related to GG-17439 and IGNITE-11406? Which GridGain version is 
> equivalent to Ignite 2.10?
> If anyone can provide insight as to how I can resolve this, that would be 
> greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (IGNITE-15343) NullPointerException occurs when restarting ignite client application

2021-08-24 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403877#comment-17403877
 ] 

Pavel Vinokurov edited comment on IGNITE-15343 at 8/24/21, 3:29 PM:



{code:java}
[2021/08/19 20:25:38.398]  INFO [tcp-client-disco-msg-worker-#4-#42] [] - 
Router node: TcpDiscoveryNode [id=c791881c-983f-44c9-a30b-c9b12e9cb7f6, 
consistentId=rhdpg03, addrs=ArrayList [10.211.80.17, 127.0.0.1], 
sockAddrs=HashSet [/127.0.0.1:6200, rhdpg03/10.211.80.17:6200], discPort=6200, 
order=1, intOrder=1, lastExchangeTime=1629375878183, loc=false, 
ver=8.8.5#20210519-sha1:067284c6, isClient=false]
{code}



was (Author: pvinokurov):
The difference 

> NullPointerException occurs when restarting ignite client application
> -
>
> Key: IGNITE-15343
> URL: https://issues.apache.org/jira/browse/IGNITE-15343
> Project: Ignite
>  Issue Type: Bug
>Reporter: Franco Po
>Priority: Critical
> Attachments: failed_startup-ignite_info.1st.attempt.log, 
> failed_startup-ignite_info.2nd.attempt.log, 
> server1-ignite_info.1st.attempt.log, server2-ignite_info.1st.attempt.log, 
> successful_startup-ignite_info.log
>
>
> I upgraded one of my API backend applications from Apache Ignite 2.6 to 
> GridGain Community Edition 8.8.5 successfully in live environment a couple of 
> months ago. The entire setup is 2 instances of this ignite client application 
> plus a cluster of 2 ignite server instances. A planned maintenance needed to 
> restart the ignite client application. However, it couldn't be started again 
> due to the sequence of exceptions below (see 
> [^failed_startup-ignite_info.1st.attempt.log] and 
> [^failed_startup-ignite_info.2nd.attempt.log] for the full logs):
>  # java.io.IOException: Failed to get acknowledge for message: 
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage 
> [sndNodeId=null, id=fef7e5e5b71-b588bb65-6fe8-4aff-8f17-4a9e8733369b, 
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
>  # java.net.SocketException: Socket is closed
>  # java.lang.NullPointerException: null
>  # org.apache.ignite.IgniteCheckedException: Node stopped
> I could restart the same ignite client applications running in the hot standby 
> environment where the ignite server contains no active data (see 
> [^successful_startup-ignite_info.log]).
> Is this problem related to GG-17439 and IGNITE-11406? Which GridGain version is 
> equivalent to Ignite 2.10?
> If anyone can provide insight as to how I can resolve this, that would be 
> greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-15343) NullPointerException occurs when restarting ignite client application

2021-08-24 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403877#comment-17403877
 ] 

Pavel Vinokurov commented on IGNITE-15343:
--

The difference 

> NullPointerException occurs when restarting ignite client application
> -
>
> Key: IGNITE-15343
> URL: https://issues.apache.org/jira/browse/IGNITE-15343
> Project: Ignite
>  Issue Type: Bug
>Reporter: Franco Po
>Priority: Critical
> Attachments: failed_startup-ignite_info.1st.attempt.log, 
> failed_startup-ignite_info.2nd.attempt.log, 
> server1-ignite_info.1st.attempt.log, server2-ignite_info.1st.attempt.log, 
> successful_startup-ignite_info.log
>
>
> I upgraded one of my API backend applications from Apache Ignite 2.6 to 
> GridGain Community Edition 8.8.5 successfully in live environment a couple of 
> months ago. The entire setup is 2 instances of this ignite client application 
> plus a cluster of 2 ignite server instances. A planned maintenance needed to 
> restart the ignite client application. However, it couldn't be started again 
> due to the sequence of exceptions below (see 
> [^failed_startup-ignite_info.1st.attempt.log] and 
> [^failed_startup-ignite_info.2nd.attempt.log] for the full logs):
>  # java.io.IOException: Failed to get acknowledge for message: 
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage 
> [sndNodeId=null, id=fef7e5e5b71-b588bb65-6fe8-4aff-8f17-4a9e8733369b, 
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
>  # java.net.SocketException: Socket is closed
>  # java.lang.NullPointerException: null
>  # org.apache.ignite.IgniteCheckedException: Node stopped
> I could restart the same ignite client applications running in the hot standby 
> environment where the ignite server contains no active data (see 
> [^successful_startup-ignite_info.log]).
> Is this problem related to GG-17439 and IGNITE-11406? Which GridGain version is 
> equivalent to Ignite 2.10?
> If anyone can provide insight as to how I can resolve this, that would be 
> greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (IGNITE-15343) NullPointerException occurs when restarting ignite client application

2021-08-24 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403228#comment-17403228
 ] 

Pavel Vinokurov edited comment on IGNITE-15343 at 8/24/21, 3:16 PM:


{code:java}
[2021/08/19 20:25:38.519]  WARN [main] [] - Local node's value of 
'java.net.preferIPv4Stack' system property differs from remote node's (all 
nodes in topology should have identical value) [locPreferIpV4=true, 
rmtPreferIpV4=null, locId8=b588bb65, rmtId8=7d483a80, 
rmtAddrs=[rhdpg02/0:0:0:0:0:0:0:1%lo, /10.211.80.16, /127.0.0.1], 
rmtNode=ClusterNode [id=7d483a80-4ada-4c10-b2e2-3b85a47b2d26, order=24, 
addr=[0:0:0:0:0:0:0:1%lo, 10.211.80.16, 127.0.0.1], daemon=false]]
{code}

Please add -Djava.net.preferIPv4Stack=true to all nodes, including the servers, 
and set IgniteConfiguration.setLocalHost().
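A minimal configuration sketch of the suggestion above, assuming the Java-based 
configuration API (the host value is a placeholder; every node should bind to its 
own IPv4 address):

{code:java}
// JVM argument on every node, clients and servers alike:
//   -Djava.net.preferIPv4Stack=true
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientStart {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration()
            // Bind discovery/communication explicitly to a single IPv4 address
            // (placeholder value; substitute the address of the given node).
            .setLocalHost("10.211.80.15")
            // Set on the client application nodes only.
            .setClientMode(true);

        Ignite ignite = Ignition.start(cfg);
    }
}
{code}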


was (Author: pvinokurov):

{code:java}
[2021/08/19 20:25:38.519]  WARN [main] [] - Local node's value of 
'java.net.preferIPv4Stack' system property differs from remote node's (all 
nodes in topology should have identical value) [locPreferIpV4=true, 
rmtPreferIpV4=null, locId8=b588bb65, rmtId8=7d483a80, 
rmtAddrs=[rhdpg02/0:0:0:0:0:0:0:1%lo, /10.211.80.16, /127.0.0.1], 
rmtNode=ClusterNode [id=7d483a80-4ada-4c10-b2e2-3b85a47b2d26, order=24, 
addr=[0:0:0:0:0:0:0:1%lo, 10.211.80.16, 127.0.0.1], daemon=false]]
{code}

Please add -Djava.net.preferIPv4Stack=true to the client node and set 
IgniteConfiguration.setLocalHost()

> NullPointerException occurs when restarting ignite client application
> -
>
> Key: IGNITE-15343
> URL: https://issues.apache.org/jira/browse/IGNITE-15343
> Project: Ignite
>  Issue Type: Bug
>Reporter: Franco Po
>Priority: Critical
> Attachments: failed_startup-ignite_info.1st.attempt.log, 
> failed_startup-ignite_info.2nd.attempt.log, 
> server1-ignite_info.1st.attempt.log, server2-ignite_info.1st.attempt.log, 
> successful_startup-ignite_info.log
>
>
> I upgraded one of my API backend applications from Apache Ignite 2.6 to 
> GridGain Community Edition 8.8.5 successfully in live environment a couple of 
> months ago. The entire setup is 2 instances of this ignite client application 
> plus a cluster of 2 ignite server instances. A planned maintenance needed to 
> restart the ignite client application. However, it couldn't be started again 
> due to the sequence of exceptions below (see 
> [^failed_startup-ignite_info.1st.attempt.log] and 
> [^failed_startup-ignite_info.2nd.attempt.log] for the full logs):
>  # java.io.IOException: Failed to get acknowledge for message: 
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage 
> [sndNodeId=null, id=fef7e5e5b71-b588bb65-6fe8-4aff-8f17-4a9e8733369b, 
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
>  # java.net.SocketException: Socket is closed
>  # java.lang.NullPointerException: null
>  # org.apache.ignite.IgniteCheckedException: Node stopped
> I could restart the same ignite client applications running in the hot standby 
> environment where the ignite server contains no active data (see 
> [^successful_startup-ignite_info.log]).
> Is this problem related to GG-17439 and IGNITE-11406? Which GridGain version is 
> equivalent to Ignite 2.10?
> If anyone can provide insight as to how I can resolve this, that would be 
> greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-15343) NullPointerException occurs when restarting ignite client application

2021-08-23 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403228#comment-17403228
 ] 

Pavel Vinokurov commented on IGNITE-15343:
--


{code:java}
[2021/08/19 20:25:38.519]  WARN [main] [] - Local node's value of 
'java.net.preferIPv4Stack' system property differs from remote node's (all 
nodes in topology should have identical value) [locPreferIpV4=true, 
rmtPreferIpV4=null, locId8=b588bb65, rmtId8=7d483a80, 
rmtAddrs=[rhdpg02/0:0:0:0:0:0:0:1%lo, /10.211.80.16, /127.0.0.1], 
rmtNode=ClusterNode [id=7d483a80-4ada-4c10-b2e2-3b85a47b2d26, order=24, 
addr=[0:0:0:0:0:0:0:1%lo, 10.211.80.16, 127.0.0.1], daemon=false]]
{code}

Please add -Djava.net.preferIPv4Stack=true to the client node and set 
IgniteConfiguration.setLocalHost()

> NullPointerException occurs when restarting ignite client application
> -
>
> Key: IGNITE-15343
> URL: https://issues.apache.org/jira/browse/IGNITE-15343
> Project: Ignite
>  Issue Type: Bug
>Reporter: Franco Po
>Priority: Critical
> Attachments: failed_startup-ignite_info.1st.attempt.log, 
> failed_startup-ignite_info.2nd.attempt.log, 
> server1-ignite_info.1st.attempt.log, server2-ignite_info.1st.attempt.log, 
> successful_startup-ignite_info.log
>
>
> I upgraded one of my API backend applications from Apache Ignite 2.6 to 
> GridGain Community Edition 8.8.5 successfully in live environment a couple of 
> months ago. The entire setup is 2 instances of this ignite client application 
> plus a cluster of 2 ignite server instances. A planned maintenance needed to 
> restart the ignite client application. However, it couldn't be started again 
> due to the sequence of exceptions below (see 
> [^failed_startup-ignite_info.1st.attempt.log] and 
> [^failed_startup-ignite_info.2nd.attempt.log] for the full logs):
>  # java.io.IOException: Failed to get acknowledge for message: 
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage 
> [sndNodeId=null, id=fef7e5e5b71-b588bb65-6fe8-4aff-8f17-4a9e8733369b, 
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
>  # java.net.SocketException: Socket is closed
>  # java.lang.NullPointerException: null
>  # org.apache.ignite.IgniteCheckedException: Node stopped
> I could restart the same ignite client applications running in the hot standby 
> environment where the ignite server contains no active data (see 
> [^successful_startup-ignite_info.log]).
> Is this problem related to GG-17439 and IGNITE-11406? Which GridGain version is 
> equivalent to Ignite 2.10?
> If anyone can provide insight as to how I can resolve this, that would be 
> greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (IGNITE-15343) NullPointerException occurs when restarting ignite client application

2021-08-22 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402837#comment-17402837
 ] 

Pavel Vinokurov edited comment on IGNITE-15343 at 8/22/21, 5:22 PM:


[~francopo] It would be helpful if you attached the logs from the server nodes. 
The log messages indicate connection issues, so the server logs could show the 
cause of these issues.


was (Author: pvinokurov):
[~francopo] It would be helpful if you attached the logs from server nodes.

> NullPointerException occurs when restarting ignite client application
> -
>
> Key: IGNITE-15343
> URL: https://issues.apache.org/jira/browse/IGNITE-15343
> Project: Ignite
>  Issue Type: Bug
>Reporter: Franco Po
>Priority: Critical
> Attachments: failed_startup-ignite_info.1st.attempt.log, 
> failed_startup-ignite_info.2nd.attempt.log, successful_startup-ignite_info.log
>
>
> I upgraded one of my API backend applications from Apache Ignite 2.6 to 
> GridGain Community Edition 8.8.5 successfully in live environment a couple of 
> months ago. The entire setup is 2 instances of this ignite client application 
> plus a cluster of 2 ignite server instances. A planned maintenance needed to 
> restart the ignite client application. However, it couldn't be started again 
> due to the sequence of exceptions below (see 
> [^failed_startup-ignite_info.1st.attempt.log] and 
> [^failed_startup-ignite_info.2nd.attempt.log] for the full logs):
>  # java.io.IOException: Failed to get acknowledge for message: 
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage 
> [sndNodeId=null, id=fef7e5e5b71-b588bb65-6fe8-4aff-8f17-4a9e8733369b, 
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
>  # java.net.SocketException: Socket is closed
>  # java.lang.NullPointerException: null
>  # org.apache.ignite.IgniteCheckedException: Node stopped
> I could restart the same ignite client applications running in the hot standby 
> environment where the ignite server contains no active data (see 
> [^successful_startup-ignite_info.log]).
> Is this problem related to GG-17439 and IGNITE-11406? Which GridGain version is 
> equivalent to Ignite 2.10?
> If anyone can provide insight as to how I can resolve this, that would be 
> greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-15343) NullPointerException occurs when restarting ignite client application

2021-08-22 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402837#comment-17402837
 ] 

Pavel Vinokurov commented on IGNITE-15343:
--

[~francopo] It would be helpful if you attached the logs from server nodes.

> NullPointerException occurs when restarting ignite client application
> -
>
> Key: IGNITE-15343
> URL: https://issues.apache.org/jira/browse/IGNITE-15343
> Project: Ignite
>  Issue Type: Bug
>Reporter: Franco Po
>Priority: Critical
> Attachments: failed_startup-ignite_info.1st.attempt.log, 
> failed_startup-ignite_info.2nd.attempt.log, successful_startup-ignite_info.log
>
>
> I upgraded one of my API backend applications from Apache Ignite 2.6 to 
> GridGain Community Edition 8.8.5 successfully in live environment a couple of 
> months ago. The entire setup is 2 instances of this ignite client application 
> plus a cluster of 2 ignite server instances. A planned maintenance needed to 
> restart the ignite client application. However, it couldn't be started again 
> due to the sequence of exceptions below (see 
> [^failed_startup-ignite_info.1st.attempt.log] and 
> [^failed_startup-ignite_info.2nd.attempt.log] for the full logs):
>  # java.io.IOException: Failed to get acknowledge for message: 
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage 
> [sndNodeId=null, id=fef7e5e5b71-b588bb65-6fe8-4aff-8f17-4a9e8733369b, 
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
>  # java.net.SocketException: Socket is closed
>  # java.lang.NullPointerException: null
>  # org.apache.ignite.IgniteCheckedException: Node stopped
> I could restart the same ignite client applications running in the hot standby 
> environment where the ignite server contains no active data (see 
> [^successful_startup-ignite_info.log]).
> Is this problem related to GG-17439 and IGNITE-11406? Which GridGain version is 
> equivalent to Ignite 2.10?
> If anyone can provide insight as to how I can resolve this, that would be 
> greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (IGNITE-14439) NPE when accessing clustername before first exchange finished

2021-08-21 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402637#comment-17402637
 ] 

Pavel Vinokurov edited comment on IGNITE-14439 at 8/21/21, 4:10 PM:


Hi [~francopo], most probably the NPE was caused by another issue, because the 
system cache should already be initialised before 
GridServiceProcessor.onKernalStart is called. Is that NPE repeated every time?


was (Author: pvinokurov):
Hi [~francopo], most probably the NPE was caused by another issue, because the 
system cache should already be initialised before 
GridServiceProcessor.onKernalStart is called. Is that NPE repeated all the time?

> NPE when accessing clustername before first exchange finished
> -
>
> Key: IGNITE-14439
> URL: https://issues.apache.org/jira/browse/IGNITE-14439
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.9
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
> Fix For: 2.11
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [IGNITE-11406|https://issues.apache.org/jira/browse/IGNITE-11406] has not 
> been fixed properly for two reasons. The first one is that 
> _GridCacheProcessor.utilityCache_ could be accessed before the first exchange 
> finished. The second is that it doesn't resolve the original issue, because 
> _GridServiceProcessor.onKernalStop_ is followed by 
> _GridCacheProcessor.onKernalStop_, so the caches should already be initialized. 
> Thus that fix should be reverted.
> Reverting this fix induces the issue of accessing the utility cache while 
> getting the cluster name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-14439) NPE when accessing clustername before first exchange finished

2021-08-21 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402637#comment-17402637
 ] 

Pavel Vinokurov commented on IGNITE-14439:
--

Hi [~francopo], most probably the NPE was caused by another issue, because the 
system cache should already be initialised before 
GridServiceProcessor.onKernalStart is called. Is that NPE repeated all the time?

> NPE when accessing clustername before first exchange finished
> -
>
> Key: IGNITE-14439
> URL: https://issues.apache.org/jira/browse/IGNITE-14439
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.9
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
> Fix For: 2.11
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [IGNITE-11406|https://issues.apache.org/jira/browse/IGNITE-11406] has not 
> been fixed properly for two reasons. The first one is that 
> _GridCacheProcessor.utilityCache_ could be accessed before the first exchange 
> finished. The second is that it doesn't resolve the original issue, because 
> _GridServiceProcessor.onKernalStop_ is followed by 
> _GridCacheProcessor.onKernalStop_, so the caches should already be initialized. 
> Thus that fix should be reverted.
> Reverting this fix induces the issue of accessing the utility cache while 
> getting the cluster name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-13000) Connection.prepareStatement(String,int) always throws UnsupportedException ignoring 'autoGeneratedKeys' parameter

2021-04-02 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov reassigned IGNITE-13000:


Assignee: (was: Pavel Vinokurov)

> Connection.prepareStatement(String,int) always throws UnsupportedException  
> ignoring  'autoGeneratedKeys' parameter
> ---
>
> Key: IGNITE-13000
> URL: https://issues.apache.org/jira/browse/IGNITE-13000
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.8
>Reporter: Pavel Vinokurov
>Priority: Major
>
> Below is the method call that throws the exception.
> {code:java}
> conn.prepareStatement(query, Statement.NO_GENERATED_KEYS)
> {code}
> But it should produce the same result as:
> {code:java}
> conn.prepareStatement(query)
> {code}
> The possible fix:
> {code:java}
> @Override 
> public PreparedStatement prepareStatement(String sql, int autoGeneratedKeys) 
> throws SQLException {
> ensureNotClosed();
> if(autoGeneratedKeys == Statement.RETURN_GENERATED_KEYS)
> throw new SQLFeatureNotSupportedException("Auto generated keys are 
> not supported.");
> return prepareStatement(sql);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-14439) NPE when accessing clustername before first exchange finished

2021-04-02 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov reassigned IGNITE-14439:


Assignee: Pavel Vinokurov

> NPE when accessing clustername before first exchange finished
> -
>
> Key: IGNITE-14439
> URL: https://issues.apache.org/jira/browse/IGNITE-14439
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.9
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [IGNITE-11406|https://issues.apache.org/jira/browse/IGNITE-11406] has not 
> been fixed properly for two reasons. The first one is that 
> _GridCacheProcessor.utilityCache_ could be accessed before the first exchange 
> finished. The second is that it doesn't resolve the original issue, because 
> _GridServiceProcessor.onKernalStop_ is followed by 
> _GridCacheProcessor.onKernalStop_, so the caches should already be initialized. 
> Thus that fix should be reverted.
> Reverting this fix induces the issue of accessing the utility cache while 
> getting the cluster name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-14439) NPE when accessing clustername before first exchange finished

2021-03-31 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312174#comment-17312174
 ] 

Pavel Vinokurov commented on IGNITE-14439:
--

[~ilyak] Please review

> NPE when accessing clustername before first exchange finished
> -
>
> Key: IGNITE-14439
> URL: https://issues.apache.org/jira/browse/IGNITE-14439
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.9
>Reporter: Pavel Vinokurov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [IGNITE-11406|https://issues.apache.org/jira/browse/IGNITE-11406] has not 
> been fixed properly for two reasons. The first one is that 
> _GridCacheProcessor.utilityCache_ could be accessed before the first exchange 
> finished. The second is that it doesn't resolve the original issue, because 
> _GridServiceProcessor.onKernalStop_ is followed by 
> _GridCacheProcessor.onKernalStop_, so the caches should already be initialized. 
> Thus that fix should be reverted.
> Reverting this fix induces the issue of accessing the utility cache while 
> getting the cluster name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-14443) Calcite integration. SqlFirstLastValueAggFunction support

2021-03-30 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-14443:
-
Priority: Major  (was: Minor)

> Calcite integration. SqlFirstLastValueAggFunction support
> -
>
> Key: IGNITE-14443
> URL: https://issues.apache.org/jira/browse/IGNITE-14443
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Affects Versions: 3.0.0-alpha1
>Reporter: Pavel Vinokurov
>Priority: Major
>
> We need to support aggregation functions, especially 
> SqlFirstLastValueAggFunction, which allows simplifying and optimizing a wide 
> range of SQL queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14443) Calcite integration. SqlFirstLastValueAggFunction support

2021-03-30 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-14443:


 Summary: Calcite integration. SqlFirstLastValueAggFunction support
 Key: IGNITE-14443
 URL: https://issues.apache.org/jira/browse/IGNITE-14443
 Project: Ignite
  Issue Type: New Feature
  Components: sql
Affects Versions: 3.0.0-alpha1
Reporter: Pavel Vinokurov


We need to support aggregation functions, especially 
SqlFirstLastValueAggFunction, which allows simplifying and optimizing a wide 
range of SQL queries.
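A hypothetical illustration of the kind of query this would enable (the table and 
column names are invented for the example; the function is not yet supported by 
the Calcite-based engine this ticket targets):

{code:java}
// FIRST_VALUE/LAST_VALUE are backed by Calcite's SqlFirstLastValueAggFunction.
String sql =
    "SELECT deptId, name, salary, " +
    "       FIRST_VALUE(name) OVER (PARTITION BY deptId ORDER BY salary DESC) AS topEarner " +
    "FROM Person";
{code}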



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14439) NPE when accessing clustername before first exchange finished

2021-03-29 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-14439:


 Summary: NPE when accessing clustername before first exchange 
finished
 Key: IGNITE-14439
 URL: https://issues.apache.org/jira/browse/IGNITE-14439
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.9
Reporter: Pavel Vinokurov


[IGNITE-11406|https://issues.apache.org/jira/browse/IGNITE-11406] has not been 
fixed properly for two reasons. The first one is that 
_GridCacheProcessor.utilityCache_ could be accessed before the first exchange 
finished. The second is that it doesn't resolve the original issue, because 
_GridServiceProcessor.onKernalStop_ is followed by 
_GridCacheProcessor.onKernalStop_, so the caches should already be initialized. 
Thus that fix should be reverted.

Reverting this fix induces the issue of accessing the utility cache while 
getting the cluster name.






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14263) Failure handler is triggered by NPE on unstable topology

2021-03-01 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-14263:


 Summary: Failure handler is triggered by NPE on unstable topology
 Key: IGNITE-14263
 URL: https://issues.apache.org/jira/browse/IGNITE-14263
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.9.1
Reporter: Pavel Vinokurov
 Attachments: Reproducer.java

Restarting servers and clients produced the following exception:

{code:java}
SEVERE: Critical system error detected. Will be handled accordingly to 
configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class 
o.a.i.IgniteCheckedException: null]]
class org.apache.ignite.IgniteCheckedException: null
at 
org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7759)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:268)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:217)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:168)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3431)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3222)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.ownOrphans(GridDhtPartitionTopologyImpl.java:2075)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.onExchangeDone(GridDhtPartitionTopologyImpl.java:2059)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2535)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:159)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:475)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$8.run(GridDhtPartitionsExchangeFuture.java:5119)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initDone(GridDhtPartitionsExchangeFuture.java:5002)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.clientOnlyExchange(GridDhtPartitionsExchangeFuture.java:1592)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:1052)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3403)
... 3 more
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-14256) SQL delete statement ignores skipOnReduce and local flags for replicated caches

2021-02-28 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-14256:
-
Attachment: DeleteTest.java

> SQL delete statement ignores skipOnReduce and local flags for replicated 
> caches
> ---
>
> Key: IGNITE-14256
> URL: https://issues.apache.org/jira/browse/IGNITE-14256
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.9.1
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: DeleteTest.java
>
>
> The delete statement removes data from all nodes, ignoring the enabled lazy 
> and skipOnReduce flags.
> The reproducer is attached.
> Below is the stack trace:
> {code:java}
> "sys-stripe-4-#68%5f5ea90d-6614-448f-9df7-0d770f0b216d%" #111 prio=5 
> os_prio=0 tid=0x7fbac1d59000 nid=0x7329 runnable [0x7fba8c7ed000]
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
>   at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
>   at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
>   - locked <0x0005cc4475f8> (a java.lang.Object)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.offer(GridNioServer.java:1988)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer.send0(GridNioServer.java:652)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer.send(GridNioServer.java:620)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionWrite(GridNioServer.java:3704)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:120)
>   at 
> org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onSessionWrite(GridConnectionBytesVerifyFilter.java:80)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:120)
>   at 
> org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionWrite(GridNioCodecFilter.java:90)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:120)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionWrite(GridNioFilterChain.java:268)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionWrite(GridNioFilterChain.java:191)
>   at 
> org.apache.ignite.internal.util.nio.GridNioSessionImpl.sendNoFuture(GridNioSessionImpl.java:129)
>   at 
> org.apache.ignite.internal.util.nio.GridTcpNioCommunicationClient.sendMessage(GridTcpNioCommunicationClient.java:115)
>   at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1182)
>   at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1124)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1809)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1923)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1296)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.sendDhtRequests(GridDhtAtomicAbstractUpdateFuture.java:489)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.map(GridDhtAtomicAbstractUpdateFuture.java:445)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1926)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1679)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3190)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:151)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:286)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:281)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
>   at 
> 

[jira] [Created] (IGNITE-14256) SQL delete statement ignores skipOnReduce and local flags for replicated caches

2021-02-28 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-14256:


 Summary: SQL delete statement ignores skipOnReduce and local flags 
for replicated caches
 Key: IGNITE-14256
 URL: https://issues.apache.org/jira/browse/IGNITE-14256
 Project: Ignite
  Issue Type: Bug
  Components: sql
Affects Versions: 2.9.1
Reporter: Pavel Vinokurov


The delete statement removes data from all nodes, ignoring the enabled lazy and 
skipOnReduce flags.
The reproducer is attached.
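For reference, a rough sketch of the scenario (the table name, address, and the 
use of the thin JDBC driver's lazy/skipReducerOnUpdate connection properties are 
assumptions for illustration, not taken from the attached reproducer):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DeleteScenario {
    public static void main(String[] args) throws Exception {
        // Thin JDBC driver with lazy execution and reducer skipping enabled.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:ignite:thin://127.0.0.1?lazy=true&skipReducerOnUpdate=true");
             Statement stmt = conn.createStatement()) {
            // Reported behaviour: the delete is applied on all nodes of the
            // replicated cache despite the flags above.
            stmt.executeUpdate("DELETE FROM Person WHERE id > 100");
        }
    }
}
{code}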

Below is the stack trace:
{code:java}
"sys-stripe-4-#68%5f5ea90d-6614-448f-9df7-0d770f0b216d%" #111 prio=5 os_prio=0 
tid=0x7fbac1d59000 nid=0x7329 runnable [0x7fba8c7ed000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
- locked <0x0005cc4475f8> (a java.lang.Object)
at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.offer(GridNioServer.java:1988)
at 
org.apache.ignite.internal.util.nio.GridNioServer.send0(GridNioServer.java:652)
at 
org.apache.ignite.internal.util.nio.GridNioServer.send(GridNioServer.java:620)
at 
org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionWrite(GridNioServer.java:3704)
at 
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:120)
at 
org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onSessionWrite(GridConnectionBytesVerifyFilter.java:80)
at 
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:120)
at 
org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionWrite(GridNioCodecFilter.java:90)
at 
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:120)
at 
org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionWrite(GridNioFilterChain.java:268)
at 
org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionWrite(GridNioFilterChain.java:191)
at 
org.apache.ignite.internal.util.nio.GridNioSessionImpl.sendNoFuture(GridNioSessionImpl.java:129)
at 
org.apache.ignite.internal.util.nio.GridTcpNioCommunicationClient.sendMessage(GridTcpNioCommunicationClient.java:115)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1182)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1124)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1809)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1923)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1296)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.sendDhtRequests(GridDhtAtomicAbstractUpdateFuture.java:489)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.map(GridDhtAtomicAbstractUpdateFuture.java:445)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1926)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1679)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3190)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:151)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:286)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:281)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
at 

[jira] [Created] (IGNITE-13989) Destroy of persisted cache doesn't remove cache folder

2021-01-13 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-13989:


 Summary: Destroy of persisted cache doesn't remove cache folder
 Key: IGNITE-13989
 URL: https://issues.apache.org/jira/browse/IGNITE-13989
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.9.1
Reporter: Pavel Vinokurov


IgniteCache#destroy doesn't remove the cache folder from the persistent storage.
Repeatedly creating and destroying dynamic caches could clutter the PDS and 
eventually run into system limits.
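A minimal reproducer sketch (the cache name and the expectation about the on-disk 
folder are illustrative assumptions):

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DestroyLeavesFolder {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(
                    new DataRegionConfiguration().setPersistenceEnabled(true)));

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cluster().state(ClusterState.ACTIVE);

            IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache("dynamicCache");
            cache.put(1, 1);

            // Expected: the cache directory under the node's PDS folder is removed.
            // Observed in this issue: the directory is left behind.
            cache.destroy();
        }
    }
}
{code}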




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13960) Starvation in mgmt pool caused by MetadataTask execution

2021-01-12 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1726#comment-1726
 ] 

Pavel Vinokurov commented on IGNITE-13960:
--

[~tledkov] Please review

> Starvation in mgmt pool caused by  MetadataTask execution
> -
>
> Key: IGNITE-13960
> URL: https://issues.apache.org/jira/browse/IGNITE-13960
> Project: Ignite
>  Issue Type: Bug
>  Components: compute
>Affects Versions: 2.9.1
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> *Issue:*
> Requesting cache metadata from multiple threads causes starvation in the mgmt 
> pool.
> *Root Cause:*
> From the mgmt pool, GridCacheCommandHandler.MetadataJob calls 
> GridCacheQueryManager#sqlMetadata() and 
> GridClosureProcessor#callAsyncNoFailover().get(), which executes and waits for 
> another internal task. The job response of this task must also be handled from 
> the mgmt pool, which causes starvation.
> *Proposed Fix:*
> Make GridCacheQueryManager#sqlMetadata() asynchronous and apply a continuation 
> in GridCacheCommandHandler.MetadataJob to release the mgmt thread while the 
> future returned by sqlMetadata() completes.
> Attached are stack traces of the hanging threads:
> {code:java}
> "mgmt-#10633" #14311 prio=5 os_prio=0 tid=0x560c79117000 nid=0x134c6 
> waiting on condition [0x7f15baa77000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>   at 
> org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.sqlMetadata(GridCacheQueryManager.java:1803)
>   at 
> org.apache.ignite.internal.processors.rest.handlers.cache.GridCacheCommandHandler$MetadataJob.execute(GridCacheCommandHandler.java:1123)
>   at 
> org.apache.ignite.internal.processors.rest.handlers.cache.GridCacheCommandHandler$MetadataJob.execute(GridCacheCommandHandler.java:1088)
>   at 
> org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:567)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7069)
>   at 
> org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:561)
>   at 
> org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:490)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>   at 
> org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1270)
>   at 
> org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:2088)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1635)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1255)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:144)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1144)
>   at 
> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> "mgmt-#81" #270 prio=5 os_prio=0 tid=0x562323c3c800 nid=0x592 waiting on 
> condition [0x7fba5f378000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>   at 
> org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$ClientChangeGlobalStateComputeRequest.run(GridClusterStateProcessor.java:1979)
>   at 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4.execute(GridClosureProcessor.java:1943)
>   at 
> org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:567)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7069)
>   at 
> 

[jira] [Created] (IGNITE-13960) Starvation in mgmt pool caused by MetadataTask execution

2021-01-05 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-13960:


 Summary: Starvation in mgmt pool caused by  MetadataTask execution
 Key: IGNITE-13960
 URL: https://issues.apache.org/jira/browse/IGNITE-13960
 Project: Ignite
  Issue Type: Bug
  Components: compute
Affects Versions: 2.9.1
Reporter: Pavel Vinokurov
Assignee: Pavel Vinokurov


*Issue:*

Requesting cache metadata from multiple threads causes starvation in the mgmt 
pool.

*Root Cause:*

From the mgmt pool, GridCacheCommandHandler.MetadataJob calls 
GridCacheQueryManager#sqlMetadata() and 
GridClosureProcessor#callAsyncNoFailover().get(), which executes and waits for 
another internal task. The job response of this task must also be handled from 
the mgmt pool, which causes starvation.

*Proposed Fix:*

Make GridCacheQueryManager#sqlMetadata() asynchronous and apply a continuation 
in GridCacheCommandHandler.MetadataJob to release the mgmt thread while the 
future returned by sqlMetadata() completes.
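A rough sketch of the continuation pattern using the public compute API 
(startAsync() stands in for a hypothetical asynchronous variant of sqlMetadata(); 
the actual names in the fix may differ):

{code:java}
import org.apache.ignite.IgniteException;
import org.apache.ignite.compute.ComputeJobAdapter;
import org.apache.ignite.compute.ComputeJobContext;
import org.apache.ignite.lang.IgniteFuture;
import org.apache.ignite.resources.JobContextResource;

/**
 * Continuation pattern: the job parks itself while an asynchronous future
 * completes, so the pool thread is released instead of blocking on get().
 */
public abstract class ContinuationJob<T> extends ComputeJobAdapter {
    @JobContextResource
    private ComputeJobContext jobCtx;

    /** Future started on the first invocation. */
    private IgniteFuture<T> fut;

    /** Starts the asynchronous operation (e.g. an async metadata request). */
    protected abstract IgniteFuture<T> startAsync();

    @Override public Object execute() throws IgniteException {
        if (fut == null) {
            fut = startAsync();

            jobCtx.holdcc();                  // Park the job and free the pool thread.

            fut.listen(f -> jobCtx.callcc()); // Resume the job once the future is done.

            return null;                      // Ignored while the job is held.
        }

        return fut.get();                     // Already completed here, no blocking.
    }
}
{code}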

Attached are stack traces of the hanging threads:

{code:java}

"mgmt-#10633" #14311 prio=5 os_prio=0 tid=0x560c79117000 nid=0x134c6 
waiting on condition [0x7f15baa77000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.sqlMetadata(GridCacheQueryManager.java:1803)
at 
org.apache.ignite.internal.processors.rest.handlers.cache.GridCacheCommandHandler$MetadataJob.execute(GridCacheCommandHandler.java:1123)
at 
org.apache.ignite.internal.processors.rest.handlers.cache.GridCacheCommandHandler$MetadataJob.execute(GridCacheCommandHandler.java:1088)
at 
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:567)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7069)
at 
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:561)
at 
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:490)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at 
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1270)
at 
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:2088)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1635)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1255)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:144)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1144)
at 
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"mgmt-#81" #270 prio=5 os_prio=0 tid=0x562323c3c800 nid=0x592 waiting on 
condition [0x7fba5f378000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at 
org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$ClientChangeGlobalStateComputeRequest.run(GridClusterStateProcessor.java:1979)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4.execute(GridClosureProcessor.java:1943)
at 
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:567)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7069)
at 
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:561)
at 
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:490)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at 
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1270)
at 

[jira] [Updated] (IGNITE-11406) NullPointerException may occur on client start

2020-12-28 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11406:
-
Summary: NullPointerException may occur on client start  (was: Fix 
NullPointerException on client start)

> NullPointerException may occur on client start
> --
>
> Key: IGNITE-11406
> URL: https://issues.apache.org/jira/browse/IGNITE-11406
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.9
>Reporter: Dmitry Sherstobitov
>Assignee: Pavel Vinokurov
>Priority: Critical
> Fix For: 2.10
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> During testing fixes for https://issues.apache.org/jira/browse/IGNITE-10878
>  # Start cluster, create caches with no persistence and load data into it
>  # Restart each node in cluster by order (coordinator first)
> Do not wait until topology message occurs 
>  # Try to run utilities: activate, baseline (to check that cluster is alive)
>  # Run clients and load data into alive caches
> On the 4th step, one of the clients throws an NPE on start:
> {code:java}
> 2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Connection closed, local node received force fail message, will not try to 
> restore connection
> 2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Failed to restore closed connection, will try to reconnect 
> [networkTimeout=5000, joinTimeout=0, failMsg=TcpDiscoveryNodeFailedMessage 
> [failedNodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, order=90, warning=Client 
> node considered as unreachable and will be dropped from cluster, because no 
> metrics update messages received in interval: 
> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by 
> network problems or long GC pause on client node, try to increase this 
> parameter. [nodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, 
> clientFailureDetectionTimeout=3], super=TcpDiscoveryAbstractMessage 
> [sndNodeId=987d4a03-8233-4130-af5b-c06900bdb6d7, 
> id=3642cfa1961-987d4a03-8233-4130-af5b-c06900bdb6d7, 
> verifierNodeId=d9abbff3-4b4d-4a13-9cb1-0ca4d2436164, topVer=167, 
> pendingIdx=0, failedNodes=null, isClient=false]]]
> 2019-02-23T18:36:24,046][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Discovery notification [node=TcpDiscoveryNode 
> [id=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, addrs=[172.25.1.34], 
> sockAddrs=[lab34.gridgain.local/172.25.1.34:0], discPort=0, order=165, 
> intOrder=0, lastExchangeTime=1550936128313, loc=true, 
> ver=2.4.15#20190222-sha1:36b1d676, isClient=true], 
> type=CLIENT_NODE_DISCONNECTED, topVer=166]
> 2019-02-23T18:36:24,049][INFO 
> ][tcp-client-disco-msg-worker-#4][GridDhtPartitionsExchangeFuture] Finish 
> exchange future [startVer=AffinityTopologyVersion [topVer=165, 
> minorTopVer=0], resVer=null, err=class 
> org.apache.ignite.internal.IgniteClientDisconnectedCheckedException: Client 
> node disconnected: null]
> [2019-02-23T18:36:24,061][ERROR][Thread-2][IgniteKernal] Got exception while 
> starting (will rollback startup routine).
> java.lang.NullPointerException: null
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.internalCacheEx(GridCacheProcessor.java:3886)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.utilityCache(GridCacheProcessor.java:3858)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.updateUtilityCache(GridServiceProcessor.java:290)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:233)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart(GridServiceProcessor.java:221)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1038) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
>  [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
>  [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
>  [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948) 
> [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847) 
> 

[jira] [Updated] (IGNITE-11406) Fix NullPointerException on client start

2020-12-28 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11406:
-
Summary: Fix NullPointerException on client start  (was: 
NullPointerException may occur on client start)

> Fix NullPointerException on client start
> 
>
> Key: IGNITE-11406
> URL: https://issues.apache.org/jira/browse/IGNITE-11406
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.9
>Reporter: Dmitry Sherstobitov
>Assignee: Pavel Vinokurov
>Priority: Critical
> Fix For: 2.10
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> During testing fixes for https://issues.apache.org/jira/browse/IGNITE-10878
>  # Start cluster, create caches with no persistence and load data into it
>  # Restart each node in cluster by order (coordinator first)
> Do not wait until topology message occurs 
>  # Try to run utilities: activate, baseline (to check that cluster is alive)
>  # Run clients and load data into alive caches
> On 4th step one of the clients throw NPE on start
> {code:java}
> 2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Connection closed, local node received force fail message, will not try to 
> restore connection
> 2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Failed to restore closed connection, will try to reconnect 
> [networkTimeout=5000, joinTimeout=0, failMsg=TcpDiscoveryNodeFailedMessage 
> [failedNodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, order=90, warning=Client 
> node considered as unreachable and will be dropped from cluster, because no 
> metrics update messages received in interval: 
> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by 
> network problems or long GC pause on client node, try to increase this 
> parameter. [nodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, 
> clientFailureDetectionTimeout=3], super=TcpDiscoveryAbstractMessage 
> [sndNodeId=987d4a03-8233-4130-af5b-c06900bdb6d7, 
> id=3642cfa1961-987d4a03-8233-4130-af5b-c06900bdb6d7, 
> verifierNodeId=d9abbff3-4b4d-4a13-9cb1-0ca4d2436164, topVer=167, 
> pendingIdx=0, failedNodes=null, isClient=false]]]
> 2019-02-23T18:36:24,046][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Discovery notification [node=TcpDiscoveryNode 
> [id=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, addrs=[172.25.1.34], 
> sockAddrs=[lab34.gridgain.local/172.25.1.34:0], discPort=0, order=165, 
> intOrder=0, lastExchangeTime=1550936128313, loc=true, 
> ver=2.4.15#20190222-sha1:36b1d676, isClient=true], 
> type=CLIENT_NODE_DISCONNECTED, topVer=166]
> 2019-02-23T18:36:24,049][INFO 
> ][tcp-client-disco-msg-worker-#4][GridDhtPartitionsExchangeFuture] Finish 
> exchange future [startVer=AffinityTopologyVersion [topVer=165, 
> minorTopVer=0], resVer=null, err=class 
> org.apache.ignite.internal.IgniteClientDisconnectedCheckedException: Client 
> node disconnected: null]
> [2019-02-23T18:36:24,061][ERROR][Thread-2][IgniteKernal] Got exception while 
> starting (will rollback startup routine).
> java.lang.NullPointerException: null
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.internalCacheEx(GridCacheProcessor.java:3886)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.utilityCache(GridCacheProcessor.java:3858)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.updateUtilityCache(GridServiceProcessor.java:290)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:233)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart(GridServiceProcessor.java:221)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1038) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
>  [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
>  [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
>  [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948) 
> [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847) 
> 

[jira] [Commented] (IGNITE-11406) NullPointerException may occur on client start

2020-12-27 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255422#comment-17255422
 ] 

Pavel Vinokurov commented on IGNITE-11406:
--

[~ilyak] Fixed!

> NullPointerException may occur on client start
> --
>
> Key: IGNITE-11406
> URL: https://issues.apache.org/jira/browse/IGNITE-11406
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Assignee: Pavel Vinokurov
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During testing fixes for https://issues.apache.org/jira/browse/IGNITE-10878
>  # Start cluster, create caches with no persistence and load data into it
>  # Restart each node in cluster by order (coordinator first)
> Do not wait until topology message occurs 
>  # Try to run utilities: activate, baseline (to check that cluster is alive)
>  # Run clients and load data into alive caches
> On 4th step one of the clients throw NPE on start
> {code:java}
> 2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Connection closed, local node received force fail message, will not try to 
> restore connection
> 2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Failed to restore closed connection, will try to reconnect 
> [networkTimeout=5000, joinTimeout=0, failMsg=TcpDiscoveryNodeFailedMessage 
> [failedNodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, order=90, warning=Client 
> node considered as unreachable and will be dropped from cluster, because no 
> metrics update messages received in interval: 
> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by 
> network problems or long GC pause on client node, try to increase this 
> parameter. [nodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, 
> clientFailureDetectionTimeout=3], super=TcpDiscoveryAbstractMessage 
> [sndNodeId=987d4a03-8233-4130-af5b-c06900bdb6d7, 
> id=3642cfa1961-987d4a03-8233-4130-af5b-c06900bdb6d7, 
> verifierNodeId=d9abbff3-4b4d-4a13-9cb1-0ca4d2436164, topVer=167, 
> pendingIdx=0, failedNodes=null, isClient=false]]]
> 2019-02-23T18:36:24,046][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Discovery notification [node=TcpDiscoveryNode 
> [id=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, addrs=[172.25.1.34], 
> sockAddrs=[lab34.gridgain.local/172.25.1.34:0], discPort=0, order=165, 
> intOrder=0, lastExchangeTime=1550936128313, loc=true, 
> ver=2.4.15#20190222-sha1:36b1d676, isClient=true], 
> type=CLIENT_NODE_DISCONNECTED, topVer=166]
> 2019-02-23T18:36:24,049][INFO 
> ][tcp-client-disco-msg-worker-#4][GridDhtPartitionsExchangeFuture] Finish 
> exchange future [startVer=AffinityTopologyVersion [topVer=165, 
> minorTopVer=0], resVer=null, err=class 
> org.apache.ignite.internal.IgniteClientDisconnectedCheckedException: Client 
> node disconnected: null]
> [2019-02-23T18:36:24,061][ERROR][Thread-2][IgniteKernal] Got exception while 
> starting (will rollback startup routine).
> java.lang.NullPointerException: null
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.internalCacheEx(GridCacheProcessor.java:3886)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.utilityCache(GridCacheProcessor.java:3858)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.updateUtilityCache(GridServiceProcessor.java:290)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:233)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart(GridServiceProcessor.java:221)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1038) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
>  [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
>  [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
>  [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948) 
> [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847) 
> [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717) 
> [ignite-core-2.4.15.jar:2.4.15]

[jira] [Commented] (IGNITE-11406) NullPointerException may occur on client start

2020-12-25 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254839#comment-17254839
 ] 

Pavel Vinokurov commented on IGNITE-11406:
--

[~ilyak] Please review

> NullPointerException may occur on client start
> --
>
> Key: IGNITE-11406
> URL: https://issues.apache.org/jira/browse/IGNITE-11406
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Assignee: Pavel Vinokurov
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During testing fixes for https://issues.apache.org/jira/browse/IGNITE-10878
>  # Start cluster, create caches with no persistence and load data into it
>  # Restart each node in cluster by order (coordinator first)
> Do not wait until topology message occurs 
>  # Try to run utilities: activate, baseline (to check that cluster is alive)
>  # Run clients and load data into alive caches
> On 4th step one of the clients throw NPE on start
> {code:java}
> 2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Connection closed, local node received force fail message, will not try to 
> restore connection
> 2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Failed to restore closed connection, will try to reconnect 
> [networkTimeout=5000, joinTimeout=0, failMsg=TcpDiscoveryNodeFailedMessage 
> [failedNodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, order=90, warning=Client 
> node considered as unreachable and will be dropped from cluster, because no 
> metrics update messages received in interval: 
> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by 
> network problems or long GC pause on client node, try to increase this 
> parameter. [nodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, 
> clientFailureDetectionTimeout=3], super=TcpDiscoveryAbstractMessage 
> [sndNodeId=987d4a03-8233-4130-af5b-c06900bdb6d7, 
> id=3642cfa1961-987d4a03-8233-4130-af5b-c06900bdb6d7, 
> verifierNodeId=d9abbff3-4b4d-4a13-9cb1-0ca4d2436164, topVer=167, 
> pendingIdx=0, failedNodes=null, isClient=false]]]
> 2019-02-23T18:36:24,046][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Discovery notification [node=TcpDiscoveryNode 
> [id=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, addrs=[172.25.1.34], 
> sockAddrs=[lab34.gridgain.local/172.25.1.34:0], discPort=0, order=165, 
> intOrder=0, lastExchangeTime=1550936128313, loc=true, 
> ver=2.4.15#20190222-sha1:36b1d676, isClient=true], 
> type=CLIENT_NODE_DISCONNECTED, topVer=166]
> 2019-02-23T18:36:24,049][INFO 
> ][tcp-client-disco-msg-worker-#4][GridDhtPartitionsExchangeFuture] Finish 
> exchange future [startVer=AffinityTopologyVersion [topVer=165, 
> minorTopVer=0], resVer=null, err=class 
> org.apache.ignite.internal.IgniteClientDisconnectedCheckedException: Client 
> node disconnected: null]
> [2019-02-23T18:36:24,061][ERROR][Thread-2][IgniteKernal] Got exception while 
> starting (will rollback startup routine).
> java.lang.NullPointerException: null
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.internalCacheEx(GridCacheProcessor.java:3886)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.utilityCache(GridCacheProcessor.java:3858)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.updateUtilityCache(GridServiceProcessor.java:290)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:233)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart(GridServiceProcessor.java:221)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1038) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
>  [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
>  [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
>  [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948) 
> [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847) 
> [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717) 
> 

[jira] [Assigned] (IGNITE-11406) NullPointerException may occur on client start

2020-12-23 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov reassigned IGNITE-11406:


Assignee: Pavel Vinokurov

> NullPointerException may occur on client start
> --
>
> Key: IGNITE-11406
> URL: https://issues.apache.org/jira/browse/IGNITE-11406
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Assignee: Pavel Vinokurov
>Priority: Critical
>
> During testing fixes for https://issues.apache.org/jira/browse/IGNITE-10878
>  # Start cluster, create caches with no persistence and load data into it
>  # Restart each node in cluster by order (coordinator first)
> Do not wait until topology message occurs 
>  # Try to run utilities: activate, baseline (to check that cluster is alive)
>  # Run clients and load data into alive caches
> On 4th step one of the clients throw NPE on start
> {code:java}
> 2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Connection closed, local node received force fail message, will not try to 
> restore connection
> 2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Failed to restore closed connection, will try to reconnect 
> [networkTimeout=5000, joinTimeout=0, failMsg=TcpDiscoveryNodeFailedMessage 
> [failedNodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, order=90, warning=Client 
> node considered as unreachable and will be dropped from cluster, because no 
> metrics update messages received in interval: 
> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by 
> network problems or long GC pause on client node, try to increase this 
> parameter. [nodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, 
> clientFailureDetectionTimeout=3], super=TcpDiscoveryAbstractMessage 
> [sndNodeId=987d4a03-8233-4130-af5b-c06900bdb6d7, 
> id=3642cfa1961-987d4a03-8233-4130-af5b-c06900bdb6d7, 
> verifierNodeId=d9abbff3-4b4d-4a13-9cb1-0ca4d2436164, topVer=167, 
> pendingIdx=0, failedNodes=null, isClient=false]]]
> 2019-02-23T18:36:24,046][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
>  Discovery notification [node=TcpDiscoveryNode 
> [id=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, addrs=[172.25.1.34], 
> sockAddrs=[lab34.gridgain.local/172.25.1.34:0], discPort=0, order=165, 
> intOrder=0, lastExchangeTime=1550936128313, loc=true, 
> ver=2.4.15#20190222-sha1:36b1d676, isClient=true], 
> type=CLIENT_NODE_DISCONNECTED, topVer=166]
> 2019-02-23T18:36:24,049][INFO 
> ][tcp-client-disco-msg-worker-#4][GridDhtPartitionsExchangeFuture] Finish 
> exchange future [startVer=AffinityTopologyVersion [topVer=165, 
> minorTopVer=0], resVer=null, err=class 
> org.apache.ignite.internal.IgniteClientDisconnectedCheckedException: Client 
> node disconnected: null]
> [2019-02-23T18:36:24,061][ERROR][Thread-2][IgniteKernal] Got exception while 
> starting (will rollback startup routine).
> java.lang.NullPointerException: null
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.internalCacheEx(GridCacheProcessor.java:3886)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.utilityCache(GridCacheProcessor.java:3858)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.updateUtilityCache(GridServiceProcessor.java:290)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:233)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart(GridServiceProcessor.java:221)
>  ~[ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1038) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
>  [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
>  [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
>  [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948) 
> [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847) 
> [ignite-core-2.4.15.jar:2.4.15]
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717) 
> [ignite-core-2.4.15.jar:2.4.15]
> at 

[jira] [Commented] (IGNITE-13507) NullPointerException on tx recovery

2020-12-10 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247090#comment-17247090
 ] 

Pavel Vinokurov commented on IGNITE-13507:
--

[~ilyak] Please review

> NullPointerException on tx recovery
> ---
>
> Key: IGNITE-13507
> URL: https://issues.apache.org/jira/browse/IGNITE-13507
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.7.5
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Server node failed because of NullPointerException on tx recovery:
> {code:java}
> [17:15:02,428][SEVERE][sys-#305][] Critical system error detected. Will be 
> handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0, super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Failed to perform tx 
> recovery]]
> class org.apache.ignite.IgniteException: Failed to perform tx recovery
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3288)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7186)
>   at 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:826)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.ignite.internal.IgniteFeatures.nodeSupports(IgniteFeatures.java:212)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.isMvccRecoveryMessageRequired(IgniteTxManager.java:3304)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3208)
>   ... 6 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-8719) Index left partially built if a node crashes during index create or rebuild

2020-12-03 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243203#comment-17243203
 ] 

Pavel Vinokurov commented on IGNITE-8719:
-

The issue can still be reproduced in both cases, whether an index is being created or 
rebuilt.
Considering the impact of this issue, I suppose it could be fixed before the 
implementation of IEP-28.
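
For reference, below is a rough sketch of the create-index half of such a reproducer (table and index names are my own assumptions; the attached IndexRebuildAfterNodeCrashTest.java is the actual test):

{code:java}
// A hypothetical sketch of the scenario from the issue description; cache/table
// names are assumptions and persistence is expected to be enabled in persistent-server.xml.
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class PartialIndexSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("persistent-server.xml");

        ignite.cluster().active(true);

        // Any cache instance can be used to run DDL/DML in the PUBLIC schema.
        IgniteCache<?, ?> sql = ignite.getOrCreateCache("ddl");

        sql.query(new SqlFieldsQuery(
            "CREATE TABLE person (id INT PRIMARY KEY, name VARCHAR)")).getAll();

        for (int i = 0; i < 1_000_000; i++)
            sql.query(new SqlFieldsQuery("INSERT INTO person VALUES (?, ?)")
                .setArgs(i, "name-" + i)).getAll();

        // CREATE INDEX blocks until the build completes, so the node has to be
        // killed externally after a checkpoint but before this call returns.
        sql.query(new SqlFieldsQuery(
            "CREATE INDEX person_name_idx ON person (name)")).getAll();
    }
}
{code}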

> Index left partially built if a node crashes during index create or rebuild
> ---
>
> Key: IGNITE-8719
> URL: https://issues.apache.org/jira/browse/IGNITE-8719
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Goncharuk
>Priority: Critical
> Attachments: IndexRebuildAfterNodeCrashTest.java, 
> IndexRebuildingTest.java
>
>
> Currently, we do not have any state associated with the index tree. Consider 
> the following scenario:
> 1) Start a node, put some data
> 2) Start a CREATE INDEX operation
> 3) Wait for a checkpoint and stop the node before the index creation finishes
> 4) Restart the node
> Since the checkpoint finished, the new index tree will be persisted to 
> disk, but not all data will be present in the index.
> We should somehow store information about an index tree being initialized and mark it 
> valid only after all data is indexed. The state should be persisted as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13792) Reconnecting clients trigger failure handler

2020-12-01 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-13792:


 Summary: Reconnecting clients trigger failure handler
 Key: IGNITE-13792
 URL: https://issues.apache.org/jira/browse/IGNITE-13792
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.9
Reporter: Pavel Vinokurov
 Attachments: UnstableClients.java


{code:java}
Dec 01, 2020 9:38:29 PM java.util.logging.LogManager$RootLogger log
SEVERE: JVM will be halted immediately due to the failure: 
[failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class 
o.a.i.IgniteCheckedException: Affinity for topology version is not initialized 
[locNode=b50635ff-0324-431b-bc34-00a6cd36c9e3, grp=ignite-sys-cache, 
topVer=AffinityTopologyVersion [topVer=570, minorTopVer=0], 
head=AffinityTopologyVersion [topVer=569, minorTopVer=0], 
history=[AffinityTopologyVersion [topVer=551, minorTopVer=0], 
AffinityTopologyVersion [topVer=552, minorTopVer=0], AffinityTopologyVersion 
[topVer=553, minorTopVer=0], AffinityTopologyVersion [topVer=554, 
minorTopVer=0], AffinityTopologyVersion [topVer=555, minorTopVer=0], 
AffinityTopologyVersion [topVer=556, minorTopVer=0], AffinityTopologyVersion 
[topVer=557, minorTopVer=0], AffinityTopologyVersion [topVer=558, 
minorTopVer=0], AffinityTopologyVersion [topVer=559, minorTopVer=0], 
AffinityTopologyVersion [topVer=560, minorTopVer=0], AffinityTopologyVersion 
[topVer=561, minorTopVer=0], AffinityTopologyVersion [topVer=562, 
minorTopVer=0], AffinityTopologyVersion [topVer=563, minorTopVer=0], 
AffinityTopologyVersion [topVer=564, minorTopVer=0], AffinityTopologyVersion 
[topVer=565, minorTopVer=0], AffinityTopologyVersion [topVer=566, 
minorTopVer=0], AffinityTopologyVersion [topVer=567, minorTopVer=0], 
AffinityTopologyVersion [topVer=568, minorTopVer=0], AffinityTopologyVersion 
[topVer=569, minorTopVer=0]
{code}
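
A minimal sketch of the client churn that leads to this state (my own approximation; the attached UnstableClients.java is the actual reproducer):

{code:java}
// Hypothetical sketch: thick clients are started and dropped in a tight loop
// while the servers keep processing the resulting join/leave topology changes.
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class BlinkingClientsSketch {
    public static void main(String[] args) throws Exception {
        for (int i = 0; ; i++) {
            IgniteConfiguration cfg = new IgniteConfiguration()
                .setIgniteInstanceName("client-" + i)
                .setClientMode(true);

            // try-with-resources stops the client right after a single operation.
            try (Ignite client = Ignition.start(cfg)) {
                client.getOrCreateCache("test").put(i, i);
            }
        }
    }
}
{code}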




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13791) NullPointerException when topology is unstable

2020-12-01 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13791:
-
Attachment: (was: UnstableServerTopology.java)

> NullPointerException when topology is unstable 
> ---
>
> Key: IGNITE-13791
> URL: https://issues.apache.org/jira/browse/IGNITE-13791
> Project: Ignite
>  Issue Type: Bug
>  Components: networking
>Affects Versions: 2.9.1
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: UnstableServerTopology.java
>
>
> Unstable topology with blinking server nodes leads to the critical system 
> error:
> {code:java}
> SEVERE: Critical system error detected. Will be handled accordingly to 
> configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, 
> timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, 
> err=java.lang.NullPointerException]]
> java.lang.NullPointerException
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddedMessage(ServerImpl.java:5096)
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3236)
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2915)
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:8064)
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:3086)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7995)
>   at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> Dec 01, 2020 8:22:55 PM java.util.logging.LogManager$RootLogger log
> SEVERE: JVM will be halted immediately due to the failure: 
> [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, 
> err=java.lang.NullPointerException]]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13791) NullPointerException when topology is unstable

2020-12-01 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13791:
-
Attachment: UnstableServerTopology.java

> NullPointerException when topology is unstable 
> ---
>
> Key: IGNITE-13791
> URL: https://issues.apache.org/jira/browse/IGNITE-13791
> Project: Ignite
>  Issue Type: Bug
>  Components: networking
>Affects Versions: 2.9.1
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: UnstableServerTopology.java
>
>
> Unstable topology with blinking server nodes leads to the critical system 
> error:
> {code:java}
> SEVERE: Critical system error detected. Will be handled accordingly to 
> configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, 
> timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, 
> err=java.lang.NullPointerException]]
> java.lang.NullPointerException
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddedMessage(ServerImpl.java:5096)
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3236)
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2915)
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:8064)
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:3086)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7995)
>   at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> Dec 01, 2020 8:22:55 PM java.util.logging.LogManager$RootLogger log
> SEVERE: JVM will be halted immediately due to the failure: 
> [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, 
> err=java.lang.NullPointerException]]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13791) NullPointerException when topology is unstable

2020-12-01 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-13791:


 Summary: NullPointerException when topology is unstable 
 Key: IGNITE-13791
 URL: https://issues.apache.org/jira/browse/IGNITE-13791
 Project: Ignite
  Issue Type: Bug
  Components: networking
Affects Versions: 2.9.1
Reporter: Pavel Vinokurov
 Attachments: UnstableServerTopology.java

Unstable topology with blinking server nodes leads to the critical system error:

{code:java}
SEVERE: Critical system error detected. Will be handled accordingly to 
configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, 
err=java.lang.NullPointerException]]
java.lang.NullPointerException
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddedMessage(ServerImpl.java:5096)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3236)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2915)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:8064)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:3086)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7995)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)

Dec 01, 2020 8:22:55 PM java.util.logging.LogManager$RootLogger log
SEVERE: JVM will be halted immediately due to the failure: 
[failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, 
err=java.lang.NullPointerException]]
{code}
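
A rough sketch of the blinking-server loop (my own approximation; the attached UnstableServerTopology.java is the actual reproducer):

{code:java}
// Hypothetical sketch: one stable server plus a second one that is started and
// killed repeatedly, so NodeAdded/NodeFailed discovery messages keep circulating.
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class BlinkingServerSketch {
    public static void main(String[] args) throws Exception {
        Ignition.start(new IgniteConfiguration().setIgniteInstanceName("stable-server"));

        for (int i = 0; ; i++) {
            Ignite blinking = Ignition.start(
                new IgniteConfiguration().setIgniteInstanceName("blinking-server-" + i));

            Thread.sleep(200);

            // cancel=true drops the node without waiting for ongoing operations,
            // which is close to an abrupt node failure.
            Ignition.stop(blinking.name(), true);
        }
    }
}
{code}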




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13507) NullPointerException on tx recovery

2020-11-17 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13507:
-
Summary: NullPointerException on tx recovery  (was: NullPointerException 
error on tx recovery)

> NullPointerException on tx recovery
> ---
>
> Key: IGNITE-13507
> URL: https://issues.apache.org/jira/browse/IGNITE-13507
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.7.5
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
>
> Server node failed because of NullPointerException on tx recovery:
> {code:java}
> [17:15:02,428][SEVERE][sys-#305][] Critical system error detected. Will be 
> handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0, super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Failed to perform tx 
> recovery]]
> class org.apache.ignite.IgniteException: Failed to perform tx recovery
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3288)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7186)
>   at 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:826)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.ignite.internal.IgniteFeatures.nodeSupports(IgniteFeatures.java:212)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.isMvccRecoveryMessageRequired(IgniteTxManager.java:3304)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3208)
>   ... 6 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13507) NullPointerException error on tx recovery

2020-11-17 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13507:
-
Summary: NullPointerException error on tx recovery  (was: Critical error on 
tx recovery)

> NullPointerException error on tx recovery
> -
>
> Key: IGNITE-13507
> URL: https://issues.apache.org/jira/browse/IGNITE-13507
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.7.5
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
>
> Server node failed because of NullPointerException on tx recovery:
> {code:java}
> [17:15:02,428][SEVERE][sys-#305][] Critical system error detected. Will be 
> handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0, super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Failed to perform tx 
> recovery]]
> class org.apache.ignite.IgniteException: Failed to perform tx recovery
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3288)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7186)
>   at 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:826)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.ignite.internal.IgniteFeatures.nodeSupports(IgniteFeatures.java:212)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.isMvccRecoveryMessageRequired(IgniteTxManager.java:3304)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3208)
>   ... 6 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-13507) Critical error on tx recovery

2020-11-16 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov reassigned IGNITE-13507:


Assignee: Pavel Vinokurov

> Critical error on tx recovery
> -
>
> Key: IGNITE-13507
> URL: https://issues.apache.org/jira/browse/IGNITE-13507
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.7.5
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
>
> Server node failed because of NullPointerException on tx recovery:
> {code:java}
> [17:15:02,428][SEVERE][sys-#305][] Critical system error detected. Will be 
> handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0, super=AbstractFailureHandler 
> [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
> SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
> [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Failed to perform tx 
> recovery]]
> class org.apache.ignite.IgniteException: Failed to perform tx recovery
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3288)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7186)
>   at 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:826)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.ignite.internal.IgniteFeatures.nodeSupports(IgniteFeatures.java:212)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.isMvccRecoveryMessageRequired(IgniteTxManager.java:3304)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3208)
>   ... 6 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13649) Local cache causes system thread pool overflow

2020-11-01 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-13649:


 Summary: Local cache causes system thread pool overflow
 Key: IGNITE-13649
 URL: https://issues.apache.org/jira/browse/IGNITE-13649
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.8.1
Reporter: Pavel Vinokurov
 Attachments: LocalCacheAndStoreReproducerClient.java, 
LocalCacheAndStoreReproducerServer.java

Calling get operations on a LOCAL cache with read-through from within a long-running 
job causes system thread pool overflow.

Scenario:
1. Start 2 server nodes using LocalCacheAndStoreReproducerServer
2. Start 1 client node using LocalCacheAndStoreReproducerClient
3. Forcibly stop the client node.

Result:
The system thread pool keeps growing until OOM.
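
A minimal sketch of the cache setup involved, reduced to a single node (class names and the dummy store are assumptions; the attached reproducer classes are the authoritative versions):

{code:java}
// Hypothetical sketch: a LOCAL cache with read-through, queried in a long loop.
// Every get() misses and triggers a read-through load from the store.
import javax.cache.configuration.FactoryBuilder;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;

public class LocalReadThroughSketch {
    /** Trivial store: every lookup "loads" the key back as the value. */
    public static class DummyStore extends CacheStoreAdapter<Integer, Integer> {
        @Override public Integer load(Integer key) { return key; }
        @Override public void write(javax.cache.Cache.Entry<? extends Integer, ? extends Integer> e) { }
        @Override public void delete(Object key) { }
    }

    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<Integer, Integer>("local")
            .setCacheMode(CacheMode.LOCAL)
            .setReadThrough(true)
            .setCacheStoreFactory(FactoryBuilder.factoryOf(DummyStore.class));

        IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(ccfg);

        // Long-running "job": each miss goes through the store.
        for (int i = 0; ; i++)
            cache.get(i);
    }
}
{code}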




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13632) Transaction hangs due to communication failures

2020-10-27 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-13632:


 Summary: Transaction hangs due to communication failures
 Key: IGNITE-13632
 URL: https://issues.apache.org/jira/browse/IGNITE-13632
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.8.1
Reporter: Pavel Vinokurov
 Attachments: TxReproducer.java

A transaction hangs after communication messages are dropped.
The reproducer is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13590) Node fails with "Node with the same ID was found in node IDs history" after missing TcpDiscoveryNodeAddedMessage

2020-10-19 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13590:
-
Description: 
A new server node sends the join request and  doesn't receive 
TcpDiscoveryNodeAddedMessage due to network issues.
The node retries the join request and fails with:
{code:java}
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
ID was found in node IDs history or existing node in topology has the same ID 
(fix configuration and restart local node) 
{code}

Instead of fail down it could retry joining the cluster after 
failureDetectionTimeout.



  was:
A new server node sends the join request and  doesn't receive 
TcpDiscoveryNodeAddedMessage due to network issues.
The node retries the join request and fails down with:
{code:java}
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
ID was found in node IDs history or existing node in topology has the same ID 
(fix configuration and restart local node) 
{code}

Instead of fail down it could retry joining the cluster after 
failureDetectionTimeout.




> Node fails with "Node with the same ID was found in node IDs history" after 
> missing TcpDiscoveryNodeAddedMessage
> 
>
> Key: IGNITE-13590
> URL: https://issues.apache.org/jira/browse/IGNITE-13590
> Project: Ignite
>  Issue Type: Bug
>  Components: networking
>Affects Versions: 2.8.1
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: TcpDiscoveryMissingNodeAddedMessageTest.class
>
>
> A new server node sends the join request and  doesn't receive 
> TcpDiscoveryNodeAddedMessage due to network issues.
> The node retries the join request and fails with:
> {code:java}
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
> ID was found in node IDs history or existing node in topology has the same ID 
> (fix configuration and restart local node) 
> {code}
> Instead of fail down it could retry joining the cluster after 
> failureDetectionTimeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13590) Node fails with "Node with the same ID was found in node IDs history" after missing TcpDiscoveryNodeAddedMessage

2020-10-19 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13590:
-
Description: 
A new server node sends the join request and  doesn't receive 
TcpDiscoveryNodeAddedMessage due to network issues.
The node retries the join request and fails with:
{code:java}
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
ID was found in node IDs history or existing node in topology has the same ID 
(fix configuration and restart local node) 
{code}

Instead of stopping it could retry joining to the cluster after 
failureDetectionTimeout.



  was:
A new server node sends the join request and  doesn't receive 
TcpDiscoveryNodeAddedMessage due to network issues.
The node retries the join request and fails with:
{code:java}
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
ID was found in node IDs history or existing node in topology has the same ID 
(fix configuration and restart local node) 
{code}

Instead of stopping it could retry joining the cluster after 
failureDetectionTimeout.




> Node fails with "Node with the same ID was found in node IDs history" after 
> missing TcpDiscoveryNodeAddedMessage
> 
>
> Key: IGNITE-13590
> URL: https://issues.apache.org/jira/browse/IGNITE-13590
> Project: Ignite
>  Issue Type: Bug
>  Components: networking
>Affects Versions: 2.8.1
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: TcpDiscoveryMissingNodeAddedMessageTest.class
>
>
> A new server node sends the join request and  doesn't receive 
> TcpDiscoveryNodeAddedMessage due to network issues.
> The node retries the join request and fails with:
> {code:java}
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
> ID was found in node IDs history or existing node in topology has the same ID 
> (fix configuration and restart local node) 
> {code}
> Instead of stopping it could retry joining to the cluster after 
> failureDetectionTimeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13590) Node fails with "Node with the same ID was found in node IDs history" after missing TcpDiscoveryNodeAddedMessage

2020-10-19 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13590:
-
Attachment: (was: TcpDiscoveryMissingNodeAddedMessageTest.class)

> Node fails with "Node with the same ID was found in node IDs history" after 
> missing TcpDiscoveryNodeAddedMessage
> 
>
> Key: IGNITE-13590
> URL: https://issues.apache.org/jira/browse/IGNITE-13590
> Project: Ignite
>  Issue Type: Bug
>  Components: networking
>Affects Versions: 2.8.1
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: TcpDiscoveryMissingNodeAddedMessageTest.java
>
>
> A new server node sends the join request and  doesn't receive 
> TcpDiscoveryNodeAddedMessage due to network issues.
> The node retries the join request and fails with:
> {code:java}
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
> ID was found in node IDs history or existing node in topology has the same ID 
> (fix configuration and restart local node) 
> {code}
> Instead of stopping it could retry joining to the cluster after 
> failureDetectionTimeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13590) Node fails with "Node with the same ID was found in node IDs history" after missing TcpDiscoveryNodeAddedMessage

2020-10-19 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13590:
-
Attachment: TcpDiscoveryMissingNodeAddedMessageTest.java

> Node fails with "Node with the same ID was found in node IDs history" after 
> missing TcpDiscoveryNodeAddedMessage
> 
>
> Key: IGNITE-13590
> URL: https://issues.apache.org/jira/browse/IGNITE-13590
> Project: Ignite
>  Issue Type: Bug
>  Components: networking
>Affects Versions: 2.8.1
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: TcpDiscoveryMissingNodeAddedMessageTest.java
>
>
> A new server node sends the join request and  doesn't receive 
> TcpDiscoveryNodeAddedMessage due to network issues.
> The node retries the join request and fails with:
> {code:java}
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
> ID was found in node IDs history or existing node in topology has the same ID 
> (fix configuration and restart local node) 
> {code}
> Instead of stopping it could retry joining to the cluster after 
> failureDetectionTimeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13590) Node fails with "Node with the same ID was found in node IDs history" after missing TcpDiscoveryNodeAddedMessage

2020-10-19 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13590:
-
Description: 
A new server node sends the join request and  doesn't receive 
TcpDiscoveryNodeAddedMessage due to network issues.
The node retries the join request and fails down with:
{code:java}
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
ID was found in node IDs history or existing node in topology has the same ID 
(fix configuration and restart local node) 
{code}

Instead of fail down it could retry joining the cluster after 
failureDetectionTimeout.



  was:
A new server node sends the join request and  doesn't receive 
TcpDiscoveryNodeAddedMessage due to network issues.
it retries the join request and fails down with 

{code:java}
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
ID was found in node IDs history or existing node in topology has the same ID 
(fix configuration and restart local node) 
{code}

Instead of fail down it could retry joining the cluster after 
failureDetectionTimeout.




> Node fails with "Node with the same ID was found in node IDs history" after 
> missing TcpDiscoveryNodeAddedMessage
> 
>
> Key: IGNITE-13590
> URL: https://issues.apache.org/jira/browse/IGNITE-13590
> Project: Ignite
>  Issue Type: Bug
>  Components: networking
>Affects Versions: 2.8.1
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: TcpDiscoveryMissingNodeAddedMessageTest.class
>
>
> A new server node sends the join request and  doesn't receive 
> TcpDiscoveryNodeAddedMessage due to network issues.
> The node retries the join request and fails down with:
> {code:java}
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
> ID was found in node IDs history or existing node in topology has the same ID 
> (fix configuration and restart local node) 
> {code}
> Instead of fail down it could retry joining the cluster after 
> failureDetectionTimeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13590) Node fails with "Node with the same ID was found in node IDs history" after missing TcpDiscoveryNodeAddedMessage

2020-10-19 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13590:
-
Description: 
A new server node sends the join request and  doesn't receive 
TcpDiscoveryNodeAddedMessage due to network issues.
The node retries the join request and fails with:
{code:java}
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
ID was found in node IDs history or existing node in topology has the same ID 
(fix configuration and restart local node) 
{code}

Instead of stopping it could retry joining the cluster after 
failureDetectionTimeout.



  was:
A new server node sends the join request and  doesn't receive 
TcpDiscoveryNodeAddedMessage due to network issues.
The node retries the join request and fails with:
{code:java}
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
ID was found in node IDs history or existing node in topology has the same ID 
(fix configuration and restart local node) 
{code}

Instead of fail down it could retry joining the cluster after 
failureDetectionTimeout.




> Node fails with "Node with the same ID was found in node IDs history" after 
> missing TcpDiscoveryNodeAddedMessage
> 
>
> Key: IGNITE-13590
> URL: https://issues.apache.org/jira/browse/IGNITE-13590
> Project: Ignite
>  Issue Type: Bug
>  Components: networking
>Affects Versions: 2.8.1
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: TcpDiscoveryMissingNodeAddedMessageTest.class
>
>
> A new server node sends the join request and  doesn't receive 
> TcpDiscoveryNodeAddedMessage due to network issues.
> The node retries the join request and fails with:
> {code:java}
> Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
> ID was found in node IDs history or existing node in topology has the same ID 
> (fix configuration and restart local node) 
> {code}
> Instead of stopping it could retry joining the cluster after 
> failureDetectionTimeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13590) Node fails with "Node with the same ID was found in node IDs history" after missing TcpDiscoveryNodeAddedMessage

2020-10-19 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-13590:


 Summary: Node fails with "Node with the same ID was found in node 
IDs history" after missing TcpDiscoveryNodeAddedMessage
 Key: IGNITE-13590
 URL: https://issues.apache.org/jira/browse/IGNITE-13590
 Project: Ignite
  Issue Type: Bug
  Components: networking
Affects Versions: 2.8.1
Reporter: Pavel Vinokurov
 Attachments: TcpDiscoveryMissingNodeAddedMessageTest.class

A new server node sends the join request and doesn't receive 
TcpDiscoveryNodeAddedMessage due to network issues.
It retries the join request and fails with:

{code:java}
Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the same 
ID was found in node IDs history or existing node in topology has the same ID 
(fix configuration and restart local node) 
{code}

Instead of failing, it could retry joining the cluster after 
failureDetectionTimeout.
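
For reference, the timeouts involved are plain configuration properties; a minimal sketch (the values are assumptions):

{code:java}
// Hypothetical sketch of the join/failure-detection knobs mentioned above.
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class JoinTimeoutSketch {
    public static void main(String[] args) {
        TcpDiscoverySpi disco = new TcpDiscoverySpi()
            // How long a joining node keeps trying before giving up (0 = wait forever).
            .setJoinTimeout(60_000);

        IgniteConfiguration cfg = new IgniteConfiguration()
            // The timeout after which the node could retry the join request
            // instead of stopping, as suggested above.
            .setFailureDetectionTimeout(30_000)
            .setDiscoverySpi(disco);

        Ignition.start(cfg);
    }
}
{code}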





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13507) Critical error on tx recovery

2020-10-01 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-13507:


 Summary: Critical error on tx recovery
 Key: IGNITE-13507
 URL: https://issues.apache.org/jira/browse/IGNITE-13507
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.7.5
Reporter: Pavel Vinokurov


Server node failed because of NullPointerException on tx recovery:
{code:java}
[17:15:02,428][SEVERE][sys-#305][] Critical system error detected. Will be 
handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Failed to perform tx 
recovery]]
class org.apache.ignite.IgniteException: Failed to perform tx recovery
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3288)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7186)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:826)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.NullPointerException
at 
org.apache.ignite.internal.IgniteFeatures.nodeSupports(IgniteFeatures.java:212)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.isMvccRecoveryMessageRequired(IgniteTxManager.java:3304)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$TxRecoveryInitRunnable.run(IgniteTxManager.java:3208)
... 6 more
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-13439) Printing detailed classpath slowdowns node initialization

2020-09-21 Thread Pavel Vinokurov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199244#comment-17199244
 ] 

Pavel Vinokurov commented on IGNITE-13439:
--

[~ilyak] Please review

> Printing detailed classpath slowdowns node initialization
> -
>
> Key: IGNITE-13439
> URL: https://issues.apache.org/jira/browse/IGNITE-13439
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.8.1
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If IGNITE_LOG_CLASSPATH_CONTENT_ON_STARTUP is enabled, 
> IgniteKernel#ackClassPathContent parses the classpath and recursively  
> traverses the file system printing all jars and class files.
> Traversing the file system can take a long time when there are many class files 
> or a root folder is on the classpath. 
> A more reasonable behavior is to print only the root classpath folders.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-13439) Printing detailed classpath slowdowns node initialization

2020-09-15 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-13439:
-
Reviewer: Ilya Kasnacheev

> Printing detailed classpath slowdowns node initialization
> -
>
> Key: IGNITE-13439
> URL: https://issues.apache.org/jira/browse/IGNITE-13439
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.8.1
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If IGNITE_LOG_CLASSPATH_CONTENT_ON_STARTUP is enabled, 
> IgniteKernel#ackClassPathContent parses the classpath and recursively  
> traverses the file system, printing all jars and class files.
> Traversing the file system can take a long time when there are many class files 
> or when a root folder is present in the classpath. 
> The reasonable behavior is to print only the root classpath folders.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-13439) Printing detailed classpath slowdowns node initialization

2020-09-15 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov reassigned IGNITE-13439:


Assignee: Pavel Vinokurov

> Printing detailed classpath slowdowns node initialization
> -
>
> Key: IGNITE-13439
> URL: https://issues.apache.org/jira/browse/IGNITE-13439
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.8.1
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If IGNITE_LOG_CLASSPATH_CONTENT_ON_STARTUP is enabled, 
> IgniteKernel#ackClassPathContent parses the classpath and recursively  
> traverses the file system, printing all jars and class files.
> Traversing the file system can take a long time when there are many class files 
> or when a root folder is present in the classpath. 
> The reasonable behavior is to print only the root classpath folders.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13439) Printing detailed classpath slowdowns node initialization

2020-09-11 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-13439:


 Summary: Printing detailed classpath slowdowns node initialization
 Key: IGNITE-13439
 URL: https://issues.apache.org/jira/browse/IGNITE-13439
 Project: Ignite
  Issue Type: Bug
  Components: general
Affects Versions: 2.8.1
Reporter: Pavel Vinokurov


If IGNITE_LOG_CLASSPATH_CONTENT_ON_STARTUP is enabled, 
IgniteKernel#ackClassPathContent parses the classpath and recursively  
traverses the file system, printing all jars and class files.
Traversing the file system can take a long time when there are many class files or 
when a root folder is present in the classpath. 
The reasonable behavior is to print only the root classpath folders.
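To illustrate the proposed behavior, here is a minimal hedged sketch (not the actual patch; the class name is hypothetical) that prints only the root classpath entries instead of walking the file system:
{code:java}
import java.io.File;

/** Hypothetical sketch: print only root classpath entries without walking the file system. */
public class ClasspathRootsPrinter {
    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path", "");

        // Each entry is a jar file or a directory; neither is traversed recursively.
        for (String entry : cp.split(File.pathSeparator))
            System.out.println("Classpath entry: " + entry);
    }
}
{code}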




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-9474) Ignite does not eagerly remove expired cache entries

2020-09-02 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-9474:

Attachment: IgniteExpirationReproducerWithoutPersistance.java

> Ignite does not eagerly remove expired cache entries
> 
>
> Key: IGNITE-9474
> URL: https://issues.apache.org/jira/browse/IGNITE-9474
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.4
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: IgniteExpirationReproducerWithoutPersistance.java
>
>
> cache.size() indicates existing rows, but any get operation returns an empty 
> result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-9474) Ignite does not eagerly remove expired cache entries

2020-09-02 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-9474:

Attachment: (was: IgniteExpirationReproducerWithoutPersistance.java)

> Ignite does not eagerly remove expired cache entries
> 
>
> Key: IGNITE-9474
> URL: https://issues.apache.org/jira/browse/IGNITE-9474
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.4
>Reporter: Pavel Vinokurov
>Priority: Major
>
> cache.size() indicates existing rows, but any get operation returns an empty 
> result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (IGNITE-13000) Connection.prepareStatement(String,int) always throws UnsupportedException ignoring 'autoGeneratedKeys' parameter

2020-05-12 Thread Pavel Vinokurov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-13000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov reassigned IGNITE-13000:


Assignee: Pavel Vinokurov

> Connection.prepareStatement(String,int) always throws UnsupportedException  
> ignoring  'autoGeneratedKeys' parameter
> ---
>
> Key: IGNITE-13000
> URL: https://issues.apache.org/jira/browse/IGNITE-13000
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.8
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
>
> The method call below throws an exception.
> {code:java}
> conn.prepareStatement(query, Statement.NO_GENERATED_KEYS)
> {code}
> But it should produce the same result as:
> {code:java}
> conn.prepareStatement(query)
> {code}
> The possible fix:
> {code:java}
> @Override 
> public PreparedStatement prepareStatement(String sql, int autoGeneratedKeys) 
> throws SQLException {
> ensureNotClosed();
> if(autoGeneratedKeys == Statement.RETURN_GENERATED_KEYS)
> throw new SQLFeatureNotSupportedException("Auto generated keys are 
> not supported.");
> return prepareStatement(sql);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13000) Connection.prepareStatement(String,int) always throws UnsupportedException ignoring 'autoGeneratedKeys' parameter

2020-05-12 Thread Pavel Vinokurov (Jira)
Pavel Vinokurov created IGNITE-13000:


 Summary: Connection.prepareStatement(String,int) always throws 
UnsupportedException  ignoring  'autoGeneratedKeys' parameter
 Key: IGNITE-13000
 URL: https://issues.apache.org/jira/browse/IGNITE-13000
 Project: Ignite
  Issue Type: Bug
  Components: sql
Affects Versions: 2.8
Reporter: Pavel Vinokurov


The method call below throws an exception.
{code:java}
conn.prepareStatement(query, Statement.NO_GENERATED_KEYS)
{code}

But it should produce the same result as:
{code:java}
conn.prepareStatement(query)
{code}


The possible fix:
{code:java}
@Override 
public PreparedStatement prepareStatement(String sql, int autoGeneratedKeys) 
throws SQLException {
ensureNotClosed();
if(autoGeneratedKeys == Statement.RETURN_GENERATED_KEYS)
throw new SQLFeatureNotSupportedException("Auto generated keys are not 
supported.");
return prepareStatement(sql);
}
{code}
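For illustration, a small thin-driver usage sketch showing the two calls that should behave identically once the fix is applied; the JDBC URL and the CITY table are assumptions, not part of this issue:
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class PrepareStatementUsageSketch {
    public static void main(String[] args) throws Exception {
        // Assumes an Ignite node listening on the default thin JDBC port and an existing CITY table.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/")) {
            // Works today.
            try (PreparedStatement ps = conn.prepareStatement("SELECT NAME FROM CITY WHERE ID = ?")) {
                ps.setInt(1, 1);
                ps.executeQuery();
            }

            // Should behave the same once NO_GENERATED_KEYS is no longer rejected.
            try (PreparedStatement ps =
                     conn.prepareStatement("SELECT NAME FROM CITY WHERE ID = ?", Statement.NO_GENERATED_KEYS)) {
                ps.setInt(1, 1);
                ps.executeQuery();
            }
        }
    }
}
{code}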




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-11798) Memory leak on unstable topology caused by partition reservation

2019-04-23 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11798:
-
Affects Version/s: 2.7

> Memory leak on unstable topology caused by partition reservation
> 
>
> Key: IGNITE-11798
> URL: https://issues.apache.org/jira/browse/IGNITE-11798
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, sql
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: PartitionReservationReproducer.java
>
>
> Executing queries on an unstable topology leads to OOM caused by a leak of 
> partition reservations.
> The reproducer is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11798) Memory leak on unstable topology caused by partition reservation

2019-04-23 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11798:
-
Summary: Memory leak on unstable topology caused by partition reservation  
(was: Memory leak on unstable topology caused by reservation partitions)

> Memory leak on unstable topology caused by partition reservation
> 
>
> Key: IGNITE-11798
> URL: https://issues.apache.org/jira/browse/IGNITE-11798
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, sql
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: PartitionReservationReproducer.java
>
>
> Executing queries on an unstable topology leads to OOM caused by a leak of 
> partition reservations.
> The reproducer is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11798) Memory leak on unstable topology caused by reservation partitions

2019-04-23 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-11798:


 Summary: Memory leak on unstable topology caused by reservation 
partitions
 Key: IGNITE-11798
 URL: https://issues.apache.org/jira/browse/IGNITE-11798
 Project: Ignite
  Issue Type: Bug
  Components: cache, sql
Reporter: Pavel Vinokurov
 Attachments: PartitionReservationReproducer.java

Executing queries on an unstable topology leads to OOM caused by a leak of 
partition reservations.
The reproducer is attached.
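A hedged sketch of the kind of load that triggers the leak, assuming a cache named "person" with an SQL schema; this is only an illustration, not the attached reproducer:
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class UnstableTopologyQuerySketch {
    public static void main(String[] args) {
        // Assumes a cluster where server nodes are restarted externally while this runs.
        Ignite ignite = Ignition.start();
        IgniteCache<Object, Object> cache = ignite.cache("person");

        // Each execution reserves partitions; on an unstable topology the
        // reservations reportedly accumulate until the heap is exhausted.
        while (true)
            cache.query(new SqlFieldsQuery("SELECT COUNT(*) FROM Person")).getAll();
    }
}
{code}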



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11544) Unclear behavior for cache operations using classes different from specified as indexed types

2019-04-16 Thread Pavel Vinokurov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818999#comment-16818999
 ] 

Pavel Vinokurov commented on IGNITE-11544:
--

[~zstan] The main issue is that cache2.removeAll() throws a CorruptedTreeException, 
so at the very least the removeAll() operation cannot be performed.

> Unclear behavior for cache operations using classes different from specified 
> as indexed types
> -
>
> Key: IGNITE-11544
> URL: https://issues.apache.org/jira/browse/IGNITE-11544
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, sql
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Assignee: Igor Belyakov
>Priority: Major
> Attachments: IndexedTypesReproducer.java
>
>
> The attached reproducer presents a few cases where caches are populated with 
> objects of classes different from those specified in 
> CacheConfiguration#setIndexedTypes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11699) Node can't start after forced shutdown if the wal archiver disabled

2019-04-09 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11699:
-
Attachment: disabled-wal-archive-reproducer.zip

> Node can't start after forced shutdown if the wal archiver disabled
> ---
>
> Key: IGNITE-11699
> URL: https://issues.apache.org/jira/browse/IGNITE-11699
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: disabled-wal-archive-reproducer.zip
>
>
> If a server node is killed while the WAL archive is disabled, it can fail on start 
> with the following exception:
> {code:java}
> [18:37:53,887][SEVERE][sys-stripe-1-#2][G] Failed to execute runnable.
> java.lang.IllegalStateException: Failed to get page IO instance (page content 
> is corrupted)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:85)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:97)
>   at 
> org.apache.ignite.internal.pagemem.wal.record.delta.MetaPageUpdatePartitionDataRecord.applyDelta(MetaPageUpdatePartitionDataRecord.java:109)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyPageDelta(GridCacheDatabaseSharedManager.java:2532)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$performBinaryMemoryRestore$11(GridCacheDatabaseSharedManager.java:2327)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApplyPage$12(GridCacheDatabaseSharedManager.java:2441)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApply$13(GridCacheDatabaseSharedManager.java:2479)
>   at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:550)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> The reproducer is attached (works only on Linux).
> Steps to run the reproducer:
> 1. Copy config/server.xml into the IGNITE_HOME/config folder;
> 2. Set IGNITE_HOME in the CorruptionReproducer class;
> 3. Launch CorruptionReproducer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11699) Node can't start after forced shutdown if the wal archiver disabled

2019-04-09 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-11699:


 Summary: Node can't start after forced shutdown if the wal 
archiver disabled
 Key: IGNITE-11699
 URL: https://issues.apache.org/jira/browse/IGNITE-11699
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.7
Reporter: Pavel Vinokurov


If a server node is killed while the WAL archive is disabled, it can fail on start 
with the following exception:

{code:java}
[18:37:53,887][SEVERE][sys-stripe-1-#2][G] Failed to execute runnable.
java.lang.IllegalStateException: Failed to get page IO instance (page content 
is corrupted)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:85)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:97)
at 
org.apache.ignite.internal.pagemem.wal.record.delta.MetaPageUpdatePartitionDataRecord.applyDelta(MetaPageUpdatePartitionDataRecord.java:109)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyPageDelta(GridCacheDatabaseSharedManager.java:2532)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$performBinaryMemoryRestore$11(GridCacheDatabaseSharedManager.java:2327)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApplyPage$12(GridCacheDatabaseSharedManager.java:2441)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApply$13(GridCacheDatabaseSharedManager.java:2479)
at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:550)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
{code}


The reproducer is attached (works only on Linux).
Steps to run the reproducer:
1. Copy config/server.xml into the IGNITE_HOME/config folder;
2. Set IGNITE_HOME in the CorruptionReproducer class;
3. Launch CorruptionReproducer.
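For reference, a minimal Java equivalent of such a configuration; this is a hedged sketch with illustrative paths, not the attached server.xml. Pointing the archive path at the WAL work path is the usual way to disable archiving:
{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DisabledWalArchiveNodeSketch {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Enable persistence for the default data region.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // WAL archiving is effectively disabled when the archive path equals the WAL path.
        storageCfg.setWalPath("/data/ignite/wal");
        storageCfg.setWalArchivePath("/data/ignite/wal");

        IgniteConfiguration cfg = new IgniteConfiguration().setDataStorageConfiguration(storageCfg);

        Ignition.start(cfg);
    }
}
{code}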



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IGNITE-8357) Recreated atomic sequence produces "Sequence was removed from cache"

2019-04-01 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov reassigned IGNITE-8357:
---

Assignee: (was: Pavel Vinokurov)

> Recreated atomic sequence produces "Sequence was removed from cache"
> 
>
> Key: IGNITE-8357
> URL: https://issues.apache.org/jira/browse/IGNITE-8357
> Project: Ignite
>  Issue Type: Bug
>  Components: data structures
>Affects Versions: 2.4
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: RecreatingAtomicSequence.java
>
>
> If a cluster has two or more nodes, recreated atomic sequence produces error 
> on incrementAndGet operation. 
> The reproducer is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IGNITE-11378) Critical system errors on cluster with enabled peristance

2019-03-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov resolved IGNITE-11378.
--
Resolution: Not A Problem

The long checkpoint process is caused by the small data region size.
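As a hedged illustration of the remedy implied by this resolution, the default data region can be made larger; the 4 GB value below is an arbitrary example, not a recommendation from this issue:
{code:java}
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DataRegionSizeSketch {
    public static IgniteConfiguration configure() {
        DataRegionConfiguration regionCfg = new DataRegionConfiguration()
            .setName("Default_Region")
            .setPersistenceEnabled(true)
            .setMaxSize(4L * 1024 * 1024 * 1024); // 4 GB, example value only.

        return new IgniteConfiguration().setDataStorageConfiguration(
            new DataStorageConfiguration().setDefaultDataRegionConfiguration(regionCfg));
    }
}
{code}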

> Critical system errors on cluster with enabled peristance
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> {code:java}
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> {code}
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> {code:java}
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IGNITE-9626) Applying WAL updates ignores evicition policy

2019-03-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov resolved IGNITE-9626.
-
Resolution: Duplicate

> Applying WAL updates ignores evicition policy
> -
>
> Key: IGNITE-9626
> URL: https://issues.apache.org/jira/browse/IGNITE-9626
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
> Attachments: IgniteExpirationWitPeristanceReproducer.java
>
>
> Steps to reproduce:
> 1. Add a record to a cache obtained via ignite.cache().withExpiryPolicy().
> 2. Stop the node before a checkpoint.
> 3. Start the node and read the record from the cache after the specified duration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11585) Update Spring dependency to version 5

2019-03-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11585:
-
Issue Type: Wish  (was: Improvement)

> Update Spring dependency to version 5
> -
>
> Key: IGNITE-11585
> URL: https://issues.apache.org/jira/browse/IGNITE-11585
> Project: Ignite
>  Issue Type: Wish
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11585) Update Spring dependency to version 5

2019-03-20 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-11585:


 Summary: Update Spring dependency to version 5
 Key: IGNITE-11585
 URL: https://issues.apache.org/jira/browse/IGNITE-11585
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Vinokurov
Assignee: Pavel Vinokurov






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-7718) Collections.singleton() and Collections.singletonMap() are not properly serialized by binary marshaller

2019-03-18 Thread Pavel Vinokurov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795146#comment-16795146
 ] 

Pavel Vinokurov commented on IGNITE-7718:
-

[~amashenkov] Please review

> Collections.singleton() and Collections.singletonMap() are not properly 
> serialized by binary marshaller
> ---
>
> Key: IGNITE-7718
> URL: https://issues.apache.org/jira/browse/IGNITE-7718
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.3
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Major
>
> After deserialization, collections obtained via Collections.singleton() and 
> Collections.singletonMap() do not yield collections of binary objects, but 
> rather collections of deserialized objects. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11544) Unclear behavior for cache operations using classes different from specified as indexed types

2019-03-14 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-11544:


 Summary: Unclear behavior for cache operations using classes 
different from specified as indexed types
 Key: IGNITE-11544
 URL: https://issues.apache.org/jira/browse/IGNITE-11544
 Project: Ignite
  Issue Type: Bug
  Components: cache, sql
Affects Versions: 2.7
Reporter: Pavel Vinokurov
 Attachments: IndexedTypesReproducer.java

The attached reproducer presents a few cases where caches are populated with 
objects of classes different from those specified in 
CacheConfiguration#setIndexedTypes.
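A minimal sketch of the pattern in question, with hypothetical Person and Organization classes standing in for those used by the reproducer:
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class IndexedTypesMismatchSketch {
    static class Person { String name; }
    static class Organization { String title; }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Integer, Object> ccfg = new CacheConfiguration<>("cache2");

            // The cache declares Person as its only indexed value type...
            ccfg.setIndexedTypes(Integer.class, Person.class);

            IgniteCache<Integer, Object> cache = ignite.getOrCreateCache(ccfg);

            // ...but is populated with objects of a different class,
            // which is one of the unclear cases mentioned above.
            cache.put(1, new Organization());
            cache.removeAll();
        }
    }
}
{code}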



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11524) Memory leak caused by executing an jdbc prepared statement

2019-03-12 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-11524:


 Summary: Memory leak caused by executing an jdbc prepared statement
 Key: IGNITE-11524
 URL: https://issues.apache.org/jira/browse/IGNITE-11524
 Project: Ignite
  Issue Type: Bug
  Components: sql, thin client
Reporter: Pavel Vinokurov
 Fix For: 2.7
 Attachments: PreparedStatementOOMReproducer.java

Executing a prepared statement multiple times leads to OOM.
VisualVM indicates that the heap contains a lot of JdbcThinPreparedStatement 
objects.

The reproducer is attached.
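A hedged sketch of the reported usage pattern (not the attached reproducer); the JDBC URL and the CITY table are assumptions:
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PreparedStatementLoopSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
             PreparedStatement ps = conn.prepareStatement("SELECT NAME FROM CITY WHERE ID = ?")) {
            // Re-executing the same statement many times; in the reported case the heap
            // gradually fills with statement-related objects until OOM.
            for (int i = 0; i < 1_000_000; i++) {
                ps.setInt(1, i);
                ps.executeQuery().close();
            }
        }
    }
}
{code}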



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-11419) Memory leak after multiple restarts of server node within the same thread

2019-03-05 Thread Pavel Vinokurov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784180#comment-16784180
 ] 

Pavel Vinokurov edited comment on IGNITE-11419 at 3/5/19 8:28 AM:
--

ConnectionManager declares a thread-local variable that is initialized with an 
instance of an anonymous class. Thus the thread-local variable is linked to the 
ConnectionManager, which leads to a memory leak.
{code:java}
/** Connection cache. */
private final 
ThreadLocal.Reusable> threadConn =
new ThreadLocal.Reusable>() {
@Override public ThreadLocalObjectPool.Reusable 
get() {
ThreadLocalObjectPool.Reusable reusable = 
super.get();
{code}



was (Author: pvinokurov):
ConnectionManager declares the thread local variable initialized by the 
instance of anonymous class.Thus the thread local variably linked with the 
ConnectionManager. It leads to memory leak.
{code:java}
/** Connection cache. */
private final 
ThreadLocal.Reusable> threadConn =
new ThreadLocal.Reusable>() {
@Override public ThreadLocalObjectPool.Reusable 
get() {
ThreadLocalObjectPool.Reusable reusable = 
super.get();
{code}


> Memory leak after multiple restarts of server node within the same thread
> -
>
> Key: IGNITE-11419
> URL: https://issues.apache.org/jira/browse/IGNITE-11419
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Multiple restarts of a server node with enabled persistence and 20 caches 
> lead to OutOfMemory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11419) Memory leak after multiple restarts of server node within the same thread

2019-03-04 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11419:
-
Affects Version/s: 2.7

> Memory leak after multiple restarts of server node within the same thread
> -
>
> Key: IGNITE-11419
> URL: https://issues.apache.org/jira/browse/IGNITE-11419
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Minor
>
> Multiple restarts of a server node with enabled persistence and 20 caches 
> lead to OutOfMemory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11419) Memory leak after multiple restarts of server node within the same thread

2019-03-04 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11419:
-
Component/s: sql

> Memory leak after multiple restarts of server node within the same thread
> -
>
> Key: IGNITE-11419
> URL: https://issues.apache.org/jira/browse/IGNITE-11419
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Minor
>
> Multiple restarts of a server node with enabled persistence and 20 caches 
> lead to OutOfMemory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11419) Memory leak after multiple restarts of server node within the same thread

2019-03-04 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11419:
-
Ignite Flags:   (was: Docs Required)

> Memory leak after multiple restarts of server node within the same thread
> -
>
> Key: IGNITE-11419
> URL: https://issues.apache.org/jira/browse/IGNITE-11419
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Minor
>
> Multiple restarts of a server node with enabled persistence and 20 caches 
> lead to OutOfMemory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11419) Memory leak after multiple restarts of server node within the same jvm

2019-03-04 Thread Pavel Vinokurov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784180#comment-16784180
 ] 

Pavel Vinokurov commented on IGNITE-11419:
--

ConnectionManager declares a thread-local variable that is initialized with an 
instance of an anonymous class. Thus the thread-local variable is linked to the 
ConnectionManager, which leads to a memory leak.
{code:java}
/** Connection cache. */
private final 
ThreadLocal.Reusable> threadConn =
new ThreadLocal.Reusable>() {
@Override public ThreadLocalObjectPool.Reusable 
get() {
ThreadLocalObjectPool.Reusable reusable = 
super.get();
{code}
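To illustrate the general pattern rather than Ignite's actual code, a hedged sketch of how an anonymous ThreadLocal subclass whose values reference the owning object can pin that object in the threads' thread-local maps; all class names are hypothetical:
{code:java}
public class ThreadLocalLeakSketch {
    static class Manager {
        private final byte[] state = new byte[1024 * 1024];

        // Anti-pattern: the anonymous subclass captures the enclosing Manager, and the
        // produced Holder also references it. The Holder is stored in each calling
        // thread's ThreadLocalMap, so as long as those threads live and remove() is not
        // called, discarded Managers stay strongly reachable, which is a leak when
        // Managers are created and dropped repeatedly in the same JVM.
        final ThreadLocal<Holder> holder = new ThreadLocal<Holder>() {
            @Override protected Holder initialValue() {
                return new Holder(Manager.this);
            }
        };
    }

    static class Holder {
        final Manager owner;

        Holder(Manager owner) {
            this.owner = owner;
        }
    }
}
{code}
Typical remedies are a value factory that does not capture the owner (for example, ThreadLocal.withInitial with a static supplier) or an explicit remove() when the owner is stopped.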


> Memory leak after multiple restarts of server node within the same jvm
> --
>
> Key: IGNITE-11419
> URL: https://issues.apache.org/jira/browse/IGNITE-11419
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Vinokurov
>Priority: Minor
>
> Multiple restarts of a server node with enabled persistence and 20 caches 
> lead to OutOfMemory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11419) Memory leak after multiple restarts of server node within the same jvm

2019-02-26 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-11419:


 Summary: Memory leak after multiple restarts of server node within 
the same jvm
 Key: IGNITE-11419
 URL: https://issues.apache.org/jira/browse/IGNITE-11419
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Vinokurov


Multiple restarts of a server node with enabled persistence and 20 caches lead 
to OutOfMemory.
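A hedged sketch of the restart scenario inside a single JVM; the instance name, cache count, and activation call are illustrative assumptions:
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RestartLoopSketch {
    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            DataStorageConfiguration storageCfg = new DataStorageConfiguration();
            storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setIgniteInstanceName("node")
                .setDataStorageConfiguration(storageCfg);

            Ignite ignite = Ignition.start(cfg);
            ignite.cluster().active(true);

            // Create a number of caches, then stop the node; repeating this within
            // one JVM is reported to eventually exhaust the heap.
            for (int c = 0; c < 20; c++)
                ignite.getOrCreateCache("cache-" + c);

            Ignition.stop("node", true);
        }
    }
}
{code}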



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11383) Unable to restart node with WALMode.NONE

2019-02-22 Thread Pavel Vinokurov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775018#comment-16775018
 ] 

Pavel Vinokurov commented on IGNITE-11383:
--

[~dpavlov] I suppose the server node should either clean up the PDS or show a clear 
error message.

> Unable to restart node with WALMode.NONE 
> -
>
> Key: IGNITE-11383
> URL: https://issues.apache.org/jira/browse/IGNITE-11383
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: MemoryRestoreReproducer.java
>
>
> Scenario:
> 1. Start a single node with persistence but without WAL.
> 2. Stream data to a cache.
> 3. Restart the node.
> Result:
> The node fails with the following exception.
> {code:java}
> Exception in thread "main" class org.apache.ignite.IgniteException: null
>   at 
> org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1059)
>   at org.apache.ignite.Ignition.start(Ignition.java:324)
>   at MemoryRestoreReproducer.main(MemoryRestoreReproducer.java:27)
> Caused by: class org.apache.ignite.IgniteCheckedException: null
>   at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1196)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1992)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1683)
>   at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1109)
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:629)
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:554)
>   at org.apache.ignite.Ignition.start(Ignition.java:321)
>   ... 1 more
> Caused by: java.util.NoSuchElementException
>   at 
> org.apache.ignite.internal.util.GridCloseableIteratorAdapter.nextX(GridCloseableIteratorAdapter.java:39)
>   at 
> org.apache.ignite.internal.util.lang.GridIteratorAdapter.next(GridIteratorAdapter.java:35)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.read(FileWriteAheadLogManager.java:855)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.performBinaryMemoryRestore(GridCacheDatabaseSharedManager.java:2120)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:749)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:4963)
>   at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1058)
>   ... 7 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled peristance

2019-02-22 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Description: 
The attached reproducer shows the following exception during streaming data to 
cache:


{code:java}
[2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will be 
handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet []]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
[name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
heartbeatTs=1550754912905]]]
class org.apache.ignite.IgniteException: GridWorker 
[name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
heartbeatTs=1550754912905]
{code}


If the blocked timeout is changed by  
cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and restarting 
several nodes the following critical error occurs:

{code:java}
[2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
system-critical thread has been detected. This can lead to cluster-wide 
undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
[2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
[name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
waitCnt=729]

[2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
Critical system error detected. Will be handled accordingly to configured 
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]
{code}




  was:
The attached reproducer shows the following exception during streaming data to 
cache:

[2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will be 
handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet []]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
[name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
heartbeatTs=1550754912905]]]
class org.apache.ignite.IgniteException: GridWorker 
[name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
heartbeatTs=1550754912905]

If the blocked timeout is changed by  
cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and restarting 
several nodes the following critical error occurs:
[2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
system-critical thread has been detected. This can lead to cluster-wide 
undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
[2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
[name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
waitCnt=729]

[2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
Critical system error detected. Will be handled accordingly to configured 
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]




> Critical system errors on cluster with enabled peristance
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> {code:java}
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler 

[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled peristance

2019-02-22 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Description: 
The attached reproducer shows the following exception during streaming data to 
cache:

[2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will be 
handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet []]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
[name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
heartbeatTs=1550754912905]]]
class org.apache.ignite.IgniteException: GridWorker 
[name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
heartbeatTs=1550754912905]

If the blocked timeout is changed by  
cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and restarting 
several nodes the following critical error occurs:
[2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
system-critical thread has been detected. This can lead to cluster-wide 
undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
[2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
[name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
waitCnt=729]

[2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
Critical system error detected. Will be handled accordingly to configured 
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



  was:
The attached reproducer shows the following exception during streaming data to 
cache:
[2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will be 
handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet []]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
[name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
heartbeatTs=1550754912905]]]
class org.apache.ignite.IgniteException: GridWorker 
[name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
heartbeatTs=1550754912905]

If the blocked timeout is changed by  
cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and restarting 
several nodes the following critical error occurs:
[2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
system-critical thread has been detected. This can lead to cluster-wide 
undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
[2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
[name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
waitCnt=729]

[2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
Critical system error detected. Will be handled accordingly to configured 
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]




> Critical system errors on cluster with enabled peristance
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler 

[jira] [Created] (IGNITE-11383) Unable to restart node with WALMode.NONE

2019-02-21 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-11383:


 Summary: Unable to restart node with WALMode.NONE 
 Key: IGNITE-11383
 URL: https://issues.apache.org/jira/browse/IGNITE-11383
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Affects Versions: 2.7
Reporter: Pavel Vinokurov
 Attachments: MemoryRestoreReproducer.java

Scenario:
1. Start a single node with persistence but without WAL.
2. Stream data to a cache.
3. Restart the node.
Result:
The node fails with the following exception.
{code:java}
Exception in thread "main" class org.apache.ignite.IgniteException: null
at 
org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1059)
at org.apache.ignite.Ignition.start(Ignition.java:324)
at MemoryRestoreReproducer.main(MemoryRestoreReproducer.java:27)
Caused by: class org.apache.ignite.IgniteCheckedException: null
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1196)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1992)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1683)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1109)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:629)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:554)
at org.apache.ignite.Ignition.start(Ignition.java:321)
... 1 more
Caused by: java.util.NoSuchElementException
at 
org.apache.ignite.internal.util.GridCloseableIteratorAdapter.nextX(GridCloseableIteratorAdapter.java:39)
at 
org.apache.ignite.internal.util.lang.GridIteratorAdapter.next(GridIteratorAdapter.java:35)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.read(FileWriteAheadLogManager.java:855)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.performBinaryMemoryRestore(GridCacheDatabaseSharedManager.java:2120)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:749)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:4963)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1058)
... 7 more
{code}
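A minimal configuration sketch matching step 1 of the scenario, with persistence enabled and the WAL switched off; the cache name and loaded value are illustrative:
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.WALMode;

public class WalNoneNodeSketch {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration()
            .setWalMode(WALMode.NONE); // No write-ahead logging at all.

        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        Ignite ignite = Ignition.start(new IgniteConfiguration().setDataStorageConfiguration(storageCfg));
        ignite.cluster().active(true);

        // Load data, then restart the node; with WALMode.NONE the restart fails as shown above.
        ignite.getOrCreateCache("data").put(1, "value");
    }
}
{code}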




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled peristance

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Attachment: CheckpointLockReproducer.java

> Critical system errors on cluster with enabled peristance
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled peristance

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Attachment: (was: CheckpointLockReproducer.java)

> Critical system errors on cluster with enabled peristance
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-11378) Critical system errors on cluster with enabled peristance

2019-02-21 Thread Pavel Vinokurov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774351#comment-16774351
 ] 

Pavel Vinokurov edited comment on IGNITE-11378 at 2/21/19 5:49 PM:
---

The reproducer has been updated


was (Author: pvinokurov):
The reproducer was updated

> Critical system errors on cluster with enabled peristance
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11378) Critical system errors on cluster with enabled peristance

2019-02-21 Thread Pavel Vinokurov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774351#comment-16774351
 ] 

Pavel Vinokurov commented on IGNITE-11378:
--

The reproducer was updated

> Critical system errors on cluster with enabled peristance
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled peristance

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Attachment: (was: CheckpointLockReproducer.java)

> Critical system errors on cluster with enabled peristance
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled persistence

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Attachment: CheckpointLockReproducer.java

> Critical system errors on cluster with enabled persistence
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled persistence

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Affects Version/s: 2.7

> Critical system errors on cluster with enabled persistence
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled persistence

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Attachment: CheckpointLockReproducer.java

> Critical system errors on cluster with enabled persistence
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled persistence

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Attachment: (was: CheckpointLockReproducer.java)

> Critical system errors on cluster with enabled persistence
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Pavel Vinokurov
>Priority: Major
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled persistence

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Attachment: CheckpointLockReproducer.java

> Critical system errors on cluster with enabled persistence
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled persistence

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Attachment: CheckpointLockReproducer.java

> Critical system errors on cluster with enabled persistence
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled persistence

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Attachment: (was: CheckpointLockReproducer.java)

> Critical system errors on cluster with enabled persistence
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11378) Critical system errors on cluster with enabled persistence

2019-02-21 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-11378:
-
Attachment: (was: CheckpointLockReproducer.java)

> Critical system errors on cluster with enabled persistence
> -
>
> Key: IGNITE-11378
> URL: https://issues.apache.org/jira/browse/IGNITE-11378
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Pavel Vinokurov
>Priority: Major
> Attachments: CheckpointLockReproducer.java
>
>
> The attached reproducer shows the following exception during streaming data 
> to cache:
> [2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
> 0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will 
> be handled accordingly to configured handler 
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-5, 
> igniteInstanceName=3, finished=false, heartbeatTs=1550754912905]]]
> class org.apache.ignite.IgniteException: GridWorker 
> [name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
> heartbeatTs=1550754912905]
> If the blocked timeout is changed by  
> cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and 
> restarting several nodes the following critical error occurs:
> [2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
> system-critical thread has been detected. This can lead to cluster-wide 
> undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
> [2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
> [name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
> waitCnt=729]
> [2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
> igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11378) Critical system errors on cluster with enabled persistence

2019-02-21 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-11378:


 Summary: Critical system errors on cluster with enabled persistence
 Key: IGNITE-11378
 URL: https://issues.apache.org/jira/browse/IGNITE-11378
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Reporter: Pavel Vinokurov
 Attachments: CheckpointLockReproducer.java

The attached reproducer shows the following exception during streaming data to 
cache:
[2019-02-21 16:15:23,202][ERROR][tcp-disco-msg-worker-[100a6976 
0:0:0:0:0:0:0:1%lo:47500]-#19%3%][root] Critical system error detected. Will be 
handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0, super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet []]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
[name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
heartbeatTs=1550754912905]]]
class org.apache.ignite.IgniteException: GridWorker 
[name=data-streamer-stripe-5, igniteInstanceName=3, finished=false, 
heartbeatTs=1550754912905]

If the blocked timeout is changed by  
cfg.setSystemWorkerBlockedTimeout(30_000), after streaming data and restarting 
several nodes the following critical error occurs:
[2019-02-21 16:24:07,164][ERROR][grid-timeout-worker-#214%client%][G] Blocked 
system-critical thread has been detected. This can lead to cluster-wide 
undefined behaviour [threadName=tcp-comm-worker, blockedFor=36s]
[2019-02-21 16:24:07,166][WARN ][grid-timeout-worker-#214%client%][G] Thread 
[name="tcp-comm-worker-#25%client%", id=482, state=TIMED_WAITING, blockCnt=0, 
waitCnt=729]

[2019-02-21 16:24:07,168][ERROR][grid-timeout-worker-#214%client%][root] 
Critical system error detected. Will be handled accordingly to configured 
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]]]
class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, 
igniteInstanceName=client, finished=false, heartbeatTs=1550755410331]
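
For illustration, a minimal Java sketch of the setup described above: persistence enabled, the blocked-worker timeout raised to 30 seconds, and data streamed into a cache. The cache name and payload are assumptions, not taken from the attached reproducer.

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class StreamerTimeoutSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Persistence is enabled, as on the affected cluster.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        cfg.setDataStorageConfiguration(storageCfg);

        // The timeout change mentioned in the description.
        cfg.setSystemWorkerBlockedTimeout(30_000);

        // The failure handler that appears in the log output.
        cfg.setFailureHandler(new StopNodeOrHaltFailureHandler());

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cluster().active(true); // persistent clusters start inactive

            ignite.getOrCreateCache("streamedCache"); // assumed cache name

            try (IgniteDataStreamer<Integer, byte[]> streamer = ignite.dataStreamer("streamedCache")) {
                for (int i = 0; i < 1_000_000; i++)
                    streamer.addData(i, new byte[64]);
            }
        }
    }
}
{code}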





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10873) CorruptedTreeException during simultaneous cache put operations

2019-01-09 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-10873:
-
Component/s: sql

> CorruptedTreeException during simultaneous cache put operations
> ---
>
> Key: IGNITE-10873
> URL: https://issues.apache.org/jira/browse/IGNITE-10873
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, persistence, sql
>Affects Versions: 2.7
>Reporter: Pavel Vinokurov
>Priority: Critical
>
> [2019-01-09 20:47:04,376][ERROR][pool-9-thread-9][GridDhtAtomicCache]  
> Unexpected exception during cache update
> org.h2.message.DbException: General error: "class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on row: Row@780acfb4[ key: .. ][ GTEST, null, 254, null, 
> null, null, null, 0, null, null, null, null, null, null, null, 0, 0, 0, null, 
> 0, 0, 0, 0, 0, 0, 0, null, 0, 0, null, 0, null, 0, null, 0, null, null, null, 
> 0, 0, 0, 0, 0, 0, null, null, null, null, null, null, null, 0.0, 0, 0.0, 0, 
> 0.0, 0, null, 0, 0, 0, 0, null, null, null, null, null, null, null, null, 
> null, null, null, null, null, null, null, null ]" [5-197]
>   at org.h2.message.DbException.get(DbException.java:168)
>   at org.h2.message.DbException.convert(DbException.java:307)
>   at 
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:302)
>   at 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:546)
>   at 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:479)
>   at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:768)
>   at 
> org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1905)
>   at 
> org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:404)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:2633)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1646)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1621)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1935)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:428)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2295)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2494)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:1951)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1780)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:299)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:483)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:443)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1153)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:611)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2449)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2426)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1105)
>   at 
> 

[jira] [Updated] (IGNITE-10873) CorruptedTreeException during simultaneous cache put operations

2019-01-09 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-10873:
-
Description: 
[2019-01-09 20:47:04,376][ERROR][pool-9-thread-9][GridDhtAtomicCache]  
Unexpected exception during cache update
org.h2.message.DbException: General error: "class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on row: Row@780acfb4[ key: .. ][ GTEST, null, 254, null, 
null, null, null, 0, null, null, null, null, null, null, null, 0, 0, 0, null, 
0, 0, 0, 0, 0, 0, 0, null, 0, 0, null, 0, null, 0, null, 0, null, null, null, 
0, 0, 0, 0, 0, 0, null, null, null, null, null, null, null, 0.0, 0, 0.0, 0, 
0.0, 0, null, 0, 0, 0, 0, null, null, null, null, null, null, null, null, null, 
null, null, null, null, null, null, null ]" [5-197]
at org.h2.message.DbException.get(DbException.java:168)
at org.h2.message.DbException.convert(DbException.java:307)
at 
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:302)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:546)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:479)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:768)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1905)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:404)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:2633)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1646)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1621)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1935)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:428)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2295)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2494)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:1951)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1780)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:299)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:483)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:443)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1153)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:611)
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2449)
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2426)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1105)
at 
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:820)
at IndexCorruptionReproducer$1.run(IndexCorruptionReproducer.java:43)
at 
java.util.concurrent.Executors$RunnableAdapter.call$$$capture(Executors.java:511)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java)
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
at java.util.concurrent.FutureTask.run(FutureTask.java)
at 

[jira] [Created] (IGNITE-10873) CorruptedTreeException during simultaneous cache put operations

2019-01-09 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-10873:


 Summary: CorruptedTreeException during simultaneous cache put 
operations
 Key: IGNITE-10873
 URL: https://issues.apache.org/jira/browse/IGNITE-10873
 Project: Ignite
  Issue Type: Bug
  Components: cache, persistence
Affects Versions: 2.7
Reporter: Pavel Vinokurov


[2019-01-09 20:47:04,376][ERROR][pool-9-thread-9][GridDhtAtomicCache] 
 Unexpected exception during cache update
org.h2.message.DbException: General error: "class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on row: Row@780acfb4[ key: model.TbclsrInputKey 
[idHash=1823856383, hash=275143246, clsbInputRef=GTEST, firstInputFlag=254], 
val: model.TbclsrInput [idHash=708235920, hash=-19147671, clsbMatchRef=null, 
origBic=null, desStlmtMbrBic=null, cpBic=null, cpDesSmBic=null, 
desSmManuAuth=0, origRef=null, relatedRef=null, commonRef=null, 
clsbTransRef=null, lastAmdSendRef=null, branchId=null, inputType=null, 
formatType=0, sourceType=0, sourceId=0, operType=null, fwdBookFlag=0, 
possDupFlag=0, sameDayFlag=0, pendingFlag=0, rescOrigSmFlag=0, 
rescCpCpsmFlag=0, stlmtEligFlag=0, authTms=null, ntfId=0, inputStatus=0, 
lastActionTms=null, ofacStatus=0, ofacTms=null, prevInputStatus=0, 
prevTms=null, cpOfacStatus=0, sentDt=null, valueDt=null, tradeDt=null, 
origSuspFlag=0, origSmSuspFlag=0, cpSuspFlag=0, cpSmSuspFlag=0, currSuspFlag=0, 
tpIndicatorFlag=0, tpBic=null, tpReference=null, tpFreeText=null, 
tpFurtherRef=null, tpCustIntRef=null, tpMbrField1=null, tpMbrField2=null, 
exchRate=0.0, currIdBuy=0, volBuy=0.0, currIdSell=0, volSell=0.0, 
inputVersionId=0, versionId=null, grpQueueOrderNo=0, queueOrderNo=0, 
originalGroupId=0, groupStatus=0, usi=null, prevUsi=null, origLei=null, 
cpLei=null, fundLei=null, reportJuris=null, execVenue=null, execTms=null, 
execTmsUtcoff=null, mappingRule=null, reportJuris2=null, usi2=null, 
prevUsi2=null, reportJuris3=null, usi3=null, prevUsi3=null], ver: 
GridCacheVersion [topVer=158536014, order=1547056011256, nodeOrder=1] ][ GTEST, 
null, 254, null, null, null, null, 0, null, null, null, null, null, null, null, 
0, 0, 0, null, 0, 0, 0, 0, 0, 0, 0, null, 0, 0, null, 0, null, 0, null, 0, 
null, null, null, 0, 0, 0, 0, 0, 0, null, null, null, null, null, null, null, 
0.0, 0, 0.0, 0, 0.0, 0, null, 0, 0, 0, 0, null, null, null, null, null, null, 
null, null, null, null, null, null, null, null, null, null ]" [5-197]
at org.h2.message.DbException.get(DbException.java:168)
at org.h2.message.DbException.convert(DbException.java:307)
at 
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:302)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:546)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:479)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:768)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1905)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:404)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:2633)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1646)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1621)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1935)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:428)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2295)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2494)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:1951)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1780)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:299)
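
For context, a hedged sketch of the kind of workload the trace above points at: many threads issuing cache.put() on overlapping keys of the same cache. The cache name, key range and thread count are assumptions; in the reported setup the cache is persistent and SQL-indexed, which is what puts H2TreeIndex on the update path.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class ConcurrentPutSketch {
    public static void main(String[] args) throws Exception {
        try (Ignite ignite = Ignition.start()) {
            // In the reported setup the cache is persistent and has SQL indexes
            // (configured elsewhere via DataStorageConfiguration and query entities).
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("personCache");

            ExecutorService pool = Executors.newFixedThreadPool(16);

            for (int t = 0; t < 16; t++) {
                pool.submit(() -> {
                    // Overlapping keys from many threads, mirroring the simultaneous updates.
                    for (int i = 0; i < 100_000; i++)
                        cache.put(i % 1_000, "value-" + i);
                });
            }

            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
        }
    }
}
{code}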

[jira] [Created] (IGNITE-10524) IgniteCache.iterator() from a client node leads to OOM

2018-12-04 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-10524:


 Summary: IgniteCache.iterator() from a client node leads to OOM
 Key: IGNITE-10524
 URL: https://issues.apache.org/jira/browse/IGNITE-10524
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.4
Reporter: Pavel Vinokurov


Looks like "iterator()" method perform a scan query and load all cache rows 
into heap. 
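
For illustration, a short sketch of the call in question next to a paged ScanQuery, which keeps only one page of entries on the client at a time; the cache name and page size are assumptions.

{code:java}
import javax.cache.Cache;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

public class ClientIteratorSketch {
    public static void main(String[] args) {
        Ignition.setClientMode(true);

        try (Ignite client = Ignition.start()) {
            IgniteCache<Integer, String> cache = client.cache("bigCache");

            // The reported pattern: iterating the whole cache from the client node.
            for (Cache.Entry<Integer, String> e : cache)
                process(e);

            // A paged scan streams results page by page instead of materializing them all.
            ScanQuery<Integer, String> scan = new ScanQuery<>();
            scan.setPageSize(1024);

            try (QueryCursor<Cache.Entry<Integer, String>> cur = cache.query(scan)) {
                for (Cache.Entry<Integer, String> e : cur)
                    process(e);
            }
        }
    }

    private static void process(Cache.Entry<Integer, String> e) {
        // placeholder for per-entry work
    }
}
{code}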



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10291) Unable to find row by index created on partial baseline topology

2018-11-16 Thread Pavel Vinokurov (JIRA)
Pavel Vinokurov created IGNITE-10291:


 Summary: Unable to find row by index created on partial baseline 
topology
 Key: IGNITE-10291
 URL: https://issues.apache.org/jira/browse/IGNITE-10291
 Project: Ignite
  Issue Type: Bug
  Components: cache, sql
Affects Versions: 2.6, 2.5, 2.4
Reporter: Pavel Vinokurov
 Attachments: Reproducer.java

Steps to reproduce:
1. Start a two-node cluster with persistence.
2. Create table
CREATE TABLE PERSON (
 FIRST_NAME VARCHAR,
 LAST_NAME VARCHAR,
 ADDRESS VARCHAR,
 LANG VARCHAR,
 BIRTH_DATE TIMESTAMP,
 CONSTRAINT PK_PESON PRIMARY KEY (FIRST_NAME,LAST_NAME,ADDRESS,LANG)
) WITH "key_type=PersonKeyType, CACHE_NAME=PersonCache, 
value_type=PersonValueType, 
AFFINITY_KEY=FIRST_NAME,template=PARTITIONED,backups=1"

Insert 1000 rows.
3. Stop the second node.
4. Create index
create index PERSON_FIRST_NAME_IDX on  PERSON(FIRST_NAME)
5. Start the second node.
6. Perform a select query for each row:
select * from PERSON use index(PERSON_FIRST_NAME_IDX) 
where 
FIRST_NAME=?
and LAST_NAME=?
and ADDRESS=?
and LANG  = ? 

Result: The select doesn't return the row in about half of the cases.

The reproducer is attached.
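
For illustration, a hedged JDBC sketch of steps 4 and 6 above using the thin driver; the connection URL and the bound values are assumptions, while the index DDL and the query text come from the description.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class PartialBaselineIndexSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1")) {
            // Step 4: create the index while the second node is stopped.
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("CREATE INDEX PERSON_FIRST_NAME_IDX ON PERSON(FIRST_NAME)");
            }

            // Step 6: after the second node is restarted, look up each row through the index.
            String sql = "SELECT * FROM PERSON USE INDEX(PERSON_FIRST_NAME_IDX) " +
                "WHERE FIRST_NAME = ? AND LAST_NAME = ? AND ADDRESS = ? AND LANG = ?";

            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "first-0"); // sample values; the reproducer iterates all 1000 rows
                ps.setString(2, "last-0");
                ps.setString(3, "addr-0");
                ps.setString(4, "en");

                try (ResultSet rs = ps.executeQuery()) {
                    System.out.println("Row found: " + rs.next());
                }
            }
        }
    }
}
{code}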




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10110) SQL query with DISTINCT and JOIN in subquery produces "Column not found"

2018-11-01 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-10110:
-
Description: 
Initial script:
CREATE TABLE Person(
  person_id INTEGER PRIMARY KEY,
  company_id INTEGER,
  last_name VARCHAR(100)
);

CREATE TABLE Company(
  company_id INTEGER PRIMARY KEY,
  location_id INTEGER
);

CREATE TABLE Department(
  department_id INTEGER PRIMARY KEY,
  person_id INTEGER
);

CREATE TABLE Organization(
  organization_id INTEGER PRIMARY KEY,
  company_id INTEGER
);

Query:

{code:java}
SELECT
last_name
FROM
( SELECT
last_name,
person_id,
company_id
FROM
( SELECT
last_name,
person_id,
p.company_id as company_id
FROM
Person p
INNER JOIN
(
SELECT
DISTINCT location_id,
company_id
FROM
Company
WHERE
location_id = 1
) cpy
ON (
p.company_id = cpy.company_id
)
) a
) src
INNER JOIN
department dep
ON src.person_id = dep.person_id
LEFT JOIN
organization og
ON src.company_id = og.company_id
{code}


Result:
Caused by: org.h2.jdbc.JdbcSQLException: Column "SRC__Z4.COMPANY_ID" not found; 
SQL statement:
SELECT
DEP__Z5.PERSON_ID __C2_0
FROM PUBLIC.DEPARTMENT DEP__Z5 
 LEFT OUTER JOIN PUBLIC.ORGANIZATION OG__Z6 
 ON SRC__Z4.COMPANY_ID = OG__Z6.COMPANY_ID

  was:
Initial script:
CREATE TABLE Person(
  person_id INTEGER PRIMARY KEY,
  company_id INTEGER,
  last_name VARCHAR(100)
);

CREATE TABLE Company(
  company_id INTEGER PRIMARY KEY,
  location_id INTEGER
);

CREATE TABLE Department(
  department_id INTEGER PRIMARY KEY,
  person_id INTEGER
);

CREATE TABLE Organization(
  organization_id INTEGER PRIMARY KEY,
  company_id INTEGER
);

Query:
SELECT
last_name
FROM
(  SELECT
last_name,
person_id,
company_id
FROM
( SELECT
last_name,
person_id,
p.company_id as company_id
FROM
Person p
INNER JOIN
(
SELECT
DISTINCT location_id,
company_id
FROM
Company
WHERE
location_id = 1
) cpy
ON (
p.company_id = cpy.company_id
)
) a
  ) src
INNER JOIN
department dep
ON src.person_id = dep.person_id
LEFT JOIN
organization og
ON src.company_id = og.company_id

Result:
Caused by: org.h2.jdbc.JdbcSQLException: Column "SRC__Z4.COMPANY_ID" not found; 
SQL statement:
SELECT
DEP__Z5.PERSON_ID __C2_0
FROM PUBLIC.DEPARTMENT DEP__Z5 
 LEFT OUTER JOIN PUBLIC.ORGANIZATION OG__Z6 
 ON SRC__Z4.COMPANY_ID = OG__Z6.COMPANY_ID


> SQL query with DISTINCT and JOIN in subquery produces "Column not found"
> -
>
> Key: IGNITE-10110
> URL: https://issues.apache.org/jira/browse/IGNITE-10110
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.4
>Reporter: Pavel Vinokurov
>Priority: Major
>  Labels: sql
>
> Initial script:
> CREATE TABLE Person(
>   person_id INTEGER PRIMARY KEY,
>   company_id INTEGER,
>   last_name VARCHAR(100)
> );
> CREATE TABLE Company(
>   company_id INTEGER PRIMARY KEY,
>   location_id INTEGER
> );
> CREATE TABLE Department(
>   department_id INTEGER PRIMARY KEY,
>   person_id INTEGER
> );
> CREATE TABLE Organization(
>   organization_id INTEGER PRIMARY KEY,
>   company_id INTEGER
> );
> Query:
> {code:java}
> SELECT
> last_name
> FROM
> ( SELECT
> last_name,
> person_id,
> company_id
> FROM
> ( SELECT
> last_name,
> person_id,
> p.company_id as company_id
> FROM
> Person p
> INNER JOIN
> (
> SELECT
> DISTINCT location_id,
> company_id
> FROM
> Company
> WHERE
> location_id = 1
> ) cpy
> ON (
> p.company_id = cpy.company_id
> )
> ) a
> ) src
> INNER JOIN
> department dep
> ON src.person_id = dep.person_id
> LEFT JOIN
> organization og
> ON src.company_id = og.company_id
> {code}
> Result:
> Caused by: org.h2.jdbc.JdbcSQLException: Column "SRC__Z4.COMPANY_ID" not 
> found; SQL statement:
> SELECT
> DEP__Z5.PERSON_ID __C2_0
> FROM PUBLIC.DEPARTMENT DEP__Z5 
>  LEFT OUTER JOIN PUBLIC.ORGANIZATION OG__Z6 
>  ON SRC__Z4.COMPANY_ID = OG__Z6.COMPANY_ID



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10110) SQL query with DISTINCT and JOIN in subquery produces "Column not found"

2018-11-01 Thread Pavel Vinokurov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Vinokurov updated IGNITE-10110:
-
Description: 
Initial script:
CREATE TABLE Person(
  person_id INTEGER PRIMARY KEY,
  company_id INTEGER,
  last_name VARCHAR(100)
);

CREATE TABLE Company(
  company_id INTEGER PRIMARY KEY,
  location_id INTEGER
);

CREATE TABLE Department(
  department_id INTEGER PRIMARY KEY,
  person_id INTEGER
);

CREATE TABLE Organization(
  organization_id INTEGER PRIMARY KEY,
  company_id INTEGER
);

Query:

{code:java}
SELECT
last_name
FROM
(  SELECT
last_name,
person_id,
company_id
FROM
( SELECT
last_name,
person_id,
p.company_id as company_id
FROM
Person p
INNER JOIN
(
SELECT
DISTINCT location_id,
company_id
FROM
Company
WHERE
location_id = 1
) cpy
ON (
p.company_id = cpy.company_id
)
) a
  ) src
INNER JOIN
department dep
ON src.person_id = dep.person_id
LEFT JOIN
organization og
ON src.company_id = og.company_id
{code}


Result:
Caused by: org.h2.jdbc.JdbcSQLException: Column "SRC__Z4.COMPANY_ID" not found; 
SQL statement:
SELECT
DEP__Z5.PERSON_ID __C2_0
FROM PUBLIC.DEPARTMENT DEP__Z5 
 LEFT OUTER JOIN PUBLIC.ORGANIZATION OG__Z6 
 ON SRC__Z4.COMPANY_ID = OG__Z6.COMPANY_ID

  was:
Initial script:
CREATE TABLE Person(
  person_id INTEGER PRIMARY KEY,
  company_id INTEGER,
  last_name VARCHAR(100)
);

CREATE TABLE Company(
  company_id INTEGER PRIMARY KEY,
  location_id INTEGER
);

CREATE TABLE Department(
  department_id INTEGER PRIMARY KEY,
  person_id INTEGER
);

CREATE TABLE Organization(
  organization_id INTEGER PRIMARY KEY,
  company_id INTEGER
);

Query:

{code:java}
SELECT
last_name
FROM
( SELECT
last_name,
person_id,
company_id
FROM
( SELECT
last_name,
person_id,
p.company_id as company_id
FROM
Person p
INNER JOIN
(
SELECT
DISTINCT location_id,
company_id
FROM
Company
WHERE
location_id = 1
) cpy
ON (
p.company_id = cpy.company_id
)
) a
) src
INNER JOIN
department dep
ON src.person_id = dep.person_id
LEFT JOIN
organization og
ON src.company_id = og.company_id
{code}


Result:
Caused by: org.h2.jdbc.JdbcSQLException: Column "SRC__Z4.COMPANY_ID" not found; 
SQL statement:
SELECT
DEP__Z5.PERSON_ID __C2_0
FROM PUBLIC.DEPARTMENT DEP__Z5 
 LEFT OUTER JOIN PUBLIC.ORGANIZATION OG__Z6 
 ON SRC__Z4.COMPANY_ID = OG__Z6.COMPANY_ID


> SQL query with DISTINCT and JOIN in subquery produces "Column not found"
> -
>
> Key: IGNITE-10110
> URL: https://issues.apache.org/jira/browse/IGNITE-10110
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.4
>Reporter: Pavel Vinokurov
>Priority: Major
>  Labels: sql
>
> Initial script:
> CREATE TABLE Person(
>   person_id INTEGER PRIMARY KEY,
>   company_id INTEGER,
>   last_name VARCHAR(100)
> );
> CREATE TABLE Company(
>   company_id INTEGER PRIMARY KEY,
>   location_id INTEGER
> );
> CREATE TABLE Department(
>   department_id INTEGER PRIMARY KEY,
>   person_id INTEGER
> );
> CREATE TABLE Organization(
>   organization_id INTEGER PRIMARY KEY,
>   company_id INTEGER
> );
> Query:
> {code:java}
> SELECT
> last_name
> FROM
> (  SELECT
> last_name,
> person_id,
> company_id
> FROM
> ( SELECT
> last_name,
> person_id,
> p.company_id as company_id
> FROM
> Person p
> INNER JOIN
> (
> SELECT
> DISTINCT location_id,
> company_id
> FROM
> Company
> WHERE
> location_id = 1
> ) cpy
> ON (
> p.company_id = cpy.company_id
> )
> ) a
>   ) src
> INNER JOIN
> department dep
> ON src.person_id = dep.person_id
> LEFT JOIN
> organization og
> ON src.company_id = og.company_id
> {code}
> Result:
> Caused by: org.h2.jdbc.JdbcSQLException: Column "SRC__Z4.COMPANY_ID" not 
> found; SQL statement:
> SELECT
> DEP__Z5.PERSON_ID __C2_0
> FROM PUBLIC.DEPARTMENT DEP__Z5 
>  LEFT OUTER JOIN PUBLIC.ORGANIZATION OG__Z6 
>  ON SRC__Z4.COMPANY_ID = OG__Z6.COMPANY_ID



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-10110) SQL query with DISTINCT and JOIN in subquery produces "Column not found"

2018-11-01 Thread Pavel Vinokurov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671457#comment-16671457
 ] 

Pavel Vinokurov edited comment on IGNITE-10110 at 11/1/18 10:48 AM:


{code:java}
SELECT
last_name
FROM
(   SELECT
DISTINCT last_name,
person_id,
company_id
FROM Person
  ) src
INNER JOIN
department dep
ON src.person_id = dep.person_id
LEFT JOIN
organization og
ON src.company_id = og.company_id
{code}


was (Author: pvinokurov):
Simplified query:
SELECT
last_name
FROM
(   SELECT
DISTINCT last_name,
person_id,
company_id
FROM Person
  ) src
INNER JOIN
department dep
ON src.person_id = dep.person_id
LEFT JOIN
organization og
ON src.company_id = og.company_id

> SQL query with DISTINCT and JOIN in subquery produces "Column not found"
> -
>
> Key: IGNITE-10110
> URL: https://issues.apache.org/jira/browse/IGNITE-10110
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.4
>Reporter: Pavel Vinokurov
>Priority: Major
>  Labels: sql
>
> Initial script:
> CREATE TABLE Person(
>   person_id INTEGER PRIMARY KEY,
>   company_id INTEGER,
>   last_name VARCHAR(100)
> );
> CREATE TABLE Company(
>   company_id INTEGER PRIMARY KEY,
>   location_id INTEGER
> );
> CREATE TABLE Department(
>   department_id INTEGER PRIMARY KEY,
>   person_id INTEGER
> );
> CREATE TABLE Organization(
>   organization_id INTEGER PRIMARY KEY,
>   company_id INTEGER
> );
> Query:
> SELECT
> last_name
> FROM
> (  SELECT
> last_name,
> person_id,
> company_id
> FROM
> ( SELECT
> last_name,
> person_id,
> p.company_id as company_id
> FROM
> Person p
> INNER JOIN
> (
> SELECT
> DISTINCT location_id,
> company_id
> FROM
> Company
> WHERE
> location_id = 1
> ) cpy
> ON (
> p.company_id = cpy.company_id
> )
> ) a
>   ) src
> INNER JOIN
> department dep
> ON src.person_id = dep.person_id
> LEFT JOIN
> organization og
> ON src.company_id = og.company_id
> Result:
> Caused by: org.h2.jdbc.JdbcSQLException: Column "SRC__Z4.COMPANY_ID" not 
> found; SQL statement:
> SELECT
> DEP__Z5.PERSON_ID __C2_0
> FROM PUBLIC.DEPARTMENT DEP__Z5 
>  LEFT OUTER JOIN PUBLIC.ORGANIZATION OG__Z6 
>  ON SRC__Z4.COMPANY_ID = OG__Z6.COMPANY_ID



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-10110) SQL query with DISTINCT and JOIN in subquery produces "Column not found"

2018-11-01 Thread Pavel Vinokurov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671457#comment-16671457
 ] 

Pavel Vinokurov edited comment on IGNITE-10110 at 11/1/18 10:48 AM:


Simplified query:
{code:java}
SELECT
last_name
FROM
(   SELECT
DISTINCT last_name,
person_id,
company_id
FROM Person
  ) src
INNER JOIN
department dep
ON src.person_id = dep.person_id
LEFT JOIN
organization og
ON src.company_id = og.company_id
{code}


was (Author: pvinokurov):
{code:java}
SELECT
last_name
FROM
(   SELECT
DISTINCT last_name,
person_id,
company_id
FROM Person
  ) src
INNER JOIN
department dep
ON src.person_id = dep.person_id
LEFT JOIN
organization og
ON src.company_id = og.company_id
{code}

> SQL query with DISTINCT and JOIN in subquery produces "Column not found"
> -
>
> Key: IGNITE-10110
> URL: https://issues.apache.org/jira/browse/IGNITE-10110
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.4
>Reporter: Pavel Vinokurov
>Priority: Major
>  Labels: sql
>
> Initial script:
> CREATE TABLE Person(
>   person_id INTEGER PRIMARY KEY,
>   company_id INTEGER,
>   last_name VARCHAR(100)
> );
> CREATE TABLE Company(
>   company_id INTEGER PRIMARY KEY,
>   location_id INTEGER
> );
> CREATE TABLE Department(
>   department_id INTEGER PRIMARY KEY,
>   person_id INTEGER
> );
> CREATE TABLE Organization(
>   organization_id INTEGER PRIMARY KEY,
>   company_id INTEGER
> );
> Query:
> SELECT
> last_name
> FROM
> (  SELECT
> last_name,
> person_id,
> company_id
> FROM
> ( SELECT
> last_name,
> person_id,
> p.company_id as company_id
> FROM
> Person p
> INNER JOIN
> (
> SELECT
> DISTINCT location_id,
> company_id
> FROM
> Company
> WHERE
> location_id = 1
> ) cpy
> ON (
> p.company_id = cpy.company_id
> )
> ) a
>   ) src
> INNER JOIN
> department dep
> ON src.person_id = dep.person_id
> LEFT JOIN
> organization og
> ON src.company_id = og.company_id
> Result:
> Caused by: org.h2.jdbc.JdbcSQLException: Column "SRC__Z4.COMPANY_ID" not 
> found; SQL statement:
> SELECT
> DEP__Z5.PERSON_ID __C2_0
> FROM PUBLIC.DEPARTMENT DEP__Z5 
>  LEFT OUTER JOIN PUBLIC.ORGANIZATION OG__Z6 
>  ON SRC__Z4.COMPANY_ID = OG__Z6.COMPANY_ID



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

