[jira] [Assigned] (IGNITE-21051) Fix javadocs for IndexQuery

2023-12-12 Thread Oleg Valuyskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Valuyskiy reassigned IGNITE-21051:
---

Assignee: Oleg Valuyskiy

> Fix javadocs for IndexQuery
> ---
>
> Key: IGNITE-21051
> URL: https://issues.apache.org/jira/browse/IGNITE-21051
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Maksim Timonin
>Assignee: Oleg Valuyskiy
>Priority: Major
>  Labels: ise, newbie
>
> It's required to fix the javadoc formatting in the `IndexQuery` class. Currently it 
> renders the algorithm list on a single line. It should use "ul" and "li" tags for 
> correct rendering.
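A minimal sketch of the kind of fix described above. The class name and the list items are illustrative assumptions; only the use of "ul"/"li" tags in the javadoc is the point.

```java
/**
 * Index query over a cache.
 *
 * <p>Supported search criteria (illustrative list):
 * <ul>
 *     <li>Range scan over a sorted index.</li>
 *     <li>Exact key lookup.</li>
 * </ul>
 */
public class IndexQueryDocExample {
    // Body omitted; only the javadoc list formatting is illustrated.
}
```

Without the "ul"/"li" tags, javadoc collapses the whitespace and renders the list as one run-on line.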



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20992) ODBC 3.0: Propagate username to connection_info

2023-12-12 Thread Igor Sapego (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Sapego updated IGNITE-20992:
-
Fix Version/s: 3.0.0-beta2

> ODBC 3.0: Propagate username to connection_info
> ---
>
> Key: IGNITE-20992
> URL: https://issues.apache.org/jira/browse/IGNITE-20992
> Project: Ignite
>  Issue Type: New Feature
>  Components: odbc
>Reporter: Dmitrii Zabotlin
>Assignee: Dmitrii Zabotlin
>Priority: Major
>  Labels: ignite-3, odbc
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (IGNITE-20992) ODBC 3.0: Propagate username to connection_info

2023-12-12 Thread Igor Sapego (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Sapego updated IGNITE-20992:
-
Ignite Flags:   (was: Docs Required,Release Notes Required)







[jira] [Commented] (IGNITE-20992) ODBC 3.0: Propagate username to connection_info

2023-12-12 Thread Igor Sapego (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795904#comment-17795904
 ] 

Igor Sapego commented on IGNITE-20992:
--

Looks good to me.







[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795899#comment-17795899
 ] 

Vipul Thakur commented on IGNITE-21059:
---

Thank you for your response, [~zstan].

I will make the above changes and let you know how it goes; I will also provide you 
the logs from all nodes.

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>
> We recently upgraded from 2.7.6 to 2.14 due to an issue observed in the 
> production environment where the cluster would go into a hang state due to partition 
> map exchange.
> Please find below the ticket which I created a while back for Ignite 2.7.6:
> https://issues.apache.org/jira/browse/IGNITE-13298
> So we migrated the Apache Ignite version to 2.14 and the upgrade happened 
> smoothly, but on the third day we could see the cluster traffic dip again. 
> We have 5 nodes in a cluster where we provide 400 GB of RAM and more than 1 
> TB SSD.
> PFB the attached config. [I have added it as an attachment for review.]
> I have also added the server logs from the time when the issue happened.
> We have set the txn timeout as well as the socket timeout, at both the server and client 
> end, for our write operations, but it seems like sometimes the cluster goes into a hang 
> state and all our get calls get stuck; slowly everything starts to freeze 
> our JMS listener threads, and every thread reaches a choked-up state in 
> some time.
> Due to this, our read services, which do not even use txns to retrieve data, 
> also start to choke, ultimately leading to an end-user traffic dip.
> We were hoping the product upgrade would help, but that has not been the case till 
> now.





[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Evgeny Stanilovsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795883#comment-17795883
 ] 

Evgeny Stanilovsky commented on IGNITE-21059:
-

1. Yep, just erase or comment them out in the config.
2. OK here.
3. 30 sec is too much. If you detect a tx rollback by timeout, you can rerun it 
(check: an optimistic tx may be faster), but there are some differences; both tx 
writes AND reads can throw exceptions! 
https://ignite.apache.org/docs/latest/key-value-api/transactions
4. Can't suggest anything here; one needs to consider the concrete usage.
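The rerun-on-timeout advice in point 3 can be sketched as a plain retry loop. This is a self-contained stand-in: `TxTimeoutException` here is a placeholder for Ignite's real `TransactionTimeoutException`, and actual code would open the transaction with `ignite.transactions().txStart(...)` inside the work callback.

```java
import java.util.concurrent.Callable;

public class TxRetryExample {
    /** Stand-in for Ignite's TransactionTimeoutException. */
    static class TxTimeoutException extends RuntimeException {}

    /** Runs the transactional work, rerunning it if it is rolled back by timeout. */
    static <T> T runWithRetry(Callable<T> txWork, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return txWork.call();
            }
            catch (TxTimeoutException e) {
                // Give up after the configured number of reruns.
                if (attempt >= maxAttempts)
                    throw e;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};

        // Simulated work that times out once and then commits.
        String res = runWithRetry(() -> {
            if (++calls[0] < 2)
                throw new TxTimeoutException();
            return "committed";
        }, 3);

        System.out.println(res + " after " + calls[0] + " attempts");
    }
}
```

With a short timeout, a retry loop like this bounds how long any one transaction can block, instead of letting it hang for the full failure-detection interval.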






[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795867#comment-17795867
 ] 

Vipul Thakur commented on IGNITE-21059:
---

So as per my understanding I will be doing the following; please correct me if 
I am wrong:

failureDetectionTimeout and clientFailureDetectionTimeout will switch back to 
their default values, which are 10 secs and 30 secs.

I will increase the walSegmentSize from the default 64 MB to a bigger value, maybe 
around 512 MB. [The limit value is 2 GB.]

Any comments regarding the txn timeout value, which is 30 secs at the client?

TcpDiscoveryVmIpFinder: the socket timeout is 60 secs at the server end and 5 secs at the 
client end.
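The plan above corresponds roughly to the following configuration fragment. This is a sketch only: the class name and surrounding wiring are assumptions, the timeout defaults apply simply by not overriding them, and `setWalSegmentSize` takes a value in bytes.

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeConfigSketch {
    static IgniteConfiguration configure() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Leave failureDetectionTimeout / clientFailureDetectionTimeout unset so
        // the defaults apply (10 s for server nodes, 30 s for client nodes).

        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Raise the WAL segment size from the default 64 MB to 512 MB
        // (the hard limit is 2 GB); the value is in bytes.
        storageCfg.setWalSegmentSize(512 * 1024 * 1024);

        cfg.setDataStorageConfiguration(storageCfg);

        return cfg;
    }
}
```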






[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Evgeny Stanilovsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795864#comment-17795864
 ] 

Evgeny Stanilovsky commented on IGNITE-21059:
-

If you are talking about the TcpDiscoveryVmIpFinder socket timeout: these are not 
linked things...
I suggest staying with the defaults for both failureDetectionTimeout and 
clientFailureDetectionTimeout, and tuning them only if you have really found it 
would be helpful. But all failure issues need to be investigated: if the system 
detects a slow client (no matter where the problem is: io/net/jvm pause), it seems you 
don't need such a client, and you need to fix the problem which leads to such a 
situation first.






[jira] [Comment Edited] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795861#comment-17795861
 ] 

Vipul Thakur edited comment on IGNITE-21059 at 12/12/23 5:33 PM:
-

We have a daily requirement of 90-120 million read requests and around 15-20 
million write requests.

Current values:

failureDetectionTimeout=12

clientFailureDetectionTimeout=12

What would be the suggested values? Should we bring this closer to what 
socketTimeout is (like 5 secs), and should these configurations be the same at both the 
server and client end?


was (Author: vipul.thakur):
We have daily requirement of 90-120 millions request for read and around 15-20 
millions write requests

current values : 

failureDetectionTimeout=12

clientFailureDetectionTimeout= 12

What would be the suggested value should bring this closer to what 
socketTimeout is like 5secs and should these configuration be same at both 
server and client end?






[jira] [Comment Edited] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795861#comment-17795861
 ] 

Vipul Thakur edited comment on IGNITE-21059 at 12/12/23 5:32 PM:
-

We have a daily requirement of 90-120 million read requests and around 15-20 
million write requests.

Current values:

failureDetectionTimeout=12

clientFailureDetectionTimeout=12

What would be the suggested value? Should we bring this closer to what 
socketTimeout is (like 5 secs), and should these configurations be the same at both the 
server and client end?


was (Author: vipul.thakur):
We have daily requirement of 90-120 millions request for read and around 15-20 
millions 

current values : 

failureDetectionTimeout=12

clientFailureDetectionTimeout= 12

What would be the suggested value should bring this closer to what 
socketTimeout is like 5secs and should these configuration be same at both 
server and client end?






[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795861#comment-17795861
 ] 

Vipul Thakur commented on IGNITE-21059:
---

We have a daily requirement of 90-120 million read requests and around 15-20 
million write requests.

Current values:

failureDetectionTimeout=12

clientFailureDetectionTimeout=12

What would be the suggested value? Should we bring this closer to what 
socketTimeout is (like 5 secs), and should these configurations be the same at both the 
server and client end?






[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Evgeny Stanilovsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795860#comment-17795860
 ] 

Evgeny Stanilovsky commented on IGNITE-21059:
-

failureDetectionTimeout: too huge, as for me. If some node hangs, the grid will 
wait until this timeout, and problems with txs are expected here.
clientFailureDetectionTimeout: the same.
For rebalanceBatchSize and rebalanceThrottle I suggest the defaults.






[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795859#comment-17795859
 ] 

Vipul Thakur commented on IGNITE-21059:
---

We have also configured a socket timeout at the server and client end, but from the thread 
dump it seems like it's stuck at the get call in all the txns.






[jira] [Comment Edited] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795859#comment-17795859
 ] 

Vipul Thakur edited comment on IGNITE-21059 at 12/12/23 5:12 PM:
-

We have also configured a socket timeout at the server and client end, but from the thread 
dump it seems like it's stuck at the get call in all the txns.


was (Author: vipul.thakur):
We also have configured socket timeout at server and client end but from thread 
dump is seems like its stuck at get call in all the txns.






[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795858#comment-17795858
 ] 

Vipul Thakur commented on IGNITE-21059:
---

In 2.7.6 we used to observe the long JVM pause logger in read services, and not that 
much in write services.

Such behavior is not observed in 2.14. We have another such setup, with the same 
number of nodes in the cluster and the same number of clients, serving as another 
datacenter for our API endpoint. It has been running with no problems for over a 
month now, but when we upgraded our other datacenter this issue occurred after just 
3 days.

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Evgeny Stanilovsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795855#comment-17795855
 ] 

Evgeny Stanilovsky commented on IGNITE-21059:
-

I suppose you don't need such a huge number of readers and writers; client 
nodes are not the bottleneck at all (but that's not the root cause, of 
course). A long JVM pause on a *client* node could lead to your problem, I 
think.

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795851#comment-17795851
 ] 

Vipul Thakur edited comment on IGNITE-21059 at 12/12/23 4:59 PM:
-

We have two k8s clusters connected to that data center; in each k8s cluster, 
10 clients are readers, 10 are writers, and 2 are admin-type services, so 44 
client nodes in total. I have also updated our cluster spec: 5 nodes, 400 GB 
RAM, and 1 TB SSD.

Long JVM pauses were observed in 2.7.6.


was (Author: vipul.thakur):
we have two k8s cluster connected to that datacenter where in each k8s cluster 
10 are read , 10 are write and 2 are kind of admin service. So in total of 44 
client nodes. And i have also updated our cluster spec its 5 nodes , 400GB RAM 
and 1 Tb SDD.

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795851#comment-17795851
 ] 

Vipul Thakur commented on IGNITE-21059:
---

We have two k8s clusters connected to that data center; in each k8s cluster, 
10 clients are readers, 10 are writers, and 2 are admin-type services, so 44 
client nodes in total. I have also updated our cluster spec: 5 nodes, 400 GB 
RAM, and 1 TB SSD.

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795851#comment-17795851
 ] 

Vipul Thakur edited comment on IGNITE-21059 at 12/12/23 4:59 PM:
-

We have two k8s clusters connected to that data center; in each k8s cluster, 
10 clients are readers, 10 are writers, and 2 are admin-type services, so 44 
client nodes in total. I have also updated our cluster spec: 5 nodes, 400 GB 
RAM, and 1 TB SSD.

Long JVM pauses were observed in 2.7.6.


was (Author: vipul.thakur):
we have two k8s cluster connected to that datacenter where in each k8s cluster 
10 are read , 10 are write and 2 are kind of admin service. So in total of 44 
client nodes. And i have also updated our cluster spec its 5 nodes , 400GB RAM 
and 1 Tb SDD.

 

Long JVM pauses were observed in in 2.7.6.

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vipul Thakur updated IGNITE-21059:
--
Description: 
We recently upgraded from 2.7.6 to 2.14 due to an issue observed in our 
production environment where the cluster would hang during partition map 
exchange.

Please find below the ticket I created a while back for Ignite 2.7.6:

https://issues.apache.org/jira/browse/IGNITE-13298

We migrated to Apache Ignite 2.14 and the upgrade went smoothly, but on the 
third day we saw the cluster traffic dip again.

We have 5 nodes in the cluster, each provisioned with 400 GB of RAM and more 
than 1 TB of SSD.

Please find the attached config. [I have added it as an attachment for 
review.]

I have also added the server logs from the time the issue happened.

We have set a transaction timeout as well as a socket timeout, at both the 
server and client ends, for our write operations, but it seems the cluster 
sometimes hangs: all our get calls get stuck, everything slowly starts to 
freeze our JMS listener threads, and after a while every thread is blocked.

As a result, our read services, which do not even use transactions to 
retrieve data, also start to choke, ultimately leading to a dip in end-user 
traffic.

We were hoping the product upgrade would help, but that has not been the 
case so far.

 

 

 

 

 

 

  was:
We have recently upgraded from 2.7.6 to 2.14 due to the issue observed in 
production environment where cluster would go in hang state due to partition 
map exchange.

Please find the below ticket which i created a while back for ignite 2.7.6

https://issues.apache.org/jira/browse/IGNITE-13298

So we migrated the apache ignite version to 2.14 and upgrade happened smoothly 
but on the third day we could see cluster traffic dip again. 

We have 4 nodes in a cluster where we provide 400 GB of RAM and more than 1 TB 
HDD.

PFB for the attached config.[I have added it as attachment for review]

I have also added the server logs from the same time when issue happened.

We have set txn timeout as well as socket timeout both at server and client end 
for our write operations but seems like sometimes cluster goes into hang state 
and all our get calls are stuck and slowly everything starts to freeze our jms 
listener threads and every thread reaches a choked up state in sometime.

Due to which our read services which does not even use txn to retrieve data 
also starts to choke. Ultimately leading to end user traffic dip.

We were hoping product upgrade will help but that has not been the case till 
now. 

 

 

 

 

 

 


> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Evgeny Stanilovsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795846#comment-17795846
 ] 

Evgeny Stanilovsky commented on IGNITE-21059:
-

Do you really need 44 client nodes? It seems that restarting the client nodes 
helps here? Is everything OK with the client nodes? No long JVM pauses?

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795838#comment-17795838
 ] 

Vipul Thakur commented on IGNITE-21059:
---

OK, please give me some time; we will change the WAL size and let you know.

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Evgeny Stanilovsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795836#comment-17795836
 ] 

Evgeny Stanilovsky commented on IGNITE-21059:
-

We need all logs from all nodes for further analysis. Also check 
https://ignite.apache.org/docs/latest/tools/control-script#transaction-management
and change the WAL size.
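
The WAL size change suggested above can be sketched programmatically. This is a minimal, illustrative sketch only: it assumes Java-based configuration, while the cluster in this thread uses a Spring XML config.xml, where the same setting maps to the `walSegmentSize` property of the `DataStorageConfiguration` bean. The 256 MB value is an assumption, not a recommendation from this thread.

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalConfigSketch {
    /** Builds an IgniteConfiguration with an enlarged WAL segment size. */
    public static IgniteConfiguration withLargerWal() {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Default segment size is 64 MB; a larger segment reduces how often
        // segments roll over under heavy write load (illustrative value).
        storageCfg.setWalSegmentSize(256 * 1024 * 1024);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);
        return cfg;
    }
}
```

In the Spring XML used by this cluster, the equivalent would be setting `<property name="walSegmentSize" .../>` on the data storage configuration bean.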

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20652) .NET: Thin 3.0: add SQL script execution API

2023-12-12 Thread Pavel Tupitsyn (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795818#comment-17795818
 ] 

Pavel Tupitsyn commented on IGNITE-20652:
-

Comment addressed. Merged to main: 5bc3d2ccd22e2493231dfa857d24bed27a8373bd

> .NET: Thin 3.0: add SQL script execution API
> 
>
> Key: IGNITE-20652
> URL: https://issues.apache.org/jira/browse/IGNITE-20652
> Project: Ignite
>  Issue Type: Improvement
>  Components: platforms, thin client
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Pereslegin
>Assignee: Pavel Tupitsyn
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Support SQL script execution in dotnet thin client



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-19215) ODBC 3.0: Implement DML data batching

2023-12-12 Thread Igor Sapego (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Sapego reassigned IGNITE-19215:


Assignee: Dmitrii Zabotlin  (was: Igor Sapego)

> ODBC 3.0: Implement DML data batching
> -
>
> Key: IGNITE-19215
> URL: https://issues.apache.org/jira/browse/IGNITE-19215
> Project: Ignite
>  Issue Type: Improvement
>  Components: odbc
>Reporter: Igor Sapego
>Assignee: Dmitrii Zabotlin
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> Scope:
> - Implement server side request handling;
> - Port client side functionality;
> - Port applicable tests;



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-19720) ODBC 3.0: Implement retrieval of Ignite version on handshake

2023-12-12 Thread Igor Sapego (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Sapego reassigned IGNITE-19720:


Assignee: Dmitrii Zabotlin  (was: Igor Sapego)

> ODBC 3.0: Implement retrieval of Ignite version on handshake
> 
>
> Key: IGNITE-19720
> URL: https://issues.apache.org/jira/browse/IGNITE-19720
> Project: Ignite
>  Issue Type: New Feature
>  Components: odbc
>Affects Versions: 3.0.0-beta1
>Reporter: Igor Sapego
>Assignee: Dmitrii Zabotlin
>Priority: Major
>  Labels: ignite-3
>
> SQLGetInfo(SQL_DBMS_VER) should return the current version of the cluster. 
> Currently, the ODBC driver does not have this information. We need to 
> implement retrieval of it during the handshake.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-19969) ODBC 3.0: Add support for period, duration and big_integer types

2023-12-12 Thread Igor Sapego (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Sapego reassigned IGNITE-19969:


Assignee: Dmitrii Zabotlin

> ODBC 3.0: Add support for period, duration and big_integer types
> 
>
> Key: IGNITE-19969
> URL: https://issues.apache.org/jira/browse/IGNITE-19969
> Project: Ignite
>  Issue Type: Improvement
>  Components: odbc
>Reporter: Igor Sapego
>Assignee: Dmitrii Zabotlin
>Priority: Major
>  Labels: ignite-3
>
> We didn't have support for these types in Ignite 2, so we need to implement 
> them from scratch and add some tests for them as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21032) ReadOnlyDynamicMBean.getAttributes may return a list of attribute values instead of Attribute instances

2023-12-12 Thread Simon Greatrix (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795811#comment-17795811
 ] 

Simon Greatrix commented on IGNITE-21032:
-

Fixed the checkstyle issues and updated the pull request. Assigning back to 
[~slava.koptilin] for review and merging.

> ReadOnlyDynamicMBean.getAttributes may return a list of attribute values 
> instead of Attribute instances
> ---
>
> Key: IGNITE-21032
> URL: https://issues.apache.org/jira/browse/IGNITE-21032
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vyacheslav Koptilin
>Assignee: Simon Greatrix
>Priority: Major
> Fix For: 2.16
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When supplying JMX information, the AttributeList class should contain 
> Attributes, however the existing code returns attribute values. This can 
> cause ClassCastExceptions in code that attempts to read an AttributeList.
>  
> [GitHub Issue #11045|https://github.com/apache/ignite/issues/11045]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21032) ReadOnlyDynamicMBean.getAttributes may return a list of attribute values instead of Attribute instances

2023-12-12 Thread Simon Greatrix (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Greatrix reassigned IGNITE-21032:
---

Assignee: Vyacheslav Koptilin  (was: Simon Greatrix)

> ReadOnlyDynamicMBean.getAttributes may return a list of attribute values 
> instead of Attribute instances
> ---
>
> Key: IGNITE-21032
> URL: https://issues.apache.org/jira/browse/IGNITE-21032
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.16
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When supplying JMX information, the AttributeList class should contain 
> Attributes, however the existing code returns attribute values. This can 
> cause ClassCastExceptions in code that attempts to read an AttributeList.
>  
> [GitHub Issue #11045|https://github.com/apache/ignite/issues/11045]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20652) .NET: Thin 3.0: add SQL script execution API

2023-12-12 Thread Pavel Tupitsyn (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Tupitsyn updated IGNITE-20652:

Summary: .NET: Thin 3.0: add SQL script execution API  (was: .NET: Thin 
3.0: support SQL script execution in dotnet thin client)

> .NET: Thin 3.0: add SQL script execution API
> 
>
> Key: IGNITE-20652
> URL: https://issues.apache.org/jira/browse/IGNITE-20652
> Project: Ignite
>  Issue Type: Improvement
>  Components: platforms, thin client
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Pereslegin
>Assignee: Pavel Tupitsyn
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Support SQL script execution in dotnet thin client



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795714#comment-17795714
 ] 

Vipul Thakur edited comment on IGNITE-21059 at 12/12/23 2:42 PM:
-

Hi 

Thank you for the quick response. We have configured the tx timeout on the 
client side; our clients are written in Spring Boot and Java. Is any 
configuration also needed in the server's config.xml?

We will also read about changing-wal-segment-size and make the changes 
accordingly.


was (Author: vipul.thakur):
Hi 

Thank you for quick response, we have configured tx timeout at client end our 
clients are written in spring boot and java , is it needed at server's 
config.xml also ? 

We will also read about chaning-wal-segment-size and make the changes 
accordingly 

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>
> We have recently upgraded from 2.7.6 to 2.14 due to the issue observed in 
> production environment where cluster would go in hang state due to partition 
> map exchange.
> Please find the below ticket which i created a while back for ignite 2.7.6
> https://issues.apache.org/jira/browse/IGNITE-13298
> So we migrated the apache ignite version to 2.14 and upgrade happened 
> smoothly but on the third day we could see cluster traffic dip again. 
> We have 4 nodes in a cluster where we provide 400 GB of RAM and more than 1 
> TB HDD.
> PFB for the attached config.[I have added it as attachment for review]
> I have also added the server logs from the same time when issue happened.
> We have set txn timeout as well as socket timeout both at server and client 
> end for our write operations but seems like sometimes cluster goes into hang 
> state and all our get calls are stuck and slowly everything starts to freeze 
> our jms listener threads and every thread reaches a choked up state in 
> sometime.
> Due to which our read services which does not even use txn to retrieve data 
> also starts to choke. Ultimately leading to end user traffic dip.
> We were hoping product upgrade will help but that has not been the case till 
> now. 
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21072) NamedListConfiguration#get(java.util.UUID) should cast to polymorphic type

2023-12-12 Thread Vadim Pakhnushev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Pakhnushev updated IGNITE-21072:
--
Summary: NamedListConfiguration#get(java.util.UUID) should cast to 
polymorphic type  (was: NamedListConfiguration#get(java.util.UUID) doesn't work 
with polymorphic configurations)

> NamedListConfiguration#get(java.util.UUID) should cast to polymorphic type
> --
>
> Key: IGNITE-21072
> URL: https://issues.apache.org/jira/browse/IGNITE-21072
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vadim Pakhnushev
>Assignee: Vadim Pakhnushev
>Priority: Major
>  Labels: ignite-3
>
> {{NamedListConfiguration#get(java.util.UUID)}} doesn't call 
> {{specificConfigTree}}, so it doesn't cast the value to the specific type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20652) .NET: Thin 3.0: support SQL script execution in dotnet thin client

2023-12-12 Thread Igor Sapego (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795773#comment-17795773
 ] 

Igor Sapego commented on IGNITE-20652:
--

Approved with a little comment.

> .NET: Thin 3.0: support SQL script execution in dotnet thin client
> --
>
> Key: IGNITE-20652
> URL: https://issues.apache.org/jira/browse/IGNITE-20652
> Project: Ignite
>  Issue Type: Improvement
>  Components: platforms, thin client
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Pereslegin
>Assignee: Pavel Tupitsyn
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Support SQL script execution in dotnet thin client



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21072) NamedListConfiguration#get(java.util.UUID) doesn't work with polymorphic configurations

2023-12-12 Thread Vadim Pakhnushev (Jira)
Vadim Pakhnushev created IGNITE-21072:
-

 Summary: NamedListConfiguration#get(java.util.UUID) doesn't work 
with polymorphic configurations
 Key: IGNITE-21072
 URL: https://issues.apache.org/jira/browse/IGNITE-21072
 Project: Ignite
  Issue Type: Bug
Reporter: Vadim Pakhnushev
Assignee: Vadim Pakhnushev


{{NamedListConfiguration#get(java.util.UUID)}} doesn't call 
{{specificConfigTree}}, so it doesn't cast the value to the specific type.
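
A generic illustration of the fix pattern (hypothetical names, not Ignite's 
actual internals): the get-by-id path must apply the same specific-view 
conversion that the get-by-name path already applies, otherwise callers 
receive the base polymorphic type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical sketch: a named list keyed by UUID that converts each stored
// value to its specific polymorphic view on retrieval. The bug class being
// illustrated is a get(UUID) that returns the raw stored value directly.
class NamedList<T> {
    private final Map<UUID, T> byId = new HashMap<>();

    // Conversion to the specific polymorphic view, e.g. a downcast.
    private final Function<T, T> specificView;

    NamedList(Function<T, T> specificView) {
        this.specificView = specificView;
    }

    void put(UUID id, T value) {
        byId.put(id, value);
    }

    // Fixed variant: apply the conversion; the buggy variant returned
    // byId.get(id) directly, skipping it.
    T get(UUID id) {
        T raw = byId.get(id);
        return raw == null ? null : specificView.apply(raw);
    }
}
```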



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21071) Rollback the transaction on primary failure if replication is not finished

2023-12-12 Thread Alexander Lapin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-21071:
-
Description: 
h3. Motivation

Although it is not always strictly necessary in the initial implementation, the 
transaction must be rolled back if the node that hosts the primary replica 
fails prior to replication finalization.
h3. Definition of Done

The transaction should eventually be rolled back if the primary replica host 
fails while the transaction is awaiting in-flight operations.
h3. Implementation Notes

A primary replica host failure results in the corresponding primary replica 
expiring, so the initial implementation should listen for primary replica 
expirations on tx finish while waiting for in-flights to complete. The 
corresponding SQL-based case should also be checked.

> Rollback the transaction on primary failure if replication is not finished
> --
>
> Key: IGNITE-21071
> URL: https://issues.apache.org/jira/browse/IGNITE-21071
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Although it is not always strictly necessary in the initial implementation, 
> the transaction must be rolled back if the node that hosts the primary 
> replica fails prior to replication finalization.
> h3. Definition of Done
> The transaction should eventually be rolled back if the primary replica host 
> fails while the transaction is awaiting in-flight operations.
> h3. Implementation Notes
> A primary replica host failure results in the corresponding primary replica 
> expiring, so the initial implementation should listen for primary replica 
> expirations on tx finish while waiting for in-flights to complete. The 
> corresponding SQL-based case should also be checked.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20994) Adjust writeIntentResolution logic in order to initiate the recovery if coordinator is dead

2023-12-12 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795748#comment-17795748
 ] 

Vladislav Pyatkov commented on IGNITE-20994:


Merged 64d3bcb345a20e1dbcc554e40d8e709111b02110

> Adjust writeIntentResolution logic in order to initiate the recovery if 
> coordinator is dead
> ---
>
> Key: IGNITE-20994
> URL: https://issues.apache.org/jira/browse/IGNITE-20994
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> h3. Motivation
> Besides lock conflict, writeIntent resolution may also detect abandoned 
> transactions, thus it should also trigger the recovery initiation logic, 
> probably with some write-intent-resolution specifics.
> h3. Definition of Done
> Write intent resolution will initiate the recovery in case of a dead tx 
> coordinator within the commit partition path.
> h3. Implementation Notes
> Basically, within the commit partition path, a write intent in the pending 
> state should check whether the coordinator is dead (that part might be 
> tricky because we may lose the volatile state of where the coordinator is) 
> and, if it is, roll back the transaction and propagate the rolled-back state 
> backwards, which should change the local txn state to ABORTED on the initial 
> trigger. It's not clear whether it's required to send an unlock or a special 
> sort of cleanup message to the trigger node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21071) Rollback the transaction on primary failure if replication is not finished

2023-12-12 Thread Alexander Lapin (Jira)
Alexander Lapin created IGNITE-21071:


 Summary: Rollback the transaction on primary failure if 
replication is not finished
 Key: IGNITE-21071
 URL: https://issues.apache.org/jira/browse/IGNITE-21071
 Project: Ignite
  Issue Type: New Feature
Reporter: Alexander Lapin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21070) Ensure that data node's primary replica expiration properly handled

2023-12-12 Thread Alexander Lapin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-21070:
-
Labels: ignite-3  (was: )

> Ensure that data node's primary replica expiration properly handled
> ---
>
> Key: IGNITE-21070
> URL: https://issues.apache.org/jira/browse/IGNITE-21070
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Corresponding primary replica expiration logic should already be 
> implemented, mainly within coordinator recovery, thus within this ticket it 
> is only required to extend test coverage. Primary replica expiration 
> handling logic differs depending on whether the expiration occurred before 
> or after the following flow splitters:
>  * Replication finishing. (Inflights == 0)
>  * Commit timestamp evolution. // Not sure actually, maybe there's no 
> difference between the replication finishing and commit timestamp evolution 
> splitters.
>  * Finish request handling.
>  * Cleanup request handling.
>  * WriteIntent switch request handling.
> Specific test scenarios will be defined during ticket implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20652) .NET: Thin 3.0: support SQL script execution in dotnet thin client

2023-12-12 Thread Pavel Tupitsyn (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Tupitsyn updated IGNITE-20652:

Ignite Flags:   (was: Docs Required,Release Notes Required)

> .NET: Thin 3.0: support SQL script execution in dotnet thin client
> --
>
> Key: IGNITE-20652
> URL: https://issues.apache.org/jira/browse/IGNITE-20652
> Project: Ignite
>  Issue Type: Improvement
>  Components: platforms, thin client
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Pereslegin
>Assignee: Pavel Tupitsyn
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Support SQL script execution in dotnet thin client



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21070) Ensure that data node's primary replica expiration properly handled

2023-12-12 Thread Alexander Lapin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-21070:
-
Description: 
h3. Motivation

Corresponding primary replica expiration logic should already be implemented, 
mainly within coordinator recovery, thus within this ticket it is only 
required to extend test coverage. Primary replica expiration handling logic 
differs depending on whether the expiration occurred before or after the 
following flow splitters:
 * Replication finishing. (Inflights == 0)
 * Commit timestamp evolution. // Not sure actually, maybe there's no 
difference between the replication finishing and commit timestamp evolution 
splitters.
 * Finish request handling.
 * Cleanup request handling.
 * WriteIntent switch request handling.

Specific test scenarios will be defined during ticket implementation.

> Ensure that data node's primary replica expiration properly handled
> ---
>
> Key: IGNITE-21070
> URL: https://issues.apache.org/jira/browse/IGNITE-21070
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Alexander Lapin
>Priority: Major
>
> h3. Motivation
> Corresponding primary replica expiration logic should already be 
> implemented, mainly within coordinator recovery, thus within this ticket it 
> is only required to extend test coverage. Primary replica expiration 
> handling logic differs depending on whether the expiration occurred before 
> or after the following flow splitters:
>  * Replication finishing. (Inflights == 0)
>  * Commit timestamp evolution. // Not sure actually, maybe there's no 
> difference between the replication finishing and commit timestamp evolution 
> splitters.
>  * Finish request handling.
>  * Cleanup request handling.
>  * WriteIntent switch request handling.
> Specific test scenarios will be defined during ticket implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20994) Adjust writeIntentResolution logic in order to initiate the recovery if coordinator is dead

2023-12-12 Thread Denis Chudov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-20994:
--
Reviewer: Vladislav Pyatkov

> Adjust writeIntentResolution logic in order to initiate the recovery if 
> coordinator is dead
> ---
>
> Key: IGNITE-20994
> URL: https://issues.apache.org/jira/browse/IGNITE-20994
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> h3. Motivation
> Besides lock conflict, writeIntent resolution may also detect abandoned 
> transactions, thus it should also trigger the recovery initiation logic, 
> probably with some write-intent-resolution specifics.
> h3. Definition of Done
> Write intent resolution will initiate the recovery in case of a dead tx 
> coordinator within the commit partition path.
> h3. Implementation Notes
> Basically, within the commit partition path, a write intent in the pending 
> state should check whether the coordinator is dead (that part might be 
> tricky because we may lose the volatile state of where the coordinator is) 
> and, if it is, roll back the transaction and propagate the rolled-back state 
> backwards, which should change the local txn state to ABORTED on the initial 
> trigger. It's not clear whether it's required to send an unlock or a special 
> sort of cleanup message to the trigger node.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21070) Ensure that data node's primary replica expiration properly handled

2023-12-12 Thread Alexander Lapin (Jira)
Alexander Lapin created IGNITE-21070:


 Summary: Ensure that data node's primary replica expiration 
properly handled
 Key: IGNITE-21070
 URL: https://issues.apache.org/jira/browse/IGNITE-21070
 Project: Ignite
  Issue Type: New Feature
Reporter: Alexander Lapin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21069) Tx on unstable topology: data node recovery

2023-12-12 Thread Alexander Lapin (Jira)
Alexander Lapin created IGNITE-21069:


 Summary: Tx on unstable topology: data node recovery
 Key: IGNITE-21069
 URL: https://issues.apache.org/jira/browse/IGNITE-21069
 Project: Ignite
  Issue Type: Epic
Reporter: Alexander Lapin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795735#comment-17795735
 ] 

Vipul Thakur edited comment on IGNITE-21059 at 12/12/23 1:08 PM:
-

Evidence that txn timeout is enabled at client end : 

Below are the server logs: 

2023-11-30T14:19:01,783][ERROR][grid-timeout-worker-#326%EVENT_PROCESSING%|#326%EVENT_PROCESSING%][GridDhtColocatedCache]
  Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], taskNameHash=0, createTtl=-1, accessTtl=-1, 
flags=3, txLbl=null, filter=null, super=GridDistributedLockRequest 
[nodeId=62fdf256-6130-4ef3-842c-b2078f6e6c07, nearXidVer=GridCacheVersion 
[topVer=312674007, order=1701333641101, nodeOrder=53, dataCenterId=0], 
threadId=372, futId=9c4a6212c81-c17f568a-3419-42a6-9042-7a1f3281301c, 
timeout=3, isInTx=true, isInvalidate=false, isRead=true, 
isolation=REPEATABLE_READ, retVals=[true], txSize=0, flags=0, keysCnt=1, 
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=312674007, 
order=1701333641101, nodeOrder=53, dataCenterId=0], committedVers=null, 
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=-885490198, 
super=GridCacheMessage [msgId=55444220, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=53, minorTopVer=0], 
err=null, skipPrepare=false]
[2023-11-30T14:19:44,579][ERROR][grid-timeout-worker-#326%EVENT_PROCESSING%|#326%EVENT_PROCESSING%][GridDhtColocatedCache]
  Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], taskNameHash=0, createTtl=-1, accessTtl=-1, 
flags=3, txLbl=null, filter=null, super=GridDistributedLockRequest 
[nodeId=62fdf256-6130-4ef3-842c-b2078f6e6c07, nearXidVer=GridCacheVersion 
[topVer=312674007, order=1701333641190, nodeOrder=53, dataCenterId=0], 
threadId=897, futId=a3ba6212c81-c17f568a-3419-42a6-9042-7a1f3281301c, 
*timeout=3, isInTx=true, isInvalidate=false, isRead=true, 
isolation=REPEATABLE_READ,* retVals=[true], txSize=0, flags=0, keysCnt=1, 
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=312674007, 
order=1701333641190, nodeOrder=53, dataCenterId=0], committedVers=null, 
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=-885490198, 
super=GridCacheMessage [msgId=55444392, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=53, minorTopVer=0], 
err=null, skipPrepare=false]
org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: Failed 
to acquire lock within provided timeout for transaction [timeout=3, 
tx=GridDhtTxLocal[xid=c8a166f1c81--12a3-06d7--0001, 
xidVersion=GridCacheVersion [topVer=312674007, order=1701333834380, 
nodeOrder=1, dataCenterId=0], nearXidVersion=GridCacheVersion 
[topVer=312674007, order=1701333641190, nodeOrder=53, dataCenterId=0], 
concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=MARKED_ROLLBACK, 
invalidate=false, rollbackOnly=true, 
nodeId=f751efe5-c44c-4b3c-bcd3-dd5866ec0bdd, timeout=3, 
startTime=1701334154571, {*}duration=30003]{*}]


was (Author: vipul.thakur):
Evidence that txn timeout is enabled at client end : 

 

2023-11-30T14:19:01,783][ERROR][grid-timeout-worker-#326%EVENT_PROCESSING%][GridDhtColocatedCache]
  Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], taskNameHash=0, createTtl=-1, accessTtl=-1, 
flags=3, txLbl=null, filter=null, super=GridDistributedLockRequest 
[nodeId=62fdf256-6130-4ef3-842c-b2078f6e6c07, nearXidVer=GridCacheVersion 
[topVer=312674007, order=1701333641101, nodeOrder=53, dataCenterId=0], 
threadId=372, futId=9c4a6212c81-c17f568a-3419-42a6-9042-7a1f3281301c, 
timeout=3, isInTx=true, isInvalidate=false, isRead=true, 
isolation=REPEATABLE_READ, retVals=[true], txSize=0, flags=0, keysCnt=1, 
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=312674007, 
order=1701333641101, nodeOrder=53, dataCenterId=0], committedVers=null, 
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=-885490198, 
super=GridCacheMessage [msgId=55444220, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=53, minorTopVer=0], 
err=null, skipPrepare=false]
[2023-11-30T14:19:44,579][ERROR][grid-timeout-worker-#326%EVENT_PROCESSING%][GridDhtColocatedCache]
  Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], taskNameHash=0, createTtl=-1, accessTtl=-1, 
flags=3, txLbl=null, filter=null, super=GridDistributedLockRequest 
[nodeId=62fdf256-6130-4ef3-842c-b2078f6e6c07, nearXidVer=GridCacheVersion 
[topVer=312674007, 

[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795735#comment-17795735
 ] 

Vipul Thakur commented on IGNITE-21059:
---

Evidence that txn timeout is enabled at client end : 

 

2023-11-30T14:19:01,783][ERROR][grid-timeout-worker-#326%EVENT_PROCESSING%][GridDhtColocatedCache]
  Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], taskNameHash=0, createTtl=-1, accessTtl=-1, 
flags=3, txLbl=null, filter=null, super=GridDistributedLockRequest 
[nodeId=62fdf256-6130-4ef3-842c-b2078f6e6c07, nearXidVer=GridCacheVersion 
[topVer=312674007, order=1701333641101, nodeOrder=53, dataCenterId=0], 
threadId=372, futId=9c4a6212c81-c17f568a-3419-42a6-9042-7a1f3281301c, 
timeout=3, isInTx=true, isInvalidate=false, isRead=true, 
isolation=REPEATABLE_READ, retVals=[true], txSize=0, flags=0, keysCnt=1, 
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=312674007, 
order=1701333641101, nodeOrder=53, dataCenterId=0], committedVers=null, 
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=-885490198, 
super=GridCacheMessage [msgId=55444220, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=53, minorTopVer=0], 
err=null, skipPrepare=false]
[2023-11-30T14:19:44,579][ERROR][grid-timeout-worker-#326%EVENT_PROCESSING%][GridDhtColocatedCache]
  Failed to acquire lock for request: GridNearLockRequest 
[topVer=AffinityTopologyVersion [topVer=93, minorTopVer=0], miniId=1, 
dhtVers=GridCacheVersion[] [null], taskNameHash=0, createTtl=-1, accessTtl=-1, 
flags=3, txLbl=null, filter=null, super=GridDistributedLockRequest 
[nodeId=62fdf256-6130-4ef3-842c-b2078f6e6c07, nearXidVer=GridCacheVersion 
[topVer=312674007, order=1701333641190, nodeOrder=53, dataCenterId=0], 
threadId=897, futId=a3ba6212c81-c17f568a-3419-42a6-9042-7a1f3281301c, 
*timeout=3, isInTx=true, isInvalidate=false, isRead=true, 
isolation=REPEATABLE_READ,* retVals=[true], txSize=0, flags=0, keysCnt=1, 
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=312674007, 
order=1701333641190, nodeOrder=53, dataCenterId=0], committedVers=null, 
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=-885490198, 
super=GridCacheMessage [msgId=55444392, depInfo=null, 
lastAffChangedTopVer=AffinityTopologyVersion [topVer=53, minorTopVer=0], 
err=null, skipPrepare=false]
org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: Failed 
to acquire lock within provided timeout for transaction [timeout=3, 
tx=GridDhtTxLocal[xid=c8a166f1c81--12a3-06d7--0001, 
xidVersion=GridCacheVersion [topVer=312674007, order=1701333834380, 
nodeOrder=1, dataCenterId=0], nearXidVersion=GridCacheVersion 
[topVer=312674007, order=1701333641190, nodeOrder=53, dataCenterId=0], 
concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=MARKED_ROLLBACK, 
invalidate=false, rollbackOnly=true, 
nodeId=f751efe5-c44c-4b3c-bcd3-dd5866ec0bdd, timeout=3, 
startTime=1701334154571, {*}duration=30003]{*}]

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>
> We have recently upgraded from 2.7.6 to 2.14 due to the issue observed in 
> production environment where cluster would go in hang state due to partition 
> map exchange.
> Please find the below ticket which i created a while back for ignite 2.7.6
> https://issues.apache.org/jira/browse/IGNITE-13298
> So we migrated the apache ignite version to 2.14 and upgrade happened 
> smoothly but on the third day we could see cluster traffic dip again. 
> We have 4 nodes in a cluster where we provide 400 GB of RAM and more than 1 
> TB HDD.
> PFB for the attached config.[I have added it as attachment for review]
> I have also added the server logs from the same time when issue happened.
> We have set txn timeout as well as socket timeout both at server and client 
> end for our write operations but seems like sometimes cluster goes into hang 
> state and all our get calls are stuck and slowly everything 

[jira] [Created] (IGNITE-21068) Ignite node must not communicate with a node removed from the Physical Topology

2023-12-12 Thread Roman Puchkovskiy (Jira)
Roman Puchkovskiy created IGNITE-21068:
--

 Summary: Ignite node must not communicate with a node removed from 
the Physical Topology
 Key: IGNITE-21068
 URL: https://issues.apache.org/jira/browse/IGNITE-21068
 Project: Ignite
  Issue Type: Improvement
Reporter: Roman Puchkovskiy
 Fix For: 3.0.0-beta2


It is possible for a node to be considered DEAD due to a timeout (because it 
did not respond to a series of pings in a timely manner) even though the 
network channel is still operational. Currently, even after a node is removed 
from the Physical Topology, it can still send/receive messages.

This breaks the invariant that such a node must not be able to communicate 
with the cluster.
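
A minimal, self-contained sketch of the invariant (hypothetical names, not 
Ignite's actual networking code): before sending, consult the current 
physical-topology membership rather than trusting that the channel is open.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard: a send path that checks physical-topology membership.
// Messages to nodes outside the topology must be dropped, even if the
// underlying TCP connection is still alive.
class TopologyGuard {
    private final Set<String> liveNodes = ConcurrentHashMap.newKeySet();

    void onNodeJoined(String consistentId) {
        liveNodes.add(consistentId);
    }

    void onNodeLeft(String consistentId) {
        liveNodes.remove(consistentId);
    }

    // True only while the target is still part of the physical topology.
    boolean maySendTo(String consistentId) {
        return liveNodes.contains(consistentId);
    }
}
```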



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-20745) TableManager.tableAsync(int tableId) is slowing down thin clients

2023-12-12 Thread Igor Sapego (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795722#comment-17795722
 ] 

Igor Sapego edited comment on IGNITE-20745 at 12/12/23 12:44 PM:
-

[~ibessonov] suggested to try the following quick fix:

Let's make the following call synchronous, and check if this will help:

At org.apache.ignite.internal.table.distributed.TableManager#tableAsyncInternal:
{code:java}
return orStopManagerFuture(schemaSyncService.waitForMetadataCompleteness(now))
.thenComposeAsync(unused -> inBusyLockAsync(busyLock, () -> 
{
...
{code}

Replace this with:
{code:java}
if (fut.isDone()) ...
{code}

In most cases, the future returned from {{waitForMetadataCompleteness}} has 
already completed, and dispatching asynchronously is a waste of time. There is 
only a read from a map inside, and probably nothing more. This should be the 
heaviest place in terms of execution time.
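
A self-contained sketch of this fast-path pattern (hypothetical helper, not 
Ignite's actual code): when the future is already complete, run the 
continuation synchronously and skip the executor hop.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;

class FastPathCompose {
    // If the future has already completed normally, apply the continuation
    // synchronously on the calling thread, avoiding an executor hop.
    // Otherwise fall back to the usual asynchronous composition.
    static <T, R> CompletionStage<R> composeNow(
            CompletableFuture<T> fut, Function<T, CompletionStage<R>> fn) {
        if (fut.isDone() && !fut.isCompletedExceptionally()) {
            return fn.apply(fut.join());
        }
        return fut.thenComposeAsync(fn);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> ready = CompletableFuture.completedFuture(21);
        Integer res = composeNow(ready, v -> CompletableFuture.completedFuture(v * 2))
                .toCompletableFuture()
                .join();
        System.out.println(res); // prints 42
    }
}
```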


was (Author: isapego):
[~ibessonov] suggested to try the following quick fix:

Let's make the following call synchronous, and check if this will help:

At org.apache.ignite.internal.table.distributed.TableManager#tableAsyncInternal:
{code:java}
return orStopManagerFuture(schemaSyncService.waitForMetadataCompleteness(now))
.thenComposeAsync(unused -> inBusyLockAsync(busyLock, () -> 
{
...
{code}

Replace this with:
{code:java}
if (fut.isDone()) ...
{code}


> TableManager.tableAsync(int tableId) is slowing down thin clients
> -
>
> Key: IGNITE-20745
> URL: https://issues.apache.org/jira/browse/IGNITE-20745
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Tupitsyn
>Assignee: Igor Sapego
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Attachments: ItThinClientPutGetBenchmark.java
>
>
> Performance difference between embedded and client modes is affected 
> considerably by the call to *IgniteTablesInternal#tableAsync(int id)*. This 
> call has to be performed on every individual table operation.
> We should make it as fast as possible. Something like a dictionary lookup + 
> quick check for deleted table.
> ||Part||Duration, us||
> |Network & msgpack|19.30|
> |Get table|14.29|
> |Get tuple & serialize|12.86|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-20745) TableManager.tableAsync(int tableId) is slowing down thin clients

2023-12-12 Thread Igor Sapego (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795722#comment-17795722
 ] 

Igor Sapego edited comment on IGNITE-20745 at 12/12/23 12:42 PM:
-

[~ibessonov] suggested to try the following quick fix:

Let's make the following call synchronous, and check if this will help:

At org.apache.ignite.internal.table.distributed.TableManager#tableAsyncInternal:
{code:java}
return orStopManagerFuture(schemaSyncService.waitForMetadataCompleteness(now))
.thenComposeAsync(unused -> inBusyLockAsync(busyLock, () -> 
{
...
{code}

Replace this with:
{code:java}
if (fut.isDone()) ...
{code}



was (Author: isapego):
[~ibessonov] suggested to try the following quick fix:

Let's make the following call synchronous, and check if this will help:

At org.apache.ignite.internal.table.distributed.TableManager#tableAsyncInternal:
{code:java}
return orStopManagerFuture(schemaSyncService.waitForMetadataCompleteness(now))
.thenComposeAsync(unused -> inBusyLockAsync(busyLock, () -> 
{
...
{code}



> TableManager.tableAsync(int tableId) is slowing down thin clients
> -
>
> Key: IGNITE-20745
> URL: https://issues.apache.org/jira/browse/IGNITE-20745
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Tupitsyn
>Assignee: Igor Sapego
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Attachments: ItThinClientPutGetBenchmark.java
>
>
> Performance difference between embedded and client modes is affected 
> considerably by the call to *IgniteTablesInternal#tableAsync(int id)*. This 
> call has to be performed on every individual table operation.
> We should make it as fast as possible. Something like a dictionary lookup + 
> quick check for deleted table.
> ||Part||Duration, us||
> |Network & msgpack|19.30|
> |Get table|14.29|
> |Get tuple & serialize|12.86|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20745) TableManager.tableAsync(int tableId) is slowing down thin clients

2023-12-12 Thread Igor Sapego (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795722#comment-17795722
 ] 

Igor Sapego commented on IGNITE-20745:
--

[~ibessonov] suggested to try the following quick fix:

Let's make the following call synchronous, and check if this will help:

At org.apache.ignite.internal.table.distributed.TableManager#tableAsyncInternal:
{code:java}
return orStopManagerFuture(schemaSyncService.waitForMetadataCompleteness(now))
.thenComposeAsync(unused -> inBusyLockAsync(busyLock, () -> 
{
...
{code}



> TableManager.tableAsync(int tableId) is slowing down thin clients
> -
>
> Key: IGNITE-20745
> URL: https://issues.apache.org/jira/browse/IGNITE-20745
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Tupitsyn
>Assignee: Igor Sapego
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Attachments: ItThinClientPutGetBenchmark.java
>
>
> Performance difference between embedded and client modes is affected 
> considerably by the call to *IgniteTablesInternal#tableAsync(int id)*. This 
> call has to be performed on every individual table operation.
> We should make it as fast as possible. Something like a dictionary lookup + 
> quick check for deleted table.
> ||Part||Duration, us||
> |Network & msgpack|19.30|
> |Get table|14.29|
> |Get tuple & serialize|12.86|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21032) ReadOnlyDynamicMBean.getAttributes may return a list of attribute values instead of Attribute instances

2023-12-12 Thread Simon Greatrix (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Greatrix reassigned IGNITE-21032:
---

Assignee: Simon Greatrix

Taking ownership to fix checkstyle issues.

> ReadOnlyDynamicMBean.getAttributes may return a list of attribute values 
> instead of Attribute instances
> ---
>
> Key: IGNITE-21032
> URL: https://issues.apache.org/jira/browse/IGNITE-21032
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vyacheslav Koptilin
>Assignee: Simon Greatrix
>Priority: Major
> Fix For: 2.16
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When supplying JMX information, the AttributeList class should contain 
> Attributes, however the existing code returns attribute values. This can 
> cause ClassCastExceptions in code that attempts to read an AttributeList.
>  
> [GitHub Issue #11045|https://github.com/apache/ignite/issues/11045]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21032) ReadOnlyDynamicMBean.getAttributes may return a list of attribute values instead of Attribute instances

2023-12-12 Thread Simon Greatrix (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795719#comment-17795719
 ] 

Simon Greatrix commented on IGNITE-21032:
-

Yes it is. However, the supplementary issue identified in the comment from 
[~sato_eiichi] is not covered by this.

> ReadOnlyDynamicMBean.getAttributes may return a list of attribute values 
> instead of Attribute instances
> ---
>
> Key: IGNITE-21032
> URL: https://issues.apache.org/jira/browse/IGNITE-21032
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.16
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When supplying JMX information, the AttributeList class should contain 
> Attributes, however the existing code returns attribute values. This can 
> cause ClassCastExceptions in code that attempts to read an AttributeList.
>  
> [GitHub Issue #11045|https://github.com/apache/ignite/issues/11045]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21063:
--

Assignee: Ivan Bessonov

> Cannot create 1000 tables
> -
>
> Key: IGNITE-21063
> URL: https://issues.apache.org/jira/browse/IGNITE-21063
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Fails with OOM after a while, managing to create only about 500 tables 
> locally. We need to research why this happens: is there a leak, or do we 
> simply use too much memory?
> Main candidate: thread-local marshallers. For some reason we use too many 
> threads, I guess? Meta-storage entries may be up to several megabytes in the 
> current implementation.
> We should limit the size of cached buffers and the number of threads in 
> general. A shared pool (priority queue) of pre-allocated buffers would solve 
> the issue; the buffers don't have to be thread-local. It's a bit slower, but 
> that's not a problem until proven otherwise.
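The shared, non-thread-local buffer pool proposed above could be sketched roughly as below. This is a minimal sketch under stated assumptions; the class and its methods are hypothetical, not Ignite's actual code, and real usage would need eviction and direct-buffer handling.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class BufferPool {
    // Shared pool of pre-allocated buffers, ordered by capacity and capped in
    // count, so marshalling threads reuse memory instead of each thread
    // caching its own (potentially multi-megabyte) buffer.
    private final PriorityBlockingQueue<ByteBuffer> pool =
            new PriorityBlockingQueue<>(16, Comparator.comparingInt(ByteBuffer::capacity));
    private final int maxPooled;

    BufferPool(int maxPooled) { this.maxPooled = maxPooled; }

    ByteBuffer borrow(int minCapacity) {
        // Take any pooled buffer that is large enough; otherwise allocate.
        for (ByteBuffer b : pool) {
            if (b.capacity() >= minCapacity && pool.remove(b)) {
                b.clear();
                return b;
            }
        }
        return ByteBuffer.allocate(minCapacity);
    }

    void release(ByteBuffer b) {
        if (pool.size() < maxPooled) {
            pool.offer(b); // drop the buffer when the pool is already full
        }
    }

    public static void main(String[] args) {
        BufferPool p = new BufferPool(4);
        ByteBuffer b = p.borrow(1024);
        p.release(b);
        // The pooled 1 KB buffer satisfies the smaller 512-byte request.
        System.out.println(p.borrow(512) == b);
    }
}
```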



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Vipul Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795714#comment-17795714
 ] 

Vipul Thakur commented on IGNITE-21059:
---

Hi 

Thank you for the quick response. We have configured the tx timeout at the 
client end (our clients are written in Spring Boot and Java); is it also 
needed in the server's config.xml?

We will also read about changing the WAL segment size and make the changes 
accordingly.

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>
> We have recently upgraded from 2.7.6 to 2.14 due to an issue observed in the 
> production environment where the cluster would go into a hang state due to 
> partition map exchange.
> Please find below the ticket which I created a while back for Ignite 2.7.6:
> https://issues.apache.org/jira/browse/IGNITE-13298
> So we migrated the Apache Ignite version to 2.14 and the upgrade happened 
> smoothly, but on the third day we could see the cluster traffic dip again.
> We have 4 nodes in a cluster, where we provide 400 GB of RAM and more than 1 
> TB of HDD.
> PFB the attached config. [I have added it as an attachment for review.]
> I have also added the server logs from the time when the issue happened.
> We have set the txn timeout as well as the socket timeout, both at the server 
> and the client end, for our write operations, but it seems like sometimes the 
> cluster goes into a hang state and all our get calls are stuck; slowly 
> everything starts to freeze, and our JMS listener threads and every other 
> thread reach a choked-up state in some time.
> Due to this, our read services, which do not even use txn to retrieve data, 
> also start to choke, ultimately leading to an end-user traffic dip.
> We were hoping the product upgrade would help, but that has not been the case 
> till now.
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21060) Extract ClusterNodeResolver as a separate entity

2023-12-12 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795710#comment-17795710
 ] 

Vladislav Pyatkov commented on IGNITE-21060:


Merged 492f01b6423c3514cbe4b3ac45b3d84e70080318

> Extract ClusterNodeResolver as a separate entity
> 
>
> Key: IGNITE-21060
> URL: https://issues.apache.org/jira/browse/IGNITE-21060
> Project: Ignite
>  Issue Type: Task
>Reporter:  Kirill Sizov
>Assignee:  Kirill Sizov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *Motivation*
> There are many places in the code that have a parameter and/or a field like
> {code}
> Function clusterNodeResolver
> {code}
> Instead of a generic function, we want a specific ClusterNodeResolver entity 
> for better code readability.
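A minimal sketch of what such a dedicated entity could look like; all names below (the record, the `getById` method, the map-backed resolver) are illustrative assumptions, not Ignite's actual classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ResolverExample {
    // Illustrative stand-in for Ignite's cluster node; fields are assumed.
    record ClusterNode(String id, String name) {}

    // A named interface reads better than Function<String, ClusterNode>
    // while staying lambda- and method-reference-friendly.
    @FunctionalInterface
    interface ClusterNodeResolver {
        ClusterNode getById(String id);
    }

    public static void main(String[] args) {
        Map<String, ClusterNode> nodes = new ConcurrentHashMap<>();
        nodes.put("n1", new ClusterNode("n1", "node-1"));

        ClusterNodeResolver resolver = nodes::get; // same ergonomics as Function
        System.out.println(resolver.getById("n1").name());
    }
}
```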



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21067) Clean up documents related to MVCC

2023-12-12 Thread YuJue Li (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YuJue Li updated IGNITE-21067:
--
Parent: IGNITE-13871
Issue Type: Sub-task  (was: Improvement)

> Clean up documents related to MVCC
> --
>
> Key: IGNITE-21067
> URL: https://issues.apache.org/jira/browse/IGNITE-21067
> Project: Ignite
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.15
>Reporter: YuJue Li
>Assignee: YuJue Li
>Priority: Minor
> Fix For: 2.17
>
>
> Clean up documents related to MVCC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21016) ItMixedQueriesTest.testIgniteSchemaAwaresAlterTableCommand is flaky

2023-12-12 Thread Konstantin Orlov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795708#comment-17795708
 ] 

Konstantin Orlov commented on IGNITE-21016:
---

[~xtern], [~vpyatkov], folks, please do a review.

> ItMixedQueriesTest.testIgniteSchemaAwaresAlterTableCommand is flaky
> ---
>
> Key: IGNITE-21016
> URL: https://issues.apache.org/jira/browse/IGNITE-21016
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Reporter: Yury Gerzhedovich
>Assignee: Konstantin Orlov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> The test 
> org.apache.ignite.internal.sql.engine.ItMixedQueriesTest#testIgniteSchemaAwaresAlterTableCommand
>  is flaky.
> The issue periodically appears on TC and is also reproducible in a local 
> environment.
> {code:java}
> org.opentest4j.AssertionFailedError: Column metadata doesn't match ==> 
> expected: <3> but was: <2>
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>   at 
> app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
>   at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
>   at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:560)
>   at 
> app//org.apache.ignite.internal.sql.engine.util.QueryCheckerImpl.check(QueryCheckerImpl.java:322)
>   at 
> app//org.apache.ignite.internal.sql.engine.util.QueryCheckerFactoryImpl$1.check(QueryCheckerFactoryImpl.java:90)
>   at 
> app//org.apache.ignite.internal.sql.engine.ItMixedQueriesTest.testIgniteSchemaAwaresAlterTableCommand(ItMixedQueriesTest.java:221)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21039) Network performance optimization

2023-12-12 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-21039:
-
Labels: ignite-3  (was: )

> Network performance optimization
> 
>
> Key: IGNITE-21039
> URL: https://issues.apache.org/jira/browse/IGNITE-21039
> Project: Ignite
>  Issue Type: Improvement
>  Components: networking
>Affects Versions: 3.0
>Reporter: Alexander Belyak
>Priority: Major
>  Labels: ignite-3
>
> I've run several tests to find out the MessagingService performance metrics, 
> and this is what I've found:
> {noformat}
> TestBoolaMessage 139MB/sec WARD
> TestByteaMessage 132MB/sec WARD
> TestDoubleaMessage 102MB/sec WARD
> TestFloataMessage 132MB/sec WARD
> TestDoubleaMessage 130MB/sec WARD
> TestLongaMessage 131MB/sec WARD
> TestDoubleaMessage 131MB/sec WARD
> TestStringaMessage 280MB/sec WARD
> TestBoolMessage 11MB/sec WARD WARD
> TestByteMessage 12MB/sec WARD
> TestDoubleMessage 12MB/sec WARD
> TestFloatMessage 13MB/sec WARD
> TestIntMessage 12MB/sec WARD
> TestLongMessage 11MB/sec WARD
> TestShortMessage 12MB/sec WARD
> TestStringMessage 18MB/sec WARD
> TestBool20Message 15MB/sec WARD
> TestByte20Message 12MB/sec WARD
> TestDouble20Message 32MB/sec WARD
> TestFloat20Message 22MB/sec WARD
> TestInt20Message 13MB/sec WARD
> TestLong20Message 14MB/sec WARD
> TestShort20Message 14MB/sec WARD
> TestString20Message 65MB/sec WARD
> {noformat}
> All messages were sent in the same setup: 2 server nodes, connected with a 
> *10GBit* interface. *Iperf3* (iperf3 --time 30 --zerocopy --client 
> 192.168.1.126 --omit 3 --interval 1 --length 16384 --window 131072 --parallel 
> 2 --json --version4) shows about *850MB/sec* network throughput. But the 
> *best AI3* result was only {*}280MB/sec{*}. The above results use 3 types of 
> messages:
> 1. {*}TestaMessage{*}: an array of 163840 elements (primitive, except 
> String) of type .
> 2. {*}TestMessage{*}: a single property (primitive, except String) of 
> type 
> 3. {*}Test20Message{*}: 20 properties (primitive, except String) of type 
> 
> All the messages were sent in parallel from a single thread with a window of 
> 100 messages (right after the first outstanding ack arrived, the next message 
> was sent).
> It was expected that network utilization would be low for very short messages 
> (like 1 int or 20 int fields), but in comparison with the iperf3 results, the 
> performance of MessagingService for 163-KByte messages was also very low. It 
> became significantly better only when sending a huge array of strings (the 
> same string "{color:#067d17}Test string to check message service 
> performance.{color}").
>  
> I've run another batch of tests with a 1KB byte[] property in the message, in 
> 1 and 8 threads and without a send window at all (each thread sends the next 
> message after getting the ack for the previous one):
>  * *1 thread*: *37 MBytes/sec*
>  * *8 threads*: *63 MBytes/sec*
> So I suppose there is pretty much contention.
> All messages were sent in the following manner:
> {code:java}
> private void send(ClusterNode target, NetworkMessage msg) {
> messagingService.send(target, msg).handle((v, t) -> {
> if (t != null) {
> LOG.info("Error while sending huge message", t);
> }
> if (time() < timeout) {
> send(target, msg);
> }
> return null;
> });
> }{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21067) Clean up documents related to MVCC

2023-12-12 Thread YuJue Li (Jira)
YuJue Li created IGNITE-21067:
-

 Summary: Clean up documents related to MVCC
 Key: IGNITE-21067
 URL: https://issues.apache.org/jira/browse/IGNITE-21067
 Project: Ignite
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.15
Reporter: YuJue Li
Assignee: YuJue Li
 Fix For: 2.17


Clean up documents related to MVCC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20506) CacheAtomicityMode#TRANSACTIONAL_SNAPSHOT removal

2023-12-12 Thread YuJue Li (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YuJue Li reassigned IGNITE-20506:
-

Assignee: Anton Vinogradov  (was: YuJue Li)

> CacheAtomicityMode#TRANSACTIONAL_SNAPSHOT removal
> -
>
> Key: IGNITE-20506
> URL: https://issues.apache.org/jira/browse/IGNITE-20506
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Vinogradov
>Assignee: Anton Vinogradov
>Priority: Major
>  Labels: important
> Fix For: 2.16
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21063:
---
Description: 
Fails with OOM after a while, managing to create only about 500 tables 
locally. We need to research why this happens: is there a leak, or do we 
simply use too much memory?

Main candidate: thread-local marshallers. For some reason we use too many 
threads, I guess? Meta-storage entries may be up to several megabytes in the 
current implementation.

We should limit the size of cached buffers and the number of threads in 
general. A shared pool (priority queue) of pre-allocated buffers would solve 
the issue; the buffers don't have to be thread-local. It's a bit slower, but 
that's not a problem until proven otherwise.

  was:Fails with OOM after a while, managing to create about 500 tables 
locally. We need to research why it happens: is there a leak, or do we simply 
use too much memory?


> Cannot create 1000 tables
> -
>
> Key: IGNITE-21063
> URL: https://issues.apache.org/jira/browse/IGNITE-21063
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Fails with OOM after a while, managing to create only about 500 tables 
> locally. We need to research why this happens: is there a leak, or do we 
> simply use too much memory?
> Main candidate: thread-local marshallers. For some reason we use too many 
> threads, I guess? Meta-storage entries may be up to several megabytes in the 
> current implementation.
> We should limit the size of cached buffers and the number of threads in 
> general. A shared pool (priority queue) of pre-allocated buffers would solve 
> the issue; the buffers don't have to be thread-local. It's a bit slower, but 
> that's not a problem until proven otherwise.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21066) Create job priority change API

2023-12-12 Thread Mikhail Pochatkin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pochatkin updated IGNITE-21066:
---
Summary: Create job priority change API  (was: Change job priority API)

> Create job priority change API
> --
>
> Key: IGNITE-21066
> URL: https://issues.apache.org/jira/browse/IGNITE-21066
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Mikhail Pochatkin
>Priority: Major
>  Labels: ignite-3
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21066) Create job priority change API

2023-12-12 Thread Mikhail Pochatkin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pochatkin updated IGNITE-21066:
---
Description: Once a job has been accepted for execution and is in the 
queue, we should be able to dynamically change its priority in order to move it 
up or down in the execution queue.

> Create job priority change API
> --
>
> Key: IGNITE-21066
> URL: https://issues.apache.org/jira/browse/IGNITE-21066
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Mikhail Pochatkin
>Priority: Major
>  Labels: ignite-3
>
> Once a job has been accepted for execution and is in the queue, we should be 
> able to dynamically change its priority in order to move it up or down in the 
> execution queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Evgeny Stanilovsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795690#comment-17795690
 ] 

Evgeny Stanilovsky commented on IGNITE-21059:
-

Also, you have infinite tx timeouts; please configure:
https://ignite.apache.org/docs/latest/key-value-api/transactions#deadlock-detection

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>
> We have recently upgraded from 2.7.6 to 2.14 due to an issue observed in the 
> production environment where the cluster would go into a hang state due to 
> partition map exchange.
> Please find below the ticket which I created a while back for Ignite 2.7.6:
> https://issues.apache.org/jira/browse/IGNITE-13298
> So we migrated the Apache Ignite version to 2.14 and the upgrade happened 
> smoothly, but on the third day we could see the cluster traffic dip again.
> We have 4 nodes in a cluster, where we provide 400 GB of RAM and more than 1 
> TB of HDD.
> PFB the attached config. [I have added it as an attachment for review.]
> I have also added the server logs from the time when the issue happened.
> We have set the txn timeout as well as the socket timeout, both at the server 
> and the client end, for our write operations, but it seems like sometimes the 
> cluster goes into a hang state and all our get calls are stuck; slowly 
> everything starts to freeze, and our JMS listener threads and every other 
> thread reach a choked-up state in some time.
> Due to this, our read services, which do not even use txn to retrieve data, 
> also start to choke, ultimately leading to an end-user traffic dip.
> We were hoping the product upgrade would help, but that has not been the case 
> till now.
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Evgeny Stanilovsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795688#comment-17795688
 ] 

Evgeny Stanilovsky commented on IGNITE-21059:
-

Also, please increase the WAL segment size:
https://ignite.apache.org/docs/latest/persistence/native-persistence#changing-wal-segment-size
There are numerous "Starting to clean WAL archive" messages in the log.


> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>
> We have recently upgraded from 2.7.6 to 2.14 due to an issue observed in the 
> production environment where the cluster would go into a hang state due to 
> partition map exchange.
> Please find below the ticket which I created a while back for Ignite 2.7.6:
> https://issues.apache.org/jira/browse/IGNITE-13298
> So we migrated the Apache Ignite version to 2.14 and the upgrade happened 
> smoothly, but on the third day we could see the cluster traffic dip again.
> We have 4 nodes in a cluster, where we provide 400 GB of RAM and more than 1 
> TB of HDD.
> PFB the attached config. [I have added it as an attachment for review.]
> I have also added the server logs from the time when the issue happened.
> We have set the txn timeout as well as the socket timeout, both at the server 
> and the client end, for our write operations, but it seems like sometimes the 
> cluster goes into a hang state and all our get calls are stuck; slowly 
> everything starts to freeze, and our JMS listener threads and every other 
> thread reach a choked-up state in some time.
> Due to this, our read services, which do not even use txn to retrieve data, 
> also start to choke, ultimately leading to an end-user traffic dip.
> We were hoping the product upgrade would help, but that has not been the case 
> till now.
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21066) Change job priority API

2023-12-12 Thread Mikhail Pochatkin (Jira)
Mikhail Pochatkin created IGNITE-21066:
--

 Summary: Change job priority API
 Key: IGNITE-21066
 URL: https://issues.apache.org/jira/browse/IGNITE-21066
 Project: Ignite
  Issue Type: Improvement
  Components: compute
Reporter: Mikhail Pochatkin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-20847) Ownership mechanism for Compute Jobs

2023-12-12 Thread Mikhail Pochatkin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pochatkin resolved IGNITE-20847.

Resolution: Won't Fix

> Ownership mechanism for Compute Jobs
> 
>
> Key: IGNITE-20847
> URL: https://issues.apache.org/jira/browse/IGNITE-20847
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Mikhail Pochatkin
>Priority: Major
>  Labels: ignite-3
>
> If authentication is enabled on AI3, each Compute job execution should store 
> the owner user at start. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-20463) Sql. Integration of TX-related statements into sql script processor

2023-12-12 Thread Pavel Pereslegin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795679#comment-17795679
 ] 

Pavel Pereslegin edited comment on IGNITE-20463 at 12/12/23 10:58 AM:
--

[~zstan], [~amashenkov], thanks for the review!

Merged to the main branch 
([8ed1326|https://github.com/apache/ignite-3/commit/8ed13261f523b2c432ccb3702599df2106388f3c]).


was (Author: xtern):
Merged to the main branch 
([8ed1326|https://github.com/apache/ignite-3/commit/8ed13261f523b2c432ccb3702599df2106388f3c]).

[~zstan], [~amashenkov], thanks for the review!

> Sql. Integration of TX-related statements into sql script processor
> ---
>
> Key: IGNITE-20463
> URL: https://issues.apache.org/jira/browse/IGNITE-20463
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Affects Versions: 3.0.0-beta1
>Reporter: Evgeny Stanilovsky
>Assignee: Pavel Pereslegin
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 21h 20m
>  Remaining Estimate: 0h
>
> The script processor [1] needs to process TX-related statements.
> After parsing, the appropriate transaction syntax needs to be retained and 
> processed in the script processor.
> [1] https://issues.apache.org/jira/browse/IGNITE-20443



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20905) Make it possible to add an explicitly NULL column via ADD COLUMN

2023-12-12 Thread Maksim Zhuravkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maksim Zhuravkov updated IGNITE-20905:
--
Fix Version/s: 3.0.0-beta2

> Make it possible to add an explicitly NULL column via ADD COLUMN
> 
>
> Key: IGNITE-20905
> URL: https://issues.apache.org/jira/browse/IGNITE-20905
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Roman Puchkovskiy
>Assignee: Maksim Zhuravkov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When creating a table, it's possible to specify that a column is nullable by 
> explicitly using NULL:
> CREATE TABLE t(id INT PRIMARY KEY, col1 INT NULL)
> But, if we add a column to an existing table, this does not work:
> ALTER TABLE t ADD COLUMN col2 INT NULL
> -> Failed to parse query: Encountered "NULL" at line 1, column X
> It seems that for consistency ADD COLUMN should support same syntax as CREATE 
> TABLE does.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21065) Enhance granularity of authentication events

2023-12-12 Thread Ivan Gagarkin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-21065:
---
Summary: Enhance granularity of authentication events  (was: Extend 
AuthenticationEvent with USER_CREATED and USER_CHANGED)

> Enhance granularity of authentication events
> 
>
> Key: IGNITE-21065
> URL: https://issues.apache.org/jira/browse/IGNITE-21065
> Project: Ignite
>  Issue Type: Bug
>  Components: security, thin client
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> Since the basic authenticator stores a list of users, we need to extend 
> authentication events to improve granularity. This is to ensure that the 
> connection is not closed for all users if just one of them changes their 
> password. Update the tests in 
> {{org.apache.ignite.client.handler.ClientInboundMessageHandlerTest}} 
> accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20506) CacheAtomicityMode#TRANSACTIONAL_SNAPSHOT removal

2023-12-12 Thread Anton Vinogradov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795678#comment-17795678
 ] 

Anton Vinogradov commented on IGNITE-20506:
---

[~liyuj], this issue is already closed in 2.16.
If we need to change something, let's create a new linked issue.

> CacheAtomicityMode#TRANSACTIONAL_SNAPSHOT removal
> -
>
> Key: IGNITE-20506
> URL: https://issues.apache.org/jira/browse/IGNITE-20506
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Vinogradov
>Assignee: YuJue Li
>Priority: Major
>  Labels: important
> Fix For: 2.16
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21065) Extend AuthenticationEvent with USER_CREATED and USER_CHANGED

2023-12-12 Thread Ivan Gagarkin (Jira)
Ivan Gagarkin created IGNITE-21065:
--

 Summary: Extend AuthenticationEvent with USER_CREATED and 
USER_CHANGED
 Key: IGNITE-21065
 URL: https://issues.apache.org/jira/browse/IGNITE-21065
 Project: Ignite
  Issue Type: Bug
  Components: security, thin client
Reporter: Ivan Gagarkin


Since the basic authenticator stores a list of users, we need to extend 
authentication events to improve granularity. This is to ensure that the 
connection is not closed for all users if just one of them changes their 
password. Update the tests in 
{{org.apache.ignite.client.handler.ClientInboundMessageHandlerTest}} 
accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21064) Refactor authentication naming and enum in Thin Client for clarity

2023-12-12 Thread Ivan Gagarkin (Jira)
Ivan Gagarkin created IGNITE-21064:
--

 Summary: Refactor authentication naming and enum in Thin Client 
for clarity
 Key: IGNITE-21064
 URL: https://issues.apache.org/jira/browse/IGNITE-21064
 Project: Ignite
  Issue Type: Improvement
  Components: thin client
Reporter: Ivan Gagarkin


Currently, the Thin Client utilizes 
{{org.apache.ignite.security.AuthenticationType}} to specify the authentication 
method during the handshake process. This approach can be confusing due to its 
interaction with the type of authentication defined in the configuration. To 
resolve this, we propose creating a separate enumeration specifically for the 
client. Additionally, the 'BASIC' authentication type should be renamed to 
'PASSWORD' for clarity.
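As a rough illustration of the proposed split, the thin client could carry its own handshake enum instead of reusing the server-side {{org.apache.ignite.security.AuthenticationType}}. This is a hedged sketch; the class and method names below are assumptions, not the actual Ignite 3 API.

```java
// Hypothetical sketch of a client-only authenticator-type enum,
// decoupled from the server-side AuthenticationType. All names here
// are assumptions for illustration.
public class ClientAuthSketch {
    /** Authenticator types supported by the thin-client handshake (sketch). */
    enum ClientAuthenticationType {
        PASSWORD; // renamed from 'BASIC' for clarity, as the ticket proposes

        /** Accept the legacy wire name for backward compatibility. */
        static ClientAuthenticationType fromLegacy(String legacyName) {
            return "BASIC".equalsIgnoreCase(legacyName) ? PASSWORD : valueOf(legacyName);
        }
    }

    public static void main(String[] args) {
        // The old name maps onto the new, clearer one.
        System.out.println(ClientAuthenticationType.fromLegacy("BASIC"));
    }
}
```

A separate enum like this keeps the handshake contract independent of whatever authentication types the server configuration defines.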



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20995) Add more integration tests for tx recovery on unstable topology

2023-12-12 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin reassigned IGNITE-20995:


Assignee:  Kirill Sizov

> Add more integration tests for tx recovery on unstable topology
> ---
>
> Key: IGNITE-20995
> URL: https://issues.apache.org/jira/browse/IGNITE-20995
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Assignee:  Kirill Sizov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Surprisingly, it might be useful to check the tx recovery implementation with 
> some tests.
> h3. Definition of Done
> 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21063:
---
Description: Fails with OOM after a while, managing to create about 500 
tables locally. We need to research why this happens. Is there a leak, or do 
we simply use too much memory?  (was: Fails with OOM on TC. We need to research, 
why it happens. Is there a leak, or we simply use too much memory)

> Cannot create 1000 tables
> -
>
> Key: IGNITE-21063
> URL: https://issues.apache.org/jira/browse/IGNITE-21063
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Fails with OOM after a while, managing to create about 500 tables locally. We 
> need to research why this happens. Is there a leak, or do we simply use too 
> much memory?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20918) Leases expire after a node has been restarted

2023-12-12 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin reassigned IGNITE-20918:


Assignee: Alexander Lapin  (was: Vladislav Pyatkov)

> Leases expire after a node has been restarted
> -
>
> Key: IGNITE-20918
> URL: https://issues.apache.org/jira/browse/IGNITE-20918
> Project: Ignite
>  Issue Type: Bug
>Reporter: Aleksandr Polovtcev
>Assignee: Alexander Lapin
>Priority: Critical
>  Labels: ignite-3
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> IGNITE-20910 introduces a test that inserts some data after restarting a 
> node. For some reason, after some time, I can see the following messages in 
> the log:
> {noformat}
> [2023-11-22T10:00:17,056][INFO 
> ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] 
> Primary replica expired [grp=5_part_19]
> [2023-11-22T10:00:17,057][INFO 
> ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] 
> Primary replica expired [grp=5_part_0]
> [2023-11-22T10:00:17,057][INFO 
> ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] 
> Primary replica expired [grp=5_part_9]
> [2023-11-22T10:00:17,057][INFO 
> ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] 
> Primary replica expired [grp=5_part_10]
> {noformat}
> After that, the test fails with a {{PrimaryReplicaMissException}}. The 
> problem here is that a single node is never expected to have expired leases; 
> they should be prolonged automatically. I think this happens because the 
> initial lease issued before the node was restarted is still accepted by the 
> node after restart.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20993) Make the tables recover on the same assignments on different nodes

2023-12-12 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin reassigned IGNITE-20993:


Assignee: Denis Chudov

> Make the tables recover on the same assignments on different nodes
> --
>
> Key: IGNITE-20993
> URL: https://issues.apache.org/jira/browse/IGNITE-20993
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Denis Chudov
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> Currently the following is possible:
>  * node A performs the recovery on revision _x_ and, in the absence of stable 
> assignments, calculates the assignments from the data nodes for this 
> revision;
>  * node B is doing the same for recovery revision _y_ which is not equal to 
> {_}x{_}.
> As a result, they can start partitions for different, inconsistent 
> assignments, and this can lead to side effects such as loss of the majority 
> for some partitions, with ambiguous, unpredictable assignments being written 
> to the meta storage due to the race between these nodes.
> *Definition of done*
> The multiple nodes performing recovery always start partitions for 
> assignments calculated for the same revision.
> *Implementation notes*
> The common revision for different nodes in this case can be revision on which 
> the table was created, it should be done under IGNITE-21014 .



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21063:
---
Description: Fails with OOM on TC. We need to research, why it happens. Is 
there a leak, or we simply use too much memory  (was: Fails with OOM on TC)

> Cannot create 1000 tables
> -
>
> Key: IGNITE-21063
> URL: https://issues.apache.org/jira/browse/IGNITE-21063
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Fails with OOM on TC. We need to research, why it happens. Is there a leak, 
> or we simply use too much memory



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21014) Table creation revision for table descriptor

2023-12-12 Thread Mirza Aliev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev reassigned IGNITE-21014:


Assignee: Mirza Aliev

> Table creation revision for table descriptor
> 
>
> Key: IGNITE-21014
> URL: https://issues.apache.org/jira/browse/IGNITE-21014
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Denis Chudov
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> In order to be able to correctly recover data nodes for tables that don't 
> have stable assignments in meta storage (see IGNITE-20993 ) there should be 
> some special revision for the recovery. We can use the revision on which such 
> tables were created. Table creation revision should be added to the table 
> descriptor.
> *Definition of done*
> Creation revision is added to the table descriptor.
> *Implementation notes*
> This creation revision shouldn't change in new versions of the descriptor - 
> it should be taken from the previous version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21014) Table creation revision for table descriptor

2023-12-12 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-21014:
-
Epic Link:   (was: IGNITE-19170)

> Table creation revision for table descriptor
> 
>
> Key: IGNITE-21014
> URL: https://issues.apache.org/jira/browse/IGNITE-21014
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> In order to be able to correctly recover data nodes for tables that don't 
> have stable assignments in meta storage (see IGNITE-20993 ) there should be 
> some special revision for the recovery. We can use the revision on which such 
> tables were created. Table creation revision should be added to the table 
> descriptor.
> *Definition of done*
> Creation revision is added to the table descriptor.
> *Implementation notes*
> This creation revision shouldn't change in new versions of the descriptor - 
> it should be taken from the previous version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21059) We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations

2023-12-12 Thread Evgeny Stanilovsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795675#comment-17795675
 ] 

Evgeny Stanilovsky commented on IGNITE-21059:
-

[~vipul.thakur] can you attach logs covering the observed problem (some time 
before and some time after the incident)? Thread dumps alone can't help here. 
If the logs are already rotated, please capture a fresh copy if the incident repeats.

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> 
>
> Key: IGNITE-21059
> URL: https://issues.apache.org/jira/browse/IGNITE-21059
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, clients
>Affects Versions: 2.14
>Reporter: Vipul Thakur
>Priority: Critical
> Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>
> We have recently upgraded from 2.7.6 to 2.14 due to an issue observed in the 
> production environment where the cluster would hang due to partition map 
> exchange.
> Please find below the ticket which I created a while back for Ignite 2.7.6:
> https://issues.apache.org/jira/browse/IGNITE-13298
> So we migrated Apache Ignite to 2.14, and the upgrade went smoothly, but on 
> the third day we saw the cluster traffic dip again. 
> We have 4 nodes in a cluster where we provide 400 GB of RAM and more than 1 
> TB HDD.
> Please find the config attached for review.
> I have also added the server logs from the time the issue happened.
> We have set both the transaction timeout and the socket timeout, on the 
> server and the client, for our write operations, but sometimes the cluster 
> still goes into a hang state: all our get calls get stuck, our JMS listener 
> threads gradually freeze, and eventually every thread ends up blocked.
> As a result, even our read services, which do not use transactions to 
> retrieve data, start to choke, ultimately leading to a dip in end-user 
> traffic.
> We were hoping the product upgrade would help, but that has not been the 
> case so far. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21063:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Cannot create 1000 tables
> -
>
> Key: IGNITE-21063
> URL: https://issues.apache.org/jira/browse/IGNITE-21063
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Fails with OOM on TC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20745) TableManager.tableAsync(int tableId) is slowing down thin clients

2023-12-12 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin reassigned IGNITE-20745:


Assignee: Igor Sapego  (was: Kirill Gusakov)

> TableManager.tableAsync(int tableId) is slowing down thin clients
> -
>
> Key: IGNITE-20745
> URL: https://issues.apache.org/jira/browse/IGNITE-20745
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Tupitsyn
>Assignee: Igor Sapego
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Attachments: ItThinClientPutGetBenchmark.java
>
>
> Performance difference between embedded and client modes is affected 
> considerably by the call to *IgniteTablesInternal#tableAsync(int id)*. This 
> call has to be performed on every individual table operation.
> We should make it as fast as possible. Something like a dictionary lookup + 
> quick check for deleted table.
> ||Part||Duration, us||
> |Network & msgpack|19.30|
> |Get table|14.29|
> |Get tuple & serialize|12.86|
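The "dictionary lookup + quick check for deleted table" idea above can be sketched as follows. This is a hypothetical illustration; the class, field, and method names are assumptions, not the real TableManager internals.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed fast path: a concurrent dictionary
// keyed by table id, plus a cheap "dropped" flag checked before returning
// a cached entry. Names are assumptions, not the actual Ignite API.
public class TableCacheSketch {
    static final class Table {
        final int id;
        volatile boolean dropped; // set once the table is deleted
        Table(int id) { this.id = id; }
    }

    private final Map<Integer, Table> tables = new ConcurrentHashMap<>();

    void register(Table t) { tables.put(t.id, t); }

    /** O(1) lookup; returns null for unknown or dropped tables. */
    Table tableById(int id) {
        Table t = tables.get(id);
        return (t == null || t.dropped) ? null : t;
    }

    public static void main(String[] args) {
        TableCacheSketch cache = new TableCacheSketch();
        Table t = new Table(42);
        cache.register(t);
        System.out.println(cache.tableById(42) != null); // found
        t.dropped = true;
        System.out.println(cache.tableById(42) != null); // hidden after drop
    }
}
```

The point is that the per-operation cost drops to one hash lookup and one volatile read, instead of whatever the current *tableAsync* path does.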



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21016) ItMixedQueriesTest.testIgniteSchemaAwaresAlterTableCommand is flaky

2023-12-12 Thread Konstantin Orlov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Orlov updated IGNITE-21016:
--
Ignite Flags:   (was: Docs Required,Release Notes Required)

> ItMixedQueriesTest.testIgniteSchemaAwaresAlterTableCommand is flaky
> ---
>
> Key: IGNITE-21016
> URL: https://issues.apache.org/jira/browse/IGNITE-21016
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Reporter: Yury Gerzhedovich
>Assignee: Konstantin Orlov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> The test 
> org.apache.ignite.internal.sql.engine.ItMixedQueriesTest#testIgniteSchemaAwaresAlterTableCommand
>  is flaky.
> The issue periodically appears on TC and is also reproducible in a local 
> environment.
> {code:java}
> org.opentest4j.AssertionFailedError: Column metadata doesn't match ==> 
> expected: <3> but was: <2>
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>   at 
> app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
>   at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
>   at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:560)
>   at 
> app//org.apache.ignite.internal.sql.engine.util.QueryCheckerImpl.check(QueryCheckerImpl.java:322)
>   at 
> app//org.apache.ignite.internal.sql.engine.util.QueryCheckerFactoryImpl$1.check(QueryCheckerFactoryImpl.java:90)
>   at 
> app//org.apache.ignite.internal.sql.engine.ItMixedQueriesTest.testIgniteSchemaAwaresAlterTableCommand(ItMixedQueriesTest.java:221)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21016) ItMixedQueriesTest.testIgniteSchemaAwaresAlterTableCommand is flaky

2023-12-12 Thread Konstantin Orlov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Orlov updated IGNITE-21016:
--
Fix Version/s: 3.0.0-beta2

> ItMixedQueriesTest.testIgniteSchemaAwaresAlterTableCommand is flaky
> ---
>
> Key: IGNITE-21016
> URL: https://issues.apache.org/jira/browse/IGNITE-21016
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Reporter: Yury Gerzhedovich
>Assignee: Konstantin Orlov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> The test 
> org.apache.ignite.internal.sql.engine.ItMixedQueriesTest#testIgniteSchemaAwaresAlterTableCommand
>  is flaky.
> The issue periodically appears on TC and is also reproducible in a local 
> environment.
> {code:java}
> org.opentest4j.AssertionFailedError: Column metadata doesn't match ==> 
> expected: <3> but was: <2>
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>   at 
> app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
>   at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
>   at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:560)
>   at 
> app//org.apache.ignite.internal.sql.engine.util.QueryCheckerImpl.check(QueryCheckerImpl.java:322)
>   at 
> app//org.apache.ignite.internal.sql.engine.util.QueryCheckerFactoryImpl$1.check(QueryCheckerFactoryImpl.java:90)
>   at 
> app//org.apache.ignite.internal.sql.engine.ItMixedQueriesTest.testIgniteSchemaAwaresAlterTableCommand(ItMixedQueriesTest.java:221)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20361) Implement the table storage description

2023-12-12 Thread Mirza Aliev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev reassigned IGNITE-20361:


Assignee: Mirza Aliev

> Implement the table storage description
> ---
>
> Key: IGNITE-20361
> URL: https://issues.apache.org/jira/browse/IGNITE-20361
> Project: Ignite
>  Issue Type: Task
>Reporter: Kirill Gusakov
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> According to IGNITE-20357 we need an appropriate table storage 
> configuration, which can be used on table create/alter to check whether the 
> chosen zone satisfies the storage requirements.
> *Definition of done*
> - The table has a storage configuration that can be used for early validation 
> of whether the table can be "deployed" in its zone correctly.
> - Data storage configuration removed from zone configuration
> *Notes*
> - Forbid altering this field for now, throwing an appropriate exception



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21063:
--

 Summary: Cannot create 1000 tables
 Key: IGNITE-21063
 URL: https://issues.apache.org/jira/browse/IGNITE-21063
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


Fails with OOM on TC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21062) Safe time reordering in partitions

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21062:
--

Assignee: Ivan Bessonov

> Safe time reordering in partitions
> --
>
> Key: IGNITE-21062
> URL: https://issues.apache.org/jira/browse/IGNITE-21062
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> In the scenario of creating a lot of tables on a (presumably) slow system, it 
> is possible to see the {{Safe time reordering detected [current=...}} 
> assertion error in the logs.
> It happens with safe-time sync commands, in the absence of transactional load.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21062) Safe time reordering in partitions

2023-12-12 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21062:
--

 Summary: Safe time reordering in partitions
 Key: IGNITE-21062
 URL: https://issues.apache.org/jira/browse/IGNITE-21062
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


In the scenario of creating a lot of tables on a (presumably) slow system, it 
is possible to see the {{Safe time reordering detected [current=...}} 
assertion error in the logs.

It happens with safe-time sync commands, in the absence of transactional load.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21061) Durable cleanup requires additional replication group command

2023-12-12 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-21061:
--

 Summary: Durable cleanup requires additional replication group 
command
 Key: IGNITE-21061
 URL: https://issues.apache.org/jira/browse/IGNITE-21061
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
After locks are released, the information must be written to the transaction 
persistent storage and replicated to all commit partition replication group 
nodes. That is performed by a replication command 
({{MarkLocksReleasedCommand}}). As a result, we have an additional replication 
command in the overall transaction process.

h3. Implementation notes
In my opinion, we can resolve this in the transaction resolution procedure 
({{OrphanDetector}}). It just needs to check the lease on the commit partition: 
if the lease has changed by the time we encounter a transaction lock, the 
replication process should start.
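The lease check described in the implementation notes reduces to a small predicate. This is a speculative sketch; the method and node names are assumptions, and the real lease bookkeeping is not modeled.

```java
// Hypothetical sketch: when an orphan transaction lock is found, compare the
// commit-partition lease holder recorded for the transaction with the current
// one; trigger the extra replication only if the lease has changed.
// All names are assumptions for illustration.
public class OrphanLeaseSketch {
    static boolean shouldReplicateCleanup(String leaseAtTxStart, String currentLeaseHolder) {
        // Lease unchanged -> the original primary is still responsible,
        // so no extra MarkLocksReleasedCommand-style replication is needed.
        return !leaseAtTxStart.equals(currentLeaseHolder);
    }

    public static void main(String[] args) {
        System.out.println(shouldReplicateCleanup("node-A", "node-A")); // same holder
        System.out.println(shouldReplicateCleanup("node-A", "node-B")); // lease moved
    }
}
```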



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21060) Extract ClusterNodeResolver as a separate entity

2023-12-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IGNITE-21060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

 Kirill Sizov reassigned IGNITE-21060:
--

Assignee:  Kirill Sizov

> Extract ClusterNodeResolver as a separate entity
> 
>
> Key: IGNITE-21060
> URL: https://issues.apache.org/jira/browse/IGNITE-21060
> Project: Ignite
>  Issue Type: Task
>Reporter:  Kirill Sizov
>Assignee:  Kirill Sizov
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> There are many places in the code that have a parameter and/or a field like
> {code}
> Function clusterNodeResolver
> {code}
> Instead of a generic function, we want a specific ClusterNodeResolver 
> entity for better code readability.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21060) Extract ClusterNodeResolver as a separate entity

2023-12-12 Thread Jira
 Kirill Sizov created IGNITE-21060:
--

 Summary: Extract ClusterNodeResolver as a separate entity
 Key: IGNITE-21060
 URL: https://issues.apache.org/jira/browse/IGNITE-21060
 Project: Ignite
  Issue Type: Task
Reporter:  Kirill Sizov


*Motivation*

There are many places in the code that have a parameter and/or a field like
{code}
Function clusterNodeResolver
{code}

Instead of a generic function, we want a specific ClusterNodeResolver 
entity for better code readability.
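One possible shape for the proposed entity is a small functional interface that replaces the bare Function while staying lambda-friendly. The names below are assumptions, not the final Ignite 3 API.

```java
import java.util.Map;

// Hypothetical sketch: a named resolver abstraction instead of a raw
// Function<String, ClusterNode>. Names are assumptions for illustration.
public class ResolverSketch {
    static final class ClusterNode {
        final String id;
        ClusterNode(String id) { this.id = id; }
    }

    /** Named replacement for the generic function; still a SAM type. */
    @FunctionalInterface
    interface ClusterNodeResolver {
        ClusterNode getById(String consistentId);
    }

    public static void main(String[] args) {
        Map<String, ClusterNode> topology = Map.of("node-1", new ClusterNode("node-1"));
        // Callers keep the same lambda/method-reference convenience.
        ClusterNodeResolver resolver = topology::get;
        System.out.println(resolver.getById("node-1").id);
    }
}
```

Because the interface has a single abstract method, every existing call site that passes a lambda keeps compiling; only the declared type and its name change.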



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20909) Thin 3.0: Compute jobs should use server notification to signal completion to the client

2023-12-12 Thread Pavel Tupitsyn (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Tupitsyn updated IGNITE-20909:

Ignite Flags:   (was: Docs Required,Release Notes Required)

> Thin 3.0: Compute jobs should use server notification to signal completion to 
> the client
> 
>
> Key: IGNITE-20909
> URL: https://issues.apache.org/jira/browse/IGNITE-20909
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute, thin client
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Tupitsyn
>Assignee: Pavel Tupitsyn
>Priority: Major
>  Labels: iep-42, ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Compute jobs can be long-lived and even out-live the client connection. New 
> Compute API is coming that will return some "execution" object immediately, 
> which can be used to monitor or cancel the job. Therefore, job startup and 
> completion should be separated; the normal request-response approach is not 
> suitable.
> * Use request-response to initiate the job execution and return an ID to the 
> client
> * Use server -> client notification to signal about completion
> This is a tried approach from Ignite 2.x, see linked 
> [IEP-42|https://cwiki.apache.org/confluence/display/IGNITE/IEP-42+Thin+Client+Compute]
>  and related discussion
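The two-phase flow above (request-response for startup, server-to-client notification for completion) can be sketched on the client side as a map of pending futures keyed by job id. This is a hedged illustration; all names are assumptions and the real wire protocol is not modeled.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the two-phase compute protocol: submit() stands in
// for the request-response that returns a job id immediately, and
// onNotification() stands in for the later server->client completion message.
public class ComputeNotificationSketch {
    private final Map<UUID, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Phase 1: the initiating request returns the job id right away. */
    UUID submit() {
        UUID jobId = UUID.randomUUID();
        pending.put(jobId, new CompletableFuture<>());
        return jobId;
    }

    /** Phase 2: a later notification completes the matching future. */
    void onNotification(UUID jobId, String result) {
        pending.remove(jobId).complete(result);
    }

    CompletableFuture<String> resultOf(UUID jobId) { return pending.get(jobId); }

    public static void main(String[] args) throws Exception {
        ComputeNotificationSketch client = new ComputeNotificationSketch();
        UUID id = client.submit();                  // job id available immediately
        CompletableFuture<String> fut = client.resultOf(id);
        client.onNotification(id, "done");          // simulated server notification
        System.out.println(fut.get());
    }
}
```

The "execution" object the description mentions would wrap the id and the future, so monitoring and cancellation can outlive any single request.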



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20909) Thin 3.0: Compute jobs should use server notification to signal completion to the client

2023-12-12 Thread Pavel Tupitsyn (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795641#comment-17795641
 ] 

Pavel Tupitsyn commented on IGNITE-20909:
-

Merged to main: 0f5434baf426f9c56fb1d511364f9d83a5dfe24d

> Thin 3.0: Compute jobs should use server notification to signal completion to 
> the client
> 
>
> Key: IGNITE-20909
> URL: https://issues.apache.org/jira/browse/IGNITE-20909
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute, thin client
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Tupitsyn
>Assignee: Pavel Tupitsyn
>Priority: Major
>  Labels: iep-42, ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Compute jobs can be long-lived and even out-live the client connection. New 
> Compute API is coming that will return some "execution" object immediately, 
> which can be used to monitor or cancel the job. Therefore, job startup and 
> completion should be separated; the normal request-response approach is not 
> suitable.
> * Use request-response to initiate the job execution and return an ID to the 
> client
> * Use server -> client notification to signal about completion
> This is a tried approach from Ignite 2.x, see linked 
> [IEP-42|https://cwiki.apache.org/confluence/display/IGNITE/IEP-42+Thin+Client+Compute]
>  and related discussion



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21056) Use thread local buffer for encrypted dump

2023-12-12 Thread Nikolay Izhikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Izhikov resolved IGNITE-21056.
--
Resolution: Fixed

> Use thread local buffer for encrypted dump
> --
>
> Key: IGNITE-21056
> URL: https://issues.apache.org/jira/browse/IGNITE-21056
> Project: Ignite
>  Issue Type: Task
>Reporter: Yuri Naryshkin
>Assignee: Yuri Naryshkin
>Priority: Minor
>  Labels: IEP-109, ise
> Fix For: 2.16
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When an encrypted dump is taken, the expanded byte buffer doesn't replace the 
> thread-local one. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21056) Use thread local buffer for encrypted dump

2023-12-12 Thread Nikolay Izhikov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Izhikov updated IGNITE-21056:
-
Fix Version/s: 2.16

> Use thread local buffer for encrypted dump
> --
>
> Key: IGNITE-21056
> URL: https://issues.apache.org/jira/browse/IGNITE-21056
> Project: Ignite
>  Issue Type: Task
>Reporter: Yuri Naryshkin
>Assignee: Yuri Naryshkin
>Priority: Minor
>  Labels: IEP-109, ise
> Fix For: 2.16
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When an encrypted dump is taken, the expanded byte buffer doesn't replace the 
> thread-local one. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-21056) Use thread local buffer for encrypted dump

2023-12-12 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795636#comment-17795636
 ] 

Ignite TC Bot commented on IGNITE-21056:


{panel:title=Branch: [pull/11086/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/11086/head] Base: [master] : No new tests 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7652539buildTypeId=IgniteTests24Java8_RunAll]

> Use thread local buffer for encrypted dump
> --
>
> Key: IGNITE-21056
> URL: https://issues.apache.org/jira/browse/IGNITE-21056
> Project: Ignite
>  Issue Type: Task
>Reporter: Yuri Naryshkin
>Assignee: Yuri Naryshkin
>Priority: Minor
>  Labels: IEP-109, ise
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When an encrypted dump is taken, the expanded byte buffer doesn't replace the 
> thread-local one. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)