[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2017-04-14 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @yvsubhash Please, take this up. So far this PR hasn't moved forward.

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2017-04-14 Thread yvsubhash
Github user yvsubhash commented on the issue: https://github.com/apache/cloudstack/pull/1762 @serg38 Is the refactoring suggested by Rafael being taken care of by @nvazquez? If not, I will take it up.

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2017-03-13 Thread cloudmonger
Github user cloudmonger commented on the issue: https://github.com/apache/cloudstack/pull/1762 ### ACS CI BVT Run **Summary:** Build Number 464 Hypervisor xenserver NetworkType Advanced Passed=104 Failed=1 Skipped=7 _Link to logs Folder

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-30 Thread rafaelweingartner
Github user rafaelweingartner commented on the issue: https://github.com/apache/cloudstack/pull/1762 @serg38 I have the same understanding about the agent LB. And this is one of the problems I think we have found here. It seems that this method is removing the balance created with

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-29 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @rafaelweingartner Thanks a lot. I totally agree that resetting hosts doesn't really need to be part of the transaction and should be extracted to a new method. The same goes for lines 527-546, and

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-27 Thread rafaelweingartner
Github user rafaelweingartner commented on the issue: https://github.com/apache/cloudstack/pull/1762 @serg38, it is great that you found one of the methods that cause the deadlock problem “com.cloud.host.dao.HostDaoImpl.findAndUpdateDirectAgentToLoad(long, Long, long)”.

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-25 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @rafaelweingartner I might be wrong but 2d came from findAndUpdateDirectAgentToLoad in HostDaoImpl which also creates a large transaction.

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-25 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @rafaelweingartner You might be right that pod_vlan_map should be in the join. Maybe I didn't find the correct methods after all. @jburwell @rhtyd What do you think? I was able to find

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-25 Thread rafaelweingartner
Github user rafaelweingartner commented on the issue: https://github.com/apache/cloudstack/pull/1762 @serg38 if that "AssignIpAddressFromPodVlanSearch" object was being used to generate the SQL, shouldn't we see a join with "pod_vlan_map" too? To me, this SC is very confusing.

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-24 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @rafaelweingartner Tried tracing where deadlock 5 originated. It seems both transactions are part of the same method fetchNewPublicIp in IpAddressManagerImpl. Transactions are executed on

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-24 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @rafaelweingartner Looks like deadlocks 2 and 3 are the same. I scanned our production log and since last December we have had 6400 deadlocks. Out of them close to 6000 were Deadlock 1 20

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-24 Thread rafaelweingartner
Github user rafaelweingartner commented on the issue: https://github.com/apache/cloudstack/pull/1762 Thanks, @serg38. Looking at the SQL you posted, we could start to discuss whether some SQL statements need locking transactions. Ignoring Deadlocks 3 and 4 for

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-24 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 Here are a few samples of deadlocks we observe in a high-transaction-volume environment with multiple management servers. As you can see most of them are concurrent operations from different

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-24 Thread rafaelweingartner
Github user rafaelweingartner commented on the issue: https://github.com/apache/cloudstack/pull/1762 @serg38 I have just now started reading this PR (excuse me if I overlooked some information). > If we are to try to implement a general way of dealing with deadlocks in ACS

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-24 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @rafaelweingartner @swill @wido @koushik-das @karuturi @rhtyd @jburwell Let's ask a different question. If we are to try to implement a general way of dealing with deadlocks in ACS how could it

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-24 Thread abhinandanprateek
Github user abhinandanprateek commented on the issue: https://github.com/apache/cloudstack/pull/1762 Even trying the full transaction again could be problematic as there might be checks done before firing the transaction that may not be valid now. The thing is it may mostly work,

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-23 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 What if the author can figure out a way to identify all parts of the transaction being cancelled and retry them? Or retry the whole transaction? It would be nice to open a path for the

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-23 Thread jburwell
Github user jburwell commented on the issue: https://github.com/apache/cloudstack/pull/1762 @serg38 corruption could happen at any point -- it's a ticking time bomb. From an ACID perspective, this patch fails on consistency. All data being updated must be re-queried

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-23 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @jburwell We've been running this fix as a part of proprietary CS for several weeks now. We are observing elimination of deadlocks and no DB corruption. Retry seems to be the only realistic way

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-23 Thread jburwell
Github user jburwell commented on the issue: https://github.com/apache/cloudstack/pull/1762 @rhtyd I am -1 on this PR

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-23 Thread rhtyd
Github user rhtyd commented on the issue: https://github.com/apache/cloudstack/pull/1762 @abhinandanprateek can you help review this one? Thanks.

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-20 Thread blueorangutan
Github user blueorangutan commented on the issue: https://github.com/apache/cloudstack/pull/1762 Trillian test result (tid-347) Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7 Total time taken: 26094 seconds Marvin logs:

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-20 Thread blueorangutan
Github user blueorangutan commented on the issue: https://github.com/apache/cloudstack/pull/1762 @rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-20 Thread rhtyd
Github user rhtyd commented on the issue: https://github.com/apache/cloudstack/pull/1762 @blueorangutan test

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-20 Thread blueorangutan
Github user blueorangutan commented on the issue: https://github.com/apache/cloudstack/pull/1762 Packaging result: ✔centos6 ✔centos7 ✔debian. JID-164

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-19 Thread blueorangutan
Github user blueorangutan commented on the issue: https://github.com/apache/cloudstack/pull/1762 @rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-19 Thread rhtyd
Github user rhtyd commented on the issue: https://github.com/apache/cloudstack/pull/1762 @blueorangutan package

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-16 Thread jburwell
Github user jburwell commented on the issue: https://github.com/apache/cloudstack/pull/1762 Due to the previous discussion, I am -1 on merging this PR.

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-15 Thread cloudmonger
Github user cloudmonger commented on the issue: https://github.com/apache/cloudstack/pull/1762 ### ACS CI BVT Run **Summary:** Build Number 135 Hypervisor xenserver NetworkType Advanced Passed=102 Failed=3 Skipped=6 _Link to logs Folder

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-14 Thread jburwell
Github user jburwell commented on the issue: https://github.com/apache/cloudstack/pull/1762 @serg38 with custom plugins, there is no way to reliably perform such tracing. I can think of batch cleanup operations in the storage layer that follow the pattern I described. Even if there

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-14 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @jburwell I concur, but if @yvsubhash verified that those methods don't participate in complex DML transactions this might still be a good start. If so this approach might be expanded later to

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-14 Thread jburwell
Github user jburwell commented on the issue: https://github.com/apache/cloudstack/pull/1762 @serg38 there remains a risk when those methods are executed in the context of an open transaction where DMLs have already been executed and subsequent DMLs will be executed. In this

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-14 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @jburwell @yvsubhash I might be wrong but this PR will retry on deadlock for only 2 DAO methods: searchIncludingRemoved and customSearchIncludingRemoved. No update methods are set with this

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-14 Thread jburwell
Github user jburwell commented on the issue: https://github.com/apache/cloudstack/pull/1762 @serg38 that is not a safe assumption. Transactions often span multiple statements and methods across DAOs. `TransactionLegacy` has a transaction stacking/nested model that further occludes

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-14 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @jburwell I thought that most, if not all, ACS interactions through DAOs are atomic transactions. Do we have cases of multiple DML statements as a part of the same transaction? We have been

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-14 Thread jburwell
Github user jburwell commented on the issue: https://github.com/apache/cloudstack/pull/1762 @serg38 my reading of the code is that only the most recently attempted DML will be re-executed. Furthermore, retrying without refreshing the base data can also lead to data corruption. The

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-14 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 @jburwell @yvsubhash My understanding is that all rolled-back statements will receive MYSQL_DEADLOCK_ERROR_CODE and will be retried as part of this patch.

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-14 Thread jburwell
Github user jburwell commented on the issue: https://github.com/apache/cloudstack/pull/1762 @yvsubhash according to the [MySQL deadlock documentation](http://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks.html), a `MYSQL_DEADLOCK_ERROR_CODE` error indicates the enclosing
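
Editor's sketch (not code from the PR): the concern raised in this thread is that a deadlock error rolls back work beyond the failing statement, so re-running only the last statement is unsafe. A minimal illustration of the alternative, retrying the whole unit of work so that it re-reads its data on every attempt, might look like the following. The class `DeadlockRetrySketch`, the interface `TransactionBody`, and the method `runWithDeadlockRetry` are hypothetical names, not CloudStack APIs. The value 1213 is MySQL's vendor code for `ER_LOCK_DEADLOCK`; that the PR's `MYSQL_DEADLOCK_ERROR_CODE` holds the same value is an assumption. The deadlock is simulated so the example runs without a database.

```java
import java.sql.SQLException;

public class DeadlockRetrySketch {
    /** MySQL vendor error code for ER_LOCK_DEADLOCK (assumed to match the PR's constant). */
    public static final int MYSQL_DEADLOCK_ERROR_CODE = 1213;

    /**
     * Hypothetical unit of work. It is expected to open the transaction,
     * re-query everything it depends on, apply its changes, and commit,
     * so that each retry starts from fresh data.
     */
    @FunctionalInterface
    public interface TransactionBody<T> {
        T run() throws SQLException;
    }

    /** Re-runs the entire body when it fails with a deadlock, up to maxAttempts times. */
    public static <T> T runWithDeadlockRetry(TransactionBody<T> body, int maxAttempts)
            throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return body.run();
            } catch (SQLException e) {
                // Anything other than a deadlock is not retried.
                if (e.getErrorCode() != MYSQL_DEADLOCK_ERROR_CODE) {
                    throw e;
                }
                last = e;
            }
        }
        // Every attempt deadlocked: surface the last error to the caller.
        throw last;
    }

    public static void main(String[] args) throws SQLException {
        int[] calls = {0};
        String result = runWithDeadlockRetry(() -> {
            calls[0]++;
            // Simulate two deadlocks before the transaction succeeds.
            if (calls[0] < 3) {
                throw new SQLException("Deadlock found when trying to get lock",
                        "40001", MYSQL_DEADLOCK_ERROR_CODE);
            }
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

This sketch does not address the harder problem discussed in the thread: checks performed before the transaction starts, or nested transactions opened by callers, are outside the retried body and may no longer be valid on a retry.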

[GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...

2016-11-11 Thread serg38
Github user serg38 commented on the issue: https://github.com/apache/cloudstack/pull/1762 LGTM. Finally !!! We have been seeing occasional deadlocks in environments with a high transaction rate. @rhtyd @jburwell This could be a good addition to 4.8/4.9.