Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@yvsubhash Please, take this up. So far this PR hasn't moved forward.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user yvsubhash commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@serg38 Is the refactoring suggested by Rafael being taken care of by
@nvazquez? If not, I will take it up.
Github user cloudmonger commented on the issue:
https://github.com/apache/cloudstack/pull/1762
### ACS CI BVT Run
**Summary:**
Build Number 464
Hypervisor xenserver
NetworkType Advanced
Passed=104
Failed=1
Skipped=7
_Link to logs Folder
Github user rafaelweingartner commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@serg38 I have the same understanding about the agent LB. And this is one
of the problems I think we have found here. It seems that this method is
removing the balance created with
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@rafaelweingartner Thanks a lot. I totally agree that resetting hosts
doesn't really need to be part of the transaction and should be extracted to a
new method. The same goes for lines 527-546, and
Github user rafaelweingartner commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@serg38, it is great that you found one of the methods that cause the
deadlock problem
"com.cloud.host.dao.HostDaoImpl.findAndUpdateDirectAgentToLoad(long, Long,
long)".
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@rafaelweingartner I might be wrong but 2d came from
findAndUpdateDirectAgentToLoad in HostDaoImpl which also creates a large
transaction.
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@rafaelweingartner You might be right that pod_vlan_map should be in the
join. Maybe I didn't find the correct methods after all. @jburwell @rhtyd What
do you think?
I was able to find
Github user rafaelweingartner commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@serg38 if that "AssignIpAddressFromPodVlanSearch" object was being used to
generate the SQL, shouldn't we see a join with "pod_vlan_map" too? For me,
this SC is very confusing.
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@rafaelweingartner I tried tracing where deadlock 5 originated. It seems
both transactions are part of the same method, fetchNewPublicIp in
IpAddressManagerImpl. Transactions are executed on
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@rafaelweingartner Looks like deadlocks 2 and 3 are the same. I
scanned our production log and since last December we have had 6400 deadlocks.
Of those, close to 6000 were Deadlock 1
20
Github user rafaelweingartner commented on the issue:
https://github.com/apache/cloudstack/pull/1762
Thanks, @serg38.
Looking at the SQL you posted, we could start to discuss whether or not
some SQL statements need locking transactions.
Ignoring Deadlocks 3 and 4 for
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
Here are a few samples of the deadlocks we observe in a high-transaction-volume
environment with multiple management servers. As you can see, most of them are
concurrent operations from different
Github user rafaelweingartner commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@serg38 I have just now started reading this PR (excuse me if I overlooked
some information).
> If we are to try to implement a general way of dealing with deadlocks in
ACS
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@rafaelweingartner @swill @wido @koushik-das @karuturi @rhtyd @jburwell
Let's ask a different question. If we are to try to implement a general way of
dealing with deadlocks in ACS how could it
Github user abhinandanprateek commented on the issue:
https://github.com/apache/cloudstack/pull/1762
Even trying the full transaction again could be problematic as there might
be checks done before firing the transaction that may not be valid now.
The thing is it may mostly work,
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
What if the author can figure out a way to identify all parts of the
transaction being cancelled and retry all of them? Or retry the whole
transaction? It would be nice to open a path for the
Github user jburwell commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@serg38 corruption could happen at any point -- it's a ticking time bomb.
From an ACID perspective, this patch fails on consistency. All
data being updated must be re-queried
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@jburwell We've been running this fix as part of our proprietary CS for
several weeks now. We have seen deadlocks eliminated and no DB
corruption. Retry seems to be the only realistic way
Github user jburwell commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@rhtyd I am -1 on this PR
Github user rhtyd commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@abhinandanprateek can you help review this one? Thanks.
Github user blueorangutan commented on the issue:
https://github.com/apache/cloudstack/pull/1762
Trillian test result (tid-347)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 26094 seconds
Marvin logs:
Github user blueorangutan commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been
kicked to run smoke tests
Github user rhtyd commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@blueorangutan test
Github user blueorangutan commented on the issue:
https://github.com/apache/cloudstack/pull/1762
Packaging result: centos6, centos7, debian. JID-164
Github user blueorangutan commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you
posted as I make progress.
Github user rhtyd commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@blueorangutan package
Github user jburwell commented on the issue:
https://github.com/apache/cloudstack/pull/1762
Due to the previous discussion, I am -1 on merging this PR.
Github user cloudmonger commented on the issue:
https://github.com/apache/cloudstack/pull/1762
### ACS CI BVT Run
**Summary:**
Build Number 135
Hypervisor xenserver
NetworkType Advanced
Passed=102
Failed=3
Skipped=6
_Link to logs Folder
Github user jburwell commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@serg38 with custom plugins, there is no way to reliably perform such
tracing. I can think of batch cleanup operations in the storage layer that
follow the pattern I described. Even if there
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@jburwell I concur, but if @yvsubhash verified that those methods don't
participate in complex DML transactions, this might still be a good start. If so,
this approach might be expanded later to
Github user jburwell commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@serg38 there remains a risk when those methods are executed in the context
of an open transaction where DMLs have already been executed and subsequent
DMLs will be executed. In this
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@jburwell @yvsubhash I might be wrong, but this PR will retry on deadlock
for only 2 DAO methods: `searchIncludingRemoved` and
`customSearchIncludingRemoved`. No update methods are set with this
Github user jburwell commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@serg38 that is not a safe assumption. Transactions often span multiple
statements and methods across DAOs. `TransactionLegacy` has a transaction
stacking/nested model that further occludes
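The stacking concern can be illustrated with a minimal sketch. `TxScope` and its methods are hypothetical names invented for this example; they are not part of `TransactionLegacy` or any CloudStack API. The sketch only shows the idea that a statement-level retry should be considered when no enclosing transaction is open on the current thread.

```java
/** Hypothetical per-thread transaction depth tracker (illustration only). */
final class TxScope {
    // Number of transactions currently open on this thread (nested ones included).
    private static final ThreadLocal<Integer> DEPTH = ThreadLocal.withInitial(() -> 0);

    private TxScope() {}

    /** Called when a (possibly nested) transaction is opened. */
    static void begin() {
        DEPTH.set(DEPTH.get() + 1);
    }

    /** Called when the matching transaction is committed or rolled back. */
    static void end() {
        int depth = DEPTH.get();
        if (depth == 0) {
            throw new IllegalStateException("end() called without a matching begin()");
        }
        DEPTH.set(depth - 1);
    }

    /**
     * Returns true only when no transaction is open on this thread. Inside an
     * open transaction, earlier statements may already have been rolled back,
     * so re-running a single statement would not restore them.
     */
    static boolean statementRetryAllowed() {
        return DEPTH.get() == 0;
    }
}
```

A DAO-level retry wrapper could consult such a guard and rethrow the deadlock error to the outermost caller whenever the guard returns false.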
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@jburwell I thought that most, if not all, ACS interactions through DAOs are
atomic transactions. Do we have cases of multiple DML statements as
part of the same transaction? We have been
Github user jburwell commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@serg38 my reading of the code is that only the most recently attempted DML
will be re-executed. Furthermore, retrying without refreshing the base data
can also lead to data corruption. The
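The alternative raised in this thread, retrying the whole transaction rather than the last statement, might look roughly like the sketch below. It is not the code in this PR. `TransactionRetry` and `runWithDeadlockRetry` are hypothetical names, and the sketch assumes the caller passes a unit of work that opens its own transaction, re-reads the rows it depends on, repeats its validity checks, writes, and commits on every attempt. The value 1213 is MySQL's deadlock error code.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper that re-runs an entire unit of work after a deadlock. */
final class TransactionRetry {
    // MySQL vendor error code for ER_LOCK_DEADLOCK.
    static final int MYSQL_DEADLOCK_ERROR_CODE = 1213;

    private TransactionRetry() {}

    /**
     * Runs unitOfWork and re-runs it from the start when it fails with a
     * deadlock, up to maxAttempts attempts in total. Any other SQLException,
     * or a deadlock on the final attempt, is rethrown unchanged.
     */
    static <T> T runWithDeadlockRetry(Callable<T> unitOfWork, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                // Every attempt starts from scratch: reads, checks, writes, commit.
                return unitOfWork.call();
            } catch (SQLException e) {
                boolean deadlock = e.getErrorCode() == MYSQL_DEADLOCK_ERROR_CODE;
                if (!deadlock || attempt >= maxAttempts) {
                    throw e;
                }
                // Short linear backoff so the competing transaction can finish.
                Thread.sleep(10L * attempt);
            }
        }
    }
}
```

Because the reads and checks sit inside the retried unit, each attempt works from fresh data, which addresses the stale-data concern raised above. It does not help when the unit of work is itself nested inside a caller's open transaction.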
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@jburwell @yvsubhash My understanding is that all rolled-back statements will
receive MYSQL_DEADLOCK_ERROR_CODE and will be retried as a part of this patch.
Github user jburwell commented on the issue:
https://github.com/apache/cloudstack/pull/1762
@yvsubhash according to the [MySQL deadlock
documentation](http://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks.html), a
`MYSQL_DEADLOCK_ERROR_CODE` error indicates the enclosing
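For reference, MySQL reports a deadlock as vendor error 1213 (`ER_LOCK_DEADLOCK`) with SQLSTATE `40001`. The sketch below shows one way a JDBC caller might recognise that condition. `DeadlockDetector` is a hypothetical helper, not a CloudStack class, and it is an assumption that the PR's `MYSQL_DEADLOCK_ERROR_CODE` constant carries the value 1213.

```java
import java.sql.SQLException;

/** Hypothetical helper that recognises a MySQL deadlock in an exception chain. */
final class DeadlockDetector {
    // MySQL vendor error code for ER_LOCK_DEADLOCK (assumed to match the PR's constant).
    static final int MYSQL_DEADLOCK_ERROR_CODE = 1213;
    // SQLSTATE that MySQL attaches to the same error.
    static final String DEADLOCK_SQLSTATE = "40001";

    private DeadlockDetector() {}

    /** Returns true if t, or any exception in its cause chain, is a SQLException reporting a deadlock. */
    static boolean isDeadlock(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof SQLException) {
                SQLException e = (SQLException) cur;
                if (e.getErrorCode() == MYSQL_DEADLOCK_ERROR_CODE
                        || DEADLOCK_SQLSTATE.equals(e.getSQLState())) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Walking the cause chain matters because DAO layers often wrap the driver's `SQLException` in a runtime exception before it reaches the caller.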
Github user serg38 commented on the issue:
https://github.com/apache/cloudstack/pull/1762
LGTM. Finally!!! We have been seeing occasional deadlocks in environments
with a high transaction rate. @rhtyd @jburwell This could be a good addition to
4.8/4.9.