Re: [DISCUSS] Management server (pre-)shutdown to avoid killing jobs

2017-12-18 Thread Marc-Aurèle Brothier
It's definitely a great direction to take and much more robust. ZK would be a great fit to monitor the state of management servers and agents with the help of the ephemeral nodes. On the other hand, it's not encouraged to use it as a messaging queue, and Kafka would be a much better fit for that

Adding Spellchecker to code style validator

2017-12-18 Thread Ivan Kudryavtsev
Hello, devs. How about adding spell checking to the code style guide? ACS uses a lot of Java introspection, including JSON generation, etc., so typos migrate to the protocol level. Working on CLOUDSTACK-10168 I found ipv4_adress inside Python code / DHCP-related JSON, trying to improve "the camp" I moved

RE: Master Blockers and Criticals

2017-12-18 Thread Paul Angus
Thank you, Khosrow. Do you have an Apache Jira ID, so that I can assign it in Jira also? Kind regards, Paul Angus -----Original Message----- From: Khosrow Moossavi

Re: Bug in ViewResponseHelper.java of 4627fb2

2017-12-18 Thread Tutkowski, Mike
I’m not at my computer now, so I don’t know the exact line number. I can open up a PR with my fix. The problem is that you shouldn’t pass in null for the key of a ConcurrentHashMap, but the code can do this for data disks on VMware (hence the NullPointerException). On Dec 18, 2017, at 4:48 PM,
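The null-key behaviour described in this message can be reproduced with the standard library alone. The following is a minimal sketch, not the fix proposed in the thread: the class name NullKeyDemo, the helper lookup, and the sample keys are hypothetical, and the actual code in ViewResponseHelper.java is not shown here. It only illustrates that ConcurrentHashMap rejects a null key and that a guard before the lookup avoids the exception.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NullKeyDemo {
    // Hypothetical helper: treat a null key as "no entry" instead of
    // passing it to the map, which would throw NullPointerException.
    static String lookup(Map<String, String> cache, String key) {
        if (key == null) {
            return null;
        }
        return cache.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        cache.put("chain-1", "vol-1"); // sample data, invented for the demo

        try {
            // ConcurrentHashMap does not permit null keys, even for reads.
            cache.get(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NullPointerException");
        }

        // The guarded lookup returns null for a null key and the stored
        // value for a known key.
        System.out.println(lookup(cache, null));
        System.out.println(lookup(cache, "chain-1"));
    }
}
```

Whether a skipped lookup is the right behaviour for a volume with a NULL chain value is a design decision for the actual PR; the guard above only shows the mechanism.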

Re: Bug in ViewResponseHelper.java of 4627fb2

2017-12-18 Thread Rafael Weingärtner
What is the line in that class that may generate an NPE? 291? Please do open a PR to propose a fix for this situation. On Mon, Dec 18, 2017 at 6:38 PM, Tutkowski, Mike wrote: > Hi, > > I noticed an issue today with a fairly recent commit: 4627fb2. > > In

Re: Master Blockers and Criticals

2017-12-18 Thread Khosrow Moossavi
@Paul you can assign CLOUDSTACK-9862 to me, we already have it fixed in our own fork. On Mon, Dec 18, 2017 at 12:05 PM, Paul Angus wrote: > Hi All, here is an updated summary of the open Critical and Blocker Issues > in Jira. > If you are working on any of these

Bug in ViewResponseHelper.java of 4627fb2

2017-12-18 Thread Tutkowski, Mike
Hi, I noticed an issue today with a fairly recent commit: 4627fb2. In ViewResponseHelper.java, a NullPointerException can be thrown when interacting with a data disk on VMware because the disk chain value in cloud.volumes can have a value of NULL. I can put in a check for NULL and avoid the

Re: XenServer 7.1 and 7.2

2017-12-18 Thread Khosrow Moossavi
Apparently XenServer "xen-tools" has been renamed from version 7.0 onward to "guest-tools". https://docs.citrix.com/content/dam/docs/en-us/xenserver/xenserver-7-0/downloads/xenserver-7-0-quick-start-guide.pdf (Section 4.2, point 3) And this comment:

Re: XenServer 7.1 and 7.2

2017-12-18 Thread Rohit Yadav
Thanks Paul, the PR has been merged after reviewing and based on smoketests. Would you also like to add support for XenServer 7.3? Regards. From: Paul Angus Sent: Wednesday, December 13, 2017 11:39:28 PM To: dev Cc: Syed Ahmed;

Re: [UPDATE] Debian 9 "stretch" systemvmtemplate for master

2017-12-18 Thread Rohit Yadav
Hi Wido, Thanks. I've verified, virtio-scsi seems to work for me. The QEMU guest agent also works; I was able to write PoC code to get rid of patchviasocket.py as well. Can you help review and test the PR? Regards. From: Wido den Hollander Sent:

Re: [UPDATE] Debian 9 "stretch" systemvmtemplate for master

2017-12-18 Thread Rohit Yadav
All, Thanks for your feedback. We're close to completion now. All smoketests are now passing on KVM, XenServer and VMware. There are, however, a few intermittent failures on VMware being looked into. The rVR smoketest failures on VMware have been fixed as well. The systemvmtemplate

Re: [DISCUSS] Management server (pre-)shutdown to avoid killing jobs

2017-12-18 Thread ilya musayev
I very much agree with Paul, we should consider moving to a resilient model with the least dependence, i.e. ha-proxy. Sending a notification to the partner MS to take over the job management would be ideal. On Mon, Dec 18, 2017 at 9:28 AM Paul Angus wrote: > Hi Marc-Aurèle, > >

RE: [DISCUSS] Management server (pre-)shutdown to avoid killing jobs

2017-12-18 Thread Paul Angus
Hi Marc-Aurèle, Personally, my utopia would be to be able to pass async jobs between mgmt. servers. So rather than waiting for an indeterminate time for a snapshot to complete, monitoring the job is passed to another management server. I would LOVE it if something like Zookeeper monitored the

RE: Master Blockers and Criticals

2017-12-18 Thread Paul Angus
Hi All, here is an updated summary of the open Critical and Blocker Issues in Jira. If you are working on any of these issues, please indicate whether you believe that you will have this issue closed by 8th Jan. @Jayapal Reddy please respond to the pings on the subject of the blocker that you have

Re: Clean up old and obsolete branches

2017-12-18 Thread Rafael Weingärtner
@Marc, I like this idea. However, some folks believe it might be useful to use the official repo to work in groups (group of committers). I did not want to push this without a broader discussion; that is why I am proposing that people can use the official repository, as long as they remove the

[DISCUSS] Management server (pre-)shutdown to avoid killing jobs

2017-12-18 Thread Marc-Aurèle Brothier
Hi everyone, Another point, another thread. Currently when shutting down a management server, despite all the "stop()" methods not being called as far as I know, the server could be in the middle of processing an async job task. It will lead to a failed job since the response won't be delivered to

Re: Clean up old and obsolete branches

2017-12-18 Thread Daan Hoogland
Any workable procedure (including yours, Rafael) will do, but let's be extremely patient and lenient. I think we can start deleting a lot of old branches (RC-branches and merged PRs to start with). On Mon, Dec 18, 2017 at 2:23 PM, Marc-Aurèle Brothier wrote: > +1 for me > > On

Re: Clean up old and obsolete branches

2017-12-18 Thread Marc-Aurèle Brothier
+1 for me. On point 5, since you can have people working together on forks, I would simply state that no other branches except the official ones can be in the project repository, removing: "If one uses the official repository, the branch used must be cleaned right after merging;" On Mon, Dec

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Rafael Weingärtner
Now, yes! Thanks for the clarification. On Mon, Dec 18, 2017 at 11:16 AM, Marc-Aurèle Brothier wrote: > Sorry about the confusion. It's not going to replace the DB transactions in > the DAO way. Today we can say that there are 2 types of locks in CS, either > a pure

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Marc-Aurèle Brothier
Sorry about the confusion. It's not going to replace the DB transactions in the DAO way. Today we can say that there are 2 types of locks in CS, either a pure transaction one, with the select for update which locks a row for any operation by other threads, or a more programmatic one with the

Re: Clean up old and obsolete branches

2017-12-18 Thread Rafael Weingärtner
Guys, this is the moment to give your opinion here. Since nobody has commented on the protocol, I will just add some more steps before deletion. 1. Only maintain the master and major release branches. We currently have a system of X.Y.Z.S. I define major release here as a release

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Rafael Weingärtner
So, we would need to change every piece of code that opens and uses connections and transactions to switch to the ZK model? I mean, to direct the flow to ZK. On Mon, Dec 18, 2017 at 8:55 AM, Marc-Aurèle Brothier wrote: > I understand your point, but there isn't any "transaction"

Re: MySQL HA

2017-12-18 Thread Alireza Eskandari
Yes, I'll keep it and do some stress tests on it to be sure about its functionality. On Dec 18, 2017 14:53, "Rafael Weingärtner" wrote: > So, this fixed the problem? > Can you keep this running for a while longer? Just to make sure. Then, I > can open a PR to fix it

Re: MySQL HA

2017-12-18 Thread Rafael Weingärtner
So, this fixed the problem? Can you keep this running for a while longer? Just to make sure. Then, I can open a PR to fix it in master. On Mon, Dec 18, 2017 at 9:02 AM, Alireza Eskandari wrote: > Thank you Rafael, > I test your fix and it seems that I have got the

Re: MySQL HA

2017-12-18 Thread Alireza Eskandari
Thank you Rafael, I tested your fix and it seems that I have got the expected result. You can see the exception raised for database failover. I should note that I replaced the file for cloudstack-management and cloudstack-usage: /usr/share/cloudstack-usage/lib/cloud-framework-cluster-4.9.3.0.jar

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Marc-Aurèle Brothier
I understand your point, but there isn't any "transaction" in ZK. The transaction and commit stuff are really for DB and not part of ZK. All entries (if you start writing data in some nodes) are versioned. For example you could enforce that to overwrite a node value you must submit the node data
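The versioned entries mentioned in this message can be pictured as a compare-and-set on a version number. The sketch below is an in-memory illustration only and does not use the ZooKeeper client: the class VersionedNode and its methods are hypothetical names. It assumes the check is made against the node's version, which is how ZooKeeper's setData(path, data, version) call behaves (it fails with a BadVersionException on a mismatch); the snippet above is truncated before it states the exact condition.

```java
import java.util.concurrent.atomic.AtomicReference;

class VersionedNode {
    // Immutable snapshot of the node content together with its version.
    static final class Stat {
        final String data;
        final int version;

        Stat(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    private final AtomicReference<Stat> state;

    VersionedNode(String initialData) {
        state = new AtomicReference<>(new Stat(initialData, 0));
    }

    Stat read() {
        return state.get();
    }

    // Overwrite the data only if the caller saw the latest version.
    // Returns false when another writer got there first.
    boolean setData(String newData, int expectedVersion) {
        Stat current = state.get();
        if (current.version != expectedVersion) {
            return false;
        }
        return state.compareAndSet(current, new Stat(newData, current.version + 1));
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("capacity=10");
        int seen = node.read().version;

        // First writer succeeds and bumps the version.
        System.out.println(node.setData("capacity=9", seen));
        // Second writer still holds the stale version and is rejected.
        System.out.println(node.setData("capacity=8", seen));
        System.out.println(node.read().data);
    }
}
```

A rejected writer would normally re-read the node and retry with the fresh version, which gives optimistic concurrency without a database-style transaction.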

Re: MySQL HA

2017-12-18 Thread L Radhakrishna Rao
On 18-Dec-2017 4:03 PM, "Rafael Weingärtner" wrote: > Here is a fix: > https://www.dropbox.com/s/kgakhs3v05uz88x/cloud-framework-cluster-4.9.3.0.jar?dl=1 > You need to replace this jar file in CloudStack installation. You should > also backup the original jar and

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Rafael Weingärtner
I did not check the link before. Sorry about that. Reading some of the pages there, I see curator more like a client library such as MySQL JDBC client. When I mentioned framework, I was looking for something like Spring-data. So, we could simply rely on the framework to manage connections and

Re: MySQL HA

2017-12-18 Thread Rafael Weingärtner
Here is a fix: https://www.dropbox.com/s/kgakhs3v05uz88x/cloud-framework-cluster-4.9.3.0.jar?dl=1 You need to replace this jar file in the CloudStack installation. You should also back up the original jar and restore it as soon as you finish testing. To replace the JARs, you need to stop ACS, and just

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Marc-Aurèle Brothier
@rafael, yes there is a framework (curator), it's the link I posted in my first message: https://curator.apache.org/curator-recipes/shared-lock.html This framework helps handle all the complexity of ZK. The ZK client stays connected all the time (as the DB connection pool), and only one

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Rafael Weingärtner
Do we have a framework to do this kind of locking in ZK? I mean, you said "create a new InterProcessSemaphoreMutex which handles the locking mechanism.". This feels like we would have to continue opening and closing this transaction manually, which is what causes a lot of our headaches with

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Marc-Aurèle Brothier
We added a ZK lock to fix this issue, but we will remove all current locks in favor of the ZK one. The ZK lock is already encapsulated in a project with an interface, but more work should be done to have a proper interface for locks which could be implemented with the "tool" you want, either a DB

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Rafael Weingärtner
So, how does that work? I mean, instead of opening a transaction with the database and executing locks, what do we need to do in the code? On Mon, Dec 18, 2017 at 7:24 AM, Ivan Kudryavtsev wrote: > Rafael, > > - It's easy to configure and run ZK either in single node

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Ivan Kudryavtsev
Rafael, - It's easy to configure and run ZK either in single node or cluster - Zookeeper should replace the MySQL locking mechanism used inside ACS code (places where ACS locks tables or rows). On the other side, I don't think that moving from MySQL locks to ZK locks is easy and light and (even

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Rafael Weingärtner
How hard is it to configure Zookeeper and get everything up and running? BTW: what would Zookeeper be managing? CloudStack management servers or MySQL nodes? On Mon, Dec 18, 2017 at 7:13 AM, Ivan Kudryavtsev wrote: > Hello, Marc-Aurele, I strongly believe that all

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Ivan Kudryavtsev
Hello, Marc-Aurele, I strongly believe that all MySQL locks should be removed in favour of a true DLM solution like Zookeeper. The performance of a 3-node ZK ensemble should be enough to hold up to 1000-2000 locks per second and it helps to move to a truly clustered MySQL like Galera without single

Re: [Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Daan Hoogland
Are you proposing to add Zookeeper as an optional requirement, Marc-Aurèle? Or just Curator? And what is the decision mech of including it or not? On Mon, Dec 18, 2017 at 9:33 AM, Marc-Aurèle Brothier wrote: > Hi everyone, > > I was wondering how many of you are running

[Discuss] Management cluster / Zookeeper holding locks

2017-12-18 Thread Marc-Aurèle Brothier
Hi everyone, I was wondering how many of you are running CloudStack with a cluster of management servers. I would think most of you, but it would be nice to hear everyone's voices. And do you get hosts going over their capacity limits? We discovered that during the VM allocation, if you get a lot