Re: [ANNOUNCE] Open source distributed virtual machine scheduling platform

Rafael Weingärtner Tue, 03 May 2016 13:00:25 -0700

Hi Rohit, thanks ;)

I will answer your questions in line.

I did not look at the code but I'm curious on how you're powering off
hosts, I think with my out-of-band management PR you can use the oobm
subsystem to perform power management operations for IPMI 2.0 enabled hosts.

A: when we developed the first version (around October 2015), Apache
CloudStack (ACS) had no support for activating and deactivating hosts, and
it still does not; you are working on that for ShapeBlue, right? If
something had been available at that time, it would have been great. So we
had to develop something ourselves to power hosts on and off (that was not
our focus, but we needed it), and we created the simplest solution that
met our needs. Our cloud computing environment is built on pretty outdated
servers, and half of them do not support IPMI. Therefore, to shut down
hosts, we use the hypervisor's API. We noticed that most hypervisors have
a shutdown command in their APIs; that is why we used it. We could not
spend many resources (time and energy) on developing that for every
hypervisor ACS supports, so we did it only for XenServer as a proof of
concept (POC); adding support for other hypervisors would be a matter of
implementing an interface.
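
To give an idea of what "implementing an interface" could look like, here
is a minimal sketch in Java. It is illustrative only: HostShutdownStrategy,
XenServerShutdown and ShutdownRegistry are hypothetical names and are not
taken from the repository, and the XenServer call is a stub rather than a
real API call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical extension point: one implementation per hypervisor type.
interface HostShutdownStrategy {
    /** Hypervisor type this strategy handles, e.g. "XenServer". */
    String hypervisorType();

    /** Asks the hypervisor API to shut the host down; returns true on success. */
    boolean shutdown(String hostAddress);
}

// Stub standing in for a real XenServer client (assumption, not the real API).
class XenServerShutdown implements HostShutdownStrategy {
    @Override
    public String hypervisorType() {
        return "XenServer";
    }

    @Override
    public boolean shutdown(String hostAddress) {
        // A real implementation would invoke the hypervisor's shutdown command here.
        System.out.println("Requesting shutdown of " + hostAddress + " via the XenServer API");
        return true;
    }
}

// Maps a host's hypervisor type to the strategy that can shut it down.
class ShutdownRegistry {
    private final Map<String, HostShutdownStrategy> strategies = new HashMap<>();

    void register(HostShutdownStrategy strategy) {
        strategies.put(strategy.hypervisorType(), strategy);
    }

    Optional<HostShutdownStrategy> find(String hypervisorType) {
        return Optional.ofNullable(strategies.get(hypervisorType));
    }
}
```

With a layout like this, supporting another hypervisor would mean adding
one more class and registering it; nothing else would need to change.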

Even though we did the “shutdown” using the hypervisor API, it would be
nice to also have it through the IPMI interface; it is rare, but we have
seen servers hang during the shutdown process.

Then, to activate (start) servers, we used the Wake-on-LAN (WOL) protocol.
We found that to be the easiest way to activate servers on a LAN (there are
some requirements for that, given that it uses layer 2 of the OSI model to
send the commands). However, once again, our environment did not help
much. One of our servers did not support WOL, but luckily it had IPMI
support. Therefore, to start a server we use either IPMI or WOL, depending
on a flag that we added to the “cloud.host” table.
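
For reference, a WOL "magic packet" is 6 bytes of 0xFF followed by the
target MAC address repeated 16 times. The sketch below builds one in Java
and sends it as a UDP broadcast to port 9, which is one common way to
deliver it and an assumption here, not necessarily what the platform does.
The StartMethod enum and HostStarter class are hypothetical stand-ins for
the flag mentioned above; the actual column name and values are not shown
in this email, and the IPMI branch is only a placeholder.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

class WakeOnLan {
    /** Builds the magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times. */
    static byte[] magicPacket(String mac) {
        String[] parts = mac.split("[:-]");
        if (parts.length != 6) {
            throw new IllegalArgumentException("Expected 6 octets in MAC address: " + mac);
        }
        byte[] macBytes = new byte[6];
        for (int i = 0; i < 6; i++) {
            macBytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        byte[] packet = new byte[6 + 16 * 6];
        for (int i = 0; i < 6; i++) {
            packet[i] = (byte) 0xFF;
        }
        for (int offset = 6; offset < packet.length; offset += 6) {
            System.arraycopy(macBytes, 0, packet, offset, 6);
        }
        return packet;
    }

    /** Sends the packet as a UDP broadcast; the broadcast must reach the host's segment. */
    static void wake(String mac, String broadcastAddress) throws Exception {
        byte[] data = magicPacket(mac);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true);
            socket.send(new DatagramPacket(data, data.length,
                    InetAddress.getByName(broadcastAddress), 9));
        }
    }
}

// Hypothetical values for the per-host flag.
enum StartMethod { WOL, IPMI }

class HostStarter {
    static void start(StartMethod method, String mac, String broadcastAddress) throws Exception {
        switch (method) {
            case WOL:
                WakeOnLan.wake(mac, broadcastAddress);
                break;
            case IPMI:
                // Placeholder: an IPMI power-on request would go here.
                throw new UnsupportedOperationException("IPMI start is not part of this sketch");
        }
    }
}
```

Because the packet is broadcast on the local segment, the management
server (or an agent) has to sit on the same layer 2 network as the host,
which is the requirement mentioned above.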

Did the explanation help? You are welcome to look at the code, we think it
is more or less clear and documented.

Also curious how you implemented the heuristics and wrote tests (esp.
integration ones), some of us had a related discussion about such a feature
and we looked at this paper from VMware DRS team:
http://www.waldspurger.org/carl/papers/drs-vmtj-mar12.pdf

A: well, the heuristics are written in Java; we have an interface with a
set of methods that have to be implemented and that can be used by our
agents; also, we have a set of basic classes to support the development of
new heuristics. We have created only two simple heuristics to be used as a
proof of concept of the whole architecture we have created. Our first goal
was to formalize and finish the whole architecture; after that, we could
work on some more interesting things. Right now we are working on
techniques to mix (add) neural or Bayesian networks into our heuristics; we
intend to use those techniques to improve our VM mapping algorithms or the
ranking of hosts.
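
To make the idea of "an interface with a set of methods" more concrete,
here is a small sketch of a host-ranking heuristic in Java. Everything in
it is hypothetical (HostUsage, HostRankingHeuristic, ConsolidationHeuristic
and the averaged-load score); it is not one of the two heuristics shipped
with the platform, only an example of the shape such a contract could take.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot of a host's resource usage (fractions between 0 and 1).
class HostUsage {
    final String name;
    final double cpuUsedFraction;
    final double memoryUsedFraction;

    HostUsage(String name, double cpuUsedFraction, double memoryUsedFraction) {
        this.name = name;
        this.cpuUsedFraction = cpuUsedFraction;
        this.memoryUsedFraction = memoryUsedFraction;
    }
}

// Hypothetical contract that an agent could call without knowing the concrete heuristic.
interface HostRankingHeuristic {
    /** Returns a new list ordered from most to least preferred target host. */
    List<HostUsage> rank(List<HostUsage> hosts);
}

// Example heuristic: prefer the most loaded hosts, so that lightly loaded
// ones can be emptied and powered off.
class ConsolidationHeuristic implements HostRankingHeuristic {
    static double load(HostUsage host) {
        return (host.cpuUsedFraction + host.memoryUsedFraction) / 2.0;
    }

    @Override
    public List<HostUsage> rank(List<HostUsage> hosts) {
        List<HostUsage> sorted = new ArrayList<>(hosts);
        // Descending by load: compare b against a.
        sorted.sort((a, b) -> Double.compare(load(b), load(a)));
        return sorted;
    }
}
```

A learned model (neural or Bayesian) could later replace the load()
scoring function while the interface stays the same.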

We have not read the VMware paper (until now we have based our whole
proposal solely on academic work); I have just glanced at it, and it seems
interesting, though I would need much more time and a deeper reading to be
able to comment on it.

The testing is done in a test environment we have; we isolate and control
the variables of the environment and everything that can affect the
agents' behavior; then we test every functionality and the agents'
behavior. The testing process for the first release was very manual.
However, now that we know the whole framework works, we are covering it
with test cases (unit and integration); after that, testing a heuristic
will be a matter of writing test cases for it.

Even with test cases, for every experiment we do or release we close, we
execute a thorough batch of tests to check that everything is working;
sadly, those tests are still executed manually today.

I can say that the fun is going to start now. I find it much more
interesting to create methods/heuristics to manage the environment than to
create the structure that uses the heuristics.

Do you have any other questions?

On Tue, May 3, 2016 at 12:18 PM, Rohit Yadav <[email protected]>
wrote:

> Nice feature :)
>
> I did not look at the code but I'm curious on how you're powering off
> hosts, I think with my out-of-band management PR you can use the oobm
> subsystem to perform power management operations for IPMI 2.0 enabled hosts.
>
> Also curious how you implemented the heuristics and wrote tests (esp.
> integration ones), some of us had a related discussion about such a feature
> and we looked at this paper from VMware DRS team:
> http://www.waldspurger.org/carl/papers/drs-vmtj-mar12.pdf
>
> Regards,
> Rohit Yadav
>
> [email protected]
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
> @shapeblue
> On Apr 27 2016, at 2:29 am, Gabriel Beims Bräscher <[email protected]>
> wrote:
>
> Hello CloudStack community members (@dev and @users),
>
> This email is meant to announce the publication of a project on GitHub that
> provides a distributed virtual machine scheduling platform that can be
> easily integrated with Apache CloudStack (ACS). The project is available at
> [1]; you can find a detailed explanation of the idea of the project, its
> aspirations, basic concepts, installation and uninstallation processes and
> other information at [2]. Also, if you want to know more about
> Autonomiccs and its creators, you can access the link [3].
>
> The code that was opened on GitHub is part of a bigger system that has the
> goal of managing a cloud computing environment autonomously. All of that is
> being developed and used in my Ph.D. thesis and the master's theses of
> some colleagues. The formalization of that component will be published at
> the 12th IEEE World Congress on Services (SERVICES 2016) in San Francisco,
> USA.
>
> You can see the stats of our code at [4] and [5]. Right now we only have
> ~40% of code test coverage. However, we intend to increase that value to
> ~60% by next week and ~90% by the end of June.
>
> To give you a picture of what we are preparing for the future, we can
> highlight the following goals for this year (you can find other short-term
> goals at [6]):
>
> - Integrate our platform [1] with a multi-agent system (MAS) platform, in
>   order to facilitate the development of agents. Currently, we are using
>   Spring Integration to “emulate” an agent life cycle; that can become a
>   problem when we need to add more agents and they start to communicate
>   with each other. Therefore, we will integrate the platform in [1] with
>   JADE [7];
>
> - Today the metrics about resource usage are not properly gathered by
>   ACS; in order to develop more accurate predictions we need to store
>   resource usage metrics. Also, those metrics have to be gathered in a
>   distributed way without causing service degradation. For that and a few
>   other reasons (you can send us an email so we can provide you more
>   details), we are developing an autonomic monitoring platform that will
>   integrate with the system available in [1];
>
> - We also foresee the need to develop a better way to visualize the cloud
>   environment, a way to detect hot spots (pods and hosts) with higher
>   resource usage trends (VM trends). We see the need to replace the rustic
>   table-based view of the environment with one better suited to humans
>   (this is a surprise that we intend to present at the CCCBR).
>
> We hope you like the software and that it meets your expectations. If it
> does not meet all of your needs, let’s work together to improve it. If
> you have any questions or suggestions please send us an email; we will
> reply as fast as we can. Also, criticism that can help us improve the
> platform is very welcome.
>
> [1] https://github.com/Autonomiccs/autonomiccs-platform
>
> [2] https://github.com/Autonomiccs/autonomiccs-platform/wiki
>
> [3] http://autonomiccs.com.br/
>
> [4] http://jenkins.autonomiccs.com.br/
>
> [5] http://sonar.autonomiccs.com.br/
>
> [6] https://github.com/Autonomiccs/autonomiccs-platform#project-evolution
>
> [7] http://jade.tilab.com/
>
> Cheers, Gabriel.
>

-- 
Rafael Weingärtner
