On 11/29/2011 06:29 PM, Adam Litke wrote:
> After discussing MOM / VDSM integration at length, two different strategies have
> emerged.  I will call them Plan A and Plan B:
>
> Plan A: MOM integration at the OS/Packaging level
> Plan B: MOM integration as a new VDSM thread

I think a form of plan B is more appropriate:

In general we can look at MOM vs VDSM just like the microkernel vs Linux 
kernel approach. MOM can be an independent project, but then it will need to 
expose many more APIs to VDSM and vice versa.

For example, take live migration: there is no point in MOM ballooning a 
guest while it is migrating. So either MOM ignores the migration, which is 
bad, or it now needs to listen to VDSM events on VM migration.
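
To make the migration case concrete, here is a minimal Python sketch of the kind of guard a balloon policy would need. The `VmInfo` type, its `migrating` flag, and `balloon_candidates` are hypothetical names used only for illustration; this is not an existing MOM or VDSM API, and it assumes the flag would be kept current from VDSM migration events.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmInfo:
    """Hypothetical per-VM state that MOM would have to learn from VDSM."""
    name: str
    migrating: bool  # assumed to be updated from VDSM migration events


def balloon_candidates(vms: List[VmInfo]) -> List[str]:
    """Return the names of VMs that may be ballooned right now."""
    # A guest that is mid-migration is skipped: ballooning it is pointless.
    return [vm.name for vm in vms if not vm.migrating]


vms = [VmInfo("web", migrating=False), VmInfo("db", migrating=True)]
print(balloon_candidates(vms))
```

The point of the sketch is that the `migrating` flag has to come from somewhere, which is exactly the extra API/event surface discussed here.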

Think about hot-plugging a vcpu/pci-device to a VM - if MOM previously 
applied some SLA to the VM, it will now need to change it to cope w/ the 
new resources; again, more APIs/events for that.

Another thing - all of the per-VM settings for KSM/THP/Swap/Balloon 
will need to propagate from the vdsm api towards MOM.

I can go on this way.

VDSM is not libvirt; it already has policies today, so there is no need to 
split it up into two or more.

For completeness, I do think that there is a place for MOM-like 
functionality within the OS. But I think that, for the goals of the ovirt 
project, it would be most efficient to host it all in VDSM and keep our 
actions VM specific.

Thanks,
Dor


>
> This RFC is about Plan A.  I will start another thread to discuss Plan B once I
> have properly prototyped the idea in code.
>
> Integrating VDSM and MOM at the OS level is by far the simpler and less
> intrusive option.  As you can see from the included patch, the changes to vdsm
> are very limited.  In this model, VDSM interacts with MOM in the same way as it
> uses libvirt.  Upon installation, VDSM installs its own MOM configuration file
> and restarts the MOM daemon (which continues to exist as an independent
> system-level daemon).  Once restarted, MOM will load its policy from the VDSM
> configuration directory.
>
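
The install and uninstall behaviour described in the paragraph above can be sketched in Python as follows. This is an illustration only: the patch does this work in RPM scriptlets, the function names here are hypothetical, and the `momd` restart is left out.

```python
import os
import shutil
import tempfile


def install_mom_config(vdsm_conf: str, system_conf: str) -> None:
    """Save any existing system config, then install VDSM's copy."""
    backup = system_conf + ".vdsmsave"
    # Like `mv -n`: never overwrite a backup saved by an earlier install.
    if os.path.isfile(system_conf) and not os.path.exists(backup):
        os.rename(system_conf, backup)
    shutil.copyfile(vdsm_conf, system_conf)


def restore_mom_config(system_conf: str) -> None:
    """Put the saved config back, if one was saved at install time."""
    backup = system_conf + ".vdsmsave"
    if os.path.isfile(backup):
        os.replace(backup, system_conf)


def read_text(path: str) -> str:
    with open(path) as handle:
        return handle.read()


def demo():
    """Exercise install and restore inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as root:
        vdsm_conf = os.path.join(root, "vdsm-momd.conf")
        system_conf = os.path.join(root, "momd.conf")
        with open(vdsm_conf, "w") as handle:
            handle.write("vdsm")
        with open(system_conf, "w") as handle:
            handle.write("admin")
        install_mom_config(vdsm_conf, system_conf)
        after_install = read_text(system_conf)
        restore_mom_config(system_conf)
        after_restore = read_text(system_conf)
    return after_install, after_restore


print(demo())
```

Keeping the first backup rather than the most recent one means a reinstall of VDSM does not replace the administrator's original file with VDSM's own configuration.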
> Pros:
> - Simple and unobtrusive to either MOM or VDSM
> - Clean API with no duplication or layering
> - Maintain flexibility to tighten integration in the future
>
> Cons:
> - Momd runs as root (like supervdsm)
> - If MOM is to consume VDSM APIs, it must use the slower xmlrpc interface
>
> Based on my experience while working on Plan A and Plan B, I feel that this
> approach is the best way to start.  Once MOM and VDSM are commingled on the node,
> we can begin the interesting work of providing the actual dynamic policy to
> manage the system.
>
> Sample code for Plan A follows:
>
> commit 4464c07849cfd921d0e3446961c5b6471dd360d9
> Author: Adam Litke<a...@us.ibm.com>
> Date:   Mon Nov 28 08:46:22 2011 -0600
>
>      Integrate with MOM at the system/packaging level
>
> diff --git a/vdsm.spec.in b/vdsm.spec.in
> index cf12428..14588c6 100644
> --- a/vdsm.spec.in
> +++ b/vdsm.spec.in
> @@ -151,6 +151,13 @@ rm -rf %{buildroot}
>   /usr/sbin/saslpasswd2 -p -a libvirt vdsm@rhevh < \
>       /etc/pki/vdsm/keys/libvirt_password
>
> +# install the mom config file and restart momd
> +if [ -f %{_sysconfdir}/momd.conf ]; then
> +    mv -n %{_sysconfdir}/momd.conf %{_sysconfdir}/momd.conf.vdsmsave
> +fi
> +cp %{_sysconfdir}/%{vdsm_name}/momd.conf %{_sysconfdir}/momd.conf
> +/sbin/service momd condrestart > /dev/null 2>&1
> +
>   %preun
>   if [ "$1" -eq 0 ]
>   then
> @@ -176,6 +183,12 @@ _EOF
>
>       /usr/sbin/saslpasswd2 -p -a libvirt -d vdsm@rhevh
>
> +    # Restore old MOM configuration
> +    if [ -f %{_sysconfdir}/momd.conf.vdsmsave ]; then
> +        mv %{_sysconfdir}/momd.conf.vdsmsave %{_sysconfdir}/momd.conf
> +        /sbin/service momd condrestart > /dev/null 2>&1
> +    fi
> +
>   %if 0%{?rhel}
>       if /sbin/initctl status libvirtd>/dev/null 2>/dev/null ; then
>           /sbin/initctl stop libvirtd>/dev/null 2>/dev/null
> @@ -246,6 +259,8 @@ fi
>   %config(noreplace) %{_sysconfdir}/%{vdsm_name}/logger.conf
>   %config(noreplace) %{_sysconfdir}/logrotate.d/vdsm
>   %config(noreplace) %{_sysconfdir}/rwtab.d/vdsm
> +%{_sysconfdir}/%{vdsm_name}/mom.policy
> +%{_sysconfdir}/%{vdsm_name}/momd.conf
>   %{_sysconfdir}/sudoers.d/50_vdsm
>   %{_sysconfdir}/cron.hourly/vdsm-logrotate
>   %{_sysconfdir}/cron.d/vdsm-libvirt-logrotate
> diff --git a/vdsm/Makefile.am b/vdsm/Makefile.am
> index 7da9cad..a96a323 100644
> --- a/vdsm/Makefile.am
> +++ b/vdsm/Makefile.am
> @@ -83,7 +83,9 @@ EXTRA_DIST = \
>       vdsm-restore-net-config.in \
>       vdsm.rwtab \
>       vdsm-sosplugin.py.in \
> -     vdsm-store-net-config.in
> +     vdsm-store-net-config.in \
> +     mom.policy \
> +     momd.conf
>
>   # Reference:
>   # http://www.gnu.org/software/automake/manual/html_node/Scripts.html
> @@ -115,7 +117,7 @@ install-data-hook:
>   install-data-local: install-data-init install-data-logger \
>               install-data-rwtab install-data-logrotate \
>               install-data-sudoers install-data-sosplugin \
> -             install-data-libvirtpass
> +             install-data-libvirtpass install-data-mom
>       $(MKDIR_P) $(DESTDIR)$(vdsmtsdir)/keys
>       $(MKDIR_P) $(DESTDIR)$(vdsmtsdir)/certs
>       $(MKDIR_P) $(DESTDIR)$(vdsmlogdir)
> @@ -128,7 +130,7 @@ install-data-local: install-data-init install-data-logger \
>   uninstall-local: uninstall-data-init uninstall-data-logger \
>               uninstall-data-rwtab uninstall-data-logrotate \
>               uninstall-data-sudoers uninstall-data-sosplugin \
> -             uninstall-data-libvirtpass
> +             uninstall-data-libvirtpass uninstall-data-mom
>
>   install-data-init:
>       $(MKDIR_P) $(DESTDIR)$(sysconfdir)/rc.d/init.d
> @@ -191,3 +193,13 @@ install-data-sosplugin:
>
>   uninstall-data-sosplugin:
>       $(RM) $(DESTDIR)$(pythondir)/sos/plugins/vdsm.py
> +
> +install-data-mom:
> +     $(INSTALL_DATA) mom.policy \
> +             $(DESTDIR)$(vdsmconfdir)/mom.policy
> +     $(INSTALL_DATA) momd.conf \
> +             $(DESTDIR)$(vdsmconfdir)/momd.conf
> +
> +uninstall-data-mom:
> +     $(RM) $(DESTDIR)$(vdsmconfdir)/mom.policy
> +     $(RM) $(DESTDIR)$(vdsmconfdir)/momd.conf
> diff --git a/vdsm/mom.policy b/vdsm/mom.policy
> new file mode 100644
> index 0000000..cb31526
> --- /dev/null
> +++ b/vdsm/mom.policy
> @@ -0,0 +1,155 @@
> +### KSM ########################################################################
> +
> +### Constants
> +# The number of pages to add when increasing pages_to_scan
> +(defvar ksm_pages_boost 300)
> +
> +# The number of pages to subtract when decreasing pages_to_scan
> +(defvar ksm_pages_decay -50)
> +
> +# The min and max number of pages to scan per cycle when ksm is activated
> +(defvar ksm_npages_min 64)
> +(defvar ksm_npages_max 1250)
> +
> +# The number of ms to sleep between ksmd scans for a 16GB system.  Systems with
> +# more memory will sleep less, while smaller systems will sleep more.
> +(defvar ksm_sleep_ms_baseline 10)
> +
> +# A virtualization host tends to use most of its memory for running guests but
> +# a certain amount is reserved for the host OS, non virtualization-related work,
> +# and as a failsafe.  When free memory (including memory used for caches) drops
> +# below this percentage of total memory, the host is deemed under pressure and
> +# KSM will be started to try and free up some memory.
> +(defvar ksm_free_percent 0.20)
> +
> +### Helper functions
> +(def change_npages (delta)
> +{
> +    (defvar newval (+ Host.ksm_pages_to_scan delta))
> +    (if (> newval ksm_npages_max) (set newval ksm_npages_max) 1)
> +    (if (< newval ksm_npages_min) (set newval ksm_npages_min) 0)
> +    (Host.Control "ksm_pages_to_scan" newval)
> +})
> +
> +### Main Script
> +# Methodology: Since running KSM does incur some overhead, try to run it only
> +# when necessary.  If the amount of committed KSM shareable memory is high or if
> +# free memory is low, enable KSM to try to increase free memory.  Large memory
> +# machines should scan more often than small ones.  Likewise, machines under
> +# memory pressure should scan more aggressively than more idle machines.
> +
> +(defvar ksm_pressure_threshold (* Host.mem_available ksm_free_percent))
> +(defvar ksm_committed Host.ksm_shareable)
> +
> +(if (and (< (+ ksm_pressure_threshold ksm_committed) Host.mem_available)
> +         (> (Host.StatAvg "mem_free") ksm_pressure_threshold))
> +    (Host.Control "ksm_run" 0)
> +    {        # else
> +        (Host.Control "ksm_run" 1)
> +        (Host.Control "ksm_sleep_millisecs"
> +            (/ (* ksm_sleep_ms_baseline 16777216) Host.mem_available))
> +        (if (< (Host.StatAvg "mem_free") ksm_pressure_threshold)
> +            (change_npages ksm_pages_boost)
> +            (change_npages ksm_pages_decay))
> +    }
> +)
> +### Auto-Balloon ###############################################################
> +
> +### Constants
> +# If the percentage of host free memory drops below this value
> +# then we will consider the host to be under memory pressure
> +(defvar pressure_threshold 0.20)
> +
> +# If the percentage of host free memory drops below this level, then the
> +# pressure is critical and more aggressive ballooning will be employed.
> +(defvar pressure_critical 0.05)
> +
> +# This is the minimum percentage of free memory that an unconstrained
> +# guest would like to maintain
> +(defvar min_guest_free_percent 0.20)
> +
> +# Don't change a guest's memory by more than this percent of total memory
> +(defvar max_balloon_change_percent 0.05)
> +
> +# Only ballooning operations that change the balloon by this percentage
> +# of current guest memory should be undertaken to avoid overhead
> +(defvar min_balloon_change_percent 0.0025)
> +
> +### Helper functions
> +# Check if the proposed new balloon value is a large-enough
> +# change to justify a balloon operation.  This prevents us from
> +# introducing overhead through lots of small ballooning operations
> +(def change_big_enough (guest new_val)
> +{
> +    (if (> (abs (- new_val guest.libvirt_curmem))
> +           (* min_balloon_change_percent guest.libvirt_curmem))
> +        1 0)
> +})
> +
> +(def shrink_guest (guest)
> +{
> +    # Determine the degree of host memory pressure
> +    (if (<= host_free_percent pressure_critical)
> +        # Pressure is critical:
> +        #   Force guest to swap by making free memory negative
> +        (defvar guest_free_percent (+ -0.05 host_free_percent))
> +        # Normal pressure situation
> +        #   Scale the guest free memory back according to host pressure
> +        (defvar guest_free_percent (* min_guest_free_percent
> +                                    (/ host_free_percent pressure_threshold))))
> +
> +    # Given current conditions, determine the ideal guest memory size
> +    (defvar guest_used_mem (- (guest.StatAvg "libvirt_curmem")
> +                              (guest.StatAvg "mem_unused")))
> +    (defvar balloon_min (+ guest_used_mem
> +                           (* guest_free_percent guest.libvirt_maxmem)))
> +    # But do not change it too fast
> +    (defvar balloon_size (* guest.libvirt_curmem
> +                            (- 1 max_balloon_change_percent)))
> +    (if (< balloon_size balloon_min)
> +        (set balloon_size balloon_min)
> +        0)
> +    # Set the new target for the BalloonController.  Only set it if the
> +    # value makes sense and is a large enough change to be worth it.
> +    (if (and (<= balloon_size guest.libvirt_maxmem)
> +            (change_big_enough guest balloon_size))
> +        (guest.Control "balloon_target" balloon_size)
> +        0)
> +})
> +
> +(def grow_guest (guest)
> +{
> +    # There is only work to do if the guest is ballooned
> +    (if (< guest.libvirt_curmem guest.libvirt_maxmem) {
> +        # Minimally, increase so the guest has its desired free memory
> +        (defvar guest_used_mem (- (guest.StatAvg "libvirt_curmem")
> +                                  (guest.StatAvg "mem_unused")))
> +        (defvar balloon_min (+ guest_used_mem (* min_guest_free_percent
> +                                                 guest.libvirt_maxmem)))
> +        # Otherwise, increase according to the max balloon change
> +        (defvar balloon_size (* guest.libvirt_curmem
> +                                (+ 1 max_balloon_change_percent)))
> +
> +        # Determine the new target for the BalloonController.  Only set
> +        # it if the change is large enough to be worth it.
> +        (if (> balloon_size guest.libvirt_maxmem)
> +            (set balloon_size guest.libvirt_maxmem) 0)
> +        (if (< balloon_size balloon_min)
> +            (set balloon_size balloon_min) 0)
> +        (if (change_big_enough guest balloon_size)
> +            (guest.Control "balloon_target" balloon_size) 0)
> +    } 0)
> +})
> +
> +### Main script
> +# Methodology: The goal is to shrink all guests fairly and by an amount
> +# scaled to the level of host memory pressure.  If the host is under
> +# severe pressure, scale back more aggressively.  We don't yet handle
> +# symptoms of over-ballooning guests or try to balloon idle guests more
> +# aggressively.  When the host is not under memory pressure, slowly
> +# deflate the balloons.
> +
> +(defvar host_free_percent (/ (Host.StatAvg "mem_free") Host.mem_available))
> +(if (< host_free_percent pressure_threshold)
> +    (with Guests guest (shrink_guest guest))
> +    (with Guests guest (grow_guest guest)))
> diff --git a/vdsm/momd.conf b/vdsm/momd.conf
> new file mode 100644
> index 0000000..4d09f44
> --- /dev/null
> +++ b/vdsm/momd.conf
> @@ -0,0 +1,83 @@
> +### DO NOT REMOVE THIS COMMENT -- MOM Configuration for VDSM ###
> +
> +[main]
> +# The wake up frequency of the main daemon (in seconds)
> +main-loop-interval: 5
> +
> +# The data collection interval for host statistics (in seconds)
> +host-monitor-interval: 5
> +
> +# The data collection interval for guest statistics (in seconds)
> +guest-monitor-interval: 5
> +
> +# The wake up frequency of the guest manager (in seconds).  The guest manager
> +# sets up monitoring and control for newly-created guests and cleans up after
> +# deleted guests.
> +guest-manager-interval: 5
> +
> +# The wake up frequency of the policy engine (in seconds).  During each
> +# interval the policy engine evaluates the policy and passes the results
> +# to each enabled controller plugin.
> +policy-engine-interval: 10
> +
> +# A comma-separated list of Controller plugins to enable
> +controllers: Balloon, KSM
> +
> +# Sets the maximum number of statistic samples to keep for the purpose of
> +# calculating moving averages.
> +sample-history-length: 10
> +
> +# The URI to use when connecting to this host's libvirt interface.  If this is
> +# left blank then the system default URI is used.
> +libvirt-hypervisor-uri: qemu:///system
> +
> +# Set this to an existing, writable directory to enable plotting.  For each
> +# invocation of the program a subdirectory momplot-NNN will be created where NNN
> +# is a sequence number.  Within that directory, tab-delimited data files will be
> +# created and updated with all data generated by the configured Collectors.
> +plot-dir:
> +
> +# Activate the RPC server on the designated port (-1 to disable).  RPC is
> +# disabled by default until authentication is added to the protocol.
> +rpc-port: -1
> +
> +# At startup, load a policy from the given file.  If empty, no policy is loaded
> +policy: /etc/vdsm/mom.policy
> +
> +[logging]
> +# Set the destination for program log messages.  This can be either 'stdio' or
> +# a filename.  When the log goes to a file, log rotation will be done
> +# automatically.
> +log: /var/log/momd.log
> +
> +# Set the logging verbosity level.  The following levels are supported:
> +# 5 or debug:     Debugging messages
> +# 4 or info:      Detailed messages concerning normal program operation
> +# 3 or warn:      Warning messages (program operation may be impacted)
> +# 2 or error:     Errors that severely impact program operation
> +# 1 or critical:  Emergency conditions
> +# This option can be specified by number or name.
> +verbosity: info
> +
> +## The following two variables are used only when logging is directed to a file.
> +# Set the maximum size of a log file (in bytes) before it is rotated.
> +max-bytes: 2097152
> +# Set the maximum number of rotated logs to retain.
> +backup-count: 5
> +
> +[host]
> +# A comma-separated list of Collector plugins to use for Host data collection.
> +collectors: HostMemory, HostKSM
> +
> +[guest]
> +# A comma-separated list of Collector plugins to use for Guest data collection.
> +collectors: GuestQemuProc, GuestLibvirt
> +
> +# Collector-specific configuration for GuestQemuAgent
> +[Collector: GuestQemuAgent]
> +# Set the base path where the host-side sockets for guest communication can be
> +# found.  The GuestQemuAgent Collector will try to open files with the following
> +# names:
> +# <socket_path>/va-<guest-name>-virtio.sock - for virtio serial
> +# <socket_path>/va-<guest-name>-isa.sock - for isa serial
> +socket_path: /var/lib/libvirt/qemu
>
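
For readers who want to check the arithmetic in the quoted policy, the following Python sketch restates three of its calculations: the KSM sleep interval, the clamped `pages_to_scan` update, and the target chosen by `shrink_guest`. It is a paraphrase under stated assumptions, not MOM code: the function names are hypothetical, memory values are assumed to be in KiB (so that 16777216 matches the "16GB system" comment), division is floating point, and averaged and instantaneous statistics are treated as the same number. The `change_big_enough` and `libvirt_maxmem` checks that gate the actual balloon operation are omitted.

```python
# Constants copied from the quoted mom.policy.
KSM_SLEEP_MS_BASELINE = 10
KSM_NPAGES_MIN = 64
KSM_NPAGES_MAX = 1250
BASELINE_MEM_KIB = 16777216  # 16 GiB in KiB (unit is an assumption)

PRESSURE_THRESHOLD = 0.20
PRESSURE_CRITICAL = 0.05
MIN_GUEST_FREE_PERCENT = 0.20
MAX_BALLOON_CHANGE_PERCENT = 0.05


def ksm_sleep_ms(mem_available_kib):
    """Sleep time scales inversely with host memory around the baseline."""
    return KSM_SLEEP_MS_BASELINE * BASELINE_MEM_KIB / mem_available_kib


def next_pages_to_scan(current, delta):
    """Apply a boost or decay, then clamp to the configured range."""
    return max(KSM_NPAGES_MIN, min(KSM_NPAGES_MAX, current + delta))


def shrink_target(host_free_percent, curmem, maxmem, mem_unused):
    """Balloon target chosen when the host is under memory pressure."""
    if host_free_percent <= PRESSURE_CRITICAL:
        # Critical pressure: a negative free percentage forces guest swap.
        guest_free_percent = -0.05 + host_free_percent
    else:
        # Normal pressure: scale guest free memory with host pressure.
        guest_free_percent = MIN_GUEST_FREE_PERCENT * (
            host_free_percent / PRESSURE_THRESHOLD)
    guest_used_mem = curmem - mem_unused
    balloon_min = guest_used_mem + guest_free_percent * maxmem
    # Never shrink by more than the per-step limit.
    balloon_size = curmem * (1 - MAX_BALLOON_CHANGE_PERCENT)
    return max(balloon_size, balloon_min)


print(ksm_sleep_ms(2 * BASELINE_MEM_KIB))
print(next_pages_to_scan(1100, 300))
print(shrink_target(0.10, 4194304, 4194304, 2097152))
```

Under these assumptions, a host with twice the baseline memory sleeps half as long between scans, and a guest with plenty of unused memory is shrunk by at most 5% of its current size per policy run.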

_______________________________________________
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/vdsm-devel
