[Virt-test-devel] [RFC] Hierarchical multiplexer variants (cartesian config)

Ademar Reis Fri, 16 May 2014 08:43:43 -0700

Hi Folks.

As Lucas pointed out the other day, we're trying to refactor and
improve the way the variable multiplexing system works.


After thinking about and discussing this for a while, I've come
up with a proposal for a new way to declare and filter the
variants.

The problem
-----------

The current cartesian format is unstructured. It's a collection
of dictionaries and lists that, when combined and multiplexed,
easily explode into a huge data structure. A lot of effort has
been put into optimizations and cleanups over the years, but it
remains one of the most complex parts of the autotest framework.

It's relatively difficult to understand and, given the global
nature of the variants/variables (no proper namespace or
compartmentalization), the format of the file and the size of the
variants set, it's very error prone.

It's currently very common to find global filters in the code,
such as "only raw", "only up" or "only ide". Filters easily
become complex and difficult to understand and debug. Given
they're applied on top of the resulting (huge) structure, they
are inherently slow to process and hard to validate.

I believe this lack of structure is the root cause of many of the
problems or dislikes we have with the system.


Hierarchical multiplexer variants
---------------------------------

I'm proposing a different data structure to keep the variants and
variables that will be used for multiplexing.

**NOTE**: In this RFC, the syntax and naming are not final, nor
are they meant to resemble the current implementation.

The idea is that all variants should be part of a tree structure
with a single root, so that each variant has a unique identifier
(its path from the root).

For multiplexing (or combination into dicts), only the leaves are
used. Filtering happens in the tree, before any multiplexing
actually happens.
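To make "only the leaves are used" concrete, here is a minimal Python sketch. It assumes a combination rule that this RFC leaves open: the caller names the branches to multiplex, each combination picks exactly one leaf from each branch, and a combination is the merge of the chosen leaves' variables (variables on intermediate nodes are ignored here for brevity). `Node`, `leaves` and `multiplex` are hypothetical names, not an existing avocado or virt-test API.

```python
import itertools
from dataclasses import dataclass, field


# Hypothetical data structure; not an avocado/virt-test API.
@dataclass
class Node:
    """A variant: a named node with its own variables and sub-variants."""
    name: str
    variables: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def leaves(node):
    """Return the leaf variants at or below ``node``, depth-first."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def multiplex(branches):
    """Assumed rule: pick one leaf per branch and merge their variables."""
    combinations = []
    for chosen in itertools.product(*(leaves(branch) for branch in branches)):
        params = {}
        for leaf in chosen:
            params.update(leaf.variables)
        combinations.append(params)
    return combinations


env = Node("env", children=[
    Node("production", {"malloc_perturb": "no", "gcc_flags": "-O3"}),
    Node("debug", {"malloc_perturb": "yes", "gcc_flags": "-g"}),
])
disks = Node("disks", children=[
    Node("ide", {"drive_format": "ide"}),
    Node("scsi", {"drive_format": "scsi"}),
])

for params in multiplex([env, disks]):
    print(params)
```

Two environments times two disk formats give 4 dicts; every extra branch multiplies the count again, which is why pruning the tree before this step pays off.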

For example, hardware->disks->ide would be the variant where the
ide variables are kept. When filtering for ide, one could specify
any part of the full path:

  * 'hardware->disks->ide' --> the full ID of the ide variant
  * '*disks->ide' --> any leaf 'ide' inside a branch 'disks'
  * '*ide' --> any leaf called 'ide'
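A minimal sketch of how such patterns could be matched, assuming that a leading `*` stands for any number of leading path components. Matching on whole components rather than raw substrings means `*ide` would not accidentally match a variant named `pide`. The function name `matches` is hypothetical.

```python
# Hypothetical helper illustrating the pattern syntax; not an existing API.
def matches(path, pattern):
    """Return True if the variant ``path`` (e.g. 'hardware->disks->ide')
    matches ``pattern``.

    Assumption: a leading '*' matches any number of leading path
    components; without it, the pattern must be the full path.
    """
    path_parts = path.split("->")
    if pattern.startswith("*"):
        suffix = pattern[1:].split("->")
        return len(suffix) <= len(path_parts) and path_parts[-len(suffix):] == suffix
    return path_parts == pattern.split("->")


for pattern in ("hardware->disks->ide", "*disks->ide", "*ide", "*scsi"):
    print(pattern, matches("hardware->disks->ide", pattern))
```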

It might be easier to explain the concept with a hypothetical
example:

## reference tree ##

 * <ROOT>
   * env
     * production
       - malloc_perturb = no
       - gcc_flags = -O3
     * debug
       - malloc_perturb = yes
       - gcc_flags = -g
   * host
     * kernel_config
       * huge_pages
         - huge_pages = yes
       * small_pages
         - huge_pages = no
       * numa_ballance_aggressive
         - numa_balancing = 1
         - numa_balancing_migrate_deferred = 32
         - numa_balancing_scan_size_mb = 512
       * numa_ballance_light
         - numa_balancing = 1
         - numa_balancing_migrate_deferred = 8
         - numa_balancing_scan_size_mb = 32    
   * guest
      * os
        * windows
           - os_type = windows
           * xp
             - win = xp
           * 2k12
             - win = 2k12
           * 7
             - win = 7
        * linux
          - os_type = linux
          * distro
             * fedora
               - distro = fedora
             * ubuntu
               - distro = ubuntu
      * hardware
        * disks
           * ide
             - drive_format = ide
           * scsi
             - drive_format = scsi
        * network
          * rtl_8139
             - nic_model = rtl8139
          * e1000
             - nic_model = e1000
          * virtio_net
             - filter: "only guest->os->linux" # meaning: only
                                               # the linux branch
                                               # of guest->os will
                                               # be used
             - nic_model = virtio
             - enable_msix_vectors = yes
   * tests
     - filter: "no tests" # we don't want to multiplex test
                          # variables from different tests, so we
                          # filter out the entire branch
     * sync_test
       - filter: "only guest->os->linux" # we don't want to
                                         # multiplex other OSes here
       - filter: "only hardware->disks" # self-explanatory
       * standard
         - sync_timeout = 30
         - sync_tries = 10
       * aggressive
         - sync_timeout = 10
         - sync_tries = 20
     * ping_test
       - filter: "only guest->os->linux" # we don't want to
                                         # multiplex other OSes here
       - filter: "only hardware->network" # self-explanatory
       * standard
         - ping_tries = 10
         - timeout = 20
       * aggressive
         - ping_flags = -f
         - ping_tries = 100
         - timeout = 5

In summary, variants are nodes and variables are leaves. Variants
can have arbitrary sub-variants and arbitrary variables (combined
or not). Variables and filters from a parent are inherited.
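The inheritance rule can be sketched in Python as a depth-first walk that carries the parent's variables and filters down to every leaf. Two details are assumptions, since the RFC does not specify them: a child's value overrides an inherited variable with the same name, and filters simply accumulate from the root down. `Node` and `flatten` are hypothetical names.

```python
from dataclasses import dataclass, field


# Hypothetical data structure; not an avocado/virt-test API.
@dataclass
class Node:
    """A variant with its own variables, filters and sub-variants."""
    name: str
    variables: dict = field(default_factory=dict)
    filters: list = field(default_factory=list)
    children: list = field(default_factory=list)


def flatten(node, inherited_vars=None, inherited_filters=None, prefix=""):
    """Yield (path, variables, filters) for every leaf at or below ``node``."""
    variables = {**(inherited_vars or {}), **node.variables}  # assumed: child wins
    filters = [*(inherited_filters or []), *node.filters]     # assumed: accumulate
    path = f"{prefix}->{node.name}" if prefix else node.name
    if not node.children:
        yield path, variables, filters
        return
    for child in node.children:
        yield from flatten(child, variables, filters, path)


sync_test = Node(
    "sync_test",
    filters=["only guest->os->linux", "only hardware->disks"],
    children=[
        Node("standard", {"sync_timeout": "30", "sync_tries": "10"}),
        Node("aggressive", {"sync_timeout": "10", "sync_tries": "20"}),
    ],
)

for path, variables, filters in flatten(sync_test):
    print(path, variables, filters)
```

Each leaf ends up with both sync_test filters, which is what lets a single declaration on a parent constrain all of its sub-variants.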

In the example above, the only reserved word is "filter", which
is used to specify a rule to follow when multiplexing the tree
(see more below).

## multiple files, namespaces ##

We have one single tree, but we can use multiple files to declare
the variants.

When including or combining multiple multiplexer files, the
destination should be declared at the beginning, so that the
entries are added (injected) at the right place in the tree:

$ cat fedora.mplx
using os->linux->distro->fedora
* 18
   - version = 18
   - has_whatever_tool = true
   - foobar_params = -f -g -d
* 19
   - version = 19
   - has_whatever_tool = false
   - foobar_params = -f -l -N 12
EOF

The file above, when combined with the previous tree, will
"inject" the variants "18" and "19" into the
os->linux->distro->fedora node of the tree.

With this mechanism, creating variants at runtime becomes
trivial. It's also easy to extend the tree (perhaps in an
upstream/downstream fashion).
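A minimal sketch of the injection step. It assumes that the path after `using` is resolved as a suffix of a full path (the example says `os->linux->distro->fedora`, while the full path in the reference tree is `guest->os->linux->distro->fedora`) and that it must identify exactly one node. `Node`, `find_by_suffix` and `inject` are hypothetical names, and parsing the `.mplx` file itself is left out.

```python
from dataclasses import dataclass, field


# Hypothetical data structure; not an avocado/virt-test API.
@dataclass
class Node:
    name: str
    variables: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def find_by_suffix(node, suffix, path=()):
    """Return every node whose path from the root ends with ``suffix``."""
    path = path + (node.name,)
    found = [node] if list(path[-len(suffix):]) == suffix else []
    for child in node.children:
        found.extend(find_by_suffix(child, suffix, path))
    return found


def inject(root, using, new_variants):
    """Attach ``new_variants`` as children of the single node named by ``using``."""
    targets = find_by_suffix(root, using.split("->"))
    if len(targets) != 1:
        raise ValueError(f"'using {using}' matched {len(targets)} nodes, expected 1")
    targets[0].children.extend(new_variants)
    return targets[0]


fedora = Node("fedora", {"distro": "fedora"})
root = Node("<ROOT>", children=[
    Node("guest", children=[
        Node("os", children=[
            Node("linux", {"os_type": "linux"}, [
                Node("distro", children=[fedora, Node("ubuntu", {"distro": "ubuntu"})]),
            ]),
        ]),
    ]),
])

# Hand-built equivalent of part of fedora.mplx (file parsing is out of scope).
inject(root, "os->linux->distro->fedora", [
    Node("18", {"version": "18", "has_whatever_tool": "true",
                "foobar_params": "-f -g -d"}),
    Node("19", {"version": "19", "has_whatever_tool": "false"}),
])
print([child.name for child in fedora.children])
```

After injection, fedora is no longer a leaf: its leaves are now 18 and 19, so they are what gets multiplexed, and both still inherit `distro = fedora` and `os_type = linux`.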

## filtering ##

Notice that filters are applied to the tree of variants before
any multiplexing occurs, which makes them efficient: branches
that are filtered out are never expanded into combinations.

Besides being declared in the variants tree itself, filters can
also be used to specify what one wants to multiplex at runtime.
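Because filters operate on the tree, they can be implemented as a pruning pass whose output is a smaller tree, and only that tree is ever multiplexed. The sketch below assumes semantics for the two rule kinds used in this RFC: `no PATTERN` drops every node whose path ends with `PATTERN`, and `only PATTERN` keeps, among a set of siblings that contains a match, just the matching node(s). `Node`, `prune` and `leaf_names` are hypothetical names.

```python
from dataclasses import dataclass, field


# Hypothetical data structure; not an avocado/virt-test API.
@dataclass
class Node:
    name: str
    variables: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def ends_with(path, parts):
    """True if the tuple ``path`` ends with the component list ``parts``."""
    return len(path) >= len(parts) and list(path[len(path) - len(parts):]) == parts


def prune(node, rules, path=()):
    """Return a filtered copy of the tree; the input tree is not modified."""
    path = path + (node.name,)
    children = list(node.children)
    for rule in rules:
        kind, pattern = rule.split(" ", 1)
        parts = pattern.split("->")
        hits = [c for c in children if ends_with(path + (c.name,), parts)]
        if kind == "no":
            # Drop every child matching the pattern.
            children = [c for c in children if not ends_with(path + (c.name,), parts)]
        elif kind == "only" and hits:
            # Keep only the matching siblings; other levels are untouched.
            children = hits
    return Node(node.name, dict(node.variables),
                [prune(child, rules, path) for child in children])


def leaf_names(node):
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


root = Node("<ROOT>", children=[
    Node("env", children=[Node("production"), Node("debug")]),
    Node("guest", children=[
        Node("os", children=[
            Node("windows", children=[Node("xp"), Node("7")]),
            Node("linux"),
        ]),
        Node("hardware", children=[
            Node("disks", children=[Node("ide"), Node("scsi")]),
            Node("network", children=[Node("e1000"), Node("virtio_net")]),
        ]),
    ]),
])

filtered = prune(root, ["only guest->os->linux", "no scsi"])
print(leaf_names(root))
print(leaf_names(filtered))
```

The unfiltered tree has nine leaves; after pruning only production, debug, linux, ide, e1000 and virtio_net remain, and the Windows and scsi branches are never visited by the multiplexer at all.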

Given that tests are part of the tree, by default the test runner
would start multiplexing from a tests->$NAME variant that matches
$(basename($TESTID)). So, for example (again, ignore the syntax
and naming):

 * $ avocado --multiplex file.mplx run synctest
    * --> given no external filter, this will multiplex all
      combinations for synctest and run them. With the tree above,
      it's a huge matrix, one that multiplexes host config, Linux
      distros, debug, production, ide, scsi, etc.

PS: notice that --multiplex would be optional. Tests can run
with default values.

Now let's see an example with strict filtering via a config file:

  * synctest.cfg:
    """
    filter: "no host"
    filter: "no guest"
    """

  * $ avocado --multiplex=file.mplx run synctest
    * --> using the config file above, this will multiplex only
      the env variables, resulting in two variants: production
      and debug.


## filtering by value ##

It should be possible to filter by the value of a variable. For
example, the user may want all variants where "os_type =
windows", or a more comprehensive hypothetical list of variants
where "project_license = gpl".
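Since variables are inherited, filtering by value has to look at each leaf's effective variables, not only at the node where a variable is declared (`os_type` is set on `windows`, not on `xp`). A minimal sketch, assuming a hypothetical `Node` structure and exact string comparison of values; `leaf_params` and `filter_by_value` are illustrative names.

```python
from dataclasses import dataclass, field


# Hypothetical data structure; not an avocado/virt-test API.
@dataclass
class Node:
    name: str
    variables: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def leaf_params(node, inherited=None, prefix=()):
    """Yield (path, variables) for each leaf, including inherited variables."""
    params = {**(inherited or {}), **node.variables}
    path = prefix + (node.name,)
    if not node.children:
        yield "->".join(path), params
    for child in node.children:
        yield from leaf_params(child, params, path)


def filter_by_value(root, name, value):
    """Return the paths of the leaves where variable ``name`` equals ``value``."""
    return [path for path, params in leaf_params(root) if params.get(name) == value]


os_tree = Node("os", children=[
    Node("windows", {"os_type": "windows"},
         [Node("xp", {"win": "xp"}), Node("7", {"win": "7"})]),
    Node("linux", {"os_type": "linux"},
         [Node("fedora", {"distro": "fedora"}), Node("ubuntu", {"distro": "ubuntu"})]),
])

print(filter_by_value(os_tree, "os_type", "windows"))
```

Like the path filters, this runs over the tree's leaves before any combinations are built.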


## filtering by depth ##

It should also be possible to limit combinations by depth. The
depths in our example would be:

 * <ROOT> (0)
   * env (1)
     * production (2)
     * debug (2)
   * host (1)
     * kernel_config (2)
       * huge_pages (3)
       * small_pages (3)
       * numa_ballance_aggressive (3)
       * numa_ballance_light (3)
   * guest (1)
      * os (2)
        * windows (3)
           * xp (4)
           * 2k12 (4)
           * 7 (4)
        * linux (3)
          * distro (4)
             * fedora (5)
             * ubuntu (5)
      * hardware (2)
        * disks (3)
           * ide (4)
           * scsi (4)
        * network (3)
          * rtl_8139 (4)
          * e1000 (4)
          * virtio_net (4)
   * tests (1)
     * sync_test (2)
       * standard (3)
       * aggressive (3)
     * ping_test (2)
       * standard (3)
       * aggressive (3)

 * filter: "depth = 1" # no variables (leaves) to be multiplexed,
   not even the ones from the tests

 * filter: "depth = 2" # will result in only leaves at depth 2
   being used. In our example, this will effectively cut
   substantial portions of the tree and won't multiplex any of
   the test variables (but will still apply their filters).

 * filter: "depth = 3" # will ignore hardware details and the
   specific Windows versions and Linux distros. Test variables
   will be multiplexed.
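A minimal sketch of a depth filter, assuming `depth = N` means "drop every node deeper than N": nodes at depth N become leaves, anything declared below them is discarded, and shallower leaves are kept as they are. `Node`, `truncate` and `leaf_depths` are hypothetical names.

```python
from dataclasses import dataclass, field


# Hypothetical data structure; not an avocado/virt-test API.
@dataclass
class Node:
    name: str
    variables: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def truncate(node, max_depth, depth=0):
    """Return a copy of the tree with every node deeper than ``max_depth`` removed."""
    if depth == max_depth:
        return Node(node.name, dict(node.variables))  # becomes a leaf
    return Node(node.name, dict(node.variables),
                [truncate(child, max_depth, depth + 1) for child in node.children])


def leaf_depths(node, depth=0):
    """Return (name, depth) for each leaf, depth-first; the root is depth 0."""
    if not node.children:
        return [(node.name, depth)]
    return [pair for child in node.children for pair in leaf_depths(child, depth + 1)]


root = Node("<ROOT>", children=[
    Node("env", children=[Node("production", {"gcc_flags": "-O3"}),
                          Node("debug", {"gcc_flags": "-g"})]),
    Node("guest", children=[
        Node("os", children=[Node("windows", {"os_type": "windows"}),
                             Node("linux", {"os_type": "linux"})]),
    ]),
])

for max_depth in (1, 2, 3):
    print(max_depth, leaf_depths(truncate(root, max_depth)))
```

With `depth = 1` the only leaves are env and guest, which carry no variables, matching the "nothing to multiplex" case above.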

The implementation may even allow a per-branch depth
specification, for example:

 * filter: "only guest->hardware(2)" # use the guest->hardware
   branch, but limit it to depth 2.


## tools and instrumentation ##

We need tools to help us debug and parse the tree of variants.
Some ideas:

 - plot the full tree (combining multiple files)
 - plot the tree without the leaves
 - plot the tree with depth information
 - plot the tree annotated with the file each line came from
   (similar to "git blame")
 - apply a filter and plot the resulting tree
 - apply a filter and plot the multiplexed matrix (or list)

Given that the filters apply to the tree, it should be fairly
easy to understand and debug their effect.
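Several of these views are straightforward once the variants live in a tree. As a minimal sketch (the `Node` structure and `render` function are hypothetical), the "depth information" and "full tree" views are a single recursive walk printed in the `* name (depth)` style used in this message; a filtered view would render the tree after pruning, and a "git blame" view would additionally need each node to record the file it came from.

```python
from dataclasses import dataclass, field


# Hypothetical data structure; not an avocado/virt-test API.
@dataclass
class Node:
    name: str
    variables: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def render(node, depth=0, show_depth=True, show_vars=False):
    """Return the tree as text lines: '* name (depth)' plus optional '- var = value'."""
    indent = "  " * depth
    lines = [f"{indent}* {node.name}" + (f" ({depth})" if show_depth else "")]
    if show_vars:
        lines += [f"{indent}  - {key} = {value}" for key, value in node.variables.items()]
    for child in node.children:
        lines += render(child, depth + 1, show_depth, show_vars)
    return lines


root = Node("<ROOT>", children=[
    Node("env", children=[Node("production", {"gcc_flags": "-O3"}),
                          Node("debug", {"gcc_flags": "-g"})]),
])

print("\n".join(render(root)))                                    # tree with depths
print("\n".join(render(root, show_depth=False, show_vars=True)))  # full tree
```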


## future extensions ##

We could probably extend the filtering language to include
restrictions such as limits (min/max).

### End of proposal ###

I'm trying to understand if there's any current use case that
wouldn't be covered by the implementation above or if I'm missing
something. Given my lack of practical experience with the
cartesian format, I most certainly am. :)

Thanks!
   - Ademar

-- 
Ademar de Souza Reis Jr.
Red Hat

^[:wq!

_______________________________________________
Virt-test-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/virt-test-devel
