[Virt-test-devel] [RFCv2] Hierarchical multiplexer variants (cartesian config)

Ademar Reis Fri, 16 May 2014 14:29:30 -0700

Changes from v1->v2

 - changed 'filter: "only ..."' to "filter-only: ..."
 - changed 'filter: "no ..."' to "filter-out: ..."
 - changed 'filter: "depth ..."' to "filter-depth: ..."
 - added a section explaining filter behaviors and an example of
   a multiplexation of the resulting tree.
 - clarification about default values
 - minor fixes/improvements



The problem
-----------

A multiplexer generates a combination of variables that basically
allows one test to be run on different scenarios and with
different parameters. In other words, users write just one test,
but the test runner runs n tests, with n being the size of the
multiplexed set.

The current cartesian format, as used in autotest, is
unstructured. It's a collection of dictionaries and lists that,
when combined and multiplexed, easily explode into a huge data
set. A lot of effort has been put into optimizations and cleanups
over the years, but it remains one of the most complex parts of
the autotest framework.

It's relatively difficult to understand, and given the global
nature of the variants/variables (no proper namespace or
compartmentalization), the format of the file and the size of the
variant set, it's very error-prone.

It's currently very common to find global filters in the code
such as "only raw", "only up" or "only ide". Filters easily
become complex and difficult to understand and debug. Given
they're applied on top of the resulting (huge) set, they are
also inherently slow to process and hard to validate.

I believe this lack of structure is the root cause of many of the
problems or dislikes we have with the cartesian system.


Hierarchical multiplexer variants
---------------------------------

My proposal is to keep all variants in a tree structure, with a
single root. The actual variants will be leaves on this tree and
therefore have a unique identifier.

Filters work on the tree (chopping it), before any multiplexing
actually happens.

For example, hardware->disks->ide would be the variant where the
ide variables are kept. When filtering for ide, one could specify
any part of the full path:

  * 'hardware->disks->ide' --> the full ID of the ide variant
  * '*disks->ide' --> any leaf 'ide' inside a branch 'disks'
  * '*ide' --> any leaf called 'ide'
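
A minimal sketch of how such path patterns could be matched,
assuming leaf IDs are plain strings joined by '->' and that '*'
behaves like a shell wildcard. The helper name match_leaves is
hypothetical and not part of the proposal:

```python
from fnmatch import fnmatchcase


def match_leaves(pattern, leaf_paths):
    """Return the leaf IDs that match a (possibly partial) path pattern.

    Assumption: '*' is a shell-style wildcard, so '*ide' would also
    match a leaf whose name merely ends in 'ide'.
    """
    return [path for path in leaf_paths if fnmatchcase(path, pattern)]


leaves = [
    "hardware->disks->ide",
    "hardware->disks->scsi",
    "hardware->network->e1000",
]

for pattern in ("hardware->disks->ide", "*disks->ide", "*ide"):
    print(pattern, "-->", match_leaves(pattern, leaves))
```

With this small set of leaves, all three patterns select the same
single variant.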

It might be better to explain the concept with a hypothetical
example:

## reference tree ##

 * <ROOT>
   * env
     * production
       - malloc_perturb = no
       - gcc_flags = -O3
     * debug
       - malloc_perturb = yes
       - gcc_flags = -g
   * host
     * kernel_config
       * huge_pages
         - huge_pages = yes
       * small_pages
         - huge_pages = no
       * numa_ballance_aggressive
         - numa_balancing = 1
         - numa_balancing_migrate_deferred = 32
         - numa_balancing_scan_size_mb = 512
       * numa_ballance_light
         - numa_balancing = 1
         - numa_balancing_migrate_deferred = 8
         - numa_balancing_scan_size_mb = 32    
   * guest
      * os
        * windows
           - os_type = windows
           * xp
             - win = xp
           * 2k12
             - win = 2k12
           * 7
             - win = 7
        * linux
          - os_type = linux
          * distro
             * fedora
               - distro = fedora
             * ubuntu
               - distro = ubuntu
      * hardware
        * disks
           * ide
             - drive_format = ide
           * scsi
             - drive_format = scsi
        * network
          * rtl_8139
             - nic_model = rtl8139
          * e1000
             - nic_model = e1000
          * virtio_net
             - filter-only: "guest->os->linux" # meaning: only
                                               # the linux branch
                                               # of guest->os will
                                               # be used
             - nic_model = virtio
             - enable_msix_vectors = yes
   * tests
     - filter-out: "tests" # we don't want to multiplex test
                            # variables from different tests, so we
                            # filter out the entire branch
     * sync_test
        - filter-only: "guest->os->linux" # we don't want to
                                          # multiplex other OSes here
        - filter-only: "hardware->disks" # self-explanatory
        * standard
           - sync_timeout = 30
           - sync_tries = 10
        * aggressive
           - sync_timeout = 10
           - sync_tries = 20
     * ping_test
        - filter-only: "guest->os->linux" # we don't want to
                                          # multiplex other OSes here
        - filter-only: "hardware->network" # self-explanatory
        * standard
           - ping_tries = 10
           - timeout = 20
        * aggressive
           - ping_flags = -f
           - ping_tries = 100
           - timeout = 5

In summary, variants are nodes and variables are leaves. Variants
can have arbitrary sub-variants and arbitrary variables (combined
or not). Variables and filters from a parent are inherited.

In the example above, the only reserved words are "filter-only"
and "filter-out", which are filters that chop the tree before any
multiplexing happens (see more below).


## multiplexing example ##

For example, suppose that one wants to run the sync_test and
after applying several filters they're left with the tree below:

<ROOT>
   * env
     * production
       - malloc_perturb = no
       - gcc_flags = -O3
     * debug
       - malloc_perturb = yes
       - gcc_flags = -g
   * tests
     * sync_test
       * standard
         - sync_timeout = 30
         - sync_tries = 10
       * aggressive
         - sync_timeout = 10
         - sync_tries = 20

The leaves are:

    env->production
    env->debug
    tests->sync_test->standard
    tests->sync_test->aggressive

This will get multiplexed into 4 test scenarios, thus requiring 4
test runs:

    env->production + tests->sync_test->standard
    env->production + tests->sync_test->aggressive
    env->debug + tests->sync_test->standard
    env->debug + tests->sync_test->aggressive

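The number of scenarios is the product of the number of leaves in
each branch, so it's easy to notice that the system grows
exponentially with the number of branches being multiplexed.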
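
The multiplexing step above can be sketched as a cartesian product.
This is only an illustration: it assumes that leaves are grouped by
their top-level branch (env, tests) and that one leaf is picked
from each group. The RFC doesn't spell out the grouping rule, and
the function name multiplex is hypothetical:

```python
from itertools import product


def multiplex(leaf_paths):
    """Combine one leaf from each top-level branch into test scenarios.

    Assumption: the top-level branch (first path component) defines
    the groups that get multiplexed against each other.
    """
    groups = {}
    for path in leaf_paths:
        branch = path.split("->", 1)[0]
        groups.setdefault(branch, []).append(path)
    return list(product(*groups.values()))


leaves = [
    "env->production",
    "env->debug",
    "tests->sync_test->standard",
    "tests->sync_test->aggressive",
]

for scenario in multiplex(leaves):
    print(" + ".join(scenario))
```

Two groups with two leaves each give 2 x 2 = 4 scenarios, listed
in the same order as above.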


## multiple files, namespaces ##

We have one single tree, but we can use multiple files to declare
the variants.

When including or combining multiple multiplexer files, the
destination in the tree should be declared at the beginning, so
that the entries are added (injected) at the right place.

$ cat fedora.mplx
using os->linux->distro->fedora
 * 18
   - version = 18
   - has_whatever_tool = true
   - foobar_params = -f -g -d
 * 19
   - version = 19
   - has_whatever_tool = false
   - foobar_params = -f -l -N 12
EOF

The file above, when combined with the previous tree, will
"inject" the variants "18" and "19" into the
os->linux->distro->fedora node of the tree.

With this mechanism, creating variants at runtime becomes
trivial. It's also easy to extend the tree (perhaps in an
upstream/downstream fashion).
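
A rough sketch of the injection, assuming the tree is held as
nested dictionaries where a dict value is a sub-variant and a
string value is a variable. The inject helper and this in-memory
representation are assumptions made for illustration, not a
proposed API:

```python
def inject(tree, using, subtree):
    """Attach 'subtree' under the node named by the 'using' path.

    Missing intermediate nodes are created, so a file can also
    extend the tree with brand new branches.
    """
    node = tree
    for name in using.split("->"):
        node = node.setdefault(name, {})
    node.update(subtree)
    return tree


tree = {"os": {"linux": {"distro": {"fedora": {"distro": "fedora"},
                                    "ubuntu": {"distro": "ubuntu"}}}}}

# Hypothetical parsed content of fedora.mplx (without the 'using' line)
fedora_mplx = {
    "18": {"version": "18", "has_whatever_tool": "true"},
    "19": {"version": "19", "has_whatever_tool": "false"},
}

inject(tree, "os->linux->distro->fedora", fedora_mplx)
print(sorted(tree["os"]["linux"]["distro"]["fedora"]))
```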


## default values ##

The multiplex system is optional. Tests are supposed to run with
default values in a supported system, just like virt-test does
today. Test writers declare the defaults (not part of the
multiplexing system).

So when a branch is filtered out, the test will fall back to the
defaults provided by the test runner. Ditto for when tests are
run without the multiplexer.
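
One way to picture the fallback, as a sketch only: the defaults
form the base dictionary and whatever the multiplexer provides is
layered on top. The function name resolve_params and the default
values used here are made up for the example:

```python
def resolve_params(defaults, variant_vars=None):
    """Return the parameters a test would see.

    Variables coming from the multiplexer override the defaults;
    with no multiplexer (or a filtered-out branch) only the
    defaults remain.
    """
    params = dict(defaults)
    params.update(variant_vars or {})
    return params


# Hypothetical defaults declared for sync_test
defaults = {"sync_timeout": 30, "sync_tries": 10}

print(resolve_params(defaults))                        # no multiplexer
print(resolve_params(defaults, {"sync_timeout": 10}))  # partial override
```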


## filtering ##

Filters are applied to the tree of variants before any
multiplexing occurs, thus being very efficient. Filters "chop"
the tree.

Filters can be added inside the tree itself (as in the example)
and also specified at runtime.

Given that tests are part of the tree, by default the test runner
would start multiplexing from a tests->$NAME variant that matches
$(basename($TESTID)). So, for example:

 * $ avocado --multiplex file.mplx run synctest
    * --> given no external filter, this will multiplex all
      combinations for synctest and run them. With the tree above
      it's a huge set, one that multiplexes host config, linux
      distros, debug, production, ide, scsi, virtio, e1000,
      rtl8139, etc.

Now let's see an example with strict filtering via a config file:

  * synctest.cfg:
    """
    filter-out: "host" 
    filter-out: "guest"
    """

  * $ avocado --multiplex file.mplx run synctest
    * --> using the config file above, the only branch left to
      multiplex against the test is env, where there are two
      variants: production and debug.
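
A small sketch of how the two filters could chop the tree, working
on a flat list of leaf IDs for brevity. The semantics are my
reading of the examples and should be taken as an assumption:
filter-out drops a whole branch, while filter-only drops the
siblings of the named branch and leaves unrelated branches alone.
The helper names are hypothetical:

```python
def under(leaf, branch):
    """True if 'leaf' is the branch itself or lives below it."""
    return leaf == branch or leaf.startswith(branch + "->")


def filter_out(leaves, branch):
    """Drop every leaf that belongs to 'branch'."""
    return [leaf for leaf in leaves if not under(leaf, branch)]


def filter_only(leaves, branch):
    """Keep 'branch' but drop its siblings (assumed semantics)."""
    parent = branch.rpartition("->")[0]

    def in_parent(leaf):
        # A top-level branch has the root as parent: everything is a sibling
        return under(leaf, parent) if parent else True

    return [leaf for leaf in leaves
            if under(leaf, branch) or not in_parent(leaf)]


leaves = [
    "env->production",
    "env->debug",
    "host->kernel_config->huge_pages",
    "guest->os->windows->xp",
    "guest->os->linux->distro->fedora",
    "guest->hardware->disks->ide",
    "tests->sync_test->standard",
]

# Equivalent of the synctest.cfg above
print(filter_out(filter_out(leaves, "host"), "guest"))
# Equivalent of filter-only: "guest->os->linux"
print(filter_only(leaves, "guest->os->linux"))
```

Because both functions only look at paths, they run before any
combination is generated, which is the point of filtering on the
tree.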


### filtering by value ###

It should be possible to filter by the value of a variable. For
example, the user may want all variants where "os_type =
windows", or a more comprehensive hypothetical list of variants
where "project_license = gpl".
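
As a sketch of what this could look like, assuming variables are
inherited from parent nodes as described earlier: each leaf gets
the merged variables of all its ancestors, and the filter compares
one of them against the requested value. The names effective_vars
and filter_by_value are hypothetical:

```python
def effective_vars(path, node_vars):
    """Merge the variables of every node along 'path', root first."""
    merged = {}
    parts = path.split("->")
    for i in range(1, len(parts) + 1):
        merged.update(node_vars.get("->".join(parts[:i]), {}))
    return merged


def filter_by_value(leaf_paths, node_vars, name, value):
    """Keep the leaves whose (inherited) variable 'name' equals 'value'."""
    return [path for path in leaf_paths
            if effective_vars(path, node_vars).get(name) == value]


# Variables declared directly on each node of the reference tree
node_vars = {
    "guest->os->windows": {"os_type": "windows"},
    "guest->os->windows->xp": {"win": "xp"},
    "guest->os->windows->7": {"win": "7"},
    "guest->os->linux": {"os_type": "linux"},
    "guest->os->linux->distro->fedora": {"distro": "fedora"},
}
leaves = [
    "guest->os->windows->xp",
    "guest->os->windows->7",
    "guest->os->linux->distro->fedora",
]

print(filter_by_value(leaves, node_vars, "os_type", "windows"))
```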


### filtering by depth ###

It should also be possible to limit combinations by depth. The
depths from our example would be:

 * <ROOT> (0)
   * env (1)
     * production (2)
     * debug (2)
   * host (1)
     * kernel_config (2)
       * huge_pages (3)
       * small_pages (3)
       * numa_ballance_aggressive (3)
       * numa_ballance_light (3)
   * guest (1)
      * os (2)
        * windows (3)
           * xp (4)
           * 2k12 (4)
           * 7 (4)
        * linux (3)
          * distro (4)
             * fedora (5)
             * ubuntu (5)
      * hardware (2)
        * disks (3)
           * ide (4)
           * scsi (4)
        * network (3)
          * rtl_8139 (4)
          * e1000 (4)
          * virtio_net (4)
   * tests (1)
     * sync_test (2)
       * standard (3)
       * aggressive (3)
     * ping_test (2)
       * standard (3)
       * aggressive (3)

Some examples on this tree:

 * filter-depth: 1 # no variables (leaves) are left to be
   multiplexed, not even the ones from the tests. The result is
   an empty set.

 * filter-depth: 2 # will result in only leaves at depth 2 being
   used. In our example, this will effectively cut substantial
   portions of the tree and won't multiplex any of the test
   variables, but "env" will be there.

 * filter-depth: 3 # will ignore hardware details and the
   different OSes, but test variables and env will be multiplexed.
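
The three cases above can be reproduced with a very small sketch,
assuming the depth of a leaf is the number of components in its
path (<ROOT> being 0) and that leaves deeper than the limit are
simply dropped rather than replaced by an ancestor. The function
name filter_depth is hypothetical:

```python
def filter_depth(leaf_paths, max_depth):
    """Keep only the leaves located at 'max_depth' or above."""
    return [path for path in leaf_paths
            if len(path.split("->")) <= max_depth]


leaves = [
    "env->production",                  # depth 2
    "env->debug",                       # depth 2
    "host->kernel_config->huge_pages",  # depth 3
    "guest->os->windows->xp",           # depth 4
    "guest->hardware->disks->ide",      # depth 4
    "tests->sync_test->standard",       # depth 3
]

for depth in (1, 2, 3):
    print(depth, filter_depth(leaves, depth))
```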

Maybe we can use the depth as a parameter to filter-only as well:

 * filter-only: "guest->hardware(2)" # use the guest->hardware
   branch, but limit it to depth 2.


## tools and instrumentation ##

We need tools to help us debug and parse the tree of variants.
Some ideas:

 - plot the full tree (combining multiple files)
 - plot the tree without the leaves
 - plot the tree with depth information
 - plot the tree annotated with the file each line came from
   (similar to "git blame")
 - apply a filter before plotting the tree
 - print the list of leaves
 - apply a filter and print the resulting set of variants

Given that the filters apply to the tree, it should be relatively
easy to understand and debug their effect.
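
To give an idea of how little code such tools would need, here is
a sketch that plots a tree with depth information and prints the
list of leaves. It assumes the tree is a nested dictionary of
variant names (variables omitted); the function names are
hypothetical:

```python
def walk(tree, depth=1, prefix=""):
    """Yield (path, depth, is_leaf) for every variant in the tree."""
    for name, children in tree.items():
        path = f"{prefix}->{name}" if prefix else name
        yield path, depth, not children
        yield from walk(children, depth + 1, path)


def render(tree):
    """Return the tree as indented text with depth annotations."""
    lines = ["* <ROOT> (0)"]
    for path, depth, _ in walk(tree):
        name = path.rsplit("->", 1)[-1]
        lines.append("  " * depth + f"* {name} ({depth})")
    return "\n".join(lines)


def list_leaves(tree):
    """Return the full IDs of all leaves."""
    return [path for path, _, is_leaf in walk(tree) if is_leaf]


tree = {
    "env": {"production": {}, "debug": {}},
    "tests": {"sync_test": {"standard": {}, "aggressive": {}}},
}

print(render(tree))
print("\n".join(list_leaves(tree)))
```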

## future extensions ##

We could probably extend the filtering language to include
restrictions such as limits (min/max).

EOF.

-- 
Ademar de Souza Reis Jr.
Red Hat
