I've been putting off this mail for quite some time, but the recent
discussion about the quality of patches moved this task to the top of my TODO list.

The purpose of this mail is to describe the recent changes made to our testing infrastructure. I'll give you a bit of background first and then move on to the implementation details.

## The origin

I joined the SUSE Manager team before last summer to take care of its python parts. I'm a strong believer in TDD, so I tried to adopt it. Unfortunately, I found that really hard to do.

The test suite assumes it's running on a system where all the spacewalk RPMs are installed, and in some cases it also assumes the whole spacewalk stack is running. So each time I touched something I had to send my changes to a virtual machine running the whole spacewalk stack, and then run the test suite inside that running instance.
Propagating the changes could be done in two ways:
  1. The quick and dirty way: just copy the changed files to the right
     locations. You have to figure out what the "right locations" are,
     which is not always obvious.
  2. The right way: rebuild the package affected by the changes, then
     copy it and install it on the running VM.

That felt like a huge waste of time, and I soon grew frustrated with it.

I just wanted to run the python test suite straight from my git checkout, without having to run the whole spacewalk stack either on my machine or inside a VM.

I started looking into the problem and tried to attack it from different angles, failing each time, until I settled on the right solution: enter Docker.

Docker[0] is a virtualization solution based on Linux containers (LXC). I'm sure you Red Hat guys have heard about it :) BTW, thanks for your recent contributions!

Using Docker solved all of my issues. The other people working on the Java codebase found the approach interesting and embraced it.
Next we adopted this solution on our continuous integration servers as well.

That's enough preamble, let's go straight to our setup.


## Our goals

* Run the test suite straight from a git checkout.
* Run the test suite on different platforms: SUMA 1.7 is based on SLES
  11 SP2, but the next version is going to be based on SLES 11 SP3.
* Run the test suite against all the supported databases: PostgreSQL
  and Oracle.
* Do not run tons of heavy VMs on the developer machine.
* Make feedback from continuous integration servers as fast as possible.


## Our Setup

We have different Docker images organized in a slightly complex hierarchy.

We start by creating two parent images:
  * sle_11_sp2
  * sle_11_sp3

Then we derive database-specific images from them, one per database type we support:
  * sle11_sp2_pgsql
  * sle11_sp3_pgsql
  * sle11_sp3_oracle

We do not test the Oracle database on SLE 11 SP2 with Docker yet.

Finally, we extend these images one last time to add all the tools and libraries required to test the Java and the python code.

In the end the tests run on the following images:
  * sle11sp2_pgsql_java
  * sle11sp2_pgsql_python
  * sle11sp3_pgsql_java
  * sle11sp3_pgsql_python
  * sle11sp3_oracle_java
  * sle11sp3_oracle_python
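This hierarchy maps naturally onto Docker's image inheritance. As a rough sketch (the zypper invocations and package names are my assumptions, not our actual Dockerfiles), the python leaf for SP3 with PostgreSQL could look like this:

```dockerfile
# Hypothetical sketch of sle11sp3_pgsql_python; each image just
# extends its parent via FROM.
#
# sle11_sp3_pgsql itself would start from the plain base image:
#   FROM sle_11_sp3
#   RUN zypper --non-interactive install postgresql-server
#
FROM sle11_sp3_pgsql
# add the python test tooling on top of the db-specific image
RUN zypper --non-interactive install python-nose python-mock
```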

All the 'pgsql' containers have the PostgreSQL server installed and configured. We also prepare the database for the tests while creating the image, to reduce testing time. As you know, the database schema does not change that often; when it does, we rebuild the images.

The 'oracle' images do not have the Oracle server installed. We tried to get Oracle working within a Docker container, but it turned out to be impossible. We are pretty confident one big reason for this failure lies in the file system used by the containers at runtime: AUFS. AUFS has proved to be a major cause of headaches in several situations. It seems Oracle simply cannot run on AUFS, since AUFS does not support some of the low-level operations provided by standard filesystems.

Things changed with Docker 0.7, since you guys made it possible to ditch AUFS for device-mapper thin provisioning (and hence ext3). We are going to give running Oracle within a Docker container another shot. For now we have resorted to using an external Oracle database running inside a VM.

## How the testing happens

I'm going to focus on python testing since that's my field of expertise, but the same concepts apply to Java testing as well.

First of all we check out spacewalk's code from git. The git checkout is made accessible to the running container by using Docker's volume feature: all the spacewalk code is mounted inside the container under the '/manager/' path.

During image creation we create a couple of symbolic links to fake a spacewalk installation:
  * '/usr/share/rhn/config-defaults' points to
    '/manager/backend/rhn-conf/'
  * '/usr/lib64/python/site-packages/spacewalk' points to
    '/manager/backend/'

The containers are also instructed to run with a custom PYTHONPATH which forces python to look into some directories under '/manager'.
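The fake installation can be sketched like this (the ROOT prefix is just so you can try it outside a container, and the exact PYTHONPATH entries are my assumption; inside the image the links live directly under '/usr'):

```shell
#!/bin/sh
# Recreate the fake spacewalk installation described above.
# ROOT lets the sketch run outside a container; in the real
# image it would be empty.
ROOT="${ROOT:-/tmp/fake-spacewalk}"
mkdir -p "$ROOT/usr/share/rhn" "$ROOT/usr/lib64/python/site-packages"
ln -sfn /manager/backend/rhn-conf "$ROOT/usr/share/rhn/config-defaults"
ln -sfn /manager/backend "$ROOT/usr/lib64/python/site-packages/spacewalk"
# Make python prefer the git checkout over installed packages
# (the exact entries here are assumed).
export PYTHONPATH="/manager/backend${PYTHONPATH:+:$PYTHONPATH}"
```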

Docker spins up the right container and runs a simple script inside it. Once testing is done, Docker deletes the container.
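Put together, a single test run boils down to something like the following (image name and script path are assumptions; the command is only printed here so the sketch is safe to run without a Docker daemon):

```shell
#!/bin/sh
# One test run: mount the git checkout at /manager and execute the
# test script; '--rm' makes Docker delete the container afterwards.
# Image name and script path are hypothetical.
CMD="docker run --rm -v $PWD:/manager sle11sp3_pgsql_python /manager/run-python-tests.sh"
echo "$CMD"
```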

The script executed within the container does the following:
  * Optional: start the PostgreSQL database.
  * Invoke nosetests.
  * Optional: shut down the PostgreSQL database (this is required to
    properly release some memory allocated by PostgreSQL via 'shmget').
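In script form this could look roughly like the sketch below (the paths, init-script name, and nose flags are my assumptions; it is written to a file here only to keep the sketch self-contained):

```shell
#!/bin/sh
# Sketch of the script run inside a 'pgsql' container; paths and the
# init-script name are assumptions, not our exact setup.
cat > /tmp/run-python-tests.sh <<'EOF'
#!/bin/sh
set -e
/etc/init.d/postgresql start    # optional: only in the pgsql images
cd /manager/backend
nosetests --with-xunit --xunit-file=/manager/reports/tests.xml
# Stopping PostgreSQL releases the shared memory it got via shmget.
/etc/init.d/postgresql stop
EOF
chmod +x /tmp/run-python-tests.sh
```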

Nose writes all the XML reports to a folder under the '/manager' directory, making them available on the host system as well. On our CI servers the reports are collected and processed by Jenkins.

## Why is this awesome

There are several reasons why this solution shines.

### Speed

The biggest advantage is speed: Docker does not introduce any significant overhead to testing. Allocating and releasing a container takes less than a second, and the code running inside it performs just as it would outside of Docker on the host system.

### Resource usage

We do not need to allocate a full VM to perform the tests. That saves a lot of resources and makes it possible to run multiple containers simultaneously on the same host.

### Testing matrix

Since it's fast and doesn't hammer the system, each developer can test their changes against multiple deployment scenarios (sle11sp2 with pgsql, sle11sp3 with pgsql, ...).

The same applies to our CI servers.

### Consistency

A failure happening on the CI server can always be reproduced by the developers: they just need to pull the image used by Jenkins and run the tests inside it.


## Why are we disclosing this information

I think we all want spacewalk to be stable and reliable. One way to achieve that is by having a good test suite, but that alone, IMHO, is not enough. The tests should run continuously, and the CI servers should react quickly to new commits: if something breaks, you need to know as soon as possible.

Moreover, developers should try not to break spacewalk with their commits. We should always run the test suite locally before pushing our changes. But to promote this behaviour the test suite must be fast and easy to run, and that, IMHO, is not possible without resorting to the solution I just illustrated.


As I said at the beginning, this mail was motivated by the recent discussion about the quality of patches. I totally agree with what Duncan Mac-Vicar proposed. I think it would be terrific to use GitHub's pull requests to submit new code to spacewalk. With this approach we could also get fast feedback from a CI server for each pull request made by contributors.


I hope you found this interesting; if you have more questions, just ask. If you are interested, we can also start a discussion about pushing parts of our setup upstream (makefiles, docker files, ...).

Cheers
Flavio


[0] http://docker.io

_______________________________________________
Spacewalk-devel mailing list
Spacewalk-devel@redhat.com
https://www.redhat.com/mailman/listinfo/spacewalk-devel
