I've been putting off this mail for quite some time, but the recent
discussion about the quality of patches moved this task to the top of my
TODO list.
The purpose of this mail is to describe the recent changes made to our
testing infrastructure. I'll give you a little background first and
then move on to the implementation details.
## The origin
I joined the SUSE Manager team before last summer to take care of its
Python parts. I'm a strong believer in TDD, so I tried to adopt it.
Unfortunately I found that really hard to do.
The test suite assumes it's running on a system where all the spacewalk
rpms are installed; in some cases it even assumes the whole spacewalk
stack is running. So each time I touched something I had to send
my changes to a virtual machine running the whole spacewalk stack and
then run the test suite inside that instance.
Propagating the changes could be done in two ways:
1. The quick and dirty way: just copy the changed files to the right
locations. You have to figure out what the "right locations" are,
which is not always obvious.
2. The right way: rebuild the package affected by the changes, then
copy it to the running VM and install it.
That felt like a huge waste of time and I soon grew frustrated with it.
I just wanted to run the Python test suite straight from my git checkout,
without having to run the whole spacewalk stack either on my machine
or inside a VM.
I started looking into this problem and attacked it from different
angles, failing each time, until I settled on the right solution: enter
Docker.
Docker[0] is a virtualization solution based on Linux containers (LXC).
I'm sure you Red Hat guys have heard about it :) BTW thanks for
your recent contributions!
Using Docker solved all of my issues. The people working on the Java
codebase found the approach interesting and embraced it as well.
We then adopted the same solution on our continuous integration servers.
That's enough preamble; let's go straight to our setup.
## Our goals
* Run the test suite straight from a git checkout.
* Run the test suite on different platforms: SUMA 1.7 is based on SLES
11 SP2, but the next version is going to be based on SLES 11 SP3.
* Run the test suite against all the supported databases: PostgreSQL
and Oracle.
* Do not run tons of heavy VMs on the developer machine.
* Make feedback from continuous integration servers as fast as possible.
## Our setup
We have different Docker images organized in a slightly complex hierarchy.
We start by creating two parent images:
* sle_11_sp2
* sle_11_sp3
Then we derive database-specific images from them, one per database
type we support:
* sle11_sp2_pgsql
* sle11_sp3_pgsql
* sle11_sp3_oracle
We do not yet test the Oracle database on SLE 11 SP2 with Docker.
Finally we inherit these images one last time to add all the tools and
libraries required to test the Java and the Python code.
In the end the tests run on the following images (a sketch of one such
Dockerfile follows the list):
* sle11sp2_pgsql_java
* sle11sp2_pgsql_python
* sle11sp3_pgsql_java
* sle11sp3_pgsql_python
* sle11sp3_oracle_java
* sle11sp3_oracle_python
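To give an idea of how the hierarchy is expressed, here is a minimal
sketch of what the Dockerfile of a leaf image could look like; the
package names are assumptions on my part, not our actual list:

```
# Sketch of the sle11sp3_pgsql_python leaf image: every level of the
# hierarchy simply builds on top of the previous one via FROM.
FROM sle11_sp3_pgsql

# Add the tools and libraries needed by the Python test suite
# (package names here are assumptions).
RUN zypper --non-interactive install python-nose python-mock
```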
All the 'pgsql' images have the PostgreSQL server installed and
configured.
We also prepare the database for the tests while building the image,
which reduces the testing time. As you know the database schema does
not change that often; when it does, we rebuild the images.
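As a sketch, the build-time database preparation could look like this,
assuming SLES package names and standard PostgreSQL paths (the schema
file name and the database name are made up):

```
# Sketch of the sle11_sp3_pgsql image: install PostgreSQL and load
# the schema at build time, so containers start with a ready database.
FROM sle_11_sp3

RUN zypper --non-interactive install postgresql-server

# Copy the schema into the image and load it while building
# (all the paths and names below are assumptions).
ADD spacewalk-schema.sql /tmp/spacewalk-schema.sql
RUN su - postgres -c "initdb -D /var/lib/pgsql/data" && \
    su - postgres -c "pg_ctl -D /var/lib/pgsql/data -w start" && \
    su - postgres -c "createdb spacewalk" && \
    su - postgres -c "psql -d spacewalk -f /tmp/spacewalk-schema.sql" && \
    su - postgres -c "pg_ctl -D /var/lib/pgsql/data -w stop"
```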
The 'oracle' images do not have the Oracle server installed. We tried
to get Oracle working inside a Docker container, but it turned out to
be impossible.
We are pretty confident that one big reason for this failure lies in
the file system used by the containers at runtime: AUFS.
AUFS proved to be a major cause of headaches in several situations.
It seems Oracle simply cannot run on AUFS, since AUFS does not support
some of the low-level operations that standard filesystems provide.
Things changed with Docker 0.7, since you guys made it possible to ditch
AUFS for device mapper thin provisioning (and hence ext3). We are going
to give running Oracle inside a Docker container another shot. For now
we have resorted to using an external Oracle database running inside a VM.
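If anybody wants to try the same switch: to the best of my knowledge
the storage driver is selected when starting the daemon, along these
lines (treat the exact flag as an assumption and check the
documentation for your Docker version):

```
# Start the Docker daemon with the device mapper storage backend
# instead of AUFS (flag spelling may differ between versions).
docker -d -s devicemapper
```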
## How the testing happens
I'm going to focus on Python testing since that's my field of
expertise, but the same concepts apply to Java testing as well.
First of all we check out spacewalk's code from git. The git checkout
is made accessible to the running container by using Docker's volume
feature: all the spacewalk code is mounted inside the container under
the '/manager/' path.
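In practice this boils down to a single `docker run` invocation along
these lines; the script name is hypothetical:

```
# From the root of the spacewalk git checkout: bind-mount it into
# the container under /manager and run the test script inside it
# (run_python_tests.sh is a made-up name).
docker run -v `pwd`:/manager sle11sp3_pgsql_python \
    /manager/run_python_tests.sh
```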
During the image creation we create a couple of symbolic links to fake
a spacewalk installation:
* '/usr/share/rhn/config-defaults' points to
'/manager/backend/rhn-conf/'
* '/usr/lib64/python/site-packages/spacewalk' points to
'/manager/backend/'
The containers are also instructed to run with a custom PYTHONPATH
which forces Python to look into some directories under '/manager'.
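The relevant Dockerfile lines look more or less like this; the exact
PYTHONPATH entries are an assumption on my part:

```
# Fake the spacewalk installation by pointing the expected paths
# at the bind-mounted git checkout.
RUN ln -s /manager/backend/rhn-conf /usr/share/rhn/config-defaults
RUN ln -s /manager/backend /usr/lib64/python/site-packages/spacewalk

# Make Python look for modules inside the checkout (the directory
# list here is an assumption).
ENV PYTHONPATH /manager/backend:/manager/client
```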
Docker spawns a container from the right image and runs a simple script
inside it. Once testing is done, the container is deleted.
The script executed within the container does the following things (a
minimal sketch follows the list):
* Optional: start the PostgreSQL database.
* Invoke nosetests.
* Optional: shut down the PostgreSQL database (this is required to
properly release some memory allocated by pgsql via 'shmget').
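A minimal sketch of such a script, assuming standard SLES paths for
PostgreSQL and a made-up report location:

```
#!/bin/sh
# Runs inside the container; this is a sketch, not our exact script.

# Optional: start the PostgreSQL server (the data directory path is
# an assumption).
su - postgres -c "pg_ctl -D /var/lib/pgsql/data -w start"

# Run the Python test suite and write the xUnit report under
# /manager, so it is visible on the host as well.
cd /manager/backend
nosetests --with-xunit --xunit-file=/manager/reports/python-tests.xml

# Optional: stop PostgreSQL so it releases the memory it allocated
# via 'shmget'.
su - postgres -c "pg_ctl -D /var/lib/pgsql/data -w stop"
```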
Nose writes all the XML reports to a folder under the '/manager'
directory, which makes them available on the host system as well. On
our CI servers the reports are collected and processed by Jenkins.
## Why is this awesome
There are several reasons why this solution shines.
### Speed
The biggest advantage is speed: Docker does not introduce any
significant overhead to testing. Creating a container and releasing it
takes less than a second, and the code running inside it performs the
same as it would outside of Docker on the host system.
### Resource usage
We do not need to allocate a full VM to perform the tests. That saves a
lot of resources and makes it possible to run multiple containers
simultaneously on the same host.
### Testing matrix
Since it's fast and doesn't hammer the system, each developer can test
their changes against multiple deployment scenarios (sle11sp2 with
pgsql, sle11sp3 with pgsql, ...).
The same applies to our CI servers.
### Consistency
A failure happening on the CI server can always be reproduced by the
developers: they just need to pull the image used by Jenkins and run
the tests inside it.
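Reproducing a red build then boils down to something like this; the
registry host and the script name are placeholders:

```
# Fetch the exact image Jenkins used and re-run the tests locally
# (registry.example.com stands in for our internal registry).
docker pull registry.example.com/sle11sp3_pgsql_python
docker run -v `pwd`:/manager registry.example.com/sle11sp3_pgsql_python \
    /manager/run_python_tests.sh
```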
## Why are we disclosing this information
I think we all want spacewalk to be stable and reliable. One way to
achieve that is to have a good test suite. But IMHO that is not
enough: the tests should run continuously, and the CI servers should
react quickly to new commits. If something breaks, you need to know as
soon as possible.
Moreover, developers should try not to break spacewalk with their
commits. We should always run the test suite locally before pushing our
changes. But to promote this behaviour the test suite must be fast and
easy to run, and IMHO that is not possible without a solution like the
one I just illustrated.
As I said in the beginning, this mail was motivated by the recent
discussion about the quality of patches. I totally agree with what
Duncan Mac-Vicar proposed.
I think it would be terrific to use GitHub pull requests to submit new
code to spacewalk. With this approach we could also get fast feedback
from a CI server for each pull request made by contributors.
I hope you found this an interesting read; if you have more questions,
just ask me.
If you are interested we can also start a discussion about pushing parts
of our setup upstream (makefiles, Dockerfiles, ...).
Cheers
Flavio
[0] http://docker.io