Hi,

I ran into a very weird problem the other day that I thought I would share with the list.

We recently switched our Maven builds to Fedora Linux running on a really great VMware cluster with very fast SSD-based storage.

However, our average Java build time inexplicably increased to several hours. In my own build environment under Mac OS X, similar builds complete in minutes (some of the small builds take only 2-3 minutes under normal circumstances).

I eventually noticed that the jobs were hanging during jarsigning.

After investigating many red herrings, it boiled down to the Linux "secure random" device, /dev/random, not supplying enough data. It is a blocking device: when the kernel's entropy pool runs low, reads stall until more entropy has been gathered. If you are using the jarsigner plugin, jarsigner reads from /dev/random to seed the encryption for the signature for each class, so if you have a jar with many classes that need to be signed, it can literally take hours.

In our case, a 2-minute build was taking up to 4 hours.

Note that if you are running under Mac OS X, this will not occur, because Apple treats /dev/random the same as the pseudo-random number device, /dev/urandom, so reads from it do not block:

# ls -l /dev/random /dev/urandom
crw-rw-rw-  1 root  wheel    9,   0 Feb  1 09:31 /dev/random
crw-rw-rw-  1 root  wheel    9,   1 Jan 23 19:22 /dev/urandom

You may not experience this problem if you are running on real hardware, because the so-called "entropy pool" can be supplied with hardware sources of noise, even at the chip level in some cases.

However, if you are running in a virtual environment, apparently there is not enough noise to supply the entropy pool with a steady stream of data, and so /dev/random runs out and blocks very quickly.

The problem is very easy to detect:

        $ dd if=/dev/random of=/dev/null bs=1 count=1000

If dd hangs while reading from the device, or takes a very long time, then you have the problem.
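
A slightly more defensive version of that check, as a rough sketch: it wraps the same dd read in a timeout so the test itself cannot hang, and prints the kernel's entropy estimate first. The function name check_random and the 10-second default are my own choices for illustration, and it assumes a Linux box with GNU coreutils `timeout` available.

```shell
#!/bin/sh
# Hypothetical helper (not from any package): probe a random device
# without risking an indefinite hang.
check_random() {
    dev="${1:-/dev/random}"   # device to probe
    limit="${2:-10}"          # seconds to wait before giving up (arbitrary default)

    # Linux exposes its current entropy estimate (in bits) here.
    if [ -r /proc/sys/kernel/random/entropy_avail ]; then
        echo "entropy_avail: $(cat /proc/sys/kernel/random/entropy_avail)"
    fi

    # Same read as the dd test above, but abandoned after $limit seconds.
    if timeout "$limit" dd if="$dev" of=/dev/null bs=1 count=1000 2>/dev/null; then
        echo "$dev: ok"
    else
        echo "$dev: unreadable, blocked, or too slow"
        return 1
    fi
}
```

Running `check_random /dev/random 10` and then `check_random /dev/urandom 10` on the build machine should make the difference obvious if the entropy pool is the culprit.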

Once you identify the problem, it is easy to solve [1,2]. In our case, we opted for a system-wide solution using the rng-tools [3] package, which provides the rngd daemon, since this problem can impact many things, including SSL connections with Jenkins [4], or any other programs that consume encryption services.
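
For anyone who cannot change the system, the per-JVM approach described in [1] and [2] is to point Java at /dev/urandom through the java.security.egd property. Below is a minimal sketch of one way to apply it from a build script. It assumes a HotSpot-style JVM that honours the JAVA_TOOL_OPTIONS environment variable (which should also reach tools that Maven forks, such as jarsigner); the function name add_urandom_egd is made up for this example, and I have not verified that this alone removes the stall on every JDK version.

```shell
#!/bin/sh
# Hypothetical helper: add the egd setting to JAVA_TOOL_OPTIONS once,
# preserving whatever options are already there.
add_urandom_egd() {
    # The odd-looking "/dev/./urandom" spelling is the one given in [1].
    egd='-Djava.security.egd=file:/dev/./urandom'
    case " ${JAVA_TOOL_OPTIONS:-} " in
        *" $egd "*) ;;   # already present, nothing to do
        *) JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }$egd" ;;
    esac
    export JAVA_TOOL_OPTIONS
}
```

A build wrapper could call `add_urandom_egd` before invoking `mvn`. The JVM prints a "Picked up JAVA_TOOL_OPTIONS" line on startup, which is a quick way to confirm the setting was seen.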

Cheers,
-Russ

References:

[1] http://stackoverflow.com/questions/137212/how-to-solve-performance-problem-with-java-securerandom
[2] http://docs.oracle.com/cd/E12529_01/wlss31/configwlss/jvmrand.html
[3] on Fedora/CentOS/etc.:  sudo yum install rng-tools
    start daemon with:  rngd --rng-device=/dev/urandom
[4] "How does Jenkins checkout sources from svn to the slaves", jenkinsci-users list: https://groups.google.com/forum/?hl=en_US#!searchin/jenkinsci-users/Re$3A$20How$20does$20Jenkins$20checkout$20sources$20from$20svn$20to$20the$20slaves/jenkinsci-users/sqp6hvzdXDY/P-ZYkDO-8TgJ
