On 2015-04-24 02:03, Moe Jette wrote:
> Slurm version 14.11.6 is now available with quite a few bug fixes as
> listed below.
>
> Slurm downloads are available from
> http://slurm.schedmd.com/download.html
>
> * Changes in Slurm 14.11.6
> ==========================
[snip]
> -- Enable compiling without optimizations and with debugging symbols by
> default. Disable this by configuring with --disable-debug.

Always including debug symbols is good (the only cost is a little disk
space, which should never really be a problem), but disabling
optimization by default?? In our environment, slurmctld consumes a
decent chunk of CPU time, and I would be loath to see it get a lot (?)
slower.

Typically, problems which are "fixed" by disabling optimization are due
to violations of the C standard (undefined behavior or the like) which
for some reason just don't happen to trigger with -O0. Perhaps I'm being
needlessly harsh here, but I'd prefer that the bugs be fixed properly
rather than papered over like this.
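
To illustrate the kind of bug meant here, a minimal sketch follows. It is
not taken from Slurm, and both function names are made up for the
example. The first check relies on signed overflow, which is undefined
behavior in C, so an optimizing compiler is allowed to assume it never
happens and drop the test; at -O0 the addition usually just wraps and the
check appears to work. The second version stays within the standard.

```c
#include <limits.h>
#include <stdbool.h>

/* Broken (hypothetical example): a + b may overflow, which is undefined
 * behavior for signed int. With optimization enabled the compiler may
 * assume no overflow occurs and reduce this test to "false". */
bool add_overflows_ub(int a, int b)
{
	return b > 0 && a + b < a;
}

/* Well defined: compare against the limits before doing the addition,
 * so no intermediate result can overflow. */
bool add_overflows(int a, int b)
{
	if (b > 0)
		return a > INT_MAX - b;
	return a < INT_MIN - b;
}
```

Code written like add_overflows() behaves the same at every optimization
level, so there is nothing for -O0 to hide.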
> -- Use standard statvfs(2) syscall if available, in preference to
> non-standard statfs.

This is not actually such a good idea. Prior to Linux kernel 2.6.36 and
glibc 2.13, the implementation of statvfs() had to scan all entries in
/proc/mounts in order to fill in the f_flag field. If any of those other
filesystems was unavailable (e.g. a hung NFS mount), the statvfs() call
would hang as well. See e.g.
http://man7.org/linux/man-pages/man2/statvfs.2.html

Not directly related to this change, there is also a bit of silliness in
the statfs() code for get_tmp_disk(), namely that it assumes that the fs
record size is the same as the memory page size. As of Linux 2.6, struct
statfs contains a field f_frsize which holds the correct record size. I
suggest the attached patch, which should fix both of these issues.
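
As a standalone sketch of the calculation described above (Linux only;
fs_total_mib() is a hypothetical helper for illustration, not the Slurm
function and not the attached patch), the total size can be derived from
f_blocks and f_frsize rather than from the memory page size:

```c
#include <errno.h>
#include <stdint.h>
#include <sys/vfs.h>	/* statfs(), Linux-specific */

/* Hypothetical helper: store the total size in MiB of the filesystem
 * containing `path` in *size_mib. Returns 0 on success, or the errno
 * value set by statfs() on failure (in which case *size_mib is 0). */
int fs_total_mib(const char *path, uint64_t *size_mib)
{
	struct statfs buf;
	uint64_t unit;

	*size_mib = 0;
	if (statfs(path, &buf) != 0)
		return errno;

	/* Use f_frsize as the unit for f_blocks, not the page size.
	 * Fall back to f_bsize should f_frsize be reported as 0. */
	unit = buf.f_frsize ? (uint64_t)buf.f_frsize : (uint64_t)buf.f_bsize;
	*size_mib = (uint64_t)buf.f_blocks * unit / (1024 * 1024);
	return 0;
}
```

Doing the arithmetic in uint64_t avoids having to divide f_blocks before
the multiplication, so no precision is lost on small filesystems.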
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || [email protected]
diff --git a/src/slurmd/slurmd/get_mach_stat.c b/src/slurmd/slurmd/get_mach_stat.c
index d7c5eb1..6493755 100644
--- a/src/slurmd/slurmd/get_mach_stat.c
+++ b/src/slurmd/slurmd/get_mach_stat.c
@@ -217,7 +217,28 @@ get_tmp_disk(uint32_t *tmp_disk, char *tmp_fs)
{
int error_code = 0;
-#if defined(HAVE_STATVFS)
+#if defined(__linux__)
+ /* Prior to Linux 2.6.36 and glibc 2.13, statvfs() can get
+ * stuck if ANY mount in the system is hung, so use the
+ * non-standard statfs() instead. Furthermore, as of Linux
+ * 2.6+ struct statfs contains the f_frsize field which gives
+ * the size of the blocks reported in the f_blocks field. */
+ struct statfs stat_buf;
+ unsigned long total_size = 0;
+ char *tmp_fs_name = tmp_fs;
+
+ if (tmp_fs_name == NULL)
+ tmp_fs_name = "/tmp";
+ if (statfs(tmp_fs_name, &stat_buf) == 0) {
+ total_size = stat_buf.f_blocks / 1024;
+ total_size *= stat_buf.f_frsize;
+ total_size /= 1024;
+ } else if (errno != ENOENT) {
+ error_code = errno;
+ error ("get_tmp_disk: error %d executing statfs on %s", errno, tmp_fs_name);
+ }
+ *tmp_disk = (uint32_t)total_size;
+#elif defined(HAVE_STATVFS)
struct statvfs stat_buf;
uint64_t total_size = 0;
char *tmp_fs_name = tmp_fs;