On 2015-04-24 02:03, Moe Jette wrote:

Slurm version 14.11.6 is now available with quite a few bug fixes as
listed below.

Slurm downloads are available from
http://slurm.schedmd.com/download.html

* Changes in Slurm 14.11.6
==========================

[snip]

  -- Enable compiling without optimizations and with debugging symbols by
     default. Disable this by configuring with --disable-debug.

Always including debug symbols is good (the only cost is a little bit of disk space, which should never really be a problem), but disabling optimization by default?? In our environment slurmctld consumes a decent chunk of CPU time, and I would be loath to see it get a lot (?) slower.

Typically, problems that are "fixed" by disabling optimization are due to violations of the C standard or similar, which for some reason just don't happen to trigger at -O0. Perhaps I'm being needlessly harsh here, but I'd prefer the bugs to be fixed properly rather than papered over like this.
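To make this concrete, here is a minimal sketch of the classic pattern, using a hypothetical checked_increment() helper that is not taken from the Slurm source. Signed integer overflow is undefined behaviour in C, so an overflow test written as "x + 1 < x" may be deleted by the optimizer while appearing to work at -O0. The portable fix is to compare against INT_MAX before doing the arithmetic.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Buggy pattern (shown only as a comment, since executing it is UB):
 *
 *     if (*counter + 1 < *counter)    relies on signed wrap-around
 *             return false;
 *
 * Because signed overflow is undefined, an optimizing compiler may assume
 * "*counter + 1 < *counter" is always false and remove the test; at -O0
 * the addition typically wraps and the check appears to work.
 */

/* Hypothetical helper: increment *counter unless that would overflow.
 * Returns true on success, false if *counter is already INT_MAX. */
bool checked_increment(int *counter)
{
	assert(counter != NULL);
	if (*counter == INT_MAX)	/* check BEFORE adding: no overflow */
		return false;
	(*counter)++;
	return true;
}
```

The corrected version behaves the same at every optimization level, which is the point: code that only works at -O0 usually hides a check like the commented-out one.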

  -- Use standard statvfs(2) syscall if available, in preference to
     non-standard statfs.

This is not actually such a good idea. Prior to Linux kernel 2.6.36 and glibc 2.13, the glibc implementation of statvfs had to scan all entries in /proc/mounts to fill in the mount flags. If any of those other filesystems is unavailable (e.g. a hung NFS mount), the statvfs call would thus hang. See e.g.

http://man7.org/linux/man-pages/man2/statvfs.2.html
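The version condition can be sketched as a small predicate. The function name statvfs_avoids_mount_scan() and its parameters are hypothetical, and the sketch assumes the condition stated above: the /proc/mounts scan is only avoided when both the kernel (2.6.36 or later) and glibc (2.13 or later) are new enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true if statvfs() on Linux should not need
 * to scan /proc/mounts, i.e. kernel >= 2.6.36 AND glibc >= 2.13.
 * kernel_release is a string like the release field of uname(2),
 * e.g. "2.6.32-573.el6.x86_64".
 */
bool statvfs_avoids_mount_scan(const char *kernel_release,
			       int glibc_major, int glibc_minor)
{
	int maj = 0, min = 0, patch = 0;

	assert(kernel_release != NULL);
	/* Missing components stay 0, e.g. "4.4" parses as 4.4.0. */
	if (sscanf(kernel_release, "%d.%d.%d", &maj, &min, &patch) < 1)
		return false;	/* unparseable: assume the worst */

	bool kernel_ok = maj > 2 ||
		(maj == 2 && (min > 6 || (min == 6 && patch >= 36)));
	bool glibc_ok = glibc_major > 2 ||
		(glibc_major == 2 && glibc_minor >= 13);

	return kernel_ok && glibc_ok;
}
```

On a live system the kernel release could come from uname(2) and the glibc version from gnu_get_libc_version() (after parsing its string). The patch below sidesteps the question by always using statfs() on Linux.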

Not directly related to this change, there is also a bit of silliness in the statfs() code for get_tmp_disk(), namely that it assumes the filesystem block size is the same as the memory page size. As of Linux 2.6, struct statfs contains a field f_frsize that holds the correct fragment size. I suggest the attached patch, which should fix both of these issues.
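For illustration, the size calculation using f_frsize might look like the following sketch. The helper names blocks_to_mb() and fs_total_mb() are hypothetical and not part of Slurm; the sketch assumes Linux (statfs(2) from <sys/vfs.h>) and reports sizes in MiB, the same units the patch computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/vfs.h>	/* statfs(2); Linux-specific */

/* Convert a count of filesystem blocks to whole MiB. frsize is the size
 * in bytes of each counted block, i.e. struct statfs's f_frsize. The
 * product is computed in 64 bits so large filesystems cannot overflow a
 * 32-bit unsigned long. */
uint64_t blocks_to_mb(uint64_t blocks, uint64_t frsize)
{
	return blocks * frsize / (1024 * 1024);
}

/* Hypothetical helper: store the total size of the filesystem holding
 * path in *size_mb. Returns 0 on success, or -1 with errno set by
 * statfs() on failure. */
int fs_total_mb(const char *path, uint64_t *size_mb)
{
	struct statfs buf;

	assert(path != NULL && size_mb != NULL);
	if (statfs(path, &buf) != 0)
		return -1;
	/* Use the filesystem's own fragment size rather than assuming
	 * it equals the memory page size. */
	*size_mb = blocks_to_mb((uint64_t)buf.f_blocks,
				(uint64_t)buf.f_frsize);
	return 0;
}
```

For example, 262144 blocks of 4096 bytes and 2097152 blocks of 512 bytes both come out as 1024 MiB, whereas assuming a 4 KiB page size for the latter would overstate it eightfold.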



--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576
diff --git a/src/slurmd/slurmd/get_mach_stat.c b/src/slurmd/slurmd/get_mach_stat.c
index d7c5eb1..6493755 100644
--- a/src/slurmd/slurmd/get_mach_stat.c
+++ b/src/slurmd/slurmd/get_mach_stat.c
@@ -217,7 +217,29 @@ get_tmp_disk(uint32_t *tmp_disk, char *tmp_fs)
 {
 	int error_code = 0;
 
-#if defined(HAVE_STATVFS)
+#if defined(__linux__)
+	/* Prior to Linux 2.6.36 and glibc 2.13, statvfs() can get
+	 * stuck if ANY mount in the system is hung, so use the
+	 * non-standard statfs() instead. Furthermore, as of Linux
+	 * 2.6+ struct statfs contains the f_frsize field which gives
+	 * the size of the blocks reported in the f_blocks field. */
+	struct statfs stat_buf;
+	uint64_t total_size = 0;
+	char *tmp_fs_name = tmp_fs;
+
+	if (tmp_fs_name == NULL)
+		tmp_fs_name = "/tmp";
+	if (statfs(tmp_fs_name, &stat_buf) == 0) {
+		total_size = stat_buf.f_blocks;
+		total_size *= stat_buf.f_frsize;
+		total_size /= (1024 * 1024);	/* bytes to MB */
+	} else if (errno != ENOENT) {
+		error_code = errno;
+		error ("get_tmp_disk: error %d executing statfs on %s",
+		       errno, tmp_fs_name);
+	}
+	*tmp_disk = (uint32_t)total_size;
+#elif defined(HAVE_STATVFS)
 	struct statvfs stat_buf;
 	uint64_t total_size = 0;
 	char *tmp_fs_name = tmp_fs;
