Thank you for the patch. A somewhat more general solution to this problem will be in version 14.03.5 and can be found here:

https://github.com/SchedMD/slurm/commit/5351d393a67809371f14a40ca8bb4e85f0c64018


Quoting Didier GAZEN <[email protected]>:

Sorry, my previous post was truncated.

diff --git a/src/sinfo/sinfo.c b/src/sinfo/sinfo.c
index 15a17dc..0643f62 100644
--- a/src/sinfo/sinfo.c
+++ b/src/sinfo/sinfo.c
@@ -573,6 +573,9 @@ static int _build_sinfo_data(List sinfo_list,
                sinfo_cnt++;
                slurm_mutex_unlock(&sinfo_cnt_mutex);

+               if (params.list_reasons) {
+                 _build_part_info(build_struct_ptr);
+               } else {
                  slurm_attr_init(&attr_sinfo);
                  if (pthread_attr_setdetachstate
                      (&attr_sinfo, PTHREAD_CREATE_DETACHED))
@@ -585,6 +588,7 @@ static int _build_sinfo_data(List sinfo_list,
                  }
                  slurm_attr_destroy(&attr_sinfo);
                }
+       }


Regards,

Didier Gazen
Laboratoire d'AƩrologie


On 06/17/2014 07:09 PM, Didier GAZEN wrote:
Hi,

The fix that is supposed to print each down/drained node once rather than once per partition (commit object b5ace9a) when running "sinfo -R" is not sufficient.

When enabling the -R option (list_reasons), all spawned threads (_build_part_info) that work on a separate partition receive as argument an empty sinfo_list. In this "list_reasons" case, threads may update the same partition data during the _insert_node_ptr function call. The problem is that the _insert_node_ptr function and all the functions it calls are not thread safe WHEN different threads are updating the SAME partition member of sinfo_list.

The simplest workaround I found was to disable thread spawning when list_reasons is on.

static int _build_sinfo_data(List sinfo_list,
                            partition_info_msg_t *partition_msg,
                            node_info_msg_t *node_msg)
{



Reply via email to