Hi all,

I stumbled upon this Stack Overflow question
<https://stackoverflow.com/questions/44456801/slurmctld-fatal-cluster-name-mismatch>,
where the user gets the error below. Please note that I'm not asking for help
with the error; I'm pointing out an issue with the error message itself:

> slurmctld -cD
slurmctld: fatal: CLUSTER NAME MISMATCH.
slurmctld has been started with "ClusterName=�����", but read "cluster"
from the state files in StateSaveLocation.
Running multiple clusters from a shared StateSaveLocation WILL CAUSE
CORRUPTION.

I checked the source, and it seems to me that the two values in the message
are switched, which adds to the confusion.

Relevant extract from the source code
<https://github.com/SchedMD/slurm/blob/slurm-17-02-4-1/src/slurmctld/controller.c#L2634-L2643>:
fatal("CLUSTER NAME MISMATCH.\n"
        "slurmctld has been started with \""
        "ClusterName=%s\", but read \"%s\" from "
        "the state files in StateSaveLocation.\n"
        "Running multiple clusters from a shared "
        "StateSaveLocation WILL CAUSE CORRUPTION.\n"
        "Remove %s to override this safety check if "
        "this is intentional (e.g., the ClusterName "
        "has changed).", name,
        slurmctld_conf.cluster_name, filename);

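Here, "name" is the value read from the clustername file in
StateSaveLocation, while slurmctld_conf.cluster_name is the ClusterName
that slurmctld was configured with.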

So I think the parameters are switched, and the message should say:

slurmctld has been started with "ClusterName=cluster", but read "�����"
from the state files in StateSaveLocation.
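
To make the point concrete, here is a minimal standalone sketch of the
argument order I would expect. This is not the Slurm code: the helper names
format_mismatch() and cluster_names_differ() and their parameters are
made up for illustration, and it uses plain snprintf() instead of fatal().
The only thing it is meant to show is that the configured name should fill
the first %s and the name read from the state file the second.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Slurm: returns nonzero when the
 * configured name and the name read from the state file differ. */
static int cluster_names_differ(const char *configured, const char *from_state)
{
    return strcmp(configured, from_state) != 0;
}

/* Hypothetical helper, not part of Slurm: formats the first sentence of
 * the mismatch message with the arguments in the order the wording implies.
 * Returns the snprintf() result (length the full message needs). */
static int format_mismatch(char *buf, size_t len,
                           const char *configured,  /* ClusterName from the config */
                           const char *from_state)  /* name read from the clustername file */
{
    assert(buf != NULL && configured != NULL && from_state != NULL);
    return snprintf(buf, len,
                    "slurmctld has been started with \"ClusterName=%s\", "
                    "but read \"%s\" from the state files in "
                    "StateSaveLocation.",
                    configured,   /* first %s: what slurmctld was started with */
                    from_state);  /* second %s: what was found on disk */
}
```

In terms of the extract above, that would mean passing
slurmctld_conf.cluster_name before name in the fatal() call.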

So I'm asking here before I file a useless bug: have I missed something?
(would not be the first time I misread code)

Thanks in advance,
Hugues
