Dear developers,

I see the following in the logs:
slurmctld.log:
[2015-08-28T08:25:43.370] Updating acct_gather data for <nodelist>

slurmd.log:
[2015-08-28T08:26:08.401] debug3: in the service_connection
[2015-08-28T08:26:08.401] debug2: got this type of message 1017
[2015-08-28T08:26:08.401] debug2: Processing RPC: REQUEST_ACCT_GATHER_UPDATE
[2015-08-28T08:26:08.401] debug2: Processing RPC: REQUEST_ACCT_GATHER_UPDATE

What do these messages mean? slurmctld tries to contact the daemon on the node (just to get a status update), but for some reason it gets no response. The load on both nodes is low, and ping works fine. This happens stochastically from time to time and makes the node unreliable.

Thanks a lot,
Ulf

PS: Sometimes other lines are doubled as well, even though this is the only slurmd process running:
[2015-08-28T08:27:23.475] debug3: in the service_connection
[2015-08-28T08:27:23.476] debug2: got this type of message 1017
[2015-08-28T08:27:23.476] debug2: got this type of message 1017
[2015-08-28T08:27:23.484] debug2: Processing RPC: REQUEST_ACCT_GATHER_UPDATE

--
___________________________________________________________________
Dr. Ulf Markwardt
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany
Phone: (+49) 351/463-33640
WWW: http://www.tu-dresden.de/zih
