Moe,
Thank you for your reply, but I am not sure I understand what you are saying.
I have the same slurm.conf file for both releases. The queued srun was
started with the 2.3 release, and I expected it to start once resources
became available, even after I upgraded to v2.4.1. Maybe this is
not how it works...
Nancy



From:   Moe Jette <[email protected]>
To:     slurm-dev <[email protected]>, [email protected], 
Date:   07/04/2012 09:12 AM
Subject:        Re: [slurm-dev] Re: Problems upgrading to 2.4.0



RPC 4017 is RESPONSE_JOB_ALLOCATION_INFO_LITE (see 
src/common/slurm_protocol_defs.h) and that only contains a job id. 
Nothing in the message contents has changed. Most plugins are loaded
on demand rather than all being loaded when a program (e.g. srun)
starts. My best guess is that the srun command has some version 2.3 
plugins loaded and some version 2.4 plugins were loaded after the 
upgrade resulting in an inconsistent set of software.

You definitely don't want to keep using a version 2.3 srun with 
version 2.4 daemons. The other commands (sinfo, sbatch, squeue, etc.) 
should all work with new daemons though.
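One way to read the "Invalid Protocol Version 6144" lines in the log below: 6144 is 24 × 256 (0x1800), so if the 16-bit version word packs a major number in the high byte and a minor number in the low byte, it decodes to major 24, minor 0. That packing is an assumption here, not taken from the Slurm sources. The sketch below decodes such a value and shows a strict receive-side equality check of the kind that would produce a "Protocol version has changed" error when a process mixes code built for two releases. The helpers `version_major`, `version_minor` and `check_peer_version` are hypothetical, not Slurm API functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers: assume the high byte of the version word is the
 * major protocol number and the low byte is the minor number. */
static unsigned version_major(uint16_t v) { return (unsigned)(v >> 8); }
static unsigned version_minor(uint16_t v) { return (unsigned)(v & 0xFFu); }

/* Hypothetical strict check: accept a message only when the peer's
 * protocol version equals the one this code was built with.
 * Returns 0 on success, -1 on a mismatch (after logging it). */
static int check_peer_version(uint16_t local, uint16_t peer)
{
    if (peer != local) {
        fprintf(stderr,
                "Invalid Protocol Version %u (major %u, minor %u)\n",
                (unsigned)peer, version_major(peer), version_minor(peer));
        return -1;
    }
    return 0;
}
```

For example, `check_peer_version((25u << 8), 6144)` returns -1, because code expecting an assumed newer version (25, 0) rejects a message stamped (24, 0). Under this reading, a single srun process that holds a mix of 2.3 and 2.4 code could reject messages from itself or from the daemons.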

Quoting [email protected]:

> Danny,
> We are having some trouble with the transition from v2.3.5 to v2.4.1. I
> tried to keep the test and logs as simple as possible. I have a single
> node, start a job, and have a second job queued awaiting resources. When I
> terminate v2.3.5 and start v2.4.1, the running job terminates correctly,
> but the queued job does not start, and the following errors appear on the
> console.
> The logs are attached as well.
> Thanks for any help,
> Nancy
>
>  [sulu] (slurm) slurm>srun: error: Invalid Protocol Version 6144 from uid=200 at 141.112.17.124:39306
> srun: error: slurm_receive_msg: Protocol version has changed, re-link your code
> srun: error: _accept_msg_connection[sulu.gpv.az05.bull.com]: Protocol version has changed, re-link your code
> srun: error: Malformed RPC of type 4017 received
> srun: error: slurm_receive_msg: Header lengths are longer than data received
> srun: error: Invalid Protocol Version 6144 from uid=200 at 141.112.17.124:53548
> srun: error: slurm_receive_msg: Protocol version has changed, re-link your code
> srun: error: slurm_receive_msg[141.112.17.124]: Protocol version has changed, re-link your code
> srun: error: Unable to allocate resources: Header lengths are longer than data received
>
>
>