Yes, the srun was started on v2.3. 


From:   Moe Jette <[email protected]>
To:     "slurm-dev" <[email protected]>, 
Date:   07/04/2012 10:48 AM
Subject:        [slurm-dev] Re: Problems upgrading to 2.4.0




Did the srun start on v2.3, but not get a resource allocation, then 
continue execution on v2.4? In that case, it could have a combination 
of plugins, some from v2.3 and others from v2.4, which would probably 
not work. That is what I am thinking happened.
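
As a side check on the mixed-version theory, the value 6144 in the quoted
srun errors below can be decoded by hand. The sketch below assumes the
protocol version is a 16-bit value packed as (release code << 8) | age, with
release code 23 for v2.3 and 24 for v2.4. That layout is an assumption here,
not something stated in this thread, although 6144 = 24 << 8 fits it. The
function names are made up for illustration.

```python
# Assumption: protocol version = (release_code << 8) | age,
# with release_code 23 for v2.3 and 24 for v2.4 (not confirmed in this thread).

def decode_protocol_version(value):
    """Split a 16-bit protocol version into (release_code, age)."""
    return value >> 8, value & 0xFF


def describe(value):
    """Render a protocol version number in a human-readable form."""
    release, age = decode_protocol_version(value)
    return "%d -> release code %d (v%d.%d), age %d" % (
        value, release, release // 10, release % 10, age)


if __name__ == "__main__":
    # Value reported in the srun error messages.
    print(describe(6144))
    # Value a v2.3 client would expect under the same assumed layout.
    print(describe((23 << 8) | 0))
```

Under that assumption, 6144 decodes to release code 24, which would mean a
v2.4 daemon is talking to an srun that still expects the v2.3 value (5888).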

Quoting [email protected]:

> Moe,
> Thank you for your reply, but I am not sure I understand what you are
> saying.
> I have the same slurm.conf file for both releases.  The srun that is
> queued, is started with the 2.3 release and I expected it to be started
> even when I upgrade to V2.4.1 once resources are available.  Maybe this is
> not how it works...
> Nancy
>
>
>
> From:   Moe Jette <[email protected]>
> To:     slurm-dev <[email protected]>, [email protected],
> Date:   07/04/2012 09:12 AM
> Subject:        Re: [slurm-dev] Re: Problems upgrading to 2.4.0
>
>
>
> RPC 4017 is RESPONSE_JOB_ALLOCATION_INFO_LITE (see
> src/common/slurm_protocol_defs.h) and that only contains a job id.
> Nothing in the message contents has changed. Most plugins are loaded
> on demand rather than all being loaded when a program (e.g. srun)
> starts. My best guess is that the srun command has some version 2.3
> plugins loaded and some version 2.4 plugins were loaded after the
> upgrade resulting in an inconsistent set of software.
>
> You definitely don't want to keep using a version 2.3 srun with
> version 2.4 daemons. The other commands (sinfo, sbatch, squeue, etc.)
> should all work with new daemons though.
>
> Quoting [email protected]:
>
>> Danny,
>> We are having some trouble with the transition from v2.3.5 to v2.4.1. I
>> tried to keep the test and logs as simple as possible.  I have a single
>> node and start a job and have a job queued awaiting resources.  When I
>> terminate v2.3.5 and start v2.4.1 the job terminates correctly, but the
>> queued job does not start with the following error coming to the console.
>> The logs are attached as well.
>> Thanks for any help,
>> Nancy
>>
>>  [sulu] (slurm) slurm>srun: error: Invalid Protocol Version 6144 from uid=200 at 141.112.17.124:39306
>> srun: error: slurm_receive_msg: Protocol version has changed, re-link your code
>> srun: error: _accept_msg_connection[sulu.gpv.az05.bull.com]: Protocol version has changed, re-link your code
>> srun: error: Malformed RPC of type 4017 received
>> srun: error: slurm_receive_msg: Header lengths are longer than data received
>> srun: error: Invalid Protocol Version 6144 from uid=200 at 141.112.17.124:53548
>> srun: error: slurm_receive_msg: Protocol version has changed, re-link your code
>> srun: error: slurm_receive_msg[141.112.17.124]: Protocol version has changed, re-link your code
>> srun: error: Unable to allocate resources: Header lengths are longer than data received
>>
>>
>>
>
>
>
>
>
>


