Moe,
I'm sorry, I guess I am still a bit confused. I was assuming that I was
having a similar problem to everyone else on this issue, just with
different symptoms. Let me clarify first: is it supported to have jobs
queued waiting for resources under a 2.3.x release, then upgrade to a
2.4.1 release and expect these jobs to run once resources are available?
If so, are you saying that we cannot have any plugins configured if we
want to do this? I am surprised that the previous reports of problems
moving from earlier releases to 2.4.1 did not have any plugins
configured.
Nancy 



From:   Moe Jette <[email protected]>
To:     "slurm-dev" <[email protected]>, 
Date:   07/04/2012 10:48 AM
Subject:        [slurm-dev] Re: Problems upgrading to 2.4.0




Did the srun start on v2.3 but not get a resource allocation, then
continue execution on v2.4? In that case, it could have a combination
of plugins, some from v2.3 and others from v2.4, which would probably
not work. That is what I think happened.

Quoting [email protected]:

> Moe,
> Thank you for your reply, but I am not sure I understand what you are
> saying.
> I have the same slurm.conf file for both releases. The srun that is
> queued was started with the 2.3 release, and I expected it to start
> once resources are available, even after I upgrade to v2.4.1. Maybe this
> is not how it works...
> Nancy
>
>
>
> From:   Moe Jette <[email protected]>
> To:     slurm-dev <[email protected]>, [email protected],
> Date:   07/04/2012 09:12 AM
> Subject:        Re: [slurm-dev] Re: Problems upgrading to 2.4.0
>
>
>
> RPC 4017 is RESPONSE_JOB_ALLOCATION_INFO_LITE (see
> src/common/slurm_protocol_defs.h), and it only contains a job id.
> Nothing in the message contents has changed. Most plugins are loaded
> on demand rather than all being loaded when a program (e.g. srun)
> starts. My best guess is that the srun command had some version 2.3
> plugins loaded, and some version 2.4 plugins were loaded after the
> upgrade, resulting in an inconsistent set of software.
>
> You definitely don't want to keep using a version 2.3 srun with
> version 2.4 daemons. The other commands (sinfo, sbatch, squeue, etc.)
> should all work with new daemons though.
>
> Quoting [email protected]:
>
>> Danny,
>> We are having some trouble with the transition from v2.3.5 to v2.4.1. I
>> tried to keep the test and logs as simple as possible. I have a single
>> node, start a job, and have a second job queued awaiting resources. When
>> I terminate v2.3.5 and start v2.4.1, the running job terminates
>> correctly, but the queued job does not start, and the following errors
>> appear on the console. The logs are attached as well.
>> Thanks for any help,
>> Nancy
>>
>>  [sulu] (slurm) slurm>srun: error: Invalid Protocol Version 6144 from uid=200 at 141.112.17.124:39306
>> srun: error: slurm_receive_msg: Protocol version has changed, re-link your code
>> srun: error: _accept_msg_connection[sulu.gpv.az05.bull.com]: Protocol version has changed, re-link your code
>> srun: error: Malformed RPC of type 4017 received
>> srun: error: slurm_receive_msg: Header lengths are longer than data received
>> srun: error: Invalid Protocol Version 6144 from uid=200 at 141.112.17.124:53548
>> srun: error: slurm_receive_msg: Protocol version has changed, re-link your code
>> srun: error: slurm_receive_msg[141.112.17.124]: Protocol version has changed, re-link your code
>> srun: error: Unable to allocate resources: Header lengths are longer than data received


