Hi,

As far as I understand, the incompatibility is now with srun jobs. My first post was about batch jobs (sorry I didn't specify that). I suppose that the protocol used to talk to a "waiting" srun has changed and that this is what makes your jobs fail, but I'm just guessing.
Regards,
Carles Fenoy

On Thu, Jul 5, 2012 at 1:49 AM, <[email protected]> wrote:

> Moe,
> I'm sorry, I guess I am still a bit confused. I was assuming that I was
> having a similar problem as everyone else on this issue, but just having
> different symptoms. I guess to clarify first: is it supported that jobs
> can be queued waiting for resources in a 2.3.x release and then upgrade
> to a 2.4.1 release and expect these jobs to run once resources are
> available? If this is true, are you saying that we cannot have any
> plugins configured if we want to do this? I am surprised that the
> previous reports of problems moving from previous releases to 2.4.1 do
> not have any plugins configured.
> Nancy
>
> From: Moe Jette <[email protected]>
> To: "slurm-dev" <[email protected]>,
> Date: 07/04/2012 10:48 AM
> Subject: [slurm-dev] Re: Problems upgrading to 2.4.0
> ------------------------------
>
> Did the srun start on v2.3, but not get a resource allocation, then
> continue execution on v2.4? In that case, it could have a combination
> of plugins, some from v2.3 and others from v2.4, which would probably
> not work. That is what I am thinking happened.
>
> Quoting [email protected]:
>
> > Moe,
> > Thank you for your reply, but I am not sure I understand what you are
> > saying. I have the same slurm.conf file for both releases. The srun
> > that is queued is started with the 2.3 release and I expected it to
> > be started even when I upgrade to v2.4.1 once resources are
> > available. Maybe this is not how it works...
> > Nancy
> >
> > From: Moe Jette <[email protected]>
> > To: slurm-dev <[email protected]>, [email protected],
> > Date: 07/04/2012 09:12 AM
> > Subject: Re: [slurm-dev] Re: Problems upgrading to 2.4.0
> >
> > RPC 4017 is RESPONSE_JOB_ALLOCATION_INFO_LITE (see
> > src/common/slurm_protocol_defs.h) and that only contains a job id.
> > Nothing in the message contents has changed. Most plugins are loaded
> > on demand rather than all being loaded when a program (e.g. srun)
> > starts. My best guess is that the srun command has some version 2.3
> > plugins loaded and some version 2.4 plugins were loaded after the
> > upgrade, resulting in an inconsistent set of software.
> >
> > You definitely don't want to keep using a version 2.3 srun with
> > version 2.4 daemons. The other commands (sinfo, sbatch, squeue, etc.)
> > should all work with new daemons though.
> >
> > Quoting [email protected]:
> >
> >> Danny,
> >> We are having some trouble with the transition from v2.3.5 to v2.4.1.
> >> I tried to keep the test and logs as simple as possible. I have a
> >> single node and start a job and have a job queued awaiting resources.
> >> When I terminate v2.3.5 and start v2.4.1 the job terminates
> >> correctly, but the queued job does not start, with the following
> >> error coming to the console. The logs are attached as well.
> >> Thanks for any help,
> >> Nancy
> >>
> >> [sulu] (slurm) slurm>srun: error: Invalid Protocol Version 6144 from uid=200 at 141.112.17.124:39306
> >> srun: error: slurm_receive_msg: Protocol version has changed, re-link your code
> >> srun: error: _accept_msg_connection[sulu.gpv.az05.bull.com]: Protocol version has changed, re-link your code
> >> srun: error: Malformed RPC of type 4017 received
> >> srun: error: slurm_receive_msg: Header lengths are longer than data received
> >> srun: error: Invalid Protocol Version 6144 from uid=200 at 141.112.17.124:53548
> >> srun: error: slurm_receive_msg: Protocol version has changed, re-link your code
> >> srun: error: slurm_receive_msg[141.112.17.124]: Protocol version has changed, re-link your code
> >> srun: error: Unable to allocate resources: Header lengths are longer than data received

--
Carles Fenoy
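
A side note on the quoted log: the value 6144 in "Invalid Protocol Version 6144" is consistent with a v2.4 sender if Slurm packs its protocol version as (major << 8) | age. That layout is an assumption here (it is believed to be defined in src/common/slurm_protocol_common.h, so check the header for your release). Under that assumption, a minimal Python sketch decodes the number; the function name is hypothetical and is not part of any Slurm API.

```python
def decode_protocol_version(value):
    """Split a 16-bit version word into (high byte, low byte).

    Assumption: the word is packed as (major << 8) | age, so the high
    byte would be 24 for Slurm 2.4 and 23 for Slurm 2.3.
    """
    return (value >> 8) & 0xFF, value & 0xFF


# The value reported by the v2.3 srun in the log above.
print(decode_protocol_version(6144))     # (24, 0)

# What a v2.3 peer would send under the same assumed packing.
print(decode_protocol_version(23 << 8))  # (23, 0)
```

If the assumption holds, the waiting v2.3 srun is rejecting a message stamped by the upgraded v2.4 daemon, which fits Moe's advice not to keep a version 2.3 srun running against version 2.4 daemons.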
