Wow, well spotted.  I came here to see if anyone had reported this same
issue with environment modules, as I noticed several of my jobs failing
on our cluster this morning.  Turns out I'm probably the only one whose
jobs failed, as I have a long-running tmux session open on the head
node, and therefore a shell still running the old bash. ;)

Other users wouldn't have noticed because we updated all of our
infrastructure in one go using ansible[0] last Friday.

In any case, glad to be in good company.  Cheers!

Alan

[0] http://mjanja.co.ke/2014/09/update-hosts-via-ansible-to-mitigate-bash-shellshock-vulnerability/

On 09/29/2014 08:27 AM, Christopher Samuel wrote:
> On 27/09/14 08:30, John Brunelle wrote:
>
>> This caused a bit of trouble for us when we patched some head nodes
>> before compute nodes.
> We did some testing to confirm that:
>
> A) If you update a login node before the compute nodes, jobs will fail
> as John describes.
>
> B) If you update a compute node when there are jobs queued under the
> previous bash then they will fail when they run there (also cannot find
> modules, even though a prologue of ours sets BASH_ENV to force the env
> vars to get set).
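
A note on why both cases fail, with a rough sketch for checking it.  As
far as I understand it, the Shellshock fixes changed how bash passes
exported functions through the environment: unpatched bash exports a
function such as "module" as a variable literally named "module" whose
value begins with "() {", while patched builds use a mangled name
("BASH_FUNC_module()" in some distribution patches, "BASH_FUNC_module%%"
upstream).  A job that inherits its environment from one kind of bash
and runs under the other never sees the function.  The helper name
exported_function_style below is made up for illustration, and this
assumes "module" is an exported shell function on your site.

```python
import os


def _looks_like_function(value):
    """Exported bash function bodies start with '() {' in both schemes."""
    return value.startswith("() {")


def exported_function_style(environ, name="module"):
    """Report how a bash function called `name` is encoded in `environ`.

    Returns "legacy" for the pre-patch encoding, "patched" for either
    post-patch encoding, or None if no exported function is found.
    """
    if _looks_like_function(environ.get(name, "")):
        return "legacy"
    # Assumption: these are the two post-patch variable names in circulation.
    for key in ("BASH_FUNC_%s()" % name, "BASH_FUNC_%s%%%%" % name):
        if _looks_like_function(environ.get(key, "")):
            return "patched"
    return None


if __name__ == "__main__":
    # Run from the session you submit jobs from (e.g. inside tmux).
    print(exported_function_style(os.environ))
```

If this prints "legacy" in a session on a node whose bash has already
been patched, that session is probably still running the old binary and
jobs submitted from it are likely to lose the function.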
>
>
> Our way to (hopefully safely) upgrade our x86-64 clusters was:
>
> 0) Note that our slurmctld runs on the cluster management node which is
> separate to the login nodes and not accessible to users.
>
> 1) Kick all the users off the login nodes, update bash, reboot them
> (ours come back with nologin enabled to stop users getting back on
> before we're ready).
>
> 2) Set all partitions down to stop new jobs starting
>
> 3) Move all compute nodes to an "old" partition
>
> 4) Move all queued (pending) jobs to the "old" partition
>
> 5) Update bash on any idle nodes and move them back to our "main"
> (default) partition
>
> 6) Set an AllowGroups on the "old" partition so users can't submit jobs
> to it by accident.
>
> 7) Let users back onto the login nodes.
>
> 8) Set partitions back to "up" to start jobs going again.
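
For anyone wanting to script the Slurm side of this (steps 2 to 8), here
is a minimal sketch that only builds the scontrol command lines and does
not run anything.  The partition names, the "admin" group, the node
list and the job IDs are all placeholders, the function name
upgrade_plan is made up, and it assumes the "old" partition already
exists.  Steps 1 and 7 (login nodes) and the bash update itself are out
of scope here.

```python
def upgrade_plan(all_nodes, pending_jobs, updated_nodes,
                 main="main", old="old", admin_group="admin"):
    """Return scontrol argv lists for steps 2-8, in order, without running them."""
    cmds = []
    # Step 2: set the partition down so no new jobs start.
    cmds.append(["scontrol", "update", "PartitionName=%s" % main, "State=DOWN"])
    # Step 3: put all compute nodes in the "old" partition.
    cmds.append(["scontrol", "update", "PartitionName=%s" % old,
                 "Nodes=%s" % all_nodes])
    # Step 4: move every pending job to the "old" partition.
    for job_id in pending_jobs:
        cmds.append(["scontrol", "update", "JobId=%s" % job_id,
                     "Partition=%s" % old])
    # Step 5: nodes whose bash has been updated go back to the main partition.
    # Shrinking the node list of "old" to match is left to the operator.
    cmds.append(["scontrol", "update", "PartitionName=%s" % main,
                 "Nodes=%s" % updated_nodes])
    # Step 6: restrict "old" so users cannot submit to it by accident.
    cmds.append(["scontrol", "update", "PartitionName=%s" % old,
                 "AllowGroups=%s" % admin_group])
    # Step 8: bring the main partition back up.
    cmds.append(["scontrol", "update", "PartitionName=%s" % main, "State=UP"])
    return cmds


if __name__ == "__main__":
    # Placeholder values; the pending job IDs could come from
    # `squeue -h -t PD -o %i`.
    for argv in upgrade_plan("node[01-04]", [101, 102], "node[01-02]"):
        print(" ".join(argv))
```

Printing the plan first and reviewing it before running each line by
hand seems safer than executing it blindly, since the ordering matters.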
>
>
> Hope this helps folks..
>
> cheers!
> Chris

-- 
Alan Orth
[email protected]
http://alaninkenya.org
http://mjanja.co.ke
"I have always wished for my computer to be as easy to use as my telephone; my 
wish has come true because I can no longer figure out how to use my telephone." 
-Bjarne Stroustrup, inventor of C++
GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0
