Hi Steve,

Thanks for your work on this.

Just to make sure you understand (since you mentioned the problem
doesn't happen for you): this bug is not just about warning messages.
It actually drops you into the rescue shell, and the boot process stops
there, waiting for console input.

This happens about 10% of the time per node for me (I gather it's when
some race condition occurs). This makes unattended boot of a cluster
unusable. E.g., I have 20 nodes in a cluster, so almost always one or
more nodes don't boot.

I am mounting /home and /usr/local over NFS, with these /etc/fstab
entries:

    evoa1:/home /home nfs rw 0 0
    evoa1:/usr/local /usr/local nfs rw 0 0

I tried the workaround suggested above: putting noauto in /etc/fstab
and then mounting the directories in /etc/rc.local. However, users can
then ssh into the nodes before their home directories are mounted, and
that causes other problems. (We are running some job queuing software,
and queued jobs may try to start as soon as a node is up.)
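
For illustration, a minimal sketch of what the /etc/rc.local side of that
noauto workaround might look like with a retry loop added. The `retry`
helper, the attempt count, and the delay are hypothetical choices, not
something taken from this bug report, and the sketch does not close the
window in which users or queued jobs can reach a node before /home is
mounted.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# The command is always tried at least once.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        # Success: stop retrying immediately.
        "$@" && return 0
        # Out of attempts: report failure without a trailing sleep.
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Possible use in /etc/rc.local, assuming the two NFS entries in
# /etc/fstab carry the "noauto" option (values are illustrative):
#   retry 10 3 mount /home
#   retry 10 3 mount /usr/local
```

Retrying the mount rather than sleeping for a fixed time means a node
that is ready early is not delayed, while a slow one still gets several
chances.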

A workaround would be helpful... e.g., just knowing the right place to
put a "sleep 30" to reduce the frequency of the race condition.

thanks,

Mike

-- 
retry remote devices when parent is ready after SIGUSR1
https://bugs.launchpad.net/bugs/470776
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs