Hi Roger,

On Wed, Dec 24, 2008 at 1:54 PM, Roger Mason <[email protected]> wrote:
>
> "Daniel Gruner" <[email protected]> writes:
>
>> There are a couple of things you need not (or should not) do.  Once
>> you have configured perceus, it should simply restart on reboot.
>> After that you should not run the "perceus activate module..."
>> commands.
>
> OK.
>
>> You say the node booted up.  Have you actually looked at
>> the node's console when it boots?  I assume it does work, since you
>> can do the xgroupset and xuserset stuff.  After that I don't know.
>
> Yes, there is a console on the node.  I have not tried to do much with
> it but simple things like 'ls' certainly work.
>
>> What does xstat return?
>
> lowalbite ~ # xstat
> Error: could not obtain node list from statfs: Connection refused:127.0.0.1:
> 111
>
statfs is NOT running... :-)

I suspect this is the source of all your problems.  You must start
statfs on the master node in order to be able to use the "-a" option
to most commands, as this is the daemon that monitors which nodes are
up, their load, etc.

>> What are the contents of the /etc/xcpu directory?
>
> lowalbite ~ # ls /etc/xcpu/
> admin_key  admin_key.pub  statfs.conf  statfs.conf~
>
> lowalbite ~ # cat /etc/xcpu/statfs.conf
> #/etc/xcpu/statfs.conf
> n0000=tcp!192.168.0.100!6667
> n0001=tcp!192.168.0.101!6667
>

The two lines defining the nodes look ok.  I don't know if you can
have comment lines like the first line in your statfs.conf.  What
messages do you get when you try to start statfs?

>
> Thanks and best wishes,
> Roger
>

Same to you!  Happy holidays.

Happy holidays to all on the list too!  I am happy to report that I am
about to go into production with my 42-node xcpu cluster, with bjs as
the scheduler.  Now it is only MPI that is still giving me trouble.
Next year...

Daniel
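
In case it helps with debugging, here is a minimal Python sketch for
sanity-checking node lines of the form shown in the quoted statfs.conf
(name=tcp!host!port).  It is not part of xcpu.  The function names
(parse_node_lines, is_listening, report) are made up for illustration,
and skipping '#' lines is an assumption of this sketch only; it says
nothing about whether statfs itself accepts comment lines.

```python
import socket


def parse_node_lines(text):
    """Parse 'name=tcp!host!port' lines into (name, host, port) tuples."""
    nodes = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption of this sketch only: ignore blank lines and '#' lines.
        if not line or line.startswith("#"):
            continue
        name, sep, addr = line.partition("=")
        parts = addr.split("!")
        if not sep or len(parts) != 3 or parts[0] != "tcp":
            raise ValueError(f"unrecognised line: {raw!r}")
        nodes.append((name.strip(), parts[1], int(parts[2])))
    return nodes


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" as well as timeouts and
        # unreachable hosts.
        return False


def report(text):
    """Print one line per node saying whether its port answered."""
    for name, host, port in parse_node_lines(text):
        state = "answers" if is_listening(host, port) else "no answer"
        print(f"{name} {host}:{port} {state}")


# The node lines quoted in this thread.
SAMPLE = """#/etc/xcpu/statfs.conf
n0000=tcp!192.168.0.100!6667
n0001=tcp!192.168.0.101!6667
"""
```

Calling report(SAMPLE) from the master node would try each address in
turn.  A "no answer" only means nothing accepted a TCP connection on
that port within the timeout; it does not tell you why.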
