[xcpu] Re: perceus and xcpu

Daniel Gruner Wed, 24 Dec 2008 13:02:55 -0800

Hi Roger,

On Wed, Dec 24, 2008 at 1:54 PM, Roger Mason <[email protected]> wrote:
>
> "Daniel Gruner" <[email protected]> writes:
>
>> There are a couple of things you need not (or should not) do.  Once
>> you have configured perceus, it should simply restart on reboot.
>> After that you should not run the "perceus activate module..."
>> commands.
>
> OK.
>
>> You say the node booted up.  Have you actually looked at
>> the node's console when it boots?  I assume it does work, since you
>> can do the xgroupset and xuserset stuff.  After that I don't know.
>
> Yes, there is a console on the node.  I have not tried to do much with
> it but simple things like 'ls' certainly work.
>
>> What does xstat return?
>
> lowalbite ~ # xstat
> Error: could not obtain node list from statfs: Connection refused:127.0.0.1: 111
>


statfs is NOT running... :-)  I suspect this is the source of all your problems.
You must start statfs on the master node in order to be able to use
the "-a" option to most commands, as this is the daemon that monitors
which nodes are up, their load, etc.

>> What are the contents of the /etc/xcpu directory?
>
> lowalbite ~ # ls /etc/xcpu/
> admin_key  admin_key.pub  statfs.conf  statfs.conf~
>
> lowalbite ~ # cat /etc/xcpu/statfs.conf
> #/etc/xcpu/statfs.conf
> n0000=tcp!192.168.0.100!6667
> n0001=tcp!192.168.0.101!6667
>

The two lines defining the nodes look ok.  I don't know if you can
have comment lines like the first line in your statfs.conf.  What
messages do you get when you try to start statfs?
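
If it helps to rule out the config file and the node side, here is a
minimal Python sketch that reads lines in the name=tcp!host!port form shown
above and tries a plain TCP connection to each address.  The function names
(parse_statfs_conf, is_listening, report) are made up for illustration and
are not part of xcpu.  The script skips '#' lines for its own purposes only;
that says nothing about whether statfs itself accepts comments.

```python
import socket


def parse_statfs_conf(text):
    """Return a list of (name, host, port) tuples from statfs.conf-style text."""
    nodes = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank and '#' lines in this script only; whether statfs
        # itself tolerates comment lines is not confirmed.
        if not line or line.startswith("#"):
            continue
        name, sep, addr = line.partition("=")
        parts = addr.split("!")
        if not sep or len(parts) != 3 or parts[0] != "tcp":
            raise ValueError("unrecognized line: %r" % raw)
        nodes.append((name.strip(), parts[1], int(parts[2])))
    return nodes


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


def report(path="/etc/xcpu/statfs.conf"):
    """Print one line per configured node with its reachability."""
    with open(path) as f:
        for name, host, port in parse_statfs_conf(f.read()):
            state = "reachable" if is_listening(host, port) else "unreachable"
            print("%s %s:%d %s" % (name, host, port, state))
```

Calling report() on the master prints one line per node.  If the nodes show
as reachable but xstat still reports "Connection refused" on 127.0.0.1, that
points at statfs on the master rather than at the nodes.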

>
> Thanks and best wishes,
> Roger
>

Same to you!  Happy holidays.

Happy holidays to everyone on the list too!  I am happy to report that I am
about to go into production with my 42-node xcpu cluster, with bjs as the
scheduler.  Now it is only mpi that is still giving me trouble.  Next year...

Daniel
