Excellent write-up, Daniel. I am adding some of my comments and/or
suggestions inline.
I am trying to detail most of these steps in the wiki guide for xcpu. I will
take note of some of the points that you have made.
Thanks.

On Sat, Dec 13, 2008 at 10:10 AM, Daniel Gruner <[email protected]> wrote:

>
> Ok, here we go...
>
> I start from an almost vanilla RHEL5.2 machine, except for the kernel.
>  RHEL does not provide the 9p modules, so rather than trying to
> recompile their kernel I just got the 2.6.26 kernel from kernel.org.
> This allows me to build sxcpu right out of the box.


sxcpu does not need any kernel modules at all. xcpu2 uses the 9p and 9pnet
modules to mount the head node file system. You can also build the 9p
modules for a RHEL kernel.


> I obtained it
> from the sourceforge svn repository:
>
> svn co https://xcpu.svn.sourceforge.net/svnroot/xcpu/sxcpu/trunk sxcpu
>
> Here you simply do "make; make install" and it should all be
> available.


Another thing to note: there are a few prerequisites (libelf, openssl
headers) for sxcpu and you would have to install them for it to build
successfully if you are on a vanilla debian/ubuntu system.


>  It will be important to run the "statfs" daemon on the
> master, so that the commands you use later are aware of the status of
> the compute nodes in the cluster.  These will need to be configured in
> the /etc/xcpu/statfs.conf file:
>
> [r...@dgk3 xcpu]# cat /etc/xcpu/statfs.conf
> n0000=tcp!10.10.0.10!6667
> n0001=tcp!10.10.0.11!6667
>
> See below for more details on the assignment of IP addresses to the
> nodes by perceus.
>
>
> Then the perceus side of things:  I have perceus 1.4.4, downloaded
> directly from their site.  To build it I just did the usual
> ./configure; make; make install with no special options.  Now to the
> perceus configuration...
>
> My internal network to the compute nodes is eth0.  Here is the
> ifconfig for my master node:
>
> [r...@dgk3 all]# ifconfig
> eth0      Link encap:Ethernet  HWaddr 00:E0:81:2C:81:D0
>          inet addr:10.10.0.1  Bcast:10.10.0.255  Mask:255.255.255.0
>          inet6 addr: fe80::2e0:81ff:fe2c:81d0/64 Scope:Link
>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>          RX packets:8044504 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:10719515 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:1770711038 (1.6 GiB)  TX bytes:1542820770 (1.4 GiB)
>          Interrupt:24
>
> eth1      Link encap:Ethernet  HWaddr 00:E0:81:2C:81:D1
>          inet addr:142.150.227.13  Bcast:142.150.227.255  Mask:255.255.252.0
>          inet6 addr: fec0::9:2e0:81ff:fe2c:81d1/64 Scope:Site
>          inet6 addr: 2002:8e96:e1cc:9:2e0:81ff:fe2c:81d1/64 Scope:Global
>          inet6 addr: fe80::2e0:81ff:fe2c:81d1/64 Scope:Link
>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>          RX packets:44696410 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:1158903 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:4665982604 (4.3 GiB)  TX bytes:1197487595 (1.1 GiB)
>          Interrupt:25
>
> lo        Link encap:Local Loopback
>          inet addr:127.0.0.1  Mask:255.0.0.0
>          inet6 addr: ::1/128 Scope:Host
>          UP LOOPBACK RUNNING  MTU:16436  Metric:1
>          RX packets:21339 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:21339 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:0
>          RX bytes:66081636 (63.0 MiB)  TX bytes:66081636 (63.0 MiB)
>
> In /etc/perceus there are several configuration files:
>
> ---defaults.conf---
> [r...@dgk3 perceus]# cat defaults.conf
> #
> # Copyright (c) 2006-2008, Greg M. Kurtzer, Arthur A. Stevens and
> # Infiscale, Inc. All rights reserved
> #
>
> # This is the template name for all new nodes as they are configured.
>
> # Define the node name range. The '#' characters symbolize the node number
> # in the order of initalized. If you don't allocate enough number spaces
> # here for what you defined in 'Total Nodes' then it will be automatically
> # padded.
> Node Name = n####
>
> # What is the default group for new nodes (this doesn't have to exist
> # anywhere before hand)
> Group Name = cluster
>
> # Define the default VNFS image that should be assigned to new nodes
> Vnfs Name =
>
> # Are new nodes automatically enabled and provisionined?
> Enabled = 1
>
> # What is the first node number that we should count at?
> First Node = 0
>
> # This is the total node count that Perceus would ever try and allocate a
> # node to. It is safe to make this big, so you should leave it big.
> Total Nodes = 10000
>
> (I did not modify the defaults.conf file).
>
>
> ---dnsmasq.conf---
> [r...@dgk3 perceus]# cat dnsmasq.conf
> interface=eth0
> enable-tftp
> tftp-root=/usr/local/var/lib/perceus//tftp
> dhcp-option=vendor:Etherboot,60,"Etherboot"
> dhcp-boot=pxelinux.0
> local=//
> domain=internal
> expand-hosts
> dhcp-range=10.10.0.128,10.10.0.254
> dhcp-lease-max=21600
> read-ethers
>
>
> ---perceus.conf---
> [r...@dgk3 perceus]# cat perceus.conf
> #
> # Copyright (c) 2006-2008, Greg M. Kurtzer, Arthur A. Stevens and
> # Infiscale, Inc. All rights reserved
> #
>
> # This is the primary configuration file for Perceus
>
> # Define the network device on this system that is connected directly
> # and privately to the nodes. This device will be responding to DHCP
> # requests thus make sure you specify the proper device name!
> # note: This device must be configured for IP based communication.
> master network device = eth0
>
> # What protocol should be used to retireve the VNFS information. Generally
> # Supported options in this version of Perceus are: 'xget', 'nfs', and 'http'
> # but others may also be available via specialized VNFS capsules or
> # feature enhancing Perceus Modules.
> vnfs transfer method = xget
>
> # Define the IP Address of the network file server. This address must be
> # set before Perceus can operate. If this option is left blank, the IP
> # address of the "master network device" defined above will be used.
> vnfs transfer master =
>
> # Define the VNFS transfer location if it is different from the default
> # ('statedir'). This gets used differently for different transfer methods
> # (e.g. NFS this replaces the path to statedir, while with http it is gets
> # prepended to the "/perceus" path).
> vnfs transfer prefix =
>
> # What is the default database that should be used. If this option is not
> # specified, then the default is "hash" to remain compatible with
> # previous versions of Perceus. Other options are 'btree' and 'mysql'.
> # note: btree is default as of version 1.4.
> database type = btree
>
> # If you selected an SQL database solution as your database type above,
> # then you will need to specify the SQL user login information here.
> # note: this will be ignored for non-SQL database types.
> database server = localhost
> database name = perceus
> database user = db user
> database pass = db pass
>
> # To allow for better scaling the Perceus daemon 'preforks' which creates
> # multiple subprocesses to better handle large number of simultaneous
> # connections. The default is 4 which on most systems can support
> # thousands of nodes per minute but for best tuning this number is highly
> # dependant on system configuration (both hardware and software).
> prefork = 4
>
> # How long (in seconds) should we wait before considering a node as dead.
> # Note, that if you are not running node client daemons, then after
> # provisioning the node will never check in, and will no doubt expire.
> # Considering that the default node check in is 5 minutes, setting this
> # to double that should ensure that any living node would have checked in
> # by then (600).
> node timeout = 600
>
>
> I only modified the master network device to point to eth0.  Note that
> there are no VNFS images defined, as booting xcpu does not require
> them.
>
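
The single change described above can also be scripted. A minimal sketch, assuming the stock perceus.conf layout shown above; set_master_device is a hypothetical helper name, and it writes to standard output rather than editing the file in place:

```shell
#!/bin/sh
# set_master_device DEVICE
# Reads a perceus.conf on stdin and prints it with the
# "master network device" setting pointing at DEVICE.
# Hypothetical helper: comment lines and all other settings pass through
# unchanged, because only a line that starts with the key is rewritten.
set_master_device() {
    sed "s/^master network device[[:space:]]*=.*/master network device = $1/"
}
```

For example, "set_master_device eth0 < /etc/perceus/perceus.conf > perceus.conf.new" leaves the original untouched so the result can be reviewed before it is copied into place.
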
> Install the perceus startup script in /etc/rc.d/init.d, so that it
> will start on boot.  I believe it gets installed by default (in the
> "make install" step), but it still needs to be configured with
> "chkconfig --add perceus".


I don't think this is necessary. Perceus manages the init scripts for most
distributions properly.


> This will start the perceus daemons,
> including the dnsmasq which provides dhcp for the slave nodes.  No
> other dhcp server can run, but this one can be configured to provide
> other network configuration for additional NICs.


Yes, this works fine in most cases. And dnsmasq is pretty customizable at
that.
http://www.thekelleys.org.uk/dnsmasq/docs/dnsmasq-man.html

Unless you have special needs like having all the compute nodes accessible
directly from a public network (yes, I have heard that before!), it should
work for you.


>
>
> After rebooting, make sure perceus is running.  Then run the commands:
>
> perceus module activate xcpu
> perceus module activate ipaddr
>
> In order to get static addresses assigned to the compute nodes
> (desirable), their addresses must be added to the /etc/hosts file,
> e.g.:
>
> 10.10.0.1       master
> 10.10.0.10      n0000
> 10.10.0.11      n0001
>
> You should be ready to start configuring the nodes at this stage.
> They must be set for pxe boot.  All the necessary stuff for this is
> installed by perceus in /usr/local/var/lib/perceus.  You boot your
> compute nodes in the order in which you want them named, starting, by
> default, as n0000.  The first time they will be assigned an IP address
> from the dynamic range defined in the /etc/perceus/dnsmasq.conf file,
> but on reboot they will get the statically assigned address from the
> /etc/hosts file.
>
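
The names and addresses added to /etc/hosts are the same ones that /etc/xcpu/statfs.conf needs, so the two files can be kept in sync mechanically. A rough sketch, assuming node names of the form "n" followed by digits and port 6667 as in the statfs.conf shown earlier; hosts_to_statfs is a hypothetical helper, not part of xcpu or perceus:

```shell
#!/bin/sh
# hosts_to_statfs [PORT]
# Reads /etc/hosts-style lines on stdin and prints one statfs.conf entry
# (name=tcp!address!port) per compute node.
# Assumptions: the node name is the first hostname on the line and matches
# n<digits>; PORT defaults to 6667. Comment lines and other hosts are skipped.
hosts_to_statfs() {
    awk -v port="${1:-6667}" '
        $1 !~ /^#/ && $2 ~ /^n[0-9]+$/ {
            printf "%s=tcp!%s!%s\n", $2, $1, port
        }
    '
}
```

Running "hosts_to_statfs < /etc/hosts" prints the entries to the terminal; the "master" line is skipped because its name does not match the node pattern.
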
> By this stage you should have a useable xcpu cluster.  You need to set
> up the groups and users using the xgroupset and xuserset commands.  In
> order to get "proper" behaviour, in accordance with the version of
> sxcpu that you downloaded and built, you may need to update the xcpufs
> provided by perceus.  This is done by statically linking the xcpufs
> daemon:
>
> In /usr/local/src/sxcpu/xcpufs (or wherever you installed the sxcpu
> sources) there is a script called LINKSTATIC.  I am running on x86_64,
> so I modified it to read:
>
> [r...@dgk3 xcpufs]# cat LINKSTATIC
> #!/bin/sh
> echo This script is for linking statically on Linux.
> cc -static -o xcpufs.static -Wall -g -I ../include -DSYSNAME=Linux \
>     file.o pipe.o proc-Linux.o tspawn.o ufs.o xauth.o xcpufs.o -g \
>     -L../libstrutil -lstrutil -L../libspclient -lspclient -L../libspfs \
>     -lspfs -L../libxauth -lxauth -lcrypto /usr/lib64/libdl.a
>
> Note that it produces the "xcpufs.static" executable, and it looks for
> its libdl.a library in the /usr/lib64 directory.  Then I copy the
> xcpufs.static executable to the location where perceus needs it:


Rather than doing this manually, it's recommended to put the tarball in the
right place within Perceus and then make -C 3rd_party/ xcpu to generate a
new xcpufs. Perceus applies its own static libs patch to sxcpu and you don't
have to worry about the multilib path.
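
If you do link by hand, the libdl.a path is the only architecture-specific part of LINKSTATIC. A small sketch of picking it automatically, assuming a RHEL-style layout (64-bit libraries in /usr/lib64, everything else in /usr/lib); libdl_path is a hypothetical helper and other distributions may place the library elsewhere:

```shell
#!/bin/sh
# libdl_path [ARCH]
# Prints the expected location of the static libdl for ARCH
# (default: the output of uname -m).
# Assumption: RHEL-style multilib layout. The path is not checked for
# existence, so verify it before passing it to the linker.
libdl_path() {
    arch=${1:-$(uname -m)}
    case "$arch" in
        x86_64) echo /usr/lib64/libdl.a ;;
        *)      echo /usr/lib/libdl.a ;;
    esac
}
```

The cc line in LINKSTATIC could then end with "$(libdl_path)" instead of a hard-coded path.
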


>
>
> cp /usr/local/src/sxcpu/xcpufs/xcpufs.static \
>     /usr/local/var/lib/perceus/modules/xcpu/xcpufs
>
> and on reboot the nodes will pick up the latest and greatest xcpufs.
>
> After this you can do, for example:
>
> xgroupset add -a -u
> xuserset add -a -u
>
> in order to add all the groups and all the users to the permitted user
> list on the nodes.  You can then run anything on the nodes, e.g. "xrx
> -a date".
>
> Needless to say, this requires that the "statfs" daemon be running.
> You can verify this with the "xstat" command (see the configuration
> instructions for this above).
>
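
The post-boot steps above can be collected into one script. This is only a sketch: enable_all_users and step are hypothetical names, the commands and flags are exactly the ones quoted above, and it assumes xstat returns a non-zero status when statfs is unreachable, which this thread does not confirm.

```shell
#!/bin/sh
# step CMD [ARGS...]
# Runs CMD, or only prints it when DRY_RUN=1 is set.
step() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# enable_all_users
# Hypothetical wrapper around the commands quoted above.
# xstat goes first so that a missing statfs daemon (assumed to make xstat
# fail) stops the sequence before any users or groups are pushed out.
enable_all_users() {
    step xstat &&
    step xgroupset add -a -u &&
    step xuserset add -a -u &&
    step xrx -a date
}
```

Setting DRY_RUN=1 before calling enable_all_users prints the four commands without touching the nodes, which is a cheap way to check the order first.
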
> Now, I don't know anything about IB, mainly because I have never had
> access to an IB-connected cluster.  I have no idea if perceus can
> manage pxe booting over IB, but I suspect that if you have IP over IB
> then it should, for all intents and purposes, look like just another
> network interface to it (I could be utterly wrong on this, of
> course...).


gPXE does have a working IB subsystem, but I am not sure which network cards
it supports.


>
>
> However, if you need to configure a second interface, say for access
> to a fileserver on a separate network, then all you need to do is
> change the /etc/perceus/dnsmasq.conf and define the machines in there.
>  Again, for static IP addresses on the second interface they need to
> be defined in /etc/hosts.  Let me know if you would like details on
> how I did this.  I then mounted my fileserver on the compute nodes by
> modifying the perceus xcpu startup script in
> /etc/perceus/nodescripts/init/all/05-xcpu.sh, so that the node gets a
> mount point and executes the nfs mount.
>
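
The exact lines added to 05-xcpu.sh are not shown in this thread. As a rough illustration only, the sketch below prints the kind of commands such a change would involve. The server name, export path, and mount point are placeholders, nfs_mount_lines is a hypothetical helper, and the "nolock" option is an assumption that the minimal node image runs no NFS lock daemon.

```shell
#!/bin/sh
# nfs_mount_lines SERVER EXPORT MOUNTPOINT
# Prints two commands: create the mount point, then mount the export.
# All three arguments are placeholders; the real values are site-specific.
# Assumption: "-o nolock" is used because the node is presumed to have
# no lock daemon. Drop it if your node image provides one.
nfs_mount_lines() {
    printf 'mkdir -p %s\n' "$3"
    printf 'mount -t nfs -o nolock %s:%s %s\n' "$1" "$2" "$3"
}
```

For example, "nfs_mount_lines fileserver /export/home /home" prints the two lines so they can be reviewed and pasted into the node script by hand.
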
> Please let me know if/how this works for you.  I hope it is complete...
> Best regards,
> Daniel
>
> p.s. Please feel free to modify this blurb and add it to the xcpu
> installation instructions.  Greg from the perceus group is extremely
> helpful with any perceus issues.
>

Yes, I will use this for the instructions on the wiki. Thanks.

 -- Abhishek



>
> On Fri, Dec 12, 2008 at 3:41 PM, Chris Kinney <[email protected]> wrote:
> > Hey Daniel,
> >
> >   My name is Chris Kinney, I'm Ron's intern. I was wondering if you could
> > show me how you're booting your perceus setup. We're in need of perceus
> > being able to work with IB and from what we've seen, the capsules just don't
> > work with it. What ever you got that can help that would be great! Thanks
> > again!
> >
> > -Chris
> >
> > ron minnich wrote:
> >>
> >> On Thu, Dec 11, 2008 at 5:48 PM, Daniel Gruner <[email protected]> wrote:
> >>
> >>>
> >>> Yeah, you don't even need a VNFS image in order to boot into xcpu!
> >>> All you need is the initial busybox provided by perceus and activating
> >>> the perceus xcpu module "perceus module activate xcpu".  This will
> >>> give you a minimal xcpu node, and you can then add remote filesystems
> >>> for mounting if necessary.  You only really need user files, if at
> >>> all, since the executables that you run with xrx take the necessary
> >>> libraries along automagically.
> >>>
> >>>
> >>
> >> daniel, this is an excellent point, and since we are having a terrible
> >> time getting our vnfs capsules to work with ib ...
> >>
> >> can you give us a quick writeup for how you set this up so we can use it
> >> too.
> >>
> >> thanks
> >>
> >> ron
> >>
> >>
> >
> >
>
