Thanks for the corrections! I too have some comments inline...

On Sat, Dec 13, 2008 at 1:10 PM, Abhishek Kulkarni <[email protected]> wrote:
> Excellent write-up, Daniel. I am adding some of my comments and/or
> suggestions inline.
> I am trying to detail most of these steps in the wiki guide for xcpu. I will
> take note of some of the points that you have made.
> Thanks.
>
> On Sat, Dec 13, 2008 at 10:10 AM, Daniel Gruner <[email protected]> wrote:
>>
>> Ok, here we go...
>>
>> I start from an almost vanilla RHEL5.2 machine, except for the kernel.
>> RHEL does not provide the 9p modules, so rather than trying to
>> recompile their kernel I just got the 2.6.26 kernel from kernel.org.
>> This allows me to build sxcpu right out of the box.
>
> sxcpu does not need any kernel modules at all. xcpu2 uses the 9p and 9pnet
> modules to mount the head node file system. You can also build the 9p
> modules for a RHEL kernel.
>
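As an aside on the 9p point: a minimal sketch, in Python, of checking whether a running kernel already has the 9p filesystem registered (built in, or module already loaded) by reading /proc/filesystems. The function name has_9p is made up for illustration; it is not part of xcpu or perceus. A kernel that ships 9p only as a module that has not been loaded yet will report False until the module is loaded.

```python
def has_9p(filesystems_text):
    """Return True if '9p' is listed in the given /proc/filesystems contents."""
    for line in filesystems_text.splitlines():
        fields = line.split()
        # Lines look like "nodev<TAB>9p" or "<TAB>ext3"; the type is the last field.
        if fields and fields[-1] == "9p":
            return True
    return False


if __name__ == "__main__":
    # Only meaningful on a Linux machine with /proc mounted.
    with open("/proc/filesystems") as f:
        print("9p registered:", has_9p(f.read()))
```
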
I guess way back, when I started with perceus, I was still trying to
use xcpu2, hence the need for a different kernel with 9p support. Then
we realized that perceus includes sxcpu out of the box, so I went back
to that. xcpu2 is still enticing, but I am not very comfortable with
it - yet. Perhaps when the writeup is done and it can be explained in
more detail, including its benefits and pitfalls, I'll go to it.

>>
>> I obtained it
>> from the sourceforge svn repository:
>>
>> svn co https://xcpu.svn.sourceforge.net/svnroot/xcpu/sxcpu/trunk sxcpu
>>
>> Here you simply do "make; make install" and it should all be
>> available.
>
> Another thing to note: there are a few prerequisites (libelf, openssl
> headers) for sxcpu and you would have to install them for it to build
> successfully if you are on a vanilla debian/ubuntu system.
>
>>
>> It will be important to run the "statfs" daemon on the
>> master, so that the commands you use later are aware of the status of
>> the compute nodes in the cluster. These will need to be configured in
>> the /etc/xcpu/statfs.conf file:
>>
>> [r...@dgk3 xcpu]# cat /etc/xcpu/statfs.conf
>> n0000=tcp!10.10.0.10!6667
>> n0001=tcp!10.10.0.11!6667
>>
>> See below for more details on the assignment of IP addresses to the
>> nodes by perceus.
>>
>>
>> Then the perceus side of things: I have perceus 1.4.4, downloaded
>> directly from their site. To build it I just did the usual
>> ./configure; make; make install with no special options. Now to the
>> perceus configuration...
>>
>> My internal network to the compute nodes is eth0.
>> Here is the ifconfig for my master node:
>>
>> [r...@dgk3 all]# ifconfig
>> eth0      Link encap:Ethernet  HWaddr 00:E0:81:2C:81:D0
>>           inet addr:10.10.0.1  Bcast:10.10.0.255  Mask:255.255.255.0
>>           inet6 addr: fe80::2e0:81ff:fe2c:81d0/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:8044504 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:10719515 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:1770711038 (1.6 GiB)  TX bytes:1542820770 (1.4 GiB)
>>           Interrupt:24
>>
>> eth1      Link encap:Ethernet  HWaddr 00:E0:81:2C:81:D1
>>           inet addr:142.150.227.13  Bcast:142.150.227.255  Mask:255.255.252.0
>>           inet6 addr: fec0::9:2e0:81ff:fe2c:81d1/64 Scope:Site
>>           inet6 addr: 2002:8e96:e1cc:9:2e0:81ff:fe2c:81d1/64 Scope:Global
>>           inet6 addr: fe80::2e0:81ff:fe2c:81d1/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:44696410 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:1158903 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:4665982604 (4.3 GiB)  TX bytes:1197487595 (1.1 GiB)
>>           Interrupt:25
>>
>> lo        Link encap:Local Loopback
>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>           inet6 addr: ::1/128 Scope:Host
>>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>           RX packets:21339 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:21339 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:0
>>           RX bytes:66081636 (63.0 MiB)  TX bytes:66081636 (63.0 MiB)
>>
>> In /etc/perceus there are several configuration files:
>>
>> ---defaults.conf---
>> [r...@dgk3 perceus]# cat defaults.conf
>> #
>> # Copyright (c) 2006-2008, Greg M. Kurtzer, Arthur A. Stevens and
>> # Infiscale, Inc. All rights reserved
>> #
>>
>> # This is the template name for all new nodes as they are configured.
>>
>> # Define the node name range. The '#' characters symbolize the node number
>> # in the order of initalized.
>> # If you don't allocate enough number spaces
>> # here for what you defined in 'Total Nodes' then it will be automatically
>> # padded.
>> Node Name = n####
>>
>> # What is the default group for new nodes (this doesn't have to exist
>> # anywhere before hand)
>> Group Name = cluster
>>
>> # Define the default VNFS image that should be assigned to new nodes
>> Vnfs Name =
>>
>> # Are new nodes automatically enabled and provisionined?
>> Enabled = 1
>>
>> # What is the first node number that we should count at?
>> First Node = 0
>>
>> # This is the total node count that Perceus would ever try and allocate a
>> # node to. It is safe to make this big, so you should leave it big.
>> Total Nodes = 10000
>>
>> (I did not modify the defaults.conf file).
>>
>>
>> ---dnsmasq.conf---
>> [r...@dgk3 perceus]# cat dnsmasq.conf
>> interface=eth0
>> enable-tftp
>> tftp-root=/usr/local/var/lib/perceus//tftp
>> dhcp-option=vendor:Etherboot,60,"Etherboot"
>> dhcp-boot=pxelinux.0
>> local=//
>> domain=internal
>> expand-hosts
>> dhcp-range=10.10.0.128,10.10.0.254
>> dhcp-lease-max=21600
>> read-ethers
>>
>>
>> ---perceus.conf---
>> [r...@dgk3 perceus]# cat perceus.conf
>> #
>> # Copyright (c) 2006-2008, Greg M. Kurtzer, Arthur A. Stevens and
>> # Infiscale, Inc. All rights reserved
>> #
>>
>> # This is the primary configuration file for Perceus
>>
>> # Define the network device on this system that is connected directly
>> # and privately to the nodes. This device will be responding to DHCP
>> # requests thus make sure you specify the proper device name!
>> # note: This device must be configured for IP based communication.
>> master network device = eth0
>>
>> # What protocol should be used to retireve the VNFS information. Generally
>> # Supported options in this version of Perceus are: 'xget', 'nfs', and 'http'
>> # but others may also be available via specialized VNFS capsules or
>> # feature enhancing Perceus Modules.
>> vnfs transfer method = xget
>>
>> # Define the IP Address of the network file server. This address must be
>> # set before Perceus can operate. If this option is left blank, the IP
>> # address of the "master network device" defined above will be used.
>> vnfs transfer master =
>>
>> # Define the VNFS transfer location if it is different from the default
>> # ('statedir'). This gets used differently for different transfer methods
>> # (e.g. NFS this replaces the path to statedir, while with http it is gets
>> # prepended to the "/perceus" path).
>> vnfs transfer prefix =
>>
>> # What is the default database that should be used. If this option is not
>> # specified, then the default is "hash" to remain compatible with
>> # previous versions of Perceus. Other options are 'btree' and 'mysql'.
>> # note: btree is default as of version 1.4.
>> database type = btree
>>
>> # If you selected an SQL database solution as your database type above,
>> # then you will need to specify the SQL user login information here.
>> # note: this will be ignored for non-SQL database types.
>> database server = localhost
>> database name = perceus
>> database user = db user
>> database pass = db pass
>>
>> # To allow for better scaling the Perceus daemon 'preforks' which creates
>> # multiple subprocesses to better handle large number of simultaneous
>> # connections. The default is 4 which on most systems can support
>> # thousands of nodes per minute but for best tuning this number is highly
>> # dependant on system configuration (both hardware and software).
>> prefork = 4
>>
>> # How long (in seconds) should we wait before considering a node as dead.
>> # Note, that if you are not running node client daemons, then after
>> # provisioning the node will never check in, and will no doubt expire.
>> # Considering that the default node check in is 5 minutes, setting this
>> # to double that should ensure that any living node would have checked in
>> # by then (600).
>> node timeout = 600
>>
>>
>> I only modified the master network device to point to eth0. Note that
>> there are no VNFS images defined, as booting xcpu does not require
>> them.
>>
>> Install the perceus startup script in /etc/rc.d/init.d, so that it
>> will start on boot. I believe it gets installed by default (in the
>> "make install" step), but it still needs to be configured with
>> "chkconfig --add perceus".
>
> I don't think this is necessary. Perceus manages the init scripts for most
> distributions properly.
>

I just thought it would be good to mention it. I do not remember if
perceus did this automatically or not...

>>
>> This will start the perceus daemons,
>> including the dnsmasq which provides dhcp for the slave nodes. No
>> other dhcp server can run, but this one can be configured to provide
>> other network configuration for additional NICs.
>
> Yes, this works fine in most cases. And dnsmasq is pretty customizable at
> that.
> http://www.thekelleys.org.uk/dnsmasq/docs/dnsmasq-man.html
>
> Unless you have special needs like having all the compute nodes accessible
> directly from a public network (yes, I have heard that before!), it should
> work for you.

Ugg! I believe in compute nodes needing an external fileserver, but
not in direct access to the nodes from the outside.

>
>>
>> After rebooting, make sure perceus is running. Then run the commands:
>>
>> perceus module activate xcpu
>> perceus module activate ipaddr
>>
>> In order to get static addresses assigned to the compute nodes
>> (desirable), their addresses must be added to the /etc/hosts file,
>> e.g.:
>>
>> 10.10.0.1 master
>> 10.10.0.10 n0000
>> 10.10.0.11 n0001
>>
>> You should be ready to start configuring the nodes at this stage.
>> They must be set for pxe boot. All the necessary stuff for this is
>> installed by perceus in /usr/local/var/lib/perceus. You boot your
>> compute nodes in the order in which you want them named, starting, by
>> default, as n0000.
>> The first time they will be assigned an IP address
>> from the dynamic range defined in the /etc/perceus/dnsmasq.conf file,
>> but on reboot they will get the statically assigned address from the
>> /etc/hosts file.
>>
>> By this stage you should have a useable xcpu cluster. You need to set
>> up the groups and users using the xgroupset and xuserset commands. In
>> order to get "proper" behaviour, in accordance with the version of
>> sxcpu that you downloaded and built, you may need to update the xcpufs
>> provided by perceus. This is done by statically linking the xcpufs
>> daemon:
>>
>> In /usr/local/src/sxcpu/xcpufs (or wherever you installed the sxcpu
>> sources) there is a script called LINKSTATIC. I am running on x86_64,
>> so I modified it to read:
>>
>> [r...@dgk3 xcpufs]# cat LINKSTATIC
>> #!/bin/sh
>> echo This script is for linking statically on Linux.
>> cc -static -o xcpufs.static -Wall -g -I ../include -DSYSNAME=Linux \
>>   file.o pipe.o proc-Linux.o tspawn.o ufs.o xauth.o xcpufs.o -g \
>>   -L../libstrutil -lstrutil -L../libspclient -lspclient -L../libspfs \
>>   -lspfs -L../libxauth -lxauth -lcrypto /usr/lib64/libdl.a
>>
>> Note that it produces the "xcpufs.static" executable, and it looks for
>> its libdl.a library in the /usr/lib64 directory. Then I copy the
>> xcpufs.static executable to the location where perceus needs it:
>
> Rather than doing this manually, it's recommended to put the tarball in the
> right place within Perceus and then make -C 3rd_party/ xcpu to generate a
> new xcpufs. Perceus applies its own static libs patch to sxcpu and you don't
> have to worry about the multilib path.
>

OK, this is a good idea. I didn't know where perceus expected this.
Looking at the makefile in the 3rd_party directory of perceus, it seems
that one may need to change it to correspond to whatever version of
sxcpu one provides. Should not be a big problem.
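
Since the statfs.conf entries and the /etc/hosts entries have to agree on node names and addresses, here is a minimal sketch, in Python, of generating both from one place. It assumes only what the examples in this thread show: names of the form n#### counted from 0, consecutive addresses starting at 10.10.0.10, and port 6667. The function name node_entries and its parameters are made up for illustration; nothing here is part of perceus or xcpu.

```python
import ipaddress


def node_entries(count, first_ip="10.10.0.10", port=6667, first_node=0):
    """Return (hosts_lines, statfs_lines) for `count` consecutively numbered nodes.

    Assumption: node i gets the address first_ip + i, as in the two-node
    example in this thread. Adjust if your addressing differs.
    """
    base = ipaddress.IPv4Address(first_ip)
    hosts, statfs = [], []
    for i in range(count):
        name = f"n{first_node + i:04d}"   # matches "Node Name = n####"
        ip = base + i                      # IPv4Address + int gives the next address
        hosts.append(f"{ip} {name}")                 # /etc/hosts format
        statfs.append(f"{name}=tcp!{ip}!{port}")     # statfs.conf format
    return hosts, statfs


if __name__ == "__main__":
    hosts, statfs = node_entries(2)
    print("# lines for /etc/hosts")
    print("\n".join(hosts))
    print("# lines for /etc/xcpu/statfs.conf")
    print("\n".join(statfs))
```

The output is meant to be pasted into the two files by hand; the entry for the master (10.10.0.1 in my setup) still has to be added to /etc/hosts separately.
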
>>
>> cp /usr/local/src/sxcpu/xcpufs/xcpufs.static /usr/local/var/lib/perceus/modules/xcpu/xcpufs
>>
>> and on reboot the nodes will pick up the latest and greatest xcpufs.
>>
>> After this you can do, for example:
>>
>> xgroupset add -a -u
>> xuserset add -a -u
>>
>> in order to add all the groups and all the users to the permitted user
>> list on the nodes. You can then run anything on the nodes, e.g. "xrx
>> -a date".
>>
>> Needless to say, this requires that the "statfs" daemon be running.
>> You can verify this with the "xstat" command (see the configuration
>> instructions for this above).
>>
>> Now, I don't know anything about IB, mainly because I have never had
>> access to an IB-connected cluster. I have no idea if perceus can
>> manage pxe booting over IB, but I suspect that if you have IP over IB
>> then it should, for all intents and purposes, look like just another
>> network interface to it (I could be utterly wrong on this, of
>> course...).
>
> gPXE does have a working IB subsystem, but I am not sure which network cards
> it supports.
>
>>
>> However, if you need to configure a second interface, say for access
>> to a fileserver on a separate network, then all you need to do is
>> change the /etc/perceus/dnsmasq.conf and define the machines in there.
>> Again, for static IP addresses on the second interface they need to
>> be defined in /etc/hosts. Let me know if you would like details on
>> how I did this. I then mounted my fileserver on the compute nodes by
>> modifying the perceus xcpu startup script in
>> /etc/perceus/nodescripts/init/all/05-xcpu.sh, so that the node gets a
>> mount point and executes the nfs mount.
>>
>> Please let me know if/how this works for you. I hope it is complete...
>> Best regards,
>> Daniel
>>
>> p.s. Please feel free to modify this blurb and add it to the xcpu
>> installation instructions. Greg from the perceus group is extremely
>> helpful with any perceus issues.
>
> Yes, I will use this for the instructions on the wiki. Thanks.
>
> -- Abhishek

Great! Is the wiki available for perusing yet?

Daniel

>
>
>>
>> On Fri, Dec 12, 2008 at 3:41 PM, Chris Kinney <[email protected]> wrote:
>> > Hey Daniel,
>> >
>> > My name is Chris Kinney, I'm Ron's intern. I was wondering if you could
>> > show me how you're booting your perceus setup. We're in need of perceus
>> > being able to work with IB and from what we've seen, the capsules just don't
>> > work with it. Whatever you've got that can help would be great! Thanks
>> > again!
>> >
>> > -Chris
>> >
>> > ron minnich wrote:
>> >>
>> >> On Thu, Dec 11, 2008 at 5:48 PM, Daniel Gruner <[email protected]> wrote:
>> >>
>> >>>
>> >>> Yeah, you don't even need a VNFS image in order to boot into xcpu!
>> >>> All you need is the initial busybox provided by perceus and activating
>> >>> the perceus xcpu module "perceus module activate xcpu". This will
>> >>> give you a minimal xcpu node, and you can then add remote filesystems
>> >>> for mounting if necessary. You only really need user files, if at
>> >>> all, since the executables that you run with xrx take the necessary
>> >>> libraries along automagically.
>> >>>
>> >>
>> >> daniel, this is an excellent point, and since we are having a terrible
>> >> time getting our vnfs capsules to work with ib ...
>> >>
>> >> can you give us a quick writeup for how you set this up so we can use it
>> >> too.
>> >>
>> >> thanks
>> >>
>> >> ron
