On Sun, Jul 25, 2010 at 11:30 PM, Patrick Nolan <patrick.no...@stanford.edu> wrote:
>> First, there is an OSCAR 5.2b2 that is preferred to 5.1rc1.
>
> Thanks. I haven't run across it yet, though. Google doesn't seem
> to know about it. Can you give a pointer?
>

Typo. I mean 5.1b2.
http://svn.oscar.openclustergroup.org/php/download.php?d_name=beta

>> If you are not using dhcp in the image setting, then client should
>> never attempt dhcp (after initial node imaging). Something is wrong.
>>
> This has been on my mind. I have this nagging suspicion that some
> script or program is deciding what interface to use and coming up
> with Infiniband instead of Ethernet. Maybe it has to do with device
> numbers or something. Would an unconfigured interface default to
> DHCP?

Again. I think that the problem is that the node image is missing
something. Either a post imaging script failed to run, or the cp/rm/mv
aliases prevented the imaging. It is a nasty problem. The aliases exist
both on the headnode doing the imaging and on the compute nodes after
imaging (I think there is a post image script that executes on the
compute nodes. This is when the ipaddr and interface are set).

> I've been trying various older distributions today. With 5.1, step
> 3 comes to a halt because it's looking for repodata/filelists.sqlite.bz2
> from the CentOS mirror. But it provides a file called filelists.xml.gz
> instead. It appears that CentOS 5.4 is the first version in which the
> sqlite version occurred. When I used 5.4, it halted on the same file,
> saying "Metadata file does not match checksum". I've been trying to
> figure out where it gets the checksum, so far without success.
>

It has been the custom on the list to post the log file generated by
OSCAR (and to run the script in verbose mode). Unfortunately, client
boot problems are not well recorded in the logs.

> I undid the aliases for mv,cp,rm but that had no effect.

The best solution to this problem is to change all instances of
cp/rm/mv in the code to /usr/bin/[cp|rm|mv].
In the 6.0 branch someone posted a patch that did just this. However,
there was no communication from the repository maintainer to suggest
that he even accepted those patches.

If you don't want to modify all the OSCAR code, you might have to
comment out the aliases in /root/.bashrc of the *image* as well (before
the image is installed). That image is located somewhere like
/var/lib/systemimager/images/*imagename*/

That means: undefine your compute nodes, edit the image, then reinstall
the nodes.

> When I'm stuck on step 3, can I switch distro versions by just editing
> /tftpboot/distro/centos-5-x86_64.url ? Is there anything else that
> needs to be changed?

It might be best to nuke the whole thing and start over.

>> By the way... what distribution are you using?
>>
> CentOS. The head node is version 5.5.

I suggest 5.2.

> Are you suggesting that the head node and the clients should have the
> same version of CentOS? I hope not. Developers are installing stuff
> on the head node and champing at the bit to get the cluster running.

Well, that is one suggestion. Since you did the unalias, maybe you can
get away without it.

_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users
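
For the .bashrc edit suggested above, here is a minimal sketch of commenting out the cp/mv/rm aliases. It is not part of OSCAR. The function name comment_out_aliases and the sample text are made up for illustration, and it assumes the aliases are written one per line in the usual "alias rm='rm -i'" form.

```python
import re

# Assumption: aliases appear one per line as "alias rm='rm -i'" and so on.
# Lines that are already commented out start with "#" and do not match.
ALIAS_RE = re.compile(r"^\s*alias\s+(cp|mv|rm)=")


def comment_out_aliases(text):
    """Return text with any cp/mv/rm alias definitions commented out."""
    out = []
    for line in text.splitlines(keepends=True):
        if ALIAS_RE.match(line):
            line = "#" + line
        out.append(line)
    return "".join(out)


# Hypothetical sample resembling a root .bashrc; other aliases are left alone.
sample = (
    "# .bashrc\n"
    "alias rm='rm -i'\n"
    "alias cp='cp -i'\n"
    "alias mv='mv -i'\n"
    "alias ll='ls -l'\n"
)
result = comment_out_aliases(sample)
print(result)
```

To use it on a real file, read the image's root/.bashrc into a string, pass it through the function, and write the result back (keep a backup copy first). Running it twice is harmless, since commented lines no longer match.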