Have you tried turning off pfilter?  I hadn't noticed you were running 4.2.1,
but the pfilter package has problems on some systems.  I don't think they
were with NFS, but since this appears to be some sort of boot timing issue,
you might try making a new image which doesn't include it and pushing that to
the nodes.  You'll need to turn the service off on the head node as well.

You might also try a more NFS-specific list.  To my knowledge OSCAR doesn't do
much with the stock NFS install, so the fact that they are unfamiliar with
OSCAR shouldn't matter much.

How many nodes are you using?  Have you tested with only one node to see if
it is a network bottleneck of some kind?  My brain keeps straying toward the
switch, or service startup order, since everything works once everything is
going.
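
If you want to test the timing theory, here is a rough sketch of a retry
loop you could call from a late boot script on one node.  The function name
"retry", the attempt count and the delay are all made up for illustration,
and /etc/rc.d/rc.local is only an assumption about where your distribution
runs late boot commands.  If the mounts succeed after a few attempts, that
points at the network or the server simply not being ready yet.

```sh
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts.
# Returns 0 as soon as CMD succeeds, 1 if every attempt failed.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            echo "succeeded on attempt $attempt: $*"
            return 0
        fi
        echo "attempt $attempt failed: $*" >&2
        attempt=$((attempt + 1))
        # Only sleep if another attempt is coming.
        if [ "$attempt" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Hypothetical use from a late boot script (numbers are guesses):
#   retry 12 5 mount -a -t nfs
```

The attempt numbers that end up in the log give a rough idea of how long
after boot the head node becomes reachable.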

On 4/16/07, Greenseid, Joseph M. <[EMAIL PROTECTED]> wrote:

I made this change but it does not seem to help.  The nodes still don't
catch the NFS file systems on boot. Very strange...

--Joe

________________________________

From: [EMAIL PROTECTED] on behalf of Greenseid,
Joseph M.
Sent: Fri 4/13/2007 8:43 AM
To: oscar-users@lists.sourceforge.net; oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] NFS file systems not mounting on boot



Thanks for the suggestion.  I'll give it a try and let you know how it
goes.

--Joe

________________________________

From: [EMAIL PROTECTED] on behalf of Michael
Edwards
Sent: Thu 4/12/2007 5:54 PM
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] NFS file systems not mounting on boot


One thing you can try is to move the NFS-related System V startup scripts
toward the end of the boot process.  I have had occasional problems where
the network startup took a very long time: it seemed to continue in the
background after the system said [ok], and there was no connectivity
until much later in the boot process.

Check in /var/lib/systemimager/images/<imagename>/etc/rc3.d/ for netfs and
nfslock (at least).  They should show up as symlinks, something like S25netfs
and S14nfslock.  You can mv them to something like S90netfs and
S90nfslock.  You may need to play with the numbers, but even if nfslock ends
up broken things should work, just with lots of errors in the log files.
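
As a rough sketch of that rename (the function name "delay_service" and the
priority 90 are only illustrative, and the exact S-numbers in your image may
differ), something like this would do it for one rc directory:

```sh
#!/bin/sh
# delay_service RC_DIR SERVICE NEW_PRIO
# Rename the start symlink S<nn>SERVICE in RC_DIR to S<NEW_PRIO>SERVICE so
# the service starts later in the runlevel.  Does nothing if no link matches.
delay_service() {
    rc_dir=$1      # e.g. /var/lib/systemimager/images/<imagename>/etc/rc3.d
    service=$2     # e.g. netfs
    new_prio=$3    # e.g. 90
    for link in "$rc_dir"/S[0-9][0-9]"$service"; do
        # An unmatched glob stays literal; skip it.
        [ -e "$link" ] || [ -L "$link" ] || continue
        target="$rc_dir/S${new_prio}${service}"
        # Already at the requested priority.
        [ "$link" = "$target" ] && continue
        mv "$link" "$target"
    done
}

# Hypothetical use, with RC_DIR set to the image's rc3.d directory:
#   delay_service "$RC_DIR" netfs 90
#   delay_service "$RC_DIR" nfslock 90
```

Scripts in an rc directory run in name order, so with both at 90 netfs would
start before nfslock; pick a slightly lower number for nfslock if that matters
on your system.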

Don't forget to either cpush the files to /etc/rc3.d on the nodes or
reimage the nodes.

This is a hack, but I have "had" to do it with autofs on some of my
systems to get things to work.


On 4/12/07, Greenseid, Joseph M. <[EMAIL PROTECTED]> wrote:

        The head node is not rebooting during this process.  It is simply
up and running; a good example of this is a node installation.

        On the head node, I am running the OSCAR Wizard; I am in the
monitor cluster screen, and the nodes are installing.  Upon completion of
the install, I set them to reboot.  They reboot (with the head node still
up, running the OSCAR Wizard), and no NFS file systems are mounted.  When I
click the "complete cluster setup" button after they reboot, it says success,
but when I then run "test cluster setup," the new nodes do not have /home
mounted, and that test fails...

        --Joe

        ________________________________

        From: [EMAIL PROTECTED] on behalf of
Michael Edwards
        Sent: Thu 4/12/2007 2:15 PM
        To: oscar-users@lists.sourceforge.net
        Subject: Re: [Oscar-users] NFS file systems not mounting on boot



        Are you completely booting the head node before you boot the client
        nodes?  If you don't, the client nodes boot much faster than the head
        (because they run so few services) and they will generally finish
        booting before the NFS server is up, so the mounts fail.

        On 4/12/07, Greenseid, Joseph M. <[EMAIL PROTECTED]> wrote:
        > I am installing OSCAR 4.2.1 on an IA64 cluster.  When the nodes
        > reboot (after installation, and also every time after that), they fail to
        > mount any NFS file systems.  However, once the node has finished booting, if
        > I do a "mount /home" it mounts instantly.
        >
        > My exports file on my head node says:
        >
        > ~]$ cat /etc/exports
        > #
        > /home 10.2.148.1/255.255.255.0(async,rw,no_root_squash)
        > /share 10.2.148.1/255.255.255.0(async,rw)
        > ~]$
        >
        > My fstab entry on a compute node looks like this:
        >
        > # This file is edited by fstab-sync - see 'man fstab-sync' for details
        > /dev/sda3       swap    swap    defaults        0       0
        > /dev/sda2       /       ext3    defaults        1       2
        > /dev/sda4       /tmp    ext3    defaults        1       2
        > /dev/sda1       /boot/efi       vfat    defaults        1       2
        > /dev/fd0        /mnt/floppy     auto    noauto,owner    0       0
        > none    /dev/pts        devpts  defaults        0       0
        > none    /proc   proc    defaults        0       0
        > nfs_oscar:/share        /share  nfs     rw      0       0
        > nfs_oscar:/home /home   nfs     rw      0       0
        > none      /dev/shm        tmpfs   defaults        0 0
        >
        >
        > I tried changing nfs_oscar to the head node's eth0 IP addr, and
        > the file looks like this:
        >
        >
        > # This file is edited by fstab-sync - see 'man fstab-sync' for details
        > /dev/sda3       swap    swap    defaults        0       0
        > /dev/sda2       /       ext3    defaults        1       2
        > /dev/sda4       /tmp    ext3    defaults        1       2
        > /dev/sda1       /boot/efi       vfat    defaults        1       2
        > /dev/fd0        /mnt/floppy     auto    noauto,owner    0       0
        > none    /dev/pts        devpts  defaults        0       0
        > none    /proc   proc    defaults        0       0
        > 10.2.148.1:/share       /share  nfs     rw      0       0
        > 10.2.148.1:/home        /home   nfs     rw      0       0
        > none      /dev/shm        tmpfs   defaults        0 0
        >
        > However, NFS mounting during boot fails in both cases.
        >
        > I get variations of the error message in my /var/log/messages
        > log from the failures during boot:
        >
        > Apr 12 13:23:58 compute-15-01 mount: mount: mount to NFS server 'nfs_oscar' failed:
        > Apr 12 13:23:58 compute-15-01 mount: System Error: No route to host.
        >
        > or
        >
        > Apr 12 13:24:07 compute-15-01 mount: mount: mount to NFS server 'nfs_oscar' failed: System Error: Connection refused
        >
        > or
        >
        > Apr 12 13:32:25 compute-15-01 mount: mount: mount to NFS server '10.2.148.1' failed:
        > Apr 12 13:32:25 compute-15-01 mount: System Error: No route to host.
        >
        > As I said, in both instances (both IP address and nfs_oscar
        > hostname in the fstab), once booting was complete, the command "mount /home"
        > worked perfectly fine.
        >
        > Any idea why it isn't working during boot?
        >
        > Thanks,
        > --Joe
        >
        >
-------------------------------------------------------------------------
        > Take Surveys. Earn Cash. Influence the Future of IT
        > Join SourceForge.net's Techsay panel and you'll get the chance
to share your
        > opinions on IT & business topics through brief surveys-and earn
cash
        >
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
        > _______________________________________________
        > Oscar-users mailing list
        > Oscar-users@lists.sourceforge.net
        > https://lists.sourceforge.net/lists/listinfo/oscar-users
        >


        



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users


