Jim:

Your 2.8.5 rpms install the kernel module in
/lib/modules/2.6.18-92.1.13.el5/kernel/fs/pvfs2/pvfs2.ko and your 2.8.6 rpms install it in
/lib/modules/2.6.32-220.13.1.el6.x86_64/kernel/fs/pvfs2/pvfs2.ko. Please verify that:

/lib/modules/2.6.18-92.1.13.el5/modules.dep contains "kernel/fs/pvfs2/pvfs2.ko"
/lib/modules/2.6.32-220.13.1.el6.x86_64/modules.dep contains "kernel/fs/pvfs2/pvfs2.ko"

You might have to modify your rebuild scripts to execute a "depmod" AFTER the
orangefs-kmod rpm is installed. Your scripts may already be doing this, and, if so,
then the kernel module should be loadable with modprobe after an install.

You did mention that you thought the kernel module was being loaded properly. If that
is the case, then adding "LD_LIBRARY_PATH" to your OrangeFS init.d script should allow
the client-core to start properly, and the mount that follows it to succeed.

Let me know if this works for you.

Becky

On Wed, Jul 25, 2012 at 5:50 PM, Jim Kusznir <[email protected]> wrote:
> Here's the last file.
>
> On Wed, Jul 25, 2012 at 10:06 AM, Becky Ligon <[email protected]> wrote:
> > Jim:
> >
> > One more thing: can you send me the pvfs2-client.log files from the nodes
> > where a KP has occurred? If possible, I'd like the corresponding
> > /var/log/messages log file from when the KP happened.
> >
> > Thanks,
> > Becky
> >
> > On Wed, Jul 25, 2012 at 1:05 PM, Becky Ligon <[email protected]> wrote:
> >>
> >> Jim:
> >>
> >> Can you also send me your PVFS server config file?
> >>
> >> Becky
> >>
> >> On Wed, Jul 25, 2012 at 12:49 PM, Becky Ligon <[email protected]> wrote:
> >>>
> >>> Jim:
> >>>
> >>> Can you send me the kmod-pvfs2-...rpm? I'd like to see how its files are
> >>> laid out.
> >>>
> >>> Thanks,
> >>> Becky
> >>>
> >>> On Sat, Jul 21, 2012 at 4:46 PM, Jim Kusznir <[email protected]> wrote:
> >>>>
> >>>> Hi Becky:
> >>>>
> >>>> Thanks for all your input. I was on travel and am currently catching
> >>>> up on e-mail, so here are answers to your questions:
> >>>>
> >>>> 1) this problem occurs on both my ROCKS 5.1 (CentOS 5.2) and ROCKS 6
> >>>> (CentOS 6.2) clusters identically.
> >>>> 2) I can mount manually using the init script. It just will not run
> >>>> on boot.
> >>>> It tries, but fails with the error message supplied.
> >>>> 3) The module is installed with a kmod-pvfs2-... rpm (as is required
> >>>> for ROCKS clusters: any software to be installed on each node needs
> >>>> to be its own RPM). It appears to me that the module is being loaded
> >>>> successfully.
> >>>> 4) Ok, that sounds plausible. I'll make those corrections and see if
> >>>> that fixes things.
> >>>>
> >>>> Of course, the mount on boot was one of two show-stopping issues. The
> >>>> second show-stopping issue is how many kernel panics are being caused
> >>>> by OrangeFS. I've been experiencing 3-8 KPs a week on a light to
> >>>> moderate load on my cluster (24 nodes + head node, 3 pvfs nodes).
> >>>>
> >>>> My versions in use are: 2.8.5 (ROCKS 5.1), 2.8.6 (ROCKS 6). For my
> >>>> users, I absolutely must have a "traditional filesystem interface"
> >>>> (e.g., MPI-IO or pvfs-* commands are not acceptable; they need to work
> >>>> on the files like they would for any other filesystem).
> >>>>
> >>>> --Jim
> >>>>
> >>>> On Fri, Jul 20, 2012 at 1:45 PM, Becky Ligon <[email protected]> wrote:
> >>>> > Jim:
> >>>> >
> >>>> > In your init script, you need to add the LD_LIBRARY_PATH variable,
> >>>> > since your pvfs library is not in a standard location:
> >>>> >
> >>>> > export LD_LIBRARY_PATH=/opt/pvfs2/lib:$LD_LIBRARY_PATH
> >>>> >
> >>>> > Remove the LD_PRELOAD. It is not needed here.
> >>>> >
> >>>> > Before "modprobe" will work, you have to run the command "depmod" to
> >>>> > update the modules list. The "make kmod_install" does not automatically
> >>>> > do this.
> >>>> > NOTE: if you place the kernel module (pvfs2.ko) somewhere other than
> >>>> > /lib/modules/`uname -r`/kernel/fs/pvfs2, then you can't use modprobe
> >>>> > to load the module. Instead, use "/sbin/insmod <path>/pvfs2.ko".
> >>>> > If you are using the rpm spec that I gave you (and it looks like you
> >>>> > are), then pvfs2.ko is located in /opt/pvfs2/lib/pvfs2.ko, in which
> >>>> > case you have to use the "insmod" command to load it and the "rmmod"
> >>>> > command to unload it.
> >>>> >
> >>>> > When you issue a "stop", your script neither stops the client nor
> >>>> > unloads the kernel module. A later "start" will then launch a second
> >>>> > pvfs2-client, which will cause problems. I will send you the init
> >>>> > script that we use here. Maybe you can modify it to accommodate your
> >>>> > environment. We have more checks in it than you have in yours.
> >>>> >
> >>>> > I am not familiar with how PVFS reacts to the "intr" option that you
> >>>> > specify in the mount command. What is its purpose?
> >>>> >
> >>>> > Becky
> >>>> >
> >>>> > On Fri, Jul 20, 2012 at 3:27 PM, Becky Ligon <[email protected]> wrote:
> >>>> >>
> >>>> >> Jim:
> >>>> >>
> >>>> >> I just realized that you have already sent me your init script. Let
> >>>> >> me take a closer look at it.
> >>>> >>
> >>>> >> Becky
> >>>> >>
> >>>> >> On Fri, Jul 20, 2012 at 3:13 PM, Becky Ligon <[email protected]> wrote:
> >>>> >>>
> >>>> >>> Jim:
> >>>> >>>
> >>>> >>> I have successfully booted my CentOS 6.2 system (using
> >>>> >>> 2.6.32-220.13.1.el6.x86_64) and started the PVFS2 server and mounted
> >>>> >>> the client. Thus, I can only guess that there is something in your
> >>>> >>> environment causing the problem. Is it possible for you to mount the
> >>>> >>> client by issuing the commands manually once the system is running?
> >>>> >>> Can you send me a copy of your startup script for mounting the client
> >>>> >>> from your /etc/init.d directory?
> >>>> >>>
> >>>> >>> Becky
> >>>> >>>
> >>>> >>> On Thu, Jul 19, 2012 at 12:58 PM, Becky Ligon <[email protected]> wrote:
> >>>> >>>>
> >>>> >>>> Jim:
> >>>> >>>>
> >>>> >>>> I have been able to successfully mount-on-boot on a VM with the
> >>>> >>>> 2.6.32-220.13.1.el6.x86_64 kernel. However, I was using the Scientific
> >>>> >>>> Linux 6 distro and NOT CentOS 6.2. Next, I will try a CentOS 6.2
> >>>> >>>> distro and see what happens with it.
> >>>> >>>>
> >>>> >>>> Becky
> >>>> >>>>
> >>>> >>>> On Wed, Jul 18, 2012 at 5:14 PM, Becky Ligon <[email protected]> wrote:
> >>>> >>>>>
> >>>> >>>>> Jim:
> >>>> >>>>>
> >>>> >>>>> Is the mount-on-boot issue just with your CentOS 6.2 environment?
> >>>> >>>>> If so, which version of OrangeFS are you running?
> >>>> >>>>>
> >>>> >>>>> Becky
> >>>> >>>>>
> >>>> >>>>> On Wed, Jul 18, 2012 at 3:28 PM, Jim Kusznir <[email protected]> wrote:
> >>>> >>>>>>
> >>>> >>>>>> I cannot reproduce the pvfs2 crash on demand. I have not yet seen it
> >>>> >>>>>> on centos 6, but I haven't placed centos6 into production yet.
> >>>> >>>>>>
> >>>> >>>>>> On my centos5 systems, it's not reproducible on demand, but it seems
> >>>> >>>>>> to happen with moderate file access from a few different processes.
> >>>> >>>>>> Sometimes scp'ing files to/from pvfs2 on the head node (which is a
> >>>> >>>>>> pvfs2 client) will do it. This has happened since the beginning of
> >>>> >>>>>> pvfs2 for me; on the compute nodes, I'm not sure if there's more than
> >>>> >>>>>> one process, but since I updated to OrangeFS 2.8.5, I've been seeing
> >>>> >>>>>> compute nodes KP with the previous screenshot (it did not crash (that
> >>>> >>>>>> I'm aware of) prior to OrangeFS 2.8.5 on compute nodes).
> >>>> >>>>>>
> >>>> >>>>>> Here's my /etc/init.d/pvfs2-client script:
> >>>> >>>>>> ---------------
> >>>> >>>>>> #!/bin/sh
> >>>> >>>>>> #
> >>>> >>>>>> # chkconfig: 2345 99 99
> >>>> >>>>>> #
> >>>> >>>>>> # description: mount pvfs2 filesystem
> >>>> >>>>>> #
> >>>> >>>>>>
> >>>> >>>>>> . /etc/rc.d/init.d/functions
> >>>> >>>>>> #export LD_PRELOAD=/opt/db4/lib/
> >>>> >>>>>> case "$1" in
> >>>> >>>>>>   start)
> >>>> >>>>>>     echo -n "Mounting PVFS2 Filesystem: "
> >>>> >>>>>>     modprobe pvfs2
> >>>> >>>>>>     /opt/pvfs2/sbin/pvfs2-client -p /opt/pvfs2/sbin/pvfs2-client-core
> >>>> >>>>>>     mkdir -p /mnt/pvfs2
> >>>> >>>>>>     mount -t pvfs2 -o intr tcp://pvfs2-io-0-0:3334/pvfs2-fs /mnt/pvfs2
> >>>> >>>>>>     touch /var/lock/subsys/pvfs2-client
> >>>> >>>>>>     ;;
> >>>> >>>>>>
> >>>> >>>>>>   stop)
> >>>> >>>>>>     echo -n "Unmounting PVFS2 Filesystem: "
> >>>> >>>>>>     umount /mnt/pvfs2
> >>>> >>>>>>     rm -f /var/lock/subsys/pvfs2-client
> >>>> >>>>>>     ;;
> >>>> >>>>>>
> >>>> >>>>>>   restart)
> >>>> >>>>>>     $0 stop
> >>>> >>>>>>     $0 start
> >>>> >>>>>>     ;;
> >>>> >>>>>>
> >>>> >>>>>>   status)
> >>>> >>>>>>     status $NAME
> >>>> >>>>>>     ;;
> >>>> >>>>>>   *)
> >>>> >>>>>>     echo "Usage: $NAME {start|stop|restart|status}"
> >>>> >>>>>>     exit 1
> >>>> >>>>>> esac
> >>>> >>>>>>
> >>>> >>>>>> exit 0
> >>>> >>>>>> ----------------
> >>>> >>>>>> I've tried with the export commented and uncommented, no
> >>>> >>>>>> difference.
> >>>> >>>>>>
> >>>> >>>>>> --Jim
> >>>> >>>>>>
> >>>> >>>>>> On Wed, Jul 18, 2012 at 12:20 PM, Becky Ligon <[email protected]> wrote:
> >>>> >>>>>> > Thanks, Jim.
> >>>> >>>>>> >
> >>>> >>>>>> > We are using 2.6.32-220.4.1.el6.x86_64 in our production
> >>>> >>>>>> > environment. So, I should be able to set up a VM with your kernel
> >>>> >>>>>> > version and test. Can you give me a scenario to try in order to
> >>>> >>>>>> > reproduce the problem?
> >>>> >>>>>> >
> >>>> >>>>>> > I am also setting up a CENTOS 6 VM, so I can analyze the
> >>>> >>>>>> > mount-with-boot issue.
> >>>> >>>>>> >
> >>>> >>>>>> > Becky
> >>>> >>>>>> >
> >>>> >>>>>> > On Wed, Jul 18, 2012 at 3:16 PM, Jim Kusznir <[email protected]> wrote:
> >>>> >>>>>> >>
> >>>> >>>>>> >> [root@aeoltest torque]# rpm -qa |grep kernel
> >>>> >>>>>> >> kernel-2.6.32-220.13.1.el6.x86_64
> >>>> >>>>>> >> dracut-kernel-004-256.el6_2.1.noarch
> >>>> >>>>>> >> kernel-devel-2.6.32-220.13.1.el6.x86_64
> >>>> >>>>>> >> kernel-headers-2.6.32-220.13.1.el6.x86_64
> >>>> >>>>>> >> kernel-firmware-2.6.32-220.13.1.el6.noarch
> >>>> >>>>>> >> kernel-doc-2.6.32-220.13.1.el6.noarch
> >>>> >>>>>> >> [root@aeoltest torque]# uname -a
> >>>> >>>>>> >> Linux aeoltest.local 2.6.32-220.13.1.el6.x86_64 #1 SMP Tue Apr 17 23:56:34 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
> >>>> >>>>>> >> [root@aeoltest torque]#
> >>>> >>>>>> >>
> >>>> >>>>>> >> On Wed, Jul 18, 2012 at 12:10 PM, Becky Ligon <[email protected]> wrote:
> >>>> >>>>>> >> > Jim:
> >>>> >>>>>> >> >
> >>>> >>>>>> >> > We are working on a few corrections to the user library, as we
> >>>> >>>>>> >> > speak, that were identified last week. Using LD_PRELOAD would
> >>>> >>>>>> >> > definitely get around the kernel issues at hand, but I ask that
> >>>> >>>>>> >> > you wait until we have all of the current corrections in place
> >>>> >>>>>> >> > before using it.
> >>>> >>>>>> >> >
> >>>> >>>>>> >> > I also have some questions for you.
> >>>> >>>>>> >> > I am working on the "won't mount on boot" issue and would like
> >>>> >>>>>> >> > to know the specific kernel that you are using under CentOS 6.2.
> >>>> >>>>>> >> >
> >>>> >>>>>> >> > Thanks,
> >>>> >>>>>> >> > Becky
> >>>> >>>>>> >> >
> >>>> >>>>>> >> > On Wed, Jul 18, 2012 at 3:01 PM, Jim Kusznir <[email protected]> wrote:
> >>>> >>>>>> >> >>
> >>>> >>>>>> >> >> I managed to get a screenshot from an ip-kvm with the last
> >>>> >>>>>> >> >> chunk of a pvfs-induced KP on a compute node; image attached.
> >>>> >>>>>> >> >>
> >>>> >>>>>> >> >> With respect to client access methods, perhaps I should switch
> >>>> >>>>>> >> >> to a user space solution. I remember hearing about an
> >>>> >>>>>> >> >> LD_PRELOAD client module (not using fuse, but being entirely
> >>>> >>>>>> >> >> userspace). Is that "ready" with 2.8.6? If not, perhaps I need
> >>>> >>>>>> >> >> to switch to the fuse module...
> >>>> >>>>>> >> >>
> >>>> >>>>>> >> >> --Jim
> >>>> >>>>>> >> >>
> >>>> >>>>>> >> >> On Wed, Jul 18, 2012 at 11:46 AM, Andrew Savchenko <[email protected]> wrote:
> >>>> >>>>>> >> >> > Hello Becky,
> >>>> >>>>>> >> >> >
> >>>> >>>>>> >> >> > On Wed, 18 Jul 2012 12:43:51 -0400 Becky Ligon wrote:
> >>>> >>>>>> >> >> >> Andrew:
> >>>> >>>>>> >> >> >>
> >>>> >>>>>> >> >> >> 2.8.6 does not fix the problem you were seeing with question
> >>>> >>>>>> >> >> >> marks in the "ls" output, but we are working on it.
> >>>> >>>>>> >> >> >>
> >>>> >>>>>> >> >> >> Just FYI!
> >>>> >>>>>> >> >> >
> >>>> >>>>>> >> >> > Thanks for the warning. I'll keep sticking to the fuse client
> >>>> >>>>>> >> >> > during update then.
> >>>> >>>>>> >> >> >
> >>>> >>>>>> >> >> > Best regards,
> >>>> >>>>>> >> >> > Andrew Savchenko
> >>>> >>>>>> >> >> >
> >>>> >>>>>> >> >> > _______________________________________________
> >>>> >>>>>> >> >> > Pvfs2-users mailing list
> >>>> >>>>>> >> >> > [email protected]
> >>>> >>>>>> >> >> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
> >>>> >>>>>> >> >
> >>>> >>>>>> >> > --
> >>>> >>>>>> >> > Becky Ligon
> >>>> >>>>>> >> > OrangeFS Support and Development
> >>>> >>>>>> >> > Omnibond Systems
> >>>> >>>>>> >> > Anderson, South Carolina
--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
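
A minimal sketch of the modules.dep check and the depmod step described at the top of this message, assuming a POSIX shell on the node. The function names module_listed and ensure_module_indexed are hypothetical helpers introduced for illustration; they are not part of OrangeFS or of the rpms discussed in this thread. Only the module path "kernel/fs/pvfs2/pvfs2.ko" and the use of depmod come from the thread itself.

```shell
#!/bin/sh
# Sketch only. Helper names are hypothetical, not part of OrangeFS.

# module_listed DEPFILE MODULE_PATH
# Succeeds (exit 0) if DEPFILE has an entry for MODULE_PATH.
# Entries in modules.dep have the form "<module path>: <dependencies>",
# so a fixed-string search for "<module path>:" finds the entry whether
# the path is recorded relative or with a leading directory.
# Returns 2 if DEPFILE cannot be read, 1 if the module is not listed.
module_listed() {
    depfile=$1
    module=$2
    [ -r "$depfile" ] || return 2
    grep -q -F "${module}:" "$depfile"
}

# ensure_module_indexed KERNEL_VERSION
# Runs depmod for KERNEL_VERSION only when pvfs2.ko is not yet listed
# in that kernel's modules.dep. Must be run as root to have any effect.
ensure_module_indexed() {
    kver=$1
    depfile="/lib/modules/${kver}/modules.dep"
    if module_listed "$depfile" "kernel/fs/pvfs2/pvfs2.ko"; then
        echo "pvfs2.ko already listed in ${depfile}"
    else
        echo "pvfs2.ko not listed in ${depfile}; running depmod"
        /sbin/depmod -a "$kver"
    fi
}
```

In a post-install step or near the top of the init script, this could be called as ensure_module_indexed "$(uname -r)" before "modprobe pvfs2". It only applies when pvfs2.ko lives under /lib/modules/<version>/kernel/fs/pvfs2; for a module installed elsewhere, the insmod route mentioned earlier in the thread is the one to use.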
