Jim: Can you send me the kmod-pvfs2-...rpm? I'd like to see how its files are laid out.
Thanks,
Becky

On Sat, Jul 21, 2012 at 4:46 PM, Jim Kusznir <[email protected]> wrote:
> Hi Becky:
>
> Thanks for all your input. I was on travel and am currently catching
> up on e-mail, so here are answers to your questions:
>
> 1) this problem occurs on both my ROCKS 5.1 (CentOS 5.2) and ROCKS 6
> (CentOS 6.2) clusters identically.
> 2) I can mount manually using the init script. It just will not run
> on boot. It tries, but fails with the error message supplied.
> 3) The module is installed with a kmod-pvfs2-... rpm (as is required
> for ROCKS clusters...Any software to be installed on each node needs
> to be its own RPM). It appears to me that the module is being loaded
> successfully.
> 4) Ok, that sounds plausible. I'll make those corrections and see if
> that fixes things.
>
> Of course, the mount on boot was one of two show-stopping issues. The
> second show-stopping issue is how many kernel panics are being caused
> by OrangeFS. I've been experiencing 3-8 KP's a week on a light to
> moderate load on my cluster (24 nodes + head node, 3 pvfs nodes).
>
> My versions in use are: 2.8.5 (ROCKS 5.1), 2.8.6 (ROCKS 6). For my
> users, I absolutely must have a "traditional filesystem interface"
> (e.g., MPI-IO or pvfs-* commands are not acceptable, they need to work
> on the files like they would for any other filesystem).
>
> --Jim
>
> On Fri, Jul 20, 2012 at 1:45 PM, Becky Ligon <[email protected]> wrote:
> > Jim:
> >
> > In your init script, you need to add the LD_LIBRARY_PATH variable, since
> > your pvfs library is not in a standard location:
> >
> > export LD_LIBRARY_PATH=/opt/pvfs2/lib:$LD_LIBRARY_PATH
> >
> > Remove the LD_PRELOAD. It is not needed here.
> >
> > Before "modprobe" will work, you have to run the command "depmod" to update
> > the modules list. The "make kmod_install" does not automatically do this.
> >
> > NOTE: if you place the kernel module (pvfs2.ko) somewhere other than
> > /lib/modules/`uname -r`/kernel/fs/pvfs2, then you can't use modprobe to load
> > the module. Instead, use "/sbin/insmod <path>/pvfs2.ko". If you are using
> > the rpm spec that I gave you (and it looks like you are), then pvfs2.ko is
> > located in /opt/pvfs2/lib/pvfs2.ko, in which case, you have to use the
> > "insmod" command to load it and the "rmmod" command to unload it.
> >
> > When you issue a "stop", your script does not stop the client nor does it
> > unload the kernel module. This will cause problems if you then issue a
> > "start", because it starts another pvfs2-client. I will send you the init
> > script that we use here. Maybe you can modify it to accommodate your
> > environment. We have more checks in it than you have in yours.
> >
> > I am not familiar with how PVFS reacts to the "intr" option that you specify
> > in the mount command. What is its purpose?
> >
> > Becky
> >
> > On Fri, Jul 20, 2012 at 3:27 PM, Becky Ligon <[email protected]> wrote:
> >>
> >> Jim:
> >>
> >> I just realized that you have already sent me your init script. Let me
> >> take a closer look at it.
> >>
> >> Becky
> >>
> >> On Fri, Jul 20, 2012 at 3:13 PM, Becky Ligon <[email protected]> wrote:
> >>>
> >>> Jim:
> >>>
> >>> I have successfully booted my CentOS 6.2 system (using
> >>> 2.6.32-220.13.1.el6.x86_64) and started the PVFS2 server and mounted the
> >>> client. Thus, I can only guess that there is something in your environment
> >>> causing the problem. Is it possible for you to mount the client by issuing
> >>> the commands manually once the system is running? Can you send me a copy of
> >>> your startup script for mounting the client from your /etc/init.d directory?
> >>>
> >>> Becky
> >>>
> >>> On Thu, Jul 19, 2012 at 12:58 PM, Becky Ligon <[email protected]> wrote:
> >>>>
> >>>> Jim:
> >>>>
> >>>> I have been able to successfully mount-on-boot on a VM with the
> >>>> 2.6.32-220.13.1.el6.x86_64 kernel. However, I was using the Scientific
> >>>> Linux 6 distro and NOT CentOS 6.2. Next, I will try a CentOS 6.2 distro
> >>>> and see what happens with it.
> >>>>
> >>>> Becky
> >>>>
> >>>> On Wed, Jul 18, 2012 at 5:14 PM, Becky Ligon <[email protected]> wrote:
> >>>>>
> >>>>> Jim:
> >>>>>
> >>>>> Is the mount-on-boot issue just with your CentOS 6.2 environment? If
> >>>>> so, which version of OrangeFS are you running?
> >>>>>
> >>>>> Becky
> >>>>>
> >>>>> On Wed, Jul 18, 2012 at 3:28 PM, Jim Kusznir <[email protected]> wrote:
> >>>>>>
> >>>>>> I cannot reproduce the pvfs2 crash on demand. I have not yet seen it
> >>>>>> on centos 6, but I haven't placed centos6 into production yet.
> >>>>>>
> >>>>>> On my centos5 systems, it's not reproducible on demand, but it seems to
> >>>>>> happen with moderate file access from a few different processes.
> >>>>>> Sometimes scp'ing files to/from pvfs2 on the head node (which is a
> >>>>>> pvfs2 client) will do it. This has happened since the beginning of
> >>>>>> pvfs2 for me; on the compute nodes, I'm not sure if there's more than
> >>>>>> one process, but since I updated to OrangeFS 2.8.5, I've been seeing
> >>>>>> compute nodes KP with the previous screenshot (it did not crash (that
> >>>>>> I'm aware of) prior to OrangeFS 2.8.5 on compute nodes).
> >>>>>>
> >>>>>> Here's my /etc/init.d/pvfs2-client script:
> >>>>>> ---------------
> >>>>>> #!/bin/sh
> >>>>>> #
> >>>>>> # chkconfig: 2345 99 99
> >>>>>> #
> >>>>>> # description: mount pvfs2 filesystem
> >>>>>> #
> >>>>>>
> >>>>>> . /etc/rc.d/init.d/functions
> >>>>>> #export LD_PRELOAD=/opt/db4/lib/
> >>>>>> case "$1" in
> >>>>>>   start)
> >>>>>>     echo -n "Mounting PVFS2 Filesystem: "
> >>>>>>     modprobe pvfs2
> >>>>>>     /opt/pvfs2/sbin/pvfs2-client -p /opt/pvfs2/sbin/pvfs2-client-core
> >>>>>>     mkdir -p /mnt/pvfs2
> >>>>>>     mount -t pvfs2 -o intr tcp://pvfs2-io-0-0:3334/pvfs2-fs /mnt/pvfs2
> >>>>>>     touch /var/lock/subsys/pvfs2-client
> >>>>>>     ;;
> >>>>>>
> >>>>>>   stop)
> >>>>>>     echo -n "Unmounting PVFS2 Filesystem: "
> >>>>>>     umount /mnt/pvfs2
> >>>>>>     rm -f /var/lock/subsys/pvfs2-client
> >>>>>>     ;;
> >>>>>>
> >>>>>>   restart)
> >>>>>>     $0 stop
> >>>>>>     $0 start
> >>>>>>     ;;
> >>>>>>
> >>>>>>   status)
> >>>>>>     status $NAME
> >>>>>>     ;;
> >>>>>>   *)
> >>>>>>     echo "Usage: $NAME {start|stop|restart|status}"
> >>>>>>     exit 1
> >>>>>> esac
> >>>>>>
> >>>>>> exit 0
> >>>>>> ----------------
> >>>>>> I've tried with the export commented and uncommented, no difference.
> >>>>>>
> >>>>>> --Jim
> >>>>>>
> >>>>>> On Wed, Jul 18, 2012 at 12:20 PM, Becky Ligon <[email protected]> wrote:
> >>>>>> > Thanks, Jim.
> >>>>>> >
> >>>>>> > We are using 2.6.32-220.4.1.el6.x86_64 in our production environment. So, I
> >>>>>> > should be able to set up a VM with your kernel version and test. Can you
> >>>>>> > give me a scenario to try in order to reproduce the problem?
> >>>>>> >
> >>>>>> > I am also setting up a CENTOS 6 VM, so I can analyze the mount-with-boot
> >>>>>> > issue.
> >>>>>> >
> >>>>>> > Becky
> >>>>>> >
> >>>>>> > On Wed, Jul 18, 2012 at 3:16 PM, Jim Kusznir <[email protected]> wrote:
> >>>>>> >>
> >>>>>> >> [root@aeoltest torque]# rpm -qa |grep kernel
> >>>>>> >> kernel-2.6.32-220.13.1.el6.x86_64
> >>>>>> >> dracut-kernel-004-256.el6_2.1.noarch
> >>>>>> >> kernel-devel-2.6.32-220.13.1.el6.x86_64
> >>>>>> >> kernel-headers-2.6.32-220.13.1.el6.x86_64
> >>>>>> >> kernel-firmware-2.6.32-220.13.1.el6.noarch
> >>>>>> >> kernel-doc-2.6.32-220.13.1.el6.noarch
> >>>>>> >> [root@aeoltest torque]# uname -a
> >>>>>> >> Linux aeoltest.local 2.6.32-220.13.1.el6.x86_64 #1 SMP Tue Apr 17 23:56:34 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
> >>>>>> >> [root@aeoltest torque]#
> >>>>>> >>
> >>>>>> >> On Wed, Jul 18, 2012 at 12:10 PM, Becky Ligon <[email protected]> wrote:
> >>>>>> >> > Jim:
> >>>>>> >> >
> >>>>>> >> > We are working on a few corrections to the user library, as we speak, that
> >>>>>> >> > were identified last week. Using LD_PRELOAD would definitely get around the
> >>>>>> >> > kernel issues at hand, but I ask that you wait until we have all of the
> >>>>>> >> > current corrections in place before using it.
> >>>>>> >> >
> >>>>>> >> > I also have some questions for you. I am working on the "won't mount on
> >>>>>> >> > boot" issue and would like to know the specific kernel that you are
> >>>>>> >> > using under CentOS 6.2.
> >>>>>> >> >
> >>>>>> >> > Thanks,
> >>>>>> >> > Becky
> >>>>>> >> >
> >>>>>> >> > On Wed, Jul 18, 2012 at 3:01 PM, Jim Kusznir <[email protected]> wrote:
> >>>>>> >> >>
> >>>>>> >> >> I managed to get a screenshot of an ip-kvm with the last chunk of a
> >>>>>> >> >> pvfs-induced KP on a compute node; image attached.
> >>>>>> >> >>
> >>>>>> >> >> With respect to client access methods, perhaps I should switch to a
> >>>>>> >> >> user space solution. I remember hearing about an LD_Preload client
> >>>>>> >> >> module (not using fuse, but being entirely userspace). Is that
> >>>>>> >> >> "ready" with 2.8.6? If not, perhaps I need to switch to the fuse
> >>>>>> >> >> module...
> >>>>>> >> >>
> >>>>>> >> >> --Jim
> >>>>>> >> >>
> >>>>>> >> >> On Wed, Jul 18, 2012 at 11:46 AM, Andrew Savchenko <[email protected]> wrote:
> >>>>>> >> >> > Hello Becky,
> >>>>>> >> >> >
> >>>>>> >> >> > On Wed, 18 Jul 2012 12:43:51 -0400 Becky Ligon wrote:
> >>>>>> >> >> >> Andrew:
> >>>>>> >> >> >>
> >>>>>> >> >> >> 2.8.6 does not fix the problem you were seeing with question marks in the
> >>>>>> >> >> >> "ls" output, but we are working on it.
> >>>>>> >> >> >>
> >>>>>> >> >> >> Just FYI!
> >>>>>> >> >> >
> >>>>>> >> >> > Thanks for the warning. I'll keep sticking to the fuse client during
> >>>>>> >> >> > update then.
> >>>>>> >> >> >
> >>>>>> >> >> > Best regards,
> >>>>>> >> >> > Andrew Savchenko
> >>>>>> >> >> >
> >>>>>> >> >> > _______________________________________________
> >>>>>> >> >> > Pvfs2-users mailing list
> >>>>>> >> >> > [email protected]
> >>>>>> >> >> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
> >>>>>> >> >
> >>>>>> >> > --
> >>>>>> >> > Becky Ligon
> >>>>>> >> > OrangeFS Support and Development
> >>>>>> >> > Omnibond Systems
> >>>>>> >> > Anderson, South Carolina

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
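
For reference, the suggestions quoted above (set LD_LIBRARY_PATH, load the module with insmod when it lives outside /lib/modules, and have "stop" shut down the client and unload the module) could be folded into the init script roughly as follows. This is only a sketch under several assumptions: the module path /opt/pvfs2/lib/pvfs2.ko is the location guessed at in the thread and has not been confirmed against the kmod-pvfs2 rpm; stopping the client with killall and pausing before rmmod are assumptions, since the thread does not say how the client should be stopped; the DRY_RUN switch and the function names are hypothetical helpers added so the commands can be inspected without touching the system. The "intr" option is carried over unchanged from the quoted script.

```shell
#!/bin/sh
#
# chkconfig: 2345 99 99
# description: mount pvfs2 filesystem (sketch, not the OrangeFS-supplied script)

# Paths and the server URL are copied from the script quoted in this thread.
PVFS2_PREFIX=/opt/pvfs2
PVFS2_MODULE=$PVFS2_PREFIX/lib/pvfs2.ko   # assumed location, unconfirmed
PVFS2_MOUNT=/mnt/pvfs2
PVFS2_URL=tcp://pvfs2-io-0-0:3334/pvfs2-fs
LOCKFILE=/var/lock/subsys/pvfs2-client

# The pvfs library is not in a standard location. The :+ form avoids a
# trailing colon when LD_LIBRARY_PATH was previously unset.
export LD_LIBRARY_PATH="$PVFS2_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

start() {
    # modprobe only finds modules under /lib/modules/`uname -r` (after depmod),
    # so a module kept elsewhere is loaded by explicit path with insmod.
    run /sbin/insmod "$PVFS2_MODULE" &&
    run "$PVFS2_PREFIX/sbin/pvfs2-client" -p "$PVFS2_PREFIX/sbin/pvfs2-client-core" &&
    run mkdir -p "$PVFS2_MOUNT" &&
    run mount -t pvfs2 -o intr "$PVFS2_URL" "$PVFS2_MOUNT" &&
    run touch "$LOCKFILE"
}

stop() {
    run umount "$PVFS2_MOUNT"
    run killall pvfs2-client   # assumption: the client is stopped by signal
    run sleep 1                # assumption: give the client a moment to exit
    run /sbin/rmmod pvfs2
    run rm -f "$LOCKFILE"
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    case "$1" in
        start)   start ;;
        stop)    stop ;;
        restart) stop; start ;;
        *)       echo "Usage: $0 {start|stop|restart}"; exit 1 ;;
    esac
fi
```

Running it as "DRY_RUN=1 sh pvfs2-client start" prints the five commands without executing them. If the kmod rpm turns out to install pvfs2.ko under /lib/modules/`uname -r`/kernel/fs/pvfs2, the insmod line would instead be "depmod" (once, after installing) followed by "modprobe pvfs2".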
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
