Jim:

The config files that you sent me are for older PVFS systems. We no longer use the two-config-file approach, and the config file should have a data space and a storage space. Did you send me the wrong files?

Becky

On Thu, Jul 26, 2012 at 11:48 AM, Becky Ligon <[email protected]> wrote:
> Jim:
>
> Your 2.8.5 rpms install the kernel module in
> /lib/modules/2.6.18-92.1.13.el5/kernel/fs/pvfs2/pvfs2.ko and 2.8.6 into
> /lib/modules/2.6.32-220.13.1.el6.x86_64/kernel/fs/pvfs2/pvfs2.ko. Please
> verify that:
>
> /lib/modules/2.6.18-92.1.13.el5/modules.dep contains
> "kernel/fs/pvfs2/pvfs2.ko"
>
> /lib/modules/2.6.32-220.13.1.el6.x86_64/modules.dep contains
> "kernel/fs/pvfs2/pvfs2.ko"
>
> You might have to modify your rebuild scripts to execute a "depmod" AFTER
> the orangefs-kmod rpm is installed. Your scripts may already be doing
> this, and, if so, then the kernel module should be loaded after an
> install. You did mention that you thought the kernel module was being
> loaded properly. If that is the case, then adding the "LD_LIBRARY_PATH" to
> your OrangeFS init.d script should allow the client-core to fire up
> properly and then the following mount.
>
> Let me know if this works for you.
>
> Becky
>
> On Wed, Jul 25, 2012 at 5:50 PM, Jim Kusznir <[email protected]> wrote:
>
>> Here's the last file.
>>
>> On Wed, Jul 25, 2012 at 10:06 AM, Becky Ligon <[email protected]> wrote:
>> > Jim:
>> >
>> > One more thing: can you send me the pvfs2-client.log files from the nodes
>> > where a KP has occurred? If possible, I'd like the corresponding
>> > /var/log/messages log file from when the KP happened.
>> >
>> > Thanks,
>> > Becky
>> >
>> >
>> > On Wed, Jul 25, 2012 at 1:05 PM, Becky Ligon <[email protected]> wrote:
>> >>
>> >> Jim:
>> >>
>> >> Can you also send me your PVFS server config file?
>> >>
>> >> Becky
>> >>
>> >>
>> >> On Wed, Jul 25, 2012 at 12:49 PM, Becky Ligon <[email protected]> wrote:
>> >>>
>> >>> Jim:
>> >>>
>> >>> Can you send me the kmod-pvfs2-...rpm? I'd like to see how its files are
>> >>> laid out.
>> >>>
>> >>> Thanks,
>> >>> Becky
>> >>>
>> >>>
>> >>> On Sat, Jul 21, 2012 at 4:46 PM, Jim Kusznir <[email protected]> wrote:
>> >>>>
>> >>>> Hi Becky:
>> >>>>
>> >>>> Thanks for all your input. I was on travel and am currently catching
>> >>>> up on e-mail, so here are answers to your questions:
>> >>>>
>> >>>> 1) this problem occurs on both my ROCKS 5.1 (CentOS 5.2) and ROCKS 6
>> >>>> (CentOS 6.2) clusters identically.
>> >>>> 2) I can mount manually using the init script. It just will not run
>> >>>> on boot. It tries, but fails with the error message supplied.
>> >>>> 3) The module is installed with a kmod-pvfs2-... rpm (as is required
>> >>>> for ROCKS clusters...Any software to be installed on each node needs
>> >>>> to be its own RPM). It appears to me that the module is being loaded
>> >>>> successfully.
>> >>>> 4) Ok, that sounds plausible. I'll make those corrections and see if
>> >>>> that fixes things.
>> >>>>
>> >>>> Of course, the mount on boot was one of two show-stopping issues. The
>> >>>> second show-stopping issue is how many kernel panics are being caused
>> >>>> by OrangeFS. I've been experiencing 3-8 KP's a week on a light to
>> >>>> moderate load on my cluster (24 nodes + head node, 3 pvfs nodes).
>> >>>>
>> >>>> My versions in use are: 2.8.5 (ROCKS 5.1), 2.8.6 (ROCKS 6). For my
>> >>>> users, I absolutely must have a "traditional filesystem interface"
>> >>>> (eg, MPI-IO or pvfs-* commands are not acceptable, they need to work
>> >>>> on the files like they would for any other filesystem).
>> >>>>
>> >>>> --Jim
>> >>>>
>> >>>> On Fri, Jul 20, 2012 at 1:45 PM, Becky Ligon <[email protected]> wrote:
>> >>>> > Jim:
>> >>>> >
>> >>>> > In your init script, you need to add the LD_LIBRARY_PATH variable, since
>> >>>> > your pvfs library is not in a standard location:
>> >>>> >
>> >>>> > export LD_LIBRARY_PATH=/opt/pvfs2/lib:$LD_LIBRARY_PATH
>> >>>> >
>> >>>> > Remove the LD_PRELOAD. It is not needed here.
>> >>>> >
>> >>>> > Before "modprobe" will work, you have to run the command "depmod" to update
>> >>>> > the modules list. The "make kmod_install" does not automatically do this.
>> >>>> > NOTE: if you place the kernel module (pvfs2.ko) somewhere other than
>> >>>> > /lib/modules/`uname -r`/kernel/fs/pvfs2, then you can't use modprobe to load
>> >>>> > the module. Instead, use "/sbin/insmod <path>/pvfs2.ko". If you are using
>> >>>> > the rpm spec that I gave you (and it looks like you are), then pvfs2.ko is
>> >>>> > located in /opt/pvfs2/lib/pvfs2.ko, in which case, you have to use the
>> >>>> > "insmod" command to load it and the "rmmod" command to unload it.
>> >>>> >
>> >>>> > When you issue a "stop", your script does not stop the client nor does it
>> >>>> > unload the kernel module. This will cause problems if you issue a "start"
>> >>>> > by starting another pvfs2-client. I will send you the init script that we
>> >>>> > use here. Maybe, you can modify it to accommodate your environment. We
>> >>>> > have more checks in it than you have in yours.
>> >>>> >
>> >>>> > I am not familiar with how PVFS reacts to the "intr" option that you specify
>> >>>> > in the mount command. What is its purpose?
>> >>>> >
>> >>>> > Becky
>> >>>> >
>> >>>> >
>> >>>> > On Fri, Jul 20, 2012 at 3:27 PM, Becky Ligon <[email protected]> wrote:
>> >>>> >>
>> >>>> >> Jim:
>> >>>> >>
>> >>>> >> I just realized that you have already sent me your init script. Let me
>> >>>> >> take a closer look at it.
>> >>>> >>
>> >>>> >> Becky
>> >>>> >>
>> >>>> >>
>> >>>> >> On Fri, Jul 20, 2012 at 3:13 PM, Becky Ligon <[email protected]> wrote:
>> >>>> >>>
>> >>>> >>> Jim:
>> >>>> >>>
>> >>>> >>> I have successfully booted my CentOS 6.2 system (using
>> >>>> >>> 2.6.32-220.13.1.el6.x86_64) and started the PVFS2 server and mounted the
>> >>>> >>> client. Thus, I can only guess that there is something in your environment
>> >>>> >>> causing the problem. Is it possible for you to mount the client by issuing
>> >>>> >>> the commands manually once the system is running? Can you send me a copy of
>> >>>> >>> your startup script for mounting the client from your /etc/init.d
>> >>>> >>> directory?
>> >>>> >>>
>> >>>> >>> Becky
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> On Thu, Jul 19, 2012 at 12:58 PM, Becky Ligon <[email protected]> wrote:
>> >>>> >>>>
>> >>>> >>>> Jim:
>> >>>> >>>>
>> >>>> >>>> I have been able to successfully mount-on-boot on a VM with the
>> >>>> >>>> 2.6.32-220.13.1.el6.x86_64. However, I was using the Scientific Linux 6
>> >>>> >>>> distro and NOT CentOS 6.2. Next, I will try a CentOS 6.2 distro and see
>> >>>> >>>> what happens with it.
>> >>>> >>>>
>> >>>> >>>> Becky
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>> On Wed, Jul 18, 2012 at 5:14 PM, Becky Ligon <[email protected]> wrote:
>> >>>> >>>>>
>> >>>> >>>>> Jim:
>> >>>> >>>>>
>> >>>> >>>>> Is the mount-on-boot issue just with your CentOS 6.2 environment? If
>> >>>> >>>>> so, which version of OrangeFS are you running?
>> >>>> >>>>>
>> >>>> >>>>> Becky
>> >>>> >>>>>
>> >>>> >>>>>
>> >>>> >>>>> On Wed, Jul 18, 2012 at 3:28 PM, Jim Kusznir <[email protected]> wrote:
>> >>>> >>>>>>
>> >>>> >>>>>> I cannot reproduce the pvfs2 crash on demand. I have not yet seen it
>> >>>> >>>>>> on centos 6, but I haven't placed centos6 into production yet.
>> >>>> >>>>>>
>> >>>> >>>>>> On my centos5 systems, it's not reproducible on demand, but it seems to
>> >>>> >>>>>> happen with moderate file access from a few different processes.
>> >>>> >>>>>> Sometimes scp'ing files to/from pvfs2 on the head node (which is a
>> >>>> >>>>>> pvfs2 client) will do it. This has happened since the beginning of
>> >>>> >>>>>> pvfs2 for me; on the compute nodes, I'm not sure if there's more than
>> >>>> >>>>>> one process, but since I updated to OrangeFS 2.8.5, I've been seeing
>> >>>> >>>>>> compute nodes KP with the previous screenshot (it did not crash (that
>> >>>> >>>>>> I'm aware of) prior to OrangeFS 2.8.5 on compute nodes).
>> >>>> >>>>>>
>> >>>> >>>>>> Here's my /etc/init.d/pvfs2-client script:
>> >>>> >>>>>> ---------------
>> >>>> >>>>>> #!/bin/sh
>> >>>> >>>>>> #
>> >>>> >>>>>> # chkconfig: 2345 99 99
>> >>>> >>>>>> #
>> >>>> >>>>>> # description: mount pvfs2 filesystem
>> >>>> >>>>>> #
>> >>>> >>>>>>
>> >>>> >>>>>> . /etc/rc.d/init.d/functions
>> >>>> >>>>>> #export LD_PRELOAD=/opt/db4/lib/
>> >>>> >>>>>> case "$1" in
>> >>>> >>>>>>   start)
>> >>>> >>>>>>         echo -n "Mounting PVFS2 Filesystem: "
>> >>>> >>>>>>         modprobe pvfs2
>> >>>> >>>>>>         /opt/pvfs2/sbin/pvfs2-client -p /opt/pvfs2/sbin/pvfs2-client-core
>> >>>> >>>>>>         mkdir -p /mnt/pvfs2
>> >>>> >>>>>>         mount -t pvfs2 -o intr tcp://pvfs2-io-0-0:3334/pvfs2-fs /mnt/pvfs2
>> >>>> >>>>>>         touch /var/lock/subsys/pvfs2-client
>> >>>> >>>>>>         ;;
>> >>>> >>>>>>
>> >>>> >>>>>>   stop)
>> >>>> >>>>>>         echo -n "Unmounting PVFS2 Filesystem: "
>> >>>> >>>>>>         umount /mnt/pvfs2
>> >>>> >>>>>>         rm -f /var/lock/subsys/pvfs2-client
>> >>>> >>>>>>         ;;
>> >>>> >>>>>>
>> >>>> >>>>>>   restart)
>> >>>> >>>>>>         $0 stop
>> >>>> >>>>>>         $0 start
>> >>>> >>>>>>         ;;
>> >>>> >>>>>>
>> >>>> >>>>>>   status)
>> >>>> >>>>>>         status $NAME
>> >>>> >>>>>>         ;;
>> >>>> >>>>>>   *)
>> >>>> >>>>>>         echo "Usage: $NAME {start|stop|restart|status}"
>> >>>> >>>>>>         exit 1
>> >>>> >>>>>> esac
>> >>>> >>>>>>
>> >>>> >>>>>> exit 0
>> >>>> >>>>>> ----------------
>> >>>> >>>>>> I've tried with the export commented and uncommented, no difference.
>> >>>> >>>>>>
>> >>>> >>>>>> --Jim
>> >>>> >>>>>>
>> >>>> >>>>>> On Wed, Jul 18, 2012 at 12:20 PM, Becky Ligon <[email protected]> wrote:
>> >>>> >>>>>> > Thanks, Jim.
>> >>>> >>>>>> >
>> >>>> >>>>>> > We are using 2.6.32-220.4.1.el6.x86_64 in our production environment. So, I
>> >>>> >>>>>> > should be able to setup a VM with your kernel version and test. Can you
>> >>>> >>>>>> > give me a scenario to try in order to reproduce the problem?
>> >>>> >>>>>> >
>> >>>> >>>>>> > I am also setting up a CENTOS 6 VM, so I can analyze the mount-with-boot
>> >>>> >>>>>> > issue.
>> >>>> >>>>>> >
>> >>>> >>>>>> > Becky
>> >>>> >>>>>> >
>> >>>> >>>>>> >
>> >>>> >>>>>> > On Wed, Jul 18, 2012 at 3:16 PM, Jim Kusznir <[email protected]> wrote:
>> >>>> >>>>>> >>
>> >>>> >>>>>> >> [root@aeoltest torque]# rpm -qa |grep kernel
>> >>>> >>>>>> >> kernel-2.6.32-220.13.1.el6.x86_64
>> >>>> >>>>>> >> dracut-kernel-004-256.el6_2.1.noarch
>> >>>> >>>>>> >> kernel-devel-2.6.32-220.13.1.el6.x86_64
>> >>>> >>>>>> >> kernel-headers-2.6.32-220.13.1.el6.x86_64
>> >>>> >>>>>> >> kernel-firmware-2.6.32-220.13.1.el6.noarch
>> >>>> >>>>>> >> kernel-doc-2.6.32-220.13.1.el6.noarch
>> >>>> >>>>>> >> [root@aeoltest torque]# uname -a
>> >>>> >>>>>> >> Linux aeoltest.local 2.6.32-220.13.1.el6.x86_64 #1 SMP Tue Apr 17 23:56:34 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
>> >>>> >>>>>> >> [root@aeoltest torque]#
>> >>>> >>>>>> >>
>> >>>> >>>>>> >>
>> >>>> >>>>>> >> On Wed, Jul 18, 2012 at 12:10 PM, Becky Ligon <[email protected]> wrote:
>> >>>> >>>>>> >> > Jim:
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> > We are working on a few corrections to the user library, as we speak, that
>> >>>> >>>>>> >> > were identified last week. Using LD_PRELOAD would definitely get around the
>> >>>> >>>>>> >> > kernel issues at hand, but I ask that you wait until we have all of the
>> >>>> >>>>>> >> > current corrections in place before using it.
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> > I also have some questions for you. I am working the issue with the "won't
>> >>>> >>>>>> >> > mount on boot" issue and would like to know the specific kernel that you are
>> >>>> >>>>>> >> > using under CentOS 6.2.
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> > Thanks,
>> >>>> >>>>>> >> > Becky
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> > On Wed, Jul 18, 2012 at 3:01 PM, Jim Kusznir <[email protected]> wrote:
>> >>>> >>>>>> >> >>
>> >>>> >>>>>> >> >> I managed to get a screenshot of a ip-kvm with the last chunk of a
>> >>>> >>>>>> >> >> pvfs-induced KP on a compute node; image attached.
>> >>>> >>>>>> >> >>
>> >>>> >>>>>> >> >> With respect to client access methods, perhaps I should switch to a
>> >>>> >>>>>> >> >> user space solution. I remember hearing about an LD_Preload client
>> >>>> >>>>>> >> >> module (not using fuse, but being entirely userspace). Is that
>> >>>> >>>>>> >> >> "ready" with 2.8.6? If not, perhaps I need to switch to the fuse
>> >>>> >>>>>> >> >> module...
>> >>>> >>>>>> >> >>
>> >>>> >>>>>> >> >> --Jim
>> >>>> >>>>>> >> >>
>> >>>> >>>>>> >> >> On Wed, Jul 18, 2012 at 11:46 AM, Andrew Savchenko <[email protected]> wrote:
>> >>>> >>>>>> >> >> > Hello Becky,
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >> > On Wed, 18 Jul 2012 12:43:51 -0400 Becky Ligon wrote:
>> >>>> >>>>>> >> >> >> Andrew:
>> >>>> >>>>>> >> >> >>
>> >>>> >>>>>> >> >> >> 2.8.6 does not fix the problem you were seeing with question marks in the
>> >>>> >>>>>> >> >> >> "ls" output, but we are working on it.
>> >>>> >>>>>> >> >> >>
>> >>>> >>>>>> >> >> >> Just FYI!
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >> > Thanks for the warning. I'll keep sticking to the fuse client during
>> >>>> >>>>>> >> >> > update then.
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >> > Best regards,
>> >>>> >>>>>> >> >> > Andrew Savchenko
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >> > _______________________________________________
>> >>>> >>>>>> >> >> > Pvfs2-users mailing list
>> >>>> >>>>>> >> >> > [email protected]
>> >>>> >>>>>> >> >> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
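
For anyone following the thread, the modules.dep check and the modprobe/insmod choice discussed above can be sketched roughly as below. This is only an illustration, not the init script used at Omnibond: the function names and the PVFS2_KO_FALLBACK variable are made up for this example, and the fallback path is simply the /opt/pvfs2/lib location mentioned in the thread. The sketch prints the command it would pick instead of running it, so it is safe to try as a normal user.

```shell
#!/bin/sh
# Sketch only: choose between modprobe and insmod for pvfs2.ko.
# PVFS2_KO_FALLBACK is a hypothetical variable; its default is the
# /opt/pvfs2/lib location mentioned in the thread.
PVFS2_KO_FALLBACK="${PVFS2_KO_FALLBACK:-/opt/pvfs2/lib/pvfs2.ko}"

# Succeeds (exit 0) if the given modules.dep file mentions the module.
# No leading anchor: some module tool versions write absolute paths.
module_in_dep() {
    [ -r "$1" ] && grep -q 'kernel/fs/pvfs2/pvfs2\.ko' "$1"
}

# Prints the load command an init script could use; does not run it.
load_command() {
    if module_in_dep "$1"; then
        echo "modprobe pvfs2"
    else
        echo "insmod $PVFS2_KO_FALLBACK"
    fi
}

# Example: check the modules.dep of the running kernel.
load_command "/lib/modules/$(uname -r)/modules.dep"
```

If the output is the insmod line on a node where the kmod rpm placed pvfs2.ko under /lib/modules, that suggests "depmod" has not been run since the rpm was installed.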
