Jim:

Can you also send me your PVFS server config file?

Becky

On Wed, Jul 25, 2012 at 12:49 PM, Becky Ligon <[email protected]> wrote:

> Jim:
>
> Can you send me the kmod-pvfs2-...rpm? I'd like to see how its files are
> laid out.
>
> Thanks,
> Becky
>
>
> On Sat, Jul 21, 2012 at 4:46 PM, Jim Kusznir <[email protected]> wrote:
>
>> Hi Becky:
>>
>> Thanks for all your input. I was on travel and am currently catching
>> up on e-mail, so here are answers to your questions:
>>
>> 1) this problem occurs on both my ROCKS 5.1 (CentOS 5.2) and ROCKS 6
>> (CentOS 6.2) clusters identically.
>> 2) I can mount manually using the init script. It just will not run
>> on boot. It tries, but fails with the error message supplied.
>> 3) The module is installed with a kmod-pvfs2-... rpm (as is required
>> for ROCKS clusters...Any software to be installed on each node needs
>> to be its own RPM). It appears to me that the module is being loaded
>> successfully.
>> 4) Ok, that sounds plausible. I'll make those corrections and see if
>> that fixes things.
>>
>> Of course, the mount on boot was one of two show-stopping issues. The
>> second show-stopping issue is how many kernel panics are being caused
>> by OrangeFS. I've been experiencing 3-8 KP's a week on a light to
>> moderate load on my cluster (24 nodes + head node, 3 pvfs nodes).
>>
>> My versions in use are: 2.8.5 (ROCKS 5.1), 2.8.6 (ROCKS 6). For my
>> users, I absolutely must have a "traditional filesystem interface"
>> (e.g., MPI-IO or pvfs-* commands are not acceptable, they need to work
>> on the files like they would for any other filesystem).
>>
>> --Jim
>>
>> On Fri, Jul 20, 2012 at 1:45 PM, Becky Ligon <[email protected]> wrote:
>> > Jim:
>> >
>> > In your init script, you need to add the LD_LIBRARY_PATH variable, since
>> > your pvfs library is not in a standard location:
>> >
>> > export LD_LIBRARY_PATH=/opt/pvfs2/lib:$LD_LIBRARY_PATH
>> >
>> > Remove the LD_PRELOAD. It is not needed here.
>> >
>> > Before "modprobe" will work, you have to run the command "depmod" to update
>> > the modules list. The "make kmod_install" does not automatically do this.
>> > NOTE: if you place the kernel module (pvfs2.ko) somewhere other than
>> > /lib/modules/`uname -r`/kernel/fs/pvfs2, then you can't use modprobe to load
>> > the module. Instead, use "/sbin/insmod <path>/pvfs2.ko". If you are using
>> > the rpm spec that I gave you (and it looks like you are), then pvfs2.ko is
>> > located in /opt/pvfs2/lib/pvfs2.ko, in which case, you have to use the
>> > "insmod" command to load it and the "rmmod" command to unload it.
>> >
>> > When you issue a "stop", your script does not stop the client nor does it
>> > unload the kernel module. This will cause problems if you issue a "start"
>> > by starting another pvfs2-client. I will send you the init script that we
>> > use here. Maybe, you can modify it to accommodate your environment. We
>> > have more checks in it than you have in yours.
>> >
>> > I am not familiar with how PVFS reacts to the "intr" option that you specify
>> > in the mount command. What is its purpose?
>> >
>> > Becky
>> >
>> >
>> > On Fri, Jul 20, 2012 at 3:27 PM, Becky Ligon <[email protected]> wrote:
>> >>
>> >> Jim:
>> >>
>> >> I just realized that you have already sent me your init script. Let me
>> >> take a closer look at it.
>> >>
>> >> Becky
>> >>
>> >>
>> >> On Fri, Jul 20, 2012 at 3:13 PM, Becky Ligon <[email protected]> wrote:
>> >>>
>> >>> Jim:
>> >>>
>> >>> I have successfully booted my CentOS 6.2 system (using
>> >>> 2.6.32-220.13.1.el6.x86_64) and started the PVFS2 server and mounted the
>> >>> client. Thus, I can only guess that there is something in your environment
>> >>> causing the problem. Is it possible for you to mount the client by issuing
>> >>> the commands manually once the system is running?
>> >>> Can you send me a copy of
>> >>> your startup script for mounting the client from your /etc/init.d directory?
>> >>>
>> >>> Becky
>> >>>
>> >>>
>> >>> On Thu, Jul 19, 2012 at 12:58 PM, Becky Ligon <[email protected]> wrote:
>> >>>>
>> >>>> Jim:
>> >>>>
>> >>>> I have been able to successfully mount-on-boot on a VM with the
>> >>>> 2.6.32-220.13.1.el6.x86_64. However, I was using the Scientific Linux 6
>> >>>> distro and NOT CentOS 6.2. Next, I will try a CentOS 6.2 distro and see
>> >>>> what happens with it.
>> >>>>
>> >>>> Becky
>> >>>>
>> >>>>
>> >>>> On Wed, Jul 18, 2012 at 5:14 PM, Becky Ligon <[email protected]> wrote:
>> >>>>>
>> >>>>> Jim:
>> >>>>>
>> >>>>> Is the mount-on-boot issue just with your CentOS 6.2 environment? If
>> >>>>> so, which version of OrangeFS are you running?
>> >>>>>
>> >>>>> Becky
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Jul 18, 2012 at 3:28 PM, Jim Kusznir <[email protected]> wrote:
>> >>>>>>
>> >>>>>> I cannot reproduce the pvfs2 crash on demand. I have not yet seen it
>> >>>>>> on centos 6, but I haven't placed centos6 into production yet.
>> >>>>>>
>> >>>>>> On my centos5 systems, it's not reproducible on demand, but it seems to
>> >>>>>> happen with moderate file access from a few different processes.
>> >>>>>> Sometimes scp'ing files to/from pvfs2 on the head node (which is a
>> >>>>>> pvfs2 client) will do it. This has happened since the beginning of
>> >>>>>> pvfs2 for me; on the compute nodes, I'm not sure if there's more than
>> >>>>>> one process, but since I updated to OrangeFS 2.8.5, I've been seeing
>> >>>>>> compute nodes KP with the previous screenshot (it did not crash (that
>> >>>>>> I'm aware of) prior to OrangeFS 2.8.5 on compute nodes).
>> >>>>>>
>> >>>>>> Here's my /etc/init.d/pvfs2-client script:
>> >>>>>> ---------------
>> >>>>>> #!/bin/sh
>> >>>>>> #
>> >>>>>> # chkconfig: 2345 99 99
>> >>>>>> #
>> >>>>>> # description: mount pvfs2 filesystem
>> >>>>>> #
>> >>>>>>
>> >>>>>> . /etc/rc.d/init.d/functions
>> >>>>>> #export LD_PRELOAD=/opt/db4/lib/
>> >>>>>> case "$1" in
>> >>>>>>   start)
>> >>>>>>     echo -n "Mounting PVFS2 Filesystem: "
>> >>>>>>     modprobe pvfs2
>> >>>>>>     /opt/pvfs2/sbin/pvfs2-client -p /opt/pvfs2/sbin/pvfs2-client-core
>> >>>>>>     mkdir -p /mnt/pvfs2
>> >>>>>>     mount -t pvfs2 -o intr tcp://pvfs2-io-0-0:3334/pvfs2-fs /mnt/pvfs2
>> >>>>>>     touch /var/lock/subsys/pvfs2-client
>> >>>>>>     ;;
>> >>>>>>
>> >>>>>>   stop)
>> >>>>>>     echo -n "Unmounting PVFS2 Filesystem: "
>> >>>>>>     umount /mnt/pvfs2
>> >>>>>>     rm -f /var/lock/subsys/pvfs2-client
>> >>>>>>     ;;
>> >>>>>>
>> >>>>>>   restart)
>> >>>>>>     $0 stop
>> >>>>>>     $0 start
>> >>>>>>     ;;
>> >>>>>>
>> >>>>>>   status)
>> >>>>>>     status $NAME
>> >>>>>>     ;;
>> >>>>>>   *)
>> >>>>>>     echo "Usage: $NAME {start|stop|restart|status}"
>> >>>>>>     exit 1
>> >>>>>> esac
>> >>>>>>
>> >>>>>> exit 0
>> >>>>>> ----------------
>> >>>>>> I've tried with the export commented and uncommented, no difference.
>> >>>>>>
>> >>>>>> --Jim
>> >>>>>>
>> >>>>>> On Wed, Jul 18, 2012 at 12:20 PM, Becky Ligon <[email protected]> wrote:
>> >>>>>> > Thanks, Jim.
>> >>>>>> >
>> >>>>>> > We are using 2.6.32-220.4.1.el6.x86_64 in our production environment. So, I
>> >>>>>> > should be able to set up a VM with your kernel version and test. Can you
>> >>>>>> > give me a scenario to try in order to reproduce the problem?
>> >>>>>> >
>> >>>>>> > I am also setting up a CENTOS 6 VM, so I can analyze the mount-with-boot
>> >>>>>> > issue.
>> >>>>>> >
>> >>>>>> > Becky
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > On Wed, Jul 18, 2012 at 3:16 PM, Jim Kusznir <[email protected]> wrote:
>> >>>>>> >>
>> >>>>>> >> [root@aeoltest torque]# rpm -qa |grep kernel
>> >>>>>> >> kernel-2.6.32-220.13.1.el6.x86_64
>> >>>>>> >> dracut-kernel-004-256.el6_2.1.noarch
>> >>>>>> >> kernel-devel-2.6.32-220.13.1.el6.x86_64
>> >>>>>> >> kernel-headers-2.6.32-220.13.1.el6.x86_64
>> >>>>>> >> kernel-firmware-2.6.32-220.13.1.el6.noarch
>> >>>>>> >> kernel-doc-2.6.32-220.13.1.el6.noarch
>> >>>>>> >> [root@aeoltest torque]# uname -a
>> >>>>>> >> Linux aeoltest.local 2.6.32-220.13.1.el6.x86_64 #1 SMP Tue Apr 17
>> >>>>>> >> 23:56:34 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
>> >>>>>> >> [root@aeoltest torque]#
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> On Wed, Jul 18, 2012 at 12:10 PM, Becky Ligon <[email protected]> wrote:
>> >>>>>> >> > Jim:
>> >>>>>> >> >
>> >>>>>> >> > We are working on a few corrections to the user library, as we speak, that
>> >>>>>> >> > were identified last week. Using LD_PRELOAD would definitely get around the
>> >>>>>> >> > kernel issues at hand, but I ask that you wait until we have all of the
>> >>>>>> >> > current corrections in place before using it.
>> >>>>>> >> >
>> >>>>>> >> > I also have some questions for you. I am working on the "won't
>> >>>>>> >> > mount on boot" issue and would like to know the specific kernel that you are
>> >>>>>> >> > using under CentOS 6.2.
>> >>>>>> >> >
>> >>>>>> >> > Thanks,
>> >>>>>> >> > Becky
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >> > On Wed, Jul 18, 2012 at 3:01 PM, Jim Kusznir <[email protected]> wrote:
>> >>>>>> >> >>
>> >>>>>> >> >> I managed to get a screenshot of an ip-kvm with the last chunk of a
>> >>>>>> >> >> pvfs-induced KP on a compute node; image attached.
>> >>>>>> >> >>
>> >>>>>> >> >> With respect to client access methods, perhaps I should switch to a
>> >>>>>> >> >> user space solution. I remember hearing about an LD_Preload client
>> >>>>>> >> >> module (not using fuse, but being entirely userspace). Is that
>> >>>>>> >> >> "ready" with 2.8.6? If not, perhaps I need to switch to the fuse
>> >>>>>> >> >> module...
>> >>>>>> >> >>
>> >>>>>> >> >> --Jim
>> >>>>>> >> >>
>> >>>>>> >> >> On Wed, Jul 18, 2012 at 11:46 AM, Andrew Savchenko <[email protected]> wrote:
>> >>>>>> >> >> > Hello Becky,
>> >>>>>> >> >> >
>> >>>>>> >> >> > On Wed, 18 Jul 2012 12:43:51 -0400 Becky Ligon wrote:
>> >>>>>> >> >> >> Andrew:
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> 2.8.6 does not fix the problem you were seeing with question marks in the
>> >>>>>> >> >> >> "ls" output, but we are working on it.
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> Just FYI!
>> >>>>>> >> >> >
>> >>>>>> >> >> > Thanks for the warning. I'll keep sticking to the fuse client during
>> >>>>>> >> >> > update then.
>> >>>>>> >> >> >
>> >>>>>> >> >> > Best regards,
>> >>>>>> >> >> > Andrew Savchenko
>> >>>>>> >> >> >
>> >>>>>> >> >> > _______________________________________________
>> >>>>>> >> >> > Pvfs2-users mailing list
>> >>>>>> >> >> > [email protected]
>> >>>>>> >> >> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
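
For reference, the init-script advice given in this thread (export LD_LIBRARY_PATH, load the module with insmod because it lives outside /lib/modules/`uname -r`, and have "stop" shut down the client and unload the module) could be sketched roughly as below. This is not the script Becky offered to send. The paths, mount URL, and "intr" option are copied from the messages above. The `run` helper, the `DRY_RUN` switch, the function names, and the use of `killall` to stop the client are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch only: a revised pvfs2-client init script following the advice in
# this thread. Set DRY_RUN=1 to print the commands instead of running them.

PVFS2_PREFIX=/opt/pvfs2                     # install prefix named in the thread
PVFS2_MODULE="$PVFS2_PREFIX/lib/pvfs2.ko"   # module location named in the thread
PVFS2_MOUNTPOINT=/mnt/pvfs2
PVFS2_URL=tcp://pvfs2-io-0-0:3334/pvfs2-fs

# The pvfs2 library is not in a standard location, so extend the search path.
LD_LIBRARY_PATH="$PVFS2_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

# run: hypothetical helper; echoes the command under DRY_RUN=1, else runs it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pvfs2_start() {
    # The module is outside /lib/modules/`uname -r`, so modprobe cannot find
    # it; load it by path with insmod instead.
    run /sbin/insmod "$PVFS2_MODULE" || return 1
    run "$PVFS2_PREFIX/sbin/pvfs2-client" -p "$PVFS2_PREFIX/sbin/pvfs2-client-core" || return 1
    run mkdir -p "$PVFS2_MOUNTPOINT" || return 1
    # "intr" is carried over from the original script; the thread leaves its
    # effect on PVFS open.
    run mount -t pvfs2 -o intr "$PVFS2_URL" "$PVFS2_MOUNTPOINT" || return 1
    run touch /var/lock/subsys/pvfs2-client
}

pvfs2_stop() {
    run umount "$PVFS2_MOUNTPOINT"
    # Assumption: killall is an acceptable way to stop the client daemon.
    run killall pvfs2-client
    # Unload the module so that a later "start" begins from a clean state.
    run /sbin/rmmod pvfs2
    run rm -f /var/lock/subsys/pvfs2-client
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    case "$1" in
        start)   pvfs2_start ;;
        stop)    pvfs2_stop ;;
        restart) pvfs2_stop; pvfs2_start ;;
        *)       echo "Usage: $0 {start|stop|restart}"; exit 1 ;;
    esac
fi
```

Running it as `DRY_RUN=1 sh pvfs2-client start` prints the command sequence without touching the system, which may help when checking the ordering before trying it at boot. If pvfs2.ko were instead installed under /lib/modules/`uname -r`/kernel/fs/pvfs2, the thread's alternative applies: run "depmod" once and keep "modprobe pvfs2".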
