Jim:

The config files that you sent me are for older PVFS systems.  We no longer
use the two-config-file approach, and the config file should have a
data space and a storage space.  Did you send me the wrong files?

Becky

On Thu, Jul 26, 2012 at 11:48 AM, Becky Ligon <[email protected]> wrote:

> Jim:
>
> Your 2.8.5 rpms install the kernel module in
> /lib/modules/2.6.18-92.1.13.el5/kernel/fs/pvfs2/pvfs2.ko and 2.8.6 into
> /lib/modules/2.6.32-220.13.1.el6.x86_64/kernel/fs/pvfs2/pvfs2.ko.  Please
> verify that:
>
> /lib/modules/2.6.18-92.1.13.el5/modules.dep contains
> "kernel/fs/pvfs2/pvfs2.ko"
>
> /lib/modules/2.6.32-220.13.1.el6.x86_64/modules.dep contains
> "kernel/fs/pvfs2/pvfs2.ko"
>
> You might have to modify your rebuild scripts to execute a "depmod" AFTER
> the orangefs-kmod rpm is installed.  Your scripts may already be doing
> this, and, if so, then the kernel module should be loaded after an
> install.  You did mention that you thought the kernel module was being
> loaded properly.  If that is the case, then adding "LD_LIBRARY_PATH" to
> your OrangeFS init.d script should allow the client-core to start
> properly, after which the mount should succeed.
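
The modules.dep check described above could be scripted roughly as follows. This is only a sketch: `has_pvfs2_dep` and `check_pvfs2_dep` are hypothetical helper names, not part of OrangeFS, and the module path is the one quoted above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of OrangeFS.

# Succeeds if the given modules.dep file lists the pvfs2 kernel module.
has_pvfs2_dep() {
    depfile="$1"
    [ -r "$depfile" ] && grep -q 'kernel/fs/pvfs2/pvfs2\.ko' "$depfile"
}

# Reports whether pvfs2.ko is registered for a kernel version
# (defaults to the running kernel) and suggests depmod if it is not.
check_pvfs2_dep() {
    kver="${1:-$(uname -r)}"
    if has_pvfs2_dep "/lib/modules/$kver/modules.dep"; then
        echo "pvfs2.ko is listed for $kver"
    else
        echo "pvfs2.ko is NOT listed for $kver; try: /sbin/depmod -a $kver"
        return 1
    fi
}
```

For example, `check_pvfs2_dep 2.6.18-92.1.13.el5` would inspect the CentOS 5 tree, and calling it with no argument inspects the running kernel.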
>
> Let me know if this works for you.
>
> Becky
>
> On Wed, Jul 25, 2012 at 5:50 PM, Jim Kusznir <[email protected]> wrote:
>
>> Here's the last file.
>>
>> On Wed, Jul 25, 2012 at 10:06 AM, Becky Ligon <[email protected]> wrote:
>> > Jim:
>> >
>> > One more thing:  can you send me the pvfs2-client.log files from the
>> > nodes where a KP has occurred?  If possible, I'd like the corresponding
>> > /var/log/messages log file from when the KP happened.
>> >
>> > Thanks,
>> > Becky
>> >
>> >
>> > On Wed, Jul 25, 2012 at 1:05 PM, Becky Ligon <[email protected]>
>> wrote:
>> >>
>> >> Jim:
>> >>
>> >> Can you also send me your PVFS server config file?
>> >>
>> >> Becky
>> >>
>> >>
>> >> On Wed, Jul 25, 2012 at 12:49 PM, Becky Ligon <[email protected]>
>> wrote:
>> >>>
>> >>> Jim:
>> >>>
>> >>> Can you send me the kmod-pvfs2-...rpm?  I'd like to see how its files
>> >>> are laid out.
>> >>>
>> >>> Thanks,
>> >>> Becky
>> >>>
>> >>>
>> >>> On Sat, Jul 21, 2012 at 4:46 PM, Jim Kusznir <[email protected]>
>> wrote:
>> >>>>
>> >>>> Hi Becky:
>> >>>>
>> >>>> Thanks for all your input.  I was on travel and am currently catching
>> >>>> up on e-mail, so here are answers to your questions:
>> >>>>
>> >>>> 1) this problem occurs on both my ROCKS 5.1 (CentOS 5.2) and ROCKS 6
>> >>>> (CentOS 6.2) clusters identically.
>> >>>> 2) I can mount manually using the init script.  It just will not run
>> >>>> on boot.  It tries, but fails with the error message supplied.
>> >>>> 3) The module is installed with a kmod-pvfs2-... rpm (as is required
>> >>>> for ROCKS clusters...Any software to be installed on each node needs
>> >>>> to be its own RPM).  It appears to me that the module is being loaded
>> >>>> successfully.
>> >>>> 4) Ok, that sounds plausible.  I'll make those corrections and see if
>> >>>> that fixes things.
>> >>>>
>> >>>> Of course, the mount on boot was one of two show-stopping issues.  The
>> >>>> second show-stopping issue is how many kernel panics are being caused
>> >>>> by OrangeFS.  I've been experiencing 3-8 KP's a week on a light to
>> >>>> moderate load on my cluster (24 nodes + head node, 3 pvfs nodes).
>> >>>>
>> >>>> My versions in use are: 2.8.5 (ROCKS 5.1), 2.8.6 (ROCKS 6).  For my
>> >>>> users, I absolutely must have a "traditional filesystem interface"
>> >>>> (e.g., MPI-IO or pvfs-* commands are not acceptable; they need to work
>> >>>> on the files like they would for any other filesystem).
>> >>>>
>> >>>> --Jim
>> >>>>
>> >>>> On Fri, Jul 20, 2012 at 1:45 PM, Becky Ligon <[email protected]>
>> wrote:
>> >>>> > Jim:
>> >>>> >
>> >>>> > In your init script, you need to add the LD_LIBRARY_PATH variable,
>> >>>> > since your pvfs library is not in a standard location:
>> >>>> >
>> >>>> > export LD_LIBRARY_PATH=/opt/pvfs2/lib:$LD_LIBRARY_PATH
>> >>>> >
>> >>>> > Remove the LD_PRELOAD.  It is not needed here.
>> >>>> >
>> >>>> > Before "modprobe" will work, you have to run the command "depmod" to
>> >>>> > update the modules list.  The "make kmod_install" does not
>> >>>> > automatically do this.
>> >>>> > NOTE:  if you place the kernel module (pvfs2.ko) somewhere other than
>> >>>> > /lib/modules/`uname -r`/kernel/fs/pvfs2, then you can't use modprobe
>> >>>> > to load the module.  Instead, use "/sbin/insmod <path>/pvfs2.ko".  If
>> >>>> > you are using the rpm spec that I gave you (and it looks like you
>> >>>> > are), then pvfs2.ko is located in /opt/pvfs2/lib/pvfs2.ko, in which
>> >>>> > case, you have to use the "insmod" command to load it and the "rmmod"
>> >>>> > command to unload it.
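
The modprobe-versus-insmod rule above can be sketched as a small helper. Everything here is an assumption for illustration: `pvfs2_load_cmd` is a hypothetical name, and the optional third argument exists only so the logic can be tried against a scratch directory instead of /lib/modules.

```shell
#!/bin/sh
# Sketch only: pvfs2_load_cmd is a hypothetical helper, not an OrangeFS tool.
# Prints the command that would load pvfs2.ko for a given kernel version.
#   $1  kernel version (e.g. the output of `uname -r`)
#   $2  alternate module path (e.g. /opt/pvfs2/lib/pvfs2.ko)
#   $3  modules root, defaults to /lib/modules
pvfs2_load_cmd() {
    kver="$1"
    alt="$2"
    modroot="${3:-/lib/modules}"
    if [ -f "$modroot/$kver/kernel/fs/pvfs2/pvfs2.ko" ]; then
        # Standard location: modprobe can find it once depmod has been run.
        echo "/sbin/modprobe pvfs2"
    elif [ -f "$alt" ]; then
        # Non-standard location: modprobe cannot resolve it, so use insmod.
        echo "/sbin/insmod $alt"
    else
        return 1
    fi
}
```

An init script could then run something like `$(pvfs2_load_cmd "$(uname -r)" /opt/pvfs2/lib/pvfs2.ko)` and unload with rmmod in either case.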
>> >>>> >
>> >>>> > When you issue a "stop", your script neither stops the client nor
>> >>>> > unloads the kernel module.  This will cause problems if you then issue
>> >>>> > a "start", because it starts another pvfs2-client.  I will send you
>> >>>> > the init script that we use here.  Maybe you can modify it to
>> >>>> > accommodate your environment.  We have more checks in it than you
>> >>>> > have in yours.
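
A fuller "stop" action along the lines described above might look like the sketch below. It is not the init script referred to in this message. The mount point and lock file are taken from Jim's script, `pvfs2_stop` and the `RUN` dry-run hook are hypothetical, and the use of killall to stop the client is an assumption.

```shell
#!/bin/sh
# Sketch only. RUN is a hypothetical dry-run hook: set RUN=echo to print
# the commands instead of executing them.
RUN="${RUN:-}"

pvfs2_stop() {
    # 1. Unmount the file system (mount point from Jim's init script).
    $RUN umount /mnt/pvfs2 || return 1
    # 2. Stop the client daemon so a later "start" does not launch a second one.
    $RUN killall pvfs2-client || return 1
    # 3. Unload the kernel module.
    $RUN rmmod pvfs2 || return 1
    # 4. Clear the subsystem lock file.
    $RUN rm -f /var/lock/subsys/pvfs2-client
}
```

Running it with `RUN=echo` first shows the sequence without touching the system.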
>> >>>> >
>> >>>> > I am not familiar with how PVFS reacts to the "intr" option that you
>> >>>> > specify in the mount command.  What is its purpose?
>> >>>> >
>> >>>> > Becky
>> >>>> >
>> >>>> >
>> >>>> > On Fri, Jul 20, 2012 at 3:27 PM, Becky Ligon <[email protected]>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> Jim:
>> >>>> >>
>> >>>> >> I just realized that you have already sent me your init script.  Let
>> >>>> >> me take a closer look at it.
>> >>>> >>
>> >>>> >> Becky
>> >>>> >>
>> >>>> >>
>> >>>> >> On Fri, Jul 20, 2012 at 3:13 PM, Becky Ligon <[email protected]>
>> >>>> >> wrote:
>> >>>> >>>
>> >>>> >>> Jim:
>> >>>> >>>
>> >>>> >>> I have successfully booted my CentOS 6.2 system (using
>> >>>> >>> 2.6.32-220.13.1.el6.x86_64) and started the PVFS2 server and mounted
>> >>>> >>> the client.  Thus, I can only guess that there is something in your
>> >>>> >>> environment causing the problem.  Is it possible for you to mount the
>> >>>> >>> client by issuing the commands manually once the system is running?
>> >>>> >>> Can you send me a copy of your startup script for mounting the client
>> >>>> >>> from your /etc/init.d directory?
>> >>>> >>>
>> >>>> >>> Becky
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> On Thu, Jul 19, 2012 at 12:58 PM, Becky Ligon <
>> [email protected]>
>> >>>> >>> wrote:
>> >>>> >>>>
>> >>>> >>>> Jim:
>> >>>> >>>>
>> >>>> >>>> I have been able to successfully mount-on-boot on a VM with the
>> >>>> >>>> 2.6.32-220.13.1.el6.x86_64 kernel.  However, I was using the
>> >>>> >>>> Scientific Linux 6 distro and NOT CentOS 6.2.  Next, I will try a
>> >>>> >>>> CentOS 6.2 distro and see what happens with it.
>> >>>> >>>>
>> >>>> >>>> Becky
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>> On Wed, Jul 18, 2012 at 5:14 PM, Becky Ligon <
>> [email protected]>
>> >>>> >>>> wrote:
>> >>>> >>>>>
>> >>>> >>>>> Jim:
>> >>>> >>>>>
>> >>>> >>>>> Is the mount-on-boot issue just with your CentOS 6.2 environment?
>> >>>> >>>>> If so, which version of OrangeFS are you running?
>> >>>> >>>>>
>> >>>> >>>>> Becky
>> >>>> >>>>>
>> >>>> >>>>>
>> >>>> >>>>> On Wed, Jul 18, 2012 at 3:28 PM, Jim Kusznir <
>> [email protected]>
>> >>>> >>>>> wrote:
>> >>>> >>>>>>
>> >>>> >>>>>> I cannot reproduce the pvfs2 crash on demand.  I have not yet seen
>> >>>> >>>>>> it on CentOS 6, but I haven't placed CentOS 6 into production yet.
>> >>>> >>>>>>
>> >>>> >>>>>> On my CentOS 5 systems, it's not reproducible on demand, but it
>> >>>> >>>>>> seems to happen with moderate file access from a few different
>> >>>> >>>>>> processes.  Sometimes scp'ing files to/from pvfs2 on the head node
>> >>>> >>>>>> (which is a pvfs2 client) will do it.  This has happened since the
>> >>>> >>>>>> beginning of pvfs2 for me.  On the compute nodes, I'm not sure if
>> >>>> >>>>>> there's more than one process, but since I updated to OrangeFS
>> >>>> >>>>>> 2.8.5, I've been seeing compute nodes KP as shown in the previous
>> >>>> >>>>>> screenshot (the compute nodes did not crash, as far as I'm aware,
>> >>>> >>>>>> prior to OrangeFS 2.8.5).
>> >>>> >>>>>>
>> >>>> >>>>>> Here's my /etc/init.d/pvfs2-client script:
>> >>>> >>>>>> ---------------
>> >>>> >>>>>> #!/bin/sh
>> >>>> >>>>>> #
>> >>>> >>>>>> # chkconfig: 2345 99 99
>> >>>> >>>>>> #
>> >>>> >>>>>> # description: mount pvfs2 filesystem
>> >>>> >>>>>> #
>> >>>> >>>>>>
>> >>>> >>>>>> . /etc/rc.d/init.d/functions
>> >>>> >>>>>> #export LD_PRELOAD=/opt/db4/lib/
>> >>>> >>>>>> case "$1" in
>> >>>> >>>>>> start)
>> >>>> >>>>>>         echo -n "Mounting PVFS2 Filesystem: "
>> >>>> >>>>>>         modprobe pvfs2
>> >>>> >>>>>>         /opt/pvfs2/sbin/pvfs2-client -p /opt/pvfs2/sbin/pvfs2-client-core
>> >>>> >>>>>>         mkdir -p /mnt/pvfs2
>> >>>> >>>>>>         mount -t pvfs2 -o intr tcp://pvfs2-io-0-0:3334/pvfs2-fs /mnt/pvfs2
>> >>>> >>>>>>         touch /var/lock/subsys/pvfs2-client
>> >>>> >>>>>>         ;;
>> >>>> >>>>>>
>> >>>> >>>>>> stop)
>> >>>> >>>>>>         echo -n "Unmounting PVFS2 Filesystem: "
>> >>>> >>>>>>         umount /mnt/pvfs2
>> >>>> >>>>>>         rm -f /var/lock/subsys/pvfs2-client
>> >>>> >>>>>>         ;;
>> >>>> >>>>>>
>> >>>> >>>>>> restart)
>> >>>> >>>>>>         $0 stop
>> >>>> >>>>>>         $0 start
>> >>>> >>>>>>         ;;
>> >>>> >>>>>>
>> >>>> >>>>>> status)
>> >>>> >>>>>>         status $NAME
>> >>>> >>>>>>         ;;
>> >>>> >>>>>> *)
>> >>>> >>>>>>         echo "Usage: $NAME {start|stop|restart|status}"
>> >>>> >>>>>>         exit 1
>> >>>> >>>>>> esac
>> >>>> >>>>>>
>> >>>> >>>>>> exit 0
>> >>>> >>>>>> ----------------
>> >>>> >>>>>> I've tried with the export commented and uncommented, no
>> >>>> >>>>>> difference.
>> >>>> >>>>>>
>> >>>> >>>>>> --Jim
>> >>>> >>>>>>
>> >>>> >>>>>> On Wed, Jul 18, 2012 at 12:20 PM, Becky Ligon
>> >>>> >>>>>> <[email protected]>
>> >>>> >>>>>> wrote:
>> >>>> >>>>>> > Thanks, Jim.
>> >>>> >>>>>> >
>> >>>> >>>>>> > We are using 2.6.32-220.4.1.el6.x86_64 in our production
>> >>>> >>>>>> > environment.  So, I should be able to set up a VM with your kernel
>> >>>> >>>>>> > version and test.  Can you give me a scenario to try in order to
>> >>>> >>>>>> > reproduce the problem?
>> >>>> >>>>>> >
>> >>>> >>>>>> > I am also setting up a CentOS 6 VM, so I can analyze the
>> >>>> >>>>>> > mount-on-boot issue.
>> >>>> >>>>>> >
>> >>>> >>>>>> > Becky
>> >>>> >>>>>> >
>> >>>> >>>>>> >
>> >>>> >>>>>> > On Wed, Jul 18, 2012 at 3:16 PM, Jim Kusznir
>> >>>> >>>>>> > <[email protected]>
>> >>>> >>>>>> > wrote:
>> >>>> >>>>>> >>
>> >>>> >>>>>> >> [root@aeoltest torque]# rpm -qa |grep kernel
>> >>>> >>>>>> >> kernel-2.6.32-220.13.1.el6.x86_64
>> >>>> >>>>>> >> dracut-kernel-004-256.el6_2.1.noarch
>> >>>> >>>>>> >> kernel-devel-2.6.32-220.13.1.el6.x86_64
>> >>>> >>>>>> >> kernel-headers-2.6.32-220.13.1.el6.x86_64
>> >>>> >>>>>> >> kernel-firmware-2.6.32-220.13.1.el6.noarch
>> >>>> >>>>>> >> kernel-doc-2.6.32-220.13.1.el6.noarch
>> >>>> >>>>>> >> [root@aeoltest torque]# uname -a
>> >>>> >>>>>> >> Linux aeoltest.local 2.6.32-220.13.1.el6.x86_64 #1 SMP Tue Apr 17
>> >>>> >>>>>> >> 23:56:34 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
>> >>>> >>>>>> >> [root@aeoltest torque]#
>> >>>> >>>>>> >>
>> >>>> >>>>>> >>
>> >>>> >>>>>> >> On Wed, Jul 18, 2012 at 12:10 PM, Becky Ligon
>> >>>> >>>>>> >> <[email protected]>
>> >>>> >>>>>> >> wrote:
>> >>>> >>>>>> >> > Jim:
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> > We are working on a few corrections to the user library, as we
>> >>>> >>>>>> >> > speak, that were identified last week.  Using LD_PRELOAD would
>> >>>> >>>>>> >> > definitely get around the kernel issues at hand, but I ask that
>> >>>> >>>>>> >> > you wait until we have all of the current corrections in place
>> >>>> >>>>>> >> > before using it.
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> > I also have some questions for you.  I am working on the "won't
>> >>>> >>>>>> >> > mount on boot" issue and would like to know the specific kernel
>> >>>> >>>>>> >> > that you are using under CentOS 6.2.
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> > Thanks,
>> >>>> >>>>>> >> > Becky
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> > On Wed, Jul 18, 2012 at 3:01 PM, Jim Kusznir
>> >>>> >>>>>> >> > <[email protected]>
>> >>>> >>>>>> >> > wrote:
>> >>>> >>>>>> >> >>
>> >>>> >>>>>> >> >> I managed to get a screenshot from an IP-KVM with the last chunk
>> >>>> >>>>>> >> >> of a pvfs-induced KP on a compute node; image attached.
>> >>>> >>>>>> >> >>
>> >>>> >>>>>> >> >> With respect to client access methods, perhaps I should switch
>> >>>> >>>>>> >> >> to a user space solution.  I remember hearing about an
>> >>>> >>>>>> >> >> LD_PRELOAD client module (not using fuse, but being entirely
>> >>>> >>>>>> >> >> userspace).  Is that "ready" with 2.8.6?  If not, perhaps I need
>> >>>> >>>>>> >> >> to switch to the fuse module...
>> >>>> >>>>>> >> >>
>> >>>> >>>>>> >> >> --Jim
>> >>>> >>>>>> >> >>
>> >>>> >>>>>> >> >> On Wed, Jul 18, 2012 at 11:46 AM, Andrew Savchenko
>> >>>> >>>>>> >> >> <[email protected]>
>> >>>> >>>>>> >> >> wrote:
>> >>>> >>>>>> >> >> > Hello Becky,
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >> > On Wed, 18 Jul 2012 12:43:51 -0400 Becky Ligon wrote:
>> >>>> >>>>>> >> >> >> Andrew:
>> >>>> >>>>>> >> >> >>
>> >>>> >>>>>> >> >> >> 2.8.6 does not fix the problem you were seeing with question
>> >>>> >>>>>> >> >> >> marks in the "ls" output, but we are working on it.
>> >>>> >>>>>> >> >> >>
>> >>>> >>>>>> >> >> >> Just FYI!
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >> > Thanks for the warning. I'll keep sticking to the fuse client
>> >>>> >>>>>> >> >> > during the update then.
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >> > Best regards,
>> >>>> >>>>>> >> >> > Andrew Savchenko
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >> > _______________________________________________
>> >>>> >>>>>> >> >> > Pvfs2-users mailing list
>> >>>> >>>>>> >> >> > [email protected]
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >> >
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>> >>>> >>>>>> >> >> >
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> >
>> >>>> >>>>>> >> > --
>> >>>> >>>>>> >> > Becky Ligon
>> >>>> >>>>>> >> > OrangeFS Support and Development
>> >>>> >>>>>> >> > Omnibond Systems
>> >>>> >>>>>> >> > Anderson, South Carolina
>> >>>> >>>>>> >> >


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
