Jim:

Can you also send me your PVFS server config file?

Becky

On Wed, Jul 25, 2012 at 12:49 PM, Becky Ligon <[email protected]> wrote:

> Jim:
>
> Can you send me the kmod-pvfs2-...rpm?  I'd like to see how its files are
> laid out.
>
> Thanks,
> Becky
>
>
> On Sat, Jul 21, 2012 at 4:46 PM, Jim Kusznir <[email protected]> wrote:
>
>> Hi Becky:
>>
>> Thanks for all your input.  I was on travel and am currently catching
>> up on e-mail, so here are answers to your questions:
>>
>> 1) this problem occurs on both my ROCKS 5.1 (CentOS 5.2) and ROCKS 6
>> (CentOS 6.2) clusters identically.
>> 2) I can mount manually using the init script.  It just will not run
>> on boot.  It tries, but fails with the error message supplied.
>> 3) The module is installed with a kmod-pvfs2-... rpm (as is required
>> for ROCKS clusters...Any software to be installed on each node needs
>> to be its own RPM).  It appears to me that the module is being loaded
>> successfully.
>> 4) Ok, that sounds plausible.  I'll make those corrections and see if
>> that fixes things.
>>
>> Of course, the mount on boot was one of two show-stopping issues.  The
>> second show-stopping issue is how many kernel panics are being caused
>> by OrangeFS.  I've been experiencing 3-8 KP's a week on a light to
>> moderate load on my cluster (24 nodes + head node, 3 pvfs nodes).
>>
>> My versions in use are: 2.8.5 (ROCKS 5.1), 2.8.6 (ROCKS 6).  For my
>> users, I absolutely must have a "traditional filesystem interface"
>> (eg, MPI-IO or pvfs-* commands are not acceptable, they need to work
>> on the files like they would for any other filesystem).
>>
>> --Jim
>>
>> On Fri, Jul 20, 2012 at 1:45 PM, Becky Ligon <[email protected]> wrote:
>> > Jim:
>> >
>> > In your init script, you need to add the LD_LIBRARY_PATH variable, since
>> > your pvfs library is not in a standard location:
>> >
>> > export LD_LIBRARY_PATH=/opt/pvfs2/lib:$LD_LIBRARY_PATH
>> >
>> > Remove the LD_PRELOAD.  It is not needed here.
>> >
>> > Before "modprobe" will work, you have to run the command "depmod" to
>> > update the modules list.  The "make kmod_install" does not automatically
>> > do this.  NOTE:  if you place the kernel module (pvfs2.ko) somewhere
>> > other than /lib/modules/`uname -r`/kernel/fs/pvfs2, then you can't use
>> > modprobe to load the module.  Instead, use "/sbin/insmod <path>/pvfs2.ko".
>> > If you are using the rpm spec that I gave you (and it looks like you
>> > are), then pvfs2.ko is located in /opt/pvfs2/lib/pvfs2.ko, in which case,
>> > you have to use the "insmod" command to load it and the "rmmod" command
>> > to unload it.
>> >
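
As a rough sketch of the module-loading advice above, the choice between modprobe and insmod could be wrapped in a small helper. The function name pvfs2_load_cmd is hypothetical (it is not part of PVFS2 or the init script in this thread), and it only prints the command it would run, so it is safe to try without root:

```shell
#!/bin/sh
# Hypothetical helper (assumption, not from the PVFS2 distribution):
# print the command that would load pvfs2.ko, based on where the file lives.
pvfs2_load_cmd() {
    # $1: full path to pvfs2.ko
    # $2: kernel release; defaults to the running kernel
    ko_path=$1
    krel=${2:-$(uname -r)}
    case "$ko_path" in
        "/lib/modules/$krel/kernel/fs/pvfs2/"*)
            # Standard location: refresh the module index, then load by name.
            echo "/sbin/depmod -a && /sbin/modprobe pvfs2"
            ;;
        *)
            # Anywhere else (for example /opt/pvfs2/lib): load by explicit path.
            echo "/sbin/insmod $ko_path"
            ;;
    esac
}

# Path taken from the message above.
pvfs2_load_cmd /opt/pvfs2/lib/pvfs2.ko
```

For the /opt/pvfs2/lib location this prints the insmod form; an init script could then run the printed command in its "start" branch.
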
>> > When you issue a "stop", your script does not stop the client, nor does
>> > it unload the kernel module.  This will cause problems if you then issue
>> > a "start", because it starts another pvfs2-client.  I will send you the
>> > init script that we use here.  Maybe you can modify it to accommodate
>> > your environment.  We have more checks in it than you have in yours.
>> >
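
A minimal sketch of a fuller "stop" path, assuming the mount point and lock file from the script quoted later in this thread. DRY_RUN, run and pvfs2_stop are hypothetical names, and this is not the init script Becky mentions. With DRY_RUN=1 (the default here) the commands are only printed, not executed:

```shell
#!/bin/sh
# Sketch only: unmount, stop the client daemon, then unload the module.
# DRY_RUN and run() are illustrative helpers, not part of PVFS2.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"          # show the command instead of running it
    else
        "$@"
    fi
}

pvfs2_stop() {
    run umount /mnt/pvfs2                     # release the filesystem first
    run killall pvfs2-client                  # then stop the client daemon
    run /sbin/rmmod pvfs2                     # then unload the kernel module
    run rm -f /var/lock/subsys/pvfs2-client   # finally clear the lock file
}

pvfs2_stop
```

Setting DRY_RUN=0 would execute the same steps for real; a production script would also want to check each step's exit status before moving on.
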
>> > I am not familiar with how PVFS reacts to the "intr" option that you
>> > specify in the mount command.  What is its purpose?
>> >
>> > Becky
>> >
>> >
>> > On Fri, Jul 20, 2012 at 3:27 PM, Becky Ligon <[email protected]> wrote:
>> >>
>> >> Jim:
>> >>
>> >> I just realized that you have already sent me your init script.  Let me
>> >> take a closer look at it.
>> >>
>> >> Becky
>> >>
>> >>
>> >> On Fri, Jul 20, 2012 at 3:13 PM, Becky Ligon <[email protected]> wrote:
>> >>>
>> >>> Jim:
>> >>>
>> >>> I have successfully booted my CentOS 6.2 system (using
>> >>> 2.6.32-220.13.1.el6.x86_64) and started the PVFS2 server and mounted
>> >>> the client.  Thus, I can only guess that there is something in your
>> >>> environment causing the problem.  Is it possible for you to mount the
>> >>> client by issuing the commands manually once the system is running?
>> >>> Can you send me a copy of your startup script for mounting the client
>> >>> from your /etc/init.d directory?
>> >>>
>> >>> Becky
>> >>>
>> >>>
>> >>> On Thu, Jul 19, 2012 at 12:58 PM, Becky Ligon <[email protected]> wrote:
>> >>>>
>> >>>> Jim:
>> >>>>
>> >>>> I have been able to successfully mount-on-boot on a VM with the
>> >>>> 2.6.32-220.13.1.el6.x86_64 kernel.  However, I was using the
>> >>>> Scientific Linux 6 distro and NOT CentOS 6.2.  Next, I will try a
>> >>>> CentOS 6.2 distro and see what happens with it.
>> >>>>
>> >>>> Becky
>> >>>>
>> >>>>
>> >>>> On Wed, Jul 18, 2012 at 5:14 PM, Becky Ligon <[email protected]> wrote:
>> >>>>>
>> >>>>> Jim:
>> >>>>>
>> >>>>> Is the mount-on-boot issue just with your CentOS 6.2 environment?
>> >>>>> If so, which version of OrangeFS are you running?
>> >>>>>
>> >>>>> Becky
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Jul 18, 2012 at 3:28 PM, Jim Kusznir <[email protected]>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> I cannot reproduce the pvfs2 crash on demand.  I have not yet seen
>> >>>>>> it on centos 6, but I haven't placed centos6 into production yet.
>> >>>>>>
>> >>>>>> On my centos5 systems, it's not reproducible on demand, but it
>> >>>>>> seems to happen with moderate file access from a few different
>> >>>>>> processes.  Sometimes scp'ing files to/from pvfs2 on the head node
>> >>>>>> (which is a pvfs2 client) will do it.  This has happened since the
>> >>>>>> beginning of pvfs2 for me; on the compute nodes, I'm not sure if
>> >>>>>> there's more than one process, but since I updated to OrangeFS
>> >>>>>> 2.8.5, I've been seeing compute nodes KP with the previous
>> >>>>>> screenshot (as far as I'm aware, compute nodes did not crash prior
>> >>>>>> to OrangeFS 2.8.5).
>> >>>>>>
>> >>>>>> Here's my /etc/init.d/pvfs2-client script:
>> >>>>>> ---------------
>> >>>>>> #!/bin/sh
>> >>>>>> #
>> >>>>>> # chkconfig: 2345 99 99
>> >>>>>> #
>> >>>>>> # description: mount pvfs2 filesystem
>> >>>>>> #
>> >>>>>>
>> >>>>>> . /etc/rc.d/init.d/functions
>> >>>>>> #export LD_PRELOAD=/opt/db4/lib/
>> >>>>>> case "$1" in
>> >>>>>> start)
>> >>>>>>         echo -n "Mounting PVFS2 Filesystem: "
>> >>>>>>         modprobe pvfs2
>> >>>>>>         /opt/pvfs2/sbin/pvfs2-client -p /opt/pvfs2/sbin/pvfs2-client-core
>> >>>>>>         mkdir -p /mnt/pvfs2
>> >>>>>>         mount -t pvfs2 -o intr tcp://pvfs2-io-0-0:3334/pvfs2-fs /mnt/pvfs2
>> >>>>>>         touch /var/lock/subsys/pvfs2-client
>> >>>>>>         ;;
>> >>>>>>
>> >>>>>> stop)
>> >>>>>>         echo -n "Unmounting PVFS2 Filesystem: "
>> >>>>>>         umount /mnt/pvfs2
>> >>>>>>         rm -f /var/lock/subsys/pvfs2-client
>> >>>>>>         ;;
>> >>>>>>
>> >>>>>> restart)
>> >>>>>>         $0 stop
>> >>>>>>         $0 start
>> >>>>>>         ;;
>> >>>>>>
>> >>>>>> status)
>> >>>>>>         status $NAME
>> >>>>>>         ;;
>> >>>>>> *)
>> >>>>>>         echo "Usage: $NAME {start|stop|restart|status}"
>> >>>>>>         exit 1
>> >>>>>> esac
>> >>>>>>
>> >>>>>> exit 0
>> >>>>>> ----------------
>> >>>>>> I've tried with the export commented and uncommented, no difference.
>> >>>>>>
>> >>>>>> --Jim
>> >>>>>>
>> >>>>>> On Wed, Jul 18, 2012 at 12:20 PM, Becky Ligon <[email protected]>
>> >>>>>> wrote:
>> >>>>>> > Thanks, Jim.
>> >>>>>> >
>> >>>>>> > We are using 2.6.32-220.4.1.el6.x86_64 in our production
>> >>>>>> > environment.  So, I should be able to set up a VM with your
>> >>>>>> > kernel version and test.  Can you give me a scenario to try in
>> >>>>>> > order to reproduce the problem?
>> >>>>>> >
>> >>>>>> > I am also setting up a CentOS 6 VM, so I can analyze the
>> >>>>>> > mount-on-boot issue.
>> >>>>>> >
>> >>>>>> > Becky
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > On Wed, Jul 18, 2012 at 3:16 PM, Jim Kusznir <[email protected]> wrote:
>> >>>>>> >>
>> >>>>>> >> [root@aeoltest torque]# rpm -qa |grep kernel
>> >>>>>> >> kernel-2.6.32-220.13.1.el6.x86_64
>> >>>>>> >> dracut-kernel-004-256.el6_2.1.noarch
>> >>>>>> >> kernel-devel-2.6.32-220.13.1.el6.x86_64
>> >>>>>> >> kernel-headers-2.6.32-220.13.1.el6.x86_64
>> >>>>>> >> kernel-firmware-2.6.32-220.13.1.el6.noarch
>> >>>>>> >> kernel-doc-2.6.32-220.13.1.el6.noarch
>> >>>>>> >> [root@aeoltest torque]# uname -a
>> >>>>>> >> Linux aeoltest.local 2.6.32-220.13.1.el6.x86_64 #1 SMP Tue Apr 17 23:56:34 BST 2012 x86_64 x86_64 x86_64 GNU/Linux
>> >>>>>> >> [root@aeoltest torque]#
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> On Wed, Jul 18, 2012 at 12:10 PM, Becky Ligon <[email protected]> wrote:
>> >>>>>> >> > Jim:
>> >>>>>> >> >
>> >>>>>> >> > We are working on a few corrections to the user library, as
>> >>>>>> >> > we speak, that were identified last week.  Using LD_PRELOAD
>> >>>>>> >> > would definitely get around the kernel issues at hand, but I
>> >>>>>> >> > ask that you wait until we have all of the current
>> >>>>>> >> > corrections in place before using it.
>> >>>>>> >> >
>> >>>>>> >> > I also have some questions for you.  I am working on the
>> >>>>>> >> > "won't mount on boot" issue and would like to know the
>> >>>>>> >> > specific kernel that you are using under CentOS 6.2.
>> >>>>>> >> >
>> >>>>>> >> > Thanks,
>> >>>>>> >> > Becky
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >> > On Wed, Jul 18, 2012 at 3:01 PM, Jim Kusznir <[email protected]> wrote:
>> >>>>>> >> >>
>> >>>>>> >> >> I managed to get a screenshot from an ip-kvm with the last
>> >>>>>> >> >> chunk of a pvfs-induced KP on a compute node; image attached.
>> >>>>>> >> >>
>> >>>>>> >> >> With respect to client access methods, perhaps I should
>> >>>>>> >> >> switch to a user space solution.  I remember hearing about
>> >>>>>> >> >> an LD_PRELOAD client module (not using fuse, but being
>> >>>>>> >> >> entirely userspace).  Is that "ready" with 2.8.6?  If not,
>> >>>>>> >> >> perhaps I need to switch to the fuse module...
>> >>>>>> >> >>
>> >>>>>> >> >> --Jim
>> >>>>>> >> >>
>> >>>>>> >> >> On Wed, Jul 18, 2012 at 11:46 AM, Andrew Savchenko
>> >>>>>> >> >> <[email protected]>
>> >>>>>> >> >> wrote:
>> >>>>>> >> >> > Hello Becky,
>> >>>>>> >> >> >
>> >>>>>> >> >> > On Wed, 18 Jul 2012 12:43:51 -0400 Becky Ligon wrote:
>> >>>>>> >> >> >> Andrew:
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> 2.8.6 does not fix the problem you were seeing with
>> >>>>>> >> >> >> question marks in the "ls" output, but we are working on it.
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> Just FYI!
>> >>>>>> >> >> >
>> >>>>>> >> >> > Thanks for the warning. I'll keep sticking to the fuse
>> >>>>>> >> >> > client during update then.
>> >>>>>> >> >> >
>> >>>>>> >> >> > Best regards,
>> >>>>>> >> >> > Andrew Savchenko
>> >>>>>> >> >> >
>> >>>>>> >> >> > _______________________________________________
>> >>>>>> >> >> > Pvfs2-users mailing list
>> >>>>>> >> >> > [email protected]
>> >>>>>> >> >> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>> >>>>>> >> >> >
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >> > --
>> >>>>>> >> > Becky Ligon
>> >>>>>> >> > OrangeFS Support and Development
>> >>>>>> >> > Omnibond Systems
>> >>>>>> >> > Anderson, South Carolina
>> >>>>>> >> >
>> >>>>>> >> >


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
