Doh, sorry to hear about the merge problem, but I am relieved to know we don't have another bug floating around on this path! Thanks for the update.

-Phil

Bart Taylor wrote:
I finally narrowed it down. It turns out we had a problem merging the previous release, but it did not show up because we never got a chance to test the merged code. Sam added an op_release call in namei.c to fix a kmem_cache leak, and it ended up in the merged code twice without any warning. Removing the duplicate call fixed the problem.

Bart.



On Tue, Jul 29, 2008 at 8:03 AM, Phil Carns <[EMAIL PROTECTED]> wrote:

    I'm having a hard time thinking of anything specific that would have
    affected this.  You could try to narrow it down by taking a diff of
    just the src/kernel/linux-2.6 directory and applying it to a 2.7.1
    tree, then testing to see whether the fix is specifically in the
    kernel module code.

    -Phil

    Bart Taylor wrote:

        I ran the test the same way you mentioned - outside of the LTP
        framework - and still had the problem. I have applied the patch
        that fixed the rename06 test as well as the kernel buffer
        overflow fix from a few days ago and still have the problem.

        I did a CVS export of head this morning and used the same
        configure and build as last time. I ran the openfile test
        against a file system created from head and against a 2.7.1 file
        system (with some recent patches), and both runs succeed, so it
        seems like the fix is somewhere between the 2.7.1 release and
        head, but I am not sure where. Do you have an idea where it
        might be lurking?

        Bart.



        On Fri, Jul 25, 2008 at 7:16 AM, Phil Carns <[EMAIL PROTECTED]> wrote:

           Phil Carns wrote:

               Bart Taylor wrote:

                   I am having a problem with an LTP test from the
                   20080630 set of LTP tests. The 'openfile01' test does
                   10 threaded opens of 10 files. It is attached in case
                   you need a copy. The test completes successfully, but
                   an 'ls' command immediately after that hangs and
                   cannot be killed. Eventually the node hangs as well.
                   Any command that touches the file system will trigger
                   the problem.

                   We also tried this with the 2.7.1 release tarball and
                   see the same problem. This is a single-node file
                   system running RHEL4 with a 2.6.9-67 kernel, and the
                   client was on the same node.

                   Here is the configure line used:

                     ./configure --with-kernel=/lib/modules/`uname -r`/build

                   and how the client was started:

                     ./pvfs2-client -p ./pvfs2-client-core

                   The fs.conf file is attached.

                   The client debug mask was set to 'all', and
                   /proc/sys/pvfs2/debug had a value of 32767. However,
                   once the 'ls' command was issued, there were no log
                   messages.

                   Does anyone else see this error?

                   Bart.


               Are you able to reproduce this running openfile by itself
               after a fresh boot?  It looks like openfile operates on a
               file in the current working directory, so I have been
               trying to run it like this:

               <mount pvfs2 on /mnt/pvfs2>
               cd /mnt/pvfs2
               ~/openfile -f10 -t10
               ls -alh

               So far I haven't had any trouble with that particular
               combination.  I'm running it on a CentOS 4 box with a very
               similar kernel.  The openfile test looks fairly innocent:
               if I understand correctly, with those arguments each of 10
               separate threads opens the same single file 10 times, for
               a total of 100 file descriptors open to the same file.
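A minimal C sketch of the access pattern described above, assuming the reading of `-f10 -t10` given here (10 POSIX threads, each opening one shared file 10 times). This is not the LTP openfile source; the function name `openfile_sketch`, the `O_RDWR | O_CREAT` flags, and the choice to close descriptors only after each thread finishes its opens are illustrative assumptions. It does not force all 100 descriptors to be open at the same instant; how much the threads overlap depends on scheduling.

```c
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

#define NUM_THREADS 10      /* assumed meaning of -t10 */
#define OPENS_PER_THREAD 10 /* assumed meaning of -f10 */

struct worker_arg {
    const char *path; /* shared file that every thread opens */
    int opened;       /* successful open() calls made by this thread */
};

/* Each thread opens the same file OPENS_PER_THREAD times, keeps all of
 * its own descriptors open, and only then closes them. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    int fds[OPENS_PER_THREAD];

    for (int i = 0; i < OPENS_PER_THREAD; i++) {
        fds[i] = open(arg->path, O_RDWR | O_CREAT, 0644);
        if (fds[i] >= 0)
            arg->opened++;
    }
    for (int i = 0; i < OPENS_PER_THREAD; i++) {
        if (fds[i] >= 0)
            close(fds[i]);
    }
    return NULL;
}

/* Hypothetical driver: starts the threads, waits for them, and returns
 * the total number of successful opens (100 if everything succeeds). */
int openfile_sketch(const char *path)
{
    pthread_t tids[NUM_THREADS];
    struct worker_arg args[NUM_THREADS];
    int created[NUM_THREADS];
    int total = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        args[t].path = path;
        args[t].opened = 0;
        created[t] = (pthread_create(&tids[t], NULL, worker, &args[t]) == 0);
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        if (created[t]) {
            pthread_join(tids[t], NULL);
            total += args[t].opened;
        }
    }
    return total;
}
```

Pointing `path` at a file under the PVFS mount (for example somewhere in /mnt/pvfs2, following the mount point used above) and then running `ls` there would roughly mirror the reproduction steps.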

               If I try to run a full LTP test, however, I do have other
               problems.  In particular, the rename06 test hangs.  I can
               trigger that one by itself as follows:

               export TMPDIR=/mnt/pvfs2
               ~/rename06

               The same suite of tests runs fine on a 2.6.24 kernel and a
               trunk build of PVFS.  I'm not sure yet whether the
               difference is between PVFS versions or between kernel
               versions.


           The rename06 test passes with PVFS trunk; I think that particular
           problem has already been fixed.  I still haven't figured out why
           openfile01 would be a problem, though.

           -Phil





_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
