Thanks for tracking that down, Bart.  The patch is in CVS now.

-Phil

On 03/11/2010 10:36 AM, Bart Taylor wrote:

After some further digging and off-list help from Phil, we determined that the pvfs2_inode structures were not being properly initialized.

PVFS2 uses a slab cache for the pvfs2_inode structures, which allocates and initializes a big chunk of them at once. A constructor function (pvfs2_inode_cache_ctor) is passed to the kmem_cache_create call and does the initialization up front, to cut down on expensive setup later for things like semaphores. The constructor is only called when the cache is created, or again later if the cache needs to be grown. When memory is released back into the cache, none of its contents are cleared before it is handed out again.
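To make the constructor behavior concrete, here is a minimal user-space sketch of a slab-like cache. It is not kernel or PVFS2 code: every toy_* name, the slot count, and the LIFO free list are assumptions made for illustration. It only shows that the constructor runs at creation time and that a freed object comes back with whatever was last written to it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from PVFS2 or the kernel. */
#define TOY_SLOTS 4

struct toy_obj {
    unsigned long flags;
};

struct toy_cache {
    struct toy_obj slots[TOY_SLOTS];
    struct toy_obj *free_list[TOY_SLOTS];
    size_t free_count;
    unsigned ctor_calls;
};

/* Constructor: puts an object into its known-clean state. */
static void toy_ctor(struct toy_obj *obj)
{
    obj->flags = 0;
}

/* The constructor runs here, once per object, when the cache is created. */
static void toy_cache_create(struct toy_cache *c)
{
    c->free_count = 0;
    c->ctor_calls = 0;
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        toy_ctor(&c->slots[i]);
        c->ctor_calls++;
        c->free_list[c->free_count++] = &c->slots[i];
    }
}

/* Allocation just pops an object; the constructor is NOT called again. */
static struct toy_obj *toy_cache_alloc(struct toy_cache *c)
{
    assert(c->free_count > 0);
    return c->free_list[--c->free_count];
}

/* Freeing pushes the object back without clearing its contents. */
static void toy_cache_free(struct toy_cache *c, struct toy_obj *obj)
{
    assert(c->free_count < TOY_SLOTS);
    c->free_list[c->free_count++] = obj;
}

/* Returns the flags seen by the second user of a recycled object. */
static unsigned long toy_demo_stale_flags(void)
{
    struct toy_cache c;
    toy_cache_create(&c);

    struct toy_obj *a = toy_cache_alloc(&c);
    a->flags = 0x6;               /* first user dirties the object */
    toy_cache_free(&c, a);

    struct toy_obj *b = toy_cache_alloc(&c);  /* same object comes back */
    return b->flags;              /* still carries the first user's flags */
}
```

In this toy, toy_demo_stale_flags() hands the second caller the first caller's flags, which is the same pattern described above for pvfs2_inode.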

In the pvfs2_inode_alloc function there is a call to kmem_cache_alloc to get a pvfs2_inode structure from the slab cache, but the structure is never initialized afterward. It looks like PVFS2 expects the constructor to be called every time a kmem_cache_alloc call is made, because the constructor is the only place the pvfs2_inode structures are cleared. Since they are not reinitialized, some of the fields, including the pinode_flags, are never reset from their previous use. If a pvfs2_inode structure has leftover pinode_flags that indicate an mtime update is required and that structure is handed out by the cache again, pvfs2_flush_inode does a setattr when the file is released, updating the mtime on a file that may not actually need it.
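The effect of a leftover flag on the release path can be modeled in a few lines. This is a sketch under stated assumptions, not the actual pvfs2_flush_inode: the flag names are the ones mentioned in this thread, but their bit values, the toy_pinode structure, and both toy_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names appear in this thread; the bit values are assumptions. */
#define P_ATIME_FLAG 0x1UL
#define P_MTIME_FLAG 0x2UL
#define P_INIT_FLAG  0x8UL

/* Hypothetical, heavily reduced stand-in for pvfs2_inode. */
struct toy_pinode {
    unsigned long pinode_flags;
    long mtime;
};

/*
 * Toy model of a flush on release: if the mtime flag is set, pretend a
 * setattr was sent by updating mtime. Returns true when that happens.
 */
static bool toy_flush_inode(struct toy_pinode *p, long now)
{
    if (p->pinode_flags & P_MTIME_FLAG) {
        p->mtime = now;
        p->pinode_flags &= ~P_MTIME_FLAG;
        return true;
    }
    return false;
}

/*
 * Simulates opening a file read-only and releasing it. The only input
 * that varies is the flags the structure had when the cache handed it out.
 */
static bool toy_release_after_read(unsigned long recycled_flags,
                                   long old_mtime, long now, long *mtime_out)
{
    assert(mtime_out != NULL);
    struct toy_pinode p = { .pinode_flags = recycled_flags, .mtime = old_mtime };

    /* A read sets no mtime flag of its own, so nothing changes here. */
    bool setattr_sent = toy_flush_inode(&p, now);
    *mtime_out = p.mtime;
    return setattr_sent;
}
```

With clean flags the read leaves mtime alone. With recycled flags that still include P_MTIME_FLAG, the same read-only access ends in a timestamp update.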

The same constructor/initialization situation exists for dev_req_alloc and kiocb_alloc; the initializations are made once at cache creation time instead of each time a structure is allocated.

The attached patches clear more fields in the pvfs2_inode structure and directly call the pvfs2_inode_initialize function for each alloc. They also remove the constructor functions for kiocb and dev_req and initialize them in their respective alloc functions. There is a patch for 2.6 and another for 2.8.2. The patch for 2.6 makes a few small modifications to be more like 2.8.
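The shape of the fix, initializing on every allocation instead of relying on the cache constructor, can be sketched in user space as follows. This is not the attached patch: the one-slot recycling "cache", the toy_* names, the structure fields, and the flag values are assumptions, and setting P_INIT_FLAG in the initializer is inferred from the earlier observation that the alloc normally returns with just that flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Flag names appear in this thread; the bit values are assumptions. */
#define P_MTIME_FLAG 0x2UL
#define P_INIT_FLAG  0x8UL

/* Hypothetical, heavily reduced stand-in for pvfs2_inode. */
struct toy_pinode {
    unsigned long pinode_flags;
    long handle;
};

/* One-slot free list standing in for the slab cache. */
static struct toy_pinode *recycled;

/* Plays the role of an initialize function: reset every field. */
static void toy_pinode_initialize(struct toy_pinode *p)
{
    memset(p, 0, sizeof(*p));
    p->pinode_flags = P_INIT_FLAG;
}

/* Allocation path: reuse a recycled object if any, then ALWAYS initialize. */
static struct toy_pinode *toy_pinode_alloc(void)
{
    struct toy_pinode *p = recycled ? recycled : malloc(sizeof(*p));
    recycled = NULL;
    if (p)
        toy_pinode_initialize(p);
    return p;
}

/* Release path: hand the object back with its contents untouched. */
static void toy_pinode_release(struct toy_pinode *p)
{
    assert(recycled == NULL);  /* the toy cache holds a single object */
    recycled = p;
}

/* Returns the flags seen by the second user of a recycled object. */
static unsigned long toy_flags_after_reuse(void)
{
    struct toy_pinode *a = toy_pinode_alloc();
    if (!a)
        return 0;
    a->pinode_flags |= P_MTIME_FLAG;   /* first user marks mtime dirty */
    toy_pinode_release(a);

    struct toy_pinode *b = toy_pinode_alloc();  /* recycled, reinitialized */
    unsigned long flags = b->pinode_flags;
    free(b);
    return flags;
}
```

Because the reset lives in the alloc path, it no longer matters what state the previous user left the object in.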


Bart.




On Fri, Mar 5, 2010 at 9:20 AM, Bart Taylor wrote:


    After some more digging, I found that pvfs2_clear_inode is being
    called on the inode before the timestamp changes. That call
    destroys the pvfs2_inode, so the next time getattr is called on
    it, the inode has to be reallocated. When the inode gets
    reallocated, it is initialized with pinode_flags that indicate the
    mtime needs to be set, so when that file is finally released,
    pvfs2_flush_inode calls setattr and updates the mtime.

    I modified the pinode_flags to unset the mtime flag in
    pvfs2_inode_alloc. That took care of the problem, but I am not
    sure what else that will affect. I do not see any code in pvfs
    that is assigning the flags, so I assume it is coming from the
    kernel during the kmem_cache_alloc.

    That alloc function returns with just the P_INIT_FLAG every time
    except for the instance where the mtime is getting updated. In
    that case it also has the P_ATIME_FLAG and P_MTIME_FLAG set. Does
    anyone know why this function would sometimes return with more
    flags set? Could it have something to do with a make_bad_inode call?

    I should also mention that we have only seen this on 2.4 kernels.

    Bart.










    On Wed, Feb 24, 2010 at 9:15 AM, Bart Taylor wrote:

        Actually I managed to trigger the same timestamp change on
        2.8.2 this morning. I attached a log from the job running
        against that file system and triggering the timestamp change;
        acache and ncache are disabled, but all other logging is
        enabled.

        Bart.




        A reference for looking through the log file:

        File/Directory                    Handle
        ============================================
        /                                 1048576
        /small-job/                       715624920
        /small-job/data_file              2147280687
        /small-job/temp/                  1431452804
        /small-job/temp/output_file       1431452797
        /small-job/temp/output_file.ctl   2147280686





        On Tue, Feb 23, 2010 at 11:11 PM, Bart Taylor wrote:

            Hey guys,

            We are running into a scenario where modify timestamps are
            getting updated when we do not think they should be. We
            have a single client accessing a single-node file system;
            the client reads an input file (357 bytes) and writes to
            two output files (~500 bytes) in a subdirectory. The
            timestamp is sporadically (one time in 10 or 20 runs)
            updated on the input file, but only if the write occurs
            (on the output file). I tried removing the portion of the
            job that writes to the output file and the timestamp
            never changes. I also moved the job off of PVFS2 and the
            timestamp never changes.

            The file system is a heavily patched version of the 2.6
            tree. I ran the same test on the latest 2.8.2 code and
            could not replicate the timestamp change. Unfortunately we
            cannot upgrade everything to 2.8.2 yet. Does anyone recall
            running into this particular problem, or have an idea of
            what might be causing it? I have attached a log file from
            the job with some explanations below.

            Thanks,

            Bart.

            I turned on verbose client logging and "32767" kernel
            logging and captured a run of the job failing. Acache and
            ncache are disabled. There are a few extra log messages
            that log when the SetMtimeFlag() call is made, but they do
            not show the flag being set for data_file.

            A reference for looking through the log file:

            File/Directory                    Handle
            =========================================
            /                                 1048576
            /small-job/                       1047532
            /small-job/data_file              1047520
            /small-job/temp/                  1047531
            /small-job/temp/output_file       1047501
            /small-job/temp/output_file.ctl   1047518





_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
