Thanks for tracking that down, Bart. The patch is in CVS now.
-Phil
On 03/11/2010 10:36 AM, Bart Taylor wrote:
After some further digging and off-list help from Phil, we determined
that the pvfs2_inode structures were not being properly initialized.
PVFS2 uses a slab cache for the pvfs2_inode structures, which
allocates and initializes a big chunk of them at once. There is a constructor
function (pvfs2_inode_cache_ctor) passed to the kmem_cache_create call
that does the initialization up front to cut down on expensive setup
time for semaphores and the like later. The constructor is only called
when new objects are added to the cache: when the cache is first
populated, and again later only if the cache needs to be grown. When
the memory is released back into the cache, none of the contents are
cleared before being handed out again.
In the pvfs2_inode_alloc function there is a call to kmem_cache_alloc
to get a pvfs2_inode structure from the slab cache, but it is never
initialized. It looks like PVFS2 expects the constructor to be called
every time a kmem_cache_alloc call is made, because that is the only
place the pvfs2_inode structures are cleared. Since they are not
initialized, some of the fields - including the pinode_flags - are
never reset from their previous use. If a pvfs2_inode structure has
leftover pinode_flags that indicate an mtime update is required and
that structure is handed out by the cache again, pvfs2_flush_inode
does a setattr when the file is released, updating the mtime on a
file that may not have been modified at all.
The same constructor/initialization situation exists for dev_req_alloc
and kiocb_alloc; the initializations are made once at cache creation
time instead of each time a structure is allocated.
The attached patches clear more fields in the pvfs2_inode structure
and directly call the pvfs2_inode_initialize function for each alloc.
They also remove the constructor functions for kiocb and dev_req and
initialize them in their respective alloc functions. There is a patch
for 2.6 and another for 2.8.2. The patch for 2.6 makes a few small
modifications to be more like 2.8.
Bart.
On Fri, Mar 5, 2010 at 9:20 AM, Bart Taylor <[email protected]> wrote:
After some more digging, I found that pvfs2_clear_inode is being
called on the inode before the timestamp changes. That call
destroys the pvfs2_inode, so the next time getattr is called on
it, the inode has to be reallocated. When the inode gets
reallocated, it is initialized with pinode_flags that indicate the
mtime needs to be set, so when that file is finally released,
pvfs2_flush_inode calls setattr and updates the mtime.
I modified the pinode_flags to unset the mtime flag in
pvfs2_inode_alloc. That took care of the problem, but I am not
sure what else that will affect. I do not see any code in pvfs
that is assigning the flags, so I assume it is coming from the
kernel during the kmem_cache_alloc.
That alloc function returns with just the P_INIT_FLAG every time
except for the instance where the mtime is getting updated. In
that case it also has the P_ATIME_FLAG and P_MTIME_FLAG set. Does
anyone know why this function would sometimes return with more
flags set? Could it have something to do with a make_bad_inode call?
I should also mention that we have only seen this on 2.4 kernels.
Bart.
On Wed, Feb 24, 2010 at 9:15 AM, Bart Taylor <[email protected]> wrote:
Actually I managed to trigger the same timestamp change on
2.8.2 this morning. I attached a copy of the log from the job running
against that file system and triggering the timestamp change;
acache and ncache logging are disabled, but all other logging
is enabled.
Bart.
A reference for looking through the log file:
File/Directory                    Handle
============================================
/                                 1048576
/small-job/                       715624920
/small-job/data_file              2147280687
/small-job/temp/                  1431452804
/small-job/temp/output_file       1431452797
/small-job/temp/output_file.ctl   2147280686
On Tue, Feb 23, 2010 at 11:11 PM, Bart Taylor <[email protected]> wrote:
Hey guys,
We are running into a scenario where modify timestamps are
getting updated when we do not think they should be. We
have a single client accessing a single node file system
that is reading an input file (357 bytes) and writing to
two output files (~500 bytes) in a subdirectory. The
timestamp is sporadically (one time in 10 or 20 runs)
updated on the input file, but only if the write occurs
(on the output file). I tried removing the portion of the
job that writes to the output file, and the timestamp
never changed. I also moved the job off of PVFS2, and the
timestamp never changed.
The file system is a heavily patched version of the 2.6
tree. I ran the same test on the latest 2.8.2 code and
could not replicate the timestamp change. Unfortunately we
cannot upgrade everything to 2.8.2 yet. Does anyone recall
running into this particular problem, or have an idea of
what might be causing it? I have attached a log file from
the job with some explanations below.
Thanks,
Bart.
I turned on verbose client logging and "32767" kernel
logging and captured a run of the job failing. Acache and
ncache are disabled. There are a few extra log messages
that log when the SetMtimeFlag() call is made, but they do
not show the flag being set for data_file.
A reference for looking through the log file:
File/Directory                    Handle
============================================
/                                 1048576
/small-job/                       1047532
/small-job/data_file              1047520
/small-job/temp/                  1047531
/small-job/temp/output_file       1047501
/small-job/temp/output_file.ctl   1047518
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users