Hello,
we have a user whose program regularly fails when it tries to read
input files from our PVFS2 file system. Since we can't see anything wrong
with PVFS2 on either the server or the client end, I figured I'd ask the
public for suggestions.
From the user's standpoint, all he sees is something like this (output from
his program):
--------------
Error reading file: sandia_helium_3m201compL2up.uda.003/input.xml
--------------
In this case, he's running on 32 dual-Opteron nodes, and each CPU on these
nodes (64 in total) reads these input files via standard UNIX I/O. The only
clues we have so far come from examining /var/log/messages on all
the nodes used in this job (32 in total), with the following results:
--------------------------
.....
bunch of nodes were OK
.....
Jan 10 09:38:08 da065 kernel: pvfs2:copy_attributes_to_inode: got invalid attribute type 2
Jan 10 09:38:08 da065 kernel: pvfs2_inode_getattr: failed to copy attributes
Jan 10 09:38:08 da065 kernel: pvfs2:copy_attributes_to_inode: got invalid attribute type 2
Jan 10 09:38:08 da065 kernel: pvfs2_inode_getattr: failed to copy attributes
.....
Jan 10 09:38:08 da066 kernel: pvfs2:copy_attributes_to_inode: got invalid attribute type 2
Jan 10 09:38:08 da066 kernel: pvfs2_inode_getattr: failed to copy attributes
Jan 10 09:38:08 da066 kernel: pvfs2:copy_attributes_to_inode: got invalid attribute type 2
Jan 10 09:38:08 da066 kernel: pvfs2_inode_getattr: failed to copy attributes
.....
Jan 10 09:38:08 da067 kernel: pvfs2:copy_attributes_to_inode: got invalid attribute type 2
Jan 10 09:38:08 da067 kernel: pvfs2_inode_getattr: failed to copy attributes
Jan 10 09:38:08 da067 kernel: pvfs2:copy_attributes_to_inode: got invalid attribute type 2
Jan 10 09:38:08 da067 kernel: pvfs2_inode_getattr: failed to copy attributes
.....
da080 OK
.....
Jan 10 09:38:08 da085 kernel: pvfs2:copy_attributes_to_inode: got invalid attribute type 2
Jan 10 09:38:08 da085 kernel: pvfs2_inode_getattr: failed to copy attributes
Jan 10 09:38:08 da085 kernel: pvfs2:copy_attributes_to_inode: got invalid attribute type 2
Jan 10 09:38:08 da085 kernel: pvfs2_inode_getattr: failed to copy attributes
-------------------------------------
As you can see, four nodes near the bottom of the node list for this job
reported these problems. I suspect PVFS2 may be getting overloaded, but
why would it be in the first place? Anyway, any ideas on this would be
welcome.
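In case anyone wants to repeat the per-node tally on their own logs, here is
a minimal sketch in Python of how the summary above could be produced. It
assumes the classic syslog layout seen in the excerpt (timestamp, host,
"kernel:", message) and that each message sits on a single line in the real
/var/log/messages. The names tally_pvfs2_errors and report are made up for
illustration and are not part of PVFS2.

```python
import re
from collections import Counter

# Assumed line layout (taken from the excerpt above), e.g.:
#   Jan 10 09:38:08 da065 kernel: pvfs2_inode_getattr: failed to copy attributes
PVFS2_LINE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+"   # month, day, time
    r"(?P<host>\S+)\s+kernel:\s+"                # reporting host
    r"(?P<msg>pvfs2.*)$"                         # kernel message starting with 'pvfs2'
)


def tally_pvfs2_errors(lines):
    """Return {host: Counter({message: count})} for pvfs2 kernel messages."""
    per_host = {}
    for line in lines:
        match = PVFS2_LINE.match(line.rstrip("\n"))
        if match:
            counts = per_host.setdefault(match.group("host"), Counter())
            counts[match.group("msg").strip()] += 1
    return per_host


def report(per_host):
    """Print one line per (host, message) pair, hosts in sorted order."""
    for host in sorted(per_host):
        for msg, count in per_host[host].most_common():
            print(f"{host}: {count} x {msg}")
```

To use it, collect the logs from the job's nodes into one file and call
report(tally_pvfs2_errors(open(path))), where path is wherever you saved the
combined log. Nodes that logged no pvfs2 kernel messages simply do not appear
in the output.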
Thanks,
MC
--
Martin Cuma
Center for High Performance Computing
University of Utah
_______________________________________________
PVFS2-users mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users