Hi,

Thank you for all the effort put into making PVFS2 available. I'm relatively new to Linux (from WinXP), and have built a 3-node cluster using the Rocks Cluster software v4.2.1. I've installed the PVFS2 roll, and by following the PVFS2 roll guide all has proceeded very smoothly - really, thanks - I'd expected it to take a few days/weeks to get to this point.
At the end of this email I pose some questions that the following behavior has raised.

About my set-up:
- A single user.
- I made no changes to the PVFS configuration established by the PVFS2 roll.
- One head node and two compute-I/O nodes.
- PVFS version 1.5.1.

The unexpected behavior:

Using pvfs2-cp I have copied approx. 900GB of files from several DVDs (I dd each DVD to a tmpfs area, then pvfs2-cp this 'image' to /mnt/pvfs2/some/path). I have noticed that this runs fine so long as it is the first time the file is copied.

If I use pvfs2-rm to delete a file, not necessarily from the same node used to make the copy, the following occurs (all nodes seem to be up and working fine):
- I can see the file is removed using the GNOME file browser.
- The pvfs2-rm seems to hang, and the following message is displayed:

[E 15:10:02.584608] Job time out: cancelling bmi operation, job_id: 21.
[E 15:10:02.584769] msgpair failed, will retry: Operation cancelled (possibly due to timeout)

If I try to re-copy the file (using pvfs2-cp), again not necessarily from the same node it was first copied on, then I see the following and the copy fails:

[E 15:26:53.690560] Job time out: cancelling bmi operation, job_id: 25.
[E 15:26:53.690710] msgpair failed, will retry: Operation cancelled (possibly due to timeout)
[E 15:26:53.690733] *** msgpairarray_completion_fn: msgpair to server tcp://pvfs2-compute-0-1:3334 failed: Operation cancelled (possibly due to timeout)
[E 15:26:53.690743] *** No retries requested.
pvfs2-cp: src/client/sysint/sys-getattr.sm:331: getattr_acache_lookup: Assertion `object_ref.handle != ((PVFS_handle)0)' failed.

On rebooting one of the nodes I was forced to run fsck; after this the cluster seems to have returned to 'normal'.

The good news is that the standard Linux commands cp and rm don't seem to have any trouble, so I am using those at the moment. I couldn't find any advice on whether cp, etc., is preferred to pvfs2-cp, or vice versa.

1) Is this a known issue that is fixed in PVFS 2.6?
2) Is it fine to continue to use v1.5.1 so long as I don't use the pvfs2-* commands?
3) Is upgrading to v2.6 on a Rocks cluster straightforward, or is it likely to involve some 'debugging' and a few days' work? Bear in mind my relative inexperience with Linux.

Regards,
Mark
