Hi Tiankai,
I've been debugging something similar, I think, but I'm not able to reproduce the EACCES (Permission denied) error with only a few nodes. It would help to eliminate a few things to isolate the problem and to confirm we're both looking at the same bug.
Can you disable the name and attribute cache in the client daemon? To do that, you should be able to start the pvfs2-client with -n 0 -a 0. With those options, does the problem persist?
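For reference, the start line might look something like the following (the `-n 0 -a 0` options are as described above; the install path is an assumption, so adjust for your setup):

```shell
# Start the client daemon with the name cache (-n) and attribute
# cache (-a) timeouts set to 0, effectively disabling both caches.
# (Path is illustrative; use wherever pvfs2-client lives on your nodes.)
/usr/sbin/pvfs2-client -n 0 -a 0
```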
Are your nodes x86_64? What happens if you use just one node as a metadata server instead of all 6?
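It might also help to narrow down whether the access pattern itself is the trigger. A minimal sketch of a reproducer along the lines of your workload (many threads, each reading a pre-determined sequence of equally-sized files in full) is below; it runs against ordinary temp files here, but in a real test you would point it at files under the PVFS2 mount. The file names, sizes, and thread counts are placeholders, not your actual ones:

```python
import concurrent.futures
import os
import tempfile

FILE_SIZE = 64 * 1024  # small stand-in for the > 64 MB files in the real workload


def make_files(root, count):
    # Create equally-sized dummy files; contents don't matter for the test.
    paths = []
    for i in range(count):
        p = os.path.join(root, "frame%09d" % i)
        with open(p, "wb") as f:
            f.write(b"x" * FILE_SIZE)
        paths.append(p)
    return paths


def reader(paths):
    # Each thread opens a file, reads the entire content, closes it,
    # and moves to the next file, mirroring the described workload.
    total, errors = 0, []
    for p in paths:
        try:
            with open(p, "rb") as f:
                total += len(f.read())
        except OSError as e:  # an EACCES would surface here as errno 13
            errors.append((p, e.errno))
    return total, errors


def run(nthreads=4, files_per_thread=8):
    with tempfile.TemporaryDirectory() as root:
        paths = make_files(root, nthreads * files_per_thread)
        # Pre-determined, mostly disjoint per-thread file sequences
        # (no runtime arbitration, as in the original workload).
        groups = [paths[i::nthreads] for i in range(nthreads)]
        with concurrent.futures.ThreadPoolExecutor(max_workers=nthreads) as ex:
            return list(ex.map(reader, groups))


if __name__ == "__main__":
    for total, errors in run():
        print(total, errors)
```

If a stripped-down script like this still hits errno 13 on the PVFS2 mount with the client caches disabled, that would point at the client/server side rather than the application.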
Thanks,
-sam

On Mar 17, 2008, at 11:20 AM, Tu, Tiankai wrote:
I have been testing whether PVFS2 can be used to support a large-scale, read-intensive parallel workload, in particular post-simulation data analysis. Although the preliminary results (on a small cluster) are encouraging when everything worked, there have been a few occasions where mysterious "Permission denied" errors occurred and the applications halted.

System hardware/software setup:
- 6 compute nodes, each with 8 cores, 16 GB memory, and 170 GB free disk space managed by XFS
- Nodes interconnected by 1 GigE cables to a 10 GigE switch
- Linux kernel: 2.6.22.15-7smp

PVFS setup:
- pvfs-2.7.0 installed
- All 6 nodes used as both metadata servers and I/O servers
- The same 6 nodes used to run application codes (as PVFS clients)
- PVFS kernel module installed on all the nodes
- PVFS mounted with the local hostname specified as the metadata server on each node
- Regular Unix open/read/close calls from within the applications
- Default file striping on all the servers

Application characteristics:
- Parallel Python programs
- A large number of parallel read threads
- Mostly independent read traces; occasionally shared accesses to the same file, but by no more than 2 threads
- Large, equally-sized files (> 64 MB)
- Each thread opens a file, reads in the content of the entire file (most of the time), extracts data of interest, closes the file, and moves to the next file
- The sequence of files to be accessed by each thread is pre-determined (i.e., no runtime arbitration)
- Experiments run on configurations with different numbers of nodes and different numbers of cores per node; the total number of (read) threads is determined by (number of nodes x cores per node)

Error:
- An example (6 nodes, 4 threads per node):
  Cannot open file /scratch/mnt/pvfs2/merged_frameset_64MB/p2auto/00000001/trj/frame000000844
  [Errno 13] Permission denied: '/scratch/mnt/pvfs2/merged_frameset_64MB/p2auto/00000001/trj/frame000000844'
- Similar errors encountered in other node/thread configurations
- The files reported as inaccessible were all verified to be accessible from all 6 compute/storage nodes

Extra information:
- On the first trial with PVFS, a different error, "[Errno 11] Resource temporarily unavailable", occurred multiple times along with "[Errno 13] Permission denied"
- The PVFS configuration was changed to increase the number of retries from 5 to 10 and the delay from 2 to 2.5 sec
- [Errno 11] did not show up again, but [Errno 13] showed up more often

Thanks for the help.

Tiankai

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
