Hi Tiankai,

I've been debugging something similar, I think, but I'm not able to reproduce the EACCES (Permission denied) error with only a few nodes. It would help to eliminate a few things to isolate the problem and to see whether we're both looking at the same bug.

Can you disable the name and attribute caches in the client daemon? You should be able to do that by starting pvfs2-client with -n 0 -a 0. With those options, does the problem persist?
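For example (adjust the path to wherever pvfs2-client is installed):

    /usr/local/sbin/pvfs2-client -n 0 -a 0

If I remember right, those options set the attribute and name cache timeouts in milliseconds, so 0 effectively disables both caches.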

Are your nodes x86_64?

What happens if you use just one node as a metadata server instead of all 6?

Thanks,
-sam

On Mar 17, 2008, at 11:20 AM, Tu, Tiankai wrote:

I have been testing whether PVFS2 can support large-scale, read-intensive
parallel workloads, in particular post-simulation data analysis. Although
the preliminary results (on a small cluster) are encouraging when
everything works, there have been a few occasions where mysterious
"Permission denied" errors occurred and the applications halted.

Below is the system hardware/software setup:

- 6 compute nodes, each with 8 cores, 16 GB memory, and 170 GB free disk
space managed by XFS
- Each node connected by a 1 GigE link to a 10 GigE switch
- Linux kernel: 2.6.22.15-7smp

PVFS setup:

- pvfs-2.7.0 installed
- All 6 nodes also used as both metadata servers and I/O servers
- The same 6 nodes used to run the application codes (as PVFS clients)
- PVFS kernel module installed on all nodes
- PVFS mounted on each node with the local hostname specified as the
metadata server (see the example mount command after this list)
- Regular unix open/read/close calls from within the applications
- Default file striping on all servers
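Concretely, the mount on each node looks roughly like this (port and fs
name here are the stock pvfs2-genconfig defaults; adjust as appropriate):

    mount -t pvfs2 tcp://`hostname`:3334/pvfs2-fs /scratch/mnt/pvfs2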

Application characteristics:

- Parallel Python programs
- A large number of parallel read threads
- Mostly independent read traces; occasional shared accesses to the
same file, but by no more than 2 threads
- Large, equally-sized files (> 64 MB)
- Each thread opens a file, reads the entire file content (most of the
time), extracts the data of interest, closes the file, and moves on to
the next file (see the sketch after this list)
- The sequence of files accessed by each thread is pre-determined
(i.e., no runtime arbitration)
- Experiments run on configurations with different numbers of nodes and
different numbers of cores per node; the total number of (read) threads
is (number of nodes x cores per node)
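
In Python, each reader thread does roughly the following (extract() is
just a stand-in for the actual analysis step):

    def extract(data):
        # stand-in for the real per-file analysis
        pass

    def run_thread(paths):
        # paths: this thread's pre-assigned sequence of files
        for path in paths:
            f = open(path, 'rb')    # this open() is what fails with
                                    # "[Errno 13] Permission denied"
            try:
                data = f.read()     # usually the whole file (> 64 MB)
            finally:
                f.close()
            extract(data)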

Error:
- An example (6 nodes, 4 threads per node): Cannot open file
/scratch/mnt/pvfs2/merged_frameset_64MB/p2auto/00000001/trj/frame000000844
[Errno 13] Permission denied:
'/scratch/mnt/pvfs2/merged_frameset_64MB/p2auto/00000001/trj/frame000000844'
- Similar errors encountered in other node/thread configurations
- The files reported as inaccessible were all verified to be accessible
from all 6 compute/storage nodes


Extra information:
- On the first trial with PVFS, a different error, "[Errno 11] Resource
temporarily unavailable", occurred multiple times along with "[Errno 13]
Permission denied"
- The PVFS configuration was then changed to increase the number of
retries from 5 to 10 and the retry delay from 2 to 2.5 sec (see the
config snippet after this list)
- [Errno 11] did not show up again, but [Errno 13] showed up more often
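
If I remember the option names correctly, the change amounts to the
following in the Defaults section of the server fs.conf (the stock
values are 5 and 2000 ms):

    <Defaults>
        ClientRetryLimit 10
        ClientRetryDelayMilliSecs 2500
    </Defaults>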

Thanks for the help.
Tiankai


