Hi Sam,
I didn't have a chance to test your suggested configuration (-n 0 -a 0)
on the 6-node cluster. But I recently installed and experimented with
PVFS on a larger, 64-node cluster using your proposed runtime flags. It
worked most of the time, but the "Permission denied" error still showed
up occasionally. The details of the setup are listed below.
PVFS2 configuration:
- A 64-node Linux cluster, each node has 8 cores
- Kernel version 2.6.22.15-8smp
- Pvfs-2.7.0 installed
- All 64 nodes used as both metadata servers and IO servers
- PVFS kernel module installed on all 64 nodes
- Regular open/read/close calls from within applications
- Default file striping on all servers
Application characteristics:
- Parallel Python programs, using 4 of the 8 cores on each node
- A large number of parallel read threads
- Mostly independent read traces; occasionally shared accesses to the
same file, but by no more than two threads
- Large, equally-sized files (> 64 MB)
- Each thread opens a file, reads in the content of the entire file
(most of the time), extracts data of interest, closes the file, and
moves on to the next file (see the sketch after this list)
- The sequence of files to be accessed by each thread is pre-determined
(i.e., no runtime arbitration)
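For concreteness, here is a minimal sketch of what each reader thread
does (illustrative only; extract_data_of_interest is a placeholder, not
our actual analysis code):

    def extract_data_of_interest(data):
        # Placeholder for the per-file analysis step.
        pass

    def reader_thread(paths):
        # Each thread walks its pre-determined list of files (no runtime
        # arbitration): open, read the whole file, extract, close, move on.
        for path in paths:
            with open(path, 'rb') as f:   # the open() call is where the
                data = f.read()           # [Errno 13] failures surface
            extract_data_of_interest(data)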
An error example:
Cannot open file:
/scratch/mnt/pvfs2/dataset/merged_frameset_64MB/conduction/trj.dtr_20071207222232/frame000000025
[Errno 13] Permission denied.
Extra information:
A number of [Errno 11] "Resource temporarily unavailable" errors showed
up earlier. I changed the default PVFS configuration as follows and no
longer saw errno 11.
<Defaults>
UnexpectedRequests 2048
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 5
ClientJobFlowTimeoutSecs 5
ClientRetryLimit 8
ClientRetryDelayMilliSecs 0
TroveMaxConcurrentIO 64
</Defaults>
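In case it's useful, on the application side we could also wrap open()
in a retry loop to ride out transient failures. A minimal sketch only
(the retry count and delay are arbitrary, and this is not what produced
the results above):

    import errno
    import time

    def open_with_retry(path, mode='rb', retries=8, delay=0.5):
        # Retry opens that fail with a transient-looking errno
        # (11 = EAGAIN, and 13 = EACCES, which in our logs also appears
        # to be transient); re-raise anything else immediately.
        for _ in range(retries):
            try:
                return open(path, mode)
            except IOError as e:
                if e.errno not in (errno.EAGAIN, errno.EACCES):
                    raise
                time.sleep(delay)
        return open(path, mode)  # last attempt; let any error propagate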
Tiankai
-----Original Message-----
From: Sam Lang [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 18, 2008 11:23 AM
To: Tu, Tiankai
Cc: [email protected]
Subject: Re: [Pvfs2-users] Heavy read workload and "Permission denied"
errors
Hi Tiankai,
I've been debugging something similar I think, but I'm not able to
reproduce the EACCES (Permission denied) error with only a few nodes.
It would be helpful to eliminate a few things to isolate the problem
and see if we're both looking at the same bug.
Can you disable the name and attribute cache in the client daemon? To
do that, you should be able to start the pvfs2-client with -n 0 -a 0.
With those options, does the problem persist?
Are your nodes x86_64?
What happens if you just use one node as a metadata server instead of
all 6?
Thanks,
-sam
On Mar 17, 2008, at 11:20 AM, Tu, Tiankai wrote:
> I have been testing whether PVFS2 can be used to support large-scale,
> read-intensive parallel workloads, in particular post-simulation data
> analysis. Although the preliminary results (on a small cluster) are
> encouraging when everything worked, there have been a few occasions
> where mysterious "Permission Denied" errors occurred and the
> applications halted.
>
> Below is the system hardware/software setup:
>
> - 6 compute nodes, each with 8 cores, 16 GB memory, and 170 GB free
> disk space managed by XFS.
> - Each node is connected by a 1 GigE link to a 10 GigE switch
> - Linux kernel: 2.6.22.15-7smp
>
> PVFS setup
>
> - pvfs-2.7.0 installed
> - All the 6 nodes also used as both metadata servers and IO servers
> - The same 6 nodes used to run application codes (as pvfs clients)
> - pvfs kernel module installed on all the nodes
> - pvfs mounted on each node with the local hostname specified as the
> metadata server
> - regular unix open/read/close calls from within the applications
> - Default file striping on all the servers
>
> Application characteristics:
>
> - Parallel Python programs
> - A large number of parallel read threads
> - Mostly independent read traces; occasionally shared accesses to the
> same file but by no more than 2 threads
> - Large, equally-sized files (> 64 MB)
> - Each thread opens a file, reads in the content of the entire file
> (most of the time), extracts data of interest, closes the file and
> moves
> to the next file
> - The sequence of files to be accessed by each thread is pre-determined
> (i.e., no runtime arbitration)
> - Experiments run on configurations with different numbers of nodes
> and different numbers of cores per node; the total number of (read)
> threads is determined by (number of nodes x cores per node)
>
> Error:
> - An example (6 nodes, 4 threads per node): Cannot open file
> /scratch/mnt/pvfs2/merged_frameset_64MB/p2auto/00000001/trj/frame000000844
> [Errno 13] Permission denied:
> '/scratch/mnt/pvfs2/merged_frameset_64MB/p2auto/00000001/trj/frame000000844'
> - Similar errors encountered in other node/thread configurations
> - The files reported as inaccessible were all verified to be
> accessible from all 6 compute/storage nodes
>
>
> Extra information:
> - On the first trial with PVFS, a different error "[Errno 11] Resource
> temporarily unavailable" occurred multiple times along with "[Errno 13]
> Permission denied."
> - The PVFS configuration was changed to increase the number of retries
> from 5 to 10 and the delay from 2 to 2.5 sec
> - [Errno 11] did not show up again, but [Errno 13] showed up more often
>
> Thanks for the help.
> Tiankai
>
>
>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users