All,
I am testing OrangeFS to see whether it is appropriate for use as our distributed
file system in an HPC environment. I'm having a problem with invalid objects
appearing under heavy load.
My current setup is three Amazon Linux instances running in EC2, each with 5 x
1 TB EBS volumes striped together using LVM. I am running OrangeFS 2.8.6, configured
with the following options: "--with-db=/usr/local/BerkeleyDB.4.8
--with-kernel=/usr/src/kernels/3.2.28-45.62.amzn1.x86_64"
The invalid objects appear when I run something that generates a lot of I/O on
one node, then edit a file on a different node. In the following example,
"/shared" is the pvfs2 mount.
On OrangeFS1, create a 16 GB file:
/usr/bin/time dd if=/dev/zero of=/shared/16gb-test bs=1024k count=16384
On OrangeFS2, while that file is still being written, vi a different file, add a character, and save it:
vi /shared/make_files.sh
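(For a scriptable stand-in for the vi step, I'd expect something like the
following to trigger the same behavior, assuming vi saves by writing a new
file and renaming it over the original; the .new suffix is just illustrative:)
cp /shared/make_files.sh /shared/make_files.sh.new
echo "# touch" >> /shared/make_files.sh.new
mv /shared/make_files.sh.new /shared/make_files.sh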
Then show a file listing on OrangeFS1 or OrangeFS3:
[root@ip-10-0-2-138 shared]# ls -lah
ls: cannot access make_files.sh: Input/output error
total 17G
drwxrwxrwt 1 root root 4.0K Sep 5 15:58 .
dr-xr-xr-x 25 root root 4.0K Sep 5 03:11 ..
-rw-r--r-- 1 root root 16G Sep 5 15:50 16gb-test
drwxr-xr-x 1 root root 4.0K Sep 5 13:54 filetest
drwxrwxr-x 1 root root 4.0K Sep 5 15:39 iospeed
drwxrwxrwx 1 root root 4.0K Sep 4 18:14 lost+found
?????????? ? ? ? ? ? make_files.sh
If I unmount and remount /shared on the out-of-sync server, make_files.sh
returns to normal.
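(For reference, the remount is just the standard pvfs2 kernel mount, using the
server and filesystem names from the config below, e.g. on the first node:)
umount /shared
mount -t pvfs2 tcp://ip-10-0-2-138:3334/pvfs2-fs /shared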
I have tried setting both TroveSyncMeta and TroveSyncData to "yes" and
restarting the service, but that didn't change the behavior. I don't see any
errors in the client or server log files, except for broken pipes related to
restarting the service.
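(That is, with the StorageHints section changed to the following before
restarting:)
<StorageHints>
TroveSyncMeta yes
TroveSyncData yes
TroveMethod alt-aio
</StorageHints>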
What could be causing the invalid objects, and is there a tuning change that
might make the filesystem more stable?
Thanks,
Nick Sabine
All three EC2 instances are cc2.8xlarge, with the following specs:
Cluster Compute Eight Extra Large Instance
60.5 GB of memory
88 EC2 Compute Units (2 x Intel Xeon E5-2670, eight-core "Sandy Bridge"
architecture)
3370 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
EBS-Optimized Available: No
API name: cc2.8xlarge
Configuration file for OrangeFS:
[root@ip-10-0-2-138 shared]# cat /etc/pvfs2-fs.conf
<Defaults>
UnexpectedRequests 50
EventLogging none
EnableTracing no
LogStamp datetime
BMIModules bmi_tcp
FlowModules flowproto_multiqueue
PerfUpdateInterval 1000
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 5
ClientRetryDelayMilliSecs 2000
PrecreateBatchSize 0,32,512,32,32,32,0
PrecreateLowThreshold 0,16,256,16,16,16,0
DataStorageSpace /local
MetadataStorageSpace /local
LogFile /var/log/pvfs2-server.log
</Defaults>
<Aliases>
Alias ip-10-0-2-138 tcp://ip-10-0-2-138:3334
Alias ip-10-0-2-139 tcp://ip-10-0-2-139:3334
Alias ip-10-0-2-140 tcp://ip-10-0-2-140:3334
</Aliases>
<Filesystem>
Name pvfs2-fs
ID 165017615
RootHandle 1048576
FileStuffing yes
<MetaHandleRanges>
Range ip-10-0-2-138 3-1537228672809129302
Range ip-10-0-2-139 1537228672809129303-3074457345618258602
Range ip-10-0-2-140 3074457345618258603-4611686018427387902
</MetaHandleRanges>
<DataHandleRanges>
Range ip-10-0-2-138 4611686018427387903-6148914691236517202
Range ip-10-0-2-139 6148914691236517203-7686143364045646502
Range ip-10-0-2-140 7686143364045646503-9223372036854775802
</DataHandleRanges>
<StorageHints>
TroveSyncMeta no
TroveSyncData no
TroveMethod alt-aio
</StorageHints>
</Filesystem>