All,

I am testing OrangeFS to see if it is appropriate for use as our distributed
file system in an HPC environment.  I'm having a problem with invalid objects
appearing during heavy loads.

My current setup is 3 instances of Amazon Linux running in EC2, each with 5 x
1 TB EBS volumes, striped using LVM.  I am running OrangeFS 2.8.6, configured
with the following options:  "--with-db=/usr/local/BerkeleyDB.4.8
--with-kernel=/usr/src/kernels/3.2.28-45.62.amzn1.x86_64"
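
For completeness, the build was roughly the following (a sketch from memory;
the source directory and install prefix are from my environment):

cd /usr/src/orangefs-2.8.6
./configure --with-db=/usr/local/BerkeleyDB.4.8 \
            --with-kernel=/usr/src/kernels/3.2.28-45.62.amzn1.x86_64
make && make install            # userspace servers, clients, and tools
make kmod && make kmod_install  # pvfs2 kernel module for this kernel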

The invalid objects appear when I run something that generates a lot of I/O
on one node and then edit a file on a different node while that I/O is still
running.  In the following example, "/shared" is the pvfs2 mount.

OrangeFS1-  Create a 16 GB file:
/usr/bin/time dd if=/dev/zero of=/shared/16gb-test bs=1024k count=16384

OrangeFS2- While the file is being created, vi a file, add a character, and save it:
vi /shared/make_files.sh

Then show a file listing on OrangeFS1 or OrangeFS3:
[root@ip-10-0-2-138 shared]# ls -lah
ls: cannot access make_files.sh: Input/output error
total 17G
drwxrwxrwt  1 root  root  4.0K Sep  5 15:58 .
dr-xr-xr-x 25 root  root  4.0K Sep  5 03:11 ..
-rw-r--r--  1 root  root   16G Sep  5 15:50 16gb-test
drwxr-xr-x  1 root  root  4.0K Sep  5 13:54 filetest
drwxrwxr-x  1 root  root  4.0K Sep  5 15:39 iospeed
drwxrwxrwx  1 root  root  4.0K Sep  4 18:14 lost+found
??????????  ? ?     ?        ?            ? make_files.sh

If I unmount and remount /shared on the out-of-sync server, the
make_files.sh file returns to normal.
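
The remount is just the standard umount/mount (shown here against
ip-10-0-2-138; each node would use whichever server alias it normally mounts
from):

umount /shared
mount -t pvfs2 tcp://ip-10-0-2-138:3334/pvfs2-fs /shared
ls -lah /shared    # make_files.sh now shows up normally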

I have tried setting both TroveSyncMeta and TroveSyncData to "yes" and
restarting the service, but that didn't change the behavior.  I don't see any
errors in the client or server log files, other than broken pipes related to
restarting the service.
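
Concretely, the <StorageHints> section I tested with was:

        <StorageHints>
                TroveSyncMeta yes
                TroveSyncData yes
                TroveMethod alt-aio
        </StorageHints>

followed by a restart of pvfs2-server on each node, roughly as below (the
binary path is from my install; adjust to yours):

killall pvfs2-server
/usr/local/sbin/pvfs2-server /etc/pvfs2-fs.conf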

What could be causing the invalid objects, and is there a tuning change that
might make the setup more stable?

Thanks,
Nick Sabine


The 3 EC2 instances are of type cc2.8xlarge, with the following specs:
Cluster Compute Eight Extra Large Instance
60.5 GB of memory
88 EC2 Compute Units (2 x Intel Xeon E5-2670, eight-core "Sandy Bridge" 
architecture)
3370 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
EBS-Optimized Available: No*
API name: cc2.8xlarge

Configuration file for OrangeFS:

[root@ip-10-0-2-138 shared]# cat /etc/pvfs2-fs.conf
<Defaults>
        UnexpectedRequests 50
        EventLogging none
        EnableTracing no
        LogStamp datetime
        BMIModules bmi_tcp
        FlowModules flowproto_multiqueue
        PerfUpdateInterval 1000
        ServerJobBMITimeoutSecs 30
        ServerJobFlowTimeoutSecs 30
        ClientJobBMITimeoutSecs 300
        ClientJobFlowTimeoutSecs 300
        ClientRetryLimit 5
        ClientRetryDelayMilliSecs 2000
        PrecreateBatchSize 0,32,512,32,32,32,0
        PrecreateLowThreshold 0,16,256,16,16,16,0

        DataStorageSpace /local
        MetadataStorageSpace /local

        LogFile /var/log/pvfs2-server.log
</Defaults>

<Aliases>
        Alias ip-10-0-2-138 tcp://ip-10-0-2-138:3334
        Alias ip-10-0-2-139 tcp://ip-10-0-2-139:3334
        Alias ip-10-0-2-140 tcp://ip-10-0-2-140:3334
</Aliases>

<Filesystem>
        Name pvfs2-fs
        ID 165017615
        RootHandle 1048576
        FileStuffing yes
        <MetaHandleRanges>
                Range ip-10-0-2-138 3-1537228672809129302
                Range ip-10-0-2-139 1537228672809129303-3074457345618258602
                Range ip-10-0-2-140 3074457345618258603-4611686018427387902
        </MetaHandleRanges>
        <DataHandleRanges>
                Range ip-10-0-2-138 4611686018427387903-6148914691236517202
                Range ip-10-0-2-139 6148914691236517203-7686143364045646502
                Range ip-10-0-2-140 7686143364045646503-9223372036854775802
        </DataHandleRanges>
        <StorageHints>
                TroveSyncMeta no
                TroveSyncData no
                TroveMethod alt-aio
        </StorageHints>
</Filesystem>
