When I FTP a file to an FTP server that exposes an OrangeFS file system, I can
consistently get data corruption when the ProFTPD server has a specific
configuration.  If I set the send and receive socket buffer sizes for ProFTPD
(as configured in the attached config file), the data is often corrupted.  As
far as I could determine, all these settings do is set the SO_SNDBUF and
SO_RCVBUF options on the FTP sockets via setsockopt.
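
For anyone unfamiliar with those options, here is a minimal sketch of the
equivalent calls.  It is Python rather than ProFTPD's C code, the helper name
set_buffer_sizes is made up for illustration, and it assumes the settings map
directly onto setsockopt as described above.  On Linux the kernel typically
doubles the requested value and caps it at net.core.wmem_max /
net.core.rmem_max, so the size actually granted may differ from the request.

```python
import socket


def set_buffer_sizes(sock, sndbuf, rcvbuf):
    """Request send/receive buffer sizes; return what the kernel reports back.

    Hypothetical helper for illustration only, not ProFTPD code.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return granted_snd, granted_rcv


if __name__ == "__main__":
    # 104857600 is the sndbuf/rcvbuf value from the attached ProFTPD config.
    requested = 104857600
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            snd, rcv = set_buffer_sizes(s, requested, requested)
            print("requested", requested, "granted sndbuf", snd, "rcvbuf", rcv)
        except OSError as exc:
            # Some platforms reject oversized requests instead of capping them.
            print("setsockopt failed:", exc)
```

Comparing the granted sizes with and without the setting may help show
whether the large buffers actually take effect on a given machine.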

Here is my environment configuration:

FTP server: ProFTPD (I just built and installed the latest version available at 
proftpd.org).  I've attached the configuration file.
FTP client: The standard "ftp" command line utility present in RedHat Linux
FS Cluster: 4 Node OrangeFS Cluster using the attached config file.
All nodes involved were running 64bit RHEL5.5

Here is the command line used for the pvfs2-client: 
/usr/sbin/pvfs2-client --logtype syslog -p /usr/sbin/pvfs2-client-core 
--logstamp datetime --acache-timeout=30000 --ncache-timeout=30000 --desc-size 
8388608 --desc-count 5

The FTP client and server were on the same machine, so I simply FTP'd to 
localhost.

We initially saw the corruption when copying a 110G file, but a 118G file
copied just fine.  I've been able to reliably reproduce the corruption with a
10G file and sometimes with a 1G file.  I was never able to reproduce the
corruption with a file smaller than 500M.

The corruption seems to manifest as good data being replaced with nulls.  The
file size is always correct.
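
To make that pattern concrete, here is a rough sketch of how the nulled
regions could be located by comparing a known-good copy against the copy on
OrangeFS.  The function name find_nulled_ranges and the chunk size are my own
choices, not part of any existing tool, and a reported range may be split in
two wherever the original data itself contains a zero byte.

```python
def find_nulled_ranges(good_path, suspect_path, chunk_size=1 << 20):
    """Return (start, end) byte ranges, end exclusive, where the suspect file
    holds NUL bytes but the known-good file holds non-NUL data.

    Hypothetical diagnostic helper, for illustration only.
    """
    ranges = []
    start = None  # start offset of the currently open nulled range, if any
    offset = 0    # file offset of the chunk being compared
    with open(good_path, "rb") as good, open(suspect_path, "rb") as suspect:
        while True:
            a = good.read(chunk_size)
            b = suspect.read(chunk_size)
            if not a and not b:
                break
            if a == b:
                # Identical chunk: any open range ended at the chunk boundary.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
            else:
                common = min(len(a), len(b))
                for i in range(common):
                    nulled = b[i] == 0 and a[i] != 0
                    if nulled and start is None:
                        start = offset + i
                    elif not nulled and start is not None:
                        ranges.append((start, offset + i))
                        start = None
                # If one file is shorter, close the range where the overlap ends.
                if len(a) != len(b) and start is not None:
                    ranges.append((start, offset + common))
                    start = None
            offset += max(len(a), len(b))
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

Running this against the local debug copy and the OrangeFS copy would give
the offsets and lengths of the nulled regions, which could then be checked
against the buffer and strip sizes in the attached configs.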

Another observation is that it usually takes several attempts for a file 
transfer to cause corruption, but once it does, it is fairly consistent.

Also, I attempted to reproduce the issue by using the "cp" command, "dd", and
"rsync".  None of these operations would reproduce the issue.  Even an FTP
transfer made with the "curl" command line utility worked.  The only cases
that cause corruption are our custom FTP client and the command line ftp
client.  I'm assuming their access pattern triggers some edge case in
OrangeFS.

I also put debug statements in the proftpd code to write the data to a local
file immediately before writing it to the OrangeFS file system.  The debug
file would be correct (without corruption), while the file written out to
OrangeFS would be corrupted, so this isn't an issue of the data being
corrupted in-flight to the FTP server.

Any help you can provide would be greatly appreciated.

Thanks!

-- 
Benjamin Severs

[Attachment: OrangeFS server configuration file]
<Defaults>
        LogType syslog
        TCPBufferReceive 524288
        TCPBufferSend 524288
        TroveMaxConcurrentIO 16
        UnexpectedRequests 150
        EventLogging none
        EnableTracing no
        LogStamp datetime
        BMIModules bmi_tcp
        FlowModules flowproto_multiqueue
        PerfUpdateInterval 1000
        ServerJobBMITimeoutSecs 30
        ServerJobFlowTimeoutSecs 30
        ClientJobBMITimeoutSecs 30
        ClientJobFlowTimeoutSecs 30
        ClientRetryLimit 5
        ClientRetryDelayMilliSecs 33000
        PrecreateBatchSize 0, 32, 512, 32, 32, 32, 0
        PrecreateLowThreshold 0, 16, 256, 16, 16, 16, 0
        LogFile /tmp/pvfs2-server.log
        TCPBindSpecific yes
</Defaults>

<Security>
        TrustedNetwork 
</Security>

<Aliases>
        Alias node1.domain_tcp3334 tcp://node1.domain:3334
        Alias node2.domain_tcp3334 tcp://node2.domain:3334
        Alias node3.domain_tcp3334 tcp://node3.domain:3334
        Alias node4.domain_tcp3334 tcp://node4.domain:3334
</Aliases>

<Filesystem>
        DefaultNumDFiles 0
        FlowBuffersPerFlow 16
        FlowBufferSizeBytes 524288
        Name pvfs2-fs
        ID 2108795306
        RootHandle 1048576
        FileStuffing yes
        <MetaHandleRanges>
                Range node1.domain_tcp3334 3-536870913
                Range node2.domain_tcp3334 536870914-1073741824
                Range node3.domain_tcp3334 1073741825-1610612735
                Range node4.domain_tcp3334 1610612736-2147483646
        </MetaHandleRanges>
        <DataHandleRanges>
                Range node1.domain_tcp3334 2147483647-2684354557
                Range node2.domain_tcp3334 2684354558-3221225468
                Range node3.domain_tcp3334 3221225469-3758096379
                Range node4.domain_tcp3334 3758096380-4294967290
        </DataHandleRanges>
        <StorageHints>
                DirectIOTimeout 1000
                DirectIOOpsPerQueue 10
                DirectIOThreadNum 30
                AttrCacheMaxNumElems 1024
                AttrCacheSize 511
                AttrCacheKeywords dh,md,de,st
                HandleRecycleTimeoutSecs 360
                CoalescingLowWatermark 1
                CoalescingHighWatermark 8
                DBCacheType sys
                DBCacheSizeBytes 262144
                TroveSyncMeta yes
                TroveSyncData yes
                TroveMethod directio
        </StorageHints>
        <Distribution>
                Name simple_stripe
                Param strip_size
                Value 65536
        </Distribution>
        <ExportOptions>
                ReadOnly 
                RootSquashExceptions 
                RootSquash 
                AnonGID 99
                AnonUID 99
        </ExportOptions>
</Filesystem>

<ServerOptions>
        Server node1.domain_tcp3334
        StorageSpace /pvfs2-data-3334
        LogFile /tmp/pvfs2-server.log-node1.domain_tcp3334
</ServerOptions>

<ServerOptions>
        Server node2.domain_tcp3334
        StorageSpace /pvfs2-data-3334
        LogFile /tmp/pvfs2-server.log-node2.domain_tcp3334
</ServerOptions>

<ServerOptions>
        Server node3.domain_tcp3334
        StorageSpace /pvfs2-data-3334
        LogFile /tmp/pvfs2-server.log-node3.domain_tcp3334
</ServerOptions>

<ServerOptions>
        Server node4.domain_tcp3334
        StorageSpace /pvfs2-data-3334
        LogFile /tmp/pvfs2-server.log-node4.domain_tcp3334
</ServerOptions>

[Attachment: ProFTPD configuration file]
AllowOverwrite on
AuthOrder mod_auth_pam.c* mod_auth_unix.c
AuthPAM on
AuthPAMConfig proftp
CommandBufferSize 512
DebugLevel 0
DefaultChdir /mnt
DefaultRoot /
DefaultServer on
DefaultTransferMode binary
ExtendedLog /var/log/proftpd/extended.log ALL default
Group nobody
IdentLookups off
LogFormat default "%t - %h %u '%r' '%D' '%f' <%s> %b %T"
MaxClients 200
MaxClientsPerHost 100
MaxClientsPerUser 20
MaxLoginAttempts 3
PersistentPasswd off
Port 21
RootLogin off
ServerName ProFTPD
ServerType standalone
SetEnv TZ :/etc/localtime
ShowSymlinks on
SocketOptions sndbuf 104857600 rcvbuf 104857600
SyslogLevel notice
SystemLog /var/log/proftpd/server.log
tcpBackLog 5
tcpNoDelay on
TimeoutIdle 600
TimeoutLogin 300
TimeoutNoTransfer 600
TimeoutStalled 3600
TimesGMT off
TransferLog /var/log/proftpd/transfer.log
Umask 002
User nobody
UseReverseDNS on
WtmpLog off