When FTPing a file to an FTP server that exposes an OrangeFS file system, I can
consistently reproduce data corruption when the ProFTPD server has a specific
configuration. If I set the send and receive socket buffer sizes for ProFTPD
(as configured in the attached config file), transfers often corrupt the data.
As far as I can determine, all these settings do is set the SO_SNDBUF and
SO_RCVBUF socket options on the FTP sockets via setsockopt.
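For anyone who wants to see what those two options do in isolation, here is a
minimal sketch in Python (not ProFTPD's own code). It requests the same size as
the "SocketOptions sndbuf 104857600 rcvbuf 104857600" line in the attached
ProFTPD config and reads back what the kernel granted. The helper name
set_buffers is made up for illustration. On Linux the granted value is normally
double the request and is capped by net.core.wmem_max / net.core.rmem_max, so
the numbers printed will vary by system.

```python
import socket

# Assumption for illustration: 100 MiB, the value used in the attached
# ProFTPD config ("SocketOptions sndbuf 104857600 rcvbuf 104857600").
REQUESTED = 104857600


def set_buffers(sock, size):
    """Request send/receive buffer sizes; return what the kernel reports back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return granted_snd, granted_rcv


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        snd, rcv = set_buffers(s, REQUESTED)
        print("requested", REQUESTED, "granted sndbuf", snd, "rcvbuf", rcv)
```

If the granted sizes come back far below the request, the effective buffer is
whatever the kernel limits allow, which may matter when comparing machines.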
Here is my environment configuration:
FTP server: ProFTPD (I built and installed the latest version available at
proftpd.org). I've attached the configuration file.
FTP client: The standard "ftp" command-line utility included with Red Hat Linux.
FS cluster: 4-node OrangeFS cluster using the attached config file.
All nodes involved were running 64-bit RHEL 5.5.
Here is the command line used for the pvfs2-client:
/usr/sbin/pvfs2-client --logtype syslog -p /usr/sbin/pvfs2-client-core \
    --logstamp datetime --acache-timeout=30000 --ncache-timeout=30000 \
    --desc-size 8388608 --desc-count 5
The FTP client and server were on the same machine, so I simply FTP'd to
localhost.
We initially saw the corruption when copying a 110G file, but a 118G file
copied just fine. I've been able to reliably reproduce the corruption with a
10G file and sometimes with a 1G file. I was never able to reproduce the
corruption with a file smaller than 500M.
The corruption appears as good data replaced with nulls. The file size is
always correct.
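To locate the damaged regions, something like the following sketch can compare
a known-good copy with the copy read back from OrangeFS and list the spans
where the suspect copy contains only nulls. The function name and both paths
are hypothetical. The default 65536-byte block size is an assumption chosen to
match the strip_size in the attached OrangeFS config. The check is per block,
so a null run that covers only part of a block is not reported.

```python
def find_null_runs(good_path, suspect_path, block_size=65536):
    """Return (offset, length) spans where the suspect file holds all-zero
    blocks that differ from the known-good file."""
    runs = []
    offset = 0
    with open(good_path, "rb") as good, open(suspect_path, "rb") as suspect:
        while True:
            a = good.read(block_size)
            b = suspect.read(block_size)
            if not a and not b:
                break
            # A block counts as damaged if it differs and is entirely nulls.
            if a != b and len(b) > 0 and b.count(0) == len(b):
                if runs and runs[-1][0] + runs[-1][1] == offset:
                    # Extend the previous run when blocks are adjacent.
                    runs[-1] = (runs[-1][0], runs[-1][1] + len(b))
                else:
                    runs.append((offset, len(b)))
            offset += max(len(a), len(b))
    return runs


if __name__ == "__main__":
    import sys
    # Usage: script.py <known-good file> <file read back from OrangeFS>
    for start, length in find_null_runs(sys.argv[1], sys.argv[2]):
        print("null run at offset", start, "length", length)
```

Seeing whether the offsets and lengths line up with the strip size or the flow
buffer size might help narrow down which layer is dropping the data.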
Another observation is that it usually takes several attempts for a file
transfer to cause corruption, but once corruption occurs, it recurs fairly
consistently.
Also, I attempted to reproduce the issue using "cp", "dd", and "rsync". None of
them reproduced it. Even an FTP transfer made with the "curl" command-line
utility worked. The only cases that cause corruption are our custom FTP client
and the command-line ftp client. I assume their access pattern triggers some
edge case in OrangeFS.
I also put debug statements in the proftpd code to write the data to a local
file immediately before writing it to the OrangeFS file system. The debug file
was correct (no corruption), while the file written to OrangeFS was corrupted,
so the data is not being corrupted in flight to the FTP server.
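The idea behind that check can be sketched outside ProFTPD as follows. This is
not the actual debug patch (which lives in the proftpd C code); it is a small
Python illustration with hypothetical function names and paths. Each chunk is
written to a local debug file first and then to the target file, and the two
results are compared by checksum afterwards.

```python
import hashlib


def tee_copy(src, target_path, debug_path, chunk_size=1 << 20):
    """Copy a binary stream to target_path, writing every chunk to
    debug_path first. Returns the SHA-256 of the bytes that were read."""
    digest = hashlib.sha256()
    with open(debug_path, "wb") as debug, open(target_path, "wb") as target:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            debug.write(chunk)   # local copy, taken before the real write
            target.write(chunk)  # e.g. a path on the OrangeFS mount
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If sha256_of(debug_path) matches the value returned by tee_copy but
sha256_of(target_path) does not, the data was intact when it was handed to the
file system and the damage happened at or below the write.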
Any help you can provide would be greatly appreciated.
Thanks!
--
Benjamin Severs
Attached OrangeFS config file:
<Defaults>
LogType syslog
TCPBufferReceive 524288
TCPBufferSend 524288
TroveMaxConcurrentIO 16
UnexpectedRequests 150
EventLogging none
EnableTracing no
LogStamp datetime
BMIModules bmi_tcp
FlowModules flowproto_multiqueue
PerfUpdateInterval 1000
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 30
ClientJobFlowTimeoutSecs 30
ClientRetryLimit 5
ClientRetryDelayMilliSecs 33000
PrecreateBatchSize 0, 32, 512, 32, 32, 32, 0
PrecreateLowThreshold 0, 16, 256, 16, 16, 16, 0
LogFile /tmp/pvfs2-server.log
TCPBindSpecific yes
</Defaults>
<Security>
TrustedNetwork
</Security>
<Aliases>
Alias node1.domain_tcp3334 tcp://node1.domain:3334
Alias node2.domain_tcp3334 tcp://node2.domain:3334
Alias node3.domain_tcp3334 tcp://node3.domain:3334
Alias node4.domain_tcp3334 tcp://node4.domain:3334
</Aliases>
<Filesystem>
DefaultNumDFiles 0
FlowBuffersPerFlow 16
FlowBufferSizeBytes 524288
Name pvfs2-fs
ID 2108795306
RootHandle 1048576
FileStuffing yes
<MetaHandleRanges>
Range node1.domain_tcp3334 3-536870913
Range node2.domain_tcp3334 536870914-1073741824
Range node3.domain_tcp3334 1073741825-1610612735
Range node4.domain_tcp3334 1610612736-2147483646
</MetaHandleRanges>
<DataHandleRanges>
Range node1.domain_tcp3334 2147483647-2684354557
Range node2.domain_tcp3334 2684354558-3221225468
Range node3.domain_tcp3334 3221225469-3758096379
Range node4.domain_tcp3334 3758096380-4294967290
</DataHandleRanges>
<StorageHints>
DirectIOTimeout 1000
DirectIOOpsPerQueue 10
DirectIOThreadNum 30
AttrCacheMaxNumElems 1024
AttrCacheSize 511
AttrCacheKeywords dh,md,de,st
HandleRecycleTimeoutSecs 360
CoalescingLowWatermark 1
CoalescingHighWatermark 8
DBCacheType sys
DBCacheSizeBytes 262144
TroveSyncMeta yes
TroveSyncData yes
TroveMethod directio
</StorageHints>
<Distribution>
Name simple_stripe
Param strip_size
Value 65536
</Distribution>
<ExportOptions>
ReadOnly
RootSquashExceptions
RootSquash
AnonGID 99
AnonUID 99
</ExportOptions>
</Filesystem>
<ServerOptions>
Server node1.domain_tcp3334
StorageSpace /pvfs2-data-3334
LogFile /tmp/pvfs2-server.log-node1.domain_tcp3334
</ServerOptions>
<ServerOptions>
Server node2.domain_tcp3334
StorageSpace /pvfs2-data-3334
LogFile /tmp/pvfs2-server.log-node2.domain_tcp3334
</ServerOptions>
<ServerOptions>
Server node3.domain_tcp3334
StorageSpace /pvfs2-data-3334
LogFile /tmp/pvfs2-server.log-node3.domain_tcp3334
</ServerOptions>
<ServerOptions>
Server node4.domain_tcp3334
StorageSpace /pvfs2-data-3334
LogFile /tmp/pvfs2-server.log-node4.domain_tcp3334
</ServerOptions>
Attached ProFTPD config file:
AllowOverwrite on
AuthOrder mod_auth_pam.c* mod_auth_unix.c
AuthPAM on
AuthPAMConfig proftp
CommandBufferSize 512
DebugLevel 0
DefaultChdir /mnt
DefaultRoot /
DefaultServer on
DefaultTransferMode binary
ExtendedLog /var/log/proftpd/extended.log ALL default
Group nobody
IdentLookups off
LogFormat default "%t - %h %u '%r' '%D' '%f' <%s> %b %T"
MaxClients 200
MaxClientsPerHost 100
MaxClientsPerUser 20
MaxLoginAttempts 3
PersistentPasswd off
Port 21
RootLogin off
ServerName ProFTPD
ServerType standalone
SetEnv TZ :/etc/localtime
ShowSymlinks on
SocketOptions sndbuf 104857600 rcvbuf 104857600
SyslogLevel notice
SystemLog /var/log/proftpd/server.log
tcpBackLog 5
tcpNoDelay on
TimeoutIdle 600
TimeoutLogin 300
TimeoutNoTransfer 600
TimeoutStalled 3600
TimesGMT off
TransferLog /var/log/proftpd/transfer.log
Umask 002
User nobody
UseReverseDNS on
WtmpLog off
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers