Hi Vincenzo,

This sounds similar to an issue just reported by Benjamin Seevers here on the developers list:
http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html
Based on his experience with the issue if you switch to alt-aio instead of directio the corruption no longer occurs. Could you try switching from directio to alt-aio in your configuration to help isolate if this is a similar or different issue?

If that doesn't resolve the issue, could you provide what errors, if any, you see on the client when it fails and what errors, if any, appear in the pvfs2-server logs?

Thanks,
Michael

On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano <[email protected]> wrote:
> Hi,
> I'm using the following setup:
> 4 machines used as I/O server
> 10 machines used as I/O client
>
> The configuration file is the following:
>
> <Defaults>
>     UnexpectedRequests 50
>     EventLogging none
>     EnableTracing no
>     LogStamp datetime
>     BMIModules bmi_tcp
>     FlowModules flowproto_multiqueue
>     PerfUpdateInterval 1000
>     ServerJobBMITimeoutSecs 30
>     ServerJobFlowTimeoutSecs 30
>     ClientJobBMITimeoutSecs 300
>     ClientJobFlowTimeoutSecs 300
>     ClientRetryLimit 5
>     ClientRetryDelayMilliSecs 2000
>     PrecreateBatchSize 512
>     PrecreateLowThreshold 256
>     TCPBufferSend 524288
>     TCPBufferReceive 524288
>     StorageSpace /local/vincenzo/pvfs2-storage-space
>     LogFile /tmp/pvfs2-server.log
> </Defaults>
>
> <Aliases>
>     Alias blade58 tcp://blade58:3334
>     Alias blade59 tcp://blade59:3334
>     Alias blade60 tcp://blade60:3334
>     Alias blade61 tcp://blade61:3334
> </Aliases>
>
> <Filesystem>
>     Name pvfs2-fs
>     ID 1615492168
>     RootHandle 1048576
>     FileStuffing yes
>     <MetaHandleRanges>
>         Range blade58 3-1152921504606846977
>         Range blade59 1152921504606846978-2305843009213693952
>         Range blade60 2305843009213693953-3458764513820540927
>         Range blade61 3458764513820540928-4611686018427387902
>     </MetaHandleRanges>
>     <DataHandleRanges>
>         Range blade58 4611686018427387903-5764607523034234877
>         Range blade59 5764607523034234878-6917529027641081852
>         Range blade60 6917529027641081853-8070450532247928827
>         Range blade61 8070450532247928828-9223372036854775802
>     </DataHandleRanges>
>     <StorageHints>
>         TroveSyncMeta no
>         TroveSyncData no
>         TroveMethod directio
>     </StorageHints>
> </Filesystem>
>
> I'm testing the system writing (continuously) from 1 client machine chunks
> of 500K. After few seconds, the client is not able to write. Checking
> manually the file system, I can see my file (running ls) and it seems to be
> corrupted (no information about the file is given and I cannot remove the
> file). The only solution is to stop all clients / servers and re-create the
> file system.
>
> Thanks in advance
>
> Vincenzo
>
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
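
For reference, the directio to alt-aio switch suggested above should only touch the StorageHints section of the quoted configuration file. A minimal sketch, assuming every other line of the file stays exactly as quoted and that the servers are restarted afterwards so they pick up the change (the restart step is an assumption, not something confirmed in this thread):

```
<StorageHints>
    TroveSyncMeta no
    TroveSyncData no
    TroveMethod alt-aio
</StorageHints>
```

If the corruption stops with this setting, that would point towards the same directio issue reported on the developers list; if it persists, the client and pvfs2-server log output becomes the more useful lead.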
