Thanks Michael, I've tried setting alt-aio as the TroveMethod and the problem is still there.
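That is, the StorageHints section of the config quoted below now reads (everything else unchanged):

<StorageHints>
TroveSyncMeta no
TroveSyncData no
TroveMethod alt-aio
</StorageHints>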
Some logs:

Client (blade39) says:

[E 13:36:19.590763] server: tcp://blade60:3334
[E 13:36:19.591006] io_process_context_recv (op_status): No such file or directory
[E 13:36:19.591018] server: tcp://blade61:3334
[E 13:36:19.768105] io_process_context_recv (op_status): No such file or directory

Servers:

blade58:
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7f5cac004370: Connection reset by peer
[E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 error cleanup finished: Connection reset by peer
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7f5cac0ee8f0: Connection reset by peer
[E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 error cleanup finished: Connection reset by peer

blade59:
[E 06/17 13:37] trove_write_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x799410: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x799410 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x799410 error cleanup finished: Broken pipe

blade60:
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7fb0a012bed0: Connection reset by peer
[E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 error cleanup finished: Connection reset by peer

blade61:
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x76b5a0: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup finished: Broken pipe
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x778e00: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup finished: Broken pipe
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x76b5a0: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup finished: Broken pipe
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x778e00: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup finished: Broken pipe

Vincenzo

On 17 June 2011 13:29, Michael Moore <[email protected]> wrote:

> Hi Vincenzo,
>
> This sounds similar to an issue just reported by Benjamin Seevers here on
> the developers list:
>
> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html
>
> Based on his experience with the issue if you switch to alt-aio instead of
> directio the corruption no longer occurs.
> Could you try switching from directio to alt-aio in your configuration
> to help isolate if this is a similar or different issue? If that doesn't
> resolve the issue, could you provide what errors, if any, you see on the
> client when it fails and what errors, if any, appear in the
> pvfs2-server logs?
>
> Thanks,
> Michael
>
> On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano <[email protected]> wrote:
>
>> Hi,
>> I'm using the following setup:
>> 4 machines used as I/O server
>> 10 machines used as I/O client
>>
>> The configuration file is the following:
>>
>> <Defaults>
>> UnexpectedRequests 50
>> EventLogging none
>> EnableTracing no
>> LogStamp datetime
>> BMIModules bmi_tcp
>> FlowModules flowproto_multiqueue
>> PerfUpdateInterval 1000
>> ServerJobBMITimeoutSecs 30
>> ServerJobFlowTimeoutSecs 30
>> ClientJobBMITimeoutSecs 300
>> ClientJobFlowTimeoutSecs 300
>> ClientRetryLimit 5
>> ClientRetryDelayMilliSecs 2000
>> PrecreateBatchSize 512
>> PrecreateLowThreshold 256
>> TCPBufferSend 524288
>> TCPBufferReceive 524288
>> StorageSpace /local/vincenzo/pvfs2-storage-space
>> LogFile /tmp/pvfs2-server.log
>> </Defaults>
>>
>> <Aliases>
>> Alias blade58 tcp://blade58:3334
>> Alias blade59 tcp://blade59:3334
>> Alias blade60 tcp://blade60:3334
>> Alias blade61 tcp://blade61:3334
>> </Aliases>
>>
>> <Filesystem>
>> Name pvfs2-fs
>> ID 1615492168
>> RootHandle 1048576
>> FileStuffing yes
>> <MetaHandleRanges>
>> Range blade58 3-1152921504606846977
>> Range blade59 1152921504606846978-2305843009213693952
>> Range blade60 2305843009213693953-3458764513820540927
>> Range blade61 3458764513820540928-4611686018427387902
>> </MetaHandleRanges>
>> <DataHandleRanges>
>> Range blade58 4611686018427387903-5764607523034234877
>> Range blade59 5764607523034234878-6917529027641081852
>> Range blade60 6917529027641081853-8070450532247928827
>> Range blade61 8070450532247928828-9223372036854775802
>> </DataHandleRanges>
>> <StorageHints>
>> TroveSyncMeta no
>> TroveSyncData no
>> TroveMethod directio
>> </StorageHints>
>> </Filesystem>
>>
>> I'm testing the system writing (continuously) from 1 client machine
>> chunks of 500K. After few seconds, the client is not able to write.
>> Checking manually the file system, I can see my file (running ls) and
>> it seems to be corrupted (no information about the file is given and I
>> cannot remove the file). The only solution is to stop all clients /
>> servers and re-create the file system.
>>
>> Thanks in advance
>>
>> Vincenzo
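In case it helps to reproduce this, the test is essentially equivalent to the loop below (a rough sketch: error handling is trimmed and /mnt/pvfs2/testfile is just a placeholder for the actual file on the PVFS2 mount):

/* One client writing 500K chunks continuously to a single file. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK_SIZE (500 * 1024)

int main(void)
{
    char *buf = malloc(CHUNK_SIZE);
    if (!buf) { perror("malloc"); return 1; }
    memset(buf, 'x', CHUNK_SIZE);

    int fd = open("/mnt/pvfs2/testfile", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    for (;;) {
        ssize_t n = write(fd, buf, CHUNK_SIZE);
        if (n < 0) {            /* after a few seconds the write fails here */
            perror("write");
            break;
        }
    }

    close(fd);
    free(buf);
    return 0;
}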
