I will try moving to OrangeFS to check whether the problem is still there. Thank you
Vincenzo

On 17 June 2011 14:31, Michael Moore <[email protected]> wrote:

> If you could, try OrangeFS 2.8.4, there have been several bug fixes that
> may or may not address the problem you're seeing.
>
> Can you provide the full program (or a larger snippet) of what the client
> is doing? Does the client code segfault or hang when it fails?
>
> Thanks,
> Michael
>
> On Fri, Jun 17, 2011 at 8:20 AM, Vincenzo Gulisano <[email protected]> wrote:
>
>> It's an ubuntu server, 2.6.24-24-server 64 bits
>> pvfs2 is 2.8.2
>>
>> I've 1 client that loops calling syscall(SYS_io_submit,...
>>
>> On 17 June 2011 14:02, Michael Moore <[email protected]> wrote:
>>
>>> What version of OrangeFS/PVFS and what distro/kernel version is used in
>>> the setup? To re-create it, just a stream of simple write() calls from a
>>> single client or something more involved?
>>>
>>> Thanks,
>>> Michael
>>>
>>> On Fri, Jun 17, 2011 at 7:43 AM, Vincenzo Gulisano <[email protected]> wrote:
>>>
>>>> Thanks Michael
>>>>
>>>> I've tried setting alt-aio as TroveMethod and the problem is still
>>>> there.
>>>>
>>>> Some logs:
>>>>
>>>> Client (blade39) says:
>>>>
>>>> [E 13:36:19.590763] server: tcp://blade60:3334
>>>> [E 13:36:19.591006] io_process_context_recv (op_status): No such file or directory
>>>> [E 13:36:19.591018] server: tcp://blade61:3334
>>>> [E 13:36:19.768105] io_process_context_recv (op_status): No such file or directory
>>>>
>>>> Servers:
>>>>
>>>> blade58:
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7f5cac004370: Connection reset by peer
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 canceled 0 operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 error cleanup finished: Connection reset by peer
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7f5cac0ee8f0: Connection reset by peer
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 canceled 0 operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 error cleanup finished: Connection reset by peer
>>>>
>>>> blade59:
>>>> [E 06/17 13:37] trove_write_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x799410: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x799410 canceled 0 operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x799410 error cleanup finished: Broken pipe
>>>>
>>>> blade60:
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7fb0a012bed0: Connection reset by peer
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 canceled 0 operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 error cleanup finished: Connection reset by peer
>>>>
>>>> blade61:
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x76b5a0: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0 operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup finished: Broken pipe
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x778e00: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0 operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup finished: Broken pipe
>>>>
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x76b5a0: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0 operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup finished: Broken pipe
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x778e00: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0 operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup finished: Broken pipe
>>>>
>>>> Vincenzo
>>>>
>>>> On 17 June 2011 13:29, Michael Moore <[email protected]> wrote:
>>>>
>>>>> Hi Vincenzo,
>>>>>
>>>>> This sounds similar to an issue just reported by Benjamin Seevers here
>>>>> on the developers list:
>>>>>
>>>>> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html
>>>>>
>>>>> Based on his experience with the issue, if you switch to alt-aio instead
>>>>> of directio the corruption no longer occurs. Could you try switching from
>>>>> directio to alt-aio in your configuration to help isolate whether this is a
>>>>> similar or different issue? If that doesn't resolve the issue, could you
>>>>> provide what errors, if any, you see on the client when it fails and what
>>>>> errors, if any, appear in the pvfs2-server logs?
>>>>>
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>> On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm using the following setup:
>>>>>> 4 machines used as I/O servers
>>>>>> 10 machines used as I/O clients
>>>>>>
>>>>>> The configuration file is the following:
>>>>>>
>>>>>> <Defaults>
>>>>>> UnexpectedRequests 50
>>>>>> EventLogging none
>>>>>> EnableTracing no
>>>>>> LogStamp datetime
>>>>>> BMIModules bmi_tcp
>>>>>> FlowModules flowproto_multiqueue
>>>>>> PerfUpdateInterval 1000
>>>>>> ServerJobBMITimeoutSecs 30
>>>>>> ServerJobFlowTimeoutSecs 30
>>>>>> ClientJobBMITimeoutSecs 300
>>>>>> ClientJobFlowTimeoutSecs 300
>>>>>> ClientRetryLimit 5
>>>>>> ClientRetryDelayMilliSecs 2000
>>>>>> PrecreateBatchSize 512
>>>>>> PrecreateLowThreshold 256
>>>>>> TCPBufferSend 524288
>>>>>> TCPBufferReceive 524288
>>>>>> StorageSpace /local/vincenzo/pvfs2-storage-space
>>>>>> LogFile /tmp/pvfs2-server.log
>>>>>> </Defaults>
>>>>>>
>>>>>> <Aliases>
>>>>>> Alias blade58 tcp://blade58:3334
>>>>>> Alias blade59 tcp://blade59:3334
>>>>>> Alias blade60 tcp://blade60:3334
>>>>>> Alias blade61 tcp://blade61:3334
>>>>>> </Aliases>
>>>>>>
>>>>>> <Filesystem>
>>>>>> Name pvfs2-fs
>>>>>> ID 1615492168
>>>>>> RootHandle 1048576
>>>>>> FileStuffing yes
>>>>>> <MetaHandleRanges>
>>>>>> Range blade58 3-1152921504606846977
>>>>>> Range blade59 1152921504606846978-2305843009213693952
>>>>>> Range blade60 2305843009213693953-3458764513820540927
>>>>>> Range blade61 3458764513820540928-4611686018427387902
>>>>>> </MetaHandleRanges>
>>>>>> <DataHandleRanges>
>>>>>> Range blade58 4611686018427387903-5764607523034234877
>>>>>> Range blade59 5764607523034234878-6917529027641081852
>>>>>> Range blade60 6917529027641081853-8070450532247928827
>>>>>> Range blade61 8070450532247928828-9223372036854775802
>>>>>> </DataHandleRanges>
>>>>>> <StorageHints>
>>>>>> TroveSyncMeta no
>>>>>> TroveSyncData no
>>>>>> TroveMethod directio
>>>>>> </StorageHints>
>>>>>> </Filesystem>
>>>>>>
>>>>>> I'm testing the system by continuously writing 500 KB chunks from one
>>>>>> client machine. After a few seconds, the client is no longer able to write.
>>>>>> Checking the file system manually, I can see my file (running ls), but it
>>>>>> seems to be corrupted (no information about the file is given and I cannot
>>>>>> remove it). The only solution is to stop all clients and servers and
>>>>>> re-create the file system.
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>> Vincenzo
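
For reference, below is a minimal sketch of the kind of client loop described in the thread: continuously submitting 500 KB writes through the raw Linux AIO syscalls, with each iteration calling syscall(SYS_io_submit, ...). The actual test program was not posted, so this is only an illustrative reconstruction; the mount path /mnt/pvfs2/testfile and the choice to wait for each completion before submitting the next write are assumptions.

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define CHUNK (500 * 1024)   /* 500 KB per write, as in the report */

int main(void)
{
    aio_context_t ctx = 0;
    static char buf[CHUNK];
    memset(buf, 'x', sizeof(buf));

    /* hypothetical file on the PVFS2 mount point */
    int fd = open("/mnt/pvfs2/testfile", O_WRONLY | O_CREAT, 0644);
    if (fd < 0 || syscall(SYS_io_setup, 8, &ctx) < 0) {
        perror("setup");
        return 1;
    }

    long long offset = 0;
    for (;;) {
        struct iocb cb;
        struct iocb *cbs[1] = { &cb };
        struct io_event ev = { 0 };

        memset(&cb, 0, sizeof(cb));
        cb.aio_lio_opcode = IOCB_CMD_PWRITE;
        cb.aio_fildes     = fd;
        cb.aio_buf        = (uintptr_t)buf;
        cb.aio_nbytes     = CHUNK;
        cb.aio_offset     = offset;

        /* submit one write and block until its completion event arrives */
        if (syscall(SYS_io_submit, ctx, 1, cbs) != 1 ||
            syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1 ||
            ev.res != CHUNK) {
            fprintf(stderr, "write failed at offset %lld (res=%lld)\n",
                    offset, (long long)ev.res);
            break;
        }
        offset += CHUNK;
    }

    syscall(SYS_io_destroy, ctx);
    close(fd);
    return 0;
}

Each iteration submits a single IOCB_CMD_PWRITE and waits in io_getevents, so the loop exercises exactly one in-flight asynchronous write at a time; the real client may well keep several writes outstanding.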
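
Likewise, the alt-aio experiment suggested in the thread amounts to changing only the TroveMethod line inside the <StorageHints> block of the configuration quoted above, roughly as follows (the servers would presumably need to be restarted against the updated configuration for the change to take effect):

<StorageHints>
    TroveSyncMeta no
    TroveSyncData no
    TroveMethod alt-aio
</StorageHints>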
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
