Good memory, Phil! Vincenzo, you are welcome to try upgrading to OrangeFS; however, I suspect it won't do much good. Let me get this on our list and take a look at it.
Michael

On Fri, Jun 17, 2011 at 9:54 AM, Phil Carns <[email protected]> wrote:
> I think there must be a problem with the client (kernel) side aio support
> in PVFS. There is a related bug report from a while back:
>
> http://www.beowulf-underground.org/pipermail/pvfs2-users/2010-February/003045.html
>
> The libaio library described in that bug report uses the io_submit() system
> call as well.
>
> -Phil
>
> On 06/17/2011 08:20 AM, Vincenzo Gulisano wrote:
> It's an ubuntu server, 2.6.24-24-server 64 bits
> pvfs2 is 2.8.2
>
> I've 1 client that loops calling syscall(SYS_io_submit,...
>
> On 17 June 2011 14:02, Michael Moore <[email protected]> wrote:
>> What version of OrangeFS/PVFS and what distro/kernel version is used in
>> the setup? To re-create it, just a stream of simple write() calls from a
>> single client or something more involved?
>>
>> Thanks,
>> Michael
>>
>> On Fri, Jun 17, 2011 at 7:43 AM, Vincenzo Gulisano <[email protected]> wrote:
>>> Thanks Michael
>>>
>>> I've tried setting alt-aio as TroveMethod and the problem is still
>>> there.
>>>
>>> Some logs:
>>>
>>> Client (blade39) says:
>>>
>>> [E 13:36:19.590763] server: tcp://blade60:3334
>>> [E 13:36:19.591006] io_process_context_recv (op_status): No such file or directory
>>> [E 13:36:19.591018] server: tcp://blade61:3334
>>> [E 13:36:19.768105] io_process_context_recv (op_status): No such file or directory
>>>
>>> Servers:
>>>
>>> blade58:
>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7f5cac004370: Connection reset by peer
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 canceled 0 operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 error cleanup finished: Connection reset by peer
>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7f5cac0ee8f0: Connection reset by peer
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 canceled 0 operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 error cleanup finished: Connection reset by peer
>>>
>>> blade59:
>>> [E 06/17 13:37] trove_write_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x799410: Broken pipe
>>> [E 06/17 13:37] handle_io_error: flow proto 0x799410 canceled 0 operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x799410 error cleanup finished: Broken pipe
>>>
>>> blade60:
>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7fb0a012bed0: Connection reset by peer
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 canceled 0 operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 error cleanup finished: Connection reset by peer
>>>
>>> blade61:
>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x76b5a0: Broken pipe
>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0 operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup finished: Broken pipe
>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x778e00: Broken pipe
>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0 operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup finished: Broken pipe
>>>
>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x76b5a0: Broken pipe
>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0 operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup finished: Broken pipe
>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x778e00: Broken pipe
>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0 operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup finished: Broken pipe
>>>
>>> Vincenzo
>>>
>>> On 17 June 2011 13:29, Michael Moore <[email protected]> wrote:
>>>> Hi Vincenzo,
>>>>
>>>> This sounds similar to an issue just reported by Benjamin Seevers here
>>>> on the developers list:
>>>>
>>>> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html
>>>>
>>>> Based on his experience with the issue, if you switch to alt-aio instead
>>>> of directio the corruption no longer occurs. Could you try switching from
>>>> directio to alt-aio in your configuration to help isolate whether this is a
>>>> similar or different issue? If that doesn't resolve the issue, could you
>>>> provide what errors, if any, you see on the client when it fails and what
>>>> errors, if any, appear in the pvfs2-server logs?
>>>>
>>>> Thanks,
>>>> Michael
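(For reference, the switch Michael suggests amounts to a one-line change in the <StorageHints> section of the server configuration quoted below, roughly as follows; the surrounding values are taken unchanged from Vincenzo's config:

    <StorageHints>
    TroveSyncMeta no
    TroveSyncData no
    TroveMethod alt-aio
    </StorageHints>

with the pvfs2-server processes restarted afterwards so the new TroveMethod takes effect.)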
>>>> On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano <[email protected]> wrote:
>>>>> Hi,
>>>>> I'm using the following setup:
>>>>> 4 machines used as I/O servers
>>>>> 10 machines used as I/O clients
>>>>>
>>>>> The configuration file is the following:
>>>>>
>>>>> <Defaults>
>>>>> UnexpectedRequests 50
>>>>> EventLogging none
>>>>> EnableTracing no
>>>>> LogStamp datetime
>>>>> BMIModules bmi_tcp
>>>>> FlowModules flowproto_multiqueue
>>>>> PerfUpdateInterval 1000
>>>>> ServerJobBMITimeoutSecs 30
>>>>> ServerJobFlowTimeoutSecs 30
>>>>> ClientJobBMITimeoutSecs 300
>>>>> ClientJobFlowTimeoutSecs 300
>>>>> ClientRetryLimit 5
>>>>> ClientRetryDelayMilliSecs 2000
>>>>> PrecreateBatchSize 512
>>>>> PrecreateLowThreshold 256
>>>>> TCPBufferSend 524288
>>>>> TCPBufferReceive 524288
>>>>> StorageSpace /local/vincenzo/pvfs2-storage-space
>>>>> LogFile /tmp/pvfs2-server.log
>>>>> </Defaults>
>>>>>
>>>>> <Aliases>
>>>>> Alias blade58 tcp://blade58:3334
>>>>> Alias blade59 tcp://blade59:3334
>>>>> Alias blade60 tcp://blade60:3334
>>>>> Alias blade61 tcp://blade61:3334
>>>>> </Aliases>
>>>>>
>>>>> <Filesystem>
>>>>> Name pvfs2-fs
>>>>> ID 1615492168
>>>>> RootHandle 1048576
>>>>> FileStuffing yes
>>>>> <MetaHandleRanges>
>>>>> Range blade58 3-1152921504606846977
>>>>> Range blade59 1152921504606846978-2305843009213693952
>>>>> Range blade60 2305843009213693953-3458764513820540927
>>>>> Range blade61 3458764513820540928-4611686018427387902
>>>>> </MetaHandleRanges>
>>>>> <DataHandleRanges>
>>>>> Range blade58 4611686018427387903-5764607523034234877
>>>>> Range blade59 5764607523034234878-6917529027641081852
>>>>> Range blade60 6917529027641081853-8070450532247928827
>>>>> Range blade61 8070450532247928828-9223372036854775802
>>>>> </DataHandleRanges>
>>>>> <StorageHints>
>>>>> TroveSyncMeta no
>>>>> TroveSyncData no
>>>>> TroveMethod directio
>>>>> </StorageHints>
>>>>> </Filesystem>
>>>>>
>>>>> I'm testing the system by writing (continuously) from 1 client machine
>>>>> in chunks of 500K. After a few seconds, the client is no longer able to write.
>>>>> Checking the file system manually, I can see my file (running ls), but it
>>>>> seems to be corrupted (no information about the file is given and I cannot
>>>>> remove the file). The only solution is to stop all clients/servers and
>>>>> re-create the file system.
>>>>>
>>>>> Thanks in advance
>>>>>
>>>>> Vincenzo
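For anyone trying to reproduce this, below is a minimal sketch of the kind of client loop Vincenzo describes: a single process repeatedly calling syscall(SYS_io_submit, ...) to write ~500K chunks to one file on the PVFS2 mount. The mount path, file name, iteration count, and error handling are illustrative assumptions, not his actual test code.

    /* Minimal sketch (not Vincenzo's actual program): one client looping over
     * raw kernel AIO writes of ~500K chunks to a file on the PVFS2 mount.
     * The path /mnt/pvfs2/testfile and the 1000-iteration count are made up
     * for illustration. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/aio_abi.h>

    int main(void)
    {
        const size_t chunk = 500 * 1024;   /* ~500K per write, as in the report */
        char *buf = malloc(chunk);
        memset(buf, 'x', chunk);

        int fd = open("/mnt/pvfs2/testfile", O_WRONLY | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        aio_context_t ctx = 0;
        if (syscall(SYS_io_setup, 128, &ctx) < 0) { perror("io_setup"); return 1; }

        for (long i = 0; i < 1000; i++) {
            struct iocb cb;
            struct iocb *cbs[1] = { &cb };
            struct io_event ev;

            memset(&cb, 0, sizeof(cb));
            cb.aio_fildes     = fd;
            cb.aio_lio_opcode = IOCB_CMD_PWRITE;
            cb.aio_buf        = (unsigned long)buf;
            cb.aio_nbytes     = chunk;
            cb.aio_offset     = (long long)i * (long long)chunk;

            /* the call the client loops on */
            if (syscall(SYS_io_submit, ctx, 1, cbs) != 1) { perror("io_submit"); break; }
            /* wait for the single completion before issuing the next write */
            if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1) { perror("io_getevents"); break; }
        }

        syscall(SYS_io_destroy, ctx);
        close(fd);
        free(buf);
        return 0;
    }

Every iteration goes through the kernel io_submit() path, which is the same system call Phil notes the libaio library from the earlier bug report uses.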
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
