I think there must be a problem with the client-side (kernel) AIO support in PVFS. There is a related bug report from a while back:

http://www.beowulf-underground.org/pipermail/pvfs2-users/2010-February/003045.html

The libaio library described in that bug report uses the io_submit() system call as well.

-Phil

On 06/17/2011 08:20 AM, Vincenzo Gulisano wrote:
It's an Ubuntu server, kernel 2.6.24-24-server, 64-bit.
pvfs2 is 2.8.2.

I have one client that loops calling syscall(SYS_io_submit, ...).
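
A minimal sketch of what such a loop might look like (the mount point, file name, chunk size, and completion handling below are assumptions for illustration, not details taken from this thread):

/* Illustrative only: submit 500 KB writes through the raw io_submit()
 * syscall against a file on a hypothetical PVFS mount point. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    aio_context_t ctx = 0;
    if (syscall(SYS_io_setup, 128, &ctx) < 0) {         /* create the kernel AIO context */
        perror("io_setup");
        return 1;
    }

    int fd = open("/mnt/pvfs2/testfile", O_WRONLY | O_CREAT, 0644);  /* hypothetical path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    static char buf[500 * 1024];                         /* one 500K chunk */
    memset(buf, 'x', sizeof(buf));

    for (long long off = 0; ; off += sizeof(buf)) {
        struct iocb cb;
        struct iocb *cbs[1] = { &cb };

        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes     = fd;
        cb.aio_lio_opcode = IOCB_CMD_PWRITE;
        cb.aio_buf        = (unsigned long)buf;
        cb.aio_nbytes     = sizeof(buf);
        cb.aio_offset     = off;

        if (syscall(SYS_io_submit, ctx, 1, cbs) != 1) {  /* submit one write */
            perror("io_submit");
            break;
        }

        struct io_event ev;                              /* wait for the completion */
        if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1) {
            perror("io_getevents");
            break;
        }
    }

    syscall(SYS_io_destroy, ctx);
    close(fd);
    return 0;
}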


On 17 June 2011 14:02, Michael Moore <[email protected]> wrote:

    What version of OrangeFS/PVFS and what distro/kernel version are
    used in the setup? To re-create it, is it just a stream of simple
    write() calls from a single client, or something more involved?

    Thanks,
    Michael


    On Fri, Jun 17, 2011 at 7:43 AM, Vincenzo Gulisano
    <[email protected]> wrote:

        Thanks Michael

        I've tried setting alt-aio as TroveMethod and the problem is
        still there.

        Some logs:

        Client (blade39) says:

        [E 13:36:19.590763] server: tcp://blade60:3334
        [E 13:36:19.591006] io_process_context_recv (op_status): No such file or directory
        [E 13:36:19.591018] server: tcp://blade61:3334
        [E 13:36:19.768105] io_process_context_recv (op_status): No such file or directory

        Servers:

        blade58:
        [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
        [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7f5cac004370: Connection reset by peer
        [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 canceled 0 operations, will clean up.
        [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 error cleanup finished: Connection reset by peer
        [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
        [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7f5cac0ee8f0: Connection reset by peer
        [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 canceled 0 operations, will clean up.
        [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 error cleanup finished: Connection reset by peer

        blade59:
        [E 06/17 13:37] trove_write_callback_fn: I/O error occurred
        [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x799410: Broken pipe
        [E 06/17 13:37] handle_io_error: flow proto 0x799410 canceled 0 operations, will clean up.
        [E 06/17 13:37] handle_io_error: flow proto 0x799410 error cleanup finished: Broken pipe

        blade60:
        [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
        [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7fb0a012bed0: Connection reset by peer
        [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 canceled 0 operations, will clean up.
        [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 error cleanup finished: Connection reset by peer

        blade61:
        [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
        [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x76b5a0: Broken pipe
        [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0 operations, will clean up.
        [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup finished: Broken pipe
        [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
        [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x778e00: Broken pipe
        [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0 operations, will clean up.
        [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup finished: Broken pipe

        [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
        [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x76b5a0: Broken pipe
        [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0 operations, will clean up.
        [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup finished: Broken pipe
        [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
        [E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x778e00: Broken pipe
        [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0 operations, will clean up.
        [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup finished: Broken pipe

        Vincenzo

        On 17 June 2011 13:29, Michael Moore <[email protected]> wrote:

            Hi Vincenzo,

            This sounds similar to an issue just reported by Benjamin
            Seevers here on the developers list:
            http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html

            Based on his experience with the issue, the corruption no
            longer occurs if you switch to alt-aio instead of directio.
            Could you try switching from directio to alt-aio in your
            configuration to help isolate whether this is a similar or
            a different issue? If that doesn't resolve it, could you
            provide what errors, if any, you see on the client when it
            fails and what errors, if any, appear in the pvfs2-server
            logs?
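
            For reference, a change along these lines in the
            <StorageHints> section would do it (the surrounding lines
            mirror the configuration quoted below; only the TroveMethod
            value changes):

            <StorageHints>
            TroveSyncMeta no
            TroveSyncData no
            TroveMethod alt-aio
            </StorageHints>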

            Thanks,
            Michael

            On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano
            <[email protected]> wrote:

                Hi,
                I'm using the following setup:
                4 machines used as I/O servers
                10 machines used as I/O clients

                The configuration file is the following:

                <Defaults>
                UnexpectedRequests 50
                EventLogging none
                EnableTracing no
                LogStamp datetime
                BMIModules bmi_tcp
                FlowModules flowproto_multiqueue
                PerfUpdateInterval 1000
                ServerJobBMITimeoutSecs 30
                ServerJobFlowTimeoutSecs 30
                ClientJobBMITimeoutSecs 300
                ClientJobFlowTimeoutSecs 300
                ClientRetryLimit 5
                ClientRetryDelayMilliSecs 2000
                PrecreateBatchSize 512
                PrecreateLowThreshold 256
                TCPBufferSend 524288
                TCPBufferReceive 524288
                StorageSpace /local/vincenzo/pvfs2-storage-space
                LogFile /tmp/pvfs2-server.log
                </Defaults>

                <Aliases>
                Alias blade58 tcp://blade58:3334
                Alias blade59 tcp://blade59:3334
                Alias blade60 tcp://blade60:3334
                Alias blade61 tcp://blade61:3334
                </Aliases>

                <Filesystem>
                Name pvfs2-fs
                ID 1615492168
                RootHandle 1048576
                FileStuffing yes
                <MetaHandleRanges>
                Range blade58 3-1152921504606846977
                Range blade59 1152921504606846978-2305843009213693952
                Range blade60 2305843009213693953-3458764513820540927
                Range blade61 3458764513820540928-4611686018427387902
                </MetaHandleRanges>
                <DataHandleRanges>
                Range blade58 4611686018427387903-5764607523034234877
                Range blade59 5764607523034234878-6917529027641081852
                Range blade60 6917529027641081853-8070450532247928827
                Range blade61 8070450532247928828-9223372036854775802
                </DataHandleRanges>
                <StorageHints>
                TroveSyncMeta no
                TroveSyncData no
                TroveMethod directio
                </StorageHints>
                </Filesystem>

                I'm testing the system by continuously writing 500K
                chunks from one client machine. After a few seconds,
                the client is no longer able to write. Checking the
                file system manually, I can see my file (with ls), but
                it appears to be corrupted: no information about the
                file is shown and I cannot remove it. The only solution
                is to stop all clients and servers and re-create the
                file system.

                Thanks in advance

                Vincenzo

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
