What happens if a file system simply doesn't provide the aio functions (i.e., leaves aio_write, aio_read, etc. set to NULL in the file_operations structure)? I wonder whether the aio system calls return ENOSYS or whether the kernel just services them as blocking calls.
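
One quick way to find out might be a small probe like the one below: open a file on the pvfs2 mount, submit a single pwrite through the raw aio syscalls, and print what io_submit / io_getevents return (rough, untested sketch):

    /*
     * Rough, untested sketch: submit one async pwrite through the raw
     * Linux AIO syscalls against a file on the PVFS mount and print
     * what io_submit / io_getevents report.
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <linux/aio_abi.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <file on pvfs2 mount>\n", argv[0]);
            return 1;
        }

        aio_context_t ctx = 0;
        if (syscall(SYS_io_setup, 8, &ctx) < 0) {
            perror("io_setup");
            return 1;
        }

        int fd = open(argv[1], O_WRONLY | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        static char buf[4096];
        memset(buf, 'x', sizeof(buf));

        struct iocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_lio_opcode = IOCB_CMD_PWRITE;   /* async write */
        cb.aio_fildes     = fd;
        cb.aio_buf        = (uint64_t)(uintptr_t)buf;
        cb.aio_nbytes     = sizeof(buf);
        cb.aio_offset     = 0;

        struct iocb *list[1] = { &cb };
        long ret = syscall(SYS_io_submit, ctx, 1, list);
        if (ret < 0) {
            /* ENOSYS/EINVAL here would answer the question above */
            perror("io_submit");
        } else {
            printf("io_submit accepted %ld request(s)\n", ret);
            /* wait for the completion (may block) */
            struct io_event ev;
            ret = syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL);
            if (ret == 1)
                printf("completion res = %lld\n", (long long)ev.res);
            else
                perror("io_getevents");
        }

        syscall(SYS_io_destroy, ctx);
        close(fd);
        return 0;
    }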

At any rate it seems like it might be a good idea to turn off that functionality until the bug is fixed so that folks don't get caught off guard.

-Phil

On 06/17/2011 10:07 AM, Michael Moore wrote:
Good memory, Phil!

Vincenzo, you are welcome to try upgrading to OrangeFS; however, I suspect it won't do much good. Let me get this on our list and take a look at it.

Michael

On Fri, Jun 17, 2011 at 9:54 AM, Phil Carns <[email protected]> wrote:

    I think there must be a problem with the client (kernel) side aio
    support in PVFS.  There is a related bug report from a while back:

    http://www.beowulf-underground.org/pipermail/pvfs2-users/2010-February/003045.html

    The libaio library described in that bug report uses the
    io_submit() system call as well.

    -Phil


    On 06/17/2011 08:20 AM, Vincenzo Gulisano wrote:
    It's an Ubuntu server, kernel 2.6.24-24-server, 64-bit.
    PVFS2 is 2.8.2.

    I have 1 client that loops calling syscall(SYS_io_submit, ...).


    On 17 June 2011 14:02, Michael Moore <[email protected]> wrote:

        What version of OrangeFS/PVFS and what distro/kernel
        version are used in the setup? To re-create it, is it
        just a stream of simple write() calls from a single
        client, or something more involved?

        Thanks,
        Michael


        On Fri, Jun 17, 2011 at 7:43 AM, Vincenzo Gulisano <[email protected]> wrote:

            Thanks Michael

            I've tried setting alt-aio as the TroveMethod, and the
            problem is still there.

            Some logs:

            Client (blade39) says:

            [E 13:36:19.590763] server: tcp://blade60:3334
            [E 13:36:19.591006] io_process_context_recv (op_status):
            No such file or directory
            [E 13:36:19.591018] server: tcp://blade61:3334
            [E 13:36:19.768105] io_process_context_recv (op_status):
            No such file or directory

            Servers:

            blade58:
            [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
            [E 06/17 13:37] handle_io_error: flow proto error cleanup
            started on 0x7f5cac004370: Connection reset by peer
            [E 06/17 13:37] handle_io_error: flow proto
            0x7f5cac004370 canceled 0 operations, will clean up.
            [E 06/17 13:37] handle_io_error: flow proto
            0x7f5cac004370 error cleanup finished: Connection reset
            by peer
            [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
            [E 06/17 13:37] handle_io_error: flow proto error cleanup
            started on 0x7f5cac0ee8f0: Connection reset by peer
            [E 06/17 13:37] handle_io_error: flow proto
            0x7f5cac0ee8f0 canceled 0 operations, will clean up.
            [E 06/17 13:37] handle_io_error: flow proto
            0x7f5cac0ee8f0 error cleanup finished: Connection reset
            by peer

            blade59:
            [E 06/17 13:37] trove_write_callback_fn: I/O error occurred
            [E 06/17 13:37] handle_io_error: flow proto error cleanup
            started on 0x799410: Broken pipe
            [E 06/17 13:37] handle_io_error: flow proto 0x799410
            canceled 0 operations, will clean up.
            [E 06/17 13:37] handle_io_error: flow proto 0x799410
            error cleanup finished: Broken pipe

            blade60:
            [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
            [E 06/17 13:37] handle_io_error: flow proto error cleanup
            started on 0x7fb0a012bed0: Connection reset by peer
            [E 06/17 13:37] handle_io_error: flow proto
            0x7fb0a012bed0 canceled 0 operations, will clean up.
            [E 06/17 13:37] handle_io_error: flow proto
            0x7fb0a012bed0 error cleanup finished: Connection reset
            by peer

            blade61:
            [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
            [E 06/17 13:37] handle_io_error: flow proto error cleanup
            started on 0x76b5a0: Broken pipe
            [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0
            canceled 0 operations, will clean up.
            [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0
            error cleanup finished: Broken pipe
            [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
            [E 06/17 13:37] handle_io_error: flow proto error cleanup
            started on 0x778e00: Broken pipe
            [E 06/17 13:37] handle_io_error: flow proto 0x778e00
            canceled 0 operations, will clean up.
            [E 06/17 13:37] handle_io_error: flow proto 0x778e00
            error cleanup finished: Broken pipe

            [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
            [E 06/17 13:37] handle_io_error: flow proto error cleanup
            started on 0x76b5a0: Broken pipe
            [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0
            canceled 0 operations, will clean up.
            [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0
            error cleanup finished: Broken pipe
            [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
            [E 06/17 13:37] handle_io_error: flow proto error cleanup
            started on 0x778e00: Broken pipe
            [E 06/17 13:37] handle_io_error: flow proto 0x778e00
            canceled 0 operations, will clean up.
            [E 06/17 13:37] handle_io_error: flow proto 0x778e00
            error cleanup finished: Broken pipe

            Vincenzo

            On 17 June 2011 13:29, Michael Moore <[email protected]> wrote:

                Hi Vincenzo,

                This sounds similar to an issue just reported by
                Benjamin Seevers here on the developers list:
                http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html

                Based on his experience with that issue, the
                corruption no longer occurs after switching from
                directio to alt-aio. Could you try switching from
                directio to alt-aio in your configuration, to help
                isolate whether this is a similar or a different
                issue? If that doesn't resolve it, could you provide
                what errors, if any, you see on the client when it
                fails, and what errors, if any, appear in the
                pvfs2-server logs?
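
                That is, in the <StorageHints> section of the server
                configuration, the only line that should need to
                change is TroveMethod, something like:

                    <StorageHints>
                    TroveMethod alt-aio
                    </StorageHints>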

                Thanks,
                Michael

                On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano <[email protected]> wrote:

                    Hi,
                    I'm using the following setup:
                    4 machines used as I/O servers
                    10 machines used as I/O clients

                    The configuration file is the following:

                    <Defaults>
                    UnexpectedRequests 50
                    EventLogging none
                    EnableTracing no
                    LogStamp datetime
                    BMIModules bmi_tcp
                    FlowModules flowproto_multiqueue
                    PerfUpdateInterval 1000
                    ServerJobBMITimeoutSecs 30
                    ServerJobFlowTimeoutSecs 30
                    ClientJobBMITimeoutSecs 300
                    ClientJobFlowTimeoutSecs 300
                    ClientRetryLimit 5
                    ClientRetryDelayMilliSecs 2000
                    PrecreateBatchSize 512
                    PrecreateLowThreshold 256
                    TCPBufferSend 524288
                    TCPBufferReceive 524288
                    StorageSpace /local/vincenzo/pvfs2-storage-space
                    LogFile /tmp/pvfs2-server.log
                    </Defaults>

                    <Aliases>
                    Alias blade58 tcp://blade58:3334
                    Alias blade59 tcp://blade59:3334
                    Alias blade60 tcp://blade60:3334
                    Alias blade61 tcp://blade61:3334
                    </Aliases>

                    <Filesystem>
                    Name pvfs2-fs
                    ID 1615492168
                    RootHandle 1048576
                    FileStuffing yes
                    <MetaHandleRanges>
                    Range blade58 3-1152921504606846977
                    Range blade59 1152921504606846978-2305843009213693952
                    Range blade60 2305843009213693953-3458764513820540927
                    Range blade61 3458764513820540928-4611686018427387902
                    </MetaHandleRanges>
                    <DataHandleRanges>
                    Range blade58 4611686018427387903-5764607523034234877
                    Range blade59 5764607523034234878-6917529027641081852
                    Range blade60 6917529027641081853-8070450532247928827
                    Range blade61 8070450532247928828-9223372036854775802
                    </DataHandleRanges>
                    <StorageHints>
                    TroveSyncMeta no
                    TroveSyncData no
                    TroveMethod directio
                    </StorageHints>
                    </Filesystem>

                    I'm testing the system by continuously writing
                    500K chunks from 1 client machine. After a few
                    seconds, the client is no longer able to write.
                    Checking the file system manually, I can see my
                    file (running ls), but it appears to be corrupted
                    (no information about the file is given and I
                    cannot remove it). The only solution is to stop
                    all clients/servers and re-create the file
                    system.

                    Thanks in advance

                    Vincenzo
