Hello,

I'm concerned about some errors I'm seeing in the logs on one of our storage nodes and on one client node. This is a freshly built PVFS2 filesystem.

[D 11/02 11:39] PVFS2 Server version 2.8.2 starting.
[E 11/02 14:53] trove_write_callback_fn: I/O error occurred
[E 11/02 14:53] handle_io_error: flow proto error cleanup started on 0x5535420: No such file or directory
[E 11/02 14:53] handle_io_error: flow proto 0x5535420 canceled 0 operations, will clean up.
[E 11/02 14:53] handle_io_error: flow proto 0x5535420 error cleanup finished: No such file or directory
[E 11/02 14:58] trove_read_callback_fn: I/O error occurred
[E 11/02 14:58] handle_io_error: flow proto error cleanup started on 0x55b6330: Broken pipe
[E 11/02 14:58] handle_io_error: flow proto 0x55b6330 canceled 0 operations, will clean up.
[E 11/02 14:58] handle_io_error: flow proto 0x55b6330 error cleanup finished: Broken pipe


[E 11:43:15.488743] PVFS Client Daemon Started.  Version 2.8.2
[D 11:43:15.488941] [INFO]: Mapping pointer 0x2abd7a0cc000 for I/O.
[D 11:43:15.495858] [INFO]: Mapping pointer 0x7878000 for I/O.
[E 14:58:10.085848] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 59797618.
[E 14:58:10.085938] bmi_to_mem_callback_fn: I/O error occurred
[E 14:58:10.085949] handle_io_error: flow proto error cleanup started on 0x883d528: Operation cancelled (possibly due to timeout)
[E 14:58:10.085956] handle_io_error: flow proto 0x883d528 canceled 0 operations, will clean up.
[E 14:58:10.085963] handle_io_error: flow proto 0x883d528 error cleanup finished: Operation cancelled (possibly due to timeout)
[E 14:58:10.085972] io_datafile_complete_operations: flow failed, retrying from msgpair
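Note that the client-side timeout at 14:58 lines up with the server-side trove_read_callback_fn error in the same minute, which is what makes me think these are two views of the same failed flow. A rough sketch of how I matched them up (the regex is my own guess at the two timestamp formats PVFS2 uses, not anything official):

```python
import re

# Hypothetical helper: pull the HH:MM minute out of either PVFS2 log
# timestamp style, "[E 11/02 14:58]" (server) or "[E 14:58:10.085848]"
# (client), so errors from the two logs can be lined up.
TS = re.compile(r"\[[ED] (?:\d{2}/\d{2} )?(\d{2}:\d{2})")

def minute_of(line):
    m = TS.search(line)
    return m.group(1) if m else None

server_line = "[E 11/02 14:58] trove_read_callback_fn: I/O error occurred"
client_line = "[E 14:58:10.085848] job_time_mgr_expire: job time out"

# Both events fall in the same minute:
print(minute_of(server_line), minute_of(client_line))  # 14:58 14:58
```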


Configuration and other info:

> pvfs2-statfs -m /pvfs2

aggregate statistics:
---------------------------------------

        fs_id: 1713165884
        total number of servers (meta and I/O): 3
        handles available (meta and I/O):       9223372036854771237
        handles total (meta and I/O):           9223372036854775800
        bytes available:                        11107855478784
        bytes total:                            11249595187200

NOTE: The aggregate total and available statistics are calculated based
on an algorithm that assumes data will be distributed evenly; thus
the free space is equal to the smallest I/O server capacity
multiplied by the number of I/O servers.  If this number seems
unusually small, then check the individual server statistics below
to look for problematic servers.
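For what it's worth, the aggregate numbers do check out against the formula described in that note. Plugging the per-server "bytes available" figures from below into min-capacity-times-server-count:

```python
# Per-server "bytes available" values taken from the statistics below.
available = {
    "sn1.ib": 3702620225536,
    "sn2.ib": 3702620045312,
    "sn3.ib": 3702618492928,
}

# Aggregate free space = smallest I/O server capacity * number of servers.
aggregate_available = min(available.values()) * len(available)
print(aggregate_available)  # 11107855478784, matching the aggregate figure
```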

meta server statistics:
---------------------------------------

server: tcp://sn1.ib:3334
        RAM bytes total  : 8365256704
        RAM bytes free   : 48209920
        uptime (seconds) : 171290
        load averages    : 12480 27808 24480
        handles available: 3074457345618257075
        handles total    : 3074457345618258600
        bytes available  : 3702620225536
        bytes total      : 3749865062400
        mode: serving both metadata and I/O data

server: tcp://sn2.ib:3334
        RAM bytes total  : 8365256704
        RAM bytes free   : 48979968
        uptime (seconds) : 169733
        load averages    : 39328 33984 26464
        handles available: 3074457345618257081
        handles total    : 3074457345618258600
        bytes available  : 3702620045312
        bytes total      : 3749865062400
        mode: serving both metadata and I/O data

server: tcp://sn3.ib:3334
        RAM bytes total  : 8365256704
        RAM bytes free   : 46600192
        uptime (seconds) : 171290
        load averages    : 24352 26560 21920
        handles available: 3074457345618257081
        handles total    : 3074457345618258600
        bytes available  : 3702618492928
        bytes total      : 3749865062400
        mode: serving both metadata and I/O data


I/O server statistics:
---------------------------------------

server: tcp://sn1.ib:3334
        RAM bytes total  : 8365256704
        RAM bytes free   : 48209920
        uptime (seconds) : 171290
        load averages    : 12480 27808 24480
        handles available: 3074457345618257075
        handles total    : 3074457345618258600
        bytes available  : 3702620225536
        bytes total      : 3749865062400
        mode: serving both metadata and I/O data

server: tcp://sn2.ib:3334
        RAM bytes total  : 8365256704
        RAM bytes free   : 48979968
        uptime (seconds) : 169733
        load averages    : 39328 33984 26464
        handles available: 3074457345618257081
        handles total    : 3074457345618258600
        bytes available  : 3702620045312
        bytes total      : 3749865062400
        mode: serving both metadata and I/O data

server: tcp://sn3.ib:3334
        RAM bytes total  : 8365256704
        RAM bytes free   : 46600192
        uptime (seconds) : 171290
        load averages    : 24352 26560 21920
        handles available: 3074457345618257081
        handles total    : 3074457345618258600
        bytes available  : 3702618492928
        bytes total      : 3749865062400
        mode: serving both metadata and I/O data


Environment information:

- 3 storage nodes, each serving both I/O and metadata roles
- 4 clients
- the storage nodes mount their storage from RAID boxes on a SAN

The PVFS communication is TCP over InfiniBand. One thing I did that may or may not be an issue was to set up DNS round robin for client access to the storage nodes, so each client has a line like this in its /etc/fstab:

tcp://pvfsnsd.ib:3334/pvfs2-fs /pvfs2 pvfs2 defaults,noauto,intr 0 0

So, in theory, the mount requests should be balanced across all three storage nodes.
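One thing I used to sanity-check the round robin is a quick lookup that lists every address the alias resolves to (just a sketch; pvfsnsd.ib is the round-robin alias from my fstab line above, and whether you see all three addresses at once depends on how the DNS zone is configured):

```python
import socket

def resolve_all(host, port=3334):
    """Return the unique IP addresses a hostname resolves to for TCP."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# With round robin set up, the alias should cover all three storage nodes:
#   resolve_all("pvfsnsd.ib")  ->  addresses of sn1.ib, sn2.ib, sn3.ib
```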

Let me know if you need additional information, thank you.
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users