Hi Christopher,
I think the root cause here is that the server hit a "no such
file or directory" error while it was trying to write data. It then
cancelled its current I/O operation, which made the client time out;
the timeout likely reset the connection and produced the broken pipe
messages.
Do you know what sort of workload was occurring when this happened? Is
it possible that a file was deleted while a process was still writing to it?
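Just to illustrate what I mean: on an ordinary local filesystem a process
can keep writing to a file after its name has been unlinked, but a PVFS2
server can return ENOENT in that situation, which would line up with the
error in your log. A minimal sketch of the scenario (paths are
placeholders; on a storage node you could also look for it with
"lsof +L1", assuming lsof is installed, which lists open files whose
link count has dropped to zero):

```shell
# Reproduce the "deleted while still open" pattern locally:
tmp=$(mktemp)
exec 3>"$tmp"              # hold an open write descriptor on the file
rm "$tmp"                  # unlink the name while fd 3 is still open
echo "still writing" >&3   # on a local fs this write still succeeds...
ls "$tmp" 2>/dev/null || echo "...but the directory entry is gone"
exec 3>&-                  # close the fd; the data is discarded
```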
The DNS round-robin should work just fine for your fstab.
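If you want to double-check the round-robin itself, something like this
(pvfsnsd.ib being your round-robin name, and assuming dig is available)
should show all three storage node addresses coming back across
repeated lookups:

```shell
# Count how often each address is returned first over several lookups;
# with round-robin DNS all three storage nodes should appear.
for i in $(seq 1 9); do
    dig +short pvfsnsd.ib | head -n 1
done | sort | uniq -c
```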
thanks,
-Phil
On 11/03/2010 12:27 PM, Christopher Coffey wrote:
Hello,
I'm concerned with some errors I'm seeing in the logs on one of our
storage nodes and one client node. This is on a freshly built pvfs2 fs.
[D 11/02 11:39] PVFS2 Server version 2.8.2 starting.
[E 11/02 14:53] trove_write_callback_fn: I/O error occurred
[E 11/02 14:53] handle_io_error: flow proto error cleanup started on
0x5535420: No such file or directory
[E 11/02 14:53] handle_io_error: flow proto 0x5535420 canceled 0
operations, will clean up.
[E 11/02 14:53] handle_io_error: flow proto 0x5535420 error cleanup
finished: No such file or directory
[E 11/02 14:58] trove_read_callback_fn: I/O error occurred
[E 11/02 14:58] handle_io_error: flow proto error cleanup started on
0x55b6330: Broken pipe
[E 11/02 14:58] handle_io_error: flow proto 0x55b6330 canceled 0
operations, will clean up.
[E 11/02 14:58] handle_io_error: flow proto 0x55b6330 error cleanup
finished: Broken pipe
[E 11:43:15.488743] PVFS Client Daemon Started. Version 2.8.2
[D 11:43:15.488941] [INFO]: Mapping pointer 0x2abd7a0cc000 for I/O.
[D 11:43:15.495858] [INFO]: Mapping pointer 0x7878000 for I/O.
[E 14:58:10.085848] job_time_mgr_expire: job time out: cancelling bmi
operation, job_id: 59797618.
[E 14:58:10.085938] bmi_to_mem_callback_fn: I/O error occurred
[E 14:58:10.085949] handle_io_error: flow proto error cleanup started
on 0x883d528: Operation cancelled (possibly due to timeout)
[E 14:58:10.085956] handle_io_error: flow proto 0x883d528 canceled 0
operations, will clean up.
[E 14:58:10.085963] handle_io_error: flow proto 0x883d528 error
cleanup finished: Operation cancelled (possibly due to timeout)
[E 14:58:10.085972] io_datafile_complete_operations: flow failed,
retrying from msgpair
Configuration and other info:
> pvfs2-statfs -m /pvfs2
aggregate statistics:
---------------------------------------
fs_id: 1713165884
total number of servers (meta and I/O): 3
handles available (meta and I/O): 9223372036854771237
handles total (meta and I/O): 9223372036854775800
bytes available: 11107855478784
bytes total: 11249595187200
NOTE: The aggregate total and available statistics are calculated based
on an algorithm that assumes data will be distributed evenly; thus
the free space is equal to the smallest I/O server capacity
multiplied by the number of I/O servers. If this number seems
unusually small, then check the individual server statistics below
to look for problematic servers.
meta server statistics:
---------------------------------------
server: tcp://sn1.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 48209920
uptime (seconds) : 171290
load averages : 12480 27808 24480
handles available: 3074457345618257075
handles total : 3074457345618258600
bytes available : 3702620225536
bytes total : 3749865062400
mode: serving both metadata and I/O data
server: tcp://sn2.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 48979968
uptime (seconds) : 169733
load averages : 39328 33984 26464
handles available: 3074457345618257081
handles total : 3074457345618258600
bytes available : 3702620045312
bytes total : 3749865062400
mode: serving both metadata and I/O data
server: tcp://sn3.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 46600192
uptime (seconds) : 171290
load averages : 24352 26560 21920
handles available: 3074457345618257081
handles total : 3074457345618258600
bytes available : 3702618492928
bytes total : 3749865062400
mode: serving both metadata and I/O data
I/O server statistics:
---------------------------------------
server: tcp://sn1.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 48209920
uptime (seconds) : 171290
load averages : 12480 27808 24480
handles available: 3074457345618257075
handles total : 3074457345618258600
bytes available : 3702620225536
bytes total : 3749865062400
mode: serving both metadata and I/O data
server: tcp://sn2.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 48979968
uptime (seconds) : 169733
load averages : 39328 33984 26464
handles available: 3074457345618257081
handles total : 3074457345618258600
bytes available : 3702620045312
bytes total : 3749865062400
mode: serving both metadata and I/O data
server: tcp://sn3.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 46600192
uptime (seconds) : 171290
load averages : 24352 26560 21920
handles available: 3074457345618257081
handles total : 3074457345618258600
bytes available : 3702618492928
bytes total : 3749865062400
mode: serving both metadata and I/O data
Environment information:
- 3 storage nodes, each serving both I/O and metadata roles
- 4 clients
- the storage nodes mount their storage from some raid boxes on a SAN
The PVFS communication is TCP over InfiniBand. One thing I did that may
or may not be an issue was to set up DNS round-robin for the clients'
access to the storage nodes. Each client has a line like this in
its /etc/fstab:
tcp://pvfsnsd.ib:3334/pvfs2-fs /pvfs2 pvfs2 defaults,noauto,intr 0 0
So, in theory, requests should be balanced across all 3 storage
nodes.
Let me know if you need additional information, thank you.
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users