Hello,
I'm concerned about some errors I'm seeing in the logs on one of our
storage nodes and on one client node. This is on a freshly built pvfs2 fs.
[D 11/02 11:39] PVFS2 Server version 2.8.2 starting.
[E 11/02 14:53] trove_write_callback_fn: I/O error occurred
[E 11/02 14:53] handle_io_error: flow proto error cleanup started on
0x5535420: No such file or directory
[E 11/02 14:53] handle_io_error: flow proto 0x5535420 canceled 0
operations, will clean up.
[E 11/02 14:53] handle_io_error: flow proto 0x5535420 error cleanup
finished: No such file or directory
[E 11/02 14:58] trove_read_callback_fn: I/O error occurred
[E 11/02 14:58] handle_io_error: flow proto error cleanup started on
0x55b6330: Broken pipe
[E 11/02 14:58] handle_io_error: flow proto 0x55b6330 canceled 0
operations, will clean up.
[E 11/02 14:58] handle_io_error: flow proto 0x55b6330 error cleanup
finished: Broken pipe
[E 11:43:15.488743] PVFS Client Daemon Started. Version 2.8.2
[D 11:43:15.488941] [INFO]: Mapping pointer 0x2abd7a0cc000 for I/O.
[D 11:43:15.495858] [INFO]: Mapping pointer 0x7878000 for I/O.
[E 14:58:10.085848] job_time_mgr_expire: job time out: cancelling bmi
operation, job_id: 59797618.
[E 14:58:10.085938] bmi_to_mem_callback_fn: I/O error occurred
[E 14:58:10.085949] handle_io_error: flow proto error cleanup started on
0x883d528: Operation cancelled (possibly due to timeout)
[E 14:58:10.085956] handle_io_error: flow proto 0x883d528 canceled 0
operations, will clean up.
[E 14:58:10.085963] handle_io_error: flow proto 0x883d528 error cleanup
finished: Operation cancelled (possibly due to timeout)
[E 14:58:10.085972] io_datafile_complete_operations: flow failed,
retrying from msgpair
Configuration and other info:
> pvfs2-statfs -m /pvfs2
aggregate statistics:
---------------------------------------
fs_id: 1713165884
total number of servers (meta and I/O): 3
handles available (meta and I/O): 9223372036854771237
handles total (meta and I/O): 9223372036854775800
bytes available: 11107855478784
bytes total: 11249595187200
NOTE: The aggregate total and available statistics are calculated based
on an algorithm that assumes data will be distributed evenly; thus
the free space is equal to the smallest I/O server capacity
multiplied by the number of I/O servers. If this number seems
unusually small, then check the individual server statistics below
to look for problematic servers.
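As a sanity check on that note, the aggregate figures above do match the smallest-server-times-count calculation (a quick sketch using the per-server numbers reported below; the values are copied from this pvfs2-statfs output):

```python
# Check the aggregate free-space calculation described in the NOTE above:
# aggregate available = smallest I/O server's available bytes * server count.
per_server_avail = [3702620225536, 3702620045312, 3702618492928]  # sn1, sn2, sn3
per_server_total = [3749865062400, 3749865062400, 3749865062400]

agg_avail = min(per_server_avail) * len(per_server_avail)
agg_total = min(per_server_total) * len(per_server_total)

print(agg_avail)  # 11107855478784, matching "bytes available" above
print(agg_total)  # 11249595187200, matching "bytes total" above
```

So the capacity numbers look consistent; the three servers are nearly balanced on free space.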
meta server statistics:
---------------------------------------
server: tcp://sn1.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 48209920
uptime (seconds) : 171290
load averages : 12480 27808 24480
handles available: 3074457345618257075
handles total : 3074457345618258600
bytes available : 3702620225536
bytes total : 3749865062400
mode: serving both metadata and I/O data
server: tcp://sn2.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 48979968
uptime (seconds) : 169733
load averages : 39328 33984 26464
handles available: 3074457345618257081
handles total : 3074457345618258600
bytes available : 3702620045312
bytes total : 3749865062400
mode: serving both metadata and I/O data
server: tcp://sn3.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 46600192
uptime (seconds) : 171290
load averages : 24352 26560 21920
handles available: 3074457345618257081
handles total : 3074457345618258600
bytes available : 3702618492928
bytes total : 3749865062400
mode: serving both metadata and I/O data
I/O server statistics:
---------------------------------------
server: tcp://sn1.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 48209920
uptime (seconds) : 171290
load averages : 12480 27808 24480
handles available: 3074457345618257075
handles total : 3074457345618258600
bytes available : 3702620225536
bytes total : 3749865062400
mode: serving both metadata and I/O data
server: tcp://sn2.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 48979968
uptime (seconds) : 169733
load averages : 39328 33984 26464
handles available: 3074457345618257081
handles total : 3074457345618258600
bytes available : 3702620045312
bytes total : 3749865062400
mode: serving both metadata and I/O data
server: tcp://sn3.ib:3334
RAM bytes total : 8365256704
RAM bytes free : 46600192
uptime (seconds) : 171290
load averages : 24352 26560 21920
handles available: 3074457345618257081
handles total : 3074457345618258600
bytes available : 3702618492928
bytes total : 3749865062400
mode: serving both metadata and I/O data
Environment information:
- 3 storage nodes, each serving both I/O and metadata roles
- 4 clients
- the storage nodes mount their storage from RAID boxes on a SAN
PVFS communication is TCP over InfiniBand. One thing I did that may or
may not be an issue was to set up DNS round-robin for client access to
the storage nodes, so each client has a line like this in its
/etc/fstab:
tcp://pvfsnsd.ib:3334/pvfs2-fs /pvfs2 pvfs2 defaults,noauto,intr 0 0
So in theory the requests should be balanced across all three storage nodes.
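In case it helps narrow things down, this is roughly how I check what the round-robin name resolves to from a client (a minimal sketch; `pvfsnsd.ib` is the name from my fstab, substitute your own):

```python
import socket

def resolve_all(name):
    """Return every IPv4 address DNS gives us for `name`, or an empty
    list if it doesn't resolve. Calling this repeatedly from a client
    shows which storage-node addresses the round-robin name rotates
    through (and in what order)."""
    try:
        _, _, addresses = socket.gethostbyname_ex(name)
        return addresses
    except socket.gaierror:
        return []

# e.g. resolve_all("pvfsnsd.ib") should list the addresses of sn1/sn2/sn3
```

Running it a few times on each client confirms all three storage-node addresses are being handed out.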
Let me know if you need additional information, thank you.
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users