What happens if a file system simply doesn't provide the aio functions
(i.e., leaves aio_write, aio_read, etc. set to NULL in the file_operations
structure)? I wonder if the aio system calls return ENOSYS or if the
kernel just services them as blocking calls.
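A quick way to check might be a small probe along these lines (just an
untested sketch; the /mnt/pvfs2 path is a placeholder for a file on the
PVFS2 mount). It issues one raw io_submit() write and prints whatever the
kernel hands back:

/* aio_probe.c: issue a single raw io_submit() write and report the result */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

int main(void)
{
    static char buf[4096];
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    long ret;

    int fd = open("/mnt/pvfs2/aio-probe.dat", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (syscall(SYS_io_setup, 1, &ctx) < 0) { perror("io_setup"); return 1; }

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    /* a negative return (with errno set) shows how the kernel rejects
       the request; success means it was accepted */
    ret = syscall(SYS_io_submit, ctx, 1, cbs);
    printf("io_submit returned %ld (errno: %s)\n",
           ret, ret < 0 ? strerror(errno) : "none");

    syscall(SYS_io_destroy, ctx);
    close(fd);
    return 0;
}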
At any rate, it seems like it might be a good idea to turn off that
functionality until the bug is fixed so that folks don't get caught off
guard.
-Phil
On 06/17/2011 10:07 AM, Michael Moore wrote:
Good memory, Phil!
Vincenzo, you are welcome to try upgrading to OrangeFS; however, I
don't suspect it will do too much good. Let me get this on our list
and take a look at it.
Michael
On Fri, Jun 17, 2011 at 9:54 AM, Phil Carns <[email protected]> wrote:
I think there must be a problem with the client (kernel) side aio
support in PVFS. There is a related bug report from a while back:
http://www.beowulf-underground.org/pipermail/pvfs2-users/2010-February/003045.html
The libaio library described in that bug report uses the
io_submit() system call as well.
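For anyone trying to confirm whether their client actually goes down
that path, running it under something like
"strace -f -e trace=io_setup,io_submit,io_getevents ./client"
(the client binary name is just a placeholder) should show the raw aio
calls as they are issued.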
-Phil
On 06/17/2011 08:20 AM, Vincenzo Gulisano wrote:
It's an Ubuntu server, kernel 2.6.24-24-server, 64-bit.
pvfs2 is 2.8.2.
I have one client that loops calling syscall(SYS_io_submit,...
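Roughly, the loop is something like the sketch below (simplified; the
file path is a placeholder and only one request is kept in flight here,
the real test just keeps submitting 500K chunks):

/* sketch of a client that streams 500K aio writes via raw syscalls */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#define CHUNK (500 * 1024)

int main(void)
{
    static char buf[CHUNK];
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    struct io_event ev;
    long long off = 0;

    int fd = open("/mnt/pvfs2/stream.dat", O_CREAT | O_WRONLY, 0644);
    if (fd < 0 || syscall(SYS_io_setup, 1, &ctx) < 0) {
        perror("setup");
        return 1;
    }

    for (;;) {
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_lio_opcode = IOCB_CMD_PWRITE;
        cb.aio_buf = (uint64_t)(uintptr_t)buf;
        cb.aio_nbytes = CHUNK;
        cb.aio_offset = off;

        if (syscall(SYS_io_submit, ctx, 1, cbs) != 1) {
            perror("io_submit");
            break;
        }
        /* wait for the single outstanding write to complete */
        if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1) {
            perror("io_getevents");
            break;
        }
        off += CHUNK;
    }
    return 1;
}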
On 17 June 2011 14:02, Michael Moore <[email protected]> wrote:
What version of OrangeFS/PVFS and what distro/kernel version
are used in the setup? To re-create it, is it just a stream of
simple write() calls from a single client, or something more
involved?
Thanks,
Michael
On Fri, Jun 17, 2011 at 7:43 AM, Vincenzo Gulisano
<[email protected]> wrote:
Thanks Michael,
I've tried setting alt-aio as the TroveMethod and the problem
is still there.
Some logs:
Client (blade39) says:
[E 13:36:19.590763] server: tcp://blade60:3334
[E 13:36:19.591006] io_process_context_recv (op_status):
No such file or directory
[E 13:36:19.591018] server: tcp://blade61:3334
[E 13:36:19.768105] io_process_context_recv (op_status):
No such file or directory
Servers:
blade58:
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup
started on 0x7f5cac004370: Connection reset by peer
[E 06/17 13:37] handle_io_error: flow proto
0x7f5cac004370 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto
0x7f5cac004370 error cleanup finished: Connection reset
by peer
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup
started on 0x7f5cac0ee8f0: Connection reset by peer
[E 06/17 13:37] handle_io_error: flow proto
0x7f5cac0ee8f0 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto
0x7f5cac0ee8f0 error cleanup finished: Connection reset
by peer
blade59:
[E 06/17 13:37] trove_write_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup
started on 0x799410: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x799410
canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x799410
error cleanup finished: Broken pipe
blade60:
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup
started on 0x7fb0a012bed0: Connection reset by peer
[E 06/17 13:37] handle_io_error: flow proto
0x7fb0a012bed0 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto
0x7fb0a012bed0 error cleanup finished: Connection reset
by peer
blade61:
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup
started on 0x76b5a0: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0
canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0
error cleanup finished: Broken pipe
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup
started on 0x778e00: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x778e00
canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x778e00
error cleanup finished: Broken pipe
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup
started on 0x76b5a0: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0
canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0
error cleanup finished: Broken pipe
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup
started on 0x778e00: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x778e00
canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x778e00
error cleanup finished: Broken pipe
Vincenzo
On 17 June 2011 13:29, Michael Moore
<[email protected]> wrote:
Hi Vincenzo,
This sounds similar to an issue just reported by
Benjamin Seevers here on the developers list:
http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html
Based on his experience with the issue, switching to alt-aio
instead of directio makes the corruption no longer occur.
Could you try switching from directio to alt-aio in your
configuration to help isolate whether this is a similar or
different issue? If that doesn't resolve it, could you provide
what errors, if any, you see on the client when it fails and
what errors, if any, appear in the pvfs2-server logs?
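For reference, the change is just the TroveMethod line in the
<StorageHints> section of the server config, e.g. (sync settings
carried over from your posted config):

<StorageHints>
    TroveSyncMeta no
    TroveSyncData no
    TroveMethod alt-aio
</StorageHints>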
Thanks,
Michael
On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano
<[email protected]> wrote:
Hi,
I'm using the following setup:
4 machines used as I/O servers
10 machines used as I/O clients
The configuration file is the following:
<Defaults>
UnexpectedRequests 50
EventLogging none
EnableTracing no
LogStamp datetime
BMIModules bmi_tcp
FlowModules flowproto_multiqueue
PerfUpdateInterval 1000
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 5
ClientRetryDelayMilliSecs 2000
PrecreateBatchSize 512
PrecreateLowThreshold 256
TCPBufferSend 524288
TCPBufferReceive 524288
StorageSpace /local/vincenzo/pvfs2-storage-space
LogFile /tmp/pvfs2-server.log
</Defaults>
<Aliases>
Alias blade58 tcp://blade58:3334
Alias blade59 tcp://blade59:3334
Alias blade60 tcp://blade60:3334
Alias blade61 tcp://blade61:3334
</Aliases>
<Filesystem>
Name pvfs2-fs
ID 1615492168
RootHandle 1048576
FileStuffing yes
<MetaHandleRanges>
Range blade58 3-1152921504606846977
Range blade59 1152921504606846978-2305843009213693952
Range blade60 2305843009213693953-3458764513820540927
Range blade61 3458764513820540928-4611686018427387902
</MetaHandleRanges>
<DataHandleRanges>
Range blade58 4611686018427387903-5764607523034234877
Range blade59 5764607523034234878-6917529027641081852
Range blade60 6917529027641081853-8070450532247928827
Range blade61 8070450532247928828-9223372036854775802
</DataHandleRanges>
<StorageHints>
TroveSyncMeta no
TroveSyncData no
TroveMethod directio
</StorageHints>
</Filesystem>
I'm testing the system by continuously writing 500K chunks
from one client machine. After a few seconds, the client is
no longer able to write. Checking the file system manually,
I can see my file (running ls), but it appears to be
corrupted (no information about the file is reported and I
cannot remove it). The only solution is to stop all clients
and servers and re-create the file system.
Thanks in advance
Vincenzo
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users