I believe it returns ENOSYS if the calls aren't defined. I agree that if we
can't get a fix, it is better to disable it. However, the turn-around time for
disabling it in a released version and getting that release into use is
hopefully longer than the time it will take to resolve the issue.

I've been doing some digging into the issue. First, running the example code
from the man page with AIO_MAXIO == 2 (two reads at once) succeeds; with more
than two reads it fails. The kernel panics are all over the place, in code
that can't be the problem (scsi, ext3, etc.), which makes me think we're doing
something really bad(tm). I've attached a handful of panics I've collected
while looking at this. I think the issue is in
pvfs_bufmap_copy_to_user_task_iovec(), which would be consistent with reads,
not writes, causing the panics. However, I'm not sure what the issue is yet.
Any insight or recommendations are appreciated.
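
For anyone who wants to poke at this, here is a rough sketch of the kind of
test I'm describing, along the lines of the io_submit man page example (not
the exact code I ran; the mount path, request count, and buffer size are
placeholders, and the syscalls are invoked directly rather than through
libaio):

/*
 * Sketch only: submit several concurrent reads against a file on the
 * PVFS mount via the raw io_submit(2) syscall.  NR_REQS, BUF_SIZE, and
 * the default path below are placeholders.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

#define NR_REQS  4            /* more than two reads at once triggers the panics described above */
#define BUF_SIZE (64 * 1024)  /* placeholder transfer size */

/* thin wrappers around the raw syscalls (no libaio dependency) */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, nr, ctx);
}
static long sys_io_submit(aio_context_t ctx, long nr, struct iocb **iocbpp)
{
    return syscall(SYS_io_submit, ctx, nr, iocbpp);
}
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *events, struct timespec *timeout)
{
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, timeout);
}

int main(int argc, char **argv)
{
    /* hypothetical default path; pass a file on your PVFS mount instead */
    const char *path = argc > 1 ? argv[1] : "/mnt/pvfs2/testfile";
    aio_context_t ctx = 0;
    struct iocb cbs[NR_REQS];
    struct iocb *cbp[NR_REQS];
    struct io_event events[NR_REQS];
    int fd, i;
    long ret;

    fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    if (sys_io_setup(NR_REQS, &ctx) < 0) { perror("io_setup"); return 1; }

    memset(cbs, 0, sizeof(cbs));
    for (i = 0; i < NR_REQS; i++) {
        char *buf = malloc(BUF_SIZE);
        if (!buf) { perror("malloc"); return 1; }
        cbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
        cbs[i].aio_fildes     = fd;
        cbs[i].aio_buf        = (uint64_t)(uintptr_t)buf;
        cbs[i].aio_nbytes     = BUF_SIZE;
        cbs[i].aio_offset     = (int64_t)i * BUF_SIZE;
        cbp[i] = &cbs[i];
    }

    /* submit all reads at once, then wait for every completion; if the
     * file system's aio entry points were simply missing we would expect
     * an error back here, not a panic */
    ret = sys_io_submit(ctx, NR_REQS, cbp);
    if (ret < 0) { perror("io_submit"); return 1; }

    ret = sys_io_getevents(ctx, NR_REQS, NR_REQS, events, NULL);
    printf("io_getevents returned %ld of %d events\n", ret, NR_REQS);

    close(fd);
    return 0;
}

Compile with something like gcc -o aio_test aio_test.c and point it at a file
on the PVFS mount.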

Michael

On Fri, Jun 17, 2011 at 10:23 AM, Phil Carns <[email protected]> wrote:

>
> What happens if a file system simply doesn't provide the aio functions (ie,
> leaves aio_write, aio_read, etc. set to NULL in the file_operations
> structure)?  I wonder if the aio system calls return ENOSYS or if the kernel
> just services them as blocking calls.
>
> At any rate it seems like it might be a good idea to turn off that
> functionality until the bug is fixed so that folks don't get caught off
> guard.
>
> -Phil
>
>
> On 06/17/2011 10:07 AM, Michael Moore wrote:
>
> Good memory, Phil!
>
> Vincenzo, you are welcome to try upgrading to OrangeFS; however, I don't
> suspect it will do too much good. Let me get this on our list and take a
> look at it.
>
> Michael
>
>  On Fri, Jun 17, 2011 at 9:54 AM, Phil Carns <[email protected]> wrote:
>
>>  I think there must be a problem with the client (kernel) side aio support
>> in PVFS.  There is a related bug report from a while back:
>>
>>
>> http://www.beowulf-underground.org/pipermail/pvfs2-users/2010-February/003045.html
>>
>> The libaio library described in that bug report uses the io_submit()
>> system call as well.
>>
>> -Phil
>>
>>
>> On 06/17/2011 08:20 AM, Vincenzo Gulisano wrote:
>>
>> It's an Ubuntu server, 2.6.24-24-server, 64-bit.
>> pvfs2 is 2.8.2.
>>
>>  I have one client that loops calling syscall(SYS_io_submit,...
>>
>>
>> On 17 June 2011 14:02, Michael Moore <[email protected]> wrote:
>>
>>> What version of OrangeFS/PVFS and what distro/kernel version are used in
>>> the setup? To re-create it, is it just a stream of simple write() calls
>>> from a single client, or something more involved?
>>>
>>> Thanks,
>>> Michael
>>>
>>>
>>>  On Fri, Jun 17, 2011 at 7:43 AM, Vincenzo Gulisano <
>>> [email protected]> wrote:
>>>
>>>> Thanks Michael
>>>>
>>>>  I've tried setting alt-aio as TroveMethod and the problem is still
>>>> there.
>>>>
>>>>  Some logs:
>>>>
>>>>  Client (blade39) says:
>>>>
>>>>   [E 13:36:19.590763] server: tcp://blade60:3334
>>>> [E 13:36:19.591006] io_process_context_recv (op_status): No such file or
>>>> directory
>>>> [E 13:36:19.591018] server: tcp://blade61:3334
>>>> [E 13:36:19.768105] io_process_context_recv (op_status): No such file or
>>>> directory
>>>>
>>>>  Servers:
>>>>
>>>>  blade58:
>>>>  [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x7f5cac004370: Connection reset by peer
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 error cleanup
>>>> finished: Connection reset by peer
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x7f5cac0ee8f0: Connection reset by peer
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 error cleanup
>>>> finished: Connection reset by peer
>>>>
>>>>  blade59:
>>>>  [E 06/17 13:37] trove_write_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x799410: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x799410 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x799410 error cleanup
>>>> finished: Broken pipe
>>>>
>>>>  blade60:
>>>>  [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x7fb0a012bed0: Connection reset by peer
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 error cleanup
>>>> finished: Connection reset by peer
>>>>
>>>>  blade61:
>>>>  [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x76b5a0: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup
>>>> finished: Broken pipe
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x778e00: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup
>>>> finished: Broken pipe
>>>>
>>>>  [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x76b5a0: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup
>>>> finished: Broken pipe
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x778e00: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup
>>>> finished: Broken pipe
>>>>
>>>> Vincenzo
>>>>
>>>> On 17 June 2011 13:29, Michael Moore <[email protected]> wrote:
>>>>
>>>>> Hi Vincenzo,
>>>>>
>>>>> This sounds similar to an issue just reported by Benjamin Seevers here
>>>>> on the developers list:
>>>>>
>>>>> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html
>>>>>
>>>>> Based on his experience with the issue, if you switch to alt-aio instead
>>>>> of directio, the corruption no longer occurs. Could you try switching from
>>>>> directio to alt-aio in your configuration to help isolate whether this is
>>>>> a similar or different issue (the change is sketched below)? If that
>>>>> doesn't resolve it, could you provide what errors, if any, you see on the
>>>>> client when it fails and what errors, if any, appear in the pvfs2-server
>>>>> logs?
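>>>>>
>>>>> For example, only the TroveMethod line in the <StorageHints> section of
>>>>> the config would change (a sketch based on the config you posted;
>>>>> everything else stays as-is):
>>>>>
>>>>>  <StorageHints>
>>>>>  TroveSyncMeta no
>>>>>  TroveSyncData no
>>>>>  TroveMethod alt-aio
>>>>>  </StorageHints>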
>>>>>
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>>  On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano <
>>>>> [email protected]> wrote:
>>>>>
>>>>>>  Hi,
>>>>>> I'm using the following setup:
>>>>>> 4 machines used as I/O server
>>>>>> 10 machines used as I/O client
>>>>>>
>>>>>>  The configuration file is the following:
>>>>>>
>>>>>>  <Defaults>
>>>>>>  UnexpectedRequests 50
>>>>>>  EventLogging none
>>>>>>  EnableTracing no
>>>>>>  LogStamp datetime
>>>>>>  BMIModules bmi_tcp
>>>>>>  FlowModules flowproto_multiqueue
>>>>>>  PerfUpdateInterval 1000
>>>>>>  ServerJobBMITimeoutSecs 30
>>>>>>  ServerJobFlowTimeoutSecs 30
>>>>>>  ClientJobBMITimeoutSecs 300
>>>>>>  ClientJobFlowTimeoutSecs 300
>>>>>>  ClientRetryLimit 5
>>>>>>  ClientRetryDelayMilliSecs 2000
>>>>>>  PrecreateBatchSize 512
>>>>>>  PrecreateLowThreshold 256
>>>>>>  TCPBufferSend 524288
>>>>>>  TCPBufferReceive 524288
>>>>>>  StorageSpace /local/vincenzo/pvfs2-storage-space
>>>>>>  LogFile /tmp/pvfs2-server.log
>>>>>> </Defaults>
>>>>>>
>>>>>>  <Aliases>
>>>>>>  Alias blade58 tcp://blade58:3334
>>>>>>  Alias blade59 tcp://blade59:3334
>>>>>>  Alias blade60 tcp://blade60:3334
>>>>>>  Alias blade61 tcp://blade61:3334
>>>>>> </Aliases>
>>>>>>
>>>>>>  <Filesystem>
>>>>>>  Name pvfs2-fs
>>>>>>  ID 1615492168
>>>>>>  RootHandle 1048576
>>>>>>  FileStuffing yes
>>>>>>  <MetaHandleRanges>
>>>>>>  Range blade58 3-1152921504606846977
>>>>>>  Range blade59 1152921504606846978-2305843009213693952
>>>>>>  Range blade60 2305843009213693953-3458764513820540927
>>>>>>  Range blade61 3458764513820540928-4611686018427387902
>>>>>>  </MetaHandleRanges>
>>>>>>  <DataHandleRanges>
>>>>>>  Range blade58 4611686018427387903-5764607523034234877
>>>>>>  Range blade59 5764607523034234878-6917529027641081852
>>>>>>  Range blade60 6917529027641081853-8070450532247928827
>>>>>>  Range blade61 8070450532247928828-9223372036854775802
>>>>>>  </DataHandleRanges>
>>>>>>  <StorageHints>
>>>>>>  TroveSyncMeta no
>>>>>>  TroveSyncData no
>>>>>>  TroveMethod directio
>>>>>>  </StorageHints>
>>>>>> </Filesystem>
>>>>>>
>>>>>>  I'm testing the system by writing (continuously) 500K chunks from one
>>>>>> client machine. After a few seconds, the client is no longer able to
>>>>>> write. Checking the file system manually, I can see my file (running ls)
>>>>>> and it seems to be corrupted (no information about the file is given and
>>>>>> I cannot remove the file). The only solution is to stop all clients /
>>>>>> servers and re-create the file system.
>>>>>>
>>>>>>  Thanks in advance
>>>>>>
>>>>>>  Vincenzo
>>>>>>
>>>>>>  _______________________________________________
>>>>>> Pvfs2-users mailing list
>>>>>> [email protected]
>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> _______________________________________________
>> Pvfs2-users mailing list
>> [email protected]
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>
>>
>
>

Attachment: aio.panic.tar.gz
Description: GNU Zip compressed data

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
