Good memory, Phil!

Vincenzo, you are welcome to try upgrading to OrangeFS, though I suspect it
won't do much good. Let me get this on our list and take a look at it.
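
For reference, the reproducer Vincenzo describes (a single client that loops
calling syscall(SYS_io_submit, ...) to write 500K chunks) presumably looks
something like the sketch below. The mount point, file name, buffer handling,
and error checks are placeholders of mine, not his actual code:

/*
 * Illustrative only: a single client looping on the raw io_submit()
 * syscall, writing 500 KB chunks to a file on a PVFS2 kernel mount.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>   /* aio_context_t, struct iocb, struct io_event */

#define CHUNK (500 * 1024)   /* 500 KB per write */

int main(void)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    struct io_event ev;
    char *buf;
    long long off = 0;
    int fd;

    /* page-aligned buffer; precautionary rather than required here */
    if (posix_memalign((void **)&buf, 4096, CHUNK))
        return 1;
    memset(buf, 'x', CHUNK);

    /* hypothetical mount point and file name */
    fd = open("/mnt/pvfs2/testfile", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (syscall(SYS_io_setup, 128, &ctx) < 0) { perror("io_setup"); return 1; }

    for (;;) {
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes     = fd;
        cb.aio_lio_opcode = IOCB_CMD_PWRITE;
        cb.aio_buf        = (uint64_t)(uintptr_t)buf;
        cb.aio_nbytes     = CHUNK;
        cb.aio_offset     = off;

        /* submit one 500 KB write */
        if (syscall(SYS_io_submit, ctx, 1, cbs) != 1) { perror("io_submit"); break; }

        /* wait for it to complete before queueing the next chunk */
        if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1) { perror("io_getevents"); break; }

        off += CHUNK;
    }

    syscall(SYS_io_destroy, ctx);
    close(fd);
    free(buf);
    return 0;
}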

Michael

On Fri, Jun 17, 2011 at 9:54 AM, Phil Carns <[email protected]> wrote:

> I think there must be a problem with the client (kernel) side aio support
> in PVFS.  There is a related bug report from a while back:
>
>
> http://www.beowulf-underground.org/pipermail/pvfs2-users/2010-February/003045.html
>
> The libaio library described in that bug report uses the io_submit() system
> call as well.
>
> -Phil
>
>
> On 06/17/2011 08:20 AM, Vincenzo Gulisano wrote:
>
> It's an Ubuntu server, kernel 2.6.24-24-server, 64-bit.
> pvfs2 is 2.8.2.
>
> I have 1 client that loops calling syscall(SYS_io_submit, ...
>
>
> On 17 June 2011 14:02, Michael Moore <[email protected]> wrote:
>
>> What version of OrangeFS/PVFS and what distro/kernel version are used in
>> the setup? To re-create it, is it just a stream of simple write() calls
>> from a single client, or something more involved?
>>
>> Thanks,
>> Michael
>>
>>
>>  On Fri, Jun 17, 2011 at 7:43 AM, Vincenzo Gulisano <
>> [email protected]> wrote:
>>
>>> Thanks Michael
>>>
>>>  I've tried setting alt-aio as TroveMethod and the problem is still
>>> there.
>>>
>>>  Some logs:
>>>
>>>  Client (blade39) says:
>>>
>>>   [E 13:36:19.590763] server: tcp://blade60:3334
>>> [E 13:36:19.591006] io_process_context_recv (op_status): No such file or
>>> directory
>>> [E 13:36:19.591018] server: tcp://blade61:3334
>>> [E 13:36:19.768105] io_process_context_recv (op_status): No such file or
>>> directory
>>>
>>>  Servers:
>>>
>>>  blade58:
>>>  [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>> 0x7f5cac004370: Connection reset by peer
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 canceled 0
>>> operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 error cleanup
>>> finished: Connection reset by peer
>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>> 0x7f5cac0ee8f0: Connection reset by peer
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 canceled 0
>>> operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 error cleanup
>>> finished: Connection reset by peer
>>>
>>>  blade59:
>>>  [E 06/17 13:37] trove_write_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>> 0x799410: Broken pipe
>>> [E 06/17 13:37] handle_io_error: flow proto 0x799410 canceled 0
>>> operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x799410 error cleanup
>>> finished: Broken pipe
>>>
>>>  blade60:
>>>  [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>> 0x7fb0a012bed0: Connection reset by peer
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 canceled 0
>>> operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 error cleanup
>>> finished: Connection reset by peer
>>>
>>>  blade61:
>>>  [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>> 0x76b5a0: Broken pipe
>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0
>>> operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup
>>> finished: Broken pipe
>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>> 0x778e00: Broken pipe
>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0
>>> operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup
>>> finished: Broken pipe
>>>
>>>  [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>> 0x76b5a0: Broken pipe
>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0
>>> operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup
>>> finished: Broken pipe
>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>> 0x778e00: Broken pipe
>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0
>>> operations, will clean up.
>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup
>>> finished: Broken pipe
>>>
>>> Vincenzo
>>>
>>> On 17 June 2011 13:29, Michael Moore <[email protected]> wrote:
>>>
>>>> Hi Vincenzo,
>>>>
>>>> This sounds similar to an issue just reported by Benjamin Seevers here
>>>> on the developers list:
>>>>
>>>> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html
>>>>
>>>> Based on his experience with the issue, switching from directio to
>>>> alt-aio made the corruption stop. Could you try switching from directio
>>>> to alt-aio in your configuration to help isolate whether this is a
>>>> similar or a different issue? If that doesn't resolve it, could you tell
>>>> us what errors, if any, you see on the client when it fails and what
>>>> errors, if any, appear in the pvfs2-server logs?
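>>>>
>>>> In your config the switch would just mean changing the <StorageHints>
>>>> entry
>>>>
>>>>  TroveMethod directio
>>>>
>>>> to
>>>>
>>>>  TroveMethod alt-aio
>>>>
>>>> and restarting the pvfs2-server daemons. This is only to help narrow
>>>> things down, not a permanent recommendation.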
>>>>
>>>> Thanks,
>>>> Michael
>>>>
>>>>  On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano <
>>>> [email protected]> wrote:
>>>>
>>>>>  Hi,
>>>>> I'm using the following setup:
>>>>> 4 machines used as I/O servers
>>>>> 10 machines used as I/O clients
>>>>>
>>>>>  The configuration file is the following:
>>>>>
>>>>>  <Defaults>
>>>>>  UnexpectedRequests 50
>>>>>  EventLogging none
>>>>>  EnableTracing no
>>>>>  LogStamp datetime
>>>>>  BMIModules bmi_tcp
>>>>>  FlowModules flowproto_multiqueue
>>>>>  PerfUpdateInterval 1000
>>>>>  ServerJobBMITimeoutSecs 30
>>>>>  ServerJobFlowTimeoutSecs 30
>>>>>  ClientJobBMITimeoutSecs 300
>>>>>  ClientJobFlowTimeoutSecs 300
>>>>>  ClientRetryLimit 5
>>>>>  ClientRetryDelayMilliSecs 2000
>>>>>  PrecreateBatchSize 512
>>>>>  PrecreateLowThreshold 256
>>>>>  TCPBufferSend 524288
>>>>>  TCPBufferReceive 524288
>>>>>  StorageSpace /local/vincenzo/pvfs2-storage-space
>>>>>  LogFile /tmp/pvfs2-server.log
>>>>> </Defaults>
>>>>>
>>>>>  <Aliases>
>>>>>  Alias blade58 tcp://blade58:3334
>>>>>  Alias blade59 tcp://blade59:3334
>>>>>  Alias blade60 tcp://blade60:3334
>>>>>  Alias blade61 tcp://blade61:3334
>>>>> </Aliases>
>>>>>
>>>>>  <Filesystem>
>>>>>  Name pvfs2-fs
>>>>>  ID 1615492168
>>>>>  RootHandle 1048576
>>>>>  FileStuffing yes
>>>>>  <MetaHandleRanges>
>>>>>  Range blade58 3-1152921504606846977
>>>>>  Range blade59 1152921504606846978-2305843009213693952
>>>>>  Range blade60 2305843009213693953-3458764513820540927
>>>>>  Range blade61 3458764513820540928-4611686018427387902
>>>>>  </MetaHandleRanges>
>>>>>  <DataHandleRanges>
>>>>>  Range blade58 4611686018427387903-5764607523034234877
>>>>>  Range blade59 5764607523034234878-6917529027641081852
>>>>>  Range blade60 6917529027641081853-8070450532247928827
>>>>>  Range blade61 8070450532247928828-9223372036854775802
>>>>>  </DataHandleRanges>
>>>>>  <StorageHints>
>>>>>  TroveSyncMeta no
>>>>>  TroveSyncData no
>>>>>  TroveMethod directio
>>>>>  </StorageHints>
>>>>> </Filesystem>
>>>>>
>>>>>  I'm testing the system by writing 500K chunks continuously from 1
>>>>> client machine. After a few seconds, the client is no longer able to
>>>>> write. Checking the file system manually, I can see my file (running
>>>>> ls), but it seems to be corrupted: no information about the file is
>>>>> given and I cannot remove it. The only solution is to stop all clients
>>>>> and servers and re-create the file system.
>>>>>
>>>>>  Thanks in advance
>>>>>
>>>>>  Vincenzo
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
