I will try moving to OrangeFS to check whether the problem is still there.
Thank you

Vincenzo

On 17 June 2011 14:31, Michael Moore <[email protected]> wrote:

> If you could, try OrangeFS 2.8.4; there have been several bug fixes that
> may or may not address the problem you're seeing.
>
> Can you provide the full program (or a larger snippet) of what the client
> is doing? Does the client code segfault or hang when it fails?
>
> Thanks,
> Michael
>
>
> On Fri, Jun 17, 2011 at 8:20 AM, Vincenzo Gulisano <
> [email protected]> wrote:
>
>> It's an Ubuntu server, kernel 2.6.24-24-server, 64-bit.
>> pvfs2 is 2.8.2.
>>
>> I have one client that loops calling syscall(SYS_io_submit, ...
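>>
>> For reference, the loop is essentially of the following form (a minimal
>> sketch only, not the exact program: the mount-point path is a placeholder,
>> error handling is simplified, and the 500K chunk size is the one from my
>> original message further down):
>>
>> #include <fcntl.h>
>> #include <linux/aio_abi.h>
>> #include <string.h>
>> #include <sys/syscall.h>
>> #include <unistd.h>
>>
>> int main(void)
>> {
>>     aio_context_t ctx = 0;
>>     static char buf[500 * 1024];              /* one 500K chunk */
>>     long long off = 0;
>>     int fd;
>>
>>     /* hypothetical mount point; the real path differs */
>>     fd = open("/mnt/pvfs2/testfile", O_WRONLY | O_CREAT, 0644);
>>     if (fd < 0)
>>         return 1;
>>
>>     if (syscall(SYS_io_setup, 128, &ctx) < 0) /* create the kernel AIO context */
>>         return 1;
>>
>>     memset(buf, 'x', sizeof(buf));
>>
>>     for (;;) {
>>         struct iocb cb;
>>         struct iocb *cbs[1] = { &cb };
>>         struct io_event ev;
>>
>>         memset(&cb, 0, sizeof(cb));
>>         cb.aio_fildes     = fd;
>>         cb.aio_lio_opcode = IOCB_CMD_PWRITE;  /* asynchronous pwrite */
>>         cb.aio_buf        = (unsigned long)buf;
>>         cb.aio_nbytes     = sizeof(buf);
>>         cb.aio_offset     = off;
>>
>>         /* submit one write, then wait for its completion */
>>         if (syscall(SYS_io_submit, ctx, 1, cbs) != 1)
>>             break;
>>         if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1)
>>             break;
>>
>>         off += sizeof(buf);
>>     }
>>
>>     syscall(SYS_io_destroy, ctx);
>>     close(fd);
>>     return 0;
>> }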
>>
>>
>>
>> On 17 June 2011 14:02, Michael Moore <[email protected]> wrote:
>>
>>> What version of OrangeFS/PVFS, and what distro/kernel version, is used in
>>> the setup? To re-create it, is it just a stream of simple write() calls
>>> from a single client, or something more involved?
>>>
>>> Thanks,
>>> Michael
>>>
>>>
>>> On Fri, Jun 17, 2011 at 7:43 AM, Vincenzo Gulisano <
>>> [email protected]> wrote:
>>>
>>>> Thanks Michael
>>>>
>>>> I've tried setting alt-aio as TroveMethod and the problem is still
>>>> there.
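>>>>
>>>> Concretely, that means the <StorageHints> section of the server config
>>>> posted further down in this thread now looks like this (only the
>>>> TroveMethod line changed):
>>>>
>>>> <StorageHints>
>>>>     TroveSyncMeta no
>>>>     TroveSyncData no
>>>>     TroveMethod alt-aio
>>>> </StorageHints>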
>>>>
>>>> Some logs:
>>>>
>>>> Client (blade39) says:
>>>>
>>>>  [E 13:36:19.590763] server: tcp://blade60:3334
>>>> [E 13:36:19.591006] io_process_context_recv (op_status): No such file or
>>>> directory
>>>> [E 13:36:19.591018] server: tcp://blade61:3334
>>>> [E 13:36:19.768105] io_process_context_recv (op_status): No such file or
>>>> directory
>>>>
>>>> Servers:
>>>>
>>>> blade58:
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x7f5cac004370: Connection reset by peer
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 error cleanup
>>>> finished: Connection reset by peer
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x7f5cac0ee8f0: Connection reset by peer
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 error cleanup
>>>> finished: Connection reset by peer
>>>>
>>>> blade59:
>>>> [E 06/17 13:37] trove_write_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x799410: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x799410 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x799410 error cleanup
>>>> finished: Broken pipe
>>>>
>>>> blade60:
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x7fb0a012bed0: Connection reset by peer
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 error cleanup
>>>> finished: Connection reset by peer
>>>>
>>>> blade61:
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x76b5a0: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup
>>>> finished: Broken pipe
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x778e00: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup
>>>> finished: Broken pipe
>>>>
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x76b5a0: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup
>>>> finished: Broken pipe
>>>> [E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
>>>> [E 06/17 13:37] handle_io_error: flow proto error cleanup started on
>>>> 0x778e00: Broken pipe
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0
>>>> operations, will clean up.
>>>> [E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup
>>>> finished: Broken pipe
>>>>
>>>> Vincenzo
>>>>
>>>> On 17 June 2011 13:29, Michael Moore <[email protected]> wrote:
>>>>
>>>>> Hi Vincenzo,
>>>>>
>>>>> This sounds similar to an issue just reported by Benjamin Seevers here
>>>>> on the developers list:
>>>>>
>>>>> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html
>>>>>
>>>>> Based on his experience with the issue, the corruption no longer occurs if
>>>>> you switch from directio to alt-aio. Could you try switching from directio
>>>>> to alt-aio in your configuration to help isolate whether this is a similar
>>>>> or a different issue? If that doesn't resolve it, could you report what
>>>>> errors, if any, you see on the client when it fails and what errors, if
>>>>> any, appear in the pvfs2-server logs?
>>>>>
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>> On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm using the following setup:
>>>>>> 4 machines used as I/O servers
>>>>>> 10 machines used as I/O clients
>>>>>>
>>>>>> The configuration file is the following:
>>>>>>
>>>>>> <Defaults>
>>>>>>     UnexpectedRequests 50
>>>>>>     EventLogging none
>>>>>>     EnableTracing no
>>>>>>     LogStamp datetime
>>>>>>     BMIModules bmi_tcp
>>>>>>     FlowModules flowproto_multiqueue
>>>>>>     PerfUpdateInterval 1000
>>>>>>     ServerJobBMITimeoutSecs 30
>>>>>>     ServerJobFlowTimeoutSecs 30
>>>>>>     ClientJobBMITimeoutSecs 300
>>>>>>     ClientJobFlowTimeoutSecs 300
>>>>>>     ClientRetryLimit 5
>>>>>>     ClientRetryDelayMilliSecs 2000
>>>>>>     PrecreateBatchSize 512
>>>>>>     PrecreateLowThreshold 256
>>>>>>     TCPBufferSend 524288
>>>>>>     TCPBufferReceive 524288
>>>>>>     StorageSpace /local/vincenzo/pvfs2-storage-space
>>>>>>     LogFile /tmp/pvfs2-server.log
>>>>>> </Defaults>
>>>>>>
>>>>>> <Aliases>
>>>>>>     Alias blade58 tcp://blade58:3334
>>>>>>     Alias blade59 tcp://blade59:3334
>>>>>>     Alias blade60 tcp://blade60:3334
>>>>>>     Alias blade61 tcp://blade61:3334
>>>>>> </Aliases>
>>>>>>
>>>>>> <Filesystem>
>>>>>>     Name pvfs2-fs
>>>>>>     ID 1615492168
>>>>>>     RootHandle 1048576
>>>>>>     FileStuffing yes
>>>>>>     <MetaHandleRanges>
>>>>>>         Range blade58 3-1152921504606846977
>>>>>>         Range blade59 1152921504606846978-2305843009213693952
>>>>>>         Range blade60 2305843009213693953-3458764513820540927
>>>>>>         Range blade61 3458764513820540928-4611686018427387902
>>>>>>     </MetaHandleRanges>
>>>>>>     <DataHandleRanges>
>>>>>>         Range blade58 4611686018427387903-5764607523034234877
>>>>>>         Range blade59 5764607523034234878-6917529027641081852
>>>>>>         Range blade60 6917529027641081853-8070450532247928827
>>>>>>         Range blade61 8070450532247928828-9223372036854775802
>>>>>>     </DataHandleRanges>
>>>>>>     <StorageHints>
>>>>>>         TroveSyncMeta no
>>>>>>         TroveSyncData no
>>>>>>         TroveMethod directio
>>>>>>     </StorageHints>
>>>>>> </Filesystem>
>>>>>>
>>>>>> I'm testing the system by continuously writing 500K chunks from one client
>>>>>> machine. After a few seconds, the client is no longer able to write.
>>>>>> Checking the file system manually, I can see my file (running ls), but it
>>>>>> seems to be corrupted (no information about the file is given and I cannot
>>>>>> remove it). The only solution is to stop all clients and servers and
>>>>>> re-create the file system.
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>> Vincenzo
>>>>>>
>>>>>
>>>>
>>>
>>
>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
