Thanks, Michael.

I've tried setting TroveMethod to alt-aio, and the problem is still there.
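
For reference, the StorageHints section now reads as follows; everything else
in the configuration is unchanged from the file quoted at the bottom of this
mail:

<StorageHints>
    TroveSyncMeta no
    TroveSyncData no
    TroveMethod alt-aio
</StorageHints>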

Some logs:

Client (blade39) says:

[E 13:36:19.590763] server: tcp://blade60:3334
[E 13:36:19.591006] io_process_context_recv (op_status): No such file or directory
[E 13:36:19.591018] server: tcp://blade61:3334
[E 13:36:19.768105] io_process_context_recv (op_status): No such file or directory

Servers:

blade58:
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7f5cac004370: Connection reset by peer
[E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x7f5cac004370 error cleanup finished: Connection reset by peer
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7f5cac0ee8f0: Connection reset by peer
[E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x7f5cac0ee8f0 error cleanup finished: Connection reset by peer

blade59:
[E 06/17 13:37] trove_write_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x799410: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x799410 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x799410 error cleanup finished: Broken pipe

blade60:
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x7fb0a012bed0: Connection reset by peer
[E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x7fb0a012bed0 error cleanup finished: Connection reset by peer

blade61:
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x76b5a0: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup finished: Broken pipe
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x778e00: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup finished: Broken pipe

[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x76b5a0: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x76b5a0 error cleanup finished: Broken pipe
[E 06/17 13:37] bmi_recv_callback_fn: I/O error occurred
[E 06/17 13:37] handle_io_error: flow proto error cleanup started on 0x778e00: Broken pipe
[E 06/17 13:37] handle_io_error: flow proto 0x778e00 canceled 0 operations, will clean up.
[E 06/17 13:37] handle_io_error: flow proto 0x778e00 error cleanup finished: Broken pipe

Vincenzo

On 17 June 2011 13:29, Michael Moore <[email protected]> wrote:

> Hi Vincenzo,
>
> This sounds similar to an issue just reported by Benjamin Seevers here on
> the developers list:
>
> http://www.beowulf-underground.org/pipermail/pvfs2-developers/2011-June/004732.html
>
> Based on his experience, the corruption no longer occurs if you switch from
> directio to alt-aio. Could you try switching from directio to alt-aio in
> your configuration to help determine whether this is a similar or a
> different issue? If that doesn't resolve it, could you report what errors,
> if any, you see on the client when it fails and what errors, if any, appear
> in the pvfs2-server logs?
>
> Thanks,
> Michael
>
> On Fri, Jun 17, 2011 at 6:48 AM, Vincenzo Gulisano <[email protected]> wrote:
>
>> Hi,
>> I'm using the following setup:
>> 4 machines used as I/O servers
>> 10 machines used as I/O clients
>>
>> The configuration file is the following:
>>
>> <Defaults>
>>     UnexpectedRequests 50
>>     EventLogging none
>>     EnableTracing no
>>     LogStamp datetime
>>     BMIModules bmi_tcp
>>     FlowModules flowproto_multiqueue
>>     PerfUpdateInterval 1000
>>     ServerJobBMITimeoutSecs 30
>>     ServerJobFlowTimeoutSecs 30
>>     ClientJobBMITimeoutSecs 300
>>     ClientJobFlowTimeoutSecs 300
>>     ClientRetryLimit 5
>>     ClientRetryDelayMilliSecs 2000
>>     PrecreateBatchSize 512
>>     PrecreateLowThreshold 256
>>     TCPBufferSend 524288
>>     TCPBufferReceive 524288
>>     StorageSpace /local/vincenzo/pvfs2-storage-space
>>     LogFile /tmp/pvfs2-server.log
>> </Defaults>
>>
>> <Aliases>
>>     Alias blade58 tcp://blade58:3334
>>     Alias blade59 tcp://blade59:3334
>>     Alias blade60 tcp://blade60:3334
>>     Alias blade61 tcp://blade61:3334
>> </Aliases>
>>
>> <Filesystem>
>>     Name pvfs2-fs
>>     ID 1615492168
>>     RootHandle 1048576
>>     FileStuffing yes
>>     <MetaHandleRanges>
>>         Range blade58 3-1152921504606846977
>>         Range blade59 1152921504606846978-2305843009213693952
>>         Range blade60 2305843009213693953-3458764513820540927
>>         Range blade61 3458764513820540928-4611686018427387902
>>     </MetaHandleRanges>
>>     <DataHandleRanges>
>>         Range blade58 4611686018427387903-5764607523034234877
>>         Range blade59 5764607523034234878-6917529027641081852
>>         Range blade60 6917529027641081853-8070450532247928827
>>         Range blade61 8070450532247928828-9223372036854775802
>>     </DataHandleRanges>
>>     <StorageHints>
>>         TroveSyncMeta no
>>         TroveSyncData no
>>         TroveMethod directio
>>     </StorageHints>
>> </Filesystem>
>>
>> I'm testing the system by continuously writing 500K chunks from one client
>> machine. After a few seconds the client is no longer able to write.
>> Checking the file system manually, I can still see my file with ls, but it
>> appears to be corrupted: no information about the file is reported and I
>> cannot remove it. The only solution is to stop all clients and servers and
>> re-create the file system.
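>>
>> In rough terms, the workload is equivalent to the C sketch below (the mount
>> point /mnt/pvfs2 and the file name are placeholders, not the actual paths
>> used in the test, and I'm assuming 500K means 500 * 1024 bytes):
>>
>> /* continuous_write.c: append 500K chunks to one file until write() fails */
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <sys/types.h>
>> #include <unistd.h>
>>
>> #define CHUNK_SIZE (500 * 1024)
>>
>> int main(void)
>> {
>>     char *buf = malloc(CHUNK_SIZE);
>>     int fd;
>>
>>     if (buf == NULL) {
>>         perror("malloc");
>>         return 1;
>>     }
>>     memset(buf, 'x', CHUNK_SIZE);
>>
>>     /* placeholder path: the real test writes to a file on the PVFS2 mount */
>>     fd = open("/mnt/pvfs2/testfile", O_WRONLY | O_CREAT | O_APPEND, 0644);
>>     if (fd < 0) {
>>         perror("open");
>>         return 1;
>>     }
>>
>>     for (;;) {
>>         ssize_t n = write(fd, buf, CHUNK_SIZE);
>>         if (n != (ssize_t)CHUNK_SIZE) {
>>             perror("write");  /* this is where the client stops being able to write */
>>             break;
>>         }
>>     }
>>
>>     close(fd);
>>     free(buf);
>>     return 0;
>> }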
>>
>> Thanks in advance
>>
>> Vincenzo
>>
>> _______________________________________________
>> Pvfs2-users mailing list
>> [email protected]
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>
>>
>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
