Hi Julian,
Those flow error messages are coming from either BMI or Trove. Based
on the error, my guess would be that the request processing on the
server tells flow to expect more data (BMI messages), but the
request processing doesn't match up and the client has already sent
everything it has to the server. That's just a guess, though.
The error code is EINVAL, so maybe the request processing actually
fails in the flow code. Could you set the server debug level to
'all' and send us the output?
Thanks,
-sam
On Mar 12, 2007, at 6:37 AM, Julian Martin Kunkel wrote:
Hi guys,
I found another unexpected behavior :(
This time I get in trouble when I create an unbalanced distribution
over the datafiles with MPI_Type_struct. I tried with 5 dataservers
and with 2 dataservers; the example I give here is for 2 dataservers.
The datatype I use for the view places 64 KByte on one server and
128 KByte on another server.
/* extent 256 KByte: 128 KByte at offset 0, 64 KByte at offset 192 KByte */
blocklens[0] = 1;
blocklens[1] = 128 * 1024;
blocklens[2] = 64 * 1024;
blocklens[3] = 1;

indices[0] = 0;
indices[1] = 0;
indices[2] = (128 + 64) * 1024;
indices[3] = (128 + 128) * 1024;

old_types[0] = MPI_LB;   /* lower bound marker */
old_types[1] = MPI_BYTE;
old_types[2] = MPI_BYTE;
old_types[3] = MPI_UB;   /* upper bound marker */
I attached a program which demonstrates the problem for 2
dataservers; it writes 100 MByte per iteration.
Once I write more than 1500 MByte with MPI_File_write, I always get
the following on the server machines:
[E 12:18:30.682262] handle_io_error: flow proto error cleanup started on 0x81669f0, error_code: -1073742095
[E 12:18:30.682312] handle_io_error: flow proto 0x81669f0 canceled 0 operations, will clean up.
[E 12:18:30.682326] handle_io_error: flow proto 0x81669f0 error cleanup finished, error_code: -1073742095
[E 12:18:30.709711] handle_io_error: flow proto error cleanup started on 0x81508c8, error_code: -1073742095
[E 12:18:30.710381] handle_io_error: flow proto 0x81508c8 canceled 1 operations, will clean up.
[E 12:18:30.710544] handle_io_error: flow proto 0x81508c8 error cleanup finished, error_code: -1073742095
This is reproducible; I tried maybe 10 times with different programs
using this pattern. With this program the flow error occurs on
iteration 15, when the file is about 2 GByte big.
On disk of the 2 dataservers, ls shows 1 GByte per datafile; with du
I can see the holes: about 1.1 GByte is used on one machine and
514 MByte on the other server, which seems to match the 2:1
distribution correctly...
With 5 dataservers I tried the imbalanced distribution 10,10,10,10,9
(which means the last server gets 10% less data per iteration) and
got the same problem once the file is bigger than 2 GByte... This
does not occur if the amount of data is distributed evenly in each
iteration...
Thanks for helping me out :)
julian
<unexpected-pvfs2-flow-error.c>
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers