Hi guys,
I found another unexpected behavior :(
This time I get into trouble when I create an unbalanced distribution over the 
datafiles with MPI_Type_struct. I tried with 5 dataservers and with 2 
dataservers; the example I give here is for 2 dataservers.
The datatype I use for the view places 64 KByte on one server and 128 KByte on 
the other:
    blocklens[0] = 1;
    blocklens[1] = 128*1024;
    blocklens[2] = 64*1024;
    blocklens[3] = 1;
    indices[0] = 0;
    indices[1] = 0;
    indices[2] = (128+64)*1024;
    indices[3] = (128+128)*1024;
    old_types[0] = MPI_LB;
    old_types[1] = MPI_BYTE;
    old_types[2] = MPI_BYTE;
    old_types[3] = MPI_UB;

I attached a program which demonstrates the problem for 2 dataservers; it 
writes 100 MByte per iteration. 
Once I have written more than 1500 MByte with MPI_File_write, I always get this 
on the server machines:
[E 12:18:30.682262] handle_io_error: flow proto error cleanup started on 
0x81669f0, error_code: -1073742095
[E 12:18:30.682312] handle_io_error: flow proto 0x81669f0 canceled 0 
operations, will clean up.
[E 12:18:30.682326] handle_io_error: flow proto 0x81669f0 error cleanup 
finished, error_code: -1073742095
[E 12:18:30.709711] handle_io_error: flow proto error cleanup started on 
0x81508c8, error_code: -1073742095
[E 12:18:30.710381] handle_io_error: flow proto 0x81508c8 canceled 1 
operations, will clean up.
[E 12:18:30.710544] handle_io_error: flow proto 0x81508c8 error cleanup 
finished, error_code: -1073742095

This is reproducible; I tried maybe 10 times with different programs using 
this pattern. With this program the flow error occurs in iteration 15, when 
the file is about 2 GByte big.
On the disks of the 2 dataservers, ls reports 1 GByte per datafile; with du I 
can see the holes: about 1.1 GByte is actually used on one machine and 
514 MByte on the other, which seems to match the 2:1 distribution correctly...

With 5 dataservers I tried the imbalanced distribution 10,10,10,10,9 
(which means the last server gets 10% less data per iteration) and got the 
same problem once the file grew beyond 2 GByte... The problem does not occur 
if the amount of data is distributed evenly in each iteration...

Thanks for helping me out :)
julian
/*
 * Sample
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>
#include <assert.h>

int main (int argc, char** argv)
{
    int iter;
    int ret;
    MPI_File fh;
    MPI_Init(&argc, &argv);

    MPI_Aint displacement = 0;
    MPI_Aint indices[4];
    MPI_Datatype old_types[4];
    int blocklens[4];

    MPI_Datatype dt;

    int total_bytes = 100*1024*1024;

    char * f_buff = malloc( total_bytes );
    assert(f_buff != NULL);

    /* creation of datatype */
    blocklens[0] = 1;
    blocklens[1] = 128*1024;
    blocklens[2] = 64*1024;
    blocklens[3] = 1;
    indices[0] = 0;
    indices[1] = 0;
    indices[2] = (128+64)*1024;
    indices[3] = (128+128)*1024;
    old_types[0] = MPI_LB;
    old_types[1] = MPI_BYTE;
    old_types[2] = MPI_BYTE;
    old_types[3] = MPI_UB;

    ret = MPI_Type_struct( 4, blocklens, indices, old_types, & dt );
    assert(ret == MPI_SUCCESS);

    ret = MPI_Type_commit(& dt);
    assert(ret == MPI_SUCCESS);

    ret = MPI_File_open( MPI_COMM_WORLD,
                  "pvfs2://pvfs2/test", MPI_MODE_RDWR | MPI_MODE_CREATE,
                  MPI_INFO_NULL, & fh );
    assert(ret == MPI_SUCCESS);

    ret = MPI_File_set_view(fh, 0,
                            MPI_BYTE,  /* etype */
                            dt,        /* file type */
                            "native", MPI_INFO_NULL);
    assert(ret == MPI_SUCCESS);
    memset(f_buff, 17, total_bytes );
    for(iter = 0 ; iter < 50; iter ++){
        printf("%d writing %fKByte \n", iter, total_bytes/1024.0f);
        ret = MPI_File_write(
            fh,
            f_buff,
            total_bytes,
            MPI_BYTE,
            MPI_STATUS_IGNORE );
        assert(ret == MPI_SUCCESS);
    }

    MPI_File_close(& fh);
    free(f_buff);

    MPI_Finalize();

    return 0;
}


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
