Hey all,

I have successfully configured and installed PVFS2 on our cluster. The
pvfs2 servers and clients are running properly, the mount point is set up
fine, and I can create and delete files.
Operating system: openSUSE 11.0

Open MPI (trunk) was configured with:
    ./configure CFLAGS=-I/opt/pvfs2-2.7.1/include/ \
        LDFLAGS=-L/opt/pvfs2-2.7.1/lib/ LIBS="-lpvfs2 -lpthread" \
        --prefix=/home/mschaara/OMPI-PVFS2 --with-openib=/usr \
        --with-slurm=/opt/SLURM \
        --with-io-romio-flags=--with-file-system=pvfs2+ufs+nfs

pvfs2-2.7.1 was configured with:
    ./configure --with-kernel=/usr/src/linux-2.6.25.11/ \
        --prefix=/opt/pvfs2-2.7.1 --enable-shared
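
Since ROMIO was built with pvfs2+ufs+nfs support, I simply rely on it detecting the file system from the /pvfs2 mount point. As a minimal sketch of the alternative (assuming ROMIO's "pvfs2:" file-name prefix can be used to select the PVFS2 driver explicitly), the open call would look like this:

    /* Sketch: select ROMIO's PVFS2 driver explicitly via the "pvfs2:"
       file-name prefix instead of relying on mount-point detection.
       The path below is just my mount point plus a test file name. */
    #include <mpi.h>

    int main (int argc, char **argv)
    {
        MPI_File fh;

        MPI_Init (&argc, &argv);
        MPI_File_open (MPI_COMM_WORLD, "pvfs2:/pvfs2/test_5",
                       MPI_MODE_WRONLY | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
        MPI_File_close (&fh);
        MPI_Finalize ();
        return 0;
    }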

However, when I run an MPI program that opens a PVFS2 file and calls
MPI_File_write_all, one of the PVFS2 servers crashes. I have attached the
test program I am running (test_write_all.c). With 1, 2, or 3 processes it
produces the correct output, but with more than 3 processes it gives the
following error:

    mpirun -np 5 ./test_write_all /pvfs2/test_5

    [E 18:48:03.117239] msgpair failed, will retry: Broken pipe
    [E 18:48:05.125048] msgpair failed, will retry: Connection refused
    [E 18:48:07.132856] msgpair failed, will retry: Connection refused
    [E 18:48:09.140665] msgpair failed, will retry: Connection refused
    [E 18:48:11.148474] msgpair failed, will retry: Connection refused
    [E 18:48:13.156282] msgpair failed, will retry: Connection refused
    [E 18:48:13.156282] *** msgpairarray_completion_fn: msgpair to server tcp://shark07:3334 failed: Connection refused
    [E 18:48:13.156282] *** Out of retries.
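
In case return codes from the MPI side would help, here is a minimal sketch of the extra error checking I could add around the collective write (MPI_ERRORS_RETURN is the default error handler for files anyway, but it is set explicitly here so the failure is reported as text; file name, count, and buffer are just placeholders):

    /* Sketch: report the MPI/ROMIO error string when the collective
       write fails, instead of only seeing the msgpair retries. */
    #include <mpi.h>
    #include <stdio.h>

    int main (int argc, char **argv)
    {
        MPI_File fh;
        char errstr[MPI_MAX_ERROR_STRING];
        int rank, rc, errlen, buf[8] = {0};

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);

        MPI_File_open (MPI_COMM_WORLD, argv[1],
                       MPI_MODE_WRONLY | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
        MPI_File_set_errhandler (fh, MPI_ERRORS_RETURN);

        rc = MPI_File_write_all (fh, buf, 8, MPI_INT, MPI_STATUS_IGNORE);
        if (rc != MPI_SUCCESS) {
            MPI_Error_string (rc, errstr, &errlen);
            printf ("rank %d: MPI_File_write_all failed: %s\n", rank, errstr);
        }

        MPI_File_close (&fh);
        MPI_Finalize ();
        return 0;
    }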

When I log in to that node (shark07), the pvfs2-server process is no
longer running. If I start the server again on that node, PVFS2 is fine
again (verified with pvfs2-ping).
I see this in pvfs2-server.log:
    [E 10/22 18:55] src/common/misc/state-machine-fns.c line 289: Error: state machine returned SM_ACTION_TERMINATE but didn't reach terminate
    [E 10/22 18:55]         [bt] /opt/pvfs2-2.7.1/sbin/pvfs2-server(PINT_state_machine_next+0x1d5) [0x41f1b5]
    [E 10/22 18:55]         [bt] /opt/pvfs2-2.7.1/sbin/pvfs2-server(PINT_state_machine_continue+0x1e) [0x41ec0e]
    [E 10/22 18:55]         [bt] /opt/pvfs2-2.7.1/sbin/pvfs2-server(main+0xe3e) [0x4122be]
    [E 10/22 18:55]         [bt] /lib64/libc.so.6(__libc_start_main+0xe6) [0x7f4640020436]
    [E 10/22 18:55]         [bt] /opt/pvfs2-2.7.1/sbin/pvfs2-server [0x40f939]
    [D 10/22 18:55] server_state_machine_terminate 0x7881b0

and this in /var/log/messages:
    shark07 kernel: pvfs2-server[14842]: segfault at 7f6ae09c7ec0 ip 7f6ae09c7ec0 sp 7fffea083628 error 15 in libgcc_s.so.1[7f6ae09c7000+1000]

So, any idea what might be wrong with my PVFS2 or Open MPI configuration?
Or could this be a bug somewhere?

Thank you,


-- 
Mohamad Chaarawi
Research Assistant                http://www.cs.uh.edu/~mschaara
Department of Computer Science    University of Houston
4800 Calhoun, PGH Room 526        Houston, TX 77204, USA
/* Test program that writes integers to a file.
   Each process writes NUM_BLOCKS blocks of NUM_ELEMENTS integers
   through a strided file view. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_BLOCKS 2 /* how many blocks each process will write */
#define NUM_ELEMENTS 4 /* number of elements each process will write in a block */

int main(int argc, char** argv)
{

    MPI_File fh;
    int size, rank, i, j, k;
    MPI_Datatype etype, ftype;
    int buf[NUM_BLOCKS*NUM_ELEMENTS];
    int disp [NUM_BLOCKS];
    int blocklength[NUM_BLOCKS];
    int ret;
	
    MPI_Init (&argc, &argv);

    MPI_Comm_size (MPI_COMM_WORLD, &size);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    /* fill the buffer so each element holds its global offset (in ints) in the file */
    k=0;
    for (i=0; i<NUM_BLOCKS ; i++)
    {
        for (j=0 ; j<NUM_ELEMENTS ; j++)
        {
            buf[k++] = rank*NUM_ELEMENTS + j + i*size*NUM_ELEMENTS;
        }
    }

    /* etype: one contiguous block of NUM_ELEMENTS ints */
    MPI_Type_contiguous (NUM_ELEMENTS, MPI_INT, &etype);
    MPI_Type_commit (&etype);

    /* ftype: NUM_BLOCKS such blocks per process, interleaved round-robin
       across all 'size' processes in the file */
    for (i=0 ; i<NUM_BLOCKS ; i++)
    {
        disp[i] = rank + i*size;   /* displacement in units of etype */
        blocklength[i] = 1;
    }
    MPI_Type_indexed (NUM_BLOCKS, blocklength, disp, etype, &ftype);
    MPI_Type_commit (&ftype);

    if (argc < 2) {
        if (rank == 0) printf ("usage: %s <filename>\n", argv[0]);
        MPI_Abort (MPI_COMM_WORLD, 1);
    }

    ret = MPI_File_open (MPI_COMM_WORLD, argv[1], MPI_MODE_WRONLY | MPI_MODE_CREATE,
                         MPI_INFO_NULL, &fh);
    if ( ret != MPI_SUCCESS ) {
        printf ("Could not open file, ret = %d\n", ret);
        MPI_Abort ( MPI_COMM_WORLD, ret );
    }

    /* set the file view: etype units laid out according to ftype */
    MPI_File_set_view(fh, 0, etype, ftype, "native", MPI_INFO_NULL);

    /* each process collectively writes its NUM_BLOCKS*NUM_ELEMENTS ints */
    MPI_File_write_all(fh, buf, NUM_ELEMENTS*NUM_BLOCKS, MPI_INT, MPI_STATUS_IGNORE);
    
    MPI_File_close(&fh);

    MPI_Type_free (&etype);
    MPI_Type_free (&ftype);
    MPI_Finalize ();

    return 0;
}
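
For reference, a minimal sketch of a read-back check for the output of the test program (assuming rank 0 can read the whole file back with the default view and print the integers in file order); this is how the output can be verified for the process counts that do work:

    /* Sketch of a read-back check: rank 0 reads the whole file with the
       default view and prints the integers in file order. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main (int argc, char **argv)
    {
        MPI_File fh;
        MPI_Offset fsize;
        int rank, i, n, *rbuf;

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);

        MPI_File_open (MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY,
                       MPI_INFO_NULL, &fh);
        if (rank == 0) {
            MPI_File_get_size (fh, &fsize);
            n = (int) (fsize / sizeof(int));
            rbuf = (int *) malloc (n * sizeof(int));
            MPI_File_read_at (fh, 0, rbuf, n, MPI_INT, MPI_STATUS_IGNORE);
            for (i = 0; i < n; i++) {
                printf ("%d ", rbuf[i]);
            }
            printf ("\n");
            free (rbuf);
        }
        MPI_File_close (&fh);
        MPI_Finalize ();
        return 0;
    }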