Hey all,
I have successfully configured and installed PVFS2 on our cluster. I
managed to get the pvfs2 servers and clients running properly. The mount
point is set up fine, and I can create/delete files properly.
Operating System: openSUSE 11.0
OpenMPI (trunk) was configured with:
./configure CFLAGS=-I/opt/pvfs2-2.7.1/include/ \
    LDFLAGS=-L/opt/pvfs2-2.7.1/lib/ LIBS="-lpvfs2 -lpthread" \
    --prefix=/home/mschaara/OMPI-PVFS2 --with-openib=/usr \
    --with-slurm=/opt/SLURM \
    --with-io-romio-flags="--with-file-system=pvfs2+ufs+nfs"
pvfs-2.7.1 was configured with:
./configure --with-kernel=/usr/src/linux-2.6.25.11/ \
    --prefix=/opt/pvfs2-2.7.1 --enable-shared
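To double-check at runtime that ROMIO really selects its PVFS2 driver (and
not the ufs fallback through the kernel mount), I can open a file with the
"pvfs2:" file-system prefix that ROMIO accepts. A minimal, illustrative
sketch (the path /pvfs2/probe is made up):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    int ret;

    MPI_Init(&argc, &argv);

    /* "pvfs2:" asks ROMIO for its PVFS2 driver directly; the path itself
       is only an example */
    ret = MPI_File_open(MPI_COMM_SELF, "pvfs2:/pvfs2/probe",
                        MPI_MODE_WRONLY | MPI_MODE_CREATE,
                        MPI_INFO_NULL, &fh);
    if (ret == MPI_SUCCESS) {
        printf("PVFS2 ADIO driver opened the file\n");
        MPI_File_close(&fh);
    } else {
        printf("open through the pvfs2 driver failed, ret = %d\n", ret);
    }

    MPI_Finalize();
    return 0;
}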
However, when I run an MPI program that opens a PVFS2 file and calls
MPI_File_write_all, one of the PVFS2 servers crashes. I attached the test
file that I'm running (test_write_all.c). If I run the test with 1, 2, or 3
processes, it gives the correct output. However, with more than 3 processes
it gives the following error:
mpirun -np 5 ./test_write_all /pvfs2/test_5
[E 18:48:03.117239] msgpair failed, will retry: Broken pipe
[E 18:48:05.125048] msgpair failed, will retry: Connection refused
[E 18:48:07.132856] msgpair failed, will retry: Connection refused
[E 18:48:09.140665] msgpair failed, will retry: Connection refused
[E 18:48:11.148474] msgpair failed, will retry: Connection refused
[E 18:48:13.156282] msgpair failed, will retry: Connection refused
[E 18:48:13.156282] *** msgpairarray_completion_fn: msgpair to server
tcp://shark07:3334 failed: Connection refused
[E 18:48:13.156282] *** Out of retries.
When I log in to the node (shark07), the pvfs2-server process is no longer
running. If I start the server again on that node, PVFS2 is fine again
(verified with pvfs2-ping).
I saw this in the pvfs2-server.log:
[E 10/22 18:55] src/common/misc/state-machine-fns.c line 289: Error:
state machine returned SM_ACTION_TERMINATE but didn't reach terminate
[E 10/22 18:55] [bt]
/opt/pvfs2-2.7.1/sbin/pvfs2-server(PINT_state_machine_next+0x1d5)
[0x41f1b5]
[E 10/22 18:55] [bt]
/opt/pvfs2-2.7.1/sbin/pvfs2-server(PINT_state_machine_continue+0x1e)
[0x41ec0e]
[E 10/22 18:55] [bt]
/opt/pvfs2-2.7.1/sbin/pvfs2-server(main+0xe3e) [0x4122be]
[E 10/22 18:55] [bt] /lib64/libc.so.6(__libc_start_main+0xe6)
[0x7f4640020436]
[E 10/22 18:55] [bt] /opt/pvfs2-2.7.1/sbin/pvfs2-server
[0x40f939]
[D 10/22 18:55] server_state_machine_terminate 0x7881b0
and this in /var/log/messages:
shark07 kernel: pvfs2-server[14842]: segfault at 7f6ae09c7ec0 ip
7f6ae09c7ec0 sp 7fffea083628 error 15 in
libgcc_s.so.1[7f6ae09c7000+1000]
So, any idea what might be wrong with my configuration of PVFS2 or OMPI?
Or might this be a bug somewhere?
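For reference, this is the kind of error checking I can add around the
MPI-IO calls to get a readable message instead of a bare return code
(MPI_Error_string is standard MPI; the check_mpi helper and the trivial
single-int write below are only illustrative, not the attached test):

#include <mpi.h>
#include <stdio.h>

/* hypothetical helper: turn an MPI return code into text and abort */
static void check_mpi(int ret, const char *what)
{
    if (ret != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(ret, msg, &len);
        fprintf(stderr, "%s failed: %s\n", what, msg);
        MPI_Abort(MPI_COMM_WORLD, ret);
    }
}

int main(int argc, char **argv)
{
    MPI_File fh;
    int x = 42;

    MPI_Init(&argc, &argv);

    check_mpi(MPI_File_open(MPI_COMM_WORLD, argv[1],
                            MPI_MODE_WRONLY | MPI_MODE_CREATE,
                            MPI_INFO_NULL, &fh),
              "MPI_File_open");
    check_mpi(MPI_File_write_all(fh, &x, 1, MPI_INT, MPI_STATUS_IGNORE),
              "MPI_File_write_all");
    check_mpi(MPI_File_close(&fh), "MPI_File_close");

    MPI_Finalize();
    return 0;
}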
Thank you,
--
Mohamad Chaarawi
Research Assistant http://www.cs.uh.edu/~mschaara
Department of Computer Science University of Houston
4800 Calhoun, PGH Room 526 Houston, TX 77204, USA

/* test program that writes integers to a file;
   each process writes NUM_BLOCKS*NUM_ELEMENTS integers */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_BLOCKS 2 /* how many blocks each process will write */
#define NUM_ELEMENTS 4 /* number of elements each process will write in a block */

int main(int argc, char **argv)
{
    MPI_File fh;
    int size, rank, i, j, k;
    MPI_Datatype etype, ftype;
    int buf[NUM_BLOCKS*NUM_ELEMENTS];
    int disp[NUM_BLOCKS];
    int blocklength[NUM_BLOCKS];
    int ret;

    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &size);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    /* fill the buffer so that, once the blocks of all ranks are
       interleaved in the file, the file holds consecutive integers */
    k = 0;
    for (i = 0; i < NUM_BLOCKS; i++) {
        for (j = 0; j < NUM_ELEMENTS; j++) {
            buf[k++] = rank*NUM_ELEMENTS + j + i*size*NUM_ELEMENTS;
        }
    }

    /* etype: one contiguous block of NUM_ELEMENTS ints */
    MPI_Type_contiguous (NUM_ELEMENTS, MPI_INT, &etype);
    MPI_Type_commit (&etype);

    /* ftype: this rank's NUM_BLOCKS blocks, interleaved round-robin with
       the other ranks (displacements are in units of etype) */
    for (i = 0; i < NUM_BLOCKS; i++) {
        disp[i] = rank + i*size;
        blocklength[i] = 1;
    }
    MPI_Type_indexed (NUM_BLOCKS, blocklength, disp, etype, &ftype);
    MPI_Type_commit (&ftype);

    ret = MPI_File_open (MPI_COMM_WORLD, argv[1],
                         MPI_MODE_WRONLY | MPI_MODE_CREATE,
                         MPI_INFO_NULL, &fh);
    if (ret != MPI_SUCCESS) {
        printf ("Could not open file, ret = %d\n", ret);
        MPI_Abort (MPI_COMM_WORLD, ret);
    }

    MPI_File_set_view (fh, 0, etype, ftype, "native", MPI_INFO_NULL);
    MPI_File_write_all (fh, buf, NUM_ELEMENTS*NUM_BLOCKS, MPI_INT,
                        MPI_STATUS_IGNORE);
    MPI_File_close (&fh);

    MPI_Type_free (&etype);
    MPI_Type_free (&ftype);
    MPI_Finalize ();
    return 0;
}