Souvik Bhattacherjee wrote:
Hi all,
I'm trying to interleave computation with communication, so I have
resorted to using MPI together with POSIX threads. Specifically, I am
trying to communicate a partial vector v3 while computing an inner
product v1*v2 (mod q). To give you an idea of the platform and the
libraries:
1. Intel dual-socket quad-core machines (8 cores per machine)
2. openmpi 1.3.3 (separate installations on ict6 and ict4 machines)
3. lib64gmp3 4.3.1
4. gcc 4.3.2
5. Interconnect: Gigabit Ethernet
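To make the structure clearer, here is a stripped-down sketch of the
pattern (this is not the attached code: the vector length, modulus, and
all names are placeholders, plain long arithmetic stands in for the GMP
arithmetic, and the actual communication is more involved):

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define N        (1 << 20)  /* vector length (placeholder)                   */
#define NWORKERS 7          /* compute threads; one more thread communicates */
#define Q        1000003L   /* modulus (placeholder)                         */

static long v1[N], v2[N], v3[N], v3_in[N];
static long partial[NWORKERS];

/* Communication thread: ship our partial vector v3 to the peer rank and
 * receive its counterpart while the worker threads are busy computing. */
static void *comm_thread(void *arg)
{
    int peer = *(int *)arg;
    MPI_Request req[2];

    MPI_Isend(v3,    N, MPI_LONG, peer, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(v3_in, N, MPI_LONG, peer, 0, MPI_COMM_WORLD, &req[1]);
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    return NULL;
}

/* Worker thread: compute one slice of the inner product v1*v2 (mod q).
 * Plain 64-bit long arithmetic here stands in for the GMP arithmetic. */
static void *compute_thread(void *arg)
{
    int id = *(int *)arg;
    long i, sum = 0;
    long lo = (long)id * N / NWORKERS;
    long hi = (long)(id + 1) * N / NWORKERS;

    for (i = lo; i < hi; i++)
        sum = (sum + (v1[i] % Q) * (v2[i] % Q)) % Q;
    partial[id] = sum;
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t comm, workers[NWORKERS];
    int ids[NWORKERS], rank, peer, provided, i;
    long result = 0;

    /* Threads make MPI calls, so request full thread support and check
     * what the library actually grants. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "warning: thread level granted = %d\n", provided);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;                      /* meant to be run with -np 2 */

    pthread_create(&comm, NULL, comm_thread, &peer);
    for (i = 0; i < NWORKERS; i++) {
        ids[i] = i;
        pthread_create(&workers[i], NULL, compute_thread, &ids[i]);
    }
    for (i = 0; i < NWORKERS; i++) {
        pthread_join(workers[i], NULL);
        result = (result + partial[i]) % Q;
    }
    pthread_join(comm, NULL);

    printf("rank %d: inner product mod q = %ld\n", rank, result);
    MPI_Finalize();
    return 0;
}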
As in the sketch, I have used a single thread for most of the
communication and the remaining seven threads for computation. Perhaps
this portion of the code has gone wrong somewhere, because the program
terminates with the following error message:
$ mpicc test-vecvecmul.c -lgmp -pthread -Wall -o tvmul
$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict6,ict4 ./tvmul
[err] event_queue_remove: 0xc1d6b0(fd 10) not on queue 8
[err] event_queue_remove: 0xc1d6b0(fd 10) not on queue 8
[ict6][[21545,1],0][../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 17154 on
node ict4 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
The code is attached. Please suggest where in the code I have gone
wrong. I would also be interested in a more efficient way of
interleaving, if one exists.
**** Can anyone suggest a good tutorial that covers programming with
MPI and POSIX threads/OpenMP?
Regards,
--
Souvik
I got a similar error when using non-blocking communication on large
datasets. I eventually had to switch to blocking communication... Try to
make the code work with blocking communication first and see if that
removes your error, then re-implement the non-blocking version from
there. Doesn't MPI have decent performance when the processes are
located on the same node? Could you perhaps use MPI only, without the
extra threads?
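As a concrete starting point, a blocking variant of the comm_thread in
the sketch above would look something like this (just a sketch; v3,
v3_in and N are the placeholders from that sketch, not your actual
code):

/* Blocking variant of comm_thread from the sketch above: a single
 * MPI_Sendrecv instead of MPI_Isend/MPI_Irecv + MPI_Waitall.  If this
 * runs cleanly across ict6 and ict4, reintroduce the non-blocking
 * calls afterwards. */
static void *comm_thread(void *arg)
{
    int peer = *(int *)arg;

    MPI_Sendrecv(v3,    N, MPI_LONG, peer, 0,
                 v3_in, N, MPI_LONG, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    return NULL;
}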
- Atle