Re: [OMPI users] Possible buffer overflow on Recv rank

2019-03-28 Thread George Bosilca
If I add a loop to make sure I account for all the receives on the master, and
correctly set the tags, a basic application based on your scheme seems to
work as intended. Can you post a reproducer for your issue instead?
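
Roughly what I did on the master side (a minimal sketch; nworkers and the
buffer size are placeholders, not your actual values):

// master: loop until every expected message is accounted for,
// then dispatch on the source/tag reported in the status
double buf[4096];   // assumed large enough for any single message here
for (int recvd = 0; recvd < nworkers; recvd++) {
    MPI_Status st;
    MPI_Recv(buf, 4096, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &st);
    printf("received from rank %d with tag %d\n", st.MPI_SOURCE, st.MPI_TAG);
}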

Thanks,
  George.


Re: [OMPI users] Possible buffer overflow on Recv rank

2019-03-28 Thread carlos aguni
Hi Gilles.

Thank you for your reply.
Here's some code:

// MASTER NODE
printf("[%s][RECV] src=%d tag=%d\n", processor_name, src, hashtag);
fflush(stdout);
MPI_Request req;
MPI_Status status;
rs = MPI_Irecv(buf, count, MPI_DOUBLE, src, hashtag, comm, &req);
MPI_Wait(&req, &status);
printf("[%s][RECV] src=%d tag=%d OK\n", processor_name, src, hashtag);
fflush(stdout);
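
(Note: since the MPI_Irecv here is immediately followed by MPI_Wait, it behaves
exactly like a plain blocking receive:)

// equivalent blocking form of the Irecv + Wait pair above
MPI_Recv(buf, count, MPI_DOUBLE, src, hashtag, comm, &status);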

// WORKER NODES
printf("[exec_cmd] Send double buff to %d, %d\n", dest, msg_tag);
fflush(stdout);
int bufsize = msg_size * sizeof(double) + MPI_BSEND_OVERHEAD;
double * buff = malloc(bufsize);
MPI_Buffer_attach(buff, bufsize);
MPI_Bsend(rec_msg, msg_size, MPI_DOUBLE, dest, msg_tag, comm);
MPI_Buffer_detach(&buff, &bufsize);
printf("[exec_cmd] Send double buff to %d, %d OK\n", dest, msg_tag);
fflush(stdout);
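
(I size the attach buffer by hand above; if I read the standard correctly,
MPI_Pack_size is the portable way to compute it:)

int packed_size;
MPI_Pack_size(msg_size, MPI_DOUBLE, comm, &packed_size); // exact packed size of one message
int attach_size = packed_size + MPI_BSEND_OVERHEAD;      // plus the per-message overhead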

//Attempt with Isend
//MPI_Request req;
//MPI_Status status;
//MPI_Isend(rec_msg, msg_size, MPI_DOUBLE, dest, msg_tag, comm, &req);
//MPI_Wait(&req, &status);

Output log:
Sending 91 rows to task 9 offset=728
Sending 91 rows to task 10 offset=819
Sending 90 rows to task 11 offset=910
Received results from task 1
[exec_cmd] Send to 0, 508
[exec_cmd] Send to 0, 508 OK
[exec_cmd] Send to 0, 508
[exec_cmd] Send to 0, 508 OK
[exec_cmd] Send double buff to 0, 508
[exec_cmd] Send to 0, 510pir
[exec_cmd] Send to 0, 510 OK
[exec_cmd] Send to 0, 510
[exec_cmd] Send to 0, 510 OK
[exec_cmd] Send double buff to 0, 510
Received results from task 2
Received results from task 3
[controller][RECV] src=4 tag=506
output hangs here

Is there any way to instrument this to assess whether the problem is actually
on the receive side or on the send side?

Regards,
Carlos.

Re: [OMPI users] Possible buffer overflow on Recv rank

2019-03-27 Thread Gilles Gouaillardet
Carlos,

Can you post a trimmed version of your code that demonstrates the issue?

Keep in mind that if you want to write MPI code that is correct with respect to 
the standard, you should assume MPI_Send() might block until a matching receive 
is posted.
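
A simple way to test that assumption is to replace MPI_Send() with MPI_Ssend(),
which always blocks until the matching receive is posted, e.g. (a sketch with
illustrative names):

/* if this deadlocks where MPI_Send() appeared to work,
   the code was silently relying on MPI internal buffering */
MPI_Ssend(sendbuf, count, MPI_DOUBLE, dest, tag, comm);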

Cheers,

Gilles 

Sent from my iPod

Re: [OMPI users] Possible buffer overflow on Recv rank

2019-03-27 Thread carlos aguni
Not "MPI_Send from 0"..
MPI_Send from 1 to 0
MPI_Send from 7 to 0
And so on..

[OMPI users] Possible buffer overflow on Recv rank

2019-03-27 Thread carlos aguni
Hi all.

I have an MPI application in which, at one point, one rank receives a slice of
an array from the other nodes.
The thing is that my application hangs there.

One thing I could gather from printing out the logs is:
(Rank 0) starts MPI_Recv from source 4
But then it receives:
MPI_Send from 0
MPI_Send from 1
... From 10
... From 7
... From 6

Then at some point neither of them is responding.
The message is an array of doubles of size 100,000.
Only later would it receive the message from 4.

So I assume the buffer on the Recv side is overflowing.

A few tests:
- Using a smaller array size works.
- I already tried using Isend, Irecv, and Bsend, and the ranks still get stuck.

That leaves me with a few questions beyond how to solve this issue:
- How can I know the size of MPI's internal buffer?
- How would one debug this?

Regards,
Carlos.