I do not completely understand whether that involves changing the MPI library code itself; I have no prior experience with that.
But if I get the idea correctly, something like this could potentially work (assume that comm is the communicator of the group that communicates at each iteration):

    clock_t total_time = clock();
    clock_t sync_time  = 0;

    for (int i = 0; i < num_transmissions; i++) {  // num_transmissions: placeholder loop bound
        sync_time -= clock();
        comm.Barrier();                            // wait until the whole group arrives
        sync_time += clock();                      // accumulates time spent in the barrier
        comm.Bcast(/* ...actual broadcast arguments... */);
    }
    total_time = clock() - total_time;

    // Total time
    double t_time = double(total_time) / CLOCKS_PER_SEC;
    // Synchronization time
    double s_time = double(sync_time) / CLOCKS_PER_SEC;
    // Actual data transmission time
    double d_time = t_time - s_time;

I know that I have added an otherwise useless barrier call, but do you think this can work the way I expect and at least give some idea of the synchronization time? (A wall-clock variant of this loop appears after the quoted thread below.)

Brian, I am also working on switching to m4.large instances and will check whether this helps.

Regards,
Kostas

On Mon, Oct 23, 2017 at 10:20 AM, Barrett, Brian <bbarr...@amazon.com> wrote:
> Gilles suggested your best next course of action: time the MPI_Bcast and
> MPI_Barrier calls and see if there is a non-linear scaling effect as you
> increase the group size.
>
> You mention that you are using m3.large instances; while this is not the
> list for in-depth discussion of EC2 instances (the AWS Forums are better
> for that), I will note that unless you are tied to m3 for organizational
> or reserved-instance reasons, you will probably be happier on another
> instance type. m3 was one of the last instance families released that
> does not support Enhanced Networking. There is significantly more jitter
> and latency in the m3 network stack compared to platforms that support
> Enhanced Networking (including the m4 platform). If networking costs are
> causing your scaling problems, the first step will be migrating instance
> types.
>
> Brian
>
> > On Oct 23, 2017, at 4:19 AM, Gilles Gouaillardet
> > <gilles.gouaillar...@gmail.com> wrote:
> >
> > Konstantinos,
> >
> > A simple way is to rewrite MPI_Bcast() and insert a timer and a
> > PMPI_Barrier() call before invoking the real PMPI_Bcast() (a minimal
> > sketch of this interposition appears after the thread). Time spent in
> > PMPI_Barrier() can be seen as time NOT spent on actual data
> > transmission, and since all tasks are synchronized upon exit, time
> > spent in PMPI_Bcast() can be seen as time spent on actual data
> > transmission. This is not perfect, but it is a pretty good
> > approximation. You can add extra timers so you end up with an idea of
> > how much time is spent in PMPI_Barrier() vs PMPI_Bcast().
> >
> > Cheers,
> >
> > Gilles
> >
> > On Mon, Oct 23, 2017 at 4:16 PM, Konstantinos Konstantinidis
> > <kostas1...@gmail.com> wrote:
> >> In any case, do you think that the time NOT spent on actual data
> >> transmission can impact the total time of the broadcast, especially
> >> when there are so many groups that communicate (please refer to the
> >> numbers I gave before if you want to get an idea)?
> >>
> >> Also, is there any way to quantify this impact, i.e. to measure the
> >> time not spent on actual data transmission?
> >>
> >> Kostas
> >>
> >> On Fri, Oct 20, 2017 at 10:32 PM, Jeff Hammond <jeff.scie...@gmail.com>
> >> wrote:
> >>>
> >>> Broadcast is collective but not necessarily synchronous in the sense
> >>> you imply. If you broadcast a message whose size is under the eager
> >>> limit, the root may return before any non-root processes enter the
> >>> function. Data transfer may happen prior to processes entering the
> >>> function. Only rendezvous forces synchronization between any two
> >>> processes, but there may still be asynchrony between different levels
> >>> of the broadcast tree. (A small program illustrating this appears
> >>> after the thread.)
> >>>
> >>> Jeff
> >>>
> >>> On Fri, Oct 20, 2017 at 3:27 PM Konstantinos Konstantinidis
> >>> <kostas1...@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I am running some tests on Amazon EC2, and they require a lot of
> >>>> communication among m3.large instances.
> >>>>
> >>>> I would like to give you an idea of what kind of communication takes
> >>>> place. There are 40 m3.large instances. Now, 28672 groups of 5
> >>>> instances are formed in a specific manner (let's skip the details).
> >>>> Within each group, each instance broadcasts some unsigned char data
> >>>> to the other 4 instances in the group. So within each group, exactly
> >>>> 5 broadcasts take place.
> >>>>
> >>>> The problem is that if I increase the size of the group from 5 to 10,
> >>>> there is a significant drop in the transmission rate, which, based on
> >>>> some theoretical results, is not reasonable.
> >>>>
> >>>> I want to check whether one of the reasons this happens is the time
> >>>> needed for the instances to synchronize when they call MPI_Bcast(),
> >>>> since it is a collective function. As far as I know, all of the
> >>>> machines in the broadcast need to call it and then synchronize until
> >>>> the actual data transfer starts. Is there any way to measure this
> >>>> synchronization time?
> >>>>
> >>>> The code is in C++, and the MPI installation is described in the
> >>>> attached file.
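P.S. One caveat with my sketch above: clock() measures per-process CPU time, so any time the MPI library spends blocked (rather than spinning) inside the barrier may be undercounted. MPI_Wtime() returns wall-clock seconds and is the usual choice for timing MPI calls. Here is a minimal sketch of the same loop using MPI_Wtime() and the C API, where buf, count, root, and num_transmissions are hypothetical placeholders for the real broadcast arguments and loop bound:

    #include <mpi.h>
    #include <stdio.h>

    // buf, count, root, num_transmissions: placeholders for the real arguments.
    void timed_bcasts(MPI_Comm comm, unsigned char *buf, int count, int root,
                      int num_transmissions)
    {
        double total_time = MPI_Wtime();
        double sync_time  = 0.0;

        for (int i = 0; i < num_transmissions; i++) {
            double t = MPI_Wtime();
            MPI_Barrier(comm);                    // wait for the whole group
            sync_time += MPI_Wtime() - t;         // accumulate waiting time
            MPI_Bcast(buf, count, MPI_UNSIGNED_CHAR, root, comm);
        }

        double t_time = MPI_Wtime() - total_time; // total time
        double s_time = sync_time;                // synchronization time
        double d_time = t_time - s_time;          // approx. transmission time
        printf("total %.6f s, sync %.6f s, data %.6f s\n", t_time, s_time, d_time);
    }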
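And a minimal sketch, under the same caveats, of the PMPI interposition Gilles describes above. Defining MPI_Bcast() in the application (or in an object file linked before the MPI library) shadows the library's version, and the wrapper forwards to the real implementation through the standard PMPI_ profiling entry points; in Open MPI the C++ bindings call down into the C functions, so comm.Bcast() should go through the wrapper as well. The two accumulators are hypothetical names; this is meant to be compiled as C (or kept under extern "C" in C++):

    #include <mpi.h>

    /* Accumulated over all broadcasts; print (or reduce) them at the end of the run. */
    static double barrier_time = 0.0;   /* ~ synchronization time */
    static double bcast_time   = 0.0;   /* ~ actual transmission time */

    int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
                  int root, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        PMPI_Barrier(comm);             /* time NOT spent on data transmission */
        double t1 = MPI_Wtime();
        int rc = PMPI_Bcast(buffer, count, datatype, root, comm);
        double t2 = MPI_Wtime();

        barrier_time += t1 - t0;
        bcast_time   += t2 - t1;
        return rc;
    }

With this linked in, the existing code measures itself without modification; just before MPI_Finalize(), each rank can report its barrier_time and bcast_time.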
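Finally, a small self-contained program illustrating Jeff's point about the eager limit. For a message small enough to be sent eagerly (the threshold is implementation-dependent; in Open MPI it is a per-BTL MCA parameter such as btl_tcp_eager_limit), the root may leave MPI_Bcast() almost immediately even though the deliberately delayed non-root ranks have not yet entered it. The 64-byte buffer and 2-second delay are arbitrary values chosen for the demonstration:

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        unsigned char buf[64] = {0};    /* small: likely under the eager limit */
        if (rank != 0)
            sleep(2);                   /* non-roots arrive late on purpose */

        double t = MPI_Wtime();
        MPI_Bcast(buf, 64, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
        printf("rank %d spent %.3f s in MPI_Bcast\n", rank, MPI_Wtime() - t);

        MPI_Finalize();
        return 0;
    }

If the broadcast is eager, rank 0 typically reports near-zero time despite the other ranks sleeping; repeating the experiment with a buffer well above the eager limit should show rank 0 waiting on the rendezvous instead.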
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users