Re: Hadoop + MPI

2011-11-28 Thread Milind.Bhandarkar
Great! Works for me! Thanks, Ralph.

- Milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and
do not necessarily represent the views of any organization, past or
present, the author might be affiliated with.)




Re: Hadoop + MPI

2011-11-23 Thread Ralph Castain
FWIW: I can commit the OMPI part of your patch for you. The CCLA is intended
to ensure that people realize the need to protect OMPI from infection by code
under other licenses, such as the GPL. For people offering only a single
patch, getting corporate approval of the legal document is often too big a
burden.

So as long as someone (e.g., me) who is already operating under the CCLA is
willing to review and commit the patch, and the patch isn't too huge, we can
absorb it that way. I expect your patch is just a new ess component, and I'm
happy to do the review and commit it on your behalf, if that is acceptable to
you.


On Nov 21, 2011, at 5:04 PM, milind.bhandar...@emc.com 
milind.bhandar...@emc.com wrote:

 Ralph,
 
 Yes, I have completed the first step, although I would really like that
 code to be part of the MPI Application Master (Chris Douglas suggested a
 way to do this at ApacheCon).
 
 Regarding the remaining steps, I have been following discussions on the
 open mpi mailing lists, and reading code for hwloc.
 
 If you are making a trip to Cisco HQ sometime soon, I would like to have a
 face-to-face about hwloc. I have so far avoided to use a native task
 controller for spawning MPI jobs, but given the lack of support for
 binding in Java, it looks like I will have to bite the bullet.
 
 - milind
 
 ---
 Milind Bhandarkar
 Greenplum Labs, EMC
 (Disclaimer: Opinions expressed in this email are those of the author, and
 do not necessarily represent the views of any organization, past or
 present, the author might be affiliated with.)
 
 
 
 On 11/21/11 3:54 PM, Ralph Castain r...@open-mpi.org wrote:
 
 Hi Milind
 
 Glad to hear of the progress - I recall our earlier conversation. I
 gather you have completed step 1 (wireup) - have you given any thought to
 the other two steps? Anything I can do to help?
 
 Ralph
 
 
 On Nov 21, 2011, at 4:47 PM, milind.bhandar...@emc.com wrote:
 
 Hi Ralph,
 
 I spoke with Jeff Squyres  at SC11, and updated him on the status of my
 OpenMPI port on Hadoop Yarn.
 
 To update everyone, I have OpenMPI examples running on #Yarn, although
 it
 requires some code cleanup and refactoring, however that can be done as
 a
 later step.
 
 Currently, the MPI processes come up, get submitting client's IP and
 port
 via environment variables, connect to it, and do a barrier. The result
 of
 this barrier is that everyone in MPI_COMM_WORLD gets each other's
 endpoints.
 
 I am aiming to submit the patch to hadoop by the end of this month.
 
 I will publish the openmpi patch to github.
 
 (As I mentioned to Jeff, OpenMPI requires a CCLA for accepting
 submissions. That will take some time.)
 
 - Milind
 
 ---
 Milind Bhandarkar
 Greenplum Labs, EMC
 (Disclaimer: Opinions expressed in this email are those of the author,
 and
 do not necessarily represent the views of any organization, past or
 present, the author might be affiliated with.)
 
 
 
 
 I'm willing to do the integration work, but wanted to check first to
 see
 if (a) someone in the Hadoop community is already doing so, and (b) if
 you would be interested in seeing such a capability and willing to
 accept
 the code contribution?
 
 Establishing MPI support requires the following steps:
 
 1. wireup support. MPI processes need to exchange endpoint info (e.g.,
 for TCP connections, IP address and port) so that each process knows
 how
 to connect to any other process in the application. This is typically
 done in a collective modex operation. There are several ways of doing
 it - if we proceed, I will outline those in a separate email to solicit
 your input on the most desirable approach to use.
 
 2. binding support. One can achieve significant performance
 improvements
 by binding processes to specific cores, sockets, and/or NUMA regions
 (regardless of using MPI or not, but certainly important for MPI
 applications). This requires not only the binding code, but some logic
 to
 ensure that one doesn't overload specific resources.
 
 3. process mapping. I haven't verified it yet, but I suspect that
 Hadoop
 provides each executing instance with an identifier that is unique
 within
 that job - e.g., we typically assign an integer rank that ranges
 from 0
 to the number of instances being executed. This identifier is critical
 for MPI applications, and the relative placement of processes within a
 job often dictates overall performance. Thus, we would provide a
 mapping
 capability that allows users to specify patterns of process placement
 for
 their job - e.g., place one process on each socket on every node.
 
 I have written the code to implement the above support on a number of
 systems, and don't foresee major problems doing it for Hadoop (though I
 would welcome a chance to get a brief walk-thru the code from someone).
 Please let me know if this would be of interest to the Hadoop
 community.
 
 Thanks
 Ralph Castain
 
 
 
 
 
 
 



Re: Hadoop + MPI

2011-11-23 Thread Arun Murthy
Awesome - thanks to both of you! It's very exciting to see this progress!

Arun

Sent from my iPhone


Re: Hadoop + MPI

2011-11-21 Thread Arun C Murthy
Hi Ralph,

 Welcome!

 We'd absolutely love to have OpenMPI integrated with Hadoop!

 In fact, there have already been a number of discussions about running OpenMPI
on what we call MR2 (aka YARN), documented here:
https://issues.apache.org/jira/browse/MAPREDUCE-2911.

 YARN is our effort to re-imagine Hadoop MapReduce as a general-purpose,
distributed data-processing system that supports MapReduce, MPI, and other
programming paradigms on the same Hadoop cluster.

 Would love to collaborate; should we discuss on that JIRA?

thanks,
Arun

On Nov 21, 2011, at 3:35 PM, Ralph Castain wrote:

 Hi folks
 
 I am a lead developer in the Open MPI community, mostly focused on 
 integrating that package with various environments. Over the last few months, 
 I've had a couple of people ask me about MPI support within Hadoop - i.e., 
 they want to run MPI applications under the Hadoop umbrella. I've spent a 
 little time studying Hadoop, and it would seem a good fit for such a 
 capability.
 
 I'm willing to do the integration work, but wanted to check first to see
 (a) whether someone in the Hadoop community is already doing so, and (b)
 whether you would be interested in seeing such a capability and willing to
 accept the code contribution.
 
 Establishing MPI support requires the following steps:
 
 1. wireup support. MPI processes need to exchange endpoint info (e.g., for 
 TCP connections, IP address and port) so that each process knows how to 
 connect to any other process in the application. This is typically done in a 
 collective modex operation. There are several ways of doing it - if we 
 proceed, I will outline those in a separate email to solicit your input on 
 the most desirable approach to use.
 
 2. binding support. One can achieve significant performance improvements by 
 binding processes to specific cores, sockets, and/or NUMA regions (regardless 
 of using MPI or not, but certainly important for MPI applications). This 
 requires not only the binding code, but some logic to ensure that one doesn't 
 overload specific resources.
 
 3. process mapping. I haven't verified it yet, but I suspect that Hadoop 
 provides each executing instance with an identifier that is unique within 
 that job - e.g., we typically assign an integer rank that ranges from 0 to 
 the number of instances being executed. This identifier is critical for MPI 
 applications, and the relative placement of processes within a job often 
 dictates overall performance. Thus, we would provide a mapping capability 
 that allows users to specify patterns of process placement for their job - 
 e.g., place one process on each socket on every node.
 
 I have written the code to implement the above support on a number of
 systems, and don't foresee major problems doing it for Hadoop (though I
 would welcome a chance to get a brief walk-through of the code from someone).
 Please let me know if this would be of interest to the Hadoop community.
 
 Thanks
 Ralph Castain
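
[As a rough illustration of steps 2 and 3 above, a sketch built on hwloc
(the library discussed later in this thread) could look like the following.
The OMPI_LOCAL_RANK variable and the round-robin socket policy are
assumptions for illustration only, not Open MPI's actual implementation.]

/* bind_by_rank.c - illustrative sketch only. Maps a process to a socket
 * from its node-local rank ("one process on each socket on every node"),
 * then binds it there with hwloc 1.x. OMPI_LOCAL_RANK is a hypothetical
 * variable assumed to be set by the launcher. Build: gcc ... -lhwloc */
#include <stdio.h>
#include <stdlib.h>
#include <hwloc.h>

int main(void)
{
    const char *lr = getenv("OMPI_LOCAL_RANK");   /* hypothetical name */
    int local_rank = lr ? atoi(lr) : 0;

    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Step 3 (mapping): choose a socket from the node-local rank. */
    int nsockets = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET);
    if (nsockets <= 0)
        nsockets = 1;
    hwloc_obj_t sock = hwloc_get_obj_by_type(topo, HWLOC_OBJ_SOCKET,
                                             local_rank % nsockets);

    /* Step 2 (binding): restrict the process to that socket's cpuset.
     * A real mapper would also track usage to avoid oversubscribing
     * any one resource, as the email notes. */
    if (!sock || hwloc_set_cpubind(topo, sock->cpuset, HWLOC_CPUBIND_PROCESS) < 0)
        fprintf(stderr, "binding failed or no socket objects found\n");

    hwloc_topology_destroy(topo);
    return 0;
}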
 
 



Re: Hadoop + MPI

2011-11-21 Thread Ralph Castain

On Nov 21, 2011, at 4:43 PM, Arun C Murthy wrote:

 Would love to collaborate; should we discuss on that JIRA?

Sure! I'll poke my nose over there...thanks!




Re: Hadoop + MPI

2011-11-21 Thread Mahadev Konar
Milind,
 Great news. Any chance you can upload the patch as it is? I am sure
others can help clean it up. I am willing to help smooth it out, and I am
sure Ralph can provide feedback as well.

thanks
mahadev

On Mon, Nov 21, 2011 at 3:47 PM, milind.bhandar...@emc.com wrote:
 Hi Ralph,

 I spoke with Jeff Squyres at SC11, and updated him on the status of my
 OpenMPI port on Hadoop YARN.

 To update everyone, I have OpenMPI examples running on YARN; it still
 requires some code cleanup and refactoring, but that can be done as a
 later step.

 Currently, the MPI processes come up, get the submitting client's IP and
 port via environment variables, connect to it, and do a barrier. The result
 of this barrier is that everyone in MPI_COMM_WORLD gets each other's
 endpoints.

 I am aiming to submit the patch to Hadoop by the end of this month.

 I will publish the OpenMPI patch to GitHub.

 (As I mentioned to Jeff, OpenMPI requires a CCLA for accepting
 submissions. That will take some time.)

 - Milind

 ---
 Milind Bhandarkar
 Greenplum Labs, EMC
 (Disclaimer: Opinions expressed in this email are those of the author, and
 do not necessarily represent the views of any organization, past or
 present, the author might be affiliated with.)
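
[For illustration, a minimal sketch of the wireup Milind describes might
look like this. The MPI_CLIENT_IP, MPI_CLIENT_PORT, and OMPI_RANK names and
the line-oriented wire format are assumptions; the thread does not give the
actual variable names used in his patch.]

/* wireup.c - illustrative sketch only: connect back to the submitting
 * client, report our endpoint, and block until the full table arrives. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    const char *ip   = getenv("MPI_CLIENT_IP");    /* hypothetical name */
    const char *port = getenv("MPI_CLIENT_PORT");  /* hypothetical name */
    const char *rank = getenv("OMPI_RANK");        /* hypothetical name */
    if (!ip || !port) {
        fprintf(stderr, "wireup environment not set\n");
        return 1;
    }

    /* Connect back to the submitting client. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons((uint16_t)atoi(port));
    inet_pton(AF_INET, ip, &addr.sin_addr);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* Report our own endpoint; a real process would have opened a
     * listening socket first and advertised its address here. */
    char mine[128];
    snprintf(mine, sizeof(mine), "%s 10.0.0.7:47000\n",
             rank ? rank : "?");          /* placeholder endpoint */
    write(fd, mine, strlen(mine));

    /* Block until the client has heard from every rank and replies with
     * the full table - this read is what gives the barrier semantics:
     * when it completes, every member of MPI_COMM_WORLD is known. */
    char table[4096];
    ssize_t n;
    while ((n = read(fd, table, sizeof(table) - 1)) > 0) {
        table[n] = '\0';
        fputs(table, stdout);   /* a real implementation would parse it */
    }
    close(fd);
    return 0;
}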





Re: Hadoop + MPI

2011-11-21 Thread Ralph Castain
Hi Milind

Glad to hear of the progress - I recall our earlier conversation. I gather you 
have completed step 1 (wireup) - have you given any thought to the other two 
steps? Anything I can do to help?

Ralph





Re: Hadoop + MPI

2011-11-21 Thread Milind.Bhandarkar
Ralph,

Yes, I have completed the first step, although I would really like that
code to be part of the MPI Application Master (Chris Douglas suggested a
way to do this at ApacheCon).

Regarding the remaining steps, I have been following discussions on the
Open MPI mailing lists, and reading the hwloc code.

If you are making a trip to Cisco HQ sometime soon, I would like to have a
face-to-face about hwloc. I have so far avoided using a native task
controller for spawning MPI jobs, but given the lack of support for
binding in Java, it looks like I will have to bite the bullet.

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and
do not necessarily represent the views of any organization, past or
present, the author might be affiliated with.)
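
[One way around the binding gap Milind mentions: keep spawning through Java,
but expose a single native binding call via a thin JNI shim over hwloc. A
rough sketch under assumed names - org.example.Affinity and bindToCore are
hypothetical, and nothing here reflects Milind's actual code.]

/* affinity_jni.c - illustrative sketch only: JNI shim over hwloc so a
 * Java task can bind itself without a native task controller.
 * Matching (hypothetical) Java declaration:
 *   package org.example;
 *   public final class Affinity {
 *       public static native int bindToCore(int core);
 *   }
 */
#include <jni.h>
#include <hwloc.h>

JNIEXPORT jint JNICALL
Java_org_example_Affinity_bindToCore(JNIEnv *env, jclass cls, jint core)
{
    (void)env; (void)cls;

    hwloc_topology_t topo;
    if (hwloc_topology_init(&topo) < 0)
        return -1;
    if (hwloc_topology_load(topo) < 0) {
        hwloc_topology_destroy(topo);
        return -1;
    }

    /* Bind the whole JVM process to the requested core. */
    jint rc = -1;
    hwloc_obj_t obj = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE,
                                            (unsigned)core);
    if (obj)
        rc = (jint)hwloc_set_cpubind(topo, obj->cpuset,
                                     HWLOC_CPUBIND_PROCESS);

    hwloc_topology_destroy(topo);
    return rc;
}

[Binding the whole JVM process this way sidesteps a native task controller,
at the cost of shipping a small native library with the job.]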





Re: Hadoop + MPI

2011-11-21 Thread Ralph Castain

On Nov 21, 2011, at 5:04 PM, milind.bhandar...@emc.com wrote:

 Ralph,
 
 Yes, I have completed the first step, although I would really like that
 code to be part of the MPI Application Master (Chris Douglas suggested a
 way to do this at ApacheCon).
 
 Regarding the remaining steps, I have been following discussions on the
 Open MPI mailing lists, and reading the hwloc code.
 
 If you are making a trip to Cisco HQ sometime soon, I would like to have a
 face-to-face about hwloc.

Not sure that looks likely right now - my project at Cisco is done, and it 
appears I'll be leaving the company soon.

 I have so far avoided using a native task controller for spawning MPI
 jobs, but given the lack of support for binding in Java, it looks like I
 will have to bite the bullet.

I was actually looking at porting the binding support to Java, as it looks
feasible to do so, and I can understand not wanting to absorb all of that
configuration code in C. Given the loss of my job, I have some free time on
my hands while I search for new employment, so I thought I might spend it
looking at the Hadoop integration - since you have completed the wireup, I
might look at this next.
