Re: [OMPI devel] -display-map behavior in 1.3

2009-05-04 Thread Ralph Castain
Easier than I thought...done in r21147

Let me know if that meets your needs
Ralph



Re: [OMPI devel] -display-map behavior in 1.3

2009-05-04 Thread Ralph Castain
Should be doable

Since the output was going to stderr, we just let it continue to do so and
tagged it. I think I can redirect it when doing xml tagging as that is
handled as a separate case - shouldn't be too hard to do.
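
(For illustration: once that redirect is in place, a consumer should be able to
capture everything from the one stream, e.g. something like

  mpirun -xml -display-map -np 2 ./a.out > run.xml

where ./a.out and run.xml are placeholder names, instead of having to collect
stderr as well.)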



Re: [OMPI devel] -display-map behavior in 1.3

2009-05-04 Thread Greg Watson

Ralph,

I did find another issue in 1.3 though. It looks like with the -xml  
option you're sending output tagged with  to stderr, whereas  
it would probably be better if everything was sent to stdout.  
Otherwise it's necessary to parse the stderr stream separately.


Greg
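
(For illustration only: with the behavior described here, a consumer that wants
a single stream has to merge the two pipes itself, along the lines of

  mpirun -xml -display-map -np 2 ./a.out 2>&1 | consumer

with ./a.out and consumer as placeholder names - and the relative ordering of
the stdout and stderr data is then not guaranteed.)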



Re: [OMPI devel] -display-map behavior in 1.3

2009-05-01 Thread Greg Watson
Arrgh! Sorry, my bad. I must have been linked against an old version  
or something. When I recompiled the output went away.


Greg



Re: [OMPI devel] -display-map behavior in 1.3

2009-05-01 Thread Ralph Castain
Interesting - I'm not seeing this behavior:

graywolf54:trunk rhc$ mpirun -n 3 --xml --display-map hostname

graywolf54.lanl.gov
graywolf54.lanl.gov
graywolf54.lanl.gov
graywolf54:trunk rhc$

Can you tell me more about when you see this? Note that the display-map
output should always appear on stderr because that is our default output
device.
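
(As a rough sketch only, with element and attribute names assumed rather than
taken from real mpirun output, the --display-map XML being discussed has the
general shape

  <map>
    <host name="graywolf54.lanl.gov" slots="3">
      ...
    </host>
  </map>

that is, a single map element containing one host element per node.)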



Re: [OMPI devel] -display-map behavior in 1.3

2009-05-01 Thread Ralph Castain
Hmmm...no, that's a bug. I'll fix it.

Thanks!



[OMPI devel] -display-map behavior in 1.3

2009-05-01 Thread Greg Watson

Ralph,

I've just noticed that if I use '-xml -display-map', I get the xml
version of the map and then the non-xml version is sent to stderr
(wrapped in xml tags). Was this by design? In my view it would be
better to suppress the non-xml map altogether if the -xml option is
given. 1.4 seems to do the same.


Greg
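
(For example, a command along the lines of

  mpirun -xml -display-map -np 2 hostname

shows the behavior being described: the XML form of the map, followed by the
plain-text map wrapped in xml tags on stderr.)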



Re: [OMPI devel] -display-map

2009-01-20 Thread Greg Watson

Looks good now. Thanks!

Greg


Re: [OMPI devel] -display-map

2009-01-20 Thread Ralph Castain
I'm embarrassed to admit that I never actually implemented the xml  
option for tag-output...this has been rectified with r20302.


Let me know if that works for you - sorry for confusion.

Ralph



Re: [OMPI devel] -display-map

2009-01-20 Thread Greg Watson

Ralph,

The encapsulation is not quite right yet. I'm seeing this:

[1,0]n = 0
[1,1]n = 0

but it should be:

n = 0
n = 0

Thanks,

Greg
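
(Assuming the stripped markup here was the plain [job,rank] prefix plus an XML
stdout wrapper, the contrast is roughly

  [1,0]<stdout>n = 0</stdout>      <- prefix outside the tag
  <stdout rank="0">n = 0</stdout>  <- fully encapsulated form

with the rank attribute name assumed purely for illustration.)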


Re: [OMPI devel] -display-map

2009-01-20 Thread Greg Watson
I don't think there's any reason we'd want stdout/err not to be  
encapsulated, so forcing tag-output makes sense.


Greg


Re: [OMPI devel] -display-map

2009-01-20 Thread Ralph Castain
You need to add --tag-output - this is a separate option as it applies  
both to xml and non-xml situations.


If you like, I can force tag-output "on" by default whenever -xml is  
specified.


Ralph
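
(So, for example, something along the lines of

  mpirun --xml --tag-output -np 2 ./a.out

with ./a.out as a placeholder, is the combination that should produce
XML-wrapped stdout/stderr; --tag-output without --xml gives the plain
bracketed rank prefixes instead.)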



Re: [OMPI devel] -display-map

2009-01-16 Thread Greg Watson

Ralph,

Is there something I need to do to enable stdout/err encapsulation  
(apart from -xml)? Here's what I see:


$ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 5 /Users/greg/Documents/workspace1/testMPI/Debug/testMPI

n = 0
n = 0
n = 0
n = 0
n = 0


Re: [OMPI devel] -display-map

2009-01-16 Thread Jeff Squyres

Fixed in r20288.  Thanks for the catch.


Re: [OMPI devel] -display-map

2009-01-16 Thread Greg Watson
FYI, if I configure with --with-platform=contrib/platform/lanl/macosx-dynamic
the build succeeds.


Greg


Re: [OMPI devel] -display-map

2009-01-16 Thread Jeff Squyres


Er... whoops.  This looks like my mistake (I just recently added
MPI_REDUCE_LOCAL to the trunk -- not v1.3).


I could have sworn that I tested this on a Mac, multiple times.  I'll  
test again...




Re: [OMPI devel] -display-map

2009-01-16 Thread Greg Watson

When I try to build trunk, it fails with:

i_f77.lax/libmpi_f77_pmpi.a/pwin_unlock_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwin_wait_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwtick_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwtime_f.o ../../../ompi/.libs/libmpi.0.0.0.dylib /usr/local/openmpi-1.4-devel/lib/libopen-rte.0.0.0.dylib /usr/local/openmpi-1.4-devel/lib/libopen-pal.0.0.0.dylib -install_name /usr/local/openmpi-1.4-devel/lib/libmpi_f77.0.dylib -compatibility_version 1 -current_version 1.0
ld: duplicate symbol _mpi_reduce_local_f in .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/preduce_local_f.o and .libs/reduce_local_f.o


collect2: ld returned 1 exit status
make[3]: *** [libmpi_f77.la] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

I'm using the default configure command (./configure --prefix=xxx) on  
Mac OS X 10.5. This works fine on the 1.3 branch.


Greg


Re: [OMPI devel] -display-map

2009-01-15 Thread Ralph Castain
Okay, it is in the trunk as of r20284 - I'll file the request to have  
it moved to 1.3.1.


Let me know if you get a chance to test the stdout/err stuff in the  
trunk - we should try and iterate it so any changes can make 1.3.1 as  
well.


Thanks!
Ralph


On Jan 15, 2009, at 11:03 AM, Greg Watson wrote:


Ralph,

I think the second form would be ideal and would simplify things  
greatly.


Greg

On Jan 15, 2009, at 10:53 AM, Ralph Castain wrote:

Here is what I was able to do - note that the resolve messages are  
associated with the specific hostname, not the overall map:


Will that work for you? If you like, I can remove the name= field  
from the noderesolve element since the info is specific to the host  
element that contains it. In other words, I can make it look like  
this:

if that would help.

Ralph
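
(As a sketch only, with attribute names and hostnames assumed, the two forms
being described are roughly

  <host name="node0" ...>
    <noderesolve name="node0" resolved="node0.example.com"/>
  </host>

versus the same thing with the redundant name= dropped from noderesolve:

  <host name="node0" ...>
    <noderesolve resolved="node0.example.com"/>
  </host>

since the noderesolve element already sits inside the host it refers to.)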


On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:

We -may- be able to do a more formal XML output at some point. The  
problem will be the natural interleaving of stdout/err from the  
various procs due to the async behavior of MPI. Mpirun receives  
fragmented output in the forwarding system, limited by the buffer  
sizes and the amount of data we can read at any one "bite" from  
the pipes connecting us to the procs. So even though the user  
-thinks- they output a single large line of stuff, it may show up  
at mpirun as a series of fragments. Hence, it gets tricky to know  
how to put appropriate XML brackets around it.


Given this input about when you actually want resolved name info,  
I can at least do something about that area. Won't be in 1.3.0,  
but should make 1.3.1.


As for XML-tagged stdout/err: the OMPI community asked me not to  
turn that feature "on" for 1.3.0 as they felt it hadn't been  
adequately tested yet. The code is present, but cannot be  
activated in 1.3.0. However, I believe it is activated on the  
trunk when you do --xml --tagged-output, so perhaps some testing  
will help us debug and validate it adequately for 1.3.1?


Thanks
Ralph


On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:


Ralph,

The only time we use the resolved names is when we get a map, so  
we consider them part of the map output.


If quasi-XML is all that will ever be possible with 1.3, then you  
may as well leave as-is and we will attempt to clean it up in  
Eclipse. It would be nice if a future version of ompi could  
output correct XML (including stdout) as this would vastly  
simplify the parsing we need to do.


Regards,

Greg

On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:

Hmmm...well, I can't do either for 1.3.0 as it is departing this  
afternoon.


The first option would be very hard to do. I would have to  
expose the display-map option across the code base and check it  
prior to printing anything about resolving node names. I guess I  
should ask: do you only want noderesolve statements when we are  
displaying the map? Right now, I will output them regardless.


The second option could be done. I could check if any "display"  
option has been specified, and output the  root at that  
time (likewise for the end). Anything we output in-between would  
be encapsulated between the two, but that would include any user  
output to stdout and/or stderr - which for 1.3.0 is not in xml.


Any thoughts?

Ralph

PS. Guess I should clarify that I was not striving for true XML  
interaction here, but rather a quasi-XML format that would help  
you to filter the output. I have no problem trying to get to  
something more formally correct, but it could be tricky in some  
places to achieve it due to the inherent async nature of the  
beast.



On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem.  
To be valid, there needs to be only one root element, but  
currently you don't have any (or many). So rather than:














the XML should be:













or:















Would either of these be possible?

Thanks,

Greg

On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:


Ok thanks. I'll test from trunk in future.

Greg

On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:


Working its way around the CMR process now.

Might be easier in the future if we could test/debug this in  
the trunk, though. Otherwise, the CMR procedure will fall  
behind and a fix might miss a release window.


Anyway, hopefully this one will make the 1.3.0 release cutoff.

Thanks
Ralph


Re: [OMPI devel] -display-map

2009-01-15 Thread Greg Watson

Ralph,

I think the second form would be ideal and would simplify things  
greatly.


Greg

On Jan 15, 2009, at 10:53 AM, Ralph Castain wrote:

Here is what I was able to do - note that the resolve messages are  
associated with the specific hostname, not the overall map:











Will that work for you? If you like, I can remove the name= field  
from the noderesolve element since the info is specific to the host  
element that contains it. In other words, I can make it look like  
this:











if that would help.

Ralph


On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:

We -may- be able to do a more formal XML output at some point. The  
problem will be the natural interleaving of stdout/err from the  
various procs due to the async behavior of MPI. Mpirun receives  
fragmented output in the forwarding system, limited by the buffer  
sizes and the amount of data we can read at any one "bite" from the  
pipes connecting us to the procs. So even though the user -thinks-  
they output a single large line of stuff, it may show up at mpirun  
as a series of fragments. Hence, it gets tricky to know how to put  
appropriate XML brackets around it.


Given this input about when you actually want resolved name info, I  
can at least do something about that area. Won't be in 1.3.0, but  
should make 1.3.1.


As for XML-tagged stdout/err: the OMPI community asked me not to  
turn that feature "on" for 1.3.0 as they felt it hadn't been  
adequately tested yet. The code is present, but cannot be activated  
in 1.3.0. However, I believe it is activated on the trunk when you  
do --xml --tagged-output, so perhaps some testing will help us  
debug and validate it adequately for 1.3.1?


Thanks
Ralph


On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:


Ralph,

The only time we use the resolved names is when we get a map, so  
we consider them part of the map output.


If quasi-XML is all that will ever be possible with 1.3, then you  
may as well leave as-is and we will attempt to clean it up in  
Eclipse. It would be nice if a future version of ompi could output  
correct XML (including stdout) as this would vastly simplify the  
parsing we need to do.


Regards,

Greg

On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:

Hmmm...well, I can't do either for 1.3.0 as it is departing this  
afternoon.


The first option would be very hard to do. I would have to expose  
the display-map option across the code base and check it prior to  
printing anything about resolving node names. I guess I should  
ask: do you only want noderesolve statements when we are  
displaying the map? Right now, I will output them regardless.


The second option could be done. I could check if any "display"  
option has been specified, and output the  root at that  
time (likewise for the end). Anything we output in-between would  
be encapsulated between the two, but that would include any user  
output to stdout and/or stderr - which for 1.3.0 is not in xml.


Any thoughts?

Ralph

PS. Guess I should clarify that I was not striving for true XML  
interaction here, but rather a quasi-XML format that would help  
you to filter the output. I have no problem trying to get to  
something more formally correct, but it could be tricky in some  
places to achieve it due to the inherent async nature of the beast.



On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem.  
To be valid, there needs to be only one root element, but  
currently you don't have any (or many). So rather than:














the XML should be:













or:















Would either of these be possible?

Thanks,

Greg

On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:


Ok thanks. I'll test from trunk in future.

Greg

On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:


Working its way around the CMR process now.

Might be easier in the future if we could test/debug this in  
the trunk, though. Otherwise, the CMR procedure will fall  
behind and a fix might miss a release window.


Anyway, hopefully this one will make the 1.3.0 release cutoff.

Thanks
Ralph

On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:


Hi Ralph,

This is now in 1.3rc2, thanks. However there are a couple of  
problems. Here is what I see:


[Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com">


For some reason each line is prefixed with "[...]", any idea  
why this is? Also the end 

Re: [OMPI devel] -display-map

2009-01-15 Thread Ralph Castain
Here is what I was able to do - note that the resolve messages are  
associated with the specific hostname, not the overall map:











Will that work for you? If you like, I can remove the name= field from  
the noderesolve element since the info is specific to the host element  
that contains it. In other words, I can make it look like this:











if that would help.

Ralph
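
Roughly, the two alternatives being described would look something like  
the sketch below. The host names, the slots attribute, and the exact  
layout are illustrative guesses; only the host and noderesolve elements  
and the name=/resolved= fields come from the message itself.

First form, with the resolve message carrying its own name= field:

   <host name="node01" slots="2">
     <noderesolve name="node01" resolved="node01.cluster.example.com"/>
   </host>

Second form, with name= dropped because the enclosing host element  
already identifies the node:

   <host name="node01" slots="2">
     <noderesolve resolved="node01.cluster.example.com"/>
   </host>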


On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:

We -may- be able to do a more formal XML output at some point. The  
problem will be the natural interleaving of stdout/err from the  
various procs due to the async behavior of MPI. Mpirun receives  
fragmented output in the forwarding system, limited by the buffer  
sizes and the amount of data we can read at any one "bite" from the  
pipes connecting us to the procs. So even though the user -thinks-  
they output a single large line of stuff, it may show up at mpirun  
as a series of fragments. Hence, it gets tricky to know how to put  
appropriate XML brackets around it.


Given this input about when you actually want resolved name info, I  
can at least do something about that area. Won't be in 1.3.0, but  
should make 1.3.1.


As for XML-tagged stdout/err: the OMPI community asked me not to  
turn that feature "on" for 1.3.0 as they felt it hadn't been  
adequately tested yet. The code is present, but cannot be activated  
in 1.3.0. However, I believe it is activated on the trunk when you  
do --xml --tagged-output, so perhaps some testing will help us debug  
and validate it adequately for 1.3.1?


Thanks
Ralph


On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:


Ralph,

The only time we use the resolved names is when we get a map, so we  
consider them part of the map output.


If quasi-XML is all that will ever be possible with 1.3, then you  
may as well leave as-is and we will attempt to clean it up in  
Eclipse. It would be nice if a future version of ompi could output  
correct XML (including stdout) as this would vastly simplify the  
parsing we need to do.


Regards,

Greg

On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:

Hmmm...well, I can't do either for 1.3.0 as it is departing this  
afternoon.


The first option would be very hard to do. I would have to expose  
the display-map option across the code base and check it prior to  
printing anything about resolving node names. I guess I should  
ask: do you only want noderesolve statements when we are  
displaying the map? Right now, I will output them regardless.


The second option could be done. I could check if any "display"  
option has been specified, and output the  root at that time  
(likewise for the end). Anything we output in-between would be  
encapsulated between the two, but that would include any user  
output to stdout and/or stderr - which for 1.3.0 is not in xml.


Any thoughts?

Ralph

PS. Guess I should clarify that I was not striving for true XML  
interaction here, but rather a quasi-XML format that would help  
you to filter the output. I have no problem trying to get to  
something more formally correct, but it could be tricky in some  
places to achieve it due to the inherent async nature of the beast.



On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem. To  
be valid, there needs to be only one root element, but currently  
you don't have any (or many). So rather than:














the XML should be:













or:















Would either of these be possible?

Thanks,

Greg

On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:


Ok thanks. I'll test from trunk in future.

Greg

On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:


Working its way around the CMR process now.

Might be easier in the future if we could test/debug this in  
the trunk, though. Otherwise, the CMR procedure will fall  
behind and a fix might miss a release window.


Anyway, hopefully this one will make the 1.3.0 release cutoff.

Thanks
Ralph

On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:


Hi Ralph,

This is now in 1.3rc2, thanks. However there are a couple of  
problems. Here is what I see:


[Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com">


For some reason each line is prefixed with "[...]", any idea  
why this is? Also the end tag should be "/>" not ">".


Thanks,

Greg

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:


Great, thanks. I'll take a look once it comes over 

Re: [OMPI devel] -display-map

2009-01-14 Thread Ralph Castain
We -may- be able to do a more formal XML output at some point. The  
problem will be the natural interleaving of stdout/err from the  
various procs due to the async behavior of MPI. Mpirun receives  
fragmented output in the forwarding system, limited by the buffer  
sizes and the amount of data we can read at any one "bite" from the  
pipes connecting us to the procs. So even though the user -thinks-  
they output a single large line of stuff, it may show up at mpirun as  
a series of fragments. Hence, it gets tricky to know how to put  
appropriate XML brackets around it.


Given this input about when you actually want resolved name info, I  
can at least do something about that area. Won't be in 1.3.0, but  
should make 1.3.1.


As for XML-tagged stdout/err: the OMPI community asked me not to turn  
that feature "on" for 1.3.0 as they felt it hadn't been adequately  
tested yet. The code is present, but cannot be activated in 1.3.0.  
However, I believe it is activated on the trunk when you do  
--xml --tagged-output, so perhaps some testing will help us debug and validate  
it adequately for 1.3.1?


Thanks
Ralph
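
For anyone wanting to help with that testing, a trunk invocation along  
these lines should exercise the tagged-output path; the process count  
and program name here are just placeholders:

   mpirun -n 2 --xml --tagged-output ./my_mpi_app

The thing to check is whether everything the processes write comes back  
wrapped in tags that can be parsed alongside the map output.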


On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:


Ralph,

The only time we use the resolved names is when we get a map, so we  
consider them part of the map output.


If quasi-XML is all that will ever be possible with 1.3, then you  
may as well leave as-is and we will attempt to clean it up in  
Eclipse. It would be nice if a future version of ompi could output  
correct XML (including stdout) as this would vastly simplify the  
parsing we need to do.


Regards,

Greg

On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:

Hmmm...well, I can't do either for 1.3.0 as it is departing this  
afternoon.


The first option would be very hard to do. I would have to expose  
the display-map option across the code base and check it prior to  
printing anything about resolving node names. I guess I should ask:  
do you only want noderesolve statements when we are displaying the  
map? Right now, I will output them regardless.


The second option could be done. I could check if any "display"  
option has been specified, and output the  root at that time  
(likewise for the end). Anything we output in-between would be  
encapsulated between the two, but that would include any user  
output to stdout and/or stderr - which for 1.3.0 is not in xml.


Any thoughts?

Ralph

PS. Guess I should clarify that I was not striving for true XML  
interaction here, but rather a quasi-XML format that would help you  
to filter the output. I have no problem trying to get to something  
more formally correct, but it could be tricky in some places to  
achieve it due to the inherent async nature of the beast.



On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem. To  
be valid, there needs to be only one root element, but currently  
you don't have any (or many). So rather than:














the XML should be:













or:















Would either of these be possible?

Thanks,

Greg

On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:


Ok thanks. I'll test from trunk in future.

Greg

On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:


Working its way around the CMR process now.

Might be easier in the future if we could test/debug this in the  
trunk, though. Otherwise, the CMR procedure will fall behind and  
a fix might miss a release window.


Anyway, hopefully this one will make the 1.3.0 release cutoff.

Thanks
Ralph

On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:


Hi Ralph,

This is now in 1.3rc2, thanks. However there are a couple of  
problems. Here is what I see:


[Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com">


For some reason each line is prefixed with "[...]", any idea  
why this is? Also the end tag should be "/>" not ">".


Thanks,

Greg

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:


Great, thanks. I'll take a look once it comes over to 1.3.

Cheers,

Greg

On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:


Yo Greg

This is in the trunk as of r20032. I'll bring it over to 1.3  
in a few days.


I implemented it as another MCA param  
"orte_show_resolved_nodenames" so you can actually get the  
info as you execute the job, if you want. The xml tag is  
"noderesolve" - let me know if you need any changes.


Ralph


On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:


Ralph,

I guess the issue for us is that we will have to run two  
commands to get the information we need. One to get the  
configuration information, such as 

Re: [OMPI devel] -display-map

2009-01-14 Thread Greg Watson

Ralph,

The only time we use the resolved names is when we get a map, so we  
consider them part of the map output.


If quasi-XML is all that will ever be possible with 1.3, then you may  
as well leave as-is and we will attempt to clean it up in Eclipse. It  
would be nice if a future version of ompi could output correct XML  
(including stdout) as this would vastly simplify the parsing we need  
to do.


Regards,

Greg

On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:

Hmmm...well, I can't do either for 1.3.0 as it is departing this  
afternoon.


The first option would be very hard to do. I would have to expose  
the display-map option across the code base and check it prior to  
printing anything about resolving node names. I guess I should ask:  
do you only want noderesolve statements when we are displaying the  
map? Right now, I will output them regardless.


The second option could be done. I could check if any "display"  
option has been specified, and output the  root at that time  
(likewise for the end). Anything we output in-between would be  
encapsulated between the two, but that would include any user output  
to stdout and/or stderr - which for 1.3.0 is not in xml.


Any thoughts?

Ralph

PS. Guess I should clarify that I was not striving for true XML  
interaction here, but rather a quasi-XML format that would help you  
to filter the output. I have no problem trying to get to something  
more formally correct, but it could be tricky in some places to  
achieve it due to the inherent async nature of the beast.



On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem. To  
be valid, there needs to be only one root element, but currently  
you don't have any (or many). So rather than:














the XML should be:













or:















Would either of these be possible?

Thanks,

Greg

On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:


Ok thanks. I'll test from trunk in future.

Greg

On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:


Working its way around the CMR process now.

Might be easier in the future if we could test/debug this in the  
trunk, though. Otherwise, the CMR procedure will fall behind and  
a fix might miss a release window.


Anyway, hopefully this one will make the 1.3.0 release cutoff.

Thanks
Ralph

On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:


Hi Ralph,

This is now in 1.3rc2, thanks. However there are a couple of  
problems. Here is what I see:


[Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com">


For some reason each line is prefixed with "[...]", any idea why  
this is? Also the end tag should be "/>" not ">".


Thanks,

Greg

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:


Great, thanks. I'll take a look once it comes over to 1.3.

Cheers,

Greg

On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:


Yo Greg

This is in the trunk as of r20032. I'll bring it over to 1.3  
in a few days.


I implemented it as another MCA param  
"orte_show_resolved_nodenames" so you can actually get the  
info as you execute the job, if you want. The xml tag is  
"noderesolve" - let me know if you need any changes.


Ralph


On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:


Ralph,

I guess the issue for us is that we will have to run two  
commands to get the information we need. One to get the  
configuration information, such as version and MCA  
parameters, and one to get the host information, whereas it  
would seem more logical that this should all be available via  
some kind of "configuration discovery" command. I understand  
the issue with supplying the hostfile though, so maybe this  
just points at the need for us to separate configuration  
information from the host information. In any case, we'll  
work with what you think is best.


Greg

On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:

Hmmm...just to be sure we are all clear on this. The reason  
we proposed to use mpirun is that "hostfile" has no meaning  
outside of mpirun. That's why ompi_info can't do anything in  
this regard.


We have no idea what hostfile the user may specify until we  
actually get the mpirun cmd line. They may have specified a  
default-hostfile, but they could also specify hostfiles for  
the individual app_contexts. These may or may not include  
the node upon which mpirun is executing.


So the only way to provide you with a separate command to  
get a hostfile<->nodename mapping would require you to  
provide us with the default-hostfile and/or hostfile cmd  
line options just as if you were issuing 

Re: [OMPI devel] -display-map

2009-01-13 Thread Ralph Castain
Hmmm...well, I can't do either for 1.3.0 as it is departing this  
afternoon.


The first option would be very hard to do. I would have to expose the  
display-map option across the code base and check it prior to printing  
anything about resolving node names. I guess I should ask: do you only  
want noderesolve statements when we are displaying the map? Right now,  
I will output them regardless.


The second option could be done. I could check if any "display" option  
has been specified, and output the  root at that time (likewise  
for the end). Anything we output in-between would be encapsulated  
between the two, but that would include any user output to stdout and/ 
or stderr - which for 1.3.0 is not in xml.
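
A sketch of what that would produce - the root tag name and the content  
are purely illustrative - showing the raw, untagged user output  
sandwiched between the XML pieces:

   <ompi-output>
     <map> ... </map>
     Hello from rank 0
     Hello from rank 1
     <noderesolve ... />
   </ompi-output>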


Any thoughts?

Ralph

PS. Guess I should clarify that I was not striving for true XML  
interaction here, but rather a quasi-XML format that would help you to  
filter the output. I have no problem trying to get to something more  
formally correct, but it could be tricky in some places to achieve it  
due to the inherent async nature of the beast.



On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem. To be  
valid, there needs to be only one root element, but currently you  
don't have any (or many). So rather than:














the XML should be:













or:















Would either of these be possible?

Thanks,

Greg

On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:


Ok thanks. I'll test from trunk in future.

Greg

On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:


Working its way around the CMR process now.

Might be easier in the future if we could test/debug this in the  
trunk, though. Otherwise, the CMR procedure will fall behind and a  
fix might miss a release window.


Anyway, hopefully this one will make the 1.3.0 release cutoff.

Thanks
Ralph

On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:


Hi Ralph,

This is now in 1.3rc2, thanks. However there are a couple of  
problems. Here is what I see:


[Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com">


For some reason each line is prefixed with "[...]", any idea why  
this is? Also the end tag should be "/>" not ">".


Thanks,

Greg

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:


Great, thanks. I'll take a look once it comes over to 1.3.

Cheers,

Greg

On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:


Yo Greg

This is in the trunk as of r20032. I'll bring it over to 1.3 in  
a few days.


I implemented it as another MCA param  
"orte_show_resolved_nodenames" so you can actually get the info  
as you execute the job, if you want. The xml tag is  
"noderesolve" - let me know if you need any changes.


Ralph


On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:


Ralph,

I guess the issue for us is that we will have to run two  
commands to get the information we need. One to get the  
configuration information, such as version and MCA parameters,  
and one to get the host information, whereas it would seem  
more logical that this should all be available via some kind  
of "configuration discovery" command. I understand the issue  
with supplying the hostfile though, so maybe this just points  
at the need for us to separate configuration information from  
the host information. In any case, we'll work with what you  
think is best.


Greg

On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:

Hmmm...just to be sure we are all clear on this. The reason  
we proposed to use mpirun is that "hostfile" has no meaning  
outside of mpirun. That's why ompi_info can't do anything in  
this regard.


We have no idea what hostfile the user may specify until we  
actually get the mpirun cmd line. They may have specified a  
default-hostfile, but they could also specify hostfiles for  
the individual app_contexts. These may or may not include the  
node upon which mpirun is executing.


So the only way to provide you with a separate command to get  
a hostfile<->nodename mapping would require you to provide us  
with the default-hostfile and/or hostfile cmd line options  
just as if you were issuing the mpirun cmd. We just wouldn't  
launch - but it would be the exact equivalent of doing  
"mpirun --do-not-launch".


Am I missing something? If so, please do correct me - I would  
be happy to provide a tool if that would make it easier. Just  
not sure what that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:


Ralph,

It seems a little strange to be using mpirun for this, but  
barring providing a separate command, or using ompi_info, I  

Re: [OMPI devel] -display-map

2009-01-13 Thread Greg Watson

Ralph,

The XML is looking better now, but there is still one problem. To be  
valid, there needs to be only one root element, but currently you  
don't have any (or many). So rather than:














the XML should be:













or:















Would either of these be possible?

Thanks,

Greg
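
Roughly, the contrast being drawn is between output with several  
top-level elements, which a conforming parser rejects as not  
well-formed:

   <map> ... </map>
   <noderesolve ... />
   <stdout> ... </stdout>

and the same content wrapped in a single enclosing root element, for  
example:

   <mpirun>
     <map> ... </map>
     <noderesolve ... />
     <stdout> ... </stdout>
   </mpirun>

All tag names here except noderesolve are placeholders.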

On Dec 8, 2008, at 2:18 PM, Greg Watson wrote:


Ok thanks. I'll test from trunk in future.

Greg

On Dec 8, 2008, at 2:05 PM, Ralph Castain wrote:


Working its way around the CMR process now.

Might be easier in the future if we could test/debug this in the  
trunk, though. Otherwise, the CMR procedure will fall behind and a  
fix might miss a release window.


Anyway, hopefully this one will make the 1.3.0 release cutoff.

Thanks
Ralph

On Dec 8, 2008, at 9:56 AM, Greg Watson wrote:


Hi Ralph,

This is now in 1.3rc2, thanks. However there are a couple of  
problems. Here is what I see:


[Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com">


For some reason each line is prefixed with "[...]", any idea why  
this is? Also the end tag should be "/>" not ">".


Thanks,

Greg

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:


Great, thanks. I'll take a look once it comes over to 1.3.

Cheers,

Greg

On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:


Yo Greg

This is in the trunk as of r20032. I'll bring it over to 1.3 in  
a few days.


I implemented it as another MCA param  
"orte_show_resolved_nodenames" so you can actually get the info  
as you execute the job, if you want. The xml tag is  
"noderesolve" - let me know if you need any changes.


Ralph


On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:


Ralph,

I guess the issue for us is that we will have to run two  
commands to get the information we need. One to get the  
configuration information, such as version and MCA parameters,  
and one to get the host information, whereas it would seem more  
logical that this should all be available via some kind of  
"configuration discovery" command. I understand the issue with  
supplying the hostfile though, so maybe this just points at the  
need for us to separate configuration information from the host  
information. In any case, we'll work with what you think is best.


Greg

On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:

Hmmm...just to be sure we are all clear on this. The reason we  
proposed to use mpirun is that "hostfile" has no meaning  
outside of mpirun. That's why ompi_info can't do anything in  
this regard.


We have no idea what hostfile the user may specify until we  
actually get the mpirun cmd line. They may have specified a  
default-hostfile, but they could also specify hostfiles for  
the individual app_contexts. These may or may not include the  
node upon which mpirun is executing.


So the only way to provide you with a separate command to get  
a hostfile<->nodename mapping would require you to provide us  
with the default-hostfile and/or hostfile cmd line options  
just as if you were issuing the mpirun cmd. We just wouldn't  
launch - but it would be the exact equivalent of doing "mpirun  
--do-not-launch".


Am I missing something? If so, please do correct me - I would  
be happy to provide a tool if that would make it easier. Just  
not sure what that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:


Ralph,

It seems a little strange to be using mpirun for this, but  
barring providing a separate command, or using ompi_info, I  
think this would solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for delay - had to ponder this one for awhile.

Jeff and I agree that adding something to ompi_info would  
not be a good idea. Ompi_info has no knowledge or  
understanding of hostfiles, and adding that capability to it  
would be a major distortion of its intended use.


However, we think we can offer an alternative that might  
better solve the problem. Remember, we now treat hostfiles  
in a very different manner than before - see the wiki page  
for a complete description, or "man orte_hosts".


So the problem is that, to provide you with what you want,  
we need to "dump" the information from whatever default- 
hostfile was provided, and, if no default-hostfile was  
provided, then the information from each hostfile that was  
provided with an app_context.


The best way we could think of to do this is to add another  
mpirun cmd line option --dump-hostfiles that would output  
the line-by-line name from the hostfile plus the name we  
resolved it to. Of course, --xml would cause it to be in xml  

Re: [OMPI devel] -display-map

2008-12-08 Thread Greg Watson

Hi Ralph,

This is now in 1.3rc2, thanks. However there are a couple of problems.  
Here is what I see:


[Jarrah.watson.ibm.com:58957] resolved="Jarrah.watson.ibm.com">


For some reason each line is prefixed with "[...]", any idea why this  
is? Also the end tag should be "/>" not ">".


Thanks,

Greg
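
For reference, the end-tag complaint is just the difference between an  
opening tag and a self-closing empty element (other attributes are  
omitted here):

   opening tag, not self-closed:  <noderesolve resolved="Jarrah.watson.ibm.com">
   self-closing empty element:    <noderesolve resolved="Jarrah.watson.ibm.com"/>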

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:


Great, thanks. I'll take a look once it comes over to 1.3.

Cheers,

Greg

On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:


Yo Greg

This is in the trunk as of r20032. I'll bring it over to 1.3 in a  
few days.


I implemented it as another MCA param  
"orte_show_resolved_nodenames" so you can actually get the info as  
you execute the job, if you want. The xml tag is "noderesolve" -  
let me know if you need any changes.


Ralph


On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:


Ralph,

I guess the issue for us is that we will have to run two commands  
to get the information we need. One to get the configuration  
information, such as version and MCA parameters, and one to get  
the host information, whereas it would seem more logical that this  
should all be available via some kind of "configuration discovery"  
command. I understand the issue with supplying the hostfile  
though, so maybe this just points at the need for us to separate  
configuration information from the host information. In any case,  
we'll work with what you think is best.


Greg

On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:

Hmmm...just to be sure we are all clear on this. The reason we  
proposed to use mpirun is that "hostfile" has no meaning outside  
of mpirun. That's why ompi_info can't do anything in this regard.


We have no idea what hostfile the user may specify until we  
actually get the mpirun cmd line. They may have specified a  
default-hostfile, but they could also specify hostfiles for the  
individual app_contexts. These may or may not include the node  
upon which mpirun is executing.


So the only way to provide you with a separate command to get a  
hostfile<->nodename mapping would require you to provide us with  
the default-hostfile and/or hostfile cmd line options just as if  
you were issuing the mpirun cmd. We just wouldn't launch - but it  
would be the exact equivalent of doing "mpirun --do-not-launch".


Am I missing something? If so, please do correct me - I would be  
happy to provide a tool if that would make it easier. Just not  
sure what that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:


Ralph,

It seems a little strange to be using mpirun for this, but  
barring providing a separate command, or using ompi_info, I  
think this would solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for delay - had to ponder this one for awhile.

Jeff and I agree that adding something to ompi_info would not  
be a good idea. Ompi_info has no knowledge or understanding of  
hostfiles, and adding that capability to it would be a major  
distortion of its intended use.


However, we think we can offer an alternative that might better  
solve the problem. Remember, we now treat hostfiles in a very  
different manner than before - see the wiki page for a complete  
description, or "man orte_hosts".


So the problem is that, to provide you with what you want, we  
need to "dump" the information from whatever default-hostfile  
was provided, and, if no default-hostfile was provided, then  
the information from each hostfile that was provided with an  
app_context.


The best way we could think of to do this is to add another  
mpirun cmd line option --dump-hostfiles that would output the  
line-by-line name from the hostfile plus the name we resolved  
it to. Of course, --xml would cause it to be in xml format.


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally and  
don't really see an easy solution. Our problem is that Eclipse  
is not running on the head node, so gethostbyname will not  
necessarily resolve to the same address. For example, the  
hostfile might refer to the head node by an internal network  
address that is not visible to the outside world. Since  
gethostbyname also looks in /etc/hosts, it may resolve locally  
but not on a remote system. The only thing I can think of  
would be, rather than us reading the hostfile directly as we  
do now, to provide an option to ompi_info that would dump the  
hostfile using the same rules that you apply when you're using  
the hostfile. Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work  
my way back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for  
the node. However, the problem is that it needs to be  
consistent. In this case, ORTE has already 

Re: [OMPI devel] -display-map

2008-12-02 Thread Ralph Castain

It slipped thru the cracks - will be in rc2.

Thanks for the reminder!
Ralph


On Dec 2, 2008, at 2:03 PM, Greg Watson wrote:


Ralph, will this be in 1.3rc1?

Thanks,
Greg

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:


Great, thanks. I'll take a look once it comes over to 1.3.

Cheers,

Greg

On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:


Yo Greg

This is in the trunk as of r20032. I'll bring it over to 1.3 in a  
few days.


I implemented it as another MCA param  
"orte_show_resolved_nodenames" so you can actually get the info as  
you execute the job, if you want. The xml tag is "noderesolve" -  
let me know if you need any changes.


Ralph


On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:


Ralph,

I guess the issue for us is that we will have to run two commands  
to get the information we need. One to get the configuration  
information, such as version and MCA parameters, and one to get  
the host information, whereas it would seem more logical that  
this should all be available via some kind of "configuration  
discovery" command. I understand the issue with supplying the  
hostfile though, so maybe this just points at the need for us to  
separate configuration information from the host information. In  
any case, we'll work with what you think is best.


Greg

On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:

Hmmm...just to be sure we are all clear on this. The reason we  
proposed to use mpirun is that "hostfile" has no meaning outside  
of mpirun. That's why ompi_info can't do anything in this regard.


We have no idea what hostfile the user may specify until we  
actually get the mpirun cmd line. They may have specified a  
default-hostfile, but they could also specify hostfiles for the  
individual app_contexts. These may or may not include the node  
upon which mpirun is executing.


So the only way to provide you with a separate command to get a  
hostfile<->nodename mapping would require you to provide us with  
the default-hostfile and/or hostfile cmd line options just as if  
you were issuing the mpirun cmd. We just wouldn't launch - but  
it would be the exact equivalent of doing "mpirun --do-not- 
launch".


Am I missing something? If so, please do correct me - I would be  
happy to provide a tool if that would make it easier. Just not  
sure what that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:


Ralph,

It seems a little strange to be using mpirun for this, but  
barring providing a separate command, or using ompi_info, I  
think this would solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for delay - had to ponder this one for awhile.

Jeff and I agree that adding something to ompi_info would not  
be a good idea. Ompi_info has no knowledge or understanding of  
hostfiles, and adding that capability to it would be a major  
distortion of its intended use.


However, we think we can offer an alternative that might  
better solve the problem. Remember, we now treat hostfiles in  
a very different manner than before - see the wiki page for a  
complete description, or "man orte_hosts".


So the problem is that, to provide you with what you want, we  
need to "dump" the information from whatever default-hostfile  
was provided, and, if no default-hostfile was provided, then  
the information from each hostfile that was provided with an  
app_context.


The best way we could think of to do this is to add another  
mpirun cmd line option --dump-hostfiles that would output the  
line-by-line name from the hostfile plus the name we resolved  
it to. Of course, --xml would cause it to be in xml format.


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally  
and don't really see an easy solution. Our problem is that  
Eclipse is not running on the head node, so gethostbyname  
will not necessarily resolve to the same address. For  
example, the hostfile might refer to the head node by an  
internal network address that is not visible to the outside  
world. Since gethostbyname also looks in /etc/hosts, it may  
resolve locally but not on a remote system. The only thing I  
can think of would be, rather than us reading the hostfile  
directly as we do now, to provide an option to ompi_info that  
would dump the hostfile using the same rules that you apply  
when you're using the hostfile. Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work  
my way back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used  
for the node. However, the problem is that it needs to be  
consistent. In this case, ORTE has already used the name  
returned by gethostname to create its session directory  
structure long before mpirun reads a hostfile. This is why 

Re: [OMPI devel] -display-map

2008-12-02 Thread Greg Watson

Ralph, will this be in 1.3rc1?

Thanks,
Greg

On Nov 24, 2008, at 3:06 PM, Greg Watson wrote:


Great, thanks. I'll take a look once it comes over to 1.3.

Cheers,

Greg

On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:


Yo Greg

This is in the trunk as of r20032. I'll bring it over to 1.3 in a  
few days.


I implemented it as another MCA param  
"orte_show_resolved_nodenames" so you can actually get the info as  
you execute the job, if you want. The xml tag is "noderesolve" -  
let me know if you need any changes.


Ralph


On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:


Ralph,

I guess the issue for us is that we will have to run two commands  
to get the information we need. One to get the configuration  
information, such as version and MCA parameters, and one to get  
the host information, whereas it would seem more logical that this  
should all be available via some kind of "configuration discovery"  
command. I understand the issue with supplying the hostfile  
though, so maybe this just points at the need for us to separate  
configuration information from the host information. In any case,  
we'll work with what you think is best.


Greg

On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:

Hmmm...just to be sure we are all clear on this. The reason we  
proposed to use mpirun is that "hostfile" has no meaning outside  
of mpirun. That's why ompi_info can't do anything in this regard.


We have no idea what hostfile the user may specify until we  
actually get the mpirun cmd line. They may have specified a  
default-hostfile, but they could also specify hostfiles for the  
individual app_contexts. These may or may not include the node  
upon which mpirun is executing.


So the only way to provide you with a separate command to get a  
hostfile<->nodename mapping would require you to provide us with  
the default-hostfile and/or hostfile cmd line options just as if  
you were issuing the mpirun cmd. We just wouldn't launch - but it  
would be the exact equivalent of doing "mpirun --do-not-launch".


Am I missing something? If so, please do correct me - I would be  
happy to provide a tool if that would make it easier. Just not  
sure what that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:


Ralph,

It seems a little strange to be using mpirun for this, but  
barring providing a separate command, or using ompi_info, I  
think this would solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for delay - had to ponder this one for awhile.

Jeff and I agree that adding something to ompi_info would not  
be a good idea. Ompi_info has no knowledge or understanding of  
hostfiles, and adding that capability to it would be a major  
distortion of its intended use.


However, we think we can offer an alternative that might better  
solve the problem. Remember, we now treat hostfiles in a very  
different manner than before - see the wiki page for a complete  
description, or "man orte_hosts".


So the problem is that, to provide you with what you want, we  
need to "dump" the information from whatever default-hostfile  
was provided, and, if no default-hostfile was provided, then  
the information from each hostfile that was provided with an  
app_context.


The best way we could think of to do this is to add another  
mpirun cmd line option --dump-hostfiles that would output the  
line-by-line name from the hostfile plus the name we resolved  
it to. Of course, --xml would cause it to be in xml format.


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally and  
don't really see an easy solution. Our problem is that Eclipse  
is not running on the head node, so gethostbyname will not  
necessarily resolve to the same address. For example, the  
hostfile might refer to the head node by an internal network  
address that is not visible to the outside world. Since  
gethostbyname also looks in /etc/hosts, it may resolve locally  
but not on a remote system. The only thing I can think of  
would be, rather than us reading the hostfile directly as we  
do now, to provide an option to ompi_info that would dump the  
hostfile using the same rules that you apply when you're using  
the hostfile. Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work  
my way back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for  
the node. However, the problem is that it needs to be  
consistent. In this case, ORTE has already used the name  
returned by gethostname to create its session directory  
structure long before mpirun reads a hostfile. This is why we  
retain the value from gethostname instead of allowing it to  
be overwritten by the name in whatever allocation we are  

Re: [OMPI devel] -display-map

2008-11-24 Thread Greg Watson

Great, thanks. I'll take a look once it comes over to 1.3.

Cheers,

Greg

On Nov 24, 2008, at 2:59 PM, Ralph Castain wrote:


Yo Greg

This is in the trunk as of r20032. I'll bring it over to 1.3 in a  
few days.


I implemented it as another MCA param "orte_show_resolved_nodenames"  
so you can actually get the info as you execute the job, if you  
want. The xml tag is "noderesolve" - let me know if you need any  
changes.


Ralph


On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:


Ralph,

I guess the issue for us is that we will have to run two commands  
to get the information we need. One to get the configuration  
information, such as version and MCA parameters, and one to get the  
host information, whereas it would seem more logical that this  
should all be available via some kind of "configuration discovery"  
command. I understand the issue with supplying the hostfile though,  
so maybe this just points at the need for us to separate  
configuration information from the host information. In any case,  
we'll work with what you think is best.


Greg

On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:

Hmmm...just to be sure we are all clear on this. The reason we  
proposed to use mpirun is that "hostfile" has no meaning outside  
of mpirun. That's why ompi_info can't do anything in this regard.


We have no idea what hostfile the user may specify until we  
actually get the mpirun cmd line. They may have specified a  
default-hostfile, but they could also specify hostfiles for the  
individual app_contexts. These may or may not include the node  
upon which mpirun is executing.


So the only way to provide you with a separate command to get a  
hostfile<->nodename mapping would require you to provide us with  
the default-hostfile and/or hostfile cmd line options just as if  
you were issuing the mpirun cmd. We just wouldn't launch - but it  
would be the exact equivalent of doing "mpirun --do-not-launch".


Am I missing something? If so, please do correct me - I would be  
happy to provide a tool if that would make it easier. Just not  
sure what that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:


Ralph,

It seems a little strange to be using mpirun for this, but  
barring providing a separate command, or using ompi_info, I think  
this would solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for delay - had to ponder this one for awhile.

Jeff and I agree that adding something to ompi_info would not be  
a good idea. Ompi_info has no knowledge or understanding of  
hostfiles, and adding that capability to it would be a major  
distortion of its intended use.


However, we think we can offer an alternative that might better  
solve the problem. Remember, we now treat hostfiles in a very  
different manner than before - see the wiki page for a complete  
description, or "man orte_hosts".


So the problem is that, to provide you with what you want, we  
need to "dump" the information from whatever default-hostfile  
was provided, and, if no default-hostfile was provided, then the  
information from each hostfile that was provided with an  
app_context.


The best way we could think of to do this is to add another  
mpirun cmd line option --dump-hostfiles that would output the  
line-by-line name from the hostfile plus the name we resolved it  
to. Of course, --xml would cause it to be in xml format.


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally and  
don't really see an easy solution. Our problem is that Eclipse  
is not running on the head node, so gethostbyname will not  
necessarily resolve to the same address. For example, the  
hostfile might refer to the head node by an internal network  
address that is not visible to the outside world. Since  
gethostbyname also looks in /etc/hosts, it may resolve locally  
but not on a remote system. The only thing I can think of would  
be, rather than us reading the hostfile directly as we do now,  
to provide an option to ompi_info that would dump the hostfile  
using the same rules that you apply when you're using the  
hostfile. Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work my  
way back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for  
the node. However, the problem is that it needs to be  
consistent. In this case, ORTE has already used the name  
returned by gethostname to create its session directory  
structure long before mpirun reads a hostfile. This is why we  
retain the value from gethostname instead of allowing it to be  
overwritten by the name in whatever allocation we are given.  
Using the name in hostfile would require that I either find  
some way to remember any prior 

Re: [OMPI devel] -display-map

2008-11-24 Thread Ralph Castain

Yo Greg

This is in the trunk as of r20032. I'll bring it over to 1.3 in a few  
days.


I implemented it as another MCA param "orte_show_resolved_nodenames"  
so you can actually get the info as you execute the job, if you want.  
The xml tag is "noderesolve" - let me know if you need any changes.


Ralph
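
As a sketch of how to turn that on from the command line - standard MCA  
syntax, with the hostfile, process count, and program as placeholders:

   mpirun -mca orte_show_resolved_nodenames 1 --xml --display-map \
          -hostfile my_hosts -n 4 ./my_mpi_app

With the param set, the resolved names come out in noderesolve tags as  
the job executes.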


On Oct 22, 2008, at 11:55 AM, Greg Watson wrote:


Ralph,

I guess the issue for us is that we will have to run two commands to  
get the information we need. One to get the configuration  
information, such as version and MCA parameters, and one to get the  
host information, whereas it would seem more logical that this  
should all be available via some kind of "configuration discovery"  
command. I understand the issue with supplying the hostfile though,  
so maybe this just points at the need for us to separate  
configuration information from the host information. In any case,  
we'll work with what you think is best.


Greg

On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:

Hmmm...just to be sure we are all clear on this. The reason we  
proposed to use mpirun is that "hostfile" has no meaning outside of  
mpirun. That's why ompi_info can't do anything in this regard.


We have no idea what hostfile the user may specify until we  
actually get the mpirun cmd line. They may have specified a default- 
hostfile, but they could also specify hostfiles for the individual  
app_contexts. These may or may not include the node upon which  
mpirun is executing.


So the only way to provide you with a separate command to get a  
hostfile<->nodename mapping would require you to provide us with  
the default-hostfile and/or hostfile cmd line options just as if  
you were issuing the mpirun cmd. We just wouldn't launch - but it  
would be the exact equivalent of doing "mpirun --do-not-launch".


Am I missing something? If so, please do correct me - I would be  
happy to provide a tool if that would make it easier. Just not sure  
what that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:


Ralph,

It seems a little strange to be using mpirun for this, but barring  
providing a separate command, or using ompi_info, I think this  
would solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for delay - had to ponder this one for awhile.

Jeff and I agree that adding something to ompi_info would not be  
a good idea. Ompi_info has no knowledge or understanding of  
hostfiles, and adding that capability to it would be a major  
distortion of its intended use.


However, we think we can offer an alternative that might better  
solve the problem. Remember, we now treat hostfiles in a very  
different manner than before - see the wiki page for a complete  
description, or "man orte_hosts".


So the problem is that, to provide you with what you want, we  
need to "dump" the information from whatever default-hostfile was  
provided, and, if no default-hostfile was provided, then the  
information from each hostfile that was provided with an  
app_context.


The best way we could think of to do this is to add another  
mpirun cmd line option --dump-hostfiles that would output the  
line-by-line name from the hostfile plus the name we resolved it  
to. Of course, --xml would cause it to be in xml format.


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally and  
don't really see an easy solution. Our problem is that Eclipse  
is not running on the head node, so gethostbyname will not  
necessarily resolve to the same address. For example, the  
hostfile might refer to the head node by an internal network  
address that is not visible to the outside world. Since  
gethostbyname also looks in /etc/hosts, it may resolve locally but  
not on a remote system. The only thing I can think of would be,  
rather than us reading the hostfile directly as we do now, to  
provide an option to ompi_info that would dump the hostfile  
using the same rules that you apply when you're using the  
hostfile. Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work my  
way back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for  
the node. However, the problem is that it needs to be  
consistent. In this case, ORTE has already used the name  
returned by gethostname to create its session directory  
structure long before mpirun reads a hostfile. This is why we  
retain the value from gethostname instead of allowing it to be  
overwritten by the name in whatever allocation we are given.  
Using the name in hostfile would require that I either find  
some way to remember any prior name, or that I tear down and  
rebuild the session directory tree - neither seems attractive  
nor simple (e.g., what happens 

Re: [OMPI devel] -display-map

2008-10-22 Thread Greg Watson

Ralph,

I guess the issue for us is that we will have to run two commands to  
get the information we need. One to get the configuration information,  
such as version and MCA parameters, and one to get the host  
information, whereas it would seem more logical that this should all  
be available via some kind of "configuration discovery" command. I  
understand the issue with supplying the hostfile though, so maybe this  
just points at the need for us to separate configuration information  
from the host information. In any case, we'll work with what you think  
is best.


Greg

On Oct 20, 2008, at 4:49 PM, Ralph Castain wrote:

Hmmm...just to be sure we are all clear on this. The reason we  
proposed to use mpirun is that "hostfile" has no meaning outside of  
mpirun. That's why ompi_info can't do anything in this regard.


We have no idea what hostfile the user may specify until we actually  
get the mpirun cmd line. They may have specified a default-hostfile,  
but they could also specify hostfiles for the individual  
app_contexts. These may or may not include the node upon which  
mpirun is executing.


So the only way to provide you with a separate command to get a  
hostfile<->nodename mapping would require you to provide us with the  
default-hostfile and/or hostfile cmd line options just as if you  
were issuing the mpirun cmd. We just wouldn't launch - but it would  
be the exact equivalent of doing "mpirun --do-not-launch".


Am I missing something? If so, please do correct me - I would be  
happy to provide a tool if that would make it easier. Just not sure  
what that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:


Ralph,

It seems a little strange to be using mpirun for this, but barring  
providing a separate command, or using ompi_info, I think this  
would solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for delay - had to ponder this one for awhile.

Jeff and I agree that adding something to ompi_info would not be a  
good idea. Ompi_info has no knowledge or understanding of  
hostfiles, and adding that capability to it would be a major  
distortion of its intended use.


However, we think we can offer an alternative that might better  
solve the problem. Remember, we now treat hostfiles in a very  
different manner than before - see the wiki page for a complete  
description, or "man orte_hosts".


So the problem is that, to provide you with what you want, we need  
to "dump" the information from whatever default-hostfile was  
provided, and, if no default-hostfile was provided, then the  
information from each hostfile that was provided with an  
app_context.


The best way we could think of to do this is to add another mpirun  
cmd line option --dump-hostfiles that would output the line-by-line  
name from the hostfile plus the name we resolved it to. Of  
course, --xml would cause it to be in xml format.


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally and  
don't really see an easy solution. Our problem is that Eclipse is  
not running on the head node, so gethostbyname will not  
necessarily resolve to the same address. For example, the  
hostfile might refer to the head node by an internal network  
address that is not visible to the outside world. Since  
gethostbyname also looks in /etc/hosts, it may resolve locally but  
not on a remote system. The only thing I can think of would be,  
rather than us reading the hostfile directly as we do now, to  
provide an option to ompi_info that would dump the hostfile using  
the same rules that you apply when you're using the hostfile.  
Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work my  
way back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for  
the node. However, the problem is that it needs to be  
consistent. In this case, ORTE has already used the name  
returned by gethostname to create its session directory  
structure long before mpirun reads a hostfile. This is why we  
retain the value from gethostname instead of allowing it to be  
overwritten by the name in whatever allocation we are given.  
Using the name in hostfile would require that I either find some  
way to remember any prior name, or that I tear down and rebuild  
the session directory tree - neither seems attractive nor simple  
(e.g., what happens when the user provides multiple entries in  
the hostfile for the node, each with a different IP address  
based on another interface in that node? Sounds crazy, but we  
have already seen it done - which one do I use?).


2. We don't actually store the hostfile info anywhere - we just  
use it and forget it. For us to add an XML attribute containing  
any 

Re: [OMPI devel] -display-map

2008-10-20 Thread Ralph Castain
Hmmm...just to be sure we are all clear on this. The reason we  
proposed to use mpirun is that "hostfile" has no meaning outside of  
mpirun. That's why ompi_info can't do anything in this regard.


We have no idea what hostfile the user may specify until we actually  
get the mpirun cmd line. They may have specified a default-hostfile,  
but they could also specify hostfiles for the individual app_contexts.  
These may or may not include the node upon which mpirun is executing.


So the only way to provide you with a separate command to get a  
hostfile<->nodename mapping would require you to provide us with the  
default-hostfile and/or hostfile cmd line options just as if you were  
issuing the mpirun cmd. We just wouldn't launch - but it would be the  
exact equivalent of doing "mpirun --do-not-launch".
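
Purely as an illustration of the workflow described above - not a tested
command line, and "myhosts" is just a placeholder for whatever hostfile you
would have passed anyway - that would amount to something along the lines of:

$ mpirun --do-not-launch --hostfile myhosts --display-map hostname

i.e. mpirun reads the hostfile, builds and reports the map, but never
actually launches the executable.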


Am I missing something? If so, please do correct me - I would be happy  
to provide a tool if that would make it easier. Just not sure what  
that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:


Ralph,

It seems a little strange to be using mpirun for this, but barring  
providing a separate command, or using ompi_info, I think this would  
solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for delay - had to ponder this one for awhile.

Jeff and I agree that adding something to ompi_info would not be a  
good idea. Ompi_info has no knowledge or understanding of  
hostfiles, and adding that capability to it would be a major  
distortion of its intended use.


However, we think we can offer an alternative that might better  
solve the problem. Remember, we now treat hostfiles in a very  
different manner than before - see the wiki page for a complete  
description, or "man orte_hosts".


So the problem is that, to provide you with what you want, we need  
to "dump" the information from whatever default-hostfile was  
provided, and, if no default-hostfile was provided, then the  
information from each hostfile that was provided with an app_context.


The best way we could think of to do this is to add another mpirun  
cmd line option --dump-hostfiles that would output the line-by-line  
name from the hostfile plus the name we resolved it to. Of course,  
--xml would cause it to be in xml format.


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally and  
don't really see an easy solution. Our problem is that Eclipse is  
not running on the head node, so gethostbyname will not  
necessarily resolve to the same address. For example, the hostfile  
might refer to the head node by an internal network address that  
is not visible to the outside world. Since gethostbyname also looks  
in /etc/hosts, it may resolve locally but not on a remote system.  
The only thing I can think of would be, rather than us reading the  
hostfile directly as we do now, to provide an option to ompi_info  
that would dump the hostfile using the same rules that you apply  
when you're using the hostfile. Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work my  
way back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for the  
node. However, the problem is that it needs to be consistent. In  
this case, ORTE has already used the name returned by gethostname  
to create its session directory structure long before mpirun  
reads a hostfile. This is why we retain the value from  
gethostname instead of allowing it to be overwritten by the name  
in whatever allocation we are given. Using the name in hostfile  
would require that I either find some way to remember any prior  
name, or that I tear down and rebuild the session directory tree  
- neither seems attractive nor simple (e.g., what happens when  
the user provides multiple entries in the hostfile for the node,  
each with a different IP address based on another interface in  
that node? Sounds crazy, but we have already seen it done - which  
one do I use?).


2. We don't actually store the hostfile info anywhere - we just  
use it and forget it. For us to add an XML attribute containing  
any hostfile-related info would therefore require us to re-read  
the hostfile. I could have it do that -only- in the case of "XML  
output required", but it seems rather ugly.


An alternative might be for you to simply do a "gethostbyname"  
lookup of the IP address or hostname to see if it matches instead  
of just doing a strcmp. This is what we have to do internally as  
we frequently have problems with FQDN vs. non-FQDN vs. IP  
addresses etc. If the local OS hasn't cached the IP address for  
the node in question it can take a little time to DNS resolve it,  
but otherwise works fine.


I can point you to the code in OPAL that we use - I 

Re: [OMPI devel] -display-map

2008-10-19 Thread Greg Watson

Ralph,

It seems a little strange to be using mpirun for this, but barring  
providing a separate command, or using ompi_info, I think this would  
solve our problem.


Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:


Sorry for delay - had to ponder this one for awhile.

Jeff and I agree that adding something to ompi_info would not be a  
good idea. Ompi_info has no knowledge or understanding of hostfiles,  
and adding that capability to it would be a major distortion of its  
intended use.


However, we think we can offer an alternative that might better  
solve the problem. Remember, we now treat hostfiles in a very  
different manner than before - see the wiki page for a complete  
description, or "man orte_hosts".


So the problem is that, to provide you with what you want, we need  
to "dump" the information from whatever default-hostfile was  
provided, and, if no default-hostfile was provided, then the  
information from each hostfile that was provided with an app_context.


The best way we could think of to do this is to add another mpirun  
cmd line option --dump-hostfiles that would output the line-by-line  
name from the hostfile plus the name we resolved it to. Of course,  
--xml would cause it to be in xml format.


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally and  
don't really see an easy solution. Our problem is that Eclipse is  
not running on the head node, so gethostbyname will not necessarily  
resolve to the same address. For example, the hostfile might refer  
to the head node by an internal network address that is not visible  
to the outside world. Since gethostbyname also looks in /etc/hosts,  
it may resolve locally but not on a remote system. The only thing I  
can think of would be, rather than us reading the hostfile directly  
as we do now, to provide an option to ompi_info that would dump the  
hostfile using the same rules that you apply when you're using the  
hostfile. Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work my way  
back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for the  
node. However, the problem is that it needs to be consistent. In  
this case, ORTE has already used the name returned by gethostname  
to create its session directory structure long before mpirun reads  
a hostfile. This is why we retain the value from gethostname  
instead of allowing it to be overwritten by the name in whatever  
allocation we are given. Using the name in hostfile would require  
that I either find some way to remember any prior name, or that I  
tear down and rebuild the session directory tree - neither seems  
attractive nor simple (e.g., what happens when the user provides  
multiple entries in the hostfile for the node, each with a  
different IP address based on another interface in that node?  
Sounds crazy, but we have already seen it done - which one do I  
use?).


2. We don't actually store the hostfile info anywhere - we just  
use it and forget it. For us to add an XML attribute containing  
any hostfile-related info would therefore require us to re-read  
the hostfile. I could have it do that -only- in the case of "XML  
output required", but it seems rather ugly.


An alternative might be for you to simply do a "gethostbyname"  
lookup of the IP address or hostname to see if it matches instead  
of just doing a strcmp. This is what we have to do internally as  
we frequently have problems with FQDN vs. non-FQDN vs. IP  
addresses etc. If the local OS hasn't cached the IP address for  
the node in question it can take a little time to DNS resolve it,  
but otherwise works fine.


I can point you to the code in OPAL that we use - I would think  
something similar would be easy to implement in your code and  
would readily solve the problem.
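
A minimal sketch of the kind of lookup being suggested here - this is not
the OPAL code referred to above, just a plain gethostbyname()-based
comparison, and a fuller version would compare every address in both
h_addr_list arrays to cope with multi-homed nodes:

#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Return 1 if the two names resolve to the same primary IPv4 address,
 * falling back to a plain string compare if resolution fails. */
static int same_host(const char *hostfile_name, const char *reported_name)
{
    struct in_addr a, b;
    struct hostent *he;

    he = gethostbyname(hostfile_name);
    if (he == NULL || he->h_addrtype != AF_INET)
        return 0 == strcmp(hostfile_name, reported_name);
    /* gethostbyname() returns static storage, so copy the first
     * address out before the second lookup overwrites it. */
    memcpy(&a, he->h_addr_list[0], sizeof(a));

    he = gethostbyname(reported_name);
    if (he == NULL || he->h_addrtype != AF_INET)
        return 0 == strcmp(hostfile_name, reported_name);
    memcpy(&b, he->h_addr_list[0], sizeof(b));

    return a.s_addr == b.s_addr;
}

int main(void)
{
    /* e.g. compare a short hostfile entry against the FQDN shown in the map */
    printf("%d\n", same_host("node01", "node01.example.com"));
    return 0;
}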


Ralph

On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:


Ralph,

The problem we're seeing is just with the head node. If I specify  
a particular IP address for the head node in the hostfile, it  
gets changed to the FQDN when displayed in the map. This is a  
problem for us as we need to be able to match the two, and since  
we're not necessarily running on the head node, we can't always  
do the same resolution you're doing.


Would it be possible to use the same address that is specified in  
the hostfile, or alternatively provide an XML attribute that  
contains this information?


Thanks,

Greg

On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:

Not in that regard, depending upon what you mean by "recently".  
The only changes I am aware of wrt nodes consisted of some  
changes to the order in which we use the nodes when specified by  
hostfile or -host, and a little #if protectionism needed by  
Brian for the Cray 

Re: [OMPI devel] -display-map

2008-10-17 Thread Ralph Castain

Sorry for delay - had to ponder this one for awhile.

Jeff and I agree that adding something to ompi_info would not be a  
good idea. Ompi_info has no knowledge or understanding of hostfiles,  
and adding that capability to it would be a major distortion of its  
intended use.


However, we think we can offer an alternative that might better solve  
the problem. Remember, we now treat hostfiles in a very different  
manner than before - see the wiki page for a complete description, or  
"man orte_hosts".


So the problem is that, to provide you with what you want, we need to  
"dump" the information from whatever default-hostfile was provided,  
and, if no default-hostfile was provided, then the information from  
each hostfile that was provided with an app_context.


The best way we could think of to do this is to add another mpirun cmd  
line option --dump-hostfiles that would output the line-by-line name  
from the hostfile plus the name we resolved it to. Of course, --xml  
would cause it to be in xml format.
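
Purely for illustration - since the option is only being proposed here, its
output format is not defined anywhere - the line-by-line name/resolved-name
listing described above might look roughly like this for a made-up hostfile:

$ cat myhosts
headnode slots=2
192.168.1.10 slots=4

$ mpirun --hostfile myhosts --dump-hostfiles
headnode       headnode.example.com
192.168.1.10   n01.example.com

With --xml the same name pairs would presumably appear wrapped in tags
instead.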


Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:


Hi Ralph,

We've been discussing this back and forth a bit internally and don't  
really see an easy solution. Our problem is that Eclipse is not  
running on the head node, so gethostbyname will not necessarily  
resolve to the same address. For example, the hostfile might refer  
to the head node by an internal network address that is not visible  
to the outside world. Since gethostbyname also looks in /etc/hosts, it  
may resolve locally but not on a remote system. The only thing I can  
think of would be, rather than us reading the hostfile directly as  
we do now, to provide an option to ompi_info that would dump the  
hostfile using the same rules that you apply when you're using the  
hostfile. Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work my way  
back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for the  
node. However, the problem is that it needs to be consistent. In  
this case, ORTE has already used the name returned by gethostname  
to create its session directory structure long before mpirun reads  
a hostfile. This is why we retain the value from gethostname  
instead of allowing it to be overwritten by the name in whatever  
allocation we are given. Using the name in hostfile would require  
that I either find some way to remember any prior name, or that I  
tear down and rebuild the session directory tree - neither seems  
attractive nor simple (e.g., what happens when the user provides  
multiple entries in the hostfile for the node, each with a  
different IP address based on another interface in that node?  
Sounds crazy, but we have already seen it done - which one do I  
use?).


2. We don't actually store the hostfile info anywhere - we just use  
it and forget it. For us to add an XML attribute containing any  
hostfile-related info would therefore require us to re-read the  
hostfile. I could have it do that -only- in the case of "XML output  
required", but it seems rather ugly.


An alternative might be for you to simply do a "gethostbyname"  
lookup of the IP address or hostname to see if it matches instead  
of just doing a strcmp. This is what we have to do internally as we  
frequently have problems with FQDN vs. non-FQDN vs. IP addresses  
etc. If the local OS hasn't cached the IP address for the node in  
question it can take a little time to DNS resolve it, but otherwise  
works fine.


I can point you to the code in OPAL that we use - I would think  
something similar would be easy to implement in your code and would  
readily solve the problem.


Ralph

On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:


Ralph,

The problem we're seeing is just with the head node. If I specify  
a particular IP address for the head node in the hostfile, it gets  
changed to the FQDN when displayed in the map. This is a problem  
for us as we need to be able to match the two, and since we're not  
necessarily running on the head node, we can't always do the same  
resolution you're doing.


Would it be possible to use the same address that is specified in  
the hostfile, or alternatively provide an XML attribute that  
contains this information?


Thanks,

Greg

On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:

Not in that regard, depending upon what you mean by "recently".  
The only changes I am aware of wrt nodes consisted of some  
changes to the order in which we use the nodes when specified by  
hostfile or -host, and a little #if protectionism needed by Brian  
for the Cray port.


Are you seeing this for every node? Reason I ask: I can't offhand  
think of anything in the code base that would replace a host name  
with the FQDN because we don't get that info for remote nodes.  
The only exception is the head 

Re: [OMPI devel] -display-map

2008-10-15 Thread Greg Watson

Hi Ralph,

We've been discussing this back and forth a bit internally and don't  
really see an easy solution. Our problem is that Eclipse is not  
running on the head node, so gethostbyname will not necessarily  
resolve to the same address. For example, the hostfile might refer to  
the head node by an internal network address that is not visible to  
the outside world. Since gethostbyname also looks in /etc/hosts, it may  
resolve locally but not on a remote system. The only thing I can think  
of would be, rather than us reading the hostfile directly as we do  
now, to provide an option to ompi_info that would dump the hostfile  
using the same rules that you apply when you're using the hostfile.  
Would that be feasible?


Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:

Sorry for delay - was on vacation and am now trying to work my way  
back to the surface.


I'm not sure I can fix this one for two reasons:

1. In general, OMPI doesn't really care what name is used for the  
node. However, the problem is that it needs to be consistent. In  
this case, ORTE has already used the name returned by gethostname to  
create its session directory structure long before mpirun reads a  
hostfile. This is why we retain the value from gethostname instead  
of allowing it to be overwritten by the name in whatever allocation  
we are given. Using the name in hostfile would require that I either  
find some way to remember any prior name, or that I tear down and  
rebuild the session directory tree - neither seems attractive nor  
simple (e.g., what happens when the user provides multiple entries  
in the hostfile for the node, each with a different IP address based  
on another interface in that node? Sounds crazy, but we have already  
seen it done - which one do I use?).


2. We don't actually store the hostfile info anywhere - we just use  
it and forget it. For us to add an XML attribute containing any  
hostfile-related info would therefore require us to re-read the  
hostfile. I could have it do that -only- in the case of "XML output  
required", but it seems rather ugly.


An alternative might be for you to simply do a "gethostbyname"  
lookup of the IP address or hostname to see if it matches instead of  
just doing a strcmp. This is what we have to do internally as we  
frequently have problems with FQDN vs. non-FQDN vs. IP addresses  
etc. If the local OS hasn't cached the IP address for the node in  
question it can take a little time to DNS resolve it, but otherwise  
works fine.


I can point you to the code in OPAL that we use - I would think  
something similar would be easy to implement in your code and would  
readily solve the problem.


Ralph

On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:


Ralph,

The problem we're seeing is just with the head node. If I specify a  
particular IP address for the head node in the hostfile, it gets  
changed to the FQDN when displayed in the map. This is a problem  
for us as we need to be able to match the two, and since we're not  
necessarily running on the head node, we can't always do the same  
resolution you're doing.


Would it be possible to use the same address that is specified in  
the hostfile, or alternatively provide an XML attribute that  
contains this information?


Thanks,

Greg

On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:

Not in that regard, depending upon what you mean by "recently".  
The only changes I am aware of wrt nodes consisted of some changes  
to the order in which we use the nodes when specified by hostfile  
or -host, and a little #if protectionism needed by Brian for the  
Cray port.


Are you seeing this for every node? Reason I ask: I can't offhand  
think of anything in the code base that would replace a host name  
with the FQDN because we don't get that info for remote nodes. The  
only exception is the head node (where mpirun sits) - in that lone  
case, we default to the name returned to us by gethostname(). We  
do that because the head node is frequently accessible on a more  
global basis than the compute nodes - thus, the FQDN is required  
to ensure that there is no address confusion on the network.


If the user refers to compute nodes in a hostfile or -host (or in  
an allocation from a resource manager) by non-FQDN, we just assume  
they know what they are doing and the name will correctly resolve  
to a unique address.



On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:


Hi,

Has the behavior of the -display-map option changed recently in the  
1.3 branch? We're now seeing  
the host name as a fully resolved DN rather than the entry that  
was specified in the hostfile. Is there any particular reason for  
this? If so, would it be possible to add the hostfile entry to  
the output since we need to be able to match the two?


Thanks,

Greg
___
devel mailing list
de...@open-mpi.org

Re: [OMPI devel] -display-map and mpi_spawn

2008-09-22 Thread Ralph Castain
We always output the entire map, so you'll see the parent procs as  
well as the child



On Sep 16, 2008, at 12:52 PM, Greg Watson wrote:


Hi Ralph,

No I'm happy to get a map at the beginning and at every spawn. Do  
you send the whole map again, or only an update?


Regards,

Greg

On Sep 11, 2008, at 9:09 AM, Ralph Castain wrote:

It already somewhat does. If you use --display-map at mpirun, you  
automatically get display-map whenever MPI_Spawn is called.


We didn't provide a mechanism by which you could only display-map  
for MPI_Spawn (and not for the original mpirun), but it would be  
trivial to do so - just have to define an info-key for that  
purpose. Is that what you need?



On Sep 11, 2008, at 5:35 AM, Greg Watson wrote:


Ralph,

At the moment -display-map shows the process mapping when mpirun  
first starts, but I'm wondering about processes created  
dynamically. Would it be possible to trigger a map update when  
MPI_Spawn is called?


Regards,

Greg
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] -display-map

2008-09-19 Thread Greg Watson

Ralph,

The problem we're seeing is just with the head node. If I specify a  
particular IP address for the head node in the hostfile, it gets  
changed to the FQDN when displayed in the map. This is a problem for  
us as we need to be able to match the two, and since we're not  
necessarily running on the head node, we can't always do the same  
resolution you're doing.


Would it be possible to use the same address that is specified in the  
hostfile, or alternatively provide an XML attribute that contains this  
information?


Thanks,

Greg

On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:

Not in that regard, depending upon what you mean by "recently". The  
only changes I am aware of wrt nodes consisted of some changes to  
the order in which we use the nodes when specified by hostfile or  
-host, and a little #if protectionism needed by Brian for the Cray  
port.


Are you seeing this for every node? Reason I ask: I can't offhand  
think of anything in the code base that would replace a host name  
with the FQDN because we don't get that info for remote nodes. The  
only exception is the head node (where mpirun sits) - in that lone  
case, we default to the name returned to us by gethostname(). We do  
that because the head node is frequently accessible on a more global  
basis than the compute nodes - thus, the FQDN is required to ensure  
that there is no address confusion on the network.


If the user refers to compute nodes in a hostfile or -host (or in an  
allocation from a resource manager) by non-FQDN, we just assume they  
know what they are doing and the name will correctly resolve to a  
unique address.



On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:


Hi,

Has the behavior of the -display-map option changed recently in the  
1.3 branch? We're now seeing the host  
name as a fully resolved DN rather than the entry that was  
specified in the hostfile. Is there any particular reason for this?  
If so, would it be possible to add the hostfile entry to the output  
since we need to be able to match the two?


Thanks,

Greg
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] -display-map and mpi_spawn

2008-09-16 Thread Roland Dreier
 > thanks, applied

oops, replied to the wrong message ;)


Re: [OMPI devel] -display-map and mpi_spawn

2008-09-16 Thread Roland Dreier
thanks, applied


Re: [OMPI devel] -display-map and mpi_spawn

2008-09-16 Thread Greg Watson

Hi Ralph,

No I'm happy to get a map at the beginning and at every spawn. Do you  
send the whole map again, or only an update?


Regards,

Greg

On Sep 11, 2008, at 9:09 AM, Ralph Castain wrote:

It already somewhat does. If you use --display-map at mpirun, you  
automatically get display-map whenever MPI_Spawn is called.


We didn't provide a mechanism by which you could only display-map  
for MPI_Spawn (and not for the original mpirun), but it would be  
trivial to do so - just have to define an info-key for that purpose.  
Is that what you need?



On Sep 11, 2008, at 5:35 AM, Greg Watson wrote:


Ralph,

At the moment -display-map shows the process mapping when mpirun  
first starts, but I'm wondering about processes created  
dynamically. Would it be possible to trigger a map update when  
MPI_Spawn is called?


Regards,

Greg
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] -display-map and mpi_spawn

2008-09-11 Thread Ralph Castain
It already somewhat does. If you use --display-map at mpirun, you  
automatically get display-map whenever MPI_Spawn is called.


We didn't provide a mechanism by which you could only display-map for  
MPI_Spawn (and not for the original mpirun), but it would be trivial  
to do so - just have to define an info-key for that purpose. Is that  
what you need?
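
A sketch of what the caller side might look like if such an info key were
ever defined - the key name "ompi_display_map" is invented here purely for
illustration and does not exist in Open MPI:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Info info;
    int errcodes[2];

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* HYPOTHETICAL key - one would have to be defined as discussed above */
    MPI_Info_set(info, "ompi_display_map", "true");

    /* spawn two copies of ./child and request a map display for this spawn */
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 2, info,
                   0, MPI_COMM_SELF, &intercomm, errcodes);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}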



On Sep 11, 2008, at 5:35 AM, Greg Watson wrote:


Ralph,

At the moment -display-map shows the process mapping when mpirun  
first starts, but I'm wondering about processes created dynamically.  
Would it be possible to trigger a map update when MPI_Spawn is called?


Regards,

Greg
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] -display-map

2008-09-11 Thread Ralph Castain
Not in that regard, depending upon what you mean by "recently". The  
only changes I am aware of wrt nodes consisted of some changes to the  
order in which we use the nodes when specified by hostfile or -host,  
and a little #if protectionism needed by Brian for the Cray port.


Are you seeing this for every node? Reason I ask: I can't offhand  
think of anything in the code base that would replace a host name with  
the FQDN because we don't get that info for remote nodes. The only  
exception is the head node (where mpirun sits) - in that lone case, we  
default to the name returned to us by gethostname(). We do that  
because the head node is frequently accessible on a more global basis  
than the compute nodes - thus, the FQDN is required to ensure that  
there is no address confusion on the network.


If the user refers to compute nodes in a hostfile or -host (or in an  
allocation from a resource manager) by non-FQDN, we just assume they  
know what they are doing and the name will correctly resolve to a  
unique address.
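
As a concrete (made-up) example of the mismatch being discussed: given a
hostfile such as

192.168.1.1  slots=2
node01       slots=4

where 192.168.1.1 is the internal address of the head node, the map would
show the first node as something like headnode.example.com, because mpirun
substitutes the gethostname() result for the node it is running on, while
node01 would appear exactly as written.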



On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:


Hi,

Has the behavior of the -display-map option changed recently in the  
1.3 branch? We're now seeing the host  
name as a fully resolved DN rather than the entry that was specified  
in the hostfile. Is there any particular reason for this? If so,  
would it be possible to add the hostfile entry to the output since  
we need to be able to match the two?


Thanks,

Greg
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] -display-map and mpi_spawn

2008-09-11 Thread Greg Watson

Ralph,

At the moment -display-map shows the process mapping when mpirun first  
starts, but I'm wondering about processes created dynamically. Would  
it be possible to trigger a map update when MPI_Spawn is called?


Regards,

Greg


[OMPI devel] -display-map

2008-09-10 Thread Greg Watson

Hi,

Has the behavior of the -display-map option changed recently in the  
1.3 branch? We're now seeing the host name as  
a fully resolved DN rather than the entry that was specified in the  
hostfile. Is there any particular reason for this? If so, would it be  
possible to add the hostfile entry to the output since we need to be  
able to match the two?


Thanks,

Greg


[OMPI devel] Display map and allocation

2008-09-04 Thread Ralph Castain

Hi folks

I am giving a series of talks here about OMPI 1.3, beginning with a  
description of the user-oriented features - i.e., cmd line options,  
etc. In working on the presentation, and showing a draft to some  
users, questions arose about two options: --display-map and  
--display-allocation. To be fair, Greg Watson had raised similar questions before.


The questions revolve around the fact that the data provided by those  
options contains a lot of stuff that, while immensely useful to an  
OMPI developer, is of no use to a user and actually causes confusion.  
What we propose, therefore, is to revise these options:


--display-map: displays a list of nodes, to include node name and  
state and a list of the procs on that node. For each proc,  show the  
MPI rank, local and node ranks, any slot list for that proc (if  
given), and state.


--display-allocation: displays a list of nodes to include node name,  
slot info, username (if given), and state ("unknown" if not known)
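
To make that concrete, here is a mock-up of the kind of trimmed-down listing
being proposed - this is not a committed format, and the node names and
values are invented:

Data for node: headnode.example.com   State: UP
   Rank 0   local rank 0   node rank 0   slot list: N/A   state: RUNNING
   Rank 1   local rank 1   node rank 1   slot list: N/A   state: RUNNING

Allocation for node: headnode.example.com   slots=2   username: user1   state: UP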


We would then add two new options that show the broader output we have  
today: --debug-display-map, and --debug-display-allocation.


Anybody have heartburn and/or comments on this? If not, I plan to make  
the change by the end of the week.


Ralph