Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Nathan DeBardeleben
I'm running this on my Mac, where I expected to get back only
localhost.  I upgraded to 1.0.2 a little while back.  Until then I had
been using one of the alphas (I think it was alpha 9, but I can't be
sure), and with that version this function returned '1' on my Mac.


-- Nathan
Correspondence
-
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
-



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

  
  


  


Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
Rc=0 indicates that the "get" function was successful, so this means that
there were no nodes on the NODE_SEGMENT. Were you running this in an
environment where nodes had been allocated to you? Or were you expecting to
find only "localhost" on the segment?
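
The distinction drawn above (a successful call that finds nothing versus a
failed call) is easy to lose when both cases come back as 0, as they do in the
get_num_nodes function under discussion. A minimal sketch of one way to keep
them apart follows. It does not use the ORTE API: lookup_fn, LOOKUP_OK,
LOOKUP_ERR and the three mock lookups are hypothetical stand-ins invented
purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; stand-ins, not ORTE constants. */
#define LOOKUP_OK   0
#define LOOKUP_ERR (-1)

/* Hypothetical lookup signature: returns a status, stores a count in *cnt. */
typedef int (*lookup_fn)(size_t *cnt);

/* Mock lookups covering the three cases of interest. */
int lookup_fails(size_t *cnt) { (void)cnt; return LOOKUP_ERR; }
int lookup_empty(size_t *cnt) { *cnt = 0;  return LOOKUP_OK;  }
int lookup_one(size_t *cnt)   { *cnt = 1;  return LOOKUP_OK;  }

/*
 * Returns the lookup status and, only on success, stores the node count
 * in *num_nodes. A failed lookup leaves *num_nodes untouched, so the
 * caller can tell "error" apart from "succeeded, zero nodes".
 */
int get_num_nodes_checked(lookup_fn lookup, size_t *num_nodes)
{
    size_t cnt = 0;
    int rc;

    assert(lookup != NULL && num_nodes != NULL);

    rc = lookup(&cnt);
    if (rc != LOOKUP_OK) {
        return rc;              /* propagate the failure instead of 0 */
    }

    *num_nodes = cnt;           /* may legitimately be 0 */
    return LOOKUP_OK;
}
```

With this shape a caller can report "lookup failed" separately from "no nodes
on the segment", which is the rc=0 case described above.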

I'm not entirely sure, but I don't believe there have been significant
changes in 1.0.2 for some time. My guess is that something has changed on
your system rather than in the Open MPI code you're using. Did you do an
update recently and then begin seeing this behavior? Your revision level is
1000+ behind the current repository, so I assume you haven't updated for a
while. Since 1.0.2 is under maintenance for bug fixes only, that shouldn't
be a problem. I'm just trying to understand why your function is behaving
differently if the Open MPI code you're using hasn't changed.

Ralph



On 7/5/06 2:40 PM, "Nathan DeBardeleben"  wrote:

> Open MPI: 1.0.2
> Open MPI SVN revision: r9571
>
> The rc value returned by the 'get' call is '0'.
> All I'm doing is calling init with my own daemon name, it's coming up
> fine, then I immediately call this to figure out how many nodes are
> associated with this machine.
> 




Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
Hi Nathan

Could you tell us which version of the code you are using, and print out the
rc value that was returned by the "get" call? I see nothing obviously wrong
with the code, but much depends on what happened prior to this call too.

BTW: you might want to release the memory stored in the returned values - it
could represent a substantial memory leak.
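
A rough sketch of the cleanup pattern suggested here: whoever receives the
array of values is responsible for releasing each entry and then the array
itself. The code below is plain C built on hypothetical stand-ins
(fake_value_t, fake_get, live_values, count_and_release). It does not call
the real registry, and the actual release mechanism for orte_gpr_value_t
objects should be checked against the headers in your own tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a registry value; not orte_gpr_value_t. */
typedef struct { int payload; } fake_value_t;

/* Counts outstanding value allocations so a leak would be visible. */
int live_values = 0;

/* Mock "get": on success hands the caller n heap-allocated values. */
int fake_get(size_t n, size_t *cnt, fake_value_t ***values)
{
    size_t i;
    fake_value_t **v;

    *cnt = 0;
    *values = NULL;
    if (n == 0) {
        return 0;                       /* success, nothing found */
    }

    v = malloc(n * sizeof *v);
    if (v == NULL) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        v[i] = malloc(sizeof **v);
        if (v[i] == NULL) {             /* undo the partial allocation */
            while (i > 0) { free(v[--i]); live_values--; }
            free(v);
            return -1;
        }
        v[i]->payload = (int)i;
        live_values++;
    }

    *cnt = n;
    *values = v;
    return 0;
}

/* Counts the entries, then releases everything the lookup returned. */
int count_and_release(size_t n_available, size_t *num_nodes)
{
    size_t cnt = 0, i;
    fake_value_t **values = NULL;
    int rc;

    assert(num_nodes != NULL);

    rc = fake_get(n_available, &cnt, &values);
    if (rc != 0) {
        return rc;
    }

    *num_nodes = cnt;

    for (i = 0; i < cnt; i++) {         /* release each returned entry */
        free(values[i]);
        live_values--;
    }
    free(values);                       /* then the array; free(NULL) is a no-op */
    return 0;
}
```

The point is only the ownership rule: the count is copied out first, and
nothing returned by the lookup outlives the function.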

Ralph







[OMPI devel] Getting the number of nodes

2006-07-05 Thread Nathan DeBardeleben
I used to use this code to get the number of nodes in a cluster / 
machine / whatever:

int
get_num_nodes(void)
{
    int rc;
    size_t cnt;
    orte_gpr_value_t **values;

    /* Fetch every entry on the node segment; cnt receives the entry count. */
    rc = orte_gpr.get(ORTE_GPR_KEYS_OR|ORTE_GPR_TOKENS_OR,
                      ORTE_NODE_SEGMENT, NULL, NULL, &cnt, &values);

    if(rc != ORTE_SUCCESS) {
        return 0;
    }

    return (int)cnt;
}

This now returns '0' on my Mac, where it used to return '1'.  Is this not an
acceptable way of doing this?  Is there a cleaner / better way these days?


-- Nathan