Re: [OMPI devel] orted problem

2006-07-05 Thread Ralph H Castain
This has been around for a very long time (at least a year, if memory serves correctly). The problem is that the system "hangs" while trying to flush the I/O buffers through the RML because it loses its connection to the head node process (for 1.x, that's basically mpirun) - but the "flush" procedure do

[OMPI devel] Getting the number of nodes

2006-07-05 Thread Nathan DeBardeleben
I used to use this code to get the number of nodes in a cluster / machine / whatever: int get_num_nodes(void) { int rc; size_t cnt; orte_gpr_value_t **values; rc = orte_gpr.get(ORTE_GPR_KEYS_OR|ORTE_GPR_TOKENS_OR, ORTE_NODE_SEGMENT, NULL, NULL, &cnt,

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
Hi Nathan, could you tell us which version of the code you are using, and print out the rc value that was returned by the "get" call? I see nothing obviously wrong with the code, but much depends on what happened prior to this call too. BTW: you might want to release the memory stored in the retur

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Nathan DeBardeleben
Open MPI: 1.0.2, Open MPI SVN revision: r9571. The rc value returned by the 'get' call is '0'. All I'm doing is calling init with my own daemon name; it comes up fine, and then I immediately call this to figure out how many nodes are associated with this machine.

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
Rc=0 indicates that the "get" function was successful, so this means that there were no nodes on the NODE_SEGMENT. Were you running this in an environment where nodes had been allocated to you? Or were you expecting to find only "localhost" on the segment? I'm not entirely sure, but I don't believ

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Nathan DeBardeleben
I'm running this on my Mac, where I expected to get back only the localhost. I upgraded to 1.0.2 a little while back; until then I had been using one of the alphas (I think it was alpha 9, but I can't be sure), and with it this function returned '1' on my Mac.

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Josh Hursey
I agree with Ralph, this code should work fine (we do this internally in orte_ras_base_node_query()). You may try adding a 'dump' of the GPR to make sure that the node segment has information on it. Add a call like the following to your function: orte_gpr.dup_segment(NULL); or better yet ort