Yes, that's fine. Thx!
On Aug 24, 2010, at 9:02 AM, Philippe wrote:
> awesome, I'll give it a spin! with the parameters as below?
>
> p.
>
> On Tue, Aug 24, 2010 at 10:47 AM, Ralph Castain wrote:
>> I think I have this working now - try anything on or after r23647
>>
>>
I think I have this working now - try anything on or after r23647
On Aug 23, 2010, at 1:36 PM, Philippe wrote:
Sure. I took a guess at ppn and nodes for the case where 2 processes
are on the same node... I don't claim these are the right values ;-)
c0301b10e1 ~/mpi> env|grep OMPI
OMPI_MCA_orte_nodes=c0301b10e1
OMPI_MCA_orte_rank=0
OMPI_MCA_orte_ppn=2
OMPI_MCA_orte_num_procs=2
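For reference, a launcher that cannot use mpiexec could export these four variables for each process before exec'ing the MPI binary. Below is a minimal sketch in C. It assumes the generic ESS module reads exactly the OMPI_MCA_orte_* names listed above. The helper set_generic_ess_env() is hypothetical. The meaning of each value (node list, global rank, processes per node, total processes) is inferred from this thread rather than taken from documentation, and the node list string is passed through unchanged.

```c
#define _XOPEN_SOURCE 700   /* expose setenv() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format an int and export it as an environment variable.
 * Returns 0 on success, -1 if setenv() fails. */
static int setenv_int(const char *name, int value)
{
    char buf[32];
    int n = snprintf(buf, sizeof buf, "%d", value);
    assert(n > 0 && (size_t)n < sizeof buf); /* any int fits in 32 chars */
    (void)n;  /* avoid an unused-variable warning when NDEBUG is set */
    return setenv(name, buf, 1) == 0 ? 0 : -1;
}

/* Hypothetical helper: export the variables the generic ESS module is
 * assumed to read for one process. Variable names come from this thread;
 * their exact semantics are an assumption. Returns 0 on success, -1 on error. */
int set_generic_ess_env(const char *nodes, int rank, int ppn, int num_procs)
{
    /* Reject obviously inconsistent values before touching the environment. */
    if (nodes == NULL || strlen(nodes) == 0 || ppn <= 0 || num_procs <= 0 ||
        rank < 0 || rank >= num_procs)
        return -1;

    if (setenv("OMPI_MCA_orte_nodes", nodes, 1) != 0)
        return -1;
    if (setenv_int("OMPI_MCA_orte_rank", rank) != 0 ||
        setenv_int("OMPI_MCA_orte_ppn", ppn) != 0 ||
        setenv_int("OMPI_MCA_orte_num_procs", num_procs) != 0)
        return -1;
    return 0;
}
```

For the two-process, single-node layout above, the two environments would differ only in OMPI_MCA_orte_rank (0 or 1). After the call, an exec-family function such as execvp() passes the updated environment on to the MPI program.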
Can you send me the values you are using for the relevant envars? That way I
can try to replicate it here.
On Aug 23, 2010, at 1:15 PM, Philippe wrote:
I took a look at the code, but I'm afraid I don't see anything wrong.
p.
On Thu, Aug 19, 2010 at 2:32 PM, Ralph Castain wrote:
Yes, that is correct - we reserve the first port in the range for a daemon,
should one exist.
The problem is clearly that get_node_rank is returning the wrong value for
the second process (your rank=1). If you want to dig deeper, look at the
orte/mca/ess/generic code where it generates the nidmap.
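Here is the expected result in concrete terms. With the environment Philippe posted (one node, ppn=2, two processes), rank 0 should get node rank 0 and rank 1 should get node rank 1. Below is a minimal sketch of that mapping in C. It assumes ranks are placed on nodes in contiguous blocks of ppn, so rank r lands on node r / ppn with node-local rank r % ppn. This is an illustration, not the actual orte/mca/ess/generic nidmap code, and place_rank() and proc_placement are hypothetical names.

```c
#include <assert.h>

/* Placement of one process under the assumed block mapping. */
typedef struct {
    int node;       /* index into the node list */
    int node_rank;  /* rank among processes on that node (0 = lowest) */
} proc_placement;

/* Hypothetical: ranks 0..ppn-1 go to node 0, ppn..2*ppn-1 to node 1, etc. */
proc_placement place_rank(int rank, int ppn)
{
    assert(rank >= 0 && ppn > 0);
    proc_placement p = { rank / ppn, rank % ppn };
    return p;
}
```

Under this assumption, rank 1 in that layout must report node rank 1. If it came back as 0 instead, both processes would compete for the same slot in the port range.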
Ralph,
Somewhere in ./orte/mca/oob/tcp/oob_tcp.c, there is this comment:
orte_node_rank_t nrank;
/* do I know my node_local_rank yet? */
if (ORTE_NODE_RANK_INVALID != (nrank =
orte_ess.get_node_rank(ORTE_PROC_MY_NAME)) &&
Something doesn't look right - here is what the algo attempts to do:
given a port range of 10000-12000, the lowest-ranked process on the node
should open port 10000. The next lowest rank on the node will open 10001,
etc.
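The port assignment described above amounts to an offset from the start of the range. Below is a minimal sketch in C. It assumes the offset is simply the node-local rank, and that one slot is skipped when a daemon shares the node (per the earlier remark that the first port in the range is reserved for a daemon, should one exist). pick_port() and its parameters are hypothetical, not the oob_tcp.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: choose a TCP port for a process from [range_min, range_max].
 * Slot 0 is left for a local daemon when one exists; otherwise the lowest
 * node rank takes range_min. Returns -1 if the range is exhausted. */
int pick_port(int range_min, int range_max, int node_rank, bool have_daemon)
{
    assert(range_min > 0 && range_min <= range_max && node_rank >= 0);
    long port = (long)range_min + node_rank + (have_daemon ? 1 : 0);
    return port <= range_max ? (int)port : -1;
}
```

For example, pick_port(10000, 12000, 1, false) gives 10001, matching "the next lowest rank on the node will open 10001". Two processes that both report node rank 0 would both ask for the same port.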
So it looks to me like there is some confusion in the local rank algo. I'll
Ralph,
I'm able to use the generic module when the processes are on different machines.
What would the values of the EVs be when two processes are on the same
machine (hopefully talking over SHM)?
I've played with combinations of nodelist and ppn but had no luck. I get errors like:
Use what hostname returns - don't worry about IP addresses as we'll discover
them.
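A launcher can obtain that name with the POSIX gethostname() call. On Linux, the hostname command reports the same value. Below is a minimal sketch, assuming the result is used verbatim as the OMPI_MCA_orte_nodes value, short or fully qualified, whichever the system reports. node_name_for_orte() is a hypothetical helper.

```c
#define _XOPEN_SOURCE 700   /* expose gethostname() on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: copy the system host name into buf.
 * Returns 0 on success, -1 on failure or an empty name. */
int node_name_for_orte(char *buf, size_t len)
{
    assert(buf != NULL && len > 1);
    if (gethostname(buf, len) != 0)
        return -1;
    /* POSIX does not guarantee NUL-termination if the name was truncated. */
    buf[len - 1] = '\0';
    return buf[0] != '\0' ? 0 : -1;
}
```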
On Jul 26, 2010, at 10:45 PM, Philippe wrote:
> Thanks a lot!
>
> now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our
> nodes have a short/long name (it's rhel 5.x, so the command hostname
>
Doh - yes it should! I'll fix it right now.
Thanks!
On Jul 26, 2010, at 9:28 PM, Philippe wrote:
Ralph,
I was able to test the generic module and it seems to be working.
One question, though: the function orte_ess_generic_component_query in
"orte/mca/ess/generic/ess_generic_component.c" calls getenv with the
argument "OMPI_MCA_enc", which seems to cause the module to fail to
load. Shouldn't it
Dev trunk looks okay right now - I think you'll be fine using it. My new
component -might- work with 1.5, but probably not with 1.4. I haven't checked
either of them.
Anything at r23478 or above will have the new module. Let me know how it works
for you. I haven't tested it myself, but am
Ralph,
Thank you so much!!
I'll give it a try and let you know.
I know it's a tough question, but how stable is the dev trunk? Can I
just grab the latest and run, or am I better off taking your changes
and copying them back into a stable release? (If so, which one? 1.4? 1.5?)
p.
On Thu, Jul 22,
It was easier for me to just construct this module than to explain how to do so :-)
I will commit it this evening (a couple of hours from now), as that is our
standard practice. You'll need to use the developer's trunk to get it, though.
Here are the envars you'll need to provide:
Each process
On Wed, Jul 21, 2010 at 10:44 AM, Ralph Castain wrote:
>
> On Jul 21, 2010, at 7:44 AM, Philippe wrote:
>
>> Ralph,
>>
>> Sorry for the late reply -- I was away on vacation.
>
> no problem at all!
>
>>
>> regarding your earlier question about how many processes where
>>
On Jul 21, 2010, at 7:44 AM, Philippe wrote:
> Ralph,
>
> Sorry for the late reply -- I was away on vacation.
no problem at all!
>
> regarding your earlier question about how many processes where
> involved when the memory was entirely allocated, it was only two, a
> sender and a receiver.
Ralph,
Sorry for the late reply -- I was away on vacation.
Regarding your earlier question about how many processes were
involved when the memory was entirely allocated, it was only two, a
sender and a receiver. I'm still trying to pinpoint what can be
different between the standalone case and
Well, I finally managed to make this work without the required ompi-server
rendezvous point. The fix is only in the devel trunk right now - I'll have to
ask the release managers for 1.5 and 1.4 if they want it ported to those series.
On the notion of integrating OMPI into your launch environment:
I'm wondering if we can't make this simpler. What launch environment are you
operating under? I know you said you can't use mpiexec, but I'm wondering if we
could add support for your environment to mpiexec so you could.
On Jul 18, 2010, at 4:09 PM, Philippe wrote:
Ralph,
thanks for investigating.
I've applied the two patches you mentioned earlier and ran with the
ompi server. Although I was able to run our standalone test, when I
integrated the changes to our code, the processes entered a crazy loop
and allocated all the memory available when calling
Okay, I can reproduce this problem. Frankly, I don't think this ever worked
with OMPI, and I'm not sure how the choice of BTL makes a difference.
The program is crashing in the communicator definition, which involves a
communication over our internal out-of-band messaging system. That system
Reopening this thread. In searching another problem I ran across this one in a
different context. Turns out there really is a bug here that needs to be
addressed.
I'll try to tackle it this weekend - will update you when done.
On Jun 25, 2010, at 7:23 AM, Philippe wrote:
> Hi,
>
> I'm
Sorry for the delayed response - Brad asked if I could comment on this.
I'm afraid your application, as written, isn't going to work because the
rendezvous protocol isn't correct. You cannot just write a port to a file and
have the other side of a connect/accept read it. The reason for this is
Hi,
I'm trying to run a test program which consists of a server creating a
port using MPI_Open_port and N clients using MPI_Comm_connect to
connect to the server.
I'm able to do so with 1 server and 2 clients, but with 1 server + 3
clients, I get the following error message:
[node003:32274]