Hello @ll.
I'm making some changes to the communication framework. Right now I'm
working on a "secure" MPI_Send, this send needs to know when an endpoint
goes down, and then retry the communication constructing a new endpoint, or
at least, overwriting the data of the old endpoint with the new
After having to explain to someone at SC for the umpteenth time this week that
the "vader" BTL uses the XPMEM transport under the covers, I'd like to put
forth an appeal to rename the "vader" BTL to be "xpmem."
Here's my rationale for why:
1. Although we have a history of Star Wars-related
+1
Isn't there precedent with the other BTLs to name them based on the
messaging protocol they are supporting instead of some movie character
(tcp, openib, shmem, portals, ...).
--td
On 11/17/2011 8:11 AM, Jeff Squyres wrote:
After having to explain to someone at SC for the umpteenth time
Frankly, the only vote that counts is Nathan's - it's his btl, and we have
never forcibly made someone rename their component. I would suggest we not set
that precedent. I'm comfortable with whatever he decides to call it.
On Nov 17, 2011, at 7:00 AM, TERRY DONTJE wrote:
> +1
>
> Isn't there
I could possibly buy your argument Ralph if this was a one off BTL that
only Nathan (and his employer) is going to use. I am assuming though
this is a more general protocol for a vendor specific protocol. Thus it
seems that a sane naming of the BTL is within the realm of the community.
That
On 11/17/2011 9:54 AM, Ralph Castain wrote:
On Nov 17, 2011, at 7:45 AM, TERRY DONTJE wrote:
I could possibly buy your argument Ralph if this was a one off BTL
that only Nathan (and his employer) is going to use. I am assuming
though this is a more general protocol for a vendor specific
I have got to say I like the name ...
On Nov 17, 2011, at 11:34 AM, Barrett, Brian W wrote:
> On 11/17/11 6:29 AM, "Ralph Castain" wrote:
>
>> Frankly, the only vote that counts is Nathan's - it's his btl, and we
>> have never forcibly made someone rename their component. I
I guess I reached one of these corner cases that didn't get tested. I can't start
any apps (not even a hostname) after this commit using the rsh PLM (as soon as
I add a hostfile). The mpirun is blocked in an infinite loop (after it spawned
the daemons) in orte_rmaps_base_compute_vpids. Attaching
I'll take a look - I tested that case, and the trunk appears to be working on
all the MTT runs. I'll have to see if I can replicate it.
On Nov 17, 2011, at 7:42 PM, George Bosilca wrote:
> I guess I reached one of these corner cases that didn't get tested. I can't
> start any apps (not even a
Hmmm...well, things seem to work just fine for me:
[rhc@odin ~/ompi-hwloc]$ mpirun -np 2 -bynode -mca plm rsh hostname
odin090.cs.indiana.edu
odin091.cs.indiana.edu
[rhc@odin mpi]$ mpirun -np 2 -bynode -mca plm rsh ./hello_nodename
Hello, World, I am 1 of 2 on host odin091.cs.indiana.edu from
I have a fresh checkout. In your example where are your hosts coming from? How
do you specify the hostfile?
george.
On Nov 17, 2011, at 19:06 , Ralph Castain wrote:
> Hmmm...well, things seem to work just fine for me:
>
> [rhc@odin ~/ompi-hwloc]$ mpirun -np 2 -bynode -mca plm rsh hostname
>
On Nov 17, 2011, at 8:13 PM, George Bosilca wrote:
> I have a fresh checkout. In your example where are your hosts coming from?
> How do you specify the hostfile?
The hosts are coming from the slurm allocation, though I also tried adding
-host arguments. The error you describe comes well after
Maybe the issue is caused by how the hostfile is specified. I used
orte_default_hostfile= in my mca-params.conf.
george.
On Nov 17, 2011, at 19:17 , Ralph Castain wrote:
> I'm still building on odin, but will check there again to see if I can
> replicate - perhaps something didn't get
I can't get it to fail, even with hostfile arguments. I'll try again in the
morning.
On Nov 17, 2011, at 8:49 PM, George Bosilca wrote:
> Maybe the issue is caused by how the hostfile is specified. I used
> orte_default_hostfile= in my mca-params.conf.
>
> george.
>
> On Nov 17, 2011, at