Hi Ralph,

> > some weeks ago (mainly in the beginning of October) I reported
> > several problems and I would be grateful if you could tell me
> > whether, and if so when, somebody will try to solve them.
> >
> > 1) I don't get the expected results when I try to send or scatter
> >    the columns of a matrix in Java. In a homogeneous environment
> >    the received column values have nothing to do with the original
> >    values, and in a heterogeneous environment the program breaks
> >    with "An error occurred in MPI_Comm_dup" and "MPI_ERR_INTERN:
> >    internal error". I would like to use the Java API.
> >
> > 2) I don't get the expected result when I try to scatter an object
> >    in Java.
>
> https://svn.open-mpi.org/trac/ompi/ticket/3351
>
> Nothing has happened on these yet.
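To make clear what I mean by scattering the columns: since Java stores a
double[][] row by row, my test first packs the columns into one contiguous
buffer and scatters that. The helper below is only a simplified,
self-contained sketch of this packing step (class and method names are
illustrative, and the actual MPI.COMM_WORLD scatter call from my small
program is omitted):

```java
// Illustrative helper: Java has no contiguous 2-D arrays, so before
// scattering the columns of a matrix, each column is copied into a
// contiguous (column-major) buffer that MPI can partition evenly.
public class ColumnPack {

    // Returns the columns of "m" laid out one after another.
    static double[] packColumns(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double[] buf = new double[rows * cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                buf[c * rows + r] = m[r][c];
            }
        }
        return buf;
    }

    public static void main(String[] args) {
        double[][] m = { { 1, 2 }, { 3, 4 } };
        // Column 0 is {1, 3}, column 1 is {2, 4}.
        System.out.println(java.util.Arrays.toString(packColumns(m)));
        // prints [1.0, 3.0, 2.0, 4.0]
    }
}
```

Each process can then receive rows*cols/np consecutive values, i.e. whole
columns, independent of the row-major layout of the original matrix.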
Do you have an idea when somebody will have time to fix these problems?

> > 3) I still get only a message that all nodes are already filled up
> >    when I use a "rankfile" and nothing else happens. I would like
> >    to use a rankfile. You filed a bug fix for it.
>
> I believe rankfile was fixed, at least on the trunk - not sure if it
> was moved to 1.7. I assume that's the release you are talking about?

I'm using the trunk for my tests. It didn't work for me because I used
the rankfile without a hostfile or a hostlist (it is not enough to
specify the hosts in the rankfile). Everything works fine when I provide
a "correct" hostfile or hostlist and the binding isn't too complicated
(see my last example below).

My rankfile:

rank 0=sunpc0 slot=0:0
rank 1=sunpc1 slot=0:0
rank 2=sunpc0 slot=1:0
rank 3=sunpc1 slot=1:0

My hostfile:

sunpc0 slots=4
sunpc1 slots=4

It will not work without a hostfile or hostlist.

sunpc0 mpi-probleme 128 mpiexec -report-bindings -rf rankfile_1.openmpi \
  -np 4 hostname
------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not allocated
or oversubscribed its slots. Please review your rank-slot assignments
and your host allocation to ensure a proper match. Also, some systems
may require using full hostnames, such as "host1.example.com" (instead
of just plain "host1").

  Host: sunpc1
------------------------------------------------------------------------
sunpc0 mpi-probleme 129

I get the expected output if I add "-hostfile host_sunpc" or
"-host sunpc0,sunpc1" on the command line.

sunpc0 mpi-probleme 129 mpiexec -report-bindings -rf rankfile_1.openmpi \
  -np 4 -hostfile host_sunpc hostname
[sunpc0:06954] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
[sunpc0:06954] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
sunpc0
sunpc0
[sunpc1:12583] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
[sunpc1:12583] MCW rank 3 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
sunpc1
sunpc1
sunpc0 mpi-probleme 130

Furthermore it is necessary that the rankfile and the hostfile both
contain either qualified or unqualified hostnames. Otherwise it will not
work, as you can see in the following output, where my hostfile contains
a qualified hostname and my rankfile only the hostname without the
domain name.

sunpc0 mpi-probleme 131 mpiexec -report-bindings -rf rankfile_1.openmpi \
  -np 4 -hostfile host_sunpc_full hostname
------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not allocated
or oversubscribed its slots. Please review your rank-slot assignments
and your host allocation to ensure a proper match. Also, some systems
may require using full hostnames, such as "host1.example.com" (instead
of just plain "host1").

  Host: sunpc1
------------------------------------------------------------------------
sunpc0 mpi-probleme 132

Unfortunately my complicated rankfile still doesn't work, although you
told me some weeks ago that it is correct.

rank 0=sunpc0 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1
rank 2=sunpc1 slot=1:0
rank 3=sunpc1 slot=1:1

sunpc1 mpi-probleme 103 mpiexec -report-bindings -rf rankfile -np 4 \
  -hostfile host_sunpc hostname
sunpc1
sunpc1
sunpc1
[sunpc1:12741] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
[sunpc1:12741] MCW rank 3 bound to socket 1[core 3[hwt 0]]: [./.][./B]
[sunpc1:12741] MCW rank 1 bound to socket 0[core 0[hwt 0]],
  socket 0[core 1[hwt 0]]: [B/B][./.]
[sunpc0:07075] MCW rank 0 bound to socket 0[core 0[hwt 0]],
  socket 0[core 1[hwt 0]]: [B/B][./.]
sunpc0
sunpc1 mpi-probleme 104

The bindings for ranks 1 to 3 are correct, but rank 0 didn't get the
cores from the second socket (it should have been bound to [B/B][B/B]).
> > 4) I would like to have "-cpus-per-proc", "-npersocket", etc. for
> >    every set of machines/applications and not globally for all
> >    machines/applications if I specify several colon-separated sets
> >    of machines or applications on the command line. You told me
> >    that it could be done.
> >
> > 5) By the way, it seems that the option "-cpus-per-proc" is no
> >    longer supported in openmpi-1.7 and openmpi-1.9. How can I bind
> >    a multi-threaded process to more than one core in these
> >    versions?
>
> I'm afraid I haven't gotten around to working on cpus-per-proc,
> though I believe npersocket was fixed.

Will you also support "-cpus-per-proc" in openmpi-1.7 and openmpi-1.9?
At the moment it isn't available.

sunpc1 mpi-probleme 106 mpiexec -report-bindings -np 4 \
  -host linpc0,linpc1,sunpc0,sunpc1 -cpus-per-proc 4 -map-by core \
  -bind-to core hostname
mpiexec: Error: unknown option "-p"
Type 'mpiexec --help' for usage.

sunpc1 mpi-probleme 110 mpiexec --help | grep cpus
                         cpus allocated to this job [default: none]
-use-hwthread-cpus|--use-hwthread-cpus
                         Use hardware threads as independent cpus

> > I can provide my small programs once more if you need them. Thank
> > you very much for any answer in advance.

Thank you very much for all your help and time.


Siegmar
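P.S.: For comparison, this is roughly how I used the option with
openmpi-1.6.x; the exact spelling is from memory, so please treat it
only as an illustration of what I would like to be able to do again
in 1.7/1.9:

```shell
# Illustration only (openmpi-1.6.x syntax, from memory): give each of
# the 4 processes 4 cores and bind it to them, so that a multi-threaded
# process can use all of its cores.
mpiexec -report-bindings -np 4 -cpus-per-proc 4 -bind-to-core hostname
```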