Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-09 Thread tmishima
Finally it worked, thanks! [mishima@manage OMB-3.1.1-openmpi2.0.0]$ ompi_info --param btl openib --level 5 | grep openib_flags MCA btl openib: parameter "btl_openib_flags" (current value: "65847", data source: default, level: 5 tuner/detail, type: unsigned_int) [mishima@manage OMB-3.1.1
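For reference, the relationship between the flag values discussed in this thread can be checked with shell arithmetic. The `ompi_info` query (which needs a live Open MPI install) is shown as a comment; the reading of the extra bit as the atomic fetch-and-op capability is an inference from later messages in the thread, not a verified constant:

```shell
# Query the flags on a live install:
#   ompi_info --param btl openib --level 5 | grep btl_openib_flags
# 2.0.0 default = 65847; 311 is the workaround value used in this thread.
default_flags=65847
workaround_flags=311
extra=$(( default_flags - workaround_flags ))
printf 'extra capability bits: %d (0x%x)\n' "$extra" "$extra"
# → extra capability bits: 65536 (0x10000)
```

The difference is a single high bit, which suggests 2.0.0 enabled exactly one additional capability on top of the old flag set — presumably the ATOMIC_FOP support that later messages blame for the regression.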

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-08 Thread tmishima
The latest patch also causes a segfault... By the way, I found a typo as below. &ca_pml_ob1.use_all_rdma in the last line should be &mca_pml_ob1.use_all_rdma: +mca_pml_ob1.use_all_rdma = false; +(void) mca_base_component_var_register (&mca_pml_ob1_component.pmlm_version, "use_all_rdma", +

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-08 Thread tmishima
I understood. Thanks. Tetsuya Mishima 2016/08/09 11:33:15, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0": > I will add a control to have the new behavior of using all available RDMA btls or just the eager ones for the RDMA protocol. The flags will remain as they are. And

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-08 Thread tmishima
Then, my understanding is that you will restore the default value of btl_openib_flags to the previous one (= 310) and add a new MCA parameter to control HCA inclusion in such a situation. The workaround so far for openmpi-2.0.0 is to set those flags manually. Right? Tetsuya Mishima 2016/08/09 9:5

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-08 Thread tmishima
Hi, unfortunately it doesn't work well. The previous one was much better ... [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -report-bindings osu_bw [manage.cluster:25107] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]],

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-07 Thread tmishima
Hi, here is the gdb output for additional information (it might be inexact because I built openmpi-2.0.0 without the debug option): Core was generated by `osu_bw'. Program terminated with signal 11, Segmentation fault. #0 0x0031d9008806 in ?? () from /lib64/libgcc_s.so.1 (gdb) where #0 0x0

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-07 Thread tmishima
Hi, it caused a segfault as below: [manage.cluster:25436] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] [manage.cluster:25436] MCW rank 1 bound to s

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-07 Thread tmishima
Hi, I applied the patch to the file "pml_ob1_rdma.c" and ran osu_bw again. Then, I still see the bad performance for larger sizes (>= 2097152). [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -report-bindings osu_bw [manage.cluster:27444] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-05 Thread tmishima
Hi Christoph, I applied the commits - pull/#1250 as Nathan told me and added "-mca btl_openib_flags 311" to the mpirun command line, and then it worked for me. I don't know the reason, but it looks like ATOMIC_FOP in the btl_openib_flags degrades the sm/vader performance. Regards, Tetsuya Mishima
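The workaround described above amounts to one extra mpirun option. A sketch, mirroring the thread (the benchmark name and persistent-file location are illustrative, not verified):

```shell
# Force the pre-2.0 openib capability flags (drops the atomic bit) so
# that sm/vader keeps its fast path for large messages:
mpirun -np 2 -mca btl_openib_flags 311 -bind-to core -report-bindings ./osu_bw

# The same setting can be made persistent in a per-user MCA params file
# (assumed location) instead of on every command line:
#   echo 'btl_openib_flags = 311' >> ~/.openmpi/mca-params.conf
```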

[OMPI devel] sm/vader BTL performance in openmpi-2.0.0

2016-07-28 Thread tmishima
Hi Nathan, You gave me a hint, thanks! I applied your patches and added "-mca btl_openib_flags 311" to the mpirun option, and then it worked for me. [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca btl_openib_flags 311 -bind-to core -report-bindings osu_bw [manage.cluster:21733] MCW rank 0

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-07-26 Thread tmishima
Hi Gilles, I confirmed that vader is used when I don't specify any BTL, as you pointed out! Regards, Tetsuya Mishima [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 --mca btl_base_verbose 10 -bind-to core -report-bindings osu_bw [manage.cluster:20006] MCW rank 0 bound to socket 0[core 0[hwt

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-07-26 Thread tmishima
Hi, Thanks. I will try it and report back later. Tetsuya Mishima 2016/07/27 9:20:28, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0": > sm is deprecated in 2.0.0 and will likely be removed in favor of vader in 2.1.0. > > This issue is probably this known issue: https://github

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-07-26 Thread tmishima
Hi Gilles, Thanks. I ran again with --mca pml ob1 but I've got the same results as below: [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -bind-to core -report-bindings osu_bw [manage.cluster:18142] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.] [manage

[OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-07-26 Thread tmishima
Hi folks, I saw a performance degradation of openmpi-2.0.0 when I ran our application on a node (12 cores). So I did 4 tests using osu_bw as below: 1: mpirun -np 2 osu_bw bad (30% of test 2) 2: mpirun -np 2 -mca btl self,sm osu_bw good (same as openmpi-1.10.3) 3: mpir
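The first two of the four tests can be reproduced along these lines (a sketch assuming the OSU micro-benchmarks are built; tests 3 and 4 are truncated in the archive and therefore not reconstructed):

```shell
# Test 1: default BTL selection in openmpi-2.0.0 — ~30% of test 2 here
mpirun -np 2 ./osu_bw
# Test 2: explicitly select the (deprecated) sm BTL — matched 1.10.3
mpirun -np 2 -mca btl self,sm ./osu_bw
```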

Re: [OMPI devel] v2.0.0rc4 is released

2016-07-07 Thread tmishima
Hi Gilles-san, thank you for your quick comment. I fully understand the meaning of the warning. Regarding the question you raised, I'm afraid I'm not sure which solution is better ... Regards, Tetsuya Mishima 2016/07/07 14:13:02, "devel" wrote in "Re: [OMPI devel] v2.0.0rc4 is released": > This

Re: [OMPI devel] v2.0.0rc4 is released

2016-07-07 Thread tmishima
Hi Jeff, sorry for the very short report. I saw the warning below at the end of the installation of openmpi-2.0.0rc4. Is this okay? $ make install ... make install-exec-hook make[3]: Entering directory `/home/mishima/mis/openmpi/openmpi-pgi16.5/openmpi-2.0.0rc4' WARNING! Common symbols found:

Re: [OMPI devel] binding output error

2015-04-20 Thread tmishima
Hi Devendar, As far as I know, the report-bindings option shows the logical cpu order. On the other hand, you are talking about the physical one, I guess. Regards, Tetsuya Mishima 2015/04/21 9:04:37, "devel" wrote in "Re: [OMPI devel] binding output error": > HT is not enabled.  All node are same topo

Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8 (64-bit Fortran integer) configuration

2014-09-16 Thread tmishima
Gilles, Your patch looks good to me and I think this issue should be fixed in the upcoming openmpi-1.8.3. Could you commit it to the trunk and create a CMR for it? Tetsuya > Mishima-san, > > the root cause is macro expansion does not always occur as one would > have expected ... > > could you pl

Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8 (64-bit Fortran integer) configuration

2014-09-01 Thread tmishima
Gilles, Thank you for your fix. I successfully compiled it with PGI, although I could not verify it with an actual test run. Tetsuya > Mishima-san, > > the root cause is macro expansion does not always occur as one would > have expected ... > > could you please give a try to the attached patch

[OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8 (64-bit Fortran integer) configuration

2014-08-31 Thread tmishima
Hi folks, I tried to build openmpi-1.8.2 with PGI fortran and the -i8 (64-bit Fortran integer) option as shown below: ./configure \ --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \ --enable-abi-breaking-fortran-status-i8-fix \ --with-tm \ --with-verbs \ --disable-ipv6 \ CC=pgcc CFLAGS="-tp k8-
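A generic sketch of such an -i8 build follows; the quoted configure line is truncated in the archive, so the FC/FCFLAGS lines and prefix path here are assumptions for illustration, not taken from the original message:

```shell
./configure \
  --prefix=$HOME/opt/mpi/openmpi-1.8.2-pgi-int64 \
  --enable-abi-breaking-fortran-status-i8-fix \
  --with-tm --with-verbs --disable-ipv6 \
  CC=pgcc FC=pgfortran FCFLAGS=-i8   # -i8: default INTEGER becomes 64-bit
```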

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-08-02 Thread tmishima
Hi Ralph, I confirmed that the openib issue was really fixed by r32395 and hope you'll be able to release the final version soon. Tetsuya > Kewl - the openib issue has been fixed in the nightly tarball. I'm waiting for review of a couple of pending CMRs, then we'll release a quick rc4 and move

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-08-02 Thread tmishima
I confirmed that openmpi-1.8.2rc3 with PGI-14.7 worked fine for me except for the openib issue reported by Mike Dubman. Tetsuya Mishima > Sorry, finally got through all this ompi email and see this problem was fixed. > > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread tmishima
Hi Paul, Thank you for your investigation. I'm sure you're very close to fixing the problem, although I can't do it myself, so I owe you something... Please try Awamori, which is Okinawa's sake and very good on such a hot day. Tetsuya > On Wed, Jul 30, 2014 at 8:53 PM, Paul Hargrove wrote: >

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima
Paul and Jeff, I additionally installed PGI 14.4 and checked the behavior. Then, I confirmed that both versions produce the same results. PGI14.7: [mishima@manage work]$ mpif90 test.f -o test.ex --showme pgfortran test.f -o test.ex -I/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.7/include -I/home/mishim

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima
Hi Paul, thank you for your comment. I don't think my mpi_f08.mod is an older one, because the time stamp is equal to the time when I rebuilt them today. [mishima@manage openmpi-1.8.2rc2-pgi14.7]$ ll lib/mpi* -rwxr-xr-x 1 mishima mishima 315 Jul 30 12:27 lib/mpi_ext.mod -rwxr-xr-x 1 mishima mis

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima
This is another one. (See attached file: openmpi-1.8.2rc2-pgi14.7.tar.gz) Tetsuya > Tetsuya -- > > I am unable to test with the PGI compiler -- I don't have a license. I was hoping that LANL would be able to test today, but I don't think they got to it. > > Can you send more details? > > E.g.

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima
Hi Jeff, Sorry for the poor information and late reply. Today I attended a very, very long meeting ... Anyway, I attached the compile output and configure log. (Due to the file size limitation, I'm sending them in two parts.) I hope you can find the problem. (See attached file: openmpi-1.8-pgi14.7.tar.gz) Regar

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-29 Thread tmishima
Sorry for the poor information. I attached the compile output and configure log. I hope you can find the problem. (See attached file: openmpi-pgi14.7.tar.gz) Regards, Tetsuya Mishima > Tetsuya -- > > I am unable to test with the PGI compiler -- I don't have a license. I was hoping that LANL would b

[OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-28 Thread tmishima
Hi folks, I tried to build openmpi-1.8.2rc2 with PGI-14.7 and execute a sample program. Then, it causes a linking error: [mishima@manage work]$ cat test.f program hello_world use mpi_f08 implicit none type(MPI_Comm) :: comm integer :: myid, npes, ierror integer

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread tmishima
Thanks Ralph. I'll check it next Monday. Tetsuya > Should be fixed with r32058 > > > On Jun 20, 2014, at 4:13 AM, tmish...@jcity.maeda.co.jp wrote: > > > > > > > Hi Ralph, > > > > By the way, something is wrong with your latest rmaps_rank_file.c. > > I've got the error below. I'm trying to fi

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread tmishima
Hi Ralph, By the way, something is wrong with your latest rmaps_rank_file.c. I've got the error below. I'm trying to find the problem, but you could find it more quickly... [mishima@manage trial]$ cat rankfile rank 0=node05 slot=0-1 rank 1=node05 slot=3-4 rank 2=node05 slot=6-7 [mishima@manage
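The rankfile quoted above can be recreated verbatim. The sketch below writes it out and checks its shape; the mpirun launch, which needs the node05 host, is left as a comment (`./app` is a placeholder binary name):

```shell
# Recreate the three-rank rankfile from the message above.
cat > rankfile <<'EOF'
rank 0=node05 slot=0-1
rank 1=node05 slot=3-4
rank 2=node05 slot=6-7
EOF
# Launch with it on a live cluster:
#   mpirun -np 3 -rf rankfile ./app
grep -c '^rank' rankfile
# → 3
```

Each line binds one rank to a host and a range of slots; rank 1's slot=3-4 is the cross-socket placement that triggered the hang discussed in this thread.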

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread tmishima
I'm not sure, but I guess it's related to Gilles's ticket. It's a quite bad binding pattern, as Ralph pointed out, so checking for that condition and disqualifying coll/ml could be a practical solution as well. Tetsuya > It is related, but it means that coll/ml has a higher degree of sensitivity

[OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-19 Thread tmishima
Hi folks, Recently I have been seeing a hang with trunk when I specify a particular binding by use of a rankfile or "-map-by slot". This can be reproduced with a rankfile that allocates a process beyond a socket boundary. For example, on node05, which has 2 sockets with 4 cores each, rank 1 is allo

Re: [OMPI devel] openmpi-1.8 - hangup using more than 4 nodes under managed state by Torque

2014-04-01 Thread tmishima
Thanks Ralph. Tetsuya > I tracked it down - not Torque specific, but impacts all managed environments. Will fix > > > On Apr 1, 2014, at 2:23 AM, tmish...@jcity.maeda.co.jp wrote: > > > > > Hi Ralph, > > > > I saw another hangup with openmpi-1.8 when I used more than 4 nodes > > (having 8 cores

[OMPI devel] openmpi-1.8 - hangup using more than 4 nodes under managed state by Torque

2014-04-01 Thread tmishima
Hi Ralph, I saw another hangup with openmpi-1.8 when I used more than 4 nodes (having 8 cores each) under the managed state by Torque. Although I'm not sure you can reproduce it with SLURM, at least with Torque it can be reproduced in this way: [mishima@manage ~]$ qsub -I -l nodes=4:ppn=8 qsub: wai

Re: [OMPI devel] system call failed during shared memory initialization with openmpi-1.8a1r31254

2014-03-30 Thread tmishima
Hi Jeff, it worked for me with openmpi-1.8rc1. Tetsuya > Ralph applied a bunch of CMRs to the v1.8 branch after the nightly tarball was made last night. > > I just created a new nightly tarball that includes all of those CMRs: 1.8a1r31269. It should have the fix for this error included in it.

Re: [OMPI devel] system call failed during shared memory initialization with openmpi-1.8a1r31254

2014-03-28 Thread tmishima
Thanks Jeff. But I'm already offline today ... I cannot confirm it until Monday morning, sorry. Tetsuya > Ralph applied a bunch of CMRs to the v1.8 branch after the nightly tarball was made last night. > > I just created a new nightly tarball that includes all of those CMRs: 1.8a1r31269. It s

Re: [OMPI devel] system call failed during shared memory initialization with openmpi-1.8a1r31254

2014-03-28 Thread tmishima
Thanks Jeff. It seems to be really the latest one - ticket #4474. > On Mar 28, 2014, at 5:45 AM, wrote: > > > -- > > A system call failed during shared memory initialization that should > > not have. It is likely that your

[OMPI devel] system call failed during shared memory initialization with openmpi-1.8a1r31254

2014-03-28 Thread tmishima
Hi all, I saw the error shown below with openmpi-1.8a1r31254; I've never seen it before with openmpi-1.7.5. The message implies it's related to vader, and I can stop it by excluding vader from the btl list: -mca btl ^vader. Could someone fix this problem? Tetsuya [mishima@manage openmpi]$ mpirun -n
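The exclusion workaround mentioned above, written out as a command-line sketch (`./a.out` stands in for the actual application):

```shell
# '^' negates the selection list: use every available BTL except vader,
# which avoids the shared-memory initialization failure reported above.
mpirun -np 2 -mca btl ^vader ./a.out
```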

Re: [OMPI devel] cleanup of rr_byobj

2014-03-27 Thread tmishima
I added two improvements. Please replace the previous patch file with the attached one, and take a look this weekend. 1. Add a pre-check for ORTE_ERR_NOT_FOUND to make the retry with byslot work correctly afterward. Otherwise, the retry could fail because some fields such as node->procs, node->slots_

Re: [OMPI devel] cleanup of rr_byobj

2014-03-25 Thread tmishima
no problem - it's a minor cleanup. Tetsuya > Hi Tetsuya > > Let me take a look when I get home this weekend - I'm giving an ORTE tutorial to a group of new developers this week and my time is very limited. > > Thanks > Ralph > > > > On Tue, Mar 25, 2014 at 5:37 PM, wrote: > > Hi Ralph, I moved

[OMPI devel] cleanup of rr_byobj

2014-03-25 Thread tmishima
Hi Ralph, I moved on to the development list. I'm not sure why the add_one flag is used in rr_byobj. Here, if oversubscribed, a proc is mapped to each object one by one. So, I think add_one is not necessary. Instead, when the user doesn't permit oversubscription, the second pass should be skip