Re: [hwloc-devel] lstopo-gui

2012-05-24 Thread Brice Goglin
On 24/05/2012 22:14, Jeff Squyres wrote:
> On May 24, 2012, at 11:30 AM, Brice Goglin wrote:
>
>> So what you dislike isn't the split, it's the fact that lstopo doesn't
>> behave as it did earlier. You want lstopo-nogui and lstopo instead of
>> lstopo and lstopo-gui. And alternative systems can make lstopo point to
>> lstopo-nogui when the real lstopo isn't installed.
> Taking a step back and looking objectively, I think that's a good assessment 
> and probably a good simple solution (i.e., make "lstopo" be the one with the 
> Cairo/etc. support).  
>
> FWIW, "gui" and "nogui" are not necessarily good names.  The current 
> lstopo-gui isn't really a traditional GUI application (even though it can 
> display a trivial X window).  I seem to recall that "nox" was discarded as a 
> possibility, because lstopo supports more than just X.  
>
> Are there other traditional suffixes that are used for this kind of thing?  
> Or is lstopo kinda unique in this area?
>

There are lshw and lshw-gtk, and other examples where the suffix is the
toolkit name. We could use "-text", but that doesn't seem very common
either. Same for "-nox", which may also not be obvious to random users.
For sure "-nogui" is a bad name :)

Brice



Re: [OMPI devel] MPI_Cart_coords_f segv with Intel compiler

2012-05-24 Thread Jeff Squyres
FWIW, I think we imported your patch a while ago.  Here it is on the trunk:

https://svn.open-mpi.org/trac/ompi/browser/trunk/opal/mca/memory/linux/malloc.c#L3933

And here it is on v1.6:

https://svn.open-mpi.org/trac/ompi/browser/branches/v1.6/opal/mca/memory/linux/malloc.c#L3933



On May 24, 2012, at 2:06 PM, Larry Baker wrote:

> Terry,
> 
> What you are seeing is a bug in the vectorizer in the Intel 2011.6.233 
> release.  We've talked about this before.  You should probably remove that 
> compiler from your system(s).  I think the new release of OpenMPI describes 
> this problem, but does not stop it from occurring.  I wrote a patch for 
> ptmalloc2/malloc.c for OpenMPI 1.4.3 which automatically adjusts the 
> optimization level for _int_malloc(), which is where the bug occurs.
> 
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
> 
> -- Start of Patch --
> --- opal/mca/memory/ptmalloc2/malloc.c.original  2010-04-13 10:30:26.0 -0700
> +++ opal/mca/memory/ptmalloc2/malloc.c  2011-11-04 15:01:37.0 -0700
> @@ -2,6 +2,17 @@
>  /* Copyright (c) 2010  Cisco Systems, Inc.  All rights reserved.
>   */
>  
> +/* With Intel Composer XE V12.1.0, release 2011.6.233, any launch   */
> +/* fails, even before main(), due to a bug in the vectorizer (see   */
> +/* https://svn.open-mpi.org/trac/ompi/changeset/25290).  The fix is */
> +/* to disable vectorization by reducing the optimization level to   */
> +/* -O1 for _int_malloc().  The only reliable method to identify */
> +/* release 2011.6.233 is the predefined __INTEL_COMPILER_BUILD_DATE */
> +/* macro, which will have the value 20110811 (Linux, Windows, and   */
> +/* Mac OS X).  (The predefined __INTEL_COMPILER macro is nonsense,  */
> +/* , and both the 2011.6.233 and 2011.7.256 releases identify   */
> +/* themselves as V12.1.0 from the -v command line option.)  */
> +
>  #define OPAL_DISABLE_ENABLE_MEM_DEBUG 1
>  #include "opal_config.h"
>  
> @@ -3945,6 +3956,12 @@
>    -- malloc --
>  */
>  
> +#ifdef __INTEL_COMPILER_BUILD_DATE
> +#if __INTEL_COMPILER_BUILD_DATE == 20110811
> +#pragma GCC optimization_level 1
> +#endif
> +#endif
> +
>  Void_t*
>  _int_malloc(mstate av, size_t bytes)
>  {
> -- End of Patch --
> 
> On 24 May 2012, at 6:54 AM, TERRY DONTJE wrote:
> 
>> I am seeing several Cart Fortran tests (like MPI_Cart_coords_f) segv in 
>> opal_memory_ptmalloc2_int_free when OMPI trunk is compiled with icc 12.1.0 
>> for 64-bit Linux.  Just wondering if anyone has seen anything similar to 
>> this with a different version of icc.  Other non-Intel compilers do not seem 
>> to exhibit this issue.
>> 
>> -- 
>> Terry D. Dontje | Principal Software Engineer
>> Developer Tools Engineering | +1.781.442.2631
>> Oracle - Performance Technologies
>> 95 Network Drive, Burlington, MA 01803
>> Email terry.don...@oracle.com
>> 
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] Fwd: [Open MPI] #3108: Affinity still busted in v1.6

2012-05-24 Thread Jeff Squyres
FYI.

I think I have fixes ready, but I am bummed that we didn't fix the whole 
paffinity mess properly in 1.6.  :-(

Begin forwarded message:

> From: Open MPI 
> Subject: [Open MPI] #3108: Affinity still busted in v1.6
> Date: May 24, 2012 2:59:42 PM EDT
> Cc: 
> 
> #3108: Affinity still busted in v1.6
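> ----------------------+----------------------------
>  Reporter:  jsquyres  |      Owner:  rhc
>      Type:  defect    |     Status:  new
>  Priority:  major     |  Milestone:  Open MPI 1.6.1
>   Version:  trunk     |   Keywords:
> ----------------------+----------------------------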
> I found a system yesterday where affinity is still horribly broken in
> v1.6.  bind-to-core and bind-to-socket did completely incorrect things.
> Among other things, the system in question has effectively random
> physical socket/core numbering: the numbering is not uniform across the
> cores in any given socket.
> 
> I have a new Bitbucket repository where I think I've fixed the problems,
> and I will be reviewing the code with Ralph soon:
> 
> https://bitbucket.org/jsquyres/ompi-affinity-again-v1.6
> 
> There were actually three bugs (that I've found so far); there's a
> separate commit in that repository for each.  See the commit message on
> each one for details.
> 
> Once this firms up a bit, I'll make a tarball and ask others in the
> community to test it (e.g., Oracle and IBM, which have traditionally been
> good at finding wacky paffinity bugs).
> 
> Note that this *only* affects OMPI v1.6 -- the trunk has a wholly
> revamped affinity system and the entire paffinity framework is gone
> (yay!).
> 
> -- 
> Ticket URL: 
> Open MPI 
> 
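The ticket's point about non-uniform physical numbering can be illustrated with a small, hypothetical example. A binding helper that derives a socket's processors by arithmetic on IDs (for example, assuming socket s owns processors s*N through s*N+N-1) breaks on such a machine, while one that matches each processor's reported socket ID does not. The struct and function below are illustrative only; they are not the fixes in the Bitbucket repository, and not the hwloc API.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative processor-unit record (hypothetical; not an hwloc or Open MPI
 * type).  Physical socket IDs need not be dense, ordered, or contiguous in
 * OS processor numbering. */
struct pu_info {
    int os_index;   /* OS processor number          */
    int socket_id;  /* physical socket (package) ID */
};

/* Write into 'out' the OS indices of every PU that reports the same physical
 * socket ID as PU 'self', scanning in table order.  Membership is decided by
 * matching IDs, never by assuming socket s owns processors s*N .. s*N+N-1.
 * Returns the number of indices written (at most 'outcap'). */
static size_t socket_peers(const struct pu_info *pus, size_t npus,
                           size_t self, int *out, size_t outcap)
{
    assert(pus != NULL && out != NULL && self < npus);
    size_t n = 0;
    for (size_t i = 0; i < npus && n < outcap; ++i) {
        if (pus[i].socket_id == pus[self].socket_id)
            out[n++] = pus[i].os_index;
    }
    return n;
}
```

With processors 0 and 2 on socket 1 and processors 1 and 3 on socket 0, socket_peers() for processor 0 yields {0, 2}, whereas assuming that socket 1 owns processors 2 and 3 would bind to the wrong pair.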


-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] MPI_Cart_coords_f segv with Intel compiler

2012-05-24 Thread TERRY DONTJE
I forgot to mention that my compiler's date is 2011.10.11, so I wonder 
whether it is affected by the issue you mentioned at all.  Anyway, I'll keep 
your patch in mind as I run more tests.


thanks,

--td






Re: [OMPI devel] MPI_Cart_coords_f segv with Intel compiler

2012-05-24 Thread TERRY DONTJE
Actually, I don't think the vectorizer bug is the issue.  I think the 
OMPI_ARRAY_INT_2_LOGICAL macro is doing a free on line 193 when it 
shouldn't: the OMPI_ARRAY_LOGICAL_2_INT macro calls an empty 
OMPI_ARRAY_LOGICAL__2_INT_ALLOC macro, whereas in the other case that 
macro actually does a malloc.


It was interesting to look at the diff of fint_2_int.h between 26283 and 
the prior version and see commented-out "frees" being uncommented.  I 
suspect only one of the frees should have been commented out.
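Terry's hypothesis is a mismatched allocate/release pair: the INT_2_LOGICAL path calls free() even in the configuration where the LOGICAL_2_INT path's allocation macro is empty, so free() would receive a pointer that malloc() never returned, which is consistent with a crash inside opal_memory_ptmalloc2_int_free. A minimal sketch of the invariant involved, assuming hypothetical helpers (logical_to_int, int_to_logical, and a LOGICAL_NEEDS_COPY switch) rather than the real fint_2_int.h macros: the release is compiled under exactly the same condition as the allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical switch standing in for the configure-time case in which a
 * Fortran LOGICAL array must be copied into a separate C int array.  These
 * helpers are illustrative; they are not the fint_2_int.h macros. */
#ifndef LOGICAL_NEEDS_COPY
#define LOGICAL_NEEDS_COPY 1
#endif

/* Return an int view of 'logicals': a malloc'ed, normalized copy when
 * LOGICAL_NEEDS_COPY is set, otherwise the caller's own buffer. */
static int *logical_to_int(int *logicals, int n)
{
    assert(logicals != NULL && n >= 0);
#if LOGICAL_NEEDS_COPY
    size_t count = n > 0 ? (size_t)n : 1;  /* avoid malloc(0) */
    int *tmp = malloc(count * sizeof *tmp);
    if (tmp == NULL)
        return NULL;
    for (int i = 0; i < n; ++i)
        tmp[i] = (logicals[i] != 0);  /* normalize to C truth values */
    return tmp;
#else
    (void)n;
    return logicals;  /* no copy: the view aliases the caller's array */
#endif
}

/* Copy results back and release the view.  The free() sits under the same
 * condition as the malloc() above; calling free() on the aliased buffer in
 * the no-copy case would pass the allocator a pointer it never returned. */
static void int_to_logical(int *logicals, int *tmp, int n)
{
    assert(logicals != NULL && tmp != NULL && n >= 0);
#if LOGICAL_NEEDS_COPY
    for (int i = 0; i < n; ++i)
        logicals[i] = tmp[i];
    free(tmp);
#else
    (void)logicals;
    (void)tmp;
    (void)n;
#endif
}
```

Building with -DLOGICAL_NEEDS_COPY=0 exercises the aliasing path, in which int_to_logical() deliberately releases nothing.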


--td






Re: [OMPI devel] MPI_Cart_coords_f segv with Intel compiler

2012-05-24 Thread Larry Baker
Terry,

What you are seeing is a bug in the vectorizer in the Intel 2011.6.233 release.
We've talked about this before.  You should probably remove that compiler from 
your system(s).  I think the new release of OpenMPI describes this problem, but 
does not stop it from occurring.  I wrote a patch for ptmalloc2/malloc.c for 
OpenMPI 1.4.3 which automatically adjusts the optimization level for 
_int_malloc(), which is where the bug occurs.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

-- Start of Patch --
--- opal/mca/memory/ptmalloc2/malloc.c.original  2010-04-13 10:30:26.0 -0700
+++ opal/mca/memory/ptmalloc2/malloc.c  2011-11-04 15:01:37.0 -0700
@@ -2,6 +2,17 @@
 /* Copyright (c) 2010  Cisco Systems, Inc.  All rights reserved.
  */
 
+/* With Intel Composer XE V12.1.0, release 2011.6.233, any launch   */
+/* fails, even before main(), due to a bug in the vectorizer (see   */
+/* https://svn.open-mpi.org/trac/ompi/changeset/25290).  The fix is */
+/* to disable vectorization by reducing the optimization level to   */
+/* -O1 for _int_malloc().  The only reliable method to identify */
+/* release 2011.6.233 is the predefined __INTEL_COMPILER_BUILD_DATE */
+/* macro, which will have the value 20110811 (Linux, Windows, and   */
+/* Mac OS X).  (The predefined __INTEL_COMPILER macro is nonsense,  */
+/* , and both the 2011.6.233 and 2011.7.256 releases identify   */
+/* themselves as V12.1.0 from the -v command line option.)  */
+
 #define OPAL_DISABLE_ENABLE_MEM_DEBUG 1
 #include "opal_config.h"
 
@@ -3945,6 +3956,12 @@
   -- malloc --
 */
 
+#ifdef __INTEL_COMPILER_BUILD_DATE
+#if __INTEL_COMPILER_BUILD_DATE == 20110811
+#pragma GCC optimization_level 1
+#endif
+#endif
+
 Void_t*
 _int_malloc(mstate av, size_t bytes)
 {
-- End of Patch --
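Whether a given compiler falls under this workaround depends only on the value of __INTEL_COMPILER_BUILD_DATE, so one way to answer the "is my build affected?" question is to compile a small program with the compiler in question and print that macro. A minimal sketch, assuming nothing beyond the macro the patch itself tests (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* Build date that the patch above associates with Intel release 2011.6.233. */
#define KNOWN_BAD_ICC_BUILD_DATE 20110811L

/* Value of __INTEL_COMPILER_BUILD_DATE, or 0 when the compiler in use does
 * not define that macro. */
static long icc_build_date(void)
{
#ifdef __INTEL_COMPILER_BUILD_DATE
    return (long)__INTEL_COMPILER_BUILD_DATE;
#else
    return 0L;
#endif
}

/* Nonzero exactly when the patch's #if test would select the -O1 workaround. */
static int icc_needs_int_malloc_workaround(void)
{
    return icc_build_date() == KNOWN_BAD_ICC_BUILD_DATE;
}

/* Format a one-line description of the compiler build into buf.  Returns
 * snprintf()'s result: the length the full message needs. */
static int describe_icc_build(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    long date = icc_build_date();
    if (date == 0L)
        return snprintf(buf, len, "__INTEL_COMPILER_BUILD_DATE is not defined");
    return snprintf(buf, len, "__INTEL_COMPILER_BUILD_DATE = %ld%s", date,
                    icc_needs_int_malloc_workaround()
                        ? " (the build the patch works around)" : "");
}
```

According to the patch comment, the affected build reports 20110811; any other value, or an undefined macro, leaves the #pragma inactive.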

On 24 May 2012, at 6:54 AM, TERRY DONTJE wrote:

> I am seeing several Cart Fortran tests (like MPI_Cart_coords_f) segv in 
> opal_memory_ptmalloc2_int_free when OMPI trunk is compiled with icc 12.1.0 
> for 64-bit Linux.  Just wondering if anyone has seen anything similar to 
> this with a different version of icc.  Other non-Intel compilers do not seem 
> to exhibit this issue.



Re: [OMPI devel] [EXTERNAL] SVN trunk PSM MTL is busted

2012-05-24 Thread Jeff Squyres
On May 24, 2012, at 12:07 PM, Barrett, Brian W wrote:

>> I'll file a bug about this; I'm assuming this is an issue the 1.7 RMs
>> will care about.
> 
> Did you file a bug?  Ralph fixed this one and I fixed its sister in probe,
> so all should work now...


Yep -- Ralph closed it.

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] [EXTERNAL] SVN trunk PSM MTL is busted

2012-05-24 Thread Barrett, Brian W
On 5/24/12 8:55 AM, "Jeff Squyres"  wrote:

>Per Brian's recent MTL updates, the PSM MTL is busted.  I notice the
>following when I run on a machine that has the PSM software stack
>installed:
>
>[ompi_r00lez:19108] mca: base: component_find: unable to open
>/scratch/local/jsquyres/bogus/lib/openmpi/mca_mtl_psm:
>/scratch/local/jsquyres/bogus/lib/openmpi/mca_mtl_psm.so: undefined
>symbol: ompi_mtl_psm_imrecv (ignored)
>
>I'll file a bug about this; I'm assuming this is an issue the 1.7 RMs
>will care about.

Did you file a bug?  Ralph fixed this one and I fixed its sister in probe,
so all should work now...

Brian

-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories








Re: [hwloc-devel] lstopo-gui

2012-05-24 Thread Brice Goglin
On 24/05/2012 17:16, Jeff Squyres wrote:
> Just for the record, I really, really dislike the fact that we've now split 
> lstopo into lstopo and lstopo-gui.
>
> Especially since I keep flipping back and forth between hwloc 1.4 and the 
> hwloc trunk, I continually get it wrong on the command line (e.g., use 
> "lstopo foo.pdf" when I needed to use lstopo-gui, or use "lstopo-gui foo.pdf" 
> when it doesn't exist).
>
> I think real users will be just as annoyed when they upgrade to hwloc 1.5.  I 
> really think we should revisit the issue and find a way to accommodate those 
> who don't want lots of dependencies in downstream binary packaging without 
> splitting into 2 different binaries.
>
> *** I say this because I think that the common case will be people who don't 
> give a whit about extra dependencies and just want lstopo to output pretty 
> jpg/pdf/whatever pictures -- just like it used to in all prior versions.  
> /etc/alternatives is not a good enough solution for us here upstream; 
> consider platforms that don't have /etc/alternatives-like solutions (OS X, 
> Windows).
>

So what you dislike isn't the split, it's the fact that lstopo doesn't
behave as it did earlier. You want lstopo-nogui and lstopo instead of
lstopo and lstopo-gui. And alternative systems can make lstopo point to
lstopo-nogui when the real lstopo isn't installed.

If you prefer adding plugin support, I won't object: my intern will
likely need some plugin support in the core soon anyway.

Brice



[OMPI devel] SVN trunk PSM MTL is busted

2012-05-24 Thread Jeff Squyres
Per Brian's recent MTL updates, the PSM MTL is busted.  I notice the following 
when I run on a machine that has the PSM software stack installed:

[ompi_r00lez:19108] mca: base: component_find: unable to open 
/scratch/local/jsquyres/bogus/lib/openmpi/mca_mtl_psm: 
/scratch/local/jsquyres/bogus/lib/openmpi/mca_mtl_psm.so: undefined symbol: 
ompi_mtl_psm_imrecv (ignored)

I'll file a bug about this; I'm assuming this is an issue the 1.7 RMs will care 
about.
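The "undefined symbol ... (ignored)" message means the component's shared object could not be loaded because the dynamic loader found no definition for ompi_mtl_psm_imrecv, so the component was skipped. A minimal POSIX sketch of that failure mode, assuming plain dlopen(); Open MPI's own component loader is not reproduced here, and try_open_component is a hypothetical name. On glibc older than 2.34, link with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Attempt to load one component's shared object.  RTLD_NOW asks the loader
 * to resolve every undefined symbol immediately, so a reference to a symbol
 * that no loaded object defines makes dlopen() fail here rather than later.
 * On failure the loader's message is copied into err and 0 is returned; on
 * success the handle is closed again and 1 is returned. */
static int try_open_component(const char *path, char *err, size_t errlen)
{
    assert(path != NULL && err != NULL && errlen > 0);
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(err, errlen, "%s", msg != NULL ? msg : "unknown dlopen() error");
        return 0;
    }
    dlclose(handle);
    err[0] = '\0';
    return 1;
}
```

Pointed at a component built from the broken tree, this would be expected to return 0 with a message naming the missing symbol, much like the log line above.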

-- 
Jeff Squyres
jsquy...@cisco.com




[OMPI devel] MPI_Cart_coords_f segv with Intel compiler

2012-05-24 Thread TERRY DONTJE
I am seeing several Cart Fortran tests (like MPI_Cart_coords_f) segv in 
opal_memory_ptmalloc2_int_free when OMPI trunk is compiled with icc 
12.1.0 for 64-bit Linux.  Just wondering if anyone has seen anything 
similar to this with a different version of icc.  Other non-Intel 
compilers do not seem to exhibit this issue.

