Re: [OMPI devel] openmpi-1.3rc4 build failure with qsnet4.30

2009-01-14 Thread Paul H. Hargrove
I can confirm that both 1.3rc6 and 1.2.9rc2 now build fine for me. -Paul George Bosilca wrote: Paul, Thanks for noticing the Elan problem. It appears we are missing one patch in 1.3 (https://svn.open-mpi.org/trac/ompi/changeset/20122). I'll file a CMR asap. Thanks, george. On Jan 13,

[OMPI devel] Open MPI v1.3rc6 has been posted

2009-01-14 Thread Tim Mattox
Hi All, The sixth (yes 6!) release candidate of Open MPI v1.3 is now available: http://www.open-mpi.org/software/ompi/v1.3/ Please run it through its paces as best you can. Anticipated release of 1.3 is tomorrow morning. This only has a fix for a segfault in coll_hierarch_component.c with

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread Brad Benton
r20275 looks good. I suggest that we CMR that into 1.3 and get rc6 rolled and tested. (actually, Jeff just did the CMR...so off to rc6) --brad On Wed, Jan 14, 2009 at 1:16 PM, Edgar Gabriel wrote: > so I am not entirely sure why the bug only happened on trunk, it could in >

Re: [OMPI devel] OpenMPI rpm build 1.3rc3r20226 build failed

2009-01-14 Thread Matthias Jurenz
Sorry, I have searched the whole day for a solution to that problem, but unfortunately, I'm clueless :-( I cannot say which flag causes the compile error. Furthermore, I'm also unable to reproduce this error on some different platforms. The coding style in the concerned source file looks also not

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread Edgar Gabriel
So I am not entirely sure why the bug only happened on trunk; it could in theory also appear on v1.3 (is there a difference in how pointer_arrays are handled between the two versions?) Anyway, it passes now on both with changeset 20275. We should probably move that over to 1.3 as well,

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread Brad Benton
So, if it looks okay on 1.3...then there should not be anything holding up the release, right? Otherwise, George we need to decide on whether or not this is a blocker, or if we go ahead and release with this as a known issue and schedule the fix for 1.3.1. My vote is to go ahead and release, but

Re: [OMPI devel] crcpw verbosity

2009-01-14 Thread Josh Hursey
The crcpw component is in the PML framework. The following should be the MCA parameter you are looking for: pml_crcpw_verbose=20 You can use the 'ompi_info' command to find out more about the available MCA parameters. For example, to find this one you can use the following:
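The snippet above is cut off before its own command, which is not reproduced here. As a separate, minimal sketch of the naming pattern the reply describes (framework, then component, then parameter), the Python below builds the parameter name and shows two common ways to pass it to a run: the `--mca` option of `mpirun` and an `OMPI_MCA_`-prefixed environment variable. The helper names, the `./a.out` program, and the process count are hypothetical and chosen only for illustration; nothing is launched.

```python
import os


def mca_param_name(framework, component, param):
    """Join the pieces as '<framework>_<component>_<param>'."""
    return "_".join([framework, component, param])


def mca_env(name, value, base_env=None):
    """Return a copy of the environment with OMPI_MCA_<name> set.

    Open MPI reads MCA parameters from environment variables that
    carry the OMPI_MCA_ prefix.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_" + name] = str(value)
    return env


def mpirun_argv(name, value, program, nprocs=2):
    """Build an mpirun command line that sets the parameter via --mca.

    'program' and 'nprocs' are placeholders (assumptions), not values
    taken from the thread.
    """
    return ["mpirun", "--mca", name, str(value), "-np", str(nprocs), program]


# crcpw lives in the PML framework, so the verbosity parameter is:
name = mca_param_name("pml", "crcpw", "verbose")

if __name__ == "__main__":
    print(name)
    print(" ".join(mpirun_argv(name, 20, "./a.out")))
```

This also suggests why the names tried in the original question were not recognized: neither "ompi_crcpw_verbose" nor "crcpw_base_verbose" starts with the framework name, pml.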

Re: [OMPI devel] autosizing the shared memory backing file

2009-01-14 Thread Eugene Loh
I think you'd like to know more than just how many procs are local. E.g., if the chunk or eager limits are changed much, that would impact how much memory you'd like to allocate. A phone chat is all right for me, though so far all I've heard is that no one understands the code! But, maybe

[OMPI devel] Open MPI v1.3rc5 has been posted

2009-01-14 Thread Tim Mattox
Hi All, The fifth release candidate of Open MPI v1.3 is now available: http://www.open-mpi.org/software/ompi/v1.3/ Please run it through its paces as best you can. Anticipated release of 1.3 is tonight/tomorrow. (again) -- Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread Edgar Gabriel
I'm already debugging it. The good news is that it only seems to appear with trunk; with 1.3 (after copying the new tuned module over), all the tests pass. Now if somebody can tell me a trick on how to tell mpirun not to kill the debugger under my feet, then I could even see where the problem

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread George Bosilca
All these errors are in MPI_Finalize, so it should not be that hard to find. I'll take a look later this afternoon. george. On Jan 14, 2009, at 06:41 , Tim Mattox wrote: Unfortunately, although this fixed some problems when enabling hierarch coll, there is still a segfault in two of

[OMPI devel] crcpw verbosity

2009-01-14 Thread Caciano Machado
Hi, What variable should I set to increase the verbosity of the crcpw component? I've tried "ompi_crcpw_verbose=20" and "crcpw_base_verbose=20". How can I figure out the name of the variable? Regards, Caciano

Re: [OMPI devel] OpenMPI question

2009-01-14 Thread Jeff Squyres
To follow up for the web archives -- we discussed this more off-list. AFAIK, compiling Open MPI -- including its memory registration cache -- works fine in 32 bit mode, even on 64 bit platforms (there was some confusion between virtual and physical memory addresses and who uses what [OMPI

Re: [OMPI devel] -display-map

2009-01-14 Thread Ralph Castain
We -may- be able to do a more formal XML output at some point. The problem will be the natural interleaving of stdout/err from the various procs due to the async behavior of MPI. Mpirun receives fragmented output in the forwarding system, limited by the buffer sizes and the amount of data

Re: [OMPI devel] autosizing the shared memory backing file

2009-01-14 Thread Ralph Castain
I also know little about that part of the code, but agree that does seem weird. Seeing as we know how many local procs there are before we get to this point, I would think we could be smart about our memory pool size. You might not need to dive into the sm BTL to get the info you need - if

Re: [OMPI devel] RFC: Fragmented sm Allocations

2009-01-14 Thread Jeff Squyres
Whoa, this analysis rocks. :-) I'm going through trying to grok it all... Just wanted to say: kudos for this. On Jan 14, 2009, at 1:14 AM, Eugene Loh wrote: RFC: Fragmented sm Allocations WHAT: Dealing with the fragmented allocations of sm BTL FIFO circular buffers (CB) during

Re: [OMPI devel] autosizing the shared memory backing file

2009-01-14 Thread Jeff Squyres
Ya, that does seem weird to me, but I never fully grokked the whole mpool / allocator scheme (I haven't had to interact with that part of the code much). Would it be useful to get on the phone and discuss this stuff? On Jan 14, 2009, at 1:11 AM, Eugene Loh wrote: Thanks for the reply. I

Re: [OMPI devel] -display-map

2009-01-14 Thread Greg Watson
Ralph, The only time we use the resolved names is when we get a map, so we consider them part of the map output. If quasi-XML is all that will ever be possible with 1.3, then you may as well leave as-is and we will attempt to clean it up in Eclipse. It would be nice if a future version

Re: [OMPI devel] OpenMPI Performance Problem with Open|SpeedShop

2009-01-14 Thread Ralph Castain
If your timer is actually generating an interrupt to the process, then that could be the source of the problem. I believe the event library also treats interrupts as events, and assigns them the highest priority. So every one of your interrupts would cause the event library to stop what it

Re: [OMPI devel] RFC: Fragmented sm Allocations

2009-01-14 Thread Ralph Castain
I haven't reviewed the code either, but really do appreciate someone taking the time for such a thorough analysis of the problems we have all observed for some time! Thanks Eugene!! On Jan 14, 2009, at 5:05 AM, Tim Mattox wrote: Great analysis and suggested changes! I've not had a

Re: [OMPI devel] OpenMPI rpm build 1.3rc3r20226 build failed

2009-01-14 Thread Jeff Squyres
Is there some code that can be fixed instead? I.e., is this feature totally incompatible with whatever RPM compiler flags are used, or is it just some coding style that these particular flags don't like? On Jan 14, 2009, at 5:05 AM, Matthias Jurenz wrote: Another workaround should be to

Re: [OMPI devel] RFC: Fragmented sm Allocations

2009-01-14 Thread Tim Mattox
Great analysis and suggested changes! I've not had a chance yet to look at your hg branch, so this isn't a code review... Barring a bad code review, I'd say these changes should all go in the trunk for inclusion in 1.4. 2009/1/14 Eugene Loh : > > > RFC: Fragmented sm

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread Tim Mattox
Unfortunately, although this fixed some problems when enabling hierarch coll, there is still a segfault in two of IU's tests that only shows up when we set -mca coll_hierarch_priority 100 See this MTT summary to see how the failures improved on the trunk, but that there are still two that

Re: [OMPI devel] OpenMPI rpm build 1.3rc3r20226 build failed

2009-01-14 Thread Matthias Jurenz
Another workaround should be to disable the I/O tracing feature of VT by adding the configure option '--with-contrib-vt-flags=--disable-iotrace' That will have the effect that the upcoming OMPI RPMs have no support for I/O tracing, but in our opinion it is not so bad... Furthermore, we

[OMPI devel] RFC: Fragmented sm Allocations

2009-01-14 Thread Eugene Loh
RFC: Fragmented sm Allocations WHAT: Dealing with the fragmented allocations of sm BTL FIFO circular buffers (CB) during MPI_Init(). Also: Improve handling of error codes. Automate the sizing of the mmap file. WHY: To reduce consumption of

Re: [OMPI devel] autosizing the shared memory backing file

2009-01-14 Thread Eugene Loh
Thanks for the reply. I kind of understand, but it's rather weird. The BTL calls mca_mpool_base_module_create() to create a pool of memory, but the BTL has no say how big of a pool to create? Could you imagine having a memory allocation routine ("malloc" or something) that didn't allow you

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread George Bosilca
Here we go by the book :) https://svn.open-mpi.org/trac/ompi/ticket/1749 george. On Jan 13, 2009, at 23:40 , Jeff Squyres wrote: Let's debate tomorrow when people are around, but first you have to file a CMR... :-) On Jan 13, 2009, at 10:28 PM, George Bosilca wrote: Unfortunately, this