Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread Brad Benton
r20275 looks good. I suggest that we CMR that into 1.3 and get rc6 rolled and tested. (actually, Jeff just did the CMR...so off to rc6) --brad On Wed, Jan 14, 2009 at 1:16 PM, Edgar Gabriel wrote: > so I am not entirely sure why the bug only happened on trunk, it could in >

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread Edgar Gabriel
So I am not entirely sure why the bug only happened on trunk; it could in theory also appear on v1.3 (is there a difference in how pointer_arrays are handled between the two versions?) Anyway, it passes now on both with changeset 20275. We should probably move that over to 1.3 as well,

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread Brad Benton
So, if it looks okay on 1.3...then there should not be anything holding up the release, right? Otherwise, George, we need to decide whether or not this is a blocker, or if we go ahead and release with this as a known issue and schedule the fix for 1.3.1. My vote is to go ahead and release, but

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread Edgar Gabriel
I'm already debugging it. The good news is that it only seems to appear with trunk; with 1.3 (after copying the new tuned module over), all the tests pass. Now if somebody can tell me a trick on how to tell mpirun not to kill the debugger under my feet, then I could even see where the problem

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread George Bosilca
All these errors are in MPI_Finalize, so it should not be that hard to find. I'll take a look later this afternoon. george. On Jan 14, 2009, at 06:41 , Tim Mattox wrote: Unfortunately, although this fixed some problems when enabling hierarch coll, there is still a segfault in two of

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread Tim Mattox
Unfortunately, although this fixed some problems when enabling hierarch coll, there is still a segfault in two of IU's tests that only shows up when we set -mca coll_hierarch_priority 100. See this MTT summary to see how the failures improved on the trunk, but that there are still two that

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread George Bosilca
Here we go by the book :) https://svn.open-mpi.org/trac/ompi/ticket/1749 george. On Jan 13, 2009, at 23:40 , Jeff Squyres wrote: Let's debate tomorrow when people are around, but first you have to file a CMR... :-) On Jan 13, 2009, at 10:28 PM, George Bosilca wrote: Unfortunately, this

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-13 Thread Tim Mattox
George, I suggest that you file a CMR for r20267 and we can go from there. If it makes 1.3 it makes it, otherwise we have it ready for 1.3.1. At this point the earliest 1.3 will go out is Wednesday late morning (presuming I'm the one moving the bits), and is more likely to hit the website in the

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-13 Thread Jeff Squyres
Let's debate tomorrow when people are around, but first you have to file a CMR... :-) On Jan 13, 2009, at 10:28 PM, George Bosilca wrote: Unfortunately, this pinpoints the fact that we didn't sufficiently test the mixing of collective modules. I went over the tuned collective functions and

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-13 Thread George Bosilca
Unfortunately, this pinpoints the fact that we didn't sufficiently test the mixing of collective modules. I went over the tuned collective functions and changed all instances to use the correct module information. It is now on the trunk, revision 20267. Simultaneously, I checked that all other

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-13 Thread Jeff Squyres
Thanks for digging into this. Can you file a bug? Let's mark it for v1.3.1. I say 1.3.1 instead of 1.3.0 because this *only* affects hierarch, and since hierarch isn't currently selected by default (you must specifically elevate hierarch's priority to get it to run), there's no danger

[OMPI devel] reduce_scatter bug with hierarch

2009-01-13 Thread Edgar Gabriel
I just debugged the Reduce_scatter bug mentioned previously. The bug is unfortunately not in hierarch, but in tuned. Here is the code snippet causing the problems: int reduce_scatter (, mca_coll_base_module_t *module) { ... err = comm->c_coll.coll_reduce (, module) ... } but
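
The pattern described in this message can be illustrated with a small standalone sketch. It is not the Open MPI source: the struct layouts, the `hierarch_reduce` stand-in, and the `_buggy`/`_fixed` function names are hypothetical simplifications, and the use of a per-function `coll_reduce_module` pointer for the fix is an assumption based on the later remark in the thread that the tuned functions were changed "to use the correct module information". The idea is that one component's function receives its own module pointer and then hands that same pointer to a reduce function that may belong to a different component.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the collective framework types. */
struct module {
    const char *name;
    int state;                      /* per-module private data */
};

typedef int (*reduce_fn_t)(int value, struct module *module);

struct coll_table {
    reduce_fn_t coll_reduce;            /* function selected for this communicator */
    struct module *coll_reduce_module;  /* module that owns that function (assumed field) */
};

struct comm {
    struct coll_table c_coll;
};

static struct module hierarch_module = { "hierarch", 100 };
static struct module tuned_module    = { "tuned", 1 };

/* Stand-in for hierarch's reduce: only valid with hierarch's own module.
 * In real code a foreign module would be misinterpreted, not rejected. */
static int hierarch_reduce(int value, struct module *module)
{
    if (module != &hierarch_module) {
        return -1;
    }
    return value + module->state;
}

/* Buggy pattern: forwards tuned's module to whatever reduce is installed. */
static int tuned_reduce_scatter_buggy(struct comm *comm, int value,
                                      struct module *module)
{
    return comm->c_coll.coll_reduce(value, module);
}

/* Assumed fix: pass the module that belongs to the selected reduce. */
static int tuned_reduce_scatter_fixed(struct comm *comm, int value,
                                      struct module *module)
{
    (void)module;                   /* tuned's own module is not needed here */
    return comm->c_coll.coll_reduce(value, comm->c_coll.coll_reduce_module);
}

/* Small self-check: hierarch provides reduce, tuned provides reduce_scatter. */
static int demo_module_mixing(void)
{
    struct comm comm = { { hierarch_reduce, &hierarch_module } };

    assert(tuned_reduce_scatter_buggy(&comm, 5, &tuned_module) == -1);
    assert(tuned_reduce_scatter_fixed(&comm, 5, &tuned_module) == 105);
    return 0;
}
```

In this sketch the buggy variant only misbehaves when the reduce function comes from a different component than the caller, which would be consistent with the problem showing up only once hierarch's priority is raised.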