Re: [OMPI devel] [OMPI svn] svn:open-mpi r31577 - trunk/ompi/mca/rte/base

2014-04-30 Thread George Bosilca
wrote: > On Apr 30, 2014, at 6:35 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > >> Puzzling. We survived so far without such a requirement. > > Ralph tells me that this is a requirement. So I figured we should check for > it. > >> In the BTLs wher

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31577 - trunk/ompi/mca/rte/base

2014-04-30 Thread George Bosilca
that there be some way to reduce >> to 64-bits when accessing the common data. >> >> I think the usnic BTL may have an issue with that approach, so maybe some >> way of "unhashing" will be required? >> >> >> On Apr 30, 2014, at 3:42 PM, Jeff

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31577 - trunk/ompi/mca/rte/base

2014-04-30 Thread George Bosilca
branch, and seems to work pretty well so far. George. On Apr 30, 2014, at 22:01 , George Bosilca <bosi...@icl.utk.edu> wrote: > On Apr 30, 2014, at 20:04 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > >> All we need in usnic is the ompi_process_name_t

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31577 - trunk/ompi/mca/rte/base

2014-05-01 Thread George Bosilca
shift the process identifier down to the opal layer? >>>> If we define opal_identifier_t to include the required jobid/vpid, perhaps >>>> adding a void* so someone can put whatever they want in it? >>>> >>>> Note that I'm not wild about extending the

Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

2014-05-06 Thread George Bosilca
Any update on this? Can it be used in the RMA part? George. On Wed, Apr 23, 2014 at 1:58 AM, Gilles Gouaillardet wrote: > my bad :-( > > this has just been fixed > > Gilles > > On 2014/04/23 14:55, Nathan Hjelm wrote: >> The ompi_datatype_flatten.c file appears

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread George Bosilca
Strange. The outcome and the timing of this issue seem to highlight a link with the other datatype-related issue you reported earlier and, as suggested by Ralph, with Gilles' scif+vader issue. Generally speaking, the mechanism used to split the data in the case of multiple BTLs is identical to

Re: [OMPI devel] regression with derived datatypes

2014-05-08 Thread George Bosilca
Nathan, or anybody with access to the target hardware, if you can provide minimal output from the applications with and without the above-mentioned patch, with mpi_ddt_unpack_debug, mpi_ddt_pack_debug, and mpi_ddt_position_debug set to 1, I will try to help. George. On Thu, May

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-13 Thread George Bosilca
I heard multiple references to pthread_cancel being known to have bad side effects. Can somebody educate me on this topic please? Thanks, George. On Tue, May 13, 2014 at 10:25 PM, Ralph Castain wrote: > It could be a bug in the software stack, though I wouldn't count

Re: [OMPI devel] Non-uniform BTL problems in: openib, tcp, sctp, portals4, vader, scif

2014-05-14 Thread George Bosilca
Good catch. I fixed the TCP BTL (r31753). It is the only BTL I can test so that's the most I can do here. However, I never get OPAL_ERR_DATA_VALUE_NOT_FOUND out of the modex call when the key doesn't exist. I looked in dstore and the correct value one should look for is OPAL_ERR_NOT_FOUND. I

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread George Bosilca
resources, > including (but not limited to) file descriptors and malloc()ed memory. > Even if Open MPI is written very carefully, one cannot assume that all the > libraries it calls (and their dependencies, etc.) are written to properly > deal with cancellation. > > -Paul >

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread George Bosilca
There seems to be a consensus that closing an fd should trigger the return from poll. Unfortunately this assumption is wrong, and it is not supported by any documentation available online. To be clearer, all the documentation I know of tends to point in the opposite direction: it is unwise to
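The point above is that closing a descriptor is not a reliable way to wake a thread blocked in poll. One common alternative, shown here only as a minimal sketch and not as what the thread settled on, is to have poll also watch the read end of a wake-up pipe. The helper names wait_or_wakeup and request_wakeup are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: block until data_fd is readable or a wake-up byte
 * arrives on wake_rd. Returns 1 if data_fd is ready, 0 if woken through
 * the pipe, -1 on error. */
int wait_or_wakeup(int data_fd, int wake_rd)
{
    struct pollfd fds[2];

    assert(data_fd >= 0 && wake_rd >= 0);
    fds[0].fd = data_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = wake_rd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    if (poll(fds, 2, -1) < 0) {
        return -1;
    }
    if (fds[1].revents & POLLIN) {
        char c;
        /* Drain the wake-up byte so the next poll blocks again. */
        if (read(wake_rd, &c, 1) < 0) {
            return -1;
        }
        return 0;
    }
    return (fds[0].revents & (POLLIN | POLLHUP)) ? 1 : -1;
}

/* Hypothetical helper: called from another thread instead of close(). */
int request_wakeup(int wake_wr)
{
    char c = 'x';
    return (write(wake_wr, &c, 1) == 1) ? 0 : -1;
}
```

A thread that wants the poller to stop would call request_wakeup on the write end of a pipe created with pipe(), and only close the data descriptor after the poller has returned.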

Re: [OMPI devel] RFC: fix leak of bml endpoints

2014-05-15 Thread George Bosilca
The solution you propose here is definitely not OK. It 1) is ugly and 2) breaks the separation barrier that we hold dear. Regarding your other suggestion, I don’t see any reason not to call delete_proc on MPI_COMM_WORLD as the last action we do before tearing down everything else.

Re: [OMPI devel] [OMPI bugs] [Open MPI] #4645: Move r31786, 31829, r31830, r31833, r31834, r31835 to v1.8 branch (bml/r2 : fix mca_bml_r2_del_procs()) (was: Move r31786, 31829, r31830 to v1.8 branch (

2014-05-20 Thread George Bosilca
In order to cope with the dynamic case I think we will need to remove the check for a single registration and instead do an ompi_proc ref count. George. On Tue, May 20, 2014 at 6:58 AM, Open MPI wrote: > #4645: Move r31786, 31829, r31830, r31833, r31834, r31835 to v1.8

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread George Bosilca
From a practical perspective, I don't think there is a need for a phone call. Ralph made his point, and we all took notice of it. However, the proposed changes are in a single independent component, with no impact on the rest of the code base. Therefore, there is absolutely no valid reason not to

Re: [OMPI devel] RFC: add STCI component to OMPI/RTE framework

2014-05-27 Thread George Bosilca
On Tue, May 27, 2014 at 5:09 PM, Ralph Castain wrote: >> That being said, I agree with Ralph on the fact that accepting them in >> the trunk doesn't automatically qualify it for inclusion in any >> further stable release. However, if ORNL setup nightly builds to >> validate

Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite

2014-05-28 Thread George Bosilca
Calling MPI_Comm_free is not enough from an MPI perspective to clean up all knowledge about remote processes, nor to sever the links between the local and remote groups. One MUST call MPI_Comm_disconnect in order to achieve this. Look at the code in ompi/mpi/c and see the difference between

Re: [OMPI devel] regression with derived datatypes

2014-05-29 Thread George Bosilca
_position(...) is >> invoked and does not set the pConvertor->pStack >> as expected by r31496 >> >> i will run some more tests from now >> >> Gilles >> >> On 2014/05/08 2:23, George Bosilca wrote: >>> Strange. The outcome and the timing of

Re: [OMPI devel] fortran types alignment

2014-05-30 Thread George Bosilca
I think I like d the most but it is not a perfect solution. With d all real8 types in a common will be badly aligned and the Open MPI internal datatype will be incorrect. So I will vote for a combo: b + d. George. On Fri, May 30, 2014 at 4:57 AM, Gilles Gouaillardet

Re: [OMPI devel] btl/scif: SIGSEGV in MPI_Finalize()

2014-06-02 Thread George Bosilca
If the scif BTL registered its own memory registration function, I would have expected that it would deregister it upon finalize. Without this we run into circular dependencies that are not solvable at the library level. George. On Mon, Jun 2, 2014 at 12:39 AM, Gilles Gouaillardet

[OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-06-05 Thread George Bosilca
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL WHY: All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several

Re: [OMPI devel] [patch] man and FUNC_NAME corrections

2014-07-09 Thread George Bosilca
Thanks for the patch. I applied it to the trunk in r32190, and filed a CMR (#4780) for it for the next release, 1.8.2. George. On Thu, Jul 10, 2014 at 3:09 AM, Kawashima, Takahiro < t-kawash...@jp.fujitsu.com> wrote: > Hi, > > The attached patch corrects trivial typos in man files and > FUNC_NAME

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread George Bosilca
Nathan, Fixing the classes to correctly tear down everything was a two-line patch. However, this doesn’t fix the bigger issue, which is related to the fact that not all frameworks are correctly torn down, and when they are, they leave behind char* parameters not set to NULL, and that the

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-15 Thread George Bosilca
be good > because at least there will be no conflicts with usnic BTL concurrent > development. :-) > > > > > On Jul 10, 2014, at 2:56 PM, Ralph Castain <r...@open-mpi.org> wrote: > > > George: any update on when this will happen? > > > > > >

Re: [OMPI devel] 100% test failures

2014-07-15 Thread George Bosilca
I'm also looking into it. George. On Tue, Jul 15, 2014 at 10:50 AM, Nathan Hjelm wrote: > On Tue, Jul 15, 2014 at 11:40:38PM +0900, Gilles GOUAILLARDET wrote: > >r32236 is a suspect > > > >i am afk > > > >I just read the code and a class is initialized with >

Re: [OMPI devel] 100% test failures

2014-07-15 Thread George Bosilca
r32248 should be the fix for this issue. I was overly optimistic about the cleanup of the classes. It turns out this is not possible without a deep rearrangement of the class infrastructure. More info in the commit log. Sorry for the mess, George. On Tue, Jul 15, 2014 at 11:38 AM, George

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread George Bosilca
are unloaded.   George. On July 15, 2014 at 1:17:26 AM, George Bosilca (bosi...@icl.utk.edu) wrote: > Nathan, > > Fixing the classes to correctly tear down everything was a two lines patch. > However, > this doesn’t fix the bigger issue, which is related to the fact that not al

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread George Bosilca
Enforcing the portability of this sounds like a huge [almost impossible] mess, without a clean portable solution (more about this below). However, a few things should be considered: - Except for reinit, Open MPI works without it! If we provide such a capability it will be more a convenience

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-17 Thread George Bosilca
Are these also called for shared libraries? George. On Wed, Jul 16, 2014 at 3:36 PM, Paul Hargrove wrote: > > On Wed, Jul 16, 2014 at 7:36 AM, Nathan Hjelm wrote: > >> Correction. xlc does support the destructor function attribute. The odd >> one out

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-17 Thread George Bosilca
I think Case #1 is only a partial solution, as it only solves the example attached to the ticket. Based on my reading of the tool chapter, calling MPI_T_init after MPI_Finalize is legitimate, and this case is not covered by the patch. But this is not the major issue I have with this patch. From a

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread George Bosilca
discuss the init-after-finalize issue, and he intends to > raise it with the Forum as it doesn't seem a logical thing to do. So that > issue may go away. Still leaves us pondering the right solution, and > hopefully coming up with something better than either of the ones we have > so far. >

Re: [OMPI devel] barrier before calling del_procs

2014-07-21 Thread George Bosilca
There was a long thread of discussion on why we must use an rte_barrier and not an mpi_barrier during the finalize. Basically, as long as we have connectionless unreliable BTLs, we need an external mechanism to ensure complete tear-down of the entire infrastructure. Thus, we need to rely on an

Re: [OMPI devel] barrier before calling del_procs

2014-07-23 Thread George Bosilca
m rankB, while rankB is still doing MPI work. In this case rankB will > not be able to communicate with rankA any more, while it still has work to > do. > > > > *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *George > Bosilca > *Sent:* Monday, July 21, 2014 9:1

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-23 Thread George Bosilca
information about peer processes in the >>> usnic BTL to include the peer's VPID, which is the MCW rank. I'll be sad >>> if that goes away... >>> >>> >>> On Jul 15, 2014, at 2:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote: >>>

Re: [OMPI devel] SM initialization race condition

2008-08-21 Thread George Bosilca
Terry, We use the feature defined by POSIX mmap where the area should be zero-filled when the file length is extended. What OS are you using when you see such problems? Just in case, here is a patch that sets the beginning of the mmaped region to zero, in case this is not done
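The message above relies on an extended file appearing zero-filled and offers a patch that zeroes the start of the mapping explicitly. The patch itself is not reproduced in this listing, so what follows is only a sketch of the general idea under stated assumptions: the helper name map_zeroed is hypothetical, and zeroing the whole region rather than just its beginning is a choice made for the illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: extend fd to len bytes, map it shared, and zero it
 * explicitly instead of trusting the platform to zero-fill the extension.
 * Returns NULL on failure. In a multi-process setup only the process that
 * creates the file should do this, before any peer attaches. */
void *map_zeroed(int fd, size_t len)
{
    void *base;

    assert(fd >= 0 && len > 0);
    if (ftruncate(fd, (off_t) len) != 0) {
        return NULL;
    }
    base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }
    memset(base, 0, len);   /* belt and braces: do not rely on zero-fill */
    return base;
}
```

The explicit memset costs one pass over the region at creation time, which is usually negligible next to the cost of debugging a segment that starts out with stale contents.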

Re: [OMPI devel] SM initialization race condition

2008-08-21 Thread George Bosilca
is a gnu-ism. We should probably use memset instead. On Aug 21, 2008, at 5:40 AM, George Bosilca wrote: Terry, We use the feature defined by POSIX mmap where the area should be zero-filled when the file length is extended. What OS you're using when you see such problems ? Just in case

Re: [OMPI devel] Possible buffer overrun bug in opal_free_list_grow, called by MPI::Init

2008-08-28 Thread George Bosilca
Patrick, I'm unable to reproduce the buffer overrun with the latest trunk. I run valgrind (with the memchecker tool) on a regular basis on the trunk, and I never noticed anything like that. Moreover, I went over the code, and I cannot imagine how we can overrun the buffer in the code you

Re: [OMPI devel] Any problems with https://svn.open-mpi.org/trac/ompi/ ??

2008-09-18 Thread George Bosilca
Same here. The TRAC server seems to have some problems, but the svn access is still working. george. On Sep 18, 2008, at 10:28 AM, Lenny Verkhovsky wrote: Any problems with https://svn.open-mpi.org/trac/ompi/ ?? I can open a new ticket :( Internal Server Error The server encountered an

Re: [OMPI devel] Upgrade GNU auto tools?

2008-09-19 Thread George Bosilca
I've been using these versions for some time, basically from the date they get released. So far, no issues have been raised. However, I do not see any benefit with these new versions (on Linux and Mac OS X). george. On Sep 19, 2008, at 9:56 AM, Tim Mattox wrote: Just an FYI, Last night

Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread George Bosilca
Ralph, There is NO need to have this discussion again; it was painful enough last time. From my perspective, I do not understand why you are making so much noise on this one. How can a 4-line change in some ALPS-specific files (Cray system very specific to ORNL) generate more than 3 A4

Re: [OMPI devel] Should visibility and memchecker abort configure?

2008-10-03 Thread George Bosilca
Ralph, in order to get the behavior you describe for the visibility feature, just don't specify --enable-visibility. This will enable it if the feature is supported and disable it (with a small warning) if not. We decided a while ago that 1) we should have a consistent behavior for similar

Re: [OMPI devel] Should visibility and memchecker abort configure?

2008-10-03 Thread George Bosilca
arly state that we are unable to find any (or some specific) version of the valgrind libraries. The behavior related to memchecker you described in your second email seems like a deviation from this, so from my perspective it should be considered as a bug. george. On Oct 3, 2008, at 7:18

Re: [OMPI devel] sendi sm BTL function

2008-10-10 Thread George Bosilca
There is a simple (and good) reason not to have it in the upcoming release. For lack of time, I didn't manage to test it well enough. As I'm not 100% confident that it will not create any problems, I preferred to leave it out of 1.3. If you want to test it, please feel free to do so. And

Re: [OMPI devel] Possible buffer overrun bug in opal_free_list_grow, called by MPI::Init

2008-10-15 Thread George Bosilca
I did investigate this issue for about 3 hours yesterday. Neither valgrind nor efence reports any errors on my cluster. I'm using debian unstable with gcc-4.1.2. Adding printfs doesn't show the same output as yours; all addresses are in the correct range. I went over the code manually, and

Re: [OMPI devel] Possible buffer overrun bug in opal_free_list_grow, called by MPI::Init

2008-10-17 Thread George Bosilca
? If yes does it resolve the issue ? george. On Oct 16, 2008, at 7:29 PM, Stephan Kramer wrote: George Bosilca wrote: I did investigate this issue for about 3 hours yesterday. Neither valgrind nor efence report any errors on my cluster. I'm using debian unstable with gcc-4.1.2. Adding

Re: [OMPI devel] Possible buffer overrun bug in opal_free_list_grow, called by MPI::Init

2008-10-21 Thread George Bosilca
Stephan, The fix was committed in the trunk (revision 19778). Fixes for 1.2, as well as 1.3, are pending. Thanks for your help, george. On Oct 20, 2008, at 5:43 PM, George Bosilca wrote: Stephan, I think you're completely right, and that I had a wrong understanding

Re: [OMPI devel] Direct routed module

2008-10-22 Thread George Bosilca
Youpiii! george. On Oct 21, 2008, at 4:53 PM, Ralph Castain wrote: Hello all I am working on adding a new radix tree routed module and am simultaneously doing a little streamlining to the overall routed- related code for scalability. One thing that would help cleanup several areas

Re: [OMPI devel] Comm_spawn limits

2008-10-22 Thread George Bosilca
What happens if the counter rolls over? george. On Oct 22, 2008, at 2:49 PM, Ralph Castain wrote: There recently was activity on the mailing lists where someone was attempting to call comm_spawn 100,000 times. Setting aside the threading issues that were the focus of that

Re: [OMPI devel] Component open

2008-10-22 Thread George Bosilca
Ralph, This problem was fixed long ago by some of the work Camille did. The exact revision number is r15402 (https://svn.open-mpi.org/trac/ompi/changeset/15402). I'm using this feature daily and so far I haven't had any problems with it. To reuse your example, here is what Camille came up with. $

Re: [OMPI devel] MPI_Com_spawn

2008-10-29 Thread George Bosilca
, Ralph Castain wrote: Done...r19820 On Oct 28, 2008, at 8:37 AM, Ralph Castain wrote: Yes, of course it does - the problem is in a sanity check I just installed over the weekend. Easily fixed... On Oct 28, 2008, at 8:33 AM, George Bosilca wrote: Ralph, I run in troubles with the new IO

Re: [OMPI devel] libevent

2008-11-07 Thread George Bosilca
Leonardo, All events generated by libevent are caught internally by the ompi library, but are not propagated until the next call to opal_progress. If you want to use alarms that trigger outside opal_progress, you will have to deal directly with libevent (and not use

Re: [OMPI devel] Amateur Guidance

2008-11-07 Thread George Bosilca
On Nov 7, 2008, at 11:41 AM, Timothy Hayes wrote: http://macneill.cs.tcd.ie/~hayesti/ompi.jpg This is unfortunately not available to the outside world. N.B. The XEN component in the BTL layer represents what I'm trying to make. So far so good, the BTL is what you need in order to move

Re: [OMPI devel] Dropped message for the non-existing communicator

2008-11-08 Thread George Bosilca
Apparently it was with 19845, so before the patch that is supposed to fix this issue. Terry, can you please test with a more recent version (> 19929)? Thanks, george. On Nov 8, 2008, at 9:54 AM, Edgar Gabriel wrote: Terry, was this with the trunk or v1.3? If it was the trunk, was it

Re: [OMPI devel] More README questions

2008-11-15 Thread George Bosilca
On Nov 15, 2008, at 10:30 , Jeff Squyres wrote: I've reviewed and updated the entire README file, but have several questions that need to be answered by others in the community. I committed all my changes, but marked sections of the file with "***" where there's still a question about

Re: [OMPI devel] RFC: Add SunStudio/Libtool helper script for post-configure

2008-11-19 Thread George Bosilca
Are we still using STL? I was pretty sure that we removed this dependency a while ago. george. On Nov 19, 2008, at 09:11 , Ethan Mallove wrote: WHAT: Add patch-libtool-for-sun-studio.pl script

Re: [OMPI devel] RFC: merge windows branch into trunk

2008-11-25 Thread George Bosilca
Shiqing, Don't waste your time. While the idea behind cccl is nice, the overhead is unbelievably expensive. As a comparison it took 2 hours to compile Open MPI on Windows using cccl and makefile, while it takes less than 4 minutes to compile exactly the same set of functionalities using

Re: [OMPI devel] Preparations for moving the btl's

2008-12-03 Thread George Bosilca
Terry, I'm involved [to some degree] in both efforts and I can confirm these two efforts will not affect each other in any bad way. george. On Dec 3, 2008, at 11:42 , Terry Dontje wrote: I don't have any *strong* objections. However, I know that Eugene and George B have been working on

Re: [OMPI devel] Fwd: [OMPI users] Onesided + derived datatypes

2008-12-11 Thread George Bosilca
Brian, You're right, the datatype is being too cautious with the boundaries when detecting the overlap. There is no good solution to detect the overlap except parsing the whole memory layout to check the status of every predefined type. As one can imagine this is a very expensive

Re: [OMPI devel] Fwd: [OMPI users] Onesided + derived datatypes

2008-12-11 Thread George Bosilca
s to devel. Someone might want to reply to Dorian's e-mail on users. Brian On Dec 11, 2008, at 2:31 PM, George Bosilca wrote: Brian, You're right, the datatype is being too cautious with the boundaries when detecting the overlap. There is no good solution to detect the overlap exce

Re: [OMPI devel] Fwd: [OMPI users] Onesided + derived datatypes

2008-12-12 Thread George Bosilca
solution. However, the words "not it" come to mind. Sorry, but I have way too much on my plate this month. By the way, in case no one noticed, I had e-mailed my findings to devel. Someone might want to reply to Dorian's e-mail on users. Brian On Dec 11, 2008, at 2:31 PM, Geor

Re: [OMPI devel] Fwd: [OMPI users] Onesided + derived datatypes

2008-12-13 Thread George Bosilca
eone might want to reply to Dorian's e-mail on users. Brian On Dec 11, 2008, at 2:31 PM, George Bosilca wrote: Brian, You're right, the datatype is being too cautious with the boundaries when detecting the overlap. There is no good solution to detect the overlap except parsing the wh

Re: [OMPI devel] sm BTL "extra procs"

2008-12-24 Thread George Bosilca
As Rich stated, the original design of the SM BTL included [some] support for dynamic processes. Over the years, for lack of interest and manpower, this support was more or less dropped. Some pieces of the code were removed or disabled, but apparently not everything. However, lately the

Re: [OMPI devel] OpenMPI Performance Problem with Open|SpeedShop

2009-01-12 Thread George Bosilca
There might be one reason for the application to slow down quite a bit. If the timers you're using interact with libevent (the library we use internally to manage all kinds of events), then we might end up in a situation where we call poll for every iteration in the event

Re: [OMPI devel] openmpi-1.3rc4 build failure with qsnet4.30

2009-01-13 Thread George Bosilca
Paul, Thanks for noticing the Elan problem. It appears we are missing one patch in 1.3 (https://svn.open-mpi.org/trac/ompi/changeset/20122). I'll file a CMR asap. Thanks, george. On Jan 13, 2009, at 16:31 , Paul H. Hargrove wrote: Since it looks like you guys are very close to

Re: [OMPI devel] autosizing the shared memory backing file

2009-01-13 Thread George Bosilca
The simple answer is you can't. The mpool is loaded before the BTLs, and on Linux the loader uses the RTLD_NOW flag (i.e. all symbols have to be defined or the dlopen call will fail). Moreover, there is no way in Open MPI to exchange information between components except a global variable or

Re: [OMPI devel] 1.3 PML default choice

2009-01-13 Thread George Bosilca
This topic was raised on the mailing list quite a few times. There is a major difference between the PSM and the MX support. For PSM there is just an MTL, which makes everything a lot simpler. The problem with MX is that we have an MTL and a BTL. In order to figure out which one to use, we

Re: [OMPI devel] 1.3 PML default choice

2009-01-13 Thread George Bosilca
rent behavior to match. Brian On Jan 13, 2009, at 6:27 PM, George Bosilca wrote: This topic was raised on the mailing list quite a few times. There is a major difference between the PSM and the MX support. For PSM there is just an MTL, which makes everything a lot simpler. The problem with

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-13 Thread George Bosilca
Unfortunately, this pinpoints the fact that we didn't test the mixing of collective modules enough. I went over the tuned collective functions and changed all instances to use the correct module information. It is now on the trunk, revision 20267. Simultaneously, I checked that all other

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread George Bosilca
Here we go by the book :) https://svn.open-mpi.org/trac/ompi/ticket/1749 george. On Jan 13, 2009, at 23:40 , Jeff Squyres wrote: Let's debate tomorrow when people are around, but first you have to file a CMR... :-) On Jan 13, 2009, at 10:28 PM, George Bosilca wrote: Unfortunately

Re: [OMPI devel] reduce_scatter bug with hierarch

2009-01-14 Thread George Bosilca
: http://www.open-mpi.org/mtt/index.php?do_redir=922 So, I'll vote for applying the CMR for 1.3 since it clearly improved things, but there is still more to be done to get coll_hierarch ready for regular use. On Wed, Jan 14, 2009 at 12:15 AM, George Bosilca <bosi...@eecs.utk.edu>

Re: [OMPI devel] RFC: Eliminate opal_round_up_to_nearest_pow2()

2009-01-15 Thread George Bosilca
Absolutely! Why wait until 1.4 when we can have that in 1.3.1... george. On Jan 15, 2009, at 16:39 , Eugene Loh wrote: I don't know what scope of changes require RFCs, but here's a trivial change. == RFC: Eliminate

Re: [OMPI devel] Open-MX vs OMPI 1.3 using MX internal symbols

2009-01-26 Thread George Bosilca
There are several reasons these calls are there. Please read further. On Jan 26, 2009, at 02:19 , Brice Goglin wrote: Hello, I am testing OpenMPI 1.3 over Open-MX. OpenMPI 1.2 works well but 1.3 does not load. This is caused by OMPI MX components now using some MX internal symbols

Re: [OMPI devel] Open-MX vs OMPI 1.3 using MX internal symbols

2009-01-26 Thread George Bosilca
On Jan 26, 2009, at 15:31 , Brice Goglin wrote: George Bosilca wrote: Yes, the only thing we need is a unique identifier per cluster. We use the last 6 digits from the mapper MAC address. Ok, thanks for the details. We are going to implement all this in Open-MX now. Then, I guess

Re: [OMPI devel] Trunk broken at r20375

2009-01-28 Thread George Bosilca
Seems more like a compiler problem. A static inline function defined in the header file but never used is the source of the problem. It did compile for me with the gcc from Leopard and 4.3.1 on Linux. I'll commit the fix asap. george. On Jan 28, 2009, at 14:26 , Ralph Castain wrote:

Re: [OMPI devel] RFC: Move of ompi_bitmap_t

2009-02-03 Thread George Bosilca
In the current bitmap implementation every time we set or check a bit we have to compute the index of the char where this bit is set and the relative position from the beginning of char. This requires two _VERY_ expensive operations: a division and a modulo. Compared with the cost of these
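The cost concern above is about locating a bit with a division and a modulo on every access. As a hedged illustration only (the function names are hypothetical and this is not the ompi_bitmap_t code), the two forms can be put side by side. The shift/mask variant assumes 8-bit chars; note that compilers typically perform this transformation on their own when the operand is unsigned and the divisor is a constant power of two.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#if CHAR_BIT != 8
#error "this sketch assumes 8-bit chars"
#endif

/* Division/modulo form: byte index = bit / 8, offset = bit % 8. */
void bitmap_set_divmod(unsigned char *map, size_t bit)
{
    assert(map != NULL);
    map[bit / CHAR_BIT] |= (unsigned char) (1u << (bit % CHAR_BIT));
}

/* Shift/mask form: bit >> 3 equals bit / 8, bit & 7 equals bit % 8. */
void bitmap_set_shift(unsigned char *map, size_t bit)
{
    assert(map != NULL);
    map[bit >> 3] |= (unsigned char) (1u << (bit & 7u));
}

/* Returns 1 if the bit is set, 0 otherwise. */
int bitmap_test_shift(const unsigned char *map, size_t bit)
{
    assert(map != NULL);
    return (int) ((map[bit >> 3] >> (bit & 7u)) & 1u);
}
```

Both setters produce identical bitmaps; the caller is responsible for making sure the map is large enough for the bit index in either form.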

Re: [OMPI devel] RFC: Move of ompi_bitmap_t

2009-02-03 Thread George Bosilca
it ... george. On Feb 3, 2009, at 15:30 , Jeff Squyres wrote: On Feb 3, 2009, at 3:24 PM, George Bosilca wrote: In the current bitmap implementation every time we set or check a bit we have to compute the index of the char where this bit is set and the relative position from the beginning of char

Re: [OMPI devel] bug in openmpi-1.3/ompi/mpi/f77/profile/pcart_coords_f.c

2009-02-04 Thread George Bosilca
Christoph, You're absolutely right. In addition to your comment about the syntactically wrong line of code, even in the case when the Fortran and C integers have the same length, we modify the value pointed to by the Fortran IN-only argument. A patch is on the way. Thanks, george. On

Re: [OMPI devel] add_procs

2009-02-05 Thread George Bosilca
This functionality has as many chances to be called as any MPI 2 dynamics MPI functions. Every time the MPI universe is expanded, once the modex of the new processes is known, add procs is called in order to allow the PML and BTL to update their local view of the MPI universe. The code is

Re: [OMPI devel] RFC: Rename several OMPI_* names to OPAL_*

2009-02-10 Thread George Bosilca
These changes look fine to me. However, I would like to amend this proposal to include the splitting of the config directory. Over the last months, I have learned of several projects that use OPAL, and they like to use it as an independent part and not as a subset of ompi. Therefore, I had to extract

Re: [OMPI devel] MPI_Op_free crashes in MTT

2009-02-12 Thread George Bosilca
I'm unable to replicate these errors with revision r20529. All tests pass on my Linux cluster, TCP based not Myrinet. Let's see if other contributors to the MTT tests trigger the same errors. george. On Feb 12, 2009, at 12:04 , Tim Mattox wrote: Hello, Last night's MTT runs show a

Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h

2009-02-13 Thread George Bosilca
I can't confirm or deny. The only thing I can tell is that the same test works fine over other BTLs, so this tends to point either to a problem in the sm BTL or to a particular path in the PML (the one used by the sm BTL). I'll have to dig a little bit more into it, but I was hoping to do it

Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562

2009-02-16 Thread George Bosilca
Josh, Spending a few minutes to understand could have pointed you to the real culprit: the tool itself! The assert in the code states that on finalize there is still a registered signal handler. A quick gdb session shows that this is for SIGCHLD. Tracking the signal addition in the tool

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-16 Thread George Bosilca
Based on several man pages, free is capable of handling a NULL argument. What is really puzzling is that on your system it doesn't ... I tried on two systems, a 64-bit Debian and my Mac OS X with all memory allocator options on, and I'm unable to get such a warning :( george. On Feb

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-17 Thread George Bosilca
I guess that if the free function supports the NULL pointer we should do the same... george. On Feb 17, 2009, at 07:35 , Jeff Squyres wrote: On Feb 16, 2009, at 9:16 PM, George Bosilca wrote: Based on several man pages, free is capable of handling a NULL argument. What is really

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-17 Thread George Bosilca
17, 2009, at 11:18 AM, George Bosilca wrote: I guess that if the free function supports the NULL pointer we should do the same... I'll agree with that if we know for sure that free(NULL) is universally supported. You mentioned "a few man pages" -- how universal is this support?
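On the question of how universal free(NULL) is: ISO C has specified since C89 that free performs no action when passed a null pointer, so a conforming implementation must accept it. The sketch below shows a wrapper that relies on that guarantee; the name release_and_clear is hypothetical and is not the function changed in r20568.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: free *p and reset it to NULL. Calling it again on
 * the same pointer is harmless because free(NULL) is a no-op in ISO C. */
void release_and_clear(char **p)
{
    assert(p != NULL);   /* the address of the pointer must be valid */
    free(*p);            /* *p may be NULL; no guard is needed */
    *p = NULL;
}
```

Resetting the pointer after the free also addresses the separate problem of parameters left dangling after teardown, since a second release becomes a no-op instead of a double free.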

Re: [OMPI devel] sm BTL question: frag alloc

2009-02-17 Thread George Bosilca
Eugene, It appears this is an sm BTL problem. The prepare_src function can be called with any size. The BTL should check the size against the eager limit and return a descriptor that matches the size requested. george. On Feb 17, 2009, at 20:14 , Eugene Loh wrote: (Rich: same question as I

Re: [OMPI devel] possible bugs and unexpected values in returned errors classes

2009-02-19 Thread George Bosilca
I fail to find anything about this in the MPI Standard. To me, passing the NULL error handle to any kind of set-handler function should not be an error. It should mean that you prefer not to have any error handler triggered on the object. george. On Feb 19, 2009, at 09:34 , Lisandro

Re: [OMPI devel] possible bugs and unexpected values in returned errors classes

2009-02-19 Thread George Bosilca
_REQUEST_NULL should be the empty status (defined in the MPI standard) and not any kind of error (i.e., MPI_ERR_ARG). george. On Feb 19, 2009, at 11:43 , Jeff Squyres wrote: On Feb 19, 2009, at 10:47 AM, George Bosilca wrote: I fail to find anything about this in the MPI standard. MPI doesn

Re: [OMPI devel] RFC: eliminating "descriptor" argument from sendi function

2009-02-23 Thread George Bosilca
It doesn't sound reasonable to me. There is a reason for this, and I think it's a good reason. The sendi function works for some devices as a fast path for sending data, when the network is not flooded. However, in the case sendi cannot do the job we expect, the fact that it returns the

Re: [OMPI devel] RFC: eliminating "descriptor" argument from sendi function

2009-02-23 Thread George Bosilca
On Feb 23, 2009, at 12:14 , Eugene Loh wrote: I'm a newbie and George is a veteran. So, this feels rather like David and Goliath. (Hmm, David won and became king. Gee, I kinda like that.) Anyhow... That's an old story, we're living in modern times now ;) George Bosilca wrote

Re: [OMPI devel] Failure to make progress

2009-02-23 Thread George Bosilca
Ken, Your interpretation of the MPI standard is way too optimistic. Unfortunately, there is no asynchronous progress (except on very few devices) in most of the MPI libraries. So, you should not expect the non-blocking send to complete without going into some MPI calls (MPI_Test as an

Re: [OMPI devel] RFC: eliminating "descriptor" argument from sendi function

2009-02-24 Thread George Bosilca
Here is another way to write the code without having to pay the expensive initialization of sendreq. first_time = 0; for ( btl = ... ) { if ( SUCCESS == sendi() ) return SUCCESS; if( 0 == first_time++) set_up_expensive_send_request(); if ( SUCCESS == send() ) return
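
The loop quoted above can be fleshed out as the following sketch, under stated assumptions. The type fake_btl_t, the function try_send, and the counter setup_calls are hypothetical stand-ins, not the PML or BTL interfaces. The quoted snippet is cut off after the send() test, so returning SUCCESS there and FAILURE once every BTL has been tried are assumptions as well.

```c
#include <assert.h>
#include <stddef.h>

enum { SUCCESS = 0, FAILURE = -1 };

/* Hypothetical stand-in for a BTL: a fast path and a regular path. */
typedef struct {
    int (*sendi)(void);   /* immediate send, may refuse */
    int (*send)(void);    /* regular send, needs the send request */
} fake_btl_t;

/* Counts how often the expensive setup ran (illustration only). */
int setup_calls = 0;

void set_up_expensive_send_request(void)
{
    setup_calls++;
}

/* Trivial path implementations for building example BTL tables. */
int path_ok(void)   { return SUCCESS; }
int path_fail(void) { return FAILURE; }

/* Try each BTL in turn; pay for the send request at most once,
 * and only if some sendi() has already failed. */
int try_send(const fake_btl_t *btls, size_t n)
{
    int first_time = 0;

    assert(btls != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (SUCCESS == btls[i].sendi()) {
            return SUCCESS;              /* fast path, no setup paid */
        }
        if (0 == first_time++) {
            set_up_expensive_send_request();
        }
        if (SUCCESS == btls[i].send()) {
            return SUCCESS;              /* assumed return value */
        }
    }
    return FAILURE;                      /* assumed: all BTLs refused */
}
```

When the first sendi() succeeds, setup_calls stays at zero. When several BTLs have to be tried, the setup still runs only once.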

Re: [OMPI devel] mca_btl_sm_sendi question

2009-02-25 Thread George Bosilca
On Feb 24, 2009, at 18:08 , Eugene Loh wrote: (Probably this message only for George, but I'll toss it out to the alias/archive.) I have a question about the sm sendi() function. What should happen if the sendi() function attempts to write to the FIFO, but the FIFO is full? The write

Re: [OMPI devel] Bug (wrong LB?) when using cascading derived data types

2009-03-02 Thread George Bosilca
Markus, You're right, there was a problem in the code. I'll skip the gory details of the why and how. The problem is now fixed by commit r20674. It will be in the next release. Thanks, george. On Mar 2, 2009, at 10:04 , Markus Blatt wrote: Hi, I already posted this accidentally

Re: [OMPI devel] PML Start error?

2009-03-02 Thread George Bosilca
Right, this should be reinitialized at the beginning of each loop. However, the current code works fine; it only calls ompi_convertor_set_position twice if the condition is true. This function checks whether the current position matches the requested one, and does nothing if that is the case.
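
The reason a redundant second call is harmless can be illustrated with a toy position setter that returns early when nothing changes. This is only a sketch of the behavior described above. The type toy_convertor_t and the function toy_set_position are hypothetical and do not reproduce the real ompi_convertor_set_position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified convertor state. */
typedef struct {
    size_t position;        /* current byte offset */
    unsigned repositions;   /* how many times real work was done */
} toy_convertor_t;

/* Move to the requested position; returns 1 if work was done,
 * 0 if the convertor was already there. */
int toy_set_position(toy_convertor_t *conv, size_t requested)
{
    assert(conv != NULL);
    if (conv->position == requested) {
        return 0;           /* already in place: nothing to do */
    }
    conv->position = requested;
    conv->repositions++;
    return 1;
}
```

Calling toy_set_position twice in a row with the same offset does the repositioning work only the first time.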

Re: [OMPI devel] PML/ob1 problem

2009-03-03 Thread George Bosilca
Which solution seems to be working? This bug was fixed a while ago in the trunk (https://svn.open-mpi.org/trac/ompi/changeset/20591) and in the 1.3 branch. It even made it into 1.3.2. george. On Mar 3, 2009, at 05:01 , Lenny Verkhovsky wrote: Seems to be working. George, can you

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-04 Thread George Bosilca
On Mar 4, 2009, at 14:44 , Eugene Loh wrote: Let me try another thought here. Why do we have BTL sendi functions at all? I'll make an assertion and would appreciate feedback: a BTL sendi function contributes nothing to optimizing send latency. To optimize send latency in the

Re: [OMPI devel] RFC: move BTLs out of ompi into separate layer

2009-03-09 Thread George Bosilca
On Mar 9, 2009, at 15:13 , Ralph Castain wrote: Could you please clarify - what is going to happen on Mar 23 (your timeout date)? It also wasn't clear about your testing. Are you calling up into the ONET layer to run it from the RTE? I believe this was the point of concern regarding

Re: [OMPI devel] OMPI vs Scali performance comparisons

2009-03-17 Thread George Bosilca
The default values for the large message fragments are not optimized for the new generation processors. This might be something to investigate, in order to see if we can have the same bandwidth as they do or not. george. On Mar 17, 2009, at 18:23 , Eugene Loh wrote: A colleague of mine

Re: [OMPI devel] Infinite Loop: ompi_free_list_wait

2009-03-23 Thread George Bosilca
It is a known problem. When the freelist is empty, going into ompi_free_list_wait will block the process until at least one fragment becomes available. As a fragment can become available only when returned by the BTL, this can lead to deadlocks in some cases. The workaround is to ban the

Re: [OMPI devel] OMPI 1.3 - PERUSE peruse_comm_spec_t peer Negative Value

2009-03-23 Thread George Bosilca
You are absolutely right, the peer should never be set to -1 on any of the PERUSE callbacks. I checked the code this morning and figured out what the problem was. We report the peer and the tag attached to a request before setting the right values (some code moved around). I submitted a
