This commit causes mpirun to segfault when running the IBM spawn tests
on our slurm platforms (it may affect others as well). The failures only
happen when mpirun is run in a batch script.
The backtrace I get is:
Program terminated with signal 11, Segmentation fault.
#0 0x002a969b9dbe in
To echo what Josh said, there are no special compile flags being used.
If you send me a patch with debug output, I'd be happy to run it for you.
Both odin and sif are fairly normal linux based clusters, with ethernet
and openib IP networks. The ethernet network has both ipv4 & ipv6, and
the
Hi Adrian,
After this change, I am getting a lot of errors of the form:
[sif2][[12854,1],9][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by
peer (104)
See for instance: http://www.open-mpi.org/mtt/index.php?do_redir=615
I have found this
Hate to bring this up again, but I was thinking that an easy way to
reduce the size of the modex would be to reduce the length of the names
describing each piece of data.
More concretely, for a simple run I get the following names, each of
which are sent over the wire for every proc (note
Hi all,
I reported this before, but it seems that the report got lost. I have
found some situations where mpirun will return a '0' when there is an error.
An easy way to reproduce this is to edit the file
'orte/mca/plm/base/plm_base_launch_support.c' and on line 154 put in
'return
Thanks for the report. As Ralph indicated the threading support in Open
MPI is not good right now, but we are working to make it better.
I have filed a ticket (https://svn.open-mpi.org/trac/ompi/ticket/1267)
so we do not lose track of this issue, and attached a potential fix to
the ticket.
Is there a reason to rename ompi_modex_{send,recv} to
ompi_modex_proc_{send,recv}? It seems simpler (and no more confusing and
less work) to leave the names alone and add ompi_modex_node_{send,recv}.
Another question: Does the receiving process care that the information
received applies to a
Unfortunately now with r17988 I cannot run any mpi programs, they seem
to hang in the modex.
Tim
Ralph H Castain wrote:
Thanks Tim - I found the problem and will commit a fix shortly.
Appreciate your testing and reporting!
On 3/27/08 8:24 AM, "Tim Prins" <tpr...@cs.india
This commit breaks things for me. Running on 3 nodes of odin:
mpirun -mca btl tcp,sm,self examples/ring_c
causes a hang. All of the processes are stuck in
orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang,
and the ring program does not hang all the time, but fairly often.
try running with:
--mca opal_event_include select
and see if that fixes the problem for you?
On Mar 25, 2008, at 8:49 AM, Tim Prins wrote:
Hi everyone,
For the last couple nights ALL of our mtt runs have been failing
(although the failure is masked because mpirun is returning the wrong
Hi everyone,
For the last couple nights ALL of our mtt runs have been failing
(although the failure is masked because mpirun is returning the wrong
error code) with:
[odin005.cs.indiana.edu:28167] [[46567,0],0] ORTE_ERROR_LOG: Error in file
base/plm_base_launch_support.c at line 161
Hi,
Something went wrong last night and all our MTT tests had the following
output:
[odin005.cs.indiana.edu:28167] [[46567,0],0] ORTE_ERROR_LOG: Error in file
base/plm_base_launch_support.c at line 161
--
mpirun was unable
WHAT: Reduce the number of tests run by make check
WHY: Some of the tests will not work properly until Open MPI is
installed. Also, many of the tests do not really test anything.
WHERE: See below.
TIMEOUT: COB Friday March 14
DESCRIPTION:
We have been having many problems with make check
carto modules.
Is there some reason why carto absolutely must find a module? Can we
create a default "none available" module in the base?
On 3/4/08 7:39 AM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
Hi,
We have been having a problem lately with our MTT runs where mak
Hi,
We have been having a problem lately with our MTT runs where make check
would fail when mpi threads were enabled.
Turns out the problem is that opal_init now calls
opal_base_carto_select, which cannot find any carto modules since we
have not done an install yet. So it returns a failure.
We have used '^' elsewhere to indicate not, so maybe just have the
syntax be if you put '^' at the beginning of a line, that node is not used.
So we could have:
n0
n1
^headnode
n3
I understand the idea of having a flag to indicate that all nodes below
a certain point should be ignored, but I
WHAT: Removal of orte_proc_table
WHY: It is the last 'orte' class, its implementation is an abstraction
violation since it assumes certain things about how the opal_hash_table
is implemented, and it is not much code to remove it.
WHERE: This will necessitate minor changes in:
btl: tcp,
, 2008, at 9:19 AM, Tim Prins wrote:
Hi,
We are running into a problem with the IBM test cxx_call_errhandler
since the merge of the c++ bindings changes. Not sure if this is a
known
problem, but I did not see a bug or any traffic about this one.
MTT link: http://www.open-mpi.org/mtt/index.php
Hi,
We are running into a problem with the IBM test cxx_call_errhandler
since the merge of the c++ bindings changes. Not sure if this is a known
problem, but I did not see a bug or any traffic about this one.
MTT link: http://www.open-mpi.org/mtt/index.php?do_redir=532
Thanks,
Tim
Adrian Knoth wrote:
On Fri, Feb 01, 2008 at 11:40:20AM -0500, Tim Prins wrote:
Adrian,
Hi!
Sorry for the late reply and thanks for your testing.
1. There are some warnings when compiling:
I've fixed these issues.
Thanks.
2. If I exclude all my tcp interfaces, the connection fails
I just talked to Jeff about this. The problem was that on Sif we use
--enable-visibility, and apparently the new c++ bindings access
ompi_errhandler_create, which was not OMPI_DECLSPEC'd. Jeff will fix
this soon.
Tim
Jeff Squyres wrote:
I'm a little concerned about the C++ test build
Adrian,
For the most part this seems to work for me. But there are a few issues.
I'm not sure which are introduced by this patch, and whether some may be
expected behavior. But for completeness I will point them all out.
First, let me explain I am working on a machine with 3 tcp interfaces,
Hi Matthias,
I just noticed something else that seems odd. On a fresh checkout, I did
an autogen and configure. Then I type 'make clean'. Things seem to
progress normally, but once it gets to ompi/contrib/vt/vt/extlib/otf, a
new configure script gets run.
Specifically:
[tprins@sif test]$
Hi,
I am seeing some warnings on the trunk when compiling udapl in 32 bit
mode with OFED 1.2.5.1:
btl_udapl.c: In function 'udapl_reg_mr':
btl_udapl.c:95: warning: cast from pointer to integer of different size
btl_udapl.c: In function 'mca_btl_udapl_alloc':
btl_udapl.c:852: warning: cast
Jeff Squyres wrote:
I got a bunch of compiler warnings and errors with VT on the PGI
compiler last night -- my mail client won't paste it in nicely. :-(
See these MTT reports for details:
- On Absoft systems:
http://www.open-mpi.org/mtt/index.php?do_redir=516
- On Cisco systems:
With
On Wednesday 02 January 2008 08:52:08 am Jeff Squyres wrote:
> On Dec 31, 2007, at 11:42 PM, Paul H. Hargrove wrote:
> > I tried today to build the OMPI trunk on a system w/ GM libs installed
> > (I tried both GM-2.0.16 and GM-1.6.4) and found that the GM BTL won't
> > even compile, due to
Hi,
A couple of questions.
First, in opal_condition_wait (condition.h:97) we do not release the
passed mutex if opal_using_threads() is not set. Is there a reason for
this? I ask since this violates the way condition variables are supposed
to work, and it seems like there are situations
Hi,
Last night we had one of our threaded builds on the trunk hang when
running make check on the test opal_condition in test/threads/
After running the test about 30-40 times, I was only able to get it to
hang once. Looking at it in gdb, we get:
(gdb) info threads
3 Thread 1084229984
Well, I think it is pretty obvious that I am a fan of an attribute system :)
For completeness, I will point out that we also exchange architecture
and hostname info in the modex.
Do we really need a complete node map? As far as I can tell, it looks
like the MPI layer only needs a list of local
Hi,
The following files bother me about this commit:
trunk/ompi/mca/btl/sctp/sctp_writev.c
trunk/ompi/mca/btl/sctp/sctp_writev.h
They bother me for 2 reasons:
1. Their naming does not follow the prefix rule
2. They are LGPL licensed. While I personally like the LGPL, I do not
believe
an
> error in the way I was doing things, or could be a real characteristic of
> the parser. Anyway, we would have to ensure that the parser removes any
> surrounding "" before passing along the param value or this won't work.
>
> Ralph
>
> On 11/5/07 12:10 PM, &
Thanks for the clarification everyone.
Tim
On Monday 05 November 2007 05:41:00 pm Torsten Hoefler wrote:
> On Mon, Nov 05, 2007 at 05:32:04PM -0500, Brian W. Barrett wrote:
> > On Mon, 5 Nov 2007, Torsten Hoefler wrote:
> > > On Mon, Nov 05, 2007 at 04:57:19PM -0500, Brian W. Barrett wrote:
> >
Hi,
After talking with Torsten today I found something weird. When using the SLURM
pls we seem to forward a user's environment, but when using the rsh pls we do
not.
I.e.:
[tprins@odin ~]$ mpirun -np 1 printenv |grep foo
[tprins@odin ~]$ export foo=bar
[tprins@odin ~]$ mpirun -np 1 printenv
Hi,
Commit 16364 broke things when using multiword mca param values. For
instance:
mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
"ssh -Y" xterm
Will crash and burn, because the value "ssh -Y" is being stored into the
argv orted_cmd_line in orterun.c:1506. This
Hi,
The openib and udapl btls currently use the orte_pointer_array class.
This is a problem for me as I am trying to implement the RSL. So, as far
as I can tell, there are 3 options:
1. Move the orte_pointer_array class to opal. This would be quite simple
to do and makes sense in that there
Hi,
I am working on implementing the RSL. Part of this is changing the modex
to use the process attribute system in the RSL. I had designed this
system to include a non-blocking interface.
However, I have looked again and noticed that nobody is using the
non-blocking modex receive.
WHAT: Remove the opal message buffer code
WHY: It is not used
WHERE: Remove references from opal/mca/base/Makefile.am and
opal/mca/base/base.h
svn rm opal/mca/base/mca_base_msgbuf*
WHEN: After timeout
TIMEOUT: COB, Wednesday October 10, 2007
I ran into this code
ight help
> you. The option --enable-mem-debug add a unused space at the end of
> each memory allocation and make sure we don't write anything there. I
> think this is the simplest way to pinpoint this problem.
>
>Thanks,
> george.
>
> On Sep 21, 2007, at 10:07 AM,
Hi folks,
In our nightly runs with the trunk I have started seeing cases where we
appear to be segfaulting within/below malloc. Below is a typical output.
Note that this appears to only happen on the trunk, when we use openib,
and are in 32 bit mode. It seems to happen randomly at a very low
This is fixed in r16164.
Tim
Brian Barrett wrote:
On Sep 19, 2007, at 4:11 PM, Tim Prins wrote:
Here is where it gets nasty. On FreeBSD, /usr/include/string.h
includes
strings.h in some cases. But there is a strings.h in the ompi/mpi/f77
directory, so that is getting included instead
Jeff Squyres wrote:
That's fine, too. I don't really care -- /public already exists. We
can simply rename it to /tmp-public.
Let's do that. It should (more or less) address all concerns that have
been voiced.
Tim
On Aug 31, 2007, at 8:52 AM, Ralph Castain wrote:
Why not make
Why not make /tmp-public and /tmp-private?
Leave /tmp alone. Have all new branches made in one of the two new
directories, and as /tmp branches are slowly whacked, we can
(eventually) get rid of /tmp.
Tim
Jeff Squyres (jsquyres) wrote:
I thought about both of those (/tmp/private and
ns of ORTE, nor for supporting
ORTE development. It would be nice if we could re-evaluate this after the
next ORTE version becomes solidified to see how the cost/benefit analysis
has changed, and whether the RSL remains a desirable option.
Ralph
On 8/16/07 7:47 PM, "Tim Prins" <tp
which supports
every system ever imagined, and provides every possible fault-tolerant
feature, when all they want is a thin RTE.
Tim
george.
On Aug 16, 2007, at 9:47 PM, Tim Prins wrote:
WHAT: Solicitation of feedback on the possibility of adding a runtime
services layer to Open MPI
me concern as Jeff's).
Understood.
Tim
--td
Tim Prins wrote:
WHAT: Solicitation of feedback on the possibility of adding a runtime
services layer to Open MPI to abstract out the runtime.
WHY: To solidify the interface between OMPI and the runtime environment,
and to allow the use of
gainst moving too fast, and then having to redo things.
> I am very supportive if this, I do believe this is the right way to go,
> unless someone else can come up with a better idea, and time to implement.
Thanks for the comments,
Tim
>
> Thanks,
> Rich
>
> On 8/16/07 9
it is impossible to say right now, but wanted to throw it out
there for people to consider/think about.
Tim
>
> On Aug 16, 2007, at 9:47 PM, Tim Prins wrote:
> > WHAT: Solicitation of feedback on the possibility of adding a runtime
> > services layer to Open MPI to abstract
Jeff Squyres wrote:
On Aug 16, 2007, at 11:48 AM, Tim Prins wrote:
+#define ORTE_RML_TAG_UDAPL 25
+#define ORTE_RML_TAG_OPENIB 26
+#define ORTE_RML_TAG_MVAPI 27
I think that UDAPL, OPENIB, MVAPI should not appear anywhere in the
ORTE layer
Sorry, I pushed the wrong button and sent this before it was ready
Tim Prins wrote:
Hi folks,
I am running into a problem with the ibm test 'group'. I will try to
explain what I think is going on, but I do not really understand the
group code so please forgive me if it is wrong
Hi folks,
I am running into a problem with the ibm test 'group'. I will try to
explain what I think is going on, but I do not really understand the
group code so please forgive me if it is wrong...
The test creates a group based on MPI_COMM_WORLD (group1), and a group
that has half the
Hi folks,
I was looking at the rml usage in ompi, and noticed that several of the
btls (udapl, mvapi, and openib) use the same rml tag for their messages.
My guess is that this is a mistake, but just want to ask if there is a
reason for this before I correct it.
Thanks,
Tim
This might be breaking things on odin. All our 64 bit openib mtt tests
have the following output:
[odin003.cs.indiana.edu:30971] Wrong QP specification (QP 0
"P,128,256,128,16:S,1024,256,128,32:S,4096,256,128,32:S,65536,256,128,32").
Point-to-point QP get 1-5 parameters
However, on my debug
Scott Atchley wrote:
On Jul 10, 2007, at 3:24 PM, Tim Prins wrote:
On Tuesday 10 July 2007 03:11:45 pm Scott Atchley wrote:
On Jul 10, 2007, at 2:58 PM, Scott Atchley wrote:
Tim, starting with the recently released 1.2.1, it is the default.
To clarify, MX_RCACHE=1 is the default.
It would
Jeff Squyres wrote:
2. The "--enable-mca-no-build" option takes a comma-delimited list of
components that will then not be built. Granted, this option isn't
exactly intuitive, but it was the best that we could think of at the
time to present a general solution for inhibiting the build of a
Gleb Natapov wrote:
On Sun, Jul 08, 2007 at 12:41:58PM -0400, Tim Prins wrote:
On Sunday 08 July 2007 08:32:27 am Gleb Natapov wrote:
On Fri, Jul 06, 2007 at 06:36:13PM -0400, Tim Prins wrote:
While looking into another problem I ran into an issue which made ob1
segfault on me. Using gm
On Sunday 08 July 2007 08:32:27 am Gleb Natapov wrote:
> On Fri, Jul 06, 2007 at 06:36:13PM -0400, Tim Prins wrote:
> > While looking into another problem I ran into an issue which made ob1
> > segfault on me. Using gm, and running the test test_dan1 in the onesided
> > t
done those tests, then my apology - but your note only indicates
that you ran "hello_world" and are basing your recommendation *solely* on
that test.
On 6/6/07 7:51 AM, "Tim Prins" <tpr...@open-mpi.org> wrote:
I hate to go back to this, but...
The or
, but...sigh.
Anyway, it doesn't appear to have any bearing either way on George's
patch(es), so whoever wants to commit them is welcome to do so.
Thanks
Ralph
On 5/29/07 11:44 AM, "Ralph Castain" <r...@lanl.gov> wrote:
On 5/29/07 11:02 AM, "Tim Prins" <tpr...@open-
it, let me know and I'll push it in the
trunk asap.
Thanks,
george.
On May 29, 2007, at 10:56 AM, Tim Prins wrote:
I think both patches should be put in immediately. I have done some
simple testing, and with 128 nodes of odin, with 1024 processes
running mpi hello, these decrease our running
I think both patches should be put in immediately. I have done some
simple testing, and with 128 nodes of odin, with 1024 processes
running mpi hello, these decrease our running time from about 14.2
seconds to 10.9 seconds. This is a significant decrease, and as the
scale increases there
Hi everyone,
I have been playing around with Open-MPI, using it as a test bed for
another project I am working on, and have found that on the intel test
suite, ompi is failing the MPI_Allreduce_user_c,
MPI_Reduce_scatter_user_c, and MPI_Reduce_user_c tests (it prints
something like MPITEST error