Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?
Rolf, I think it is not a good idea to increase the default value to 2G. Keep in mind that not many people have a machine with 128 or more cores on a single node. Most people will have nodes with 2, 4, or maybe 8 cores, so it is not necessary to set this parameter to such a high value. It may end up allocating all of this memory per node, and if you have only 4 or 8G per node that is imbalanced. For my 8-core nodes I have even decreased the sm_max_size to 32M and I had no problems with that. As far as I know (if not otherwise specified at runtime) this parameter is global, so even if you run on your machine with 2 procs it might allocate the 2G for the sm module. I would recommend, as Richard suggests, setting the parameter for your machine in etc/openmpi-mca-params.conf rather than changing the default value.

Markus

Rolf vandeVaart wrote:
> We are running into a problem when running on one of our larger SMPs
> using the latest Open MPI v1.2 branch. We are trying to run a job
> with np=128 within a single node. We are seeing the following error:
>
> "SM failed to send message due to shortage of shared memory."
>
> We then increased the allowable maximum size of the shared segment to
> 2 gigabytes - 1, which is the maximum allowed for a 32-bit application.
> We used the mca parameter to increase it as shown here.
>
> -mca mpool_sm_max_size 2147483647
>
> This allowed the program to run to completion. Therefore, we would
> like to increase the default maximum from 512 Mbytes to 2G-1.
> Does anyone have an objection to this change? Soon we are going to
> have larger CPU counts and would like to increase the odds that things
> work "out of the box" on these large SMPs.
>
> On a side note, I did a quick comparison of the shared memory needs of
> the old Sun ClusterTools 6 to Open MPI and came up with this table.
>
>                              Open MPI
>   np   Sun ClusterTools 6   current   suggested
>  -----------------------------------------------
>    2          20M             128M       128M
>    4          20M             128M       128M
>    8          22M             256M       256M
>   16          27M             512M       512M
>   32          48M             512M         1G
>   64         133M             512M       2G-1
>  128         476M             512M       2G-1
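For reference, the per-machine override Markus and Richard are describing can be set either system-wide or per run. A sketch of both forms follows; the 2G-1 value is just the one from Rolf's experiment, and ./a.out stands in for whatever application is launched:

    # System-wide default in <prefix>/etc/openmpi-mca-params.conf
    mpool_sm_max_size = 2147483647

    # Or per run, on the mpirun command line
    mpirun -np 128 -mca mpool_sm_max_size 2147483647 ./a.out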
Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?
Maybe a clarification of the SM BTL implementation is needed. Does the SM BTL not set a limit based on np, using the max allowable as a ceiling? If not, and all jobs are allowed to use up to the max allowable, then I see the reason for not wanting to raise it.

That being said, it seems to me that the memory usage of the SM BTL is a lot larger than it should be. Wasn't there some work done around June that looked at why the SM BTL was allocating so much memory? Did anything come out of that?

--td
Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?
There are 3 parameters that control how much memory is used by the SM BTL:

  MCA mpool: parameter "mpool_sm_max_size" (current value: "536870912")
             Maximum size of the sm mpool shared memory file
  MCA mpool: parameter "mpool_sm_min_size" (current value: "134217728")
             Minimum size of the sm mpool shared memory file
  MCA mpool: parameter "mpool_sm_per_peer_size" (current value: "33554432")
             Size (in bytes) to allocate per local peer in the sm mpool shared
             memory file, bounded by min_size and max_size

To paraphrase the above, the default ceiling is 512M, the default floor is 128M, and the scaling factor is 32M * procs_on_node. Therefore, changing the ceiling would only affect cases where there are more than 16 processes on a node (16 * 32M = 512M). My suggestion was to increase the ceiling from 512M to 2G-1.

And yes, we could adjust it as Rich suggested by setting the parameter in our customized openmpi-mca-params.conf file. I just was not sure that was the optimal solution.

Rolf
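To make the sizing rule above concrete, here is a small standalone sketch of how the three parameters combine. The function name is made up for illustration, and the real mpool_sm component adds bookkeeping overhead and alignment that this ignores:

#include <stdint.h>
#include <stdio.h>

/* Illustration of the sizing rule Rolf paraphrases: scale by the per-peer
 * size, then clamp to the configured floor and ceiling. */
static uint64_t sm_file_size(int procs_on_node,
                             uint64_t per_peer_size,  /* default 32M  */
                             uint64_t min_size,       /* default 128M */
                             uint64_t max_size)       /* default 512M */
{
    uint64_t size = per_peer_size * (uint64_t)procs_on_node;
    if (size < min_size) size = min_size;
    if (size > max_size) size = max_size;
    return size;
}

int main(void)
{
    const uint64_t M = 1024 * 1024;
    /* With the defaults, anything beyond 16 local procs is capped at 512M. */
    printf("np=8:   %llu MB\n", (unsigned long long)(sm_file_size(8,   32*M, 128*M, 512*M) / M));
    printf("np=16:  %llu MB\n", (unsigned long long)(sm_file_size(16,  32*M, 128*M, 512*M) / M));
    printf("np=128: %llu MB\n", (unsigned long long)(sm_file_size(128, 32*M, 128*M, 512*M) / M));
    return 0;
}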
Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?
On Mon, 2007-08-27 at 15:10 -0400, Rolf vandeVaart wrote:
> We are running into a problem when running on one of our larger SMPs
> using the latest Open MPI v1.2 branch. We are trying to run a job
> with np=128 within a single node. We are seeing the following error:
>
> "SM failed to send message due to shortage of shared memory."
>
> We then increased the allowable maximum size of the shared segment to
> 2Gigabytes-1 which is the maximum allowed on 32-bit application. We
> used the mca parameter to increase it as shown here.
>
> -mca mpool_sm_max_size 2147483647
>
> This allowed the program to run to completion. Therefore, we would
> like to increase the default maximum from 512Mbytes to 2G-1 Gigabytes.
> Does anyone have an objection to this change? Soon we are going to
> have larger CPU counts and would like to increase the odds that things
> work "out of the box" on these large SMPs.

There is a serious problem with the 1.2 branch: it does not allocate any SM area for each process at the beginning. SM areas are allocated on demand, and if some of the processes are more aggressive than the others, it will cause starvation. This problem is fixed in the trunk by assigning at least one SM area to each process. I think this is what you saw (starvation), and an increase of max size may not be necessary.

Ollie
Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?
On Aug 28, 2007, at 9:05 AM, Li-Ta Lo wrote:

> There is a serious problem with the 1.2 branch: it does not allocate
> any SM area for each process at the beginning. SM areas are allocated
> on demand, and if some of the processes are more aggressive than the
> others, it will cause starvation. This problem is fixed in the trunk
> by assigning at least one SM area to each process. I think this is
> what you saw (starvation), and an increase of max size may not be
> necessary.

Although I'm pretty sure this is fixed in the v1.2 branch already.

I don't think we should raise that ceiling at this point. We create the file in /tmp, and if someone does -np 32 on a single, small node (not unheard of), it'll do really evil things.

Personally, I don't think we need nearly as much shared memory as we're using. It's a bad design in terms of its unbounded memory usage. We should fix that, rather than making the file bigger. But I'm not going to fix it, so take my opinion with a grain of salt.

Brian
Re: [OMPI devel] thread model
On Aug 27, 2007, at 10:04 PM, Jeff Squyres wrote:

> On Aug 27, 2007, at 2:50 PM, Greg Watson wrote:
>
>> Until now I haven't had to worry about the opal/orte thread model.
>> However, there are now people who would like to use ompi that has been
>> configured with --with-threads=posix and --enable-mpi-threads. Can
>> someone give me some pointers as to what I need to do in order to make
>> sure I don't violate any threading model?
>
> Note that this is *NOT* well tested. There is work going on right now to
> make the OMPI layer be able to support MPI_THREAD_MULTIPLE (support was
> designed in from the beginning, but we haven't ever done any kind of
> comprehensive testing/stressing of multi-thread support, such that it is
> pretty much guaranteed not to work), but it is occurring on the trunk
> (i.e., what will eventually become v1.3) -- not the v1.2 branch.
>
>> The interfaces I'm calling are:
>>
>> opal_event_loop()
>
> Brian or George will have to answer about that one...
>
>> opal_path_findv()
>
> This guy should be multi-thread safe (disclaimer: haven't tested it
> myself); it doesn't rely on any global state.
>
>> orte_init()
>> orte_ns.create_process_name()
>> orte_iof.iof_subscribe()
>> orte_iof.iof_unsubscribe()
>> orte_schema.get_job_segment_name()
>> orte_gpr.get()
>> orte_dss.get()
>> orte_rml.send_buffer()
>> orte_rmgr.spawn_job()
>> orte_pls.terminate_job()
>> orte_rds.query()
>> orte_smr.job_stage_gate_subscribe()
>> orte_rmgr.get_vpid_range()
>
> Note that all of ORTE is *NOT* thread safe, nor is it planned to be (it
> just seemed way more trouble than it was worth). You need to serialize
> access to it.

Does that mean just calling OPAL_THREAD_LOCK() and OPAL_THREAD_UNLOCK() around each?

Greg
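For what it's worth, a minimal sketch of the "serialize access" pattern being discussed, using one dedicated lock around every ORTE call. The opal_mutex_t type and the OPAL_THREAD_LOCK/UNLOCK macros come from opal/threads/mutex.h in the tree; everything named my_* is hypothetical glue. Note also that, as far as I recall, OPAL_THREAD_LOCK only takes the lock when opal_using_threads() is true, so an unconditional opal_mutex_lock()/opal_mutex_unlock() may be the safer choice for an external tool -- worth verifying against the tree you build against:

#include "opal/threads/mutex.h"

/* One lock guarding every ORTE entry point used by the tool. */
static opal_mutex_t my_orte_lock;   /* OBJ_CONSTRUCT(&my_orte_lock, opal_mutex_t) once at startup */

/* Hypothetical wrapper: funnel each ORTE call through the same lock. */
static int my_serialized_orte_call(int (*orte_fn)(void *arg), void *arg)
{
    int rc;
    OPAL_THREAD_LOCK(&my_orte_lock);
    rc = orte_fn(arg);              /* e.g. a thin wrapper around orte_rml.send_buffer() */
    OPAL_THREAD_UNLOCK(&my_orte_lock);
    return rc;
}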
Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?
On Tue, 2007-08-28 at 10:12 -0600, Brian Barrett wrote:
> On Aug 28, 2007, at 9:05 AM, Li-Ta Lo wrote:
>
> > There is a serious problem with the 1.2 branch: it does not allocate
> > any SM area for each process at the beginning. SM areas are allocated
> > on demand, and if some of the processes are more aggressive than the
> > others, it will cause starvation. This problem is fixed in the trunk
> > by assigning at least one SM area to each process. I think this is
> > what you saw (starvation), and an increase of max size may not be
> > necessary.
>
> Although I'm pretty sure this is fixed in the v1.2 branch already.

It should never happen with the new code. The only way we can get that message is when MCA_BTL_SM_FIFO_WRITE returns rc != OMPI_SUCCESS, but the new MCA_BTL_SM_FIFO_WRITE always returns rc = OMPI_SUCCESS:

#define MCA_BTL_SM_FIFO_WRITE(endpoint_peer, my_smp_rank, peer_smp_rank, hdr, rc) \
do {                                                                    \
    ompi_fifo_t* fifo;                                                  \
    fifo = &(mca_btl_sm_component.fifo[peer_smp_rank][my_smp_rank]);    \
                                                                        \
    /* thread lock */                                                   \
    if(opal_using_threads())                                            \
        opal_atomic_lock(fifo->head_lock);                              \
    /* post fragment */                                                 \
    while(ompi_fifo_write_to_head(hdr, fifo,                            \
          mca_btl_sm_component.sm_mpool) != OMPI_SUCCESS)               \
        opal_progress();                                                \
    MCA_BTL_SM_SIGNAL_PEER(endpoint_peer);                              \
    rc = OMPI_SUCCESS;                                                  \
    if(opal_using_threads())                                            \
        opal_atomic_unlock(fifo->head_lock);                            \
} while(0)

Rolf, are you using the very latest 1.2 branch?

Ollie
Re: [OMPI devel] [devel-core] [RFC] Runtime Services Layer
On 8/27/07 7:30 AM, "Tim Prins" wrote:

> Ralph,
>
> Ralph H Castain wrote:
>> Just returned from vacation...sorry for delayed response
> No problem. Hope you had a good vacation :) And sorry for my super
> delayed response. I have been pondering this a bit.
>
>> In the past, I have expressed three concerns about the RSL.
>>
>> My bottom line recommendation: I have no philosophical issue with the RSL
>> concept. However, I recommend holding off until the next version of ORTE is
>> completed and then re-evaluating to see how valuable the RSL might be, as
>> that next version will include memory footprint reduction and framework
>> consolidation that may yield much of the RSL's value without the extra work.
>>
>> Long version:
>>
>> 1. What problem are we really trying to solve?
>> If the RSL is intended to solve the Cray support problem (where the Cray OS
>> really just wants to see OMPI, not ORTE), then it may have some value. The
>> issue to date has revolved around the difficulty of maintaining the Cray
>> port in the face of changes to ORTE - as new frameworks are added, special
>> components for Cray also need to be created to provide a "do-nothing"
>> capability. In addition, the Cray is memory constrained, and the ORTE
>> library occupies considerable space while providing very little
>> functionality.
> This is definitely a motivation, but not the only one.

So...what are the others?

>> The degree of value provided by the RSL will therefore depend somewhat on
>> the efficacy of the changes in development within ORTE. Those changes will,
>> among other things, significantly consolidate and reduce the number of
>> frameworks, and reduce the memory footprint. The expectation is that the
>> result will require only a single CNOS component in one framework. It isn't
>> clear, therefore, that the RSL will provide a significant value in that
>> environment.
> But won't there still be a lot of orte code linked in that will never be
> used?

Not really. The only thing left would be the stuff in runtime and util. We have talked for years about creating an ORTE "services" framework - basically, combining what is now in the runtime and util directories into a single framework a la "svcs". The notion was that everything OS-specific would go in there. What has held up implementation is (a) some thought that maybe those things should go into OPAL instead of ORTE, and (b) low priority and more important things to do.

However, if someone went ahead and implemented that idea, then you would have a "NULL" component in the base that basically does a no-op, and a "default" component that provides actual services. Thus, for CNOS, you would take the NULL component (so you don't open the framework's components and avoid that memory overhead), and away you go. I don't see how the RSL does anything better. Admittedly, you wouldn't have to maintain the svcs APIs, but that doesn't seem any more onerous than maintaining the RSL APIs as we change the MPI/RTE interfaces.

> Also, an RSL would simplify ORTE in that there would be no need to do
> anything special for CNOS in it.

But if all I do is remove the ORTE cnos component and add an RSL cnos component...what have I simplified?

>> If the RSL is intended to aid in ORTE development, as hinted at in the RFC,
>> then I believe that is questionable. Developing ORTE in a tmp branch has
>> proven reasonably effective as changes to the MPI layer are largely
>> invisible to ORTE. Creating another layer to the system that would also have
>> to be maintained seems like a non-productive way of addressing any problems
>> in that area.
> Whether or not it would help in orte development remains to be seen. I
> just say that it might. Although I would argue that developing in tmp
> branches has caused a lot of problems with merging, etc.

Guess I don't see how this would solve the merge problems...but whatever.

>> If the RSL is intended as a means of "freezing" the MPI-RTE interface, then
>> I believe we could better attain that objective by simply defining a set of
>> requirements for the RTE. As I'll note below, freezing the interface at an
>> API level could negatively impact other Open MPI objectives.
> It is intended to easily allow the development and use of other runtime
> systems, so simply defining requirements is not enough.

Could you please give some examples of these other runtimes?? Or is this just hypothetical at this time?

>> 2. Who is going to maintain old RTE versions, and why?
>> It isn't clear to me why anyone would want to do this - are we seriously
>> proposing that we maintain support for the ORTE layer that shipped with Open
>> MPI 1.0?? Can someone explain why we would want to do that?
> I highly doubt anyone would, and see no reason to include support for
> older runtime versions. Again, the purpose is to be able to run
> different runtimes. The ability to run different versions of the same
> runtime is just a side-effect.
[OMPI devel] Patch for reporter and friends
Attached is a patch for the PHP side of things that does the following:

* Creates a config.inc file for centralization of various user-settable parameters:
  * HTTP username/password for curl (passwords still protected; see code)
  * MTT database name/username/password
  * HTML header / footer
  * Google Analytics account number
* Use the config.inc values in reporter, stats, and submit
* Preliminary GA integration; if GA account number set in config.inc:
  * Report actual reporter URL
  * Report stats URL
  * Note that submits are not tracked by GA because the MTT client does not understand javascript
* Moved "deny_mirror" functionality out of report.inc to config.inc because it's very www.open-mpi.org-specific

-- 
Jeff Squyres
Cisco Systems

mtt-php.patch
Description: Binary data
Re: [OMPI devel] Patch for reporter and friends
@#$%@#$% Sorry; I keep sending to devel instead of mtt-devel.

-- 
Jeff Squyres
Cisco Systems
[OMPI devel] UD BTL alltoall hangs
I'm having a problem with the UD BTL and hoping someone might have some input to help solve it.

What I'm seeing is hangs when running alltoall benchmarks with nbcbench or an LLNL program called mpiBench -- both hang exactly the same way. With the code on the trunk, running nbcbench on IU's odin using 32 nodes and a command line like this:

mpirun -np 128 -mca btl ofud,self ./nbcbench -t MPI_Alltoall -p 128-128 -s 1-262144

hangs consistently when testing 256-byte messages.

There are two things I can do to make the hang go away until running at larger scale. First is to increase the 'btl_ofud_sd_num' MCA param from its default value of 128. This allows you to run with more procs/nodes before hitting the hang, but AFAICT doesn't fix the actual problem. What this parameter does is control the maximum number of outstanding send WQEs posted at the IB level -- when the limit is reached, frags are queued on an opal_list_t and later sent by progress as IB sends complete.

The other way I've found is to play games with calling mca_btl_ud_component_progress() in mca_btl_ud_endpoint_post_send(). In fact I replaced the CHECK_FRAG_QUEUES() macro used around btl_ofud_endpoint.c:77 with a version that loops on progress until a send WQE slot is available (as opposed to queueing). Same result -- I can run at larger scale, but still hit the hang eventually.

It appears that when the job hangs, progress is being polled very quickly, and after spinning for a while there are no outstanding send WQEs or queued sends in the BTL. I'm not sure where further up things are spinning/blocking, as I can't produce the hang at less than 32 nodes / 128 procs and don't have a good way of debugging that (suggestions appreciated). Furthermore, both ob1 and dr PMLs result in the same behavior, except that DR eventually trips a watchdog timeout, fails the BTL, and terminates the job. Other collectives such as allreduce and allgather do not hang -- only alltoall. I can also reproduce the hang on LLNL's Atlas machine.

Can anyone else reproduce this (Torsten might have to make a copy of nbcbench available)? Anyone have any ideas as to what's wrong?

Andrew
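For readers following along, a rough sketch of the second workaround Andrew describes (spin on progress instead of queueing). Only mca_btl_ud_component_progress() and the "loop until a send WQE slot frees up" idea come from the message above; the sd_wqe_available counter and the wrapper function are hypothetical stand-ins for the real fields in the BTL:

/* Hypothetical replacement for the queueing path in
 * mca_btl_ud_endpoint_post_send(): instead of putting the fragment on an
 * opal_list_t when no send WQE is free, spin on component progress until a
 * completed IB send releases a slot. */
static inline void wait_for_send_wqe(volatile int32_t *sd_wqe_available)
{
    while (*sd_wqe_available <= 0) {
        /* drains IB completions; each completed send frees one WQE slot */
        mca_btl_ud_component_progress();
    }
}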
Re: [OMPI devel] UD BTL alltoall hangs
The first step will be to figure out which version of the alltoall you're using. I suppose you use the default parameters, and then the decision function in the tuned component says it is using the linear alltoall. As the name states, this means that every node will post one receive from every other node and then will start sending the respective fragment to every other node. This will lead to a lot of outstanding sends and receives. I doubt that the receives can cause a problem, so I expect the problem is coming from the send side.

Do you have TotalView installed on your odin? If yes, there is a simple way to see how many sends are pending and where... That might pinpoint [at least] the process where you should look to see what's wrong.

george.
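One quick way to test this theory is to force a different alltoall algorithm in the tuned component and see whether the hang follows the linear variant. The parameter names below are the tuned component's forced-algorithm knobs as I recall them; double-check the exact names and the algorithm numbering with ompi_info before relying on this:

# List the tuned component's parameters, including the available alltoall algorithms
ompi_info --param coll tuned

# Force a specific alltoall algorithm instead of the decision function's choice
mpirun -np 128 -mca btl ofud,self \
       -mca coll_tuned_use_dynamic_rules 1 \
       -mca coll_tuned_alltoall_algorithm 2 \
       ./nbcbench -t MPI_Alltoall -p 128-128 -s 1-262144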