Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-17 Thread Damien Guinier
Yes please, this fix was requested by Bull customers. Damien. On 17/03/2011 15:44, Jeff Squyres wrote: Does this need to be CMR'ed to 1.4 and/or 1.5? On Mar 16, 2011, at 10:27 PM, Ralph Castain wrote: Okay, I fixed this in r24536. Sorry for the problem, Damien - thanks for catching it!

Re: [OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-17 Thread Damien Guinier
You are welcome. I'm happy you found this fix so quickly. Thanks to all. Damien. On 17/03/2011 03:27, Ralph Castain wrote: Okay, I fixed this in r24536. Sorry for the problem, Damien - thanks for catching it! Went unnoticed because the folks at the Labs always use IB. On Mar 16, 2011, at

[OMPI devel] Bug btl:tcp with grpcomm:hier

2011-03-16 Thread Damien Guinier
Hi all, From my testing, it is impossible to use "btl:tcp" with "grpcomm:hier". The "grpcomm:hier" module is important because the "srun" launch protocol can't use any other "grpcomm" module. You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when you create a ring (like: IMB
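The preview is cut off, but the reproduction it points at is a simple ring exchange run over the TCP BTL. Below is a minimal sketch of such a ring in C - an illustrative stand-in for the IMB-style pattern, not the original reproducer, and the MCA flags mentioned afterwards are an assumption about how the named components were selected:

/* ring.c - minimal MPI ring exchange; illustrative stand-in for the
 * IMB-style ring mentioned above, not the original reproducer. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int next = (rank + 1) % size;          /* right neighbour in the ring */
    int prev = (rank + size - 1) % size;   /* left neighbour in the ring  */

    /* Pass each rank's id once around the ring; MPI_Sendrecv combines the
     * send and receive so the exchange cannot deadlock on ordering. */
    MPI_Sendrecv(&rank, 1, MPI_INT, next, 0,
                 &token, 1, MPI_INT, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d got token %d from rank %d\n", rank, token, prev);
    MPI_Finalize();
    return 0;
}

Run, for example, as mpirun --mca btl tcp,self --mca grpcomm hier -np 4 ./ring, or under srun by exporting OMPI_MCA_btl=tcp,self and OMPI_MCA_grpcomm=hier (again, an assumed selection matching the components named in the report).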

[OMPI devel] setenv MPI_ROOT

2011-02-08 Thread Damien Guinier
by customers who use the BPS and LSF batch managers. Thanks, Damien Guinier - diff -r 486ca4bfca95 contrib/dist/linux/openmpi.spec --- a/contrib/dist/linux/openmpi.spec Mon Feb 07 15:40:31 2011 +0100 +++ b/contrib/dist/linux/openmpi.spec Tue Feb 08 14:30:01 2011 +0100 @@ -514,6 +514,10

Re: [OMPI devel] confusion between slot and procs on mca/rmaps

2010-12-01 Thread Damien Guinier
Oops. Ok, you can commit it. The whole problem is with the word "procs": in the source code it is used to mean both "processes" AND "cores". On 01/12/2010 11:37, Damien Guinier wrote: Ok, you can commit it. The whole problem is with the word "procs": in the source code, "proce

Re: [OMPI devel] confusion between slot and procs on mca/rmaps

2010-12-01 Thread Damien Guinier
" causes us to set the "bynode" flag by mistake. Did you check that? BTW: when running cpus-per-proc, a slot doesn't have X processes. I suspect this is just a language thing, but it will create confusion. A slot consists of X cpus - we still assign only one process to each slot.

[OMPI devel] srun + Intel OpenMP = SIGSEGV

2010-06-15 Thread Damien Guinier
Using Intel OpenMP in conjunction with srun seems to cause a segmentation fault, at least in the 1.5 branch. After a long time tracking this strange bug, I finally found out that the slurmd ess component was corrupting the __environ structure. This results in a crash in Intel OpenMP, which
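The message is truncated, but the failure mode it describes - a component clobbering the process environment so that a later consumer (here the Intel OpenMP runtime reading its environment variables) misbehaves - can be illustrated with a small, self-contained C analogue. This is a hypothetical illustration, not the Open MPI slurmd ess code:

/* environ_clobber.c - illustrative analogue only, NOT the slurmd ess code:
 * shows why pointing the global 'environ' at short-lived storage breaks
 * every later getenv() caller. */
#include <stdio.h>
#include <stdlib.h>

extern char **environ;   /* the global table getenv() walks */

static void broken_component_init(void)
{
    /* Deliberate bug, for illustration: 'environ' now points at an array on
     * this function's stack, which dangles as soon as we return. */
    char *tmp_env[] = { "FOO=bar", NULL };
    environ = tmp_env;
}

int main(void)
{
    broken_component_init();

    /* Any later environment lookup - e.g. a runtime library reading
     * OMP_NUM_THREADS or PATH - now walks dangling memory: the behaviour is
     * undefined and may return garbage, return NULL, or crash outright. */
    const char *path = getenv("PATH");
    printf("PATH = %s\n", path ? path : "(null)");
    return 0;
}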

[OMPI devel] Refresh the libevent to 1.4.13.

2010-06-07 Thread Damien Guinier
Hi all, A recent update of libevent seems to cause a regression on our side. On my cluster of 32-core nodes, processes launched by srun hang in opal_event_loop(). We see a deadlock in MPI_Init (endlessly looping in opal_event_loop()) when we launch processes with pure srun on 32-core nodes.

Re: [OMPI devel] Openmpi with slurm : salloc -c

2010-03-02 Thread Damien Guinier
happy with the way this behaves... Let me know what you find out. On Feb 26, 2010, at 9:45 AM, Damien Guinier wrote: Hi Ralph, I found a minor bug in the MCA component: ras slurm. It behaves incorrectly with the "X number of processors per task" feature. In the file ort

[OMPI devel] Openmpi with slurm : salloc -c

2010-02-26 Thread Damien Guinier
Hi Ralph, I found a minor bug in the MCA component: ras slurm. It behaves incorrectly with the "X number of processors per task" feature. In the file orte/mca/ras/slurm/ras_slurm_module.c, line 356: - The node slot count is divided by the "cpus_per_task" value, but
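The preview stops mid-sentence, so the conclusion of the report is lost; the arithmetic it points at is easy to show, though. The sketch below only illustrates what dividing a node's reported CPU count by cpus_per_task changes - the values and variable names are hypothetical, and this is not the orte/mca/ras/slurm code:

/* slots_vs_cpus.c - hypothetical values, illustrative arithmetic only;
 * NOT the ras_slurm module code. */
#include <stdio.h>

int main(void)
{
    int cpus_on_node  = 16;   /* CPUs SLURM reports for one node       */
    int cpus_per_task = 4;    /* salloc -c 4 (SLURM_CPUS_PER_TASK)     */

    /* Dividing the reported CPU count by cpus_per_task advertises the node
     * with one slot per task ... */
    int slots_divided = cpus_on_node / cpus_per_task;    /* 4 slots  */

    /* ... whereas keeping the raw CPU count advertises one slot per CPU. */
    int slots_raw = cpus_on_node;                        /* 16 slots */

    printf("divided by cpus_per_task: %d slots\n", slots_divided);
    printf("raw CPU count           : %d slots\n", slots_raw);
    return 0;
}

Whether the division is the bug or the intended behaviour is exactly what the truncated sentence was about to say, so the sketch deliberately takes no position.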

Re: [OMPI devel] using hnp_always_use_plm

2009-12-18 Thread Damien Guinier
s module as it will then try to track its own launches and totally forget that it is a remote orted with slightly different responsibilities. If you need it to execute a different plm on the backend, please let me know - it is a trivial change to allow specification of remote launch agents, and we

[OMPI devel] using hnp_always_use_plm

2009-12-18 Thread Damien Guinier
Hi Ralph, In Open MPI, I am working on a new little feature: hnp_always_use_plm. - To create the final application, mpirun uses, on remote nodes, "orted via plm: Process Lifecycle Management module" or, locally, "fork()". So the first compute node doesn't use the same method as the other compute nodes. Some debug

[OMPI devel] MPI_finalize with srun

2009-12-07 Thread Damien Guinier
Hi Ralph, I have found a bug in 'grpcomm' : 'hier'. This bug creates an infinite loop in MPI_Finalize. In this module, the barrier is executed as an allgather with a data length of zero. This allgather function can go into an infinite loop, depending on rank execution order. In
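The preview ends here, but the shape it describes - a barrier implemented as an allgather carrying no payload - has a simple MPI-level analogue. This is only an illustration of the pattern: the actual bug lives in the internal grpcomm:hier allgather, not in MPI_Allgather, and (unlike an internal barrier routine) the MPI standard does not guarantee that a zero-count MPI_Allgather synchronizes:

/* zero_byte_allgather.c - conceptual analogue of a barrier expressed as an
 * allgather with zero-length data; NOT the grpcomm:hier implementation. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    char dummy_send = 0, dummy_recv[1];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* No data is exchanged (count = 0); the call exists purely so the ranks
     * coordinate. If an underlying allgather mishandles the zero-length
     * case, ranks can end up waiting on each other forever - which is the
     * MPI_Finalize hang described above. */
    MPI_Allgather(&dummy_send, 0, MPI_BYTE,
                  dummy_recv, 0, MPI_BYTE, MPI_COMM_WORLD);

    printf("rank %d passed the zero-byte allgather\n", rank);
    MPI_Finalize();
    return 0;
}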