Re: [OMPI devel] v1.7.0rc7

2013-02-26 Thread Eugene Loh

On 02/23/13 14:45, Ralph Castain wrote:

This release candidate is the last one we expect to have before release, so 
please test it. Can be downloaded from the usual place:

http://www.open-mpi.org/software/ompi/v1.7/


I haven't looked at this very carefully yet.  Maybe someone can confirm what I'm seeing?  If I try to "mpirun `pwd`", the job should 
fail (since I'm launching a directory rather than an executable).  With v1.7, however, the return status is 0.  (The error message 
also suggests some confusion.)


My experiment is to run

mpirun `pwd`
echo status is $status

Here is v1.7:

--
Open MPI tried to fork a new process via the "execve" system call but
failed.  This is an unusual error because Open MPI checks many things
before attempting to launch a child process.  This error may be
indicative of another problem on the target host.  Your job will now
abort.

  Local host:/workspace/eugene/v1.7-testing
  Application name:  Permission denied
--
status is 0

Here is v1.6:

--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
status is 1
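
For anyone who wants to check the expected behavior outside of mpirun, here is a minimal Python sketch. It is not Open MPI code, and `launch` is a hypothetical helper name. It only illustrates that asking the OS to exec a directory fails with "Permission denied" (EACCES), and that a launcher can turn that failure into a non-zero exit status, which is what v1.6 reports above.

```python
import os
import subprocess
import sys


def launch(path):
    """Try to run `path`; return its exit status, or 1 if it cannot be exec'ed.

    Hypothetical helper for illustration only, not part of Open MPI.
    """
    try:
        return subprocess.run([path]).returncode
    except OSError as exc:
        # exec of a directory fails in the child; Python re-raises it here
        # as an OSError subclass (PermissionError on Linux).
        print(f"launch failed: {exc.strerror}: {path}", file=sys.stderr)
        return 1


if __name__ == "__main__":
    # Same experiment as above: try to "run" the current directory.
    sys.exit(launch(os.getcwd()))
```

Run as a script, this should leave a non-zero value in the shell's status variable.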


Re: [OMPI devel] RFC: Remove windows support

2013-02-26 Thread Jeff Squyres (jsquyres)
No other issues were raised about this, and today was the timeout.

On the call today, Ralph volunteered to do the work:

- svn rm the windows-specific components
- remove all the #if Windows-specific code

He'll be doing that over the next week or so.



On Feb 18, 2013, at 1:34 PM, Ralph Castain wrote:

> Thanks Marco - I was hoping that would be the case!
> 
> 
> On Feb 18, 2013, at 8:42 AM, marco atzeri wrote:
> 
>> On 2/18/2013 5:10 PM, Jeff Squyres (jsquyres) wrote:
>>> WHAT: Remove all Windows code from the trunk.
>>> 
>>> WHY: This issue keeps coming up over and over and over...
>>> 
>> [cut]
>>> 2. Remove all Windows code.  This involves some wholesale removing of 
>>> components as well as a bunch of #if code throughout the code base.
>>> 
>>>  ==> Removing this code can probably be done in multiple SVN commits:
>>> 
>>> 2a. Removing Windows-only components (which, given the rate of change that 
>>> we are planning for the trunk, may well need to be re-written if they are 
>>> ever re-introduced into the tree).
>> 
>> Cygwin does not use them. I'm currently building the trunk packages with
>> 
>> --enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv,if-windows,shmem-windows
>> 
>> to specifically exclude them
>> 
>>> 2b. Removing "#if WINDOWS" code (e.g., in opal/util/*, etc.).  This code 
>>> may not be changing as much as the rest of the trunk, and may be suitable 
>>> for svn reverting someday.
>>> 
>>> This does kill Cygwin support, too.  I realize we have a downstream 
>>> packager for Cygwin, but the fact that we can't get any developer support 
>>> for Windows -- despite multiple appeals -- seems to imply that the Windows 
>>> Open MPI audience is very, very small.  So while it feels a bit sad to kill 
>>> it, it may still be the Right Thing to do.
>> 
>> I assume it is __WINDOWS__
>> That is not defined on cygwin, so the build should survive
>> 
>>> 
>>> This is a proposal, and is open for discussion.
>>> 
>> 
>> Regards
>> Marco
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [MTT devel] fix zombie commit

2013-02-26 Thread Jeff Squyres (jsquyres)
On Feb 26, 2013, at 2:11 AM, Mike Dubman wrote:

> On Mon, Feb 25, 2013 at 6:24 PM, Jeff Squyres (jsquyres) wrote:
> >Looking at the code, you're checking for zombie status before MTT kills the 
> >proc.  Am I reading that right?
> I don't think the order matters. If the process is not a zombie yet and is 
> about to be killed by MTT later, that is a good flow.
> If the process is already a zombie, MTT will not be able to kill it anyway and 
> can stop waiting and switch to the new task.

No, the _kill_proc() routine does both a kill() and a waitpid().  The waitpid() 
should reap the zombie.

I.e., if the process has died, MTT simply hasn't reaped it yet.  Hence, 
it's a zombie.
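
To make the kill()/waitpid() point concrete, here is a small Python sketch. It is not MTT code (MTT is Perl), and `reap_demo` is a hypothetical name. It assumes a POSIX system. The child exits right away, the parent confirms that the dead child is still present in the process table, and then waitpid() reaps it.

```python
import os


def reap_demo():
    """Fork a child that exits at once, then reap it with waitpid()."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os._exit(7)          # child: exit immediately with status 7

    os.close(w)              # parent keeps only the read end
    os.read(r, 1)            # returns b"" (EOF) once the child has exited
    os.close(r)

    # Signal 0 is only an existence check. It succeeds here because the
    # exited child has not been reaped yet, i.e. it is a zombie.
    os.kill(pid, 0)

    _, status = os.waitpid(pid, 0)   # reap the zombie, collect its status

    try:
        os.kill(pid, 0)
        still_there = True
    except ProcessLookupError:
        still_there = False          # the process table entry is gone

    return os.WEXITSTATUS(status), still_there


if __name__ == "__main__":
    print(reap_demo())
```

The point of the sketch is only that a zombie is a process that has already died and is waiting for its parent's waitpid(); there is nothing left to kill.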

> >If so, then it could well be that the process has exited but not yet been 
> >reaped (because _kill_proc() hasn't been invoked yet).  If this is the case, 
> >is the real cause of the problem that the OUTread and ERRread aren't being 
> >closed when the child process exits, and therefore we keep looping looking 
> >for new output from them?
> yep, sounds like it can be the cause, need to look into this code.

Ok.  It would be interesting to see if the process dies, but:

1) MTT is still blocking in select() (i.e., OUTread and ERRread aren't returning 
0 from sysread upon process death)

2) $done is somehow not getting set to 0, and therefore MTT is still looping 
until the timeout expires
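
For reference, the behavior I would expect from such a loop can be sketched in Python as follows. This is not the MTT code; `drain`, `pending`, and the 10 second default timeout are assumptions made up for the illustration. The loop select()s on the child's stdout and stderr pipes, treats a zero-length read as EOF, and stops as soon as both pipes are closed rather than waiting for the timeout.

```python
import os
import select
import subprocess
import sys
import time


def drain(cmd, timeout=10.0):
    """Run cmd, collect stdout/stderr via select(), stop at EOF on both."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_fd, err_fd = proc.stdout.fileno(), proc.stderr.fileno()
    chunks = {out_fd: [], err_fd: []}
    pending = {out_fd, err_fd}            # pipes that have not hit EOF yet
    deadline = time.monotonic() + timeout

    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            proc.kill()                   # timed out: give up on the child
            break
        ready, _, _ = select.select(list(pending), [], [], remaining)
        for fd in ready:
            data = os.read(fd, 4096)
            if data:
                chunks[fd].append(data)
            else:
                pending.discard(fd)       # zero-length read means EOF

    status = proc.wait()                  # reap the child
    proc.stdout.close()
    proc.stderr.close()
    return status, b"".join(chunks[out_fd]), b"".join(chunks[err_fd])


if __name__ == "__main__":
    print(drain([sys.executable, "-c", "print('hi')"]))
```

If the real loop never sees the zero-length read, or never counts it, it would keep going until the timeout, which matches the two cases listed above.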

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] v1.7.0rc7

2013-02-26 Thread Siegmar Gross
Hi

> This release candidate is the last one we expect to have
> before release, so please test it. Can be downloaded from
> the usual place:
> 
> http://www.open-mpi.org/software/ompi/v1.7/
> 
> Latest changes include:
> 
> * update of the alps/lustre configure code
> * fixed solaris hwloc code
> * various mxm updates
> * removed java bindings (delayed until later release)
> * improved the --report-bindings output
> * a variety of minor cleanups


My rankfiles don't work.

tyr rankfiles 106 ompi_info | grep "MPI:"
Open MPI: 1.7rc7
tyr rankfiles 107 mpiexec -report-bindings -rf rf_ex_linpc hostname
--
All nodes which are allocated for this job are already filled.
--
tyr rankfiles 108 mpiexec -report-bindings -rf rf_ex_sunpc hostname
--
All nodes which are allocated for this job are already filled.
--
tyr rankfiles 109 mpiexec -report-bindings -rf rf_ex_sunpc_linpc hostname
--
All nodes which are allocated for this job are already filled.
--
tyr rankfiles 110 



They work as expected for openmpi-1.6.4.

tyr rankfiles 99 ompi_info | grep "MPI:"
Open MPI: 1.6.4rc4r28039
tyr rankfiles 100 mpiexec -report-bindings -rf rf_ex_linpc hostname
[linpc0:17655] MCW rank 0 bound to socket 0[core 0-1]
  socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
linpc0
linpc1
[linpc1:06707] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
linpc1
[linpc1:06707] MCW rank 2 bound to socket 1[core 0]:
  [. .][B .] (slot list 1:0)
[linpc1:06707] MCW rank 3 bound to socket 1[core 1]:
  [. .][. B] (slot list 1:1)
linpc1

tyr rankfiles 101 mpiexec -report-bindings -rf rf_ex_sunpc hostname
[sunpc0:22706] MCW rank 0 bound to socket 0[core 0-1]
  socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
sunpc0
sunpc1
[sunpc1:25189] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
sunpc1
[sunpc1:25189] MCW rank 2 bound to socket 1[core 0]:
  [. .][B .] (slot list 1:0)
[sunpc1:25189] MCW rank 3 bound to socket 1[core 1]:
  [. .][. B] (slot list 1:1)
sunpc1

tyr rankfiles 102 mpiexec -report-bindings -rf rf_ex_sunpc_linpc hostname
[linpc1:06777] MCW rank 0 bound to socket 0[core 0-1]
  socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
linpc1
sunpc1
[sunpc1:25226] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
sunpc1
[sunpc1:25226] MCW rank 2 bound to socket 1[core 0]:
  [. .][B .] (slot list 1:0)
[sunpc1:25226] MCW rank 3 bound to socket 1[core 1]:
  [. .][. B] (slot list 1:1)
sunpc1
tyr rankfiles 103 


Kind regards

Siegmar



Re: [OMPI devel] v1.7.0rc7

2013-02-26 Thread George Bosilca
These warnings are now fixed (r28106). Thanks for reporting them.

  George.

On Feb 26, 2013, at 04:27, marco atzeri wrote:

>  CC   to_self.o
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:
>  In function ‘create_indexed_constant_gap_ddt’:
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:48:5:
>  warning: ‘MPI_Type_struct’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1579): MPI_Type_struct is superseded by 
> MPI_Type_create_struct in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:
>  In function ‘create_indexed_gap_ddt’:
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:89:5:
>  warning: ‘MPI_Address’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1057): MPI_Address is superseded by MPI_Get_address 
> in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:90:5:
>  warning: ‘MPI_Address’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1057): MPI_Address is superseded by MPI_Get_address 
> in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:93:5:
>  warning: ‘MPI_Type_struct’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1579): MPI_Type_struct is superseded by 
> MPI_Type_create_struct in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:99:5:
>  warning: ‘MPI_Address’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1057): MPI_Address is superseded by MPI_Get_address 
> in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:100:5:
>  warning: ‘MPI_Address’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1057): MPI_Address is superseded by MPI_Get_address 
> in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:105:5:
>  warning: ‘MPI_Type_struct’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1579): MPI_Type_struct is superseded by 
> MPI_Type_create_struct in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:
>  In function ‘create_indexed_gap_optimized_ddt’:
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:139:5:
>  warning: ‘MPI_Type_struct’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1579): MPI_Type_struct is superseded by 
> MPI_Type_create_struct in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:
>  In function ‘do_test_for_ddt’:
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:307:5:
>  warning: ‘MPI_Type_extent’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1541): MPI_Type_extent is superseded by 
> MPI_Type_get_extent in MPI-2.0
>  CCLD to_self.exe