Re: [OMPI devel] trunk borked -- my fault

2009-08-04 Thread Jeff Squyres

On Aug 4, 2009, at 5:50 PM, Jeff Squyres (jsquyres) wrote:


Ah -- I see an AC 2.63b release note:

** AC_REQUIRE now detects the case of an outer macro which first expands
then later indirectly requires the same inner macro.  Previously,

Yes, this is exactly what was happening.

The AC_REQUIREs that I added force those tests to appear above their
respective section headers in configure's stdout, which is a bit of a bummer.
I'll fix that.


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] trunk borked -- my fault

2009-08-04 Thread Jeff Squyres

Ah -- I see an AC 2.63b release note:

** AC_REQUIRE now detects the case of an outer macro which first expands
   then later indirectly requires the same inner macro.  Previously,
   this case led to silent out-of-order expansion (bug present since
   2.50); it now issues a syntax warning, and duplicates the expansion
   of the inner macro to guarantee dependencies have been met.  See
   the manual for advice on how to refactor macros in order to avoid
   the bug in earlier autoconf versions and avoid increased script
   size in the current version.

This looks related to what I am seeing.

/me goes to investigate...


On Aug 4, 2009, at 5:47 PM, George Bosilca wrote:


Indeed, r21759 solves the problem. ompi compiles successfully on Mac OS
X with autoconf 2.64.

   Thanks,
 george.

On Aug 4, 2009, at 17:41 , Jeff Squyres wrote:

> On Aug 4, 2009, at 5:37 PM, George Bosilca wrote:
>
>> I used 2.64 for about a week on a bunch of machines. I never had
>> problems with it before...
>>
>> After checking, it turned out that autoconf 2.64 was freshly installed
>> on my Mac, so this might be a problem with autoconf 2.64 and Mac OS
>> X ... I'll go back to 2.63 until we figure out a way to solve these
>> problems.
>>
>
> FWIW, I saw the warnings on Linux as well, and then configure failed
> later in spectacular and interesting ways (I didn't let it get to
> the build because configure was so borked up -- all the individual
> POSIX .h file tests said that the file was present but could not be
> compiled because somehow it was stuck trying to compile them with
> gfortran (!) instead of gcc).  Something changed in AC2.64 with
> regards to how they do language REQUIRE'ing, etc. that I don't fully
> understand.
>
> Let me know if the workaround in r21759 works for you.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] trunk borked -- my fault

2009-08-04 Thread George Bosilca
Indeed, r21759 solves the problem. ompi compiles successfully on Mac OS
X with autoconf 2.64.


  Thanks,
george.

On Aug 4, 2009, at 17:41 , Jeff Squyres wrote:


On Aug 4, 2009, at 5:37 PM, George Bosilca wrote:


I used 2.64 for about a week on a bunch of machines. I never had
problems with it before...

After checking, it turned out that autoconf 2.64 was freshly installed
on my Mac, so this might be a problem with autoconf 2.64 and Mac OS
X ... I'll go back to 2.63 until we figure out a way to solve these
problems.



FWIW, I saw the warnings on Linux as well, and then configure failed  
later in spectacular and interesting ways (I didn't let it get to  
the build because configure was so borked up -- all the individual  
POSIX .h file tests said that the file was present but could not be  
compiled because somehow it was stuck trying to compile them with  
gfortran (!) instead of gcc).  Something changed in AC2.64 with  
regards to how they do language REQUIRE'ing, etc. that I don't fully  
understand.


Let me know if the workaround in r21759 works for you.

--
Jeff Squyres
jsquy...@cisco.com





Re: [OMPI devel] trunk borked -- my fault

2009-08-04 Thread Jeff Squyres

On Aug 4, 2009, at 5:37 PM, George Bosilca wrote:


I used 2.64 for about a week on a bunch of machines. I never had
problems with it before...

After checking, it turned out that autoconf 2.64 was freshly installed
on my Mac, so this might be a problem with autoconf 2.64 and Mac OS
X ... I'll go back to 2.63 until we figure out a way to solve these
problems.



FWIW, I saw the warnings on Linux as well, and then configure failed  
later in spectacular and interesting ways (I didn't let it get to the  
build because configure was so borked up -- all the individual  
POSIX .h file tests said that the file was present but could not be  
compiled because somehow it was stuck trying to compile them with  
gfortran (!) instead of gcc).  Something changed in AC2.64 with  
regards to how they do language REQUIRE'ing, etc. that I don't fully  
understand.


Let me know if the workaround in r21759 works for you.

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] trunk borked -- my fault

2009-08-04 Thread Jeff Squyres
https://svn.open-mpi.org/trac/ompi/changeset/21759 seems to make us  
play well with AC 2.64.  To be honest, I'm not sure why this change  
works, but it does.


I'm going to ping Ralf W. and see if he's got any insight here...


On Aug 4, 2009, at 5:17 PM, Jeff Squyres (jsquyres) wrote:


Checking this further, my C++ changes were r21755.  Updating my SVN
tree to the commit before that (r21754), I see that AC 2.64 on this
tree issues these same warnings, but then configure works and the
build seems to proceed as normal.

Did you try AC 2.64 before today?

If not, I'd advise backing off AC 2.64 and moving back down to AC 2.63
until we can figure those warnings out.  They *seem* to be harmless,
but I'm not entirely sure.

It looks like some things changed 2.63->2.64 with regards to how
languages are selected / used within AC 2.64 that break some of the
things I did today (perhaps AC 2.64 just got more strict...?).



On Aug 4, 2009, at 4:43 PM, Jeff Squyres (jsquyres) wrote:

> Doh.  I tested with 2.63.  I'll check out 2.64 right now...
>
>
> On Aug 4, 2009, at 4:37 PM, George Bosilca wrote:
>
> > Not completely fixed. With the latest version of autoconf (2.64) I get
> > a bunch of warnings.
> >
> > configure.ac:449: warning: AC_REQUIRE: `AC_PROG_CXX' was expanded before it was required
> > ../../lib/autoconf/c.m4:671: AC_LANG_COMPILER(C++) is expanded from...
> > ../../lib/autoconf/lang.m4:315: AC_LANG_COMPILER_REQUIRE is expanded from...
> > ../../lib/autoconf/general.m4:2735: AC_RUN_IFELSE is expanded from...
> > ../../lib/m4sugar/m4sh.m4:620: AS_IF is expanded from...
> > ../../lib/autoconf/general.m4:2018: AC_CACHE_VAL is expanded from...
> > ../../lib/autoconf/general.m4:2039: AC_CACHE_CHECK is expanded from...
> > config/ompi_check_compiler_works.m4:28: OMPI_CHECK_COMPILER_WORKS is expanded from...
> > config/ompi_setup_cxx.m4:48: _OMPI_SETUP_CXX_COMPILER is expanded from...
> > config/ompi_setup_cxx.m4:28: OMPI_SETUP_CXX is expanded from...
> > configure.ac:449: the top level
> > configure.ac:488: warning: AC_REQUIRE: `AC_PROG_F77' was expanded before it was required
> > ../../lib/autoconf/fortran.m4:272: AC_LANG_COMPILER(Fortran 77) is expanded from...
> > config/ompi_setup_f77.m4:35: OMPI_SETUP_F77 is expanded from...
> > configure.ac:488: the top level
> > configure.ac:603: warning: AC_REQUIRE: `AC_PROG_FC' was expanded before it was required
> > ../../lib/autoconf/fortran.m4:279: AC_LANG_COMPILER(Fortran) is expanded from...
> > config/ompi_setup_f90.m4:37: OMPI_SETUP_F90 is expanded from...
> > configure.ac:603: the top level
> >
> >    george.
> >
> >
> > On Aug 4, 2009, at 14:49 , Jeff Squyres wrote:
> >
> > > Should be fixed in https://svn.open-mpi.org/trac/ompi/changeset/21758.
> > > Sorry for the interruption...
> > >
> > >
> > > On Aug 4, 2009, at 10:24 AM, Jeff Squyres wrote:
> > >
> > >> Doh!
> > >>
> > >> I committed the "we don't need no stinkin' C++ compiler" changes
> > >> this morning after a bunch of testing, but I totally neglected to
> > >> test the case *with* a C++ compiler.  :-(
> > >>
> > >> So the trunk is borked at the moment; I'm working on a fix...
> > >>
> > >> --
> > >> Jeff Squyres
> > >> jsquy...@cisco.com
> > >>
> > >
> > >
> > > --
> > > Jeff Squyres
> > > jsquy...@cisco.com
> > >
> >
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
>


--
Jeff Squyres
jsquy...@cisco.com

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] trunk borked -- my fault

2009-08-04 Thread George Bosilca
I used 2.64 for about a week on a bunch of machines. I never had  
problems with it before...


After checking, it turned out that autoconf 2.64 was freshly installed
on my Mac, so this might be a problem with autoconf 2.64 and Mac OS
X ... I'll go back to 2.63 until we figure out a way to solve these  
problems.


  george.

On Aug 4, 2009, at 17:17 , Jeff Squyres wrote:

Checking this further, my C++ changes were r21755.  Updating my SVN  
tree to the commit before that (r21754), I see that AC 2.64 on this  
tree issues these same warnings, but then configure works and the  
build seems to proceed as normal.


Did you try AC 2.64 before today?

If not, I'd advise backing off AC 2.64 and moving back down to AC  
2.63 until we can figure those warnings out.  They *seem* to be  
harmless, but I'm not entirely sure.


It looks like some things changed 2.63->2.64 with regards to how  
languages are selected / used within AC 2.64 that break some of the  
things I did today (perhaps AC 2.64 just got more strict...?).




On Aug 4, 2009, at 4:43 PM, Jeff Squyres (jsquyres) wrote:


Doh.  I tested with 2.63.  I'll check out 2.64 right now...


On Aug 4, 2009, at 4:37 PM, George Bosilca wrote:

> Not completely fixed. With the latest version of autoconf (2.64) I get
> a bunch of warnings.
>
> configure.ac:449: warning: AC_REQUIRE: `AC_PROG_CXX' was expanded before it was required
> ../../lib/autoconf/c.m4:671: AC_LANG_COMPILER(C++) is expanded from...
> ../../lib/autoconf/lang.m4:315: AC_LANG_COMPILER_REQUIRE is expanded from...
> ../../lib/autoconf/general.m4:2735: AC_RUN_IFELSE is expanded from...
> ../../lib/m4sugar/m4sh.m4:620: AS_IF is expanded from...
> ../../lib/autoconf/general.m4:2018: AC_CACHE_VAL is expanded from...
> ../../lib/autoconf/general.m4:2039: AC_CACHE_CHECK is expanded from...
> config/ompi_check_compiler_works.m4:28: OMPI_CHECK_COMPILER_WORKS is expanded from...
> config/ompi_setup_cxx.m4:48: _OMPI_SETUP_CXX_COMPILER is expanded from...
> config/ompi_setup_cxx.m4:28: OMPI_SETUP_CXX is expanded from...
> configure.ac:449: the top level
> configure.ac:488: warning: AC_REQUIRE: `AC_PROG_F77' was expanded before it was required
> ../../lib/autoconf/fortran.m4:272: AC_LANG_COMPILER(Fortran 77) is expanded from...
> config/ompi_setup_f77.m4:35: OMPI_SETUP_F77 is expanded from...
> configure.ac:488: the top level
> configure.ac:603: warning: AC_REQUIRE: `AC_PROG_FC' was expanded before it was required
> ../../lib/autoconf/fortran.m4:279: AC_LANG_COMPILER(Fortran) is expanded from...
> config/ompi_setup_f90.m4:37: OMPI_SETUP_F90 is expanded from...
> configure.ac:603: the top level
>
>    george.
>
>
> On Aug 4, 2009, at 14:49 , Jeff Squyres wrote:
>
> > Should be fixed in https://svn.open-mpi.org/trac/ompi/changeset/21758.
> > Sorry for the interruption...
> >
> >
> > On Aug 4, 2009, at 10:24 AM, Jeff Squyres wrote:
> >
> >> Doh!
> >>
> >> I committed the "we don't need no stinkin' C++ compiler" changes
> >> this morning after a bunch of testing, but I totally neglected to
> >> test the case *with* a C++ compiler.  :-(
> >>
> >> So the trunk is borked at the moment; I'm working on a fix...
> >>
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >>
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> >
>
>


--
Jeff Squyres
jsquy...@cisco.com

--
Jeff Squyres
jsquy...@cisco.com





Re: [OMPI devel] trunk borked -- my fault

2009-08-04 Thread Jeff Squyres
Checking this further, my C++ changes were r21755.  Updating my SVN  
tree to the commit before that (r21754), I see that AC 2.64 on this  
tree issues these same warnings, but then configure works and the  
build seems to proceed as normal.


Did you try AC 2.64 before today?

If not, I'd advise backing off AC 2.64 and moving back down to AC 2.63  
until we can figure those warnings out.  They *seem* to be harmless,  
but I'm not entirely sure.


It looks like some things changed 2.63->2.64 with regards to how  
languages are selected / used within AC 2.64 that break some of the  
things I did today (perhaps AC 2.64 just got more strict...?).




On Aug 4, 2009, at 4:43 PM, Jeff Squyres (jsquyres) wrote:


Doh.  I tested with 2.63.  I'll check out 2.64 right now...


On Aug 4, 2009, at 4:37 PM, George Bosilca wrote:

> Not completely fixed. With the latest version of autoconf (2.64) I get
> a bunch of warnings.
>
> configure.ac:449: warning: AC_REQUIRE: `AC_PROG_CXX' was expanded before it was required
> ../../lib/autoconf/c.m4:671: AC_LANG_COMPILER(C++) is expanded from...
> ../../lib/autoconf/lang.m4:315: AC_LANG_COMPILER_REQUIRE is expanded from...
> ../../lib/autoconf/general.m4:2735: AC_RUN_IFELSE is expanded from...
> ../../lib/m4sugar/m4sh.m4:620: AS_IF is expanded from...
> ../../lib/autoconf/general.m4:2018: AC_CACHE_VAL is expanded from...
> ../../lib/autoconf/general.m4:2039: AC_CACHE_CHECK is expanded from...
> config/ompi_check_compiler_works.m4:28: OMPI_CHECK_COMPILER_WORKS is expanded from...
> config/ompi_setup_cxx.m4:48: _OMPI_SETUP_CXX_COMPILER is expanded from...
> config/ompi_setup_cxx.m4:28: OMPI_SETUP_CXX is expanded from...
> configure.ac:449: the top level
> configure.ac:488: warning: AC_REQUIRE: `AC_PROG_F77' was expanded before it was required
> ../../lib/autoconf/fortran.m4:272: AC_LANG_COMPILER(Fortran 77) is expanded from...
> config/ompi_setup_f77.m4:35: OMPI_SETUP_F77 is expanded from...
> configure.ac:488: the top level
> configure.ac:603: warning: AC_REQUIRE: `AC_PROG_FC' was expanded before it was required
> ../../lib/autoconf/fortran.m4:279: AC_LANG_COMPILER(Fortran) is expanded from...
> config/ompi_setup_f90.m4:37: OMPI_SETUP_F90 is expanded from...
> configure.ac:603: the top level
>
>    george.
>
>
> On Aug 4, 2009, at 14:49 , Jeff Squyres wrote:
>
> > Should be fixed in https://svn.open-mpi.org/trac/ompi/changeset/21758.
> > Sorry for the interruption...
> >
> >
> > On Aug 4, 2009, at 10:24 AM, Jeff Squyres wrote:
> >
> >> Doh!
> >>
> >> I committed the "we don't need no stinkin' C++ compiler" changes
> >> this morning after a bunch of testing, but I totally neglected to
> >> test the case *with* a C++ compiler.  :-(
> >>
> >> So the trunk is borked at the moment; I'm working on a fix...
> >>
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >>
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> >
>
>


--
Jeff Squyres
jsquy...@cisco.com

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] trunk borked -- my fault

2009-08-04 Thread Jeff Squyres

Doh.  I tested with 2.63.  I'll check out 2.64 right now...


On Aug 4, 2009, at 4:37 PM, George Bosilca wrote:


Not completely fixed. With the latest version of autoconf (2.64) I get
a bunch of warnings.

configure.ac:449: warning: AC_REQUIRE: `AC_PROG_CXX' was expanded before it was required
../../lib/autoconf/c.m4:671: AC_LANG_COMPILER(C++) is expanded from...
../../lib/autoconf/lang.m4:315: AC_LANG_COMPILER_REQUIRE is expanded from...
../../lib/autoconf/general.m4:2735: AC_RUN_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:620: AS_IF is expanded from...
../../lib/autoconf/general.m4:2018: AC_CACHE_VAL is expanded from...
../../lib/autoconf/general.m4:2039: AC_CACHE_CHECK is expanded from...
config/ompi_check_compiler_works.m4:28: OMPI_CHECK_COMPILER_WORKS is expanded from...
config/ompi_setup_cxx.m4:48: _OMPI_SETUP_CXX_COMPILER is expanded from...
config/ompi_setup_cxx.m4:28: OMPI_SETUP_CXX is expanded from...
configure.ac:449: the top level
configure.ac:488: warning: AC_REQUIRE: `AC_PROG_F77' was expanded before it was required
../../lib/autoconf/fortran.m4:272: AC_LANG_COMPILER(Fortran 77) is expanded from...
config/ompi_setup_f77.m4:35: OMPI_SETUP_F77 is expanded from...
configure.ac:488: the top level
configure.ac:603: warning: AC_REQUIRE: `AC_PROG_FC' was expanded before it was required
../../lib/autoconf/fortran.m4:279: AC_LANG_COMPILER(Fortran) is expanded from...
config/ompi_setup_f90.m4:37: OMPI_SETUP_F90 is expanded from...
configure.ac:603: the top level

   george.


On Aug 4, 2009, at 14:49 , Jeff Squyres wrote:

> Should be fixed in https://svn.open-mpi.org/trac/ompi/changeset/21758.
> Sorry for the interruption...
>
>
> On Aug 4, 2009, at 10:24 AM, Jeff Squyres wrote:
>
>> Doh!
>>
>> I committed the "we don't need no stinkin' C++ compiler" changes
>> this morning after a bunch of testing, but I totally neglected to
>> test the case *with* a C++ compiler.  :-(
>>
>> So the trunk is borked at the moment; I'm working on a fix...
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] trunk borked -- my fault

2009-08-04 Thread George Bosilca
Not completely fixed. With the latest version of autoconf (2.64) I get  
a bunch of warnings.


configure.ac:449: warning: AC_REQUIRE: `AC_PROG_CXX' was expanded before it was required
../../lib/autoconf/c.m4:671: AC_LANG_COMPILER(C++) is expanded from...
../../lib/autoconf/lang.m4:315: AC_LANG_COMPILER_REQUIRE is expanded from...
../../lib/autoconf/general.m4:2735: AC_RUN_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:620: AS_IF is expanded from...
../../lib/autoconf/general.m4:2018: AC_CACHE_VAL is expanded from...
../../lib/autoconf/general.m4:2039: AC_CACHE_CHECK is expanded from...
config/ompi_check_compiler_works.m4:28: OMPI_CHECK_COMPILER_WORKS is expanded from...
config/ompi_setup_cxx.m4:48: _OMPI_SETUP_CXX_COMPILER is expanded from...
config/ompi_setup_cxx.m4:28: OMPI_SETUP_CXX is expanded from...
configure.ac:449: the top level
configure.ac:488: warning: AC_REQUIRE: `AC_PROG_F77' was expanded before it was required
../../lib/autoconf/fortran.m4:272: AC_LANG_COMPILER(Fortran 77) is expanded from...
config/ompi_setup_f77.m4:35: OMPI_SETUP_F77 is expanded from...
configure.ac:488: the top level
configure.ac:603: warning: AC_REQUIRE: `AC_PROG_FC' was expanded before it was required
../../lib/autoconf/fortran.m4:279: AC_LANG_COMPILER(Fortran) is expanded from...
config/ompi_setup_f90.m4:37: OMPI_SETUP_F90 is expanded from...
configure.ac:603: the top level

  george.


On Aug 4, 2009, at 14:49 , Jeff Squyres wrote:

Should be fixed in https://svn.open-mpi.org/trac/ompi/changeset/21758.
Sorry for the interruption...



On Aug 4, 2009, at 10:24 AM, Jeff Squyres wrote:


Doh!

I committed the "we don't need no stinkin' C++ compiler" changes  
this morning after a bunch of testing, but I totally neglected to  
test the case *with* a C++ compiler.  :-(


So the trunk is borked at the moment; I'm working on a fix...

--
Jeff Squyres
jsquy...@cisco.com




--
Jeff Squyres
jsquy...@cisco.com





Re: [OMPI devel] trunk borked -- my fault

2009-08-04 Thread Jeff Squyres
Should be fixed in https://svn.open-mpi.org/trac/ompi/changeset/21758.
Sorry for the interruption...



On Aug 4, 2009, at 10:24 AM, Jeff Squyres wrote:


Doh!

I committed the "we don't need no stinkin' C++ compiler" changes  
this morning after a bunch of testing, but I totally neglected to  
test the case *with* a C++ compiler.  :-(


So the trunk is borked at the moment; I'm working on a fix...

--
Jeff Squyres
jsquy...@cisco.com




--
Jeff Squyres
jsquy...@cisco.com



[OMPI devel] trunk borked -- my fault

2009-08-04 Thread Jeff Squyres

Doh!

I committed the "we don't need no stinkin' C++ compiler" changes this  
morning after a bunch of testing, but I totally neglected to test the  
case *with* a C++ compiler.  :-(


So the trunk is borked at the moment; I'm working on a fix...

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] Device failover on ob1

2009-08-04 Thread Graham, Richard L.
From my perspective, the assumption that the low level is reliable is completely
consistent with the assumptions that went into the ob1 design, so I don't see the
changes you may propose as a problem in principle.

Thanks a lot for the clarification,
Rich


On 8/3/09 9:39 AM, "Mouhamed Gueye"  wrote:

Hi list,

I'll try to answer the main concerns raised so far.

We chose to work on ob1 for two main reasons:
- We first focused on fixing dr, but were quite disappointed by its
performance compared with ob1. We then redirected our work to ob1 to
provide failover while keeping good performance.
- Second, we wanted to avoid forking ob1 as much as possible so as to stay
up-to-date with the code base. Besides, the failover layer is so thin
(compared with the code base) that it would not make sense to fork the
base into a new pml.

But we were aware that ob1 won't accept any change with a non-zero
performance impact, which is why the added code is configured out by
default. Actually, we wanted to address long jobs that can afford a very
small performance loss but cannot afford to abort after several hours or
days of computation because of one port failure. The goal of this
prototype is to provide a proof of concept for discussion, as we know
there are other people working on this subject.

As stated in the previous mail, the idea is to store every sent btl
descriptor until it is marked as delivered. For that, we rely on
completion callbacks, and the assumption, clearly, is that a call to the
completion function means the message has been delivered to the remote
card. The underlying btl is the one that ensures message delivery. This is
currently the case for the openib btl, but any other btl may be able to
do so. So, with that assumption, we do not need any pml-level
acknowledgment protocol (no extra messages).
No timer is needed for retransmission, as it is triggered by btl failure.
Today, only the error callback scenario is implemented; we should also
handle the return codes of the btl send method. To deal with message
duplication, the protocol maintains a message id that allows tracking
received messages (hence the larger header), so any duplicated message
will not be processed.
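A minimal C sketch of receiver-side duplicate suppression by message id, of the kind described above, follows. It assumes (hypothetically) that the sender stamps each fragment to a given peer with a monotonically increasing 64-bit id and that at most 64 ids are in flight per peer. `dedup_state_t` and `dedup_accept` are illustrative names, not code from the patch, whose header layout may well differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer receive state (not the actual ob1 header format). */
typedef struct {
    uint64_t base;   /* every id < base has already been processed */
    uint64_t window; /* bit i set => id (base + i) has been processed */
} dedup_state_t;

/* Return true if the fragment carrying `id` should be processed, or false
 * if it is a duplicate, e.g. a descriptor retransmitted after failover
 * that the peer had in fact already received. */
static bool dedup_accept(dedup_state_t *s, uint64_t id)
{
    if (id < s->base) {
        return false;                /* below the window: already seen */
    }
    uint64_t off = id - s->base;
    /* Sketch assumption: never more than 64 ids outstanding per peer. */
    assert(off < 64);
    uint64_t bit = UINT64_C(1) << off;
    if (s->window & bit) {
        return false;                /* inside the window and already seen */
    }
    s->window |= bit;
    /* Slide the window past the contiguous run of processed ids. */
    while (s->window & 1) {
        s->window >>= 1;
        s->base++;
    }
    return true;
}
```

Because a duplicate is dropped before matching, the same data is never delivered into a user buffer twice. On the sender side, a lost completion then costs only a redundant retransmission.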

Concerning the openib btl, on a multi-port system, the connection scheme
is supposed to be (host 1-port 0) <==> (host 2-port 0) and (host 1-port
1) <==> (host 2-port 1), for example. This is set up at btl endpoint
initialization, but when the connection is established at the first send
attempt, the port association information is not processed. This results
in a crossed connection scheme ((host 1-port 0) <==> (host 2-port 1) and
(host 1-port 1) <==> (host 2-port 0)). So, instead of having two
separate rings or paths, we have one big ring that does not allow
failover. We had to fix this to enable failover with openib in both
multi-path (same network) and multi-rail (two separate networks) setups.
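The intended straight pairing can be expressed as a simple matching rule. This is a hypothetical C illustration only. `remote_port_t`, `pick_remote_port`, and tagging each network with a `subnet_id` are assumptions made for the sketch, not the openib BTL's actual data structures or connection code. The rule shows one way to keep the paths separate. A rule that ignored the port number and took the first reachable remote port would send both local ports on a single network to the same remote port, leaving no independent second path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one port advertised by the remote host. */
typedef struct {
    int port_num;   /* HCA port number on the remote host */
    int subnet_id;  /* hypothetical tag for the network (rail) it is on */
} remote_port_t;

/* Choose which remote port a local port should connect to.
 * Prefer the remote port on the same network with the same port number,
 * so that port 0 <==> port 0 and port 1 <==> port 1 form separate paths;
 * otherwise fall back to the first reachable port. Returns an index into
 * `remote`, or -1 if no remote port is reachable. */
static int pick_remote_port(int local_port_num, int local_subnet,
                            const remote_port_t *remote, size_t n)
{
    assert(remote != NULL || n == 0);
    int fallback = -1;
    for (size_t i = 0; i < n; i++) {
        if (remote[i].subnet_id != local_subnet) {
            continue;               /* different rail: not reachable */
        }
        if (remote[i].port_num == local_port_num) {
            return (int)i;          /* straight pairing */
        }
        if (fallback < 0) {
            fallback = (int)i;
        }
    }
    return fallback;
}
```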

Brian, so far, we are able to switch from one failing btl to a safe one
only. When there is no more btl left, we abort the job. Next step is to
be able to re-establish the connection when the network is back.

Mouhamed
Graham, Richard L. wrote:
> What is the impact on sm, which is by far the most sensitive to latency? This
> really belongs in a place other than ob1.  Ob1 is supposed to provide the
> lowest latency possible, and other pml's are supposed to be used for heavier
> weight protocols.
>
> On the technical side, how do you distinguish between a lost acknowledgement
> and an undelivered message?  You really don't want to try and deliver data
> into user space twice, as once a receive is complete, who knows what the user
> has done with that buffer?  A general treatment needs to be able to handle false
> negatives and attempts to deliver the data more than once.
>
> How are you detecting missing acknowledgements ?  Are you using some sort of 
> timer ?
>
> Rich
>
> On 7/31/09 5:49 AM, "Mouhamed Gueye"  wrote:
>
> Hi list,
>
> Here is an update on our work concerning device failover.
>
> As many of you suggested, we reoriented our work on ob1 rather than dr
> and we now have a working prototype on top of ob1. The approach is to
> store btl descriptors sent to peers and delete them when we receive
> proof of delivery. So far, we rely on completion callback functions,
> assuming that the message is delivered when the completion function is
> called, which is the case for openib. When a btl module fails, it is
> removed from the endpoint's btl list and the next one is used to
> retransmit stored descriptors. No extra messages are transmitted; the
> change only consists of additions to the header. It has mainly been tested
> with two IB modules, in both multi-rail (two separate networks) and
> multi-path (one big unique network) configurations.
>
> You can grab and test the patch here (applies on top of the trunk):
> http://bitbucket.org/gueyem/ob1-failover/
>
> To compile with failover support, just pass --enable-device-failover
> to configure. You can then run a benchmark, disconnect a port and see
> the failover operate.
>
> A little latency 

Re: [OMPI devel] [OT] Who's going to Helsinki?

2009-08-04 Thread Edgar Gabriel

I'll be there, however for EPVMMPI only, i.e. I arrive on Sunday.
Edgar

Jeff Squyres wrote:

Who's going to Helsinki?

Does anyone want to meet up for some sight-seeing and/or have a devel 
meeting?  I know that some of our European developers are not attending, 
but if we have a day-long devel meeting, perhaps they might be 
motivated...?




--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524, Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI devel] [PATCH] Better error reporting when failing to load a component

2009-08-04 Thread Jeff Squyres

Absolutely correct -- fixed -- thanks!

On Aug 4, 2009, at 4:45 AM, Arthur Huillet wrote:


Hi,

Jeff Squyres wrote:
> Glad it was helpful!  Feel free to let us know if there's anything
> else that would be helpful there -- it's easy enough to give you write
> access to the wiki.
Just a small thing on the CreateComponent page:

"Create a directory with the component name in /mca/foo/. For
the purposes of this document, we'll assume that your framework name is
"bar" (i.e., /mca/foo/bar/)."

This line looks fishy to me. s/framework/component/ is probably what
should be written here.

Thanks

--
Greetings,
A. Huillet

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] [OT] Who's going to Helsinki?

2009-08-04 Thread Terry Dontje

Jeff Squyres wrote:

Who's going to Helsinki?

Does anyone want to meet up for some sight-seeing and/or have a devel 
meeting?  I know that some of our European developers are not 
attending, but if we have a day-long devel meeting, perhaps they might 
be motivated...?



I will be attending.

--td


Re: [OMPI devel] [OT] Who's going to Helsinki?

2009-08-04 Thread Jeff Squyres

On Aug 4, 2009, at 7:34 AM, Sylvain Jeaugey wrote:


I bet you're referring to Euro PVM MPI 09?



Heh -- sorry, I should have been more clear.  :-)  Yes, I was  
referring to both Euro PVM/MPI and the Forum meeting on W-F in the  
previous week.  I'm actually *only* attending the Forum meeting  
(leaving early Saturday morning), but there's wiggle room in there for  
some OMPI-specific devel time...


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] [OT] Who's going to Helsinki?

2009-08-04 Thread Sylvain Jeaugey

Hi Jeff,

I bet you're referring to Euro PVM MPI 09?

If this is what you're referring to, I should attend as usual. And of
course, I'm very interested in joining a devel meeting :)


Sylvain

On Tue, 4 Aug 2009, Jeff Squyres wrote:


Who's going to Helsinki?

Does anyone want to meet up for some sight-seeing and/or have a devel 
meeting?  I know that some of our European developers are not attending, but 
if we have a day-long devel meeting, perhaps they might be motivated...?


--
Jeff Squyres
jsquy...@cisco.com




[OMPI devel] [OT] Who's going to Helsinki?

2009-08-04 Thread Jeff Squyres

Who's going to Helsinki?

Does anyone want to meet up for some sight-seeing and/or have a devel  
meeting?  I know that some of our European developers are not  
attending, but if we have a day-long devel meeting, perhaps they might  
be motivated...?


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] Device failover on ob1

2009-08-04 Thread Ralph Castain

Rolf/Mouhamed

Could you get together off-list to discuss the different approaches  
and see if/where there is common ground. It would be nice to see an  
integrated solution - personally, I would rather not see two  
orthogonal approaches unless they can be cleanly separated. Much  
better if they could support each other in an intelligent fashion.


On Aug 3, 2009, at 9:49 AM, Pavel Shamis (Pasha) wrote:




I have not, but there should be no difference.  The failover code  
only gets triggered when an error happens.  Otherwise, there are no  
differences in the code paths while everything is functioning  
normally.
Sounds good. I still did not have time to review the code. I will  
try to do it during this week.


Pasha


Rolf

On 08/03/09 11:14, Pavel Shamis (Pasha) wrote:

Rolf,
Did you compare latency/bandwidth for the failover-enabled code vs. the trunk?

Pasha.

Rolf Vandevaart wrote:

Hi folks:

As some of you know, I have also been looking into implementing
failover.  I took a different approach, as I am solving the
problem within the openib BTL itself.  This of course means
that it only works for failing over from one openib BTL to another,
but that was our area of interest.  It also means that we do
not need to keep track of fragments, as we get them back from the
completion queue upon failure. We then extract the relevant
information and repost them on the other working endpoint.
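This approach can be sketched as a single pass over the completions drained after a port failure. Entries that completed successfully are finished, and entries that came back with an error status (the failing request and those flushed after it) are reposted on the surviving endpoint. The C below is a hypothetical model, not openib BTL code; `completion_t`, `fragment_t`, and `repost_failed` are invented for illustration. In real verbs code, the corresponding information comes from `struct ibv_wc`, through its `wr_id` and `status` fields; flushed work requests carry `IBV_WC_WR_FLUSH_ERR`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a work completion and a send fragment. */
typedef enum { WC_SUCCESS, WC_ERROR } wc_status_t;

typedef struct {
    uint64_t    wr_id;   /* here: index of the fragment that was posted */
    wc_status_t status;
} completion_t;

typedef struct {
    int endpoint;        /* endpoint the fragment is (re)posted on */
    int completed;       /* nonzero once a successful completion is seen */
} fragment_t;

/* Process completions drained after a failure. Successful ones finish
 * their fragment; failed or flushed ones are "reposted" on the backup
 * endpoint (modelled here by updating the fragment's endpoint).
 * Returns the number of fragments reposted. */
static size_t repost_failed(const completion_t *wc, size_t n,
                            fragment_t *frags, size_t nfrags,
                            int backup_endpoint)
{
    size_t reposted = 0;
    for (size_t i = 0; i < n; i++) {
        assert(wc[i].wr_id < nfrags);
        fragment_t *f = &frags[wc[i].wr_id];
        if (wc[i].status == WC_SUCCESS) {
            f->completed = 1;
        } else {
            f->endpoint = backup_endpoint;
            reposted++;
        }
    }
    return reposted;
}
```

The design point is that the pending state lives in the completion queue itself, so no PML-level list of outstanding fragments is needed.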


My work has been progressing at http://bitbucket.org/rolfv/ompi-failover.


This currently only works for send semantics, so you have to run
with -mca btl_openib_flags 1.
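The value of `btl_openib_flags` is a bitmask of BTL capabilities, which explains why 1 means send semantics only. The sketch below assumes the bit values SEND = 0x1, PUT = 0x2, GET = 0x4. Check them against the `MCA_BTL_FLAGS_*` definitions in `ompi/mca/btl/btl.h` in your tree. Under that assumption, a value of 1 enables send/receive and disables RDMA put/get. The enum and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed capability bits; verify against MCA_BTL_FLAGS_* in btl.h. */
enum {
    FLAG_SEND = 0x1,
    FLAG_PUT  = 0x2,
    FLAG_GET  = 0x4
};

/* True if the flags allow send/receive but no RDMA (put or get). */
static bool send_semantics_only(uint32_t flags)
{
    return (flags & FLAG_SEND) != 0 &&
           (flags & (FLAG_PUT | FLAG_GET)) == 0;
}
```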


Rolf

On 07/31/09 05:49, Mouhamed Gueye wrote:

Hi list,

Here is an update on our work concerning device failover.

As many of you suggested, we reoriented our work on ob1 rather
than dr and we now have a working prototype on top of ob1. The
approach is to store btl descriptors sent to peers and delete
them when we receive proof of delivery. So far, we rely on
completion callback functions, assuming that the message is
delivered when the completion function is called, which is the
case for openib. When a btl module fails, it is removed from the
endpoint's btl list and the next one is used to retransmit
stored descriptors. No extra messages are transmitted; the change
only consists of additions to the header. It has mainly been tested
with two IB modules, in both multi-rail (two separate networks)
and multi-path (one big unique network) configurations.
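The sender-side bookkeeping described above can be sketched in C under stated assumptions. Descriptors are kept in a fixed-size table, a completion callback is taken to mean delivery (the openib assumption from the message), and BTLs are numbered 0..n-1. `peer_t`, `track_send`, `on_completion`, and `on_btl_failure` are hypothetical names, not the patch's API. The actual re-send through a BTL is reduced to updating a field.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PENDING 16  /* hypothetical bound on in-flight descriptors */
#define MAX_BTLS    4   /* hypothetical bound on BTL modules per peer */

/* Hypothetical record of a sent descriptor kept until delivery. */
typedef struct {
    int  msg_id;
    int  btl;      /* index of the BTL it was last sent on */
    bool in_use;
} pending_desc_t;

typedef struct {
    pending_desc_t pending[MAX_PENDING];
    bool btl_alive[MAX_BTLS];
    int  nbtls;
    int  current;  /* BTL used for new sends */
} peer_t;

static void peer_init(peer_t *p, int nbtls)
{
    assert(nbtls > 0 && nbtls <= MAX_BTLS);
    memset(p, 0, sizeof *p);
    for (int i = 0; i < nbtls; i++) {
        p->btl_alive[i] = true;
    }
    p->nbtls = nbtls;
}

/* Send path: remember the descriptor. Returns its slot, or -1 if full. */
static int track_send(peer_t *p, int msg_id)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!p->pending[i].in_use) {
            p->pending[i] = (pending_desc_t){ msg_id, p->current, true };
            return i;
        }
    }
    return -1;
}

/* Completion callback: taken to mean the remote side has the data. */
static void on_completion(peer_t *p, int slot)
{
    assert(slot >= 0 && slot < MAX_PENDING && p->pending[slot].in_use);
    p->pending[slot].in_use = false;
}

/* Error callback: drop the failed BTL and move every descriptor still
 * pending on it to the next live BTL. Returns how many were moved, or
 * -1 if no BTL is left (the prototype aborts the job in that case). */
static int on_btl_failure(peer_t *p, int failed)
{
    assert(failed >= 0 && failed < p->nbtls);
    p->btl_alive[failed] = false;
    int next = -1;
    for (int i = 0; i < p->nbtls; i++) {
        if (p->btl_alive[i]) {
            next = i;
            break;
        }
    }
    if (next < 0) {
        return -1;
    }
    p->current = next;
    int resent = 0;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (p->pending[i].in_use && p->pending[i].btl == failed) {
            p->pending[i].btl = next;  /* a real PML would re-send here */
            resent++;
        }
    }
    return resent;
}
```

No timer appears anywhere. Retransmission is driven purely by the error callback, matching the "no extra messages" property described in the thread.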


You can grab and test the patch here (applies on top of the  
trunk) :

http://bitbucket.org/gueyem/ob1-failover/

To compile with failover support, just pass --enable-device-failover
to configure. You can then run a benchmark, disconnect
a port and see the failover operate.


A small latency increase (~2%) is induced by the failover
layer when no failover occurs. To speed up the failover
process on openib, you can try lowering the
btl_openib_ib_timeout openib parameter to, for example, 15 instead
of 20 (the default value).
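Lowering btl_openib_ib_timeout shortens failure detection because the parameter is the exponent in the InfiniBand local ACK timeout, 4.096 us x 2^timeout, which is the formula Open MPI's help text gives for this parameter. The small helper below (a hypothetical name) computes it. A value of 20 gives about 4.29 s per attempt, and 15 gives about 0.134 s. The time until an error is actually reported is longer, since the HCA also retries (see btl_openib_ib_retry_count), so treat these as per-attempt figures.

```c
#include <assert.h>
#include <stdint.h>

/* Per-attempt InfiniBand local ACK timeout, in nanoseconds, for a given
 * btl_openib_ib_timeout value: 4.096 us * 2^timeout. A value of 0 means
 * "no timeout" in the verbs API, so it is excluded here. */
static uint64_t ib_ack_timeout_ns(unsigned timeout)
{
    assert(timeout >= 1 && timeout <= 31);  /* 5-bit field */
    return UINT64_C(4096) << timeout;
}
```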


Mouhamed




Re: [OMPI devel] [PATCH] Better error reporting when failing to load a component

2009-08-04 Thread Arthur Huillet

Hi,

Jeff Squyres wrote:
Glad it was helpful!  Feel free to let us know if there's anything 
else that would be helpful there -- it's easy enough to give you write 
access to the wiki.

Just a small thing on the CreateComponent page:

"Create a directory with the component name in /mca/foo/. For
the purposes of this document, we'll assume that your framework name is
"bar" (i.e., /mca/foo/bar/)."

This line looks fishy to me. s/framework/component/ is probably what
should be written here.


Thanks

--
Greetings, 
A. Huillet