[OMPI devel] Open-MPI between Mac and Linux (ubuntu 9.04) over wireless

2009-09-21 Thread Pallab Datta
Hi

I am trying to run Open MPI 1.3.3 between a Linux box running Ubuntu
Server 9.04 and a Macintosh. I have configured Open MPI with the
following options:
./configure --prefix=/usr/local/ --enable-heterogeneous --disable-shared
--enable-static

When both machines are connected to the network via Ethernet cables,
Open MPI works fine.

But when I switch the Linux box to a wireless adapter, I can still
reach (ping) the Macintosh, yet Open MPI hangs on a hello-world program.

I ran:

/usr/local/bin/mpirun --mca btl_tcp_port_min_v4 36900 -mca
btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca
OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H localhost,10.11.14.205
/tmp/back

It hangs on a send/receive exchange between the two ends. All my
firewalls are turned off at the Macintosh end. Please help ASAP, and
please let me know how to debug it further.


The following is the error dump:

fuji:src pallabdatta$ /usr/local/bin/mpirun --mca btl_tcp_port_min_v4
36900 -mca btl_tcp_port_range_v4 32 --mca btl_base_verbose 30 --mca btl
tcp,self --mca OMPI_mca_mpi_preconnect_all 1 -np 2 -hetero -H
localhost,10.11.14.205 /tmp/hello
[fuji.local:01316] mca: base: components_open: Looking for btl components
[fuji.local:01316] mca: base: components_open: opening btl components
[fuji.local:01316] mca: base: components_open: found loaded component self
[fuji.local:01316] mca: base: components_open: component self has no
register function
[fuji.local:01316] mca: base: components_open: component self open
function successful
[fuji.local:01316] mca: base: components_open: found loaded component tcp
[fuji.local:01316] mca: base: components_open: component tcp has no
register function
[fuji.local:01316] mca: base: components_open: component tcp open function
successful
[fuji.local:01316] select: initializing btl component self
[fuji.local:01316] select: init of component self returned success
[fuji.local:01316] select: initializing btl component tcp
[fuji.local:01316] select: init of component tcp returned success
[apex-backpack:04753] mca: base: components_open: Looking for btl components
[apex-backpack:04753] mca: base: components_open: opening btl components
[apex-backpack:04753] mca: base: components_open: found loaded component self
[apex-backpack:04753] mca: base: components_open: component self has no
register function
[apex-backpack:04753] mca: base: components_open: component self open
function successful
[apex-backpack:04753] mca: base: components_open: found loaded component tcp
[apex-backpack:04753] mca: base: components_open: component tcp has no
register function
[apex-backpack:04753] mca: base: components_open: component tcp open
function successful
[apex-backpack:04753] select: initializing btl component self
[apex-backpack:04753] select: init of component self returned success
[apex-backpack:04753] select: initializing btl component tcp
[apex-backpack:04753] select: init of component tcp returned success
Process 0 on fuji.local out of 2
Process 1 on apex-backpack out of 2
[apex-backpack:04753] btl: tcp: attempting to connect() to address
10.11.14.203 on port 9360
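
A common first step for a hang like this one is to pin the TCP BTL to
the interface that actually carries the traffic: multi-homed hosts
(wired + wireless) often advertise an address the peer cannot reach.
A minimal sketch, assuming the wireless interfaces are named wlan0 on
the Linux box and en1 on the Mac; substitute whatever ifconfig reports
on each host:

  /usr/local/bin/mpirun --mca btl tcp,self \
      --mca btl_tcp_if_include wlan0,en1 \
      -np 2 -H localhost,10.11.14.205 /tmp/hello

If the hang disappears, the original run was most likely trying to
connect over an unreachable interface.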

regards,
pallab





Re: [OMPI devel] MPIR_Breakpoint visibility

2009-09-21 Thread Samuel K. Gutierrez

Hi Jeff,

Sorry about the ambiguity.  I just had another conversation with our  
TotalView person and the problem -seems- to be unrelated to OMPI.   
Guess I jumped the gun...


Thanks,

Samuel K. Gutierrez

On Sep 21, 2009, at 8:58 AM, Jeff Squyres wrote:


Can you more precisely define "not working properly"?

On Sep 21, 2009, at 10:26 AM, Samuel K. Gutierrez wrote:


Hi,

According to our TotalView person, the PGI- and Intel-compiled builds
of OMPI 1.3.3 are not working properly.  She noted that 1.2.8 and 1.3.2
work fine.

Thanks,

Samuel K. Gutierrez

On Sep 21, 2009, at 7:19 AM, Terry Dontje wrote:

> Ralph Castain wrote:
>> I see it declared "extern" in orte/tools/orterun/debuggers.h, but
>> not DECLSPEC'd
>>
>> FWIW: LANL uses intel compilers + totalview on a regular basis, and
>> I have yet to hear of an issue.
>>
> It actually will work if you attach to the job or if you are not
> relying on the MPIR_Breakpoint to actually stop execution.
>
> --td
>
>> On Sep 21, 2009, at 7:03 AM, Terry Dontje wrote:
>>
>>> I was kind of amazed no one else managed to run into this, but it
>>> was brought to my attention that when compiling OMPI with the Intel
>>> compilers and visibility enabled, the MPIR_Breakpoint symbol is not
>>> exposed. I am assuming this is because MPIR_Breakpoint is not ORTE
>>> or OMPI_DECLSPEC'd.
>>> Do others agree, or am I missing something obvious here?
>>>
>>> Interestingly enough, it doesn't look like gcc, pgi, pathscale or
>>> sun compilers are hiding the MPIR_Breakpoint symbol.
>>> --td




--
Jeff Squyres
jsquy...@cisco.com





Re: [OMPI devel] Dynamic languages, dlopen() issues, and symbol visibility of libtool ltdl API in current trunk

2009-09-21 Thread Ralf Wildenhues
As a workaround, Lisandro could just pre-seed the cache variables of the
respective configure tests that come out wrong.

  ./configure lt_cv_dlopen_self=yes lt_cv_dlopen_self_static=yes

HTH.

Cheers,
Ralf

* Jeff Squyres wrote on Mon, Sep 21, 2009 at 02:45:28PM CEST:
> Ick; I appreciate Lisandro's quandary, but don't quite know what to do.
> 
> How about keeping libltdl -fvisibility=hidden inside mpi4py?
> 
> 
> On Sep 17, 2009, at 11:16 AM, Josh Hursey wrote:
> 
> >So I started down this road a couple months ago. I was using the
> >lt_dlopen() and friends in the OPAL CRS self module. The visibility
> >changes broke that functionality. The one solution that I started
> >implementing was precisely what you suggested, wrapping a subset of
> >the libtool calls and prefixing them with opal_*. The email thread is
> >below:
> >   http://www.open-mpi.org/community/lists/devel/2009/07/6531.php
> >
> >The problem that I hit was that libtool's build system did not play
> >well with the visibility symbols. This caused dlopen to be disabled
> >incorrectly. The libtool folks have a patch and, I believe, they are
> >planning on incorporating it in the next release. The email thread is
> >below:
> >   http://thread.gmane.org/gmane.comp.gnu.libtool.patches/9446
> >
> >So we would (others can speak up if not) certainly consider such a
> >wrapper, but I think we need to wait for the next libtool release
> >(unless there is other magic we can do) before it would be usable.
> >
> >Do others have any other ideas on how we might get around this in the
> >mean time?
> >
> >-- Josh
> >
> >
> >On Sep 16, 2009, at 5:59 PM, Lisandro Dalcin wrote:
> >
> >> Hi all.. I have to contact you again about the issues related to
> >> dlopen()ing libmpi with RTLD_LOCAL, as many dynamic languages
> >> (Python in my case) do.
> >>
> >> So far, I've been able to manage the issues (despite the "do
> >> nothing" policy from Open MPI devs, which I understand) in a more
> >> or less portable manner by taking advantage of the availability of
> >> libtool ltdl symbols in the Open MPI libraries (specifically, in
> >> libopen-pal).
> >> For reference, all this hackery is here:
> >> http://code.google.com/p/mpi4py/source/browse/trunk/src/compat/openmpi.h
> >>
> >> However, I noticed that in current trunk (v1.4, IIUC) things have
> >> changed and libtool symbols are not externally available. Again, I
> >> understand the reason and acknowledge that such change is a really
> >> good thing. However, this change has broken all my hackery for
> >> dlopen()ing libmpi before the call to MPI_Init().
> >>
> >> Is there any chance that libopen-pal could provide some properly
> >> prefixed (let's say, using "opal_" as a prefix) wrapper calls to a
> >> small subset of the libtool ltdl API? The following set of wrapper
> >> calls would be the minimum required to properly load libmpi in a
> >> portable manner and clean up resources (following my previous
> >> suggestion, I'll use the opal_ prefix):
> >>
> >> opal_lt_dlinit()
> >> opal_lt_dlexit()
> >>
> >> opal_lt_dladvise_init(a)
> >> opal_lt_dladvise_destroy(a)
> >> opal_lt_dladvise_global(a)
> >> opal_lt_dladvise_ext(a)
> >>
> >> opal_lt_dlopenadvise(n,a)
> >> opal_lt_dlclose(h)
> >>
> >> Any chance this request could be considered? I would really like to
> >> have this before any Open MPI tarball gets released without libtool
> >> symbols exposed...


Re: [hwloc-devel] last API possible changes

2009-09-21 Thread Samuel Thibault
Samuel Thibault, on Mon 21 Sep 2009 19:33:11 +0200, wrote:
> Jeff Squyres, on Mon 21 Sep 2009 12:31:41 -0400, wrote:
> > On Sep 21, 2009, at 10:42 AM, Samuel Thibault wrote:
> > >It's part of the language starting from C99 only. An application could
> > >enable non-C99 mode where it becomes undefined, you can never know.
> > 
> > That is a decade old, no?  ;-)
> 
> Yes, but existing software tends not to evolve. I'm still seeing
> software using the old termio interface, even though it dates back to
> at least 1988.
   ("it" = termios, the newer interface)


Samuel


Re: [hwloc-devel] last API possible changes

2009-09-21 Thread Samuel Thibault
Jeff Squyres, on Mon 21 Sep 2009 12:31:41 -0400, wrote:
> On Sep 21, 2009, at 10:42 AM, Samuel Thibault wrote:
> >It's part of the language starting from C99 only. An application could
> >enable non-C99 mode where it becomes undefined, you can never know.
> 
> That is a decade old, no?  ;-)

Yes, but existing software tends not to evolve. I'm still seeing
software using the old termio interface, even though it dates back to
at least 1988.

> (Sorry -- I wasn't trying to be a jerk; just trying to be thorough...)

No problem. Actually, I had followed the same reasoning at the time I
wrote that part of the code; I'm just repeating what I thought back
then :)

> Is it possible that our use of restrict in cpuset-bits.h could come
> to a conclusion different from the underlying compiler's (e.g., the
> underlying compiler needs __restrict)?

I'm not sure I understand what you mean.  What could happen is that gcc
stops understanding __restrict, but it has been understanding it since
2.95, so I doubt that will ever happen.  Another very low-probability
way to get something wrong is if a compiler defines __STDC_VERSION__ to
a value greater than 199901L but doesn't accept restrict; that would
really be a compiler bug.  The last possibility is restrict being
#defined to something other than the standard restrict qualifier.  I've
just dropped the #if defined restrict part to avoid that; it's not a
big loss.
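
For context, the __hwloc_restrict approach amounts to something like
the following sketch; the exact definition in hwloc's headers may
differ:

  #if (defined __STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
  #  define __hwloc_restrict restrict    /* C99 compiler */
  #elif defined __GNUC__
  #  define __hwloc_restrict __restrict  /* gcc extension since 2.95 */
  #else
  #  define __hwloc_restrict             /* qualifier is only a hint */
  #endif

Since the macro lives in the __hwloc_ namespace, it cannot collide with
another header's #define of restrict.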

> Alternatively, is the restrict optimization really worth it here?

Re-reading what we use it for at the moment, there are not many
optimizations to be gained, but now that I've removed the only case that
could be problematic, it shouldn't be an issue.

Samuel


Re: [hwloc-devel] last API possible changes

2009-09-21 Thread Samuel Thibault
Jeff Squyres, on Mon 21 Sep 2009 10:04:21 -0400, wrote:
> On Sep 21, 2009, at 9:40 AM, Samuel Thibault wrote:
> >> So it should be ok to use AC_C_RESTRICT then, right?
> >
> >But then we can't expose restrict in installed headers since we don't
> >know _whether_ and how it is defined.
> >
> 
> Understood, but is that really our problem?  "restrict" is part of the  
> C language, so portable applications should be able to handle it in  
> headers that they import, right?

It's part of the language starting from C99 only. An application could
enable non-C99 mode where it becomes undefined, you can never know.

> Alternatively, this whole block in cpuset-bits.h could be wrapped in  
> an "#ifndef restrict", right?:

That will work only if libraries that define restrict and are included
after hwloc do the same.  You may argue that it is then their problem.
But the only library I see defining restrict in my /usr/include does it
without #ifndef restrict, and I'm not sure we want to fight this.

Anyway, #defining restrict in an installed header means that we'd have
to copy the autoconf stuff into it anyway, as to my knowledge autoconf
is not flexible enough to output that only to some config.h.in header.
That hence doesn't buy us much compared to the current situation, except
that we'd have restrict instead of __hwloc_restrict.  Risking conflicts
on the definition of restrict just for that seems too much to me.

> >> So I'd call it a "feature" if hwloc defines "restrict" to one thing
> >> and then some other header file defines it to another.  :-)
> >
> >?
> >That would make applications get a warning just because they are,
> >for instance, using at the same time two libraries which both define
> >restrict to different things.
> >
> 
> Yes -- and that's a Bad Thing.  The differences between those two  
> libraries should be resolved, lest unexpected things occur because of  
> a mismatch between what the header exports (and might be redefined)  
> and what the compiled back-end library expects, no?

You may not be able to resolve the difference: depending on how each
library detects the compiler, you may end up with restrict defined to
__restrict or to something else.  And as soon as one improves its
compiler detection, the other(s!) will have to harmonize again, etc.
Not really sustainable.

Samuel


Re: [OMPI devel] MPIR_Breakpoint visibility

2009-09-21 Thread Samuel K. Gutierrez

Hi,

According to our TotalView person, the PGI- and Intel-compiled builds
of OMPI 1.3.3 are not working properly.  She noted that 1.2.8 and 1.3.2
work fine.


Thanks,

Samuel K. Gutierrez

On Sep 21, 2009, at 7:19 AM, Terry Dontje wrote:


Ralph Castain wrote:
I see it declared "extern" in orte/tools/orterun/debuggers.h, but  
not DECLSPEC'd


FWIW: LANL uses intel compilers + totalview on a regular basis, and  
I have yet to hear of an issue.


It actually will work if you attach to the job or if you are not  
relying on the MPIR_Breakpoint to actually stop execution.


--td


On Sep 21, 2009, at 7:03 AM, Terry Dontje wrote:

I was kind of amazed no one else managed to run into this, but it was
brought to my attention that when compiling OMPI with the Intel
compilers and visibility enabled, the MPIR_Breakpoint symbol is not
exposed. I am assuming this is because MPIR_Breakpoint is not ORTE or
OMPI_DECLSPEC'd.

Do others agree, or am I missing something obvious here?

Interestingly enough, it doesn't look like gcc, pgi, pathscale or  
sun compilers are hiding the MPIR_Breakpoint symbol.

--td





Re: [OMPI devel] detecting regcache_clean deadlocks in Open-MX

2009-09-21 Thread Brice Goglin
Jeff Squyres wrote:
> Do you just want to wait for the ummunotify stuff in OMPI?  I'm half
> done making a merged "linux" memory component (i.e., it merges the
> ptmalloc2 component with the new ummunotify stuff).
>
> It won't help for kernels <2.6.32, of course.  :-)

Yeah, that's another solution for bleeding-edge distros :) But it's not
even merged in 2.6.32 yet, and I hope the perfcounter guys won't
successfully convince Roland to reimplement ummunotify as a perfcounter
feature. The 2.6.32 merge window is closing soon...

Brice



Re: [OMPI devel] detecting regcache_clean deadlocks in Open-MX

2009-09-21 Thread Jeff Squyres
Do you just want to wait for the ummunotify stuff in OMPI?  I'm half  
done making a merged "linux" memory component (i.e., it merges the  
ptmalloc2 component with the new ummunotify stuff).


It won't help for kernels <2.6.32, of course.  :-)


On Sep 21, 2009, at 9:11 AM, Brice Goglin wrote:


Jeff Squyres wrote:
> On Sep 21, 2009, at 5:50 AM, Brice Goglin wrote:
>
>> I am playing with mx__regcache_clean() in Open-MX so as to have
>> OpenMPI cleanup the Open-MX regcache when needed. It causes some
>> deadlocks since OpenMPI intercepts Open-MX' own free() calls. Is
>> there a "safe" way to have Open-MX free/munmap calls not invoke
>> OpenMPI interception hooks?

>>
>
> Not ATM, no.
>
>> Or is there a way to detect the caller of free/munmap so that my
>> regcache_clean does nothing in this case? Otherwise, I guess I'll
>> have to add a private malloc implementation inside Open-MX and hope
>> OpenMPI won't see it.
>>
>
>
> Can you structure your code to not call free/munmap inside the
> handler?


The first problem is actually about thread-safety. Most Open-MX
functions, including mx_regcache_clean(), take a pthread mutex. So I
would have to move all free/munmap outside of the locked section. That's
probably feasible but requires a lot of work :)

Brice





--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] MPIR_Breakpoint visibility

2009-09-21 Thread Terry Dontje

Ralph Castain wrote:
I see it declared "extern" in orte/tools/orterun/debuggers.h, but not 
DECLSPEC'd


FWIW: LANL uses intel compilers + totalview on a regular basis, and I 
have yet to hear of an issue.


It actually will work if you attach to the job or if you are not relying 
on the MPIR_Breakpoint to actually stop execution.


--td


On Sep 21, 2009, at 7:03 AM, Terry Dontje wrote:

I was kind of amazed no one else managed to run into this, but it was
brought to my attention that when compiling OMPI with the Intel
compilers and visibility enabled, the MPIR_Breakpoint symbol is not
exposed. I am assuming this is because MPIR_Breakpoint is not ORTE or
OMPI_DECLSPEC'd.

Do others agree, or am I missing something obvious here?

Interestingly enough, it doesn't look like gcc, pgi, pathscale or sun 
compilers are hiding the MPIR_Breakpoint symbol.

--td





Re: [OMPI devel] MPIR_Breakpoint visibility

2009-09-21 Thread Jeff Squyres

Does declspec matter for executables?  (I don't recall)

On Sep 21, 2009, at 9:15 AM, Ralph Castain wrote:


I see it declared "extern" in orte/tools/orterun/debuggers.h, but not
DECLSPEC'd

FWIW: LANL uses intel compilers + totalview on a regular basis, and I
have yet to hear of an issue.

On Sep 21, 2009, at 7:03 AM, Terry Dontje wrote:

> I was kind of amazed no one else managed to run into this, but it was
> brought to my attention that when compiling OMPI with the Intel
> compilers and visibility enabled, the MPIR_Breakpoint symbol is not
> exposed.  I am assuming this is because MPIR_Breakpoint is not ORTE
> or OMPI_DECLSPEC'd.
> Do others agree, or am I missing something obvious here?
>
> Interestingly enough, it doesn't look like gcc, pgi, pathscale or
> sun compilers are hiding the MPIR_Breakpoint symbol.
> --td




--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] MPIR_Breakpoint visibility

2009-09-21 Thread Ralph Castain
I see it declared "extern" in orte/tools/orterun/debuggers.h, but not  
DECLSPEC'd


FWIW: LANL uses intel compilers + totalview on a regular basis, and I  
have yet to hear of an issue.


On Sep 21, 2009, at 7:03 AM, Terry Dontje wrote:

I was kind of amazed no one else managed to run into this, but it was
brought to my attention that when compiling OMPI with the Intel
compilers and visibility enabled, the MPIR_Breakpoint symbol is not
exposed.  I am assuming this is because MPIR_Breakpoint is not ORTE or
OMPI_DECLSPEC'd.

Do others agree, or am I missing something obvious here?

Interestingly enough, it doesn't look like gcc, pgi, pathscale or  
sun compilers are hiding the MPIR_Breakpoint symbol.

--td





Re: [hwloc-devel] last API possible changes

2009-09-21 Thread Samuel Thibault
Jeff Squyres, on Mon 21 Sep 2009 08:51:35 -0400, wrote:
> On Sep 21, 2009, at 8:44 AM, Samuel Thibault wrote:
> 
> >> FWIW, is there a reason we're not using AC_C_RESTRICT in
> >> configure.ac?  This allows you to use "restrict" in C code  
> >everywhere;
> >> it'll be #defined to something acceptable by the compiler if
> >> "restrict" itself is not.
> >
> >Our __hwloc_restrict macro is actually a copy/paste of AC_C_RESTRICT's
> >tinkering.

Ah, wait, no, I was confusing it with something else in another project.
Looking closer, this definition is mine.

Note, btw, that the autoconf license makes an exception for code output
by autoconf scripts: the GPL doesn't apply to it, as there is
“unlimited permission to copy, distribute, and modify” it.

> >The problem is precisely that AC_C_RESTRICT provides "restrict",
> >and not another keyword, so using it in installed headers may
> >conflict with other headers' tinkering with restrict. Yes, configure
> >is meant to detect this kind of thing, but it cannot know which
> >variety of headers the user will want to include in the application,
> >and any of them could want to define restrict to something else.
> 
> Would it ever be sane to use one value of restrict in hwloc and a  
> different value in an upper-level application?

That's not a problem since it's just an optimization & validity checking
qualifier.

Samuel


Re: [OMPI devel] detecting regcache_clean deadlocks in Open-MX

2009-09-21 Thread Brice Goglin
Jeff Squyres wrote:
> On Sep 21, 2009, at 5:50 AM, Brice Goglin wrote:
>
>> I am playing with mx__regcache_clean() in Open-MX so as to have OpenMPI
>> cleanup the Open-MX regcache when needed. It causes some deadlocks since
>> OpenMPI intercepts Open-MX' own free() calls. Is there a "safe" way to
>> have Open-MX free/munmap calls not invoke OpenMPI interception hooks?
>>
>
> Not ATM, no.
>
>> Or
>> is there a way to detect the caller of free/munmap so that my
>> regcache_clean does nothing in this case? Otherwise, I guess I'll have
>> to add a private malloc implementation inside Open-MX and hope OpenMPI
>> won't see it.
>>
>
>
> Can you structure your code to not call free/munmap inside the handler?

The first problem is actually about thread-safety. Most Open-MX
functions, including mx_regcache_clean(), take a pthread mutex. So I
would have to move all free/munmap outside of the locked section. That's
probably feasible but requires a lot of work :)

Brice



[OMPI devel] MPIR_Breakpoint visibility

2009-09-21 Thread Terry Dontje
I was kind of amazed no one else managed to run into this, but it was
brought to my attention that when compiling OMPI with the Intel
compilers and visibility enabled, the MPIR_Breakpoint symbol is not
exposed.  I am assuming this is because MPIR_Breakpoint is not ORTE or
OMPI_DECLSPEC'd.

Do others agree, or am I missing something obvious here?

Interestingly enough, it doesn't look like gcc, pgi, pathscale or sun 
compilers are hiding the MPIR_Breakpoint symbol. 


--td
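
For reference, a hedged sketch of the DECLSPEC treatment being
suggested; the macro definition below is an assumption, not the actual
ORTE header. When building with -fvisibility=hidden, a symbol stays
exported only if it carries an explicit default-visibility attribute:

  #if defined(__GNUC__) && __GNUC__ >= 4
  #  define ORTE_DECLSPEC __attribute__((visibility("default")))
  #else
  #  define ORTE_DECLSPEC
  #endif

  /* Conventional MPIR signature; debuggers plant a breakpoint here. */
  ORTE_DECLSPEC void MPIR_Breakpoint(void);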



Re: [hwloc-devel] last API possible changes

2009-09-21 Thread Jeff Squyres

On Sep 21, 2009, at 8:44 AM, Samuel Thibault wrote:


> FWIW, is there a reason we're not using AC_C_RESTRICT in
> configure.ac?  This allows you to use "restrict" in C code
> everywhere; it'll be #defined to something acceptable by the compiler
> if "restrict" itself is not.

Our __hwloc_restrict macro is actually a copy/paste of AC_C_RESTRICT's
tinkering.



Er... that's a license violation, right?  Doesn't that taint hwloc,
making it GPL?  (good thing we haven't released yet!)


:-(


The problem is precisely that AC_C_RESTRICT provides "restrict", and
not another keyword, so using it in installed headers may conflict with
other headers' tinkering with restrict. Yes, configure is meant to
detect this kind of thing, but it cannot know which variety of headers
the user will want to include in the application, and any of them could
want to define restrict to something else.




Would it ever be sane to use one value of restrict in hwloc and a  
different value in an upper-level application?


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] Dynamic languages, dlopen() issues, and symbol visibility of libtool ltdl API in current trunk

2009-09-21 Thread Jeff Squyres

Ick; I appreciate Lisandro's quandary, but don't quite know what to do.

How about keeping libltdl -fvisibility=hidden inside mpi4py?


On Sep 17, 2009, at 11:16 AM, Josh Hursey wrote:


So I started down this road a couple months ago. I was using the
lt_dlopen() and friends in the OPAL CRS self module. The visibility
changes broke that functionality. The one solution that I started
implementing was precisely what you suggested, wrapping a subset of the
libtool calls and prefixing them with opal_*. The email thread is
below:

   http://www.open-mpi.org/community/lists/devel/2009/07/6531.php

The problem that I hit was that libtool's build system did not play
well with the visibility symbols. This caused dlopen to be disabled
incorrectly. The libtool folks have a patch and, I believe, they are
planning on incorporating it in the next release. The email thread is
below:
   http://thread.gmane.org/gmane.comp.gnu.libtool.patches/9446

So we would (others can speak up if not) certainly consider such a
wrapper, but I think we need to wait for the next libtool release
(unless there is other magic we can do) before it would be usable.

Do others have any other ideas on how we might get around this in the
mean time?

-- Josh


On Sep 16, 2009, at 5:59 PM, Lisandro Dalcin wrote:

> Hi all.. I have to contact you again about the issues related to
> dlopen()ing libmpi with RTLD_LOCAL, as many dynamic languages
> (Python in my case) do.
>
> So far, I've been able to manage the issues (despite the "do
> nothing" policy from Open MPI devs, which I understand) in a more or
> less portable manner by taking advantage of the availability of
> libtool ltdl symbols in the Open MPI libraries (specifically, in
> libopen-pal).

> For reference, all this hackery is here:
> http://code.google.com/p/mpi4py/source/browse/trunk/src/compat/openmpi.h
>
> However, I noticed that in current trunk (v1.4, IIUC) things have
> changed and libtool symbols are not externally available. Again, I
> understand the reason and acknowledge that such change is a really
> good thing. However, this change has broken all my hackery for
> dlopen()ing libmpi before the call to MPI_Init().
>
> Is there any chance that libopen-pal could provide some properly
> prefixed (let's say, using "opal_" as a prefix) wrapper calls to a
> small subset of the libtool ltdl API? The following set of wrapper
> calls would be the minimum required to properly load libmpi in a
> portable manner and clean up resources (following my previous
> suggestion, I'll use the opal_ prefix):
>
> opal_lt_dlinit()
> opal_lt_dlexit()
>
> opal_lt_dladvise_init(a)
> opal_lt_dladvise_destroy(a)
> opal_lt_dladvise_global(a)
> opal_lt_dladvise_ext(a)
>
> opal_lt_dlopenadvise(n,a)
> opal_lt_dlclose(h)
>
> Any chance this request could be considered? I would really like to
> have this before any Open MPI tarball gets released without libtool
> symbols exposed...
>
>
> --
> Lisandro Dalcín
> ---
> Centro Internacional de Métodos Computacionales en Ingeniería  
(CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química  
(INTEC)

> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594




--
Jeff Squyres
jsquy...@cisco.com
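
For context, the hackery under discussion boils down to forcing
libmpi's symbols global before Python dlopen()s extension modules with
RTLD_LOCAL. A hedged sketch, not mpi4py's actual code; the library
name varies by platform and version:

  #include <dlfcn.h>

  /* Re-open libmpi with RTLD_GLOBAL so that the components Open MPI
   * dlopen()s later can resolve MPI symbols.  The handle is
   * deliberately never dlclose()d: the library must stay loaded. */
  static void preload_libmpi(void)
  {
      (void) dlopen("libmpi.so.0", RTLD_NOW | RTLD_GLOBAL);
  }

The opal_lt_* wrappers requested above would let the same be done
through libtool's ltdl instead of raw dlopen().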




Re: [hwloc-devel] last API possible changes

2009-09-21 Thread Samuel Thibault
Jeff Squyres, on Mon 21 Sep 2009 08:22:06 -0400, wrote:
> FWIW, is there a reason we're not using AC_C_RESTRICT in  
> configure.ac?  This allows you to use "restrict" in C code everywhere;  
> it'll be #defined to something acceptable by the compiler if  
> "restrict" itself is not.

Our __hwloc_restrict macro is actually a copy/paste of AC_C_RESTRICT's
tinkering.

The problem is precisely that AC_C_RESTRICT provides "restrict", and
not another keyword, so using it in installed headers may conflict with
other headers' tinkering with restrict. Yes, configure is meant to
detect this kind of thing, but it cannot know which variety of headers
the user will want to include in the application, and any of them could
want to define restrict to something else.

Samuel


Re: [OMPI devel] detecting regcache_clean deadlocks in Open-MX

2009-09-21 Thread Jeff Squyres

On Sep 21, 2009, at 5:50 AM, Brice Goglin wrote:

I am playing with mx__regcache_clean() in Open-MX so as to have OpenMPI
cleanup the Open-MX regcache when needed. It causes some deadlocks since
OpenMPI intercepts Open-MX' own free() calls. Is there a "safe" way to
have Open-MX free/munmap calls not invoke OpenMPI interception hooks?



Not ATM, no.


Or is there a way to detect the caller of free/munmap so that my
regcache_clean does nothing in this case? Otherwise, I guess I'll have
to add a private malloc implementation inside Open-MX and hope OpenMPI
won't see it.




Can you structure your code to not call free/munmap inside the handler?

--
Jeff Squyres
jsquy...@cisco.com



Re: [hwloc-devel] last API possible changes

2009-09-21 Thread Jeff Squyres

On Sep 20, 2009, at 6:12 PM, Samuel Thibault wrote:

> Also, we have __hwloc_restrict everywhere in the public API, but also
> in the manpages. Should we convert the latter into a regular
> "restrict" keyword?

I had already tried that through the .cfg and it didn't work.  Since we
now have our own Makefile rules, I've just added a sed call, which also
does the same for inline, btw.
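
The sed call is presumably something along these lines; the exact
script is an assumption, not taken from the hwloc tree:

  sed -e 's/__hwloc_restrict/restrict/g' -e 's/__hwloc_inline/inline/g'

run over the generated manpage sources.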




FWIW, is there a reason we're not using AC_C_RESTRICT in  
configure.ac?  This allows you to use "restrict" in C code everywhere;  
it'll be #defined to something acceptable by the compiler if  
"restrict" itself is not.


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] Deadlock with comm_create since cid allocator change

2009-09-21 Thread Sylvain Jeaugey

You were faster to fix the bug than I was to send my bug report :-)

So I confirm : this fixes the problem.

Thanks !
Sylvain

On Mon, 21 Sep 2009, Edgar Gabriel wrote:

What version of OpenMPI did you use? Patch #21970 should have fixed this
issue on the trunk...


Thanks
Edgar

Sylvain Jeaugey wrote:

Hi list,

We are currently experiencing deadlocks when using communicators other than 
MPI_COMM_WORLD. So we made a very simple reproducer (Comm_create then 
MPI_Barrier on the communicator - see end of e-mail).


We can reproduce the deadlock only with openib and with at least 8
cores (no success with sm), on average after ~20 runs. Using a larger
number of cores greatly increases the occurrence of the deadlock. When
the deadlock occurs, every even process is stuck in MPI_Finalize and
every odd process is in MPI_Barrier.


So we tracked the bug through the changesets and found out that this
patch seems to have introduced the bug:


user:    brbarret
date:    Tue Aug 25 15:13:31 2009 +
summary: Per discussion in ticket #2009, temporarily disable the block
         CID allocation algorithms until they properly reuse CIDs.

Reverting to the non-multi-threaded cid allocator makes the deadlock
disappear.


I tried to dig further and understand why this makes a difference, with no 
luck.


If anyone can figure out what's happening, that would be great ...

Thanks,
Sylvain

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, numTasks;
    int range[3];
    MPI_Comm testComm, dupComm;
    MPI_Group orig_group, new_group;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numTasks);
    MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
    range[0] = 0;            /* first rank */
    range[1] = numTasks - 1; /* last rank */
    range[2] = 1;            /* stride */
    MPI_Group_range_incl(orig_group, 1, &range, &new_group);
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &testComm);
    MPI_Barrier(testComm);
    MPI_Finalize();
    return 0;
}



--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335




Re: [OMPI devel] Deadlock with comm_create since cid allocator change

2009-09-21 Thread Edgar Gabriel
What version of OpenMPI did you use? Patch #21970 should have fixed this
issue on the trunk...


Thanks
Edgar

Sylvain Jeaugey wrote:

Hi list,

We are currently experiencing deadlocks when using communicators other 
than MPI_COMM_WORLD. So we made a very simple reproducer (Comm_create 
then MPI_Barrier on the communicator - see end of e-mail).


We can reproduce the deadlock only with openib and with at least 8
cores (no success with sm), on average after ~20 runs. Using a larger
number of cores greatly increases the occurrence of the deadlock. When
the deadlock occurs, every even process is stuck in MPI_Finalize and
every odd process is in MPI_Barrier.


So we tracked the bug through the changesets and found out that this
patch seems to have introduced the bug:


user:    brbarret
date:    Tue Aug 25 15:13:31 2009 +
summary: Per discussion in ticket #2009, temporarily disable the block
         CID allocation algorithms until they properly reuse CIDs.

Reverting to the non-multi-threaded cid allocator makes the deadlock
disappear.


I tried to dig further and understand why this makes a difference, with 
no luck.


If anyone can figure out what's happening, that would be great ...

Thanks,
Sylvain

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, numTasks;
    int range[3];
    MPI_Comm testComm, dupComm;
    MPI_Group orig_group, new_group;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numTasks);
    MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
    range[0] = 0;            /* first rank */
    range[1] = numTasks - 1; /* last rank */
    range[2] = 1;            /* stride */
    MPI_Group_range_incl(orig_group, 1, &range, &new_group);
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &testComm);
    MPI_Barrier(testComm);
    MPI_Finalize();
    return 0;
}



--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


[OMPI devel] Deadlock with comm_create since cid allocator change

2009-09-21 Thread Sylvain Jeaugey

Hi list,

We are currently experiencing deadlocks when using communicators other 
than MPI_COMM_WORLD. So we made a very simple reproducer (Comm_create then 
MPI_Barrier on the communicator - see end of e-mail).


We can reproduce the deadlock only with openib and with at least 8
cores (no success with sm), on average after ~20 runs. Using a larger
number of cores greatly increases the occurrence of the deadlock. When
the deadlock occurs, every even process is stuck in MPI_Finalize and
every odd process is in MPI_Barrier.


So we tracked the bug through the changesets and found out that this
patch seems to have introduced the bug:


user:    brbarret
date:    Tue Aug 25 15:13:31 2009 +
summary: Per discussion in ticket #2009, temporarily disable the block
         CID allocation algorithms until they properly reuse CIDs.

Reverting to the non-multi-threaded cid allocator makes the deadlock
disappear.


I tried to dig further and understand why this makes a difference, with no 
luck.


If anyone can figure out what's happening, that would be great ...

Thanks,
Sylvain

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, numTasks;
    int range[3];
    MPI_Comm testComm, dupComm;
    MPI_Group orig_group, new_group;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numTasks);
    MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
    range[0] = 0;            /* first rank */
    range[1] = numTasks - 1; /* last rank */
    range[2] = 1;            /* stride */
    MPI_Group_range_incl(orig_group, 1, &range, &new_group);
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &testComm);
    MPI_Barrier(testComm);
    MPI_Finalize();
    return 0;
}
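
A typical way to build and launch the reproducer under the conditions
described above (openib, at least 8 ranks); the source and binary names
are placeholders:

  mpicc comm_create_bug.c -o comm_create_bug
  mpirun -np 8 --mca btl openib,self ./comm_create_bug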



[OMPI devel] detecting regcache_clean deadlocks in Open-MX

2009-09-21 Thread Brice Goglin
Hello,

I am playing with mx__regcache_clean() in Open-MX so as to have OpenMPI
cleanup the Open-MX regcache when needed. It causes some deadlocks since
OpenMPI intercepts Open-MX' own free() calls. Is there a "safe" way to
have Open-MX free/munmap calls not invoke OpenMPI interception hooks? Or
is there a way to detect the caller of free/munmap so that my
regcache_clean does nothing in this case? Otherwise, I guess I'll have
to add a private malloc implementation inside Open-MX and hope OpenMPI
won't see it.
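
One hedged idea for the "detect the caller" option, sketched below; it
is not Open-MX code, and it only helps if the interception hook can be
taught to consult the flag, which, per Jeff's reply above, OMPI does
not offer at the moment:

  #include <stdlib.h>

  /* Hypothetical sketch: if the free() hook could check this flag,
   * regcache_clean could skip work for Open-MX' internal frees. */
  static __thread int omx_in_internal_free = 0;

  static void omx_internal_free(void *ptr)
  {
      omx_in_internal_free = 1;
      free(ptr);
      omx_in_internal_free = 0;
  }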

thanks,
Brice