On Jul 23, 2013, at 3:56 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> On Jul 21, 2013, at 8:50 AM, Kevin H. Hobbs <hob...@ohio.edu> wrote:
>
>>> Ah! That would indicate an issue with the external hwloc
>>> package they provided, which is the big reason we don't
>>> recommend installing from packages.
>>
>> I'll happily report the bug to the hwloc developers.
>
> I don't think that this is necessarily an hwloc bug.
>
>> I'll also add what we've found here to the bug on the Fedora
>> bugzilla.
>>
>> Is there anything more I can do on this list to figure out the
>> nature of the bug?
>>
>>> We have internal copies of hwloc and libevent that ensure (a)
>>> they are at the proper level, and (b) they are configured
>>> properly for OMPI's use.
>>
>> It does look like Fedora's hwloc is ahead of OMPI's.
>>
>> Fedora 18 has openmpi-1.6.3 and hwloc-1.4.2.
>>
>> The source of openmpi-1.6.5 has hwloc-1.3.2.
>
> Hypothetically, hwloc 1.4.x is backwards source-compatible with hwloc 1.3.x,
> but we have not tested this. I don't know if hwloc has, either (I'm sure
> they haven't tested with Open MPI 1.6.x).
>
>> How can I tell what the configuration differences are?
>>
>> The entire configure section of the .spec file in
>> hwloc-1.4.2-2.fc18.src.rpm is :
>>
>> %configure
>> %{__make} %{?_smp_mflags} V=1
>
> OMPI builds hwloc in "embedded" mode, which means that OMPI's configure line
> is used to build hwloc (vs. having a separate configure invocation for
> hwloc). They're hypothetically the moral equivalent of each other, but
> perhaps something is different somehow...
>
>> I don't see anything that looks like any hwloc configure options
>> are being set.
>>
>> How do I tell how OMPI configures it's bundled hwloc?
>
> With this embedded mechanism, we're calling hwloc's configury with the moral
> equivalent of:
>
> ./configure --disable-cairo --disable-libxml2 --enable-xml
>   --with-hwloc-symbol-prefix=opal_hwloc152_ --enable-embedded-mode
>
>> Better yet, I'd like to figure out the actual nature of the bug
>> and report it in the proper place.
>
>
> Yes, it's curious that they can't reproduce your issue,

Guess I missed this - where does it say that they can't reproduce the issue?? I'm suspicious because build-from-source produced a working result.

> which suggests that the hwloc issue is a red herring (because, as stated
> above, hwloc *should* be backwards compatible).
>
> Ralph: is there an easy way to find out more detail on why
> orte_util_nidmap_init() failed without attaching a debugger?

A debugger would be the best way.

>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>