I was able to create the fix - it is in OMPI master. I have provided a patch
for OMPI v3.1.5 here:
https://github.com/open-mpi/ompi/pull/7276
Ralph
On Jan 3, 2020, at 6:04 PM, Ralph Castain via devel wrote:
I'm afraid the fix uncovered an issue in the ds21 component that Mellanox will need
to address - unsure of the timetable for that to happen.
On Jan 3, 2020, at 6:28 AM, Ralph Castain via devel wrote:
I committed something upstream in PMIx master and v3.1 that probably resolves
this - another user reported it over there and provided a patch. I can probably
backport it to v2.x and give you a patch for OMPI v3.1.
On Jan 3, 2020, at 3:25 AM, Jeff Squyres (jsquyres) via devel wrote:
Is there a configure test we can add to make this kind of behavior the default?
On Jan 1, 2020, at 11:50 PM, Marco Atzeri via devel wrote:
thanks Ralph
gds = ^ds21
works as expected
On 31.12.2019 at 19:27, Ralph Castain via devel wrote:
PMIx likely defaults to the ds12 component - which will work fine but a tad
slower than ds21. It is likely something to do with the way Cygwin handles
memory locks. You can avoid the error message by simply adding "gds = ^ds21" to
your default MCA param file (the pmix one - should be named
pmix
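For reference, the workaround Marco confirms above is a one-line MCA exclusion. A minimal
sketch of the parameter-file entry follows; the "^" prefix means "any gds component except
the listed one(s)", so PMIx falls back to ds12. The exact file name is truncated in the
quoted message, and the environment-variable form is an assumption based on the usual
PMIX_MCA_<param> convention rather than something stated in this thread.

    # In the default PMIx MCA parameter file (exact path/name depends on the install):
    # "^" excludes the listed component(s), so any gds component other than ds21 is used.
    gds = ^ds21

    # Assumed alternative for a single shell session (usual PMIX_MCA_<param> convention):
    export PMIX_MCA_gds=^ds21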
I have no multi-node setup around for testing.
I will need to set one up for testing after the holidays.
On 24.12.2019 at 23:27, Jeff Squyres (jsquyres) wrote:
That actually looks like a legit error -- it's failing to initialize a shared
mutex.
I'm not sure what the consequence is of this failure, though, since the job
seemed to run ok.
Are you able to run multi-node jobs ok?
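For context on the shared-mutex error Jeff describes: ds21 appears to rely on
process-shared pthread locks, and the failure corresponds roughly to the standalone
check sketched below. This is an illustrative sketch under that assumption, not the
actual PMIx code; a configure-time probe along these lines (as Jeff asks about above)
could detect platforms such as Cygwin where PTHREAD_PROCESS_SHARED support is missing
or unreliable.

    /* Minimal standalone probe for process-shared pthread mutex support.
     * Illustrative sketch only; not the actual PMIx ds21 code.
     * Build with: cc -o pshared_check pshared_check.c -pthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutex_t mtx;
        int rc;

        if ((rc = pthread_mutexattr_init(&attr)) != 0) {
            fprintf(stderr, "mutexattr_init: %s\n", strerror(rc));
            return 1;
        }
        /* On platforms without process-shared mutex support, this call
         * (or the mutex_init below) is typically where things fail. */
        if ((rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED)) != 0) {
            fprintf(stderr, "PTHREAD_PROCESS_SHARED not supported: %s\n", strerror(rc));
            return 1;
        }
        if ((rc = pthread_mutex_init(&mtx, &attr)) != 0) {
            fprintf(stderr, "mutex_init with pshared attr failed: %s\n", strerror(rc));
            return 1;
        }
        pthread_mutex_destroy(&mtx);
        pthread_mutexattr_destroy(&attr);
        printf("process-shared mutexes look usable\n");
        return 0;
    }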
On Dec 22, 2019, at 1:20 AM, Marco Atzeri via devel wrote:
Hi D