We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I’ve gone through the refactoring line by line and I don’t think I’ve changed anything substantive, only how the code is organized.
I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 and explain in greater detail in that “pull request” how to reproduce it. In short, after a substantial number of our tests run, the code either deadlocks or raises exceptions.

On processor 0, the exception is raised in matrix.setUp(), specifically in

    [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c

and on other processors a few lines earlier, in matrix.create(comm), specifically in

    [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c

The circumstances that lead to this failure are really fragile, and memory corruption seems a likely cause, particularly since I can make the failure go away by removing seemingly irrelevant things like

    >>> from scipy.stats.mstats import argstoarray

Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere, without any obvious similar import trigger.

Running with `-malloc_debug true` doesn’t illuminate anything. I’ve run with `-info` and `-log_trace` and don’t see any obvious issues, but there’s a ton of output.

I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy.

I’m grateful for any help anybody can offer despite the mess that I’m offering.
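Since the error messages above carry a bracketed rank prefix (`[0]`, `[1]`), one way to sift a large captured log might be to keep only the last few messages from each rank and compare where the ranks diverge. The following is a minimal sketch, assuming the `-info` and `-log_trace` output uses the same `[rank]` prefix as those messages; `tail_by_rank` and `keep` are hypothetical names for illustration, not part of PETSc, petsc4py, or FiPy.

```python
import re
from collections import defaultdict, deque

# Assumption: each message of interest starts with "[<rank>] ", as in the
# error messages quoted above. Lines without that prefix are ignored.
RANK_PREFIX = re.compile(r"^\[(\d+)\]\s?(.*)$")


def tail_by_rank(lines, keep=5):
    """Return the last `keep` messages seen for each rank.

    `lines` is any iterable of log lines (for example, an open file).
    The result maps rank number to a list of message strings, oldest first.
    """
    tails = defaultdict(lambda: deque(maxlen=keep))
    for line in lines:
        match = RANK_PREFIX.match(line.rstrip("\n"))
        if match:
            rank = int(match.group(1))
            tails[rank].append(match.group(2))
    return {rank: list(messages) for rank, messages in sorted(tails.items())}
```

Applied to a saved log (for instance `with open(path) as f: tail_by_rank(f)`, where `path` is whatever file the output was redirected to), this would show the final calls each rank reported before the hang, which may make a mismatch such as one rank in `PetscSplitOwnership()` and another in `PetscCommDuplicate()` easier to spot.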
