Kyle Stanley <aeros...@gmail.com> added the comment:

I just made a rather interesting discovery. Instead of focusing my 
efforts specifically on the logic in interp_destroy(), I decided to take a 
closer look at the failing unit test itself. 

The main test within DestroyTests that's failing is the following:

```
    def test_still_running(self):
        main, = interpreters.list_all()
        interp = interpreters.create()
        with _running(interp):
            with self.assertRaises(RuntimeError):
                interpreters.destroy(interp)
            self.assertTrue(interpreters.is_running(interp))
```

(Specifically, the "self.assertRaises(RuntimeError)" block around 
"interpreters.destroy(interp)" is the main point of failure.)

To be certain that the issue was actually coming from 
interpreters.destroy(), I added a "sanity check" to ensure the interpreter 
was actually running in the first place before destroying it (I also added 
some extra debugging info):

```
    def test_still_running(self):
        main, = interpreters.list_all()
        interp = interpreters.create()
        with _running(interp):
            self.assertTrue(interpreters.is_running(interp),
                msg=f"Interp {interp} should be running before destruction.")

            with self.assertRaises(RuntimeError,
                    msg=f"Should not be able to destroy interp {interp}"
                        " while it's still running."):
                interpreters.destroy(interp)

            self.assertTrue(interpreters.is_running(interp))
```

The results were very interesting...

```
OK
0:00:49 load avg: 135.49 [306/1] test__xxsubinterpreters failed
test_all (test.test__xxsubinterpreters.DestroyTests) ... ok
test_already_destroyed (test.test__xxsubinterpreters.DestroyTests) ... ok
test_bad_id (test.test__xxsubinterpreters.DestroyTests) ... ok
test_does_not_exist (test.test__xxsubinterpreters.DestroyTests) ... ok
test_from_current (test.test__xxsubinterpreters.DestroyTests) ... ok
test_from_other_thread (test.test__xxsubinterpreters.DestroyTests) ... ok
test_from_sibling (test.test__xxsubinterpreters.DestroyTests) ... ok
test_main (test.test__xxsubinterpreters.DestroyTests) ... ok
test_one (test.test__xxsubinterpreters.DestroyTests) ... ok
test_still_running (test.test__xxsubinterpreters.DestroyTests) ... FAIL

======================================================================
FAIL: test_still_running (test.test__xxsubinterpreters.DestroyTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/aeros/repos/aeros-cpython/Lib/test/test__xxsubinterpreters.py", 
line 763, in test_still_running
    self.assertTrue(interpreters.is_running(interp),
AssertionError: False is not true : Interp 12 should be running before 
destruction.

----------------------------------------------------------------------
```

As the results above show, the interpreter is not even running in the 
first place before it's destroyed, so of course destroy() won't raise a 
RuntimeError. I think this proves that interpreters.destroy() is _not_ where we 
should be focusing our efforts (at least for now). Instead, we should first 
investigate why the interpreter isn't running at that point. I suspect the 
issue _might_ be a race condition within the "_running()" context manager 
that's preventing the interpreter from being run, but I'll have to do some 
further investigation.
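For what it's worth, here's a stripped-down sketch of the kind of race I have in mind. To be clear, FakeInterp, racy_running(), and safe_running() below are all made up for illustration; this is not the real _running() helper from the test file. The idea is that if the context manager yields before its worker thread has actually started executing, an immediate is_running() check can come back False:

```python
import contextlib
import threading


class FakeInterp:
    """Hypothetical stand-in for a subinterpreter with a 'running' flag."""

    def __init__(self):
        self._running = False

    def run(self, started, stop):
        self._running = True
        started.set()        # signal that we are actually running
        stop.wait()          # block until the caller tells us to finish
        self._running = False

    def is_running(self):
        return self._running


@contextlib.contextmanager
def racy_running(interp):
    """Yields as soon as the worker thread is created, not once it's running.

    If the caller checks interp.is_running() immediately, the check can fail
    intermittently, because the thread may not have been scheduled yet.
    """
    started, stop = threading.Event(), threading.Event()
    t = threading.Thread(target=interp.run, args=(started, stop))
    t.start()
    try:
        yield                # BUG: no started.wait() here
    finally:
        stop.set()
        t.join()


@contextlib.contextmanager
def safe_running(interp):
    """Same shape, but doesn't yield until the interpreter reports running."""
    started, stop = threading.Event(), threading.Event()
    t = threading.Thread(target=interp.run, args=(started, stop))
    t.start()
    started.wait()           # wait until run() has actually begun
    try:
        yield
    finally:
        stop.set()
        t.join()


interp = FakeInterp()
with safe_running(interp):
    assert interp.is_running()   # guaranteed to pass in this version
```

With the racy version, the assertion inside the with-block only fails when the scheduler happens to delay the worker thread, which would also fit the intermittent, load-dependent failures we're seeing.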

I also ran this ~20 times to be certain, and every single time the point of 
failure was the new assertion check I added before destroy(). 

Notably, adding the new assertion had a hard-to-explain side effect: the 
average number of test iterations before failure increased significantly. In 
the run above, it passed 306 iterations before failing, and in one of my 
earlier runs it reached over 1000. 

That never happened before on the 3.8 branch; it would very consistently fail 
in the first set of parallel workers, if not very soon after. I can say that 
with a degree of certainty as well, since I've run this set of tests countless 
times while trying to debug the failure. But I have no explanation for this.

Do you have any potential ideas, Eric? Also, do you think it might be worth 
adding the changes I made to DestroyTests.test_still_running above? They don't 
directly address the core failure, but I think they significantly improve the 
test, both in functionality and in debugging info. I'd be glad to open a PR if 
you think the test changes might be useful, and to make any needed adjustments 
to them.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37224>
_______________________________________