Hi,

The fastest way to reproduce the problem described in
https://cygwin.com/pipermail/cygwin/2024-January/255267.html and https://cygwin.com/pipermail/cygwin/2024-January/255273.html seems to be to run `pip install ...` with a version of `pip` that uses its vendored `rich` dependency to draw progress bars. (The hang reliably occurs at 0% on the *second* progress bar, and `--progress-bar off` avoids it.) Examining what `pip` is doing *may* be sufficient to investigate this.

However, I was able to write a *fairly* simple script that reliably reproduces it, at least on my machine (and on GitHub Actions runners). It seems to me that this script may give some insight. In case it's useful:

import hashlib
import threading
import time
t1 = threading.Thread(target=lambda: print("hello"))
t2 = threading.Thread(target=lambda: print("goodbye"))
t1.start()
time.sleep(1)
print("in between")
t2.start()
t1.join()
t2.join()

The interesting thing here is that the `hashlib` import is required. Even though that import is not used, the script does not trigger the problem if it is removed.

As discussed at
https://github.com/gitpython-developers/GitPython/pull/1814, this script is motivated by code in GitPython that produces the hang when unit tests are run. The script hangs when attempting to execute `t2.start()`. The effect appears specific to Python 3.9.18 on Cygwin. Running the script with Python 3.9.16 on Cygwin, or with either Python 3.9.16 or Python 3.9.18 on either Ubuntu 22.04 LTS or macOS 13, does not produce the problem. (I don't have native Windows builds of those versions to test with at this time.)

`t1` can be joined before `t2` is started, and the problem still reliably occurs. If that is done and the sleep is also omitted, the problem still occurs, but only some of the time. Running the statements in a REPL also produces the problem without requiring a sleep (presumably the delay of entering them is sufficient). The child threads and main thread don't have to print to produce the problem; I included the printing to make it clearer what's going on. I have not tested non-blocking delays.
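
For concreteness, here is a minimal sketch of the join-first variation described above. The function name `run_variant` and its `delay` parameter are hypothetical, introduced only for illustration; they are not part of `simple.py`. It assumes the same conditions as the original script (Python 3.9.18 on Cygwin) are needed to see the hang; elsewhere it should simply finish.

```python
import hashlib  # unused on purpose: the report says the hang needs this import
import threading
import time


def run_variant(delay=0.0):
    """Join t1 before starting t2, optionally sleeping in between.

    Hypothetical helper for illustration; returns True if it completes.
    """
    t1 = threading.Thread(target=lambda: print("hello"))
    t2 = threading.Thread(target=lambda: print("goodbye"))
    t1.start()
    t1.join()  # t1 has fully finished before t2 is started
    if delay:
        time.sleep(delay)  # per the report, the sleep makes the hang reliable
    print("in between")
    t2.start()  # on an affected setup, this is where the hang is reported
    t2.join()
    return True


if __name__ == "__main__":
    run_variant(delay=1.0)
```

Calling `run_variant(delay=0.0)` corresponds to the case where the sleep is omitted and the hang is only intermittent.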

I saved the script as `simple.py` and ran it in various ways to verify that it triggers the problem, but I think the most important ways to report are:

/usr/bin/python3.9 simple.py

And:

strace -o strace.out /usr/bin/python3.9 simple.py
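
Because the hang does not end on its own, one way to check for it unattended (for example on a CI runner) is to run the command under a timeout. The following is only a sketch: the helper name `hangs` and the 30-second default are assumptions, and it treats any run that exceeds the timeout as a hang. It also assumes the hung process can be killed normally when the timeout expires.

```python
import subprocess
import sys


def hangs(cmd, timeout=30.0):
    """Return True if cmd is still running after `timeout` seconds.

    Hypothetical helper: subprocess.run() kills the child and raises
    TimeoutExpired when the timeout elapses.
    """
    try:
        subprocess.run(
            cmd,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return True
    return False


if __name__ == "__main__":
    # Same invocation as above; the interpreter path is taken from this message.
    print(hangs(["/usr/bin/python3.9", "simple.py"]))
```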

By the time I killed the process in the strace run, `strace.out` had grown to 1819328 lines, most of which were:

--- Process 25112 (pid: 20768), exception c0000005 at 0000000000000000

(This is the same pattern Daniel Abrahamsson reported when running
`pip install` with strace.)

I made a copy of the first 6610 lines as `truncated.out`, but even that is 828 KiB, so I've posted it here rather than attaching it:

https://gist.github.com/EliahKagan/04143302056426d72c7a617d65890dda

The last 8 lines of `truncated.out` are identical, and the original `strace.out` continued that way.
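
To check how long the repeated tail of such a file is, a small script along these lines could be used. This is a sketch only; `trailing_repeat_count` is a hypothetical name, and it reads the whole file into memory, which is fine for `truncated.out` but may be slow for the full `strace.out`.

```python
from itertools import takewhile


def trailing_repeat_count(lines):
    """Count how many consecutive lines at the end equal the last line."""
    stripped = [line.rstrip("\n") for line in lines]
    if not stripped:
        return 0
    last = stripped[-1]
    # Walk backwards from the end while lines keep matching the last one.
    return sum(1 for _ in takewhile(lambda s: s == last, reversed(stripped)))


if __name__ == "__main__":
    with open("truncated.out") as f:
        print(trailing_repeat_count(f))
```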

(Although the strace output shows that this was run from a directory related to GitPython, this was not done with any virtual environment activated, nothing from GitPython was imported or otherwise used, and neither GitPython nor its distinctive dependencies gitdb and smmap were installed in the global environment.)

That GitHub Gist also includes `simple.py` for convenience, and `cygcheck.out` in case that would somehow be useful.

-Eliah

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
