gaogaotiantian commented on code in PR #53528:
URL: https://github.com/apache/spark/pull/53528#discussion_r2632315742
##########
python/run-tests.py:
##########
@@ -134,6 +135,12 @@ def run_individual_python_test(target_dir, test_name, pyspark_python, keep_test_
     env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
+    timeout = int(os.environ.get("PYSPARK_TEST_TIMEOUT", 100))
Review Comment:
We have tests that run well above 100s, so I don't think we should have a
default value here; it would also disrupt people's existing workflows. Is this
meant for CI or for regular dev? If it's for CI, we should set the env var in
the CI config. If it's for regular dev, we should probably add a manual way to
trigger it (catch `KeyboardInterrupt`, for example); see the sketch below.
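
Not code from this PR, just a rough sketch of the two ideas: apply a timeout
only when CI exports the env var, and give regular dev a manual trigger via
`KeyboardInterrupt` (the `wait_for_test` helper is hypothetical):

```python
# Hypothetical sketch, not code from this PR: apply a timeout only when
# PYSPARK_TEST_TIMEOUT is explicitly set (e.g. exported by the CI config),
# and let regular dev interrupt a hung test manually with Ctrl-C.
import os
import signal
import subprocess

def wait_for_test(proc: subprocess.Popen) -> int:
    timeout_env = os.environ.get("PYSPARK_TEST_TIMEOUT")
    timeout = int(timeout_env) if timeout_env else None  # no default timeout
    try:
        return proc.wait(timeout=timeout)
    except KeyboardInterrupt:
        # Manual dev path: ask the child to dump its thread stacks first
        # (assumes the child registered a faulthandler signal), then stop it.
        proc.send_signal(signal.SIGTERM)
        return proc.wait()
```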
##########
python/pyspark/testing/connectutils.py:
##########
@@ -177,6 +180,10 @@ def master(cls):
     @classmethod
     def setUpClass(cls):
+        if os.environ.get("PYSPARK_TEST_TIMEOUT"):
+            faulthandler.enable(file=sys.__stderr__)
+            faulthandler.register(signal.SIGUSR1, file=sys.__stderr__, all_threads=True)
Review Comment:
If you don't need to catch segfaults, you don't need to call `enable()`; I
don't think that has anything to do with registering a custom signal. Also, I
think you can just use `SIGTERM` here: it will print the traceback data and
exit when receiving SIGTERM.
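
A minimal sketch of that setup, assuming the same `PYSPARK_TEST_TIMEOUT`
gating the PR already uses:

```python
# Sketch: register the traceback dump on SIGTERM directly; enable() is only
# needed to catch faults such as segfaults, which is a separate concern.
import faulthandler
import os
import signal
import sys

if os.environ.get("PYSPARK_TEST_TIMEOUT"):
    faulthandler.register(signal.SIGTERM, file=sys.__stderr__, all_threads=True)
```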
##########
python/run-tests.py:
##########
@@ -134,6 +135,12 @@ def run_individual_python_test(target_dir, test_name, pyspark_python, keep_test_
     env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
+    timeout = int(os.environ.get("PYSPARK_TEST_TIMEOUT", 100))
+    if timeout == 0:
+        timeout = None
+    else:
+        env["PYSPARK_TEST_TIMEOUT"] = str(timeout)
Review Comment:
Is this used by anything else?
##########
python/run-tests.py:
##########
@@ -142,17 +149,25 @@ def run_individual_python_test(target_dir, test_name, pyspark_python, keep_test_
     LOGGER.info(
         "Starting test(%s): %s (temp output: %s)", pyspark_python, test_name,
         per_test_output.name)
     start_time = time.time()
+    retcode = None
     try:
-        retcode = subprocess.Popen(
+        proc = subprocess.Popen(
             [os.path.join(SPARK_HOME, "bin/pyspark")] + test_name.split(),
-            stderr=per_test_output, stdout=per_test_output, env=env).wait()
+            stderr=per_test_output, stdout=per_test_output, env=env)
+        retcode = proc.wait(timeout=timeout)
         if not keep_test_output:
             # There exists a race condition in Python and it causes flakiness in MacOS
             # https://github.com/python/cpython/issues/73885
             if platform.system() == "Darwin":
                 os.system("rm -rf " + tmp_dir)
             else:
                 shutil.rmtree(tmp_dir, ignore_errors=True)
+    except subprocess.TimeoutExpired:
+        LOGGER.exception("Got TimeoutExpired while running %s with %s", test_name, pyspark_python)
+        proc.send_signal(signal.SIGUSR1)
+        time.sleep(1)
+        proc.terminate()
Review Comment:
If you register the dump on `SIGTERM` as suggested above, you don't need to
send `SIGUSR1` and sleep here; `proc.terminate()` already sends SIGTERM. See
the sketch below.
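
A minimal sketch of the simplified flow, assuming the child registered its
dump on `SIGTERM` (the `wait_or_terminate` helper is illustrative, not part of
the PR):

```python
# Hypothetical sketch, not code from this PR: if the child registers its
# faulthandler dump on SIGTERM, terminate() both triggers the dump and
# stops the process, so the extra SIGUSR1 + sleep step can be dropped.
import subprocess
from typing import Optional

def wait_or_terminate(proc: subprocess.Popen, timeout: Optional[float]) -> int:
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()    # SIGTERM: the child dumps its stacks on the way out
        return proc.wait()  # reap the child after the dump
```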
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]