grundprinzip commented on code in PR #37288:
URL: https://github.com/apache/spark/pull/37288#discussion_r929807694
##########
python/run-tests.py:
##########
@@ -107,20 +118,26 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
     env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
     output_prefix = get_valid_filename(pyspark_python + "__" + test_name + "__").lstrip("_")
-    per_test_output = tempfile.NamedTemporaryFile(prefix=output_prefix, suffix=".log")
+
+    if keep_test_output:
+        # The location is unique because the test is already in a unique directory.
Review Comment:
I think this comes back to my original question: what is the expectation of
retaining the test output? Shouldn't we retain all the files written by Spark
as well?

Independent of whether we clobber the target and log directories together, the
other point is, from an ergonomics perspective: is it better or worse to use a
two-argument option for something that is really only one?
One alternative implementation would be to simply pass a
`delete=not opts.keep_test_output` flag to `NamedTemporaryFile` and be done
with it. Then we don't change any of the existing logic. The downside is
that the output will be stored in the OS's temporary location.
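To illustrate the alternative: a minimal sketch of how the `delete=` flag on `tempfile.NamedTemporaryFile` would behave here. The helper name `open_per_test_output` and the `keep_test_output` parameter are hypothetical stand-ins for the option discussed above, not the PR's actual code:

```python
import os
import tempfile

def open_per_test_output(output_prefix, keep_test_output):
    # Hypothetical helper: passing delete=False keeps the log file on disk
    # after close(); delete=True (the default) removes it on close. Either
    # way the file lives in the OS temp directory, which is the downside
    # noted above.
    return tempfile.NamedTemporaryFile(
        prefix=output_prefix,
        suffix=".log",
        delete=not keep_test_output,
    )

# With keep_test_output=True the log survives being closed.
f = open_per_test_output("pyspark__test_example__", keep_test_output=True)
path = f.name
f.write(b"test output\n")
f.close()
assert os.path.exists(path)  # still on disk because delete=False
os.remove(path)
```

This keeps the existing single code path in `run_individual_python_test` and avoids the two-argument option entirely.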
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]