grundprinzip commented on code in PR #37288:
URL: https://github.com/apache/spark/pull/37288#discussion_r929807694


##########
python/run-tests.py:
##########
@@ -107,20 +118,26 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
     env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
 
     output_prefix = get_valid_filename(pyspark_python + "__" + test_name + "__").lstrip("_")
-    per_test_output = tempfile.NamedTemporaryFile(prefix=output_prefix, suffix=".log")
+
+    if keep_test_output:
+        # The location is unique because the test is already in a unique directory.

Review Comment:
   I think this comes back to my original question: what is the expectation when retaining the test output? Shouldn't we retain all the files written by Spark as well?
   
   Independent of whether we clobber the target and log directory together, the other point is, from an ergonomics perspective: is it better or worse to use a two-argument option for something that is actually only one?
   
   One alternative implementation would be to simply pass a `delete=not opts.keep_test_output` flag to the `NamedTemporaryFile` and be done with it. Then we don't change any of the existing logic today. The downside is that the output would be stored in the OS's temporary location.
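
   A minimal, self-contained sketch of that alternative (the `Opts` class here is a hypothetical stand-in for the parsed command-line options; note that the flag has to be negated so that keeping the output maps to `delete=False`):
   
   ```python
   import os
   import tempfile
   
   # Hypothetical stand-in for the parsed CLI options; in run-tests.py this
   # would come from the real option parser.
   class Opts:
       keep_test_output = True
   
   opts = Opts()
   output_prefix = "pyspark_python__test_name__"
   
   # delete=False tells NamedTemporaryFile to leave the file on disk after
   # close(), so the per-test log survives in the OS temp directory.
   per_test_output = tempfile.NamedTemporaryFile(
       prefix=output_prefix,
       suffix=".log",
       delete=not opts.keep_test_output,
   )
   per_test_output.write(b"example test log line\n")
   per_test_output.close()
   
   still_exists = os.path.exists(per_test_output.name)
   os.remove(per_test_output.name)  # clean up for this demo only
   ```
   
   With `keep_test_output = False` the behavior is unchanged from today: the file is removed as soon as it is closed.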



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

