[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21198

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21198#discussion_r185705560

    --- Diff: python/run-tests.py ---
    @@ -77,13 +79,33 @@ def run_individual_python_test(test_name, pyspark_python):
             'PYSPARK_PYTHON': which(pyspark_python),
             'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
         })
    +
    +    # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
    +    # recognized by the tempfile module to override the default system temp directory.
    +    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
    +    if not os.path.isdir(target_dir):
    +        os.mkdir(target_dir)
    +    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
    +    if not os.path.isdir(tmp_dir):
    +        os.mkdir(tmp_dir)
    +    env["TMPDIR"] = tmp_dir
    +
    +    # Also override the JVM's temp directory by setting driver and executor options.
    +    spark_args = [
    +        "--conf", "spark.driver.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
    +        "--conf", "spark.executor.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
    +        "pyspark-shell"
    +    ]
    +    env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
    +
         LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
         start_time = time.time()
         try:
             per_test_output = tempfile.TemporaryFile()
             retcode = subprocess.Popen(
                 [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
                 stderr=per_test_output, stdout=per_test_output, env=env).wait()
    +        shutil.rmtree(tmp_dir, ignore_errors=True)
    --- End diff --

    ok
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21198#discussion_r185242826

    --- Diff: python/run-tests.py ---
    @@ -77,13 +79,33 @@ def run_individual_python_test(test_name, pyspark_python):
             'PYSPARK_PYTHON': which(pyspark_python),
             'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
         })
    +
    +    # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
    +    # recognized by the tempfile module to override the default system temp directory.
    +    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
    +    if not os.path.isdir(target_dir):
    +        os.mkdir(target_dir)
    +    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
    +    if not os.path.isdir(tmp_dir):
    --- End diff --

    Unlikely to happen, but sure.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21198#discussion_r185242337

    --- Diff: python/run-tests.py ---
    @@ -77,13 +79,33 @@ def run_individual_python_test(test_name, pyspark_python):
             'PYSPARK_PYTHON': which(pyspark_python),
             'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
         })
    +
    +    # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
    +    # recognized by the tempfile module to override the default system temp directory.
    +    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
    +    if not os.path.isdir(target_dir):
    +        os.mkdir(target_dir)
    +    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
    +    if not os.path.isdir(tmp_dir):
    +        os.mkdir(tmp_dir)
    +    env["TMPDIR"] = tmp_dir
    +
    +    # Also override the JVM's temp directory by setting driver and executor options.
    +    spark_args = [
    +        "--conf", "spark.driver.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
    +        "--conf", "spark.executor.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
    +        "pyspark-shell"
    +    ]
    +    env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
    +
         LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
         start_time = time.time()
         try:
             per_test_output = tempfile.TemporaryFile()
             retcode = subprocess.Popen(
                 [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
                 stderr=per_test_output, stdout=per_test_output, env=env).wait()
    +        shutil.rmtree(tmp_dir, ignore_errors=True)
    --- End diff --

    I wanted to leave the failed temp directories behind in case they might contain useful info.
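The cleanup policy described above (remove the scratch directory when the test succeeds, keep it when the test fails so its contents can be inspected) can be sketched roughly as follows. This is an illustrative helper, not code from the patch: the function name, the `target/` layout, and the `keep_on_failure` flag are assumptions for the sake of the example.

```python
import os
import shutil
import subprocess
import uuid


def run_with_scratch_dir(cmd, keep_on_failure=True):
    """Run cmd with TMPDIR pointed at a fresh per-run directory.

    On success the directory is removed; on failure it is kept so a
    developer can inspect whatever the test left behind.
    """
    # Per-run scratch directory under a build-owned 'target/' directory.
    target_dir = os.path.abspath("target")
    os.makedirs(target_dir, exist_ok=True)
    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
    os.mkdir(tmp_dir)

    # TMPDIR is honored by Python's tempfile module in the child process.
    env = dict(os.environ, TMPDIR=tmp_dir)
    retcode = subprocess.call(cmd, env=env)

    # Clean up only on success (or when the caller opts out of keeping).
    if retcode == 0 or not keep_on_failure:
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return retcode
```

The trade-off is deliberate: unconditional cleanup (e.g. in a `finally:` block) keeps the build tree tidy, but discards exactly the files that are most useful when diagnosing a failed test.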
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21198#discussion_r185182113

    --- Diff: python/run-tests.py ---
    @@ -77,13 +79,33 @@ def run_individual_python_test(test_name, pyspark_python):
             'PYSPARK_PYTHON': which(pyspark_python),
             'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
         })
    +
    +    # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
    +    # recognized by the tempfile module to override the default system temp directory.
    +    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
    +    if not os.path.isdir(target_dir):
    +        os.mkdir(target_dir)
    +    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
    +    if not os.path.isdir(tmp_dir):
    +        os.mkdir(tmp_dir)
    +    env["TMPDIR"] = tmp_dir
    +
    +    # Also override the JVM's temp directory by setting driver and executor options.
    +    spark_args = [
    +        "--conf", "spark.driver.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
    +        "--conf", "spark.executor.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
    +        "pyspark-shell"
    +    ]
    +    env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
    +
         LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
         start_time = time.time()
         try:
             per_test_output = tempfile.TemporaryFile()
             retcode = subprocess.Popen(
                 [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
                 stderr=per_test_output, stdout=per_test_output, env=env).wait()
    +        shutil.rmtree(tmp_dir, ignore_errors=True)
    --- End diff --

    rmtree also on failure?
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21198#discussion_r185182067

    --- Diff: python/run-tests.py ---
    @@ -77,13 +79,33 @@ def run_individual_python_test(test_name, pyspark_python):
             'PYSPARK_PYTHON': which(pyspark_python),
             'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
         })
    +
    +    # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
    +    # recognized by the tempfile module to override the default system temp directory.
    +    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
    +    if not os.path.isdir(target_dir):
    +        os.mkdir(target_dir)
    +    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
    +    if not os.path.isdir(tmp_dir):
    --- End diff --

    Shouldn't it fail, or create a new uuid, if the directory is already there?
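One way to handle the collision concern raised here is to let `os.mkdir` raise `FileExistsError` on an existing path and simply retry with a fresh UUID, instead of silently reusing the directory. This is a hypothetical illustration of that alternative, not code from the patch; the helper name and retry count are assumptions.

```python
import os
import uuid


def make_unique_dir(parent, attempts=3):
    """Create a fresh, collision-free directory under parent.

    A UUID4 collision is astronomically unlikely, but if one occurs
    os.mkdir raises FileExistsError and we try again with a new name
    rather than reusing a directory that may hold stale files.
    """
    for _ in range(attempts):
        candidate = os.path.join(parent, str(uuid.uuid4()))
        try:
            os.mkdir(candidate)
            return candidate
        except FileExistsError:
            continue
    raise RuntimeError("could not create a unique directory under %s" % parent)
```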
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21198#discussion_r185158876

    --- Diff: python/pyspark/streaming/tests.py ---
    @@ -1549,7 +1549,9 @@ def search_kinesis_asl_assembly_jar():
             kinesis_jar_present = True

        jars = "%s,%s,%s" % (kafka_assembly_jar, flume_assembly_jar, kinesis_asl_assembly_jar)
    -   os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars %s pyspark-shell" % jars
    +   existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS")
    --- End diff --

    or get("", ...) if possible
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21198#discussion_r185158778

    --- Diff: python/pyspark/streaming/tests.py ---
    @@ -1549,7 +1549,9 @@ def search_kinesis_asl_assembly_jar():
             kinesis_jar_present = True

        jars = "%s,%s,%s" % (kafka_assembly_jar, flume_assembly_jar, kinesis_asl_assembly_jar)
    -   os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars %s pyspark-shell" % jars
    +   existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS")
    --- End diff --

    I think there could be ... at least I know one informal way, like `SPARK_TESTING=1 pyspark pyspark.sql.tests`, for a quicker test. It wouldn't be hard to have one if statement.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21198#discussion_r185156339

    --- Diff: python/pyspark/streaming/tests.py ---
    @@ -1549,7 +1549,9 @@ def search_kinesis_asl_assembly_jar():
             kinesis_jar_present = True

        jars = "%s,%s,%s" % (kafka_assembly_jar, flume_assembly_jar, kinesis_asl_assembly_jar)
    -   os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars %s pyspark-shell" % jars
    +   existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS")
    --- End diff --

    When would that happen? That's always set by `run-tests.py`. Is there another way to run these tests?
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21198#discussion_r185148928

    --- Diff: python/pyspark/streaming/tests.py ---
    @@ -1549,7 +1549,9 @@ def search_kinesis_asl_assembly_jar():
             kinesis_jar_present = True

        jars = "%s,%s,%s" % (kafka_assembly_jar, flume_assembly_jar, kinesis_asl_assembly_jar)
    -   os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars %s pyspark-shell" % jars
    +   existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS")
    --- End diff --

    @vanzin, I think this will be broken if `PYSPARK_SUBMIT_ARGS` is not set and it returns `None`.
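The failure mode being discussed: `os.environ.get("PYSPARK_SUBMIT_ARGS")` returns `None` when the variable is unset, so a format string like `"--jars %s %s"` would embed the literal text `None` in the submit arguments. A minimal sketch of the guard, assuming the fallback is the plain `pyspark-shell` marker (the helper name here is illustrative, not from the patch):

```python
import os


def submit_args_with_jars(jars):
    """Prepend --jars to any existing PYSPARK_SUBMIT_ARGS.

    Supplying a default to os.environ.get avoids embedding the string
    "None" when the variable was never set (e.g. when the tests are
    launched outside run-tests.py).
    """
    existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
    return "--jars %s %s" % (jars, existing_args)
```

With the default in place, both launch paths work: `run-tests.py` sets the variable and its value is preserved, while a bare `SPARK_TESTING=1 pyspark ...` invocation falls back to `pyspark-shell`.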
GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/21198

    [SPARK-24126][pyspark] Use build-specific temp directory for pyspark tests.

    This avoids polluting and leaving garbage behind in /tmp, and allows the usual build tools to clean up any leftover files.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-24126

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21198.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21198

commit 45d3c2a36d6f2ac10a78d4afb500cd65c6b6db7f
Author: Marcelo Vanzin
Date: 2018-04-27T21:58:54Z

    [SPARK-24126][pyspark] Use build-specific temp directory for pyspark tests.

    This avoids polluting and leaving garbage behind in /tmp, and allows the usual build tools to clean up any leftover files.
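The PR's core idea, visible in the diffs quoted in the review comments above, is to redirect both of the places temp files come from: Python's `tempfile` module (which honors `TMPDIR`) and the driver/executor JVMs (which honor `-Djava.io.tmpdir`). A condensed sketch of that environment setup; `build_test_env` is an illustrative name, not a function from the patch:

```python
import os


def build_test_env(tmp_dir):
    """Build a child-process environment whose temp files land in tmp_dir.

    TMPDIR covers Python's tempfile module; the extraJavaOptions confs
    cover the driver and executor JVMs via java.io.tmpdir.
    """
    env = dict(os.environ)
    env["TMPDIR"] = tmp_dir
    spark_args = [
        "--conf", "spark.driver.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
        "--conf", "spark.executor.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
        "pyspark-shell",
    ]
    env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
    return env
```

Because the directory lives under the build tree (`target/` in the patch) rather than `/tmp`, ordinary build cleanup (`mvn clean` and the like) sweeps up anything the tests leave behind.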