[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-05-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21198


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-05-03 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21198#discussion_r185705560
  
--- Diff: python/run-tests.py ---
@@ -77,13 +79,33 @@ def run_individual_python_test(test_name, pyspark_python):
         'PYSPARK_PYTHON': which(pyspark_python),
         'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
     })
+
+    # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
+    # recognized by the tempfile module to override the default system temp directory.
+    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
+    if not os.path.isdir(target_dir):
+        os.mkdir(target_dir)
+    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
+    if not os.path.isdir(tmp_dir):
+        os.mkdir(tmp_dir)
+    env["TMPDIR"] = tmp_dir
+
+    # Also override the JVM's temp directory by setting driver and executor options.
+    spark_args = [
+        "--conf", "spark.driver.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
+        "--conf", "spark.executor.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
+        "pyspark-shell"
+    ]
+    env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
+
     LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
     start_time = time.time()
     try:
         per_test_output = tempfile.TemporaryFile()
         retcode = subprocess.Popen(
             [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
             stderr=per_test_output, stdout=per_test_output, env=env).wait()
+        shutil.rmtree(tmp_dir, ignore_errors=True)
--- End diff --

ok
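
As background on the TMPDIR mechanism the diff above relies on, a minimal sketch (the path below is illustrative only): Python's tempfile module consults TMPDIR when it first computes the default temp directory, which is why exporting it into the test subprocess's environment is enough to redirect all tempfile usage in the child.

    import os
    import tempfile

    # Illustrative path only. run-tests.py achieves the same effect by
    # passing `env` to the test subprocess, so the child sees TMPDIR at startup.
    os.environ["TMPDIR"] = "/path/to/spark/python/target/some-uuid"

    tempfile.tempdir = None  # clear the cached default, forcing a re-scan
    print(tempfile.gettempdir())  # -> /path/to/spark/python/target/some-uuid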


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-05-01 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21198#discussion_r185242826
  
--- Diff: python/run-tests.py ---
@@ -77,13 +79,33 @@ def run_individual_python_test(test_name, pyspark_python):
         'PYSPARK_PYTHON': which(pyspark_python),
         'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
     })
+
+    # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
+    # recognized by the tempfile module to override the default system temp directory.
+    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
+    if not os.path.isdir(target_dir):
+        os.mkdir(target_dir)
+    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
+    if not os.path.isdir(tmp_dir):
--- End diff --

Unlikely to happen but sure.


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-05-01 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21198#discussion_r185242337
  
--- Diff: python/run-tests.py ---
@@ -77,13 +79,33 @@ def run_individual_python_test(test_name, pyspark_python):
         'PYSPARK_PYTHON': which(pyspark_python),
         'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
     })
+
+    # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
+    # recognized by the tempfile module to override the default system temp directory.
+    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
+    if not os.path.isdir(target_dir):
+        os.mkdir(target_dir)
+    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
+    if not os.path.isdir(tmp_dir):
+        os.mkdir(tmp_dir)
+    env["TMPDIR"] = tmp_dir
+
+    # Also override the JVM's temp directory by setting driver and executor options.
+    spark_args = [
+        "--conf", "spark.driver.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
+        "--conf", "spark.executor.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
+        "pyspark-shell"
+    ]
+    env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
+
     LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
     start_time = time.time()
     try:
         per_test_output = tempfile.TemporaryFile()
         retcode = subprocess.Popen(
             [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
             stderr=per_test_output, stdout=per_test_output, env=env).wait()
+        shutil.rmtree(tmp_dir, ignore_errors=True)
--- End diff --

I wanted to leave the failed temp directories behind in case they might contain useful info.
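
A minimal sketch of that idea, reusing the variables from the diff above (a hypothetical variation for illustration, not the code in this PR): delete the directory only when the test passes.

    retcode = subprocess.Popen(
        [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
        stderr=per_test_output, stdout=per_test_output, env=env).wait()
    if retcode == 0:
        # Clean run: nothing worth inspecting, so reclaim the space.
        shutil.rmtree(tmp_dir, ignore_errors=True)
    # On failure, tmp_dir stays under target/ for post-mortem debugging,
    # and a later build-tool clean sweeps it away anyway.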


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-05-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21198#discussion_r185182113
  
--- Diff: python/run-tests.py ---
@@ -77,13 +79,33 @@ def run_individual_python_test(test_name, pyspark_python):
         'PYSPARK_PYTHON': which(pyspark_python),
         'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
     })
+
+    # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
+    # recognized by the tempfile module to override the default system temp directory.
+    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
+    if not os.path.isdir(target_dir):
+        os.mkdir(target_dir)
+    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
+    if not os.path.isdir(tmp_dir):
+        os.mkdir(tmp_dir)
+    env["TMPDIR"] = tmp_dir
+
+    # Also override the JVM's temp directory by setting driver and executor options.
+    spark_args = [
+        "--conf", "spark.driver.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
+        "--conf", "spark.executor.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
+        "pyspark-shell"
+    ]
+    env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
+
     LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
     start_time = time.time()
     try:
         per_test_output = tempfile.TemporaryFile()
         retcode = subprocess.Popen(
             [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
             stderr=per_test_output, stdout=per_test_output, env=env).wait()
+        shutil.rmtree(tmp_dir, ignore_errors=True)
--- End diff --

rmtree also on failure?


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-05-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21198#discussion_r185182067
  
--- Diff: python/run-tests.py ---
@@ -77,13 +79,33 @@ def run_individual_python_test(test_name, pyspark_python):
         'PYSPARK_PYTHON': which(pyspark_python),
         'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
     })
+
+    # Create a unique temp directory under 'target/' for each run. The TMPDIR variable is
+    # recognized by the tempfile module to override the default system temp directory.
+    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
+    if not os.path.isdir(target_dir):
+        os.mkdir(target_dir)
+    tmp_dir = os.path.join(target_dir, str(uuid.uuid4()))
+    if not os.path.isdir(tmp_dir):
--- End diff --

Shouldn't it fail, or create a new UUID, if the directory is already there?
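
One sketch of an alternative that sidesteps the question entirely (not what this PR does): let tempfile.mkdtemp pick a guaranteed-unused name under target/.

    import os
    import tempfile

    target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
    if not os.path.isdir(target_dir):
        os.mkdir(target_dir)
    # mkdtemp atomically creates a directory with a fresh name, so two
    # concurrent runs can never race on the same path.
    tmp_dir = tempfile.mkdtemp(dir=target_dir)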


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-04-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21198#discussion_r185158876
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -1549,7 +1549,9 @@ def search_kinesis_asl_assembly_jar():
         kinesis_jar_present = True
     jars = "%s,%s,%s" % (kafka_assembly_jar, flume_assembly_jar, kinesis_asl_assembly_jar)
 
-    os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars %s pyspark-shell" % jars
+    existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS")
--- End diff --

Or `os.environ.get("PYSPARK_SUBMIT_ARGS", ...)` with a default, if possible.
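
A sketch of that suggestion (the `jars` value here is a stand-in for the assembly jars in the diff, and `"pyspark-shell"` is an assumed fallback so spark-submit still gets a primary resource):

    import os

    jars = "a.jar,b.jar,c.jar"  # stand-in for the assembly jars above

    # With a default, the expression never yields None.
    existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
    os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars %s %s" % (jars, existing_args)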


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-04-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21198#discussion_r185158778
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -1549,7 +1549,9 @@ def search_kinesis_asl_assembly_jar():
         kinesis_jar_present = True
     jars = "%s,%s,%s" % (kafka_assembly_jar, flume_assembly_jar, kinesis_asl_assembly_jar)
 
-    os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars %s pyspark-shell" % jars
+    existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS")
--- End diff --

I think there could be ... at least I know one informal way, like `SPARK_TESTING=1 pyspark pyspark.sql.tests`, for a quicker test run. It wouldn't be hard to add one if statement.


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-04-30 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21198#discussion_r185156339
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -1549,7 +1549,9 @@ def search_kinesis_asl_assembly_jar():
         kinesis_jar_present = True
     jars = "%s,%s,%s" % (kafka_assembly_jar, flume_assembly_jar, kinesis_asl_assembly_jar)
 
-    os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars %s pyspark-shell" % jars
+    existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS")
--- End diff --

When would that happen? That's always set by `run-tests.py`. Is there 
another way to run these tests?


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-04-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21198#discussion_r185148928
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -1549,7 +1549,9 @@ def search_kinesis_asl_assembly_jar():
         kinesis_jar_present = True
     jars = "%s,%s,%s" % (kafka_assembly_jar, flume_assembly_jar, kinesis_asl_assembly_jar)
 
-    os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars %s pyspark-shell" % jars
+    existing_args = os.environ.get("PYSPARK_SUBMIT_ARGS")
--- End diff --

@vanzin, I think this will be broken if `PYSPARK_SUBMIT_ARGS` is not set 
and it returns `None`.
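
A tiny illustration of the failure mode, assuming the value is later combined into one args string:

    jars = "a.jar,b.jar,c.jar"
    existing_args = None  # PYSPARK_SUBMIT_ARGS was never set

    # %-formatting does not raise -- it silently embeds the text "None",
    # which spark-submit later fails to parse as an option or resource.
    print("--jars %s %s" % (jars, existing_args))  # --jars a.jar,b.jar,c.jar None

    # Plain concatenation fails immediately instead.
    try:
        "--jars " + jars + " " + existing_args
    except TypeError:
        print("str + None raises TypeError")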


---




[GitHub] spark pull request #21198: [SPARK-24126][pyspark] Use build-specific temp di...

2018-04-30 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/21198

[SPARK-24126][pyspark] Use build-specific temp directory for pyspark tests.

This avoids polluting and leaving garbage behind in /tmp, and allows the
usual build tools to clean up any leftover files.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-24126

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21198


commit 45d3c2a36d6f2ac10a78d4afb500cd65c6b6db7f
Author: Marcelo Vanzin 
Date:   2018-04-27T21:58:54Z

[SPARK-24126][pyspark] Use build-specific temp directory for pyspark tests.

This avoids polluting and leaving garbage behind in /tmp, and allows the
usual build tools to clean up any leftover files.




---
