HyukjinKwon commented on a change in pull request #29703:
URL: https://github.com/apache/spark/pull/29703#discussion_r492425304
##########
File path: python/pyspark/find_spark_home.py
##########
@@ -42,7 +42,11 @@ def is_spark_home(path):
import_error_raised = False
from importlib.util import find_spec
try:
+ # Spark distribution can be downloaded when HADOOP_VERSION environment variable is set.
+ # We should look up this directory first, see also SPARK-32017.
+ spark_dist_dir = "spark-distribution"
module_home = os.path.dirname(find_spec("pyspark").origin)
+ paths.append(os.path.join(module_home, spark_dist_dir))
Review comment:
`os.path.dirname(os.path.realpath(__file__))` is usually a script
directory because `find_spark_home.py` is included as a script in `setup.py`:
https://github.com/apache/spark/blob/058e61ae143a3619fc13fb0c316044a75f7bc15f/python/setup.py#L187
For example, it will be the user's `bin` directory.
I thought `os.path.dirname(os.path.realpath(__file__))` was more just for dev
purposes or non-pip-installed PySpark. I checked that it correctly points to
the newly installed Spark home.
However, sure, why not be safer :-) I will update.
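To make the lookup order this diff adds more concrete, here is a minimal, self-contained sketch of the candidate-path logic (a sketch under assumptions, not the actual `find_spark_home.py`: `is_spark_home` below is a simplified stand-in for the real check, and the surrounding `paths` handling is condensed; only the `spark-distribution` lookup mirrors the diff above):

```python
import os
from importlib.util import find_spec


def is_spark_home(path):
    # Simplified stand-in: treat a directory as a Spark home if it has
    # bin/spark-submit and a jars/ directory. The real check may differ.
    return (os.path.isfile(os.path.join(path, "bin", "spark-submit"))
            and os.path.isdir(os.path.join(path, "jars")))


# Candidate locations, cheapest guesses first.
paths = ["../", os.path.dirname(os.path.realpath(__file__))]

spec = find_spec("pyspark")
if spec is not None and spec.origin:
    # The installed pyspark package directory (e.g. site-packages/pyspark) is a
    # more reliable anchor than __file__, which can resolve to the user's bin/
    # when this file is installed as a script by setup.py.
    module_home = os.path.dirname(spec.origin)
    # Prefer the Spark distribution that pip downloaded into the package
    # (e.g. when HADOOP_VERSION was set at install time, see SPARK-32017).
    paths.append(os.path.join(module_home, "spark-distribution"))
    paths.append(module_home)

# First candidate that actually looks like a Spark home wins.
spark_home = next((p for p in paths if is_spark_home(p)), None)
print(spark_home)
```

The design point is the same as in the comment above: `find_spec("pyspark").origin` anchors the search at the installed package, whereas `os.path.realpath(__file__)` can resolve to the user's `bin` when `find_spark_home.py` is shipped as a console script, so the package-relative `spark-distribution` directory is checked explicitly.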
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]