GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/3940

    [SPARK-3910] Remove pyspark/mllib/ from sys.path in tests to fix relative 
import issue

    This patch addresses an issue where PySpark's MLlib unit tests might fail 
in certain environments because of circular imports.  The root cause is that 
PySpark has a module named `random`, which shares its name with the built-in 
Python `random`.  This isn't a problem for consumers of Spark, though: `numpy` 
also has a `numpy.random` module, so it's perfectly fine to have _qualified_ 
module names that have components that match Python built-ins.  In normal 
operation, the top-level `pyspark` directory is added to `sys.path` so that 
PySpark's `random` module can only be imported via `pyspark.mllib.random`.  
However, our unit tests end up invoking `./bin/pyspark 
python/pyspark/mllib/clustering.py`, which causes [the first entry of 
`sys.path`](https://docs.python.org/2/library/sys.html#sys.path) to be the 
directory containing `clustering.py` (`python/pyspark/mllib`, in that case); as 
a result, PySpark's `random` module shadows the built-in Python module.  This 
causes `numpy` and other dependencies to end up importing PySpark's `random`, 
which leads to circular import errors that cause tests to fail.
    
    To fix this, we need to prevent our tests from adding the script's 
directory to `sys.path`.  There was already some code to do this in 
`linalg.py`, so this patch just copies that code to all of the other MLlib 
tests.
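
    As a rough illustration of the idea (not the exact code from `linalg.py` 
or from this patch), the sketch below drops the script's own directory from a 
`sys.path`-style list.  The helper name `strip_script_dir` is hypothetical and 
was chosen only for this example.

```python
import os


def strip_script_dir(path_entries, script_file):
    """Return path_entries without the directory containing script_file.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    script_dir = os.path.realpath(os.path.dirname(os.path.abspath(script_file)))
    kept = []
    for entry in path_entries:
        # An empty entry in sys.path stands for the current working directory.
        resolved = os.path.realpath(entry if entry else os.getcwd())
        if resolved != script_dir:
            kept.append(entry)
    return kept
```

    A test module run as a script could apply this near the top of the file, 
before anything imports `random`, for example with 
`sys.path[:] = strip_script_dir(sys.path, __file__)`.  After that, a plain 
`import random` should resolve to the standard library module instead of 
`pyspark/mllib/random.py`.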

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-3910

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3940
    
----
commit ad385223a11a28fff6238dddf0ff27c58d4e6974
Author: Josh Rosen <[email protected]>
Date:   2015-01-07T22:20:47Z

    [SPARK-3910] Remove pyspark/mllib/ from sys.path in tests to fix relative 
import issue

----


