Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1628#issuecomment-50687582
  
    I think the problem when running `/bin/pyspark python/pyspark/mllib/linalg.py` 
is that `$SPARK_HOME/python/pyspark/mllib/` is finding its way onto `sys.path`, 
so its `random` module is being imported ahead of the standard library's `random`.
    
    [According to the Python 
docs](https://docs.python.org/2/library/sys.html#sys.path):
    
    > As initialized upon program startup, the first item of this list, 
path[0], is the directory containing the script that was used to invoke the 
Python interpreter. If the script directory is not available (e.g. if the 
interpreter is invoked interactively or if the script is read from standard 
input), path[0] is the empty string, which directs Python to search modules in 
the current directory first. Notice that the script directory is inserted 
before the entries inserted as a result of PYTHONPATH.
    
    I don't think we want this behavior in the `linalg.py` test.  I seemed to 
be able to fix things by just popping the first entry off of `sys.path` when 
running that test:
    
    ```diff
    diff --git a/python/pyspark/mllib/linalg.py b/python/pyspark/mllib/linalg.py
    index 71f4ad1..ced6e34 100644
    --- a/python/pyspark/mllib/linalg.py
    +++ b/python/pyspark/mllib/linalg.py
    @@ -255,4 +255,6 @@ def _test():
             exit(-1)
    
     if __name__ == "__main__":
    +    import sys
    +    sys.path = sys.path[1:]
         _test()
    ```
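
    A slightly narrower variation on the same idea, offered only as a sketch and not 
as what the patch above does: instead of dropping whatever happens to be first on 
`sys.path`, filter out only the entries that resolve to the script's own directory. 
The helper name `drop_script_dir` is hypothetical:

```python
import os


def drop_script_dir(path, script_file):
    """Return a copy of `path` without entries that resolve to the
    directory containing `script_file`."""
    script_dir = os.path.realpath(os.path.dirname(os.path.abspath(script_file)))
    # An empty entry means "current directory", so resolve it as such.
    return [p for p in path
            if os.path.realpath(p or os.curdir) != script_dir]
```

    Under that assumption, the `__main__` block would call 
`sys.path = drop_script_dir(sys.path, __file__)` before `_test()`. This leaves 
`sys.path` untouched if the script directory was never added in the first place.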

