[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module

2014-11-13 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211768#comment-14211768
 ] 

Xiangrui Meng commented on SPARK-4348:
--

Note that after this fix, the bytecode file `random.pyc` will very likely 
still sit under `pyspark/mllib`. We need to remove it manually to prevent 
`import random` from picking up that file. 
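
A minimal sketch of that manual cleanup, for illustration only: the helper name `remove_stale_bytecode` and the directory passed to it are assumptions, not part of the fix. It deletes leftover `random.pyc` / `random.pyo` files only when the matching `random.py` source is no longer present.

```python
import os


def remove_stale_bytecode(package_dir, module_name="random"):
    """Delete leftover bytecode for module_name if its .py source is gone.

    Returns the list of files that were removed.
    """
    removed = []
    source = os.path.join(package_dir, module_name + ".py")
    if os.path.exists(source):
        # The source file still exists, so the bytecode is not stale.
        return removed
    for suffix in (".pyc", ".pyo"):
        candidate = os.path.join(package_dir, module_name + suffix)
        if os.path.isfile(candidate):
            os.remove(candidate)
            removed.append(candidate)
    return removed
```

It would be called with the path of the `pyspark/mllib` directory in the local checkout, for example `remove_stale_bytecode("python/pyspark/mllib")`, where the path is a guess at the layout.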

 pyspark.mllib.random conflicts with random module
 -

 Key: SPARK-4348
 URL: https://issues.apache.org/jira/browse/SPARK-4348
 Project: Spark
  Issue Type: Bug
  Components: MLlib, PySpark
Affects Versions: 1.1.0, 1.2.0
Reporter: Davies Liu
Assignee: Davies Liu
Priority: Blocker
 Fix For: 1.2.0


 There are conflicts in two cases:
 1. The random module is used by pyspark.mllib.feature. If the first entry of 
 sys.path is not '', the hack in pyspark/__init__.py fails to fix the 
 conflict.
 2. When running the tests in mllib/xxx.py, the '' entry must be popped from 
 sys.path before importing anything, or the import will fail.
 The first case is not fully fixed for users and causes problems in some 
 cases, such as:
 {code}
  import sys
  # PATH_OF_MODULE is a placeholder for any directory placed ahead of ''
  sys.path.insert(0, PATH_OF_MODULE)
  import pyspark
  # using Word2Vec will now fail
 {code}
 I'd like to rename mllib/random.py to mllib/_random.py, and then in 
 mllib/__init__.py:
 {code}
 import pyspark.mllib._random as random
 {code}
 cc [~mengxr] [~dorx]
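
To see why the position of the script's own directory on sys.path matters, here is a minimal Python 3 sketch. It is an illustration, not code from Spark: the temporary directory stands in for pyspark/mllib, and `resolve_origin` and `demo_shadowing` are hypothetical helper names. It asks the standard `importlib.machinery.PathFinder` which file `import random` would load for a given search path, without actually importing anything.

```python
import importlib.machinery
import os
import sys
import tempfile


def resolve_origin(module_name, search_path):
    """Return the file that `import module_name` would load from search_path."""
    spec = importlib.machinery.PathFinder.find_spec(module_name, search_path)
    return spec.origin if spec is not None else None


def demo_shadowing():
    """Compare resolution with and without a directory holding random.py."""
    with tempfile.TemporaryDirectory() as fake_mllib:
        local = os.path.join(fake_mllib, "random.py")
        with open(local, "w") as f:
            f.write("# stand-in for pyspark/mllib/random.py\n")
        # Directory first on the path: the local random.py wins.
        shadowed = resolve_origin("random", [fake_mllib] + sys.path)
        # Directory absent from the path: the standard library module wins.
        popped = resolve_origin("random", list(sys.path))
        return local, shadowed, popped
```

With the stand-in directory at the front of the path, the local file is selected; without it, the standard library module is found, which is the effect of popping '' before importing anything.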



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module

2014-11-11 Thread Doris Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207199#comment-14207199
 ] 

Doris Xin commented on SPARK-4348:
--

I fully support this. It took a lot of hacking just to override the default 
random module in Python, and it wasn't clear if the override was the ideal 
solution.




[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module

2014-11-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207511#comment-14207511
 ] 

Apache Spark commented on SPARK-4348:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/3216




[jira] [Commented] (SPARK-4348) pyspark.mllib.random conflicts with random module

2014-11-11 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207529#comment-14207529
 ] 

Davies Liu commented on SPARK-4348:
---

After some experiments, I found this is harder than expected. It still needs 
a hack to make it work (see the PR), but I think this hack is safer than 
before:

1. The rand.py module does not overwrite the default random module, so it is 
safe to run mllib/xxx.py without hacking, and we no longer need a hack to use 
random in the mllib package.

2. The RandomModuleHook is only installed when the user imports 
'pyspark.mllib', and it only handles 'pyspark.mllib.random'.
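
As an illustration of the kind of hook described in point 2, here is a minimal Python 3 sketch of a meta path finder that answers for exactly one dotted name. This is not the RandomModuleHook from the PR (see the PR for that). `AliasFinder`, `AliasLoader`, and the throwaway `demo_pkg` package are hypothetical names, and re-exporting the target's public names is an assumed design, chosen so that the standard library `random` stays untouched.

```python
import importlib
import importlib.abc
import importlib.util
import os
import sys
import tempfile


class AliasLoader(importlib.abc.Loader):
    """Fill the alias module with the public names of the target module."""

    def __init__(self, target_name):
        self.target_name = target_name

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        target = importlib.import_module(self.target_name)
        for name, value in vars(target).items():
            if not name.startswith("__"):
                setattr(module, name, value)


class AliasFinder(importlib.abc.MetaPathFinder):
    """Answer imports of exactly one dotted name; ignore everything else."""

    def __init__(self, alias_name, target_name):
        self.alias_name = alias_name
        self.target_name = target_name

    def find_spec(self, fullname, path, target=None):
        if fullname != self.alias_name:
            return None
        loader = AliasLoader(self.target_name)
        return importlib.util.spec_from_loader(fullname, loader)


def demo():
    """Build a throwaway package and import demo_pkg.random via the hook."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "demo_pkg")
        os.mkdir(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "rand.py"), "w") as f:
            f.write("def answer():\n    return 42\n")

        finder = AliasFinder("demo_pkg.random", "demo_pkg.rand")
        sys.path.insert(0, root)
        sys.meta_path.insert(0, finder)
        importlib.invalidate_caches()
        try:
            aliased = importlib.import_module("demo_pkg.random")
            stdlib_random = importlib.import_module("random")
            return aliased.answer(), hasattr(stdlib_random, "Random")
        finally:
            # Undo every global change made for the demonstration.
            sys.meta_path.remove(finder)
            sys.path.remove(root)
            for name in [n for n in sys.modules if n.split(".")[0] == "demo_pkg"]:
                del sys.modules[name]
```

Because the finder returns `None` for every name except the alias, all other imports, including the top-level `random`, go through the normal machinery.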

Note: To use the default random module, the caller module needs 'from 
__future__ import absolute_import', and this is needed in each such module. 
Without it, 'import random' can be resolved as 'from pyspark.mllib import 
random'. So there is a bug in master (Word2Vec).
