[GitHub] spark pull request: [SPARK-8583] [SPARK-5482] [BUILD] Refactor pyt...
Github user cocoatomo commented on the pull request: https://github.com/apache/spark/pull/6967#issuecomment-117729148

I created the pull request for this: https://github.com/apache/spark/pull/7161. Thanks.
[GitHub] spark pull request: [SPARK-8763][PySpark] executing run-tests.py w...
GitHub user cocoatomo opened a pull request:

    https://github.com/apache/spark/pull/7161

[SPARK-8763][PySpark] executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function

Running run-tests.py with Python 2.6 causes the following error:

```
Running PySpark tests. Output is in python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
Will test against the following Python executables: ['python2.6', 'python3.4', 'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Traceback (most recent call last):
  File "./python/run-tests.py", line 196, in <module>
    main()
  File "./python/run-tests.py", line 159, in main
    python_implementation = subprocess.check_output(
AttributeError: 'module' object has no attribute 'check_output'
...
```

The cause of this error is the use of the subprocess.check_output function, which has only existed since Python 2.7 (ref. https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cocoatomo/spark issues/8763-test-fails-py26

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7161.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #7161

commit cf4f90170663fda5f12b021719dd025a94f9a903
Author: cocoatomo cocoatom...@gmail.com
Date:   2015-07-01T15:45:15Z

    [SPARK-8763] backport process.check_output function from Python 2.7
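For context, the usual way to make such a script work on Python 2.6 is to backport the missing function onto the subprocess module. Below is a minimal sketch of that common backport pattern; it is an illustration, not necessarily the exact code in #7161.

```python
import subprocess

if not hasattr(subprocess, "check_output"):
    def check_output(*popenargs, **kwargs):
        """Minimal stand-in for subprocess.check_output on Python 2.6."""
        if "stdout" in kwargs:
            raise ValueError("stdout argument not allowed, it will be overridden.")
        process = subprocess.Popen(stdout=subprocess.PIPE, *popenargs, **kwargs)
        output, _ = process.communicate()
        retcode = process.poll()
        if retcode:
            cmd = kwargs.get("args") or popenargs[0]
            raise subprocess.CalledProcessError(retcode, cmd)
        return output

    # Monkey-patch so existing callers of subprocess.check_output keep working.
    subprocess.check_output = check_output
```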
[GitHub] spark pull request: [SPARK-8583] [SPARK-5482] [BUILD] Refactor pyt...
Github user cocoatomo commented on the pull request: https://github.com/apache/spark/pull/6967#issuecomment-117418734

Hi, @JoshRosen

When running run-tests.py with Python 2.6, I got the following error:

```
Running PySpark tests. Output is in python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
Will test against the following Python executables: ['python2.6', 'python3.4', 'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Traceback (most recent call last):
  File "./python/run-tests.py", line 196, in <module>
    main()
  File "./python/run-tests.py", line 159, in main
    python_implementation = subprocess.check_output(
AttributeError: 'module' object has no attribute 'check_output'
```

The cause of this error is the use of the subprocess.check_output function, which has only existed since Python 2.7 (ref. https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output).

The paragraph at https://spark.apache.org/docs/latest/#downloading says Spark runs on "Java 6+, Python 2.6+...", so should we make run-tests.py able to run on Python 2.6?
[GitHub] spark pull request: [SPARK-3867][PySpark] ./python/run-tests faile...
Github user cocoatomo commented on the pull request: https://github.com/apache/spark/pull/2759#issuecomment-61286620

Hi, @scwf

You may be using an unexpected Python executable. ./python/run-tests checks whether the command `which python2.6` returns exit code 0, and uses `$(which python2.6)` as the executable to run the tests.

c.f. https://github.com/apache/spark/blob/master/python/run-tests#L95

Could you check the result of the command `which python2.6`?
[GitHub] spark pull request: [SPARK-3869] ./bin/spark-class miss Java versi...
Github user cocoatomo commented on the pull request: https://github.com/apache/spark/pull/2725#issuecomment-59141938

Hi, @andrewor14 @vanzin. Thank you for your comments.

The value of _JAVA_OPTIONS is automatically passed to the java command as an argument. Because the default file.encoding on Mac OS X is MacRoman, which is inconvenient, it is common to set _JAVA_OPTIONS=-Dfile.encoding=UTF-8.

Thanks!
[GitHub] spark pull request: [SPARK-3909][PySpark][Doc] A corrupted format ...
GitHub user cocoatomo opened a pull request:

    https://github.com/apache/spark/pull/2766

[SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and building warnings

The Sphinx documents contain corrupted ReST formatting and produce some warnings. The purpose of this issue is the same as https://issues.apache.org/jira/browse/SPARK-3773.

commit: 0e8203f4fb721158fb27897680da476174d24c4b

output

```
$ cd ./python/docs
$ make clean html
rm -rf _build/*
sphinx-build -b html -d _build/doctrees . _build/html
Making output directory...
Running Sphinx v1.2.3
loading pickled environment... not yet created
building [html]: targets for 4 source files that are out of date
updating environment: 4 added, 0 changed, 0 removed
reading sources... [100%] pyspark.sql
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/feature.py:docstring of pyspark.mllib.feature.Word2VecModel.findSynonyms:4: WARNING: Field list ends without a blank line; unexpected unindent.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/feature.py:docstring of pyspark.mllib.feature.Word2VecModel.transform:3: WARNING: Field list ends without a blank line; unexpected unindent.
/Users/user/MyRepos/Scala/spark/python/pyspark/sql.py:docstring of pyspark.sql:4: WARNING: Bullet list ends without a blank line; unexpected unindent.
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] pyspark.sql
writing additional files... (12 module code pages) _modules/index search
copying static files... WARNING: html_static_path entry u'/Users/user/MyRepos/Scala/spark/python/docs/_static' does not exist
done
copying extra files... done
dumping search index... done
dumping object inventory... done
build succeeded, 4 warnings.

Build finished. The HTML pages are in _build/html.
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cocoatomo/spark issues/3909-sphinx-build-warnings

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2766.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2766

commit 2c7faa8ca05820edd9936fdacc69e551059fc532
Author: cocoatomo cocoatom...@gmail.com
Date:   2014-10-11T10:20:24Z

    [SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and building warnings
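For reference, the "Field list ends without a blank line" warnings above usually disappear once a blank line separates the docstring's field list from the text that follows it. A hypothetical docstring illustrating the fix (not the exact docstring changed by this patch):

```python
def transform(self, word):
    """Transforms a word to its vector representation.

    :param word: a word
    :return: vector representation of the word

    Note: local use only
    """
    # The blank line after the ":return:" field keeps Sphinx from emitting
    # "Field list ends without a blank line; unexpected unindent."
    raise NotImplementedError
```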
[GitHub] spark pull request: [SPARK-3909][PySpark][Doc] A corrupted format ...
Github user cocoatomo commented on the pull request: https://github.com/apache/spark/pull/2766#issuecomment-58769648

Thank you for the comment. I would also be happy for Jenkins to check this kind of issue rather than me. The -W option of sphinx-build would help us check and report errors strictly: http://sphinx-doc.org/invocation.html#cmdoption-sphinx-build-W

It can be used through a make command like this:

```bash
$ SPHINXOPTS=-W make clean html
```
[GitHub] spark pull request: [SPARK-3867] ./python/run-tests failed when it...
GitHub user cocoatomo opened a pull request:

    https://github.com/apache/spark/pull/2759

[SPARK-3867] ./python/run-tests failed when it run with Python 2.6 and unittest2 is not installed

./python/run-tests searches for a Python 2.6 executable on PATH and uses it if available. When using Python 2.6, it tries to import the unittest2 module, which is not part of the standard library in Python 2.6, so it fails with ImportError.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cocoatomo/spark issues/3867-unittest2-import-error

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2759.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2759

commit f068eb508c7f0e6991d296f4473eb754c7d5090f
Author: cocoatomo cocoatom...@gmail.com
Date:   2014-10-11T03:05:22Z

    [SPARK-3867] ./python/run-tests failed when it run with Python 2.6 and unittest2 is not installed
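A common way to handle this situation is to import unittest2 only on Python 2.6 and fail with a clear message when it is missing. The following is a sketch of that pattern as it might appear in a test module; the actual patch may handle it differently.

```python
import sys

if sys.version_info[:2] <= (2, 6):
    try:
        # unittest2 backports the Python 2.7 unittest features to 2.6.
        import unittest2 as unittest
    except ImportError:
        sys.stderr.write("Please install unittest2 to run the tests with Python 2.6\n")
        sys.exit(1)
else:
    import unittest
```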
[GitHub] spark pull request: [SPARK-3868][PySpark] Hard to recognize which ...
GitHub user cocoatomo opened a pull request:

    https://github.com/apache/spark/pull/2724

[SPARK-3868][PySpark] Hard to recognize which module is tested from unit-tests.log

The ./python/run-tests script displays messages about which test it is currently running on stdout, but does not write them to unit-tests.log. This makes it harder to recognize which test programs were executed and which test failed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cocoatomo/spark issues/3868-display-testing-module-name

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2724.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2724

commit c63d9faf3f712327d5e84050097638092c3dced2
Author: cocoatomo cocoatom...@gmail.com
Date:   2014-10-09T01:05:06Z

    [SPARK-3868][PySpark] Hard to recognize which module is tested from unit-tests.log
[GitHub] spark pull request: [SPARK-3869] ./bin/spark-class miss Java versi...
GitHub user cocoatomo opened a pull request:

    https://github.com/apache/spark/pull/2725

[SPARK-3869] ./bin/spark-class miss Java version with _JAVA_OPTIONS set

When the _JAVA_OPTIONS environment variable is set, the command `java -version` outputs a message like "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8". ./bin/spark-class detects the Java version from the first line of the `java -version` output, so it misdetects the Java version when _JAVA_OPTIONS is set.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cocoatomo/spark issues/3869-mistake-java-version

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2725.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2725

commit f894ebd0b6799af4037134fadf6c515af09181fc
Author: cocoatomo cocoatom...@gmail.com
Date:   2014-10-09T01:10:23Z

    [SPARK-3869] ./bin/spark-class miss Java version with _JAVA_OPTIONS set
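For illustration only: with _JAVA_OPTIONS set, the first line of `java -version` output is the "Picked up _JAVA_OPTIONS: ..." notice rather than the version string, so any check that reads only the first line picks up the wrong text. bin/spark-class itself is a shell script; the sketch below only illustrates the robust-parsing idea in Python, with a hypothetical helper name, not the code in the PR.

```python
import re
import subprocess

def detect_java_version():
    # `java -version` writes its output to stderr, and with _JAVA_OPTIONS set
    # the first line is the "Picked up _JAVA_OPTIONS: ..." notice, so scan all
    # lines for the quoted version string instead of reading only the first one.
    proc = subprocess.Popen(["java", "-version"],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = proc.communicate()[0].decode("utf-8", "replace")
    for line in output.splitlines():
        match = re.search(r'version "([^"]+)"', line)
        if match:
            return match.group(1)
    return None

if __name__ == "__main__":
    print(detect_java_version())  # e.g. 1.7.0_67, regardless of _JAVA_OPTIONS
```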
[GitHub] spark pull request: [SPARK-3773][PySpark][Doc] Sphinx build warnin...
GitHub user cocoatomo opened a pull request:

    https://github.com/apache/spark/pull/2653

[SPARK-3773][PySpark][Doc] Sphinx build warning

When building the Sphinx documents for PySpark, we get 12 warnings. They are almost all caused by docstrings in broken ReST format.

To reproduce this issue, run the following commands on commit 6e27cb630de69fa5acb510b4e2f6b980742b1957:

```bash
$ cd ./python/docs
$ make clean html
...
/Users/user/MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of pyspark.SparkContext.sequenceFile:4: ERROR: Unexpected indentation.
/Users/user/MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of pyspark.RDD.saveAsSequenceFile:4: ERROR: Unexpected indentation.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:14: ERROR: Unexpected indentation.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:16: WARNING: Definition list ends without a blank line; unexpected unindent.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:17: WARNING: Block quote ends without a blank line; unexpected unindent.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:14: ERROR: Unexpected indentation.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:16: WARNING: Definition list ends without a blank line; unexpected unindent.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:17: WARNING: Block quote ends without a blank line; unexpected unindent.
/Users/user/MyRepos/Scala/spark/python/docs/pyspark.mllib.rst:50: WARNING: missing attribute mentioned in :members: or __all__: module pyspark.mllib.regression, attribute RidgeRegressionModelLinearRegressionWithSGD
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.DecisionTreeModel.predict:3: ERROR: Unexpected indentation.
...
checking consistency... /Users/user/MyRepos/Scala/spark/python/docs/modules.rst:: WARNING: document isn't included in any toctree
...
copying static files... WARNING: html_static_path entry u'/Users/user/MyRepos/Scala/spark/python/docs/_static' does not exist
...
build succeeded, 12 warnings.
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cocoatomo/spark issues/3773-sphinx-build-warnings

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2653.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2653

commit 6f656618d7a6fe3f9977f6a1fb15350577388f06
Author: cocoatomo cocoatom...@gmail.com
Date:   2014-10-04T14:07:20Z

    [SPARK-3773][PySpark][Doc] Sphinx build warning

    Remove all warnings on document building
[GitHub] spark pull request: [SPARK-3706][PySpark] Cannot run IPython REPL ...
Github user cocoatomo commented on the pull request: https://github.com/apache/spark/pull/2554#issuecomment-57660342

Thank you for the suggestions, mattf and JoshRosen. I deleted the sentence about IPYTHON and IPYTHON_OPTS, and replaced the -u option with PYTHONUNBUFFERED.

To confirm that PYTHONUNBUFFERED is set, we can run a Python executable with the following script passed as an argument.

```python
# env.py
import os
print os.environ['PYTHONUNBUFFERED']
```

```bash
$ PYSPARK_PYTHON=ipython ./bin/pyspark env.py
...
YES
```
[GitHub] spark pull request: [SPARK-3706][PySpark] Cannot run IPython REPL ...
Github user cocoatomo commented on the pull request: https://github.com/apache/spark/pull/2554#issuecomment-57504536

Thank you for the comment. I agree that using the PYSPARK_PYTHON and PYSPARK_PYTHON_OPTS environment variables is simpler and that the IPYTHON flag should not be exposed. I will keep backward compatibility for IPYTHON and IPYTHON_OPTS. Please review the additional commit.
[GitHub] spark pull request: [SPARK-3706][PySpark] Cannot run IPython REPL ...
GitHub user cocoatomo opened a pull request:

    https://github.com/apache/spark/pull/2554

[SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to 1 and PYSPARK_PYTHON unset

### Problem

The section "Using the shell" in the Spark Programming Guide (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) says that we can run the pyspark REPL through IPython. But the following command does not run IPython; it runs the default Python executable.

    $ IPYTHON=1 ./bin/pyspark
    Python 2.7.8 (default, Jul 2 2014, 10:14:46)
    ...

The spark/bin/pyspark script at commit b235e013638685758885842dc3268e9800af3678 decides which executable and options to use in the following way:

1. if PYSPARK_PYTHON is unset
   * → default to python
2. if IPYTHON_OPTS is set
   * → set IPYTHON to 1
3. if some Python script is passed to ./bin/pyspark
   * → run it with ./bin/spark-submit (out of this issue's scope)
4. if IPYTHON is set to 1
   * → execute $PYSPARK_PYTHON (default: ipython) with arguments $IPYTHON_OPTS
   * otherwise execute $PYSPARK_PYTHON

Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON is 1. In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no effect on deciding which command to use.

| PYSPARK_PYTHON | IPYTHON_OPTS | IPYTHON | resulting command | expected command |
| - | - | - | - | - |
| (unset → defaults to python) | (unset) | (unset) | python | (same) |
| (unset → defaults to python) | (unset) | 1 | python | ipython |
| (unset → defaults to python) | an_option | (unset → set to 1) | python an_option | ipython an_option |
| (unset → defaults to python) | an_option | 1 | python an_option | ipython an_option |
| ipython | (unset) | (unset) | ipython | (same) |
| ipython | (unset) | 1 | ipython | (same) |
| ipython | an_option | (unset → set to 1) | ipython an_option | (same) |
| ipython | an_option | 1 | ipython an_option | (same) |

### Suggestion

The pyspark script should first determine whether the user wants to run IPython or another executable.

1. if IPYTHON_OPTS is set
   * set IPYTHON to 1
2. if IPYTHON has the value 1
   * PYSPARK_PYTHON defaults to ipython if not set
3. PYSPARK_PYTHON defaults to python if not set

See the pull request for the detailed modification; a small sketch of this precedence is shown after this message.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cocoatomo/spark issues/cannot-run-ipython-without-options

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2554.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2554

commit 10d56fbc2703e919882610cc061b00481d009b88
Author: cocoatomo cocoatom...@gmail.com
Date:   2014-09-27T03:41:26Z

    [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to 1 and PYSPARK_PYTHON unset
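bin/pyspark itself is a shell script, but the precedence suggested above can be sketched in Python as follows. The function name and defaults are hypothetical; this is an illustration of the proposed decision order, not the code in the PR.

```python
import os

def choose_python_command(env):
    # 1. IPYTHON_OPTS being set implies the user wants IPython.
    if env.get("IPYTHON_OPTS"):
        env["IPYTHON"] = "1"
    # 2. If IPYTHON is 1, PYSPARK_PYTHON defaults to ipython when unset.
    if env.get("IPYTHON") == "1":
        python = env.get("PYSPARK_PYTHON") or "ipython"
    # 3. Otherwise PYSPARK_PYTHON defaults to python when unset.
    else:
        python = env.get("PYSPARK_PYTHON") or "python"
    opts = env.get("IPYTHON_OPTS", "")
    return (python + " " + opts).strip()

if __name__ == "__main__":
    # e.g. "ipython an_option" when IPYTHON=1 and IPYTHON_OPTS=an_option
    print(choose_python_command(dict(os.environ)))
```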