[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 Build started: [TESTS] `ALL` [![PR-16397](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=A54F518D-4D20-424F-95B6-3641C55CFBC1&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/A54F518D-4D20-424F-95B6-3641C55CFBC1) Diff: https://github.com/apache/spark/compare/master...spark-test:A54F518D-4D20-424F-95B6-3641C55CFBC1
[GitHub] spark pull request #16405: [SPARK-19002][BUILD] Check pep8 against merge_spa...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16405

[SPARK-19002][BUILD] Check pep8 against merge_spark_pr.py script

## What changes were proposed in this pull request?

This PR proposes to check pep8 against the `merge_spark_pr.py` script.

```
./dev/merge_spark_pr.py:100:1: E302 expected 2 blank lines, found 1
./dev/merge_spark_pr.py:285:44: E251 unexpected spaces around keyword / parameter equals
./dev/merge_spark_pr.py:285:46: E251 unexpected spaces around keyword / parameter equals
./dev/merge_spark_pr.py:286:16: E251 unexpected spaces around keyword / parameter equals
./dev/merge_spark_pr.py:286:18: E251 unexpected spaces around keyword / parameter equals
./dev/merge_spark_pr.py:286:38: E251 unexpected spaces around keyword / parameter equals
./dev/merge_spark_pr.py:286:40: E251 unexpected spaces around keyword / parameter equals
./dev/merge_spark_pr.py:303:101: E501 line too long (127 > 100 characters)
./dev/merge_spark_pr.py:305:101: E501 line too long (109 > 100 characters)
./dev/merge_spark_pr.py:307:101: E501 line too long (110 > 100 characters)
./dev/merge_spark_pr.py:313:101: E501 line too long (108 > 100 characters)
./dev/merge_spark_pr.py:317:101: E501 line too long (107 > 100 characters)
./dev/merge_spark_pr.py:319:101: E501 line too long (117 > 100 characters)
./dev/merge_spark_pr.py:353:101: E501 line too long (103 > 100 characters)
./dev/merge_spark_pr.py:419:37: E128 continuation line under-indented for visual indent
./dev/merge_spark_pr.py:448:101: E501 line too long (103 > 100 characters)
```

## How was this patch tested?
Via doctests, `python -m doctest -v merge_spark_pr.py` You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark minor-pep8 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16405.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16405 commit 8af1edb7185176ea25eac7a19c7438f30b677528 Author: hyukjinkwon Date: 2016-12-26T13:07:44Z Check pep8 against merge_spark_pr.py script
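For readers unfamiliar with the error codes above: E501 "line too long" is a plain length check against Spark's 100-character limit. A minimal sketch of what that check does (a toy illustration only, not the actual pep8 or `dev/lint-python` implementation; `check_line_length` is a hypothetical helper):

```python
# Toy sketch of an E501-style check, assuming Spark's 100-character limit.
# This is an illustration, not the real pep8 implementation.
def check_line_length(source, max_len=100):
    """Return (line_number, length) for each line longer than max_len."""
    return [(i, len(line))
            for i, line in enumerate(source.splitlines(), start=1)
            if len(line) > max_len]

script = "x = 1\n" + "y = '" + "a" * 120 + "'\n"
for line_no, length in check_line_length(script):
    # Mimic the report format: file:line:col: E501 line too long
    print("./dev/merge_spark_pr.py:%d:101: E501 line too long (%d > 100 characters)"
          % (line_no, length))
```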
[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against merge_spark_pr.p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 Hi @srowen and @holdenk, this is a small PR to run pep8 against `merge_spark_pr.py`. Could you check whether it makes sense, please?
[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against merge_spark_pr.p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 Hm, this passed on my local machine.
[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against merge_spark_pr.p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 Hi @srowen and @holden, this is a small PR to check pep8 against `./dev/merge_spark_pr.py`. Could you check whether it makes sense, please?
[GitHub] spark pull request #16405: [SPARK-19002][BUILD] Check pep8 against merge_spa...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r93886392 --- Diff: dev/lint-python --- @@ -23,6 +23,7 @@ PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsup # TODO: fix pep8 errors with the rest of the Python scripts under dev --- End diff -- Sure, makes sense. Let me try to do this for all. Thank you both.
[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against dev/*.py scripts
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 Ah, this seems to be complaining in Python 3.
[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 Here, I concatenated all the logs into a single file - https://gist.github.com/HyukjinKwon/58567451773f87322c7009007e4fdc34 Each individual log can be found in the PR description.
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 cc @srowen, could I please ask you to review this one?
[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against dev/*.py scripts
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 retest this please
[GitHub] spark issue #16413: Branch 1.3
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16413 Hi @Kevy123, it seems this pull request was opened by mistake. Could you please close it?
[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against dev/*.py scripts
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 retest this please
[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against dev/*.py scripts
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 Sure, let me double check.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16386 Regarding only the comment at https://github.com/apache/spark/pull/16386#issuecomment-269386229, I have a similar (rather combined) idea: provide the corrupt file name as another, optional option (meaning the column appears only when the user explicitly sets it, for backwards compatibility), do not add a column via `columnNameOfCorruptRecord` in `wholeFile` mode (with proper documentation), and issue a warning message if `columnNameOfCorruptRecord` is set by the user in `wholeFile` mode. This is a somewhat complicated idea that might confuse users, though; I am not sure it is the best one.
[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against all Python scrip...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 It seems some existing examples such as `random_rdd_generation.py` do not work with Python 3.3.6 either, although they compile fine, so the pep8 check passes. I fixed only the errors from pep8 here.
[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against all Python scrip...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 BTW, has anyone tried Python 3.6.0 with PySpark? Apparently I could not even run `./bin/pyspark`; it fails with an error.
[GitHub] spark pull request #16397: [SPARK-18922][TESTS] Fix more path-related test f...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16397#discussion_r94024218 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MultiDatabaseSuite.scala --- @@ -80,7 +80,7 @@ class MultiDatabaseSuite extends QueryTest with SQLTestUtils with TestHiveSingle |CREATE TABLE t1 |USING parquet |OPTIONS ( - | path '$path' + | path '${dir.toURI.toString}' --- End diff -- I see, let me correct it for the former. In the case of `path`, it is being used above. I thought this was a minimised change because this is the only problematic line, which parses the path wrongly on Windows. For example, `C:\tmp\a\b\c` becomes `C:mpabc`.
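For readers unfamiliar with the failure mode described above: a raw Windows path embedded in a string that later goes through escape processing loses its backslashes, whereas a `file:` URI uses only forward slashes and is safe. A small Python sketch as an analogy to the Scala fix (not Spark code; Python's escape rules differ slightly from the SQL parser's, but the degradation is the same kind):

```python
from pathlib import PureWindowsPath

# Backslash sequences get consumed as escapes: \t -> tab, \a -> bell,
# \b -> backspace, so the separators in the path are destroyed.
raw = "C:\tmp\a\b\c"
assert "\t" in raw             # 'tmp' lost its separator to a tab character
assert raw != r"C:\tmp\a\b\c"  # no longer the intended path

# The approach in the PR: pass a URI instead, so there are no backslashes
# left to misinterpret (forward slashes throughout).
uri = PureWindowsPath(r"C:\tmp\a\b\c").as_uri()
print(uri)  # file:///C:/tmp/a/b/c
```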
[GitHub] spark pull request #16397: [SPARK-18922][TESTS] Fix more path-related test f...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16397#discussion_r94024588 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala --- @@ -257,31 +257,37 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto """.stripMargin) // LOAD DATA INTO partitioned table must specify partition - withInputFile { path => + withInputFile { f => intercept[AnalysisException] { + val path = f.toURI.toString --- End diff -- Simply because some lines exceed the 100-character length limit in that case. I will try to clean this up, including the comment above.
[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against all Python scrip...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 Ah, thank you for approving, @srowen.
[GitHub] spark pull request #16397: [SPARK-18922][TESTS] Fix more path-related test f...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16397#discussion_r94030166 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MultiDatabaseSuite.scala --- @@ -80,7 +80,7 @@ class MultiDatabaseSuite extends QueryTest with SQLTestUtils with TestHiveSingle |CREATE TABLE t1 |USING parquet |OPTIONS ( - | path '$path' + | path '${dir.toURI.toString}' --- End diff -- Ah, sure. Let me double check.
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 Build started: [TESTS] `ALL` [![PR-16397](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=443B17ED-C621-4A3A-B45A-1F5E042189A2&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/443B17ED-C621-4A3A-B45A-1F5E042189A2) Diff: https://github.com/apache/spark/compare/master...spark-test:443B17ED-C621-4A3A-B45A-1F5E042189A2
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 Build started: [TESTS] `ALL` [![PR-16397](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=F9490ECC-9D49-44C8-8CDE-7BCA9C1FD88C&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/F9490ECC-9D49-44C8-8CDE-7BCA9C1FD88C) Diff: https://github.com/apache/spark/compare/master...spark-test:F9490ECC-9D49-44C8-8CDE-7BCA9C1FD88C
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 retest this please
[GitHub] spark pull request #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked collectio...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16429

[WIP][SPARK-19019][PYTHON] Fix hijacked collections.namedtuple to be serialized with keyword-only arguments

## What changes were proposed in this pull request?

Currently, PySpark does not work with Python 3.6.0. Running `./bin/pyspark` simply throws the error below:

```
Traceback (most recent call last):
  File ".../spark/python/pyspark/shell.py", line 30, in <module>
    import pyspark
  File ".../spark/python/pyspark/__init__.py", line 46, in <module>
    from pyspark.context import SparkContext
  File ".../spark/python/pyspark/context.py", line 36, in <module>
    from pyspark.java_gateway import launch_gateway
  File ".../spark/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 18, in <module>
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", line 62, in <module>
    import pkgutil
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", line 22, in <module>
    ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
  File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
```

The root cause seems to be that the arguments of `namedtuple` became completely keyword-only in Python 3.6.0 (see https://bugs.python.org/issue25628). We currently copy this function via `types.FunctionType`, which does not set the default values of keyword-only arguments (meaning `namedtuple.__kwdefaults__`), and this seems to leave the function with internally missing values (non-bound arguments).
This PR proposes to work around this by manually setting it via `kwargs`, as `types.FunctionType` seems not to support setting it.

## How was this patch tested?

Manually tested with Python 3.6.0.

```
./bin/pyspark
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-19019 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16429.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16429 commit fb049790b5f96070ebd1006630e24bf20c20319a Author: hyukjinkwon Date: 2016-12-29T02:42:28Z Fix namedtuple so it can be serialized with keyword-only arguments too
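The mechanism described above can be reproduced in a few lines. This is a minimal sketch of the root cause and workaround under plain CPython, not PySpark's actual `serializers.py` code; `f` and `g` are hypothetical stand-ins for `namedtuple` and its copy:

```python
import types

# A function whose arguments are keyword-only with defaults, like
# collections.namedtuple in Python 3.6.
def f(*, verbose=False, rename=False):
    return (verbose, rename)

# Copying it via types.FunctionType drops __kwdefaults__ ...
g = types.FunctionType(f.__code__, f.__globals__, f.__name__,
                       f.__defaults__, f.__closure__)
try:
    g()
except TypeError as e:
    print(e)  # missing required keyword-only arguments

# ... so the keyword-only defaults have to be carried over manually,
# which is the essence of the workaround.
g.__kwdefaults__ = f.__kwdefaults__
print(g())  # (False, False)
```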
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked collections.name...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16429 cc @davies and @JoshRosen. I know both of you are insightful in this area. I am not too sure whether this is a correct fix, as it seems not even fixed in some other third-party Python libraries. Do you mind if I ask you to take a look?
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 I just checked that each one is fine in a concatenated log file - https://gist.github.com/HyukjinKwon/8851815ede9dcae80632a5378b74d1ae
[GitHub] spark pull request #16433: [SPARK-19022][TESTS] Fix tests dependent on OS du...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16433

[SPARK-19022][TESTS] Fix tests dependent on OS due to different newline characters

## What changes were proposed in this pull request?

There are two tests failing on Windows due to the different newlines.

```
- StreamingQueryProgress - prettyJson *** FAILED *** (0 milliseconds)
  "{
    "id" : "39788670-6722-48b7-a248-df6ba08722ac",
    "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390",
    "name" : "myName",
    ...
  }" did not equal "{
    "id" : "39788670-6722-48b7-a248-df6ba08722ac",
    "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390",
    "name" : "myName",
    ...
  }"
  ...
```

```
- StreamingQueryStatus - prettyJson *** FAILED *** (0 milliseconds)
  "{
    "message" : "active",
    "isDataAvailable" : true,
    "isTriggerActive" : false
  }" did not equal "{
    "message" : "active",
    "isDataAvailable" : true,
    "isTriggerActive" : false
  }"
  ...
```

The reason is that `pretty` in `org.json4s` writes OS-dependent newlines, while the strings defined in the tests use `\n`. This ends up with test failures. This PR proposes to compare these regardless of newline differences.

## How was this patch tested?

Manually tested via AppVeyor.
**Before** https://ci.appveyor.com/project/spark-test/spark/build/417-newlines-fix-before **After** https://ci.appveyor.com/project/spark-test/spark/build/418-newlines-fix You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark tests-StreamingQueryStatusAndProgressSuite Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16433.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16433 commit 15f821cadd39027cfd8860309e32d6b06be92833 Author: hyukjinkwon Date: 2016-12-29T05:27:05Z Fix newline comparison issues
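The comparison strategy can be sketched in a few lines. This is a minimal Python analogue of the Scala test fix, assuming a hypothetical `normalize_newlines` helper (not Spark's actual code):

```python
import re

def normalize_newlines(s):
    # Map \r\n, \r and \n all to \n before comparing.
    return re.sub(r"\r\n|\r|\n", "\n", s)

expected = '{\n  "message" : "active",\n  "isDataAvailable" : true\n}'
windows_actual = expected.replace("\n", "\r\n")  # OS-dependent output

assert expected != windows_actual                 # naive comparison fails
assert normalize_newlines(expected) == normalize_newlines(windows_actual)
print("newline-agnostic comparison passes")
```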
[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16433 Build started: [TESTS] `org.apache.spark.sql.streaming.StreamingQueryStatusAndProgressSuite` [![PR-16433](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=AE40452F-D970-407C-92EB-C8079EC86A06&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/AE40452F-D970-407C-92EB-C8079EC86A06) Diff: https://github.com/apache/spark/compare/master...spark-test:AE40452F-D970-407C-92EB-C8079EC86A06
[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16428 Do you mind if I ask whether it writes the line separator correctly in the encoding specified in the option?
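To illustrate why this question matters (a plain Python illustration, not Spark's CSV writer): the byte sequence for a line separator depends on the charset, so a writer that hardcodes the single byte `\n` would produce broken output for multi-byte encodings such as UTF-16.

```python
# The newline's byte representation varies with the encoding.
assert "\n".encode("utf-8") == b"\n"
assert "\n".encode("utf-16-le") == b"\n\x00"

# In a UTF-16-LE stream, every character, including the line separator,
# takes two bytes; inserting a bare b"\n" would corrupt the stream.
assert "A\nB".encode("utf-16-le") == b"A\x00\n\x00B\x00"
print("line separator must be encoded with the charset")
```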
[GitHub] spark pull request #16433: [SPARK-19022][TESTS] Fix tests dependent on OS du...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16433#discussion_r94200602 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala --- @@ -30,10 +30,16 @@ import org.apache.spark.sql.streaming.StreamingQueryStatusAndProgressSuite._ class StreamingQueryStatusAndProgressSuite extends StreamTest { + implicit class EqualsIgnoreCRLF(source: String) { +def equalsIgnoreCRLF(target: String): Boolean = { + source.stripMargin.replaceAll("\r\n|\r|\n", System.lineSeparator) === --- End diff -- Oh, sure.
[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16433 In most cases, it seems they explicitly write `\n` (e.g. when writing CSV and JSON). _Apparently_, these seem to be the only tests failing due to this problem.
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 @srowen, thank you Sean. I think it is okay for now. To be honest, I found some more instances of the same problem, but I haven't fixed, tested and verified them yet. Maybe I need one more pass to deal with them all cleanly. I hope it is okay to go ahead and merge this as is.
[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 I just manually ran `./dev/create-release/translate-contributors.py` which had a conflict for sure.
[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16433 Build started: [TESTS] `org.apache.spark.sql.streaming.StreamingQueryStatusAndProgressSuite` [![PR-16433](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=D1A3B54F-82B5-481D-ADE8-7CC273C97303&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/D1A3B54F-82B5-481D-ADE8-7CC273C97303) Diff: https://github.com/apache/spark/compare/master...spark-test:D1A3B54F-82B5-481D-ADE8-7CC273C97303
[GitHub] spark pull request #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.n...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16429#discussion_r94201674 --- Diff: python/pyspark/serializers.py --- @@ -382,18 +382,30 @@ def _hijack_namedtuple(): return global _old_namedtuple # or it will put in closure +global _old_namedtuple_kwdefaults # or it will put in closure too def _copy_func(f): return types.FunctionType(f.__code__, f.__globals__, f.__name__, f.__defaults__, f.__closure__) +def _kwdefaults(f): +kargs = getattr(f, "__kwdefaults__", None) --- End diff -- `__kwdefaults__` can be `None` or missing entirely.
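An added illustration of the point above (not from the thread): `getattr` with a default handles both the missing-attribute case and the explicit `None` case in one step. A small sketch, with `_kwdefaults` mirroring the helper in the diff:

```python
def _kwdefaults(f):
    # getattr with a default covers both cases the comment mentions: the
    # attribute may be absent entirely, or present but set to None.
    kargs = getattr(f, "__kwdefaults__", None)
    return {} if kargs is None else kargs

def g(a, *, b=1):
    return a + b

assert _kwdefaults(g) == {"b": 1}
assert _kwdefaults(len) == {}  # builtins lack __kwdefaults__
```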
[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 retest this please
[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16405 retest this please
[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16428 BTW, the reason I asked that in https://github.com/apache/spark/pull/16428#issuecomment-269635303 is that I remember checking the reading/writing paths related to encodings before, and the encoding has to be set on the line record reader. I just now double-checked that newlines used to be `\n` for each batch due to [`TextOutputFormat`'s record writer](https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java#L48-L49), but it seems this was changed in [a recent commit](https://github.com/apache/spark/pull/16089/files#diff-6a14f6bb643b1474139027d72a17f41aL203). So now it seems the newlines depend on the univocity library. We should definitely add some tests for this in `CSVSuite` to verify this behaviour and prevent regressions. As a small side note, we don't currently support non-ASCII-compatible encodings in the reading path, if I haven't missed some changes in this path.
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 It was a problem because I could not proceed further: the error messages were flooding and somehow the logs were truncated in AppVeyor (e.g. https://ci.appveyor.com/project/spark-test/spark/build/376-hive-failed-tests). I had to run tests separately, but I figured out how to run (almost) all tests via AppVeyor, e.g., [![PR-16397](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=F9490ECC-9D49-44C8-8CDE-7BCA9C1FD88C&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/F9490ECC-9D49-44C8-8CDE-7BCA9C1FD88C). But now this takes a lot of time (7h 12m in this case). I believe it of course makes it easier to spot the errors because - it gets rid of a lot of the flooding errors, so I can easily spot the failures - and therefore I can run some concatenated tests. I am willing to try to add them all here if these reasons are not persuasive enough.
[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16433 Yes, I hesitated to submit this PR for a while due to similar concerns. > is it because this is the only test for prettyJson? I believe so. Let me double check again. > I also make sure that, say, we do want the output of prettyJson to vary by platform. Hm, I guess that's reasonable here, as it's meant for display on a terminal. ^ @zsxwing, could you confirm this please, if possible?
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 Thank you @srowen !!
[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16428 Ah, I meant to add a test there in this PR.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94238157 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -71,7 +71,9 @@ private[csv] class CSVOptions(@transient private val parameters: CaseInsensitive val delimiter = CSVTypeCast.toChar( parameters.getOrElse("sep", parameters.getOrElse("delimiter", ","))) private val parseMode = parameters.getOrElse("mode", "PERMISSIVE") - val charset = parameters.getOrElse("encoding", +parameters.getOrElse("charset", StandardCharsets.UTF_8.name())) + val readCharSet = parameters.getOrElse("encoding", + val writeCharSet = parameters.getOrElse("writeEncoding", --- End diff -- I think we do not necessarily need to introduce an additional option. We could just use the `charset` variable, because other options such as `nullValue` already apply to both reading and writing.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94239452 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { * indicates a timestamp format. Custom date formats follow the formats at * `java.text.SimpleDateFormat`. This applies to timestamp type. * + * `writeEncoding`(default `utf-8`) save dataFrame 2 csv by giving encoding --- End diff -- We should also add the same documentation in `readwriter.py`.
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94263510 --- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py --- @@ -44,7 +44,7 @@ # Evaluate model on test instances and compute test error predictions = model.predict(testData.map(lambda x: x.features)) labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions) -testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\ +testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\ --- End diff -- That seems to cause errors in Python 3 when a tuple is unpacked in a lambda. It seems http://www.python.org/dev/peps/pep-3113 is the related issue.
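To illustrate the PEP 3113 point above (an added sketch, not from the thread): the Python 2 form `lambda (v, p): ...` is a `SyntaxError` in Python 3, so the pair has to be indexed (or unpacked inside the body) instead:

```python
pairs = [(3.0, 2.5), (1.0, 1.5)]  # hypothetical (label, prediction) pairs

# Python 2 accepted `lambda (v, p): (v - p) * (v - p)`; under PEP 3113
# that is a SyntaxError in Python 3, so the element is indexed instead.
mse = sum((lp[0] - lp[1]) ** 2 for lp in pairs) / len(pairs)
assert mse == 0.25
```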
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94263914 --- Diff: dev/lint-python --- @@ -19,10 +19,8 @@ SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )" SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")" -PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport" -# TODO: fix pep8 errors with the rest of the Python scripts under dev -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py" -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py" +# Exclude auto-geneated configuration file. +PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )" --- End diff -- Yea, I think this is a valid point. Let me first check the length and the length limitation for sure.
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94273247 --- Diff: dev/lint-python --- @@ -19,10 +19,8 @@ SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )" SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")" -PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport" -# TODO: fix pep8 errors with the rest of the Python scripts under dev -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py" -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py" +# Exclude auto-geneated configuration file. +PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )" --- End diff -- It seems it is usually 32K on Cygwin by default. The actual length without any prefix is about 11K for now. Let me try to turn these into relative paths as a safe choice; then it should be safe in general.
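As an added note on the limit being discussed (not from the thread): on POSIX systems the argument-length cap can be queried directly; the 32K figure mentioned for Cygwin is a separate platform default. A minimal sketch:

```python
import os

# SC_ARG_MAX is the system's limit on the combined length of argv plus the
# environment for a new process; POSIX-only, hence the guard.
if hasattr(os, "sysconf") and "SC_ARG_MAX" in os.sysconf_names:
    arg_max = os.sysconf("SC_ARG_MAX")
    assert arg_max > 0
    print(arg_max)
```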
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94273331 --- Diff: dev/lint-python --- @@ -19,10 +19,8 @@ SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )" SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")" -PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport" -# TODO: fix pep8 errors with the rest of the Python scripts under dev -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py" -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py" +# Exclude auto-geneated configuration file. +PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" -not -path "*python/docs/conf.py" )" --- End diff -- I tested this as below for sure, ```bash ./lint-python ./dev/lint-python ./spark/dev/lint-python ``` So, now it uses relative paths, which are currently up to 11K, as below: ``` ./dev/create-release/generate-contributors.py ./dev/create-release/releaseutils.py ./dev/create-release/translate-contributors.py ./dev/github_jira_sync.py ./dev/merge_spark_pr.py ./dev/pep8-1.7.0.py ./dev/pip-sanity-check.py ./dev/run-tests-jenkins.py ./dev/run-tests.py ./dev/sparktestsupport/__init__.py ./dev/sparktestsupport/modules.py ./dev/sparktestsupport/shellutils.py ./dev/sparktestsupport/toposort.py ./examples/src/main/python/als.py ./examples/src/main/python/avro_inputformat.py ./examples/src/main/python/kmeans.py ./examples/src/main/python/logistic_regression.py ./examples/src/main/python/ml/aft_survival_regression.py ./examples/src/main/python/ml/als_example.py ./examples/src/main/python/ml/binarizer_example.py ./examples/src/main/python/ml/bisecting_k_means_example.py ./examples/src/main/python/ml/bucketizer_example.py ./examples/src/main/python/ml/chisq_selector_example.py ./examples/src/main/python/ml/count_vectorizer_example.py ./examples/src/main/python/ml/cross_validator.py ./examples/src/main/python/ml/dataframe_example.py
./examples/src/main/python/ml/dct_example.py ./examples/src/main/python/ml/decision_tree_classification_example.py ./examples/src/main/python/ml/decision_tree_regression_example.py ./examples/src/main/python/ml/elementwise_product_example.py ./examples/src/main/python/ml/estimator_transformer_param_example.py ./examples/src/main/python/ml/gaussian_mixture_example.py ./examples/src/main/python/ml/generalized_linear_regression_example.py ./examples/src/main/python/ml/gradient_boosted_tree_classifier_example.py ./examples/src/main/python/ml/gradient_boosted_tree_regressor_example.py ./examples/src/main/python/ml/index_to_string_example.py ./examples/src/main/python/ml/isotonic_regression_example.py ./examples/src/main/python/ml/kmeans_example.py ./examples/src/main/python/ml/lda_example.py ./examples/src/main/python/ml/linear_regression_with_elastic_net.py ./examples/src/main/python/ml/logistic_regression_summary_example.py ./examples/src/main/python/ml/logistic_regression_with_elastic_net.py ./examples/src/main/python/ml/max_abs_scaler_example.py ./examples/src/main/python/ml/min_max_scaler_example.py ./examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py ./examples/src/main/python/ml/multilayer_perceptron_classification.py ./examples/src/main/python/ml/n_gram_example.py ./examples/src/main/python/ml/naive_bayes_example.py ./examples/src/main/python/ml/normalizer_example.py ./examples/src/main/python/ml/one_vs_rest_example.py ./examples/src/main/python/ml/onehot_encoder_example.py ./examples/src/main/python/ml/pca_example.py ./examples/src/main/python/ml/pipeline_example.py ./examples/src/main/python/ml/polynomial_expansion_example.py ./examples/src/main/python/ml/quantile_discretizer_example.py ./examples/src/main/python/ml/random_forest_classifier_example.py ./examples/src/main/python/ml/random_forest_regressor_example.py ./examples/src/main/python/ml/rformula_example.py ./examples/src/main/python/ml/sql_transformer.py
./examples/src/main/python/ml/standard_scaler_example.py ./examples/src/main/python/ml/stopwords_remover_example.py ./examples/src/main/python/ml/string_indexer_example.py ./examples/src/main/python/ml/tf_idf_example.py ./examples/src/main/python/ml/tokenizer_example.py ./examples/src/main/python/ml/train_validation_split.py ./examples/src/main/python/ml/vector_assembler_example.py ./examples/src/main/python/ml/vector_indexer_example.py ./examples/src/main/python/ml/vector_slicer_example.py ./examples/src/main/python/ml/word2vec_example.py ./examples/src/main/python/mllib/binary_classification_metrics_example.py ./examples/src/main/python/mllib/bisecting_k_means_example.py ./ex
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94273423 --- Diff: python/pyspark/sql/readwriter.py --- @@ -659,7 +659,7 @@ def text(self, path, compression=None): self._jwrite.text(path) @since(2.0) -def csv(self, path, mode=None, compression=None, sep=None, quote=None, escape=None, +def csv(self, path, mode=None, compression=None, sep=None, encoding=None, quote=None, escape=None, --- End diff -- We need to place this new option at the end. Otherwise, it will break existing code that uses these options as positional arguments (rather than keyword arguments).
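A small added illustration of why new parameters must be appended rather than inserted (not from the thread; `csv_old`, `csv_new`, and `csv_bad` are hypothetical signatures, not the real pyspark API):

```python
def csv_old(path, mode=None, compression=None, sep=None):
    return (path, mode, compression, sep)

# Appending `encoding` keeps every existing positional call intact.
def csv_new(path, mode=None, compression=None, sep=None, encoding=None):
    return (path, mode, compression, sep)

# A caller using positional arguments still binds the same values ...
assert csv_old("p", "overwrite", "gzip", ";") == csv_new("p", "overwrite", "gzip", ";")

# ... whereas inserting `encoding` before `sep` silently shifts ";" into
# the wrong slot, so `sep` ends up None.
def csv_bad(path, mode=None, compression=None, encoding=None, sep=None):
    return (path, mode, compression, sep)

assert csv_bad("p", "overwrite", "gzip", ";") == ("p", "overwrite", "gzip", None)
```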
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94273531 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -33,6 +33,7 @@ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.test.{SharedSQLContext, SQLTestUtils} import org.apache.spark.sql.types._ +//noinspection ScalaStyle --- End diff -- We can disable scalastyle for just the affected lines with a block as below, if you need this for non-ASCII characters: ```scala // scalastyle:off ... // scalastyle:on ```
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94273548 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { * indicates a timestamp format. Custom date formats follow the formats at * `java.text.SimpleDateFormat`. This applies to timestamp type. * + * `encoding`(default `utf-8`) save dataFrame 2 csv by giving encoding --- End diff -- Could we just resemble the documentation in `DataFrameReader` just for consistency? ``` `encoding` (default `UTF-8`): decodes the CSV files by the given encoding * type. ```
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94273678 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -905,4 +906,21 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils { checkAnswer(df, Row(1, null)) } } + + test("save data with gb18030") { +withTempPath{ path => --- End diff -- nit: it should be `withTempPath { path =>`.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94273677 --- Diff: python/pyspark/sql/readwriter.py --- @@ -677,6 +677,8 @@ def csv(self, path, mode=None, compression=None, sep=None, quote=None, escape=No snappy and deflate). :param sep: sets the single character as a separator for each field and value. If None is set, it uses the default value, ``,``. +:param encoding: sets writer CSV files by the given encoding type. If None is set, + it uses the default value, ``UTF-8``. --- End diff -- Here too, let's resemble the one in `DataFrameReader` above in this file.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94273685 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -905,4 +906,21 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils { checkAnswer(df, Row(1, null)) } } + + test("save data with gb18030") { +withTempPath{ path => + Seq(("1", "中文")) +.toDF("num", "lanaguage") +.write +.option("encoding", "GB18030") +.option("header", "true") +.csv(path.getAbsolutePath) + val df = spark.read +.option("header", "true") +.option("encoding", "GB18030") +.csv(path.getAbsolutePath) + + checkAnswer(df, Row("1", "中文")) --- End diff -- Could we write this something like as below: ```scala // scalastyle:off val df = Seq(("1", "中文")).toDF("num", "lanaguage") // scalastyle:on df.write .option("header", "true") .option("encoding", "GB18030") .csv(path.getAbsolutePath) val readBack = spark.read .option("header", "true") .option("encoding", "GB18030") .csv(path.getAbsolutePath) checkAnswer(df, readBack) ```
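The write-then-read-back pattern suggested above can also be sketched outside Spark (an added illustration, not from the thread), using Python's `csv` module, whose standard library ships the same GB18030 codec:

```python
import csv
import os
import tempfile

rows = [["num", "language"], ["1", "中文"]]
path = os.path.join(tempfile.mkdtemp(), "out.csv")

# Write with an explicit encoding, then read back with the same one and
# check the round trip, mirroring the suggested test shape.
with open(path, "w", encoding="GB18030", newline="") as f:
    csv.writer(f).writerows(rows)
with open(path, "r", encoding="GB18030", newline="") as f:
    read_back = list(csv.reader(f))

assert read_back == rows
```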
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94273737 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { * indicates a timestamp format. Custom date formats follow the formats at * `java.text.SimpleDateFormat`. This applies to timestamp type. * + * `encoding`(default `utf-8`) save dataFrame 2 csv by giving encoding --- End diff -- looks good.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94273866 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { * indicates a timestamp format. Custom date formats follow the formats at * `java.text.SimpleDateFormat`. This applies to timestamp type. * + * `encoding`(default `utf-8`) save dataFrame 2 csv by giving encoding --- End diff -- Oh, also, it seems the newly added option here should be put in .. ``` ... ``` so that this can be rendered fine in Java API documentation.
[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16433 I just double checked. It seems `org.json4s.pretty` is used in several places, but those look to be for debugging, printing, and building request bodies (e.g., `StandaloneRestSubmitSuite`). So, for `org.json4s.pretty`, it seems these are the only tests failing due to this problem. As for the OS-dependent newline tests, I checked the rest of them and skimmed the failed tests again as best I could; it seems these are the only tests failing due to this problem.
[GitHub] spark issue #15848: [SPARK-9487] Use the same num. worker threads in Java/Sc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15848 @skanjila I think we can close this if there are no updates for now.
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 @srowen, otherwise, I could open a `[WIP]` or `[DO-NOT-MERGE]` PR and then repeatedly push and test commits fixing these, rather than verifying them only via the local branches in my @spark-test account (which I am currently doing), because my AppVeyor scripts can easily run tests against a PR. If you are worried about merging multiple PRs that fix the same issues, do you mind if I open a long-lived `[WIP]` or `[DO-NOT-MERGE]` PR to find all the failing tests related to this issue?
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 Yes, I think I am almost there and am fixing these, although there are slightly more than I expected because of some errors I didn't think were caused by this issue, such as

```
org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'csv_table' not found in database 'default';
```

and aborted tests which I missed while just grepping. But I think these can still be done in one go. Let me verify them as usual and then open a short-term WIP PR like this one. I asked only because I suddenly realised this might be the better approach.
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 BTW, thanks again for your quick response.
[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][TESTS] Fix all iden...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16451 [WIP][SPARK-18922][SQL][CORE][TESTS] Fix all identified tests failed due to path and resources problems on Windows

## What changes were proposed in this pull request?

WIP - just opened this first to run some more tests together with Jenkins and AppVeyor.

## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark all-path-resource-fixes

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16451.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16451

commit e58f0bdd170421d484c384d8d8feb3f18eae310c
Author: hyukjinkwon
Date: 2017-01-02T04:43:20Z

    Fix more path and resources problems
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][TESTS] Fix all identified ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 Build started: [TESTS] `ALL` [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=044D6A78-26AA-4A2C-A4A1-B39DF60C811C&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/044D6A78-26AA-4A2C-A4A1-B39DF60C811C)
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][TESTS] Fix all identified ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 retest this please
[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16429 gentle ping..
[GitHub] spark pull request #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16320#discussion_r94358447

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
```scala
@@ -85,7 +85,9 @@ private[csv] object CSVInferSchema {
     case NullType => tryParseInteger(field, options)
     case IntegerType => tryParseInteger(field, options)
     case LongType => tryParseLong(field, options)
-    case _: DecimalType => tryParseDecimal(field, options)
+    case _: DecimalType =>
+      // DecimalTypes have different precisions and scales, so we try to find the common type.
+      findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(NullType)
```
--- End diff --

Yes, otherwise it might end up with an incorrect datatype. For example,

```scala
val path = "/tmp/test1"
Seq(s"${Long.MaxValue}1", "2015-12-01 00:00:00", "1").toDF().coalesce(1).write.text(path)
spark.read.option("inferSchema", true).csv(path).printSchema()
```

```
root
 |-- _c0: integer (nullable = true)
```
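The merging behaviour under discussion can be sketched outside Spark with a toy inference lattice. This is a hypothetical simplification, not Spark's actual `CSVInferSchema` code: by always keeping the widest type seen so far, a column mixing an overflowing long, a timestamp-like string, and a small integer widens to string instead of collapsing back to integer.

```python
# Toy sketch of CSV type inference with tightest-common-type merging.
# The type lattice below is an assumed simplification, not Spark's real one.
ORDER = {"NullType": 0, "IntegerType": 1, "LongType": 2, "DecimalType": 3, "StringType": 4}

def infer_field(value):
    # Classify a single CSV field by the narrowest type that holds it.
    try:
        n = int(value)
    except ValueError:
        return "StringType"
    if -2**31 <= n < 2**31:
        return "IntegerType"
    if -2**63 <= n < 2**63:
        return "LongType"
    return "DecimalType"

def tightest_common_type(a, b):
    # Keep the wider of the two types so earlier rows are never forgotten.
    return a if ORDER[a] >= ORDER[b] else b

def infer_column(values):
    inferred = "NullType"
    for v in values:
        inferred = tightest_common_type(inferred, infer_field(v))
    return inferred

# The example from the comment: Long.MaxValue with an extra digit,
# a timestamp-like string, and a plain integer.
print(infer_column(["92233720368547758071", "2015-12-01 00:00:00", "1"]))  # StringType
```

Without the merge step, the last row ("1") would reset the inference to integer, which mirrors the wrong schema shown above.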
[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16429 Thanks for your interest @azmras. I just checked it as below:

```python
sc.parallelize(range(100), 8)
```

```
Traceback (most recent call last):
  File ".../spark/python/pyspark/cloudpickle.py", line 107, in dump
    return Pickler.dump(self, obj)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj)  # Call unbound method with explicit self
  File ".../spark/python/pyspark/cloudpickle.py", line 214, in save_function
    self.save_function_tuple(obj)
  File ".../spark/python/pyspark/cloudpickle.py", line 244, in save_function_tuple
    code, f_globals, defaults, closure, dct, base_globals = self.extract_func_data(func)
  File ".../spark/python/pyspark/cloudpickle.py", line 306, in extract_func_data
    func_global_refs = self.extract_code_globals(code)
  File ".../spark/python/pyspark/cloudpickle.py", line 288, in extract_code_globals
    out_names.add(names[oparg])
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/rdd.py", line 198, in __repr__
    return self._jrdd.toString()
  File ".../spark/python/pyspark/rdd.py", line 2438, in _jrdd
    self._jrdd_deserializer, profiler)
  File ".../spark/python/pyspark/rdd.py", line 2371, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File ".../spark/python/pyspark/rdd.py", line 2357, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File ".../spark/python/pyspark/serializers.py", line 452, in dumps
    return cloudpickle.dumps(obj, 2)
  File ".../spark/python/pyspark/cloudpickle.py", line 667, in dumps
    cp.dump(obj)
  File ".../spark/python/pyspark/cloudpickle.py", line 115, in dump
    if "'i' format requires" in e.message:
AttributeError: 'IndexError' object has no attribute 'message'
```

It looks like another issue with Python 3.6.0. This is only related to the hijacked `collections.namedtuple`. We should port https://github.com/cloudpipe/cloudpickle/commit/4945361c2db92095f934b92a6c00316243caf3cc.
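The `IndexError` above comes from cloudpickle's manual bytecode walk (`out_names.add(names[oparg])`), which broke when Python 3.6 switched to the fixed-width "wordcode" format. As a hedged sketch of the idea behind the ported fix (not cloudpickle's actual code), iterating with `dis.get_instructions` makes the global-name extraction bytecode-format-agnostic:

```python
import dis

def extract_code_globals(code):
    # Collect global names referenced by a code object. dis.get_instructions
    # understands the interpreter's current bytecode format, so this keeps
    # working on Python 3.6+ where manually decoding opargs can index past
    # the end of co_names.
    names = set()
    for instruction in dis.get_instructions(code):
        if instruction.opname in ("LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"):
            names.add(instruction.argval)
    return names

def sample(xs):
    return len(xs) + max(xs)

print(sorted(extract_code_globals(sample.__code__)))  # ['len', 'max']
```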
[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16429 Hi @joshrosen and @davies, do you think that should be ported in this PR? I am worried about making this PR harder to review by porting it here.
[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16429 Hi @azmras, now it should work fine for your case as well.
[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16429 @azmras Could you maybe double check? It works okay locally, as below:

```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0-SNAPSHOT
      /_/

Using Python version 3.6.0 (default, Dec 24 2016 00:01:50)
SparkSession available as 'spark'.
>>> sc.parallelize(range(100), 8).take(5)
[0, 1, 2, 3, 4]
>>> sc.parallelize(range(1000), 20).take(5)
[0, 1, 2, 3, 4]
>>>
```
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][TESTS] Fix all identified ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 Build started: [TESTS] `org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite` [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=A2836427-A94C-4BE0-9D24-537B09362C69&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/A2836427-A94C-4BE0-9D24-537B09362C69)
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 Build started: [TESTS] `org.apache.spark.streaming.kafka.DirectKafkaStreamSuite` [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=1C2B248D-2455-4ADB-AC8A-1CEB93E4EC5F&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/1C2B248D-2455-4ADB-AC8A-1CEB93E4EC5F)
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 retest this please
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 Build started: [TESTS] `org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite` [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=887C39EC-849A-40E5-BAE7-771BDF5BC98A&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/887C39EC-849A-40E5-BAE7-771BDF5BC98A)
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 retest this please
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 Build started: [TESTS] `org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite` [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=E8488472-738C-4ADF-A924-8F858728D120&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/E8488472-738C-4ADF-A924-8F858728D120)
[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16429 @azmras Thank you for confirming this.
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451

Build started: [TESTS] `org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite` [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=A7615F8B-58B0-4D9B-A914-32E7BF7DCB65&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/A7615F8B-58B0-4D9B-A914-32E7BF7DCB65)

Build started: [TESTS] `org.apache.spark.sql.hive.execution.SQLQuerySuite` [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=3789CF31-AF57-492C-9FF7-5235D5C8C124&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/3789CF31-AF57-492C-9FF7-5235D5C8C124)

Build started: [TESTS] `org.apache.spark.sql.hive.MetastoreDataSourcesSuite` [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=451A5CFC-6AB3-498B-86A0-43DED5C0F13A&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/451A5CFC-6AB3-498B-86A0-43DED5C0F13A)

Build started: [TESTS] `org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite` [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=D1BE653C-EDE2-4E4E-8781-85EE95CA078B&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/D1BE653C-EDE2-4E4E-8781-85EE95CA078B)
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 Now there are roughly 30 tests failing on Windows, which I could identify via the AppVeyor runs [here](https://gist.github.com/HyukjinKwon/88a0b37cd027934bc14f3aa9f812be31) and which I am currently working on. Their causes do not look like resource or path related problems.
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 retest this please
[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 Build started: [TESTS] `org.apache.spark.sql.hive.PartitionedTablePerfStatsSuite` [![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=0C0F228B-9B67-49AC-9C35-4385944721D0&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/0C0F228B-9B67-49AC-9C35-4385944721D0)
[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94562021

--- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
```scala
@@ -482,7 +482,7 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
       s"hdfs:/jar1,file:/jar2,file:$cwd/jar3,file:$cwd/jar4#jar5,file:$cwd/path%20to/jar6")
     if (Utils.isWindows) {
       assertResolves("""hdfs:/jar1,file:/jar2,jar3,C:\pi.py#py.pi,C:\path to\jar4""",
-        s"hdfs:/jar1,file:/jar2,file:$cwd/jar3,file:/C:/pi.py#py.pi,file:/C:/path%20to/jar4")
+        s"hdfs:/jar1,file:/jar2,file:$cwd/jar3,file:/C:/pi.py%23py.pi,file:/C:/path%20to/jar4")
```
--- End diff --

This test was already failing on Windows.
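The `%23` in the corrected expectation is the percent-encoding of `#`: in a `file:` URI a raw `#` starts the fragment, so a `#` that is part of the filename must be escaped. A small sketch of this with the standard library (not Spark's `Utils.resolveURI`):

```python
from urllib.parse import quote, urlparse

# A Windows-style path whose filename legitimately contains '#'.
path = r"C:/pi.py#py.pi"

# Percent-encode everything except '/' and ':' so '#' becomes %23
# and survives as part of the path in a file: URI.
uri = "file:/" + quote(path, safe="/:")
print(uri)  # file:/C:/pi.py%23py.pi

# Left unescaped, the '#' would be parsed as a fragment separator instead.
print(urlparse("file:/C:/pi.py#py.pi").fragment)  # py.pi
```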
[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94561930

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
```scala
@@ -1485,17 +1485,18 @@ private[spark] object Utils extends Logging {
   /** Return uncompressed file length of a compressed file. */
   private def getCompressedFileLength(file: File): Long = {
     try {
-      // Uncompress .gz file to determine file size.
-      var fileSize = 0L
-      val gzInputStream = new GZIPInputStream(new FileInputStream(file))
-      val bufSize = 1024
-      val buf = new Array[Byte](bufSize)
-      var numBytes = ByteStreams.read(gzInputStream, buf, 0, bufSize)
-      while (numBytes > 0) {
-        fileSize += numBytes
-        numBytes = ByteStreams.read(gzInputStream, buf, 0, bufSize)
+      tryWithResource(new GZIPInputStream(new FileInputStream(file))) { gzInputStream =>
```
--- End diff --

This simply changes from

```scala
val gzInputStream = new GZIPInputStream(new FileInputStream(file))
...
```

to

```scala
tryWithResource(new GZIPInputStream(new FileInputStream(file))) { gzInputStream =>
  ...
}
```
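The `tryWithResource` change above matters on Windows because an unclosed stream keeps the file locked. The same count-decompressed-bytes idea can be sketched with Python's context-manager equivalent (a hypothetical helper, not Spark code); the `with` block closes the stream even if a read fails part-way:

```python
import gzip
import os
import tempfile

def uncompressed_length(path, buf_size=1024):
    # Stream the gzip file and count the decompressed bytes; the `with`
    # block guarantees the stream is closed, like Scala's tryWithResource.
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(buf_size)
            if not chunk:
                break
            total += len(chunk)
    return total

# Quick demonstration against a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(path, "wb") as f:
    f.write(b"hello" * 1000)
print(uncompressed_length(path))  # 5000
```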
[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94564561

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceiverSupervisorImpl.scala ---
```scala
@@ -175,6 +175,12 @@ private[streaming] class ReceiverSupervisorImpl(
   }

   override protected def onStop(message: String, error: Option[Throwable]) {
+    receivedBlockHandler match {
+      case handler: WriteAheadLogBasedBlockHandler =>
+        // Write ahead log should be closed.
+        handler.stop()
```
--- End diff --

It seems closing the write-ahead log was missed. This causes the test failure in `org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite`.
[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94562979

--- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala ---
```scala
@@ -372,7 +367,7 @@ class DirectKafkaStreamSuite
       sendData(i)
     }

-    eventually(timeout(10 seconds), interval(50 milliseconds)) {
+    eventually(timeout(20 seconds), interval(50 milliseconds)) {
```
--- End diff --

This test seems too flaky on Windows (at least on AppVeyor). With the longer timeout it passes in most cases now.
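ScalaTest's `eventually(timeout, interval)` used in the diff above can be sketched in Python (a hypothetical stand-in, not the ScalaTest implementation): keep re-running the assertion at the given interval until it passes or the timeout expires, which is why raising the timeout from 10 to 20 seconds only trades latency on slow machines for fewer spurious failures.

```python
import time

def eventually(assertion, timeout=20.0, interval=0.05):
    # Re-run `assertion` until it stops raising AssertionError, or
    # re-raise the last failure once the deadline has passed.
    deadline = time.monotonic() + timeout
    while True:
        try:
            return assertion()
        except AssertionError:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)

calls = {"n": 0}

def flaky_check():
    # Simulates a condition that only becomes true after a delay.
    calls["n"] += 1
    assert calls["n"] >= 3
    return calls["n"]

print(eventually(flaky_check, timeout=2.0, interval=0.01))  # 3
```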
[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94562260

--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala ---
```scala
@@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging {
     if (server != null) {
       server.shutdown()
+      server.awaitShutdown()
       server = null
     }

-    brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) }
+    // On Windows, `logDirs` is left open even after Kafka server above is completely shut-downed
+    // in some cases. It leads to test failures on Windows if these are not ignored.
+    brokerConf.logDirs.map(new File(_))
+      .filter(FileUtils.deleteQuietly)
+      .foreach(f => logWarning("Failed to delete: " + f.getAbsolutePath))
```
--- End diff --

This really looks like an issue in Kafka. The broker seems to be shut down without closing the log directories in some cases.
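The lenient cleanup pattern above (best-effort delete, warn on leftovers instead of failing the test) can be sketched in Python; the helper names here are hypothetical, not Spark's or commons-io's code:

```python
import logging
import shutil
import tempfile

def delete_quietly(path):
    # Best-effort recursive delete in the spirit of commons-io's
    # FileUtils.deleteQuietly: swallow errors, report success as a bool.
    try:
        shutil.rmtree(path)
        return True
    except OSError:
        return False

def cleanup_log_dirs(log_dirs):
    # Delete what we can; only warn about leftovers (e.g. files still
    # held open by a not-quite-stopped broker) instead of raising.
    for d in log_dirs:
        if not delete_quietly(d):
            logging.warning("Failed to delete: %s", d)

d = tempfile.mkdtemp()
print(delete_quietly(d))  # True (first delete succeeds)
print(delete_quietly(d))  # False (already gone)
```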
[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94563476 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -222,25 +223,34 @@ case class LoadDataCommand( val loadPath = if (isLocal) { val uri = Utils.resolveURI(path) -val filePath = uri.getPath() -val exists = if (filePath.contains("*")) { +val file = new File(uri.getPath) +val exists = if (file.getAbsolutePath.contains("*")) { val fileSystem = FileSystems.getDefault - val pathPattern = fileSystem.getPath(filePath) - val dir = pathPattern.getParent.toString + val dir = file.getParentFile.getAbsolutePath if (dir.contains("*")) { throw new AnalysisException( s"LOAD DATA input path allows only filename wildcard: $path") } + // Note that special characters such as "*" on Windows are not allowed as a path. + // Calling `WindowsFileSystem.getPath` throws an exception if there are in the path. + val dirPath = fileSystem.getPath(dir) + val pathPattern = new File(dirPath.toAbsolutePath.toString, file.getName).toURI.getPath + val safePathPattern = if (Utils.isWindows) { +// On Windows, the pattern should not start with slashes for absolute file paths. +pathPattern.stripPrefix("/") --- End diff -- On Windows, both `C:\\a\\b\\c` and `C:/a/b/c` are allowed here.
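To make the slash-stripping step above concrete, here is a minimal, self-contained sketch (the function name and the explicit `isWindows` parameter are mine, not from the PR): `URI.getPath` yields a leading `/` even for absolute Windows paths (e.g. `/C:/a/b`), which must be removed before the string is usable as a file path.

```scala
// Sketch of the Windows-specific normalization discussed above.
// `isWindows` is passed in explicitly so the behavior is testable anywhere.
def safePattern(pathPattern: String, isWindows: Boolean): String =
  // URI.getPath yields "/C:/a/b/*" on Windows; a bare Windows file path must
  // not start with that slash, so strip it for Windows only.
  if (isWindows) pathPattern.stripPrefix("/") else pathPattern
```

So `safePattern("/C:/a/b/*part-r*", isWindows = true)` yields `C:/a/b/*part-r*`, while POSIX paths pass through untouched.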
[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94564162 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala --- @@ -339,10 +339,15 @@ class HiveSparkSubmitSuite private def runSparkSubmit(args: Seq[String]): Unit = { val sparkHome = sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!")) val history = ArrayBuffer.empty[String] -val commands = Seq("./bin/spark-submit") ++ args +val sparkSubmit = if (Utils.isWindows) { + new File("..\\..\\bin\\spark-submit.cmd").getAbsolutePath +} else { + new File("../../bin/spark-submit").getAbsolutePath +} +val commands = Seq(sparkSubmit) ++ args val commandLine = commands.mkString("'", "' '", "'") -val builder = new ProcessBuilder(commands: _*).directory(new File(sparkHome)) --- End diff -- `ProcessBuilder.directory` does not seem to change the working directory on Windows. I verified this with the code below:

```scala
import scala.io.Source
import java.lang.ProcessBuilder
import java.io.File

val sparkHome = "your-spark-home"
val process = new ProcessBuilder(".\\bin\\spark-submit.cmd").directory(new File(sparkHome)).start()
process.waitFor()
Source.fromInputStream(process.getInputStream()).getLines().mkString("\n")
```

This code path resembles `org.apache.spark.deploy.SparkSubmitSuite`, and the test code there already uses relative paths.
[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94562756 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala --- @@ -374,8 +380,15 @@ class KafkaTestUtils extends Logging { def shutdown() { factory.shutdown() - Utils.deleteRecursively(snapshotDir) - Utils.deleteRecursively(logDir) + // The directories are not closed even if the ZooKeeper server is shut-downed. + // Please see ZOOKEEPER-1844, which is fixed in 3.4.6+. It leads to test failures + // on Windows if these are not ignored. + if (FileUtils.deleteQuietly(snapshotDir)) { --- End diff -- ZooKeeper does not close the directories on shutdown. This seems to be fixed in 3.4.6+ (see https://github.com/apache/zookeeper/blob/release-3.4.6/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L161-L165).
[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94563417 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -222,25 +223,34 @@ case class LoadDataCommand( val loadPath = if (isLocal) { val uri = Utils.resolveURI(path) -val filePath = uri.getPath() -val exists = if (filePath.contains("*")) { +val file = new File(uri.getPath) +val exists = if (file.getAbsolutePath.contains("*")) { val fileSystem = FileSystems.getDefault - val pathPattern = fileSystem.getPath(filePath) - val dir = pathPattern.getParent.toString + val dir = file.getParentFile.getAbsolutePath --- End diff -- Here, it threw the exception below:

```
java.nio.file.InvalidPathException: Illegal char <:> at index 2: /C:/projects/spark/target/tmp/spark-8e874658-3e0d-4622-a999-d4305954d2c1/*part-r*
```

because the leading `/` is not allowed. After converting it into the `C:\a\b\c` format, it throws an exception as below:

```
java.nio.file.InvalidPathException: Illegal char <*> at index 72: C:\projects\spark\target\tmp\spark-2f34e61d-9951-43fe-bb7d-32248fa55b22\*part-r*
```

Special characters such as `*` are not allowed in a path on Windows, so calling `WindowsFileSystem.getPath` throws an exception if any are present. So, here, I separated the file name from the directory path.
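The separation described above can be sketched as follows (a simplified stand-in for the PR's logic; the function name is mine, and a plain `IllegalArgumentException` replaces Spark's `AnalysisException`): only the wildcard-free parent directory is ever handed to the file system, while the file-name pattern is kept aside.

```scala
import java.io.File

// Split a path pattern such as "/tmp/foo/*part-r*" into a wildcard-free
// directory and a file-name pattern, so FileSystems.getDefault.getPath is
// never called on a string containing '*' (which fails on Windows).
def splitPattern(rawPath: String): (String, String) = {
  val file = new File(rawPath)
  val dir = file.getParentFile.getAbsolutePath
  if (dir.contains("*")) {
    // Mirrors the "only filename wildcard" restriction in the diff above.
    throw new IllegalArgumentException(
      s"LOAD DATA input path allows only filename wildcard: $rawPath")
  }
  (dir, file.getName)
}
```

The directory half can then be resolved through `FileSystems.getDefault.getPath` safely, and the file-name half used only for glob matching.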
[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 Hi @srowen, do you mind checking whether the changes look reasonable? (I will double-check that the tests really pass after the runs above finish.)
[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 Now, there are about 30 tests failing on Windows, which I identified via the AppVeyor tests [here](https://gist.github.com/HyukjinKwon/88a0b37cd027934bc14f3aa9f812be31) and am currently working on. Their causes do not look like resource- or path-related problems.
[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 retest this please
[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94569924 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala --- @@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging { if (server != null) { server.shutdown() + server.awaitShutdown() server = null } -brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) } +// On Windows, `logDirs` is left open even after Kafka server above is completely shut-downed +// in some cases. It leads to test failures on Windows if these are not ignored. +brokerConf.logDirs.map(new File(_)) --- End diff -- This really looks like an issue in Kafka. The broker seems to shut down without closing the log directories in some cases. These directories are Kafka-specific directories.
[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94575201 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala --- @@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging { if (server != null) { server.shutdown() + server.awaitShutdown() server = null } -brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) } +// On Windows, `logDirs` is left open even after Kafka server above is completely shut-downed
+// in some cases. It leads to test failures on Windows if these are not ignored. +brokerConf.logDirs.map(new File(_)) + .filterNot(FileUtils.deleteQuietly) --- End diff -- Ah, actually, `_.delete` does not actually delete a directory when it is not empty, as below:

```
.
└── tmp
    └── aa
```

```scala
scala> import java.io.File
import java.io.File

scala> new File("./tmp").delete()
res0: Boolean = false
```

I first wanted to use `Utils.deleteRecursively`, but it throws an exception as below when a lock is held on Windows:

```
DirectKafkaStreamSuite:
Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite *** ABORTED *** (7 seconds, 127 milliseconds)
  java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-d0d3eba7-4215-4e10-b40e-bb797e89338e
  at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
```
[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94575266 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala --- @@ -374,8 +380,15 @@ class KafkaTestUtils extends Logging { def shutdown() { factory.shutdown() - Utils.deleteRecursively(snapshotDir) - Utils.deleteRecursively(logDir) + // The directories are not closed even if the ZooKeeper server is shut-downed. + // Please see ZOOKEEPER-1844, which is fixed in 3.4.6+. It leads to test failures + // on Windows if these are not ignored. + if (FileUtils.deleteQuietly(snapshotDir)) { --- End diff -- Oh, yes. The condition should be negated with _not_.
[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94575575 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala --- @@ -339,10 +339,15 @@ class HiveSparkSubmitSuite private def runSparkSubmit(args: Seq[String]): Unit = { val sparkHome = sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!")) val history = ArrayBuffer.empty[String] -val commands = Seq("./bin/spark-submit") ++ args +val sparkSubmit = if (Utils.isWindows) { --- End diff -- I think we don't have such cases anymore, judging from the rest of the errors [here](https://gist.github.com/HyukjinKwon/88a0b37cd027934bc14f3aa9f812be31). We have similar ones that fail while trying to execute `/bin/bash`, but I believe they are different from this case.
[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16451#discussion_r94575683 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala --- @@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging { if (server != null) { server.shutdown() + server.awaitShutdown() server = null } -brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) } +// On Windows, `logDirs` is left open even after Kafka server above is completely shut-downed +// in some cases. It leads to test failures on Windows if these are not ignored. +brokerConf.logDirs.map(new File(_)) + .filterNot(FileUtils.deleteQuietly) --- End diff -- Should I maybe just try to use a try-catch with `Utils.deleteRecursively`?
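One way to sketch that try-catch idea (all names here are mine; `deleteRecursively` is a small stdlib stand-in for Spark's `Utils.deleteRecursively`, which likewise throws `IOException` when a file cannot be removed, e.g. because a handle is still held on Windows):

```scala
import java.io.{File, IOException}

// Stand-in for Utils.deleteRecursively: remove children first, then the
// file/directory itself, throwing when anything cannot be deleted.
def deleteRecursively(f: File): Unit = {
  if (f.isDirectory) {
    Option(f.listFiles).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  }
  if (!f.delete() && f.exists()) {
    throw new IOException("Failed to delete: " + f.getAbsolutePath)
  }
}

// The quiet variant suggested in the review: swallow the failure and report
// it, instead of aborting the whole test suite.
def deleteQuietlyRecursive(f: File): Boolean =
  try { deleteRecursively(f); true }
  catch { case _: IOException => false }
```

The caller can then log a warning for any directory where `deleteQuietlyRecursive` returns `false`, which matches the lenient cleanup the diff above is aiming for.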
[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16451 Let me just push a small commit fixing the _"not"_ condition, mainly to retrigger the test.