[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
retest this please





[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
Build started: [TESTS] `ALL` 
[![PR-16397](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=A54F518D-4D20-424F-95B6-3641C55CFBC1&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/A54F518D-4D20-424F-95B6-3641C55CFBC1)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:A54F518D-4D20-424F-95B6-3641C55CFBC1





[GitHub] spark pull request #16405: [SPARK-19002][BUILD] Check pep8 against merge_spa...

2016-12-26 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16405

[SPARK-19002][BUILD] Check pep8 against merge_spark_pr.py script

## What changes were proposed in this pull request?

This PR proposes to check pep8 against the `merge_spark_pr.py` script.

```
./dev/merge_spark_pr.py:100:1: E302 expected 2 blank lines, found 1
./dev/merge_spark_pr.py:285:44: E251 unexpected spaces around keyword / 
parameter equals
./dev/merge_spark_pr.py:285:46: E251 unexpected spaces around keyword / 
parameter equals
./dev/merge_spark_pr.py:286:16: E251 unexpected spaces around keyword / 
parameter equals
./dev/merge_spark_pr.py:286:18: E251 unexpected spaces around keyword / 
parameter equals
./dev/merge_spark_pr.py:286:38: E251 unexpected spaces around keyword / 
parameter equals
./dev/merge_spark_pr.py:286:40: E251 unexpected spaces around keyword / 
parameter equals
./dev/merge_spark_pr.py:303:101: E501 line too long (127 > 100 characters)
./dev/merge_spark_pr.py:305:101: E501 line too long (109 > 100 characters)
./dev/merge_spark_pr.py:307:101: E501 line too long (110 > 100 characters)
./dev/merge_spark_pr.py:313:101: E501 line too long (108 > 100 characters)
./dev/merge_spark_pr.py:317:101: E501 line too long (107 > 100 characters)
./dev/merge_spark_pr.py:319:101: E501 line too long (117 > 100 characters)
./dev/merge_spark_pr.py:353:101: E501 line too long (103 > 100 characters)
./dev/merge_spark_pr.py:419:37: E128 continuation line under-indented for 
visual indent
./dev/merge_spark_pr.py:448:101: E501 line too long (103 > 100 characters)
```

## How was this patch tested?

Via doctests, `python -m doctest -v merge_spark_pr.py`
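
As a hypothetical illustration of this testing approach (the helper below is made up, not a real function in `merge_spark_pr.py`), `python -m doctest -v` executes every `>>>` example embedded in a script's docstrings and reports each pass or failure:

```python
# Hypothetical helper, for illustration only; not part of merge_spark_pr.py.
def clean_title(title):
    """Strip trailing whitespace from a PR title.

    >>> clean_title("Fix pep8  ")
    'Fix pep8'
    """
    return title.rstrip()

if __name__ == "__main__":
    import doctest
    # Roughly what `python -m doctest -v <script>` does for this module.
    doctest.testmod(verbose=True)
```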


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark minor-pep8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16405


commit 8af1edb7185176ea25eac7a19c7438f30b677528
Author: hyukjinkwon 
Date:   2016-12-26T13:07:44Z

Check pep8 against merge_spark_pr.py script







[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against merge_spark_pr.p...

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
Hi @srowen and @holdenk, this is a small PR to run pep8 against 
`merge_spark_pr.py`. Could you check whether it makes sense, please?





[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against merge_spark_pr.p...

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
Hm, this passed on my local machine.





[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against merge_spark_pr.p...

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
Hi @srowen and @holdenk, this is a small PR to check pep8 against 
`./dev/merge_spark_pr.py`. Could you check whether it makes sense, please?





[GitHub] spark pull request #16405: [SPARK-19002][BUILD] Check pep8 against merge_spa...

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r93886392
  
--- Diff: dev/lint-python ---
@@ -23,6 +23,7 @@ PATHS_TO_CHECK="./python/pyspark/ 
./examples/src/main/python/ ./dev/sparktestsup
 # TODO: fix pep8 errors with the rest of the Python scripts under dev
--- End diff --

Sure, makes sense. Let me try to do this for all of them. Thank you both.





[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against dev/*.py scripts

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
Ah, it seems this complains in Python 3.





[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
Here, I concatenated all the logs into a single file - 
https://gist.github.com/HyukjinKwon/58567451773f87322c7009007e4fdc34

I just found each one in the PR description.





[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
cc @srowen, could I please ask you to review this one?





[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against dev/*.py scripts

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
retest this please





[GitHub] spark issue #16413: Branch 1.3

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16413
  
Hi @Kevy123, it seems this pull request was opened by mistake. Could you 
please close it?





[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against dev/*.py scripts

2016-12-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
retest this please





[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against dev/*.py scripts

2016-12-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
Sure, let me double check.





[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16386
  
Only regarding the comment 
https://github.com/apache/spark/pull/16386#issuecomment-269386229, I have a 
similar (rather combined) idea: provide another option, such as the corrupt 
file name, optionally (meaning the column appears only when the user 
explicitly sets it, for backwards compatibility); do not add a column via 
`columnNameOfCorruptRecord` in `wholeFile` mode, with proper documentation; 
and issue a warning message if `columnNameOfCorruptRecord` is set by the user 
in `wholeFile` mode. This is a bit complicated and might confuse users, 
though; I am not sure it is the best idea. A rough sketch follows below.
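
As a rough PySpark sketch of that idea (assuming a running `SparkSession` named `spark`; the option names follow this discussion and are not a settled API):

```python
# Hypothetical behaviour sketched from the comment above: in `wholeFile`
# mode, the corrupt-record column would appear only because the user set
# `columnNameOfCorruptRecord` explicitly; otherwise it would be omitted
# (with a warning if the option is set but unsupported).
df = (spark.read
      .option("wholeFile", True)
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("people.json"))
df.printSchema()
```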





[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against all Python scrip...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
It seems some existing examples, such as `random_rdd_generation.py`, do not 
work with Python 3.3.6 either, although they compile fine so the pep8 check 
passes. I fixed only the errors from pep8 here.





[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against all Python scrip...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
BTW, has anyone tried Python 3.6.0 with PySpark? Apparently, I could not even 
run `./bin/pyspark`; it fails with an error.





[GitHub] spark pull request #16397: [SPARK-18922][TESTS] Fix more path-related test f...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16397#discussion_r94024218
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/MultiDatabaseSuite.scala ---
@@ -80,7 +80,7 @@ class MultiDatabaseSuite extends QueryTest with 
SQLTestUtils with TestHiveSingle
   |CREATE TABLE t1
   |USING parquet
   |OPTIONS (
-  |  path '$path'
+  |  path '${dir.toURI.toString}'
--- End diff --

I see, let me correct it for the former. In the case of `path`, it is being 
used above. I thought this was a minimised change because this is the only 
problematic line: it parses the path wrongly on Windows. For example, 
`C:\tmp\a\b\c` becomes `C:mpabc`.





[GitHub] spark pull request #16397: [SPARK-18922][TESTS] Fix more path-related test f...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16397#discussion_r94024588
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala
 ---
@@ -257,31 +257,37 @@ class HiveCommandSuite extends QueryTest with 
SQLTestUtils with TestHiveSingleto
 """.stripMargin)
 
   // LOAD DATA INTO partitioned table must specify partition
-  withInputFile { path =>
+  withInputFile { f =>
 intercept[AnalysisException] {
+  val path = f.toURI.toString
--- End diff --

Simply because some lines hit the 100-character length limit in that case. I 
will try to clean this up, including the comment above.





[GitHub] spark issue #16405: [SPARK-19002][BUILD] Check pep8 against all Python scrip...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
Ah, thank you for approving, @srowen.





[GitHub] spark pull request #16397: [SPARK-18922][TESTS] Fix more path-related test f...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16397#discussion_r94030166
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/MultiDatabaseSuite.scala ---
@@ -80,7 +80,7 @@ class MultiDatabaseSuite extends QueryTest with 
SQLTestUtils with TestHiveSingle
   |CREATE TABLE t1
   |USING parquet
   |OPTIONS (
-  |  path '$path'
+  |  path '${dir.toURI.toString}'
--- End diff --

Ah, sure. Let me double check.





[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
Build started: [TESTS] `ALL` 
[![PR-16397](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=443B17ED-C621-4A3A-B45A-1F5E042189A2&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/443B17ED-C621-4A3A-B45A-1F5E042189A2)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:443B17ED-C621-4A3A-B45A-1F5E042189A2





[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
Build started: [TESTS] `ALL` 
[![PR-16397](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=F9490ECC-9D49-44C8-8CDE-7BCA9C1FD88C&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/F9490ECC-9D49-44C8-8CDE-7BCA9C1FD88C)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:F9490ECC-9D49-44C8-8CDE-7BCA9C1FD88C





[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
retest this please





[GitHub] spark pull request #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked collectio...

2016-12-28 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16429

[WIP][SPARK-19019][PYTHON] Fix hijacked collections.namedtuple to be 
serialized with keyword-only arguments

## What changes were proposed in this pull request?

Currently, PySpark does not work with Python 3.6.0.

Running `./bin/pyspark` simply throws the error below:

```
Traceback (most recent call last):
  File ".../spark/python/pyspark/shell.py", line 30, in <module>
    import pyspark
  File ".../spark/python/pyspark/__init__.py", line 46, in <module>
    from pyspark.context import SparkContext
  File ".../spark/python/pyspark/context.py", line 36, in <module>
    from pyspark.java_gateway import launch_gateway
  File ".../spark/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 18, in <module>
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", line 62, in <module>
    import pkgutil
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", line 22, in <module>
    ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
  File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
```

The root cause seems to be that the optional arguments of `namedtuple` became 
keyword-only as of Python 3.6.0 (see https://bugs.python.org/issue25628).

We currently copy this function via `types.FunctionType`, which does not 
carry over the default values of keyword-only arguments (meaning 
`namedtuple.__kwdefaults__`), and this seems to leave values missing inside 
the function (non-bound arguments).

This PR proposes to work around this by manually setting them via `kwargs`, 
as `types.FunctionType` does not seem to support setting this.
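
A minimal sketch of this workaround in plain Python (the names mirror the diff discussed later in this thread, but the details are illustrative rather than the exact patch):

```python
import collections

_old_namedtuple = collections.namedtuple
# The keyword-only defaults live in __kwdefaults__, which a copy made via
# types.FunctionType does not carry over; getattr guards Pythons where the
# attribute is None or absent.
_old_namedtuple_kwdefaults = getattr(
    collections.namedtuple, "__kwdefaults__", None) or {}


def namedtuple(*args, **kwargs):
    # Re-apply the lost keyword-only defaults before delegating.
    for k, v in _old_namedtuple_kwdefaults.items():
        kwargs.setdefault(k, v)
    return _old_namedtuple(*args, **kwargs)


Point = namedtuple("Point", "x y")
print(Point(1, 2))  # Point(x=1, y=2)
```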

## How was this patch tested?

Manually tested with Python 3.6.0.

```
./bin/pyspark
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-19019

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16429.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16429


commit fb049790b5f96070ebd1006630e24bf20c20319a
Author: hyukjinkwon 
Date:   2016-12-29T02:42:28Z

Fix namedtuple so it can be serialized with keyword-only arguments too







[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked collections.name...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
cc @davies and @JoshRosen. I know both of you are insightful in this area. 
I am not too sure whether this is the correct fix, as it seems not to be fixed 
even in some other third-party Python libraries. Would you mind taking a look?





[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
I just checked that each one is fine in a concatenated log file - 
https://gist.github.com/HyukjinKwon/8851815ede9dcae80632a5378b74d1ae





[GitHub] spark pull request #16433: [SPARK-19022][TESTS] Fix tests dependent on OS du...

2016-12-29 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16433

[SPARK-19022][TESTS] Fix tests dependent on OS due to different newline 
characters

## What changes were proposed in this pull request?

There are two tests failing on Windows due to different newline characters.

```
 - StreamingQueryProgress - prettyJson *** FAILED *** (0 milliseconds)
 "{
"id" : "39788670-6722-48b7-a248-df6ba08722ac",
"runId" : "422282f1-3b81-4b47-a15d-82dda7e69390",
"name" : "myName",
...
  }" did not equal "{
"id" : "39788670-6722-48b7-a248-df6ba08722ac",
"runId" : "422282f1-3b81-4b47-a15d-82dda7e69390",
"name" : "myName",
...
  }"
  ...
```

```
 - StreamingQueryStatus - prettyJson *** FAILED *** (0 milliseconds)
 "{
"message" : "active",
"isDataAvailable" : true,
"isTriggerActive" : false
  }" did not equal "{
"message" : "active",
"isDataAvailable" : true,
"isTriggerActive" : false
  }" 
  ...
```

The reason is that `pretty` in `org.json4s` writes OS-dependent newlines, but 
the strings defined in the tests use `\n`. This ends up causing test failures.

This PR proposes to compare these regardless of newline differences.
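
The fix itself lives in Scala test code; as a sketch of the comparison idea in Python (an analog under that assumption, not the actual patch):

```python
import re

def equals_ignore_newlines(a, b):
    # Normalize CRLF/CR/LF to one form before comparing, so the assertion
    # no longer depends on the OS line separator.
    norm = lambda s: re.sub(r"\r\n|\r|\n", "\n", s)
    return norm(a) == norm(b)

assert equals_ignore_newlines('{\r\n  "message" : "active"\r\n}',
                              '{\n  "message" : "active"\n}')
```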

## How was this patch tested?

Manually tested via AppVeyor.

**Before**

https://ci.appveyor.com/project/spark-test/spark/build/417-newlines-fix-before

**After**
https://ci.appveyor.com/project/spark-test/spark/build/418-newlines-fix

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark 
tests-StreamingQueryStatusAndProgressSuite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16433.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16433


commit 15f821cadd39027cfd8860309e32d6b06be92833
Author: hyukjinkwon 
Date:   2016-12-29T05:27:05Z

Fix newline comparison issues







[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16433
  
Build started: [TESTS] 
`org.apache.spark.sql.streaming.StreamingQueryStatusAndProgressSuite` 
[![PR-16433](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=AE40452F-D970-407C-92EB-C8079EC86A06&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/AE40452F-D970-407C-92EB-C8079EC86A06)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:AE40452F-D970-407C-92EB-C8079EC86A06





[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16428
  
Do you mind if I ask whether it writes the line separator correctly in the 
encoding specified in the option?





[GitHub] spark pull request #16433: [SPARK-19022][TESTS] Fix tests dependent on OS du...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16433#discussion_r94200602
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala
 ---
@@ -30,10 +30,16 @@ import 
org.apache.spark.sql.streaming.StreamingQueryStatusAndProgressSuite._
 
 
 class StreamingQueryStatusAndProgressSuite extends StreamTest {
+  implicit class EqualsIgnoreCRLF(source: String) {
+def equalsIgnoreCRLF(target: String): Boolean = {
+  source.stripMargin.replaceAll("\r\n|\r|\n", System.lineSeparator) ===
--- End diff --

Oh, sure.





[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16433
  
In most cases, it seems they explicitly write `\n` (e.g. when writing CSV and 
JSON). _Apparently_, these seem to be the only tests failing due to this problem.





[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
@srowen, thank you Sean. I think it is okay for now. To be honest, I found 
some more of the same instances, but I haven't fixed, tested and verified them 
yet. Maybe I need one more pass to deal with them all cleanly. I hope it is 
okay to go ahead and merge this as is.





[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
I just manually ran `./dev/create-release/translate-contributors.py`, which 
definitely had a conflict.





[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16433
  
Build started: [TESTS] 
`org.apache.spark.sql.streaming.StreamingQueryStatusAndProgressSuite` 
[![PR-16433](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=D1A3B54F-82B5-481D-ADE8-7CC273C97303&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/D1A3B54F-82B5-481D-ADE8-7CC273C97303)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:D1A3B54F-82B5-481D-ADE8-7CC273C97303





[GitHub] spark pull request #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.n...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16429#discussion_r94201674
  
--- Diff: python/pyspark/serializers.py ---
@@ -382,18 +382,30 @@ def _hijack_namedtuple():
 return
 
 global _old_namedtuple  # or it will put in closure
+global _old_namedtuple_kwdefaults  # or it will put in closure too
 
 def _copy_func(f):
 return types.FunctionType(f.__code__, f.__globals__, f.__name__,
   f.__defaults__, f.__closure__)
 
+def _kwdefaults(f):
+kargs = getattr(f, "__kwdefaults__", None)
--- End diff --

`__kwdefaults__` can be `None` or not exist at all.
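
For illustration, the two cases the `getattr` above guards against:

```python
def f(*args, verbose=False):
    pass

def g(a, b=1):
    pass

print(f.__kwdefaults__)  # {'verbose': False}
print(g.__kwdefaults__)  # None: no keyword-only parameters
# On Python 2, functions have no __kwdefaults__ attribute at all, hence
# getattr(f, "__kwdefaults__", None) rather than direct attribute access.
```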





[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
retest this please





[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
retest this please





[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16428
  
BTW, the reason I asked that in 
https://github.com/apache/spark/pull/16428#issuecomment-269635303 is that I 
remember checking the reading/writing paths related to encodings before, and 
the encoding had to be set on the line record reader.

I just now double-checked that newlines were `\n` for each batch due to 
[`TextOutputFormat`'s record 
writer](https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java#L48-L49), 
but it seems this was changed in [a recent 
commit](https://github.com/apache/spark/pull/16089/files#diff-6a14f6bb643b1474139027d72a17f41aL203). 
So the newlines now depend on the univocity library.

We should definitely add some tests for this in `CSVSuite` to verify this 
behaviour and prevent regressions.

As a small side note, we do not currently support non-ASCII-compatible 
encodings in the reading path, if I have not missed any changes to this path.





[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
It was a problem because I could not proceed further: the error messages were 
flooding in, and somehow the logs were truncated in AppVeyor (e.g. 
https://ci.appveyor.com/project/spark-test/spark/build/376-hive-failed-tests).

I had to run tests separately, but I figured out how to run (almost) all 
tests via AppVeyor, e.g., 
[![PR-16397](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=F9490ECC-9D49-44C8-8CDE-7BCA9C1FD88C&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/F9490ECC-9D49-44C8-8CDE-7BCA9C1FD88C).
 But now this takes a lot of time (7h 12m in this case).

I believe it of course makes it easier to spot the errors, because

  - it gets rid of a lot of the flooding errors, so I can easily spot the 
real ones
  - therefore, I can run some concatenated tests.

I am willing to try to add all of them here if these reasons are not 
persuasive enough.






[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16433
  
Yes, I hesitated to submit this PR for a while due to similar concerns.

> is it because this is the only test for prettyJson?

I believe so. Let me double check again.

> I also make sure that, say, we do want the output of prettyJson to vary 
by platform. Hm, I guess that's reasonable here as it's meant for display on a 
terminal I guess.

^
@zsxwing could you confirm this please if possible?





[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
Thank you @srowen !!





[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16428
  
Ah, I meant to add a test there in this PR.





[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16428#discussion_r94238157
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -71,7 +71,9 @@ private[csv] class CSVOptions(@transient private val 
parameters: CaseInsensitive
   val delimiter = CSVTypeCast.toChar(
 parameters.getOrElse("sep", parameters.getOrElse("delimiter", ",")))
   private val parseMode = parameters.getOrElse("mode", "PERMISSIVE")
-  val charset = parameters.getOrElse("encoding",
+  val readCharSet = parameters.getOrElse("encoding",
+parameters.getOrElse("charset", StandardCharsets.UTF_8.name()))
+  val writeCharSet = parameters.getOrElse("writeEncoding",
--- End diff --

I think we do not necessarily need to introduce an additional option. We could 
just use the `charset` variable, because other options such as `nullValue` 
already apply to both reading and writing.
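
For example, the write path could honour the existing option name (a sketch of this suggestion, assuming an existing DataFrame `df`; the final option name is exactly what this thread is deciding):

```python
# Reuse the read-side `encoding`/`charset` option for writing instead of
# introducing a separate `writeEncoding` option.
df.write.option("encoding", "euc-kr").csv("/tmp/csv-euckr")
```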





[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16428#discussion_r94239452
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
* indicates a timestamp format. Custom date formats follow the formats 
at
* `java.text.SimpleDateFormat`. This applies to timestamp type.
* 
+   * `writeEncoding`(default `utf-8`) save dataFrame 2 csv by giving 
encoding
--- End diff --

We should also add the same documentation in `readwriter.py`.





[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94263510
  
--- Diff: 
examples/src/main/python/mllib/decision_tree_regression_example.py ---
@@ -44,7 +44,7 @@
 # Evaluate model on test instances and compute test error
 predictions = model.predict(testData.map(lambda x: x.features))
 labelsAndPredictions = testData.map(lambda lp: 
lp.label).zip(predictions)
-testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - 
p)).sum() /\
+testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] 
- lp[1])).sum() /\
--- End diff --

That seems to cause errors in Python 3 when a tuple is used in a lambda for 
unpacking. It seems http://www.python.org/dev/peps/pep-3113 is the related issue.
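
A small before/after sketch of the change:

```python
# Python 2 allowed tuple parameters in lambdas:
#     lambda (v, p): (v - p) * (v - p)
# PEP 3113 removed that syntax in Python 3, so the pair is indexed instead:
squared_error = lambda vp: (vp[0] - vp[1]) * (vp[0] - vp[1])

pairs = [(3.0, 2.5), (1.0, 1.5)]
print(sum(squared_error(vp) for vp in pairs) / len(pairs))  # 0.25
```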





[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94263914
  
--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ 
./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py 
./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path 
"*python/docs/conf.py" )"
--- End diff --

Yea, I think this is a valid point. Let me first check the length and the 
length limitation, to be sure.





[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94273247
  
--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ 
./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py 
./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path 
"*python/docs/conf.py" )"
--- End diff --

The limit seems to be 32K on Cygwin by default in general. The actual length, 
without any prefix, is about 11K for now. Let me try to turn these into 
relative paths as a safe choice; then it would be safe in general.







[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94273331
  
--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ 
./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py 
./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" -not -path 
"*python/docs/conf.py" )"
--- End diff --

To be sure, I tested this as below:

```bash
./lint-python
./dev/lint-python
./spark/dev/lint-python
```

So now these are relative paths, which currently add up to about 11K, as below:

```
./dev/create-release/generate-contributors.py 
./dev/create-release/releaseutils.py 
./dev/create-release/translate-contributors.py ./dev/github_jira_sync.py 
./dev/merge_spark_pr.py ./dev/pep8-1.7.0.py ./dev/pip-sanity-check.py 
./dev/run-tests-jenkins.py ./dev/run-tests.py 
./dev/sparktestsupport/__init__.py ./dev/sparktestsupport/modules.py 
./dev/sparktestsupport/shellutils.py ./dev/sparktestsupport/toposort.py 
./examples/src/main/python/als.py 
./examples/src/main/python/avro_inputformat.py 
./examples/src/main/python/kmeans.py 
./examples/src/main/python/logistic_regression.py 
./examples/src/main/python/ml/aft_survival_regression.py 
./examples/src/main/python/ml/als_example.py 
./examples/src/main/python/ml/binarizer_example.py 
./examples/src/main/python/ml/bisecting_k_means_example.py 
./examples/src/main/python/ml/bucketizer_example.py 
./examples/src/main/python/ml/chisq_selector_example.py 
./examples/src/main/python/ml/count_vectorizer_example.py 
./examples/src/main/python/ml/cross_validator.py ./examples/src/main/python/ml/dataframe_example.py 
./examples/src/main/python/ml/dct_example.py 
./examples/src/main/python/ml/decision_tree_classification_example.py 
./examples/src/main/python/ml/decision_tree_regression_example.py 
./examples/src/main/python/ml/elementwise_product_example.py 
./examples/src/main/python/ml/estimator_transformer_param_example.py 
./examples/src/main/python/ml/gaussian_mixture_example.py 
./examples/src/main/python/ml/generalized_linear_regression_example.py 
./examples/src/main/python/ml/gradient_boosted_tree_classifier_example.py 
./examples/src/main/python/ml/gradient_boosted_tree_regressor_example.py 
./examples/src/main/python/ml/index_to_string_example.py 
./examples/src/main/python/ml/isotonic_regression_example.py 
./examples/src/main/python/ml/kmeans_example.py 
./examples/src/main/python/ml/lda_example.py 
./examples/src/main/python/ml/linear_regression_with_elastic_net.py 
./examples/src/main/python/ml/logistic_regression_summary_example.py ./examples/src/main/python/ml/logistic_regression_with_elastic_net.py 
./examples/src/main/python/ml/max_abs_scaler_example.py 
./examples/src/main/python/ml/min_max_scaler_example.py 
./examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py
 ./examples/src/main/python/ml/multilayer_perceptron_classification.py 
./examples/src/main/python/ml/n_gram_example.py 
./examples/src/main/python/ml/naive_bayes_example.py 
./examples/src/main/python/ml/normalizer_example.py 
./examples/src/main/python/ml/one_vs_rest_example.py 
./examples/src/main/python/ml/onehot_encoder_example.py 
./examples/src/main/python/ml/pca_example.py 
./examples/src/main/python/ml/pipeline_example.py 
./examples/src/main/python/ml/polynomial_expansion_example.py 
./examples/src/main/python/ml/quantile_discretizer_example.py 
./examples/src/main/python/ml/random_forest_classifier_example.py 
./examples/src/main/python/ml/random_forest_regressor_example.py 
./examples/src/main/python/ml/rformula_example.py
  ./examples/src/main/python/ml/sql_transformer.py 
./examples/src/main/python/ml/standard_scaler_example.py 
./examples/src/main/python/ml/stopwords_remover_example.py 
./examples/src/main/python/ml/string_indexer_example.py 
./examples/src/main/python/ml/tf_idf_example.py 
./examples/src/main/python/ml/tokenizer_example.py 
./examples/src/main/python/ml/train_validation_split.py 
./examples/src/main/python/ml/vector_assembler_example.py 
./examples/src/main/python/ml/vector_indexer_example.py 
./examples/src/main/python/ml/vector_slicer_example.py 
./examples/src/main/python/ml/word2vec_example.py 
./examples/src/main/python/mllib/binary_classification_metrics_example.py 
./examples/src/main/python/mllib/bisecting_k_means_example.py 
./ex

[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16428#discussion_r94273423
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -659,7 +659,7 @@ def text(self, path, compression=None):
 self._jwrite.text(path)
 
 @since(2.0)
-def csv(self, path, mode=None, compression=None, sep=None, quote=None, 
escape=None,
+def csv(self, path, mode=None, compression=None, sep=None, 
encoding=None, quote=None, escape=None,
--- End diff --

We need to place this new option at the end. Otherwise, it will break 
existing code that uses this option as a positional argument (not a keyword 
argument).
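
For illustration, here is a minimal Scala sketch of the breakage (hypothetical 
signatures, not Spark's actual API): inserting a defaulted parameter in the 
middle silently rebinds positional arguments.

```scala
// Hypothetical before/after signatures; only the parameter order differs.
def csvOld(path: String, mode: String = null, sep: String = null): Unit =
  println(s"path=$path mode=$mode sep=$sep")

def csvNew(path: String, mode: String = null, encoding: String = null,
           sep: String = null): Unit =
  println(s"path=$path mode=$mode encoding=$encoding sep=$sep")

// An existing positional call site:
csvOld("/tmp/out", "overwrite", "|")  // sep = "|"
// The same arguments against the new signature change meaning silently:
csvNew("/tmp/out", "overwrite", "|")  // encoding = "|", sep = null
```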


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16428#discussion_r94273531
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -33,6 +33,7 @@ import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.{SharedSQLContext, SQLTestUtils}
 import org.apache.spark.sql.types._
 
+//noinspection ScalaStyle
--- End diff --

We can disable the check for only the enclosed lines with a block as below, 
if you need this for non-ASCII characters: 

```scala
// scalastyle:off
...
// scalastyle:on
```
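
For example (assuming the rule id in Spark's `scalastyle-config.xml` is 
`nonascii`, the disabling can also be scoped to just that rule):

```scala
// scalastyle:off nonascii
val city = "서울"  // non-ASCII literal allowed only inside this block
// scalastyle:on nonascii
```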


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16428#discussion_r94273548
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
* indicates a timestamp format. Custom date formats follow the formats 
at
* `java.text.SimpleDateFormat`. This applies to timestamp type.
* 
+   * `encoding`(default `utf-8`) save dataFrame 2 csv by giving 
encoding
--- End diff --

Could we just mirror the documentation in `DataFrameReader`, for 
consistency?

```
 `encoding` (default `UTF-8`): decodes the CSV files by the given 
encoding
   * type.
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16428#discussion_r94273678
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -905,4 +906,21 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
   checkAnswer(df, Row(1, null))
 }
   }
+
+  test("save data with gb18030") {
+withTempPath{ path =>
--- End diff --

nit: it should be `withTempPath { path =>`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16428#discussion_r94273677
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -677,6 +677,8 @@ def csv(self, path, mode=None, compression=None, 
sep=None, quote=None, escape=No
 snappy and deflate).
 :param sep: sets the single character as a separator for each 
field and value. If None is
 set, it uses the default value, ``,``.
+:param encoding: sets writer CSV files by the given encoding type. 
If None is set,
+ it uses the default value, ``UTF-8``.
--- End diff --

Here too, let's mirror the one in `DataFrameReader` above in this file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16428#discussion_r94273685
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -905,4 +906,21 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
   checkAnswer(df, Row(1, null))
 }
   }
+
+  test("save data with gb18030") {
+withTempPath{ path =>
+  Seq(("1", "中文"))
+.toDF("num", "lanaguage")
+.write
+.option("encoding", "GB18030")
+.option("header", "true")
+.csv(path.getAbsolutePath)
+  val df = spark.read
+.option("header", "true")
+.option("encoding", "GB18030")
+.csv(path.getAbsolutePath)
+
+  checkAnswer(df, Row("1", "中文"))
--- End diff --

Could we write this as something like below:

```scala
// scalastyle:off
val df = Seq(("1", "中文")).toDF("num", "lanaguage")
// scalastyle:on
df.write
  .option("header", "true")
  .option("encoding", "GB18030")
  .csv(path.getAbsolutePath)

val readBack = spark.read
  .option("header", "true")
  .option("encoding", "GB18030")
  .csv(path.getAbsolutePath)
 
checkAnswer(df, readBack)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16428#discussion_r94273737
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
* indicates a timestamp format. Custom date formats follow the formats 
at
* `java.text.SimpleDateFormat`. This applies to timestamp type.
* 
+   * `encoding`(default `utf-8`) save dataFrame 2 csv by giving 
encoding
--- End diff --

looks good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16428#discussion_r94273866
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
* indicates a timestamp format. Custom date formats follow the formats 
at
* `java.text.SimpleDateFormat`. This applies to timestamp type.
* 
+   * `encoding`(default `utf-8`) save dataFrame 2 csv by giving 
encoding
--- End diff --

Oh, also, it seems the newly added option here should be put in

```
<li>
...
</li>
```

so that this can be rendered fine in the Java API documentation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16433
  
I just double checked. It seems `org.json4s.pretty` is being used in 
several places, but they look like they are for debugging, printing, and 
building a request body (e.g., `StandaloneRestSubmitSuite`). So, for 
`org.json4s.pretty`, these seem to be the only tests failing due to this 
problem.

About the OS-dependent newline tests, I just checked the rest of them. I 
skimmed the failed tests again as best I could, and these seem to be the only 
tests failing due to this problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15848: [SPARK-9487] Use the same num. worker threads in Java/Sc...

2017-01-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15848
  
@skanjila We might close this if there are no updates for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2017-01-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
@srowen, otherwise, I think I could open a `[WIP]` or `[DO-NOT-MERGE]` 
PR and then repeatedly push and test commits fixing these, rather than 
verifying them only via the local branches in my @spark-test account (which I 
am currently doing), because my AppVeyor scripts can easily run tests against 
a PR.

Do you mind if I open a long-lived `[WIP]` or `[DO-NOT-MERGE]` PR to 
find all failing tests related to this issue, if you are worried about merging 
multiple PRs that fix the same issues?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2017-01-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
Yes, I think I am almost there and am fixing these, although there are 
slightly more than I expected, due to some errors I didn't think were 
caused by this issue, such as 

```
org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 
'csv_table' not found in database 'default';
```

and aborted tests which I had missed while just grepping.

But I think these can still be handled in one go. Let me just verify them 
as usual and then open a short-term WIP PR like this one.

I asked only because I suddenly realised this might be the better approach.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2017-01-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
BTW, thanks again for your quick response.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][TESTS] Fix all iden...

2017-01-02 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16451

[WIP][SPARK-18922][SQL][CORE][TESTS] Fix all identified tests failed due to 
path and resources problems on Windows

## What changes were proposed in this pull request?

WIP - just opened this first to run some more tests together with Jenkins 
and AppVeyor.

## How was this patch tested?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark all-path-resource-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16451.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16451


commit e58f0bdd170421d484c384d8d8feb3f18eae310c
Author: hyukjinkwon 
Date:   2017-01-02T04:43:20Z

Fix more path and resources problems




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][TESTS] Fix all identified ...

2017-01-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] `ALL` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=044D6A78-26AA-4A2C-A4A1-B39DF60C811C&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/044D6A78-26AA-4A2C-A4A1-B39DF60C811C)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][TESTS] Fix all identified ...

2017-01-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
gentle ping..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on...

2017-01-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16320#discussion_r94358447
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -85,7 +85,9 @@ private[csv] object CSVInferSchema {
 case NullType => tryParseInteger(field, options)
 case IntegerType => tryParseInteger(field, options)
 case LongType => tryParseLong(field, options)
-case _: DecimalType => tryParseDecimal(field, options)
+case _: DecimalType =>
+  // DecimalTypes have different precisions and scales, so we try 
to find the common type.
+  findTightestCommonType(typeSoFar, tryParseDecimal(field, 
options)).getOrElse(NullType)
--- End diff --

Yes; otherwise, it might end up with an incorrect data type. For example,

```scala
val path = "/tmp/test1"
Seq(s"${Long.MaxValue}1", "2015-12-01 00:00:00", 
"1").toDF().coalesce(1).write.text(path)
spark.read.option("inferSchema", true).csv(path).printSchema()
```

```
root
 |-- _c0: integer (nullable = true)
```
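
For illustration, a simplified sketch (not Spark's actual 
`findTightestCommonType`) of how two decimal types with different precision 
and scale widen to a common type that can hold values of both:

```scala
import org.apache.spark.sql.types.DecimalType

// Keep the larger scale, plus enough digits before the decimal point
// for the wider of the two integer parts.
def widerDecimal(a: DecimalType, b: DecimalType): DecimalType = {
  val scale = math.max(a.scale, b.scale)
  val integerDigits = math.max(a.precision - a.scale, b.precision - b.scale)
  DecimalType(integerDigits + scale, scale)
}

// e.g. DecimalType(20, 0) (a 20-digit integer, like s"${Long.MaxValue}1")
// and DecimalType(3, 2) widen to DecimalType(22, 2).
```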


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
Thanks for your interest, @azmras. I just checked it as below:

```python
sc.parallelize(range(100), 8)
```

```
Traceback (most recent call last):
  File ".../spark/python/pyspark/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
  File 
"/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py",
 line 409, in dump
self.save(obj)
  File 
"/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py",
 line 476, in save
f(self, obj) # Call unbound method with explicit self
  File 
"/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py",
 line 751, in save_tuple
save(element)
  File 
"/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py",
 line 476, in save
f(self, obj) # Call unbound method with explicit self
  File ".../spark/python/pyspark/cloudpickle.py", line 214, in save_function
self.save_function_tuple(obj)
  File ".../spark/python/pyspark/cloudpickle.py", line 244, in 
save_function_tuple
code, f_globals, defaults, closure, dct, base_globals = 
self.extract_func_data(func)
  File ".../spark/python/pyspark/cloudpickle.py", line 306, in 
extract_func_data
func_global_refs = self.extract_code_globals(code)
  File ".../spark/python/pyspark/cloudpickle.py", line 288, in 
extract_code_globals
out_names.add(names[oparg])
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in 
  File ".../spark/python/pyspark/rdd.py", line 198, in __repr__
return self._jrdd.toString()
  File ".../spark/python/pyspark/rdd.py", line 2438, in _jrdd
self._jrdd_deserializer, profiler)
  File ".../spark/python/pyspark/rdd.py", line 2371, in _wrap_function
pickled_command, broadcast_vars, env, includes = 
_prepare_for_python_RDD(sc, command)
  File ".../spark/python/pyspark/rdd.py", line 2357, in 
_prepare_for_python_RDD
pickled_command = ser.dumps(command)
  File ".../spark/python/pyspark/serializers.py", line 452, in dumps
return cloudpickle.dumps(obj, 2)
  File ".../spark/python/pyspark/cloudpickle.py", line 667, in dumps
cp.dump(obj)
  File ".../spark/python/pyspark/cloudpickle.py", line 115, in dump
if "'i' format requires" in e.message:
AttributeError: 'IndexError' object has no attribute 'message'
```

It looks like another issue with Python 3.6.0; this PR is only related to the 
hijacked `collections.namedtuple`.

We should port 
https://github.com/cloudpipe/cloudpickle/commit/4945361c2db92095f934b92a6c00316243caf3cc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
Hi @joshrosen and @davies, do you think that commit should be ported in this 
PR? I am worried about making this PR harder to review by porting it here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
Hi @azmras, now it should work fine for your case as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
@azmras Could you maybe double check? It works okay in my local environment, 
as below:

```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0-SNAPSHOT
      /_/

Using Python version 3.6.0 (default, Dec 24 2016 00:01:50)
SparkSession available as 'spark'.
>>> sc.parallelize(range(100), 8).take(5)
[0, 1, 2, 3, 4]
>>> sc.parallelize(range(1000), 20).take(5)
[0, 1, 2, 3, 4]
>>>
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][TESTS] Fix all identified ...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] 
`org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=A2836427-A94C-4BE0-9D24-537B09362C69&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/A2836427-A94C-4BE0-9D24-537B09362C69)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] 
`org.apache.spark.streaming.kafka.DirectKafkaStreamSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=1C2B248D-2455-4ADB-AC8A-1CEB93E4EC5F&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/1C2B248D-2455-4ADB-AC8A-1CEB93E4EC5F)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] 
`org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=887C39EC-849A-40E5-BAE7-771BDF5BC98A&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/887C39EC-849A-40E5-BAE7-771BDF5BC98A)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] 
`org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=E8488472-738C-4ADF-A924-8F858728D120&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/E8488472-738C-4ADF-A924-8F858728D120)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16429
  
@azmras Thank you for confirming this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] 
`org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=A7615F8B-58B0-4D9B-A914-32E7BF7DCB65&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/A7615F8B-58B0-4D9B-A914-32E7BF7DCB65)
Build started: [TESTS] `org.apache.spark.sql.hive.execution.SQLQuerySuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=3789CF31-AF57-492C-9FF7-5235D5C8C124&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/3789CF31-AF57-492C-9FF7-5235D5C8C124)
Build started: [TESTS] 
`org.apache.spark.sql.hive.MetastoreDataSourcesSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=451A5CFC-6AB3-498B-86A0-43DED5C0F13A&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/451A5CFC-6AB3-498B-86A0-43DED5C0F13A)
Build started: [TESTS] 
`org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=D1BE653C-EDE2-4E4E-8781-85EE95CA078B&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/D1BE653C-EDE2-4E4E-8781-85EE95CA078B)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Now there are about 30 tests failing on Windows, which I identified via the 
AppVeyor tests 
[here](https://gist.github.com/HyukjinKwon/88a0b37cd027934bc14f3aa9f812be31) 
and am currently working on. Their causes do not look like resource or path 
related problems.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Build started: [TESTS] 
`org.apache.spark.sql.hive.PartitionedTablePerfStatsSuite` 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=0C0F228B-9B67-49AC-9C35-4385944721D0&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/0C0F228B-9B67-49AC-9C35-4385944721D0)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94562021
  
--- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
@@ -482,7 +482,7 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
   
s"hdfs:/jar1,file:/jar2,file:$cwd/jar3,file:$cwd/jar4#jar5,file:$cwd/path%20to/jar6")
 if (Utils.isWindows) {
   assertResolves("""hdfs:/jar1,file:/jar2,jar3,C:\pi.py#py.pi,C:\path 
to\jar4""",
-
s"hdfs:/jar1,file:/jar2,file:$cwd/jar3,file:/C:/pi.py#py.pi,file:/C:/path%20to/jar4")
+
s"hdfs:/jar1,file:/jar2,file:$cwd/jar3,file:/C:/pi.py%23py.pi,file:/C:/path%20to/jar4")
--- End diff --

This test was already failing on Windows.
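
For context, a small sketch of why the expected URI changed: `#` starts a 
fragment in a URI, so a literal `#` in a file name must be percent-encoded as 
`%23` when the path is rendered as a `file:` URI.

```scala
import java.io.File

// File.toURI percent-encodes characters that are reserved in URIs, so a
// literal '#' in the file name becomes %23 rather than a fragment marker.
val uri = new File("/tmp/pi.py#py.pi").toURI
println(uri)  // file:/tmp/pi.py%23py.pi
```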


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94561930
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1485,17 +1485,18 @@ private[spark] object Utils extends Logging {
   /** Return uncompressed file length of a compressed file. */
   private def getCompressedFileLength(file: File): Long = {
 try {
-  // Uncompress .gz file to determine file size.
-  var fileSize = 0L
-  val gzInputStream = new GZIPInputStream(new FileInputStream(file))
-  val bufSize = 1024
-  val buf = new Array[Byte](bufSize)
-  var numBytes = ByteStreams.read(gzInputStream, buf, 0, bufSize)
-  while (numBytes > 0) {
-fileSize += numBytes
-numBytes = ByteStreams.read(gzInputStream, buf, 0, bufSize)
+  tryWithResource(new GZIPInputStream(new FileInputStream(file))) { 
gzInputStream =>
--- End diff --

This simply changes from

```scala
val gzInputStream = new GZIPInputStream(new FileInputStream(file))
...
```

to 

```scala
tryWithResource(new GZIPInputStream(new FileInputStream(file))) { 
gzInputStream =>
  ...
}
```
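
For reference, a minimal sketch of what a loan-pattern helper along the lines 
of `tryWithResource` does (the signature here is an assumption): the resource 
is closed even when the body throws, which also releases the underlying file 
handle on Windows.

```scala
import java.io.Closeable

// Create the resource, pass it to the body, and always close it.
def tryWithResource[R <: Closeable, T](createResource: => R)(f: R => T): T = {
  val resource = createResource
  try f(resource) finally resource.close()
}
```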


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94564561
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceiverSupervisorImpl.scala
 ---
@@ -175,6 +175,12 @@ private[streaming] class ReceiverSupervisorImpl(
   }
 
   override protected def onStop(message: String, error: Option[Throwable]) 
{
+receivedBlockHandler match {
+  case handler: WriteAheadLogBasedBlockHandler =>
+// Write ahead log should be closed.
+handler.stop()
--- End diff --

It seems closing the write ahead log was missed. This causes the test failure 
in `org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94562979
  
--- Diff: 
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala
 ---
@@ -372,7 +367,7 @@ class DirectKafkaStreamSuite
   sendData(i)
 }
 
-eventually(timeout(10 seconds), interval(50 milliseconds)) {
+eventually(timeout(20 seconds), interval(50 milliseconds)) {
--- End diff --

This test seems too flaky on Windows (at least on AppVeyor). Now it passes 
in most cases.
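
For reference, a small sketch of how ScalaTest's `eventually` behaves (the 
timing condition below is just for demonstration): the block is retried at the 
given interval until it passes or the timeout elapses, so raising the timeout 
gives slow Windows CI more headroom without changing the polling rate.

```scala
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// The block initially fails and is retried every 50 ms; it starts
// passing after ~2 seconds, well within the 20-second timeout.
val deadline = System.currentTimeMillis() + 2000
eventually(timeout(20.seconds), interval(50.milliseconds)) {
  assert(System.currentTimeMillis() > deadline)
}
```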


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94562260
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging {
 
 if (server != null) {
   server.shutdown()
+  server.awaitShutdown()
   server = null
 }
 
-brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) 
}
+// On Windows, `logDirs` is left open even after Kafka server above is 
completely shut-downed
+// in some cases. It leads to test failures on Windows if these are 
not ignored.
+brokerConf.logDirs.map(new File(_))
+  .filter(FileUtils.deleteQuietly)
+  .foreach(f => logWarning("Failed to delete: " + f.getAbsolutePath))
--- End diff --

It really looks like an issue in Kafka. The broker seems to shut down without 
closing the log directories in some cases.
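
For context, commons-io's `FileUtils.deleteQuietly` returns `false` instead of 
throwing when a delete fails, which is what enables the warn-and-continue 
pattern; a minimal sketch (the path is hypothetical):

```scala
import java.io.File
import org.apache.commons.io.FileUtils

val dir = new File("/tmp/kafka-logs-example")  // hypothetical directory
if (!FileUtils.deleteQuietly(dir)) {
  // e.g. the broker still holds a lock on a segment file on Windows
  println(s"Failed to delete: ${dir.getAbsolutePath}")
}
```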


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94563476
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -222,25 +223,34 @@ case class LoadDataCommand(
 val loadPath =
   if (isLocal) {
 val uri = Utils.resolveURI(path)
-val filePath = uri.getPath()
-val exists = if (filePath.contains("*")) {
+val file = new File(uri.getPath)
+val exists = if (file.getAbsolutePath.contains("*")) {
   val fileSystem = FileSystems.getDefault
-  val pathPattern = fileSystem.getPath(filePath)
-  val dir = pathPattern.getParent.toString
+  val dir = file.getParentFile.getAbsolutePath
   if (dir.contains("*")) {
 throw new AnalysisException(
   s"LOAD DATA input path allows only filename wildcard: $path")
   }
 
+  // Note that special characters such as "*" on Windows are not 
allowed as a path.
+  // Calling `WindowsFileSystem.getPath` throws an exception if 
there are in the path.
+  val dirPath = fileSystem.getPath(dir)
+  val pathPattern = new File(dirPath.toAbsolutePath.toString, 
file.getName).toURI.getPath
+  val safePathPattern = if (Utils.isWindows) {
+// On Windows, the pattern should not start with slashes for 
absolute file paths.
+pathPattern.stripPrefix("/")
--- End diff --

On Windows, both `C:\\a\\b\\c` and `C:/a/b/c` are allowed here.
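
A tiny illustration (the behavior is Windows-specific; on other platforms the 
backslash spelling is just an odd file name):

```scala
import java.nio.file.Paths

// On Windows, '/' is normalized to '\', so both spellings parse to the
// same path and compare equal.
val p1 = Paths.get("C:\\a\\b\\c")
val p2 = Paths.get("C:/a/b/c")
println(p1 == p2)  // true on Windows
```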


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94564162
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -339,10 +339,15 @@ class HiveSparkSubmitSuite
   private def runSparkSubmit(args: Seq[String]): Unit = {
 val sparkHome = sys.props.getOrElse("spark.test.home", 
fail("spark.test.home is not set!"))
 val history = ArrayBuffer.empty[String]
-val commands = Seq("./bin/spark-submit") ++ args
+val sparkSubmit = if (Utils.isWindows) {
+  new File("..\\..\\bin\\spark-submit.cmd").getAbsolutePath
+} else {
+  new File("../../bin/spark-submit").getAbsolutePath
+}
+val commands = Seq(sparkSubmit) ++ args
 val commandLine = commands.mkString("'", "' '", "'")
 
-val builder = new ProcessBuilder(commands: _*).directory(new 
File(sparkHome))
--- End diff --

`ProcessBuilder.directory` does not seem to change the working directory on 
Windows. I verified this with the code below:

```scala
import scala.io.Source
import java.lang.ProcessBuilder
import java.io.File

val sparkHome = "your-spark-home"
val process = new ProcessBuilder(".\\bin\\spark-submit.cmd").directory(new 
File(sparkHome)).start()
process.waitFor()
Source.fromInputStream(process.getInputStream()).getLines().mkString("\n")
```

This code path resembles `org.apache.spark.deploy.SparkSubmitSuite`, and the 
test code there already uses relative paths.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94562756
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -374,8 +380,15 @@ class KafkaTestUtils extends Logging {
 
 def shutdown() {
   factory.shutdown()
-  Utils.deleteRecursively(snapshotDir)
-  Utils.deleteRecursively(logDir)
+  // The directories are not closed even if the ZooKeeper server is 
shut-downed.
+  // Please see ZOOKEEPER-1844, which is fixed in 3.4.6+. It leads to 
test failures
+  // on Windows if these are not ignored.
+  if (FileUtils.deleteQuietly(snapshotDir)) {
--- End diff --

ZooKeeper does not close the directory. This seems to be fixed in 3.4.6+ (see 
https://github.com/apache/zookeeper/blob/release-3.4.6/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L161-L165).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [WIP][SPARK-18922][SQL][CORE][STREAMING][TESTS] F...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94563417
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -222,25 +223,34 @@ case class LoadDataCommand(
 val loadPath =
   if (isLocal) {
 val uri = Utils.resolveURI(path)
-val filePath = uri.getPath()
-val exists = if (filePath.contains("*")) {
+val file = new File(uri.getPath)
+val exists = if (file.getAbsolutePath.contains("*")) {
   val fileSystem = FileSystems.getDefault
-  val pathPattern = fileSystem.getPath(filePath)
-  val dir = pathPattern.getParent.toString
+  val dir = file.getParentFile.getAbsolutePath
--- End diff --

Here, it threw the exception as below:

```
java.nio.file.InvalidPathException: Illegal char <:> at index 2: 
/C:/projects/spark/target/tmp/spark-8e874658-3e0d-4622-a999-d4305954d2c1/*part-r*
```

because the leading `/` is not allowed. After converting it into the 
`C:\a\b\c` format, it then throws an exception as below:

```
java.nio.file.InvalidPathException: Illegal char <*> at index 72: 
C:\projects\spark\target\tmp\spark-2f34e61d-9951-43fe-bb7d-32248fa55b22\*part-r*
```

Special characters such as "*" are not allowed in a path on Windows, so 
calling `WindowsFileSystem.getPath` throws an exception if any are present in 
the path.

So, here, I separated the file name from the directory path.
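
For illustration, a minimal sketch of the failure mode (Windows-specific; on 
other platforms `*` is a legal path character):

```scala
import java.nio.file.{FileSystems, InvalidPathException}

// On Windows, the default filesystem rejects reserved characters such
// as '*', so a glob must be split into a directory and a file-name
// pattern before calling getPath.
try {
  FileSystems.getDefault.getPath("C:\\tmp\\*part-r*")
  println("accepted (non-Windows filesystem)")
} catch {
  case e: InvalidPathException => println(e.getMessage)
}
```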



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Hi @srowen, do you mind checking whether the changes look reasonable? (I will 
double check that the tests really pass once the runs above finish.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Now there are about 30 tests failing on Windows, which I identified via the 
AppVeyor tests 
[here](https://gist.github.com/HyukjinKwon/88a0b37cd027934bc14f3aa9f812be31) 
and am currently working on. Their causes do not look like resource or path 
related problems.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94569924
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging {
 
 if (server != null) {
   server.shutdown()
+  server.awaitShutdown()
   server = null
 }
 
-brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) 
}
+// On Windows, `logDirs` is left open even after Kafka server above is 
completely shut-downed
+// in some cases. It leads to test failures on Windows if these are 
not ignored.
+brokerConf.logDirs.map(new File(_))
--- End diff --

It really looks like an issue in Kafka. The broker seems to shut down without 
closing the log directories in some cases. These are Kafka-specific 
directories.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94575201
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 ---
@@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging {
 
 if (server != null) {
   server.shutdown()
+  server.awaitShutdown()
   server = null
 }
 
-brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) 
}
+// On Windows, `logDirs` is left open even after Kafka server above is 
completely shut-downed
+// in some cases. It leads to test failures on Windows if these are 
not ignored.
+brokerConf.logDirs.map(new File(_))
+  .filterNot(FileUtils.deleteQuietly)
--- End diff --

Ah, actually, `_.delete()` does not actually delete a directory when it is 
not empty, as below:

```
.
└── tmp
└── aa
```

```scala
scala> import java.io.File
import java.io.File

scala> new File("./tmp").delete()
res0: Boolean = false
```

I first wanted to use `Utils.deleteRecursively` but it throws an exception 
as below:

```
DirectKafkaStreamSuite:
 Exception encountered when attempting to run a suite with class name: 
org.apache.spark.streaming.kafka.DirectKafkaStreamSuite *** ABORTED *** (7 
seconds, 127 milliseconds)
   java.io.IOException: Failed to delete: 
C:\projects\spark\target\tmp\spark-d0d3eba7-4215-4e10-b40e-bb797e89338e
   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
```

when a lock is held on Windows.
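
For reference, here is a minimal sketch of the `FileUtils.deleteQuietly` 
behaviour this relies on, assuming commons-io is on the test classpath (the 
`println` is only for illustration; the test utils would use Spark's logging):

```scala
import java.io.File
import org.apache.commons.io.FileUtils

// Unlike File.delete, deleteQuietly removes directories recursively,
// and unlike Utils.deleteRecursively it never throws: a file that is
// still locked on Windows makes it return false instead of aborting.
val dir = new File("./tmp")
if (!FileUtils.deleteQuietly(dir)) {
  println(s"Could not fully delete $dir")  // illustration only
}
```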





[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94575266
  
--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala ---
@@ -374,8 +380,15 @@ class KafkaTestUtils extends Logging {
 
 def shutdown() {
   factory.shutdown()
-  Utils.deleteRecursively(snapshotDir)
-  Utils.deleteRecursively(logDir)
+  // The directories are not closed even if the ZooKeeper server is shut down.
+  // Please see ZOOKEEPER-1844, which is fixed in 3.4.6+. It leads to test failures
+  // on Windows if these are not ignored.
+  if (FileUtils.deleteQuietly(snapshotDir)) {
--- End diff --

Oh, yes. The condition should be negated with _not_.
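
For clarity, a sketch of what the negated check would look like, assuming 
`logWarning` from Spark's `Logging` trait is in scope:

```scala
// Warn only when deletion fails, i.e. when deleteQuietly returns false;
// on Windows this is expected until ZOOKEEPER-1844 (fixed in 3.4.6+).
if (!FileUtils.deleteQuietly(snapshotDir)) {
  logWarning(s"Failed to delete $snapshotDir")
}
if (!FileUtils.deleteQuietly(logDir)) {
  logWarning(s"Failed to delete $logDir")
}
```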





[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94575575
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala ---
@@ -339,10 +339,15 @@ class HiveSparkSubmitSuite
   private def runSparkSubmit(args: Seq[String]): Unit = {
 val sparkHome = sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!"))
 val history = ArrayBuffer.empty[String]
-val commands = Seq("./bin/spark-submit") ++ args
+val sparkSubmit = if (Utils.isWindows) {
--- End diff --

I think we don't have such cases anymore, judging from the rest of the errors - 
[here](https://gist.github.com/HyukjinKwon/88a0b37cd027934bc14f3aa9f812be31).

We have similar ones that fail while trying to execute `/bin/bash`, but I 
believe those are different from this case.
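
For context, a hypothetical sketch of how the platform switch started in the 
diff above could continue; the `.cmd` script name and the branch body are 
assumptions, since the excerpt is truncated:

```scala
// Hypothetical continuation: the bash launcher cannot be executed
// directly on Windows, so a .cmd counterpart would be picked instead.
// The exact script name here is an assumption, not the PR's code.
val sparkSubmit = if (Utils.isWindows) {
  new File(s"$sparkHome\\bin\\spark-submit.cmd").getAbsolutePath
} else {
  "./bin/spark-submit"
}
val commands = Seq(sparkSubmit) ++ args
```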





[GitHub] spark pull request #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix al...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16451#discussion_r94575683
  
--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala ---
@@ -138,10 +139,15 @@ class KafkaTestUtils extends Logging {
 
 if (server != null) {
   server.shutdown()
+  server.awaitShutdown()
   server = null
 }
 
-brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) }
+// On Windows, `logDirs` is left open even after the Kafka server above is completely shut down
+// in some cases. It leads to test failures on Windows if these are not ignored.
+brokerConf.logDirs.map(new File(_))
+  .filterNot(FileUtils.deleteQuietly)
--- End diff --

Should I maybe just wrap `Utils.deleteRecursively` in a try-catch?
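
Something like this minimal sketch, assuming `logError` from the `Logging` 
trait; just an illustration, not the final change:

```scala
import java.io.{File, IOException}

brokerConf.logDirs.foreach { f =>
  try {
    Utils.deleteRecursively(new File(f))
  } catch {
    // Utils.deleteRecursively throws IOException ("Failed to delete: ...")
    // when a file is still locked, which happens on Windows while the
    // broker is releasing its handles.
    case e: IOException => logError(s"Failed to delete $f", e)
  }
}
```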





[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
Let me push a small commit fixing the _"not"_ condition, mainly to retrigger 
the test.




