date:20180213

[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...

2018-02-13 Thread attilapiros

Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/20589
  
jenkins retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20556
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20556
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87439/
Test FAILed.


---


[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20556
  
**[Test build #87439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87439/testReport)** for PR 20556 at commit [`ee14cf7`](https://github.com/apache/spark/commit/ee14cf708603bd904505a110c0ca5d3607d5cdb8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...

2018-02-13 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20605
  
Also cc @tdas @marmbrus 


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20555
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20555
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87438/
Test PASSed.


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20555
  
**[Test build #87438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87438/testReport)** for PR 20555 at commit [`b6852aa`](https://github.com/apache/spark/commit/b6852aa8a7f0e053985d5b89ba4020792d648f82).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...

2018-02-13 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20424#discussion_r168086187
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 daemon = pb.start()
 
 val in = new DataInputStream(daemon.getInputStream)
-daemonPort = in.readInt()
+try {
+  daemonPort = in.readInt()
+} catch {
+  case _: EOFException =>
+throw new IOException(s"No port number in $daemonModule's 
stdout")
+}
+
+// test that the returned port number is within a valid range.
+// note: this does not cover the case where the port number
+// is arbitrary data but is also coincidentally within range
+if (daemonPort < 1 || daemonPort > 0xffff) {
+  val exceptionMessage = f"""
+   |Bad data in  $daemonModule's standard output.
+   |Expected valid port number, got 0x$daemonPort%08x.
+   |PYTHONPATH set to '$pythonPath'
+   |Python command is '${command.asScala.mkString(" ")}'
+   |One possibility is a sitecustomize.py module in your 
python installation
+   |that is printing to stdout"""
--- End diff --

But I don't think we need to add extra logic here to deal with it. Let's
just fix the error message so it follows a similar format.
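
For readers following the range check in the diff above, here is a minimal sketch of why stray output on the daemon's stdout shows up as an out-of-range port. It is written in Python rather than the Scala of the patch, and it assumes the port is read as a 4-byte big-endian signed integer, which is what Java's `DataInputStream.readInt` does. The names `read_port` and `is_valid_port` and the port 50123 are hypothetical, chosen only for illustration.

```python
import io
import struct

MAX_PORT = 0xFFFF  # highest valid TCP port (65535)


def read_port(stream):
    """Read 4 bytes as a big-endian signed int, like DataInputStream.readInt."""
    raw = stream.read(4)
    if len(raw) < 4:
        raise EOFError("no port number on the stream")
    return struct.unpack(">i", raw)[0]


def is_valid_port(port):
    """Range check only: arbitrary data that happens to fall in range still passes."""
    return 1 <= port <= MAX_PORT


# A well-behaved daemon writes only the port number (hypothetical value).
good = io.BytesIO(struct.pack(">i", 50123))
# A stray print lands in front of the port, so its first 4 bytes are read instead.
bad = io.BytesIO(b"I am a stray print\n" + struct.pack(">i", 50123))

good_port = read_port(good)
bad_port = read_port(bad)
print(good_port, is_valid_port(good_port))
print("0x%08x" % bad_port, is_valid_port(bad_port))
```

The bytes `I am` decode to 0x4920616d, which is far outside the valid range, so the check catches it. As the comment in the patch notes, a range check cannot catch garbage that happens to decode to a value between 1 and 0xffff.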


---


[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...

2018-02-13 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20424#discussion_r168081472
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 daemon = pb.start()
 
 val in = new DataInputStream(daemon.getInputStream)
-daemonPort = in.readInt()
+try {
+  daemonPort = in.readInt()
+} catch {
+  case _: EOFException =>
+throw new IOException(s"No port number in $daemonModule's 
stdout")
+}
+
+// test that the returned port number is within a valid range.
+// note: this does not cover the case where the port number
+// is arbitrary data but is also coincidentally within range
+if (daemonPort < 1 || daemonPort > 0xffff) {
+  val exceptionMessage = f"""
+   |Bad data in  $daemonModule's standard output.
+   |Expected valid port number, got 0x$daemonPort%08x.
+   |PYTHONPATH set to '$pythonPath'
+   |Python command is '${command.asScala.mkString(" ")}'
+   |One possibility is a sitecustomize.py module in your 
python installation
+   |that is printing to stdout"""
--- End diff --

Oh, I see. I meant that it might be located there; please fix up my poor
wording :). One thing I was worried about is that it might print the Python
path twice if the message contained stderr (it can be wrapped in L232-L235).

I think in most cases the Python path will actually be printed, because this
should usually throw an exception.




---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20594
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20556: [SPARK-23367][Build] Include python document styl...

2018-02-13 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20556#discussion_r168079061
  
--- Diff: dev/tox.ini ---
@@ -17,3 +17,5 @@
 ignore=E402,E731,E241,W503,E226,E722,E741,E305
 max-line-length=100
 
exclude=cloudpickle.py,heapq3.py,shared.py,python/docs/conf.py,work/*/*.py,python/.eggs/*
+[pydocstyle]

+ignore=D100,D101,D102,D103,D104,D105,D106,D107,D200,D201,D202,D203,D204,D205,D206,D207,D208,D209,D210,D211,D212,D213,D214,D215,D300,D301,D302,D400,D401,D402,D403,D404,D405,D406,D407,D408,D409,D410,D411,D412,D413,D414
--- End diff --

We need a line break at the end of the file.
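
As a small illustration of the check being asked for, the following Python sketch tests whether a file ends with a newline. The helper name `ends_with_newline` and the temporary file names are hypothetical and are not part of the PR.

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return True  # treat an empty file as acceptable
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


# Demonstrate on two throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.ini")
    bad = os.path.join(tmp, "bad.ini")
    with open(good, "wb") as f:
        f.write(b"[pydocstyle]\nignore=D100\n")
    with open(bad, "wb") as f:
        f.write(b"[pydocstyle]\nignore=D100")
    good_ok = ends_with_newline(good)
    bad_ok = ends_with_newline(bad)

print(good_ok, bad_ok)
```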


---


[GitHub] spark pull request #20556: [SPARK-23367][Build] Include python document styl...

2018-02-13 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20556#discussion_r168078916
  
--- Diff: dev/lint-python ---
@@ -21,10 +21,15 @@ SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
 # Exclude auto-generated configuration file.
 PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" )"
+DOC_PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" | grep 
-vF 'functions.py')"
--- End diff --

nit: add an extra space between `'functions.py'` and `)`.


---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20594
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87440/
Test PASSed.


---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20594
  
**[Test build #87440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87440/testReport)** for PR 20594 at commit [`3a29039`](https://github.com/apache/spark/commit/3a290392e87e6476b3a9253b902850a078dfc4ea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20556: [SPARK-23367][Build] Include python document styl...

2018-02-13 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20556#discussion_r168078994
  
--- Diff: dev/lint-python ---
@@ -21,10 +21,15 @@ SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
 # Exclude auto-generated configuration file.
 PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" )"
+DOC_PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" | grep 
-vF 'functions.py')"
 PYCODESTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-report.txt"
+PYDOCSTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pydocstyle-report.txt"
 PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt"
 PYLINT_INSTALL_INFO="$SPARK_ROOT_DIR/dev/pylint-info.txt"
+PYDOCSTYLEBUILD="pydocstyle"
+PYDOCSTYLEVERSION=$(python -c 'import pkg_resources; print 
pkg_resources.get_distribution("pydocstyle").version')
--- End diff --

`..; print(pkg_resources.get_distribution("pydocstyle").version)' 2> 
/dev/null)`
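
The suggestion has two parts: call `print(...)` with parentheses so the one-liner also runs on Python 3, and silence the error output when `pydocstyle` is not installed. A rough Python-side equivalent is sketched below. It swaps `pkg_resources` for the standard-library `importlib.metadata` (Python 3.8+), which is an assumption of this sketch and not what the script under review uses; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pydocstyle")
# Print an empty line instead of a traceback when the package is missing.
print(version if version is not None else "")
```

Returning `None` plays the same role as `2> /dev/null` in the shell version: the caller sees an empty result and can decide to skip the docstring check.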


---


[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...

2018-02-13 Thread bersprockets

Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/20424#discussion_r168080320
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 daemon = pb.start()
 
 val in = new DataInputStream(daemon.getInputStream)
-daemonPort = in.readInt()
+try {
+  daemonPort = in.readInt()
+} catch {
+  case _: EOFException =>
+throw new IOException(s"No port number in $daemonModule's 
stdout")
+}
+
+// test that the returned port number is within a valid range.
+// note: this does not cover the case where the port number
+// is arbitrary data but is also coincidentally within range
+if (daemonPort < 1 || daemonPort > 0xffff) {
+  val exceptionMessage = f"""
+   |Bad data in  $daemonModule's standard output.
+   |Expected valid port number, got 0x$daemonPort%08x.
+   |PYTHONPATH set to '$pythonPath'
+   |Python command is '${command.asScala.mkString(" ")}'
+   |One possibility is a sitecustomize.py module in your 
python installation
+   |that is printing to stdout"""
--- End diff --

Nice, except this one thing:

>This module 'sitecustomize.py' can be located in your Python path:
  
/.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/:

I display the path because you may have accidentally configured it to
include an old .zip or other incompatible versions of the expected Python
modules.

sitecustomize.py might not be in your path, although it could be. I found a
machine that had it here: /usr/lib/python2.7/sitecustomize.py. On another, I
found it here: /usr/lib64/python2.7/sitecustomize.py
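
Since the module can live in several places, one way to see which `sitecustomize` a given interpreter would pick up is to ask the import system. The sketch below assumes Python 3 (`importlib.util.find_spec`); the machines mentioned above run Python 2.7, where a different lookup would be needed. `find_sitecustomize` is a hypothetical helper name.

```python
import importlib.util
import sys


def find_sitecustomize():
    """Return the file path of the sitecustomize module Python would import, or None."""
    try:
        spec = importlib.util.find_spec("sitecustomize")
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


location = find_sitecustomize()
print("sitecustomize:", location if location else "not found")
print("sys.path entries:")
for entry in sys.path:
    print("  " + entry)
```

Running this with the same interpreter and environment that launch the daemon shows both the module location, if any, and the search path it was resolved against.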



---


[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20605
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87437/
Test PASSed.


---


[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20605
  
**[Test build #87437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87437/testReport)** for PR 20605 at commit [`b91a3af`](https://github.com/apache/spark/commit/b91a3af185fa8c953648080c24c71664b1ebe646).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.f...

2018-02-13 Thread cxzl25

Github user cxzl25 closed the pull request at:

https://github.com/apache/spark/pull/20593


---


[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...

2018-02-13 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20593
  
Thanks! Merged to 2.2.

Could you close it?


---


[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...

2018-02-13 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20593
  
Support for Hadoop 3.0 is being discussed. We do not support it yet.


---


[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...

2018-02-13 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20424#discussion_r168076901
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -17,7 +17,7 @@
 
 package org.apache.spark.api.python
 
-import java.io.{DataInputStream, DataOutputStream, InputStream, 
OutputStreamWriter}
+import java.io._
--- End diff --

BTW, don't forget to revert this import change too. This import change looks
like it was needed because of `IOException`.


---


[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...

2018-02-13 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20424#discussion_r168075968
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 daemon = pb.start()
 
 val in = new DataInputStream(daemon.getInputStream)
-daemonPort = in.readInt()
+try {
+  daemonPort = in.readInt()
+} catch {
+  case _: EOFException =>
+throw new IOException(s"No port number in $daemonModule's 
stdout")
+}
+
+// test that the returned port number is within a valid range.
+// note: this does not cover the case where the port number
+// is arbitrary data but is also coincidentally within range
+if (daemonPort < 1 || daemonPort > 0xffff) {
+  val exceptionMessage = f"""
+   |Bad data in  $daemonModule's standard output.
+   |Expected valid port number, got 0x$daemonPort%08x.
+   |PYTHONPATH set to '$pythonPath'
+   |Python command is '${command.asScala.mkString(" ")}'
+   |One possibility is a sitecustomize.py module in your 
python installation
+   |that is printing to stdout"""
+  throw new IOException(exceptionMessage.stripMargin)
--- End diff --

I think this should be a `SparkException` instead.


---


[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...

2018-02-13 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20424#discussion_r168075909
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 daemon = pb.start()
 
 val in = new DataInputStream(daemon.getInputStream)
-daemonPort = in.readInt()
+try {
+  daemonPort = in.readInt()
+} catch {
+  case _: EOFException =>
+throw new IOException(s"No port number in $daemonModule's 
stdout")
+}
+
+// test that the returned port number is within a valid range.
+// note: this does not cover the case where the port number
+// is arbitrary data but is also coincidentally within range
+if (daemonPort < 1 || daemonPort > 0xffff) {
+  val exceptionMessage = f"""
+   |Bad data in  $daemonModule's standard output.
+   |Expected valid port number, got 0x$daemonPort%08x.
+   |PYTHONPATH set to '$pythonPath'
+   |Python command is '${command.asScala.mkString(" ")}'
+   |One possibility is a sitecustomize.py module in your 
python installation
+   |that is printing to stdout"""
--- End diff --

Shall we keep the format the same as:


https://github.com/apache/spark/blob/b63abee881f2b4379f375500d51fdef706d6d512/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L232-L235

?

The current message looks like this:

```
...
Caused by: java.io.IOException:
Bad data in  pyspark.daemon's standard output.
Expected valid port number, got 0x4920616d.
PYTHONPATH set to 
'/.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/:'
Python command is 'python -m pyspark.daemon'
Check if you have a sitecustomize.py module in your python installation.
...
```

I made a suggestion while verifying this PR:


```
...
Error from bad data in pyspark.daemon's standard output. Invalid port 
number:
  1633771786 (0x6161610a)
Python command to execute the daemon was:
  python -m pyspark.daemon

One possibility is a sitecustomize module printing some data to the 
standard output.
This module 'sitecustomize.py' can be located in your Python path:
  
/.../spark/python/lib/pyspark.zip:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/assembly/target/scala-2.11/jars/spark-core_2.11-2.4.0-SNAPSHOT.jar:/.../spark/python/lib/py4j-0.10.6-src.zip:/.../spark/python/:
...
```
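
To make the proposed layout concrete, here is a rough Python sketch that assembles a message in the same shape (the actual change is in Scala). The helper `format_bad_port_message` and the path `/path/to/pyspark.zip` are hypothetical. The port is masked to 32 bits so that a negative value prints the way `%08x` does on the JVM.

```python
def format_bad_port_message(daemon_module, port, command, python_path):
    """Build an error message in the layout suggested above (hypothetical helper)."""
    # Mask to 32 bits so a negative int renders like Java's %08x (e.g. ffffffff).
    hex_port = "0x%08x" % (port & 0xFFFFFFFF)
    lines = [
        "Error from bad data in %s's standard output. Invalid port number:" % daemon_module,
        "  %d (%s)" % (port, hex_port),
        "Python command to execute the daemon was:",
        "  %s" % " ".join(command),
        "",
        "One possibility is a sitecustomize module printing some data to the standard output.",
        "This module 'sitecustomize.py' can be located in your Python path:",
        "  %s" % python_path,
    ]
    return "\n".join(lines)


message = format_bad_port_message(
    "pyspark.daemon",
    1633771786,
    ["python", "-m", "pyspark.daemon"],
    "/path/to/pyspark.zip",  # placeholder path, not taken from the thread
)
print(message)
```
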

Here is the diff I used. I also made some very picky nitpicks in it.

```diff
diff --git 
a/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala 
b/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
index 5790c050a7f..b44aa6064bb 100644
--- 
a/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
+++ 
b/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala
@@ -196,20 +196,24 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
   daemonPort = in.readInt()
 } catch {
   case _: EOFException =>
-throw new IOException(s"No port number in $daemonModule's 
stdout")
+throw new SparkException(s"No port number in $daemonModule's 
standard output.")
 }

-// test that the returned port number is within a valid range.
-// note: this does not cover the case where the port number
-// is arbitrary data but is also coincidentally within range
+// Check if the returned port number is within a valid range.
+// Note: this does not cover the case where the port number is 
arbitrary data but is
+// also coincidentally within range.
 if (daemonPort < 1 || daemonPort > 0xffff) {
   val exceptionMessage = f"""
-   |Bad data in  $daemonModule's standard output.
-   |Expected valid port number, got 0x$daemonPort%08x.
-   |PYTHONPATH set to '$pythonPath'
-   |Python command is '${command.asScala.mkString(" ")}'
-   |Check if you have a sitecustomize.py module in your python 
installation."""
-  throw new IOException(exceptionMessage.stripMargin)
+ |Error from bad data in $daemonModule's standard output. 
Invalid port number:
+ |  $daemonPort (0x$daemonPort%08x)
+ |Python command to

[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...

2018-02-13 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20424
  
Thank you @squito  .. LGTM otherwise.


---


[GitHub] spark pull request #20424: [Spark-23240][python] Better error message when e...

2018-02-13 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20424#discussion_r168075988
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -191,7 +192,26 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 daemon = pb.start()
 
 val in = new DataInputStream(daemon.getInputStream)
-daemonPort = in.readInt()
+try {
+  daemonPort = in.readInt()
+} catch {
+  case _: EOFException =>
+throw new IOException(s"No port number in $daemonModule's 
stdout")
--- End diff --

Here too, `SparkException`.


---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20594
  
**[Test build #87440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87440/testReport)** for PR 20594 at commit [`3a29039`](https://github.com/apache/spark/commit/3a290392e87e6476b3a9253b902850a078dfc4ea).


---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20594
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/888/
Test PASSed.


---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20594
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20554
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87436/
Test PASSed.


---


[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20554
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20554
  
**[Test build #87436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87436/testReport)** for PR 20554 at commit [`f9983d9`](https://github.com/apache/spark/commit/f9983d937b97b2ba9f020f370b8aefdd353a654b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20556
  
**[Test build #87439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87439/testReport)** for PR 20556 at commit [`ee14cf7`](https://github.com/apache/spark/commit/ee14cf708603bd904505a110c0ca5d3607d5cdb8).


---


[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20556
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/887/
Test PASSed.


---


[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20556
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20555
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87435/
Test PASSed.


---


[GitHub] spark issue #20556: [SPARK-23367][Build] Include python document style check...

2018-02-13 Thread rekhajoshm

Github user rekhajoshm commented on the issue:

https://github.com/apache/spark/pull/20556
  
build error unrelated to this PR. retest this please.


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20555
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20555
  
**[Test build #87435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87435/testReport)** for PR 20555 at commit [`52f4a7c`](https://github.com/apache/spark/commit/52f4a7c3e97c475b4464b82c7d8e00dcd9d889b3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20593: [SPARK-23230][SQL][BRANCH-2.2]When hive.default.fileform...

2018-02-13 Thread PandaMonkey

Github user PandaMonkey commented on the issue:

https://github.com/apache/spark/pull/20593
  
@dongjoon-hyun Hi, I have a slightly off-topic question: the latest version
of Hadoop is 3.0.0, so why does Spark still use Hadoop 2.6.5? Does Spark plan
to upgrade Hadoop from 2.6.5 to 3.0.0?
I am only a downstream user of Spark, and we have encountered some dependency
conflicts. I'm not sure whether this is the right place to ask, but if you
have an upgrade plan, I can report this in Jira.


---


[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20554
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87433/
Test PASSed.


---


[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20554
  
Build finished. Test PASSed.


---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-13 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20594
  
Because this is a quick fix, my idea is to have a surface-level patch that 
doesn't change the existing API. The approach of adding a parameter to 
`DefaultParamsWriter.saveMetadata` also sounds good to me, but the parameter 
would become useless if we get rid of this quick fix in the future.

Instead of adding a parameter, I think we can pass the `paramMap` parameter 
when calling `saveMetadata`.

For #20410 and #18982, I have a question: are they regressions? It seems to me 
they are not issues new to 2.3.




---


[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20554
  
**[Test build #87433 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87433/testReport)**
 for PR 20554 at commit 
[`f9983d9`](https://github.com/apache/spark/commit/f9983d937b97b2ba9f020f370b8aefdd353a654b).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---


[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87434/
Test PASSed.


---


[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #87434 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87434/testReport)**
 for PR 20387 at commit 
[`3b55609`](https://github.com/apache/spark/commit/3b55609b605fb461f6c2616d1da95a2d4b27ff4b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-13 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20594#discussion_r168069150
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala 
---
@@ -290,6 +293,27 @@ object Bucketizer extends 
DefaultParamsReadable[Bucketizer] {
 }
   }
 
+
+  private[Bucketizer] class BucketizerWriter(instance: Bucketizer) extends 
MLWriter {
+
+override protected def saveImpl(path: String): Unit = {
+  // SPARK-23377: The default params will be saved and loaded as 
user-supplied params.
+  // Once `inputCols` is set, the default value of `outputCol` param 
causes the error
+  // when checking exclusive params. As a temporary to fix it, we 
remove the default
+  // value of `outputCol` if `inputCols` is set before saving.
+  // TODO: If we modify the persistence mechanism later to better 
handle default params,
+  // we can get rid of this.
+  var removedOutputCol: Option[String] = None
+  if (instance.isSet(instance.inputCols)) {
--- End diff --

Yes, I think #20410 is not related to this PR for now. But I am afraid that 
in the future, when we add more functionality, potential bugs could be 
triggered.
But I think we don't need to care about the order in which they are merged. :)



---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20555
  
**[Test build #87438 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87438/testReport)**
 for PR 20555 at commit 
[`b6852aa`](https://github.com/apache/spark/commit/b6852aa8a7f0e053985d5b89ba4020792d648f82).


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20555
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/886/
Test PASSed.


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20555
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

2018-02-13 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20599
  
@mgaido91 As I said in the PR description, no regression has been found so far; 
this just provides a config to be super safe.

Actually this PR has a problem: codegen usually happens on the executor 
side, so we can't use SQLConf directly. I'll figure this out after my vacation.


---


[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-13 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20594#discussion_r168067865
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala 
---
@@ -290,6 +293,27 @@ object Bucketizer extends 
DefaultParamsReadable[Bucketizer] {
 }
   }
 
+
+  private[Bucketizer] class BucketizerWriter(instance: Bucketizer) extends 
MLWriter {
+
+override protected def saveImpl(path: String): Unit = {
+  // SPARK-23377: The default params will be saved and loaded as 
user-supplied params.
+  // Once `inputCols` is set, the default value of `outputCol` param 
causes the error
+  // when checking exclusive params. As a temporary to fix it, we 
remove the default
+  // value of `outputCol` if `inputCols` is set before saving.
+  // TODO: If we modify the persistence mechanism later to better 
handle default params,
+  // we can get rid of this.
+  var removedOutputCol: Option[String] = None
--- End diff --

yep. But I have some new thoughts, see my comments at bottom. -:)


---


[GitHub] spark issue #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-13 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/20594
  
I thought about this again. Instead of "removing the default value and 
restoring it later (which may cause some side effects)", maybe the better way 
is to add a parameter to `DefaultParamsWriter.saveMetadata` that specifies 
which default params to skip when saving.
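
As a rough illustration of the idea (not the actual `DefaultParamsWriter` code), the sketch below merges default and user-set params into the map that would be persisted, leaving out any default whose name is listed in a skip set. The class name `MetadataSketch`, the method `paramsToSave`, and the plain string maps are all hypothetical stand-ins for Spark's param machinery.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the metadata-saving step; not Spark's API.
class MetadataSketch {
  // Builds the params to persist: defaults first, minus the skipped ones,
  // then user-supplied values, which always take precedence.
  static Map<String, String> paramsToSave(
      Map<String, String> defaults,
      Map<String, String> userSet,
      Set<String> skipDefaults) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : defaults.entrySet()) {
      if (!skipDefaults.contains(e.getKey())) {
        out.put(e.getKey(), e.getValue());
      }
    }
    out.putAll(userSet);
    return out;
  }
}
```

With `outputCol` in the skip set and `inputCols` set by the user, the saved map would contain `inputCols` but no `outputCol`, so nothing needs to be removed from the instance and restored afterwards.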

@mgaido91 Yes, I agree with you. Either #20410 or #18982 needs to be merged 
into 2.3; the related issue could cause some strange bugs.

cc @jkbradley 


---


[GitHub] spark pull request #20599: [SPARK-23407][SQL] add a config to try to inline ...

2018-02-13 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20599#discussion_r168067551
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -1461,20 +1465,19 @@ object CodeGenerator extends Logging {
   
CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.update(classBytes.length)
   try {
 val cf = new ClassFile(new ByteArrayInputStream(classBytes))
-val stats = cf.methodInfos.asScala.flatMap { method =>
+cf.methodInfos.asScala.flatMap { method =>
--- End diff --

not, but a small clean up.


---


[GitHub] spark pull request #20599: [SPARK-23407][SQL] add a config to try to inline ...

2018-02-13 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20599#discussion_r168067521
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -675,6 +675,16 @@ object SQLConf {
   "disable logging or -1 to apply no limit.")
 .createWithDefault(1000)
 
+  val CODEGEN_TRY_INLINE_ALL_STATES =
+buildConf("spark.sql.codegen.tryInlineAllStates")
+.internal()
+.doc("When adding mutable states during code generation, whether or 
not we should try to " +
+  "inline all the states. If this config is false, we only try to 
inline primitive stats, " +
+  "so that primitive states are more likely to be inlined. Set this 
config to true to make " +
+  "the behavior same as Spark 2.2.")
--- End diff --

yea, let me improve it.


---


[GitHub] spark pull request #20590: [SPARK-23399][SQL] Register a task completion lis...

2018-02-13 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20590


---


[GitHub] spark issue #20590: [SPARK-23399][SQL] Register a task completion listener f...

2018-02-13 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20590
  
thanks, merging to master/2.3!


---


[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...

2018-02-13 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20555#discussion_r168066120
  
--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java 
---
@@ -230,6 +227,7 @@ private void signalAsyncReadComplete() {
 
   private void waitForAsyncReadComplete() throws IOException {
 stateChangeLock.lock();
+isWaiting.set(true);
 try {
   while (readInProgress) {
--- End diff --

shall we add a comment about spurious wakeup? Otherwise someone else may 
still mistakenly remove it in the future.
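
For context, a minimal sketch of the pattern in question: a condition wait may return without a matching signal (a spurious wakeup), so the waiter re-checks its predicate in a `while` loop rather than an `if`. The class below is a simplified, hypothetical stand-in, not the real `ReadAheadInputStream`; only the names `stateChangeLock` and `readInProgress` are borrowed from the diff, and `awaitUninterruptibly()` is used purely to keep the example free of checked exceptions.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified stand-in for the waiting logic in the diff.
class AsyncReadGate {
  private final ReentrantLock stateChangeLock = new ReentrantLock();
  private final Condition asyncReadComplete = stateChangeLock.newCondition();
  private boolean readInProgress = false;

  void startRead() {
    stateChangeLock.lock();
    try {
      readInProgress = true;
    } finally {
      stateChangeLock.unlock();
    }
  }

  void signalAsyncReadComplete() {
    stateChangeLock.lock();
    try {
      readInProgress = false;
      asyncReadComplete.signalAll();
    } finally {
      stateChangeLock.unlock();
    }
  }

  void waitForAsyncReadComplete() {
    stateChangeLock.lock();
    try {
      // Must be `while`, not `if`: the wait may return spuriously,
      // so the predicate is re-checked after every wakeup.
      while (readInProgress) {
        asyncReadComplete.awaitUninterruptibly();
      }
    } finally {
      stateChangeLock.unlock();
    }
  }

  boolean isReadInProgress() {
    stateChangeLock.lock();
    try {
      return readInProgress;
    } finally {
      stateChangeLock.unlock();
    }
  }
}
```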


---


[GitHub] spark pull request #20598: [SPARK-23406] [SS] Enable stream-stream self-join...

2018-02-13 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20598#discussion_r168064805
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala
 ---
@@ -62,7 +64,7 @@ case class StreamingRelation(dataSource: DataSource, 
sourceName: String, output:
 case class StreamingExecutionRelation(
--- End diff --

not very familiar with the streaming side, but IIRC, some of these plans 
are temporary and will be replaced before entering the analyzer, so those 
plans don't need to extend `MultiInstanceRelation`.


---


[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20605
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/885/
Test PASSed.


---


[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20605
  
**[Test build #87437 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87437/testReport)**
 for PR 20605 at commit 
[`b91a3af`](https://github.com/apache/spark/commit/b91a3af185fa8c953648080c24c71664b1ebe646).


---


[GitHub] spark pull request #20605: [SPARK-23419][SPARK-23416][SS] data source v2 wri...

2018-02-13 Thread cloud-fan

GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/20605

[SPARK-23419][SPARK-23416][SS] data source v2 write path should re-throw 
interruption exceptions directly

## What changes were proposed in this pull request?

Streaming execution has a list of exceptions that mean interruption, and 
handles them specially. `WriteToDataSourceV2Exec` should also respect this list 
and not wrap them in `SparkException`.
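
The intended behavior can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `WriteFailedException` is a hypothetical stand-in for `SparkException`, and the exception types checked in `isInterruption` are an assumed list; the real list lives in the streaming execution code and may differ.

```java
import java.io.InterruptedIOException;
import java.nio.channels.ClosedByInterruptException;

// Hypothetical sketch of "re-throw interruptions, wrap everything else".
class InterruptAwareWrite {
  // Stand-in for SparkException in this sketch.
  static class WriteFailedException extends RuntimeException {
    WriteFailedException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Assumed list of exception types that signal an interruption.
  static boolean isInterruption(Throwable t) {
    return t instanceof InterruptedException
        || t instanceof InterruptedIOException
        || t instanceof ClosedByInterruptException;
  }

  static void runWrite(Runnable writeJob) {
    try {
      writeJob.run();
    } catch (RuntimeException e) {
      // An interruption (possibly wrapped one level deep) is re-thrown
      // unchanged so the caller can still recognize it.
      if (isInterruption(e) || isInterruption(e.getCause())) {
        throw e;
      }
      // Any other failure is wrapped as before.
      throw new WriteFailedException("Writing job aborted.", e);
    }
  }
}
```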

## How was this patch tested?

existing test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark write

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20605.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20605


commit b91a3af185fa8c953648080c24c71664b1ebe646
Author: Wenchen Fan 
Date:   2018-02-14T02:18:35Z

data source v2 write path should re-throw interruption exceptions directly




---


[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...

2018-02-13 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20605
  
cc @zsxwing @jose-torres @mgaido91 


---


[GitHub] spark pull request #20602: [SPARK-23416][SS] handle streaming interrupts in ...

2018-02-13 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20602#discussion_r168060518
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
 ---
@@ -369,7 +370,11 @@ abstract class StreamExecution(
 //  exception
 // UncheckedExecutionException - thrown by codes that cannot throw 
a checked
 //   ExecutionException, such as 
BiFunction.apply
-case e2 @ (_: UncheckedIOException | _: ExecutionException | _: 
UncheckedExecutionException)
+// SparkException - thrown if the interrupt happens in the middle 
of an RPC wait
--- End diff --

does this mean the issue has nothing to do with `WriteToDataSourceV2Exec`?


---


[GitHub] spark issue #20604: [WIP][SPARK-23365][CORE] Do not adjust num executors whe...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20604
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87429/
Test PASSed.


---


[GitHub] spark issue #20604: [WIP][SPARK-23365][CORE] Do not adjust num executors whe...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20604
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20589
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87430/
Test FAILed.


---


[GitHub] spark issue #20604: [WIP][SPARK-23365][CORE] Do not adjust num executors whe...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20604
  
**[Test build #87429 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87429/testReport)**
 for PR 20604 at commit 
[`b5a39da`](https://github.com/apache/spark/commit/b5a39dab324be4bb358682720cf0f7e55272559d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20589
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20589: [SPARK-23394][UI] In RDD storage page show the executor ...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20589
  
**[Test build #87430 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87430/testReport)**
 for PR 20589 at commit 
[`cdc5168`](https://github.com/apache/spark/commit/cdc5168f721eed6b3634edd9eaaae8965b295ceb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20594: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

2018-02-13 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20594#discussion_r168057330
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala 
---
@@ -290,6 +293,27 @@ object Bucketizer extends 
DefaultParamsReadable[Bucketizer] {
 }
   }
 
+
+  private[Bucketizer] class BucketizerWriter(instance: Bucketizer) extends 
MLWriter {
+
+override protected def saveImpl(path: String): Unit = {
+  // SPARK-23377: The default params will be saved and loaded as 
user-supplied params.
+  // Once `inputCols` is set, the default value of `outputCol` param 
causes the error
+  // when checking exclusive params. As a temporary to fix it, we 
remove the default
+  // value of `outputCol` if `inputCols` is set before saving.
+  // TODO: If we modify the persistence mechanism later to better 
handle default params,
+  // we can get rid of this.
+  var removedOutputCol: Option[String] = None
+  if (instance.isSet(instance.inputCols)) {
--- End diff --

Why? I think they are orthogonal, and this shouldn't cause the issue on the 
Python side. Besides, as the PySpark multi-column support has not been added 
yet (it was reverted), I think we don't hit the Python API issue. This is a 
quick fix for the persistence bug. I'm not sure we should be blocked.


---


[GitHub] spark issue #20603: [SPARK-23418][SQL]: Fail DataSourceV2 reads when user sc...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20603
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87431/
Test FAILed.


---


[GitHub] spark issue #20596: [SPARK-23404][CORE]When the underlying buffers are direc...

2018-02-13 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20596
  
Can you please elaborate on the case that supports your fix here?


---


[GitHub] spark issue #20603: [SPARK-23418][SQL]: Fail DataSourceV2 reads when user sc...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20603
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20603: [SPARK-23418][SQL]: Fail DataSourceV2 reads when user sc...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20603
  
**[Test build #87431 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87431/testReport)**
 for PR 20603 at commit 
[`bd06193`](https://github.com/apache/spark/commit/bd06193a1f9d2a6289a1fad768904ccb017ada56).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87432/
Test FAILed.


---


[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #87432 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87432/testReport)**
 for PR 20387 at commit 
[`b8e3623`](https://github.com/apache/spark/commit/b8e3623837047949b39141e46eb96f30de8aa21e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20554
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/884/
Test PASSed.


---


[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20554
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20554
  
**[Test build #87436 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87436/testReport)**
 for PR 20554 at commit 
[`f9983d9`](https://github.com/apache/spark/commit/f9983d937b97b2ba9f020f370b8aefdd353a654b).


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20555
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/883/
Test PASSed.


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20555
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-13 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/20554
  
retest this please


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20555
  
LGTM


---


[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20555
  
**[Test build #87435 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87435/testReport)**
 for PR 20555 at commit 
[`52f4a7c`](https://github.com/apache/spark/commit/52f4a7c3e97c475b4464b82c7d8e00dcd9d889b3).


---


[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...

2018-02-13 Thread juliuszsompolski

Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/20555#discussion_r168048937
  
--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java 
---
@@ -78,9 +79,8 @@
   // whether there is a read ahead task running,
   private boolean isReading;
 
-  // If the remaining data size in the current buffer is below this 
threshold,
-  // we issue an async read from the underlying input stream.
-  private final int readAheadThresholdInBytes;
+  // whether there is a reader waiting for data.
+  private AtomicBoolean isWaiting = new AtomicBoolean(false);
--- End diff --

I'll leave it as is: it should compile to basically the same thing, and with 
`AtomicBoolean` the intent seems more readable to me.
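
The alternative being compared is not quoted in this thread; assuming it was a `volatile boolean` field, the two variants can be sketched side by side as below. For plain reads and writes, `AtomicBoolean.get()`/`set()` have the same memory effects as reading and writing a `volatile` field. The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical side-by-side of the two ways to publish a "waiting" flag.
class WaitingFlags {
  // Variant 1: AtomicBoolean, as in the diff.
  private final AtomicBoolean isWaiting = new AtomicBoolean(false);

  // Variant 2 (assumed alternative): a volatile field.
  private volatile boolean waitingVolatile = false;

  void markWaitingAtomic(boolean value) {
    isWaiting.set(value);
  }

  boolean isWaitingAtomic() {
    return isWaiting.get();
  }

  void markWaitingVolatile(boolean value) {
    waitingVolatile = value;
  }

  boolean isWaitingVolatile() {
    return waitingVolatile;
  }
}
```

`AtomicBoolean` additionally offers atomic read-modify-write operations such as `compareAndSet`, which a bare `volatile` field does not.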


---


[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...

2018-02-13 Thread juliuszsompolski

Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/20555#discussion_r168048795
  
--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java 
---
@@ -230,24 +227,32 @@ private void signalAsyncReadComplete() {
 
   private void waitForAsyncReadComplete() throws IOException {
 stateChangeLock.lock();
+isWaiting.set(true);
 try {
-  while (readInProgress) {
+  if (readInProgress) {
--- End diff --

Good catch, thanks!


---


[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-13 Thread rdblue

Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387
  
Okay, I rebased again after SPARK-23303 was reverted.


---

[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3

2018-02-13 Thread BryanCutler

Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20373
  
> Does the hijacking of the namedtuple still cause problems on Python 3.6?

I'm not too familiar with the history of this, but I ran PySpark tests that 
cover namedtuples with 3.6.3 and all passed. 


---

[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-13 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20387
  
**[Test build #87434 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87434/testReport)**
 for PR 20387 at commit 
[`3b55609`](https://github.com/apache/spark/commit/3b55609b605fb461f6c2616d1da95a2d4b27ff4b).


---

[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/882/
Test PASSed.


---

[GitHub] spark issue #20373: [SPARK-23159][PYTHON] Update cloudpickle to v0.4.3

2018-02-13 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20373
  
I think it's fine in cloudpickle, but Spark has the hijacking for regular 
pickling. I was thinking of a possible deduplicated fix, but that might have 
to be investigated separately.

Let's hold off on this for a bit until the release of 2.3.0, as it's going to 
go into master anyway (I think). The release seems to have been delayed 
unexpectedly, and we had better keep the diff between master and branch-2.3 
small for now. I will keep my eyes on this PR anyway.


---

[GitHub] spark issue #20387: [SPARK-23203][SQL]: DataSourceV2: Use immutable logical ...

2018-02-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20387
  
Merged build finished. Test PASSed.


---

[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-13 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/20477
  
Thank you very much @gatorsmile, I promise I will do a proper review of the 
streaming side when you reopen this PR.


---

[GitHub] spark pull request #20474: [SPARK-23235][Core] Add executor Threaddump to ap...

2018-02-13 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20474


---
