[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

2018-02-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20487
  
Ah, yup. There are a few tests for old Pandas which ran only when the 
Pandas version was lower, and I rewrote them to run both when the Pandas 
version is lower and when Pandas is missing. Let me clarify the title and description.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20515: [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.da...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20515
  
**[Test build #87094 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87094/testReport)**
 for PR 20515 at commit 
[`b489f4a`](https://github.com/apache/spark/commit/b489f4a0d4fa25fd51d9db78bd01fc972e4e0dd4).


---




[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20495
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87091/
Test FAILed.


---




[GitHub] spark issue #20515: [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.da...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20515
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/615/
Test PASSed.


---




[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20495
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20515: [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.da...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20515
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20495
  
**[Test build #87091 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87091/testReport)**
 for PR 20495 at commit 
[`9e97db9`](https://github.com/apache/spark/commit/9e97db9da89c9d9f8bb467eb025239041b3231db).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20515: [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.da...

2018-02-05 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20515
  
cc @cloud-fan 


---




[GitHub] spark pull request #20515: [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use date...

2018-02-05 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/20515

[SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.date for date type 
when converting Spark DataFrame to Pandas DataFrame.

## What changes were proposed in this pull request?

This is a backport of #20506.

In #18664, there was a change in how `DateType` is being returned to users 
([line 1968 in 
dataframe.py](https://github.com/apache/spark/pull/18664/files#diff-6fc344560230bf0ef711bb9b5573f1faR1968)).
 This can cause client code which works in Spark 2.2 to fail.
See 
[SPARK-23290](https://issues.apache.org/jira/browse/SPARK-23290?focusedCommentId=16350917&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16350917)
 for an example.

This PR changes the conversion to use `datetime.date` for date type, as Spark 2.2 does.

## How was this patch tested?

Tests modified to fit the new behavior and existing tests.
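
To illustrate the kind of breakage described above, here is a minimal sketch in plain Python. `datetime.datetime` is used only as a stand-in for a timestamp-like value, and `as_date` is a hypothetical helper written for this example, not the function added in this PR.

```python
import datetime


def as_date(value):
    """Normalize a timestamp-like value to a plain datetime.date.

    None is passed through so that null dates stay null.
    """
    if value is None:
        return None
    if isinstance(value, datetime.datetime):
        return value.date()
    return value


# What client code written against Spark 2.2 would expect to receive.
expected = datetime.date(2018, 2, 5)

# Stand-in for a value that comes back as a timestamp instead of a date.
timestamp_like = datetime.datetime(2018, 2, 5)

# A datetime never compares equal to a date, so equality checks silently fail.
mismatch = (timestamp_like == expected)

# Converting back to datetime.date restores the old behavior.
restored = (as_date(timestamp_like) == expected)
```

Client code that compares or formats values as `datetime.date` is the case this sketch is meant to show; the actual failure reported in the JIRA comment may differ.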


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-23290_2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20515.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20515


commit b489f4a0d4fa25fd51d9db78bd01fc972e4e0dd4
Author: Takuya UESHIN 
Date:   2018-02-06T06:52:25Z

[SPARK-23290][SQL][PYTHON] Use datetime.date for date type when converting 
Spark DataFrame to Pandas DataFrame.

## What changes were proposed in this pull request?

In #18664, there was a change in how `DateType` is being returned to users 
([line 1968 in 
dataframe.py](https://github.com/apache/spark/pull/18664/files#diff-6fc344560230bf0ef711bb9b5573f1faR1968)).
 This can cause client code which works in Spark 2.2 to fail.
See 
[SPARK-23290](https://issues.apache.org/jira/browse/SPARK-23290?focusedCommentId=16350917&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16350917)
 for an example.

This PR changes the conversion to use `datetime.date` for date type, as Spark 2.2 does.

## How was this patch tested?

Tests modified to fit the new behavior and existing tests.

Author: Takuya UESHIN 

Closes #20506 from ueshin/issues/SPARK-23290.




---




[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2018-02-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18555
  
Seems fine.


---




[GitHub] spark pull request #20495: [SPARK-23327] [SQL] Update the description and te...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20495#discussion_r166205775
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1655,15 +1655,17 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data.",
+  usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data. " +
--- End diff --

We should be consistent, either `character string` vs `binary string`, or 
`string data` vs `binary data`.
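
The wording matters because the two cases count different things. A small sketch in plain Python (not Spark SQL; the helper name is hypothetical) of character length versus byte length:

```python
def char_and_byte_length(text, encoding="utf-8"):
    """Return (number of characters, number of encoded bytes) for `text`."""
    return len(text), len(text.encode(encoding))


# ASCII text: one byte per character in UTF-8.
ascii_lengths = char_and_byte_length("Spark")        # (5, 5)

# "caf\u00e9" has 4 characters, but the accented letter takes 2 bytes in UTF-8.
accented_lengths = char_and_byte_length("caf\u00e9")  # (4, 5)
```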


---




[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20487
  
Looks like this PR doesn't skip the "old Pandas" tests, but rewrites them?


---




[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...

2018-02-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20473


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20473
  
Thank you @felixcheung, @yhuai, @ueshin and @BryanCutler for reviewing this.


---




[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20473
  
Merged to master.


---




[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

2018-02-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20506


---




[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20506
  
@ueshin can you send a new PR for 2.3? It conflicts, thanks!


---




[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20506
  
LGTM, merging to master/2.3!


---




[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-02-05 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/19340
  
@mgaido91 agree that it is better to normalize centers 
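
A minimal sketch of what normalizing centers could look like, in plain Python rather than Spark ML. All function names here are hypothetical and this is not the implementation in the PR; it only shows that a mean of unit vectors is generally not a unit vector, so the center is rescaled after averaging.

```python
import math


def normalize(vector):
    """Scale `vector` to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def cosine_distance(a, b):
    """1 - cosine similarity, computed on unit-normalized copies of a and b."""
    ua, ub = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(ua, ub))


def normalized_center(points):
    """Average the points component-wise, then rescale the mean to unit length."""
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    return normalize(mean)


center = normalized_center([[2.0, 0.0], [0.0, 2.0]])
center_norm = math.sqrt(sum(x * x for x in center))
```

With unit-length centers, the cosine distance to a center depends only on direction, which is the property the comment above appears to be after.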


---




[GitHub] spark pull request #20493: [SPARK-23326][WEBUI]schedulerDelay should return ...

2018-02-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20493


---




[GitHub] spark issue #20493: [SPARK-23326][WEBUI]schedulerDelay should return 0 when ...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20493
  
thanks, merging to master/2.3!


---




[GitHub] spark pull request #20493: [SPARK-23326][WEBUI]schedulerDelay should return ...

2018-02-05 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/20493#discussion_r166197592
  
--- Diff: core/src/test/scala/org/apache/spark/status/AppStatusUtilsSuite.scala ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.status
+
+import java.util.Date
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.status.api.v1.{TaskData, TaskMetrics}
+
+class AppStatusUtilsSuite extends SparkFunSuite {
+
+  test("schedulerDelay") {
+val runningTask = new TaskData(
--- End diff --

Yeah, I'm inclined to keep it as is, since the values are more realistic.


---




[GitHub] spark issue #20493: [SPARK-23326][WEBUI]schedulerDelay should return 0 when ...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20493
  
LGTM


---




[GitHub] spark pull request #20493: [SPARK-23326][WEBUI]schedulerDelay should return ...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20493#discussion_r166197254
  
--- Diff: core/src/test/scala/org/apache/spark/status/AppStatusUtilsSuite.scala ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.status
+
+import java.util.Date
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.status.api.v1.{TaskData, TaskMetrics}
+
+class AppStatusUtilsSuite extends SparkFunSuite {
+
+  test("schedulerDelay") {
+val runningTask = new TaskData(
--- End diff --

Actually, many values differ between these 2 code blocks:
```
 +executorDeserializeTime = 5L,
 +executorDeserializeCpuTime = 3L,
 +executorRunTime = 90L,
 +executorCpuTime = 10L,
 +resultSize = 100L,
 +jvmGcTime = 10L,
 +resultSerializationTime = 2L,
```
I think it's OK to keep the code as it is.
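
For context, a rough sketch of how a scheduler delay could be derived from metrics like the ones listed above. The formula is an assumption made for illustration (task duration minus the executor-side components, clamped at zero, and zero when no duration is known yet); the function name, parameters, and the duration of 100 are hypothetical and not taken from the PR.

```python
def scheduler_delay(duration, run_time, deserialize_time,
                    serialization_time, getting_result_time=0):
    """Assumed formula: what is left of `duration` after executor-side work.

    Returns 0 when `duration` is None (for example, a task that has not
    finished yet) and never returns a negative value.
    """
    if duration is None:
        return 0
    remainder = (duration - run_time - deserialize_time
                 - serialization_time - getting_result_time)
    return max(0, remainder)


# Metric values from the comment above; the duration of 100 is made up.
finished = scheduler_delay(100, run_time=90, deserialize_time=5,
                           serialization_time=2)
running = scheduler_delay(None, run_time=90, deserialize_time=5,
                          serialization_time=2)
```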


---




[GitHub] spark pull request #20495: [SPARK-23327] [SQL] Update the description and te...

2018-02-05 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20495#discussion_r166196777
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -1655,15 +1655,17 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data.",
+  usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data. " +
--- End diff --

Why do other places use "binary string" while here we have "binary data"?


---




[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20506
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87092/
Test PASSed.


---




[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20506
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20506
  
**[Test build #87092 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87092/testReport)**
 for PR 20506 at commit 
[`f151cdf`](https://github.com/apache/spark/commit/f151cdf492959d928025a51cabe9c4ba7a395460).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20477
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87087/
Test PASSed.


---




[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20477
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20477
  
**[Test build #87087 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87087/testReport)**
 for PR 20477 at commit 
[`1556a9f`](https://github.com/apache/spark/commit/1556a9f782d9aed08322d222dbd9223dfe479a2a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20513
  
**[Test build #87093 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87093/testReport)**
 for PR 20513 at commit 
[`8525b2c`](https://github.com/apache/spark/commit/8525b2c7e540991c75c8d61bfc5a8361cae78c7b).


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20513
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/614/
Test PASSed.


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20513
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20513
  
retest this please


---




[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2018-02-05 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/18555
  
cc @HyukjinKwon,


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20513
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20513
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87088/
Test FAILed.


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20513
  
**[Test build #87088 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87088/testReport)**
 for PR 20513 at commit 
[`8525b2c`](https://github.com/apache/spark/commit/8525b2c7e540991c75c8d61bfc5a8361cae78c7b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20226
  
After going through the changes here, I think we only need to update 2 nodes 
to include the table name in `nodeName`: hive table scan and in-memory table scan.


---




[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20485
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87086/
Test PASSed.


---




[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20485
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20485
  
**[Test build #87086 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87086/testReport)**
 for PR 20485 at commit 
[`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20226#discussion_r166193218
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -169,10 +171,12 @@ case class LogicalRDD(
 case class RDDScanExec(
 output: Seq[Attribute],
 rdd: RDD[InternalRow],
-override val nodeName: String,
+name: String,
 override val outputPartitioning: Partitioning = UnknownPartitioning(0),
 override val outputOrdering: Seq[SortOrder] = Nil) extends LeafExecNode {
 
+  override val nodeName: String = s"Scan RDD $name ${output.map(_.name).mkString("[", ",", "]")}"
--- End diff --

ditto


---




[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20226#discussion_r166193203
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -103,6 +103,8 @@ case class ExternalRDDScanExec[T](
   override lazy val metrics = Map(
 "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))
 
+  override val nodeName: String = s"Scan ExternalRDD ${output.map(_.name).mkString("[", ",", "]")}"
--- End diff --

I don't think including the output in the node name is a good idea.


---




[GitHub] spark pull request #20448: [SPARK-23203][SQL] make DataSourceV2Relation immu...

2018-02-05 Thread cloud-fan
Github user cloud-fan closed the pull request at:

https://github.com/apache/spark/pull/20448


---




[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20506
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/613/
Test PASSed.


---




[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20506
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20506
  
**[Test build #87092 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87092/testReport)**
 for PR 20506 at commit 
[`f151cdf`](https://github.com/apache/spark/commit/f151cdf492959d928025a51cabe9c4ba7a395460).


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20513
  
LGTM pending Jenkins.


---




[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...

2018-02-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20513#discussion_r166192327
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -61,6 +61,9 @@ case class InMemoryTableScanExec(
 }) && !WholeStageCodegenExec.isTooManyFields(conf, relation.schema)
   }
 
+  // TODO: revisit this. Shall we always turn off whole stage codegen if the output data are rows?
+  override def supportCodegen: Boolean = supportsBatch
--- End diff --

Yeah, we can do more perf measurements after the 2.3 release.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

2018-02-05 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20506#discussion_r166192233
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4062,18 +4062,42 @@ def test_vectorized_udf_unsupported_types(self):
 with self.assertRaisesRegexp(Exception, 'Unsupported data type'):
 df.select(f(col('map'))).collect()
 
-def test_vectorized_udf_null_date(self):
+def test_vectorized_udf_dates(self):
--- End diff --

Maybe `ArrowTests.test_toPandas_arrow_toggle`:


https://github.com/apache/spark/blob/ebdbd8c4a06a4da52fc61b1dc98d6e2f2facdf9c/python/pyspark/sql/tests.py#L3461-L3464

?

In addition, I'll modify it to check against its expected Pandas DataFrame.


---




[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

2018-02-05 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20506#discussion_r166191974
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
  for field in arrow_schema])
 
 
+def _correct_date_of_dataframe_from_arrow(pdf, schema):
--- End diff --

Sure. I'll update it.


---




[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20226
  
After more thought, I feel it's reasonable to include table information in 
the node name.

The UI displays `nodeName` in the plan graph, and displays `simpleString` 
in a pop-up window when users hover over the plan graph. Since table 
information is pretty important, it makes sense to display it in the plan graph 
instead of the pop-up window.

The Data Source table scan already follows this rule:

![image](https://user-images.githubusercontent.com/3182036/35843404-e432e968-0b42-11e8-8487-d00735afe3b8.png)

![image](https://user-images.githubusercontent.com/3182036/35843409-edae649a-0b42-11e8-8706-b7b5d3f3b212.png)

+1 on this PR to fix the hive table scan, or any other scan nodes that 
don't follow this rule.


---




[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...

2018-02-05 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20507
  
also cc @cloud-fan @gatorsmile @sameeragarwal 


---




[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20419
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87084/
Test PASSed.


---




[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20419
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20419
  
**[Test build #87084 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87084/testReport)**
 for PR 20419 at commit 
[`cb7a16b`](https://github.com/apache/spark/commit/cb7a16b1e1abdb7dcb45f2a18085dda0cae8c12f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20495
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/612/
Test PASSed.


---




[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20495
  
**[Test build #87091 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87091/testReport)**
 for PR 20495 at commit 
[`9e97db9`](https://github.com/apache/spark/commit/9e97db9da89c9d9f8bb467eb025239041b3231db).


---




[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20495
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20495
  
retest this please


---




[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20506
  
@HyukjinKwon SGTM!


---




[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20506#discussion_r166189478
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4062,18 +4062,42 @@ def test_vectorized_udf_unsupported_types(self):
 with self.assertRaisesRegexp(Exception, 'Unsupported data type'):
 df.select(f(col('map'))).collect()
 
-def test_vectorized_udf_null_date(self):
+def test_vectorized_udf_dates(self):
--- End diff --

Shall we add a new test to directly verify that `toPandas` works?


---




[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20506#discussion_r166189014
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
  for field in arrow_schema])
 
 
+def _correct_date_of_dataframe_from_arrow(pdf, schema):
--- End diff --

To be consistent with other methods in this file, how about 
`_check_dataframe_convert_date`?


---




[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20495
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87085/
Test FAILed.


---




[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20495
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20495
  
**[Test build #87085 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87085/testReport)**
 for PR 20495 at commit 
[`9e97db9`](https://github.com/apache/spark/commit/9e97db9da89c9d9f8bb467eb025239041b3231db).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style check issue...

2018-02-05 Thread sitalkedia
Github user sitalkedia commented on the issue:

https://github.com/apache/spark/pull/20514
  
LGTM, thanks for fixing this.


---




[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...

2018-02-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20506
  
I originally thought similarly, but after another look it seems better to 
keep it consistent with what Pandas does for now. FYI, it seems that 
`datetime.date` maps to `object` in Pandas:

```
>>> pd.Series([datetime.date(2012,1,1)])
0    2012-01-01
dtype: object
```

and it looks like it needs an explicit conversion:

```
>>> pd.Series([pd.Timestamp(datetime.date(2012,1,1))])
0   2012-01-01
dtype: datetime64[ns]
```

Given that `datetime.datetime` and `datetime.date` are not directly 
comparable, it seems to make sense to have a different type, at least for now. 
I think we can even go with it in master and then research the past 
discussions within Pandas after 2.3.0.

I have been reading related discussions among the Pandas devs since 
yesterday, and it seems we should go with `object`. For example, see 
`https://github.com/pandas-dev/pandas/issues/6932#issuecomment-41084598` and 
`https://github.com/pandas-dev/pandas/issues/4338` (I put the links in code 
blocks to avoid messing up links to other repos).
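
The point about `datetime.datetime` and `datetime.date` not being directly comparable can be checked with the standard library alone. This is a minimal sketch; the variable names are illustrative and not taken from the PR:

```python
import datetime

d = datetime.date(2012, 1, 1)
dt = datetime.datetime(2012, 1, 1)

# Equality between a datetime and a plain date does not hold,
# even when both describe the same calendar day.
print(dt == d)

# Ordering comparisons raise TypeError rather than guessing a time of day.
try:
    dt < d
    ordering_error = None
except TypeError as exc:
    ordering_error = exc
print(type(ordering_error).__name__)

# An explicit conversion makes the two values comparable again.
print(dt.date() == d)
```

This mirrors the explicit `pd.Timestamp(...)` conversion shown above: the two types only line up after one of them is converted deliberately.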


---




[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20510
  
**[Test build #87090 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87090/testReport)**
 for PR 20510 at commit 
[`7565e29`](https://github.com/apache/spark/commit/7565e2991b022011e78b163c2a7af226c37defed).


---




[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20510
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20510
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/611/
Test PASSed.


---




[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-05 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20510
  
jenkins, retest this please.


---




[GitHub] spark issue #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style check issue...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20514
  
**[Test build #87089 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87089/testReport)**
 for PR 20514 at commit 
[`405418a`](https://github.com/apache/spark/commit/405418a1e6647e92b7c9b29fee5a0a8135546336).


---




[GitHub] spark issue #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style check issue...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20514
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/610/
Test PASSed.


---




[GitHub] spark issue #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style check issue...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20514
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style check issue...

2018-02-05 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20514
  
cc @sitalkedia @sameeragarwal 


---




[GitHub] spark pull request #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style chec...

2018-02-05 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/20514

[SPARK-23310][CORE][FOLLOWUP] Fix Java style check issues.

## What changes were proposed in this pull request?

This is a follow-up of #20492, which broke the lint-java checks.
This PR fixes the lint-java issues.

```
[ERROR] 
src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java:[79]
 (sizes) LineLength: Line is longer than 100 characters (found 114).
```

## How was this patch tested?

Checked manually in my local environment.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-23310/fup1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20514.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20514


commit 405418a1e6647e92b7c9b29fee5a0a8135546336
Author: Takuya UESHIN 
Date:   2018-02-06T04:26:37Z

Fix Java style check issues.




---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20513
  
**[Test build #87088 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87088/testReport)**
 for PR 20513 at commit 
[`8525b2c`](https://github.com/apache/spark/commit/8525b2c7e540991c75c8d61bfc5a8361cae78c7b).


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20513
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20513
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/609/
Test PASSed.


---




[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20513#discussion_r166184445
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala
 ---
@@ -61,6 +61,9 @@ case class InMemoryTableScanExec(
 }) && !WholeStageCodegenExec.isTooManyFields(conf, relation.schema)
   }
 
+  // TODO: revisit this. Shall we always turn off whole stage codegen if the output data are rows?
+  override def supportCodegen: Boolean = supportsBatch
--- End diff --

In 2.4 we should look into this. My gut feeling is we don't need to enable 
whole stage codegen for scan nodes that output data as rows.


---




[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20513
  
@sameeragarwal @kiszk  @viirya @gatorsmile 


---




[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...

2018-02-05 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/20513

[SPARK-23312][SQL][followup] add a config to turn off vectorized cache 
reader

## What changes were proposed in this pull request?

https://github.com/apache/spark/pull/20483 tried to provide a way to turn 
off the new columnar cache reader, to restore the behavior in 2.2. However, even 
if we turn off that config, the behavior is still different from 2.2.

If the output data are rows, we still enable whole stage codegen for the 
scan node, which differs from 2.2. We should fix that as well.

## How was this patch tested?

existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark cache

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20513.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20513


commit 8525b2c7e540991c75c8d61bfc5a8361cae78c7b
Author: Wenchen Fan 
Date:   2018-02-06T04:17:03Z

followup




---




[GitHub] spark pull request #20508: [SPARK-23335][SQL] Should not convert to double w...

2018-02-05 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20508#discussion_r166182782
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -327,6 +327,14 @@ object TypeCoercion {
   // Skip nodes who's children have not been resolved yet.
   case e if !e.childrenResolved => e
 
+  // For integralType should not convert to double which will cause precision loss.
+  case a @ BinaryArithmetic(left @ StringType(), right @ IntegralType()) =>
--- End diff --

Thanks @wangyum, it will return `NULL`.
I modified it to use `DecimalType.SYSTEM_DEFAULT` instead. 
I considered checking the value, but I think `DecimalType.SYSTEM_DEFAULT` is 
enough. What do you think about this?
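
The precision loss that motivates this change can be reproduced outside Spark. The following is a minimal sketch in plain Python, where `float` stands in for a double type and `decimal.Decimal` for a decimal type. It is an analogy only and does not exercise Spark's `TypeCoercion` rules:

```python
from decimal import Decimal

# The largest 64-bit signed integer, arriving as a string operand.
text_value = "9223372036854775807"
exact = int(text_value)

as_double = float(text_value)     # converted the way a double cast would
as_decimal = Decimal(text_value)  # converted to an exact decimal

# The double cannot hold all 19 digits, so the round trip changes the value.
print(int(as_double) == exact)

# The decimal keeps every digit, so the round trip is lossless.
print(int(as_decimal) == exact)
```

A double has a 53-bit significand, so integers near the top of the 64-bit range are rounded to the nearest representable value.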


---




[GitHub] spark pull request #20493: [SPARK-23326][WEBUI]schedulerDelay should return ...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20493#discussion_r166181600
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/AppStatusUtilsSuite.scala ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.status
+
+import java.util.Date
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.status.api.v1.{TaskData, TaskMetrics}
+
+class AppStatusUtilsSuite extends SparkFunSuite {
+
+  test("schedulerDelay") {
+val runningTask = new TaskData(
--- End diff --

+1


---




[GitHub] spark pull request #20493: [SPARK-23326][WEBUI]schedulerDelay should return ...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20493#discussion_r166181455
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusUtils.scala 
---
@@ -17,16 +17,23 @@
 
 package org.apache.spark.status
 
-import org.apache.spark.status.api.v1.{TaskData, TaskMetrics}
+import org.apache.spark.status.api.v1.TaskData
 
 private[spark] object AppStatusUtils {
 
+  private val TASK_FINISHED_STATES = Set("FAILED", "KILLED", "SUCCESS")
+
+  private def isTaskFinished(task: TaskData): Boolean = {
+TASK_FINISHED_STATES.contains(task.status)
+  }
+
   def schedulerDelay(task: TaskData): Long = {
-if (task.taskMetrics.isDefined && task.duration.isDefined) {
+if (isTaskFinished(task) && task.taskMetrics.isDefined && task.duration.isDefined) {
--- End diff --

Logically `duration` should be set for running tasks, to indicate how long 
a task has been running.

I feel it's safer to keep `task.duration.isDefined`, as we call 
`task.duration.get` below.
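
The reasoning above, checking that an optional value is present before unwrapping it, can be sketched in Python. Everything here is hypothetical: the `Task` class, its fields, the subtraction, and the choice to return 0 for unfinished tasks are illustrative stand-ins, not Spark's `TaskData` or its actual scheduler-delay formula.

```python
from dataclasses import dataclass
from typing import Optional

FINISHED_STATES = {"FAILED", "KILLED", "SUCCESS"}


@dataclass
class Task:
    # Hypothetical stand-in for a task record; not Spark's TaskData.
    status: str
    duration: Optional[int] = None
    run_time: Optional[int] = None


def scheduler_delay(task: Task) -> int:
    # Check every optional field before using it, even if one check looks
    # redundant today: the arithmetic below would fail on a missing value.
    if (task.status in FINISHED_STATES
            and task.run_time is not None
            and task.duration is not None):
        # Placeholder arithmetic, assumed for illustration only.
        return max(0, task.duration - task.run_time)
    # Assumed fallback for unfinished or incomplete tasks.
    return 0


running = Task(status="RUNNING", duration=500)
finished = Task(status="SUCCESS", duration=500, run_time=420)
print(scheduler_delay(running), scheduler_delay(finished))
```

Keeping the explicit `duration` check means the function stays safe even if a finished task ever arrives without a duration.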


---




[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2

2018-02-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20511
  
Thank you for review, @gatorsmile and @HyukjinKwon .
Sure, this is for Apache Spark 2.4.


---




[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-05 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20510
  
The failure is due to a flaky test suite.
```
org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.NestedSuiteSelector)
```

jenkins, retest this please.



---




[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20506#discussion_r166179612
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -2020,8 +2021,6 @@ def _to_corrected_pandas_type(dt):
 return np.int32
 elif type(dt) == FloatType:
 return np.float32
-elif type(dt) == DateType:
-return 'datetime64[ns]'
--- End diff --

+1, I feel it was a bug. Maybe we can merge this to branch-2.3 only and 
update the migration guide in the master branch?


---




[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-02-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19340
  
@mgaido91 what do you think is the right follow-up here? As in your 
comment just above?


---




[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20477
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20477
  
**[Test build #87087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87087/testReport)**
 for PR 20477 at commit 
[`1556a9f`](https://github.com/apache/spark/commit/1556a9f782d9aed08322d222dbd9223dfe479a2a).


---




[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20477
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/608/
Test PASSed.


---




[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20477#discussion_r166175748
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala
 ---
@@ -36,11 +38,14 @@ import org.apache.spark.sql.types.StructType
  */
 case class DataSourceV2ScanExec(
 fullOutput: Seq[AttributeReference],
-@transient reader: DataSourceReader)
+@transient reader: DataSourceReader,
+@transient sourceClass: Class[_ <: DataSourceV2])
   extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {
 
   override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]
 
+  override def simpleString: String = s"Scan $metadataString"
--- End diff --

I've replied on that PR. I don't think overriding `nodeName` is the right 
way to fix the UI issue, as we would need to override more methods. We can 
discuss this problem further on that PR, but it should not block this PR.


---




[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20485
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/607/
Test PASSed.


---




[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

2018-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20485
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

2018-02-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20485
  
**[Test build #87086 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87086/testReport)**
 for PR 20485 at commit 
[`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3).


---




[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20485
  
retest this please


---



