[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20487 Ah, yup. There are a few tests for old Pandas which were run only when the Pandas version was lower, and I rewrote them to run both when the Pandas version is lower and when Pandas is missing. Let me clarify the title and description. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
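A minimal sketch of the pattern described above, assuming a plain `unittest` setup: one test runs only when a recent enough Pandas is installed, and another runs in both the "old" and the "missing" case. The threshold `MINIMUM_PANDAS_VERSION`, the helper names and the test class are hypothetical and are not taken from the PR.

```python
import re
import unittest

# Assumed minimum version, chosen only for illustration.
MINIMUM_PANDAS_VERSION = "0.19.2"


def version_tuple(version):
    """Turn '0.19.2' into (0, 19, 2), keeping the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pandas_status(installed_version, minimum=MINIMUM_PANDAS_VERSION):
    """Classify the environment as 'missing', 'old', or 'ok'."""
    if installed_version is None:
        return "missing"
    if version_tuple(installed_version) < version_tuple(minimum):
        return "old"
    return "ok"


try:
    import pandas
    INSTALLED_PANDAS_VERSION = pandas.__version__
except ImportError:
    INSTALLED_PANDAS_VERSION = None

STATUS = pandas_status(INSTALLED_PANDAS_VERSION)


class PandasDependentTests(unittest.TestCase):
    @unittest.skipIf(STATUS != "ok",
                     "Pandas >= %s is required (status: %s)"
                     % (MINIMUM_PANDAS_VERSION, STATUS))
    def test_needs_recent_pandas(self):
        frame = pandas.DataFrame({"a": [1, 2]})
        self.assertEqual(len(frame), 2)

    @unittest.skipIf(STATUS == "ok",
                     "Only meaningful when Pandas is old or missing")
    def test_behavior_without_recent_pandas(self):
        # Runs in both the 'old' and the 'missing' case.
        self.assertIn(STATUS, ("old", "missing"))
```

Passing an explicit reason to `skipIf` makes the skip visible in the test report instead of silently dropping the test.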
[GitHub] spark issue #20515: [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20515 **[Test build #87094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87094/testReport)** for PR 20515 at commit [`b489f4a`](https://github.com/apache/spark/commit/b489f4a0d4fa25fd51d9db78bd01fc972e4e0dd4).
[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20495 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87091/ Test FAILed.
[GitHub] spark issue #20515: [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20515 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/615/ Test PASSed.
[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20495 Merged build finished. Test FAILed.
[GitHub] spark issue #20515: [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20515 Merged build finished. Test PASSed.
[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20495 **[Test build #87091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87091/testReport)** for PR 20495 at commit [`9e97db9`](https://github.com/apache/spark/commit/9e97db9da89c9d9f8bb467eb025239041b3231db). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20515: [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.da...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20515 cc @cloud-fan
[GitHub] spark pull request #20515: [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use date...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/20515 [SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.date for date type when converting Spark DataFrame to Pandas DataFrame. ## What changes were proposed in this pull request? This is a backport of #20506. In #18664, there was a change in how `DateType` is being returned to users ([line 1968 in dataframe.py](https://github.com/apache/spark/pull/18664/files#diff-6fc344560230bf0ef711bb9b5573f1faR1968)). This can cause client code which works in Spark 2.2 to fail. See [SPARK-23290](https://issues.apache.org/jira/browse/SPARK-23290?focusedCommentId=16350917&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16350917) for an example. This PR changes the conversion to use `datetime.date` for the date type, as Spark 2.2 does. ## How was this patch tested? Modified tests to fit the new behavior, plus existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-23290_2.3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20515.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20515 commit b489f4a0d4fa25fd51d9db78bd01fc972e4e0dd4 Author: Takuya UESHIN Date: 2018-02-06T06:52:25Z [SPARK-23290][SQL][PYTHON] Use datetime.date for date type when converting Spark DataFrame to Pandas DataFrame. ## What changes were proposed in this pull request? In #18664, there was a change in how `DateType` is being returned to users ([line 1968 in dataframe.py](https://github.com/apache/spark/pull/18664/files#diff-6fc344560230bf0ef711bb9b5573f1faR1968)). This can cause client code which works in Spark 2.2 to fail. 
See [SPARK-23290](https://issues.apache.org/jira/browse/SPARK-23290?focusedCommentId=16350917&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16350917) for an example. This PR changes the conversion to use `datetime.date` for the date type, as Spark 2.2 does. ## How was this patch tested? Modified tests to fit the new behavior, plus existing tests. Author: Takuya UESHIN Closes #20506 from ueshin/issues/SPARK-23290.
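To illustrate why the returned type matters to client code, here is a small standard-library sketch. It does not reproduce the Spark conversion itself, and the thread does not spell out what the pre-fix value looked like; a midnight `datetime.datetime` is used purely as an assumed stand-in for a timestamp-like value, and `as_plain_date` is a hypothetical helper.

```python
import datetime


def as_plain_date(value):
    """Return a datetime.date for either a date or a datetime input."""
    # datetime.datetime is a subclass of datetime.date, so check it first.
    if isinstance(value, datetime.datetime):
        return value.date()
    return value


date_value = datetime.date(2018, 2, 6)           # what date-based client code expects
timestamp_value = datetime.datetime(2018, 2, 6)  # assumed timestamp-like stand-in

# Code written against the date type sees different output from a timestamp.
print(date_value.isoformat())                      # 2018-02-06
print(timestamp_value.isoformat())                 # 2018-02-06T00:00:00
print(as_plain_date(timestamp_value).isoformat())  # 2018-02-06
```

Client code that formats, parses, or type-checks these values can behave differently depending on which of the two types it receives, which is the kind of breakage the description refers to.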
[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18555 Seems fine.
[GitHub] spark pull request #20495: [SPARK-23327] [SQL] Update the description and te...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20495#discussion_r166205775 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1655,15 +1655,17 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run */ // scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data.", + usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data. " + --- End diff -- We should be consistent, either `character string` vs `binary string`, or `string data` vs `binary data`.
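The distinction in the usage string (character length for strings, byte count for binary data) can be shown with plain Python. This is only an analogy for the wording under review, not Spark code; `describe_length` is a hypothetical helper.

```python
def describe_length(value):
    """Return (kind, length): characters for str, bytes for bytes."""
    if isinstance(value, bytes):
        return ("bytes", len(value))
    return ("characters", len(value))


text = "h\u00e9llo"            # five characters, one of them non-ASCII
binary = text.encode("utf-8")  # the same text as UTF-8 encoded binary data

print(describe_length(text))    # ('characters', 5)
print(describe_length(binary))  # ('bytes', 6)
```

The two results differ because the non-ASCII character occupies two bytes in UTF-8, which is why the description has to say which unit applies to which input type.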
[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20487 Looks like this PR doesn't skip the "old Pandas" tests, but rewrites them?
[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20473
[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20473 Thank you @felixcheung, @yhuai, @ueshin and @BryanCutler for reviewing this.
[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20473 Merged to master.
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20506
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20506 @ueshin can you send a new PR for 2.3? it conflicts, thanks!
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20506 LGTM, merging to master/2.3!
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19340 @mgaido91 agree that it is better to normalize centers
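As background for the point about normalizing centers, here is a small sketch in plain Python. It is not the code from the PR, and the helper names are hypothetical; it only shows that the mean of unit-length points is generally not unit length, and that rescaling the center leaves its cosine distance to any point unchanged.

```python
import math


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def normalize(vector):
    """Scale a non-zero vector to unit length."""
    length = norm(vector)
    if length == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / length for x in vector]


def cosine_distance(a, b):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm(a) * norm(b))


def mean(points):
    """Component-wise average of equally sized vectors."""
    count = len(points)
    return [sum(column) / count for column in zip(*points)]


points = [normalize(p) for p in ([1.0, 0.0], [0.0, 1.0])]
center = mean(points)            # [0.5, 0.5], shorter than unit length
unit_center = normalize(center)  # same direction, unit length

print(norm(center))
print(norm(unit_center))
print(cosine_distance(points[0], center))
print(cosine_distance(points[0], unit_center))
```

With unit-length points and centers, the cosine distance reduces to one minus a dot product, which is one plausible reason to keep centers normalized; the thread itself does not state the motivation.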
[GitHub] spark pull request #20493: [SPARK-23326][WEBUI]schedulerDelay should return ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20493
[GitHub] spark issue #20493: [SPARK-23326][WEBUI]schedulerDelay should return 0 when ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20493 thanks, merging to master/2.3!
[GitHub] spark pull request #20493: [SPARK-23326][WEBUI]schedulerDelay should return ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/20493#discussion_r166197592 --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusUtilsSuite.scala --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.status + +import java.util.Date + +import org.apache.spark.SparkFunSuite +import org.apache.spark.status.api.v1.{TaskData, TaskMetrics} + +class AppStatusUtilsSuite extends SparkFunSuite { + + test("schedulerDelay") { +val runningTask = new TaskData( --- End diff -- Yeah, I'm inclined to keep it, as these values are more realistic.
[GitHub] spark issue #20493: [SPARK-23326][WEBUI]schedulerDelay should return 0 when ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20493 LGTM
[GitHub] spark pull request #20493: [SPARK-23326][WEBUI]schedulerDelay should return ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20493#discussion_r166197254 --- Diff: core/src/test/scala/org/apache/spark/status/AppStatusUtilsSuite.scala --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.status + +import java.util.Date + +import org.apache.spark.SparkFunSuite +import org.apache.spark.status.api.v1.{TaskData, TaskMetrics} + +class AppStatusUtilsSuite extends SparkFunSuite { + + test("schedulerDelay") { +val runningTask = new TaskData( --- End diff -- Actually there are many different values between these 2 code blocks ``` +executorDeserializeTime = 5L, +executorDeserializeCpuTime = 3L, +executorRunTime = 90L, +executorCpuTime = 10L, +resultSize = 100L, +jvmGcTime = 10L, +resultSerializationTime = 2L, ``` I think it's OK to keep the code as it is.
[GitHub] spark pull request #20495: [SPARK-23327] [SQL] Update the description and te...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20495#discussion_r166196777 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1655,15 +1655,17 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run */ // scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data.", + usage = "_FUNC_(expr) - Returns the character length of `expr` or number of bytes in binary data. " + --- End diff -- Why do other places use "binary string" while here we have "binary data"?
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20506 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87092/ Test PASSed.
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20506 Merged build finished. Test PASSed.
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20506 **[Test build #87092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87092/testReport)** for PR 20506 at commit [`f151cdf`](https://github.com/apache/spark/commit/f151cdf492959d928025a51cabe9c4ba7a395460). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87087/ Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87087/testReport)** for PR 20477 at commit [`1556a9f`](https://github.com/apache/spark/commit/1556a9f782d9aed08322d222dbd9223dfe479a2a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20513 **[Test build #87093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87093/testReport)** for PR 20513 at commit [`8525b2c`](https://github.com/apache/spark/commit/8525b2c7e540991c75c8d61bfc5a8361cae78c7b).
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/614/ Test PASSed.
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20513 Merged build finished. Test PASSed.
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20513 retest this please
[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18555 cc @HyukjinKwon,
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20513 Merged build finished. Test FAILed.
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20513 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87088/ Test FAILed.
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20513 **[Test build #87088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87088/testReport)** for PR 20513 at commit [`8525b2c`](https://github.com/apache/spark/commit/8525b2c7e540991c75c8d61bfc5a8361cae78c7b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20226 After going through the changes here, I think we only need to update 2 nodes to include the table name in `nodeName`: hive table scan and in-memory table scan.
[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20485 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87086/ Test PASSed.
[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20485 Merged build finished. Test PASSed.
[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20485 **[Test build #87086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87086/testReport)** for PR 20485 at commit [`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r166193218 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -169,10 +171,12 @@ case class LogicalRDD( case class RDDScanExec( output: Seq[Attribute], rdd: RDD[InternalRow], -override val nodeName: String, +name: String, override val outputPartitioning: Partitioning = UnknownPartitioning(0), override val outputOrdering: Seq[SortOrder] = Nil) extends LeafExecNode { + override val nodeName: String = s"Scan RDD $name ${output.map(_.name).mkString("[", ",", "]")}" --- End diff -- ditto
[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r166193203 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -103,6 +103,8 @@ case class ExternalRDDScanExec[T]( override lazy val metrics = Map( "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows")) + override val nodeName: String = s"Scan ExternalRDD ${output.map(_.name).mkString("[", ",", "]")}" --- End diff -- I don't think including the output in the node name is a good idea.
[GitHub] spark pull request #20448: [SPARK-23203][SQL] make DataSourceV2Relation immu...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/20448
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20506 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/613/ Test PASSed.
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20506 Merged build finished. Test PASSed.
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20506 **[Test build #87092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87092/testReport)** for PR 20506 at commit [`f151cdf`](https://github.com/apache/spark/commit/f151cdf492959d928025a51cabe9c4ba7a395460).
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20513 LGTM pending Jenkins.
[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20513#discussion_r166192327 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -61,6 +61,9 @@ case class InMemoryTableScanExec( }) && !WholeStageCodegenExec.isTooManyFields(conf, relation.schema) } + // TODO: revisit this. Shall we always turn off whole stage codegen if the output data are rows? + override def supportCodegen: Boolean = supportsBatch --- End diff -- Yeah, we can do more perf measurement after the 2.3 release.
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20506#discussion_r166192233 --- Diff: python/pyspark/sql/tests.py --- @@ -4062,18 +4062,42 @@ def test_vectorized_udf_unsupported_types(self): with self.assertRaisesRegexp(Exception, 'Unsupported data type'): df.select(f(col('map'))).collect() -def test_vectorized_udf_null_date(self): +def test_vectorized_udf_dates(self): --- End diff -- Maybe `ArrowTests.test_toPandas_arrow_toggle`: https://github.com/apache/spark/blob/ebdbd8c4a06a4da52fc61b1dc98d6e2f2facdf9c/python/pyspark/sql/tests.py#L3461-L3464 ? In addition, I'll modify it to check against its expected Pandas DataFrame.
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20506#discussion_r166191974 --- Diff: python/pyspark/sql/types.py --- @@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema): for field in arrow_schema]) +def _correct_date_of_dataframe_from_arrow(pdf, schema): --- End diff -- Sure. I'll update it.
[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20226 After more thought, I feel it's reasonable to include table information in the node name. The UI displays `nodeName` in the plan graph, and displays `simpleString` in a pop-up window when users hover over the plan graph. Since table information is pretty important, it makes sense to display it in the plan graph instead of the pop-up window. Data Source table scan does follow this rule. ![image](https://user-images.githubusercontent.com/3182036/35843404-e432e968-0b42-11e8-8487-d00735afe3b8.png) ![image](https://user-images.githubusercontent.com/3182036/35843409-edae649a-0b42-11e8-8706-b7b5d3f3b212.png) +1 on this PR to fix the hive table scan, or any other scan nodes that don't follow this rule.
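The split between a short graph label and a longer hover text can be sketched as follows. This is a hypothetical Python model, not Spark's Scala classes: `TableScanNode`, `node_name` and `simple_string` are illustrative stand-ins for a scan operator and its `nodeName` and `simpleString`, and the label format is an assumption.

```python
class TableScanNode:
    """Hypothetical stand-in for a scan operator in a query plan."""

    def __init__(self, source, table, columns):
        self.source = source
        self.table = table
        self.columns = list(columns)

    @property
    def node_name(self):
        # Short label for the plan graph: includes the table, not the output.
        return "Scan %s %s" % (self.source, self.table)

    @property
    def simple_string(self):
        # Longer description for the hover pop-up: adds the output columns.
        return "%s [%s]" % (self.node_name, ", ".join(self.columns))


node = TableScanNode("hive", "default.sales", ["id", "amount"])
print(node.node_name)      # Scan hive default.sales
print(node.simple_string)  # Scan hive default.sales [id, amount]
```

Keeping the column list out of the short label matches the earlier review comment that the output should not be part of the node name, while the table name stays visible in the graph.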
[GitHub] spark issue #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with return ty...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20507 also cc @cloud-fan @gatorsmile @sameeragarwal
[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20419 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87084/ Test PASSed.
[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20419 Merged build finished. Test PASSed.
[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20419 **[Test build #87084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87084/testReport)** for PR 20419 at commit [`cb7a16b`](https://github.com/apache/spark/commit/cb7a16b1e1abdb7dcb45f2a18085dda0cae8c12f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20495 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/612/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20495 **[Test build #87091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87091/testReport)** for PR 20495 at commit [`9e97db9`](https://github.com/apache/spark/commit/9e97db9da89c9d9f8bb467eb025239041b3231db). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20495 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20495 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20506 @HyukjinKwon SGTM! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20506#discussion_r166189478
--- Diff: python/pyspark/sql/tests.py ---
@@ -4062,18 +4062,42 @@ def test_vectorized_udf_unsupported_types(self):
         with self.assertRaisesRegexp(Exception, 'Unsupported data type'):
             df.select(f(col('map'))).collect()
 
-    def test_vectorized_udf_null_date(self):
+    def test_vectorized_udf_dates(self):
--- End diff --
Shall we have a new test to directly verify that `toPandas` works?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20506#discussion_r166189014
--- Diff: python/pyspark/sql/types.py ---
@@ -1694,6 +1694,21 @@ def from_arrow_schema(arrow_schema):
          for field in arrow_schema])
 
 
+def _correct_date_of_dataframe_from_arrow(pdf, schema):
--- End diff --
To be consistent with other methods in this file, how about `_check_dataframe_convert_date`?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20495 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87085/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20495 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20495: [SPARK-23327] [SQL] Update the description and tests of ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20495 **[Test build #87085 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87085/testReport)** for PR 20495 at commit [`9e97db9`](https://github.com/apache/spark/commit/9e97db9da89c9d9f8bb467eb025239041b3231db). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style check issue...
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/20514 LGTM, thanks for fixing this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for date ty...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20506
I originally thought similarly, but after another look it seems better to keep this consistent with what Pandas does for now. FYI, `datetime.date` seems to map to `object` in Pandas:
```
>>> pd.Series([datetime.date(2012,1,1)])
0    2012-01-01
dtype: object
```
and it looks like it needs an explicit conversion:
```
>>> pd.Series([pd.Timestamp(datetime.date(2012,1,1))])
0   2012-01-01
dtype: datetime64[ns]
```
Given that `datetime.datetime` and `datetime.date` are not directly comparable, it seems to make sense to have a different type, at least for now. I think we can even go with it into master and then research the past discussion within Pandas after 2.3.0.
I have been reading related discussions with the Pandas devs since yesterday, and it seems we should go with `object`. For example, see `https://github.com/pandas-dev/pandas/issues/6932#issuecomment-41084598` and `https://github.com/pandas-dev/pandas/issues/4338` (I left the links in code blocks to avoid messing up links to other repos).
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
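The point above that `datetime.datetime` and `datetime.date` are not directly comparable can be checked with the standard library alone. The following is a minimal sketch, not code from the PR; the helper name `try_less_than` is made up for illustration, and the promotion via `datetime.datetime.combine` is just one possible explicit conversion.

```python
import datetime

d = datetime.date(2012, 1, 1)
dt = datetime.datetime(2012, 1, 1)


def try_less_than(a, b):
    """Return a < b, or the string 'TypeError' if Python rejects the comparison."""
    try:
        return a < b
    except TypeError:
        return "TypeError"


# Ordering a date against a datetime is rejected rather than silently coerced.
ordering_result = try_less_than(d, dt)

# Equality does not raise, but a date never equals a datetime.
equal_without_conversion = (d == dt)

# An explicit conversion (date -> datetime at midnight) makes them comparable.
promoted = datetime.datetime.combine(d, datetime.time.min)
equal_after_conversion = (promoted == dt)
```

This is consistent with keeping date columns as plain `datetime.date` objects instead of coercing them to a timestamp type implicitly.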
[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20510 **[Test build #87090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87090/testReport)** for PR 20510 at commit [`7565e29`](https://github.com/apache/spark/commit/7565e2991b022011e78b163c2a7af226c37defed). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20510 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20510 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/611/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20510 jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style check issue...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20514 **[Test build #87089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87089/testReport)** for PR 20514 at commit [`405418a`](https://github.com/apache/spark/commit/405418a1e6647e92b7c9b29fee5a0a8135546336). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style check issue...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20514 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/610/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style check issue...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20514 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style check issue...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20514 cc @sitalkedia @sameeragarwal --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20514: [SPARK-23310][CORE][FOLLOWUP] Fix Java style chec...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/20514
[SPARK-23310][CORE][FOLLOWUP] Fix Java style check issues.
## What changes were proposed in this pull request?
This is a follow-up of #20492 which broke lint-java checks. This pr fixes the lint-java issues.
```
[ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java:[79] (sizes) LineLength: Line is longer than 100 characters (found 114).
```
## How was this patch tested?
Checked manually in my local environment.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ueshin/apache-spark issues/SPARK-23310/fup1
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20514.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20514
commit 405418a1e6647e92b7c9b29fee5a0a8135546336
Author: Takuya UESHIN
Date: 2018-02-06T04:26:37Z
Fix Java style check issues.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20513 **[Test build #87088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87088/testReport)** for PR 20513 at commit [`8525b2c`](https://github.com/apache/spark/commit/8525b2c7e540991c75c8d61bfc5a8361cae78c7b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20513 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/609/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20513#discussion_r166184445
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -61,6 +61,9 @@ case class InMemoryTableScanExec(
     }) && !WholeStageCodegenExec.isTooManyFields(conf, relation.schema)
   }
 
+  // TODO: revisit this. Shall we always turn off whole stage codegen if the output data are rows?
+  override def supportCodegen: Boolean = supportsBatch
--- End diff --
In 2.4 we should look into this. My gut feeling is we don't need to enable whole stage codegen for scan nodes that output data as rows.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20513: [SPARK-23312][SQL][followup] add a config to turn off ve...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20513 @sameeragarwal @kiszk @viirya @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/20513
[SPARK-23312][SQL][followup] add a config to turn off vectorized cache reader
## What changes were proposed in this pull request?
https://github.com/apache/spark/pull/20483 tried to provide a way to turn off the new columnar cache reader, to restore the behavior in 2.2. However, even if we turn off that config, the behavior is still different from 2.2. If the output data are rows, we still enable whole stage codegen for the scan node, which differs from 2.2, so we should fix that as well.
## How was this patch tested?
existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark cache
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20513.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20513
commit 8525b2c7e540991c75c8d61bfc5a8361cae78c7b
Author: Wenchen Fan
Date: 2018-02-06T04:17:03Z
followup
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20508: [SPARK-23335][SQL] Should not convert to double w...
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/20508#discussion_r166182782
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -327,6 +327,14 @@ object TypeCoercion {
       // Skip nodes who's children have not been resolved yet.
       case e if !e.childrenResolved => e
 
+      // For integralType should not convert to double which will cause precision loss.
+      case a @ BinaryArithmetic(left @ StringType(), right @ IntegralType()) =>
--- End diff --
Thanks @wangyum, it will return `NULL`. I have changed it to use `DecimalType.SYSTEM_DEFAULT` instead. I considered checking the value, but I think `DecimalType.SYSTEM_DEFAULT` is enough. What do you think about this?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
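The precision loss that motivates this change is a property of 64-bit doubles and can be reproduced outside Spark. The sketch below is only an illustration in plain Python: `float` stands in for Spark's double type and `decimal.Decimal` stands in for a decimal type such as `DecimalType.SYSTEM_DEFAULT`. It is not Spark's coercion code, and the variable names are made up.

```python
from decimal import Decimal

# A string operand holding an integer just above 2**53, the largest range in
# which a 64-bit double can represent every integer exactly.
text_operand = "9007199254740993"  # 2**53 + 1
int_operand = 1

# Casting the string to a double rounds it to 2**53, and the addition then
# rounds again, so the result is off by two.
via_double = float(text_operand) + int_operand

# Casting the string to a decimal keeps every digit.
via_decimal = Decimal(text_operand) + int_operand

exact = int(text_operand) + int_operand
double_is_exact = (int(via_double) == exact)
decimal_is_exact = (int(via_decimal) == exact)
```

Here `double_is_exact` is false and `decimal_is_exact` is true, which is the kind of difference the review comment is concerned with for string-plus-integral arithmetic.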
[GitHub] spark pull request #20493: [SPARK-23326][WEBUI]schedulerDelay should return ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20493#discussion_r166181600
--- Diff: core/src/test/scala/org/apache/spark/status/AppStatusUtilsSuite.scala ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.status
+
+import java.util.Date
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.status.api.v1.{TaskData, TaskMetrics}
+
+class AppStatusUtilsSuite extends SparkFunSuite {
+
+  test("schedulerDelay") {
+    val runningTask = new TaskData(
--- End diff --
+1
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20493: [SPARK-23326][WEBUI]schedulerDelay should return ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20493#discussion_r166181455
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusUtils.scala ---
@@ -17,16 +17,23 @@
 package org.apache.spark.status
 
-import org.apache.spark.status.api.v1.{TaskData, TaskMetrics}
+import org.apache.spark.status.api.v1.TaskData
 
 private[spark] object AppStatusUtils {
 
+  private val TASK_FINISHED_STATES = Set("FAILED", "KILLED", "SUCCESS")
+
+  private def isTaskFinished(task: TaskData): Boolean = {
+    TASK_FINISHED_STATES.contains(task.status)
+  }
+
   def schedulerDelay(task: TaskData): Long = {
-    if (task.taskMetrics.isDefined && task.duration.isDefined) {
+    if (isTaskFinished(task) && task.taskMetrics.isDefined && task.duration.isDefined) {
--- End diff --
Logically `duration` should be set for running tasks, to indicate how long a task has been run. I feel it's safer to keep `task.duration.isDefined`, as we call `task.duration.get` below.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
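The guard discussed in this review comment can be paraphrased in a few lines. This is a hypothetical Python rendering of the condition in the diff, not Spark's implementation: the `TaskData` dataclass is a simplified stand-in, `Optional` fields play the role of Scala's `Option`, and `can_compute_scheduler_delay` is an illustrative name for the `if` condition only.

```python
from dataclasses import dataclass
from typing import Optional

# Mirrors the set of terminal states named in the diff.
TASK_FINISHED_STATES = {"FAILED", "KILLED", "SUCCESS"}


@dataclass
class TaskData:
    """Simplified, hypothetical stand-in for Spark's TaskData."""
    status: str
    duration: Optional[int] = None
    task_metrics: Optional[dict] = None


def is_task_finished(task: TaskData) -> bool:
    return task.status in TASK_FINISHED_STATES


def can_compute_scheduler_delay(task: TaskData) -> bool:
    # All three checks are kept: the status check excludes running tasks, and
    # the two "is defined" checks make later reads of the values safe.
    return (
        is_task_finished(task)
        and task.task_metrics is not None
        and task.duration is not None
    )


running = TaskData(status="RUNNING", duration=1200, task_metrics={})
finished = TaskData(status="SUCCESS", duration=1200, task_metrics={})
finished_no_duration = TaskData(status="SUCCESS", task_metrics={})
```

With the duration check retained, a finished task that somehow lacks a duration is rejected by the guard and never reaches an unconditional read of the missing value.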
[GitHub] spark issue #20511: [SPARK-23340][BUILD] Update ORC to 1.4.2
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20511 Thank you for review, @gatorsmile and @HyukjinKwon . Sure, this is for Apache Spark 2.4. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20510
The failure is due to a flaky test suite.
```
org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.NestedSuiteSelector)
```
jenkins, retest this please.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20506: [SPARK-23290][SQL][PYTHON] Use datetime.date for ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20506#discussion_r166179612
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -2020,8 +2021,6 @@ def _to_corrected_pandas_type(dt):
         return np.int32
     elif type(dt) == FloatType:
         return np.float32
-    elif type(dt) == DateType:
-        return 'datetime64[ns]'
--- End diff --
+1, I feel it was a bug. Maybe we can merge this to branch-2.3 only and update the migration guide in the master branch?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19340 @mgaido91 what do you think about the right follow-up here? as in your comment just above? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87087/testReport)** for PR 20477 at commit [`1556a9f`](https://github.com/apache/spark/commit/1556a9f782d9aed08322d222dbd9223dfe479a2a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/608/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20477#discussion_r166175748
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
@@ -36,11 +38,14 @@ import org.apache.spark.sql.types.StructType
  */
 case class DataSourceV2ScanExec(
     fullOutput: Seq[AttributeReference],
-    @transient reader: DataSourceReader)
+    @transient reader: DataSourceReader,
+    @transient sourceClass: Class[_ <: DataSourceV2])
   extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {
 
   override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]
 
+  override def simpleString: String = s"Scan $metadataString"
--- End diff --
I've replied on that PR. I don't think overwriting `nodeName` is the right way to fix the UI issue, as we need to overwrite more methods. We can discuss more on that PR about this problem, but it should not block this PR.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20485 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/607/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20485 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20485 **[Test build #87086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87086/testReport)** for PR 20485 at commit [`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20485 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org