[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/13989 What do you mean by both positive and negative cases? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69075298 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala --- @@ -166,8 +166,8 @@ private[sql] class SessionState(sparkSession: SparkSession) { def executePlan(plan: LogicalPlan): QueryExecution = new QueryExecution(sparkSession, plan) - def invalidateTable(tableName: String): Unit = { -catalog.invalidateTable(sqlParser.parseTableIdentifier(tableName)) + def refreshTable(tableName: String): Unit = { --- End diff -- I just picked the one that was exposed to users (refresh in catalog and in sql).
[GitHub] spark issue #13603: [SPARK-15865][CORE] Blacklist should not result in job h...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/13603 LGTM!
[GitHub] spark pull request #13971: [SPARK-16289][SQL] Implement posexplode table gen...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13971#discussion_r69075261 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.unsafe.types.UTF8String + +class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { + private def checkTuple(actual: ExplodeBase, expected: Seq[InternalRow]): Unit = { +assert(actual.eval(null).toSeq === expected) --- End diff -- And, how to check the zero row? At Line 39, https://github.com/apache/spark/pull/13971/files#diff-6715134a4e95980149a7600ecb71674cR41
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69075247 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -85,5 +85,10 @@ case class LogicalRelation( expectedOutputAttributes, metastoreTableIdentifier).asInstanceOf[this.type] + override def refresh(): Unit = relation match { +case fs: HadoopFsRelation => fs.refresh() --- End diff -- I don't agree on this one. LogicalRelation might not be the only one that needs to override this in the future. There can certainly be other logical plans in the future that keep some state and need to implement refresh. The definition of "refresh" itself with a default implementation also means only plans that need to refresh anything should override it. I'm going to update refresh in LogicalPlan to make this clearer.
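The design described in this comment (a default `refresh()` that only forwards to children, overridden solely by nodes that hold state) can be sketched as follows. This is an illustration under stated assumptions, not Spark's code: `Plan`, `Project`, `Join`, `FileScan`, `LocalData`, and `refreshCount` are hypothetical stand-ins for `LogicalPlan` and its subclasses.

```scala
// Hypothetical stand-in for LogicalPlan; not Spark's real class.
abstract class Plan {
  def children: Seq[Plan]
  // Default: a node holds no state of its own, so it only forwards to its children.
  def refresh(): Unit = children.foreach(_.refresh())
}

// Stateless inner nodes inherit the default implementation.
final case class Project(child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
}

final case class Join(left: Plan, right: Plan) extends Plan {
  def children: Seq[Plan] = Seq(left, right)
}

// A stateful leaf (hypothetical): overrides refresh() because it caches something.
final class FileScan(val path: String) extends Plan {
  var refreshCount: Int = 0 // counts refreshes so the effect is observable
  def children: Seq[Plan] = Nil
  override def refresh(): Unit = refreshCount += 1
}

// A stateless leaf (hypothetical): the inherited default is a no-op for it.
final class LocalData extends Plan {
  def children: Seq[Plan] = Nil
}
```

Calling `refresh()` on the root reaches every stateful leaf exactly once, while stateless nodes need no code of their own.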
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69075198 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { s"Reference '$name' is ambiguous, could be: $referenceNames.") } } + + /** + * Invalidates any metadata cached in the plan recursively. + */ + def refresh(): Unit = children.foreach(_.refresh()) --- End diff -- I don't get it. Why would this be more expensive than any other recursive calls that happen in logical plans?
[GitHub] spark pull request #13972: [SPARK-16294][SQL] Labelling support for the incl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13972
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69074558 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { s"Reference '$name' is ambiguous, could be: $referenceNames.") } } + + /** + * Invalidates any metadata cached in the plan recursively. + */ + def refresh(): Unit = children.foreach(_.refresh()) --- End diff -- I think we want to avoid a recursive implementation if possible. It is too expensive for a large tree.
[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13972 @yinxusen Do you have time to consolidate example files for `mllib-data-types.md`?
[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13972 LGTM2. Merged into master and branch-2.0. Thanks!
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69074411 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -85,5 +85,10 @@ case class LogicalRelation( expectedOutputAttributes, metastoreTableIdentifier).asInstanceOf[this.type] + override def refresh(): Unit = relation match { +case fs: HadoopFsRelation => fs.refresh() --- End diff -- I know, but we need to write the comments for the code readers.
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69074328 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { s"Reference '$name' is ambiguous, could be: $referenceNames.") } } + + /** + * Invalidates any metadata cached in the plan recursively. + */ + def refresh(): Unit = children.foreach(_.refresh()) --- End diff -- But this function is not tail recursive.
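For context on the `@tailrec` exchange: the recursive call in `children.foreach(_.refresh())` happens inside a closure passed to `foreach`, so it is not in tail position and the annotation would be rejected by the compiler. A traversal that does satisfy `@tailrec` has to carry its own work list. The sketch below shows one way to write that; `Node` and `visitAll` are hypothetical names, and nothing in this thread indicates the pull request adopted this approach.

```scala
import scala.annotation.tailrec

// Hypothetical minimal tree node standing in for a logical plan.
final case class Node(name: String, children: Seq[Node] = Nil)

// Visits every node in pre-order using an explicit work list instead of the call stack.
// The recursive call is the last action in the method, so @tailrec compiles here.
@tailrec
def visitAll(pending: List[Node], visit: Node => Unit): Unit = pending match {
  case Nil => ()
  case head :: tail =>
    visit(head)
    // Push the children in front of the remaining work to keep pre-order.
    visitAll(head.children.toList ::: tail, visit)
}
```

The trade-off is that the work list lives on the heap, so very deep trees cannot overflow the call stack, at the cost of code that is less direct than the one-line recursive version.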
[GitHub] spark pull request #13971: [SPARK-16289][SQL] Implement posexplode table gen...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13971#discussion_r69074335 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.unsafe.types.UTF8String + +class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { + private def checkTuple(actual: ExplodeBase, expected: Seq[InternalRow]): Unit = { +assert(actual.eval(null).toSeq === expected) --- End diff -- Oh, thank you for review, @cloud-fan , too. Do we have an example of `checkEvaluation` to check the generator, multiple InternalRows? I just thought `checkEvaluation` is just for a single row, e.g., values, arrays, maps.
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69074265 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { s"Reference '$name' is ambiguous, could be: $referenceNames.") } } + + /** + * Invalidates any metadata cached in the plan recursively. + */ + def refresh(): Unit = children.foreach(_.refresh()) --- End diff -- You need to mark it.
[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13989 Test cases are not enough to cover the metadata refreshing. The current metadata cache is only used for data source tables. We still could convert Hive tables to data source tables. For example, parquet and orc. Thus, we also need to check the behaviors of these cases. Try to design more test cases for metadata refreshing, including both positive and negative cases.
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69074253 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2307,6 +2307,19 @@ class Dataset[T] private[sql]( def distinct(): Dataset[T] = dropDuplicates() /** + * Refreshes the metadata and data cached in Spark for data associated with this Dataset. + * An example use case is to invalidate the file system metadata cached by Spark, when the + * underlying files have been updated by an external process. + * + * @group action + * @since 2.0.0 + */ + def refresh(): Unit = { +unpersist(false) --- End diff -- ah ic - we can't unpersist.
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69074131 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -85,5 +85,10 @@ case class LogicalRelation( expectedOutputAttributes, metastoreTableIdentifier).asInstanceOf[this.type] + override def refresh(): Unit = relation match { +case fs: HadoopFsRelation => fs.refresh() --- End diff -- What do you mean? Other leaf nodes don't keep state, do they?
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69074039 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { s"Reference '$name' is ambiguous, could be: $referenceNames.") } } + + /** + * Invalidates any metadata cached in the plan recursively. + */ + def refresh(): Unit = children.foreach(_.refresh()) --- End diff -- This is not a tailrec?
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69073906 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala --- @@ -166,8 +166,8 @@ private[sql] class SessionState(sparkSession: SparkSession) { def executePlan(plan: LogicalPlan): QueryExecution = new QueryExecution(sparkSession, plan) - def invalidateTable(tableName: String): Unit = { -catalog.invalidateTable(sqlParser.parseTableIdentifier(tableName)) + def refreshTable(tableName: String): Unit = { --- End diff -- To be honest, I still think `invalidateTable` is the right name. We are not doing a `refresh`.
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69073454 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -85,5 +85,10 @@ case class LogicalRelation( expectedOutputAttributes, metastoreTableIdentifier).asInstanceOf[this.type] + override def refresh(): Unit = relation match { +case fs: HadoopFsRelation => fs.refresh() --- End diff -- You have to document the reason why only `LogicalRelation` overrides this function.
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69073383 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { s"Reference '$name' is ambiguous, could be: $referenceNames.") } } + + /** + * Invalidates any metadata cached in the plan recursively. + */ + def refresh(): Unit = children.foreach(_.refresh()) --- End diff -- use @tailrec
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69073191 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -139,18 +139,6 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log } def refreshTable(tableIdent: TableIdentifier): Unit = { -// refreshTable does not eagerly reload the cache. It just invalidate the cache. -// Next time when we use the table, it will be populated in the cache. -// Since we also cache ParquetRelations converted from Hive Parquet tables and -// adding converted ParquetRelations into the cache is not defined in the load function -// of the cache (instead, we add the cache entry in convertToParquetRelation), -// it is better at here to invalidate the cache to avoid confusing waring logs from the -// cache loader (e.g. cannot find data source provider, which is only defined for -// data source table.). --- End diff -- Keep the comments?
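The code comment under discussion describes an invalidate-then-lazily-reload pattern: `refreshTable` drops the cached entry, and the entry is repopulated the next time the table is used. A minimal sketch of that pattern follows, assuming a hypothetical `MetadataCache` class with a `loads` counter added only to make the behavior observable; it is not the cache that `HiveMetastoreCatalog` actually uses.

```scala
import scala.collection.mutable

// Hypothetical cache; `load` stands in for an expensive metastore or file-system lookup.
final class MetadataCache[K, V](load: K => V) {
  private val entries = mutable.Map.empty[K, V]
  var loads: Int = 0 // number of times `load` actually ran

  // Populates the entry on first use, or on first use after an invalidation.
  def get(key: K): V = entries.getOrElseUpdate(key, { loads += 1; load(key) })

  // Drops the entry without reloading it; the next get() pays the reload cost.
  def invalidate(key: K): Unit = {
    entries.remove(key)
    ()
  }
}
```

Under this pattern an invalidation is cheap, and tables that are never queried again are never reloaded, which is also the distinction behind the `invalidateTable` versus `refreshTable` naming debate elsewhere in this thread.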
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69072136 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2307,6 +2307,19 @@ class Dataset[T] private[sql]( def distinct(): Dataset[T] = dropDuplicates() /** + * Refreshes the metadata and data cached in Spark for data associated with this Dataset. + * An example use case is to invalidate the file system metadata cached by Spark, when the + * underlying files have been updated by an external process. + * + * @group action + * @since 2.0.0 + */ + def refresh(): Unit = { +unpersist(false) --- End diff -- This new API behaves differently from the `refreshTable` API and the `Refresh Table` SQL statement. See the following code: https://github.com/apache/spark/blob/02a029df43392c5d73697203bf6ff51b8d6efb83/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L349-L374 IMO, if we are using the word `refresh`, we have to make them consistent.
[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13988 Merged build finished. Test PASSed.
[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13988 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61523/ Test PASSed.
[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13988 **[Test build #61523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61523/consoleFull)** for PR 13988 at commit [`211bfb4`](https://github.com/apache/spark/commit/211bfb47acc79c51327b3f1c40aa86802470f436). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69071788 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -85,5 +85,10 @@ case class LogicalRelation( expectedOutputAttributes, metastoreTableIdentifier).asInstanceOf[this.type] + override def refresh(): Unit = relation match { +case fs: HadoopFsRelation => fs.refresh() --- End diff -- How about the other leaf nodes? `LogicalRelation` is just one of them.
[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...
Github user ScrapCodes commented on the issue: https://github.com/apache/spark/pull/13978 Looks good !
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69071622 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2307,6 +2307,19 @@ class Dataset[T] private[sql]( def distinct(): Dataset[T] = dropDuplicates() /** + * Refreshes the metadata and data cached in Spark for data associated with this Dataset. + * An example use case is to invalidate the file system metadata cached by Spark, when the + * underlying files have been updated by an external process. + * + * @group action + * @since 2.0.0 + */ + def refresh(): Unit = { +unpersist(false) --- End diff -- Other refresh methods also remove cached data, so I thought this is better.
[GitHub] spark issue #13969: [SPARK-16284][SQL] Implement reflect SQL function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13969 **[Test build #3152 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3152/consoleFull)** for PR 13969 at commit [`0e43c95`](https://github.com/apache/spark/commit/0e43c9560de9ce49953f90337e83bb30858915fc).
[GitHub] spark issue #13966: [SPARK-16276][SQL] Implement elt SQL function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13966 **[Test build #3153 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3153/consoleFull)** for PR 13966 at commit [`bbccf10`](https://github.com/apache/spark/commit/bbccf1002a6f3a0d2bf9abc8ef68465245fa4983).
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13989#discussion_r69071525 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2307,6 +2307,19 @@ class Dataset[T] private[sql]( def distinct(): Dataset[T] = dropDuplicates() /** + * Refreshes the metadata and data cached in Spark for data associated with this Dataset. + * An example use case is to invalidate the file system metadata cached by Spark, when the + * underlying files have been updated by an external process. + * + * @group action + * @since 2.0.0 + */ + def refresh(): Unit = { +unpersist(false) --- End diff -- It will remove the cached data. This is different from what the JIRA describes. CC @rxin
[GitHub] spark pull request #13966: [SPARK-16276][SQL] Implement elt SQL function
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13966#discussion_r69070865 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -162,6 +163,46 @@ case class ConcatWs(children: Seq[Expression]) } } +@ExpressionDescription( + usage = "_FUNC_(n, str1, str2, ...) - returns the n-th string", --- End diff -- updated
[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13987 **[Test build #61528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61528/consoleFull)** for PR 13987 at commit [`bd2040a`](https://github.com/apache/spark/commit/bd2040a64e80f91b8805c3dcd1e99d3dbb7e6524).
[GitHub] spark pull request #13966: [SPARK-16276][SQL] Implement elt SQL function
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13966#discussion_r69070679 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -162,6 +163,46 @@ case class ConcatWs(children: Seq[Expression]) } } +@ExpressionDescription( + usage = "_FUNC_(n, str1, str2, ...) - returns the n-th string", + extended = "> SELECT _FUNC_(1, 'scala', 'java') FROM src LIMIT 1;\n" + "'scala'") +case class Elt(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { --- End diff -- Created https://issues.apache.org/jira/browse/SPARK-16315
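The usage string quoted in this diff ("returns the n-th string") can be illustrated with a small plain-Scala sketch. This is not the Catalyst `Elt` expression from the PR: `EltSketch` is a hypothetical name, and returning `None` for an out-of-range index is an assumption modeled on MySQL's `ELT`, which yields NULL in that case.

```scala
// Plain-Scala sketch of the semantics described for `elt`; not the Catalyst expression from the PR.
object EltSketch {
  // Returns the n-th string, counting from 1.
  // Assumption: an out-of-range n yields None (modeled on MySQL's ELT returning NULL).
  def elt(n: Int, strs: String*): Option[String] =
    if (n >= 1 && n <= strs.length) Some(strs(n - 1)) else None

  def main(args: Array[String]): Unit = {
    // Mirrors the example in the quoted `extended` usage text.
    println(elt(1, "scala", "java"))
    // Index past the end of the argument list.
    println(elt(3, "scala", "java"))
  }
}
```

Using `Option` keeps the out-of-range case explicit instead of relying on a null reference.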
[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13989 I previously tried to merge `invalidateTable` and `refreshTable`. @yhuai left the following comment: https://github.com/apache/spark/pull/13156#discussion_r63729506 I think maybe we can keep them separate?
[GitHub] spark issue #13982: [SPARK-16304] LinkageError should not crash Spark execut...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13982 cc @JoshRosen and @ericl
[GitHub] spark issue #13767: [MINOR][SQL] Not dropping all necessary tables
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13767 cc: @srowen
[GitHub] spark issue #13990: [SPARK-16287][SQL][WIP] Implement str_to_map SQL functio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13990 **[Test build #61525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61525/consoleFull)** for PR 13990 at commit [`1f888ab`](https://github.com/apache/spark/commit/1f888abb532c905dac11b404819786fd2641e38f).
[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13987 **[Test build #61526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61526/consoleFull)** for PR 13987 at commit [`dbf9e58`](https://github.com/apache/spark/commit/dbf9e58bdac662721d26f3bd5ca76a2c2acdb0ee).
[GitHub] spark issue #13926: [SPARK-16229] [SQL] Drop Empty Table After CREATE TABLE ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13926 **[Test build #61527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61527/consoleFull)** for PR 13926 at commit [`c0f08a5`](https://github.com/apache/spark/commit/c0f08a518332deac260bc69c787cba06ddf9cf98).
[GitHub] spark pull request #13990: [SPARK-16287][SQL][WIP] Implement str_to_map SQL ...
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/13990

[SPARK-16287][SQL][WIP] Implement str_to_map SQL function

## What changes were proposed in this pull request?
This PR adds the `str_to_map` SQL function in order to remove the Hive fallback.

## How was this patch tested?
Pass the Jenkins tests, including the newly added ones.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/techaddict/spark SPARK-16287

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13990.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #13990

commit af59f57cecd93de49ec5bd20058199d93a9f2445 Author: Sandeep Singh Date: 2016-06-30T03:54:05Z First pass without arguments
commit dc6b1f439e32768828bdb7d1a10f8b8178fa4c13 Author: Sandeep Singh Date: 2016-06-30T04:32:54Z Add delimiter options
commit a8e6631edf6d124f218b15589427664f5b454759 Author: Sandeep Singh Date: 2016-06-30T04:36:08Z Merge master
commit 1f888abb532c905dac11b404819786fd2641e38f Author: Sandeep Singh Date: 2016-06-30T04:37:13Z merge fix
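For readers unfamiliar with `str_to_map`, the following plain-Scala sketch shows the general idea of splitting text into key/value pairs with two delimiters. It is not the code from this PR. The object and parameter names are hypothetical, and the default delimiters (`,` between pairs, `:` between key and value) are an assumption based on Hive's documented defaults. Delimiters are treated as literal strings here, which may differ from the real function.

```scala
import java.util.regex.Pattern

// Plain-Scala sketch of str_to_map-style parsing; not the Catalyst expression from the PR.
object StrToMapSketch {
  // Splits one "key<delim>value" pair; a pair without the delimiter maps the key to None.
  private def parsePair(pair: String, keyValueDelim: String): (String, Option[String]) =
    pair.split(Pattern.quote(keyValueDelim), 2) match {
      case Array(k, v) => (k, Some(v))
      case other       => (other.head, None)
    }

  // Assumption: defaults follow Hive's documented delimiters ("," and ":").
  def strToMap(
      text: String,
      pairDelim: String = ",",
      keyValueDelim: String = ":"): Map[String, Option[String]] =
    if (text.isEmpty) Map.empty
    else text.split(Pattern.quote(pairDelim)).map(parsePair(_, keyValueDelim)).toMap

  def main(args: Array[String]): Unit = {
    println(strToMap("a:1,b:2"))
    println(strToMap("a=1;b", ";", "="))
  }
}
```

The "Add delimiter options" commit suggests the delimiters are configurable in the PR as well, which is why they are parameters in this sketch.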
[GitHub] spark issue #13926: [SPARK-16229] [SQL] Drop Empty Table After CREATE TABLE ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13926 retest this please
[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13989 **[Test build #61524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61524/consoleFull)** for PR 13989 at commit [`82f9bec`](https://github.com/apache/spark/commit/82f9bec79125ad3f1c4da504891a75adb5b33f2f).
[GitHub] spark issue #13926: [SPARK-16229] [SQL] Drop Empty Table After CREATE TABLE ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13926 ping @hvanhovell Could you please take a look at this again? : )
[GitHub] spark issue #13886: [SPARK-16185] [SQL] Better Error Messages When Creating ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13886 Could you please review this PR again? @cloud-fan Thanks!
[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13989 cc @cloud-fan / @liancheng
[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/13989 cc @rxin
[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh
GitHub user petermaxlee opened a pull request:

https://github.com/apache/spark/pull/13989

[SPARK-16311][SQL] Improve metadata refresh

## What changes were proposed in this pull request?
This patch implements the 3 things specified in SPARK-16311:
(1) Append a message to the FileNotFoundException saying that a workaround is to explicitly refresh the metadata.
(2) Make metadata refresh work on temporary tables/views.
(3) Make metadata refresh work on Datasets/DataFrames, by introducing a Dataset.refresh() method.

And one additional small change:
(4) Merge invalidateTable and refreshTable.

## How was this patch tested?
Created a new test suite that creates a temporary directory and then deletes a file from it to verify Spark can read the directory once refresh is called.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/petermaxlee/spark SPARK-16311

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13989.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #13989

commit cbfbbc7d27ae086805625fa41dbcbad50783fee8 Author: petermaxlee Date: 2016-06-30T04:50:37Z [SPARK-16311][SQL] Improve metadata refresh
commit f7150345245accd0e71a351e9da9ebac9b80a520 Author: petermaxlee Date: 2016-06-30T04:53:58Z Add test suite
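The test scenario described in this PR (create a temporary directory, delete a file, then refresh) can be modeled without Spark. The sketch below is only an illustration of why a cached file listing goes stale. `CachedListing` is a hypothetical stand-in, not a Spark class, and the file names are made up.

```scala
import java.nio.file.{Files, Path}

// Toy model of a cached directory listing that goes stale until refresh() is called.
object RefreshSketch {
  // Hypothetical stand-in for a file-metadata cache; not Spark's actual class.
  final class CachedListing(dir: Path) {
    private var cached: Set[String] = list()

    // Reads the current file names directly from the file system.
    private def list(): Set[String] = {
      val stream = Files.list(dir)
      try {
        val it = stream.iterator()
        var names = Set.empty[String]
        while (it.hasNext) names += it.next().getFileName.toString
        names
      } finally stream.close()
    }

    // Returns the cached view, which may no longer match the directory.
    def files: Set[String] = cached

    // Re-lists the directory and replaces the cached view.
    def refresh(): Unit = cached = list()
  }

  // Returns (listing seen before refresh, listing seen after refresh).
  def demo(): (Set[String], Set[String]) = {
    val dir = Files.createTempDirectory("refresh-sketch")
    val a = Files.createFile(dir.resolve("part-0"))
    val b = Files.createFile(dir.resolve("part-1"))
    val listing = new CachedListing(dir)
    Files.delete(b) // simulates an external process changing the files
    val stale = listing.files
    listing.refresh()
    val fresh = listing.files
    Files.delete(a)
    Files.delete(dir)
    (stale, fresh)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

Before `refresh()` the cache still names the deleted file, which is the situation in which a reader would hit a `FileNotFoundException`.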
[GitHub] spark issue #13979: [SPARK-SPARK-16302] [SQL] Set the right number of partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61520/ Test PASSed.
[GitHub] spark issue #13979: [SPARK-SPARK-16302] [SQL] Set the right number of partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13979 Merged build finished. Test PASSed.
[GitHub] spark issue #13979: [SPARK-SPARK-16302] [SQL] Set the right number of partit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13979 **[Test build #61520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61520/consoleFull)** for PR 13979 at commit [`f49ad08`](https://github.com/apache/spark/commit/f49ad0809d84ad8b512afd4cb58ac377426b8d3e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13987: [SPARK-16313][SQL] Spark should not silently drop...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13987#discussion_r69067474 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -58,10 +56,16 @@ class ListingFileCatalog( } override protected def leafFiles: mutable.LinkedHashMap[Path, FileStatus] = { +if (cachedLeafFiles eq null) { + refresh() +} cachedLeafFiles } override protected def leafDirToChildrenFiles: Map[Path, Array[FileStatus]] = { +if (cachedLeafDirToChildrenFiles eq null) { + refresh() --- End diff -- There is a side effect. `refresh()` resets `cachedPartitionSpec` to null, which may clear already-inferred partition information.
```
override def refresh(): Unit = {
  val files = listLeafFiles(paths)
  cachedLeafFiles = new mutable.LinkedHashMap[Path, FileStatus]() ++= files.map(f => f.getPath -> f)
  cachedLeafDirToChildrenFiles = files.toArray.groupBy(_.getPath.getParent)
  cachedPartitionSpec = null
}
```
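The side effect raised in this comment can be reproduced in a toy model. The sketch below is hypothetical: field names mirror the quoted snippet, but the `Catalog` class, the `inferCount` counter, and the assumption that the partition spec can be inferred before any file listing has happened are all made up for illustration.

```scala
// Toy model of a lazily populated cache whose refresh() also wipes a second cache.
object LazyCacheSketch {
  // Hypothetical class; only the field names follow the quoted ListingFileCatalog snippet.
  final class Catalog(paths: Seq[String], listFiles: Seq[String] => Seq[String]) {
    private var cachedLeafFiles: Seq[String] = null
    private var cachedPartitionSpec: String = null
    var inferCount = 0 // counts how often the partition spec had to be inferred

    def refresh(): Unit = {
      cachedLeafFiles = listFiles(paths)
      cachedPartitionSpec = null // same reset as in the quoted refresh()
    }

    // Lazy getter in the style of the diff: populate on first access via refresh().
    def leafFiles: Seq[String] = {
      if (cachedLeafFiles eq null) refresh()
      cachedLeafFiles
    }

    // Assumption: inference here depends only on the root paths, not on the listing.
    def partitionSpec: String = {
      if (cachedPartitionSpec eq null) {
        inferCount += 1
        cachedPartitionSpec = paths.mkString("spec(", ",", ")")
      }
      cachedPartitionSpec
    }
  }

  // Returns how many times the spec was inferred across the access sequence below.
  def demo(): Int = {
    val catalog = new Catalog(Seq("/data/t"), ps => ps.map(_ + "/part-0"))
    catalog.partitionSpec // first inference
    catalog.leafFiles     // lazy refresh() silently clears the inferred spec
    catalog.partitionSpec // has to be inferred again
    catalog.inferCount
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

One way to avoid the repeated inference in such a design would be to populate the listing caches in a helper that leaves `cachedPartitionSpec` alone, and reserve the full reset for explicit `refresh()` calls. Whether that fits the PR is for the reviewers to decide.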
[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13987 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61521/ Test FAILed.
[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13987 **[Test build #61521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61521/consoleFull)** for PR 13987 at commit [`f3eb4fb`](https://github.com/apache/spark/commit/f3eb4fbac5317fe9a29b2494a6006cb92932a456). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13987 Merged build finished. Test FAILed.
[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13906 @cloud-fan Yea, that's a good point.
[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13988 I still need to correct some nits and check the consistency with the JSON data source, but I opened this just to check whether it breaks anything. I will submit some more commits soon.
[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13988 **[Test build #61523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61523/consoleFull)** for PR 13988 at commit [`211bfb4`](https://github.com/apache/spark/commit/211bfb47acc79c51327b3f1c40aa86802470f436).
[GitHub] spark pull request #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data sour...
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/13988

[WIP][SPARK-16101][SQL] Refactoring CSV data source to be consistent with JSON data source

## What changes were proposed in this pull request?
This PR refactors the CSV data source to be consistent with the JSON data source. It removes the class `CSVParser` and introduces the new classes `UnivocityParser`, `UnivocityGenerator` and `CSVUtils`, mirroring the JSON data source (`JacksonParser`, `JacksonGenerator` and `JacksonUtils`). Also, `DefaultSource` moves to `CSVRelation`, just like `JSONRelation`.

## How was this patch tested?
Existing tests should cover this.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-16101

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13988.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #13988

commit 211bfb47acc79c51327b3f1c40aa86802470f436 Author: hyukjinkwon Date: 2016-06-30T03:50:58Z Refactoring CSV data source to be consistent with JSON data source
[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13829 Merged build finished. Test PASSed.
[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13829 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61517/ Test PASSed.
[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13829 **[Test build #61517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61517/consoleFull)** for PR 13829 at commit [`943f7de`](https://github.com/apache/spark/commit/943f7de62204af5fee228e938d293e3283f4b395). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13906 @liancheng, I think we still need to keep some simple rules for unary nodes, which also help the binary cases, since the empty relation is propagated up.
[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13906#discussion_r69065541 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules._ + +/** + * Collapse plans consisting empty local relations generated by [[PruneFilters]]. + * 1. InnerJoin with one or two empty children. + * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all empty children. + * 3. Aggregate with all empty children and grpExprs containing all aggExprs. 
+ */ +object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper { + private def isEmptyLocalRelation(plan: LogicalPlan): Boolean = +plan.isInstanceOf[LocalRelation] && plan.asInstanceOf[LocalRelation].data.isEmpty + + def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case p @ Join(_, _, Inner, _) if p.children.exists(isEmptyLocalRelation) => --- End diff -- Yea, we can also add `Intersect`.
[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13906#discussion_r69065425 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules._ + +/** + * Collapse plans consisting empty local relations generated by [[PruneFilters]]. + * 1. InnerJoin with one or two empty children. + * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all empty children. + * 3. Aggregate with all empty children and grpExprs containing all aggExprs. 
+ */ +object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper { + private def isEmptyLocalRelation(plan: LogicalPlan): Boolean = +plan.isInstanceOf[LocalRelation] && plan.asInstanceOf[LocalRelation].data.isEmpty --- End diff --

```scala
plan match {
  case p: LocalRelation => p.data.isEmpty
  case _ => false
}
```
[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13906 My feeling is that this optimization rule is mostly useful for binary plan nodes like inner join and intersection, where we can avoid scanning the output of the non-empty side. For unary plan nodes, on the other hand, it firstly doesn't bring much performance benefit, especially when whole-stage codegen is enabled; secondly, there are non-obvious and tricky corner cases, like `Aggregate` and `Generate`. That said, although this patch is not a big one, it does introduce non-trivial complexity. For example, I didn't immediately realize why `Aggregate` must be special-cased (`COUNT(x)` may return 0 for empty input). The `Generate` case is even trickier. So my suggestion is to implement this rule only for inner join and intersection, which are much simpler to handle. What do you think?
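The `Aggregate` corner case mentioned above can be illustrated with a minimal sketch in plain Scala collections. This is not Spark code: `EmptyAggregateSketch`, `Row`, `globalCount`, and `groupedCount` are hypothetical names invented for illustration. The sketch assumes SQL-like semantics in which an aggregate without grouping always returns one row, while a grouped aggregate returns one row per group.

```scala
object EmptyAggregateSketch {
  // Hypothetical input row: (grouping key, nullable value x).
  type Row = (String, Option[Int])

  // Models `SELECT COUNT(x) FROM t`: a global aggregate yields exactly
  // one row, even when the input is empty (the count is then 0).
  def globalCount(rows: Seq[Row]): Seq[Long] =
    Seq(rows.count(_._2.isDefined).toLong)

  // Models `SELECT k, COUNT(x) FROM t GROUP BY k`: one row per group,
  // so an empty input yields an empty output.
  def groupedCount(rows: Seq[Row]): Seq[(String, Long)] =
    rows
      .groupBy(_._1)
      .map { case (k, rs) => (k, rs.count(_._2.isDefined).toLong) }
      .toSeq
}
```

Under these assumptions, replacing a global aggregate over an empty relation with an empty relation would drop the single `0` row, which is why such a rule cannot treat `Aggregate` like `Project` or `Filter`.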
[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13829 Merged build finished. Test PASSed.
[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13829 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61515/ Test PASSed.
[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13829 **[Test build #61515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61515/consoleFull)** for PR 13829 at commit [`4265771`](https://github.com/apache/spark/commit/42657717041b055c9a9d1266f9a29d8e39edab20). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13906#discussion_r69065025 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules._ + +/** + * Collapse plans consisting empty local relations generated by [[PruneFilters]]. + * 1. InnerJoin with one or two empty children. + * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all empty children. + * 3. Aggregate with all empty children and grpExprs containing all aggExprs. 
+ */ +object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper { + private def isEmptyLocalRelation(plan: LogicalPlan): Boolean = +plan.isInstanceOf[LocalRelation] && plan.asInstanceOf[LocalRelation].data.isEmpty + + def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case p @ Join(_, _, Inner, _) if p.children.exists(isEmptyLocalRelation) => --- End diff -- I think this rule is very useful: we can avoid scanning one join side if the other side is empty.
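The inner-join case can be sketched on a toy plan tree. The code below is plain Scala and does not use Catalyst: `Plan`, `Relation`, `InnerJoin`, and `collapse` are hypothetical stand-ins, and the bottom-up recursion only imitates what `transformUp` does in the quoted diff.

```scala
object CollapseJoinSketch {
  // Toy logical plan; these types are illustrative, not Spark's.
  sealed trait Plan
  case class Relation(name: String, rows: Seq[Int]) extends Plan
  case class InnerJoin(left: Plan, right: Plan) extends Plan

  private def isEmptyRelation(plan: Plan): Boolean = plan match {
    case Relation(_, rows) => rows.isEmpty
    case _                 => false
  }

  // Rewrites children first, then the node itself, so an empty
  // relation deep in the tree is propagated upward.
  def collapse(plan: Plan): Plan = plan match {
    case InnerJoin(l, r) =>
      val (cl, cr) = (collapse(l), collapse(r))
      // An inner join with an empty side can produce no rows at all,
      // so the other side never needs to be scanned.
      if (isEmptyRelation(cl) || isEmptyRelation(cr)) Relation("empty", Seq.empty)
      else InnerJoin(cl, cr)
    case other => other
  }
}
```

In this sketch, `collapse(InnerJoin(InnerJoin(big, none), big))` reduces to a single empty relation whenever `none` has no rows, regardless of how large `big` is.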
[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13906#discussion_r69064885 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules._ + +/** + * Collapse plans consisting empty local relations generated by [[PruneFilters]]. + * 1. InnerJoin with one or two empty children. + * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all empty children. + * 3. Aggregate with all empty children and grpExprs containing all aggExprs. 
+ */ +object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper { + private def isEmptyLocalRelation(plan: LogicalPlan): Boolean = +plan.isInstanceOf[LocalRelation] && plan.asInstanceOf[LocalRelation].data.isEmpty + + def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case p @ Join(_, _, Inner, _) if p.children.exists(isEmptyLocalRelation) => + LocalRelation(p.output, data = Seq.empty) + +case p: LogicalPlan if p.children.nonEmpty && p.children.forall(isEmptyLocalRelation) => + p match { +case _: Project | _: Generate | _: Filter | _: Sample | _: Join | + _: Sort | _: GlobalLimit | _: LocalLimit | _: Union | _: Repartition => + LocalRelation(p.output, data = Seq.empty) +case Aggregate(ge, ae, _) if ae.forall(ge.contains(_)) => --- End diff -- What exactly are we checking here? It looks to me that we can propagate the empty relation whenever the aggregate list has no aggregate function, e.g. `select col + 1 from tbl group by col` should also work.
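The difference between the two checks can be shown on a toy expression tree. This is an illustrative sketch only: `Expr`, `Col`, `Lit`, `Add`, and `Count` are hypothetical types, not Catalyst expressions, and `Count` stands in for any aggregate function.

```scala
object AggregateCheckSketch {
  // Toy expression tree; not Spark's Expression hierarchy.
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr
  case class Count(child: Expr) extends Expr // stands in for any aggregate function

  // The check in the quoted diff: every aggregate expression must
  // itself appear among the grouping expressions.
  def allInGrouping(ge: Seq[Expr], ae: Seq[Expr]): Boolean =
    ae.forall(ge.contains(_))

  // True if an aggregate function occurs anywhere inside the expression.
  def hasAggregateFunction(e: Expr): Boolean = e match {
    case Count(_)  => true
    case Add(l, r) => hasAggregateFunction(l) || hasAggregateFunction(r)
    case _         => false
  }

  // The looser check suggested in the comment: the aggregate list
  // contains no aggregate function at all.
  def noAggregateFunctions(ae: Seq[Expr]): Boolean =
    !ae.exists(hasAggregateFunction)
}
```

For `select col + 1 from tbl group by col`, the grouping list is `Seq(Col("col"))` and the aggregate list is `Seq(Add(Col("col"), Lit(1)))`. In this sketch `allInGrouping` rejects that query while `noAggregateFunctions` accepts it, which is the gap the comment points at.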
[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61522/ Test PASSed.
[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13978 Merged build finished. Test PASSed.
[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13978 **[Test build #61522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61522/consoleFull)** for PR 13978 at commit [`f440214`](https://github.com/apache/spark/commit/f440214efb0f79d3a82be45bd3d67aa6c4038fda). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61513/ Test PASSed.
[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11863 Merged build finished. Test PASSed.
[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11863 **[Test build #61513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61513/consoleFull)** for PR 11863 at commit [`cffb0e0`](https://github.com/apache/spark/commit/cffb0e0fb89808732c3ab3c1c7d83049549e2e2d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13906#discussion_r69064054 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules._ + +/** + * Collapse plans consisting empty local relations generated by [[PruneFilters]]. + * 1. InnerJoin with one or two empty children. + * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all empty children. + * 3. Aggregate with all empty children and grpExprs containing all aggExprs. 
+ */ +object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper { + private def isEmptyLocalRelation(plan: LogicalPlan): Boolean = +plan.isInstanceOf[LocalRelation] && plan.asInstanceOf[LocalRelation].data.isEmpty + + def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case p @ Join(_, _, Inner, _) if p.children.exists(isEmptyLocalRelation) => + LocalRelation(p.output, data = Seq.empty) + +case p: LogicalPlan if p.children.nonEmpty && p.children.forall(isEmptyLocalRelation) => + p match { +case _: Project | _: Generate | _: Filter | _: Sample | _: Join | --- End diff -- Actually `Generate` can't be included here. Our `Generate` also supports Hive-style UDTFs, which have weird semantics: for a UDTF `f`, after all rows have been processed, `f.close()` is called, and *more rows can be generated* within `f.close()`. This means a UDTF may generate one or more rows even if the underlying input is empty. See [here][1] and PR #5338 for more details. [1]: https://github.com/apache/spark/pull/5383/files
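The `close()` behaviour described above can be modelled with a small sketch. It uses neither Hive's UDTF classes nor Spark's `Generate`: `TableFunction`, `SumOnClose`, and `generate` are hypothetical names, and the only assumption carried over from the comment is that `close()` is called once after all rows and may still emit rows.

```scala
import scala.collection.mutable.ArrayBuffer

object UdtfCloseSketch {
  // Hypothetical stand-in for a Hive-style UDTF: output rows are
  // emitted by calling `forward`.
  trait TableFunction {
    def process(row: Int, forward: Int => Unit): Unit
    def close(forward: Int => Unit): Unit
  }

  // Emits nothing while rows are processed, then emits the running
  // total from close().
  class SumOnClose extends TableFunction {
    private var total = 0
    def process(row: Int, forward: Int => Unit): Unit = total += row
    def close(forward: Int => Unit): Unit = forward(total)
  }

  // Toy generate operator: feeds every input row to the function,
  // then calls close() exactly once, even when the input is empty.
  def generate(f: TableFunction, input: Seq[Int]): Seq[Int] = {
    val out = ArrayBuffer.empty[Int]
    val emit: Int => Unit = r => { out += r; () }
    input.foreach(row => f.process(row, emit))
    f.close(emit)
    out.toSeq
  }
}
```

Here `generate(new SumOnClose, Seq.empty)` still returns one row, so a rule that replaced this operator with an empty relation would change the result.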
[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13829 Merged build finished. Test FAILed.
[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13829 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61514/ Test FAILed.
[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13829 **[Test build #61514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61514/consoleFull)** for PR 13829 at commit [`3a831e0`](https://github.com/apache/spark/commit/3a831e03cfbe0722701a88c9bdbc164098197113). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13978 **[Test build #61522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61522/consoleFull)** for PR 13978 at commit [`f440214`](https://github.com/apache/spark/commit/f440214efb0f79d3a82be45bd3d67aa6c4038fda).
[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13987 LGTM
[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13987 **[Test build #61521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61521/consoleFull)** for PR 13987 at commit [`f3eb4fb`](https://github.com/apache/spark/commit/f3eb4fbac5317fe9a29b2494a6006cb92932a456).
[GitHub] spark pull request #13987: [SPARK-16313][SQL] Spark should not silently drop...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/13987 [SPARK-16313][SQL] Spark should not silently drop exceptions in file listing ## What changes were proposed in this pull request? Spark silently drops exceptions during file listing. This is very bad behavior because it can mask legitimate errors, and the resulting plan will silently have 0 rows. This patch changes it to not silently drop the errors. ## How was this patch tested? Manually verified. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-16313 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13987.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13987 commit f3eb4fbac5317fe9a29b2494a6006cb92932a456 Author: Reynold Xin Date: 2016-06-30T03:00:16Z [SPARK-16313][SQL] Spark should not silently drop exceptions in file listing
[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13972 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61519/ Test PASSed.
[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13972 Merged build finished. Test PASSed.
[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13972 **[Test build #61519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61519/consoleFull)** for PR 13972 at commit [`7ea9c75`](https://github.com/apache/spark/commit/7ea9c753fc8b490f2b0549b6dbb303bd0b8a573f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13979: [SPARK-SPARK-16302] [SQL] Set the right number of partit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13979 **[Test build #61520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61520/consoleFull)** for PR 13979 at commit [`f49ad08`](https://github.com/apache/spark/commit/f49ad0809d84ad8b512afd4cb58ac377426b8d3e).
[GitHub] spark issue #12384: [SPARK-14608] [ML] transformSchema needs better document...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12384 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61518/ Test PASSed.
[GitHub] spark issue #12384: [SPARK-14608] [ML] transformSchema needs better document...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12384 Merged build finished. Test PASSed.
[GitHub] spark issue #12384: [SPARK-14608] [ML] transformSchema needs better document...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12384 **[Test build #61518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61518/consoleFull)** for PR 12384 at commit [`ddbc56a`](https://github.com/apache/spark/commit/ddbc56a6cdbbd1280bd50dd55972e50f0eaa3dd5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13941: [SPARK-16249][ML] Change visibility of Object ml.cluster...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13941 Merged build finished. Test PASSed.
[GitHub] spark issue #13941: [SPARK-16249][ML] Change visibility of Object ml.cluster...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13941 **[Test build #61516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61516/consoleFull)** for PR 13941 at commit [`11a077c`](https://github.com/apache/spark/commit/11a077cd3e86c169465375c24ac50ad28801f2e2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13941: [SPARK-16249][ML] Change visibility of Object ml.cluster...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13941 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61516/ Test PASSed.
[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11863 **[Test build #3150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3150/consoleFull)** for PR 11863 at commit [`f863369`](https://github.com/apache/spark/commit/f86336951d4dd196812420e4e902f105ea95e81b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13972 @mengxr With this PR merged, I think we can also fix the [SPARK-13015 (mllib-data-types.md )](https://issues.apache.org/jira/browse/SPARK-13015) with a consolidated example file.
[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/13972 @yinxusen Thanks!
[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...
Github user yinxusen commented on the issue: https://github.com/apache/spark/pull/13972 LGTM