[GitHub] [spark] SparkQA removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
SparkQA removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-574972361 **[Test build #116810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116810/testReport)** for PR 27226 at commit [`903309f`](https://github.com/apache/spark/commit/903309fe5e7bc8e1ca69ad7f4c8e22d64acb11b3). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-575029908 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-575029908 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-575029916 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116810/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-575029916 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116810/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
SparkQA commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-575029403 **[Test build #116810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116810/testReport)** for PR 27226 at commit [`903309f`](https://github.com/apache/spark/commit/903309fe5e7bc8e1ca69ad7f4c8e22d64acb11b3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows
AmplabJenkins removed a comment on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#issuecomment-575028449 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21597/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows
AmplabJenkins removed a comment on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#issuecomment-575028437 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575028175 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116821/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use
AmplabJenkins commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#issuecomment-575028449 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21597/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575028166 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use
AmplabJenkins commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#issuecomment-575028437 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575028166 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575028175 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116821/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575028093 **[Test build #116821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116821/testReport)** for PR 27096 at commit [`9e895bd`](https://github.com/apache/spark/commit/9e895bd16162577196b83add1ef3d99bb4fc0d08).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
SparkQA removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575002718 **[Test build #116821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116821/testReport)** for PR 27096 at commit [`9e895bd`](https://github.com/apache/spark/commit/9e895bd16162577196b83add1ef3d99bb4fc0d08).
[GitHub] [spark] SparkQA commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of th
SparkQA commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#issuecomment-575027955 **[Test build #116825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116825/testReport)** for PR 27058 at commit [`a83efcf`](https://github.com/apache/spark/commit/a83efcf57021167bf9829f9f1ee2039ea9e86213).
[GitHub] [spark] imback82 commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
imback82 commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-575027510 Thanks. I will start migrating more command to the new framework this week.
[GitHub] [spark] imback82 edited a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework
imback82 edited a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework URL: https://github.com/apache/spark/pull/27187#issuecomment-575027510 Thanks. I will start migrating more commands to the new framework this week.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions
AmplabJenkins removed a comment on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions URL: https://github.com/apache/spark/pull/25827#issuecomment-575026025 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions
AmplabJenkins removed a comment on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions URL: https://github.com/apache/spark/pull/25827#issuecomment-575026030 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21596/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions
AmplabJenkins commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions URL: https://github.com/apache/spark/pull/25827#issuecomment-575026025 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions
AmplabJenkins commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions URL: https://github.com/apache/spark/pull/25827#issuecomment-575026030 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21596/ Test PASSed.
[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r367247006

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala

@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.analysis.CastSupport
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, CatalogTable, CatalogTablePartition, HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeSet, Expression, ExpressionSet, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with CastSupport {
+
+  override val conf = session.sessionState.conf
+
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def getPartitionKeyFilters(
+      filters: Seq[Expression],
+      relation: HiveTableRelation): ExpressionSet = {
+    val normalizedFilters = DataSourceStrategy.normalizeExprs(
+      filters.filter(f => f.deterministic && !SubqueryExpression.hasSubquery(f)), relation.output)
+    val partitionColumnSet = AttributeSet(relation.partitionCols)
+    ExpressionSet(normalizedFilters.filter { f =>
+      !f.references.isEmpty && f.references.subsetOf(partitionColumnSet)
+    })
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table.
+   */
+  private def prunePartitions(
+      relation: HiveTableRelation,
+      partitionFilters: ExpressionSet): Seq[CatalogTablePartition] = {
+    if (conf.metastorePartitionPruning) {

Review comment: Do you mean adding a config to control whether we prune table partitions in the optimization phase? We could check the config in apply, and it could default to true. The same config could also be checked in PruneFileSourcePartitions.apply. Is that what you expect?
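The filter selection in getPartitionKeyFilters, together with the config gate proposed in the review comment, can be sketched without a Spark dependency. This is only an illustration under stated assumptions: `Predicate`, `PartitionFilterSketch`, `filtersForPruning`, and the `enabled` flag are hypothetical stand-ins, not Catalyst classes or an existing Spark config, and the sketch skips the normalization step that DataSourceStrategy.normalizeExprs performs in the quoted diff.

```scala
// Hypothetical, simplified stand-in for a Catalyst expression (not Spark's API).
final case class Predicate(
    sql: String,                   // human-readable form, for illustration only
    references: Set[String],       // names of the columns the predicate reads
    deterministic: Boolean = true,
    hasSubquery: Boolean = false)

object PartitionFilterSketch {

  // Keep only predicates that are deterministic, contain no subquery, and
  // reference at least one column, all of which are partition columns.
  def partitionKeyFilters(
      filters: Seq[Predicate],
      partitionCols: Set[String]): Seq[Predicate] =
    filters
      .filter(f => f.deterministic && !f.hasSubquery)
      .filter(f => f.references.nonEmpty && f.references.subsetOf(partitionCols))

  // Hypothetical config gate, as discussed in the review comment: when the
  // flag is off, no filters are handed to partition pruning at all.
  def filtersForPruning(
      enabled: Boolean,
      filters: Seq[Predicate],
      partitionCols: Set[String]): Seq[Predicate] =
    if (enabled) partitionKeyFilters(filters, partitionCols) else Seq.empty
}
```

With partition columns `dt` and `region`, a predicate such as `dt = '2020-01-01'` would be kept, while `id > 5` (a data column), a non-deterministic predicate, or a predicate with no column references would be dropped.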
[GitHub] [spark] SparkQA commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions
SparkQA commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions URL: https://github.com/apache/spark/pull/25827#issuecomment-575025592 **[Test build #116824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116824/testReport)** for PR 25827 at commit [`543c016`](https://github.com/apache/spark/commit/543c0167dab23ece2e4db232c0fd7d4c9e5eeb8e).
[GitHub] [spark] kiszk commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions
kiszk commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions URL: https://github.com/apache/spark/pull/25827#issuecomment-575025184 retest this please
[GitHub] [spark] dilipbiswal commented on a change in pull request #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command
dilipbiswal commented on a change in pull request #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command
URL: https://github.com/apache/spark/pull/26759#discussion_r367269598

## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md

@@ -0,0 +1,105 @@
+---
+layout: global
+title: CREATE HIVEFORMAT TABLE
+displayTitle: CREATE HIVEFORMAT TABLE
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+### Description
+
+The `CREATE TABLE` statement creates a new table using Hive format.
+
+### Syntax
+{% highlight sql %}
+CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+  [(col_name1[:] col_type1 [COMMENT col_comment1], ...)]
+  [COMMENT table_comment]
+  [PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)]
+  [ROW FORMAT row_format]
+  [STORED AS file_format]
+  [LOCATION path]
+  [TBLPROPERTIES (key1=val1, key2=val2, ...)]
+  [AS select_statement]
+
+{% endhighlight %}
+
+### Parameters
+
+
+ EXTERNAL
+ Table is created using the path provided as LOCATION, does not use default location for this table.
+
+
+
+ PARTITIONED BY
+ Partitions are created on the table, based on the columns specified.
+
+
+
+ ROW FORMAT
+ SERDE is used to specify a custom SerDe or the DELIMITED clause inorder to use the native SerDe.
+
+
+
+ STORED

Review comment: STORED AS?
[GitHub] [spark] AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-575023664 Build finished. Test PASSed.
[GitHub] [spark] dilipbiswal commented on a change in pull request #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command
dilipbiswal commented on a change in pull request #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command
URL: https://github.com/apache/spark/pull/26759#discussion_r367268754

## File path: docs/sql-ref-syntax-ddl-create-table-datasource.md

@@ -0,0 +1,97 @@
+---
+layout: global
+title: CREATE DATASOURCE TABLE
+displayTitle: CREATE DATASOURCE TABLE
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+### Description
+
+The `CREATE TABLE` statement creates a new table using a Data Source.
+
+### Syntax
+{% highlight sql %}
+CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+  [(col_name1 col_type1 [COMMENT col_comment1], ...)]
+  USING data_source
+  [OPTIONS (key1=val1, key2=val2, ...)]
+  [PARTITIONED BY (col_name1, col_name2, ...)]
+  [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
+  [LOCATION path]
+  [COMMENT table_comment]
+  [TBLPROPERTIES (key1=val1, key2=val2, ...)]
+  [AS select_statement]
+{% endhighlight %}
+
+### Parameters
+
+
+ USING data_source
+ Data Source is the file format used to create the table. Data source can be CSV, TXT, ORC, JDBC, PARQUET, etc.

Review comment: Should we say "input format" instead of "file format"? For example, JDBC is a data source but not a file format, right?
[GitHub] [spark] AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-575023671 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116811/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-575023664 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-575023671 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116811/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
SparkQA removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-574977039 **[Test build #116811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116811/testReport)** for PR 27227 at commit [`dbc819a`](https://github.com/apache/spark/commit/dbc819af5d3b6eccaa3874be45e1f076fdeaecd1).
[GitHub] [spark] SparkQA commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
SparkQA commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-575022655 **[Test build #116811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116811/testReport)** for PR 27227 at commit [`dbc819a`](https://github.com/apache/spark/commit/dbc819af5d3b6eccaa3874be45e1f076fdeaecd1). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] [spark] dilipbiswal commented on a change in pull request #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command
dilipbiswal commented on a change in pull request #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command
URL: https://github.com/apache/spark/pull/26759#discussion_r367267170

## File path: docs/sql-ref-syntax-ddl-create-table-datasource.md ##
@@ -0,0 +1,97 @@
+---
+layout: global
+title: CREATE DATASOURCE TABLE
+displayTitle: CREATE DATASOURCE TABLE
+license: |
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+---
+
+### Description
+
+The `CREATE TABLE` statement creates a new table using a Data Source.
+
+### Syntax
+{% highlight sql %}
+CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+ [(col_name1 col_type1 [COMMENT col_comment1], ...)]
+ USING data_source
+ [OPTIONS (key1=val1, key2=val2, ...)]
+ [PARTITIONED BY (col_name1, col_name2, ...)]
+ [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
+ [LOCATION path]
+ [COMMENT table_comment]
+ [TBLPROPERTIES (key1=val1, key2=val2, ...)]
+ [AS select_statement]
+{% endhighlight %}
+
+### Parameters
+
+
+ USING data_source
+ Data Source is the file format used to create the table. Data source can be CSV, TXT, ORC, JDBC, PARQUET, etc.
+
+
+
+ PARTITIONED BY
+ Partitions are created on the table, based on the columns specified.
+
+
+
+ CLUSTERED BY
+
+ Partitions created on the table will be bucketed into fixed buckets based on the column specified for bucketing.
+ NOTE:Bucketing is an optimization technique that uses buckets (and bucketing columns) to determine data partitioning and avoid data shuffle.
+
+
+
+
+ LOCATION
+ Path to the directory where table data is stored, could be filesystem, HDFS, etc.
+
+
+
+ COMMENT
+ Table comments are added.
+
+
+
+ TBLPROPERTIES
+ Table properties that has to be set are specified such as `created.by.user`, `owner`, etc.
+
+
+
+
+ AS select_statement
+ The table is populated using the data from the select statement.
+
+
+### Examples
+{% highlight sql %}
+
+--Using data source
+CREATE TABLE Student (width INT, length INT, height INT) USING CSV

Review comment:
Perhaps change the column names to id, name, age to be more meaningful? Also, can you please put a semicolon at the end of each example, to be consistent with the other docs? cc @huaxingao, can you please check the consistency part if you have some time?
[GitHub] [spark] AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-575020896 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21595/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-575020896 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21595/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-575020889 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-575020889 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
AmplabJenkins removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types URL: https://github.com/apache/spark/pull/27165#issuecomment-575019400 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116814/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
SparkQA commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-575020489 **[Test build #116823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116823/testReport)** for PR 26805 at commit [`f89947b`](https://github.com/apache/spark/commit/f89947beff6ae5bf61fd83cd646db47381c8db57).
[GitHub] [spark] cloud-fan closed pull request #27209: [SPARK-29450][SS][2.4] Measure the number of output rows for streaming aggregation with append mode
cloud-fan closed pull request #27209: [SPARK-29450][SS][2.4] Measure the number of output rows for streaming aggregation with append mode URL: https://github.com/apache/spark/pull/27209
[GitHub] [spark] cloud-fan commented on issue #27209: [SPARK-29450][SS][2.4] Measure the number of output rows for streaming aggregation with append mode
cloud-fan commented on issue #27209: [SPARK-29450][SS][2.4] Measure the number of output rows for streaming aggregation with append mode URL: https://github.com/apache/spark/pull/27209#issuecomment-575020190 thanks, merging to 2.4!
[GitHub] [spark] AmplabJenkins removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
AmplabJenkins removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types URL: https://github.com/apache/spark/pull/27165#issuecomment-575019391 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
AmplabJenkins commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types URL: https://github.com/apache/spark/pull/27165#issuecomment-575019400 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116814/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
SparkQA removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types URL: https://github.com/apache/spark/pull/27165#issuecomment-574980876 **[Test build #116814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116814/testReport)** for PR 27165 at commit [`b0447a7`](https://github.com/apache/spark/commit/b0447a7cd4b4cc3e4881f7e9ad264bea656633ac).
[GitHub] [spark] AmplabJenkins commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
AmplabJenkins commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types URL: https://github.com/apache/spark/pull/27165#issuecomment-575019391 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
SparkQA commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types URL: https://github.com/apache/spark/pull/27165#issuecomment-575019064 **[Test build #116814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116814/testReport)** for PR 27165 at commit [`b0447a7`](https://github.com/apache/spark/commit/b0447a7cd4b4cc3e4881f7e9ad264bea656633ac). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-575015480 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116813/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-575015475 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-575015475 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
SparkQA removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-574980880 **[Test build #116813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116813/testReport)** for PR 27227 at commit [`89e2af4`](https://github.com/apache/spark/commit/89e2af48f2a5eb91087aa0a65b00027305c0439d).
[GitHub] [spark] SparkQA commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
SparkQA commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-575015232 **[Test build #116813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116813/testReport)** for PR 27227 at commit [`89e2af4`](https://github.com/apache/spark/commit/89e2af48f2a5eb91087aa0a65b00027305c0439d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5 URL: https://github.com/apache/spark/pull/27227#issuecomment-575015480 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116813/ Test FAILed.
[GitHub] [spark] bmarcott edited a comment on issue #27207: [WIP][SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
bmarcott edited a comment on issue #27207: [WIP][SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-575011886
@tgravescs Thanks for the comments.
> so please update the description with information from the other PR
Which of my snippets from the previous PR was clearest to you? I can put that one in the description.
> One thing I don't think I like
Really good point on this scenario. It's bad even in the same scenario you described but where the all resource offer has only 1 executor, and the first taskset accepts it. Let me know if you have any good ideas here.
Is it OK if I do follow-up changes, such as variable names, unit tests, and other backend schedulers, only once we iron out the problematic scenarios?
[GitHub] [spark] kiszk commented on issue #23988: [SPARK-26509][SQL] Parquet DELTA_BYTE_ARRAY is not supported in Spark 2.x's Vectorized Reader
kiszk commented on issue #23988: [SPARK-26509][SQL] Parquet DELTA_BYTE_ARRAY is not supported in Spark 2.x's Vectorized Reader URL: https://github.com/apache/spark/pull/23988#issuecomment-575012114 ping @nandorKollar
[GitHub] [spark] bmarcott commented on issue #27207: [WIP][SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.
bmarcott commented on issue #27207: [WIP][SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling. URL: https://github.com/apache/spark/pull/27207#issuecomment-575011886
@tgravescs Thanks for the comments.
> so please update the description with information from the other PR
Which of my snippets from the previous PR was clearest to you? I can put that one in the description.
> One thing I don't think I like
Really good point on this scenario. It's bad even in the same scenario you described but where the all resource offer has only 1 executor, and the first taskset accepts it. Let me know if you have any good ideas here.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27231: [SPARK-28478] [SQL] Remove redundant null checks
AmplabJenkins removed a comment on issue #27231: [SPARK-28478] [SQL] Remove redundant null checks URL: https://github.com/apache/spark/pull/27231#issuecomment-575011286 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #27231: [SPARK-28478] [SQL] Remove redundant null checks
AmplabJenkins commented on issue #27231: [SPARK-28478] [SQL] Remove redundant null checks URL: https://github.com/apache/spark/pull/27231#issuecomment-575011584 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #27231: [SPARK-28478] [SQL] Remove redundant null checks
AmplabJenkins commented on issue #27231: [SPARK-28478] [SQL] Remove redundant null checks URL: https://github.com/apache/spark/pull/27231#issuecomment-575011286 Can one of the admins verify this patch?
[GitHub] [spark] davidvrba opened a new pull request #27231: [SPARK-28478] [SQL] Remove redundant null checks
davidvrba opened a new pull request #27231: [SPARK-28478] [SQL] Remove redundant null checks URL: https://github.com/apache/spark/pull/27231
### What changes were proposed in this pull request?
The purpose of this PR is to remove explicit null checks when they are not needed, in order to simplify the generated code. For example, expressions of this type
```
CASE WHEN isnull(title#5) THEN title#5 ELSE substring(title#5, 0, 3) END
```
are simplified to
```
substring(title#5, 0, 3)
```
if the considered expression is null-intolerant.
### Why are the changes needed?
It simplifies expressions in the query plan, which can enable further optimization through simpler codegen.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
New tests are added.
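To illustrate the rewrite described above, here is a minimal sketch on a toy expression tree. It is not Spark's Catalyst API: the classes `Attr`, `IsNull`, `Substring`, `CaseWhen`, the `null_intolerant` flag and the function `remove_redundant_null_check` are all hypothetical names made up for this example. It assumes that "null-intolerant" means the expression returns null whenever its input is null, which is what makes the explicit check redundant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """A column reference such as title#5 (hypothetical toy class)."""
    name: str


@dataclass(frozen=True)
class IsNull:
    child: object


@dataclass(frozen=True)
class Substring:
    child: object
    pos: int
    length: int
    # Assumption for this sketch: a null input always yields a null output.
    null_intolerant = True


@dataclass(frozen=True)
class CaseWhen:
    cond: object
    then: object
    otherwise: object


def remove_redundant_null_check(expr):
    """Rewrite CASE WHEN isnull(a) THEN a ELSE f(a) END to f(a).

    The rewrite only fires when f is null-intolerant and takes the same
    input `a` that the null check guards; otherwise expr is returned as is.
    """
    if (
        isinstance(expr, CaseWhen)
        and isinstance(expr.cond, IsNull)
        and expr.then == expr.cond.child
        and getattr(expr.otherwise, "null_intolerant", False)
        and getattr(expr.otherwise, "child", None) == expr.cond.child
    ):
        return expr.otherwise
    return expr


title = Attr("title#5")
guarded = CaseWhen(IsNull(title), title, Substring(title, 0, 3))
simplified = remove_redundant_null_check(guarded)
```

In this toy model `simplified` equals `Substring(title, 0, 3)`. A `CASE WHEN` whose null check guards a different column than the one passed to `substring` is left unchanged, since the guarded branch is then not equivalent to the null-intolerant expression.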
[GitHub] [spark] AmplabJenkins removed a comment on issue #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d.
AmplabJenkins removed a comment on issue #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d. URL: https://github.com/apache/spark/pull/27230#issuecomment-575006734 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d.
AmplabJenkins commented on issue #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d. URL: https://github.com/apache/spark/pull/27230#issuecomment-575007102 Can one of the admins verify this patch?
[GitHub] [spark] xCASx commented on issue #24732: [SPARK-27868][core] Better default value and documentation for socket server backlog.
xCASx commented on issue #24732: [SPARK-27868][core] Better default value and documentation for socket server backlog. URL: https://github.com/apache/spark/pull/24732#issuecomment-575006840 Pull request has been sent. Not sure if this case requires a separate Jira ticket. Reused original SPARK-27868.
[GitHub] [spark] AmplabJenkins commented on issue #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d.
AmplabJenkins commented on issue #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d. URL: https://github.com/apache/spark/pull/27230#issuecomment-575006734 Can one of the admins verify this patch?
[GitHub] [spark] xCASx opened a new pull request #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d.
xCASx opened a new pull request #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d.
URL: https://github.com/apache/spark/pull/27230

The default value for backLog is set back to -1, as any other value may break existing configurations by overriding Netty's default, `io.netty.util.NetUtil#SOMAXCONN`. The documentation is adjusted accordingly. See the discussion thread: https://github.com/apache/spark/pull/24732

### What changes were proposed in this pull request?
Partial rollback of https://github.com/apache/spark/pull/24732 (the default for backLog is set back to -1).

### Why are the changes needed?
The previous change introduces a backward incompatibility by overriding Netty's default, `io.netty.util.NetUtil#SOMAXCONN`.
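The reason -1 is a safe default can be sketched as a sentinel that means "do not override". This is a minimal illustration, assuming a non-positive value leaves the framework default in place; `BacklogSketch`, `frameworkDefault`, `backlogOverride`, and `effectiveBacklog` are hypothetical names, and 128 is only a placeholder for whatever value Netty derives on a given system.

```scala
object BacklogSketch {
  // Placeholder for the framework default (assumption: the real value is system-dependent).
  val frameworkDefault: Int = 128

  // Some(n): apply n explicitly. None: leave the framework default untouched.
  def backlogOverride(configured: Int): Option[Int] =
    if (configured > 0) Some(configured) else None

  // The backlog that ends up in effect for a given configured value.
  def effectiveBacklog(configured: Int): Int =
    backlogOverride(configured).getOrElse(frameworkDefault)
}
```

Under this model, any positive default shipped by the application would silently replace the framework's value for every user who never set the option, which is the incompatibility the rollback avoids.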
[GitHub] [spark] AmplabJenkins removed a comment on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties
AmplabJenkins removed a comment on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties URL: https://github.com/apache/spark/pull/27197#issuecomment-575005169 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116806/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties
AmplabJenkins removed a comment on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties URL: https://github.com/apache/spark/pull/27197#issuecomment-575005165 Merged build finished. Test FAILed.
[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r367247006

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
## @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.analysis.CastSupport
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, CatalogTable, CatalogTablePartition, HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeSet, Expression, ExpressionSet, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with CastSupport {
+
+  override val conf = session.sessionState.conf
+
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def getPartitionKeyFilters(
+      filters: Seq[Expression],
+      relation: HiveTableRelation): ExpressionSet = {
+    val normalizedFilters = DataSourceStrategy.normalizeExprs(
+      filters.filter(f => f.deterministic && !SubqueryExpression.hasSubquery(f)), relation.output)
+    val partitionColumnSet = AttributeSet(relation.partitionCols)
+    ExpressionSet(normalizedFilters.filter { f =>
+      !f.references.isEmpty && f.references.subsetOf(partitionColumnSet)
+    })
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table.
+   */
+  private def prunePartitions(
+      relation: HiveTableRelation,
+      partitionFilters: ExpressionSet): Seq[CatalogTablePartition] = {
+    if (conf.metastorePartitionPruning) {

Review comment:
Do you mean adding a config to control whether Hive table partitions are pruned in the optimization phase? We could check the config in `apply`, and it could default to true. The same config could also be checked in `PruneFileSourcePartitions.apply`. Is that what you expect?
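The filter selection performed by `getPartitionKeyFilters` in the quoted diff can be sketched without Catalyst. This is a simplified model, assuming a predicate is described only by the column names it references and two flags; `PartitionFilterSketch`, `Predicate`, and `partitionKeyFilters` are hypothetical names, and the normalization step from the real code is left out.

```scala
object PartitionFilterSketch {
  // Toy predicate (illustrative stand-in, not Catalyst's Expression).
  case class Predicate(
      sql: String,
      references: Set[String],
      deterministic: Boolean = true,
      hasSubquery: Boolean = false)

  // Keep only predicates that are deterministic, subquery-free, reference at least
  // one column, and reference nothing outside the partition columns.
  def partitionKeyFilters(filters: Seq[Predicate], partitionCols: Set[String]): Seq[Predicate] =
    filters.filter { f =>
      f.deterministic && !f.hasSubquery &&
        f.references.nonEmpty && f.references.subsetOf(partitionCols)
    }
}
```

A predicate that mixes a partition column with a data column is rejected by the subset test, since it cannot be evaluated from partition metadata alone.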
[GitHub] [spark] AmplabJenkins commented on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties
AmplabJenkins commented on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties URL: https://github.com/apache/spark/pull/27197#issuecomment-575005165 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties
SparkQA removed a comment on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties URL: https://github.com/apache/spark/pull/27197#issuecomment-574960835 **[Test build #116806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116806/testReport)** for PR 27197 at commit [`9e19d27`](https://github.com/apache/spark/commit/9e19d277f914b5c00c0860927613cfc494536e19).
[GitHub] [spark] AmplabJenkins commented on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties
AmplabJenkins commented on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties URL: https://github.com/apache/spark/pull/27197#issuecomment-575005169 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116806/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties
SparkQA commented on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties URL: https://github.com/apache/spark/pull/27197#issuecomment-575004931 **[Test build #116806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116806/testReport)** for PR 27197 at commit [`9e19d27`](https://github.com/apache/spark/commit/9e19d277f914b5c00c0860927613cfc494536e19). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated
AmplabJenkins removed a comment on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated URL: https://github.com/apache/spark/pull/27229#issuecomment-575003044 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated
AmplabJenkins removed a comment on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated URL: https://github.com/apache/spark/pull/27229#issuecomment-575003048 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21594/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated
SparkQA commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated URL: https://github.com/apache/spark/pull/27229#issuecomment-575004668 **[Test build #116822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116822/testReport)** for PR 27229 at commit [`be0ce63`](https://github.com/apache/spark/commit/be0ce63e1289322791fcb699ead1918a319576db).
[GitHub] [spark] z47xu commented on issue #24421: [SPARK-12312][SQL]Support Kerberos login in JDBC connector
z47xu commented on issue #24421: [SPARK-12312][SQL]Support Kerberos login in JDBC connector URL: https://github.com/apache/spark/pull/24421#issuecomment-575003039 > @vanzin may I ask to close this? I think it won't continue. > I'm planning to pick this up. What is your plan?
[GitHub] [spark] AmplabJenkins commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated
AmplabJenkins commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated URL: https://github.com/apache/spark/pull/27229#issuecomment-575003048 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21594/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated
AmplabJenkins commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated URL: https://github.com/apache/spark/pull/27229#issuecomment-575003044 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575002718 **[Test build #116821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116821/testReport)** for PR 27096 at commit [`9e895bd`](https://github.com/apache/spark/commit/9e895bd16162577196b83add1ef3d99bb4fc0d08).
[GitHub] [spark] maropu commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated
maropu commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated URL: https://github.com/apache/spark/pull/27229#issuecomment-575002671 cc: @dongjoon-hyun
[GitHub] [spark] maropu opened a new pull request #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated
maropu opened a new pull request #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/27229

### What changes were proposed in this pull request?
This PR fixes wrong aggregated values in `GROUPING SETS` when a query contains duplicated grouping sets (e.g., `GROUPING SETS ((k1),(k1))`). For example:
```
scala> spark.table("t").show()
+---+---+---+
| k1| k2|  v|
+---+---+---+
|  0|  0|  3|
+---+---+---+

scala> sql("""select grouping_id(), k1, k2, sum(v) from t group by grouping sets ((k1),(k1,k2),(k2,k1),(k1,k2))""").show()
+-------------+---+----+------+
|grouping_id()| k1|  k2|sum(v)|
+-------------+---+----+------+
|            0|  0|   0|     9| <- wrong aggregate value; the correct answer is `3`
|            1|  0|null|     3|
+-------------+---+----+------+

// PostgreSQL case
postgres=# select k1, k2, sum(v) from t group by grouping sets ((k1),(k1,k2),(k2,k1),(k1,k2));
 k1 |  k2  | sum
----+------+-----
  0 |    0 |   3
  0 |    0 |   3
  0 |    0 |   3
  0 | NULL |   3
(4 rows)

// Hive case
hive> select GROUPING__ID, k1, k2, sum(v) from t group by k1, k2 grouping sets ((k1),(k1,k2),(k2,k1),(k1,k2));
1    0    NULL    3
0    0    0       3
```
[MS SQL Server has the same behaviour as PostgreSQL](https://github.com/apache/spark/pull/26961#issuecomment-573638442). This PR follows the behaviour of PostgreSQL/SQL Server: it adds one more virtual attribute in `Expand` to avoid wrongly grouping rows that have the same grouping ID. This is a backport of #26961 to `branch-2.4`.

### Why are the changes needed?
To fix bugs.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
The existing tests.
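Why duplicated grouping sets inflate the sum, and how an extra attribute separates them, can be reproduced with plain collections. This is a sketch, not the `Expand` implementation: it assumes the added virtual attribute behaves like the position of the grouping set in the list, and `GroupingSetsSketch`, `Expanded`, `expand`, `sumsWithoutPosition`, and `sumsWithPosition` are hypothetical names.

```scala
object GroupingSetsSketch {
  case class Row(k1: Int, k2: Int, v: Int)
  // One expanded copy of an input row for one grouping set.
  case class Expanded(pos: Int, gid: Int, k1: Option[Int], k2: Option[Int], v: Int)

  // Grouping ID as a bit mask: a bit is set when the column is NOT in the grouping set.
  def groupingId(set: Set[String]): Int =
    (if (set("k1")) 0 else 2) | (if (set("k2")) 0 else 1)

  // Emit every row once per grouping set, nulling out the columns the set omits.
  def expand(rows: Seq[Row], sets: Seq[Set[String]]): Seq[Expanded] =
    sets.zipWithIndex.flatMap { case (set, pos) =>
      rows.map { r =>
        Expanded(pos, groupingId(set),
          if (set("k1")) Some(r.k1) else None,
          if (set("k2")) Some(r.k2) else None,
          r.v)
      }
    }

  // Grouping by (gid, k1, k2) merges the copies produced by duplicated sets.
  def sumsWithoutPosition(e: Seq[Expanded]): Seq[Int] =
    e.groupBy(x => (x.gid, x.k1, x.k2)).values.map(_.map(_.v).sum).toSeq.sorted

  // Adding the position to the key keeps each grouping set's copies apart.
  def sumsWithPosition(e: Seq[Expanded]): Seq[Int] =
    e.groupBy(x => (x.pos, x.gid, x.k1, x.k2)).values.map(_.map(_.v).sum).toSeq.sorted
}
```

For the single row `(0, 0, 3)` and the four grouping sets from the example, the first grouping yields the sums 3 and 9 (the wrong Spark output), while the second yields four groups that each sum to 3, matching the PostgreSQL result.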
[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r367247006

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
## @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.analysis.CastSupport
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, CatalogTable, CatalogTablePartition, HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeSet, Expression, ExpressionSet, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with CastSupport {
+
+  override val conf = session.sessionState.conf
+
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def getPartitionKeyFilters(
+      filters: Seq[Expression],
+      relation: HiveTableRelation): ExpressionSet = {
+    val normalizedFilters = DataSourceStrategy.normalizeExprs(
+      filters.filter(f => f.deterministic && !SubqueryExpression.hasSubquery(f)), relation.output)
+    val partitionColumnSet = AttributeSet(relation.partitionCols)
+    ExpressionSet(normalizedFilters.filter { f =>
+      !f.references.isEmpty && f.references.subsetOf(partitionColumnSet)
+    })
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table.
+   */
+  private def prunePartitions(
+      relation: HiveTableRelation,
+      partitionFilters: ExpressionSet): Seq[CatalogTablePartition] = {
+    if (conf.metastorePartitionPruning) {

Review comment:
Do you mean adding a config to control whether Hive table partitions are pruned in the optimization phase? We could check the config in `apply`, and it could default to true. Is that what you expect?
[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575001069 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575001076 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21593/ Test PASSed.
[GitHub] [spark] bmarcott commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
bmarcott commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575001493 Updated with a new SparkPlan rule, and added a test that makes sure a user's repartition with a different numPartitions is not eliminated (we don't want to change the expected numPartitions). Please review when you get a chance :).
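The behaviour that the new test guards can be sketched on a toy plan tree. This is an illustration only, assuming a repartition is redundant exactly when its child is already partitioned on the same keys into the same number of partitions; `RepartitionSketch`, `ShuffledJoin`, `Repartition`, and `removeRedundantRepartition` are hypothetical names, not the rule added in the PR.

```scala
object RepartitionSketch {
  // Toy physical plan: every node reports how its output is partitioned.
  sealed trait Plan {
    def keys: Seq[String]
    def numPartitions: Int
  }
  case class ShuffledJoin(keys: Seq[String], numPartitions: Int) extends Plan
  case class Repartition(keys: Seq[String], numPartitions: Int, child: Plan) extends Plan

  // Remove a repartition only when the child already has the same keys AND the
  // same partition count; a different count is a user request that must be kept.
  def removeRedundantRepartition(plan: Plan): Plan = plan match {
    case Repartition(keys, n, child) if child.keys == keys && child.numPartitions == n =>
      removeRedundantRepartition(child)
    case Repartition(keys, n, child) =>
      Repartition(keys, n, removeRedundantRepartition(child))
    case other => other
  }
}
```

A `Repartition(Seq("id"), 200, ...)` over a join already partitioned on `id` into 200 partitions is dropped, while the same node with 10 partitions is kept, so the number of output partitions the user asked for does not change.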
[GitHub] [spark] AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575001076 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21593/ Test PASSed.
[GitHub] [spark] MaxGekk commented on issue #27218: [SPARK-30499][SQL] Remove SQL config spark.sql.execution.pandas.respectSessionTimeZone
MaxGekk commented on issue #27218: [SPARK-30499][SQL] Remove SQL config spark.sql.execution.pandas.respectSessionTimeZone URL: https://github.com/apache/spark/pull/27218#issuecomment-575001185 @HyukjinKwon I checked that tests failed when I set timeZone to nil/none there https://github.com/apache/spark/pull/27218/files#diff-5dad4c4e6faaa6c596e8a40d1dea74f4R67 & https://github.com/apache/spark/pull/27218/files#diff-a56d42b312418a8c63720c57614e76adR145
[GitHub] [spark] AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575001069 Merged build finished. Test PASSed.
[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r367247006

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala ##
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.analysis.CastSupport
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, CatalogTable, CatalogTablePartition, HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeSet, Expression, ExpressionSet, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with CastSupport {
+
+  override val conf = session.sessionState.conf
+
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def getPartitionKeyFilters(
+      filters: Seq[Expression],
+      relation: HiveTableRelation): ExpressionSet = {
+    val normalizedFilters = DataSourceStrategy.normalizeExprs(
+      filters.filter(f => f.deterministic && !SubqueryExpression.hasSubquery(f)), relation.output)
+    val partitionColumnSet = AttributeSet(relation.partitionCols)
+    ExpressionSet(normalizedFilters.filter { f =>
+      !f.references.isEmpty && f.references.subsetOf(partitionColumnSet)
+    })
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table.
+   */
+  private def prunePartitions(
+      relation: HiveTableRelation,
+      partitionFilters: ExpressionSet): Seq[CatalogTablePartition] = {
+    if (conf.metastorePartitionPruning) {

Review comment:
you mean adding a config to control whether we should prune hive table partitions?
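For readers without a Spark checkout, the selection logic in `getPartitionKeyFilters` above can be approximated in plain Scala. This is only a sketch under stated assumptions: `SimpleFilter` and `PartitionFilters` are hypothetical stand-ins invented for illustration, not Catalyst classes, and the subquery check and expression normalization from the real rule are left out.

```scala
// Hypothetical stand-in for a Catalyst Expression: only the set of
// referenced column names and the determinism flag matter for this sketch.
final case class SimpleFilter(
    sql: String,
    references: Set[String],
    deterministic: Boolean = true)

object PartitionFilters {
  // Keep only filters that are deterministic, reference at least one column,
  // and reference nothing outside the partition columns.
  def partitionKeyFilters(
      filters: Seq[SimpleFilter],
      partitionCols: Set[String]): Seq[SimpleFilter] =
    filters.filter { f =>
      f.deterministic && f.references.nonEmpty && f.references.subsetOf(partitionCols)
    }
}
```

With partition columns `ds` and `hr`, a filter on `ds` alone would be kept, while a filter that mixes `ds` with a data column, a constant filter with no references, or a non-deterministic filter would be dropped.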
[GitHub] [spark] SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away
SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away URL: https://github.com/apache/spark/pull/27096#issuecomment-575000581 **[Test build #116820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116820/testReport)** for PR 27096 at commit [`2517bea`](https://github.com/apache/spark/commit/2517beae0b2382a9171239d5682364957135318c).
[GitHub] [spark] AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-574998912 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-574998912 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-574998918 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21592/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-574998918 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21592/ Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #26929: [SPARK-30289][SQL] DSv2's partitioning should not accept nested columns
cloud-fan commented on issue #26929: [SPARK-30289][SQL] DSv2's partitioning should not accept nested columns URL: https://github.com/apache/spark/pull/26929#issuecomment-574998506 I think Spark should support all kinds of PARTITION BY expressions as long as they can be translated to a v2 `Transform`. The catalog implementation should decide whether it supports them or not. For example, the Hive catalog doesn't support partitioning by nested columns. For the particular test failure, I think we should fix `InMemoryTable` so that, when flattening the fields, it keeps the full column path, not just the name.
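To illustrate the last point in the comment above, here is a minimal sketch of flattening a nested schema while keeping the full column path. The `Field`, `Struct`, `Leaf`, and `FlattenFields` names are hypothetical and invented for this example; this is not the actual `InMemoryTable` code or Spark's `StructType` API, and it assumes only leaf columns are of interest.

```scala
// Hypothetical, simplified schema model (not Spark's StructType API).
sealed trait FieldType
case object Leaf extends FieldType
final case class Struct(fields: Seq[Field]) extends FieldType
final case class Field(name: String, dataType: FieldType)

object FlattenFields {
  // Flatten nested fields into one entry per leaf column, where each entry
  // is the full path from the root rather than just the leaf name.
  def flatten(fields: Seq[Field], prefix: Seq[String] = Nil): Seq[Seq[String]] =
    fields.flatMap { f =>
      val path = prefix :+ f.name
      f.dataType match {
        case Struct(children) => flatten(children, path)
        case Leaf => Seq(path)
      }
    }
}
```

Keeping the path means a nested column such as `point.x` stays distinguishable from a top-level column that is also named `x`; flattening to bare names would make the two collide.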
[GitHub] [spark] SparkQA commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle
SparkQA commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle URL: https://github.com/apache/spark/pull/27226#issuecomment-574998541 **[Test build #116819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116819/testReport)** for PR 27226 at commit [`c37f397`](https://github.com/apache/spark/commit/c37f39774825c052cd93f35288ff105c8292c343).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add setInputCol/setOutputCol in OHEModel
AmplabJenkins removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add setInputCol/setOutputCol in OHEModel URL: https://github.com/apache/spark/pull/27228#issuecomment-574997996 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add setInputCol/setOutputCol in OHEModel
AmplabJenkins removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add setInputCol/setOutputCol in OHEModel URL: https://github.com/apache/spark/pull/27228#issuecomment-574998004 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116812/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add setInputCol/setOutputCol in OHEModel
SparkQA removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add setInputCol/setOutputCol in OHEModel URL: https://github.com/apache/spark/pull/27228#issuecomment-574980877 **[Test build #116812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116812/testReport)** for PR 27228 at commit [`b109d90`](https://github.com/apache/spark/commit/b109d908b691c48bf91972aca2d0eb411edc6949).