date:20200115

[GitHub] [spark] SparkQA removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

SparkQA removed a comment on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-574972361
 
 
   **[Test build #116810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116810/testReport)** for PR 27226 at commit [`903309f`](https://github.com/apache/spark/commit/903309fe5e7bc8e1ca69ad7f4c8e22d64acb11b3).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-575029908
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-575029908
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-575029916
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116810/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-575029916
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116810/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

SparkQA commented on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-575029403
 
 
   **[Test build #116810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116810/testReport)** for PR 27226 at commit [`903309f`](https://github.com/apache/spark/commit/903309fe5e7bc8e1ca69ad7f4c8e22d64acb11b3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27058: [SPARK-30395][SQL] When one or 
more DISTINCT aggregate expressions operate on the same field, the DISTINCT 
aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#issuecomment-575028449
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21597/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27058: [SPARK-30395][SQL] When one or 
more DISTINCT aggregate expressions operate on the same field, the DISTINCT 
aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#issuecomment-575028437
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition 
after join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575028175
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116821/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27058: [SPARK-30395][SQL] When one or more 
DISTINCT aggregate expressions operate on the same field, the DISTINCT 
aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#issuecomment-575028449
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21597/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition 
after join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575028166
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27058: [SPARK-30395][SQL] When one or more 
DISTINCT aggregate expressions operate on the same field, the DISTINCT 
aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#issuecomment-575028437
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after 
join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575028166
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after 
join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575028175
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116821/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is 
not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575028093
 
 
   **[Test build #116821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116821/testReport)** for PR 27096 at commit [`9e895bd`](https://github.com/apache/spark/commit/9e895bd16162577196b83add1ef3d99bb4fc0d08).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

SparkQA removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after 
join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575002718
 
 
   **[Test build #116821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116821/testReport)** for PR 27096 at commit [`9e895bd`](https://github.com/apache/spark/commit/9e895bd16162577196b83add1ef3d99bb4fc0d08).



[GitHub] [spark] SparkQA commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of th

2020-01-15 Thread GitBox

SparkQA commented on issue #27058: [SPARK-30395][SQL] When one or more DISTINCT 
aggregate expressions operate on the same field, the DISTINCT aggregate 
expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#issuecomment-575027955
 
 
   **[Test build #116825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116825/testReport)** for PR 27058 at commit [`a83efcf`](https://github.com/apache/spark/commit/a83efcf57021167bf9829f9f1ee2039ea9e86213).



[GitHub] [spark] imback82 commented on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-15 Thread GitBox

imback82 commented on issue #27187: [SPARK-30497][SQL]  migrate DESCRIBE TABLE 
to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-575027510
 
 
   Thanks. I will start migrating more command to the new framework this week.



[GitHub] [spark] imback82 edited a comment on issue #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-15 Thread GitBox

imback82 edited a comment on issue #27187: [SPARK-30497][SQL]  migrate DESCRIBE 
TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#issuecomment-575027510
 
 
   Thanks. I will start migrating more commands to the new framework this week.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #25827: [SPARK-29128][SQL] Split 
predicate code in OR expressions
URL: https://github.com/apache/spark/pull/25827#issuecomment-575026025
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #25827: [SPARK-29128][SQL] Split 
predicate code in OR expressions
URL: https://github.com/apache/spark/pull/25827#issuecomment-575026030
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21596/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #25827: [SPARK-29128][SQL] Split predicate 
code in OR expressions
URL: https://github.com/apache/spark/pull/25827#issuecomment-575026025
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #25827: [SPARK-29128][SQL] Split predicate 
code in OR expressions
URL: https://github.com/apache/spark/pull/25827#issuecomment-575026030
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21596/
   Test PASSed.



[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-15 Thread GitBox

fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r367247006
 
 

 ##
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
 ##
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.analysis.CastSupport
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, CatalogTable, CatalogTablePartition, HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeSet, Expression, ExpressionSet, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with CastSupport {
+
+  override val conf = session.sessionState.conf
+
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def getPartitionKeyFilters(
+      filters: Seq[Expression],
+      relation: HiveTableRelation): ExpressionSet = {
+    val normalizedFilters = DataSourceStrategy.normalizeExprs(
+      filters.filter(f => f.deterministic && !SubqueryExpression.hasSubquery(f)), relation.output)
+    val partitionColumnSet = AttributeSet(relation.partitionCols)
+    ExpressionSet(normalizedFilters.filter { f =>
+      !f.references.isEmpty && f.references.subsetOf(partitionColumnSet)
+    })
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table.
+   */
+  private def prunePartitions(
+      relation: HiveTableRelation,
+      partitionFilters: ExpressionSet): Seq[CatalogTablePartition] = {
+    if (conf.metastorePartitionPruning) {
 
 Review comment:
   Do you mean adding a config that controls whether table partitions are pruned 
in the optimization phase?
   We could check the config in `apply`, and it could default to true.
   The same config could also be checked in `PruneFileSourcePartitions.apply`.
   Is that what you expect?
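
   For illustration only, here is a minimal, self-contained sketch of the idea discussed in this comment: a flag that turns partition pruning on or off, combined with the "keep only filters that reference partition columns" step from the quoted diff. It does not use Spark classes. `PruneSketch`, `SimpleFilter`, `partitionKeyFilters`, `prune`, and the `pruningEnabled` flag are hypothetical names chosen for this sketch, not the names in the pull request.

```scala
// Hypothetical, simplified stand-ins; none of these are Spark classes.
object PruneSketch {
  // A filter is modelled by the columns it references, a predicate over a
  // partition spec (column name -> value), and a determinism marker.
  final case class SimpleFilter(
      references: Set[String],
      eval: Map[String, String] => Boolean,
      deterministic: Boolean = true)

  // Keep deterministic filters that reference at least one column and
  // reference only partition columns.
  def partitionKeyFilters(
      filters: Seq[SimpleFilter],
      partitionCols: Set[String]): Seq[SimpleFilter] =
    filters.filter { f =>
      f.deterministic && f.references.nonEmpty && f.references.subsetOf(partitionCols)
    }

  // `pruningEnabled` plays the role of the proposed config (assumed name):
  // when it is false the partitions are returned unchanged.
  def prune(
      pruningEnabled: Boolean,
      filters: Seq[SimpleFilter],
      partitionCols: Set[String],
      partitions: Seq[Map[String, String]]): Seq[Map[String, String]] =
    if (!pruningEnabled) partitions
    else {
      val keyFilters = partitionKeyFilters(filters, partitionCols)
      partitions.filter(p => keyFilters.forall(_.eval(p)))
    }
}
```

   With the flag defaulting to true, callers that never set it would get pruning, which matches the default suggested above.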



[GitHub] [spark] SparkQA commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions

2020-01-15 Thread GitBox

SparkQA commented on issue #25827: [SPARK-29128][SQL] Split predicate code in 
OR expressions
URL: https://github.com/apache/spark/pull/25827#issuecomment-575025592
 
 
   **[Test build #116824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116824/testReport)** for PR 25827 at commit [`543c016`](https://github.com/apache/spark/commit/543c0167dab23ece2e4db232c0fd7d4c9e5eeb8e).



[GitHub] [spark] kiszk commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR expressions

2020-01-15 Thread GitBox

kiszk commented on issue #25827: [SPARK-29128][SQL] Split predicate code in OR 
expressions
URL: https://github.com/apache/spark/pull/25827#issuecomment-575025184
 
 
   retest this please



[GitHub] [spark] dilipbiswal commented on a change in pull request #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command

2020-01-15 Thread GitBox

dilipbiswal commented on a change in pull request #26759: 
[SPARK-28794][SQL][DOC] Documentation for Create table Command
URL: https://github.com/apache/spark/pull/26759#discussion_r367269598
 
 

 ##
 File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md
 ##
 @@ -0,0 +1,105 @@
+---
+layout: global
+title: CREATE HIVEFORMAT TABLE
+displayTitle: CREATE HIVEFORMAT TABLE
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+### Description
+
+The `CREATE TABLE` statement creates a new table using Hive format.
+
+### Syntax
+{% highlight sql %}
+CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+  [(col_name1[:] col_type1 [COMMENT col_comment1], ...)]
+  [COMMENT table_comment]
+  [PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)]
+  [ROW FORMAT row_format]
+  [STORED AS file_format]
+  [LOCATION path]
+  [TBLPROPERTIES (key1=val1, key2=val2, ...)]
+  [AS select_statement]
+
+{% endhighlight %}
+
+### Parameters
+
+
+  EXTERNAL
+  Table is created using the path provided as LOCATION, does not use default location for this table.
+
+
+
+  PARTITIONED BY
+  Partitions are created on the table, based on the columns specified.
+
+
+
+  ROW FORMAT
+  SERDE is used to specify a custom SerDe or the DELIMITED clause inorder to use the native SerDe.
+
+
+
+  STORED
 
 Review comment:
   STORED AS ?



[GitHub] [spark] AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to 
chill 0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-575023664
 
 
   Build finished. Test PASSed.



[GitHub] [spark] dilipbiswal commented on a change in pull request #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command

2020-01-15 Thread GitBox

dilipbiswal commented on a change in pull request #26759: 
[SPARK-28794][SQL][DOC] Documentation for Create table Command
URL: https://github.com/apache/spark/pull/26759#discussion_r367268754
 
 

 ##
 File path: docs/sql-ref-syntax-ddl-create-table-datasource.md
 ##
 @@ -0,0 +1,97 @@
+---
+layout: global
+title: CREATE DATASOURCE TABLE
+displayTitle: CREATE DATASOURCE TABLE
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+### Description
+
+The `CREATE TABLE` statement creates a new table using a Data Source. 
+
+### Syntax
+{% highlight sql %}
+CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+  [(col_name1 col_type1 [COMMENT col_comment1], ...)]
+  USING data_source
+  [OPTIONS (key1=val1, key2=val2, ...)]
+  [PARTITIONED BY (col_name1, col_name2, ...)]
+  [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
+  [LOCATION path]
+  [COMMENT table_comment]
+  [TBLPROPERTIES (key1=val1, key2=val2, ...)]
+  [AS select_statement]
+{% endhighlight %}
+
+### Parameters
+
+
+  USING data_source
+  Data Source is the file format used to create the table. Data source can 
be CSV, TXT, ORC, JDBC, PARQUET, etc.
 
 Review comment:
   Should we say "input format" instead of "file format"? For example, JDBC is 
a data source but not a file format, right?



[GitHub] [spark] AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 
0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-575023671
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116811/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 
0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-575023664
 
 
   Build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to 
chill 0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-575023671
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116811/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

SparkQA removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 
0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-574977039
 
 
   **[Test build #116811 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116811/testReport)**
 for PR 27227 at commit 
[`dbc819a`](https://github.com/apache/spark/commit/dbc819af5d3b6eccaa3874be45e1f076fdeaecd1).



[GitHub] [spark] SparkQA commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

SparkQA commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-575022655
 
 
   **[Test build #116811 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116811/testReport)**
 for PR 27227 at commit 
[`dbc819a`](https://github.com/apache/spark/commit/dbc819af5d3b6eccaa3874be45e1f076fdeaecd1).
* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.



[GitHub] [spark] dilipbiswal commented on a change in pull request #26759: [SPARK-28794][SQL][DOC] Documentation for Create table Command

2020-01-15 Thread GitBox

dilipbiswal commented on a change in pull request #26759: 
[SPARK-28794][SQL][DOC] Documentation for Create table Command
URL: https://github.com/apache/spark/pull/26759#discussion_r367267170
 
 

 ##
 File path: docs/sql-ref-syntax-ddl-create-table-datasource.md
 ##
 @@ -0,0 +1,97 @@
+---
+layout: global
+title: CREATE DATASOURCE TABLE
+displayTitle: CREATE DATASOURCE TABLE
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+### Description
+
+The `CREATE TABLE` statement creates a new table using a Data Source. 
+
+### Syntax
+{% highlight sql %}
+CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+  [(col_name1 col_type1 [COMMENT col_comment1], ...)]
+  USING data_source
+  [OPTIONS (key1=val1, key2=val2, ...)]
+  [PARTITIONED BY (col_name1, col_name2, ...)]
+  [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
+  [LOCATION path]
+  [COMMENT table_comment]
+  [TBLPROPERTIES (key1=val1, key2=val2, ...)]
+  [AS select_statement]
+{% endhighlight %}
+
+### Parameters
+
+
+  USING data_source
+  Data Source is the file format used to create the table. Data source can 
be CSV, TXT, ORC, JDBC, PARQUET, etc.
+ 
+
+
+  PARTITIONED BY
+  Partitions are created on the table, based on the columns specified.
+
+
+
+  CLUSTERED BY
+  
+   Partitions created on the table will be bucketed into fixed buckets 
based on the column specified for bucketing.
+   NOTE:Bucketing is an optimization technique that uses buckets 
(and bucketing columns) to determine data partitioning and avoid data shuffle.
+  
+
+
+
+  LOCATION
+  Path to the directory where table data is stored, could be filesystem, 
HDFS, etc.
+
+
+
+  COMMENT
+  Table comments are added.
+
+
+
+  TBLPROPERTIES
+  Table properties that has to be set are specified such as 
`created.by.user`, `owner`, etc.
+  
+
+
+
+  AS select_statement
+  The table is populated using the data from the select statement.
+
+
+### Examples
+{% highlight sql %}
+
+--Using data source
+CREATE TABLE Student (width INT, length INT, height INT) USING CSV
 
 Review comment:
   Perhaps change the column names to id, name, and age to be more meaningful? 
Also, can you please put a semicolon at the end of each example, to be 
consistent with the other docs?
   
   cc @huaxingao can you please check on the consistency part if you have some 
time ?
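
The NOTE on `CLUSTERED BY` in the quoted diff says bucketing determines data partitioning and avoids data shuffle. A minimal Python sketch of that idea follows. It is an illustration only, not Spark's implementation: the function names are hypothetical, and plain modulo on an integer key stands in for whatever hash function the engine really uses. The point it shows is that when two tables are bucketed on the join key with the same number of buckets, matching keys always land in the same bucket id, so each bucket pair can be joined independently.

```python
from collections import defaultdict


def bucket_id(key: int, num_buckets: int) -> int:
    # Assumption: plain modulo stands in for the engine's real hash function.
    return key % num_buckets


def bucketize(rows, key_index, num_buckets):
    """Group rows into lists keyed by the bucket id of the chosen column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right, num_buckets):
    """Inner-join two row lists on column 0, one bucket pair at a time."""
    left_buckets = bucketize(left, 0, num_buckets)
    right_buckets = bucketize(right, 0, num_buckets)
    joined = []
    for b in range(num_buckets):
        # Equal keys share a bucket id, so no cross-bucket lookups are needed.
        for l_row in left_buckets.get(b, []):
            for r_row in right_buckets.get(b, []):
                if l_row[0] == r_row[0]:
                    joined.append((l_row[0], l_row[1], r_row[1]))
    return joined


# Hypothetical sample data: (id, name) and (id, age).
students = [(1, "Ann"), (2, "Bob"), (5, "Cy")]
ages = [(2, 21), (5, 30), (7, 44)]
result = bucketed_join(students, ages, num_buckets=4)
```

Rows whose ids fall in different buckets are never compared, which is the work a shuffle would otherwise have to arrange at query time.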



[GitHub] [spark] AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule 
PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-575020896
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21595/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-575020896
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21595/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-575020889
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule 
PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-575020889
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] 
Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#issuecomment-575019400
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116814/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-15 Thread GitBox

SparkQA commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule 
PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-575020489
 
 
   **[Test build #116823 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116823/testReport)**
 for PR 26805 at commit 
[`f89947b`](https://github.com/apache/spark/commit/f89947beff6ae5bf61fd83cd646db47381c8db57).



[GitHub] [spark] cloud-fan closed pull request #27209: [SPARK-29450][SS][2.4] Measure the number of output rows for streaming aggregation with append mode

2020-01-15 Thread GitBox

cloud-fan closed pull request #27209: [SPARK-29450][SS][2.4] Measure the number 
of output rows for streaming aggregation with append mode
URL: https://github.com/apache/spark/pull/27209
 
 
   



[GitHub] [spark] cloud-fan commented on issue #27209: [SPARK-29450][SS][2.4] Measure the number of output rows for streaming aggregation with append mode

2020-01-15 Thread GitBox

cloud-fan commented on issue #27209: [SPARK-29450][SS][2.4] Measure the number 
of output rows for streaming aggregation with append mode
URL: https://github.com/apache/spark/pull/27209#issuecomment-575020190
 
 
   thanks, merging to 2.4!



[GitHub] [spark] AmplabJenkins removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] 
Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#issuecomment-575019391
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support 
type hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#issuecomment-575019400
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116814/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-15 Thread GitBox

SparkQA removed a comment on issue #27165: [SPARK-28264][PYTHON][SQL] Support 
type hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#issuecomment-574980876
 
 
   **[Test build #116814 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116814/testReport)**
 for PR 27165 at commit 
[`b0447a7`](https://github.com/apache/spark/commit/b0447a7cd4b4cc3e4881f7e9ad264bea656633ac).



[GitHub] [spark] AmplabJenkins commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support 
type hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#issuecomment-575019391
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types

2020-01-15 Thread GitBox

SparkQA commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type 
hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#issuecomment-575019064
 
 
   **[Test build #116814 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116814/testReport)**
 for PR 27165 at commit 
[`b0447a7`](https://github.com/apache/spark/commit/b0447a7cd4b4cc3e4881f7e9ad264bea656633ac).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to 
chill 0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-575015480
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116813/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27227: [SPARK-29290][CORE] Update to 
chill 0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-575015475
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 
0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-575015475
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

SparkQA removed a comment on issue #27227: [SPARK-29290][CORE] Update to chill 
0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-574980880
 
 
   **[Test build #116813 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116813/testReport)**
 for PR 27227 at commit 
[`89e2af4`](https://github.com/apache/spark/commit/89e2af48f2a5eb91087aa0a65b00027305c0439d).



[GitHub] [spark] SparkQA commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

SparkQA commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-575015232
 
 
   **[Test build #116813 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116813/testReport)**
 for PR 27227 at commit 
[`89e2af4`](https://github.com/apache/spark/commit/89e2af48f2a5eb91087aa0a65b00027305c0439d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 0.9.5

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27227: [SPARK-29290][CORE] Update to chill 
0.9.5
URL: https://github.com/apache/spark/pull/27227#issuecomment-575015480
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116813/
   Test FAILed.



[GitHub] [spark] bmarcott edited a comment on issue #27207: [WIP][SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.

2020-01-15 Thread GitBox

bmarcott edited a comment on issue #27207: [WIP][SPARK-18886][CORE] Make 
Locality wait time measure resource under utilization due to delay scheduling.
URL: https://github.com/apache/spark/pull/27207#issuecomment-575011886
 
 
   @tgravescs 
   Thanks for the comments.
   
   > so please update the description with information from the other PR
   
   Which one of my snippets from the previous PR was most clear to you? I can 
put that one in the description.
   
   > One thing I don't think I like
   
   Really good point on this scenario. It's bad even for the same scenario that 
you described but where the all resource offer has only 1 executor, and the 
first taskset accepts it.
   Let me know if you have any good ideas here  
   
   Is it OK if I make follow-up changes, such as variable names, unit tests, and 
other backend schedulers, only once we iron out the problematic scenarios?



[GitHub] [spark] kiszk commented on issue #23988: [SPARK-26509][SQL] Parquet DELTA_BYTE_ARRAY is not supported in Spark 2.x's Vectorized Reader

2020-01-15 Thread GitBox

kiszk commented on issue #23988: [SPARK-26509][SQL] Parquet DELTA_BYTE_ARRAY is 
not supported in Spark 2.x's Vectorized Reader
URL: https://github.com/apache/spark/pull/23988#issuecomment-575012114
 
 
   ping @nandorKollar



[GitHub] [spark] bmarcott commented on issue #27207: [WIP][SPARK-18886][CORE] Make Locality wait time measure resource under utilization due to delay scheduling.

2020-01-15 Thread GitBox

bmarcott commented on issue #27207: [WIP][SPARK-18886][CORE] Make Locality wait 
time measure resource under utilization due to delay scheduling.
URL: https://github.com/apache/spark/pull/27207#issuecomment-575011886
 
 
   @tgravescs 
   Thanks for the comments.
   
   > so please update the description with information from the other PR
   
   Which one of my snippets from the previous PR was most clear to you? I can 
put that one in the description.
   
   > One thing I don't think I like
   
   Really good point on this scenario. It's bad even for the same scenario that 
you described but where the all resource offer has only 1 executor, and the 
first taskset accepts it.
   Let me know if you have any good ideas here  



[GitHub] [spark] AmplabJenkins removed a comment on issue #27231: [SPARK-28478] [SQL] Remove redundant null checks

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27231: [SPARK-28478] [SQL] Remove 
redundant null checks
URL: https://github.com/apache/spark/pull/27231#issuecomment-575011286
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins commented on issue #27231: [SPARK-28478] [SQL] Remove redundant null checks

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27231: [SPARK-28478] [SQL] Remove redundant 
null checks
URL: https://github.com/apache/spark/pull/27231#issuecomment-575011584
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins commented on issue #27231: [SPARK-28478] [SQL] Remove redundant null checks

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27231: [SPARK-28478] [SQL] Remove redundant 
null checks
URL: https://github.com/apache/spark/pull/27231#issuecomment-575011286
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] davidvrba opened a new pull request #27231: [SPARK-28478] [SQL] Remove redundant null checks

2020-01-15 Thread GitBox

davidvrba opened a new pull request #27231: [SPARK-28478] [SQL] Remove 
redundant null checks
URL: https://github.com/apache/spark/pull/27231
 
 
   ### What changes were proposed in this pull request?
   The purpose of this PR is to remove explicit null checks when they are not 
needed, in order to simplify the generated code. Here is one example: 
   
   Expressions of this type
   ```
   CASE WHEN isnull(title#5) THEN title#5 ELSE substring(title#5, 0, 3) END
   ```
   are simplified to 
   ```
   substring(title#5, 0, 3)
   ```
   if the considered expression is null-intolerant.
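
The rewrite described above can be sketched on a toy expression tree. This is only an illustration under stated assumptions: `Expr`, `Attr`, `IsNull`, `Substring`, `CaseWhen`, `isNullIntolerant` and `simplify` are hypothetical stand-ins defined here, not Catalyst classes and not the code in this PR.

```scala
// Toy model of the CASE WHEN simplification; none of these types are Spark's.
object NullCheckSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class IsNull(child: Expr) extends Expr
  final case class Substring(str: Expr, pos: Int, len: Int) extends Expr
  final case class CaseWhen(cond: Expr, thenValue: Expr, elseValue: Expr) extends Expr

  // Assumption: an expression is "null-intolerant" if it returns null
  // whenever one of its direct inputs is null. Only Substring qualifies here.
  def isNullIntolerant(e: Expr): Boolean = e match {
    case _: Substring => true
    case _            => false
  }

  def directInputs(e: Expr): Seq[Expr] = e match {
    case Substring(str, _, _) => Seq(str)
    case IsNull(c)            => Seq(c)
    case CaseWhen(c, t, f)    => Seq(c, t, f)
    case _: Attr              => Nil
  }

  // CASE WHEN isnull(a) THEN a ELSE f(a) END  ==>  f(a)
  // Safe only because f(a) already yields null when a is null.
  def simplify(e: Expr): Expr = e match {
    case CaseWhen(IsNull(a), t, other)
        if a == t && isNullIntolerant(other) && directInputs(other).contains(a) =>
      other
    case _ => e
  }
}
```

The guard matters: if the `ELSE` branch were not null-intolerant, or did not take the checked attribute as an input, dropping the null check could change the result.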
   
   
   
   
   ### Why are the changes needed?
   It simplifies expressions in the query plan, which can enable further 
optimization because the generated code is simpler.
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   New tests are added.
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d.

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27230: [SPARK-27868][CORE] Partially 
rollback previous change #09ed64d.
URL: https://github.com/apache/spark/pull/27230#issuecomment-575006734
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins commented on issue #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d.

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27230: [SPARK-27868][CORE] Partially rollback 
previous change #09ed64d.
URL: https://github.com/apache/spark/pull/27230#issuecomment-575007102
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] xCASx commented on issue #24732: [SPARK-27868][core] Better default value and documentation for socket server backlog.

2020-01-15 Thread GitBox

xCASx commented on issue #24732: [SPARK-27868][core] Better default value and 
documentation for socket server backlog.
URL: https://github.com/apache/spark/pull/24732#issuecomment-575006840
 
 
   Pull request has been sent. Not sure if this case requires a separate Jira 
ticket. Reused original SPARK-27868.



[GitHub] [spark] AmplabJenkins commented on issue #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d.

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27230: [SPARK-27868][CORE] Partially rollback 
previous change #09ed64d.
URL: https://github.com/apache/spark/pull/27230#issuecomment-575006734
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] xCASx opened a new pull request #27230: [SPARK-27868][CORE] Partially rollback previous change #09ed64d.

2020-01-15 Thread GitBox

xCASx opened a new pull request #27230: [SPARK-27868][CORE] Partially rollback 
previous change #09ed64d.
URL: https://github.com/apache/spark/pull/27230
 
 
   The default value for backLog is set back to -1, as any other value may 
break existing configurations by overriding Netty's default, 
io.netty.util.NetUtil#SOMAXCONN. The documentation is adjusted accordingly.
   See discussion thread: https://github.com/apache/spark/pull/24732
   
   
   
   ### What changes were proposed in this pull request?
   Partial rollback of https://github.com/apache/spark/pull/24732 (default for 
backLog set back to -1).
   
   
   
   ### Why are the changes needed?
   The previous change introduces a backward incompatibility by overriding the 
default of Netty's `io.netty.util.NetUtil#SOMAXCONN`.
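
Why a default of -1 preserves existing behaviour can be shown with a small sketch. It assumes that a non-positive configured value means "do not set the option at all", so the transport library's own default applies. `assumedNettyDefault`, `backlogOverride` and `effectiveBacklog` are hypothetical names, and the value 128 is illustrative only; the real default comes from the operating system via `NetUtil.SOMAXCONN`.

```scala
// Sentinel-default sketch; no Netty or Spark classes are used here.
object BacklogSketch {
  // Illustrative stand-in for the platform default exposed by Netty.
  val assumedNettyDefault: Int = 128

  // Some(n) only for an explicit positive setting; None means
  // "leave the library default untouched".
  def backlogOverride(configured: Int): Option[Int] =
    if (configured > 0) Some(configured) else None

  def effectiveBacklog(configured: Int): Int =
    backlogOverride(configured).getOrElse(assumedNettyDefault)
}
```

With any positive default in the configuration, the override would always be applied, which is the incompatibility described above.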
   
   



[GitHub] [spark] AmplabJenkins removed a comment on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27197: [SPARK-30507][SQL] 
TableCalalog reserved properties shoudn't be changed via options or tblpropeties
URL: https://github.com/apache/spark/pull/27197#issuecomment-575005169
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116806/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27197: [SPARK-30507][SQL] 
TableCalalog reserved properties shoudn't be changed via options or tblpropeties
URL: https://github.com/apache/spark/pull/27197#issuecomment-575005165
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-15 Thread GitBox

fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r367247006
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
 ##
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.analysis.CastSupport
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, CatalogTable, CatalogTablePartition, HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeSet, Expression, ExpressionSet, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with CastSupport {
+
+  override val conf = session.sessionState.conf
+
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def getPartitionKeyFilters(
+      filters: Seq[Expression],
+      relation: HiveTableRelation): ExpressionSet = {
+    val normalizedFilters = DataSourceStrategy.normalizeExprs(
+      filters.filter(f => f.deterministic && !SubqueryExpression.hasSubquery(f)), relation.output)
+    val partitionColumnSet = AttributeSet(relation.partitionCols)
+    ExpressionSet(normalizedFilters.filter { f =>
+      !f.references.isEmpty && f.references.subsetOf(partitionColumnSet)
+    })
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table.
+   */
+  private def prunePartitions(
+      relation: HiveTableRelation,
+      partitionFilters: ExpressionSet): Seq[CatalogTablePartition] = {
+    if (conf.metastorePartitionPruning) {
 
 Review comment:
   Do you mean adding a config to control whether we prune Hive table 
partitions in the optimization phase?
   We could check the config in `apply`, and it could default to true.
   The same config could also be checked in `PruneFileSourcePartitions.apply`.
   Is that what you expect?
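
The idea raised in this comment, gating a pruning rule on a boolean config that defaults to true, could look roughly like the sketch below. It is a toy under stated assumptions: `Conf`, `prunePartitionsInOptimizer`, `Plan` and `PruneRule` are hypothetical names, not Spark's `SQLConf`, `LogicalPlan` or `Rule`, and no such config key is confirmed by this thread.

```scala
// Toy model of a config-gated optimizer rule; not Spark code.
object PruneRuleGateSketch {
  // Hypothetical flag, defaulting to true as suggested in the comment.
  final case class Conf(prunePartitionsInOptimizer: Boolean = true)

  // A "plan" here is just a list of partition names.
  final case class Plan(partitions: Seq[String])

  final class PruneRule(conf: Conf, keep: String => Boolean) {
    // When the flag is off the plan is returned untouched.
    def apply(plan: Plan): Plan =
      if (!conf.prunePartitionsInOptimizer) plan
      else plan.copy(partitions = plan.partitions.filter(keep))
  }
}
```

Checking the flag at the top of `apply` keeps the rule in the optimizer batch while making it a no-op when disabled.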



[GitHub] [spark] AmplabJenkins commented on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27197: [SPARK-30507][SQL] TableCalalog 
reserved properties shoudn't be changed via options or tblpropeties
URL: https://github.com/apache/spark/pull/27197#issuecomment-575005165
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties

2020-01-15 Thread GitBox

SparkQA removed a comment on issue #27197: [SPARK-30507][SQL] TableCalalog 
reserved properties shoudn't be changed via options or tblpropeties
URL: https://github.com/apache/spark/pull/27197#issuecomment-574960835
 
 
   **[Test build #116806 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116806/testReport)**
 for PR 27197 at commit 
[`9e19d27`](https://github.com/apache/spark/commit/9e19d277f914b5c00c0860927613cfc494536e19).



[GitHub] [spark] AmplabJenkins commented on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27197: [SPARK-30507][SQL] TableCalalog 
reserved properties shoudn't be changed via options or tblpropeties
URL: https://github.com/apache/spark/pull/27197#issuecomment-575005169
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116806/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27197: [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties

2020-01-15 Thread GitBox

SparkQA commented on issue #27197: [SPARK-30507][SQL] TableCalalog reserved 
properties shoudn't be changed via options or tblpropeties
URL: https://github.com/apache/spark/pull/27197#issuecomment-575004931
 
 
   **[Test build #116806 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116806/testReport)**
 for PR 27197 at commit 
[`9e19d27`](https://github.com/apache/spark/commit/9e19d277f914b5c00c0860927613cfc494536e19).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] 
Correct aggregated values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/27229#issuecomment-575003044
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] 
Correct aggregated values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/27229#issuecomment-575003048
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21594/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated

2020-01-15 Thread GitBox

SparkQA commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct 
aggregated values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/27229#issuecomment-575004668
 
 
   **[Test build #116822 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116822/testReport)**
 for PR 27229 at commit 
[`be0ce63`](https://github.com/apache/spark/commit/be0ce63e1289322791fcb699ead1918a319576db).



[GitHub] [spark] z47xu commented on issue #24421: [SPARK-12312][SQL]Support Kerberos login in JDBC connector

2020-01-15 Thread GitBox

z47xu commented on issue #24421: [SPARK-12312][SQL]Support Kerberos login in 
JDBC connector
URL: https://github.com/apache/spark/pull/24421#issuecomment-575003039
 
 
   > @vanzin may I ask to close this? I think it won't continue.
   > I'm planning to pick this up.
   
   What is your plan? 



[GitHub] [spark] AmplabJenkins commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct 
aggregated values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/27229#issuecomment-575003048
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21594/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct 
aggregated values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/27229#issuecomment-575003044
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is 
not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575002718
 
 
   **[Test build #116821 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116821/testReport)**
 for PR 27096 at commit 
[`9e895bd`](https://github.com/apache/spark/commit/9e895bd16162577196b83add1ef3d99bb4fc0d08).



[GitHub] [spark] maropu commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated

2020-01-15 Thread GitBox

maropu commented on issue #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct 
aggregated values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/27229#issuecomment-575002671
 
 
   cc: @dongjoon-hyun 



[GitHub] [spark] maropu opened a new pull request #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct aggregated values when grouping sets are duplicated

2020-01-15 Thread GitBox

maropu opened a new pull request #27229: [SPARK-29708][SQL][BRANCH-2.4] Correct 
aggregated values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/27229
 
 
   
   
   ### What changes were proposed in this pull request?
   
   This PR fixes wrong aggregated values in `GROUPING SETS` when a query 
contains duplicated grouping sets (e.g., `GROUPING SETS ((k1),(k1))`).
   
   For example:
   ```
   scala> spark.table("t").show()
   +---+---+---+
   | k1| k2|  v|
   +---+---+---+
   |  0|  0|  3|
   +---+---+---+
   
   scala> sql("""select grouping_id(), k1, k2, sum(v) from t group by grouping sets ((k1),(k1,k2),(k2,k1),(k1,k2))""").show()
   +-------------+---+----+------+
   |grouping_id()| k1|  k2|sum(v)|
   +-------------+---+----+------+
   |            0|  0|   0|     9| <-- wrong aggregate value and the correct answer is `3`
   |            1|  0|null|     3|
   +-------------+---+----+------+
   
   // PostgreSQL case
   postgres=#  select k1, k2, sum(v) from t group by grouping sets ((k1),(k1,k2),(k2,k1),(k1,k2));
    k1 |  k2  | sum 
   ----+------+-----
     0 |    0 |   3
     0 |    0 |   3
     0 |    0 |   3
     0 | NULL |   3
   (4 rows)
   
   // Hive case
   hive> select GROUPING__ID, k1, k2, sum(v) from t group by k1, k2 grouping sets ((k1),(k1,k2),(k2,k1),(k1,k2));
   1    0    NULL    3
   0    0    0       3
   ```
   [MS SQL Server has the same behaviour as 
PostgreSQL](https://github.com/apache/spark/pull/26961#issuecomment-573638442). 
This PR follows the behaviour of PostgreSQL/SQL Server; it adds one more 
virtual attribute in `Expand` to avoid wrongly grouping rows with the same 
grouping ID.
   
   This is the #26961 backport for `branch-2.4`.
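
The effect of the extra virtual attribute can be reproduced with plain Scala collections. The sketch below is a simplified model, not the `Expand` operator: `Expanded`, `expand`, `sums` and the `pos` field (the position of the grouping set in the query) are hypothetical names chosen for illustration. It uses the single row `(k1=0, k2=0, v=3)` and the four grouping sets from the example above.

```scala
// Simplified model of expanding rows per grouping set, then aggregating.
object GroupingSetsSketch {
  final case class Expanded(k1: Option[Int], k2: Option[Int], gid: Int, pos: Int, v: Int)

  // GROUPING SETS ((k1),(k1,k2),(k2,k1),(k1,k2))
  val groupingSets: Seq[Set[String]] =
    Seq(Set("k1"), Set("k1", "k2"), Set("k2", "k1"), Set("k1", "k2"))

  // Emit one output row per (input row, grouping set).
  def expand(rows: Seq[(Int, Int, Int)]): Seq[Expanded] =
    for {
      (k1, k2, v) <- rows
      (gs, pos)   <- groupingSets.zipWithIndex
    } yield {
      // A bit is 1 when the column is absent from the set (k1 high bit, k2 low bit).
      val gid = (if (gs("k1")) 0 else 2) | (if (gs("k2")) 0 else 1)
      Expanded(
        if (gs("k1")) Some(k1) else None,
        if (gs("k2")) Some(k2) else None,
        gid, pos, v)
    }

  // Sum v per group, optionally including the grouping-set position in the key.
  def sums(es: Seq[Expanded], usePos: Boolean): Seq[Int] =
    es.groupBy(e => (e.k1, e.k2, e.gid, if (usePos) e.pos else 0))
      .values.map(_.map(_.v).sum).toSeq.sorted
}
```

Without the position in the key, the three `(k1,k2)` sets collapse into one group and their values are summed together; with it, each grouping set stays a separate group, which matches the four PostgreSQL rows.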
   
   ### Why are the changes needed?
   
   To fix bugs.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   The existing tests.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition 
after join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575001069
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition 
after join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575001076
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21593/
   Test PASSed.



[GitHub] [spark] bmarcott commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

bmarcott commented on issue #27096: [SPARK-28148][SQL] Repartition after join 
is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575001493
 
 
   Updated with a new SparkPlan rule and added a test that makes sure a user's 
repartition with a different numPartitions is not eliminated (we don't want 
to change the expected numPartitions).
   
   Review when you get a chance :).
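
   A rough sketch of the behavior described above, not the rule added in the PR: `Plan`, `Join`, `Repartition` and `removeRedundant` are hypothetical, and the sketch assumes the join output is already partitioned on the same keys as the repartition, so only the partition count is compared.

```scala
object RepartitionSketch {
  // Hypothetical, simplified plan nodes (not Spark's SparkPlan classes).
  sealed trait Plan
  // A join whose output is assumed to be partitioned into `numPartitions`.
  final case class Join(numPartitions: Int) extends Plan
  // A user-requested repartition on top of a child plan.
  final case class Repartition(numPartitions: Int, child: Plan) extends Plan

  // Drop the repartition only when it would not change the partition count.
  def removeRedundant(plan: Plan): Plan = plan match {
    case Repartition(n, child @ Join(m)) if n == m => child
    case other => other
  }
}
```

   A repartition to a different numPartitions falls through to the second case and is kept, so the user's expected partition count does not change.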



[GitHub] [spark] AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after 
join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575001076
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21593/
   Test PASSed.



[GitHub] [spark] MaxGekk commented on issue #27218: [SPARK-30499][SQL] Remove SQL config spark.sql.execution.pandas.respectSessionTimeZone

2020-01-15 Thread GitBox

MaxGekk commented on issue #27218: [SPARK-30499][SQL] Remove SQL config 
spark.sql.execution.pandas.respectSessionTimeZone
URL: https://github.com/apache/spark/pull/27218#issuecomment-575001185
 
 
   @HyukjinKwon I checked that the tests fail when I set timeZone to nil/none 
at 
https://github.com/apache/spark/pull/27218/files#diff-5dad4c4e6faaa6c596e8a40d1dea74f4R67
 and 
https://github.com/apache/spark/pull/27218/files#diff-a56d42b312418a8c63720c57614e76adR145



[GitHub] [spark] AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after 
join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575001069
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-15 Thread GitBox

fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r367247006
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
 ##
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.analysis.CastSupport
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, CatalogTable, CatalogTablePartition, HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeSet, Expression, ExpressionSet, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with CastSupport {
+
+  override val conf = session.sessionState.conf
+
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def getPartitionKeyFilters(
+      filters: Seq[Expression],
+      relation: HiveTableRelation): ExpressionSet = {
+    val normalizedFilters = DataSourceStrategy.normalizeExprs(
+      filters.filter(f => f.deterministic && !SubqueryExpression.hasSubquery(f)), relation.output)
+    val partitionColumnSet = AttributeSet(relation.partitionCols)
+    ExpressionSet(normalizedFilters.filter { f =>
+      !f.references.isEmpty && f.references.subsetOf(partitionColumnSet)
+    })
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table.
+   */
+  private def prunePartitions(
+      relation: HiveTableRelation,
+      partitionFilters: ExpressionSet): Seq[CatalogTablePartition] = {
+    if (conf.metastorePartitionPruning) {
 
 Review comment:
   Do you mean adding a config to control whether we should prune Hive table 
partitions?



[GitHub] [spark] SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-15 Thread GitBox

SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is 
not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-575000581
 
 
   **[Test build #116820 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116820/testReport)**
 for PR 27096 at commit 
[`2517bea`](https://github.com/apache/spark/commit/2517beae0b2382a9171239d5682364957135318c).



[GitHub] [spark] AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-574998912
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-574998912
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

AmplabJenkins commented on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-574998918
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21592/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-574998918
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21592/
   Test PASSed.



[GitHub] [spark] cloud-fan commented on issue #26929: [SPARK-30289][SQL] DSv2's partitioning should not accept nested columns

2020-01-15 Thread GitBox

cloud-fan commented on issue #26929: [SPARK-30289][SQL] DSv2's partitioning 
should not accept nested columns
URL: https://github.com/apache/spark/pull/26929#issuecomment-574998506
 
 
   I think Spark should support all kinds of PARTITION BY expressions as long 
as they can be translated to a v2 `Transform`. The catalog implementation should 
decide whether it supports them. For example, the Hive catalog doesn't support 
partitioning by nested columns.
   
   For this particular test failure, I think we should fix `InMemoryTable` so that, 
when flattening the fields, it keeps the full column path, not just the name.
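
   To illustrate what keeping the full column path means when flattening, here is a minimal sketch. It is not the `InMemoryTable` code: `DataType`, `Leaf`, `Struct`, `Field` and `flatten` are hypothetical, simplified stand-ins for Spark's schema types.

```scala
object FlattenSketch {
  // Hypothetical, simplified schema model (not Spark's StructType/StructField).
  sealed trait DataType
  case object Leaf extends DataType
  final case class Struct(fields: Seq[Field]) extends DataType
  final case class Field(name: String, dataType: DataType)

  // Flatten nested fields, returning the full path from the root for each leaf
  // instead of only the leaf's own name.
  def flatten(fields: Seq[Field], prefix: Seq[String] = Nil): Seq[Seq[String]] =
    fields.flatMap { f =>
      val path = prefix :+ f.name
      f.dataType match {
        case Struct(children) => flatten(children, path)
        case Leaf => Seq(path)
      }
    }

  // Example schema: a top-level `id` and a nested `point` struct with `x` and `y`.
  val schema: Seq[Field] = Seq(
    Field("id", Leaf),
    Field("point", Struct(Seq(Field("x", Leaf), Field("y", Leaf)))))
}
```

   With paths kept this way, a nested `point.x` stays distinguishable from a top-level column that happens to be named `x`, which would be ambiguous if only the leaf name were recorded.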



[GitHub] [spark] SparkQA commented on issue #27226: [SPARK-30524] [SQL] Disable OptimizeSkewedJoin rule when introducing additional shuffle

2020-01-15 Thread GitBox

SparkQA commented on issue #27226: [SPARK-30524] [SQL] Disable 
OptimizeSkewedJoin rule when introducing additional shuffle
URL: https://github.com/apache/spark/pull/27226#issuecomment-574998541
 
 
   **[Test build #116819 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116819/testReport)**
 for PR 27226 at commit 
[`c37f397`](https://github.com/apache/spark/commit/c37f39774825c052cd93f35288ff105c8292c343).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add setInputCol/setOutputCol in OHEModel

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add 
setInputCol/setOutputCol in OHEModel
URL: https://github.com/apache/spark/pull/27228#issuecomment-574997996
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add setInputCol/setOutputCol in OHEModel

2020-01-15 Thread GitBox

AmplabJenkins removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add 
setInputCol/setOutputCol in OHEModel
URL: https://github.com/apache/spark/pull/27228#issuecomment-574998004
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116812/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add setInputCol/setOutputCol in OHEModel

2020-01-15 Thread GitBox

SparkQA removed a comment on issue #27228: [SPARK-29565][FOLLOWUP] add 
setInputCol/setOutputCol in OHEModel
URL: https://github.com/apache/spark/pull/27228#issuecomment-574980877
 
 
   **[Test build #116812 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116812/testReport)**
 for PR 27228 at commit 
[`b109d90`](https://github.com/apache/spark/commit/b109d908b691c48bf91972aca2d0eb411edc6949).


