[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19916 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84575/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19916 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19893 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84574/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19912#discussion_r155412840 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -64,50 +64,91 @@ object ConstantFolding extends Rule[LogicalPlan] { * }}} * * Approach used: - * - Start from AND operator as the root - * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they - * don't have a `NOT` or `OR` operator in them * - Populate a mapping of attribute => constant value by looking at all the equals predicates * - Using this mapping, replace occurrence of the attributes with the corresponding constant values * in the AND node. */ object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { - private def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find { -case _: Not | _: Or => true -case _ => false - }.isDefined - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case f: Filter => f transformExpressionsUp { - case and: And => -val conjunctivePredicates = - splitConjunctivePredicates(and) -.filter(expr => expr.isInstanceOf[EqualTo] || expr.isInstanceOf[EqualNullSafe]) -.filterNot(expr => containsNonConjunctionPredicates(expr)) - -val equalityPredicates = conjunctivePredicates.collect { - case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e) - case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e) - case e @ EqualNullSafe(left: AttributeReference, right: Literal) => ((left, right), e) - case e @ EqualNullSafe(left: Literal, right: AttributeReference) => ((right, left), e) -} - -val constantsMap = AttributeMap(equalityPredicates.map(_._1)) -val predicates = equalityPredicates.map(_._2).toSet +case f: Filter => + val (newCondition, _) = traverse(f.condition, replaceChildren = true) + if (newCondition.isDefined) { +f.copy(condition = newCondition.get) + } else { +f + } + } -def replaceConstants(expression: Expression) = expression transform { - case a: AttributeReference => -constantsMap.get(a) match { - case Some(literal) => literal - case None => a -} + /** + * Traverse a condition as a tree and replace attributes with constant values. + * - If the child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], propagate the mapping + * of attribute => constant. + * - If the current [[And]] node is not child of another [[And]], replace occurrence of the + * attributes with the corresponding constant values in both children with propagated mapping. --- End diff -- Also add the comments to explain `Or` and `Not` processing below. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19893 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19902 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19902 **[Test build #84582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84582/testReport)** for PR 19902 at commit [`7d87edc`](https://github.com/apache/spark/commit/7d87ed0573554b6b43e0cac17994652397bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19902 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84582/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19222 **[Test build #84586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84586/testReport)** for PR 19222 at commit [`2ca374e`](https://github.com/apache/spark/commit/2ca374ee233f5edb9bb206b06d7383c0e22dd25e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84586/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19916 https://github.com/apache/spark/blob/e98f9647f44d1071a6b070db070841b8cda6bd7a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1123 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user merlintang commented on the issue: https://github.com/apache/spark/pull/19885 I have added this test case for the URI comparing based on Steve's comments. I have tested this in my local vm, it pass the test. meanwhile, for the hdfs://namenode1/path1 hdfs://namenode1:8020/path2 , the default port number of hdfs can be got. thus, they also matched. below is the test case: test("compare URI for filesystem") { //case 1 var srcUri = new URI("file:///file1") var dstUri = new URI("file:///file2") assert(Client.compareUri(srcUri, dstUri) == true) //case 2 srcUri = new URI("file:///c:file1") dstUri = new URI("file://c:file2") assert(Client.compareUri(srcUri, dstUri) == true) //case 3 srcUri = new URI("file://host/file1") dstUri = new URI("file://host/file2") assert(Client.compareUri(srcUri, dstUri) == true) //case 4 srcUri = new URI("wasb://bucket1@user") dstUri = new URI("wasb://bucket1@user/") assert(Client.compareUri(srcUri, dstUri) == true) //case 5 srcUri = new URI("hdfs:/path1") dstUri = new URI("hdfs:/path2") assert(Client.compareUri(srcUri, dstUri) == true) //case 6 srcUri = new URI("file:///file1") dstUri = new URI("file://host/file2") assert(Client.compareUri(srcUri, dstUri) == false) //case 7 srcUri = new URI("file://host/file1") dstUri = new URI("file:///file2") assert(Client.compareUri(srcUri, dstUri) == false) //case 8 srcUri = new URI("file://host/file1") dstUri = new URI("file://host2/file2") assert(Client.compareUri(srcUri, dstUri) == false) //case 9 srcUri = new URI("wasb://bucket1@user") dstUri = new URI("wasb://bucket2@user/") assert(Client.compareUri(srcUri, dstUri) == false) //case 10 srcUri = new URI("wasb://bucket1@user") dstUri = new URI("wasb://bucket1@user2/") assert(Client.compareUri(srcUri, dstUri) == false) //case 11 srcUri = new URI("s3a://user@pass:bucket1/") dstUri = new URI("s3a://user2@pass2:bucket1/") assert(Client.compareUri(srcUri, dstUri) == false) //case 12 srcUri = new URI("hdfs://namenode1/path1") dstUri = new URI("hdfs://namenode1:8080/path2") assert(Client.compareUri(srcUri, dstUri) == false) //case 13 srcUri = new URI("hdfs://namenode1:8020/path1") dstUri = new URI("hdfs://namenode1:8080/path2") assert(Client.compareUri(srcUri, dstUri) == false) } --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19912 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19917 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19917 **[Test build #84590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84590/testReport)** for PR 19917 at commit [`2535b21`](https://github.com/apache/spark/commit/2535b214ea5349209aecaa4b93868df548077c31). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19917 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84590/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19916 **[Test build #84575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84575/testReport)** for PR 19916 at commit [`3004a69`](https://github.com/apache/spark/commit/3004a69058e5fce61fb2ea7b9bd7c883c5fa89ed). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19854: [SPARK-22660][BUILD] Use position() and limit() to fix a...
Github user kellyzly commented on the issue: https://github.com/apache/spark/pull/19854 thanks @HyukjinKwon ,@srowen, @viirya 's review --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19893 **[Test build #84574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84574/testReport)** for PR 19893 at commit [`1a64209`](https://github.com/apache/spark/commit/1a64209e0208c1f6d1e7a54db2e8cae958f409cf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19792: [SPARK-22566][PYTHON] Better error message for `_...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19792#discussion_r155424467 --- Diff: python/pyspark/sql/session.py --- @@ -405,14 +401,9 @@ def _createFromLocal(self, data, schema): data = list(data) if schema is None or isinstance(schema, (list, tuple)): -struct = self._inferSchemaFromList(data) -converter = _create_converter(struct) +schema = self._inferSchemaFromList(data, names=schema) +converter = _create_converter(schema) data = map(converter, data) -if isinstance(schema, (list, tuple)): --- End diff -- ```python >>> spark.createDataFrame([{'a': 1}], ["b"]) DataFrame[a: bigint] ``` Hm.. sorry actually I think we should not remove this one and L385 because we should primarily respect user's input and it should be `DataFrame[b: bigint]`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19882#discussion_r155429024 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala --- @@ -0,0 +1,368 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.orc + +import java.nio.charset.StandardCharsets +import java.sql.{Date, Timestamp} + +import scala.collection.JavaConverters._ + +import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument} + +import org.apache.spark.sql.{Column, DataFrame} +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.execution.datasources.{DataSourceStrategy, HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + +/** + * A test suite that tests Apache ORC filter API based filter pushdown optimization. + */ +class OrcFilterSuite extends OrcTest with SharedSQLContext { + + private def checkFilterPredicate( + df: DataFrame, + predicate: Predicate, + checker: (SearchArgument) => Unit): Unit = { +val output = predicate.collect { case a: Attribute => a }.distinct +val query = df + .select(output.map(e => Column(e)): _*) + .where(Column(predicate)) + +var maybeRelation: Option[HadoopFsRelation] = None +val maybeAnalyzedPredicate = query.queryExecution.optimizedPlan.collect { + case PhysicalOperation(_, filters, LogicalRelation(orcRelation: HadoopFsRelation, _, _, _)) => +maybeRelation = Some(orcRelation) +filters +}.flatten.reduceLeftOption(_ && _) +assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from the given query") + +val (_, selectedFilters, _) = + DataSourceStrategy.selectFilters(maybeRelation.get, maybeAnalyzedPredicate.toSeq) +assert(selectedFilters.nonEmpty, "No filter is pushed down") + +val maybeFilter = OrcFilters.createFilter(query.schema, selectedFilters) +assert(maybeFilter.isDefined, s"Couldn't generate filter predicate for $selectedFilters") +checker(maybeFilter.get) + } + + private def checkFilterPredicate + (predicate: Predicate, filterOperator: PredicateLeaf.Operator) + (implicit df: DataFrame): Unit = { +def checkComparisonOperator(filter: SearchArgument) = { + val operator = filter.getLeaves.asScala + assert(operator.map(_.getOperator).contains(filterOperator)) +} +checkFilterPredicate(df, predicate, checkComparisonOperator) + } + + private def checkFilterPredicate + (predicate: Predicate, stringExpr: String) + (implicit df: DataFrame): Unit = { +def checkLogicalOperator(filter: SearchArgument) = { + assert(filter.toString == stringExpr) +} +checkFilterPredicate(df, predicate, checkLogicalOperator) + } + + private def checkNoFilterPredicate + (predicate: Predicate) + (implicit df: DataFrame): Unit = { +val output = predicate.collect { case a: Attribute => a }.distinct +val query = df + .select(output.map(e => Column(e)): _*) + .where(Column(predicate)) + +var maybeRelation: Option[HadoopFsRelation] = None +val maybeAnalyzedPredicate = query.queryExecution.optimizedPlan.collect { + case PhysicalOperation(_, filters, LogicalRelation(orcRelation: HadoopFsRelation, _, _, _)) => +maybeRelation = Some(orcRelation) +filters +}.flatten.reduceLeftOption(_ && _) +assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from the given query") + +val (_, selectedFilters, _) = +
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19882 I actually suggested similarly before: ``` org.apache.spark.sql.execution.datasources.csv.InferSchema org.apache.spark.sql.execution.datasources.json.InferSchema ``` but I actually received an advise at that time and it became as below: ``` org.apache.spark.sql.execution.datasources.csv.CSVInferSchema org.apache.spark.sql.execution.datasources.json.JsonInferSchema ``` I actually liked `Hive` prefix it was easier to distinguish. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19683 **[Test build #84595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84595/testReport)** for PR 19683 at commit [`04b5814`](https://github.com/apache/spark/commit/04b5814ee3c545096c4a0c50148be3975fdead45). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19912 **[Test build #84593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84593/testReport)** for PR 19912 at commit [`88cd025`](https://github.com/apache/spark/commit/88cd0253465a1c2f8d697b7eaaecd3d756a06e29). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19912 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84593/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19902 **[Test build #84578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84578/testReport)** for PR 19902 at commit [`15e8ce5`](https://github.com/apache/spark/commit/15e8ce584180d014f8b1ffeb8f0e251299cb8f31). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19499: [SPARK-22279][SQL] Turn on spark.sql.hive.convertMetasto...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19499 Good. The failures are expected ones and handled by #19882 . - org.apache.spark.sql.hive.orc.OrcQuerySuite.SPARK-8501: Avoids discovery schema from empty ORC files - The test case expects a failure, but it's fixed in new OrcFileFormat. - org.apache.spark.sql.hive.orc.OrcSourceSuite.SPARK-19459/SPARK-18220: read char/varchar column written by Hive - This is VARCHAR issue. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19912 **[Test build #84596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84596/testReport)** for PR 19912 at commit [`88cd025`](https://github.com/apache/spark/commit/88cd0253465a1c2f8d697b7eaaecd3d756a06e29). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19912#discussion_r155415232 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -64,50 +64,91 @@ object ConstantFolding extends Rule[LogicalPlan] { * }}} * * Approach used: - * - Start from AND operator as the root - * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they - * don't have a `NOT` or `OR` operator in them * - Populate a mapping of attribute => constant value by looking at all the equals predicates * - Using this mapping, replace occurrence of the attributes with the corresponding constant values * in the AND node. */ object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { - private def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find { -case _: Not | _: Or => true -case _ => false - }.isDefined - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case f: Filter => f transformExpressionsUp { - case and: And => -val conjunctivePredicates = - splitConjunctivePredicates(and) -.filter(expr => expr.isInstanceOf[EqualTo] || expr.isInstanceOf[EqualNullSafe]) -.filterNot(expr => containsNonConjunctionPredicates(expr)) - -val equalityPredicates = conjunctivePredicates.collect { - case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e) - case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e) - case e @ EqualNullSafe(left: AttributeReference, right: Literal) => ((left, right), e) - case e @ EqualNullSafe(left: Literal, right: AttributeReference) => ((right, left), e) -} - -val constantsMap = AttributeMap(equalityPredicates.map(_._1)) -val predicates = equalityPredicates.map(_._2).toSet +case f: Filter => + val (newCondition, _) = traverse(f.condition, replaceChildren = true) + if (newCondition.isDefined) { +f.copy(condition = newCondition.get) + } else { +f + } + } -def replaceConstants(expression: Expression) = expression transform { - case a: AttributeReference => -constantsMap.get(a) match { - case Some(literal) => literal - case None => a -} + /** + * Traverse a condition as a tree and replace attributes with constant values. + * - If the child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], propagate the mapping + * of attribute => constant. + * - If the current [[And]] node is not child of another [[And]], replace occurrence of the + * attributes with the corresponding constant values in both children with propagated mapping. + * @param condition condition to be traversed + * @param replaceChildren whether to replace attributes with the corresponding constant values + */ + private def traverse(condition: Expression, replaceChildren: Boolean) +: (Option[Expression], Seq[((AttributeReference, Literal), BinaryComparison)]) = --- End diff -- Maybe define a type alias for `Seq[((AttributeReference, Literal), BinaryComparison)]`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19893 **[Test build #84580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84580/testReport)** for PR 19893 at commit [`fe6cd0c`](https://github.com/apache/spark/commit/fe6cd0cc727a9d4516f6af674e616f0b1f8cef93). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19882 I meant in https://github.com/apache/spark/pull/19882#pullrequestreview-81708676, I liked this `Hive` prefix here so wondered if we could do the same thing for the main codes too. Since this PR only refactors the tests, it's loosely related though. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19912 **[Test build #84593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84593/testReport)** for PR 19912 at commit [`88cd025`](https://github.com/apache/spark/commit/88cd0253465a1c2f8d697b7eaaecd3d756a06e29). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19917: Add failing test for select with a splatted stream
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19917 **[Test build #84592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84592/testReport)** for PR 19917 at commit [`c728535`](https://github.com/apache/spark/commit/c7285358a06f58ca69c75f7cec194ed27c3cf8e8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19885 **[Test build #84594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84594/testReport)** for PR 19885 at commit [`dfec0ac`](https://github.com/apache/spark/commit/dfec0ac812b690b79f416806409d96a65f4d9fe7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19717: [SPARK-22646] [Submission] Spark on Kubernetes - basic s...
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/19717 @vanzin @jerryshao @felixcheung @viirya @mridulm @ueshin @jiangxb1987 Really appreciate your time reviewing this! Do you have more comments? Given that we have at least one more PR (most likely for integration tests) that we want to get in before 2.3 code freeze, it would be ideal if this PR can be merged this week. Thank you all! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19882 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19882 **[Test build #84581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84581/testReport)** for PR 19882 at commit [`fcb2ccb`](https://github.com/apache/spark/commit/fcb2ccb64813f0fc37e3321c006937fb9cb870e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19882 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84581/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19893 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19893 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84580/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19882 @HyukjinKwon . The PR code and description is updated. - Update comments - Rename back to the originals The main reason I used prefix `Hive` is for naming consistency, but now it's restored. As you see in the PR description, we need to use `HiveOrcHadoopFsRelationSuite` inconsistently because `OrcHadoopFsRelationSuite` already exists in the same package. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19364: [SPARK-22144][SQL] ExchangeCoordinator combine the parti...
Github user liutang123 commented on the issue: https://github.com/apache/spark/pull/19364 @cloud-fan Would you please look at this when you have time? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19917 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19917 **[Test build #84592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84592/testReport)** for PR 19917 at commit [`c728535`](https://github.com/apache/spark/commit/c7285358a06f58ca69c75f7cec194ed27c3cf8e8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19912 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19917: [SPARK-22725][SQL] Add failing test for select with a sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19917 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84592/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19901 **[Test build #84588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84588/testReport)** for PR 19901 at commit [`c4691b6`](https://github.com/apache/spark/commit/c4691b6b4cd8b2ac26dd9c89243372dfec5bb913). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19882 LGTM BTW. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19882#discussion_r155410507 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala --- @@ -0,0 +1,368 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.orc + +import java.nio.charset.StandardCharsets +import java.sql.{Date, Timestamp} + +import scala.collection.JavaConverters._ + +import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument} + +import org.apache.spark.sql.{Column, DataFrame} +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.execution.datasources.{DataSourceStrategy, HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + +/** + * A test suite that tests Apache ORC filter API based filter pushdown optimization. + */ +class OrcFilterSuite extends OrcTest with SharedSQLContext { + + private def checkFilterPredicate( + df: DataFrame, + predicate: Predicate, + checker: (SearchArgument) => Unit): Unit = { +val output = predicate.collect { case a: Attribute => a }.distinct +val query = df + .select(output.map(e => Column(e)): _*) + .where(Column(predicate)) + +var maybeRelation: Option[HadoopFsRelation] = None +val maybeAnalyzedPredicate = query.queryExecution.optimizedPlan.collect { + case PhysicalOperation(_, filters, LogicalRelation(orcRelation: HadoopFsRelation, _, _, _)) => +maybeRelation = Some(orcRelation) +filters +}.flatten.reduceLeftOption(_ && _) +assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from the given query") + +val (_, selectedFilters, _) = + DataSourceStrategy.selectFilters(maybeRelation.get, maybeAnalyzedPredicate.toSeq) +assert(selectedFilters.nonEmpty, "No filter is pushed down") + +val maybeFilter = OrcFilters.createFilter(query.schema, selectedFilters) +assert(maybeFilter.isDefined, s"Couldn't generate filter predicate for $selectedFilters") +checker(maybeFilter.get) + } + + private def checkFilterPredicate + (predicate: Predicate, filterOperator: PredicateLeaf.Operator) + (implicit df: DataFrame): Unit = { +def checkComparisonOperator(filter: SearchArgument) = { + val operator = filter.getLeaves.asScala + assert(operator.map(_.getOperator).contains(filterOperator)) +} +checkFilterPredicate(df, predicate, checkComparisonOperator) + } + + private def checkFilterPredicate + (predicate: Predicate, stringExpr: String) + (implicit df: DataFrame): Unit = { +def checkLogicalOperator(filter: SearchArgument) = { + assert(filter.toString == stringExpr) +} +checkFilterPredicate(df, predicate, checkLogicalOperator) + } + + private def checkNoFilterPredicate + (predicate: Predicate) + (implicit df: DataFrame): Unit = { +val output = predicate.collect { case a: Attribute => a }.distinct +val query = df + .select(output.map(e => Column(e)): _*) + .where(Column(predicate)) + +var maybeRelation: Option[HadoopFsRelation] = None +val maybeAnalyzedPredicate = query.queryExecution.optimizedPlan.collect { + case PhysicalOperation(_, filters, LogicalRelation(orcRelation: HadoopFsRelation, _, _, _)) => +maybeRelation = Some(orcRelation) +filters +}.flatten.reduceLeftOption(_ && _) +assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from the given query") + +val (_, selectedFilters, _) = +
[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/19676 It's good to have this, sorry for late response, I will make a pass tomorrow. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19882#discussion_r155428391 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala --- @@ -0,0 +1,368 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.orc + +import java.nio.charset.StandardCharsets +import java.sql.{Date, Timestamp} + +import scala.collection.JavaConverters._ + +import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument} + +import org.apache.spark.sql.{Column, DataFrame} +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.execution.datasources.{DataSourceStrategy, HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + +/** + * A test suite that tests Apache ORC filter API based filter pushdown optimization. + */ +class OrcFilterSuite extends OrcTest with SharedSQLContext { + + private def checkFilterPredicate( + df: DataFrame, + predicate: Predicate, + checker: (SearchArgument) => Unit): Unit = { +val output = predicate.collect { case a: Attribute => a }.distinct +val query = df + .select(output.map(e => Column(e)): _*) + .where(Column(predicate)) + +var maybeRelation: Option[HadoopFsRelation] = None +val maybeAnalyzedPredicate = query.queryExecution.optimizedPlan.collect { + case PhysicalOperation(_, filters, LogicalRelation(orcRelation: HadoopFsRelation, _, _, _)) => +maybeRelation = Some(orcRelation) +filters +}.flatten.reduceLeftOption(_ && _) +assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from the given query") + +val (_, selectedFilters, _) = + DataSourceStrategy.selectFilters(maybeRelation.get, maybeAnalyzedPredicate.toSeq) +assert(selectedFilters.nonEmpty, "No filter is pushed down") + +val maybeFilter = OrcFilters.createFilter(query.schema, selectedFilters) +assert(maybeFilter.isDefined, s"Couldn't generate filter predicate for $selectedFilters") +checker(maybeFilter.get) + } + + private def checkFilterPredicate + (predicate: Predicate, filterOperator: PredicateLeaf.Operator) + (implicit df: DataFrame): Unit = { +def checkComparisonOperator(filter: SearchArgument) = { + val operator = filter.getLeaves.asScala + assert(operator.map(_.getOperator).contains(filterOperator)) +} +checkFilterPredicate(df, predicate, checkComparisonOperator) + } + + private def checkFilterPredicate + (predicate: Predicate, stringExpr: String) + (implicit df: DataFrame): Unit = { +def checkLogicalOperator(filter: SearchArgument) = { + assert(filter.toString == stringExpr) +} +checkFilterPredicate(df, predicate, checkLogicalOperator) + } + + private def checkNoFilterPredicate + (predicate: Predicate) + (implicit df: DataFrame): Unit = { +val output = predicate.collect { case a: Attribute => a }.distinct +val query = df + .select(output.map(e => Column(e)): _*) + .where(Column(predicate)) + +var maybeRelation: Option[HadoopFsRelation] = None +val maybeAnalyzedPredicate = query.queryExecution.optimizedPlan.collect { + case PhysicalOperation(_, filters, LogicalRelation(orcRelation: HadoopFsRelation, _, _, _)) => +maybeRelation = Some(orcRelation) +filters +}.flatten.reduceLeftOption(_ && _) +assert(maybeAnalyzedPredicate.isDefined, "No filter is analyzed from the given query") + +val (_, selectedFilters, _) = +
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19882 Thank you so much, @HyukjinKwon ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19917: Add failing test for select with a splatted strea...
GitHub user ash211 opened a pull request: https://github.com/apache/spark/pull/19917 Add failing test for select with a splatted stream ## What changes were proposed in this pull request? Add additional test. ## How was this patch tested? Additional test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ash211/spark aash/PDS-59284 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19917.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19917 commit 2535b214ea5349209aecaa4b93868df548077c31 Author: Andrew AshDate: 2017-12-07T00:50:27Z Add test for PDS-59284 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19882 Yup but I actually received an advise before. I suggested: ``` ``` ``` org.apache.spark.sql.execution.datasources.csv.CSVInferSchema org.apache.spark.sql.execution.datasources.json.JsonInferSchema ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19917: Add failing test for select with a splatted stream
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19917 **[Test build #84590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84590/testReport)** for PR 19917 at commit [`2535b21`](https://github.com/apache/spark/commit/2535b214ea5349209aecaa4b93868df548077c31). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19882 Definitely, my bad. For the main code, we can do later in a separate PR if needed. I hope this PR contains tests only. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19882 I am sorry, I had to clarify this ahead .. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19911: [SPARK-22717][SQL] Correct cascade default for postgres
Github user danielvdende commented on the issue: https://github.com/apache/spark/pull/19911 Sure, I'll make the changes :). I'll use this PR, seems to make sense, as it would fix truncate for postgres --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19911: [SPARK-22717][SQL] Correct cascade default for postgres
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19911 Great! I'm looking forward your new PR, @danielvdende . I can help you a little bit during review. :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19885 **[Test build #84594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84594/testReport)** for PR 19885 at commit [`dfec0ac`](https://github.com/apache/spark/commit/dfec0ac812b690b79f416806409d96a65f4d9fe7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19885 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84594/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19885 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19902 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84578/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19912#discussion_r155411680 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -64,50 +64,91 @@ object ConstantFolding extends Rule[LogicalPlan] { * }}} * * Approach used: - * - Start from AND operator as the root - * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they - * don't have a `NOT` or `OR` operator in them * - Populate a mapping of attribute => constant value by looking at all the equals predicates * - Using this mapping, replace occurrence of the attributes with the corresponding constant values * in the AND node. */ object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { - private def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find { -case _: Not | _: Or => true -case _ => false - }.isDefined - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case f: Filter => f transformExpressionsUp { - case and: And => -val conjunctivePredicates = - splitConjunctivePredicates(and) -.filter(expr => expr.isInstanceOf[EqualTo] || expr.isInstanceOf[EqualNullSafe]) -.filterNot(expr => containsNonConjunctionPredicates(expr)) - -val equalityPredicates = conjunctivePredicates.collect { - case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e) - case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e) - case e @ EqualNullSafe(left: AttributeReference, right: Literal) => ((left, right), e) - case e @ EqualNullSafe(left: Literal, right: AttributeReference) => ((right, left), e) -} - -val constantsMap = AttributeMap(equalityPredicates.map(_._1)) -val predicates = equalityPredicates.map(_._2).toSet +case f: Filter => + val (newCondition, _) = traverse(f.condition, replaceChildren = true) + if (newCondition.isDefined) { +f.copy(condition = newCondition.get) + } else { +f + } + } -def replaceConstants(expression: Expression) = expression transform { - case a: AttributeReference => -constantsMap.get(a) match { - case Some(literal) => literal - case None => a -} + /** + * Traverse a condition as a tree and replace attributes with constant values. + * - If the child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], propagate the mapping + * of attribute => constant. + * - If the current [[And]] node is not child of another [[And]], replace occurrence of the + * attributes with the corresponding constant values in both children with propagated mapping. + * @param condition condition to be traversed + * @param replaceChildren whether to replace attributes with the corresponding constant values + */ --- End diff -- Add doc to explain return value? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19499: [SPARK-22279][SQL] Turn on spark.sql.hive.convertMetasto...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19499 I'll retrigger after merging #19882 . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19902 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19902 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84579/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r155414954 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -2769,6 +2769,20 @@ class LogisticRegressionSuite LogisticRegressionSuite.allParamSettings, checkModelData) } + test("read/write with BoundsOnCoefficients") { +def checkModelData(model: LogisticRegressionModel, model2: LogisticRegressionModel): Unit = { + assert(model.getLowerBoundsOnCoefficients === model2.getLowerBoundsOnCoefficients) + assert(model.getUpperBoundsOnCoefficients === model2.getUpperBoundsOnCoefficients) --- End diff -- Sure, you can update existing read/write test or your test, as long as to check model itself rather than parameters. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r155413347 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -122,17 +124,33 @@ private[ml] object Param { /** Decodes a param value from JSON. */ def jsonDecode[T](json: String): T = { -parse(json) match { +val jValue = parse(json) +jValue match { case JString(x) => x.asInstanceOf[T] case JObject(v) => val keys = v.map(_._1) -assert(keys.contains("type") && keys.contains("values"), - s"Expect a JSON serialized vector but cannot find fields 'type' and 'values' in $json.") -JsonVectorConverter.fromJson(json).asInstanceOf[T] +if (keys.contains("class")) { + implicit val formats = DefaultFormats + val className = (jValue \ "class").extract[String] + className match { +case JsonMatrixConverter.className => + val checkFields = Array("numRows", "numCols", "values", "isTransposed") + require(checkFields.forall(keys.contains), s"Expect a JSON serialized Matrix" + +s" but cannot find fields ${checkFields.mkString(", ")} in $json.") + JsonMatrixConverter.fromJson(json).asInstanceOf[T] + +case s => throw new SparkException(s"unrecognized class $s in $json") + } +} else { // Vector does not have class info in json --- End diff -- I'd suggest to add more comment here to clarify why vector doesn't have _class_ info in json, it should facilitate the code maintenance. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r155411609 --- Diff: mllib/src/main/scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.ml.linalg + +import org.json4s.DefaultFormats +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods.{compact, parse => parseJson, render} + +private[ml] object JsonMatrixConverter { + + /** Unique class name for identifying JSON object encoded by this class. */ + val className = "org.apache.spark.ml.linalg.Matrix" --- End diff -- @hhbyyh You have got a point there, agree. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r155412287 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -122,17 +124,33 @@ private[ml] object Param { /** Decodes a param value from JSON. */ def jsonDecode[T](json: String): T = { -parse(json) match { +val jValue = parse(json) +jValue match { case JString(x) => x.asInstanceOf[T] case JObject(v) => val keys = v.map(_._1) -assert(keys.contains("type") && keys.contains("values"), - s"Expect a JSON serialized vector but cannot find fields 'type' and 'values' in $json.") -JsonVectorConverter.fromJson(json).asInstanceOf[T] +if (keys.contains("class")) { + implicit val formats = DefaultFormats + val className = (jValue \ "class").extract[String] + className match { +case JsonMatrixConverter.className => + val checkFields = Array("numRows", "numCols", "values", "isTransposed") --- End diff -- Should we check _type_ as well? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19912#discussion_r155414906 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -64,50 +64,91 @@ object ConstantFolding extends Rule[LogicalPlan] { * }}} * * Approach used: - * - Start from AND operator as the root - * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they - * don't have a `NOT` or `OR` operator in them * - Populate a mapping of attribute => constant value by looking at all the equals predicates * - Using this mapping, replace occurrence of the attributes with the corresponding constant values * in the AND node. */ object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { - private def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find { -case _: Not | _: Or => true -case _ => false - }.isDefined - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case f: Filter => f transformExpressionsUp { --- End diff -- Oh, this doesn't work because it still replaces the replaced `And`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19912 New algorithm looks correct and good. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19525: [SPARK-22289] [ML] Add JSON support for Matrix pa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19525#discussion_r155414284 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -476,6 +476,10 @@ class DenseMatrix @Since("2.0.0") ( @Since("2.0.0") object DenseMatrix { + @Since("2.3.0") + private[ml] def unapply(dm: DenseMatrix): Option[(Int, Int, Array[Double], Boolean)] = --- End diff -- I'm neutral on this issue. It's ok to let it private but should remove ```Since```. We can make it public later when there is clear requirement. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19901 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84588/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19901 **[Test build #84588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84588/testReport)** for PR 19901 at commit [`c4691b6`](https://github.com/apache/spark/commit/c4691b6b4cd8b2ac26dd9c89243372dfec5bb913). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19901 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19882 **[Test build #84589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84589/testReport)** for PR 19882 at commit [`1828571`](https://github.com/apache/spark/commit/18285717edcbefe9a79d51a6fdcb112c3d0bc5c4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19882 Oh. I'll bring back. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nested expr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19813 LGTM, good job! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19882 **[Test build #84589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84589/testReport)** for PR 19882 at commit [`1828571`](https://github.com/apache/spark/commit/18285717edcbefe9a79d51a6fdcb112c3d0bc5c4). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class OrcFilterSuite extends OrcTest with TestHiveSingleton ` * `implicit class IntToBinary(int: Int) ` * `class OrcPartitionDiscoverySuite extends OrcPartitionDiscoveryTest with TestHiveSingleton ` * `class OrcQuerySuite extends OrcQueryTest with TestHiveSingleton ` * `class OrcSourceSuite extends OrcSuite with TestHiveSingleton ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19882 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19882 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84589/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19902 **[Test build #84579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84579/testReport)** for PR 19902 at commit [`5de7962`](https://github.com/apache/spark/commit/5de7962755f2175d6d11dec262644e8b9afae08b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19902: [SPARK-22452][SQL]Add getInt, getLong, getBoolean to Dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19902 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19882#discussion_r155398123 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala --- @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.orc + +import java.nio.charset.StandardCharsets +import java.sql.{Date, Timestamp} + +import scala.collection.JavaConverters._ + +import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument} + +import org.apache.spark.sql.{Column, DataFrame} +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.execution.datasources.{DataSourceStrategy, HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + +/** + * A test suite that tests Apache ORC filter API based filter pushdown optimization. + */ +class OrcFilterSuite extends OrcTest with SharedSQLContext { --- End diff -- Could we leave some comments to explain that reason? Seems there are many duplications and I would wonder why. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19912#discussion_r155413508 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -64,50 +64,91 @@ object ConstantFolding extends Rule[LogicalPlan] { * }}} * * Approach used: - * - Start from AND operator as the root - * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they - * don't have a `NOT` or `OR` operator in them * - Populate a mapping of attribute => constant value by looking at all the equals predicates * - Using this mapping, replace occurrence of the attributes with the corresponding constant values * in the AND node. */ object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { - private def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find { -case _: Not | _: Or => true -case _ => false - }.isDefined - def apply(plan: LogicalPlan): LogicalPlan = plan transform { -case f: Filter => f transformExpressionsUp { --- End diff -- Will it be the same effect if we turn `transformExpressionsUp` to `transformExpressionsDown`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19901 **[Test build #84584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84584/testReport)** for PR 19901 at commit [`31ab853`](https://github.com/apache/spark/commit/31ab853c6d496ac5d6a0a4c4e6b9cf03e7cbf466). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19901 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19901: [SPARK-22705][SQL] Case, Coalesce, and In use less globa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19901 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84584/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19041 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84585/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19041 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19041 **[Test build #84585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84585/testReport)** for PR 19041 at commit [`6a54b86`](https://github.com/apache/spark/commit/6a54b8652888d5e214b7941773644868091fe5f0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19882: [SPARK-22672][SQL][TEST] Refactor ORC Tests
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19882 **[Test build #84591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84591/testReport)** for PR 19882 at commit [`f56423d`](https://github.com/apache/spark/commit/f56423dd9168e781fad6e02e2aa319d8593d98cc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19912: [SPARK-22719][SQL]Refactor ConstantPropagation
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/19912 @gatorsmile @viirya thanks for reviewing! I have addressed your comments. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19905: [SPARK-22710] ConfigBuilder.fallbackConf should trigger ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19905 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84551/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19905: [SPARK-22710] ConfigBuilder.fallbackConf should trigger ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19905 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19893 **[Test build #4006 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4006/testReport)** for PR 19893 at commit [`0d45a5b`](https://github.com/apache/spark/commit/0d45a5b7a225e33eedf13ee1f379ea64d6f53c18). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19911: [SPARK-22717][SQL] Correct cascade default for postgres
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19911 **[Test build #4005 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4005/testReport)** for PR 19911 at commit [`40bd8ac`](https://github.com/apache/spark/commit/40bd8acbe6571a9dea39a6ae9083959e00110979). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org