[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19776 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152429647 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +testTranslateFilter(EqualTo(attrInt, 1), Some(sources.EqualTo("cint", 1))) +testTranslateFilter(EqualTo(1, attrInt), Some(sources.EqualTo("cint", 1))) + +testTranslateFilter(EqualNullSafe(attrStr, Literal(null)), + Some(sources.EqualNullSafe("cstr", null))) +testTranslateFilter(EqualNullSafe(Literal(null), attrStr), + Some(sources.EqualNullSafe("cstr", null))) + +testTranslateFilter(GreaterThan(attrInt, 1), Some(sources.GreaterThan("cint", 1))) +testTranslateFilter(GreaterThan(1, attrInt), Some(sources.LessThan("cint", 1))) + +testTranslateFilter(LessThan(attrInt, 1), Some(sources.LessThan("cint", 1))) +testTranslateFilter(LessThan(1, attrInt), Some(sources.GreaterThan("cint", 1))) + +testTranslateFilter(GreaterThanOrEqual(attrInt, 1), Some(sources.GreaterThanOrEqual("cint", 1))) +testTranslateFilter(GreaterThanOrEqual(1, attrInt), Some(sources.LessThanOrEqual("cint", 1))) + +testTranslateFilter(LessThanOrEqual(attrInt, 1), Some(sources.LessThanOrEqual("cint", 1))) +testTranslateFilter(LessThanOrEqual(1, attrInt), Some(sources.GreaterThanOrEqual("cint", 1))) + +testTranslateFilter(InSet(attrInt, Set(1, 2, 3)), Some(sources.In("cint", Array(1, 2, 3 + +testTranslateFilter(In(attrInt, Seq(1, 2, 3)), Some(sources.In("cint", Array(1, 2, 3 + +testTranslateFilter(IsNull(attrInt), Some(sources.IsNull("cint"))) +testTranslateFilter(IsNotNull(attrInt), Some(sources.IsNotNull("cint"))) + +// cint > 1 AND cint < 10 +testTranslateFilter(And( + GreaterThan(attrInt, 1), + LessThan(attrInt, 10)), + Some(sources.And( +sources.GreaterThan("cint", 1), +sources.LessThan("cint", 10 + +// cint >= 8 OR cint <= 2 +testTranslateFilter(Or( + GreaterThanOrEqual(attrInt, 8), + LessThanOrEqual(attrInt, 2)), + Some(sources.Or( +sources.GreaterThanOrEqual("cint", 8), +sources.LessThanOrEqual("cint", 2 + +testTranslateFilter(Not(GreaterThanOrEqual(attrInt, 8)), + Some(sources.Not(sources.GreaterThanOrEqual("cint", 8 + +testTranslateFilter(StartsWith(attrStr, "a"), Some(sources.StringStartsWith("cstr", "a"))) + +testTranslateFilter(EndsWith(attrStr, "a"), Some(sources.StringEndsWith("cstr", "a"))) + +testTranslateFilter(Contains(attrStr, "a"), Some(sources.StringContains("cstr", "a"))) + } + + test("translate complex expression") { +val attrInt = 'cint.int + +// ABS(cint) - 2 = 1 +testTranslateFilter(LessThanOrEqual( + // Expressions are not supported --- End diff -- good catch @_@ fixed the typo. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152425778 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +testTranslateFilter(EqualTo(attrInt, 1), Some(sources.EqualTo("cint", 1))) +testTranslateFilter(EqualTo(1, attrInt), Some(sources.EqualTo("cint", 1))) + +testTranslateFilter(EqualNullSafe(attrStr, Literal(null)), + Some(sources.EqualNullSafe("cstr", null))) +testTranslateFilter(EqualNullSafe(Literal(null), attrStr), + Some(sources.EqualNullSafe("cstr", null))) + +testTranslateFilter(GreaterThan(attrInt, 1), Some(sources.GreaterThan("cint", 1))) +testTranslateFilter(GreaterThan(1, attrInt), Some(sources.LessThan("cint", 1))) + +testTranslateFilter(LessThan(attrInt, 1), Some(sources.LessThan("cint", 1))) +testTranslateFilter(LessThan(1, attrInt), Some(sources.GreaterThan("cint", 1))) + +testTranslateFilter(GreaterThanOrEqual(attrInt, 1), Some(sources.GreaterThanOrEqual("cint", 1))) +testTranslateFilter(GreaterThanOrEqual(1, attrInt), Some(sources.LessThanOrEqual("cint", 1))) + +testTranslateFilter(LessThanOrEqual(attrInt, 1), Some(sources.LessThanOrEqual("cint", 1))) +testTranslateFilter(LessThanOrEqual(1, attrInt), Some(sources.GreaterThanOrEqual("cint", 1))) + +testTranslateFilter(InSet(attrInt, Set(1, 2, 3)), Some(sources.In("cint", Array(1, 2, 3 + +testTranslateFilter(In(attrInt, Seq(1, 2, 3)), Some(sources.In("cint", Array(1, 2, 3 + +testTranslateFilter(IsNull(attrInt), Some(sources.IsNull("cint"))) +testTranslateFilter(IsNotNull(attrInt), Some(sources.IsNotNull("cint"))) + +// cint > 1 AND cint < 10 +testTranslateFilter(And( + GreaterThan(attrInt, 1), + LessThan(attrInt, 10)), + Some(sources.And( +sources.GreaterThan("cint", 1), +sources.LessThan("cint", 10 + +// cint >= 8 OR cint <= 2 +testTranslateFilter(Or( + GreaterThanOrEqual(attrInt, 8), + LessThanOrEqual(attrInt, 2)), + Some(sources.Or( +sources.GreaterThanOrEqual("cint", 8), +sources.LessThanOrEqual("cint", 2 + +testTranslateFilter(Not(GreaterThanOrEqual(attrInt, 8)), + Some(sources.Not(sources.GreaterThanOrEqual("cint", 8 + +testTranslateFilter(StartsWith(attrStr, "a"), Some(sources.StringStartsWith("cstr", "a"))) + +testTranslateFilter(EndsWith(attrStr, "a"), Some(sources.StringEndsWith("cstr", "a"))) + +testTranslateFilter(Contains(attrStr, "a"), Some(sources.StringContains("cstr", "a"))) + } + + test("translate complex expression") { +val attrInt = 'cint.int + +// ABS(cint) - 2 = 1 +testTranslateFilter(LessThanOrEqual( + // Expressions are not supported --- End diff -- ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152421950 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,307 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} --- End diff -- Thanks! I've followed your suggestion and the test suite looks cleaner now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152422048 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,307 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(1, attrInt)) +} + +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(attrStr, Literal(null))) +} +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(Literal(null), attrStr)) +} + +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(attrInt, 1)) +} +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(1, attrInt)) +} + +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(attrInt, 1)) +} +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(1, attrInt)) +} + +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.InSet(attrInt, Set(1, 2, 3))) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.In(attrInt, Seq(1, 2, 3))) +} + +assertResult(Some(sources.IsNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNull(attrInt)) +} +assertResult(Some(sources.IsNotNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNotNull(attrInt)) +} + +assertResult(Some(sources.And( + sources.GreaterThan("cint", 1), + sources.LessThan("cint", 10 { + DataSourceStrategy.translateFilter(expressions.And( +expressions.GreaterThan(attrInt, 1), +expressions.LessThan(attrInt, 10) + )) +} + +assertResult(Some(sources.Or( + sources.GreaterThanOrEqual("cint", 8), +
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152421982 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -296,9 +296,15 @@ class JDBCSuite extends SparkFunSuite // The older versions of spark have this kind of bugs in parquet data source. val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 'mary')") --- End diff -- Fixed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152421773 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,307 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ --- End diff -- Thanks for the suggestion. Fixed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152263922 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,307 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ --- End diff -- As you import `expressions._`, I think we can write `EqualTo` instead of `expressions.EqualTo` for catalyst predicates below? Because you always write `sources.`EqualTo`, I think we don't confuse with them? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152263135 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,307 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} --- End diff -- Looks like we can have a small helper function: ```scala def testTranslateFilter(catalystFilter: Expression, result: Option[sources.Filter]): Unit = { assertResult(result) { DataSourceStrategy.translateFilter(catalystFilter) } } ``` So the tests can be rewritten as: ```scala testTranslateFilter(expressions.EqualTo(attrInt, 1), Some(sources.EqualTo("cint", 1))) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152261544 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -296,9 +296,15 @@ class JDBCSuite extends SparkFunSuite // The older versions of spark have this kind of bugs in parquet data source. val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 'mary')") --- End diff -- As I leave the comment in https://github.com/apache/spark/pull/10468#discussion_r151900175, the above test doesn't actually test against SPARK-12218 issue. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152260557 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,19 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +// See SPARK-12218 for detailed discussion +// It is not safe to just convert one side if we do not understand the +// other side. Here is an example used to explain the reason. +// Let's say we have (a = 2 AND trim(b) = 'blah') OR (c > 0) +// and we do not understand how to convert trim(b) = 'blah'. +// If we only convert a = 2, we will end up with +// (a = 2) OR (c > 0), which will generate wrong results. +// Pushing one leg of AND down is only safe to do at the top level. +// You can see ParquetFilters' createFilter for more details. +for { + leftFilter <- translateFilter(left) + rightFilter <- translateFilter(right) +} yield sources.And(leftFilter, rightFilter) --- End diff -- Although Catalyst predicate expressions are all converted to `sources.Filter` when we try to push down them. Not all convertible filters can be handled by Parquet and ORC. So I think we still can face the case only one sub-filter of `AND` can be pushed down by the file format. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152206765 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,307 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(1, attrInt)) +} + +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(attrStr, Literal(null))) +} +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(Literal(null), attrStr)) +} + +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(attrInt, 1)) +} +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(1, attrInt)) +} + +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(attrInt, 1)) +} +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(1, attrInt)) +} + +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.InSet(attrInt, Set(1, 2, 3))) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.In(attrInt, Seq(1, 2, 3))) +} + +assertResult(Some(sources.IsNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNull(attrInt)) +} +assertResult(Some(sources.IsNotNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNotNull(attrInt)) +} + +assertResult(Some(sources.And( + sources.GreaterThan("cint", 1), + sources.LessThan("cint", 10 { + DataSourceStrategy.translateFilter(expressions.And( +expressions.GreaterThan(attrInt, 1), +expressions.LessThan(attrInt, 10) + )) +} + +assertResult(Some(sources.Or( + sources.GreaterThanOrEqual("cint", 8), +
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152200656 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -296,8 +296,33 @@ class JDBCSuite extends SparkFunSuite // The older versions of spark have this kind of bugs in parquet data source. val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 'mary')") val df2 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2) OR NOT (NAME != 'mary')") +val df3 = sql("SELECT * FROM foobar WHERE (THEID > 0 AND NAME = 'mary') OR (NAME = 'fred')") +val df4 = sql("SELECT * FROM foobar " + + "WHERE (THEID > 0 AND TRIM(NAME) = 'mary') OR (NAME = 'fred')") +val df5 = sql("SELECT * FROM foobar " + + "WHERE THEID > 0 AND TRIM(NAME) = 'mary' AND LENGTH(NAME) > 3") +val df6 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR NAME = 'mary' OR NAME = 'fred'") +val df7 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR TRIM(NAME) = 'mary' OR NAME = 'fred'") +val df8 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR NAME != 'fred'))") +val df9 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR TRIM(NAME) != 'fred'))") +val df10 = sql("SELECT * FROM foobar " + --- End diff -- why do we need to test so many cases? as an end-to-end test, I think we only need a typical case. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152200310 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,302 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(1, attrInt)) +} + +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(attrStr, Literal(null))) +} +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(Literal(null), attrStr)) +} + +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(attrInt, 1)) +} +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(1, attrInt)) +} + +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(attrInt, 1)) +} +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(1, attrInt)) +} + +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.InSet(attrInt, Set(1, 2, 3))) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.In(attrInt, Seq(1, 2, 3))) +} + +assertResult(Some(sources.IsNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNull(attrInt)) +} +assertResult(Some(sources.IsNotNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNotNull(attrInt)) +} + +assertResult(Some(sources.And( + sources.GreaterThan("cint", 1), + sources.LessThan("cint", 10 { + DataSourceStrategy.translateFilter(expressions.And( +expressions.GreaterThan(attrInt, 1), +expressions.LessThan(attrInt, 10) + )) +} + +assertResult(Some(sources.Or( + sources.GreaterThanOrEqual("cint", 8), +
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152200223 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,302 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(1, attrInt)) +} + +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(attrStr, Literal(null))) +} +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(Literal(null), attrStr)) +} + +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(attrInt, 1)) +} +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(1, attrInt)) +} + +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(attrInt, 1)) +} +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(1, attrInt)) +} + +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.InSet(attrInt, Set(1, 2, 3))) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.In(attrInt, Seq(1, 2, 3))) +} + +assertResult(Some(sources.IsNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNull(attrInt)) +} +assertResult(Some(sources.IsNotNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNotNull(attrInt)) +} + +assertResult(Some(sources.And( + sources.GreaterThan("cint", 1), + sources.LessThan("cint", 10 { + DataSourceStrategy.translateFilter(expressions.And( +expressions.GreaterThan(attrInt, 1), +expressions.LessThan(attrInt, 10) + )) +} + +assertResult(Some(sources.Or( + sources.GreaterThanOrEqual("cint", 8), +
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152200130 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,302 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(1, attrInt)) +} + +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(attrStr, Literal(null))) +} +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(Literal(null), attrStr)) +} + +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(attrInt, 1)) +} +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(1, attrInt)) +} + +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(attrInt, 1)) +} +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(1, attrInt)) +} + +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.InSet(attrInt, Set(1, 2, 3))) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.In(attrInt, Seq(1, 2, 3))) +} + +assertResult(Some(sources.IsNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNull(attrInt)) +} +assertResult(Some(sources.IsNotNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNotNull(attrInt)) +} + +assertResult(Some(sources.And( + sources.GreaterThan("cint", 1), + sources.LessThan("cint", 10 { + DataSourceStrategy.translateFilter(expressions.And( +expressions.GreaterThan(attrInt, 1), +expressions.LessThan(attrInt, 10) + )) +} + +assertResult(Some(sources.Or( + sources.GreaterThanOrEqual("cint", 8), +
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152200166 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,302 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(1, attrInt)) +} + +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(attrStr, Literal(null))) +} +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(Literal(null), attrStr)) +} + +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(attrInt, 1)) +} +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(1, attrInt)) +} + +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(attrInt, 1)) +} +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(1, attrInt)) +} + +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.InSet(attrInt, Set(1, 2, 3))) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.In(attrInt, Seq(1, 2, 3))) +} + +assertResult(Some(sources.IsNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNull(attrInt)) +} +assertResult(Some(sources.IsNotNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNotNull(attrInt)) +} + +assertResult(Some(sources.And( + sources.GreaterThan("cint", 1), + sources.LessThan("cint", 10 { + DataSourceStrategy.translateFilter(expressions.And( +expressions.GreaterThan(attrInt, 1), +expressions.LessThan(attrInt, 10) + )) +} + +assertResult(Some(sources.Or( + sources.GreaterThanOrEqual("cint", 8), +
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152200090 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,302 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(1, attrInt)) +} + +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(attrStr, Literal(null))) +} +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(Literal(null), attrStr)) +} + +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(attrInt, 1)) +} +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(1, attrInt)) +} + +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(attrInt, 1)) +} +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(1, attrInt)) +} + +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.InSet(attrInt, Set(1, 2, 3))) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.In(attrInt, Seq(1, 2, 3))) +} + +assertResult(Some(sources.IsNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNull(attrInt)) +} +assertResult(Some(sources.IsNotNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNotNull(attrInt)) +} + +assertResult(Some(sources.And( + sources.GreaterThan("cint", 1), + sources.LessThan("cint", 10 { + DataSourceStrategy.translateFilter(expressions.And( +expressions.GreaterThan(attrInt, 1), +expressions.LessThan(attrInt, 10) + )) +} + +assertResult(Some(sources.Or( + sources.GreaterThanOrEqual("cint", 8), +
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152200107 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,302 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = 'cint.int +val attrStr = 'cstr.string + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, 1)) +} +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(1, attrInt)) +} + +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(attrStr, Literal(null))) +} +assertResult(Some(sources.EqualNullSafe("cstr", null))) { + DataSourceStrategy.translateFilter( +expressions.EqualNullSafe(Literal(null), attrStr)) +} + +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(attrInt, 1)) +} +assertResult(Some(sources.GreaterThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(1, attrInt)) +} + +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThan(attrInt, 1)) +} +assertResult(Some(sources.LessThan("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThan(1, attrInt)) +} + +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.GreaterThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.LessThanOrEqual(attrInt, 1)) +} +assertResult(Some(sources.LessThanOrEqual("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.GreaterThanOrEqual(1, attrInt)) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.InSet(attrInt, Set(1, 2, 3))) +} + +assertResult(Some(sources.In("cint", Array(1, 2, 3 { + DataSourceStrategy.translateFilter( +expressions.In(attrInt, Seq(1, 2, 3))) +} + +assertResult(Some(sources.IsNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNull(attrInt)) +} +assertResult(Some(sources.IsNotNull("cint"))) { + DataSourceStrategy.translateFilter( +expressions.IsNotNull(attrInt)) +} + +assertResult(Some(sources.And( + sources.GreaterThan("cint", 1), + sources.LessThan("cint", 10 { + DataSourceStrategy.translateFilter(expressions.And( +expressions.GreaterThan(attrInt, 1), +expressions.LessThan(attrInt, 10) + )) +} + +assertResult(Some(sources.Or( + sources.GreaterThanOrEqual("cint", 8), +
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152193847 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,306 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = AttributeReference("cint", IntegerType)() +val attrStr = AttributeReference("cstr", StringType)() --- End diff -- Fixed. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152193833 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.{sources, QueryTest} +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + + +class DataSourceStrategySuite extends QueryTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = AttributeReference("cint", IntegerType)() +val attrStr = AttributeReference("cstr", StringType)() + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, Literal(1))) --- End diff -- Fixed. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152192018 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,306 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.sources +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + + +class DataSourceStrategySuite extends PlanTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = AttributeReference("cint", IntegerType)() +val attrStr = AttributeReference("cstr", StringType)() --- End diff -- ```Scala import org.apache.spark.sql.catalyst.dsl.expressions._ ``` You can simplify your test cases. ```Scala val attrInt = 'cint.int val attrStr = 'cstr.string ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152191858 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.{sources, QueryTest} +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + + +class DataSourceStrategySuite extends QueryTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = AttributeReference("cint", IntegerType)() +val attrStr = AttributeReference("cstr", StringType)() + +assertResult(Some(sources.EqualTo("cint", 1))) { + DataSourceStrategy.translateFilter( +expressions.EqualTo(attrInt, Literal(1))) --- End diff -- No need to call `Literal` here. It will be implicitly casted to Literal ``` expressions.EqualTo(attrInt,1)) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152191659 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.{sources, QueryTest} +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + + +class DataSourceStrategySuite extends QueryTest with SharedSQLContext { + + test("translate simple expression") { +val attrInt = AttributeReference("cint", IntegerType)() +val attrStr = AttributeReference("cstr", StringType)() --- End diff -- ```Scala import org.apache.spark.sql.catalyst.dsl.expressions._ ``` You can simplify your test cases. ```Scala val attrInt = 'cint.int val attrStr = 'cstr.string ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152191430 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.{sources, QueryTest} +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + + +class DataSourceStrategySuite extends QueryTest with SharedSQLContext { --- End diff -- Fixed. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152190564 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala --- @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.{sources, QueryTest} +import org.apache.spark.sql.catalyst.expressions +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types._ + + +class DataSourceStrategySuite extends QueryTest with SharedSQLContext { --- End diff -- `extends PlanTest` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152186116 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -296,8 +296,33 @@ class JDBCSuite extends SparkFunSuite // The older versions of spark have this kind of bugs in parquet data source. val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 'mary')") val df2 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2) OR NOT (NAME != 'mary')") +val df3 = sql("SELECT * FROM foobar WHERE (THEID > 0 AND NAME = 'mary') OR (NAME = 'fred')") +val df4 = sql("SELECT * FROM foobar " + + "WHERE (THEID > 0 AND TRIM(NAME) = 'mary') OR (NAME = 'fred')") +val df5 = sql("SELECT * FROM foobar " + + "WHERE THEID > 0 AND TRIM(NAME) = 'mary' AND LENGTH(NAME) > 3") +val df6 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR NAME = 'mary' OR NAME = 'fred'") +val df7 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR TRIM(NAME) = 'mary' OR NAME = 'fred'") +val df8 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR NAME != 'fred'))") +val df9 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR TRIM(NAME) != 'fred'))") +val df10 = sql("SELECT * FROM foobar " + + "WHERE (NOT(THEID < 0 OR TRIM(NAME) != 'mary')) OR (THEID = 1 AND NAME = 'fred')") + assert(df1.collect.toSet === Set(Row("mary", 2))) assert(df2.collect.toSet === Set(Row("mary", 2))) +assert(df3.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df4.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df5.collect.toSet === Set(Row("mary", 2))) +assert(df6.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df7.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df8.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df9.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df10.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) --- End diff -- I went ahead and added a new ```DataSourceStrategySuite``` to test the ```translateFilter```. Please free feel to let me know of any further comments. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152158653 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -296,8 +296,33 @@ class JDBCSuite extends SparkFunSuite // The older versions of spark have this kind of bugs in parquet data source. val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 'mary')") val df2 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2) OR NOT (NAME != 'mary')") +val df3 = sql("SELECT * FROM foobar WHERE (THEID > 0 AND NAME = 'mary') OR (NAME = 'fred')") +val df4 = sql("SELECT * FROM foobar " + + "WHERE (THEID > 0 AND TRIM(NAME) = 'mary') OR (NAME = 'fred')") +val df5 = sql("SELECT * FROM foobar " + + "WHERE THEID > 0 AND TRIM(NAME) = 'mary' AND LENGTH(NAME) > 3") +val df6 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR NAME = 'mary' OR NAME = 'fred'") +val df7 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR TRIM(NAME) = 'mary' OR NAME = 'fred'") +val df8 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR NAME != 'fred'))") +val df9 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR TRIM(NAME) != 'fred'))") +val df10 = sql("SELECT * FROM foobar " + + "WHERE (NOT(THEID < 0 OR TRIM(NAME) != 'mary')) OR (THEID = 1 AND NAME = 'fred')") + assert(df1.collect.toSet === Set(Row("mary", 2))) assert(df2.collect.toSet === Set(Row("mary", 2))) +assert(df3.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df4.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df5.collect.toSet === Set(Row("mary", 2))) +assert(df6.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df7.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df8.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df9.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df10.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) --- End diff -- @gatorsmile Thanks for comments, Sean =) You mean you'd like to see test cases calling ```DataSourceStrategy.translateFilter``` directly or end-to-end like ```sql("SELECT * FROM foobar WHERE THEID < 1")``` which goes thru the translateFilter code path to verify. I'm fine either way, just want to clarify. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152156940 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -296,8 +296,33 @@ class JDBCSuite extends SparkFunSuite // The older versions of spark have this kind of bugs in parquet data source. val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 'mary')") val df2 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2) OR NOT (NAME != 'mary')") +val df3 = sql("SELECT * FROM foobar WHERE (THEID > 0 AND NAME = 'mary') OR (NAME = 'fred')") +val df4 = sql("SELECT * FROM foobar " + + "WHERE (THEID > 0 AND TRIM(NAME) = 'mary') OR (NAME = 'fred')") +val df5 = sql("SELECT * FROM foobar " + + "WHERE THEID > 0 AND TRIM(NAME) = 'mary' AND LENGTH(NAME) > 3") +val df6 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR NAME = 'mary' OR NAME = 'fred'") +val df7 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR TRIM(NAME) = 'mary' OR NAME = 'fred'") +val df8 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR NAME != 'fred'))") +val df9 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR TRIM(NAME) != 'fred'))") +val df10 = sql("SELECT * FROM foobar " + + "WHERE (NOT(THEID < 0 OR TRIM(NAME) != 'mary')) OR (THEID = 1 AND NAME = 'fred')") + assert(df1.collect.toSet === Set(Row("mary", 2))) assert(df2.collect.toSet === Set(Row("mary", 2))) +assert(df3.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df4.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df5.collect.toSet === Set(Row("mary", 2))) +assert(df6.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df7.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df8.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df9.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df10.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) --- End diff -- They are end-to-end test cases. If you can, we should also add such a unit test suite. In the future, we can add more unit test cases for verifying more complex cases. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152156397 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,19 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +// See SPARK-12218 for detailed discussion +// It is not safe to just convert one side if we do not understand the +// other side. Here is an example used to explain the reason. +// Let's say we have (a = 2 AND trim(b) = 'blah') OR (c > 0) +// and we do not understand how to convert trim(b) = 'blah'. +// If we only convert a = 2, we will end up with +// (a = 2) OR (c > 0), which will generate wrong results. +// Pushing one leg of AND down is only safe to do at the top level. +// You can see ParquetFilters' createFilter for more details. +for { + leftFilter <- translateFilter(left) + rightFilter <- translateFilter(right) +} yield sources.And(leftFilter, rightFilter) --- End diff -- We do not need to clean up the codes in this PR. Let us minimize the code changes and it can simplify the backport. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152137096 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -296,8 +296,33 @@ class JDBCSuite extends SparkFunSuite // The older versions of spark have this kind of bugs in parquet data source. val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 'mary')") val df2 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2) OR NOT (NAME != 'mary')") +val df3 = sql("SELECT * FROM foobar WHERE (THEID > 0 AND NAME = 'mary') OR (NAME = 'fred')") +val df4 = sql("SELECT * FROM foobar " + + "WHERE (THEID > 0 AND TRIM(NAME) = 'mary') OR (NAME = 'fred')") +val df5 = sql("SELECT * FROM foobar " + + "WHERE THEID > 0 AND TRIM(NAME) = 'mary' AND LENGTH(NAME) > 3") +val df6 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR NAME = 'mary' OR NAME = 'fred'") +val df7 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR TRIM(NAME) = 'mary' OR NAME = 'fred'") +val df8 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR NAME != 'fred'))") +val df9 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR TRIM(NAME) != 'fred'))") +val df10 = sql("SELECT * FROM foobar " + + "WHERE (NOT(THEID < 0 OR TRIM(NAME) != 'mary')) OR (THEID = 1 AND NAME = 'fred')") + assert(df1.collect.toSet === Set(Row("mary", 2))) assert(df2.collect.toSet === Set(Row("mary", 2))) +assert(df3.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df4.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df5.collect.toSet === Set(Row("mary", 2))) +assert(df6.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df7.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df8.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df9.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df10.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) --- End diff -- Sure. I can help. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r152136470 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,19 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +// See SPARK-12218 for detailed discussion +// It is not safe to just convert one side if we do not understand the +// other side. Here is an example used to explain the reason. +// Let's say we have (a = 2 AND trim(b) = 'blah') OR (c > 0) +// and we do not understand how to convert trim(b) = 'blah'. +// If we only convert a = 2, we will end up with +// (a = 2) OR (c > 0), which will generate wrong results. +// Pushing one leg of AND down is only safe to do at the top level. +// You can see ParquetFilters' createFilter for more details. +for { + leftFilter <- translateFilter(left) + rightFilter <- translateFilter(right) +} yield sources.And(leftFilter, rightFilter) --- End diff -- I would think so. SPARK-12218 put fixes into ParquetFilters.createFilter and OrcFilters.createFilter. They're similar to DataSourceStrategy.translateFilter but have different signature customized for Parquet and Orc. For all datasources including JDBC, Parquet, etc, translateFilter is called to determine if a predicate Expression can be pushed down as a Filter or not. Next for Parquet and ORC, Fitlers get mapped to Parquet or ORC specific filters with their own createFilter method. So this PR does help all datasources to get the correct set of push down predicates. Without this PR we simply got lucky with Parquet and ORC in terms of result correctness because 1) it looks like we always apply Filter on top of scan; 2) we end up with same number of or more rows returned with one leg missing from AND. JDBC data source does not always come with Filter on top of scan therefore exposed the bug. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151962055 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -296,8 +296,33 @@ class JDBCSuite extends SparkFunSuite // The older versions of spark have this kind of bugs in parquet data source. val df1 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2 AND NAME != 'mary')") val df2 = sql("SELECT * FROM foobar WHERE NOT (THEID != 2) OR NOT (NAME != 'mary')") +val df3 = sql("SELECT * FROM foobar WHERE (THEID > 0 AND NAME = 'mary') OR (NAME = 'fred')") +val df4 = sql("SELECT * FROM foobar " + + "WHERE (THEID > 0 AND TRIM(NAME) = 'mary') OR (NAME = 'fred')") +val df5 = sql("SELECT * FROM foobar " + + "WHERE THEID > 0 AND TRIM(NAME) = 'mary' AND LENGTH(NAME) > 3") +val df6 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR NAME = 'mary' OR NAME = 'fred'") +val df7 = sql("SELECT * FROM foobar " + + "WHERE THEID < 0 OR TRIM(NAME) = 'mary' OR NAME = 'fred'") +val df8 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR NAME != 'fred'))") +val df9 = sql("SELECT * FROM foobar " + + "WHERE NOT((THEID < 0 OR NAME != 'mary') AND (THEID != 1 OR TRIM(NAME) != 'fred'))") +val df10 = sql("SELECT * FROM foobar " + + "WHERE (NOT(THEID < 0 OR TRIM(NAME) != 'mary')) OR (THEID = 1 AND NAME = 'fred')") + assert(df1.collect.toSet === Set(Row("mary", 2))) assert(df2.collect.toSet === Set(Row("mary", 2))) +assert(df3.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df4.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df5.collect.toSet === Set(Row("mary", 2))) +assert(df6.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df7.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df8.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df9.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) +assert(df10.collect.toSet === Set(Row("fred", 1), Row("mary", 2))) --- End diff -- I'd like to create a new `DataSourceStrategySuite` to test the `translateFilter`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151960214 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,19 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +// See SPARK-12218 for detailed discussion +// It is not safe to just convert one side if we do not understand the +// other side. Here is an example used to explain the reason. +// Let's say we have (a = 2 AND trim(b) = 'blah') OR (c > 0) +// and we do not understand how to convert trim(b) = 'blah'. +// If we only convert a = 2, we will end up with +// (a = 2) OR (c > 0), which will generate wrong results. +// Pushing one leg of AND down is only safe to do at the top level. +// You can see ParquetFilters' createFilter for more details. +for { + leftFilter <- translateFilter(left) + rightFilter <- translateFilter(right) +} yield sources.And(leftFilter, rightFilter) --- End diff -- do we still need SPARK-12218 after this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151922970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,11 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +// See SPARK-12218 and PR 10362 for detailed discussion --- End diff -- @viirya I see. Thanks, Simon! I've removed the PR number from the comment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151920447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,10 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +for { --- End diff -- Thanks. Just did that as you suggested. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151920212 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,11 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +// See SPARK-12218 and PR 10362 for detailed discussion --- End diff -- Usually we don't list PR number but just JIRA number is enough. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151920126 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,11 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +// See SPARK-12218 and PR 10362 for detailed discussion --- End diff -- Sure. I have added more comments there with an example. Thanks, Sean! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151917654 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,11 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +// See SPARK-12218 and PR 10362 for detailed discussion --- End diff -- In the comment, you need to give an example to explain why. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151916997 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,10 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +for { --- End diff -- Yeah. Follow what @yhuai wrote in the PR https://github.com/apache/spark/pull/10362/files --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user jliwork commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151901160 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,10 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +for { --- End diff -- Sure. Will do. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19776#discussion_r151900595 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -497,7 +497,10 @@ object DataSourceStrategy { Some(sources.IsNotNull(a.name)) case expressions.And(left, right) => -(translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And) +for { --- End diff -- Let's add a small comment like the PR you pointed out. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19776: [SPARK-22548][SQL] Incorrect nested AND expressio...
GitHub user jliwork opened a pull request: https://github.com/apache/spark/pull/19776 [SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDB⦠â¦C data source ## What changes were proposed in this pull request? Letâs say I have a nested AND expression shown below and p2 can not be pushed down, (p1 AND p2) OR p3 In current Spark code, during data source filter translation, (p1 AND p2) is returned as p1 only and p2 is simply lost. This issue occurs with JDBC data source and is similar to SPARK-12218 for Parquet. When we have AND nested below another expression, we should either push both legs or nothing. Note that 1) the current Spark code will always split conjunctive predicate before it determines if a predicate can be pushed down or not; 2) the same translation method is also called by Data Source V2. ## How was this patch tested? Added new unit test cases to JDBCSuite You can merge this pull request into a Git repository by running: $ git pull https://github.com/jliwork/spark spark-22548 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19776.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19776 commit 58de88c21210d469b8ef14b1f23764c31ca5651e Author: Jia LiDate: 2017-11-18T01:15:01Z [SPARK-22548][SQL] Incorrect nested AND expression pushed down to JDBC data source --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org