[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r216199952 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { } test("SPARK-9083: sort with non-deterministic expressions") { -import org.apache.spark.util.random.XORShiftRandom - val seed = 33 -val df = (1 to 100).map(Tuple1.apply).toDF("i") +val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1) --- End diff -- I agree with this change It's okay. Was wondering if we actually make the coverage lower for local relation specifically, or if some other tests should be added additionally. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r216199131 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { } test("SPARK-9083: sort with non-deterministic expressions") { -import org.apache.spark.util.random.XORShiftRandom - val seed = 33 -val df = (1 to 100).map(Tuple1.apply).toDF("i") +val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1) --- End diff -- @HyukjinKwon We are leaving this optimization on for MLTest as of now. Should we open it up for TestHive and keep it disabled it for SharedSparkSession ? cc @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r216198738 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,14 +85,16 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +val msg1 = intercept[Exception] { --- End diff -- @HyukjinKwon Yeah... we could have caught SparkException here. My intention was to have this test case pass both when location relation optimization is on and off. Thats why i changed it a a generic exception along with verifying the error text. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r216179901 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { } test("SPARK-9083: sort with non-deterministic expressions") { -import org.apache.spark.util.random.XORShiftRandom - val seed = 33 -val df = (1 to 100).map(Tuple1.apply).toDF("i") +val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1) --- End diff -- BTW, why does this PR target a better test coverage? do we still test the local relation conversion, which might be more common to users as well? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r216179499 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,14 +85,16 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +val msg1 = intercept[Exception] { --- End diff -- re: https://github.com/apache/spark/pull/22270#discussion_r215485269 Didn't we disable the local relation test? Why don't we catch explicit `SparkExection`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r216009774 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { } test("SPARK-9083: sort with non-deterministic expressions") { -import org.apache.spark.util.random.XORShiftRandom - val seed = 33 -val df = (1 to 100).map(Tuple1.apply).toDF("i") +val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1) --- End diff -- @cloud-fan I was just trying get this test case to pass when `ConvertToLocalRelation` is enabled as well as disabled. So when this rule is active, i saw that all the calls to random.nextXXX happens in one thread. When this rule is disabled, the random values get evaluated under the `project` operator and gets called from multiple threads. Thats why i am repartitioning the data frame to enforce single threaded execution. Is this not the right thing to do ? Please let me know .. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r215869136 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { } test("SPARK-9083: sort with non-deterministic expressions") { -import org.apache.spark.util.random.XORShiftRandom - val seed = 33 -val df = (1 to 100).map(Tuple1.apply).toDF("i") +val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1) --- End diff -- Sorry I didn't follow this thread closely. Why do we need these repartition(1) changes? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22270 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r215485269 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { --- End diff -- @HyukjinKwon We get a SparkException here which in turn wraps a RuntimeException. When we have ConvertToLocalRelation active, we get a RuntimeException from driver. But when we disable it, the error is raised from the executor with a SparkException as the top level exception. Thats the reason i changed it to intercept `Exception` so that this test can run both when the rule is active vs when its not. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r215485064 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { --- End diff -- @gatorsmile Thanks.. I am checking the message now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r215484920 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { df5.select(map_from_arrays($"k", $"v")).collect --- End diff -- @maropu We get a SparkException here which in turn wraps a RuntimeException. When we have ConvertToLocalRelation active, we get a RuntimeException from driver. But when we disable it, the error is raised from the executor with a SparkException as the top level exception. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214564426 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { --- End diff -- Shall we also catch specific exception per https://github.com/databricks/scala-style-guide#testing-intercepting --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214500639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { --- End diff -- many thanks! ya, plz ping me to review ;) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214499281 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { --- End diff -- @maropu You are right !! I will try to optimize this in the other pr i am going to open. please check if you like it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214499216 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { df5.select(map_from_arrays($"k", $"v")).collect --- End diff -- @maropu I will double check and get back. But i think it was SparkException. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214499155 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { - ${ev.isNull} = true; + ${setIsNullCode} } else if (${ctx.genEqual(right.dataType, value, getValue)}) { - ${ev.isNull} = false; + ${unsetIsNullCode} --- End diff -- @maropu will change --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214499145 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1730,9 +1730,8 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { test("SPARK-9083: sort with non-deterministic expressions") { import org.apache.spark.util.random.XORShiftRandom --- End diff -- @maropu will do. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214433404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { - ${ev.isNull} = true; + ${setIsNullCode} --- End diff -- @gatorsmile OK.. Sean. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214432388 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { --- End diff -- Let us capture the exception and compare the error messages. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214432272 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { - ${ev.isNull} = true; + ${setIsNullCode} --- End diff -- Please open separate JIRAs and PRs for the codegen fixes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214351711 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { --- End diff -- (This is also not related to this pr though) when `left.dataType.asInstanceOf[ArrayType].containsNull = false`, I think we don't need this condition `if ($arr.isNullAt($i)) {`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214341661 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { } val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v") -intercept[RuntimeException] { +intercept[Exception] { df5.select(map_from_arrays($"k", $"v")).collect --- End diff -- What's the concrete exception this query throws? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214340187 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1730,9 +1730,8 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { test("SPARK-9083: sort with non-deterministic expressions") { import org.apache.spark.util.random.XORShiftRandom --- End diff -- Can we move this import into the top? (this is not related to this pr though) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214338698 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { - ${ev.isNull} = true; + ${setIsNullCode} } else if (${ctx.genEqual(right.dataType, value, getValue)}) { - ${ev.isNull} = false; + ${unsetIsNullCode} --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r214338655 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: Expression) nullSafeCodeGen(ctx, ev, (arr, value) => { val i = ctx.freshName("i") val getValue = CodeGenerator.getValue(arr, right.dataType, i) + val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else "" + val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else "" s""" for (int $i = 0; $i < $arr.numElements(); $i ++) { if ($arr.isNullAt($i)) { - ${ev.isNull} = true; + ${setIsNullCode} --- End diff -- nit: `${setIsNullCode}` -> `$setIsNullCode` btw, you found some bugs when excluding the `ConvertToLocalRelation` rule in tests, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r213945889 --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala --- @@ -652,65 +653,66 @@ class ALSSuite extends MLTest with DefaultReadWriteTest with Logging { test("input type validation") { val spark = this.spark import spark.implicits._ - -// check that ALS can handle all numeric types for rating column -// and user/item columns (when the user/item ids are within Int range) -val als = new ALS().setMaxIter(1).setRank(1) -Seq(("user", IntegerType), ("item", IntegerType), ("rating", FloatType)).foreach { - case (colName, sqlType) => -checkNumericTypesALS(als, spark, colName, sqlType) { - (ex, act) => -ex.userFactors.first().getSeq[Float](1) === act.userFactors.first().getSeq[Float](1) -} { (ex, act, df, enc) => - val expected = ex.transform(df).selectExpr("prediction") -.first().getFloat(0) - testTransformerByGlobalCheckFunc(df, act, "prediction") { -case rows: Seq[Row] => - expected ~== rows.head.getFloat(0) absTol 1e-6 - }(enc) -} -} -// check user/item ids falling outside of Int range -val big = Int.MaxValue.toLong + 1 -val small = Int.MinValue.toDouble - 1 -val df = Seq( - (0, 0L, 0d, 1, 1L, 1d, 3.0), - (0, big, small, 0, big, small, 2.0), - (1, 1L, 1d, 0, 0L, 0d, 5.0) -).toDF("user", "user_big", "user_small", "item", "item_big", "item_small", "rating") -val msg = "either out of Integer range or contained a fractional part" -withClue("fit should fail when ids exceed integer range. ") { - assert(intercept[SparkException] { -als.fit(df.select(df("user_big").as("user"), df("item"), df("rating"))) - }.getCause.getMessage.contains(msg)) - assert(intercept[SparkException] { -als.fit(df.select(df("user_small").as("user"), df("item"), df("rating"))) - }.getCause.getMessage.contains(msg)) - assert(intercept[SparkException] { -als.fit(df.select(df("item_big").as("item"), df("user"), df("rating"))) - }.getCause.getMessage.contains(msg)) - assert(intercept[SparkException] { -als.fit(df.select(df("item_small").as("item"), df("user"), df("rating"))) - }.getCause.getMessage.contains(msg)) -} -withClue("transform should fail when ids exceed integer range. ") { - val model = als.fit(df) - def testTransformIdExceedsIntRange[A : Encoder](dataFrame: DataFrame): Unit = { +withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> "") { --- End diff -- Sure. :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r213942732 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala --- @@ -59,7 +60,8 @@ object TestHive .set("spark.sql.warehouse.dir", TestHiveContext.makeWarehouseDir().toURI.getPath) // SPARK-8910 .set("spark.ui.enabled", "false") -.set("spark.unsafe.exceptionOnMemoryLeak", "true"))) +.set("spark.unsafe.exceptionOnMemoryLeak", "true") +.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName))) --- End diff -- @viirya Sure.. i will add. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r213942715 --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala --- @@ -652,65 +653,66 @@ class ALSSuite extends MLTest with DefaultReadWriteTest with Logging { test("input type validation") { val spark = this.spark import spark.implicits._ - -// check that ALS can handle all numeric types for rating column -// and user/item columns (when the user/item ids are within Int range) -val als = new ALS().setMaxIter(1).setRank(1) -Seq(("user", IntegerType), ("item", IntegerType), ("rating", FloatType)).foreach { - case (colName, sqlType) => -checkNumericTypesALS(als, spark, colName, sqlType) { - (ex, act) => -ex.userFactors.first().getSeq[Float](1) === act.userFactors.first().getSeq[Float](1) -} { (ex, act, df, enc) => - val expected = ex.transform(df).selectExpr("prediction") -.first().getFloat(0) - testTransformerByGlobalCheckFunc(df, act, "prediction") { -case rows: Seq[Row] => - expected ~== rows.head.getFloat(0) absTol 1e-6 - }(enc) -} -} -// check user/item ids falling outside of Int range -val big = Int.MaxValue.toLong + 1 -val small = Int.MinValue.toDouble - 1 -val df = Seq( - (0, 0L, 0d, 1, 1L, 1d, 3.0), - (0, big, small, 0, big, small, 2.0), - (1, 1L, 1d, 0, 0L, 0d, 5.0) -).toDF("user", "user_big", "user_small", "item", "item_big", "item_small", "rating") -val msg = "either out of Integer range or contained a fractional part" -withClue("fit should fail when ids exceed integer range. ") { - assert(intercept[SparkException] { -als.fit(df.select(df("user_big").as("user"), df("item"), df("rating"))) - }.getCause.getMessage.contains(msg)) - assert(intercept[SparkException] { -als.fit(df.select(df("user_small").as("user"), df("item"), df("rating"))) - }.getCause.getMessage.contains(msg)) - assert(intercept[SparkException] { -als.fit(df.select(df("item_big").as("item"), df("user"), df("rating"))) - }.getCause.getMessage.contains(msg)) - assert(intercept[SparkException] { -als.fit(df.select(df("item_small").as("item"), df("user"), df("rating"))) - }.getCause.getMessage.contains(msg)) -} -withClue("transform should fail when ids exceed integer range. ") { - val model = als.fit(df) - def testTransformIdExceedsIntRange[A : Encoder](dataFrame: DataFrame): Unit = { +withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> "") { --- End diff -- @viirya Thanks !! Actually currently i am just making changes to move forward with testing to identify the failures. I will open separate pr for code/test fix. So lets discuss the right way to fix the problem there ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r213939010 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala --- @@ -59,7 +60,8 @@ object TestHive .set("spark.sql.warehouse.dir", TestHiveContext.makeWarehouseDir().toURI.getPath) // SPARK-8910 .set("spark.ui.enabled", "false") -.set("spark.unsafe.exceptionOnMemoryLeak", "true"))) +.set("spark.unsafe.exceptionOnMemoryLeak", "true") +.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName))) --- End diff -- Can we add a comment for this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r213938811 --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala --- @@ -652,65 +653,66 @@ class ALSSuite extends MLTest with DefaultReadWriteTest with Logging { test("input type validation") { val spark = this.spark import spark.implicits._ - -// check that ALS can handle all numeric types for rating column -// and user/item columns (when the user/item ids are within Int range) -val als = new ALS().setMaxIter(1).setRank(1) -Seq(("user", IntegerType), ("item", IntegerType), ("rating", FloatType)).foreach { - case (colName, sqlType) => -checkNumericTypesALS(als, spark, colName, sqlType) { - (ex, act) => -ex.userFactors.first().getSeq[Float](1) === act.userFactors.first().getSeq[Float](1) -} { (ex, act, df, enc) => - val expected = ex.transform(df).selectExpr("prediction") -.first().getFloat(0) - testTransformerByGlobalCheckFunc(df, act, "prediction") { -case rows: Seq[Row] => - expected ~== rows.head.getFloat(0) absTol 1e-6 - }(enc) -} -} -// check user/item ids falling outside of Int range -val big = Int.MaxValue.toLong + 1 -val small = Int.MinValue.toDouble - 1 -val df = Seq( - (0, 0L, 0d, 1, 1L, 1d, 3.0), - (0, big, small, 0, big, small, 2.0), - (1, 1L, 1d, 0, 0L, 0d, 5.0) -).toDF("user", "user_big", "user_small", "item", "item_big", "item_small", "rating") -val msg = "either out of Integer range or contained a fractional part" -withClue("fit should fail when ids exceed integer range. ") { - assert(intercept[SparkException] { -als.fit(df.select(df("user_big").as("user"), df("item"), df("rating"))) - }.getCause.getMessage.contains(msg)) - assert(intercept[SparkException] { -als.fit(df.select(df("user_small").as("user"), df("item"), df("rating"))) - }.getCause.getMessage.contains(msg)) - assert(intercept[SparkException] { -als.fit(df.select(df("item_big").as("item"), df("user"), df("rating"))) - }.getCause.getMessage.contains(msg)) - assert(intercept[SparkException] { -als.fit(df.select(df("item_small").as("item"), df("user"), df("rating"))) - }.getCause.getMessage.contains(msg)) -} -withClue("transform should fail when ids exceed integer range. ") { - val model = als.fit(df) - def testTransformIdExceedsIntRange[A : Encoder](dataFrame: DataFrame): Unit = { +withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> "") { --- End diff -- If we only want to disable `ConvertToLocalRelation` in sql/core and sql/hive, maybe we can set this sql conf at `MLTest`? I don't see any usage of `withSQLConf` in ml tests. It looks a bit weird to see it here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22270#discussion_r213923292 --- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala --- @@ -24,12 +24,10 @@ import scala.collection.JavaConverters._ import scala.collection.mutable import scala.collection.mutable.{ArrayBuffer, WrappedArray} import scala.language.existentials - import com.github.fommil.netlib.BLAS.{getInstance => blas} import org.apache.commons.io.FileUtils import org.apache.commons.io.filefilter.TrueFileFilter import org.scalatest.BeforeAndAfterEach - --- End diff -- Revert the blank lines? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...
GitHub user dilipbiswal opened a pull request: https://github.com/apache/spark/pull/22270 [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation in the test cases of sql/core and sql/hive ## What changes were proposed in this pull request? In SharedSparkSession and TestHive, we need to disable the rule ConvertToLocalRelation for better test case coverage. ## How was this patch tested? Identify the failures after excluding "ConvertToLocalRelation" rule. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dilipbiswal/spark SPARK-25267-final Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22270.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22270 commit bd57944106348e0d0ede63afb142bfcf59af80f3 Author: Dilip Biswal Date: 2018-08-28T21:15:32Z [SPARK-25267] Disable ConvertToLocalRelation in the test cases of sql/core and sql/hive --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org