[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r216199952
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
   }
 
   test("SPARK-9083: sort with non-deterministic expressions") {
-import org.apache.spark.util.random.XORShiftRandom
-
 val seed = 33
-val df = (1 to 100).map(Tuple1.apply).toDF("i")
+val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1)
--- End diff --

I agree with this change; it's okay. I was wondering whether we actually 
lower the coverage for local relations specifically, or whether some other 
tests should be added.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-09 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r216199131
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
   }
 
   test("SPARK-9083: sort with non-deterministic expressions") {
-import org.apache.spark.util.random.XORShiftRandom
-
 val seed = 33
-val df = (1 to 100).map(Tuple1.apply).toDF("i")
+val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1)
--- End diff --

@HyukjinKwon We are leaving this optimization on for MLTest as of now. 
Should we open it up for TestHive and keep it disabled for 
SharedSparkSession? cc @gatorsmile 


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-09 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r216198738
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -85,14 +85,16 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
 
 val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
-intercept[RuntimeException] {
+val msg1 = intercept[Exception] {
--- End diff --

@HyukjinKwon Yeah... we could have caught SparkException here. My intention 
was to have this test case pass both when the local relation optimization is 
on and when it is off. That's why I changed it to a generic exception and 
verified the error text as well.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r216179901
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
   }
 
   test("SPARK-9083: sort with non-deterministic expressions") {
-import org.apache.spark.util.random.XORShiftRandom
-
 val seed = 33
-val df = (1 to 100).map(Tuple1.apply).toDF("i")
+val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1)
--- End diff --

BTW, why does this PR target better test coverage? Do we still test the 
local relation conversion, which might be more common for users as well?


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r216179499
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -85,14 +85,16 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
 
 val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
-intercept[RuntimeException] {
+val msg1 = intercept[Exception] {
--- End diff --

re: https://github.com/apache/spark/pull/22270#discussion_r215485269

Didn't we disable the local relation test? Why don't we catch an explicit 
`SparkException`?


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-07 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r216009774
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
   }
 
   test("SPARK-9083: sort with non-deterministic expressions") {
-import org.apache.spark.util.random.XORShiftRandom
-
 val seed = 33
-val df = (1 to 100).map(Tuple1.apply).toDF("i")
+val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1)
--- End diff --

@cloud-fan I was just trying to get this test case to pass both when 
`ConvertToLocalRelation` is enabled and when it is disabled. When this rule is 
active, I saw that all the calls to random.nextXXX happen in one thread. When 
the rule is disabled, the random values get evaluated under the `Project` 
operator and are called from multiple threads. That's why I am repartitioning 
the DataFrame to enforce single-threaded execution. Is this not the right 
thing to do? Please let me know.
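
For context, a minimal sketch of the repaired test (the expected-value 
computation is reconstructed from the diff above; `XORShiftRandom` is 
`private[spark]`, so this only compiles inside the Spark source tree):

```scala
import org.apache.spark.sql.functions.rand
import org.apache.spark.util.random.XORShiftRandom

test("SPARK-9083: sort with non-deterministic expressions") {
  val seed = 33
  // rand(seed) seeds each partition with (seed + partitionIndex); with more
  // than one partition the per-row draws diverge from a single driver-side
  // stream, so repartition(1) pins every row to partition 0.
  val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1)

  val random = new XORShiftRandom(seed)
  val expected = (1 to 100).map(_ -> random.nextDouble()).sortBy(_._2).map(_._1)
  val actual = df.sort(rand(seed)).collect().map(_.getInt(0))
  assert(expected === actual)
}
```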


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r215869136
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1729,10 +1730,8 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
   }
 
   test("SPARK-9083: sort with non-deterministic expressions") {
-import org.apache.spark.util.random.XORShiftRandom
-
 val seed = 33
-val df = (1 to 100).map(Tuple1.apply).toDF("i")
+val df = (1 to 100).map(Tuple1.apply).toDF("i").repartition(1)
--- End diff --

Sorry I didn't follow this thread closely. Why do we need these 
repartition(1) changes?


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22270


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-05 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r215485269
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
 
 val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
-intercept[RuntimeException] {
+intercept[Exception] {
--- End diff --

@HyukjinKwon We get a SparkException here which in turn wraps a 
RuntimeException. When ConvertToLocalRelation is active, we get a 
RuntimeException from the driver. But when we disable it, the error is raised 
from the executor with a SparkException as the top-level exception. That's why 
I changed it to intercept `Exception`, so that this test can run both when the 
rule is active and when it is not.
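
A minimal sketch of that pattern as it would sit inside 
DataFrameFunctionsSuite, matching on the message rather than the exception 
class (the expected error text is an assumption for illustration):

```scala
import org.apache.spark.sql.functions.map_from_arrays

// Passes in both modes: a driver-side RuntimeException when
// ConvertToLocalRelation is active, or a SparkException wrapping it when the
// rule is excluded and the error surfaces on an executor.
val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
val msg1 = intercept[Exception] {
  df5.select(map_from_arrays($"k", $"v")).collect()
}.getMessage
// The expected error text below is assumed for illustration.
assert(msg1.contains("Cannot use null as map key"))
```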


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-05 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r215485064
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
 
 val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
-intercept[RuntimeException] {
+intercept[Exception] {
--- End diff --

@gatorsmile Thanks.. I am checking the message now.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-05 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r215484920
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
 
 val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
-intercept[RuntimeException] {
+intercept[Exception] {
   df5.select(map_from_arrays($"k", $"v")).collect
--- End diff --

@maropu We get a SparkException here which in turn wraps a 
RuntimeException. When ConvertToLocalRelation is active, we get a 
RuntimeException from the driver. But when we disable it, the error is raised 
from the executor with a SparkException as the top-level exception.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-09-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214564426
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
 
 val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
-intercept[RuntimeException] {
+intercept[Exception] {
--- End diff --

Shall we also catch the specific exception (per 
https://github.com/databricks/scala-style-guide#testing-intercepting)?


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214500639
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
--- End diff --

Many thanks! Yeah, please ping me to review ;)


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214499281
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
--- End diff --

@maropu You are right! I will try to optimize this in the other PR I am 
going to open. Please check whether you like it.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214499216
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
 
 val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
-intercept[RuntimeException] {
+intercept[Exception] {
   df5.select(map_from_arrays($"k", $"v")).collect
--- End diff --

@maropu I will double-check and get back. But I think it was a SparkException.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214499155
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
-  ${ev.isNull} = true;
+  ${setIsNullCode}
 } else if (${ctx.genEqual(right.dataType, value, getValue)}) {
-  ${ev.isNull} = false;
+  ${unsetIsNullCode}
--- End diff --

@maropu will change


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214499145
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1730,9 +1730,8 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 
   test("SPARK-9083: sort with non-deterministic expressions") {
 import org.apache.spark.util.random.XORShiftRandom
--- End diff --

@maropu will do.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214433404
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
-  ${ev.isNull} = true;
+  ${setIsNullCode}
--- End diff --

@gatorsmile OK.. Sean.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214432388
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
 
 val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
-intercept[RuntimeException] {
+intercept[Exception] {
--- End diff --

Let us capture the exception and compare the error messages.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214432272
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
-  ${ev.isNull} = true;
+  ${setIsNullCode}
--- End diff --

Please open separate JIRAs and PRs for the codegen fixes. 


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214351711
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
--- End diff --

(This is also not related to this PR, though.) When 
`left.dataType.asInstanceOf[ArrayType].containsNull` is `false`, I think we 
don't need this `if ($arr.isNullAt($i))` check?
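
A hypothetical sketch of that refinement, building on the codegen in the 
diff (the loop tail is reconstructed from context, so treat it as an 
illustration rather than the final patch):

```scala
nullSafeCodeGen(ctx, ev, (arr, value) => {
  val i = ctx.freshName("i")
  val getValue = CodeGenerator.getValue(arr, right.dataType, i)
  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
  // Emit the per-element null check only when the array can actually hold
  // nulls; otherwise that branch is dead code.
  val nullCheck = if (left.dataType.asInstanceOf[ArrayType].containsNull) {
    s"if ($arr.isNullAt($i)) { $setIsNullCode } else "
  } else {
    ""
  }
  s"""
  for (int $i = 0; $i < $arr.numElements(); $i ++) {
    ${nullCheck}if (${ctx.genEqual(right.dataType, value, getValue)}) {
      $unsetIsNullCode
      ${ev.value} = true;
      break;
    }
  }
  """
})
```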


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214341661
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -85,12 +85,12 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
 
 val df5 = Seq((Seq("a", null), Seq(1, 2))).toDF("k", "v")
-intercept[RuntimeException] {
+intercept[Exception] {
   df5.select(map_from_arrays($"k", $"v")).collect
--- End diff --

What's the concrete exception this query throws?


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214340187
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -1730,9 +1730,8 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 
   test("SPARK-9083: sort with non-deterministic expressions") {
 import org.apache.spark.util.random.XORShiftRandom
--- End diff --

Can we move this import to the top? (This is not related to this PR, 
though.)


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214338698
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
-  ${ev.isNull} = true;
+  ${setIsNullCode}
 } else if (${ctx.genEqual(right.dataType, value, getValue)}) {
-  ${ev.isNull} = false;
+  ${unsetIsNullCode}
--- End diff --

ditto


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-31 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r214338655
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1464,12 +1465,14 @@ case class ArrayContains(left: Expression, right: 
Expression)
 nullSafeCodeGen(ctx, ev, (arr, value) => {
   val i = ctx.freshName("i")
   val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  val setIsNullCode = if (nullable) s"${ev.isNull} = true;" else ""
+  val unsetIsNullCode = if (nullable) s"${ev.isNull} = false;" else ""
   s"""
   for (int $i = 0; $i < $arr.numElements(); $i ++) {
 if ($arr.isNullAt($i)) {
-  ${ev.isNull} = true;
+  ${setIsNullCode}
--- End diff --

nit: `${setIsNullCode}` -> `$setIsNullCode`
BTW, you found some bugs when excluding the `ConvertToLocalRelation` rule 
in tests, right?


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r213945889
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -652,65 +653,66 @@ class ALSSuite extends MLTest with 
DefaultReadWriteTest with Logging {
   test("input type validation") {
 val spark = this.spark
 import spark.implicits._
-
-// check that ALS can handle all numeric types for rating column
-// and user/item columns (when the user/item ids are within Int range)
-val als = new ALS().setMaxIter(1).setRank(1)
-Seq(("user", IntegerType), ("item", IntegerType), ("rating", 
FloatType)).foreach {
-  case (colName, sqlType) =>
-checkNumericTypesALS(als, spark, colName, sqlType) {
-  (ex, act) =>
-ex.userFactors.first().getSeq[Float](1) === 
act.userFactors.first().getSeq[Float](1)
-} { (ex, act, df, enc) =>
-  val expected = ex.transform(df).selectExpr("prediction")
-.first().getFloat(0)
-  testTransformerByGlobalCheckFunc(df, act, "prediction") {
-case rows: Seq[Row] =>
-  expected ~== rows.head.getFloat(0) absTol 1e-6
-  }(enc)
-}
-}
-// check user/item ids falling outside of Int range
-val big = Int.MaxValue.toLong + 1
-val small = Int.MinValue.toDouble - 1
-val df = Seq(
-  (0, 0L, 0d, 1, 1L, 1d, 3.0),
-  (0, big, small, 0, big, small, 2.0),
-  (1, 1L, 1d, 0, 0L, 0d, 5.0)
-).toDF("user", "user_big", "user_small", "item", "item_big", 
"item_small", "rating")
-val msg = "either out of Integer range or contained a fractional part"
-withClue("fit should fail when ids exceed integer range. ") {
-  assert(intercept[SparkException] {
-als.fit(df.select(df("user_big").as("user"), df("item"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-als.fit(df.select(df("user_small").as("user"), df("item"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-als.fit(df.select(df("item_big").as("item"), df("user"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-als.fit(df.select(df("item_small").as("item"), df("user"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-}
-withClue("transform should fail when ids exceed integer range. ") {
-  val model = als.fit(df)
-  def testTransformIdExceedsIntRange[A : Encoder](dataFrame: 
DataFrame): Unit = {
+withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> "") {
--- End diff --

Sure. :)


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-30 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r213942732
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -59,7 +60,8 @@ object TestHive
 .set("spark.sql.warehouse.dir", 
TestHiveContext.makeWarehouseDir().toURI.getPath)
 // SPARK-8910
 .set("spark.ui.enabled", "false")
-.set("spark.unsafe.exceptionOnMemoryLeak", "true")))
+.set("spark.unsafe.exceptionOnMemoryLeak", "true")
+.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, 
ConvertToLocalRelation.ruleName)))
--- End diff --

@viirya Sure, I will add.


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-30 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r213942715
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -652,65 +653,66 @@ class ALSSuite extends MLTest with 
DefaultReadWriteTest with Logging {
   test("input type validation") {
 val spark = this.spark
 import spark.implicits._
-
-// check that ALS can handle all numeric types for rating column
-// and user/item columns (when the user/item ids are within Int range)
-val als = new ALS().setMaxIter(1).setRank(1)
-Seq(("user", IntegerType), ("item", IntegerType), ("rating", 
FloatType)).foreach {
-  case (colName, sqlType) =>
-checkNumericTypesALS(als, spark, colName, sqlType) {
-  (ex, act) =>
-ex.userFactors.first().getSeq[Float](1) === 
act.userFactors.first().getSeq[Float](1)
-} { (ex, act, df, enc) =>
-  val expected = ex.transform(df).selectExpr("prediction")
-.first().getFloat(0)
-  testTransformerByGlobalCheckFunc(df, act, "prediction") {
-case rows: Seq[Row] =>
-  expected ~== rows.head.getFloat(0) absTol 1e-6
-  }(enc)
-}
-}
-// check user/item ids falling outside of Int range
-val big = Int.MaxValue.toLong + 1
-val small = Int.MinValue.toDouble - 1
-val df = Seq(
-  (0, 0L, 0d, 1, 1L, 1d, 3.0),
-  (0, big, small, 0, big, small, 2.0),
-  (1, 1L, 1d, 0, 0L, 0d, 5.0)
-).toDF("user", "user_big", "user_small", "item", "item_big", 
"item_small", "rating")
-val msg = "either out of Integer range or contained a fractional part"
-withClue("fit should fail when ids exceed integer range. ") {
-  assert(intercept[SparkException] {
-als.fit(df.select(df("user_big").as("user"), df("item"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-als.fit(df.select(df("user_small").as("user"), df("item"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-als.fit(df.select(df("item_big").as("item"), df("user"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-als.fit(df.select(df("item_small").as("item"), df("user"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-}
-withClue("transform should fail when ids exceed integer range. ") {
-  val model = als.fit(df)
-  def testTransformIdExceedsIntRange[A : Encoder](dataFrame: 
DataFrame): Unit = {
+withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> "") {
--- End diff --

@viirya Thanks! Currently I am just making changes so I can move forward 
with testing and identify the failures. I will open a separate PR for the 
code/test fixes, so let's discuss the right way to fix the problem there.

---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r213939010
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -59,7 +60,8 @@ object TestHive
 .set("spark.sql.warehouse.dir", 
TestHiveContext.makeWarehouseDir().toURI.getPath)
 // SPARK-8910
 .set("spark.ui.enabled", "false")
-.set("spark.unsafe.exceptionOnMemoryLeak", "true")))
+.set("spark.unsafe.exceptionOnMemoryLeak", "true")
+.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, 
ConvertToLocalRelation.ruleName)))
--- End diff --

Can we add a comment for this?
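
For instance, a sketch of the setting with such a comment attached (the 
comment wording is an assumption):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
import org.apache.spark.sql.internal.SQLConf

new SparkConf()
  .set("spark.unsafe.exceptionOnMemoryLeak", "true")
  // SPARK-25267: exclude ConvertToLocalRelation so that test queries are not
  // collapsed into a LocalRelation at optimization time and keep exercising
  // the executor-side evaluation paths.
  .set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName)
```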


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r213938811
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -652,65 +653,66 @@ class ALSSuite extends MLTest with 
DefaultReadWriteTest with Logging {
   test("input type validation") {
 val spark = this.spark
 import spark.implicits._
-
-// check that ALS can handle all numeric types for rating column
-// and user/item columns (when the user/item ids are within Int range)
-val als = new ALS().setMaxIter(1).setRank(1)
-Seq(("user", IntegerType), ("item", IntegerType), ("rating", 
FloatType)).foreach {
-  case (colName, sqlType) =>
-checkNumericTypesALS(als, spark, colName, sqlType) {
-  (ex, act) =>
-ex.userFactors.first().getSeq[Float](1) === 
act.userFactors.first().getSeq[Float](1)
-} { (ex, act, df, enc) =>
-  val expected = ex.transform(df).selectExpr("prediction")
-.first().getFloat(0)
-  testTransformerByGlobalCheckFunc(df, act, "prediction") {
-case rows: Seq[Row] =>
-  expected ~== rows.head.getFloat(0) absTol 1e-6
-  }(enc)
-}
-}
-// check user/item ids falling outside of Int range
-val big = Int.MaxValue.toLong + 1
-val small = Int.MinValue.toDouble - 1
-val df = Seq(
-  (0, 0L, 0d, 1, 1L, 1d, 3.0),
-  (0, big, small, 0, big, small, 2.0),
-  (1, 1L, 1d, 0, 0L, 0d, 5.0)
-).toDF("user", "user_big", "user_small", "item", "item_big", 
"item_small", "rating")
-val msg = "either out of Integer range or contained a fractional part"
-withClue("fit should fail when ids exceed integer range. ") {
-  assert(intercept[SparkException] {
-als.fit(df.select(df("user_big").as("user"), df("item"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-als.fit(df.select(df("user_small").as("user"), df("item"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-als.fit(df.select(df("item_big").as("item"), df("user"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-  assert(intercept[SparkException] {
-als.fit(df.select(df("item_small").as("item"), df("user"), 
df("rating")))
-  }.getCause.getMessage.contains(msg))
-}
-withClue("transform should fail when ids exceed integer range. ") {
-  val model = als.fit(df)
-  def testTransformIdExceedsIntRange[A : Encoder](dataFrame: 
DataFrame): Unit = {
+withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> "") {
--- End diff --

If we only want to disable `ConvertToLocalRelation` in sql/core and 
sql/hive, maybe we can set this SQL conf in `MLTest`?

I don't see any usage of `withSQLConf` in ML tests. It looks a bit weird to 
see it here.
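
A sketch of that alternative, assuming `MLTest` can override a suite-level 
`sparkConf` the way the shared-session test traits do (hypothetical; it only 
makes sense inside the Spark test tree):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.internal.SQLConf

abstract class MLTest extends StreamTest {
  // Re-enable all optimizer rules for ML suites in one place, instead of
  // wrapping each test body in withSQLConf(...).
  override protected def sparkConf: SparkConf =
    super.sparkConf.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, "")
}
```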


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22270#discussion_r213923292
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -24,12 +24,10 @@ import scala.collection.JavaConverters._
 import scala.collection.mutable
 import scala.collection.mutable.{ArrayBuffer, WrappedArray}
 import scala.language.existentials
-
 import com.github.fommil.netlib.BLAS.{getInstance => blas}
 import org.apache.commons.io.FileUtils
 import org.apache.commons.io.filefilter.TrueFileFilter
 import org.scalatest.BeforeAndAfterEach
-
--- End diff --

Revert the blank lines?


---




[GitHub] spark pull request #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRe...

2018-08-29 Thread dilipbiswal
GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/22270

[SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation in the test cases 
of sql/core and sql/hive

## What changes were proposed in this pull request?
In SharedSparkSession and TestHive, we need to disable the rule 
ConvertToLocalRelation for better test case coverage. 
## How was this patch tested?
Identified the failures after excluding the "ConvertToLocalRelation" rule.
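
As a minimal sketch of the mechanism this PR leans on, the rule can also be 
excluded at runtime through the SQL conf (key and rule name as they appear in 
the diffs above):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").appName("sketch").getOrCreate()
import spark.implicits._

// Exclude ConvertToLocalRelation by its fully qualified rule name.
spark.conf.set(
  "spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation")

// With the rule excluded, the plan keeps a Project over the local data
// instead of being folded into a LocalRelation at optimization time.
Seq(1, 2, 3).toDF("i").selectExpr("i + 1").explain(true)
```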

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark SPARK-25267-final

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22270


commit bd57944106348e0d0ede63afb142bfcf59af80f3
Author: Dilip Biswal 
Date:   2018-08-28T21:15:32Z

[SPARK-25267] Disable ConvertToLocalRelation in the test cases of sql/core 
and sql/hive




---
