[jira] [Updated] (SPARK-42068) Implicit conversion is not working with parallelization in scala with java 11 and spark3

2023-01-16 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-42068:
--
Summary: Implicit conversion is not working with parallelization in scala 
with java 11 and spark3  (was: Parallelization in Scala is not working with 
Java 11 and spark3)

> Implicit conversion is not working with parallelization in scala with java 11 
> and spark3
> 
>
> Key: SPARK-42068
> URL: https://issues.apache.org/jira/browse/SPARK-42068
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.2.3, 3.4.0
> Environment: Spark version 3.3.1, using Scala version 2.12.15 (OpenJDK 
> 64-Bit Server VM, Java 11.0.17)
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> The following code snippet fails with Java 11 and Spark 3, but works with 
> Java 8. It also works with Spark 2 and Java 11. 
> {code:java}
> import scala.collection.mutable
> import scala.collection.parallel.{ExecutionContextTaskSupport, 
> ForkJoinTaskSupport}
> case class Person(name: String, age: Int)
> val pc = List(1, 2, 3).par
> val forkJoinPool = new java.util.concurrent.ForkJoinPool(2)
> pc.tasksupport = new ForkJoinTaskSupport(forkJoinPool)
> pc.map { x =>
>     val personList: Array[Person] = (1 to 999).map(value => Person("p" + 
> value, value)).toArray
>     //creating RDD of Person
>     val rddPerson = spark.sparkContext.parallelize(personList, 5)
>     val evenAgePerson = rddPerson.filter(_.age % 2 == 0)
>     import spark.implicits._
>     val evenAgePersonDF = evenAgePerson.toDF("Name", "Age")
> } {code}
> The error is as follows.
> {code:java}
> scala.ScalaReflectionException: object $read not found.
>   at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:185)
>   at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:29)
>   at $typecreator6$1.apply(<console>:37)
>   at 
> scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:237)
>   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:237)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:52)
>   at org.apache.spark.sql.Encoders$.product(Encoders.scala:300)
>   at 
> org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder(SQLImplicits.scala:261)
>   at 
> org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder$(SQLImplicits.scala:261)
>   at 
> org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:32)
>   at $anonfun$res0$1(<console>:37)
>   at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
>   at 
> scala.collection.parallel.AugmentedIterableIterator.map2combiner(RemainsIterator.scala:116)
>   at 
> scala.collection.parallel.AugmentedIterableIterator.map2combiner$(RemainsIterator.scala:113)
>   at 
> scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:66)
>   at 
> scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1064)
>   at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
>   at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
>   at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
>   at 
> scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1061)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:160)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal$(Tasks.scala:157)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:440)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:150)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
>   at 
> java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
>   at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>   at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:396)
>   at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:721)
>   at scala.collection.parallel.ForkJoinTasks$WrappedTask.sync(Tasks.scala:379)
>   at 
> scala.collection.parallel.ForkJoinTasks$WrappedTask.sync$(Tasks.scala:379)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:440)
>   at 
> 
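A possible workaround, offered here only as an untested sketch: build the product encoder once on the driver thread and pass it explicitly, so no implicit-encoder derivation (and no REPL reflection) happens on the ForkJoinPool worker threads. It assumes the same spark-shell session, Person case class, and SparkSession named spark as in the report.

{code:java}
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport
import org.apache.spark.sql.{Encoder, Encoders}

case class Person(name: String, age: Int)

// Resolve the encoder once on the shell/driver thread, where REPL reflection works.
val personEncoder: Encoder[Person] = Encoders.product[Person]

val pc = List(1, 2, 3).par
pc.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(2))

pc.map { _ =>
  val personList = (1 to 999).map(v => Person("p" + v, v)).toArray
  val rddPerson = spark.sparkContext.parallelize(personList, 5)
  val evenAgePerson = rddPerson.filter(_.age % 2 == 0)
  // createDataset takes the encoder explicitly instead of resolving it via
  // spark.implicits._ on the worker thread.
  val evenAgePersonDF = spark.createDataset(evenAgePerson)(personEncoder).toDF("Name", "Age")
  evenAgePersonDF.count()
}
{code}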

[jira] [Updated] (SPARK-42068) Parallelization in Scala is not working with Java 11 and spark3

2023-01-14 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-42068:
--
Description: 
The following code snippet fails with Java 11 and Spark 3, but works with Java 8. 
It also works with Spark 2 and Java 11. 
{code:java}
import scala.collection.mutable
import scala.collection.parallel.{ExecutionContextTaskSupport, 
ForkJoinTaskSupport}

case class Person(name: String, age: Int)
val pc = List(1, 2, 3).par

val forkJoinPool = new java.util.concurrent.ForkJoinPool(2)
pc.tasksupport = new ForkJoinTaskSupport(forkJoinPool)

pc.map { x =>
    val personList: Array[Person] = (1 to 999).map(value => Person("p" + value, 
value)).toArray
    //creating RDD of Person
    val rddPerson = spark.sparkContext.parallelize(personList, 5)
    val evenAgePerson = rddPerson.filter(_.age % 2 == 0)
    import spark.implicits._
    val evenAgePersonDF = evenAgePerson.toDF("Name", "Age")
} {code}
The error is as follows.
{code:java}
scala.ScalaReflectionException: object $read not found.
  at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:185)
  at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:29)
  at $typecreator6$1.apply(<console>:37)
  at 
scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:237)
  at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:237)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:52)
  at org.apache.spark.sql.Encoders$.product(Encoders.scala:300)
  at 
org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder(SQLImplicits.scala:261)
  at 
org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder$(SQLImplicits.scala:261)
  at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:32)
  at $anonfun$res0$1(<console>:37)
  at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
  at 
scala.collection.parallel.AugmentedIterableIterator.map2combiner(RemainsIterator.scala:116)
  at 
scala.collection.parallel.AugmentedIterableIterator.map2combiner$(RemainsIterator.scala:113)
  at 
scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:66)
  at 
scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1064)
  at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
  at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
  at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
  at 
scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1061)
  at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:160)
  at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal$(Tasks.scala:157)
  at 
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:440)
  at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:150)
  at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
  at 
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
  at 
java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
  at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
  at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:396)
  at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:721)
  at scala.collection.parallel.ForkJoinTasks$WrappedTask.sync(Tasks.scala:379)
  at scala.collection.parallel.ForkJoinTasks$WrappedTask.sync$(Tasks.scala:379)
  at 
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:440)
  at 
scala.collection.parallel.ForkJoinTasks.executeAndWaitResult(Tasks.scala:423)
  at 
scala.collection.parallel.ForkJoinTasks.executeAndWaitResult$(Tasks.scala:416)
  at 
scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:60)
  at 
scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:968)
  at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
  at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
  at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
  at 
scala.collection.parallel.ParIterableLike$ResultMapping.tryLeaf(ParIterableLike.scala:963)
  at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:153)
  at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
  at 
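The trace suggests the REPL wrapper object ($read) cannot be resolved from the ForkJoinPool worker threads, which do not inherit the shell thread's context class loader. Another possible workaround, again only an untested sketch, is to create the pool with a thread factory that propagates the driver's class loader to the workers:

{code:java}
import java.util.concurrent.{ForkJoinPool, ForkJoinWorkerThread}
import java.util.concurrent.ForkJoinPool.ForkJoinWorkerThreadFactory
import scala.collection.parallel.ForkJoinTaskSupport

// Capture the class loader of the spark-shell/driver thread.
val replClassLoader = Thread.currentThread().getContextClassLoader

val factory = new ForkJoinWorkerThreadFactory {
  override def newThread(pool: ForkJoinPool): ForkJoinWorkerThread = {
    val t = ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool)
    t.setContextClassLoader(replClassLoader) // let workers see REPL-generated classes
    t
  }
}

val forkJoinPool = new ForkJoinPool(2, factory, null, false)
val pc = List(1, 2, 3).par
pc.tasksupport = new ForkJoinTaskSupport(forkJoinPool)
// ... run the original pc.map { ... } body from the report here ...
{code}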

[jira] [Updated] (SPARK-42068) Parallelization in Scala is not working with Java 11 and spark3

2023-01-14 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-42068:
--
Environment: Spark version 3.3.1, using Scala version 2.12.15 (OpenJDK 
64-Bit Server VM, Java 11.0.17)  (was:                     __
     / __/__  ___ _/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.17))

> Parallelization in Scala is not working with Java 11 and spark3
> ---
>
> Key: SPARK-42068
> URL: https://issues.apache.org/jira/browse/SPARK-42068
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.2.3, 3.4.0
> Environment: Spark version 3.3.1, using Scala version 2.12.15 (OpenJDK 
> 64-Bit Server VM, Java 11.0.17)
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> The following code snippet fails with Java 11 and Spark 3, but works with 
> Java 8. It also works with Spark 2 and Java 11. 
> {code:java}
> import scala.collection.mutable
> import scala.collection.parallel.{ExecutionContextTaskSupport, 
> ForkJoinTaskSupport}
> case class Person(name: String, age: Int)
> val pc = List(1, 2, 3).par
> val forkJoinPool = new java.util.concurrent.ForkJoinPool(2)
> pc.tasksupport = new ForkJoinTaskSupport(forkJoinPool)
> pc.map { x =>
>     val personList: Array[Person] = (1 to 999).map(value => Person("p" + 
> value, value)).toArray
>     //creating RDD of Person
>     val rddPerson = spark.sparkContext.parallelize(personList, 5)
>     val evenAgePerson = rddPerson.filter(_.age % 2 == 0)
>     import spark.implicits._
>     val evenAgePersonDF = evenAgePerson.toDF("Name", "Age")
> } {code}
> The error is as follows.
> {code:java}
> scala.ScalaReflectionException: object $read not found.
>   at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:185)
>   at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:29)
>   at $typecreator6$1.apply(<console>:37)
>   at 
> scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:237)
>   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:237)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:52)
>   at org.apache.spark.sql.Encoders$.product(Encoders.scala:300)
>   at 
> org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder(SQLImplicits.scala:261)
>   at 
> org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder$(SQLImplicits.scala:261)
>   at 
> org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:32)
>   at $anonfun$res0$1(<console>:37)
>   at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
>   at 
> scala.collection.parallel.AugmentedIterableIterator.map2combiner(RemainsIterator.scala:116)
>   at 
> scala.collection.parallel.AugmentedIterableIterator.map2combiner$(RemainsIterator.scala:113)
>   at 
> scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:66)
>   at 
> scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1064)
>   at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
>   at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
>   at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
>   at 
> scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1061)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:160)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal$(Tasks.scala:157)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:440)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:150)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
>   at 
> java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
>   at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>   at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:396)
>   at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:721)
>   at scala.collection.parallel.ForkJoinTasks$WrappedTask.sync(Tasks.scala:379)
>   at 
> scala.collection.parallel.ForkJoinTasks$WrappedTask.sync$(Tasks.scala:379)
>   at 
> 

[jira] [Created] (SPARK-42068) Parallelization in Scala is not working with Java 11 and spark3

2023-01-14 Thread Srinivas Rishindra Pothireddi (Jira)
Srinivas Rishindra Pothireddi created SPARK-42068:
-

 Summary: Parallelization in Scala is not working with Java 11 and 
spark3
 Key: SPARK-42068
 URL: https://issues.apache.org/jira/browse/SPARK-42068
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.3, 3.3.1, 3.4.0
 Environment:                     __
     / __/__  ___ _/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.17)
Reporter: Srinivas Rishindra Pothireddi


The following code snippet fails with Java 11 and Spark 3, but works with Java 8. 
It also works with Spark 2 and Java 11. 
{code:java}
import scala.collection.mutable
import scala.collection.parallel.{ExecutionContextTaskSupport, 
ForkJoinTaskSupport}

case class Person(name: String, age: Int)
val pc = List(1, 2, 3).par

val forkJoinPool = new java.util.concurrent.ForkJoinPool(2)
pc.tasksupport = new ForkJoinTaskSupport(forkJoinPool)

pc.map { x =>
    val personList: Array[Person] = (1 to 999).map(value => Person("p" + value, 
value)).toArray
    //creating RDD of Person
    val rddPerson = spark.sparkContext.parallelize(personList, 5)
    val evenAgePerson = rddPerson.filter(_.age % 2 == 0)
    import spark.implicits._
    val evenAgePersonDF = evenAgePerson.toDF("Name", "Age")
} {code}
The error is as follows.
{code:java}
scala.ScalaReflectionException: object $read not found.
  at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:185)
  at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:29)
  at $typecreator6$1.apply(<console>:37)
  at 
scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:237)
  at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:237)
  at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:52)
  at org.apache.spark.sql.Encoders$.product(Encoders.scala:300)
  at 
org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder(SQLImplicits.scala:261)
  at 
org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder$(SQLImplicits.scala:261)
  at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:32)
  at $anonfun$res0$1(<console>:37)
  at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
  at 
scala.collection.parallel.AugmentedIterableIterator.map2combiner(RemainsIterator.scala:116)
  at 
scala.collection.parallel.AugmentedIterableIterator.map2combiner$(RemainsIterator.scala:113)
  at 
scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:66)
  at 
scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1064)
  at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
  at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
  at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
  at 
scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1061)
  at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:160)
  at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal$(Tasks.scala:157)
  at 
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:440)
  at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:150)
  at 
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
  at 
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
  at 
java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
  at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
  at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:396)
  at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:721)
  at scala.collection.parallel.ForkJoinTasks$WrappedTask.sync(Tasks.scala:379)
  at scala.collection.parallel.ForkJoinTasks$WrappedTask.sync$(Tasks.scala:379)
  at 
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:440)
  at 
scala.collection.parallel.ForkJoinTasks.executeAndWaitResult(Tasks.scala:423)
  at 
scala.collection.parallel.ForkJoinTasks.executeAndWaitResult$(Tasks.scala:416)
  at 
scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:60)
  at 
scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:968)
  at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at 
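A third way to sidestep the REPL reflection entirely, likewise only a sketch, is to build the DataFrame from Rows with an explicit schema instead of from the case class; createDataFrame(RDD[Row], StructType) needs no encoders or TypeTags. It reuses the parallel collection pc and the SparkSession spark from the report.

{code:java}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Schema declared up front, so nothing has to be derived by reflection.
val schema = StructType(Seq(
  StructField("Name", StringType, nullable = false),
  StructField("Age", IntegerType, nullable = false)))

pc.map { _ =>
  val rows = (1 to 999).map(v => Row("p" + v, v))
  val rowRdd = spark.sparkContext.parallelize(rows, 5)
  val evenAge = rowRdd.filter(_.getInt(1) % 2 == 0)
  val evenAgePersonDF = spark.createDataFrame(evenAge, schema)
  evenAgePersonDF.count()
}
{code}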

[jira] [Issue Comment Deleted] (SPARK-34572) Documentation generation check fails in GitHub

2021-02-28 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-34572:
--
Comment: was deleted

(was: I published a PR to remove the invalid reference: 
[https://github.com/apache/spark/pull/31686/].)

> Documentation generation check fails in GitHub 
> ---
>
> Key: SPARK-34572
> URL: https://issues.apache.org/jira/browse/SPARK-34572
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.2
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Blocker
>
> The documentation generation check has been failing for multiple PRs in 
> GitHub because the 
> [commit|https://github.com/apache/spark/commit/397b843890db974a0534394b1907d33d62c2b888#diff-02f10976f2f83e219445cd4860a2999f45dc254b173f1ffefb323d8fe1d1c817R100-R108]
>  added a reference to the file model_selection_random_hyperparameters_example.py.
> However, this file does not exist in the examples folder of the Spark project, 
> which causes the "Linters, licenses, dependencies and documentation 
> generation" check to fail in the GitHub Actions documentation build. The 
> reference has to be removed for the build to succeed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34572) Documentation generation check fails in GitHub

2021-02-28 Thread Srinivas Rishindra Pothireddi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292509#comment-17292509
 ] 

Srinivas Rishindra Pothireddi commented on SPARK-34572:
---

I published a PR to remove the invalid reference: 
[https://github.com/apache/spark/pull/31686/].

> Documentation generation check fails in GitHub 
> ---
>
> Key: SPARK-34572
> URL: https://issues.apache.org/jira/browse/SPARK-34572
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.2
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Blocker
>
> The documentation generation check has been failing for multiple PRs in 
> GitHub because the 
> [commit|https://github.com/apache/spark/commit/397b843890db974a0534394b1907d33d62c2b888#diff-02f10976f2f83e219445cd4860a2999f45dc254b173f1ffefb323d8fe1d1c817R100-R108]
>  added a reference to the file model_selection_random_hyperparameters_example.py.
> However, this file does not exist in the examples folder of the Spark project, 
> which causes the "Linters, licenses, dependencies and documentation 
> generation" check to fail in the GitHub Actions documentation build. The 
> reference has to be removed for the build to succeed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34572) Documentation generation check fails in GitHub

2021-02-28 Thread Srinivas Rishindra Pothireddi (Jira)
Srinivas Rishindra Pothireddi created SPARK-34572:
-

 Summary: Documentation generation check fails in GitHub 
 Key: SPARK-34572
 URL: https://issues.apache.org/jira/browse/SPARK-34572
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.0.2
Reporter: Srinivas Rishindra Pothireddi


The documentation generation check has been failing for multiple PRs in GitHub 
because the 
[commit|https://github.com/apache/spark/commit/397b843890db974a0534394b1907d33d62c2b888#diff-02f10976f2f83e219445cd4860a2999f45dc254b173f1ffefb323d8fe1d1c817R100-R108]
 added a reference to the file model_selection_random_hyperparameters_example.py.

However, this file does not exist in the examples folder of the Spark project, 
which causes the "Linters, licenses, dependencies and documentation generation" 
check to fail in the GitHub Actions documentation build. The reference has to be 
removed for the build to succeed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34369) Track number of pairs processed out of Join

2021-02-04 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-34369:
--
Description: 
Often even a modest skew in a join can lead to tasks that appear to be stuck, 
because of the O(n^2) nature of a join considering all pairs of rows with 
matching keys. When this happens, users think that Spark has deadlocked. If 
there is a bound condition, the "number of output rows" metric may look 
typical, and other metrics may look very modest (e.g. shuffle read). In those 
cases it is very hard to understand what the problem is; there is no conclusive 
proof without getting a heap dump and looking at some internal data structures.

It would be much better if Spark had a metric (which we propose be titled 
“number of matched pairs”, as a companion to “number of output rows”) that 
showed the user how many pairs are being processed in the join. This would get 
updated in the live UI (when metrics are collected during heartbeats), so the 
user could easily see what is going on.

This would also help when there is some other cause of a stuck executor (e.g. 
network issues), simply by disproving the skew theory. For example, you may 
have 100k records with the same key on each side of a join. That probably won't 
show up as extreme skew in task input data, but it becomes 10B join pairs that 
Spark works through in one task.

To further demonstrate the usefulness of this metric, please follow the steps 
below.

 

_val df1 = spark.range(0, 20).map \{ x => (x % 20, 20) }.toDF("b", "c")_

_val df2 = spark.range(0, 30).map \{ x => (77, 20) }.toDF("b", "c")_

 

_val df3 = spark.range(0, 20).map(x => (x + 1, x + 2)).toDF("b", "c")_

_val df4 = spark.range(0, 30).map(x => (77, x + 2)).toDF("b", "c")_

 

_val df5 = df1.union(df2)_

_val df6 = df3.union(df4)_

 

_df5.createOrReplaceTempView("table1")_

_df6.createOrReplaceTempView("table2")_
h3. InnerJoin

_sql("select p.**,* f.* from table2 p join table1 f on f.b = p.b and f.c > 
p.c").count_

_number of output rows: 5,580,000_

_number of matched pairs: 90,000,490,000_
h3. FullOuterJoin

_spark.sql("select p.**,* f.* from table2 p full outer join table1 f on f.b = 
p.b and f.c > p.c").count_

_number of output rows: 6,099,964_

_number of matched pairs: 90,000,490,000_
h3. LeftOuterJoin

_sql("select p.**,* f.* from table2 p left outer join table1 f on f.b = p.b and 
f.c > p.c").count_

_number of output rows: 6,079,964_

_number of matched pairs: 90,000,490,000_
h3. RightOuterJoin

_spark.sql("select p.**,* f.* from table2 p right outer join table1 f on f.b = 
p.b and f.c > p.c").count_

_number of output rows: 5,600,000_

_number of matched pairs: 90,000,490,000_
h3. LeftSemiJoin

_spark.sql("select * from table2 p left semi join table1 f on f.b = p.b and f.c 
> p.c").count_

_number of output rows: 36_

_number of matched pairs: 89,994,910,036_
h3. CrossJoin

_spark.sql("select p.*, f.* from table2 p cross join table1 f on f.b = p.b and 
f.c > p.c").count_

_number of output rows: 5,580,000_

_number of matched pairs: 90,000,490,000_
h3. LeftAntiJoin

_spark.sql("select * from table2 p anti join table1 f on f.b = p.b and f.c > 
p.c").count_

number of output rows: 499,964

number of matched pairs: 89,994,910,036

  was:
Often users face a scenario where even a modest skew in a join can lead to 
tasks appearing to be stuck, due to the O(n^2) nature of a join considering all 
pairs of rows with matching keys. When this happens users think that spark has 
gotten deadlocked. If there is a bound condition, the "number of output rows" 
metric may look typical. Other metrics may look very modest (eg: shuffle read). 
In those cases, it is very hard to understand what the problem is. There is no 
conclusive proof without getting a heap dump and looking at some internal data 
structures.

It would be much better if spark had a metric(which we propose be titled 
“number of matched pairs” as a companion to “number of output rows”) which 
showed the user how many pairs were being processed in the join. This would get 
updated in the live UI (when metrics get collected during heartbeats), so the 
user could easily see what was going on.

This would even help in cases where there was some other cause of a stuck 
executor (eg. network issues) just to disprove this theory. For example, you 
may have 100k records with the same key on each side of a join. That probably 
won't really show up as extreme skew in task input data. But it'll become 10B 
join pairs that spark works through, in one task.

 

To further demonstrate the usefulness of this metric please follow the steps 
below.

 

_val df1 = spark.range(0, 20).map \{ x => (x % 20, 20) }.toDF("b", "c")_

_val df2 = spark.range(0, 30).map \{ x => (77, 20) }.toDF("b", 

[jira] [Commented] (SPARK-34369) Track number of pairs processed out of Join

2021-02-04 Thread Srinivas Rishindra Pothireddi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279098#comment-17279098
 ] 

Srinivas Rishindra Pothireddi commented on SPARK-34369:
---

I am working on this

> Track number of pairs processed out of Join
> ---
>
> Key: SPARK-34369
> URL: https://issues.apache.org/jira/browse/SPARK-34369
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> Often even a modest skew in a join can lead to tasks that appear to be stuck, 
> because of the O(n^2) nature of a join considering all pairs of rows with 
> matching keys. When this happens, users think that Spark has deadlocked. If 
> there is a bound condition, the "number of output rows" metric may look 
> typical, and other metrics may look very modest (e.g. shuffle read). In those 
> cases it is very hard to understand what the problem is; there is no 
> conclusive proof without getting a heap dump and looking at some internal 
> data structures.
> It would be much better if Spark had a metric (which we propose be titled 
> “number of matched pairs”, as a companion to “number of output rows”) that 
> showed the user how many pairs are being processed in the join. This would 
> get updated in the live UI (when metrics are collected during heartbeats), so 
> the user could easily see what is going on.
> This would also help when there is some other cause of a stuck executor 
> (e.g. network issues), simply by disproving the skew theory. For example, you 
> may have 100k records with the same key on each side of a join. That probably 
> won't show up as extreme skew in task input data, but it becomes 10B join 
> pairs that Spark works through in one task.
>  
> To further demonstrate the usefulness of this metric, please follow the steps 
> below.
>  
> _val df1 = spark.range(0, 20).map \{ x => (x % 20, 20) }.toDF("b", 
> "c")_
> _val df2 = spark.range(0, 30).map \{ x => (77, 20) }.toDF("b", "c")_
>  
> _val df3 = spark.range(0, 20).map(x => (x + 1, x + 2)).toDF("b", "c")_
> _val df4 = spark.range(0, 30).map(x => (77, x + 2)).toDF("b", "c")_
>  
> _val df5 = df1.union(df2)_
> _val df6 = df3.union(df4)_
>  
> _df5.createOrReplaceTempView("table1")_
> _df6.createOrReplaceTempView("table2")_
> h3. InnerJoin
> _sql("select p.*, f.* from table2 p join table1 f on f.b = p.b and f.c > 
> p.c").count_
> _number of output rows: 5,580,000_
> _number of matched pairs: 90,000,490,000_
> h3. FullOuterJoin
> _spark.sql("select p.*, f.* from table2 p full outer join table1 f on f.b = 
> p.b and f.c > p.c").count_
> _number of output rows: 6,099,964_
> _number of matched pairs: 90,000,490,000_
> h3. LeftOuterJoin
> _sql("select p.*, f.* from table2 p left outer join table1 f on f.b = p.b and 
> f.c > p.c").count_
> _number of output rows: 6,079,964_
> _number of matched pairs: 90,000,490,000_
> h3. RightOuterJoin
> _spark.sql("select p.*, f.* from table2 p right outer join table1 f on f.b = 
> p.b and f.c > p.c").count_
> _number of output rows: 5,600,000_
> _number of matched pairs: 90,000,490,000_
> h3. LeftSemiJoin
> _spark.sql("select * from table2 p left semi join table1 f on f.b = p.b and 
> f.c > p.c").count_
> _number of output rows: 36_
> _number of matched pairs: 89,994,910,036_
> h3. CrossJoin
> _spark.sql("select p.*, f.* from table2 p cross join table1 f on f.b = p.b 
> and f.c > p.c").count_
> _number of output rows: 5,580,000_
> _number of matched pairs: 90,000,490,000_
> h3. LeftAntiJoin
> _spark.sql("select * from table2 p anti join table1 f on f.b = p.b and f.c > 
> p.c").count_
> number of output rows: 499,964
> number of matched pairs: 89,994,910,036



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34369) Track number of pairs processed out of Join

2021-02-04 Thread Srinivas Rishindra Pothireddi (Jira)
Srinivas Rishindra Pothireddi created SPARK-34369:
-

 Summary: Track number of pairs processed out of Join
 Key: SPARK-34369
 URL: https://issues.apache.org/jira/browse/SPARK-34369
 Project: Spark
  Issue Type: New Feature
  Components: Web UI
Affects Versions: 3.2.0
Reporter: Srinivas Rishindra Pothireddi


Often even a modest skew in a join can lead to tasks that appear to be stuck, 
because of the O(n^2) nature of a join considering all pairs of rows with 
matching keys. When this happens, users think that Spark has deadlocked. If 
there is a bound condition, the "number of output rows" metric may look 
typical, and other metrics may look very modest (e.g. shuffle read). In those 
cases it is very hard to understand what the problem is; there is no conclusive 
proof without getting a heap dump and looking at some internal data structures.

It would be much better if Spark had a metric (which we propose be titled 
“number of matched pairs”, as a companion to “number of output rows”) that 
showed the user how many pairs are being processed in the join. This would get 
updated in the live UI (when metrics are collected during heartbeats), so the 
user could easily see what is going on.

This would also help when there is some other cause of a stuck executor (e.g. 
network issues), simply by disproving the skew theory. For example, you may 
have 100k records with the same key on each side of a join. That probably won't 
show up as extreme skew in task input data, but it becomes 10B join pairs that 
Spark works through in one task.

To further demonstrate the usefulness of this metric, please follow the steps 
below.

 

_val df1 = spark.range(0, 20).map \{ x => (x % 20, 20) }.toDF("b", "c")_

_val df2 = spark.range(0, 30).map \{ x => (77, 20) }.toDF("b", "c")_

 

_val df3 = spark.range(0, 20).map(x => (x + 1, x + 2)).toDF("b", "c")_

_val df4 = spark.range(0, 30).map(x => (77, x + 2)).toDF("b", "c")_

 

_val df5 = df1.union(df2)_

_val df6 = df3.union(df4)_

 

_df5.createOrReplaceTempView("table1")_

_df6.createOrReplaceTempView("table2")_
h3. InnerJoin

_sql("select p.*, f.* from table2 p join table1 f on f.b = p.b and f.c > 
p.c").count_

_number of output rows: 5,580,000_

_number of matched pairs: 90,000,490,000_
h3. FullOuterJoin

_spark.sql("select p.*, f.* from table2 p full outer join table1 f on f.b = p.b 
and f.c > p.c").count_

_number of output rows: 6,099,964_

_number of matched pairs: 90,000,490,000_
h3. LeftOuterJoin

_sql("select p.*, f.* from table2 p left outer join table1 f on f.b = p.b and 
f.c > p.c").count_

_number of output rows: 6,079,964_

_number of matched pairs: 90,000,490,000_
h3. RightOuterJoin

_spark.sql("select p.*, f.* from table2 p right outer join table1 f on f.b = 
p.b and f.c > p.c").count_

_number of output rows: 5,600,000_

_number of matched pairs: 90,000,490,000_
h3. LeftSemiJoin

_spark.sql("select * from table2 p left semi join table1 f on f.b = p.b and f.c 
> p.c").count_

_number of output rows: 36_

_number of matched pairs: 89,994,910,036_
h3. CrossJoin

_spark.sql("select p.*, f.* from table2 p cross join table1 f on f.b = p.b and 
f.c > p.c").count_

_number of output rows: 5,580,000_

_number of matched pairs: 90,000,490,000_
h3. LeftAntiJoin

_spark.sql("select * from table2 p anti join table1 f on f.b = p.b and f.c > 
p.c").count_

number of output rows: 499,964

number of matched pairs: 89,994,910,036
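Until such a metric exists, one rough way to estimate the key-matched pair count is to multiply the per-key row counts of the two sides of the equi-join. This is only a hedged sketch reusing df5 and df6 from the steps above; like the proposed metric, it ignores the non-equi bound condition (f.c > p.c), which is applied after the pairs are formed.

{code:java}
import org.apache.spark.sql.functions._

val leftCounts  = df5.groupBy("b").agg(count(lit(1)).as("left_cnt"))
val rightCounts = df6.groupBy("b").agg(count(lit(1)).as("right_cnt"))

// Sum over keys of (rows on the left) x (rows on the right) = matched pairs.
leftCounts.join(rightCounts, "b")
  .agg(sum(col("left_cnt") * col("right_cnt")).as("matched_pairs"))
  .show()
{code}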



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31377) Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite

2020-04-24 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31377:
--
Description: 
For some combinations of join algorithm and join types there are no unit tests 
for the "number of output rows" metric.

A list of missing unit tests include the following.
 * ShuffledHashJoin: leftOuter, RightOuter, LeftAnti, LeftSemi
 * BroadcastNestedLoopJoin: RightOuter
 * BroadcastHashJoin: LeftAnti

  was:
For some combinations of join algorithm and join types there are no unit tests 
for the "number of output rows" metric.

A list of missing unit tests include the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: leftOuter, RightOuter, LeftAnti, LeftSemi, ExistenseJoin
 * BroadcastNestedLoopJoin: RightOuter, InnerJoin, ExistenceJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin


> Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite
> --
>
> Key: SPARK-31377
> URL: https://issues.apache.org/jira/browse/SPARK-31377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Minor
>
> For some combinations of join algorithm and join types there are no unit 
> tests for the "number of output rows" metric.
> A list of missing unit tests include the following.
>  * ShuffledHashJoin: leftOuter, RightOuter, LeftAnti, LeftSemi
>  * BroadcastNestedLoopJoin: RightOuter
>  * BroadcastHashJoin: LeftAnti
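For illustration, a hedged sketch of how one of the missing combinations (ShuffledHashJoin + LeftSemi) could be exercised in a spark-shell: force the join strategy with the SHUFFLE_HASH hint, run the query, and read the operator's metric from the executed plan. The metric key "numOutputRows" and the direct plan traversal are assumptions, and the snippet presumes adaptive query execution is disabled so the join node is visible; the actual suite would use its own test helpers.

{code:java}
import org.apache.spark.sql.execution.joins.ShuffledHashJoinExec

val left  = spark.range(0, 100).toDF("id")
val right = spark.range(0, 50).toDF("id")

// The SHUFFLE_HASH hint steers the planner toward ShuffledHashJoinExec.
val joined = left.hint("SHUFFLE_HASH").join(right, Seq("id"), "left_semi")
joined.collect()

val numOutputRows = joined.queryExecution.executedPlan.collect {
  case j: ShuffledHashJoinExec => j.metrics("numOutputRows").value
}.sum
println(s"number of output rows: $numOutputRows")
{code}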



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-16 Thread Srinivas Rishindra Pothireddi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085138#comment-17085138
 ] 

Srinivas Rishindra Pothireddi edited comment on SPARK-31380 at 4/16/20, 6:10 PM:
-

I tested this with spark master at the time of creating this ticket. I am not 
seeing this issue with the current master and with spark-3.0.0-preview2. I 
guess the issue might have gone away recently. The issue definitely exists with 
spark-2.4.5. 

The history server might be flaky. That could be a reason why we are seeing 
this issue intermittently. For example when I tried to run my application to 
test it I am able to see the metrics in safari but not in chrome.  
!image-2020-04-16-11-04-59-036.png!

 

  !image-2020-04-16-11-08-27-137.png!

 


was (Author: sririshindra):
I tested this with spark master at the time of creating this ticket. I am not 
seeing this issue again now and spark-3.0.0-preview2. I guess the issue might 
have gone away recently. The issue definitely exists with spark-2.4.5. 

The history server might be flaky. That could be a reason why we are seeing 
this issue intermittently. For example when I tried to run my application to 
test it I am able to see the metrics in safari but not in chrome.  
!image-2020-04-16-11-04-59-036.png!

 

  !image-2020-04-16-11-08-27-137.png!

 

> Peak Execution Memory Quantile is not displayed in Spark History Server UI
> --
>
> Key: SPARK-31380
> URL: https://issues.apache.org/jira/browse/SPARK-31380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
> Attachments: image-2020-04-15-18-16-18-254.png, 
> image-2020-04-16-11-04-58-953.png, image-2020-04-16-11-04-59-036.png, 
> image-2020-04-16-11-06-31-277.png, image-2020-04-16-11-08-27-137.png
>
>
> Peak Execution Memory Quantile is displayed in the regular Spark UI 
> correctly. If the same application is viewed in Spark History Server UI, Peak 
> Execution Memory is always displayed as zero.
> The Spark event log for the application seems to contain Peak Execution 
> Memory (under the tag "internal.metrics.peakExecutionMemory") correctly. 
> However, this is not reflected in the History Server UI.
> *Steps to produce non-zero Peak Execution Memory*
> spark.range(0, 20).map\{x => (x , x % 20)}.toDF("a", 
> "b").createOrReplaceTempView("fred")
> spark.range(0, 20).map\{x => (x , x + 1)}.toDF("a", 
> "b").createOrReplaceTempView("phil")
> sql("select p.**,* f.* from phil p join fred f on f.b = p.b").count
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-16 Thread Srinivas Rishindra Pothireddi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085138#comment-17085138
 ] 

Srinivas Rishindra Pothireddi edited comment on SPARK-31380 at 4/16/20, 6:08 PM:
-

I tested this with spark master at the time of creating this ticket. I am not 
seeing this issue again now and spark-3.0.0-preview2. I guess the issue might 
have gone away recently. The issue definitely exists with spark-2.4.5. 

The history server might be flaky. That could be a reason why we are seeing 
this issue intermittently. For example when I tried to run my application to 
test it I am able to see the metrics in safari but not in chrome.  
!image-2020-04-16-11-04-59-036.png!

 

  !image-2020-04-16-11-08-27-137.png!

 


was (Author: sririshindra):
I tested this with spark master at the time of creating this ticket. I am not 
seeing this issue again now and spark-3.0.0-preview2. I guess the issue might 
have gone away recently. The issue definitely exists with spark-2.4.5. 

The history server might be flaky. That could be a reason why we are seeing 
this issue intermittently. For example when I tried to run my application to 
test it I am able to see the metrics in safari but not in chrome.  
!image-2020-04-16-11-04-59-036.png!

 

 

!image-2020-04-16-11-06-31-277.png!

> Peak Execution Memory Quantile is not displayed in Spark History Server UI
> --
>
> Key: SPARK-31380
> URL: https://issues.apache.org/jira/browse/SPARK-31380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
> Attachments: image-2020-04-15-18-16-18-254.png, 
> image-2020-04-16-11-04-58-953.png, image-2020-04-16-11-04-59-036.png, 
> image-2020-04-16-11-06-31-277.png, image-2020-04-16-11-08-27-137.png
>
>
> Peak Execution Memory Quantile is displayed in the regular Spark UI 
> correctly. If the same application is viewed in Spark History Server UI, Peak 
> Execution Memory is always displayed as zero.
> The Spark event log for the application seems to contain Peak Execution 
> Memory (under the tag "internal.metrics.peakExecutionMemory") correctly. 
> However, this is not reflected in the History Server UI.
> *Steps to produce non-zero Peak Execution Memory*
> spark.range(0, 20).map\{x => (x , x % 20)}.toDF("a", 
> "b").createOrReplaceTempView("fred")
> spark.range(0, 20).map\{x => (x , x + 1)}.toDF("a", 
> "b").createOrReplaceTempView("phil")
> sql("select p.**,* f.* from phil p join fred f on f.b = p.b").count
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-16 Thread Srinivas Rishindra Pothireddi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085138#comment-17085138
 ] 

Srinivas Rishindra Pothireddi edited comment on SPARK-31380 at 4/16/20, 6:06 PM:
-

I tested this with spark master at the time of creating this ticket. I am not 
seeing this issue again now and spark-3.0.0-preview2. I guess the issue might 
have gone away recently. The issue definitely exists with spark-2.4.5. 

The history server might be flaky. That could be a reason why we are seeing 
this issue intermittently. For example when I tried to run my application to 
test it I am able to see the metrics in safari but not in chrome.  
!image-2020-04-16-11-04-59-036.png!

 

 

!image-2020-04-16-11-06-31-277.png!


was (Author: sririshindra):
I tested this with spark master at the time of creating this ticket. I am not 
seeing this issue again now and spark-3.0.0-preview2. I guess the issue might 
have gone away recently. The issue definitely exists with spark-2.4.5. 

The history server might be flaky. That could be a reason why we are seeing 
this issue intermittently. For example when I tried to run my application to 
test it I am able to see the metrics in safari but not in chrome. 
!image-2020-04-16-11-04-59-036.png!

 

> Peak Execution Memory Quantile is not displayed in Spark History Server UI
> --
>
> Key: SPARK-31380
> URL: https://issues.apache.org/jira/browse/SPARK-31380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
> Attachments: image-2020-04-15-18-16-18-254.png, 
> image-2020-04-16-11-04-58-953.png, image-2020-04-16-11-04-59-036.png, 
> image-2020-04-16-11-06-31-277.png
>
>
> Peak Execution Memory Quantile is displayed in the regular Spark UI 
> correctly. If the same application is viewed in Spark History Server UI, Peak 
> Execution Memory is always displayed as zero.
> The Spark event log for the application seems to contain Peak Execution 
> Memory (under the tag "internal.metrics.peakExecutionMemory") correctly. 
> However, this is not reflected in the History Server UI.
> *Steps to produce non-zero Peak Execution Memory*
> spark.range(0, 20).map\{x => (x , x % 20)}.toDF("a", 
> "b").createOrReplaceTempView("fred")
> spark.range(0, 20).map\{x => (x , x + 1)}.toDF("a", 
> "b").createOrReplaceTempView("phil")
> sql("select p.**,* f.* from phil p join fred f on f.b = p.b").count
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-16 Thread Srinivas Rishindra Pothireddi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085138#comment-17085138
 ] 

Srinivas Rishindra Pothireddi edited comment on SPARK-31380 at 4/16/20, 6:05 PM:
-

I tested this with spark master at the time of creating this ticket. I am not 
seeing this issue again now and spark-3.0.0-preview2. I guess the issue might 
have gone away recently. The issue definitely exists with spark-2.4.5. 

The history server might be flaky. That could be a reason why we are seeing 
this issue intermittently. For example when I tried to run my application to 
test it I am able to see the metrics in safari but not in chrome. 
!image-2020-04-16-11-04-59-036.png!

 


was (Author: sririshindra):
I tested this with spark master at the time of creating this ticket. I am not 
seeing this issue again now and spark-3.0.0-preview2. I guess the issue might 
have gone away recently. The issue definitely exists with spark-2.4.5. 

The history server might be flaky. That could be a reason why we are seeing 
this issue intermittently. For example when I tried to run my application to 
test it I am able to see the metrics in safari but not in chrome.

!Screen Shot 2020-04-16 at 10.55.17 AM.png!

 

!Screen Shot 2020-04-16 at 10.57.14 AM.png!

> Peak Execution Memory Quantile is not displayed in Spark History Server UI
> --
>
> Key: SPARK-31380
> URL: https://issues.apache.org/jira/browse/SPARK-31380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
> Attachments: image-2020-04-15-18-16-18-254.png, 
> image-2020-04-16-11-04-58-953.png, image-2020-04-16-11-04-59-036.png
>
>
> Peak Execution Memory Quantile is displayed in the regular Spark UI 
> correctly. If the same application is viewed in Spark History Server UI, Peak 
> Execution Memory is always displayed as zero.
> The Spark event log for the application seems to contain Peak Execution 
> Memory (under the tag "internal.metrics.peakExecutionMemory") correctly. 
> However, this is not reflected in the History Server UI.
> *Steps to produce non-zero Peak Execution Memory*
> spark.range(0, 20).map\{x => (x , x % 20)}.toDF("a", 
> "b").createOrReplaceTempView("fred")
> spark.range(0, 20).map\{x => (x , x + 1)}.toDF("a", 
> "b").createOrReplaceTempView("phil")
> sql("select p.**,* f.* from phil p join fred f on f.b = p.b").count
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-16 Thread Srinivas Rishindra Pothireddi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085138#comment-17085138
 ] 

Srinivas Rishindra Pothireddi commented on SPARK-31380:
---

I tested this with spark master at the time of creating this ticket. I am not 
seeing this issue again now and spark-3.0.0-preview2. I guess the issue might 
have gone away recently. The issue definitely exists with spark-2.4.5. 

The history server might be flaky. That could be a reason why we are seeing 
this issue intermittently. For example when I tried to run my application to 
test it I am able to see the metrics in safari but not in chrome.

!Screen Shot 2020-04-16 at 10.55.17 AM.png!

 

!Screen Shot 2020-04-16 at 10.57.14 AM.png!

> Peak Execution Memory Quantile is not displayed in Spark History Server UI
> --
>
> Key: SPARK-31380
> URL: https://issues.apache.org/jira/browse/SPARK-31380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
> Attachments: image-2020-04-15-18-16-18-254.png
>
>
> Peak Execution Memory Quantile is displayed in the regular Spark UI 
> correctly. If the same application is viewed in Spark History Server UI, Peak 
> Execution Memory is always displayed as zero.
> The Spark event log for the application seems to contain Peak Execution 
> Memory (under the tag "internal.metrics.peakExecutionMemory") correctly. 
> However, this is not reflected in the History Server UI.
> *Steps to produce non-zero Peak Execution Memory*
> spark.range(0, 20).map\{x => (x , x % 20)}.toDF("a", 
> "b").createOrReplaceTempView("fred")
> spark.range(0, 20).map\{x => (x , x + 1)}.toDF("a", 
> "b").createOrReplaceTempView("phil")
> sql("select p.**,* f.* from phil p join fred f on f.b = p.b").count
>  
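As a cross-check that is independent of both UIs, a hedged sketch of reading the same value straight from task events with a listener registered before running the join above (it assumes the usual spark variable of a spark-shell session):

{code:java}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Print the per-task peak execution memory as tasks finish.
val peakMemoryListener = new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null && metrics.peakExecutionMemory > 0) {
      println(s"stage ${taskEnd.stageId}: peakExecutionMemory = ${metrics.peakExecutionMemory}")
    }
  }
}
spark.sparkContext.addSparkListener(peakMemoryListener)
{code}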



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31377) Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite

2020-04-10 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31377:
--
Description: 
For some combinations of join algorithm and join types there are no unit tests 
for the "number of output rows" metric.

A list of missing unit tests include the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: leftOuter, RightOuter, LeftAnti, LeftSemi, ExistenceJoin
 * BroadcastNestedLoopJoin: RightOuter, InnerJoin, ExistenceJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin

  was:
For some combinations of join algorithm and join types there are no unit tests 
for the "number of output rows" metric.

A list of missing unit tests include the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: leftOuter, RightOuter, LeftAnti, LeftSemi, ExistenseJoin
 * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin


> Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite
> --
>
> Key: SPARK-31377
> URL: https://issues.apache.org/jira/browse/SPARK-31377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Minor
>
> For some combinations of join algorithm and join types there are no unit 
> tests for the "number of output rows" metric.
> A list of missing unit tests include the following.
>  * SortMergeJoin: ExistenceJoin
>  * ShuffledHashJoin: leftOuter, RightOuter, LeftAnti, LeftSemi, ExistenceJoin
>  * BroadcastNestedLoopJoin: RightOuter, InnerJoin, ExistenceJoin
>  * BroadcastHashJoin: LeftAnti, ExistenceJoin



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31377) Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite

2020-04-10 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31377:
--
Description: 
For some combinations of join algorithm and join types there are no unit tests 
for the "number of output rows" metric.

A list of missing unit tests include the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: leftOuter, RightOuter, LeftAnti, LeftSemi, ExistenceJoin
 * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin

  was:
For some combinations of join algorithm and join types there are no unit tests 
for the "number of output rows" metric.

A list of missing unit tests include the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: OuterJoin, leftOuter, RightOuter, LeftAnti, LeftSemi, 
ExistenseJoin
 * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin


> Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite
> --
>
> Key: SPARK-31377
> URL: https://issues.apache.org/jira/browse/SPARK-31377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Minor
>
> For some combinations of join algorithm and join types there are no unit 
> tests for the "number of output rows" metric.
> A list of missing unit tests include the following.
>  * SortMergeJoin: ExistenceJoin
>  * ShuffledHashJoin: leftOuter, RightOuter, LeftAnti, LeftSemi, ExistenceJoin
>  * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
>  * BroadcastHashJoin: LeftAnti, ExistenceJoin






[jira] [Updated] (SPARK-31377) Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite

2020-04-09 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31377:
--
Description: 
For some combinations of join algorithm and join types, there are no unit tests 
for the "number of output rows" metric.

A list of missing unit tests includes the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: OuterJoin, LeftOuter, RightOuter, LeftAnti, LeftSemi, 
ExistenceJoin
 * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin

  was:
For some combinations of join algorithm and join types there are no unit tests 
for the "number of output rows" metric.

A list of missing unit tests include the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: OuterJoin, ReftOuter, RightOuter, LeftAnti, LeftSemi, 
ExistenseJoin
 * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin


> Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite
> --
>
> Key: SPARK-31377
> URL: https://issues.apache.org/jira/browse/SPARK-31377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Minor
>
> For some combinations of join algorithm and join types, there are no unit 
> tests for the "number of output rows" metric.
> A list of missing unit tests includes the following.
>  * SortMergeJoin: ExistenceJoin
>  * ShuffledHashJoin: OuterJoin, LeftOuter, RightOuter, LeftAnti, LeftSemi, 
> ExistenceJoin
>  * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
>  * BroadcastHashJoin: LeftAnti, ExistenceJoin
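
Several of the ShuffledHashJoin rows in the list above also need the planner to actually pick a shuffled hash join, which it rarely does with default settings. A minimal sketch of one way a test could force that, assuming the Spark 3.0+ SHUFFLE_HASH join strategy hint and a SparkSession named spark; the data, join type, and asserted count are illustrative only.

{code:scala}
import org.apache.spark.sql.execution.joins.ShuffledHashJoinExec

val left  = spark.createDataFrame(Seq((1, "a"), (2, "b"), (3, "c"))).toDF("k", "v")
val right = spark.createDataFrame(Seq((1, "x"), (3, "y"))).toDF("k", "w")

// The SHUFFLE_HASH hint asks the planner to build a hash map on the hinted
// (right) side rather than falling back to sort-merge join.
val joined = left.join(right.hint("shuffle_hash"), Seq("k"), "left_semi")
joined.collect()

val numOutputRows = joined.queryExecution.executedPlan.collect {
  case j: ShuffledHashJoinExec => j.metrics("numOutputRows").value
}.head
// Left rows with k = 1 and k = 3 have matches, so the semi join outputs 2 rows.
assert(numOutputRows === 2)
{code}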






[jira] [Updated] (SPARK-31389) Ensure all tests in SQLMetricsSuite run with both codegen on and off

2020-04-09 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31389:
--
Description: 
Many tests in SQLMetricsSuite run only with codegen turned off. Some complex 
code paths (for example, generated code in "SortMergeJoin metrics") aren't 
exercised at all. The generated code should be tested as well.

*List of tests that run with codegen off*

Filter metrics, SortMergeJoin metrics, SortMergeJoin(outer) metrics, 
BroadcastHashJoin metrics,  ShuffledHashJoin metrics, BroadcastHashJoin(outer) 
metrics, BroadcastNestedLoopJoin metrics, BroadcastLeftSemiJoinHash metrics, 
CartesianProduct metrics,  SortMergeJoin(left-anti) metrics

 

  was:
Many tests in SQLMetricsSuite run only with codegen turned off. Some complex 
code paths (for example, generated code in "SortMergeJoin metrics") aren't 
exercised at all. The generated code should be tested as well.

*List of tests that run with codegen off*

Filter metrics, SortMergeJoin metrics, SortMergeJoin(outer) metrics, 
BroadcastHashJoin metrics,  ShuffledHashJoin metrics, BroadcastHashJoin(outer) 
metrics, BroadcastNestedLoopJoin metrics, BroadcastLeftSemiJoinHash metrics

 


> Ensure all tests in SQLMetricsSuite run with both codegen on and off
> 
>
> Key: SPARK-31389
> URL: https://issues.apache.org/jira/browse/SPARK-31389
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Minor
>
> Many tests in SQLMetricsSuite run only with codegen turned off. Some complex 
> code paths (for example, generated code in "SortMergeJoin metrics") aren't 
> exercised at all. The generated code should be tested as well.
> *List of tests that run with codegen off*
> Filter metrics, SortMergeJoin metrics, SortMergeJoin(outer) metrics, 
> BroadcastHashJoin metrics,  ShuffledHashJoin metrics, 
> BroadcastHashJoin(outer) metrics, BroadcastNestedLoopJoin metrics, 
> BroadcastLeftSemiJoinHash metrics, CartesianProduct metrics,  
> SortMergeJoin(left-anti) metrics
>  
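
One low-churn way to cover both paths is to generate each metrics test twice, parameterized over the whole-stage codegen flag. A minimal sketch, assuming the suite mixes in SQLTestUtils (for withSQLConf); the test name and body are placeholders for the existing assertions.

{code:scala}
import org.apache.spark.sql.internal.SQLConf

// Register each metrics test twice: once with whole-stage codegen on, once off.
Seq(true, false).foreach { wholeStage =>
  test(s"SortMergeJoin metrics (wholeStage = $wholeStage)") {
    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> wholeStage.toString) {
      // ... existing "SortMergeJoin metrics" assertions go here unchanged ...
    }
  }
}
{code}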






[jira] [Commented] (SPARK-31389) Ensure all tests in SQLMetricsSuite run with both codegen on and off

2020-04-08 Thread Srinivas Rishindra Pothireddi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078803#comment-17078803
 ] 

Srinivas Rishindra Pothireddi commented on SPARK-31389:
---

I am working on this.

> Ensure all tests in SQLMetricsSuite run with both codegen on and off
> 
>
> Key: SPARK-31389
> URL: https://issues.apache.org/jira/browse/SPARK-31389
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Minor
>
> Many tests in SQLMetricsSuite run only with codegen turned off. Some complex 
> code paths (for example, generated code in "SortMergeJoin metrics") aren't 
> exercised at all. The generated code should be tested as well.
> *List of tests that run with codegen off*
> Filter metrics, SortMergeJoin metrics, SortMergeJoin(outer) metrics, 
> BroadcastHashJoin metrics,  ShuffledHashJoin metrics, 
> BroadcastHashJoin(outer) metrics, BroadcastNestedLoopJoin metrics, 
> BroadcastLeftSemiJoinHash metrics
>  






[jira] [Updated] (SPARK-31389) Ensure all tests in SQLMetricsSuite run with both codegen on and off

2020-04-08 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31389:
--
Description: 
Many tests in SQLMetricsSuite run only with codegen turned off. Some complex 
code paths (for example, generated code in "SortMergeJoin metrics") aren't 
exercised at all. The generated code should be tested as well.

*List of tests that run with codegen off*

Filter metrics, SortMergeJoin metrics, SortMergeJoin(outer) metrics, 
BroadcastHashJoin metrics,  ShuffledHashJoin metrics, BroadcastHashJoin(outer) 
metrics, BroadcastNestedLoopJoin metrics, BroadcastLeftSemiJoinHash metrics

 

  was:
Many tests in SQLMetricsSuite run only with codegen turned off. Some complex 
code paths (for example, generated code in "SortMergeJoin metrics") aren't 
exercised at all.

*List of tests that run with codegen off*

Filter metrics, SortMergeJoin metrics, SortMergeJoin(outer) metrics, 
BroadcastHashJoin metrics,  ShuffledHashJoin metrics, BroadcastHashJoin(outer) 
metrics, BroadcastNestedLoopJoin metrics, BroadcastLeftSemiJoinHash metrics

The generated code should be tested as well.


> Ensure all tests in SQLMetricsSuite run with both codegen on and off
> 
>
> Key: SPARK-31389
> URL: https://issues.apache.org/jira/browse/SPARK-31389
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Minor
>
> Many tests in SQLMetricsSuite run only with codegen turned off. Some complex 
> code paths (for example, generated code in "SortMergeJoin metrics") aren't 
> exercised at all. The generated code should be tested as well.
> *List of tests that run with codegen off*
> Filter metrics, SortMergeJoin metrics, SortMergeJoin(outer) metrics, 
> BroadcastHashJoin metrics,  ShuffledHashJoin metrics, 
> BroadcastHashJoin(outer) metrics, BroadcastNestedLoopJoin metrics, 
> BroadcastLeftSemiJoinHash metrics
>  






[jira] [Created] (SPARK-31389) Ensure all tests in SQLMetricsSuite run with both codegen on and off

2020-04-08 Thread Srinivas Rishindra Pothireddi (Jira)
Srinivas Rishindra Pothireddi created SPARK-31389:
-

 Summary: Ensure all tests in SQLMetricsSuite run with both codegen 
on and off
 Key: SPARK-31389
 URL: https://issues.apache.org/jira/browse/SPARK-31389
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 3.1.0
Reporter: Srinivas Rishindra Pothireddi


Many tests in SQLMetricsSuite run only with codegen turned off. Some complex 
code paths (for example, generated code in "SortMergeJoin metrics") aren't 
exercised at all.

*List of tests that run with codegen off*

Filter metrics, SortMergeJoin metrics, SortMergeJoin(outer) metrics, 
BroadcastHashJoin metrics,  ShuffledHashJoin metrics, BroadcastHashJoin(outer) 
metrics, BroadcastNestedLoopJoin metrics, BroadcastLeftSemiJoinHash metrics

The generated code should be tested as well.






[jira] [Commented] (SPARK-31377) Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite

2020-04-08 Thread Srinivas Rishindra Pothireddi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078524#comment-17078524
 ] 

Srinivas Rishindra Pothireddi commented on SPARK-31377:
---

I am working on this

> Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite
> --
>
> Key: SPARK-31377
> URL: https://issues.apache.org/jira/browse/SPARK-31377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Minor
>
> For some combinations of join algorithm and join types, there are no unit 
> tests for the "number of output rows" metric.
> A list of missing unit tests includes the following.
>  * SortMergeJoin: ExistenceJoin
>  * ShuffledHashJoin: OuterJoin, LeftOuter, RightOuter, LeftAnti, LeftSemi, 
> ExistenceJoin
>  * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
>  * BroadcastHashJoin: LeftAnti, ExistenceJoin






[jira] [Updated] (SPARK-31377) Add missing unit tests for "number of output rows" metric for joins in SQLMetricsSuite

2020-04-07 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31377:
--
Description: 
For some combinations of join algorithm and join types, there are no unit tests 
for the "number of output rows" metric.

A list of missing unit tests includes the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: OuterJoin, LeftOuter, RightOuter, LeftAnti, LeftSemi, 
ExistenceJoin
 * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin

  was:
For some combinations of join algorithm and join types there are no unit tests 
for the "number of output rows" metric.

For Inner join in SortMergeJoin there is code in unit tests with code 
generation enabled. There are no tests for the "number of output rows" metric 
with code generation disabled.

A list of missing unit tests include the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: OuterJoin, ReftOuter, RightOuter, LeftAnti, LeftSemi, 
ExistenseJoin
 * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin
 * For inner join in SortMergeJoin with code generation turned off.


> Add missing unit tests for "number of output rows" metric for joins in 
> SQLMetricsSuite
> --
>
> Key: SPARK-31377
> URL: https://issues.apache.org/jira/browse/SPARK-31377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> For some combinations of join algorithm and join types, there are no unit 
> tests for the "number of output rows" metric.
> A list of missing unit tests includes the following.
>  * SortMergeJoin: ExistenceJoin
>  * ShuffledHashJoin: OuterJoin, LeftOuter, RightOuter, LeftAnti, LeftSemi, 
> ExistenceJoin
>  * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
>  * BroadcastHashJoin: LeftAnti, ExistenceJoin






[jira] [Updated] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-07 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31380:
--
Description: 
Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

The Spark event log for the application seems to contain Peak Execution Memory 
(under the tag "internal.metrics.peakExecutionMemory") correctly. However, this 
is not reflected in the History Server UI.

*Steps to produce non-zero Peak Execution Memory*

spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.*, f.* from phil p join fred f on f.b = p.b").count

 

  was:
Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

Spark event log for the application seem to contain Peak Execution Memory(under 
the tag "internal.metrics.peakExecutionMemory") correctly.  However this is not 
reflected in the History Server UI.

*Steps to produce non-zero peakExecutionMemory*

spark.range(0, 20).map\{x => (x , x % 20)}.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map\{x => (x , x + 1)}.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.**,* f.* from phil p join fred f on f.b = p.b").count

 


> Peak Execution Memory Quantile is not displayed in Spark History Server UI
> --
>
> Key: SPARK-31380
> URL: https://issues.apache.org/jira/browse/SPARK-31380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> Peak Execution Memory Quantile is displayed in the regular Spark UI 
> correctly. If the same application is viewed in Spark History Server UI, Peak 
> Execution Memory is always displayed as zero.
> The Spark event log for the application seems to contain Peak Execution 
> Memory (under the tag "internal.metrics.peakExecutionMemory") correctly. 
> However, this is not reflected in the History Server UI.
> *Steps to produce non-zero Peak Execution Memory*
> spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
> "b").createOrReplaceTempView("fred")
> spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
> "b").createOrReplaceTempView("phil")
> sql("select p.*, f.* from phil p join fred f on f.b = p.b").count
>  
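
The Jira wiki markup garbles the asterisks in the select list above; judging from the other revisions of this description, the intended spark-shell reproduction is the snippet below (row counts kept exactly as reported). Run it, note the stage's Peak Execution Memory quantile in the live UI, then open the same application in the History Server and compare.

{code:scala}
// Paste into spark-shell; toDF relies on the shell's pre-imported spark.implicits._.
spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", "b").createOrReplaceTempView("fred")
spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", "b").createOrReplaceTempView("phil")

// The shuffle join is what populates internal.metrics.peakExecutionMemory.
spark.sql("select p.*, f.* from phil p join fred f on f.b = p.b").count
{code}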






[jira] [Updated] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-07 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31380:
--
Description: 
Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

The Spark event log for the application seems to contain Peak Execution Memory 
(under the tag "internal.metrics.peakExecutionMemory") correctly. However, this 
is not reflected in the History Server UI.

*Steps to produce non-zero peakExecutionMemory*

spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.*, f.* from phil p join fred f on f.b = p.b").count

 

  was:
Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

Spark event log for the application seem to contain Peak Execution Memory(under 
the tag "internal.metrics.peakExecutionMemory") correctly.  However this is not 
reflected in the History Server UI.

*Steps to produce non-zero peakExecutionMemory*

spark.range(0, 20).map\{x => (x , x % 20)}.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map\{x => (x , x + 1)}.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.**,* f*.** from phil p join fred f on f.b = p.b").count

 


> Peak Execution Memory Quantile is not displayed in Spark History Server UI
> --
>
> Key: SPARK-31380
> URL: https://issues.apache.org/jira/browse/SPARK-31380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> Peak Execution Memory Quantile is displayed in the regular Spark UI 
> correctly. If the same application is viewed in Spark History Server UI, Peak 
> Execution Memory is always displayed as zero.
> The Spark event log for the application seems to contain Peak Execution 
> Memory (under the tag "internal.metrics.peakExecutionMemory") correctly. 
> However, this is not reflected in the History Server UI.
> *Steps to produce non-zero peakExecutionMemory*
> spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
> "b").createOrReplaceTempView("fred")
> spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
> "b").createOrReplaceTempView("phil")
> sql("select p.*, f.* from phil p join fred f on f.b = p.b").count
>  






[jira] [Updated] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-07 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31380:
--
Description: 
Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

The Spark event log for the application seems to contain Peak Execution Memory 
(under the tag "internal.metrics.peakExecutionMemory") correctly. However, this 
is not reflected in the history server UI.

*Steps to produce non-zero peakExecutionMemory*

spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.*, f.* from phil p join fred f on f.b = p.b").count

 

  was:
Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

Spark event log for the application seem to contain Peak Execution Memory(under 
the tag "internal.metrics.peakExecutionMemory") correctly.  However this is not 
reflected in the history server UI.

*Steps to produce non-zero peakExecutionMemory*

spark.range(0, 20).map\{x => (x , x % 20)}.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map\{x => (x , x + 1)}.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.**, f.** from phil p join fred f on f.b = p.b").count

 


> Peak Execution Memory Quantile is not displayed in Spark History Server UI
> --
>
> Key: SPARK-31380
> URL: https://issues.apache.org/jira/browse/SPARK-31380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> Peak Execution Memory Quantile is displayed in the regular Spark UI 
> correctly. If the same application is viewed in Spark History Server UI, Peak 
> Execution Memory is always displayed as zero.
> The Spark event log for the application seems to contain Peak Execution 
> Memory (under the tag "internal.metrics.peakExecutionMemory") correctly. 
> However, this is not reflected in the history server UI.
> *Steps to produce non-zero peakExecutionMemory*
> spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
> "b").createOrReplaceTempView("fred")
> spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
> "b").createOrReplaceTempView("phil")
> sql("select p.*, f.* from phil p join fred f on f.b = p.b").count
>  






[jira] [Updated] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-07 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31380:
--
Description: 
Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

The Spark event log for the application seems to contain Peak Execution Memory 
(under the tag "internal.metrics.peakExecutionMemory") correctly. However, this 
is not reflected in the History Server UI.

*Steps to produce non-zero peakExecutionMemory*

spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.*, f.* from phil p join fred f on f.b = p.b").count

 

  was:
Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

Spark event log for the application seem to contain Peak Execution Memory(under 
the tag "internal.metrics.peakExecutionMemory") correctly.  However this is not 
reflected in the history server UI.

*Steps to produce non-zero peakExecutionMemory*

spark.range(0, 20).map\{x => (x , x % 20)}.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map\{x => (x , x + 1)}.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.**,* f*.** from phil p join fred f on f.b = p.b").count

 


> Peak Execution Memory Quantile is not displayed in Spark History Server UI
> --
>
> Key: SPARK-31380
> URL: https://issues.apache.org/jira/browse/SPARK-31380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> Peak Execution Memory Quantile is displayed in the regular Spark UI 
> correctly. If the same application is viewed in Spark History Server UI, Peak 
> Execution Memory is always displayed as zero.
> The Spark event log for the application seems to contain Peak Execution 
> Memory (under the tag "internal.metrics.peakExecutionMemory") correctly. 
> However, this is not reflected in the History Server UI.
> *Steps to produce non-zero peakExecutionMemory*
> spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
> "b").createOrReplaceTempView("fred")
> spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
> "b").createOrReplaceTempView("phil")
> sql("select p.*, f.* from phil p join fred f on f.b = p.b").count
>  






[jira] [Updated] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-07 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31380:
--
Description: 
Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

The Spark event log for the application seems to contain Peak Execution Memory 
(under the tag "internal.metrics.peakExecutionMemory") correctly. However, this 
is not reflected in the history server UI.

*Steps to produce non-zero peakExecutionMemory*

spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.*, f.* from phil p join fred f on f.b = p.b").count

 

  was:
Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

Spark event log for the application seem to contain Peak Execution Memory(under 
the tag "internal.metrics.peakExecutionMemory") correctly.  However this is not 
reflected in the history server UI.

*Steps to produce non-zero peakExecutionMemory*

spark.range(0, 20).map\{x => (x , x % 20)}.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map\{x => (x , x + 1)}.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.*, f.* from phil p join fred f on f.b = p.b").count

 


> Peak Execution Memory Quantile is not displayed in Spark History Server UI
> --
>
> Key: SPARK-31380
> URL: https://issues.apache.org/jira/browse/SPARK-31380
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> Peak Execution Memory Quantile is displayed in the regular Spark UI 
> correctly. If the same application is viewed in Spark History Server UI, Peak 
> Execution Memory is always displayed as zero.
> The Spark event log for the application seems to contain Peak Execution 
> Memory (under the tag "internal.metrics.peakExecutionMemory") correctly. 
> However, this is not reflected in the history server UI.
> *Steps to produce non-zero peakExecutionMemory*
> spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
> "b").createOrReplaceTempView("fred")
> spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
> "b").createOrReplaceTempView("phil")
> sql("select p.*, f.* from phil p join fred f on f.b = p.b").count
>  






[jira] [Created] (SPARK-31380) Peak Execution Memory Quantile is not displayed in Spark History Server UI

2020-04-07 Thread Srinivas Rishindra Pothireddi (Jira)
Srinivas Rishindra Pothireddi created SPARK-31380:
-

 Summary: Peak Execution Memory Quantile is not displayed in Spark 
History Server UI
 Key: SPARK-31380
 URL: https://issues.apache.org/jira/browse/SPARK-31380
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Web UI
Affects Versions: 3.0.0
Reporter: Srinivas Rishindra Pothireddi


Peak Execution Memory Quantile is displayed in the regular Spark UI correctly. 
If the same application is viewed in Spark History Server UI, Peak Execution 
Memory is always displayed as zero.

The Spark event log for the application seems to contain Peak Execution Memory 
(under the tag "internal.metrics.peakExecutionMemory") correctly. However, this 
is not reflected in the history server UI.

*Steps to produce non-zero peakExecutionMemory*

spark.range(0, 20).map { x => (x, x % 20) }.toDF("a", 
"b").createOrReplaceTempView("fred")

spark.range(0, 20).map { x => (x, x + 1) }.toDF("a", 
"b").createOrReplaceTempView("phil")

sql("select p.*, f.* from phil p join fred f on f.b = p.b").count

 






[jira] [Updated] (SPARK-31377) Add missing unit tests for "number of output rows" metric for joins in SQLMetricsSuite

2020-04-07 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-31377:
--
Description: 
For some combinations of join algorithm and join types, there are no unit tests 
for the "number of output rows" metric.

For the inner join in SortMergeJoin, the existing unit tests run only with code 
generation enabled; there are no tests for the "number of output rows" metric 
with code generation disabled.

A list of missing unit tests includes the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: OuterJoin, LeftOuter, RightOuter, LeftAnti, LeftSemi, 
ExistenceJoin
 * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin
 * For inner join in SortMergeJoin with code generation turned off.

  was:
For some combinations of join algorithm and join types there are no unit tests 
for the "number of output rows" metric.

For Inner join and BroadcastHashJoin there is code in unit tests with code 
generation enabled. There are no tests for the "number of output rows" metric 
with code generation disabled.

A list of missing unit tests include the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: OuterJoin, ReftOuter, RightOuter, LeftAnti, LeftSemi, 
ExistenseJoin
 * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
 * BroadcastHashJoin: LeftAnti, ExistenceJoin
 * All the join types for SortMergeJoin and BroadcastHashJoin with code 
generation turned off.


> Add missing unit tests for "number of output rows" metric for joins in 
> SQLMetricsSuite
> --
>
> Key: SPARK-31377
> URL: https://issues.apache.org/jira/browse/SPARK-31377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> For some combinations of join algorithm and join types, there are no unit 
> tests for the "number of output rows" metric.
> For the inner join in SortMergeJoin, the existing unit tests run only with code 
> generation enabled; there are no tests for the "number of output rows" metric 
> with code generation disabled.
> A list of missing unit tests includes the following.
>  * SortMergeJoin: ExistenceJoin
>  * ShuffledHashJoin: OuterJoin, LeftOuter, RightOuter, LeftAnti, LeftSemi, 
> ExistenceJoin
>  * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin
>  * BroadcastHashJoin: LeftAnti, ExistenceJoin
>  * For inner join in SortMergeJoin with code generation turned off.






[jira] [Created] (SPARK-31377) Add missing unit tests for "number of output rows" metric for joins in SQLMetricsSuite

2020-04-07 Thread Srinivas Rishindra Pothireddi (Jira)
Srinivas Rishindra Pothireddi created SPARK-31377:
-

 Summary: Add missing unit tests for "number of output rows" metric 
for joins in SQLMetricsSuite
 Key: SPARK-31377
 URL: https://issues.apache.org/jira/browse/SPARK-31377
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Tests
Affects Versions: 3.0.0
Reporter: Srinivas Rishindra Pothireddi


For some combinations of join algorithm and join types, there are no unit tests 
for the "number of output rows" metric.

For the inner join and BroadcastHashJoin cases, the existing unit tests run only 
with code generation enabled; there are no tests for the "number of output rows" 
metric with code generation disabled.

A list of missing unit tests includes the following.
 * SortMergeJoin: ExistenceJoin
 * ShuffledHashJoin: OuterJoin, LeftOuter, RightOuter, LeftAnti, LeftSemi, 
ExistenceJoin
 * BroadcastNestedLoopJoin: RightOuter, ExistenceJoin, InnerJoin (see the sketch 
below)
 * BroadcastHashJoin: LeftAnti, ExistenceJoin
 * All the join types for SortMergeJoin and BroadcastHashJoin with code 
generation turned off.
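
A minimal sketch of how the BroadcastNestedLoopJoin RightOuter case could be exercised, assuming a SparkSession named spark; the data and the asserted count are illustrative, not the final test code. A non-equi join condition with a small (broadcastable) non-outer side is what makes the planner choose BroadcastNestedLoopJoin.

{code:scala}
import org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec

val left  = spark.createDataFrame(Seq((1, 10), (2, 20))).toDF("a", "b")
val right = spark.createDataFrame(Seq((5, 1), (50, 2))).toDF("c", "d")

// No equi-join keys, so with the left side small enough to broadcast the planner
// picks BroadcastNestedLoopJoin (build left) for a right outer join.
val joined = left.join(right, left("b") < right("c"), "right_outer")
joined.collect()

val numOutputRows = joined.queryExecution.executedPlan.collect {
  case j: BroadcastNestedLoopJoinExec => j.metrics("numOutputRows").value
}.head
// c = 5 has no match (1 null-extended row); c = 50 matches both left rows (2 rows).
assert(numOutputRows === 3)
{code}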






[jira] [Commented] (SPARK-17538) sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0

2016-09-15 Thread Srinivas Rishindra Pothireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493075#comment-15493075
 ] 

Srinivas Rishindra Pothireddi commented on SPARK-17538:
---

Hi [~srowen],

I updated the description of the Jira as per your suggestion. Can you please 
let me know if you need any more information.

> sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0
> -
>
> Key: SPARK-17538
> URL: https://issues.apache.org/jira/browse/SPARK-17538
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: os - linux
> cluster -> yarn and local
>Reporter: Srinivas Rishindra Pothireddi
>
> I have a production job in spark 1.6.2 that registers several dataframes as 
> tables. 
> After testing the job in spark 2.0.0, I found that one of the dataframes is 
> not getting registered as a table.
> Line 353 of my code --> self.sqlContext.registerDataFrameAsTable(anonymousDF, 
> "anonymousTable")
> line 354 of my code --> df = self.sqlContext.sql("select AnonymousFiled1, 
> AnonymousUDF( AnonymousFiled1 ) as AnonymousFiled3 from anonymousTable")
> my stacktrace
>  File "anonymousFile.py", line 354, in anonymousMethod
> df = self.sqlContext.sql("select AnonymousFiled1, AnonymousUDF( 
> AnonymousFiled1 ) as AnonymousFiled3 from anonymousTable")
>   File 
> "/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/context.py",
>  line 350, in sql
> return self.sparkSession.sql(sqlQuery)
>   File 
> "/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/session.py",
>  line 541, in sql
> return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File 
> "/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
>  line 933, in __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File 
> "/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py",
>  line 69, in deco
> raise AnalysisException(s.split(': ', 1)[1], stackTrace)
> AnalysisException: u'Table or view not found: anonymousTable; line 1 pos 61'
> The same code is working perfectly fine in spark-1.6.2 
>  






[jira] [Updated] (SPARK-17538) sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0

2016-09-15 Thread Srinivas Rishindra Pothireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-17538:
--
Description: 
I have a production job in spark 1.6.2 that registers several dataframes as 
tables. 
After testing the job in spark 2.0.0, I found that one of the dataframes is not 
getting registered as a table.


Line 353 of my code --> self.sqlContext.registerDataFrameAsTable(anonymousDF, 
"anonymousTable")
line 354 of my code --> df = self.sqlContext.sql("select AnonymousFiled1, 
AnonymousUDF( AnonymousFiled1 ) as AnonymousFiled3 from anonymousTable")

my stacktrace

 File "anonymousFile.py", line 354, in anonymousMethod
df = self.sqlContext.sql("select AnonymousFiled1, AnonymousUDF( 
AnonymousFiled1 ) as AnonymousFiled3 from anonymousTable")
  File 
"/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/context.py",
 line 350, in sql
return self.sparkSession.sql(sqlQuery)
  File 
"/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/session.py",
 line 541, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File 
"/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
 line 933, in __call__
answer, self.gateway_client, self.target_id, self.name)
  File 
"/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py",
 line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: u'Table or view not found: anonymousTable; line 1 pos 61'


The same code is working perfectly fine in spark-1.6.2 

 

  was:
I have a production job in spark 1.6.2 that registers four dataframes as 
tables. After testing the job in spark 2.0.0 one of the dataframes is not 
getting registered as a table.

output of sqlContext.tableNames() just after registering the fourth dataframe 
in spark 1.6.2 is

temp1,temp2,temp3,temp4

output of sqlContext.tableNames() just after registering the fourth dataframe 
in spark 2.0.0 is
temp1,temp2,temp3

so when the table 'temp4' is used by the job at a later stage an 
AnalysisException is raised in spark 2.0.0

There are no changes in the code whatsoever. 


 

 


> sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0
> -
>
> Key: SPARK-17538
> URL: https://issues.apache.org/jira/browse/SPARK-17538
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: os - linux
> cluster -> yarn and local
>Reporter: Srinivas Rishindra Pothireddi
>
> I have a production job in spark 1.6.2 that registers several dataframes as 
> tables. 
> After testing the job in spark 2.0.0, I found that one of the dataframes is 
> not getting registered as a table.
> Line 353 of my code --> self.sqlContext.registerDataFrameAsTable(anonymousDF, 
> "anonymousTable")
> line 354 of my code --> df = self.sqlContext.sql("select AnonymousFiled1, 
> AnonymousUDF( AnonymousFiled1 ) as AnonymousFiled3 from anonymousTable")
> my stacktrace
>  File "anonymousFile.py", line 354, in anonymousMethod
> df = self.sqlContext.sql("select AnonymousFiled1, AnonymousUDF( 
> AnonymousFiled1 ) as AnonymousFiled3 from anonymousTable")
>   File 
> "/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/context.py",
>  line 350, in sql
> return self.sparkSession.sql(sqlQuery)
>   File 
> "/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/session.py",
>  line 541, in sql
> return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>   File 
> "/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py",
>  line 933, in __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File 
> "/home/anonymousUser/Downloads/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py",
>  line 69, in deco
> raise AnalysisException(s.split(': ', 1)[1], stackTrace)
> AnalysisException: u'Table or view not found: anonymousTable; line 1 pos 61'
> The same code is working perfectly fine in spark-1.6.2 
>  






[jira] [Commented] (SPARK-17538) sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0

2016-09-15 Thread Srinivas Rishindra Pothireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492937#comment-15492937
 ] 

Srinivas Rishindra Pothireddi commented on SPARK-17538:
---

Hi [~srowen], I will fix this as soon as possible as you suggested.


> sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0
> -
>
> Key: SPARK-17538
> URL: https://issues.apache.org/jira/browse/SPARK-17538
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: os - linux
> cluster -> yarn and local
>Reporter: Srinivas Rishindra Pothireddi
>
> I have a production job in spark 1.6.2 that registers four dataframes as 
> tables. After testing the job in spark 2.0.0 one of the dataframes is not 
> getting registered as a table.
> output of sqlContext.tableNames() just after registering the fourth dataframe 
> in spark 1.6.2 is
> temp1,temp2,temp3,temp4
> output of sqlContext.tableNames() just after registering the fourth dataframe 
> in spark 2.0.0 is
> temp1,temp2,temp3
> so when the table 'temp4' is used by the job at a later stage an 
> AnalysisException is raised in spark 2.0.0
> There are no changes in the code whatsoever. 
>  
>  






[jira] [Updated] (SPARK-17538) sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0

2016-09-15 Thread Srinivas Rishindra Pothireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-17538:
--
Labels: pyspark  (was: )

> sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0
> -
>
> Key: SPARK-17538
> URL: https://issues.apache.org/jira/browse/SPARK-17538
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1, 2.1.0
> Environment: os - linux
> cluster -> yarn and local
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Critical
>  Labels: pyspark
> Fix For: 2.0.1, 2.1.0
>
>
> I have a production job in spark 1.6.2 that registers four dataframes as 
> tables. After testing the job in spark 2.0.0 one of the dataframes is not 
> getting registered as a table.
> output of sqlContext.tableNames() just after registering the fourth dataframe 
> in spark 1.6.2 is
> temp1,temp2,temp3,temp4
> output of sqlContext.tableNames() just after registering the fourth dataframe 
> in spark 2.0.0 is
> temp1,temp2,temp3
> so when the table 'temp4' is used by the job at a later stage an 
> AnalysisException is raised in spark 2.0.0
> There are no changes in the code whatsoever. 
>  
>  






[jira] [Updated] (SPARK-17538) sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0

2016-09-14 Thread Srinivas Rishindra Pothireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-17538:
--
Description: 
I have a production job in spark 1.6.2 that registers four dataframes as 
tables. After testing the job in spark 2.0.0 one of the dataframes is not 
getting registered as a table.

output of sqlContext.tableNames() just after registering the fourth dataframe 
in spark 1.6.2 is

temp1,temp2,temp3,temp4

output of sqlContext.tableNames() just after registering the fourth dataframe 
in spark 2.0.0 is
temp1,temp2,temp3

so when the table 'temp4' is used by the job at a later stage an 
AnalysisException is raised in spark 2.0.0

There are no changes in the code whatsoever. 


 

 

  was:
I have a production job in spark 1.6.2 that registers four dataframes as 
tables. After testing the job in spark 2.0.0 one of the dataframes is not 
getting registered as a table.

output of sqlContext.tableNames() just after registering the fourth dataframe 
in spark 1.6.2 is

temp1,temp2,temp3,temp4

output of sqlContext.tableNames() just after registering the fourth dataframe 
in spark 2.0.0 is
temp1,temp2,temp3

so when the table temp4 is used by the job at a later stage an 
AnalysisException is raised in spark 2.0.0

There are no changes in the code whatsoever. 


 

 


> sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0
> -
>
> Key: SPARK-17538
> URL: https://issues.apache.org/jira/browse/SPARK-17538
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1, 2.1.0
> Environment: os - linux
> cluster -> yarn and local
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Critical
> Fix For: 2.0.1, 2.1.0
>
>
> I have a production job in spark 1.6.2 that registers four dataframes as 
> tables. After testing the job in spark 2.0.0 one of the dataframes is not 
> getting registered as a table.
> output of sqlContext.tableNames() just after registering the fourth dataframe 
> in spark 1.6.2 is
> temp1,temp2,temp3,temp4
> output of sqlContext.tableNames() just after registering the fourth dataframe 
> in spark 2.0.0 is
> temp1,temp2,temp3
> so when the table 'temp4' is used by the job at a later stage an 
> AnalysisException is raised in spark 2.0.0
> There are no changes in the code whatsoever. 
>  
>  






[jira] [Updated] (SPARK-17538) sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0

2016-09-14 Thread Srinivas Rishindra Pothireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-17538:
--
Summary: sqlContext.registerDataFrameAsTable is not working sometimes in 
pyspark 2.0.0  (was: sqlContext.registerDataFrameAsTable is not working 
sometimes in spark 2.0)

> sqlContext.registerDataFrameAsTable is not working sometimes in pyspark 2.0.0
> -
>
> Key: SPARK-17538
> URL: https://issues.apache.org/jira/browse/SPARK-17538
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1, 2.1.0
> Environment: os - linux
> cluster -> yarn and local
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Critical
> Fix For: 2.0.1, 2.1.0
>
>
> I have a production job in spark 1.6.2 that registers four dataframes as 
> tables. After testing the job in spark 2.0.0 one of the dataframes is not 
> getting registered as a table.
> output of sqlContext.tableNames() just after registering the fourth dataframe 
> in spark 1.6.2 is
> temp1,temp2,temp3,temp4
> output of sqlContext.tableNames() just after registering the fourth dataframe 
> in spark 2.0.0 is
> temp1,temp2,temp3
> so when the table temp4 is used by the job at a later stage an 
> AnalysisException is raised in spark 2.0.0
> There are no changes in the code whatsoever. 
>  
>  






[jira] [Created] (SPARK-17538) sqlContext.registerDataFrameAsTable is not working sometimes in spark 2.0

2016-09-14 Thread Srinivas Rishindra Pothireddi (JIRA)
Srinivas Rishindra Pothireddi created SPARK-17538:
-

 Summary: sqlContext.registerDataFrameAsTable is not working 
sometimes in spark 2.0
 Key: SPARK-17538
 URL: https://issues.apache.org/jira/browse/SPARK-17538
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0, 2.0.1, 2.1.0
 Environment: os - linux
cluster -> yarn and local

Reporter: Srinivas Rishindra Pothireddi
Priority: Critical
 Fix For: 2.0.1, 2.1.0


I have a production job in spark 1.6.2 that registers four dataframes as 
tables. After testing the job in spark 2.0.0 one of the dataframes is not 
getting registered as a table.

output of sqlContext.tableNames() just after registering the fourth dataframe 
in spark 1.6.2 is

temp1,temp2,temp3,temp4

output of sqlContext.tableNames() just after registering the fourth dataframe 
in spark 2.0.0 is
temp1,temp2,temp3

so when the table temp4 is used by the job at a later stage an 
AnalysisException is raised in spark 2.0.0

There are no changes in the code whatsoever. 


 

 


