[jira] [Commented] (SPARK-23121) When the Spark Streaming app is running for a period of time, the page is incorrectly reported when accessing '/ jobs /' or '/ jobs / job /? Id = 13' and ui can not be
[ https://issues.apache.org/jira/browse/SPARK-23121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332300#comment-16332300 ] Sandor Murakozi commented on SPARK-23121: - One issue is with displaying old jobs. Depending on how old a job is, it may or may not be displayed correctly. The bigger issue is that the main jobs page can also be affected. > When the Spark Streaming app is running for a period of time, the page is > incorrectly reported when accessing '/ jobs /' or '/ jobs / job /? Id = 13' > and ui can not be accessed. > - > > Key: SPARK-23121 > URL: https://issues.apache.org/jira/browse/SPARK-23121 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.0 >Reporter: guoxiaolongzte >Priority: Major > Attachments: 1.png, 2.png > > > When the Spark Streaming app is running for a period of time, the page is > incorrectly reported when accessing '/ jobs /' or '/ jobs / job /? Id = 13' > and ui can not be accessed. > > Test command: > ./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount > ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark > > The app is running for a period of time, ui can not be accessed, please see > attachment. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23121) When the Spark Streaming app is running for a period of time, the page is incorrectly reported when accessing '/ jobs /' or '/ jobs / job /? Id = 13' and ui can not be
[ https://issues.apache.org/jira/browse/SPARK-23121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332249#comment-16332249 ] Sandor Murakozi commented on SPARK-23121: - [~guoxiaolongzte] found two separate problems, both triggered by having a high number of jobs/stages. In such a situation the store of the history server drops various objects to save memory. It may happen that the job itself is in the store, but its stages or the RDDOperationGraph are not. In such cases rendering of the All Jobs page and the individual job pages fails. As a consequence, the jobs page may become inaccessible if the cluster processes many jobs, so I think the priority of this issue should be increased. What do you think [~srowen] ?
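The failure mode described above (the job is still in the store, but its stages or RDDOperationGraph were evicted) suggests the renderer must treat every store lookup as optional. A minimal sketch of that idea, with hypothetical names (`Store`, `renderStage`) that are not the actual Spark UI API:

```scala
// Hypothetical sketch, NOT the actual Spark UI code: when the status
// store has evicted a stage to save memory, look it up as an Option and
// render a placeholder instead of throwing, so the page itself still loads.
case class StageData(id: Int, name: String)

class Store(stages: Map[Int, StageData]) {
  // Old entries may have been dropped to bound memory use.
  def stage(id: Int): Option[StageData] = stages.get(id)
}

def renderStage(store: Store, id: Int): String =
  store.stage(id) match {
    case Some(s) => s"<li>Stage ${s.id}: ${s.name}</li>"
    case None    => s"<li>Stage $id: (no longer in store)</li>" // degrade gracefully
  }

val store = new Store(Map(2 -> StageData(2, "count at HdfsWordCount")))
println(renderStage(store, 2))
println(renderStage(store, 1)) // evicted stage: placeholder, no exception
```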
[jira] [Commented] (SPARK-22884) ML test for StructuredStreaming: spark.ml.clustering
[ https://issues.apache.org/jira/browse/SPARK-22884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328898#comment-16328898 ] Sandor Murakozi commented on SPARK-22884: - Is there anybody working on this? If not, I'm happy to pick it up. > ML test for StructuredStreaming: spark.ml.clustering > > > Key: SPARK-22884 > URL: https://issues.apache.org/jira/browse/SPARK-22884 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 2.3.0 >Reporter: Joseph K. Bradley >Priority: Major > > Task for adding Structured Streaming tests for all Models/Transformers in a > sub-module in spark.ml > For an example, see LinearRegressionSuite.scala in > https://github.com/apache/spark/pull/19843
[jira] [Updated] (SPARK-23051) job description in Spark UI is broken
[ https://issues.apache.org/jira/browse/SPARK-23051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandor Murakozi updated SPARK-23051: Attachment: in-2.2.png Spark-23051-after.png Spark-23051-before.png Screenshots showing the behavior in 2.2 and in 2.3 before and after the fix > job description in Spark UI is broken > -- > > Key: SPARK-23051 > URL: https://issues.apache.org/jira/browse/SPARK-23051 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.0 >Reporter: Shixiong Zhu >Priority: Blocker > Labels: regression > Attachments: Spark-23051-after.png, Spark-23051-before.png, in-2.2.png > > > In previous versions, Spark UI will use the stage description if the job > description is not set. But right now it’s just empty. > Reproducer: Just run the following codes in spark shell and check the UI: > {code} > val q = > spark.readStream.format("rate").load().writeStream.format("console").start() > Thread.sleep(2000) > q.stop() > {code}
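The regression in the report above amounts to a missing fallback. A toy sketch of the intended behavior (function and parameter names are mine, not Spark's actual UI code):

```scala
// Hypothetical sketch of the fallback the report describes: prefer the
// job description, and fall back to the stage description when unset.
def displayedDescription(jobDesc: Option[String],
                         lastStageDesc: Option[String]): String =
  jobDesc.orElse(lastStageDesc).getOrElse("")

// The 2.3 regression behaves like jobDesc.getOrElse(""): the cell goes
// empty whenever only the stage carries a description.
println(displayedDescription(None, Some("start at <console>:23")))
```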
[jira] [Updated] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value
[ https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandor Murakozi updated SPARK-22951: Affects Version/s: 2.3.0 > count() after dropDuplicates() on emptyDataFrame returns incorrect value > > > Key: SPARK-22951 > URL: https://issues.apache.org/jira/browse/SPARK-22951 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2, 2.2.0, 2.3.0 >Reporter: Michael Dreibelbis > > here is a minimal Spark Application to reproduce: > {code} > import org.apache.spark.sql.SQLContext > import org.apache.spark.{SparkConf, SparkContext} > object DropDupesApp extends App { > > override def main(args: Array[String]): Unit = { > val conf = new SparkConf() > .setAppName("test") > .setMaster("local") > val sc = new SparkContext(conf) > val sql = SQLContext.getOrCreate(sc) > assert(sql.emptyDataFrame.count == 0) // expected > assert(sql.emptyDataFrame.dropDuplicates.count == 1) // unexpected > } > > } > {code}
[jira] [Comment Edited] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value
[ https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311225#comment-16311225 ] Sandor Murakozi edited comment on SPARK-22951 at 1/4/18 11:36 AM: -- Adding 2.2.0 to affected version:
{code}
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
Type in expressions to have them evaluated.
Type :help for more information.

scala> Seq.empty[String].toDF.dropDuplicates.count
res0: Long = 0

scala> spark.emptyDataset[String].dropDuplicates.count
res1: Long = 0

scala> sc.emptyRDD[String].toDF.dropDuplicates.count
res2: Long = 0

scala> spark.emptyDataFrame.dropDuplicates.count
res3: Long = 1
{code}
[jira] [Updated] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value
[ https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandor Murakozi updated SPARK-22951: Affects Version/s: 2.2.0
[jira] [Commented] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value
[ https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311225#comment-16311225 ] Sandor Murakozi commented on SPARK-22951: - Adding 2.2.0 to affected version:
{code}
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
Type in expressions to have them evaluated.
Type :help for more information.

scala> Seq.empty[String].toDF.dropDuplicates.count
res0: Long = 0

scala> spark.emptyDataset[String].dropDuplicates.count
res1: Long = 0

scala> sc.emptyRDD[String].toDF.dropDuplicates.count
res2: Long = 0

scala> spark.emptyDataFrame.dropDuplicates.count
res3: Long = 1
{code}
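A plausible explanation for the pattern in the session above (my reading, not confirmed in this thread): `dropDuplicates` is planned as an aggregate grouped by all columns, and `emptyDataFrame` has zero columns, so the grouping key list is empty and the query becomes a global aggregate, which always emits exactly one row. Plain Scala illustrates the two behaviors:

```scala
// Plain-Scala illustration (not Spark code): a global aggregate is a
// fold over all rows, so even with zero input rows it yields exactly one
// output value -- the initial accumulator -- just as SELECT count(*) on
// an empty table returns one row containing 0.
val rows = Seq.empty[Int]
val count = rows.foldLeft(0L)((acc, _) => acc + 1)
println(count) // 0, but still exactly one result

// With a real grouping key (the datasets with a non-empty schema above),
// grouping an empty input yields zero groups, hence zero rows remain
// after deduplication.
val groups = rows.groupBy(identity)
println(groups.size) // 0
```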
[jira] [Commented] (SPARK-22887) ML test for StructuredStreaming: spark.ml.fpm
[ https://issues.apache.org/jira/browse/SPARK-22887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310258#comment-16310258 ] Sandor Murakozi commented on SPARK-22887: - Is anyone working on this? If not, I would like to work on it. > ML test for StructuredStreaming: spark.ml.fpm > - > > Key: SPARK-22887 > URL: https://issues.apache.org/jira/browse/SPARK-22887 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 2.3.0 >Reporter: Joseph K. Bradley > > Task for adding Structured Streaming tests for all Models/Transformers in a > sub-module in spark.ml > For an example, see LinearRegressionSuite.scala in > https://github.com/apache/spark/pull/19843
[jira] [Updated] (SPARK-22360) Add unit test for Window Specifications
[ https://issues.apache.org/jira/browse/SPARK-22360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandor Murakozi updated SPARK-22360: Description: {color:red}colored text{color}* different partition clauses (none, one, multiple) * different order clauses (none, one, multiple, asc/desc, nulls first/last) was: * different partition clauses (none, one, multiple) * different order clauses (none, one, multiple, asc/desc, nulls first/last) > Add unit test for Window Specifications > --- > > Key: SPARK-22360 > URL: https://issues.apache.org/jira/browse/SPARK-22360 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Jiang Xingbo > > {color:red}colored text{color}* different partition clauses (none, one, > multiple) > * different order clauses (none, one, multiple, asc/desc, nulls first/last)
[jira] [Updated] (SPARK-22360) Add unit test for Window Specifications
[ https://issues.apache.org/jira/browse/SPARK-22360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandor Murakozi updated SPARK-22360: Description: * different partition clauses (none, one, multiple) * different order clauses (none, one, multiple, asc/desc, nulls first/last) was: {color:red}colored text{color}* different partition clauses (none, one, multiple) * different order clauses (none, one, multiple, asc/desc, nulls first/last) > Add unit test for Window Specifications > --- > > Key: SPARK-22360 > URL: https://issues.apache.org/jira/browse/SPARK-22360 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Jiang Xingbo > > * different partition clauses (none, one, multiple) > * different order clauses (none, one, multiple, asc/desc, nulls first/last)
[jira] [Updated] (SPARK-22804) Using a window function inside of an aggregation causes StackOverflowError
[ https://issues.apache.org/jira/browse/SPARK-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandor Murakozi updated SPARK-22804: Priority: Minor (was: Major) > Using a window function inside of an aggregation causes StackOverflowError > -- > > Key: SPARK-22804 > URL: https://issues.apache.org/jira/browse/SPARK-22804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Sandor Murakozi >Priority: Minor > > {code} > import org.apache.spark.sql.expressions.Window > val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value") > df.select(min(avg('value).over(Window.partitionBy('key.show > {code} > produces > {code} > java.lang.StackOverflowError > at org.apache.spark.sql.catalyst.trees.TreeNode.find(TreeNode.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$find$1$$anonfun$apply$1.apply(TreeNode.scala:109) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$find$1$$anonfun$apply$1.apply(TreeNode.scala:109) > at scala.Option.orElse(Option.scala:289) > ... 
> at org.apache.spark.sql.catalyst.trees.TreeNode.find(TreeNode.scala:109) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$.org$apache$spark$sql$catalyst$analysis$Analyzer$ExtractWindowExpressions$$hasWindowFunction(Analyzer.scala:1853) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$65.apply(Analyzer.scala:1877) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$65.apply(Analyzer.scala:1877) > at > scala.collection.TraversableLike$$anonfun$partition$1.apply(TraversableLike.scala:314) > at > scala.collection.TraversableLike$$anonfun$partition$1.apply(TraversableLike.scala:314) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > scala.collection.TraversableLike$class.partition(TraversableLike.scala:314) > at scala.collection.AbstractTraversable.partition(Traversable.scala:104) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$.org$apache$spark$sql$catalyst$analysis$Analyzer$ExtractWindowExpressions$$extract(Analyzer.scala:1877) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$apply$27.applyOrElse(Analyzer.scala:2060) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$apply$27.applyOrElse(Analyzer.scala:2021) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > {code} > > {code} > df.select(min(avg('value).over())).show > {code} > produces a Stackoverflow as well. > {code} > df.select(min(avg('value))).show > org.apache.spark.sql.AnalysisException: It is not allowed to use an aggregate > function in the argument of another aggregate function. Please use the inner > aggregate function in a sub-query.;; > ... 
> df.select(min(avg('value)).over()).show
> +---------------------------------------+
> |min(avg(value)) OVER (UnspecifiedFrame)|
> +---------------------------------------+
> |                                    2.0|
> +---------------------------------------+
> {code}
> I think this is a valid use case, so in the ideal case it should work.
> But even if it's not supported, I would expect an error message similar to the
> non-window version.
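Until the StackOverflowError itself is fixed, an untested workaround sketch for the nesting above, using only standard Dataset/Window APIs in spark-shell, is to split the window function and the aggregate into two steps:

```scala
// Untested sketch (assumes a spark-shell session with its implicits):
// materialize the window average as a column first, then aggregate over
// it in a separate step instead of nesting the two expressions.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, min}

val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
val withAvg = df.withColumn("avgValue",
  avg($"value").over(Window.partitionBy($"key")))
withAvg.select(min($"avgValue")).show()
```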
[jira] [Commented] (SPARK-22804) Using a window function inside of an aggregation causes StackOverflowError
[ https://issues.apache.org/jira/browse/SPARK-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292732#comment-16292732 ] Sandor Murakozi commented on SPARK-22804: - Indeed, it looks pretty similar. I will check if it's the same. Thanks for the hint, [~srowen]
[jira] [Updated] (SPARK-22804) Using a window function inside of an aggregation causes StackOverflowError
[ https://issues.apache.org/jira/browse/SPARK-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandor Murakozi updated SPARK-22804: Description: {code} import org.apache.spark.sql.expressions.Window val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value") df.select(min(avg('value).over(Window.partitionBy('key.show {code} produces {code} java.lang.StackOverflowError at org.apache.spark.sql.catalyst.trees.TreeNode.find(TreeNode.scala:106) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$find$1$$anonfun$apply$1.apply(TreeNode.scala:109) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$find$1$$anonfun$apply$1.apply(TreeNode.scala:109) at scala.Option.orElse(Option.scala:289) ... at org.apache.spark.sql.catalyst.trees.TreeNode.find(TreeNode.scala:109) at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$.org$apache$spark$sql$catalyst$analysis$Analyzer$ExtractWindowExpressions$$hasWindowFunction(Analyzer.scala:1853) at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$65.apply(Analyzer.scala:1877) at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$65.apply(Analyzer.scala:1877) at scala.collection.TraversableLike$$anonfun$partition$1.apply(TraversableLike.scala:314) at scala.collection.TraversableLike$$anonfun$partition$1.apply(TraversableLike.scala:314) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.partition(TraversableLike.scala:314) at scala.collection.AbstractTraversable.partition(Traversable.scala:104) at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$.org$apache$spark$sql$catalyst$analysis$Analyzer$ExtractWindowExpressions$$extract(Analyzer.scala:1877) at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$apply$27.applyOrElse(Analyzer.scala:2060) at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$apply$27.applyOrElse(Analyzer.scala:2021) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) {code} {code} df.select(min(avg('value).over())).show {code} produces a Stackoverflow as well. {code} df.select(min(avg('value))).show org.apache.spark.sql.AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;; ...
df.select(min(avg('value)).over()).show
+---------------------------------------+
|min(avg(value)) OVER (UnspecifiedFrame)|
+---------------------------------------+
|                                    2.0|
+---------------------------------------+
{code} I think this is a valid use case, so in the ideal case it should work. But even if it's not supported, I would expect an error message similar to the non-window version.
[jira] [Updated] (SPARK-22804) Using a window function inside of an aggregation causes StackOverflowError
[ https://issues.apache.org/jira/browse/SPARK-22804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandor Murakozi updated SPARK-22804:
------------------------------------
    Description: 
{code}
import org.apache.spark.sql.expressions.Window

val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
df.select(min(avg('value).over(Window.partitionBy('key)))).show
{code}
produces
{code}
java.lang.StackOverflowError
  at org.apache.spark.sql.catalyst.trees.TreeNode.find(TreeNode.scala:106)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$find$1$$anonfun$apply$1.apply(TreeNode.scala:109)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$find$1$$anonfun$apply$1.apply(TreeNode.scala:109)
  at scala.Option.orElse(Option.scala:289)
  ...
  at org.apache.spark.sql.catalyst.trees.TreeNode.find(TreeNode.scala:109)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$.org$apache$spark$sql$catalyst$analysis$Analyzer$ExtractWindowExpressions$$hasWindowFunction(Analyzer.scala:1853)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$65.apply(Analyzer.scala:1877)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$65.apply(Analyzer.scala:1877)
  at scala.collection.TraversableLike$$anonfun$partition$1.apply(TraversableLike.scala:314)
  at scala.collection.TraversableLike$$anonfun$partition$1.apply(TraversableLike.scala:314)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at scala.collection.TraversableLike$class.partition(TraversableLike.scala:314)
  at scala.collection.AbstractTraversable.partition(Traversable.scala:104)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$.org$apache$spark$sql$catalyst$analysis$Analyzer$ExtractWindowExpressions$$extract(Analyzer.scala:1877)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$apply$27.applyOrElse(Analyzer.scala:2060)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$apply$27.applyOrElse(Analyzer.scala:2021)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
{code}

{code}
df.select(min(avg('value).over())).show
{code}
produces a StackOverflowError as well.

{code}
df.select(min(avg('value))).show
org.apache.spark.sql.AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;;
...
df.select(min(avg('value)).over()).show
+---------------------------------------+
|min(avg(value)) OVER (UnspecifiedFrame)|
+---------------------------------------+
|                                    2.0|
+---------------------------------------+
{code}

  was:
{code}
import org.apache.spark.sql.expressions.Window

val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
df.select(min(avg('value).over(Window.partitionBy('key)))).show
{code}
produces
{code}
java.lang.StackOverflowError
  at org.apache.spark.sql.catalyst.trees.TreeNode.find(TreeNode.scala:106)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$find$1$$anonfun$apply$1.apply(TreeNode.scala:109)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$find$1$$anonfun$apply$1.apply(TreeNode.scala:109)
  at scala.Option.orElse(Option.scala:289)
  ...
{code}
[jira] [Created] (SPARK-22804) Using a window function inside of an aggregation causes StackOverflowError
Sandor Murakozi created SPARK-22804:
---------------------------------------

             Summary: Using a window function inside of an aggregation causes StackOverflowError
                 Key: SPARK-22804
                 URL: https://issues.apache.org/jira/browse/SPARK-22804
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Sandor Murakozi


{code:scala}
import org.apache.spark.sql.expressions.Window

val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
df.select(min(avg('value).over(Window.partitionBy('key)))).show
{code}
produces
{code}
java.lang.StackOverflowError
  at org.apache.spark.sql.catalyst.trees.TreeNode.find(TreeNode.scala:106)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$find$1$$anonfun$apply$1.apply(TreeNode.scala:109)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$find$1$$anonfun$apply$1.apply(TreeNode.scala:109)
  at scala.Option.orElse(Option.scala:289)
  ...
  at org.apache.spark.sql.catalyst.trees.TreeNode.find(TreeNode.scala:109)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$.org$apache$spark$sql$catalyst$analysis$Analyzer$ExtractWindowExpressions$$hasWindowFunction(Analyzer.scala:1853)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$65.apply(Analyzer.scala:1877)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$65.apply(Analyzer.scala:1877)
  at scala.collection.TraversableLike$$anonfun$partition$1.apply(TraversableLike.scala:314)
  at scala.collection.TraversableLike$$anonfun$partition$1.apply(TraversableLike.scala:314)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at scala.collection.TraversableLike$class.partition(TraversableLike.scala:314)
  at scala.collection.AbstractTraversable.partition(Traversable.scala:104)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$.org$apache$spark$sql$catalyst$analysis$Analyzer$ExtractWindowExpressions$$extract(Analyzer.scala:1877)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$apply$27.applyOrElse(Analyzer.scala:2060)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions$$anonfun$apply$27.applyOrElse(Analyzer.scala:2021)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
{code}
...
{code:scala}
df.select(min(avg('value).over())).show
{code}
produces a similar error.

{code:scala}
df.select(min(avg('value))).show
org.apache.spark.sql.AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;;
...
df.select(min(avg('value)).over()).show
+---------------------------------------+
|min(avg(value)) OVER (UnspecifiedFrame)|
+---------------------------------------+
|                                    2.0|
+---------------------------------------+
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-22360) Add unit test for Window Specifications
[ https://issues.apache.org/jira/browse/SPARK-22360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289803#comment-16289803 ]

Sandor Murakozi edited comment on SPARK-22360 at 12/13/17 8:09 PM:
-------------------------------------------------------------------

I just did a quick check of the existing test cases to see the current coverage:

different partition clauses
* None
** Window.rowsBetween
** reverse unbounded range frame
** window function with aggregates
** Null inputs
* One
** Lots of tests
* Multiple
** No test

different order clauses
* None
** Window.rowsBetween
** window function should fail if order by clause is not specified
** statistical functions
* One
** Lots of tests
* Multiple
** No test
* asc
** lots of tests
* desc
** aggregation and range between with unbounded
** reverse sliding range frame
** reverse unbounded range frame
* nulls first/last
** last/first with ignoreNulls

I will create tests for those that are not yet covered. I will also check if there are any special combinations (possibly also considering frames) that require additional test cases.

[~jiangxb] are there other cases that need to be covered? Do you think it would be worthwhile to have a set of new test cases focusing on and systematically going through all the partition and order clauses?


was (Author: smurakozi):
I just did a quick check of the existing test cases to see the current coverage:

different partition clauses
* None
** Window.rowsBetween
** reverse unbounded range frame
** window function with aggregates
** Null inputs
* One
** Lots of tests
* Multiple
** No test

different order clauses
* None
** Window.rowsBetween
** window function should fail if order by clause is not specified
** statistical functions
* One
** Lots of tests
* Multiple
** No test
* asc
** lots of tests
* desc
** aggregation and range between with unbounded
** reverse sliding range frame
** reverse unbounded range frame
* nulls first/last
** last/first with ignoreNulls

I will create tests for those that are not yet covered. I will also check if there are any special combinations (possibly also considering frames) that require additional test cases.

[~jiangxb] are there other cases that need to be covered? Do you think it would be worthwhile to have a set of new test cases focusing and systematically going through all the partition and order clauses?


> Add unit test for Window Specifications
> ---------------------------------------
>
>                 Key: SPARK-22360
>                 URL: https://issues.apache.org/jira/browse/SPARK-22360
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Jiang Xingbo
>
> * different partition clauses (none, one, multiple)
> * different order clauses (none, one, multiple, asc/desc, nulls first/last)
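For the "Multiple ** No test" gaps called out in the coverage survey above, one way such a test might look is a window spec carrying several partition columns and several order columns at once. This is a hypothetical sketch only; the data and column names are invented for illustration:

{code:scala}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val df = Seq(("a", "x", 1), ("a", "x", 2), ("a", "y", 3), ("b", "x", 4))
  .toDF("k1", "k2", "v")

// Multiple partition columns and multiple order columns in a single spec,
// the two combinations the survey found uncovered.
val w = Window.partitionBy($"k1", $"k2").orderBy($"v".desc, $"k2".asc)

df.select($"k1", $"k2", $"v", row_number().over(w).as("rn")).show()
{code}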
[jira] [Commented] (SPARK-22359) Improve the test coverage of window functions
[ https://issues.apache.org/jira/browse/SPARK-22359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289792#comment-16289792 ]

Sandor Murakozi commented on SPARK-22359:
-----------------------------------------

I'm glad that you're joining, [~gsomogyi]. Do you have any preferred subtasks? I've started to work on the first one, dealing with WindowSpec.

> Improve the test coverage of window functions
> ---------------------------------------------
>
>                 Key: SPARK-22359
>                 URL: https://issues.apache.org/jira/browse/SPARK-22359
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Jiang Xingbo
>
> There are already quite a few integration tests using window functions, but
> the unit test coverage for window functions is not ideal.
> We'd like to test the following aspects:
> * Specifications
> ** different partition clauses (none, one, multiple)
> ** different order clauses (none, one, multiple, asc/desc, nulls first/last)
> * Frames and their combinations
> ** OffsetWindowFunctionFrame
> ** UnboundedWindowFunctionFrame
> ** SlidingWindowFunctionFrame
> ** UnboundedPrecedingWindowFunctionFrame
> ** UnboundedFollowingWindowFunctionFrame
> * Aggregate function types
> ** Declarative
> ** Imperative
> ** UDAF
> * Spilling
> ** Cover the conditions that WindowExec should spill at least once
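The frame types listed in the issue correspond to different shapes of the rowsBetween/rangeBetween DSL. A sketch of specs that should exercise each physical frame — the mapping in the comments is my reading of the frame names, not taken from the issue:

{code:scala}
import org.apache.spark.sql.expressions.Window

val ordered = Window.partitionBy($"key").orderBy($"value")

// SlidingWindowFunctionFrame: bounded on both sides
val sliding = ordered.rowsBetween(-1, 1)
// UnboundedPrecedingWindowFunctionFrame: running aggregate
val running = ordered.rowsBetween(Window.unboundedPreceding, Window.currentRow)
// UnboundedFollowingWindowFunctionFrame: reverse running aggregate
val reverse = ordered.rowsBetween(Window.currentRow, Window.unboundedFollowing)
// UnboundedWindowFunctionFrame: the whole partition
val whole = ordered.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
// OffsetWindowFunctionFrame is exercised by lead/lag rather than a frame spec.
{code}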
[jira] [Commented] (SPARK-22359) Improve the test coverage of window functions
[ https://issues.apache.org/jira/browse/SPARK-22359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289300#comment-16289300 ]

Sandor Murakozi commented on SPARK-22359:
-----------------------------------------

Is there anyone working on these issues? If not, I would be happy to jump on them.

> Improve the test coverage of window functions
> ---------------------------------------------
>
>                 Key: SPARK-22359
>                 URL: https://issues.apache.org/jira/browse/SPARK-22359
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Jiang Xingbo
>
> There are already quite a few integration tests using window functions, but
> the unit test coverage for window functions is not ideal.
> We'd like to test the following aspects:
> * Specifications
> ** different partition clauses (none, one, multiple)
> ** different order clauses (none, one, multiple, asc/desc, nulls first/last)
> * Frames and their combinations
> ** OffsetWindowFunctionFrame
> ** UnboundedWindowFunctionFrame
> ** SlidingWindowFunctionFrame
> ** UnboundedPrecedingWindowFunctionFrame
> ** UnboundedFollowingWindowFunctionFrame
> * Aggregate function types
> ** Declarative
> ** Imperative
> ** UDAF
> * Spilling
> ** Cover the conditions that WindowExec should spill at least once
[jira] [Commented] (SPARK-22331) Make MLlib string params case-insensitive
[ https://issues.apache.org/jira/browse/SPARK-22331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268670#comment-16268670 ]

Sandor Murakozi commented on SPARK-22331:
-----------------------------------------

Is anyone working on this issue? If not, I would be happy to take it.

> Make MLlib string params case-insensitive
> -----------------------------------------
>
>                 Key: SPARK-22331
>                 URL: https://issues.apache.org/jira/browse/SPARK-22331
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.2.0
>            Reporter: yuhao yang
>            Priority: Minor
>
> Some String params in ML are still case-sensitive, as they are checked by
> ParamValidators.inArray.
> For consistency in user experience, there should be a general guideline on
> whether String params in Spark MLlib are case-insensitive or not.
> I'm leaning towards making all String params case-insensitive where possible.
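One possible direction for the issue, sketched purely as an illustration: put the case normalization into the validator itself rather than at every call site. The helper below does not exist in MLlib — `inArrayIgnoreCase` is an invented name for a hypothetical case-insensitive counterpart of ParamValidators.inArray:

{code:scala}
// Hypothetical case-insensitive variant of ParamValidators.inArray:
// accepts a value if it matches any allowed string, ignoring case.
def inArrayIgnoreCase(allowed: Array[String]): String => Boolean =
  (value: String) => value != null && allowed.exists(_.equalsIgnoreCase(value))

// Usage sketch:
// inArrayIgnoreCase(Array("skip", "error", "keep"))("SKIP")  // accepted
// inArrayIgnoreCase(Array("skip", "error", "keep"))("drop")  // rejected
{code}

A param defined with such a validator would accept "Skip" and "SKIP" alike, which is the consistency the reporter is asking about.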
[jira] [Commented] (SPARK-22516) CSV Read breaks: When "multiLine" = "true", if "comment" option is set as last line's first character
[ https://issues.apache.org/jira/browse/SPARK-22516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262516#comment-16262516 ]

Sandor Murakozi commented on SPARK-22516:
-----------------------------------------

I'm a newbie, but I would be happy to work on it. Would that be ok with you, [~hyukjin.kwon]?

> CSV Read breaks: When "multiLine" = "true", if "comment" option is set as
> last line's first character
> -------------------------------------------------------------------------
>
>                 Key: SPARK-22516
>                 URL: https://issues.apache.org/jira/browse/SPARK-22516
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Kumaresh C R
>            Priority: Minor
>              Labels: csvparser
>         Attachments: testCommentChar.csv, test_file_without_eof_char.csv
>
> Try to read the attached CSV file with the following parse properties:
> scala> *val csvFile = spark.read.option("header","true").option("inferSchema", "true").option("parserLib", "univocity").option("comment", "c").csv("hdfs://localhost:8020/testCommentChar.csv");*
>
> csvFile: org.apache.spark.sql.DataFrame = [a: string, b: string]
>
> scala> csvFile.show
> +---+---+
> |  a|  b|
> +---+---+
> +---+---+
>
> {color:#8eb021}*Noticed that it works fine.*{color}
> If we add the option "multiLine" = "true", it fails with the exception below. This happens only if the "comment" character matches the first character of the input dataset's last line.
> scala> val csvFile = *spark.read.option("header","true").{color:#d04437}option("multiLine","true"){color}.option("inferSchema", "true").option("parserLib", "univocity").option("comment", "c").csv("hdfs://localhost:8020/testCommentChar.csv");*
> 17/11/14 14:26:17 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 8)
> com.univocity.parsers.common.TextParsingException: java.lang.IllegalArgumentException - Unable to skip 1 lines from line 2. End of input reached
> Parser Configuration: CsvParserSettings:
> Auto configuration enabled=true
> Autodetect column delimiter=false
> Autodetect quotes=false
> Column reordering enabled=true
> Empty value=null
> Escape unquoted values=false
> Header extraction enabled=null
> Headers=null
> Ignore leading whitespaces=false
> Ignore trailing whitespaces=false
> Input buffer size=128
> Input reading on separate thread=false
> Keep escape sequences=false
> Keep quotes=false
> Length of content displayed on error=-1
> Line separator detection enabled=false
> Maximum number of characters per column=-1
> Maximum number of columns=20480
> Normalize escaped line separators=true
> Null value=
> Number of records to read=all
> Processor=none
> Restricting data in exceptions=false
> RowProcessor error handler=null
> Selected fields=none
> Skip empty lines=true
> Unescaped quote handling=STOP_AT_DELIMITER
> Format configuration: CsvFormat:
> Comment character=c
> Field delimiter=,
> Line separator (normalized)=\n
> Line separator sequence=\r\n
> Quote character="
> Quote escape character=\
> Quote escape escape character=null
> Internal state when error was thrown: line=3, column=0, record=1, charIndex=19
> at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:339)
> at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:475)
> at
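Until the parser combination is fixed, two ways around the reported collision can be sketched. These are illustrative only — the path is invented, and they assume the same data shape as the attached file:

{code:scala}
// 1) If no quoted field contains embedded newlines, drop multiLine entirely;
//    per the report, the comment option alone parses the file correctly.
val noMultiLine = spark.read
  .option("header", "true")
  .option("comment", "c")
  .csv("/tmp/testCommentChar.csv")

// 2) Or choose a comment character that can never begin a data line,
//    so the last-line / comment-character collision cannot occur.
val safeComment = spark.read
  .option("header", "true")
  .option("multiLine", "true")
  .option("comment", "#")
  .csv("/tmp/testCommentChar.csv")
{code}

The second variant keeps multiLine available for files that genuinely contain multi-line quoted fields; it only sidesteps the specific trigger described in the report.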