[jira] [Comment Edited] (SPARK-13407) TaskMetrics.fromAccumulatorUpdates can crash when trying to access garbage-collected accumulators
    [ https://issues.apache.org/jira/browse/SPARK-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1583#comment-1583 ]

EE edited comment on SPARK-13407 at 1/25/17 2:09 PM:
-----------------------------------------------------

We reproduced this issue on Spark 1.6.2 as well. Please find some details and the error below:
1. We are running a Spark Streaming application.
2. When we update accumulators, we get the following error.
3. It is a critical bug in our case, as it crashes the streaming job and we cannot catch or contain it from our own code.
{code}
Exception in thread "dag-scheduler-event-loop" java.lang.IllegalAccessError: Attempted to access garbage collected Accumulator.
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:353)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:346)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.Accumulators$.add(Accumulators.scala:346)
    at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1081)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1153)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1639)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}
I'll be glad if you could backport the fix to v1.6.2 as well.

was (Author: ee1):
We reproduced this issue on Spark 1.6.2 as well. Please find some details and the error below:
1. We are running a Spark Streaming application.
2. When we update accumulators, we get the following error.
3. It is a critical bug in our case, as it crashes the streaming job and we cannot catch or contain it from our own code.

Exception in thread "dag-scheduler-event-loop" java.lang.IllegalAccessError: Attempted to access garbage collected Accumulator.
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:353)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:346)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.Accumulators$.add(Accumulators.scala:346)
    at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1081)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1153)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1639)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

I'll be glad if you could backport the fix to v1.6.2 as well.


> TaskMetrics.fromAccumulatorUpdates can crash when trying to access
> garbage-collected accumulators
> --------------------------------------------------------------------
>
>                 Key: SPARK-13407
>                 URL: https://issues.apache.org/jira/browse/SPARK-13407
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 2.0.0
>
>
> TaskMetrics.fromAccumulatorUpdates can fail if accumulators have been garbage-collected:
> {code}
> java.lang.IllegalAccessError: Attempted to access garbage collected accumulator 481596
>   at org.apache.spark.Accumulators$$anonfun$get$1$$anonfun$apply$1.apply(Accumulator.scala:133)
>   at org.apache.spark.Accumulators$$anonfun$get$1$$anonfun$apply$1.ap
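For anyone hitting this on 1.6.x before any backport: one plausible mitigation, assuming the root cause is the driver dropping its last strong reference to the accumulator (the driver-side registry only holds weak references), is to create the accumulator once and keep a driver-side strong reference for the lifetime of the StreamingContext instead of re-creating it per batch. The sketch below only illustrates that pattern; the AccumulatorHolder object, accumulator name, socket source and batch interval are invented for the example, and this is not the change that resolved this ticket.
{code}
import org.apache.spark.{Accumulator, SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical driver-side holder: keeps a strong reference so the accumulator
// cannot be garbage-collected while the streaming job is running.
object AccumulatorHolder {
  @volatile private var instance: Accumulator[Long] = _

  def getOrCreate(sc: SparkContext): Accumulator[Long] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.accumulator(0L, "recordCount")
        }
      }
    }
    instance
  }
}

object StreamingWithAccumulator {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AccumulatorWorkaroundSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val lines = ssc.socketTextStream("localhost", 9999)
    lines.foreachRDD { rdd =>
      // Re-resolve the same long-lived accumulator for every batch instead of
      // creating a new one, so the driver-side reference stays alive.
      val acc = AccumulatorHolder.getOrCreate(rdd.sparkContext)
      rdd.foreach(_ => acc += 1L)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}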
[jira] [Comment Edited] (SPARK-13407) TaskMetrics.fromAccumulatorUpdates can crash when trying to access garbage-collected accumulators
    [ https://issues.apache.org/jira/browse/SPARK-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1583#comment-1583 ]

EE edited comment on SPARK-13407 at 1/25/17 2:07 PM:
-----------------------------------------------------

We reproduced this issue on Spark 1.6.2 as well. Please find some details and the error below:
1. We are running a Spark Streaming application.
2. When we update accumulators, we get the following error.
3. It is a critical bug in our case, as it crashes the streaming job and we cannot catch or contain it from our own code.

Exception in thread "dag-scheduler-event-loop" java.lang.IllegalAccessError: Attempted to access garbage collected Accumulator.
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:353)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:346)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.Accumulators$.add(Accumulators.scala:346)
    at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1081)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1153)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1639)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

I'll be glad if you could backport the fix to v1.6.2 as well.

was (Author: ee1):
We recreated this case on spark 1.6.2 as well. on spark-streaming application that updates accumulators we got the following error:

Exception in thread "dag-scheduler-event-loop" java.lang.IllegalAccessError: Attempted to access garbage collected Accumulator.
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:353)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:346)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.Accumulators$.add(Accumulators.scala:346)
    at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1081)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1153)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1639)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


> TaskMetrics.fromAccumulatorUpdates can crash when trying to access
> garbage-collected accumulators
> --------------------------------------------------------------------
>
>                 Key: SPARK-13407
>                 URL: https://issues.apache.org/jira/browse/SPARK-13407
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 2.0.0
>
>
> TaskMetrics.fromAccumulatorUpdates can fail if accumulators have been garbage-collected:
> {code}
> java.lang.IllegalAccessError: Attempted to access garbage collected accumulator 481596
>   at org.apache.spark.Accumulators$$anonfun$get$1$$anonfun$apply$1.apply(Accumulator.scala:133)
>   at org.apache.spark.Accumulators$$anonfun$get$1$$anonfun$apply$1.apply(Accumulator.scala:133)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.Accumulators$$anonfun$get$1.apply(Accumulator.scala:132)
>   at org.apache.spark.Accumulators$$anonfun$get$1.apply(Acc
[jira] [Commented] (SPARK-13407) TaskMetrics.fromAccumulatorUpdates can crash when trying to access garbage-collected accumulators
    [ https://issues.apache.org/jira/browse/SPARK-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1583#comment-1583 ]

EE commented on SPARK-13407:
----------------------------

We reproduced this case on Spark 1.6.2 as well. In a Spark Streaming application that updates accumulators, we got the following error:

Exception in thread "dag-scheduler-event-loop" java.lang.IllegalAccessError: Attempted to access garbage collected Accumulator.
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:353)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:346)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.Accumulators$.add(Accumulators.scala:346)
    at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1081)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1153)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1639)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


> TaskMetrics.fromAccumulatorUpdates can crash when trying to access
> garbage-collected accumulators
> --------------------------------------------------------------------
>
>                 Key: SPARK-13407
>                 URL: https://issues.apache.org/jira/browse/SPARK-13407
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 2.0.0
>
>
> TaskMetrics.fromAccumulatorUpdates can fail if accumulators have been garbage-collected:
> {code}
> java.lang.IllegalAccessError: Attempted to access garbage collected accumulator 481596
>   at org.apache.spark.Accumulators$$anonfun$get$1$$anonfun$apply$1.apply(Accumulator.scala:133)
>   at org.apache.spark.Accumulators$$anonfun$get$1$$anonfun$apply$1.apply(Accumulator.scala:133)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.Accumulators$$anonfun$get$1.apply(Accumulator.scala:132)
>   at org.apache.spark.Accumulators$$anonfun$get$1.apply(Accumulator.scala:130)
>   at scala.Option.map(Option.scala:145)
>   at org.apache.spark.Accumulators$.get(Accumulator.scala:130)
>   at org.apache.spark.executor.TaskMetrics$$anonfun$9.apply(TaskMetrics.scala:414)
>   at org.apache.spark.executor.TaskMetrics$$anonfun$9.apply(TaskMetrics.scala:412)
>   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>   at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
>   at org.apache.spark.executor.TaskMetrics$.fromAccumulatorUpdates(TaskMetrics.scala:412)
>   at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onExecutorMetricsUpdate$2.apply(JobProgressListener.scala:499)
>   at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onExecutorMetricsUpdate$2.apply(JobProgressListener.scala:493)
>   at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.ui.jobs.JobProgressListener.onExecutorMetricsUpdate(JobProgressListener.scala:493)
>   at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:56)
>   at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:35)
>   at org.apache.spark.scheduler.LiveLis
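To make the failure mode in the description more concrete: the driver tracks accumulators by ID through weak references, so once user code drops its last strong reference and a GC runs, a later lookup by ID finds a cleared reference and throws. The toy registry below only illustrates that mechanism in plain Scala under that assumption; it is not Spark's internal (private) Accumulators object, and all class and method names are invented for the example.
{code}
import java.lang.ref.WeakReference
import scala.collection.mutable

// Toy counter standing in for an accumulator.
class Counter(var value: Long = 0L)

// Toy registry that, like the driver-side one described above, holds its entries weakly.
object ToyRegistry {
  private val originals = mutable.Map[Long, WeakReference[Counter]]()

  def register(id: Long, c: Counter): Unit = originals(id) = new WeakReference(c)

  def add(id: Long, delta: Long): Unit = originals.get(id).foreach { ref =>
    val counter = ref.get()
    if (counter == null) {
      // Same situation as the IllegalAccessError in the stack traces above:
      // the entry exists but its referent has been garbage-collected.
      throw new IllegalStateException(s"Attempted to access garbage collected accumulator $id")
    }
    counter.value += delta
  }
}

object Demo extends App {
  ToyRegistry.register(1L, new Counter())  // caller keeps no strong reference
  System.gc()                              // may clear the weak reference
  Thread.sleep(100)
  ToyRegistry.add(1L, 5L)                  // can now throw, like the scheduler/listener paths
}
{code}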
[jira] [Updated] (SPARK-12512) WithColumn does not work on multiple column with special character
     [ https://issues.apache.org/jira/browse/SPARK-12512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JO EE updated SPARK-12512:
--------------------------
    Description:
Just for simplicity, I am using a Scala IDE worksheet to show the problem: the second rename to a name containing special characters, .withColumnRenamed("bField","k.b:Field"), does not work.

{code:title=Bar.scala|borderStyle=solid}
object bug {
  println("Welcome to the Scala worksheet")   //> Welcome to the Scala worksheet

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkConf
  import org.apache.spark.sql.SQLContext
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.DateType
  import org.apache.spark.sql.functions._
  import org.apache.spark.storage.StorageLevel._
  import org.apache.spark.sql.types.{StructType,StructField,StringType}

  val conf = new SparkConf()
    .setMaster("local[4]")
    .setAppName("Testbug")                    //> conf  : org.apache.spark.SparkConf = org.apache.spark.SparkConf@3b94d659

  val sc = new SparkContext(conf)             //> sc  : org.apache.spark.SparkContext = org.apache.spark.SparkContext@1dcca8d3

  val sqlContext = new SQLContext(sc)         //> sqlContext  : org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@2d23faef

  val schemaString = "aField,bField,cField"   //> schemaString  : String = aField,bField,cField

  val schema = StructType(schemaString.split(",")
    .map(fieldName => StructField(fieldName, StringType, true)))
                                              //> schema  : org.apache.spark.sql.types.StructType = StructType(StructField(aField,StringType,true), StructField(bField,StringType,true), StructField(cField,StringType,true))

  //import sqlContext.implicits._

  val newRDD = sc.parallelize(List(("a","b","c")))
    .map(x=>Row(x._1,x._2,x._3))              //> newRDD  : org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[1] at map at com.joee.worksheet.bug.scala:30

  val newDF = sqlContext.createDataFrame(newRDD, schema)
                                              //> newDF  : org.apache.spark.sql.DataFrame = [aField: string, bField: string, cField: string]

  val changeDF = newDF.withColumnRenamed("aField","anodotField")
    .withColumnRenamed("bField","bnodotField")
    .show()                                   //> +-----------+-----------+------+
                                              //| |anodotField|bnodotField|cField|
                                              //| +-----------+-----------+------+
                                              //| |          a|          b|     c|
                                              //| +-----------+-----------+------+
                                              //|
                                              //| changeDF  : Unit = ()

  val changeDFwithdotfield1 = newDF.withColumnRenamed("aField","k.a:Field")
                                              //> changeDFwithdotfield1  : org.apache.spark.sql.DataFrame = [k.a:Field: string, bField: string, cField: string]

  val changeDFwithdotfield = changeDFwithdotfield1
    .withColumnRenamed("bField","k.b:Field")  //> org.apache.spark.sql.AnalysisException: cannot resolve 'k.a:Field' given input columns k.a:Field, bField, cField;
                                              //|   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
                                              //|   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
                                              //|   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:53)
                                              //|   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
                                              //|   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.a
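// A possible workaround sketch for the worksheet above (not part of the original
// report), assuming Spark 1.5.x/1.6.x where backtick-quoted column names are
// supported: perform the second rename with select + alias, quoting the
// already-dotted column with backticks so the analyzer does not treat the dot
// as struct-field access. Reuses the worksheet's changeDFwithdotfield1; this is
// only a sketch, not a confirmed fix for the ticket.
import org.apache.spark.sql.functions.col

val renamedTwice = changeDFwithdotfield1.select(
  col("`k.a:Field`"),                  // backticks: resolve the dotted name literally
  col("bField").alias("k.b:Field"),    // the rename that withColumnRenamed failed to do
  col("cField"))

renamedTwice.printSchema()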
[jira] [Created] (SPARK-12512) WithColumn does not work on multiple column with special character
JO EE created SPARK-12512:
------------------------------

             Summary: WithColumn does not work on multiple column with special character
                 Key: SPARK-12512
                 URL: https://issues.apache.org/jira/browse/SPARK-12512
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.5.2
            Reporter: JO EE


import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.DateType
import org.apache.spark.sql.functions._
import org.apache.spark.storage.StorageLevel._
import org.apache.spark.sql.types.{StructType,StructField,StringType}

val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("Testbug")                    //> conf  : org.apache.spark.SparkConf = org.apache.spark.SparkConf@3b94d659

val sc = new SparkContext(conf)             //> sc  : org.apache.spark.SparkContext = org.apache.spark.SparkContext@1dcca8d3

val sqlContext = new SQLContext(sc)         //> sqlContext  : org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@2d23faef

val schemaString = "aField,bField,cField"   //> schemaString  : String = aField,bField,cField

val schema = StructType(schemaString.split(",")
  .map(fieldName => StructField(fieldName, StringType, true)))
                                            //> schema  : org.apache.spark.sql.types.StructType = StructType(StructField(aField,StringType,true), StructField(bField,StringType,true), StructField(cField,StringType,true))

//import sqlContext.implicits._

val newRDD = sc.parallelize(List(("a","b","c")))
  .map(x=>Row(x._1,x._2,x._3))              //> newRDD  : org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[1] at map at com.joee.worksheet.bug.scala:30

val newDF = sqlContext.createDataFrame(newRDD, schema)
                                            //> newDF  : org.apache.spark.sql.DataFrame = [aField: string, bField: string, cField: string]

val changeDF = newDF.withColumnRenamed("aField","anodotField")
  .withColumnRenamed("bField","bnodotField")
  .show()                                   //> +-----------+-----------+------+
                                            //| |anodotField|bnodotField|cField|
                                            //| +-----------+-----------+------+
                                            //| |          a|          b|     c|
                                            //| +-----------+-----------+------+
                                            //|
                                            //| changeDF  : Unit = ()

val changeDFwithdotfield1 = newDF.withColumnRenamed("aField","k.a:Field")
                                            //> changeDFwithdotfield1  : org.apache.spark.sql.DataFrame = [k.a:Field: string, bField: string, cField: string]

val changeDFwithdotfield = changeDFwithdotfield1
  .withColumnRenamed("bField","k.b:Field")  //> org.apache.spark.sql.AnalysisException: cannot resolve 'k.a:Field' given input columns k.a:Field, bField, cField;
                                            //|   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
                                            //|   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
                                            //|   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:53)
                                            //|   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
                                            //|   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
                                            //|
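Another workaround sketch under the same assumption (Spark 1.5.x, backtick quoting available): rename every column in a single toDF call, so no individual rename ever has to re-resolve a name that already contains a dot; afterwards, dotted names are referenced with backticks. It builds on the newDF from the description above and is only a sketch, not the ticket's resolution.
{code}
// Single-pass rename: toDF attaches the new names positionally, so the dotted
// names never need to be resolved against the existing schema.
val renamedAll = newDF.toDF("k.a:Field", "k.b:Field", "cField")
renamedAll.printSchema()

// Referencing the dotted columns later requires backticks, so the analyzer
// does not interpret the dot as nested-field access.
renamedAll.select(renamedAll("`k.a:Field`"), renamedAll("`k.b:Field`")).show()
{code}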