Quick update: it is the filter, not the reduceByKey, that triggers the error above.
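(One thing worth keeping in mind here: Spark transformations are lazily evaluated, so the cost of an upstream step such as a reduceByKey shuffle is only paid when a later step forces execution, and the error can surface there instead. A minimal plain-Scala sketch of that laziness, with no Spark involved and made-up numbers, just to illustrate the mechanism:)

```scala
// Like RDD transformations, a Scala Iterator is lazy: a failure caused by
// an earlier step only surfaces when a later step forces evaluation.
val mapped = Iterator(1, 2, 0, 4).map(x => 100 / x)   // division by zero pending
val filtered = mapped.filter(_ > 10)                  // still nothing has run
val outcome =
  try { filtered.toList; "no error" }                 // forcing evaluation triggers it
  catch { case _: ArithmeticException => "error surfaces at evaluation" }
println(outcome)  // error surfaces at evaluation
```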
Why would a filter cause an out-of-memory error? Here is my code:

    val inputgsup = "hdfs://" + sparkmasterip +
      "/user/sense/datasets/gsup/binary/30/2014/11/0[1-9]/part*"
    val gsupfile = sc.newAPIHadoopFile[BytesWritable, BytesWritable,
      SequenceFileAsBinaryInputFormat](inputgsup)
    val gsup = gsupfile
      .map(x => (GsupHandler.DeserializeKey(x._1.getBytes),
                 GsupHandler.DeserializeValue(x._2.getBytes)))
      .map(x => (x._1._1, x._1._2, x._2._1, x._2._2))
    val gsup_results_geod = gsup.flatMap(x =>
      doQueryGSUP(has_expo_criteria, has_fence_criteria,
        timerange_start_expo, timerange_end_expo,
        timerange_start_fence, timerange_end_fence,
        expo_pois, fence_pois, x))
    val gsup_results_reduced = gsup_results_geod.reduceByKey((a, b) =>
      ((a._1.toShort | b._1.toShort).toByte, a._2 + b._2))
    // This is the filter where the error appears:
    val gsup_results = gsup_results_reduced.filter(x =>
      criteria_filter.value contains x._2._1.toInt)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/JVM-Memory-Woes-tp19496p19510.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.