I've isolated this to a memory issue, but I don't know which parameter I need to tweak. If I sample my RDD at 35% of the data, everything runs to completion; at 40%, it fails. In standalone mode, I can run on the full RDD without any problems.
// works
val samples = sc.textFile("s3n://geonames").sample(false, 0.35) // 64MB, 2849439 lines

// fails
val samples = sc.textFile("s3n://geonames").sample(false, 0.4)  // 64MB, 2849439 lines

Any ideas?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-debug-Runs-locally-but-not-on-cluster-tp12081p12091.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.