I've isolated this to a memory issue, but I don't know which parameter I need
to tweak. If I sample my samples RDD at 35% of the data, everything runs
to completion; at 40%, it fails. In standalone mode, I can run on the full
RDD without any problems.

// works
val samples = sc.textFile("s3n://geonames").sample(false, 0.35) // 64MB, 2849439 lines

// fails
val samples = sc.textFile("s3n://geonames").sample(false, 0.4) // 64MB, 2849439 lines
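
In case it's useful, here's what I was planning to try next. This is just a
sketch; I don't know whether executor heap or partition count is actually the
right knob, and the values below (4g, 64) are arbitrary guesses:

import org.apache.spark.{SparkConf, SparkContext}

// Guess 1: give each executor more heap (4g is an arbitrary value)
val conf = new SparkConf()
  .setAppName("geonames-sample")
  .set("spark.executor.memory", "4g")
val sc = new SparkContext(conf)

// Guess 2: spread the file over more partitions so each task holds
// a smaller slice in memory (64 is an arbitrary partition count)
val samples = sc.textFile("s3n://geonames", 64).sample(false, 0.4)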

Any ideas? 



