Not all memory can be used for Java heap space, so maybe it does run out. Could you try repartitioning the data? To my knowledge you shouldn't be thrown out as long as a single partition fits into memory, even if the whole dataset does not.
To do that, exchange val train = parsedData.cache() with val train = parsedData.repartition(20).cache() Best regards, Simon -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/KMeans-Input-Format-tp11654p11719.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org