I noticed some confusing behaviour around task sizes. If I use

    val temp = sc.parallelize(1 to 100000)
and call temp.collect, the task size is small, say around 1120 bytes. But if I build the data with a loop over a mutable collection instead:

    import scala.collection.mutable.ArrayBuffer

    val data = new ArrayBuffer[Int]()
    for (i <- 1 to 1000000) data += i
    val distData = sc.parallelize(data)
    distData.collect

here the task size is in MBs (5000120 bytes). Any inputs here would be appreciated; this is really confusing!

1) Why does the data travel from the driver to the executors every time an action is performed? I thought the data lives in the executors' memory, and only the code is pushed from driver to executors.

2) Why does a Range not increase the task size, while any other collection makes the task size grow with the number of elements?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Task-size-variation-while-using-Range-Vs-List-tp18243.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
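For what it's worth, the size difference is reproducible outside Spark with plain Java serialization, which (as far as I understand) approximates what Spark's default JavaSerializer does when it ships tasks and their captured partition data from the driver. The sketch below assumes that: serializedSize is a hypothetical helper written for this test, not a Spark API.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: serialized size of an object via plain Java
// serialization -- a rough stand-in for Spark's default task serializer.
def serializedSize(obj: AnyRef): Int = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(obj)
  out.close()
  bytes.size()
}

// A Range stores only start, end and step -- not the elements themselves --
// so it serializes to a handful of bytes regardless of its length.
val range = 1 to 1000000

// An ArrayBuffer materializes every element, so serialization has to write
// all one million of them out.
val buffer = ArrayBuffer(1 to 1000000: _*)

val rangeSize  = serializedSize(range)
val bufferSize = serializedSize(buffer)
println(s"Range: $rangeSize bytes, ArrayBuffer: $bufferSize bytes")
```

On my understanding this is the whole story behind question 2: the Range's serialized form is constant-size, while the buffer's grows with the element count, and sc.parallelize captures whichever one you hand it into the tasks it ships.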