Hi, I noticed today while toying with Spark 2.0.0 (today's build) that Seq(...).toDF does **not** submit a Spark job, while sc.parallelize(Seq(...)).toDF does. I was pleasantly surprised and have been wondering about the reason for this behaviour.
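For reference, this is roughly what I tried in spark-shell (assuming the spark and sc provided by the shell, with spark.implicits._ already in scope; the sample data and column names are just made up for the example):

    // spark-shell, Spark 2.0.0 nightly; spark and sc come from the shell,
    // spark.implicits._ is already imported.

    // No Spark job showed up in the web UI for this one:
    val localDF = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // ...while this one did submit a job in my run:
    val rddDF = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "name")

    // Comparing the plans is one way to see how each DataFrame is backed:
    localDF.explain(extended = true)
    rddDF.explain(extended = true)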
My explanation is that Datasets are just a "view" layer atop the data, and when that data is already local/in memory there's no need to submit a job to...well...compute it. I'd appreciate a more in-depth answer, perhaps with links to the code. Thanks!

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski